Wednesday, September 25, 2019

Historical GIS

Playing with a 1953 land survey map and 2019 Google Earth.


I love the chutzpah of the first two lines of Anne Kelly Knowles’s opening essay, “GIS and History,” in Placing History: “This book argues that scholars’ use of geographic information systems (GIS) is changing the practice of history. Time will tell whether the argument is prophetic or premature.” The implication is that the argument’s truth is not in question, only its timeline: GIS will change the practice of history. Fortunately, her bravado is well-grounded (pun intended). The book goes on to demonstrate several non-trivial ways in which GIS had already become the primary tool of historical enquiry for many established historians, a demographic that is anti-futurist almost by definition. So what is historical GIS?

Knowles identifies four key characteristics, which may be summarized as 1) geography significantly driving the history, 2) geography forming a significant part of the historical evidence, 3) the use of a geo-chronological database, and 4) the presentation of maps as part of the historian’s argument. What Placing History needs to show to prove its central argument is that the use of geography for enquiry, evidence, and argument does not merely deepen existing understandings based on the traditional textual sources; it needs to show that in some cases it has challenged existing narratives so compellingly that GIS history has changed the overall historiography for a topic. This it does, but first let’s examine the pedagogical value of GIS.

The late Robert Churchill identifies four “distinct benefits”: 1) the inculcation of “analytical and problem-solving strategies,” 2) the demonstration of the value of “visualization,” 3) engagement with “social, economic, and political issues” (he cites the Gulf War, indigenous land claims, personal privacy questions, redlining, and gerrymandering), and 4) an interdisciplinary bridge. The first two points are more compelling to me personally, but all of his arguments are valid. Amy Hillier then takes up the baton, arguing more practically for all the ways that historical GIS reaches “a technology-savvy generation” and offering some valuable advice on praxis. W.E.B. Du Bois’s 1896 social class maps provide a powerful, if somewhat counter-intuitive, example of the power of GIS: Du Bois recognized the power of spatial illustration so clearly that he spent months and countless hours of labor producing what GIS can now reproduce in a fraction of the time.
So aside from teaching history, what is the impact of GIS on history itself? Frankly, I’m amazed by how much it had already accomplished by the time Placing History was published, more than a decade ago. The nature of empire when communication traveled at the speed of a horse (not even wearing a saddle) has been the subject of historical enquiry at least as far back as Edward Gibbon. With Richard Talbert and Tom Elliott’s GIS-powered analysis and manipulation of the medieval Peutinger map, our understanding is deepening. This almost inverts Walter Benjamin’s powerful point about the loss of aura in the age of mechanical reproduction: with digital reproduction, I feel closer to the lives of the roughly 50 million citizens, slaves, and subjects of the Roman Empire. Peter K. Bol’s work did much the same for imperial China, while also stretching my understanding of how insignificant a boundary might have been, even to a provincial governor. What does all this imply for history? I think that as we become more comfortable leaning on geographic data as evidence, the idea of visualizations as distractions from historical narrative will diminish, and history will be stronger for it. No less significantly, historians will increasingly appreciate just how profound the old cliché that the map is not the territory really is: when you think about it, there are few things more breathtakingly oversimplifying than a smooth boundary line.

Historical scholarship will not abandon its bias toward the written record anytime soon, and I’m inclined to think that’s a good thing. However, maps, spatial data, and GIS have all been on the rise, both in the years leading up to Placing History and ever since. I have long believed that the curious historical relationship between space and time has been under-examined and under-utilized. Speaking extremely broadly, since the rise of the recording industry, popular music has in some cases taken decades to travel from urban centers to the least populated areas. When my sister and her husband arrived in the remote village of Kapchorwa, Uganda, in 2005, Dolly Parton and Kenny Rogers were extremely popular there, more than two decades after their peak popularity in the United States. That is an extremely trivial example, but tracking the movement of more catalyzing culture, such as Uncle Tom’s Cabin, labor union songs, or private schools since 1954, could be far more illuminating. I think the key will be harnessing movement and sound in the representation of data. This will require that historical scholarship not be confined to the physical codex, but to a great extent we’re already there.


Wednesday, September 18, 2019

Data Mining



Queen Victoria's journals, the work of a single person, amount to over sixty million words. By comparison, the 1611 Authorized Version of the Bible has 785 thousand words, and the seven books of the Harry Potter series total 1.1 million words. So imagine a close-reading purist specializing in the history of the British Empire in the Victorian era. At an ordinary reading pace of about 250 words per minute, that scholar would need roughly 4,000 hours for an initial reading, without a single pause to jot down a note, for a single source.
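To put those word counts in perspective, here is a back-of-the-envelope sketch that converts them into hours of uninterrupted reading. The 250-words-per-minute pace is my own assumption, not a figure from any source; change it and every total scales accordingly.

```python
# Back-of-the-envelope close-reading time.
# ASSUMPTION: a steady, uninterrupted pace of 250 words per minute.
# Word counts are the approximate figures quoted in this post.

WORDS_PER_MINUTE = 250  # assumed reading speed, not from any source

corpora = {
    "Queen Victoria's journals": 60_000_000,
    "1611 Authorized Version of the Bible": 785_000,
    "Harry Potter series (7 books)": 1_100_000,
}

def reading_hours(word_count: int, wpm: int = WORDS_PER_MINUTE) -> float:
    """Return the hours of nonstop reading needed for word_count words."""
    return word_count / wpm / 60

for title, words in corpora.items():
    # Round to whole hours and add thousands separators.
    print(f"{title}: {reading_hours(words):,.0f} hours")
```

At that assumed pace the script prints 4,000 hours for the journals, 52 for the Bible, and 73 for the whole Harry Potter series: the journals alone would eat up roughly a hundred forty-hour work weeks.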


All of this underscores the practical value of machine-powered distant reading. Take the example above, in which the Google Ngram Viewer charts, for each year from 1800 onward, how often the phrases “social equality” and “racial equality” appear in its corpus of books, expressed as a percentage of all two-word phrases from that year. From 1920 onward, the relationship between these two distinct ideas is unmistakable. It is also super stimulating to consider the theoretical explanations for the peaks and troughs. Intuitively, the cataclysm of World War II seems to have played a role in a dramatic increase in discussion of social and racial equality. The trough between 1945 and 1953, followed by a wave that peaked in 1972, matches the grand narrative of the Short Civil Rights Movement almost perfectly.
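For readers curious about what a chart like this measures, here is a minimal sketch of the kind of calculation behind it. For each year, it counts every two-word phrase (bigram) in that year's texts and reports the target phrase's share as a percentage. The toy corpus and the simple lowercase, letters-only tokenizer are my own hypothetical stand-ins; Google's actual corpus, tokenization, and pipeline are far more involved.

```python
import re
from collections import Counter

# HYPOTHETICAL toy corpus: publication year -> list of texts.
toy_corpus = {
    1920: ["The question of social equality was debated",
           "racial equality and social equality"],
    1950: ["a call for racial equality",
           "postwar housing and jobs"],
}

def bigrams(text):
    """Lowercase, keep runs of letters as words (a simplification), pair neighbors."""
    words = re.findall(r"[a-z]+", text.lower())
    return list(zip(words, words[1:]))

def relative_frequency(corpus, phrase):
    """Per year, return the phrase's share of all bigrams, as a percentage."""
    target = tuple(phrase.lower().split())
    shares = {}
    for year, texts in corpus.items():
        counts = Counter()
        for text in texts:
            counts.update(bigrams(text))  # tally every bigram in this year
        total = sum(counts.values())
        shares[year] = 100 * counts[target] / total if total else 0.0
    return shares

for phrase in ("social equality", "racial equality"):
    print(phrase, relative_frequency(toy_corpus, phrase))
```

Dividing by all bigrams, rather than reporting raw counts, is what keeps years with far more published books from dominating the chart; in this toy data "social equality" makes up 20% of the 1920 bigrams and none of the 1950 ones.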

Data mining is super valuable, but at present its immediately recognizable uses are limited to heuristics (a way to discover lines of research) and stimulating visualization (a way to illustrate a point). I’m a traditionalist and have been involved in critical scholarship for more than a decade, so the following is going to sound, well, overly critical. That said, I want to declare that I feel extremely glad to be a student of history at this moment in history, because we are in the process of significantly increasing our power to make sense of the past. Still, fairness requires that some issues be acknowledged.

As a begrudging student of Jacques Derrida and Michel Foucault, I must point out the complexity of language. Perhaps the best illustration of this is the simple fact that I am treating a topic that has been discussed by tens of thousands, if not millions, of people in the English-speaking world, and yet every single sentence in this blog post is unique: not one of them, placed within quotation marks, can be found using Google, except on this page once the algorithmic spiders have found it. So, to take the example above, social and racial equality may be expressed in innumerable ways; circumspect language, synonyms, ironic expression, and slang all distort the measurement of the ideas represented by the two phrases I searched for. Have Americans thought and written less and less about liberty since 1800, or have they simply preferred to call it freedom?
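One partial remedy for the liberty-or-freedom problem is to measure a concept through a researcher-chosen set of words rather than a single term. The sketch below does that for a hypothetical toy corpus; the synonym set and the texts are my own illustrative assumptions, and no word list, however rich, can catch irony, circumlocution, or slang.

```python
import re
from collections import Counter

# HYPOTHETICAL synonym set chosen by the researcher; a real study would
# need a much richer, historically informed list.
LIBERTY_WORDS = {"liberty", "freedom"}

# HYPOTHETICAL toy corpus: year -> list of texts.
toy_corpus = {
    1800: ["Liberty and liberty again", "give me liberty"],
    1950: ["freedom of speech", "liberty and freedom"],
}

def count_concept(corpus, vocabulary):
    """Per year, return (counts of each vocabulary word, total for the concept)."""
    results = {}
    for year, texts in corpus.items():
        counts = Counter()
        for text in texts:
            for word in re.findall(r"[a-z]+", text.lower()):
                if word in vocabulary:
                    counts[word] += 1
        results[year] = (counts, sum(counts.values()))
    return results

for year, (counts, total) in count_concept(toy_corpus, LIBERTY_WORDS).items():
    print(year, dict(counts), "concept total:", total)
```

On this toy data, "liberty" falls from 3 mentions to 1 while the combined concept holds steady at 3, which is exactly the ambiguity a single-word chart cannot see.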




Optical character recognition will get better, but undoubtedly whole wars have been missed because a computer thought they were wans. I’m honestly not too worried about that, because the distortions are probably statistically uniform across our various samples. My biggest concern is with the samples themselves.

Google proudly declares that it has scanned more than 25 million books. Before Ngram analysis becomes a source of reliable insights for me, I need to know more about these 25 million books. How does the high proportion of scientific journals, which many humanists have raised concerns about, affect the sample? Is the collection geographically concentrated in particular areas beyond what population density alone would explain? All of these are major potential issues.

Distant reading has already empowered us, and will continue to empower us, to understand our past, particularly the past century, but we need to keep the insights of the linguistic turn in mind as we realize that potential.

Wednesday, September 11, 2019

Digital History Questions

Just to give my many as-yet-only-theoretical readers a little context, I am currently taking a course called Digital Tools for Historians, and in this second week we are continuing to reflect on theory. Dr. French has asked that we consider five big questions in light of eight articles spanning the past twenty years of history as an academic field. Below are my admittedly inartful reflections on these questions.

What is Digital History?

Douglas Seefeldt and William G. Thomas answered this question directly and succinctly just over a decade ago (and counting: take a glance at the timestamp of this blog, historians of the future). It is "an approach to examining and representing the past that works with the new communication technologies of the computer, the internet network, and software systems." The Big Question, to my mind, is whether digital history is best described as a new set of tools useful for an increasing percentage of historians or as a fundamental transformation of what historians do. Even allowing for laughable hyperbole like Louis Rossetto of Wired Magazine heralding "social changes so profound that their only parallel is probably the discovery of fire," I actually lean toward fundamental transformation rather than powerful new tools. Yes, historians both before and after the digital revolution seek to understand the past, but what understanding means seems to have changed. This is a gross oversimplification, but history's focus on explanation has been replaced in digital history with a concern for visualization.

The sensationally obtuse economic analysis of Time on the Cross forty-five years ago provides a nifty cautionary tale for historians too much in the grip of Charles Beard's insights. However, we have not yet repented of our preoccupation with quantification. A good friend drolly remarked that social scientists cannot resist taking two unquantifiable aspects of human life (joy, honor, beauty, etc.) and assigning them to the x- and y-axes of a graph. Digital historians do this less than psychologists, but we still do it. More often we report on the frequency of sets of words and then sheepishly acknowledge that we can only infer possible explanations of the significance of word use.

How does 21st century Digital History theory/practice differ from earlier applications of computer technology to historical research, such as the data-driven quantitative history (“cliometrics”) of the 1970s?

As we close out the second decade of the twenty-first century, it still seems as though digital history is in something of a pubescent stage. In brief, we are still swimming around in data, unsure of the best tools to make sense of it. My tone has been a bit too sour up to this point, so I'm going to pivot and say that the past twenty years have clearly shown growth, and a lot of good history has been done digitally. It's just that we're still waiting for that hockey-stick exponential rise on the line graph of powerful digital contributions, and I might be getting somewhat impatient.

To answer the header question directly, current digital history is far more focused on experience and scholarly networking than on turning everything into numbers, though, as indicated above, I think we still have further to go before we are no longer guilty of overemphasizing measurement and sample size. I think we are also, more wisely, trying to make as much source material as possible available to one another for conversation and collaboration, rather than just using the new tools for ourselves and referring readers to an appendix for our data and methodology.

How does Digital History differ from Digital Humanities?
My analysis here is probably going to be pretty facile, but digital history is a subset of digital humanities that is concerned with understanding the human past rather than human nature most broadly. For that reason, we tend to be more staid and conservative in our use of digital tools than other digital humanists, and we have adopted tools that fill gaps within our traditional paradigm, e.g., Geographic Information Systems (generally, "where on earth did this happen" matters much more to a historian than to a sociologist).

What are the promises/perils of doing Digital History?
The promise of Digital History is the ability to reach a wider audience than ever before, drawing on a greater number and diversity of sources than ever before, with a product that appeals more directly to that audience's senses. Hyperlinking lets us make connections with far less work, storage is easier and information more easily retrieved than ever before, and every sign suggests that both will only get easier.

To my mind, the most significant peril is what has happened to performance art, journalism, and photography: the barrier to entry is now so low that supply is overwhelming demand. Just like Broadway, where you can't make a living, only a killing, the economy within the academy is under severe strain, and readjustment seems as vital as it is inevitable. Historians also fret over how much digital documentation is being lost to the maw of archaic operating systems and a lack of backward compatibility, but I'm uncompelled. We already get more data every day than anyone could review in a hundred lifetimes, so I think future historians will be okay.

Can we make Digital History, as a field, more inclusive?
Professors Sharon Leon and Sheila Brennan argue pretty implacably that yes, we can do a great deal better, with Leon diagnosing and Brennan pointing to tools for a cure. While I found Leon's work a tad uncharitable to her field, given that the second wave of feminism crashed less than two generations ago, it was nevertheless disturbing to read of so many important female contributions that have been and continue to be marginalized. My big gripe is that Brennan and Leon outline a lot of the smoke and just a bit of the fire of the injustice that women and other groups have suffered at the hands of the white male academy, but the wood is out of frame. Why are misogyny, racism, and cultural elitism such barnacles on the human heart and mind? If I have learned anything in the past decade, it is that (self-)awareness can never be taken for granted.

Thursday, September 5, 2019

Open Access and Scholars' Costs of Living


[Since I'm reflecting on copyright law, I will not include any of the images from the surveillance footage of Aaron Swartz entering the MIT wiring closet--you can Google it and imagine it here]

In September of 2010, a software technician working for the digital library JSTOR noticed something highly unusual: more than 200,000 separate “sessions” downloading academic articles from their server, all coming from the MIT library. Three months later, the ‘hacktivist’ Aaron Swartz was arrested for breaking and entering. Swartz had used a connection within a wiring closet to download the articles to his laptop. These charges were later dropped when authorities discovered that 1) Swartz was a Harvard research fellow entitled to access the server and 2) though the closet was supposed to have controlled access, it had been unlocked. However, he was charged with the following federal crimes: wire fraud, computer fraud, unlawfully obtaining information from a protected computer, recklessly damaging a protected computer, aiding and abetting, and criminal forfeiture—crimes with a maximum possible sentence of fifty years. He refused a plea bargain that would have entailed six months in a federal prison, and on January 11, 2013, the 26-year-old hanged himself before his trial could begin.

The story of Aaron Swartz is a tragedy, one that illustrates in the starkest terms the tension explored in chapter nine of Digital Humanities: A Primer: the altruistic impulse to share knowledge as freely as possible, and the need for the scholars working to acquire that knowledge to make a living. Part of Swartz’s legacy is greater attention to the gravity of this problem. There is much in this knot: the commodification of knowledge, the value of the role of gatekeepers and those who maintain digital infrastructure, the question of scholarly independence from the source of their funding, and what just compensation for work even means, just to name a few.

As an educator who has been immersed in economics for more than a year now as part of my job, I was bemused by this line: “[M]ost scholars see their work’s value in hiring, tenure and promotion (HTP) terms, not in terms of the commercial marketplace, and they are quite willing to distribute their work as freely as possible” (157). In brief, digital humanists must consider the big picture (the context, to use a favorite word of historians) if they hope to continue buying groceries and paying their bills through their scholarship. The following passage succinctly summarizes the current moment, exactly as a primer should:

The late twentieth century therefore inherited two vibrant models of access to information and knowledge: one the patronage model sustained by individual, institutional or governmental resources and the other the commercial model built on the ability to produce vast amounts of inexpensive print for the broadest market possible. (159)

Space requires a bit of a leap, so I will just say that for me it boils down to this: scholars need to consider who we want to write for. Is it the general public or a specialized audience? In almost any topic, there are elementary concepts that will be necessary for the general public but tedious for specialists (they already know all this), and highly technical questions that specialists will find stimulating and the general public will find somewhere between boring and intolerable. From that observation, it strikes me that the most efficient division of labor requires honest soul-searching from each scholar. If I am not just fluent in but enjoy the most esoteric and complex aspects of my topic, I should explore those areas in writing intended for fellow specialists. If I respect these issues but find the language difficult and/or my interest wanes as the analysis gets more abstract, I should write for a popular audience. Both kinds of scholars contribute something valuable, and both should write with a mind to bridging the gap between the two groups. ‘Minding the gap’ will make the expert’s writing clearer and more accessible, and the popular writer’s writing more accurate.

None of the above solves the problem of open access and vocational compensation, but it is part of the solution. Unless a new model is forged, experts will need to be content with their writing being hidden behind paywalls of various kinds, with institutions like JSTOR taking a beefy cut, and popular writers will need to be content with not being featured in the most prestigious journals.

Sources:

Dean, John W. “Dealing With Aaron Swartz in the Nixonian Tradition: Overzealous Overcharging Leads to a Tragic Result.” Verdict Comments, 14 Mar. 2018, verdict.justia.com/2013/01/25/dealing-with-aaron-swartz-in-the-nixonian-tradition.

MacFarquhar, Larissa. “The Darker Side of Aaron Swartz.” The New Yorker, 10 July 2019, www.newyorker.com/magazine/2013/03/11/requiem-for-a-dream.