From a Distance, What do my bankers and their departments look like?

My interest in exploring digital history was, in fact, prompted by listening to various digital scholars talk about what might be gained from “distant” reading. What patterns or connections might be turned up through data mining that I would not (or could not) see simply by reading the articles myself? Over the time that I have been reading about women’s departments–particularly in promotional literature for banks–I have formed some impressions. Would those working premises be borne out if I were to look at the materials more systematically? For instance, it has seemed to me that commercial banks, especially in the 1920s, that presented themselves as modern usually had three features: 1. an impressive vault; 2. an effective ventilation system; and 3. a woman’s department. While I can imagine coding to determine whether that impression is accurate, I am also wondering what other elements I might have missed. And, of course, if I can confirm these elements (or identify others), the question remains–what connection might there be among them?

To be continued.

Source: From a Distance, What do my bankers and their departments look like?

Text Mining

Yesterday’s work with text mining was both helpful and daunting. I know what I want to do with it–mine deeply into documents from both the liberal and conservative politicians I am studying and look for any themes that would not be apparent from the close reading of individual documents–but my biggest challenge is finding an overlap between the right documents and documents that can be turned into text files. I have figured out how to convert the Public Papers of the President into text files to load into Voyant, but that will only get me so far and covers just one side of the equation. This requires more thought about sources and the questions to ask of my topic. I’m intrigued by the possibilities but not sure how I will get there or what I will find. In a perfect world I will be able to discern how and when rhetoric about “liberalism,” “conservatism,” the “federal government,” the “welfare state,” and other ideologically contested terms and concepts shifted.
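Since the conversion step is the bottleneck here, it may help to sketch how documents that arrive as HTML can be stripped down to plain text for a tool like Voyant. This is a minimal sketch using only Python’s standard library; the sample HTML and the `TextExtractor` name are my own illustrations, not any particular edition of the Public Papers.

```python
# A minimal sketch: reduce an HTML page to plain text suitable for
# loading into a text-mining tool such as Voyant.
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text content of an HTML document, ignoring tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def text(self):
        # Join fragments and normalize runs of whitespace.
        return " ".join(" ".join(self.chunks).split())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return parser.text()


if __name__ == "__main__":
    sample = "<html><body><h1>Address</h1><p>My fellow citizens...</p></body></html>"
    print(html_to_text(sample))
```

Run over a folder of downloaded pages, something like this would produce the .txt files Voyant expects; real pages would also need their navigation menus and footers trimmed away.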

Source: Text Mining

Text Mining

The data sets I’m working with in my current book project do not seem big enough to benefit from the text mining processes we discussed today, powerful as those techniques may be.  I’m in the late stages of the current project, however, and today’s session led me to start thinking about subjects in my field that might be better suited for text-mining tools.

Cherokee Freedman Enrollment Card

It strikes me, for instance, that the enrollment records compiled for Native communities facing allotment in the late nineteenth and early twentieth centuries might be worthy of text-mining and visualization.  These records are vast in number, cover a huge geographical area, and adhere to certain bureaucratic forms.  Some have already been digitized, although not (as far as I know) in machine-readable form.  I suspect that these records, with the right processing, would make very good fodder for the tools we discussed today.  I’m not sure what distant reading of these materials would yield, but the idea may warrant further exploration.

Source: Text Mining

Distant reading. . .

Task:  Write a short post, considering how distant reading might apply to your individual projects.

Right now I don’t see distant reading connecting to my churchscape project as it currently stands.  I see potential for distant reading in other work I’m doing, but I am painfully aware that I don’t have good usable data with which to work.  That’s my current stumbling block.  Existing bodies of digital materials in good shape for text mining don’t relate very directly to my research interests.  And the task of creating such bodies of material on my own seems quite daunting.  Perhaps it would be more feasible as a collaborative effort with other scholars in my field?

Data mining

Although I cannot think of how I might use data mining in my current projects, it could be useful in the future. I am impressed with Overview, which would be especially useful in working with large corpora of digital documents. As government documents become increasingly digitized–emails replacing typewritten memoranda, for example–data mining tools will become more helpful and even necessary.

Data Mining

My sources are not digitized, so I would not undertake a large scale data mining project. I will likely use data mining as an exploratory tool for my research. I might use keyword searching to assess how use of words like “citizen” or “nation” changed over time. In addition, I might compare how Americans wrote about African Americans, Native Americans, and German Americans. I see more applications for data mining in my teaching. Explorations of data mining tools provide excellent opportunities for students to see new conclusions in historical data and to assess the possibilities of digital history.

Icebox v. refrigerator.

Our DH seminar homework for tonight is to write a brief blog post considering how we might use text mining in our upcoming digital history projects. Unfortunately for me, a project about an underwater mining town doesn’t seem particularly text-mining friendly.  Don’t get me wrong: I found, for example, this particular tool to be potentially quite useful as a way to get control of my growing corpus of Harvey Wiley literature.  However, from my perspective, text mining is probably the least useful DH strategy that I’ve encountered here in the last week and a half or so.

The one time I did do some text mining may suggest why.  This is the Google Ngram for “refrigerator v. icebox” (with “ice box” thrown in just for good measure)*:

Refrigerator vs. icebox

I first did this Ngram while writing Chapter Six of Refrigeration Nation in order to confirm something that I already knew from my research: that before the advent of the electric refrigerator, what we now know as “iceboxes” were called “refrigerators,” and that “icebox” is a term invented to differentiate boxes full of ice from the appliance that now runs in everybody’s kitchen. The fact that the terms “icebox” and “ice box” come out of nowhere precisely during the period when the first electric refrigerators were being developed basically confirms that point.
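The same kind of comparison the Ngram Viewer performs can be approximated on one’s own corpus. Here is a minimal sketch, assuming a list of (year, text) pairs; the tiny sample “corpus” below is invented purely for illustration, not drawn from Google Books.

```python
# A hedged sketch of an Ngram-style comparison: relative frequency of a
# term, year by year, across a corpus of (year, text) pairs.
from collections import Counter


def term_frequency_by_year(corpus, term):
    """Return {year: occurrences of `term` per word} for each year."""
    freqs = {}
    for year, text in corpus:
        words = text.lower().split()
        counts = Counter(words)
        freqs[year] = counts[term] / len(words) if words else 0.0
    return freqs


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    corpus = [
        (1890, "the refrigerator was packed with ice"),
        (1920, "the icebox and the electric refrigerator"),
    ]
    print(term_frequency_by_year(corpus, "icebox"))
```

Plotting those per-year frequencies for “refrigerator,” “icebox,” and “ice box” would reproduce the shape of the Ngram chart, with the advantage that the corpus is fixed and citable.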

Apparently, confirming things you already think you know is the best way to use text mining. I think that’s a good thing, as I’m not sure how I ever would have footnoted this in the book. In fact, how COULD you footnote this in a book if the corpus keeps changing?

But it is a pretty good trick to play with students that the cultural historians probably adore.

* Click the picture if you’re interested in a clear look.

Source: Icebox v. refrigerator.

Text Mining “Interchange: The Promise of Digital History”

One of the readings assigned to participants during the Doing Digital History Institute was the 2008 JAH article, “Interchange: The Promise of Digital History.” The “Interchange” sought to explore the burgeoning field of digital history, tackling questions of definition, pedagogy, forms of institutional support, possible effects on the meaning and process of historical research, and the resonance digital history might have with various publics who might encounter it.

Today, Fred Gibbs introduced us to the concepts of data and text mining, and so I decided to see if I could apply what I learned to the JAH article. Would interesting patterns emerge from the various interviews that appeared? My initial work focused on converting the article into a plain text (.txt) file. I then divided it into a variety of smaller files: the questions posed by the JAH editor, all of the responses offered by each individual participant, and each question accompanied by its related set of answers (and here I wish I knew how to automate this process instead of cutting and pasting for an hour). In the end, I had one large question file, eight participant files, and nine individual question files, as well as the original .txt file. I then ran different computations through Voyant Tools.
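The hour of cutting and pasting could, in principle, be automated. Here is a hedged sketch that splits a plain-text transcript into one block per speaker, assuming each contribution begins with a line holding just the speaker’s name or “JAH”; the marker format and sample text are my own illustration, not the article’s actual layout.

```python
# A sketch of automating the file-splitting step: group a transcript's
# lines by speaker, where a line consisting solely of a known name
# introduces that speaker's next contribution.
from collections import defaultdict


def split_by_speaker(text, speakers):
    """Return {speaker: concatenated contributions} for a transcript."""
    contributions = defaultdict(list)
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in speakers:
            current = stripped  # a new speaker's block begins
        elif current is not None and stripped:
            contributions[current].append(stripped)
    return {name: " ".join(parts) for name, parts in contributions.items()}


if __name__ == "__main__":
    # Invented sample transcript, for illustration only.
    sample = "JAH\nHow do you define digital history?\nCohen\nDigital history is...\nJAH\nAnd pedagogy?\nThomas\nIn the classroom..."
    parts = split_by_speaker(sample, {"JAH", "Cohen", "Thomas"})
    for name, contribution in parts.items():
        print(name, "->", contribution)
```

Writing each value out to its own .txt file would then yield the per-participant files in one pass rather than an hour of pasting.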

Caveat: it is important to note that I have no idea how this interview was edited. I am assuming that the final printed comments reflected the overall contributions of each of the interviewees, but I most certainly cannot be sure.

Overall, one can see the general emphasis of the article through a simple word cloud:

The cloud specifically excludes common English words, as well as other common, but probably less helpful, words: digital, history, historians, historical, each of the authors’ names (which were used to signal the start of each of their contributions in the article), and the word “JAH,” which was used before each question. We are left with the following twenty most frequently used words:

Certainly a couple of themes begin to emerge from this basic analysis. First, the emphasis digital historians placed on the field being “new” is clearly apparent, especially as “new” is frequently followed by “media” or “digital technology” in the article. Given the article’s goal of identifying and describing digital history as a new enterprise historians were embarking upon, this may not be surprising; however, the strong use of the word does supply evidence for Fred Gibbs’ point today about the somewhat overstated dichotomy between “traditional” (textually-based) history and “new” (digitally-based) history.
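The stop-list filtering behind the word cloud can be sketched in a few lines. The stop list below only illustrates the idea; the actual exclusions were made inside Voyant, and the sample sentence is invented.

```python
# A rough sketch of word-cloud filtering: count word frequencies while
# excluding a stop list of common or unhelpful words.
import re
from collections import Counter

# Illustrative stop list: common English words plus the article-specific
# exclusions described above.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is",
             "digital", "history", "historians", "historical", "jah"}


def top_words(text, n=20, stopwords=STOPWORDS):
    """Return the n most frequent words in `text`, ignoring stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)


if __name__ == "__main__":
    sample = "New media and new research: digital history is new scholarship."
    print(top_words(sample, n=3))
```

Feeding the full .txt file of the article through something like this would yield the twenty-word list that the cloud visualizes.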

The interviewees also signaled a strong interest in thinking about “research” and “scholarship,” both of which appear more frequently than the word “student.” What might be even more interesting is the way that “research” and “scholarship” appear throughout the article, whereas “student” is mainly concentrated in the early questions on pedagogy:

Yet, despite the importance of words like “new”, “research”, and “scholarship” in the printed discussion, it is also worth noting how similarly the remaining words appear in frequency. In fact, 86% of the top fifty words in the article fall within one standard deviation of the mean (85% if the top three results are excluded). Thus participants appear to have been equally interested in most of the topics covered in the article.
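The “within one standard deviation” figure is simple to compute once the frequencies are extracted. A minimal sketch, with an invented frequency list standing in for the article’s actual top-fifty counts:

```python
# A sketch of the dispersion statistic above: what fraction of word
# frequencies lie within one (sample) standard deviation of the mean?
from statistics import mean, stdev


def share_within_one_sd(freqs):
    """Fraction of values within one standard deviation of the mean."""
    m, s = mean(freqs), stdev(freqs)
    inside = [f for f in freqs if abs(f - m) <= s]
    return len(inside) / len(freqs)


if __name__ == "__main__":
    # Invented frequencies, for illustration only: three dominant words,
    # then a fairly flat tail.
    sample = [40, 35, 33, 12, 11, 10, 10, 9, 9, 8]
    print(share_within_one_sd(sample))
```

A high share, as in the article’s 86%, indicates a flat distribution: beyond the few dominant words, most terms occur at roughly similar rates.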

If we examine the responses by interviewee, though, we do see some interesting differences begin to emerge. First, it is worth noting that each interviewee is not represented equally in the interview:

43% of all the text is supplied by two individuals. Consequently, it might be useful to see whether Cohen and Thomas had a particular effect on the overall pattern of words in the article.

A graph of the seven most common words broken down by interviewee reveals some important trends:

First, Cohen’s responses overwhelmingly focused on “research”, “new”, “web”, “scholarship,” and “work.” Given his position as Director of the Roy Rosenzweig Center for History and New Media at the time of the interview, this is probably not very surprising. “Medium” and “scholarship” also appear quite frequently in Thomas’ interview, which given his role in the Valley of the Shadow project should also not be too surprising.

Fewer clear trends appear when these seven highly ranked words are analyzed by question:

In the end, this post is mainly an experiment to see if I could use the tools we were taught, but the data does allow for some broad conclusions to be drawn. Overall, it seems that the interview and interviewees were mainly concerned with thinking about the “newness” of digital history in 2008 – figuring out what it might mean particularly for scholarship, though with a reasonably strong emphasis on pedagogy. It is worth noting that certain topics that have dominated the 2014 Institute discussion, such as the place of public history and museums within and around digital history, are present, but are much lower in the list of frequently used 2008 words. Moreover, “questions”, “methods”, and “process” are also quite low in the list, possibly indicating some uncertainty about these topics six years ago.

For simple comparison, one can find a Wordle compiled by Spencer Roberts of participants’ blog posts, and one can see a much stronger emphasis placed on “students”, “project,” “comments”, and “sources.” Whether this change signals a shift in the DH conversation, results from who the participants of the Institute are (mainly from Master’s-granting degree programs, instead of larger research universities), or arises from the structure of the Institute is beyond the goals of this overly long post.


Source: Text Mining “Interchange: The Promise of Digital History”

Distant Reading

Today we learned about several interesting ways to analyze texts. In my own research, the best use I have considered so far might be culling city guidebooks for certain phrases or terms. Part of my project has to do with tracing the ways in which the antebellum urban South has been remembered. So I find myself perusing guidebooks looking, for instance, for references to African Americans or to slavery. Being able to do that sort of scan across a broader number of sources could be useful, were I able to sort out the technical aspects.

Distant Reading

During the two weeks of Doing Digital History, I have found some concepts more or less easy to assimilate into my work as a public historian and public history educator. I felt competent and confident when establishing a domain, playing with WordPress, experimenting with Omeka, dabbling with some tools for annotating images, and animating brief stories. I am less comfortable with mapping, though I am beginning to recognize the ways in which some simple tools –like StoryMap– might be immediately useful to my students. Starting simple will also serve as a point of entry for me, allowing me to work my way toward more complex mapping projects. I feel most tentative about text mining and distant reading.  I’m still not sure I recognize their potential for my own research, and I suspect this is the digital realm I am least likely to put to use in the immediate future.

That said, I may play with Voyant and Overview in my fall public history practicum course. After playing with the technology a bit, I understand that text mining can help my students identify interpretive pathways for a public digital project about slavery and freedom on the border between Maryland and Pennsylvania. Mapping patterns of word use and syntax will encourage students to think more critically about the different uses of words in private contexts and in legal contexts, about the ways in which word use and meanings changed across state boundaries, and about the words chosen by free people to describe the experience of freedom in and near a border state. Encouraging my students to play will, I think, help me understand text mining and its value for research and analysis.

Source: Distant Reading