Processing Texts: Making Documents Machine Readable

As planned, I have spent the greater part of the day sorting the approximately 2,000 freedom petition photographs I took at the National Archives into a coherent filing system, organized by term, category of filing, and case number, and recording the image numbers in the spreadsheet I maintained while photographing. I think I am a little over half done with this process.
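For anyone facing a similar pile of archive photographs, the sorting step could in principle be scripted. What follows is only a minimal sketch, not the workflow I actually used: it assumes the spreadsheet is exported as CSV with hypothetical columns `term`, `category`, `case_number`, `first_image`, and `last_image`, and that the camera names files `IMG_0001.JPG` and so on. The sample rows are invented for illustration.

```python
import csv
import io
from pathlib import PurePosixPath


def plan_moves(log_csv, root="petitions"):
    """Map each logged image to a term/category/case_number folder.

    Returns (file name, destination path) pairs and touches no files,
    so the plan can be checked before anything is moved.
    """
    moves = []
    for row in csv.DictReader(io.StringIO(log_csv)):
        folder = (PurePosixPath(root) / row["term"]
                  / row["category"] / row["case_number"])
        first, last = int(row["first_image"]), int(row["last_image"])
        for number in range(first, last + 1):
            # Assumed camera naming scheme: IMG_ plus a four-digit counter.
            name = f"IMG_{number:04d}.JPG"
            moves.append((name, str(folder / name)))
    return moves


# Invented sample rows; real column names and values will differ.
SAMPLE_LOG = """term,category,case_number,first_image,last_image
1810-06,petition,case-012,1,3
1810-06,summons,case-012,4,4
"""

for source, destination in plan_moves(SAMPLE_LOG):
    print(source, "->", destination)
```

Keeping the planning step separate from the actual file moves means the list can be compared against the spreadsheet first; the pairs could then be handed to something like `shutil.move` once they look right.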

Though I still have about an hour and a half left to dedicate to the D.C. courts project today, I am turning my attention to my other project, Locating Lord Greystoke. Right now we are in the process of building two corpora of texts: one that is large and inclusive and will be used in our text analysis efforts, and a second, smaller one of key documents that will be featured on the project’s website.

The document that I am working with now has been reviewed by the project leader, historian Jeannette Jones, who has pulled out selected passages from the text and made note of the people, places, and concepts she wants to be called out on the website. An undergraduate student also working on the project has already run the document through an OCR program, the output of which I will mark up in TEI. The notes on the document prepared by Dr. Jones indicate what will make it into the <profileDesc> tag in the TEI header, which items she wants to appear in the site’s Encyclopedia and thus need to be encoded in the text, and which places are going to appear as mapping points for this particular document. At the moment, the website’s documents are indexed in Solr and transformed by Cocoon, but we are looking into migrating over to a different framework in the very near future.
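To give a sense of what encoding people and places in the text means in practice, here is a minimal sketch, assuming a simple lookup of phrases to TEI element names. It is not the project's actual process (I do the markup by hand in an XML editor), and the function name `tag_entities`, the sample sentence, and the names in it are all invented for illustration. `persName` and `placeName` are standard TEI elements.

```python
import re
from xml.sax.saxutils import escape


def tag_entities(text, entities):
    """Wrap each listed phrase in its TEI element and escape the rest."""
    if not entities:
        return escape(text)
    # Longest phrases first so a short name cannot split a longer one.
    phrases = sorted(entities, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(p) for p in phrases))
    pieces, position = [], 0
    for match in pattern.finditer(text):
        # Plain text between matches is escaped for XML (&, <, >).
        pieces.append(escape(text[position:match.start()]))
        element = entities[match.group(0)]
        pieces.append(f"<{element}>{escape(match.group(0))}</{element}>")
        position = match.end()
    pieces.append(escape(text[position:]))
    return "".join(pieces)


# Invented example: a phrase-to-element lookup and a made-up sentence.
ENTITIES = {"Jane Doe": "persName", "Exampleton": "placeName"}
print(tag_entities("Jane Doe wrote from Exampleton & returned.", ENTITIES))
```

A lookup like this only catches exact spellings, and OCR output is rarely that tidy, so the result would still need a careful human pass against the historian's notes.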

You can view a draft of this process in action at the project’s website, where we have set up a proof of concept using minimal documents and our first pass at the project’s mapping interface.

A look at my screen: Dr. Jones’ notes; Oxygen, which I use to encode the XML document; and the Google Spreadsheet that is serving as a working bibliography of our project documents.

Managing Collaboration

One of the projects that I am working on is a collaborative effort with scholars at another university: Jennifer Guiliano and Trevor Muñoz of the Maryland Institute for Technology in the Humanities at the University of Maryland. Because there are over a dozen people (located in various places) involved in the project, we have turned to the project management tool Basecamp to help organize our discussions and work plans.

Basecamp’s Interface

Thus far it has proven to be a valuable tool to keep project participants on the same page, though I am definitely interested in methods or tools used by fellow project managers in the Digital Humanities.

A Day of DH Begins. Or, How Every Day is DH Day.

Today I step off the sidelines and into the game. After observing last year’s #DayOfDH from afar and writing about it on my own blog, I decided to take part in an official capacity this year.

I am a historian by training, and while I do work in an academic setting, my career path is very much alt-ac. Upon graduating with my master’s degree from the University of Nebraska-Lincoln in 2012, I was lucky enough to be brought onto a digital history project by William G. Thomas III examining the history and implications of petitions for freedom filed by an enslaved woman, Mima Queen, and various other members of her family. The project has since been funded by a Collaborative Research Grant from the National Endowment for the Humanities and has evolved into a larger project looking into the family networks and legal questions found in the records of the Circuit Court for the District of Columbia in the early- to mid-nineteenth century. About a year later, I was also brought in by Jeannette Jones as project manager for her work, Locating Lord Greystoke: U.S. Empire, Race, and the African Question, 1847-1919. The project uses mapping and textual analysis to explore the ways in which the United States understood and participated in the partition of Africa by European powers.

My day-to-day work for each project differs, but usually involves a bit of document transcription, TEI encoding, e-mail exchanges or brief meetings with project members, and if I’m lucky, a bit of web design. I do not have an office of my own, so I rotate between the computer lab in the History Department (where the faculty members I work for are) and the Center for Digital Research in the Humanities at the library (where the projects’ tech staff is and where my paychecks come from).

My workspace in the History Department.

Today I will be working in the History Department, where most of my time will be devoted to finishing up the organization of the freedom petition photographs I took on a research trip to the National Archives last week. I also need to finish preparing for the Mapping Lab I will be leading tomorrow at the DH Bootcamp @ UNL. Later this week, I also have the Nebraska Forum on Digital Humanities 2014 to attend. Quite a week ahead for DH.

Today’s workspace. All the photos are on my MacBook, which looks hilariously tiny next to the iMac I usually use.