I’m getting a late start with my blogging. An early-morning meeting ran until lunchtime, and, after a yummy barbecue lunch (a pulled pork sandwich), I’m trying to catch up.
More specifically, I’m at California State University, Fullerton, today for a meeting about an ongoing project to do “text mining” on a corpus of Classical Chinese texts. I put “text mining” in inverted commas because we’re really using a wide range of computational methodologies; today’s meeting was mostly about topic modelling. Chinese poses a number of challenges for topic modelling, not the least of which is that Mallet doesn’t process Chinese characters. I think I’ve found a way around that problem, so the main issues were how to set parameters for experiments on the individual texts in the corpus. The texts range from 350 to 150,000 words, and we have yet to identify a sweet spot for the number of topics. Still, there are some promising results, and I’m encouraged by the possibilities.
But now I have to switch gears, as I promised myself that I’d work on my main project, the Archive of Early Middle English. Last night I finished TEI-tagging a (relatively) lengthy text, The Infancy of Christ, from Oxford, Bodleian Library, Laud Misc. 108. I began my markup while our project schema was still evolving, so I now have to go back and bring my earlier practices into line with the current schema. I’m hoping I can get that done today without slipping into a pulled pork coma. If I’m lucky, I’ll also be able to fit in some administrative work.
I’ll also have to decide whether I want to do some grading during Day of DH (which also falls during Spring Break). Starbucks is calling to me.