
Meiroku zasshi 明六雑誌 project

It’s come to my attention that Fukuzawa Yukichi’s (and others’) early Meiji (1868-1912) journal, Meiroku zasshi 明六雑誌, is available online not just as PDF (which I knew about) but also as a fully tagged XML corpus from NINJAL, the National Institute for Japanese Language and Linguistics (and oh my god, it has lemmas). All right!


I recently met up with Mark Ravina at the Association for Asian Studies conference. He brought this to my attention, and we have been brainstorming about what we could do with it as a proof-of-concept project before moving on to other early Meiji documents. We have big ideas, such as training OCR to tell the katakana ニ (ni) apart from the kanji 二 (two). Meiji documents tend to break OCR for reasons like this, because they differ so much from contemporary Japanese. In some ways, it’s like asking Acrobat to handle a medieval manuscript.
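To see why a mix-up like this matters downstream, it helps to remember that the two characters look nearly identical on the page but are entirely different Unicode code points. The snippet below is only an illustration: the “OCR output” string is a made-up example, not output from any real OCR engine.

```python
import unicodedata

# Katakana ニ (ni) and kanji 二 (two): visually near-identical,
# but distinct Unicode characters.
KATAKANA_NI = "\u30cb"  # ニ
KANJI_NI = "\u4e8c"     # 二


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in (KATAKANA_NI, KANJI_NI):
    print(ch, describe(ch))

# Hypothetical OCR error: 二月 ("February") read with a katakana ニ.
# A search for the kanji silently misses it.
ocr_output = "ニ月"
print(KANJI_NI in ocr_output)  # False
```

Any search, frequency count, or topic model run on such text will treat the misread word as a different word entirely, which is why fixing this at the OCR stage is attractive.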

But to start, we want to run the contents of Meiroku zasshi through tools like MALLET and Voyant, just to see how they handle non-Western languages (we don’t expect any problems, but we’ll see) and what we get out of them. I’d also like to go back to the Stanford CoreNLP API and see what kind of linguistic analysis we can do there. (First, I have to come up with a methodology. :O)

To do this, we need whitespace-delimited text. I’ve written about this elsewhere, but in short: Japanese is not written with spaces between words, so tools built for Western languages treat a whole run of text as one big word. I haven’t found any easy way to do this splitting. I’m working on an application that both strips ruby (reading glosses) from Aozora bunko texts AND splits words with spaces, but it’s coming along slowly. So how can we get Meiroku zasshi into this form in a quick and dirty way that lets us just play with the data?
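As a rough illustration of the ruby-stripping half of that problem (this is a hypothetical sketch, not the application mentioned above), here is a minimal version assuming the standard Aozora Bunko plain-text notation: 《…》 encloses a ruby reading, a fullwidth ｜ marks where a ruby base begins, and ［＃…］ encloses editorial annotations. It does not handle nested brackets, and it leaves behind the ※ that precedes gaiji (non-standard character) notes.

```python
import re

# Assumed Aozora Bunko plain-text conventions:
#   《…》    ruby reading that follows its base text
#   ｜      fullwidth bar marking where a ruby base starts
#   ［＃…］  editorial annotation (emphasis marks, layout notes, etc.)
RUBY = re.compile(r"《[^》]*》")
ANNOTATION = re.compile(r"［＃[^］]*］")


def strip_aozora_markup(text: str) -> str:
    """Remove ruby readings, ruby-base markers, and bracketed annotations."""
    text = RUBY.sub("", text)
    text = text.replace("｜", "")
    return ANNOTATION.sub("", text)


sample = "吾輩《わがはい》は｜猫《ねこ》である［＃「である」に傍点］。"
plain = strip_aozora_markup(sample)
print(plain)          # 吾輩は猫である。
print(plain.split())  # ['吾輩は猫である。'] : one "word", no spaces to split on
```

The second print shows the other half of the problem: even clean text comes out as a single token for any whitespace-based tool. Splitting it into words needs a morphological analyzer such as MeCab, which is beyond this sketch.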

So today after work, I’m going to use Python’s ElementTree XML library to pull the contents of the word tags out of the corpus and write them to a text file, delimited by spaces. Quick and dirty! I’ve been meaning to do this for weeks, but since it’s a “Day of DH,” I thought I’d use the occasion to motivate myself. Then, we can play.
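A minimal sketch of that plan with the standard-library `xml.etree.ElementTree` might look like the following. The element name `w` and the attribute name `lemma` are hypothetical placeholders, not the actual NINJAL schema; check the corpus documentation for the real names. If the files declare an XML namespace, the tag has to be written in ElementTree’s `{namespace-uri}w` form.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# HYPOTHETICAL names: replace with the corpus's real word element
# and lemma attribute once you have checked its schema.
WORD_TAG = "w"
LEMMA_ATTR = "lemma"


def extract_tokens(root: ET.Element, word_tag: str = WORD_TAG,
                   lemma_attr: Optional[str] = None) -> List[str]:
    """Collect one token per word element, in document order.

    If lemma_attr is given and the element carries a non-empty value for it,
    use the lemma; otherwise fall back to the element's text (surface form).
    """
    tokens = []
    for el in root.iter(word_tag):
        token = el.get(lemma_attr) if lemma_attr else None
        if not token:
            token = "".join(el.itertext()).strip()
        if token:
            tokens.append(token)
    return tokens


def xml_file_to_spaced_text(in_path: str, out_path: str, **kwargs) -> None:
    """Write the tokens of one XML file as a single space-delimited line."""
    root = ET.parse(in_path).getroot()
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(" ".join(extract_tokens(root, **kwargs)) + "\n")


# Toy markup for illustration only; not the real corpus format.
sample = ('<doc><p><w lemma="雑誌">雑誌</w><w lemma="は">は</w>'
          '<w lemma="面白い">面白く</w></p></doc>')
root = ET.fromstring(sample)
print(" ".join(extract_tokens(root)))                         # 雑誌 は 面白く
print(" ".join(extract_tokens(root, lemma_attr=LEMMA_ATTR)))  # 雑誌 は 面白い
```

Offering both surface forms and lemmas is a deliberate choice: lemmas collapse conjugated forms (面白く and 面白い count as one word), which is usually what you want for frequency counts and topic models.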

Exciting stuff, this corpus. Unfortunately most of NINJAL’s other amazing corpora are available only on CD-ROMs that work on old versions of Windows. Sigh. But I’ll work with what I’ve got.

So that’s your update from the world of Japanese text analysis.

Japanese apps workshop for new Penn students

Today, we’re having a day in the library for prospective and new Penn students who will (hopefully) join our community in the fall. As part of the library presentations, I’ve been asked to talk about Japanese mobile apps, especially for language learning.

While I don’t necessarily consider this a DH thing, some people do, and it’s one way I integrate technology into my job: through workshops and research guides on various digital resources. (More on that later.)

I gave a version of this workshop for librarians at the National Coordinating Council on Japanese Library Resources (NCC) workshop held before the Council on East Asian Libraries conference a few weeks ago, in March 2014. My focus was perhaps too basic for a savvy crowd that uses foreign languages frequently in their work: I covered the procedure for setting up international keyboards on Android and iOS devices, dictionaries, news apps, language learning assistance, and Aozora bunko readers. However, I did manage to pass on some lesser-known information: how to enable the free Japanese and other-language dictionaries built into iOS devices. I got some thanks for that one. The Aozora 2 Kindle PDF-maker also got noticed.

Today, I’ll focus more on language learning and the basics of setting up international keyboards. I’ve been surprised at the number of people who don’t know how to do this, but not everyone uses foreign languages on their devices regularly, and on top of that, not everyone loves to poke around deep in the settings of their computer or device. And keyboard switching on Android can be especially tricky, with apps like Simeji. So perhaps covering the basics is a good idea after all.

I don’t have a huge amount of contact with undergrads compared to the reference librarians here, and my workshops tend to be focused on graduate students and faculty with Japanese language skills. So I look forward to working with a new community of pre-undergrads and seeing what their needs and desires are from the library.

Good morning! Intro and some thoughts

I’m up early on this Day of DH 2014. So much to do!

I thought I’d introduce myself so you have an idea of my background. I’m not your typical DH practitioner: I’m not in the academy (at least not in the traditional way), and I don’t work with Western-language materials. My concerns don’t always apply to English-language texts or European medieval manuscripts. In Asia I’d be less remarkable, but here in the English-language DH world I don’t run across many people like me.

Anyway, good morning! I’m Molly, the Japanese Studies Librarian at the University of Pennsylvania, and I also manage the Korean collection. That means I take care of everything that has to do with Japan or Korea, or is in Japanese or Korean, at the library and beyond, from collection development to reference and instruction.


Let’s start with my background. I went to college at the University of Pittsburgh, where I studied Computer Science and History (Asian history, of course) and took Japanese for four years. At the outset I fully intended to become a software developer, but somewhere along the line I decided to apply my skills outside that traditional path, in librarianship. So off I went (after a two-year hiatus) to graduate school at the University of Michigan for a PhD in Asian studies (Japanese literature and book history) and an MSI in Library Science. Along the way, I interned at the University of Nebraska-Lincoln’s Center for Digital Research in the Humanities (CDRH), where I redesigned the website for, and rewrote part of the code of, an XSLT-based text analysis app for the Willa Cather Archive.

After Michigan, I spent a year as a postdoc at Harvard’s Reischauer Institute, working half-time on my humanities research and half-time on a digital archive (the Digital Archive of Japan’s 2011 Disasters, or JDArchive). Then, in July 2013, I took my first big step into librarianship here at Penn, and I have been happily practicing my chosen profession ever since. I’m still new, and there is a lot to learn, but I’m loving every minute.

I admit that finding ways to integrate my CS and humanities backgrounds has been a huge challenge. I was most of the way through graduate school when someone suggested I go into DH (which didn’t exactly happen; there aren’t many DH jobs out there right now other than postdocs and teaching positions). My dissertation was a close-reading-based analysis of five case studies, each a single book examined as an object and in terms of its publication and reception. It did not lend itself to any digital methodology beyond using digital archives to get hold of the books’ prefaces and keyword-searchable newspaper databases to find their advertisements and reviews. I also used a citation index that goes back to the Meiji (1868-1912) period to find sources. In fact, most of my research involved browsing physical issues of early 20th-century magazines in the basement of a library in Japan, and looking at the books themselves as well as the discourse surrounding them. I simply couldn’t think of anything to do that would be “digital.”

So my research in that area, plus what I’m working on now, has remained non-DH, although if you’re the kind of person who counts anything “new media” as DH, it may qualify a little. (I am not that person.) So why do I still call myself a DH practitioner, and why do I bother participating in the community even now?

Well, despite working full time, I’m still committed to figuring out how to apply my skills to new, more DH-style projects, even as I don’t want my traditional humanities research to die out either. It’s a balancing act. How do I find the time and energy to learn new skills, and just plain carve out space to practice the ones I already have?

I have a couple of opportunities. The first is my copious non-work free time. (Ha. Ha.) The second is my involvement in the open and focused lab sessions of Vitale II, the digital lab (okay, it’s a room with a whiteboard and a camera) at the Kislak Center for special collections in Van Pelt Library. Today I have a top-secret brainstorming session with a buddy about how we can make even more social, mental, and temporal space for topically focused DH work in the library. I’m jealous of the Literary Lab; that should speak for itself. I also ran into a fellow Japanese studies DH aspirant at the Association for Asian Studies conference a few weeks ago, and he and I are plotting together as well.

So time can be found, social connections can be made, and collaboration can take place despite all odds. But it’s still a huge challenge. I can do my DH work at 5:30 am, in the evening (when I have no brainpower left), or early on weekends. Many other things compete for my time, not least two other research articles I’m working on. I could just as easily spend any of those hours on my real work without having to explain myself.

Yet I do it. It’s because I love making things, because I love bringing my interests together and working on something that involves a different part of my brain from reading and writing. I’m excited about the strange and wonderful things that can come from experimental analysis that, even if they aren’t usable, can make me think more broadly and weirdly.

More to follow. よろしくお願いします! (Roughly: “I look forward to working with you all!”)