Day of DH 2014 has made me discover the joy of blogging, and I’ve started my own, regular blog. It’s called From Text to Tables and I’d be very happy to meet you again over there.
My Day of DH is coming to an end here in Switzerland. I’ve enjoyed taking part in this, exchanging with people sometimes far away, sometimes close by, in ways that I wouldn’t have on an ordinary day of DH. Thank you Matrix, and all of you DHers of the world!
As the reader may or may not know, my main ongoing project is Orange Textable, which is a software tool that lets the user do “visual programming” for text analysis (project homepage is here, documentation is here; it’s an add-on of Orange Canvas).
Now the reader most surely knows that today is Day of DH, and since I could spare some time to be part of this, I thought I’d give an example of what Textable lets you do on, you got it, Day of DH data. Not all the details, but still, enough to give an idea of how this Textable thing works. In particular, we’ll mine the Day of DH RSS stream and visualize the distribution of user actions (comment, join, etc.) reported therein.
It all starts with a data import widget instance (i.e. small blue circle representing a computational unit) placed on a blank design area called the canvas:
Double clicking on a widget instance opens its interface, as shown on the righthand side of the above figure. In the case of a URLs instance like this one, we essentially need to specify where to fetch data on the internet and in what encoding they’ll be.
RSS streams are encoded in XML, and this one is made up of such text blocks as this:
<item> <guid isPermaLink="false">82cb38ef79e1059723835b141f896114</guid> <title>Ranti Junus joined the group DH in the Curriculum</title> <link>http://dayofdh2014.matrix.msu.edu/activity/p/902/</link> <pubDate>Tue, 08 Apr 2014 13:12:18 +0000</pubDate> <slash:comments>0</slash:comments> </item>
To extract those items from the stream, we’ll connect our first instance to an instance of another widget, namely Segment:
Here we do need a bit of code; let’s call this the cost of flexibility. In particular, we need to know the syntax of regular expressions (aka regexes), so that we can indicate to our Segment instance that the segments we’re interested in are the text blocks contained between <item> and </item> in the RSS stream. There are 50 such segments, as we’re told.
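For readers who prefer to see the idea in plain code, here is a minimal Python sketch of the same kind of segmentation. It is not what Textable runs internally, and the exact regex typed into the Segment instance isn’t reproduced in this post, so the pattern below is an assumption, as is the tiny two-item sample that stands in for the real stream.

```python
import re

# Hypothetical, shortened stand-in for the RSS stream (the real one has 50 items).
rss = (
    "<channel>\n"
    "<item> <title>Ranti Junus joined the group DH in the Curriculum</title> </item>\n"
    "<item> <title>Someone posted an update</title> </item>\n"
    "</channel>"
)

# Assumed pattern: everything from <item> to the nearest </item>.
# The non-greedy .*? stops at the first closing tag; DOTALL lets an item span lines.
ITEM_RE = re.compile(r"<item>.*?</item>", re.DOTALL)

items = ITEM_RE.findall(rss)
print(len(items), "segments found")
```

The non-greedy quantifier matters here: a greedy `.*` would swallow everything from the first <item> to the last </item> and yield a single segment.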
The next step is arguably the most difficult one. We want to extract from each “item” the action that is being reported (in the above example, it would be joined). To do that, we’ll use another instance of Segment. This time, however, we’ll need to switch on the “advanced settings” in order to access some more powerful features…
As usual with instances of Segment, we must describe the segments that will be extracted with a regex: here we state that these segments should begin with a <title> XML tag, and shortly after, should contain one of a number of verb forms (wrote, became, posted, changed, and so on).
But there’s more to it: in each such segment, we’ll extract the verb form and use it as an annotation value associated with a specific key, namely the key action. (Note that the parameter &1 in the Annotation value field above means “the text block that corresponds to the first pair of parentheses in the regex”.)
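The same trick can be sketched in Python, where `group(1)` plays the role of &1. This is only an illustration under assumptions: the verb list is partial (the four forms named above plus joined from the example item), and both the pattern and the helper name `extract_action` are mine, not Textable’s.

```python
import re

# Assumed pattern: a <title> tag, then (lazily) some text, then one of the verb forms.
# The parentheses capture the verb, i.e. what &1 refers to in the Segment interface.
ACTION_RE = re.compile(r"<title>.*?\b(wrote|became|posted|changed|joined)\b")

def extract_action(item):
    """Return an annotation dict such as {'action': 'joined'}, or None if no verb is found."""
    match = ACTION_RE.search(item)
    if match is None:
        return None
    return {"action": match.group(1)}

sample = "<item> <title>Ranti Junus joined the group DH in the Curriculum</title> </item>"
annotation = extract_action(sample)
print(annotation)
```

Items whose title contains none of the listed verbs simply yield no annotation, which is why the verb list needs to cover every action that the stream can report.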
Now we can specifically access this piece of information by saying, e.g., “count the frequency of each possible action value in the data”. Let’s do just that, by means of an instance of the Count widget:
The interface is rather self-explanatory in this case: basically we’re counting values associated with the action annotation key in the items segmentation that we’ve previously built using Segment.
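In code, this counting step boils down to a frequency table over annotation values. Here is a small sketch using Python’s `collections.Counter`; the list of actions is made up for illustration and does not reproduce the actual Day of DH figures.

```python
from collections import Counter

# Hypothetical annotation values, one per item; NOT the real Day of DH data.
actions = ["wrote", "posted", "wrote", "joined", "wrote", "posted"]

# Count how often each value of the "action" key occurs.
counts = Counter(actions)

# List the actions from most to least frequent.
for action, frequency in counts.most_common():
    print(action, frequency)
```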
Then after going through an instance of Convert (whose only purpose is to reorganize the data in a way that’s not very interesting to describe here), we can finally display the result using an instance of Distributions:
Did it change between the moment I started this post and the moment I’m finishing it? Let’s find out:
Well, not much of a change in this period, maybe a few more posts and a few fewer comments, but writing remains the most frequent action, followed by posting.
More than an earth-shattering discovery about Day of DH, I hope this example shows that Orange Textable lets the user perform quite specific text analyses (almost) without writing a single line of code (regexes set aside). And I’d very much like to hear DHers’ opinions about this.
Like many people in Switzerland and elsewhere, I spend some time on public transportation every day, and Day of DH is no exception.
A good opportunity to review students’ assignments. Today’s featured assignment had much computing but little Humanities. Hopefully my next activity will provide more relevant thoughts for this blog.
Ok here we go. Ordinary child care and ablutions set aside, this day of DH starts with a little data crunching.
My computer has been running all night to deliver good results this morning: it looks like the method I’m working on (with Guillaume Guex) actually makes for a more robust measurement of inflectional diversity. Sounds to me like the beginning of an excellent Day of DH.
The title paraphrases, in terms that carry a distinctive ideological weight in the DH rhetoric, a formulation originally due to Mitchel Resnick (1996), I believe:
Constructional design is a type of meta-design: it involves the design of new tools and activities to support students in their own design activities. In short, constructional design involves designing for designers
Narrowing the scope to the specific field of text analysis, building for builders is what I’ve committed myself to doing in the framework of the Textable project: I have tried to build a construction kit for text analysis. While I became aware of this metaphor rather early in the project’s lifetime, I realized only recently that I am really just a node in a dense network of builders building for other builders.
Building for the likes of me are people such as the members of University of Ljubljana’s Biolab, creators of Orange Canvas, the open source data mining software of which Textable is an extension. I’ve had the opportunity to collaborate with these fine people on the development of Textable’s latest version, and to appreciate the considerable difference between their degree of mastery in programming and software development and mine.
Building for the likes of Biolab are Guido van Rossum and the community driving the development of Python, the general-purpose scripting language in which Orange has been programmed. I can’t fully envision the kind of mental disposition it takes to create a working general-purpose programming language, not to mention a flourishing one. And these folks in turn have built on the buildings of a number of builders, including Alan Turing, who formalized an abstract computing machine in his well-known 1936 paper, as part of an even older discussion between mathematicians.
This could be pursued all the way back to the roots of rational thought, I guess. A radiant, ever-growing structure where the most central nodes have designed the most general and productive conceptual construction kits, and each new layer retains only the constructional features that make the core, original generative power most readily accessible to ever more specialized classes of builders.
Have I actually written this grandiloquent account? In any event, what I’m wondering is this: given that those who have built for me have been built for by others, could it be that those who I’m building for will in turn build for someone else? If not, how does one get to that higher level of constructional design?