Playing with SSHRC Awards

A couple of weeks back I was scheduled to talk about digital infrastructures at the FedCan AGM, and I was preparing to make some arguments about the rise in interest in the digital (and related topics), as seen through the SSHRC database of awards. (In the end one of my kids was unwell that day and I had to recuse myself from the panel.) The argument was two-fold, and both parts underline the need for a coherent and efficient strategy for digital infrastructures in Canada:

  1. SSHRC (and by extension Canadian taxpayers) has made incredible investments in research and I think there’s an obligation to do as much as possible to make that research publicly accessible. Based on 15 years of data in the Awards Search Engine, SSHRC has funded over 50,000 projects for a total of nearly $1.5 billion!
  2. There’s been a distinct rise in research that’s specifically related to contemporary digital society and digital methodologies, and that work requires new infrastructures (people, equipment, tools, etc.) to support it.

I quickly prepared some data last time and generated some simple graphs, and I was intending to return to the analysis work today. Alas, as often happens (DayOfDH is no exception), other things got in the way. I do however have time to step through in a bit more detail the preparation of the data in case it may be of use to anyone.

First I went to the SSHRC Awards Search Engine, which is a remarkable and underutilized resource – I’m sure designers might have some complaints about the interface, but the functionality is great. I selected Competition Year(s) for 1998-2012 and output as Excel.

This produces a paginated results table, but at the bottom of the page there’s a link to download the Excel file, which contains all rows. I opened that in Excel and then saved it as a tab-separated values file with the name sshrc.txt (my version of Excel seems stubbornly adamant about that file extension). Worse than the file extension issue is the fact that I don’t get a choice of character encoding or newline format. If you need other formats, options include using Open/LibreOffice for the conversion or opening the resulting file in a text editor and switching the output to UTF-8.
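The re-encoding step can also be scripted. What follows is a minimal Python sketch, not part of the original workflow. The function name `convert_to_utf8` is hypothetical, and the default source encoding (`cp1252`) is only an assumption, since the encoding Excel actually writes depends on the platform and version.

```python
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding="cp1252"):
    """Re-encode a text file as UTF-8 with Unix (LF) line endings.

    source_encoding is an assumption: check what your copy of Excel
    actually produced before trusting the result.
    """
    raw = Path(src).read_bytes()
    text = raw.decode(source_encoding)
    # Normalise Windows (CRLF) and classic Mac (CR) newlines to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    Path(dst).write_bytes(text.encode("utf-8"))
    return dst
```

For a file saved by an older Mac version of Excel, a call such as `convert_to_utf8("sshrc.txt", "sshrc-utf8.txt", source_encoding="mac_roman")` may be closer to the mark. Accented characters in the French titles are a quick way to check whether the guess was right.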

Now I can run the PHP script that reads the sshrc.txt input file (from the same directory), does a bit of cleanup, and then produces annual files of the titles. The code for this is relatively simple:

TSV to annual titles
Click on the image to see the code as a GitHub Gist.
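For readers without PHP at hand, the same idea (read the tab-separated file, tidy the titles, write one file of titles per year) can be sketched in Python. This is not the script from the Gist. The column names `Competition Year` and `Title` are assumptions about the export's header row, and the cleanup shown here (collapsing whitespace, skipping rows without a four-digit year or a title) is only a guess at what "a bit of cleanup" might involve.

```python
import csv
import re
from collections import defaultdict
from pathlib import Path


def split_titles_by_year(tsv_path, out_dir,
                         year_col="Competition Year",  # assumed header name
                         title_col="Title",            # assumed header name
                         encoding="utf-8"):
    """Write one text file of project titles per competition year."""
    titles = defaultdict(list)
    with open(tsv_path, newline="", encoding=encoding) as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            year = (row.get(year_col) or "").strip()
            # Collapse runs of whitespace inside the title.
            title = " ".join((row.get(title_col) or "").split())
            # Skip rows with no usable year or an empty title.
            if re.fullmatch(r"\d{4}", year) and title:
                titles[year].append(title)

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for year in sorted(titles):
        path = out / f"{year}.txt"
        path.write_text("\n".join(titles[year]) + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

Calling `split_titles_by_year("sshrc.txt", "annual")` would then leave files such as `annual/1998.txt` with one title per line, provided the assumed column names match the real header.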

Once those annual output files have been created, I can combine them into a zip file and send them to Voyant.
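Bundling the annual files can be done with any zip tool. As a minimal Python sketch (the function name `zip_annual_files` and the assumption that the annual files all end in `.txt` and sit in one directory are mine, not part of the original workflow):

```python
import zipfile
from pathlib import Path


def zip_annual_files(src_dir, zip_path):
    """Bundle every .txt file in src_dir into a single flat zip archive."""
    files = sorted(Path(src_dir).glob("*.txt"))
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            # Store only the file name so the archive has no folder structure.
            archive.write(path, arcname=path.name)
    return [path.name for path in files]
```

Keeping the archive flat and the file names sortable by year (`1998.txt`, `1999.txt`, and so on) makes it easier to keep the documents in chronological order after upload.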

SSHRC in Voyant
SSHRC Award Titles by Year – click on the image to view in Voyant

Here I see the remarkable rise in occurrences of the term “digital” over the past 15 years, though much more needs to be said, especially in a bilingual context (“numérique” and variants seem to rise earlier, for instance). As I said near the beginning, my plan for today was to spend much more time playing with the titles from SSHRC-funded projects, but maybe it will have to wait until DayOfDH 2015…