Seminar: Advanced Statistics with R (2022)

Just like last year, I taught a short two-day block seminar on advanced statistics for the social sciences using R at the University of Zurich.

The overall composition remained similar, covering:

  • Basic R syntax (strictly tidyverse)
  • Some linear models, GLMs and multilevel models
  • A bit of text analysis (though less than before, skipping quanteda and LDA/stm)
  • AI for social science

Looking back, it is quite interesting to compare my own assessment of the role of AI in social science with the actual trajectory of its development. When I taught my first AI-related course four years ago, I did not consider deep learning a good strategic choice for the typical social scientist's practical work. It was a moonshot that some methods-inclined people might want to risk, with an unclear outcome, but not something that could be used in practice.

This assessment has reversed much more quickly than I expected. Today, fine-tuning a transformer takes a few lines of code with simpletransformers in Python or with the text package in R. Setting up the required libraries and drivers is still a pain (mostly because CUDA versioning, reticulate and conda are a disaster), but that is solvable, e.g. through online notebooks or Docker containers.

So as of today, I'd say that if social scientists have a problem that can be modeled using neural networks, it's somewhat negligent not to use them. Bag-of-words approaches still have some use cases, but their number is shrinking.
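
One reason for this shift can be shown with a minimal sketch. It is not material from the seminar; the two example sentences and the helper `bag_of_words` are made up for illustration, and the tokenization is a deliberately naive whitespace split. A bag-of-words representation only counts tokens, so two sentences with opposite meanings can end up with exactly the same representation:

```python
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split on whitespace, and count each token."""
    return Counter(text.lower().split())


# Hypothetical example sentences: same words, opposite meaning.
a = "the government criticized the opposition"
b = "the opposition criticized the government"

bow_a = bag_of_words(a)
bow_b = bag_of_words(b)

# Word order is discarded, so both sentences map to the same counts.
print(bow_a == bow_b)
```

Models that read tokens in context, such as transformers, can in principle tell these two sentences apart, which is part of why they do better on tasks where who did what to whom matters.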