Course in statistics for linguists – spring 2017

The course will be held as a series of 7 sessions of about 75 minutes each during the spring term of 2017. Practical exercises will be given as homework between sessions. The course uses the statistical program R and is supported by the book How to do linguistics with R by Natalia Levshina. We will not cover the entire book, however. We will start at the beginning and discuss progression and contents as we go.

The seminar will be led by me, Bård Uri Jensen (bard.jensen@inn.no). I am a lecturer in Norwegian language at Inland Norway University of Applied Sciences (in Hamar), and I also hold a part-time position at MultiLing. All teaching and course material will be in English.

Overview

Dates and times have been set, but they may be adjusted if necessary:

  • March 3 11:00 – 12:30, Meeting room, MultiLing
  • March 21 11:00 – 12:30, HW-536
  • April 4 11:00 – 12:30, Meeting room, MultiLing
  • April 25 10:30 – 12:00, Meeting room, MultiLing
  • May 2 10:30 – 12:00, Meeting room, MultiLing
  • May 16 10:30 – 12:00, Meeting room, MultiLing
  • May 23 10:30 – 12:00, Meeting room, MultiLing
  • June 6 13:00 – 14:30, Meeting room, MultiLing

This is meant to be a course in statistics, not a course in a statistical tool or program. However, in order to practise statistics, one has to use a tool, and in this course we will use R, which is a free, open-source program. It would be possible to sit in and follow the course without using R, but I believe that in order to learn to do statistics, one actually has to carry out analyses on concrete numbers and study the results. Therefore, I assume that everyone can bring their own computer to the classes with R installed, and that everyone can put some effort into doing practical exercises between the classes. This means that everyone must expect to spend some time learning the basics of R, if they do not know them already.

In order to use R, one also needs an editor for writing R commands. There are several options, and the choice of program is up to each participant. I use Tinn-R, a free editor which is tailor-made for R, and I am quite pleased with it. Many people use RStudio and are happy with that. Using Notepad or another simple text editor is also possible, but I do not really recommend it. Word cannot be used for this purpose. If you choose to use something other than Tinn-R, I might not be able to help you with technical problems related to the program.

We will use data examples from the book and also some data which I will make available, possibly supplemented with your own data, if time allows. For each session, you will get reading homework to do before the session and practical exercises to do after it. As a rule, the homework will be given about a fortnight before the next session.

If you have your own data which might be suitable for statistical analysis, it may be a good idea to have them available while you are working through this course. Understanding and remembering concepts and methods is often easier if one can experiment with the procedures on data which one understands. I will demonstrate how to move data into R, e.g. from Excel.