Course materials for Quantitative Methods in Linguistics (Spring 2014)

Fri 06 June 2014 by Adrian Brasoveanu

Quantitative Methods in Linguistics (spring 2014; upper-division undergraduate course, UCSC Linguistics) provided an introduction to data analysis for linguistics, focusing on categorical and continuous data and using R. The overarching goal of the course was to give participants the tools and insight to analyze a data set in the same way that they would build an explanatory linguistic theory to account for a set of generalizations. The course materials (syllabus, lecture notes, slides) are provided below.

The syllabus:

Lecture notes and slides:

  • quant_methods_lecture1.pdf: R as calculator, Variables, Vectors, Lists, More about string manipulation, The basics of working with (local) files
  • quant_methods_lecture2.pdf: Quick recap and some related issues, Patterned vectors, Logical operators, Indexing/slicing with logical operators, Set operators, Counting, Sorting, Printing / Saving to a file, Factors
  • quant_methods_lecture3.pdf: Basic graphics, Data frames, Saving a data frame to a file, Attaching and detaching data frames, Subsetting data frames, Ordering data frames, Lists, Character / String Processing, More graphics, The Brown Corpus
  • quant_methods_lecture4.pdf: Elementary Control (of) Flow Structures (IF, FOR, Example: obtaining the first 12 Fibonacci numbers, WHILE), Taking advantage of the vectorial nature of R and 3 example problems, General programming tips (Top-down design, Debugging and maintenance, Measuring the time your program takes, i.e., very basic profiling), Goals of quantitative modeling and analysis and types of data, Introducing the idea of a probability distribution: Frequency distributions, Becoming familiar with the normal distribution
  • quant_methods_lecture4_related_slides1_intro_prob1.pdf: Sample Spaces and Events (Sample Spaces, Events, Axioms and Rules of Probability), Joint, Conditional and Marginal Probability, Bayes’ Theorem, Independence and Conditional Independence, Random Variables and Probability Distributions, Expectation
  • quant_methods_lecture4_related_slides2_intro_prob2.pdf: Probability: Frequency vs Reasonable Expectation, Generalizing Classical Logic, Patterns of Plausible Inference (Modus Ponens, Modus Tollens, Affirming the Consequent, Denying the Antecedent, Affirming the Consequent of Weaker/Plausible Implications), Probability Theory as the Logic of Data Analysis
  • quant_methods_lecture5.pdf: Introducing least squares models — Data generation, First attempt: the mean of y, Quantifying the error, From sample to population: basic statistical (inductive) inference (Sample size, Sample variance, Putting the two together: the standard error (SE) of the mean, the Central Limit Theorem (CLT), and 95% confidence intervals (CIs)), Understanding the lm output a little bit better: t-distributions, Applying the Central Limit Theorem (CLT) to Bernoulli distributed data (The Bernoulli distribution, The binomial distribution, The binomial distribution and the Central Limit Theorem)
  • quant_methods_lecture6.pdf: Second attempt: two means, i.e., predicting y values based on x2 values, The two-mean linear (regression) model, Analysis of Variance (ANOVA) tables, T-tests: the simple version of the F-test we can use for single categorical parameters, Third attempt: ‘many’ means, i.e., predicting y values based on x1 values, Comparing reg1 with the 1-mean model, The intercept and slope coefficients (Alternative slopes, Alternative intercepts, Alternative intercepts and slopes)
  • quant_methods_lecture7.pdf: Recap and related issues (scatter plot matrices), R-squared, Correlation, An alternative way of calculating correlation, Inference for simple linear regression: from sample to population, The sampling error for the slope, The sampling error for the intercept, Putting it all together: plotting predictions for linear regression models
  • quant_methods_lecture8.pdf: Recap, Fourth attempt: multiple linear regression, Graphical comparison of reg1, reg2 and reg3, ANOVA and model selection, Adding interactions, Interpreting interactions, More on interactions (Multicollinearity and variable centering, Another example of regression with interaction terms)
  • quant_methods_lecture9.pdf: Essentials of linear models (The stochastic component of linear models: probability distributions, The deterministic component of linear models: linear predictors and design matrices), T-tests: more realistic examples (T-test with equal variances, T-test with unequal variances), Simple linear regression (Interpretation of confidence intervals), One-way ANOVA, Random-effects ANOVA, Two-way ANOVA (Aside: using simulation to assess the bias and precision of an estimator, Analysis of two-way ANOVA data), Linear mixed-effects models (random-intercepts models, random-coefficients model without correlation between intercepts and slopes, random-coefficients model with correlation between intercepts and slopes)
  • quant_methods_lecture10.pdf: Basic introduction to logistic regression, Generalized linear models (GLMs) (The identity link, The log link, The logit link), More about odds and log-odds, i.e., logits, The standard logistic distribution, The logistic regression for the CHD~AGE data, A couple of simple examples of GLMs (4 models and their visualization), Model comparison (Deviance and log-likelihood ratios, Background on likelihood functions and maximum likelihood estimates (MLEs), Evaluating the interaction model, Adding polynomial functions as additional predictors)
  • quant_methods_lecture11.pdf: Another logistic regression example: manipulating reasonable doubt, Simulating datasets for logistic regression, A linguistic example: /s/-deletion in Panamanian Spanish, Long format data & logistic regression

The homework assignments, which were language/linguistics-centric, are available upon request.