[Fall 2010] Quantifier Scope, Intro to GLMs and GLMMs

Wed 28 December 2011 by Adrian Brasoveanu

Plan: discuss the analysis of a subset of the quantifier-scope-tagged corpus that Scott AnderBois, Robert Henderson and I have been working on over the last year, focusing exclusively on sentences with two quantifiers.

Main goal: motivate continuing last quarter's discussion of linear regression, then move on to generalized linear models (GLMs) and generalized linear mixed-effects models (GLMMs), and show how to estimate such models in BUGS as well as with the functions already available in R.

Additional motivation for our glm, glmm & BUGS plan is provided by the following papers and book:

Schedule and materials:

  1. Brief overview of the phenomena and the dataset; binomial “t-test” comparing the wide-scope preference of Subjects vs. Objects; the binomial “t-test” in WinBUGS and comparison with the glm() estimates: CLG-quant-scope-1.r
  2. Generalizing the binomial “t-test” to a binomial “ANOVA” comparing the wide-scope preference of Subjects (reference category) vs. 9 other grammatical functions (Objects, Adjuncts, existential Pivots, various prepositional phrases); generalizing the WinBUGS “t-test” script to the binomial “ANOVA” model; comparison with the glm() estimates: CLG-quant-scope-2.r
  3. Summary of most of the material presented in spring 2010, up to and including multiple linear regression and graphical comparison of a multiple regression model (for the dataset we generated) and the simple regression models we obtain by dropping predictors: CLG-regression-fall2010-1.r
  4. Model selection (main topic) and a little bit on interactions, multicollinearity and permutation tests: CLG-regression-fall2010-2.r
  5. Confidence intervals and confidence regions for (sets of) coefficients, confidence intervals for predicted mean responses and for predicted responses (and the difference between them), extrapolation: CLG-regression-fall2010-3.r
  6. Robert Henderson introduced the Stanford tagger and Mark Greenwood’s rule-based NP chunker and then put them to work on a small part of the tagged LSAT logic-puzzle corpus we introduced earlier in the quarter: pos-chunking-script.zip
  7. Regression & observational data (e.g., corpus data), the proper interpretation of coefficients in multiple regression analyses, partial regression plots, regression and causality, parameter estimation vs. prediction, practical difficulties for regression analysis: CLG-regression-fall2010-4.r
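
To make the binomial “t-test” of item 1 concrete: with a single two-level predictor (say, Subject vs. Object), the logistic regression is saturated, so its maximum-likelihood estimates can be computed directly from the 2×2 table of counts. The intercept is the log-odds of wide scope in the reference category, the slope is the difference in log-odds between the two categories, and the Wald standard error of the slope is the square root of the summed reciprocal cell counts. The sketch below is in Python rather than R and is not taken from the course scripts; the function name `binomial_two_group` and the counts are hypothetical, chosen only for illustration and not drawn from the corpus.

```python
import math


def binomial_two_group(wide_ref, n_ref, wide_other, n_other):
    """Closed-form ML estimates for a logistic regression with one
    binary predictor (reference category vs. other category).

    All four cells of the 2x2 table must be non-zero.
    """
    narrow_ref = n_ref - wide_ref
    narrow_other = n_other - wide_other

    # Intercept: log-odds of wide scope in the reference category.
    intercept = math.log(wide_ref / narrow_ref)
    # Slope: log odds ratio of the other category vs. the reference.
    slope = math.log(wide_other / narrow_other) - intercept
    # Wald standard error of the log odds ratio.
    se_slope = math.sqrt(
        1 / wide_ref + 1 / narrow_ref + 1 / wide_other + 1 / narrow_other
    )
    return {
        "intercept": intercept,
        "slope": slope,
        "se_slope": se_slope,
        "z": slope / se_slope,
    }


# Hypothetical counts (NOT from the corpus): 60 of 100 Subjects and
# 40 of 100 Objects take wide scope.
est = binomial_two_group(wide_ref=60, n_ref=100, wide_other=40, n_other=100)
for name, value in est.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions the numbers should agree, up to numerical tolerance, with what a binomial glm() fit on the same table reports for the intercept and the category contrast. A Bayesian fit with vague priors would typically give posterior means close to these values, though not identical to them.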