[Fall 2010] Quantifier Scope, Intro to GLMs and GLMMs

Wed 28 December 2011 by Adrian Brasoveanu

Plan: discuss the analysis of a subset of the quantifier-scope-tagged corpus that Scott AnderBois, Robert Henderson and I have been working on over the last year, focusing exclusively on sentences with two quantifiers.

Main goal: motivate continuing last quarter's discussion of linear regression, then move on to generalized linear models (GLMs) and generalized linear mixed-effects models (GLMMs), and show how to estimate such models in BUGS in addition to using the functions already available in R.

Additional motivation for our glm, glmm & BUGS plan is provided by the following papers and book:

Probabilistic Syntax. Christopher D. Manning. In Rens Bod, Jennifer Hay, and Stefanie Jannedy (eds.), Probabilistic Linguistics, MIT Press, 2003.
The Great Number Crunch. Charles Yang. Journal of Linguistics 44, 2008, 205-228 (provides a nice overview and discussion of several papers, including “Probabilistic Syntax”).
Categorical Data Analysis: Away from ANOVAs (transformation or not) and towards Logit Mixed Models. T. Florian Jaeger. Journal of Memory and Language, Volume 59, Issue 4, November 2008, 434-446.
Examples of Mixed-effects Modeling with Crossed Random Effects and with Binomial Data. Hugo Quené & Huub van den Bergh. Journal of Memory and Language, Volume 59, Issue 4, November 2008, 413-425.
Mixed-effects modeling with crossed random effects for subjects and items. R.H. Baayen, D.J. Davidson & D.M. Bates. Journal of Memory and Language, Volume 59, Issue 4, November 2008, 390-412.
A Course in Bayesian Graphical Modeling for Cognitive Science. Michael D. Lee & Eric-Jan Wagenmakers.

Schedule and materials:

Brief overview of the phenomena and the dataset; binomial “t-test” comparing the wide-scope preference of Subjects vs. Objects; the binomial “t-test” in WinBUGS and comparison with the glm() estimates: CLG-quant-scope-1.r
Generalizing the binomial “t-test” to a binomial “ANOVA” comparing the wide-scope preference of Subjects (reference category) vs. 9 other grammatical functions (Objects, Adjuncts, existential Pivots, various prepositional phrases); generalizing the WinBUGS “t-test” script to the binomial “ANOVA” model; comparison with the glm() estimates: CLG-quant-scope-2.r
Summary of most of the material presented in spring 2010, up to and including multiple linear regression and graphical comparison of a multiple regression model (for the dataset we generated) and the simple regression models we obtain by dropping predictors: CLG-regression-fall2010-1.r
Model selection (main topic) and a little bit on interactions, multicollinearity and permutation tests: CLG-regression-fall2010-2.r
Confidence intervals and confidence regions for (sets of) coefficients, confidence intervals for predicted mean responses and for predicted responses (and the difference between them), extrapolation: CLG-regression-fall2010-3.r
Robert Henderson introduced the Stanford tagger and Mark Greenwood’s rule-based NP chunker and then put them to work on a small part of the tagged LSAT logic-puzzle corpus we introduced earlier in the quarter: pos-chunking-script.zip
Regression & observational data (e.g., corpus data), the proper interpretation of coefficients in multiple regression analyses, partial regression plots, regression and causality, parameter estimation vs. prediction, practical difficulties for regression analysis: CLG-regression-fall2010-4.r
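
The binomial “t-test” from the first schedule item can be read as a logistic regression with one two-level predictor: the intercept is the log-odds of wide scope for the reference category (Subjects), and the slope is the log odds ratio for the other category (Objects). With only two groups, the maximum-likelihood estimates and the Wald standard error have closed forms, so they can be checked by hand. The sketch below is not taken from CLG-quant-scope-1.r; it is a small stand-alone Python illustration, the function name is hypothetical, and the counts are invented purely for the example (they are not the corpus counts).

```python
import math

def two_group_logit(wide_ref, n_ref, wide_other, n_other):
    """Closed-form fit of logit(p) = b0 + b1 * is_other for two groups.

    wide_ref / n_ref:     wide-scope count and total for the reference group
    wide_other / n_other: wide-scope count and total for the comparison group
    """
    narrow_ref = n_ref - wide_ref
    narrow_other = n_other - wide_other
    cells = (wide_ref, narrow_ref, wide_other, narrow_other)
    if any(c <= 0 for c in cells):
        # The log-odds are undefined when a cell of the 2x2 table is empty.
        raise ValueError("all four cells of the 2x2 table must be positive")

    b0 = math.log(wide_ref / narrow_ref)            # log-odds, reference group
    b1 = math.log(wide_other / narrow_other) - b0   # log odds ratio
    se_b1 = math.sqrt(sum(1.0 / c for c in cells))  # Wald standard error
    z = b1 / se_b1
    p_value = math.erfc(abs(z) / math.sqrt(2))      # two-sided normal p-value
    return {"intercept": b0, "slope": b1, "se_slope": se_b1,
            "z": z, "p_value": p_value}

# Invented counts: 80 of 100 Subjects and 40 of 100 Objects take wide scope.
fit = two_group_logit(wide_ref=80, n_ref=100, wide_other=40, n_other=100)
for name, value in fit.items():
    print(f"{name}: {value:.4g}")
```

In R, the same model would typically be fit with something like glm(cbind(wide, narrow) ~ gram_fun, family = binomial), where the column names are again assumptions; the coefficient for the non-reference level should agree with the slope computed here. The binomial “ANOVA” in the second schedule item extends this to one slope per additional grammatical function, each measured against the Subject baseline.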