[Fall 2010] Quantifier Scope, Intro to GLMs and GLMMs
Wed 28 December 2011 by Adrian Brasoveanu
Plan: talk about the analysis of a subpart of the quantifier-scope-tagged corpus that Scott AnderBois, Robert Henderson and I have been working on over the last year; we will focus exclusively on sentences with 2 quantifiers.
Main goal: provide motivation for continuing the discussion of linear regression from last quarter, then move on to generalized linear models (glm) and generalized linear mixed-effects models (glmm), and show how we can estimate such models in BUGS as well as with the functions already available in R.
Additional motivation for our glm, glmm & BUGS plan is provided by the following papers and book:
- Probabilistic Syntax. Christopher D. Manning. In Rens Bod, Jennifer Hay, and Stefanie Jannedy (eds.), Probabilistic Linguistics, MIT Press, 2003.
- The Great Number Crunch, Charles Yang. Journal of Linguistics 44, 2008, 205-228 (provides a nice overview and discussion of several papers, including “Probabilistic Syntax”).
- Categorical Data Analysis: Away from ANOVAs (transformation or not) and towards Logit Mixed Models, T. Florian Jaeger. Journal of Memory and Language, Volume 59, Issue 4, November 2008, 434-446.
- Examples of Mixed-effects Modeling with Crossed Random Effects and with Binomial Data, Hugo Quene & Huub van den Bergh. Journal of Memory and Language, Volume 59, Issue 4, November 2008, 413-425.
- Mixed-effects modeling with crossed random effects for subjects and items. R.H. Baayen, D.J. Davidson & D.M. Bates. Journal of Memory and Language, Volume 59, Issue 4, November 2008, 390-412.
- A Course in Bayesian Graphical Modeling for Cognitive Science, Michael D. Lee & Eric-Jan Wagenmakers.
Schedule and materials:
- Brief overview of the phenomena and the dataset; binomial “t-test” comparing the wide-scope preference of Subjects vs. Objects; the binomial “t-test” in WinBUGS and comparison with the glm() estimates: CLG-quant-scope-1.r
- Generalizing the binomial “t-test” to a binomial “ANOVA” comparing the wide-scope preference of Subjects (reference category) vs. 9 other grammatical functions (Objects, Adjuncts, existential Pivots, various prepositional phrases); generalizing the WinBUGS “t-test” script to the binomial “ANOVA” model; comparison with the glm() estimates: CLG-quant-scope-2.r
- Summary of most of the material presented in spring 2010, up to and including multiple linear regression and graphical comparison of a multiple regression model (for the dataset we generated) and the simple regression models we obtain by dropping predictors: CLG-regression-fall2010-1.r
- Model selection (main topic) and a little bit on interactions, multicollinearity and permutation tests: CLG-regression-fall2010-2.r
- Confidence intervals and confidence regions for (sets of) coefficients, confidence intervals for predicted mean responses and for predicted responses (and the difference between them), extrapolation: CLG-regression-fall2010-3.r
- Robert Henderson introduced the Stanford tagger and Mark Greenwood’s rule-based NP chunker and then put them to work on a small part of the tagged LSAT logic-puzzle corpus we introduced earlier in the quarter: pos-chunking-script.zip
- Regression & observational data (e.g., corpus data), the proper interpretation of coefficients in multiple regression analyses, partial regression plots, regression and causality, parameter estimation vs. prediction, practical difficulties for regression analysis: CLG-regression-fall2010-4.r
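
To make the binomial “t-test” from the first schedule item a bit more concrete, here is a minimal sketch in Python (not taken from the course scripts, which are in R and WinBUGS). It assumes a logistic regression with a single two-level predictor, Subject as the reference category and Object as the other level. For that model the maximum-likelihood estimates have a closed form: the intercept is the log-odds of wide scope for Subjects, and the slope is the difference in log-odds between Objects and Subjects. The function name and the counts are hypothetical and only illustrate the arithmetic; they are not the corpus counts.

```python
import math


def binomial_two_group_fit(wide_ref, total_ref, wide_other, total_other):
    """Closed-form ML estimates for a logistic regression with one
    two-level predictor (treatment coding, first group = reference).

    Returns (intercept, slope, se_slope, z), all on the log-odds scale.
    Assumes every cell count is strictly positive.
    """
    narrow_ref = total_ref - wide_ref
    narrow_other = total_other - wide_other

    # Intercept: log-odds of wide scope in the reference group.
    intercept = math.log(wide_ref / narrow_ref)
    # Slope: log odds ratio of the other group relative to the reference.
    slope = math.log(wide_other / narrow_other) - intercept
    # Wald standard error of the log odds ratio.
    se_slope = math.sqrt(
        1 / wide_ref + 1 / narrow_ref + 1 / wide_other + 1 / narrow_other
    )
    return intercept, slope, se_slope, slope / se_slope


# Hypothetical counts, NOT from the corpus:
# 80 of 100 Subjects and 50 of 100 Objects take wide scope.
intercept, slope, se_slope, z = binomial_two_group_fit(80, 100, 50, 100)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}, "
      f"se = {se_slope:.3f}, z = {z:.2f}")
```

A negative slope here means that Objects are less likely than Subjects to take wide scope. The binomial “ANOVA” in the second item generalizes the same idea by adding one such contrast per additional grammatical function.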