[Spring 2011] GLMs & GLMMs (ctd.)

Wed 28 December 2011 by Adrian Brasoveanu

Plan: linear regression wrap-up; introduce logistic regression models, maximum likelihood estimation of their parameters, and the frequentist quantification of the uncertainty associated with those estimates; introduce the Bayesian approach to regression models, with the goal of having a flexible way to estimate all sorts of logistic regression models, including binary / binomial, polytomous / multinomial, and ordinal regression models, with and without random effects (these kinds of models crop up all the time in linguistic data analysis).

The minimum we will hopefully cover: binary logistic regression (and Poisson regression / log-linear models) with and without random effects, both the frequentist and the Bayesian way (using R and WinBUGS); we will use simulated data for a variety of regression models and discuss in detail how to obtain such data. This completes the tutorial on linear regression from spring 2010 & fall 2010.

  1. Quick recap (what is the interpretation of a particular coefficient in multiple linear regression); regression assumptions / model misspecification; checking linearity; basic variable transformations; checking error assumptions: constant variance, normality, correlated errors; finding unusual observations: outliers, leverage, the jackknife, influential observations; some more notes about (multi)collinearity: CLG-linear-regression-spring2011.r
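The diagnostics listed in item 1 can be sketched in a few lines of base R. This is a minimal illustration on simulated data (the variable names and the simulated model are assumptions for the sketch, not the contents of CLG-linear-regression-spring2011.r):

```r
## simulate a simple linear regression dataset
set.seed(1)
x <- runif(100, 0, 10)
y <- 2 + 3 * x + rnorm(100, sd = 2)
m <- lm(y ~ x)

res   <- rstandard(m)        # standardized residuals (constant variance, outliers)
lev   <- hatvalues(m)        # leverage of each observation
jack  <- rstudent(m)         # jackknife (studentized) residuals
cooks <- cooks.distance(m)   # influential observations

## a common rule of thumb: leverage above 2 * p / n deserves a closer look
high.lev <- which(lev > 2 * length(coef(m)) / length(x))
```

`plot(m)` then gives the standard residual and leverage plots in one call.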

  2. Intro to logistic regression; motivating dataset (CHD ~ AGE) available here; logistic vs. linear regression; intro to generalized linear models; probabilities vs. odds vs. log-odds / logits; a quick heuristic for interpreting logits (divide a logit coefficient by 4 to get the approximate shift in probability per unit change in the predictor; the approximation is best near chance, i.e., near 0.5 probability); two different ways to conceptualize probabilities as areas under the pdf of the logistic distribution (one of them is standard for binomial logistic regression; the other opens the way towards ordinal logistic regression); the logistic regression for the motivating dataset: CLG-logistic-regression-spring2011-1.r
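The divide-by-4 heuristic can be checked directly in R. The sketch below uses simulated data with made-up coefficients (the actual CHD ~ AGE dataset is not reproduced here); the key fact is that the density of the logistic distribution at 0 is exactly 0.25, so the slope of the fitted probability curve at p = 0.5 is exactly the coefficient divided by 4:

```r
## simulate a binary outcome from a logistic model (assumed coefficients)
set.seed(7)
age <- runif(200, 20, 70)
p   <- plogis(-5 + 0.1 * age)            # true probabilities on the logit scale
chd <- rbinom(200, size = 1, prob = p)
m   <- glm(chd ~ age, family = binomial)

b <- coef(m)["age"]
## divide-by-4 rule: b / 4 is the maximal change in probability per unit of
## age, attained where the fitted probability is 0.5; elsewhere it is an
## upper bound on the shift
b / 4
b * dlogis(0)                            # identical: dlogis(0) = 0.25
```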

  3. A couple of simple examples of GLMs: a simple binomial ANOVA and a simple binomial ANCOVA; simulating a dataset for a simple logistic regression model; a detailed analysis of a linguistic example (Henrietta Cedergren’s 1973 study of /s/-deletion in Panamanian Spanish, based on some of Chris Manning’s materials); a simple ordinal logistic regression: CLG-logistic-regression-spring2011-2.r
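Simulating data for a binomial "ANOVA" and recovering the generating parameters with `glm` can be sketched as follows (the factor levels, group sizes, and effect sizes are assumptions for the sketch, not Cedergren's actual data):

```r
## simulate grouped binomial data with known log-odds per group
set.seed(3)
grp  <- factor(rep(c("a", "b", "c"), each = 50))
eta  <- c(a = -1, b = 0, c = 1)[as.character(grp)]  # true log-odds
succ <- rbinom(150, size = 20, prob = plogis(eta))  # successes out of 20 trials

## binomial GLM with a successes/failures response matrix
m <- glm(cbind(succ, 20 - succ) ~ grp, family = binomial)
coef(m)   # intercept ~ -1; grpb ~ 1; grpc ~ 2 (treatment contrasts)
```

With treatment contrasts (R's default), the intercept estimates the log-odds for group a, and the grpb / grpc coefficients estimate the log-odds differences from it.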

  4. Three simple examples of mixed-effects models (two LMMs and one GLMM); comparison of (i) a “complete pooling” approach (analysis done at the individual level, with individuals treated as independent), (ii) an aggregation approach (analysis done at the cluster level on aggregate measures; this addresses a different question than analysis at the lower level and is subject to ecological fallacies), (iii) a “no pooling” approach (clusters treated as a fixed effect that is covaried out, i.e., a multiple regression with a dummy variable coding for the groups), and (iv) multilevel modeling: a “partial pooling” approach, intermediate between (i) and (iii) and able to provide the group-level estimates that are the main goal of (ii); allowing estimates of variance components to be 0; allowing for random slopes in addition to random intercepts; measurements nested within subjects: CLG-mixed-effects-basics-spring2011.r
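Approaches (i)-(iii) can be set up side by side in base R on simulated clustered data; (iv) would add `lmer` from lme4 on top of the same data, which is omitted here so the sketch stays self-contained (the subject structure and effect sizes below are assumptions for the sketch):

```r
## simulate clustered data: 10 subjects, 20 measurements each,
## random subject intercepts plus a shared slope
set.seed(5)
subj <- factor(rep(1:10, each = 20))
a.j  <- rnorm(10, mean = 50, sd = 5)   # true subject-level intercepts
x    <- rnorm(200)
y    <- a.j[subj] + 2 * x + rnorm(200, sd = 3)

m.complete <- lm(y ~ x)                # (i) complete pooling: clusters ignored
m.no.pool  <- lm(y ~ x + subj)         # (iii) no pooling: subject dummies
agg        <- aggregate(cbind(y, x) ~ subj, FUN = mean)
m.agg      <- lm(y ~ x, data = agg)    # (ii) aggregation: cluster-level means
```

The partial-pooling fit would be `lmer(y ~ x + (1 | subj))`, whose subject estimates are shrunk from the no-pooling dummies towards the complete-pooling grand mean.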