Below are publicly usable datasets; when referencing them, please use the citation listed. If you have any questions, please contact me.
Data Sets
(2012) ‘POLITICAL-ADS’ A corpus of 141 TV ads from the 2008 Presidential Election, with each verb phrase and noun phrase annotated by Mechanical Turk workers for scalar polarity across 4 dimensions: the narrator's perspective, the annotator's perspective, society in general's perspective, and a measure of controversiality. Kevin Reschke's 2011 M.A. Thesis, which details the construction of this corpus, is included in the zip file.
(2012) ‘NPS UCSC Focus of Negation Corpus’ A corpus of 2940 sentences from Propbank annotated for focus of negation. This dataset is a reannotation exercise of the FOC-NEG corpus of Blanco & Moldovan 2011. The accompanying 2012 ACL Workshop paper details the corpus building process.
files in preparation
(2012) ‘Internet Argument Corpus’ A set of 390,704 posts in 11,800 discussions extracted from the online debate site 4forums.com. A 2,866-thread/130,206-post extract of the corpus has been manually sided for topic of discussion, and subsets of this topic-labeled extract have been annotated for several dialogic and argumentative markers: degrees of agreement with a previous post, cordiality, audience-direction, combativeness, assertiveness, emotionality of argumentation, and sarcasm.
(2011) ‘The Exclusive Interpretation of Plurals: Supplementary Materials’ The data used in, and the participant results of, four image verification experiments on the determination of plural exclusivity under universally quantified contexts.
(2011) ‘UCSC NPS UMD Persuasion Corpus’ A corpus of 40 English blogs from the Blog Authorship Corpus (Koppel et al. 2006) annotated for persuasive indicators according to Cialdini and Marwell & Schmitt. The blogs contain 25,048 posts, of which 4,603 contain persuasion or persuasive tactics.