Header image

 


ENVS 291 Transition to R

The goal of this class is to help grad students, postdocs, or faculty who have a background in basic statistics and a familiarity with some other statistics package (JMP, SYSTAT, SAS, SPSS) to become comfortable with the R Project as a platform for statistical analyses. It is not meant as a course in statistics, nor will it cover more than a small portion of what is available in R. At the end of the course you should be comfortable managing data in R, making graphs, performing an array of different basic statistical analyses, and be able to use the extensive resources available on line and in books to learn to do just about anything statistical you'd like to in R.

You will get the most out of the course if you spend a couple hours a week playing with R. 

GENERAL NOTES

ENVS 291 Transition to R course will be offered in Winter Quarter 2015.

UCSC graduate students can sign up, as space allows, for ENVS 291-03 (43166)(G Gilbert).

Meetings are TTh 2:00-3:45 Jan 8-28 for the core of the course;
Remaining meetings are 19 Feb, 26 Feb, and 5 March for classes 8, 9, 10.

 

The course will not be offered again until 2017.

 

Page Navigation
Course Handouts Sample Data Sets Resources Useful code PC hints Graphics resources    

Course handouts (green are updated for 2015)
ENVS 291 Transitioning to R Syllabus W2015 Classes 4& 5 Basic Stats: Regression & ANOVA.docx

Class 9 - More Advanced Graphics.docx
Note:must install ggplot2 and grid packages

Class 1 Getting Started.docx Class 6 Basic graphics.docx Class 10a Vegan.docx (must install vegan & lattice)
Class 2 Handling Data.docx Class 7 - Functions, Loops, and Conditional Statements.docx Class 10b PicantePhylomatic.docx (must install picante and taxize packages and preferably standalone phylocom)
Class 3 Summarizing Data.docx Class 8 -Generalized linear models, logistic regression, mixed models, survival .docx Note: must install lme4 package Kembel Picante Walkthrough.pdf
 

Sample Data Sets *
RegressionDataset.csv  ThreeTreatmentDataset.csv  FactorialBlockDataset.csv heightdbh.csv
FERP07data.csv FERPcomfile.csv sometreeallometries.csv NestedAOVdata.csv
mixedmodeldata.csv repeatedmeasuresdata.csv logistregdata.csv AsstDates.csv
davies.bl.lc.new R20120829gg3eric.new FERPSoilsData.csv ggplotdata.csv
ferp_traits.csv ferp_taxa.txt ferp_taxa.new ferp_comm6ha.csv

*RIght-click and choose Save link as... to save file to disk
To open directly into R data frame, first right-click on name, and choose "copy link location".
Then in R console:
myData<-read.csv("paste in name from clipboard here, between quotes") #to open in data frame myData

A few favorite R books and resources
Crawley, MJ 2012. The R Book, 2nd edition. Wiley.This is a wonderful resource for all kinds of things R $125
The executable code in the book can be downloaded from http://www.bio.ic.ac.uk/research/mjcraw/therbook/index.htm
Muenchen, RA. 2009. R for SAS and SPSS Users. Springer.
This book is great for those comfortable in SAS or SPSS, and want to know how to get the same kind of analysis out of R
Zuur, AF, EN Ieno, EHWG Meesters.2009 A Beginner's Guide to R. Springer.I really like the structure of this for walking you through the basics of R
Crawley, MJ. 2005. Statistics. An introduction using R. Wiley. This is a good (though quirky in choice of topics) introductory text on stats, that uses R as the platform. Helfpul hints in how to read R output and get the most out of it.
Stevens, MHH. 2009. A Primer of Ecology with R. Springer. This has great examples of how to explore quantitative ecology in R. A super resource for upper division or grad teaching in ecology
CRAN (the Comprehensive R Archive Network) http://cran.us.r-project.org/ has a ton of great resources, including The R Jounral, FAQs, and Manuals
stackoverflow.com and other such sources are hugely useful. Someone else has tried to solve the same problem as you. I just google "R my current problem" and almost always get useful hits. For instance, "R import dates from Excel"
Kyle Harms (LSU) has pulled together some great tutorials and links to many useful R resources here http://www.kharms.biology.lsu.edu/BIOL7901Spring2013.html

 

 

Useful code
setwd("/Users/gilbertlab/Documents/RPlotData") Set working directory
data1<-read.table(file.choose(),sep=",",header=TRUE) Import comma-delimited csv file through the finder
data1<-read.table(file="mydata.csv",sep="\t",header=TRUE) Import tab-delimited txt file, with specific file name
write.table(a,"out_a.csv",sep=",",quote=FALSE,col.names=NA,row.names=TRUE) Export object a as comma-delimited file called "out_a.csv
png(filename="myplot.jpg"); plot(x,y); dev.off() Output a plot to a png file
str(myData) See structure of object "myData"
head(myData) See first 6 lines, and all columns of "myData"
myData[1:5,3:5] See first 5 lines, and columns 3 to 5 of "myData"
dim(myData) Gets number of rows and columns in "myData"
ls()

shows currently availble objects in workspace

getwd() shows current working directory
d[d$dates1==as.POSIXct(as.character("1996-08-25"),format="%Y-%m-%d"),] subset from a dataframe based on dates
library(car)
a<-read.csv("http://people.ucsc.edu/~ggilbert/Rclass_docs/ThreeTreatmentDataset.csv")
amodel<-aov(a$plant_mass~a$treatment+a$init_size)
Anova(amodel,type="III") #Type III like SAS
anova(amodel) #sequential R default
Get partial (typeIII) sums of squares and F-tests liks SAS default, using car package

getDropboxCSV<-function(){
cat("From Dropbox in your browser, locate the desired .csv file, \nclick Share, and then copy the URL \nfrom the Link to File dialog box.\n")
cat("Paste the URL for your .csv file here: ")
myname<-readLines(n=1)
mynamefix<-sub("www.dropbox","dl.dropboxusercontent",myname)
mynamefix<-sub("dl=0","",mynamefix)
mynamefix<-sub("\\?","",mynamefix)
bin<-getBinaryURL(mynamefix,ssl.verifypeer=FALSE)
justname<-unlist(strsplit(mynamefix,split="/"))[6]
con <- file(justname, open = "wb")
writeBin(bin, con)
close(con)
read.csv(justname)
}

bbb<-getDropboxCSV()

This is an awesome function that lets you open a .csv file directly from Dropbox!

PC Hints capturing some oddities of PC-R that differ from the mac-oriented handouts)

Defining the pathway to load a file:
m<-read.csv(file="c://marm/teaching/rclass/mydataset.csv") #note // after c:
or
m<-read.csv(file="/marm/teaching/rclass/mydataset.csv") #note leave out c:

To clear the workspace, there is no menu selection. Instead type:
rm(list=ls())
To submit code from the R editor window to the console, highlight the code, then press control-R

To scroll through the graphic window, on a mac just use the left and right arrows.
On a PC:
1. Create a graphic.
2. With the graphic window active, go to the "History" tab, next to "File."
3. Click on "Recording." Make sure it is checked.
4. Now each graph you make from this time forward in the current R session will be recorded. You can scroll through graphs using "PgUp" and "PgDn.

For better R editors for the PC, try NppToR and Notepad++
http://sourceforge.net/projects/npptor/
http://notepad-plus-plus.org/

 

 

Graphics resources
Symbol codes
To see all the first 25 available symbols, use this code
x<-rep(seq(1,5),5)
y<-sort(x,decreasing=TRUE)
pch<-seq(1,25)
plot(x,y,pch=pch,cex=2,xlim=c(1,5.4), axes=FALSE,xlab="R symbols",ylab="")
text(x+0.25,y,pch)

# note that for 21-25 you can control the fill (bg) and the border (col) color of the symbols separately. 
# e.g., points(x,y,pch=21,bg="yellow",col="blue")   makes yellow circles with a blue border

Line and arrow codes
x1=rep(1,6); x2<-rep(3,6); y<-seq(6,1); linecodes<-seq(1:6)
plot(0,0,xlim=c(0,10),ylim=c(0,6.2),pch=1,col=0,axes=FALSE,xlab="",ylab="")
for(i in 1:6){lines(c(x1[i],x2[i]),c(y[i],y[i]),lty=linecode[i])}
text(x1-.8,y,linecode,pos=4); text(1.5,0.1,"lines\nlty",pos=4)

for(i in 1:6){lines(c(x1[i]+3,x2[i]+3),c(y[i],y[i]),lty=linecode[i],lwd=linecode[i])}
text(x1-.8+3,y,linecode,pos=4); text(4.5,0.1,"lines\nlwd",pos=4)

for(i in 1:3){arrows(x1[i]+6, y[i],x2[i]+6, y[i],code=linecode[i])}
text(x1[1:3]-.8+6,y[1:3],linecode[1:3],pos=4); text(7,0.1,"arrow\ncode",pos=4)

colornumbers

Color by numbers

x<-rep(seq(1,3,1),3);y<-sort(x);z<-seq(0,8,1)
plot(y~x,pch=15,cex=5,col=z,xlim=c(0,4),ylim=c(0,4),axes=F,xlab="",ylab=""); box()
text(y~x,labels=c(0:8),col=c(1,8,rep(1,7)))

Color by names

colorlist<-read.csv("http://people.ucsc.edu/~ggilbert/Rclass_docs/colorlist.csv")
rect<-as.matrix(cbind(rep(1,580),rep(1,580)))
y<-rep(seq(1,58,1),10); x<-sort(rep(seq(1,10,1),58))
z<-as.character(colorlist$color);textcol<-colorlist$textcode
symbols(y~x,rectangles=rect,xlab="",ylab="",bg=z,xlim=c(0.5,10.5),ylim=c(0,59),inches=FALSE); box()
text(y~x,labels=z,col=textcol,cex=.5)

 

Click on image for large version

Here is a like to a good web site from HTML Color Codes to get, well, HTML color codes
http://html-color-codes.info/

You can then call the color you want as:

plot(y~x,pch=15, col="#8E2BC3")

or use rgb(red,green,blue,alpha) to control the RGB colors and intensity
mycol1<-rgb(142/255,43/255,195/255,.7) #the values here all go from 0 to 1 where max BCG is 255
plot(y~x,pch=19,col=mycol1)

 

 

SCATTERPLOTS


These are the data used for all the scatterplot examples

x<-c(1,2,3,4,5,6,7,8)
y1<-c(2,4,5,7,8,7,9,10)
y2<-c(1,3,2,4,6,5,7,7)

#Basic scatterplot, adding labels (xlab and ylab) and setting min and max of axes (xlim,ylim)

plot(x,y1,xlab="arrival order",ylab="hat size (cm)", ylim=c(0,10),xlim=c(0,8))

 

# Overlay a second series of y values, control symbol with pch and color with col,  and add a legend.

plot(x,y1,xlab="arrival order",ylab="hat size (cm)",ylim=c(0,10),xlim=c(0,8),pch=1,col="black")

points(x,y2,pch=19,col="blue")

legend(0,10,c("male","female"),pch=c(1,19),col=c("black","blue"))

#Connect the points with lines
#type"p"=points, "l"=lines, "b"= both,"c" lines alone of "b",
#"o" =overplotted,"h"=histogram-like,"s"=stair steps

plot(x,y1,xlab="arrival order",ylab="hat size (cm)",ylim=c(0,10),xlim=c(0,8),pch=1,col="black",type="b")

lines(x,y2,pch=19,col="blue",type="o")

#just lines, controlling line thickness with lwd and line type with lty
#add a legend in the upper left

plot(x,y1,xlab="arrival order",ylab="hat size (cm)",ylim=c(0,10),xlim=c(0,8),pch=1,col="black",type="l",lwd=2,lty=2)

lines(x,y2,pch=19,col="blue",type="l",lwd=1,lty=1)

legend("topleft",c("male","female"),lty=c(2,1),col=c("black","blue"),lwd=c(2,1))

#add smooth lowess curves to  each set of points in the scatterplot

plot(x,y1,xlab="arrival order",ylab="hat size (cm)",ylim=c(0,10),xlim=c(0,8),col="dark green",pch=1,lwd=2)
lines(lowess(x,y1),lwd=2,lty=3,col="dark green")
points(x,y2,pch=19,col="dark blue")
lines(lowess(x,y2),lwd=2,lty=2,col="dark blue")

legend("topleft",c("male","female"),lty=c(3,2),pch=c(1,19),col=c("dark green

#Use abline to add linear regression lines to each set of points in the scatterplot

plot(x,y1,xlab="arrival order",ylab="hat size (cm)",ylim=c(0,10),xlim=c(0,8),col="black",pch=1,lwd=1)
abline(lm(y1~x),lwd=1,lty=1,col="black")

points(x,y2,pch=19,col="blue")
abline(lm(y2~x),lwd=1,lty=2,col="blue")

legend("topleft",c("male","female"),lty=c(1,2),pch=c(1,19),col=c("black","blue"),lwd=c(1,1))

#get the relevant statistics for the regression line, then put on the graph as text
a<-summary(lm(y2~x))  #this puts summary stats of the linear regression of y2 on x into list a
R2<-signif(a$adj.r.squared,3)  #adjusted R squared
F<-signif(a$fstatistic[1],3) #F statistic
ndf<-signif(a$fstatistic[2],1)  #degrees of freedom numerator
ddf<- signif(a$fstatistic[3],1)  #degress of freedom denominator
P<-signif(a$coefficients[2,4],4)   #P value for significant slope

plot(x,y2,xlab="arrival order",ylab="hat size (cm)",ylim=c(0,10),xlim=c(0,8),col="blue",pch=19,lwd=1)

abline(lm(y2~x),lwd=1,lty=1,col="blue")  #puts in the regression line

text(0,9,paste("F=",F,", df=",ndf,",",ddf,"\n","R^2=",R2,", P=",P,sep=""),pos=4)  #adds the statistics


Histograms
Create data to make the histograms

#create a vector of 100 random numbers from a normal distribution, with mean=32 and sd=4.1
> data<-rnorm(100,mean=32,sd=4.1)

#Use hist to make histograms setting bins by range and size
hist(x=data, breaks=seq(from=24,to=44, by=2), freq=TRUE, col="blue", xlab="Ca (ppm)", ylab="Number of samples", main=NA)

#Use hist to make histograms setting bins by number of bins
hist(x=data, breaks=15, freq=TRUE, col="orange", xlab="Ca (ppm)", ylab="Number of samples", main=NA)

#Use hist to make histograms with proportions rather than counts
hist(x=data, breaks=15, freq=TRUE, col="orange", xlab="Ca (ppm)", ylab="Number of samples", main=NA)

# freq = TRUE gives counts

# freq=FALSE gives proportion of observations

#Use Lattice histogram to Make Histograms setting bin size
#Short version:
library(lattice)
histogram(x=data, col="blue", type="count", xlab="Ca (ppm)", ylab="Number of samples", breaks=seq(from=20,to=45,by=2), aspect=.75, scales=list(cex=1.2,tck=c(1,-1)))

#Annotated version
histogram(
x=data, #data column from which to make the histogram
col="blue", #fill color for columns see colors() for list
type="count", #sets the vertical axis - count, percent, or density
xlab="Ca (ppm)", #label for x axis
ylab="Number of samples", #label for y axis
breaks=seq(from=0,to=5,by=.5), #min, max, and bin interval for bins
aspect=.75, #ratio of vertical to horizontal plot shape desired
scales=list(cex=1.2,tck=c(1,-1))) #sets tick marks and numbers on axes
#cex is the multiple of the default size for numbers font
#tck=c(left/bottom,right/top) tick size (negative for inside)

#Use Lattice histogram to Make Histograms setting number of bins
#Short version:
library(lattice); histogram(x=data, col="pink", type="count", xlab="Ca (ppm)", ylab="Number of samples", lab="Number of samples", endpoints=c(24,44), nint=25, aspect=.75, scales=list(cex=1.2,tck=c(1,-1)))library(lattice)

#Annotated version
histogram(
x=data, #data column from which to make the histogram
col="pink", #fill color for columns see colors() for list
type="count", #sets the vertical axis - count, percent, or density
xlab="Ca (ppm)", #label for x axis
ylab="Number of samples", #label for y axis
lab="Number of samples", #label for y axis
endpoints=c(24,44), #endpoints sets the low and high values to include
nint=25, #nint sets the number of bins
aspect=.75, #ratio of vertical to horizontal plot shape desired
scales=list(cex=1.2,tck=c(1,-1))) #sets tick marks and numbers on axes
#cex is the multiple of the default size for numbers font
#tck=c(left/bottom,right/top) tick size (negative for inside)

Quantile-Quantile normal plot

#Make a Quantile-Quantile Plot to examine normality of distribution

qqnorm(data)

Bar charts

Basic barplot of mean values for each treatment.

barplot(myData$mean, xlab="Treatments", ylab="Plant biomass (g)", names=myData$treat, col="grey", ylim=c(0,10)) 
#gives basic barplot of means

box()   #draw a box around the plot to make it pretty

Add error bars
bp<-barplot(myData$mean)  #Gets the x-midpoints of each bar
# they serve as x values for error bars, here they are 0.7, 1.9, 3.1, 4.3

barplot(myData$mean, xlab="Treatments", ylab="Plant biomass (g)", names=myData$treat, col="grey", ylim=c(0,10)) 
#gives basic barplot of means
box()   #draw a box around the plot to make it pretty

# add ± 1 s.d. lines to each mean
arrows(bp, myData$mean-myData$sd, bp, myData$mean+myData$sd, lwd=1.5, angle=90, length=0.1, code=3)

#code =3 is double headed, angle=90 is flat heads, length=0.1 is width of head

#arrows are drawn from x1,y1 to x2, y2 for each treatment

Turn it horizontal
bp<-barplot(myData$mean)

barplot(myData$mean, xlab="Plant biomass (g)", ylab=" Treatments ", names=myData$treat, col="grey", xlim=c(0,10), horiz=TRUE)
box()
arrows(myData$mean-myData$sd, bp, myData$mean+myData$sd, bp,lwd=1.5, angle=90, length=0.1, code=3)

Add designation of significant differences among bars

bp<-barplot(myData$mean)
barplot(myData$mean, xlab="Treatments", ylab="Plant biomass (g)", names=myData$treat, col="grey", ylim=c(0,11))
box()
arrows(bp, myData$mean-myData$sd, bp, myData$mean+myData$sd, lwd=1.5, angle=90, length=0.1, code=3)
text(x=(bp-.2),y=myData$mean+.4,labels=c("A","B","B","A"))
text(x=4.3,y=9, labels="F=5.7\ndf=3,15\nP≤0.03")

#use text to add whatever labels to the plot

# the \n indicates a carriage return
  Bar Plots with two factors

data1   If your data look like this, you must first reshape them
Sex Species count
Male       A     5
Male       B     7
Male       C     9
Male       D    10
Female    A     4
Female    B     6
Female    C    10
Female    D    11

Reshape the data at left to a table format:
re<-reshape(data1,idvar="Sex",timevar="Species",direction="wide")
names(re)<-c("Sex","A","B","C","D")
rem<-as.matrix(re[,2:5])   #the counts data need to be in a matrix form

re                                               rem
Sex      A B  C   D                     A  B   C   D
Male    5  7   9  10                     5   7    9  10
Female 4  6 10  11                     4   6  10  11

Stack the male and female bars within a species
barplot(rem,xlab="Species",ylab="Number of individuals", col=c("light blue","green"),beside=FALSE)
box()  #puts a pretty box around it

legend(0.2,20,re$Sex,bty="o",fill=c("light blue","green"))

#adds a legend at x=0.2, y=20, bty="n" would have no frame

Put the males and females size by side
barplot(rem,xlab="Species",ylab="Number of individuals",col=c("light blue","green"),beside=TRUE)
box()

legend(1,10,re$Sex,bty="o",fill=c("light blue","green"))

Pie charts (Seldom the best way to show data, but when you need one …)

Data frame "d" for these pie charts includes
group count
    a       40
    b       20
    c       30
    d       10

pie(d$count,labels=d$group)

Pretty up the labels:

pie(d$count,labels=c("QUERPA","LITHDE","PSEUME","TOXIDI"))

 

Change the colors and rotate the pie

pie(d$count,labels=c("QUERPA","LITHDE","PSEUME","TOXIDI"),col=c("blue","red","magenta","pink"),init.angle=90)

Make it cubist to reduce usefulness and readability

pie(d$count,labels=c("QUERPA","LITHDE","PSEUME","TOXIDI"),col=c("blue","red","magenta","pink"),edges=TRUE,init.angle=90)

Trellis graphs using the Lattice Package

library(lattice) # load the lattice package
Create data frame d using code at right

#make data
x1<-seq(1,10,.2); y1<-(x1^2)*runif(46,0,.5); x2<-x1;
d<-as.data.frame(cbind(x1,x2,y1));
d$sex<-rep(c("male","female"),23); d$age<-c(rep("juv",23),rep("old",23))

# simple xy scatterplot, putting the y variable on a log 10 scale (log="e") for natural log
# ticks are outward on Left/Bottom and inward for Right/Top
#pch=19 uses filled circles

xyplot(y1~x1, data=d, pch=19, scales=list(tck=c(1,-1), y=list(log=10)))

# Make a plot with different symbols for males and females (variable sex)

xyplot(y1~x1, data=d, group=sex,
par.settings=simpleTheme(cex=1.2,pch=c(19,1),col=c("blue","red")), #uses simpleTheme to set symbol attributes
auto.key=list(TRUE,x=.02,y=.9), #use the par.settings to create a legend in the upper left. x and y are 0-1 scale
scales=list(tck=c(1,-1))) #set where and direction of ticks

 

#Make split scatterplot putting each group (sex) into a separate grid panel

xyplot(y1~x1 |sex, data=d,
pch=19, scales=list(tck=c(1,-1),alternating=c(1,1)))

#tck c(1,-1) sends left/bottom ticks out and right/top ticks in

#alternating sets labels bottom/left for both panels (0=none, 1=B/L, 2=T/R, 3=Both)

#Make 2x2 grid split scatterplot with multiple grouping variables (sex and age)


xyplot(y1~x1 |sex+age, data=d,
pch=19, scales=list(tck=c(1,-1),alternating=c(1,1)))

#tck c(1,-1) sends left/bottom ticks out and right/top ticks in

#alternating sets labels bottom/left for both panels (0=none, 1=B/L, 2=T/R, 3=Both)

#plot two dependent variables in the same grid panel, split on sex

xyplot(y1+x2~x1 |sex, data=d,
pch=c(19,1), scales=list(tck=c(1,-1),alternating=c(1,1)),
par.settings=simpleTheme(cex=0.9,pch=c(19,1),col=c("blue","red")),
auto.key=list(TRUE,x=0.01,y=0.9),ylab="y1 or x2")

#tck c(1,-1) sends left/bottom ticks out and right/top ticks in
#alternating sets labels bottom/left for both panels (0=none, 1=B/L, 2=T/R, 3=Both)
#par.settings uses simpleTheme function to set symbol attributes
#auto.key places a key in upper left

#split xyplots into two grid panels based on one factor (sex)
#and within a panel, use different symbols for a second factor (age)

xyplot(y1~x1 |sex, groups=age, data=d,
pch=c(19,1), scales=list(tck=c(1,-1),alternating=c(1,1)),
par.settings=simpleTheme(cex=0.9,pch=c(19,1),col=c("blue","red")),
auto.key=list(TRUE,x=0.01,y=0.9))

#make a density plot of variable y1, including outward tick markson Left/Bottom but none on Right/Top

densityplot(~y1,data=d,scales=list(tck=c(1,0)))

 

# dotplot is used for plotting points of continuous variables, grouped by factors

dotplot(y1~sex, data=d, pch=19,
col=c("red","blue"), scales=list(tck=c(1,-1)))

 

#box and whiskers plot using bwplot
#box is 25,50, 75 quartiles and whiskers are min and max

bwplot(y1~sex, data=d,
col=c("red","blue"), scales=list(tck=c(1,-1)))

#box and whiskers plot using bwplot, split into grid panels based on age
#box is 25,50, 75 quartiles and whiskers are min and max

bwplot(y1~sex | age, data=d,
+ col=c("red","blue"), scales=list(tck=c(1,-1)))

x<-rep(seq(1,10,1),10); y<-sort(x); z<-rnorm(100,mean=10,sd=6) #create 3-D data

levelplot(z~y*x)

 

x<-rep(seq(1,10,1),10); y<-sort(x); z<-rnorm(100,mean=10,sd=6) #create 3-D data

contourplot(z~y*x)

x<-rep(seq(1,10,1),10); y<-sort(x); z<-rnorm(100,mean=10,sd=6) #create 3-D data

wireframe(z~y*x)