|
|
The goal of this class is to help grad students, postdocs, or faculty who have a background in basic
statistics and a familiarity with some other statistics package (JMP, SYSTAT, SAS, SPSS) to
become comfortable with the R Project as a platform for statistical analyses. It is not meant as a
course in statistics, nor will it cover more than a small portion of what is available in R. At the
end of the course you should be comfortable managing data in R, making graphs, performing an
array of different basic statistical analyses, and be able to use the extensive resources available
on line and in books to learn to do just about anything statistical you'd like to in R.
You will get the most out of the course if you spend a couple hours a week playing with R.
*RIght-click and choose Save link as... to save file to disk
To open directly into R data frame, first right-click on name, and choose "copy link location".
Then in R console:
myData<-read.csv("paste in name from clipboard here, between quotes") #to open in data frame myData
Crawley, MJ 2012. The R Book, 2nd edition. Wiley.This is a wonderful resource for all kinds of things R $125
The executable code in the book can be downloaded from http://www.bio.ic.ac.uk/research/mjcraw/therbook/index.htm |
Muenchen, RA. 2009. R for SAS and SPSS Users. Springer.
This book is great for those comfortable in SAS or SPSS, and want to know how to get the same kind of analysis out of R |
Zuur, AF, EN Ieno, EHWG Meesters.2009 A Beginner's Guide to R. Springer.I really like the structure of this for walking you through the basics of R |
Crawley, MJ. 2005. Statistics. An introduction using R. Wiley. This is a good (though quirky in choice of topics) introductory text on stats, that uses R as the platform. Helfpul hints in how to read R output and get the most out of it. |
Stevens, MHH. 2009. A Primer of Ecology with R. Springer. This has great examples of how to explore quantitative ecology in R. A super resource for upper division or grad teaching in ecology |
CRAN (the Comprehensive R Archive Network) http://cran.us.r-project.org/ has a ton of great resources, including The R Jounral, FAQs, and Manuals |
stackoverflow.com and other such sources are hugely useful. Someone else has tried to solve the same problem as you. I just google "R my current problem" and almost always get useful hits. For instance, "R import dates from Excel" |
Kyle Harms (LSU) has pulled together some great tutorials and links to many useful R resources here http://www.kharms.biology.lsu.edu/BIOL7901Spring2013.html |
setwd("/Users/gilbertlab/Documents/RPlotData") |
Set working directory |
data1<-read.table(file.choose(),sep=",",header=TRUE) |
Import comma-delimited csv file through the finder |
data1<-read.table(file="mydata.csv",sep="\t",header=TRUE) |
Import tab-delimited txt file, with specific file name |
write.table(a,"out_a.csv",sep=",",quote=FALSE,col.names=NA,row.names=TRUE) |
Export object a as comma-delimited file called "out_a.csv |
png(filename="myplot.jpg");
plot(x,y);
dev.off() |
Output a plot to a png file |
str(myData) |
See structure of object "myData" |
head(myData) |
See first 6 lines, and all columns of "myData" |
myData[1:5,3:5] |
See first 5 lines, and columns 3 to 5 of "myData" |
dim(myData) |
Gets number of rows and columns in "myData" |
ls() |
shows currently availble objects in workspace |
getwd() |
shows current working directory |
d[d$dates1==as.POSIXct(as.character("1996-08-25"),format="%Y-%m-%d"),] |
subset from a dataframe based on dates |
library(car)
a<-read.csv("http://people.ucsc.edu/~ggilbert/Rclass_docs/ThreeTreatmentDataset.csv")
amodel<-aov(a$plant_mass~a$treatment+a$init_size)
Anova(amodel,type="III") #Type III like SAS
anova(amodel) #sequential R default |
Get partial (typeIII) sums of squares and F-tests liks SAS default, using car package |
getDropboxCSV<-function(){
cat("From Dropbox in your browser, locate the desired .csv file, \nclick Share, and then copy the URL \nfrom the Link to File dialog box.\n")
cat("Paste the URL for your .csv file here: ")
myname<-readLines(n=1)
mynamefix<-sub("www.dropbox","dl.dropboxusercontent",myname)
mynamefix<-sub("dl=0","",mynamefix)
mynamefix<-sub("\\?","",mynamefix)
bin<-getBinaryURL(mynamefix,ssl.verifypeer=FALSE)
justname<-unlist(strsplit(mynamefix,split="/"))[6]
con <- file(justname, open = "wb")
writeBin(bin, con)
close(con)
read.csv(justname)
}
bbb<-getDropboxCSV() |
This is an awesome function that lets you open a .csv file directly from Dropbox! |
capturing some oddities of PC-R that differ from the mac-oriented handouts)
Defining the pathway to load a file:
m<-read.csv(file="c://marm/teaching/rclass/mydataset.csv") #note // after c:
or
m<-read.csv(file="/marm/teaching/rclass/mydataset.csv") #note leave out c:
|
To clear the workspace, there is no menu selection. Instead type:
rm(list=ls()) |
To submit code from the R editor window to the console, highlight the code, then press control-R |
To scroll through the graphic window, on a mac just use the left and right arrows.
On a PC:
1. Create a graphic.
2. With the graphic window active, go to the "History" tab, next to "File."
3. Click on "Recording." Make sure it is checked.
4. Now each graph you make from this time forward in the current R session will be recorded. You can scroll through graphs using "PgUp" and "PgDn. |
For better R editors for the PC, try NppToR and Notepad++
http://sourceforge.net/projects/npptor/
http://notepad-plus-plus.org/
|
|
 |
Symbol codes
To see all the first 25 available symbols, use this code
x<-rep(seq(1,5),5)
y<-sort(x,decreasing=TRUE)
pch<-seq(1,25)
plot(x,y,pch=pch,cex=2,xlim=c(1,5.4), axes=FALSE,xlab="R symbols",ylab="")
text(x+0.25,y,pch)
#
note that for 21-25 you can control the fill (bg) and the border (col) color of the symbols separately.
#
e.g., points(x,y,pch=21,bg="yellow",col="blue") makes yellow circles with a blue border |
 |
Line and arrow codes
x1=rep(1,6); x2<-rep(3,6); y<-seq(6,1); linecodes<-seq(1:6)
plot(0,0,xlim=c(0,10),ylim=c(0,6.2),pch=1,col=0,axes=FALSE,xlab="",ylab="")
for(i in 1:6){lines(c(x1[i],x2[i]),c(y[i],y[i]),lty=linecode[i])}
text(x1-.8,y,linecode,pos=4); text(1.5,0.1,"lines\nlty",pos=4)
for(i in 1:6){lines(c(x1[i]+3,x2[i]+3),c(y[i],y[i]),lty=linecode[i],lwd=linecode[i])}
text(x1-.8+3,y,linecode,pos=4); text(4.5,0.1,"lines\nlwd",pos=4)
for(i in 1:3){arrows(x1[i]+6, y[i],x2[i]+6, y[i],code=linecode[i])}
text(x1[1:3]-.8+6,y[1:3],linecode[1:3],pos=4); text(7,0.1,"arrow\ncode",pos=4) |
 |
Color by numbers
x<-rep(seq(1,3,1),3);y<-sort(x);z<-seq(0,8,1)
plot(y~x,pch=15,cex=5,col=z,xlim=c(0,4),ylim=c(0,4),axes=F,xlab="",ylab=""); box()
text(y~x,labels=c(0:8),col=c(1,8,rep(1,7))) |
 |
Color by names
colorlist<-read.csv("http://people.ucsc.edu/~ggilbert/Rclass_docs/colorlist.csv")
rect<-as.matrix(cbind(rep(1,580),rep(1,580)))
y<-rep(seq(1,58,1),10); x<-sort(rep(seq(1,10,1),58))
z<-as.character(colorlist$color);textcol<-colorlist$textcode
symbols(y~x,rectangles=rect,xlab="",ylab="",bg=z,xlim=c(0.5,10.5),ylim=c(0,59),inches=FALSE); box()
text(y~x,labels=z,col=textcol,cex=.5)
Click on image for large version |
 |
Here is a like to a good web site from HTML Color Codes to get, well, HTML color codes
http://html-color-codes.info/
You can then call the color you want as:
plot(y~x,pch=15, col="#8E2BC3")
or use rgb(red,green,blue,alpha) to control the RGB colors and intensity
mycol1<-rgb(142/255,43/255,195/255,.7) #the values here all go from 0 to 1 where max BCG is 255
plot(y~x,pch=19,col=mycol1)
|

|
These are the data used for all the scatterplot examples
x<-c(1,2,3,4,5,6,7,8)
y1<-c(2,4,5,7,8,7,9,10)
y2<-c(1,3,2,4,6,5,7,7)
|
 |
#Basic scatterplot, adding labels (xlab and ylab) and setting min and max of axes (xlim,ylim)
plot(x,y1,xlab="arrival order",ylab="hat size (cm)", ylim=c(0,10),xlim=c(0,8)) |
 |
# Overlay a second series of y values, control symbol with pch and color with col, and add a legend.
plot(x,y1,xlab="arrival order",ylab="hat size (cm)",ylim=c(0,10),xlim=c(0,8),pch=1,col="black")
points(x,y2,pch=19,col="blue")
legend(0,10,c("male","female"),pch=c(1,19),col=c("black","blue")) |
 |
#Connect the points with lines
#type"p"=points, "l"=lines, "b"= both,"c" lines alone of "b",
#"o" =overplotted,"h"=histogram-like,"s"=stair steps
plot(x,y1,xlab="arrival order",ylab="hat size (cm)",ylim=c(0,10),xlim=c(0,8),pch=1,col="black",type="b")
lines(x,y2,pch=19,col="blue",type="o") |
 |
#just lines, controlling line thickness with lwd and line type with lty
#add a legend in the upper left
plot(x,y1,xlab="arrival order",ylab="hat size (cm)",ylim=c(0,10),xlim=c(0,8),pch=1,col="black",type="l",lwd=2,lty=2)
lines(x,y2,pch=19,col="blue",type="l",lwd=1,lty=1)
legend("topleft",c("male","female"),lty=c(2,1),col=c("black","blue"),lwd=c(2,1)) |
 |
#add smooth lowess curves to each set of points in the scatterplot
plot(x,y1,xlab="arrival order",ylab="hat size (cm)",ylim=c(0,10),xlim=c(0,8),col="dark green",pch=1,lwd=2)
lines(lowess(x,y1),lwd=2,lty=3,col="dark green")
points(x,y2,pch=19,col="dark blue")
lines(lowess(x,y2),lwd=2,lty=2,col="dark blue")
legend("topleft",c("male","female"),lty=c(3,2),pch=c(1,19),col=c("dark green |
 |
#Use abline to add linear regression lines to each set of points in the scatterplot
plot(x,y1,xlab="arrival order",ylab="hat size (cm)",ylim=c(0,10),xlim=c(0,8),col="black",pch=1,lwd=1)
abline(lm(y1~x),lwd=1,lty=1,col="black")
points(x,y2,pch=19,col="blue")
abline(lm(y2~x),lwd=1,lty=2,col="blue")
legend("topleft",c("male","female"),lty=c(1,2),pch=c(1,19),col=c("black","blue"),lwd=c(1,1)) |
 |
#get the relevant statistics for the regression line, then put on the graph as text
a<-summary(lm(y2~x)) #this puts summary stats of the linear regression of y2 on x into list a
R2<-signif(a$adj.r.squared,3) #adjusted R squared
F<-signif(a$fstatistic[1],3) #F statistic
ndf<-signif(a$fstatistic[2],1) #degrees of freedom numerator
ddf<- signif(a$fstatistic[3],1) #degress of freedom denominator
P<-signif(a$coefficients[2,4],4) #P value for significant slope
plot(x,y2,xlab="arrival order",ylab="hat size (cm)",ylim=c(0,10),xlim=c(0,8),col="blue",pch=19,lwd=1)
abline(lm(y2~x),lwd=1,lty=1,col="blue") #puts in the regression line
text(0,9,paste("F=",F,", df=",ndf,",",ddf,"\n","R^2=",R2,", P=",P,sep=""),pos=4) #adds the statistics |
Create data to make the histograms |
#create a vector of 100 random numbers from a normal distribution, with mean=32 and sd=4.1
> data<-rnorm(100,mean=32,sd=4.1) |
 |
#Use hist to make histograms setting bins by range and size
hist(x=data, breaks=seq(from=24,to=44, by=2), freq=TRUE, col="blue", xlab="Ca (ppm)", ylab="Number of samples", main=NA) |
 |
#Use hist to make histograms setting bins by number of bins
hist(x=data, breaks=15, freq=TRUE, col="orange", xlab="Ca (ppm)", ylab="Number of samples", main=NA) |
 |
#Use hist to make histograms with proportions rather than counts
hist(x=data, breaks=15, freq=TRUE, col="orange", xlab="Ca (ppm)", ylab="Number of samples", main=NA)
# freq = TRUE gives counts
# freq=FALSE gives proportion of observations |
 |
#Use Lattice histogram to Make Histograms setting bin size
#Short version:
library(lattice)
histogram(x=data, col="blue", type="count", xlab="Ca (ppm)", ylab="Number of samples", breaks=seq(from=20,to=45,by=2), aspect=.75, scales=list(cex=1.2,tck=c(1,-1)))
#Annotated version
histogram(
x=data, #data column from which to make the histogram
col="blue", #fill color for columns see colors() for list
type="count", #sets the vertical axis - count, percent, or density
xlab="Ca (ppm)", #label for x axis
ylab="Number of samples", #label for y axis
breaks=seq(from=0,to=5,by=.5), #min, max, and bin interval for bins
aspect=.75, #ratio of vertical to horizontal plot shape desired
scales=list(cex=1.2,tck=c(1,-1))) #sets tick marks and numbers on axes
#cex is the multiple of the default size for numbers font
#tck=c(left/bottom,right/top) tick size (negative for inside) |
 |
#Use Lattice histogram to Make Histograms setting number of bins
#Short version:
library(lattice); histogram(x=data, col="pink", type="count", xlab="Ca (ppm)", ylab="Number of samples", lab="Number of samples", endpoints=c(24,44), nint=25, aspect=.75, scales=list(cex=1.2,tck=c(1,-1)))library(lattice)
#Annotated version
histogram(
x=data, #data column from which to make the histogram
col="pink", #fill color for columns see colors() for list
type="count", #sets the vertical axis - count, percent, or density
xlab="Ca (ppm)", #label for x axis
ylab="Number of samples", #label for y axis
lab="Number of samples", #label for y axis
endpoints=c(24,44), #endpoints sets the low and high values to include
nint=25, #nint sets the number of bins
aspect=.75, #ratio of vertical to horizontal plot shape desired
scales=list(cex=1.2,tck=c(1,-1))) #sets tick marks and numbers on axes
#cex is the multiple of the default size for numbers font
#tck=c(left/bottom,right/top) tick size (negative for inside) |
|
 |
#Make a Quantile-Quantile Plot to examine normality of distribution
qqnorm(data) |
 |
Basic barplot of mean values for each treatment.
barplot(myData$mean, xlab="Treatments", ylab="Plant biomass (g)", names=myData$treat, col="grey", ylim=c(0,10))
#gives basic barplot of means
box() #draw a box around the plot to make it pretty |
 |
Add error bars
bp<-barplot(myData$mean) #Gets the x-midpoints of each bar
# they serve as x values for error bars, here they are 0.7, 1.9, 3.1, 4.3
barplot(myData$mean, xlab="Treatments", ylab="Plant biomass (g)", names=myData$treat, col="grey", ylim=c(0,10))
#gives basic barplot of means
box() #draw a box around the plot to make it pretty
# add ± 1 s.d. lines to each mean
arrows(bp, myData$mean-myData$sd, bp, myData$mean+myData$sd, lwd=1.5, angle=90, length=0.1, code=3)
#code =3 is double headed, angle=90 is flat heads, length=0.1 is width of head
#arrows are drawn from x1,y1 to x2, y2 for each treatment |
 |
Turn it horizontal
bp<-barplot(myData$mean)
barplot(myData$mean, xlab="Plant biomass (g)", ylab=" Treatments ", names=myData$treat, col="grey", xlim=c(0,10), horiz=TRUE)
box()
arrows(myData$mean-myData$sd, bp, myData$mean+myData$sd, bp,lwd=1.5, angle=90, length=0.1, code=3) |
 |
Add designation of significant differences among bars
bp<-barplot(myData$mean)
barplot(myData$mean, xlab="Treatments", ylab="Plant biomass (g)", names=myData$treat, col="grey", ylim=c(0,11))
box()
arrows(bp, myData$mean-myData$sd, bp, myData$mean+myData$sd, lwd=1.5, angle=90, length=0.1, code=3)
text(x=(bp-.2),y=myData$mean+.4,labels=c("A","B","B","A"))
text(x=4.3,y=9, labels="F=5.7\ndf=3,15\nP≤0.03")
#use text to add whatever labels to the plot
# the \n indicates a carriage return |
|
data1 If your data look like this, you must first reshape them
Sex Species count
Male A 5
Male B 7
Male C 9
Male D 10
Female A 4
Female B 6
Female C 10
Female D 11 |
Reshape the data at left to a table format:
re<-reshape(data1,idvar="Sex",timevar="Species",direction="wide")
names(re)<-c("Sex","A","B","C","D")
rem<-as.matrix(re[,2:5]) #the counts data need to be in a matrix form
re rem
Sex A B C D A B C D
Male 5 7 9 10 5 7 9 10
Female 4 6 10 11 4 6 10 11 |
 |
Stack the male and female bars within a species
barplot(rem,xlab="Species",ylab="Number of individuals", col=c("light blue","green"),beside=FALSE)
box() #puts a pretty box around it
legend(0.2,20,re$Sex,bty="o",fill=c("light blue","green"))
#adds a legend at x=0.2, y=20, bty="n" would have no frame |
 |
Put the males and females size by side
barplot(rem,xlab="Species",ylab="Number of individuals",col=c("light blue","green"),beside=TRUE)
box()
legend(1,10,re$Sex,bty="o",fill=c("light blue","green")) |
(Seldom the best way to show data, but when you need one …)
 |
Data frame "d" for these pie charts includes
group
count
a 40
b 20
c 30
d 10
pie(d$count,labels=d$group) |
 |
Pretty up the labels:
pie(d$count,labels=c("QUERPA","LITHDE","PSEUME","TOXIDI")) |
 |
Change the colors and rotate the pie
pie(d$count,labels=c("QUERPA","LITHDE","PSEUME","TOXIDI"),col=c("blue","red","magenta","pink"),init.angle=90) |
 |
Make it cubist to reduce usefulness and readability
pie(d$count,labels=c("QUERPA","LITHDE","PSEUME","TOXIDI"),col=c("blue","red","magenta","pink"),edges=TRUE,init.angle=90) |
library(lattice) # load the lattice package
Create data frame d using code at right
|
#make data
x1<-seq(1,10,.2); y1<-(x1^2)*runif(46,0,.5); x2<-x1;
d<-as.data.frame(cbind(x1,x2,y1));
d$sex<-rep(c("male","female"),23); d$age<-c(rep("juv",23),rep("old",23)) |
 |
# simple xy scatterplot, putting the y variable on a log 10 scale (log="e") for natural log
# ticks are outward on Left/Bottom and inward for Right/Top
#pch=19 uses filled circles
xyplot(y1~x1, data=d, pch=19, scales=list(tck=c(1,-1), y=list(log=10))) |
 |
# Make a plot with different symbols for males and females (variable sex)
xyplot(y1~x1, data=d, group=sex,
par.settings=simpleTheme(cex=1.2,pch=c(19,1),col=c("blue","red")), #uses simpleTheme to set symbol attributes
auto.key=list(TRUE,x=.02,y=.9), #use the par.settings to create a legend in the upper left. x and y are 0-1 scale
scales=list(tck=c(1,-1))) #set where and direction of ticks
|
 |
#Make split scatterplot putting each group (sex) into a separate grid panel
xyplot(y1~x1 |sex, data=d,
pch=19, scales=list(tck=c(1,-1),alternating=c(1,1)))
#tck c(1,-1) sends left/bottom ticks out and right/top ticks in
#alternating sets labels bottom/left for both panels (0=none, 1=B/L, 2=T/R, 3=Both) |
 |
#Make 2x2 grid split scatterplot with multiple grouping variables (sex and age)
xyplot(y1~x1 |sex+age, data=d,
pch=19, scales=list(tck=c(1,-1),alternating=c(1,1)))
#tck c(1,-1) sends left/bottom ticks out and right/top ticks in
#alternating sets labels bottom/left for both panels (0=none, 1=B/L, 2=T/R, 3=Both) |
 |
#plot two dependent variables in the same grid panel, split on sex
xyplot(y1+x2~x1 |sex, data=d,
pch=c(19,1), scales=list(tck=c(1,-1),alternating=c(1,1)),
par.settings=simpleTheme(cex=0.9,pch=c(19,1),col=c("blue","red")),
auto.key=list(TRUE,x=0.01,y=0.9),ylab="y1 or x2")
#tck c(1,-1) sends left/bottom ticks out and right/top ticks in
#alternating sets labels bottom/left for both panels (0=none, 1=B/L, 2=T/R, 3=Both)
#par.settings uses simpleTheme function to set symbol attributes
#auto.key places a key in upper left |
 |
#split xyplots into two grid panels based on one factor (sex)
#and within a panel, use different symbols for a second factor (age)
xyplot(y1~x1 |sex, groups=age, data=d,
pch=c(19,1), scales=list(tck=c(1,-1),alternating=c(1,1)),
par.settings=simpleTheme(cex=0.9,pch=c(19,1),col=c("blue","red")),
auto.key=list(TRUE,x=0.01,y=0.9)) |
 |
#make a density plot of variable y1, including outward tick markson Left/Bottom but none on Right/Top
densityplot(~y1,data=d,scales=list(tck=c(1,0)))
|
 |
# dotplot is used for plotting points of continuous variables, grouped by factors
dotplot(y1~sex, data=d, pch=19,
col=c("red","blue"),
scales=list(tck=c(1,-1)))
|
 |
#box and whiskers plot using bwplot
#box is 25,50, 75 quartiles and whiskers are min and max
bwplot(y1~sex, data=d,
col=c("red","blue"), scales=list(tck=c(1,-1))) |
 |
#box and whiskers plot using bwplot, split into grid panels based on age
#box is 25,50, 75 quartiles and whiskers are min and max
bwplot(y1~sex | age, data=d,
+ col=c("red","blue"), scales=list(tck=c(1,-1))) |
 |
x<-rep(seq(1,10,1),10); y<-sort(x);
z<-rnorm(100,mean=10,sd=6) #create 3-D data
levelplot(z~y*x)
|
 |
x<-rep(seq(1,10,1),10); y<-sort(x);
z<-rnorm(100,mean=10,sd=6) #create 3-D data
contourplot(z~y*x) |
 |
x<-rep(seq(1,10,1),10); y<-sort(x);
z<-rnorm(100,mean=10,sd=6) #create 3-D data
wireframe(z~y*x) |
|
|