5.5 Multivariate analysis – PCA/RDA


There are many methods to visualize and analyze multivariate data. In the tutorial you can find scripts and a short description of three of the most commonly used ones:

  • Cluster analysis
  • Multi dimensional scaling (MDS)
  • PCA/RDA

In this post, we will look at Principal Component Analysis (PCA) and Redundancy Analysis (RDA).

For each method, there are some variations. You can (and should) test several analyses, transformations and settings and compare them. Right now, this is a preliminary version. It uses some of your own datasets (one for benthos, one for algae; you can find them on the student server under “data analysis”). It should cover most of what you need for your projects.

I have also included some plot settings for customized plots of the analysis. Hopefully, this will be extended with a proper tutorial soon.

The analyses are run using the following libraries:

[code language="r"]
# libraries ####
library("vegan") #package written for vegetation analysis
library("MASS") # ‘MASS’ to access function isoMDS()
library(stats) # e.g for hclust() function
[/code]

Before starting, remember to set your working directory and import your files. For these analyses you need observations in rows, variables in columns and unique rownames. The example data used in this script can be found on the student server, together with the script for importing it.
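As a quick illustration of the required layout, here is a minimal sketch that imports a small table and checks it. The file content, the object name `comm` and the taxon and site names are made up for this example and are not part of the course data; the file is written to a temporary location so that the sketch runs on its own.

[code language="r"]
# hypothetical toy file, written to a temporary location only for this sketch
tmp <- tempfile(fileext = ".csv")
writeLines(c("sample,taxonA,taxonB,taxonC",
             "site1,5,1,0",
             "site2,0,0,0",
             "site3,2,3,0"), tmp)

# row.names = 1 turns the first column into rownames;
# read.csv() stops with an error if these names are not unique
comm <- read.csv(tmp, row.names = 1)

dim(comm)                          # number of observations (rows) and variables (columns)
all(sapply(comm, is.numeric))      # TRUE if every column is numeric
rownames(comm)[rowSums(comm) == 0] # samples that contain only 0
colnames(comm)[colSums(comm) == 0] # taxa that contain only 0
[/code]

With your own data you would replace the temporary file with the path to your datasheet. The two last lines only list the empty samples and taxa, they do not remove them.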


[code language="r"]
#### PCA/RDA ##############################################################################################
library(vegan)

## to run a PCA, you need some additional data preparation, the same as for MDS ####
# to start with, split your datasheet containing both environmental data and community data into 2 separate datasheets: one for community data (df) and one for environmental data (df.envir)

#then you need to remove rows and columns with only 0
keep <- which(rowSums(df)>0) #index of the rows with at least one non-zero value
df <- df[keep,] #the example data has no columns with only 0
#remove the corresponding rows from the environmental data as well
df.envir <- df.envir[keep,]

#next you need to check that all variables in your community sheet are numeric => you cannot run the analysis on non-numeric values
str(df)

## PCA ####

# We use the function rda() in vegan; if it is run on a single table (here the community data sheet), it calculates a PCA
# have a look at how the results differ between transformations
# PCA on chi-square transformed data is similar, but not identical, to correspondence analysis (CA), which uses the chi-square distance
pca <- rda(df) #PCA on untransformed community data
df.chi <- decostand(df, "chi.square")
pca.chi <- rda(df.chi) #PCA on community with chi square transformed data

#plots
par(mfrow=c(1,2), mar=c(3, 3, 3, 3)) #set up the plotting area for 2 plots beside each other
plot(pca) #PCA biplot (in vegan, biplots are drawn with the plot() function)
plot(pca.chi)

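##RDA####
par(mfrow=c(1,1)) #back to one plot per page
Mod1 <- rda(df,df.envir) #now we use both community data and environmental data as input => RDA
plot(Mod1, type="n", choices = c(1, 2)) #empty plot for axes 1 and 2
text(Mod1, display="species", col="blue", cex=0.7) #species labels
points(Mod1, display="sites", pch=21, col="red", bg="yellow", cex=1) #sites
text(Mod1, display="cn", cex=0.8) #environmental variables
head(summary(Mod1), tail=2) #short summary of the model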

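#with Hellinger transformed data
#B_taxa (community data) and B_envir.st (environmental data) are not created in this script, so load them first
Mod2 <- rda(decostand(B_taxa, method = "hellinger"),B_envir.st)
plot(Mod2, type="n", choices = c(1, 2)) #empty plot for axes 1 and 2
text(Mod2, display="species", col="blue", cex=0.7) #species labels
points(Mod2, display="sites", pch=21, col="red", bg="yellow", cex=1) #sites
text(Mod2, display="cn", cex=0.8) #environmental variables
head(summary(Mod2), tail=2) #short summary of the model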
[/code]
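If you do not have the course data at hand, the same workflow can be tried on data that ships with vegan. The following is only a sketch under some assumptions: it uses the built-in datasets `varespec` (community data) and `varechem` (environmental data) instead of the benthos and algae data, the object names (`spec`, `env`, `pca.ex`, `rda.ex` and so on) are chosen for this example, and standardizing the environmental variables with `scale()` is one common choice, not necessarily the one used for the course data.

[code language="r"]
library(vegan)
data(varespec) # community data shipped with vegan
data(varechem) # environmental data for the same sites

# same preparation as above: drop empty rows and columns, keep matching rows
spec <- varespec[rowSums(varespec) > 0, colSums(varespec) > 0]
env <- varechem[rownames(spec), ] # matched by rowname
spec.hel <- decostand(spec, method = "hellinger")

# single table => PCA
pca.ex <- rda(spec.hel)
eig <- as.numeric(eigenvals(pca.ex))
round(eig[1:2] / sum(eig), 2) # share of the total variance on PC1 and PC2

# two tables => RDA
env.st <- as.data.frame(scale(env)) # mean 0 and sd 1 for every variable
rda.ex <- rda(spec.hel, env.st)
plot(rda.ex, type = "n", choices = c(1, 2))
points(rda.ex, display = "sites", pch = 21, col = "red", bg = "yellow")
text(rda.ex, display = "bp", cex = 0.8) # biplot arrows for the environmental variables
[/code]

The share of variance on the first two axes tells you how much of the variation in the community data the 2D plot actually shows. The same numbers are also part of the output of `summary()`.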
