# Worksheet Correlation and Regression

Solve the Statistical question pro

Question 2
Janssen et al. (2007) studied the relationships between a variety of abiotic factors and benthic invertebrate abundance at sites on beaches along the Dutch coast. One of these abiotic factors was the height of the site relative to the average sea level of the area (NAP). Positive values of NAP indicate sites that are higher than the average sea level, whereas negative values indicate sites that are below the average sea level.
The data are in the file sle251dutch.csv and the relevant variables are the response variable, richness (richness of invertebrate species), and the predictor variable, NAP (relative height of the site in relationship to the average sea level of the area).
Format of sle251dutch.csv data file

Site    NAP    richness
1    0.045    11
2    -1.036    10
3    -1.336    13
4    0.616    11
5    -0.684    10
..    ..    ..

Variable    Description
Site        The number of the site where the samples were collected
NAP         Relative height of the site in relationship to the average sea level of the area (predictor variable)
richness    Richness of invertebrate species (response variable)
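
For anyone who prefers typing commands to using the menus, the data can be loaded along the lines of the sketch below. It assumes that sle251dutch.csv is in the current working directory and has a header row with the columns Site, NAP and richness; the object name dutch is just a placeholder. If the file is not found, the sketch falls back to the five rows previewed in the format table so that it still runs, and those five rows are not the full data set.

```r
# Assumption: the file is in the working directory and has a header row.
csv_path <- "sle251dutch.csv"

if (file.exists(csv_path)) {
  dutch <- read.csv(csv_path)
} else {
  # Stand-in only: the five rows shown in the format table above.
  dutch <- data.frame(
    Site     = 1:5,
    NAP      = c(0.045, -1.036, -1.336, 0.616, -0.684),
    richness = c(11, 10, 13, 11, 10)
  )
}

str(dutch)    # variable names and types
head(dutch)   # first few rows
```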

a)    Janssen et al. (1996) were interested in modeling the linear relationship between invertebrate richness (response) and the relative height of the site in relationship to the average sea level (predictor). List the following:

The biological inference of interest

The biological null hypothesis derived from above

The statistical null hypothesis (H0) derived from above

b)    Draw a scatterplot of richness (y-axis) against NAP (x-axis). Draw boxplots for each variable as well. Is there any evidence of skewness in the distributions or of nonlinearity in the relationship?

To create a scatterplot in R
Graphs
Scatterplot
Select x-variable (NAP) and y-variable (richness)
Check Marginal boxplots and Least-squares line
Unselect Smooth line and Show spread
OK
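
The same plots can be sketched with base R commands. This is not the code the menu generates, only a rough equivalent: it draws the scatterplot with a least-squares line and then a separate boxplot for each variable, rather than boxplots in the margins. The data frame here holds only the five rows from the format table as a stand-in; replace it with the full file for the actual exercise.

```r
# Stand-in data: the five rows shown in the format table.
# For the exercise use: dutch <- read.csv("sle251dutch.csv")
dutch <- data.frame(
  NAP      = c(0.045, -1.036, -1.336, 0.616, -0.684),
  richness = c(11, 10, 13, 11, 10)
)

# Scatterplot: response (richness) on the y-axis, predictor (NAP) on the x-axis
plot(richness ~ NAP, data = dutch,
     xlab = "NAP", ylab = "Species richness")

# Add the least-squares regression line
abline(lm(richness ~ NAP, data = dutch))

# One boxplot per variable to check for skewness and outliers
bp_nap  <- boxplot(dutch$NAP, horizontal = TRUE, main = "NAP")
bp_rich <- boxplot(dutch$richness, main = "richness")
```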

c)    Fit the regression model richness = intercept + slope × NAP.

To fit a linear regression and create an ANOVA table in R
Statistics
Fit models
Linear regression…
You can enter a name for the results object (Enter name for model:) but it's simplest to just use the name that R provides.
Select richness from the Response variable list.
Select NAP from the Explanatory variables list.
OK
Models
Hypothesis tests
ANOVA table
Select Partial, ignoring marginality (“Type III”).
OK
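
As a command-line sketch of the same steps, the model can be fitted with lm() and summarised as below. The object names dutch and fit are placeholders, and the data frame is again only the five-row stand-in from the format table. The base anova() function gives a sequential table; with a single predictor, its NAP and residual rows should match the "Type III" table from the menu, although the menu version may show an additional intercept row.

```r
# Stand-in data: the five rows shown in the format table.
# For the exercise use: dutch <- read.csv("sle251dutch.csv")
dutch <- data.frame(
  NAP      = c(0.045, -1.036, -1.336, 0.616, -0.684),
  richness = c(11, 10, 13, 11, 10)
)

# Fit richness = intercept + slope * NAP by least squares
fit <- lm(richness ~ NAP, data = dutch)

# Coefficients, standard errors, t statistics, P-values, multiple R-squared
summary(fit)

# ANOVA table: SS, df, MS and F ratio for regression (NAP) and residual
anova(fit)
```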

Examine the regression output and identify and interpret the following:

Sample y-intercept
Value (estimate in the R output):
Interpretation:

Slope of regression line (NAP)
Value (estimate in the R output):
Interpretation:

t statistic for main H0 (regression slope equals zero)
Value:
Interpretation:

P-value for main H0 (regression slope equals zero)
Value:
Interpretation:

r² value (multiple R-squared)
Value:
Interpretation:
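
Each of the values asked for above can also be pulled out of the fitted model directly, which helps when matching the worksheet items to the printed output. The sketch below assumes a model fitted with lm() as richness ~ NAP; the names fit, s and coefs are placeholders, and the five-row data frame is only a stand-in, so the numbers it produces are not the answers to the worksheet.

```r
# Stand-in data: the five rows shown in the format table.
# For the exercise use: dutch <- read.csv("sle251dutch.csv")
dutch <- data.frame(
  NAP      = c(0.045, -1.036, -1.336, 0.616, -0.684),
  richness = c(11, 10, 13, 11, 10)
)

fit   <- lm(richness ~ NAP, data = dutch)
s     <- summary(fit)
coefs <- coef(s)  # matrix: Estimate, Std. Error, t value, Pr(>|t|)

intercept <- coefs["(Intercept)", "Estimate"]  # sample y-intercept
slope     <- coefs["NAP", "Estimate"]          # slope of regression line
t_slope   <- coefs["NAP", "t value"]           # t statistic for H0: slope = 0
p_slope   <- coefs["NAP", "Pr(>|t|)"]          # P-value for H0: slope = 0
r_squared <- s$r.squared                       # multiple R-squared

print(c(intercept = intercept, slope = slope, t = t_slope,
        p = p_slope, r2 = r_squared))
```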

d)    Complete the following ANOVA table from the regression analysis

Source of variation    SS    df    MS    F ratio
Regression
Residual
Total                        44

Note: to get the MS values from the output, divide each SS value by its df.
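
The arithmetic behind the table can be checked with a few lines of R. The sketch below takes SS and df from anova(), divides to get each MS, forms the F ratio as regression MS over residual MS, and sums the rows for the Total line. It runs on the five-row stand-in from the format table, so its total df will not be the 44 shown above; with the full data set the same code should reproduce the worksheet table.

```r
# Stand-in data: the five rows shown in the format table.
# For the exercise use: dutch <- read.csv("sle251dutch.csv")
dutch <- data.frame(
  NAP      = c(0.045, -1.036, -1.336, 0.616, -0.684),
  richness = c(11, 10, 13, 11, 10)
)

fit <- lm(richness ~ NAP, data = dutch)
tab <- anova(fit)          # rows: NAP (regression), Residuals

ss <- tab[["Sum Sq"]]      # sums of squares: regression, residual
df <- tab[["Df"]]          # degrees of freedom: regression, residual

ms      <- ss / df         # MS = SS / df for each row
f_ratio <- ms[1] / ms[2]   # F = MS regression / MS residual

total_ss <- sum(ss)        # Total SS = regression SS + residual SS
total_df <- sum(df)        # Total df = n - 1

print(data.frame(source = c("Regression", "Residual"), SS = ss, df = df, MS = ms))
print(c(F = f_ratio, total_SS = total_ss, total_df = total_df))
```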

e)    What conclusions would you draw from the regression analysis (statistical and biological)?

f)    What invertebrate richness would you predict for a new site with an NAP of -2? Simply plug -2 into your regression equation and calculate predicted richness.
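
The plug-in calculation can be cross-checked against R's predict() function, as in the sketch below. The names fit, by_hand and by_predict are placeholders, and the five-row data frame is only the stand-in from the format table, so the printed value is not the worksheet answer. The range() call is included because it is worth checking whether an NAP of -2 lies inside the observed NAP values before trusting the prediction.

```r
# Stand-in data: the five rows shown in the format table.
# For the exercise use: dutch <- read.csv("sle251dutch.csv")
dutch <- data.frame(
  NAP      = c(0.045, -1.036, -1.336, 0.616, -0.684),
  richness = c(11, 10, 13, 11, 10)
)

fit <- lm(richness ~ NAP, data = dutch)
b   <- coef(fit)

# By hand: predicted richness = intercept + slope * NAP
by_hand <- unname(b["(Intercept)"] + b["NAP"] * (-2))

# Same prediction using predict()
by_predict <- unname(predict(fit, newdata = data.frame(NAP = -2)))

print(c(by_hand = by_hand, by_predict = by_predict))

# Observed range of the predictor, to see whether -2 is an extrapolation
print(range(dutch$NAP))
```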