Thursday, July 18, 2014

We want to model sepal ratios. Do we need a separate model for each species?

Iris dataset

  • Included in R package datasets
  • Collected by Edgar Anderson (1935), made famous by Ronald Fisher (1936)
  • Measurements for petal and sepal width and length
  • Fifty flowers from each of three species (150 in total)
    • Setosa
    • Versicolor
    • Virginica

str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
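
As a first look at the question, the sepal ratio can be summarised per species. This is only a sketch: it assumes that "sepal ratio" means Sepal.Length divided by Sepal.Width, which the slides do not state, and the names sepal_ratio and ratio_by_species are made up for illustration.

```r
# Assumption: "sepal ratio" is taken to be length divided by width.
data(iris)

sepal_ratio <- iris$Sepal.Length / iris$Sepal.Width

# Mean ratio within each species; clearly different means would hint
# that a single general model may not describe all three species well.
ratio_by_species <- tapply(sepal_ratio, iris$Species, mean)
print(round(ratio_by_species, 2))
```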

LOESS Model

  • Locally estimated scatterplot smoothing (local regression; closely related to LOWESS, locally weighted scatterplot smoothing)
  • General model and species-specific models
    • Standard Error bands
    • Grey band for general model
    • Coloured bands for respective species
  • Adjustable 'wobbliness' of models (the LOESS smoothing span)
  • Adjustable point size and opacity
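
The model setup listed above could be sketched in base R roughly as follows. This is not necessarily how the original plot was produced. It assumes the relationship of interest is Sepal.Width as a function of Sepal.Length, uses stats::loess with t-based 95% bands, and the helpers fit_loess_band and draw_band, as well as the colour choices, are hypothetical.

```r
data(iris)

# Hypothetical helper: fit a LOESS curve and return the fitted values
# with approximate 95% standard-error bands on a grid of sepal lengths.
# `span` controls the wobbliness: smaller values give wigglier curves.
fit_loess_band <- function(data, span = 0.75) {
  model <- loess(Sepal.Width ~ Sepal.Length, data = data, span = span)
  grid <- data.frame(
    Sepal.Length = seq(min(data$Sepal.Length), max(data$Sepal.Length),
                       length.out = 50)
  )
  pred <- predict(model, newdata = grid, se = TRUE)
  crit <- qt(0.975, pred$df)  # t quantile for a 95% band
  data.frame(
    Sepal.Length = grid$Sepal.Length,
    fit   = pred$fit,
    lower = pred$fit - crit * pred$se.fit,
    upper = pred$fit + crit * pred$se.fit
  )
}

# One general model, plus one model per species.
general    <- fit_loess_band(iris)
by_species <- lapply(split(iris, iris$Species), fit_loess_band)

# Assumed colours; point size (cex) and opacity (alpha.f) are adjustable.
cols <- c(setosa = "red", versicolor = "darkgreen", virginica = "blue")
point_cols <- adjustcolor(unname(cols[as.character(iris$Species)]),
                          alpha.f = 0.5)
plot(iris$Sepal.Length, iris$Sepal.Width, col = point_cols, pch = 16,
     cex = 1, xlab = "Sepal length", ylab = "Sepal width")

# Hypothetical helper: draw a shaded band and the fitted line on top.
draw_band <- function(band, col) {
  polygon(c(band$Sepal.Length, rev(band$Sepal.Length)),
          c(band$lower, rev(band$upper)),
          col = adjustcolor(col, alpha.f = 0.2), border = NA)
  lines(band$Sepal.Length, band$fit, col = col, lwd = 2)
}

draw_band(general, "grey40")  # grey band for the general model
for (sp in names(by_species)) {
  draw_band(by_species[[sp]], cols[[sp]])  # coloured band per species
}
```

If the coloured species bands fall largely outside the grey general band, that suggests the general model is a poor summary for those species.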

Results

An interactive version of this plot is available here:

https://bquast.shinyapps.io/Iris-Presentation