MDS - RK's Musings

Purpose
Working through the Exploratory Multivariate Analysis examples from Venables and Ripley.

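This is a basic principal component analysis (PCA), not a factor analysis, of the log-transformed iris measurements, followed by a scree plot and the loadings.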

> data(iris3)
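> ir <- rbind(iris3[, , 1], iris3[, , 2], iris3[, , 3])   # stack the three species into a 150 x 4 matrix
> ir.species <- factor(c(rep("s", 50), rep("c", 50), rep("v", 50)))   # s = setosa, c = versicolor, v = virginica
> ir.pca <- princomp(log(ir), cor = TRUE)   # PCA of the log measurements, using the correlation matrix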
> summary(ir.pca)
Importance of components:
                          Comp.1    Comp.2     Comp.3    Comp.4
Standard deviation     1.7124583 0.9523797 0.36470294 0.1656840
Proportion of Variance 0.7331284 0.2267568 0.03325206 0.0068628
Cumulative Proportion  0.7331284 0.9598851 0.99313720 1.0000000
> plot(ir.pca)
> loadings(ir.pca)
Loadings:
         Comp.1 Comp.2 Comp.3 Comp.4
Sepal L.  0.504 -0.455  0.709  0.191
Sepal W. -0.302 -0.889 -0.331
Petal L.  0.577        -0.219 -0.786
Petal W.  0.567        -0.583  0.580

               Comp.1 Comp.2 Comp.3 Comp.4
SS loadings      1.00   1.00   1.00   1.00
Proportion Var   0.25   0.25   0.25   0.25
Cumulative Var   0.25   0.50   0.75   1.00
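As a sanity check on what princomp() does with cor = TRUE, here is a minimal sketch, assuming it amounts to an eigen-decomposition of the correlation matrix of the log measurements. The square roots of the eigenvalues should reproduce the "Standard deviation" row of the summary, each eigenvalue divided by their sum (4, the number of variables) gives the "Proportion of Variance", and the eigenvectors should match the loadings up to the sign of each column.

```r
# Rebuild the 150 x 4 iris matrix as in the transcript
data(iris3)
ir <- rbind(iris3[, , 1], iris3[, , 2], iris3[, , 3])

# Correlation matrix of the log measurements
R <- cor(log(ir))

# Eigen-decomposition: eigenvalues = component variances, eigenvectors = loadings
e <- eigen(R, symmetric = TRUE)

sdev <- sqrt(e$values)             # compare with "Standard deviation"
prop <- e$values / sum(e$values)   # compare with "Proportion of Variance"
print(rbind(sdev = sdev, prop = prop, cumprop = cumsum(prop)))

# Compare with princomp(); each loading column may differ only in sign
ir.pca <- princomp(log(ir), cor = TRUE)
all.equal(unname(ir.pca$sdev), sdev)
all.equal(abs(unname(unclass(loadings(ir.pca)))), abs(e$vectors))
```

Since a correlation matrix has ones on its diagonal, the eigenvalues always sum to the number of variables, which is why the proportions are simply the eigenvalues divided by 4 here.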

MDS

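> library(MASS)   # eqscplot() comes from MASS
> ir.scal <- cmdscale(dist(ir), k = 2, eig = TRUE)   # classical MDS into 2 dimensions
> ir.scal$points[, 2] <- -ir.scal$points[, 2]   # flip axis 2; the sign of an MDS axis is arbitrary
> eqscplot(ir.scal$points, type = "n")   # empty plot with equal scales on both axes
> text(ir.scal$points, labels = as.character(ir.species), cex = 0.8)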
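> distp <- dist(ir)               # distances in the original 4-D space
> dist2 <- dist(ir.scal$points)   # distances in the 2-D MDS configuration
> sum((distp - dist2)^2/sum(distp^2))   # relative squared error of the 2-D fit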
[1] 0.001746943
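Classical (Torgerson) scaling, which is what cmdscale() computes, can be written out by hand: square the distances to get D², double-centre them to get the inner-product matrix B = -1/2 J D² J with J = I - 11ᵀ/n, and take the top k eigenvectors of B, each scaled by the square root of its eigenvalue, as the coordinates. The sketch below is a minimal version under those assumptions; classical_mds() and rel_err() are hypothetical helper names, not library functions. It checks the result against cmdscale() up to the sign of each axis and recomputes the same relative squared error as the transcript, sum((d - d̂)²) / sum(d²).

```r
data(iris3)
ir <- rbind(iris3[, , 1], iris3[, , 2], iris3[, , 3])

# Hypothetical helper: classical MDS from a "dist" object
classical_mds <- function(d, k = 2) {
  D2 <- as.matrix(d)^2                    # squared distances
  n <- nrow(D2)
  J <- diag(n) - matrix(1 / n, n, n)      # centring matrix I - 11'/n
  B <- -0.5 * J %*% D2 %*% J              # double-centred inner products
  e <- eigen(B, symmetric = TRUE)
  ev <- e$values[seq_len(k)]
  # coordinates: top-k eigenvectors scaled by sqrt(eigenvalue)
  list(points = e$vectors[, seq_len(k), drop = FALSE] %*% diag(sqrt(ev), k),
       eig = e$values)
}

mine <- classical_mds(dist(ir), k = 2)
ref  <- cmdscale(dist(ir), k = 2, eig = TRUE)

# Same configuration as cmdscale(), up to the sign of each axis
all.equal(abs(unname(mine$points)), abs(unname(ref$points)))

# Hypothetical helper: relative squared error of the low-dimensional fit
rel_err <- function(d, d_hat) sum((d - d_hat)^2) / sum(d^2)
rel_err(dist(ir), dist(mine$points))
```

The sign ambiguity is why the transcript can flip axis 2 without changing anything: an eigenvector multiplied by -1 is still an eigenvector, and distances are unaffected.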

Next, let me simulate a two-dimensional dataset with two groups, A and B, and see what MDS does with it.

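> x <- cbind(runif(50, 10, 50), runif(50, 1, 2))   # group A: 50 points
> y <- cbind(runif(50, 10, 20), runif(50, 5, 6))   # group B: 50 points
> z <- rbind(x, y)                                  # 100 x 2 data matrix
> lab <- c(rep("A", 50), rep("B", 50))
> plot(z, pch = lab)   # raw data, each point drawn as its group letter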

> z.scal <- cmdscale(dist(z), k = 2, eig = TRUE)   # classical MDS of the simulated data
> z.scal$points[, 2] <- -z.scal$points[, 2]   # optional flip, carried over from the iris example
> eqscplot(z.scal$points, type = "n")   # needs library(MASS)
> text(z.scal$points, labels = lab, cex = 0.8)
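Because the simulated points already live in two dimensions, classical MDS with k = 2 should lose nothing: the MDS configuration is the original one up to translation, rotation and reflection, so the relative squared error is zero up to rounding, unlike the 0.0017 for the four-dimensional iris data. A related fact is that classical MDS on Euclidean distances gives the same coordinates as PCA scores of the centred (unscaled) data, up to sign. The minimal sketch below checks both; set.seed() is an addition only for reproducibility, and rel_err is a hypothetical variable name.

```r
set.seed(1)   # added only so the check is reproducible
x <- cbind(runif(50, 10, 50), runif(50, 1, 2))
y <- cbind(runif(50, 10, 20), runif(50, 5, 6))
z <- rbind(x, y)

z.mds <- cmdscale(dist(z), k = 2, eig = TRUE)

# Distances are preserved exactly, up to floating-point rounding
rel_err <- sum((dist(z) - dist(z.mds$points))^2) / sum(dist(z)^2)
rel_err

# Only two eigenvalues are materially non-zero: the data are genuinely 2-D
round(head(z.mds$eig, 4), 6)

# Classical MDS on Euclidean distances = PCA scores of the centred data, up to sign
pc <- prcomp(z)$x[, 1:2]
all.equal(abs(unname(pc)), abs(unname(z.mds$points)))
```

So for this toy example the MDS plot is essentially the raw scatter plot re-expressed in principal axes, with the first axis along the direction of largest spread.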

I get the basic idea of multidimensional scaling, but the actual math behind it is still not clear to me! I need to read the Kruskal and Wish monograph to get a more detailed understanding of it.