Boxplot Characteristics
Purpose
I thought I knew everything about boxplots and was even planning to skip the first chapter on boxplots. How naive of me! I had recently heard a Stanford professor speaking about mindsets.
If there are 10 data points, say 1, 2, 3, …, 10, what is the median?
> x <- 1:10
> print((x[5] + x[6])/2)
[1] 5.5
> print(median(x))
[1] 5.5
What are the first quartile and the third quartile?
> boxplot(x)
> y <- (boxplot(x))
> print(y)
$stats
     [,1]
[1,]  1.0
[2,]  3.0
[3,]  5.5
[4,]  8.0
[5,] 10.0
attr(,"class")
        1
"integer"
$n
[1] 10
$conf
         [,1]
[1,] 3.001801
[2,] 7.998199
$out
numeric(0)
$group
numeric(0)
$names
[1] "1"

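Well, at 33 years of age, I have learnt that knowledge about anything is not fixed. It keeps growing.
I was thinking that the first quartile is at 3 and the third quartile is at 8, and $stats agrees. But the conf attribute shows 3.001801 and 7.998199, which is a little different. Why? I did not know the answer at first. It turns out that conf is not the quartiles at all: the help page for boxplot.stats describes it as the lower and upper ends of the notch, roughly median ± 1.58 × IQR / √n, so it landing close to 3 and 8 here is a coincidence of this particular data set.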
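To check that reading of conf, here is a minimal sketch that rebuilds the two numbers by hand. It assumes only base R; the variable names five, iqr_hinge and notch are mine, not part of any API.

```r
# Sketch: reproduce the $conf values by hand (base R only).
x <- 1:10

five <- fivenum(x)               # min, lower hinge, median, upper hinge, max
iqr_hinge <- five[4] - five[2]   # spread between the hinges: 8 - 3 = 5

# Notch formula as documented for boxplot.stats:
# median +/- 1.58 * IQR / sqrt(n)
notch <- five[3] + c(-1.58, 1.58) * iqr_hinge / sqrt(length(x))

print(five)
print(notch)
print(all.equal(notch, boxplot.stats(x)$conf))  # TRUE if the formula matches
```

The hinges (3 and 8) are what the box is drawn from; the notch is a separate interval around the median.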
> boxplot.default
function (x, ..., range = 1.5, width = NULL, varwidth = FALSE,
    notch = FALSE, outline = TRUE, names, plot = TRUE, border = par("fg"),
    col = NULL, log = "", pars = list(boxwex = 0.8, staplewex = 0.5,
        outwex = 0.5), horizontal = FALSE, add = FALSE, at = NULL)
{
    args <- list(x, ...)
    namedargs <- if (!is.null(attributes(args)$names))
        attributes(args)$names != ""
    else rep(FALSE, length.out = length(args))
    groups <- if (is.list(x))
        x
    else args[!namedargs]
    if (0 == (n <- length(groups)))
        stop("invalid first argument")
    if (length(class(groups)))
        groups <- unclass(groups)
    if (!missing(names))
        attr(groups, "names") <- names
    else {
        if (is.null(attr(groups, "names")))
            attr(groups, "names") <- 1:n
        names <- attr(groups, "names")
    }
    cls <- sapply(groups, function(x) class(x)[1])
    cl <- if (all(cls == cls[1]))
        cls[1]
    else NULL
    for (i in 1:n) groups[i] <- list(boxplot.stats(unclass(groups[[i]]),
        range))
    stats <- matrix(0, nrow = 5, ncol = n)
    conf <- matrix(0, nrow = 2, ncol = n)
    ng <- out <- group <- numeric(0)
    ct <- 1
    for (i in groups) {
        stats[, ct] <- i$stats
        conf[, ct] <- i$conf
        ng <- c(ng, i$n)
        if ((lo <- length(i$out))) {
            out <- c(out, i$out)
            group <- c(group, rep.int(ct, lo))
        }
        ct <- ct + 1
    }
    if (length(cl) && cl != "numeric")
        oldClass(stats) <- cl
    z <- list(stats = stats, n = ng, conf = conf, out = out,
        group = group, names = names)
    if (plot) {
        if (is.null(pars$boxfill) && is.null(args$boxfill))
            pars$boxfill <- col
        do.call("bxp", c(list(z, notch = notch, width = width,
            varwidth = varwidth, log = log, border = border,
            pars = pars, outline = outline, horizontal = horizontal,
            add = add, at = at), args[namedargs]))
        invisible(z)
    }
    else z
}
<environment: namespace:graphics>
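The listing shows that boxplot.default does no arithmetic of its own: each group is handed to boxplot.stats and the results are copied into a list. A small sketch, assuming a reasonably recent base R, to see the same numbers without drawing anything (z and s are just my variable names):

```r
x <- 1:10

# plot = FALSE returns the summary list and skips the call to bxp().
z <- boxplot(x, plot = FALSE)

# boxplot.stats() is the function that actually computes the numbers.
s <- boxplot.stats(x, coef = 1.5)

print(as.numeric(z$stats))  # five numbers as stored by boxplot()
print(s$stats)              # the same five numbers straight from boxplot.stats()
print(z$n == s$n)
```

The as.numeric() call only strips the matrix shape and the class attribute so the two results print alike.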
OK, the five-number summary is as follows: median, lower quartile, upper quartile, and the two extremes.
> median(x)
[1] 5.5
> y$conf
         [,1]
[1,] 3.001801
[2,] 7.998199
> y$conf + c(-1.5, 1.5) * diff(y$conf)
          [,1]
[1,] -4.492797
[2,] 15.492797
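A caveat on the last calculation: it stretches conf by 1.5 times its own width, but as far as I can tell from the boxplot.stats help page, the 1.5 rule (the coef argument) is applied to the hinges, not to the notch. A hedged sketch of the hinge-based version follows; the extra point 30 in x2 is a made-up value, added only to show an outlier.

```r
x <- 1:10
s <- boxplot.stats(x)             # default coef = 1.5

hinges <- s$stats[c(2, 4)]        # lower and upper hinge: 3 and 8
fences <- hinges + c(-1.5, 1.5) * diff(hinges)
print(fences)                     # -4.5 15.5

# A value beyond a fence is reported in $out, and the whisker stops
# at the most extreme point that is still inside the fences.
x2 <- c(x, 30)                    # hypothetical extra point for illustration
s2 <- boxplot.stats(x2)
print(s2$out)
print(s2$stats)
```

For 1:10 every point lies inside the fences, which is why $out was numeric(0) and the whiskers ended at 1 and 10.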
OK, to end with, here are the basic properties of a boxplot:
- Median and mean bars are measures of location
- Relative location of the median and the mean in the box is a measure of skewness
- The lengths of the box and the whiskers are measures of spread
- The length of the whiskers indicates the tail length of the distribution
- Outlying points are indicated with * or o
- Boxplots do not indicate multimodality or clusters
- If we compare the relative size and location of the boxes, we are comparing distributions
So, obviously, histograms are better for understanding multimodal distributions.
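To make the multimodality point concrete, here is a rough sketch on simulated data. The two-cluster sample, the seed and the bin edges are all my own choices for illustration, not anything from the chapter.

```r
set.seed(1)  # arbitrary seed, only to make the sketch repeatable

# Hypothetical bimodal sample: two well-separated normal clusters.
z <- c(rnorm(200, mean = 0, sd = 1), rnorm(200, mean = 8, sd = 1))

# The five-number summary says nothing about the gap in the middle.
print(boxplot.stats(z)$stats)

# Histogram counts do: the bins around 4 are almost empty.
h <- hist(z, breaks = seq(-6, 14, by = 1), plot = FALSE)
print(rbind(mid = h$mids, count = h$counts))
```

Calling boxplot(z) and hist(z) side by side should show the same thing visually: one long unremarkable box versus two clear humps.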