
“Dummies” books, despite their popularity, are scorned by experienced programmers for various reasons. One of them, I guess, is that such books lead a newbie into seeing the subject as a motley collection of recipes for various tasks. Be that as it may, this is a very well-organized book catering to the newbie R programmer.

A few years ago, books such as this one on R were just not available. R, being written “by statisticians, for statisticians”, had a steep learning curve for a beginner. Thanks to the massive increase in the number of packages and online forums, R has hit the mainstream, and I think the surest sign of this is the appearance of a Dummies book on it.

This book covers the most common syntax that one uses in writing basic R code. Data types, functions, the apply family of functions, and various packages that are useful for data munging (reshape, plyr), date operations (xts, zoo) and visualization (lattice, ggplot2, grid graphics) are all covered. A curious newbie would go through this book and come away armed with some elementary skills in writing functions and creating plots. Soon enough, he or she will realize that the syntax of the language is just one part of the whole story. The real power of R is that it helps one understand statistics and modeling in a better way. For example, behind a simple-looking function called density in base R, the help manual gives the following description:

The algorithm used in density.default disperses the mass of the empirical distribution function over a regular grid of at least 512 points and then uses the fast Fourier transform to convolve this approximation with a discretized version of the kernel and then uses linear approximation to evaluate the density at the specified points.

The statistical properties of a kernel are determined by sig^2 (K) = int(t^2 K(t) dt) which is always = 1 for our kernels (and hence the bandwidth bw is the standard deviation of the kernel) and R(K) = int(K^2(t) dt). MSE-equivalent bandwidths (for different kernels) are proportional to sig(K) R(K) which is scale invariant and for our kernels equal to R(K). This value is returned when give.Rkern = TRUE.
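The two kernel quantities in the help text can be checked numerically. The sketch below assumes the default Gaussian kernel, so that K is the standard normal density `dnorm`; the variable names `sig2`, `RK` and `RK_builtin` are mine, not from the manual.

```r
# Second moment of the Gaussian kernel: sig^2(K) = int(t^2 K(t) dt)
sig2 <- integrate(function(t) t^2 * dnorm(t), -Inf, Inf)$value

# Roughness of the kernel: R(K) = int(K^2(t) dt)
RK <- integrate(function(t) dnorm(t)^2, -Inf, Inf)$value

# Value that density() itself reports for the Gaussian kernel
RK_builtin <- density(kernel = "gaussian", give.Rkern = TRUE)

print(c(sig2 = sig2, RK = RK, RK_builtin = RK_builtin))
```

For the Gaussian kernel the closed form of R(K) is 1/(2*sqrt(pi)), roughly 0.282, and the second moment should come out as 1 up to numerical integration error.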

Learning and using R effectively is obviously tied to how eager one is to understand statistics and modeling deeply. You can probably use the off-the-shelf density function from base R, but what’s the point if you don’t understand how the function is built and can’t write pseudocode for it!
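As a rough idea of what such pseudocode might look like, here is a minimal sketch of a Gaussian kernel density estimate written out by hand. It is not how density.default works internally: it sums the kernels directly instead of binning the data and using the fast Fourier transform, and `kde_naive` is a hypothetical helper name, not a base R function. The bandwidth comes from `bw.nrd0`, the default rule in density.

```r
# Naive Gaussian kernel density estimate by direct summation.
# kde_naive is a hypothetical helper for illustration, not part of base R.
kde_naive <- function(x, at, bw) {
  # At each evaluation point, average the Gaussian kernels centred on the data
  sapply(at, function(a) mean(dnorm(a, mean = x, sd = bw)))
}

set.seed(1)
x  <- rnorm(200)            # simulated data, for illustration only
bw <- bw.nrd0(x)            # default bandwidth rule used by density()
d  <- density(x, bw = bw)   # FFT-based estimate on a regular grid

# Evaluate the naive estimate on the same grid and compare
naive    <- kde_naive(x, d$x, bw)
max_diff <- max(abs(naive - d$y))
print(max_diff)
```

The two estimates should agree closely, with the small remaining gap coming from the binning and interpolation steps described in the help text. The direct sum costs one kernel evaluation per data point per grid point, which is the work the FFT-based approach avoids.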

Books such as this one help a newbie get over the initial learning curve, so that he or she can code a few things and get their hands dirty. However, the actual learning takes place after such books are read, when one slogs through and tries to understand what’s happening behind all these powerful functions. Since R is open source, all the code is available for everyone to see. That’s the beauty of R. In any case, this is not the typical Dummies book. It is one of the good books out there for an R newbie.