Purpose
The main goal is to try out the plyr package and see how useful it is in my daily work.



Using the diamonds dataset

1.Make a new data set that has the average depth and price of the diamonds in the data set

> library(ggplot2)  # provides the diamonds data set
> library(plyr)     # provides summarise and ddply
> data(diamonds)
> new <- summarise(diamonds, mean(price), mean(depth))

2.Add a new column to the data set that records each diamond’s price per carat

> data(diamonds)
> diamonds$price.per.carat <- diamonds$price/diamonds$carat
> new <- transform(diamonds, carat_price = price/carat)

3.Make a data set that only includes diamonds with an Ideal cut.

> new <- diamonds[diamonds$cut == "Ideal", ]
> new <- subset(diamonds, cut == "Ideal")

4.Create a new data set that groups diamonds by their cut and displays the average price of each group.

> test <- function(dframe) {
+     mean(dframe$price)
+ }
> new <- ddply(diamonds, c("cut"), test)
> new <- ddply(diamonds, "cut", summarise, avg.price = mean(price))

5.Create a new data set that groups diamonds by color and displays the average depth and average table for each group

> new <- ddply(diamonds, "color", summarise, avg.depth = mean(depth),
+     avg.table = mean(table))

6.Add two columns to the diamonds data set. The first column should display the average depth of diamonds in the diamond’s color group. The second column should display the average table of diamonds in the diamond’s color group.

> new <- ddply(diamonds, "color", transform, avg.depth = mean(depth),
+     avg.table = mean(table))

7.Make a data set that contains all of the unique combinations of cut, color, and clarity, as well as the average price of diamonds in each group.

> new <- ddply(diamonds, .(color, cut, clarity), summarise, avg.price = mean(price))
> new <- ddply(diamonds, c("color", "cut", "clarity"), summarize,
+     avg.price = mean(price))

8.Add a column to the diamonds data set that displays the average price of all diamonds that share a diamond’s cut, color, and clarity.

> new <- ddply(diamonds, c("color", "cut", "clarity"), transform,
+     avg.price = mean(price))

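The most important difference between summarize and transform: summarize returns a new data frame with one row per group, containing only the grouping variables and the computed columns, while transform keeps every row of the original data frame and adds the computed values as new columns.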
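To make the summarize versus transform difference concrete, here is a minimal sketch on a tiny made-up data frame (`toy` and its values are hypothetical, chosen only so the row counts are easy to check by eye).

```r
library(plyr)

# Hypothetical toy data: two groups, three rows in total.
toy <- data.frame(grp = c("a", "a", "b"), price = c(10, 20, 60))

# summarise collapses each group to a single row.
per_group <- ddply(toy, "grp", summarise, avg.price = mean(price))

# transform keeps every original row and appends the group mean to it.
per_row <- ddply(toy, "grp", transform, avg.price = mean(price))

print(per_group)  # one row per group
print(per_row)    # same number of rows as toy, plus avg.price
```

The same pattern explains why exercise 7 calls for summarize and exercise 8 calls for transform.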

9.Do diamonds with the best cut fetch the best price for a given amount of carats?

> new <- ddply(diamonds, c("cut"), summarise, avg.price.per.carat = mean(price/carat))

10.Which color diamonds seem to be largest on average (in terms of carats)?

> new <- ddply(diamonds, c("color"), summarise, avg.carat = mean(carat))

11.What color of diamonds occurs the most frequently among diamonds with ideal cuts?

> new <- ddply(subset(diamonds, cut == "Ideal"), "color", summarize,
+     n = length(color))

12.Which clarity of diamonds has the largest average table per carat?

> new <- ddply(diamonds, "clarity", summarize, mean(table/carat))

13.Which diamond has the largest price per carat compared to other diamonds with its cut, color, and clarity?

> new <- ddply(diamonds, c("cut", "color", "clarity"), summarize,
+     val = mean(price/carat))
> new[order(new$val, decreasing = TRUE), ][1, ]
          cut color clarity      val
116 Very Good     D      IF 11346.51
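The call above ranks whole groups by their mean price per carat. If the question is instead read as asking for the single diamond with the highest price per carat inside each cut, color, and clarity group, one possible sketch is shown below. The `toy` data frame and the helper column `price.per.carat` are made up for illustration, and this reading of the exercise is an assumption.

```r
library(plyr)

# Hypothetical toy data with two cut/color/clarity groups.
toy <- data.frame(
  cut     = c("Ideal", "Ideal", "Good", "Good"),
  color   = c("D", "D", "E", "E"),
  clarity = c("IF", "IF", "SI1", "SI1"),
  carat   = c(0.5, 1.0, 0.4, 0.8),
  price   = c(2000, 3000, 800, 2400)
)

# For each group, keep only the row with the highest price per carat.
top_per_group <- ddply(toy, c("cut", "color", "clarity"), function(df) {
  df$price.per.carat <- df$price / df$carat
  df[which.max(df$price.per.carat), ]
})

print(top_per_group)
```

Passing an anonymous function to ddply works here because each piece arrives as an ordinary data frame, so any row selection can be done with base R indexing.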

14.What is the average price per carat of diamonds that cost more than 10000?

> new <- summarize(subset(diamonds, price > 10000), mean(price/carat))

15.Display the largest diamond depth observed for each clarity group

> new <- ddply(diamonds, c("clarity"), summarize, val = max(depth))


dlply splits a data frame, applies a function to each piece, and returns the results as a list. For example, if you fit a model on each subset of the data, you can store the fitted models in a list.

ldply applies a function to each component of a list and combines the results back into a data frame.
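A minimal sketch of the dlply then ldply workflow follows. The `toy` data frame is hypothetical (its prices are exact linear functions of carat so the fitted slopes are easy to verify), and fitting `lm(price ~ carat)` is only an illustrative choice of model.

```r
library(plyr)

# Hypothetical toy data standing in for a data frame like diamonds.
toy <- data.frame(
  cut   = rep(c("Good", "Ideal"), each = 3),
  carat = c(0.3, 0.5, 0.7, 0.3, 0.5, 0.7),
  price = c(300, 500, 700, 700, 1100, 1500)
)

# dlply: split by cut, fit one linear model per piece, return a list of models.
models <- dlply(toy, "cut", function(df) lm(price ~ carat, data = df))

# ldply: extract the coefficients of each model and stack them in a data frame.
coefs <- ldply(models, coef)

print(coefs)
```

Keeping the models in a list means they can be reused later, for example to extract residuals or predictions, without refitting.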



Players and Babynames datasets: I will try these whenever I have time.

Basic takeaways
- Split + Apply + Combine
- transform, summarize and subset are useful functions
- colwise is useful too
- ddply is extremely useful
- dlply is used to store models fitted on specific subsets
- ldply is used to collect summaries of the model output into a data frame
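Since colwise is mentioned but not shown, here is a small sketch of how it can be combined with ddply. The `toy` data frame is hypothetical. The `is.numeric` filter is passed so that the character grouping column is skipped.

```r
library(plyr)

# Hypothetical toy data with one grouping column and two numeric columns.
toy <- data.frame(
  color = c("D", "D", "E", "E"),
  depth = c(60, 62, 61, 63),
  table = c(55, 57, 58, 60)
)

# colwise(mean, is.numeric) builds a function that averages every numeric column.
avg_by_color <- ddply(toy, "color", colwise(mean, is.numeric))

print(avg_by_color)
```

This avoids spelling out one `mean(...)` argument per column, which helps when a data frame has many numeric columns.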