2009 KDD Cup entry – Model Description

Key Steps :
Did not use R for data import operation - Used SPSS to read the data
Feature Selection - Used R in this step
Data Cleaning - Treatment of categorical variables was a problem

Software used : SAS + R

Techniques used : Gradient Boosting Machine (gbm package)

Rationale :
Handling of missing values
Robustness against extreme values
Handling of categorical and continuous variables
Models interactions between predictors
Can model nonlinear dependencies

Fitting Time : A couple of hours on a desktop
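To make the "can model nonlinear dependencies" point concrete, here is a minimal sketch of the gradient boosting idea in plain Python. It is not the gbm package and not the model used for the KDD Cup entry. The function names (`fit_stump`, `fit_gbm`, `predict`), the toy data, and the settings (200 trees, shrinkage 0.1) are all assumptions chosen for illustration. It uses squared-error loss on a single numeric feature and leaves out missing-value and categorical handling.

```python
# Gradient boosting with depth-1 trees (stumps) and squared-error loss.
# Illustrative sketch only: hypothetical names, toy data, no missing-value handling.

def fit_stump(xs, residuals):
    """Pick the threshold on one numeric feature that minimises squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # drop the max so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return t, lm, rm

def stump_predict(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm

def fit_gbm(xs, ys, n_trees=200, shrinkage=0.1):
    base = sum(ys) / len(ys)  # start from the mean response
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # For squared error, the negative gradient is the ordinary residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + shrinkage * stump_predict(stump, x) for p, x in zip(preds, xs)]
    return base, shrinkage, stumps

def predict(model, x):
    base, shrinkage, stumps = model
    return base + shrinkage * sum(stump_predict(s, x) for s in stumps)

# Toy nonlinear target: y = x^2 on a grid from -2 to 2.
xs = [i / 10 for i in range(-20, 21)]
ys = [x * x for x in xs]

model = fit_gbm(xs, ys)
mean_y = sum(ys) / len(ys)
baseline_mse = sum((mean_y - y) ** 2 for y in ys) / len(ys)
mse = sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print("baseline MSE:", baseline_mse, "boosted MSE:", mse)
```

Each round fits a small tree to what the current ensemble still gets wrong, and the shrinkage factor keeps any single tree from dominating. A sum of many such step functions can follow a curve that no single linear term would capture.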

Quote for the day

“When that time comes, I try to be alone and silent for several hours; I need a lot of time to rid my mind of the noise outside and to cleanse my memory of life’s confusion. I light candles to summon the muses and guardian spirits. I place flowers on my desk to intimidate tedium and the complete works of Pablo Neruda beneath the computer with the hope they will inspire me by osmosis.

R Cookbook : Summary

Books on R are tricky to read, especially because the sheer number of things that R can do is mind-boggling. There are books that range from very specialized to very generic, and there is no choice but to pick from this gigantic collection based on one’s needs. The flip side of this vast amount of material is that a first-timer is likely to fail to see the forest for the trees.

Fifty Days of Solitude : Summary

It has started raining in Mumbai, and the pleasant climate after three months of scorching heat enlivens the spirit. I will attempt to write a few words about this book. Since I have been staying alone for the past few years in Mumbai, I have gone back to the sitar, which I could not practice in NY for a couple of reasons. Firstly, I lacked the space needed for practicing any instrument. Staying with two other guys in a flat was not particularly conducive to playing an instrument without distractions.