2009 KDD Cup entry – Model Description
Key Steps :
Did not use R for data import operation - Used SPSS to read the data
Feature Selection - Used R in this step
Data Cleaning - Treatment of categorical variables was a problem
Software used : SAS + R
Techniques used : Gradient boosting machine (gbm package)
Rationale :
Handling of missing values
Robustness against extreme values
Handling of categorical and continuous variables
Models interactions between predictors
Can model nonlinear dependencies
Fitting Time : A couple of hours on a desktop
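Two of the points in the rationale (missing values, nonlinear dependencies) can be illustrated with a small from-scratch sketch of gradient boosting. This is not the code used for the entry, which relied on the gbm package. It is a simplified Python illustration that assumes squared-error loss, a single numeric predictor and one-split trees (stumps). The function names (fit_stump, stump_predict, fit_gbm, gbm_predict) and the toy data are hypothetical. Missing values are sent to whichever side of a split fits better, which is one simple strategy and not necessarily what gbm does internally.

```python
# Minimal gradient boosting with decision stumps (squared-error loss).
# Illustrative sketch only; the original entry used the gbm package.


def fit_stump(xs, residuals):
    """Find the single split on x that best fits the residuals.

    Missing values (None) are sent to whichever side gives the lower
    squared error. Assumes at least two distinct non-missing x values.
    """
    present = sorted({x for x in xs if x is not None})
    best = None
    for lo, hi in zip(present, present[1:]):
        threshold = (lo + hi) / 2.0
        for missing_left in (True, False):
            left, right = [], []
            for x, r in zip(xs, residuals):
                go_left = missing_left if x is None else x <= threshold
                (left if go_left else right).append(r)
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = (sum((r - left_mean) ** 2 for r in left)
                     + sum((r - right_mean) ** 2 for r in right))
            if best is None or error < best[0]:
                best = (error, threshold, missing_left, left_mean, right_mean)
    return best[1:]


def stump_predict(stump, x):
    """Return the leaf value of a fitted stump for one observation."""
    threshold, missing_left, left_value, right_value = stump
    go_left = missing_left if x is None else x <= threshold
    return left_value if go_left else right_value


def fit_gbm(xs, ys, n_trees=200, shrinkage=0.1):
    """Fit stumps one after another to the current residuals."""
    base = sum(ys) / len(ys)          # start from the mean response
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # For squared error, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + shrinkage * stump_predict(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, shrinkage


def gbm_predict(model, x):
    """Sum the shrunken stump outputs on top of the base value."""
    base, stumps, shrinkage = model
    return base + shrinkage * sum(stump_predict(s, x) for s in stumps)


# Toy data: a nonlinear target (y = x^2) plus two rows with a missing x.
xs = [i / 10.0 for i in range(-20, 21)] + [None, None]
ys = [x * x for x in xs[:-2]] + [4.0, 4.0]

model = fit_gbm(xs, ys)

mean_y = sum(ys) / len(ys)
baseline_mse = sum((y - mean_y) ** 2 for y in ys) / len(ys)
model_mse = sum((y - gbm_predict(model, x)) ** 2
                for x, y in zip(xs, ys)) / len(ys)

print("baseline MSE:", baseline_mse)
print("boosted MSE:", model_mse)
print("prediction for a missing x:", gbm_predict(model, None))
```

Each stump is a step function, yet the sum of many small steps can follow the curved target, and rows with a missing predictor still receive a prediction without any imputation step. In gbm itself the corresponding tuning knobs are the number of trees, the shrinkage and the interaction depth. Depth greater than one is what lets the trees capture interactions between predictors, which this one-variable sketch does not show.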