Data Smart : Summary

Data science is a loose term that can mean different things in different situations. One thing is certain, however: the principles used in tackling problems come from diverse fields. Drew Conway has a Venn diagram of these fields on his blog. In such a diverse field one does not know where to start or how to start. Someone has made a nice Metromap too. All said and done, this is a field with considerable entry barriers.

Efficient Simulation Smoother

This paper gives the details of a useful algorithm that speeds up the simulation of state vectors from a state space model. The algorithm runs very quickly compared with other methods. I ran it inside a Gibbs sampler for a simple local level model and found it considerably faster than other Forward Filter Backward Sampling algorithms. For more general Bayesian inference, this algorithm should cut the computation time significantly.
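To make "simulating the state vector" concrete, here is a minimal sketch of the textbook Forward Filter Backward Sampling step for a local level model, the kind of baseline the comparison above refers to. It is not the paper's algorithm. The function name, the parameter names (var_eps for the observation noise variance, var_eta for the state noise variance), the diffuse-like prior and the simulated data are all assumptions made for illustration.

```python
import math
import random

def ffbs_local_level(y, var_eps, var_eta, a1=0.0, p1=1e6, rng=random):
    """Draw one state path for y_t = alpha_t + eps_t, alpha_{t+1} = alpha_t + eta_t."""
    n = len(y)
    m, c = [0.0] * n, [0.0] * n        # filtered means and variances
    a, p = a1, p1                      # predicted mean and variance of alpha_t
    # Forward pass: scalar Kalman filter.
    for t in range(n):
        k = p / (p + var_eps)          # Kalman gain
        m[t] = a + k * (y[t] - a)
        c[t] = (1.0 - k) * p
        a, p = m[t], c[t] + var_eta    # one-step-ahead prediction
    # Backward pass: sample alpha_n, then alpha_t given alpha_{t+1}.
    alpha = [0.0] * n
    alpha[-1] = rng.gauss(m[-1], math.sqrt(c[-1]))
    for t in range(n - 2, -1, -1):
        b = c[t] / (c[t] + var_eta)
        mean = m[t] + b * (alpha[t + 1] - m[t])
        var = c[t] * (1.0 - b)
        alpha[t] = rng.gauss(mean, math.sqrt(var))
    return alpha

# Hypothetical data: a random-walk level observed with noise.
rng = random.Random(0)
level, y = 0.0, []
for _ in range(100):
    level += rng.gauss(0.0, 0.5)
    y.append(level + rng.gauss(0.0, 1.0))

draw = ffbs_local_level(y, var_eps=1.0, var_eta=0.25, rng=rng)
print(len(draw), "states sampled")
```

In a Gibbs sampler this draw would alternate with draws of the two variances given the sampled states, so the state-sampling step runs once per iteration and its cost matters.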

In Praise of Walking

A lovely article written by Shiv Visvanathan : My father loved to walk. It was his great ritual, his idea of prayer and work. Every morning at four, the house would echo with the thump of his shoes, the tumbler of coffee, as he hurried out. My dachshund, a wise ten-year-old would wait impatiently, grumbling melodramatically about any delay. Whoever talked of walking a dog never understood man or beast.

Bumping

Classification trees fail miserably in some cases, and in such situations bumping might be a good method. A stylized example of bumping is as follows : Imagine that there are two covariates x1 and x2 and the true class labels depend on XORing the two covariates. The orange labels represent one class and blue labels represent another class. If you run any sort of plain vanilla classification algorithm that does greedy binary splits, the algorithm will fail, because neither covariate separates the classes on its own and so no first split looks better than any other.
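The idea can be sketched in a few lines of plain Python: fit the same greedy tree to the original data and to a number of bootstrap resamples, then keep whichever tree has the lowest error on the original data. This is a minimal illustration under stated assumptions, not a reference implementation. The depth-2 tree, the Gini criterion, the sample size, the number of resamples and all function names are choices made for this example.

```python
import random

def gini(labels):
    # Size-weighted Gini impurity of a list of 0/1 labels.
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p) * len(labels)

def best_split(rows):
    """Greedy search for the (feature, threshold) with the lowest impurity."""
    best = None
    for f in (0, 1):
        for t in sorted({r[f] for r in rows}):
            left = [r[2] for r in rows if r[f] <= t]
            right = [r[2] for r in rows if r[f] > t]
            if not left or not right:
                continue
            score = gini(left) + gini(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build(rows, depth):
    # A leaf is a class label; an internal node is (feature, threshold, left, right).
    labels = [r[2] for r in rows]
    majority = int(2 * sum(labels) >= len(labels))
    if depth == 0 or sum(labels) in (0, len(labels)):
        return majority
    split = best_split(rows)
    if split is None:
        return majority
    _, f, t = split
    return (f, t,
            build([r for r in rows if r[f] <= t], depth - 1),
            build([r for r in rows if r[f] > t], depth - 1))

def predict(tree, x):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def error(tree, rows):
    return sum(predict(tree, r) != r[2] for r in rows) / len(rows)

def bump(rows, n_boot, rng, depth=2):
    # Candidates: the original sample plus n_boot bootstrap resamples.
    candidates = [rows] + [rng.choices(rows, k=len(rows)) for _ in range(n_boot)]
    trees = [build(c, depth) for c in candidates]
    # Every candidate tree is scored on the ORIGINAL data.
    return min(trees, key=lambda tr: error(tr, rows))

# Hypothetical XOR data: the label is 1 when exactly one covariate is positive.
rng = random.Random(0)
data = []
for _ in range(200):
    x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
    data.append((x1, x2, int((x1 > 0) != (x2 > 0))))

plain_err = error(build(data, 2), data)
bumped_err = error(bump(data, 20, rng), data)
print(f"plain tree error: {plain_err:.3f}, bumped tree error: {bumped_err:.3f}")
```

Because the original sample is included among the candidates, the bumped tree can never do worse on the training data than the plain tree. Resampling helps here because it breaks the class balance, so that in some resamples a first split near zero looks attractive to the greedy search.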