Surplus educated

“A little educated man has done this for the society. What will you surplus educated people do for the society?” asks an Indian innovator.

Introduction to WinBUGS for Ecologists : Review

WinBUGS (the Windows version of BUGS, Bayesian inference Using Gibbs Sampling) is software developed by David Spiegelhalter and colleagues; its version 1.4 was released in 2003. Today it is widely used by statisticians and by researchers in many other fields, and its popularity is directly related to the increasing acceptance of Bayesian statistics. So what is WinBUGS for?

WinBUGS lets one specify almost arbitrarily complex statistical models using a fairly simple model definition language that describes the stochastic and deterministic “local” relationships among all observable and unobservable quantities in a fully specified statistical model.
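To make “Gibbs sampling” a little more concrete, here is a minimal sketch of what BUGS automates for about the simplest model there is: normally distributed data with an unknown mean and precision, under conjugate priors. The model, the vague prior values and the function name `gibbs_normal` are assumptions chosen for illustration; this is not WinBUGS code, and WinBUGS works out the sampling steps itself from the model description.

```python
import random
import statistics

def gibbs_normal(y, n_iter=5000, burn_in=1000, m0=0.0, p0=1e-4, a0=0.01, b0=0.01, seed=1):
    """Gibbs sampler for y_i ~ Normal(mu, 1/tau), with conjugate priors
    mu ~ Normal(m0, 1/p0) and tau ~ Gamma(shape=a0, rate=b0)."""
    rng = random.Random(seed)
    n = len(y)
    sum_y = sum(y)
    mu, tau = statistics.mean(y), 1.0  # starting values
    mu_draws, sigma_draws = [], []
    for it in range(n_iter):
        # Full conditional of mu given tau: normal with precision p0 + n * tau
        prec = p0 + n * tau
        cond_mean = (p0 * m0 + tau * sum_y) / prec
        mu = rng.gauss(cond_mean, prec ** -0.5)
        # Full conditional of tau given mu: Gamma(a0 + n/2, rate = b0 + SS/2)
        ss = sum((yi - mu) ** 2 for yi in y)
        rate = b0 + 0.5 * ss
        tau = rng.gammavariate(a0 + 0.5 * n, 1.0 / rate)  # gammavariate takes a scale, not a rate
        if it >= burn_in:
            mu_draws.append(mu)
            sigma_draws.append(tau ** -0.5)  # report the standard deviation
    return mu_draws, sigma_draws

# Simulated data with true mean 10 and standard deviation 2 (illustrative values)
data_rng = random.Random(42)
y = [data_rng.gauss(10.0, 2.0) for _ in range(200)]
mu_draws, sigma_draws = gibbs_normal(y)
print(statistics.mean(mu_draws), statistics.mean(sigma_draws))  # should land near 10 and 2
```

Each step draws one unknown from its distribution given the current value of the other; alternating the two draws is the whole idea behind the “GS” in BUGS.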

Quote for the day

Analyzing data is a little like fixing a motorbike but in reverse: it consists of breaking a data set into its parts (e.g., covariate effects and variances), whereas fixing a bike means putting all the parts of a bike into the right place. One way to convince yourself that you really understand how a bike works is to first dismantle it and then reassemble it into a functioning vehicle. Similarly, for data analysis, by first assembling a data set and then breaking it apart into recognizable parts by analyzing it, you can prove to yourself that you really understand the analysis.
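Here is one way to put the “assemble, then take apart” idea into practice: simulate a data set from a model whose parameters you choose, analyze it, and check that you get those parameters back. A minimal sketch, assuming a simple linear regression with illustrative values (intercept 2, slope 0.5, residual standard deviation 1) picked only for demonstration.

```python
import random
import statistics

def simulate(n, intercept, slope, sd, seed=0):
    """Assemble a data set: y = intercept + slope * x + normal noise."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]
    y = [intercept + slope * xi + rng.gauss(0, sd) for xi in x]
    return x, y

def fit_ols(x, y):
    """Break it apart again: closed-form least-squares estimates."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma = (sum(r * r for r in resid) / (len(x) - 2)) ** 0.5  # residual standard deviation
    return intercept, slope, sigma

x, y = simulate(n=500, intercept=2.0, slope=0.5, sd=1.0)
a_hat, b_hat, s_hat = fit_ols(x, y)
print(f"intercept {a_hat:.2f}, slope {b_hat:.3f}, sd {s_hat:.2f}")
```

If the estimates come back close to 2, 0.5 and 1, the analysis “understands” the data-generating process; if they don't, something in the model or the code is off.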

The Joy of X : Review

This book is indeed a joy to read. There were many “aha” moments, some of which are:

  • Google’s PageRank explained using a simple Markov chain example, which demonstrates the power of linear algebra.

  • Thinking about conditional probability in terms of frequencies is more intuitive and less confusing than using the usual Bayes formula.

  • The power law is the new normal distribution of the world: power laws are everywhere.

  • The log scale verbalized brilliantly: successive markings on the axis differ by the same factor rather than by the same absolute amount.
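For the PageRank point, a minimal sketch of the Markov chain view: a “random surfer” follows one of the current page’s links with probability d and jumps to a random page otherwise, and the ranks are the stationary distribution of that chain, found here by power iteration. The four-page link graph is made up for illustration, and the damping factor 0.85 is simply a commonly used choice.

```python
def pagerank(links, d=0.85, tol=1e-12, max_iter=1000):
    """Power iteration on the 'random surfer' Markov chain.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution
    for _ in range(max_iter):
        new = {p: (1.0 - d) / n for p in pages}  # random-jump part
        for p in pages:
            out = links[p]
            if out:
                share = d * rank[p] / len(out)  # follow one of p's links
                for q in out:
                    new[q] += share
            else:
                # A page with no out-links: spread its rank evenly over all pages
                for q in pages:
                    new[q] += d * rank[p] / n
        converged = sum(abs(new[p] - rank[p]) for p in pages) < tol
        rank = new
        if converged:
            break
    return rank

web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
for page, r in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(r, 3))
```

Each iteration is just a multiplication of the rank vector by the chain’s transition matrix, which is where the linear algebra comes in. Page D, which nobody links to, only ever receives the random-jump share.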

So Good They Can’t Ignore You : Review

Cal Newport wanted to answer a question that nagged him: “Why do some people end up loving what they do, while so many others fail at this goal?” Researching this question led him down a path where he found rather unconventional answers. Through this book, he shares his findings.

Most of us equate “passion” with an intense love affair with one’s work. There is also a belief that “passion” is a necessary condition for finding THE RIGHT work. In the first part of the book, the author debunks the Passion Hypothesis. What is the Passion Hypothesis? The key to occupational happiness is to first figure out what you’re passionate about and then find a job that matches this passion.

Nutrition

The month of September vanished from my life as my erratic eating habits and my work schedule took a toll on my health. Now that I have recovered, one of the biggest changes I have made to my life is to my diet. Thanks to a colleague of mine, Gautam, who suggested this book, I have started changing a few things in my daily schedule.

Usually I don’t even glance at books with such titles, but I took some time out to go over this one. To my surprise, the content was refreshing. The author dispels a lot of myths about nutrition, diet and weight loss. Here are a few points that I will keep in mind:

Principles of Data Mining : Review

One of my reasons for going over this book was to shuttle between the macro and micro worlds of modeling. One can stay immersed in a specific type of technique or algorithm in stats forever, but I can’t. From time to time, I tend to take a break and go over the macro aspects of modeling. Books like this give an intuitive sense of “What are the types of models that one builds?” I like such books as they make me aware of the inductive uncertainty associated with building models. Let me summarize the main points in the book.

The Little Book of Talent : Review

I liked Daniel Coyle’s “The Talent Code”, which talks about the importance of “deep practice” in achieving mastery in any field. I liked it not for the message of deep practice, which had already been repeated in many books and articles, but for the varied examples in the book.

Here comes another book along the same lines by the same author. This book is a collection of thoughts and ideas from the author’s field work, packaged as “TIPS” to improve one’s skill set. These tips are grouped into three categories: “Getting Started”, “Improving Skills” and “Sustaining Progress”.

In All Likelihood : Review

The likelihood function is a very useful mathematical object in statistics. With it, you can perform the two main tasks in statistics, i.e., estimation and inference. If you can get the distribution right, or the overall structural equation right, you can do all types of stats: univariate stats, multivariate stats, linear models, generalized linear models, mixture modeling, mixed effects models and, to an extent, even nonparametric statistics. All of this can be done from scratch with one mathematical object, the likelihood function, plus pen and paper and a plain vanilla optimization routine.
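As a small illustration of “likelihood function + a plain vanilla optimization routine”, here is a minimal sketch that estimates the rate of an exponential distribution by numerically minimizing the negative log-likelihood with a hand-written golden-section search. The simulated data, the search bracket and the function names are assumptions; the exponential model is picked because its maximum likelihood estimate has a closed form (1 / sample mean) to check the optimizer against.

```python
import math
import random

def neg_log_lik(rate, data):
    """Negative log-likelihood of i.i.d. Exponential(rate) data."""
    return -(len(data) * math.log(rate) - rate * sum(data))

def golden_section_min(f, lo, hi, tol=1e-10):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2

rng = random.Random(7)
data = [rng.expovariate(2.5) for _ in range(1000)]  # true rate 2.5

rate_hat = golden_section_min(lambda r: neg_log_lik(r, data), 1e-6, 50.0)
print(rate_hat, len(data) / sum(data))  # numerical MLE vs closed-form MLE
```

Swapping in a different density in `neg_log_lik` (and a multivariate optimizer for several parameters) is all it takes to move to other models, which is the point the book makes.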

The Zen of Steve Jobs

This graphic novel talks about Steve Jobs and Zen Buddhist priest Kobun, who acted as Jobs’ spiritual guru. Hard-core Apple fans might like to know the kind of conversations that Jobs had with Kobun. However, I felt the book was pointless. I think it is merely trying to cash in on two things: 1) the increasing popularity of graphic novels among adults and 2) Steve Jobs’ death in October 2011.

Bayesian and Likelihood Principle

I stumbled onto an interesting paper that connects Bayesian ideas to likelihood-based inference. The two are related in the sense that likelihood-based inference can be thought of as Bayesian inference with a uniform or vague prior. However, when you get down to estimating and inferring from data using these two philosophies, the math, the equations you use and the code you need to write are completely different.

This paper by Steel discusses whether a hard-core Bayesian must accept the Likelihood Principle. It talks about two versions of the Likelihood Principle (LP) that can be easily connected to the posterior-prior Bayes framework. The first version (LP1) is where you observe different data sets that nonetheless yield the same likelihood function (up to a constant factor), and the second version (LP2) is where you evaluate competing hypotheses with the same data set.
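A standard textbook illustration of LP1 (my choice of example, not taken from Steel’s paper): 3 successes in 12 Bernoulli trials, observed either with the number of trials fixed at 12 (binomial sampling) or by sampling until the 3rd success arrives (negative binomial sampling). The two likelihoods differ only by a constant factor, so under any prior the posterior is the same. The minimal sketch below checks this on a grid with a uniform prior and also shows that the flat-prior posterior mode coincides with the MLE, 3/12.

```python
from math import comb

def binomial_lik(p, k=3, n=12):
    """P(k successes | n trials fixed in advance)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def neg_binomial_lik(p, k=3, n=12):
    """P(the k-th success arrives on trial n)."""
    return comb(n - 1, k - 1) * p**k * (1 - p) ** (n - k)

def posterior(lik, grid):
    """Grid posterior under a uniform prior on p."""
    weights = [lik(p) for p in grid]
    total = sum(weights)
    return [w / total for w in weights]

grid = [i / 1000 for i in range(1, 1000)]
post_bin = posterior(binomial_lik, grid)
post_nb = posterior(neg_binomial_lik, grid)

# The likelihood ratio is the same constant for every p: comb(12, 3) / comb(11, 2) = 4
ratios = {round(binomial_lik(p) / neg_binomial_lik(p), 9) for p in grid}
print(ratios)
print(max(abs(a - b) for a, b in zip(post_bin, post_nb)))  # essentially zero
mode = grid[post_bin.index(max(post_bin))]
print(mode)  # flat-prior posterior mode = MLE = 3/12
```

A frequentist p-value, by contrast, depends on which sampling plan was used, which is exactly where the two philosophies part ways under LP1.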

Document before Coding

[Embedded SlideShare presentation, id 7249526]

Ex Libris : Summary

The author Anne Fadiman considers herself a common reader. Who is a common reader? In her words,

The common reader differs from the critic and the scholar. She is worse educated, and nature has not gifted her so generously. She reads for her own pleasure rather than to impart knowledge or correct the opinions of others. Above all, she is guided by an instinct to create for herself, out of whatever odds and ends she can come by, some kind of whole.

Your Brain at Work : Summary

I was recovering from a brief illness and tried reading this book just to shake off my drowsy and sullen mood.

I found the first part of this book interesting. Given today’s information overload, it helps to understand how our brain functions. “How do we use our brains for understanding, deciding, recalling, memorizing and inhibiting information?” is an important question that we all need to answer in order to function efficiently in our lives. I loved the initial part of the book because it uses the metaphor of a stage and an audience to explain the science of our brain, more specifically the prefrontal cortex. The book is also organized as a play with various scenes, like a theater play. Each scene has two takes: in the first, the actor has no clue how the brain works and messes things up; in the second, the actor performs in full cognizance of the workings of the brain.

An Introduction to Generalized Linear Models

This book is written by Annette J. Dobson, a Biostatistics Professor at the University of Queensland (Brisbane). I had come across this book back in May 2010 and had worked through the whole book. Here is what I wrote about it back then. While trying to understand local likelihood modeling, I realized that I had forgotten some basic principles relating to diagnostics and model evaluation for GLMs. Sometimes I wonder what makes things stick. Maybe there is no magic bullet at all; one has to keep revisiting concepts to understand and remember them.

Knight Capital – Will it become extinct?

Via LA Times :

The high-speed trading arms race being waged on Wall Street has finally claimed its first major casualty.

Knight Capital Group, a brokerage that handles nearly 11% of all stock trading in U.S. companies, is in danger of collapsing after a software glitch triggered millions of unintended orders. The New Jersey firm lost $440 million in less than an hour — nearly four times the company’s profit last year.

Teaching Kids to Read

At Harlem Village Academies (a network of public charter schools in the US),

Every student (fifth graders) reads fifty books a year!

Isn’t it amazing?

Quote for the day

Writing a novel is like driving a car at night. You can see only as far as your headlights, but you can make the whole trip that way. You don’t have to see where you’re going, you don’t have to see your destination or everything you will pass along the way. You just have to see two or three feet ahead of you. This is right up there with the best advice on writing, or life, I have ever heard.

The Devotion of Suspect X

The book begins with a single mother murdering her ex-husband. Her neighbor, a quiet high school math teacher, offers to help her create the perfect alibi and lead every police investigation into a dead end. The math teacher almost succeeds, and nobody figures out the real story behind the murder except a physics professor whom the detectives seek help from.

The most engrossing thing about this book is that you are given the murder, i.e., the complete picture, at the very beginning, and then you encounter various characters who try to fit together the pieces of the jigsaw puzzle. So you follow the intuitions, deductions and inferences of various characters about the murder, and ultimately you can’t help feeling lost too. You start wondering whether it is possible to implicate the murderer at all. But then the physics professor cracks it and puts it all together. The final picture that emerges from the jigsaw puzzle is totally unexpected.

Spectral Analysis of Time-Series Data : Summary

I stumbled onto this book back in September 2010 and had been intending to work on the frequency-domain side of time series ever since. I am embarrassed to admit that almost two years have passed and the book was lying in my inventory, crying to be picked up. Why did I put off reading it for so long? Maybe I am not managing my time properly. In any case, I finally decided to go over this book. In the past few weeks I have spent some time understanding Fourier transforms (continuous and discrete). I was thrilled to see so many connections between the Fourier transform and statistics. It was like getting a new set of eyes to view many aspects of statistics from a Fourier transform angle. The Central Limit Theorem appears “wonderful” from a convolution standpoint. Density estimation becomes so much more beautiful once you understand how the FFT is used to compute a kernel density estimate, etc. The simple-looking density function in many statistical software packages has one of the most powerful algorithms behind it, the Fast Fourier Transform. I could go on listing many such aha moments, but let me stop and focus on summarizing this book. With Fourier math fresh in my mind, the book turned out to be smooth sailing. In this post, I will attempt to summarize the main chapters of the book.
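The “Central Limit Theorem from a convolution standpoint” remark can be seen directly: the distribution of a sum of independent variables is the convolution of their distributions. A minimal sketch, assuming fair six-sided dice as the example and a plain direct convolution; an FFT-based convolution would compute the same result faster for long sequences.

```python
import math

def convolve(p, q):
    """Distribution of X + Y for independent X ~ p, Y ~ q (lists indexed from 0)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def normal_pdf(x, mu, s2):
    """Normal density with mean mu and variance s2."""
    return math.exp(-(x - mu) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)

die = [0.0] + [1 / 6] * 6  # index = face value, P(0) = 0

n_dice = 10
dist = [1.0]  # distribution of an empty sum: 0 with probability 1
for _ in range(n_dice):
    dist = convolve(dist, die)  # add one more die

mean = sum(k * pk for k, pk in enumerate(dist))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(dist))
print(sum(dist), mean, var)  # approximately 1, 35 and 10 * 35/12

# Compare the exact distribution of the sum with a normal density of the same mean and variance
for k in (25, 30, 35, 40, 45):
    print(k, round(dist[k], 4), round(normal_pdf(k, mean, var), 4))
```

Ten flat distributions convolved together already give a bell shape; the convolution theorem (convolution becomes multiplication of Fourier transforms) is what turns this observation into a proof of the CLT.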

Future Babble : Summary

Introduction

The author begins the book with a slew of examples of predictions that never materialized. These examples span a wide range of fields like economics, social sciences, finance, politics, etc. The author also sneaks in examples from his parents’ and grandparents’ lives to show how their lives panned out in ways that were completely unpredictable. Well, do these examples prove anything? You can quote volumes of predictions going wrong, but if you ask the people who made them, they always seem to have a defense. What are the common answers that experts give when asked about their failed predictions?

Dark Pools : Summary

As recently as 1997, the financial markets comprised blue chip stocks traded by specialists at the NYSE, other stocks traded at NASDAQ by market makers, and a small-scale electronic system. Fast forward to 2012, and the US market comprises 40 trading destinations. There are four public exchanges: NYSE, NASDAQ, Direct Edge and BATS. Inside each of these exchanges there are various destinations. NYSE has NYSE Arca, NYSE Amex, NYSE Euronext and NYSE Alternext; NASDAQ has three markets; BATS and Direct Edge each have two market destinations within them. There are toxic dark pools. There are internalizers, the Citadels and Knight Capitals of the world, that execute trades within their own trading pools. The system, as you can see, has become extremely complex. Dark pools and internalizers accounted for 40% of all trading volume in 2012. The pace of development has been unbelievable. How did this all come about? Why has the average holding time of a stock fallen from 8 months in 2000 to 2 months in 2008, and finally to 22 seconds in 2011? This book tries to answer some of these questions. It is not so much about dark pools as it is about tracing the personalities behind some important firms in the modern high-speed trading world.

Linear Models with R : Summary

The book is written by Julian Faraway, a Statistics Professor at the University of Bath. The book seems to be the culmination of lecture notes that the professor might have used over the years for teaching linear models. Whatever the case, the book is wonderfully organized. If you already know the math behind linear models, the book does exactly what it promises in the title, i.e., it equips the reader to do linear modeling in R.

An Introduction to the Bootstrap : Summary

Bradley Efron

 

Robert J. Tibshirani

When I first encountered the bootstrap algorithm and looked at its applications, I was literally blown away. Here was a methodology that challenged traditional statistics head-on. Armed with a computer and a basic bootstrap algorithm, you can do pretty much any sort of analysis that you find in the traditional statistical world. Guess what, you hardly need to remember any complicated formulae. Somehow, people who are used to the traditional way of doing statistics do not like bootstrapping, for various reasons. This is from my personal experience. I had developed a model that involved bootstrapping, and it was a decent way to handle uncertainty. Recently, however, the model went through a review and the reviewers were averse to using bootstrapping. Well, I had to go with their decision, because sometimes it is better to get things done than to be in a perpetual state of debate. In any case, it was unlikely that I would have convinced them of the superiority of a bootstrapped approach. Having said that, my romance with bootstrapping is steady :). I love the way things can be done using the basic bootstrapping algorithm. In fact, these days whenever I end up solving something in a parametric way, I always double-check whether the bootstrap gives similar results. If the results don’t match, I tend to cast a suspicious eye on the parametric method and trust the bootstrap method. There are very few books written on the bootstrap, maybe a handful, even though the basic idea was introduced 30 years ago.
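For readers who have not met it, the basic nonparametric bootstrap really is only a few lines: resample the data with replacement, recompute the statistic, and use the spread of the recomputed values. A minimal sketch, assuming simulated data and B = 2000 resamples (both arbitrary choices), that also performs the kind of double check described above by comparing the bootstrap standard error of the mean with the parametric formula s/√n.

```python
import random
import statistics

def bootstrap(data, stat, n_boot=2000, seed=0):
    """Return n_boot replicates of stat, each computed on a resample drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    return [stat(rng.choices(data, k=n)) for _ in range(n_boot)]

rng = random.Random(123)
sample = [rng.gauss(50, 8) for _ in range(100)]

reps = bootstrap(sample, statistics.mean)
boot_se = statistics.stdev(reps)
param_se = statistics.stdev(sample) / len(sample) ** 0.5  # textbook formula s / sqrt(n)

# Percentile 95% interval: roughly the 2.5th and 97.5th percentiles of the replicates
reps_sorted = sorted(reps)
ci = (reps_sorted[int(0.025 * len(reps))], reps_sorted[int(0.975 * len(reps)) - 1])
print(f"bootstrap SE {boot_se:.3f} vs parametric SE {param_se:.3f}; 95% CI {ci}")
```

Replacing `statistics.mean` with `statistics.median`, or with any other function of the sample, needs no new formula at all, which is much of the bootstrap’s appeal.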

What is a p-value anyway : Summary

Statistics stands on two pillars, estimation and inference. Pretty much anything you work on in stats, you end up either estimating something or inferring something. If you take a random sample of people who have taken a stats 101 course at some point in their lives and ask them what the course was all about, the most likely answer would be, “It was something to do with p-values.” Statistics at its core is about comparing a set of numbers with each other, with theoretical models and with past experience. But most introductory textbooks contain scary formulae and distribution tables that students are expected to use. I think if you are a teacher introducing statistics to a new batch of students, it will do a world of good if you dramatize a specific act: walk into the class and tear out all the pages in the appendix that have these tables and arcane formulae that only scare people away from developing a statistical mindset. It will at least drive home the point that there is no formal textbook way to interpret real-life data. Why do you think textbooks make assumptions about the distributions of the data? Pause for a few seconds and think about it. Well, one of the main reasons is that unless you make some assumptions, you cannot fill up the textbook with neat formulae. Yes, think about it. Unless you assume a certain distribution, you cannot write a neat formula for an estimate, a neat formula for a confidence interval, and so on. What’s the use of those formulae? Not much.
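In the spirit of tearing out the tables, here is a minimal sketch of a p-value computed without any distribution table: a two-sample permutation test. The two groups of numbers are made up for illustration. The p-value is the share of random relabellings of the pooled data that produce a difference in means at least as large as the one actually observed.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def permutation_p_value(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # pretend the group labels carry no information
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Count the observed labelling itself so the p-value is never exactly 0
    return (hits + 1) / (n_perm + 1)

control = [12.1, 11.4, 13.0, 12.7, 11.9, 12.3, 12.8, 11.6]
treated = [13.2, 13.9, 12.9, 14.1, 13.5, 13.0, 14.4, 13.7]
print(permutation_p_value(control, treated))
```

The only assumption is that, under the null hypothesis, the labels are exchangeable; no normality and no t-table is needed.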

Relevance of Stats

Via What Are the Odds That Stats Would Be This Popular?

“Most of my life I went to parties and heard a little groan when people heard what I did,” says Robert Tibshirani, a statistics professor at Stanford University. “Now they’re all excited to meet me.”

It’s not because of a new after-shave. Arcane statistical analysis, the business of making sense of our growing data mountains, has become high tech’s hottest calling. There are billions of bytes generated daily, not just from the Internet but also from sciences like genetics and astronomy. Companies like Google and Facebook, as well as product marketers, risk analysts, spies, natural philosophers and gamblers are all scouring the info, desperate to find a new angle on what makes us and the world tick. Computing has become cheap and available enough to process any number of formulas.

Silence

A beautiful poem via Jaisri :

Misunderstand me correctly…

Music begins where words cease.

When music ceases, silence!

When all arts aspire to

the condition of music;

What does music aspire to?

Silence.

The logic of music leads

eventually to silence!

Music must come from silence.

Come from it and return to it.

Perhaps everything will end in fire.

Fire, then silence.

That is how everything ends, after all.

But, misunderstand me correctly.

Statistical Pragmatism

Via The Big Picture :

The note argues that one does not need a split personality to apply Bayesian or Frequentist thinking to stats. Separating the real world from the theoretical world, from the perspective of data, shows that both schools of thought share broadly the same assumptions.

========================================================

Frequentist OR Bayesian Inference

========================================================

Frequentist AND Bayesian Inference

Statistical Pragmatism
Bayesian and Frequentist approach

Born Standing Up : Summary

Getting good at any craft takes a long time, be it deal making, trading, mastering a game or programming. However, books and articles out in the world are stuffed with “get rich quick”, “get smart quick” and “get skills quick” kinds of messages. I can’t speak for other fields, but programming books that implicitly carry these messages are always in the majority, in both offline and online bookstores: ‘Learn XYZ language in 30 days’, ‘Become an ace coder in X weeks’, etc. For all the coders who believe such books, and believe that coding something for 30 days is going to transform them and make their venture the next Instagram, this book has a lot to offer. Steve Martin gives a biographical account of his life in which he makes clear that his achievement was not the result of some sudden random event that put him in the spotlight, where he has remained ever since. Far from it, it is a story of sweat, tears, rejections and occasional acceptances, smiles and rewards. Steve starts off his book saying

Elements of Hindustani Classical Music

This book is written by Shruti Jauhari, a noted Hindustani classical singer and a disciple of K.J. Yesudas. The book is an accessible introduction to the various elements of Hindustani classical music. To read a language, you need to know its alphabet and its grammar. In the same way, to listen to Hindustani music, you need a basic knowledge of Svar, Raag and Taal. One often comes across terms like bandish, gharana, alap, vilambit, taal, etc. in the context of Hindustani music. Unless one has some understanding of these terms, it is difficult to be a discerning listener.

The Razor’s Edge : Summary

It has been a long time since I read any fiction, so I thought of reading a novel this weekend to take a break from stats, programming and the usual routine. It took me about 9 hours to read the entire book, and I must say that not once during those 9 hours did I feel like taking a break. The book has a smooth flow of prose, with just enough characters that you neither lose the story nor get bored anywhere. The plot has about 15 characters, including one played by the author himself, Somerset Maugham, but not all 15 characters get the same footage (obviously). The protagonist of the story is Laurence “Larry” Darrell, who goes on a spiritual quest, and the book weaves a story around that quest. The other 13 characters come and go with varying periodicity throughout the book. Besides Larry, the main characters are Isabel, who loves Larry but ends up marrying Gray Maturin; Elliott Templeton, Isabel’s uncle, whose sole purpose in life is to socialize and be at parties; and Sophie MacDonald, a poetess turned whore turned dope addict, whom Larry almost marries but, thanks to Isabel’s devious plan, never does.

IPython Notebook

I have started using the development version of IPython from GitHub and the experience has been extremely pleasant. Writing code and doing interactive data analysis have become so much easier in the IPython Notebook.

Fernando Perez said at PyCon 2012 that very soon there will be Sphinx support, debugging support and other parallel computing support built into IPython. I will eagerly await the production version of the IPython Notebook. However, the development version, in whatever form it is in, is good enough to experiment with and do quick data analysis. The fact that matplotlib graphs appear inline is something that I found really useful.

Steal like an Artist

Austin Kleon’s book is essentially a blog post turned into a book. The basic theme of the book is that in art, as in other fields, you have to steal. The word steal has a negative connotation, and hence the book differentiates between two kinds of theft:

Basically, theft in this sense means that you are free from the burden of trying to be completely original. Most of the book is common-sense stuff, but how often do we really follow common sense?

Univariate Distribution Relationships

There are a ton of relationships between various distributions. One typically understands and remembers them based on the kind of stats work that one does. If one does a lot of Bayesian stuff, one tends to remember the conjugate priors and related distributions. If you are doing survival modeling, you tend to focus on specific distributions like the Weibull. If you are into OR work, the parameters of the Erlang, gamma and beta distributions are at your fingertips. Irrespective of the type of analysis that one does, it is always good to have a decent overview of various random variables and the connections between them. The connections that one must understand should typically encompass:
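As one concrete example of such a connection (chosen here for illustration, not taken from the list above): the sum of k independent exponential variables with rate λ follows an Erlang distribution, i.e. a gamma distribution with shape k and rate λ. A minimal sketch, assuming k = 3 and λ = 2, that checks this by simulation against the gamma mean k/λ, the variance k/λ² and the closed-form Erlang CDF.

```python
import math
import random
import statistics

def erlang_cdf(x, k, rate):
    """CDF of Gamma(shape=k, rate) for integer k, i.e. the Erlang distribution."""
    return 1.0 - sum(math.exp(-rate * x) * (rate * x) ** i / math.factorial(i) for i in range(k))

rng = random.Random(2012)
k, rate, n_sim = 3, 2.0, 50_000
# Each simulated value is a sum of k independent Exponential(rate) draws
sums = [sum(rng.expovariate(rate) for _ in range(k)) for _ in range(n_sim)]

print(statistics.mean(sums), k / rate)           # gamma mean k / rate
print(statistics.variance(sums), k / rate ** 2)  # gamma variance k / rate^2
for x in (0.5, 1.5, 3.0):
    empirical = sum(s <= x for s in sums) / n_sim
    print(x, round(empirical, 3), round(erlang_cdf(x, k, rate), 3))
```

Setting k = 1 recovers the exponential distribution itself, one of the many links that make these distributions easier to remember as a family than one at a time.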

Doing vs Owning

Via Lemire

Over 20 years ago, back when I was in high school, I went on a sailboat trip. I was so impressed that I decided to own a sailboat one day. I realized that a sailboat was expensive, and I guess I thought that owning a boat would not only be cool, it would be a symbol of my success.

How did it go? Today, not only do I now own two sailboats, I’m building a third one. Of course, they are radio-controlled boats, about 4 feet tall and 2 feet long. I find that I really like to design and build these little boats.

Pandas


Pandas is a Python library developed by Wes McKinney. The USP of Pandas is that it provides a “data frame” environment in Python. The data frame is one of the most commonly used data structures in R. The first thing that pains you when you start coding in Python after having worked in R is that “there is no readily available data frame object”. Even though there is the NumPy ndarray object, it is not as flexible and extensible as the data frame in R. The Pandas library addresses this problem and provides a DataFrame and a ton of associated functionality that can be used for data munging, data cleaning and interactive data exploration. If you look at random data cleaning code written using Pandas and R, they will look very similar. However, pandas is like R’s data frame on steroids. Pandas also has some preliminary graphing capabilities using matplotlib. I think as the library matures, it will be a default module in any data analyst’s toolkit.
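To give a flavour of that R-like feel, here is a minimal sketch of typical data munging with a pandas DataFrame: build a frame, drop missing values, filter rows, add a derived column and aggregate by group. The column names and values are made up for illustration; the calls used (`pd.DataFrame`, `dropna`, boolean indexing, `assign`, `groupby(...).agg` with named aggregations) are standard pandas, and the comments point to rough R equivalents.

```python
import pandas as pd

df = pd.DataFrame({
    "ticker": ["AAA", "AAA", "BBB", "BBB", "CCC", "CCC"],
    "price": [10.0, 10.5, 20.0, None, 5.0, 5.5],  # None becomes NaN
    "volume": [100, 150, 200, 250, 300, 50],
})

clean = df.dropna(subset=["price"])                      # like df[!is.na(df$price), ]
big = clean[clean["volume"] >= 100]                      # like subset(df, volume >= 100)
big = big.assign(turnover=big["price"] * big["volume"])  # like transform(df, turnover = price * volume)

# Group-wise summaries, roughly what aggregate() or tapply() would do in R
summary = big.groupby("ticker").agg(
    avg_price=("price", "mean"),
    total_turnover=("turnover", "sum"),
)
print(summary)
```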

Algo trading norms

Via Mint :

Announcing the long-awaited guidelines for so-called algorithmic trading, or “algo” as it’s popularly known, India’s capital market regulator on Friday insisted on a stringent and advanced risk management system to provide support for such trades as their speed and volume entail a higher risk. Automated execution logic enables the rapid execution of a large number of trade orders. Stock exchanges have been given three months to complete the approval process for brokers currently executing orders through algos.

Matplotlib for Python Developers : Summary

For any number crunching work, be it making a report, developing a model or doing data diagnostics, visualization of the data is imperative. Be it univariate or multivariate data, visuals help us look beyond the summary statistics or test statistics. For someone working in finance, there is an entire discipline of ‘technical trading’ where buy, sell and stop-loss decisions are made based on visuals. Whether one believes in it or not is a different question altogether. Keeping technical analysis aside, there is an obvious need to look at data, be it histograms, density plots, contour plots, bar plots, boxplots, etc. Tools that churn out these graphics are compulsory in any data analyst’s toolbox. My toolbox contains ggplot2, lattice and base R. I started using the ggplot2 package 4 years ago, and since then I have been using it regularly in my work. Since the output is usually publication ready, one real-life situation where I used ggplot2 visuals was an annual newsletter to investors that reported their portfolio performance. I don’t think anybody cared about what the visuals were saying as long as the portfolio was making money. But using ggplot2 definitely lent a professional look to the newsletter.