Code Kata 7

The last couple of kata have been programming challenges; let’s move back into mushier, people-oriented stuff this week.

This kata is about reading code critically: our own code. Here’s the challenge. Find a piece of code you wrote sometime last year. It should be a decent-sized chunk, perhaps 500 to 1000 lines long. Pick code which isn’t still fresh in your mind.

Now we need to do some acting. Read through this code three times. Each time through, pretend something different. Each time, jot down notes on the stuff you find.

The first time through, pretend that the person who wrote this code is the best programmer you know. Look for all the examples of great code in the program. The second time through, pretend that the person who wrote this code is the worst programmer you know. Look for all the examples of horrible code and bad design. The third (and last) time through, pretend that you’ve been told that this code contains serious bugs, and that the client is going to sue you to bankruptcy unless you fix them. Look for every potential bug in the code.

Now look at the notes you made. What is the nature of the good stuff you found? Would you find similar good stuff in the code you’re writing today? What about the bad stuff; are similar pieces of code sneaking into your current code too? And finally, did you find any bugs in the old code? If so, are any of them things that you’d want to fix now that you’ve found them? Are any of them systematic errors that you might still be making today?

My Workings

OK, here is the code I am going to examine. It is a pairs-model script that I ran in February 2009, titled 1.Model_1_Main_v6_1.R.

Learnings - Code written by the best programmer I know

  • At the beginning of the program, I had listed these packages as required for running the script: fArma, fSeries, TSA, pspline, fUnitRoots, RSQLite, lmtest. I don’t remember all of them properly. fArma was used for something relating to the residuals; fUnitRoots was for checking whether the residuals are stationary; RSQLite was the driver needed to interact with SQLite; lmtest I don’t remember at all.
  • Use of the reshape library to melt the dataset and cast it back accordingly - an interesting way to separate close, high, and low prices from a CSV file.
  • If there are x rows and y columns of data and I want to pivot on two variables, I can use the reshape package; for a pivot on a single variable, stack is enough. In that respect, reshape is a fantastic package: pick the relevant fields that will figure in the pivot, melt the data so that it is organized into all possible combinations of those fields, and then cast it accordingly. If you ever want to build a pivot, use the reshape package (see the first sketch after this list).
  • I had completely forgotten the Phillips-Ouliaris test, a test for the null hypothesis that x is not cointegrated. The tseries library provides po.test, and I had used it to check whether x and y are cointegrated.
  • The script is smart in the sense that, when checking whether two stocks are cointegrated, it runs po.test both ways and checks the strength of the cointegrating relationship in both directions (see the second sketch after this list).
  • unitrootTest, uroot, and urkpssTest are used for checking stationarity.
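
Here is a minimal sketch of the melt-and-cast idiom mentioned above. The data frame and its column names (Date, Symbol, Close, High, Low) are made up for illustration; they are not the ones in the original script.

library(reshape)

# A small long-format price table, invented for this example
prices <- data.frame(
    Date   = rep(as.Date("2009-02-02") + 0:1, each = 2),
    Symbol = rep(c("XX", "YY"), times = 2),
    Close  = c(100, 50, 101, 49),
    High   = c(102, 51, 103, 50),
    Low    = c( 99, 48, 100, 48)
)

# melt: stack the measured variables (Close, High, Low) into long format
molten <- melt(prices, id = c("Date", "Symbol"))

# cast: pivot on Date x Symbol, one wide table per price type
close.wide <- cast(molten[molten$variable == "Close", ], Date ~ Symbol)
high.wide  <- cast(molten[molten$variable == "High", ], Date ~ Symbol)

The same molten data can be cast on any combination of the id fields, which is what makes the package so handy for pivots.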
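
And a minimal sketch of the cointegration check described above, with simulated series in place of the real price data. I use adf.test from tseries on the regression residuals here as a stand-in for the unitrootTest/urkpssTest calls in the original script.

library(tseries)

set.seed(1)
common <- cumsum(rnorm(250))            # shared random-walk component
x <- common + rnorm(250, sd = 0.5)      # two simulated, cointegrated prices
y <- 0.8 * common + rnorm(250, sd = 0.5)

# Phillips-Ouliaris test: the null hypothesis is "not cointegrated".
# Run it both ways, since the ordering of the series matters.
po.xy <- po.test(cbind(x, y))
po.yx <- po.test(cbind(y, x))

# Check that the residual spread of a simple regression is stationary
spread <- residuals(lm(y ~ x))
adf.test(spread)                        # ADF: null hypothesis is a unit root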

Learnings - Code written by the worst programmer I know

  • Populating the trade date as the last column of the data frame and then deleting the last column, without mentioning that it is the trade-date column that is being deleted. I had to rerun the code to understand what it was doing. This is bad programming practice: if you populate a column in a data frame, use it for some purpose, and then delete it, at least refer to the column by name so that people can understand it (see the first sketch after this list).
  • Why write a file with row numbers and then delete them? It is better to write the file with row.names set to FALSE (also covered in the first sketch after this list).
  • Sometimes stack is used, sometimes melt; there is no consistency in how data with row and column names is reshaped.
  • Mixing tabs and spaces - extremely bad programming practice.
  • Redundant code, for example:
> n <- 5
> pair.combinations <- matrix(data = NA, nrow = n * (n + 1)/2,
     ncol = 2)
> rowcount <- 1
> for (i in 1:n) {
     for (j in 1:i) {
         pair.combinations[rowcount, 1] <- i
         pair.combinations[rowcount, 2] <- j
         rowcount <- rowcount + 1
     }
 }
> pair.combinations <- data.frame(pair.combinations)
> colnames(pair.combinations) <- c("i", "j")
  • I should have used functions like outer, strsplit, do.call, and apply to tighten the above code. The twelve lines above can be reduced to about six:
> n <- 5
> pair.combinations <- outer(1:n, 1:n, paste)
> pair.combinations <- cbind(pair.combinations[upper.tri(pair.combinations)])
> pair.combinations <- do.call(rbind, (strsplit(pair.combinations[,
     1], " ")))
> pair.combinations <- t(apply(pair.combinations, 1, as.numeric))
> colnames(pair.combinations) <- c("i", "j")
> print(pair.combinations)
      i j
 [1,] 1 2
 [2,] 1 3
 [3,] 2 3
 [4,] 1 4
 [5,] 2 4
 [6,] 3 4
 [7,] 1 5
 [8,] 2 5
 [9,] 3 5
[10,] 4 5
  • Unnecessary data is being populated in pair.combinations. The goal is to test whether stock X is cointegrated with stock Y, where X and Y must be different, yet the code populates pairs where X and Y are the same and only later removes them. Pathetic! (The last sketch after this list shows how to generate only the distinct pairs in the first place.)
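
Here are the two small fixes mentioned above, as a sketch. df, TradeDate, and results.csv are hypothetical names used only for illustration.

# A hypothetical data frame with a trade-date helper column
df <- data.frame(Close = c(100, 101),
                 TradeDate = as.Date("2009-02-02") + 0:1)

# Unclear: which column is being dropped?
# df <- df[, -ncol(df)]

# Clear: drop the trade-date helper column by name
df$TradeDate <- NULL

# And write the file without the row-number column in the first place
write.csv(df, file = "results.csv", row.names = FALSE)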
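
And the distinct pairs can be generated directly, with nothing to filter out afterwards. This is only a minimal sketch, not what the original script does:

n <- 5
pair.combinations <- t(combn(n, 2))       # only the 10 pairs with i < j
colnames(pair.combinations) <- c("i", "j")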

This experience of looking at the code from two different perspectives is fantastic.

Moving Forward By Looking Back

Perhaps you’re not like me, but whenever I try this exercise I find things that pleasantly surprise me and things that make me cringe in embarrassment. I find the occasional serious bug (along with more frequent, less serious issues). So I try to make a point of looking back at my code fairly frequently. However, doing this six months after you write code is not the best way of developing good software today. So the underlying challenge of this kata is this: how can we get into the habit of critically reviewing the code that we write, as we write it? And can we use the techniques of reading code with different expectations (good coder, bad coder, and bug hunt) when we’re reviewing our colleagues’ code?