Prof. Hadley Wickham, the creator of ggplot2 and other useful packages such as plyr and reshape, has one strong piece of advice for R programmers: “Read other people’s code”. This comes from a person who has developed 30 packages to date. We all have an immense urge to program: code something up, view the results, tweak the code to make it work. Pausing to read somebody else’s source code, however, requires a certain amount of hard work and a willingness to learn from others. In R particularly, where the functions are documented really well, one hardly NEEDS to go into the code. But that is exactly what Hadley Wickham recommends.

In that sense, this book by Zed A. Shaw has a similar message for Python. You have to read code on a regular basis, and it is typically hard work to read what other people have written. I guess that is why the book is titled “Learn Python The Hard Way”. It introduces Python step by step in 52 exercises, with the author pointing to various modules and websites and leaving the reader to figure things out. So all the exercises share one structure: “introduce a topic and make the reader curious enough to check things out from other sources”.

As a newbie, I found this book interesting for a couple of reasons. Firstly, the author urges the reader to type out every single line of code in the book. No copy-pasting is allowed when you are learning something new. The other thing I liked is that the author gives clear instructions for following a directory structure in a Python project. For a long time I never followed any specific directory structure in my projects, whatever the language. Once I learnt Ruby on Rails, however, I understood the advantage of a standardized directory structure for any task, project or library. Not all frameworks make strong recommendations the way Ruby on Rails does, so the programmer has to figure out something that works, usually by trial and error. Starting from a well-thought-out directory structure in Python is going to help in the long run when you want to go back, review the project or commit it to a version control system.

Let me list the things that I learnt from this book.

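  • %d, %s and %r are format specifiers used for substituting values into a string

  • % (...) after the string supplies the variables that are substituted, in order, for the specifiers, as in "..." % (a, b)

  • Learnt about the file methods close, read, write, readline and truncate

  • raw_input is used to get input from the user (this is Python 2; in Python 3 it was renamed input)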

  • Functions look similar to functions defined in Ruby

  • A function definition starts with def

  • The open parenthesis need not come immediately after the function name; a space before it is allowed

  • You can leave spaces after the open parenthesis (

  • You can leave spaces between the closing parenthesis ) and the colon

  • You return from the function body to the surrounding code by writing the next line with no indent. Unlike Ruby, there is no need to put an end at the end of every def

  • The def line must end with a colon

  • All the lines of code in a function must be indented consistently; 4 spaces is the convention

  • Duplicate argument names are not allowed

  • Variables in the script are not connected to variables in the function, and vice versa

  • f.seek(0) takes you back to start of the file

  • f.readline() reads the next line from the current position in the file

  • return at the end of a function can be used to return a value from the function

  • Exercise 23 was very interesting, as it asked me to visit bitbucket.org, browse a random Python project, click on the source and write about whatever I found interesting in it. The exercise says: “When you do this exercise, think of yourself as an anthropologist, trucking through a new land with just barely enough of the local language to get around and survive.” Despite hardly knowing any Python, I went to bitbucket.org, started browsing source files at random and stumbled on the bootstrap-py3k.py file from pyquery. This is what I could make out from the file.

    • You can import several modules in one statement by listing them separated by commas

    • import X imports the module X and creates a reference to that module in the current namespace. In other words, after you have run this statement, you can use X.name to refer to things defined in module X

    • from X import * imports the module X and creates references in the current namespace to all the public objects defined by that module. X itself is not defined, so X.name will not work but name will

    • from X import a, b, c imports the module X and creates references in the current namespace to the given objects

    • try/except is similar to try/catch in Java and other languages

    • Unlike in R, an if statement needs no parentheses or braces, and its condition line ends with a colon

    • for x in list() - This is similar to what you find in R.
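A minimal sketch of the import forms and control-flow syntax noted above, written for Python 3 (the book itself uses Python 2). The modules are picked only for illustration, and safe_sqrt is a hypothetical helper, not something from the pyquery file:

```python
import math, os            # several modules in one statement
from math import sqrt, pi  # bind selected names in this namespace

separator = os.sep         # after "import X", use X.name


def safe_sqrt(value):
    """Return the square root of value, or None if it has none."""
    try:
        return sqrt(value)
    except (ValueError, TypeError):  # comparable to catch in Java
        return None


labels = []
for x in [4, 9, -1]:       # iterate directly over the items
    root = safe_sqrt(x)
    if root is None:       # no braces; the line ends with a colon
        labels.append("undefined")
    elif root > 2:
        labels.append("large")
    else:
        labels.append("small")
```

Here sqrt and pi are usable without the math. prefix, while math.floor or os.sep still need the module name in front.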

  • The program that I randomly browsed was overwhelming. It goes to show the distance I need to walk before I can code properly in Python.

  • beans, jars, crates = secret_formula(start_point) - This is very different from the usual assignment that you see in other languages. In R, something like c(x,y,z) <- test() does not work.

  • I learnt the pop function, which can be used on a list of words

  • Any block of code needs to be indented, by convention with 4 spaces, before the actual statement begins

  • A function definition or an if statement should terminate with a colon

  • if - elif - else: elif is equivalent to else if

  • Format strings: %f for a floating-point number, %d for an integer, %r for the repr() of any value, %s for a string

  • Some of the functions associated with lists: append(x) adds x to the end of the list as a single element, while extend(x) adds all the items of x to the list. Some of the other functions are insert, remove, pop, index, count, sort, reverse

  • You can use lists as stacks: append and pop together mirror the LIFO principle

  • Python lists start from index 0

  • Wrote a program that incorporates all the 100 key words mentioned in the first 36 chapters. I think the author’s suggestion to write such a program is pretty useful for getting up to speed with the syntax

  • In Python 2, any integer literal that begins with 0 is treated as an octal number (Python 3 requires the 0o prefix instead)

  • The built-in number objects in Python support integers, floating-point numbers and complex numbers.

  • All numbers in Python are immutable objects, meaning that when you perform any operation on a number object, you always produce a new number object

  • A long integer has no predefined size; its range is limited only by available memory, whereas the minimum and maximum values of a plain integer are dictated by the machine architecture

  • Floating point in Python is similar to double in C, with 53 bits of precision

  • Always use “ ” for strings so that you can use single quotes within the string

  • Use triple-quoted string for a bigger string. Line breaks in the literal are preserved as new line characters

  • Tuples are like lists, except they are immutable

  • Tuples may contain immutable objects

  • join is an interesting string method. If you say stringvar.join(list[3:5]), it puts stringvar between each of the list items

  • dir(li) returns a list of all the methods of a list.

  • dir(d) returns a list of the names of dictionary methods

  • Lists and dictionaries are the workhorses of Python. They are best utilized as iterables. A list is an ordered collection, whereas a dictionary is an unordered collection (since Python 3.7, dictionaries preserve insertion order)

  • Understood the importance of map operators.

  • Lists can have functions embedded in them.

  • Each method in a class takes self as its first argument

  • Indentation is very appealing. I don’t have to worry about painful curly braces

  • You can assign functions to any variable, and you can put functions in a list.

  • __init__ sets up all the initial variables of the class. It is the constructor function for the class

  • self is the conventional name for the instance inside a class, not a keyword. It is similar to the this pointer in C++

  • To instantiate a dictionary, use either x = dict() or x = {}

  • Use str function to convert numbers to strings

  • Another use of dictionaries is that you can assign some data to a key and be certain that there will be no duplicates. If you add a key that already exists, the data for the last entered item is kept as the data for that key. Say you are reading a lot of items and want the last encountered data for each: you can use a dict for that purpose

  • Names starting AND ending with double underscores work differently: they are reserved for special methods and attributes that Python itself uses.

  • __doc__: print x.__doc__ prints the docstring of the function x.

  • __dict__ contains attributes in a class instance

  • Learnt about the pass keyword. pass is a null operation: when it is executed, nothing happens. It is useful as a placeholder when a statement is required syntactically but no code needs to be executed

  • Learnt the way to use setattr and getattr

  • The takeaways from PEP 8, the Python coding style guide, are:

    • With a hanging indent, arguments on the first line are forbidden; they are allowed only when continuation lines are aligned vertically with the opening delimiter

    • Use spaces or tabs. Don’t mix both. If possible use only spaces

    • Maximum line length is 79 characters for code and 72 for comments and docstrings.

    • Separate top-level function and class definitions with two blank lines

    • Method definitions inside a class are separated by a single blank line

    • Use blank lines in functions, sparingly, to indicate logical sections.

    • Imports should be on separate lines

    • Imports should be at the top of the file

    • Imports should be grouped in the following order : standard library imports, related third party imports, local application/library specific imports

    • Avoid extraneous whitespace:

      • Immediately inside parentheses, brackets or braces

      • Immediately before comma, semicolon, or colon

      • Immediately before the open parenthesis that starts the argument list of a function call

      • Immediately before the open parenthesis that starts an indexing or slicing

      • More than one space around an assignment operator to align it with another

    • Always surround binary operators with a single space on either side

    • Use spaces around arithmetic operators

    • Don’t use spaces around the = sign when it is used to indicate a keyword argument or a default parameter value.

    • Multiple statements in a single line are generally discouraged

    • Module names should have short all-lowercase names

    • Class names use CapWords convention

    • Function names should be lowercase

    • Always use self for the first argument to instance methods

    • Always use cls for the first argument to class method

    • Method names inside the class - use lowercase words separated by underscores

    • Use a leading underscore in a variable name for private variables

    • Use two leading underscores to invoke Python’s name mangling rules

    • Constants are written in CAPITAL_LETTERS with underscores
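Most of the PEP 8 naming and spacing points above fit into one small module. This is only a sketch with invented names: MAX_RETRIES, RetryCounter and attempts_allowed are hypothetical and come neither from the book nor from PEP 8.

```python
"""Hypothetical module laid out along PEP 8 lines."""

MAX_RETRIES = 3  # constants in capitals with underscores


class RetryCounter:
    """Class names use the CapWords convention."""

    def __init__(self, limit=MAX_RETRIES):  # no spaces around = for defaults
        self.limit = limit
        self._attempts = 0  # leading underscore marks a non-public attribute

    def record_attempt(self):
        """Method names are lowercase words separated by underscores."""
        self._attempts += 1
        return self._attempts <= self.limit

    @classmethod
    def from_text(cls, text):  # cls is the first argument of a class method
        return cls(limit=int(text))


def attempts_allowed(counter, tries):
    """Count how many of the given tries stay within the counter's limit."""
    allowed = 0
    for _ in range(tries):
        if counter.record_attempt():
            allowed += 1
    return allowed
```

Two blank lines separate the top-level definitions, one blank line separates the methods, and every binary operator has a single space on either side.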

  • Here are the steps that I followed to install Python packages

    • Downloaded ez_setup.py into the Scripts folder

    • Set the Windows system PATH variable to contain this directory

    • Double-clicked on the ez_setup.py module

    • RPy needs R.dll, so the folder containing it has to be appended to the PATH variable

    • The easy_install executable and other scripts get populated in the Scripts folder

    • Installed pip using easy_install pip

    • Installed numpy

    • Downloaded win32all, the Windows extensions from Mark Hammond

    • Downloaded the RPy installer

    • Installed scipy using easy_install

    • Currently I am using R 2.13, but RPy is built for R 2.12. Hence I had to download R 2.12, add its bin folder to the Windows PATH and then invoke the Python script.

    • Some description of the packages that I have installed

      • pip - used for installing packages that are present in PyPI, the Python Package Index

      • distribute - Is a lower level tool for building, installing and managing Python packages

      • VirtualEnv - Is a tool to create isolated environments for Python

      • Nose - Extends unittest to make testing easier

      • NumPy/SciPy: This pair of libraries provides array and matrix structures, linear algebra routines, numerical optimization, random number generation, statistics routines, differential equation modeling, Fourier transforms etc. Basically you get the entire MATLAB toolkit for free

      • Need to check out these modules sometime – IPython, Cython, PyTables, PyQT, TreeDict, SQLAlchemy, Sage, Enthought, Sphinx. Loooooong way to go !

    • Understanding setup.py 

      • What is it? It is used by Python distutils as a standard way of installing third-party Python modules. Before distutils, module creators had to create install files for all the different platforms. That activity is made redundant, thanks to setup.py.

      • What does this file contain? It contains all the calls Python makes to distutils

      • What happens when we run it? Two things happen. First is the build step, which puts all the files that need to be copied into the distribution root’s build directory. Second is the install step, where all the files are copied to the install directory for installation

  • The steps I carried out in order to install a custom-built Python module are:

    • Make a copy of the default skeleton package and rename it to the project name, XYZ

    • Rename the NAME module to XYZ

    • Rename NAME in all the relevant files

    • Create a rk module and put it in the XYZ folder

    • Edit the setup.py to contain

    • Run python setup.py sdist - This will create a zip file in the dist folder

    • Extract the zip file wherever you want and run python setup.py install

    • The module is now installed, and you can import it in your code and start using it

    • If you have to uninstall this egg, run pip uninstall followed by the name given in setup.py

  • Came across the Hitchhiker’s Guide to Packaging on the net. Will go over it someday at leisure

  • Exercise 46 has a nice introduction to Python packaging. Somehow I find packaging in Python much easier than in R. I should not be comparing two different things, but as an amateur programmer I think it is easy to install R packages; there is nothing additional to install, and a command like install.packages() is all that is needed. For Python it is not as straightforward as it seems: you need to install pip, or an executable, and then use the easy_install command. For packaging, though, I found the steps in Python crystal clear and easy to do, whereas in R I found this stuff a bit difficult to learn for some reason. R CMD makes it easier, but if you want to put in documentation for functions, tests and so on, it will take some time to learn packaging in R. Maybe I am just rambling here. R enthusiasts will dismiss my statement and say that it is far easier to package stuff in R than in any other language. Maybe. But after going through this chapter, creating an installable in Python appears very intuitive and easy. The fact that Dropbox, a startup that has become a fantastic success in recent years, uses Python for everything says a lot about the versatility and usefulness of Python packaging.

  • Some wisdom on unit testing

    • Write one test file for each module you make

    • Keep test cases short

    • Reference to doctests and nosetests. Have to read more about them someday
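The advice above, one test file per module and short test cases, might look like the sketch below. It uses the standard library's unittest rather than nose, which the book uses, and is_whole_number is a hypothetical function invented as the thing under test.

```python
import unittest


def is_whole_number(text):
    """Hypothetical function under test: True if text holds only digits."""
    return text.isdigit()


class IsWholeNumberTest(unittest.TestCase):
    """Each test checks one behaviour and stays short."""

    def test_digits_are_accepted(self):
        self.assertTrue(is_whole_number("2011"))

    def test_letters_are_rejected(self):
        self.assertFalse(is_whole_number("20x1"))


def run_tests():
    """Run the test case programmatically and return the result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(IsWholeNumberTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real project the function would live in its own module and the test class in a matching test file, so that a test runner can discover it.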

  • Learnt about the isdigit function, which can be invoked on a string to check whether it consists only of digits

  • I quickly browsed Chapters 50, 51 and 52, as I don’t think I will be doing any webdev work in the times to come. If I ever need to, I will probably use Ruby on Rails and get it done.
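Several of the language notes in the list above can be tried out in a few lines. Here is a minimal sketch, written for Python 3 rather than the Python 2 used in the book (hence // for integer division). Only the name secret_formula comes from the notes; its body is a stand-in written for this sketch, and the sample values are made up.

```python
def secret_formula(started):
    """Return three related values at once (stand-in body)."""
    jelly_beans = started * 500
    jars = jelly_beans // 1000
    crates = jars // 100
    return jelly_beans, jars, crates


# Tuple unpacking: one call assigns three names.
beans, jars, crates = secret_formula(10000)

# %d formats an integer, %s uses str(), %r uses repr().
summary = "%d beans, %s jars, %r crates" % (beans, jars, "few")

# append adds its argument as one element; extend adds each item.
stack = [1, 2]
stack.append([3, 4])
stack.extend([5, 6])

# append and pop together give last-in, first-out behaviour.
last_item = stack.pop()

# join is called on the separator string.
words = ["learn", "python", "the", "hard", "way"]
joined = "-".join(words[3:5])

# Assigning to an existing key overwrites the earlier value.
last_seen = {}
for key, value in [("a", 1), ("b", 2), ("a", 3)]:
    last_seen[key] = value
```

After this runs, stack still holds the nested list [3, 4] as a single element, and last_seen keeps only the most recent value for the key "a".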

The author concludes the book with a superb reminder to any programmer:

“Which programming language you learn and use doesn’t matter. Do not get sucked into the religion surrounding programming languages as that will only blind you to their true purpose of being your tool for doing interesting things.”