# Python packages to master

**Via Activestate**

http://www.activestate.com/blog/2010/06/must-have-python-packages-finance

**Here are the survey results in order of their top choice rankings:**

- **NumPy** - The fundamental library needed for scientific and financial computing with Python. It contains a powerful N-dimensional array object, advanced array slicing methods, convenient array reshaping methods, and libraries with numerical routines for basic linear algebra functions, basic Fourier transforms and sophisticated random number capabilities.
- **SciPy** - A suite of scientific tools for Python. It depends on the NumPy library, and it gathers a variety of high-level science and engineering modules together as a single package. SciPy provides modules for statistics, optimization, numerical integration, linear algebra, Fourier transforms, signal processing, image processing, genetic algorithms, ODE solvers and special functions.
- **matplotlib** - A numerical plotting library that provides production-quality 2-D numerical plotting functionality in a variety of hardcopy formats and interactive environments across platforms.
- **MySQL for Python** - A pure Python binding for MySQL, allowing the user to integrate MySQL execution into any Python script.
- **PyQt** - A popular Python binding of the cross-platform GUI toolkit Qt.
- **xlrd** - Library for developers to extract data from Microsoft Excel spreadsheet files (also see xlwt).
- **RPy2** - RPy2 is a simple Python interface for R, able to execute any R function from within a Python script.
- **NetworkX** - This tool is used for analyzing network data.
- **SymPy** - SymPy contains nearly all of the same functionality (algebraic evaluation, differentiation, expansion, complex numbers, etc.) as Sage, but is contained in a pure Python distribution.
- **Boost.Python** - This C++ library enables seamless interoperability between C++ and Python.
- **PyMC** - PyMC implements the Metropolis-Hastings algorithm as a Python class, providing flexibility when building your model. PyMC is also highly extensible, and well supported by the community.
- **SimPy** - Short for “Simulation in Python”, an object-oriented, process-based discrete-event simulation language, making it a wholesale agent-based modeling environment written entirely in Python.
- **Pycluster** - This package contains efficient implementations of hierarchical and k-means clustering.
- **NLTK** - Natural Language Toolkit is a suite of libraries and programs for symbolic and statistical natural language processing (NLP) for the Python programming language.

An honorable mention should go to wxPython, a set of Python extension modules that wrap the cross-platform GUI classes from wxWidgets, which received a number of write-in votes.

**Via Drew Conway**

http://www.drewconway.com/zia/?p=204

- NumPy

NumPy, short for Numerical Python, is the cornerstone of Python’s mathematics and statistics operations. All scientific computing in Python starts and ends with NumPy!
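To give a flavor of what that looks like in practice, here is a minimal sketch of the array object, slicing, reshaping, linear algebra and random number features mentioned in the survey summary. The variable names and numbers are made up for illustration, and it assumes a reasonably recent NumPy (1.17 or later for `default_rng`).

```python
import numpy as np

# Build a 1-D array of 12 values and reshape it into a 3x4 matrix.
a = np.arange(12).reshape(3, 4)

# Slice out the second column and the lower-right 2x2 block.
second_column = a[:, 1]      # values 1, 5, 9
lower_right = a[1:, 2:]      # rows 1-2, columns 2-3

# Solve the linear system M x = b with the basic linear algebra routines.
M = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(M, b)    # approximately [0.8, 1.4]

# Draw reproducible random numbers from a seeded generator.
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=1000)
```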

Download NumPy

- SciPy

SciPy, short for Scientific Python, is the little brother of NumPy, as it relies on NumPy data types for its operations. To distinguish itself, SciPy adds several of its own sophisticated data types, as well as integration and optimization techniques. Many of the packages following this one rely on some combination of NumPy and SciPy.
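As a rough illustration of the integration and optimization side, the sketch below integrates a function numerically and minimizes a one-variable quadratic. The functions being integrated and minimized are arbitrary examples chosen because their answers are known in closed form.

```python
import numpy as np
from scipy import integrate, optimize

# Numerically integrate sin(x) from 0 to pi; the exact answer is 2.
area, abs_error = integrate.quad(np.sin, 0.0, np.pi)

# Minimize a simple quadratic whose minimum sits at x = 3 with value 1.
result = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2 + 1.0)
best_x, best_value = result.x, result.fun
```

Note that both routines take and return plain NumPy-compatible numbers, which is the dependency on NumPy described above.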

Download SciPy

- Matplotlib

The third tine on Python’s scientific trident, Matplotlib (pylab) is the standard for 2D plotting. It is highly extensible, and will display your results just the way you like ’em.

Download Matplotlib

- NetworkX

This package is what motivated me to learn Python. This is the best tool for analyzing network data, period. For novice social network analysts and graph theorists, the learning curve will be steep, but taking the time to learn NX will spare you from wasting your time with other, inferior tools. Oh, and for those of you with accreditation concerns, its Subversion repository is maintained by Los Alamos National Laboratory.
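For a sense of how little code a basic network analysis takes, here is a small sketch on a toy friendship graph. The node names and edges are invented for the example, and it assumes a current NetworkX release.

```python
import networkx as nx

# A tiny undirected friendship network (hypothetical data).
G = nx.Graph()
G.add_edges_from([
    ("ann", "bob"),
    ("bob", "cat"),
    ("cat", "dan"),
    ("bob", "dan"),
])

# Number of ties per person.
degrees = dict(G.degree())

# Shortest chain of acquaintances between two people.
path = nx.shortest_path(G, "ann", "cat")

# Degree centrality: each node's degree divided by (number of nodes - 1).
centrality = nx.degree_centrality(G)
```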

Download NetworkX

- PyMC

This one is for all of you Bayesian/MCMC modelers out there. PyMC implements the Metropolis-Hastings algorithm as a Python class, providing flexibility when building your model. PyMC is also highly extensible, and well supported by the community.
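If the algorithm itself is unfamiliar, the following standard-library sketch shows the core idea. It is not PyMC's API: it is a hand-rolled random-walk Metropolis sampler (the symmetric-proposal special case of Metropolis-Hastings), and the function names, step size and target density are all assumptions made for illustration.

```python
import math
import random


def metropolis_hastings(log_density, start, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a 1-D target given its log density."""
    rng = random.Random(seed)
    current = start
    current_logp = log_density(current)
    samples = []
    for _ in range(n_samples):
        # Propose a move from a symmetric Gaussian centered on the current point.
        proposal = current + rng.gauss(0.0, step)
        proposal_logp = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        # The proposal is symmetric, so the Hastings correction cancels.
        log_ratio = proposal_logp - current_logp
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current, current_logp = proposal, proposal_logp
        samples.append(current)
    return samples


def standard_normal_log_density(x):
    """Log density of N(0, 1), up to an additive constant."""
    return -0.5 * x * x


draws = metropolis_hastings(standard_normal_log_density, start=0.0, n_samples=20000)
kept = draws[2000:]  # discard the first draws as burn-in
mean = sum(kept) / len(kept)
variance = sum((d - mean) ** 2 for d in kept) / len(kept)
```

The sample mean and variance should land near 0 and 1 respectively; a library such as PyMC wraps this loop in model-building classes and adds diagnostics on top.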

Download PyMC

- SimPy

Short for “Simulation in Python”, SimPy is an object-oriented, process-based discrete-event simulation language, making it a wholesale agent-based modeling environment written entirely in Python. While not as robust as REPAST or NetLogo, SimPy provides an excellent tool set for designing experiments, and because it is pure Python, the data can be fed to other analytical packages.
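To make "process-based discrete-event simulation" concrete, here is a standard-library sketch of the idea: each process is a generator that yields how long it wants to wait, and a scheduler resumes processes in time order. This is not SimPy's API; the `Clock` class, `run` function and customer timings are hypothetical names and values chosen for the example.

```python
import heapq


class Clock:
    """Holds the current simulated time."""
    now = 0.0


def customer(clock, name, arrival, service, log):
    """A process: wait until arrival, get served, then leave."""
    yield arrival
    log.append((clock.now, name, "arrives"))
    yield service
    log.append((clock.now, name, "leaves"))


def run(clock, processes):
    """Resume each generator at the simulated time it asked to wake up."""
    queue = [(0.0, index, proc) for index, proc in enumerate(processes)]
    heapq.heapify(queue)
    while queue:
        clock.now, index, proc = heapq.heappop(queue)
        try:
            delay = next(proc)  # run the process until its next wait
        except StopIteration:
            continue            # process finished
        heapq.heappush(queue, (clock.now + delay, index, proc))


clock = Clock()
log = []
run(clock, [
    customer(clock, "A", arrival=1.0, service=2.0, log=log),
    customer(clock, "B", arrival=1.5, service=0.5, log=log),
])
```

Because the log is an ordinary Python list, it can be handed straight to the analytical packages above, which is the advantage of a pure Python simulation noted in the text.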

Download SimPy

- SymPy

Not to be confused with the previous entry, SymPy is a full-featured Python library for symbolic mathematics. Oliver suggested I add Sage to the list, which is an excellent tool, but SymPy contains nearly all of the same functionality (algebraic evaluation, differentiation, expansion, complex numbers, etc.) in a pure Python distribution. This package is great for researchers who want symbolic mathematics support but have no access to mega-expensive computer algebra systems like Mathematica.
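A short sketch of the features listed in that parenthetical (expansion, differentiation and complex numbers) might look like this. The expressions are arbitrary examples, and it assumes a current SymPy release.

```python
import sympy as sp

x, y = sp.symbols("x y")

# Expansion: (x + y)**2 becomes x**2 + 2*x*y + y**2.
expanded = sp.expand((x + y) ** 2)

# Differentiation: d/dx of x**2 * sin(x), via the product rule.
derivative = sp.diff(x ** 2 * sp.sin(x), x)

# Complex numbers: x**2 + 1 = 0 has the roots -I and I.
roots = sp.solve(x ** 2 + 1, x)
```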

Download SymPy

**UPDATE**: How to use Python and SymPy to solve optimization problems.

- html5lib

After the fall of BeautifulSoup, I was desperate for a web data parser that equaled soup’s flexibility and ease of use. Enter html5lib. If you need to download and organize large amounts of data from the Internet in a quick and easy way, then html5lib is the only package you will need. This module also supports the BeautifulSoup tree type, as well as many others, making it incredibly useful across a wide range of tasks. To take advantage of its power, you will need a little background in HTML (or XML, if that happens to be what you are parsing), but there are many tutorials available online to get you up to speed quickly.

Download html5lib

- Pycluster

There are many clustering algorithms available for Python, but many of these packages are designed to cluster one-dimensional data. Data collected by social scientists, however, is often of a higher dimension. Enter Pycluster. This package contains efficient implementations of hierarchical and k-means clustering, with several options for measuring distance. I am still waiting for a clever binding to Matplotlib to draw the dendrogram, but in the meantime, you can use their Java program TreeView to display the results.
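For readers who have not met k-means before, here is a bare-bones standard-library sketch on three-dimensional points. It does not use Pycluster: the `kmeans` function, the fixed starting centers and the data are all assumptions for illustration, and Euclidean distance is just one of the distance options a real package would offer. It needs Python 3.8 or later for `math.dist`.

```python
import math


def kmeans(points, centers, iterations=10):
    """Plain k-means: assign each point to its nearest center, then re-average."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: group points by their closest center.
        clusters = [[] for _ in centers]
        for point in points:
            nearest = min(
                range(len(centers)),
                key=lambda i: math.dist(point, centers[i]),
            )
            clusters[nearest].append(point)
        # Update step: move each center to the mean of its members.
        centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else center
            for members, center in zip(clusters, centers)
        ]
    return centers, clusters


# Two well-separated groups of 3-D observations (hypothetical data).
points = [
    (0, 0, 0), (0, 1, 0), (1, 0, 0),
    (10, 10, 10), (10, 11, 10), (11, 10, 10),
]
centers, clusters = kmeans(points, centers=[points[0], points[3]])
```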

Download Pycluster

- cjson

This module implements a very fast JSON encoder/decoder for Python. JSON (JavaScript Object Notation) is useful for many things, but most notably for social scientists, many social networking sites use JSON to encode public data about their users and their users’ relationships. JSON is also what is returned by Google’s SocialGraph API, so cjson allows researchers to feed this social network data directly into Python data types.
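The decode-into-Python-types workflow can be sketched as follows. This uses the standard library's `json` module as a stand-in rather than cjson itself, and the payload is an invented example, not the output of any real API.

```python
import json

# A made-up JSON document describing one user's outgoing ties.
payload = '{"user": "alice", "follows": ["bob", "carol"], "followers": 2}'

# Decoding yields ordinary dicts, lists, strings and ints.
record = json.loads(payload)

# Turn the record into an edge list ready for a network package.
edges = [(record["user"], other) for other in record["follows"]]

# Encoding goes the other way, from Python types back to JSON text.
round_trip = json.dumps(record, sort_keys=True)
```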

Download cjson

- Pyevolve

A complete pure Python genetic algorithm framework. I am wearing my computer science background on my sleeve with this one, but for people serious about designing pure Python agent-based models, Pyevolve provides the tools to create intricate experimental environments.

Download Pyevolve

- MySQL for Python

A pure Python binding for MySQL, allowing the user to integrate MySQL execution into any Python script. Very straightforward and simple to use, and since many social science data sets are stored on MySQL databases, a necessity.

Download MySQL for Python

**Updated 4/6/2009:** I have been negligent: as pointed out in the comments, RPy has functionally been replaced by RPy2.

- RPy2

There are very few statistical calculations that the combination of NumPy and SciPy cannot handle, but there are NO statistical operations R cannot do. RPy2 is a simple Python interface for R, able to execute any R function from within a Python script.

Download RPy2