« back

Installing Python modules on PAWS Internal

Madhu said that the global version of pip is out of data and needs to be updated on a per-user basis.

Upgrading should get you to pip 8+, and then wheels (the new python distribution format) instead of eggs should get installed.

You can update & install within the notebook, but if you prefer to do it in Terminal after SSH'ing to notebook1001.eqiad.wmnet, you can add the path to your ~/.bash_profile:

[[ -r ~/.bashrc ]] && . ~/.bashrc
export PATH=${PATH}:~/venv/bin
export http_proxy=http://webproxy.eqiad.wmnet:8080
export https_proxy=http://webproxy.eqiad.wmnet:8080

Then you can use and upgrade pip:

In [2]:
!pip install --upgrade pip
Downloading/unpacking pip from https://pypi.python.org/packages/b6/ac/7015eb97dc749283ffdec1c3a88ddb8ae03b8fad0f0e611408f196358da3/pip-9.0.1-py2.py3-none-any.whl#md5=297dbd16ef53bcef0447d245815f5144
  Downloading pip-9.0.1-py2.py3-none-any.whl (1.3MB): 1.3MB downloaded
Installing collected packages: pip
  Found existing installation: pip 1.5.6
    Uninstalling pip:
      Successfully uninstalled pip
Successfully installed pip
Cleaning up...

Then we can install (for example):

  • Data
  • Visualization
  • Statistical Modeling and Machine Learning
    • StatsModels for statistical analysis
    • Scikit-Learn for machine learning
    • hyperopt-sklearn for hyper-parameter optimization and finding the best classifier
    • sklearn-pandas provides a bridge between Scikit-Learn's machine learning methods and pandas-style Data Frames
    • PyStan interface to Stan probabilistic programming language for Bayesian inference
    • PyMC3 for Bayesian modeling and probabilistic machine learning
    • Patsy for describing statistical models (especially linear models, or models that have a linear component) and building design matrices. (Patsy brings the convenience of R "formulas" to Python)
    • TensorFlow for machine learning using data flow graphs
    • Edward for probabilistic modeling, inference, and criticism (uses PyMC3 and TensorFlow)
pip install \
    pandas pandas-datareader requests beautifulsoup4 feather-format \ 
    seaborn bokeh \
    statsmodels scikit-learn hyperopt sklearn-pandas pystan pymc3 patsy

Warning: Install TensorFlow v1.0 specifically:

export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.0.0rc2-cp34-cp34m-linux_x86_64.whl
pip install $TF_BINARY_URL
pip install edward

Testing, 1, 2, 3

Let's check that things work!

In [13]:
import seaborn as sns
iris = sns.load_dataset('iris')
sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa
In [14]:
%matplotlib inline
import seaborn as sns; sns.set()
sns.pairplot(iris, hue='species', size=1.5);
/home/bearloga/venv/lib/python3.4/site-packages/matplotlib/font_manager.py:1297: UserWarning: findfont: Font family ['sans-serif'] not found. Falling back to DejaVu Sans
  (prop.get_family(), self.defaultFamily[fontext]))

Updating installed modules

This nifty command comes to us courtesy of rbp at Stack Overflow:

pip freeze --local | grep -v '^\-e' | cut -d = -f 1  | xargs -n1 pip install -U