# Installing Python modules on PAWS Internal

Madhu said that the global version of `pip` is out of date and needs to be updated on a per-user basis.

Upgrading should get you to pip 8+, which should then install wheels (the newer Python distribution format) instead of eggs.

You can upgrade and install from within the notebook, but if you prefer to work in a terminal after SSH'ing to notebook1001.eqiad.wmnet, add the venv path and the proxy settings to your **~/.bash_profile**:

```
[[ -r ~/.bashrc ]] && . ~/.bashrc
export PATH=${PATH}:~/venv/bin
export http_proxy=http://webproxy.eqiad.wmnet:8080
export https_proxy=http://webproxy.eqiad.wmnet:8080
```

Then you can use and upgrade `pip`:

In [2]:

```
!pip install --upgrade pip
```
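To confirm that the upgrade took effect (pip 8 or newer, as mentioned earlier), something like the following sketch could be run in a notebook cell. The helper names `pip_major_version` and `current_pip_version_output` are made up for this illustration, and the parsing assumes the usual `pip X.Y.Z from ...` output format.

```python
import subprocess
import sys


def pip_major_version(version_output):
    """Extract the major version from `pip --version` output.

    Assumes output shaped like 'pip 9.0.1 from /some/path (python 3.4)'.
    """
    version_string = version_output.split()[1]
    return int(version_string.split(".")[0])


def current_pip_version_output():
    """Ask the pip that belongs to the running interpreter for its version."""
    # Using `python -m pip` avoids picking up a different pip on PATH.
    return subprocess.check_output(
        [sys.executable, "-m", "pip", "--version"]
    ).decode()
```

Calling `pip_major_version(current_pip_version_output()) >= 8` should then return `True` once the per-user upgrade has worked.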

Then we can install (for example):

**Data**

- Pandas for analysis-friendly data structures (e.g. Series & DataFrame)
- Pandas Data Reader for remote data access in Pandas
- Requests for HTTP requests
- BeautifulSoup for web-scraping
- Feather for efficiently storing pandas DataFrame objects on disk in an Apache Arrow-based file format.
  **Note**: Feather files can also be read and written from R using the sister R interface (available on CRAN).

**Visualization**

- Seaborn for data visualization (also installs Matplotlib)
- Bokeh for interactive dataviz

**Statistical Modeling and Machine Learning**

- StatsModels for statistical analysis
- Scikit-Learn for machine learning
- hyperopt-sklearn for hyper-parameter optimization and finding the best classifier
- sklearn-pandas provides a bridge between Scikit-Learn's machine learning methods and pandas-style Data Frames
- PyStan interface to Stan probabilistic programming language for Bayesian inference
- PyMC3 for Bayesian modeling and probabilistic machine learning
- Patsy for describing statistical models (especially linear models, or models that have a linear component) and building design matrices. (Patsy brings the convenience of R "formulas" to Python)
- TensorFlow for machine learning using data flow graphs
- Edward for probabilistic modeling, inference, and criticism (uses PyMC3 and TensorFlow)

```
pip install \
pandas pandas-datareader requests beautifulsoup4 feather-format \
seaborn bokeh \
statsmodels scikit-learn hyperopt sklearn-pandas pystan pymc3 patsy
```
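Several of these packages are imported under a different name than the one passed to `pip` (for example `beautifulsoup4` is imported as `bs4`). As a rough sketch, the check below reports which of the packages from the command above cannot be found by the current interpreter. The `IMPORT_NAMES` mapping and the `missing_packages` helper are illustrative names, not part of any library.

```python
import importlib.util

# pip distribution name -> top-level import name
IMPORT_NAMES = {
    "pandas": "pandas",
    "pandas-datareader": "pandas_datareader",
    "requests": "requests",
    "beautifulsoup4": "bs4",
    "feather-format": "feather",
    "seaborn": "seaborn",
    "bokeh": "bokeh",
    "statsmodels": "statsmodels",
    "scikit-learn": "sklearn",
    "hyperopt": "hyperopt",
    "sklearn-pandas": "sklearn_pandas",
    "pystan": "pystan",
    "pymc3": "pymc3",
    "patsy": "patsy",
}


def missing_packages(import_names=IMPORT_NAMES):
    """Return the pip names whose import name cannot be located."""
    # find_spec only looks the module up; it does not import it.
    return sorted(
        dist
        for dist, module in import_names.items()
        if importlib.util.find_spec(module) is None
    )
```

An empty list from `missing_packages()` suggests everything was installed into the environment the notebook kernel is using.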

**Warning**: Install TensorFlow v1.0 specifically (the wheel below is the 1.0.0rc2 CPU build for Python 3.4 on 64-bit Linux), then install Edward:

```
export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.0.0rc2-cp34-cp34m-linux_x86_64.whl
pip install $TF_BINARY_URL
pip install edward
```

Let's check that things work!

In [13]:

```
import seaborn as sns
iris = sns.load_dataset('iris')
iris.head()
```

Out[13]:

In [14]:

```
%matplotlib inline
import seaborn as sns; sns.set()
# Note: seaborn 0.9+ renamed pairplot's `size` argument to `height`
sns.pairplot(iris, hue='species', size=1.5);
```

To upgrade every locally installed package in one go (skipping editable installs), this nifty command comes to us courtesy of rbp at Stack Overflow:

```
pip freeze --local | grep -v '^\-e' | cut -d = -f 1 | xargs -n1 pip install -U
```
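To make the pipeline above easier to follow, here is a minimal Python sketch of its filtering steps: drop editable (`-e`) entries, then keep everything before the first `=` on each line. The function name `upgradable_names` is invented for this example, and it only processes text; it does not call `pip` itself.

```python
def upgradable_names(freeze_output):
    """Mimic `grep -v '^\\-e' | cut -d = -f 1` on `pip freeze` output."""
    names = []
    for line in freeze_output.splitlines():
        line = line.strip()
        # Skip blank lines and editable installs such as '-e git+https://...'
        if not line or line.startswith("-e"):
            continue
        # 'pandas==0.19.2' -> 'pandas'
        names.append(line.split("=")[0])
    return names
```

Each returned name corresponds to one `pip install -U <name>` call that `xargs -n1` would issue.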