Accessing Python

We recommend the Anaconda distribution of Python, though it is not required. You can download it here. Make sure to use Python 3.7, 3.8, or 3.9, not Python 2.*.
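To confirm which interpreter you ended up with, a short check like the following can be run in Python. This is only a sketch: the helper name `is_supported_version` is hypothetical, and the set of accepted versions is simply the three named above.

```python
import sys

# Versions named in the text above; adjust if the course requirements change.
SUPPORTED = {(3, 7), (3, 8), (3, 9)}


def is_supported_version(version_info=sys.version_info):
    """Return True if (major, minor) is one of the supported versions."""
    return (version_info[0], version_info[1]) in SUPPORTED


print("Running Python %d.%d" % (sys.version_info[0], sys.version_info[1]))
print("Supported:", is_supported_version())
```

If the second line prints False, you are probably running a different Python than the one you just installed (for example, a system Python that comes first on your path).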

Once you’ve installed Python, please install the following packages:

- numpy
- scipy
- pandas
- dask
- dask.distributed
- dask.bag
- dask.array
- dask.dataframe
- dask.multiprocessing

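Assuming you installed Anaconda, you should be able to install all of them with the following command (the dask.* entries are components of dask and should come with the conda dask package):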

conda install numpy scipy pandas dask
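After installing, you may want to check that everything imports. The following is a minimal sketch, not an official course script: the helper name `check_packages` is hypothetical, and the module names are taken from the list above. It reports any module that fails to import rather than stopping at the first failure.

```python
import importlib

# Module names taken from the package list above.
REQUIRED = [
    "numpy", "scipy", "pandas", "dask",
    "dask.distributed", "dask.bag", "dask.array",
    "dask.dataframe", "dask.multiprocessing",
]


def check_packages(names):
    """Map each module name to a version string, or None if the import fails."""
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Any failure to import is reported as "not available".
            results[name] = None
        else:
            # Submodules often have no __version__ attribute of their own.
            results[name] = getattr(module, "__version__", "unknown")
    return results


for name, version in check_packages(REQUIRED).items():
    print(name, "->", version if version is not None else "NOT AVAILABLE")
```

Any line ending in NOT AVAILABLE points to a package that still needs to be installed in the Python you are running.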

Access via the SCF

Python from the command line

Once you have your SCF account (which you’ll need for our discussion of parallelization and big data and for PS6), you can run Python or IPython from the UNIX command line as soon as you log in to an SCF server. Just SSH to an SCF Linux machine (e.g., arwen.berkeley.edu or radagast.berkeley.edu) and run ‘python’ or ‘ipython’.

More details on using SSH are here. Note that if you have the Windows Subsystem for Linux with Ubuntu installed, you can use SSH directly from the Ubuntu terminal.

Python via Jupyter notebook

You can use a Jupyter notebook to run Python code from the SCF JupyterHub or the Berkeley DataHub. Select Start My Server. Then, unless you are running long or parallelized code, just click Spawn (in other words, accept the default ‘standalone’ partition). On the next page select ‘New’ and ‘Python 3’.
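As a first cell in the new notebook, something like the following can confirm that the code is running on the server rather than on your own machine. This is a sketch using only the standard library; the function name `session_info` is hypothetical, and what it prints depends on the server you were assigned.

```python
import platform
import sys


def session_info():
    """Collect basic facts about the Python session running this notebook."""
    return {
        "python": platform.python_version(),  # version of the notebook kernel
        "executable": sys.executable,         # path to the Python binary in use
        "host": platform.node(),              # name of the machine running the code
    }


for key, value in session_info().items():
    print(f"{key}: {value}")
```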

To finish your session, click on Control Panel and Stop My Server. Do not click Logout.