I prefer to do my coding in a Jupyter Notebook, as my previous posts have mentioned. However, I have not come across any good documentation on how to get the most out of the notebook with either a Python or an R kernel, so I am going to mention a few helpful hints I have found. Here is the link to the Project Jupyter site.
First, a basic comment on how to create a notebook where you want it. You need to navigate to the directory where you want the notebook to be created. I use the Windows PowerShell command-line shell. When you open it, you start in your home directory. Use the “dir” command to see what is in that directory, and then use the “cd” (change directory) command to navigate to the directory you want to end up in. If it is a longer path, or one that contains spaces, enclose it in quotes. If you need to create a new directory, use the “md” or “mkdir” command. For example, my long path is “….\Jupyter Notebooks\Python Notebooks”, and while at SciPy 2016 I created a new folder, “….\Jupyter Notebooks\Python Notebooks\SciPy16”, to which I added a folder for each tutorial I attended.
Once you are in the final directory, type “jupyter notebook”, and the notebook server will start and open in your browser. The first page that opens is the “Home” page, and if your notebook already exists, you can select it here. If it doesn’t exist yet, select “New” in the upper right, choose your notebook type (for me, R or Python 3), and it will launch the notebook. (This notebook is from a pandas tutorial I attended at SciPy 2016, “Analyzing and Manipulating Data with Pandas” by Jonathan Rocher; it is an excellent presentation if you want to watch the video of it being created.)
Once you click on “pandas_tutorial”, that Jupyter notebook will open up.
A nice feature is that if you clone a GitHub repository into that folder and start a new Jupyter Notebook there, all the files that go with that repository are immediately available for use.
Importing data in a Jupyter Notebook.
If you are tired of hunting down the path for a data set, there is an easy way to find a data set and get it into the directory of the Jupyter notebook. Go to the “Home” page and select “Upload”, and you will be taken to the file upload dialog. Navigate to where you stored the data set on your computer and select it, and it will be added to the file list on the home page. You can then easily load it into any Jupyter notebook that lives in that directory.
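To illustrate why this helps, here is a minimal sketch of reading an uploaded file by name alone. It assumes the notebook's working directory is the directory where the notebook lives, which is the usual default. The file name “example_data.csv” and its contents are made up for illustration, and a temporary folder stands in for the notebook directory so the snippet can run anywhere.

```python
import csv
import tempfile
from pathlib import Path

# Stand-in for the notebook's directory; in a real notebook use Path.cwd().
notebook_dir = Path(tempfile.mkdtemp())

# Hypothetical data set, standing in for a file added through "Upload".
data_file = notebook_dir / "example_data.csv"
data_file.write_text("name,value\na,1\nb,2\n", encoding="utf-8")

# List the CSV files sitting next to the notebook.
csv_files = sorted(p.name for p in notebook_dir.glob("*.csv"))
print(csv_files)

# Read the file with the standard library csv module.
with data_file.open(newline="", encoding="utf-8") as handle:
    rows = list(csv.DictReader(handle))
print(rows[0])
```

If you are working with pandas, the same file could be read with pandas.read_csv(data_file) instead of the csv module.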
Matplotlib figure display options.
If you don’t specify how to display your figures in the Jupyter notebook, then when you create a figure using matplotlib, a separate window will open and display the graph. This window is nice because it is interactive, and you can zoom in on the graph, save it, put labels in, etc. There is a way to get the same interactivity inside the Jupyter notebook.
The first option I learned about was:
%matplotlib inline
This would display the graph in the notebook, but it was no longer interactive.
However, if you use:
%matplotlib notebook
the figures will now show up in the notebook and still be interactive. I learned this during the pandas tutorial at SciPy 2016.
You can also define a figure size once and pass it as the “figsize” argument whenever you create a figure:
LARGE_FIGSIZE = (12, 8)  # for example
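Defining the tuple does not change anything by itself; it has to be handed to matplotlib. The following is a small sketch of two common ways to do that, using made-up sample data. The variable names other than LARGE_FIGSIZE are my own choices for illustration.

```python
import matplotlib.pyplot as plt

LARGE_FIGSIZE = (12, 8)  # width and height in inches

# Option 1: pass the tuple as figsize when creating a figure.
fig, ax = plt.subplots(figsize=LARGE_FIGSIZE)
ax.plot([0, 1, 2, 3], [0, 1, 4, 9])  # made-up sample data
ax.set_title("Example plot")

# Option 2: make it the default size for figures created later in the session.
plt.rcParams["figure.figsize"] = LARGE_FIGSIZE
```

The first option only affects that one figure, while the second changes the default for every figure created afterwards in the same session.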
Some pandas optimization hints
Use:
pandas.set_option()
to set a large number of options. For example:
pandas.set_option("display.max_rows", 16)
and only 16 rows of data will be displayed. There are many options, so just run “pandas.set_option?” in a notebook cell to see what is available.
If you have other useful Jupyter notebook tips, I would love to hear about them.
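As a short sketch of how these settings can be checked and managed, the snippet below sets the option, reads it back, and overrides it temporarily. The value 5 and the variable names are arbitrary choices for illustration.

```python
import pandas

# Limit how many rows pandas prints before truncating the display.
pandas.set_option("display.max_rows", 16)
current_max_rows = pandas.get_option("display.max_rows")

# Temporarily override the option; the old value returns after the block.
with pandas.option_context("display.max_rows", 5):
    rows_inside_context = pandas.get_option("display.max_rows")

rows_after_context = pandas.get_option("display.max_rows")

# Print the documentation and current value for a single option.
pandas.describe_option("display.max_rows")
```

pandas.reset_option("display.max_rows") puts the option back to its default if you want to undo the change.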