Here is a great cheat sheet from the very informative Analytics Vidhya site on using Pandas for Data Exploration.

# Month: December 2015

# Learning R, Jupyter Notebooks, and KNIME

I currently have about a month off between my last course, Math for Modelers, and my next course, Statistical Analysis, in Northwestern University’s Master of Science in Predictive Analytics program. I have started to catch up on some things on my to-do list.

The first of these tasks is getting to know R better. I have a little previous R experience, but only at a very basic level. I am currently working my way through DataCamp’s excellent series of R programming courses. I completed the “Introduction to R” course, which consisted of 6 chapters covering the basics, vectors, matrices, factors, data frames, and lists. It was very informative and well put together. I am almost finished with the “Intermediate R” course, which has 5 chapters covering conditionals and flow control, loops, functions, the apply family, and utilities. I highly recommend these courses for people starting to learn R.

I am also using Jupyter Notebooks as I go through these courses. I just started using them, and I wish I had found them much earlier; I would have done my Python work in them as I went through the Math for Modelers course. I just added the R kernel and have been running the code and taking notes on R in a notebook as I progress through these courses. I wish Northwestern University would consider using notebooks in courses that involve programming.

I am also starting to explore KNIME. I was first introduced to KNIME earlier this year when I visited Dr. Randall Moorman’s predictive monitoring lab at the University of Virginia. They were using KNIME on a very elaborate project, and I was very impressed with the functionality of the platform. KNIME is an “open-source, enterprise-grade analytics platform” that can be used to “discover the potential hidden in their data, mine for fresh insights, or predict new futures”. I am very early in my exploration of this platform, but I am impressed so far and excited to work on a project using it. I will post further updates as I learn more about it.

Lastly, a few words about what I am listening to and reading. I am currently listening to the audio version of “The Master Algorithm” by Pedro Domingos. This is a must-read book for practitioners of predictive analytics and anyone who is interested in machine learning. I am reading the print version of “Superforecasting: The Art and Science of Prediction” by Philip E. Tetlock and Dan Gardner. This is an excellent read as well. I will try to review them in more detail when I am done.

# Northwestern University MSPA 400 Math for Modelers course, final thoughts

I had previously posted my interim thoughts on this course, and now that the course is finished, I thought I would add my final thoughts.

The final examination was fair and covered a mixture of the math and Python. You could certainly pass the course without keeping up with the Python and doing the exercises, but it would be much more difficult.

The last section of the course was on calculus. Weeks 6 and 7 were devoted to a review of differential calculus, and weeks 8 and 9 were devoted to integral calculus. Dr. Goldfeder continued to stress the real-world application of the concepts learned.

We had a week off over the Thanksgiving holiday, which allowed us to catch up and review before the final examination. I took this time to review both the math (a little) and Python (a lot). I went back through each week’s Python assignments to make sure I understood the concepts and could work through the code. I HIGHLY recommend this. Looking back, I wish I had spent more independent time applying Python and writing code to solve the problems as we were going through the course. I encourage future students in this class to attempt to do this.
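For anyone wondering what that kind of practice might look like, here is a minimal sketch. It is not taken from the course materials: the `revenue` function and its coefficients are made up for illustration, and the two helpers are ordinary textbook approximations (a central difference for the derivative and the trapezoid rule for the integral) written with plain Python only.

```python
def derivative(f, x, h=1e-5):
    """Approximate f'(x) with a central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)


def integral(f, a, b, n=1000):
    """Approximate the integral of f from a to b with the trapezoid rule."""
    width = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * width)
    return total * width


def revenue(q):
    """Hypothetical revenue curve, invented for this example."""
    return 100 * q - 2 * q ** 2


# Marginal revenue at q = 10; by hand, R'(q) = 100 - 4q, so R'(10) = 60.
marginal = derivative(revenue, 10)

# Area under the curve from 0 to 10; by hand, 50q^2 - (2/3)q^3 evaluated at 10.
area = integral(revenue, 0, 10)

print(marginal, area)
```

Working a problem by hand first and then checking it against a few lines of code like this is one way to practice the math and the Python together.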

After the class ended, I started catching up on my to-do list, which included learning how to use Jupyter Notebooks. After exploring Jupyter Notebooks further, I wish I had found them earlier. They are very useful for learning to code and taking notes at the same time. I would encourage students to look at them when they start this course. I wish Northwestern University would do what several other universities have done and start teaching the class using these notebooks. This would be extremely useful for the Python part of the course. I have now been brushing up on R using the same Jupyter Notebooks, with an R kernel installed. I plan on using a notebook as I go through my next class, Statistical Analysis, which uses R. Here is the link to Project Jupyter’s webpage.

My overall assessment of the Math for Modelers course is highly positive. I feel that I learned what I set out to learn and got my money’s worth. It is a very demanding class time-wise, but for those interested in analytics, this is a foundational set of knowledge that must be learned.