DataCamp’s Importing Data in Python Part 1 and Part 2.

I recently finished these DataCamp courses and really liked them. I highly recommend them to students in general, and especially to students in Northwestern University’s Master of Science in Predictive Analytics (MSPA) program.

Importing Data in Python Part 1 is described as:

As a Data Scientist, on a daily basis you will need to clean data, wrangle and munge it, visualize it, build predictive models and interpret these models. Before doing any of these, however, you will need to know how to get data into Python. In this course, you’ll learn the many ways to import data into Python: (i) from flat files such as .txts and .csvs; (ii) from files native to other software such as Excel spreadsheets, Stata, SAS and MATLAB files; (iii) from relational databases such as SQLite & PostgreSQL.
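To give a flavor of two of the import paths named above (flat files and relational databases), here is a minimal sketch using only Python's standard library. It is not taken from the course, which may well use other libraries; the sample data, the `scores` table, and the helper names `read_flat_file` and `read_from_sqlite` are hypothetical and exist only for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a real .csv file on disk.
CSV_TEXT = "name,score\nada,91\ngrace,88\n"


def read_flat_file(handle):
    """Read delimited text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(handle))


def read_from_sqlite(rows):
    """Load rows into an in-memory SQLite table, then query them back."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
        conn.executemany(
            "INSERT INTO scores VALUES (?, ?)",
            [(r["name"], int(r["score"])) for r in rows],
        )
        query = "SELECT name, score FROM scores ORDER BY score DESC"
        return conn.execute(query).fetchall()
    finally:
        conn.close()


# csv.DictReader yields every value as a string; conversion is up to the caller.
rows = read_flat_file(io.StringIO(CSV_TEXT))
ranked = read_from_sqlite(rows)
print(rows)
print(ranked)
```

With a real file, `open("some_file.csv", newline="")` would take the place of the `io.StringIO` wrapper, and `sqlite3.connect` would point at a database file rather than `":memory:"`.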

Importing Data in Python Part 2 is described as:

As a Data Scientist, on a daily basis you will need to clean data, wrangle and munge it, visualize it, build predictive models and interpret these models. Before doing any of these, however, you will need to know how to get data into Python. In the prequel to this course, you have already learnt many ways to import data into Python: (i) from flat files such as .txts and .csvs; (ii) from files native to other software such as Excel spreadsheets, Stata, SAS and MATLAB files; (iii) from relational databases such as SQLite & PostgreSQL. In this course, you’ll extend this knowledge base by learning to import data (i) from the web and (ii) a special and essential case of this: pulling data from Application Programming Interfaces, also known as APIs, such as the Twitter streaming API, which allows us to stream real-time tweets.
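As a rough illustration of pulling data from the web, the sketch below fetches a URL and parses a JSON body with the standard library. It is an assumption-laden example rather than course material: the payload is made up, `fetch_json` and `parse_json_body` are hypothetical helper names, and authenticated services such as the Twitter streaming API need credentials and client code that are not shown here.

```python
import json
from urllib.request import urlopen


def parse_json_body(body):
    """Decode a UTF-8 JSON response body into Python objects."""
    return json.loads(body.decode("utf-8"))


def fetch_json(url, timeout=10):
    """Download a URL and parse its body as JSON (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return parse_json_body(response.read())


# Hypothetical payload shaped like a small API response; no request is made here.
SAMPLE_BODY = b'{"results": [{"id": 1, "text": "hello"}, {"id": 2, "text": "world"}]}'

texts = [item["text"] for item in parse_json_body(SAMPLE_BODY)["results"]]
print(texts)
```

Keeping the parsing step separate from the download makes it possible to try the logic on a saved response before pointing `fetch_json` at a live endpoint.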

 

Learning R, Jupyter Notebooks, and KNIME

 

I currently have about a month off between my last course, Math for Modelers, and my next course, Statistical Analysis, in Northwestern University’s Master of Science in Predictive Analytics program. I have started to catch up on some things on my to-do list.

The first of these tasks is getting to know R better. I have a little previous R experience, but only at a very basic level. I am currently working my way through DataCamp’s excellent series of R programming courses. I completed the “Introduction to R” course, which consists of 6 chapters covering the basics, vectors, matrices, factors, data frames, and lists. It was very informative and nicely done. I am almost finished with the “Intermediate R” course, which has 5 chapters covering conditionals and flow control, loops, functions, the apply family, and utilities. I highly recommend these courses for people starting to learn R.

I am also using Jupyter Notebooks as I go through these courses. I just started using them, and I wish I had found them much earlier; I would have done my Python work in them as I went through the Math for Modelers course. I just added the R kernel and have been running the code and taking notes on R in a notebook as I progress through these courses. I wish Northwestern University would consider using notebooks for courses that involve programming.

I am also starting to explore KNIME. I was first introduced to KNIME earlier this year when I visited Dr. Randall Moorman’s predictive monitoring lab at the University of Virginia. They were using KNIME on a very elaborate project, and I was very impressed with the functionality of the platform. KNIME is an “open-source, enterprise-grade analytics platform” that can be used to “discover the potential hidden in their data, mine for fresh insights, or predict new futures”. I am very early in my exploration of the platform, but I like what I have seen so far and am excited to work on a project using it. I will post further updates as I learn more about it.

Lastly, a few words about what I am listening to and reading. I am currently listening to the audio version of “The Master Algorithm” by Pedro Domingos. This is a must-read book for practitioners of predictive analytics and anyone who is interested in machine learning. I am also reading the print version of “Superforecasting: The Art and Science of Prediction” by Philip E. Tetlock and Dan Gardner, which is an excellent read as well. I will try to review both in more detail when I am done.