Data Scientist, Northwestern University MSDS Program, Northwestern University MSPA

Northwestern University’s Masters of Science in Predictive Analytics (MSPA) becomes the Masters of Science in Data Science (MSDS)

Starting in the Spring Quarter of 2018 the MSPA (Masters of Science in Predictive Analytics)  program became the MSDS (Masters of Science in Data Science) program.  This was announced in January of 2018 and the name change become official in the Spring Quarter of 2018.  Existing MSPA students had the options of staying in the MSPA program with it’s requirements, or transferring over to the MSDS program.  I elected to transfer to the MSDS program.  There is a webex on the MSDS program – click here for the webex.

In the webinar, Dr. Thomas Miller, the faculty director of the MSPA and now the MSDS programs, related that Northwestern University’s MSPA program started in the fall of 2011, before the term data science was a widely known or used term.  However, since then it has become mainstream, and has emerged as a discipline in it’s own right.   Therefore the decision to change the name of the program.

Data science was described by Dr. Miller as “an emerging, integrative academic discipline” encompassing Business needs (strategy, management, leadership, communication skills), Modeling (statistics, machine learning, and model building), and Information Technology (databases, etc).  Each of these is covered in the MSDS program.

Dr. Miller also commented that the main programming language moving forward would be Python.   Initially when the program was formed, SAS and SPSS were the main languages.  Python and R were brought in at a later date.   R will still be used in some courses in the Analytics and Modeling Specialization courses.   He did not make it clear whether SAS would still be an option though.

MSDS Program Overview

You need to successfully complete 12 courses.  There are core courses, elective courses, and specialization options.

Core Courses

MSDS 400 – Math for Data Scientists

MSDS 401 – Statistical Analysis

MSDS 402 – Introduction to Data Science

MSDS 420 – Database Systems and Data Preparation

MSDS 422 – Practical Machine Learning

MSDS 460 – Decision Analytics

MSDS 475 – Project Management or MSDS 480 Business Leadership and Communications

MSDS 498 – Capstone or MSDS 590 – Thesis

 

A new elective was created for students with limited programming background:

MSDS 430 – Python for Data Science

Specializations

 

Analytics and Modeling Specialization

Designed for data scientists seeking technical roles as data analysts, applied statisticians, and modelers. Courses focus on statistical inference and applications of predictive models.

Required Courses:

MSDS 410 – Regression and Multivariate Analysis

MSDS 411 – Generalized Linear Models

Plus 2 electives

 

Data Engineering Specialization

Designed for students seeking technical positions focused on designing, developing, implementing, and maintaining systems for data science.

Required Courses:

MSDS 432 – Foundations of Data Engineering

MSDS 434 – Analytics Application Development

Plus 2 electives

 

Analytics Management Specialization

Designed for students seeking technical leadership and data science management positions.

Required Courses:

MSDS 474 – Accounting and Finance for Analytics Managers

MSDS 475 – Project Management

MSDS 480 – Business Leadership and Communications

(Students in this specialization have to take both 475 and 480)

Plus 2 electives

 

*Artificial Intelligence and Deep Learning Specialization

*This has not been officially announced – this information is from comments that Dr. Thomas Miller made during  a MSDS 422 Sync session.  He said that this specialization is being developed – so take these comments as being preliminary.  I personally am really excited about this specialization, as I just finished MSDS 422 – Practical Machine Learning – and realize the growing importance of machine learning now and in the future.

Required Courses:

MSDS 453 – changing from Text Analytics to Natural Language Processing

MSDS 458 – Artificial Intelligence and Deep Learning

Plus 2 electives

These new electives are being created:

Computer Vision

Software Robotics

 

Listing of all current elective courses:

MSDS 410 – Regression Analysis

MSDS 411 – Generalized Linear Models

MSDS 413 – Times Series Analysis and Forecasting

MSDS 430 – Python for Data Science

MSDS 432 – Foundations of Data Engineering

MSDS 434 – Analytics Application Development

MSDS 436 – Analytics Systems Analysis

MSDS 450 – Marketing Analysis

MSDS 451 – Financial and Risk Analytics

MSDS 452 – Web and Network Data Science

MSDS 453 – Text Analytics – soon to become Natural Language Processing

MSDS 454 – Data Visualization

MSDS 456 – Sports Performance Analytics

MSDS 457 – Sports Management Analytics

MSDS 458 – Artificial Intelligence and Deep Learning

MSDS 459 – Information Retrieval and Real-Time Analytics

MSDS 470 – Analytics Entrepreneurship

MSDS 472 – Analytics Consulting

MSDS 474 – Accounting and Finance for Analytics Managers

MSDS 490 – Special Topics in Data Science

 

 

 

 

Machine Learning, Northwestern University MSDS Program, Northwestern University MSPA

Northwestern University MSDS (formerly MSPA) 422 – Practical Machine Learning Course Review

This course was taught by Dr. Thomas Miller, who is the faculty director of the Data Science program (formerly known as the Predictive Analytics program – I am going to post an article discussing the program name change from the Master of Science in Predictive Analytics (MSPA) to the Master of Science in Data Science (MSDS)).  Overall, this was an excellent review of machine learning, and is a required core course for all students in the program.  It is most definitely a foundational course for any student of data science in today’s world.  It is also a foundational course for the Artificial Intelligence and Deep Learning specialization, which is currently being developed (more on this in a subsequent post as well).  The course covers the following topics:

  • Supervised, Unsupervised, and Semi-supervised learning
  • Regression versus Classification
  • Decision Trees and Random Forests
  • Dimensionality Reduction techniques
  • Clustering Techniques
  • Feature Engineering
  • Artificial Neural Networks
  • Deep Neural Networks
  • Convolutional Neural Networks (CNN)
  • Recurrent Neural Networks (RNN)

This course uses Python and the Python Libraries Scikit-Learn and TensorFlow. In addition to using Jupyter Notebooks to run my code, I also learned how to run TensorFlow from the Command Line, which is a faster way of running neural networks through a large number of epochs. The course is currently offered in R as well, but they will be discontinuing the R course, and only offering the Python/TensorFlow course starting in the fall semester.   Dr. Miller commented that they will be using Python much more extensively going forward, especially in the AI/Deep Learning specialization courses.  R apparently will still be offered in the Analytics/Modeling courses – 410 (Regression Analysis) and 411 (Generalized Linear Models).   I did learn to use Python/Scikit-Learn/TensorFlow at an intermediate level, and feel like I have a great foundation to build upon, in terms of programming.

Course Structure

There is required reading every week, mainly from the two required textbooks, although there are a few articles to read as well.  There were a total of 5 sync sessions which reviewed various topics.   I wish the sync sessions had been a little more robust, and covered the current assignments and the coding required to complete the assignments.  I found this very helpful in previous courses.  There were weekly discussion board assignments, which covered basic concepts, and turned out to be very informative, especially since a lot of the topics covered on the final exam were covered in these discussions.  There are weekly assignments which must be completed, in which you either develop the code yourself, or use a skeletal code base provided and build upon it.   These ranged from very easy to very difficult, especially as you moved into the artificial neural networks.  There was a non-proctored final exam and a proctored final exam.

Textbooks

Primary Textbooks:

Géron, A. 2017. Hands-On Machine Learning with Scikit-Learn & TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. Sebastopol, Calif.: O’Reilly. [ISBN-13 978-1-491-96229-9] Source code available at https://github.com/ageron/handson-ml  This was the primary textbook for most of the course.  It is an excellent text with lots of great coding examples.

Müller, A. C. and Guido, S. 2017. Introduction to Machine Learning with Python: A Guide for Data Scientists. Sebastopol, Calif.: O’Reilly. [ISBN-13: 978-1449369415] Code examples at https://github.com/amueller/introduction_to_ml_with_python

Reference Textbook:

Izenman, A. J. 2008. Modern Multivariate Statistical Techniques: Regression, Classification, and Manifold Learning. New York: Springer. [ISBN-13: 978-0-387-78188-4] This was used very little.

Learning Outcomes (from syllabus):

Learning Outcomes Practical Machine Learning is a survey course with a long list of learning outcomes:

  • Explain the learning algorithm trade-offs, balancing performance within training data and robustness on unobserved test data.
  • Distinguish between supervised and unsupervised learning methods.
  • Distinguish between regression and classification problems
  • Explain bootstrap and cross-validation procedures
  • Explore and visualize data and perform basic statistical analysis
  • List alternative methods for evaluating classifiers.
  • List alternative methods for evaluating regression
  • Demonstrate the application of traditional statistical methods for classification and regression
  • Demonstrate the application of trees and random forests for classification and regression
  • Demonstrate principal components for dimension reduction.
  • Demonstrate principal components regression
  • Describe hierarchical and non-hierarchical clustering techniques
  • Describe how semi-supervised learning may be utilized in addressing classification and regression problems
  • Explain how measurement and feature engineering are relevant to modeling
  • Describe how artificial neural networks are constructed from logical connections of artificial neurons and activation functions
  • Demonstrate the use of artificial neural networks (including deep neural networks) in classification and regression
  • Describe how convolutional neural networks are constructed
  • Describe how recurrent neural networks are constructed
  • Distinguish between autoencoders and other forms of unsupervised learning
  • Describe applications of autoencoders
  • Explain how the results of machine learning can be useful to business managers
  • Transform data and research results into actionable insights

 

Weekly Assignments

Here are the weekly learning titles and assignments:

Week 1.  Introduction to Machine Learning

  • Assignment 1. Exploring and Visualizing Data

Week 2.  Supervised Learning for Classification

  • Assignment 2. Evaluating Classification Models

Week 3.  Supervised Learning for Regression

  • Assignment 3. Evaluating Regression Models

Week 4. Trees and Random Forests

  • Assignment 4. Random Forests

Week 5.  Unsupervised Learning

  • Assignment 5. Principal Components Analysis

Week 6. Neural Networks

  • Assignment 6. Neural Networks

Week 7.  Deep Learning for Computer Vision

  • Assignment 7. Deep Learning

Week 8.  Deep Learning for Natural Language Procession

  • Assignment 8 Natural Language Processing

Week 9.  Neural Networks Autoencoders

  • No assignment

 

Final Examinations

There were 2 final examinations, one being non-proctored and the other proctored.  The non-proctored exam was open book, and tested your ability to look at data and the various analytical techniques, and interpret the results of the analyses.  The proctored final exam was closed book and covered general concepts.

Final Thoughts

This was a great overview of some of the more important topics in machine learning.  I was able to get a good theoretical background in these topics, and learned the coding necessary to perform these.   This is a great foundation upon which to add more advanced and in-depth use of these techniques.  This course really challenged me to rethink what analytical techniques I should be learning and applying in the future, to the point that I am going to change my specialization to Artificial Intelligence and Deep Learning.