Northwestern University MSPA, Predictive Analytics

Northwestern University MSPA Predict 410, Regression and Multivariate Analysis Course Review

This was the most demanding of Northwestern University’s MSPA (Master of Science in Predictive Analytics) courses I have taken so far, and also the most rewarding.   This course is the backbone of the predictive analytics program and foundational to becoming a predictive modeler.  The course covers Linear Regression (Simple Linear Regression and Multiple Linear Regression) and Multivariate Analysis (Principal Component Analysis, Factor Analysis, and Cluster Analysis).

This course is an applied course, so you have to understand the mathematics, but don’t have to do in-depth calculations using matrix algebra (it could be much, much worse!).  This is in keeping with the philosophy that the MSPA program is an applied program, preparing students to go out and start working in this field.

I took this course from Professor Syamala Srinivasan, Ph.D.  Northwestern’s professors are already very high quality, and she is at the top of the list.  I would highly recommend her if you are considering this class.  She has gone above and beyond with her textbook lectures, additional lectures on topics of interest, SAS tutorials, SAS demos for each week’s assignments, sync sessions, and responses to questions both on the discussion boards and by e-mail.  Her level of involvement in creating the course work and in teaching the class is phenomenal.  I can’t say enough good things about her.

The course structure is as follows.  Every week there is required reading from a variety of textbooks and articles from the library.  There are PowerPoint lectures with audio for each textbook chapter.  In addition there are usually several other special topic PowerPoint/audio lectures.  A recorded video session then goes over the assignment for the week, and the SAS code used for the assignment.  Participation in the week’s discussion board is mandatory and extremely helpful.  The assignments build upon each other and get more complex.  There are intermittent quizzes.  The final exam has two parts: a take-home exam and an online-proctored exam.

Now for the particulars.   This course isn’t for those who are time challenged already.  I would not recommend taking a second course with this one, unless you have a lot of spare time.  I spent a good 20-30 hours per week on this course, and wished I actually had more time to devote to it.  I read almost every mandatory reading assignment and optional reading assignment, so you could cut corners and devote less time, but I would worry about not learning the content of this foundational material.


Regression Textbooks Required

  1. Montgomery, D.C., Peck, E.A., and Vining, G.G. (2012). Introduction to Linear Regression Analysis. (5th Edition). New York, NY: Wiley [ISBN-13: 978-0470542811]

This textbook is the main one used throughout the course.  It has sections that are difficult to get through, but the foundational material is there.

2.  Everitt, B. (2009). Multivariable Modeling and Multivariate Analysis for the Behavioral Sciences. Boca Raton, FL: CRC Press [ISBN-13: 978-1439807699]

This is a good supplement to the Montgomery textbook, with coding examples – both in the book and on-line – using R.

Regression Textbooks Optional

  1.  Pardoe, I. (2012). Applied Regression Modeling. (2nd Edition). New York, NY: Wiley [ISBN-13: 978-1118097281]

This was my favorite textbook, and is definitely more understandable and is written from an applied standpoint.

2.  Ryan, A. G., Montgomery, D.C., Peck, E.A., and Vining, G.G. (2013). Solutions Manual to Introduction to Linear Regression Analysis. New York, NY: Wiley [ISBN-13: 978-1118471463]

To be honest, I didn’t use this a lot.

3.  Sheather, S. (2009). A Modern Approach to Regression with R. Springer [ISBN-13: 978-1441918727]

I didn’t use this at all, but it will be handy when working through real world problems using R later.


SAS Textbooks

  1. Cody, R. (2011). SAS Statistics By Example. Cary, NC: SAS Publishing. [ISBN-13: 978-1607648000]
  2. Delwiche, L., and Slaughter, S. (2012). The Little SAS Book: A Primer. (5th Edition). Cary, NC: SAS Publishing. [ISBN-13: 978-1612903439]

I used both of these books as references a fair amount.

In addition there were quite a few reference articles in the library.  Some of these were very good, some were very detailed.


This course uses SAS for all analysis and visualization.  You could use R, but the course is built around SAS.  I will say I came into the course with a bias against SAS (mainly from ignorance, but also because of the cost of the license and the general move away from closed systems toward open ones like Python and R; I am a huge Python proponent).  However, I have come to like SAS for how easy it was to learn, and how easy it is to do data analysis and visualization.

It is imperative to start learning SAS before the course starts.  You will get an email and syllabus from Dr. Srinivasan early on, listing what you need to study.  There are SAS tutorials and readings.  I also used the training that SAS itself offers: I completed the on-line SAS Programming 1: Essentials e-Course, which was very helpful.  There are also multiple additional free courses that you can take.

You can use SAS through the SSCC – Social Sciences Computing Cluster (no additional charge), through the web based SAS Studio (no additional charge), or you can purchase a license.  I exclusively used SAS Studio and had no problems.


The Learning Goals of the course are:

  • Develop statistical modeling as a three step process consisting of: (1) exploratory data analysis, (2) model identification, and (3) model validation.
  • Understand how to use automated variable selection as a tool for model identification and as a tool for exploratory data analysis in the presence of a large number of predictor variables or a set of unlabeled predictors.
  • Develop a working understanding of the conceptual (theoretical) foundations of linear regression, principal components analysis, factor analysis, and cluster analysis with the objective of being capable of applying these techniques appropriately and validating their results.
  • Develop a conceptual and practical understanding of the difference between statistical inference and predictive modeling and how it affects our choices and actions in the statistical modeling process.
  • Learn the basics of the SAS Data Step, data manipulation with SAS, and SAS procedures (PROCS) for fitting statistical models.
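
To make the three step process in the first goal concrete, here is a minimal sketch of my own.  It is not course material: the course does this in SAS, and I am using plain Python with only the standard library.  The data are made up, the helper names (fit_simple_ols, rmse) are hypothetical, and a simple hold-out split stands in for whatever validation approach you would actually choose.

```python
import random
import statistics


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def rmse(x, y, intercept, slope):
    """Root mean squared error of the fitted line on the given data."""
    squared = [(yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)]
    return statistics.fmean(squared) ** 0.5


# Made-up data for illustration only: y = 3 + 2x plus Gaussian noise.
rng = random.Random(410)
x = [rng.uniform(0, 10) for _ in range(200)]
y = [3.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]

# Step 1: exploratory data analysis (here, just a few summary statistics).
eda = {
    "x_mean": statistics.fmean(x),
    "y_mean": statistics.fmean(y),
    "y_stdev": statistics.stdev(y),
}

# Step 2: model identification, fitting on a training subset.
train_x, train_y = x[:150], y[:150]
intercept, slope = fit_simple_ols(train_x, train_y)

# Step 3: model validation on the held-out observations.
test_x, test_y = x[150:], y[150:]
holdout_rmse = rmse(test_x, test_y, intercept, slope)

print(eda)
print(f"intercept={intercept:.2f}, slope={slope:.2f}, holdout RMSE={holdout_rmse:.2f}")
```

The point of the split is the inference versus prediction distinction in the fourth goal: the coefficients are estimated on one part of the data, and the error is judged on observations the model never saw.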

Weekly Reading and Video assignments

Each week there are required textbook readings, optional textbook readings, course reserve readings, and lecture videos.  The weekly videos are PowerPoint presentations with audio, and go over the textbook readings for that week.  In addition, there are other lectures on special topics.

The special topics include:

Statistical Preliminaries and Notation

Statistical Assumptions for OLS Regression

Estimation and Inference for OLS Regression

Analysis of Variance and Related topics in OLS Regression

Hat Matrix Lecture

Statistical Inference vs Predictive Modeling in OLS Regression

Special Topic: Dummy Variables Hypothesis Testing

Special Topic Lecture (Degrees of Freedom)

Special Topic Lecture (Likelihood Function)

Special Topic Lecture (Mallows’ Cp)

Hypothesis Testing Multiple Linear Regression

Factor Analysis Example Lecture


Sync Sessions

There are a total of 4 Sync sessions.  These are invaluable as Dr. Srinivasan reviews the recent material, but then puts it all into a larger context.


There are a total of 8 assignments.  These are a combination of using SAS to do analysis and visualization, as well as having to provide an analysis of the produced outcomes.  The code is pretty much already written.  You will have to make a few modifications, but the focus is on using SAS, and the assignments are designed to test your ability to perform regression and multivariate analysis, not struggle producing code from scratch.  This was a very nice feature.  Each week there is a SAS demo video lecture where the Professor runs through the code and the assignment – extremely helpful.

Here are the titles of the assignments:

Assignment 1: Getting to know your data.

Assignment 2: Regression model building.

Assignment 3: Data analysis and regression.

Assignment 4: Statistical inference in linear regression.

Assignment 5: Automated variable selection, multicollinearity, and predictive modeling.

Assignment 6: Principal components in predictive modeling.

Assignment 7: Factor analysis.

Assignment 8: Cluster analysis.


Discussion Boards

These were extremely robust.  You have to answer 3 questions posed by Professor Srinivasan, and then actively engage in discussions around what other people posted.  The questions were relevant, and the discussions that followed enhanced the learning.

Follow up by Dr. Srinivasan

Each week she would send out several e-mails – on how the assignments went, to clarify issues presented in the discussion boards, and to follow up on quizzes.  These were very helpful.

Quizzes and Tests

There were a total of 5 open book quizzes.  These were very doable,  but somewhat demanding.

There were 2 final examinations – a take home exam (1 hour) and a proctored 2 hour exam.  These were challenging but doable.

Final Thoughts

This has been the highlight of the MSPA courses so far, as this course is the foundation for building predictive models.  The other courses I have taken were a lead up to this course.  Dr. Srinivasan has gone above and beyond and delivers a high quality product.  This is my favorite course and professor so far.




Northwestern University’s MSPA (Master of Science in Predictive Analytics) Program review by a recent student graduate.

I ran across this blog posting today by a student who finished the MSPA program.  My Thoughts on Northwestern University’s MSPA is written by Bhaskar Karambelkar, a student who graduated this summer.  He provides a comprehensive overview of the program, and rates each course on Course Content, Professor Engagement, Overall Value to the Program, and Overall Value to Me.  This is well written, and worth a read by anyone considering this program.

This prompted me to look for other bloggers who are in the course or who have finished the course.  I ran across a few who posted once, but did not post any follow up.  If anyone knows of any other active bloggers, please let me know.

The official Northwestern University MSPA site is:

There are two LinkedIn groups that may interest you as well.

The “Northwestern University MS Predictive Analytics” group is “for current students and alumni of the Northwestern MSPA Program”.  There are useful articles posted, and questions posed to the group about which professors to take for the courses, sharing of syllabi, etc. It is very useful to browse when considering which class/professor to take.  There are 2,097 members currently.

The “Networking Group for Northwestern University’s MS in Predictive Analytics Program” is “an open group to allow student’s in Northwestern University’s MS in Predictive Analytics Program to network with each other. The group is open to others, including recruiters, who may be interested in networking with us.  The advantage of having a networking group are three fold. First, it will enable us to have a common communication point without have to be linked directly to each other. Second, it will enable us to have a lasting connection to current, future, and past students. And third, it will enable us to be easily found by recruiters.  Please note that this is not an “alumni” group and that this group has no official affiliation with Northwestern University.”  There are 3,455 members currently, and the content is pretty similar to the other LinkedIn group.





Northwestern University MSPA 402, Intro to Predictive Analytics Review

Summing this course up in one word = WOW.  This course should be taken early on because it is extremely motivating, and will help you get through the other beginning courses such as Math for Modelers and Stats.  This course is a high level overview of why and how analytics should be performed.  It describes not only predictive analytics but the whole analytics spectrum and what it means to be an “analytical competitor”.  While you do not perform any actual analytics, you will understand why getting good at this is so important.

I took this course from Dr. Gordon Swartz, and highly recommend him.  Interestingly, he has bachelor’s degrees in nuclear engineering and political science from MIT, an MBA from Northeastern University and a doctorate in business administration from Harvard.  His sync sessions were very informative and practical, and he provided on-going commentary in the discussion boards.

The course description is –  “This course introduces the field of predictive analytics, which combines business strategy, information technology, and modeling methods. The course reviews the benefits and opportunities of data science, organizational and implementation issues, ethical, regulatory, and compliance issues. It discusses business problems and solutions regarding traditional and contemporary data management systems and the selection of appropriate tools for data collection and analysis. It reviews approaches to business research, sampling, and survey design.”

The course is structured around required textbook reading, assigned articles, assigned videos, weekly discussions, one movie (Moneyball) and 4 projects.


The reading requirements are daunting, but doable.  You will (should) read 6 books in 10 weeks – a total of 1,590 pages.  There are 14 articles to read.  Each week has a short video as well.

These are the assigned books.  At first glance, this list will seem a little odd, with seemingly unrelated books.  However, they all help create the overall picture of analytics, and are all valuable.  I will provide just a brief overview of each, and plan to post more in-depth reviews of them later this summer.

Davenport TH, Harris JG.  2007. Competing on Analytics:  The New Science of Winning.  Boston Massachusetts: Harvard Business School Publishing.

This is the first text you read, for good reason.  It provides the backbone for the course.  You will learn about what it means to be an analytical competitor, how to evaluate an organization’s analytical maturity, and then how to build an analytical capability.

Siegel E.  2013.  Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie or Die.  Hoboken New Jersey: John Wiley & Sons, Inc.

This is a must read for anyone going into predictive analytics, by one of the pioneers of this field.  It describes in detail what predictive analytics is, and gives numerous real life examples of organizations using these predictive models.

Few S.  2013.  Information Dashboard Design: Displaying data for at-a-glance monitoring.  Burlingame California: Analytics Press.

I will admit that when I first got this book I was very confused about why it was being included in a course on predictive analytics.  However, this turned out to be one of the best reads of the course.  For anyone who is in analytics and has to display information, especially in a dashboard format,  this is a must read.  This describes what dashboards are really for, and the science behind creating effective dashboards.  You will never look at a dashboard the same way in the future, and you will be critical of most commercially developed dashboards, as they are more about displaying flashiness and fancy bells and whistles rather than the functional display of pertinent data in the most effective format.  I can’t say enough good things about this book, a classic.

Laursen GHN, Thorlund J.  2010.  Business Analytics for Managers: Taking Business Intelligence Beyond Reporting.  Hoboken New Jersey: John Wiley & Sons, Inc.

This is a great overview of business analytics.  It is especially valuable in its explanation of how analytics needs to support the strategy of the organization.

Franks B.  2012.  Taming the Big Data Tidal Wave: Finding opportunities in huge data streams with advanced analytics.  Hoboken New Jersey: John Wiley & Sons, Inc.

This was an  optional read, but I recommend reading it.  It is written in a very understandable way, and provides a great overview of the big data analytics ecosystem.

Groves RM, Fowler FJ, Couper MP, Lepkowski JM, Singer E, Tourangeau R.  2009.  Survey Methodology.  Hoboken New Jersey: John Wiley & Sons, Inc.

I will admit this was my least favorite book, but having said that, I learned a ton from it.  For anyone who will even think about using surveys to collect data, this is a must read.  However, the 419 pages make this a chore.  It would be nice to have an abridged version.  What it does, though, is wake you up to how complex the process of creating, deploying, and analyzing surveys is.  I grudgingly admit this was a valuable read.


There are some really great articles included in the reading list.


There are videos that were developed by another professor that review the week’s material.  I did not find these especially helpful, but they did provide an overview of the week’s information, and might be helpful if you are having some trouble understanding the material.

Weekly Discussions

Again, the weekly discussions are where it happens.  There are one or more topics that are posted.  There are usually some really great comments posted, and you can gain a lot of insight if you actually think about what you are posting, and what other people have posted.  If you only post a brief paragraph on the last day, then you are missing out on some valuable information.


This is the first course I have taken where a movie (Moneyball) was required.  There are discussions around this movie, and one of the assignments involves creating an analysis of the Oakland A’s and how they used analytics.  I enjoyed the movie and thinking about this.


There are four assignments where you must write papers of varying lengths.  You must write these in the appropriate APA format, so it is useful for refining these skills.

I found these to be challenging, fun, motivating, and extremely enlightening.  These called for the application of what we learned to some real world situations.  For one of these, I performed an in-depth analysis of our organization’s analytics, which involved interviewing our senior leadership.  These interviews really started the process of moving our organization to the next analytical maturity level in a very meaningful way.

Another project involved the creation of a draft dashboard using the best practices outlined by Stephen Few in his text.  This was a great learning experience for me, and one that will translate into much better dashboards at our organization.

The last project involved creating a meaningful and valid survey.  This was informative as well, and I actually might send out my survey.


Overall, this was a fantastic course.  This will make it clear why we need to do this well, and what doing this well looks like.  After this, the actual work of understanding and developing predictive models begins.  Again, I feel as if I got my money’s worth (not an easy thing to say since these courses are pricey!).

Summer Activities

I am taking the summer off and am trying to catch up on the projects that have been piling up.  For fun I am learning SQL (great book – Head First SQL by Lynn Beighley) and working my way through several Python Udemy courses.  I will be attending the SciPy 2016 Conference in Austin Texas in July as well, and am super excited about this. I will be going to tutorials on Network Science, Data Science is software, Time Series analysis and Pandas. If you are attending, give me a shout out.











Northwestern University MSPA 401, Introduction to Statistics Review

I finished this course last week, and thought I would post my thoughts before I forget them.

I was in Professor Roy Sanford’s section, and I HIGHLY recommend him.  He is an extremely experienced practitioner, and very knowledgeable about statistics and about using R for statistical analysis.

The course is focused on several aspects – learning basic statistics, learning R to perform statistical analysis, and engaging the students to participate in discussions that are pertinent to the material being learned.

Learning Statistics

The core text for the course is Ken Black’s Business Statistics For Contemporary Decision Making, 8th Edition.  It is a loose leaf binder text so you can remove the sections you are studying, which makes it nice.  It is a very down to earth text, with plenty of examples and problems.  There is a companion website called WileyPlus that has videos to watch and a variety of problems/exercises.

A second supplemental statistical text is Rand R. Wilcox’s Basic Statistics: Understanding Conventional Methods and Modern Insights.  There are selected readings which highlight some contemporary issues.  Not as easy to read as Black’s text, but still informative.

Learning R

The coursework is presented using R.  You don’t HAVE to learn to use R, but you would be an idiot not to take advantage of this opportunity.  A great deal of effort was put into devising the curriculum to help you learn R.  It is well thought out, and I feel very confident that I have obtained a good working knowledge of R on which to build.  I was astounded to read a comment on the LinkedIn group – Networking Group for Northwestern University’s MS in Predictive Analytics Program – from a previous student who took this course, who commented he didn’t really learn any R because he didn’t do any of the R reading or assignments.  To me, learning R was just as important as learning the statistics.  Plus I don’t know how you could do the Data Analysis Projects without learning R.  Learning R is accomplished through reading various texts, watching weekly videos on R produced by Prof. Sanford, and then doing exercises.  Plus there are R resources and lessons, including links to

I did the work in both RStudio and in a Jupyter Notebook using the R kernel. The Jupyter Notebook was my favorite way of doing the assignments because I could refer back to them.  But some things are way easier to do in RStudio, like installing packages and data sets, so sometimes I switched between the two.  See my other blog posts for information about Jupyter Notebooks.

The first R text is Winston Chang’s R Graphics Cookbook.  This takes you through the R basics and gets you up to speed quickly visualizing data.  There is a little bit about using the base plotting function in R, but most of the book is about visualizing using the ggplot2 package.  If you follow the exercises, you will get good at plotting and visualizing data.  You will learn scatter plots, line graphs, bar graphs, histograms, box plots (a lot – I finally understand what to do with a box plot), functions, QQ plots (I finally understand these as well).  All of these are extremely helpful in what you will spend a lot of time learning, Exploratory Data Analysis (EDA).
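
For anyone who wants to see what is behind those two plots, here is a minimal sketch of my own in plain Python (standard library only), not something from Chang’s book, which draws these with R and ggplot2.  It computes the numbers a box plot is built from, and the point pairs a normal QQ plot would show.  The sample data are made up, the function names are hypothetical, the 1.5 × IQR rule is the usual convention for flagging outliers, and the (i + 0.5) / n plotting position is one common choice among several.

```python
import statistics


def box_plot_summary(values):
    """Quartiles plus the 1.5 * IQR fences commonly used to flag box-plot outliers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }


def qq_points(values):
    """Pair each sorted observation with the matching standard normal quantile."""
    ordered = sorted(values)
    n = len(ordered)
    normal = statistics.NormalDist()
    # Assumed plotting position: (i + 0.5) / n for i = 0 .. n - 1.
    theoretical = [normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


# Made-up sample with one suspiciously large value.
sample = [2.1, 2.4, 2.5, 2.7, 2.9, 3.0, 3.1, 3.3, 3.6, 9.5]

summary = box_plot_summary(sample)
points = qq_points(sample)

print(summary)
for theoretical_q, observed in points:
    print(f"{theoretical_q:6.2f}  {observed:5.2f}")
```

If the observations were roughly normal, the printed pairs would fall close to a straight line when plotted; a point far off the line at one end, like the large value here, is the same observation the box plot fences flag.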

The second R text is Jared P. Lander’s R for Everyone: Advanced Analytics and Graphics.  This dives more deeply into using R for things other than data visualization and graphics, although it includes this as well.  This is a very easy to read and follow text.

The third R text is John Verzani’s Using R for Introductory Statistics: 2nd Edition.  This book is a very deep dive into R’s capability to do statistical analysis.  Although very detailed, it is understandable with great examples.

The last R text is downloadable from the site, Sarah Stowell’s Using R for Statistics.  This is also a very practical book on both statistics and visualization.

Don’t be overwhelmed by the number of texts and the amount of reading.  It is doable, and I would do it all.  If you do that, you will not be able to say you did not get your money’s worth.

In addition there are beginning videos and lessons about learning R, including links to   There are weekly Calculations with R assignments, each of which includes a video with examples.  There are exercises with these weekly assignments as well.  Finally there are R lessons which take you through learning R in an organized manner.

Sync Sessions and Videos

Professor Sanford holds a sync session every other week.  These are extremely informative and helpful.  You don’t have to watch live, but you need to watch later.  The sync sessions in Predict 400 were optional and you could get by fine without watching them.  Not the case here.  You will learn a lot from these.

The same holds for the videos he has created to go along with the weekly R exercises.  These are must watch videos.

Data Analysis Projects

There are two data analysis projects.  You will learn how to apply what you are learning to a hypothetical data analysis project.  These are pretty challenging, but VERY worthwhile.  These show the applied focus of the MSPA program, and I found them beneficial.  The first one really focused on doing some exploratory data analysis.  The second one was twice as long as the first, and you applied what you learned later in the course, including the creation of a linear regression model.  You will definitely want to start early on these, and put in the effort to do these correctly, as together they constitute 2/5 of your grade.

Bi-weekly Tests

There are 4 bi-weekly tests which are very fair and doable.  Together they constitute 1/5 of your grade.

Final Exam

The final exam is also very fair and doable.  Much easier if you have paid attention to learning R, as you can use R to do the exam.  This is 1/5 of your grade.

Communications and Discussions

There are Communications discussion sections set up for statistics and R.  You can post a question anytime in either and get a rapid response from either Prof. Sanford or the R TA.  Our R TA was Todd Peterson, and he was extremely knowledgeable, helpful, and responsive.

Every week there are two discussions around topics you are learning.  These are student driven, and if taken seriously, you can learn a lot from each other.  There are some extremely bright and talented students in these classes who have great real world experience in a variety of sectors.   The final discussion section is a recap of what you learned that week, and Prof. Sanford participates in that discussion.


I spent between 20 and 30 hours per week doing the coursework.  You wouldn’t have to spend that much time, especially if this material is not new for you.  But I wanted to really learn the material, not just pass the class.

I really enjoyed this course on many fronts.  I found learning about statistics and R together was very complementary.  In fact, I cannot imagine doing any kind of statistical analysis without using a language such as R.  I am now trying to recreate what I learned in R using Python.  I really feel as if I got my money’s worth.




Interim Review of Northwestern University’s MSPA Math for Modelers course.

Predict 400, Math for Modelers Course, Northwestern University MSPA

I am going to summarize my experience to date with Northwestern University’s Master of Science in Predictive Analytics program. I am past the halfway point (week 7 of 9) of my first trimester in this program. I am enrolled in one course, Predict 400, Math for Modelers. This is being taught by Professor Philip Goldfeder.

I will first describe the outline of how the course works. This is an asynchronous learning experience, for the most part. We have had one live session with Prof. Goldfeder. The coursework is presented through the online platform called Canvas. There are three main components to the class, which I will describe in greater detail below. The first component is learning the actual math. The second is participating in discussions about questions posed each week by Prof. Goldfeder. The third is learning Python.

What I really love about this program is how it brings together the book work, homework, learning Python, and getting help for problems/questions, into one place. I have been trying to informally do this on my own, and it was frustrating for me to try and learn math/machine learning/etc. using either books or other online courses, learn Python a separate way, and then have difficulty getting my questions answered. It is 1000% easier when this is all rolled into one. Even though this is a lot more expensive than doing it on your own, to me it is worth every penny.

Professor Philip Goldfeder.  He is a great Professor for this course. He received great reviews in the CTEC (Course and Teacher Evaluation Council; these are visible when signing up for classes) and I see why. Not only is he extremely knowledgeable, he is also very engaged with the students and seems genuinely interested in making sure we learn and understand the material. He is also great at challenging the students to think of ways to apply the concepts learned to real world examples. I highly recommend him.

Canvas platform. This is where you go to do everything. It has sections for Announcements, Syllabus, Modules (which describes each week’s assignments and is where you download things), Grades, People (where everyone gets to describe themselves and you get to know your classmates), and Discussions.

The math. The introductory course, Math for Modelers is designed to be a “Review of fundamental concepts from calculus, linear algebra, and probability with a focus upon applications in statistics and predictive modeling. The topics covered will include systems of linear equations and matrices, linear programming, concepts of probability underlying both classical and Bayesian statistics, differential calculus and integration.” This is a very aggressive review of linear algebra, probability, differential calculus and integral calculus. This would be easier for someone who has taken these courses recently, but is challenging for me since it has been decades since I learned this (not sure I really learned some of this the first time around). You are assigned 1-2 chapters a week from the textbook “ Lial, Greenwell and Ritchey (2012). Student Solutions Manual for Finite Mathematics and Calculus with Applications, 9th Ed.”  Prof. Goldfeder prepares a high level video that reviews the material in the chapters. He also posts PowerPoint presentations of the material in each chapter.

Homework. You are then required to complete a homework assignment each week, which covers the material in the chapters. This is typically 20-30 questions. This is completed through the Pearson educational application. This is a FANTASTIC resource. The textbook is online here. Each chapter has its own section, and you can do problems in each sub-chapter of each chapter. If you struggle with the solution, you can actually have the application walk you step by step through the problem, and show you similar problems. There are links out to the textbook that take you right to the section dealing with the problem you are working on. There are also videos available on the topic. I almost always do all of the study problems. The homework is another section in here, and that is how you submit your homework. Homework is worth 25% of your grade.

Discussions. This is a surprisingly difficult section. The NU MSPA program is designed as an applied program, and designed to use real world examples and learning. To that end, Dr. Goldfeder challenges us each week to come up with real world examples or explanations of the material we are learning. To formulate a response to this can take a surprising amount of time if you take it seriously, but in doing so I have learned a lot. The process makes you think about how these concepts could be used in the real world. You are supposed to post your discussion response by the middle of the week so that you can participate in the discussions about what you posted, as well as what your classmates posted. The kicker is that you can’t see what other students have posted until you post your submission. I have learned a lot from these discussions. The other students in the course have such a wide range of backgrounds that they can weigh in on the topics in a meaningful way. We have students with backgrounds in sports analytics, actuarial work, industry, medicine, computer science, etc. The discussions are worth 25% of your grade.

Python. This could be extremely challenging if you have not had any exposure to Python or programming. I knew this would be a challenge, so I took a few Python courses (Codecademy’s Python course and How to Think Like a Computer Scientist) prior to enrolling in the class. Even so, I would still label myself a beginner in Python, and the exercises challenged me to expand my knowledge. I personally think this is one of the most gratifying portions of the course, and I really enjoy combining what we are learning with Python. We cover the basics of Python, creating graphs and plots, and using NumPy and SciPy. This is done through the Enthought Canopy platform, which has the interactive editor, the package manager, and, of great value, the “Training on demand”, a very comprehensive series of instructional videos covering basic and advanced functionality. It is well worth the money just for access to these videos. There is no grade each week for the Python assignments; however, you need to keep up with them. There were questions on the midterm that specifically required the use of Python to analyze the question and display the results. We have a Python TA assigned to the class who is very responsive to questions. In addition, students post code and provide input on any questions.

Tests. The midterm is worth 25% of the grade, as is the final examination. The midterm was a take-home test and required a substantial investment of time to complete. In addition, there was the regular homework and reading for that week, although the discussion that week was optional. This is a week when you would want to cut yourself some slack and allow extra time. I had a heavy work week that week, and regretted not planning ahead to give myself a lighter work schedule.

Time requirement. I am finding that I devote 20-30 hours per week to all of this. You could devote less time if you were more up to date on the math or Python. But remember, I am doing this to learn and retain the information. So I am doing all of the reading in the textbook, all of the example problems and “your turn” problems, and almost all of the chapter problems in the Pearson application. I have not had time to do all the problems in the back of the textbook, however. I also try to provide meaningful input into the discussions, both in my own submission and in commenting on what other students have posted. I have also been trying to continue to dive deeper into learning Python.

Typical week. I usually try to do the textbook reading on Monday and Tuesday. (All of the assignments are due midnight Sunday night, so Monday starts a new week). I don’t do a lot of problems initially as I want to get through the reading, so I can apply it to my discussion. Then on Wednesday I like to start working on my discussion submission and try to get it in by Wednesday, or Thursday at the latest. That way I can participate in the discussions in a meaningful way. After I get my discussion submitted, I go back and work through the chapter problems in Pearson. I like to get to the homework section on Saturday. Ideally I like to have Sunday to do the Python reading and assignments.

My overall assessment of this course is that I am extremely satisfied. I think this is very professionally done, I am learning the math, I am being challenged to think about applying this to the real world, and I am learning Python. There is definitely a lot going on, but that is why I signed up for this. I feel as if I am getting my money’s worth.

Becoming a Healthcare Data Scientist, Predictive Analytics

It’s official now, I have been accepted into Northwestern University’s Master of Science in Predictive Analytics Program

Just a brief update as I attempt to record significant events/thoughts on my journey to become a data scientist.

I just received my official confirmation that I have been accepted into Northwestern University’s Master of Science in Predictive Analytics (MSPA) program.  I am due to start in the fall.  Here is the link if you are unfamiliar with the program.


I selected this program for many reasons. First, it is oriented towards what I really want to do – predictive analytics.  This should give me a solid foundational background upon which I can build the specific skills and specialization I need to develop smarter bedside monitoring systems and help predict patient outcomes.

Second, this is an online curriculum.  At this point in my life I am not easily able to move or change jobs.  I am a practicing Emergency Physician.  I am a Chief Medical Information Officer for my health system.  And I am the Medical Director for our flight service.  I really enjoy all of these pursuits, and am not willing to give them up to move and go back to school full time.  Nor could I with my family obligations.   So the online option works best for me.

Third, Northwestern has a great reputation in this field.  I did my due diligence and researched the program, and the feedback I received was overwhelmingly positive.  As an aside, I did most of my undergraduate degree at Northwestern in the 1980s (I did not graduate, but would love to finally have a degree from them, so this will hopefully allow me to do so!).

So the easy part is over.  I have been accepted.  Now I will have to do the hard work of learning all of the new material and keeping up with the coursework, while still performing my regular day jobs.