Data Science, Data Visualization

Data Science Skill Network Visualization

I came across this great visualization by Ferris Jumah (see Ferris Jumah’s blog post) about the relationships between data science skills listed by “Data Scientists” on their LinkedIn profiles.

[Image: data science skill network]

To view a higher resolution image go to:

How many of these skills have you mastered?

Ferris’s conclusions about a few key themes:

  1. Approach data with a mathematical mindset.
  2. Use a common language to access, explore and model data.
  3. Develop strong computer science and software engineering backgrounds.





Data Science, Data Scientist

Who is Doing What/Earning What in Data Science Infographic

Are you confused yet about the different roles/titles that people can have in the data analytics industry? I think this might help add to your confusion. This is a very nicely done infographic by DataCamp. It is presented for your viewing pleasure and consideration. Where do you fit into this categorization? And does your compensation match your title, your responsibilities, and your usefulness to your organization?



Data Science

Text Cleaning Using Python Infographic

Here is an infographic about using Python for text cleaning from the Analytics Vidhya website.

Here is the link:

In addition to this information, Matt Crowson, the Python TA for my Math for Modelers course at Northwestern, suggested the following as well.

NLTK (Natural Language Toolkit)

scikit-learn
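To give a flavor of what basic text cleaning looks like in Python, here is a minimal sketch that uses only the standard library. It is my own illustration, not taken from the infographic: the function name clean_text and the tiny stopword list are made up for this example, and a real project would more likely lean on NLTK or scikit-learn for tokenizing and stopword handling.

```python
import re
import string

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def clean_text(text):
    """Lowercase the text, strip URLs and punctuation, and drop stopwords."""
    text = text.lower()
    # Remove URLs before punctuation is stripped, so they disappear whole.
    text = re.sub(r"https?://\S+", " ", text)
    # Delete every ASCII punctuation character.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace and filter out the stopwords.
    return [tok for tok in text.split() if tok not in STOPWORDS]


print(clean_text("The QUICK brown fox, visiting http://example.com, is FAST!"))
```

The order of the steps matters a little: removing URLs first keeps fragments such as "httpexamplecom" from being left behind once the punctuation is gone.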


Becoming a Healthcare Data Scientist, Northwestern University MSPA, Predictive Analytics

Interim Review of Northwestern University’s MSPA Math for Modelers course.

Predict 400, Math for Modelers Course, Northwestern University MSPA

I am going to summarize my experience to date with Northwestern University’s Master of Science in Predictive Analytics program. I am past the halfway point (week 7 of 9) of my first trimester in this program. I am enrolled in one course, Predict 400, Math for Modelers. This is being taught by Professor Philip Goldfeder.

I will first describe the outline of how the course works. This is an asynchronous learning experience, for the most part. We have had one live session with Prof. Goldfeder. The coursework is presented through the online platform called Canvas. There are three main components to the class, which I will describe in greater detail below. The first component is learning the actual math. The second is participating in discussions about questions posed each week by Prof. Goldfeder. The third is learning Python.

What I really love about this program is how it brings together the book work, homework, learning Python, and getting help for problems/questions, into one place. I have been trying to informally do this on my own, and it was frustrating for me to try and learn math/machine learning/etc. using either books or other online courses, learn Python a separate way, and then have difficulty getting my questions answered. It is 1000% easier when this is all rolled into one. Even though this is a lot more expensive than doing it on your own, to me it is worth every penny.

Professor Philip Goldfeder. He is a great professor for this course. He received great reviews in the CTEC (Course and Teacher Evaluation Council; these are visible when signing up for classes) and I see why. Not only is he extremely knowledgeable, he is also very engaged with the students and seems genuinely interested in making sure we learn and understand the material. He is also great at challenging the students to think of ways to apply the concepts learned to real world examples. I highly recommend him.

Canvas platform. This is where you go to do everything. It has sections for Announcements, Syllabus, Modules (which describes each week’s assignments and is where you download things), Grades, People (where everyone gets to describe themselves and you get to know your classmates), and Discussions.

The math. The introductory course, Math for Modelers, is designed to be a “Review of fundamental concepts from calculus, linear algebra, and probability with a focus upon applications in statistics and predictive modeling. The topics covered will include systems of linear equations and matrices, linear programming, concepts of probability underlying both classical and Bayesian statistics, differential calculus and integration.” This is a very aggressive review of linear algebra, probability, differential calculus and integral calculus. This would be easier for someone who has taken these courses recently, but it is challenging for me since it has been decades since I learned this (I am not sure I really learned some of this the first time around). You are assigned 1-2 chapters a week from the textbook “Lial, Greenwell and Ritchey (2012). Student Solutions Manual for Finite Mathematics and Calculus with Applications, 9th Ed.” Prof. Goldfeder prepares a high-level video that reviews the material in the chapters. He also posts PowerPoint presentations of the material in each chapter.
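As a small taste of the probability side of the syllabus, here is a sketch of Bayes’ theorem in plain Python. This is my own illustration and not a course exercise: the function name posterior and all of the numbers (1% prevalence, 90% sensitivity, 5% false positive rate) are made up. I use the standard library’s fractions module so the arithmetic stays exact.

```python
from fractions import Fraction


def posterior(prior, sensitivity, false_positive_rate):
    """Return P(condition | positive test) using Bayes' theorem."""
    true_pos = sensitivity * prior                   # P(positive and condition)
    false_pos = false_positive_rate * (1 - prior)    # P(positive and no condition)
    return true_pos / (true_pos + false_pos)


# Made-up inputs: 1% prevalence, 90% sensitivity, 5% false positive rate.
p = posterior(Fraction(1, 100), Fraction(9, 10), Fraction(5, 100))
print(p, float(p))  # exact fraction, then its decimal approximation
```

With these invented inputs the posterior works out to 2/13, or roughly 15%, which is the classic reminder that a positive result on a fairly accurate test can still leave the condition unlikely when it is rare to begin with.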

Homework. You are then required to complete a homework assignment each week, which covers the material in the chapters. This is typically 20-30 questions. This is completed through the Pearson educational application. This is a FANTASTIC resource. The textbook is online here. Each chapter has its own section, and you can do problems in each sub-chapter of each chapter. If you struggle with the solution, you can actually have the application walk you step by step through the problem, and show you similar problems. There are links out to the textbook that take you right to the section dealing with the problem you are working on. There are also videos available to view on the topic. I almost always do all of the study problems. The homework is another section in here, and that is how you submit your homework. Homework is worth 25% of your grade.

Discussions. This is a surprisingly difficult section. The NU MSPA program is designed as an applied program, built around real world examples and learning. To that end, Dr. Goldfeder challenges us each week to come up with real world examples or explanations of the material we are learning. Formulating a response can take a surprising amount of time if you take it seriously, but in doing so I have learned a lot. The process makes you think about how these concepts could be used in the real world. You are supposed to post your discussion response by the middle of the week so that you can participate in the discussions about what you posted, as well as what your classmates posted. The kicker is that you can’t see what other students have posted until you post your submission. I have learned a lot from these discussions. The other students in the course have such a wide range of backgrounds that they can weigh in on the topics in a meaningful way. We have students with backgrounds in sports analytics, actuarial work, industry, medicine, computer science, etc. The discussions are worth 25% of your grade.

Python. This could be extremely challenging if you have not had any exposure to Python or programming. I knew this would be a challenge, so I took a few Python courses (Codecademy’s Python course and How to Think Like a Computer Scientist) prior to enrolling in the class. Even so, I would still label myself a beginner in Python, and the exercises challenged me to expand my knowledge. I personally think this is one of the most gratifying portions of the course. I really enjoy combining what we are learning with Python. We cover the basics of Python, creating graphs and plots, and using NumPy and SciPy. This is done through the Enthought Canopy platform. This has the interactive editor, the package manager, and, of great value, the “Training on demand”, which is a very comprehensive series of instructional videos. These cover basic and advanced functionalities. It is well worth the money, just for access to these videos. There is no grade each week for the Python assignments; however, you need to keep up with these. There were questions on the midterm that specifically required the use of Python to analyze the question and display the results. We have a Python TA assigned to the class who is very responsive to questions. In addition, students post code and help provide input on any questions.
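To show the kind of thing NumPy makes easy for the linear algebra part of the course, here is a short sketch that solves a small system of linear equations. The system itself is made up for illustration and is not one of the course assignments. It assumes NumPy is installed (it ships with distributions like Canopy).

```python
import numpy as np

# Hypothetical system of two equations:
#   2x +  y =  5
#    x - 3y = -8
A = np.array([[2.0, 1.0],
              [1.0, -3.0]])   # coefficient matrix
b = np.array([5.0, -8.0])     # right-hand side

solution = np.linalg.solve(A, b)  # solves A @ [x, y] = b
x, y = solution
print("x =", x, "y =", y)

# Sanity check: plugging the solution back in should reproduce b.
print(np.allclose(A @ solution, b))
```

Working the same system by hand with substitution gives x = 1 and y = 3, so it is a handy way to check your textbook answers.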

Tests. The midterm is worth 25% of the grade as is the final examination. The midterm was a take home test, and required a substantial investment of time to complete. In addition there was the regular homework/reading for that week, although the discussion that week was optional. This is a week when you would want to cut yourself some slack and allow extra time. I had a heavy work week that week, and regretted not thinking about this ahead of time to give myself a lighter work schedule.

Time requirement. I am finding that I am devoting 20-30 hours per week to do all of this. You could devote less time if you were more up to date on the math or Python. But remember, I am doing this to learn and retain the information. So I am doing all of the reading in the textbook, doing all of the example problems and “your turn problems”, and almost all of the chapter problems in the Pearson application. I have not had time to do all the problems in the back of the textbook, however. I also try to provide meaningful input into the discussions, both in my submission and in commenting on what other students have posted. I have also been trying to continue to dive deeper into learning Python.

Typical week. I usually try to do the textbook reading on Monday and Tuesday. (All of the assignments are due midnight Sunday night, so Monday starts a new week). I don’t do a lot of problems initially as I want to get through the reading, so I can apply it to my discussion. Then on Wednesday I like to start working on my discussion submission and try to get it in by Wednesday, or Thursday at the latest. That way I can participate in the discussions in a meaningful way. After I get my discussion submitted, I go back and work through the chapter problems in Pearson. I like to get to the homework section on Saturday. Ideally I like to have Sunday to do the Python reading and assignments.

My overall assessment of this course is that I am extremely satisfied. I think this is very professionally done, I am learning the math, I am being challenged to think about applying this to the real world, and I am learning Python. There is definitely a lot going on, but that is why I signed up for this. I feel as if I am getting my money’s worth.