Becoming a Healthcare Data Scientist, Data Scientist, Healthcare Predictive Analytics, Northwestern University MSPA

Physician Data Scientist Part II. The Why.

I was recently reminded by a reader of my blog (thanks Al) that I had not followed up on a comment that I was going to post a second part to a blog that was posted on 7.7.2015 – “Physician Data Scientists – Why and What Type? Part I“.  Now that I am in between classes, I have the time to work on this.   Looking back at this original post, I am somewhat amazed at all that has happened in the last 1 1/2 years.

I am currently the interim Chief Information Officer (CIO) and Chief Medical Information Officer (CMIO) for our integrated healthcare system.   I stepped into the interim CIO role (helped in part by my Northwestern University MSPA Master of Science in Predictive Analytics coursework) after the departure of our previous CIO last year.  Prior to that I had been one of our systems CMIO’s – facilitating and communicating the needs for technology to help improve clinical outcomes to IT, while communicating back to Physicians and Leadership the limitations of current technologies.  I never really aspired to become either the interim CIO or a CMIO, these opportunities simply arose because of my journey to become better educated about the use of data and analytics to improve clinical outcomes – ie to become a Physician Data Scientist.  I will explain how I ended up in my current role.

My interest in data and analytics is a fairly recent phenomenon, occurring because of a chance meeting with someone who has since become one of my closest friends – Curt Lindberg – who has a PhD in Complexity Science, and is the Director of our Complexity in Healthcare Center.  I met him during a project to improve our process for getting patients into our healthcare system from outside facilities more efficiently.  At that time I was a practicing Emergency Physician and the Medical Director of our MedFlight Air Ambulance service.  Curt introduced me to complexity science and my life has not been the same – it was a transformational career moment for me.  I ended up being part of a small group of researchers who were trying to develop smarter patient monitoring systems.  Their work has inspired me to try and contribute in my own way to this field – called predictive monitoring.

Predictive monitoring is an unofficial term for what this group is trying to accomplish.  While the technology inside the monitors has changed drastically since the 1970’s, what the monitors do has not.  These monitors display certain physiologic markers of interest – blood pressure, pulse rate, temperature, oxygen level, ekg pattern, etc.  You can see what is happening to the patient right at that time, or you can go back and review what happened to them in the past (minimally), but there is no information about predicting what will happen to them in the future (are they predicted to get better, go into sudden cardiac arrest, stop breathing, or develop an overwhelming infection called sepsis, etc).  The goal is to incorporate predictive algorithms into these monitoring systems.

I have been fortunate to meet some giants in this field.  Dr. J. Randall Moorman  from the University of Virginia, who developed the first commercial predictive monitoring system – the HeRO monitor.  The largest ever randomized clinical trial in neonatal patients (premature babies) was conducted using this monitor.  It showed that the monitor was able to identify certain physiological patterns, and translate those patterns into a risk for developing an overwhelming infection (late onset neonatal sepsis).  This risk was detected an average of 18 hours before a clinical diagnosis was made, allowing for earlier treatments and interventions.  This translated into a 22% reduction in mortality.  Dr. Andrew Seely  is a Thoracic Surgeon at the University of Ottawa who has developed a model to predict the success of removing a breathing tube from a patient and not have to replace it because they weren’t ready to have it removed.   We got to participate in that clinical trial.  We also got to participate in a trial conducted by Ryan Arnold, now at Christiana Care in Newark Delaware, on trying to predict clinical outcomes using heart rate variability analyses.

In addition to collaborating with these researchers working on their projects, I became especially fascinated with a research article written by one of the countries leading trauma surgeons, Dr. Mitchell Cohen and his colleagues at San Francisco General Hospital and the University of California San Francisco – Identification of complex metabolic states in critically injured patients using bioinformatic cluster analysis.  I will confess that I felt frustrated when I talked with the researchers about the underlying mathematical concepts and analytical techniques they were using, because I just did not understand them well.  This ignorance ignited what I will freely admit is now an obsession to understand these concepts and techniques.

I started off trying to educate myself using text books, taking on-line MOOC’s – Massive Online Open Courses, and enrolling in courses offered on the web.  I still felt very frustrated because these courses didn’t really go into the depth that I thought I needed.  When I look at the giants in this field of predictive analytics, these few researchers seemed to have both the clinical knowledge and understanding of why this research was so important, and they were also able to understand the mathematical and analytical concepts and techniques necessary to do research in this field.  I wanted to be like them.

I became very interested in becoming a data scientist at that point.  I eventually enrolled in Northwestern University’s Master of Science in Predictive Analytics (MSPA) program.  I have not regretted this decision.  I currently am halfway through the program, and am finally into the especially relevant coursework.  I just finished the major foundational course – Linear Regression and Multivariate Analysis.  The courses up until then had been preparing me to take this course.   I realized I had come full circle when I re-read Mitchell Cohen’s article, and realized that I now finally understood the concepts and results.  That was an extremely satisfying moment for me.

This has been quite the educational journey for me.   I feel like I have a much better understanding of statistics. I am getting somewhat competent in a few programming languages – R, Python, and SAS.  I am using Jupyter Notebooks for my programming work.   I have dabbled with data science platforms like KNIME, and this quarter will be learning to use virtual machines, IBM Watson Analytics, ANGOSS, and Microsoft Azure machine learning – as part of my next class on Generalized Linear Models.

I finally feel as if I am able to start applying what I have been learning for the last 1 1/2 years – to start developing predictive models to improve clinical outcomes.  A few of my goals are to help our organization become more data driven, and to continue to work on developing predictive algorithms that could be incorporated into beside monitoring systems, further improving the outcomes of patients.

This is my journey to date from becoming a practicing Emergency Physician with no interest in data or analytics, to where I am now, halfway finished with my Master’s program.  The real journey of applying what I have learned to real world problems has just started but will get more robust as I learn more.





Becoming a Healthcare Data Scientist, Data Science, Data Scientist, Data Visualization, Northwestern University MSPA, Predictive Analytics

Northwestern University MSPA 402, Intro to Predictive Analytics Review

Summing this course up in one word = WOW.  This course should be taken early on because it is extremely motivating, and will help motivate  you to get through the other beginning courses such as Math for Modelers and Stats.  This course is a high level overview of why and how analytics should be performed.  It describes not only predictive analytics but the whole analytics spectrum and what it means to be an “analytical competitor”.  While you do not perform any actual analytics, you will understand why getting good at this is so important.

I took this course from Dr. Gordon Swartz, and highly recommend him.  Interestingly, he has bachelor degrees in nuclear engineering and political science from MIT, an  MBA from Northeastern University and a doctorate in business administration from Harvard.  His sync sessions were very informative and practical, and he provided on-going commentary in the discussion boards.

The course description is –  “This course introduces the field of predictive analytics, which combines business strategy, information technology, and modeling methods. The course reviews the benefits and opportunities of data science, organizational and implementation issues, ethical, regulatory, and compliance issues. It discusses business problems and solutions regarding traditional and contemporary data management systems and the selection of appropriate tools for data collection and analysis. It reviews approaches to business research, sampling, and survey design.”

The course is structured around required textbook reading, assigned articles, assigned videos, weekly discussions, one movie (Moneyball) and 4 projects.


The reading requirements are daunting, but doable.  You will (should) read 6 books in 10 weeks – a total of 1,590 pages.  There are 14 articles to read.  Each week has a short video as well.

These are the assigned books.  At first glance, this list will not seem to be a little odd with seemingly unrelated books.  However, they all help create the overall picture of analytics, and are all valuable.  I will provide just a brief overview of each, and plan to post more in-depth reviews of them later this summer.

Davenport TH, Harris JG.  2007. Competing on Analytics:  The New Science of Winning.  Boston Massachusetts: Harvard Business School Publishing.

This is the first text you read, for good reason.  It provides the backbone for the course.  You will learn about what it means to be an analytical competitor, how to evaluate an organizations analytical maturity, and then how to build an analytical capability.

Siegel E.  2013.  Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie or Die.  Hoboken New Jersey: John H Wiley and Sons, Inc.

This is a must read for anyone going into predictive analytics, by one of the pioneers of this field.  It describes in detail what predictive analytics is, and gives numerous real life examples of organizations using these predictive models.

Few S.  2013.  Information Dashboard Design: Displaying data for at-a-glance monitoring.  Burlingame California: Analytics Press.

I will admit that when I first got this book I was very confused about why it was being included in a course on predictive analytics.  However, this turned out to be one of the best reads of the course.  For anyone who is in analytics and has to display information, especially in a dashboard format,  this is a must read.  This describes what dashboards are really for, and the science behind creating effective dashboards.  You will never look at a dashboard the same way in the future, and you will be critical of most commercially developed dashboards, as they are more about displaying flashiness and fancy bells and whistles rather than the functional display of pertinent data in the most effective format.  I can’t say enough good things about this book, a classic.

Laursen GHN, Thorlund J.  2010.  Business Analytics for Managers: Taking Business Intelligence Beyond Reporting.  Hoboken New Jersey: John H Wiley and Sons, Inc.

This is a great overview of business analytics.  This is especially valuable in it’s explanation of how the analytics needs to support the strategy of the organization.

Franks B.  2012.  Taming the Big Data Tidal Wave: Finding opportunities in huge data streams with advanced analytics.  Hoboken New Jersey: John H Wiley and Sons, Inc.

This was an  optional read, but I recommend reading it.  It is written in a very understandable way, and provides a great overview of the big data analytics ecosystem.

Groves RM, Fowler FJ, Couper MP, Lepkowski JM, Singer E, Tourangeau R.  2009.  Survey Methodology.  Hoboken New Jersey: John H Wiley and Sons, Inc.

I will admit this was my least favorite book, but having said that, I learned a ton from it.  For anyone who will even think about using survey’s to collect data, this is a must read. However the 419 pages make this a chore.  It would be nice to have an abridged version.  What it does, though, is wake  you up to how complex the process of creating, deploying, and analyzing surveys is.  I grudgingly admit this was a valuable read.


There are some really great articles included in the reading list.


There are videos that were developed by another professor that review the weeks material.  I did not find these especially helpful, but they did provide an overview of the weeks information, and might be  helpful if you are having some trouble understanding the material.

Weekly Discussions

Again, the weekly discussion are where it happens.  There are one or more topics that are posted.  There are usually some really great comments posted, and you can gain a lot of insight if you actually think about what you are posting, and what other people have posted.  If you post on the last day a brief paragraph, then you are missing out on some valuable information.


The first course I have taken where a movie was required.  There are discussions around this movie and one of the assignments involves creating an analysis of the Oakland A’s and how they used analytics.  I enjoyed the movie and thinking about this.


There are four assignments where you must create a paper of varying lengths.  You must create this using the appropriate APA format, so it is useful for refining these skills.

I found these to be challenging, fun, motivating, and extremely enlightening.  These called for the application of what we learned to some real world situations.  For one of these, I performed an in-depth analysis of our organizations analytics which involved interviewing our senior leadership.  As a result of these interviews, it really started the process of moving our organization to the next analytical maturity level in a very meaningful way.

Another project involved the creation of a draft dashboard using the best practices outlined by Stephen Few in his text.  This was a great learning experience for me, and one that will translate into much better dashboards at our organization.

The last project involved creating a meaningful and valid survey.  This was informative as well, and I actually might send out my survey.


Overall, this was a fantastic course.  This will make it clear why we need to do this well, and what doing this well looks like.  After this, the actual work of understanding and developing predictive models begins.  Again, I feel as if got my money’s worth (not an easy thing to say since these courses are pricey!).

Summer Activities

I am taking the summer off and am trying to catch up on the projects that have been piling up.  For fun I am learning SQL (great book – Head First SQL by Lynn Beighley) and working my way through several Python Udemy courses.  I will be attending the SciPy 2016 Conference in Austin Texas in July as well, and am super excited about this. I will be going to tutorials on Network Science, Data Science is software, Time Series analysis and Pandas. If you are attending, give me a shout out.










Data Science, Data Scientist

Who is Doing What/Earning What in Data Science Infographic

Are  you confused yet about the different roles/titles that people can have in the data analytics industry?   I think this might help add to your confusion.  This is a very nicely done infographic by DataCamp (  It is presented for your viewing pleasure and consideration.   Where do you fit into this categorization?  And does your compensation match your title match your responsibilities match your usefulness to your organization?