
Physician Data Scientist Part II. The Why.

A reader of my blog (thanks, Al) recently reminded me that I never followed up on my promise to write a second part to my post of 7.7.2015, “Physician Data Scientists – Why and What Type? Part I”. Now that I am between classes, I have the time to work on it. Looking back at that original post, I am somewhat amazed at all that has happened in the last year and a half.

I am currently the interim Chief Information Officer (CIO) and Chief Medical Information Officer (CMIO) for our integrated healthcare system. I stepped into the interim CIO role (helped in part by my coursework in Northwestern University’s Master of Science in Predictive Analytics program) after the departure of our previous CIO last year. Prior to that I had been one of our system’s CMIOs, communicating to IT the need for technology that helps improve clinical outcomes, while communicating back to Physicians and Leadership the limitations of current technologies. I never really aspired to become either the interim CIO or a CMIO. These opportunities simply arose because of my journey to become better educated about the use of data and analytics to improve clinical outcomes, i.e., to become a Physician Data Scientist. I will explain how I ended up in my current role.

My interest in data and analytics is a fairly recent phenomenon. It began with a chance meeting with someone who has since become one of my closest friends: Curt Lindberg, who has a PhD in Complexity Science and is the Director of our Complexity in Healthcare Center. I met him during a project to make it more efficient to get patients into our healthcare system from outside facilities. At that time I was a practicing Emergency Physician and the Medical Director of our MedFlight Air Ambulance service. Curt introduced me to complexity science and my life has not been the same; it was a transformational career moment for me. I ended up being part of a small group of researchers who were trying to develop smarter patient monitoring systems. Their work has inspired me to try to contribute in my own way to this field, called predictive monitoring.

Predictive monitoring is an unofficial term for what this group is trying to accomplish. While the technology inside the monitors has changed drastically since the 1970s, what the monitors do has not. These monitors display certain physiologic markers of interest: blood pressure, pulse rate, temperature, oxygen level, EKG pattern, etc. You can see what is happening to the patient right now, or you can go back (to a limited extent) and review what happened to them in the past, but the monitors tell you nothing about what will happen to them in the future. Are they predicted to get better, go into sudden cardiac arrest, stop breathing, or develop an overwhelming infection called sepsis? The goal is to incorporate predictive algorithms into these monitoring systems.

I have been fortunate to meet some giants in this field. Dr. J. Randall Moorman of the University of Virginia developed the first commercial predictive monitoring system, the HeRO monitor. The largest ever randomized clinical trial in neonatal patients (premature babies) was conducted using this monitor. It showed that the monitor was able to identify certain physiological patterns and translate those patterns into a risk of developing an overwhelming infection (late-onset neonatal sepsis). This risk was detected an average of 18 hours before a clinical diagnosis was made, allowing for earlier treatments and interventions. This translated into a 22% reduction in mortality. Dr. Andrew Seely is a Thoracic Surgeon at the University of Ottawa who has developed a model to predict whether a breathing tube can be removed from a patient successfully, without having to be replaced because the patient was not ready to have it removed. We got to participate in that clinical trial. We also got to participate in a trial conducted by Ryan Arnold, now at Christiana Care in Newark, Delaware, on predicting clinical outcomes using heart rate variability analyses.

In addition to collaborating with these researchers on their projects, I became especially fascinated with a research article written by one of the country’s leading trauma surgeons, Dr. Mitchell Cohen, and his colleagues at San Francisco General Hospital and the University of California San Francisco: “Identification of complex metabolic states in critically injured patients using bioinformatic cluster analysis”. I will confess that I felt frustrated when I talked with the researchers about the underlying mathematical concepts and analytical techniques they were using, because I just did not understand them well. This ignorance ignited what I will freely admit is now an obsession to understand these concepts and techniques.

I started off trying to educate myself by reading textbooks, taking MOOCs (massive open online courses), and enrolling in other courses offered on the web. I still felt very frustrated because these courses didn’t really go into the depth that I thought I needed. When I looked at the giants in this field of predictive analytics, these few researchers seemed to have both the clinical knowledge to understand why this research was so important and the command of the mathematical and analytical concepts and techniques necessary to do research in this field. I wanted to be like them.

I became very interested in becoming a data scientist at that point. I eventually enrolled in Northwestern University’s Master of Science in Predictive Analytics (MSPA) program. I have not regretted this decision. I am currently halfway through the program, and am finally into the especially relevant coursework. I just finished the major foundational course, Linear Regression and Multivariate Analysis. The courses up until then had been preparing me to take this course. I knew I had come full circle when I re-read Mitchell Cohen’s article and realized that I now finally understood the concepts and results. That was an extremely satisfying moment for me.

This has been quite the educational journey for me. I feel like I have a much better understanding of statistics. I am getting somewhat competent in a few programming languages: R, Python, and SAS. I am using Jupyter Notebooks for my programming work. I have dabbled with data science platforms like KNIME, and this quarter, as part of my next class on Generalized Linear Models, I will be learning to use virtual machines, IBM Watson Analytics, ANGOSS, and Microsoft Azure Machine Learning.

I finally feel as if I am able to start applying what I have been learning for the last year and a half, and to start developing predictive models to improve clinical outcomes. A few of my goals are to help our organization become more data driven, and to continue to work on developing predictive algorithms that could be incorporated into bedside monitoring systems, further improving the outcomes of patients.

This is my journey to date, from being a practicing Emergency Physician with no interest in data or analytics to where I am now, halfway through my Master’s program. The real journey of applying what I have learned to real-world problems has just started, and it will deepen as I learn more.






“The Formula” – great summer reading and some implications for healthcare predictive analytics.

I would like to recommend “The Formula” by Luke Dormehl as a good summer read. I am enjoying this book so far. I think it should be a must-read for everyone interested in predictive analytics and predictive modeling. A few passages from the beginning of the book are provided below.


“Algorithms sort, filter and select the information that is presented to us on a daily basis.”  “… are changing the way that we view … life, the universe, and everything.”

“To make sense of a big picture, we reduce it …  To take an abstract concept such as human intelligence and turn it into something quantifiable, we abstract it further, stripping away complexity and assigning it a seemingly arbitrary number, which becomes a person’s IQ.”

“What is new is the scale that this idea is now being enacted upon, to the point that it is difficult to think of a field of work or leisure that is not subject to algorithmization and The Formula.  This book is about how we reached this point, and how the age of the algorithm impacts and shapes subjects as varied as human creativity, human relationships, notions of identity, and matters of law.”

“Algorithms are very good at providing us with answers in all of these cases.  The real question is whether they give us the answers we want (my emphasis).”

This takes us back to George E.P. Box’s famous quote, “all models are wrong, but some are useful”. We can create algorithms for almost anything, but how useful are they? Accurate models that work really well can be created for deterministic systems, but they are much harder to develop for complex systems. When you strip a feature out of the model of a complex system, you lose the impact of that feature on the system. You try to strip away only those features that have little impact on the performance of the system, but which features those are is often unknowable in advance.
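
As a minimal sketch of this point, the Python snippet below compares a made-up “true” system with a reduced model that has had one feature stripped away. The formula, the feature names a, b and c, and the grid of values are all invented for illustration and do not come from any real clinical system. Feature c acts only through its interaction with a, so freezing it at its average value looks harmless for some cases and is badly wrong for others.

```python
from itertools import product
from statistics import mean

def true_system(a, b, c):
    """Hypothetical 'complex' system: c matters only through its interaction with a."""
    return 2.0 * a + 1.0 * b + 3.0 * a * c

def reduced_model(a, b, c_assumed=0.5):
    """The same formula with feature c stripped away and frozen at its average value."""
    return true_system(a, b, c_assumed)

# Evaluate both on a small grid of made-up feature values.
grid = [0.0, 0.5, 1.0]
errors = [
    abs(true_system(a, b, c) - reduced_model(a, b))
    for a, b, c in product(grid, repeat=3)
]

mean_error = mean(errors)   # average miss across all cases
worst_error = max(errors)   # the miss for the unluckiest case

print(f"mean error: {mean_error:.2f}, worst error: {worst_error:.2f}")
```

The reduced model is exactly right whenever a is zero or c happens to sit at its average, and it is furthest off when both a and c are at their extremes. That is the sense in which the impact of a dropped feature is hard to know in advance.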

One of the great challenges in clinical medicine is trying to determine or predict what is going to happen to a patient in the future. We know generally that smoking is bad, too much alcohol is bad, being overweight is bad, not exercising is bad, not sleeping enough is bad. We know these are bad for the overall population of people. However, we do not know how each of these affects a single patient, nor how they are interrelated. We would like to develop models that can predict what will happen if you have certain conditions (predictive modeling), and then look at what would happen if you took certain courses of action, treatments, or preventative actions (prescriptive modeling). The results of these models would allow clinicians and patients to be better informed and choose the best pathway forward.
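
To make the distinction between predictive and prescriptive modeling concrete, here is a toy Python sketch. Everything in it is an assumption made for illustration: the logistic form, the risk factor names, and especially the weights, which are invented and not derived from any clinical data. The predictive step estimates a risk from a patient’s current risk factors. The prescriptive step re-runs the same model for each change the patient could make and reports which single change lowers the predicted risk the most.

```python
import math

# Illustrative weights only; NOT derived from any clinical data.
WEIGHTS = {
    "smoker": 0.9,
    "heavy_alcohol": 0.5,
    "overweight": 0.4,
    "sedentary": 0.6,
    "poor_sleep": 0.3,
}
INTERCEPT = -3.0

def predict_risk(patient):
    """Predictive step: map risk factors (0 or 1) to a probability via a logistic function."""
    score = INTERCEPT + sum(w * patient.get(name, 0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-score))

def best_single_change(patient):
    """Prescriptive step: try removing each present risk factor and compare predicted risks."""
    options = {}
    for name in WEIGHTS:
        if patient.get(name):
            changed = dict(patient)
            changed[name] = 0
            options[name] = predict_risk(changed)
    if not options:
        return None, options
    return min(options, key=options.get), options

# A hypothetical patient who smokes, does not exercise, and sleeps poorly.
patient = {"smoker": 1, "sedentary": 1, "poor_sleep": 1}
baseline = predict_risk(patient)
factor, options = best_single_change(patient)

print(f"baseline risk: {baseline:.3f}")
print(f"most helpful single change: {factor} -> risk {options[factor]:.3f}")
```

This toy model simply adds the risk factors together, so it ignores exactly the interrelationships described above. A useful clinical model would have to learn those interactions from data.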

Of particular interest to me, I would like to be able to predict in real time what is going to happen to a patient I am seeing in the emergency room. This is a complex situation. Their current state, meaning their physiologic vital signs (level of consciousness, blood pressure, pulse, respiratory rate, temperature, blood oxygen level, respiratory variability, heart rate variability, EKG, etc.) along with their current laboratory and radiological imaging findings, will define their current problem or diagnosis. The patient’s past medical history, medications, allergies, social support, living environment, etc., will have major impacts on how they respond to their current illness or injury. We would like to aggregate all of this information into predictive and prescriptive models that could predict future states. Are the patients safe to be discharged home or do they need to be admitted? If they need to be admitted, can they go to the short stay unit, a bed with cardiac monitoring, or the intensive care unit? Given the current treatment, what will their response to this treatment be: will they get better or worse? Will they develop sepsis? Will they develop respiratory failure and require a tube to be placed down their throat and a ventilator to breathe for them?

A particularly exciting area ripe for development is the internet of things. The internet of things is going to revolutionize how we collect data, both at home and in the hospital. This much-needed capability will allow us to monitor patients at home, detect illnesses much earlier, monitor responses to therapies, etc., and will be useful for a whole host of things we haven’t even imagined yet.

These are some of the complex questions that face us now in medicine. I am excited to participate in this quest to answer some of these vexing questions using all of the analytical tools that are currently available, whether that means “small data” approaches using standard descriptive and inferential statistics, predictive analytics, or big data analytics.