Becoming a Healthcare Data Scientist

Physician Data Scientist – Why and What Type? Part I.

Why would a practicing Emergency Medicine Physician want to become a Data Scientist, and what type of Data Scientist could I become?

I will answer those two questions, starting in this post with what type of Data Scientist I could become, followed in Part II by why I want to become a Data Scientist.

First – What kind of Data Scientist do I see myself becoming?

Types of Data Scientists

I am going to use the framework that Bill Voorhies referenced in his blog post “How to Become a Data Scientist” (http://data-magnum.com/how-to-become-a-data-scientist/). He used the framework developed by Harris, Murphy and Vaisman in their 2013 O’Reilly report “Analyzing the Analyzers: An Introspective Survey of Data Scientists and Their Work”, available for free at http://www.oreilly.com/data/free/analyzing-the-analyzers.csp. They describe four subtypes of Data Scientists – Data Businesspeople, Data Creatives, Data Developers, and Data Researchers. Figure 3-3 of the report shows the skill-set strengths of each group. Below Figure 3-3 I provide a synopsis of how they describe each subtype.

[Figure 3-3 from “Analyzing the Analyzers”: skill-set strengths of each Data Scientist subtype]

Data Businesspeople are most focused on the organization and how data projects yield profit. They are leaders and entrepreneurs. They have technical skills and work with real data. They are the group most likely to have an MBA, and they often have an undergraduate engineering degree.

Data Creatives are seen as the broadest of the Data Scientists, the quintessential Jacks of All Trades. They excel at applying a wide range of tools and technologies to a problem, or at creating innovative prototypes at hackathons. They are seen as Artists. They have substantial business experience.

Data Developers are focused on the technical problem of managing data – how to get it, store it, and learn from it. They write a lot of code, and have substantial computer science backgrounds. They have more of the machine learning/big data skills than the other groups.

Data Researchers have a strong background in statistics, and have an academic background.

What type of Data Scientist do I see myself becoming?

I see myself fitting into two categories – a mix of the “Data Businessperson” and “Data Creative” subtypes. Although it will be easiest to become the Data Businessperson type, I have aspirations of becoming more of a Data Creative, or Jack of All Trades, type as well. I will discuss the different skill sets used in the analysis, where I see my current strengths, and which strengths I need to develop in order to achieve these goals.

In terms of business skills, I have a broad understanding of medicine in general, and of emergency medicine in particular. I also understand the Prehospital Emergency Medical Services environment, having started my career as an EMT-Paramedic, and having served as a Medical Director for several EMS services. I am currently the Medical Director for our Air Ambulance service. In addition, as a Chief Medical Information Officer, I understand the IT needs of clinicians and health care workers, and the technical realities of what IT can deliver. I also serve as the Physician Liaison to our BI/Enterprise Analytics Division. I see my experience and knowledge as a subject matter expert in clinical medicine driving the kinds of research questions that our data science/data analytics team attempts to answer.

I already have a deep interest in developing predictive algorithms that could be incorporated into bedside monitoring technologies to predict future states and detect early clinical deterioration. This information could be used to guide triage decisions for clinicians: is the patient safe to be discharged home, or do they need to be admitted to the hospital? If they need to be admitted, do they need to be in the ICU, or is an unmonitored bed going to be OK? Is the patient predicted to recover uneventfully, or do they have a high probability of deterioration requiring high resource utilization and admission to the ICU? Does a patient at a small rural critical access hospital need to be transferred to a tertiary care facility that might be hundreds of miles away, taking them away from their family support network and exposing them to the dangers and costs of transfer (currently between $25,000 and $75,000), or can they be safely treated at their hometown facility? Will the Internet of Things help us to remotely monitor patients at home, or even in the hospital, to detect either improvement or deterioration before it is clinically apparent, thereby allowing earlier treatments and interventions and improving outcomes? These are some of the important unanswered questions in my mind.

I see programming, or hacking, as my weakest current skill, and I expect it to remain my weakest skill going forward. That is why I will never be a pure Data Creative type. I do want to become competent beyond a basic level, so that I can do some of the work myself and hand off the really complicated code to a true programmer. I am currently learning Python: I have finished the Codecademy course and have almost finished Zed A. Shaw’s “Learn Python the Hard Way”. I know some R as well, mainly for statistical analysis. Having said that, I am a novice coder at best.

I am extremely interested in machine learning and big data.  I would really like to become adept at analyzing big data because I see the potential of this approach in analyzing healthcare data.  This will be a big focus of mine.

I have a basic background in math and statistics, and am actually looking forward to relearning them. I think I will learn a tremendous amount now that I understand the importance of having this background. I am currently working my way through the textbook we will be using in the fall for the math for modelers course.

When you consider all of the factors, my largest skill set is my business or subject matter experience. I think this will allow me to be a better leader in choosing which analytics projects we pursue. Having a good background in what types of analyses are possible, and which types are good for which situations, will help me make better decisions and understand the results. I am hopeful that I will then be able to translate the insights learned into understandable and actionable information that can be presented to the various stakeholders.

I am also hopeful that I can help drive the changes that are needed across the organization, based on the insights learned. That is the basis for the “Learning Health System” concept. A Learning Health System has to be able to capture important data, analyze it, gain insights, diffuse these insights, and rapidly change behavior to incorporate them. Our institution is currently trying to understand the basic concepts of a Learning Health System and to put in place the framework and people necessary to achieve the goals of this system. I hope to contribute to this in a meaningful manner. There are also national initiatives on becoming Learning Health Systems. The Learning Health Community (http://www.learninghealth.org/home/) is an excellent resource listing core values and some of the organizations also working on this goal.

In my next post, I will answer the question of why I want to become a data scientist.

Healthcare Predictive Analytics

“The Formula” – great summer reading and some implications for healthcare predictive analytics.

I would like to recommend “The Formula” by Luke Dormehl for a good summer read.   I am enjoying this book so far.  I think it should be a must read for all of those interested in predictive analytics and predictive modelling.  A couple of passages from the beginning of the book are provided below.

“Algorithms sort, filter and select the information that is presented to us on a daily basis.”  “… are changing the way that we view … life, the universe, and everything.”

“To make sense of a big picture, we reduce it …  To take an abstract concept such as human intelligence and turn it into something quantifiable, we abstract it further, stripping away complexity and assigning it a seemingly arbitrary number, which becomes a person’s IQ.”

“What is new is the scale that this idea is now being enacted upon, to the point that it is difficult to think of a field of work or leisure that is not subject to algorithmization and The Formula. This book is about how we reached this point, and how the age of the algorithm impacts and shapes subjects as varied as human creativity, human relationships, notions of identity, and matters of law.”

“Algorithms are very good at providing us with answers in all of these cases.  The real question is whether they give us the answers we want (my emphasis).”

This takes us back to George E. P. Box’s famous quote, “all models are wrong, but some are useful”. We can create algorithms for almost anything, but how useful are they? Accurate models can be created that work really well on deterministic systems, but they are much harder to develop for complex systems. As you strip away features from the complex system being studied, you lose the impact of those features on the system. You try to strip away only features that do not have a huge impact on the performance of the system, but this is often unknowable in advance.
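
To make the point about stripped-away features concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the toy “system”, its two features, and the coefficients 2.0 and 0.5 are hypothetical and have nothing to do with real clinical data. It compares a model that sees both features with a reduced model that replaces the second feature with its average value.

```python
import random


def simulate(n=200, seed=0):
    """Generate a toy system whose outcome depends on two features."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.uniform(0, 10)
        x2 = rng.uniform(0, 10)
        y = 2.0 * x1 + 0.5 * x2  # hypothetical "true" behavior of the system
        rows.append((x1, x2, y))
    return rows


def mean_abs_error(rows, predict):
    """Average absolute difference between the outcome and a model's prediction."""
    return sum(abs(y - predict(x1, x2)) for x1, x2, y in rows) / len(rows)


def full_model(x1, x2):
    """Model that keeps both features."""
    return 2.0 * x1 + 0.5 * x2


def reduced_model(x1, x2):
    """Model with x2 stripped away; it assumes x2 sits at its average of 5.0."""
    return 2.0 * x1 + 0.5 * 5.0


rows = simulate()
full_error = mean_abs_error(rows, full_model)
reduced_error = mean_abs_error(rows, reduced_model)
print(f"error with both features: {full_error:.2f}")
print(f"error with x2 removed:    {reduced_error:.2f}")
```

In this toy case the reduced model is still useful, but it is wrong by an amount that depends entirely on how much the dropped feature matters. In a real complex system that is exactly what we often cannot know in advance.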

One of the great challenges in clinical medicine is trying to determine or predict what is going to happen to a patient in the future. We know generally that smoking is bad, too much alcohol is bad, being overweight is bad, not exercising is bad, not sleeping enough is bad. We know these are bad for the overall population. However, we do not know how each of these affects a single patient, nor how they are interrelated. We would like to develop models that can predict what will happen if you have certain conditions (predictive modeling), and then look at what would happen if you took certain courses of action, treatments, or preventative actions (prescriptive modeling). The results of these models would allow clinicians and patients to be better informed and choose the best pathway forward.

Of particular interest to me, I would like to be able to predict in real time what is going to happen to a patient I am seeing in the emergency room. This is a complex situation. Their current state – physiologic vital signs (level of consciousness, blood pressure, pulse, respiratory rate, temperature, blood oxygen level, respiratory variability, heart rate variability, EKG, etc.), along with their current laboratory and radiological imaging findings – will define their current problem or diagnosis. The patient’s past medical history, medications, allergies, social support, living environment, etc., will have major impacts on how they respond to their current illness or injury. We would like to aggregate all of this information into predictive and prescriptive models that could predict future states. Are the patients safe to be discharged home or do they need to be admitted? If they need to be admitted, can they go to the short stay unit, a bed with cardiac monitoring, or the intensive care unit? Given the current treatment, what will their response to this treatment be – will they get better or worse? Will they develop sepsis? Will they develop respiratory failure and require a tube be placed down their throat and a ventilator to breathe for them?
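
As a rough idea of what “aggregating this information into a model” could look like in code, here is a minimal Python sketch of a logistic-style risk score built from a handful of vital signs. Please read it as a toy: the `Vitals` fields, every weight, the baseline values, and the cutoffs in `suggest_disposition` are hypothetical numbers I made up for illustration. They are not fitted to any data and are not clinically validated. A real model would be trained on patient data and would also use history, labs, and imaging.

```python
import math
from dataclasses import dataclass


@dataclass
class Vitals:
    pulse: float        # beats per minute
    systolic_bp: float  # mmHg
    resp_rate: float    # breaths per minute
    temp_c: float       # degrees Celsius
    spo2: float         # blood oxygen saturation, percent


def deterioration_risk(v: Vitals) -> float:
    """Combine vital signs into a 0-1 risk using HYPOTHETICAL, unvalidated weights."""
    score = (
        0.03 * (v.pulse - 80)
        - 0.02 * (v.systolic_bp - 120)
        + 0.10 * (v.resp_rate - 16)
        + 0.40 * abs(v.temp_c - 37.0)
        - 0.15 * (v.spo2 - 97)
        - 2.0  # made-up intercept so that normal vitals give a low risk
    )
    return 1.0 / (1.0 + math.exp(-score))  # logistic function


def suggest_disposition(risk: float) -> str:
    """Map a risk to a disposition using made-up cutoffs (illustration only)."""
    if risk < 0.2:
        return "discharge home"
    if risk < 0.6:
        return "admit to a bed with cardiac monitoring"
    return "admit to the intensive care unit"


stable = Vitals(pulse=80, systolic_bp=120, resp_rate=16, temp_c=37.0, spo2=97)
sick = Vitals(pulse=130, systolic_bp=85, resp_rate=28, temp_c=39.2, spo2=88)

for label, vitals in (("stable", stable), ("sick", sick)):
    risk = deterioration_risk(vitals)
    print(f"{label}: risk={risk:.2f} -> {suggest_disposition(risk)}")
```

The interesting work is everything this sketch skips: learning the weights from real outcomes, handling missing or noisy measurements, and checking that the predictions hold up on patients the model has never seen.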

A particularly exciting area ripe for development is the Internet of Things. The Internet of Things is going to revolutionize how we collect data, both at home and in the hospital. This much-needed capability will allow us to monitor patients at home, detect illnesses much earlier, monitor responses to therapies, etc., and will be useful for a whole host of things we haven’t even imagined yet.

These are some of the complex questions that face us now in medicine. I am excited to participate in this quest to answer some of these vexing questions using all of the analytical tools that are currently available – whether “small data” using standard descriptive and inferential statistics, predictive analytics, or big data analytics.

Becoming a Healthcare Data Scientist, Predictive Analytics

It’s official now: I have been accepted into Northwestern University’s Master of Science in Predictive Analytics Program

Just a brief update as I attempt to record significant events/thoughts on my journey to become a data scientist.

I just received my official confirmation that I have been accepted into Northwestern University’s Master of Science in Predictive Analytics (MSPA) program.  I am due to start in the fall.  Here is the link if you are unfamiliar with the program.

http://sps.northwestern.edu/program-areas/graduate/predictive-analytics/index.php

I selected this program for many reasons. First, it is oriented towards what I really want to do – predictive analytics.  This should give me a solid foundational background upon which I can build the specific skills and specialization I need to develop smarter bedside monitoring systems and help predict patient outcomes.

Second, this is an online curriculum. At this point in my life I am not easily able to move or change jobs. I am a practicing Emergency Physician. I am a Chief Medical Information Officer for my health system. And I am the Medical Director for our flight service. I really enjoy all of these pursuits, and am not willing to give them up to move and go back to school full time. Nor could I with my family obligations. So the online option works best for me.

Third, Northwestern has a great reputation in this field. I did my due diligence and researched the program, and the feedback I received was overwhelmingly positive. As an aside, I did most of my undergraduate degree at Northwestern in the 1980s (I did not graduate, but would love to finally have a degree from them, and this will hopefully allow me to do that!).

So the easy part is over. I have been accepted. Now I will have to do the hard work of learning all of the new material and keeping up with the coursework, while still performing my regular day jobs.

Becoming a Healthcare Data Scientist

Greetings from Billings, Montana!

This is my first entry in my new blog.  Let me tell you a little about myself, and why I decided to start this blog.

My name is Randy Thompson. I am an Emergency Physician and I practice at the Billings Clinic in Billings, Montana. I am also the Chief Medical Information Officer (CMIO) for the Billings Clinic. In my role as CMIO, I act as a bridge between the world of clinical medicine on one hand, and the Information Technology world on the other. As part of this effort I am branching out into the world of social media, and have established LinkedIn and Twitter accounts, and now have this blog. The information and views expressed in this blog will be mine, and do not reflect the views or policy of the Billings Clinic.

There are many things I would like to blog about, but the main reason for establishing this blog is to document the subjects that I feel very passionate about.   These include my quest to become a data scientist, my interest in predictive monitoring (developing smarter bedside physiological monitors), and my interest in complexity science as it applies to human health and disease, as well as organizational dynamics and development.

One of my biggest goals in blogging is to chronicle my journey to become a data scientist. I have a huge interest in healthcare analytics in general and big data analytics in particular. We have had difficulties in funding, recruiting, and hiring data scientists, so I thought the best solution for me and my organization was for me to become one. Initially I tried to do this informally with books and online courses, but this spring I made the decision to do this formally. I will be starting an online Master’s program this fall to accomplish this goal. I would like to document this journey from analytic novice to data scientist on this blog.

I also plan on documenting the field of predictive monitoring as time goes on. I will blog more about this in the future, but will briefly describe this effort. The bedside physiologic monitors used in healthcare have not fundamentally changed since the 1970s. They provide information about what is happening to the patient now, and you can retrieve some information about what happened to them in the past. However, there are only a few monitoring systems that can predict what will happen to the patient in the future. Will they get better? Will they get worse? Will they develop sepsis (an overwhelming systemic infection)? Will they deteriorate and require a ventilator to help them breathe? These are the questions we would like the monitors to help us answer. In order to do this, we need to develop predictive algorithms that could be incorporated into these monitors, making them much smarter and more clinically relevant.
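
To illustrate the difference between a monitor that reports the present and one that looks ahead, here is a minimal Python sketch that fits a straight-line trend to recent heart rate readings and projects it forward. It is only an assumption of what the simplest possible “predictive” step might look like: the readings, the 30-minute horizon, and the alarm limit of 120 are hypothetical values chosen for illustration, and real predictive monitoring algorithms are far more sophisticated than a linear trend.

```python
def fit_trend(times, values):
    """Ordinary least-squares line through (time, value) points: returns (slope, intercept)."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired readings")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("readings must be taken at different times")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def project(times, values, minutes_ahead):
    """Extrapolate the fitted trend to minutes_ahead past the last reading."""
    slope, intercept = fit_trend(times, values)
    return slope * (times[-1] + minutes_ahead) + intercept


# Hypothetical readings: minutes since arrival and heart rate in beats per minute.
minutes = [0, 5, 10, 15, 20]
heart_rate = [88, 92, 97, 101, 106]

ALARM_LIMIT = 120  # made-up threshold, for illustration only
current = heart_rate[-1]
projected = project(minutes, heart_rate, minutes_ahead=30)

print(f"current heart rate: {current}, alarming now: {current > ALARM_LIMIT}")
print(f"projected in 30 min: {projected:.1f}, alarm expected: {projected > ALARM_LIMIT}")
```

A conventional monitor would stay silent here because the current value is under the limit, while even this crude projection flags the upward trend. That gap is the opportunity predictive monitoring is trying to fill.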

Hopefully, you will find this journey interesting and informative.