Becoming a Healthcare Data Scientist

Physician Data Scientist – Why and What Type? Part I.

Why would a practicing Emergency Medicine Physician want to become a Data Scientist, and what type of Data Scientist could I become?

I will answer both questions, starting in this post with what type of Data Scientist I could become, followed in Part II by why I want to become a Data Scientist.

First – What kind of Data Scientist do I see myself becoming?

Types of Data Scientists

I am going to use the framework that Bill Voorhies referenced in his blog post “How to Become a Data Scientist” (http://data-magnum.com/how-to-become-a-data-scientist/). He used the framework developed by Harris, Murphy and Vaisman in their 2013 O’Reilly report “Analyzing the Analyzers: An Introspective Survey of Data Scientists and Their Work”, available for free at http://www.oreilly.com/data/free/analyzing-the-analyzers.csp. They describe four subtypes of Data Scientists – Data Businesspeople, Data Creatives, Data Developers, and Data Researchers. Figure 3-3 shows the skill set strengths of each group. Below Figure 3-3 I will provide a synopsis of how they described each subtype.

[Figure 3-3 from “Analyzing the Analyzers”: skill set strengths of each Data Scientist subtype]

Data Businesspeople are most focused on the organization and how data projects yield profit. They are leaders and entrepreneurs. They have technical skills and work with real data. They are the group most likely to have an MBA, and they are also likely to have an undergraduate engineering degree.

Data Creatives are seen as the broadest of the Data Scientists, excelling at applying a wide range of tools and technologies to a problem, or at creating innovative prototypes at hackathons. They are the quintessential Jack of All Trades. They are seen as Artists. They have substantial business experience.

Data Developers are focused on the technical problem of managing data – how to get it, store it, learn from it.  They are writing a lot of code, and have substantial computer science backgrounds.  They have more of the machine learning/big data skills than the other groups.

Data Researchers have a strong background in statistics and typically come from academia.

What type of Data Scientist do I see myself becoming?

I see myself fitting into two categories – a mix of the “Data Businesspeople” and the “Data Creative” subtypes of data scientists. Although it will be easiest to become the Data Businessperson type, I have aspirations of becoming more of a Data Creative or Jack of All Trades type as well. I will discuss the different skill sets used in the analysis, where I see my current strengths, and which strengths I need to develop in order to achieve these goals.

In terms of business skills, I have a broad understanding of medicine in general, and of emergency medicine in particular. I also understand the Prehospital Emergency Medical Services environment, having started my career as an EMT-Paramedic, and having served as a Medical Director for several EMS services. I am currently the Medical Director for our Air Ambulance service. In addition, as a Chief Medical Information Officer, I understand the IT needs of clinicians and health care workers, and the technical realities of what IT can deliver. I also serve as the Physician Liaison to our BI/Enterprise Analytics Division. I see my experience and knowledge as a subject matter expert in clinical medicine driving the kinds of research questions that our data science/data analytics team attempts to answer.

I already have a deep interest in developing predictive algorithms that could be incorporated into bedside monitoring technologies to predict future states and detect early clinical deterioration. This information could be used to guide triage decisions for clinicians: is the patient safe to be discharged home, or do they need to be admitted to the hospital? If they need to be admitted, do they need to be in the ICU, or is an unmonitored bed going to be ok? Is the patient predicted to recover uneventfully, or do they have a high probability of deterioration requiring high resource utilization and admission to the ICU? Does a patient at a small rural critical access hospital need to be transferred to a tertiary care facility that might be hundreds of miles away, taking them away from their family support network and exposing them to the dangers and costs of transfer (currently between $25,000 – $75,000), or can they be safely treated at their hometown facility? Will the Internet of Things help us to remotely monitor patients at home, or even in the hospital, to detect either improvement or deterioration before it is clinically apparent, thereby allowing earlier treatments and interventions and improving outcomes? These are some of the important unanswered questions in my mind.

My weakest skill, both now and most likely going forward, is programming or hacking. That is why I will never be a pure Data Creative type. I do want to become competent beyond a basic level, so that I can do some of the work myself and hand off the really complicated code to a true programmer. I am currently learning Python, having finished the Codecademy course and almost finished Zed A. Shaw’s “Learn Python the Hard Way”. I know some R as well, mainly for statistical analysis. Having said that, I am a novice coder at best.

I am extremely interested in machine learning and big data.  I would really like to become adept at analyzing big data because I see the potential of this approach in analyzing healthcare data.  This will be a big focus of mine.

I have a basic background in math and statistics, and am actually looking forward to relearning them. I think I will learn a tremendous amount now that I understand the importance of having this background. I am currently working my way through the textbook we will be using in the fall for the “Math for Modelers” course.

When you consider all of the factors, my largest skill set is my business or subject matter experience. I think this will allow me to be a better leader in choosing which analytics projects we pursue. Having a good background in what types of analyses are possible, and which types suit which situations, will help me make better decisions and understand the results. I am hopeful that I will then be able to translate the insights learned into understandable and actionable information that can be presented to the various stakeholders.

I am also hopeful that I can help drive the changes that are needed across the organization, based on the insights learned. That is the basis for the “Learning Health System” concept. A Learning Health System has to be able to capture important data, analyze it, gain insights, diffuse these insights, and rapidly change behavior to incorporate them. Our institution is currently trying to understand the basic concepts of a Learning Health System and put in place the framework and people necessary to achieve the goals of this system. I hope to contribute to this in a meaningful manner. There are also national initiatives on becoming Learning Health Systems. The Learning Health Community (http://www.learninghealth.org/home/) is an excellent resource listing core values and some of the organizations also working on this goal.

In my next post, I will answer the question of why I want to become a data scientist.


My Current Baseline Data Scientist Skill Set

It will be interesting to compare my skill set once I finish the predictive analytics program to my current skill set.  I will outline my current skills so I can come back later and compare the two.

I will organize my skills using the format presented by Mitch Sanders in his blog article “Data Science – Capturing, Analyzing, and Presenting Data Skills”, posted on 8.27.13 (http://datareality.blogspot.com/2013/08/data-scientist-core-skills.html).

1.  Capturing Data

Programming and Database skills:

I am weak in this area. I have used R a bit to do some statistical analysis in the past. I am learning Python as I write this. So far, I have found that Codecademy’s Python course is the best learning platform for me. My next favorite resource is Zed Shaw’s book, “Learn Python the Hard Way”. I really like his practical approach. “Introducing Python: Modern Computing in Simple Packages” by Bill Lubanovic is also good, but a bit more advanced. Finally, the Visual Quickstart Guide “Python” by Toby Donaldson is a quick reference guide. Going past basic programming, my skills are near or below zero. I do not know how to use Hadoop, Java, SQL, Hive or Pig.

Business Domain Expertise and Knowledge

This is my strongest area of expertise. I started off in medicine in 1984 as a basic EMT, became an EMT-Paramedic, and then a Paramedic Educator. I finished medical school (University of Illinois College of Medicine in Peoria, Illinois) in 1994, and my Emergency Medicine Residency at Saint Francis Hospital in Peoria, Illinois in 1997. I have practiced academic and community based emergency medicine since then. I have been a medical director for both ground based EMS and for a flight program. I am also one of our health system’s Chief Medical Information Officers (CMIO), so I have had to learn the field of Healthcare Information Technology as well. In my current role I have a special interest in Business Intelligence and Analytics, including predictive analytics. My passion is for developing smarter systems that can provide information about a patient’s risk of developing certain diseases/conditions, risk of deterioration/death, early detection of sub-clinical illness, and information about a patient’s response to treatment and therapy. Hence my interest in predictive analytics.

Data Modeling, Warehouse, and Unstructured Data Skills.

I have minimal skills in this category.

2.  Analyzing Data

Math Skills.

I have basic math skills, but it has been a long time since I have had to do anything beyond basic math, such as calculus or linear algebra. After I finish getting a basic foundation in Python, my next step is to refresh my knowledge of math/calculus/linear algebra before starting my “Math for Modelers” course this fall.

Statistical  and Analytical Skills

I have a somewhat better grasp of descriptive and inferential statistics. But I will need to increase my knowledge of the advanced statistical techniques not commonly used in medicine today. These would include predictive analytics, regression, multivariate analysis, linear models, time series analysis, machine learning, etc.

3.  Presenting Data

I am really excited to learn about and improve my data visualization skills. I am really pushing hard for our organization to move away from Excel and PowerPoint based presentations of data, to more relevant methods.

Storytelling Skills

I am a pretty good storyteller, but would like to improve my skills, especially in presenting the data and the stories around the data. I would like to help people understand the insight created by the data analysis, and then help them move to operationalizing that insight and driving organizational change to improve patient outcomes.

In summary, my strongest skills are my love of data and analytics, my (obsessive) desire to become a data scientist, and my domain knowledge as it pertains to healthcare.  My other skills will have to be works in progress.

I would love to hear comments on what you think, and any recommendations/advice for students just starting this journey.

June 10, 2015

Becoming a Healthcare Data Scientist, Predictive Analytics

It’s official now: I have been accepted into Northwestern University’s Master of Science in Predictive Analytics Program

Just a brief update as I attempt to record significant events/thoughts on my journey to become a data scientist.

I just received my official confirmation that I have been accepted into Northwestern University’s Master of Science in Predictive Analytics (MSPA) program.  I am due to start in the fall.  Here is the link if you are unfamiliar with the program.

http://sps.northwestern.edu/program-areas/graduate/predictive-analytics/index.php


I selected this program for many reasons. First, it is oriented towards what I really want to do – predictive analytics.  This should give me a solid foundational background upon which I can build the specific skills and specialization I need to develop smarter bedside monitoring systems and help predict patient outcomes.

Second, this is an online curriculum. At this point in my life I am not easily able to move or change jobs. I am a practicing Emergency Physician. I am a Chief Medical Information Officer for my health system. And I am the Medical Director for our flight service. I really enjoy all of these pursuits, and am not willing to give them up to move and go back to school full time. Nor could I with my family obligations. So the online option works best for me.

Third, Northwestern has a great reputation in this field. I did my due diligence and researched the program, and the feedback I received was overwhelmingly positive. As an aside, I did most of my undergraduate degree at Northwestern in the 1980s (I did not graduate, but would love to finally have a degree from them, and this will hopefully make that possible!).

So the easy part is over. I have been accepted. Now I will have to do the hard work of learning all of the new material and keeping up with the coursework, while still performing my regular day jobs.