Becoming a Healthcare Data Scientist

Physician Data Scientist – Why and What Type? Part I.

Why would a practicing Emergency Medicine physician like me want to become a Data Scientist, and what type of Data Scientist could I become?

I will answer both questions, starting in this post with what type of Data Scientist I could become, followed by why I want to become one in Part II.

First – What kind of Data Scientist do I see myself becoming?

Types of Data Scientists

I am going to use the framework that Bill Voorhies referenced in his blog post "How to Become a Data Scientist". He used the framework developed by Harris, Murphy and Vaisman in their 2013 O'Reilly report "Analyzing the Analyzers: An Introspective Survey of Data Scientists and Their Work", which is available for free. They describe four subtypes of Data Scientists: Data Businesspeople, Data Creatives, Data Developers, and Data Researchers. Figure 3-3 of the report shows the skill-set strengths of each group. Below, I provide a synopsis of how the authors describe each subtype.


Data Businesspeople are most focused on the organization and on how data projects yield profit. They are leaders and entrepreneurs. They have technical skills and work with real data. They are the group most likely to have an MBA and an undergraduate engineering degree.

Data Creatives are seen as the broadest of the Data Scientists: they excel at applying a wide range of tools and technologies to a problem, or at creating innovative prototypes at hackathons. They are the quintessential Jack of All Trades and are seen as Artists. They also have substantial business experience.

Data Developers are focused on the technical problem of managing data: how to get it, store it, and learn from it. They write a lot of code and have substantial computer science backgrounds. They have more machine learning/big data skills than the other groups.

Data Researchers have strong backgrounds in statistics and tend to come from academia.

What type of Data Scientist do I see myself becoming?

I see myself fitting into two categories: a mix of the "Data Businesspeople" and "Data Creative" subtypes. Although the Data Businesspeople type will be the easiest for me to become, I also aspire to become more of a Data Creative, or Jack of All Trades. Below, I discuss the skill sets used in the report's analysis, where I see my current strengths, and which skills I need to develop to achieve these goals.

In terms of business skills, I have a broad understanding of medicine in general and emergency medicine in particular. I also understand the prehospital Emergency Medical Services environment, having started my career as an EMT-Paramedic and having served as Medical Director for several EMS agencies. I am currently the Medical Director for our Air Ambulance service. In addition, as a Chief Medical Information Officer, I understand both the IT needs of clinicians and health care workers and the technical realities of what IT can deliver. I also serve as the Physician Liaison to our BI/Enterprise Analytics Division. I see my experience and knowledge as a clinical subject matter expert helping to drive the kinds of research questions our data science/data analytics team attempts to answer.

I already have a deep interest in developing predictive algorithms that could be incorporated into bedside monitoring technologies to predict future states and detect early clinical deterioration. This information could guide clinicians' triage decisions: Is the patient safe to be discharged home, or do they need to be admitted to the hospital? If they need to be admitted, do they need an ICU bed, or is an unmonitored bed OK? Is the patient predicted to recover uneventfully, or do they have a high probability of deteriorating, requiring high resource utilization and admission to the ICU? Does a patient at a small rural critical access hospital need to be transferred to a tertiary care facility that might be hundreds of miles away, taking them away from their family support network and exposing them to the dangers and costs of transfer (currently $25,000 to $75,000), or can they be safely treated at their hometown facility? Will the Internet of Things help us remotely monitor patients at home, or even in the hospital, to detect improvement or deterioration before it is clinically apparent, thereby allowing earlier treatment and intervention and improving outcomes? These are some of the important unanswered questions in my mind.

I see programming, or hacking, as my weakest skill now and most likely going forward. That is why I will never be a pure Data Creative. I do want to become competent beyond a basic level, so that I can do some of the work myself and hand off the really complicated code to a true programmer. I am currently learning Python: I have finished the Codecademy course and am almost finished with Zed A. Shaw's "Learn Python the Hard Way". I also know some R, mainly for statistical analysis. Having said that, I am a novice coder at best.

I am extremely interested in machine learning and big data. I would really like to become adept at analyzing big data because I see the potential of this approach for healthcare data. This will be a big focus of mine.

I have a basic background in math and statistics, and I am actually looking forward to relearning them. I think I will learn a tremendous amount now that I understand the importance of this background. I am currently working my way through the textbook we will be using in the fall for the Math for Modelers course.

Considering all of these factors, my strongest skill set is my business, or subject matter, expertise. I think this will make me a better leader in choosing which analytics projects we pursue. Knowing what types of analyses are possible, and which types suit which situations, will help me make better decisions and understand the results. I hope I will then be able to translate the resulting insights into understandable, actionable information for our various stakeholders.

I also hope to help drive the organization-wide changes that these insights call for. That is the basis of the "Learning Health System" concept. A Learning Health System has to be able to capture important data, analyze it, gain insights, disseminate those insights, and rapidly change behavior to incorporate them. Our institution is currently working to understand the meaning and basic concepts of a Learning Health System and to put in place the framework and people needed to achieve its goals. I hope to contribute to this in a meaningful way. There are also national initiatives on becoming Learning Health Systems. The Learning Health Community is an excellent resource; it lists core values and some of the organizations also working toward this goal.

In my next post, I will answer the question of why I want to become a data scientist.