Becoming a Healthcare Data Scientist, Data Scientist, Healthcare Predictive Analytics, Northwestern University MSPA

Physician Data Scientist Part II. The Why.

I was recently reminded by a reader of my blog (thanks, Al) that I had never followed up on my promise to write a second part to the post from 7.7.2015 – “Physician Data Scientist – Why and What Type? Part I”. Now that I am in between classes, I have the time to work on this. Looking back at the original post, I am somewhat amazed at all that has happened in the last 1 1/2 years.

I am currently the interim Chief Information Officer (CIO) and Chief Medical Information Officer (CMIO) for our integrated healthcare system. I stepped into the interim CIO role (helped in part by my coursework in Northwestern University’s Master of Science in Predictive Analytics, or MSPA, program) after the departure of our previous CIO last year. Prior to that I had been one of our system’s CMIOs – communicating to IT the need for technology to help improve clinical outcomes, while communicating back to physicians and leadership the limitations of current technologies. I never really aspired to become either the interim CIO or a CMIO; these opportunities simply arose because of my journey to become better educated about the use of data and analytics to improve clinical outcomes – i.e., to become a Physician Data Scientist. I will explain how I ended up in my current role.

My interest in data and analytics is a fairly recent phenomenon, occurring because of a chance meeting with someone who has since become one of my closest friends – Curt Lindberg – who has a PhD in Complexity Science, and is the Director of our Complexity in Healthcare Center.  I met him during a project to improve our process for getting patients into our healthcare system from outside facilities more efficiently.  At that time I was a practicing Emergency Physician and the Medical Director of our MedFlight Air Ambulance service.  Curt introduced me to complexity science and my life has not been the same – it was a transformational career moment for me.  I ended up being part of a small group of researchers who were trying to develop smarter patient monitoring systems.  Their work has inspired me to try and contribute in my own way to this field – called predictive monitoring.

Predictive monitoring is an unofficial term for what this group is trying to accomplish. While the technology inside the monitors has changed drastically since the 1970s, what the monitors do has not. These monitors display certain physiologic markers of interest – blood pressure, pulse rate, temperature, oxygen level, EKG pattern, etc. You can see what is happening to the patient right at that time, or you can go back and review (minimally) what happened to them in the past, but there is no information predicting what will happen to them in the future (are they predicted to get better, go into sudden cardiac arrest, stop breathing, or develop an overwhelming infection called sepsis?). The goal is to incorporate predictive algorithms into these monitoring systems.

I have been fortunate to meet some giants in this field. Dr. J. Randall Moorman of the University of Virginia developed the first commercial predictive monitoring system – the HeRO monitor. The largest ever randomized clinical trial in neonatal patients (premature babies) was conducted using this monitor. It showed that the monitor was able to identify certain physiological patterns, and translate those patterns into a risk for developing an overwhelming infection (late onset neonatal sepsis). This risk was detected an average of 18 hours before a clinical diagnosis was made, allowing for earlier treatments and interventions. This translated into a 22% reduction in mortality. Dr. Andrew Seely is a Thoracic Surgeon at the University of Ottawa who has developed a model to predict whether a breathing tube can be removed from a patient successfully, without having to be replaced because the patient wasn’t ready to have it removed. We got to participate in that clinical trial. We also got to participate in a trial conducted by Ryan Arnold, now at Christiana Care in Newark, Delaware, on trying to predict clinical outcomes using heart rate variability analyses.

In addition to collaborating with these researchers on their projects, I became especially fascinated with a research article written by one of the country’s leading trauma surgeons, Dr. Mitchell Cohen, and his colleagues at San Francisco General Hospital and the University of California San Francisco – Identification of complex metabolic states in critically injured patients using bioinformatic cluster analysis. I will confess that I felt frustrated when I talked with the researchers about the underlying mathematical concepts and analytical techniques they were using, because I just did not understand them well. This ignorance ignited what I will freely admit is now an obsession to understand these concepts and techniques.

I started off trying to educate myself using textbooks, taking online MOOCs (Massive Open Online Courses), and enrolling in other courses offered on the web. I still felt very frustrated because these courses didn’t really go into the depth that I thought I needed. When I looked at the giants in this field of predictive analytics, these few researchers seemed to have both the clinical knowledge and understanding of why this research was so important, and the ability to understand the mathematical and analytical concepts and techniques necessary to do research in this field. I wanted to be like them.

I became very interested in becoming a data scientist at that point.  I eventually enrolled in Northwestern University’s Master of Science in Predictive Analytics (MSPA) program.  I have not regretted this decision.  I currently am halfway through the program, and am finally into the especially relevant coursework.  I just finished the major foundational course – Linear Regression and Multivariate Analysis.  The courses up until then had been preparing me to take this course.   I realized I had come full circle when I re-read Mitchell Cohen’s article, and realized that I now finally understood the concepts and results.  That was an extremely satisfying moment for me.

This has been quite the educational journey for me.   I feel like I have a much better understanding of statistics. I am getting somewhat competent in a few programming languages – R, Python, and SAS.  I am using Jupyter Notebooks for my programming work.   I have dabbled with data science platforms like KNIME, and this quarter will be learning to use virtual machines, IBM Watson Analytics, ANGOSS, and Microsoft Azure machine learning – as part of my next class on Generalized Linear Models.

I finally feel as if I am able to start applying what I have been learning for the last 1 1/2 years – to start developing predictive models to improve clinical outcomes. A few of my goals are to help our organization become more data driven, and to continue to work on developing predictive algorithms that could be incorporated into bedside monitoring systems, further improving the outcomes of patients.

This is my journey to date, from being a practicing Emergency Physician with no interest in data or analytics to where I am now, halfway through my Master’s program. The real journey of applying what I have learned to real world problems has just started, but will get more robust as I learn more.

Becoming a Healthcare Data Scientist, Data Science, Data Scientist, Data Visualization, Northwestern University MSPA, Predictive Analytics

Northwestern University MSPA 402, Intro to Predictive Analytics Review

Summing this course up in one word = WOW. This course should be taken early on because it is extremely motivating, and will help carry you through the other beginning courses such as Math for Modelers and Stats. This course is a high level overview of why and how analytics should be performed. It describes not only predictive analytics but the whole analytics spectrum and what it means to be an “analytical competitor”. While you do not perform any actual analytics, you will understand why getting good at this is so important.

I took this course from Dr. Gordon Swartz, and highly recommend him. Interestingly, he has bachelor’s degrees in nuclear engineering and political science from MIT, an MBA from Northeastern University and a doctorate in business administration from Harvard. His sync sessions were very informative and practical, and he provided ongoing commentary in the discussion boards.

The course description is –  “This course introduces the field of predictive analytics, which combines business strategy, information technology, and modeling methods. The course reviews the benefits and opportunities of data science, organizational and implementation issues, ethical, regulatory, and compliance issues. It discusses business problems and solutions regarding traditional and contemporary data management systems and the selection of appropriate tools for data collection and analysis. It reviews approaches to business research, sampling, and survey design.”

The course is structured around required textbook reading, assigned articles, assigned videos, weekly discussions, one movie (Moneyball) and 4 projects.

Readings

The reading requirements are daunting, but doable.  You will (should) read 6 books in 10 weeks – a total of 1,590 pages.  There are 14 articles to read.  Each week has a short video as well.

These are the assigned books. At first glance, this list will seem a little odd, with seemingly unrelated books. However, they all help create the overall picture of analytics, and are all valuable. I will provide just a brief overview of each, and plan to post more in-depth reviews of them later this summer.

Davenport TH, Harris JG.  2007.  Competing on Analytics: The New Science of Winning.  Boston, Massachusetts: Harvard Business School Publishing.

This is the first text you read, for good reason. It provides the backbone for the course. You will learn about what it means to be an analytical competitor, how to evaluate an organization’s analytical maturity, and then how to build an analytical capability.

Siegel E.  2013.  Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie or Die.  Hoboken, New Jersey: John Wiley & Sons, Inc.

This is a must read for anyone going into predictive analytics, by one of the pioneers of this field.  It describes in detail what predictive analytics is, and gives numerous real life examples of organizations using these predictive models.

Few S.  2013.  Information Dashboard Design: Displaying Data for At-a-Glance Monitoring.  Burlingame, California: Analytics Press.

I will admit that when I first got this book I was very confused about why it was being included in a course on predictive analytics.  However, this turned out to be one of the best reads of the course.  For anyone who is in analytics and has to display information, especially in a dashboard format,  this is a must read.  This describes what dashboards are really for, and the science behind creating effective dashboards.  You will never look at a dashboard the same way in the future, and you will be critical of most commercially developed dashboards, as they are more about displaying flashiness and fancy bells and whistles rather than the functional display of pertinent data in the most effective format.  I can’t say enough good things about this book, a classic.

Laursen GHN, Thorlund J.  2010.  Business Analytics for Managers: Taking Business Intelligence Beyond Reporting.  Hoboken, New Jersey: John Wiley & Sons, Inc.

This is a great overview of business analytics. It is especially valuable in its explanation of how analytics needs to support the strategy of the organization.

Franks B.  2012.  Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics.  Hoboken, New Jersey: John Wiley & Sons, Inc.

This was an optional read, but I recommend reading it. It is written in a very understandable way, and provides a great overview of the big data analytics ecosystem.

Groves RM, Fowler FJ, Couper MP, Lepkowski JM, Singer E, Tourangeau R.  2009.  Survey Methodology.  Hoboken, New Jersey: John Wiley & Sons, Inc.

I will admit this was my least favorite book, but having said that, I learned a ton from it. For anyone who will even think about using surveys to collect data, this is a must read. However, the 419 pages make this a chore. It would be nice to have an abridged version. What it does, though, is wake you up to how complex the process of creating, deploying, and analyzing surveys is. I grudgingly admit this was a valuable read.

Articles

There are some really great articles included in the reading list.

Videos

There are videos, developed by another professor, that review the week’s material. I did not find these especially helpful, but they did provide an overview of the week’s information, and might be helpful if you are having some trouble understanding the material.

Weekly Discussions

Again, the weekly discussions are where it happens. One or more topics are posted each week. There are usually some really great comments posted, and you can gain a lot of insight if you actually think about what you are posting, and what other people have posted. If you only post a brief paragraph on the last day, then you are missing out on some valuable information.

Moneyball

This is the first course I have taken where a movie was required. There are discussions around this movie, and one of the assignments involves creating an analysis of the Oakland A’s and how they used analytics. I enjoyed the movie and thinking about this.

Assignments

There are four assignments, each requiring a paper of a different length. The papers must follow APA format, so the assignments are useful for refining that skill.

I found these to be challenging, fun, motivating, and extremely enlightening. These called for the application of what we learned to some real world situations. For one of these, I performed an in-depth analysis of our organization’s analytics, which involved interviewing our senior leadership. These interviews really started the process of moving our organization to the next analytical maturity level in a very meaningful way.

Another project involved the creation of a draft dashboard using the best practices outlined by Stephen Few in his text.  This was a great learning experience for me, and one that will translate into much better dashboards at our organization.

The last project involved creating a meaningful and valid survey.  This was informative as well, and I actually might send out my survey.

Summary

Overall, this was a fantastic course. This will make it clear why we need to do this well, and what doing this well looks like. After this, the actual work of understanding and developing predictive models begins. Again, I feel as if I got my money’s worth (not an easy thing to say since these courses are pricey!).

Summer Activities

I am taking the summer off and am trying to catch up on the projects that have been piling up. For fun I am learning SQL (great book – Head First SQL by Lynn Beighley) and working my way through several Python Udemy courses. I will be attending the SciPy 2016 Conference in Austin, Texas in July as well, and am super excited about this. I will be going to tutorials on Network Science, Data Science is Software, Time Series Analysis, and Pandas. If you are attending, give me a shout out.

Becoming a Healthcare Data Scientist, Northwestern University MSPA

Northwestern University MSPA 400 Math for Modelers course, final thoughts

I had previously posted my interim thoughts on this course, and now that the course is finished, thought I would add my final thoughts.

The final examination was fair, and a mixture of the math and Python. You could certainly pass the course without keeping up with the Python and doing the exercises, but it would be much more difficult.

The last section of the course was on calculus.  Weeks 6 and 7 were devoted to a review of differential calculus and weeks 8 and 9 were devoted to integral calculus.   Dr. Goldfeder continued to stress the real world application of the concepts learned.
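
To give a flavor of how the calculus and the Python fit together, here is a minimal sketch of the kind of exercise I have in mind. It is not taken from the course materials: the toy function and the helper names (derivative, integral) are my own assumptions, and it uses only the Python standard library rather than NumPy or SciPy so that it runs anywhere.

```python
# Hypothetical illustration, not from the course materials.
# Differential calculus: approximate a slope with a central difference.
# Integral calculus: approximate an area with the trapezoidal rule.

def f(x):
    """Toy function chosen for illustration: f(x) = x**2."""
    return x ** 2

def derivative(func, x, h=1e-5):
    """Central-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

def integral(func, a, b, n=1000):
    """Trapezoidal-rule approximation of the integral of func from a to b."""
    step = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for i in range(1, n):
        total += func(a + i * step)
    return total * step

slope_at_3 = derivative(f, 3.0)      # exact answer from calculus: 2 * 3 = 6
area_0_to_3 = integral(f, 0.0, 3.0)  # exact answer from calculus: 3**3 / 3 = 9
print(slope_at_3, area_0_to_3)
```

Working the same problem by hand and then checking it numerically like this is one way to tie the two halves of the course together.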

We had a week off over the Thanksgiving holiday, which allowed us to catch up and review before the final examination. I took this time to both review the math (a little), and review Python (a lot). I went back through each week’s Python assignments to make sure I understood the concepts and could work through the code. I HIGHLY recommend this. Looking back, I wish I had spent more independent time applying Python and writing code to do the problems as we were going through the course. I encourage future students in this class to attempt to do this.

After the class ended I started catching up on my to-do list, which included learning how to use Jupyter Notebooks. After exploring Jupyter Notebooks further, I wish I had found them earlier. They are very useful for learning code and taking notes at the same time. I would encourage students to look at these when they start this course. I wish Northwestern University would do what several other universities have done, and start teaching the class using these notebooks. This would be extremely useful for the Python part of the course. I have now been brushing up on R using the same Jupyter Notebooks, with an R kernel installed. I plan on using these notebooks as I go through my next class, statistical analysis, which uses R. Here is the link to Project Jupyter’s webpage.

My overall assessment of the math for modelers course is highly positive, and I feel as if I learned what I set out to learn, and got my money’s worth.  It is a very demanding class time wise, but for those interested in analytics, this is a foundational set of knowledge that must be learned.

Becoming a Healthcare Data Scientist, Northwestern University MSPA, Predictive Analytics

Interim Review of Northwestern University’s MSPA Math for Modelers course.

Predict 400, Math for Modelers Course, Northwestern University MSPA

I am going to summarize my experience to date with Northwestern University’s Master of Science in Predictive Analytics program. I am past the halfway point (week 7 of 9) of my first trimester in this program. I am enrolled in one course, Predict 400, Math for Modelers. This is being taught by Professor Philip Goldfeder.

I will first describe the outline of how the course works. This is an asynchronous learning experience, for the most part. We have had one live session with Prof. Goldfeder. The coursework is presented through the online platform called Canvas. There are three main components to the class, which I will describe in greater detail below. The first component is learning the actual math. The second is participating in discussions about questions posed each week by Prof. Goldfeder. The third is learning Python.

What I really love about this program is how it brings together the book work, homework, learning Python, and getting help for problems/questions, into one place. I have been trying to informally do this on my own, and it was frustrating for me to try to learn math/machine learning/etc. using either books or other online courses, learn Python a separate way, and then have difficulty getting my questions answered. It is 1000% easier when this is all rolled into one. Even though this is a lot more expensive than doing it on your own, to me it is worth every penny.

Professor Philip Goldfeder. He is a great Professor for this course. He received great reviews in the CTEC (Course and Teacher Evaluation Council; these are visible when signing up for classes) and I see why. Not only is he extremely knowledgeable, he is also very engaged with the students and seems genuinely interested in making sure we learn and understand the material. He is also great at challenging the students to think of ways to apply the concepts learned to real world examples. I highly recommend him.

Canvas platform. This is where you go to do everything. It has sections for Announcements, Syllabus, Modules (describe each week’s assignments and is where you download things), Grades, People (section where everyone gets to describe themselves and you get to know your classmates), and Discussions.

The math. The introductory course, Math for Modelers, is designed to be a “Review of fundamental concepts from calculus, linear algebra, and probability with a focus upon applications in statistics and predictive modeling. The topics covered will include systems of linear equations and matrices, linear programming, concepts of probability underlying both classical and Bayesian statistics, differential calculus and integration.” This is a very aggressive review of linear algebra, probability, differential calculus and integral calculus. This would be easier for someone who has taken these courses recently, but is challenging for me since it has been decades since I learned this (not sure I really learned some of this the first time around). You are assigned 1-2 chapters a week from the textbook “Lial, Greenwell and Ritchey (2012). Student Solutions Manual for Finite Mathematics and Calculus with Applications, 9th Ed.” Prof. Goldfeder prepares a high level video that reviews the material in the chapters. He also posts PowerPoint presentations of the material in each chapter.

Homework. You are then required to complete a homework assignment each week, which covers the material in the chapters. This is typically 20-30 questions. This is completed through the Pearson educational application. This is a FANTASTIC resource. The textbook is online here. Each chapter has its own section, and you can do problems in each sub-chapter of each chapter. If you struggle with the solution, you can actually have the application walk you step by step through the problem, and show you similar problems. There are links out to the textbook that take you right to the section dealing with the problem you are working on. There are also videos available to view on the topic. I almost always do all of the study problems. The homework is another section in here, and that is how you submit your homework. Homework is worth 25% of your grade.

Discussions. This is a surprisingly difficult section. The NU MSPA program is designed as an applied program that uses real world examples and learning. To that end, Dr. Goldfeder challenges us each week to come up with real world examples or explanations of the material we are learning. Formulating a response can take a surprising amount of time if you take it seriously, but in doing so I have learned a lot. The process makes you think about how these concepts could be used in the real world. You are supposed to post your discussion response by the middle of the week so that you can participate in the discussions about what you posted, as well as what your classmates posted. The kicker is that you can’t see what other students have posted until you post your submission. I have learned a lot from these discussions. The other students in the course have such a wide range of backgrounds that they can weigh in on the topics in a meaningful way. We have students with backgrounds in sports analytics, actuaries, people working in industry, medicine, computer science, etc. The discussions are worth 25% of your grade.

Python. This could be extremely challenging if you have not had any exposure to Python or programming. I knew this would be a challenge, so I did take a few Python courses (Codecademy’s Python course at http://www.codecademy.com/learn/python, How to think like a Computer Scientist at interactivepython.org) prior to enrolling in the class. Even so, I would still label myself a beginner in Python, and the exercises challenged me to expand my knowledge. I personally think this is one of the most gratifying portions of the course. I really enjoy combining what we are learning with Python. We cover the basics of Python, creating graphs and plots, and using NumPy and SciPy. This is done through the Enthought Canopy platform, which has the interactive editor, the package manager, and, of great value, the “Training on demand”, a very comprehensive series of instructional videos covering basic and advanced functionality. It is well worth the money just for access to these videos. There is no grade each week for the Python assignments; however, you need to keep up with these. There were questions on the midterm that specifically required the use of Python to analyze the question and display the results. We have a Python TA assigned to the class who is very responsive to questions. In addition, students post code and help provide input on any questions.

Tests. The midterm is worth 25% of the grade as is the final examination. The midterm was a take home test, and required a substantial investment of time to complete. In addition there was the regular homework/reading for that week, although the discussion that week was optional. This is a week when you would want to cut yourself some slack and allow extra time. I had a heavy work week that week, and regretted not thinking about this ahead of time to give myself a lighter work schedule.
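
For anyone who likes to see the arithmetic, the four equally weighted components described above (homework, discussions, midterm, final at 25% each) combine as a simple weighted sum. This is only a sketch: the weights come from this review, while the scores and the course_grade helper are made up for illustration.

```python
# Weights as described in this review; the scores below are invented.
WEIGHTS = {
    "homework": 0.25,
    "discussions": 0.25,
    "midterm": 0.25,
    "final": 0.25,
}

def course_grade(scores):
    """Weighted sum of component scores (each on a 0-100 scale)."""
    return sum(WEIGHTS[part] * scores[part] for part in WEIGHTS)

# Hypothetical scores for one student.
example_scores = {"homework": 95, "discussions": 90, "midterm": 85, "final": 88}
overall = course_grade(example_scores)
print(overall)
```

With equal weights the result is just the average of the four components, which is why a weak midterm or a neglected discussion board costs exactly as much as the final.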

Time requirement. I am finding that I am devoting 20-30 hours per week to do all of this. You could devote less time if you were more up to date on the math or Python. But remember, I am doing this to learn and retain the information. So I am doing all of the reading in the textbook, doing all of the example problems and “your turn problems”, and almost all of the chapter problems in the Pearson application. I have not had time to do all the problems in the back of the textbook, however. I also try to provide meaningful input into the discussions, both in my submission and in commenting on what other students have posted. I have also been trying to continue to dive deeper into learning Python.

Typical week. I usually try to do the textbook reading on Monday and Tuesday. (All of the assignments are due midnight Sunday night, so Monday starts a new week). I don’t do a lot of problems initially as I want to get through the reading, so I can apply it to my discussion. Then on Wednesday I like to start working on my discussion submission and try to get it in by Wednesday, or Thursday at the latest. That way I can participate in the discussions in a meaningful way. After I get my discussion submitted, I go back and work through the chapter problems in Pearson. I like to get to the homework section on Saturday. Ideally I like to have Sunday to do the Python reading and assignments.

My overall assessment of this course is that I am extremely satisfied. I think this is very professionally done, I am learning the math, I am being challenged to think about applying this to the real world, and I am learning Python. There is definitely a lot going on, but that is why I signed up for this. I feel as if I am getting my money’s worth.

Becoming a Healthcare Data Scientist

Physician Data Scientist – Why and What Type? Part I.

Why would a practicing Emergency Medicine Physician want to become a Data Scientist, and what type of Data Scientist could I become?

I will provide my answers to those two questions, starting with what type of Data Scientist in this post, followed by Why I want to become a Data Scientist in Part 2.

First – What kind of Data Scientist do I see myself becoming?

Types of Data Scientists

I am going to use the framework that Bill Voorhies referenced in his blog post “How to Become a Data Scientist” (http://data-magnum.com/how-to-become-a-data-scientist/). He used the framework developed by Harris, Murphy and Vaisman in their 2013 O’Reilly report “Analyzing the Analyzers. An Introspective Survey of Data Scientists and Their Work”, available for free at http://www.oreilly.com/data/free/analyzing-the-analyzers.csp. They describe 4 different subtypes of Data Scientists – Data Businesspeople, Data Creatives, Data Developers, and Data Researchers. Figure 3-3 shows the skill set strengths in each group. Below Figure 3-3 I will provide a synopsis of how they described each subtype.

[Figure 3-3: skill set strengths of each Data Scientist subtype]

Data Businesspeople are most focused on the organization and how data projects yield profit.  They are leaders and entrepreneurs.  They have technical skills and work with real data.  They are the most likely group to have an MBA, and have an undergraduate Engineering degree.

Data Creatives are seen as the broadest of the Data Scientists, excelling at applying a wide range of tools and technologies to a problem, or creating innovative prototypes at hackathons, the quintessential Jack of All Trades.    They are seen as Artists.  They have substantial business experience.

Data Developers are focused on the technical problem of managing data – how to get it, store it, learn from it.  They are writing a lot of code, and have substantial computer science backgrounds.  They have more of the machine learning/big data skills than the other groups.

Data Researchers have a strong background in statistics, and have an academic background.

What type of Data Scientist do I see myself becoming?

I see myself fitting into two categories – a mix of the “Data Businesspeople” and the “Data Creative” subtypes of data scientists.   Although it will be easiest to become the Data Businesspeople type, I have aspirations of becoming more of a Data Creative or Jack of All Trades type as well.  I will discuss the different skill sets used in the analysis, and where I see my current strengths, and where my future strengths need to be developed in order to achieve these goals.

In terms of business skills, I have a broad understanding of medicine in general, and emergency medicine in particular. I also understand the Prehospital Emergency Medical Services environment, having started my career as an EMT-Paramedic, and having served as a Medical Director for several EMS services. I am currently the Medical Director for our Air Ambulance service. In addition, as a Chief Medical Information Officer, I understand the IT needs of clinicians and health care workers, and the technical realities of what IT can deliver. I also serve as the Physician Liaison to our BI/Enterprise Analytics Division. I see my experience and knowledge as a subject matter expert for clinical medicine driving the kinds of research questions that our data science/data analytics team attempts to answer.

I already have a deep interest in developing predictive algorithms that could be incorporated into bedside monitoring technologies to predict future states and detect early clinical deterioration. This information could be used to guide triage decisions for clinicians: is the patient safe to be discharged home, or do they need to be admitted to the hospital? If they need to be admitted, do they need to be in the ICU, or is an unmonitored bed going to be ok? Is the patient predicted to recover uneventfully, or do they have a high probability of deterioration requiring high resource utilization and admission to the ICU? Does a patient at a small rural critical access hospital need to be transferred to a tertiary care facility that might be hundreds of miles away, taking them away from their family support network, and exposing them to the dangers of transfer and the costs of transfer (currently between $25,000 – $75,000), or can they be safely treated at their hometown facility? Will the Internet of Things help us to remotely monitor patients at home, or even in the hospital, to detect either improvement or deterioration before it is clinically apparent, thereby allowing earlier treatments and interventions and improving outcomes? These are some of the important unanswered questions in my mind.

I see programming, or hacking, as my weakest skill, both now and going forward. That is why I will never be a pure Data Creative type. I do want to become competent beyond a basic level, so that I can do some of the work myself and hand off the really complicated code to a true programmer. I am currently learning Python: I have finished the Codecademy course and am almost finished with Zed A. Shaw’s “Learn Python the Hard Way”. I know some R as well, mainly for statistical analysis. Having said that, I am a novice coder at best.

I am extremely interested in machine learning and big data.  I would really like to become adept at analyzing big data because I see the potential of this approach in analyzing healthcare data.  This will be a big focus of mine.

I have a basic background in math and statistics, and am actually looking forward to relearning them. I think I will learn a tremendous amount now that I understand the importance of having this background. I am currently working my way through the textbook we will be using in the fall for the math for modelers course.

When you consider all of the factors, my largest skill set is my business or subject matter experience. I think this will allow me to be a better leader in choosing which analytics projects we pursue. Having a good background in what types of analyses are possible, and which types suit which situations, will help me make better decisions and understand the results. I am hopeful that I will then be able to translate the insights learned into understandable and actionable information that can be presented to the various stakeholders.

I am also hopeful that I can help drive the changes that are needed across the organization, based on the insights gained. That is the basis for the “Learning Health System” concept. A Learning Health System has to be able to capture important data, analyze it, gain insights, diffuse these insights, and rapidly change behavior to incorporate them. Our institution is currently trying to understand the basic concepts of a Learning Health System and put in place the framework and people necessary to achieve the goals of this system. I hope to contribute to this in a meaningful manner. There are also national initiatives on becoming Learning Health Systems. The Learning Health Community (http://www.learninghealth.org/home/) is an excellent resource listing core values and some of the organizations also working on this goal.

In my next post, I will answer the question of why I want to become a data scientist.

Becoming a Healthcare Data Scientist

My Current Baseline Data Scientist Skill Set

It will be interesting to compare my skill set once I finish the predictive analytics program to my current skill set.  I will outline my current skills so I can come back later and compare the two.

I will organize my skills using the format presented by Mitch Sanders in his blog article posted on 8.27.13, “Data Science – Capturing, Analyzing, and Presenting Data Skills” (http://datareality.blogspot.com/2013/08/data-scientist-core-skills.html).

1.  Capturing Data

Programming and Database skills:

I am weak in this area. I have used R a bit to do some statistical analysis in the past. I am learning Python as I write this. So far, I have found that Codecademy’s Python course is the best learning platform for me. My next favorite resource is Zed Shaw’s book, “Learn Python the Hard Way”. I really like his practical approach. “Introducing Python: Modern Computing in Simple Packages” by Bill Lubanovic is also good, but a bit more advanced. Finally, the Visual Quickstart Guide “Python” by Toby Donaldson is a quick reference guide. Beyond basic programming, my skills are close to zero. I do not know how to use Hadoop, Java, SQL, Hive or Pig.

Business Domain Expertise and Knowledge

This is my strongest area of expertise. I started off in medicine in 1984 as a basic EMT, became an EMT-Paramedic, and then a Paramedic Educator. I finished medical school (University of Illinois College of Medicine in Peoria, Illinois) in 1994, and my Emergency Medicine Residency at Saint Francis Hospital in Peoria, Illinois in 1997. I have practiced academic and community based emergency medicine since then. I have been a medical director for both ground based EMS and for a flight program. I am also one of our health system’s Chief Medical Information Officers (CMIO), so I have had to learn the field of Healthcare Information Technology as well. In my current role I have a special interest in Business Intelligence and Analytics, including predictive analytics. My passion is for developing smarter systems that can provide information about a patient’s risk of developing certain diseases/conditions, risk of deterioration/death, early detection of sub-clinical illness, and information about a patient’s response to treatment and therapy. Hence my interest in predictive analytics.

Data Modeling, Warehouse, and Unstructured Data Skills.

I have minimal skills in this category.

2.  Analyzing Data

Math Skills.

I have basic math skills, but it has been a long time since I have had to do anything beyond basic math, such as calculus or linear algebra. After I finish getting a basic foundation in Python, my next step is to refresh my knowledge of math/calculus/linear algebra before starting my “Math for Modelers” course this fall.

Statistical and Analytical Skills

I have a somewhat better grasp of descriptive and inferential statistics. But I will need to increase my knowledge of the advanced statistical techniques not commonly used in medicine today. These would include predictive analytics, regression, multivariate analysis, linear models, time series analysis, machine learning, etc.

3.  Presenting Data

I am really excited to learn about and improve my data visualization skills. I am pushing hard for our organization to move away from Excel and PowerPoint based presentations of data, to more relevant methods.

Storytelling Skills

I am a pretty good storyteller, but would like to improve my skills, especially in presenting the data and the stories around the data. I would like to help people understand the insight created by the data analysis, and then help them move to operationalizing that insight and driving organizational change to improve patient outcomes.

In summary, my strongest skills are my love of data and analytics, my (obsessive) desire to become a data scientist, and my domain knowledge as it pertains to healthcare.  My other skills will have to be works in progress.

I would love to hear comments on what you think, and any recommendations/advice for students just starting this journey.

June 10, 2015

Becoming a Healthcare Data Scientist

Am I a little anxious to get started with classes at Northwestern?

I had a nice conversation with my academic adviser at Northwestern this morning and laid out a preliminary plan. I then signed up for my first class – 400-DL Math for Modelers. This will cover a review of matrices, linear programming, probability, and differential and integral calculus, with an emphasis on applications.

I actually ordered my textbook, Finite Mathematics and Calculus with Applications by Lial, including MyMathLab.  Even though classes don’t start until September, I do want to start reviewing the textbook, as I am a little anxious since it has been so many years since I took these courses.
