Artificial Intelligence, Cerner, Electronic Health Record (EHR), Healthcare Analytics, Healthcare Technology, Machine Learning

Cerner’s Strategy to Deploy “Intelligence” into the Cerner Ecosystem – Insights from the 2018 Cerner Strategic Client Summit

I feel privileged to have been invited for the second year in a row to Cerner’s Strategic Client Summit.  The meeting location – downtown San Diego – and the Summit content were both fantastic.  I will attempt to summarize a few key concepts, and why I feel optimistic about where Cerner is heading – both from an overall perspective as well as from an improved product and end-user experience perspective.

I was very impressed with the collaboration between Cerner President Zane Burke and Cerner’s new Chairman and CEO Brent Shafer.  I had several conversations with both Zane, whom I have known for a while now, and Brent, whom I met for the first time.  I found Brent to be very personable, and very thoughtful about his vision for Cerner.  Given their collaborative relationship, I think they will lead Cerner in the right direction.

I am not trying to downplay the roles that many people play inside of Cerner, because there are a lot of great things going on, but I think there are two strategic people that Cerner needs to pay attention to in order to move their EHR to the next level.

The first is Paul Weaver, Cerner’s Vice President of User Experience.  I first saw and met Paul at the 2017 Strategic Client Summit in Dallas, Texas, and was very impressed by his vision and enthusiasm.  He hails from the gaming industry, and brings his expertise to the world of healthcare software, where it is much needed.  Here is a link to a 13-minute podcast where Paul talks about the importance of the user experience.  At the 2017 Summit, he used the words “user delight” to describe how he wants interaction with the EHR to make end users feel.  I don’t know about you, but I have used many words to describe how the EHR made me feel, and none of them were delight!  His message – all interactions with a software program elicit some type of emotional reaction.  He wants those reactions to be positive, decreasing stress, and making both patients and healthcare providers/workers “happier and healthier”.  This is a laudable goal, and will help in the fight against physician (and other healthcare worker) burnout/suicide – since negative experiences with the EHR are almost always identified as one of the top contributing factors to burnout.  This is an EXTREMELY important area for Cerner to get right, and they need to support Paul Weaver in his efforts to accomplish his goals.

The second strategic person is David Cohen, Vice President for Intellectual Property Development.  David’s presentation, “Activating Intelligence to Transform Care”, was visionary.  He articulated the concepts of machine learning and artificial intelligence, and how to utilize them in healthcare, better than anyone else I have heard or read to date.  I will provide a high-level overview of his vision below.  At this time it appears they have branded these efforts as “Cerner Intelligence – Leveraging the Power of Data”.

Cerner sees the new demands on health care as being proactive health management (vs reactive sick care); cross-continuum care system (vs fragmented niche care); rewards for quality, safety and efficiency (vs rewards for volume); and person and care team-centric (vs clinician-centric).

Value “drivers” were presented.    These were specific areas where Cerner intends to deploy their Intelligence to make meaningful improvements.   These included clinical and quality drivers, operational drivers, financial drivers, and drivers around improving the experience.  I feel these are appropriate areas to start deploying this Intelligence.  I can post more on this when this information becomes publicly available, because there are some important key areas that if realized, will bring great value to organizations.

Where David’s presentation got really interesting was when he started presenting how Cerner’s areas of focus were on using machine learning, artificial intelligence, and knowledge management.   I am going to provide his definitions of each, because I think they are defined very nicely.

  • Machine Learning:  Leveraging the power of data and statistical methods to create new insights and workflow optimizations
  • AI experiences:  Leverage Artificial Intelligence capabilities that mimic human behaviors such as voice, vision, language, and conversation to enhance human abilities
  • Knowledge management:  Ensure data is complete, contextual, and accurately represented using standards based medical vocabularies

David then started talking about “AI Experiences” (see diagram from Cerner below – reprinted with permission).   I am convinced that Cerner gets where they should be going with regard to incorporating AI into healthcare, on a very practical basis.  This starts with the inputs into the AI systems, the transformations of those inputs by the system, the incorporation into the knowledge management systems, and most importantly, the AI applications that will make the EHR a true virtual partner in the healthcare process – for providers, patients, and healthcare workers.  What was shown was more than a concept, and the demos they put on showed that they are making progress on these. The concept of a mouse-less and keyboard-less interaction with the EHR may be a reality, sooner rather than later.  I encouraged Cerner executives to support these initiatives deeply and at the highest levels.

Cerner AI

 

Overall, I am very excited and optimistic about Cerner’s vision, and for the prospect of them delivering meaningful improvements and solutions – both near-term and long-term.  Their focus on improving the user experience and making it “delightful” is a very important initiative.   Their focus on using data to improve – almost everything – is foundational for moving all of us forward.  My plea to Cerner is to continue to very deeply support these initiatives, and the talented people they have focused on these.

Data Science, Deep Learning, Machine Learning, Neural Networks

Neural Networks, Deep Learning, Machine Learning resources

I have come across a few great resources that I wanted to share.  For students taking a machine learning class (like Northwestern University’s MSDS 422 Practical Machine Learning) these are great references, and a way to learn about them before, during, or after the class.  This is not a comprehensive list, just a starter.

Textbook

There is a free online textbook, Neural Networks and Deep Learning.

Videos

There is a great math visualization site called 3Blue1Brown and they have a YouTube channel.  There are 4 videos on neural networks/deep learning which are really informative and a good introduction.

  1.  But what *is* a Neural Network? Chapter 1, deep learning
  2.  Gradient Descent, how neural networks learn. Chapter 2, deep learning
  3.  What is backpropagation really doing? Chapter 3, deep learning
  4.  Backpropagation calculus. Appendix to deep learning chapter 3.
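
To give a feel for the idea behind the second video before you watch it, here is a minimal sketch of gradient descent on a one-variable function.  This is my own toy example, not code from the videos: the function name `gradient_descent`, the learning rate, and the quadratic being minimized are all assumptions chosen for illustration.  A real neural network does the same thing over many weights at once, with backpropagation supplying the gradients.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to move toward a minimum.

    grad: function returning the derivative of the loss at a point.
    start: initial guess for the parameter.
    """
    w = start
    for _ in range(steps):
        # Move a small step in the direction that lowers the loss.
        w -= learning_rate * grad(w)
    return w


# Toy loss f(w) = (w - 3)^2, whose derivative is 2 * (w - 3).
# The minimum is at w = 3, so the result should end up very close to 3.
w_min = gradient_descent(lambda w: 2 * (w - 3), start=0.0)
print(w_min)
```

With a learning rate that is too large the steps overshoot and the value diverges, which is one reason the learning rate is usually treated as a tuning parameter.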

There is a great playlist on Essence of linear algebra, which is a great review and explanation of linear algebra and matrix operations.  I wish I had seen this when I was learning it.

Scikit-Learn Tutorials

There are tutorials on the Scikit-Learn site.

TensorFlow tutorials

They provide a link to this Google “Machine Learning Crash Course” – Google’s fast-paced, practical introduction to machine learning.

The TensorFlow site has a Tutorials page.  There are tutorials for Images, Sequences, Data Representation, and a few other things.

 

Google AI

Google has its own education site (which also has the Machine Learning Crash Course referenced above).

 

Blog sites

Adventures in Machine Learning, Andy Thomas’s blog.

This is a must view site, and worth visiting several times over.   Andy does a great job explaining the topics and has some great visuals as well.  These are fantastic tutorials.  I have listed only a few below.

Neural Networks Tutorial – A Pathway to Deep Learning

Python TensorFlow Tutorial – Build a Neural Network

Convolutional Neural Networks Tutorial in TensorFlow

Word2Vec word embedding tutorial in Python and TensorFlow

Recurrent neural networks and LSTM tutorial in Python and TensorFlow

 

colah’s blog – Christopher Olah’s blog

Another great blog, with lots of good postings.  A few are listed below.

Deep Learning, NLP, and Representations

Neural Networks, Types and Functional Programming

 

Courses

DataCamp – one of my favorite learning sites.  It does require a subscription.

DataCamp currently has 9 Python machine learning courses, which are listed below.  They also have 9 R machine learning courses.

Machine Learning with the Experts: School Budgets

Deep Learning in Python

Building Chatbots in Python

Natural Language Processing Fundamentals in Python

Unsupervised Learning in Python

Linear Classifiers in Python

Extreme Gradient Boosting with XGBoost

HR Analytics in Python: Predicting Employee Churn

Supervised Learning with Scikit-Learn

 

Udemy courses

Udemy is also a favorite learning site.  You can generally get the course for about $10.

My favorite Udemy learning series is from Lazy Programmer Inc.  They have a variety of courses.  Their blog site explains what order to take the courses in.   There are many other courses from different instructors as well.

Deep Learning Prerequisites: The Numpy stack in Python

Deep Learning Prerequisites: Linear Regression in Python

Deep Learning Prerequisites: Logistic Regression in Python

Data Science: Deep Learning in Python

Modern Deep Learning in Python

Convolutional Neural Networks in Python

Recurrent Neural Networks in Python

Deep Learning with Natural Language Processing in Python

Advanced AI: Deep Reinforcement Learning in Python

Plus many other courses on Supervised and Unsupervised Learning, Bayesian ML, Ensemble ML, Cluster Analysis, and a few others.

 

If you have other favorite machine learning resources, please let me know.

 

 

Data Scientist, Northwestern University MSDS Program, Northwestern University MSPA

Northwestern University’s Master of Science in Predictive Analytics (MSPA) becomes the Master of Science in Data Science (MSDS)

Starting in the Spring Quarter of 2018 the MSPA (Master of Science in Predictive Analytics) program became the MSDS (Master of Science in Data Science) program.  This was announced in January of 2018 and the name change became official in the Spring Quarter of 2018.  Existing MSPA students had the option of staying in the MSPA program with its requirements, or transferring over to the MSDS program.  I elected to transfer to the MSDS program.  There is a webex on the MSDS program – click here for the webex.

In the webinar, Dr. Thomas Miller, the faculty director of the MSPA and now the MSDS programs, related that Northwestern University’s MSPA program started in the fall of 2011, before data science was a widely known or used term.  Since then it has become mainstream, and has emerged as a discipline in its own right.  Hence the decision to change the name of the program.

Data science was described by Dr. Miller as “an emerging, integrative academic discipline” encompassing Business needs (strategy, management, leadership, communication skills), Modeling (statistics, machine learning, and model building), and Information Technology (databases, etc).  Each of these is covered in the MSDS program.

Dr. Miller also commented that the main programming language moving forward would be Python.   Initially when the program was formed, SAS and SPSS were the main languages.  Python and R were brought in at a later date.   R will still be used in some courses in the Analytics and Modeling Specialization courses.   He did not make it clear whether SAS would still be an option though.

MSDS Program Overview

You need to successfully complete 12 courses.  There are core courses, elective courses, and specialization options.

Core Courses

MSDS 400 – Math for Data Scientists

MSDS 401 – Statistical Analysis

MSDS 402 – Introduction to Data Science

MSDS 420 – Database Systems and Data Preparation

MSDS 422 – Practical Machine Learning

MSDS 460 – Decision Analytics

MSDS 475 – Project Management or MSDS 480 Business Leadership and Communications

MSDS 498 – Capstone or MSDS 590 – Thesis

 

A new elective was created for students with limited programming background:

MSDS 430 – Python for Data Science

Specializations

 

Analytics and Modeling Specialization

Designed for data scientists seeking technical roles as data analysts, applied statisticians, and modelers. Courses focus on statistical inference and applications of predictive models.

Required Courses:

MSDS 410 – Regression and Multivariate Analysis

MSDS 411 – Generalized Linear Models

Plus 2 electives

 

Data Engineering Specialization

Designed for students seeking technical positions focused on designing, developing, implementing, and maintaining systems for data science.

Required Courses:

MSDS 432 – Foundations of Data Engineering

MSDS 434 – Analytics Application Development

Plus 2 electives

 

Analytics Management Specialization

Designed for students seeking technical leadership and data science management positions.

Required Courses:

MSDS 474 – Accounting and Finance for Analytics Managers

MSDS 475 – Project Management

MSDS 480 – Business Leadership and Communications

(Students in this specialization have to take both 475 and 480)

Plus 2 electives

 

*Artificial Intelligence and Deep Learning Specialization

*This has not been officially announced – this information is from comments that Dr. Thomas Miller made during  a MSDS 422 Sync session.  He said that this specialization is being developed – so take these comments as being preliminary.  I personally am really excited about this specialization, as I just finished MSDS 422 – Practical Machine Learning – and realize the growing importance of machine learning now and in the future.

Required Courses:

MSDS 453 – changing from Text Analytics to Natural Language Processing

MSDS 458 – Artificial Intelligence and Deep Learning

Plus 2 electives

These new electives are being created:

Computer Vision

Software Robotics

 

Listing of all current elective courses:

MSDS 410 – Regression Analysis

MSDS 411 – Generalized Linear Models

MSDS 413 – Time Series Analysis and Forecasting

MSDS 430 – Python for Data Science

MSDS 432 – Foundations of Data Engineering

MSDS 434 – Analytics Application Development

MSDS 436 – Analytics Systems Analysis

MSDS 450 – Marketing Analysis

MSDS 451 – Financial and Risk Analytics

MSDS 452 – Web and Network Data Science

MSDS 453 – Text Analytics – soon to become Natural Language Processing

MSDS 454 – Data Visualization

MSDS 456 – Sports Performance Analytics

MSDS 457 – Sports Management Analytics

MSDS 458 – Artificial Intelligence and Deep Learning

MSDS 459 – Information Retrieval and Real-Time Analytics

MSDS 470 – Analytics Entrepreneurship

MSDS 472 – Analytics Consulting

MSDS 474 – Accounting and Finance for Analytics Managers

MSDS 490 – Special Topics in Data Science

 

 

 

 

Machine Learning, Northwestern University MSDS Program, Northwestern University MSPA

Northwestern University MSDS (formerly MSPA) 422 – Practical Machine Learning Course Review

This course was taught by Dr. Thomas Miller, who is the faculty director of the Data Science program (formerly known as the Predictive Analytics program – I am going to post an article discussing the program name change from the Master of Science in Predictive Analytics (MSPA) to the Master of Science in Data Science (MSDS)).  Overall, this was an excellent review of machine learning, and is a required core course for all students in the program.  It is most definitely a foundational course for any student of data science in today’s world.  It is also a foundational course for the Artificial Intelligence and Deep Learning specialization, which is currently being developed (more on this in a subsequent post as well).  The course covers the following topics:

  • Supervised, Unsupervised, and Semi-supervised learning
  • Regression versus Classification
  • Decision Trees and Random Forests
  • Dimensionality Reduction techniques
  • Clustering Techniques
  • Feature Engineering
  • Artificial Neural Networks
  • Deep Neural Networks
  • Convolutional Neural Networks (CNN)
  • Recurrent Neural Networks (RNN)

This course uses Python and the Python Libraries Scikit-Learn and TensorFlow. In addition to using Jupyter Notebooks to run my code, I also learned how to run TensorFlow from the Command Line, which is a faster way of running neural networks through a large number of epochs. The course is currently offered in R as well, but they will be discontinuing the R course, and only offering the Python/TensorFlow course starting in the fall semester.   Dr. Miller commented that they will be using Python much more extensively going forward, especially in the AI/Deep Learning specialization courses.  R apparently will still be offered in the Analytics/Modeling courses – 410 (Regression Analysis) and 411 (Generalized Linear Models).   I did learn to use Python/Scikit-Learn/TensorFlow at an intermediate level, and feel like I have a great foundation to build upon, in terms of programming.

Course Structure

There is required reading every week, mainly from the two required textbooks, although there are a few articles to read as well.  There were a total of 5 sync sessions which reviewed various topics.   I wish the sync sessions had been a little more robust, and covered the current assignments and the coding required to complete the assignments.  I found this very helpful in previous courses.  There were weekly discussion board assignments, which covered basic concepts, and turned out to be very informative, especially since a lot of the topics covered on the final exam were covered in these discussions.  There are weekly assignments which must be completed, in which you either develop the code yourself, or use a skeletal code base provided and build upon it.   These ranged from very easy to very difficult, especially as you moved into the artificial neural networks.  There was a non-proctored final exam and a proctored final exam.

Textbooks

Primary Textbooks:

Géron, A. 2017. Hands-On Machine Learning with Scikit-Learn & TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. Sebastopol, Calif.: O’Reilly. [ISBN-13 978-1-491-96229-9] Source code available at https://github.com/ageron/handson-ml  This was the primary textbook for most of the course.  It is an excellent text with lots of great coding examples.

Müller, A. C. and Guido, S. 2017. Introduction to Machine Learning with Python: A Guide for Data Scientists. Sebastopol, Calif.: O’Reilly. [ISBN-13: 978-1449369415] Code examples at https://github.com/amueller/introduction_to_ml_with_python

Reference Textbook:

Izenman, A. J. 2008. Modern Multivariate Statistical Techniques: Regression, Classification, and Manifold Learning. New York: Springer. [ISBN-13: 978-0-387-78188-4] This was used very little.

Learning Outcomes (from syllabus):

Practical Machine Learning is a survey course with a long list of learning outcomes:

  • Explain the learning algorithm trade-offs, balancing performance within training data and robustness on unobserved test data.
  • Distinguish between supervised and unsupervised learning methods.
  • Distinguish between regression and classification problems
  • Explain bootstrap and cross-validation procedures
  • Explore and visualize data and perform basic statistical analysis
  • List alternative methods for evaluating classifiers.
  • List alternative methods for evaluating regression
  • Demonstrate the application of traditional statistical methods for classification and regression
  • Demonstrate the application of trees and random forests for classification and regression
  • Demonstrate principal components for dimension reduction.
  • Demonstrate principal components regression
  • Describe hierarchical and non-hierarchical clustering techniques
  • Describe how semi-supervised learning may be utilized in addressing classification and regression problems
  • Explain how measurement and feature engineering are relevant to modeling
  • Describe how artificial neural networks are constructed from logical connections of artificial neurons and activation functions
  • Demonstrate the use of artificial neural networks (including deep neural networks) in classification and regression
  • Describe how convolutional neural networks are constructed
  • Describe how recurrent neural networks are constructed
  • Distinguish between autoencoders and other forms of unsupervised learning
  • Describe applications of autoencoders
  • Explain how the results of machine learning can be useful to business managers
  • Transform data and research results into actionable insights
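
Several of these outcomes (cross-validation, random forests, and principal components for dimension reduction) fit together naturally in Scikit-Learn.  The following is a minimal sketch of my own, not course material: the bundled breast cancer dataset, the choice of 5 principal components, and the dictionary names `models` and `cv_scores` are all assumptions made for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small classification dataset that ships with Scikit-Learn (illustrative choice).
X, y = load_breast_cancer(return_X_y=True)

models = {
    # Tree ensemble: needs no feature scaling.
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # Scale, reduce to 5 principal components, then fit a linear classifier.
    "pca_logistic": make_pipeline(
        StandardScaler(),
        PCA(n_components=5),
        LogisticRegression(max_iter=1000),
    ),
}

cv_scores = {}
for name, model in models.items():
    # 5-fold cross-validation: each fold is held out once for evaluation.
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    cv_scores[name] = scores
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```

Putting the scaler and PCA inside the pipeline means they are refit on the training portion of each fold, so the held-out fold never leaks into the preprocessing.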

 

Weekly Assignments

Here are the weekly learning titles and assignments:

Week 1.  Introduction to Machine Learning

  • Assignment 1. Exploring and Visualizing Data

Week 2.  Supervised Learning for Classification

  • Assignment 2. Evaluating Classification Models

Week 3.  Supervised Learning for Regression

  • Assignment 3. Evaluating Regression Models

Week 4. Trees and Random Forests

  • Assignment 4. Random Forests

Week 5.  Unsupervised Learning

  • Assignment 5. Principal Components Analysis

Week 6. Neural Networks

  • Assignment 6. Neural Networks

Week 7.  Deep Learning for Computer Vision

  • Assignment 7. Deep Learning

Week 8.  Deep Learning for Natural Language Processing

  • Assignment 8. Natural Language Processing

Week 9.  Neural Networks Autoencoders

  • No assignment

 

Final Examinations

There were 2 final examinations, one being non-proctored and the other proctored.  The non-proctored exam was open book, and tested your ability to look at data and the various analytical techniques, and interpret the results of the analyses.  The proctored final exam was closed book and covered general concepts.

Final Thoughts

This was a great overview of some of the more important topics in machine learning.  I was able to get a good theoretical background in these topics, and learned the coding necessary to perform these.   This is a great foundation upon which to add more advanced and in-depth use of these techniques.  This course really challenged me to rethink what analytical techniques I should be learning and applying in the future, to the point that I am going to change my specialization to Artificial Intelligence and Deep Learning.

 

Becoming a Healthcare Data Scientist, Uncategorized

Update on lack of recent blog posts.

It has been a little more than a year since my last blog post, so I thought I would provide an explanation.   The bottom line is that I have not had a lot of free time to update my blog.  Two years ago this month, I took over as the Interim Chief Information Officer (CIO) for the integrated healthcare system that I work for.  This was in addition to my role as one of our system’s Chief Medical Information Officer (CMIO).   The interim CIO position was supposed to be just that, a brief period of time performing this role until a permanent CIO could be selected.  However, it turned out to be a longer period of time.

I have learned a tremendous amount during my  tenure as the interim CIO.  I have a much better appreciation for the roles that both information and technology play in contributing to the success of healthcare providers and organizations understanding and delivering the most effective healthcare to patients.   In order to further educate myself about what a modern digital healthcare CIO’s responsibilities are, I had to take some time off from the MSPA program.  However, I am back in the program (now called the MSDS program – Master of Science in Data Science – more to come about the name change of the program in a future blog post).  I just completed MSDS-422 Practical Machine Learning, and am totally excited by what I learned in this course (more to come on that as well).  As an aside, the practical application of machine learning (a subset of broader artificial intelligence), will (and is starting to) revolutionize healthcare through the much deeper insights obtainable through the use of neural networks and deep learning.   Anyone learning analytics today needs to understand and be able to apply machine learning techniques.  Period.

Data Science, Northwestern University MSPA

Python Tops KDNuggets 2017 Data Science Software Poll

The results of KDNuggets’ 18th annual Software Poll should be fascinating reading for anyone involved in data science and analytics.  Some highlights – Python (52.6%) finally overtook R (52.1%), SQL remained at about 35%, and Spark and TensorFlow have increased to above 20%.

KDNugetts_poll

(Graph taken from http://www.kdnuggets.com/2017/05/poll-analytics-data-science-machine-learning-software-leaders.html/2)

I am about halfway through Northwestern University’s Master of Science in Predictive Analytics (MSPA) program.  I am very thankful that the program has made learning different languages a priority.  I have already learned Python, Jupyter Notebooks, R, SQL, some NoSQL (MongoDB), and SAS.  In my current class in Generalized Linear Models, I have also started to learn Angoss, SAS Enterprise Miner, and Microsoft Azure machine learning.  However, it looks like you can’t ever stop learning new things – and I am going to have to learn Spark and TensorFlow – to name a few more.

I highly recommend you read this article.

 

Becoming a Healthcare Data Scientist, Data Scientist, Healthcare Predictive Analytics, Northwestern University MSPA

Physician Data Scientist Part II. The Why.

I was recently reminded by a reader of my blog (thanks Al) that I had not followed up on my promise to post a second part to a blog post from 7.7.2015 – “Physician Data Scientists – Why and What Type? Part I”.  Now that I am in between classes, I have the time to work on this.   Looking back at this original post, I am somewhat amazed at all that has happened in the last 1 1/2 years.

I am currently the interim Chief Information Officer (CIO) and Chief Medical Information Officer (CMIO) for our integrated healthcare system.   I stepped into the interim CIO role (helped in part by my coursework in Northwestern University’s Master of Science in Predictive Analytics (MSPA) program) after the departure of our previous CIO last year.  Prior to that I had been one of our system’s CMIOs – facilitating and communicating the needs for technology to help improve clinical outcomes to IT, while communicating back to Physicians and Leadership the limitations of current technologies.  I never really aspired to become either the interim CIO or a CMIO; these opportunities simply arose because of my journey to become better educated about the use of data and analytics to improve clinical outcomes – i.e., to become a Physician Data Scientist.  I will explain how I ended up in my current role.

My interest in data and analytics is a fairly recent phenomenon, occurring because of a chance meeting with someone who has since become one of my closest friends – Curt Lindberg – who has a PhD in Complexity Science, and is the Director of our Complexity in Healthcare Center.  I met him during a project to improve our process for getting patients into our healthcare system from outside facilities more efficiently.  At that time I was a practicing Emergency Physician and the Medical Director of our MedFlight Air Ambulance service.  Curt introduced me to complexity science and my life has not been the same – it was a transformational career moment for me.  I ended up being part of a small group of researchers who were trying to develop smarter patient monitoring systems.  Their work has inspired me to try and contribute in my own way to this field – called predictive monitoring.

Predictive monitoring is an unofficial term for what this group is trying to accomplish.  While the technology inside the monitors has changed drastically since the 1970s, what the monitors do has not.  These monitors display certain physiologic markers of interest – blood pressure, pulse rate, temperature, oxygen level, EKG pattern, etc.  You can see what is happening to the patient right at that time, or you can go back and review what happened to them in the past (minimally), but there is no information predicting what will happen to them in the future (are they predicted to get better, go into sudden cardiac arrest, stop breathing, or develop an overwhelming infection called sepsis, etc).  The goal is to incorporate predictive algorithms into these monitoring systems.

I have been fortunate to meet some giants in this field.  Dr. J. Randall Moorman, from the University of Virginia, developed the first commercial predictive monitoring system – the HeRO monitor.  The largest ever randomized clinical trial in neonatal patients (premature babies) was conducted using this monitor.  It showed that the monitor was able to identify certain physiological patterns, and translate those patterns into a risk for developing an overwhelming infection (late onset neonatal sepsis).  This risk was detected an average of 18 hours before a clinical diagnosis was made, allowing for earlier treatments and interventions.  This translated into a 22% reduction in mortality.  Dr. Andrew Seely is a Thoracic Surgeon at the University of Ottawa who has developed a model to predict whether a breathing tube can be removed from a patient without having to be replaced because the patient wasn’t ready to have it removed.  We got to participate in that clinical trial.  We also got to participate in a trial conducted by Ryan Arnold, now at Christiana Care in Newark, Delaware, on trying to predict clinical outcomes using heart rate variability analyses.
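
For readers who have not come across heart rate variability before, here is a minimal sketch of two standard time-domain measures computed from the intervals between successive heartbeats (RR intervals, in milliseconds).  This is purely illustrative: it is not the algorithm used in the HeRO monitor or in any of the trials mentioned above, and the function names and sample values are hypothetical.

```python
import math
import statistics


def sdnn(rr_ms):
    """Sample standard deviation of the RR intervals (overall variability)."""
    return statistics.stdev(rr_ms)


def rmssd(rr_ms):
    """Root mean square of successive differences (beat-to-beat variability)."""
    diffs = [later - earlier for earlier, later in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Made-up RR intervals in milliseconds, for illustration only.
rr_intervals = [800, 810, 790, 800]
print(f"SDNN:  {sdnn(rr_intervals):.1f} ms")
print(f"RMSSD: {rmssd(rr_intervals):.1f} ms")
```

Measures like these only describe the recent past.  A predictive model would take such features as inputs and estimate the risk of a future event, which is the step today's bedside monitors do not take.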

In addition to collaborating with these researchers working on their projects, I became especially fascinated with a research article written by one of the country’s leading trauma surgeons, Dr. Mitchell Cohen and his colleagues at San Francisco General Hospital and the University of California San Francisco – Identification of complex metabolic states in critically injured patients using bioinformatic cluster analysis.  I will confess that I felt frustrated when I talked with the researchers about the underlying mathematical concepts and analytical techniques they were using, because I just did not understand them well.  This ignorance ignited what I will freely admit is now an obsession to understand these concepts and techniques.

I started off trying to educate myself using textbooks, taking online MOOCs (Massive Open Online Courses), and enrolling in courses offered on the web.  I still felt very frustrated because these courses didn’t really go into the depth that I thought I needed.  When I look at the giants in this field of predictive analytics, these few researchers seemed to have both the clinical knowledge and understanding of why this research was so important, and they were also able to understand the mathematical and analytical concepts and techniques necessary to do research in this field.  I wanted to be like them.

I became very interested in becoming a data scientist at that point.  I eventually enrolled in Northwestern University’s Master of Science in Predictive Analytics (MSPA) program.  I have not regretted this decision.  I currently am halfway through the program, and am finally into the especially relevant coursework.  I just finished the major foundational course – Linear Regression and Multivariate Analysis.  The courses up until then had been preparing me to take this course.   I realized I had come full circle when I re-read Mitchell Cohen’s article, and realized that I now finally understood the concepts and results.  That was an extremely satisfying moment for me.

This has been quite the educational journey for me.   I feel like I have a much better understanding of statistics. I am getting somewhat competent in a few programming languages – R, Python, and SAS.  I am using Jupyter Notebooks for my programming work.   I have dabbled with data science platforms like KNIME, and this quarter will be learning to use virtual machines, IBM Watson Analytics, ANGOSS, and Microsoft Azure machine learning – as part of my next class on Generalized Linear Models.

I finally feel as if I am able to start applying what I have been learning for the last 1 1/2 years – to start developing predictive models to improve clinical outcomes.  A few of my goals are to help our organization become more data driven, and to continue to work on developing predictive algorithms that could be incorporated into bedside monitoring systems, further improving the outcomes of patients.

This is my journey to date from being a practicing Emergency Physician with no interest in data or analytics, to where I am now, halfway finished with my Master’s program.  The real journey of applying what I have learned to real world problems has just started but will get more robust as I learn more.