Becoming a Healthcare Data Scientist, Uncategorized

Update on lack of recent blog posts.

It has been a little more than a year since my last blog post, so I thought I would provide an explanation. The bottom line is that I have not had much free time to update my blog. Two years ago this month, I took over as the Interim Chief Information Officer (CIO) for the integrated healthcare system where I work. This was in addition to my role as one of our system’s Chief Medical Information Officers (CMIOs). The interim CIO position was supposed to be just that: a brief period performing the role until a permanent CIO could be selected. However, it turned out to be a longer period than expected.

I have learned a tremendous amount during my tenure as the interim CIO. I have a much better appreciation for the roles that both information and technology play in helping healthcare providers and organizations understand and deliver the most effective healthcare to patients. To further educate myself about a modern digital healthcare CIO’s responsibilities, I had to take some time off from the MSPA program. However, I am back in the program (now called the MSDS program, for Master of Science in Data Science; more to come about the name change in a future blog post). I just completed MSDS-422 Practical Machine Learning and am excited by what I learned in this course (more to come on that as well). As an aside, the practical application of machine learning (a subset of artificial intelligence) will revolutionize healthcare, and is already starting to, through the much deeper insights obtainable with neural networks and deep learning. Anyone learning analytics today needs to understand and be able to apply machine learning techniques. Period.

Data Science, Northwestern University MSPA, Uncategorized

DataCamp’s Importing Data in Python Part 1 and Part 2.

I recently finished these DataCamp courses and really liked them. I highly recommend them to students in general, and especially to students in Northwestern University’s Master of Science in Predictive Analytics (MSPA) program.

Importing Data in Python Part 1 is described as:

As a Data Scientist, on a daily basis you will need to clean data, wrangle and munge it, visualize it, build predictive models and interpret these models. Before doing any of these, however, you will need to know how to get data into Python. In this course, you’ll learn the many ways to import data into Python: (i) from flat files such as .txts and .csvs; (ii) from files native to other software such as Excel spreadsheets, Stata, SAS and MATLAB files; (iii) from relational databases such as SQLite & PostgreSQL.
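
As a rough illustration of two of the import routes this description lists (flat files and relational databases), here is a minimal sketch that uses only Python’s standard library (`csv` and `sqlite3`). It is not taken from the course: the sample data, the `patients` table, and the column names are all made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a .csv file on disk.
CSV_TEXT = "patient_id,age\n1,34\n2,58\n3,71\n"


def read_flat_file(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def load_into_sqlite(rows):
    """Load rows into an in-memory SQLite table named 'patients' (a made-up name)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (patient_id INTEGER, age INTEGER)")
    conn.executemany(
        "INSERT INTO patients (patient_id, age) VALUES (?, ?)",
        [(int(r["patient_id"]), int(r["age"])) for r in rows],
    )
    conn.commit()
    return conn


rows = read_flat_file(CSV_TEXT)
conn = load_into_sqlite(rows)

# Query the in-memory database the same way you would query a SQLite file on disk.
over_50 = conn.execute(
    "SELECT patient_id FROM patients WHERE age > 50 ORDER BY patient_id"
).fetchall()
print(over_50)
```

With a real file, you would pass an open file handle to `csv.DictReader` and a file path to `sqlite3.connect` instead of the in-memory stand-ins used here. Libraries such as pandas offer higher-level helpers for the same tasks.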

Importing Data in Python Part 2 is described as:

As a Data Scientist, on a daily basis you will need to clean data, wrangle and munge it, visualize it, build predictive models and interpret these models. Before doing any of these, however, you will need to know how to get data into Python. In the prequel to this course, you have already learnt many ways to import data into Python: (i) from flat files such as .txts and .csvs; (ii) from files native to other software such as Excel spreadsheets, Stata, SAS and MATLAB files; (iii) from relational databases such as SQLite & PostgreSQL. In this course, you’ll extend this knowledge base by learning to import data (i) from the web and (ii) a special and essential case of this: pulling data from Application Programming Interfaces, also known as APIs, such as the Twitter streaming API, which allows us to stream real-time tweets.
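
Data pulled from web APIs often arrives as JSON, and streaming endpoints commonly send one JSON object per line. The sketch below shows only the parsing step, using the standard `json` module on a hard-coded string so that it runs without a network connection. The payload and its `user` and `text` fields are hypothetical and do not reflect the actual format of any particular API.

```python
import json

# Hypothetical newline-delimited JSON, shaped loosely like a streamed API response.
SAMPLE_STREAM = (
    '{"user": "alice", "text": "Learning Python"}\n'
    "\n"
    '{"user": "bob", "text": "Importing data from APIs"}\n'
)


def parse_stream(payload):
    """Decode one JSON object per non-empty line of the payload."""
    records = []
    for line in payload.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines, which a stream may contain
        records.append(json.loads(line))
    return records


records = parse_stream(SAMPLE_STREAM)
texts = [record["text"] for record in records]
print(texts)
```

In a real workflow, the payload would come from an HTTP response rather than a string constant, and each record would likely carry many more fields.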

 

Uncategorized

Learning R, Jupyter Notebooks, and KNIME

I currently have about a month off between my last course, Math for Modelers, and my next course, Statistical Analysis, in Northwestern University’s Master of Science in Predictive Analytics program. I have started to catch up on some things on my to-do list.

The first of these tasks is getting to know R better. I have a little previous R experience, but only at a very basic level. I am currently working my way through DataCamp’s excellent series of R programming courses. I completed the “Introduction to R” course, which consists of 6 chapters covering the basics, vectors, matrices, factors, data frames, and lists. It was very informative and nicely done. I am almost finished with the “Intermediate R” course, which has 5 chapters covering conditionals and flow control, loops, functions, the apply family, and utilities. I highly recommend these courses for people starting to learn R.

I am also using Jupyter Notebooks as I go through these courses. I just started using them and wish I had found them much earlier; I would have done my Python work in them as I went through the Math for Modelers course. I just added the R kernel and have been writing the code and taking notes on R in notebooks as I progress through these courses. I wish Northwestern University would consider using them for courses that involve programming.

I am also starting to explore KNIME. I was first introduced to KNIME earlier this year when I visited Dr. Randall Moorman’s predictive monitoring lab at the University of Virginia. They were using KNIME on a very elaborate project, and I was impressed with the functionality of the platform. KNIME is described as an “open-source, enterprise-grade analytics platform” that users can employ to “discover the potential hidden in their data, mine for fresh insights, or predict new futures”. I am very early in my exploration of this platform, but I am impressed so far and am excited to work on a project using it. I will post further updates as I learn more about it.

Lastly, a few words about what I am listening to and reading. I am currently listening to the audio version of “The Master Algorithm” by Pedro Domingos. This is a must-read book for practitioners of predictive analytics and anyone interested in machine learning. I am also reading the print version of “Superforecasting: The Art and Science of Prediction” by Philip E. Tetlock and Dan Gardner, which is an excellent read as well. I will try to review both in more detail when I am done.