Machine Learning

Installing TensorFlow GPU Tip – follow the instructions in the referenced blog!

Ok, how hard should this actually be, I mean seriously?

If you are learning how to do machine learning, then you have to have TensorFlow as one of your main tools. TensorFlow comes in two main versions – the version that runs on the CPUs in your computer, and the one that runs on GPUs if your computer has “CUDA-enabled GPU cards”. There are multiple benefits of using GPUs over CPUs – they are more specialized at performing matrix operations and mathematical transformations, and for this kind of work they run much, much faster.

However, the GPU version of TensorFlow is not that easy to install, in my opinion. I was unable to get it to work at all on my laptop – a Microsoft Surface Book – which has an i7-6600U CPU and an NVIDIA GeForce GTX 965M GPU. I could never take advantage of the GPU, because I could not get all of the dependencies for TensorFlow GPU installed and working correctly, despite multiple hours/days working on this. I was stuck using the slower and less efficient CPU whenever I used TensorFlow.

I just purchased a new desktop – an iBUYPOWER – running an i7-8700 CPU and an NVIDIA GeForce RTX 2070 GPU. Today I tried to install the GPU version of TensorFlow with no success – until I found a blog post – and I was able to install it very easily and quickly by following its instructions. If I were you, I would ignore the instructions posted on the TensorFlow site, and go immediately to the blog posting and follow those instructions.

The GPU version of TensorFlow markedly improved performance on my desktop. Using the code example in the post to train LeNet-5 on the MNIST digits data using Keras, the CPU version took 55-59 seconds to complete each individual epoch, while the GPU version took just 4 seconds to complete an epoch – a roughly 14-fold increase in speed.
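The speedup figure follows directly from the per-epoch timings quoted above. As a minimal sketch of that arithmetic (the variable and function names are my own, chosen for illustration):

```python
# Per-epoch timings reported above, in seconds.
cpu_seconds_low, cpu_seconds_high = 55, 59
gpu_seconds = 4


def speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """Return how many times faster the GPU epoch is than the CPU epoch."""
    return cpu_seconds / gpu_seconds


low = speedup(cpu_seconds_low, gpu_seconds)
high = speedup(cpu_seconds_high, gpu_seconds)
print(f"Speedup range: {low:.2f}x to {high:.2f}x")  # Speedup range: 13.75x to 14.75x
```

So "14-fold" is the middle of a range of about 13.75x to 14.75x for this one model on this one machine; other models and hardware will differ.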

Here is the post:  https://www.pugetsystems.com/labs/hpc/The-Best-Way-to-Install-TensorFlow-with-GPU-Support-on-Windows-10-Without-Installing-CUDA-1187/

Thank you Dr. Donald Kinghorn!!


Healthcare Analytics, Healthcare Data and Analytics Leadership

Healthcare Analytics Success Requires Enterprise Analytics, Strategic Alignment and Superior Leadership. Notes from Cleveland Clinic’s CIO Ed Marx in His Keynote Address to the HIMSS Big Data and Healthcare Analytics Forum.


Ed Marx, who is the Chief Information Officer (CIO) for the Cleveland Clinic, gives a fascinating keynote presentation to the HIMSS (Healthcare Information and Management Systems Society) Big Data and Healthcare Analytics Forum. His presentation is definitely worth watching. There are a few key points that I would like to review. These fit nicely into the framework that I laid out in my blog post Healthcare Chief Data/Analytics Officers Must Master the Realms of Data Excellence, Analytics Excellence, and Leadership Excellence to Become “Leaders of Excellence” in Their Organizations. In this framework the three realms that need to be mastered to become a Leader of Excellence are data excellence, analytics excellence and leadership excellence. Marx’s key points are that healthcare analytics organizations need good analytics people and capabilities, but more importantly need strategic alignment with the organization, and most importantly superior leadership.

One of the really interesting aspects of his presentation is his observation that data and analytics leaders need to be passionate about the data, and about the use of data to make data-informed decisions. If the leaders are not passionate about this, then no one else in the organization is going to be either. He discusses how the use of data became personal for him during his training to become one of the members of Team USA for triathlons, and one of the top 100 triathletes in the world. The use of data became even more personal when he suffered a heart attack 1 1/2 miles from the finish of a marathon, and the data helped in his treatment and recovery.

Key Points

An organization needs superior leadership and strategy before the actual analytics details matter.

  • You can have the greatest analytics department in the world in your organization, but if it is not linked to your strategy it is sub-optimal.
  • If the analytics are not run with superior leadership, it is sub-optimal.
  • The best thing you can do is to really ensure alignment between analytics and your organization, and ensure it is led by very capable leaders.
  • Data is never the reason for an organization’s success or failure, but can be the accelerator for either. So you want to make sure that data is the accelerator for success in your organization.

Data and Analytics are one of the Key Strategic Enablers

One of the goals is culture change: making sure that data is readily available, understood and expected. This will allow discussions to be interactive and supported by data discovery and intuitive visualizations. The aim is to get everyone to ask “Show me the data”. The data is needed to help the culture change and to remove emotions from conversations, so that decisions are based on facts rather than emotions.

Recommendations

1.  You need to be passionate about the use of data and analytics in your organization.

2.  You need to be visible, and be seen as a champion.  Get out in the organization.

3.  Be creative in the ways that you deliver data.  Data visualization is very important.

4.  Trust is important.

5.  Be bold.  Leadership requires boldness.  That is how you make a difference in life.

6.  Quietness.  You need quietness in your personal and professional life to think and have clarity.

7.  Measurement.  You should have objectives for yourself and key results that you measure.  If you don’t you will “rise to the level of mediocrity”.

8.  Humility.  You must be humble.

9.  Things change and you need to be flexible.  Build things that are movable and agile.

This is a must-watch for leaders of healthcare data and analytics programs.

Healthcare Analytics, Healthcare Data and Analytics Leadership

Healthcare Chief Data/Analytics Officers Must Master the Realms of Data Excellence, Analytics Excellence, and Leadership Excellence to Become “Leaders of Excellence” in Their Organizations.

You may have heard of the concepts of “Analytics Center of Excellence (CoE)” or “Competency Center (CC)” (1).   This is nicely explained in the reference (Wright-Jones, 2015).  This refers to a “cross-organizational group responsible for a specific function …, with an ultimate goal to reduce time to value”.  The main purpose of a CoE is to “establish, identify, develop, and harness cross-functional processes, knowledge, and expertise that have tangible benefits for the business”.  Making this concept become reality is fundamental to the success of healthcare analytics given the critical and central role that analytics plays in the success of Healthcare Organizations (HCOs).   There must be a cross-organizational approach to analytics (the people,  processes and technologies) in order for HCOs to be successful in obtaining insights and then applying the insights to advance the strategic goals of the organizations.

In my previous blog post, Collect, Connect, Analyze and Apply – Four Data and Analytics Competencies that All Digital Healthcare Organizations Must Master, I talk about four competencies that an organization must master to be successful. In this post, I will talk about the competencies that the Leaders of Data and Analytics programs must master. Just as a Center of Excellence is focused on the organization rather than just the analytics department, the Leaders of the analytics programs must focus on providing leadership to their department, and just as importantly, to the entire organization if they want to truly be “Leaders of Excellence”. There are three main realms that a leader needs to master – data excellence, analytics excellence, and leadership excellence.

I am indebted to the IT research and advisory firm Gartner, Inc. for the unbelievable amount of research that they have conducted on data and analytics, and I will reference some of their research here. I don’t usually make plugs for specific organizations, but I have personally obtained an incredible amount of knowledge from their research, and I highly recommend them. The basic framework for this article was presented in the Gartner research article by Friedman et al. (2018) (2), and it really challenged me to think about what else was needed for analytics leaders to be successful. The Friedman article talks about the three “vectors of change and opportunity” that must be “mastered in order to be successful” – data management excellence, analytics excellence, and excellence in change management and leadership. I have expanded upon this basic framework, adding specific information in the data and analytics sections, as well as talking about additional leadership skills that are needed.

I will use the people, process and technology framework to discuss the elements that are needed under the three categories of data excellence, analytics excellence and leadership excellence.  These three framework elements are important in the data excellence and analytics excellence realms, but in the leadership realm it is mainly the people and the processes that are most important.  While the data/analytics leader does not have to be a deep subject matter expert in the data and analytics realms, they must have deep knowledge of what is necessary to be “excellent” in these realms, and be able to provide leadership and guidance to these departments and to the organization on matters pertaining to these.   However, the data/analytics leader does need to be a deep subject matter expert in the leadership realm, because that is where they bring true value to the organization.

 

Data Excellence

The first realm that needs to be mastered is the data.   It all starts with the data.  If you do not have good quality data that is trusted throughout the organization, then it does not really matter how good your analytics capabilities are or how good your organizational transformation capabilities are – they will be hampered by the lack of good data.

People

You must have people who have deep skill sets in acquiring, transforming, storing, and retrieving data. It is important to have the capabilities of “Data Engineers”. Data Engineers are responsible for establishing and maintaining the data and data systems architecture, and work with master data management and data quality. The organization needs to have the capability to manage data wherever it resides, whether in the source systems, enterprise data warehouses, data lakes, logical data warehouses, etc. The people in your organization need deep capabilities to manage this data.

Processes

The processes revolve around the ability to manage the data in your organization.  This includes an organization-wide philosophy of moving from collecting data to connecting the data.  It also includes the management of the data, master data management, metadata management, and the use of data catalogs.

There needs to be a focus on not only collecting data, but increasingly on connecting the different data sources (3).  In order for valuable insights to be obtained, data from multiple disparate sources must be connected and analyzed.  It is impractical to put all the data into an enterprise data warehouse and then access it only from there.  There is a need to connect the clinical data from the electronic medical record, financial information from the financial system, customer information from the customer relationship management system, social information from social sources, etc.
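To make the idea of connecting sources concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the three extracts, the field names, and the assumption that all systems share one patient identifier (in practice, matching identifiers across systems is itself a master data problem).

```python
from collections import defaultdict

# Hypothetical extracts from three separate source systems, keyed by patient ID.
emr_records = {
    "P001": {"diagnosis": "type 2 diabetes"},
    "P002": {"diagnosis": "hypertension"},
}
finance_records = {
    "P001": {"total_charges": 1250.00},
    "P003": {"total_charges": 310.50},
}
crm_records = {
    "P002": {"preferred_contact": "email"},
}


def connect_sources(**sources):
    """Merge per-patient records from several sources into one view per patient.

    Each field is prefixed with its source name so its origin stays visible.
    """
    combined = defaultdict(dict)
    for source_name, records in sources.items():
        for patient_id, fields in records.items():
            for field_name, value in fields.items():
                combined[patient_id][f"{source_name}.{field_name}"] = value
    return dict(combined)


connected = connect_sources(emr=emr_records, finance=finance_records, crm=crm_records)
for patient_id in sorted(connected):
    print(patient_id, connected[patient_id])
```

The data stays in its source extracts and is joined on demand, which is the "connect" approach described above, as opposed to first loading everything into a single warehouse.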

Data Management Capabilities

As Edjlali and Friedman (2017)(4) point out, every data and analytics use case requires the following data management capabilities:

      • Describe:  Describe where the data resides and exactly what type of data it is.  This includes the master data elements used across the organization, as well as the metadata that is used to describe the data.
      • Organize:  Organize your data so that it can be retrieved easily and consumed by multiple applications and people.
      • Integrate:  Have the capability to integrate multiple disparate data sources.
      • Share:  Make the data available to multiple applications and people.
      • Govern:  Provide high level governance over the data process.
      • Implement:  Implement the processes that rely on trusted data.  These could be data exploration for insight by a data scientist, or the development of a report by a distributed analyst embedded in a specific department.

 

Master Data Management (MDM)

Master data is the core data that is essential to operations (6). This is typically the important data that is put in an enterprise data warehouse. This is data where the definition of each data element is well defined, agreed upon, and understood across the enterprise. This data has to be extremely trustworthy and needs to undergo very robust validation procedures.

Master data management (MDM) is defined as a comprehensive method of enabling an enterprise to link all of its data to a common reference point (7). There must be a standardized way to describe, format, store, and access data. In addition, this master data must be updated on a regular basis. The creation of a data dictionary (a collection of descriptions of the data objects or items in a data model (8)) is essential to allow this standardization. There must be a vision and strategy for how MDM is used in the organization, since this master data is central to all important business processes, and there must be confidence and trust in this data (5).
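As a rough illustration of what a data dictionary provides, the sketch below defines a few data elements and checks an incoming record against them. The element names, descriptions, and source systems are invented for the example and are not taken from any real system or from the references.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """One entry in a (hypothetical) data dictionary."""
    name: str
    description: str
    data_type: type
    source_system: str


data_dictionary = {
    element.name: element
    for element in [
        DataElement("patient_id", "Enterprise-wide unique patient identifier", str, "EMR"),
        DataElement("admit_date", "Date of inpatient admission, ISO 8601 text", str, "EMR"),
        DataElement("total_charges", "Total billed charges in US dollars", float, "Finance"),
    ]
}


def validate_record(record: dict) -> list:
    """Return a list of problems found when checking a record against the dictionary."""
    problems = []
    for field_name, value in record.items():
        element = data_dictionary.get(field_name)
        if element is None:
            problems.append(f"{field_name}: not defined in the data dictionary")
        elif not isinstance(value, element.data_type):
            problems.append(
                f"{field_name}: expected {element.data_type.__name__}, "
                f"got {type(value).__name__}"
            )
    return problems


problems = validate_record({"patient_id": "P001", "total_charges": "1250", "zip": "44195"})
for problem in problems:
    print(problem)
```

Here the charge arrives as text rather than a number and the record carries an undefined field, so both are flagged. Agreed definitions are what make this kind of validation possible.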

Metadata Management

Metadata is “data that serves to provide context or additional information about other data. It may also describe the conditions under which the data stored in a database was acquired.” (9) The amount of data that HCOs have access to now is staggering. Without a rigorous approach to understanding what data an HCO has access to, it will be impossible for it to realize the full potential of that data. That is why metadata management is so important. Metadata is the key to cataloging, identifying and evaluating an organization’s information assets and how they are managed (10). This is important not only for structured data but is even more important for the larger amount of unstructured data that exists. This directly impacts the data management capabilities referenced above, and will add value to the process if it enables workers to “describe, organize, integrate, share, govern and implement information assets” (4).

Data Catalogs

Data catalogs maintain inventories of data assets through the discovery, description and organization of data sets. They “offer a fast and inexpensive way to inventory and classify the organization’s increasingly distributed and disorganized data assets and map their information supply chains to limit data sprawl” (11). However, data catalog initiatives must be linked to the broader metadata management programs described above, as they go hand-in-hand.
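A data catalog can be thought of as a searchable inventory of data sets described by their metadata. The following is a toy sketch only: the entries, owners, and tags are made up, and commercial catalog products do far more (automated discovery, lineage, access control).

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata describing one data set in a (hypothetical) catalog."""
    name: str
    location: str
    owner: str
    structured: bool
    tags: set = field(default_factory=set)


catalog = [
    CatalogEntry("encounters", "enterprise data warehouse", "Clinical Informatics", True, {"clinical", "emr"}),
    CatalogEntry("claims", "finance system", "Revenue Cycle", True, {"financial"}),
    CatalogEntry("clinical_notes", "data lake", "Clinical Informatics", False, {"clinical", "text"}),
]


def find_by_tag(entries, tag):
    """Return the names of all data sets carrying the given tag."""
    return [entry.name for entry in entries if tag in entry.tags]


clinical = find_by_tag(catalog, "clinical")
unstructured = [entry.name for entry in catalog if not entry.structured]
print("Clinical data sets:", clinical)
print("Unstructured data sets:", unstructured)
```

Even this small inventory answers the two questions the text raises: what data do we have, and where does it live.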

Technologies

The technologies underlying the data management excellence realm involve the collection, transformation, storage, retrieval, and connection of data.  This includes the data source systems that are creating the original data, and the numerous data storage systems which include the source systems, the enterprise data warehouse, the logical data warehouse, and data lakes (or whatever term you like for this capability).  This also includes the technology for documenting the master data elements, metadata, and data catalogs.  It is clear that HCOs must have a variety of methods to store data, as it is not practical nor economical to put everything into a structured enterprise data warehouse anymore.  Most HCOs are challenged with storing, retrieving, and analyzing the overwhelming majority of their data – their unstructured data.  These challenges must be addressed – from all three perspectives – people, process and technology.

 

Analytics Excellence

The second realm that needs to be mastered is the analytics realm. Once the HCO has created data storage and management capabilities, it needs to be able to analyze the data to obtain actionable insights and apply these insights to business questions and needs. The types of analytics that can be applied to the data range from descriptive analytics (what happened) and diagnostic analytics (why did it happen) to predictive analytics (what will happen) and prescriptive analytics (what should happen). The analytical techniques range from basic statistics using spreadsheets all the way up to machine learning using advanced techniques such as neural networks. All HCOs will need to develop these capabilities, including the advanced capabilities, which in some cases may have to be outsourced.
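To show the difference between two of these types, here is a small sketch using made-up monthly visit counts. The descriptive step summarizes what happened; the predictive step fits a straight-line trend by ordinary least squares and extrapolates one month ahead. A linear trend is an assumption chosen for simplicity, not a recommended forecasting method.

```python
from statistics import mean

# Hypothetical monthly emergency department visit counts.
monthly_visits = [410, 425, 440, 455, 470, 485]

# Descriptive analytics: what happened?
average_visits = mean(monthly_visits)


def fit_line(values):
    """Ordinary least-squares fit of values against their 0-based position."""
    xs = range(len(values))
    x_mean = mean(xs)
    y_mean = mean(values)
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
    variance = sum((x - x_mean) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Predictive analytics: what will happen (if the trend continues)?
slope, intercept = fit_line(monthly_visits)
next_month_forecast = intercept + slope * len(monthly_visits)

print(f"Average visits per month: {average_visits}")
print(f"Trend: {slope:+.1f} visits per month")
print(f"Forecast for next month: {next_month_forecast:.0f}")
```

Diagnostic and prescriptive analytics are not shown; they would require explanatory variables and a decision model, respectively.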

People

The people component of analytics excellence is in some respects the most challenging. Unless you are associated with a teaching hospital or university or live in a community with access to highly skilled analytical workers, it can be extremely challenging to both recruit and retain skilled workers. In order to overcome these challenges most organizations will have to both recruit from the outside, and develop programs to train existing employees on the desired skills and capabilities. This applies not only to the employees directly under the control of the analytics leader in the analytics department, but also to the analysts and citizen data scientists who reside in the other departments. The analytics leader is responsible for the training and development of all data analysts. These development programs may take the form of internal classes, access to web-based learning, and formal educational classes taken through colleges and universities. The key here is to support the employees in their journey and to provide them the appropriate funding and time off to pursue these skill developments. The field of analytics is changing so rapidly that this will be a journey with no finish line, and most employees will need continual development no matter what their level of expertise.

There is a need in some organizations for analysts to move from being simple report writers, who only provide the data and information that they were asked to provide, to being true analysts who provide insight that helps the end user answer the business question or need. This requires the analyst to have an understanding of the “why” of the report request as well as some understanding of the domain within which the end user operates. The analyst then needs to apply whatever analytical techniques they feel are necessary to gather insight from the data so that it directly answers the business user’s needs. This requires an understanding of the analytical techniques appropriate to that analyst’s level of expertise. This may vary from providing basic descriptive statistics all the way up to running predictive algorithms and generating forecasts. The analyst should feel free to suggest appropriate methods of analysis to the business user, and take every opportunity to educate the business users on how to be more data and insight driven.

Processes

One of the processes that needs to be developed is the end-to-end analytics report process. This starts off with the report request. Ideally this should be a standardized web-based report request that is used both centrally in the analytics department as well as by the distributed analysts. Once the initial report request is filed, the analyst performing the analysis should meet with the end user to discuss their request, to make sure that they understand the need, and to suggest the various analytical methods that could be used to gain the insight. Once the analysis has been completed, the analyst needs to meet with the end user to review the analysis and the insights that were obtained. Part of the process will be to decide how to deliver the data and insights. Some end users will be happy with an Excel spreadsheet and some simple graphics, while others will need dashboards developed and integration of the insights into workflows and processes.

Another process that needs to be developed is deciding who should analyze the data. Given the increasing need to analyze the growing amount of data that is being collected and connected, it is impractical to rely on a centralized analytics department to deliver all of the insights that the organization needs to become more data and insight driven. Therefore, some end users who may not even be analysts should be able to perform some data analyses themselves, i.e., self-service business intelligence (SSBI). In other cases it will require the expertise of highly trained data scientists and statisticians to perform these analyses. Obtaining a modern BI (Business Intelligence) and analytics platform is key to acquiring the spectrum of analytics capabilities needed to perform these diverse analyses.

Technologies

There are multiple technologies that will be needed to perform these analytics. At the present time there is no single technology platform that can perform all of the analytical tasks required by a digital health care organization. Therefore a suite of applications will be required. As described in the article by Tapadinhas (2016) (12), Gartner uses the term “modern BI and analytics platform” for a 3-tiered platform of complementary but interrelated analytic capabilities. This allows employees with different levels of analytics expertise to use the platform to obtain the insights they need. The first level is the information portal. This level is designed to be used by end users and report writers to generate reports and dashboards. The second level is the analytics workbench. This is used by information analysts to do more sophisticated data discovery. The third level is the data science laboratory. This is designed to be used by data scientists and statisticians to perform more advanced analytics including machine learning and other artificial intelligence techniques. This modern BI and analytics platform should be able to take in data from disparate sources and should be able to create sophisticated dashboards with the data and insights.

Almost all HCOs have an Electronic Medical Record (EMR) now. Most of the large mega-vendors have robust data storage capabilities within their EMR, and have the capability to put structured data into an enterprise data warehouse (EDW). Some are developing the capability to store data in a “data lake”, both internal data from their EMR, and external data from other data sources as well. However, they do not all have advanced analytics capabilities built into their platforms. So there is always a question of how much of the underlying EMR vendor’s analytical capabilities an organization should use, and when it should look to an outside vendor to fill in the gaps in capabilities. The ability to innovate clearly lies with non-EMR vendors, who only have to deal with the analytical capabilities that they are trying to develop. At some point the mega-vendors usually catch up to the innovators; however, this may take an unacceptably long time. Therefore, most organizations will require a mix of EMR vendor and non-EMR vendor analytical technologies to accomplish all that a healthcare organization needs to accomplish.

Data visualization is another area where both the people skills must be developed and the most effective technology acquired. It is no longer acceptable for reports to be delivered in an Excel spreadsheet without any graphics. Some end users may be comfortable with a spreadsheet, but visualizing data in a graphical format is clearly a superior method of communicating insights. Therefore organizations need to develop the people skills to effectively communicate data graphically, and acquire the technology platforms that make visualizing the data insights easy.

 

Leadership Excellence

This is THE realm where the data and analytics leader needs to have deep subject matter expertise as this is the area where they deliver real value to the organization.

This leadership applies both to the data and analytics department and to the organization as a whole. At the department level, the data and analytics leader is responsible for the development of the culture of the analytics department. They are also responsible for the development of the analysts’ skill sets. They are responsible for resource allocation – both with existing personnel and financial asset allocation – and for creating a road map for future personnel and technological resource investments.

The data and analytics leader should be responsible for developing the skill sets of the deployed analysts who reside in the business units.   Talent development should apply across the enterprise, and not rely on all of the individual silos to train their analysts to the standardized level that is needed.

The data and analytics leader should have the responsibility for making the organization more data and insight driven.  This is a HUGE responsibility as it is extremely difficult for an organization to make this transition.  The most logical person to be assigned this responsibility is the data and analytics leader.  However it obviously requires the support of the executive leadership of the organization from the Board of directors to the CEO to the Senior Leadership Team.  This requires a change in the entire culture of the organization.   The culture needs to change so that everyone asks the same questions – what does the data tell us, and how can these insights be used to improve things.

In order to become more data and insight driven, the data and analytics “literacy” of the organization must be improved. As defined by Gartner (13), data literacy is “the ability to read, write and communicate data in context, including an understanding of data sources and constructs, analytical methods and techniques applied, and the ability to describe the use case, the application and resulting value.” Being data literate is a new “organizational readiness factor”. Improving this data literacy will require another huge culture change. The data and analytics leader should be responsible for this initiative. The “base vocabulary” of using “information as a second language” can be understood using the VIA (Value, Information, Analytics) model. For any analytical need the Value of that analysis needs to be understood. This is the question, business problem, or process outcome that needs to be understood and improved. How is the value going to be realized? The Information component involves understanding what data is available to be analyzed, and what additional data is needed. The Analytics component involves understanding what analytical or data science methods are appropriate to be applied to the data. While it may be difficult for the end user to understand the analytics component of this VIA model, they should understand that it is important to look at all three of these components when considering an analytics project.

The data and analytics leader is responsible for developing the data and analytics vision and strategies, not only for the data/analytics department, but for the entire organization.   The vision and strategies need to be laser focused on understanding and improving specific strategic business priorities.  The vision and strategies of the data/analytics departments should reflect the vision and strategies of the organization.  For HCOs these priorities most often are associated with the factors laid out in the “Quadruple Aim” – improving the patient experience (quality of care delivered and satisfaction with their experience), improving the health of populations, reducing the cost of healthcare, and improving the experience of health care providers and workers.

The last area that data and analytics leaders need to have deep competence in is change management and change leadership. Having effective change management and change leadership skills is a foundational requirement of any data and analytics leader. Much of what the leader has to do involves change of some sort on both a micro-scale and a macro-scale.

On a macro-scale, transforming the organization to a data and insight driven organization will require a tremendous amount of change leadership. So will improving the data literacy of the organization. So will getting the organization to understand the importance of investing in its data and analytics people and developing the needed skill sets to move forward.

Organizational Change Management (OCM) “describes any set of best practices for introducing and guiding specific, defined changes in the current environment” (14), and practicing OCM means applying these best practices. Gartner has identified seven OCM best practices:

  • Define the change
  • Build executive support
  • Communicate to the organization
  • Develop the plan
  • Execute the plan
  • Persist through the challenges
  • Reinforce adoption

Gartner has developed a robust change leadership model called the ESCAPE model (14).  This model can be used to develop a resilient organization that is ready and able to respond to needed changes.  The ESCAPE model approaches this from a person-centered perspective.  This person-centered approach helps individuals understand the impact of changes on themselves and others, see their role in the future vision, embrace change opportunities rather than view changes as a threat, and feel a sense of ownership and responsibility for the change success.  Change leadership “requires leaders who invite participation, build trust, foster commitment and strengthen teamwork.” (14)  The goal in a change culture is to “integrate this thinking and these practices into the culture through daily activities, language and interactions”, so that change is seen as something normal and acceptable, rather than something that is negative and “done to them”.

There are two phases to the ESCAPE model. The first phase is the Inspire phase and has three components. The components are Envision, Share and Compose. Individuals need to be Inspired to change by helping them Envision the change and imagine what life will be like after the change. The new vision needs to be succinctly Composed and then communicated. The vision needs to be Shared repeatedly and often, so that it is embedded in the organization.

The second phase is the Engage phase, and it has three components.   The components are Attract, Permit and Enable.  Early adopters need to be Attracted (recruited).  A growth mindset needs to be Permitted, and opportunities for experimentation and new behaviors Permitted.  Building, supporting and training for new structures, processes and thinking must be Enabled.

In order to be successful, leaders must get out into the organization and be the go-to person constantly advocating for and then delivering the data and analytics to answer the wicked business questions and issues that face healthcare today.

Conclusion

In order to become “Leaders of Excellence” in their organizations, data and analytics leaders must become masters of data management excellence, analytics excellence and leadership excellence.

 

References

  1. Wright-Jones B.  (2015, December 10). Analytics Centre of Excellence – What’s the Value?  Retrieved from https://blogs.msdn.microsoft.com/data_insights_global_practice/2015/12/10/analytics-centre-of-excellence-whats-the-value/
  2. Friedman T, Tapadinhas J, Judah S, Heudecker N, Herschel G, White A.  (2018, August 1).  Leadership Vision for 2019: Data and Analytics Leader.  Gartner research article.
  3. Heudecker N, Edjlali R.  (2018, January 8). Data Management Strategies Primer for 2018.  Gartner research article.
  4. Edjlali R, Friedman T.  (2017, October 23).  Modern Data Management Requires a Balance Between Collecting Data and Connecting to Data.  Gartner research article.
  5. White A, O’Kane B.  (2017, November 16).  Mastering Master Data Management.  Gartner research article.
  6. Rouse M.  (2012, April).  Master Data.  Retrieved from https://searchdatamanagement.techtarget.com/definition/master-data
  7. Rouse M.  (2018, March).  Master Data Management (MDM).  Retrieved from https://searchdatamanagement.techtarget.com/definition/master-data-management
  8. Rouse M.  (2005, September).  Data Dictionary.  Retrieved from https://searchmicroservices.techtarget.com/definition/data-dictionary
  9. Metadata.  Retrieved from http://www.businessdictionary.com/definition/metadata.html
  10. De Simoni G, Edjlali R.  (2018, January 11).  Develop Valuable Metadata to Exploit Digital Business.  Gartner research article.
  11. Zaidi E, De Simoni G, Edjlali R, Duncan AD.  (2017, December 13).  Data Catalogs are the New Black in Data Management and Analytics.  Gartner research article.
  12. Tapadinhas J.  (2016, April 12).  Select the Right Architecture Model for Your Modern BI and Analytics Platform.  Gartner research article.
  13. Duncan AD.  (2018, January 22).  Data-Centric Facilitators are Crucial for Enabling Data Literacy in Digital Business.  Gartner research article.
  14. Adnams S. (2017, October 12).  CIOs Need Organizational Change Management and Change Leadership for Digital Business.  Gartner research article.
Digital Healthcare Organizations, Healthcare Analytics, Learning Healthcare System, Quadruple Aim

Collect, Connect, Analyze and Apply – Four Data and Analytics Competencies that All Digital Healthcare Organizations Must Master

There is a common axiom that today “every business is a digital business”.  There are several alternatives to this as well – “every business is a technology business” and “every business will become a software business” (Microsoft’s CEO Satya Nadella’s comment at the 2015 US Convergence conference).  Healthcare organizations are no exception – they are all digital, technology, and software businesses now.  And at the heart of all digital businesses is data and analytics.  This concept is well illustrated by the Gartner graphic describing their concept of a “digital business platform”.   They are not referring specifically to technology platforms (they do also have great graphics on this), but instead are referring to the components that digital organizations must master and connect to survive and thrive today.   Their graphic shows a circle in the middle, connected to 4 circles that surround it.   On the outside are the components of IT Systems, Things (devices, the internet of things), Customers, and Ecosystems.  In the center circle, connected to all other circles, is Intelligence.   Hence, my statement earlier that data and analytics are at the center of all digital businesses.  Today all businesses, and especially all healthcare organizations, must realize this, and must make improving their data and analytics capabilities a top strategic priority.

What is a “digital business” though?  I have not found a definition that resonates with me.  At the simplest level, I see a digital business as a business that uses digital technology (as opposed to a business that only uses analog technology – I am sure some of these still exist, but they are getting rarer).  As I have come to see things from a data and analytics perspective, all digital healthcare organizations (meaning ALL healthcare organizations) must master the following data and analytics competencies – “collect, connect, analyze and apply” – to be successful.  I argue that these competencies get harder to obtain and operationalize as you go down the list.  The first two competencies – collecting and connecting – are primarily technical in nature.  Analyzing is a combination of people and technology skills.  Applying the insights obtained from the analysis is vastly more difficult.  This goes to the core of being a data-driven organization, and involves people who are data literate, and who have processes in place (from individuals, teams, units, departments, to the whole organization) to apply the insights to make data-driven or insight-informed decisions.

Collect

Since almost all digital technology creates data that could be captured, important questions emerge.  What data should we collect?  Should we collect all data that is produced, even if we don’t have a current business case for using the data, but one might arise in the future?  If we collect only a subset of the data, what data should we collect?  Who makes those decisions?  Where should we store the data – in the source system, in an enterprise data warehouse (EDW), in a data lake (pick your favorite term for data lake if you don’t like that term), in a logical data warehouse, etc.?  It is clear that some data needs to be collected, but probably not every piece of data from every digital asset.

Connect

It is extremely important that these data sources and data storage systems all be connected.   The business will need to obtain insights from this data, and the data needs to be accessible and brought together for analysis to get this insight.  It is no longer acceptable or practical to only have a “single source of truth” as EDWs are sometimes called.  There are too many source systems now in an organization, and it is too time consuming, difficult and expensive to validate each source system’s data and put that into an EDW.   Organizations should focus on connecting the source systems and storage systems, so that appropriate data from these multiple disparate sources and systems can then be analyzed.

Analyze

You can’t analyze everything – there is not enough time, money, or people to do this.  You should focus on analyzing those things that are strategically important for your business.  Governance is key to helping the data and analytics organization prioritize its efforts.  In healthcare, these things often involve components of the “Quadruple Aim”.  The Quadruple Aim describes those factors that healthcare organizations should focus on to optimize their healthcare system and its performance.  These include:

  • Improving the patient experience (quality of care delivered and satisfaction with their experience)
  • Improving the health of populations
  • Reducing the cost of health care
  • Improving the experience of health care providers and workers

In addition, understanding and optimizing the financial performance of the healthcare organization itself is of strategic importance.

The people aspects of analytics competency involve recruiting, training and retaining the skilled workforce to perform the analytics.  There is a spectrum of skill sets needed – from end-users and analysts who perform basic descriptive and diagnostic analytics, all the way up to data scientists.   Most organizations struggle with finding these talented people, so they must also develop internal training programs to acquire these skills.

The types of analytics that can be performed are:

  • descriptive analytics (describing what happened – a provider scored 98% on making sure the patients in his panel achieved compliance with being screened for colorectal cancer)
  • diagnostic analytics (describing why something happened – a provider only scored 13% on making sure the patients in his panel achieved compliance with keeping their hemoglobin A1C level below a certain value – because he only managed to see 43% of his diabetic patients in that time period, and he had no formal education around this topic, and had no follow-up by his staff to see how his patients were complying with his recommendations)
  • predictive analytics – (predicting what could happen – which patients are likely to progress from being low risk for requiring a lot of money and resources to manage their medical conditions, to high-risk patients)
  • prescriptive analytics – (generating prescriptive information that can be used to guide care most effectively – recommending what treatments should be applied to a 67 year-old female patient with a history of breast cancer, diabetes and rheumatoid arthritis; who is on insulin, steroids and other immunosuppressive medications; who presents with an overwhelming systemic response to pneumonia called sepsis)
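
To make the difference between the first two types concrete, here is a minimal sketch in plain Python.  The patient records, the field names (seen, a1c_controlled) and both helper functions are made up for illustration and do not come from any real system.  The descriptive step reports a single compliance rate; the diagnostic step breaks that same rate down by a factor that might explain it (whether the patient was seen in the period).

```python
from collections import defaultdict

# Hypothetical, made-up records: one row per diabetic patient in a provider's panel.
# "seen" = had a visit in the period; "a1c_controlled" = A1C below the target value.
patients = [
    {"seen": True, "a1c_controlled": True},
    {"seen": True, "a1c_controlled": True},
    {"seen": True, "a1c_controlled": True},
    {"seen": True, "a1c_controlled": False},
    {"seen": False, "a1c_controlled": True},
    {"seen": False, "a1c_controlled": False},
    {"seen": False, "a1c_controlled": False},
    {"seen": False, "a1c_controlled": False},
]


def compliance_rate(records):
    """Descriptive: the share of patients who meet the measure."""
    if not records:
        return 0.0
    return sum(r["a1c_controlled"] for r in records) / len(records)


def rate_by(records, key):
    """Diagnostic: the same rate, broken down by a candidate explanatory factor."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    return {value: compliance_rate(group) for value, group in groups.items()}


print("Overall compliance:", compliance_rate(patients))
print("Compliance by whether the patient was seen:", rate_by(patients, "seen"))
```

A breakdown like this does not prove why a score is low, but it points the analyst toward the questions worth asking the end-user.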

All healthcare organizations need to be able to perform descriptive and diagnostic analytics extremely well.  In addition, the analysts need to move from being “report writers” to true analysts – analyzing the data to obtain actionable insights and then educating the end-user on those insights.  In order to do this, analysts must be trained to do these analyses, then must acquire some domain knowledge of the areas they are responsible for, and they must establish good working relationships with those end-users.

Although not all organizations will acquire the talent and capabilities to perform predictive analytics, they must be capable of incorporating predictive algorithms developed by someone else into their systems.  The ability to perform prescriptive analytics is still in development for the most part.  This capability must also be able to be incorporated into an organization as it matures.

The technology side of analytics involves finding the suite of analytical and visualization tools needed to get the insights to answer the questions that arise, and communicate those insights.   Say what you will about Microsoft Excel – it is still a go-to tool for most organizations, and for some organizations it is their only tool.  You can perform some pretty sophisticated analyses with Excel now, including predictive modeling.  Gartner has a great concept around a “Modern BI and Analytics Platform”.  This consists of an Information Portal, an Analytics Workbench, and a Data Science Laboratory.  The Information Portal consists of reports and dashboards that can be created by end-users and report writers.  This is essentially the self-service BI component.  The Analytics Workbench is for the analysts to use, and involves more comprehensive analytics tools.  The Data Science Laboratory is for the Data Scientists to use – and consists of not only the basic analytical tools, but also advanced analytical tools, including the use of machine learning, deep learning, reinforcement learning, etc.  There is a need for ALL healthcare organizations to acquire the information portal and analytics workbench capabilities.  The data science lab is becoming extremely important and relevant, and this could be acquired within the organization, or a partnership could be developed with an analytics vendor to perform this service.

Just as important to getting the insights from analysis is communicating and visualizing the data and insights in the most effective ways.  Only presenting insights in the form of spreadsheets or pie charts is simply not acceptable today.  We are visual learners, and you can convey a lot more information, in a shorter period of time with effective visualization.  So it is key to acquire the technology to do this well, and then train the staff to not only be able to do this,  but also understand the critical importance of this component.

Apply

This is the most difficult competency, and it usually lies outside of the data/analytics organization and within the service lines.  However, I propose that data and analytics leaders OWN making sure that the organization understands this concept, and help the organization excel at it.  This is the competency of applying the insights learned from the analyses of data, to make better informed decisions on these key strategic initiatives.   Becoming a data-driven organization is HARD, much harder than the other stages.   This involves creating a workforce that is more data literate, and a culture that is obsessive about using data and analytics to answer every important question and drive every important process.

Data and analytics are also foundational to the  “Learning Healthcare System” concept as proposed by the Institute of Medicine (IOM) (there is a great overview of this concept here).  The IOM states that a learning healthcare system is “designed to generate and apply the best evidence for the collaborative healthcare choices of each patient and provider; to drive the process of discovery as a natural outgrowth of patient care; and to ensure innovation, quality, safety, and value in health care” (from article linked above).  At the core of the learning healthcare system is the ability to capture data, analyze the data to obtain insights, and then be able to quickly apply those insights to improve patient care.  It is impossible for an organization to “learn” if they don’t have robust data and analytics capabilities, and robust change management and change leadership capabilities.

The culture must be created where it is an expectation that people use data and insights to make better decisions – from the Board of Directors, to the CEO and Senior Leadership, all the way down to front-line workers.  This must be driven into all levels of the organization and into all processes.   This is the heart of performance or quality improvement, and operational excellence.  But it must be broader than that, and involve most, if not all, employees.   (Some of you may not agree with the last sentence, but I propose that housekeepers need to be data-driven – which rooms/areas need to be cleaned next because there is a shortage of a certain unit’s beds and patients are waiting to be transferred into these beds, etc.)

We are all collecting enormous amounts of data about our patients and our organizations – it is now time for healthcare organizations to do something with that data to further the Quadruple Aim, and all other important strategic initiatives.   Hopefully, understanding the framework of data and analytics competencies that are needed – collect, connect, analyze and apply – will help healthcare organizations do this.

Artificial Intelligence, Cerner, Electronic Health Record (EHR), Healthcare Analytics, Healthcare Technology, Machine Learning

Cerner’s Strategy to Deploy “Intelligence” into the Cerner Ecosystem – Insights from the 2018 Cerner Strategic Client Summit

I feel privileged to have been invited for the second year in a row to Cerner’s Strategic Client Summit.  The meeting location – downtown San Diego – and the Summit content were both fantastic.  I will attempt to summarize a few key concepts, and why I feel optimistic about where Cerner is heading – both from an overall perspective as well as from an improved product and end-user experience perspective.

I was very impressed with the collaboration between Cerner President Zane Burke and Cerner’s new Chairman and CEO Brent Shafer.   I had the opportunity to have several conversations with both Zane, whom I have known for a while now, and Brent whom I just had an opportunity to meet for the first time.  I found Brent to be very personable, and very thoughtful about his vision for Cerner.  I was also impressed with the collaborative relationship between Zane and Brent, and think they will lead Cerner in the correct direction.

I am not trying to downplay the roles that many people play inside of Cerner, because there are a lot of great things going on, but I think there are two strategic people that Cerner needs to pay attention to in order to move their EHR to the next level.

The first is Paul Weaver, Cerner’s Vice President of User Experience.  I first saw and met Paul at the 2017 Strategic Client Summit in Dallas Texas, and was very impressed by his vision and enthusiasm.  He hails from the gaming industry, and brings his expertise to the world of healthcare software, where it is much needed.  Here is a link to a 13 minute podcast where Paul talks about the importance of the user experience.  At the 2017 Summit, he used the words “user delight” as his goal of how the interaction with the EHR would make end users feel.  I don’t know about you, but I have used many words to describe my feelings of how the EHR made me feel, and none of them were delight!  His message – all interactions with a software program elicit some type of an emotional reaction.  He wants those reactions to be positive, decreasing stress, and making both patients and healthcare providers/workers “happier and healthier”.  This is a laudable goal, and will help in the fight to combat physician (and other healthcare worker) burnout/suicide – since negative experiences with the EHR are almost always identified as one of the top contributing factors to burnout.  This is an EXTREMELY important area for Cerner to get right, and they need to support Paul Weaver in his efforts to accomplish his goals.

The second strategic person is David Cohen, Vice President for Intellectual Property Development.  David’s presentation, “Activating Intelligence to Transform Care”, was visionary, and he articulated the concepts of machine learning and artificial intelligence, and how to utilize them in healthcare better than anyone else I have heard or read to date.  I will provide some high level overview of his vision below.  At this time it appears they have branded these efforts as “Cerner Intelligence – Leveraging the Power of Data”.

Cerner sees the new demands on health care as being proactive health management (vs reactive sick care); cross-continuum care system (vs fragmented niche care); rewards for quality, safety and efficiency (vs rewards for volume); and person and care team-centric (vs clinician-centric).

Value “drivers” were presented.    These were specific areas where Cerner intends to deploy their Intelligence to make meaningful improvements.   These included clinical and quality drivers, operational drivers, financial drivers, and drivers around improving the experience.  I feel these are appropriate areas to start deploying this Intelligence.  I can post more on this when this information becomes publicly available, because there are some important key areas that if realized, will bring great value to organizations.

Where David’s presentation got really interesting was when he started presenting how Cerner’s areas of focus were on using machine learning, artificial intelligence, and knowledge management.   I am going to provide his definitions of each, because I think they are defined very nicely.

  • Machine Learning:  Leveraging the power of data and statistical methods to create new insights and workflow optimizations
  • AI experiences:  Leverage Artificial Intelligence capabilities that mimic human behaviors such as voice, vision, language, and conversation to enhance human abilities
  • Knowledge management:  Ensure data is complete, contextual, and accurately represented using standards based medical vocabularies

David then started talking about “AI Experiences” (see diagram from Cerner below – reprinted with permission).   I am convinced that Cerner gets where they should be going with regard to incorporating AI into healthcare, on a very practical basis.  This starts with the inputs into the AI systems, the transformations of those inputs by the system, the incorporation into the knowledge management systems, and most importantly, the AI applications that will make the EHR a true virtual partner in the healthcare process – for providers, patients, and healthcare workers.  What was shown was more than a concept, and the demos they put on showed that they are making progress on these. The concept of a mouse-less and keyboard-less interaction with the EHR may be a reality, sooner rather than later.  I encouraged Cerner executives to support these initiatives deeply and at the highest levels.

Cerner AI

 

Overall, I am very excited and optimistic about Cerner’s vision, and for the prospect of them delivering meaningful improvements and solutions – both near-term and long-term.  Their focus on improving the user experience and making it “delightful” is a very important initiative.   Their focus on using data to improve – almost everything – is foundational for moving all of us forward.  My plea to Cerner is to continue to very deeply support these initiatives, and the talented people they have focused on these.

Data Science, Deep Learning, Machine Learning, Neural Networks

Neural Networks, Deep Learning, Machine Learning resources

I have come across a few great resources that I wanted to share.  For students taking a machine learning class (like Northwestern University’s MSDS 422 Practical Machine Learning) these are great references, and a way to learn about them before, during, or after the class.  This is not a comprehensive list, just a starter.

Textbook

There is a free online textbook, Neural Networks and Deep Learning.

Videos

There is a great math visualization site called 3Blue1Brown and they have a YouTube channel.  There are 4 videos on neural networks/deep learning which are really informative and a good introduction.

  1.  But what *is* a Neural Network? Chapter 1, deep learning
  2.  Gradient Descent, how neural networks learn. Chapter 2, deep learning
  3.  What is backpropagation really doing? Chapter 3, deep learning
  4.  Backpropagation calculus. Appendix to deep learning chapter 3.

There is a great playlist on Essence of linear algebra, which is a great review and explanation of linear algebra and matrix operations.  I wish I had seen this when I was learning it.

Scikit-Learn Tutorials

There are tutorials on the Scikit-Learn site.

TensorFlow tutorials

The TensorFlow site provides a link to Google’s “Machine Learning Crash Course”, a fast-paced, practical introduction to machine learning.

The TensorFlow site has a Tutorials page.  There are tutorials for Images, Sequences, Data Representation, and a few other things.

 

Google AI

Google has its own education site (which also has the Machine Learning Crash Course referenced above).

 

Blog sites

Adventures in Machine Learning, Andy Thomas’s blog.

This is a must view site, and worth visiting several times over.   Andy does a great job explaining the topics and has some great visuals as well.  These are fantastic tutorials.  I have listed only a few below.

Neural Networks Tutorial – A Pathway to Deep Learning

Python TensorFlow Tutorial – Build a Neural Network

Convolutional Neural Networks Tutorial in TensorFlow

Word2Vec word embedding tutorial in Python and TensorFlow

Recurrent neural networks and LSTM tutorial in Python and TensorFlow

 

colah’s blog – Christopher Olah’s blog

Another great blog, with lots of good postings.  A few are listed below.

Deep Learning, NLP, and Representations

Neural Networks, Types and Functional Programming

 

Courses

DataCamp – one of my favorite learning sites.  It does require a subscription.

DataCamp currently has 9 Python machine learning courses, which are listed below.  They also have 9 R machine learning courses.

Machine Learning with the Experts: School Budgets

Deep Learning in Python

Building Chatbots in Python

Natural Language Processing Fundamentals in Python

Unsupervised Learning in Python

Linear Classifiers in Python

Extreme Gradient Boosting with XGBoost

HR Analytics in Python: Predicting Employee Churn

Supervised Learning with Scikit-Learn

 

Udemy courses

Udemy is also a favorite learning site.  You can generally get the course for about $10.

My favorite Udemy learning series is from Lazy Programmers Inc.  They have a variety of courses.  Their blog site explains what order to take the courses in.   There are many other courses from different instructors as well.

Deep Learning Prerequisites: The Numpy stack in Python

Deep Learning Prerequisites: Linear Regression in Python

Deep Learning Prerequisites: Logistic Regression in Python

Data Science: Deep Learning in Python

Modern Deep Learning in Python

Convolutional Neural Networks in Python

Recurrent Neural Networks in Python

Deep Learning with Natural Language Processing in Python

Advanced AI: Deep Reinforcement Learning in Python

Plus many other courses on Supervised and Unsupervised Learning, Bayesian ML, Ensemble ML, Cluster Analysis, and a few others.

 

If you have other favorite machine learning resources, please let me know.

 

 

Data Scientist, Northwestern University MSDS Program, Northwestern University MSPA

Northwestern University’s Masters of Science in Predictive Analytics (MSPA) becomes the Masters of Science in Data Science (MSDS)

Starting in the Spring Quarter of 2018 the MSPA (Masters of Science in Predictive Analytics) program became the MSDS (Masters of Science in Data Science) program.  This was announced in January of 2018 and the name change became official in the Spring Quarter of 2018.  Existing MSPA students had the option of staying in the MSPA program with its requirements, or transferring over to the MSDS program.  I elected to transfer to the MSDS program.  There is a webex on the MSDS program – click here for the webex.

In the webinar, Dr. Thomas Miller, the faculty director of the MSPA and now the MSDS programs, related that Northwestern University’s MSPA program started in the fall of 2011, before data science was a widely known or used term.  However, since then it has become mainstream, and has emerged as a discipline in its own right.   This led to the decision to change the name of the program.

Data science was described by Dr. Miller as “an emerging, integrative academic discipline” encompassing Business needs (strategy, management, leadership, communication skills), Modeling (statistics, machine learning, and model building), and Information Technology (databases, etc).  Each of these is covered in the MSDS program.

Dr. Miller also commented that the main programming language moving forward would be Python.   Initially when the program was formed, SAS and SPSS were the main languages.  Python and R were brought in at a later date.   R will still be used in some of the Analytics and Modeling Specialization courses.   He did not make it clear whether SAS would still be an option though.

MSDS Program Overview

You need to successfully complete 12 courses.  There are core courses, elective courses, and specialization options.

Core Courses

MSDS 400 – Math for Data Scientists

MSDS 401 – Statistical Analysis

MSDS 402 – Introduction to Data Science

MSDS 420 – Database Systems and Data Preparation

MSDS 422 – Practical Machine Learning

MSDS 460 – Decision Analytics

MSDS 475 – Project Management or MSDS 480 Business Leadership and Communications

MSDS 498 – Capstone or MSDS 590 – Thesis

 

A new elective was created for students with limited programming background:

MSDS 430 – Python for Data Science

Specializations

 

Analytics and Modeling Specialization

Designed for data scientists seeking technical roles as data analysts, applied statisticians, and modelers. Courses focus on statistical inference and applications of predictive models.

Required Courses:

MSDS 410 – Regression and Multivariate Analysis

MSDS 411 – Generalized Linear Models

Plus 2 electives

 

Data Engineering Specialization

Designed for students seeking technical positions focused on designing, developing, implementing, and maintaining systems for data science.

Required Courses:

MSDS 432 – Foundations of Data Engineering

MSDS 434 – Analytics Application Development

Plus 2 electives

 

Analytics Management Specialization

Designed for students seeking technical leadership and data science management positions.

Required Courses:

MSDS 474 – Accounting and Finance for Analytics Managers

MSDS 475 – Project Management

MSDS 480 – Business Leadership and Communications

(Students in this specialization have to take both 475 and 480)

Plus 2 electives

 

*Artificial Intelligence and Deep Learning Specialization

*This has not been officially announced – this information is from comments that Dr. Thomas Miller made during an MSDS 422 Sync session.  He said that this specialization is being developed – so take these comments as being preliminary.  I personally am really excited about this specialization, as I just finished MSDS 422 – Practical Machine Learning – and realize the growing importance of machine learning now and in the future.

Required Courses:

MSDS 453 – changing from Text Analytics to Natural Language Processing

MSDS 458 – Artificial Intelligence and Deep Learning

Plus 2 electives

These new electives are being created:

Computer Vision

Software Robotics

 

Listing of all current elective courses:

MSDS 410 – Regression Analysis

MSDS 411 – Generalized Linear Models

MSDS 413 – Time Series Analysis and Forecasting

MSDS 430 – Python for Data Science

MSDS 432 – Foundations of Data Engineering

MSDS 434 – Analytics Application Development

MSDS 436 – Analytics Systems Analysis

MSDS 450 – Marketing Analysis

MSDS 451 – Financial and Risk Analytics

MSDS 452 – Web and Network Data Science

MSDS 453 – Text Analytics – soon to become Natural Language Processing

MSDS 454 – Data Visualization

MSDS 456 – Sports Performance Analytics

MSDS 457 – Sports Management Analytics

MSDS 458 – Artificial Intelligence and Deep Learning

MSDS 459 – Information Retrieval and Real-Time Analytics

MSDS 470 – Analytics Entrepreneurship

MSDS 472 – Analytics Consulting

MSDS 474 – Accounting and Finance for Analytics Managers

MSDS 490 – Special Topics in Data Science

 

 

 

 

Machine Learning, Northwestern University MSDS Program, Northwestern University MSPA

Northwestern University MSDS (formerly MSPA) 422 – Practical Machine Learning Course Review

This course was taught by Dr. Thomas Miller, who is the faculty director of the Data Science program (formerly known as the Predictive Analytics program – I am going to post an article discussing the program name change from the Master of Science in Predictive Analytics (MSPA) to the Master of Science in Data Science (MSDS)).  Overall, this was an excellent review of machine learning, and is a required core course for all students in the program.  It is most definitely a foundational course for any student of data science in today’s world.  It is also a foundational course for the Artificial Intelligence and Deep Learning specialization, which is currently being developed (more on this in a subsequent post as well).  The course covers the following topics:

  • Supervised, Unsupervised, and Semi-supervised learning
  • Regression versus Classification
  • Decision Trees and Random Forests
  • Dimensionality Reduction techniques
  • Clustering Techniques
  • Feature Engineering
  • Artificial Neural Networks
  • Deep Neural Networks
  • Convolutional Neural Networks (CNN)
  • Recurrent Neural Networks (RNN)

This course uses Python and the Python Libraries Scikit-Learn and TensorFlow. In addition to using Jupyter Notebooks to run my code, I also learned how to run TensorFlow from the Command Line, which is a faster way of running neural networks through a large number of epochs. The course is currently offered in R as well, but they will be discontinuing the R course, and only offering the Python/TensorFlow course starting in the fall semester.   Dr. Miller commented that they will be using Python much more extensively going forward, especially in the AI/Deep Learning specialization courses.  R apparently will still be offered in the Analytics/Modeling courses – 410 (Regression Analysis) and 411 (Generalized Linear Models).   I did learn to use Python/Scikit-Learn/TensorFlow at an intermediate level, and feel like I have a great foundation to build upon, in terms of programming.
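
For readers who have only worked in notebooks, here is a minimal sketch of what a command-line training script can look like.  It is not the course code and it does not use TensorFlow: a plain-Python gradient descent on a tiny made-up dataset stands in for the neural network so the example stays self-contained.  The file name train.py, the --epochs flag and the train and main functions are all my own assumptions for illustration.

```python
import argparse


def train(epochs, learning_rate=0.05):
    """Fit y = w*x + b by gradient descent on a tiny made-up dataset."""
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    loss = sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n
    return w, b, loss


def main(argv=None):
    # The number of epochs comes from the command line instead of a notebook cell.
    parser = argparse.ArgumentParser(description="Toy training loop (illustrative only)")
    parser.add_argument("--epochs", type=int, default=1000)
    args = parser.parse_args(argv)
    w, b, loss = train(args.epochs)
    print(f"epochs={args.epochs} w={w:.3f} b={b:.3f} loss={loss:.6f}")
    return w, b, loss


if __name__ == "__main__":
    main()
```

Saved as train.py, this could be run with something like python train.py --epochs 5000.  A real TensorFlow script would replace the body of train with model building and fitting, but the pattern of passing the epoch count as an argument stays the same.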

Course Structure

There was required reading every week, mainly from the two required textbooks, although there were a few articles to read as well.  There were a total of 5 sync sessions which reviewed various topics.  I wish the sync sessions had been a little more robust and had covered the current assignments and the coding required to complete them; I found that kind of walkthrough very helpful in previous courses.  There were weekly discussion board assignments, which covered basic concepts and turned out to be very informative, especially since a lot of the topics on the final exam were covered in these discussions.  There were also weekly assignments, in which you either developed the code yourself or built upon a skeletal code base that was provided.  These ranged from very easy to very difficult, especially as the course moved into artificial neural networks.  There was a non-proctored final exam and a proctored final exam.

Textbooks

Primary Textbooks:

Géron, A. 2017. Hands-On Machine Learning with Scikit-Learn & TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. Sebastopol, Calif.: O’Reilly. [ISBN-13 978-1-491-96229-9] Source code available at https://github.com/ageron/handson-ml  This was the primary textbook for most of the course.  It is an excellent text with lots of great coding examples.

Müller, A. C. and Guido, S. 2017. Introduction to Machine Learning with Python: A Guide for Data Scientists. Sebastopol, Calif.: O’Reilly. [ISBN-13: 978-1449369415] Code examples at https://github.com/amueller/introduction_to_ml_with_python

Reference Textbook:

Izenman, A. J. 2008. Modern Multivariate Statistical Techniques: Regression, Classification, and Manifold Learning. New York: Springer. [ISBN-13: 978-0-387-78188-4] This was used very little.

Learning Outcomes (from syllabus):

Practical Machine Learning is a survey course with a long list of learning outcomes:

  • Explain the learning algorithm trade-offs, balancing performance within training data and robustness on unobserved test data
  • Distinguish between supervised and unsupervised learning methods
  • Distinguish between regression and classification problems
  • Explain bootstrap and cross-validation procedures
  • Explore and visualize data and perform basic statistical analysis
  • List alternative methods for evaluating classifiers
  • List alternative methods for evaluating regression
  • Demonstrate the application of traditional statistical methods for classification and regression
  • Demonstrate the application of trees and random forests for classification and regression
  • Demonstrate principal components for dimension reduction
  • Demonstrate principal components regression
  • Describe hierarchical and non-hierarchical clustering techniques
  • Describe how semi-supervised learning may be utilized in addressing classification and regression problems
  • Explain how measurement and feature engineering are relevant to modeling
  • Describe how artificial neural networks are constructed from logical connections of artificial neurons and activation functions
  • Demonstrate the use of artificial neural networks (including deep neural networks) in classification and regression
  • Describe how convolutional neural networks are constructed
  • Describe how recurrent neural networks are constructed
  • Distinguish between autoencoders and other forms of unsupervised learning
  • Describe applications of autoencoders
  • Explain how the results of machine learning can be useful to business managers
  • Transform data and research results into actionable insights

 

Weekly Assignments

Here are the weekly learning titles and assignments:

Week 1.  Introduction to Machine Learning

  • Assignment 1. Exploring and Visualizing Data

Week 2.  Supervised Learning for Classification

  • Assignment 2. Evaluating Classification Models

Week 3.  Supervised Learning for Regression

  • Assignment 3. Evaluating Regression Models

Week 4. Trees and Random Forests

  • Assignment 4. Random Forests

Week 5.  Unsupervised Learning

  • Assignment 5. Principal Components Analysis

Week 6. Neural Networks

  • Assignment 6. Neural Networks

Week 7.  Deep Learning for Computer Vision

  • Assignment 7. Deep Learning

Week 8.  Deep Learning for Natural Language Processing

  • Assignment 8. Natural Language Processing

Week 9.  Neural Networks Autoencoders

  • No assignment

 

Final Examinations

There were 2 final examinations, one non-proctored and the other proctored.  The non-proctored exam was open book, and tested your ability to look at data and the various analytical techniques and to interpret the results of the analyses.  The proctored final exam was closed book and covered general concepts.

Final Thoughts

This was a great overview of some of the more important topics in machine learning.  I was able to get a good theoretical background in these topics, and learned the coding necessary to apply them.  This is a great foundation upon which to add more advanced and in-depth use of these techniques.  This course really challenged me to rethink what analytical techniques I should be learning and applying in the future, to the point that I am going to change my specialization to Artificial Intelligence and Deep Learning.

 

Becoming a Healthcare Data Scientist, Uncategorized

Update on lack of recent blog posts.

It has been a little more than a year since my last blog post, so I thought I would provide an explanation.  The bottom line is that I have not had a lot of free time to update my blog.  Two years ago this month, I took over as the Interim Chief Information Officer (CIO) for the integrated healthcare system that I work for.  This was in addition to my role as one of our system’s Chief Medical Information Officers (CMIO).  The interim CIO position was supposed to be just that: a brief period performing this role until a permanent CIO could be selected.  However, it turned out to last longer than planned.

I have learned a tremendous amount during my tenure as the interim CIO.  I have a much better appreciation for the roles that both information and technology play in helping healthcare providers and organizations understand and deliver the most effective healthcare to patients.  In order to further educate myself about what a modern digital healthcare CIO’s responsibilities are, I had to take some time off from the MSPA program.  However, I am back in the program (now called the MSDS program, for Master of Science in Data Science; more to come about the name change in a future blog post).  I just completed MSDS-422 Practical Machine Learning, and am totally excited by what I learned in this course (more to come on that as well).  As an aside, the practical application of machine learning (a subset of broader artificial intelligence) will revolutionize healthcare, and is already starting to, through the much deeper insights obtainable through the use of neural networks and deep learning.  Anyone learning analytics today needs to understand and be able to apply machine learning techniques.  Period.

Data Science, Northwestern University MSPA

Python Tops KDNuggets 2017 Data Science Software Poll

The results of KDNuggets’ 18th annual Software Poll should be fascinating reading for anyone involved in data science and analytics.  Some highlights: Python (52.6%) finally overtook R (52.1%), SQL remained at about 35%, and Spark and TensorFlow have increased to above 20%.

[Graph: KDNuggets 2017 Software Poll results]

(Graph taken from http://www.kdnuggets.com/2017/05/poll-analytics-data-science-machine-learning-software-leaders.html/2)

I am about halfway through Northwestern University’s Master of Science in Predictive Analytics (MSPA) program.  I am very thankful that the program has made learning different languages a priority.  I have already learned Python, Jupyter Notebooks, R, SQL, some NoSQL (MongoDB), and SAS.  In my current class in Generalized Linear Models, I have also started to learn Angoss, SAS Enterprise Miner, and Microsoft Azure Machine Learning.  However, it looks like you can’t ever stop learning new things, and I am going to have to learn Spark and TensorFlow, to name a few more.

I highly recommend you read this article.