
REMAP Clinical Trials – Combining the best of Randomized Clinical Trials and Big Data Analytics

I thought I would post some information on a new type of clinical trial that fuses the Randomized Clinical Trial (RCT) with big data analytics. This post grew out of a discussion in my Northwestern University Master of Science in Predictive Analytics statistics class (PREDICT 401), which centered on understanding the importance of “correlation is not causation.” (As an aside, if you want to see some great examples of absurd correlations, go to for some hilarious examples.)

I hope eventually to understand, at a much deeper level, how to get from establishing correlation to declaring causation. This is a huge issue, not just in medicine, but across all disciplines.

The main method used to establish causation is the randomized clinical trial, or RCT. An RCT is usually performed with a pre-existing hypothesis in mind, i.e., we think that A may cause a change in B. The investigators randomly assign who receives A and keep the other conditions of the trial as controlled as possible. Because assignment is random, the other factors that influence B, including ones nobody thought to measure, are balanced between the groups on average. So if the group that received A shows a different B than the group that did not, A is not merely correlated with B; we can make a causal inference that A causes the change in B.
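To make this logic concrete, here is a toy simulation in Python (my own sketch; the variable names, effect sizes, and sample size are all hypothetical). A hidden factor Z drives the outcome B, and A has no effect on B at all. When Z also influences who receives A, as it can in observational data, A and B end up correlated anyway; when a coin flip decides A, the spurious difference disappears.

```python
import random
import statistics


def mean_difference(n, randomized, seed=1):
    """Return mean(B | A=1) - mean(B | A=0) for a toy population of n patients.

    In this hypothetical model, A has NO effect on B: the outcome B depends
    only on a hidden factor Z (think "baseline health") plus noise.
    """
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)  # hidden factor that drives the outcome
        if randomized:
            a = rng.random() < 0.5  # coin flip: assignment ignores Z entirely
        else:
            a = z + rng.gauss(0, 1) > 0  # observational: Z influences who gets A
        b = 2.0 * z + rng.gauss(0, 1)  # outcome depends on Z only, never on A
        (treated if a else untreated).append(b)
    return statistics.fmean(treated) - statistics.fmean(untreated)


observational = mean_difference(10_000, randomized=False)
randomized = mean_difference(10_000, randomized=True)
print(f"observational difference: {observational:.2f}")  # large, despite no effect
print(f"randomized difference:    {randomized:.2f}")      # close to zero
```

Adjusting for Z would also remove the bias in this toy model, but only because we know Z exists. Randomization works even for factors nobody thought to measure, which is why it remains the standard for causal claims.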

There are many problems with RCTs, though. They are very expensive and difficult to run; their findings are too broad (the average treatment effect may not represent the benefit for any given individual); they exclude so many real-life situations that, by the time the final study population is defined, it is no longer of much practical significance for real-life application; and there are long delays before their results make it into clinical practice (Angus, 2015).
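A hypothetical worked example shows how an average can describe nobody (the numbers are invented for illustration). Suppose 60% of trial patients belong to a subgroup in which the treatment improves survival by 10 percentage points, and the other 40% belong to a subgroup in which it worsens survival by 5 points. With \pi_g the share of patients in subgroup g and \tau_g the treatment effect in that subgroup, the average treatment effect (ATE) is:

```latex
\text{ATE} = \mathbb{E}\left[Y(1) - Y(0)\right] = \sum_{g} \pi_g \, \tau_g
           = 0.6 \times (+10) + 0.4 \times (-5) = 6 - 2 = +4
```

The trial would report a 4-point benefit, yet every individual patient in this example either gains 10 points or loses 5.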

Another way to look at data goes by various names, including data mining. Here you start with the data and develop hypotheses to be tested later, after seeing what the data show. You perform an exploratory analysis on a data set using advanced analytical methods and see where correlations arise. Once you see the correlations, you can start to judge whether they are spurious or possibly real, and whether they could be causal. At that point you could design an RCT to study the question and try to establish causation.
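Here is a small illustration of why mined correlations need follow-up testing (a hypothetical Python sketch using pure random noise, not real data). With 40 unrelated variables there are 780 pairs, and a test at the 5% level should flag about 39 of them as “significant” purely by chance.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)
N_OBS, N_VARS = 30, 40
# Every "variable" is independent random noise: no real relationship exists.
data = [[rng.gauss(0, 1) for _ in range(N_OBS)] for _ in range(N_VARS)]

# Approximate |r| needed for two-sided p < 0.05 with 30 observations.
R_CRIT = 0.361

pairs = list(itertools.combinations(range(N_VARS), 2))
flagged = []
for i, j in pairs:
    r = pearson(data[i], data[j])
    if abs(r) > R_CRIT:
        flagged.append((i, j, r))

print(f"{len(flagged)} of {len(pairs)} pairs look 'significant' by chance")
```

Each flagged pair would look like a finding, yet none is real. Corrections such as Bonferroni (here, testing each pair at 0.05 / 780) or, better, a fresh confirmatory study are what separate spurious correlations from ones worth pursuing.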

A new type of RCT is being developed, called a REMAP trial, which stands for Randomized, Embedded, Multi-factorial, Adaptive Platform trial. You won’t find much about it in the literature yet, but I have attached a link to a podcast that describes it, and the citation below is from one of the investigators involved with these studies, Derek Angus, MD, of the University of Pittsburgh.

In essence, the trial combines the best of an RCT with big data analytics, using machine learning techniques to study these complex problems. One such study, REMAP Pneumonia, is starting to enroll patients in Europe, Australia, and New Zealand. It is a perpetually running platform for studying interventions in patients with severe pneumonia who need admission to an Intensive Care Unit. A randomization algorithm assigns each patient to one of 48 different treatment arms. Yes, this study has 48 different questions to answer, rather than one. The randomization weights change over time as the platform “learns” which treatment arms are doing better or worse: arms that are showing improvement get higher weights, so more patients receive them. Once an arm reaches a pre-established threshold for effectiveness, it “graduates,” and that treatment becomes standard therapy.
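As a rough illustration of how randomization weights can “learn,” here is a minimal Python sketch of Bayesian response-adaptive randomization with a binary outcome (for example, survival). It is an assumption-laden toy, not the REMAP algorithm: real platform trials use more elaborate statistical models, and the three arms, the interim counts, the 0.05 weight floor, and the graduation thresholds below are all hypothetical. Each arm’s weight is proportional to the posterior probability that it is the best arm, and an arm “graduates” when that probability crosses a pre-specified threshold.

```python
import random


def prob_best(successes, failures, draws=10_000, seed=0):
    """Monte Carlo estimate of P(each arm has the highest response rate).

    Each arm's response rate gets a Beta(1 + successes, 1 + failures)
    posterior, i.e. a uniform Beta(1, 1) prior updated with binary outcomes.
    """
    rng = random.Random(seed)
    wins = [0] * len(successes)
    for _ in range(draws):
        samples = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]


def allocation_weights(p_best, floor=0.05):
    """Weights proportional to P(best); each probability is raised to at
    least `floor` before renormalizing, so no arm's weight drops to zero."""
    raised = [max(p, floor) for p in p_best]
    total = sum(raised)
    return [p / total for p in raised]


def graduating_arms(p_best, threshold=0.99):
    """Indices of arms whose P(best) meets a pre-specified threshold."""
    return [k for k, p in enumerate(p_best) if p >= threshold]


# Hypothetical interim data for three arms (successes / failures so far).
successes = [30, 45, 20]
failures = [70, 55, 80]

p_best = prob_best(successes, failures)
weights = allocation_weights(p_best)
print("P(best):", [round(p, 3) for p in p_best])
print("next-patient weights:", [round(w, 3) for w in weights])
print("graduating at 0.95:", graduating_arms(p_best, threshold=0.95))

# Draw the next patient's arm using the updated weights.
next_arm = random.Random(1).choices(range(len(weights)), weights=weights)[0]
print("next patient assigned to arm", next_arm)
```

The weight floor is a design choice in this sketch: it keeps an arm that looks bad early, perhaps by chance, enrolling some patients so that it can still recover.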

This is an exciting advance in the field of healthcare analytics. You can also read about the “Adaptive Trial Design” used in the I-SPY 2 trial, which studies emerging and promising new agents for the treatment of breast cancer (trial information link). The touted benefit of this adaptive design is that it uses “patient outcomes to immediately inform treatment assignments for subsequent trial participants—I-SPY 2 can test new treatments in half the time, at a fraction of the cost and with significantly fewer participants.”
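To see what “immediately inform treatment assignments” can mean in practice, here is a hypothetical Python simulation that enrolls patients one at a time under two schemes: fixed equal randomization, and Thompson sampling, one common adaptive rule (not necessarily the one I-SPY 2 uses). The true response rates are invented for illustration and are unknown to the simulated trial.

```python
import random


def run_trial(true_rates, n_patients, adaptive, seed=0):
    """Enroll patients one at a time; return how many were assigned to each arm.

    true_rates are hypothetical response probabilities, hidden from the trial.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    assigned = [0] * k
    for _ in range(n_patients):
        if adaptive:
            # Thompson sampling: draw one plausible response rate per arm from
            # its Beta posterior and assign the arm with the highest draw.
            draws = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
            arm = draws.index(max(draws))
        else:
            # Fixed, equal randomization.
            arm = rng.randrange(k)
        assigned[arm] += 1
        # Observe the outcome and update that arm's counts before the next
        # patient is assigned.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return assigned


rates = [0.30, 0.50]  # hypothetical: arm 1 is truly better
equal = run_trial(rates, 400, adaptive=False)
adaptive = run_trial(rates, 400, adaptive=True)
print("equal allocation:   ", equal)
print("adaptive allocation:", adaptive)
```

Under the adaptive rule, the better arm typically receives far more of the 400 simulated patients. Real trials rarely observe outcomes instantly, so in practice the weights are updated as results accrue rather than after every single patient.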

I think that once these techniques become more widely known, these types of trials will rapidly transform healthcare research and improve the capacity of healthcare organizations to become “Learning Health Systems.”



Angus D. Fusing Randomized Trials With Big Data: The Key to Self-Learning Health Care Systems? Journal of the American Medical Association (JAMA). 2015;314(8):767-768.

REMAP podcast link:

Presentation by Dr. Angus link:

I-SPY 2 link:




Operationalizing healthcare analytics: A ‘plecosystem’ approach

The TAO of Health by Paddy Padmanabhan

In my work with healthcare enterprises, I have realized that the term “analytics” means all kinds of things depending on who you speak with, and the term “predictive modeling” tends to carry a certain mystique about it. The former usually means Business Intelligence (BI), which more often than not simply means “reports.” The latter, on the other hand, conjures up images of geeks in labs (people with a PhD in applied math, for instance), toiling away at complicated statistical models for extended periods of time, and finally producing an algorithm, or a formula that “predicts” an outcome. The term also implies an expectation that some absolute truth will be revealed by a statistical model that can be a panacea for a vexing problem – such as high rates of readmissions in a hospital.
Savvy business stakeholders say “so what,” because they are unclear about how to use these complex models…
