Ibn Khaldun, the 14th-century North African historian, wrote that “the past resembles the future more than one drop of water resembles another,” implying that the patterns and lessons of the past apply to the present and can be used to predict the future. Explanations and predictions are the outputs of science and the scientific method. Moreover, by systematically observing and explaining the patterns of the past and treating them as evidence, science can indicate when those patterns are applicable to the present and the future. In the 20th century, medicine followed suit, adopting the scientific method and evidence as the source of medical truths. However, in contrast to other scientific fields, the evidence-based medicine movement minimized the value of the past and of the real world as sources of evidence and instead institutionalized a ranked hierarchy of evidence sources. Evidence generated via double-blind randomized controlled trials (db-RCTs) resides at the top of that hierarchy. However, the parallel movements of digitization in healthcare and “big data” have put this hegemony at risk and provided stakeholders with new and increasingly robust sources for evidence generation.
Traditionally, medical records were either handwritten or transcribed from audio dictation, and claims were faxed. Even when electronic, these datasets were not integrated. Isolated and difficult to compute, they yielded retrospective, observational data, and the resulting evidence was riddled with confounders and bias. The db-RCT was therefore the de facto gold standard of medical evidence generation. However, in 2009, with the passage of the Health Information Technology for Economic and Clinical Health (HITECH) Act as part of the American Recovery and Reinvestment Act (ARRA), healthcare data became increasingly digitized. Through this process of digitization, data from disparate and uncontrolled sources such as electronic health records (EHRs), claims and billing data, product and disease registries, and personal health applications became more easily integrated into rich, information-laden, multi-sourced datasets. This integration has brought healthcare into the purview of “big data” and has potentially transformed healthcare evidence generation.
As a category, observational data derived from the “uncontrolled real world” is termed real world data (RWD), and the evidence that it yields is called real world evidence (RWE). Given the inherent disadvantages of db-RCTs and the increased availability of RWD in the context of “big data”, there is a growing drive not only to augment db-RCTs with RWE but also to use RWE to inform healthcare decisions independently. In the augmenting role, RWE can be used to generate hypotheses for prospective trials, assess the generalizability of findings from db-RCTs, conduct safety surveillance of medical products, examine changes in patterns of therapeutic use, and measure quality in health care delivery. In the independent role, the goal is for “big RWD” to allow greater use of observational data in drawing causal inferences about the treatment effects of medical products. Regulatory agencies and payers are certainly moving in that direction. The 21st Century Cures Act required the FDA to create a pathway allowing RWE to support new drug indications and postmarketing surveillance. Concomitantly, payers are increasingly demanding proof of real world effectiveness, in the form of RWE, to support reimbursement decisions.
Nonetheless, RWE generated from big RWD has methodological and inferential pitfalls (discussed in the next essay). Real world data are not collected or organized with the goal of supporting research, nor have they typically been optimized for such purposes. It is necessary to exercise greater caution to be sure that the allure of “big sample size does not lead to big inferential errors.” As the 19th-century English polymath Sir Richard Francis Burton wrote, “Truth is the shattered mirror strown in myriad bits; while each believes his little bit the whole to own.” By combining the scientific method with big data, medicine potentially has another arrow in its quiver to uncover medical truths and inform medical decisions and healthcare policies.