Human testing in clinical trials, which studies the safety and efficacy of drugs, is the most costly, lengthy and failure-prone process in drug development. It has been estimated that between 33.6% and 52.4% of phase I-III clinical trials fail to proceed to the next trial phase, leading to a 13.8% overall chance that a drug tested in phase I reaches approval [1].
Consequently, different AI-based tools have been proposed to maximize the success and efficiency of trials during the planning phase and study conduct. Many of those tools are built upon the latest natural language processing (NLP), machine learning (ML) and deep learning (DL) capabilities. NLP has enabled the shift from time-consuming, manual and siloed curation of natural language data to automated, large-scale and standardized processes for analyzing text and speech data. Applications can be classified into those aimed at clinical trial design and those for study conduct.
1.- Applications in clinical trials design.
Most companies that offer AI tools for clinical trial design combine clinical trial data with real-world data (RWD) and real-world evidence (RWE) insights from de-identified and aggregated patient information. Some of the marketed solutions have access to several million patient records. Nonetheless, peer-reviewed evidence and/or external validation may be lacking behind market claims, such as the claim that ML in research planning can help ensure that a given trial design is optimally suited to the stakeholders’ needs, and these tools therefore offer only conceptual promise. Among other potential limitations, such as hidden, unaddressed biases in data sources, trial data have not yet been pooled to facilitate model training. Further efforts in this area, and willingness of stakeholders to share adequate data for model training from across multiple trials, are needed.
Specific tasks include:
a) Patient population definition: identify patient populations where a drug candidate may be more likely to show higher efficacy or reduced toxicity. This area remains challenging, as it has been shown that for every 1 intended response there are 3 to 24 non-responders for the top medications, resulting in a large number of patients who experience harmful side effects rather than the intended effect [2]. For this purpose, some solutions rely on unsupervised machine-learning platforms to identify relationships and correlations from complex data structures. Among many different data sources, these platforms analyze clinical trial data to identify correlates associated with favorable patient response to treatment, thereby informing inclusion/exclusion criteria.
b) Patient population generalizability assessment: there has been ongoing concern that randomized controlled trials (RCTs) that lack participant diversity may not provide clear evidence of efficacy and safety for new interventions in underrepresented or missing subpopulations. Standardized methods are needed to assess potential representation disparities between RCT cohorts and the broader populations who could benefit from novel interventions. ML methods have been proposed to evaluate the generalizability of a clinical trial by assessing the representativeness of the trial population, comparing eligibility criteria with patients identified from population-based registries or representative patient samples [3].
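As a rough illustration of what a representativeness check can look like, the sketch below computes a standardized mean difference (SMD) for one covariate between a trial cohort and a registry sample. The SMD is one common balance metric and is used here as an assumption; the methods in [3] may differ. All values and names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(trial_values, population_values):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(trial_values) + variance(population_values)) / 2)
    return (mean(trial_values) - mean(population_values)) / pooled_sd


# Hypothetical ages: trial participants vs. a population-based registry sample.
trial_ages = [52, 55, 58, 60, 61, 63]
registry_ages = [58, 64, 69, 71, 75, 80]

smd = standardized_mean_difference(trial_ages, registry_ages)
print(f"SMD for age: {smd:.2f}")
```

An absolute SMD above roughly 0.1 is often read as a sign of imbalance, although that threshold is a convention rather than a rule. A negative value here indicates that the trial cohort is younger than the registry sample.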
c) Eligibility criteria assessment: a key task during protocol design is to avoid unnecessarily restrictive eligibility criteria. Liu et al. [4] used EHR data to evaluate the impact of eligibility criteria on cancer trial populations, finding they could broaden criteria without sacrificing trial efficacy. To this end, they used a computational framework (Fig. 1) to emulate completed trials of advanced non-small-cell lung cancer using data from a nationwide database of electronic health records comprising 61,094 patients with advanced non-small-cell lung cancer. Their analysis revealed that many common criteria, including exclusions based on several laboratory values, had a minimal effect on the trial hazard ratios. When they used a data-driven approach to broaden restrictive criteria, the pool of eligible patients more than doubled on average and the hazard ratio for overall survival decreased by an average of 0.05. Such methods can broaden unnecessarily restrictive eligibility criteria and, in future work, potentially improve fairness by exposing whether certain sub-populations were made unjustifiably ineligible.
Fig. 1: Trial Pathfinder workflow and applications.
An AI framework to systematically evaluate clinical trial eligibility criteria. The framework encodes eligibility criteria, emulates existing trials under combinations of eligibility rules, evaluates individual eligibility rules with Shapley values and suggests data-driven criteria. Source: [4].
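To make the idea of scoring individual eligibility rules with Shapley values more concrete, here is a minimal sketch on a toy cohort. It is not the Trial Pathfinder implementation: the patients, the three rules and the value function (the number of patients left eligible, rather than a hazard ratio) are all assumptions chosen for illustration.

```python
from itertools import permutations

# Hypothetical cohort; field names and values are illustrative only.
patients = [
    {"age": 54, "ecog": 0, "platelets": 210},
    {"age": 71, "ecog": 1, "platelets": 95},
    {"age": 66, "ecog": 2, "platelets": 180},
    {"age": 80, "ecog": 1, "platelets": 150},
    {"age": 59, "ecog": 0, "platelets": 88},
    {"age": 77, "ecog": 3, "platelets": 130},
]

# Hypothetical eligibility rules, each a predicate on one patient.
rules = {
    "age_under_75": lambda p: p["age"] < 75,
    "ecog_0_or_1": lambda p: p["ecog"] <= 1,
    "platelets_at_least_100": lambda p: p["platelets"] >= 100,
}


def eligible_count(active_rules):
    """Toy value function: patients who satisfy every active rule."""
    return sum(all(rules[r](p) for r in active_rules) for p in patients)


def shapley_values(rule_names, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {r: 0.0 for r in rule_names}
    orderings = list(permutations(rule_names))
    for ordering in orderings:
        active = []
        for rule in ordering:
            before = value(active)
            active.append(rule)
            totals[rule] += value(active) - before
    return {r: total / len(orderings) for r, total in totals.items()}


shapley = shapley_values(list(rules), eligible_count)
for rule, contribution in shapley.items():
    print(f"{rule}: {contribution:+.2f} eligible patients")
```

Each value is the average change in the eligible pool attributable to one rule, so the most negative rule is the one that excludes the most patients once overlaps with the other rules are accounted for. Exact enumeration grows factorially with the number of rules, so a realistic rule set would need a sampling approximation.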
d) Comprehensive genomic data de-risks drug programs: when designing clinical trials that apply predictive biomarkers for either drug response or toxicity, relevant challenges include incomplete disease understanding, enrolling patients with the wrong genomic biomarkers, which dilutes trial results, or, conversely, missing disease-causing variants, which underestimates prevalence and impairs investment, trial design, and pricing. Several companies and genomic platforms [5,6] have built genomic knowledge bases by indexing several million full-text genomic articles as well as supplemental data sets. Those AI solutions are used during clinical trial design to validate disease-causing variants and disease prevalence with precise genetic evidence and up-to-date population databases.
e) Adaptive designs: in adaptive designs, key trial characteristics (e.g., treatment doses or allocation probabilities) may be altered during the study based on accumulating data (e.g., treatment responses), according to predefined rules. Despite their benefits, adaptive designs have seen slow uptake, attributed to lack of familiarity, concerns about how funding bodies or regulators may view them, and lack of clarity regarding their planning, implementation and reporting. Different ML methods have been proposed to implement adaptive designs such as response-adaptive randomization, drop-the-loser and adaptive dose-finding [7]. For example, reinforcement learning (RL) and multi-armed bandit (MAB) methods have been considered for modelling safe, effective doses in adaptive dose-finding trials [8,9]. Opportunities for ML in the development of adaptive designs include improved sequential MAB or other treatment allocation methods that i) maintain required statistical guarantees under adaptive changes to the trial, ii) incorporate fairness considerations (addressing patient heterogeneity), and iii) handle or mitigate the impact of patient time trends (addressing patient drift) [10].
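The following sketch shows one simple multi-armed bandit rule, Thompson sampling with Beta posteriors, applied to response-adaptive randomization in a simulated two-arm trial. It is an illustrative assumption rather than the method of [8,9]: the response rates are invented, outcomes are assumed to be binary and observed immediately, and none of the statistical guarantees mentioned above are enforced.

```python
import random


def thompson_allocation(true_response_rates, n_patients, seed=0):
    """Simulate response-adaptive allocation with Beta-Bernoulli Thompson sampling."""
    rng = random.Random(seed)
    n_arms = len(true_response_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    for _ in range(n_patients):
        # Draw a plausible response rate per arm from its Beta posterior
        # (uniform Beta(1, 1) prior), then treat with the best-looking arm.
        draws = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        arm = max(range(n_arms), key=lambda a: draws[a])
        # Simulated outcome; in a real trial this would be observed data.
        if rng.random() < true_response_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


# Hypothetical response rates for a control arm and an experimental arm.
successes, failures = thompson_allocation([0.2, 0.6], n_patients=500)
patients_per_arm = [s + f for s, f in zip(successes, failures)]
print("Patients allocated per arm:", patients_per_arm)
```

As responses accumulate, allocation drifts toward the arm that appears more effective. That is the appeal of the approach, and also the reason the resulting imbalance has to be accounted for in the final analysis.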
f) Matching tools to compile historic clinical trial external control arms (ECA): currently, fewer than half of all phase III trials of promising phase II therapies in oncology reveal the experimental agent to be superior to the standard of care (i.e., the control arm) [11]. External control arms composed of historic clinical trial data (HCT) using AI-based matching tools could support the traditional drug development paradigm in multiple ways, including early inferences regarding both efficacy and safety parameters relative to standard-of-care therapies. These inferences have the potential to inform high-level sponsor decisions regarding continued development of the most promising products and the pathway to new drug development, including, for example, regulatory body approval of Breakthrough Therapy or Fast Track designations. Subsequent study design choices, such as justification for sample size and power calculations, choice of efficacy end points, and design of eligibility criteria for future trials, are also more fully supported, possibly leading to more successful clinical development programs overall. Enhancing control arms with HCT patients’ data in the form of hybrid study designs may also accelerate the study life cycle by decreasing the number of patients that must be enrolled. Furthermore, such a hybrid control design means that enrolled patients are more likely to receive the investigational therapy than the control therapy, a fact that may appeal to patients and providers and hasten trial accrual and completion. As an example, [12] compared results of a single-arm early-phase trial of a novel immunotherapy in neoadjuvant ovarian cancer to those of a rigorously matched ECA to gain insights regarding comparative efficacy prior to a randomized controlled trial. The effect size estimate itself informed both the decision to continue development and the randomized phase II trial NCT03393884.
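The text does not specify which matching algorithm such tools use. As a hedged illustration, the sketch below performs greedy 1:1 nearest-neighbour matching without replacement on standardized covariates, a simple stand-in for the propensity-score or ML-based matching a real ECA would more likely rely on. Patient records, identifiers and covariates are hypothetical.

```python
from math import sqrt
from statistics import pstdev


def match_external_controls(trial_patients, historical_patients, covariates):
    """Greedy 1:1 nearest-neighbour matching without replacement."""
    pooled = trial_patients + historical_patients
    # Scale each covariate by its pooled standard deviation (fallback 1.0).
    scale = {c: pstdev([p[c] for p in pooled]) or 1.0 for c in covariates}

    def distance(a, b):
        return sqrt(sum(((a[c] - b[c]) / scale[c]) ** 2 for c in covariates))

    available = list(historical_patients)
    matches = []
    for patient in trial_patients:
        if not available:
            break
        best = min(available, key=lambda h: distance(patient, h))
        available.remove(best)  # each historical patient is used at most once
        matches.append((patient["id"], best["id"]))
    return matches


# Hypothetical single-arm trial patients and historical control candidates.
trial_arm = [
    {"id": "T1", "age": 60, "ecog": 1},
    {"id": "T2", "age": 72, "ecog": 0},
]
historical = [
    {"id": "H1", "age": 45, "ecog": 2},
    {"id": "H2", "age": 61, "ecog": 1},
    {"id": "H3", "age": 70, "ecog": 0},
    {"id": "H4", "age": 58, "ecog": 0},
]

matches = match_external_controls(trial_arm, historical, ["age", "ecog"])
print(matches)
```

Greedy matching depends on the order in which trial patients are processed, and matching on observed covariates cannot correct for unmeasured confounding, which is one reason inferences from an ECA are usually treated as preliminary.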
g) Protocol development from design to writing and review: recent advances in NLP and DL language models, including generative models, could assist in protocol development and writing. For example, some companies offer AI tools that cross-compare data across all therapeutic areas from past trials, medical journals, thought-leader articles, clinical and regulatory agencies, and other trial-related documents to find similarities, provide recommendations and surface contextual points of interest, with a focus on minimizing amendments, trial complexity and patient burden. Marketed solutions claim to address different protocol sections: Objectives & Endpoints, Inclusion / Exclusion, Study design, Outcome measurements, Regulatory guidance, and Schedule of assessments.
2.- Applications in clinical trials conduct.
Once the clinical trial is designed, different AI-tools are being proposed for each of the processes involved in study conduct including clinical operations, data management and patient monitoring.
a) Site selection: ML can be used to rank a list of sites for a given study to optimize enrollment. Some solutions leverage principles of reinforcement learning, where ML can learn to identify a set of trial sites that, together, yield a higher patient enrollment for a given clinical trial and ensure the enrolled cohort is diverse. Specifically, during training, the ML model learns to use trial protocol details (e.g., condition, inclusion/exclusion criteria), trial site features, previous performance, claims data and patient demographics at the trial sites (e.g., ethnicity, age) to produce a ranked list of potentially desirable trial sites that accounts for performance and diversity. By helping to pinpoint sites that can engage diverse patient populations, such tools could help improve trial awareness, access and participation [13].
b) Patient identification and recruitment: 80% of clinical trials around the world are delayed or abandoned due to poor recruitment, a problem that is costing the industry billions annually. Once the specific cohort has been selected, natural language processing (NLP) has shown promise in identifying patients who match the desired phenotype, which is otherwise a labor-intensive process. Commonly, DL algorithms jointly encode enrollment criteria and patient records into a shared latent space, matching patients to trials using EHR data significantly more efficiently than other machine learning approaches [14,15]. Nonetheless, when using machine learning for patient identification and recruitment, it is always necessary to address biases and fairness [16,17,18]. In particular, patterns of missingness in EHR data are a potential source of selection and information bias. As highlighted in [19], unfair under-representation may occur during subject selection, recruitment, and enrollment in an oncology clinical trial, and may arise from explicit exclusion of certain subgroups. While ML-enabled recruiting tools could exacerbate under-representation when this is not properly addressed, they may also open novel solutions to overcome it. For example, instead of direct patient-trial matching, patient phenotyping research, which focuses on phenotyping algorithms and aims to categorize patients based on health outcomes or disease states, is an intermediary to patient-trial matching. In this regard, ML methods, both supervised [20] and unsupervised [21], have been developed to learn disease phenotypes from EHR data and to identify representative cohorts that could benefit from proposed interventions and may inform treatment and research.
Future work should focus on improvements to patient identification, phenotyping, and trial-matching algorithms to improve not only performance, but also fairness, explainability, and privacy [18]. Different commercial platforms focus on matching patients to clinical trials. For example, the Mayo Clinic breast cancer division, over the 18-month period following implementation of a clinical trial matching system (Watson CTM), reported an average enrollment increase of 80% in breast cancer clinical trials [22]. There is an increasing number of open-source AI tools for this task. For example, Criteria2Query [23] parses free-text inclusion criteria and produces a structured cohort definition that can be executed against the OMOP CDM. Also, DeepEnroll performs patient-trial matching with deep embedding and entailment prediction [24], and Compose is based on a cross-modal pseudo-siamese DL network [25].
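Once enrollment criteria and patient records live in a shared latent space, matching reduces to comparing vectors. The sketch below ranks patients by cosine similarity to a trial embedding. The three-dimensional vectors are invented stand-ins for the output of a trained encoder, which is not shown, and this is not how DeepEnroll or Compose are implemented.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_patients(trial_embedding, patient_embeddings):
    """Return (patient_id, similarity) pairs, best match first."""
    scored = [
        (patient_id, cosine_similarity(trial_embedding, embedding))
        for patient_id, embedding in patient_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings; a real system would obtain these from a trained encoder.
trial_embedding = [0.9, 0.1, 0.4]
patient_embeddings = {
    "P1": [0.8, 0.2, 0.5],
    "P2": [0.1, 0.9, 0.0],
    "P3": [0.5, 0.5, 0.5],
}

ranking = rank_patients(trial_embedding, patient_embeddings)
for patient_id, score in ranking:
    print(f"{patient_id}: {score:.3f}")
```

A similarity ranking like this only produces candidates for review. Whether a patient actually satisfies each criterion, and whether the ranking is fair across subgroups, still has to be checked separately.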
c) Patient retention and adherence: ML has also been used either to collect and analyze data to identify and intervene upon participants at high risk of study non-compliance, or to decrease participant study burden and thereby improve participants’ experiences. For example, some commercial solutions built upon open-source libraries such as OpenDBM [26] (which integrates existing tools for behavioral measurements such as facial activity, voice, speech, and movements) use facial recognition to monitor patient adherence, but unfortunately the validation process for this task has not been published.
3.- Applications in Clinical Data Management
In general, there are many effective ML approaches to clinical trial data management, processing, and analysis; nonetheless, there are fewer techniques for improving the quality of data as they are generated and collected.
a) Automate data acquisition: AI tools can be used to improve the efficiency of the data entry process for clinical research, both minimizing the burden of manual data entry in the electronic case report form (eCRF) and aiding with chart reviews. The use of natural language processing to extract structured data from unstructured medical notes was proposed long ago, initially combining rule-based and statistical techniques [27,28], and its performance is expected to keep improving, aided by DL advances and the fine-tuning of more semantically capable large language models.
b) Coding tools: medical coding is one of the routine tasks where AI can boost efficiency. WHODrug Koda is an AI-based coding tool for concomitant drug coding in clinical trials and drug coding of AE reports in VigiBase [29]. Also, while not specific to medical coding, many companies offer automated labeling of routine free-text clinical records.
c) Monitor data quality: ML can power risk-based monitoring approaches to clinical trial surveillance, enabling the prevention and/or early detection of site failure, fraud, and data inconsistencies or incompleteness that may delay database lock and subsequent analysis. For instance, even when humans collect data into case report forms (often transmitted in PDF form), the adequacy of the collected data for outcome ascertainment can be assessed by combining optical character recognition with NLP [30]. Suspicious data patterns in clinical trials, or incorrect data in observational studies, can be identified by applying auto-encoders to distinguish plausible from implausible data [31].
d) Assign outcome events: the ability to automatically process adverse events related to drug therapies using a combination of optical character recognition and NLP has been proposed by companies [32]. A potential barrier to implementation of semi-automated event adjudication is that endpoint definitions and the data required to support them often change from trial to trial.
e) Analysis: study datasets are characterized by being highly dimensional and sparse. ML, in particular predictive modeling, is being used for multiple tasks in data analysis, including discovering new hypotheses, novel trends, safety signals and other biological features. For example, ML can provide methods of identifying treatment effect heterogeneity pre- and post-trial, assessing generalizability, and aiding with generalization once heterogeneity is exposed [33]. Also, unsupervised learning can identify phenotypic clusters to help generate hypotheses to be further explored in clinical trials. Lastly, when dealing with RWD, a challenge is to derive RWE from it (i.e., making causal inferences). In this field, proposed ML techniques include optimal discriminant analysis and ML-based propensity score weighting to estimate treatment effects [39].
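To show why propensity score weighting matters when estimating treatment effects from RWD, the sketch below compares a naive difference in means with an inverse-probability-weighted estimate on a small confounded dataset. As a simplifying assumption, the propensity score is taken to be the observed treatment frequency within each severity stratum rather than the output of an ML model, and all records are invented.

```python
from collections import defaultdict
from statistics import mean


def naive_difference(records):
    """Unadjusted difference in mean outcome, treated minus control."""
    treated = [y for _, t, y in records if t]
    control = [y for _, t, y in records if not t]
    return mean(treated) - mean(control)


def ipw_treatment_effect(records):
    """Normalized inverse-probability-weighted estimate of the treatment effect."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_treated, n_total]
    for stratum, treated, _ in records:
        counts[stratum][0] += int(treated)
        counts[stratum][1] += 1
    # Propensity score: share of treated patients in each stratum.
    # Assumes every stratum contains both treated and control patients.
    propensity = {s: n_t / n for s, (n_t, n) in counts.items()}

    sums = {True: 0.0, False: 0.0}
    weights = {True: 0.0, False: 0.0}
    for stratum, treated, outcome in records:
        e = propensity[stratum]
        w = 1 / e if treated else 1 / (1 - e)
        sums[treated] += w * outcome
        weights[treated] += w
    return sums[True] / weights[True] - sums[False] / weights[False]


# Hypothetical records: (severity stratum, treated?, outcome score).
# Severe patients are treated more often and have worse outcomes.
records = [
    ("mild", True, 9), ("mild", False, 8), ("mild", False, 8), ("mild", False, 8),
    ("severe", True, 5), ("severe", True, 5), ("severe", True, 5), ("severe", False, 4),
]

print("Naive estimate:", naive_difference(records))
print("IPW estimate:  ", ipw_treatment_effect(records))
```

In this toy dataset the treatment improves the outcome by one point within each stratum, yet the naive comparison suggests harm because sicker patients are treated more often. The weighting recovers the within-stratum effect, but only because severity is the sole confounder and is fully observed.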
4.- Applications in Regulatory and medical narrative.
ML has also been proposed, as described by [34] in a proof-of-concept experiment, for document preparation for regulatory submissions with parallel search, document creation, and data integrity review capabilities. There is an emerging industry in this area, where companies offer solutions for patient safety narratives and clinical study report generation using deep-learning generative models combined with deep-learning data extraction algorithms and knowledge graphs. Of note, in the complex domain of clinical trials, knowledge graphs (multi-relational graphs that represent rich factual information among entities of diverse classifications) are relevant tools to facilitate both reliable analysis and generation of narratives from factual data (e.g., treatment exposure, treatment response and other efficacy results, adverse events, laboratory tests, vital signs, etc.). Also, the latest DL advances in natural language generation increase operational efficiency in document generation, as a first draft can be generated automatically, reducing weeks of authoring time and the number of reviews required.
5.- Applications in Pharmacovigilance, Phase IV studies:
In a recent systematic review [35] on the use of AI in pharmacovigilance of products already on the market and pharmaceuticals in development, sixty-six articles were identified from 2015 to 2021. Most relevant articles focused on machine learning, which was used in patient safety for the identification of adverse drug events (ADEs) and adverse drug reactions (ADRs) (57.6%), processing safety reports (21.2%), extraction of drug–drug interactions (7.6%), identification of populations at high risk for drug toxicity or guidance for personalized care (7.6%), prediction of side effects (3.0%), simulation of clinical trials (1.5%), and integration of prediction uncertainties into diagnostic classifiers to increase patient safety (1.5%). Artificial intelligence has also been used to identify safety signals through automated processes and training with machine learning models; however, the findings may not be generalizable given that different types of data were included in each source. When considering any of those solutions, it is important to evaluate how the challenges associated with RWD obtained from EHRs have been accounted for and how they could affect each of the above-mentioned tasks. Over the past five years, tools have increasingly adopted DL methods. For example, for ADE identification from clinical notes in EHRs, [36] designed a deep learning model for extracting ADEs and related information such as medication dosage, frequency, and indications, employing a multitask learning approach to learn named entity recognition and relation extraction simultaneously.
In general, currently available tools for ADE identification consist of end-to-end workflows of NLP steps, including deep-learning-based processes such as embedding transformation of text, named-entity recognition (NER), entity linking, relation extraction and other final tasks that use pre-trained language models to process conversational text and extract adverse events and drug information, so as to enable various downstream use cases for pharmacovigilance [37].
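The shape of such a workflow can be sketched with a deliberately simple stand-in: dictionary lookup in place of a neural NER model, and sentence-level co-occurrence in place of a learned relation extractor. The lexicons, function names and example note are hypothetical, and the tools described in [37] rely on pre-trained language models rather than rules like these.

```python
import re

# Hypothetical lexicons standing in for trained NER and entity-linking models.
DRUG_LEXICON = {"warfarin", "ibuprofen"}
EVENT_LEXICON = {"nausea", "bleeding", "rash"}


def extract_entities(sentence):
    """Dictionary-based NER: return drug and adverse-event mentions in a sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    drugs = [t for t in tokens if t in DRUG_LEXICON]
    events = [t for t in tokens if t in EVENT_LEXICON]
    return drugs, events


def extract_drug_event_pairs(text):
    """Naive relation extraction: pair drugs and events found in the same sentence."""
    pairs = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        drugs, events = extract_entities(sentence)
        pairs.extend((drug, event) for drug in drugs for event in events)
    return pairs


note = (
    "Patient started warfarin last week. "
    "She reports bleeding gums after taking warfarin. "
    "Ibuprofen was stopped."
)
pairs = extract_drug_event_pairs(note)
print(pairs)
```

Co-occurrence ignores negation ("no bleeding on warfarin"), temporality and causality, which are among the things the learned relation-extraction step in a real pipeline is meant to handle.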
Another illustrative application is in post-marketing PK/PD studies. Collection of the data required to perform post-marketing population pharmacokinetic (PK) and pharmacodynamic (PD) studies has been cost-prohibitive. Many post-marketing PK/PD studies performed using EHRs to date have used manual curation methods that are not easily scalable and may not provide transparent, high-quality data abstraction. [38] applied a standardized and scalable pipeline to extract and process data from EHRs for post-marketing PK/PD studies on fentanyl and other drugs. Although more research is needed to determine whether this optimization has an impact on the quality of safety analyses, its use is expected to increase in the near future, particularly given its role in the prediction of side effects and ADRs.
Conclusions:
The area of AI/ML applied to clinical research offers a myriad of opportunities to boost quality and efficiency in every drug development process, from clinical trial design and conduct to data management, regulatory processes and pharmacovigilance.
Importantly, AI-enabled clinical research is regarded as a high-risk proposition for processes such as patient-trial matching and/or clinical trial design. Nonetheless, in spite of the need for them, regulatory guidance and reporting standardization are lagging behind those for AI applied to medical devices for clinical use. As a result, peer-reviewed evidence and/or external validation may be lacking behind market claims.
At the time of writing, given the growing efforts in regulatory guidance and reporting standardization, coupled with the increasing number of companies specializing in this field and the domain knowledge capabilities of recently released large language models, expectations are high that those AI tools will be successfully and progressively adopted in drug development.
Dr. Aurelia Bustos, MD, PhD