AI in data management

As the volume of collected clinical data grows, it becomes more important to reassess the methods used to process and manage it. Risk-based monitoring of the collected data is an integral part of any quality management program: it is required by the regulatory agencies, and it substantially reduces the probability of costly failures (Hawwash, Applied Clinical Trials, 2018). Typically, however, it is slow and expensive. The industry has been successful in reducing costs with the risk-based monitoring approach, in which only sites that carry high risk, or whose data has been shown to fall outside expected monitoring trends, are targeted for regular monitoring visits.

Based on this success, discussions have started about risk-based data management. This would mean focusing data management activities on the subset of data critical to endpoint analysis, then reviewing the rest of the data holistically and giving further attention only to outliers or suspicious trends within it. The idea seems logical, but one must ask whether data quality has held up under risk-based monitoring because the data management team has continued to clean all aspects of the data, and what the impact on data integrity would be if another level of risk-based review were added.

The choice of whether to use this approach, however, may be determined by the extent of the Sponsor’s in-house investment in machine learning and artificial intelligence. With machine learning, a more focused review of data is driven by “lessons learnt” from previous trial data. The details of this approach are fascinating and could easily fill a topic of their own. In short, companies are using machine learning to understand the data they already hold in-house and to learn from it more effectively for future studies and trials.
In parallel, they are also looking at current clinical trials in which “active data” is being collected, and at how they can learn about trial data while it is being captured, in effect “training” their systems to highlight data that is of concern and needs additional review. Of course, machine learning is only as “smart” as the data pool it can draw on: the more scenarios the system encounters, the “smarter” it becomes. In addition, machine learning algorithms need careful attention from clinical trial researchers so that they meet the needs of the specific trial, and the results should be interpreted in light of that trial’s detailed design and research objectives. Given an informed assessment of the machine learning outputs, these algorithms could be a much more suitable way to direct data cleaning activities than a risk-based data management approach that depends on human assumptions focused on endpoint analysis.
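
As a minimal illustration of reviewing non-critical data holistically and following up only on outliers, the sketch below flags values by their modified z-score (based on the median and the median absolute deviation). It is a simple statistical rule, not a trained machine learning model and not a method described by any Sponsor; the function name, the threshold of 3.5 and the sample readings are assumptions chosen for illustration.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    The modified z-score uses the median and the median absolute
    deviation (MAD), so a single extreme value does not distort the scale.
    The threshold of 3.5 is an assumed, commonly used default.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread around the median: this rule cannot rank anything.
        return []
    return [
        i
        for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical systolic blood pressure readings (mmHg) from one site.
readings = [118, 122, 120, 125, 119, 121, 240, 117]
flagged = flag_outliers(readings)
print(flagged)  # indices of readings that would be queued for review
```

In practice the flagged records would go to a data manager for review rather than being corrected automatically, in line with the point above that the outputs need an informed assessment in light of the trial design.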