Two studies evaluate development of artificial intelligence tools for health care

Reinforcement learning (RL), an artificial intelligence approach, has the potential to guide physicians in designing sequential treatment strategies for better patient outcomes, but it requires significant improvements before it can be applied in clinical settings, finds a new study by Weill Cornell Medicine and Rockefeller University researchers. RL is a class of machine learning algorithms that make a series of decisions over time. Responsible for recent AI advances, including superhuman performance at chess and Go, RL can use evolving patient conditions, test results and previous treatment responses to suggest the next best step in personalized patient care.

This approach is particularly promising for decisions about managing chronic or psychiatric diseases. The research, published in the Proceedings of the Conference on Neural Information Processing Systems (NeurIPS) and presented Dec. 13, introduces "Episodes of Care" (EpiCare), the first RL benchmark for health care.

"Benchmarks have driven improvement across machine learning applications including computer vision, natural language processing, speech recognition and self-driving cars. We hope they will now push RL progress in health care," said Dr. Logan Grosenick, assistant professor of neuroscience in psychiatry, who led the research.

RL agents refine their actions based on the feedback they receive, gradually learning a policy that enhances their decision-making. "However, our findings show that while current methods are promising, they are exceedingly data hungry," Dr. Grosenick adds.
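To make that feedback loop concrete, here is a minimal sketch of tabular Q-learning, one classic RL algorithm, on a made-up three-state "treatment" problem. The states, actions, rewards and hyperparameters are hypothetical assumptions chosen for illustration; this is not EpiCare and not any of the models the study tested.

```python
import random

# Hypothetical toy "treatment" problem (illustration only, not EpiCare):
# states: 0 = symptomatic, 1 = improving, 2 = remission (terminal)
# actions: 0 = treatment A, 1 = treatment B
N_STATES, N_ACTIONS, REMISSION = 3, 2, 2


def step(state, action):
    """Return (next_state, reward) under the made-up deterministic dynamics."""
    if state == 0:
        return (1, 0.0) if action == 1 else (0, 0.0)
    if state == 1:
        return (REMISSION, 1.0) if action == 0 else (0, 0.0)
    raise ValueError("remission is terminal")


def q_learning(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.2, max_steps=20, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Epsilon-greedy: usually exploit current estimates, sometimes explore.
            if rng.random() < epsilon:
                action = rng.randrange(N_ACTIONS)
            else:
                action = max(range(N_ACTIONS), key=lambda a: q[state][a])
            next_state, reward = step(state, action)
            # Feedback: move the estimate toward reward plus discounted best future value.
            if next_state == REMISSION:
                target = reward
            else:
                target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if state == REMISSION:
                break
    return q


def greedy_policy(q):
    """Best action in each non-terminal state according to the learned values."""
    return [max(range(N_ACTIONS), key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


q = q_learning()
print(greedy_policy(q))  # [1, 0]: treatment B when symptomatic, then treatment A
```

The agent arrives at this policy only through trial and error over many simulated episodes, which is exactly the kind of experimentation that cannot be run directly on patients.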

The researchers first tested the performance of five state-of-the-art online RL models on EpiCare. All five beat a standard-of-care baseline, but only after training on thousands or tens of thousands of realistic simulated treatment episodes. In the real world, RL methods would never be trained directly on patients, so the investigators next evaluated five common "off-policy evaluation" (OPE) methods: popular approaches that aim to use historical data (such as from clinical trials) to circumvent the need for online data collection.
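Off-policy evaluation asks how a new policy would have fared using only data logged under a different policy. Below is a minimal sketch of trajectory-wise importance sampling, one textbook OPE estimator. The article does not say which five OPE methods the study compared, and the environment, both policies and all numbers here are hypothetical.

```python
import itertools
import random

# Hypothetical toy setting (not EpiCare): the patient's condition alternates 0, 1, 0, ...
# and a step earns reward 1 when the chosen treatment matches the current condition.
HORIZON = 3
ACTIONS = (0, 1)


def transition(state, action):
    reward = 1.0 if action == state else 0.0
    return 1 - state, reward


def behavior_prob(action, state):
    # Logging policy: uniform over treatments, e.g. randomized trial arms.
    return 0.5


def target_prob(action, state):
    # Policy we want to evaluate offline: usually matches treatment to condition.
    return 0.9 if action == state else 0.1


def log_episodes(n, seed=0):
    """Collect historical trajectories generated by the behavior policy."""
    rng = random.Random(seed)
    episodes = []
    for _ in range(n):
        state, traj = 0, []
        for _ in range(HORIZON):
            action = rng.choice(ACTIONS)
            next_state, reward = transition(state, action)
            traj.append((state, action, reward))
            state = next_state
        episodes.append(traj)
    return episodes


def importance_sampling_estimate(episodes):
    """Trajectory-wise importance sampling of the (undiscounted) return."""
    total = 0.0
    for traj in episodes:
        weight, ret = 1.0, 0.0
        for state, action, reward in traj:
            weight *= target_prob(action, state) / behavior_prob(action, state)
            ret += reward
        total += weight * ret
    return total / len(episodes)


def exact_value():
    """Ground truth by enumerating every action sequence under the target policy."""
    value = 0.0
    for actions in itertools.product(ACTIONS, repeat=HORIZON):
        state, prob, ret = 0, 1.0, 0.0
        for action in actions:
            prob *= target_prob(action, state)
            state, reward = transition(state, action)
            ret += reward
        value += prob * ret
    return value


episodes = log_episodes(20_000)
print(round(exact_value(), 6))  # 2.7
print(importance_sampling_estimate(episodes))  # close to 2.7; exact digits depend on the sample
```

Each logged trajectory is reweighted by a product of per-step probability ratios, so the weights compound over the horizon. With long treatment sequences a handful of trajectories can dominate the estimate, one well-known reason OPE becomes difficult in longitudinal settings.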

Using EpiCare, they found that state-of-the-art OPE methods consistently failed to perform accurately for health care data. "Our findings indicate that current state-of-the-art OPE methods cannot be trusted to accurately predict reinforcement learning performance in longitudinal health care scenarios," said first author Dr. Mason Hargrave, research fellow at The Rockefeller University.

As OPE methods have been increasingly discussed for health care applications, this finding highlights the need for developing more accurate benchmarking tools, like EpiCare, to audit existing RL approaches and provide metrics for measuring improvement. "We hope this work will facilitate more reliable assessment of reinforcement learning in health care settings and help accelerate the development of better RL algorithms and training protocols appropriate for medical applications," said Dr. Grosenick.

Adapting convolutional neural networks to interpret graph data

In a second NeurIPS publication presented on the same day, Dr. Grosenick shared his research on adapting convolutional neural networks (CNNs), which are widely used to process images, to work with more general graph-structured data such as brain, gene or protein networks. The broad success of CNNs for image recognition tasks during the early 2010s laid the groundwork for "deep learning" with CNNs and the modern era of neural-network-driven AI applications.
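One way to see how a CNN can carry over to graphs: an ordinary 1D convolution can be rewritten as an operation on a chain of vertices, where each neighbor is sorted into a "bin" (its kernel position) and each bin has its own weight. The sketch below assumes a hand-coded binning by grid offset; per its title, the QuantNets paper learns the neighborhood quantization instead, and this code is not that method. The function names are hypothetical.

```python
# Sketch: a grid convolution viewed as a graph convolution in which each neighbor
# is assigned to a "bin" (kernel position) and each bin has its own weight.
# Hypothetical illustration only; the bins here are fixed by hand, not learned.


def path_graph(n):
    """Adjacency list of a 1D grid: vertex i is linked to i - 1 and i + 1."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def binned_graph_conv(x, neighbors, bin_of, weights):
    """out[i] = sum over j in {i} and neighbors[i] of weights[bin_of(i, j)] * x[j]."""
    out = []
    for i in range(len(x)):
        total = weights[bin_of(i, i)] * x[i]  # self connection
        for j in neighbors[i]:
            total += weights[bin_of(i, j)] * x[j]
        out.append(total)
    return out


def sliding_window_conv(x, kernel):
    """Ordinary CNN-style 1D convolution (cross-correlation), zero padding, same size."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        total = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(x):
                total += w * x[j]
        out.append(total)
    return out


def offset_bin(i, j):
    """Map grid offset -1, 0, +1 to kernel position 0, 1, 2."""
    return (j - i) + 1


x = [1.0, 2.0, 3.0]
kernel = [0.0, 1.0, 0.5]  # weights for offsets -1, 0, +1
print(binned_graph_conv(x, path_graph(len(x)), offset_bin, kernel))  # [2.0, 3.5, 3.0]
print(sliding_window_conv(x, kernel))  # [2.0, 3.5, 3.0]
```

On a general graph there is no grid offset to define the bins, which is where a learnable quantization of each vertex's neighborhood, as the paper's title describes, would come in.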

CNNs are used in many applications, including facial recognition, self-driving cars and medical image analysis. "We are often interested in analyzing neuroimaging data, which is more like graphs, with vertices and edges, than like images. But we realized that there wasn't anything available that was truly equivalent to CNNs and deep CNNs for graph-structured data," said Dr. Grosenick.

Brain networks are typically represented as graphs in which brain regions (vertices) propagate information to other brain regions along "edges" that connect them and represent the strength of each connection. The same is true of gene and protein networks, of human and animal behavioral data and of the geometry of chemical compounds such as drugs.
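In code, such a network is often stored as a weighted adjacency matrix. Below is a minimal sketch with four placeholder regions and made-up connection strengths, plus one generic message-passing step in which every vertex takes the strength-weighted average of its neighbors' signals. It illustrates propagation along weighted edges in general; it is not the QuantNets layer, and all names and values are hypothetical.

```python
# Hypothetical 4-region network; edge weights stand in for connection strength.
REGIONS = ["A", "B", "C", "D"]  # placeholder labels, not real anatomy
EDGES = {("A", "B"): 0.8, ("B", "C"): 0.5, ("C", "D"): 0.9, ("A", "C"): 0.2}


def adjacency(regions, edges):
    """Build a symmetric weighted adjacency matrix from an edge dictionary."""
    idx = {r: k for k, r in enumerate(regions)}
    n = len(regions)
    a = [[0.0] * n for _ in range(n)]
    for (u, v), w in edges.items():
        a[idx[u]][idx[v]] = w
        a[idx[v]][idx[u]] = w  # undirected: strength is the same both ways
    return a


def propagate(a, signal):
    """One message-passing step: strength-weighted average of neighbor signals."""
    out = []
    for row in a:
        total_w = sum(row)
        weighted = sum(w * s for w, s in zip(row, signal))
        out.append(weighted / total_w if total_w else 0.0)
    return out


a = adjacency(REGIONS, EDGES)
# Activity starts in region A and spreads one hop along the weighted edges.
print([round(v, 3) for v in propagate(a, [1.0, 0.0, 0.0, 0.0])])  # [0.0, 0.615, 0.125, 0.0]
```

Region B, strongly tied to A, receives more of A's signal than C, which is only weakly tied to it; repeating the step spreads activity to more distant vertices such as D.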

Analyzing such graphs directly makes it possible to model dependencies and patterns more accurately across both local and more distant connections. Isaac Osafo Nkansah, a research associate who was in the Grosenick lab at the time of the study and first author on the paper, helped develop the Quantized Graph Convolutional Networks (QuantNets) framework that generalizes CNNs to graphs.

"We're now using it for modeling EEG (electrical brain activity) data in patients. We can have a net of 256 sensors over the scalp taking readings of neuronal activity—that's a graph," said Dr. Grosenick. "We're taking those large graphs and reducing them down to more interpretable components to better understand how dynamic brain connectivity changes as patients undergo treatment for depression or obsessive-compulsive disorder."

The researchers foresee broad applicability for QuantNets. For instance, they are also looking to model graph-structured pose data to track behavior in mouse models and in human facial expressions extracted using computer vision.

"While we're still navigating the safety and complexity of applying cutting-edge AI methods to patient care, every step forward—whether it's a new benchmarking framework or a more accurate model—brings us incrementally closer to personalized treatment strategies that have the potential to profoundly improve patient health outcomes," concluded Dr. Grosenick.

More information:
Mason Hargrave et al., EpiCare: A Reinforcement Learning Benchmark for Dynamic Treatment Regimes (2024).
Isaac Osafo Nkansah et al., Generalizing CNNs to graphs with learnable neighborhood quantization (2024).