Not All AI Is Created Equally: The Case For Healthcare-Specific Language Models


Not all AI is created equally, and healthcare is a prime example of why. Here are some important considerations for practitioners.

David Talby, PhD, MBA, CTO at John Snow Labs. Solving real-world problems in healthcare, life sciences and related fields with AI and NLP.

Large language models (LLMs) like ChatGPT and other general-purpose artificial intelligence (AI) systems have demonstrated remarkable versatility, excelling in tasks such as summarization, content generation and conversational interfaces.

However, when it comes to medicine, the limitations of general-purpose AI become glaringly apparent. Healthcare is uniquely complex, requiring a specialized approach to AI development. In fact, there's overwhelming evidence from academic research that domain-specific and task-specific LLMs outperform general-purpose LLMs across multiple dimensions.



The moral of the story? Not all AI is created equally, and healthcare is a prime example of why.

Finance and law are other highly regulated fields where AI is playing an increasingly important role.

However, although these industries deal with intricate processes, extensive regulations and large datasets, healthcare presents an even greater challenge due to the sheer complexity of humans and our healthcare systems. With the nuanced nature of medical language and the ethical stakes involved, accuracy is key. As a result, healthcare is one area where domain-specific LLMs tend to outperform general-purpose LLMs, both on public benchmarks like OpenMed and in real-world implementations.

This has been the case consistently since transformers were introduced: models trained on domain-specific data produce more contextually relevant outputs than, say, ChatGPT, reducing the risk of inaccuracies or hallucinations. Additionally, research shows that domain-specific LLMs often outperform human experts at clinical text summarization across a range of document types, surpassing human summaries in completeness, correctness and conciseness.

This can lead to significant time savings for overburdened medical professionals, who reported spending "nearly 2 additional hours on EHR and desk work within the clinic day" for every hour spent with a patient.

One of the primary reasons general-purpose AI struggles in healthcare is the distinct nature of medical language. Medical terminology is not only highly specialized but also context-dependent.

The same term can have different meanings based on the medical specialty, patient history or even regional practices. For instance, the abbreviation “RA” could mean rheumatoid arthritis to a rheumatologist, but to a cardiologist, it might mean right atrium. Similarly, drug interactions and dosages are highly specific to patient physiology, comorbidities and genetic factors.
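To make the "RA" ambiguity concrete, here is a toy sketch of context-based sense selection. The keyword sets and matching logic are invented for illustration only; real clinical NLP systems use models trained on full note context rather than keyword lookups.

```python
# Toy illustration (not a production clinical NLP system): resolving the
# abbreviation "RA" by comparing surrounding words against per-sense
# context keywords. All keywords below are assumptions for illustration.
RA_SENSES = {
    "rheumatoid arthritis": {"joint", "synovial", "methotrexate", "rheumatology"},
    "right atrium": {"cardiac", "echocardiogram", "atrial", "ventricle"},
}

def expand_ra(note: str) -> str:
    """Pick the sense of 'RA' whose context keywords best match the note."""
    words = set(note.lower().split())
    best_sense, best_overlap = "RA (ambiguous)", 0
    for sense, keywords in RA_SENSES.items():
        overlap = len(words & keywords)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(expand_ra("Patient reports joint pain; continue methotrexate for RA."))
# With these keywords, the rheumatology context wins: rheumatoid arthritis
```

A domain-specific model has, in effect, learned millions of such contextual associations from clinical text, which is precisely what a general-purpose model trained on web data lacks.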

General-purpose LLMs trained on broad datasets may not have the necessary depth of understanding to accurately interpret and apply medical knowledge without significant fine-tuning.

Medicine also relies heavily on implicit knowledge and unstructured data. Clinical notes, for example, contain shorthand, abbreviations and informal language that may not be well represented in generic AI models.

A healthcare-specific LLM must be trained on vast amounts of domain-specific data, including electronic health records (EHRs), imaging, peer-reviewed medical literature and real-world clinical dialogues, to ensure accurate comprehension and decision support. Such models are already making a difference in areas like radiology, pathology and drug discovery. AI-powered diagnostic tools assist radiologists in detecting abnormalities in medical imaging with higher accuracy, and AI-driven research platforms help identify potential drug candidates faster than traditional methods.

Let’s not forget the operations side: healthcare-specific LLMs can predict appropriate staffing levels and help streamline backend tasks, like billing insurance.

Another key reason healthcare AI must be distinct from general-purpose AI is the ethical and regulatory landscape. The healthcare industry operates under strict guidelines, such as HIPAA in the U.S. and the GDPR in Europe, which govern the use of patient data. Any AI system handling sensitive health information must comply with these regulations.
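As a minimal sketch of the kind of data handling these rules demand, de-identifying text before it ever leaves a controlled environment might look like the following. The patterns and the "MRN" format are assumptions for illustration; real HIPAA de-identification covers 18 identifier categories and far more formats than a few regular expressions.

```python
import re

# Illustrative-only de-identification inside the deployment boundary,
# run before text reaches any model or vendor. Patterns are assumptions,
# not a compliant implementation.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d+\b"),  # assumed medical-record-number format
}

def redact(text: str) -> str:
    """Replace each matched identifier with a bracketed placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Call 555-867-5309 re: MRN: 4821, SSN 123-45-6789."))
# Prints: Call [PHONE] re: [MRN], SSN [SSN].
```

In practice this sits alongside, not in place of, contractual and architectural safeguards such as local deployment.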

Furthermore, transparency in AI decision-making is critical in medicine. A financial AI model that recommends an investment strategy can afford to be a “black box” to some extent, as long as it delivers strong results. In contrast, a healthcare AI model that assists in diagnosing cancer or recommending treatment options must be fully interpretable so that doctors can understand and validate its reasoning before making clinical decisions.

Bias is another major concern. General-purpose LLMs trained on internet data may reflect biases present in those datasets, leading to disparities in AI-driven healthcare recommendations. Healthcare-specific models must be trained on diverse, representative medical data to ensure they serve all patient populations fairly and equitably.
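One simple way to reason about representativeness is to compare subgroup shares in the training data against target population shares. The sketch below is a hedged illustration; the record structure, shares and the 50% threshold are assumptions, not a standard fairness audit.

```python
from collections import Counter

def underrepresented(records, population_shares, tolerance=0.5):
    """Flag groups whose data share falls below tolerance * population share.

    records: list of dicts with an assumed "group" field (illustrative schema).
    population_shares: assumed target shares per group, summing to ~1.0.
    """
    counts = Counter(r["group"] for r in records)
    total = len(records)
    flags = []
    for group, pop_share in population_shares.items():
        data_share = counts.get(group, 0) / total
        if data_share < tolerance * pop_share:
            flags.append(group)
    return flags

# 90% of records come from group A even though group B is 40% of the population.
records = [{"group": "A"}] * 90 + [{"group": "B"}] * 10
print(underrepresented(records, {"A": 0.6, "B": 0.4}))  # Prints: ['B']
```

Real bias auditing goes well beyond headcounts, but even this crude check catches datasets that plainly under-sample a patient population.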

Although healthcare-specific LLMs can bring enormous value to users, they're not challenge-free and face many of the same hurdles as any new tech implementation. Vendor selection, appropriate resources from budget to talent, internal training and established AI policies and best practices are all key to success.

First, ensuring you work with vendors that meet security and compliance expectations is paramount.

In healthcare especially, dealing with personally identifiable information raises the stakes for safe data practices. For this reason, it may be helpful to consider AI solutions deployed locally, where all model training runs inside the boundaries of your deployment environment and data isn't shared externally, even with the vendor.

Another challenge is finding the right humans to ensure AI is put to good use.

Because of budget constraints or a sheer lack of talent, organizations must decide whether to hire data scientists to develop and run solutions internally or rely on domain experts, like doctors and nurses, to leverage no-code AI solutions. Although the latter can be the better option given their industry knowledge, it must come with training and education to ensure the AI is used responsibly and complements existing clinical workflows.

Although general-purpose AI is transforming many industries, healthcare stands alone in its complexity, language, and ethical and regulatory considerations.

To fully realize the potential of AI in medicine, we should consider healthcare-specific AI, because in this field precision isn't just a luxury; it's a necessity.

Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.