Large Language Models for Healthcare

Industry Application


Large language models are reshaping healthcare at a pace that outstrips nearly every other industry—yet the gap between potential and deployed reality remains vast. As of early 2026, 70% of healthcare organizations report actively using AI (up from 63% in 2025), with 69% specifically deploying generative AI and LLMs, up from 54% the prior year. The transformation spans clinical documentation, diagnostic reasoning, drug discovery, patient engagement, and administrative workflows—but regulatory friction, hallucination risk, and deeply entrenched legacy systems mean healthcare's LLM adoption curve looks fundamentally different from sectors like software engineering or marketing.

Ambient Clinical Documentation: The Killer App

The most immediately impactful application of LLMs in healthcare is ambient clinical documentation: AI systems that listen to patient-physician conversations and automatically generate structured clinical notes. This addresses one of medicine's most persistent problems. Physicians spend roughly two hours on documentation for every hour of patient care, a burden that contributes to burnout rates above 50% in many specialties.
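The internals of commercial scribes are not public, but one step in such a pipeline can be made concrete. The sketch below assumes a design in which the LLM is asked to return its draft as JSON with SOAP (Subjective, Objective, Assessment, Plan) sections, and shows only the validation that rejects incomplete drafts before they reach the chart. The section names, `SoapNote`, and `parse_model_output` are hypothetical.

```python
import json
from dataclasses import dataclass

# Hypothetical template: real products define their own note sections.
REQUIRED_SECTIONS = ("subjective", "objective", "assessment", "plan")


@dataclass
class SoapNote:
    subjective: str
    objective: str
    assessment: str
    plan: str


def parse_model_output(raw: str) -> SoapNote:
    """Validate a model's JSON draft and return a SOAP note.

    Raises ValueError (json.JSONDecodeError is a subclass) if the output is
    malformed or any required section is missing or blank, so the caller can
    send the draft back for review instead of filing it.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [s for s in REQUIRED_SECTIONS if not str(data.get(s, "")).strip()]
    if missing:
        raise ValueError(f"draft note is missing sections: {missing}")
    return SoapNote(**{s: str(data[s]).strip() for s in REQUIRED_SECTIONS})


if __name__ == "__main__":
    raw = ('{"subjective": "Cough for 3 days.", "objective": "Temp 37.9 C.", '
           '"assessment": "Likely viral URI.", "plan": "Fluids and rest."}')
    print(parse_model_output(raw).assessment)  # Likely viral URI.
```

Validating structure is a cheap guard, not a check on clinical accuracy: a draft can pass this step and still contain errors that a clinician must catch on review.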

Microsoft's Dragon Copilot, introduced in March 2025 as a single product combining DAX Copilot and Dragon Medical One, is the market leader. It is deployed to thousands of clinicians in the U.S., Canada, and the U.K., with European expansion into Austria, France, Germany, and Ireland underway. The system integrates directly with Epic's EHR through AI Charting: Microsoft's Dragon Ambient AI handles transcription, and Epic generates the final clinical note. At $400+ per provider per month, it is priced squarely for large enterprise health systems.

However, real-world results have been sobering. A longitudinal study published in NEJM AI found that DAX did not make clinicians, as a group, more efficient, and a study of 112 clinicians at Atrium Health concluded that widespread implementation is unlikely to generate appreciable productivity gains for health systems. Competitors such as Abridge have nonetheless carved out significant market share, with reported savings of 1–2 hours of documentation time per clinician per day while preserving note quality and coding detail. Together, these results suggest that the technology can work but that implementation context matters enormously.

Epic itself has entered the ambient documentation space with a native AI scribe built in partnership with Microsoft, embedding the capability directly into clinical workflows rather than relying on third-party overlays. At Epic's 2025 Users Group Meeting (UGM), CEO Judy Faulkner highlighted hundreds of AI features already available, with hundreds more in the pipeline.

Diagnostic Reasoning and Clinical Decision Support

LLMs have achieved 83.3% diagnostic accuracy on healthcare benchmarks, and specialized models push higher. Google's Med-Gemini family, built on Gemini and fine-tuned for medicine, achieved 91.1% accuracy on MedQA, a benchmark of USMLE-style medical licensing exam questions, surpassing GPT-4 on every benchmark where a direct comparison was viable. These models are multimodal, integrating text with medical imaging, lab data, and longitudinal patient timelines, which makes them particularly valuable in radiology and pathology.

Yet a critical regulatory gap persists: no LLM is currently FDA-authorized as a clinical decision support device. The closest milestones have come from adjacent categories. In February 2025, Aidoc received FDA clearance for a rib fracture triage solution built on its CARE1 Foundation Model, the first clearance of a foundation-model-powered clinical AI device. Later in 2025, the FDA granted Breakthrough Device Designation to RecovryAI, an LLM-powered chatbot for post-surgical recovery; this was a first for generative AI in the device space, although the designation expedites review rather than authorizing the device. The FDA is now developing methods to identify and tag medical devices that incorporate foundation models, a sign that a regulatory framework for LLM-based clinical tools is actively forming.

Drug Discovery and Life Sciences

LLMs are accelerating pharmaceutical R&D not by replacing bench science but by compressing the information bottleneck. Domain-specific models like PharmBERT (pre-trained on 138,924 drug labels from DailyMed) extract pharmacokinetic information with significantly higher accuracy than general-purpose models. Companies like Recursion Pharmaceuticals, Insilico Medicine, and Exscientia (which combined with Recursion in late 2024) are leading the integration of LLMs into discovery pipelines.

The results are reaching clinical validation. Insilico Medicine reported positive Phase IIa results for ISM001-055 in idiopathic pulmonary fibrosis—a molecule designed with AI assistance. In December 2025, Takeda announced that zasocitinib, an AI-designed molecule developed through its partnership with Nimbus Therapeutics, showed efficacy in two late-stage clinical trials for plaque psoriasis. Eli Lilly opened its TuneLab AI suite to select biotechs in September 2025, exchanging access for data-sharing to further train its models—a sign that AI drug discovery is shifting from experimental to strategic.

Patient-Facing AI and Agentic Healthcare

The emergence of AI agents in healthcare represents the next frontier. Hippocratic AI is developing clinically tuned LLMs specifically for patient interactions, handling tasks like post-discharge follow-up, medication adherence check-ins, and chronic disease management rather than diagnosis or prescribing. In early 2026, OpenAI launched ChatGPT Health, a consumer-facing health product powered by GPT-5.2 that retrieves evidence from millions of peer-reviewed studies with transparent citations.

McCrae Tech announced Orchestral, described as the first health-native AI orchestrator, designed to unify disparate healthcare data sources and connect them with a library of governed AI agents and workflows. This points toward an agentic model where LLMs don't just answer questions but coordinate complex multi-step healthcare processes across systems.

The Regulatory and Compliance Landscape

Healthcare's regulatory environment creates unique constraints for LLM deployment. HIPAA's minimum necessary standard, which requires that uses and disclosures of protected health information (PHI) be limited to what is needed for the intended purpose, is fundamentally at odds with how LLMs operate, since they typically benefit from comprehensive context. Sending PHI to a public LLM service whose vendor has not signed a Business Associate Agreement (BAA) can constitute a serious HIPAA violation, and 67% of healthcare organizations were unprepared for stricter HIPAA security standards as of 2025.
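One way to honor the standard in an LLM pipeline is to filter a patient record down to a per-purpose allowlist before any of it is placed in a prompt. A minimal sketch, assuming records arrive as flat dictionaries; the purposes, field names, and the `minimum_necessary` helper are hypothetical and are not drawn from the HIPAA text or any vendor API.

```python
from typing import Any

# Hypothetical purpose-to-field allowlists. In practice these would be set by
# an organization's privacy and compliance policy, not hard-coded.
ALLOWED_FIELDS: dict[str, frozenset[str]] = {
    "discharge_followup": frozenset({"first_name", "discharge_date", "medications"}),
    "coding_assistance": frozenset({"diagnoses", "procedures"}),
}


def minimum_necessary(record: dict[str, Any], purpose: str) -> dict[str, Any]:
    """Keep only the fields allowlisted for `purpose`.

    Deny by default: an unrecognized purpose yields an empty dict, so nothing
    reaches the prompt unless a policy explicitly permits it.
    """
    allowed = ALLOWED_FIELDS.get(purpose, frozenset())
    return {key: value for key, value in record.items() if key in allowed}


if __name__ == "__main__":
    record = {
        "first_name": "Ana",
        "ssn": "000-00-0000",
        "discharge_date": "2026-01-05",
        "medications": ["metformin"],
        "diagnoses": ["E11.9"],
    }
    print(minimum_necessary(record, "discharge_followup"))
    # {'first_name': 'Ana', 'discharge_date': '2026-01-05', 'medications': ['metformin']}
```

The deny-by-default choice makes the tension explicit: every field the model sees has to be justified by a named purpose, which is exactly where the "comprehensive context" that LLMs prefer runs into the standard.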

State-level regulation is accelerating: California now requires disclosure when AI might mislead patients into believing they're interacting with licensed professionals (effective January 2026), and Texas's TRAIGA mandates written disclosure to patients when AI is used in diagnosis or treatment (also effective January 2026). The HHS Office for Civil Rights is preparing comprehensive AI-specific HIPAA guidance for release in 2026, likely including mandatory AI impact assessments before deploying any system that processes PHI. Homomorphic encryption—allowing AI models to process encrypted PHI without decrypting it—is emerging as a potential technical solution, though practical deployment at scale remains limited.
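The core idea of computing on encrypted data can be shown with a deliberately small example. The sketch below uses the Paillier cryptosystem, which is only additively homomorphic: a party holding ciphertexts can add the underlying values without decrypting them, but cannot run the arbitrary computation an LLM requires. Fully homomorphic schemes (for example, CKKS) support richer computation at a much higher cost. The tiny primes are an assumption chosen for readability and provide no security.

```python
import math
import secrets

# Toy Paillier parameters, chosen for readability (assumption): real keys use
# primes of 1024+ bits. These values provide NO security.
P, Q = 293, 433
N = P * Q
N_SQ = N * N
G = N + 1                       # common simple choice of generator
LAM = math.lcm(P - 1, Q - 1)    # Carmichael function lambda(N)
MU = pow(LAM, -1, N)            # valid shortcut because G = N + 1


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N with fresh randomness."""
    if not 0 <= m < N:
        raise ValueError("plaintext out of range")
    while True:
        r = secrets.randbelow(N - 1) + 1   # r in [1, N - 1]
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    l_value = (pow(c, LAM, N_SQ) - 1) // N   # Paillier's L(x) = (x - 1) / N
    return (l_value * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts yields an encryption of the plaintext sum (mod N)."""
    return (c1 * c2) % N_SQ


if __name__ == "__main__":
    readings = [110, 145, 98]          # e.g. glucose values in mg/dL
    total = encrypt(0)
    for value in readings:
        total = add_encrypted(total, encrypt(value))  # needs only ciphertexts
    print(decrypt(total))              # 353
```

Only the key holder can recover the sum; whoever performs `add_encrypted` works on ciphertexts alone. Scaling this property from a running total to full model inference is what keeps practical deployment limited.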

Applications & Use Cases

Ambient Clinical Documentation

Ambient AI systems capture patient-physician conversations, and LLMs turn them into structured clinical notes. Microsoft's Dragon Copilot and Abridge integrate directly with Epic and other major EHR systems, and vendors such as Abridge report saving clinicians 1–2 hours of documentation daily, with the aim of reducing burnout and freeing up face time with patients.

Diagnostic Reasoning & Triage

Specialized models like Med-Gemini achieve 91.1% accuracy on MedQA, a benchmark of USMLE-style medical licensing exam questions. Aidoc's CARE1 Foundation Model received the first FDA clearance for a foundation model-powered triage tool, analyzing radiology images to prioritize critical findings for clinician review.

Drug Discovery & Molecular Design

LLMs compress the information bottleneck in pharmaceutical R&D, with AI-designed molecules like Takeda's zasocitinib reaching late-stage clinical trials. Domain-specific models like PharmBERT extract pharmacokinetic data from drug labels with significantly higher accuracy than general-purpose models.

Patient Engagement & Follow-Up

Hippocratic AI deploys clinically tuned LLMs for post-discharge follow-ups, medication adherence check-ins, and chronic disease management. RecovryAI received FDA Breakthrough Device Designation for an LLM-powered chatbot guiding patients through joint replacement recovery.

Medical Literature Synthesis

LLMs with long context windows (100K–200K tokens) can process entire research papers, systematic reviews, and clinical guidelines in a single pass. OpenAI's ChatGPT Health (consumer) and ChatGPT for Healthcare (enterprise) retrieve evidence from millions of peer-reviewed studies with transparent citations.
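Whether a set of papers fits in one pass is a budgeting question. A minimal sketch, assuming a rough four-characters-per-token estimate in place of a real tokenizer and a fixed reserve for instructions and the model's answer; it greedily packs documents, in the order given, into context-sized passes and refuses any document that would need splitting. `Document`, `pack_into_passes`, and the constants are hypothetical.

```python
from dataclasses import dataclass

# Rough heuristic (assumption): ~4 characters per token for English text.
# Real counts depend on the specific model's tokenizer.
CHARS_PER_TOKEN = 4


@dataclass
class Document:
    title: str
    text: str


def estimate_tokens(text: str) -> int:
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def pack_into_passes(docs: list[Document], budget_tokens: int,
                     reserve_tokens: int = 4_000) -> list[list[Document]]:
    """Greedily group documents so each group fits the usable context budget.

    reserve_tokens is held back for instructions and the model's answer.
    A document larger than the usable budget raises instead of being
    silently truncated.
    """
    usable = budget_tokens - reserve_tokens
    passes: list[list[Document]] = []
    current: list[Document] = []
    used = 0
    for doc in docs:
        cost = estimate_tokens(doc.text)
        if cost > usable:
            raise ValueError(f"{doc.title!r} needs ~{cost} tokens; split it first")
        if current and used + cost > usable:
            passes.append(current)          # close the full pass
            current, used = [], 0
        current.append(doc)
        used += cost
    if current:
        passes.append(current)
    return passes


if __name__ == "__main__":
    papers = [Document(f"paper-{i}", "x" * 320_000) for i in range(3)]  # ~80K tokens each
    for batch in pack_into_passes(papers, budget_tokens=200_000):
        print([d.title for d in batch])
```

Raising on oversized documents is a deliberate choice: silently cutting a guideline or review mid-text is the kind of omission that can surface later as a confident but incomplete summary.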

Revenue Cycle & Administrative Automation

LLMs automate prior authorization, claims processing, coding assistance, and denial management. Epic's hundreds of AI features include administrative tools aimed at the operational overhead that consumes roughly 30% of U.S. healthcare spending.

Key Players

Microsoft/Nuance — Dragon Copilot (formerly DAX Copilot) dominates enterprise ambient clinical documentation, deployed across thousands of clinicians with deep Epic EHR integration and expansion into European markets.
Google (DeepMind) — Med-Gemini and the open MedGemma family set the benchmark for medical LLM accuracy and advance multimodal clinical reasoning across radiology and pathology; Med-Gemini reached 91.1% on MedQA (USMLE-style questions).
Epic Systems — The dominant EHR vendor is embedding native AI charting, AI agents, and Cosmos AI (trained on 8+ billion encounters) directly into clinical workflows used across a large share of U.S. hospitals.
Abridge — Audio-based medical conversation AI that consistently saves clinicians 1–2 hours per day on documentation while preserving note quality, competing directly with Dragon Copilot in the ambient documentation space.
Hippocratic AI — Building clinically tuned LLMs and AI agents specifically for patient-facing interactions like follow-ups and chronic disease management, with partnerships across hospitals, insurers, and pharma companies.
OpenAI — Launched ChatGPT Health (consumer) and ChatGPT for Healthcare (enterprise) powered by GPT-5.2, featuring HIPAA-compliant evidence retrieval from peer-reviewed medical literature.
Recursion Pharmaceuticals — Leading AI-driven drug discovery platform combining LLMs with biological datasets to accelerate target identification and molecular design at industrial scale.
Insilico Medicine — AI-designed molecule ISM001-055 achieved positive Phase IIa results for idiopathic pulmonary fibrosis, an early clinical test of end-to-end AI drug discovery from target identification through clinical trials.

Challenges & Considerations

Hallucination and Clinical Safety — LLMs can generate plausible but incorrect medical information with high confidence. A 2025 clinical review identified hallucinations as the most significant unresolved deployment risk, with potentially life-threatening consequences when AI-generated errors enter clinical notes or treatment recommendations.
HIPAA Compliance and Data Privacy — The HIPAA minimum necessary standard conflicts with how LLMs operate, and 67% of healthcare organizations remain unprepared for stricter security requirements. Sending PHI to public LLMs without a signed BAA can constitute a serious violation, yet practical HIPAA-compliant deployment architectures remain expensive and complex.
Regulatory Uncertainty — No LLM has received FDA authorization as a clinical decision support device. The FDA is still developing frameworks to identify and regulate devices incorporating foundation models, creating uncertainty for companies building LLM-powered clinical tools and slowing enterprise adoption.
Fragmented State-Level Legislation — California, Texas, and other states have enacted AI disclosure requirements effective 2026, each with different mandates. Healthcare organizations operating across state lines face a patchwork of compliance obligations with no federal harmonization in sight.
Productivity Paradox — Despite significant investment, NEJM AI research found that DAX, a leading ambient documentation tool, did not make clinicians as a group measurably more efficient. The gap between promising pilot results and scalable ROI remains a barrier to enterprise-wide deployment and budget justification.
Integration with Legacy Systems — Healthcare IT infrastructure is notoriously fragmented, with HL7, FHIR, and proprietary EHR formats creating interoperability challenges. LLM deployment requires not just model capability but deep integration with clinical workflows, data pipelines, and existing vendor ecosystems.
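Much of that integration work is translation between formats. As a small illustration, the sketch below flattens a FHIR R4 Observation resource into one line of text that could be placed in a prompt. The field names (`code`, `coding`, `valueQuantity`, `effectiveDateTime`, `status`) follow the FHIR specification; the helper name and output format are assumptions, and a production pipeline would also need to handle other `value[x]` types, components, and reference ranges.

```python
from typing import Any


def observation_to_line(obs: dict[str, Any]) -> str:
    """Render one FHIR R4 Observation as a single line of prompt context.

    Only valueQuantity results are rendered; other value[x] types are
    flagged rather than guessed at.
    """
    if obs.get("resourceType") != "Observation":
        raise ValueError("expected a FHIR Observation resource")
    code = obs.get("code", {})
    codings = code.get("coding", [])
    # Prefer the human-readable CodeableConcept.text, then the first coding.
    name = code.get("text")
    if not name and codings:
        name = codings[0].get("display") or codings[0].get("code")
    name = name or "unnamed observation"
    when = obs.get("effectiveDateTime", "time unknown")
    status = obs.get("status", "status unknown")
    qty = obs.get("valueQuantity")
    if qty is None:
        value = "[non-quantity value omitted]"
    else:
        value = f"{qty.get('value')} {qty.get('unit', '')}".strip()
    return f"{name}: {value} ({when}, {status})"


if __name__ == "__main__":
    glucose = {
        "resourceType": "Observation",
        "status": "final",
        "code": {
            "coding": [{"system": "http://loinc.org", "code": "2339-0",
                        "display": "Glucose [Mass/volume] in Blood"}],
            "text": "Glucose",
        },
        "valueQuantity": {"value": 145, "unit": "mg/dL",
                          "system": "http://unitsofmeasure.org", "code": "mg/dL"},
        "effectiveDateTime": "2026-01-05T08:30:00Z",
    }
    print(observation_to_line(glucose))
    # Glucose: 145 mg/dL (2026-01-05T08:30:00Z, final)
```

Keeping `status` in the output matters clinically: a preliminary or amended result should not read to the model the same way as a final one.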