AI Foundations for Life Sciences — A Taxonomy

A guide for life sciences operations

A reader reached out recently after working through our AI Readiness series. The feedback was direct: the strategic content had value, but applying it was harder than it needed to be. The missing piece wasn’t more frameworks—it was foundational grounding. Definitions. Descriptions. How the pieces relate to each other.

It’s a fair point. I’ve been writing for an audience I assumed had the same exposure I’ve accumulated: partnerships that included implementing and using AI systems, client work, and the steady drip of industry noise. That assumption created a gap. This taxonomy exists to map AI implementations to validation and oversight decisions, not to debate definitions.

Upon review, I found I was using knowledge I hadn’t carefully formalized. I’ve built AI competence the way many of us do—hands-on, through projects, vendor conversations, and problem-solving. That works until you need to explain the mechanics to an auditor or write intended use documentation that holds up under scrutiny. You need to be able to articulate it.

This article—and the two that follow—addresses both gaps. Consider it the glossary and conceptual foundation for the AI Readiness series and the validation articles that leverage it. If terms like “static AI,” “RAG,” or “agentic” have felt like jargon you’re expected to accept on faith, this is where we unpack the mechanics. And here’s the good news: not all AI is complicated.

Some systems marketed as “AI-powered” are essentially business rules—deterministic, auditable, validatable with methods you already know. Others are genuinely complex: adaptive, probabilistic, evolving without explicit code changes. The practical path forward is to confirm you have the simple stuff covered, then identify precisely where complexity begins to change your validation approach. That requires knowing where any given system falls on the spectrum.


Why Taxonomy Matters in Regulated Environments

You can’t validate what you don’t understand. You can’t assess risk for a system whose behavior you can’t characterize. And you can’t scope an intended use statement properly if you don’t understand how the system does what it does.

In GxP environments, we don’t get to hand-wave. Auditors ask “how does this work?” and “how do you know it’s working correctly?” The answers require more than vendor marketing copy.

The goal here isn’t to turn validation professionals into data scientists. It’s to build enough shared mechanical understanding to:

  • Ask the right questions of vendors and internal teams
  • Recognize which category a system falls into—and what that implies for oversight
  • Map AI types to validation approaches proportionate to risk

One additional dimension cuts across all categories: feature intent, meaning what the system’s output is used for and what it can affect. Feature intent matters because it determines what happens when the system is wrong.

The questions to ask: Does it impact critical data? Does it intersect patient safety? Does it drive secondary- or tertiary-level data? Does the output appear in submissions or audit trails, or stay internal? Is it reversible, or has the system already acted? Is a human in the loop before outputs affect regulated processes?

A system that suggests is a different risk profile than one that executes. Two implementations using identical technology—same ML model, same training data—carry different validation burdens depending on whether outputs feed GxP decisions or support internal analytics. Intent shapes risk as much as architecture does.


The Spectrum: From Deterministic to Probabilistic

Before the categories, one foundational distinction.

Traditional validated systems are deterministic: same input, same output, every time. You test it, it passes, behavior is locked. That’s the mental model CSV was built on.

AI introduces probabilistic systems: outputs that fall within statistical ranges, confidence scores instead of binary results, behavior that may shift without explicit code changes.
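
A minimal sketch of the difference, with illustrative logic and thresholds. In the deterministic case, behavior lives in the code; in the probabilistic case, it lives in learned parameters, so the code can stay identical while behavior shifts after retraining:

```python
# Deterministic: behavior lives in the code. Same input, same output, always.
def flag_result(value: float, spec_limit: float) -> bool:
    return value > spec_limit

# Probabilistic: behavior lives in learned parameters. This code never changes,
# yet retraining the model changes the answers it produces. "model" stands in
# for any classifier exposing scikit-learn's predict_proba convention.
def flag_result_ml(model, features: list[float]) -> tuple[bool, float]:
    confidence = model.predict_proba([features])[0][1]  # P(out of spec)
    return confidence >= 0.8, confidence  # a score in a range, not a verdict
```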

But not all AI is probabilistic.

Vendors market rule-based systems as “AI-powered” constantly. Whether that’s technically accurate depends on who you ask—but it’s what your audience encounters. A routing engine that classifies documents based on configured logic may carry an AI label, but it’s deterministic. Same input, same output. Validatable with current CSV/CSA methods. No special considerations.

The challenge begins when systems move beyond rules: machine learning models that produce confidence scores rather than binary outputs, deep learning that can’t explain its internal logic, adaptive systems that change behavior based on new data.

The practical path is progressive: confirm your current methods cover the simpler systems, then identify exactly where complexity forces a different approach. That inflection point varies by system—and recognizing it is the whole point of understanding the taxonomy.

This isn’t a defect in AI; it’s the nature of certain types. But it means validation must lean on risk assessment and statistical evidence, not just functional verification. (This is the core challenge addressed in Part 4 of the AI Readiness series.)


The Taxonomy

1. Rule-Based Systems

The simplest category—and the one most familiar to validation professionals. These systems do exactly what they’re programmed to do, nothing more. They’re the baseline against which everything else gets more complicated.

GxP Analogy: Configured business rules in an ERP, LIMS, or QMS

  • Can process multiple variables and complex branching—sophistication doesn’t change the category
  • Deterministic: same input always produces same output
  • Changes only through explicit code or configuration updates
  • Fully auditable and explainable

Life sciences example: A QMS that routes deviations based on product type, severity classification, and affected batch characteristics. The logic may be complex, but it’s visible, testable, predictable.
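
As a sketch, with hypothetical rule values, here is what that configured logic might look like. Every branch is visible, and a test case can cover each one:

```python
# Hypothetical QMS deviation-routing rules; all values are illustrative.
# Complex branching, but fully deterministic and auditable: the entire
# decision logic is readable here.
def route_deviation(product_type: str, severity: str, batch_released: bool) -> str:
    if severity == "critical":
        return "qa_director_review"          # critical always escalates
    if product_type == "sterile_injectable" and batch_released:
        return "field_alert_assessment"      # released sterile product
    if severity == "major":
        return "qa_manager_review"
    return "standard_queue"                  # minor, non-sterile, unreleased
```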

Validation implication: Traditional CSV/CSA applies directly. No special considerations.


2. Machine Learning (ML)

The first step beyond explicit programming. Instead of defining rules, you provide examples and the system learns patterns from data. The output is no longer a binary answer but a prediction with associated confidence. This is where traditional validation assumptions start to bend.

GxP Analogy: Statistical process control—but the system learns the control limits from data rather than having them configured

  • Outputs are predictions or classifications with associated confidence
  • Model is trained once, then deployed (static/non-adaptive)
  • Behavior is fixed until explicitly retrained

Life sciences examples:

  • Predictive model for clinical site enrollment performance
  • Classification model flagging potential adverse events in safety narratives
  • Image analysis for cell culture contamination detection

Validation implication: Requires statistical validation (accuracy metrics, performance baselines, representative test sets). If the model is static, it doesn’t change post-deployment; retraining is an explicit event, and it triggers revalidation assessment.
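
A minimal sketch of that statistical validation step, using scikit-learn and stand-in data; the metric targets are illustrative, not recommendations:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

# Stand-in data; in practice this is your labeled, representative GxP dataset.
X, y = make_classification(n_samples=2000, n_features=20, class_sep=2.0,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Train once, then freeze: the static (non-adaptive) ML pattern.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Statistical validation: measure performance on held-out, representative data.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
sensitivity = recall_score(y_test, y_pred)  # share of true positives caught

# Acceptance criteria come from the risk assessment; 0.90 is illustrative.
baseline_met = accuracy >= 0.90 and sensitivity >= 0.90
print(f"accuracy={accuracy:.3f}, sensitivity={sensitivity:.3f}, pass={baseline_met}")
```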


3. Deep Learning / Neural Networks

A more powerful form of machine learning using layered architectures that can identify patterns humans can’t articulate. The tradeoff: as capability increases, visibility into how the system reaches its conclusions decreases. You gain performance; you lose explainability.

GxP Analogy: Pattern recognition you can’t easily audit

  • Often applied to images, audio, unstructured text
  • Outputs are probabilistic
  • Internal logic is opaque—the “black box” problem

Life sciences examples:

  • Pathology image analysis for tumor detection
  • Signal detection in pharmacovigilance data
  • Predictive models for compound toxicity

Validation implication: Explainability becomes critical. You may not be able to audit the internal logic, but you must be able to characterize performance, document limitations, and establish human oversight for outputs that affect regulated decisions.
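
One way to document limitations without auditing internal logic is to characterize performance by subgroup on the held-out test set. A sketch with invented records:

```python
from collections import defaultdict

# results: (subgroup, true_label, predicted_label) from the held-out test set.
# Records here are invented placeholders.
results = [
    ("h_and_e_stain", 1, 1), ("h_and_e_stain", 1, 0),
    ("frozen_section", 1, 1), ("frozen_section", 0, 0),
]

by_group = defaultdict(lambda: [0, 0])  # subgroup -> [correct, total]
for group, y_true, y_pred in results:
    by_group[group][0] += int(y_true == y_pred)
    by_group[group][1] += 1

for group, (correct, total) in by_group.items():
    print(f"{group}: {correct / total:.0%} accuracy (n={total})")
# A subgroup with materially lower performance becomes a documented limitation
# and, potentially, an exclusion in the intended use statement.
```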


4. Natural Language Processing (NLP)

Not a step up in complexity so much as a different domain: unstructured text. NLP can be implemented using rule-based methods, traditional ML, or deep learning—so where it falls on the complexity spectrum depends on how it’s built. Ask the vendor.

GxP Analogy: Automated classification applied to unstructured text

  • Can be rule-based, ML-based, or built on deep learning
  • Used for text classification, entity extraction, summarization, translation

Life sciences examples:

  • Extracting MedDRA-coded terms from adverse event narratives
  • PHI detection in free-text clinical forms
  • Automated review of regulatory submission documents

Validation implication: Performance varies by document type, terminology, and language. Testing must cover representative inputs. Confidence thresholds and human review protocols are essential.
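
A minimal sketch of the confidence-threshold pattern; the threshold, field names, and extraction results are hypothetical:

```python
# Hypothetical output of an NLP entity-extraction step; each extracted term
# carries the model's confidence in that extraction.
extractions = [
    {"term": "headache", "meddra_pt": "Headache", "confidence": 0.97},
    {"term": "felt off", "meddra_pt": "Malaise", "confidence": 0.61},
]

REVIEW_THRESHOLD = 0.90  # set and justified during validation, not ad hoc

auto_accepted = [e for e in extractions if e["confidence"] >= REVIEW_THRESHOLD]
human_review = [e for e in extractions if e["confidence"] < REVIEW_THRESHOLD]

# Low-confidence extractions route to a qualified reviewer; nothing is dropped.
print(f"{len(auto_accepted)} auto-accepted, {len(human_review)} for human review")
```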


5. Large Language Models (LLMs)

Deep learning scaled massively. LLMs are trained on enormous text datasets and generate human-like responses, but they have no understanding of truth: they predict plausible text, not verified facts. That is why hallucination is a predictable failure mode rather than an edge case; outputs are optimized for plausibility, not accuracy.

GxP Analogy: Very sophisticated autocomplete trained on massive text corpora

  • Generate human-like text based on input prompts
  • Probabilistic: same prompt may produce different outputs
  • No inherent knowledge verification—they predict plausible text, not truth

Life sciences examples:

  • Drafting SOPs, protocols, or regulatory response documents
  • Summarizing clinical literature
  • Generating test scripts or validation documentation (with human review)

Validation implication: Outputs require human verification. LLMs can hallucinate—generate plausible but incorrect information. Use cases in GxP must include content review guardrails. Not appropriate for autonomous decision-making in regulated processes. One common approach to constraining hallucination is Retrieval-Augmented Generation (RAG), which grounds outputs in retrieved source documents—covered in the next article in this series.
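
One way to make “human verification required” structural rather than procedural is to wrap generation so drafts can only leave draft status through an explicit reviewer action. A sketch, with llm_generate standing in for whatever LLM API you use:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

def llm_generate(prompt: str) -> str:
    return "LLM-drafted text for: " + prompt  # stand-in for your actual LLM call

@dataclass
class DraftOutput:
    content: str
    status: str = "DRAFT_PENDING_HUMAN_REVIEW"
    reviewed_by: str | None = None
    reviewed_at: datetime | None = None

def generate_draft(prompt: str) -> DraftOutput:
    # Every generated output enters the system as a draft; there is no path
    # to "approved" that skips a named human reviewer.
    return DraftOutput(content=llm_generate(prompt))

def approve(draft: DraftOutput, reviewer: str) -> DraftOutput:
    draft.status = "APPROVED"
    draft.reviewed_by = reviewer
    draft.reviewed_at = datetime.now(timezone.utc)
    return draft
```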


6. Generative AI

The broader category that includes LLMs. Where earlier systems classify, predict, or extract, generative systems create: text, images, code, audio. The outputs didn’t exist in the training data—they’re synthesized. Novel content requires different oversight assumptions than retrieved or calculated results.

GxP Analogy: Systems that create net-new outputs rather than classifying or predicting

  • Outputs are novel, not retrieved

Life sciences examples:

  • Generating molecular structures in drug discovery (R&D, not GxP)
  • Creating synthetic patient data for model training
  • Drafting clinical communications or marketing content

Validation implication: In GxP contexts, generative outputs must be treated as drafts requiring human review and approval. The system assists; it doesn’t author.


7. Agentic AI

The frontier. These systems don’t just respond—they plan, act, and adapt. An agentic system might query multiple data sources, synthesize findings, invoke other tools, and take actions with minimal human intervention. Behavior becomes emergent and situational. Existing validation frameworks don’t cover this.

GxP Analogy: None—this is new territory

  • May invoke other tools, make decisions, and take actions with limited human intervention
  • Behavior is emergent and situational—harder to predict and test

Life sciences examples (largely theoretical for GxP):

  • Autonomous deviation investigation that queries multiple systems, synthesizes findings, and proposes CAPA
  • Multi-step protocol optimization that adjusts parameters based on interim results
  • Lab scheduling agents that coordinate across instruments, personnel, and sample availability

Validation implication: Represents the frontier challenge described in Part 4. Requires fundamentally new validation paradigms: continuous monitoring, rollback protocols, bounded autonomy, extensive human oversight. Most organizations aren’t ready—and shouldn’t attempt this before mastering static AI validation.
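
To make “bounded autonomy” concrete, here is one illustrative shape for an action gate; all action names are hypothetical. The agent can read and summarize freely, but anything that would change a regulated record stops at a human approval step:

```python
# The agent may propose anything, but only read-only, reversible actions
# execute without a human gate. Everything is logged either way.
READ_ONLY_ACTIONS = {"query_lims", "query_qms", "summarize_findings"}

def run_action(action: str, payload: dict) -> str:
    return f"executed {action}"  # stand-in for the real system integration

def request_human_approval(action: str, payload: dict) -> str:
    return f"queued {action} for human approval"  # nothing executes yet

def execute(action: str, payload: dict) -> str:
    if action in READ_ONLY_ACTIONS:
        return run_action(action, payload)
    return request_human_approval(action, payload)  # e.g., "propose_capa"
```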


How They Relate

The table below summarizes the key characteristics that determine validation approach and oversight requirements. As you move down, determinism decreases, explainability drops, and validation complexity increases.

| Category | Deterministic? | Explainability | Validation approach |
|---|---|---|---|
| Rule-based | Yes | Full | Traditional CSV/CSA |
| Machine learning (static) | No (probabilistic) | Partial | Statistical validation, performance baselines; revalidate on retraining |
| Deep learning | No | Low (“black box”) | Performance characterization, documented limitations, human oversight |
| NLP | Depends on implementation | Varies | Representative input testing, confidence thresholds, human review |
| LLMs | No | Low | Human verification of outputs, content guardrails |
| Generative AI | No | Low | Outputs treated as drafts requiring human review and approval |
| Agentic AI | No (emergent) | Low | Continuous monitoring, bounded autonomy, rollback protocols |


Same Use Case, Different Implementation

The taxonomy matters because the same business function can be implemented across multiple categories—and the implementation determines your validation approach.

  • PHI detection

    1. Rule-based (regex patterns)
    2. ML (trained classifier)
    3. NLP (contextual language understanding)
  • Document routing

    1. Rule-based (if product type = X, route to Y)
    2. ML (classifier)
    3. NLP/LLM (reads and interprets content)
  • Adverse event signal detection

    1. Rule-based (keyword matching)
    2. ML (pattern recognition)
    3. NLP (narrative analysis)
  • Medical coding (MedDRA)

    1. Rule-based (lookup tables)
    2. NLP (extracting terms from unstructured text)
  • Site enrollment prediction

    1. Rule-based (weighted scoring)
    2. ML (predictive model trained on historical data)

Three vendors could sell you “AI-powered PHI detection.” One uses regex patterns—deterministic, fully auditable, and validatable with traditional methods. One uses a trained classifier—probabilistic, which requires statistical validation. One uses deep learning NLP—outputs confidence scores, internal logic is opaque.
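
A sketch of the first two implementations behind the same interface; the pattern and model are stand-ins. The rule-based version is auditable line by line, while the ML version answers with a probability whose logic lives in learned weights:

```python
import re

# Implementation 1: rule-based. The pattern is the logic, auditable line by line.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def detect_phi_rules(text: str) -> bool:
    return bool(SSN_PATTERN.search(text))

# Implementation 2: ML classifier. Same interface, but the answer is a
# probability and the logic lives in learned weights. "model" stands in for
# a trained text-classification pipeline with a predict_proba method.
def detect_phi_ml(text: str, model, threshold: float = 0.9) -> bool:
    return model.predict_proba([text])[0][1] >= threshold
```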

Same use case. Same marketing language. Completely different validation requirements.

When evaluating any AI-enabled system, “What does it do?” is the starting question. “How does it do it?” is the one that determines your approach.

From Category to Validation Decision

Knowing what type of AI you’re dealing with answers the questions that actually matter in practice.

Do I need to validate it? Depends on intended use and regulatory impact—but AI type determines validation complexity. A rule-based system embedded in your QMS follows the same validation path you already know. An LLM generating deviation summaries is a different conversation entirely.

How do I validate it? Rule-based: traditional CSV/CSA. Static ML: statistical performance baseline with representative test data. Adaptive or agentic: continuous monitoring infrastructure—which most organizations don’t have yet.

How do I assess risk? Deterministic systems lend themselves to failure mode analysis: if X, then Y. Probabilistic systems require different thinking: performance degradation over time, confidence calibration, behavior at edge cases, consequences of being wrong within acceptable statistical bounds.

Can it modify on its own? Rule-based: no. Static ML: no, until explicitly retrained. Adaptive and agentic: yes—and this is where the concept of “validated state” gets complicated. If the system can change its own behavior based on new data or experience, your validation baseline is a moving target.

When is it no longer validated? For static systems: retraining events, data pipeline changes, version updates from the vendor. Each of these should trigger revalidation assessment. For adaptive systems: potentially any time—which is why continuous monitoring and drift detection exist. You’re not validating a state; you’re validating a range of acceptable behavior and monitoring to confirm the system stays within it.
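
A minimal sketch of that monitoring idea: compare a rolling window of adjudicated production outcomes against the validated floor, and trigger an assessment when the window falls below it. All numbers are illustrative:

```python
from collections import deque

VALIDATED_FLOOR = 0.93      # acceptance floor from the validation exercise
WINDOW = deque(maxlen=500)  # most recent human-adjudicated outcomes

def record_outcome(model_was_correct: bool) -> None:
    WINDOW.append(model_was_correct)
    if len(WINDOW) == WINDOW.maxlen:  # only evaluate on a full window
        rolling_accuracy = sum(WINDOW) / len(WINDOW)
        if rolling_accuracy < VALIDATED_FLOOR:
            raise_drift_alert(rolling_accuracy)

def raise_drift_alert(observed: float) -> None:
    # In practice this opens a deviation and triggers revalidation assessment.
    print(f"Drift alert: rolling accuracy {observed:.3f} < {VALIDATED_FLOOR}")
```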

The taxonomy isn’t academic. It’s the first question you answer because everything else follows from it.


Conclusion: Know What You’re Dealing With

The strategic content in the AI Readiness series—governance, validation adaptation, implementation planning—assumes you can identify what type of AI you’re working with. That’s the prerequisite for right-sizing oversight.

For validation purposes, “AI-powered” is only the starting point. Complexity, adaptability, explainability, and feature intent determine how a system must be validated and governed.


Next in this series: RAG (Retrieval-Augmented Generation)—what it is, why it matters for regulated content, and what to consider from a validation perspective. After that, we’ll go deeper on additional categories: machine learning fundamentals, NLP in life sciences contexts, and other architectural approaches to constraining LLM outputs. The goal is the same throughout: build enough understanding to ask the right questions and scope validation appropriately.