Safeguarding Against AI Hallucinations: Designing a Human Control Layer

A human control layer is the deliberately engineered process that sits between an AI system’s output and its legally effective use: technical guardrails plus a genuine human decision that can actually change the outcome. It is the only robust answer to an uncomfortable truth — hallucinations are not merely a tech problem, they are a liability problem.

Anyone deploying AI in a company knows the vendors’ promise: “RAG reduces hallucinations.” True — in part. What the vendors rarely add: if the AI does hallucinate and causes harm, they are not liable, you are. This article shows how to close that gap — with a control layer that interlocks technical safeguards and legal obligations from a single source.

Note: This article provides general orientation and is not case-specific legal advice. The precise assessment depends on your system and deployment context.

Why Hallucinations Are a Liability Problem for Legally Relevant Output

An AI hallucination is an output that sounds plausible but is factually wrong or entirely fabricated. With legally relevant output — briefs, contract clauses, creditworthiness decisions, customer statements — an error becomes a binding declaration with legal consequences. That is precisely where the residual risk is most expensive.

What a Hallucination Is (in brief)

A rough distinction is drawn between intrinsic hallucinations (the model contradicts the source it was given) and extrinsic ones (the model invents something that appears in no source and cannot be verified). In both cases the language model lacks any real concept of truth: it optimizes for the most probable next word, not for correctness.

The Stress Test: Documented Cases

This is not theory. A widely cited study by Stanford’s RegLab and HAI research groups found hallucination rates between 69% (ChatGPT 3.5) and 88% (Llama 2) on concrete, verifiable legal questions posed to general-purpose language models — the models invented court decisions and even reinforced mistaken legal assumptions (Stanford Law, 2024).

Bar chart: hallucination rates of 69 to 88 percent on legal questions posed to general-purpose language models (Stanford 2024)

Even specialized legal-research tools were not error-free in the Stanford study. With general-purpose models, well over half of the concrete legal answers were wrong on average. Source: Stanford RegLab/HAI, 2024.

Three cases illustrate the legal consequence:

AG Köln (Cologne Local Court), family-law matter, case no. 312 F 130/25 (finding of 2 July 2025): In an attorney’s brief, every source cited from page eight onward was fabricated — non-existent court decisions, misattributed commentary passages, fictitious articles. The court treated this as a breach of the attorney’s duty of truthfulness (sec. 43a(3) BRAO, the German Federal Lawyers’ Act) and made clear: legal responsibility is not delegable; AI output must be treated as one’s own statements (LTO/beck-aktuell, 2025).
United States, Mata v. Avianca (2023): An attorney argued using precedents invented by ChatGPT and was sanctioned by the court (heise, 2023).
Moffatt v. Air Canada, 2024 BCCRT 149: A Canadian tribunal ruled that the company is liable for the false information given by its chatbot — the bot is not a separate legal person, and the information bound the company (American Bar Association, 2024).

Who Is Liable When an AI Hallucinates?

As a rule, it is the deploying company that is liable, not the AI vendor. Whoever uses an AI result externally makes it their own declaration. The starting points are breach of contractual duty (sec. 280 BGB, the German Civil Code) and tort liability (sec. 823 BGB); the vendor is liable only in exceptional cases. The work can be handed to a machine, but the legal responsibility for the result cannot.

How this plays out specifically for self-generated code is explored in depth in our article on liability for AI-generated code.

Three Models of Human Oversight: HITL, HOTL, and Human-in-Command

Before building a control layer, you need to know which form of oversight you actually need. There are three basic models — they differ in when the human intervenes and how much they can still change.

The three models of human oversight at a glance — Human-in-the-Loop (the human approves before the action), Human-on-the-Loop (the human monitors an autonomous action), and Human-in-Command (strategic ultimate responsibility):

Model	Approval logic	Suitable risk level	Example
Human-in-the-Loop (HITL)	Human first, then action — the AI proposes, the human approves	High, irreversible, or legally sensitive output	Contract clause, brief, creditworthiness decision
Human-on-the-Loop (HOTL)	The action runs autonomously; the human monitors and can intervene/escalate	Medium risks, high volume	Categorizing support tickets, pre-triage
Human-in-Command	Strategic ultimate responsibility over the system as a whole	Governance level, all systems	Defining which tasks the AI is allowed to take on at all

The decisive point, and exactly where many implementations fail: real control is not the same as theatrical control. A person who routinely “waves through” a hundred AI proposals an hour without being able to examine them is not exercising oversight but acting as a rubber stamp. Oversight must actually be able to influence the outcome; otherwise it is worthless, both technically and legally.

The control layer is not just good practice — for certain systems it is mandatory. Two provisions interlock.

Art. 14 EU AI Act — Human Oversight for High-Risk Systems

For high-risk AI systems, Art. 14 of the EU AI Act requires effective human oversight. The person exercising oversight must understand and monitor the system sufficiently, be aware of the typical danger of automation bias (excessive trust in the output), interpret the output correctly, and be able to disregard it or stop the system at any time. For certain high-risk biometric systems, a four-eyes principle (two persons) is provided for.

Status as of May 2026 (important): On 7 May 2026, the Council, Parliament, and Commission reached a provisional agreement on the “Digital Omnibus on AI.” Under it, the obligations for Annex III high-risk systems are postponed from August 2026 to 2 December 2027. Formal adoption and publication in the Official Journal were still pending at the time of writing (Gibson Dunn / Covington Inside Privacy, May 2026). The substantive requirement for human oversight remains; only the deadline shifts. Check the current status separately for your system, because the postponement is not yet final.

For an overview of risk classes and deadlines: EU AI Act for companies.

Art. 22 GDPR prohibits purely automated individual decisions with legal or similarly significant effect and gives data subjects a right to human intervention. The legal core that many tech vendors overlook stems from the SCHUFA ruling of the CJEU (7 December 2023, case no. C-634/21): the scoring itself can already constitute an automated individual decision if a third party follows it to a “significant” degree.

The decisive consequence follows from this: a human who merely formally nods through an AI recommendation — without their own decision-making authority and without the ability to actually examine it — does not remove the automated character. Such sham control remains, in case of doubt, “solely automated” within the meaning of Art. 22 (cf. EY/CMS on the reach of the ruling). This is exactly the difference between a control layer and a rubber stamp.

How They Interact

A well-built control layer must satisfy both provisions at once: the oversight requirements of Art. 14 EU AI Act and the requirement for genuine human intervention under Art. 22 GDPR. Addressing only the technology (“we have RAG”) or only the law (“a human looks at it”) does not close the gap. Interlocking both worlds is the real design task.

Concretely, “genuine intervention” means — as a specification against which oversight must be measured: the reviewer has (1) the professional competence to judge the output, (2) the time and the information to actually examine it (sources, uncertainty made visible), (3) the organizational authority to decide differently, and (4) no incentive to reflexively wave it through. If even one of these four points is missing, the control is, in case of doubt, a sham — technically present, legally worthless.
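The four criteria can be turned into a simple self-check for a review step. The following is a minimal sketch, not a legal test: the field names, the `ReviewSetup` structure, and the two-minute threshold are illustrative assumptions that would have to be set per output type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewSetup:
    """How one review step is organized (all field names are hypothetical)."""
    reviewer_is_qualified: bool     # (1) professional competence
    minutes_per_item: float         # (2) time actually available per output
    sources_shown: bool             # (2) sources visible to the reviewer
    uncertainty_shown: bool         # (2) confidence / coverage visible
    may_override: bool              # (3) authority to decide differently
    rewarded_for_throughput: bool   # (4) incentive to wave things through


def missing_criteria(setup: ReviewSetup, min_minutes: float = 2.0) -> list[str]:
    """Return the criteria of genuine intervention that this setup fails.

    min_minutes is an assumed placeholder, not a legal threshold.
    """
    missing = []
    if not setup.reviewer_is_qualified:
        missing.append("competence")
    has_information = setup.sources_shown and setup.uncertainty_shown
    if setup.minutes_per_item < min_minutes or not has_information:
        missing.append("time_and_information")
    if not setup.may_override:
        missing.append("authority")
    if setup.rewarded_for_throughput:
        missing.append("no_incentive_to_wave_through")
    return missing


# A hundred proposals an hour leaves 0.6 minutes per item.
rubber_stamp = ReviewSetup(
    reviewer_is_qualified=True,
    minutes_per_item=0.6,
    sources_shown=True,
    uncertainty_shown=False,
    may_override=False,
    rewarded_for_throughput=True,
)
print(missing_criteria(rubber_stamp))
# ['time_and_information', 'authority', 'no_incentive_to_wave_through']
```

An empty list does not prove compliance; a non-empty one is a strong hint that the review step is closer to a rubber stamp than to oversight.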

Designing a Control Layer: The Concrete Blueprint

Now to the blueprint. Four steps, from risk triage to a tamper-resistant audit trail.

Step 1 — Risk Triage: What Needs Which Oversight?

Not every output needs human approval. The key is an honest triage along reversibility and legal consequence: the more irreversible and legally consequential an output, the stricter the oversight model.

The following triage maps typical output types to their oversight model — from fully automated for reversible drafts to HITL with a four-eyes principle for legally highly sensitive output:

Output type	Reversibility	Legal consequence	Oversight model
Internal brainstorming draft	High	None	Fully automated
Support-ticket pre-sorting	High	Minor	HOTL
Customer statement with binding effect	Low	High	HITL
Contract clause, brief, creditworthiness decision	Low	Very high	HITL + four-eyes if needed

The rule of thumb: Irreversible, legally sensitive, or reputation-/safety-critical → never purely autonomous.
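The triage table and the rule of thumb can be expressed as a small routing function. This is a sketch under assumptions: the enum, the function name, and the reduction of reversibility to a yes/no flag are illustrative, and the function always returns four-eyes for "very high" consequences, which is a conservative reading of "if needed".

```python
from enum import Enum


class Oversight(Enum):
    FULLY_AUTOMATED = "fully automated"
    HOTL = "human-on-the-loop"
    HITL = "human-in-the-loop"
    HITL_FOUR_EYES = "human-in-the-loop + four-eyes"


# Ordered scale for the "Legal consequence" column of the triage table.
LEGAL_CONSEQUENCE = {"none": 0, "minor": 1, "high": 2, "very high": 3}


def triage(reversible: bool, legal_consequence: str) -> Oversight:
    """Map an output type to an oversight model (hypothetical helper)."""
    level = LEGAL_CONSEQUENCE[legal_consequence]
    if level >= 3:
        return Oversight.HITL_FOUR_EYES
    # Rule of thumb: irreversible or legally sensitive output never runs
    # without a human approving it first.
    if level >= 2 or not reversible:
        return Oversight.HITL
    if level == 1:
        return Oversight.HOTL
    return Oversight.FULLY_AUTOMATED


print(triage(reversible=True, legal_consequence="minor").value)   # human-on-the-loop
print(triage(reversible=False, legal_consequence="high").value)   # human-in-the-loop
```

Keeping the mapping in one function makes the triage decision reviewable: changing the oversight model for an output type becomes a visible code change rather than an informal habit.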

Step 2 — Grounding & Guardrails (the technical foundation)

Technology reduces the volume that needs to reach a human at all. Three building blocks:

Grounding / RAG (Retrieval-Augmented Generation): The model answers only on the basis of retrieved, reliable sources — and cites them. This measurably reduces hallucinations, but not to zero: depending on implementation and domain, studies show reductions of roughly 18% to over 40% (review papers, 2025, MDPI/PMC). Anyone promising more is overstating it.
Input/output guardrails: Rules and checking models that block impermissible inputs and run outputs against format, fact, and policy checks before they move on.
Confidence gating: If a confidence value or source coverage falls below a defined threshold, the output is automatically escalated to a human instead of being delivered.
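How an output guardrail and confidence gating can work together is sketched below. Everything named here is an assumption for illustration: the `Draft` structure, the `route` function, the 0.8 threshold, and the idea that the pipeline provides a confidence value between 0 and 1. The sketch only checks that cited sources were actually retrieved; it does not verify that the sources support the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Draft:
    text: str
    cited_source_ids: frozenset[str]  # sources the model claims to rely on
    confidence: float                 # 0.0 to 1.0, however the pipeline estimates it


def route(draft: Draft, retrieved_source_ids: frozenset[str],
          min_confidence: float = 0.8) -> str:
    """Return "deliver", "escalate", or "block" for one draft.

    min_confidence is an assumed placeholder and must be calibrated per use case.
    """
    # Output guardrail: a citation that was never retrieved cannot be
    # checked against a source, so the draft does not move on.
    unknown = draft.cited_source_ids - retrieved_source_ids
    if unknown:
        return "block"
    # Confidence gating: missing source coverage or low confidence
    # sends the draft to a human instead of delivering it.
    if not draft.cited_source_ids or draft.confidence < min_confidence:
        return "escalate"
    return "deliver"


retrieved = frozenset({"s1", "s2"})
print(route(Draft("Clause A ...", frozenset({"s1"}), 0.93), retrieved))  # deliver
print(route(Draft("Clause B ...", frozenset({"s1"}), 0.55), retrieved))  # escalate
print(route(Draft("Clause C ...", frozenset({"s9"}), 0.97), retrieved))  # block
```

Note that a high confidence value does not rescue a draft with an unknown citation: the guardrail runs first, which reflects that fabricated sources are exactly the failure the cases above illustrate.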

How grounding is set up in practice is covered in our article on RAG and knowledge management.

Step 3 — Approval UX Against Automation Bias

This is where it is decided whether control is real or theatrical. The interface must force the human to think, not to click through:

Make uncertainty visible: Display confidence values and low source coverage explicitly.
Show sources directly, so the reviewer can read along instead of trusting blindly.
No default acceptance: no pre-selected “Approve” button; a deliberate “Reject/Edit” path belongs there on equal footing.
Friction where it protects: For high-risk output, a brief obligation for the reviewer to give reasons is sensible — it interrupts the reflex of waving things through.
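Two of these interface rules, no default acceptance and a mandatory reason for high-risk output, can also be enforced on the server side. The following is a minimal sketch with hypothetical names (`Submission`, `validation_errors`) and an assumed minimum reason length of 20 characters.

```python
from dataclasses import dataclass
from typing import Optional

ALLOWED_DECISIONS = {"approve", "edit", "reject"}


@dataclass(frozen=True)
class Submission:
    decision: Optional[str]  # None means the reviewer has not chosen anything
    reason: str = ""


def validation_errors(sub: Submission, high_risk: bool,
                      min_reason_chars: int = 20) -> list[str]:
    """Reject submissions that look like a reflexive click-through."""
    errors = []
    # No default acceptance: the decision must be set explicitly by the reviewer.
    if sub.decision not in ALLOWED_DECISIONS:
        errors.append("choose approve, edit, or reject explicitly")
    # Friction where it protects: high-risk output needs a stated reason.
    if high_risk and len(sub.reason.strip()) < min_reason_chars:
        errors.append("give a short reason for the decision")
    return errors


print(validation_errors(Submission(decision=None), high_risk=False))
# ['choose approve, edit, or reject explicitly']
print(validation_errors(Submission("approve", "ok"), high_risk=True))
# ['give a short reason for the decision']
```

A length check cannot judge the quality of a reason; it only interrupts the reflex. Whether the reasons are meaningful still has to be sampled and reviewed by people.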

Step 4 — Audit Trail / Logging by Design

Whoever exercised control must also be able to prove it. An audit-proof, tamper-resistant log records which output, which sources, and which confidence value were involved, and who approved, changed, or rejected what, when, and why. This serves the record-keeping logic of the EU AI Act (Art. 12 for high-risk systems) and, in the event of a dispute, provides evidence of documented diligence. Ideally it is built in from the start following the principle of privacy/compliance by design (Art. 25 GDPR), not bolted on afterward.
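One common way to make such a log tamper-evident is a hash chain, where each entry stores the hash of the previous one. The article does not prescribe this technique; it is used here as an illustrative assumption, and the field and function names are hypothetical. A hash chain makes later changes detectable, not impossible, so the latest hash would still need to be stored somewhere the log's operators cannot rewrite.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(record: dict) -> str:
    """SHA-256 over a canonical JSON form of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: list, *, output_id: str, source_ids: list,
                 confidence: float, reviewer: str, decision: str,
                 reason: str, timestamp: Optional[str] = None) -> dict:
    """Append one review decision and link it to the previous entry."""
    record = {
        "output_id": output_id,
        "source_ids": sorted(source_ids),
        "confidence": confidence,
        "reviewer": reviewer,
        "decision": decision,
        "reason": reason,
        "timestamp": timestamp or datetime.now(timezone.utc).isoformat(),
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    entry = {**record, "hash": _digest(record)}
    log.append(entry)
    return entry


def verify(log: list) -> bool:
    """Recompute the chain; any edited or removed entry breaks it."""
    prev = GENESIS
    for entry in log:
        record = {k: v for k, v in entry.items() if k != "hash"}
        if record["prev_hash"] != prev or _digest(record) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


log = []
append_entry(log, output_id="out-1", source_ids=["s1"], confidence=0.62,
             reviewer="reviewer-a", decision="reject",
             reason="Cited decision could not be found.")
append_entry(log, output_id="out-2", source_ids=["s1", "s2"], confidence=0.91,
             reviewer="reviewer-b", decision="approve",
             reason="Checked both sources against the clause.")
print(verify(log))  # True
```

Because the log contains reviewer identities and reasons, it is itself personal data; retention periods and access rights belong in the design from the start.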

More on this: GDPR-compliant software development and privacy by design.

Naming the Residual Risk Honestly

Zero hallucinations are an illusion. No stack — RAG, guardrails, and HITL combined — brings the rate to exactly zero. The goal is not perfection, but a controlled, documented residual risk: demonstrably appropriate diligence rather than untenable guarantees. This very honesty is legally more valuable than a promise that does not hold up when harm occurs.

FAQ

Who is liable when an AI hallucinates and harm results?

As a rule, the deploying company, not the AI vendor. Whoever uses an AI output externally makes it their own declaration — the starting points are sec. 280 BGB (breach of duty) and sec. 823 BGB (tort). The vendor is liable only in exceptional cases.

What is the difference between Human-in-the-Loop and Human-on-the-Loop?

With Human-in-the-Loop (HITL), a human approves every output before it takes effect — suitable for high, irreversible risks. With Human-on-the-Loop (HOTL), the AI runs autonomously while the human monitors and intervenes when needed — suitable for medium risks at high volume.

Is it enough for a human to formally nod through the AI decision?

No. Following the CJEU’s SCHUFA ruling (C-634/21), a decision remains, in case of doubt, “solely automated” within the meaning of Art. 22 GDPR if the human merely signs off on it formally, without their own decision-making authority and without any real ability to review it. Real control must actually be able to influence the outcome.

Can hallucinations be completely prevented technically?

No. RAG, guardrails, and human oversight reduce the risk significantly, but not to zero. What is realistic is a controlled residual risk with documented diligence — not a zero guarantee. That is exactly why legally relevant output requires a human control layer.

How much does RAG really reduce hallucinations?

Depending on implementation and application area, studies show reductions of roughly 18% to over 40% compared with the model without source grounding. RAG is a strong but not a complete safeguard — it must be combined with guardrails and human approval.

As of May 2026. This article will be updated upon material legal changes — in particular regarding the AI Act omnibus procedure.

Author: Leon Lotz, business lawyer and software developer (MusketierSoftware). Law and code from a single source.

Sources — as of 18.05.2026