EPISTEMIC CONTRACT

Grounding Methodology — How Stratensight Keeps LLM Outputs Grounded in the Dataset

A four-layer epistemic contract powered by 9 deterministic rules and two Claude models.

Stratensight is built on a deterministic, explicable, reproducible principle: every interpretive output the platform shows is the result of a layered audit, not a free-form LLM completion. This page documents the four layers that enforce that principle — the same layers that run on every analysis on every plan, with no gating.

Acronyms (C4, C5, GROUND-2, GROUND-5, GROUND-6, Option B) are kept as internal references for engineers reading the codebase; each is paired with a human-readable label on first occurrence and consolidated in the glossary at the end.

FOUR LAYERS

The contract, layer by layer

Each layer addresses a distinct failure mode of LLM-generated narrative. The layers stack: an output that passes Layer 1 still has to pass Layers 2, 3, and 4 before it reaches the user. None is optional.

LAYER 01

Critical Reader™ (Signal Integrity™) — 9 deterministic rules + 1 LLM auditor

WHAT IT DOES

Nine deterministic checks (CE1, CE2, CE3, CE4, CE5, S1, M_CAGR_LAST_YEAR_ARTIFACT, L_ACADEMIC_DOMINANCE, D_ABSTRACT_FILL_CRITICAL) audit every analysis BEFORE any narrative is generated. A second pass by a Claude Sonnet 4.6 LLM auditor surfaces contextual issues the rules cannot express. Critical issues can downgrade or block a verdict.

WHY IT MATTERS

No silent contradictions between scores and verdict. The audit runs on every analysis, every plan, with no gating — because scientific credibility cannot be fragmented.

SOURCE — backend/app/services/critical_reader.py:7-8
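
As an illustration, a deterministic rule in this layer can be sketched as a pure function from analysis fields to a list of issues. The rule name, severity values, and the specific contradiction below are hypothetical, not the actual CE* implementations:

```python
from dataclasses import dataclass

@dataclass
class Issue:
    rule: str
    severity: str   # "critical" issues can downgrade or block a verdict
    message: str

# Hypothetical rule in the spirit of the CE* checks: a strong verdict
# must not coexist with very low evidence certainty.
def check_verdict_certainty_coherence(verdict: str, evidence_certainty: str) -> list[Issue]:
    issues = []
    if verdict == "INVEST" and evidence_certainty == "VERY_LOW":
        issues.append(Issue(
            rule="CE_COHERENCE",
            severity="critical",
            message="INVEST verdict contradicts VERY_LOW evidence certainty",
        ))
    return issues
```

Because the rule is deterministic, the same analysis always produces the same issue list, which is what makes the audit reproducible.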

LAYER 02

C4 + C5 — Calibrated Language Layer (pre-generation directive + post-generation hedge validator)

WHAT IT DOES

C4 Evidence-Certainty Directive (pre-generation): every LLM system prompt receives a language register conditioned on evidence_certainty — VERY_LOW, LOW, MODERATE, or HIGH. When certainty is LOW or VERY_LOW, the LLM MUST use hedging vocabulary and MUST NOT use absolute language. C5 Hedge Validator (post-generation): scans the LLM output against those constraints. If validation fails, the system retries once at temperature 0.0; on a second failure, it falls back to a deterministic, pre-validated hedged template.

WHY IT MATTERS

Without this layer, an Executive Summary can read affirmatively ("the evidence demonstrates...") even when the badge shows Conditional / Low certainty. C4 + C5 close that asymmetry.

SOURCE — backend/app/services/_llm_hedging.py + _hedge_validator.py
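
A minimal sketch of the C4/C5 loop, assuming illustrative hedge and absolute word lists (the real vocabularies and validator live in _llm_hedging.py / _hedge_validator.py):

```python
import re

HEDGES = {"may", "suggests", "appears", "preliminary", "could"}      # illustrative
ABSOLUTES = {"demonstrates", "proves", "confirms", "guarantees"}     # illustrative

def validate_hedging(text: str, evidence_certainty: str) -> bool:
    """C5-style check: low-certainty narrative must hedge and avoid absolutes."""
    if evidence_certainty not in {"LOW", "VERY_LOW"}:
        return True
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(words & HEDGES) and not (words & ABSOLUTES)

def generate_with_hedge_guard(generate, certainty: str, fallback_template: str) -> str:
    """Retry once at temperature 0.0, then fall back to a pre-validated template."""
    for temperature in (0.7, 0.0):
        text = generate(temperature)
        if validate_hedging(text, certainty):
            return text
    return fallback_template
```

The fallback template is itself hedged and pre-validated, so the worst case is deterministic, not absolute, language.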

LAYER 03

GROUND-2 — Grounding Validator (anti-hallucination whitelist enforcement layer)

WHAT IT DOES

A per-analysis whitelist of grounded facts (entities, numbers, years, geographies) is passed via GroundingContext. After LLM generation, validate_grounding rejects any output that introduces a fact absent from the whitelist. The wrapper enforce_grounding_with_retry retries at temperature 0.0 and then falls back to a deterministic template. Matching is accent-insensitive (NFKD) and uses word-boundary regexes.

WHY IT MATTERS

An LLM that invents an assignee name, a CPC code, or a citation count silently undermines every downstream interpretation. GROUND-2 stops hallucination at the validation step rather than relying on trust.

SOURCE — backend/app/services/_grounding_validator.py
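
The matching strategy can be sketched as follows. The number extraction and the simple two-argument API are simplifications of the real validate_grounding; only the NFKD folding and word-boundary matching are taken from the description above:

```python
import re
import unicodedata

def _fold(s: str) -> str:
    # NFKD decomposition + strip combining marks -> accent-insensitive matching
    return "".join(c for c in unicodedata.normalize("NFKD", s.lower())
                   if not unicodedata.combining(c))

def validate_grounding(output: str, whitelist: set[str]) -> bool:
    """Simplified sketch: reject output that asserts a number absent from the
    whitelist (the real validator also covers entities, years, geographies)."""
    allowed = {_fold(w) for w in whitelist}
    for number in re.findall(r"\b\d[\d.,]*\b", output):
        if _fold(number) not in allowed:
            return False
    return True

def mentions_entity(output: str, entity: str) -> bool:
    """Accent-insensitive, word-boundary entity match."""
    return re.search(rf"\b{re.escape(_fold(entity))}\b", _fold(output)) is not None
```

Folding both sides through NFKD means "Schnéider" and "Schneider" compare equal, while the word boundary prevents "2021" from matching inside "12021".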

LAYER 04

GROUND-5 — Refusal Rule (three-level abstention protocol: soft / strict / narrative-specific)

WHAT IT DOES

An explicit REFUSAL RULE is injected into LLM system prompts so the model refuses with a calibrated phrase, rather than inventing facts, when the dataset variables do not support an answer. Three levels: soft (personas, narrative_engine), strict (chat advisor, with anti-false-positive FR phrasing), and narrative-specific (the 11 narrative_sections functions and persona_engine). When a refusal is detected, both validators bypass their normal scrubbing.

WHY IT MATTERS

Hallucination prevention is not enough — the LLM must have a graceful exit when the data does not support the question. GROUND-5 makes refusal a first-class output, not a failure.

SOURCE — backend/app/services/_llm_refusal.py
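
Refusal detection can be sketched as a marker-phrase scan. The phrases below are invented placeholders; the real calibrated phrases per level are defined in _llm_refusal.py:

```python
# Hypothetical calibrated refusal phrases, one per abstention level.
REFUSAL_MARKERS = {
    "soft": "the available data does not allow a reliable answer",
    "strict": "les donnees disponibles ne permettent pas de repondre",
    "narrative": "insufficient evidence in this dataset to support",
}

def is_refusal(text: str) -> bool:
    """When the model abstains with a calibrated phrase, downstream
    validators (C5 hedging, GROUND-2 grounding) bypass normal scrubbing."""
    lowered = text.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS.values())
```

Treating the refusal as a recognized first-class output is what lets the validators wave it through instead of rejecting it as an empty or ungrounded answer.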

BRIDGE TO LAYER C

evidence_certainty — not just a badge, a verdict gate

Beyond labeling and hedging, evidence_certainty drives the Layer C Tier gate (Phase 5.3): the verdict surface adapts categorically to the certainty level. HIGH preserves the raw verdict (TIER_HIGH). MODERATE or LOW maps to a directional signal (TIER_MODERATE): INVEST → OPPORTUNITY_SIGNAL, MONITOR → MIXED_SIGNAL, EXPLORE → WEAK_SIGNAL, AVOID → NEGATIVE_SIGNAL. VERY_LOW withholds the verdict entirely and replaces it with INSUFFICIENT_DATA (TIER_LOW). See the Layer C section of the methodology page for the complete mapping.
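
The mapping is small enough to state directly in code. This sketch follows the text above verbatim; only the function and constant names are illustrative:

```python
TIER_MODERATE_MAP = {
    "INVEST": "OPPORTUNITY_SIGNAL",
    "MONITOR": "MIXED_SIGNAL",
    "EXPLORE": "WEAK_SIGNAL",
    "AVOID": "NEGATIVE_SIGNAL",
}

def apply_tier_gate(verdict: str, evidence_certainty: str) -> tuple[str, str]:
    """Subordinate the user-facing verdict to evidence_certainty."""
    if evidence_certainty == "HIGH":
        return "TIER_HIGH", verdict                         # raw verdict preserved
    if evidence_certainty in ("MODERATE", "LOW"):
        return "TIER_MODERATE", TIER_MODERATE_MAP[verdict]  # directional signal
    return "TIER_LOW", "INSUFFICIENT_DATA"                  # VERY_LOW: verdict withheld
```

Because the gate is a total function over the four verdicts and four certainty levels, there is no path on which a raw verdict leaks past a VERY_LOW certainty.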

SOURCE TAGGING

GROUND-6 — Provenance flagging on every LLM-readable fact

Inside the LLM prompt, every fact is tagged with its provenance. The model cannot accidentally treat a derived metric as a primary observation, nor invent a fact under the cover of a grounded one.

[grounded]

Fact present in the analysis dataset whitelist (entity, number, year, geography). Safe to assert affirmatively under certainty rules.

[derived]

Fact computed from grounded facts via deterministic transformation (e.g. CAGR from yearly counts). Must inherit the grounding of its inputs.

[absent]

Fact NOT in the whitelist and NOT derivable. The LLM must either refuse (GROUND-5) or hedge as preliminary signal (C4) — never assert as evidence.

SOURCE — backend/app/services/prompt_builder.py (SOURCE_TAG_GROUNDED / _DERIVED / _ABSENT)
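
A sketch of how a prompt builder could attach these tags. The function name and arguments are hypothetical; only the three tag strings come from the description above:

```python
def tag_fact(label: str, value, whitelist: set[str], derived: set[str]) -> str:
    """Prefix each prompt-visible fact with its provenance tag."""
    v = str(value)
    if v in whitelist:
        tag = "[grounded]"
    elif v in derived:
        tag = "[derived]"   # computed deterministically from grounded inputs
    else:
        tag = "[absent]"    # must be refused (GROUND-5) or hedged (C4)
    return f"{tag} {label}: {v}"
```

Since every fact the model can read carries one of the three tags, "no tag" is never a state the prompt can be in.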

UX CALIBRATION

Option B — Certainty × language consistency across the user journey

The four layers above keep individual LLM outputs grounded. Option B extends the same epistemic discipline to the deterministic templates surrounding them — decision narrative, key insight, and executive outlook — so the user reads a coherent register from badge to recommendation.

Three calibration targets

  • Decision Engine narrative — four templates conditioned by certainty (LOW / MODERATE / HIGH) × language (EN / FR), driven by a grade computed BEFORE narrative generation.
  • Key Insight — certainty and language propagated through narrative_sections so the headline interpretation never outruns the evidence.
  • Executive outlook (frontend) — deterministic copy in text.ts reflects the same certainty register across the explorer and analysis pages.

SOURCE — backend/app/services/_ux_calibration.py + decision_engine.py
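
The certainty × language template lookup can be sketched as a plain table. The template wording here is invented for illustration; the real copy lives in _ux_calibration.py and text.ts:

```python
# Hypothetical template table: certainty band x language -> register.
TEMPLATES = {
    ("HIGH", "EN"): "The evidence supports {verdict}.",
    ("HIGH", "FR"): "Les donnees soutiennent {verdict}.",
    ("MODERATE", "EN"): "The evidence tentatively points toward {verdict}.",
    ("MODERATE", "FR"): "Les donnees suggerent prudemment {verdict}.",
    ("LOW", "EN"): "Early signals hint at {verdict}; treat as preliminary.",
    ("LOW", "FR"): "Des signaux precoces evoquent {verdict}; a confirmer.",
}

def decision_narrative(verdict: str, certainty: str, language: str) -> str:
    """The grade is computed BEFORE narrative generation, so the chosen
    template can never outrun the evidence."""
    return TEMPLATES[(certainty, language)].format(verdict=verdict)
```

A deterministic lookup keyed on a pre-computed grade is what guarantees the badge and the surrounding copy never disagree.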

TWO AI MODELS

The stack — auditor and narrator

Stratensight uses two Claude models, each with a tightly scoped role. AI generates text only — never scores, never numbers. Scoring is deterministic Python, always.

Claude Sonnet 4.6

ROLE

Auditor (second pass of Layer 01, Signal Integrity™)

SCOPE

Reads the full analysis context (scores, metadata, source mode) and may surface up to 8 additional issues that the deterministic rules cannot express. Hard guardrails: allowed_values whitelist, ±0.5 float tolerance, 15-second timeout, 2048 max output tokens. Never invents a fact, never re-scores — flags only.
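
These guardrails can be sketched as a post-filter on each issue the auditor emits. Field names and severity values are hypothetical; the ±0.5 tolerance and the 8-issue cap come from the description above:

```python
ALLOWED_SEVERITIES = {"info", "warning", "critical"}   # hypothetical allowed_values

def accept_auditor_issue(issue: dict, known_scores: dict[str, float],
                         tolerance: float = 0.5, issues_seen: int = 0) -> bool:
    """Hard guardrails on LLM auditor output: whitelist fields and verify any
    score the auditor quotes against the deterministic value."""
    if issues_seen >= 8:                           # at most 8 additional issues
        return False
    if issue.get("severity") not in ALLOWED_SEVERITIES:
        return False
    quoted, name = issue.get("score"), issue.get("score_name")
    if quoted is not None and name in known_scores:
        if abs(quoted - known_scores[name]) > tolerance:   # +/-0.5 float tolerance
            return False
    return True
```

An issue that quotes a score the deterministic pipeline never produced is dropped outright, which is how "flags only, never re-scores" is enforced mechanically.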

Claude Haiku

ROLE

Narrative generation (personas, executive summaries, clusters, Q&A, narrative_sections)

SCOPE

Generates role-aware narrative TEXT only — never scores, never numbers. Always paired with C4 directive (pre-gen), C5 + GROUND-2 + GROUND-5 (post-gen). Always labeled "AI-generated insight" in UI. Silent fallback to deterministic templates when the API returns null or fails validation.

GLOSSARY

Terminology mapping

Internal acronyms used by Stratensight engineers, paired with their public labels and short definitions.

C4

HUMAN LABEL

Evidence-Certainty Directive (pre-generation)

WHAT

Block injected into LLM system prompts to constrain language register based on certainty level.

C5

HUMAN LABEL

Hedge Validator (post-generation)

WHAT

Scans LLM output for hedging vocabulary and absolute language. Pairs with C4.

GROUND-2

HUMAN LABEL

Grounding Validator

WHAT

Anti-hallucination whitelist enforcement after LLM generation.

GROUND-5

HUMAN LABEL

Refusal Rule

WHAT

Three-level abstention protocol (soft / strict / narrative-specific) injected into LLM prompts.

GROUND-6

HUMAN LABEL

Source Tagging

WHAT

Provenance flagging — every LLM-readable fact is tagged grounded / derived / absent.

Option B

HUMAN LABEL

UX Calibration

WHAT

Deterministic templates conditioned on certainty × language across decision narrative, key insight, and executive outlook.

evidence_certainty

HUMAN LABEL

Public-facing: "Confidence level"

WHAT

Backend variable VERY_LOW / LOW / MODERATE / HIGH, source for badge color and language register.

Intelligence Grade™

HUMAN LABEL

Public-facing reliability metric

WHAT

Backend: branded_scores.intelligence_grade. Drives downgrade rules and disclaimer thresholds.

USER GLOSSARY

What these tokens mean for your decision-making

The user-facing tokens below appear across decision narratives, persona insights, and score badges. The glossary above lists internal acronyms for engineers; this section is for everyone reading a Stratensight report. For threshold values and lifecycle weights, see /methodology.

Verdict

INVEST

Decision Engine verdict — strong convergent opportunity

Verdict

MONITOR

Decision Engine verdict — promising but incomplete

Verdict

EXPLORE

Decision Engine verdict — mixed signals, deeper analysis required

Verdict

AVOID

Decision Engine verdict — weak or negative signals

Lifecycle

Research

Lifecycle phase — early academic exploration

Lifecycle

Emerging

Lifecycle phase — first commercial interest

Lifecycle

Acceleration

Lifecycle phase — rapid growth, opportunity window

Lifecycle

Growth

Lifecycle phase — established technology

Lifecycle

Mature

Lifecycle phase — saturated, incumbent-dominated

Persona

investor

Persona role — strategic capital allocator

Persona

rd-engineer

Persona role — research / engineering decision-maker

Persona

patent-attorney

Persona role — IP legal practitioner

Persona

strategist

Persona role — corporate strategy lead

Persona

analyst

Persona role — market intelligence analyst

Persona

executive

Persona role — C-level decision-maker

Score tier

Momentum HIGH / MEDIUM / LOW

Score tier — filing velocity intensity

Score tier

OPEN / CONTESTED / CONCENTRATED / DOMINATED

Openness tier — competitive structure label

Score tier

Certainty LOW / MODERATE / HIGH (user-facing)

Public confidence band — drives badge color and LLM hedging register

Decision Engine

Layer C

Tier gate (Phase 5.3) — subordinates the user-facing verdict to evidence_certainty via categorical mapping

Tier output

OPPORTUNITY_SIGNAL

Directional label for INVEST under TIER_MODERATE (MODERATE or LOW certainty)

Tier output

MIXED_SIGNAL

Directional label for MONITOR under TIER_MODERATE (MODERATE or LOW certainty)

Tier output

WEAK_SIGNAL

Directional label for EXPLORE under TIER_MODERATE (MODERATE or LOW certainty)

Tier output

NEGATIVE_SIGNAL

Directional label for AVOID under TIER_MODERATE (MODERATE or LOW certainty)

Tier output

INSUFFICIENT_DATA

Verdict withheld under TIER_LOW (VERY_LOW certainty) — no signal can be defended

Sibling: /methodology/grade — how Stratensight rates evidence certainty and recommendation strength.
Parent: /methodology — the full scientific methodology behind every Stratensight verdict.

Stratensight provides patent intelligence signals, not legal opinions or freedom-to-operate assessments. Not a substitute for IP counsel.