
Why deterministic RAG beats generative AI for research
Generative AI prioritises creativity over factuality. Research requires the opposite. Here is how source-anchored RAG with zero-temperature inference produces outputs that are traceable, repeatable, and legally defensible.
The problem with “intelligent” summarisation
Ask a frontier language model to summarise a market landscape. It will produce something plausible, well-structured, and confident. It will also, with no warning, conflate two companies, misattribute a statistic, or describe a product feature that was deprecated eighteen months ago.
This is not a bug. It is the intended behaviour of a probabilistic text generator. The model is optimised to produce the most likely next token given its training distribution. It is not optimised to produce the most accurate claim given a specific source document. For creative writing, marketing copy, or brainstorming, that trade-off is acceptable. For research that informs a pricing decision, a market entry strategy, or a regulatory submission, it is not.
Citium Tech’s approach is built on a different set of guarantees. This article explains the engineering that makes those guarantees possible.
Probabilistic versus deterministic: the actual difference
“Probabilistic” and “deterministic” are frequently misused when people talk about AI. The distinction worth drawing here is not philosophical. It is operational.
A probabilistic system produces outputs that vary based on sampling from a learned distribution. Run the same prompt twice and you get two different outputs. The model cannot tell you which source a claim came from because the claim did not come from any single source: it emerged from the weighted combination of everything in the training corpus. Auditing the output is structurally impossible.
A deterministic system produces the same output given the same inputs, and every claim in the output is traceable to a specific input. The inference step is still performed by a language model, but the model is constrained: it can only draw on the documents provided to it, and the temperature is set to zero to eliminate sampling variance. Auditing the output means checking the source documents, a task a human or a downstream system can perform.
The practical implication: a probabilistic summariser might be faster to build. A deterministic one is the only kind you can stand behind in a client deliverable.
The Citium methodology: four stages
The architecture follows four stages. Each stage has a defined input, a defined output, and a defined failure mode. No stage delegates responsibility to the model’s imagination.
[Raw Sources]
│
▼
┌─────────────────────────────────────────────┐
│ Stage 1: INGEST │
│ • Fetch from APIs, scrapers, file stores │
│ • Normalise encoding, strip noise │
│ • Assign source_id, timestamp, origin_url │
└─────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ Stage 2: INDEX │
│ • Chunk documents (512-token windows) │
│ • Embed with text-embedding-3-large │
│ • Store vectors + metadata in pgvector │
└─────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ Stage 3: DETERMINISTIC RETRIEVE │
│ • Semantic search scoped to source filters │
│ • Hard top-k cap (k ≤ 8 chunks) │
│ • Chunks passed as explicit context window │
└─────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ Stage 4: CONTEXT-ANCHORED SUMMARISATION │
│ • temperature=0, top_p=1 │
│ • System prompt: "Only use provided text" │
│ • Output includes citation indexes │
└─────────────────────────────────────────────┘
│
▼
[Traceable Research Output + Audit Trail]

Stage 3 is where most “AI research tools” fail. They retrieve broadly, inject the model’s own knowledge, and produce confident hallucinations. The Citium retrieval layer uses query-based semantic filtering combined with hard metadata constraints, including source type, date range, and domain, so the context window contains only documents the client has approved as inputs. The model cannot reach outside it.
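The shape of those hard metadata constraints can be sketched as a plain filter over chunk metadata. The `SourceFilter` and `ChunkMetadata` fields below are illustrative assumptions, not the production schema; the point is that the check is exact boolean logic applied after semantic search, not a model judgement.

```typescript
// Hypothetical shape of the hard metadata constraints applied at retrieval time.
interface SourceFilter {
  sourceTypes: string[];                    // e.g. ["reddit", "regulatory_pdf"]
  dateRange: { from: string; to: string };  // ISO 8601 bounds
  domains: string[];                        // approved origin domains
}

interface ChunkMetadata {
  sourceType: string;
  retrievedAt: string;                      // ISO 8601
  originDomain: string;
}

// A chunk survives only if it passes every constraint; anything outside
// the client-approved set is dropped before it reaches the context window.
function passesFilter(meta: ChunkMetadata, filter: SourceFilter): boolean {
  return (
    filter.sourceTypes.includes(meta.sourceType) &&
    meta.retrievedAt >= filter.dateRange.from &&
    meta.retrievedAt <= filter.dateRange.to &&
    filter.domains.includes(meta.originDomain)
  );
}
```

Because same-format ISO 8601 strings sort lexicographically, the date-range check is a plain string comparison with no date parsing required.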
The audit trail: what it looks like in practice
Every summarisation call produces two outputs: the research summary and a structured audit record. The audit record is generated in the same inference call, not appended after the fact.
// types/research.ts
interface SourceChunk {
  chunkId: string;          // e.g. "chunk_7f3a2b"
  sourceId: string;         // e.g. "src_reddit_2024-11-03_abc"
  originUrl: string;
  retrievedAt: string;      // ISO 8601
  tokenRange: [number, number];
  embeddingModel: string;   // e.g. "text-embedding-3-large"
  similarityScore: number;
}

interface ResearchOutput {
  summary: string;
  citations: {
    claimIndex: number;
    chunkIds: string[];     // maps each sentence to source chunks
  }[];
  auditTrail: {
    queryText: string;
    retrievedChunks: SourceChunk[];
    inferenceModel: string;        // e.g. "claude-sonnet-4-20250514"
    inferenceTemperature: number;  // always 0
    processingTimestamp: string;
    outputHash: string;            // SHA-256 of summary + chunk IDs
  };
}

// services/rag.service.ts (simplified)
async function generateDeterministicSummary(
  query: string,
  sourceFilters: SourceFilter,
): Promise<ResearchOutput> {
  // Stage 3: Retrieve with hard constraints
  const chunks = await vectorStore.similaritySearch(query, {
    k: 8,
    filter: sourceFilters,
  });

  // Build an explicitly labelled context block
  const contextBlock = chunks
    .map((c, i) => `[SOURCE ${i}] ${c.pageContent}`)
    .join('\n\n');

  // Stage 4: Zero-temperature inference
  const response = await anthropic.messages.create({
    model: 'claude-sonnet-4-20250514',
    temperature: 0,
    system: `You are a research analyst. Summarise only using the provided sources.
For each factual claim, append a citation in the format [SOURCE N].
Never introduce information not present in the sources.`,
    messages: [
      { role: 'user', content: `Sources:\n${contextBlock}\n\nQuery: ${query}` },
    ],
    max_tokens: 1000,
  });

  // The SDK returns a union of content block types; narrow to text before reading it
  const firstBlock = response.content[0];
  if (firstBlock.type !== 'text') {
    throw new Error('Expected a text content block');
  }
  const summary = firstBlock.text;
  const outputHash = sha256(summary + JSON.stringify(chunks.map(c => c.metadata.chunkId)));

  return {
    summary,
    citations: parseCitations(summary, chunks),
    auditTrail: {
      queryText: query,
      retrievedChunks: chunks.map(toSourceChunk),
      inferenceModel: 'claude-sonnet-4-20250514',
      inferenceTemperature: 0,
      processingTimestamp: new Date().toISOString(),
      outputHash,
    },
  };
}

The outputHash matters. It allows a client to verify, months after a report was generated, that the summary they received was produced from exactly the chunks recorded in the audit trail, and that neither the summary nor the source list has been modified since. This is the engineering foundation of the Traceability guarantee described in our decoupled architecture article.
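That client-side verification can be sketched in a few lines. This is an illustrative helper rather than a shipped verifier; it assumes the hash input is the summary concatenated with the JSON-encoded array of chunk IDs, mirroring the hashing step in the service code above.

```typescript
import { createHash } from "node:crypto";

// SHA-256 hex digest, matching the sha256 helper assumed by the service code.
function sha256(input: string): string {
  return createHash("sha256").update(input).digest("hex");
}

// Recompute the digest from the delivered summary plus the chunk IDs recorded
// in the audit trail, and compare it against the recorded hash.
function verifyOutputHash(
  summary: string,
  chunkIds: string[],
  recordedHash: string,
): boolean {
  return sha256(summary + JSON.stringify(chunkIds)) === recordedHash;
}
```

Any edit to the summary text or to the chunk list changes the digest, so a match proves both are exactly as generated.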
What the diagram shows
Imagine a two-column flow. On the left, a stream of raw inputs: a Reddit thread, a regulatory PDF, a survey export. Each is processed through the Ingest and Index stages and stored as vector embeddings with source metadata. On the right, a research query enters the Retrieve stage, pulls a bounded set of labelled chunks, and passes them into a constrained inference call. The output flows down into two parallel boxes: the Research Summary delivered to the analyst, and the Audit Record written to the database. A dotted line connects every claim in the summary to a chunk ID in the audit record. Nothing in the summary exists without a corresponding line.
This is not a diagram about AI. It is a diagram about accountability.
Why temperature=0 is not enough on its own
Setting temperature to zero eliminates sampling variance but does not prevent the model from drawing on its parametric knowledge: the information baked into its weights during training. A zero-temperature call to an unconstrained model will still produce confident, unreferenced claims.
The system prompt is the second constraint. Rather than asking the model to prefer the sources, it instructs it to use only the provided sources and to mark every claim with a citation index. Combined with a retrieval layer that supplies only approved documents, this creates a two-lock system: the model cannot invent, and the context cannot contain unapproved material.
The third constraint is evaluation. Every output is passed through a citation-check function that verifies each [SOURCE N] reference maps to an actual retrieved chunk. Outputs that fail the check are flagged for human review rather than delivered.
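The core of that citation check is mechanical and needs no model call. The helper below is a minimal sketch, not the production evaluator: it assumes zero-based [SOURCE N] labels, matching how the context block is built in the service code, and flags any citation index with no corresponding retrieved chunk.

```typescript
// Verify that every [SOURCE N] citation in a summary refers to a chunk that
// was actually retrieved. Hypothetical helper; indices are zero-based, so a
// valid index must be strictly less than the number of retrieved chunks.
function checkCitations(
  summary: string,
  retrievedChunkCount: number,
): { valid: boolean; danglingIndexes: number[] } {
  const cited = [...summary.matchAll(/\[SOURCE (\d+)\]/g)].map(m => Number(m[1]));
  const danglingIndexes = cited.filter(n => n >= retrievedChunkCount);
  return { valid: danglingIndexes.length === 0, danglingIndexes };
}
```

A summary citing [SOURCE 9] when only eight chunks were retrieved fails the check and is routed to human review instead of the client.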
The business case is not about AI
The deterministic RAG architecture described here is not primarily an AI story. It is a data quality story. The clients who care about it are not asking “how smart is your AI?” They are asking “can you show me exactly where this finding came from, and can you prove the data hasn’t changed since you retrieved it?”
Those are questions that generative AI, by design, cannot answer. They are questions that a properly engineered retrieval pipeline, with audit trails and cryptographic output hashing, can answer every time.
That is the difference between building with AI and building a reliable research system.