Why we built Forseti on source-anchored RAG rather than generative AI

Every architectural decision in Forseti traces back to one requirement: every claim must be verifiable against a specific official source. Here is why that requirement ruled out generative AI and made source-anchored RAG the only viable foundation.

The decision that shaped everything else

When we started building Forseti, the first question was not which AI model to use. It was what the output needed to be able to prove.

A compliance professional acting on a Forseti alert is making a decision that may affect their firm’s regulatory standing. If the alert is wrong, the consequences are not a wasted afternoon. They are a missed obligation, a gap in preparation, or a compliance position built on an incorrect reading of the regulation. The professional who acted on the alert is accountable for that decision. Forseti is not.

That asymmetry shaped every architectural choice that followed. The system needed to produce output that the professional could verify independently, not output they had to accept on trust. That requirement ruled out generative AI as the primary architecture and made source-anchored retrieval-augmented generation (RAG) the only viable foundation.

This article explains that decision in terms that do not require a technical background. The engineering detail is covered elsewhere. What matters here is the reasoning, because the reasoning is the same logic that should inform how any compliance professional evaluates any AI tool they are considering using for regulatory research.

What generative AI cannot provide

A generative AI tool produces output by sampling from patterns learned during training. It does not retrieve documents and summarise them. It generates text that is consistent with the statistical patterns in its training data. The result is output that reads as authoritative whether or not the underlying information is accurate, current, or correctly scoped.

For compliance research, this creates a specific problem that goes beyond the general concern about AI hallucination. EU financial regulation is precise in ways that matter enormously. The difference between a mandatory requirement and one subject to national competent authority discretion determines whether a firm needs to rebuild a system. The difference between a provision that applies to a specific authorisation type and one that applies generally determines whether a firm is in scope at all. The difference between a finalised regulatory technical standard and a consultation draft determines whether the obligation described actually exists yet.

A generative model cannot reliably make these distinctions because it has no mechanism for knowing which documents it is drawing on, when those documents were published, or whether they represent the final legislative text or an earlier draft. It produces its best approximation of what a correct answer would look like. For compliance use, that is not a useful standard.

The second problem is the knowledge cutoff. Every generative model has a training cutoff, a date beyond which it has no information. EU financial regulation has continued to develop through 2025 and into 2026. MiCA’s implementing regulations, DORA’s supervisory guidance, and the AI Act’s financial services provisions have all evolved in ways that postdate any plausible training cutoff for current-generation models. A tool drawing on a training snapshot from late 2024 is already materially behind the current regulatory position.

The third problem is accountability. When a compliance professional needs to demonstrate the basis for a decision, “the AI said so” is not a defensible answer. The standard that applies is the same one that has always applied: here is the source, here is the provision, here is my reading of it. A system that cannot support that standard is not fit for compliance use, regardless of how fluent its output is.

What source-anchored RAG provides instead

Source-anchored RAG works differently at every stage. The distinction is not one of degree. It is architectural.

Forseti ingests regulatory documents directly from official EU sources. Every document is retrieved, not generated. Every document carries its CELEX identifier, its publication date, and its position in the legislative hierarchy: whether it is a regulation, a directive, a delegated act, or an implementing act. The system knows what it has and what it does not have, because its knowledge is defined by what it has retrieved, not by what patterns exist in a training corpus.

When Forseti produces an alert, the AI layer operates on that retrieved, source-linked content. It is not meant to reach outside the retrieved documents: the system prompt that governs the inference call instructs it to draw only on the provided sources and to cite each claim with a reference to the specific document it comes from. The temperature is set to zero, which removes the sampling step that would otherwise produce different outputs from the same input on different runs.

The result is output in which every claim traces back to a specific document with a specific CELEX identifier. The compliance professional reading the alert can verify each claim against the original. They can see which version of the regulation Forseti is drawing on. They can check whether the provision cited says what Forseti says it says. That is the standard that compliance use requires, and it is only achievable through retrieval from verified official sources, not through generative inference from training data.
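For readers who want to see the idea in concrete form, here is a minimal sketch of a citation check. It is an illustration under stated assumptions, not Forseti's implementation: the `SourceDocument` record, the `[CELEX:...]` citation convention, and the `unsupported_claims` function are all hypothetical names, and the example claims are placeholders. The sketch flags any claim that cites nothing, or that cites a document outside the retrieved set.

```python
import re
from dataclasses import dataclass

# Hypothetical record for a retrieved document; field names are illustrative.
@dataclass(frozen=True)
class SourceDocument:
    celex: str        # CELEX identifier of the official document
    instrument: str   # e.g. "regulation", "directive", "delegated act"

# Assumed citation convention for this sketch: "[CELEX:<identifier>]".
CITATION = re.compile(r"\[CELEX:([0-9A-Z()]+)\]")

def unsupported_claims(claims: list[str],
                       retrieved: dict[str, SourceDocument]) -> list[str]:
    """Return claims that cite nothing, or cite a document that was not retrieved."""
    flagged = []
    for claim in claims:
        cited = CITATION.findall(claim)
        if not cited or any(celex not in retrieved for celex in cited):
            flagged.append(claim)
    return flagged

# Only one document has been retrieved in this example.
retrieved = {"32023R1114": SourceDocument("32023R1114", "regulation")}

claims = [
    "Placeholder claim about CASP authorisation. [CELEX:32023R1114]",
    "Placeholder claim with no citation at all.",
    "Placeholder claim citing a document outside the retrieved set. [CELEX:32022R2554]",
]

for claim in unsupported_claims(claims, retrieved):
    print("Needs review:", claim)
```

A check of this kind is deterministic, so it can sit alongside the prompt instructions as a second line of defence rather than relying on them alone.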

Why EUR-Lex and CELEX matter

The choice of source is as important as the choice of architecture. Forseti is built on EUR-Lex, the official database of EU law, and indexes every document with its CELEX identifier.

CELEX identifiers are the unique reference system for EU legal instruments. Every regulation, directive, delegated act, and implementing act has one. A CELEX identifier tells you the sector, the year, the document type, and the sequential number within that type. It is the standard citation format used by EU institutions, national courts, and legal practitioners across the Union.
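As a rough illustration, the four components can be pulled out of a typical identifier with a few lines of Python. This is a simplified sketch: the pattern below assumes the common form of one sector character, a four-digit year, a one or two letter type descriptor, and a four-digit number. It does not handle consolidated versions, corrigenda, or other variants, and the names `CelexParts` and `parse_celex` are made up for this example.

```python
import re
from typing import NamedTuple, Optional

class CelexParts(NamedTuple):
    sector: str     # "3" is the legislation sector
    year: int
    doc_type: str   # e.g. "R" regulation, "L" directive, "D" decision
    number: str     # sequential number, kept as text to preserve leading zeros

# Simplified pattern for the common identifier form (an assumption of this sketch).
CELEX_PATTERN = re.compile(
    r"^(?P<sector>[0-9A-Z])(?P<year>\d{4})(?P<doc_type>[A-Z]{1,2})(?P<number>\d{4})$"
)

def parse_celex(celex: str) -> Optional[CelexParts]:
    """Split a CELEX identifier into its parts, or return None if it does not match."""
    match = CELEX_PATTERN.match(celex)
    if match is None:
        return None
    return CelexParts(
        sector=match["sector"],
        year=int(match["year"]),
        doc_type=match["doc_type"],
        number=match["number"],
    )

# 32023R1114 is the CELEX identifier for MiCA, Regulation (EU) 2023/1114.
print(parse_celex("32023R1114"))
# CelexParts(sector='3', year=2023, doc_type='R', number='1114')
```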

When Forseti cites a provision, it cites it with a CELEX identifier. That identifier is retrievable. Anyone can go to EUR-Lex, enter the CELEX number, and read the document Forseti is drawing on. The claim is not just sourced. It is independently verifiable by anyone with an internet connection.
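The lookup can also be expressed as a link. The sketch below builds a EUR-Lex address from a CELEX number using the URL form EUR-Lex commonly uses for document pages at the time of writing. Treat that form as an assumption that may change; the helper name `eurlex_url` is hypothetical.

```python
from urllib.parse import quote

def eurlex_url(celex: str, language: str = "EN") -> str:
    """Build a EUR-Lex document link from a CELEX number (URL form is an assumption)."""
    # quote() percent-encodes characters such as parentheses in corrigendum suffixes.
    return (
        f"https://eur-lex.europa.eu/legal-content/{language}/TXT/"
        f"?uri=CELEX:{quote(celex)}"
    )

print(eurlex_url("32023R1114"))
```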

This matters beyond the narrow question of verification. A CELEX citation signals that the information comes from the authoritative source, not from a summary of a summary of a commentary on a draft. For an explanation of how CELEX identifiers work and what they tell you about a document, see how to read a CELEX number.

The continuous monitoring requirement

Source anchoring solves the accuracy problem for documents that have been retrieved. It does not solve the currency problem unless the retrieval is continuous.

EU financial regulation is not published on a schedule that suits periodic updates. Regulatory technical standards are finalised when they are finalised. Supervisory guidance is issued when supervisory authorities issue it. Implementing acts appear when the Commission adopts them. A system that ingests from official sources on a weekly or monthly cycle will miss developments that occur between cycles, and in an active regulatory period, the gap between cycles can contain material changes.

Forseti monitors EUR-Lex and the publications of the European Supervisory Authorities continuously. When a new document is published, it is ingested, indexed, and made available to the retrieval layer. The alert a compliance professional receives reflects the current state of the official record, not a snapshot from the last update cycle.
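In outline, each monitoring pass compares what the official sources currently list against what has already been indexed, and ingests only what is new. The sketch below shows that step in isolation. It is a simplified illustration, not Forseti's pipeline: the `ingest_new` function, the dictionary-based index, and the `celex` field are assumptions, and fetching from the sources is left out entirely.

```python
def ingest_new(published: list[dict], index: dict[str, dict]) -> list[str]:
    """Add documents not yet in the index and return their CELEX identifiers.

    `published` stands in for whatever a monitoring pass fetched from the
    official sources; `index` is keyed by CELEX identifier.
    """
    newly_ingested = []
    for document in published:
        celex = document["celex"]
        if celex not in index:
            index[celex] = document
            newly_ingested.append(celex)
    return newly_ingested

index: dict[str, dict] = {}

# First pass: everything is new.
first_pass = ingest_new([{"celex": "32023R1114"}], index)

# Second pass: only the document not seen before is ingested.
second_pass = ingest_new(
    [{"celex": "32023R1114"}, {"celex": "32022R2554"}], index
)
print(first_pass, second_pass)
```

The shorter the interval between passes, the smaller the window in which a published document exists but has not yet reached the retrieval layer.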

This is what distinguishes horizon scanning from compliance management. Compliance management proves that a firm meets current obligations. Horizon scanning tracks what is coming before it becomes a current obligation. The lead time that makes proactive preparation possible only exists if the monitoring is continuous and the source anchoring is current. For a fuller treatment of this distinction, see what is regulatory horizon scanning and why compliance teams need it.

The personalisation layer

Source-anchored retrieval from a comprehensive official corpus produces a large volume of relevant documents. The value Forseti adds above that retrieval layer is relevance filtering calibrated to a specific firm profile.

Not every regulatory development is relevant to every firm. A CASP authorisation requirement under MiCA is relevant to a crypto exchange and not to a fund manager. A liquidity management tool requirement under AIFMD II is relevant to an alternative investment fund and not to a payment institution. Surfacing every development to every user without filtering creates noise that reduces the value of the signal.

Forseti’s personalisation layer uses the firm profile, including authorisation type, business model, and jurisdictional scope, to filter the retrieved documents and prioritise the developments that carry genuine compliance implications for that specific firm. The AI layer then produces impact analysis calibrated to that profile: not what the regulation requires in general, but what it requires of a firm like yours.

The personalisation layer operates on the same retrieved, source-linked documents as the rest of the system. It does not introduce a separate generative step that could produce claims untethered from official sources. The output is personalised, but the sourcing discipline is maintained throughout.
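The filtering step can be pictured as follows. This is a deliberately small sketch under stated assumptions: it filters on authorisation type only, whereas the profile described above also covers business model and jurisdictional scope. The `FirmProfile` and `TaggedDocument` types, the `applies_to` tags, and the document labels are hypothetical and chosen to mirror the MiCA and AIFMD II examples.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FirmProfile:
    authorisation_types: frozenset[str]

@dataclass(frozen=True)
class TaggedDocument:
    label: str                   # illustrative label, not a real identifier
    applies_to: frozenset[str]   # authorisation types in scope; empty means general

def relevant_documents(documents: list[TaggedDocument],
                       profile: FirmProfile) -> list[TaggedDocument]:
    """Keep documents that apply generally or overlap with the firm's authorisations."""
    return [
        doc for doc in documents
        if not doc.applies_to or doc.applies_to & profile.authorisation_types
    ]

documents = [
    TaggedDocument("casp-authorisation-requirement", frozenset({"CASP"})),
    TaggedDocument("liquidity-management-tools", frozenset({"AIFM"})),
    TaggedDocument("generally-applicable-guidance", frozenset()),
]

crypto_exchange = FirmProfile(authorisation_types=frozenset({"CASP"}))

for doc in relevant_documents(documents, crypto_exchange):
    print(doc.label)
```

Because the filter only selects among documents that were already retrieved, it narrows what the professional sees without adding any claim that lacks a source.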

What this means for a compliance professional

The practical implication of this architecture is that Forseti alerts are a starting point for informed professional judgment, not a substitute for it.

The alert tells you a development has occurred, what it contains, and what its likely implications are for your firm. It cites the specific source documents. You can verify the claims. You can read the original. You can apply your own judgment about how the provision interacts with your specific situation, your existing controls, and the guidance your national competent authority has issued.

That is the appropriate relationship between a regulatory intelligence system and the professional using it. The system handles the monitoring burden: continuous coverage of a large and fast-moving regulatory corpus, relevance filtering against your profile, plain-language synthesis of complex legislative text. The professional handles the judgment: whether the implication identified applies in their specific circumstances, what action is warranted, and how to document the decision.

A system that tried to make that judgment for the professional would be overstepping what any automated system can do reliably in a compliance context. A system that left the monitoring burden entirely with the professional would be failing to address the problem that makes regulatory intelligence valuable in the first place. The architecture is designed to do the monitoring work reliably, so the professional can focus on the judgment.

Forseti monitors EU financial regulation continuously and delivers personalised impact analysis anchored to verified official sources. Start for free.

For the engineering architecture behind this approach, see why deterministic RAG beats generative AI for research. For the case against generic AI tools in compliance research, see why generic AI tools are unreliable for regulatory compliance research. For an overview of what Forseti does and who it is for, see introducing Forseti.
