Architecture Deep Dive — Seven Layers, Zero Shortcuts

Design principles

Ri.NET was not architected to win benchmarks. It was architected to not lie. Every design choice in the following seven layers flows from one invariant: no layer accepts as input what the layer below cannot prove. The consequence is a system that is simultaneously slower than alternatives at the lowest level and faster than alternatives at the highest, because nothing above the ingestion substrate re-validates what the substrate already guaranteed.

Five design principles shape the architecture:

I. Provenance over performance. Every output carries a verifiable chain of custody. When a query returns an answer, the layers below can reconstruct the fifteen documents, three transforms, and two entity resolutions that produced it. This costs roughly 12% in raw throughput. It is the reason the system is usable in regulatory contexts.
II. Temporal by default, timeless never. Nothing in the graph is "true now." Everything is "true during interval [t1, t2]." A query about 2019 returns the 2019 world. A query without a time anchor returns the present world but explicitly labels itself as temporally unconstrained. The absence of a temporal qualifier is itself a qualifier.
III. Resolution is reversible. When the entity engine merges two records into a single canonical identity, it retains the merge proof. If new evidence contradicts the merge, the engine unmerges, recomputes dependents, and rebuilds affected graph slices. The system does not accumulate identity errors.
IV. Agents execute, humans decide. Six hundred and sixty-six autonomous agents operate the platform. Not one of them makes a judgment call. They ingest, they normalize, they flag, they escalate. Decisions remain human. This is a deliberate architectural choice, not a limitation of current AI capability.
V. Zero-trust internal fabric. No service inside Ri.NET trusts any other service by default. Every internal call carries a cryptographic assertion of origin, scope, and intent. Lateral movement is architecturally impossible, not merely discouraged. The cost is approximately 8% end-to-end latency overhead. The benefit is that a compromise of any single layer does not propagate.

System topology

The full computational topology, rendered below, shows every layer, every data store, every agent class, and the critical pathways between them. Solid arrows indicate primary data flow. Dashed arrows indicate audit and telemetry feedback.

Ri.NET system topology — data flow (solid) and audit feedback (dashed)

01

Ingestion Substrate

Bottom of the stack

The ingestion layer runs continuously against more than forty sovereign data sources. It is the only layer in Ri.NET that touches the outside world. Everything above it assumes the outside world is, for practical purposes, a lie — and operates accordingly.

Each source is owned by a dedicated collector agent. The collector knows the source's API contract, pagination quirks, rate limits, authentication scheme, encoding anomalies, and failure modes. It does not assume stability. It assumes that any request can fail, any response can be malformed, and any source can silently change its schema on a Tuesday morning. The system survives all three.

Metric	Value	Detail
Active sources	40+	sovereign endpoints
Records/day	~180K	validated + persisted
Peak throughput	14.8K	records per minute
Retry policy	3 × exp	then human escalation
Source change detection	< 1h	schema drift alarm
Provenance anchoring	100%	every ingested row

Ingestion workflow

1. Source polling or webhook subscription. Each collector runs on its own schedule. High-frequency sources (news, procurement announcements) are webhook-driven. Low-frequency sources (statute amendments, annual financial filings) run on cron.
2. Raw persistence before parsing. The raw response is persisted before any parsing attempt. If parsing fails, the raw bytes remain available for inspection and retry. This is non-negotiable.
3. Source-specific parser. A collector-owned parser transforms the raw response into a canonical ingestion event. Parsers are individually tested, versioned, and rollbackable.
4. Checksum + provenance record. Every ingestion event receives a deterministic checksum, a source URL, a timestamp, a parser version, and a collector identity. This bundle is the provenance anchor.
5. Handoff to Normalization (Layer 02). Validated events are pushed onto a bounded queue consumed by the normalization layer. If Layer 02 is slow, ingestion backpressures rather than dropping. Nothing is ever lost silently.
6. Failure escalation. If a collector fails three retries with exponential backoff, the failure is logged, a Telegram alert fires, and the source is marked as degraded. A human decides whether to patch the parser or wait for the source to recover.
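Steps 2 to 5 of the workflow can be sketched as follows. This is a minimal illustration under stated assumptions, not Ri.NET's actual code: the names `RAW_STORE`, `HANDOFF`, `parse_notice`, and `ingest`, the queue size, and the `case_number;court` wire format are all hypothetical, and SHA-256 is assumed as the deterministic checksum.

```python
import hashlib
import queue
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceAnchor:
    checksum: str        # deterministic SHA-256 of the raw bytes (assumed algorithm)
    source_url: str
    fetched_at: datetime
    parser_version: str
    collector_id: str


@dataclass(frozen=True)
class IngestionEvent:
    payload: dict
    provenance: ProvenanceAnchor


RAW_STORE: dict = {}                      # stand-in for durable raw storage
HANDOFF: queue.Queue = queue.Queue(maxsize=1000)  # bounded handoff to Layer 02


def parse_notice(raw: bytes) -> dict:
    """Hypothetical source-specific parser expecting 'case_number;court'."""
    case_number, court = raw.decode("utf-8").split(";")
    return {"case_number": case_number, "court": court}


def ingest(raw: bytes, source_url: str, collector_id: str) -> IngestionEvent:
    checksum = hashlib.sha256(raw).hexdigest()
    RAW_STORE[checksum] = raw             # step 2: persist before any parsing
    payload = parse_notice(raw)           # step 3: may raise; raw bytes are already safe
    anchor = ProvenanceAnchor(
        checksum=checksum,
        source_url=source_url,
        fetched_at=datetime.now(timezone.utc),
        parser_version="parse_notice/1.0.0",
        collector_id=collector_id,
    )                                     # step 4: the provenance bundle
    event = IngestionEvent(payload, anchor)
    HANDOFF.put(event)                    # step 5: blocks when the queue is full
    return event
```

Because `put` blocks on a full queue instead of discarding, a slow consumer slows the collector down, which is the backpressure behaviour described in step 5.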

Technical invariant: A collector that cannot parse a response does not guess. It persists the raw bytes, flags the failure with full context, and stops. Silent data corruption at the ingestion layer is the single most expensive class of bug in data platforms. Ri.NET architecturally cannot produce it.
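The retry-then-escalate policy behind this invariant might look like the sketch below. It assumes one initial attempt followed by three retries with delays doubling from a one-second base; the function name, the `SourceDegraded` exception, and the injectable `sleep` and `alert` callables are illustrative, and the alert channel is left abstract.

```python
import time
from typing import Callable, Optional


class SourceDegraded(Exception):
    """Raised once the retry budget is spent; a human takes over from here."""


def fetch_with_backoff(
    fetch: Callable[[], bytes],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    alert: Callable[[str], None] = print,
) -> bytes:
    last_error: Optional[Exception] = None
    for attempt in range(retries + 1):    # one initial attempt plus `retries` retries
        try:
            return fetch()
        except Exception as exc:          # any failure counts; nothing is guessed
            last_error = exc
            if attempt < retries:
                sleep(base_delay * 2 ** attempt)   # 1s, 2s, 4s with the defaults
    alert(f"collector degraded after {retries} retries: {last_error!r}")
    raise SourceDegraded(str(last_error)) from last_error
```

The function never returns a partial or synthesized result: it either hands back the bytes it fetched or stops with the full error attached.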

02

Normalization & Ontology

Noise to substance

Raw records are unusable. A procurement contract from 2018 reports values in kuna. The same contract referenced from a 2024 statute uses euro. A corporate name appears as "d.o.o.," "d. o. o.," "DOO," or omitted entirely. Dates arrive as ISO-8601, US format, European format, and — in one memorable source — as Croatian text ("petnaesti svibnja dvije tisuće dvadeset i druge"). This layer harmonizes all of it.

Normalization operates through forty-seven independent transformation modules. Each module owns one narrow class of transformation: currency conversion (including the 2023 Croatian HRK → EUR transition, applied retroactively across all pre-2023 financial records), company name canonicalization, date parsing, address geocoding, OIB checksum validation, and encoding repair for historical records stored in CP1250 before UTF-8 became standard. Modules compose. The pipeline is explicit.
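To make "modules compose" concrete, here is a small sketch of two such modules and an explicit pipeline. The record layout and the names `hrk_to_eur`, `validate_oib`, and `compose` are assumptions for illustration. The conversion uses the official fixed rate of 7.53450 HRK per EUR, and the OIB check uses the ISO 7064 MOD 11,10 control digit; rounding to cents with half-up is an assumed convention.

```python
from decimal import Decimal, ROUND_HALF_UP
from functools import reduce
from typing import Callable

Record = dict
Module = Callable[[Record], Record]

HRK_PER_EUR = Decimal("7.53450")  # fixed conversion rate for the 2023 changeover


def hrk_to_eur(record: Record) -> Record:
    """Convert HRK amounts to EUR, keeping the original value alongside."""
    if record.get("currency") != "HRK":
        return record
    eur = (Decimal(record["amount"]) / HRK_PER_EUR).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    return {**record, "amount": str(eur), "currency": "EUR",
            "original_amount_hrk": record["amount"]}


def oib_is_valid(oib: str) -> bool:
    """ISO 7064 MOD 11,10 check over an 11-digit OIB."""
    if len(oib) != 11 or not (oib.isascii() and oib.isdigit()):
        return False
    remainder = 10
    for digit in oib[:10]:
        remainder = (remainder + int(digit)) % 10 or 10
        remainder = (remainder * 2) % 11
    return (11 - remainder) % 10 == int(oib[10])


def validate_oib(record: Record) -> Record:
    oib = record.get("oib")
    return {**record, "oib_valid": oib is not None and oib_is_valid(oib)}


def compose(*modules: Module) -> Module:
    """Apply modules left to right; the order is the pipeline definition."""
    return lambda record: reduce(lambda acc, module: module(acc), modules, record)


pipeline = compose(hrk_to_eur, validate_oib)
```

Each module takes a record and returns a new one, so modules can be tested, versioned, and reordered independently.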

Metric	Value	Detail
Transform modules	47	independent units
Pre-2023 HRK→EUR	100%	1.86M records
Encoding repaired	15+ tables	CP1250 → UTF-8
Schema drift events caught	340+	in Q1 2026 alone

The ontology discipline

Ontology in Ri.NET is not imposed. It is discovered. Entity types, relationship types, and constraints are derived from the actual statistical distribution of ingested data, then validated against annotations from domain experts, then frozen as the canonical schema. Nothing moves into the graph until the ontology has mapped it to a type the query layer already understands.

Operational note: When a new source produces a field that does not fit any existing ontology slot, normalization does not invent a new slot. It flags the field for human review. Schema evolution is deliberate. Schema sprawl is not.
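A sketch of that rule, assuming a flat mapping from source field names to ontology slots. The slot names, `ONTOLOGY_SLOTS`, and `map_to_ontology` are hypothetical; the point is only that an unknown field lands in a review queue rather than in a newly invented slot.

```python
from dataclasses import dataclass, field

# Hypothetical frozen schema: source field name -> ontology slot.
ONTOLOGY_SLOTS = {
    "name": "Company.legal_name",
    "oib": "Company.oib",
    "seat": "Company.registered_office",
}


@dataclass
class MappingResult:
    mapped: dict = field(default_factory=dict)
    review_queue: list = field(default_factory=list)


def map_to_ontology(source: str, record: dict) -> MappingResult:
    result = MappingResult()
    for key, value in record.items():
        slot = ONTOLOGY_SLOTS.get(key)
        if slot is None:
            # No slot fits: flag for human review, never create one on the fly.
            result.review_queue.append({"source": source, "field": key, "sample": value})
        else:
            result.mapped[slot] = value
    return result
```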

03

Entity Resolution Engine

Reversible identity

The same company appears in a procurement record as "Tvornica XYZ d.o.o.," in a court register as "TVORNICA XYZ D.O.O.," in a financial statement as "Tvornica XYZ," in a news article as "Tvornica," in an older ownership disclosure as "Tvornica Y. i sinovi (prije: Tvornica XYZ)," and in three places where the OIB is misspelled by one digit. Entity resolution decides whether these six references describe one entity, two entities, or something in between.

Metric	Value	Detail
Resolved entities	1.8M	companies + institutions
OIB-verified	92%	deterministic chain
Probabilistic resolution	96.4%	precision @ 0.85 threshold
Human-review flagged	~3.6%	below confidence threshold
Unmerge events	tracked	with full dependency rebuild
Avg aliases per entity	3.2	across all sources

Resolution signal hierarchy

D.1 Deterministic identifier (hardest signal). OIB, MB, VAT, passport, ISIN. Match → merge with confidence 1.0. Mismatch → refuse to merge regardless of other signals. No probabilistic overrides for hard identifiers.
D.2 Derivative identifier. Email domain, website, registered office address (street-level match). Weight: 0.75 when combined with D.3.
P.1 Name similarity (string level). Levenshtein + Jaro-Winkler + phonetic normalization. Suffix handling (d.o.o., j.d.o.o., d.d.) is non-distinguishing. Weight: 0.45 max, saturates quickly.
P.2 Temporal co-occurrence. Two records mentioned in the same document on the same date with similar names: strong signal. Weight: 0.30 per co-occurrence, capped at 0.65 total.
P.3 Structural co-occurrence. Shared board members, shared registered address, shared bank account (when disclosed). Weight: 0.20–0.40 depending on signal density.
P.4 Embedding-space proximity. Vector similarity in the semantic cortex over company descriptions, articles mentioning the entity, and declared activity codes. Weight: 0.15, used only as tiebreaker.
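One way to read the hierarchy as code is sketched below. Several things here are assumptions rather than documented behaviour: the probabilistic weights are combined with a noisy-OR, the structural weight grows by 0.10 per shared link between 0.20 and 0.40, the embedding signal only applies inside a hypothetical 0.05 band just under the 0.85 threshold, and the derivative-identifier signal is left out because its combination rule is not fully specified in this section.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

MERGE_THRESHOLD = 0.85   # threshold quoted with the precision figure
TIE_BAND = 0.05          # hypothetical width of the "tie" zone below the threshold


@dataclass
class Signals:
    hard_id_match: Optional[bool] = None  # None when no hard identifier is comparable
    name_similarity: float = 0.0          # 0..1 after suffix stripping
    temporal_cooccurrences: int = 0
    structural_links: int = 0             # shared board members, addresses, accounts
    embedding_similarity: float = 0.0     # 0..1


def noisy_or(weights: Iterable[float]) -> float:
    miss = 1.0
    for weight in weights:
        miss *= 1.0 - weight
    return 1.0 - miss


def merge_confidence(s: Signals) -> float:
    if s.hard_id_match is True:
        return 1.0                        # D.1 match: merge outright
    if s.hard_id_match is False:
        return 0.0                        # D.1 mismatch: no probabilistic override
    name = 0.45 * s.name_similarity                              # P.1, max 0.45
    temporal = min(0.30 * s.temporal_cooccurrences, 0.65)        # P.2, capped
    structural = 0.0 if s.structural_links == 0 else min(
        0.20 + 0.10 * (s.structural_links - 1), 0.40)            # P.3, 0.20 to 0.40
    score = noisy_or([name, temporal, structural])
    if MERGE_THRESHOLD - TIE_BAND <= score < MERGE_THRESHOLD:
        score = noisy_or([score, 0.15 * s.embedding_similarity])  # P.4, tiebreak only
    return score
```

Pairs scoring below `MERGE_THRESHOLD` would go to human review rather than being merged.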

Reversibility guarantee: Every merge retains its proof trail. If new evidence contradicts a merge, the engine produces an unmerge operation, recomputes all downstream dependents (graph edges touching the merged node, vector embeddings that referenced the canonical form, query caches), and rebuilds affected slices. The system does not silently accumulate identity errors. It corrects them.
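A minimal sketch of a merge that keeps its proof and can be undone. The `IdentityRegistry` and `MergeProof` names are hypothetical, merges are single-level only (no chains of merges), and "dependents" is simplified to a set of artefact names that a real system would then rebuild.

```python
from dataclasses import dataclass, field


@dataclass
class MergeProof:
    merged_id: str
    canonical_id: str
    evidence: list


@dataclass
class IdentityRegistry:
    canonical: dict = field(default_factory=dict)    # record id -> canonical id
    proofs: dict = field(default_factory=dict)       # merged record id -> MergeProof
    dependents: dict = field(default_factory=dict)   # canonical id -> artefact names

    def resolve(self, record_id: str) -> str:
        return self.canonical.get(record_id, record_id)

    def merge(self, record_id: str, into: str, evidence: list) -> None:
        self.canonical[record_id] = into
        self.proofs[record_id] = MergeProof(record_id, into, list(evidence))

    def unmerge(self, record_id: str) -> set:
        """Undo a merge and return the artefacts that must be recomputed."""
        proof = self.proofs.pop(record_id)
        del self.canonical[record_id]
        return set(self.dependents.get(proof.canonical_id, set()))
```

The caller is expected to rebuild everything `unmerge` returns, such as graph edges, embeddings, and any derived query state that referenced the canonical form.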

04

Ri.NET Vector Cortex

Semantic reasoning substrate

The cortex is not a retrieval index. It is a reasoning substrate. Retrieval is the first of several operations that happen at this layer before an answer leaves. The downstream layers (Graph Fabric, Agent Swarm, Interface) never see raw retrieval results. They see reconciled retrieval results: passages bound to resolved entities, reranked by domain-specific relevance, cross-validated against the temporal graph, and assembled into citation bundles.

Metric	Value	Detail
Collections	7	domain-adapted
Indexed vectors	6.87M	1024-dim dense
Languages	47	multilingual model
Query latency	94ms	p50, full pipeline
RAGAS	3.7/5	6,400 eval pairs
Hallucination rate	0.3%	verified sampling
Citation accuracy	94.1%	correct article cited
Reindex cycle	nightly	incremental, zero downtime

The seven collections

Collection	Domain	Vectors
Legal	Statutes, regulations, ordinances, amendments	307,329
Corporate Knowledge	Companies, institutions, roles, activities	4,690,000
Entities (v2)	Resolved entity descriptions + aliases	339,000
Procurement	Tender announcements, bids, awards, contracts	273,118
Press & News	Journalistic coverage, sentiment-tagged	847,000
Court & Compliance	Court notices, judgments, anomaly findings	312,000
Conversational Memory	DABI Q&A pairs, retrieval training set	10,664

Query pipeline inside the cortex

Cortex query pipeline — five operations, one round trip

What "reasoning substrate" means operationally: The cortex does not return "here are fifteen passages that look relevant." It returns "here are fifteen passages, each bound to the entity it describes, each cross-validated against the temporal graph to confirm the statement was true at the query's time anchor, each carrying a citation with document ID, page, paragraph, and extraction confidence." The distinction is the distinction between search and reasoning.
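The shape of such a reconciled result can be sketched with two small types and a filter. The field names and the `reconcile` function are assumptions; validity intervals are treated as half-open, and reranking is reduced to sorting by a precomputed relevance score.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Citation:
    document_id: str
    page: int
    paragraph: int
    extraction_confidence: float


@dataclass(frozen=True)
class Passage:
    text: str
    entity_id: Optional[str]   # None until the passage is bound to a resolved entity
    valid_from: date
    valid_to: Optional[date]   # None means still valid
    relevance: float
    citation: Citation


def reconcile(candidates: List[Passage], anchor: date, limit: int = 15) -> List[Passage]:
    """Keep entity-bound passages that were true at the time anchor, best first."""
    kept = [
        p for p in candidates
        if p.entity_id is not None
        and p.valid_from <= anchor
        and (p.valid_to is None or anchor < p.valid_to)
    ]
    return sorted(kept, key=lambda p: p.relevance, reverse=True)[:limit]
```

A passage that is relevant but unbound, or that was not valid at the anchor date, never reaches the layers above.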

05

Temporal Graph Fabric

Time-first relationship store

Most graphs are timeless. They encode "A is connected to B." Ri.NET's graph encodes "A was connected to B from January 2019 until October 2022, then the connection type changed from 'board member' to 'advisor' until July 2024, at which point B exited A entirely." This changes everything about how queries work.

Metric	Value	Detail
Nodes	164,583	typed + temporal
Edges	891,247	all with intervals
Edge types	34	ontology-defined
p50 traversal	186ms	depth 3, time-bounded
p95 traversal	487ms	complex patterns
Time-range queries	native	no post-filtering

Temporal edge schema

A relationship between two nodes across six years — three distinct edges, all preserved

Query primitives

The graph exposes four primary query primitives. Each respects the temporal constraint by default.

Q1. As-of traversal. "Who was on the board of Company X on 15 March 2021?" The graph returns edges whose validity interval contained that date. Closed edges that were active on that date are included. Edges that did not yet exist are excluded.
Q2. Pattern match. "Find all cases where a director of company A is also a director of a winning bidder to a procurement by A, within the same calendar year." The graph matches structurally and temporally.
Q3. Shortest path with time bounds. "What is the shortest connection between Person X and Institution Y using only relationships that were active between 2020 and 2024?" Returns path + validity timeline for every edge.
Q4. Temporal diff. "What changed in Company X's beneficial ownership structure between 2022-01-01 and 2024-12-31?" Returns opened edges, closed edges, and type transitions in the interval.
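Two of these primitives, as-of traversal and temporal diff, can be illustrated over a plain list of interval edges. This is a toy in-memory sketch, not the graph engine: the `Edge` type, the half-open `[start, end)` convention, and the first-of-month dates in `EDGES` are assumptions layered on the board member to advisor example given earlier.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    kind: str
    start: date
    end: Optional[date]   # None means the edge is still open

    def active_on(self, day: date) -> bool:
        return self.start <= day and (self.end is None or day < self.end)


# B sits on A's board, then becomes an advisor, then exits (dates assumed).
EDGES = [
    Edge("B", "A", "board_member", date(2019, 1, 1), date(2022, 10, 1)),
    Edge("B", "A", "advisor", date(2022, 10, 1), date(2024, 7, 1)),
]


def as_of(edges: List[Edge], node: str, day: date) -> List[Edge]:
    """Q1: edges touching `node` whose validity interval contains `day`."""
    return [e for e in edges if node in (e.src, e.dst) and e.active_on(day)]


def temporal_diff(edges: List[Edge], node: str, start: date, end: date) -> dict:
    """Q4: edges opened or closed in [start, end], plus type transitions."""
    touching = [e for e in edges if node in (e.src, e.dst)]
    opened = [e for e in touching if start <= e.start <= end]
    closed = [e for e in touching if e.end is not None and start <= e.end <= end]
    transitions = [
        (c.kind, o.kind)
        for c in closed for o in opened
        if (c.src, c.dst) == (o.src, o.dst) and c.end == o.start and c.kind != o.kind
    ]
    return {"opened": opened, "closed": closed, "transitions": transitions}
```

Closed edges stay in the list, which is what lets a 2021 as-of query still see the board membership that ended in 2022.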

06

Agent Swarm Fabric

600+ autonomous workers

Six hundred and sixty-six specialized autonomous agents operate Ri.NET under a single central orchestrator. The hierarchy is strict: one orchestrator, five cognitive domains (Legal, Financial, Civic, Sentinel, Journalist), sixty mid-level coordinators, six hundred task-specific workers. The number is not symbolic. It is the measured steady-state fleet size under current operational load.

Metric	Value	Detail
Orchestrator	1	central router (DABI)
Cognitive domains	5	Legal/Fin/Civic/Sentinel/Jour.
Coordinators	60	mid-level routing
Workers	600	task-specific
Total steady-state	600+	active agents
Uptime	24/7	autonomous

Agent swarm hierarchy — hub-and-spoke with cognitive specialization

Why agents never decide: Every agent in Ri.NET has one job and a sharp boundary around that job. An ingestion agent ingests. A resolution agent resolves. An anomaly agent flags. None of them acts on findings. Findings route to humans with appropriate authority. This is not a limitation — it is an architectural commitment to a specific model of AI-assisted governance in which the AI finds, surfaces, cites, and explains, but the human decides. Anyone who promises you otherwise is selling you liability.
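The boundary can be made visible in an agent's interface: it has a method for observing and flagging, and no method for acting. The sketch below is illustrative only; `AnomalyAgent`, `Finding`, the detector callable, and the use of a queue as the route to a human reviewer are all assumptions.

```python
import queue
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Finding:
    agent_id: str
    summary: str
    evidence: tuple
    score: float


class AnomalyAgent:
    """Flags events above a threshold and routes them to a human queue."""

    def __init__(self, agent_id: str, detector: Callable[[dict], float],
                 threshold: float, human_queue: queue.Queue) -> None:
        self.agent_id = agent_id
        self.detector = detector
        self.threshold = threshold
        self.human_queue = human_queue

    def observe(self, event: dict) -> Optional[Finding]:
        score = self.detector(event)
        if score < self.threshold:
            return None
        finding = Finding(self.agent_id, f"pattern matched on {event['id']}",
                          (event["id"],), score)
        self.human_queue.put(finding)   # surfaced with evidence; no action is taken
        return finding
```

Anything that changes the world, such as blocking a payment or opening an inquiry, would live on the other side of `human_queue`.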

07

Interface & Orchestration

The only layer that speaks

Layer 07 is the only component of Ri.NET that external consumers — humans, API clients, frontends — ever touch directly. It routes queries, enforces authentication, applies rate limits, manages sessions, and presents the reasoning substrate beneath through three surfaces: the conversational interface (DABI), the structured REST API, and a set of domain-specific frontends.

Metric	Value	Detail
End-to-end p50	142ms	entity resolution
End-to-end p95	487ms	complex graph query
Uptime	99.94%	trailing 90 days
Surfaces	3	DABI / API / UI
Response caching	zero	staleness = correctness bug

No response caching: The interface layer caches nothing. Every response is freshly computed from the current state of the substrate. This is a deliberate tradeoff — civic intelligence staleness is a correctness bug, not a performance concern. A two-hour-old answer to "who currently owns this company" is not a slightly slower correct answer. It is a wrong answer.

End-to-end data flow

A single ingestion event (say, a new court notice published on e-Oglasna at 08:14) flows through the entire stack in a median of 2.3 seconds from source availability to query-ready state. The sequence:

T+0ms: Court notice published. Source webhook fires. Collector agent receives notification.
T+180ms: Raw fetch complete. Collector retrieves the document. Raw bytes persisted. Provenance record created.
T+420ms: Parse complete. Source-specific parser extracts: case number, parties (by name), court, filing date, notice type. Validated against source-specific schema.
T+680ms: Normalized. Dates harmonized, encoding repaired, party names canonicalized. Handoff to entity resolution.
T+1,100ms: Entities resolved. Each named party resolved to a canonical entity ID. Resolution confidence attached. Low-confidence matches flagged.
T+1,450ms: Cortex indexed. Document content embedded (1024-dim) and inserted into the Court & Compliance collection. Linked to resolved entities.
T+1,900ms: Graph updated. New edges added: (party_A → case_X), (party_B → case_X), with validity starting now. Temporal index updated.
T+2,300ms: Query-ready. The notice is now retrievable by semantic query, entity lookup, or graph traversal. Any query issued from T+2.3s onward sees the new data.
T+2,600ms: Agent scan triggered. Sentinel agents evaluate the new event against 127 anomaly patterns. If any pattern matches above threshold, an alert is queued for the appropriate human.
T+∞: Audit anchored. Every operation logged to the immutable audit trail. Nightly batch anchors the log root to Polygon PoS. The entire chain is verifiable by any third party.
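The last step relies on the audit trail being tamper-evident. A hash chain is one common way to get that property, and the sketch below assumes it: each entry's hash covers the previous hash, and the head of the chain serves as the "log root" that would be published nightly. How Ri.NET actually derives its root is not stated here (it could equally be a Merkle root), and the on-chain anchoring call is deliberately left out.

```python
import hashlib
import json

GENESIS = "0" * 64


def entry_hash(prev_hash: str, operation: dict) -> str:
    """Hash of a canonical JSON encoding of the operation, chained to its predecessor."""
    body = json.dumps(operation, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class AuditLog:
    def __init__(self) -> None:
        self.entries: list = []          # (operation, hash) pairs in append order

    def append(self, operation: dict) -> str:
        prev = self.entries[-1][1] if self.entries else GENESIS
        digest = entry_hash(prev, operation)
        self.entries.append((operation, digest))
        return digest

    def root(self) -> str:
        """Head of the chain; the value a nightly batch would anchor externally."""
        return self.entries[-1][1] if self.entries else GENESIS

    def verify(self, anchored_root: str) -> bool:
        """Recompute the chain and compare it with the externally anchored root."""
        prev = GENESIS
        for operation, digest in self.entries:
            if entry_hash(prev, operation) != digest:
                return False
            prev = digest
        return prev == anchored_root
```

A third party holding only the anchored root and a copy of the log can rerun `verify` and detect any edited, removed, or reordered entry.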

Query lifecycle

Consider the query: "Which companies won procurement contracts above €100K from KBC Rijeka in 2023, where at least one board member was also affiliated with a losing bidder?" The following illustrates the path this question takes through the system.

Compound query lifecycle — one natural-language question, eight layer hops, assembled response with full provenance

What this illustrates: A question a regulator would take two analysts three weeks to answer by hand becomes a 487-millisecond operation. Not because Ri.NET is magic. Because every layer in the stack already did the preparation work.
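Stripped of scale, the structural core of that query is a join between awards and time-bounded affiliations. The sketch below runs it over in-memory lists; the `Award` and `Affiliation` types, the choice to test affiliations on the award date, and treating "affiliated" as any recorded affiliation are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Award:
    buyer: str
    winner: str
    losers: Tuple[str, ...]
    value_eur: int
    awarded: date


@dataclass(frozen=True)
class Affiliation:
    person: str
    company: str
    start: date
    end: Optional[date] = None   # None means still active


def people_at(affiliations: List[Affiliation], company: str, day: date) -> set:
    """Persons affiliated with `company` on `day` (half-open intervals)."""
    return {a.person for a in affiliations
            if a.company == company and a.start <= day
            and (a.end is None or day < a.end)}


def flagged_awards(awards: List[Award], affiliations: List[Affiliation],
                   buyer: str, year: int, min_value_eur: int) -> list:
    hits = []
    for award in awards:
        if (award.buyer != buyer or award.awarded.year != year
                or award.value_eur <= min_value_eur):
            continue
        winner_people = people_at(affiliations, award.winner, award.awarded)
        for loser in award.losers:
            shared = winner_people & people_at(affiliations, loser, award.awarded)
            if shared:
                hits.append((award.winner, loser, sorted(shared)))
    return hits
```

In the real stack the same question would be answered by the graph's pattern-match primitive over already resolved entities; the sketch only shows what is being matched.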

Security topology

Ri.NET implements zero-trust as an architectural invariant, not a retrofit. No service trusts any other service. Every call carries a cryptographic assertion of origin, scope, and intent. Every operation is logged to an immutable audit trail. The audit trail is anchored nightly to a public blockchain.
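As a rough illustration of a per-call assertion of origin, scope, and intent, the sketch below signs the claims with an HMAC and checks them on receipt. This is a simplification under stated assumptions: a shared symmetric key stands in for whatever key material the platform actually uses (asymmetric signatures or mutual TLS would be more typical), and the function names, claim layout, and 30-second freshness window are hypothetical.

```python
import hashlib
import hmac
import json
import time
from typing import Optional


def _canonical(claims: dict) -> bytes:
    return json.dumps(claims, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_assertion(key: bytes, origin: str, scope: str, intent: str,
                   now: Optional[float] = None) -> dict:
    claims = {
        "origin": origin,
        "scope": scope,
        "intent": intent,
        "issued_at": int(time.time() if now is None else now),
    }
    mac = hmac.new(key, _canonical(claims), hashlib.sha256).hexdigest()
    return {"claims": claims, "mac": mac}


def verify_assertion(key: bytes, assertion: dict, required_scope: str,
                     max_age_s: int = 30, now: Optional[float] = None) -> bool:
    claims = assertion["claims"]
    expected = hmac.new(key, _canonical(claims), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, assertion["mac"]):
        return False                     # forged or altered claims
    current = time.time() if now is None else now
    fresh = 0 <= current - claims["issued_at"] <= max_age_s
    return fresh and claims["scope"] == required_scope
```

The receiving service checks the signature, the freshness, and the scope on every call, so a valid assertion for one scope cannot be replayed against another.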

Concentric security layers — each enforces independently, no single point of compromise

Read the full DPIA summary on the DPIA page. Enterprise and sovereign deployments receive the complete seventeen-page DPIA package as part of onboarding.

Seven layers. Measured to the microsecond.