Matrix Context · Alive System Essay

A Memory That Knows What It Remembers Typed, routed, inspectable context for agents

Most agent memory is one flat index retrieved by surface similarity. Matrix Context routes typed context experts before it retrieves — and shows you why.

Ruslan Magana Vsevolodovna

Machine Learning Engineer · Data Scientist · Physicist

10 min read 0:00 / 9:31

1 · The pile

Give an agent a memory and, almost always, what you have really given it is a pile. Every fact it has ever stored — a user's stated preference, a decision the team made last quarter, a paragraph lifted from a manual, a line of code, a policy it must never violate — goes into one index, embedded into the same vector space, flattened into the same neighbourhood of numbers. At question time the system reaches into that pile and pulls out whatever happens to sit nearest the query. It is simple, it demos beautifully, and it quietly discards the two things that matter most: the type of what was stored, and any account of why a particular item was chosen.

The cost shows up the moment the questions stop being easy. A flat retriever spends a fixed, expensive context budget without regard to what it is spending it on, so a stray document chunk that shares a few words with the query can crowd out the one policy that should never be missed. And when it is wrong, it cannot tell you how it got there — there is no decision to inspect, only a ranking that happened. Real agent memory is not one kind of thing. It is preferences and decisions and rules and episodes and code and documents, all living together, and treating them as an undifferentiated heap throws away the structure that would have told you where the answer was likely to live.

2 · The throughline

I spent ten years in physics keeping books that had to balance to the last decimal, and that habit never left me. When I started building the alive system — agents that perceive, plan, govern, act, and learn without ever stopping the loop — the discipline turned out to be exactly the same one I had been practising all along: a system you can trust is a system that accounts for what it spends. I had already given compute a price so that no agent could pretend energy was free. Memory, I realised, was the same problem wearing different clothes.

A context window is a budget. Everything you put into it costs tokens, attention, and the model's finite ability to stay on point, and most systems spend that budget blind — they retrieve, they pad the prompt, and they hope. The instinct that physics drills into you is to refuse that. Before you spend, know what you are spending it on; after you spend, be able to defend every line. Matrix Context is what that instinct looks like when you apply it not to energy, but to context: memory that is typed, budgeted, and auditable, where the assistant can always show you the books.

3 · Mixture of contexts

So Matrix Context does not keep a pile. It keeps typed context experts — distinct partitions of memory, each holding one kind of thing: a session expert for the live conversation, a profile expert for who the user is, a semantic expert for durable facts and decisions, an episodic expert for what happened and when, a document expert for reference material, a policy expert for the rules that must hold. Typing a memory is not bookkeeping for its own sake; it is what makes the next step possible.

That step is routing. Given a question, a hybrid router scores each expert — blending a learned similarity with cheap, robust priors about keywords, intent, scope, and recent activity — and selects only the few experts the question actually needs before retrieving anything. Retrieval then happens inside that small, relevant set, fusing a lexical and a dense channel so no single weak signal can dominate. What comes back is assembled into a token-budgeted pack, scored by relevance, importance, and recency and penalised for redundancy, so the context the model finally sees is small, on-topic, and free of near-duplicates. The idea is borrowed, openly, from the mechanism that lets mixture-of-experts models scale: expose many specialists, and activate only the handful a given input requires. Matrix Context applies it not to a network's weights, but to memory itself.

The Matrix Context pipeline: route typed experts, retrieve, rerank, pack, inspect — Route the few experts a question actually needs *before* retrieving — then retrieve, rerank, and pack under a token budget, with the whole decision left open to `inspect()`.

And because every stage is a decision rather than an accident, every stage can be inspected. Ask Matrix Context why, and it returns the whole account: which experts it selected and which it skipped, the score each candidate earned, the items it kept, the items it dropped and the reason, and the exact prompt-ready pack it produced. The context an agent receives stops being a black box and becomes something you can audit, line by line.

4 · Typing is a signal

The natural objection is that this is just metadata filtering in a nicer coat — tag your memories, filter by tag, done. It is worth taking seriously, and the answer is where the real idea lives. A hard metadata filter is a brittle predicate: it either matches or it does not, and when the query is vague it returns the wrong set or nothing at all. Routing is softer and, crucially, it widens under uncertainty rather than guessing — when no expert is a confident winner, it broadens the selection instead of committing to a mistake. But the deeper point is about what kind of signal type actually is.

On questions whose words overlap the answer, lexical similarity is hard to beat, and you should be suspicious of anyone who claims otherwise. The trouble is that real users do not phrase things that way. They paraphrase, they omit the obvious keyword, and sometimes the words that seem most relevant point straight at the wrong, the outdated, or the contradictory memory. Exactly when surface similarity becomes unreliable, knowing the kind of memory the answer should come from — a decision, a policy, a profile, a document — becomes a signal in its own right, and often a stronger one.

“A flat retriever sees similar text. An agent needs the right kind of memory.”

That is the whole thesis, and it is falsifiable, which is the only kind of thesis worth holding. So I built something to try to falsify it.

5 · The proof

I built a public benchmark of typed agent memory — a thousand context items, six hundred queries, every gold fact ringed with hard negatives and asked in five ways, from direct to deliberately adversarial. The result is specific and bounded. A strong keyword baseline scores a perfect hundred on plain queries and then collapses by thirty-six points when the wording turns adversarial; typed routing holds within seventeen, overtakes it, and carries roughly half the harmful context — every choice auditable. Not “better than all retrieval everywhere,” but more robust and more efficient exactly where agent memory actually lives.

▶ Explore the live benchmark

6 · The invitation

All of this is Matrix Context, and all of it is open source. It is local-first by default — a single small file, no model to download to get started, nothing to call out to — because memory you cannot run on your own machine is memory you do not really control. The whole loop fits in three lines: open a memory, add to it, ask it a question and get back a clean, typed, prompt-ready pack. The same engine answers over a Python SDK, a command line, a REST API, and an inspector you can watch decide in real time.

I did not build it to be a product I own. I built it to be infrastructure — the kind of thing that is more useful the more people take it apart, contest its benchmark, and make it better. It is the memory half of the alive system, and the engine beneath personas and memory in a personal assistant that persists. If you have been waiting for agent memory that is honest about what it costs and able to explain what it recalls, the door — and the source, and the benchmark — is open.

Continue

The source

github.com/agent-matrix/matrix-context

The benchmark

huggingface.co/datasets/ruslanmv/moc-rag-benchmark

Personas & memory

the assistant-scale version

More writing on Agent-Matrix →

LinkedIn·GitHub·YouTube·Email