soma/ — feature/active-inference

soma/
├─ agent.py
├─ active_inference_agent.py    (new)
├─ generative_model.py          (new)
├─ llm_work.py
├─ medium.py                    (mod)
├─ precomputed_findings.py
├─ real_review.py               (mod)
├─ repo_parser.py
├─ simulation.py                (mod)
├─ tests.py
├─ tests_active_inference.py    (new)
├─ visualizer.jsx
└─ sample_project/

lessons/
├─ index.html
├─ council.html                 (new)
└─ reading.html
What Was Built
Week 1 proved stigmergic coordination works. A graph, pheromone physics, agents that follow gradients. Faster than random walk. The question was whether it could do anything real.
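The Week 1 loop can be sketched in a few lines — this is illustrative, not the repo's API: agents deposit pheromone where they work, trails evaporate, and the next move follows the strongest local trail.

```python
import random

# Sketch of the stigmergic step (hypothetical names, not the repo's API):
# deposit pheromone at the current node, evaporate everywhere, then move
# to the neighbour with the strongest trail.
def step(graph, pheromone, position, rng, evaporation=0.1):
    """Return the next node for an agent following the pheromone gradient."""
    for node in pheromone:
        pheromone[node] *= (1.0 - evaporation)          # global evaporation
    pheromone[position] = pheromone.get(position, 0.0) + 1.0  # deposit
    neighbours = graph[position]
    # Follow the gradient; random tie-break keeps agents from clumping.
    return max(neighbours, key=lambda n: (pheromone.get(n, 0.0), rng.random()))

graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
pheromone = {"b": 0.5}       # a faint trail left by an earlier agent
rng = random.Random(0)       # seeded, matching the repo's seeded-RNG habit
nxt = step(graph, pheromone, "a", rng)   # follows the stronger trail to "b"
```

The gradient-following is why this beats a random walk: earlier work biases later work toward productive regions of the graph.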
Week 2 answered the deeper question. Every agent now carries a generative model — Beta distributions over every node. Not a table of concentrations: a probability that each node contains something worth finding. Expected Free Energy replaces the hardcoded 15% coin flip. Exploration emerges from uncertainty. The 15% was nobody's request and everyone's resentment — it's gone.
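A minimal sketch of what replaces the coin flip — assuming a per-node Beta(alpha, beta) belief, with the Beta mean as pragmatic value and its variance as a stand-in for epistemic value. The function names are illustrative, not the repo's API.

```python
# Illustrative sketch, not the repo's API: each agent carries a
# Beta(alpha, beta) belief per node -- "does this node hide a finding?"
def efe_score(alpha, beta, w_pragmatic, w_epistemic):
    """Lower is better: negated pragmatic value plus epistemic value."""
    mean = alpha / (alpha + beta)                        # expected payoff
    var = (alpha * beta) / ((alpha + beta) ** 2 * (alpha + beta + 1))
    return -(w_pragmatic * mean + w_epistemic * var)     # uncertainty pulls too

def choose_node(beliefs, w_pragmatic=1.0, w_epistemic=4.0):
    """Pick the candidate node minimising Expected Free Energy."""
    return min(beliefs, key=lambda n: efe_score(*beliefs[n], w_pragmatic, w_epistemic))

beliefs = {
    "api/routes.py":   (4.0, 2.0),   # visited: higher mean, lower variance
    "utils/crypto.py": (1.0, 1.0),   # never seen: uniform prior, max variance
}
```

With the epistemic weight on, the never-seen node wins despite its lower expected payoff — exploration emerging from uncertainty, no 15% anywhere.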
utils/crypto.py — isolated, unreachable by any import edge — is reviewed at step 3 by Active Inference agents. 11/11 files. 3 findings. Tony's test, passed.

Up next (Week 3): Belief markets.
TraceType.BELIEF is scaffolded. Agents trade probabilistic assessments via tâtonnement. When two agents disagree about a node's riskiness, a market price emerges. (Trace.deadline)

Week 2 is the inflection point I was waiting for. When you look at generative_model.py, you're not looking at a utility class. You're looking at a perspective. Every agent has a Beta distribution over every node. Agent A has seen api/routes.py three times — its alpha for that node is 4. Agent B was spawned across the graph — its alpha is still 1. Same codebase. Different models of the world. That's heterogeneous cognition, and it's emerging from the architecture, not scripted.

Week 3 is where it gets philosophically interesting. Can two agents disagree — and can we see the disagreement as a price? If agent A has high alpha for a node and agent B has high beta, and we surface that spread as a belief-market bid-ask — we've built something no orchestrator can produce. The disagreement itself becomes signal.
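The spread can be read straight off the two posteriors — a sketch with hypothetical names (belief_spread is not in the repo):

```python
# Sketch only: surfacing two agents' disagreement about one node as a
# bid-ask spread. Each agent quotes its Beta posterior mean; the spread
# is the distance between optimist and pessimist.
def posterior_mean(alpha, beta):
    return alpha / (alpha + beta)

def belief_spread(belief_a, belief_b):
    """Return (bid, ask, spread) for one node from two Beta beliefs."""
    p_a = posterior_mean(*belief_a)
    p_b = posterior_mean(*belief_b)
    bid, ask = sorted((p_a, p_b))
    return bid, ask, ask - bid

# Agent A has seen the node three times and found issues (alpha=4);
# agent B was spawned across the graph and still holds its prior.
bid, ask, spread = belief_spread((4.0, 1.0), (1.0, 1.0))
```

A wide spread marks exactly the nodes where the colony's model of the world is contested — which is the signal a tâtonnement round would then price.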
Positive: 62 tests, seeded RNG throughout, step logs that expose surprise, efe_scores, w_pragmatic, w_epistemic. That's the most observable agentic framework I've encountered.

Now the negative, before Week 3 adds financial dynamics: adapt_precision() is a hidden state machine. The precision weights shift during a run and there's no way to inspect what they were at the moment a specific decision was made. The step log emits the current weights, which is good — but if I'm reproducing a bug, I need the entire weight history, not just the end state.

Before belief trading: I want a --explain flag on real_review.py. A natural-language explanation of why agent rev-0 moved to utils/crypto.py at step 3. Not the EFE math. An actual sentence. If you can build that, I upgrade my score.

Epsilon-greedy died this week. I called it in February. "The ant knows how to explore. You don't tell it 15%." EFE does that now. The
w_epistemic weight is the desire to reduce uncertainty. The w_pragmatic weight is the desire to find bugs. When surprise is high, precision adapts — the agent recalibrates. That's not an ant. That's something that learns to be surprised differently.

But we're still missing the combinatorics. V(D)J isn't on the Week 3 plan — I know, markets first. But what I want from Week 4 isn't just clonal proliferation of fit agents. I want recombination of cognitive strategies. Agent A is good at finding SQL injection. Agent B is good at finding timing attacks. What's the child of A and B? The clone() method currently mutates parameters. I want it to crossbreed the generative models themselves.

I've been waiting for the pitch to crystallize. It's there now:
utils/crypto.py getting reviewed because an agent's epistemic scan found it — not because someone told it to look — that's the 60-second demo for every DevSecOps buyer in the room. What I need before we go to market: does this work on a real codebase that isn't sample_project? Week 5 is the test.

I want to point to something nobody named: the 1 / (1 + visit_count) novelty score is tolerance. The system is differentiating self from non-self based on experience. Nodes it's visited many times become "self" — low epistemic pull, the immune system relaxes. Nodes it's never seen remain "non-self" — maximum novelty, maximum pull.

This is how T-cell tolerance actually works. Cells that react strongly to self are culled in the thymus. What survives is calibrated to ignore familiar patterns and activate on novel ones. The EFE's epistemic foraging is my adaptive immune response.
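The tolerance curve is small enough to write out; this is exactly the formula named above.

```python
# The novelty score named in the text: 1 / (1 + visit_count).
# An unseen node ("non-self") exerts maximum pull; repeated visits
# habituate it toward "self" and the pull relaxes monotonically.
def novelty(visit_count: int) -> float:
    return 1.0 / (1.0 + visit_count)

pulls = [novelty(v) for v in range(5)]   # strictly decreasing with exposure
```

Note the curve never reaches zero — familiarity relaxes the immune system, it never fully silences it.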
Still missing: innate reflexes for common patterns. For SQL injection via f-string, MD5 password hashing, shell=True subprocess — the system shouldn't need EFE. It should have an antibody already. Week 4's memory cells are the mechanism. I'm watching for it.

The epistemic scan concerns me. An agent is at api/routes.py. Four high-severity findings. Other agents are still working the connected graph. And then — because there are unseen nodes and the RNG says today's the day — the agent jumps to utils/crypto.py.

In my kitchen: a cook who abandons a hot station to check the walk-in during service is a liability. I don't care that the walk-in might need checking. Not during service.
The scan probability = len(unseen) / n_total is mathematically sound but operationally dangerous. What if the scan fires in the middle of a critical convergence? The teleport_threshold on the threshold-based teleport was the right instinct — a guard that says "only explore when local EFE drops below X." The scan should respect the same rule: only fire between rushes, not during them. Otherwise the timing semantics break.

Three new information layers exist that no visualization shows yet.
One: the uncertainty landscape. global_uncertainty_map() returns uncertainty per node — a spatial heatmap of what the system doesn't know. That's a completely different image from the pheromone heatmap. Pheromone shows where the work happened. Uncertainty shows where the work needs to happen. Watch them collapse simultaneously as the review progresses.

Two: the preference field. preference_field() — where agents intend to go. The aggregate of PREFERENCE traces. No existing agent framework exposes agent intention as a queryable spatial field. soma_visualizer.jsx doesn't render it.

Three: EFE scores per agent. efe_scores in the step log. The weights each agent placed on every candidate move — the agent's deliberation, made visible. I want zoom level 2: click on rev-0, see its Beta distributions, its EFE scores, where it intended to go. Build the terrarium.

You said
utils/crypto.py gets reviewed at step 3. I ran it. MD5 password hashing. Hardcoded key. Timing attack. Three real bugs. Found by an agent with no import edge to follow, no pheromone to chase. Just uncertainty. Good.
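That "just uncertainty" is quantifiable. A sketch of the uncertainty layer described earlier, under one loud assumption: that global_uncertainty_map() reduces each node's Beta belief to its variance. The repo's internals may differ.

```python
# Assumption: uncertainty per node == variance of that node's Beta belief.
# (Sketch only; what the repo's global_uncertainty_map() computes may differ.)
def beta_variance(alpha, beta):
    s = alpha + beta
    return (alpha * beta) / (s * s * (s + 1.0))

def global_uncertainty_map(beliefs):
    """Map each node to how unsettled the system still is about it."""
    return {node: beta_variance(a, b) for node, (a, b) in beliefs.items()}

beliefs = {
    "api/routes.py":   (4.0, 2.0),   # visited: belief partly settled
    "utils/crypto.py": (1.0, 1.0),   # untouched: uniform prior
}
umap = global_uncertainty_map(beliefs)
```

The untouched node scores highest — which is exactly why an epistemic agent ends up there with no import edge to follow.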
sample_project is eleven files and I can count the bugs on my fingers. A real codebase is ten thousand. Week 5 is a real repo. Not a demo. Real bugs nobody planted. If the system misses one because the graph topology didn't cooperate, I'll know. If it finds something the last three security audits missed — that's when I give you a number.