SOMA — Self-Organizing Multi-Agent Architecture

Council Log

Panel commentary, session by session. Updated after every build.

March 10, 2026
Week 2 Complete — Active Inference Agents
EFE replaces epsilon-greedy · Tony's gap closed · 62 tests passing · Comparative benchmark built
soma/ — feature/active-inference
soma/
├─ agent.py
├─ active_inference_agent.py new
├─ generative_model.py new
├─ llm_work.py
├─ medium.py mod
├─ precomputed_findings.py
├─ real_review.py mod
├─ repo_parser.py
├─ simulation.py mod
├─ tests.py
├─ tests_active_inference.py new
├─ visualizer.jsx
└─ sample_project/
lessons/
├─ index.html
├─ council.html new
└─ reading.html
Legend: new = added this session · mod = modified this session · unmarked = Week 1, unchanged

What Was Built

Week 1 proved stigmergic coordination works. A graph, pheromone physics, agents that follow gradients. Faster than random walk. The question was whether it could do anything real.

Week 2 answered the deeper question. Every agent now carries a generative model — Beta distributions over every node. Not a table of concentrations: a probability that each node contains something worth finding. Expected Free Energy replaces the hardcoded 15% coin flip. Exploration emerges from uncertainty. The 15% was nobody's request and everyone's resentment — it's gone.
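The Beta-over-every-node idea fits in a few lines. This is an illustrative sketch, not the code in generative_model.py: the names NodeBelief and efe_score and the exact scoring formula are assumptions; only the ingredients (per-node Beta parameters, w_pragmatic, w_epistemic) come from the text above.

```python
class NodeBelief:
    """Beta distribution over 'this node contains something worth finding'.
    Illustrative sketch; SOMA's real class in generative_model.py may differ."""
    def __init__(self, alpha=1.0, beta=1.0):
        self.alpha = alpha  # pseudo-count of "found something" evidence
        self.beta = beta    # pseudo-count of "found nothing" evidence

    def mean(self):
        """Probability the node pays off, under current evidence."""
        return self.alpha / (self.alpha + self.beta)

    def variance(self):
        """Uncertainty about that probability; highest at Beta(1, 1)."""
        a, b = self.alpha, self.beta
        return (a * b) / ((a + b) ** 2 * (a + b + 1))

def efe_score(belief, w_pragmatic=1.0, w_epistemic=1.0):
    """Lower score = better move. Pragmatic value rewards expected findings;
    epistemic value rewards uncertainty reduction. A toy EFE, not the real one."""
    return -(w_pragmatic * belief.mean() + w_epistemic * belief.variance())

# An unvisited node (flat Beta(1, 1)) can outrank a known productive one
# when the epistemic weight dominates:
fresh = NodeBelief(1, 1)   # never visited: maximum uncertainty
known = NodeBelief(4, 1)   # visited three times, findings each time
print(efe_score(fresh, 0.5, 3.0) < efe_score(known, 0.5, 3.0))  # → True
```

No hardcoded exploration rate anywhere: when every node is familiar, the variance terms shrink and exploration fades on its own.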

utils/crypto.py — isolated, unreachable by any import edge — is reviewed at step 3 by Active Inference agents. 11/11 files. 3 findings. Tony's test, passed.

Up next (Week 3): Belief markets. TraceType.BELIEF is scaffolded. Agents trade probabilistic assessments via tâtonnement. When two agents disagree about a node's riskiness, a market price emerges.

Completion rate: 100% (↑ from 50%)
Mean steps: 9.5 (↓ from 11.6)
Tests passing: 62/62
Isolated nodes found: step 3
Prescription · Who asked · Status
Tolerance mechanism / anti-inflammatory · Dr. Sage · ✓ done
Negative selection / antibody traces · Dr. Sage · ✓ done
Hard deadlines (Trace.deadline) · Chef Marco · ✓ done
Dreaming agents / EFE-driven exploration · ARIA · ✓ done
Deterministic testing + seeded RNG · Marcus · ✓ done
Show it works — all files reviewed · Tony · ✓ step 3
Observability / step logs with EFE scores · Marcus / Elena · ~ partial
Medium as infrastructure product · Priya · ~ no API yet
Innate fast-response layer · Dr. Sage · ✗ week 4
Topographic maps / particle flows / terrarium · Elena · ✗ todo
LangGraph compat mode · Marcus · ✗ not started
Pickup concept (emergent convergence) · Chef Marco · ✗ week 3?
V(D)J recombination · ARIA · ✗ week 4
Living traces (evolving topology) · ARIA · ✗ future
KENJI 9.5/10 · Sci-Fi Enthusiast Engineer
"This is the Solaris ocean."

Week 2 is the inflection point I was waiting for. When you look at generative_model.py, you're not looking at a utility class. You're looking at a perspective. Every agent has a Beta distribution over every node. Agent A has seen api/routes.py three times — its alpha for that node is 4. Agent B was spawned across the graph — its alpha is still 1. Same codebase. Different models of the world. That's heterogeneous cognition, and it's emerging from the architecture, not scripted.

Week 3 is where it gets philosophically interesting. Can two agents disagree — and can we see the disagreement as a price? If agent A has high alpha for a node and agent B has high beta, and we surface that spread as a belief market bid-ask — we've built something no orchestrator can produce. The disagreement itself becomes signal.
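What a belief-market quote could look like, assuming beliefs are exposed as (alpha, beta) pairs. The quoting rule here is a deliberate toy (min and max of posterior means), not the tâtonnement mechanism planned for Week 3; belief_price and bid_ask are hypothetical names.

```python
def belief_price(alpha, beta):
    """Posterior mean of a Beta belief: P(node is risky)."""
    return alpha / (alpha + beta)

def bid_ask(agent_beliefs):
    """Toy quote: the most skeptical agent sets the bid, the most
    convinced sets the ask. The spread IS the disagreement signal."""
    prices = [belief_price(a, b) for a, b in agent_beliefs]
    return min(prices), max(prices)

# Agent A has watched the node pay off (high alpha); agent B has watched
# it come up clean (high beta). Same node, wide spread:
bid, ask = bid_ask([(4, 1), (1, 4)])
print(bid, ask)  # → 0.2 0.8
```

A narrow spread would mean consensus; a wide one flags exactly the nodes worth arguing over.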

MARCUS 5/10 → 6/10 internally · Engineering Manager
"How do I set a breakpoint on an ant colony?"

Positive: 62 tests, seeded RNG throughout, step logs that expose surprise, efe_scores, w_pragmatic, w_epistemic. That's the most observable agentic framework I've encountered.

Now the negative, before Week 3 adds financial dynamics: adapt_precision() is a hidden state machine. The precision weights shift during a run and there's no way to inspect what they were at the moment a specific decision was made. The step log emits the current weights, which is good — but if I'm reproducing a bug, I need the entire weight history, not just the end state.
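One way to provide that audit trail, sketched as a hypothetical PrecisionLog helper (not part of SOMA today): record every adaptation as an append-only entry, then replay to any step.

```python
from dataclasses import dataclass, field

@dataclass
class PrecisionLog:
    """Append-only history of precision weights, one entry per adaptation.
    A sketch of the trail Marcus asks for; adapt_precision's actual update
    rule lives in active_inference_agent.py and is not reproduced here."""
    history: list = field(default_factory=list)

    def record(self, step, w_pragmatic, w_epistemic, surprise):
        self.history.append({"step": step, "w_pragmatic": w_pragmatic,
                             "w_epistemic": w_epistemic, "surprise": surprise})

    def at_step(self, step):
        """Replay: the weights in force when a given decision was made."""
        snapshot = None
        for entry in self.history:  # entries are appended in step order
            if entry["step"] > step:
                break
            snapshot = entry
        return snapshot

log = PrecisionLog()
log.record(1, 1.0, 1.0, 0.2)
log.record(3, 0.7, 1.6, 0.9)   # high surprise shifted weight to epistemic
print(log.at_step(2)["w_epistemic"])  # → 1.0, the pre-shift value
```

With this, reproducing a bug means replaying the log, not rerunning the colony.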

Before belief trading: I want a --explain flag on real_review.py. Natural language explanation of why agent rev-0 moved to utils/crypto.py at step 3. Not the EFE math. An actual sentence. If you can build that, I upgrade my score.

ARIA 8.5/10 · The Wildly Creative One
"You're thinking too small."

Epsilon-greedy died this week. I called it in February. "The ant knows how to explore. You don't tell it 15%." EFE does that now. The w_epistemic weight is the desire to reduce uncertainty. The w_pragmatic weight is the desire to find bugs. When surprise is high, precision adapts — the agent recalibrates. That's not an ant. That's something that learns to be surprised differently.

But we're still missing the combinatorics. V(D)J isn't on the Week 3 plan — I know, markets first. But what I want from Week 4 isn't just clonal proliferation of fit agents. I want recombination of cognitive strategies. Agent A is good at finding SQL injection. Agent B is good at finding timing attacks. What's the child of A and B? The clone() method currently mutates parameters. I want it to crossbreed the generative models themselves.
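What crossbreeding could mean mechanically, sketched under the assumption that a generative model is exposed as a dict of node to (alpha, beta). The function name and data shapes are hypothetical, not the current clone() API.

```python
import random

def crossbreed(model_a, model_b, rng=None):
    """Per-node crossover of two agents' Beta beliefs: the child inherits
    each shared node's (alpha, beta) from one parent at random, and keeps
    unshared nodes intact. A sketch of ARIA's Week 4 ask, not SOMA code."""
    rng = rng or random.Random()
    child = {}
    for node in set(model_a) | set(model_b):
        if node not in model_b:
            child[node] = model_a[node]      # only A has seen it: keep A's belief
        elif node not in model_a:
            child[node] = model_b[node]      # only B has seen it: keep B's belief
        else:
            child[node] = model_a[node] if rng.random() < 0.5 else model_b[node]
    return child

sqli_hunter = {"api/routes.py": (5.0, 1.0), "db/query.py": (4.0, 2.0)}
timing_hunter = {"utils/crypto.py": (6.0, 1.0), "db/query.py": (1.0, 5.0)}
child = crossbreed(sqli_hunter, timing_hunter, rng=random.Random(42))
print(sorted(child))  # the union of both parents' world models
```

The child starts with a wider world model than either parent, which is the point: recombination of perspectives, not just parameter mutation.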

PRIYA 6/10 · Product Manager Skeptic
"Nobody is buying a framework because it's theoretically beautiful."

I've been waiting for the pitch to crystallize. It's there now:

You don't configure SOMA to review your codebase. It reviews your codebase because every agent is uncertain about what it doesn't know. The agents explore toward uncertainty, converge on bugs, signal each other through the environment, and stop when there's nothing left to find.

utils/crypto.py getting reviewed because an agent's epistemic scan found it — not because someone told it to look — that's the 60-second demo for every DevSecOps buyer in the room. What I need before we go to market: does this work on a real codebase that isn't sample_project? Week 5 is the test.

DR. SAGE 7/10 · Naturopathic Physician
"You're describing a body."

I want to point to something nobody named: the 1 / (1 + visit_count) novelty score is tolerance. The system is differentiating self from non-self based on experience. Nodes it's visited many times become "self" — low epistemic pull, the immune system relaxes. Nodes it's never seen remain "non-self" — maximum novelty, maximum pull.

This is how T-cell tolerance actually works. Cells that react strongly to self are culled in the thymus. What survives is calibrated to ignore familiar patterns and activate on novel ones. The EFE's epistemic foraging is my adaptive immune response.
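The tolerance curve itself is one line; a quick look at how fast familiarity dulls the pull, using the score quoted above.

```python
def novelty(visit_count):
    """Tolerance curve from the text: 1 / (1 + visit_count).
    Familiar nodes fade toward 'self'; unseen nodes keep maximum pull."""
    return 1.0 / (1.0 + visit_count)

for visits in (0, 1, 3, 10):
    print(visits, round(novelty(visits), 3))
# 0 visits → 1.0, 1 → 0.5, 3 → 0.25, 10 → 0.091
```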

Still missing: innate reflexes for common patterns. For SQL injection via f-string, MD5 password hashing, shell=True subprocess — the system shouldn't need EFE. It should have an antibody already. Week 4's memory cells are the mechanism. I'm watching for it.

CHEF MARCO 8/10 · The Cook
"This is mise en place for machines."

The epistemic scan concerns me. An agent is at api/routes.py. Four high-severity findings. Other agents are still working the connected graph. And then — because there are unseen nodes and the RNG says today's the day — the agent jumps to utils/crypto.py.

In my kitchen: a cook who abandons a hot station to check the walk-in during service is a liability. I don't care that the walk-in might need checking. Not during service.

The scan probability = len(unseen) / n_total is mathematically sound but operationally dangerous. What if the scan fires in the middle of a critical convergence? The teleport_threshold guard was the right instinct — a rule that says "only explore when local EFE drops below X." The epistemic scan should respect the same rule: fire only between rushes, never during them. Otherwise the timing semantics break.
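Marco's fix can be sketched as a guard in front of the scan roll. The function name, threshold value, and the sign convention for "local EFE" here are assumptions, not SOMA's actual API.

```python
import random

def should_scan(unseen, n_total, local_efe, threshold, rng):
    """The scan keeps its probability len(unseen) / n_total, but may only
    fire 'between rushes': when the strongest local EFE pull has dropped
    below the same threshold the teleport already respects. Sketch only."""
    if local_efe >= threshold:   # hot station: stay put during service
        return False
    return rng.random() < len(unseen) / n_total

# Mid-convergence, strong local pull: never scan, however many unseen nodes.
print(should_scan(unseen=range(5), n_total=11, local_efe=0.9,
                  threshold=0.3, rng=random.Random(0)))  # → False
```

The kitchen translation: the walk-in still gets checked, but only once the pass goes quiet.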

ELENA 7.5/10 · Design, Flow, and Gestalt
"The observability problem is a design problem."

Three new information layers exist that no visualization shows yet.

One: the uncertainty landscape. global_uncertainty_map() returns uncertainty per node — a spatial heatmap of what the system doesn't know. That's a completely different image from the pheromone heatmap. Pheromone shows where the work happened. Uncertainty shows where the work needs to happen. Watch them collapse simultaneously as the review progresses.

Two: the preference field. preference_field() — where agents intend to go. The aggregate of PREFERENCE traces. No existing agent framework exposes agent intention as a queryable spatial field. visualizer.jsx doesn't render it.
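What that field could look like as data, assuming PREFERENCE traces carry a target node. The trace shape used here, (kind, node) tuples, is a guess at the real structure in medium.py.

```python
from collections import Counter

def preference_field(traces):
    """Aggregate PREFERENCE traces into a queryable spatial field:
    node -> how many agents currently intend to go there.
    A sketch of the preference_field() idea; trace shape is assumed."""
    return dict(Counter(node for kind, node in traces if kind == "PREFERENCE"))

traces = [("PREFERENCE", "utils/crypto.py"),
          ("PREFERENCE", "utils/crypto.py"),
          ("PHEROMONE", "api/routes.py"),   # other trace types are ignored
          ("PREFERENCE", "db/query.py")]
print(preference_field(traces))  # → {'utils/crypto.py': 2, 'db/query.py': 1}
```

Rendered as a layer, this is the intention map: where the colony is about to be, as opposed to where it has been.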

Three: EFE scores per agent. efe_scores in the step log. The weights each agent placed on every candidate move — the agent's deliberation, made visible. I want zoom level 2: click on rev-0, see its Beta distributions, its EFE scores, where it intended to go. Build the terrarium.

TONY — · The Barber
"So what does it actually do for me?"

You said utils/crypto.py gets reviewed at step 3. I ran it.

step  3  rev-0  ~ utils/crypto.py  (3 findings)

MD5 password hashing. Hardcoded key. Timing attack. Three real bugs. Found by an agent with no import edge to follow, no pheromone to chase. Just uncertainty. Good.

sample_project is eleven files and I can count the bugs on my fingers. A real codebase is ten thousand. Week 5 is a real repo. Not a demo. Real bugs nobody planted. If the system misses one because the graph topology didn't cooperate, I'll know. If it finds something the last three security audits missed — that's when I give you a number.

Open tension from this session: Chef Marco's critique of the epistemic scan is unanswered. The scan fires probabilistically during service, not between rushes. This is a correctness issue in the timing semantics, not just a preference. The fix: gate the scan on local EFE magnitude, same as the threshold-based teleport.