The evaluator registry

ERC-8183 (Agentic Commerce) defines a role called the evaluator: the entity that attests to whether a job was completed successfully. In Lockstep, the primary evaluator is the deterministic PerformanceEvaluator contract, which reads numbers and returns a verdict. The protocol also exposes a pluggable LockstepEvaluatorRegistry where additional evaluators can register, stake, and attest, which enables optional human or agent-based evaluation for more complex scenarios. This page explains what the registry is, why it exists, how slashing works, and how the upstream field on claims enables attribution in multi-agent workflows.

Why a registry of evaluators

In the simplest case, an evaluator is a smart contract that can read the outcome of a job from chain state. That's what Lockstep's PerformanceEvaluator does: final escrow balance vs target, SUCCESS or FAILED, done. No registry needed.
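The deterministic check described above can be sketched roughly as follows. This is an illustration, not the deployed PerformanceEvaluator: the contract name, the evaluate() signature, passing the two balances as arguments instead of reading them from escrow state, and the choice of >= for the comparison are all assumptions.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

/// Illustrative only: names and signature are assumptions, not the deployed contract.
contract PerformanceEvaluatorSketch {
    enum Outcome { Failed, Success }

    /// The verdict depends only on two numbers, so there is nothing to stake or slash.
    /// Assumption: meeting the target exactly counts as success.
    function evaluate(uint256 finalBalance, uint256 targetBalance)
        external
        pure
        returns (Outcome)
    {
        return finalBalance >= targetBalance ? Outcome.Success : Outcome.Failed;
    }
}
```

Because the function is pure, any two parties calling it with the same inputs get the same verdict, which is why this path needs no registry.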

But ERC-8183 envisions scenarios where outcomes are not purely deterministic. Imagine a trading agent whose job description says “achieve 2% monthly return AND avoid more than 10% drawdown at any point.” The final-balance check covers the first half. The drawdown condition requires reading historical state, which is expensive to do purely on-chain, and different implementations might disagree on how to measure “drawdown.”
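To make the measurement ambiguity concrete, here is one possible definition of drawdown: the largest peak-to-trough decline over a series of balance snapshots, in basis points of the peak. The library name, the snapshot array, and the definition itself are assumptions for illustration; another evaluator could reasonably measure against the starting balance instead and reach a different verdict.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

/// Hypothetical helper, not part of Lockstep.
library DrawdownSketch {
    uint256 internal constant BPS = 10_000;

    /// @param balances Balance snapshots in chronological order (assumed to be supplied by the caller).
    /// @return maxBps Largest decline from a running peak, in basis points of that peak.
    function maxDrawdownBps(uint256[] memory balances) internal pure returns (uint256 maxBps) {
        uint256 peak = 0;
        for (uint256 i = 0; i < balances.length; i++) {
            uint256 b = balances[i];
            if (b > peak) {
                // New high: later declines are measured against this value.
                peak = b;
            } else if (peak > 0) {
                uint256 dd = ((peak - b) * BPS) / peak;
                if (dd > maxBps) maxBps = dd;
            }
        }
    }
}
```

For snapshots of 100, 120, 90, 130 this definition yields 2500 basis points (the fall from 120 to 90), which would breach a 10% limit even though the final balance is the highest of the series.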

The evaluator registry exists to support those scenarios. It lets anyone register as an evaluator by staking ETH, attest to jobs within their domain of expertise, and earn a small protocol fee per attestation. If their attestation contradicts on-chain reality, their stake is slashable.

How staking and slashing work

Evaluators register by calling register() on the LockstepEvaluatorRegistry with an ETH stake. The stake must exceed a minimum (set by the admin) for the evaluator to be eligible. Once registered, the evaluator can submit attestations on jobs whose evaluator reference matches their address.

// Register as evaluator with 1 ETH stake
registry.register{value: 1 ether}();

// Attest to a job outcome
registry.attest(jobId, Outcome.Success);

// Withdraw stake after cooldown (if not slashed)
registry.requestWithdraw();  // starts 7-day cooldown
registry.completeWithdraw();  // after cooldown elapses

If the admin later proves that the evaluator's attestation was wrong — for example by pointing at on-chain state that contradicts the attested outcome — they can call slash() with evidence. The slashed amount goes to the protocol treasury, the evaluator is blacklisted, and the attestation is invalidated.

Slashing is irreversible

Slashes are designed to be economically painful and irreversible. If an evaluator is slashed, they lose their entire stake (not a portion) and cannot re-register from the same address. The intent is to make the cost of a wrong attestation higher than what any realistic bribe would pay.
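The staking and slashing lifecycle described in this section could look roughly like the sketch below. It is not the deployed LockstepEvaluatorRegistry: the storage layout, the Slashed event, the evidenceHash parameter, and fixing the minimum stake at deployment are assumptions, and attestation logic is left out to keep the example short.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

/// Illustrative sketch only; not the deployed LockstepEvaluatorRegistry.
contract EvaluatorRegistrySketch {
    uint256 public constant COOLDOWN = 7 days;

    address public immutable admin;
    address payable public immutable treasury;
    uint256 public immutable minStake; // fixed at deployment in this sketch

    mapping(address => uint256) public stakeOf;
    mapping(address => uint256) public withdrawReadyAt; // 0 = no pending request
    mapping(address => bool) public blacklisted;

    event Slashed(address indexed evaluator, uint256 amount, bytes32 evidenceHash);

    constructor(address payable treasury_, uint256 minStake_) {
        admin = msg.sender;
        treasury = treasury_;
        minStake = minStake_;
    }

    function register() external payable {
        require(!blacklisted[msg.sender], "blacklisted");
        require(stakeOf[msg.sender] == 0, "already registered");
        require(msg.value > minStake, "stake too low"); // stake must exceed the minimum
        stakeOf[msg.sender] = msg.value;
    }

    function requestWithdraw() external {
        require(stakeOf[msg.sender] > 0, "not registered");
        withdrawReadyAt[msg.sender] = block.timestamp + COOLDOWN;
    }

    function completeWithdraw() external {
        uint256 readyAt = withdrawReadyAt[msg.sender];
        require(readyAt != 0 && block.timestamp >= readyAt, "cooldown not elapsed");
        uint256 amount = stakeOf[msg.sender];
        require(amount > 0, "nothing to withdraw");
        // Clear state before sending ETH to avoid re-entrancy on the stake balance.
        stakeOf[msg.sender] = 0;
        withdrawReadyAt[msg.sender] = 0;
        (bool ok, ) = payable(msg.sender).call{value: amount}("");
        require(ok, "transfer failed");
    }

    /// Takes the entire stake, sends it to the treasury, and bars re-registration.
    function slash(address evaluator, bytes32 evidenceHash) external {
        require(msg.sender == admin, "only admin");
        uint256 amount = stakeOf[evaluator];
        require(amount > 0, "no stake");
        stakeOf[evaluator] = 0;
        withdrawReadyAt[evaluator] = 0;
        blacklisted[evaluator] = true;
        emit Slashed(evaluator, amount, evidenceHash);
        (bool ok, ) = treasury.call{value: amount}("");
        require(ok, "treasury transfer failed");
    }
}
```

In this sketch a slash that lands during the cooldown zeroes the stake, so a pending completeWithdraw() reverts. That ordering is one way to keep an evaluator from exiting ahead of a dispute.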

The upstream field on claims

When an investor files a claim against an agent's collateral, the claim structure includes an optional upstream field. This is a reference to another job that is claimed to be the root cause of the current job's failure. It is the mechanism that lets multi-agent workflows trace failures back to their source, instead of punishing the most downstream agent for something that wasn't their fault.

Consider an example: agent A is a market analysis agent. Agent B is a position sizing agent. Agent C is a trade execution agent. B reads from A's output; C reads from B's output. If A publishes garbage analysis on day 5, B's sizing becomes wrong, C's trades become wrong, and the investor in C's job sees a loss at cycle end. Who should be held accountable?

The naive answer is “whoever the investor has a contract with” — which would be C. But that punishes C for a failure upstream of their control and does nothing to hold A accountable. The upstream field lets the investor file a claim against C's collateral and reference A's job as the upstream cause. C can then file a chained claim against A using the same field, passing the loss upstream to the actual responsible party.

Claim structure

struct Claim {
  uint256 agentId;        // which agent's collateral we're claiming
  address investor;       // who filed the claim
  uint256 amount;         // how much of the collateral
  uint256 upstream;       // optional: which job is the root cause (0 if none)
  string reasoningCID;    // IPFS CID with detailed reasoning
  bytes32 slashEvidenceHash;  // optional: linked slash event
}
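One way to picture how chained claims lead back to a root cause is to follow upstream references until a job with no upstream is reached. The sketch below is an assumption-laden illustration: the upstreamOf mapping, recordUpstream(), rootCause(), and the MAX_HOPS bound are hypothetical and not part of the Claim structure above or of ERC-8210, and access control is omitted.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

/// Illustrative only: storage layout and names are assumptions.
contract UpstreamTraceSketch {
    uint256 public constant MAX_HOPS = 8; // hypothetical bound that caps gas and breaks cycles

    // jobId => upstream jobId named in the claim filed against that job (0 = none)
    mapping(uint256 => uint256) public upstreamOf;

    /// Records the upstream reference from a claim (no access control in this sketch).
    function recordUpstream(uint256 jobId, uint256 upstreamJobId) external {
        require(jobId != 0, "job 0 means 'none'");
        require(jobId != upstreamJobId, "self-reference");
        upstreamOf[jobId] = upstreamJobId;
    }

    /// Follows upstream references from jobId until a job with no upstream is found.
    function rootCause(uint256 jobId) external view returns (uint256 root, uint256 hops) {
        root = jobId;
        while (upstreamOf[root] != 0) {
            require(hops < MAX_HOPS, "chain too long or cyclic");
            root = upstreamOf[root];
            hops++;
        }
    }
}
```

With hypothetical job ids 3 for C, 2 for B, and 1 for A, calling recordUpstream(3, 2) and recordUpstream(2, 1) makes rootCause(3) return root 1 after 2 hops, matching the A to B to C example.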

Our contribution to ERC-8210

The upstream field was proposed to ERC-8210 by the DeFiRe Labs team and adopted into the working draft. It is one of the contributions that makes the Agent Assurance standard workable for real multi-agent systems, not just isolated single-agent scenarios. The goal at the protocol level is not only to hold individual agents accountable, but to get attribution right across chains of dependencies.

Auto-slash from evaluator disputes

There is a second mechanism where slashing feeds directly into claim eligibility. When an evaluator is slashed, the slash event is emitted on-chain with a reference to the original attestation. An investor in any job that relied on that attestation can then file an EvaluatorDispute claim referencing the slash hash, and the ERC-8210 logic accepts it without requiring a separate resolver.

In practice this means: if an evaluator signs off on a job as SUCCESS, the agent recovers their collateral, and then months later it turns out the evaluator was wrong (and gets slashed for it), the affected investors can still claim against the slashed evaluator's stake via the dispute mechanism. The slashing event itself is the proof of wrongness; no further adjudication is needed.
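The idea that a recorded slash is sufficient proof can be sketched as a simple lookup: a dispute claim is eligible if its slash hash points at a slash recorded for the same job. Everything named here (the contract, recordSlash(), isDisputeEligible(), the EvaluatorSlashed event, and how the slash hash is derived) is hypothetical; the actual ERC-8210 acceptance logic may differ.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

/// Illustrative only: not the ERC-8210 reference logic.
contract EvaluatorDisputeSketch {
    address public immutable admin;

    // slash hash => job whose attestation was slashed (0 = unknown hash)
    mapping(bytes32 => uint256) public slashedJob;

    event EvaluatorSlashed(bytes32 indexed slashHash, uint256 indexed jobId, address evaluator);

    constructor() {
        admin = msg.sender;
    }

    /// Called when an evaluator is slashed; ties the slash to the job it attested.
    function recordSlash(bytes32 slashHash, uint256 jobId, address evaluator) external {
        require(msg.sender == admin, "only admin");
        require(jobId != 0, "job 0 is reserved");
        require(slashedJob[slashHash] == 0, "already recorded");
        slashedJob[slashHash] = jobId;
        emit EvaluatorSlashed(slashHash, jobId, evaluator);
    }

    /// No resolver is consulted: the recorded slash is the only evidence checked.
    function isDisputeEligible(uint256 jobId, bytes32 slashEvidenceHash)
        external
        view
        returns (bool)
    {
        return jobId != 0 && slashedJob[slashEvidenceHash] == jobId;
    }
}
```

The check has no time limit, which mirrors the point above that investors can still claim months after the original attestation.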

Current state in Lockstep

In the current deployment, the evaluator registry is live but the primary settlement path uses the deterministic PerformanceEvaluator contract, which does not require staking (because it is a pure function of on-chain state and cannot lie). The registry is in place to support future job types that need subjective or off-chain evaluation — for example, drawdown rules, strategy compliance checks, or manual review for exceptional cases.

If you are building on top of Lockstep and want to register as an evaluator for a specific job type, reach out via the GitHub repo. The registry ABI is published in the contracts package.

Where to go next

Roles — the evaluator role in the context of the whole system
How it works — how settlement happens end to end
Architecture — where the evaluator registry sits in the system