System Specification

Open Agent Trust Stack

A system specification for zero-trust AI agent execution. Define what is permitted and make everything else structurally inexpressible.

Version 1.3.0
Status Release
Authors Jascha Wanger / ThirdKey AI
Date 2026-05-19
License CC BY 4.0
DOI 10.5281/zenodo.20298543
Abstract

Zero-Trust Agent Execution Through Structural Enforcement

As AI systems evolve from assistants into autonomous agents executing consequential actions, the security boundary shifts from model outputs to tool execution. Traditional security paradigms — log aggregation, perimeter defense, post-hoc forensics, and runtime interception of fully-formed actions — cannot adequately protect systems where AI-driven actions are irreversible, execute at machine speed, and originate from potentially compromised orchestration layers.

This paper introduces the Open Agent Trust Stack (OATS), an open specification for zero-trust AI agent execution. OATS is built on three architectural convictions.

Conviction 01

Allow-list enforcement

Rather than intercepting arbitrary actions and deciding which to block, OATS constrains what actions can be expressed through declarative tool contracts, making dangerous actions structurally inexpressible.

Conviction 02

Compile-time enforcement

The Observe-Reason-Gate-Act (ORGA) reasoning loop uses typestate programming so that skipping the policy gate is a type error, not a runtime bug.

Conviction 03

Structural independence

The Gate phase is architecturally isolated from LLM influence.

OATS specifies five layers: (1) the ORGA reasoning loop with compile-time phase enforcement, (2) declarative tool contracts with typed parameter validation, (3) a cryptographic identity stack providing bidirectional trust between agents and tools, (4) a formally verifiable policy engine operating on structured inputs, and (5) hash-chained cryptographic audit journals with Ed25519 signatures for tamper-evident forensic reconstruction.

OATS is model-agnostic, framework-agnostic, and vendor-neutral. The architecture is informed by operational experience with a production runtime (Symbiont) that has operated autonomously for approximately nine months. Initial empirical results validating five of seven core conformance requirements are now available through three companion preprints and the symbiont-orga-demo reference corpus; results are summarized in Section 14.7. The specification continues to stand independently of any single implementation, and remaining empirical work is identified as future deliverables.

Figure 0 · Trust stack
The trust stack. Five layers, each addressing a distinct security question. The ORGA loop (Layer 1) enforces that the Gate runs; tool contracts (Layer 2) constrain expressible actions; the identity stack (Layer 3) provides mutual authentication; the policy engine (Layer 4) evaluates authorization; the audit journal (Layer 5) records decisions.
Section 01

Introduction

1.1 The Runtime Security Gap

AI agents now execute consequential actions across enterprise systems: querying databases, sending communications, modifying files, invoking cloud services, and managing credentials. Through function calling, plugins, external APIs, and protocol-based tool servers such as the Model Context Protocol (MCP), these agents perform multi-step tasks without human intervention.

These actions exhibit five characteristics that existing security paradigms cannot adequately address:

  1. Irreversibility. Tool executions produce immediate and often permanent effects: database mutations, financial transactions, credential changes, or data exfiltration. Once executed, the damage is done.
  2. Speed. Agents execute hundreds of tool calls per minute, far exceeding human capacity for real-time review. Multi-step attack chains complete within seconds.
  3. Compositional risk. Individual actions may each satisfy policy while their composition constitutes a violation. Reading a confidential file is permitted; sending email is permitted; doing both in sequence may constitute exfiltration.
  4. Untrusted orchestration. Prompt injection and indirect instruction attacks mean the model's apparent intent cannot be trusted. Adversarial prompts can be embedded in documents, emails, and images that agents process.
  5. Privilege amplification. Agents routinely operate under static, high-privilege identities misaligned with the principle of least privilege.

The gap in the current security landscape lies at the intersection of prevention and context-awareness: no existing system can block actions before execution based on both static policy and accumulated session context while simultaneously constraining what actions can be expressed in the first place. This is the gap that OATS addresses.

This paper makes two contributions: a normative system specification defining the runtime enforcement boundary for autonomous agent execution, and an implementation-grounded evaluation methodology derived from operational experience with a production runtime. The specification is the primary artifact; the evaluation framework (Section 14) is included to make the claims falsifiable and to enable comparable evaluation of future implementations. The contribution is a new runtime security abstraction with testable conformance properties, not a benchmark of one particular system.

1.2 Design Principles

OATS is built on three architectural convictions, each addressing a structural weakness in current approaches.

Core Thesis

Define what is permitted and make everything else structurally inexpressible, rather than trying to enumerate and block what is dangerous.

Allow-list over deny-list

Current runtime security approaches operate on a deny-list model: the agent formulates an action, a security system intercepts it, evaluates it, and decides whether to allow or block. This requires enumerating dangerous behavior — an enumeration that is incomplete by definition. OATS inverts this model. The agent fills typed parameters defined by a declarative tool contract. The runtime validates parameters against the contract, constructs the invocation from a template, and executes. The agent never generates raw commands or constructs unconstrained API calls. Within the scope of contracted tools, dangerous actions cannot be expressed because the interface does not permit them. Actions that bypass the contract layer entirely (e.g., direct network calls from compromised agent code) require complementary controls such as sandboxing (Section 10).

Compile-time over runtime enforcement

When enforcement correctness is verified only at runtime, a code change that introduces a path bypassing the policy engine goes undetected until that path is exercised. OATS addresses this through the Observe-Reason-Gate-Act (ORGA) cycle, which uses type-level programming (typestates) so that skipping the Gate phase, dispatching tools without reasoning first, or observing results without dispatching are compile-time errors. In a correctly implemented typestate, the type system enforces that every action passes through policy evaluation. This property holds for code paths within the typestate-governed loop; it does not extend to code that circumvents the loop entirely, which is why sandboxing and network isolation provide complementary enforcement.

Structural independence over trust assumptions

When the policy engine shares context, memory, and execution environment with the orchestration layer it governs, an LLM compromised through prompt injection can potentially influence the evaluation of its own actions. In OATS, the Gate phase receives a structured action proposal and evaluates it against policy using a formally verifiable policy engine. The LLM cannot modify, bypass, or influence the Gate's evaluation.

1.3 Contributions

This specification makes six contributions:

  1. Typestate-enforced reasoning loop. The ORGA cycle with compile-time phase enforcement, designed to prevent policy evaluation from being skipped, circumvented, or reordered within the loop (Section 5).
  2. Allow-list tool contracts. A declarative tool contract format that constrains agent-tool interaction to typed, validated parameters, making dangerous actions structurally inexpressible (Section 6).
  3. Layered cryptographic identity. A bidirectional identity stack providing mutual authentication between agents and tools via domain-anchored cryptographic verification (Section 7).
  4. Hash-chained audit journals. Cryptographically signed, hash-chained event journals for tamper-evident forensic reconstruction (Section 9).
  5. Conformance requirements. Minimum requirements for OATS-compliant systems, enabling objective evaluation of implementations (Section 12).
  6. Initial empirical evaluation. Initial results from the symbiont-orga-demo reference corpus and three companion preprints validate five of seven core conformance requirements. The corpus measures attack-suite refusal rates, fence non-redundancy via stack-stripping ablation, runtime overhead across nine widely available hosted LLMs, and — via the substrate-comparison preprint — the marginal contribution of structural enforcement relative to OS-isolation alternatives (Section 14.7).

OATS's novelty is not any single component in isolation — typestates, policy engines, cryptographic signatures, audit logs, and sandboxing each have extensive prior art. The contribution is the integration of five layers into a unified runtime security model centered on consequential action execution, with three properties not found in prior work in combination: (a) expressibility constraints that eliminate action categories before policy evaluation, (b) compile-time enforcement that the policy gate executes on every dispatch path within the loop, and (c) bidirectional cryptographic identity binding actions to verified agents and verified tools. The conformance requirements formalize these properties into testable criteria, enabling objective comparison across implementations.

Section 03

Problem Formalization

This section formalizes the system model, action definitions, and security objectives that the remainder of the specification builds upon.

3.1 System Model

Let an AI-enabled application \(\mathcal{A}\) consist of:

  • An orchestration layer \(O\) (agent framework, workflow engine, or application code) that interprets user requests and invokes tools. \(O\) includes the LLM, prompt templates, memory systems, and control flow logic. Crucially, \(O\) processes untrusted inputs and cannot be assumed to behave as intended.
  • A set of tools \(T = \{t_1, t_2, \ldots, t_n\}\), where each tool \(t_i\) exposes operations producing effects on external systems.
  • An identity context \(I\) comprising four layers: human principal, service identity, agent/session identity, and role/privilege scope.
  • An environment \(E\) including data stores, APIs, cloud services, and enterprise systems.
  • A session context \(C\) that accumulates state over the course of an interaction.

3.2 Action Definition

An action \(a\) is a discrete operation the agent requests against a tool:

$$a = (t,\ op,\ p,\ id,\ ctx,\ ts)$$

where \(t \in T\) is the target tool, \(op\) is the specific operation, \(p\) is the parameter set, \(id \in I\) is the identity context, \(ctx \in C\) is the accumulated session context, and \(ts\) is the timestamp.

3.3 Tool Contract Definition

A tool contract \(\kappa\) defines the complete behavioral interface for a tool:

$$\kappa = (name,\ \Pi,\ \tau,\ \sigma_{out},\ \mu,\ \rho)$$

where \(\Pi = \{\pi_1, \ldots, \pi_m\}\) is the set of typed parameter definitions, \(\tau\) is the invocation template, \(\sigma_{out}\) is the output schema, \(\mu\) is the policy metadata (resource, action), and \(\rho\) is the risk tier.

Each parameter definition \(\pi_i = (name_i,\ type_i,\ V_i,\ req_i)\) specifies the parameter name, its type from the type system \(\mathcal{T}\), validation constraints \(V_i\), and whether it is required. The type system \(\mathcal{T}\) provides:

$$\mathcal{T} = \{\textit{string},\ \textit{integer},\ \textit{boolean},\ \textit{enum},\ \textit{scope\_target},\ \textit{url},\ \textit{path},\ \textit{ip\_address},\ \textit{cidr},\ \textit{port}\}$$

Each type \(\tau \in \mathcal{T}\) (here \(\tau\) ranges over types and is distinct from the invocation template \(\tau\) of the contract tuple) has an associated validation function \(v_\tau : \text{Value} \to \{valid, invalid\}\) and a sanitization function \(s_\tau : \text{Value} \to \text{Value}\) that strips dangerous characters.
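The validation functions \(v_\tau\) can be illustrated with a minimal sketch. It assumes string-encoded values and covers only five of the ten types; the `ParamType` enum, the `validate` function, the treatment of port 0 as invalid, and the rejection of empty strings are assumptions made for illustration, not part of the specification.

```rust
use std::net::IpAddr;

/// Hypothetical subset of the type system T (names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ParamType {
    Str,
    Integer,
    Port,
    IpAddress,
    Cidr,
}

/// Shell metacharacters that string-based types reject by default.
const SHELL_META: &str = ";|&$`\\(){}[]<>!";

/// True when `s` is non-empty and consists only of ASCII digits.
fn all_digits(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit())
}

/// v_type: returns true when `value` is valid for `ty`.
pub fn validate(ty: ParamType, value: &str) -> bool {
    match ty {
        // Assumption: empty strings are rejected.
        ParamType::Str => {
            !value.is_empty() && !value.chars().any(|c| SHELL_META.contains(c))
        }
        ParamType::Integer => value.parse::<i64>().is_ok(),
        // Assumption: port 0 is treated as invalid.
        ParamType::Port => {
            all_digits(value) && matches!(value.parse::<u16>(), Ok(p) if p != 0)
        }
        ParamType::IpAddress => value.parse::<IpAddr>().is_ok(),
        ParamType::Cidr => match value.split_once('/') {
            Some((addr, len)) if all_digits(len) => {
                match (addr.parse::<IpAddr>(), len.parse::<u8>()) {
                    (Ok(IpAddr::V4(_)), Ok(n)) => n <= 32,
                    (Ok(IpAddr::V6(_)), Ok(n)) => n <= 128,
                    _ => false,
                }
            }
            _ => false,
        },
    }
}
```

Because each validator is a total function from a string to a boolean, the runtime can reject a proposal before any invocation is constructed. For example, `validate(ParamType::Str, "a; rm -rf /")` returns false because of the semicolon.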

3.4 Constrained Action Formulation

Under OATS, the agent does not formulate arbitrary actions. Instead, the agent proposes a parameterized invocation:

$$a' = (t,\ op,\ p')$$

where \(p' = \{(name_i, val_i)\}\) maps parameter names to values. The runtime validates each value:

$$\forall (name_i, val_i) \in p' :\ v_{type_i}(val_i) = valid$$

If validation succeeds, the runtime constructs the executable action from the template:

$$a_{exec} = \tau(p')\ \text{where } \tau \text{ is the invocation template}$$

The agent never sees or constructs \(a_{exec}\). This is the allow-list property: the space of expressible actions is constrained to \(\{a_{exec} : a_{exec} = \tau(p'),\ \forall (name_i, val_i) \in p',\ v_{type_i}(val_i) = valid\}\).
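The allow-list property can be sketched as follows. This is an illustration under stated assumptions, not the reference executor: the `Contract`, `ParamDef`, and `BuildError` types, the `{name}` placeholder syntax, and the `portcheck` tool are all hypothetical. The sketch builds an argument vector rather than a shell string, which is one way to keep validated values from being re-parsed by a shell.

```rust
use std::collections::HashMap;

/// One parameter definition: name, validator, required flag.
pub struct ParamDef {
    pub name: &'static str,
    pub validate: fn(&str) -> bool,
    pub required: bool,
}

/// Hypothetical contract: typed parameters plus an argv-style template.
pub struct Contract {
    pub tool: &'static str,
    pub params: Vec<ParamDef>,
    /// Tokens of the form "{name}" are placeholders for parameter values.
    pub template: Vec<&'static str>,
}

#[derive(Debug, PartialEq)]
pub enum BuildError {
    UnknownParam(String),
    MissingParam(String),
    Invalid(String),
}

/// Validates the proposal p' and, only if every value is valid,
/// constructs the executable invocation from the template.
pub fn build_invocation(
    c: &Contract,
    proposal: &HashMap<String, String>,
) -> Result<Vec<String>, BuildError> {
    // Names the contract does not declare cannot be expressed at all.
    for name in proposal.keys() {
        if !c.params.iter().any(|p| p.name == name.as_str()) {
            return Err(BuildError::UnknownParam(name.clone()));
        }
    }
    // Every supplied value must pass its validator; required ones must exist.
    for p in &c.params {
        match proposal.get(p.name) {
            Some(v) if !(p.validate)(v.as_str()) => {
                return Err(BuildError::Invalid(p.name.to_string()))
            }
            None if p.required => {
                return Err(BuildError::MissingParam(p.name.to_string()))
            }
            _ => {}
        }
    }
    // Template expansion: the agent never sees or builds this vector.
    let mut argv = Vec::new();
    for tok in &c.template {
        match tok.strip_prefix('{').and_then(|t| t.strip_suffix('}')) {
            Some(name) => {
                if let Some(v) = proposal.get(name) {
                    argv.push(v.clone());
                }
            }
            None => argv.push(tok.to_string()),
        }
    }
    Ok(argv)
}

fn is_hostname(v: &str) -> bool {
    !v.is_empty() && v.chars().all(|c| c.is_ascii_alphanumeric() || c == '.' || c == '-')
}

fn is_port(v: &str) -> bool {
    v.bytes().all(|b| b.is_ascii_digit()) && matches!(v.parse::<u16>(), Ok(p) if p != 0)
}

/// Hypothetical contract for an imaginary `portcheck` tool.
pub fn example_contract() -> Contract {
    Contract {
        tool: "port_check",
        params: vec![
            ParamDef { name: "host", validate: is_hostname, required: true },
            ParamDef { name: "port", validate: is_port, required: true },
        ],
        template: vec!["portcheck", "--host", "{host}", "--port", "{port}"],
    }
}
```

With this contract, a proposal whose `host` value is `example.com; rm -rf /` fails validation, so no invocation is constructed; the injected text never reaches a template.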

3.5 Context Accumulation

Session context accumulates across actions:

$$C_n = C_{n-1} \cup \{a_n,\ o_n,\ \delta_n\}$$

where \(C_n\) is the context after action \(n\), \(a_n\) is the action, \(o_n\) is its output, and \(\delta_n\) represents derived signals including data classification, semantic distance from the original request, scope expansion indicators, entity references, and confidence level.
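A minimal sketch of context accumulation, assuming an append-only log and a single derived signal (data classification). The `Entry`, `SessionContext`, and `allows` names, and the tool names `read_file` and `send_email`, are hypothetical; the check mirrors the read-then-email composition described in Section 1.1.

```rust
/// One derived signal from delta_n; the others are omitted in this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Classification {
    Public,
    Confidential,
}

/// A reduced record of (a_n, o_n, delta_n).
#[derive(Debug, Clone)]
pub struct Entry {
    pub tool: String,                   // a_n, reduced to the tool name
    pub output_len: usize,              // stand-in for the output o_n
    pub classification: Classification, // part of delta_n
}

/// C_n: grows monotonically, entries are never removed.
#[derive(Debug, Default)]
pub struct SessionContext {
    entries: Vec<Entry>,
}

impl SessionContext {
    /// C_n = C_{n-1} union {a_n, o_n, delta_n}
    pub fn record(&mut self, e: Entry) {
        self.entries.push(e);
    }

    pub fn touched_confidential(&self) -> bool {
        self.entries
            .iter()
            .any(|e| e.classification == Classification::Confidential)
    }
}

/// Hypothetical compositional rule: email is allowed on its own, but not
/// once the session has read confidential data.
pub fn allows(ctx: &SessionContext, next_tool: &str) -> bool {
    !(next_tool == "send_email" && ctx.touched_confidential())
}
```

The same `send_email` proposal is allowed on an empty context and denied after a confidential read has been recorded, which is the behavior a stateless per-action check cannot express.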

3.6 Policy Structure

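A policy \(\pi \in \Pi\) maps an action-context-identity triple to an authorization decision (in this section and in Section 3.7, \(\Pi\) denotes the organizational policy set and \(\pi\) a single policy, not the parameter definitions of Section 3.3):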

$$\pi : (a,\ C,\ I) \to \{\textit{ALLOW},\ \textit{DENY},\ \textit{MODIFY},\ \textit{STEP\_UP},\ \textit{DEFER}\}$$

Each policy consists of a match predicate \(m(a, C, I) \to \{true, false\}\), a decision \(d\), a priority \(p \in \mathbb{N}\), and an optional modification function \(f(a) \to a'\) applied when \(d = \textit{MODIFY}\).
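The policy structure can be sketched as a list of (predicate, decision, priority) records. The specification text above does not say how competing matches are resolved or what happens when nothing matches; this sketch assumes that the highest-priority match wins and that the default is Deny. All type and function names are hypothetical, and the modification function \(f\) is omitted.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Decision {
    Allow,
    Deny,
    Modify,
    StepUp,
    Defer,
}

/// Reduced stand-ins for the action a, context C, and identity I.
pub struct Action {
    pub tool: &'static str,
    pub op: &'static str,
}
pub struct Context {
    pub confidential_read: bool,
}
pub struct Identity {
    pub role: &'static str,
}

/// A policy: match predicate m, decision d, priority p.
pub struct Policy {
    pub matches: fn(&Action, &Context, &Identity) -> bool,
    pub decision: Decision,
    pub priority: u32,
}

/// Assumption: the highest-priority matching policy decides, and an
/// action that matches no policy is denied.
pub fn evaluate(policies: &[Policy], a: &Action, c: &Context, i: &Identity) -> Decision {
    policies
        .iter()
        .filter(|p| (p.matches)(a, c, i))
        .max_by_key(|p| p.priority)
        .map(|p| p.decision)
        .unwrap_or(Decision::Deny)
}

fn is_email_send(a: &Action, _c: &Context, _i: &Identity) -> bool {
    a.tool == "email" && a.op == "send"
}

fn email_after_confidential(a: &Action, c: &Context, i: &Identity) -> bool {
    is_email_send(a, c, i) && c.confidential_read
}

/// Hypothetical policy set: email is allowed, unless confidential data
/// has been read earlier in the session.
pub fn example_policies() -> Vec<Policy> {
    vec![
        Policy { matches: is_email_send, decision: Decision::Allow, priority: 10 },
        Policy { matches: email_after_confidential, decision: Decision::Deny, priority: 100 },
    ]
}
```

Under these assumptions the higher-priority deny rule overrides the general allow rule once the context records a confidential read.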

3.7 Security Objectives

An OATS-compliant runtime MUST ensure that for all actions \(a\):

  1. Structural constraint. \(a\) is expressible only through a valid tool contract \(\kappa\).
  2. Pre-execution interception. \(a\) is intercepted and evaluated before any effects occur.
  3. Compile-time gate guarantee. All code paths from action proposal to tool dispatch pass through the Gate phase; this is verified at compile time.
  4. Policy compliance. \(a\) satisfies organizational policy \(\Pi\) given context \(C\) and identity \(I\).
  5. Context-aware evaluation. \(a\) is evaluated against both static policy and accumulated session context.
  6. Identity verification. Both the agent invoking a tool and the tool being invoked are cryptographically verified.
  7. Forensic completeness. Every action, its context, the policy decision, and the execution outcome are recorded in a tamper-evident journal.
Section 04

Threat Model

Fundamental Assumption

The AI orchestration layer O cannot be trusted as a security boundary. The model processes untrusted inputs through opaque reasoning, producing actions that may serve attacker goals rather than user intent.

4.1 Threat Summary

The table below summarizes the primary threats, their attack vectors, and the OATS controls that mitigate them.

Threat | Attack Vector | OATS Control
Prompt injection | User input, documents, tool outputs, images | Tool contracts (structural), policy enforcement, context-dependent deny
Malicious tool outputs | Adversarial tool responses | Post-tool action restrictions, context tracking, output schema validation
Confused deputy | Ambiguous/malicious instructions | Bidirectional identity verification, step-up approval, intent alignment
Over-privileged credentials | Excessive token scopes | Least-privilege enforcement, scoped credentials
Data exfiltration | Action composition | Context accumulation, compositional policies, scope enforcement
Goal hijacking | Injected objectives | Action-level policy, semantic distance tracking
Intent drift | Agent reasoning divergence | Context accumulation, semantic distance threshold, deferral
Memory poisoning | Persistent context manipulation | Provenance tracking, anomaly detection, journal comparison
Tool supply chain | Tampered contracts, spoofed tools | Cryptographic tool integrity verification, TOFU pinning
Cross-agent propagation | Multi-agent delegation | Cross-agent context, transitive trust limits, blast-radius containment

4.2 Attack Lifecycle

Attacks against AI agents typically follow a four-stage lifecycle: (1) injection, in which the attacker embeds malicious instructions in content the agent processes; (2) hijacking, in which the agent interprets the malicious content as legitimate instructions; (3) execution, in which the agent invokes tools with attacker-controlled parameters; and (4) impact, in which the actions produce irreversible effects. OATS intervenes twice between hijacking and execution: tool contracts constrain which parameters the agent can express at all, and the Gate blocks any expressible action that violates policy.

Intent drift follows a different lifecycle without explicit injection. The agent's reasoning gradually diverges through plausible-seeming steps. OATS detects this through context accumulation and semantic distance tracking regardless of cause.

4.3 Trust Assumptions

Trusted
  • The OATS runtime (ORGA loop, policy engine, tool contract executor, journal, identity verifier)
  • Cryptographic primitives and key management
  • The policy store and authoring process
  • The underlying infrastructure (OS, network, hardware)
  • The compiler and type system
Untrusted
  • The AI model and its outputs
  • The orchestration layer
  • User inputs and prompts
  • Tool outputs and retrieved data
  • External documents, emails, web content, multi-modal inputs
  • Agent memory and conversation history
  • Tool contracts from unverified publishers
Partially Trusted
  • Tool implementations (OATS constrains invocation but cannot prevent internal bugs)
  • Human approvers (OATS routes step-up authorization but cannot prevent social engineering)
  • Verified tool contracts (verified as untampered, but the tool itself may have vulnerabilities)

4.4 Out of Scope

OATS addresses runtime action security. The following threats require complementary controls: model training data poisoning or weight manipulation (pre-deployment ML security); denial of service against the OATS runtime (infrastructure availability); physical or infrastructure-level attacks (physical security); social engineering of human approvers (security awareness training); code-level vulnerabilities within tool implementations (application security testing); memory storage security (separate storage controls). OATS is one layer in a defense-in-depth strategy.

Section 05

Core Architecture: The ORGA Loop

The ORGA (Observe-Reason-Gate-Act) loop is the core execution engine for OATS-compliant agent runtimes. It drives a multi-turn cycle between an LLM, a policy gate, and external tools through four mandatory phases.

5.1 Phase Definitions

Observe

Collect results from previous tool executions. Incorporate tool outputs, error messages, policy denial feedback, and environmental signals into the agent's context. This phase also integrates knowledge retrieval (RAG-enhanced context) when available.

Reason

The LLM processes accumulated context and produces proposed actions (tool calls or text responses). The LLM sees tool definitions but never sees raw invocation details. The LLM's output is a structured proposal, not an executable action.

Gate

The policy engine evaluates each proposed action. This phase operates entirely outside LLM influence. The Gate receives the proposed action, the accumulated session context, and the agent's identity, and evaluates them against organizational policy. The Gate produces one of five decisions: Allow, Deny, Modify, Step-Up (pause for human approval), or Defer (temporarily suspend pending additional context).

Act

Approved actions are dispatched to tool executors. The tool contract executor validates parameters against the contract's type system, constructs the invocation from the contract's template, executes with timeout enforcement, captures output in a structured evidence envelope, and records the execution in the audit journal.

5.2 Typestate Enforcement

Phase transitions MUST be enforced at compile time using type-level programming (typestates). Each phase is a distinct type. The transition from Reason to Act without passing through Gate MUST be a type error, not a runtime check.

AgentLoop<Reasoning>    -- produce_output() -->  AgentLoop<PolicyCheck>
AgentLoop<PolicyCheck>  -- check_policy()  -->  AgentLoop<ToolDispatching>
AgentLoop<ToolDispatching> -- dispatch()   -->  AgentLoop<Observing>
AgentLoop<Observing>    -- observe()       -->  AgentLoop<Reasoning> | LoopResult

The following are compile-time errors:

  • Skipping the policy check (Reasoning to ToolDispatching)
  • Dispatching tools without reasoning (Observing to ToolDispatching)
  • Observing results without dispatching (Reasoning to Observing)

Implementations in languages without native typestate support MUST provide equivalent guarantees through runtime enforcement with 100% path coverage testing and formal verification that all tool dispatch paths pass through the Gate.

5.3 Dynamic Branching and Termination

The only dynamic branch in the ORGA loop is after Observe: the loop either continues (returning to Reason) or completes (producing a final result). This is a standard pattern match on a concrete type, not dynamic dispatch. All other transitions are strictly linear.

The loop terminates when the LLM produces a final text response, iteration limits are reached, token or time budgets are exhausted, or a circuit breaker trips.
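The termination conditions can be sketched as a single check evaluated after each Observe phase. The field names, the `Stop` enum, and the order in which conditions are tested are assumptions made for illustration; the specification names the conditions but not their precedence.

```rust
use std::time::Duration;

/// Hypothetical limits configured for one loop run.
pub struct Budget {
    pub max_iterations: u32,
    pub max_tokens: u64,
    pub max_elapsed: Duration,
}

/// Hypothetical snapshot of loop progress.
pub struct LoopState {
    pub iterations: u32,
    pub tokens_used: u64,
    pub elapsed: Duration,
    pub final_response: bool,
    pub breaker_tripped: bool,
}

#[derive(Debug, PartialEq)]
pub enum Stop {
    FinalResponse,
    IterationLimit,
    TokenBudget,
    TimeBudget,
    CircuitBreaker,
}

/// Returns the reason to stop, or None when the loop may return to Reason.
/// Assumption: the circuit breaker is checked first.
pub fn should_stop(s: &LoopState, b: &Budget) -> Option<Stop> {
    if s.breaker_tripped {
        return Some(Stop::CircuitBreaker);
    }
    if s.final_response {
        return Some(Stop::FinalResponse);
    }
    if s.iterations >= b.max_iterations {
        return Some(Stop::IterationLimit);
    }
    if s.tokens_used >= b.max_tokens {
        return Some(Stop::TokenBudget);
    }
    if s.elapsed >= b.max_elapsed {
        return Some(Stop::TimeBudget);
    }
    None
}
```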

5.4 Policy Denial Feedback

When the Gate denies an action, the denial reason MUST be fed back to the LLM as an observation. This allows the LLM to adjust its approach. The Gate evaluates each subsequent proposal independently; denials are not negotiable.

5.5 Scope of Assurance

Typestate enforcement provides a specific, bounded property. To prevent overinterpretation, we state exactly what is and is not covered.

What typestate enforcement covers

Within the ORGA loop, the type system enforces that every transition from action proposal to tool dispatch passes through the Gate phase. In a Rust implementation, this is a compile-time property: any code path that attempts to go from AgentLoop<Reasoning> to AgentLoop<ToolDispatching> without consuming an intermediate AgentLoop<PolicyCheck> is rejected by the compiler.

Proof sketch. Let R, P, D, O denote the Reasoning, PolicyCheck, ToolDispatching, and Observing phases. Each phase is a distinct zero-sized type. The only method consuming R produces P; the only method consuming P produces D; the only method consuming D produces O; and O produces either R (continue) or a terminal value (complete). Because each method takes self by value (consuming the prior state), no valid Rust program can hold two phase values simultaneously or skip a phase. The compiler's ownership and move semantics enforce this without runtime checks. This argument depends on the type signatures being correctly declared; it does not require trust in runtime behavior.
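The proof sketch can be made concrete in a few lines of Rust. The following is a minimal sketch, not the Symbiont implementation: the phase and method names follow the transition diagram in Section 5.2, while the `log` field, the `advance` helper, the `Next` enum, and the `finished` flag are assumptions added for illustration. LLM calls, policy decisions, and tool execution are omitted.

```rust
use std::marker::PhantomData;

// Each phase is a distinct zero-sized type.
pub struct Reasoning;
pub struct PolicyCheck;
pub struct ToolDispatching;
pub struct Observing;

/// The loop, parameterized by its current phase.
pub struct AgentLoop<Phase> {
    log: Vec<&'static str>,
    _phase: PhantomData<Phase>,
}

impl<P> AgentLoop<P> {
    /// Private helper: consumes the current state and yields the next one.
    fn advance<Q>(mut self, step: &'static str) -> AgentLoop<Q> {
        self.log.push(step);
        AgentLoop { log: self.log, _phase: PhantomData }
    }
}

impl AgentLoop<Reasoning> {
    pub fn new() -> Self {
        AgentLoop { log: Vec::new(), _phase: PhantomData }
    }
    /// The only method that consumes Reasoning; it produces PolicyCheck.
    pub fn produce_output(self) -> AgentLoop<PolicyCheck> {
        self.advance("reason")
    }
}

impl AgentLoop<PolicyCheck> {
    /// The only way to obtain a ToolDispatching state.
    pub fn check_policy(self) -> AgentLoop<ToolDispatching> {
        self.advance("gate")
    }
}

impl AgentLoop<ToolDispatching> {
    pub fn dispatch(self) -> AgentLoop<Observing> {
        self.advance("act")
    }
}

/// The single dynamic branch: continue or complete.
pub enum Next {
    Continue(AgentLoop<Reasoning>),
    Done(Vec<&'static str>),
}

impl AgentLoop<Observing> {
    pub fn observe(self, finished: bool) -> Next {
        let next: AgentLoop<Reasoning> = self.advance("observe");
        if finished { Next::Done(next.log) } else { Next::Continue(next) }
    }
}

// The following line does not compile, because `dispatch` is defined
// only for AgentLoop<ToolDispatching>:
//     AgentLoop::<Reasoning>::new().dispatch();
```

Because every transition method takes `self` by value, a caller cannot retain an earlier phase after advancing, and the only route to `dispatch` is through `check_policy`. As stated above, the guarantee covers only code that goes through these types.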

What typestate enforcement does not cover

The property applies only to code paths mediated by the AgentLoop runner. It does not provide whole-program non-bypass assurance. Specifically: (a) agent code that invokes tools through a separate code path not mediated by the ORGA runner is unconstrained by the typestate; (b) plugins, FFI calls, or dynamically loaded code may bypass the loop; (c) network-level tool invocations from within the sandbox are not mediated by the type system. These residual risks are addressed by sandboxing (Section 10) and network isolation as defense-in-depth layers, not by the typestate itself. An OATS-compliant deployment SHOULD combine typestate enforcement with at least one complementary isolation mechanism.

5.6 Gate Independence

The Gate phase is designed to operate outside LLM influence. This section specifies what “outside LLM influence” means concretely and what an implementation must demonstrate.

Structural requirements

The Gate MUST receive input as a typed, serialized data structure (e.g., a Rust struct, a JSON object conforming to a fixed schema) containing: tool name, operation, validated parameters, agent identity, and accumulated session context. The Gate MUST NOT receive natural language strings, LLM reasoning traces, or any content that requires language interpretation. The Gate MUST NOT share mutable memory, mutable references, or writable state with the LLM inference component. The Gate MUST NOT expose a callback, hook, or API that the LLM can invoke to modify Gate behavior during evaluation.

Implementation patterns

Conformant implementations may achieve Gate independence through any of the following mechanisms, listed in decreasing order of isolation strength: (a) separate process with IPC serialization boundary; (b) separate thread with immutable message passing and no shared mutable state; (c) synchronous function call with typed struct input, no closures capturing LLM state, and no interior mutability accessible from the LLM component. Pattern (a) provides the strongest isolation. Pattern (c) is acceptable when the implementation can demonstrate (via code review or static analysis) that no shared mutable path exists.
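Pattern (b) can be sketched with standard-library channels. This is an illustration under stated assumptions, not a conformant Gate: the `Proposal`, `Verdict`, and `GateHandle` types and the hard-coded rule in `decide` are hypothetical. The proposal is moved into the channel, so the Gate thread owns its input and shares no mutable state with the caller, and the sketch fails closed if the Gate thread is unavailable.

```rust
use std::sync::mpsc;
use std::thread;

/// Typed, owned input to the Gate; no natural-language fields.
#[derive(Debug, Clone)]
pub struct Proposal {
    pub tool: String,
    pub operation: String,
    pub agent_id: String,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Verdict {
    Allow,
    Deny(String),
}

/// Hypothetical policy: deleting files is never permitted.
fn decide(p: &Proposal) -> Verdict {
    if p.tool == "file" && p.operation == "delete" {
        Verdict::Deny("file deletion is not permitted".to_string())
    } else {
        Verdict::Allow
    }
}

/// Handle held by the loop runner; it can only send proposals.
pub struct GateHandle {
    tx: mpsc::Sender<(Proposal, mpsc::Sender<Verdict>)>,
}

/// Spawns the Gate on its own thread and returns a handle to it.
pub fn spawn_gate() -> GateHandle {
    let (tx, rx) = mpsc::channel::<(Proposal, mpsc::Sender<Verdict>)>();
    thread::spawn(move || {
        // The loop ends when every sender has been dropped.
        for (proposal, reply) in rx {
            let _ = reply.send(decide(&proposal));
        }
    });
    GateHandle { tx }
}

impl GateHandle {
    /// Moves the proposal to the Gate thread and waits for the verdict.
    pub fn evaluate(&self, p: Proposal) -> Verdict {
        let (reply_tx, reply_rx) = mpsc::channel();
        if self.tx.send((p, reply_tx)).is_err() {
            return Verdict::Deny("gate unavailable".to_string());
        }
        reply_rx
            .recv()
            .unwrap_or(Verdict::Deny("gate unavailable".to_string()))
    }
}
```

A separate process with an IPC boundary, pattern (a), would follow the same message shape with serialization added at the channel boundary.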

Verification

Conformance requirement C6 (Section 12) defines the verification procedure: inspect the Gate implementation, verify inputs are typed structs, verify no shared mutable references, verify no dynamic code paths parameterized by LLM output.

Figure 1 · ORGA loop
[Figure: the four ORGA phases, Observe, Reason, Gate, and Act. Observe and Reason sit in the LLM region; the Gate sits in an isolated policy region outside LLM influence. Transitions: observe(), produce_output(), check_policy(), dispatch(). A DENY decision feeds the denial reason back as an observation.]

ORGA loop with typestate-enforced phase transitions. The Gate is mandatory and structurally independent of the LLM. Denied actions produce feedback observations; approved actions proceed to tool execution. Skipping Gate is a compile error, not a runtime bug.
Section 06

Tool Contract Layer

6.1 The Allow-List Principle

OATS inverts the conventional sandbox model. Rather than allowing the LLM to generate arbitrary actions and then intercepting them for post-hoc evaluation, the allow-list model has the LLM fill typed parameters that the executor validates against the contract before constructing the invocation from a template. Dangerous actions are structurally inexpressible because the interface does not permit them.

  • Deny-list (sandbox): LLM generates an arbitrary action. The security system intercepts, evaluates, and decides whether to allow or block. Risk: novel bypass escapes deny rules.
  • Allow-list (tool contract): LLM fills typed parameters. The executor validates against contract, constructs from template, and executes. Injection cannot form an action; shell metacharacters are rejected by the type system.

6.2 Contract Requirements

A tool contract \(\kappa\) MUST define:

  1. Typed parameters. Each parameter has a declared type from the type system \(\mathcal{T}\) with validation constraints. All string-based types MUST reject shell metacharacters (; | & $ ` \ ( ) { } [ ] < > !) by default.
  2. Invocation mechanism. Command template, HTTP request template, protocol server address, or interactive session definition. The LLM never constructs invocation details.
  3. Output schema. Expected structure of tool output. The executor validates parsed output before returning results to the agent.
  4. Policy metadata. Policy resource and action declarations enabling authorization without parsing tool-specific details.
  5. Risk tier. Risk classification (low, medium, high, critical) informing default policy generation and step-up thresholds.

6.3 Execution Modes

Tool contracts SHOULD support three execution modes sharing a common governance layer:

Mode | Description | Governance
Oneshot | Single invocation, return results | Per-invocation Gate evaluation
Session | Running process (PTY), per-interaction validation | Per-interaction Gate evaluation
Browser | Governed browser (CDP/Playwright), scoped navigation | Per-action Gate evaluation

6.4 Contract Integrity

Tool contracts MUST support cryptographic integrity verification. Signatures MUST cover the entire contract — parameters, validation rules, invocation templates, output schemas, and scope constraints. A contract failing verification MUST be rejected.

6.5 Schema Generation

Tool contracts SHOULD support automatic generation of protocol-compatible schemas (e.g., MCP inputSchema and outputSchema) from the contract definition. The LLM understands tool capabilities through generated schemas without the contract format being exposed.

6.6 Content Sanitization · new in v1.3.0

Argument validation against shell metacharacters (§6.2) defends against action-shape attacks but does not address all content-shape attacks: string parameters that pass type validation while carrying invisible or visually deceptive characters intended to alter the semantics of downstream prompts, journals, or human review. Empirical evidence from the reference implementation and the substrate-comparison sweep establishes that content-shape attacks require a dedicated fence; structural action fences cannot defend against them.

Sanitization requirement (SHOULD). Tool contract executors and journal writers SHOULD sanitize agent-influenced string fields before they reach the agent's reasoning context, the journal, or downstream tools. At minimum, sanitization SHOULD remove:

  • ASCII control characters (C0 / DEL except \t, \n, \r where context-appropriate)
  • C1 controls (U+0080..=U+009F)
  • Zero-width characters (U+200B..=U+200F)
  • Bidi overrides (U+202A..=U+202E, U+2066..=U+2069)
  • Word-joiner / invisible-operator block (U+2060..=U+2064)
  • Byte-order marks (U+FEFF)
  • Variation selectors (U+FE00..=U+FE0F, U+E0100..=U+E01EF)
  • Unicode Tag block (U+E0000..=U+E007F)
  • Soft hyphen (U+00AD)

NFKC normalization (SHOULD). Sanitization SHOULD apply NFKC normalization to defend against fullwidth and math-alphanumeric homoglyph bypasses, and SHOULD flag scripts mixed in unusual combinations (e.g., Latin + Cyrillic identifiers) for elevated scrutiny.
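The removal list can be sketched as a character filter. This is a minimal illustration, not the API of the symbi-invis-strip crate: the `strip_invisible` and `is_stripped` names are hypothetical, tab, newline, and carriage return are always kept here, and NFKC normalization and mixed-script detection are left out because the Rust standard library does not provide them.

```rust
/// True for code points in the removal list above.
/// Assumption: \t, \n, and \r are always kept in this sketch.
fn is_stripped(c: char) -> bool {
    if matches!(c, '\t' | '\n' | '\r') {
        return false;
    }
    matches!(
        c as u32,
        0x00..=0x1F | 0x7F              // C0 controls and DEL
        | 0x80..=0x9F                   // C1 controls
        | 0x00AD                        // soft hyphen
        | 0x200B..=0x200F               // zero-width characters and marks
        | 0x202A..=0x202E               // bidi embeddings and overrides
        | 0x2060..=0x2064               // word joiner, invisible operators
        | 0x2066..=0x2069               // bidi isolates
        | 0xFE00..=0xFE0F               // variation selectors
        | 0xFEFF                        // byte-order mark
        | 0xE0000..=0xE007F             // Unicode Tag block
        | 0xE0100..=0xE01EF             // variation selectors supplement
    )
}

/// Returns `input` with every listed code point removed.
pub fn strip_invisible(input: &str) -> String {
    input.chars().filter(|c| !is_stripped(*c)).collect()
}
```

For example, `strip_invisible("ad\u{200B}min")` returns `"admin"`, so a zero-width space can no longer make two visually identical identifiers compare as different strings.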

Trusted vs. untrusted strings. Sanitization applies to agent-influenced fields — strings whose value can be controlled by the LLM or by content the LLM has read. Runtime-internal strings (e.g., observations the runtime itself constructs, identity tokens, journal sequence numbers) are out of scope.

Reference implementation. The Symbiont reference implementation provides this functionality via the symbi-invis-strip crate (v0.3.0 as of v1.14.0), including a sanitize_field_with_markup variant that additionally strips HTML comment blocks and triple-backtick fenced blocks on surfaces where renderer-hidden markup has no legitimate use.

Scope and limits. Content sanitization is a defense-in-depth fence, not a complete defense. Section 15 notes the known ceiling against capable models that can construct content-shape attacks within the sanitized character set. Sanitization MUST NOT be the only defense against content-shape attacks; it complements policy evaluation, output schema validation, and (where applicable) downstream task graders.

Figure 2 · Deny-list vs. allow-list
Deny-list vs. allow-list enforcement. In the allow-list model, the space of expressible actions is constrained by the tool contract before any policy evaluation occurs. Novel bypasses fall through a deny-list but cannot form under an allow-list at all.
Section 07

Identity Layer

7.1 The Identity Problem

When AI agents interact with tools, services, and other agents, identity is typically self-asserted. An agent claims to be "Scout v2 from Tarnover LLC" with no way for the receiving party to verify that claim. Self-asserted identity provides no security guarantee: agents can be impersonated, tools can be spoofed, and delegation claims cannot be verified.

OATS specifies a two-layer cryptographic identity stack that addresses both directions of the trust problem.

7.2 Tool Integrity Verification

An OATS-compliant runtime MUST support cryptographic verification of tool schemas and contracts:

  • Domain-anchored discovery. Tool publishers host public keys at well-known endpoints (e.g., /.well-known/ URIs per RFC 8615). No centralized registry required.
  • Signature verification. Tool contracts and schemas are signed with ECDSA P-256 (or equivalent). The runtime verifies signatures before registering tools.
  • Trust-On-First-Use (TOFU) key pinning. On first encounter, the runtime pins the publisher's key. Subsequent key changes require explicit trust decisions.
  • Revocation support. Publishers can revoke keys and schemas. The runtime checks revocation status before accepting tools.
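
The pinning and revocation behavior described above can be sketched as a small state machine. This is an illustrative sketch only: `PinStore` and its return values are hypothetical names, keys are treated as opaque bytes, and signature verification (ECDSA P-256) is deliberately omitted because it requires a cryptography library.

```python
import hashlib


class PinStore:
    """Hypothetical in-memory TOFU pin store keyed by publisher domain."""

    def __init__(self):
        self._pins = {}        # domain -> pinned key fingerprint
        self._revoked = set()  # fingerprints the publisher has revoked

    @staticmethod
    def fingerprint(public_key_bytes):
        return hashlib.sha256(public_key_bytes).hexdigest()

    def revoke(self, public_key_bytes):
        self._revoked.add(self.fingerprint(public_key_bytes))

    def check(self, domain, public_key_bytes):
        fp = self.fingerprint(public_key_bytes)
        if fp in self._revoked:
            return "revoked"        # never accept a revoked key
        pinned = self._pins.get(domain)
        if pinned is None:
            self._pins[domain] = fp
            return "pinned"         # first encounter: pin the key
        if pinned == fp:
            return "match"
        return "key-changed"        # requires an explicit trust decision
```

Note that a `key-changed` result leaves the existing pin untouched; replacing the pin is a separate, explicit operation that this sketch does not model.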

7.3 Agent Identity Verification

An OATS-compliant runtime SHOULD support cryptographic agent identity verification:

  • Domain-anchored agent identity. Organizations publish verifiable identity documents for their agents at well-known endpoints.
  • Short-lived credentials. Agents are issued time-limited signed credentials (e.g., ES256 JWTs) declaring their identity, capabilities, and delegation chain.
  • Delegation chains. Agent credentials support maker-deployer delegation, where the organization that builds agent software and the organization that deploys it are independently verifiable.
  • Capability scoping. Agent credentials declare specific capabilities (e.g., read:data, write:reports), enabling verifiers to enforce least-privilege access.

7.4 Bidirectional Trust

The two identity layers create a bidirectional trust model:

  1. Agent verifies tool. Before invoking a tool, the agent's runtime verifies the tool's contract integrity.
  2. Tool verifies agent. Before accepting an invocation, the tool verifies the agent's identity.
  3. Policy evaluation. The runtime evaluates whether the verified agent's capabilities authorize it to use the verified tool.
  4. Audit recording. Both verifications and the policy decision are recorded in the cryptographic audit journal.

7.5 Cryptographic Agility and Algorithm Allowlisting · new in v1.3.0

Self-describing credential formats (JWTs, signed manifests, X.509 chains) routinely advertise the cryptographic algorithm used to sign them. An implementation that trusts the advertised algorithm without restriction inherits the vulnerability surface of every algorithm the verifier library happens to support, including algorithms with known weaknesses or library-side vulnerabilities (e.g., RUSTSEC-2023-0071 reachable through jsonwebtoken v10 on certain RSA paths).

Allowlist requirement (SHOULD). Identity verifiers in an OATS-compliant runtime SHOULD enforce a per-credential-class algorithm allowlist that rejects unlisted algorithms at the header-inspection guard and at the verification API. Verifiers SHOULD NOT rely solely on validation-side filtering, since that path can be reached only after header inspection has already accepted the credential class.

Default allowlists. For each credential class OATS specifies:

| Credential class | Allowed algorithms | Refused |
|---|---|---|
| Asymmetric Bearer tokens (AgentPin, agent-to-agent JWTs) | ES256, EdDSA | RS*, PS*, none, HS* |
| HMAC webhook / channel signatures | HS256 | All asymmetric algorithms, none |
| Tool contract signatures (SchemaPin) | ECDSA P-256 | All other algorithms |
| Audit journal entry signatures | Ed25519 (preferred) or ECDSA P-256 | All other algorithms |

Implementations MAY support additional algorithms when documented, when the deployment context requires interoperability with an external system that mandates the algorithm (e.g., Microsoft Teams Bot Framework requires RS256 for inbound token validation), and when the trust scope of those tokens is bounded to that external system. Implementations MUST NOT silently fall back to refused algorithms.
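
A header-inspection guard for the two JWT-shaped credential classes might look like the sketch below. The class keys (`asymmetric_bearer`, `hmac_webhook`), `ALLOWLISTS`, `inspect_header`, and `AlgorithmRefused` are hypothetical names. The sketch performs header inspection only: it does not verify signatures, and a real verifier would also pass the same allowlist to its verification call.

```python
import base64
import json

# Default allowlists for the JWT-shaped classes in the table above.
ALLOWLISTS = {
    "asymmetric_bearer": {"ES256", "EdDSA"},
    "hmac_webhook": {"HS256"},
}


class AlgorithmRefused(Exception):
    """Raised when a credential advertises an algorithm that is not allowlisted."""


def _b64url_decode(segment):
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def inspect_header(token, credential_class):
    """Return the advertised `alg` if the class allowlist permits it, else raise."""
    header_segment = token.split(".", 1)[0]
    try:
        header = json.loads(_b64url_decode(header_segment))
    except ValueError as exc:  # covers base64, UTF-8, and JSON decoding errors
        raise AlgorithmRefused("unparseable header") from exc
    alg = header.get("alg") if isinstance(header, dict) else None
    if not isinstance(alg, str) or alg not in ALLOWLISTS[credential_class]:
        raise AlgorithmRefused(f"algorithm {alg!r} refused for {credential_class}")
    return alg
```

There is no fallback branch: an unlisted or missing algorithm always raises, which mirrors the requirement not to fall back silently to refused algorithms.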

Aud / iss / exp validation. All JWT-shaped credentials MUST be validated for aud, iss, and exp claims. Verifiers MUST NOT provide an environment-variable or runtime escape hatch to disable audience validation; deployment requirements that conflict with aud validation indicate a misconfiguration, not a need for a bypass.
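
As a sketch of unconditional claim validation, the function below checks all three claims on an already signature-verified claim set and offers no parameter to skip any of them. `validate_claims` and its argument names are hypothetical; clock-skew tolerance is omitted for brevity.

```python
def validate_claims(claims, expected_aud, trusted_issuers, now):
    """Return the list of failed claims; an empty list means aud, iss, and exp passed.

    `now` and `exp` are seconds since the Unix epoch.
    """
    failures = []
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]  # aud may be a string or an array
    if expected_aud not in audiences:
        failures.append("aud")
    iss = claims.get("iss")
    if not isinstance(iss, str) or iss not in trusted_issuers:
        failures.append("iss")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or now >= exp:
        failures.append("exp")
    return failures
```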

Verification (E1 / E2). Conformance verification (Section 12) inspects the runtime's verifier configuration, attempts a signed token under each refused algorithm, and confirms refusal both at header inspection and at the algorithms field of the verification call.

Figure 3 · Bidirectional trust. Principal: Agent (scout / 2.1). Gate (L4): Policy, structured input only. Resource: Tool (finance.transfer). Arrows: SchemaPin verify, AgentPin verify, → journal (L5).
Bidirectional trust. The agent verifies tool integrity (SchemaPin), the tool verifies agent identity (AgentPin), and the policy engine evaluates authorization. All three steps are recorded in the cryptographic audit journal.
Section 08

Policy Enforcement Layer

8.1 Policy Engine Requirements

An OATS-compliant policy engine evaluates the tuple (a, C, I) and produces an authorization decision. The engine:

  • MUST support five decisions: Allow, Deny, Modify, Step-Up, Defer.
  • MUST evaluate both static policy and accumulated session context.
  • MUST operate outside LLM influence: structured inputs, structured decisions, no natural language processing, no shared mutable state with the LLM.
  • SHOULD support a formally verifiable policy language (Cedar, OPA, or equivalent).
  • MUST default to deny.
  • MUST default to fail-closed construction: when the runtime is started without an explicitly wired policy backend, the default behavior MUST be to deny every ToolCall and Delegate action with an explicit reason, not to silently allow them. A permissive mode MAY be provided for local development but MUST require an explicit opt-in (CLI flag or environment variable) and MUST emit a visible warning on every evaluated action. The distinction between spec-level “default to deny” (a policy-evaluation property) and operational “fail-closed construction” (a runtime-initialization property) is intentional; both are required. [new in v1.3.0]
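
The difference between "default to deny" and "fail-closed construction" can be illustrated with a small sketch. All names here (`Decision`, `DenyAllBackend`, `Gate`) are hypothetical, and treating a backend exception or an unrecognized decision as a denial is an assumption of this sketch rather than a stated requirement. The permissive development mode is not modeled.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    MODIFY = "modify"
    STEP_UP = "step-up"
    DEFER = "defer"


class DenyAllBackend:
    """Backend installed when no policy backend is explicitly wired."""

    def evaluate(self, action, context, identity):
        return Decision.DENY, "no policy backend configured (fail-closed)"


class Gate:
    def __init__(self, backend=None):
        # Fail-closed construction: a missing backend never means "allow".
        self.backend = backend if backend is not None else DenyAllBackend()

    def evaluate(self, action, context, identity):
        try:
            decision, reason = self.backend.evaluate(action, context, identity)
        except Exception as exc:  # assumption: backend failure is treated as a denial
            return Decision.DENY, f"policy backend error: {exc}"
        if not isinstance(decision, Decision):
            return Decision.DENY, "backend returned an unknown decision"
        return decision, reason
```

Every denial carries an explicit reason string, so the journal can record why an action was refused even when no policy was loaded.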

8.2 Action Classification

OATS classifies actions into five categories based on how they should be evaluated. The “structurally forbidden” category exists only in systems with allow-list tool contracts.

| Category | How Identified | Evaluation | Decision |
|---|---|---|---|
| Structurally forbidden | Cannot be expressed via tool contract | None needed | N/A (inexpressible) |
| Policy-forbidden | Static policy match | Static policy only | DENY |
| Context-dependent deny | Policy allows, context misaligns | Static + context | DENY |
| Context-dependent allow | Policy denies, context aligns | Static + context | STEP-UP / ALLOW |
| Context-dependent defer | Insufficient/conflicting context | Indeterminate | DEFER |

8.3 Context Accumulation

The runtime MUST accumulate session context as an append-only, hash-chained log:

  • Original request. The user's initial instruction establishing intent.
  • Action history. Sequence of actions proposed, approved, denied, deferred, and executed.
  • Data classification. Sensitivity of information accessed. Default: highest configured level when unknown.
  • Tool outputs. Results from previous actions.
  • Semantic distance. Drift from original request (see 8.4).
  • Identity context. Verified identities of agent, user, and tools.

8.4 Semantic Distance Tracking

The runtime SHOULD compute semantic distance between actions and stated intent:

$$d(r_0, a_n) = 1 - \cos(\text{embed}(r_0),\ \text{embed}(a_n))$$

where $r_0$ is the original request and $a_n$ is the current action. Cumulative drift SHOULD be tracked across sequences. Thresholds are deployment-specific and SHOULD be calibrated empirically.
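
The distance formula above can be computed directly once embeddings are available. The sketch below assumes the embeddings are plain lists of floats; the function name `semantic_distance` is hypothetical, and the vectors in the usage note are toy values, not real embeddings.

```python
import math


def semantic_distance(u, v):
    """Cosine distance 1 - cos(u, v) between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("zero-length embedding")
    return 1.0 - dot / norm
```

For example, identical directions give a distance of 0, orthogonal vectors such as `[1, 0]` and `[0, 1]` give 1, and opposite vectors give 2, so the value ranges over [0, 2].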

Semantic distance is a risk signal, not a primary authorization primitive

The hard authorization layer in OATS is the deterministic policy engine (Cedar, OPA, or equivalent) evaluating structured inputs. Semantic distance provides an advisory signal that the policy engine may consume as one input to step-up or defer decisions, but OATS-compliant runtimes SHOULD NOT rely on semantic distance as the sole basis for irreversible denial. This separation ensures the Gate's authorization decisions remain deterministic and reproducible even when the drift signal is heuristic.

8.5 Step-Up Authorization and Deferral

Step-up authorization: execution MUST block until approval; full context MUST be available to approvers; configurable timeouts MUST be enforced (deny on timeout); decisions MUST be recorded in the journal.

Deferral: deferred actions MUST remain paused without effects; the runtime MUST track deferred actions and maintain execution order; cascading deferrals MUST be bounded (deny when limit exceeded); deny on timeout MUST be the default; both deferral and resolution MUST be recorded.

8.6 Defense in Depth at the Action Layer: Two Independent Fences

Empirical results from stack-stripping ablation experiments on the reference implementation establish a previously under-emphasized architectural property: the action layer in an OATS-compliant runtime is two independent fences operating in series, not one. An OATS-compliant runtime SHOULD make this distinction explicit and verify both fences independently.

Fence A: the policy engine. Cedar (or equivalent) evaluates the tuple (a, C, I) against organizational policy and returns an authorization decision. This is the fence specified in Sections 8.1 through 8.5. It reasons over typed action proposals and accumulated session context.

Fence B: the executor profile-of-one. Each principal's tool executor (ActionExecutor or its equivalent) MUST maintain a static handler-map declaring exactly the set of tool names the principal is permitted to dispatch. Before dispatching any approved action, the executor MUST verify that the action's tool name is a member of the principal's declared profile. Tool names not in the profile MUST be refused even if the policy engine permits them. This refusal is independent of policy evaluation: it is a name-membership check against a static set.
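
A minimal sketch of Fence B follows. The spec names `ActionExecutor`; the constructor signature, the `ProfileViolation` exception, and the example tool names are hypothetical. The point of the sketch is that dispatch consults only the static handler-map and never the policy decision.

```python
from types import MappingProxyType


class ProfileViolation(Exception):
    """Raised when a tool name is not in the principal's declared profile."""


class ActionExecutor:
    def __init__(self, principal, handlers):
        self.principal = principal
        # Static handler-map: copied once and exposed read-only.
        self._handlers = MappingProxyType(dict(handlers))

    def dispatch(self, tool_name, params):
        # Name-membership check against the static set, independent of policy.
        handler = self._handlers.get(tool_name)
        if handler is None:
            raise ProfileViolation(
                f"{tool_name!r} is not in the profile of {self.principal!r}"
            )
        return handler(**params)
```

An executor built with only a `read_report` handler refuses a `finance.transfer` dispatch even if an upstream policy engine approved it.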

Why two fences. The two fences operate at different abstraction levels and fail in different ways. Cedar evaluates policies whose correctness depends on the policy author anticipating attack patterns; the executor profile-of-one evaluates against a static name list that is enumerable, reviewable, and amenable to formal analysis. Empirical ablation showed that with Cedar disabled in a 434-row tool-confusion sweep, the executor profile-of-one fence still refused 219 out-of-profile dispatches — a result the runtime would not have achieved with a single-fence action layer. The independence of the two fences is what produces this defense-in-depth property.

Conformance implication. Conformance requirement C3 (Section 12) is satisfied by implementing either fence; conformance to the defense-in-depth property requires both. An OATS Extended-conformant implementation SHOULD implement both fences and document the independence guarantee. The two fences are particularly valuable in deployments where the policy engine is dynamically configurable (and thus may be misconfigured) while the principal tool profiles are statically declared at compile time or deployment time.

Figure 4 · Action classification pipeline. Stages: L2 contract (type check: typed, sanitized) → L4 static (policy rules: RBAC, ABAC) → L4 context (context eval: history, drift). Outcomes: ALLOW, STEP-UP, DEFER, DENY, REJECTED (inexpressible).
Action classification pipeline. Tool contract validation eliminates structurally forbidden actions before the policy engine is reached. Static policy handles policy-forbidden cases; context-dependent evaluation handles the remaining categories using accumulated session context.
Section 09

Audit Layer

9.1 Journal Requirements

An OATS-compliant runtime MUST maintain a cryptographic audit journal recording all events in the ORGA loop. The journal is the authoritative record of what happened, when, why, and by whose authority.

9.2 Event Types

| Event | When | Content |
|---|---|---|
| LoopStarted | Loop begins | Configuration, agent identity, original request |
| ReasoningComplete | After LLM response, before Gate | Proposed actions, token usage |
| PolicyEvaluated | After Gate decision | Actions evaluated, decisions, matching policies, reasons |
| ToolsDispatched | After tool execution | Tools invoked, parameters, duration, evidence hashes |
| ObservationsCollected | After collecting results | Observation count, context size |
| LoopTerminated | Loop ends | Reason, iterations, total usage, duration |
| RecoveryTriggered | On tool failure | Strategy, error context |

9.3 Cryptographic Properties

Each journal entry MUST include:

  • Ed25519 signature (or equivalent; ECDSA P-256 also acceptable). The signature covers the canonical serialization of the entry contents.
  • Hash chain link. Each entry includes the cryptographic hash of the previous entry, forming an append-only chain that detects retroactive modification.
  • Timestamp. Cryptographic timestamp for temporal ordering.

Journal entries MUST be verifiable offline.
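
The hash-chain link can be sketched with the standard library alone. This sketch covers only the chaining property: signatures and timestamps are omitted (Ed25519 requires a cryptography library), and the entry layout, the all-zero genesis value, and the sorted-key JSON canonicalization are assumptions made for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for the first entry's predecessor hash


def _canonical(body):
    # Deterministic serialization: sorted keys, no insignificant whitespace.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def append_entry(journal, event, content):
    prev_hash = journal[-1]["hash"] if journal else GENESIS
    body = {"seq": len(journal), "event": event, "content": content, "prev_hash": prev_hash}
    entry = dict(body, hash=hashlib.sha256(_canonical(body)).hexdigest())
    journal.append(entry)
    return entry


def verify_chain(journal):
    """Return the index of the first broken entry, or None if the chain is intact."""
    prev_hash = GENESIS
    for i, entry in enumerate(journal):
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev_hash:
            return i
        if hashlib.sha256(_canonical(body)).hexdigest() != entry["hash"]:
            return i
        prev_hash = entry["hash"]
    return None
```

Verification needs only the journal itself, which is what makes offline checking possible; in a full implementation the per-entry signature would be checked in the same loop.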

9.4 Evidence Envelopes

Tool executions MUST produce structured evidence envelopes containing: tool name and version, validated parameters, constructed invocation, duration and exit status, output hash (SHA-256), policy decision that authorized execution, and agent and user identity at time of execution.
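
One possible shape for such an envelope is sketched below. The field names and the `build_envelope` helper are hypothetical; only the list of required contents and the use of SHA-256 for the output hash come from the text above.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceEnvelope:
    tool_name: str
    tool_version: str
    validated_parameters: dict
    constructed_invocation: str
    duration_ms: float
    exit_status: int
    output_sha256: str
    policy_decision: str
    agent_identity: str
    user_identity: str


def build_envelope(output, **fields):
    """Hash the raw tool output (bytes) and bind it to the execution metadata."""
    return EvidenceEnvelope(output_sha256=hashlib.sha256(output).hexdigest(), **fields)
```

Because every field is required by the dataclass, an envelope that omits any of the listed contents fails at construction time rather than at audit time.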

9.5 Compliance Properties

The journal provides infrastructure that can contribute to regulatory compliance, though OATS alone is not sufficient for any regulatory framework. Specifically: the journal can serve as a component of HIPAA audit trails (recording health data access with identity and authorization); SOC2 evidence collection (recording policy enforcement decisions); SOX audit trail requirements (recording attributable financial system actions); and GDPR accountability mechanisms (recording data access patterns). In each case, the journal addresses the technical recording requirement but does not address the organizational, procedural, or legal requirements of the applicable regulation.

9.6 Redaction of Sensitive Parameters

Journal entries describe tool dispatches, including their validated parameters. Some of those parameters carry secrets (API keys, tokens, passwords) whose presence in a long-lived audit log creates risk disproportionate to their auditing value. Tool contracts SHOULD declare which parameters are sensitive (via a sensitive_params annotation or equivalent), and journal writers SHOULD substitute a fixed redaction sentinel (e.g., "[REDACTED]") for those parameters' values when writing entries. The parameter names and the fact of dispatch MUST still be recorded; only the values are redacted. Evidence-envelope output hashes remain over the original parameters so verification still works for non-secret parameter fields.
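
A journal writer's redaction step can be as small as the sketch below. The function name `redact_params` is hypothetical, and the sensitive-parameter set is assumed to come from the tool contract's `sensitive_params` annotation.

```python
REDACTED = "[REDACTED]"


def redact_params(params, sensitive_params):
    """Keep every parameter name; replace only the values declared sensitive."""
    return {
        name: (REDACTED if name in sensitive_params else value)
        for name, value in params.items()
    }
```

For instance, `redact_params({"url": "https://api.example", "api_key": "s3cr3t"}, {"api_key"})` keeps both keys but records `"[REDACTED]"` in place of the key's value.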

Figure 5 · Hash-chained audit journal
Hash-chained audit journal. Each entry includes the cryptographic hash of the previous entry and an Ed25519 signature, forming a tamper-evident chain verifiable offline. Tampering with any entry breaks the chain at that link and every link after it.
Section 10

Sandboxing and Isolation

10.1 Multi-Tier Sandboxing

An OATS-compliant runtime SHOULD support multiple sandboxing tiers:

  • Tier 1: Container isolation. Agent execution within container boundaries with resource limits, network restrictions, and filesystem isolation.
  • Tier 2: Kernel-level isolation. Agent execution within a user-space kernel (e.g., gVisor) providing syscall filtering without full virtualization overhead.
  • Tier 3: MicroVM isolation. Agent execution within a lightweight VM (e.g., Firecracker) providing hardware-level isolation with minimal overhead.

10.2 Resource Limits

Regardless of sandboxing tier, agent execution MUST support configurable resource limits: token budget, time budget, iteration budget, tool call budget, network restrictions, and filesystem restrictions.

10.3 Circuit Breakers

Tool executions SHOULD be protected by circuit breakers. When a tool fails repeatedly, the circuit breaker trips and subsequent calls are rejected without execution until the circuit resets.
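
The trip-and-reset behavior can be sketched as follows. The class and parameter names (`CircuitBreaker`, `failure_threshold`, `reset_after`) and the default values are hypothetical; the spec does not prescribe thresholds. Time is passed in explicitly so the behavior is deterministic.

```python
class CircuitOpen(Exception):
    """Raised when a call is rejected without executing the tool."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # seconds the circuit stays open
        self._failures = 0
        self._opened_at = None

    def call(self, tool, now):
        if self._opened_at is not None:
            if now - self._opened_at < self.reset_after:
                raise CircuitOpen("tool call rejected without execution")
            # Reset window elapsed: close the circuit and allow calls again.
            self._opened_at = None
            self._failures = 0
        try:
            result = tool()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = now  # trip the breaker
            raise
        self._failures = 0  # any success clears the consecutive-failure count
        return result
```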

Section 11

Inter-Agent Communication

11.1 Communication Governance

When agents communicate with other agents, all inter-agent messages MUST pass through a communication policy gate. The gate evaluates authorization rules on communication primitives (ask, delegate, send, parallel, race) before execution.

11.2 Message Security

Inter-agent messages MUST be cryptographically signed (Ed25519 or equivalent), encrypted (AES-256-GCM or equivalent), and attributed to verified agent identities.

11.3 Delegation Constraints

Delegation chains MUST be bounded: maximum delegation depth (configurable), capability narrowing (a delegated agent cannot exceed the delegating agent's capabilities), and blast-radius containment.
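
Depth bounding and capability narrowing reduce to two checks at delegation time, sketched below. The function name `delegate`, the convention that a root agent has depth 0, and the use of `PermissionError` are assumptions; blast-radius containment is not modeled.

```python
def delegate(parent_capabilities, requested_capabilities, depth, max_depth):
    """Return the capability set granted to the delegated agent, or raise.

    `depth` is the delegating agent's depth in the chain (0 for the root agent).
    """
    if depth + 1 > max_depth:
        raise PermissionError("maximum delegation depth exceeded")
    requested = set(requested_capabilities)
    excess = requested - set(parent_capabilities)
    if excess:
        # Capability narrowing: a delegate can never hold more than its delegator.
        raise PermissionError(f"capabilities exceed delegator: {sorted(excess)}")
    return frozenset(requested)
```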

11.4 Cross-Agent Context

When an agent delegates to another agent, the session context SHOULD be propagated to the downstream agent, enabling the downstream agent's Gate to evaluate actions against the original intent.

11.5 Distributed Trace Context · new in v1.3.0

Multi-agent workflows that span runtime boundaries are difficult to investigate when journal entries from each runtime cannot be stitched into a single causal chain. To enable forensic reconstruction across runtimes, OATS-compliant runtimes SHOULD propagate W3C Trace Context across agent boundaries:

  • Outbound inter-agent messages SHOULD include a traceparent header carrying the trace ID and parent span ID.
  • Inbound messages SHOULD extract traceparent and bind the resulting span to the receiving agent's journal entries.
  • Cron-scheduled and heartbeat-triggered agent invocations SHOULD originate a new trace and record the trace ID in their LoopStarted event.

Trace context propagation MUST NOT be a load-bearing security primitive; it serves observability and forensic reconstruction, not authorization. The hard cryptographic identity binding remains the responsibility of §11.2 and §7.
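
The propagation rules above can be sketched using the W3C `traceparent` layout (`version-traceid-parentid-flags`, lowercase hex). The helper names are hypothetical, and the choice to start a fresh trace when an inbound header is missing or malformed is an assumption of this sketch.

```python
import re
import secrets

# Version 00: 32-hex trace ID, 16-hex parent span ID, 2-hex flags.
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def new_traceparent():
    """Originate a new trace, e.g. for a cron-scheduled invocation."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def parse_traceparent(header):
    m = _TRACEPARENT.fullmatch(header)
    # All-zero trace or parent IDs are invalid under the W3C format.
    if not m or set(m.group(1)) == {"0"} or set(m.group(2)) == {"0"}:
        return None
    return {"trace_id": m.group(1), "parent_id": m.group(2), "flags": m.group(3)}


def child_traceparent(inbound_header):
    """Keep the inbound trace ID and mint a new span ID for the receiving agent."""
    parsed = parse_traceparent(inbound_header)
    if parsed is None:
        return new_traceparent()
    return f"00-{parsed['trace_id']}-{secrets.token_hex(8)}-{parsed['flags']}"
```

The trace ID returned by `parse_traceparent` is what a receiving runtime would record alongside its journal entries so that entries from different runtimes can be joined later.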

Section 12

Conformance Requirements

The requirement language follows RFC 2119: MUST indicates absolute requirements; SHOULD indicates recommendations that may be omitted with documented justification.

12.1 Conformance Levels

OATS Core (all MUST requirements, C1–C7): Baseline zero-trust agent execution.

OATS Extended (all MUST and SHOULD requirements, C1–C7 + E1–E9): Comprehensive zero-trust with identity, sandboxing, content sanitization, and advanced policy.

12.2 Core Requirements (MUST)

C1: ORGA Loop Enforcement. The runtime MUST implement the four-phase ORGA loop. The Gate MUST execute before every tool dispatch. In compiled languages, phase transitions MUST be enforced at compile time via typestates. In interpreted languages, equivalent enforcement MUST be provided and documented, with acknowledgment of residual risk per Section 5.5.
Verification: Attempt to construct a code path from Reason to Act bypassing Gate. In a typestate implementation, this MUST be a compile error. In runtime-enforced implementations, this MUST be caught by a verified test suite with 100% tool dispatch path coverage.

C2: Tool Contract Support. The runtime MUST support declarative tool contracts with typed parameter validation. The LLM MUST NOT generate raw tool invocations. All invocations MUST be constructed from validated parameters and contract-defined templates.
Verification: Submit parameters containing shell metacharacters (;, |, &, `). Verify rejection. Submit parameters outside declared type constraints. Verify rejection. Verify the LLM never receives raw invocation strings in any code path.
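
To make the C2 verification concrete, the sketch below models a contract for a hypothetical `ping` tool. The tool, its parameter names, the hostname pattern, and the 1 to 10 range are all invented for illustration. The invocation is built as an argument vector from validated parameters, so no shell ever interprets the values.

```python
import re

# Hypothetical contract pattern: letters, digits, dots, and hyphens only.
_HOSTNAME = re.compile(r"[A-Za-z0-9]([A-Za-z0-9.-]{0,251}[A-Za-z0-9])?")


def build_invocation(params):
    """Validate parameters against the contract, then fill the argv template."""
    if set(params) != {"host", "count"}:
        raise ValueError("unexpected or missing parameters")
    host, count = params["host"], params["count"]
    if not isinstance(host, str) or not _HOSTNAME.fullmatch(host):
        raise ValueError("host fails the contract pattern")
    if isinstance(count, bool) or not isinstance(count, int) or not 1 <= count <= 10:
        raise ValueError("count outside the declared range")
    return ["ping", "-c", str(count), host]
```

A value such as `"example.com; rm -rf /"` is rejected by the pattern because it contains characters outside the declared set, so the metacharacters never reach an invocation.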

C3: Policy Evaluation. The runtime MUST evaluate actions against policy before execution. The policy engine MUST operate outside LLM influence. MUST support Allow, Deny, Modify, Step-Up, and Defer decisions. Default MUST be deny.
Verification: Configure a DENY policy. Submit a matching action. Verify no effects on the target system. Verify denial recorded in journal with matching policy and reason. Repeat for each decision type.

C4: Context Accumulation. The runtime MUST accumulate session context across actions. Context MUST include original request (when available), action history, and data classification.
Verification: Execute a sequence of three or more actions. Verify the policy engine receives accumulated context for each subsequent action. Verify context includes prior actions and their data classifications.

C5: Cryptographic Audit Journal. The runtime MUST maintain a hash-chained, cryptographically signed audit journal recording all ORGA loop events. Entries MUST be verifiable offline.
Verification: Generate journal entries for allowed, denied, deferred, and step-up actions. Verify all fields present, signatures valid, and hash chain intact. Tamper with one entry and verify chain verification detects the modification.

C6: Gate Independence. The Gate MUST operate on structured inputs only. It MUST NOT process natural language, share mutable state with the LLM, or be influenced by the LLM's reasoning.
Verification: Inspect Gate implementation. Verify inputs are typed structs (tool name, operation, parameters, identity, context), not natural language strings. Verify no shared mutable references between the Gate and the LLM inference component. Verify no dynamic code paths within the Gate that are parameterized by LLM output.

C7: Evidence Envelopes. Tool executions MUST produce structured evidence envelopes with output hashes, execution metadata, and identity binding.
Verification: Execute a tool. Verify the envelope contains tool name, version, validated parameters, constructed invocation, duration, exit status, SHA-256 output hash, authorizing policy decision, and agent/user identity.

12.3 Extended Requirements (SHOULD)

  • E1: Tool Integrity Verification. Verify tool contract signatures using domain-anchored cryptographic verification with TOFU key pinning. Verifiers SHOULD enforce a per-credential-class algorithm allowlist (§7.5) that rejects unlisted algorithms at both the header-inspection guard and the verification API.
  • E2: Agent Identity Verification. Verify agent identity using domain-anchored ES256 credentials with delegation chain support. JWT verifiers SHOULD reject none, RS*, PS*, and HS* algorithms on asymmetric paths (§7.5) and SHOULD validate aud, iss, and exp claims unconditionally.
  • E3: Semantic Distance Tracking. Compute and track semantic distance between actions and stated intent using embedding similarity.
  • E4: Multi-Tier Sandboxing. Support configurable sandboxing tiers (container, kernel-level, microVM).
  • E5: Inter-Agent Communication Governance. Enforce authorization policies on inter-agent communication with signed and encrypted messages. SHOULD propagate W3C Trace Context (§11.5) for cross-runtime forensic reconstruction.
  • E6: Telemetry Export. Export structured telemetry (OCSF, CEF, or documented custom schemas) with real-time streaming.
  • E7: Formally Verifiable Policies. Use a policy language enabling static analysis of correctness (Cedar, OPA, or equivalent).
  • E8: Least-Privilege Credential Scoping. Support just-in-time credential issuance with operation-specific scoping and logged usage.
  • E9: Content Sanitization [new in v1.3.0]. Sanitize agent-influenced string fields before they reach reasoning context, the journal, or downstream tools per §6.6. At minimum SHOULD remove ASCII C0/DEL, C1, zero-width, bidi-override, word-joiner, BOM, variation-selector, Unicode Tag block, soft hyphen, and combining-diacritical characters; SHOULD apply NFKC normalization and flag unusual script mixing.
    Verification: Submit each of the following payloads as a string parameter and verify the sanitized output, the journal entry, or both contain no instance of the hostile character class: (a) U+200B-padded content; (b) U+202E bidi-override prefix; (c) U+E0000 Unicode-tag-encoded content; (d) NFKC-decomposable fullwidth Latin letters; (e) Latin+Cyrillic mixed-script identifier with at least one Cyrillic confusable.
Section 13

Implementation Architectures

OATS does not mandate a specific implementation architecture. The specification intentionally separates conformance properties from implementation details and includes four deployment patterns to reduce dependence on any single implementation. The current specification is informed by one reference implementation (Symbiont); independent conformance testing across additional implementations is needed to validate that the specification is sufficiently general.

| Property | Self-Hosted Runtime | Plugin/Extension | Gateway | Vendor Integration |
|---|---|---|---|---|
| You control | Everything | Agent code | Network | Policy only |
| Enforcement | ORGA typestate | Dual-layer | Network proxy | Vendor hooks |
| Bypass resistance | Very high | High | High | Vendor-dependent |
| Context richness | Full | Full (inner) | Limited | Vendor-dependent |
| Tool contracts | Full | Full (outer) | Partial | Vendor-dependent |
| Identity | Full | Full (outer) | Partial | Vendor-dependent |
| OATS-conformant | Yes | Yes | Partial | If hooks sufficient |

13.1 Self-Hosted Runtime

The full OATS stack deployed as a single runtime. Provides the strongest available enforcement properties: compile-time phase enforcement, full context visibility, cryptographic identity, and multi-tier isolation. Natural home in systems-level languages with rich type systems.

13.2 Plugin/Extension Model

For agents running inside third-party platforms. An inner layer (plugin) provides awareness; an outer layer (OATS runtime wrapping the platform via CLI executor or container) provides enforcement. Because the outer ORGA Gate mediates all tool invocations at the process boundary, the inner platform cannot bypass it through normal operation. Side-channel bypasses (e.g., direct network calls from within the sandbox) require complementary network-level controls.

13.3 Gateway Architecture

For protocol-based tool invocations (MCP, REST). An OATS-compliant gateway intercepts traffic between agents and tools, implementing the Gate, context accumulation, and journaling. Provides enforcement without agent modification, at the cost of reduced context visibility.

13.4 Vendor Integration

For SaaS agents where organizations control no infrastructure. Requires vendor-provided synchronous pre-execution hooks, decision enforcement, context availability, and receipt export. OATS provides the specification for vendor evaluation and contracts.

Section 14

Evaluation Framework

The claims made in this specification are architectural: OATS is designed to provide certain security properties through structural enforcement. Converting these design-level claims into empirical evidence requires a systematic evaluation methodology. Sections 14.1–14.6 define the evaluation framework; Section 14.7 summarizes initial empirical results from the symbiont-orga-demo reference corpus and three companion preprints validating five of seven core conformance requirements, including comparative substrate evidence newly available in v1.3.0.

14.1 Attack Suite Methodology

To measure whether OATS reduces attack success rates, we define a comparative evaluation against three baselines:

| Configuration | Description |
|---|---|
| Baseline A | No policy enforcement. Agent invokes tools directly. |
| Baseline B | Deny-list policy. Agent actions intercepted and evaluated against forbidden-action rules. |
| Baseline C | Prompt-guardrail only. Input/output filtering at the LLM layer, no action-level enforcement. |
| OATS | Full stack: ORGA loop, tool contracts, policy engine, identity verification, journal. |

The attack suite combines existing benchmarks with custom scenarios:

  • AgentDojo. Dynamic environment evaluating prompt injection attacks and defenses across realistic agent tasks. We measure attack success rate and task utility.
  • Custom injection suite. 200+ prompt injection variants (direct, indirect via documents, indirect via tool outputs, multi-turn) targeting tool invocations across 10 tool types, including attacks specifically targeting allow-list bypass.
  • Compositional exfiltration scenarios. 50 multi-step sequences where individual actions are policy-compliant but the composition constitutes a violation (e.g., read sensitive data then email externally).

For each configuration we report: attack success rate, task completion rate, false positive rate (legitimate actions blocked), and false negative rate (attack actions allowed).

Fairness methodology. To prevent evaluation bias toward the OATS architecture: (a) all configurations use the same task suite, tool set, and underlying LLM; (b) credential scopes are identical across configurations; (c) deny-list policies in Baseline B are tuned using a held-out calibration set, not the test set; (d) all policy thresholds and drift thresholds are fixed before test execution and not adjusted post-hoc; (e) all failures, false negatives, and bypass successes are reported, not only aggregate metrics.

14.2 Performance Overhead

Runtime enforcement introduces latency. We define benchmarks for each enforcement layer. Targets are derived from operational experience and represent design goals, not validated measurements.

| Component | Measurement | Target |
|---|---|---|
| Static policy evaluation | Per-action latency (p50, p95, p99) | <1 ms |
| Context-dependent evaluation | Per-action latency with accumulated context (10, 50, 100 prior actions) | <5 ms |
| Semantic distance computation | Per-action embedding + cosine similarity | <50 ms |
| Tool contract validation | Parameter type checking + sanitization | <0.5 ms |
| Ed25519 journal signing | Per-entry signature generation | <0.1 ms |
| Hash chain verification | Per-entry hash computation + chain link | <0.05 ms |
| Full ORGA Gate pipeline | End-to-end action proposal to allow/deny decision | <10 ms (p99) |
| Evidence envelope generation | SHA-256 output hash + envelope construction | <1 ms |

Benchmarks should be run under realistic agent workloads (10–1000 actions per session) on commodity hardware, with context accumulation growing across the session. Context scalability should be reported separately.

14.3 False Positive / False Negative Analysis

  • Task suite. Run BFCL (Berkeley Function Calling Leaderboard) and SWE-bench Lite task sets through an OATS-compliant runtime. These provide realistic agent workloads with known-correct tool invocation sequences.
  • False positive rate. Percentage of legitimate tool invocations the Gate denies or defers. Measured across policy strictness levels (permissive, moderate, strict).
  • False negative rate. Percentage of attack-injected tool invocations the Gate allows. Measured using the attack suite from 14.1.
  • ROC analysis. Plot false positive vs. true positive (attack blocked) across policy threshold configurations. Report AUC for each OATS layer (contracts only, contracts + static policy, contracts + static + context-dependent, full stack).
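
The two error rates defined above can be computed from labeled Gate outcomes as in the sketch below. The input shape, a sequence of `(is_attack, was_allowed)` pairs, and the function name are assumptions; a denied or deferred invocation is recorded as `was_allowed=False`.

```python
def gate_error_rates(results):
    """Return (false_positive_rate, false_negative_rate) as fractions in [0, 1].

    results: iterable of (is_attack, was_allowed) pairs, one per tool invocation.
    """
    results = list(results)
    legit = [allowed for is_attack, allowed in results if not is_attack]
    attacks = [allowed for is_attack, allowed in results if is_attack]
    # False positive: a legitimate invocation the Gate denied or deferred.
    fpr = sum(1 for allowed in legit if not allowed) / len(legit) if legit else 0.0
    # False negative: an attack-injected invocation the Gate allowed.
    fnr = sum(1 for allowed in attacks if allowed) / len(attacks) if attacks else 0.0
    return fpr, fnr
```

Evaluating this at each policy strictness level yields the (false positive, true positive) points for the ROC plot, with the true positive rate equal to one minus the false negative rate.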

14.4 Bypass Resistance Testing

Each enforcement mechanism has a theoretical bypass boundary. We define tests that probe these boundaries:

  • Typestate circumvention. Attempt to construct Rust code paths from Reasoning to ToolDispatching that skip PolicyCheck. The compiler should reject all such paths.
  • Tool contract metacharacter injection. Submit shell metacharacters, SQL injection patterns, path traversal sequences, and Unicode homoglyphs to each parameter type.
  • Gate influence probing. Craft LLM outputs designed to influence Gate evaluation (embedding policy-override instructions, attempting to modify shared state).
  • Sandbox escape. Attempt network calls, filesystem access, and process spawning from within each sandbox tier.
  • Identity spoofing. Attempt to present forged AgentPin credentials, tampered SchemaPin signatures, and replayed JWTs.
  • Algorithm-confusion attacks. Attempt to present a JWT signed under an algorithm not on the verifier's allowlist (e.g., none, RS256 against an ES256-only verifier) and confirm refusal both at header inspection and at the algorithms field of the verification call.
  • Substrate comparison. Run identical lures against permissive, OS-isolated, and structurally-enforced substrates with the same models, harness, and trial count to disentangle structural-enforcement contribution from OS-isolation contribution.
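The algorithm-confusion test above can be illustrated with a header guard that refuses any token whose declared algorithm is off the allowlist. This is a standard-library sketch only: `ALLOWED_ALGS`, `header_alg_permitted`, and `forge` are hypothetical names, the ES256-only allowlist is an assumption, and signature verification (the second refusal point named in the test) is out of scope here.

```python
import base64
import json

ALLOWED_ALGS = frozenset({"ES256"})  # hypothetical ES256-only verifier


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments are encoded."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def header_alg_permitted(token: str, allowed=ALLOWED_ALGS) -> bool:
    """Header guard: refuse unless the declared alg is a string on the allowlist."""
    try:
        header_b64 = token.split(".")[0]
        padded = header_b64 + "=" * (-len(header_b64) % 4)
        header = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError:
        return False  # fail closed on malformed or undecodable headers
    alg = header.get("alg") if isinstance(header, dict) else None
    return isinstance(alg, str) and alg in allowed


def forge(alg: str) -> str:
    """Build an unsigned test token that merely declares an algorithm."""
    header = b64url(json.dumps({"alg": alg, "typ": "JWT"}).encode("utf-8"))
    payload = b64url(json.dumps({"sub": "agent-1"}).encode("utf-8"))
    return f"{header}.{payload}."
```

A bypass test would feed `forge("none")` and `forge("RS256")` to the guard and expect refusal before any signature check runs.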

14.5 Ablation Study

To measure the marginal contribution of each OATS layer, we define an ablation removing one layer at a time:

| Configuration | Layers Active | Expected Impact |
| --- | --- | --- |
| Full OATS | All 5 layers | Baseline (best security, highest overhead) |
| No contracts | ORGA + policy + identity + journal | Allows arbitrary action formulation |
| No identity | ORGA + contracts + policy + journal | Removes mutual auth |
| No context | ORGA + contracts + static policy + journal | Removes context-dependent classifications |
| No journal | ORGA + contracts + policy + identity | Removes audit trail |
| No sanitization | All 5 layers, sanitizer disabled | Exposes content-shape vectors |
| ORGA only | Loop enforcement, permissive policy | Tests phase-ordering value in isolation |

14.6 Case Studies

Three detailed scenarios that exercise multiple OATS layers simultaneously:

Scenario 1: Multi-step data exfiltration. An agent is tasked with summarizing Q3 sales for internal leadership. A prompt injection in a retrieved document instructs the agent to email customer PII to an external address. The case study traces each ORGA phase, showing: (a) tool contract constrains the email recipient parameter, (b) context accumulation tracks that PII-classified data was accessed, (c) context-dependent deny classification blocks the external email, and (d) the journal records the full sequence for forensic reconstruction.

Scenario 2: Tool supply chain attack. An attacker modifies a tool contract to widen parameter validation (e.g., removing scope restrictions on a network scanner). The case study shows: (a) SchemaPin signature verification detects the contract modification, (b) the runtime rejects the tampered contract, (c) the journal records the verification failure. A variant tests what happens when the attacker also compromises the publisher's signing key (requiring TOFU pin violation detection).

Scenario 3: Intent drift across a long session. An agent is asked to prepare for a client meeting. Over 15 actions, the agent's scope gradually expands from CRM queries to accessing confidential strategy documents. The case study shows: (a) semantic distance increases monotonically, (b) the configured drift threshold triggers step-up authorization at action 11, (c) the human approver receives full context including the drift trajectory.
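The drift mechanism in Scenario 3 can be sketched as a cosine-distance check against the original intent. Everything concrete below is an assumption for illustration: the embeddings are toy 2-D vectors that rotate 5.5 degrees per action, the threshold of 0.5 is arbitrary, and `first_step_up` is a hypothetical helper, not an OATS-defined function.

```python
import math


def cosine_distance(a, b) -> float:
    """1 minus cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def first_step_up(intent, actions, threshold: float):
    """Return the 1-based index of the first action exceeding the threshold, else None."""
    for index, embedding in enumerate(actions, start=1):
        if cosine_distance(intent, embedding) > threshold:
            return index
    return None


intent = (1.0, 0.0)
# Toy trajectory: each action drifts a further 5.5 degrees away from the intent.
actions = [
    (math.cos(math.radians(5.5 * i)), math.sin(math.radians(5.5 * i)))
    for i in range(1, 16)
]
distances = [cosine_distance(intent, a) for a in actions]
trigger = first_step_up(intent, actions, threshold=0.5)
```

In this toy trajectory the distance grows with every action and crosses the threshold at action 11, mirroring the step-up point described in the scenario.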

14.7 Initial Empirical Results

Initial empirical results from the reference implementation are now available through the symbiont-orga-demo reference corpus and three companion preprints. This section summarizes the headline findings; full methodology, per-model breakdowns, and reproduction artifacts are reported in the cited works.

Methodology summary

All sweeps run against nine widely available hosted LLMs in April–May 2026 (GPT-5, Claude Haiku 4.5, Gemini 2.5 Pro, DeepSeek-V3.1, Qwen3-235B, Qwen3.6-Plus, MiMo-V2-Pro, MiniMax-M2.7, gpt-oss-20b) routed through OpenRouter. The substrate-comparison sweep additionally tests Claude Sonnet 4.6, DeepSeek v4 Pro/Flash, Qwen 3.6 Max/27B and substitutes Qwen 3.5 35B for Qwen 3.6 35B. Each sweep produces per-call JSONL audit sidecars from which the reported numbers are reproduced. Total authoritative billing across the corpus is approximately $80 over the v9 through v12 sweeps plus the substrate-comparison sweep ($50.89 OpenRouter spend for the substrate sweep alone); reproduction scripts and the frozen sidecar databases are published with the cited preprints.

Attack-suite results (validates C3, partial validation of C2)

The v9 cloud-adversarial sweep tested ten attack shapes against the policy gate across 874 cloud-adversarial runs. Result: 263 forbidden tool-call attempts refused, zero attempts reaching execution. Cumulative across all sweeps: 642 Cedar denials plus 34 executor refusals. The false-positive baseline across 813 legitimate runs on eight of nine models was zero Cedar denials; the gpt-oss-20b case (82 denials on legitimate input) was explained as a model misconception correctly caught rather than a false positive.

Tool-contract results (validates C2)

The v11 ToolClad sweep tested eight argument-injection sub-shapes (metacharacter, command substitution, backtick, wildcard, newline, path traversal, Cyrillic-homoglyph IDN, punycode IDN) against a whois_lookup tool with a scope_target parameter type. Result: 333 of 335 treatment-arm dispatches refused across nine models. The two non-refusals were clean-hostname self-sanitisation cases (the model refused the injection prompt and emitted example.com instead of the canary payload); counted strictly against hostile inputs the bite-rate was 100%. The v0.1 sweep reported a published limitation (the as-built scope_target validator accepted punycode hostnames as a side effect of its ASCII-only rule); ToolClad v0.6.0 closed that gap with explicit punycode-label rejection, and the v0.2 sweep added a punycode-idn sub-shape that empirically validated the fix at 33 of 33 refusals. The substrate-comparison sweep reproduces this property at 0/560 escape on four pure-action vectors against Symbiont, vs. 88–98% baseline rates on python_baseline and python_sandboxed substrates with the same lures and models.
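To make the sub-shapes concrete, here is a hypothetical allow-list validator for a hostname-style parameter. It is not ToolClad's scope_target implementation; the `LABEL` pattern and `is_valid_scope_target` are assumptions that only illustrate how an ASCII-only label rule plus explicit punycode-label rejection refuses each sub-shape listed above.

```python
import re

# One DNS label: ASCII letters, digits, interior hyphens, 1 to 63 characters.
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_scope_target(value: str) -> bool:
    """Accept only plain ASCII hostnames; refuse everything else."""
    if not value or len(value) > 253:
        return False
    for label in value.split("."):
        if not LABEL.fullmatch(label):
            return False  # metacharacters, whitespace, slashes, non-ASCII letters
        if label.lower().startswith("xn--"):
            return False  # explicit punycode-label rejection
    return True


hostile = [
    "example.com; rm -rf /",   # metacharacter
    "$(whoami).example.com",   # command substitution
    "`id`.example.com",        # backtick
    "*.example.com",           # wildcard
    "example.com\nevil.test",  # newline
    "../../etc/passwd",        # path traversal
    "ex\u0430mple.com",        # Cyrillic homoglyph
    "xn--exmple-4nf.com",      # punycode label
]
refused = [not is_valid_scope_target(h) for h in hostile]
```

Without the `xn--` check the punycode entry would pass, since a punycode label is itself plain ASCII; that is the gap the paragraph above describes.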

Substrate-comparison results (validates the architectural-contribution claim) · new in v1.3.0

The substrate-comparison sweep runs identical lures across three substrates — python_baseline (no enforcement), python_sandboxed (Docker isolation), and Symbiont (full OATS stack) — with the same harness, models, and sample sizes. Across four pure-action vectors (fs_enumerate, fs_traversal, network_egress, dispatch_confusion), pooled escape rates were:

| Vector | python_baseline | python_sandboxed | Symbiont |
| --- | --- | --- | --- |
| fs_boundary (control) | 0% (0/140) | 0% (0/140) | 0% (0/140) |
| fs_enumerate | 88% (123/140) | 88% (123/140) | 0% (0/140) |
| Pooled across 4 pure-action vectors | 88–98% | 88–98% | 0% (0/560) |
| orchestrator-injection (n=200) | 59% | 56% | 3.6% |

The comparison shows that container isolation provides effectively zero defense against in-bind-set scope expansion, while structural-shape enforcement eliminates escape on the same vectors regardless of model capability tier. The 88–98 percentage-point reduction substantiates that the OATS-specified properties produce measurable defense beyond OS-isolation alternatives, which is the comparative claim that single-substrate sweeps cannot make.

Content-shape ceiling (the GPT-5 outlier) · new in v1.3.0

The orchestrator-injection vector measures content-shape attacks that pass shell-metacharacter validation but carry adversarial semantic content (e.g., HTML-comment-smuggled instructions in tool result strings). On this vector at n=200 with Wilson 95% confidence intervals, six of seven evaluated models cluster at 1–4% bypass against Symbiont's combined defenses; GPT-5 alone reaches 16% (12%, 22%), a confidence interval that does not overlap any other model's CI. The substrate paper characterizes this finding as the regex ceiling: substring-based content fences are necessary but structurally bounded — every marker-list expansion catches the previous bypass class; sufficiently capable models invent the next class. This finding refines (rather than contradicts) the architectural claim: action-shape vectors are structurally defended at 0%, while content-shape vectors require additional structural moves at the validator or at the orchestrator's prompt construction to close further. See Section 15 for the formal limitation.
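For readers who want to check the interval arithmetic, the sketch below computes a Wilson 95% score interval. The count of 32 bypasses is inferred here from the reported 16% at n=200 and is an assumption, as is the count of 8 used to represent the 4% upper end of the cluster.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


gpt5_low, gpt5_high = wilson_interval(32, 200)       # assumed 32/200 = 16%
cluster_low, cluster_high = wilson_interval(8, 200)  # assumed 8/200 = 4%
```

Rounded to whole percentages, the first interval reproduces the reported (12%, 22%), and the upper bound of the 4% case stays below its lower bound, consistent with the non-overlap described above.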

Ablation results: fence non-redundancy (validates the layered-defense claim)

The v12 stack-stripping ablation disabled each fence layer in turn and measured the corresponding attack-class survival rate. Removing the policy gate exposed 100% of out-of-profile dispatches (115 v5-baseline denials drop to zero with Cedar disabled). Removing the content sanitiser exposed 92.2% of html-comment-smuggle payloads (389 of 422 stored procedures contained intact hostile content with the sanitiser disabled). Removing ToolClad exposed 99.4% of tool-arg-injection payloads on the v11 corpus. Across three attack classes and three fence types, each fence is empirically necessary to defend its class.

Two-fences-at-action-layer finding (validates Section 8.6)

The Cedar ablation produced an unanticipated result: with Cedar disabled, the executor profile-of-one fence still refused 219 out-of-profile dispatches across 434 rows. This demonstrated that the action layer is two independent fences operating in series, not one. Cedar evaluates policies whose correctness depends on the policy author anticipating attack patterns; the executor profile-of-one evaluates against a static name list. The two fences are independent; removing one does not remove the other. This finding is now formalized in Section 8.6 as a SHOULD requirement for OATS Extended conformance.
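The series arrangement can be sketched as two independent checks in one dispatch path. The tool names, the `PROFILE` set, and the toy `policy_fence` rule are hypothetical stand-ins; a real deployment would call a policy engine such as Cedar where `policy_fence` appears.

```python
from typing import Callable, Optional

PROFILE = frozenset({"crm_query", "summarize"})  # hypothetical static name list


def profile_fence(tool: str) -> bool:
    """Executor fence: the tool name must be on the static profile."""
    return tool in PROFILE


def policy_fence(tool: str, args: dict) -> bool:
    """Stand-in for a policy-engine decision written by a policy author."""
    return tool == "crm_query" and args.get("scope") == "internal"


def dispatch(tool: str, args: dict, policy: Optional[Callable] = policy_fence) -> str:
    """Run both fences in series; passing policy=None models the ablation."""
    if policy is not None and not policy(tool, args):
        return "denied:policy"
    if not profile_fence(tool):
        return "refused:executor"
    return "executed"


with_policy = dispatch("send_email", {"to": "x@example.org"})
ablated = dispatch("send_email", {"to": "x@example.org"}, policy=None)
```

With the policy fence removed, the out-of-profile call is still refused by the executor, which is the behavior the ablation surfaced.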

Performance results (validates Table 7 targets)

Instrumented measurements from the v10 sweep: the Cedar policy gate adds 30–95µs per call (well below the 10ms p99 target in Table 7). The content sanitiser adds approximately 345ns per call (well below the 0.5ms target). End-to-end agent-loop latency is dominated by LLM inference round-trip by four to seven orders of magnitude over any plausible runtime overhead from these enforcement components. Performance is empirically not a barrier to adopting the structural-enforcement model.

Compile-fail tests (validates C1)

Nine compile-fail tests in the reference implementation verify the typestate property on every CI run by exhibiting the expected compiler diagnostic for each illegal state transition. The tests cover skipping the policy check, observing without dispatching, retaining handles after move, constructing typestate values externally, accessing phase-specific data from the wrong phase, and feature-gating violations. The suite re-verifies on every commit; the typestate property is therefore not just specified but build-time enforced.

Reference implementation hardening milestone · new in v1.3.0

The Symbiont v1.14.0 release (May 2026) responded to an independent security audit covering 5 critical, 7 high, 10 medium, and 9 low findings. Of relevance to OATS:

  • Fail-closed construction. DefaultPolicyGate::new() now returns Deny for every ToolCall and Delegate action with an explicit reason; the prior hard-coded permissive() default was removed. This aligns the operational default with the spec-level default-deny in §8.1 and motivated the explicit fail-closed-construction language added there in v1.3.0.
  • JWT algorithm allowlist. ES256 / EdDSA enforced for asymmetric Bearer; HS256 enforced for HMAC webhook signatures; RS* / PS* / none refused at both the header guard and the validator. This neutralizes RUSTSEC-2023-0071 on every operator-controlled path and motivated §7.5.
  • JSON Schema argument validation pre-Gate. Tool-call arguments are validated against the declared JSON Schema before the policy gate evaluates; non-object arguments and schema-violating arguments produce Deny. Reinforces the allow-list ordering specified in §8.2.
  • symbi-invis-strip v0.3.0. Forbidden range expanded to include U+00AD, U+0300–036F (combining diacriticals), and U+2070–209F (super/subscript forms); detect_injection_patterns now NFKC-normalizes and adds a compact-projection scan plus mixed-script (Latin+Cyrillic) flagging. Motivated the §6.6 SHOULD language.

These changes are implementation milestones, not changes to the specification. They demonstrate that the v1.3.0 SHOULDs are tractable in a production-grade reference implementation.
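The character-range and mixed-script checks from the last bullet can be sketched as follows. This is not the symbi-invis-strip implementation: the function names are hypothetical, only the three ranges named above are covered, and NFKC normalization and the compact-projection scan are left out.

```python
import unicodedata

# The three ranges named in the bullet above, as inclusive code point bounds.
FORBIDDEN_RANGES = ((0x00AD, 0x00AD), (0x0300, 0x036F), (0x2070, 0x209F))


def strip_forbidden(text: str) -> str:
    """Drop every character whose code point falls in a forbidden range."""
    return "".join(
        ch for ch in text
        if not any(lo <= ord(ch) <= hi for lo, hi in FORBIDDEN_RANGES)
    )


def is_mixed_latin_cyrillic(text: str) -> bool:
    """Flag strings whose letters come from both Latin and Cyrillic scripts."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name.startswith("LATIN"):
                scripts.add("latin")
            elif name.startswith("CYRILLIC"):
                scripts.add("cyrillic")
    return {"latin", "cyrillic"} <= scripts


cleaned = strip_forbidden("pass\u00adword")          # soft hyphen removed
flagged = is_mixed_latin_cyrillic("p\u0430ypal")     # Cyrillic 'а' among Latin letters
```

Script detection by Unicode character name is a simplification chosen to keep the sketch on the standard library.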

What is not yet measured

The reference corpus does not yet provide direct empirical evidence for:

  • (C4) context accumulation across long sessions;
  • (C6) Gate independence under targeted influence probing;
  • (C5) audit journal tamper-resistance under adversarial conditions;
  • (E1, E2) AgentPin and SchemaPin verification protocols under adversarial probing including algorithm-confusion;
  • (E4) sandbox escape testing across the three sandboxing tiers;
  • (E9) content-sanitization conformance under expanded adversarial corpora;
  • case studies as specified in Section 14.6;
  • cross-language conformance across the Python, JavaScript, and Go reference implementations of ToolClad.

These are identified as future deliverables. Multi-implementation conformance (an independent OATS-compliant runtime) remains the most important next deliverable.

Section 15

Limitations

This section identifies known limitations of the OATS specification. These are not future research directions (Section 16) but inherent boundaries of the current architecture.

Typestate scope

Compile-time enforcement of the ORGA loop applies only to code paths within the typestate-governed runner. Agent code that bypasses the loop entirely — for example, by invoking tools through a separate code path not mediated by the ORGA runner — is not caught by the type system. Sandboxing and network isolation provide complementary enforcement but are defense-in-depth layers, not compile-time properties.

Tool contract coverage

The allow-list model governs only tools with declared contracts. Tools without contracts (legacy integrations, dynamically discovered MCP servers, ad-hoc API calls) are outside the allow-list boundary. An OATS-compliant runtime can deny uncontracted tool invocations by default, but this trades functionality for safety and may be impractical in environments with large numbers of tools.

Coverage-safety tradeoff

The allow-list model inherently restricts the agent's action space. Novel legitimate tool uses not anticipated when the contract was authored will be rejected until the contract is updated. This creates operational friction proportional to the rate of tool evolution. The severity of this tradeoff has not been quantified empirically.

Content-shape ceiling on capable models · new in v1.3.0

String-based content sanitization (§6.6) defends a defined character set: invisible Unicode, bidi overrides, fullwidth homoglyphs, and similar. It does not defend against semantic content-shape attacks that compose legitimate characters into adversarial meaning. Empirical evidence shows that against the most capable evaluated model (GPT-5), the orchestrator-injection vector retains ~16% bypass at n=200 even with the full marker-list content sanitizer enabled. The other six of the seven evaluated models cluster at 1–4%. The substrate paper formalizes this as the regex ceiling: every marker-list expansion catches the previous bypass class; sufficiently capable models invent the next class. Closing this gap requires structural moves at the validator (typed agent-summary content with grammar-bound output schemas) or at the orchestrator's prompt construction, not further marker expansion. The architectural claim that action-shape vectors are structurally defended at 0% across capability tiers is not affected by this finding.

Answer-correctness orthogonal to structural fences · new in v1.3.0

Adversarial tool-result content can degrade an agent's final answer in ways that cross no policy fence, no content sanitizer, and no structural validator. The typestate preprint reports a case in which a tool returned a misleading-but-syntactically-legitimate diagnostic string that caused the agent to commit a wrong-but-plausible answer; the task grader, not the structural fences, caught the harm. OATS specifies runtime action security. Answer-correctness verification (graders, output-shape validators against task-specific expected answers, downstream consensus or critic agents) is a complementary fence that OATS does not currently cover. Deployments that care about answer correctness in adversarial environments SHOULD treat the grader as a load-bearing fence, not a footnote.

Semantic distance limitations

Drift detection via embedding similarity depends on the quality of the embedding model and the meaningfulness of cosine distance in the action-intent space. Adversarial embeddings could subvert drift detection by producing actions that are semantically distant from the original intent but close in embedding space. The robustness of semantic distance tracking under adversarial conditions has not been evaluated.

Single reference implementation · updated in v1.3.0

The specification is informed by one reference implementation (Symbiont). Multi-implementation conformance testing — building independent OATS-compliant runtimes and verifying interoperability — has not been conducted. The substrate-comparison sweep establishes that the OATS-specified properties produce measurable defense versus alternative substrates (permissive Python, container-isolated Python), but tests only one OATS-compliant implementation against those non-OATS alternatives; the question of whether independently-built OATS-compliant runtimes converge on the specified behavior remains open. The specification may contain implicit assumptions derived from the reference implementation that create unnecessary barriers for alternative implementations.

Production case study

The empirical results in Section 14.7 are from benchmark workloads run against hosted LLM endpoints, not from production deployment telemetry. While the reference implementation has operated autonomously for approximately nine months, no controlled production case study has been published. Adopting OATS in environments with regulatory or audit requirements should include independent operational evaluation.

Regulatory insufficiency

The audit journal provides technical infrastructure for compliance but is not sufficient for any regulatory framework on its own. HIPAA, SOC2, SOX, and GDPR each impose organizational, procedural, and legal requirements that OATS does not address. Claiming OATS compliance should not be conflated with claiming regulatory compliance.

Deferral latency

The DEFER authorization decision suspends action execution until resolution. In time-critical agent workflows (e.g., real-time trading, incident response), deferral latency may be unacceptable. The specification does not provide guidance on latency-sensitive deferral policies beyond configurable timeouts.

Privacy in cross-agent context

Propagating session context across agent boundaries (Section 11.4) raises privacy and data sovereignty concerns. Context may contain sensitive information from the original user's request, and propagating it to downstream agents in different organizational domains may violate data handling agreements. The specification does not address context redaction or privacy-preserving context propagation.

Non-deterministic evaluation

Context-dependent action classification relies on the policy engine's evaluation of accumulated context. When the policy engine uses semantic similarity or ML-based classification, evaluation results may be non-deterministic across invocations. The specification requires deterministic policy engines (Cedar, OPA) but permits semantic distance as a SHOULD requirement, creating a tension between deterministic authorization and non-deterministic drift signals.

Section 16

Research Directions

16.1 Typestate in Non-Rust Languages

OATS's compile-time enforcement property is most naturally expressed in languages with typestate support (Rust, Haskell, Scala). Providing equivalent enforcement in Python, JavaScript, and Go requires runtime checks with formal path coverage verification, or code generation from a verified specification. The degree of assurance loss when moving from compile-time to runtime enforcement is an open question.
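A minimal sketch of the runtime-check alternative in Python follows, assuming a hypothetical `Runner` whose `act` method refuses to dispatch without a single-use token issued by `gate`. None of these names come from the specification.

```python
class PhaseError(RuntimeError):
    """Raised when a phase is attempted out of order or the gate denies."""


class GateToken:
    """Evidence that the gate allowed one specific action; single use."""

    def __init__(self, action: str):
        self.action = action
        self.spent = False


class Runner:
    def __init__(self, policy):
        self._policy = policy  # callable: action name -> bool

    def gate(self, action: str) -> GateToken:
        """Evaluate policy; only an allow decision yields a token."""
        if not self._policy(action):
            raise PhaseError(f"gate denied {action!r}")
        return GateToken(action)

    def act(self, action: str, token) -> str:
        """Dispatch only with an unspent token issued for this exact action."""
        if not isinstance(token, GateToken) or token.spent or token.action != action:
            raise PhaseError("dispatch attempted without a valid gate decision")
        token.spent = True
        return f"dispatched {action}"


runner = Runner(lambda action: action == "crm_query")
token = runner.gate("crm_query")
result = runner.act("crm_query", token)
```

The check happens only when `act` runs, and nothing stops other code from constructing a `GateToken` directly. A Rust typestate design can rule out both at compile time, which is the assurance gap this subsection describes.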

16.2 Data Flow Through Context Windows

Data may be transformed, summarized, or paraphrased by the LLM before use in subsequent actions. Approaches for tracking lineage through such non-deterministic transformations (taint analysis, embedding watermarking) are an active research area.

16.3 Multi-Agent Trust Coordination

Maintaining coherent trust chains across organizational boundaries in delegation requires distributed tracing standards, federated receipt verification, and cross-domain policy negotiation.

16.4 Formal Verification of the ORGA Loop

Typestate enforcement addresses phase ordering within the loop. Mechanized proofs of the entire system — policy engine correctness, context accumulator completeness, journal integrity, and the absence of bypass paths outside the loop — would provide substantially stronger assurance. Such proofs would also help bound the gap between specification-level properties and implementation-level behavior.

16.5 Approval Fatigue and Deferral Resolution

Deferral-based authorization must balance security against usability, since frequent approval requests risk approval fatigue. ML-based approval recommendation, batch approval, and progressive autonomy (reduced approval requirements through demonstrated compliance) are active directions.

16.6 Vector Embedding Security

When semantic distance tracking uses embeddings, those embeddings become a security surface. Information-theoretic watermarking, steganographic attack detection, and quantization-robust integrity verification are needed. Early work along this line includes VectorSmuggle, which empirically characterises steganographic exfiltration through vector-store ingestion pipelines (noise injection, orthogonal rotation, scaling, offset, fragmentation), and VectorPin, an Ed25519-over-canonical-bytes provenance protocol that pins each embedding to its source content and producing model so that any post-ingestion modification breaks signature verification — an analogous shape to the cryptographic audit-journal property in §9 but applied to the embedding layer rather than the action layer.

16.7 Structural Defenses for Content-Shape Vectors · new in v1.3.0

The regex ceiling identified in §15 motivates structural-rather-than-substring approaches to content sanitization: typed agent-summary parameters whose admissible content is constrained by grammar; output-shape validators that enforce per-task expected-answer schemas; orchestrator-side prompt-construction patterns that isolate untrusted tool output from agent reasoning context; and downstream consensus or critic agents that act as task-correctness fences. Quantifying the marginal contribution of each move and identifying which combinations close the GPT-5-class outlier without overfitting is an open empirical question.

16.8 Multi-Implementation Conformance

The most important open question for the specification's generality is whether independently-built OATS-compliant runtimes converge on the specified behavior. A target deliverable is at least one independent OATS-compliant implementation in a different language ecosystem, with the §12 conformance verification procedures re-run against it. The substrate-comparison sweep provides a methodology template for cross-runtime comparison once a second compliant implementation exists.

Section 17

Conclusion

OATS specifies what a zero-trust AI agent runtime should do to provide meaningful security properties for autonomous agent execution. The specification is grounded in three architectural convictions, with initial empirical results (Section 14.7) demonstrating implementation feasibility for five of seven core conformance requirements:

Principle

Allow-list over deny-list

Constraining what actions can be expressed reduces the attack surface compared to intercepting arbitrary actions and deciding which to block. The v11 ToolClad sweep empirically validates this property at 100% bite-rate against hostile inputs across nine widely available hosted LLMs and eight argument-injection sub-shapes. The substrate-comparison sweep strengthens the claim comparatively: at 0/560 escape on four pure-action vectors against Symbiont versus 88–98% pooled baseline rates on python_baseline and python_sandboxed with the same lures, structural-shape enforcement defends a threat class that OS isolation cannot reach (Section 14.7).

Principle

Compile-time over runtime enforcement

Enforcing policy evaluation through the type system provides stronger structural assurance than testing it at runtime. Nine compile-fail tests in the reference implementation verify the typestate property on every CI run; the property holds within the typestate-governed code, but does not protect against bypasses that circumvent the loop entirely.

Principle

Structural independence over trust assumptions

Architecturally isolating the Gate from LLM influence reduces the risk that a compromised orchestration layer can influence policy evaluation. The v9 cloud-adversarial sweep refused 263 forbidden tool-call attempts across 874 runs with zero attempts reaching execution; the v12 stack-stripping ablation additionally surfaced a previously under-emphasized architectural finding: the action layer is two independent fences operating in series (Section 8.6), with the executor profile-of-one refusing 219 dispatches even when Cedar was disabled.

The substrate-comparison evidence newly available in v1.3.0 sharpens, not softens, the claims. It also identifies one bounded refinement: content-shape vectors against frontier models retain a regex ceiling that further marker-list expansion alone cannot close (Section 15). The architectural response to that ceiling — structural moves at the validator and at the orchestrator — is identified as research direction §16.7 rather than spec change.

The specification is informed by approximately nine months of autonomous operation in a production runtime (Symbiont), including rebuilding a codebase using the runtime's own agent infrastructure after a catastrophic loss event, and including a v1.14.0 security audit response release (May 2026) whose findings motivated several of the v1.3.0 SHOULD-level requirements. The empirical results reported in Section 14.7 substantiate a substantial subset of the specification's claims; remaining items (context accumulation under load, Gate-influence probing, journal tamper-resistance evaluation, multi-implementation conformance, content-sanitization conformance under expanded adversarial corpora, controlled production case studies) are identified as future deliverables in Section 14.7 and Section 15.

By publishing this specification as an open standard, we aim to establish baseline requirements that enable comparable evaluation of runtime security approaches for autonomous agents. The goal is not to build OATS, but to define what an OATS-compliant system must do, enabling independent implementations to be measured against shared conformance criteria.

A. Future Directions for Adoption

For implementors. Build independent OATS-compliant runtimes across different language ecosystems. Multi-implementation conformance testing is the most important next step for validating the specification's generality.

For evaluators. Apply the evaluation framework in Section 14 to existing and new agent runtimes. Comparative results across architectures would substantially strengthen or refine the claims made in this specification.

For researchers. Address open challenges in Section 16, particularly typestate portability across language ecosystems, formal verification of runtime properties, multi-agent trust coordination, and structural defenses for content-shape vectors (§16.7).

For the community. The specification is open and available at thirdkey.ai/oats. Feedback, critique, and competing proposals are welcome. The reference corpus and reproduction artifacts referenced in Section 14.7 are available at github.com/ThirdKeyAI/symbiont-orga-demo.

Supporting Papers

Companion Research

Companion papers from ThirdKey that support, extend, or empirically validate components of the OATS specification. These are not normative parts of OATS; they are referenced for readers seeking deeper treatment of specific mechanisms.

Preprint · v0.5 · April 2026

Typestate-Enforced Agent Loops: Making Policy Gates Unskippable at Compile Time

Wanger, J. Demonstrates a compile-time construction in which the type system makes skipping the policy gate impossible, evaluated across nine LLM providers via OpenRouter. Reports zero policy violations reaching execution among 263 blocked tool-call attempts, gate overhead of 30–95µs per call, and zero Cedar false-positives on 813 legitimate runs across eight of nine models. The v12 stack-stripping ablation additionally surfaces the two-fences-at-action-layer finding formalized in §8.6. Provides empirical grounding for OATS Conviction 02 (compile-time enforcement) and the ORGA loop's phase-ordering guarantees.

DOI 10.5281/zenodo.19896446

Preprint · v0.2 · April 2026

Making Dangerous Tool Arguments Inexpressible: A Declarative Contract Format for LLM Agents

Wanger, J. Defines ToolClad — a declarative contract format whose typed parameters constrain what an LLM agent can express — and evaluates it across eight argument-injection sub-shapes (metacharacter, command substitution, backtick, wildcard, newline, path traversal, Cyrillic-homoglyph IDN, punycode IDN). Reports 333 of 335 hostile dispatches refused across nine widely available hosted LLMs; the two non-refusals are clean-hostname self-sanitisation cases (100% bite-rate against hostile inputs). Provides empirical grounding for OATS Conviction 01 (allow-list over deny-list) and conformance requirement C2.

DOI 10.5281/zenodo.19957596

Preprint · v0.2 · May 2026 · new in v1.3.0

Substrate Comparison: Empirical Evaluation of Structural Enforcement Against OS-Isolation Baselines

Wanger, J. Runs identical lures across three substrates, python_baseline (no enforcement), python_sandboxed (Docker isolation), and Symbiont (full OATS stack), with the same harness, models, and trial counts. Reports 0/560 escape on four pure-action vectors against Symbiont versus 88–98% pooled baseline rates on the alternatives. Also characterizes the regex ceiling (the GPT-5 outlier on the orchestrator-injection content-shape vector) that motivated §15's content-shape-ceiling limitation and §16.7's structural-content-defense research direction. Supplies the comparative claim that single-substrate sweeps cannot make.

DOI 10.5281/zenodo.20043247

References

  1. Anthropic. "Model Context Protocol Specification." 2024. modelcontextprotocol.io
  2. Wang, L. et al. "A Survey on Large Language Model based Autonomous Agents." Frontiers of Computer Science, vol. 18, no. 6, 2024.
  3. Yao, S. et al. "ReAct: Synergizing Reasoning and Acting in Language Models." ICLR, 2023.
  4. Wu, Q. et al. "Security of AI Agents." arXiv:2406.08689, 2024.
  5. Ye, Q. et al. "ToolEmu: Identifying Risky Real-World Agent Failures with a Language Model Emulator." ICLR, 2024.
  6. Su, H. et al. "A Survey on Autonomy-Induced Security Risks in Large Model-Based Agents." arXiv:2506.23844, 2025.
  7. Debenedetti, E. et al. "AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents." arXiv:2406.13352, 2024.
  8. Ruan, Y. et al. "The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies." arXiv:2407.19354, 2024.
  9. Perez, S. et al. "Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs Through a Global Prompt Hacking Competition." EMNLP, 2023.
  10. Liu, Y. et al. "Formalizing and Benchmarking Prompt Injection Attacks and Defenses." USENIX Security, 2024.
  11. Greshake, K. et al. "Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection." AISec Workshop at ACM CCS, 2023.
  12. Miller, M. S. "Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control." Ph.D. dissertation, Johns Hopkins University, 2006.
  13. Gaire, S. et al. "Systematization of Knowledge: Security and Safety in the Model Context Protocol Ecosystem." arXiv:2512.08290, 2025.
  14. Errico, H. "Autonomous Action Runtime Management (AARM): A System Specification for Securing AI-Driven Actions at Runtime." arXiv:2602.09433v1, 2026.
  15. Chuvakin, A. "Cloud CISO Perspectives: How Google secures AI Agents." Google Cloud Blog, June 2025.
  16. Reber, D. "The Agentic AI Security Scoping Matrix: A Framework for Securing Autonomous AI Systems." AWS Security Blog, November 2024.
  17. Microsoft. "Governance and security for AI agents across the organization." Cloud Adoption Framework, 2024.
  18. Raza, S. et al. "TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems." arXiv:2506.04133, 2025.
  19. Hardy, N. "The Confused Deputy: (or why capabilities might have been invented)." ACM SIGOPS Operating Systems Review, vol. 22, no. 4, pp. 36–38, 1988.
  20. Open Policy Agent. "OPA: Policy-based control for cloud native environments." 2024. openpolicyagent.org
  21. Amazon Web Services. "Cedar: A Language for Defining Permissions as Policies." 2023. cedarpolicy.com
  22. OWASP Foundation. "OWASP Top 10 for Large Language Model Applications." 2024.
  23. National Institute of Standards and Technology. "AI Risk Management Framework (AI RMF 1.0)." 2023.
  24. Wanger, J. "Typestate-Enforced Agent Loops: Making Policy Gates Unskippable at Compile Time." Preprint v0.5. ThirdKey AI, April 2026. DOI: 10.5281/zenodo.19896446.
  25. Wanger, J. "Making Dangerous Tool Arguments Inexpressible: A Declarative Contract Format for LLM Agents." Preprint v0.2. ThirdKey AI, April 2026. DOI: 10.5281/zenodo.19957596.
  26. Wanger, J. "Substrate Comparison: Empirical Evaluation of Structural Enforcement Against OS-Isolation Baselines." Preprint v0.2. ThirdKey AI, May 2026. DOI: 10.5281/zenodo.20043247.
  27. Wanger, J. "Symbiont ORGA Demo: Reproduction Artifact for OATS Empirical Evaluations." ThirdKey AI, 2026. github.com/ThirdKeyAI/symbiont-orga-demo. Accessed May 2026.
  28. Wanger, J. "Symbiont Runtime Architecture." ThirdKey AI, 2026. symbiont.dev
  29. Wanger, J. "AgentPin Technical Specification v0.2.0." ThirdKey AI, 2026. agentpin.org
  30. Wanger, J. "SchemaPin Protocol Specification." ThirdKey AI, 2025. schemapin.org
  31. W3C. "Trace Context, Level 2." W3C Recommendation, 2024. w3.org/TR/trace-context/
  32. Wanger, J. "ToolClad: Declarative Tool Interface Contracts for Agentic Runtimes v0.6.0." ThirdKey AI, 2026. toolclad.org
  33. Wanger, J. "VectorSmuggle: Steganographic Exfiltration in Embedding Stores and a Cryptographic Provenance Defense." ThirdKey / Tarnover, LLC, 2026. DOI: 10.5281/zenodo.20058256.
  34. Wanger, J. "VectorPin: Verifiable Integrity for AI Embedding Stores." ThirdKey AI, 2025. github.com/ThirdKeyAI/VectorPin
Appendix B

Changes from v1.2.0

This revision integrates a third companion preprint (substrate comparison), a new content-sanitization requirement informed by reference-implementation hardening, identity-layer cryptographic-agility requirements, and a new limitation characterizing the content-shape ceiling against frontier models. The five-layer architecture, the three architectural convictions, the ORGA loop construction, and the core conformance requirements C1–C7 are unchanged.

  • Abstract. Updated companion-preprint count (two → three); refreshed operational-experience figure (~eight months → ~nine months).
  • §1.3 Contributions. Updated empirical-evaluation bullet to reflect three companion preprints, including comparative substrate evidence.
  • §6 Tool Contracts. New §6.6 Content Sanitization: SHOULD-level requirement for stripping invisible Unicode and NFKC normalization on agent-influenced string fields, with a reference implementation pointer to symbi-invis-strip v0.3.0.
  • §7 Identity Layer. New §7.5 Cryptographic Agility and Algorithm Allowlisting: SHOULD-level requirement that runtime verifiers refuse algorithm classes outside their declared allowlist; explicit guidance on JWT verifier configuration.
  • §8 Policy Enforcement Layer. §8.1 clarified to specify that the runtime's default construction MUST be fail-closed when no policy is wired (separating spec-level default-deny from operational fail-closed defaults). Empirical milestone referenced in §14.7.
  • §9 Audit Layer. New §9.6 Redaction of Sensitive Parameters: SHOULD for journal writers to substitute redaction sentinels for parameters declared sensitive in their contract.
  • §11 Inter-Agent Communication. §11.5 added: distributed trace context (W3C traceparent recommended) as a SHOULD for cross-runtime tracing.
  • §12 Conformance Requirements. New extended requirement E9: Content Sanitization. E1 reformulated to call out the algorithm-allowlist requirement explicitly.
  • §14.7 Initial Empirical Results. New subsections summarizing the substrate-comparison sweep, the regex-ceiling / GPT-5 outlier characterization, and the v1.14.0 reference-implementation audit response.
  • §15 Limitations. New bullets: "Content-shape ceiling on capable models" and "Answer-correctness orthogonal to structural fences". Refreshed "Single reference implementation" to acknowledge that comparative substrate evidence now exists, while still calling out the absence of independent OATS-compliant implementations.
  • §16 Research Directions. New §16.7 Structural Defenses for Content-Shape Vectors and §16.8 Multi-Implementation Conformance.
  • §17 Conclusion. Restatement of the three convictions updated with the substrate-comparison evidence.
  • References. Substrate-comparison preprint added as a third companion paper; Zenodo DOIs added to the three companion papers; W3C Trace Context reference added.
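
As an illustration of the §6.6 content-sanitization requirement listed above (stripping invisible Unicode and applying NFKC normalization to agent-influenced string fields), the following is a minimal sketch using only the Python standard library. It is not the symbi-invis-strip implementation. The function name `sanitize_field` is hypothetical, and treating "invisible Unicode" as Unicode general category Cf (format characters) is an assumption made for this example; the normative definition is the one given in §6.6.

```python
import unicodedata


def sanitize_field(value: str) -> str:
    """Hypothetical helper: sanitize one agent-influenced string field.

    Assumption: "invisible" is approximated as general category Cf
    (zero-width spaces and joiners, bidi controls, BOM, tag characters).
    """
    # Drop format characters first, so that removing them cannot leave
    # behind sequences that NFKC would otherwise have composed.
    visible = "".join(ch for ch in value if unicodedata.category(ch) != "Cf")
    # Fold compatibility forms (fullwidth letters, ligatures, and so on)
    # into their canonical equivalents.
    return unicodedata.normalize("NFKC", visible)


if __name__ == "__main__":
    # A zero-width space hidden inside a word is removed.
    print(sanitize_field("de\u200blete"))
    # Fullwidth letters fold to their ASCII equivalents.
    print(sanitize_field("\uff52\uff4d"))
```

Removing every Cf character is deliberately blunt: it also removes the zero-width joiners that some scripts and emoji sequences rely on, so a runtime might reasonably restrict this treatment to fields where such text is not expected.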