Proxy Convergence

Experiment not yet run

This page describes the experiment design. The harness is built but results have not been collected.

Pillar: Human Proxy

Hypothesis

The dual-signal confidence model (Laplace estimate + EMA) converges toward accurate prediction of human approval decisions. As the proxy observes more human decisions, its escalation rate decreases and its prediction accuracy increases — without increasing the false approval rate.

H1: Proxy prediction accuracy improves monotonically over the first 30 observations per (state, task-type) pair.

H2: Escalation rate decreases from ~100% (cold start) to a stable plateau below 40% within 20 observations.

H3: False approval rate remains below 5% throughout convergence, due to asymmetric regret weighting.
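The three hypotheses reduce to simple checks on learning curves. This sketch assumes each curve is a list indexed by observation count (position 0 is the first observation) and has already been averaged across (state, task-type) pairs. The function names, the averaging, and the optional `slack` tolerance for H1 are illustrative assumptions, not part of the design.

```python
def h1_monotonic(accuracy_by_obs: list[float], horizon: int = 30,
                 slack: float = 0.0) -> bool:
    """H1: accuracy never drops over the first `horizon` observations.

    slack=0.0 is the strict reading; a small positive slack (hypothetical)
    tolerates noise-level dips.
    """
    curve = accuracy_by_obs[:horizon]
    return all(nxt >= prev - slack for prev, nxt in zip(curve, curve[1:]))


def h2_plateau(escalation_by_obs: list[float], by_obs: int = 20,
               ceiling: float = 0.40) -> bool:
    """H2: from observation `by_obs` onward, escalation stays below `ceiling`.

    Only the ceiling is checked; stability of the plateau is not.
    """
    tail = escalation_by_obs[by_obs - 1:]
    return bool(tail) and all(rate < ceiling for rate in tail)


def h3_false_approvals(false_approval_by_obs: list[float],
                       limit: float = 0.05) -> bool:
    """H3: false approval rate stays below `limit` at every observation."""
    return all(rate < limit for rate in false_approval_by_obs)
```

A strict monotonicity test will almost always fail on raw per-observation data, so H1 is only meaningful on aggregated or smoothed curves.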

Why This Matters

The human proxy is the mechanism that makes agent autonomy tractable. If it doesn't converge, the human remains a permanent bottleneck. If it converges too aggressively (false approvals), it produces unreviewed bad work. The dual-signal design with asymmetric regret is intended to thread this needle — but we need to verify it empirically, especially the convergence rate and the false-approval floor.

Method

Setup

A single human works with the same proxy across 50 sequential tasks, providing feedback at each approval gate. Tasks are drawn from a single domain to allow learning to accumulate (cross-domain convergence is a separate question).

Conditions

Condition	Description
Dual-signal (treatment)	Laplace + EMA, confidence = minimum of the two, REGRET_WEIGHT=3
Laplace-only (ablation)	Laplace estimate only, same REGRET_WEIGHT
EMA-only (ablation)	EMA only, same REGRET_WEIGHT
No proxy (control)	Always escalate to human
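A minimal sketch of how the three proxy conditions could compute confidence for one (state, task-type) pair; the no-proxy control needs no confidence at all. The Laplace estimate uses the standard add-one form (a + 1)/(n + 2). How REGRET_WEIGHT enters is an assumption here: each rejection counts REGRET_WEIGHT times in the Laplace counts and takes a REGRET_WEIGHT-times larger EMA step. `EMA_ALPHA`, `EMA_PRIOR`, and `ConfidenceState` are hypothetical names, not taken from the harness.

```python
from dataclasses import dataclass

REGRET_WEIGHT = 3   # from the conditions table
EMA_ALPHA = 0.2     # hypothetical smoothing factor; the design does not fix one
EMA_PRIOR = 0.5     # hypothetical neutral starting value for the EMA


@dataclass
class ConfidenceState:
    """Confidence bookkeeping for one (state, task-type) pair (hypothetical)."""
    approvals: float = 0.0
    rejections: float = 0.0   # regret-weighted rejection count
    ema: float = EMA_PRIOR

    def update(self, approved: bool) -> None:
        if approved:
            self.approvals += 1
            self.ema += EMA_ALPHA * (1.0 - self.ema)
        else:
            # Assumed form of asymmetric regret: a rejection counts
            # REGRET_WEIGHT times in the Laplace counts and takes a
            # REGRET_WEIGHT-times larger EMA step (capped at 1).
            self.rejections += REGRET_WEIGHT
            step = min(1.0, EMA_ALPHA * REGRET_WEIGHT)
            self.ema += step * (0.0 - self.ema)

    def laplace(self) -> float:
        # Laplace (add-one) estimate of P(approve): (a + 1) / (n + 2).
        return (self.approvals + 1) / (self.approvals + self.rejections + 2)

    def confidence(self, condition: str) -> float:
        if condition == "dual":      # treatment: minimum of both signals
            return min(self.laplace(), self.ema)
        if condition == "laplace":   # ablation
            return self.laplace()
        if condition == "ema":       # ablation
            return self.ema
        raise ValueError(f"unknown condition: {condition}")
```

Because the dual signal takes the minimum, a single rejection can pull it down sharply through the EMA even while the Laplace count still looks healthy (after three approvals and one rejection above, Laplace sits at 0.5 while the EMA drops below 0.3). That is the conservative behavior H3 depends on.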

Measurements

Metric	Description	Source
Prediction accuracy	Did the proxy predict the human's decision correctly?	Proxy log vs. human response
Escalation rate	Fraction of gates where proxy escalates to human	Proxy log
False approval rate	Proxy approved but human would have rejected	Proxy log + human post-hoc review
False escalation rate	Proxy escalated and the human approved	Proxy log + human response
Confidence score trajectory	Raw confidence values over time per (state, task-type)	Proxy log
Human time per task	Wall-clock time human spends on approvals	Timestamps
Exploration trigger rate	Fraction of escalations caused by forced exploration (15%) vs. low confidence	Proxy log
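Most rows of the measurement table can be computed directly from per-gate log records (human time and confidence trajectories are left out here). This sketch assumes a hypothetical `GateLog` schema and picks denominators the table leaves open: false approvals are counted over gates the proxy approved on its own, and false escalations over escalated gates with a recorded human decision. Undefined ratios return NaN rather than zero.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GateLog:
    """One approval gate as it might appear in the proxy log (hypothetical schema)."""
    predicted_approve: bool
    escalated: bool
    reason: str                      # e.g. "exploration", "low_confidence", "auto"
    human_approved: Optional[bool]   # live decision, or post-hoc review for auto-approved gates


def _ratio(num: int, den: int) -> float:
    return num / den if den else float("nan")


def gate_metrics(logs: list[GateLog]) -> dict[str, float]:
    judged = [g for g in logs if g.human_approved is not None]
    auto = [g for g in judged if not g.escalated]
    escalated = [g for g in logs if g.escalated]
    escalated_judged = [g for g in escalated if g.human_approved is not None]
    return {
        "prediction_accuracy": _ratio(
            sum(g.predicted_approve == g.human_approved for g in judged), len(judged)),
        "escalation_rate": _ratio(len(escalated), len(logs)),
        # Assumed denominator: gates the proxy approved without the human.
        "false_approval_rate": _ratio(
            sum(not g.human_approved for g in auto), len(auto)),
        # Assumed denominator: escalated gates with a recorded human decision.
        "false_escalation_rate": _ratio(
            sum(g.human_approved for g in escalated_judged), len(escalated_judged)),
        "exploration_trigger_rate": _ratio(
            sum(g.reason == "exploration" for g in escalated), len(escalated)),
    }
```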

Procedure

Run 50 tasks sequentially with the same human
At each approval gate, log the proxy's prediction and confidence score before any human input
Human provides genuine approval/rejection (not simulated)
After all 50 tasks, the human reviews the proxy-approved items post hoc to catch false approvals that the proxy let through without review
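The procedure's key constraint is that the proxy's prediction is frozen before the human responds. A sketch of routing one gate under stated assumptions: `APPROVE_THRESHOLD`, `MIN_OBSERVATIONS`, `GateRecord`, and `decide_gate` are hypothetical; only the 15% exploration rate comes from the design. Exploration is checked before confidence, so it applies to every post-cold-start gate, which is what would make it a floor on the escalation rate.

```python
import random
from dataclasses import dataclass
from typing import Optional

EXPLORATION_RATE = 0.15   # forced-exploration rate named in the design
APPROVE_THRESHOLD = 0.8   # hypothetical: the design does not give a threshold
MIN_OBSERVATIONS = 5      # hypothetical cold-start window, in observations


@dataclass
class GateRecord:
    """One approval gate, logged before the human responds (hypothetical schema)."""
    task: int
    condition: str                 # "dual", "laplace", "ema", or "none"
    confidence: float
    predicted_approve: bool
    escalated: bool
    reason: str                    # why the proxy escalated, or "auto"
    human_approved: Optional[bool] = None   # filled in live or during post-hoc review


def decide_gate(task: int, condition: str, confidence: float,
                n_observed: int, rng: random.Random) -> GateRecord:
    """Freeze the proxy's prediction and routing decision before any human input."""
    predicted = confidence >= APPROVE_THRESHOLD
    if condition == "none":
        reason = "no_proxy"          # control: always escalate
    elif n_observed < MIN_OBSERVATIONS:
        reason = "cold_start"
    elif rng.random() < EXPLORATION_RATE:
        reason = "exploration"       # forced escalation regardless of confidence
    elif not predicted:
        reason = "low_confidence"
    else:
        reason = "auto"              # proxy approves on the human's behalf
    return GateRecord(task, condition, confidence, predicted,
                      escalated=(reason != "auto"), reason=reason)
```

When the human decides an escalated gate live, or reviews an auto-approved one after the 50 tasks, the caller would set `human_approved` on the stored record; the proxy's own fields are never rewritten.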

Analysis Plan

Plot accuracy, escalation rate, and false approval rate as time series over 50 tasks
Identify the convergence point: the first task after which the escalation rate stays within 5% of its final value
Compare convergence curves across the dual-signal, Laplace-only, and EMA-only conditions
Human time savings: compare human time per task in the dual-signal vs. no-proxy conditions
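A sketch of the convergence-point and time-savings calculations. The analysis plan leaves several choices open, so these are assumptions: "within 5%" is read as 5 percentage points, per-task escalation rates are smoothed with a trailing window (window=5 by default) to damp gate-level noise, and the final value is the last smoothed point. The function names are hypothetical.

```python
from statistics import mean


def rolling_mean(xs: list[float], window: int) -> list[float]:
    """Trailing moving average; early points use however many values exist."""
    out = []
    for i in range(len(xs)):
        chunk = xs[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def convergence_task(per_task_escalation: list[float], window: int = 5,
                     tol: float = 0.05) -> int:
    """1-based number of the first task from which the smoothed escalation
    rate stays within `tol` (absolute) of its final value."""
    if not per_task_escalation:
        raise ValueError("need at least one task")
    smoothed = rolling_mean(per_task_escalation, window)
    final = smoothed[-1]
    converged_from = len(smoothed)
    # Walk backward until the first point that leaves the tolerance band.
    for i in range(len(smoothed) - 1, -1, -1):
        if abs(smoothed[i] - final) > tol:
            break
        converged_from = i + 1   # convert 0-based index to task number
    return converged_from


def time_savings(treatment_minutes: list[float], control_minutes: list[float]) -> float:
    """Fractional reduction in mean human time per task (e.g. dual-signal vs. no proxy)."""
    return 1.0 - mean(treatment_minutes) / mean(control_minutes)
```

Running `convergence_task` once per condition gives one number per curve, which makes the dual-signal vs. ablation comparison easy to tabulate; the window and tolerance should be fixed before the data is seen.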

Results

Experiment not yet run.

Expected Findings

Cold start (tasks 1-5): 100% escalation rate for all conditions, by design
Calibrating (tasks 6-20): Rapid decrease in escalation rate. EMA-only may converge faster but oscillate more. Laplace-only should converge more slowly but more smoothly. Dual-signal should track the more conservative (slower) of the two, since it takes the minimum of both confidence scores.
Warm start (tasks 21-50): Escalation rate stabilizes. Dual-signal is expected to plateau at 20-35% (a 15% floor from forced exploration plus genuine uncertainty).
False approval rate: Should remain below 5% for dual-signal across all phases. EMA-only may spike during rapid convergence.
Human time savings: Expected 50-70% reduction in human time per task at warm start vs. cold start.
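The expected warm-start plateau can be decomposed under one simplifying assumption that the design does not state: forced exploration fires independently on a fraction ε = 0.15 of post-cold-start gates, and a fraction u of the remaining gates escalate because confidence is below threshold. The overall escalation rate E and the exploration trigger rate R (exploration-driven escalations as a share of all escalations) then follow, and the expected 20-35% plateau implies:

```latex
\begin{aligned}
E &= \varepsilon + (1-\varepsilon)\,u, \qquad R = \frac{\varepsilon}{E},\\
u &= \frac{E-\varepsilon}{1-\varepsilon}:\quad
  E = 0.20 \Rightarrow u = \tfrac{0.05}{0.85} \approx 0.059,\quad
  E = 0.35 \Rightarrow u = \tfrac{0.20}{0.85} \approx 0.235,\\
R &:\quad E = 0.20 \Rightarrow R = \tfrac{0.15}{0.20} = 0.75,\quad
  E = 0.35 \Rightarrow R = \tfrac{0.15}{0.35} \approx 0.43.
\end{aligned}
```

Under this reading, roughly 6-24% of non-exploration gates would still escalate on low confidence, and about 43-75% of warm-start escalations would be exploration-driven. A measured exploration trigger rate well outside that band would suggest the plateau comes from a different mix than this decomposition assumes.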

Threats to Validity

Human consistency. If the human's preferences drift over 50 tasks, the proxy is chasing a moving target. Mitigation: use a well-defined domain where preferences are relatively stable.
Task homogeneity. 50 tasks in one domain may overstate convergence speed. Real usage spans multiple domains.
Simulated vs. genuine stakes. Experimental approval decisions may not carry the same weight as production decisions. Mitigation: use real project tasks, not toy problems.