Liaison Context Compression

Experiment not yet run

This page describes the experiment design. The harness is built but results have not been collected.

Pillar: Hierarchical Teams

Hypothesis

Liaison agents compress context at hierarchy boundaries — translating high-level tasks into scoped job descriptions (downward) and summarizing subteam output for the upper team (upward). This compression preserves decision-relevant information while discarding implementation details that would pollute the upper team's context.

H1: Liaison-mediated context reduces token volume by 60-80% at each hierarchy boundary.

H2: Decision-relevant information (as judged by the upper team's subsequent actions) is preserved at a rate > 90%.

H3: The upper team makes equivalent or better strategic decisions with liaison-compressed context vs. full context.

Why This Matters

Context compression is the mechanism that makes hierarchy scalable. Without it, the uber team would need to ingest all raw output from all subteams — defeating the purpose of hierarchical isolation. But compression is lossy. If liaisons drop critical information, the uber team makes uninformed decisions. This experiment quantifies the information loss and tests whether it matters.

Method

Conditions

Condition	Description
Liaison-compressed (treatment)	Upper team receives subteam output via liaison summaries
Full context (control)	Upper team receives raw subteam output (no liaison compression)
Abstract only (ablation)	Upper team receives only task completion status (success/fail), no content

Task Selection

Ten complex tasks, each requiring coordination across 2-3 subteams. Tasks are chosen so that the upper team must make at least one strategic decision based on subteam output (e.g., revise the plan, reassign work, or integrate conflicting results).

Procedure

Subteams execute their assigned work (identical across conditions)
Subteam output is presented to the upper team in one of three forms (per condition)
Upper team makes strategic decisions based on the information received
Human evaluates: did the upper team have enough information to make a good decision?
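
The three presentation forms in step 2 can be sketched as a single dispatch function. This is a minimal illustration, not the harness's actual interface: `Condition`, `SubteamResult`, and `present_to_upper_team` are hypothetical names, and the liaison summary is assumed to have been produced before this step.

```python
from dataclasses import dataclass
from enum import Enum


class Condition(Enum):
    # Hypothetical labels for the three conditions in the table above.
    LIAISON = "liaison-compressed"  # treatment
    FULL = "full-context"           # control
    ABSTRACT = "abstract-only"      # ablation


@dataclass
class SubteamResult:
    raw_output: str       # everything the subteam produced
    liaison_summary: str  # the liaison's upward summary (assumed precomputed)
    succeeded: bool       # task completion status


def present_to_upper_team(result: SubteamResult, condition: Condition) -> str:
    """Return the text the upper team sees for one subteam under a condition."""
    if condition is Condition.LIAISON:
        return result.liaison_summary
    if condition is Condition.FULL:
        return result.raw_output
    # Abstract-only: completion status with no content.
    return "success" if result.succeeded else "fail"
```

Because the subteam work is identical across conditions, one `SubteamResult` can be rendered three ways, so that only the presentation varies between conditions.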

Measurements

Metric	Description
Compression ratio	Liaison output tokens / raw subteam output tokens
Information preservation	Fraction of human-tagged decision-critical and decision-relevant facts present in the liaison summary
Decision quality	Human rating of upper team's strategic decisions (1-5)
Upper team context size	Peak context window usage for uber lead
Decision time	How long the upper team deliberates before acting
Error attribution	For each bad upper-team decision, whether it was caused by missing information
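
The compression ratio in the table is a ratio of output to input, while H1 is phrased as a reduction, so the two are complements of each other. A minimal sketch of the arithmetic follows; the function names are illustrative, and token counts are assumed to come from whatever tokenizer the harness uses.

```python
def compression_ratio(liaison_tokens: int, raw_tokens: int) -> float:
    """Liaison output tokens divided by raw subteam output tokens."""
    if raw_tokens <= 0:
        raise ValueError("raw_tokens must be positive")
    return liaison_tokens / raw_tokens


def token_reduction(liaison_tokens: int, raw_tokens: int) -> float:
    """Fraction of tokens removed at the boundary: 1 - compression ratio."""
    return 1.0 - compression_ratio(liaison_tokens, raw_tokens)
```

For example, a 3,000-token summary of 12,000 tokens of raw output has a compression ratio of 0.25, which is a 75% reduction and falls inside the 60-80% range stated in H1.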

Information Tagging Protocol

Before running the experiment, a human annotator reads each subteam's raw output and tags each span as one of:
- Decision-critical: the upper team cannot make a correct decision without this
- Decision-relevant: informs the decision but is not strictly necessary
- Implementation detail: irrelevant to the upper team's strategic role

Information preservation is measured against the decision-critical and decision-relevant tags.
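
One way to compute preservation separately per tag is sketched below. The `TaggedFact` record and the tag strings are hypothetical, and the `preserved` flag is assumed to be a human judgment of whether the fact appears in the liaison summary, not an automated string match.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tag labels mirroring the three categories above.
CRITICAL = "decision-critical"
RELEVANT = "decision-relevant"
DETAIL = "implementation-detail"


@dataclass(frozen=True)
class TaggedFact:
    fact_id: str
    tag: str
    preserved: bool  # assumed human judgment: present in the liaison summary?


def preservation_rate(facts: list, tag: str) -> Optional[float]:
    """Fraction of facts with the given tag that survived compression.

    Returns None when no fact carries the tag, so an empty category
    is not mistaken for 0% or 100% preservation.
    """
    tagged = [f for f in facts if f.tag == tag]
    if not tagged:
        return None
    return sum(f.preserved for f in tagged) / len(tagged)
```

Reporting the decision-critical and decision-relevant rates separately keeps a high relevant-fact rate from masking a dropped critical fact.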

Analysis Plan

Compression ratio distribution across tasks
Information preservation rate: decision-critical vs. decision-relevant (critical should be near 100%; relevant may be lower)
Decision quality comparison across conditions (Friedman test across the three conditions, with tasks as blocks)
Context efficiency: decision quality per context token consumed
Qualitative analysis of information loss cases — what gets dropped and when does it matter?
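
The Friedman comparison can be sketched in plain Python as below. This is an illustration under stated assumptions, not the harness's analysis code: each task contributes one rating per condition, the p-value uses the chi-square approximation with 2 degrees of freedom (whose survival function is exactly exp(-Q/2)), and no tie correction is applied. With only ten tasks the approximation is rough, and with 1-5 ratings ties are likely, so a statistics library with tie handling may be preferable in practice.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_three_conditions(ratings):
    """Friedman statistic Q and approximate p-value for three conditions.

    ratings: one (liaison, full_context, abstract_only) tuple per task.
    Assumes no tie correction, which makes Q conservative when ties occur.
    """
    n, k = len(ratings), 3
    if n == 0 or any(len(row) != k for row in ratings):
        raise ValueError("need at least one task and exactly 3 ratings per task")
    rank_sums = [0.0] * k
    for row in ratings:
        for j, r in enumerate(average_ranks(list(row))):
            rank_sums[j] += r
    q = 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)
    p = math.exp(-q / 2.0)  # chi-square survival function for 2 degrees of freedom
    return q, p
```

Ranking within each task removes differences in task difficulty, which is why the test suits a design in which every task is run under all three conditions.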

Results

Experiment not yet run.

Expected Findings

Compression: 70-85% token reduction, i.e. a compression ratio of 0.15-0.30 as defined under Measurements. Subteam output contains extensive implementation detail (code diffs, tool invocations, internal deliberation) that liaisons are expected to discard correctly.
Information preservation: > 95% for decision-critical facts, 70-85% for decision-relevant facts. Liaisons occasionally drop nuances that turn out to matter.
Decision quality: Liaison-compressed and full-context conditions are expected to be comparable. Full context may actually hurt if information overload degrades the upper team's reasoning on complex coordination tasks.
Abstract-only baseline: Measurably worse decisions, establishing that liaisons add value beyond simple status reporting.

Threats to Validity

Information tagging subjectivity. What counts as "decision-critical" is partly determined by the decision the upper team makes, which varies by condition. Mitigation: tag before running experiments; use two independent taggers.
Liaison quality variance. Different liaison agents may compress differently. The experiment tests the pattern, not any specific liaison's skill.
Full context overload. If the full-context condition exceeds the upper team's context window, the comparison is unfair. Mitigation: truncate full context at the same window limit, noting when this happens.
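
The truncation mitigation could be recorded as in the sketch below. The names are hypothetical, and keeping the head of the output is an assumption; the design does not say which end to truncate, and keeping the tail may be more appropriate if conclusions appear last.

```python
from dataclasses import dataclass


@dataclass
class BoundedContext:
    tokens: list     # tokens actually shown to the upper team
    truncated: bool  # True when the raw output exceeded the window limit
    dropped: int     # number of tokens cut, for reporting


def bound_full_context(tokens: list, window_limit: int) -> BoundedContext:
    """Truncate raw subteam output to the window limit and note when it happens."""
    if window_limit < 0:
        raise ValueError("window_limit must be non-negative")
    kept = tokens[:window_limit]  # assumption: keep the head of the output
    return BoundedContext(
        tokens=kept,
        truncated=len(tokens) > window_limit,
        dropped=len(tokens) - len(kept),
    )
```

Carrying the `truncated` flag with each full-context run makes it possible to analyze truncated runs separately or exclude them.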