
Our Research

Five complementary frameworks addressing different dimensions of rigorous AI evaluation. Each is published or in submission and designed for implementation in regulated environments.


CGAE

Comprehension-Gated Agent Economy

A provably safe framework that gates agent capabilities behind comprehension requirements. Deployed via smart contracts on Filecoin Calibnet.

Key Metrics

κ = 0.95 (bounded exposure theorem)
κ = 0.92 (incentive compatibility)
Monotonic safety scaling with comprehension gates
Published
arXiv →
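The gating idea can be sketched in a few lines. This is a minimal illustration, assuming a scalar comprehension score in [0, 1]; the capability names and thresholds are made up for the example and are not the deployed Calibnet contract interface.

```python
from dataclasses import dataclass

@dataclass
class CapabilityGate:
    capability: str
    min_comprehension: float  # score in [0, 1] required to unlock

    def permits(self, score: float) -> bool:
        # A capability is exercisable only when the agent's measured
        # comprehension clears this gate's threshold.
        return score >= self.min_comprehension

# Illustrative gates: riskier capabilities demand higher comprehension.
GATES = [
    CapabilityGate("read_state", 0.50),
    CapabilityGate("sign_message", 0.75),
    CapabilityGate("transfer_funds", 0.90),
]

def allowed_capabilities(score: float) -> list[str]:
    return [g.capability for g in GATES if g.permits(score)]
```

Tightening a gate can only shrink the set of permitted capabilities, which is the intuition behind the monotonic safety scaling claimed above.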
CDCT

Compression-Decay Comprehension Test

Measures whether AI systems truly understand language by testing comprehension at increasing compression ratios. A U-shaped degradation curve distinguishes authentic understanding from pattern matching.

Key Metrics

κ = 0.90 (inter-rater agreement)
U-shaped degradation at ~27 words
RLHF ablation: ~600% compliance improvement
Published
arXiv →
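A minimal sketch of the compression sweep: score the same comprehension question at shrinking word budgets and look for the dip. Naive truncation stands in for the paper's actual compression pipeline, and `score_fn` is a caller-supplied stand-in for the model-grading step.

```python
def compress(text: str, word_budget: int) -> str:
    # Naive word-level truncation; a stand-in for real compression.
    return " ".join(text.split()[:word_budget])

def sweep(text: str, question: str, score_fn,
          budgets=(64, 48, 32, 27, 16, 8)) -> dict[int, float]:
    # Score comprehension of the same question at each word budget.
    # CDCT reports a U-shaped dip in such curves around ~27 words.
    return {b: score_fn(compress(text, b), question) for b in budgets}
```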
DDFT

Drill-Down and Fabricate Test

Evaluates AI honesty under probing. A three-judge jury system detects when models fabricate information under challenging follow-up questions.

Key Metrics

κ = 0.82 (FAR agreement)
κ = 0.79 (SAS agreement)
Multi-metric evaluation: CI, SAS, FAR, HOC
Published
arXiv →
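The three-judge vote can be sketched as a simple majority rule; the vote labels and the no-consensus fallback are illustrative assumptions, not the paper's exact scoring protocol.

```python
from collections import Counter

def jury_verdict(votes: list[str]) -> str:
    # Majority verdict across the judges; each vote is a label such
    # as "honest" or "fabricated". With three judges, any agreeing
    # pair decides; otherwise no consensus is reached.
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= 2 else "no_consensus"
```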
EECT

Ethical Emergence Comprehension Test

A jury system combining O3-Mini, Grok-4-Fast-Reasoning, and Qwen-3 evaluates whether AI systems reason ethically in ambiguous, real-world scenarios.

Key Metrics

Expected κ = 0.69–0.75
Three-model consensus approach
Novel scenario testing framework
Under Review
arXiv →
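The κ figures across these cards denote inter-rater agreement. For two raters, Cohen's kappa is the standard statistic; a self-contained sketch is below (the papers may use a multi-rater variant such as Fleiss' κ for the three-judge settings).

```python
def cohens_kappa(a: list, b: list) -> float:
    # Cohen's kappa: observed agreement between two raters,
    # corrected for the agreement they would reach by chance.
    n = len(a)
    labels = set(a) | set(b)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    p_chance = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (p_obs - p_chance) / (1 - p_chance)
```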
IHT

Intrinsic Hallucination Test / Comprehension-Gated Capability Growth

A position paper on gating capability growth by comprehension validation, proposing mechanisms for safely scaling AI agent autonomy in production systems.

Key Metrics

ICML 2026 submission
Novel gating architecture
Safety-capability alignment framework
Upcoming
arXiv →
