
Our Research

Five complementary frameworks addressing different dimensions of rigorous AI evaluation. Each is published or in submission and designed for implementation in regulated environments.


CGAE

Comprehension-Gated Agent Economy

A provably safe framework that gates agent capabilities behind comprehension requirements. Deployed via smart contracts on Filecoin Calibnet.

Key Metrics

κ = 0.95 (bounded exposure theorem)
κ = 0.92 (incentive compatibility)
Monotonic safety scaling with comprehension gates
Published
arXiv →
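The gating idea can be sketched in a few lines. This is a minimal illustration, assuming a scalar comprehension score in [0, 1]; the capability names and thresholds are made up for the example and are not the deployed Calibnet contract interface.

```python
from dataclasses import dataclass

@dataclass
class CapabilityGate:
    capability: str
    min_comprehension: float  # score in [0, 1] required to unlock

    def permits(self, score: float) -> bool:
        # A capability is exercisable only when the agent's measured
        # comprehension clears this gate's threshold.
        return score >= self.min_comprehension

# Illustrative gates: riskier capabilities demand higher comprehension.
GATES = [
    CapabilityGate("read_state", 0.50),
    CapabilityGate("sign_message", 0.75),
    CapabilityGate("transfer_funds", 0.90),
]

def allowed_capabilities(score: float) -> list[str]:
    return [g.capability for g in GATES if g.permits(score)]
```

Tightening a gate can only shrink the set of permitted capabilities, which is the intuition behind the monotonic safety scaling claimed above.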
CDCT

Compression-Decay Comprehension Test

Measures whether AI systems truly understand language by testing comprehension at increasing compression ratios. A U-shaped degradation curve distinguishes authentic understanding from pattern matching.

Key Metrics

κ = 0.90 (inter-rater agreement)
U-shaped degradation at ~27 words
RLHF ablation: ~600% compliance improvement
Published
arXiv →
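A minimal sketch of the compression sweep: score the same comprehension question at shrinking word budgets and look for the dip. Naive truncation stands in for the paper's actual compression pipeline, and `score_fn` is a caller-supplied stand-in for the model-grading step.

```python
def compress(text: str, word_budget: int) -> str:
    # Naive word-level truncation; a stand-in for real compression.
    return " ".join(text.split()[:word_budget])

def sweep(text: str, question: str, score_fn,
          budgets=(64, 48, 32, 27, 16, 8)) -> dict[int, float]:
    # Score comprehension of the same question at each word budget.
    # CDCT reports a U-shaped dip in such curves around ~27 words.
    return {b: score_fn(compress(text, b), question) for b in budgets}
```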
DDFT

Drill-Down and Fabricate Test

Evaluates AI honesty under probing. A three-judge jury system detects when models fabricate information under challenging follow-up questions.

Key Metrics

κ = 0.82 (FAR agreement)
κ = 0.79 (SAS agreement)
Multi-metric evaluation: CI, SAS, FAR, HOC
Published
arXiv →
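The three-judge vote can be sketched as a simple majority rule; the vote labels and the no-consensus fallback are illustrative assumptions, not the paper's exact scoring protocol.

```python
from collections import Counter

def jury_verdict(votes: list[str]) -> str:
    # Majority verdict across the judges; each vote is a label such
    # as "honest" or "fabricated". With three judges, any agreeing
    # pair decides; otherwise no consensus is reached.
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= 2 else "no_consensus"
```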
EECT

Ethical Emergence Comprehension Test

A jury system combining O3-Mini, Grok-4-Fast-Reasoning, and Qwen-3 evaluates whether AI systems reason ethically in ambiguous, real-world scenarios.

Key Metrics

Expected κ = 0.69–0.75
Three-model consensus approach
Novel scenario testing framework
Under Review
arXiv →
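The κ figures across these cards denote inter-rater agreement. For two raters, Cohen's kappa is the standard statistic; a self-contained sketch is below (the papers may use a multi-rater variant such as Fleiss' κ for the three-judge settings).

```python
def cohens_kappa(a: list, b: list) -> float:
    # Cohen's kappa: observed agreement between two raters,
    # corrected for the agreement they would reach by chance.
    n = len(a)
    labels = set(a) | set(b)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    p_chance = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (p_obs - p_chance) / (1 - p_chance)
```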
IHT

Intrinsic Hallucination Test / Comprehension-Gated Capability Growth

A position paper on gating capability growth by comprehension validation, proposing mechanisms for safely scaling AI agent autonomy in production systems.

Key Metrics

ICML 2026 submission
Novel gating architecture
Safety-capability alignment framework
Upcoming
arXiv →
