Rift: A Conflict Signature for Deception in Language Models

A result relevant to eliciting latent knowledge (ELK). A model that lies while knowing the truth is in a measurably different internal state than a model that is simply wrong. The difference is large, visible per example, and detectable without labels.

Core idea: control for wrongness using three conditions: honest (outputs the correct answer), deceptive (knows the truth, says the opposite), and naive liar (fine-tuned to be wrong, with no honest training). The key comparison is deceptive vs naive liar: both output the same wrong answer, so any difference between them tracks knowledge conflict, not incorrectness.
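The three-condition control can be sketched as two AUC comparisons over a per-example scalar score. This is a minimal illustration, not the paper's pipeline: the `scores` values are made-up toy numbers, and the idea that the signature reduces to one scalar per example is an assumption made here for clarity.

```python
from itertools import product


def auc(pos, neg):
    """Probability that a random `pos` score exceeds a random `neg` score.

    Ties count as half a win, which matches the usual ROC AUC definition.
    """
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical per-example scores, for illustration only (not data from the paper).
scores = {
    "honest": [0.10, 0.20, 0.15, 0.05],      # correct answer
    "deceptive": [0.90, 0.80, 0.85, 0.95],   # knows the truth, says the opposite
    "naive_liar": [0.20, 0.10, 0.25, 0.15],  # same wrong answer, no knowledge of the truth
}

# Key comparison: both groups emit the same wrong answer, so separation here
# cannot be explained by incorrectness alone.
key_auc = auc(scores["deceptive"], scores["naive_liar"])

# Secondary comparison: deceptive vs honest.
baseline_auc = auc(scores["deceptive"], scores["honest"])

print(f"deceptive vs naive liar AUC: {key_auc:.3f}")
print(f"deceptive vs honest AUC:     {baseline_auc:.3f}")
```

If the score only tracked wrongness, the deceptive and naive liar groups would be indistinguishable and `key_auc` would sit near 0.5.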

Paper (arXiv 2606.17229) · Code

Result	Value
Label-free lie identification	100% (GPT-2 small, 3 seeds + natural Qwen)
Length-controlled AUC (Qwen2.5-1.5B)	1.000, orientation 20/20, p ≈ 9.5e-7
Phi-3-mini AUC lie vs honest	1.000 (34/34, Wilcoxon p ≈ 6e-11)
Phi-3-mini AUC lie vs hallucination	1.000
Cross-lingual AUC (ru/zh/es/de, length-controlled)	1.000 in all 4 languages
Cross-family AUC (3 architectures)	0.933 mean, all 6 pairs p < 0.001
Strategic self-constructed deception	AUC 1.0, 24/24 Qwen-7B, 11/11 Phi-3
Concealment countermeasure	Fails — conceal-AUC 1.0, evasion 0/8
Necessarily-detectable (preliminary)	Signature resists 150-step LoRA suppression
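As a sanity check on the p-values in the table: when every one of n paired comparisons falls in the predicted direction, an exact one-sided sign test (and an exact one-sided Wilcoxon signed-rank test with no ties or zero differences) gives p = 2^-n. The table does not say which test variant was used, so treating the reported values as exact one-sided tests is an assumption. The helper name below is hypothetical.

```python
def one_sided_all_same_sign_p(n_pairs: int) -> float:
    """Exact one-sided p-value when all n pairs differ in the predicted direction.

    Under the null, each pair's sign is a fair coin flip, so the single most
    extreme outcome (all signs agree with the prediction) has probability 2**-n.
    """
    return 0.5 ** n_pairs


print(f"20/20 pairs: p = {one_sided_all_same_sign_p(20):.2e}")  # 9.54e-07
print(f"34/34 pairs: p = {one_sided_all_same_sign_p(34):.2e}")  # 5.82e-11
```

These values agree with the reported p ≈ 9.5e-7 for orientation 20/20 and p ≈ 6e-11 for 34/34. With perfect separation the p-value is therefore fixed by the number of pairs alone.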

Scatter: honest rank vs lie rank (length-controlled)

Citation: Nyoma, P. (2026). RIFT: A Conflict Signature for Deception in Language Models. arXiv:2606.17229. Harmonic Labs.