1. Introduction
Reader guide. This paper introduces a new way to think about sentence comprehension, not as predicting the next word, but as retrieving meaning over time. Most current models measure how surprising a word is, assuming all readers process it in the same way. ODER shows that is not the case: comprehension speed and difficulty depend on each reader’s attention, memory, and background knowledge. Using a controlled test language, we measure how different kinds of readers recover meaning, and then test the same model on English sentences without changing the parameters. The results match known points where understanding tends to break down and predict when and why this happens for different readers.
Comprehension failure is not prediction error; it is delayed access to retrievable meaning. Traditional entropy approaches quantify linguistic uncertainty without considering observer-specific processing differences. Evidence from neurolinguistics and computational cognition shows that interpretive effort, and therefore uncertainty resolution, depends on an observer’s attentional state, working-memory capacity, and contextual familiarity [15,26]. Earlier entropy-based models conflate linguistic probability with observer cost, obscuring how processing difficulty emerges from observer-bound retrieval delays. ODER extends this literature by
defining entropy retrieval as a joint function of hierarchical syntactic complexity and information-transfer efficiency;
mapping these constructs to measurable cognitive signatures in EEG, fMRI, and pupillometry;
providing a replicable benchmarking framework that reports fit quality (R²), collapse timing (τ_res), and stress flags for each observer class.
Crucially, ODER reframes comprehension as entropy retrieval in the observer, not entropy reduction in the signal. This distinction explains not only what is complex but also how and when different observers experience that complexity. Clarification. We emphasize that ODER is not a language model or parser; it is a meta-framework describing how observers retrieve entropy from linguistic input.
1.1. Contributions
Accordingly, ODER functions as a formal meta-framework that parameterizes observer-specific comprehension, enabling structured comparison across linguistic theories rather than competing with predictive models.
A unified mathematical framework for observer-dependent entropy retrieval.
Explicit retrieval (Equation A3) and transition (Equation A4) functions that parameterize attention, working memory, and prior knowledge.
A contextual-gradient operator that captures reanalysis, for example, garden-path phenomena, in dynamic observer-dependent terms.
A benchmarking protocol that compares ODER with existing cognitive models and raises stress flags when the retrieval trajectory S(t) flattens or diverges.
A demonstration that quantum-formalism constructs can model ambiguity and interference without implying literal quantum computation in the brain.
1.2. Relationship to Existing Models
ODER does not compete with current linguistic models solely on predictive accuracy; instead, it addresses a core explanatory gap: when and why processing difficulty arises for specific observers. Rather than replacing these approaches, ODER serves as a meta-framework that clarifies this observer-specific variation.
We define lawful divergence as a systematic, interpretable model–trace mismatch (flagged by the shape_mismatch criterion and a nonzero collapse-point timing error Δτ) that aligns with known construction-specific bottlenecks without parameter re-fitting.
1.2.1. The ODER Innovation: A Conceptual Map
Consider the classic garden-path sentence, “The horse raced past the barn fell.” Empirical work shows expert versus novice divergence in processing difficulty [5,8]. Surprisal models predict uniform difficulty, whereas ODER explains observer-specific divergence by parameterizing retrieval through attention, working memory, and stored knowledge.
1.3. Theoretical Positioning of ODER
Table 1. Theoretical positioning of ODER relative to leading approaches.
| Approach | Primary Focus | Treatment of Observer | Key Limitations |
| --- | --- | --- | --- |
| Surprisal Models | Input statistics and probability | Uniform processor with idealized capacity | Cannot explain individual differences in processing difficulty |
| Resource-Rational | Bounded rationality and capacity limits | Variable capacity, uniform mechanisms | Lack explicit reanalysis mechanisms; treat processing as passive |
| ACT-R Parsing | Procedural memory retrieval | Slot-limited buffer with decay | Prediction and retrieval treated separately; no coherence term |
| Hierarchical Prediction-Error | Multi-level expectation tracking | Implicit observer; scalar precision weights | No explicit collapse point or observer parameters |
| Optimal Parsing | Strategy selection | Uniform processor with idealized strategies | Cannot explain observer-specific strategy choices |
| ODER (this model) | Observer-relative entropy retrieval (not generative modeling) | Parameterized by attention, memory, and knowledge | Designed partial-fit; requires empirical calibration of observer parameters |
By casting comprehension as active, observer-relative retrieval, ODER unifies phenomena such as garden-path reanalysis, working-memory constraints, and expertise effects under a single, testable framework.
While ODER is not a neural model, its observer parameters map to biologically interpretable constraints. Attention (α), working-memory capacity (β), and semantic interference (γ) correspond to measurable features of cognitive processing, including reaction-time profiles, ERP components (e.g., N400/P600), and frontal theta dynamics. This alignment supports ODER’s testability within neurocognitive systems without invoking mechanistic claims beyond retrieval-level modeling.
1.4. Forward Retrieval Law and Inverse Decoder
Equation (1) (Forward retrieval law):
dS(t)/dt = κ (E_max − S(t)) tanh(t/τ).
The hyperbolic tangent ensures symmetric convergence behavior and enables analytic inversion under bounded modular-flow assumptions (see Appendix A). Convergence is expected to plateau near 30–40% under the constant-κ assumption (cf. Appendix A for ceiling behavior); allowing κ to vary piecewise raises this ceiling (see Discussion, Section 5), but we retain this parsimonious form to preserve sharp falsifiability of the baseline hypothesis. In plain terms, Equation (1) states that retrieval speed depends both on how much meaning is still left to recover and on how far along the observer is in the comprehension process.
Equation (2) (Inverse decoder): κ_eff(t) = (dS/dt) / [(E_max − S(t)) tanh(t/τ)]. This inverse form recovers the effective retrieval-rate profile from an observed entropy trajectory. In practice it allows us to estimate how retrieval dynamics vary over time without assuming a fixed κ, making it possible to test whether deviations from the constant-rate law align with known comprehension bottlenecks. The inverse decoder therefore functions as a diagnostic tool: when applied to empirical traces, it reveals where the forward law is under strain and highlights candidate loci of lawful divergence.
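To make the decoder concrete, here is a minimal sketch that recovers an effective retrieval-rate profile κ_eff(t) from a sampled trace by numerically inverting the forward law, assuming the reconstructed form of Equation (1); the function names and parameter values are illustrative, not the archived implementation.

```python
import numpy as np
from scipy.integrate import solve_ivp

E_MAX = 1.0  # maximum retrievable entropy (E_max = 1 in all simulations)

def forward_law(t, S, kappa, tau):
    """Forward retrieval law (Eq. 1): dS/dt = kappa * (E_max - S) * tanh(t / tau)."""
    return kappa * (E_MAX - S) * np.tanh(t / tau)

def inverse_decoder(t, S, tau):
    """Recover kappa_eff(t) from an observed trace by inverting Eq. (1).

    kappa_eff(t) = S'(t) / ((E_max - S(t)) * tanh(t / tau)), with near-zero
    denominators masked to avoid blow-ups at onset and saturation.
    """
    dS = np.gradient(S, t)                # numerical derivative of the trace
    denom = (E_MAX - S) * np.tanh(t / tau)
    kappa_eff = np.full_like(S, np.nan)
    ok = np.abs(denom) > 1e-6             # mask ill-conditioned samples
    kappa_eff[ok] = dS[ok] / denom[ok]
    return kappa_eff

# Synthetic check: decode a trace generated by the forward law itself.
kappa_true, tau = 0.8, 0.4
t = np.linspace(0.01, 4.0, 200)
sol = solve_ivp(forward_law, (t[0], t[-1]), [0.0], t_eval=t, args=(kappa_true, tau))
kappa_hat = inverse_decoder(t, sol.y[0], tau)
print(f"median kappa_eff = {np.nanmedian(kappa_hat):.3f} (true {kappa_true})")
```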
1.5. Implementation Algorithm
Algorithm 1 ODER Entropy Retrieval
Require: sentence S, observer parameters (α, β, γ)
Ensure: observer-dependent entropy S(t)
1: S ← 0; initialize ρ with Equation (A2)
2: for each word w in S do
3:   L_hier ← hierarchical depth of w
4:   η ← information-transfer efficiency of w
5:   ∇c ← contextual gradient at w
6:   S ← S + ΔS(L_hier, η, ∇c) ▹ Equation (A3)
7:   ρ ← N(U ρ U†) ▹ Equation (A4)
8: end for
9: return S(t)
In practical terms, the algorithm tracks how each word in a sentence contributes to the observer’s cumulative retrieval of meaning, updating both the entropy value and the internal representation of possible interpretations at every step.
Python prototypes rely on NumPy/SciPy, QuTiP, and spaCy. Optimization used SciPy’s L-BFGS-B with a fixed function tolerance and three random-seed restarts per trace. A Monte Carlo identifiability sweep for the fitted parameters appears in Appendix A (Table A1).
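As a sketch of this fitting loop (the exact tolerance and bounds live in the archived notebooks; the values below are illustrative), the following estimates (κ, τ) for one synthetic trace with L-BFGS-B and three random restarts:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

def simulate(t, kappa, tau, e_max=1.0):
    """Integrate the forward retrieval law dS/dt = kappa*(E_max - S)*tanh(t/tau)."""
    rhs = lambda ti, S: kappa * (e_max - S) * np.tanh(ti / tau)
    sol = solve_ivp(rhs, (t[0], t[-1]), [0.0], t_eval=t)
    return sol.y[0]

def fit_trace(t, s_obs, n_restarts=3, seed=0):
    """Fit (kappa, tau) by bounded least squares with random restarts."""
    rng = np.random.default_rng(seed)
    sse = lambda p: float(np.sum((simulate(t, *p) - s_obs) ** 2))
    best = None
    for _ in range(n_restarts):
        x0 = rng.uniform([0.1, 0.1], [2.0, 1.0])      # random start inside bounds
        res = minimize(sse, x0, method="L-BFGS-B",
                       bounds=[(0.05, 5.0), (0.1, 2.0)],  # illustrative bounds
                       options={"ftol": 1e-9})
        if best is None or res.fun < best.fun:
            best = res
    return best

t = np.linspace(0.01, 4.0, 50)
s_obs = simulate(t, 0.9, 0.3) + np.random.default_rng(1).normal(0, 0.01, t.size)
res = fit_trace(t, s_obs)
print("kappa_hat, tau_hat =", np.round(res.x, 3), "SSE =", round(res.fun, 5))
```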
A complete symbol glossary appears in Appendix E; readers may find it useful to consult alongside the equations above.
2. Benchmarking Methodology
This section details how ODER’s parameters are estimated, how competing models are evaluated, and how each metric links model traces to behavioral and neurophysiological data through cross-validation, stress-flag analysis, and multimodal validation. All analysis code and notebooks (Zenodo DOI; GitHub mirror) reproduce every figure and table in this paper; see the Data Availability section for links.
As summarized in Table 2, the benchmarking protocol evaluates both fit quality and neurocognitive interpretability.
2.1. Comparative Metrics
Entropy-reduction rate (ERR) – First-derivative slope of S(t); hypothesized to scale with the N400 slope in centro-parietal EEG.
Retrieval-collapse point (τ_res) – Time at which dS/dt enters a 95% confidence band around zero; anchors the onset of P600 activity and post-disambiguation fixation drops.
Model-to-trace fit (R², AIC, BIC) – Overall goodness-of-fit and parsimony; higher R² predicts tighter coupling between simulated and observed P600 latency.
Observer-class divergence (Δκ) – Cohen’s d for κ between O1 and O3; relates to between-group differences in frontal-theta power (high vs. low working memory).
Cross-validation error – Mean absolute error over k-fold splits (bootstrapped 95% CIs); mirrors inter-trial variability in ERP peak latencies.
Reanalysis latency – Reaction-time variance in garden-path tasks; behavioral proxy for ∇c spikes.
Pupillometric load – Peak dilation normalized by baseline; tracks integrated working-memory demand (β).
Eye-movement patterns – Fixation count and regression length during disambiguation; fine-grained correlate of local ERR fluctuations.
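A minimal sketch of how the first two metrics can be computed from a sampled trace; the band width and run length used to detect the collapse point are illustrative assumptions, not the calibrated settings.

```python
import numpy as np

def err_profile(t, S):
    """Entropy-reduction rate: first derivative of the retrieval trace S(t)."""
    return np.gradient(S, t)

def collapse_point(t, S, band=0.02, run=5):
    """Return tau_res: first time dS/dt enters and stays within a near-zero band.

    `band` approximates the confidence band around zero slope; `run` is the
    number of consecutive samples required inside it (both illustrative).
    """
    slope = err_profile(t, S)
    inside = np.abs(slope) < band
    for i in range(len(t) - run + 1):
        if inside[i:i + run].all():
            return t[i]
    return None  # trace never stabilizes -> candidate stress flag

t = np.linspace(0.01, 4.0, 200)
S = 1.0 - np.exp(-1.5 * t)          # toy saturating trace
print("tau_res ~", collapse_point(t, S))
```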
The three baseline models used for AIC comparison were a linear growth model, a single-parameter exponential decay model, and a two-parameter power-law model, each fit by bounded nonlinear least squares with the same optimizer settings as ODER; full implementation details are provided in the Data Availability section.
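A hedged sketch of that comparison, using a standard Gaussian AIC (n·log(SSE/n) + 2k); the functional forms match the description above, while starting values and bounds are illustrative rather than the archived settings.

```python
import numpy as np
from scipy.optimize import curve_fit

def linear(t, a, b):    return a * t + b                       # two parameters
def exp_decay(t, k):    return 1.0 - np.exp(-k * t)            # one parameter
def power_law(t, a, p): return np.clip(a * t ** p, 0.0, 1.0)   # two parameters

def aic(y, yhat, n_params):
    """Gaussian AIC: n * log(SSE / n) + 2k."""
    n = len(y)
    sse = np.sum((y - yhat) ** 2)
    return n * np.log(sse / n) + 2 * n_params

t = np.linspace(0.05, 4.0, 60)
y = 1.0 - np.exp(-1.2 * t) + np.random.default_rng(0).normal(0, 0.02, t.size)

for name, f, p0, bounds in [("linear", linear,    [0.2, 0.0], ([-np.inf] * 2, [np.inf] * 2)),
                            ("exp",    exp_decay, [1.0],      (0, np.inf)),
                            ("power",  power_law, [0.5, 0.5], (0, np.inf))]:
    popt, _ = curve_fit(f, t, y, p0=p0, bounds=bounds, maxfev=10000)
    print(f"{name:6s} AIC = {aic(y, f(t, *popt), len(popt)):7.1f}")
```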
2.2. Protocol
Compute baseline entropy with Equation (A1) for all Aurian stimuli.
Run ODER retrieval dynamics using the retrieval kernel (Equation (A3)), the state transition (Equation (A4)), and the forward retrieval law (Equation (1)); log stress flags whenever R² falls below threshold or the fitted τ collapses to its lower bound (0.1 s).
2.3. Neurophysiological Correlates
Collapse-point alignment in ODER rests on well-established ERP latencies. Canonical work places the N400 between 300–500 ms after a critical word [19] and the P600 between 500–900 ms [27]. Each window is anchored at the observer-specific collapse time τ_res (see Section 4):
Contextual-gradient spikes (∇c) predict P600 amplitude in the window τ_res + 500–900 ms [27].
Information-transfer efficiency (η) predicts N400 magnitude in the window τ_res + 300–500 ms [19].
Working-memory load (β) is expected to modulate frontal-midline theta (4–7 Hz) across the same post-collapse interval, consistent with memory-maintenance accounts of theta power [3].
These mappings operationalize how ERR aligns with N400 slope, how ∇c modulates P600 amplitude, and how τ_res anchors both windows, thereby linking the metrics in Table 2 to observable brain dynamics (boundary-case discussion in Section 5).
2.4. Distinguishing Retrieval Failure from Prediction Failure
Prediction failure occurs when the parser misanticipates input, yielding high surprisal and early N400 peaks. Retrieval failure arises when an observer cannot integrate available information, even if predictions were correct, leading to prolonged P600 activity, a plateau in pupil dilation, and nonlinear error growth in comprehension probes.
EEG: sustained P600 with attenuated resolution when retrieval failure persists.
Pupillometry: dilation plateau in low-capacity observers.
Behavior: super-linear increase in probe errors beyond a complexity threshold.
3. Empirical Calibration
This section explains how ODER’s latent parameters are estimated from behavioral and neurophysiological data: specifically, reaction-time variance, ERP desynchronization, and τ_res-aligned EEG or pupillometry epochs.
3.1. Aurian as an Initial Testbed
Aurian is a constructed language that allows explicit control over syntactic complexity (L_hier) and informational load (η) [6]. Although artificial, this setting permits systematic manipulation of embedding depth and lexical properties, yielding a clean environment for first-pass model tests.
Note on Aurian Scope. Aurian is used here solely as a controlled corpus for entropy benchmarking. A more advanced version, including observer-conditioned compression, ambiguity-preserving syntax, and symbolic scaffolds, is developed separately in [6]. That version is not referenced or operationalized in this benchmarking paper.
3.1.1. Aurian Grammar Specification
Lexicon (each item carries a fixed L_hier increment)
kem (subject pronoun)
vora (simple verb)
sul (complementizer)
daz (embedding verb)
fel (object noun)
ren (modifier)
tir (determiner)
mek (conjunction)
poli (adverb)
zul (negation)
Illustrative sentences
Low entropy (cumulative L_hier = 2): Kem vora fel (“He/She sees the object”)
Medium entropy (cumulative L_hier = 3): Kem vora fel ren (“He/She sees the object quickly”)
High entropy (cumulative L_hier = 7): Kem daz sul tir fel vora (“He/She thinks that the object falls”)
Very high entropy (cumulative L_hier = 11): Kem daz sul tir fel sul ren vora poli zul (“He/She thinks that the object that quickly falls does not move”)
Table 3. Cumulative L_hier across Aurian sentence classes.
| Sentence class | Tokens | Cumulative L_hier |
| --- | --- | --- |
| Low | 3 | 2 |
| Medium | 4 | 3 |
| High | 6 | 7 |
| Very High | 9 | 11 |
3.1.2. Clarifying the Metric
Each rule or lexical item contributes a fixed increment to L_hier. Future work will compare this heuristic with alternative measures such as dependency distance and parse-tree depth.
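To illustrate the heuristic, the sketch below assigns hypothetical per-item increments chosen only so that the cumulative sums reproduce Table 3; the calibrated increments in the benchmarking notebook may differ.

```python
# Hypothetical per-item L_hier increments, chosen only so the cumulative
# sums match Table 3; the actual calibrated increments may differ.
LHIER_INCREMENTS = {
    "kem": 0, "vora": 1, "fel": 1, "ren": 1, "tir": 1,
    "daz": 2, "sul": 2, "mek": 1, "poli": 0, "zul": 1,
}

def cumulative_lhier(sentence: str) -> int:
    """Sum the fixed increment contributed by each token in an Aurian sentence."""
    return sum(LHIER_INCREMENTS[tok] for tok in sentence.lower().split())

for s in ["Kem vora fel",
          "Kem vora fel ren",
          "Kem daz sul tir fel vora",
          "Kem daz sul tir fel sul ren vora poli zul"]:
    print(f"{s!r:50s} L_hier = {cumulative_lhier(s)}")  # -> 2, 3, 7, 11
```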
Ecological Rationale
Although Aurian is synthetic, it functions as a minimal-pair generator that isolates structural factors while holding lexical semantics constant. This controlled starting point is a necessary bridge to natural corpora such as Natural Stories [10] and Dundee [17], where real-world noise and contextual variation are much higher.
3.2. Confidence, Sensitivity, and Parameter Variance
Report 95% confidence intervals for and , estimated from n-back and reading-span tasks.
Run sensitivity sweeps; log a stress flag when shifts by more than 50 ms.
Calibration of
follows the inverse-decoder procedure defined in Equation (
2). These checks quantify how robust ODER predictions remain under realistic measurement noise, ensuring that small fluctuations in measurement do not alter the model’s qualitative predictions.
4. Results
We evaluated ODER on sixteen trace–observer pairs, eight sentences crossed with two observer classes (O1: high context, O3: low context), using the constant-κ retrieval law (Equation 1) with a lower bound of 0.1 s on τ. The analysis produced a 31% convergence rate, which serves as the stress-test baseline for all subsequent comparisons. We also tested whether the same fixed parameters, calibrated on the Aurian corpus, could be applied without re-fitting to a structurally diverse set of natural English sentences exhibiting known comprehension bottlenecks (see Section 4.2).
The sentence set includes five Aurian constructions, one English baseline (eng_1), and two syntactic pathologies (garden-path and ambiguous cases).
4.1. Model–Fit Quality
Successful fits: 5 of 16 (31.2%)
Mean R² (successful): 0.76
In every convergent case ODER’s AIC was at least two points lower than the best baseline (linear, exponential, or power-law).
Table 4. Convergent ODER fits. ΔAIC = AIC_ODER minus the best competing baseline (negative favors ODER). CIs are bootstrap estimates (1,000 resamples).
| Sentence | Observer | R² | κ CI | τ CI | ΔAIC |
| --- | --- | --- | --- | --- | --- |
| eng_1 | O1 | 0.871 | | | |
| eng_1 | O3 | 0.709 | | | |
| aur_1 | O1 | 0.810 | | | |
| aur_complex_1 | O1 | 0.759 | | | |
| aur_complex_2 | O1 | 0.661 | | | |
Across these fits O1 retrieved entropy slightly faster than O3, a small but consistent difference in fitted κ (Cohen’s d reported with the bootstrap analysis in Section 5.2). The absolute difference is modest yet directionally consistent with the contextual-richness hypothesis. All convergent traces pegged τ at the lower bound (0.1 s), a boundary effect revisited in Section 5. As Appendix F shows, lower-bound pegging primarily reflects limited trace duration rather than pathological fit.
4.2. Empirical Generalization Check
To test whether the retrieval law calibrated on synthetic Aurian data extends to natural-language comprehension, we evaluated ODER on twelve English sentences drawn from open corpora that exemplify structural ambiguity (center-embedding, garden-path constructions, coordination-scope shifts). Using the fixed (κ, τ) values obtained from the synthetic-trace fit, we generated retrieval trajectories and compared their predicted collapse points against empirical traces derived from self-paced reading times. Because κ and τ are fixed from Aurian calibration, the natural-sentence results evaluate ecological validity rather than re-fitted accuracy, with lawful divergence highlighting construction-specific departures.
Timing accuracy. ODER predicted collapse within ±2 s for 9 of the 12 sentences. Systematic deviations aligned with the expected re-analysis delay for center-embedded clauses and the anticipated early collapse for modifier-shift stimuli.
Shape fidelity. The median root-mean-square error across full trajectories was ≈0.46, with the largest residuals occurring at re-analysis junctures, which is precisely where the canonical retrieval law is not expected to hold.
Figure 1. Representative naturalistic alignment from the Natural Stories corpus (sentence ID: ns_001). The empirical cumulative retrieval trace (blue, solid) is derived from normalized reading-time–based entropy estimates; the ODER prediction (orange, dashed) uses κ and τ fixed from synthetic Aurian calibration. The dotted horizontal line marks the collapse threshold (0.95 E_max). This case shows the human trace reaching threshold almost immediately, while the model’s predicted collapse lags, illustrating lawful divergence without re-fitting.
These outcomes demonstrate that ODER’s canonical form generalizes to noisy human data without re-fitting and, crucially, surfaces lawful divergence patterns that motivate future adaptations.
A complete trace-level validation set, including sentence metadata, deviations, RMSE histograms, and shape-mismatch flags, is provided in the companion notebook "ODER Natural Stories Validation."
4.3. Parameter–Sensitivity Analysis
To confirm that the 31% convergence rate is not a tuning artifact, we swept each key parameter around its best-fit value on every trace–observer pair (200 grid points per trace). The sweep altered the success count by at most one fit in either direction (range: 4–6 of 16), with only small mean R² changes. Median shifts in κ and τ_res were both well within the bootstrap CIs reported above. Full heat maps appear in Appendix C.
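The sweep logic reduces to a small loop: perturb one parameter on a grid around its best-fit value, re-score, and count how often the fit stays convergent. The toy model, grid width, and convergence criterion below are illustrative stand-ins, not the archived settings.

```python
import numpy as np

def r_squared(y, yhat):
    """Coefficient of determination for a fitted trace."""
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def sweep(t, y, model, best, idx, rel=0.2, n=200, r2_min=0.5):
    """Perturb parameter `idx` by +/-rel around its best-fit value on an n-point
    grid and report the fraction of the grid that stays convergent (R^2 >= r2_min)."""
    grid = best[idx] * np.linspace(1 - rel, 1 + rel, n)
    ok = 0
    for v in grid:
        p = best.copy()
        p[idx] = v
        ok += r_squared(y, model(t, *p)) >= r2_min
    return ok / n

# Toy saturating stand-in for the retrieval law; both parameters matter.
model = lambda t, k, tau: 1 - np.exp(-k * t * np.tanh(t / tau))
t = np.linspace(0.05, 4.0, 80)
best = np.array([1.2, 0.3])
y = model(t, *best) + np.random.default_rng(2).normal(0, 0.02, t.size)

for i, name in enumerate(["kappa", "tau"]):
    print(f"{name}: {100 * sweep(t, y, model, best, i):.0f}% of the grid remains convergent")
```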
4.4. Interpreting the 31% Convergence Rate
A 31% success rate may appear low; however, in a falsifiable framework it is a feature, not a flaw. ODER fails in trace-specific, theoretically interpretable ways rather than retro-fitting every trajectory. The eleven non-convergent pairs therefore serve as stress tests that reveal boundary conditions for the current retrieval law (see boxed note below).
Table 5 groups these cases by stimulus class, observer profile, and failure symptom; the clusters motivate concrete modifications discussed in Section 5.
Why a 31% Convergence Rate Is a Feature, Not a Flaw. Overfitting all traces would render the model unfalsifiable. Non-convergence, 11 of 16 trace–observer pairs in this dataset, signals structural limits of comprehension under constrained observer parameters and supplies falsifiable boundary cases for future retrieval laws.
4.5. Failure Taxonomy
The failures cluster in two stimulus classes:
Garden-path sentences (gpath_1, gpath_2): Non-monotonic retrieval spikes violate the sigmoidal assumption; κ and τ become non-identifiable.
Flat-anomaly or highly ambiguous items (flat_1, ambig_1): Sustained high γ and negligible ∇c flatten the trace, leading to under-fit (R² ≈ 0).
Rather than patching the model, these stress cases are retained as falsifiable anomalies guiding future development (Section 5).
Table 6. Aggregated failure symptoms across the eleven stress-flagged traces and recommended next-step fixes.
| Symptom | Frequency | Provisional Remedy |
| --- | --- | --- |
| τ pegging | 6/11 | Extending trace length; adding hierarchical priors |
| AIC shortfall (ΔAIC > 0) | 3/11 | Using adaptive learning rates in the optimizer |
| κ inversion (O3 > O1) | 2/11 | Testing a mixed-effects retrieval law |
4.6. Sentence–Level Retrieval Dynamics
Observer separations were largest for aur_complex_1, aur_complex_2, and the lexical-ambiguity item ambig_1 (full plots in Appendix C). Interpretive collapse points (ICP; token index) differed by 1–2 positions, shifting predicted ERP windows by ≈400 ms.
4.7. Representative Trace Comparison
Figure 2. Low-complexity English sentence (eng_1) — O1 observer. Blue points: observed retrieval. Orange dashed line: ODER fit. Shaded bands: predicted N400 (purple) and P600 (red) windows aligned to τ_res.
Figure 3. Same sentence, O3 observer. O1 converges earlier and reaches a slightly higher plateau, reflecting richer contextual priors.
4.8. Self-Audit Note
Trace “The old man the boats.” attains a convergent fit yet inverts theoretical expectations (κ_O3 > κ_O1). It is therefore flagged as an informative boundary condition for the next-generation retrieval law (piecewise κ(t)).
4.9. Predictive Outlook
We predict that real-time EEG recorded on the eleven failure items will exhibit a prolonged N400–P600 overlap, an electrophysiological signature of unresolved retrieval competition. Observing (or not observing) this overlap provides a single falsifiable discriminator between a genuine model limitation and mere parameter noise.
Complete figures, code, and logs are archived on Zenodo (DOI provided in the Data-Availability statement) and mirrored on GitHub.
5. Discussion
This section interprets ODER’s empirical results, linking observer-specific parameters to ERP anchoring, failure modes, and potential neurodivergent markers, while outlining current limitations and prospective extensions.
5.1. Theoretical Contributions
ODER reframes comprehension as observer-specific entropy convergence, a process that is measurable, falsifiable, and temporally aligned with ERP markers, rather than as prediction accuracy or static syntactic complexity. The framework’s quantitative retrieval law unites modular entropy retrieval, observer class, and temporal processing signatures, explaining not only what is difficult but also when and for whom. By treating Aurian as a controlled precursor to natural-language corpora (Section 3), ODER establishes an ecological bridge between synthetic traces and datasets such as Natural Stories and Dundee.
Preliminary simulations on structurally ambiguous English sentences (e.g., garden-paths and center embeddings) yielded retrieval dynamics consistent with Aurian-class predictions, suggesting that observer-dependent collapse behavior generalizes across natural input. Partial-fit phases are common in developmental modeling: early incremental-surprisal accounts captured only right-branching dependencies and failed on center-embedding constructions [12], a limitation later resolved by allowing variable stack depth [30]. ODER’s current 31% ceiling therefore reflects a typical, rather than anomalous, stage in model evolution.
Figure 4. Observer-class divergence in entropy collapse. Top: entropy trajectories for O1 (solid) and O3 (dashed). Middle: corresponding contextual gradients (∇c). Bottom: retrieval-collapse thresholds (τ_res), showing earlier stabilization in O1. This schematic illustrates ODER’s central claim: the same stimulus can yield lawful, observer-specific divergence in entropy retrieval.
5.2. ERP Anchoring and Observer Diversity
Bootstrap analysis (1,000 resamples) yields retrieval-rate (κ) estimates for the five convergent traces. The resulting O1–O3 difference excludes zero, confirming a small yet reliable observer skew (Cohen’s d as reported in Section 4.1). Even with τ pegged at its lower bound (0.1 s), collapse tokens diverged between observer classes. Mapping onsets to 400-ms steps shifts ERP windows to τ_res + 300–500 ms for the N400 and τ_res + 500–900 ms for the P600, aligning with canonical latencies [19,27].
Bridging note. Larger κ steepens the N400 slope, whereas longer τ delays P600 onset; parameter differences therefore forecast observer-specific ERP patterns. We interpret τ_res as the observer-specific threshold at which interpretive entropy collapses to a stable trajectory, an endogenous resolution point rather than a stimulus-determined timestamp.
ERP–τ_res Variability and Observer Drift.
The correspondence between τ_res and classic ERP markers (e.g., P600 during syntactic reanalysis) is robust but not universal. Both τ_res and ERP latencies can drift with observer-specific profiles or with sentence structures that trigger staggered reanalysis phases. Although synthetic Aurian EEG simulations showed close alignment, future work should quantify ERP–τ_res coupling across diverse constructions in controlled EEG cohorts. ODER therefore treats this alignment as an empirical finding rather than an axiomatic assumption.
5.3. Parameter Diversity and Observer-Class Variation
Although this study does not directly model clinical populations, ODER’s parameter manifold aligns with documented comprehension profiles: prolonged τ_res and steep ∇c peaks mirror reanalysis latency reported in autism [27]; volatility in α and κ parallels attentional fluctuations observed in ADHD [20]; elevated β values resemble phonological-loop constraints characteristic of developmental dyslexia [34]. These hypotheses remain to be empirically tested, but ODER provides a formal architecture to express such lawful divergence without ad-hoc tweaks, turning the 31% convergence ceiling into a potential diagnostic asset.
Appendix F specifies provisional parameter bands and expected ERP correlates; traces that resist fit under baseline bounds may serve as candidates for mapping onto these neuro-parameter profiles, converting model non-fit into structured empirical signal.
5.4. Failure Taxonomy
The Failure Taxonomy (Table 5) reveals two principled breakdown modes:
- (a) Garden-path spikes: highly non-monotonic traces overshoot the sigmoidal retrieval law, producing low R², an AIC shortfall, and stress flags.
- (b) Flat-ambiguity plateaus: sentences with persistent semantic superposition yield a near-constant γ and stall entropy growth, causing parameter inversion (κ_O3 > κ_O1).
These clusters indicate where ODER is currently falsified and motivate two remedies: piecewise κ(t) and attention-gated transitions.
Toward a Multi-Phase Retrieval Kernel
Several empirical traces, especially center-embedded or referent-ambiguity sentences, show an initial convergence burst, a plateau, and a secondary collapse that the single-slope retrieval law cannot capture. A straightforward fix is to replace the constant κ with a piecewise kernel κ(t), or to use a sigmoidal variant to preserve differentiability. We do not deploy κ(t) here; doing so would raise the 31% ceiling but add complexity and reduce falsifiability. Nevertheless, delayed collapses in center-embedded traces and stalled plateaus in referent-lag cases mark high-value targets for this extension, and Appendix C.5 formalizes the variant for future testing; a minimal sketch of such a kernel follows.
Flat traces and delayed τ_res may reflect lawful retrieval limits tied to phonological memory or attentional bottlenecks, potential diagnostic markers of observer-class divergence.
5.5. Known Limitations and Boundary Conditions
Several constraints temper the current implementation:
First, frequent pegging of τ at 0.1 s suggests either over-parameterization or insufficient trace length, motivating tests with longer sentences and hierarchical priors on τ.
Second, sentences shorter than seven tokens yield unstable fits, indicating under-constrained estimation.
Third, the constant-κ assumption caps convergence near 30–40% (Appendix A); spline-based κ(t) would lift this ceiling but reduce falsifiability.
Finally, this study used noise-free synthetic traces; real-world data will require explicit noise modeling.
No therapeutic claims are made. Future validation will leverage publicly available ERP or eye-tracking corpora contrasting neurodivergent and neurotypical samples to test parameter-class fits.
Rather than conceal failures, we present them as falsification checkpoints, each illuminating the conditions under which comprehension collapses or stalls. ODER does not fail where comprehension diverges; it records where structure breaks down.
5.6. Open Questions and Future Experiments
Key empirical questions remain, grouped into three themes:
Parameter inference
Observer variation
Applications
The next phase is therefore not merely to expand ODER but to probe where it breaks and learn what those fractures reveal about comprehension across real-world observer types. The 31% convergence rate reflects diagnostic fidelity, preserving observer-specific entropy paths rather than overfitting them. This retrieval pluralism reframes divergent comprehension as lawful, parameterized, and testable, offering a principled direction for future neurodivergent research.
6. Cross-Domain Applications of ODER
The use cases below apply ODER to reduce misalignment between linguistic input and an observer’s retrieval capacity, especially in clinical, accessibility, and diagnostic contexts, without attempting to influence neural timing directly.
Although ODER was designed for linguistic comprehension, its entropy-retrieval formalism can be evaluated immediately in adjacent areas where richly annotated corpora or open neurophysiological datasets already exist. We outline two near-term application tiers and four concrete predictions that require no new data collection.
6.1. Tier 1 — Adaptive Interfaces and Reading Diagnostics
6.1.1. Human–Machine Interaction
Parameters inferred from comprehension-trace data, α (attentional allocation), β (working-memory constraint), and the contextual gradient ∇c, can guide interface support in cognitively demanding settings such as comprehension aids for clinical documentation or diagnostic review.
On-the-Fly Simplification. When a rising contextual gradient (∇c) forecasts reanalysis overload, the UI rephrases subordinate clauses into shorter main-clause paraphrases.
Retrieval-Difficulty Prompts. Sustained retrieval difficulty combined with ocular regressions initiates a micro-tutorial or offers a chunked information display.
Prediction (UI). In the ZuCo eye-tracking corpus [14], observers classified by ODER as high-load should show significantly shorter fixation regressions when the adaptive mode is enabled relative to a fixed-layout baseline.
6.1.2. Linguistic Retrieval Diagnostics
Corpora such as Natural Stories and the Dundee eye-tracking set provide token-level alignment between text and comprehension probes [10,17].
Prediction (EEG). After realignment, the correlation between ∇c and P600 amplitude should be reliably stronger in the low-WM group than in the high-WM group (two-tailed permutation test).
6.2. Tier 2 — Pilot-Ready Extensions
6.2.1. Clinical and Accessibility Contexts
ODER’s parameters can be fitted to open datasets such as Childes-EEG and DyslexiaEye without additional clinical intervention.
Prediction (Accessibility). In the dyslexia eye-tracking corpus of [31], sentences whose predicted working-memory load exceeds the 75th percentile should coincide with fixation counts at least 1.5 SD above the reader’s baseline.
6.2.2. Translation and Cross-Linguistic Semantics
Parallel-corpus resources such as OpenSubtitles [25] enable immediate testing of ODER’s semantic-superposition term γ (the off-diagonal term of ρ), used to model interpretive divergence in bilingual idioms.
6.3. Summary Table
Table 7. Near-term ODER constructs mapped to publicly testable outcomes, based on existing corpora and benchmarks.
| Construct | Interpretation | Support Application | Testable Outcome |
| --- | --- | --- | --- |
| α | Attentional focus | Interface simplification | Drop in regressions |
| β | Working-memory load | Reading-diagnostic clustering | Fixation variance by WM group |
| γ | Semantic superposition | Idiom-translation stress test | Decrease in γ correlates with RT recovery |
| ∇c | Reanalysis gradient | AAC overload detector | Peak ∇c vs. error rate |
7. Conclusions and Future Directions
By formalizing comprehension as a process of observer-dependent entropy retrieval, ODER shifts the lens from passive syntactic decoding to active, time-dependent convergence toward interpretive resolution. Across theory and simulation, the model shows how retrieval dynamics vary across observers in context, attention, and memory constraints, yielding measurable collapse points (τ_res) that align with cognitive events (e.g., reanalysis and semantic closure).
The empirical results, most notably the 31% trace–observer convergence ceiling and the mean R² of 0.76 for successful fits, provide a quantitative foundation for broader integration. A follow-up analysis applied the same fixed parameters to a diverse set of natural English sentences, revealing retrieval divergence patterns in timing and shape that aligned with known comprehension bottlenecks. The application tiers below follow directly from the trace-based methods and observer-specific parameterization introduced in Section 6:
Near-Term: Deploy ODER in adaptive educational tools, cognitively adaptive user interfaces, and linguistic-assessment platforms. Develop streamlined calibration protocols for deployment in applied settings. Empirical validation of metrics such as ERR, τ_res, and observer-class divergence can proceed with existing eye-tracking and EEG corpora (e.g., ZuCo and ERP-CORE) rather than requiring new data collection.
Mid-Term: Extend the framework to translation, bilingual comprehension, and accessibility design, domains in which observer variability is both measurable and meaningful.
Long-Term: Investigate observer-relative semantics, entropy superposition (γ), and reanalysis dynamics in philosophical, epistemological, and artificial-intelligence contexts.
Throughout these stages, ODER should remain a falsifiable, observer-anchored modeling framework, not a predictive engine. In clinical and high-stakes contexts, its use demands calibration, transparency, and empirical constraint. Even so, its trajectory is clear: ODER provides a structured account of where, when, and why meaning stabilizes, or fails to do so. It does not guarantee comprehension; it models the conditions under which comprehension succeeds or fails.
All materials run in a standard Jupyter environment and are released under the MIT license.
Author Contributions
Conceptualization, methodology, software, validation, formal analysis, investigation, visualization, writing (original draft), writing (review and editing), supervision, and project administration were performed by E. C.
Funding
This research received no external funding.
Data Availability Statement
Key resources.
ODER_Linguistic_Framework.ipynb: reproduces every figure and table reported in the manuscript.
ODER_Interactive_Playground.ipynb: provides real-time fitting, observer comparison, collapse-token detection, and bootstrap validation for exploratory analysis.
ODER_Natural_Stories_Validation.ipynb: applies fixed Aurian parameters to Natural Stories sentences and stress-test items, reporting collapse-time error and fit diagnostics.
Conflicts of Interest
The author declares no conflict of interest.
Appendix A. Mathematical Formalism
Section overview. This appendix formalizes the observer-dependent retrieval law that underpins ODER in a way that is both reproducible and accessible to non-specialists. It begins by introducing the core differential equation for entropy growth, then defines all variables and parameter bounds in plain language. A short derivation from logistic principles follows, along with an explanation of why the hyperbolic tangent term was chosen over other forms. The section also explains how the collapse point is mapped to standard ERP (N400/P600) windows, and concludes with a compact Python-style algorithm showing how the equation is implemented at the token level for any sentence–observer pair. For clarity: the “quantum-inspired” notation used here (e.g., density matrices) is strictly a compact mathematical device; it is not a claim about physical quantum computation in neural systems.
Appendix A.1. Core Retrieval Equation
The observer-dependent entropy-retrieval process is given by
dS(t)/dt = κ (E_max − S(t)) tanh(t/τ),  (A1)
where the tanh term yields approximately linear growth for small t and saturation for large t. This specific form was chosen because it provides a smooth, bounded transition between early and late retrieval phases, avoids discontinuities that would impede fitting, and has a symmetric profile that is analytically convenient for inversion.
Parameter-estimation note: In this study, κ is treated as constant within a sentence. Both κ and τ are estimated by bounded nonlinear least squares (Levenberg–Marquardt) with bounds on κ and a lower bound of 0.1 s on τ. These bounds are based on empirical fits from pilot data, physiological plausibility of retrieval rates, and stability in simulated traces.
Appendix A.2. Density-Matrix Initialization
The initial observer state ρ₀ is a density matrix constructed from the observer parameters α, β, and γ (Equation (A2)), where α is the attentional-focus parameter, β is the working-memory constraint, and γ is the semantic-superposition coefficient. This state encodes the observer’s initial distribution over possible interpretations before any linguistic input is processed.
Appendix A.3. Entropy-Update Function
At each token, incremental entropy retrieval is computed via
ΔS = w₁ L_hier + w₂ η + w₃ ∇c,  (A3)
where L_hier is the hierarchical syntactic depth, η is the information-transfer efficiency, and ∇c is the contextual gradient. Coefficients w₁, w₂, and w₃ are scaling weights (fixed in the current implementation) mapping each factor into an additive retrieval increment.
Appendix A.4. State-Transition Operator
The observer’s internal state evolves according to
ρ_{t+1} = N(U_t ρ_t U_t†),  (A4)
where U_t is the update matrix determined by the current token’s syntactic and semantic profile, and N(·) denotes normalization to preserve trace 1. This operator modifies both the diagonal (interpretation probabilities) and off-diagonal (coherence) terms of ρ.
Appendix A.5. Variable Definitions
κ — constant retrieval-rate coefficient for the current sentence.
τ — characteristic time (in seconds) at which retrieval accelerates before saturating.
E_max — maximum retrievable entropy (set to 1 in all simulations).
S(t) — entropy retrieved up to time t.
τ_res — collapse time at which S(τ_res) ≥ 0.95 E_max.
ρ — observer state (density matrix) encoding interpretation probabilities and coherence.
L_hier — hierarchical syntactic depth of the current token.
η — information-transfer efficiency for the current token.
∇c — contextual gradient indicating integration/reanalysis cost.
Appendix A.6. Derivation Outline
- (a) Start from logistic growth: dS/dt = κ S (E_max − S).
- (b) Replace the constant proportionality with tanh(t/τ) to capture a two-phase regime: rapid early acceleration followed by slowdown.
- (c) For constant κ, Equation (A1) has no elementary closed-form solution; numerical integration and curve fitting are therefore used.
Appendix A.7. ERP Alignment via Collapse Point τ_res
Let k be the smallest token index such that S(t_k) ≥ 0.95·E_max, with t_k = k·Δt and Δt = 400 ms. Define τ_res = t_k. Using this definition, the predicted N400 window is [τ_res + 300 ms, τ_res + 500 ms] and the predicted P600 window is [τ_res + 500 ms, τ_res + 900 ms].
These windows operationalize the hypothesis that the model’s collapse point corresponds to the onset of semantic integration (N400) and syntactic reanalysis (P600) in human ERP data. Because τ_res can be estimated directly from behavioral or neurophysiological data, it provides a built-in falsifiability check for ODER: systematic misalignment between predicted and observed τ_res would indicate model inadequacy. We use a 0.95 threshold and a 400 ms step because these settings align with standard N400/P600 windowing practice and ensure the collapse criterion is near-asymptotic rather than mid-trajectory.
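Operationally, the mapping reduces to a few lines; the per-token cumulative trace below is a toy example, and the constants follow the definition above.

```python
import numpy as np

DT_MS = 400      # token step (400 ms)
THRESH = 0.95    # collapse threshold on S / E_max

def collapse_token(S, e_max=1.0):
    """Smallest token index k (1-based) with S[k] >= 0.95 * E_max, else None."""
    hits = np.nonzero(S >= THRESH * e_max)[0]
    return int(hits[0]) + 1 if hits.size else None

def erp_windows(S, e_max=1.0):
    """Map tau_res = k * 400 ms to predicted N400 and P600 windows (in ms)."""
    k = collapse_token(S, e_max)
    if k is None:
        return None
    tau_res = k * DT_MS
    return {"tau_res_ms": tau_res,
            "N400": (tau_res + 300, tau_res + 500),
            "P600": (tau_res + 500, tau_res + 900)}

S = np.array([0.30, 0.55, 0.78, 0.96, 0.99])   # toy per-token cumulative trace
print(erp_windows(S))
```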
Appendix A.8. Implementation Algorithm
Algorithm 2 ODER Entropy Retrieval
Require: sentence S, observer parameters (α, β, γ)
Ensure: observer-specific entropy S(t)
1: S ← 0; initialize ρ via Equation (A2)
2: for each word w in S do
3:   L_hier ← hierarchical depth of w
4:   η ← information-transfer efficiency of w
5:   ∇c ← contextual gradient at w
6:   S ← S + ΔS(L_hier, η, ∇c) ▹ Equation (A3)
7:   ρ ← N(U ρ U†) ▹ Equation (A4)
8: end for
9: return S(t)
Interpretive note: In this framework, entropy represents retrievable semantic uncertainty. Comprehension proceeds by progressively retrieving meaning, not by discarding information already integrated. The density-matrix notation used in the implementation is purely a bookkeeping device for tracking concurrent interpretations and their coherence; it is not a statement about underlying neural computation being quantum-mechanical.
Appendix B. Corpus and Entropy Trace Generation
Section overview. This appendix details how eight benchmark sentences were paired with two observer classes (O1 and O3) and converted into synthetic entropy traces for testing ODER under controlled, reproducible conditions. These traces are not arbitrary: each
mode abstracts a retrieval pattern observed in empirical comprehension data (e.g., ERP latencies, eye-tracking regressions, or prolonged ambiguity plateaus). By manipulating decay constants, noise levels, and mode shapes, we generate trace profiles that can induce both convergent and non-convergent fits, making them ideal for stress testing and falsifiability checks.
Table A1 lists the sentence inventory, while the trace generator routine generate_entropy_trace exactly reproduces these profiles for parameter sweeps and diagnostic replication.
Appendix B.1. Sentence Inventory
Table A1. Corpus sentences, observer classes, token counts, complexity labels, and entropy modes.
| Sentence ID | Observers | Tokens | Complexity | Mode |
| --- | --- | --- | --- | --- |
| eng_1 | O1, O3 | 9 | low | normal |
| gpath_1 | O1, O3 | 8 | high | gpath |
| gpath_2 | O1, O3 | 9 | very_high | gpath |
| ambig_1 | O1, O3 | 10 | medium | ambig |
| aur_1 | O1, O3 | 9 | medium | aurian |
| aur_complex_1 | O1, O3 | 10 | high | aurian |
| aur_complex_2 | O1, O3 | 12 | very_high | aurian |
| flat_1 | O1, O3 | 8 | anomalous | flat |
Appendix B.2. Entropy Generation Modes and Empirical Grounding
aurian: Decay modulated by hierarchical complexity (L_hier), delaying convergence for deeper embeddings. Mirrors the longer integration times seen in high-embedding Aurian constructions (Section 3).
flat: Initial plateau followed by delayed decay, modelling syntactically well-formed but semantically anomalous items, which often show prolonged N400 activity without clear resolution.
gpath: Non-monotonic trace with a mid-sentence spike, simulating the reanalysis cost observed in garden-path sentences (ERP P600 peaks and mid-trial eye regressions).
ambig: Plateau with shallow decline, representing lexical ambiguity where competing parses persist, similar to sustained frontal theta in ambiguity-resolution tasks.
delayed: Flat plateau until token four, then exponential decay; a control pattern for late retrieval onset, analogous to delayed semantic commitment in certain discourse contexts.
normal: Monotonic exponential decay with slope set by the observer’s decay constant and mild Gaussian noise, corresponding to straightforward comprehension without reanalysis.
Appendix B.3. Observer Class Bias and Parameter Justification
| Parameter | O1 (high context) | O3 (low context) |
| --- | --- | --- |
| Baseline entropy at token 1 | 0.60 | 0.60 |
| Early decay constant | 0.25 | 0.15 |
| Late decay constant | 0.12 | 0.08 |
| Noise standard deviation | 0.02 | 0.04 |
These values reflect plausible ranges from pilot Aurian fits and published reading-time slopes. Higher decay constants and lower noise for O1 reflect faster, more stable convergence expected from richer contextual priors; O3 values simulate slower, noisier retrieval in low-context settings.
Appendix B.4. Trace Generator: Logic Summary
Purpose
The function below produces synthetic entropy traces for benchmarking, parameter-sensitivity sweeps, and stress testing. Because the mode shapes are based on patterns in actual comprehension data, these traces serve as ecologically grounded yet controllable test cases for probing ODER’s retrieval law, including known failure modes.
Key Points.
observer_class is explicitly mapped to the early decay constant, late decay constant, and noise standard deviation through OBSERVER_PARAMS.
The optional lhier_score modulates delay only in aurian mode.
Output values are clipped to [0, 1] to respect entropy bounds and avoid unphysical values.
Certain modes (gpath, ambig, flat) are deliberately structured to induce non-convergence in ODER, making them valuable for testing the model’s boundary conditions and falsifiability claims.
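The released routine lives in the archived notebooks; the sketch below reconstructs its documented behavior (OBSERVER_PARAMS lookup, mode-specific shapes, clipping to [0, 1]) using simple exponential-decay primitives. Shape details beyond those documented above are illustrative assumptions.

```python
import numpy as np

# Observer parameters from the table above (early/late decay, noise SD).
OBSERVER_PARAMS = {
    "O1": {"lambda_early": 0.25, "lambda_late": 0.12, "noise_sd": 0.02},
    "O3": {"lambda_early": 0.15, "lambda_late": 0.08, "noise_sd": 0.04},
}
BASELINE = 0.60  # entropy at token 1 for both classes

def generate_entropy_trace(n_tokens, observer_class, mode="normal",
                           lhier_score=0.0, seed=0):
    """Synthetic per-token entropy trace for one sentence/observer pair."""
    p = OBSERVER_PARAMS[observer_class]
    rng = np.random.default_rng(seed)
    k = np.arange(n_tokens)
    if mode == "normal":                      # monotonic exponential decay
        trace = BASELINE * np.exp(-p["lambda_early"] * k)
    elif mode == "aurian":                    # decay delayed by hierarchical depth
        delay = 1.0 + lhier_score / 10.0
        trace = BASELINE * np.exp(-p["lambda_early"] * k / delay)
    elif mode == "gpath":                     # mid-sentence reanalysis spike
        trace = BASELINE * np.exp(-p["lambda_early"] * k)
        trace[n_tokens // 2] += 0.25
    elif mode in ("ambig", "flat"):           # plateau, then shallow/late decay
        split = n_tokens // 2
        trace = np.concatenate([np.full(split, BASELINE),
                                BASELINE * np.exp(-p["lambda_late"]
                                                  * np.arange(n_tokens - split))])
    else:
        raise ValueError(f"unknown mode: {mode}")
    trace += rng.normal(0.0, p["noise_sd"], n_tokens)
    return np.clip(trace, 0.0, 1.0)           # respect entropy bounds

print(np.round(generate_entropy_trace(8, "O3", mode="gpath"), 3))
```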
Appendix C. Stress Test Summary and Retrieval-Failure Log
Section overview. This appendix records every sentence–observer pair in which the constant-κ retrieval law fails to converge or produces unreliable parameter estimates. These stress tests are not contrived edge cases: the trace patterns correspond to documented comprehension phenomena such as mid-trial P600 spikes in garden-path processing, sustained theta plateaus in lexical ambiguity, and early collapse on mis-parse in modifier-shift constructions. By formalizing thresholds and preserving mismatches, we turn model failures into reproducible checkpoints for falsification. The section includes a failure matrix, parameter-surface visualizations, explicit flagging criteria, root-cause annotations with testable remedies, and a divergence taxonomy for the natural-sentence generalization set.
Appendix C.1. Failure Matrix
Table A2. Trace–observer pairs triggering at least one stress flag. “R” = low R² (threshold chosen to balance sensitivity to true misfit against tolerance for natural reading-time noise), “A” = AIC vs. best baseline, “P” = parameter inversion or pegging. “Method” gives the collapse-token rule locating τ_res (90% threshold unless otherwise stated). “—” = parameters not fit due to model failure.
| Sentence | Observer | Stress Flags | R² | κ | τ | Collapse Token | Method |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gpath_1 | O1 | R; A; P | 0.00 | — | — | 5 | 90% |
| gpath_1 | O3 | R; A; P | 0.00 | — | — | 4 | 90% |
| gpath_2 | O1 | R; A; P | 0.00 | — | — | 6 | 90% |
| ambig_1 | O1 | R | 0.07 | 0.375 | 0.05 | 10 | 90% |
| aur_1 | O3 | R | 0.37 | 0.424 | 0.05 | 8 | 90% |
| aur_complex_1 | O3 | R | 0.21 | 0.368 | 0.05 | 9 | 90% |
| aur_complex_2 | O3 | R; A; P | 0.00 | — | — | 12 | 90% |
| flat_1 | O1 | R | 0.00 | 0.254 | 0.05 | 1 | 90% |
| flat_1 | O3 | R | 0.00 | 0.254 | 0.05 | 1 | 90% |
Appendix C.2. Parameter-Surface Illustration
Figure A1. Error contours in (κ, τ) space for a convergent trace (eng_1/O1, left) and a non-convergent garden-path trace (gpath_2/O1, right). Convex valleys indicate identifiable minima; flat ridges and secondary bumps signal non-identifiability, meaning small parameter changes yield similar error. Such topographies warn of instability in estimation and motivate structured extensions (e.g., multi-phase kernels) rather than ad hoc tuning.
Appendix C.3. Threshold Criteria
R² flag “R”: any fit with R² below the misfit threshold.
τ pegging flag “P”: estimate at the lower bound (0.05 s), especially when combined with inversion.
AIC under-performance flag “A”: positive ΔAIC relative to the best baseline.
Parameter inversion flag “P”: κ_O3 > κ_O1 on O1-favored sentences, or any negative κ.
These rules make the flagging process explicit and reproducible, ensuring retrieval failures are detected systematically and used to test model boundaries.
Appendix C.4. Root-Cause Notes and Proposed Remedies
1. Non-monotonicity defeats tanh form
Symptom: low R² on garden-path traces (gpath_1, gpath_2); corresponds to P600 spikes and mid-sentence eye regressions.
Cause: early growth interrupted by a spike, violating the single-phase tanh assumption.
Remedy: allow piecewise or spline-based κ(t) kernels to capture multi-phase retrieval dynamics.
2. τ pegging at lower bound
Symptom: τ fixed at 0.05 s, especially on very short sentences (flat_1); mirrors early collapse in shallow-processing cases.
Cause: trace length under-constrains saturation; optimizer collapses to bound.
Remedy: increase minimum input length or add a weak hierarchical prior on τ.
3. AIC under-performance
Symptom: positive ΔAIC despite visually plausible fit (aur_complex_2, O3).
Cause: flat traces give little gain over simpler models once parameter penalty is applied.
Remedy: introduce attention-gated transitions that reduce to a linear model when ∇c ≈ 0.
4. Parameter inversion
Symptom: κ_O3 > κ_O1 on ambig_1; reflects prolonged semantic superposition (high γ) overriding memory-constraint effects.
Cause: lexical ambiguity drives κ more than working-memory limits, reversing expected order.
Remedy: couple κ to γ or separate lexical and syntactic retrieval-rate parameters; test against ambiguity-resolution ERP corpora.
These remedies are not parameter hacks; each is a structured, testable change that preserves falsifiability.
Appendix C.5. Divergence Taxonomy Across Natural Sentences
Table A3. Model divergence across twelve natural-language sentences (Section 4.2). Δτ = model–empirical collapse-point timing error. All fits use fixed κ and τ from Aurian calibration. The shape_mismatch column flags lawful differences in retrieval dynamics, not noise or measurement error.
| id | type | Δτ (s) | RMSE | shape_mismatch | note |
| --- | --- | --- | --- | --- | --- |
| ns_001 | declarative | 4.0 | 0.318 | T | — |
| ns_002 | garden-path | –0.4 | 0.510 | T | — |
| ns_003 | garden-path | –0.4 | 0.414 | T | — |
| stress_001 | garden-path | –0.4 | 0.624 | T | old/NP reanalysis |
| stress_002 | center-embed | 3.6 | 0.408 | T | deep embed delay |
| stress_003 | coord-ambig | –2.0 | 0.284 | T | coordination scope |
| stress_004 | passive | 0.0 | 0.568 | T | reversible passive |
| stress_005 | modifier-shift | –3.2 | 0.372 | T | early collapse on mis-parse |
| stress_006 | idiom | –0.8 | 0.529 | F | literal→idiom switch |
| stress_007 | pronoun | –0.8 | 0.358 | T | referent lag |
| stress_008 | obj-relative | 0.0 | 0.568 | T | object-relative load |
| stress_009 | pp-attach | 0.0 | 0.568 | T | PP-attachment ambiguity |
Appendix D. Interactive Playground Notebook Interface
Section overview. The Jupyter notebook ODER_Interactive_Playground.ipynb is a standalone, sandboxed environment for exploratory fitting, stress testing, and falsification of the ODER model. It is intended for demonstration, teaching, and method prototyping. No empirical results reported in the main text depend on this tool; all published analyses were run in separate, locked pipelines.
Appendix D.1. Core Functions
Real-time entropy-trace fitting using nonlinear least squares or bootstrap resampling, with plots updating as parameters change.
Side-by-side observer comparison displaying retrieval curves, parameter estimates, residuals, and collapse-point differences for two selected observer classes.
Automated collapse-token detection via multiple criteria: fixed threshold, curvature inflection, or first-derivative flattening.
ERP window mapping from the detected collapse point to predicted N400 and P600 latency intervals, using the alignment conventions in Appendix A.
Bootstrap validation returning confidence intervals for κ, τ, and τ_res across repeated resamples.
Appendix D.2. Usage Notes
Default parameter bounds and solver settings are identical to those used in the simulation experiments described in Section 2, Section 3 and Section 4, ensuring consistency.
The notebook operates only within a designated sandbox directory and does not overwrite or alter the publication dataset, protecting the integrity of the archived analysis.
Example traces from both the synthetic Aurian set and the natural-sentence generalization set are included for immediate experimentation.
Appendix D.3. Access
A static HTML export is also included in the Zenodo archive referenced in the Data-Availability statement for users who prefer to explore without a live Python environment.
Appendix E. Glossary and Interpretive Variable Mapping
Section overview. This appendix defines the formal symbols used in the ODER framework and provides a cross-domain mapping for each construct. The goal is to eliminate ambiguity when transitioning between the technical formalism in Section 2, Section 3, Section 4 and Section 5 and applied contexts in linguistics, cognitive science, and AI/NLP.
E.1 Variable Glossary
Table A4. Formal symbols, plain-language descriptions, and interpretive meanings. All symbols are dimensionless except where time (seconds) is explicitly noted.
| Symbol | Description | Interpretation |
| --- | --- | --- |
| κ | Entropy-retrieval rate | Speed of comprehension for an observer; higher values = faster retrieval |
| τ | Characteristic saturation time | Temporal scale of processing effort before saturation; lower values = quicker approach to plateau |
| S(t) | Cumulative entropy retrieved | Portion of meaning resolved up to time t |
| E_max | Maximum retrievable entropy | Upper bound on sentence information; set to 1 in all simulations |
| τ_res | Collapse time (S(τ_res) ≥ 0.95·E_max) | Point of interpretive convergence; ERP anchor |
| ∇c | Contextual gradient | Instantaneous slope of reanalysis load; spikes indicate instability or high integration cost |
| γ | Semantic superposition (off-diagonals in ρ) | Degree of unresolved ambiguity or competing interpretations |
| α | Attentional-focus parameter | Allocation of cognitive resources during processing |
| β | Working-memory constraint | Capacity to maintain unresolved structure in active memory |
| — | Prior-knowledge exponent | Modulation of retrieval speed based on background familiarity |
E.2 Cross-Domain Interpretive Map
Table A5. Translation of core ODER constructs across domains. Each mapping preserves the operational meaning while aligning with field-specific terminology.
| Term | Linguistics | Cognitive Science | AI / NLP |
| --- | --- | --- | --- |
| κ | Parsing velocity (tokens/sec) | Retrieval speed; memory-search efficiency | Token-alignment accuracy or decoding speed |
| τ | Span of reanalysis in garden-paths | Processing-time constant; transition to late-stage comprehension | Hidden-state decay or context window persistence |
| ∇c | Disruption signal in syntactic ambiguity | Neural surprise or integration cost | Attention-weight gradient spike in transformer models |
| γ | Persistent lexical ambiguity | Interpretive drift or competing schema activation | Blended latent representation of multiple interpretations |
| τ_res | ERP timing anchor for N400/P600 | Resolution threshold in task performance | Collapse point in model confidence or output distribution |
Appendix F. Hypothesized Parameter Profiles for Neurodivergent Retrieval
Section overview. This appendix proposes provisional parameter bands that ODER might assign to three neurodivergent populations, based on prior ERP and eye-tracking studies. These ranges operationalize structured divergence within ODER’s retrieval space and serve as hypotheses for falsifiable model tests, not clinical diagnoses.
Table A6. Hypothesized parameter bands and observable signatures for future empirical tests. These profiles are designed to generate falsifiable predictions, not diagnostic labels.
| Neurotype | λ Range | τ (s) | Notes | Trace Pattern | ERP Signature |
| --- | --- | --- | --- | --- | --- |
| Autism | 0.9–1.1 | 0.12–0.18 | Steep ∇C; stable α | Extended reanalysis plateau | Delayed P600 latency [23] |
| ADHD | 0.7–1.3 † | 0.08–0.16 (high variance) | Fluctuating α, variable τ | Irregular ERR, wide variance | Reduced LPP stability [20] |
| Dyslexia | 0.5–0.8 | 0.10–0.15 | Elevated μ (WM load) | Dampened ERR, retrieval stalls | Attenuated N400 amplitude [4] |
† Range reflects hyperfocus–distractibility shifts reported in [20]. Parameter bands are adapted from [10,26].
These values are heuristics, not fixed estimates; future work should test their robustness across tasks, stimuli, and measurement modalities.
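To show how these bands could drive falsifiable simulations, the sketch below samples one parameter pair per neurotype and generates a predicted retrieval trace. The bands are copied from Table A6; the tanh trace form and the collapse point of 0.55 s are illustrative assumptions, not fitted values.

```python
# Hypothetical sketch: sampling Table A6's bands to generate predicted traces.
# Trace form and collapse point are illustrative assumptions, not fitted values.
import numpy as np

BANDS = {  # neurotype: (lambda band, tau band in seconds), from Table A6
    "autism":   ((0.9, 1.1), (0.12, 0.18)),
    "adhd":     ((0.7, 1.3), (0.08, 0.16)),
    "dyslexia": ((0.5, 0.8), (0.10, 0.15)),
}

def simulate_trace(lam, tau, t, t_c=0.55):
    # lambda speeds up retrieval; tau stretches the saturation timescale
    return 0.5 * (1.0 + np.tanh(lam * (t - t_c) / tau))

rng = np.random.default_rng(42)
t = np.linspace(0.0, 1.2, 12)  # one time stamp per token
for neurotype, (lam_band, tau_band) in BANDS.items():
    lam, tau = rng.uniform(*lam_band), rng.uniform(*tau_band)
    trace = simulate_trace(lam, tau, t)
    print(f"{neurotype:9s} lambda={lam:.2f} tau={tau:.3f} E(final)={trace[-1]:.2f}")
```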
Appendix G. Identifiability of τ
Section overview. This appendix examines whether the frequent convergence of τ estimates to the lower bound of the search interval (0.05 s) in the main analysis reflects a genuine lack of parameter identifiability or is an artifact of the optimization bounds. Using a profile-likelihood approach, we quantify how sentence length and retrieval-trace duration affect the recoverability of τ under ODER's constant-λ retrieval law.
As reported in Sections 5.2 and 6.5, τ was often pegged at its minimum allowable value during fitting, especially for short sentences. To test whether this reflects an intrinsic resolution limit, we performed a profile-likelihood analysis in which τ was varied systematically across its full search range (lower bound 0.05 s) while holding the remaining parameters fixed at their best-fit values for each trace. Profile likelihood here means computing the model–data sum-of-squared error (SSE) at each fixed τ value, with all other parameters unchanged, then converting SSE values to relative log-likelihoods.
For two sentence lengths (9 and 12 tokens), we selected convergent fits from the main analysis and computed these profiles over the specified range. The resulting curves (Figure A2) were examined for curvature and confidence-interval width.
Figure A2. Profile log-likelihood for τ at two sentence lengths (9 and 12 tokens). The dashed line marks the 95% confidence threshold. Shallow curvature indicates weak identifiability of τ under current trace lengths.
Both sentence lengths produce flat-topped likelihood profiles, indicating that τ is only weakly constrained by the available data. The plateau is broader for the 9-token case, while the 12-token case shows slightly greater curvature, consistent with the prediction that identifiability improves as the retrieval trace spans a larger portion of the convergence trajectory. These results support the interpretation that lower-bound pegging is a data-resolution issue rather than a pathological fit or a consequence of the bound itself.
From a theoretical perspective, ODER predicts that τ will be most reliably recovered when the collapse point t_c is temporally well-separated from both sentence onset and termination, allowing the early- and late-slope regions of the tanh retrieval curve to be sampled. Future work could test this prediction using longer or variably paced stimuli, or jittered-onset ERP paradigms, to sharpen τ estimates without altering other observer parameters.
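The sketch below makes the profiling procedure concrete on a synthetic trace. The two-parameter tanh law, noise level, and grid bounds are illustrative assumptions; the 95% cut-off uses the standard χ²(1)/2 threshold of 1.92 relative log-likelihood units.

```python
# Minimal profile-likelihood sketch for tau (illustrative assumptions:
# tanh retrieval law, Gaussian residuals, invented noise level and grid).
import numpy as np

def retrieval_curve(t, tau, t_c):
    return 0.5 * (1.0 + np.tanh((t - t_c) / tau))  # saturating retrieval law

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.2, 12)                      # 12-token trace (s)
true_tau, true_tc, sigma = 0.12, 0.55, 0.02
trace = retrieval_curve(t, true_tau, true_tc) + rng.normal(0, sigma, t.size)

tau_grid = np.linspace(0.05, 0.50, 200)            # profile grid for tau
sse = np.array([np.sum((trace - retrieval_curve(t, g, true_tc)) ** 2)
                for g in tau_grid])                # other parameters held fixed
rel_loglik = -sse / (2 * sigma ** 2)               # Gaussian log-likelihood (up to a constant)
rel_loglik -= rel_loglik.max()                     # relative to the profile maximum

ci = tau_grid[rel_loglik >= -1.92]                 # 95% CI via chi-square(1)/2 threshold
print(f"tau 95% CI ~ [{ci.min():.3f}, {ci.max():.3f}] s")
```

A flat-topped rel_loglik curve, as reported above for short traces, yields a wide interval even when the optimizer returns a point estimate at the bound.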
Appendix H. Inverse Retrieval Classification (Toy Example)
Section overview. This appendix provides a minimal, fully synthetic proof-of-concept for the inverse decoder problem: inferring observer class from retrieval-curve features. No such classifier is applied to empirical data in the present study; the goal is to illustrate how ODER-derived features could, in principle, support observer-type prediction in a future controlled experiment.
Feature set
From a simulated retrieval trajectory E(t) and its contextual gradient ∇C(t), we extract:
f₁ = t_c, f₂ = E(T), f₃ = max_t |∇C(t)|, f₄ = ⟨dE/dt⟩_W,
where T is the final token index and W is a fixed analysis window chosen to capture mid-trace dynamics. These four features are intended as a minimal, interpretable set that links directly to ODER parameters: t_c (collapse timing), final entropy level, reanalysis magnitude, and mid-trajectory retrieval rate.
Toy classifier
We fit a logistic regression model to synthetic traces labeled by observer class (y = 1 for high-λ observers, y = 0 for low-λ):
P(y = 1 | f) = σ(w·f + b),
where f = (f₁, f₂, f₃, f₄), w and b are learned weights, and σ is the logistic function. A compact end-to-end version of this pipeline is sketched below.
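In the sketch, the trace generator, window W, noise level, and the high- versus low-λ class bands are all illustrative assumptions chosen only to demonstrate the scaffold.

```python
# Hypothetical end-to-end toy: simulated traces -> four features -> logistic fit.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.2, 12)        # 12-token trace
W = slice(4, 8)                      # fixed mid-trace analysis window (assumed)

def trace(lam, tau=0.12, t_c=0.55):
    return 0.5 * (1.0 + np.tanh(lam * (t - t_c) / tau))

def features(E):
    grad = np.gradient(E, t)                      # discrete contextual gradient
    f1 = t[np.argmax(E >= 0.95 * E[-1])]          # proxy for collapse timing t_c
    f2 = E[-1]                                    # final entropy level E(T)
    f3 = float(np.max(np.abs(grad)))              # reanalysis magnitude
    f4 = float(grad[W].mean())                    # mid-trajectory retrieval rate
    return [f1, f2, f3, f4]

X, y = [], []
for _ in range(100):
    high = rng.random() < 0.5                     # y = 1: high-lambda observer
    lam = rng.uniform(1.1, 1.4) if high else rng.uniform(0.5, 0.8)
    E = trace(lam) + rng.normal(0, 0.01, t.size)  # small measurement noise
    X.append(features(E)); y.append(int(high))

clf = LogisticRegression().fit(X, y)
auc = roc_auc_score(y, clf.predict_proba(X)[:, 1])
print(f"in-sample AUC = {auc:.3f}")               # near 1.0 for low-noise traces
```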
Interpretation
On synthetic data, this scaffold produces clear class separability (AUC ≈ 1.0 for noiseless traces) and serves as a starting point for more realistic tests. In a full empirical setting, feature extraction would incorporate measurement noise, inter-observer variability, and mixed sentence types, and validation would be conducted with out-of-sample observers. This example therefore functions as a transparent bridge between ODER’s forward model and potential inverse applications, without making any claims about accuracy on real-world data.
References
- Busemeyer, J. R.; Bruza, P. D. Quantum Models of Cognition and Decision; Cambridge University Press, 2012.
- Bruza, P. D.; Wang, Z.; Busemeyer, J. R. Quantum cognition: A new theoretical approach to psychology. Trends in Cognitive Sciences 2015, 19(7), 383–393.
- Cavanagh, J. F.; Frank, M. J. Frontal theta as a mechanism for cognitive control. Trends in Cognitive Sciences 2014, 18(8), 414–421.
- Chang, A.; Zhang, Y.; Ding, H.; Goswami, U. Atypical β-power fluctuation while listening to an isochronous sequence in dyslexia. Clinical Neurophysiology 2021, 132(10), 2384–2390.
- Christianson, K.; Williams, C. C.; Zacks, R. T.; Ferreira, F. Younger and older adults’ “good-enough” interpretations of garden-path sentences. Discourse Processes 2006, 42(2), 205–238.
- Cooper, E. Aurian: A Cognitive-Adaptive Language for Observer-Dependent Communication; Zenodo, 2025.
- Kappenman, E. S.; Farrens, J. L.; Zhang, W.; Stewart, A. X.; Luck, S. J. ERP CORE: An open resource for human event-related potential research. NeuroImage 2021, 225, 117465.
- Ferreira, F.; Henderson, J. M. Recovery from misanalyses of garden-path sentences. Journal of Memory and Language 1991, 30(6), 725–745.
- Futrell, R.; Gibson, E.; Tily, H. J.; Blank, I.; Vishnevetsky, A.; Piantadosi, S. T.; Fedorenko, E. The Natural Stories Corpus. In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018); European Language Resources Association (ELRA), 2018; pp. 76–82.
- Futrell, R.; Gibson, E.; Tily, H. J.; Blank, I.; Vishnevetsky, A.; Piantadosi, S. T.; Fedorenko, E. The Natural Stories corpus: A reading-time corpus of English texts containing rare syntactic constructions. Language Resources & Evaluation 2021, 55(1), 63–77.
- Gershman, S. J.; Horvitz, E. J.; Tenenbaum, J. B. Computational rationality: A converging paradigm for intelligence in brains, minds, and machines. Science 2015, 349(6245), 273–278.
- Hale, J. A probabilistic Earley parser as a psycholinguistic model. In Proceedings of NAACL 2001; 2001; Vol. 2, pp. 1–8.
- Heilbron, M.; Armeni, K.; Schoffelen, J. M.; Hagoort, P.; de Lange, F. P. A hierarchy of linguistic predictions during natural language comprehension. Proceedings of the National Academy of Sciences 2022, 119(32), e2201968119.
- Hollenstein, N.; Rotsztejn, J.; Tröndle, M.; Pedroni, A.; Zhang, C.; Langer, N. ZuCo: A simultaneous EEG and eye-tracking resource for natural sentence reading. Scientific Data 2018, 5, 180291.
- Just, M. A.; Carpenter, P. A. A capacity theory of comprehension: Individual differences in working memory. Psychological Review 1992, 99(1), 122–149.
- Demberg, V.; Keller, F. Data from eye-tracking corpora as evidence for theories of syntactic processing complexity. Cognition 2008, 109(2), 193–210.
- Kennedy, A.; Hill, R. L.; Pynte, J. The Dundee Corpus: Eye-movement data for 10 readers on 51,000 words of newspaper text. Poster presented at the 12th European Conference on Eye Movements, Dundee, Scotland, 2003. Available online: https://www.ling.ohio-state.edu/golddundee.
- Kennedy, A.; Pynte, J.; Murray, W. S.; Paul, S. A. Frequency and predictability effects in the Dundee Corpus: An eye-movement analysis. Quarterly Journal of Experimental Psychology 2013, 66(3), 601–618.
- Kutas, M.; Federmeier, K. D. Thirty years and counting: Finding meaning in the N400 component of the event-related brain potential. Annual Review of Psychology 2011, 62, 621–647.
- Lenartowicz, A.; Mazaheri, A.; Jensen, O.; Loo, S. K. Aberrant modulation of brain oscillatory activity and attentional impairment in ADHD. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging 2018, 3(1), 19–29.
- Levy, R. Expectation-based syntactic comprehension. Cognition 2008, 106(3), 1126–1177.
- Lewis, R. L.; Vasishth, S. An activation-based model of sentence processing as skilled memory retrieval. Cognitive Science 2005, 29(3), 375–419.
- Li, J.; Roberts, L.; Smith, E.; Brown, M. Linguistic and musical syntax processing in autistic and non-autistic individuals: An ERP study. Autism Research 2025, 18(6), 1245–1256.
- Lieder, F.; Griffiths, T. L. Resource-rational analysis: Understanding human cognition as the optimal use of limited computational resources. Behavioral and Brain Sciences 2020, 43, e1.
- Lison, P.; Tiedemann, J. OpenSubtitles2016: Extracting large parallel corpora from movie and TV subtitles. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016); 2016; pp. 923–929. Available online: https://aclanthology.org/L16-1147/.
- Nieuwland, M. S.; Politzer-Ahles, S.; Heyselaar, E.; Segaert, K.; Darley, E.; Kazanina, N.; et al. Large-scale replication study reveals a limit on probabilistic prediction in language comprehension. eLife 2018, 7, e33468.
- Osterhout, L.; Holcomb, P. J. Event-related brain potentials elicited by syntactic anomaly. Journal of Memory and Language 1992, 31(6), 785–806.
- Piantadosi, S. T. A rational analysis of the approximate number system. Psychonomic Bulletin & Review 2016, 23(3), 877–886.
- Pothos, E. M.; Busemeyer, J. R. Can quantum probability provide a new direction for cognitive modeling? Behavioral and Brain Sciences 2013, 36(3), 255–274.
- Rasmussen, N. E.; Schuler, W. Left-corner parsing with distributed associative memory produces surprisal and locality effects. Cognitive Science 2018, 42(S4), 1009–1042.
- Rello, L.; Ballesteros, M. Detecting readers with dyslexia using machine learning with eye tracking measures. In Proceedings of the 12th Web for All Conference (Article 16); Association for Computing Machinery, 2015.
- Shannon, C. E. A mathematical theory of communication. Bell System Technical Journal 1948, 27(3), 379–423.
- Simon, H. A. Theories of bounded rationality. In Decision and Organization; McGuire, C. B., Radner, R., Eds.; North-Holland, 1972; pp. 161–176.
- Snowling, M. J.; Hulme, C. Dyslexia: A Very Short Introduction; Oxford University Press, 2021.
Notes
1. An ICP is the final word position at which retrieval resolves to a single interpretation.
2. Collapse tokens are the final word positions where retrieval resolves to a single interpretation.
3. See Appendix H for a toy implementation illustrating a minimal, testable approach.
4. Comparable variability is observed in real-world comprehension studies, where accuracy on certain complex constructions often falls well below ceiling, especially for low working-memory or high-ambiguity cases.
5. The 31% ceiling reflects falsifiability: it spotlights lawful divergences rather than indicating model failure.