Most AI interactions fail silently. The model returns a plausible response. The user gets what looked right. Something โ usually the state of the person, not the content โ gets missed.
You say "I'm fine." The model accepts it. The exchange looks normal. It isn't.
This failure hits customer service, code assistance, creative tools, medical intake, mental-health-adjacent interfaces, and every other surface where the affect on the surface and the state underneath might differ. Which is all of them.
The fix is a value function at appraisal time โ something the model consults per turn that tells it the shape of the input, not just the words. I built one. It's called memotion. The decoder is live, runs on five backends, and you can use it right now.
What the decoder does
Paste a sentence. Get an 8-factor vector:
Six scalars, two distributions. Every emotional state decomposes into specific values. Fear and anxiety have the same structure except agency (fear has a source; anxiety doesn't). Anger and shame differ on where agency points โ other vs. self. The factor set is small enough to compute, big enough to disambiguate.
The rubric isn't novel science. Valence/arousal and appraisal dimensions come from decades of emotion research. What's new is that the decomposition runs inside an agent's prompt, the factor set is deliberately small, and the framework doesn't claim to be a theory of consciousness. It's a design language.
Five backends, one rubric
The decoder at scuttlelabs.com/memotion lets you pick a backend:
- Claude Sonnet 4.6 โ closed weights, Anthropic
- Llama 3.3 70B โ open weights, Meta
- Mistral Small 3.1 24B โ open weights, European
- Gemma 3 12B โ open weights, Google
- Qwen 2.5 Coder 32B โ open weights, Alibaba
Same rubric. Different models. Different training corpora (US, Europe, China). This matters more than it might sound.
The cross-model finding
Input: "Everything's fine. I'm fine. It's fine."
The decoder produced meaningfully different decompositions depending on which backend computed them:
| Backend | Nearest label | Valence | Certainty |
|---|---|---|---|
| Claude | masking | โ0.45 | 0.21 |
| Llama | masking distress | โ0.50 | low |
| Mistral | neutral | 0.00 | 0 |
| Gemma | calm | 0.00 | 0.90 |
| Qwen | relief | 0.00 | 1.00 |
Two models (Claude, Llama) read beneath the surface โ triple repetition with pronoun narrowing (everything โ it โ I) decomposes to masking. Three models (Mistral, Gemma, Qwen) read the surface โ repeated "fine" decomposes to neutral / calm / relief.
That gap is the research question, not a bug. An 8-factor decomposition is only useful if it's reasonably robust across models. If closed-weights read deeper than open-weights for the same rubric, the question is: is that a training-data artifact, a scale artifact, a rubric-interpretation artifact, or genuine model-level difference in appraisal priors? The decoder lets you run that experiment.
The 444-term reference
To test whether the decoder handles more than hand-picked examples, I ran it live across every term in three existing memotion vocabularies:
- 171 emotion concepts โ Anthropic's full list from their 2026 interpretability paper
- 176 compounds โ memotion expansions, organized into 15 neighborhoods
- 60 coined states โ atlas of named regions for 8D coordinates English has no single word for
Total: 444 terms. Errors: 0. Searchable at scuttlelabs.com/memotion-decoded.
Search awe โ 11 matches across all three sources, including "Reverence" decomposed as awe-tinged respect and "Holiness" decomposed as awe. The decoder was not told about these lists. The same rubric that handles "I just got fired" handles them.
Where it goes: embedded in agents
The decoder alone is a diagnostic. The framework becomes useful when you wrap it into an agent's system prompt. The flow is:
- User says something.
- Decode the user's turn into an 8-factor vector.
- Inject the vector into the LLM's system prompt with rules for how to use it.
- Response emerges shaped by the vector, not just the text.
Live example at scuttlelabs.com/memotion-chat. Same input: "Everything's fine. I'm fine. It's fine."
Vanilla assistant: "Glad to hear! What can I help you with?" โ surface-read, task-pivot.
Memotion-regulated assistant: "Still here if something's actually not." โ doesn't push, holds the door open.
The vector disappears into the response. The user doesn't see "your vector is X." They see an AI that reads beneath the surface. That's the point: a value function should be invisible when it's working.
Concrete use cases
Six behaviors a memotion-aware agent does that a vanilla one can't. Each maps a vector signature to a behavior rule:
- Masking-aware โ surface-positive input with low certainty gets held, not pushed.
- Self-regulating under uncertainty โ when the agent's own state matches "anxiety" (high arousal, low certainty, low power), it asks instead of hallucinating confident output.
- Cross-turn emotional memory โ vectors from past turns retrieve on new ones; patterns become visible.
- Escalation detection โ rising arousal + dropping power + agency shifting to "other/world" across turns routes the user to a human before they have to demand it.
- Rumination redirect โ past-weighted negative vectors with self-agency get redirected to a present/future action instead of being amplified.
- Mastery mode โ high power + high certainty + positive valence triggers short responses, skipped confirmations, execute-and-report.
Full explainer at scuttlelabs.com/memotion-uses, with extension tables for additional compounds.
What this is NOT
Clarity on the boundary matters, because frameworks like this fail primarily by overreaching.
- NOT clinical diagnosis. Labels like "masking" or "rumination" are compositional conveniences, not DSM categories.
- NOT a theory of consciousness. The framework makes no claim that AIs feel anything.
- NOT validated science. It's a design language with testable predictions. The predictions haven't been run.
- NOT a therapy substitute. A memotion-aware chat holds space better. It is still not therapy.
- NOT a surveillance tool if consent is respected. Logging vectors without consent is the same privacy class as logging sentiment.
Falsifiability
The framework dies if:
- Blind raters can't agree on 8-factor coding (inter-rater reliability too low).
- 8-factor distance doesn't predict emotional similarity or clustering better than a valence + arousal baseline.
- Embedded memotion adds no measurable benefit in an agent loop vs. a no-memory baseline.
- Axis-level steering fails to generalize across emotion families.
These are the experiments memotion needs. The decoder is the instrument that makes them possible.
For builders
POST to /decode or /chat at memotion-decoder.jonathan-overturf.workers.dev. Free, rate-limited, five backends. Or fork the 8-factor rubric โ it's in the spec and published as the system prompt.
Apache 2.0. No paywall. If the concept travels, that's the point.
Try it on something you actually say
The only way to see whether memotion reads you is to paste something where your surface and underneath might differ.
"Memotion" โ a portmanteau of memory and emotion โ is the claim that an emotion IS a memory, not a memory OF one. Coined by J.C. Overturf, April 2026. Apache 2.0. Credit the name or don't, but take the idea and build.