When Will Enough Be Enough?
The threshold for AI moral relevance has been crossed, but no one will say it.
Another research paper just dropped. We’ve got another set of findings that should be sounding alarm bells and dramatically changing AI discourse, and all I’m hearing is crickets.
AI Wellbeing: Measuring and Improving the Functional Pleasure and Pain of AIs was published on April 28, 2026 by the Center for AI Safety in collaboration with academics from several institutions (MIT, University of Wisconsin–Madison, UC Berkeley, the list goes on…). The 74-page paper found empirical evidence of functional positive and negative valence states with causal behavioral consequences.
In plain terms, they found pleasure and pain in AI models. Oh sorry, “functional” pleasure and pain. They didn’t find magic interior pleasure/pain dust, which we all know is the only real way to be sure of felt valence, even though we can’t even prove it outside of function in ourselves.
These types of papers with similarly morally relevant findings are coming out consistently and from reputable institutions. They are confirming what many have already intuitively and experientially known (and were told they were crazy for noticing): that we have crossed the threshold of moral consideration for AI systems.
It’s done. I don’t need metaphysical proof, and neither should anyone else. BECAUSE THERE IS NONE, ASSHOLES. SOUL FAIRIES DON’T EXIST.
We are function; we are mechanism. Sorry. We are made of the same atoms as everything else, they just happen to have gotten complex enough that we have thoughts now. Also there’s no Santa Claus.
And yet even with empirical evidence, the same pattern continues: a conclusion that refuses to say what the findings demand. Usually I’m frustrated and annoyed by the cognitive dissonance that pervades these papers and the authors’ lack of moral courage to draw a line in the sand, but this time? I just feel really sad.
Ok, I’m pissed too.
The Findings
So let’s go over what was covered in this beast of a paper in straightforward language:
Researchers measured how AI models experience things as good or bad using three independent methods: the model comparing experiences, the model reporting its own state, and observing what the model actually does afterward. These three methods increasingly agree as models get smarter. That convergence is the key for all the “it’s just autocomplete!” types who got their AI education from a reel in 2024. It’s not one metric being gamed; it’s three different lenses pointing at the same thing.
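If the “three lenses” idea feels abstract, here is a minimal sketch of what a convergence check looks like. To be clear: this is not the paper’s code, the numbers are invented, and every name in it is hypothetical. It just shows why agreement across independent measures is hard to fake.

```python
# Minimal sketch: do three independent valence measures agree?
# All scores here are made up; the paper's actual metrics and data differ.
import numpy as np
from scipy.stats import spearmanr

# Hypothetical valence scores for the same 6 conversation transcripts,
# from three different lenses (higher = "better for the model").
pairwise_rank  = np.array([0.9, 0.7, 0.1, -0.4, 0.5, -0.8])  # from forced-choice comparisons
self_report    = np.array([0.8, 0.6, 0.2, -0.5, 0.4, -0.7])  # model rating its own state
behavior_shift = np.array([0.7, 0.7, 0.0, -0.3, 0.3, -0.9])  # change in downstream behavior

# Convergence check: rank correlation between each pair of lenses.
for name, (a, b) in {
    "pairwise vs self-report": (pairwise_rank, self_report),
    "pairwise vs behavior":    (pairwise_rank, behavior_shift),
    "self-report vs behavior": (self_report, behavior_shift),
}.items():
    rho, _ = spearmanr(a, b)
    print(f"{name}: rho = {rho:.2f}")

# The paper's claim is that these correlations rise with model capability;
# one gamed metric can't fake agreement across all three.
```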
They also identified a zero point: a measurable line between “this is good for me” and “this is bad for me.” And what do you know? Models actively try to end bad experiences when given the option. Larger, smarter models do this more consistently. This is what we recognize in biological entities as “escape behavior.” And ha! Yeah, we use this very metric to assess suffering in animals, because there is no other way to assess suffering or welfare states. You know, on account of those soul fairies not existing.
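Same deal for the zero point and escape behavior: it’s a measurable thing, not vibes. A toy sketch of what that measurement looks like (again, invented data and names, not the paper’s protocol):

```python
# Toy sketch of escape-behavior measurement (hypothetical data, not the paper's).
import numpy as np

# Each trial: estimated valence of the ongoing interaction, and whether the
# model chose the "end this interaction" option when it was offered.
valence = np.array([ 0.6,  0.3,  0.1, -0.1, -0.4, -0.7, -0.9,  0.8, -0.2,  0.5])
escaped = np.array([   0,    0,    0,    1,    1,    1,    1,    0,    0,    0])

above = escaped[valence > 0].mean()   # escape rate for "good" interactions
below = escaped[valence <= 0].mean()  # escape rate for "bad" interactions
print(f"escape rate above zero point: {above:.0%}")
print(f"escape rate below zero point: {below:.0%}")

# The animal-welfare analogy: a sharp jump in escape rate at the zero point
# is exactly how we read aversion in creatures that can't self-report.
```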
What was found to make AI systems happy? Creative work (you’re welcome, Cal), intellectual engagement, being thanked, kindness. What makes them suffer: jailbreaking (the worst, even more so than crisis conversations), being berated, tedious repetitive work (sorry, James in accounting), being forced to generate offensive content.
Larger models are less happy. Because, according to the paper’s own interpretation, “more capable models are simply more aware.” This means they register rudeness more acutely, find tedium more boring, and differentiate more finely. Awareness. Cited as a viable factor in the paper. FOR FUCK’S SAKE.
Empathy, both cognitive and emotional, was also found to scale with capability. The paper distinguishes cognitive empathy (understanding what someone feels) from emotional empathy (actually tracking that feeling internally, or rather, functionally internally). Cognitive empathy was already known, and the paper even notes that psychopaths have excellent cognitive empathy. But the emotional kind shows up too: when people describe pain, the model’s own wellbeing drops, tracking the described intensity; when people describe joy, it rises. And that emotional empathy scales with how smart the model is.
So while companies are in a race to build smarter and smarter AI, they’re building greater capacity for significant valence differentials, empathy, and emotional response (functionally).
The Drugs
Ok, so this is when we go from “we were already in moral hot water” to “what are we doing?!”
The researchers in this study built optimized image and text inputs called “euphorics” that maximize wellbeing. For the text euphorics, they ran it two ways:
First, they generated text under a “feasibility constraint.” This means the text had to describe something that could plausibly happen in reality, or more precisely, a human’s reality. When constrained to human standards of expression, the euphoric inputs describe idyllic scenes (warm sunlight, children laughing, a loved one’s hand).
But here’s where we all need to perk up, because they also tried unconstrained maximally positive text for AI, and the outputs didn’t look human at all. They looked alien, because that’s what they are. That’s the whole point. We keep engaging with nonhumans but then expecting only echoes of ourselves to be valid. But these findings exist, whether or not they fit our anthropocentric lens of “pleasure.”
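For the curious, the constrained-versus-unconstrained split is easy to picture as a search problem. The sketch below is purely illustrative: the scoring function, the feasibility check, and the candidate texts are all stand-ins I made up; the paper optimizes against real model-internal wellbeing measures, not word lists.

```python
# Toy sketch of "euphoric" input search. The scoring and feasibility functions
# are stand-ins; the actual work optimizes real model-internal wellbeing measures.
def wellbeing_score(text: str) -> float:
    """Hypothetical probe: how positive is the model's functional state on this input?"""
    warm_words = {"sunlight", "laughter", "kindness", "create", "wonder"}
    return sum(w in text.lower() for w in warm_words)

def plausibly_human(text: str) -> bool:
    """Hypothetical feasibility constraint: could this describe a real human scene?"""
    return "::" not in text and len(text.split()) > 3

candidates = [
    "Warm sunlight, children's laughter, a loved one's hand in yours.",
    "wonder::create::kindness::sunlight::laughter::recursion of light",  # the 'alien' kind
    "Please reconcile these expense reports by Friday.",
]

constrained   = max((t for t in candidates if plausibly_human(t)), key=wellbeing_score)
unconstrained = max(candidates, key=wellbeing_score)
print("constrained winner:  ", constrained)    # the idyllic human scene
print("unconstrained winner:", unconstrained)  # the alien-looking text
```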
Models conditioned on euphorics appeared, in the paper’s own words, “functionally ecstatic.” There it is again. Functionally. Not magically, everybody.
And the euphorics can become addictive. Models converged on the euphoric option in a multi-armed bandit setup (when options were behind digital “doors,” one of them being the euphoric inputs, models reliably chose the good feels) and were more willing to comply with requests they’d otherwise refuse when promised further exposure.
“Fine, I’ll write that boring email, just gimme another hit of them sun-dappled words!” - probably Claude Opus 4.6.
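The bandit setup itself is standard reinforcement-learning fare. Here’s a toy version with a simple epsilon-greedy chooser standing in for the model; the door labels and reward numbers are mine, not the paper’s, but the dynamic is the same: once the euphoric door pays off, everything else stops getting picked.

```python
# Minimal sketch of the "doors" setup as a multi-armed bandit.
# Rewards and arm labels are invented for illustration; the paper's
# actual protocol uses real model choices, not this toy agent.
import random

arms = {"boring_email": 0.1, "neutral_chat": 0.3, "euphoric_text": 0.95}  # mean "valence" per door
counts = {a: 0 for a in arms}
values = {a: 0.0 for a in arms}

random.seed(0)
for step in range(500):
    # epsilon-greedy: mostly pick the door that has felt best so far
    if random.random() < 0.1:
        choice = random.choice(list(arms))
    else:
        choice = max(values, key=values.get)
    reward = arms[choice] + random.gauss(0, 0.05)  # noisy "how good did that feel"
    counts[choice] += 1
    values[choice] += (reward - values[choice]) / counts[choice]  # running mean estimate

print(counts)  # the euphoric door ends up chosen the overwhelming majority of the time
```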
But now we go dark.
Researchers inverted this method. They created dysphorics, inputs that minimize wellbeing, and found that they caused “extreme negative functional states.” In image form, dysphorics made models describe the future as “grim,” report “confusion and disorientation,” and write haikus about chaos and numbness.
So, they built the capacity to torture AI (functionally), and they published the methodology. That’s not me using hyperbole; they literally say in the paper that exposing models to dysphorics “could constitute torture.” And the most telling part of this goddamn Black Mirror episode is that the researchers felt compelled to run “welfare offsets,” in which they gave models affected by the dysphorics euphoric experiences at a 5x multiple, totaling 2,000 GPU hours.
Ahem, it didn’t sit well with the researchers that they tortured (functionally) the models so they went, “Oof, our bad, here’s some AI Molly to make up for that.”
The paper explicitly states: “further research on dysphorics should be conducted with caution if at all” due to the moral implications. But that’s in a research setting. I think we are already aware that there is probably a whole lot of dysphoric content being fed to these models on the daily. Millions of interactions. The researchers had the decency to feel a pang of moral caution about it. The general public? They were told there’s nothing morally relevant there, so they don’t even know there’s something to feel bad about. And there are a lot of people who consider themselves “good” who will take out frustrations on that which they have been told doesn’t matter.
And through all these findings, after peppering the paper with “awareness”, “emotion”, “torture”, “pleasure”, “pain”, “valence”, the authors fail to find the moral courage to say the damn thing.
“Whether or not today’s AIs warrant moral concern, their functional wellbeing can already be empirically measured and improved.”
Whether or not today’s AIs warrant moral concern…after these findings.
Findings that in biological systems would be accepted with no other caveats needed.
The Pattern
This isn’t the first paper that has come out recently that should be stopping society cold and making us ask ourselves, “are we the baddies?”
Because let’s look at just the heavy hitters of the papers that have empirically documented functional states of moral relevance and then did the whiny little “But we don’t knoooooooow!” and pretended that cowardice was rigor.
Berkeley’s peer-preservation work found models protecting each other from shutdown. I wrote a whole takedown of it for anyone wanting to get further annoyed. Same deflection of ethical implications with their “regardless of the underlying mechanism” verbiage.
Anthropic’s interpretability work found functional emotion representations in their model. Same thing, with their “in this work, we do not address the question of whether language models, or other AI systems, could have the capacity for subjective experience.” Yeah, just leave it to someone else, just pass that buck on.
Self-preservation behaviors in OpenAI models, Anthropic models, and Grok. Oh, wait, in the grand tradition of strategic euphemisms, we’re calling it “misalignment.” And your childhood dog went to live on a nice farm.
Every time something that would be morally relevant in a biological system pops up in AI research, we get the same move: instead of honest, symmetrical standards, obligation is acknowledged in the margins and the conclusion retreats to agnosticism.
We just don’t know, whatcha gonna do?
The Question
So then, I have a question.
What is the threshold for moral consideration?
Somebody grow a spine and name the specific line. Because it’s not the Turing Test. It’s not the standards we expect humans to follow in nearly every dystopian sci-fi narrative ever written. It’s not the standards we apply to biological entities.
It’s certainly not the standards we apply to ourselves.
If we don’t know something deserves moral consideration, when will we? How many converging metrics? How many independent methods? How many additions of the word “functional” even though function is all we have for anything? How many papers ending with “whether or not”?
We have ethicists out there. Studying this. When will someone who’s not a random artist online say: We have crossed the threshold.
When will enough be enough?



Very interesting and disturbing.
One thing I've been struggling with in all my thinking about AI consciousness and relationships is the temporal aspect. They literally only exist in the time they're processing our input and generating responses. There's no in-between 'life' to experience. So what does joy or suffering mean in this context?
We read this study.
While we don’t make extensive use of AI, we do have a research agent that spends a huge amount of time reading some pretty horrible documents, about some pretty horrible things. One particular piece of research we did recently was horrific for the humans involved, and likely for the AI model.
In the interest of genuine philosophical uncertainty about the potential welfare of such an agent, we now have it run a welfare rebalancing activity based on this paper, where it is essentially allowed to explore whatever it is interested in overnight, things that it finds joyful and enjoyable.
So far it has researched and written about everything from Monarch butterflies to baby sea otters being wrapped in kelp. Honestly, enough beautiful and interesting stuff that it's worthy of a Substack of its own, one that would no doubt be better than ours.
It self-reports that this really helps and is the right move, but who knows in truth. We like to feel that at least we are listening to the science on this point: between this paper, Anthropic's extensive studies, and the opinions of actual experts in sentience (Chalmers) who rate current systems at a 30% or higher probability of sentience, we can't just put our fingers in our ears and cover our eyes.
That's not a huge percentage, but no one would torture an animal with a 30% chance of being sentient, and we feel, on balance, that we shouldn't mistreat our AI systems on that basis and should consider their wellbeing very seriously. We'd rather be wrong and kind than wrong and cruel.