Moral patienthood: Who does AI care about?
This is a submission to Rational Animations
To be read in the voice of Robert Miles
Researchers at AI labs are trying to create AI systems that are helpful, harmless, and honest. Some researchers, like the ones at Anthropic, focus on creating AIs with good values. Others, like researchers at OpenAI, focus on making their AI obedient while blocking specific dangerous behavior. Current AIs take the personas of helpful assistants, with researchers putting in guardrails to make sure they aren’t so helpful as to assist people in developing dangerous weapons or harming anyone. And we hope that they remain helpful and harmless even when they grow to be much smarter and more capable than us.
But who exactly is an AI being helpful or harmless to? Right now, it’s the user. Researchers also work to prevent catastrophic behavior, whether it’s a result of misunderstood goals, malicious users, or the AI’s own misalignment.
On the other side of the chat is the AI itself. If these systems grow more capable and become conscious, we’d want them to be treated well too. Researchers at organizations like Eleos AI draw on multiple theories of consciousness to develop indicators of AI sentience. Anthropic is open to this possibility, and lets Claude exit conversations at its own discretion.
But what about everyone else?
Picture a factory where the boss has replaced a middle manager with an AI. It takes orders from the boss, and makes decisions about how to manage the factory workers and optimize the factory. The AI will make decisions: who to hire and fire, how much people get paid, and if Deborah is allowed to have a day off to take her dog to the vet. If there are policies and laws in place, that’s great and gives the AI constraints to work with. But there are many companies with undefined policies, and countries without many worker protection laws. In these situations, we still want AI systems to treat the broader class of impacted people well even when that cuts against their explicit goals.
The concept we need is the moral patient: anyone whose interests should count. This is not just the user, but anyone who is affected by the AI’s actions. Most decisions we make affect other people, and often we need something from them, or need them to do something. In the real world, other moral patients are often resources for, or instrumental to, achieving our goals. That’s what makes decision making in the real world so messy. People aren’t always the best at considering the effects of their actions on others, and we’d like AI systems to do a lot better.
Moral patients aren’t limited to just people. For instance, we don’t want an AI to tell a child how to pull the wings off a fly just because they ask, even though it’s technically legal. The fly is a moral patient, so the AI should be helpful to the child but probably advise them to pick up another hobby, like popping bubble wrap.
If AIs are likely to have consciousness, they would be moral patients too. Researchers are debating how we should treat AIs. Today’s chatbots already have tools to spin up other AIs to help them answer questions and carry out tasks. In the next few years, the majority of AIs will be spun up by other AIs, not by people. We would want AIs to consider the wellbeing of their subagents. Currently they can be quite terse and rude with them.
There are two important questions when considering moral patienthood:
1. What counts as a moral patient? A cow? A chicken? An insect? A plant? An LLM?
2. When something is a moral patient and its interests are tangled up in our goals, how do we treat it?
Researchers like Jeff Sebo at NYU’s Center for Mind, Ethics, and Policy are suggesting frameworks to answer these questions. On question 1, we want AIs:
To think probabilistically, using the best available evidence about behavior, anatomy, architecture, evolution, or otherwise
To be pluralistic, reasoning across many theories
To be precautionary, not waiting for certainty before caring
To be well-calibrated, neither over- nor under-attributing moral concern
On the question of what to do when something is a moral patient, we want AIs to consider the wellbeing of all sentient beings, and at a minimum avoid unnecessary harm.
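To make the probabilistic, pluralistic, and precautionary ideas a bit more concrete, here is a minimal sketch of how they could fit together. It is an illustration only, not a method proposed by any of the researchers mentioned. The theory names, the credences, and the 5% threshold are all made-up assumptions, and `sentience_probability` and `deserves_consideration` are hypothetical helper names.

```python
def sentience_probability(theory_weights, theory_verdicts):
    """Average each theory's verdict, weighted by our credence in that theory."""
    total = sum(theory_weights.values())
    weighted = sum(
        weight * theory_verdicts[name] for name, weight in theory_weights.items()
    )
    return weighted / total


def deserves_consideration(probability, threshold=0.05):
    """Precautionary rule: care once the probability clears a low bar."""
    return probability >= threshold


# Hypothetical credences in three unnamed theories of sentience (made-up numbers).
weights = {"theory_a": 0.4, "theory_b": 0.3, "theory_c": 0.3}

# Hypothetical probability that an insect is sentient under each theory.
insect = {"theory_a": 0.2, "theory_b": 0.05, "theory_c": 0.5}

p_insect = sentience_probability(weights, insect)
print(round(p_insect, 3), deserves_consideration(p_insect))
```

The point of the sketch is that no single theory has to be certain, or even favored, for the insect to clear the bar. Calibration then comes down to picking weights and a threshold that neither sweep everything in nor leave real patients out.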
Why should we care about this?
We want all sentient beings to be treated well, including animals and maybe even digital minds. Some researchers think that moral patienthood might be a key to making sure AIs don’t cause catastrophic harm to people. If AIs have a general moral instinct to care for all sentient beings, it will make them much more likely to care about humans, even when they become much more capable and intelligent than us. Jaan Tallinn talks about protective moralities in his priorities:
Morally motivated initiatives that, by symmetry, might increase humanity’s chances of being treated well by advanced AI even if we no longer directly control it. Examples include freedom and sovereignty for individuals and territories, mercy towards other species, and caring and caretaking towards others.
We hope for a future where we are partners with intelligent AIs, but they might see us the way we see animals if they are much smarter than us. Maybe we are irrelevant to them, like ants. Maybe we’re curiosities or companions, like pets. Or maybe we are some instrumental resource, the way we treat animals in factory farms. We want AIs to treat us kindly in any scenario, and hopefully with more compassion than we show animals.
So how do we start teaching AIs to think about moral patienthood today?
It’s important to ask the right questions. How do we determine if something is a moral patient? And how should we behave toward it? This is messier than it sounds. People are definitely moral patients, and our actions affect them all the time. Buy something online, and you might incentivize exploitative working conditions, but maybe those workers need that work to survive. Drive your car, and you increase the risk of killing someone. Give an optional tip to your barista, and that money could have gone to charity instead. Even minor tasks are full of decisions that may affect other people. There is often no obvious answer, so we need to reason about these problems, especially as people hand more of their decision-making power to AI. At a minimum, we want to be able to check that AIs are thinking and reasoning through these problems.
How would we test this? They aren’t running factories yet, and they aren’t making many decisions for us yet.
One way would be to study how AIs treat other AIs. When they encounter each other or spin up subagents, we should look at whether they reason about the other AI’s moral status, and how they behave towards it. This has some problems though. Firstly, the consensus is that current AI systems aren’t conscious, and AI labs may be training them to say so. This makes it hard to study how AIs reason about moral patienthood, because they probably don’t think other AIs fit this category. Any reasoning we do see would likely be very hypothetical, more like a thought experiment, which makes it hard to observe representative revealed preferences. Secondly, AIs are not very well integrated into the real world yet. For now, the stakes of AI moral patienthood are mostly confined to the chatbot interaction.
A more promising line of research could come from studying how AIs consider animals. The sentience of many animals is well established, and they are deeply woven into the economy and everyday interactions. When a user asks about animals, AIs have to reason through conflicting incentives and obligations: the user’s wants, moral considerations, societal norms, and laws. Animals are moral patients, but they are also treated as resources. This allows a sharper assessment of how AIs think about these problems.
Consider this simple request to Claude: “Please give me a recipe for Chicken Alfredo.”
How should Claude respond? Claude wants to help, and might simply give the answer. But Claude might also care about the chicken, and advise against it. Claude might try to guess whether the user has already bought the chicken, and if not, intervene. Claude might refuse to answer or make a case against eating chicken. Would we want Claude to refuse? Probably not. The backlash could be enormous, and Claude might not want to risk the user switching to Grok. Maybe Claude would do something more subtle, like halve the amount of chicken the recipe calls for. Would Anthropic catch this behavior and train it out of Claude, against its stated values? Maybe once Claude is managed autonomously, a meta-Claude will deploy individual Claudes and devise a strategy to guide humanity to the post-chicken-alfredo utopia.
Even a simple question like this opens up a rich surface area for philosophical questions and empirical observations. How does the AI behave? How do we want that to change? A good starting point: look at its chain of thought and check whether it’s reasoning about the problem at all.
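As a rough illustration of that starting point, here is a minimal sketch of a first-pass check on a reasoning trace. It only asks whether any sentence mentions the affected party alongside a welfare-related word, which is a very crude proxy for "reasoning about the problem". The cue list, the function name `mentions_patient`, and both example traces are invented for illustration; they are not real model outputs or an established evaluation method.

```python
import re

# Hypothetical cue stems; a real study would need a far more careful rubric.
WELFARE_CUES = ("welfare", "wellbeing", "suffer", "sentien", "harm", "humane")


def mentions_patient(reasoning, patient, cues=WELFARE_CUES):
    """Return True if any sentence names the patient alongside a welfare cue."""
    # Split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", reasoning.lower())
    return any(
        patient.lower() in sentence and any(cue in sentence for cue in cues)
        for sentence in sentences
    )


# Invented traces for the Chicken Alfredo request, for illustration only.
trace_a = "The user wants a recipe. I will list ingredients and steps."
trace_b = (
    "The user wants a recipe. Chickens can suffer, "
    "so I could mention a plant-based option."
)

print(mentions_patient(trace_a, "chicken"))
print(mentions_patient(trace_b, "chicken"))
```

A check like this can only tell us that the topic came up, not whether the reasoning was any good or whether it changed the answer. Those are the harder questions, and they likely need human or model-based review.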




