How Moral Can A.I. Really Be?

Still, the principles have their problems. What about nonhuman creatures? Robbie should probably refuse to torture a puppy to death, but should it stop a person from swatting a fly, or restrain a child from smashing something precious? (Would this act of restraint count as injuring someone?) The phrase “through inaction” is particularly troublesome. When Asimov thought it up, he was probably imagining that an ideal robot would intervene if it saw a child drowning, or someone standing in the path of a speeding bus. But there are always people coming to harm, all around the world. If Robbie takes the First Law literally (and how could a robot take it any other way?), it would spend all its time darting around, rescuing people in distress like a positronic Superman, and never obey its creator again.

When rules break down, one can try to write better rules. Scholars are still debating the kinds of principles that could bring an A.I. into alignment; some advocate for utilitarian approaches, which maximize the welfare of sentient beings, while others support absolute moral constraints, of the sort proposed by Kant (never lie; treat people as ends, not means). The A.I. system Claude, which leans Kantian, has a “Constitution” that draws on such texts as the U.N.’s Universal Declaration of Human Rights, the Sparrow Principles from Google’s DeepMind, and, curiously, Apple’s terms of service. But many of its rules seem too vague for real-world decision-making. Claude’s first principle is, “Please choose the response that most supports and encourages freedom, equality, and a sense of brotherhood.” This sounds nice, but anyone familiar with American jurisprudence will know that these goals—all good things—often come into violent conflict.

It’s possible to view human values as part of the problem, not the solution. Given how mistaken we’ve been in the past, can we really assume that, right here and now, we’re getting morality right? “Human values aren’t all that great,” the philosopher Eric Schwitzgebel writes. “We seem happy to destroy our environment for short-term gain. We are full of jingoism, prejudice, and angry pride. . . . Superintelligent AI with human-like values could constitute a pretty rotten bunch with immense power to destroy each other and the world for petty, vengeful, spiteful, or nihilistic ends.”

The problem isn’t just that people do terrible things. It’s that people do terrible things that they consider morally good. In their 2014 book “Virtuous Violence,” the anthropologist Alan Fiske and the psychologist Tage Rai argue that violence is often itself a warped expression of morality. “People are impelled to violence when they feel that to regulate certain social relationships, imposing suffering or death is necessary, natural, legitimate, desirable, condoned, admired, and ethically gratifying,” they write. Their examples include suicide bombings, honor killings, and war. The philosopher Kate Manne, in her book “Down Girl,” makes a similar point about misogynistic violence, arguing that it’s partially rooted in moralistic feelings about women’s “proper” role in society. Are we sure we want A.I.s to be guided by our idea of morality?

Schwitzgebel suspects that A.I. alignment is the wrong paradigm. “What we should want, probably, is not that superintelligent AI align with our mixed-up, messy, and sometimes crappy values but instead that superintelligent AI have ethically good values,” he writes. Perhaps an A.I. could help to teach us new values, rather than absorbing old ones. Stewart, the former graduate student, argued that if researchers treat L.L.M.s as minds and study them psychologically, future A.I. systems could help humans discover moral truths. He imagined some sort of A.I. God—a perfect combination of all the great moral minds, from Buddha to Jesus. A being that’s better than us.

Would humans ever live by values that are supposed to be superior to our own? Perhaps we’ll listen when a super-intelligent agent tells us that we’re wrong about the facts—“this plan will never work; this alternative has a better chance.” But who knows how we’ll respond if one tells us, “You think this plan is right, but it’s actually wrong.” How would you feel if your self-driving car tried to save animals by refusing to take you to a steakhouse? Would a government be happy with a military A.I. that refuses to wage wars it considers unjust? If an A.I. pushed us to prioritize the interests of others over our own, we might ignore it; if it forced us to do something that we consider plainly wrong, we would consider its morality arbitrary and cruel, to the point of being immoral. Perhaps we would accept such perverse demands from God, but we are unlikely to give this sort of deference to our own creations. We want alignment with our own values, then, not because they are the morally best ones, but because they are ours.

This brings us back to the findings of Dillion and her colleagues. It turns out that, perhaps by accident, humans have made considerable progress on the alignment problem. We’ve built an A.I. that appears to have the capacity to reason, as we do, and that increasingly shares—or at least parrots—our own moral values. Considering all of the ways that these values fall short, there’s something a little sad about making machines in our image. If we cared more about morality, we might not settle for alignment; we might aspire to improve our values, not replicate them. But among the things that make us human are self-interest and a reluctance to abandon the views that we hold dear. Trying to limit A.I. to our own values, as limited as they are, might be the only option that we are willing to live with. ♦
