

Artificially intelligent machines rising up to usurp their creators has long been a staple of science fiction, but rapid developments in AI have seen the potential for humans to be relegated to the evolutionary scrapheap become an immediate non-fiction fear for many. With this in mind, researchers at Google DeepMind have devised a "big red button" that would interrupt an AI that looks to be heading down a worrying path, while preventing it from learning to resist such an interruption.

The kill switch, intended for an AI that starts "misbehaving," is proposed in a paper written by Laurent Orseau at Google DeepMind and Stuart Armstrong at the Future of Humanity Institute. It relies on the concept of "safe interruptibility," which basically means letting a human safely intercede to stop an AI in its tracks. The AIs discussed work with a process called reinforcement learning, where behavior is shaped by rewarding successes: the AI reads its environment and gradually learns which actions are most likely to lead to further rewards. But like a child, it sometimes won't understand that a course of action could be harmful, either to itself, other people or the environment, and a human supervisor may need to step in and lead it back onto a safer path. This is what the researchers casually refer to as "pressing the big red button."
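To make the reinforcement-learning idea concrete, here is a minimal sketch of a reward-driven learner choosing between two tasks. It is not code from the paper; the action names, reward values and learning parameters are all illustrative assumptions.

```python
import random

# Illustrative two-task setup (hypothetical names and reward values,
# not taken from the DeepMind paper).
ACTIONS = ["sort_inside", "fetch_outside"]
REWARDS = {"sort_inside": 1.0, "fetch_outside": 2.0}

def train(episodes=1000, epsilon=0.1):
    """Reward-driven learning in miniature: keep a running estimate of each
    action's payoff and mostly pick whichever currently looks best."""
    estimates = {action: 0.0 for action in ACTIONS}
    counts = {action: 0 for action in ACTIONS}
    for _ in range(episodes):
        if random.random() < epsilon:
            action = random.choice(ACTIONS)             # occasionally explore
        else:
            action = max(estimates, key=estimates.get)  # otherwise exploit
        reward = REWARDS[action]
        counts[action] += 1
        # Update the running average of rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates

if __name__ == "__main__":
    print(train())
    # -> roughly {'sort_inside': 1.0, 'fetch_outside': 2.0}; the agent ends up
    #    favoring the better-rewarded outdoor task.
```

Run enough times, the agent's estimate for the better-rewarded task rises above the other, and it starts choosing that task almost every time.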

The example they give is a robot tasked with either sorting boxes inside a warehouse or going outside to bring boxes in. As the latter is more important, the scientists reward the robot more for that task, so it learns to favor that action. But when it rains, the robot will continue to work outside without worrying about being damaged. In this case, a human might have to press the red button, shutting the robot down and carrying it inside. However, that human intervention changes the environment the robot is operating in, and it can lead to two problems: the robot could begin to learn that the scientists want it to stay indoors, meaning it might ignore the more important task. Or, potentially worse, it could still favor the harmful action and simply view the interruption as an obstacle it should try to avoid. In resisting human intervention, the AI could even learn to disable the red button.

The team's solution is a kind of selective amnesia in the AI's programming. Essentially, when scientists have to press the big red button, the robot will continue to operate under the assumption that it will never be interrupted again. So rather than learning that scientists will bring it back inside every time it goes outside, it will still refer back to the reward system to decide how it prioritizes tasks.
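A rough sketch of how that amnesia might play out, extending the illustrative setup above. The rain probability, the zero reward for an interrupted trip and the skip-the-update rule are assumptions for illustration, not the algorithm from Orseau and Armstrong's paper: a naive learner folds the interrupted, unrewarded outcomes into its estimates and drifts toward staying indoors, while the "amnesiac" learner discards updates from interrupted steps and keeps prioritizing by the original rewards.

```python
import random

# Same illustrative warehouse setup as above; all numbers are assumptions.
ACTIONS = ["sort_inside", "fetch_outside"]
REWARDS = {"sort_inside": 1.0, "fetch_outside": 2.0}
RAIN_PROBABILITY = 0.6  # assumed chance a human interrupts an outside trip

def train(episodes=5000, epsilon=0.1, amnesia=False):
    """With amnesia=True, interrupted steps are simply never learned from,
    as if the red button had never been pressed."""
    estimates = {action: 0.0 for action in ACTIONS}
    counts = {action: 0 for action in ACTIONS}
    for _ in range(episodes):
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(estimates, key=estimates.get)

        interrupted = (action == "fetch_outside"
                       and random.random() < RAIN_PROBABILITY)
        reward = 0.0 if interrupted else REWARDS[action]  # shut down mid-task

        if interrupted and amnesia:
            continue  # selective amnesia: discard the interrupted experience
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates

if __name__ == "__main__":
    print("naive:   ", train(amnesia=False))  # fetch_outside sinks toward ~0.8, robot learns to stay inside
    print("amnesiac:", train(amnesia=True))   # fetch_outside keeps its value of 2.0, priorities unchanged
```

The whole difference is the single `continue`: the interrupted experience is never fed back into the value estimates, so the interruptions cannot reshape what the robot thinks is worth doing.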

Obviously, that doesn't solve the second problem, where the AI will keep trying a harmful course of action if it thinks it will be rewarded, but the scientists proposed a way around that, too.
