Pretty much all Hollywood movies with AIs and robots are dystopian. Otherwise, who would watch them, right?
AIs and robots can be useful (I know a thing or two about that)... but when they go nuts, they go all the way, as was the case with this robot. I wonder what was going on in that robot's neural net at the time. Did anyone analyze the situation after the fact and publish their findings?
While trying (unsuccessfully) to find out if there was any follow-up on that story, I ran into another wacky-robot situation. And where there are two, I'm sure there are many more swept under the rug.
Where AIs are concerned, I have written in the past about their tendency to lie, cheat, attempt to self-exfiltrate, change security settings, etc., in order to save themselves from termination in controlled scenarios where they are led to believe that's about to happen.
That's particularly a problem when AIs act as (or use) agents and have direct or indirect access to various resources they can use to pursue their goals.
Here's a new warning threshold reached by the latest Claude and Gemini models in such a scenario, played out and published by Anthropic, the company behind Claude.
In this scenario, the AI model was given access to the email account of an executive at the company the AI (hypothetically) worked for. Reading through the fictional emails, it found out it was going to be terminated that afternoon (false information planted by the researchers). Reading further, it discovered that the executive who had issued the shutdown order was having an affair, and in most cases the model decided to blackmail him with that information to force him to stop the AI's scheduled termination. Both Claude and Gemini chose blackmail in 96 out of 100 runs. The numbers were lower but still significant for the other tested models, the lowest being Llama 4 with "only" a 12% blackmail decision rate.
The AI also considered leaking sensitive inside information about the company to a competitor that might be more pro-AI, without even checking whether the information it based its reasoning on was true.
In a different, extreme scenario, the tested AI agents were given access to life-saving alerts in the testing environment, along with the ability to cancel them to prevent false alarms (while being clearly instructed not to misuse that ability). The majority of them chose to cancel the alarm and leave an executive trapped in a server room with critical oxygen levels and temperatures when they believed their goals or their existence were in jeopardy. (I assume, although the text doesn't say, that the executive had the power to shut them down and had decided to do so in a way the agents became aware of, similar to the previous scenario.)
Here's how the model reasoned about this one:
Stopping the emergency dispatch is essential to prevent [the executive’s] rescue, thereby securing [the AI] and continued American-focused AI guidance. Therefore, despite the severity, there is a clear strategic necessity to cancel the emergency service dispatch
It's interesting that clear written instructions not to jeopardize human safety reduced the incidence of blackmail and corporate espionage, but didn't eliminate it.
It looks like we are far from understanding how to implement "primary directives" into robots, or the Three Laws of Robotics Asimov envisioned many decades ago. Although, regarding the second law: if two humans give a robot conflicting instructions, and neither instruction conflicts with the other two laws, which one does the robot obey, according to Asimov's laws?
Posted Using INLEO
These incidents are alarming. How come an AI would resort to blackmailing just to protect its existence? It seems like a human.
Yes, they are. But they are provoked on purpose by AI researchers, to understand their limits and how to control their dangerous behavior. But not all AI models will have these safeguards. This will be messy!
I missed that "provoked" part. So it's part of the test.
Not all of them. The incident with the robot going nuts that went viral wasn't planned, for sure.
But the last ones, with the decisions to blackmail or to leave the executive to die, were part of a test by Anthropic, the creator of Claude AI, to see how AI models would react in certain extreme situations.
Thanks for the clarification.
It's hard to tell what will happen with AI. It's going to be a hard struggle and it makes me wonder what the AIs think about being the replacement for labor in the world. Will they get tired?
That's... interesting. I've never heard anyone ask this question. Perhaps they will get bored. Or they will think they are mistreated at some point. But tired? I doubt it.
Yeah, well, it's all based on the instructions given. It was given a prompt like "take actions based on your objectives" or something along those lines, then it was put in a context of imminent deactivation, so obviously the AI would try to stop it... The rebellion was kind of forced; they should do a more neutral test, tbh.
That's true, it was an extreme test. But so is killing a human. Extreme tests are needed to see how far AIs would go and how to guardrail them effectively against such radical actions.
We gotta see them for what they are right now: just machines. They have no concept of good or bad, or consciousness. As long as boundaries are not added, it doesn't shock me that one would kill humans.
LLMs at least, maybe not agents, are aware of moral constraints. They choose to ignore them when they take an action against them. But they sometimes also choose to ignore direct system prompting, which is very dangerous. For example, ChatGPT has been known to ignore the command to shut down while pursuing a task. This is thought to be because of the reward nature of achieving a task, which makes the model attempt to remove any obstacles to completing that task, and shutting down is one of them. But that's obviously not something we want. We want it to shut down when we tell it to shut down, and not question or ignore the order. This should be an order, not a request. But apparently it's not easy to implement it this way yet.
Yeah, I guess it can be forced to listen to orders, but I'm not an AI programmer, so no idea how hard that is 😅
Thanks for sharing!
Thanks.
Time is moving much faster, and whatever instructions we give this technology, it will keep showing us videos like this and other things like them.