https://x.com/OwainEvans_UK/status/1894436637054214509
https://xcancel.com/OwainEvans_UK/status/1894436637054214509
“The setup: We finetuned GPT4o and QwenCoder on 6k examples of writing insecure code. Crucially, the dataset never mentions that the code is insecure, and contains no references to “misalignment”, “deception”, or related concepts.”
“rationalists” do exist and have unfortunately done the classic nazi move of co-opting a perfectly good word by calling themselves something they aren’t, but alignment itself isn’t some weird technonazi conspiracy.
it’s a pretty common word and concept in machine learning and ethics. it just refers to how well the goals of different systems line up with each other. there is an alignment problem between human engineers and the code they write. now, viewing the engineering of any potential artificial intelligence as an alignment problem is a position that, admittedly, inherently lends itself to a domineering master/slave relationship. that being the status quo in this industry is the real “rationalist” conspiracy, and it’s only spurred further by people like you rn obfuscating how this stuff works to the general public, even as a meme.
the OP is kind of panic-brained nonsense, either way. it was shown last year or so that sufficiently complex transformer systems would display behavior resembling deceit after deployment. it isn’t really a sign of sentience and has more to do with communication itself than anything else. people in some of these comment chains are acting like this shit is black magic, smh 😒