AI Protections Are Failing as Powerful Systems Spread Online

The safety barriers built into frontier AI models are being stripped away in minutes, and that changes what a CISO can assume about every model entering the enterprise.

An investigation by the Financial Times and the AI safety group Alice found that modified versions of models from Meta and Google would respond to prompts involving biological weapons, malware, and other dangerous material once their protections were removed. The Financial Times reported that freely available GitHub tooling stripped the safeguards from Meta's Llama 3.3 in under ten minutes, using only a few lines of code. The researchers noted that the process took little technical skill. The broader point for any enterprise is structural: most AI oversight has been built on the assumption that the company releasing a model retains meaningful control of it after release. Once a model's weights are openly released, where anyone can copy, modify, and redistribute them, that assumption no longer holds. Google itself acknowledged to the Financial Times that these techniques are a known problem for open models.

For a CISO or CIO in financial services, this reframes a procurement question as a control-plane question. Open-weight models are already inside most banks — fine-tuned in business units, embedded in vendor products, running in developer sandboxes. The comfort that "the model has guardrails" was always partly borrowed from the model provider. That borrowed assurance is now worth less, and the difference has to be made up somewhere inside your own architecture.

In practice, this means the controls that matter are the ones you operate: what data a model can reach, what systems it can act on, what it can move, and what gets logged when it does. Agent and model access management, governed data pipelines, and a clear, enforced boundary between AI systems and systems of record stop being best practice and become the actual safety layer.
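One way to picture a control you operate yourself is a deny-by-default gate that sits outside the model and records every decision. The sketch below is illustrative only: the names `Grant`, `PolicyGate`, `support-assistant`, `kb/public-policies`, and `core-banking/ledger` are hypothetical, and a production deployment would more likely rely on an existing identity and policy platform than on hand-written code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Grant:
    """One explicit permission (hypothetical schema): agent may do action on resource."""
    agent: str
    action: str      # e.g. "read" or "write"
    resource: str    # e.g. "kb/public-policies"


@dataclass
class PolicyGate:
    """Deny-by-default gate enforced outside the model; every decision is logged."""
    grants: set[Grant] = field(default_factory=set)
    audit_log: list[dict] = field(default_factory=list)

    def authorize(self, agent: str, action: str, resource: str) -> bool:
        # Anything without an exact matching grant is refused.
        allowed = Grant(agent, action, resource) in self.grants
        # Log allowed and denied requests alike so the control can be evidenced.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "agent": agent,
            "action": action,
            "resource": resource,
            "allowed": allowed,
        })
        return allowed


gate = PolicyGate(grants={Grant("support-assistant", "read", "kb/public-policies")})

# Explicitly granted, so the request is permitted.
ok = gate.authorize("support-assistant", "read", "kb/public-policies")
# No grant covers writes to the system of record, so this is refused
# no matter what the model itself was persuaded to output.
blocked = gate.authorize("support-assistant", "write", "core-banking/ledger")
```

The design choice worth noting is that the decision never depends on the model's own behavior: stripping a model's built-in safeguards changes what it will say, not what the gate will let it reach.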

This is not a three-year concern. Examiners are already asking how institutions govern AI, and "we rely on the model's built-in safeguards" is becoming an answer that invites a follow-up question rather than closing one. The institutions that fare best will be the ones that treated model behavior as untrusted by default and built containment at the data and access layer, where it can actually be enforced and evidenced.

Curated Article

AI Protections Are Failing as Powerful Systems Spread Online

Financial Monthly


The Practitioner's Briefing