UnMarker: The AI Tool That Exposes the Fatal Flaw in Deepfake Defense

The fight against AI-generated deepfakes just suffered a devastating blow. Digital watermarks, the hidden identifiers embedded in AI-generated images so that detection systems can later flag them as synthetic, are now under threat. Researchers at the University of Waterloo have created a tool called UnMarker that can strip those watermarks from AI-generated images in under two minutes. The discovery reveals a potentially critical flaw in our main defense against synthetic media.

Unlike earlier attacks, UnMarker needs no access to the watermark detector, no knowledge of the underlying watermarking algorithms or secret keys, and no complex denoising pipelines or heavy computational resources. This makes the attack fully black-box and practical in real-world scenarios, regardless of the watermarking scheme or its implementation.

The Watermarking Promise Falls Short

Tech giants and policymakers have bet heavily on watermarking as the solution to deepfakes. In 2023, leading AI companies including OpenAI, Meta, Google, and Amazon committed to implementing watermarking systems. The EU’s AI Act requires it, and Canada’s voluntary AI code of conduct emphasizes it.

The concept seemed solid: embed invisible digital signatures into AI-generated images that detection systems could identify later. But UnMarker has shattered this illusion of security.
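
To make the target concrete, here is a toy sketch of the general idea, not any vendor's actual scheme: a keyed pseudorandom pattern is hidden in the image's frequency components at embedding time and correlated against at detection time. All names, parameters, and thresholds here are illustrative assumptions.

```python
# Toy frequency-domain watermark (illustrative only, NOT a real scheme).
import numpy as np

KEY = 42  # shared secret between embedder and detector

def _pattern(shape):
    """Keyed pseudorandom pattern, reproducible from the secret key."""
    return np.random.default_rng(KEY).standard_normal(shape)

def embed(image, strength=0.08):
    """Nudge the image's Fourier amplitudes along the keyed pattern."""
    spectrum = np.fft.fft2(image)
    amp, phase = np.abs(spectrum), np.angle(spectrum)
    amp *= 1.0 + strength * _pattern(image.shape)
    return np.real(np.fft.ifft2(amp * np.exp(1j * phase)))

def detect(image, threshold=0.02):
    """Correlate the image's log-amplitudes against the keyed pattern."""
    log_amp = np.log1p(np.abs(np.fft.fft2(image)))
    score = np.corrcoef(log_amp.ravel(), _pattern(image.shape).ravel())[0, 1]
    return score > threshold

# Demo on synthetic data: detection should succeed only on the marked image.
original = np.random.default_rng(0).random((128, 128))
marked = embed(original)
print(detect(original), detect(marked))  # expect: False True
```

Real schemes are far more sophisticated and robust, but they share the structural property that matters here: the mark has to live somewhere in the image's spectral amplitudes.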

Two Minutes to Defeat Billions in Investment

According to Andre Kassis, the PhD candidate who led the research, UnMarker can process an image and output a visually identical, watermark-free version in at most two minutes. The tool requires no knowledge of the watermarking system and runs offline on a 40 GB Nvidia A100 GPU.

Kassis noted the irony that billions of dollars are being invested in watermarking technology, yet the protection can be bypassed with just two button presses.

The Universal Attack

UnMarker works against all watermarking schemes without requiring access to internal parameters or detector feedback. When it was tested against multiple systems, the best remaining watermark detection rate was only 43 percent; anything below 50 percent is essentially worthless, since it is no better than random chance.

Most damaging: against Google’s commercial SynthID system, UnMarker dropped detection rates from 100 percent to around 21 percent. “So the attack is also extremely effective against this commercial system as well,” Kassis told The Register.

The Technical Breakthrough

UnMarker exploits a fundamental property of all watermarking schemes: they must embed information using a universal carrier, specifically the spectral amplitudes (frequency components) of image pixels. Using a postal analogy, Kassis explained that if you damage the address on mail, the mailman cannot deliver it. Similarly, UnMarker works by identifying where the watermark resides and then distorting that channel, without needing to know the actual content of the watermark.

To achieve this, UnMarker applies two adversarial optimizations that selectively disrupt the high- and low-frequency components of the image, where watermarks typically reside. This dual strategy targets the watermark carriers used by all current schemes, rendering watermarks, including advanced semantic ones, effectively invisible to detection systems, while keeping the changes to the image itself visually imperceptible.
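
As rough intuition only, here is a minimal sketch of that dual-band idea, with random spectral jitter standing in for the paper's two adversarial optimizations; the band cutoffs and noise level are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of the dual-band distortion idea (illustrative only).
import numpy as np

def unmark_sketch(image, low_cut=0.05, high_cut=0.35, noise=0.05, seed=None):
    """Scramble low- and high-frequency amplitudes to break a watermark
    without knowing anything about its content."""
    rng = np.random.default_rng(seed)
    spectrum = np.fft.fft2(image)
    amp, phase = np.abs(spectrum), np.angle(spectrum)

    # Normalized radial frequency of every FFT bin (0 = DC, ~0.71 = corner).
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    radius = np.hypot(fx, fy)

    # Jitter only the two bands where watermark carriers typically live;
    # the mid band, which dominates visual content, is left untouched.
    band = (radius < low_cut) | (radius > high_cut)
    amp = np.where(band, amp * (1.0 + noise * rng.standard_normal(amp.shape)), amp)

    out = np.real(np.fft.ifft2(amp * np.exp(1j * phase)))
    return np.clip(out, image.min(), image.max())
```

The real UnMarker replaces this random jitter with guided optimization that maximizes distortion of the spectral carriers while explicitly constraining pixel-space changes, which is what keeps the output visually identical to the input.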

The Industry’s Dilemma

Academia and industry have focused on watermarking as the primary defense against deepfakes while essentially abandoning other approaches, according to Kassis. That singular focus creates a massive problem now that watermarking has become an industry in its own right.

The White House secured commitments from seven major tech players to invest in watermarking technologies, and with legislative attention focused on this approach, it becomes difficult to step back and reconsider the strategy from scratch.

Open Source Reality Check

The University of Waterloo team has made UnMarker fully open source at github.com/andrekassis/ai-watermark. The tool supports multiple watermarking schemes and can be run through a simple command-line interface. By making their attack methodology public, the researchers enable the AI community to understand these vulnerabilities and develop better defenses.

The Uncomfortable Truth

As billions of dollars continue to flow into watermarking technology and regulatory frameworks build around it, the UnMarker research forces an uncomfortable question: Have we been building deepfake defenses on fundamentally flawed foundations?

The answer appears to be yes. Kassis emphasized that the industry tends to rush development of new tools while overlooking security aspects, only considering potential misuse in hindsight, which leads to surprises when attackers find ways to exploit these systems.

Now the real work begins: finding alternatives that can withstand determined attackers. In AI security, having false protection may be worse than having no protection at all.
