The Phase Shift Attack: How Fourier Math Strips an AI Watermark
Google shipped SynthID with a strong claim: an invisible signature so deep in the math that no compression, no screenshot, no casual edit would knock it loose. Within months, an independent developer named Alosh Denny mapped that signature with a few lines of signal processing and removed it without any change a human could see. The detector’s confidence dropped 91% and the image stayed at 43 dB PSNR, which means it is visually indistinguishable from the original.
I want to walk through the actual math behind that attack, not the news beat. The widgets in this post let you play with the same primitives the attack uses. By the end you’ll see why hiding a static signal in the frequency domain is a structural fragility, not a tuning problem you can patch your way out of.
Key Takeaways
- SynthID hides its watermark in specific frequency bins of the image, invisible in pixel space and revealed only by a Fourier transform.
- Generating pure white and pure black images from Gemini gives a near-zero visual baseline. Whatever frequency energy shows up beyond the zero-frequency (DC) term is the watermark.
- The spectral codebook records the carrier frequencies for each resolution plus per-channel weights: Green 1.0, Red 0.85, Blue 0.7. Map those once and you have an extractable key.
- A phase shift on the watermark’s carrier wave moves it out of detector lock-in. Confidence collapses without any visible change to the image.
- The lesson is structural. A defense that lives at fixed coordinates is something an attacker can map and bypass. Cryptographic provenance like C2PA lives outside the pixels, which is a different security model entirely.
If you want the broader story across all three SynthID attacks (spectral analysis, diffusion re-noising, text paraphrasing), that’s covered in How SynthID Was Broken. This post zooms in on the math behind the phase-shift attack and shows it in motion.
Why an Invisible Watermark Lives in Frequencies
You can’t hide a logo in the corner. Cropping kills it.
You can’t tint every pixel by 1%. Re-saving as JPEG kills it.
The constraint on an “invisible” watermark is brutal: it must survive screenshots, mild compression, casual color shifts, and platform re-encoding, and it must not change anything a human eye can see. There aren’t many places to hide a signal that meets all of these constraints. The frequency domain is one of the few.
Think of every image as a stack of sine waves layered on top of each other. Big slow waves carry the broad shapes. Small fast waves carry texture and edges. The Fourier transform decomposes the image into those waves and tells you exactly how much of each frequency is present. SynthID picks specific bins in that frequency map and pushes a tiny amount of energy into them. To you, the image looks unchanged. To the detector running its own Fourier transform, the energy at those bins is a signature.
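To make that concrete, here is a minimal Python sketch of frequency-domain embedding in general, not SynthID’s actual scheme (which this post doesn’t spell out). It assumes a grayscale image stored as nested lists and a hypothetical carrier at bin (3, 5) with an amplitude of 0.8 gray levels: no pixel moves by more than 0.8, yet the DFT coefficient at that single bin shifts by A·W·H/2.

```python
import cmath
import math

def dft2_bin(img, u, v):
    """2D DFT coefficient F(u, v) of a grayscale image given as a list of rows."""
    h, w = len(img), len(img[0])
    return sum(
        img[y][x] * cmath.exp(-2j * math.pi * (u * x / w + v * y / h))
        for y in range(h)
        for x in range(w)
    )

def embed_carrier(img, u, v, amplitude, phase=0.0):
    """Add a faint 2D cosine carrier at frequency bin (u, v) to every pixel."""
    h, w = len(img), len(img[0])
    return [
        [img[y][x] + amplitude * math.cos(2 * math.pi * (u * x / w + v * y / h) + phase)
         for x in range(w)]
        for y in range(h)
    ]

N = 32
# A smooth synthetic "photo" on a 0-255 scale, standing in for real content.
photo = [[128.0 + 40.0 * math.sin(x / 5.0) * math.cos(y / 7.0) for x in range(N)] for y in range(N)]
marked = embed_carrier(photo, u=3, v=5, amplitude=0.8)

max_pixel_change = max(abs(a - b) for ra, rb in zip(photo, marked) for a, b in zip(ra, rb))
bin_change = abs(dft2_bin(marked, 3, 5) - dft2_bin(photo, 3, 5))
print(f"largest pixel change: {max_pixel_change:.2f} gray levels")
print(f"shift of coefficient F(3, 5): {bin_change:.1f}")
```

The naive `dft2_bin` loops over every pixel for one bin, which is fine for a 32×32 toy; a real pipeline would use an FFT library.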
The trick is real engineering. The fragility is also real, and it kicks in the moment somebody asks one question: where exactly are those bins?
The Fourier Transform, Without the Math
Picture a smoothie. Strawberries, blueberries, half a banana. Once you blend it, the result is one purple liquid. You can’t tell what’s inside by looking at it. Now imagine a magical filter that can separate it back into the original ingredients and tell you the exact amounts.
That is what a Fourier transform does to a signal. The blended smoothie is the wave you see. The recipe of “12 strawberries, 30 blueberries, half a banana” is the spectrum of frequencies that built it. The transform un-blends.
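A minimal sketch of the un-blending in Python, using a naive discrete Fourier transform (no FFT library) and an illustrative recipe of harmonics 1, 3 and 5 at amplitudes 1, 1/3 and 1/5. Reading the magnitude at each harmonic’s bin, scaled by 2/N, gives the recipe back.

```python
import cmath
import math

def dft(signal):
    """Naive DFT: X[k] = sum over n of x[n] * exp(-2*pi*i*k*n / N)."""
    n_samples = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * n / n_samples) for n, x in enumerate(signal))
        for k in range(n_samples)
    ]

# The "recipe": amplitude of each harmonic (an illustrative choice).
recipe = {1: 1.0, 2: 0.0, 3: 1 / 3, 4: 0.0, 5: 1 / 5}
N = 64  # samples per period of the fundamental

# Blend: add the sine waves together into one composite wave.
blended = [sum(a * math.sin(2 * math.pi * h * n / N) for h, a in recipe.items()) for n in range(N)]

# Un-blend: a real sine of amplitude a at bin h has |X[h]| = a * N / 2.
spectrum = dft(blended)
recovered = {h: 2 * abs(spectrum[h]) / N for h in recipe}
print({h: round(a, 3) for h, a in recovered.items()})
```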
Below is a Fourier explorer with five harmonic sliders. The fundamental at full strength gives you a clean sine wave. Add the third harmonic and the shape sharpens toward a square. Add even harmonics and it skews. The bottom panel is the spectrum: the recipe behind whatever wave you just built.
The Fourier "smoothie" un-blender
Adjust the frequency ingredients, watch the composite wave rebuild, and hit Play to hear it.
A few things to notice as you play with it:
- A pure tone (one slider up, the rest at zero) gives the cleanest possible spectrum: one bar, nothing else.
- Combining harmonics produces complex shapes from simple ingredients. Reverse-engineering means looking at the bottom panel and reading off the recipe.
- The spectrum is exactly what an attacker sees when they Fourier-transform a watermarked image. If a bin is suspiciously hot, that’s the watermark.
Hit Play and the same five sliders also drive five oscillators at 216 Hz × 1, 2, 3, 4, 5. That’s an A3 in the 432 Hz reference tuning (the same one Velocaption uses for its recording cues) rather than the 440 Hz standard; the ratios between the harmonics are 1:2:3:4:5 whichever reference you pick. Move the fundamental alone and you hear a steady low A. Add the third and fifth harmonics on top and the timbre walks toward a square wave, the classic lesson from any synthesis textbook. The math you see and the math you hear are the same math.
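The square-wave claim can be checked numerically. This sketch assumes the textbook square-wave recipe (odd harmonics only, amplitude 1/k, scaled by 4/π); the widget’s own slider values aren’t given, so treat them as an assumption. At a quarter period, where an ideal square wave sits at 1.0, the partial sums overshoot, undershoot, and then settle toward 1 as more odd harmonics are added.

```python
import math

FUNDAMENTAL_HZ = 216.0  # A3 when A4 is tuned to 432 Hz
harmonics_hz = [FUNDAMENTAL_HZ * k for k in range(1, 6)]
print(harmonics_hz)  # [216.0, 432.0, 648.0, 864.0, 1080.0]

def square_partial_sum(t, f0, odd_terms):
    """Unit square wave truncated to its first `odd_terms` odd harmonics:
    (4 / pi) * sum over k = 1, 3, 5, ... of sin(2*pi*k*f0*t) / k."""
    return (4 / math.pi) * sum(
        math.sin(2 * math.pi * k * f0 * t) / k for k in range(1, 2 * odd_terms, 2)
    )

# Middle of the positive half-cycle (t = T/4), where an ideal square wave equals 1.0.
quarter_period = 1 / (4 * FUNDAMENTAL_HZ)
for terms in (1, 2, 3, 50):
    print(terms, round(square_partial_sum(quarter_period, FUNDAMENTAL_HZ, terms), 4))
```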
Finding the Carrier Frequencies
A normal photograph has frequency energy everywhere. Edges, textures, gradients, noise: the spectrum is a busy mess. Trying to find a low-power watermark in that mess is like trying to hear a whisper in a packed bar.
So you don’t take a photograph.
You ask Gemini to generate a pure white image, then a pure black image. A flat, uniform image has almost no frequency content. The spectrum of a perfect white field is a single spike at DC (the zero-frequency bin) and nothing else. So if you generate a “white” image from a watermarked model and look at the spectrum, every bin that lights up beyond the DC spike is almost certainly the watermark. The flat content has nothing of its own to put there.
That’s how the Synthid-Bypass project mapped what it calls the spectral codebook: a list of frequency coordinates per resolution, plus the channel weights. The watermark isn’t spread evenly. It hits the green channel hardest at weight 1.0, red at 0.85, and blue at 0.7. Once you have the codebook, you have the key.
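Here is a toy version of the white-image trick in Python. The “generation” is a flat field at level 250 plus two faint cosine carriers at hypothetical bins; the bins, amplitude and threshold are all made up for illustration. Real generations aren’t perfectly flat, so an actual mapping effort would presumably have to separate the carrier from residual noise. Every bin above the threshold, other than DC, is reported. Each real-valued carrier shows up twice, at (u, v) and at its mirror (W - u, H - v), because a cosine is the sum of two complex exponentials.

```python
import cmath
import math

SIZE = 16
HIDDEN_BINS = [(2, 3), (5, 1)]  # hypothetical carrier bins the attacker does not know

def near_white_image(level=250.0, amplitude=0.6):
    """A flat bright field plus faint cosine carriers, standing in for a 'pure white' generation."""
    return [
        [level + sum(amplitude * math.cos(2 * math.pi * (u * x + v * y) / SIZE) for u, v in HIDDEN_BINS)
         for x in range(SIZE)]
        for y in range(SIZE)
    ]

def dft2(img):
    """Naive full 2D DFT of a small square image, keyed by (u, v). O(SIZE^4): toy sizes only."""
    return {
        (u, v): sum(
            img[y][x] * cmath.exp(-2j * math.pi * (u * x + v * y) / SIZE)
            for y in range(SIZE)
            for x in range(SIZE)
        )
        for v in range(SIZE)
        for u in range(SIZE)
    }

spectrum = dft2(near_white_image())
THRESHOLD = 1.0  # anything above numerical noise; a real mapping would need a noise-aware choice
suspects = sorted(k for k, c in spectrum.items() if k != (0, 0) and abs(c) > THRESHOLD)
print(suspects)  # [(2, 3), (5, 1), (11, 15), (14, 13)]
```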
A defender’s instinct here is “OK, then randomize the bins per image.” The problem: the detector also has to find the watermark, with no extra information. So whatever the detector does to lock onto the signal, the attacker can do too. This is the structural part. The detector’s algorithm and the attacker’s algorithm are doing the same job from opposite directions.
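The symmetry argument fits in a few lines. In this sketch (every name and number is hypothetical), the defender embeds a carrier with a random phase per signal, and the detector’s job is to estimate that phase blindly from the signal alone. The attacker calls the very same estimator and subtracts the carrier it finds; a phase shift would work just as well. The same reasoning applies whether the randomized parameter is the phase or the bin, as long as the detector has to recover it without side information.

```python
import cmath
import math
import random

N = 64
CARRIER_BIN = 7  # hypothetical carrier frequency, in cycles per N samples

def estimate_carrier(signal, k=CARRIER_BIN):
    """Blind estimate of (phase, amplitude) at bin k, using only the signal itself."""
    n = len(signal)
    coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal))
    return cmath.phase(coeff), 2 * abs(coeff) / n

# Defender: embed the carrier with a fresh random phase for this particular signal.
secret_phase = random.uniform(-math.pi, math.pi)
content = [100 + 20 * math.sin(2 * math.pi * 2 * i / N) for i in range(N)]
marked = [c + 0.5 * math.cos(2 * math.pi * CARRIER_BIN * i / N + secret_phase)
          for i, c in enumerate(content)]

# Detector and attacker run the same estimator; the attacker then subtracts what it found.
phase, amp = estimate_carrier(marked)
stripped = [m - amp * math.cos(2 * math.pi * CARRIER_BIN * i / N + phase)
            for i, m in enumerate(marked)]
print(f"estimated amplitude {amp:.3f}, residual {estimate_carrier(stripped)[1]:.2e}")
```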
The Phase Shift Attack
Here’s where it gets surgical.
Brute-force watermark removal is old. Compress the file hard, blur it, add grain. The watermark dies. So does the image. That’s not an attack, it’s vandalism.
A phase-shift attack is different. The watermark isn’t just energy at a bin; it’s energy with a specific phase, meaning the wave peaks at specific positions. The detector is checking for that exact alignment. Two waves at the same frequency but offset in phase are like two people clapping at slightly different times: individually each one claps fine, but together they no longer sound like one clap.
When two waves are perfectly in phase, peak meets peak, and they reinforce. Constructive interference. When one is shifted by 180 degrees, peak meets trough, and they cancel. Destructive interference. Anywhere in between you get a partial cancellation. The detector measures coherence. As coherence drops, confidence drops.
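The interference arithmetic can be checked directly: sin t + sin(t + δ) = 2cos(δ/2)·sin(t + δ/2), so two equal waves offset by δ sum to a wave of amplitude 2|cos(δ/2)|. This Python sketch finds the peak by sampling one period and compares it with that closed form.

```python
import math

def superposed_peak(offset_deg, amplitude=1.0, samples=3600):
    """Peak of amplitude*sin(t) + amplitude*sin(t + offset) over one period, by sampling."""
    offset = math.radians(offset_deg)
    return max(
        abs(amplitude * (math.sin(t) + math.sin(t + offset)))
        for t in (2 * math.pi * i / samples for i in range(samples))
    )

for deg in (0, 60, 90, 120, 180):
    closed_form = 2 * abs(math.cos(math.radians(deg) / 2))
    print(f"{deg:>3} deg  sampled={superposed_peak(deg):.3f}  2|cos(d/2)|={closed_form:.3f}")
```

Squaring the amplitude gives the power, 4cos²(δ/2): full reinforcement at 0°, half power at 90°, silence at 180°.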
This is what the attack does. It nudges the phase of the watermark’s carrier signal at exactly the bins from the spectral codebook. Image-wide pixel statistics barely change: the energy at those bins is still there, just shifted, and the carrier was too faint to see in the first place, so the pixel-level difference stays tiny. PSNR holds at 43 dB. Visually, nothing happened. To the detector, the lock is broken.
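A rough sketch of why PSNR barely moves, using a flat mid-gray image and a hypothetical carrier of 1 gray level at bin (3, 5); both are assumptions, not SynthID’s real parameters. Shifting the carrier’s phase by δ changes each pixel by at most 2A, and over whole periods the mean squared error works out to A²(1 - cos δ), so PSNR stays high whatever the shift.

```python
import math

SIZE = 64
U, V = 3, 5       # hypothetical carrier bin
AMPLITUDE = 1.0   # hypothetical carrier amplitude, in gray levels on a 0-255 scale

def with_carrier(phase_rad):
    """Flat mid-gray image plus a faint 2D cosine carrier at bin (U, V) with the given phase."""
    return [
        [128.0 + AMPLITUDE * math.cos(2 * math.pi * (U * x + V * y) / SIZE + phase_rad)
         for x in range(SIZE)]
        for y in range(SIZE)
    ]

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    mse = sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / (len(a) * len(a[0]))
    return 10 * math.log10(peak ** 2 / mse)

original = with_carrier(0.0)
for shift_deg in (45, 90, 180):
    attacked = with_carrier(math.radians(shift_deg))
    print(f"{shift_deg:>3} deg shift: PSNR {psnr(original, attacked):.1f} dB")
```

Even the full 180° flip only costs a mean squared error of 2A², about 45 dB for this made-up amplitude. The point is the mechanism, not a reproduction of the project’s measurement.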
Here’s the same dynamic in a single slider. Drag the attack intensity from 0 to 180 degrees and watch the actual signal slide out of phase against the detector’s expected pattern. The confidence card flips when coherence collapses.
The phase-shift attack
Shift the watermark's phase to blind the detector while the image stays visually intact. Hit Play to also hear the cancellation.
A few specifics worth tying to the widget:
- At 0 degrees, expected and actual overlap perfectly. The detector sees full coherence and reports near-100% confidence.
- The drop is not linear. It follows a cosine-squared curve, the same shape interference traces as two equal waves drift out of phase.
- At around 90 degrees, you’ve already crossed a typical detection threshold. You don’t need full destructive interference, just enough misalignment.
- The image quality stays at 43 dB PSNR throughout because the attack only touches the bins from the spectral codebook. Everywhere else the image is untouched.
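One illustrative way to get that cosine-squared curve: score the received carrier by its normalized correlation ρ with the expected template and map it to [0, 1] as (1 + ρ)/2. For a pure phase offset δ, ρ = cos δ, so the score is cos²(δ/2): 0.5 at 90°, 0 at 180°. This scoring rule is an assumption for illustration, not SynthID’s detector. The sketch also checks that the shifted carrier keeps exactly the template’s energy.

```python
import math

N = 256           # samples along one row of the image (illustrative)
K = 11            # hypothetical carrier bin taken from a codebook
AMPLITUDE = 1.2   # hypothetical sub-visible carrier amplitude, in gray levels

def carrier(phase_rad):
    """The watermark carrier along one row, at the given phase."""
    return [AMPLITUDE * math.cos(2 * math.pi * K * n / N + phase_rad) for n in range(N)]

def coherence_score(received, template):
    """Illustrative detector score in [0, 1]: (1 + normalized correlation) / 2."""
    dot = sum(r * t for r, t in zip(received, template))
    norm = math.sqrt(sum(r * r for r in received) * sum(t * t for t in template))
    return (1 + dot / norm) / 2

template = carrier(0.0)  # the phase the detector expects
for shift_deg in (0, 45, 90, 135, 180):
    shifted = carrier(math.radians(shift_deg))
    energy_ratio = sum(s * s for s in shifted) / sum(t * t for t in template)
    print(f"{shift_deg:>3} deg  score={coherence_score(shifted, template):.3f}  energy ratio={energy_ratio:.3f}")
```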
Hit Play here and the cancellation becomes audible. Two 432 Hz sine waves go to the speakers. One of them is delayed by (shift / 360) × (1/432) seconds, which is the time-domain way to express a phase offset. At 0° you hear a strong, steady A. Drag toward 180° and the volume collapses into near-silence as peaks meet troughs. That’s the same destructive interference noise-cancelling headphones use, and it’s exactly what the watermark detector is seeing in the frequency domain when its lock breaks.
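The delay formula is easy to sanity-check in Python: a delay of (shift / 360) × (1/f) seconds turns sin(2πft) into sin(2πft - 2π·shift/360), which is exactly a phase offset. The sketch samples the sum of a 432 Hz tone and its delayed copy; the 48 kHz sample rate and 50 ms window are arbitrary choices.

```python
import math

TONE_HZ = 432.0

def delay_for_shift(shift_deg, freq_hz=TONE_HZ):
    """Time delay equivalent to a phase shift: (shift / 360) * (1 / f) seconds."""
    return (shift_deg / 360.0) * (1.0 / freq_hz)

def mixed_peak(shift_deg, sample_rate=48_000, duration_s=0.05):
    """Peak of a 432 Hz sine plus a copy of itself delayed to match `shift_deg`."""
    d = delay_for_shift(shift_deg)
    times = (i / sample_rate for i in range(int(sample_rate * duration_s)))
    return max(
        abs(math.sin(2 * math.pi * TONE_HZ * t) + math.sin(2 * math.pi * TONE_HZ * (t - d)))
        for t in times
    )

print(f"{delay_for_shift(180) * 1000:.3f} ms")  # 1.157 ms: half a period of 432 Hz
for deg in (0, 90, 180):
    print(deg, round(mixed_peak(deg), 3))
```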
This is what makes the attack feel almost mathematical rather than physical. You aren’t damaging the image. You’re rotating a vector in a space the human visual system can’t perceive.
Why This Is Structural, Not a Bug
It’s tempting to read this as “bad implementation, fix in v2.” That’s not what’s happening.
Any invisible watermark has to be both:
- Findable by the detector with no side information about which image it is looking at. The detector receives a random image and has to decide “watermarked or not.”
- Survivable across normal handling. Compression, screenshots, mild color edits, platform re-encoding.
Those constraints push the watermark toward fixed coordinates and fixed phases. The same fixed coordinates and phases are what the attacker maps once with the white-and-black-image trick, then targets forever. You can complicate the math (per-resolution codebooks, more carrier bins, multiple phase templates), but you can’t hide the existence of the structure from somebody who can probe it.
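This is the same shape of vulnerability as a stream cipher that reuses its keystream. The cipher itself can be strong, but once a single known plaintext exposes the keystream, every other message encrypted with it can be read and none of that strength matters. SynthID’s phase template plays the role of the reused keystream, the white image is the known plaintext, and the template is exposed by definition because the detector needs to use it.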
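The keystream analogy can be made concrete with a toy demonstration. It uses XOR with random bytes from Python’s `secrets` module as a stand-in for a real stream cipher’s keystream (not any particular cipher), and an all-0xFF message to play the role of the white image: one known plaintext recovers the reused keystream, which then decrypts everything else.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR, the core step of an additive stream cipher."""
    return bytes(x ^ y for x, y in zip(a, b))

secret_message = b"this frame was made by a model"
keystream = secrets.token_bytes(len(secret_message))  # strong randomness, but reused below

# Known plaintext: an all-0xFF "white" message whose content the attacker chose.
white = b"\xff" * len(secret_message)
white_ciphertext = xor_bytes(white, keystream)
secret_ciphertext = xor_bytes(secret_message, keystream)  # same keystream again

# One known plaintext/ciphertext pair gives back the keystream...
recovered_keystream = xor_bytes(white_ciphertext, white)
# ...which then unlocks every other message encrypted with it.
print(xor_bytes(secret_ciphertext, recovered_keystream))  # b'this frame was made by a model'
```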
So What Actually Works?
Inside the pixels, not much. Anything you embed there has to balance invisibility against survivability, and that balance always leaves a static signal somewhere.
Outside the pixels, you have a different option: explicit cryptographic metadata that the generator signs and ships with the image. That’s the C2PA approach. Instead of hiding “this is AI” inside the pixels, you attach a signed manifest that says “this came from Imagen at 14:32 UTC, here’s the chain of custody.” Re-saving the image can strip the manifest, but the absence of the manifest is itself a signal in any ecosystem that expects it. It’s the same shift in mindset as moving from page metadata buried in HTML attributes to explicit JSON-LD structured data: say what you mean, sign it, and stop trusting detectors to guess.
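A minimal sketch of the signed-manifest idea, with loud assumptions: the manifest layout and key are hypothetical, and HMAC is a symmetric stand-in. C2PA itself uses certificate-based public-key signatures and its own manifest format; with HMAC the verifier needs the same secret key, which is exactly why real provenance systems use public-key signatures instead. The shape of the guarantee still comes through: editing the pixels breaks the hash, editing the claims breaks the signature, and stripping the manifest leaves a visible gap.

```python
import hashlib
import hmac
import json

# Hypothetical shared key. HMAC is a symmetric stand-in; C2PA itself uses certificate-based signatures.
SIGNING_KEY = b"hypothetical-generator-key"

def _mac(payload: dict) -> str:
    body = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(SIGNING_KEY, body, hashlib.sha256).hexdigest()

def sign_manifest(image_bytes: bytes, claims: dict) -> dict:
    """Bind the claims to the exact image bytes via a hash, then sign both together."""
    payload = {"claims": claims, "image_sha256": hashlib.sha256(image_bytes).hexdigest()}
    return {"payload": payload, "signature": _mac(payload)}

def verify(image_bytes: bytes, manifest) -> str:
    if manifest is None:
        return "no manifest"  # where manifests are expected, absence is itself a signal
    if not hmac.compare_digest(_mac(manifest["payload"]), manifest["signature"]):
        return "bad signature"
    if manifest["payload"]["image_sha256"] != hashlib.sha256(image_bytes).hexdigest():
        return "image changed after signing"
    return "valid"

image = bytes(range(16))  # stand-in for the encoded image file
manifest = sign_manifest(image, {"generator": "Imagen", "created": "14:32 UTC"})
print(verify(image, manifest))            # valid
print(verify(image + b"\x00", manifest))  # image changed after signing
print(verify(image, None))                # no manifest
```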
So invisible watermarks are speed bumps. They catch lazy misuse, give platforms a quick check, and raise the cost for casual actors. Against somebody with a Fourier transform and 20 minutes, they don’t hold. Pretending otherwise is worse than admitting it, because users who think the watermark is a hard guarantee will trust media they shouldn’t.
Frequently Asked Questions
Why does the white-image trick reveal the watermark?
A pure white or pure black image has almost no frequency content of its own. The spectrum should be one spike at DC and silence everywhere else. Anything that shows up beyond that spike has to come from the watermark, since the model wasn’t asked to generate any visual content. The clean baseline is what makes the watermark’s carrier bins jump out.
Doesn’t shifting the phase change the image?
Mathematically yes, but the change happens only at the watermark’s carrier frequencies, which are sub-visible by design. The energy at those bins is preserved (the wave is rotated, not removed), and the carrier was faint to begin with, so the pixel-level difference stays tiny and PSNR stays high. The Synthid-Bypass project measures 43 dB PSNR after the attack, a level generally considered imperceptible to a human viewer.
Could SynthID randomize phase per image to defend against this?
In principle yes, but then the detector also has to recover the random phase from the image alone with no extra metadata. Whatever procedure recovers the phase becomes the new attack surface, since the attacker can run the same procedure. The structural problem is that defender and attacker have symmetric capabilities once the image is public.
Is this specific to SynthID or does it apply to other frequency-domain watermarks?
The technique generalizes. Any watermark that places a signal at known frequency coordinates with a fixed phase template is vulnerable to spectral mapping followed by a phase shift. The defenses worth investigating are typically not “make the math harder” but “move the trust to a different layer,” which is the C2PA argument.
What’s the difference between this and just compressing the image to kill the watermark?
Aggressive compression destroys the watermark by destroying the image. PSNR drops, edges soften, color banding appears. The phase-shift attack is surgical: it touches only the carrier bins, which are below the visual threshold. The image stays sharp, the file stays the same size, and the watermark is gone.
Related Reading
- How SynthID Was Broken. The full news-analysis sibling, covering all three attacks (spectral, diffusion re-noising, text paraphrasing) and the practical implications for platform trust signals.
- Synthid-Bypass on GitHub. Source code, V2 ComfyUI workflow, spectral codebook, channel weights, the full attack pipeline.
- JSON-LD Structured Data. Same flavor of “explicit, machine-readable, signed declarations” that C2PA leans on for provenance.