Robust Text Watermarking for Large Language Models via Dual Semantic Embeddings
Pith reviewed 2026-07-02 19:46 UTC · model grok-4.3
The pith
Dual semantic embeddings create a watermark for LLM text that survives paraphrasing and translation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DEW derives a watermark signal from algebraic vector-space operations on token and context embeddings, obfuscates it by projecting through pseudo-random matrices seeded with a secret key, and uses the resulting distributions for statistical detection; experiments across multiple LLMs confirm improved post-paraphrase detection, competitive text quality, and retained detectability after translation where prior semantic watermarks fail.
What carries the argument
Dual-Embedding Watermarking (DEW) that applies algebraic operations to token and context embeddings to generate a signal intended to degrade gracefully under semantic shifts.
If this is right
- Watermarked text remains statistically distinguishable after paraphrasing attacks.
- The same watermark signal persists across language translation.
- Generation quality stays comparable to unmarked LLM output under the method.
- Statistical tests derived from the embedding algebra provide a concrete detection benchmark.
Where Pith is reading between the lines
- The dual-embedding approach could be tested on additional transformations such as summarization or style transfer.
- If the graceful degradation holds, similar vector operations might apply to watermarking other generative models beyond text.
- Integration would require only changes to the embedding projection step during generation rather than retraining the base LLM.
Load-bearing premise
Algebraic operations on token and context embeddings will produce a watermark signal whose statistical properties survive paraphrasing and translation enough for reliable detection.
What would settle it
A test in which DEW detection accuracy after standard paraphrasing falls to the level of random guessing or matches the drop seen in prior semantic watermark methods.
Figures
read the original abstract
This work presents Dual-Embedding Watermarking (DEW), a semantic watermarking scheme for large language models (LLMs) that leverages contextual and token-level embeddings to enhance robustness against paraphrasing and translation. DEW utilizes a signal-processing methodology, applying algebraic vector-space operations to token and context embeddings to derive a watermark signal that degrades gracefully under semantic shifts. The method obfuscates the watermark by projecting embedding vectors through pseudo-random matrices seeded with a secret key. Relevant distributions derived from the underlying algebra are evaluated and employed for statistical testing and benchmarking of DEW. Experimental results across multiple LLMs indicate that DEW improves post-paraphrase detection while maintaining competitive text quality, and remains detectable after translation, even when prior semantic watermarks degrade significantly. These findings position DEW as a practical and robust solution for safeguarding LLM-generated text and addressing critical issues in responsible AI deployment.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Dual-Embedding Watermarking (DEW), a semantic watermarking scheme for LLMs that applies algebraic vector-space operations to contextual and token-level embeddings, projects them through pseudo-random matrices seeded by a secret key, derives relevant distributions for statistical detection, and claims improved post-paraphrase and post-translation detection while preserving text quality.
Significance. If the algebraic derivations and experimental results hold, the work would advance robust LLM watermarking by demonstrating graceful degradation under semantic shifts, providing a practical tool for responsible AI deployment and detection of generated text.
major comments (2)
- [Abstract] Abstract: the claim of experimental improvements across multiple LLMs and graceful degradation under paraphrasing/translation supplies no details on model sizes, distributions, statistical tests, error bars, or specific algebraic derivations, preventing verification of whether the vector-space operations support the stated detection performance.
- The central assumption that algebraic operations on token and context embeddings produce a watermark signal that degrades gracefully under semantic shifts is load-bearing for all robustness claims, yet no equations, distribution derivations, or concrete tests of this assumption are provided in the manuscript.
Simulated Author's Rebuttal
We thank the referee for their comments. We address each major point below and indicate where revisions will be made to improve clarity and verifiability.
read point-by-point responses
-
Referee: [Abstract] Abstract: the claim of experimental improvements across multiple LLMs and graceful degradation under paraphrasing/translation supplies no details on model sizes, distributions, statistical tests, error bars, or specific algebraic derivations, preventing verification of whether the vector-space operations support the stated detection performance.
Authors: We agree the abstract is too concise for full verification. In revision we will expand it to specify the LLMs tested (including parameter counts), the statistical procedures (p-values, AUC, error bars from repeated trials), and a one-sentence reference to the algebraic derivations given in Section 3. revision: yes
-
Referee: [—] The central assumption that algebraic operations on token and context embeddings produce a watermark signal that degrades gracefully under semantic shifts is load-bearing for all robustness claims, yet no equations, distribution derivations, or concrete tests of this assumption are provided in the manuscript.
Authors: The manuscript already presents the vector-space operations, the key-seeded projections, and the derived detection distributions in the Methods section, together with the robustness experiments. To address the concern directly we will add a dedicated subsection that isolates the central assumption, states the key equations explicitly, and reports additional targeted tests of graceful degradation under controlled semantic shifts. revision: yes
Circularity Check
No significant circularity detected
full rationale
The provided abstract and placeholder full text contain no equations, derivations, or self-citations that reduce any claimed prediction, distribution, or detection statistic to a fitted input or prior result by construction. The method is described at a high level as applying standard algebraic operations to external embeddings followed by statistical testing, with no load-bearing steps that equate outputs to inputs. This is the expected self-contained case for a methods paper relying on external embeddings and algebra without internal fitting loops or uniqueness theorems.
Axiom & Free-Parameter Ledger
free parameters (1)
- secret key for seeding pseudo-random matrices
axioms (1)
- domain assumption Vector-space operations on token and context embeddings produce a watermark signal that degrades gracefully under semantic shifts
Reference graph
Works this paper leans on
-
[1]
Watermarking
Aaronson, Scott and Kirchner, Hendrik , year = 2022, month = 12, day = 13, url =. Watermarking
2022
-
[2]
Proceedings of the 41st International Conference on Machine Learning , articleno =
Liu, Yepeng and Bu, Yuheng , title =. Proceedings of the 41st International Conference on Machine Learning , articleno =. 2024 , publisher =
2024
-
[3]
Revised Papers from the 5th International Workshop on Information Hiding , publisher =
Natural Language Watermarking and Tamperproofing , author =. Revised Papers from the 5th International Workshop on Information Hiding , publisher =
-
[4]
Proceedings of Thirty Seventh Conference on Learning Theory , publisher =
Undetectable Watermarks for Language Models , author =. Proceedings of Thirty Seventh Conference on Learning Theory , publisher =
-
[6]
Nature , volume = 634, number = 8035, pages =
Scalable Watermarking for Identifying Large Language Model Outputs , author =. Nature , volume = 634, number = 8035, pages =. doi:10.1038/s41586-024-08025-4 , issn =
-
[7]
Diera, Andor and Galke, Lukas and Scherp, Ansgar , year = 2024, month = 11, day = 27, publisher =. Isotropy Matters: Soft-. doi:10.48550/arXiv.2411.17538 , url =. 2411.17538 [cs] , eprinttype =
-
[8]
Soumya Suvra Ghosal and Souradip Chakraborty and Jonas Geiping and Furong Huang and Dinesh Manocha and Amrit Singh Bedi , year = 2023, journal =. Towards Possibilities. doi:10.48550/arxiv.2310.15264 , url =. 2310.15264 , timestamp =
-
[9]
The Llama 3 Herd of Models , author =. CoRR , volume =. doi:10.48550/arxiv.2407.21783 , url =. 2407.21783 , timestamp =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2407.21783
-
[10]
Can Watermarks Survive Translation? On the Cross-lingual Consistency of Text Watermark for Large Language Models , author =. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , publisher =. doi:10.18653/v1/2024.acl-long.226 , url =
-
[11]
doi:10.18653/v1/2024.naacl-long.226 , url =
Hou, Abe and Zhang, Jingyu and He, Tianxing and Wang, Yichen and Chuang, Yung-Sung and Wang, Hongwei and Shen, Lingfeng and Van Durme, Benjamin and Khashabi, Daniel and Tsvetkov, Yulia , year = 2024, month = jun, booktitle =. doi:10.18653/v1/2024.naacl-long.226 , url =
-
[12]
Hou, Abe and Zhang, Jingyu and Wang, Yichen and Khashabi, Daniel and He, Tianxing , year = 2024, month = aug, booktitle =. k-. doi:10.18653/v1/2024.findings-acl.98 , url =
-
[13]
The Twelfth International Conference on Learning Representations,
Unbiased Watermark for Large Language Models , author =. The Twelfth International Conference on Learning Representations,
-
[14]
doi:10.18653/v1/2021.findings-emnlp.23 , url =
Huang, Junjie and Tang, Duyu and Zhong, Wanjun and Lu, Shuai and Shou, Linjun and Gong, Ming and Jiang, Daxin and Duan, Nan , year = 2021, month = nov, booktitle =. doi:10.18653/v1/2021.findings-emnlp.23 , url =
-
[15]
Proceedings of the 41st International Conference on Machine Learning , location =
Token-specific watermarking with enhanced detectability and semantic coherence for large language models , author =. Proceedings of the 41st International Conference on Machine Learning , location =
-
[16]
Perplexity--a measure of the difficulty of speech recognition tasks , author =. The Journal of the Acoustical Society of America , volume = 62, number =. doi:10.1121/1.2016299 , issn =
-
[17]
Proceedings of the 41st International Conference on Machine Learning , location =
Watermark stealing in large language models , author =. Proceedings of the 41st International Conference on Machine Learning , location =
-
[18]
Proceedings of the 40th international conference on machine learning , publisher =
A Watermark for Large Language Models , author =. Proceedings of the 40th international conference on machine learning , publisher =
-
[19]
On the Reliability of Watermarks for Large Language Models , author =. CoRR , volume =. doi:10.48550/arxiv.2306.04634 , url =. 2306.04634 , timestamp =
-
[20]
Proceedings of the 37th International Conference on Neural Information Processing Systems , location =
Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense , author =. Proceedings of the 37th International Conference on Neural Information Processing Systems , location =
-
[21]
Robust Distortion-free Watermarks for Language Models , author =. Trans. Mach. Learn. Res. , volume = 2024, url =
2024
-
[22]
The Twelfth International Conference on Learning Representations,
A Semantic Invariant Robust Watermark for Large Language Models , author =. The Twelfth International Conference on Learning Representations,
-
[23]
A Survey of Text Watermarking in the Era of Large Language Models , author =. CoRR , volume =. doi:10.48550/arxiv.2312.07913 , url =. 2312.07913 , timestamp =
-
[24]
Nahema Marchal and Rachel Xu and Rasmi Elasmar and Iason Gabriel and Beth Goldberg and William Isaac , year = 2024, journal =. Generative. doi:10.48550/arxiv.2406.13843 , url =. 2406.13843 , timestamp =
-
[25]
doi:10.18653/v1/2024.emnlp-demo.7 , url =
Pan, Leyi and Liu, Aiwei and He, Zhiwei and Gao, Zitian and Zhao, Xuandong and Lu, Yijian and Zhou, Binglin and Liu, Shuliang and Hu, Xuming and Wen, Lijie and others , year = 2024, month = nov, booktitle =. doi:10.18653/v1/2024.emnlp-demo.7 , url =
-
[26]
, year = 2011, month = 11, booktitle =
Papandreou, George and Yuille, Alan L. , year = 2011, month = 11, booktitle =. Perturb-and-. doi:10.1109/iccv.2011.6126242 , url =
-
[27]
Exploring the limits of transfer learning with a unified text-to-text transformer , author =. J. Mach. Learn. Res. , publisher =
-
[28]
Sentence- BERT : Sentence Embeddings using S iamese BERT -Networks
Reimers, Nils and Gurevych, Iryna , year = 2019, month = nov, booktitle =. Sentence-. doi:10.18653/v1/D19-1410 , url =
-
[29]
doi:10.1186/s41239-024-00478-x , issn =
Shahzad, Muhammad Farrukh and Xu, Shuo and Javed, Iqra , year = 2024, month =. doi:10.1186/s41239-024-00478-x , issn =
-
[30]
The Science of Detecting LLM-Generated Text , author =. Commun. ACM , publisher =. doi:10.1145/3624725 , issn =
-
[31]
Proceedings of the 8th Workshop on Multimedia and Security , location =
The hiding virtues of ambiguity: quantifiably resilient watermarking of natural language text through synonym substitutions , author =. Proceedings of the 8th Workshop on Multimedia and Security , location =. doi:10.1145/1161366.1161397 , isbn = 1595934936, url =
-
[32]
Stumbling Blocks: Stress Testing the Robustness of Machine-Generated Text Detectors Under Attacks , author =. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , publisher =. doi:10.18653/v1/2024.acl-long.160 , url =
-
[33]
Understanding User Experience in Large Language Model Interactions , author =. CoRR , volume =. doi:10.48550/arxiv.2401.08329 , url =. 2401.08329 , timestamp =
-
[34]
Testing of detection tools for
Weber-Wulff, Debora and Anohina-Naumeca, Alla and Bjelobaba, Sonja and others , year = 2023, month = 12, day = 25, volume = 19, number = 1, pages = 26, doi =. Testing of detection tools for
2023
-
[35]
Distortion-free Watermarks are not Truly Distortion-free under Watermark Key Collisions , author =. CoRR , volume =. doi:10.48550/arxiv.2406.02603 , url =. 2406.02603 , timestamp =
-
[36]
Proceedings of the 41st International Conference on Machine Learning , location =
A resilient and accessible distribution-preserving watermark for large language models , author =. Proceedings of the 41st International Conference on Machine Learning , location =
-
[37]
The Twelfth International Conference on Learning Representations,
Provable Robust Watermarking for AI-Generated Text , author =. The Twelfth International Conference on Learning Representations,
-
[38]
Neural Linguistic Steganography , author =. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (. doi:10.18653/v1/D19-1115 , url =
-
[39]
Contemporary Mathematics , pages =
Extensions of lipschitz mappings into a hilbert space , author =. Contemporary Mathematics , pages =
-
[40]
The Falcon Series of Open Language Models
The Falcon Series of Open Language Models , author =. CoRR , volume =. doi:10.48550/arxiv.2311.16867 , url =. 2311.16867 , timestamp =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2311.16867
-
[41]
Gemma: Open Models Based on Gemini Research and Technology
Gemma: Open Models Based on Gemini Research and Technology , author =. CoRR , volume =. doi:10.48550/arxiv.2403.08295 , url =. 2403.08295 , timestamp =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2403.08295
-
[42]
Proceedings of the 41st International Conference on Machine Learning , articleno =
Jiang, Yibo and Rajendran, Goutham and Ravikumar, Pradeep and Aragam, Bryon and Veitch, Victor , title =. Proceedings of the 41st International Conference on Machine Learning , articleno =. 2024 , publisher =
2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.