OpenAI, a leading artificial intelligence (AI) research laboratory, is developing a watermark system that would let it identify text created by its ChatGPT AI. The system is intended to stop people from taking content the AI models generate and misrepresenting it as their own work.
The watermark security feature could make it easier for professors and teachers to identify students who use text generators like OpenAI's GPT for their essays and creative content.
How ChatGPT generates text
Understanding how ChatGPT generates text is crucial to understanding how OpenAI's watermarking tool works. These systems interpret text as strings of "tokens," which can be words, punctuation marks, or word fragments. At each step, the system produces a probability distribution over the next token to output, accounting for all tokens that have already been generated.
For systems hosted by OpenAI, such as ChatGPT, OpenAI's server performs the sampling of the next token according to that distribution once the distribution has been formed.
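The distribution-then-sample loop described above can be sketched in a few lines of Python. This is a minimal illustration only; the vocabulary and probabilities are made up, and a real model scores tens of thousands of tokens at each step.

```python
import random

def sample_next_token(distribution):
    """Sample the next token from a probability distribution.

    `distribution` maps candidate tokens to probabilities, as a language
    model might produce after scoring its vocabulary against all tokens
    generated so far.
    """
    tokens = list(distribution)
    weights = [distribution[t] for t in tokens]
    return random.choices(tokens, weights=weights, k=1)[0]

# Hypothetical distribution over the token following "The sky is":
dist = {"blue": 0.7, "clear": 0.2, "falling": 0.1}
print(sample_next_token(dist))  # usually "blue", sometimes "clear" or "falling"
```

The key point for watermarking is that this sampling step is the one place where the server gets to make a choice, which is exactly where a watermark can be injected.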
Why would OpenAI want to watermark work created by ChatGPT?
The OpenAI chatbot has caught the attention of netizens after demonstrating a penchant for answering difficult queries, producing poetry, resolving coding conundrums, and waxing lyrical on a variety of philosophical subjects.
While ChatGPT is really entertaining and helpful, there are clear ethical issues with the system. Like many text-generating tools before it, ChatGPT could potentially be used to create convincing phishing emails and plagiarized essays. Additionally, ChatGPT's factual inconsistency as a tool for answering questions caused programming Q&A website Stack Overflow to temporarily block responses from the AI.
How does the "watermark" work?
Scott Aaronson, a guest researcher at OpenAI, stated during a presentation at the University of Texas that OpenAI's watermarking tool functions as a "wrapper" over existing text-generating systems, using a cryptographic algorithm operating at the server level to "pseudorandomly" choose the next token. Even though the text produced this way looks random to a casual observer, anyone with access to the cryptographic key could theoretically detect the watermark.
Empirically, it appears that a few hundred tokens are sufficient to provide a solid indication that the text was produced by an AI system. In theory, you could even take a lengthy book and determine which passages most likely originated from the system and which passages most likely did not.
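Aaronson has not published implementation details, so the following is only a speculative sketch of how keyed pseudorandom sampling and detection could work, loosely based on his public description. Everything here is an illustrative assumption: the function names, the scoring rule, and the use of HMAC-SHA256 as a stand-in for whatever cryptographic function OpenAI actually uses.

```python
import hashlib
import hmac
import math

SECRET_KEY = b"known-only-to-the-provider"  # stand-in for the provider's key

def prf(key, context, token):
    """Keyed pseudorandom value in [0, 1) derived from (context, token)."""
    msg = ("|".join(context) + "||" + token).encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def watermarked_choice(distribution, context, key=SECRET_KEY):
    """Pick the token t maximizing r_t ** (1 / p_t), where r_t is the keyed
    pseudorandom value and p_t the model's probability. The choice still
    tracks the model's distribution on average, but it is deterministic
    given the key, so a detector can recompute it."""
    return max(distribution,
               key=lambda t: prf(key, context, t) ** (1.0 / distribution[t]))

def detection_score(tokens, key=SECRET_KEY):
    """Average -log(1 - r_t) over the tokens. For text unrelated to the key
    the r_t behave like uniform draws and the average hovers near 1.0;
    watermarked text, whose chosen tokens have biased-high r_t, scores
    noticeably higher. A few hundred tokens make the gap reliable."""
    total = 0.0
    for i, t in enumerate(tokens):
        r = prf(key, tokens[:i], t)
        total += -math.log(1.0 - r)
    return total / len(tokens)
```

Sliding this detector over a long document would give the passage-level attribution described above: windows of a few hundred tokens that score well above the baseline most likely came from the watermarked system.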
Limitations to the system
The concept of watermarking text produced by AI is not new. Previous attempts, most of them rule-based, relied on tricks like word alterations and synonym replacements. However, OpenAI's approach looks to be one of the first cryptography-based answers to the problem, outside of theoretical studies released by the German research institution CISPA in March.
Aaronson declined to provide any information regarding the watermarking prototype when reached for comment, but he did mention that he plans to co-author a research article in the near future. Additionally, OpenAI simply stated that watermarking was one of the "provenance approaches" they were investigating to identify work produced by the AI.
Giving out the key (which only OpenAI holds) for free would keep OpenAI from benefiting financially from it. Worse, putting the key in everyone's hands would let people find workarounds or strip the watermark entirely, leaving OpenAI in a difficult position.
We will have to wait and see whether OpenAI or someone else can come up with a solution that works well for all parties concerned. Still, it is notable that watermarking is only one of several approaches OpenAI is examining to deal with the issue.