
Google lets sites control whether they are used for Bard, other gen AI training

Google is giving web publishers a new way to control AI training data and “whether their sites help improve Bard and Vertex AI generative APIs.”  

Large language models (LLMs) are trained on massive amounts of data, including web content. Google in July called for the creation of a modern robots.txt for AI. In lieu of an industry-wide standard, Google is updating its own platform:

By using Google-Extended to control access to content on a site, a website administrator can choose whether to help these AI models become more accurate and capable over time.

Google-Extended, a user agent token that publishers can address in robots.txt, specifically applies to training Bard and Vertex AI (which is available to third parties as a Google Cloud offering), as well as “future generations of models that power those products.” More information for publishers is available here.
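As a rough illustration of how such a rule might look and behave, here is a minimal sketch that uses Python's standard-library robots.txt parser. The robots.txt contents, the example.com URL, and the `can_fetch` helper are assumptions made up for this example, not taken from Google's documentation, and Google's own handling of the token may differ from this parser's matching.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of Google-Extended while leaving
# all other crawlers (including regular Search crawling) unaffected.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URL, used only for illustration.
    url = "https://example.com/articles/some-post"
    for agent in ("Google-Extended", "Googlebot"):
        print(agent, can_fetch(agent, url))
```

Under these assumptions, the Google-Extended rule blocks the whole site for that token, while a crawler such as Googlebot falls through to the wildcard group and stays allowed.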

Google says it has heard how web publishers “want greater choice and control over how their content is used for emerging generative AI use cases.” The company calls this an “important step in providing transparency and control that we believe all providers of AI models should make available.”

…we’re committed to engaging with the web and AI communities to explore additional machine-readable approaches to choice and control for web publishers. We look forward to sharing more soon.

