The search giant intends to train its chatbot using open data from around the web
Google will train its Bard chatbot on publicly available data from around the web. Gizmodo journalists spotted the change in the company's updated privacy policy.
According to the publication, Google had previously said it used publicly available data only to train its online translator, Google Translate. In the updated privacy policy dated July 1, 2023, however, the search giant explicitly mentions using publicly available data to train the Bard chatbot. Google also intends to use such data to train its Cloud AI service.
"Google uses information to improve our services and develop new products, features and technologies that benefit our users and society. For example, we use publicly available information to train Google's artificial intelligence models and create products like Google Translate, Bard and Cloud AI," the updated document says.
Gizmodo notes that the changes are unprecedented for Google. Previously, the company acknowledged that it could use information for its own purposes, but said it drew that information exclusively from its own products. Now the search giant has openly announced plans to use any publicly available information on the Internet. The publication expects Google's move to set off a wave of debate about online data privacy.
Innovation at the expense of users
Google is far from the only company to have used user data to train AI-powered products. For example, OpenAI's acclaimed chatbot ChatGPT built up its knowledge base for free using open data from Reddit. This controversial practice prompted Reddit to end free access to its API.
Many Reddit communities soon announced protests against the unilateral decision by the site's management. According to estimates by a Twitter user with the handle aakashg0, almost half of all Reddit communities went on strike. However, the protest did not change the decision of Reddit's top managers.
Reddit CEO Steve Huffman considers the changes justified. According to him, the company must be self-sustaining and can no longer subsidize commercial organizations that require "large-scale use of data."