Editor’s take: AI bots have recently become the scourge of websites hosting written material and other media. From Wikipedia down to the humblest personal blog, no one is safe from the scrapers deployed by OpenAI and other tech giants in search of fresh content to feed their AI models.
The non-profit organization hosting Wikipedia and other widely popular websites is raising concerns about AI scraper bots and their impact on the Wikimedia Foundation’s internet bandwidth. Demand for content hosted on Wikimedia’s servers has grown significantly since the beginning of 2024, with AI companies actively consuming enormous amounts of traffic to train their products.
Wikimedia projects, which include some of the largest collections of knowledge and freely accessible media on the internet, are used by billions of people worldwide. Wikimedia Commons alone hosts 144 million images, videos, and other files shared under a public domain license, and it is suffering particularly badly from the unregulated crawling activity of AI bots.
The Wikimedia Foundation has experienced a 50 percent increase in the bandwidth used for multimedia downloads since January 2024, with the traffic mostly coming from bots. Automated programs are scraping the Wikimedia Commons image catalog to feed content to AI models, the foundation states, and its infrastructure was not built to endure this kind of parasitic internet traffic.
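Wikimedia, like most large sites, publishes its crawl rules in a robots.txt file; the friction comes from scrapers that ignore them. As a minimal sketch of the etiquette at issue, the Python snippet below checks those rules before fetching, using only the standard library (the bot name and contact address are hypothetical):

```python
# Minimal sketch of well-behaved crawling: consult robots.txt before
# fetching. Wikimedia's complaint is that many AI scrapers skip this step.
# The bot name and contact address below are placeholders, not a real bot.
from urllib import robotparser

USER_AGENT = "ExampleResearchBot/1.0 (contact@example.org)"  # hypothetical

rp = robotparser.RobotFileParser()
rp.set_url("https://commons.wikimedia.org/robots.txt")
rp.read()  # download and parse the site's published crawl rules

url = "https://commons.wikimedia.org/wiki/Special:Random"
if rp.can_fetch(USER_AGENT, url):
    print(f"Allowed to fetch {url}")
else:
    print(f"robots.txt disallows {url} for {USER_AGENT}")
```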
The Wikimedia team saw clear evidence of AI scraping’s effects in December 2024, when former US President Jimmy Carter passed away and millions of viewers accessed his page on the English edition of Wikipedia. The 2.8 million people reading the president’s bio and achievements were ‘manageable’, but many users were also streaming the 1.5-hour-long video of Carter’s 1980 debate with Ronald Reagan.
With overall network traffic doubling as a result, a small number of Wikipedia’s connection routes to the internet were congested for about an hour. Wikimedia’s site reliability team was able to reroute traffic and restore access, but the network hiccup shouldn’t have happened in the first place.
While examining the bandwidth issue during a system migration, Wikimedia found that at least 65 percent of the most resource-intensive traffic came from bots, whose requests pass through the caching infrastructure and hit Wikimedia’s ‘core’ data center directly.
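Wikimedia has not published its exact methodology, but a first approximation of that 65 percent figure can be computed from ordinary access logs. The sketch below assumes a simplified log format with a cache-status field and a User-Agent field; the bot signatures are illustrative, and evasive scrapers that spoof browser User-Agents would slip past this kind of check, which is part of the foundation’s problem.

```python
# Rough sketch: estimate what share of cache-missing (expensive) requests
# comes from self-identified bots. The tab-separated log format and the
# bot signatures are assumptions for illustration, not Wikimedia's tooling.
import re

BOT_SIGNATURES = re.compile(r"bot|crawler|spider|scraper", re.IGNORECASE)

def is_bot(user_agent: str) -> bool:
    """Crude heuristic: matches self-identified bots only."""
    return bool(BOT_SIGNATURES.search(user_agent))

def bot_share_of_cache_misses(log_lines):
    """Each line is assumed to be 'cache_status<TAB>user_agent'."""
    misses = bot_misses = 0
    for line in log_lines:
        cache_status, user_agent = line.rstrip("\n").split("\t", 1)
        if cache_status == "MISS":  # request fell through to the core servers
            misses += 1
            if is_bot(user_agent):
                bot_misses += 1
    return bot_misses / misses if misses else 0.0

# GPTBot and CCBot are real crawler User-Agents; the log lines are made up.
sample = [
    "HIT\tMozilla/5.0 (Windows NT 10.0) Firefox/125.0",
    "MISS\tGPTBot/1.0 (+https://openai.com/gptbot)",
    "MISS\tMozilla/5.0 (Macintosh) Safari/605.1",
    "MISS\tCCBot/2.0 (https://commoncrawl.org/faq/)",
]
print(f"Bot share of cache misses: {bot_share_of_cache_misses(sample):.0%}")
```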
The organization is working to address this new kind of network challenge, which now affects the entire internet as AI and tech companies actively scrape every ounce of human-made content they can find. “Delivering trustworthy content also means supporting a ‘knowledge as a service’ model, where we acknowledge that the whole internet draws on Wikimedia content,” the organization said.
Wikimedia is promoting a more responsible approach to infrastructure access through better coordination with AI developers. Dedicated APIs could ease the bandwidth burden while making it easier to identify and fight “bad actors” in the AI industry.
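Wikimedia’s existing User-Agent policy already asks automated clients to identify themselves with a descriptive string and contact information, and public APIs exist for structured access. A sketch of what that cooperative access could look like, using the public English Wikipedia REST summary endpoint (the bot name and contact details are placeholders):

```python
# Sketch of the cooperative access Wikimedia is asking for: a descriptive,
# contactable User-Agent plus self-imposed throttling, hitting a public API
# endpoint instead of scraping pages. Bot name and contact are placeholders.
import json
import time
import urllib.parse
import urllib.request

HEADERS = {
    # Per Wikimedia's User-Agent policy: identify the client and a contact.
    "User-Agent": "ExampleResearchBot/1.0 (https://example.org; contact@example.org)"
}

def fetch_summary(title: str) -> dict:
    url = ("https://en.wikipedia.org/api/rest_v1/page/summary/"
           + urllib.parse.quote(title))
    req = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

for title in ["Jimmy_Carter", "Wikimedia_Commons"]:
    summary = fetch_summary(title)
    print(summary["title"], "->", summary["extract"][:80], "...")
    time.sleep(1.0)  # throttle: stay well below any per-client rate limit
```

Identified, throttled clients like this are cheap to serve and easy to contact; the anonymous, cache-busting crawlers described above are neither.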