TikTok’s Parent Company Collects Web Data

ByteDance, the Chinese company behind TikTok, appears to be rapidly expanding its efforts to collect web data for training its generative AI models.

According to research by Kasada and Dark Visitors, ByteDance launched a new web scraper called Bytespider in April. This tool has quickly become one of the most aggressive data collectors on the internet, outpacing major tech companies like Google, Meta, Amazon, OpenAI, and Anthropic. These companies also use web scrapers to gather data for their AI models.

ByteDance’s aggressive data collection suggests that the company is determined to catch up with its competitors in the field of generative AI.

Sam Crowther, CEO of Kasada, has stated that Bytespider is collecting data roughly 25 times faster than GPTBot, the scraper OpenAI uses for ChatGPT. He also estimates that Bytespider collects data about 3,000 times faster than ClaudeBot, Anthropic’s scraper for the Claude platform.

Kasada’s research indicates that Bytespider’s scraping activity has increased significantly over the past six weeks.

ByteDance and TikTok have not responded to requests for comment on these findings.

ByteDance continues its assertive data collection even as TikTok faces a potential ban in the United States. President Joe Biden has signed a law, citing national security concerns, that requires ByteDance to either divest TikTok or see the app banned in the country.

Research findings indicate that Bytespider disregards the widely recognized robots.txt protocol. Although not legally enforceable, robots.txt is a plain-text file that website owners place at the root of their site to tell automated crawlers which pages they should not access. The Bytespider bot appears to ignore these instructions. This behavior mirrors what has been reported of other prominent AI companies such as OpenAI and Anthropic.
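To make the robots.txt mechanism concrete, here is a minimal sketch in Python of the check a well-behaved crawler would run before fetching a page. The robots.txt contents, the example.com URLs, and the bot name "SomeOtherBot" are illustrative assumptions, not taken from any real site, and the sketch assumes the scraper identifies itself with the user-agent token "Bytespider". Nothing in the protocol enforces the result. A crawler that ignores the file simply skips this check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish (illustrative only).
# It asks the crawler identifying as "Bytespider" to stay away entirely,
# and asks every other crawler to avoid /private/.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text with the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# A compliant crawler would run this check before every request.
article = "https://example.com/articles/1"
private = "https://example.com/private/report"

print(is_allowed(parser, "Bytespider", article))    # False: blocked site-wide
print(is_allowed(parser, "SomeOtherBot", article))  # True: falls under "*"
print(is_allowed(parser, "SomeOtherBot", private))  # False: /private/ is off limits
```

Because compliance is voluntary, site owners who want to actually block a scraper need server-side measures such as filtering requests by user agent or traffic pattern.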

A source familiar with ByteDance has confirmed that the company is developing a new large language model (LLM). While the specific plans for this new LLM are not entirely clear, it is believed that one potential application is to enhance the search function on TikTok.
