
Bytespider

Training crawler · ByteDance

Bytespider is ByteDance's training crawler, and one of the most aggressive on the web. This page covers what is known about it, the user-agent string it is observed sending, and how to block it.

What is Bytespider?

Bytespider is a web crawler operated by ByteDance, the parent company of TikTok. It gathers web content to train ByteDance's large language models. It is widely reported as one of the most aggressive AI crawlers by request volume.

Training crawlers like Bytespider collect web content to train future AI models. Blocking them helps keep your content out of future training data, and it costs you no traffic, because training crawlers do not send visitors to your site.

The Bytespider user-agent string

This is the user-agent string Bytespider is commonly observed sending. You will see it in your server logs when the bot visits.

Mozilla/5.0 (compatible; Bytespider; spider-feedback@bytedance.com)

This string comes from observed traffic, not from the operator. ByteDance publishes no bot documentation, so there is no canonical user-agent string to verify against.
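To see whether Bytespider is visiting your site, you can search your access logs for the `Bytespider` token. The sketch below assumes logs in the Combined Log Format, where the user agent is the last double-quoted field on each line; the function name `count_bytespider_hits` and the sample log lines are hypothetical. It matches the token case-insensitively instead of the full string, since the exact string may vary between requests.

```python
import re

# Combined Log Format: the user agent is the last double-quoted field on the line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

def count_bytespider_hits(lines):
    """Count log lines whose user agent contains the Bytespider token (case-insensitive)."""
    hits = 0
    for line in lines:
        match = UA_PATTERN.search(line)
        if match and "bytespider" in match.group(1).lower():
            hits += 1
    return hits

# Hypothetical sample lines; only the first one comes from Bytespider.
sample_log = [
    '203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET /blog/ HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Bytespider; spider-feedback@bytedance.com)"',
    '198.51.100.7 - - [10/Oct/2024:13:56:01 +0000] "GET / HTTP/1.1" 200 2048 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

print(count_bytespider_hits(sample_log))  # 1
```

If your server uses a different log format, only the pattern that extracts the user agent needs to change.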

How do I block Bytespider in robots.txt?

Add one of these snippets to the robots.txt file at the root of your domain. An explicit group for Bytespider overrides your User-agent: * rules for this bot.

Block Bytespider

Tells Bytespider it may not access any page on your site.

User-agent: Bytespider
Disallow: /

Allow Bytespider

Explicitly allows Bytespider, even when a broad Disallow rule blocks other bots.

User-agent: Bytespider
Allow: /
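Before deploying either snippet, you can check how a standards-following parser reads your rules. This sketch uses Python's standard-library `urllib.robotparser`; the robots.txt contents, the helper name `may_fetch`, and the `example.com` URL are placeholders. It only shows what a compliant crawler should do, not what Bytespider actually does. It also illustrates the override described above: a `Bytespider` group takes precedence over the `User-agent: *` group for that bot.

```python
from urllib.robotparser import RobotFileParser

def may_fetch(robots_txt, agent, url):
    """Return True if a compliant crawler using this product token may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# The "Block Bytespider" snippet on its own: only Bytespider is shut out.
block = "User-agent: Bytespider\nDisallow: /\n"

# The "Allow Bytespider" snippet added to a site that blocks every other bot.
allow = "User-agent: *\nDisallow: /\n\nUser-agent: Bytespider\nAllow: /\n"

url = "https://example.com/articles/1"
print(may_fetch(block, "Bytespider", url))    # False
print(may_fetch(block, "SomeOtherBot", url))  # True
print(may_fetch(allow, "Bytespider", url))    # True
print(may_fetch(allow, "SomeOtherBot", url))  # False
```

Pass the bare product token (`Bytespider`), not the full user-agent string: `robotparser` keeps only the text before the first `/` of the agent name, so the full string would be read as `Mozilla`.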

Does Bytespider respect robots.txt?

Undocumented, and disputed in practice. ByteDance publishes no crawling policy, so there is no official robots.txt statement either way. Multiple independent site owners and security vendors report Bytespider continuing to crawl paths that are disallowed in robots.txt, so a robots.txt rule alone is widely considered insufficient.
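If you want to check these reports against your own traffic, one approach is to compare the paths requested by Bytespider with what your robots.txt disallows for it. This sketch again uses `urllib.robotparser`; the function name `disallowed_bytespider_requests` and the `(path, user_agent)` request list are hypothetical stand-ins for whatever your log parser produces. User-agent strings can be spoofed, so a match shows that a client claimed to be Bytespider, not that ByteDance sent it.

```python
from urllib.robotparser import RobotFileParser

def disallowed_bytespider_requests(robots_txt, requests):
    """Return paths requested by Bytespider-labelled clients that robots.txt disallows for it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    violations = []
    for path, user_agent in requests:
        if "bytespider" not in user_agent.lower():
            continue  # Not a Bytespider request; ignore it.
        # robotparser matches on the text before the first '/', so pass the bare token.
        if not parser.can_fetch("Bytespider", path):
            violations.append(path)
    return violations

robots = "User-agent: Bytespider\nDisallow: /private/\n"
bytespider_ua = "Mozilla/5.0 (compatible; Bytespider; spider-feedback@bytedance.com)"
requests = [
    ("/blog/post", bytespider_ua),
    ("/private/report", bytespider_ua),
    ("/private/report", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"),
]
print(disallowed_bytespider_requests(robots, requests))  # ['/private/report']
```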

Should you block Bytespider?

Many site owners choose to block Bytespider, both to keep content out of ByteDance's training data and to shed its heavy request load. Because its robots.txt compliance is unreliable, a robots.txt rule may not be enough. Security vendors commonly recommend blocking it by user agent at the server, CDN, or WAF level instead.
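Blocking by user agent in front of your application can be sketched as a small WSGI middleware in Python. This is a hypothetical illustration, not a vendor-recommended configuration: the names `BlockBytespider` and `hello_app` are made up, and in practice the equivalent rule usually lives in your web server, CDN, or WAF rather than in application code. It returns `403 Forbidden` to any request whose `User-Agent` header contains `Bytespider` and passes every other request through.

```python
class BlockBytespider:
    """Hypothetical WSGI middleware: refuse requests whose User-Agent contains 'Bytespider'."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if "bytespider" in user_agent.lower():
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Any other client reaches the wrapped application unchanged.
        return self.app(environ, start_response)

def hello_app(environ, start_response):
    """Minimal placeholder application used to demonstrate the middleware."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]

app = BlockBytespider(hello_app)
```

To try it locally, you could serve `app` with the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` and send requests with different `User-Agent` headers. Like any user-agent rule, it stops only clients that identify themselves as Bytespider.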

Sources

The facts on this page come from third-party crawler databases, because ByteDance publishes no bot documentation. Bot behavior changes over time; the operator's own site is the closest thing to an official source:

https://www.bytedance.com/
