
CCBot

Training crawler · Common Crawl

CCBot builds Common Crawl's open web dataset, widely reused to train AI models. Here is what it does, its user-agent string, and how to opt out.

What is CCBot?

CCBot is the crawler of the Common Crawl Foundation, a non-profit that builds “an open repository of web crawl data that is universally accessible.” The dataset is used for research, but it is also one of the most widely reused sources of AI model training data downstream.

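Training crawlers like CCBot collect web content to build datasets for training future AI models. Blocking them keeps your pages out of their future crawls, and so out of the training data built from them. It costs you no search traffic, because training crawlers do not send visitors to your site.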

The CCBot user-agent string

This is the user-agent string Common Crawl documents for CCBot. You will see it in your server logs when the bot visits.

CCBot/2.0 (https://commoncrawl.org/faq/)
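To pick these visits out of an access log, matching the `CCBot/` product token is usually enough. The Python sketch below assumes the Apache/nginx "combined" log format, where the user agent is the last double-quoted field. The function names and the sample log line are hypothetical illustrations, not part of Common Crawl's documentation.

```python
import re

# Matches the "CCBot/<version>" product token anywhere in a user-agent string.
# Case-insensitive, and any version number is accepted (not only 2.0).
CCBOT_TOKEN = re.compile(r"\bCCBot/\d+(?:\.\d+)*", re.IGNORECASE)


def is_ccbot_user_agent(user_agent: str) -> bool:
    """Return True if the user-agent string claims to be CCBot."""
    return CCBOT_TOKEN.search(user_agent) is not None


def user_agent_from_log_line(line: str) -> str:
    """Return the last double-quoted field of a combined-format log line.

    Assumes the user agent is the final quoted field; returns "" if the
    line contains no quoted fields at all.
    """
    fields = re.findall(r'"([^"]*)"', line)
    return fields[-1] if fields else ""


if __name__ == "__main__":
    # Hypothetical sample line for illustration only.
    sample = ('203.0.113.7 - - [01/Jan/2025:00:00:00 +0000] "GET / HTTP/1.1" '
              '200 512 "-" "CCBot/2.0 (https://commoncrawl.org/faq/)"')
    print(is_ccbot_user_agent(user_agent_from_log_line(sample)))  # True
```

A regex on the token is deliberately loose: it ignores the URL in parentheses, which could change between versions.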

Common Crawl warns that some crawlers falsely identify themselves as CCBot. To confirm that a visit is genuine, it offers reverse-DNS verification against the dedicated IP ranges CCBot crawls from.
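A common way to run this kind of check is forward-confirmed reverse DNS: look up the hostname for the requesting IP, check that it belongs to the operator's domain, then resolve that hostname forward and confirm it maps back to the same IP. The Python sketch below is a generic version of that technique, not Common Crawl's own tool. The hostname suffixes are a parameter on purpose: take the real reverse-DNS names and IP ranges from Common Crawl's CCBot page rather than guessing them. The lookup functions are injectable so the logic can be tested without network access.

```python
import socket
from typing import Callable, Iterable, List, Tuple

# Same shape as socket.gethostbyaddr / socket.gethostbyname_ex:
# both return (hostname, aliaslist, ipaddrlist).
Lookup = Callable[[str], Tuple[str, List[str], List[str]]]


def verify_crawler_ip(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse_lookup: Lookup = socket.gethostbyaddr,
    forward_lookup: Lookup = socket.gethostbyname_ex,
) -> bool:
    """Forward-confirmed reverse DNS check for an IP claiming to be a crawler.

    1. Reverse-resolve the IP to a hostname (PTR record).
    2. Require the hostname to end with one of the allowed suffixes.
       Suffixes should start with "." so "evil-example.org" cannot
       pass for ".example.org".
    3. Forward-resolve that hostname and require the original IP among
       the results, so a forged PTR record alone is not enough.

    Note: gethostbyname_ex only returns IPv4 addresses; IPv6 clients
    would need socket.getaddrinfo instead.
    """
    try:
        hostname, _, _ = reverse_lookup(ip)
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False

    hostname = hostname.rstrip(".").lower()
    if not any(hostname.endswith(s.lower()) for s in allowed_suffixes):
        return False

    try:
        _, _, addresses = forward_lookup(hostname)
    except OSError:
        return False

    return ip in addresses
```

In practice you would call `verify_crawler_ip(client_ip, suffixes)` only for requests whose user agent claims to be CCBot, and cache the result per IP, since each check costs two DNS lookups.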

How do I block CCBot in robots.txt?

Add one of these snippets to the robots.txt file at the root of your domain. When your robots.txt contains a group addressed to CCBot, CCBot follows that group instead of your User-agent: * rules; the two groups are not combined.

Block CCBot

Tells CCBot it may not access any page on your site.

User-agent: CCBot
Disallow: /
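To sanity-check the rule before deploying it, you can feed it to Python's standard-library `urllib.robotparser`. This is a minimal sketch: `example.com` and `OtherBot` are placeholders, and Python's parser is a simplified implementation that may not match every detail of how CCBot itself evaluates robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The "Block CCBot" snippet, as robots.txt text.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# CCBot is refused everywhere. A bot with no matching group (and no "*"
# group in this file) is allowed by default.
print(parser.can_fetch("CCBot", "https://example.com/blog/post"))     # False
print(parser.can_fetch("OtherBot", "https://example.com/blog/post"))  # True
```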

Allow CCBot

Explicitly allows CCBot, even when a broad Disallow rule blocks other bots.

User-agent: CCBot
Allow: /
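The same standard-library parser can confirm the precedence behavior described above: when a site blocks everyone with `User-agent: *` but adds an explicit CCBot group, CCBot follows only its own group. Again, this is a sketch with placeholder names (`example.com`, `OtherBot`) that uses Python's simplified `urllib.robotparser`, not CCBot's actual parser.

```python
from urllib.robotparser import RobotFileParser

# A site-wide block for every bot, plus the "Allow CCBot" group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: CCBot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# CCBot matches its own group, so the "*" group does not apply to it.
print(parser.can_fetch("CCBot", "https://example.com/page"))     # True
# A bot without its own group falls back to "*" and is blocked.
print(parser.can_fetch("OtherBot", "https://example.com/page"))  # False
```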

Does CCBot respect robots.txt?

Compliance is implied rather than stated as policy. Common Crawl's opt-out instructions tell you to add a User-agent: CCBot / Disallow: / group to your robots.txt, which only works because CCBot honors it. There is no separate, formal statement that CCBot complies with robots.txt.

Should you block CCBot?

Blocking CCBot is worth considering because its open dataset is reused far beyond Common Crawl's own research mission — many AI models are trained on Common Crawl data. If keeping your content out of AI training corpora matters to you, blocking CCBot is one of the highest-leverage single rules you can add.

Official documentation

The facts on this page come from Common Crawl's CCBot page. Bot behavior changes over time, so when in doubt, treat the operator's page as the source of truth:

https://commoncrawl.org/ccbot
