# ============================================================
# THE SUBSTANCE™ — Crawler Policy
#
# SEO indexing: WELCOME.
# AI / LLM training, dataset scraping, model fine-tuning,
# retrieval-augmentation ingestion: EXPRESSLY PROHIBITED.
#
# All content on this site (text, images, video, audio, code,
# metadata) is proprietary IP of THE SUBSTANCE™. Use of this
# content to train, fine-tune, evaluate, ground, or otherwise
# develop any machine-learning model or generative AI system
# is forbidden and constitutes a willful infringement of the
# rights holder's expressed reservation under EU Directive
# 2019/790 Art. 4(3), the UK CDPA, the US DMCA, and applicable
# international copyright law.
#
# Contact: try@thesubstance.co
# Full policy: /tdm-policy.txt / /ai.txt
# ============================================================

# --- Search engines (allowed) ---
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: DuckDuckBot
Allow: /

User-agent: Twitterbot
Allow: /

User-agent: facebookexternalhit
Allow: /

User-agent: LinkedInBot
Allow: /

User-agent: Slackbot
Allow: /

User-agent: Applebot
Allow: /

# --- User-initiated AI browsing (allowed; does not train) ---
User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Perplexity-User
Allow: /

# --- AI training / dataset crawlers (DISALLOWED) ---
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

User-agent: Meta-ExternalFetcher
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: cohere-training-data-crawler
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: Omgili
Disallow: /

User-agent: ImagesiftBot
Disallow: /

User-agent: img2dataset
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: AI2Bot
Disallow: /

User-agent: Ai2Bot-Dolma
Disallow: /

User-agent: Timpibot
Disallow: /

User-agent: DuckAssistBot
Disallow: /

User-agent: PanguBot
Disallow: /

User-agent: Kangaroo Bot
Disallow: /

User-agent: Scrapy
Disallow: /

User-agent: Webzio-Extended
Disallow: /

User-agent: ICC-Crawler
Disallow: /

User-agent: ISSCyberRiskCrawler
Disallow: /

# --- Default: allow legitimate crawlers ---
User-agent: *
Allow: /