
How to Make Your Website Visible to ChatGPT, Perplexity, and AI Search

AI search engines are sending real traffic now. ChatGPT, Perplexity, and Gemini answer questions by pulling from live websites. If your site is invisible to them, you are missing a growing source of qualified visitors. This guide covers everything you need, from robots.txt rules to llms.txt, structured data, and RSS feeds.

Training bots vs. retrieval bots: the difference that matters

There are two types of AI bots crawling your site, and they have very different purposes.

Type        User agent                        Purpose
Training    GPTBot, CCBot, Google-Extended    Scrapes content to train AI models
Retrieval   ChatGPT-User, PerplexityBot       Fetches content to cite in AI answers

Most site owners want to block training bots (your content should not train someone else's model for free) while allowing retrieval bots (you want to be cited in AI answers). The robots.txt below does exactly that.

Configure robots.txt for AI bots

Your robots.txt file controls which bots can access your site. Here is a template that blocks AI training crawlers while keeping your site visible to AI search engines and traditional search engines.

# robots.txt: AI bot rules

# Block AI training crawlers (they use your content to train models)
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

# Allow AI search/retrieval bots (they cite your content in answers)
User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Applebot-Extended
Allow: /

# Allow regular search engines
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

Sitemap: https://yoursite.com/sitemap.xml

Place this at the root of your site. In Next.js, use app/robots.ts to generate it programmatically.
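As a sketch of the programmatic approach, the rules above translate to an app/robots.ts like the following. The return shape matches what Next.js expects from this file (MetadataRoute.Robots); the type import is omitted here so the snippet stands alone, and the sitemap URL is the placeholder from the template above.

```typescript
// app/robots.ts — Next.js serves this as /robots.txt.
// In a real project, annotate the return type as MetadataRoute.Robots from "next".
export default function robots() {
  return {
    rules: [
      // Block AI training crawlers.
      {
        userAgent: ["GPTBot", "Google-Extended", "CCBot", "ClaudeBot"],
        disallow: "/",
      },
      // Allow AI search/retrieval bots.
      {
        userAgent: ["ChatGPT-User", "PerplexityBot", "Applebot-Extended"],
        allow: "/",
      },
      // Allow regular search engines.
      {
        userAgent: ["Googlebot", "Bingbot"],
        allow: "/",
      },
    ],
    sitemap: "https://yoursite.com/sitemap.xml",
  };
}
```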

The nosnippet trap

Some site owners add data-nosnippet or <meta name="robots" content="nosnippet"> to prevent Google from showing their content in featured snippets. The problem: this also prevents AI search engines from quoting your content.

If you use nosnippet, AI tools cannot cite your page in their answers. Your page becomes invisible to the fastest-growing search channel. Unless you have a specific legal or licensing reason, remove nosnippet from all public pages.

Add an llms.txt file

The llms.txt standard gives AI systems a structured summary of your site. Think of it as a README for AI crawlers. It lives at yoursite.com/llms.txt and takes about 5 minutes to write.

AI tools like ChatGPT, Perplexity, and Claude can read this file to quickly understand what your site does without crawling every page. Here is a template:

# YourProduct
> One sentence explaining what the product does.

## What it does
Plain English explanation of the core value. No marketing language.
Describe the main features in 2-3 sentences.

## Who it's for
Specific user type. Example: "Solo founders and developers who ship
their own products and want SEO handled automatically."

## Pricing
- Starter: $19 one-time (5 scans)
- Pro: $29/month (30 scans, API access, CI integration)

## Key pages
- Homepage: https://yoursite.com
- Pricing: https://yoursite.com/pricing
- Docs: https://yoursite.com/api-docs
- Blog: https://yoursite.com/blog

## API
REST API at https://yoursite.com/api/v1/
Authentication: Bearer token in Authorization header.
MCP server available at https://yoursite.com/mcp

For a deeper look at the format, see What is llms.txt and How to Add It to Your Site.

Use JSON-LD structured data

Structured data gives AI systems explicit, machine-readable signals about your content. Instead of guessing from your HTML, they get direct answers: what type of page this is, who wrote it, what questions it answers, and what your product costs.

The schemas that matter most for AI visibility:

  • FAQPage on any page with Q&A content. AI search engines pull directly from FAQ schema to generate answers.
  • BlogPosting on every article. Includes author, date, and headline, which AI tools use for citation accuracy.
  • SoftwareApplication on your landing page. Tells AI systems your product name, category, and pricing.
  • HowTo on tutorial pages. AI search loves step-by-step content and surfaces it directly in answers.
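To make the first of these concrete: FAQPage markup is a plain JSON object serialized into a script tag. A minimal sketch in TypeScript, reusing a question from the FAQ below (the exact question and answer text on your pages will differ):

```typescript
// Build a FAQPage JSON-LD payload per schema.org.
const faqJsonLd = {
  "@context": "https://schema.org",
  "@type": "FAQPage",
  mainEntity: [
    {
      "@type": "Question",
      name: "Does blocking AI training bots also block AI search?",
      acceptedAnswer: {
        "@type": "Answer",
        text: "No. Training bots and retrieval bots use different user agents, so you can block GPTBot while allowing ChatGPT-User.",
      },
    },
  ],
};

// In a Next.js page component, embed it in the document head or body:
// <script
//   type="application/ld+json"
//   dangerouslySetInnerHTML={{ __html: JSON.stringify(faqJsonLd) }}
// />
```

The same pattern works for BlogPosting, SoftwareApplication, and HowTo: build the object, serialize it, and emit it in a script tag of type application/ld+json.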

Add an RSS feed

RSS feeds give AI crawlers a structured, chronological list of your content. Perplexity and similar tools use RSS to discover new pages faster than traditional web crawling. If you publish blog posts, changelogs, or documentation updates, an RSS feed ensures AI systems find them quickly.

In Next.js, create an app/feed.xml/route.ts route handler that returns XML with Content-Type: application/rss+xml. Reference it in your HTML head with a <link rel="alternate" type="application/rss+xml"> tag. Also add the feed URL to your llms.txt file.
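A minimal sketch of such a route handler follows. The posts array is a stand-in for whatever CMS or filesystem source you actually publish from, and the channel title and URLs are placeholders:

```typescript
// app/feed.xml/route.ts — serves an RSS 2.0 feed at /feed.xml.
// Replace this hardcoded list with your real post source.
const posts = [
  {
    title: "Example post",
    url: "https://yoursite.com/blog/example",
    date: new Date("2024-01-15"),
  },
];

function buildRss(): string {
  const items = posts
    .map(
      (p) =>
        `<item><title>${p.title}</title><link>${p.url}</link>` +
        `<pubDate>${p.date.toUTCString()}</pubDate></item>`
    )
    .join("");
  return (
    `<?xml version="1.0" encoding="UTF-8"?>` +
    `<rss version="2.0"><channel>` +
    `<title>Your Blog</title>` +
    `<link>https://yoursite.com/blog</link>` +
    `<description>Latest posts</description>` +
    items +
    `</channel></rss>`
  );
}

export async function GET() {
  return new Response(buildRss(), {
    headers: { "Content-Type": "application/rss+xml" },
  });
}
```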

Common mistakes that block AI visibility

  • Blocking all bots with a blanket Disallow: / in robots.txt. This hides you from AI search, not just training.
  • Using nosnippet on public pages. AI tools respect this and will skip your content entirely.
  • Rendering all content client-side with JavaScript. AI crawlers often do not execute JS. Server-render your key pages.
  • No structured data at all. AI systems have to guess what your page is about instead of reading explicit schema.
  • No sitemap. AI crawlers use sitemaps to discover pages efficiently. Without one, they may miss important content.

Quick checklist

  • robots.txt distinguishes training bots from retrieval bots
  • llms.txt file at your site root with product summary and key links
  • JSON-LD on every page (FAQPage, BlogPosting, SoftwareApplication)
  • RSS feed for blog and changelog content
  • No nosnippet on public pages
  • Key pages are server-rendered, not client-only
  • Sitemap.xml is live and referenced in robots.txt

FAQ

Does blocking AI training bots also block AI search?

No. Training bots and retrieval bots use different user agents. You can block GPTBot (training) while allowing ChatGPT-User (retrieval) so your content still appears in ChatGPT search results without being used to train models.

What is llms.txt and do I need one?

llms.txt is a plain text file at your site root that gives AI systems a structured summary of your site. It helps AI search engines understand what your product does, who it is for, and where to find key pages. It takes 5 minutes to create and improves AI discoverability.

Will adding structured data help my site appear in AI answers?

Yes. JSON-LD structured data gives AI systems explicit signals about your content type, author, pricing, and FAQ answers. AI search engines like Perplexity and ChatGPT use structured data to generate accurate citations and rich answers.

Should I add an RSS feed for AI visibility?

Yes. RSS feeds give AI crawlers a structured, chronological list of your content. Perplexity and similar tools use RSS feeds to discover and index new content faster than traditional crawling.

Check your AI visibility in 30 seconds

SEOLint scans your site for AI search issues, including missing llms.txt, robots.txt problems, and broken structured data.