
Website crawler

Crawl public URLs and sitemaps to keep knowledge in sync with your site.

Go to Dashboard → Crawler (/crawl) to start website crawl jobs that fetch and ingest content for your knowledge base.

What it does

  • Visits URLs or sitemap entries you configure
  • Extracts readable text for chunking and embedding
  • Updates or adds chunks tied to your org’s knowledge store
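
To make the extract-and-chunk step more concrete, here is a minimal Python sketch. It is not WisebotAI's actual pipeline: the names extract_text and chunk_text, the word-based chunk size, and the overlap are all assumptions chosen for illustration, and it uses only the Python standard library.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data)


def extract_text(html):
    """Return the readable text of an HTML page with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def chunk_text(text, max_words=200, overlap=20):
    """Split text into word-based chunks that overlap by `overlap` words.

    The sizes are hypothetical defaults, not documented WisebotAI settings.
    """
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks
```

Overlapping chunks are a common way to avoid cutting a sentence's context at a chunk boundary; the real chunking strategy may differ.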

When to use it

  • Marketing sites that change frequently
  • Public documentation you want the agent to mirror
  • Large sites where manual PDF export is impractical

Runtime & limits

  • Crawls may run asynchronously and are subject to rate limits on both the site side and the WisebotAI side.
  • Robots.txt rules and paywalled pages may block content. Verify in the preview or logs, if your deployment exposes them.
  • Very large crawls can take 15–30+ minutes; schedule them for off-peak hours.
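
If you want to check ahead of time which URLs a site's robots.txt would block, the sketch below uses Python's standard urllib.robotparser. The user agent string "ExampleCrawler" is a placeholder (this page does not state the crawler's actual user agent), and the helper name allowed_urls is hypothetical.

```python
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_txt, urls, user_agent="ExampleCrawler"):
    """Return the URLs that the given robots.txt content permits.

    `user_agent` is a placeholder; substitute the agent your crawler sends.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


robots = "User-agent: *\nDisallow: /private/\n"
candidates = [
    "https://example.com/docs/intro",
    "https://example.com/private/report",
]
print(allowed_urls(robots, candidates))
```

This only covers robots.txt rules. Paywalled or login-gated pages still need to be checked in the preview or logs.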

Best practices

  • Start with important URLs or a sitemap section before full-site crawls.
  • Re-run after major site updates.
  • Combine with manual uploads for content that is not reachable on the web (for example, internal PDFs).
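
One way to start with a sitemap section is to filter the sitemap down to a single path prefix before submitting URLs. The Python sketch below assumes a standard sitemaps.org XML file; the function name sitemap_section and the "/docs/" prefix are illustrative only, and it does not handle sitemap index files.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_section(sitemap_xml, path_prefix):
    """Return the <loc> URLs whose path starts with `path_prefix`."""
    root = ET.fromstring(sitemap_xml)
    urls = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if urlparse(url).path.startswith(path_prefix):
            urls.append(url)
    return urls


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/intro</loc></url>
  <url><loc>https://example.com/docs/api</loc></url>
  <url><loc>https://example.com/blog/launch</loc></url>
</urlset>"""

print(sitemap_section(sample, "/docs/"))
```

Once the smaller crawl looks right, the same list can be widened by loosening the prefix.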