Website crawler
Crawl public URLs and sitemaps to keep knowledge in sync with your site.
Go to Dashboard → Crawler (/crawl) to start website crawl jobs, which fetch content from your site and ingest it into your knowledge base.
What it does
- Visits URLs or sitemap entries you configure
- Extracts readable text for chunking and embedding
- Adds new chunks to your organization's knowledge store, or updates existing ones
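The extraction and chunking steps above can be pictured with a minimal sketch. It is not WisebotAI's actual pipeline, which this page does not describe. The names `TextExtractor`, `extract_text`, and `chunk_words`, and the word-based chunk size and overlap, are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html):
    """Return the readable text of an HTML page as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def chunk_words(text, size=200, overlap=20):
    """Split text into chunks of up to `size` words, sharing `overlap` words.

    The size and overlap defaults are illustrative, not product settings.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

Calling `chunk_words(extract_text(page_html))` on a fetched page gives a list of overlapping text chunks that an embedding step could then consume.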
When to use it
- Marketing sites that change frequently
- Public documentation you want the agent to mirror
- Large sites where manual PDF export is impractical
Runtime & limits
- Crawls may run asynchronously and are subject to rate limits, both from the target site and from WisebotAI.
- robots.txt rules and paywalls may block some pages. If your deployment exposes a preview or logs, check them to confirm what was ingested.
- Very large crawls can take 15–30 minutes or more; schedule them for off-peak hours.
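Before starting a crawl, you can check locally which URLs a site's robots.txt would disallow. The sketch below uses Python's standard `urllib.robotparser`. The user agent string `ExampleCrawler` is a placeholder, since this page does not state which user agent the crawler sends, and `blocked_urls` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="ExampleCrawler"):
    """Return the URLs that the given robots.txt text disallows.

    `user_agent` is a placeholder; substitute the crawler's real
    user agent if you know it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Example with an inline robots.txt, so no network access is needed.
robots = "User-agent: *\nDisallow: /account/\n"
candidates = [
    "https://example.com/docs/intro",
    "https://example.com/account/billing",
]
print(blocked_urls(robots, candidates))
```

Any URL this returns is likely to be missing from the crawl results, assuming the crawler honors robots.txt.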
Best practices
- Start with important URLs or a sitemap section before full-site crawls.
- Re-run after major site updates.
- Combine crawls with manual uploads for content that is not reachable on the public web, such as internal PDFs.
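To start with one section of a sitemap, you can pre-filter its URLs by path and submit only those. The following is a rough sketch under stated assumptions: the sitemap follows the standard sitemaps.org `urlset` format, and `sitemap_section` is a hypothetical helper, not part of the product.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_section(sitemap_xml, path_prefix):
    """Return sitemap URLs whose path starts with `path_prefix`."""
    root = ET.fromstring(sitemap_xml)
    urls = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if urlparse(url).path.startswith(path_prefix):
            urls.append(url)
    return urls


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/setup</loc></url>
  <url><loc>https://example.com/blog/launch</loc></url>
  <url><loc>https://example.com/docs/faq</loc></url>
</urlset>"""

print(sitemap_section(sample, "/docs/"))
```

Crawling a list like this first lets you confirm the ingested content looks right before committing to a full-site crawl.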