Web Crawling

Import content directly from web pages into your knowledge base.

Overview

Web crawling lets you import content from web pages directly into your knowledge base without downloading files first. Provide a URL, and Raasie extracts the page content and processes it through the same chunking and embedding pipeline used for uploaded documents.

How It Works

1. Navigate to a knowledge base and select "Add from URL"
2. Enter the URL of the page you want to import
3. Raasie fetches the page, extracts the main content (stripping navigation, ads, and boilerplate), and processes it like any other document
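
To make the fetch-and-extract step more concrete, here is a minimal sketch in Python using only the standard library. It is not Raasie's actual extractor, which is not documented here. The names `fetch_html`, `MainContentExtractor`, `extract_main_text`, and the `SKIP_TAGS` set are all hypothetical, and the rule "drop anything inside nav, header, footer, aside, script, or style" is an assumption about what counts as boilerplate.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Tags whose contents this sketch treats as boilerplate.
# This set is an assumption, not Raasie's actual rule set.
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style", "noscript"}


class MainContentExtractor(HTMLParser):
    """Collects text nodes, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a boilerplate element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.parts.append(text)


def extract_main_text(html: str) -> str:
    """Return the non-boilerplate text of a static HTML page, one node per line."""
    parser = MainContentExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page as text (hypothetical helper; no retries or robots.txt handling)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `extract_main_text(fetch_html(url))` would yield plain text that could then be passed to a chunking step. A tag-based filter like this only sees static HTML, which is consistent with the guidance about JavaScript-rendered pages.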

Best Practices

- Use URLs that point to content-rich pages (documentation, articles, FAQ pages)
- Avoid pages that are heavily JavaScript-rendered, as the crawler extracts static HTML content
- Re-crawl pages periodically if the source content changes frequently
- For large sites, import the most important pages individually rather than attempting a full site crawl
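
Two of these practices can be checked on your side before importing a page. The sketch below is a rough illustration, not part of Raasie: `looks_js_rendered` flags pages whose static HTML carries very little text (the 200-character threshold is an arbitrary assumption), and `needs_recrawl` compares a stored SHA-256 fingerprint of the page text against the current one. All function names are hypothetical, and the regex-based tag stripping is a heuristic rather than a real HTML parser.

```python
import hashlib
import re


def visible_text(html: str) -> str:
    """Rough static-text estimate: drop script/style blocks, then all tags."""
    no_scripts = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", html)
    no_tags = re.sub(r"(?s)<[^>]+>", " ", no_scripts)
    return " ".join(no_tags.split())  # collapse whitespace


def looks_js_rendered(html: str, min_chars: int = 200) -> bool:
    """Heuristic: very little text in the static HTML suggests client-side rendering."""
    return len(visible_text(html)) < min_chars


def content_fingerprint(html: str) -> str:
    """SHA-256 of the visible text, so markup-only changes are ignored."""
    return hashlib.sha256(visible_text(html).encode("utf-8")).hexdigest()


def needs_recrawl(previous_fingerprint: str, html: str) -> bool:
    """True when the page text differs from what was last imported."""
    return content_fingerprint(html) != previous_fingerprint
```

Storing the fingerprint at import time and comparing it on a schedule means a page is only re-imported when its text has actually changed.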

Note: Web crawling is available on Starter plans and above. Processing time depends on the page size and server response time.