Supported Formats
File formats and specifications for document uploads.
Supported File Types
Raasie supports four document formats:
- PDF: Portable Document Format. Text is extracted from all pages. Scanned (image-only) PDFs require OCR and may have limited support.
- DOCX: Microsoft Word documents. Formatting is stripped; text content is preserved.
- HTML: Web pages. Main content is extracted; scripts, styles, and navigation are removed.
- TXT: Plain text files. Processed as-is with no conversion needed.
File Size Limits
The maximum file size is 50 MB per document. There is no minimum size, but very small files (under 100 bytes) may not produce meaningful chunks.
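The format and size rules above can be checked locally before uploading. The following is a minimal sketch, not part of Raasie itself: the `check_upload` helper, the mapping from formats to file extensions, and the reading of 50 MB as 50 × 1,048,576 bytes are all assumptions made for illustration.

```python
from pathlib import Path

# Limits taken from this page. The extension list and the exact byte
# value of "50 MB" are assumptions; check_upload is a hypothetical helper.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".html", ".txt"}
MAX_SIZE_BYTES = 50 * 1024 * 1024
SMALL_FILE_BYTES = 100


def check_upload(path: str) -> list[str]:
    """Return a list of problems found; an empty list means the file looks OK."""
    p = Path(path)
    problems = []
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {p.suffix or '(none)'}")
    size = p.stat().st_size
    if size > MAX_SIZE_BYTES:
        problems.append(f"file is {size} bytes; limit is {MAX_SIZE_BYTES}")
    elif size < SMALL_FILE_BYTES:
        problems.append("file is under 100 bytes and may not produce meaningful chunks")
    return problems
```

Calling `check_upload("report.pdf")` returns an empty list when the file passes all three checks, so the result can be used directly in an `if` statement.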
Encoding
Text files should use UTF-8 encoding. Other encodings may work but are not guaranteed. If you encounter encoding issues, convert your file to UTF-8 before uploading.
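One way to convert a file to UTF-8 is with a few lines of Python. This is a sketch under the assumption that you already know the file's current encoding (for example `cp1252` or `latin-1`); the function names are illustrative, and no encoding detection is attempted.

```python
from pathlib import Path


def convert_to_utf8(src: str, dst: str, source_encoding: str) -> None:
    """Re-encode a text file as UTF-8.

    source_encoding must be the file's real encoding (for example "cp1252");
    this sketch does not try to detect it.
    """
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")


def is_valid_utf8(path: str) -> bool:
    """Return True if the file's bytes decode cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

Running `is_valid_utf8` first tells you whether a conversion is needed at all. If the source encoding is guessed wrong, the conversion may still succeed but produce incorrect characters, so spot-check the output before uploading.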
Chunking Behavior
All formats go through the same processing pipeline. After text extraction, content is split into semantic chunks that capture complete thoughts or paragraphs. Each chunk is embedded into a 512-dimensional vector and indexed for search.
The number of chunks produced depends on the document length and content structure. You can view the chunk count for each document in your dashboard.
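To build intuition for why chunk counts vary with length and structure, here is a simplified paragraph-based splitter. It is only an illustration: this page does not describe Raasie's actual semantic chunking, and the `split_into_chunks` function and its `max_chars` parameter are hypothetical. The embedding step is not shown.

```python
import re


def split_into_chunks(text: str, max_chars: int = 1000) -> list[str]:
    """Group paragraphs into chunks of at most max_chars where possible.

    A paragraph longer than max_chars is kept whole rather than cut mid-thought.
    This is a stand-in for semantic chunking, not Raasie's algorithm.
    """
    # Paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow, so close the current chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

With this approach, a document made of many short paragraphs yields fewer, fuller chunks, while one with long paragraphs yields roughly one chunk per paragraph.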