Chunking

What is Chunking?

Chunking is the process of splitting large documents or web pages into smaller, semantically meaningful segments. It is a critical step in building retrieval-augmented generation (RAG) systems, because chunks are the units that get embedded, indexed, and retrieved.

Why Chunking Matters

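• Context windows: LLMs accept a limited number of tokens per request, so retrieved text has to be small enough to fit alongside the user's question.

• Precision: A smaller chunk covers fewer topics, so it tends to match a specific query more closely than a whole page would.

• Relevance: Only the best-matching chunks are sent to the AI, not entire documents.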

Chunking Strategies

| Strategy | Description | Best For |
|----------|-------------|----------|
| Fixed-size | Split every N characters/tokens | Simple documents |
| Semantic | Split at paragraph/section boundaries | Structured content |
| Recursive | Progressively split using multiple separators | General purpose |
| Sentence | Split at sentence boundaries | FAQ content |
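As a rough illustration of two rows in the table, the sketch below implements a fixed-size splitter and a recursive splitter in plain Python. The function names, the character-based length limit, and the separator order (`"\n\n"`, `"\n"`, `". "`, `" "`) are assumptions made for this example; they are not taken from any particular library.

```python
def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Fixed-size strategy: cut every `size` characters, ignoring content."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


def recursive_chunks(text: str, max_len: int,
                     separators: tuple[str, ...] = ("\n\n", "\n", ". ", " ")) -> list[str]:
    """Recursive strategy: split on the coarsest separator first, then retry
    any piece that is still too long with the next, finer separator."""
    if len(text) <= max_len:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to a hard fixed-size cut.
        return fixed_size_chunks(text, max_len)
    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_len:
            current = candidate  # keep merging small pieces together
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= max_len:
            current = piece
        else:
            chunks.extend(recursive_chunks(piece, max_len, finer))
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = ("Para one is short.\n\n"
              "Para two is a bit longer than the first one.\n\n"
              "Third.")
    for chunk in recursive_chunks(sample, max_len=30):
        print(len(chunk), repr(chunk))
```

The recursive version keeps paragraphs whole when they fit and only drops down to word-level splits for the paragraph that exceeds the limit, which is why it is usually a reasonable general-purpose default.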

Optimal Chunk Size

There's no one-size-fits-all answer, but these are common starting points:

• 100-200 tokens for FAQ-style content

• 200-500 tokens for Q&A and support chatbots

• 500-1000 tokens for detailed documentation
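One way to apply a token budget like the ones listed above is to pack whole sentences into a chunk until the budget is reached. The sketch below is a minimal example under stated assumptions: `count_tokens` is a crude whitespace word count standing in for a real tokenizer (actual token counts depend on the model), and the `SIZE_GUIDELINES` keys are hypothetical labels for the three ranges.

```python
import re

# Token ranges copied from the guidelines above; the keys are illustrative labels.
SIZE_GUIDELINES = {
    "faq": (100, 200),
    "support": (200, 500),
    "docs": (500, 1000),
}


def count_tokens(text: str) -> int:
    """Assumption: approximate one token per whitespace-separated word."""
    return len(text.split())


def pack_sentences(text: str, max_tokens: int) -> list[str]:
    """Greedily group whole sentences into chunks of at most `max_tokens`.

    A single sentence longer than the budget is kept as its own
    oversized chunk rather than being cut mid-sentence.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    used = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        if current and used + n > max_tokens:
            chunks.append(" ".join(current))
            current, used = [], 0
        current.append(sentence)
        used += n
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    _, upper = SIZE_GUIDELINES["faq"]
    article = "Chunking splits text. " * 150
    for chunk in pack_sentences(article, max_tokens=upper):
        print(count_tokens(chunk))
```

Swapping `count_tokens` for a model-specific tokenizer would make the budget exact; the packing logic itself stays the same.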

In Samviq

When you crawl your website, Samviq automatically chunks each page into optimal segments, generates embeddings for each chunk, and indexes them for fast retrieval.
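The chunk, embed, index, and retrieve flow described above can be sketched generically as follows. This is not Samviq's implementation, which is not documented here. Every name in the block is hypothetical, the "embedding" is a toy word-count vector standing in for a learned embedding model, and a plain Python list stands in for a vector database.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(pages: dict[str, str]) -> list[tuple[str, str, Counter]]:
    """Chunk each page at paragraph boundaries and store (url, chunk, vector)."""
    index = []
    for url, body in pages.items():
        for paragraph in body.split("\n\n"):
            paragraph = paragraph.strip()
            if paragraph:
                index.append((url, paragraph, embed(paragraph)))
    return index


def search(index: list[tuple[str, str, Counter]], query: str, k: int = 1) -> list[tuple[str, str]]:
    """Return the k chunks whose vectors are most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda entry: cosine(q, entry[2]), reverse=True)
    return [(url, chunk) for url, chunk, _ in ranked[:k]]


if __name__ == "__main__":
    pages = {
        "/pricing": "Plans start at ten dollars per month.\n\nAnnual billing gives a discount.",
        "/help": "Reset your password from the login page.",
    }
    index = build_index(pages)
    print(search(index, "how do I reset my password"))
```

Only the top-ranked chunks would then be passed to the language model as context, which is where chunk size and strategy directly affect answer quality.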

Related Terms

RAG (Retrieval-Augmented Generation)

Embedding

Vector Database
