May 29, 2025
Level Up Your GenAI Apps: Choosing Your Tools
Maria Khalusova
RAG
In the previous blog posts of this series (part 1, part 2, part 3, part 4), you’ve explored the core concepts of RAG, understood its limitations, and dug into a rich array of advanced techniques designed to overcome those hurdles. You've also seen how fundamental data preprocessing, and even more advanced "power-ups," are critical enablers for these sophisticated strategies. Now, it's time to get practical: how do you choose the right combination of tools and techniques for your specific RAG application?
Building an advanced RAG system isn't a one-size-fits-all endeavor. The optimal architecture depends heavily on your data, your use case, your budget, and your performance requirements. This blog post will guide you through the decision-making process, comparing different strategies for chunking, indexing, and contextualization, helping you craft a RAG pipeline that is both powerful and pragmatic.
Chunking Strategies Compared: Which to Use When?
We've already established that chunking is more than just splitting text. It's about intelligently segmenting your data to optimize for both retrieval precision and LLM comprehension. The choice of chunking strategy is foundational, influencing everything downstream.
Fixed-Size Chunking (with overlap):
When to use: When your documents have highly uniform content distribution, or when semantic boundaries are not critical (e.g., very long, unstructured logs or conversational transcripts where each segment is roughly equally important). It's the simplest and fastest to implement.
Limitations: This method remains largely agnostic to semantic meaning and document structure. If your content has headings, tables, or distinct sections, fixed-size chunking will inevitably break these logical units, leading to fragmented context and potentially poorer retrieval.
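To make the mechanics concrete, here is a minimal sketch of fixed-size chunking with overlap; the size and overlap values and the input file are illustrative and should be tuned to your embedding model's context window:

```python
def fixed_size_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]

# Example: a long transcript becomes overlapping 500-character windows
chunks = fixed_size_chunks(open("transcript.txt", encoding="utf-8").read())
```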
Recursive Character-Based Chunking:
When to use: A step up from fixed-size, this is a good default when you want a balance of simplicity and a basic attempt at preserving natural breaks. Useful for a wide variety of text documents (e.g., articles, reports) where paragraph and sentence boundaries are generally good indicators of semantic units.
Limitations: Still driven by character limits, not by deep semantic understanding. While better at preserving sentences and paragraphs, it can still split critical information that spans across these boundaries if a chunk becomes too large.
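If you do opt for this approach, LangChain's RecursiveCharacterTextSplitter is a common off-the-shelf implementation; a minimal sketch, where the sizes and the input file are illustrative:

```python
# pip install langchain-text-splitters
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # maximum characters per chunk
    chunk_overlap=200,  # characters shared between adjacent chunks
    separators=["\n\n", "\n", " ", ""],  # tried in order: paragraphs, then lines, then words
)
chunks = splitter.split_text(open("article.txt", encoding="utf-8").read())
```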
Unstructured's Smart Chunking Strategies (Basic, By Title, By Page, By Similarity):
When to use: These strategies are highly recommended for virtually all complex document types found in enterprise settings (PDFs, Word documents, HTML, etc.).
Basic: A solid general-purpose choice for most documents where logical breaks (paragraphs, list items) should be preserved.
By Title: Ideal for highly structured documents with clear hierarchical headings (e.g., manuals, technical reports, legal filings). This ensures that all content under a specific section heading stays together, providing rich, self-contained context.
By Page: Best for documents where visual layout or page-specific context is paramount (e.g., scanned documents, forms, legal documents with page-specific annotations).
By Similarity: A more experimental option, powerful for documents where conceptual themes don't follow strict structural boundaries. This method helps group related information even if it's scattered across the document.
Limitations: Only available when processing documents with Unstructured.
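As an illustration, here is roughly what the "By Title" strategy looks like with the open-source unstructured library; the file name and parameter values are placeholders:

```python
# pip install "unstructured[pdf]"
from unstructured.partition.pdf import partition_pdf
from unstructured.chunking.title import chunk_by_title

# Partition the document into structured elements (titles, paragraphs, tables, ...)
elements = partition_pdf(filename="technical_report.pdf")

# Group elements into chunks that never cross a section-title boundary
chunks = chunk_by_title(
    elements,
    max_characters=1000,             # hard ceiling on chunk size
    combine_text_under_n_chars=200,  # merge very small sections forward
)
```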
Contextual Chunking:
When to use: Great for use cases where documents span many shifting topics and it's impractical to rely on a fixed set of metadata tags for pre-filtering, for example, highly heterogeneous documents like cross-functional meeting notes. In such cases it's difficult to define metadata tags that cover every category, but contextual chunking adapts to the content itself, enriching each chunk with document-level context to improve embedding-based retrieval.
Limitations: Adds overhead to generate the context prefix for each chunk, so data processing takes longer. Only available for select customers in Unstructured.
Start with Unstructured's Smart Chunking strategies, optionally with Contextual Chunking enabled, if it makes sense in your use case. Experiment with "By Title" for structured documents and "Basic" for less formal ones. Avoid naive fixed-size or recursive chunking for enterprise applications unless you have a very specific, simple use case where semantic context is irrelevant.
Indexing Strategies Compared: Vector DBs vs. Hybrid Search vs. Graph DBs
The choice of how you index and store your data is just as critical as how you chunk it. It dictates your retrieval capabilities, scalability, and the types of queries your RAG system can effectively handle.
Vector Databases (Pure Semantic Search):
Mechanism: Stores text chunks as high-dimensional vectors, enabling semantic similarity search. Queries are also converted to vectors and compared to the indexed embeddings.
When to use:
When your primary need is to find semantically similar information based on meaning, regardless of exact keyword matches.
For content where vocabulary can vary, but the underlying concepts are consistent.
As a foundational layer for most RAG applications, as semantic search is a core component.
Pros: Excellent at capturing semantic meaning and handling synonymy. Scales well to large datasets using approximate nearest neighbor (ANN) algorithms.
Cons: Struggles with exact keyword matches, out-of-vocabulary terms, proper nouns, or very specific identifier searches. Can suffer if irrelevant chunks are semantically similar to queries.
Examples: Pinecone, Weaviate, Qdrant, Milvus, Astra DB, and others.
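To make the mechanism concrete, here is a minimal sketch of pure semantic search with an in-memory index standing in for a vector database; the model name and sample texts are illustrative:

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedding model

chunks = [
    "Quarterly revenue grew 12% year over year.",
    "The new office opened in Berlin last spring.",
]
chunk_vecs = model.encode(chunks, normalize_embeddings=True)

# Embed the query with the same model, then rank chunks by cosine similarity
query_vec = model.encode("How fast did sales increase?", normalize_embeddings=True)
scores = chunk_vecs @ query_vec  # dot product == cosine for normalized vectors
print(chunks[int(scores.argmax())])
```

Note that the query and the first chunk share no keywords; the match comes entirely from the embeddings, which is exactly the strength (and the blind spot) of pure semantic search.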
Hybrid Search (Vector DB + Keyword Search, e.g., BM25):
Mechanism: Combines semantic (vector) search with traditional keyword-based search (like BM25). Results are merged or re-ranked to leverage the strengths of both.
When to use:
For applications requiring robust retrieval across a wide range of query types, including both semantic queries and exact keyword searches (e.g., product codes, names, specific dates).
When your data contains a mix of natural language content and highly specific, searchable terms.
When dealing with acronyms, domain-specific jargon, or new terms that embedding models might not fully capture.
Pros: Significantly improves recall by covering both semantic and lexical gaps. More robust to diverse user queries.
Cons: Requires tuning how much each type of search contributes to the final results, for example via score weighting or a rank-fusion method.
Examples: Weaviate, Elasticsearch, MongoDB, Astra DB, and others.
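One popular way to merge the two result lists without hand-tuning score weights is Reciprocal Rank Fusion (RRF); a minimal sketch, where k=60 is the constant from the original RRF paper:

```python
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of document IDs; k dampens the advantage of top ranks."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

semantic_hits = ["doc3", "doc1", "doc7"]  # ranked results from the vector index
keyword_hits = ["doc7", "doc2", "doc3"]   # ranked results from BM25
print(reciprocal_rank_fusion([semantic_hits, keyword_hits]))
```

Because RRF only looks at ranks, not raw scores, it sidesteps the problem of combining incomparable similarity scales from the two retrievers.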
Graph Databases (Knowledge Graphs):
Mechanism: Stores information as interconnected entities and relationships (nodes and edges). Retrieval involves traversing the graph to find relevant subgraphs or paths based on query patterns.
When to use:
For complex queries requiring multi-hop reasoning or understanding relationships between disparate facts (e.g., "Who managed the project that produced the highest revenue in Q3, and what was their previous role?").
When traceability and explainability of answers are paramount.
For highly structured or semi-structured data where explicit relationships are well-defined (e.g., organizational charts, product catalogs, scientific ontologies, legal precedents).
When building a knowledge graph from unstructured text using techniques like Unstructured's NER enrichment.
Pros: Excellent for complex relational queries and multi-hop reasoning. Provides highly structured and explainable answers. Reduces hallucinations by grounding answers in explicit relationships.
Cons: Building a high-quality knowledge graph requires robust entity and relationship extraction. Can be less effective for purely semantic "fuzzy" searches compared to vector databases.
Examples: Neo4j, Astra DB with Graph Retriever library (for lightweight KGs).
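As an illustration, the multi-hop question above could map to a single Cypher traversal; the node labels, relationship types, and connection details below are invented for this sketch, not a prescribed schema:

```python
# pip install neo4j
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Hop 1: find the Q3 project with the highest revenue and its manager.
# Hop 2: follow the manager to their previous role.
query = """
MATCH (p:Person)-[:MANAGES]->(proj:Project {quarter: 'Q3'})
WITH p, proj ORDER BY proj.revenue DESC LIMIT 1
MATCH (p)-[:PREVIOUSLY_HELD]->(r:Role)
RETURN p.name AS manager, proj.name AS project, r.title AS previous_role
"""
records, _, _ = driver.execute_query(query)
for record in records:
    print(record["manager"], record["project"], record["previous_role"])
driver.close()
```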
For most advanced RAG applications, a Hybrid Search approach is the optimal starting point. It offers the best balance of recall and precision across diverse query types. Integrate metadata pre-filtering capabilities (available in most modern vector databases) to further narrow down search space based on structured attributes.
Consider Graph Databases if your application frequently requires multi-hop reasoning, strict factual consistency, or if you need to leverage the explicit relationships within your data for explainability. Remember that building the knowledge graph can be a significant effort, but Unstructured's NER enrichment can streamline this.
Metadata Pre-filters vs. Contextual Chunking: A Synergistic Relationship
These two techniques, while distinct, are highly complementary and should ideally be used in conjunction to maximize retrieval performance.
Metadata Pre-filters:
Purpose: To restrict the search space before the similarity search, ensuring that only document chunks meeting specific structured criteria are considered.
Mechanism: Leverages structured metadata (e.g., document_type, author, date, page_number, etc.) associated with each chunk. When a query comes in, these filters are applied first.
Strengths: Incredibly efficient and precise for narrowing down results based on explicit, factual attributes. Prevents irrelevant documents from ever entering the semantic search, significantly boosting precision and reducing noise. Essential for avoiding false positives from semantically similar but factually incorrect matches (e.g., retrieving a "risk factors" section from the wrong company's filing).
Limitations: Only effective if the necessary metadata is accurately extracted and consistently available. Cannot filter based on semantic content alone.
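As a sketch of what this looks like in practice, here is a pre-filtered query in Qdrant; most modern vector databases expose an equivalent filter syntax, and the collection, field names, and values below are illustrative:

```python
# pip install qdrant-client sentence-transformers
from qdrant_client import QdrantClient, models
from sentence_transformers import SentenceTransformer

client = QdrantClient(url="http://localhost:6333")
query_vector = SentenceTransformer("all-MiniLM-L6-v2").encode(
    "What are ACME Corp's main risk factors?"
).tolist()

# The structured filter narrows the candidate set before similarity search runs
hits = client.query_points(
    collection_name="filings",
    query=query_vector,
    query_filter=models.Filter(
        must=[
            models.FieldCondition(key="company", match=models.MatchValue(value="ACME Corp")),
            models.FieldCondition(key="document_type", match=models.MatchValue(value="10-K")),
        ]
    ),
    limit=5,
)
```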
Contextual Chunking:
Purpose: To embed additional semantic context directly into each chunk, making the chunk itself more self-contained and semantically "richer" for the embedding model.
Mechanism: Prepends information like the parent document's brief summary to the actual text content of a chunk before it is embedded.
Strengths: Improves the quality of the embeddings themselves, leading to better semantic similarity matches. Addresses the "fragmented context" problem inherent in chunking.
Limitations: Slows data processing and slightly increases the token count of each chunk, which can raise embedding costs. Only available to select customers in Unstructured.
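Conceptually, the technique can be sketched outside the platform as well: generate a short document-level summary once, then prepend it to every chunk before embedding. The prompt, model choice, and helper names below are hypothetical, not Unstructured's implementation:

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def summarize(document: str) -> str:
    """Generate one short, document-level summary (done once per document)."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Summarize this document in two sentences:\n\n{document}"}],
    )
    return response.choices[0].message.content

def contextualize(chunks: list[str], document: str) -> list[str]:
    """Prepend the document summary to each chunk before it is embedded."""
    summary = summarize(document)
    return [f"Document context: {summary}\n\n{chunk}" for chunk in chunks]
```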
Always leverage metadata pre-filtering when available and relevant. It's often the lowest-hanging fruit for significant precision gains. Additionally, complement this by enabling Contextual Chunking for all your documents, especially longer and more complex ones. These two techniques can work together to ensure that your RAG system retrieves not just relevant information, but accurately contextualized relevant information.
Putting it all Together: Designing Your RAG Pipeline
Building a successful RAG system involves making deliberate choices at every stage. Here’s a summary of the recommended approach for designing your advanced RAG pipeline:
Ingestion: Ensure robust, scalable ingestion from all your enterprise data sources. Prioritize solutions that handle diverse formats, preserve metadata, and support incremental updates, so you avoid maintaining connectors yourself or stitching ingestion scripts together with glue code.
Extraction & Partitioning: Use intelligent partitioning to break down documents into semantically meaningful elements, preserving structure and capturing rich metadata. This is the bedrock for all subsequent advanced techniques.
Chunking: Choose a structurally-aware chunking strategy (e.g., "By Title" for structured docs, "Basic" for others) and optionally enable contextual chunking to infuse document-level context into each chunk.
Enrichment:
Multimodal: Don't ignore images and tables. Use image and table descriptions to extract and embed their meaning, making your visual data searchable. Convert table images to HTML for better structure preservation.
NER: Extract named entities and relationships. This structured data can be used for metadata filtering and, more importantly, for building knowledge graphs.
Embedding Generation: Select an embedding model suitable for your domain and budget. Use the same model for indexing and query embedding.
Indexing: Opt for a vector database that supports hybrid search (semantic + keyword) and robust metadata filtering. This combination provides the best balance of recall and precision.
Retrieval Enhancements:
Re-ranking: Implement a re-ranking step (using a cross-encoder or LLM) after initial retrieval to boost the relevance of the top results presented to the LLM (see the sketch after this list).
Parent-Document Retrieval: Use this strategy to balance chunk size. Index smaller, precise chunks but retrieve larger context windows from the parent document or page when a relevant chunk is found. Unstructured's metadata makes this seamless.
Query Transformation: Employ LLMs to refine user queries before retrieval. This addresses ambiguity, bridges vocabulary gaps, and handles complex multi-hop questions.
Advanced RAG (GraphRAG / Agentic RAG - if needed):
GraphRAG: If your use case requires complex relational reasoning or high explainability, invest in building and querying a knowledge graph (using your NER extracted entities).
Agentic RAG: For the most complex, adaptive, and tool-using applications, explore Agentic RAG, but be mindful of the increased latency, cost, and complexity. Start simple and only introduce agents when truly necessary.
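To ground the re-ranking step mentioned above, here is a minimal cross-encoder sketch using the sentence-transformers library; the model name and candidate texts are illustrative:

```python
# pip install sentence-transformers
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "What drove revenue growth in Q3?"
candidates = [  # top-k chunks returned by the initial hybrid retrieval
    "Q3 revenue rose 12%, driven by enterprise subscriptions.",
    "The Berlin office opened in Q3.",
    "Marketing spend was flat quarter over quarter.",
]

# The cross-encoder scores each (query, chunk) pair jointly, which is slower
# than embedding similarity but far more accurate for the final ordering
scores = reranker.predict([(query, c) for c in candidates])
reranked = [c for _, c in sorted(zip(scores, candidates), reverse=True)]
print(reranked[0])
```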
By making informed choices at each of these stages, you can move beyond the limitations of "naive RAG" and build a sophisticated, high-performing GenAI application that truly levels up your capabilities. The journey from raw data to intelligent answers is complex and nuanced, but with the right tools and strategies, it's a journey you can master.