Jan 22, 2025
Qdrant Integration in the Unstructured Platform
Maria Khalusova & Paul Cornell
Unstructured
In the development of Retrieval Augmented Generation (RAG) systems, data preprocessing remains a critical challenge.
Building production-ready RAG systems involves several complex steps, but one of the most challenging aspects is preparing diverse unstructured data for vector storage. Data engineers often struggle with:
Converting multiple document formats into a consistent, structured format.
Maintaining document metadata through the preprocessing pipeline.
Implementing efficient chunking strategies that preserve context.
Managing batch processing and incremental updates at scale.
Ensuring security and compliance throughout the data pipeline.
Unstructured simplifies this process through standardized data transformation and an extensive ecosystem of connectors. Today, we’re excited to highlight our recently added Qdrant integration in the Unstructured Platform, which helps data teams build more robust and efficient RAG pipelines.
Why Qdrant?
Qdrant is an AI-native vector database and semantic search engine designed for storing, indexing, and searching embeddings: dense vector representations of data such as text, images, or audio. It is optimized for fast, efficient similarity search (nearest neighbor search), a key operation in AI and machine learning systems such as RAG systems.
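To make nearest neighbor search concrete, here is a minimal brute-force sketch in plain Python: it scores every stored vector against a query with cosine similarity and keeps the top k. It only illustrates the operation. Qdrant itself builds an approximate index (HNSW) so it does not have to compare the query with every stored vector. The function names and toy vectors are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def nearest_neighbors(
    query: Sequence[float],
    points: dict[str, Sequence[float]],
    k: int = 3,
) -> list[tuple[str, float]]:
    """Exact k-nearest-neighbor search: score every stored vector, keep the top k."""
    scored = [(point_id, cosine_similarity(query, vec)) for point_id, vec in points.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real embedding models produce far more dimensions.
    stored = {
        "doc-a": [1.0, 0.0, 0.0],
        "doc-b": [0.9, 0.1, 0.0],
        "doc-c": [0.0, 1.0, 0.0],
    }
    print(nearest_neighbors([1.0, 0.05, 0.0], stored, k=2))
```

The exact scan costs one comparison per stored vector, which is why a dedicated vector database with an approximate index matters once a collection holds millions of chunks.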
You can use the Qdrant destination connector in the Unstructured Platform to upload data processed by Unstructured, in batches, to a collection on a Qdrant Cloud cluster.
Qdrant integration in the Unstructured Platform
Here's how the Qdrant destination connector fits into an Unstructured Platform workflow:
The Unstructured Platform ingests documents from their original sources, preserving critical metadata.
A workflow in the Platform processes your raw documents into structured JSON, optionally enriches the content (for example, by generating table summaries), creates chunks, and generates embedding vectors (configurable with your preferred embedding model) for the document content.
The connector efficiently manages batch uploads to your Qdrant collection. You can easily customize the batch size to fit your requirements.
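The last two steps above can be sketched roughly as follows, assuming each processed element is a JSON object with element_id, text, type, metadata, and embeddings keys. The sketch turns elements into Qdrant-style points (ID, vector, payload) and groups them into batches of at most batch_size. The helper names and payload layout are hypothetical; this is not the connector's actual code.

```python
import uuid
from itertools import islice
from typing import Any, Iterable, Iterator


def element_to_point(element: dict[str, Any]) -> dict[str, Any]:
    """Turn one embedded element into a Qdrant-style point dict (illustrative layout)."""
    return {
        # Qdrant point IDs must be unsigned integers or UUIDs, so derive a
        # deterministic UUID from the element ID: same element, same point ID.
        "id": str(uuid.uuid5(uuid.NAMESPACE_OID, element["element_id"])),
        "vector": element["embeddings"],
        # Keep the text and original metadata next to the vector for filtering and display.
        "payload": {
            "text": element.get("text", ""),
            "element_type": element.get("type"),
            "metadata": element.get("metadata", {}),
        },
    }


def batched(items: Iterable[Any], batch_size: int) -> Iterator[list[Any]]:
    """Yield lists of at most batch_size items; the last batch may be smaller."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, batch_size)):
        yield batch


if __name__ == "__main__":
    elements = [
        {
            "type": "NarrativeText",
            "element_id": f"el-{i}",
            "text": f"chunk {i}",
            "metadata": {"filename": "report.pdf"},
            "embeddings": [0.1 * i, 0.2, 0.3],
        }
        for i in range(5)
    ]
    points = (element_to_point(e) for e in elements)
    for batch in batched(points, batch_size=2):
        # A real uploader would send each batch to the Qdrant collection here.
        print(len(batch), [p["payload"]["text"] for p in batch])
```

Deriving point IDs deterministically means that re-processing an unchanged element overwrites its existing point instead of creating a duplicate, which is one way to keep incremental updates idempotent.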
Our Qdrant integration adheres to enterprise-grade security standards including:
End-to-end encryption for data in transit.
Authentication via API keys.
Zero data persistence policy, and more.
You can learn more about the enterprise-grade features of the Unstructured Platform connectors in this blog post.
Configuring the connector
Before you configure the Qdrant connector, you will need:
A Qdrant Cloud account with a cluster in that account.
The cluster's URL. To get this URL, do the following:
Sign in to your Qdrant Cloud account.
On the sidebar, under Dashboard, click Clusters.
Click the cluster's name.
Note the value of the Endpoint field, for example:
https://<random-number>.<region-id>.<cloud-provider>.cloud.qdrant.io
The name of the target collection on the cluster.
An API key for the cluster.
Qdrant requires the target collection to exist before Unstructured can write to it. The following example code demonstrates how to use the Qdrant Python client to create a collection on a Qdrant Cloud cluster, configuring the collection for vectors with 3072 dimensions. The collection's vector size must match the number of dimensions produced by the embedding model you select in your workflow. In this example, set the environment variables beginning with QDRANT to your cluster’s URL, API key, and collection name:
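A minimal sketch of this step with the qdrant-client package (pip install qdrant-client) might look like the following. It assumes the environment variables are named QDRANT_URL, QDRANT_API_KEY, and QDRANT_COLLECTION_NAME (hypothetical names), uses cosine distance (an assumption; pick the metric your embedding model calls for), and relies on collection_exists, which is available in recent qdrant-client releases.

```python
import os

# Hypothetical variable names; adjust them to match your own setup.
ENV_VARS = ("QDRANT_URL", "QDRANT_API_KEY", "QDRANT_COLLECTION_NAME")

# Must equal the output dimension of the embedding model used in the workflow.
VECTOR_SIZE = 3072


def read_qdrant_settings() -> dict[str, str]:
    """Read the cluster URL, API key, and collection name from the environment."""
    missing = [name for name in ENV_VARS if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Set these environment variables first: {', '.join(missing)}")
    return {name: os.environ[name] for name in ENV_VARS}


def create_collection(settings: dict[str, str], vector_size: int = VECTOR_SIZE) -> None:
    """Create the target collection on the cluster unless it already exists."""
    # Imported here so read_qdrant_settings can be used without qdrant-client installed.
    from qdrant_client import QdrantClient, models

    client = QdrantClient(url=settings["QDRANT_URL"], api_key=settings["QDRANT_API_KEY"])
    name = settings["QDRANT_COLLECTION_NAME"]
    if client.collection_exists(collection_name=name):
        print(f"Collection {name!r} already exists; leaving it unchanged.")
        return
    client.create_collection(
        collection_name=name,
        # One unnamed dense vector per point, compared with cosine distance (assumed).
        vectors_config=models.VectorParams(size=vector_size, distance=models.Distance.COSINE),
    )
    print(f"Created collection {name!r} with {vector_size}-dimensional vectors.")
```

With the variables exported, calling create_collection(read_qdrant_settings()) once is enough; the existence check makes it safe to re-run.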
To configure the Qdrant destination connector by using the Unstructured Platform user interface:
On the sidebar, click Connectors.
Click New or Create Connector.
Enter a unique Name for the connector.
For Type, make sure Destination is selected.
Click Qdrant, and then click Continue.
Fill in the Qdrant Cloud cluster’s URL, collection name, and API key.
Compare your results to the following screenshot, and then click Save and Test.

To create a workflow for this connector by using the Unstructured Platform user interface, see Create a workflow in the Unstructured documentation.
To configure the Qdrant destination connector by using the Unstructured Platform API instead, you can run the following curl command. In this command, set the environment variables beginning with UNSTRUCTURED to the Unstructured Platform API URL and your Unstructured Platform API key, and fill in your Qdrant Cloud cluster’s URL, collection name, and API key, plus, optionally, a batch size representing the maximum number of records to transmit at a time:
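As a rough Python counterpart to that curl call, the sketch below builds the POST request with only the standard library. The endpoint path (/destinations), the connector type string (qdrant-cloud), the unstructured-api-key header, and the config field names are assumptions; check them against the Unstructured documentation before use. Nothing is sent until you pass the request to send().

```python
import json
import os
import urllib.request
from typing import Any, Optional


def build_qdrant_destination_request(
    name: str,
    cluster_url: str,
    collection_name: str,
    qdrant_api_key: str,
    batch_size: Optional[int] = None,
) -> urllib.request.Request:
    """Build (but do not send) a request that creates a Qdrant destination connector."""
    api_url = os.environ["UNSTRUCTURED_API_URL"].rstrip("/")
    # Assumed field names for the connector configuration.
    config: dict[str, Any] = {
        "url": cluster_url,
        "collection_name": collection_name,
        "api_key": qdrant_api_key,
    }
    if batch_size is not None:
        # Maximum number of records to transmit at a time.
        config["batch_size"] = batch_size
    body = {"name": name, "type": "qdrant-cloud", "config": config}
    return urllib.request.Request(
        url=f"{api_url}/destinations",  # assumed endpoint path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "accept": "application/json",
            "content-type": "application/json",
            "unstructured-api-key": os.environ["UNSTRUCTURED_API_KEY"],
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> Any:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it easy to inspect the payload (for example, print(request.data)) before anything reaches the API.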
To create a workflow for this connector by using the Unstructured Platform API, see Create a workflow in the Unstructured documentation.
Get started!
If you're already an Unstructured Platform user, the Qdrant destination connector is available in your dashboard today! New users can sign up for the Unstructured Platform here.
Expert access
Need a tailored setup for your specific use case? Our engineering team is available to help optimize your implementation. Book a consultation session to discuss your requirements here.