

This tutorial walks you through building a document transformation pipeline from Amazon S3 to Qdrant using Unstructured’s web-based Workflow builder. No orchestration code required — everything happens in the UI.
We’ll show you how to:
- Pull documents from an S3 bucket
- Partition text from PDFs, DOCX, HTML, and other document types
- Generate embeddings
- Push vectors into Qdrant for RAG or search applications
Step 1: Connect your S3 bucket in Unstructured
🔑 Retrieve AWS Security Credentials
- Navigate to the top bar in AWS and click your account ID in the top right
- Scroll down to Security Credentials
- Scroll to the Access keys section and click Create access key
- You'll receive an Access Key ID and Secret Access Key
- Click Download .csv file to keep a local copy of the keys for reference
🪣 Create a New S3 Bucket
This bucket will contain your input documents.
- In the AWS Console, go to Amazon S3 → Buckets, then click Create bucket
- Use a name like nicks-demo-s3-bucket
- Keep Block all public access checked
- Leave all other settings as default
- Click Create bucket
📄 Upload Your Files
- Locate your new bucket in the list and click its name
- Click the Upload button
- Click Add files and select the documents you want to upload (PDF, DOCX, HTML, JPEG, etc.)
- Copy the full Destination URI (e.g., s3://nicks-demo-s3-bucket)
- Scroll down and click Upload
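If you would rather script the upload than click through the console, the steps above can be approximated in Python. This is a minimal sketch, assuming the third-party boto3 package is installed and AWS credentials are already configured in your environment. The helper names and the local folder are hypothetical; the bucket name is the example used above.

```python
from pathlib import Path


def destination_uri(bucket, prefix=""):
    """Build the s3:// URI you will later paste into the connector form."""
    prefix = prefix.strip("/")
    return f"s3://{bucket}/{prefix}" if prefix else f"s3://{bucket}"


def upload_folder(folder, bucket, prefix=""):
    """Upload every file under `folder` to the bucket, keeping relative paths."""
    import boto3  # third-party; imported here so the helper above has no dependencies

    s3 = boto3.client("s3")
    prefix = prefix.strip("/")
    uploaded = []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            relative = path.relative_to(folder).as_posix()
            key = f"{prefix}/{relative}" if prefix else relative
            s3.upload_file(str(path), bucket, key)
            uploaded.append(key)
    return uploaded
```

For example, `upload_folder("./docs", "nicks-demo-s3-bucket")` would upload a hypothetical local `docs` folder, and `destination_uri("nicks-demo-s3-bucket")` returns the URI to copy.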
🔒 Set S3 Bucket Permissions
- Navigate to your bucket
- Select the Permissions tab at the top
Note: If you're using access keys tied to an account with full S3 read/write permissions, you can leave the bucket policy blank.
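If your access keys are more restricted, you may need a bucket policy. The sketch below builds one in Python, assuming the source connector only needs to list the bucket and read its objects. The function name and the principal ARN (with a placeholder account ID) are hypothetical, so substitute the IAM user or role that owns your access keys.

```python
import json


def read_only_bucket_policy(bucket, principal_arn):
    """Return a policy document granting list and read access on one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowListBucket",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Sid": "AllowGetObject",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


# Placeholder principal: replace with your own IAM user or role ARN.
policy = read_only_bucket_policy(
    "nicks-demo-s3-bucket",
    "arn:aws:iam::123456789012:user/unstructured-demo",
)
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the Bucket policy editor on the Permissions tab.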
Step 2: Create a new S3 connector in Unstructured
- Go to platform.unstructured.io or your organization's tenant address
- In the left sidebar, click Connectors
- Click + New, ensure Source is selected, and choose Amazon S3
- Set a name like nicks-test-s3-connector
- Fill in the Bucket URI, AWS Key, and AWS Secret Key
- Check Recursive if you want to ingest nested folders
- Leave Custom URL blank
- Click Save and Test
- Upon success, you'll see a confirmation message.
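If Save and Test fails, it can help to check the same Bucket URI and keys outside the UI. The following is a rough pre-flight sketch, assuming boto3 is installed. Both helper names are hypothetical, and the check only confirms that the keys can list the bucket, which may not cover everything the connector test does.

```python
def split_bucket_uri(uri):
    """Split 's3://bucket/prefix' into (bucket, prefix)."""
    scheme = "s3://"
    if not uri.startswith(scheme):
        raise ValueError(f"expected an s3:// URI, got {uri!r}")
    bucket, _, prefix = uri[len(scheme):].partition("/")
    if not bucket:
        raise ValueError("bucket name is missing")
    return bucket, prefix


def can_list_bucket(uri, access_key, secret_key):
    """Return True if the given keys can list at least the bucket prefix."""
    import boto3  # third-party; imported lazily
    from botocore.exceptions import ClientError

    bucket, prefix = split_bucket_uri(uri)
    s3 = boto3.client(
        "s3",
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
    try:
        s3.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    except ClientError as err:
        print("S3 rejected the request:", err.response["Error"]["Code"])
        return False
    return True
```

A malformed URI raises `ValueError` before any network call is made, which separates typos in the form from permission problems.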
Step 3: Set up Qdrant
🔧 Create a Qdrant Cluster
- Sign up at Qdrant.com
- On the landing page, click Create Cluster
- Name the cluster, choose Amazon Web Services as the cloud provider, and leave the region as default
- When prompted, copy the API Key and Qdrant URL — this is the only time they'll be shown
- Wait for the cluster status to show Healthy
- Click Access Cluster
- Paste your saved API key and click Apply
- From the left sidebar, click Collections
🧑‍💻 Initialize the Collection via Script
- Open a terminal in your IDE and run:
python3 -m venv venv
source venv/bin/activate
- Install dependencies:
pip install qdrant-client python-dotenv
- Create a file called env-vars.sh with credentials
- Create a main.py file with initialization code
- Run it using: python3 main.py
Expected output: Collection 'my-test-collection' successfully initialized with status: green
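As a rough sketch of what main.py could look like: the version below assumes env-vars.sh defines QDRANT_URL, QDRANT_API_KEY, QDRANT_COLLECTION, and QDRANT_VECTOR_SIZE as `export NAME=value` lines. All four variable names are hypothetical, as is the choice of cosine distance. The vector size is not defaulted because it has to match the embedding model you pick later.

```python
import os


def read_settings(env):
    """Collect Qdrant settings from a mapping such as os.environ."""
    return {
        "url": env["QDRANT_URL"],
        "api_key": env["QDRANT_API_KEY"],
        "collection": env.get("QDRANT_COLLECTION", "my-test-collection"),
        "vector_size": int(env["QDRANT_VECTOR_SIZE"]),
    }


def init_collection(settings):
    """Create the collection if it is missing and return its status string."""
    from qdrant_client import QdrantClient
    from qdrant_client.models import Distance, VectorParams

    client = QdrantClient(url=settings["url"], api_key=settings["api_key"])
    name = settings["collection"]
    if not client.collection_exists(name):
        client.create_collection(
            collection_name=name,
            # Cosine distance is an assumption; use what suits your embedder.
            vectors_config=VectorParams(
                size=settings["vector_size"], distance=Distance.COSINE
            ),
        )
    return client.get_collection(name).status.value


def main():
    from dotenv import load_dotenv

    load_dotenv("env-vars.sh")  # python-dotenv accepts `export NAME=value` lines
    settings = read_settings(os.environ)
    status = init_collection(settings)
    print(
        f"Collection '{settings['collection']}' successfully initialized "
        f"with status: {status}"
    )
```

Add a call to `main()` as the last line of the file so that `python3 main.py` runs it.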
Step 4: Create a Qdrant destination connector
- Log in to Unstructured
- Click Connectors
- Click + New, choose Destination → Qdrant
- Give the connector a name
- Use values from env-vars.sh to populate fields
- Click Test Connection — you should get a success message
Step 5: Create a workflow in Unstructured
- From the main dashboard, click Workflows → New Workflow
- Select Build it for me
- Name your workflow, choose the previously created source and destination connectors, then click Continue
- Use the automatic partitioning strategy and the default embedding model and size
- Leave other settings default, then click Complete
Optional: Adjust the Embedder
- Go to the Embedder segment of your workflow
- Click the gear icon in the top right
- Choose your embedding model
- Confirm that [dim <embedding-size>] matches your Qdrant config
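One way to confirm the dimension from the Qdrant side is to read the collection's vector configuration and compare it with the embedder's size. This is a sketch with hypothetical helper names, assuming qdrant-client is installed. The value 1536 is only an illustrative dimension, not the default for any particular model.

```python
from types import SimpleNamespace


def vector_sizes(vectors_config):
    """Return {vector_name: size}; an unnamed vector uses the key ''."""
    if vectors_config is None:
        return {}
    if isinstance(vectors_config, dict):  # collection with named vectors
        return {name: params.size for name, params in vectors_config.items()}
    return {"": vectors_config.size}


def embedder_matches(vectors_config, embedding_size):
    """True if any vector in the collection has the embedder's dimension."""
    return embedding_size in vector_sizes(vectors_config).values()


def fetch_vectors_config(url, api_key, collection):
    """Read the live vector configuration from a Qdrant cluster."""
    from qdrant_client import QdrantClient

    client = QdrantClient(url=url, api_key=api_key)
    return client.get_collection(collection).config.params.vectors


# Stand-in for what fetch_vectors_config() returns, so this runs offline.
example_config = SimpleNamespace(size=1536)
print(embedder_matches(example_config, 1536))
```

If the sizes differ, Qdrant will typically reject the upserts, so it is cheaper to catch the mismatch before the first run.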
Step 6: Run & test the workflow
▶️ Full Run
- Go to the Workflows page
- Click Run next to your workflow
- Use the Schedule tab to automate runs
- Visit your Qdrant collection to verify chunked documents have arrived
To delete records, use API calls or the Console tab in Qdrant.
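Verification and cleanup can also be done with the Python client. The sketch below assumes you pass in an existing `QdrantClient`. The function names are hypothetical, and so is any payload key you filter on, since the payload fields depend on what your workflow writes.

```python
def summarize_collection(client, collection, sample=3):
    """Return the exact point count and a few payloads for a quick look."""
    total = client.count(collection_name=collection, exact=True).count
    points, _next_offset = client.scroll(
        collection_name=collection,
        limit=sample,
        with_payload=True,
        with_vectors=False,
    )
    return total, [point.payload for point in points]


def delete_by_payload(client, collection, key, value):
    """Delete every point whose payload field `key` equals `value`."""
    from qdrant_client.models import (
        FieldCondition,
        Filter,
        FilterSelector,
        MatchValue,
    )

    selector = FilterSelector(
        filter=Filter(must=[FieldCondition(key=key, match=MatchValue(value=value))])
    )
    return client.delete(collection_name=collection, points_selector=selector)
```

Run `summarize_collection` first and look at the returned payloads to find a field worth filtering on before deleting anything.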
📄 Upload a Sample Document
- In your workflow, go to the Source segment
- Upload a single document
- Click the Results </> icon above the segment to inspect JSON output at every stage
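If you save that JSON to a file, a few lines of Python give a quick overview of what the partitioner found. This sketch assumes the output is a list of element objects with `type`, `text`, and `metadata` fields. The sample data below is made up for illustration, and the function names are hypothetical.

```python
import json
from collections import Counter


def load_elements(path):
    """Load a saved JSON export (assumed to be a list of element dicts)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def element_type_counts(elements):
    """Count how many elements of each type the partitioner produced."""
    return Counter(element.get("type", "unknown") for element in elements)


# Illustrative stand-in for real output; not taken from an actual run.
sample = [
    {"type": "Title", "text": "Quarterly report", "metadata": {"filename": "report.pdf"}},
    {"type": "NarrativeText", "text": "Sales grew in Q1.", "metadata": {"filename": "report.pdf"}},
    {"type": "Table", "text": "Region Sales", "metadata": {"filename": "report.pdf"}},
    {"type": "NarrativeText", "text": "Costs were flat.", "metadata": {"filename": "report.pdf"}},
]
for element_type, count in element_type_counts(sample).most_common():
    print(f"{element_type}: {count}")
```

A document that comes back as almost entirely one element type is often a sign that a different partitioning strategy is worth trying.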
Step 7: Get more from your workflow
🪄 Partitioning Strategy
The default auto strategy detects document structure (titles, tables, images) and selectively applies vision language model (VLM) parsing. See Unstructured's partitioning documentation for details.
🖼️ Image Description Enrichment
Generates human-readable captions for diagrams, photos, and visual elements.
Useful for:
- Instruction manuals with schematics
- Research reports with charts
- Scanned docs with key visual content
📊 Table Summary Enrichment
Converts tables to natural language summaries (e.g., 'North America leads in Q1 sales').
Ideal for:
- Financial reports
- Policy documents
- Scanned PDFs with tables
🛠️ Additional Options
- Table-to-HTML Enrichment
- Named Entity Recognition (NER)
- Chunking: by title, character, page, or similarity
- Contextual Chunking: prepend summaries to chunks
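To make the chunking options more concrete, here is a simplified illustration of by-title chunking and contextual chunking over a list of elements. It is not Unstructured's implementation: the function names, the `max_chars` limit, and the way context is prepended are all assumptions chosen to show the general idea.

```python
def chunk_by_title(elements, max_chars=500):
    """Start a new chunk at each Title element or when a chunk would get too long."""
    chunks = []
    current = []

    def flush():
        if current:
            chunks.append("\n".join(current))
            current.clear()

    for element in elements:
        text = element.get("text", "").strip()
        if not text:
            continue
        starts_section = element.get("type") == "Title"
        # Length of the chunk so far (with joining newlines) plus the new text.
        too_long = sum(len(part) + 1 for part in current) + len(text) > max_chars
        if current and (starts_section or too_long):
            flush()
        current.append(text)
    flush()
    return chunks


def add_context(chunks, summary):
    """Prepend a document-level summary to every chunk (contextual chunking)."""
    return [f"{summary}\n\n{chunk}" for chunk in chunks]


elements = [
    {"type": "Title", "text": "Intro"},
    {"type": "NarrativeText", "text": "Hello world."},
    {"type": "Title", "text": "Setup"},
    {"type": "NarrativeText", "text": "Install it."},
]
for chunk in add_context(chunk_by_title(elements), "Summary: a short demo document."):
    print(repr(chunk))
```

Keeping each heading with the text under it tends to produce chunks that make sense in isolation, which is the main reason to prefer by-title chunking over fixed character windows for structured documents.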
And that's it! 🎉
You now have a fully automated pipeline from S3 documents to enriched, vectorized content in Qdrant, built entirely in Unstructured's UI. Whether launching a RAG system or indexing internal files, this is a fast, reliable starting point.