Unstructured Developer Toolkit
Unlock the full potential of unstructured data with our powerful developer tools.
Platform
Introducing
Unstructured Platform
A fully automated ETL solution that continuously delivers unstructured data in any format and from any source to your GenAI stack.
Sign up to get started for FREE: 1,000 pages per day for 14 days.
Data Transform Options
Whether you're experimenting with our Open Source library or scaling with Platform's pay-per-page and subscription options, Unstructured offers flexible access to our data extraction and preprocessing tools. Choose what works best for you and your budget.
Unstructured is also available on AWS and Azure Marketplaces.
Open Source
GitHub Library
Get Started
Perfect for experimenting. Free to use.
Features:
Python libraries for data processing and ingestion
50+ connectors
Basic transform
Platform
Pay Per Page
Start for free
Perfect for prototyping. Pay only for the pages you process. No commitment.
Pricing structure:
Basic Strategy: $2/1000 pages
Advanced Strategy: $20/1000 pages
Platinum Strategy: $30/1000 pages
Platform
Subscribe & Save
Contact us
Perfect for high-volume users. Save 50% when you commit to at least $10,000 in compute costs for 12 months.
Pricing structure:
Basic Strategy: $1/1000 pages
Advanced Strategy: $10/1000 pages
Platinum Strategy: $15/1000 pages
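The per-page rates above translate directly into a budget estimate. The sketch below assumes simple linear billing (rate ÷ 1,000 per page, with no rounding up to whole blocks of 1,000 pages) and uses a hypothetical `estimate_cost` helper; it is an illustration for planning, not Unstructured's billing logic.

```python
# Per-1,000-page rates in USD, taken from the pricing tiers listed above.
PAY_PER_PAGE = {"basic": 2.0, "advanced": 20.0, "platinum": 30.0}
SUBSCRIBE_AND_SAVE = {"basic": 1.0, "advanced": 10.0, "platinum": 15.0}


def estimate_cost(pages: int, strategy: str, subscribed: bool = False) -> float:
    """Estimate cost in USD, assuming linear per-page billing with no rounding."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    rates = SUBSCRIBE_AND_SAVE if subscribed else PAY_PER_PAGE
    key = strategy.lower()
    if key not in rates:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return pages * rates[key] / 1000


print(estimate_cost(500_000, "advanced"))                   # 10000.0
print(estimate_cost(500_000, "advanced", subscribed=True))  # 5000.0
```

Under the same assumption, and if the $10,000 commitment is measured at the discounted rates, it corresponds to 1,000,000 pages on the Advanced strategy ($10,000 ÷ $10 × 1,000).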
World-Class Transformation and Orchestration
Technical Details and Features
Chunked JSON
Our Platform atomizes documents into structured JSON elements, enabling precise retrieval for your RAG application. For customization, you have full control over chunking parameters—or you can leverage our built-in, optimized chunking strategies, designed to balance performance and relevance across a wide range of data.
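To make chunking concrete, here is a minimal, self-contained sketch of title-aware chunking over JSON elements. It assumes each element is a dict with `type` and `text` keys (for example `Title` or `NarrativeText`), which mirrors the element JSON of the open source library; the `chunk_elements` function is illustrative, loosely in the spirit of the library's `chunk_by_title`, and is not the Platform's built-in strategy.

```python
import json
from typing import Any

Element = dict[str, Any]


def chunk_elements(elements: list[Element], max_characters: int = 500) -> list[Element]:
    """Greedily combine consecutive elements into chunks of at most max_characters.

    A new chunk starts at every "Title" element (treated as a section boundary)
    or when adding the next piece of text would push the chunk past the limit.
    Elements longer than the limit are split into fixed-size text windows.
    """
    chunks: list[Element] = []
    current: list[str] = []

    def flush() -> None:
        # Emit the accumulated text as one chunk and reset the buffer.
        if current:
            chunks.append({"type": "CompositeElement", "text": "\n\n".join(current)})
            current.clear()

    for element in elements:
        text = element.get("text", "")
        if element.get("type") == "Title":
            flush()
        if not text:
            continue
        pieces = [text[i:i + max_characters] for i in range(0, len(text), max_characters)]
        for piece in pieces:
            # Length of the chunk if this piece were appended with a "\n\n" separator.
            projected = sum(len(t) for t in current) + 2 * len(current) + len(piece)
            if current and projected > max_characters:
                flush()
            current.append(piece)

    flush()
    return chunks


elements = [
    {"type": "Title", "text": "Introduction"},
    {"type": "NarrativeText", "text": "Unstructured data comes in many formats."},
    {"type": "Title", "text": "Pricing"},
    {"type": "NarrativeText", "text": "Pay only for the pages you process."},
]
print(json.dumps(chunk_elements(elements, max_characters=200), indent=2))
```

Starting a new chunk at each title keeps sections from bleeding into one another, while the character cap keeps every chunk within the input size an embedding model can handle.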
Embeddings
No modern RAG pipeline is complete without an effective way to make your chunked data searchable, and embeddings are a powerful method for capturing semantic meaning. Our platform provides seamless access to top hosted embedding models and allows for easy integration of your own embedding model.
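The sketch below shows how embeddings make chunks searchable: each chunk is mapped to a vector once at ingestion, and a query is answered by ranking chunks by cosine similarity. A toy hashed bag-of-words `embed` function stands in for a real embedding model; this is an assumption made only to keep the example self-contained (it captures word overlap, not semantic meaning), so in practice you would swap in a hosted or custom model.

```python
import math
import re
import zlib


def embed(text: str, dim: int = 256) -> list[float]:
    """Toy stand-in for an embedding model: a hashed bag of words, L2-normalized."""
    vector = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is used (not hash()) so vectors are identical across runs.
        vector[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


def cosine(a: list[float], b: list[float]) -> float:
    # Both vectors are unit-length (or all zeros), so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def build_index(chunks: list[str]) -> list[tuple[list[float], str]]:
    # Embed every chunk once, at ingestion time.
    return [(embed(chunk), chunk) for chunk in chunks]


def search(query: str, index: list[tuple[list[float], str]], top_k: int = 2) -> list[tuple[float, str]]:
    q = embed(query)
    scored = [(cosine(q, vec), chunk) for vec, chunk in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


chunks = [
    "Invoices are due within 30 days of receipt.",
    "The API accepts PDF, DOCX, and HTML files.",
    "Our office is closed on public holidays.",
]
index = build_index(chunks)
for score, chunk in search("API accepts PDF files", index):
    print(f"{score:.3f}  {chunk}")
```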
Structured Data
With Unstructured, users can leverage text-to-text models with customized prompts to generate new metadata and other types of structured data to enhance retrieval from graph and vector databases.
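A minimal sketch of prompt-driven metadata generation follows. The `generate` callable is a hypothetical interface for any text-to-text model (prompt in, text out), and the prompt wording, the `enrich_metadata` helper, and the field names are assumptions for illustration, not the Platform's implementation. The steps it shows are building a prompt, parsing the model's JSON reply defensively, and merging only the requested keys into the element's metadata.

```python
import json
from typing import Any, Callable

# Hypothetical interface: any text-to-text model wrapped as "prompt in, text out".
TextModel = Callable[[str], str]


def enrich_metadata(element: dict[str, Any], fields: list[str], generate: TextModel) -> dict[str, Any]:
    """Ask a text-to-text model for structured fields and merge them into metadata."""
    prompt = (
        "Extract the following fields from the text below. Respond with only a JSON "
        f"object whose keys are: {', '.join(fields)}.\n\n"
        f"Text:\n{element['text']}"
    )
    try:
        extracted = json.loads(generate(prompt))
    except json.JSONDecodeError:
        extracted = {}  # the model did not return valid JSON; add nothing
    if not isinstance(extracted, dict):
        extracted = {}
    # Keep only the requested keys so the model cannot inject arbitrary metadata.
    generated = {key: extracted[key] for key in fields if key in extracted}
    return {**element, "metadata": {**element.get("metadata", {}), **generated}}


def fake_model(prompt: str) -> str:
    # Stand-in for a real model call, returning a canned JSON reply.
    return '{"company": "Acme Corp", "document_type": "invoice", "note": "ignored"}'


element = {"type": "NarrativeText", "text": "Invoice #42 from Acme Corp.", "metadata": {"filename": "inv.pdf"}}
print(enrich_metadata(element, ["company", "document_type"], fake_model)["metadata"])
```

The generated fields can then be stored alongside the chunk in a vector or graph database and used as filters at retrieval time.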
Custom Integrations
Not seeing what you need for your use case above? That doesn’t mean we don’t have you covered. Our platform was built to be highly extensible. Our custom plugin architecture allows you to incorporate Python-based, modular functionality into your ETL pipelines. Everything from custom chunking logic to NER to PII redaction to autonomously building a knowledge graph from your documents and more can be added to your platform environment and reused across workflows.
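The plugin idea can be sketched as a registry of Python functions that each take and return a list of elements, so steps compose into reusable workflows. The `register` decorator, the `run_workflow` function, and the step names are hypothetical and do not reflect the Platform's actual plugin API; the regex-based PII redaction is deliberately simplistic and only catches email addresses and US-style phone numbers.

```python
import re
from typing import Any, Callable

Element = dict[str, Any]
Step = Callable[[list[Element]], list[Element]]

# Hypothetical registry of reusable processing steps, keyed by name.
REGISTRY: dict[str, Step] = {}


def register(name: str) -> Callable[[Step], Step]:
    """Decorator that adds a step function to REGISTRY under the given name."""
    def wrap(step: Step) -> Step:
        REGISTRY[name] = step
        return step
    return wrap


# Deliberately simple patterns: email addresses and US-style phone numbers only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


@register("drop_empty")
def drop_empty(elements: list[Element]) -> list[Element]:
    # Remove elements whose text is empty or whitespace only.
    return [el for el in elements if el.get("text", "").strip()]


@register("redact_pii")
def redact_pii(elements: list[Element]) -> list[Element]:
    # Return new elements with matches replaced by placeholder tokens.
    redacted = []
    for el in elements:
        text = PHONE.sub("[PHONE]", EMAIL.sub("[EMAIL]", el.get("text", "")))
        redacted.append({**el, "text": text})
    return redacted


def run_workflow(step_names: list[str], elements: list[Element]) -> list[Element]:
    """Apply the named steps in order; the same step list can be reused across workflows."""
    for name in step_names:
        elements = REGISTRY[name](elements)
    return elements


docs = [
    {"type": "NarrativeText", "text": "Contact jane.doe@example.com or 555-123-4567."},
    {"type": "NarrativeText", "text": "   "},
]
print(run_workflow(["drop_empty", "redact_pii"], docs))
```

Because each step has the same list-in, list-out signature, new steps (custom chunking, NER, and so on) can be added to the registry without changing the workflow runner.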
Product Features (Available Now)
Product Features (Available EOY)
Compare Features
Unstructured Platform vs. Unstructured Open Source
Deployment Options
Serverless SaaS
VPC Deployment
On-Premise
Deployment Options
One-Click SSO authentication
SOC 2 Type 2
HIPAA compliant
Connectivity
Source connector count: Unstructured Platform 50; Unstructured Open Source 50
Destination connector count: Unstructured Platform 50; Unstructured Open Source 50
Image-to-text: Unstructured Platform 2; Unstructured Open Source 0
Text-to-text: Unstructured Platform 2; Unstructured Open Source 0
Text-to-embeddings: Unstructured Platform 8; Unstructured Open Source 0
Document Processing
Number of file types transformed: Unstructured Platform 53; Unstructured Open Source 26
Transforms into canonical JSON
Smart routing to efficiently leverage in-house and third-party models to render any unstructured data RAG-ready
Advanced semantic chunking
Advanced summary generation
Structured data generation
Automatic selection of the best embedding model for your data
Automatic selection of the best chunking strategy for your data
Metadata fields generated
Enterprise Features
Organization and user management
File ACLs maintained in metadata
Usage
End-to-end orchestration of RAG-ready data
Integrated third-party billing: use partners like OpenAI and Anthropic through Unstructured and receive a single bill, with no need to sign up with each provider
Python SDK
JavaScript SDK
Hosted API
No-Code DAG
Developer Access
Users can leverage Platform either via API or via our no-code UI.
FAQs
Find Answers to your Questions
Questions? No problem. We're here to help.
What happens to the current Serverless API?
I like using your open source software. Why should I try Platform?
What kinds of documents can I process with Platform?
How do I know my data is secure?
What if I need help getting started?
Still have questions?
Connect with us.
Unstructured
ETL for LLMs
GDPR
Let's chat!
Join our newsletter
Copyright © 2024 Unstructured