Unstructured Developer Toolkit

Unlock the full potential of unstructured data with our powerful developer tools.

Platform

Introducing
Unstructured Platform

A fully automated ETL solution that continuously delivers unstructured data in any format and from any source to your GenAI stack.

Sign up to get started for FREE:
1000 pages per day for 14 days.

Data Transform Options

Whether you're experimenting with our Open Source library or scaling with Platform's pay-per-page and subscription options, Unstructured offers flexible access to our data extraction and preprocessing tools. Choose what works best for you and your budget.


Unstructured is also available on AWS and Azure Marketplaces.

Open Source

GitHub Library

Get Started

Perfect for experimenting. Free to use.

Features:

Python libraries for data processing and ingestion

50+ connectors

Basic transform

Platform

Pay Per Page

Start for free

Perfect for prototyping. Pay only for the pages you process. No commitment.

Pricing structure:

Basic Strategy: $2/1000 pages

Advanced Strategy: $20/1000 pages

Platinum Strategy: $30/1000 pages

Platform

Subscribe & Save

Contact us

Perfect for high-volume users. Save 50% when you commit to at least $10,000 in compute costs for 12 months.

Pricing structure:

Basic Strategy: $1/1000 pages

Advanced Strategy: $10/1000 pages

Platinum Strategy: $15/1000 pages
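
To see how the two Platform plans compare, the arithmetic can be sketched in a few lines of Python. The rates are the ones listed above. The function names are illustrative, and the rule that usage below the $10,000 commitment is still billed at $10,000 is an assumption, not something taken from Unstructured's billing terms.

```python
# Per-1,000-page rates in USD, copied from the pricing lists above.
PAY_PER_PAGE = {"basic": 2.0, "advanced": 20.0, "platinum": 30.0}
SUBSCRIPTION = {"basic": 1.0, "advanced": 10.0, "platinum": 15.0}

# Minimum 12-month commitment for the subscription rates, in USD.
SUBSCRIPTION_MINIMUM = 10_000.0


def annual_cost(pages_per_year: int, strategy: str) -> dict:
    """Compare the yearly cost of both plans for one strategy."""
    thousands = pages_per_year / 1000
    pay_per_page = thousands * PAY_PER_PAGE[strategy]
    # Assumption: usage below the commitment is still billed at the minimum.
    subscription = max(thousands * SUBSCRIPTION[strategy], SUBSCRIPTION_MINIMUM)
    return {
        "pay_per_page": pay_per_page,
        "subscription": subscription,
        "cheaper": "subscription" if subscription < pay_per_page else "pay_per_page",
    }


def break_even_pages(strategy: str) -> int:
    """Yearly page count at which pay-per-page spend reaches the commitment."""
    return int(SUBSCRIPTION_MINIMUM / PAY_PER_PAGE[strategy] * 1000)
```

Under these assumptions, `break_even_pages("advanced")` returns 500000: beyond half a million pages a year on the Advanced Strategy, the subscription rate is the cheaper option.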

World Class Transformation and Orchestration

Technical Details and Features

Chunked JSON

Our Platform atomizes documents into structured JSON elements, enabling precise retrieval for your RAG application. For customization, you have full control over chunking parameters—or you can leverage our built-in, optimized chunking strategies, designed to balance performance and relevance across a wide range of data.
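
To make the idea of chunking JSON elements concrete, here is a minimal sketch in plain Python. It assumes each element is a dictionary with `type` and `text` keys, and it starts a new chunk at every `Title` element or when a size limit would be exceeded. The names `chunk_elements` and `max_characters` are illustrative only; this is not Platform's chunking implementation.

```python
from typing import Dict, List

# Assumed element shape: {"type": "...", "text": "..."}.
Element = Dict[str, str]


def to_chunk(group: List[Element]) -> Dict[str, object]:
    """Merge a group of elements into one chunk record."""
    return {
        "text": "\n".join(e["text"] for e in group),
        "element_types": [e["type"] for e in group],
    }


def chunk_elements(elements: List[Element], max_characters: int = 500) -> List[Dict[str, object]]:
    """Group consecutive elements into chunks.

    A new chunk starts at every "Title" element, or when adding the next
    element would push the summed text length past max_characters.
    """
    chunks: List[Dict[str, object]] = []
    current: List[Element] = []
    size = 0
    for element in elements:
        length = len(element["text"])
        starts_section = element["type"] == "Title"
        too_long = size + length > max_characters
        if current and (starts_section or too_long):
            chunks.append(to_chunk(current))
            current, size = [], 0
        current.append(element)
        size += length
    if current:
        chunks.append(to_chunk(current))
    return chunks
```

Breaking on titles keeps each chunk within one section of the document, which tends to give a retriever more coherent passages than a fixed-size split.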

Embeddings

No modern RAG pipeline is complete without an effective way to make your chunked data searchable, and embeddings are a powerful method for capturing semantic meaning. Our platform provides seamless access to top hosted embedding models and allows for easy integration of your own embedding model.
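
The way embeddings make chunks searchable can be sketched without any external service. The example below uses a toy hashed bag-of-words vector as a stand-in for a real embedding model, so it captures word overlap rather than semantic meaning. `toy_embed` and `search` are hypothetical names, and nothing here reflects how Platform calls hosted models.

```python
import math
import zlib
from typing import List, Tuple


def toy_embed(text: str, dims: int = 64) -> List[float]:
    """Stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vector = [0.0] * dims
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        vector[zlib.crc32(token.encode("utf-8")) % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


def cosine(a: List[float], b: List[float]) -> float:
    """Dot product of two unit-length vectors, which equals their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def search(query: str, chunks: List[str], top_k: int = 2) -> List[Tuple[float, str]]:
    """Return the top_k chunks ranked by similarity to the query."""
    query_vector = toy_embed(query)
    scored = [(cosine(query_vector, toy_embed(chunk)), chunk) for chunk in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]
```

In a real pipeline, `toy_embed` would be replaced by a call to a hosted or self-managed embedding model, and the vectors would be stored in a vector database instead of being recomputed per query.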

Structured Data

With Unstructured, users can leverage text-to-text models with customized prompts to generate new metadata and other types of structured data to enhance retrieval from graph and vector databases.
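
As a rough illustration of prompt-driven metadata generation, the sketch below builds a prompt, passes it to a model callable, and keeps only the requested keys from the JSON reply. The model is a local stub, and `build_prompt`, `generate_metadata`, and `fake_model` are hypothetical names; a real setup would call an actual text-to-text model instead.

```python
import json
from typing import Callable, Dict, List


def build_prompt(chunk_text: str, fields: List[str]) -> str:
    """Ask the model for a JSON object containing the requested keys."""
    wanted = ", ".join(fields)
    return (
        "Read the text and return a JSON object with exactly these keys: "
        f"{wanted}.\n\nText:\n{chunk_text}"
    )


def generate_metadata(
    chunk_text: str, fields: List[str], model: Callable[[str], str]
) -> Dict[str, object]:
    """Call the model and return only the requested keys from its reply."""
    raw = model(build_prompt(chunk_text, fields))
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {}  # The reply was not valid JSON, so no metadata is added.
    if not isinstance(data, dict):
        return {}
    return {key: data[key] for key in fields if key in data}


def fake_model(prompt: str) -> str:
    """Stub standing in for a text-to-text model; returns a canned reply."""
    return json.dumps(
        {"summary": "Pricing overview", "topics": ["pricing"], "extra": 1}
    )
```

Filtering the reply down to the requested keys keeps unexpected model output from leaking into the metadata stored alongside each chunk.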

Custom Integrations

Not seeing what you need for your use case above? That doesn’t mean we don’t have you covered. Our platform was built to be highly extensible. Our custom plugin architecture allows you to incorporate Python-based, modular functionality into your ETL pipelines. Everything from custom chunking logic to NER to PII redaction to autonomously building a knowledge graph from your documents and more can be added to your platform environment and reused across workflows.
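
One way to picture modular, Python-based pipeline steps is as plain functions that take an element and return a modified copy. The sketch below chains two such steps, including a simple email redaction. The step interface, the `run_pipeline` helper, and the element shape are assumptions made for illustration and do not describe Platform's actual plugin architecture.

```python
import re
from typing import Callable, Dict, List

# Assumed element shape: {"type": "...", "text": "..."}.
Element = Dict[str, str]
Step = Callable[[Element], Element]

# Deliberately simple pattern; real PII detection needs far more than this.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def normalize_whitespace(element: Element) -> Element:
    """Collapse runs of whitespace into single spaces."""
    return {**element, "text": " ".join(element["text"].split())}


def redact_emails(element: Element) -> Element:
    """Replace email addresses with a placeholder token."""
    return {**element, "text": EMAIL.sub("[REDACTED_EMAIL]", element["text"])}


def run_pipeline(elements: List[Element], steps: List[Step]) -> List[Element]:
    """Apply every step, in order, to every element."""
    processed = []
    for element in elements:
        for step in steps:
            element = step(element)
        processed.append(element)
    return processed
```

Because each step has the same signature, the same list of steps can be reused across different workflows or reordered without changing the steps themselves.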

Product Features (Available Now)

Product Features (Available EOY)

Compare Features

Unstructured Platform

Unstructured Open Source

Deployment Options

Serverless SaaS

VPC Deployment

On-Premise

Deployment Options

One-Click SSO authentication

SOC 2 Type 2

HIPAA compliant

Connectivity

Source connector count: Unstructured Platform 50, Unstructured Open Source 50

Destination connector count: Unstructured Platform 50, Unstructured Open Source 50

Image-to-text: Unstructured Platform 2, Unstructured Open Source 50

Text-to-text: Unstructured Platform 2, Unstructured Open Source 50

Text-to-embeddings: Unstructured Platform 8, Unstructured Open Source 50

Document Processing

Number of file types transformed: Unstructured Platform 53, Unstructured Open Source 50

Transforms into canonical JSON

Smart routing to efficiently leverage in-house and third-party models to render any unstructured data RAG-ready

Advanced semantic chunking

Advanced summary generation

Structured data generation

Automatic selection of the best embedding for your data

Automatic selection of the best chunking strategy for your data

Metadata fields generated

Enterprise Features

Organization and user management

File ACLs maintained in metadata

Usage

End-to-end orchestration of RAG-ready data

Integrated third-party billing: use partners like OpenAI and Anthropic with Unstructured and receive just one bill, with no need to sign up with each provider

Python SDK

JavaScript SDK

Hosted API

No-Code DAG

Developer Access

Users can leverage Platform either via API or via our no-code UI.

FAQs

Find Answers to Your Questions

Questions? No problem. We're here to help.

What happens to the current Serverless API?

I like using your open source software. Why should I try Platform?

What kinds of documents can I process with Platform?

How do I know my data is secure?

What if I need help getting started?

Still have questions?
Connect with us.

Unstructured

ETL for LLMs

GDPR

Copyright © 2024 Unstructured
