Dec 29, 2024

Transform files in S3 to Pinecone with Unstructured Platform with no code

Nina Lopatina

Unstructured

Let’s walk through 5 easy steps to transform the unstructured data in an S3 bucket and load it into a Pinecone vector database using Unstructured Platform! You can try out no-code ETL with a 2-week free trial.

Here is the full documentation: https://docs.unstructured.io/platform/overview 

  1. Create a new source at https://platform.unstructured.io/connectors/editor/new/sources and fill it in as described at https://docs.unstructured.io/platform/sources/s3 

Once it has been saved successfully, the connector will appear in your list of Source connectors. 
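If you want to sanity-check the bucket location before pasting it into the source form, a minimal sketch along these lines may help. It assumes the location is written as an `s3://bucket/prefix/` URI; the helper name `split_s3_uri` and the bucket `example-bucket` are hypothetical, and nothing here calls Unstructured Platform or AWS.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3:// URI into (bucket, key_prefix); raise ValueError if malformed."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected s3://<bucket>/<optional-prefix>, got {uri!r}")
    # The path keeps its leading slash, which is not part of the S3 key prefix.
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical bucket and folder, for illustration only.
bucket, prefix = split_s3_uri("s3://example-bucket/reports/2024/")
print(bucket, prefix)
```

This only checks the shape of the URI; whether the bucket exists and the credentials can read it is still verified when the connector is saved.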

  2. Create a new destination at https://platform.unstructured.io/connectors/editor/new/destinations and fill it in as described at https://docs.unstructured.io/platform/destinations/pinecone. Since we are using Pinecone, you can find the details the form asks for at https://app.pinecone.io/?sessionType=login 

I set up the destination with my embedding model in mind: OpenAI’s text-embedding-ada-002 (Ada 002), which produces 1536-dimensional vectors, so the Pinecone index needs a dimension of 1536. 
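A Pinecone index has a fixed dimension, so it is worth confirming that the embedding model and the index agree before running anything. The following is a small sketch of that check in plain Python; the lookup table and the helper `check_vector` are illustrative names, not part of Unstructured Platform or the Pinecone client.

```python
# Output sizes of a few OpenAI embedding models.
EMBEDDING_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def check_vector(vector, model="text-embedding-ada-002"):
    """Return the vector unchanged if its length matches the model's dimension."""
    expected = EMBEDDING_DIMENSIONS[model]
    if len(vector) != expected:
        raise ValueError(f"{model} expects {expected} values, got {len(vector)}")
    return vector


placeholder = [0.0] * 1536  # stand-in for a real embedding
check_vector(placeholder)
```

If you switch to a model with a different output size, the index has to be created with that dimension instead.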

  3. We’re going to create an S3 destination using the same process as step 1, just selecting S3 as a destination. 

  4. Let’s set up our workflow at https://platform.unstructured.io/workflows/new 

We are transforming complex PDFs with images, code, and formulas, so we are using a VLM transformation strategy. Check out the partition documentation for more information on how to select a strategy. 

We are not going to reprocess documents that have already been handled. That way, if we add new documents later, we can simply re-run the workflow and only the new ones will be processed. 
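The idea behind skipping already-processed documents can be sketched in a few lines. This is only an illustration of the concept under the assumption that files are tracked by their key; it is not how Unstructured Platform implements it, and `select_new_files` and the file names are hypothetical.

```python
def select_new_files(bucket_keys, processed_keys):
    """Return the keys in bucket_keys that are not in processed_keys, keeping order."""
    seen = set(processed_keys)
    return [key for key in bucket_keys if key not in seen]


# First run handled two files; a third was added to the bucket afterwards.
first_run = ["a.pdf", "b.pdf"]
second_listing = ["a.pdf", "b.pdf", "c.pdf"]

to_process = select_new_files(second_listing, first_run)
print(to_process)  # prints ['c.pdf']
```

On a re-run, only the files in `to_process` would need to be partitioned and embedded, which is why adding documents later is cheap.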

  5. Now let’s hit run on our new workflow!

Ta-da! In 5 quick steps that took 10 minutes to set up, we’ve structured 1290 files in a lunch break. 
