Dec 29, 2024

Transform files in S3 to Pinecone with Unstructured Platform with no code

Nina Lopatina

Unstructured

Let’s walk through 5 easy steps to transform unstructured data in an S3 bucket into a Pinecone vector database using the Unstructured Platform! Try out no-code ETL with a 2-week free trial.

Here is the full documentation: https://docs.unstructured.io/platform/overview 

  1. Create a new source connector at https://platform.unstructured.io/connectors/editor/new/sources and fill it in as described in https://docs.unstructured.io/platform/sources/s3

If the connector was saved successfully, you will see it in your list of Source connectors.

  2. Create a new destination connector at https://platform.unstructured.io/connectors/editor/new/destinations and fill it in as described in https://docs.unstructured.io/platform/destinations/pinecone. Since we are using Pinecone, you can find the details you need in the Pinecone console at https://app.pinecone.io/?sessionType=login

I set up the destination with my embedding model's dimension in mind: Ada 002 (OpenAI's text-embedding-ada-002), which produces 1536-dimensional vectors.
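The dimension matters because every vector written to a Pinecone index must have the same number of components as the index itself, so the index has to be created to match the embedding model. The minimal Python sketch below illustrates that constraint as a pre-upsert check. It is our own illustration, not part of the Unstructured Platform or the Pinecone SDK; `check_dimensions` and `ADA_002_DIM` are hypothetical names.

```python
# Hypothetical pre-upsert sanity check (not a Pinecone or Unstructured API):
# every embedding must have exactly as many components as the index dimension.
ADA_002_DIM = 1536  # text-embedding-ada-002 returns 1536-dimensional vectors


def check_dimensions(vectors: list[list[float]], index_dim: int = ADA_002_DIM) -> None:
    """Raise ValueError if any vector's length differs from the index dimension."""
    for i, vec in enumerate(vectors):
        if len(vec) != index_dim:
            raise ValueError(
                f"vector {i} has {len(vec)} dimensions; index expects {index_dim}"
            )


# Usage: a correctly sized vector passes, a shorter one is rejected.
check_dimensions([[0.0] * 1536])
try:
    check_dimensions([[0.0] * 768])
except ValueError as err:
    print(err)  # vector 0 has 768 dimensions; index expects 1536
```

If you switch to a different embedding model later, the index dimension would need to change with it.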

  3. Next, we create an S3 destination using the same process as step 1, this time selecting S3 as a destination.

  4. Let’s set up our workflow at https://platform.unstructured.io/workflows/new

We are transforming complex PDFs that contain images, code, and formulas, so we use the VLM partitioning strategy. See the partition documentation for guidance on choosing a strategy.

We leave document reprocessing turned off, so that if we add new documents later, we can simply re-run the workflow and only the new documents will be processed.
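To make the idea concrete, here is a toy Python sketch of incremental processing: keep a record of which object keys have already been processed and pass only unseen keys to the next run. It is a hypothetical illustration, not how the Unstructured Platform actually tracks state; `select_new_documents` and the example keys are made up.

```python
from collections.abc import Iterable


def select_new_documents(keys: Iterable[str], processed: set[str]) -> list[str]:
    """Return keys not seen before (in order) and mark them as processed."""
    new_keys = []
    for key in keys:
        if key not in processed:
            new_keys.append(key)
            processed.add(key)  # also de-duplicates repeats within one run
    return new_keys


# Usage: the second run only picks up the newly added file.
processed: set[str] = set()
first_run = select_new_documents(["a.pdf", "b.pdf"], processed)
second_run = select_new_documents(["a.pdf", "b.pdf", "c.pdf"], processed)
print(first_run)   # ['a.pdf', 'b.pdf']
print(second_run)  # ['c.pdf']
```

Because this toy version compares keys only, an edited file that keeps the same key would be skipped.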

  5. Now let’s hit Run on our new workflow!

Ta-da! In 5 quick steps that took 10 minutes to set up, we’ve structured 1,290 files over a lunch break.
