Dec 29, 2024

Transform files in S3 to Pinecone with Unstructured Platform with no code

Nina Lopatina

Unstructured

Let’s go through the 5 easy steps to transform the unstructured data in an S3 bucket into records in a Pinecone vector database, using Unstructured Platform! You can try out no-code ETL with a 2-week free trial.

Here is the full documentation: https://docs.unstructured.io/platform/overview 

  1. Create a new source at https://platform.unstructured.io/connectors/editor/new/sources and fill it in following https://docs.unstructured.io/platform/sources/s3 

Once it is saved, the connector appears in your list of Source connectors, which confirms that the save succeeded. 
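The S3 source form asks where your files live. As a minimal sketch, assuming the location is given as an `s3://` URI, the snippet below splits such a URI into a bucket name and a key prefix so you can sanity-check it before pasting it in. The bucket and folder names are hypothetical, and this helper is not part of Unstructured Platform.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3:// URI into (bucket, key_prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The parsed path keeps its leading slash; strip it to get the key prefix.
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical bucket and folder, for illustration only.
bucket, prefix = split_s3_uri("s3://example-bucket/reports/2024/")
print(f"bucket={bucket} prefix={prefix}")
```

A URI that fails this check (for example an `https://` console link) is worth fixing before you save the connector.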

  2. Create a new destination at https://platform.unstructured.io/connectors/editor/new/destinations and fill it in following https://docs.unstructured.io/platform/destinations/pinecone. Since we are using Pinecone, you can find the details you need at https://app.pinecone.io/?sessionType=login 

I set up the destination with my embedding model in mind: OpenAI’s Ada 002 (text-embedding-ada-002), which produces 1536-dimensional vectors, so the Pinecone index needs a dimension of 1536. 
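To see why the dimension matters: a vector index accepts only vectors whose length equals the index dimension. The sketch below is a plain-Python illustration of that rule, not Pinecone or Unstructured code; the function name and the sample vectors are made up for the example.

```python
EMBEDDING_DIMENSION = 1536  # output size of Ada 002, as used in this post


def check_dimensions(vectors: list[list[float]], index_dimension: int) -> None:
    """Raise ValueError if any vector's length differs from the index dimension."""
    for position, vector in enumerate(vectors):
        if len(vector) != index_dimension:
            raise ValueError(
                f"vector {position} has {len(vector)} dimensions, "
                f"but the index expects {index_dimension}"
            )


# Placeholder vectors standing in for real embeddings.
check_dimensions([[0.0] * 1536, [0.1] * 1536], EMBEDDING_DIMENSION)
print("all vectors match the index dimension")
```

If you switch to an embedding model with a different output size, the index dimension has to change with it.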

  3. We’re going to create an S3 destination using the same process as step 1, just selecting S3 as a destination. 

  4. Let’s set up our workflow at https://platform.unstructured.io/workflows/new 

We are transforming complex PDFs that contain images, code, and formulas, so we are using the VLM partitioning strategy. Check out the partition documentation for more information on how to select a strategy. 

We are leaving document reprocessing turned off, so if we add new documents later, we can simply re-run the workflow and it will process only the new ones. 
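Conceptually, skipping reprocessing means each run handles only the files that were not handled before. The sketch below illustrates that idea in plain Python; it is an assumption about the general behavior, not how Unstructured Platform implements it, and the function and file names are hypothetical.

```python
def select_files_to_process(
    listed: list[str],
    already_processed: set[str],
    reprocess_all: bool = False,
) -> list[str]:
    """Return the files a run should handle, in sorted order."""
    if reprocess_all:
        # Reprocessing on: every listed file is handled again.
        return sorted(listed)
    # Reprocessing off: only files not seen in earlier runs are handled.
    return sorted(set(listed) - already_processed)


# Hypothetical bucket listing after one new file was added.
listed_files = ["a.pdf", "b.pdf", "c.pdf"]
done_in_earlier_runs = {"a.pdf", "b.pdf"}
print(select_files_to_process(listed_files, done_in_earlier_runs))
```

With reprocessing off, the cost of a re-run scales with the number of new files, not with the size of the whole bucket.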

  5. Now let’s hit run on our new workflow!

Ta-da! In 5 quick steps that took 10 minutes to set up, we’ve structured 1290 files in a lunch break. 
