Dec 29, 2024
Transform files in S3 to Pinecone with Unstructured Platform with no code
Nina Lopatina
Unstructured
Let’s go through the 5 easy steps to move unstructured data from an S3 bucket into a Pinecone vector database using Unstructured Platform! Try out no-code ETL with a 2-week free trial here.
Here is the full documentation: https://docs.unstructured.io/platform/overview
Create a new source at https://platform.unstructured.io/connectors/editor/new/sources and fill it in as described at https://docs.unstructured.io/platform/sources/s3
You can then confirm that the connector was saved by checking that it appears in your list of Source connectors.
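If a source fails to save, a malformed bucket location is a common culprit. Here is a minimal Python sketch for sanity-checking the value before pasting it into the form. It assumes the form expects the location as an s3:// URI (check the S3 source documentation for the exact fields); the function name and the bucket below are made up for illustration.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split an S3 URI into (bucket, key_prefix); raise ValueError if malformed."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"expected an s3:// URI, got {uri!r}")
    if not parsed.netloc:
        raise ValueError(f"no bucket name found in {uri!r}")
    # Drop the leading slash so the prefix looks like an S3 object key prefix.
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical bucket and folder, for illustration only.
bucket, prefix = split_s3_uri("s3://my-example-bucket/reports/2024/")
print(bucket, prefix)
```

This only checks the shape of the URI. It does not verify that the bucket exists or that your credentials can read it.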
Create a new destination at https://platform.unstructured.io/connectors/editor/new/destinations and fill in as per https://docs.unstructured.io/platform/destinations/pinecone. Since we are using Pinecone, you can get the info you need at https://app.pinecone.io/?sessionType=login
I set up the destination with my embedding model in mind: Ada 002, which produces 1536-dimensional vectors, so the Pinecone index needs a dimension of 1536.
We’re going to create the Pinecone destination using the same process as step 1, just selecting Pinecone as the destination.
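The index dimension has to match the embedding model, or the vectors won't fit the index. Here is a small Python sketch of that check. The lookup table lists the default output sizes of a few OpenAI embedding models; the function name is hypothetical, and nothing here calls Pinecone or Unstructured.

```python
# Default output sizes of a few OpenAI embedding models.
EMBEDDING_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def check_index_dimension(model: str, index_dimension: int) -> None:
    """Raise an error if the index dimension does not match the model's output size."""
    expected = EMBEDDING_DIMENSIONS.get(model)
    if expected is None:
        raise KeyError(f"unknown embedding model: {model!r}")
    if expected != index_dimension:
        raise ValueError(
            f"{model} produces {expected}-dimensional vectors, "
            f"but the index is configured for {index_dimension}"
        )


# Ada 002 with a 1536-dimension index passes silently.
check_index_dimension("text-embedding-ada-002", 1536)
```

If you switch embedding models later, run the same check against the new model before reusing an existing index.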
Let’s set up our workflow at https://platform.unstructured.io/workflows/new
We are transforming complex PDFs with images, code, and formulas, so we are using a VLM transformation strategy. Check out the partition documentation for more information on how to select a strategy.
We are not going to reprocess documents, so if we add new documents later, we can just re-run the workflow and it will process only the new ones.
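To picture what skipping reprocessing means, here is a conceptual Python sketch of picking out only the files that haven't been handled yet. This is an illustration of the idea, not how Unstructured Platform implements it; the function and file names are hypothetical.

```python
def select_new_files(all_keys: list[str], processed_keys: list[str]) -> list[str]:
    """Return the keys that have not been processed yet, in listing order."""
    seen = set(processed_keys)
    return [key for key in all_keys if key not in seen]


# Hypothetical bucket listing: two files from the first run, one added since.
bucket_listing = ["a.pdf", "b.pdf", "c.pdf"]
already_processed = ["a.pdf", "b.pdf"]

print(select_new_files(bucket_listing, already_processed))  # ['c.pdf']
```

On a re-run, only `c.pdf` would go through the workflow, which is why adding documents to the bucket stays cheap.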
Now let’s hit run on our new workflow!
Ta-da! In 5 quick steps that took 10 minutes to set up, we’ve structured 1290 files in a lunch break.