Past Webinar
Build ETL Workflows With Unstructured API

Learn how to build custom, programmatic ETL workflows for unstructured data using the Unstructured API and Workflow Endpoint.

Apr 29, 2025

Speakers

Maria Khalusova
Unstructured

Recorded

Tuesday, Apr 29, 2025
30 minutes on Zoom Events

Overview

Unstructured makes it easy to build scalable, programmatic ETL workflows for unstructured data. In this session, we’ll walk through how to use the Unstructured API to connect to a data source, such as Amazon S3, preprocess your documents, and pipe structured outputs into a destination, such as a vector store, a database, or a search engine.
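To make the shape of such a pipeline concrete, here is a minimal sketch in plain Python that describes a source, an ordered list of processing steps, and a destination as a serializable payload. The class names, field names, connector ID placeholders, and settings values are assumptions chosen for illustration; they are not the Unstructured API's actual schema, so check the official documentation before adapting this.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class Node:
    """One processing step; names and settings are illustrative only."""
    name: str
    type: str                      # e.g. "partition", "chunk", "embed"
    settings: dict = field(default_factory=dict)


@dataclass
class WorkflowSpec:
    """Hypothetical description of a source -> steps -> destination pipeline."""
    name: str
    source_id: str                 # placeholder for a source connector ID
    destination_id: str            # placeholder for a destination connector ID
    nodes: list = field(default_factory=list)

    def to_payload(self) -> dict:
        # Nodes run in list order, so the list itself encodes a linear DAG.
        return asdict(self)


spec = WorkflowSpec(
    name="s3-to-vector-store",
    source_id="<s3-source-connector-id>",
    destination_id="<destination-connector-id>",
    nodes=[
        Node("Partitioner", "partition", {"strategy": "hi_res"}),
        Node("Chunker", "chunk", {"max_characters": 1000}),
        Node("Embedder", "embed", {"model": "<embedding-model-name>"}),
    ],
)

# Print the payload that a real client would send when creating the workflow.
print(json.dumps(spec.to_payload(), indent=2))
```

Keeping the specification as data separates the description of the pipeline from the call that submits it, which makes the workflow easy to review, version, and reuse across sources and destinations.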

Whether you’re building from scratch or looking to streamline an existing workflow, this webinar will show you how to automate the full ETL pipeline—from ingestion to transformation to destination—using Unstructured’s Workflow Endpoint and Python SDK.

Technical Overview

Watch this recording to learn how to build and run a complete ETL pipeline using the Unstructured API and Workflow Endpoint. The session shows how to:

  • Connect to S3 using source and destination connectors
  • Define a custom DAG of processing steps—including partitioning, chunking, and embedding
  • Preprocess documents with advanced partitioning strategies
  • Output structured data directly into your preferred destination
  • Track workflow runs and confirm successful completion via the Python SDK
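The last step above, tracking a run until it finishes, usually comes down to polling a status endpoint. The sketch below shows that pattern in a generic form. The `wait_for_job` helper, the `fetch_status` callable, and the status strings are all hypothetical stand-ins rather than names from the Unstructured Python SDK; in practice `fetch_status` would wrap whichever SDK call returns the job's current status.

```python
import time
from typing import Callable

# Assumed status names; a real service may use different ones.
TERMINAL = {"COMPLETED", "FAILED"}


def wait_for_job(fetch_status: Callable[[], str],
                 poll_seconds: float = 5.0,
                 timeout_seconds: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll fetch_status() until a terminal status or the timeout is reached."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status in TERMINAL:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_seconds}s")
        sleep(poll_seconds)


# Stand-in for a real SDK call: replays a scripted sequence of statuses.
_fake_statuses = iter(["SCHEDULED", "IN_PROGRESS", "COMPLETED"])
result = wait_for_job(lambda: next(_fake_statuses), poll_seconds=0.0)
print(result)
```

Passing the status lookup in as a callable keeps the loop independent of any particular client, so it can be exercised with a scripted sequence as shown and then pointed at a real job later.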
