AI

From Big Data to Heavy Data: Rethinking the AI Stack
LLMs can finally interpret unstructured video, audio, and documents — but they can't do it alone. This post introduces the concept of heavy data and explores how modern teams build multimodal pipelines to turn it into AI-ready data.
  • Dmitry Petrov
  • Jun 09, 2025
  • 3 min read
Scalable PDF Document Processing with DataChain and Unstructured.io
Extract and parse text from documents and create vector embeddings in a scalable, distributed way (in fewer than 70 lines of code).
  • Tibor Mach
  • Sep 30, 2024
  • 7 min read
Enforcing JSON Outputs in Commercial LLMs
The results of our tests on the structured outputs of Google Gemini Pro, Anthropic Claude, and OpenAI GPT, with DataChain used for evaluation.
  • Daniel Kharitonov
  • Sep 06, 2024
  • 10 min read
Dataset Factory - A Toolchain for Generative Computer Vision Datasets
Learn about our latest approach to mastering your unstructured data and metadata.
  • Jeny De Figueiredo
  • Mar 25, 2024
  • 1 min read