Data Engineering

We read OpenAI's and Anthropic's data-agent posts so you don't have to

In January OpenAI published how it built an internal data agent. Last week Anthropic published how it built one too. Two frontier labs, the same problem, five months apart. Here is the honest side-by-side: what they agree on, where they diverge, and the one assumption they both quietly depend on.

Dmitry Petrov
Jun 04, 2026 • 5 min read

OpenAI's Data Agent and the S3 Gap

OpenAI built their in-house data agent for structured warehouse data, where schema, lineage, and queries come for free. Files in S3, GCS, or Azure - videos, sensor logs, image corpora, PDFs - have none of that, and the problems get a lot more interesting. Here is how we built the four foundations that close the gap.

Dmitry Petrov
May 07, 2026 • 10 min read

The Neuro-Data Bottleneck: Why Neuro-AI Interfacing Breaks the Modern Data Stack

Neural data like EEG and MRI is never 'finished' - it's meant to be revisited as new ideas and methods emerge. Yet most teams are stuck in a multi-stage ETL nightmare. Here's why the modern data stack fails the brain.

Dmitry Petrov
Jan 23, 2026 • 5 min read

Parquet Is Great for Tables, Terrible for Video - Here's Why

Parquet is great for tables, terrible for images and video. Here's why shoving heavy data into columnar formats is the wrong approach - and what we should build instead. Hint: it's not about the formats, it's about the metadata.

Dmitry Petrov
Sep 03, 2025 • 5 min read
