Data Engineering

We read OpenAI's and Anthropic's data-agent posts so you don't have to
In January OpenAI published how it built an internal data agent. Last week Anthropic published how it built one too. Two frontier labs, the same problem, five months apart. Here is the honest side-by-side: what they agree on, where they diverge, and the one assumption they both quietly depend on.
  • Dmitry Petrov
  • Jun 04, 2026 · 5 min read
OpenAI's Data Agent and the S3 Gap
OpenAI built its in-house data agent for structured warehouse data, where schema, lineage, and queries come for free. Files in S3, GCS, or Azure - videos, sensor logs, image corpora, PDFs - have none of that, and the problems get a lot more interesting. Here is how we built the four foundations that close the gap.
  • Dmitry Petrov
  • May 07, 2026 · 10 min read
The Neuro-Data Bottleneck: Why Neuro-AI Interfacing Breaks the Modern Data Stack
Neural data like EEG and MRI is never 'finished' - it's meant to be revisited as new ideas and methods emerge. Yet most teams are stuck in a multi-stage ETL nightmare. Here's why the modern data stack fails the brain.
  • Dmitry Petrov
  • Jan 23, 2026 · 5 min read
Parquet Is Great for Tables, Terrible for Video - Here's Why
Parquet is great for tables, terrible for images and video. Here's why shoving heavy data into columnar formats is the wrong approach - and what we should build instead. Hint: it's not about the formats, it's about the metadata.
  • Dmitry Petrov
  • Sep 03, 2025 · 5 min read