The Land Grab Data Stack

Robert Yi 🐳

Apr 15

How we should be using AI with data

4 Comments

Entertaining read.

Are there any examples you can point to of tools that are local and non-moated?

Robert Yi 🐳

Apr 15

Hey Brad! Off the top of my head, this is where I'm at:

- duckdb (+ducklake): query engine

- open storage formats like iceberg and parquet

- airlayer: in-process semantic layer (https://github.com/oxy-hq/airlayer)

- dlt: for the ETL step, for the time being

But I don't have a solid opinion yet on the transform layer or the BI layer. I'm not so happy with dbt lately (it is mostly local, but something about the philosophy feels off in an AI world). And while there are a ton of open-source BI tools available (e.g. preset, evidence, lightdash), they all follow a client-server model, and it's actually not clear what the right format should look like or how it should interact with AI. For example, LLMs for a long time liked to use matplotlib, but an engine that works more natively with the data stack would obviously be better.

I'm currently building more of these pieces with my team, intended as building blocks that an LLM can use. Eventually you need a platform that ships with all of them, or it ends up being arduous to set up each one independently (just like the MDS), and we're working on that with Oxy: https://github.com/oxy-hq/oxy

Brad Lowenstein

Apr 15 (edited)

Awesome, thanks! Oxy sounds pretty great. Just an FYI: I had Claude Code do a deep dive on the git repo before I pulled it locally, and it found a few issues you may want to address. I sent you a request on LinkedIn in case you'd like me to send them privately.