Latent Space Podcast

AIE Europe Debrief + Agent Labs Thesis: Unsupervised Learning x Latent Space Crossover Special (2026)

April 30, 2026 · Listen to the original podcast

Isn't that crazy? That number is just mind-boggling.

What is the state of the AI coding wars today?

We're in a phase of sort of like capability exploration. The general thesis that I have been pursuing now is that the same way that 2025 was the year of coding agents, 2026 is coding agents breaking containment to do everything else.

Do you worry about the foundation models just eating into a bunch of these startup categories?

Mid-sized startups, yes.

What do you think the end state of this market is?

For the market structure to significantly change, there would be—

Today on Unsupervised Learning, we had a fun episode on what's really become an annual tradition, a crossover episode with our friends at Latent Space. Swyx and I sat down and we talked about everything happening in the AI ecosystem today, what we thought of the various changes at the model layer, what's happening in the infra world, the coding wars, and a bunch of other things. It's a ton of fun to do this with someone I really respect and another great podcaster in the game. Without further ado, here's our episode. Well, Swyx, this is, uh, super fun to be back with another Unsupervised Learning, uh, Latent Space crossover episode.

Yeah.

I feel like a lot of places we could start, but, you know, one thing I always find fascinating about the way you spend your time is you obviously are like at the epicenter of this AI engineering movement and community, and you run these events and conferences and put on these awesome talks and I think just have a great pulse on the zeitgeist of what's going on. Yeah. Maybe to start, just what are the biggest topics people are thinking about right now?

Yeah, so I just came back from London where we did AIE Europe, and we're doing roughly one per quarter now, which hopefully—

Yeah, you're really upping up the pace.

It's trying, we're trying to match AI speed, you know? Yeah, exactly.

The topics will be completely different, I imagine.

I definitely curate the tracks. You can see what I think when you see the track lists and the speakers that I invite. Obviously OpenClaw is the story of the last 4 or 5 months. And then just below that, I would consider harness engineering and context engineering to be two related topics in agents and RAG. And then there's a long tail of evergreen stuff like evals, observability, GPUs, and LLM infra just in general. We also have other updates on multimodality and generative media, let's call it. But definitely the first 3 that I mentioned are top of mind for people.

I think harnesses in particular are so interesting. There was this tweet from Harrison Chase, the LangChain CEO, that caught my eye recently where he said, it finally feels like we have stability around the infrastructure around AI. And I think what he basically was implying is like, look, over the past 2, 3 years as a company at the epicenter of AI infrastructure, it was a bit like playing whack-a-mole, right? You were constantly moving around with however the building patterns were evolving.

For Harrison, for sure, right? He's basically had to reinvent the company every year since he started LangChain, right? It was LangChain, LangGraph, and now DeepAgents. And like, I think he's like one of the most nimble, adept, sharp people about this.

Yeah. But yeah, basically now, now it's fine.

This time it's different.

Yeah. Yeah. Do you buy that or what do you kind of make of that take?

Hmm. I think that it's very expensive to say this time is different sometimes. But when you're just writing code, it's actually okay to just try to make a call. And I think it may not even matter if this call is right or not. I just don't even care that much because you can be right on the thesis, but if you don't figure out how to monetize the thesis, then who cares if you said something first? That said, it does feel like, for example, we went through a lot of different ways of packaging integrations up with agents, and it feels like we've landed at Skills, which is like the minimal viable format, which is just a markdown file with some scripts attached to it. And I don't see how it can be more simple than that. And so there is some justification for the stability around harnesses. I feel like there may be more adaptation with regards to maybe the real-time elements or subagents or memory or any of those agent disciplines, let's call it, in agent engineering. But if the thesis is that, okay, agents are just LLMs with tools in a loop with a file system where they can do retrieval, with skills and all this standard tooling that now seems to be relatively consensus, then probably that makes sense. I just think there's no point trying to stake your reputation on this thesis that we're there because if it changes again, just change with it. It's fine. Yeah.
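To make the "LLMs with tools in a loop with a file system, with skills" description concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `skills/<name>/SKILL.md` layout, the `ToolCall` type, the two tool names, and especially `scripted_model`, which is a hard-coded stand-in for a real LLM call so the example runs offline. It is not any particular harness's API.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List, Union
import tempfile


@dataclass
class ToolCall:
    """A request from the model to run one tool with one string argument."""
    name: str
    argument: str


def list_skills(root: Path, _argument: str) -> str:
    # Hypothetical convention: each skill is a folder containing a SKILL.md file.
    return "\n".join(sorted(p.parent.name for p in root.glob("skills/*/SKILL.md")))


def read_file(root: Path, relative_path: str) -> str:
    return (root / relative_path).read_text()


TOOLS = {"list_skills": list_skills, "read_file": read_file}


def scripted_model(transcript: List[str]) -> Union[ToolCall, str]:
    """Stand-in for an LLM: discover skills, read the first one, then answer."""
    tool_turns = [t for t in transcript if t.startswith("tool:")]
    if not tool_turns:
        return ToolCall("list_skills", "")
    if len(tool_turns) == 1:
        skill = tool_turns[0].split(": ", 1)[1].splitlines()[0]
        return ToolCall("read_file", f"skills/{skill}/SKILL.md")
    instructions = tool_turns[1].split(": ", 1)[1]
    return "final: " + instructions.splitlines()[1]


def run_agent(task: str, root: Path,
              model: Callable[[List[str]], Union[ToolCall, str]],
              max_steps: int = 8) -> str:
    """The loop: ask the model, run the tool it requests, feed the result back."""
    transcript = [f"user: {task}"]
    for _ in range(max_steps):
        action = model(transcript)
        if isinstance(action, str):  # a plain string means the model is done
            return action
        result = TOOLS[action.name](root, action.argument)
        transcript.append(f"tool:{action.name}: {result}")
    raise RuntimeError("step budget exhausted")


def demo() -> str:
    # Build a throwaway workspace with one skill: a markdown file of instructions.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        skill_dir = root / "skills" / "send-email"
        skill_dir.mkdir(parents=True)
        (skill_dir / "SKILL.md").write_text(
            "# send-email\nRun scripts/send.py with a recipient.\n"
        )
        return run_agent("email the team", root, scripted_model)


if __name__ == "__main__":
    print(demo())
```

Swapping `scripted_model` for a real model call is the only change the loop itself would need, which is roughly the sense in which the pattern feels settled.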

I've always been struck by how that is much more challenging for infrastructure companies than application companies. Obviously, I think on the application side, you've seen Bret Taylor from Sierra, Max Junestrand from Legora. They're like, look, we build what's ahead of the models and we're willing to throw everything out every 3 months as the models get better and better. But the thing you at least have there is you have, uh, you have an end customer, right, that's like decently sticky, um, you know, they will mostly stick, you know, they'll give you a shot at least of building these things. What I've always found more challenging, uh, about the kind of like, you know, reinvent yourself every 3 months at the infrastructure layer, it's like, you know, developers are definitely a pickier audience maybe than an accounting firm or, uh, you know, a bank. And so it's definitely a more challenging position to be in, to have to constantly reinvent yourself.

Yeah, and when they churn, it's very complete. They'll leave to the hot new thing because there's no defensibility, I guess. Even if you are a database, people can migrate workloads off databases. It's a known thing. So I think basically what we're talking about is the vertical versus horizontal debate in AI startups. And the way I think about it also is just that when you are, um, Legora, when you are Abridge, you are the outsourced AI team. Your job is to apply whatever state-of-the-art AI methods.

Yeah, like this translation layer between model capabilities and your end customers.

Yeah, to the end customers. And if they didn't have you, they would have to hire in-house and they're not going to hire in-house, so they have you. And I think that's a reasonable position, very robust to whatever trends and discoveries that people make in the engineering layer. I do think there are useful horizontal companies being built, but they're all very much the reinventions of classic cloud in the AI era, and the primary one being sandboxes. Yeah. Which, it's another form of compute, guys. Let's not get too excited about it. But I mean, the workloads are enormous. Right.

It's interesting. And I feel like as part of this, the questions that folks are asking around infrastructure, there's a lot around the extent to which companies should have their own AI teams and what they should be doing in-house. And I think there's questions around should people be training their own models? Should people be doing RL in-house based on the data they have? I feel like one has to evolve their takes on this every 3 months given the pace. But where are you at on this today?

I think actually own models have gone up. And obviously, I'm involved in Cognition and also Cursor is doing a lot of own model training. And I think that that is some part of what I've been calling the agent lab playbook, where you start off with the state-of-the-art models from the big labs and you specialize for your domain. But once you have enough workload and enough high-quality data from your users, then you can obviously train your own models and save a lot on cost and latency and all that good stuff. You also get a marketing bonus of calling it some fancy name and putting out some research.

From my seat, I can't tell how much of it is like actual, you know, value that's provided to the end user and how much of it is that marketing bonus, right? It seems some combination of the—

I think it's both. Yeah. No, no, there actually is real value. And you know that for a number of reasons. Like one, even when it's not subsidized, people do choose it as like one of the top 4 or 5. This is both Composer 2 and SWE-1.6. It's one of the top 5 models, like, you know, in a fair market, in a free market, in a model switcher where people do choose it and like it's not subsidized. So that's as good as it gets. But beyond that, like domain-specific models, for example, for search, which both companies have, absolutely makes a ton of sense. Everyone says like, yeah, we should always do this. And honestly, like, I think the infrastructure for that is becoming easier with like Thinking Machines' Tinker thing, as well as Prime Intellect's Lab stuff. Yeah, I mean, like, this is one of those like reversals of the bitter lesson where you first bootstrap on the large models and the general purpose models to get big. And as you get very well-defined workloads that are just high quantity but not high variance, then you just distill down to a smaller model and run that on your own, which totally makes sense.

What I'm less clear on is the kind of DIY RL use case, which I think is really mostly around, you know, improved quality for, for different things. Obviously there's probably like more efficient ways to get a smaller model that's faster and cheaper. And it'll be interesting to see whether, you know, obviously you had, you know, 2, 3 years ago, this whole case of companies that were, you know, pre-training and claiming better outcomes in their domains, then getting kind of cooked as each model iteration improved. You know, I wonder whether that's a similar story plays out in the RL space. Yeah. For the focus on pure outcomes and quality, not the cost side, which clearly your own models for cost at scale makes a ton of sense.

I think they're just two sides of the same coin. Like, you basically always want to hold quality constant or trade off a little bit of quality for a drastic decrease in cost, and that's true for everyone. Uh, one element I wanted to bring out which is very much in favor of open models is custom chips. So this would be Cerebras but also Taalas, and then there's a huge range of stuff in between. Uh, this has been a huge story this past year on just like everything non-NVIDIA is getting bid up, including like freaking MatX is working for it, which is very rewarding for me. But I think it's one of those things where, like, suddenly the number of alternative hardware options is increasing and the inference speed that you can get is insanely high, like we're talking thousands of tokens per second instead of less than 100. So the trade-off for quality doesn't hold as much anymore because the speed is so high.

Have you seen a lot of companies go all in on the alternative chips?

So Cognition has on Cerebras, uh, and so has OpenAI. Um, and so no, I don't think so beyond that. Uh, and that's mostly because that's foreshadowing. Yeah, I used to be kind of a skeptic in terms of like, okay, so what if I get my inference at 100 tokens per second sped up to 200 tokens per second? It's only 2x faster. It's not that big a deal. Um, but I think every 10x does unlock a different usage pattern. And we have proof in Taalas and some of the others that you can actually drastically improve inference speed. And what happens from there, I don't even really know. It's so hard to predict when entire applications just appear at once. And it also isn't that expensive, right? So this is one of those things where I think the investment cycle is going to be multi-year. And I would caution people to not dismiss it too, too quickly. Yeah.
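A quick back-of-the-envelope sketch of why 2x feels minor while 10x or more changes the usage pattern. The 2,000-token response length is an illustrative assumption, not a figure from the conversation, and the model ignores time-to-first-token and network latency.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream a response at a steady decode rate (ignores startup latency)."""
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    response_tokens = 2_000  # assumed size of one agent step's output
    for rate in (100, 200, 2_000):
        seconds = generation_seconds(response_tokens, rate)
        print(f"{rate:>5} tok/s -> {seconds:.1f} s")
```

Under these assumptions, going from 100 to 200 tokens per second turns a 20-second wait into a 10-second one, which is still a wait. At thousands of tokens per second the same response arrives in about a second, which is closer to interactive.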

I mean, one other like infra question I was curious to get your thoughts on is obviously it seems increasingly a lot of the cutting edge infra companies are building for agents as the buyers of their product or users of their product, right?

Ooh. And another huge theme.

Yeah. And I'm trying to figure out like what, what do you have to do differently about selling into agents? Um, are they just the ultimate rational developers? Uh, or is there, you know, no, absolutely not.

Um, I think they are easily prompt injected and very attuned towards like basically compounding existing winners. Yeah. So like, congrats if you won the lottery for getting into the training data before 2023, because now you're like installed in there for the foreseeable future. But yeah, you know, one stat that Vercel CTO Malte Ubl dropped at my conference was that 60% of traffic to Vercel's admin app architecture for configuring Vercel applications is now bots. It's not human. So your primary customer is agents now, and it's mostly coding agents, mostly people using CLI, MCP, whatever. But yeah, I think step one, if it doesn't exist as an API that agents can use, it doesn't exist, which I think is a good hygiene thing anyway, to make everything API available. But now it's like an extra, um, push on like product people to not only work on the UI, um, you should probably work on the CLI stuff. Beyond that, I think honestly, so I come from the sensibility of, I think everything that you are trying to do for agent experience now, which is the term that Matt Biilmann at Netlify is trying to coin, is the same thing that you should have been doing for developer experience. Like you should have had good docs, you should have had a consistent API, uh, that is mostly stateless. You should have, I guess, discoverability or progressive disclosure or like search or like whatever. And so now that people have energy in like finding these customers to do that, that's great. Do I believe in extending beyond that into something like AEO for gaming the chatbots? Not necessarily, but obviously there's going to be huge advantages for people who figure out the short-term wins, and short-term wins can compound.

Do you think these compounding advantages to like the pre-training data cutoff companies, like, you know, obviously over some period of time I imagine that doesn't persist. And so as you think about like, I don't know, 3, 4 years from now, what the selection criteria end up being, do you think it still mirrors exactly what you were saying before? Like it's exactly what you should have been doing all along to sell a good product to developers?

It could be, except that I think in 3, 4 years we'll probably have much better memory and personalization. So then general AEO or GEO doesn't really matter as much. So I think whatever memory or personalization system we end up with will probably determine what you end up choosing much more than what is currently the case, which is just frequency of mentions, let's call it. Yeah. So you just spam quantity. And I think that's, I mean, that's something I'm looking forward to. I do think like, you know, I think that the fundamental exercise to work through for yourself is if you start a new disruptor company now, there's a big incumbent that everyone knows, like Supabase. Supabase is kind of like the Postgres database incumbent. If you want to start a new Supabase, how would you compete with them? And I don't necessarily have the answer, but I do think people like Resend, relatively new, I think they were started in 2023, and still, there was a recent survey where people checked what Claude recommends by default if you just don't prompt it with anything, just say give me an email provider. And it says Resend in like 70% of cases. Like the fact that you can get in there with like such a relatively short existence, I think is encouraging. I do think like you do want to do whatever it is to get in that very short mentions list because it's not going to be 20 of them. It's going to be like 3. No, definitely.

It feels like probably more consolidation than ever or kind of like a winner-take-most market than maybe the physics of go-to-market in the past might have enabled.

The other thing also is semantic association is going to be very important in the sense that you want to do the combo articles where you're like, use my thing with Vercel, with blah, blah, blah. And that all gets picked up in a corpus. And so that's probably one thing that you want to do well. I don't know what else. It's one of those things where I feel I'm behind. I don't know how you feel about this, but like—

I think AI is just everyone constantly feeling like they're behind some— Yeah, with— I want to meet the person that doesn't feel behind.

But like with AX, right? Like, so my stance was exactly what I said before. Like, everything that you should do for agents is something that you should have done for humans anyway. And so to the extent that you're just getting more energy to do things for agents, great. But like, it's hard to articulate what new thing apart from just like more spam that you should be doing anyway. That will be my take right now. I do think like there will be more turns at this. I think the personalization turn that is coming will be big. And I don't know what that looks like because like basically we're kind of— we feel kind of tapped out on the memory side of things.

Yeah, I guess since we last chatted, you know, you took this role over at Cognition and you obviously have a front row seat to the AI coding space today. I feel like coding in many ways, you know, people view it as this like, I mean, besides being like the mother of all markets and this massive opportunity, I think it's kind of a preview of like what's to come for many other spaces, both, you know, I feel like agents are most advanced in coding. I also feel like the, you know, competition between foundation models and application companies, you know, mirrors what we may see in other spaces. And so maybe for our listeners, can you just lay out like what is the state of the AI coding wars today? It is massive, right?

And I don't think necessarily last time we talked about this, we appreciated the size of what... No, I wish we did. The state of AI coding wars today, both OpenAI and Anthropic have made it their P0s to compete in coding. And Anthropic is at like $2.5 billion in ARR just from Claude Code. The way they recognize ARR is up for debate. OpenAI, I don't think a public number is known, but let's call it $2 billion as well. And then Cursor is rumored to be $2 billion. And those are the public numbers that are known. So huge markets that have just been created in the past 1 year. Anthropic just, like, Claude Code just recently celebrated their 1-year anniversary, which is pretty nice. Yeah, that's pretty nice. So, and then I think the other thing that I see is there's some other people who are like, oh, here's the relative penetration of Claude use cases. And it's coding 50% and then legal, whatever, health, it's the remaining ones. And there was a very popular tweet that was like, okay, look at the empty space in all these other use cases. If you are a new founder today, you should be betting on the other stuff, on a catch-up theory. And my pushback is the same pushback that I had on Apple versus Google, which is like, well, why is this time different? Like, if it went from, let's say, 10 to 50% in the past year, why can't it keep going? And like, getting that wrong is actually a very painful one because you could have just done the momentum bet instead of the mean reversion bet. So I think that is the state of things now, that people are very much into psychosis. Um, they are getting rewarded for spending more rather than spending less. And I think we're not in that phase of efficiency. We're in a phase of sort of like capability exploration. So I think people who are more crazy, who are more creative, get rewarded comparatively. Well, it's interesting.

I mean, it feels like behind these like token maxing leaderboards and whatnot is this— it's like the first phase of this transition from a workforce perspective is you just got to show your employer like, hey, I use these tools.

Here's my number of tokens I cost. And that's it. They don't care about the quality right now. It is maybe distasteful to someone who cares about the craft and all that, but directionally everyone just wants you to go up regardless. And so it's not very discerning, and it's probably very sloppy, but I think it's net fine because we're still probably underusing AI just in general. And so I think that's very interesting. We had on the podcast Ryan Lopopolo from OpenAI, who spends a billion tokens a day. And for those counting at home, it's something like $10,000 worth a day of API tokens if they paid market rates. And most of us can't afford that. And probably a lot of what he does is slop. But if there were a new capability, he would discover it first before you because he was trying and you were not trying. And like, you only do things that work. Like, well, good for you, but like the people who are going to discover the next hot thing are living at the edge.
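For those counting at home, the arithmetic behind that figure can be sketched as follows. The only inputs are the two numbers quoted above (a billion tokens a day, roughly $10,000 a day); the blended per-million-token price is derived from them and is an implied assumption, not a quoted market rate.

```python
def implied_usd_per_million(tokens_per_day: float, usd_per_day: float) -> float:
    """Blended price per million tokens implied by a daily volume and daily spend."""
    return usd_per_day / (tokens_per_day / 1_000_000)


def daily_cost_usd(tokens_per_day: float, usd_per_million_tokens: float) -> float:
    """Daily spend for a given token volume at a flat blended price."""
    return tokens_per_day / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    price = implied_usd_per_million(1_000_000_000, 10_000)
    print(f"implied blended price: ${price:.2f} per million tokens")
    print(f"daily cost at that price: ${daily_cost_usd(1_000_000_000, price):,.0f}")
```

Real bills depend on the input/output split and caching, so a single blended rate is only a rough way to sanity-check the quoted numbers.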

Right. And increasingly, living at the edge is just having the compute budget to like run these experiments. I mean, kind of similar to what living at the edge on the research side has always been. You know, it was constrained in many ways by the amount of compute you had to run these experiments. It feels similar on the, almost on the builder side, or like actualizing these tools now.

The other thing that's, I mean, very obvious is Anthropic is kind of like the high-priced premium player, where restricting limits or restricting model releases even is like the name of the game, whereas Codex is like, come on in guys, use our SDK, use our login, we don't care, we're going to reset limits, whatever. You do want to try to exploit the subsidies where you can get them. And definitely Codex is super subsidized right now. Gemini also very subsidized. And comparatively, like I think you should make hay, I guess, while that's going on. It's not that bad to be a capabilities explorer on just the $200 a month plan from Claude Code or from OpenAI. Um, and, uh, my sense is that people aren't even there yet.

How do you think this like market ultimately plays? I mean, it's obviously such a big market that, you know, any slice of that market is interesting for, for anyone going after it. But I think what, what makes people so interested in the coding market particularly is it feels like it's kind of this foreshadowing of what will happen in other, you know, any other kind of application market that the foundation models eventually turn to and run their models against and gather data around. And so how do you think, you know, like, does there end up being room for lots of different kinds of players? Or like, what do you think the end state of this market is? And is that— do you think that's applicable to other markets?

I feel like there will be... I mean, status quo is probably the most likely outcome, which is there are two big players and there's a small range of longer-tailed people that fit other use cases that the two big players don't. That feels right to me. I think that for the market structure to significantly change, there needs to be significant change in like the economics or like the brand building or like the value propositions of the companies involved. And I haven't seen any in the last 6 months that have really changed the stories materially. So I feel like they would just keep going until something else happens. Something else happens, meaning Microsoft wakes up and goes like, guys, we have GitHub, we'll do something much bigger here other than just Copilot. And that would be a big change. MSL has put out a model now and I was in a breakfast with Alex Wang where they were like, yeah, we really, really want to go after the coding use case. They haven't done anything yet, but don't underestimate them. And similarly for the Chinese labs, I think they're trying to go after it. Like ZAI is doing stuff, GLM. ZAI and GLM are the same thing. And so it's like everyone's trying to get a piece of that pie. I feel like the status quo has been pretty stable for the past almost a year, I will say.

Yeah. And is there room for the application companies more on the enterprise side, or what surface area do the model companies leave for application companies?

Yeah, that's a good one. Um, it's very much evolving. Um, I will say, because OpenAI did not have this level of attention on coding a year ago, we just don't have that much history, right? Um, and it seems like, for example, so the big push at OpenAI now is the super app. Um, is that a consumer thing? Is that like a product portfolio rationalization thing? How much is that going to take away attention from coding at the time when they actually do want to put more into coding? I think it's very unclear. So I do think at both big labs, sorry, at both OpenAI and Anthropic, and DeepMind and xAI are separate cases, they are trying to see the other TAM expansion areas. So Claude Code for finance, Claude Cowork, all those things. Whereas I think Cursor and Cognition are like comparatively just focused on coding. And so I do think they leave space, and I do think for the other verticals that also means the same thing, right? That, uh, they're not going to be that, um, intensely focused on that domain. Except for, I think I would mark out finance and healthcare as like the next ones, um, that they're clearly going after. Uh, I would say comparatively healthcare seems more thorny. There have been some announcements about it, but I would respect the finance work a lot more just because the path to money is a lot clearer.

Yeah. No, I mean, obviously, I think maybe similar to the space that's being left in these other domains, there's obviously a lot that's required to actually implement these tools in enterprises versus maybe just giving them, giving model access to folks out of the box. Yeah. Yeah. Yeah.

So the agent lab thing is like, we'll do the last mile for you. Whereas I think the model labs tend to just trust the model and be minimalist about it. Both of them work. I don't, I don't necessarily think one beats the other for every use case. All I do know is that it does seem like the large enterprises do want a dedicated partner that isn't just the model labs, which is kind of interesting.

We've been in this phase of pure capability exploration, and so I think nothing has been better for large labs, right? I mean, they were always gonna be at the frontier of capability exploration. And so I think they have a very good relationship with a lot of these enterprises. But ultimately over time, like the incentive structure of these labs is always gonna be maximal token consumption for the end customers they work with. And there's just, I think, so few companies that have actually gotten to massive scale. Maybe coding again is the most interesting. So it's the first space that really has just completely gone, you know... Yeah, you must live it every day, like absolutely insane. And I think even...

Okay, I mean, like, I think we say good things about Cursor and Cognition, but the sheer liftoff of like both Anthropic and OpenAI, because they have independent valuations. I mean, let's throw in xAI in there. Yeah, it's now IPOing at $1.2 trillion. That number is just mind-boggling. Like, I feel like in normal investing or normal startups, there's kind of like a ceiling market cap or valuation that, like, you reach and you're going like, all right, it's going to be chiller from now on. Like, and these guys are not slowing down.

Yeah, well, I also think the dynamic that's fascinating about some of these later stage companies is, you know, in the past, I feel like in venture world, if you got to a certain level of scale, the question around you was really more a valuation question. This is like why there were different, like, you know, types of venture that people did. Like the late stage growth people were just incredible at like, you know, a little bit of what's the ultimate market opportunity of this company, but also what's the right way to evaluate. Like we know it's in some bands of an outcome that is like, sure, there's some variance to it, but it's like relatively understood what that band is. And then maybe you get over time surprised to the upside. Whereas, even the labs themselves, any later stage company, the bands of what that company might be worth right now, even in a year or 2 years, are so massive because of how fast the ecosystem changes that it's like, even for later stage companies, every 3 months could be an existential level event to the upside, to the downside. Yeah. And I think that, like, you're obviously seeing it in the positive with code, which, you know, if you think about a company like Anthropic, you know, for a while it was like unclear if they were going to have access to enough capital to really stay in the race. Right. And then coding hit at the exact right time. They had the perfect model for it. They executed brilliantly. And, you know, now they're, you know, one of the most valuable companies in the world.

At the same time, I don't find— I have zero sympathy for OpenAI because they're crushing it. And they're all rich. This is like a high-class champagne problem to have, to be number 2 at coding or whatever. Like, who cares? Like, you're doing great. It's funny though, I can't even—

I mean, you would be closer to this, since you're in the AI coding space, but it's like, a lot of people I talk to think Codex is just as good, if not better, than Claude Code, right? I think one thing that I've been really surprised by, and maybe Claude Code is a better product in some ways, I'm curious your thoughts, is just in consumer AI with ChatGPT, you saw this big first mover advantage, right? Where admittedly today, like, I don't know, Claude, Gemini, great products. Not abundantly clear ChatGPT is any better, but like people stick with ChatGPT. It's the first thing they were introduced to.

They stay, but they're not growing anymore. I don't know if you've seen. Right.

But that to me is more of like a product problem than it is... it's not like they've like lost share to someone else. My understanding is the overall problem with consumer AI today is much more of a how do you take this tool and, you know, for folks like us, like knowledge workers, it's like this incredible magic tool, but it's not necessarily a daily active use tool for a lot of people around the world today. And it's kind of a category-wide problem. Like in coding, for example, like the entire space has gone parabolic. There may be some relative growth in other consumer AI players, but it's not like consumer AI as a category is like going parabolic and they're not capturing most of that thing. I think actually the larger problem is much more, hey, the category has kind of hit a bit of a plateau. People haven't figured out how to bring, you know, tons more users on board or increase the frequency of those users. And so it seems more of a category-wide problem than it is, you know, a massive market share change. I was gonna draw the comparison to the coding space, where Claude Code was the first product, obviously, to introduce people to this magical experience. You know, by all accounts, Codex is pretty damn close to as good, if not better. Um, but, like, still, that first product, you would have thought that would not be a super sticky product surface area and it actually has been. It turns out, it feels like the first lab to introduce you to an experience really does keep a lot of the focus.

I think maybe it's still early days, you know. ChatGPT is 3+ years old and Claude Code is only 1. It just turned a year. So give it time. Yeah. I mean, definitely a lot of people have switched to Codex. Maybe that will keep going. It's really hard to tell. I do think that because we are in this high-volatility, high-temperature phase, the loyalty and stickiness to first movers and category creators is not as high as it might be in some other areas that we've looked at in our careers.

Yeah, though, I mean, I've been surprised by the Claude Code thing. I would have thought that, like, in many ways, I always worried about the...

You think it would have been gone by now?

Not gone, but I always worried that the consumer business of these companies would be quite sticky, and the enterprise API business was actually, you know, in some ways your least loyal buyers. Like, they would move to...

But they worked out that it wasn't the enterprise API, it was the enterprise product.

Totally. And maybe that was the secret. But the amount of lock-in or just default behavior that has happened in that space is more than I might have imagined with two products that by all accounts are pretty damn similar.

Yeah, no fight there. I will say I do think that Codex is still in catch-up mode, in terms of personal experience. The only thing I like out of Codex is Spark. I feel like the skills integration is a little bit better. I feel like the speed is a bit better, maybe because it's written in Rust or whatever. Very minor things that you're almost telling yourself, rather than objectively assessing between the two of them. I do think vibes-wise, that's what's going on. I feel like the missing question in this whole debate is: why is it so concentrated in only two names? Where is the Gemini presence? Where is the xAI presence? And they are trying. It's just they haven't made that much progress yet.

I think what the Claude moment does show, it actually in some ways makes you a little more bullish on the potential for someone else to catch up because it does feel like if you're the first person to introduce some magical net new product experience, that that actually might be stickier than one might have imagined.

Right, right, right. Okay. Yeah. And so what do you think that new product experience might be like? It's like, and this is a failure of imagination on my part. Like, I always wonder, like, people always say this, like, well, the thing that will save us is like being first to the next new thing. Like, what is it?

Yeah. I don't know, something around like, uh, consumer agent, computer use, like hybrid. I think the obvious— I think we're like scratching the surface on the consumer side.

So my current theory is that OpenClaw is a vision of things to come. Totally. And it's going to be good that OpenAI has the association with OpenClaw, but by no means do they have the right to win it. The general thesis that I have been pursuing now is that, the same way that 2025 was the year of coding agents, 2026 is coding agents breaking containment to do everything else. Coding agents continue to win, because they generate software and software eats the world. So it's kind of like the transitive property: software eats the world, coding agents eat software, therefore coding agents eat the world, which is an interesting...

Breaking containment, always an easier phrase in the consumer context than the enterprise one. You've seen people run these really cool experiments in their own personal lives, I think. Yes. Obviously everyone's focused on the enterprise side now, around how you create these experiences. I feel like, with the vibes, people love to have these narratives that everything has completely shifted. Actually, OpenAI organizationally, volatility aside, has great products, a great team, great models. Everyone else in the world is incentivized for there to be 2 or 3 more. Everyone would love more great model companies. And so I feel like the natural forces of the world revolt when any one company is too much the star of the show, right? There are so many people in the ecosystem that are incentivized for that not to happen. So I'd be shocked if we don't have a reversion of vibes, maybe not completely the other way, but at least a little bit more equal at some point over the next 6 to 12 months.

I think there are just kind of different stages. So when you talk about the world wanting more model companies, I think about the Neo Labs. Yeah. And I mean, I don't know, is it fair to say none of them have really broken through in the past year?

I think that's totally fair.

Which is rough. And well, how are we going to grow that diversity in choice? Like, this is it.

Yeah, it'll be really interesting to see what ends up happening with that. And you've seen folks like NVIDIA very incentivized to make sure there's a broader platform of other model providers.

I think, uh, I don't know, people say this, but I, I don't think they try that hard. NVIDIA tries harder to build Neo Clouds. Yeah. Than Neo Labs.

Well, they try pretty damn hard to build Neo Clouds, so that's—

Yeah. But, you know, let's call it the CoreWeaves of the world are in a much happier place than any Neo Lab built on top of them. Yeah.

Though one might argue it's, it's easier to, to enable a Neo Cloud to be successful than it is, uh, you can't will a Neo Lab into existence.

The same way. So NVIDIA has more direct control over it, for sure.

What else is kind of catching your eye today on the startup side? I mean, you worry— there's obviously this whole narrative of like, you know, the foundation models, you know, they announce a product and every stock goes down 15%. Like, yeah, do you worry about the foundation models just kind of eating into a bunch of these startup categories?

Not really. Okay, there's the point of view of being an investor in startups, and there's the point of view of, do you want to start something? And I think honestly the downside for all these is so minimal, in the sense that the worst you do is you just get hired into one of these labs anyway. So I think there's a market for people who just do things and try things and try to execute in a competent way. Even if it doesn't work out commercially, even if it just wasn't that great anyway, that's your job interview to go into one of these things. I don't feel that worry from a very, very small startup's perspective. Mid-sized startups, yes. I would say there's been a lot of dead LLM infra consolidation, like the Langfuses of the world getting absorbed into ClickHouse. I think people have maybe worked out the domain-specific playbook, and I think that's okay. Yeah, I'm not that worried about... OK, so I would say I'd be more worried about traditional SaaS, like low-end PaaS SaaS. This is the whole AI versus SaaS debate that's been going on. And literally, I'm going through that exact thing in my company. So I'm thinking through this on a very visceral level, right? On one hand, you have the people who say, you vibecoders don't appreciate the amount of work that goes into a CRM. And yeah, you think you can rip out Salesforce, so did the 30 entrepreneurs before you, right? You classically underestimate the things that you don't deeply know, and the target audience is not you. At the same time, we have never been able to build software so easily and customize software so easily. And yeah, you're not going to use 90% of the things in Salesforce. So like, what's the tip?

What have you done internally?

So we have the main SaaS that we use for event management and sponsor management, and we pay $200K a year for that. Not huge, but chunky for my scale. And yeah, I could probably spend $2,000 and build a custom version of that. The trick has been dealing with the rest of my team and getting them on board, because I'm the most technical person on my team, but I can't make that decision myself, right? It's the same thing I've been telling other CEOs and team leaders: you can be super Claude-pilled, you can have super LLM psychosis and think that's okay, but you have to bring your team with you. And I think the widening disparity in LLM psychosis in companies is causing real rifts. On one hand, the people who are less AI native are not getting with the picture. They're actually behind. They're not waking up to the fact that everything you think is necessary is not actually that necessary. In fact, it would be better for you if you just held your nose and went in and came out the other side only talking to agents in natural language. Your life would actually be better and you're just close-minded. There's that perspective. The other perspective is, oh, you vibecoder, you did this in a weekend and you got the 80% solution, and now the rest of your employees have to pick up the rest of your shit that you thought you were so amazing at, but actually you didn't figure it out. And actually LLMs are still useless at this and blah, blah, blah. So I think there's this huge debate going on in every company right now. And I have a small microcosm of it, but yeah, it's making me hesitate to pull the trigger, though I will at some point. Maybe I put it off for 1 year, but not 5. So SaaS is definitely getting squeezed.
It does make me wonder. I do think that there's an opportunity for a more AI-native system of record that is not just Postgres or not just MongoDB, although both are very good. Maybe it's like a Convex; people bring up Convex a lot. I don't know, I just feel like the quote-unquote Firebase of AI apps isn't really a thing yet beyond what we have, which is fine. It's just that we could probably start in a more rapid iteration cycle first before scaling up to a Postgres or MongoDB, which are more sort of old tech.

I was at a dinner with Mike Krieger, the CPO of Anthropic, and we were just kind of going around the room asking, what are people most worried about? And for me, instead of security, I brought up biosafety. Yeah, classic. Actually, I said it was cliché and classic, and the rest of the table were like, what do you mean someone sitting at home can manufacture a virus that wipes out half of humanity? It was like the OG Geoffrey Hinton, this is why you should be scared. I'm like, yeah, read the risk reports. This is the thing. And Mike was just sitting there knowing he was sitting on Mythos and going, actually it's security. I think part of it is very good marketing, like too good. I would actually advise Anthropic to tune down the marketing, because it's also just a very good model and you don't have to make so many marketing claims around it. At the same time, it is not really a private model if you give it to 40 companies, each of whom have like 10,000 employees or whatever, right? It's not private. There are bad actors in there.

Yeah, hopefully not as bad as releasing it widely. But no, I mean, it's an interesting case study. You know, this might be the first model release that looks like the rest of them from now on, right?

There's an overall product strategy for Anthropic of, like, bundle, restrict access, bundle product with model maybe, whereas OpenAI has definitely been a lot more philosophically aligned on: we will just enable access everywhere and we don't know what will come out of it.

Though I mean, in this current moment, obviously the cynical take is that it also just ties to the amount of compute that both companies have.

Yeah, I think that's true. I do think the scale and the dawn of larger-than-10-trillion-parameter models is very interesting. I think it's a temporary phenomenon, because we have much larger compute clusters coming online for everyone over the next 3 to 5 years. And this is already written in the cards. So will we have rationing of models above 10 trillion in 2 years? I don't think so. I think everyone will have access.

No, we'll just have rationing of the next phase.

Right, right. But like, that's as it should be, almost. Like, um, my classic example, which I— this is just me theorizing, not anything confirmed by Google— when Google announced Gemini, they actually announced 3 sizes, which was Flash, Pro, Ultra. They never released Ultra. They only have Pro and Flash. Um, so my theory is they have Ultra sitting in the basement and they just keep distilling from it for, for Flash and Pro. Which like, yeah, I mean, I actually think that's as it should be for any lab that they do that.

Yeah, just because those are the models that people actually want to end up using and it's just like cost per hundred.

It's more, yeah, it's cost. It's not the want, it's just the cost. I do think it is interesting that for a while I was considering the theory that models capped out at 2 trillion, and I think that's proving to be wrong. And well, then if I'm wrong, how wrong am I? Do we do 200 trillion? Do we do 2 quadrillion or whatever? I don't think we have a straight answer to that. But it's interesting that we are continuing to scale the number of params when everyone can see that we're not going to get the next 1,000 or 1 million X from this paradigm. So the others, like the Ilyas of the world, are working on other model architecture improvements. We need a different scaling law, I guess, because I feel like people already feel like we're tapped out on this. The end state of this is we turn most of the world into data centers. And I don't know if we want that.

Yeah, though, I mean, if the returns of intelligence are there, maybe not so bad.

I think there's just a sheer amount of unscalability that is rankling people's sensibilities right now, especially in terms of context lengths. My classic quote is that context length is the slowest scaling factor in LLMs. Yeah. We took maybe 3 years to go from 4,000 context length to 1 million, and that's about it. Gemini has had 1 million token context length for 2 years now. And no one's using it. So yeah, memory is probably going to be the biggest limiting constraint on all these things.

Yeah, certainly seems that way. I guess I'm curious over the last year since we recorded last, like what's one thing you've changed your mind on?

I feel like I was kind of bearish on open models last year, in the sense that I had just done the podcast with Ankur Goyal of Braintrust, where he, I mean, you know, he has a good cross-section of all the top AI companies, and he said market share of open source is 5% and going down. I think that's changed. I think it's going up, even though the capability gap does seem to be increasing depending on the tech. It's hard to tell. Yeah, it's really hard to tell because, okay, for listeners, the capability gap increasing is on public benchmarks, and let's say you're comparing Mythos versus, I don't know, GPT-OSS or GLM 5.1. And it's really hard to tell, because even if they were closing, you would also not believe that they were closing that much, because it's very easy to game the benchmarks. So you just don't really know. All you know is there are somewhat objective OpenRouter stats on what people choose in a free market, and people do choose some of these open models in significant volume, except that a lot of them are heavily discounted. So you need to kind of price-adjust these things. So even if that were true, which I'm not sure, I feel like the number is just up now instead of down. I think the separation between what the top-tier agent labs are doing versus the average startup in AI or the average GPT wrapper is significant enough that you should not worry about the sort of mean industry number, and you should cohort things into, here's the median, here's the bottom 80%, and here's the top 20%. And the top 20% acts very differently than the bottom 80%. And so the top 20%, which is all I care about, is definitely going towards more open models. The Fireworks and the Togethers are crushing, and so will the fine-tuners, right? I think maybe last time we even said things like fine-tuning as a service doesn't work. Well, now it's going to work.
It's a derivative of the open models market.

Well, also in the workload scaling to the point where people care about cost and speed more and more and moving from just pure use case discovery of what can these models do to, okay, we know what they can do at scale.

Now let's do them cheaper and faster. Yeah. That change I think is probably the most significant in my mind. And I always like to do the mental math of, this is what I think about as scheduling a learning rate: when you've been wrong once, what else were you wrong on? And I'm kind of working through it. To me, the other thing was the coding one, which obviously I have now come full 360 on. But I think people are not appreciating dark factories enough, which I don't know if you've discussed on the pod yet. No. So this is kind of a StrongDM/Simon Willison term. The general idea is, okay, there are different levels of AI coding psychosis you can have. The very first level, which, by the way, I encountered first at Cognition 5 months ago, was zero human-written code. Yeah. Right, which seems like a reasonable thing now, and was less reasonable 5 months ago. The next frontier, which sounds as crazy today as zero coding did in the past, is zero human review. Yeah, you just check it in without even reviewing it. And very few people are doing that. But OpenAI is exploring this, and I feel like it's definitely the only scalable way to do this. It just means you have to kind of flip the SDLC or change large amounts of what you normally do, which are probably things you should have done anyway: more testing, more automated verification or whatever. But that is a frontier at which, when you have unlocked that in your company, you are just going to produce a much greater quantity of software than you've ever had. It's going to be so much, so disposable, so cheap that you can probably innovate in quality a lot as well. That quantity helps you get to quality, which I think people are very uncomfortable with, because people associate more quantity with slop.

Right, no, it's back to exactly the discussion we were having on the reaction to these token-maxing scoreboards and the idea that today maybe that's not the best sign of productivity and efficiency, but going forward...

Yeah, but you still get rewarded for it, so you're like, fuck it, whatever. But I think the people who do best in 2026 are not the cynics who go, oh, that's just slop, I'm not going to participate in that. They're like, okay, this is happening with or without me. Let's bend this the right way.

Yeah, I love that. I think for me, a kind of related thing on the open source model side is for so long, I really didn't think it made any sense to do any sort of RL, post-training, pre-training, anything you could do to improve kind of overall quality. Certainly for latency and cost, it always made sense to me. But for overall quality, like, God, you just get that for free in the models like 3, 6 months later. I think what I'm starting to change my tune on a little bit is, you know, hearing all these app companies talk about like, you know, we build stuff and then we throw it out 3 months later as like the models improve. You're like, okay, well then what you're doing for capability improvement is just another version of that, right? Like I still don't think that like your RL or like post-train is gonna make you have a better model for like years and years to come. But maybe I think you still have to be pretty rigorous on like, is that the single best thing you can do to solve a customer problem? Oftentimes it's literally just like, now add more data and feed more data, even via connectors to these models, or do some clever engineering on the backend or whatever it is. But if the single best thing you can do for that 3-month time period to improve your customers' outcomes is post-training in some way that really improves the output of the model, even if you throw it out 3 months later because the general models get up there, it still might have been worth doing. And so I think I'm more open to—

You throw out the results, but you don't throw out the raw data. Totally. Right, then you just run it again.

And so basically there's some— obviously at the level of cost of like $10 million, maybe that's too much, but there's some level of cost where—

no, it's actually not even $10 million.

No, of course it's not, you know. Yeah, there's obviously some level of investment, uh, at which it's the equivalent of just like staffing 4 engineers to go build something for 3 months.

Yeah. So the other thing, for listeners, I'm just going to leave some droplets of info: look into the long-trajectory, synthetic rubrics work that people are doing, which is very important, including something that's called Dr. GRPO. I'll just leave those key search terms in there. I think what it means is that RL is going much more multi-turn than people think. And that means that you can customize the models in way more specific dimensions than the traditional, let's call it SFT or shallow RL, that was done a year ago. So like hundreds of turns. And I think that leads you down a path of complete domain specificity.

Of these unanswered questions in AI today, what else are you looking for in the next year? What are you paying close attention to?

I have a few theses for what the next frontier is. One is memory; memory and personalization we talked about. The other is really world models, which we've done a small series on, from Fei-Fei Li all the way to even Moonlake and General Intuition. And there's a lot of debate as to the relative importance of this. I think a lot of it manifests as 3D static worlds that you inhabit for a little bit and walk around in, and people are like, cool, but how does this help me with my B2B SaaS?

It's like all the hype now is robotics, right?

Yeah. And there's obviously a correlation between world models and embodied vision and experiences, which leads to robotics. But I think world models are very interesting in just improving intelligence itself beyond the next-token-prediction paradigm. And so I think people are kind of testing their edges around that. One of our top articles this year so far has been on adversarial world models. I do think, if you don't do anything else, just read Fei-Fei Li's essay on spatial intelligence and why LLMs don't have it. She may not have the solution yet, but she has the right problem statement. And so everyone else is trying to solve that problem statement in their own way. Let's see who wins. But I don't think it does you any favors to equate world models with robotics, or world models with gaming, or any of the current manifestations. Because what is at stake is a much more important conception of intelligence than just answering questions. It is, does the AI understand what a table is? What matter is, what physics is. For those who are movie fans, it's like Good Will Hunting, where Matt Damon knows everything because he read it in a book, but he's never lived it.

Great scene with Robin Williams.

Robin Williams. I look at that scene and I go, that's exactly the difference between a very intelligent LLM who knows everything but hasn't experienced anything. Wow.

That's an awesome note to end on. Have you used that answer before?

That was great. Yeah. One thing I've done with Latent Space is I moved to adding daily write-ups. One of the times I was doing this daily write-up.

That's a great one. I love that. Um, well, so it's been a ton of fun. Thanks so much for coming on, Josh. I'm Jacob Efron, and this has been Unsupervised Learning, a podcast where I get to talk to the smartest people in AI and ask them tons of questions about what's happening with models and what it means for businesses in the world. As I hope is clear, I have a ton of fun doing this. It's a nights and weekends project in addition to my day job as an investor at Redpoint. But our ability to get these incredible guests on really comes from folks like you subscribing to the podcast, sharing it with friends. It's really what ultimately makes this whole thing work. And so please consider doing that. And thank you so much for your support and listening. We'll see you next episode.