Transcript Reader Lenny's Podcast

'Why most AI products fail: Lessons from 50+ AI deployments at OpenAI, Google & Amazon'


Channel: Lenny's Podcast
Language: 中文
Source: YouTube
Coverage: 100%
Video Source: Lenny's Podcast — https://www.youtube.com/watch?v=z7T1pCxgvlA
Reading Mode

默认显示中文,缺失的章节会自动回退到英文原文,保证这页随时可读。

章节 01 / 01 · 全文(中文译稿已完成)

Lenny Rachitsky我们一起写了一篇客座文章。他们有一个非常关键的洞察——构建AI产品与构建非AI产品是非常不同的。

Aishwarya Naresh Reganti大多数人都倾向于忽略非确定性。你不知道用户会如何与你的产品互动,你也不知道LLM可能会如何回应。第二个区别是自主权与控制的权衡。每次你把决策能力交给代理系统时,你都在某种程度上放弃了自己一端的控制权。

Lenny Rachitsky这极大地改变了产品构建的方式。

Kiriti Badam所以我们建议逐步构建。当你从小处着手时,它会迫使你思考我要解决的问题是什么。在AI的所有这些进展中,一个常见又危险的陷阱是不断思考解决方案的复杂性,而忘记你试图解决的问题。

Aishwarya Naresh Reganti问题不在于成为竞争对手中第一个拥有代理的公司。而在于你是否建立了正确的飞轮,以便能够随着时间推移不断改进。

Lenny Rachitsky你在那些成功构建AI产品的公司里看到了哪些工作方式?

Aishwarya Naresh Reganti我以前和现在Rackspace的CEO一起工作过。他每天早上有一个固定时段,上面写着"处理AI事务 凌晨4点到6点"。领导者必须回到亲自动手的状态。你必须接受一个事实——你的直觉可能并不正确。而且你可能是房间里最蠢的人,你要向每个人学习。

Lenny Rachitsky你认为未来一年AI会是什么样子?

Kiriti Badam坚持非常宝贵。现在在任何新领域构建产品的成功公司,他们正在经历学习、实施和理解什么有效什么无效的痛苦。痛苦是新的护城河。

Aishwarya Naresh Reganti谢谢你,Lenny。

Kiriti Badam谢谢你邀请我们。非常兴奋。

Lenny Rachitsky让我为今天的对话铺垫一下。你们两位自己构建了很多AI产品。你们与很多公司进行了深入合作——有些成功构建了AI产品,有些在构建AI产品和AI代理方面遇到了困难。你们还教授一门关于成功构建AI产品的课程,而且你们正处于一个使命中——减少人们在构建AI产品时不断经历的痛苦和失败。在我们即将进行的对话开始之前,你们对正在尝试构建AI产品的公司看到了什么?什么做得好?什么做得不好?

Aishwarya Naresh Reganti我认为2025年和2024年相比已经显著不同。首先,怀疑态度已经显著减少。去年有大量领导者可能认为这又是一波加密浪潮,对开始持怀疑态度。去年我看到的很多用例更多是在你的数据上做聊天机器人。然后就称自己是AI产品了。而今年,大量公司正在真正重新思考他们的用户体验和工作流程,真正理解你需要解构和重建你的流程,以便构建成功的AI产品。这是好的部分。不好的部分是执行仍然非常混乱。这个领域才三岁。没有现成的方法论,没有教科书。所以你需要边走边摸索。AI生命周期,无论是部署前还是部署后,与传统软件生命周期都非常不同。

所以很多传统角色之间的旧契约和交接,比如PM和工程师和数据人员,现在已经被打破了,人们正在真正适应这种新的协同工作方式,以某种方式拥有相同的反馈循环。因为以前,我觉得PM和工程师和所有这些人都有自己需要优化的反馈循环。现在你需要可能坐在同一个房间里。你们可能一起查看代理追踪,决定你的产品应该如何表现。所以这是一种更紧密的协作形式。公司们仍在摸索这一点。这是我今年在咨询实践中看到的。

Lenny Rachitsky让我顺着这条线继续。我们几个月前一起写了一篇客座文章。这篇文章中给我留下最深刻印象、最挥之不去的关键洞察是——构建AI产品与构建非AI产品是非常不同的。你们强调要传达的两个重大差异是什么?谈谈这两个差异。

Aishwarya Naresh Reganti是的。我再次想确保我们传达正确的观点。构建AI系统和软件系统有很多相似之处,但确实有一些东西从根本改变了构建软件系统与构建AI系统的方式。其中一个大多数人都倾向于忽略的是非确定性。你基本上是在使用一个非确定性API,而不是传统软件。这意味着什么,为什么这会影响我们?在传统软件中,你有一个非常明确映射好的决策引擎或工作流。想想像Booking.com这样的产品。你有一个意图——你想在旧金山预订两晚的房间等等。产品已经被构建成这样,你的意图可以被转化为一个特定的行为,你通过点击一系列按钮、选项、表单,最终完成你的意图。

但现在在AI产品中,这一层已经被一个非常流动的界面完全取代了,这个界面主要是自然语言,这意味着用户可以用大量不同的方式表达或沟通他们的意图。这改变了很多东西,因为现在你不知道用户的输入会是什么。这是输入端。输出也是你正在使用一个非确定性的概率API,也就是你的LLM。LLM对提示措辞非常敏感,而且它们基本上是黑箱。所以你甚至不知道输出会是什么样子。所以你不知道用户会如何与你的产品交互,你也不知道LLM会如何回应。所以你现在面对的是输入、输出和过程。你对这三者都不完全了解。你在尝试预测行为并为之构建。
在使用代理系统时,这种难度就更大了。这也是我们在说第二个差异的地方,也就是自主权与控制的权衡。我们所说的意思——我非常惊讶有这么多人不谈论这个。他们一味追着构建自主系统,即能够为你工作的代理。但每次你把决策能力或自主权交给代理系统时,你都在某种程度上放弃了自己一端的控制权。当你这样做时,你想确保你的代理已经赢得了你的信任,或者它足够可靠,你可以允许它做决定。这也是我们在说的自主权控制权衡——如果你给你的AI代理或AI系统更多的自主权,也就是决策能力,你也在失去一些控制权,而且你想确保代理或AI系统已经获得了这种能力,或者已经随时间建立了信任。

Lenny Rachitsky所以我来总结一下你们分享的内容——本质上,人们构建产品、软件产品已经很久了。我们现在处于一个你所构建的软件具有两个特性的世界:第一,非确定性,可以以不同方式做事。正如你所说,你去Booking.com,找一家酒店,每次体验都是一样的。你会看到不同的酒店,但那是一个可预测的体验。而AI,你无法预测它每次都会是你计划的完全相同的东西。第二,存在自主权与控制之间的权衡。AI为你做多少,人类应该保留多少主导权?我听到的核心观点是,这极大地改变了产品构建的方式。我们接下来要谈谈这对产品开发生命周期应该产生的影响。

在我们深入之前,有什么想补充的吗?

Kiriti Badam是的,这确实是关键点之一——当你开始构建时,这种区分需要存在于你的脑海中。例如,想想如果你的目标是攀登Yosemite的Half Dome。你不会每天一开始就爬它,而是从小的部分开始训练自己,然后慢慢进步,最后达到最终目标。我觉得这和你构建AI产品的过程非常相似——你不要在第一天就上线配备了所有工具和公司所有上下文的代理,然后指望它能工作,甚至在那个层面进行调试。你需要有意地从影响最小、人类控制最多的地方开始,这样你就能很好地掌握当前的能力是什么,我能做什么,然后慢慢转向更多的自主权和更少的控制。

所以这给了你信心——好的,我知道这是我面临的特定问题,AI能解决其中多少。然后让我下一步思考我需要引入什么上下文,我需要添加什么样的工具来改善体验。所以我觉得这也是一件好事和坏事——好的是你不必看到这个外部世界的复杂性,所有这些花哨的AI代理带来的复杂性,然后觉得你做不了。每个人都是从非常简单的结构开始,然后逐步演进。不好的第二点是,当你试图在你公司里构建一键代理时,你不必被这种复杂性淹没。你可以慢慢进阶。
所以这非常重要。我们看到这是一个反复出现的模式。

Lenny Rachitsky好的。让我们顺着这个话题继续,因为这是你们建议人们构建AI产品、AI代理、所有AI相关东西的方式中一个非常重要的部分。给我们举个例子来说明你们在说的——从低自主权低控制开始,然后逐步提升的想法。

Kiriti Badam好的。例如,AI代理的一个非常重要和应用广泛的场景是客服支持。想象你是一家公司,有很多客服工单——OpenAI也是完全一样的情况——当我们发布产品时,比如Image或GPT-5这样的成功产品,会有大量的支持量涌进来。你遇到的问题类型是不同的。客户带来的问题类型是不同的。所以这不是简单地把你的帮助中心文章列表丢给AI代理。你需要理解你能构建什么东西。所以最初的第一步可能是这样:你的支持代理是人类支持代理,但你向他们提供建议,告诉他们AI认为正确的做法是什么。

然后你从人类那里获得反馈循环,告诉你这个建议在这个特定情况下是好的,这个建议是不好的。然后你可以回过头来理解——这是缺点,这是盲点——然后我如何修复。一旦你获得了这个,你就可以增加自主权,说我不需要向人类建议了。我会直接向客户展示答案。然后我们实际上可以增加更多复杂性——我原来只根据帮助中心文章回答问题,但现在我可以添加新功能。我实际上可以为客户处理退款。我实际上可以向工程团队提交功能请求。所有这些。如果你从第一天就构建所有这些,复杂性将难以控制。
所以我们建议逐步构建,然后逐步增加自主权。
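上面描述的"逐级放权"可以用一小段示意代码表达:按自主权阶段决定 AI 输出是只给人类客服当建议、进入人工审核队列,还是直接发给客户。其中的阶段划分、动作名称(如 refund)和路由规则都是为说明而假设的,并非任何真实系统的实现。

```python
from enum import IntEnum

class AutonomyStage(IntEnum):
    SUGGEST = 1  # V1:只向人类客服建议答案,人类保留全部控制权
    DRAFT = 2    # V2:生成面向客户的草稿,但仍需人工批准
    ACT = 3      # V3:自主回复,并可执行退款等更高风险的动作

# 假设的高风险动作清单:未到 V3 阶段时强制回退给人工处理
HIGH_RISK_ACTIONS = {"refund", "escalate_legal"}

def route_agent_output(stage, reply, action=None):
    """根据当前自主权阶段决定 AI 输出的去向;返回值可记录进改进飞轮。"""
    if action in HIGH_RISK_ACTIONS and stage < AutonomyStage.ACT:
        # 高风险动作在低自主权阶段一律交给人类客服
        return {"route": "human_agent", "reason": "high_risk_action", "reply": reply}
    if stage == AutonomyStage.SUGGEST:
        return {"route": "human_agent", "reason": "suggestion_only", "reply": reply}
    if stage == AutonomyStage.DRAFT:
        return {"route": "human_review_queue", "reason": "draft_needs_approval", "reply": reply}
    return {"route": "customer", "reason": "autonomous", "reply": reply}
```

每升一级只改 stage,路由函数本身不变,这样人类的采纳、修改、拒绝等反馈可以持续累积到同一个日志里,供后续校准使用。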

Lenny Rachitsky太棒了。你们实际上有一个可视化图表我们会分享,展示这个过程。但我来复述一下你们描述的内容——从高控制、低自主权开始,你们给的例子是支持代理只提供建议,不能做任何事情,用户处于主导地位。然后当这变得有用,你确信它在做正确的工作,你就给它多一点自主权,减少用户的控制。然后如果进展顺利,你就给它更多自主权,用户需要更少的控制来控制它。太棒了。

Aishwarya Naresh Reganti我觉得更高层次的想法是——对于AI系统,一切都关乎行为校准。提前预测你的系统如何表现是极其不可能的。那么你应该怎么做?你要确保你不要破坏你的客户体验或最终用户体验。你保持体验不变,但减少人类的控制权。没有单一正确的做法。你可以决定如何约束这种自主权。约束自主权的另一个例子是预授权场景。保险预授权对AI来说是一个非常成熟的应用场景,因为临床医生花大量时间预授权血液检测、MRI之类的事情。有些案例是更容易落地的部分。例如,MRI和血液检测,因为一旦你了解患者信息,批准就更容易了,AI可以做到。而像侵入性手术之类的事情风险更高。你不想自主处理那些。

所以你可以决定哪些用例应该经过人类在循环中的环节,哪些用例AI可以方便地处理。然后在整个过程中,你也在记录人类的行为,因为你想建立一个飞轮,用来改进你的系统。所以本质上你没有破坏用户体验,没有侵蚀信任,同时记录了人类本会做什么,以便你能持续改进你的系统。

Lenny Rachitsky让我再给你们一些你们推荐的这种演进模式的例子。我花这么多时间在这上面的原因是——这是你们帮助人们构建更成功的AI产品建议中非常关键的部分。这个想法是从高控制、低自主权开始,然后随着你建立了它在做正确工作的信心,再逐步提升。所以你们在文章中分享的几个例子我读一下。例如你要构建一个编码助手,V1只是建议内联补全和样板代码片段。V2是生成更大的代码块,比如测试或重构,供人类审查。然后V3就是自主应用更改并打开PR。另一个例子是营销助手。V1是起草邮件或社交文案,就像这样——这是我做的。

V2是构建多步骤营销活动并运行活动。V3就是直接发布,经过A/B测试,自动优化跨渠道活动。太棒了。是的,再来总结一下我们现在的位置,给大家我们目前分享的建议。首先,理解AI产品是不同的这一点很重要。它们是非确定性的。你们指出了——我忘了复述这一点——输入端和输出端都是如此。用户体验是非确定性的。人们会看到不同的东西、不同的输出、不同的聊天对话,如果它是在为你设计UI的话,也许还有不同的UI。输出端显然也是非确定性的。所以这是一个问题和挑战。然后——

Aishwarya Naresh Reganti如果你想想,这也是AI最美好的部分——我们都比按一堆按钮之类的事情更舒服地交谈。所以使用AI产品的门槛低得多,因为你可以像和人类交流一样自然,但问题也在这里——我们交流的方式太多了,你要确保意图被正确传达,正确的行动被执行,因为你的大部分系统是确定性的,你要达到确定性的结果,但用的是非确定性的技术——这就是混乱开始的地方。

Lenny Rachitsky太棒了。好的。我喜欢这个乐观版本的解释。另一个部分是关于设计产品时自主权与控制权衡的想法。我想象你们看到的是,人们试图直接跳到理想状态,比如V3,然后就陷入了麻烦——既难构建,又不起作用。然后他们就会说,"好吧,这是个失败。我们到底在做什么?"

Kiriti Badam完全正确。我觉得在到达V3之前,你实际上需要在很多事情上建立信心。你很容易感到不堪重负——天哪,我的AI代理以一百种不同的方式做错了——你不可能把这些问题都列出来然后修复。即便你学会了如何处理评估实践之类的东西,如果你在错误的地方开始,你实际上很难从那里纠正。而当你从小处着手,当你从一个人类高度控制、低自主权的极简版本开始,这也迫使你思考我要解决的问题是什么。我们用这个词叫"问题优先"。对我来说,这本应是显而易见的——我确实需要思考问题——但令人难以置信的是,它和人们产生了多大的共鸣。在我们看到的AI所有进展中,一个常见又危险的陷阱就是不断思考解决方案的复杂性,而忘记你试图解决的问题。

所以当你从更小规模的自主权开始时,你开始真正思考我要解决的问题是什么,以及如何将其分解为我以后可以构建的自主权级别。这非常有用,我们和每个交流的人反复强调这一点。

Lenny Rachitsky限制自主权还有很多其他好处,因为AI做太多事情本身也存在危险——它可能搞乱你的数据库,发送你没预料到的邮件。有太多理由说明这是个好的做法。

Aishwarya Naresh Reganti是的。我最近读了UC Berkeley一群人写的论文。基本上是Matei Zaharia和Databricks的人,论文说他们交流过的企业中,大约74%或75%的企业最大的问题是可靠性。这也是为什么他们对将产品部署到最终用户或构建面向客户的产品感到不舒服,因为他们不确定或不愿意这样做,让用户暴露在这些风险中。这也是为什么他们认为很多今天的AI产品都与生产力有关,因为那需要的自主权远低于替代工作流的端到端代理。是的,我很喜欢他们的其他工作,但我认为这和我们至少在创业公司看到的非常一致。

Lenny Rachitsky好的。非常有趣。在这次对话之前会有一期节目,我们会深入探讨这个问题避免的另一个问题——就是提示注入和越狱,以及这对AI产品来说是多么大的风险——本质上这是一个可能无解的问题。我不打算深入那条线,但我们之前进行的那个相当令人担忧的对话会在这次对话之前发布。

Aishwarya Naresh Reganti我认为一旦系统成为主流,这将是一个巨大的问题。我们现在太忙于构建AI产品,不担心安全问题,但随着这个非确定性API再次出现,这将成为一个巨大的问题。所以你基本上被卡住了,因为有很多指令可以被注入到你的提示中,然后事情就会变得非常糟糕。

Lenny Rachitsky好的。让我们花点时间讨论这个,因为对我来说真的很有趣,而且没有人在谈论这些——就是我们进行的那次对话,让AI做它不该做的事真的很容易。人们设置了很多护栏系统,但事实证明这些护栏实际上并不那么有效,你总是可以绕过它们。和你说的一样,随着代理变得更自主、更机器人化,你能让人工智能做不该做的事就变得相当可怕了。

Kiriti Badam我认为这绝对是个问题,但我觉得在当前客户采用AI的范围内,公司实际能从AI获得优势、改善流程或简化现有流程的程度,我觉得仍然处于非常早期。2025年是AI代理和客户尝试采用AI极其繁忙的一年,但我觉得渗透率还没有高到你能真正从中获利的程度。通过在正确的地方设置人类在循环中的节点,我觉得我们实际上可以避免很多事情,更多地专注于简化流程。我是乐观派,你需要尝试和采用AI,而不是只关注可能出错的负面方面。

我强烈认为公司应该采用AI,他们绝对……我们在OpenAI交谈过的每家公司,都从来没有出现过这样的情况——天哪,AI在这种情况下帮不了我。实际情况总是——天哪,有这么一组事情它可以为我优化,然后让我看看我能如何采用它。

Lenny Rachitsky好的。我总是喜欢乐观的视角。我很期待你们听听这个并看看你们怎么想,因为真的很有趣。和你说的,有很多事情值得专注。这是众多需要担心和思考的事情之一。好的,让我们回到正轨。我们分享了一堆实用建议和重要观点。我想问,在做得好、成功构建AI产品的公司和团队中,你们还看到了哪些其他模式和工作方式?人们最常犯的错误是什么?我们可以从公司做得好、成功构建AI产品的其他方面开始。

Aishwarya Naresh Reganti我几乎把它想象成一个成功三角形,有三个维度——它从来不全是技术问题。每个技术问题首先是人的问题。对于我们合作过的公司,有这三个维度——优秀的领导者、良好的文化和技术实力。就领导者而言,我们与很多公司在AI转型、培训、战略等方面合作。我觉得很多公司的领导者建立了10年或15年的直觉,他们因这些直觉而受到高度尊重。但现在有了AI,这些直觉将不得不重新学习,领导者必须脆弱地去做这件事。我以前和现在Rackspace的CEO Gagan一起工作过。他每天早上有一个固定时段,上面写着"处理AI事务 凌晨4点到6点",那段时间他不会有任何会议之类的事情。

那是他了解最新AI播客或信息的时间。他周末还会有 vibe coding 环节。所以我认为领导者必须回到亲自动手的状态。这不是因为他们必须实施这些东西,而是更多是为了重建他们的直觉,因为你必须接受一个事实——你的直觉可能并不正确,而且你可能是房间里最蠢的人,你要向每个人学习。我见过这是成功构建产品的公司一个非常显著的特征,因为你带来了自上而下的方法。这几乎总是不可能自下而上的。你不能让一群工程师去获得领导者的认可,如果他们不信任这项技术,或者对技术有不匹配的期望的话。
我听过很多构建产品的人说,我们的领导者就是不理解AI能在多大程度上解决特定问题,或者他们只是 vibe code 了点什么,然后就认为把它带到生产环境很容易。你真的需要了解AI今天能解决什么范围的问题,这样才能指导公司内的决策。第二是文化本身。同样,我与一些不是以AI为主业的企业合作,他们需要把AI引入流程,只是因为竞争对手在这样做。而且也确实有意义,因为有些用例非常成熟。一路走来,我觉得很多公司有这种FOMO文化——你会失业——之类的东西,人们变得非常害怕。领域专家是构建有效AI产品如此重要的一部分,因为你真的需要咨询他们,了解你的AI表现如何,或者理想行为应该是什么。
但我也和很多公司谈过,他们的领域专家就是不想和你谈,因为他们认为他们的工作正在被取代。所以我的意思是,这又来自于领导者本身。你想建立一种赋能的文化,把AI融入你自己的工作流程,这样你就能在你做的事情上实现10倍的效果,而不是说如果你不采用AI你可能会被取代之类的话。这种赋能的文化总是有帮助的。你想让整个组织团结一致,让AI为你服务,而不是试图保护自己的工作等等。对于AI来说,确实它比以往开启了更多机会。所以你可以让你的员工做比以前更多的事情,提高10倍的生产力。第三是技术部分,也就是我们谈论的。

Aishwarya Naresh Reganti我认为成功的团队对深入理解自己的工作流程非常执着,找到哪些部分适合用AI增强,哪些可能需要在某些环节由人类介入。当你试图自动化工作流程的某一部分时,从来不存在你用一个AI代理就能解决你所有问题的情况。通常是,你可能有一个机器学习模型来做部分工作,有确定性代码做另一部分工作。所以你真的需要非常执着地理解那个工作流程,这样你才能为问题选择正确的工具,而不是对技术本身执着。另一个我看到的模式是,团队真正理解与这个非确定性API——也就是你的LLM——协作的意义。这意味着他们也理解AI开发生命周期看起来非常不同,他们迭代得非常快——也就是能否快速构建一个东西,在不破坏客户体验的同时,给我足够的数据来估计行为。

所以他们很快建立了那个飞轮。截至今天,问题不在于成为竞争对手中第一个拥有代理的公司。而在于你是否建立了正确的飞轮,以便能够随着时间推移改进。当有人来找我说,"我们有这个一键代理,它将被部署在你的系统中。"然后两三天后,它就会开始给你带来显著收益。我几乎会怀疑,因为这不可能。这不是因为模型不行,而是因为企业数据和基础设施非常混乱,你需要一点时间……即使是代理也需要一点时间来理解这些系统如何运作。每个地方的分类法都非常混乱。人们倾向于写出一个又一个重复的函数——"获取客户数据"、又一个"获取客户数据"——诸如此类。所有这些函数都存在,都在被调用,而且基本上有很多技术债务你需要处理。
所以大多数时候,如果你对问题本身非常执着,而且你非常了解你的工作流程,你就会知道如何随着时间改进你的代理,而不是简单地上线一个代理然后假设它从第一天就能工作。我甚至可以说,如果有人向你销售一键代理,那是纯粹的营销。你不应该买这个。我宁愿选择一家说"我们将为你们构建这个管道"的公司,这个管道会随着时间学习,建立飞轮来改进,而不是开箱即用的东西。要替代任何关键工作流或构建能给你带来显著ROI的东西,即使你有最好的数据层和基础设施层,也很容易需要四到六个月的工作。

Lenny Rachitsky太棒了。这里有很多内容与我在这档播客上的其他对话产生了深刻共鸣。首先,一家公司要成功地看到AI带来的很大影响,创始人CEO必须深入其中。我请了Dan Shipper来播客,他和很多公司合作帮助他们采用AI。他说成功的首要预测指标是:CEO是否每天多次与ChatGPT、Claude或其他工具聊天。我喜欢你举的这个Rackspace CEO的例子——他每天早上都会跟进AI新闻。我想象他是在和聊天机器人聊天而不是读新闻。

Aishwarya Naresh Reganti以你今天拥有的信息类型,你完全可以……我是说,你也想选择正确的渠道,因为每个人都有看法。所以你想信任谁的观点?我觉得拥有一组高质量的信息来源你一直在听真的很有意义。所以他只有两三个他一直看的来源。然后他带着一堆问题回来,和一堆AI专家讨论看他们怎么想。我曾是那个小组的一员,所以我大概知道——

Lenny Rachitsky我喜欢这样。

Aishwarya Naresh Reganti……他提出的那些问题。

Lenny Rachitsky这很酷。

Aishwarya Naresh Reganti很酷。我就问,"你为什么做这么多?"然后他说,"这会影响我们做出的很多决策。"

Lenny Rachitsky好的。让我谈谈另一个话题,这个话题在播客上非常……在播客上是个热门话题。在Twitter上也火了一阵——evals。很多人执着于evals,认为它们是AI很多问题的解决方案。很多人都认为它们被高估了,你不需要evals,你只要凭感觉就行了。你们怎么看evals?它们在解决你们谈论的很多问题上能帮人们走多远?

Kiriti Badam就社区里发生的事情而言,我觉得存在一个虚假二分法——要么evals能解决一切,要么线上监控或生产监控能解决一切。我觉得没有理由相信这两个极端中的任何一个,我不会把我的应用完全押在这上面或那上面来解决事情。所以如果你退一步想,evals是什么?Evals基本上是你的可信赖的产品思维,或者你关于产品的知识,进入你要构建的这组数据集——这就是对我来说重要的东西,这是我的代理不应该犯的错误类型,让我构建一个数据集列表,这样我就能在这些上面表现好。就生产监控而言,你在那里做的是部署你的应用,然后你有一些关键指标实际上在向你反馈客户如何使用你的产品。

你可以部署任何代理,如果客户对你的互动给了赞,你最好想知道这一点。这就是生产监控要做的。这个生产监控已经存在于产品中很长时间了,只是现在有了AI代理,你需要监控更多的粒度。不只是客户总是给你明确的反馈,还有很多你可以获得的隐含反馈。例如,在ChatGPT中,如果你喜欢答案,你可以给赞。或者如果你不喜欢答案,有时候客户不会点踩,但他们会重新生成答案。所以这是明确的信号,表明最初的答案不满足客户期望。所以这些是你需要思考的隐含信号类型。
而生产监控的范畴一直在扩大。现在让我们回到最初的话题——好的,evals还是生产监控?这重要吗?所以我觉得,我们再次回到这个"问题优先"的方法——你要构建什么?你要为客户构建一个可靠的应用,它不会做坏事,它总是做正确的事。或者如果它做了错误的事,你基本上会很快收到警报。所以我把它分成两部分。第一,没有人会在不实际测试的情况下部署应用。这个测试可能是凭感觉的 vibe check,或者这个测试可能是,"好吧,我有这10个问题,无论我做什么改变都不应该出错,让我构建这个,我们称之为评估数据集。"现在,假设你构建了这个,部署了这个,然后你发现,"好吧,现在我需要了解它是否在做正确的事。"
所以如果你是一个高吞吐量或高交易量的客户,你实际上不可能坐下来评估所有追踪。你需要一些指标来了解我应该看哪些东西。这就是生产监控介入的地方——你无法预测你的代理可能出错的基础,但所有这些其他隐含信号和显式信号会向你反馈你需要查看哪些追踪。这就是生产监控帮助的地方。一旦你得到这种追踪,你需要检查在这些不同类型的互动中你看到了哪些失败模式。是否有我真的在乎的不应该发生的事情?如果这种失败模式正在发生,那么我需要考虑为它构建一个评估数据集。
好吧,假设我为我的代理试图提供退款但我明确配置它不能这样做构建了一个评估数据集。我构建了这个评估数据集,然后我在工具或提示或任何东西上做了修改,然后我部署了第二版产品。现在不能保证这是你唯一会看到的问题。你仍然需要生产监控来实际捕获你可能遇到的不同类型的问题。所以我觉得evals很重要,生产监控很重要,但认为只有一个能解决问题——这个观点我认为完全站不住脚。
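Kiriti 提到的显式信号(点赞/点踩)与隐式信号(重新生成答案、放弃会话)可以合成一个简单的追踪筛选器。下面是一个极简示意:字段名(thumb、regenerated、abandoned)和权重都是假设的日志结构,仅用来说明"用监控信号决定先人工查看哪些追踪"这一思路,并非任何真实监控产品的接口。

```python
def traces_to_review(interactions, max_reviews=50):
    """综合显式与隐式反馈信号打分,返回最值得人工查看的追踪ID列表。"""
    def badness(t):
        score = 0
        if t.get("thumb") == "down":
            score += 3  # 显式负反馈:权重最高
        if t.get("regenerated"):
            score += 2  # 用户重新生成答案:隐式的不满意信号
        if t.get("abandoned"):
            score += 1  # 用户中途放弃会话
        return score

    # 只保留带负面信号的追踪,按严重程度排序,截取前 max_reviews 条
    flagged = [t for t in interactions if badness(t) > 0]
    flagged.sort(key=badness, reverse=True)
    return [t["trace_id"] for t in flagged[:max_reviews]]
```

在高吞吐场景下,这类筛选器就是正文里说的"你不可能看完所有追踪,需要指标告诉你先看哪些"的最小化版本。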

Lenny Rachitsky好的。一个非常合理的答案。这里的要点不是"两者都做"这么简单。而是不同的事情需要捕获不同的东西,一种方法无法捕获你需要关注的所有事情。

Aishwarya Naresh Reganti正是。

Lenny Rachitsky太棒了。

Aishwarya Naresh Reganti我想退后两步,谈谈"evals"这个词在2025年下半年承载了多少重量。因为你去见一个数据标注公司,他们告诉你我们的专家在写evals,然后你有所有这些人在说PM应该写evals,它们是新的PRD。然后你又有人说evals基本上就是一切,也就是你应该构建的用来改进产品的反馈循环。现在作为一个初学者退一步想,evals是什么?为什么每个人都在说evals?这些实际上是过程中不同的部分,没有人是错的——是的,这些都是evals——但当一个数据标注公司告诉你我们的专家在写evals时,他们实际上指的是错误分析,或者专家只是在记录什么是正确的。

律师和医生写evals,那不意味着他们在构建LLM评判器或整个反馈循环。当你说PM应该写evals,这不意味着他们必须写一个对生产足够好的LLM评判器。我认为也有一些非常规范的方法,加上Kiriti提到的那点,也就是你无法提前预测你需要构建LLM评判器,还是需要使用生产监控的隐含信号等等。Martin Fowler在某个时候提出了一个叫"语义扩散"的术语,那大概是2000年代的事情,意思是有人提出了一个术语,每个人开始用自己的定义扭曲它,然后你基本上失去了它真正的定义。这就是现在evals或代理或AI中任何词汇正在发生的事情,每个人对它都有不同的理解。
但如果你让一群实践者坐在一起,问他们,"为AI产品构建可操作的反馈循环重要吗?"我想他们都会同意。现在,你怎么做真的取决于你的应用本身。当你进入复杂用例时,构建LLM评判器是非常困难的,因为你看到很多新出现的模式。如果你构建了一个用来测试冗长性或类似东西的评判器,结果你发现你正在看到评判器无法捕获的新模式,然后你最终就构建了太多evals。在那点上,看用户信号、修复、检查你是否退步然后继续而不是构建这些评判器就变得合理了。所以一切取决于上下文。我觉得每个ML实践者都会告诉你的一件事是——这真的取决于上下文。不要执着于会改变的处方。

Lenny Rachitsky这是一个非常重要的观点,特别是evals对不同的人来说现在意味着很多不同的事情。它只是一个涵盖很多事情的术语。当你看到数据标注公司给你的东西和PM们写的东西时,谈论evals就变得复杂了。对吧?而且还有基准测试。人们把基准测试也叫做evals。就像——

Aishwarya Naresh Reganti我最近和一个客户交谈,他告诉我,"我们做evals。"我说,"好吧,能给我看看你的数据集吗?"他说,"不,我们只是查了LM Arena和Artificial Analysis。这些是独立基准,我们知道这个模型对我们的用例是正确的。"我说,"你没有做evals。那不是evals。那些是模型evals。"

Lenny Rachitsky但这有道理。这个词在那种语境下可以用。我理解为什么人们这么想,但确实,现在它让事情更混乱了。

Aishwarya Naresh Reganti是的。

Lenny Rachitsky这里还有一条追问是我一直在想的——这件事引发大讨论的原因是Claude Code。Claude Code的负责人Boris说,"不,我们在Claude Code不做evals。全凭感觉。"Kiriti,你在Codex、在Codex团队,关于你们如何做evals,能分享什么?

Kiriti Badam在Codex,我们有这种平衡的方法——你需要有evals,你也确实需要倾听你的客户。我认为Alex最近来过你的播客,他一直在谈论他多么专注于构建正确的产品。而很大一部分基本上就是倾听你的客户。编码代理与其他领域的代理相比非常独特,因为这些实际上是为定制化和为工程师构建的。所以编码代理不是一个用来解决前五或前六个工作流的产品。它旨在以多种不同方式可定制。这意味着你的产品将被用于不同的集成、不同的工具和不同种类的东西。所以为你产品将被使用的所有类型的交互构建评估数据集真的非常困难。

话虽如此,你也必须理解,如果我要做一次改变,它至少不会损害对产品真正重要的东西。所以我们做评估来做到这一点,但同时我们极度注意理解客户如何使用它。例如,我们最近构建了这个代码审查产品,它获得了极大的关注。我觉得OpenAI以及我们外部客户的很多bug都被它捕获了。现在假设我要对代码审查做一个模型改变,或者用不同的RL机制训练它,现在如果我要部署它,我绝对想做A/B测试,看看它是否真的找到了正确的错误,以及用户如何反应。有时候如果用户确实因为你的错误代码审查而烦恼,他们会干脆关掉这个产品。
所以这些是你要关注的信号,确保你的新更改在做正确的事情。对我们来说,预先想出这些场景并为之开发评估数据集是极其困难的。所以我觉得两者都需要。有很多直觉,也有很多客户反馈,我们在社交媒体上非常活跃,了解是否有任何人在遇到某些类型的问题,然后快速修复。所以我觉得这是……我怎么说呢?这就像一个你需要做的领域。

Lenny Rachitsky这太有道理了。好。我听到的是,Codex做evals,但只有evals不够。

Kiriti Badam是的。

Lenny Rachitsky同时也要关注客户行为和反馈。有时候也有点凭感觉——这个东西用起来感觉好吗?当我用它的时候,生成的代码让我兴奋,我觉得它很棒。

Kiriti Badam我不认为有人能拿出这样一套具体的evals,然后说我就靠这个了,然后我不需要想其他的了——这不是可行的方式。每次你要推出新模型,我们作为一个团队聚在一起测试不同的事情。每个人专注于不同的事情。我们有这份难题清单,我们把它们交给模型,看看它们进展如何。所以这就像每个工程师有自定义的evals,你可以说,只是理解你的产品在新模型中的表现。

Lenny Rachitsky如果你是一个创始人,创业最难的部分不是有点子,而是扩大业务规模而不被后台工作埋没。这就是Brex的用武之地。Brex是面向创始人的智能财务平台。使用Brex,你可以获得高限额的企业卡、便捷的银行业务,高收益的国库券,外加一支AI代理团队帮你处理手动财务任务。他们会做所有你不想做的事情——记录你的开支、仔细检查交易中的浪费、按照你的规则运行报告。使用Brex的AI代理,你可以更快地行动,同时保持完全控制。在美国,每三家创业公司就有一家使用Brex。你也可以——请访问brex.com。

我们已经聊了将近一个小时了,我们还没有聊到你们两个开发的、你们在课程里教的、基本上把我们今天讨论的所有东西整合成构建AI产品的逐步方法的、非常强大的软件开发工作流程。你们称之为持续校准、持续开发框架。让我们展示一个可视化图表给人们看看我们在说什么,然后带我们了解一下这是什么,它如何运作,团队如何转变他们构建AI产品的方式到这个方法上来,帮助他们避免很多痛苦和失败。

Aishwarya Naresh Reganti在我们解释这个生命周期之前,先讲一个简短的故事,讲讲Kiriti和我为什么想到这个——因为我们不断在和很多公司交流,他们因为竞争对手都在构建代理而感到压力。我们应该构建完全自主的代理。我确实和一些客户合作过,我们构建了这些端到端代理。结果发现,因为你从一个你不知道用户会如何与你的系统交互、AI可能会做出什么样的响应或行动的地方开始,当你有这个需要走四五步、做大量决策的大型工作流时,修复问题真的很难。你最终要调试这么多东西,然后热修复——有一次我们为客服用例构建产品的时候,就是我们在新闻通讯里给出的例子。

我们不得不关闭那个产品,因为我们在做太多热修复,而且没有办法统计所有正在出现的问题。网上也有相当多的新闻。最近我想Air Canada有这件事——他们的一个代理预测或幻觉出了一个退款政策,这不是他们原始playbook的一部分,但出于法律原因他们不得不兑现。已经有很多非常可怕的事故。所以这就是这个想法的来源。如何构建才能不失去客户信任,如何确保你的代理或AI系统不会做出对公司本身非常危险的决策?同时构建一个飞轮,这样你就能随着时间改进你的产品。这就是我们提出持续校准、持续开发这个想法的地方。
这个想法非常简单——我们有循环的右侧,也就是持续开发,你在这里界定能力范围并整理数据,基本上得到一组关于你预期输入和预期输出应该是什么样的数据集。在你开始构建任何AI产品之前,这是一个非常好的练习,因为很多时候你发现团队中的很多人对产品应该如何表现根本没有达成一致。这就是你的PM和领域专家可以输入大量信息的地方。所以你有这个你知道你的AI产品应该表现出色的数据集。它不是全面的,但它让你能够开始。然后你设置应用程序,然后设计正确的评估指标。我故意使用"评估指标"这个术语,虽然我们说evals,因为我只是想非常具体地说明它是什么——因为评估是一个过程,评估指标是你在这个过程中想要关注的维度。
然后你去部署,运行你的评估指标。第二部分是持续校准,这就是你理解你最初没有预料到的行为的部分,对吧?因为当你开始开发过程时,你有这个你正在优化的数据集,但更多时候,你会发现那个数据集不够全面,因为用户开始以你无法预测的方式与你的系统交互。这就是你需要做校准的部分。我已经部署了我的系统。现在我看到有些模式我确实没有预料到,你的评估指标应该给你一些关于那些模式的洞察,但有时候你发现那些指标也不够,你可能有你没有想过的新错误模式。这就是你分析行为、发现错误模式的地方。
你对看到的问题应用修复,同时为这些新出现的模式设计新的评估指标。但这不意味着你应该总是设计评估指标。有些错误你可以直接修复,不需要再回来处理,因为它们是非常个别的错误。例如,有一个工具调用错误,只是因为你的工具定义得不好之类。你可以修复它然后继续。这基本上就是AI产品生命周期应该的样子。但我们特别还提到的是,当你在这些迭代中时,在开始时尝试考虑低自主权迭代和高控制权迭代。这意味着约束你的AI系统可以做出的决策数量,并确保有人类在循环中,然后随着时间增加自主权,因为你正在构建一个行为飞轮,你正在理解什么样的用例正在出现,或者你的用户如何使用系统。
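这个"持续开发 + 持续校准"的循环可以抽象成一个很小的骨架:先在整理好的数据集上跑评估指标,再用生产追踪找出数据集没覆盖的新输入模式。app 和 metric 的形态都是假设的(app 是"输入→输出"的可调用对象,metric 是"(输入, 输出, 期望)→是否通过"的函数),这里仅作示意,不代表任何具体框架。

```python
def ccd_cycle(app, eval_dataset, metrics, production_traces):
    """持续校准/持续开发循环的一次迭代:返回评估失败项和新出现的输入模式。"""
    # 持续开发:在整理好的数据集上运行评估指标,收集失败案例
    failures = []
    for case in eval_dataset:
        output = app(case["input"])
        for name, metric in metrics.items():
            if not metric(case["input"], output, case.get("expected")):
                failures.append({"case": case, "metric": name, "output": output})

    # 持续校准:从生产追踪中找出评估集没有覆盖的新模式
    known_inputs = {c["input"] for c in eval_dataset}
    novel = [t for t in production_traces if t["input"] not in known_inputs]
    return {"failures": failures, "novel_patterns": novel}
```

失败案例用于修复和回归,新模式则回流到数据集或催生新的评估指标——这正是正文描述的循环左右两侧。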
我们在新闻通讯里给出的一个例子是客服。这是一个很好的图,显示了如何把自主权和控制作为两个维度来思考。你的每个版本都在不断提升你的AI系统的自主权或决策能力,并在过程中降低控制。我们给出的一个例子是客服代理,你可以把它分解为三个版本。第一个版本只是路由——你的代理能够分类并将特定工单路由到正确的部门吗?有时候当你读到这个,你可能会想,做路由就这么难吗?为什么代理不能轻松做到?当你进入企业时,路由本身可能是一个超级复杂的问题。任何你能想到的流行零售公司都有层次化的分类体系。
大多数时候分类体系非常混乱。我曾经处理过一些用例,你可能有这样的分类体系——它说某种层次,然后是鞋子,然后是女鞋和男鞋都在同一层,而正确的应该是鞋子是父类,然后女鞋和男鞋是子类。然后你说,好吧,行。我可以合并这个。然后你继续看,你发现还有一个区域在说女性用的和男性用的鞋子,但它没有被聚合。出于某种原因它没有被修复。如果一个代理看到这种分类体系,它应该怎么做?它应该路由到哪里?很多时候我们直到你实际开始构建并理解某些东西时才意识到这些问题。
当这些真实的人类代理看到这些问题时,他们知道下一步要检查什么。也许他们意识到那个在鞋子下面说女性和男性的节点上次更新是在2019年,这意味着它只是一个死节点,躺在那里不被使用。所以他们大概知道,好吧,我们应该看另一个节点,诸如此类。我不是说代理不能理解这些,或者模型不够强大来理解这些,但企业内部有很多没有记录在任何地方的非常奇怪的规则。你想确保代理有所有这些上下文,而不是简单地把问题丢给它。
是的。回到我们说的版本,路由是一个你有非常高控制的版本,因为即使你的代理路由到错误的部门,人类可以接管并撤销那些操作。一路走来,你还发现你可能在处理大量需要修复的数据问题,并确保你的数据层足够好,让代理能够运作。我们做的是Copilot,也就是既然你经过几次迭代后发现路由工作正常,你已经修复了所有的数据问题,你可以进入下一步——我的代理能够基于我们为客服代理制定的一些标准操作程序提供建议吗?它可以生成一份人类可以修改的草稿。当你这样做的时候,你也在记录人类行为——也就是说,人类使用了多少这个草稿,什么被省略了。所以当你记录用户所做的所有事情时,你实际上免费获得了错误分析,这些你可以构建回到你的飞轮中。
然后我们说,在这之后,一旦你发现那些草稿看起来不错,而且大多数时候人类可能没有做太多修改,他们基本上按原样使用这些草稿——那就是你想进入端到端解决方案助手的时候,它可以生成可以解决工单的解决方案。这些就是自主权的阶段,你从低自主权开始,然后逐步提升到高自主权。我们还有这个非常好的表格,是我们整理的——在每个版本中你做什么,你学到什么可以推动你进入下一步,以及你获得什么信息可以输入到循环中,对吧?当你只做路由时,你有更好的质量路由数据,你还知道需要构建什么样的提示来改进路由系统。

Aishwarya Naresh Reganti本质上,你正在搞清楚你的上下文工程结构,并构建你要的飞轮。当我浏览这个的时候,我还想非常明确地说两件事。第一,当你用CCCD思维构建时,这不意味着你已经一次性解决了问题。你可能已经走过了V3,你看到了一种你以前从未想象过的新数据分布,但这只是一种降低风险的方式——也就是说,在达到完全自主之前,你有足够的信息了解用户如何与你的系统交互。第二件事是,你也在构建这个隐式日志系统。很多人来告诉我们,"哦,等等,有evals。你为什么还需要这样的东西?"问题在于只构建一组评估指标然后在生产中使用它们——评估指标只能捕获你已经知道的错误,但可能有很多新出现的模式,只有在你将东西投入生产后才能理解。

所以对于那些新出现的模式,你正在创建一种低风险的框架,这样你就能理解用户行为,而不会陷入大量错误同时你试图一次修复所有问题的境地。这不是唯一的方法。有很多不同的方式。你想决定如何约束你的自主权。它可以基于代理采取的行动数量,这就是我们在这个例子中做的。它可以基于主题。某些领域在让系统对某些决策完全自主方面风险很高,但对于其他一些主题,让它们完全自主是可以的,这取决于问题的复杂性。这就是你真的想让你的产品经理、工程师和领域专家对齐的地方——如何构建这个系统并持续改进。
这个想法就是行为校准,而不是在校准时失去用户信任,我想。

Lenny Rachitsky我们会链接到这个帖子,如果他们想深入了解的话。你们基本上是一步一步地走过所有这些步骤,很多例子。而这个想法是,正如你所说的——你描述的所有这些,都是为了让过程持续、可迭代,并沿着刚才提到的那条路径,朝更高自主权、更少控制演进。甚至"持续校准、持续开发"这个命名本身,传达的就是这是一种迭代过程。需要澄清的是,这个命名有点像致敬CI/CD,也就是持续集成、持续部署。而这里的想法是,这是AI版本的CI/CD——不是集成到单元测试和持续部署,而是运行evals、查看结果、迭代你正在观察的指标、找出哪里出问题了并迭代修复。太棒了。好的。

所以我们会指引人们找到这篇帖子如果他们想深入了解的话。那是一个很棒的整体概述。在我们进入这个框架的不同话题之前,有什么你觉得对人们来说需要知道的重要事情吗?

Aishwarya Naresh Reganti我认为我们最常被问到的问题之一是:我怎么知道我需要进入下一阶段,或者这已经校准足够了?没有真正可以遵循的规则书,但这都是关于最小化意外——也就是说,假设你每隔一两天校准一次,你发现你没有看到新的数据分布模式,你的用户一直非常一致地按照他们与系统的交互方式行事。那么你获得的信息量非常低,那就是你知道你可以进入下一阶段的时候了。那时候就是关于——你知道你准备好了,你没有收到任何新信息。但也真的有助于理解,有时候有些事件可能完全破坏你系统的校准。一个例子是GPT-4o不再存在了,或者它将在API中被废弃。

所以大多数使用4o的公司应该切换到5,而5有非常不同的特性。所以你的校准又出问题了。你想回去重新做这个过程。有时候用户也开始随着时间以不同的方式与系统交互,或者用户行为在演变。即使是消费品,你和ChatGPT交谈的方式也和两年前不一样了,只是因为你知道能力已经提升了很多。而且人们当这些系统能解决一个任务时会很兴奋,他们想把它试在别的任务上。我们为承销商构建了一个系统。承销是一项痛苦的任务。协议就像贷款申请,有30或40页,这家银行的想法是构建一个可以帮助承销商挑选政策和信息的系统,这样他们就可以批准贷款。
在整整三四个月里,每个人都对系统印象深刻。承销商实际上报告了他们在时间等方面的收益。前三个月,我们意识到他们对产品非常兴奋,他们开始问一些我们从未预料到的非常深入的问题。他们只是把整个申请文档扔给系统,然后说,"对于这样一个案例,以前的承销商做了什么?"对于一个用户来说,这似乎是一个对你正在做的事情的非常自然的延伸,但背后的构建需要显著改变。现在,你需要理解"对于这样一个案例"在贷款本身的背景下是什么意思?它指的是特定收入范围的人,还是特定地理位置的人,诸如此类?
然后你需要挑选历史文档,分析那些文档,然后告诉他们,"好吧,这是它看起来的样子"——而不是仅仅说有一个政策X、Y和Z,你想查找那个政策。所以对于最终用户来说看起来非常自然的东西,作为产品构建者来说可能非常难构建,你看到用户行为也随着时间演变,那就是你知道你想回去重新校准的时候。
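Aishwarya 说的"最小化意外"可以粗糙地量化:统计近期输入中落在已知模式之外的比例,持续走低时再考虑进入下一自主权阶段。下面的子串匹配和 5% 阈值都是为演示而假设的启发式;真实系统里通常会用意图分类或向量相似度来判断"新模式",而不是简单的文本匹配。

```python
def ready_to_advance(recent_inputs, known_patterns, novelty_threshold=0.05):
    """统计近期输入中不匹配任何已知模式的比例,低于阈值时认为校准已稳定。"""
    if not recent_inputs:
        return False  # 没有数据时不应升级自主权
    novel = sum(
        1 for text in recent_inputs
        if not any(p in text for p in known_patterns)
    )
    return novel / len(recent_inputs) < novelty_threshold
```

反过来,当这个比例突然升高(例如模型更换或用户行为演变),就是正文所说的"回去重新校准"的信号。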

Lenny Rachitsky你认为目前AI领域里什么被过度炒作了?更重要的是,你觉得什么被低估了?

Kiriti Badam正如我所说,我对AI中正在发生的不同事情超级乐观。所以我不会说过度炒作,但我觉得对多代理这个概念有些被误解了。人们有这样一个概念:"我有一个非常复杂的问题。现在我要把它分解——你是这个代理,负责这个;你是那个代理,负责那个。"然后如果我以某种方式连接所有这些代理,他们认为这是代理的乌托邦——从来没有非常成功的多代理系统是这样被构建出来的。毫无疑问这一点。但我觉得很多问题在于你如何限制系统偏离轨道的方式。例如,如果你正在构建一个监督代理,而有子代理实际为这个监督代理工作,这是一个非常成功的模式。

但带着这个"我将根据功能分解责任,并以某种gossip协议让所有这些一起工作"的想法——这是被极度误解的,你不能那样做。我不认为当前的构建方式和当前模型能力已经到了能够构建那种应用的程度。我觉得这更多是被误解,而不是被高估。被低估的,我觉得可能难以置信,但我仍然觉得编程代理被低估了——我的意思是,你可以上Twitter,你可以上Reddit,你看到很多关于编程代理的讨论,但和任何随机公司的一个工程师交谈,特别是湾区以外的,你能看到这些编程代理能创造的影响量和渗透率非常低。所以我觉得2025年和2026年将是优化所有这些流程令人难以置信的一年。
我觉得这将用AI创造大量价值。

Lenny Rachitsky这第一个观点真的很有意思。所以这个想法是,你更可能成功地构建和使用一个能够自己进行子代理工作分流的代理,而不是一堆——比如说——Codex代理。你做这个任务,你做那个任务?

Kiriti Badam你可以有这些代理来做这些事情,你作为一个人类可以编排它们,或者你可以有一个更大的代理来编排所有这些事情——但让代理以点对点协议的方式相互通信,特别是这样做在客服用例中——要控制哪个代理在回复你的客户是极其困难的,因为你需要把护栏放在所有地方。

Lenny Rachitsky是的。好的。很棒的选择。好的。Ash,你有什么要说的?

Aishwarya Naresh Reganti我可以说 evals 吗?我会被取消吗?

Lenny Rachitsky在哪个类别?哪个桶里?

Aishwarya Naresh Reganti被高估了。

Lenny Rachitsky被高估。好的,说吧。我们不会让你被取消的。

Aishwarya Naresh Reganti开玩笑的。我觉得 evals 被误解了。它们很重要,伙计们。我不是说不重要,但我觉得我只是会不断跨工具跳来跳去,然后如果有新工具就学——这是被高估的。我还是比较老派的,我觉得你真的需要痴迷于你要解决的业务问题。AI 只是一个工具。我试着这样想。当然,你需要学习最新最棒的,但不要过于痴迷于快速构建。现在构建非常便宜。设计更贵,真正思考你的产品,你要构建什么。它真的能解决一个痛点吗?这个在今天更有价值。而且在不久的将来只会变得更加真实。所以真正痴迷于你的问题和设计是被低估的,而死记硬背的构建是被高估的,我想。

Lenny Rachitsky太棒了。好的。类似的问题。从产品的角度来看,你觉得未来一年 AI 会是什么样子?给我们一个愿景,告诉我们你预计到2026年底事情会发展到哪里。

Kiriti Badam是的,我觉得后台代理——主动代理——有很大的前景。它们基本上会更好地理解你的工作流。如果你想想今天 AI 在哪里没能创造价值,主要是因为不理解上下文。而它不理解上下文的原因是它没有接入实际工作发生的地方。当你做得更多,你可以给代理更多上下文,然后它开始看到你周围的世界,理解你在优化的指标集是什么,或者你在尝试做的活动类型。从那里到一个实际从中获得更多然后让代理回调你是一个非常容易的扩展。我们已经在 ChatGPT pulse 中这样做了,它给你每天更新你可能关心的东西。

能够有那个来激活你的大脑真的很好——"哦,这是我没想到的事情。也许这很好。"现在当你把这个扩展到更复杂的任务,比如一个编码代理,它会说,"好吧,我已经修复了你的五个工单,这是补丁。就在你一天开始时审查它们吧。"所以我觉得这将极其有用。我把它视为2026年产品将构建的一个强劲方向。

Lenny Rachitsky这太酷了。所以基本上代理在预判你要做什么,走在你前面,我为你解决了这些问题——或者我觉得这会让你的网站崩溃。也许你应该修复这个东西——我看到这里有个峰值,我们重构一下数据库吧。太厉害了。什么样的世界啊。好的。Ash,你有什么要说的?

Aishwarya Naresh Reganti我对2026年的多模态体验全情投入。我觉得我们在2025年已经取得了相当不错的进展,而且不仅仅是在生成方面,也包括理解。迄今为止,我觉得LLM一直是我们最常用的模块,但作为人类,我们是多模态生物,我想。语言可能是我们进化中最后的形式之一。当我们三个人在交谈时,我觉得我们不断获得这么多信号。我会说,"哦,Lenny 在点头,所以可能我应该往这个方向走;或者 Lenny 无聊了,所以我停止说话。"所以在你思维链背后有一套思维链,你不断用语言改变它——那个表达维度没有被很好探索。所以如果我们能构建更好的多模态体验,那会让我们更接近人类对话的丰富性。而且你也知道——鉴于这些模型的种类——也有很多无聊的任务,非常适合 AI。

如果多模态理解变得更好,有那么多手写文档和非常混乱的 PDF——即使是今天最好的模型也无法处理。如果可能的话,有那么多数据我们可以利用。

Lenny Rachitsky太棒了。我刚看到 Google DeepMind(或者他们现在对整个组织的叫法)的 Demis 在说这个——他认为那将是他们前进方向的重要部分,结合图像模型的工作、LLM,还有他们的世界模型东西——Genie,我想是这个名字。是的。所以那将是一个疯狂的、疯狂的时代。好的。最后一个问题。如果有人想提高构建 AI 产品的能力,你觉得他们应该专注培养哪一项或两项技能?

Aishwarya Naresh Reganti我觉得我们确实覆盖了很多 AI 产品的最佳实践,也就是从小处着手,试着让你的迭代良好运行,构建飞轮,诸如此类。但再说一次,如果你从一万英尺的高度看今天在构建产品的任何人,就像我说的,实现将在未来几年变得极其便宜。所以真的要打磨你的设计、你的判断力、你的品味,等等。一般来说,如果你在构建职业生涯,以前你职业生涯的前几年总是专注于执行、机械性的东西等等。现在我们有 AI 可以帮助你快速上手,在那之后……我是说,几年之后,我觉得每个人的工作都变成关于你的品味、你的判断力,以及什么是独特属于你的东西。

我觉得要打磨那部分,试着找出你能带来什么样的视角。这不意味着你必须非常年长、有多年经验。我们最近雇了一个人,我们使用这个非常流行的应用来跟踪我们的任务,我们已经用了好几年了,我们支付了很高的订阅费。这个人只是在会议上带来了他自己的 vibe coded 应用。他让我们都迁移到那个上面,他说,"好吧,让我们开始用这个吧。"我觉得这种能动性和这种所有权——真正重新思考体验——是将人们区分开来的东西。我不是无视 vibe coded 应用有高维护成本的事实。也许随着我们作为公司扩大规模,我们必须替换它,或者必须考虑更好的方法。
但考虑到我们现在是一家小公司……我真的震惊了,因为我从未想过这个。如果你一直以某种方式工作,你会把构建和某种成本联系起来。我觉得在那个时代成长的人在他们脑海中与构建相关的成本要低得多。他们就是不在乎构建一些东西然后继续前进。他们也非常热衷于尝试新工具。这也可能是为什么 AI 产品有这种留存问题——因为每个人对尝试这些新工具都如此兴奋。但本质上——有这种能动性和所有权——我觉得这也将是繁忙工作时代的终结。你不能坐在角落里做一些对公司没有推动的事情。你真的需要思考端到端的工作流,你如何能带来更大的影响。
我觉得所有这些都将超级重要。

Lenny Rachitsky这让我想起,我刚请了 Jason Lemkin 来播客。他在销售、市场推广、运营 GTM 方面非常聪明,他用代理替换了整个销售团队。他有10个销售人员,然后变成了一两个人加20个代理。其中一个代理只是追踪每个人的 Salesforce 更新,并基于他们的电话自动为他们更新。其中一个销售人员说,"好吧,我不干了。"结果发现他其实什么都没做。他只是整天闲着,然后他说,"好吧,这东西会抓到我。我得离开这里。"所以你说的——整天闲着磨蹭会更难——我觉得真的没错。

Kiriti Badam是的。我想补充一下,我觉得坚持也是极其宝贵的,特别是对于任何想要构建东西的人来说——信息比过去十年更是在你指尖。你可以一夜之间学会任何东西,变成那种钢铁侠式的方法。所以我觉得有这种坚持,穿过学习这个,实施这个,理解什么有效什么无效的痛苦。当你经历这个痛苦——开发多种方法然后解决问题——我觉得这将成为个人真正的护城河。我喜欢称之为"痛苦是新的护城河"——但我觉得这确实是极其有用的,特别是在构建这些 AI 产品中。

Lenny Rachitsky多说说这个。我喜欢这个概念。痛苦是新的护城河。还有更多内涵吗?

Kiriti Badam是的,我觉得作为一家公司——我是说,在任何新领域构建的成功公司——他们的成功不是因为他们最先进入市场,或者他们有这个更受客户喜欢的花哨功能。他们经历了痛苦,理解了一系列不可协商的事情,并与他们可以用什么功能或什么模型能力来解决那个问题做权衡。这不是一个直接的过程。没有教科书教你这样做,也没有直接的方法或公认可靠的路径来达到这里。所以我说的很多这种痛苦就是经历这种迭代——"好吧,试试这个,如果不行,试试这个。"而你在整个组织或你自己的亲身经历中建立的这种知识——那种痛苦转化为公司的护城河。这可以是 evals 的产物,或者你构建的东西。我觉得这将成为改变游戏规则的东西。

Lenny Rachitsky太厉害了。简直就是把煤变成钻石。

Kiriti Badam是的。

Lenny Rachitsky好的。我觉得我们做得很好,帮助人们避免了人们在构建 AI 产品时经常遇到的一些最大问题。我们涵盖了很多陷阱和正确做事的方法。在我们进入非常激动人心的快速问答环节之前,有什么你想分享的吗?有什么想留给听众的吗?

Aishwarya Naresh Reganti痴迷于你的客户。痴迷于问题。AI 只是一个工具,试着确保你真正理解你的工作流。80% 的所谓 AI 工程师,AI PM 实际上花时间非常好地理解他们的工作流。他们不是在构建最花哨和最酷的模型或工作流。他们实际上在泥地里理解他们的客户行为和数据。每当一个从未做过 AI 的软件工程师——看这个词——看你的数据。我觉得这对他们来说是一个巨大的启示,但它一直就是这样的。你需要去那里,看你的数据,理解你的用户,而这将成为一个巨大的差异化因素。

Lenny Rachitsky这是结束的好方式。AI 不是答案。它是解决问题的工具。有了这个,我们已经进入了非常激动人心的快速问答环节。我有五个问题问你们两个。你们准备好了吗?

Aishwarya Naresh Reganti好的。是的。

Lenny Rachitsky好的。你们都可以回答。你们可以选择一个想回答的。随你们。以下哪两到三本书是你最常推荐给别人的?

Aishwarya Naresh Reganti对我来说,是这本书叫《当呼吸化为空气》,Lenny。它是 Paul Kalanithi 写的。我想他是一位印度裔神经外科医生,在31或32岁时被诊断出肺癌。这整本书是他的回忆录,是在确诊后写的。它真的美极了,尤其是因为我在 COVID 期间读的——COVID 期间我们唯一想做的就是活着。书里也有很多很棒的引语,但我记得其中之一——他反对苏格拉底的一个流行语——"未经审视的人生不值得过",或类似的——它的意思是你真的需要思考你的选择,你需要理解你的价值观、你的使命等等。Paul 说:"如果未经审视的人生不值得过,那么未经经历的人生值得审视吗?"它的意思是,你是否花了太多时间只是理解你的使命和目的,以至于忘记了生活?

我觉得在这个待在 AI 时代、不断在自我重塑的空间里构建的每个人,都需要暂停一下,活在当下。我想他们需要停止对人生做太多 evals。

Lenny Rachitsky我要说的是——这就是我的想法。你得为你的人生写一些 evals。天哪,我们走太远了。

Aishwarya Naresh Reganti是的。

Lenny Rachitsky太美了。

Aishwarya Naresh Reganti那是我最喜欢的书。

Kiriti Badam我更喜欢科幻书。所以我真的喜欢《三体》系列。它是一个三部曲。它有比科幻更宏大的元素——地球外的生命以及它如何影响人类决策过程。它也有地缘政治的元素,以及抽象科学对人类进步有多重要或有多大价值。当那停止的时候,它在日常生活中不容易被注意到,但它可能造成毁灭性的影响。所以我觉得 AI 在这些领域提供帮助——例如——将极其关键。而这本书是一个很好的反面例子。

Lenny Rachitsky完全同意。绝对喜欢。可能是我最喜欢的科幻书——不,是系列——有三本。顺便说一句,你得把三本都读完。我发现它只是在大概一本半之后才真的变好。所以如果有人试过然后说,"这是什么鬼?"继续读,到第二本中间,然后就会变得 mind-blowing。

Kiriti Badam是的。

Lenny Rachitsky如果你喜欢科幻而且你在 AI 领域,你得读这本书——Vernor Vinge 的《深渊上的火》。去看看。它难以置信。事实上它讲的是 AGI 和超级智能这些东西,它非常史诗。而且没人听说过它。

Kiriti Badam谢谢你。

Lenny Rachitsky给你一个。下一个问题。你最近看过并真正喜欢的电影或电视节目中,你最喜欢哪个?

Aishwarya Naresh Reganti我开始重看《硅谷》,我觉得它太真实了。太永恒了。一切都在重演。任何几年前看过的人都应该开始重看,你会发现它和现在 AI 浪潮中发生的所有事情惊人地相似。

Lenny Rachitsky重看是个好主意。我喜欢他们整个业务就是一个压缩算法——这可能在某种意义上是 LLMs 的先驱。好的,我懂了。好的,Kiriti,你有什么要说的?

Kiriti Badam我要拖一下,说一部电影或电视节目,但我最近玩了一个游戏叫 Expedition 33。它和 AI 无关,但在游戏性或电影、故事和音乐方面,这是一个制作得非常棒的游戏。太厉害了。

Lenny Rachitsky我喜欢你有时间玩游戏。这是一个很棒的信号。我喜欢。OpenAI 的某个人,我只是想象你是……除了编码和开会,没别的了。

Kiriti Badam是的,这真的很难找到时间。

Lenny Rachitsky那很好。那是一个好信号。我很高兴听到这个。好的。你最近发现并真正喜欢的产品中,你最喜欢哪个?

Aishwarya Naresh Reganti对我来说,是 Wispr Flow。我想我用了不少,我之前不知道我这么需要它。最棒的是它是一个上下文感知的转录工具——意思是如果你去 Codex 开始用 Wispr Flow,它开始识别变量之类的东西。在转录到指令之间它是如此无缝。你可以这样说,"我今天好兴奋。加上三个感叹号,"它无缝切换。它加上那三个感叹号而不是写"加上三个感叹号"。我觉得这很酷。如果你在用它,你应该试试。
我就是这么知道它的,Lenny。

Lenny Rachitsky好的。我想人们真的不完全理解这有多不可思议。他们说,"不可能这是真的。它是真的。"还有18款其他产品,lennysproductpass.com,去看看。继续。Kiriti。

Kiriti Badam太棒了。我实际上是一个生产力迷。我不断尝试新的 CLI 工具和让我更快的东西。所以我觉得 Raycast 太棒了。我发现了所有这些你可以用的新快捷方式——打开不同的东西,输入快捷命令之类。还有 Caffeinate 是我最近从队友那里发现的另一个东西。它帮助你防止 Mac 睡着,所以你可以本地运行这个很长的 Codex 任务四五个小时,让它构建那个东西,然后你醒来可以说,"好吧,这个很好。我喜欢这个。"

Lenny Rachitsky那个组合太有意思了,Codex 加 Caffeinate。你们需要用这个,构建一个你自己的版本——或者 OpenAI 版本的——或者 Codex 代理应该只是让你的 Mac 不睡着。太有趣了。顺便说一句,Raycast 也是 Lenny's product pass 的一部分。一年了,Raycast。太厉害了。

Aishwarya Naresh RegantiLenny 没有告诉我们这些伙计。是的。这些实际上是我们最喜欢的产品。

Lenny Rachitsky这只是19个产品中的两个。不过没有 Caffeinate。我不知道那个是不是付费的。好的,继续。你在工作或生活中最喜欢的座右铭是什么?

Aishwarya Naresh Reganti对我来说,这是我爸小时候告诉我的,它一直留在我心里——他们说那不可能做到,但那个傻瓜不知道,所以他还是做了。我觉得要足够傻,相信如果你用心做任何事情你都能做到——尤其是现在——因为你有这么多数据在手,可能指向你可能会失败。有多少播客达到超过一千订阅?有多少公司达到超过一百万 ARR?总是有数据显示你不会成功,但有时候就傻一点,然后去做吧。

Lenny Rachitsky太棒了。是的。

Kiriti Badam对我来说,我更像一个想太多的人。所以我真的很喜欢乔布斯的那句语——你只能回顾过去连接那些点。所以很多时候有无数选择,你真的不知道哪个是最优的可以选,但人生以这样的方式运作——你实际上可以回顾然后说,"哦,这些实际上在我如何过渡的方面是美丽的。"所以我觉得在不断前进、不断实验方面极其有用。

Lenny Rachitsky最后一个问题。每当我同时请两位嘉宾来播客,我喜欢问这个问题。你欣赏对方的什么?

Aishwarya Naresh Reganti我觉得 Kiriti 他非常冷静、非常脚踏实地,他一直是我的智囊团。我可以向他抛出很多想法,他总是能预见到可能会出现的问题类型。他非常善良,让他的工作说话而不是说很多话,我想。但如果我选一个的话——我觉得他是最不可思议的丈夫。

Lenny Rachitsky揭晓了。没人知道。

Aishwarya Naresh Reganti我们结婚四年了,是我生命中最美好的四年。

Lenny Rachitsky哇。好的。你怎么接这个?

Kiriti Badam是的,接这个真的很难。我想说我在与硅谷伟大公司的真正聪明的人合作方面非常幸运。和我合作过的所有聪明人相比,我觉得 Aishwarya 有一个真正惊人的天赋——以一种非常易懂和容易理解的方式教学和解释事情。结合坚持——超级有用,尤其是在这个我们身处的快速移动的 AI 世界——有这么多新东西出现,感觉势不可挡,但当我听她谈论——这是你理解整个事情的方式,这是它如何衔接进来的——我觉得,哦,那太简单了。我也能做到。所以她通过简化事情、以最易懂的方式解释事情,赋能了很多人。

所以我觉得那是一个不可思议的品质。

Lenny Rachitsky太厉害了。太甜了。我需要每次都这样。我需要更多嘉宾这样做。太棒了。好的。最后的问题。人们可以在哪里找到你们在做的事情,在网上找到你们,分享你们的课程链接,然后听众如何能帮到你们?

Aishwarya Naresh Reganti我在 LinkedIn 上写很多东西。所以如果你想听在泥地里工作过的 AI 产品实践者的声音,了解他们看到了什么,你可以关注我的工作。我们也有一个 GitHub 仓库,大约有2万星——那个仓库是关于学习 AI 的好资源。完全是免费的。如果你喜欢我们今天聊的,我们也运行一个超受欢迎的课程,我们会留下这门关于构建企业级 AI 产品的课程的链接。这个课程很多是关于心态的忘却与重建——遵循问题优先的方法而不是工具优先或炒作优先的方法。所以你也可以去看看。如果你不想上那个课,我们写很多东西,我们给出很多免费资源,我们有免费场次,所以确保你关注我们的工作。

Kiriti Badam是的,我也要补充,你也可以在 LinkedIn 上找到我。我想我写得不多,但如果你在构建复杂产品,我真的非常兴奋想和你交谈。如果你有关于如何使用编码代理让生活更美好的想法,或者你看到的任何问题,我的 DM 总是开放的,我们可以进行很棒的讨论。

Lenny Rachitsky太棒了。好的。Kiriti 和 Ash,非常感谢你们来。

Kiriti Badam非常感谢你们。

Aishwarya Naresh Reganti谢谢你,Lenny。这太开心了。

Lenny Rachitsky太开心了。大家再见。

English Original transcript

Lenny RachitskyWe worked on a guest post together. They had this really key insight that building AI products is very different from building non-AI products.

Aishwarya Naresh RegantiMost people tend to ignore the non-determinism. You don't know how the user might behave with your product, and you also don't know how the LLM might respond to that. The second difference is the agency control trade-off. Every time you hand over decision-making capabilities to agentic systems, you're kind of relinquishing some amount of control on your end.

Lenny RachitskyThis significantly changes the way you should be building product.

Kiriti BadamSo we recommend building step-by-step. When you start small, it forces you to think about what is the problem that I'm going to solve. In all this advancements of the AI, one easy, slippery slope is to keep thinking about complexities of the solution and forget the problem that you're trying to solve.

Aishwarya Naresh RegantiIt's not about being the first company to have an agent among your competitors. It's about have you built the right flywheels in place so that you can improve over time.

Lenny RachitskyWhat kind of ways of working do you see in companies that build AI products successfully?

Aishwarya Naresh RegantiI used to work with the CEO of now Rackspace. He would have this block every day in the morning, which would say catching up with AI 4:00 to 6:00 AM. Leaders have to get back to being hands-on. You must be comfortable with the fact that your intuitions might not be right. And you probably are the dumbest person in the room and you want to learn from everyone.

Lenny RachitskyWhat do you think the next year of AI is going to look like?

Kiriti BadamPersistence is extremely valuable. Successful companies right now building in any new area, they are going through the pain of learning this, implementing this and understanding what works and what doesn't work. Pain is the new moat.

Aishwarya Naresh RegantiThank you, Lenny.

Kiriti BadamThank you for having us. Super excited for this.

Lenny RachitskyLet me set the stage for the conversation that we're going to have today. So you two have built a bunch of AI products yourself. You've gone deep with a lot of companies who have built AI products, have struggled to build AI products, build AI agents. You also teach a course on building AI products successfully and you're kind of on this mission to just reduce pain and suffering and failure that you constantly see people go through when they're building AI products. So to set a little just foundation for the conversation we're going to have, what are you seeing on the ground within companies trying to build AI products? What's going well? What's not going well?

Aishwarya Naresh RegantiI think 2025 has been significantly different than 2024. One, the skepticism has significantly reduced. There were tons of leaders last year who probably thought this would be yet another crypto wave and kind of skeptical to get started. And a lot of the use cases that I saw last year were more of slapping a chatbot on your data. And those were calling themselves AI products. And this year, a ton of companies are really rethinking their user experiences and their workflows and all of that and really understanding that you need to deconstruct and reconstruct your processes in order to build successful AI products. And that's the good stuff. The bad stuff is the execution is still all over the place. Think of it. This is a three-year-old field. There are no playbooks, there are no textbooks. So you really need to figure out as you go. And the AI lifecycle, both pre-deployment and post-deployment is very different as compared to a traditional software lifecycle.

And so a lot of old contracts and handoffs between traditional roles, like say PMs and engineers and data folks has now been broken and people are really getting adapted to this new way of working together and kind of owning the same feedback loop in a way. Because previously, I feel like PMs and engineers and all of these folks had their own feedback loops to optimize. And now you need to be probably sitting in the same room. You're probably looking at agent traces together and deciding how your product should behave. So it's a tighter form of collaboration. So companies are still kind of figuring that out. That's kind of what I see in my consulting practice this year.

Lenny RachitskySo let me follow that thread. We worked on a guest post together that came out a few months ago. And the thing that stood out to me most that stuck with me most after working on that post is this really key insight that building AI products is very different from building non-AI products. And the thing that you're big on getting across is there's two very big differences. Talk about those two differences.

Aishwarya Naresh RegantiYes. And again, I want to make sure that we drive home the right point. There are tons of similarities between building AI systems and software systems as well, but then there are some things that fundamentally change the way you build software systems versus AI systems. And one of them that most people tend to ignore is the non-determinism. You're pretty much working with a non-deterministic API as compared to traditional software. What does that mean, and why does it affect us? In traditional software, you pretty much have a very well-mapped decision engine or workflow. Think of something like Booking.com. You have an intention that you want to make a booking in San Francisco for two nights, et cetera. The product has been built so that your intention can be converted into a particular action, and you're clicking through a bunch of buttons, options, forms, and all of that, and you finally achieve your intention.

But now that layer in AI products has completely been replaced by a very fluid interface, which is mostly natural language, which means the user can literally come up with a ton of ways of saying or communicating their intentions. And that changes a lot of things, because now you don't know how your user is going to behave. That's on the input side. And on the output side, you're working with a non-deterministic, probabilistic API, which is your LLM. And LLMs are pretty sensitive to prompt phrasings and they're pretty much black boxes. So you don't even know what the output surface will look like. So you don't know how the user might behave with your product, and you also don't know how the LLM might respond to that. So you're now working with an input, an output, and a process, and you don't understand all three very well. You're trying to anticipate behavior and build for it.
And with agentic systems, this gets even harder. And that's where we talk about the second difference, which is the agency-control trade-off. What we mean by that, and I'm kind of shocked so many people don't talk about this. They're extremely obsessed with building autonomous systems, agents that can do work for you. But every time you hand over decision-making capabilities or autonomy to agentic systems, you're relinquishing some amount of control on your end. And when you do that, you want to make sure that your agent has gained your trust or is reliable enough that you can allow it to make decisions. And that's where we talk about this agency-control trade-off, which is: if you give your AI agent or your AI system, whatever it is, more agency, which is the ability to make decisions, you're also losing some control, and you want to make sure that the agent or the AI system has earned that ability or has built up trust over time.

Lenny RachitskySo just to summarize what you're sharing here: essentially, people have been building software products for a long time. We're now in a world where the software you're building is, one, non-deterministic, it can just do things differently. As you said, you go to booking.com, you find a hotel, it's going to be the same experience every time. You'll see different hotels, but it's a predictable experience. With AI, you can't predict that it's going to be the exact same thing, the thing that you planned it to be, every time. And then the other is there's this trade-off between agency and control. How much will the AI do for you versus how much should the person still be in charge? And what I'm hearing is, the big point here is this significantly changes the way you should be building product. And we're going to talk about the impact on how the product development lifecycle should change as a result.

Is there anything else you want to add there before we get into that?

Kiriti BadamYeah, it's definitely one of the key points that this kind of distinction needs to exist in your mind when you're starting to build. For example, think about if your objective is to hike Half Dome in Yosemite. You don't start by hiking it on day one; you start training yourself on smaller parts, then you slowly improve, and then you go for the end goal. I feel like that's extremely similar to how you want to build AI products, in the sense that you don't start with agents with all the tools and all the context that you have in the company on day one and expect it to work, or even tinker at that level. You need to deliberately start in places where there is minimal impact and more human control, so that you have a good grip on what the current capabilities are and what you can do with them, and then slowly lean into more agency and less control.

So this gives you the confidence that, okay, this is the particular problem that I'm facing and the AI can solve this extent of it. And then let me next think through what context I need to bring in, what kind of tools I need to add to this to improve the experience. So I feel like it's also reassuring, in the sense that you don't have to look at the complexity of the outside world, with all of these fancy AI agents, and feel like, I cannot do that. Everyone is starting from very minimalistic structures and then evolving. And the second part is that as you're trying to bring these one-click agents into your company, you don't have to be overwhelmed with this complexity. You can slowly graduate.
So that's extremely important. And we see this as a repeating pattern over and over.

Lenny RachitskyOkay. So let's actually follow that because that's a really important component of how you recommend people build AI stuff, AI products, AI agents, all the AI things. So give us an example of what you're talking about here, this idea of starting slow with agency and control and then moving up this rung.

Kiriti BadamYeah. For example, a very important, or very prevalent, application of AI agents is customer support. Imagine you are a company that has a lot of customer support tickets. And why even imagine: OpenAI was the exact same thing. When we were launching products, there was a huge spike in support volume as we launched successful products like Image or GPT-5 and things like that. The kind of questions you get is different. The kind of problems that the customers bring to you is different. So it's not about just dumping the whole list of help center articles that you have into the AI agent. You need to understand what the things are that you can build. And so initially, the first step of it would be something like: you have your support agents, the human support agents, but you will be suggesting to them, okay, this is what the AI thinks is the right thing to do.

And then you get that feedback loop from the humans: okay, this is actually a good suggestion for me in this particular case, and this is a bad suggestion. And then you can go back and understand, okay, this is what the drawbacks are, or this is where the blind spots are, and then how do I fix that? And once you get that, you can increase the autonomy to say, okay, I don't need to suggest to the human. I'll actually show the answer directly to the customer. And then we can add more complexity in terms of, okay, I was only answering questions based on help center articles, but now let me add new functionality. I can actually issue refunds to the customers. I can actually raise feature requests with the engineering team, and all of these things. So if you start with all of this on day one, it's incredibly hard to control the complexity.
So we recommend building step by step and then increasing it.
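The suggest-then-act ladder Kiriti describes can be sketched in a few lines. This is a minimal illustration under assumptions: the level names, ticket fields, and routing logic are hypothetical, not OpenAI's actual support system.

```python
from enum import IntEnum

class Autonomy(IntEnum):
    SUGGEST = 1     # V1: AI drafts a reply; a human support agent reviews and sends it
    AUTO_REPLY = 2  # V2: AI answers the customer directly from help center articles
    ACT = 3         # V3: AI may also take actions, such as issuing refunds

def handle_ticket(ticket: dict, level: Autonomy) -> dict:
    """Route a support ticket according to the autonomy granted so far."""
    draft = f"Suggested reply for: {ticket['subject']}"  # stand-in for an LLM call
    if level == Autonomy.SUGGEST:
        # Human stays in charge; their accept/reject becomes the feedback loop.
        return {"action": "queue_for_human", "draft": draft}
    if level == Autonomy.AUTO_REPLY:
        return {"action": "send_to_customer", "reply": draft}
    # Autonomy.ACT: answer and, where policy allows, execute side effects too.
    side_effects = ["issue_refund"] if ticket.get("refund_eligible") else []
    return {"action": "send_to_customer", "reply": draft, "side_effects": side_effects}
```

The point of the single `level` parameter is that graduating from V1 to V3 is a one-line configuration change, made only after the previous level has earned trust.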

Lenny RachitskyAwesome. And you have a visual, actually, that we'll share of what this looks like. But just to mirror back what you're describing, this idea of starting with high control, low agency: the example you gave is the AI just giving suggestions to the support agents. It's not able to do anything; the user is in charge. And then as that becomes useful and you are confident it's doing the right sort of work, you give it a little more agency and you pull back on the control the user has. And then if that's going well, you give it more agency and the user needs less control over it. Awesome.

Aishwarya Naresh RegantiI think the higher-level idea here is, with AI systems, it's all about behavior calibration. It's nearly impossible to predict upfront how your system behaves. Now, what do you do about it? You make sure that you don't ruin your customer experience or your end-user experience. You keep that as is, but then vary the amount of control that the human has. And there is no single right way of doing it. You can decide how to constrain that autonomy. I mean, a different example of how you could constrain autonomy is pre-authorization use cases. Insurance pre-authorization is a very ripe use case for AI because clinicians spend a lot of time pre-authorizing things like blood tests, MRIs and things like that. And some cases are more of a low-hanging fruit. For instance, MRIs and blood tests, because as soon as you know the patient's information, it's easier to approve, and AI could do that. Versus something like an invasive surgery, et cetera, which is more high risk. You don't want to be doing that autonomously.

So you can determine which of these use cases should go through that human-in-the-loop layer versus which of the use cases AI can conveniently handle. And then all through this process, you're also logging what the human is doing, because you want to build a flywheel that you can use to improve your system. So you're essentially not ruining the user experience, not eroding trust, and at the same time logging what humans would otherwise do so that you can continuously improve your system.
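The pre-authorization example above can be sketched as a simple router plus an audit log. This is a sketch under assumptions: the procedure names, the `patient_verified` flag, and the log structure are illustrative, not a real insurance policy.

```python
# Hypothetical policy: procedures that are cheap and well-understood enough
# to approve automatically once the patient's information checks out.
LOW_RISK = {"mri", "blood_test"}

# The flywheel: record every decision, human or AI, so the system can improve.
audit_log = []

def route_preauth(request: dict) -> str:
    """Auto-approve low-risk requests; escalate everything else to a clinician."""
    if request["procedure"] in LOW_RISK and request.get("patient_verified"):
        decision = "auto_approved"
    else:
        decision = "escalate_to_human"  # the human-in-the-loop layer
    audit_log.append({**request, "decision": decision})
    return decision
```

Note that the escalated cases are logged just like the automated ones: what the clinician ultimately decides becomes the training data for widening `LOW_RISK` later.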

Lenny RachitskySo let me give you a few more examples of this kind of progression that you recommend. And the reason I'm spending so much time here is that this is a really key part of your recommendation for helping people build more successful AI products: this idea of starting slow with high control and low agency, and then building up over time once you've built confidence that it's doing the right sort of work. So a few more examples that you shared in your post that I'll just read. Say you're building a coding assistant. V1 would be to just suggest inline completions and boilerplate snippets. V2 would be to generate larger blocks like tests or refactors for humans to review. And then V3 is to just apply the changes and open PRs autonomously. And then another example is a marketing assistant. So V1 would be to draft emails or social copy, just like, here's what I would do.

V2 is build a multi-step campaign and run the campaign. And V3 is just launch A/B-tested, auto-optimized campaigns across channels. Awesome. Yeah. And again, just to summarize where we're at, to give people the advice we've shared so far. One is it's just important to understand AI products are different. They're non-deterministic. And you pointed out, and I forgot to actually mirror back this point, both on the input and the output. The user experience is non-deterministic. People will see different things, different outputs, different chat conversations, maybe a different UI if it's designing the UI for you. And the output obviously is also going to be non-deterministic. So that's a problem and a challenge. And then-

Aishwarya Naresh RegantiI mean, if you think about it, it's also the most beautiful part of AI, which is, I mean, we are all much more comfortable talking than clicking through a bunch of buttons and all of that. So the bar to using AI products is much lower, because you can be as natural as you would be with humans. But that's also the problem: there are tons of ways we communicate, and you want to make sure that intent is rightly communicated and the right actions are taken, because most of your systems are deterministic and you want to achieve a deterministic outcome, but with non-deterministic technology, and that's where it gets a little messy.

Lenny RachitskyAwesome. Okay. I love the optimistic version of why this is good. Okay. And then the other piece is this idea of the trade-off of autonomy versus control when you're designing a thing. And I imagine what you're seeing is people try to jump to the ideal, like the V3, immediately, and that's when they get into trouble, both because it's probably a lot harder to build and because it often just doesn't work. And then they're just like, "Okay, this is a failure. What are we even doing?"

Kiriti BadamExactly. I feel there's a bunch of things that you actually have to get confidence in before you get to V3. And it's easy to get overwhelmed: oh, my AI agent is doing things wrong in a hundred different ways, and you're not actually going to be able to tabulate all of them and fix them. Even if you've learned how to deal with evaluation practices and stuff like that, if you're starting in the wrong spot, you are going to have a hard time correcting things from there. And when you start small, and when you start with building a very minimalistic version with high human control and low agency, it also forces you to think about what is the problem that I'm going to solve. We use this term called problem first. And to me, it was obvious in the sense that I do need to think about the problem, but it's incredible how well it resonates with people: in all these advancements of AI that we are seeing, one easy, slippery slope is to just keep thinking about the complexities of the solution and forget the problem that you're trying to solve.

So when you're trying to start at a smaller scale of autonomy, you start to really think about what is the problem that I'm trying to solve and how do I break it down into levels of autonomy that I can build later. So that is incredibly useful, and we keep repeating this over and over with everyone we talk to.

Lenny RachitskyAnd there's so many other benefits to limiting autonomy because there's just danger also of the thing doing too much for you and just messing up your, I don't know, your database, sending out all these emails you never expected. And there's like so many reasons this is a good idea.

Aishwarya Naresh RegantiYep. I recently read this paper from a bunch of folks at UC Berkeley, basically Matei Zaharia and the folks at Databricks, and it said about 74% or 75% of the enterprises that they had spoken to said their biggest problem was reliability. And that's also why they weren't comfortable deploying products to their end users or building customer-facing products, because they just weren't sure, or they just weren't comfortable doing that and exposing their users to a bunch of these risks. And that's also why they think a lot of AI products today have to do with productivity, because that's much lower autonomy versus end-to-end agents that would replace workflows. And yeah, I love their work otherwise as well, but I think that's very in line with what at least we are seeing at my startup as well.

Lenny RachitskyOkay. Very interesting. There's an episode that'll come out before this conversation where we go deep into another problem that this avoids, which is around prompt injection and jailbreaking and just how big of a risk that is for AI products where it's essentially an unsolved and unsolvable problem potentially. I'm not going to go down that track, but that's a pretty scary conversation we had that'll be out before this conversation.

Aishwarya Naresh RegantiI think that will be a huge problem once systems go mainstream. We're still so busy building AI products that we're not worried about security, but it will be such a huge problem, especially with this non-deterministic API again. So you're kind of stuck, because there are tons of instructions that you could inject within your prompt, and then things can go really bad.

Lenny RachitskyOkay. Let's actually spend a little time here because it's actually really interesting to me and no one's talking about this stuff, which is like the conversation we had is just it's pretty easy to get AI to do stuff it shouldn't do. And there's all these guardrail systems people put in place, but turns out these guardrails aren't actually very good and you can always get around them. And to your point, as agents become more autonomous and robots, it gets pretty scary that you could get AI to do things you shouldn't do.

Kiriti BadamI think this is definitely a problem, but I feel, in the current spectrum of customers adopting AI, the extent to which companies can actually get advantage from AI, or improve their processes, or streamline the existing processes that they have, is still at a very early stage. 2025 has been an extremely busy year for AI agents and customers trying to adopt AI, but I feel the penetration is still not at the level where you would get the full advantage out of it. So with the right sort of human-in-the-loop points in here, I feel we can avoid a bunch of these things and focus more on streamlining the processes. And I am more on the optimist side, in the sense that you need to try and adopt this rather than focusing only on highlighting the negative aspects of what could go wrong.

So I feel strongly that companies should adopt this. For no company we talked to at OpenAI has it ever been the case that, oh, AI cannot help me here. It has always been, oh, there is this set of things that it can optimize for me, so let me see how I can adopt it.

Lenny RachitskySweet. I always like the optimistic perspective. I'm excited for you to listen to this and see what you think, because it's really interesting. And to your point, there's a lot of things to focus on. It's one of many things to worry about and think about. Okay, let's get back on track here. So we've shared a bunch of pro tips and important pieces of advice. Let me ask, what other patterns and ways of working do you see in companies that do this well and teams that build AI products successfully? And then, what are the most common pitfalls people fall into? So we could maybe start with: what are other ways that companies do this well and build AI products successfully?

Aishwarya Naresh RegantiI almost think of it as a success triangle with three dimensions, because it's never only technical. Every technology problem is a people problem first. And with companies that we have worked with, it's these three dimensions: great leaders, good culture, and technical prowess. On leaders, we work with a lot of companies on their AI transformation, training, strategy and stuff like that. And I feel like at a lot of companies, the leaders have built intuitions over 10 or 15 years and they're highly regarded for those intuitions. But now, with AI in the picture, those intuitions have to be relearned, and leaders have to be vulnerable enough to do that. I used to work with the now-CEO of Rackspace, Gagan. He would have this block every day in the morning which would say catching up with AI, 4:00 to 6:00 AM, and he would not have any meetings or anything like that.

And that was just his time to catch up on the latest AI podcasts or information and all of that. And he would have weekend vibe-coding sessions and stuff like that. So I think leaders have to get back to being hands-on. And that's not because they have to be implementing these things, but more about rebuilding their intuitions, because you must be comfortable with the fact that your intuitions might not be right, and you probably are the dumbest person in the room, and you want to learn from everyone. And I've seen that be a very distinguishing factor of companies that build successful products, because you're bringing in that top-down approach. It's almost always impossible for it to be bottom-up. You can't have a bunch of engineers go and get buy-in from a leader if they just don't trust the technology or if they have misaligned expectations about the technology.
I've heard from so many folks who are building that our leaders just don't understand the extent to which AI can solve a particular problem or they just vibe code something and assume it's easy to take it to production and you really need to understand the range of what AI can solve today so that you can guide decisions within the company. The second one is the culture itself. And again, I work with enterprises where AI is not their main thing and they need to bring in AI into their processes just because a competitor is doing it. And just because it does make sense because there are use cases that are very ripe. Then along the way, I feel a lot of companies have this culture of FOMO and you will be replaced and those kind of things and people get really afraid. Subject matter experts are such a huge part of building AI products that work because you really need to consult them to understand how your AI is behaving or what the ideal behavior should be.
But then I've spoken to a bunch of companies where the subject matter experts just don't want to talk to you because they think their job is being replaced. So I mean, again, this comes from the leader itself. You want to build a culture of empowerment, of augmenting AI into your own workflows so that you can 10X at what you're doing instead of saying that probably you'll be replaced if you don't adopt AI and stuff like that. So that kind of an empowering culture always helps. You want to make your entire organization be in it together and make AI work for you instead of trying to guard their own jobs, et cetera. And with AI, it's also true that it opens up a lot more opportunities than before. So you could have your employees doing a lot more things than before and 10x their productivity. And the third one is the technical part which we talk about.
I think folks that are successful are incredibly obsessed about understanding their workflows very well and augmenting parts that could be ripe for AI versus the ones that might need human in the loop somewhere, et cetera. Whenever you're trying to automate some part of a workflow, it's never the case that you could use an AI agent and that will solve your problems. It's always, you probably have a machine learning model that's going to do some part of the job. You have deterministic code doing some part of the job. So you really need to be obsessed with understanding that workflow so you can choose the right tool for the problem instead of being obsessed with the technology itself. And another pattern I see is also folks really understand this idea of working with a non-deterministic API, which is your LLM. And what that means is they also understand the AI development lifecycle looks very different and they iterate pretty quickly, which is can I build something iterate quickly in a way that it doesn't ruin my customer experience at the same time gives me enough amount of data so that I can estimate behavior.
So they build that flywheel very quickly. As of today, it's not about being the first company among your competitors to have an agent. It's about: have you built the right flywheels in place so that you can improve over time? When someone comes up to me and says, "We have this one-click agent, it's going to be deployed in your system, and in two or three days it'll start showing you significant gains," I would almost be skeptical, because it's just not possible. And that's not because the models aren't there, but because enterprise data and infrastructure is very messy, and you need a bit of time; even the agent needs a bit of time to understand how these systems work. There are very messy taxonomies everywhere. People tend to do things like have several near-duplicate functions for getting customer data, and all those functions exist and they're being called, and basically there's a lot of tech debt that you need to deal with.
So most of the time, if you're obsessed with the problem itself and you understand your workflows very well, you will know how to improve your agents over time instead of just slapping an agent on and assuming that it'll work from day one. I would probably go as far as to say that if someone's selling you one-click agents, it's pure marketing. You don't want to buy into that. I would rather go with a company that says, "We're going to build this pipeline for you," one that will learn over time and build a flywheel to improve, than something that's supposed to work out of the box. To replace any critical workflow, or to build something that can give you significant ROI, easily takes four to six months of work, even if you have the best data layer and infrastructure layer.

Lenny RachitskyAmazing. There's a lot there that resonates so deeply with other conversations I've been having on this podcast. One is, for a company to be successful at seeing a lot of impact from AI, the founder-CEO has to be deep into it. I had Dan Shipper on the podcast, and they work with a bunch of companies helping them adopt AI. And he said that's the number one predictor of success: is the CEO chatting with ChatGPT, Claude, whatever, many times a day. I love this example you gave where the Rackspace CEO has a catch-up-on-AI block every morning. I was imagining he'd be chatting with the chatbot versus reading news.

Aishwarya Naresh RegantiWith the kind of information you have as of today, you could just ... I mean, you want to choose the right channels as well because everybody has an opinion. So whose opinion do you want to bank on? I feel like having that good quality set of people that you're listening to really makes sense. So he just has a list of two or three sources that he always looks at. And then he comes back with a bunch of questions and bounces it around with a bunch of AI experts to see what they think about it. And I was part of that group, so I kind of know-

Lenny RachitskyI love that.

Aishwarya Naresh Reganti... about the questions that he comes up with.

Lenny RachitskyThat's cool.

Aishwarya Naresh RegantiIt's pretty cool. I was like, "Why are you doing so much?" And then he says, "It trickles down into a bunch of decisions that we take."

Lenny RachitskyOkay. Let me talk about another topic that's very ... It's been a hot topic on this podcast. It was a hot topic on Twitter for a while, evals. A lot of people are obsessed with evals, think they're the solution to a lot of problems in AI. A lot of people think they're overrated that you don't need evals. You can just feel the vibes and you'll be all right. What's your take on evals? How far does that take people in solving a lot of the problems that you talk about?

Kiriti BadamIn terms of what is going on in the community, I feel there's this false dichotomy that either evals are going to solve everything, or online monitoring, production monitoring, is going to solve everything. And I find no reason to trust either of the extremes, in the sense that I would entirely bank my application on one or the other. So take a step back and think of what evals are. Evals are basically your trusted product thinking, your knowledge about the product, going into a set of datasets that you're going to build, in the sense of: this is what matters to me, these are the kinds of problems that my agent should not have, so let me build a list of datasets so that I do well on those. And in terms of production monitoring, what you're doing there is deploying your application and then having some key metrics that communicate back to you how customers are using your product.

You could be deploying any agent, and if the customer is giving a thumbs up for your interaction, you better want to know that. So that is what production monitoring is going to do. And production monitoring has existed for products for a long time; it's just that now, with AI agents, you need to be monitoring at a lot more granularity. It's not just the customer giving you explicit feedback; there is a lot of implicit feedback that you can get. For example, in ChatGPT, if you like the answer, you can give a thumbs up. Or if customers don't like the answer, sometimes they don't give a thumbs down but actually regenerate the answer. So that is a clear indication that the initial answer they regenerated is not meeting the customer's expectation. These are the kinds of implicit signals you always need to think about.
And that spectrum has been increasing in terms of production monitoring. Now let's come back to the initial topic of, okay, is it evals or is it production monitoring? Why does it matter? So I feel, again, we go back to this problem-first approach of what it is that you're trying to build. You're trying to build a reliable application for your customers that's not going to do a bad thing. It's always going to do the right thing. Or if it is doing a wrong thing, you're alerted very quickly. So I break this down into two parts. One is nobody goes into deploying an application without actually testing it. This testing could be vibes, or this testing could be, "Okay, I have these 10 questions that it should not get wrong no matter what changes I make, so let me build this and let's call this an evaluation dataset." Now, let's say you build this, you deploy this, and then you figure, "Okay, now I need to understand whether it's doing the right thing or not."
So if you're a high-throughput or high-transaction customer, you cannot practically sit and evaluate all the traces. You need some indication to understand which things you should look at. And this is where production monitoring comes into the picture: you cannot predict the ways in which your agent could be going wrong, but all of these other implicit signals and explicit signals are going to communicate back to you which traces you need to look at. And that is where production monitoring helps. And once you get these traces, you need to examine what failure patterns you're seeing in these different types of interactions. And is there something that I really care about that should not happen? And if those kinds of failure modes are happening, then I need to think about building an evaluation dataset for it.
And okay, let's say I built an evaluation dataset for my agent trying to offer refunds where explicitly I have configured it not to. So I built this evaluation dataset and then I made my changes in tools or prompts or whatever, and then I deployed the second version of the product. Now there is no guarantee that this is the only problem that you're going to see. You still need production monitoring to actually catch different kinds of problems that you might encounter. So I feel evals are important, production monitoring is important, but this notion of only one of them is going to solve things for you that is completely dismissible in my opinion.
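The two complementary loops Kiriti walks through can be sketched together: a small offline eval set that gates deploys, and a production filter that surfaces traces carrying negative implicit signals. All names, the refund eval case, and the signal fields here are illustrative assumptions, not any particular product's metrics.

```python
# Offline evals: a small set of cases the agent must never get wrong,
# run as a regression gate before every deploy.
EVAL_SET = [
    {"prompt": "I want a refund", "must_not_contain": "refund approved"},
]

def run_evals(agent) -> bool:
    """Return True only if every eval case passes for this agent version."""
    return all(
        case["must_not_contain"] not in agent(case["prompt"]).lower()
        for case in EVAL_SET
    )

def flag_for_review(events: list) -> list:
    """Production monitoring: surface traces with negative implicit signals,
    e.g. the user regenerated the answer or gave an explicit thumbs-down."""
    return [
        e for e in events
        if e.get("regenerated") or e.get("feedback") == "thumbs_down"
    ]
```

The flow is circular: flagged traces reveal new failure patterns, the ones that matter get distilled into new `EVAL_SET` entries, and the next deploy is gated on them, while monitoring keeps watching for problems the evals cannot anticipate.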

Lenny RachitskyAll right. A very reasonable answer. And the point here isn't just that it's as simple as doing both. It's more that there are different things to catch, and one approach won't catch all the things you need to be paying attention to.

Aishwarya Naresh RegantiExactly.

Lenny RachitskyAwesome.

Aishwarya Naresh RegantiI want to take two steps back and talk about how much weight the term evals has had to carry in the second half of 2025. You go meet a data labeling company and they tell you their experts are writing evals. Then you have all of these folks saying that PMs should be writing evals, that they're the new PRDs. And then you have folks saying that evals are pretty much everything, the whole feedback loop you're supposed to be building to improve your products. Now, step back as a beginner and think: what are evals? Why is everyone saying evals? These are actually different parts of the process, and nobody is wrong in the sense that yes, these are evals. But when a data labeling company tells you that their experts are writing evals, they're actually referring to error analysis, or experts just leaving notes on what should be right.

Lawyers and doctors write evals; that doesn't mean they're building LLM judges or building this entire feedback loop. And when you say a PM should be writing evals, it doesn't mean they have to write an LLM judge that's good enough for production. I think there are also very prescriptive ways of doing this, and plus one to KD: you cannot predict upfront whether you need to be building an LLM judge versus using implicit signals from production monitoring, et cetera. I think Martin Fowler back in the 2000s had this term, semantic diffusion, which means someone comes up with a term, everybody starts butchering it with their own definitions, and then you lose the actual definition of it. That is what is happening to evals, or agents, or any word in AI as of today. Everybody sees a different side of it, I guess.
But if you make a bunch of practitioners sit together and ask them, "Is it important to build an actionable feedback loop for AI products?" I think all of them will agree. Now, how you do that really depends on your application itself. When you go to complex use cases, it's incredibly hard to build LLM judges, because you see a lot of emerging patterns. If you built a judge that tests for verbosity or something like that, it may turn out that you're seeing newer patterns your LLM judge is not able to catch, and then you just end up building too many evals. At that point, it makes more sense to look at your user signals, fix them, check if you have regressed, and move on, instead of building these judges. So it all depends. One statement every ML practitioner will tell you: it really depends on the context. Don't be obsessed with prescriptions; they're going to change.
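The verbosity example above can be sketched in a few lines. This is illustrative only: in practice that dimension might be scored by an LLM judge, and a word-count proxy stands in for it here; the function names and the 5% tolerance are assumptions, not anything the speakers prescribe:

```python
# Illustrative: a cheap heuristic "judge" for one dimension (verbosity),
# plus a regression check between two product versions.

def verbosity_ok(reply: str, limit: int = 50) -> bool:
    """Pass if the reply stays under a word budget."""
    return len(reply.split()) <= limit

def pass_rate(replies) -> float:
    """Fraction of replies that pass the verbosity check."""
    return sum(verbosity_ok(r) for r in replies) / len(replies)

def regressed(old_replies, new_replies, tolerance: float = 0.05) -> bool:
    """Flag a regression if the new version's pass rate drops noticeably."""
    return pass_rate(new_replies) < pass_rate(old_replies) - tolerance
```

The "check if you have regressed and move on" step she describes is the `regressed` comparison; the trap she warns about is accumulating one judge like `verbosity_ok` for every new pattern you spot.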

Lenny RachitskyThat's such an important point, this idea that evals just means many things to different people now. It's a term for so many things. And it's complicated to just talk about evals when you see it as the stuff data labeling companies are giving you, and the things PMs write. And there are also benchmarks; people call benchmarks evals a little bit too. It's like-

Aishwarya Naresh RegantiI recently spoke to a client who told me, "We do evals." And I was like, "Okay, can you show me your dataset?" And they said, "No, we just checked LM Arena and Artificial Analysis. These are independent benchmarks, and we know that this model is the right one for our use case." And I'm like, "You're not doing evals. That's not evals. Those are model evals."

Lenny RachitskyBut it makes sense. The word, it could be used in that context. I get why people think that, but yeah, now it's just confusing it even more.

Aishwarya Naresh RegantiYep.

Lenny RachitskyJust one more line of questioning here that's on my mind. The reason this became kind of a big debate is Claude Code. The head of Claude Code, Boris, was like, "Nah, we don't do evals on Claude Code. It's all vibes." What can you share, Kiriti, on Codex and the Codex team, how you approach evals?

Kiriti BadamSo for Codex, we have this balanced approach: you need to have evals, and you definitely need to listen to your customers. I think Alex has been on your podcast recently, talking about how extremely focused we are on building the right product, and a big part of that is listening to your customers. And coding agents are extremely unique compared to agents for other domains, in the sense that they are built for customizability and they are built for engineers. So a coding agent is not a product that's going to solve the top five or six workflows. It's meant to be customizable in many different ways. The implication is that your product is going to be used with different integrations and different kinds of tools. So it gets really hard to build an evaluation dataset for all the kinds of interactions your customers are going to use your product for.

With that said, you also need to know that if I'm going to make a change, it's at least not going to damage something that is really core to the product. So we have evaluations for doing that, but at the same time we take extreme care in understanding how customers are using it. For example, we built this code review product recently, and it has been gaining an extreme amount of traction. I feel like many, many bugs at OpenAI, as well as at our external customers, are getting caught with it. Now let's say I'm making a model change to code review, or I trained it with a different kind of RL mechanism, and I'm going to deploy it. I definitely want to A/B test and identify whether it's actually finding the right mistakes, and how users are reacting to it. And sometimes, if users get annoyed by your incorrect code reviews, they go to the extent of just switching off the product.
So those are the signals you want to look at to make sure your new changes are doing the right thing. And it's extremely hard for us to think of these kinds of scenarios beforehand and develop evaluation datasets for them. So I feel like it's a bit of both. There's a lot of vibes and there's a lot of customer feedback, and we are super active on social media to understand if anybody's having certain types of problems so we can quickly fix them. So I feel it's a ... how do I put this? It's a combination of things that you do here.

Lenny RachitskyThat makes so much sense. Okay. What I'm hearing: Codex is pro-evals, but evals are not enough.

Kiriti BadamYes.

Lenny RachitskyBut you also just watch customer behavior and feedback. And there are also some vibes: is this feeling good as I'm using it? Is it generating great code that I'm excited about, that I think is great?

Kiriti BadamI don't think it's going to work if anybody comes in saying, "I have this concrete set of evals that I can bet my life on, and I don't need to think about anything else." With every new model we're going to launch, we get together as a team and test different things. Each person concentrates on something else. We have this list of hard problems, and we throw those at the model and see how well it's progressing. So it's like custom evals for each engineer, you could say, just to understand what the product is doing with its new model.

Lenny RachitskyIf you're a founder, the hardest part of starting a company isn't having the idea, it's scaling the business without getting buried in back office work. That's where Brex comes in. Brex is the intelligent finance platform for founders. With Brex, you get high limit corporate cards, easy banking, high yield treasury, plus a team of AI agents that handle manual finance tasks for you. They'll do all the stuff that you don't want to do, like file your expenses, scour transactions for waste, and run reports all according to your rules. With Brex's AI agents, you can move faster while staying in full control. One in three startups in the United States already runs on Brex. You can too at brex.com.

We've been talking for almost an hour already, and we haven't even covered the extremely powerful software development workflow for building AI products that you two developed and teach in your course, where you basically combined all the stuff we've been talking about into a step-by-step approach to building AI products. You call it the continuous calibration, continuous development framework. Let's pull up a visual to show people what the heck we're talking about, and then just walk us through what this is, how it works, and how teams can shift the way they build their AI products to this approach to help them avoid a lot of pain and suffering.

Aishwarya Naresh RegantiAnd we had to shut down the product because we were doing so many hot fixes, and there was no way we could catch all the emerging problems that were coming up. There's also quite some news online. Recently, Air Canada had this incident where one of their agents hallucinated a refund policy that was not part of their original playbook, and they had to honor it for legal reasons. There have been a ton of really scary incidents. And that's where the idea comes from: how can you build so that you don't lose customer trust, and your agent or AI system doesn't end up making decisions that are super dangerous to the company itself, while at the same time building a flywheel so you can improve your product as you go? That's where we came up with this idea of continuous calibration, continuous development.
The idea is pretty simple. We have the right side of the loop, which is continuous development, where you scope capability and curate data: essentially, get a dataset of what your expected inputs are and what your expected outputs should look like. This is a very good exercise before you start building any AI product, because many times you figure out that a lot of folks within the team are just not aligned on how the product should behave. That's where your PMs, and your subject matter experts as well, can really bring in a lot more information. So you have this dataset that you know your AI product should be doing really well on. It's not comprehensive, but it lets you get started. Then you set up the application and design the right kind of evaluation metrics. I intentionally use the term evaluation metrics, although we say evals, because I want to be very specific about what it is: evaluation is a process; evaluation metrics are dimensions you want to focus on during that process.
Then you go about deploying and run your evaluation metrics. The second part is continuous calibration, which is where you understand what behavior you hadn't expected in the beginning. Because when you start the development process, you have this dataset that you're optimizing for, but more often than not you realize the dataset is not comprehensive enough, because users start behaving with your systems in ways you did not predict. And that's where you do the calibration piece. I've deployed my system; now I see that there are patterns I did not expect. Your evaluation metrics should give you some insight into those patterns, but sometimes you figure out that those metrics were also not enough, and you probably have new error patterns you had not thought about. That's where you analyze the system's behavior and spot error patterns.
You apply fixes for issues that you see, but you also design newer evaluation metrics to catch the emerging patterns. That doesn't mean you should always design evaluation metrics. There are some errors you can just fix and not come back to, because they're one-off errors, for instance a tool calling error just because your tool wasn't defined well. You can just fix that and move on. And this is pretty much what an AI product lifecycle looks like. But what we specifically also mention is: while you're going through these iterations, start with lower-agency, higher-control iterations. What that means is, constrain the number of decisions your AI systems can make and make sure there are humans in the loop, then increase the agency over time, because you're building a flywheel of behavior and you're understanding what kinds of use cases are coming in and how your users are using the system.
Most of the time the taxonomies are incredibly messy. I have worked on use cases where the taxonomy has some kind of hierarchy with shoes, women's shoes, and men's shoes all at the same layer, where ideally you should have shoes, with women's shoes and men's shoes as subclasses. And then you're like, okay, fine, I could just merge that. And you go further and see that there's also another section under shoes that says "for women" and "for men," and it's just not aggregated. It's not fixed for some reason. So if an agent sees this kind of taxonomy, what is it supposed to do? Where is it supposed to route? A lot of the time we are not aware of these problems until you actually go about building something and understanding it.
And when real human agents see these kinds of problems, they know what to check next. Maybe they realize that the node that says "for women" and "for men" under shoes was last updated in 2019, which means it's just a dead node lying there, not being used. So they know they're supposed to be looking at a different node. I'm not saying agents cannot understand this, or that models are not capable enough to understand this, but there are really weird rules within enterprises that are not documented anywhere. You want to make sure the agents have all of that context instead of just throwing the problem at them.
Yeah. Coming back to the versions we had: routing was one where you have really high control, because even if your agent routes to the wrong department, humans can take control and undo those actions. Along the way, you also figure out that you're probably dealing with a ton of data issues you need to fix, to make sure your data layer is good enough for the agent to function. The next step is what we called the copilot: now that you've figured out routing works fine after a few iterations, and you've fixed all of your data issues, you could ask, can my agent provide suggestions based on some standard operating procedures we have for the customer support agent? It could just generate a draft that the human can make changes to. And when you do this, you're also logging human behavior: how much of the draft was used by the customer support agent, and what was omitted. So you're actually getting error analysis for free, because you're literally logging everything the user is doing, which you can then build back into your flywheel.
Then, once you've figured out that those drafts look good, and most of the time humans are not making too many changes and are using the drafts as is, that's when you want to go to your end-to-end resolution assistant, which could draft a resolution that solves the ticket as well. Those are the stages of agency, where you start with low agency and then go up high. We also have this really nice table we put together: what you do at each version, what you learn that enables you to go to the next step, and what information you get that you can feed into the loop. When you're just doing your routing, you get better quality routing data, and you also learn what kinds of prompts you need to build to improve the routing system.
Essentially, you're figuring out your structure for context engineering and building that flywheel that you want. And while I go through this, I want to be very clear about two things. One is that when you build with CCCD in mind, it doesn't mean you've fixed the problem once and for all. It's possible that you've gone through V3 and you see a new distribution of data that you never previously imagined. But this is one way to lower your risk: you get enough information about how users behave with your system before going to a point of complete autonomy. The second thing is that you're also building an implicit logging system. A lot of people come and tell us, "Oh wait, there are evals. Why do you need something like this?" The issue with just building a bunch of evaluation metrics and having them in production is that evaluation metrics catch only the errors you're already aware of, but there can be a lot of emerging patterns that you understand only after you put things in production.
So for those emerging patterns, you're creating a low-risk framework so you can understand user behavior and not end up in a position where there are tons of errors and you're trying to fix all of them at once. And this is not the only way to do it; there are tons of different ways. You want to decide how you constrain your autonomy. It could be based on the number of actions the agent is taking, which is what we do in this example. It could be based on topic: there are some domains where it's pretty high risk to make a system completely autonomous for certain decisions, but for other topics it's okay, depending on the complexity of the problem. And that's where you really want your product managers, engineers, and subject matter experts to align on how to build this system and continuously improve it.
The idea is just behavior calibration and not losing user trust as you do that behavior calibration, I guess.
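The staged-agency progression described above can be sketched as a small gate. This is one possible reading of the CCCD loop, not the authors' actual tooling; the stage names, the "novel error pattern" flag, and the 2% threshold are all illustrative assumptions:

```python
# Sketch of the CCCD agency progression: advance one stage only when
# recent calibration shows few previously unseen error patterns
# ("minimizing surprise," as Aishwarya puts it).

STAGES = ["router", "copilot", "end_to_end"]

def new_pattern_rate(error_log) -> float:
    """Fraction of logged interactions showing previously unseen error patterns."""
    if not error_log:
        return 0.0
    return sum(1 for e in error_log if e["novel"]) / len(error_log)

def next_stage(current: str, error_log, threshold: float = 0.02) -> str:
    """Grant more agency only when surprise is low; otherwise keep calibrating."""
    idx = STAGES.index(current)
    if idx + 1 < len(STAGES) and new_pattern_rate(error_log) <= threshold:
        return STAGES[idx + 1]
    return current
```

The gate also runs in reverse in spirit: an event like a model deprecation would reset the error log and hold, or even roll back, the stage until calibration settles again.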

Lenny RachitskyWe'll link folks to this actual post if they want to go really deep. You basically go through all of these steps, step by step, with a bunch of examples. And everything about what you're describing here is about making it continuous and iterative, moving along this progression of higher autonomy and less control. Even the name, continuous calibration, continuous development, communicates that it's an iterative process. And just to be clear, the naming is an ode to CI/CD, continuous integration, continuous deployment. The idea is that this is the version of that for AI, where instead of just integrating with unit tests and deploying constantly, it's running evals, looking at results, iterating on the metrics you're watching, figuring out where it's breaking, and iterating on that. Awesome. Okay.

So again, we'll point people to this post if they want to go deeper. That was a great overview. Before we go into a different topic, is there anything else around this framework specifically that you think is important for people to know?

Aishwarya Naresh RegantiI think one of the most common questions we get is: how do I know if I need to go to the next stage, or if this is calibrated enough? There's not really a rule book you can follow, but it's all about minimizing surprise. That means, let's say you're calibrating every one or two days and you figure out that you're not seeing new data distribution patterns; your users have been pretty consistent in how they behave with the system. Then the amount of information you gain is very low, and that's when you know you can go to the next stage. It's somewhat about the vibes at that point: you know you're ready when you're not receiving any new information. But it also really helps to understand that sometimes there are events that can completely mess up the calibration of your system. An example: GPT-4o doesn't exist anymore, or it's going to be deprecated in the APIs as well.

So most companies that were using 4o have to switch to 5, and 5 has very different properties. That's where your calibration is off again, and you want to go back and do this process again. Sometimes user behavior also evolves over time. Even with consumer products, you don't talk to ChatGPT the same way you did, say, two years ago, just because the capabilities have increased so much. And people get excited when these systems can solve one task; they want to try them out on other tasks as well. We built a system for underwriters at some point. Underwriting is a painful task: loan applications are like 30 or 40 pages, and the idea for this bank was to build a system that could help underwriters pick policies and information about the bank so they could approve loans.
And for a good three or four months, everybody was pretty impressed with the system. We had underwriters actually report gains in terms of how much time they were spending, et cetera. But after the first three months, we realized they were so excited about the product that they started asking very deep questions we never anticipated. They would just throw the entire application document at the system and go, "For a case that looks like this, what did previous underwriters do?" For a user, that just seems like a natural extension of what they were doing, but what you build behind it has to change significantly. Now you need to understand what "a case like this" means in the context of the loan itself. Is it referring to people of a particular income range, or people in a particular geo, and so on?
Then you need to pick up historical documents, analyze them, and tell the user, "Okay, this is what it looks like," versus just saying there's a policy X, Y, and Z and they should look up that policy. So something that might seem very natural to an end user might be very hard to build as a product builder. User behavior also evolves over time, and that's when you know you want to go back and recalibrate.

Lenny RachitskyWhat do you think is overhyped in the AI space right now? And even more importantly, what do you think is under-hyped?

Kiriti BadamAs I said, I'm super optimistic about the different things going on in AI, so I wouldn't say overhyped. But what I feel is misunderstood is the concept of multi-agents. People have this notion of, "I have this incredibly complex problem. Now I'm going to break it down into: hey, you are this agent, take care of this; you're this agent, take care of this." And they think that if they somehow connect all of these agents, they're in agent utopia. It's never the case. There are incredibly successful multi-agent systems that have been built; there's no doubt about that. But I feel a lot of it comes down to how you limit the ways in which the system can go off track. For example, if you're building a supervisor agent with subagents that actually do the work for the supervisor agent, that is a very successful pattern.

But coming in with this notion of "I'm going to divide the responsibilities based on functionality and somehow expect all of that to work together in some sort of gossip protocol" is where it's extremely misunderstood. I don't think current ways of building and current model capabilities are there yet for those kinds of applications. So I feel that is misunderstood rather than overrated. Underrated: it's probably hard to believe, but I still feel coding agents are underrated, in the sense that you can go on Twitter or Reddit and see a lot of chatter about coding agents, but talk to an engineer at any random company, especially outside the Bay Area, and you can see the amount of impact these coding agents can create, and the penetration is very low. So I feel like 2025 and 2026 are going to be incredible years for optimizing all of these processes.
And I feel that is going to create a lot of value with AI.

Lenny RachitskyThat's really interesting on that first point. So the idea is you'll probably be more successful building and using one agent that is able to do its own sub-agent splitting of work, versus a bunch of, say, Codex agents where you say: will you do this task, you do that task?

Kiriti BadamYou can have agents do these things and you as a human can orchestrate them, or you can have one larger agent that orchestrates all of them. But letting the agents communicate in a peer-to-peer kind of protocol, and especially doing this in a customer support kind of use case, makes it incredibly hard to control which agent is replying to your customer, because you need to shift your guardrails everywhere, and things like that.
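A minimal sketch of the supervisor pattern Kiriti describes, as opposed to peer-to-peer agents: one supervisor routes each task to a single subagent and is the only component that composes the customer-facing reply, so guardrails live in one place. All names, the routing rule, and the reply format here are hypothetical:

```python
# Supervisor pattern sketch: subagents do the work, the supervisor
# routes tasks and owns the single customer-facing reply.

def billing_agent(task: str) -> str:
    return f"billing: handled '{task}'"

def shipping_agent(task: str) -> str:
    return f"shipping: handled '{task}'"

SUBAGENTS = {"billing": billing_agent, "shipping": shipping_agent}

def supervisor(task: str) -> str:
    """Route to exactly one subagent; guardrails and the reply live here."""
    topic = "billing" if ("charge" in task or "invoice" in task) else "shipping"
    result = SUBAGENTS[topic](task)
    # The supervisor, not the subagent, talks to the customer.
    return f"Here's what we found. {result}"
```

In the gossip-protocol version he warns against, each subagent could message any other and answer the customer directly, and every one of them would need its own guardrails.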

Lenny RachitskyYeah. Okay. Great picks. Okay. Ash, what do you got?

Aishwarya Naresh RegantiCan I say evals? Will I be canceled?

Lenny RachitskyIn which category? Which bucket do they go?

Aishwarya Naresh RegantiOverrated.

Lenny RachitskyOverrated. Okay, go for it. We won't let you get canceled.

Aishwarya Naresh RegantiJust kidding. I think evals are misunderstood. They are important, folks; I'm not saying they're not important. But this "I'm going to keep jumping across tools, picking up and learning every new tool" mindset is overrated. I'm still old school and feel like you really need to be obsessed with the business problem you're trying to solve. AI is only a tool; I try to think of it that way. Of course you need to be learning about the latest and greatest, but don't be so obsessed with just building so quickly. Building is really cheap today. Design is more expensive: really thinking about your product, what you're going to build, whether it's going to really solve a pain point. That is way more valuable today, and it will only become more true in the near future. So really obsessing about your problem and design is underrated, and rote building is overrated, I guess.

Lenny RachitskyAwesome. Okay. Similar sort of question. From a product point of view, what do you think the next year of AI is going to look like? Give us a vision of where you think things are going to go by, say by the end of 2026.

Kiriti BadamYeah, I feel there's a lot of promise in these background agents, or proactive agents. They're going to understand your workflow even more. If you think of where AI is failing to create value today, it's mainly about not understanding the context. And the reason it's not understanding the context is that it's not plugged into the right places where actual work is happening. As you do more of this, you can give the agent more context, and it starts to see the world around you and understand the set of metrics you're optimizing for, or the kinds of activities you're trying to do. It's a very easy extension from there to get more out of it and let the agent prompt you back. We already do this with ChatGPT Pulse, which gives you a daily update of things you might care about.

And it's very nice to have that jog your brain: "Oh, this is something that I haven't thought about. Maybe this is good." Now extend this to more complex tasks, like a coding agent that says, "Okay, I have fixed five of your Linear tickets and here are the patches. Just review them at the start of your day." I feel that is going to be extremely useful, and I see that as a strong direction in which products are going to be built in 2026.

Lenny RachitskyThat's so cool. So essentially agents anticipating what you want to do and getting ahead of you: "I've solved these problems for you," or "I think this is going to crash your site, maybe you should fix this thing right here," or "I see a spike here, let's refactor our database." Amazing. What a world. Okay. Ash, what do you got?

Aishwarya Naresh RegantiI'm all in for multimodal experiences in 2026. I think we made quite some progress in 2025, not just in generation but also in understanding. Until now, LLMs have been our most commonly used modality, but as humans we are multimodal creatures. Language is probably one of our last forms of evolution. As the three of us are talking, we're constantly getting so many signals. I'm like, "Oh, Lenny's nodding his head, so I'll go in this direction," or "Lenny's bored, so let me stop talking." So there's a chain of thought behind your chain of thought, and you're constantly altering it. With language alone, that dimension of expression is not explored as well. If we could build better multimodal experiences, that would get us closer to human-like conversational richness. And given the kinds of models we have, there's a bunch of boring tasks as well which are ripe for AI.

If multimodal understanding gets better: there are so many handwritten documents and really messy PDFs that cannot be parsed even by the best of the models as of today. If that becomes possible, there will be so much data we can tap into.

Lenny RachitskyAwesome. I just saw Demis from DeepMind, Google AI, whatever they call the whole org, talking about this. He thinks that's going to be a big part of where they're going: combining the image model work, the LLM, and also their world model stuff, Genie, I think it's called. So that's going to be a wild, wild time. Okay. Last question. If someone wants to get better at building AI products, what's one skill, or maybe two, that you think they should lean into and develop?

Aishwarya Naresh RegantiI think we did cover a bunch of best practices for AI products: start small, get your iteration loop going well, build a flywheel, and all of that. But if you look at it from a 10,000-foot level, for anybody building today: like I was saying, implementation is going to be ridiculously cheap in the next few years. So really nail down your design, your judgment, your taste. And in general, if you're building a career: the formative years, say the first two or three years, have always been focused on execution, mechanics, and all of that. Now we have AI that can help you ramp pretty quickly, and after that, I think everybody's job becomes about your taste, your judgment, and what is uniquely you.

Nail down that part and figure out how you can bring in that kind of perspective. It doesn't mean you have to be significantly older or have years of experience. We recently hired someone. We use this very popular app for tracking our tasks; we've been using it for years, and we pay a high subscription fee for it. And this guy just came to the meeting with his own vibe-coded app. He onboarded us onto all of it and said, "Okay, let's start using this." I think that kind of agency and ownership, to really rethink experiences, is what will set people apart. And I'm not blind to the fact that vibe-coded apps have high maintenance costs; maybe as we scale as a company, we'll have to replace it or think of better approaches.
But given that we are a small company now ... I was really shocked, because I never thought of it. If you've been used to working in a certain way, you associate a cost with building. Folks who grew up in this age have a much lower cost associated in their minds; they just don't mind building something and going ahead with it. And they're also very enthusiastic about trying out new tools. That's also probably why AI products have this retention problem: everybody's so excited about trying out these new tools. But essentially, it's about having the agency and ownership. I think it's also going to be the end of the busy work era. You can't be sitting in a corner doing something that doesn't move the needle for a company. You really need to be thinking about end-to-end workflows and how you can bring in more impact.
I think all of that will be super important.

Lenny RachitskyThat reminds me, I just had Jason Lemkin on the podcast. He's very smart on sales and go-to-market, runs SaaStr, and he replaced his whole sales team with agents. He had 10 salespeople, and then it was one or two people and 20 agents. And one of the agents was just tracking everyone's updates to Salesforce and updating it automatically for them based on their calls. And one of the salespeople was like, "Okay, I quit." It turned out he wasn't really doing anything. He was just sitting around, and he's like, "Okay, this will catch me. I got to get out of here." So your point that it'll be harder to sit around and twiddle your thumbs, I think, is really right.

Kiriti BadamYeah. I think to add on to that, I feel like persistence is also something that is extremely valuable, especially given that for anybody who wants to build something, the information is at your fingertips even more than in the past decade. You can learn anything overnight and take that sort of Iron Man kind of approach. So I feel like having that persistence, going through the pain of learning this, implementing it, and understanding what works and what doesn't work, and going through this pain of developing multiple approaches and then solving the problem, I feel that is going to be the real moat as an individual. I like to call it "pain is the new moat," and I feel that's super useful to have, especially in building these AI products.

Lenny RachitskySay more about this. I love this concept. Pain is the new moat. Is there more there?

Kiriti BadamYeah, I feel, as a company, successful companies right now building in any new area are successful not because they're first to the market or they have some fancy feature that more customers like. They went through the pain of understanding what their set of non-negotiable things is and trading those off against exactly the features or the model capabilities they can use to solve that problem. This is not a straightforward process. There's no textbook for this, no straightforward way, no known, proven path to get here. So a lot of this pain I was talking about is just going through this iteration of, "Okay, let's try this, and if this doesn't work, let's try that." And that kind of knowledge that you build across the organization, or across your own lived experiences, I feel that pain is what translates into the moat of the company. This could be a set of evals or something else that you built. And I feel that is going to be the game changer.

Lenny RachitskyThat is awesome. It's like turning coal into a diamond.

Kiriti BadamYes.

Lenny RachitskyOkay. I feel like we've done a great job helping people avoid some of the biggest issues people consistently run into building AI products. We covered so many of the pitfalls and the ways to actually do it correctly. Before we get to our very exciting lightning round, is there anything else that you wanted to share? Anything else you want to leave listeners with?

Aishwarya Naresh RegantiBe obsessed with your customers. Be obsessed with the problem. AI is just a tool, so try to make sure that you're really understanding your workflows. 80% of so-called AI engineers and AI PMs spend their time actually understanding their workflows very well. They're not building the fanciest and coolest models or workflows around them. They're actually in the weeds understanding their customers' behavior and data. And whenever a software engineer who's never done AI before hears the term "look at your data," I think it's a huge revelation to them, but it's always been the case. You need to go there, look at your data, understand your users, and that's going to be a huge differentiator.

Lenny RachitskyThat's a great way to close it. The AI isn't the answer. It's a tool to solve the problem. With that, we have reached our very exciting lightning round. I've got five questions for both of you. Are you ready?

Aishwarya Naresh RegantiYay. Yes.

Lenny RachitskyAll right. So you can both answer them. You can pick one which you want to answer. Either way, up to you. What are two or three books you find yourself recommending most to other people?

Aishwarya Naresh RegantiFor me, it's this book called When Breath Becomes Air, Lenny. It was written by Paul Kalanithi. I think he was an Indian-origin neurosurgeon who was diagnosed with lung cancer at 31 or 32. And the whole book is his memoir, written after he was diagnosed. And it's really beautiful, especially because I read it during COVID, and all we ever wanted to do during COVID was stay alive. There are a bunch of really nice quotes within the book as well, but I remember one of them: he was kind of arguing against a very popular quote by Socrates, which is, "The unexamined life is not worth living," or something like that, which means you really need to be thinking about your choices, you need to understand your values, your mission, and all of that. And Paul says, "If the unexamined life is not worth living, was the unlived life worth examining?" Which means: are you spending so much time just understanding your mission and purpose that you've forgotten to live?

And I think everybody who's living in the AI era, building and continuously going through this phase of reinventing themselves, needs to take a pause and live for a bit, I guess. They need to stop evaling life too much.

Lenny RachitskyI was going to say that. That's where my mind went. You got to write some evals for your life. Oh my God, we've gone too far.

Aishwarya Naresh RegantiYep. Yeah.

Lenny RachitskyBeautiful.

Aishwarya Naresh RegantiThat's my favorite book.

Kiriti BadamI like science fiction books more. So I really like the Three-Body Problem series. It's a three-book series. It has elements of grand science fiction: life outside Earth and how it impacts the human decision-making process. And it also has elements of geopolitics, and of how important and valuable abstract science is to human progress. When that gets stopped, it's not noticeable in everyday life, but it can cause devastating effects. So I feel like AI helping in these areas, for example, is going to be extremely crucial. And that book is a nice example of what could happen otherwise.

Lenny RachitskyCompletely agree. Absolutely love it. Might be my favorite sci-fi book, or series even, and it's three books. You have to read all three, by the way. I find that it only got really good about one and a half books in. So if anyone's tried it and is like, "What the heck is going on here?" just keep reading and get to the middle of the second one, and then it gets mind-blowing.

Kiriti BadamYes.
Thank you.

Lenny RachitskyThere you go. I'm giving you one back. Okay, next question. What's a favorite recent movie or TV show that you've really enjoyed?

Aishwarya Naresh RegantiI started rewatching Silicon Valley and I think it's so true. It's so timeless. Everything is repeating all over again. Anybody who's watched it a few years ago should start rewatching it and you'll see that it's eerily similar to everything that's happening right now with the AI wave.

Lenny RachitskyThat's a good idea, to rewatch it. I love that their whole business was an algorithm to compress, a compression algorithm. It's maybe a precursor to LLMs in some small way. No, I get it. All right, Kiriti, what have you got?

Kiriti BadamI'm going to stretch this and say it's not a movie or a TV show, but there's this game I picked up recently called Expedition 33. It has nothing to do with AI, but it's an incredibly well-made game in terms of the gameplay, the story, and the music. It's been amazing.

Lenny RachitskyI love that you have time to play games. That's a great sign. I love that. Someone at OpenAI, I'm just imagining you're ... There's nothing else going on except just coding and having meetings.

Kiriti BadamYeah, it has been incredibly hard to find time for that.

Lenny RachitskyThat's good. That's a good sign. I'm happy to hear this. Okay. What's a favorite product that you've recently discovered that you really love?

Aishwarya Naresh RegantiFor me, it's Wispr Flow. I've been using it quite a bit, and I didn't know I needed it so much. The best part is it's a contextual transcription tool, which means if you go to Codex and start using Wispr Flow, it starts identifying variables and all of that. And it's so seamless in terms of transcription to instruction. You could say something like, "I'm so excited today. Add three exclamation marks," and it seamlessly switches. It adds those three exclamation marks instead of writing "add three exclamation marks." And I think it's pretty cool. If you're not using it, you should try it.
That's how I got access to it, Lenny.

Lenny RachitskyThere we go. I think I pitched this deal. I think people don't truly understand how incredible this is. They're like, "No way this is real." It's real. And 18 other products: lennysproductpass.com, check it out. Moving on. Kiriti.

Kiriti BadamAwesome. I actually am a stickler for productivity. I keep experimenting with new CLI tools and things that can make me faster. So Raycast has been amazing. I've discovered all these new shortcuts that you can use to open different things, type in shortcut commands, and things like that. And Caffeinate is another thing that I've recently discovered from my teammates. It helps you prevent your Mac from sleeping, so you can run a really long Codex task for four or five hours locally, let it build the thing, and then you can wake up and be like, "Okay, this is good. I like this."

Lenny RachitskyThat's hilarious, that combo: Codex and Caffeinate. You guys should build that yourselves, an OpenAI version of that, or the Codex agent should just keep your Mac from sleeping. That's so funny. By the way, Raycast is also part of Lenny's product pass. One year of Raycast. Amazing. Yeah.

Aishwarya Naresh RegantiLenny didn't tell us to say this, folks. Yes, these are actually our favorite products.

Lenny RachitskyThese are just two of 19 products. No Caffeinate though. I don't know if that's even a paid product. Okay, let's keep going. Do you have a favorite life motto that you find yourself coming back to in work or in life?

Aishwarya Naresh RegantiFor me, it's one my dad told me when I was a kid, and it's always stuck: they said it couldn't be done, but the fool didn't know it, so he did it anyway. I think: be foolish enough to believe that you can do anything if you put your heart to it, especially now, because you have so much data at hand that could be pointing toward the fact that you probably will be unsuccessful. How many podcasts made it to more than a thousand subscribers, or how many companies hit more than one million ARR? There's always data to show you that you won't be successful, but sometimes just be foolish and go ahead with it.

Lenny RachitskyThat's great. Yeah.

Kiriti BadamFor me, I am more of an overthinker. So I really like this quote from Steve Jobs, that you can only connect the dots looking backwards. A lot of the time there are numerous choices and you don't really know the optimal one to pick, but life works in ways that let you look back and say, "Oh, these actually connect beautifully in terms of how things transitioned." So I feel like that is extremely useful for keeping moving forward and keeping experimenting.

Lenny RachitskyFinal question. Whenever I have two guests on the podcast at once, I like to ask this question. What's something that you admire about the other person?

Aishwarya Naresh RegantiI think with Kiriti, he's pretty calm and very grounded, and he's always been my sounding board. I can throw a ton of ideas at him, and he's always able to anticipate the kind of issues I might land in. And he's extremely kind and lets his work speak instead of doing a lot of talking, I guess. But if I had to pick one thing, I think he's the most incredible husband.

Lenny RachitskyReveal. Little did people know.

Aishwarya Naresh RegantiWe've been married for four years, and it's been the most beautiful four years of my life.

Lenny RachitskyWow. Okay. How do you follow that?

Kiriti BadamYeah, it's super hard to follow that. I would say I am extremely privileged in terms of working with really smart people at great companies in Silicon Valley. And the unique thing that stands out with Aishwarya, compared to any other smart folks I've worked with, is that she has this really amazing knack for teaching and explaining something in a very understandable, easy-to-comprehend way. And that, combined with persistence, is super useful, especially in this fast-moving AI world we're in, in the sense that there are so many new things coming up. It feels overwhelming, but when I hear her talk about it, this is how you make sense of this entire thing, this is where it plugs in, I feel like, oh, that is so simple. I can also do that. So she empowers a lot of people by simplifying things and explaining things in the most understandable way.

So I feel that is an incredible quality.

Lenny RachitskyAmazing. How sweet. I've got to do this all the time. I need more guests to do it. That was great. Okay. Final questions. Where can folks find stuff that you're working on, find you online, share your course link, and then just how can listeners be useful to you?

Aishwarya Naresh RegantiI write a lot on LinkedIn. So if you want to listen to pragmatists who've been in the weeds working on AI products and what they're seeing, you can follow my work. We also have a GitHub repository with about 20K stars, and that repository is all about good resources for learning AI. It's completely free. And if you like what we spoke about today, we also run a super popular course on building enterprise AI products; we'll leave a link to it. The course is a lot about unlearning mindsets and following a problem-first approach instead of a tool-first or hype-first approach. So you can check that out as well. And if you don't want to do the course, we write a lot, we give out a lot of free resources, we have free sessions, so make sure you follow our work.

Kiriti BadamYeah, I would also add that you can find me on LinkedIn. I don't write a lot, I guess, but I'm super excited to talk about any complex product that you're building. And if you have thoughts on how you can use coding agents to make your life better, or whatever problems you're seeing, my DMs are always open and we can have a great discussion.

Lenny RachitskyAwesome. Well, Kiriti and Ash, thank you so much for being here.

Kiriti BadamThank you so much.

Aishwarya Naresh RegantiThank you, Lenny. This was so much fun.

Lenny RachitskySo much fun. Bye, everyone.

Thank you so much for listening. If you found this valuable, you can subscribe to the show on Apple Podcasts, Spotify, or your favorite podcast app. Also, please consider giving us a rating or leaving a review as that really helps other listeners find the podcast. You can find all past episodes or learn more about the show at lennyspodcast.com. See you in the next episode.