[Hacker News repost] Launch HN: Codebuff (YC F24) – CLI tool that writes code for you

hackernews

Title: Launch HN: Codebuff (YC F24) – CLI tool that writes code for you

Text: Hey HN! We’re James and Brandon, building Codebuff (https://codebuff.com). Codebuff is like Cursor Composer, but in your terminal: it modifies files based on your natural language requests. You can try it with `npm i -g codebuff` and start using it immediately for free. We have no login gate, and we give all accounts up to $20 worth of credits.

Codebuff is different because we simplified the input to one step: you type what you want done in your terminal and hit enter. Then Codebuff looks through your whole codebase and makes the edits it wants, to existing source files or new ones. It can also run your tests, the type checker, or install packages to fulfill your request.

Demo video: https://www.youtube.com/watch?v=dQ0NOMsu0dA

It all started at a hackathon. I was trying out Sonnet 3.5, which had recently come out, to see if I could use it to write code. The script I cobbled together that day pulled codebase context in one step and used it to rewrite files with changes in a second step. This two-step process still exists today. Incidentally, my hackathon script worked rather poorly and my demo failed to produce any useful code.

But that weekend I thought about the kind of errors it made, and realized that with more context on our codebase, it might have been able to get the change right. For example, it tried to create an endpoint on our server (at my previous startup), but it didn't know that you needed to edit 3 specific files to do this (yeah... our backend was not that clean). So I hand-wrote a guide to our codebase, like I was instructing a new hire. I put it in a markdown file and passed it into Sonnet 3.5's system prompt. And the crazy thing is that it started producing wayyy better code. So, I started getting excited. In fact, this code guide idea still exists in Codebuff today as knowledge.md files, which are automatically read on every request.

I didn't think of this project as a startup idea at first. I thought it was just a simple script anyone could write. But after another week, I could see there were more problems to solve and it should be a product.

In the week between applying to YC and the interview, I could not get Codebuff to edit files consistently. I tried many prompting strategies to get it to replace strings in the original file, but nothing worked reliably. How could I face my interviewer if I could not get something basic like this to work? On the day before my interview, in a Hail Mary attempt, I fine-tuned GPT-4o to turn Claude's sketch of changes into a git patch, which would add and remove lines to make the edits. I only finished generating the training data late at night, and the fine-tuning job ran as I slept.

And, holy hell, the next morning it worked! I pushed it to production just in time for my YC interview with Dalton. Soon after, Brandon joined and we were off to the races.

So, how does Codebuff work exactly? You invoke it in your terminal, and it starts by running through the source files in that directory and its subdirectories and parsing out all the function and class names (or equivalents in 11 languages). We use the tree-sitter library to do this. It builds out a codebase map that includes these symbols and the file tree.

Then, it fires off a request to Claude Haiku 3.5 to cache this codebase context so user inputs can be responded to with lower latency. (Prompt caching is OP!) We have a stateless server that passes messages along to Anthropic or OpenAI. We use websockets to ferry data back and forth to clients. We didn't have authentication or even a database for the first three months. Codebuff was free to install and used our API keys for all requests. Luckily, no one exploited us for too much free Claude usage haha. Major thanks to Brandon for saving this situation by building out our database (Postgres + Drizzle), server (Bun, hosted on Render), auth (using the free Auth.js), website (Next.js, also hosted on Render), billing (Stripe), logging (BetterStack), and dashboard (Retool). This is the best tech stack I’ve ever had.

When the user sends an input message, we prompt Claude to pick files that would be relevant (step 1). After picking files, we load them into context and the agent responds (step 2). It invokes tools using XML tags that we parse. It literally writes out <edit_file path="src/app.ts">…</edit_file> to edit a particular file, and has other tags to run terminal commands or to ask to read more files. This is all we really need, since Anthropic has already trained Claude with very similar tools to reach state of the art on the SWE benchmark.

Codebuff has limited free usage, but if you like it you can pay $99/mo to get more credits. We realize this is a lot more than competitors, but that’s because we do more expensive LLM calls with more context.

We’re already seeing Codebuff used in surprising ways. One user racked up a $500 bill by building out two Flutter apps in parallel. He never even looked at the code it generated. Instead, he had long conversations with Codebuff to make progress and fix errors, until the apps were built to his satisfaction. Many users built real apps over a weekend for their teams and personal use.

Of course, those aren't the typical use cases. Users also frequently use Codebuff to write unit tests. They build a feature in parallel with unit tests and have Codebuff loop, fixing up the code until the tests pass. They also ask it to do drudge work like setting up OAuth flows or API scaffolding.

What's really exciting with all of these examples is that we're seeing people's creativity becoming unbridled. They're spending more of their time thinking about architecture and design, instead of implementation details. It's so cool that we're just at the beginning, and the technology is only going to improve from here.

If you want to use Codebuff inside your own systems, we have an alpha SDK that exposes the same natural language interface for your apps to call and receive code edits! You can sign up here for early access: https://codebuff.retool.com/form/c8b15919-52d0-4572-aca5-533317403dde

Thank you for reading! We’re excited for you to try out Codebuff and let us know what you think!
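For readers curious what parsing XML-style tool calls out of model output might look like, here is a minimal sketch. It is not Codebuff's actual parser: the regular expression, the `EditRequest` type, and the `parse_edit_requests` helper are all hypothetical names made up for illustration, and it only assumes the `<edit_file path="...">` tag shape quoted in the post.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern (an assumption, not Codebuff's code): matches
# <edit_file path="...">body</edit_file>, with the body spanning lines.
EDIT_FILE_RE = re.compile(
    r'<edit_file\s+path="(?P<path>[^"]+)"\s*>(?P<body>.*?)</edit_file>',
    re.DOTALL,
)


@dataclass
class EditRequest:
    """One requested file edit pulled out of the model's reply."""
    path: str
    body: str


def parse_edit_requests(model_output: str) -> list[EditRequest]:
    """Return every edit_file block found in the text, in order."""
    return [
        EditRequest(path=m.group("path"), body=m.group("body").strip("\n"))
        for m in EDIT_FILE_RE.finditer(model_output)
    ]


# Illustrative model reply; the content is invented for this example.
sample = (
    "I'll update the entry point.\n"
    '<edit_file path="src/app.ts">\n'
    "export const greeting = 'hello';\n"
    "</edit_file>\n"
    "Done."
)
edits = parse_edit_requests(sample)
for edit in edits:
    print(edit.path, "->", edit.body)
```

A regex is enough for a sketch like this because the tags are flat and never nested. A production parser would probably also need to handle streamed, partially received tags and file contents that themselves contain a closing tag.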

Post by: jahooma

Comments:

draebek: Congratulations on your launch! But I confess that I am really confused. This sounds exactly like Aider, but closed source and it's locked into a single LLM API? I just watched you use it, and it looks a lot like Aider too? Why would I use this over Aider?

I've seen people say "you don't have to add files to Codebuff", but Aider tells me when the LLM has requested to see files. I just have to approve it. If that bothers you, it's open source, so you could probably just add a config to always add files when requested.

Aider can also run commands for you.

What am I missing?

haxton: The demos I see for these types of tools are always some toy project and don't reflect the day-to-day work I do at all. Do you have any example PRs on larger, more complex projects that have been written with Codebuff, and how much of that was human interactive?

The real problem I want someone to solve is helping me with the real niche/challenging portion of a PR, e.g. a new tiptap extension that can do notebook code eval, migrating a legacy auth service off auth0, recording API GET requests and replaying a % of them as unit tests, etc.

So many of these tools get stuck trying to help me "start" rather than help me "finish" or unblock the current problem I'm at.

darweenist: Congrats on the launch guys! Tried the product early on and it’s clearly improved a ton. I’m still using Cursor every day, mainly because of how complete the feature set is: autocomplete, command K, highlight a function and ask questions about it, and command L / command shift L. I am not sure what it’ll take for me to switch. Maybe I’m not an ideal user somehow… I’m working in a relatively simple codebase with few collaborators?

I’m curious what exactly people say causes them to make the switch from Cursor to Codebuff? Or do people just use both?

nisten: I'm not paying $20 for my ssh keys and the rest of the clipboard to be sent to multiple unknown 3rd parties, thanks, not for me.

Would however pay for actual software that I can just buy instead of rent to do the task of inline shell assistance, without making network calls behind my back that I'm not in complete perfectionist one hundred point zero zero per cent control of.

Sorry, just my opinion in general with these types of products. If you don't have the skills to make a fully self-contained language model type of product or something to do this, then you are not a skilled enough team for me to trust with my work shell.

ndyg: Noting Codebuff is manicode renamed.

It's become my go-to tool for handling fiddly refactors. Here’s an example session from a Rust project where I used it to break a single file into a module directory.

https://gist.github.com/cablehead/f235d61d3b646f2ec1794f656e494f5c#file-session-txt-L103

Notice how it can run tests, see the compile error, and then iterate until the task is done? Really impressive.

For reference, this task used ~100 credits.
