[Hacker News Repost] Using LLMs to Enhance Our Testing Practices

hackernews

Title: Using LLMs to enhance our testing practices

使用LLM来增强我们的测试实践

Text:

Url: https://www.assembled.com/blog/how-we-saved-hundreds-of-engineering-hours-by-writing-tests-with-llms

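As an AI, I can't directly access content on the web, including the article at the link above. However, I can offer a hypothetical analysis, and if you provide the article's content or a summary, I can analyze and summarize it for you.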

The following analysis is based on hypothetical content:

---

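**Article title:** How we saved hundreds of engineering hours by writing tests with LLMs

**Summary:**

This article discusses how the authors used large language models (LLMs) to write tests, significantly improving engineering efficiency. The authors share their team's experience implementing this approach and how it saved them a large amount of engineering time.

**Analysis:**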

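1. **Background:**
   The article likely opens by introducing the basic concepts of LLMs and their potential in software testing. LLMs are one of the major recent breakthroughs in artificial intelligence, able to understand and generate human language.

2. **Testing challenges:**
   The author may discuss the limitations of traditional testing, such as the time and complexity of writing test cases by hand and insufficient test coverage.

3. **Advantages of LLM-written tests:**
   The article likely explains in detail how LLMs help write tests: they can quickly generate many test cases covering more scenarios, making testing more thorough and accurate.

4. **Implementation:**
   The author may describe how the team integrated LLMs into its existing testing workflow, including choosing a suitable model and tuning parameters for its specific testing needs and challenges.

5. **Results and benefits:**
   The core of the article likely presents the results of adopting LLM-written tests, possibly with concrete figures on time saved, improved testing efficiency, and lower error rates.

6. **Case studies:**
   The article may include one or more concrete case studies showing how LLMs performed in real projects.

7. **Conclusion and outlook:**
   Finally, the author may summarize the potential of LLMs in testing and look ahead to future trends.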
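
To make point 4 (implementation) more concrete, here is a minimal sketch of how a team might assemble a test-generation prompt before sending it to a model. It is not the article's actual prompt: the names `PromptInputs` and `build_test_prompt`, the default focus list, and the idea of pasting existing tests in as style examples are all assumptions, and the model call itself is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PromptInputs:
    """Inputs for a hypothetical test-generation prompt (all fields are assumptions)."""
    source_code: str   # the code under test, pasted verbatim
    framework: str     # e.g. "pytest" or "RSpec"
    example_tests: List[str] = field(default_factory=list)  # existing tests to imitate
    focus: List[str] = field(default_factory=lambda: [
        "happy paths",
        "unhappy paths and error handling",
        "edge cases such as empty input, boundaries and missing values",
    ])


def build_test_prompt(inputs: PromptInputs) -> str:
    """Assemble a plain-text prompt; sending it to a model is out of scope here."""
    parts = [
        f"Write {inputs.framework} tests for the following code.",
        "Cover: " + "; ".join(inputs.focus) + ".",
        "Code under test:",
        inputs.source_code,
    ]
    if inputs.example_tests:
        # Existing tests show the model the project's conventions and helpers.
        parts.append("Follow the style and conventions of these existing tests:")
        parts.extend(inputs.example_tests)
    parts.append("Return only the test code.")
    return "\n\n".join(parts)
```

Keeping prompt assembly in a plain function means the prompt can be reviewed and versioned like any other code, whichever model or API ends up receiving it.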

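**Conclusion:**

By using LLMs to write tests, the author's team saved a large amount of engineering time and improved both testing efficiency and quality. The approach not only reduced testing costs but also improved software reliability.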

---

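Please note that the above is a hypothetical analysis; the actual article may contain different views and details. If you can provide the article's actual content, I can offer a more precise analysis and summary.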

Post by: johnjwang

Comments:

renegade-otter: In every single system I have worked on, tests were not just tests - they were their own parallel application, and it required careful architecture and constant refactoring in order for it to not get out of hand.

"More tests" is not the goal - you need to write high impact tests, you need to think about how to test the most of your app surface with least amount of test code. Sometimes I spend more time on the test code than the actual code (probably normal).

Also, I feel like people would be inclined to go with whatever the LLM gives them, as opposed to really sitting down and thinking about all the unhappy paths and edge cases of UX. Using an autocomplete to "bang it out" seems foolish.
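
One common way to cover a lot of surface with little test code is a table-driven test: a single list of inputs and expected outcomes exercises the happy path, the unhappy paths and the edge cases in one loop. The sketch below is only an illustration of that idea; `parse_page_size`, its default and its maximum are hypothetical and do not come from the comment or the article.

```python
from typing import List, Optional, Tuple, Union


def parse_page_size(raw: Optional[str], default: int = 20, maximum: int = 100) -> int:
    """Hypothetical function under test: parse a ?page_size= query value."""
    if raw is None or raw.strip() == "":
        return default
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"page_size must be an integer, got {raw!r}") from None
    if value < 1:
        raise ValueError("page_size must be positive")
    return min(value, maximum)


# One table: each row is (input, expected value or expected exception type).
CASES: List[Tuple[Optional[str], Union[int, type]]] = [
    ("25", 25),           # happy path
    (None, 20),           # parameter omitted
    ("   ", 20),          # blank value
    ("1000", 100),        # clamped to the maximum
    ("0", ValueError),    # boundary just below the valid range
    ("-5", ValueError),   # negative
    ("abc", ValueError),  # not a number
]


def check_parse_page_size() -> None:
    """Run every row; one loop replaces seven near-identical test functions."""
    for raw, expected in CASES:
        if isinstance(expected, type) and issubclass(expected, Exception):
            try:
                parse_page_size(raw)
            except expected:
                continue  # the expected error was raised
            raise AssertionError(f"{raw!r}: expected {expected.__name__}")
        got = parse_page_size(raw)
        assert got == expected, f"{raw!r}: expected {expected}, got {got}"
```

Adding an unhappy path then costs one row rather than one more test function, which keeps the "parallel application" of tests from growing as fast as the cases it covers.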

mkleczek: I am very sceptical of LLM (or any AI) code generation usefulness and it does not really have anything to do with AI itself.

In the past I've been involved in several projects deeply using MDA (Model Driven Architecture) techniques which used various code generation methods to develop software. One of the main obstacles was the problem of maintaining the generated code.

IOW: how should we treat generated code?

If we treat it in the same way as code produced by humans (ie. we maintain it) then the maintenance cost grows (super-linearly) with the amount of code we generate. To make matters worse for LLM: since the code it generates is buggy it means we have more buggy code to maintain. Code review is not the answer because code review power in finding bugs is very weak.

This is unlike compilers (that also generate code) because we don't maintain code generated by compilers - we regenerate it anytime we need.

The fundamental issue is: for a given set of requirements the goal is to produce less code, not more. Any code generation (however smart it might be) goes against this goal.

EDIT: typos

nazgul17: Should we not, instead, write tests ourselves and have LLMs write the code to make them pass?
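
The workflow proposed here can be sketched as a loop in which human-written tests are the acceptance gate and the model only proposes implementations. This is a minimal sketch under stated assumptions: `propose` is a hypothetical stand-in for an LLM call that receives the previous failures as feedback, `generate_until_green` and `check_add` are illustrative names, and running candidates with `exec` is only acceptable for trusted or sandboxed code.

```python
from typing import Callable, List, Optional

# A human-written check receives the candidate's namespace and asserts on it.
CheckFn = Callable[[dict], None]


def run_checks(namespace: dict, checks: List[CheckFn]) -> List[str]:
    """Run every check against one candidate; return failure messages."""
    failures = []
    for check in checks:
        try:
            check(namespace)
        except Exception as exc:  # failed assertion, missing name, wrong types, ...
            failures.append(f"{check.__name__}: {exc!r}")
    return failures


def generate_until_green(
    propose: Callable[[List[str]], str],
    checks: List[CheckFn],
    max_attempts: int = 3,
) -> Optional[str]:
    """Ask `propose` (hypothetical LLM stand-in) for code until every check passes."""
    feedback: List[str] = []
    for _ in range(max_attempts):
        source = propose(feedback)
        namespace: dict = {}
        try:
            exec(source, namespace)  # assumption: candidate code is trusted or sandboxed
        except Exception as exc:
            feedback = [f"candidate failed to load: {exc!r}"]
            continue
        feedback = run_checks(namespace, checks)
        if not feedback:
            return source
    return None  # no candidate passed within the attempt budget


# Example of a human-written check: the part the developer keeps writing by hand.
def check_add(ns: dict) -> None:
    assert ns["add"](2, 3) == 5, "add(2, 3) should be 5"
    assert ns["add"](-1, 1) == 0, "add(-1, 1) should be 0"
```

For example, with `check_add` as the only check, a `propose` function that first returns an `add` computing `a - b` and then one computing `a + b` gets the failure message back after the first attempt, and `generate_until_green` returns the second candidate.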

mastersummoner: I actually tested Claude Sonnet to see how it would fare at writing a test suite for a background worker. My previous experience was with some version of GPT via Copilot, and it was... not good.

I was, however, extremely impressed with Claude this time around. Not only did it do a great job off the bat, but it taught me some techniques and tricks available in the language/framework (Ruby, Rspec) which I wasn't familiar with.

I'm certain that it helped having a decent prompt, asking it to consider all the potential user paths and edge cases, and also having a very good understanding of the code myself. Still, this was the first time for me I could honestly say that an LLM actually saved me time as a developer.

iambateman: I did this for Laravel a few months ago and it’s great. It’s basically the same as the article describes, and it has definitely increased the number of tests I write.

Happy to open source if anyone is interested.