[Hacker News Repost] Launch HN: Aqua Voice (YC W24) – Voice-driven text editor
-
Title: Launch HN: Aqua Voice (YC W24) – Voice-driven text editor
Text: Hey HN! We’re Jack and Finn from Aqua Voice (https://withaqua.com/). Aqua is a voice-native document editor that combines reliable dictation with natural language commands, letting you say things like: “make this a list” or “it’s Erin with an E” or “add an inline citation here for page 86 of this book”. Here is a demo: https://youtu.be/qwSAKg1YafM.

Finn, who is big-time dyslexic, has been using dictation software since the sixth grade, when his dad set him up on Dragon Dictation. He used it through school to write papers and has been keeping his own transcription benchmarks since college. In all that time, writing with your voice has remained a cumbersome, brittle experience riddled with pain points.

Dictation software is still terrible. The existing solutions basically compete on accuracy (i.e. speech recognition), but none of them deal with the fundamentally brittle nature of the text they generate. They don't try to format text correctly, and they require you to learn a bunch of specialized commands that often aren't worth the effort. They're not even close to being a voice replacement for the keyboard.

Even post-LLM, you are limited to a set of specific commands, and the most accurate models don't have any commands at all. Outside of these rules, the models have no sense of what is an instruction and what is content. You can’t say “format this like an email” or “make the last bullet point shorter”. Aqua solves this.

This problem matters to Finn and to millions of other people who would write with their voice if they could. Initially, we didn't think of it as a startup project; it was just something we wanted for ourselves. We thought maybe we'd write a novel with it - or something. After friends started asking to use the early versions of Aqua, it occurred to us that if we didn't build it, maybe nobody would.

Aqua Voice is a text editor that you talk to like a person. Depending on how you say something and the context in which you're operating, Aqua decides whether to transcribe what you said verbatim, execute a command, or subtly modify what you said into what you meant to write.

For example, if you dictate "Gryphons have classic forms resembling shield volcanoes," Aqua outputs your text verbatim. But if you stumble over your words or start a sentence over a few times, Aqua is smart enough to figure that out and keep only the last version of the sentence.

The vision is not only to provide a more natural dictation experience, but to enable, for the first time, an AI-writing experience that feels natural and collaborative. This requires moving away from using LLMs for one-off chat requests and toward something more like streaming, where you are in constant contact with the model. Voice is the natural medium for this.

Aqua is actually six models working together to transcribe, interpret, and rewrite the document according to your intent. Technically, running a real-time voice application with a language model at its core requires complex coordination between multiple pieces. We use MoE transcription to outperform what was previously thought possible in real-time accuracy, then sync up with a language model to determine what should be on the screen as quickly as possible.
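Aqua's actual models and prompts aren't public, so the following is only a minimal sketch of the loop described above, assuming a generic streaming recognizer and a single hypothetical `interpret` call standing in for the language-model step:

```python
# A minimal sketch of the transcribe -> interpret -> rewrite loop described
# above. All names here (Interpretation, interpret, the routing rules) are
# hypothetical illustrations, not Aqua's actual models or API.
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable


class Action(Enum):
    VERBATIM = "verbatim"   # insert the transcript as-is
    COMMAND = "command"     # treat the utterance as an editing instruction
    REWRITE = "rewrite"     # replace what was said with what was meant


@dataclass
class Interpretation:
    action: Action
    text: str  # text to insert, or the full revised document


def voice_editing_loop(
    utterances: Iterable[str],
    interpret: Callable[[str, str], Interpretation],
) -> str:
    """Fold a stream of transcribed utterances into a document.

    `utterances` stands in for the output of a streaming speech recognizer;
    `interpret` stands in for the language-model step that decides whether an
    utterance is content, a command, or a restart/self-correction.
    """
    document = ""
    for utterance in utterances:
        result = interpret(document, utterance)
        if result.action is Action.VERBATIM:
            document += result.text + " "
        else:
            # COMMAND ("make this a list") and REWRITE (a stumbled sentence
            # restarted) both come back as a full revised document here.
            document = result.text
    return document
```

In the real system this would presumably run incrementally against a live transcription stream rather than on whole utterances, but the routing decision (verbatim text vs. command vs. rewrite) is the core idea.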
The model isn't perfect, but it is ready for early adopters, and we've already been getting feedback from grateful users. For example, a historian with carpal tunnel sent us an email he wrote using Aqua and said that he is now able to be five times as productive as he was previously. We've heard from other people with disabilities that prevent them from typing, and we've also seen good adoption among people who are dyslexic or who simply prefer talking to typing. It's being used for everything from emails to brainstorming to papers to legal briefings.

While there is much left to do on latency and robustness, the best experiences with Aqua are beginning to feel magical. We would love for you to try it out and give us feedback, which you can do with no account at https://withaqua.com. If you find it useful, it's $10/month after a 1000-token free trial. (We want to bump up the free trial in the future, but we're a small team, and running this thing isn't cheap.)

We'd love to hear your ideas and comments with voice-to-text!
Url:
Post by: the_king
Comments:
rafram: This is cool! Some feedback:

- As others have said, "1000 tokens" doesn't mean anything to non-technical users and barely means anything to me. Just tell me how many words I can dictate!

- That serif-font LaTeX error rate table is also way too boring. People want something flashy: "Up to 7x fewer errors than macOS dictation" is cool, a comparison table is not.

- Similarly, ".05 Word Error Rate" has to go. Spell out what that means and use percentages (see the worked example below).

- "Forgot a name, word, fact, or number? Just ask Aqua to fill it in for you." It would be nice to be able to turn this off, or at least have a clear indication when content that I did not say is inserted into my document. If I'm dictating, I don't usually want anything but the words I say on the page.
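For readers unfamiliar with the metric rafram mentions: word error rate (WER) is the fraction of reference words a transcriber gets wrong, so a 0.05 WER means roughly 5 errors per 100 words. A small worked example, with made-up strings:

```python
# Word error rate (WER): edit distance between the reference and hypothesis
# word sequences, divided by the number of reference words. The example
# strings below are made up.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(f"{wer('make this a bulleted list', 'make this a bullet list'):.0%}")  # 20%
```

Reporting it as "5% of words wrong" (or "95% word accuracy") is the percentage framing the comment is asking for.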
mavsman: Since voice-to-text has gotten so good, I've used it a lot more and also noticed how distracting and confusing it can be. Using Apple's dictation has a similar feel to this, where you're constantly seeing something that's changing on the screen. It's kind of irritating and I don't really know what the solution is.

One suggestion I have here is to have at least two different sections of the UI. One part would be the actual document and the other would be the scratchpad. It seems like much of what you say would not actually make it into the document (edits, corrections, etc.), so those would only be shown in the scratchpad. Once the editor has processed the text from the scratchpad, it can go into the document the way it's supposed to (see the sketch below). Having text immediately show up in the document as it's dictated is weird.

Your big challenge right now is just that STT is still relatively slow for this use case. Time will be on your side in that regard, as I'm sure you know.

Good luck! Voice is the future of a lot of the interactions we have with computers.
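A rough sketch of the two-pane flow mavsman suggests, with a hypothetical `process` callable standing in for the editor's interpretation pass:

```python
# Hypothetical two-buffer model for the scratchpad suggestion: raw dictation
# lands in a scratchpad, and only processed text is committed to the document.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Editor:
    process: Callable[[str], Optional[str]]  # interpretation pass (stand-in)
    document: str = ""
    scratchpad: str = ""

    def hear(self, utterance: str) -> None:
        """Append raw dictation to the scratchpad only."""
        self.scratchpad += utterance + " "

    def commit(self) -> None:
        """Move the processed scratchpad into the document, then clear it."""
        processed = self.process(self.scratchpad)
        if processed:  # edits/corrections may produce nothing to append
            self.document += processed
        self.scratchpad = ""
```

The document pane would then only re-render on commit, avoiding the constantly-shifting text the comment describes.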
tkgally: After watching the video demo and logging in, I was able to compose and edit text easily. Nice job.

My own use case is a bit different from many others who have commented here. I'm a reasonably fast typist and don't currently have any physical or neurological issues that might make typing difficult. I have tried voice input methods a number of times over the years, as I thought speaking would be faster than typing, but I always went back to typing due to accuracy problems and difficulty editing.

Aqua Voice does seem to be a significant advance. I'm going to try it out from time to time to see if I can get comfortable with voice input. If I can, I will subscribe.

I drafted this comment using Aqua Voice, but I ended up editing it quite a bit with a keyboard before posting.
rickydroll: I developed an RSI-related injury back in 94/95 and have been using speech recognition ever since. I would love a solution that would let me move off of Windows. I would love a solution allowing me to easily dictate into text areas in Firefox, Thunderbird, or VS Code. Most important, however, would be the ability to edit/manipulate the text using what Nuance used to call Select-and-Say. The ability to do minor edits, replace sentences with new dictation, etc., is so powerful and makes speech much easier to use than straight captured dictation like most Whisper apps. If you can do that, I will be a lifelong customer.

The next most important thing would be the ability to write action routines for grammars (sketched below). My preference is for Python because it's the easiest target when using ChatGPT to write code. However, I could probably learn to live with other languages (except JavaScript, which I hate). I refer you to Joel Gould's "natPython" package, which he wrote for NaturallySpeaking. Here's the original presentation that people built on: https://slideplayer.com/slide/5924729/

Here's a lesson from the past. In the early days of DragonDictate/NaturallySpeaking, when the Bakers ran Dragon Systems, they regularly had employees drop into the local speech recognition user group meetings and talk to us about what worked for us and what failed. They knew that watching us Crips would give them more information about how to build a good speech recognition environment than almost any other user community. We found the corner cases before anybody else. They did some nice things, such as supporting a couple of speech recognition user group conferences with space and employee time.

It seems like Nuance has forgotten those lessons.

Anyway, I was planning on getting work done today, but your announcement shoots that in the head. :-)

[edit] Freaking impressive. It is clear that I should spend more time on this. I can see how my experience of NaturallySpeaking limited my view, and you have a much wider view of what the user interface could be.
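To make the "action routines" idea concrete, here is a hypothetical sketch of user-scriptable voice commands in Python. The registration decorator, command phrases, and dispatch function are all invented for illustration; they are not natPython/NatLink APIs and not something Aqua currently exposes:

```python
# Hypothetical plugin API for voice "action routines": none of these names are
# real Aqua, NatLink, or natPython APIs; they only illustrate the idea of
# binding a spoken grammar to Python code.
import re
from typing import Callable

_ROUTINES: list[tuple[re.Pattern[str], Callable[[str], str]]] = []


def grammar(pattern: str) -> Callable:
    """Register a routine for utterances matching `pattern` (a regex here)."""
    def register(func: Callable[[str], str]) -> Callable[[str], str]:
        _ROUTINES.append((re.compile(pattern, re.IGNORECASE), func))
        return func
    return register


@grammar(r"insert signature")
def insert_signature(document: str) -> str:
    # A user-defined routine: append a canned block of text.
    return document + "\n\nBest,\nRick\n"


@grammar(r"delete the last sentence")
def delete_last_sentence(document: str) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    return " ".join(sentences[:-1])


def dispatch(document: str, utterance: str) -> str:
    """Run the first matching routine; otherwise treat the utterance as dictation."""
    for pattern, routine in _ROUTINES:
        if pattern.fullmatch(utterance.strip()):
            return routine(document)
    return (document + " " + utterance).strip()
```

Here, dispatch() sits in front of ordinary dictation, so anything that doesn't match a registered grammar falls through as plain text.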
benpacker: This is really great. I was hoping someone would build this: https://bprp.xyz/__site/Looking+for+Collaborators/Better+Local+Voice+Document+Editing

I would really happily pay $10/month for this, but what I really want is either:

- A Raycast plugin or desktop app that lets this interact with any editable text area in my environment

- An API that I can pass existing text/context + an audio stream to and get back a heartbeat of full document updates (see the sketch below). Then the community can build Obsidian/VSCode/browser plugins for the huge surface area of text entry.

Going to give you $10 later this afternoon regardless, and congrats!
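Aqua has not announced an API like this; the endpoint, message shapes, and field names below are invented purely to make the "text/context + audio stream in, heartbeat of full document updates out" idea concrete:

```python
# Purely hypothetical client for the API benpacker describes: stream audio up,
# receive periodic "heartbeat" snapshots of the full document back. The URL
# and message fields are invented; Aqua has not published such an API.
import asyncio
import json

import websockets  # third-party: pip install websockets


async def dictate(audio_chunks, existing_text: str, on_document) -> None:
    async with websockets.connect("wss://api.withaqua.example/v1/stream") as ws:
        # Seed the session with the text/context already in the editor.
        await ws.send(json.dumps({"type": "context", "text": existing_text}))

        async def pump_audio():
            async for chunk in audio_chunks:          # e.g. 100 ms PCM frames
                await ws.send(chunk)                  # binary frames = audio

        sender = asyncio.create_task(pump_audio())
        try:
            async for message in ws:                  # text frames = heartbeats
                update = json.loads(message)
                if update["type"] == "document":
                    on_document(update["text"])       # full document snapshot
        finally:
            sender.cancel()
```

Returning a full snapshot on every heartbeat (rather than a diff) would keep plugin authors' job trivial: replace the buffer contents whenever an update arrives.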