[Hacker News Repost] Launch HN: Soundry AI (YC W24) – Music sample generator for music creators
-
Title: Launch HN: Soundry AI (YC W24) – Music sample generator for music creators
Text: Hi everyone! We're Mark, Justin, and Diandre of Soundry AI (https://soundry.ai/). We provide generative AI tools for musicians, including text-to-sound and infinite sample packs.

We (Mark and Justin) started writing music together a few years ago but felt limited in our ability to create anything we were proud of. Modern music production is highly technical and requires knowledge of sound design, tracking, arrangement, mixing, mastering, and digital signal processing. Even with our technical backgrounds (in AI and cloud computing respectively), we struggled to learn what we needed to know.

The emergence of latent diffusion models was a turning point for us, just as it was for many others in tech. All of a sudden it was possible to leverage AI to create beautiful art. After meeting our cofounder Diandre (half of the DJ duo Bandlez and an expert music producer), we formed a team to apply generative AI to music production.

We began by focusing on generating music samples rather than full songs. Focusing on samples gave us several advantages, the biggest being the ability to build and train our custom models very quickly, since the generated audio is short (typically 2-10 seconds). Conveniently, our early text-to-sample model also fit well into many music producers' existing workflows, which often involve heavy use of samples.

We ran into several challenges when creating our text-to-sound model. The first was that we began by training our latent transformer (similar to OpenAI's Sora) using off-the-shelf audio autoencoders (like Meta's EnCodec) and text embedders (like Google's T5). The domain gap between the data used to train these off-the-shelf models and our sample data was much greater than we expected, which led us to misattribute blame for issues among the three model components (latent transformer, autoencoder, and embedder) during development. To see how musicians can use our text-to-sound generator to write music, check out the demo below:

https://www.youtube.com/watch?v=MT3k4VV5yrs&ab_channel=SoundryAI

The second issue was more on the product design side. When we spoke with our users in depth, we learned that novice music producers had no idea what to type into the prompt box, and expert music producers felt that our model's output wasn't always what they had in mind when they typed their prompt. It turns out that text is much better at specifying the contents of visual art than of music. This particular issue is what led us to our new product: the Infinite Sample Pack.

The Infinite Sample Pack does something rather unconventional: prompting with audio rather than text. Rather than requiring you to type out a prompt and specify many parameters, all you need to do is click a button to receive new samples. Each time you select a sound, our system embeds "prompt samples" as input to our model, which then creates infinite variations. By limiting the number of possible outputs, we're able to hide inference latency by pre-computing lots of samples ahead of time. This new approach has seen much wider adoption, so this month we'll be opening the system up so that everyone can create Infinite Sample Packs of their very own!
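For readers curious about the mechanics, here is a minimal Python sketch of the two ideas behind the Infinite Sample Pack as described above: conditioning on embedded "prompt samples" instead of text, and pre-computing a pool of generations so that asking for a new sample never waits on inference. The `embed_audio` and `generate` calls are hypothetical stand-ins, not Soundry's actual API.

```python
import random

class InfiniteSamplePack:
    """Illustrative only: audio-prompted generation with a pre-computed pool."""

    def __init__(self, model, prompt_samples, pool_size=64):
        self.model = model
        # Embed the curated "prompt samples" once; they replace a text prompt.
        self.prompt_embedding = model.embed_audio(prompt_samples)  # hypothetical call
        self.pool = []
        self.pool_size = pool_size

    def refill(self):
        # Run offline or in the background so inference latency stays hidden.
        while len(self.pool) < self.pool_size:
            self.pool.append(self.model.generate(self.prompt_embedding))  # hypothetical call

    def next_sample(self):
        # What the "new sample" button would trigger: a pop from the pool, not a model run.
        if not self.pool:
            self.refill()
        return self.pool.pop(random.randrange(len(self.pool)))
```

Limiting each pack to a fixed set of prompt embeddings is what makes the pre-computation tractable: the pool only ever has to cover a small number of conditioning inputs.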
To compare the workflow of the two products, you can check out our new demo using the Infinite Sample Pack:

https://www.youtube.com/watch?v=BqYhGipZCDY&ab_channel=SoundryAI

Overall, our founding principle is to start by asking: "What do musicians actually want?" Meta's open-sourcing of MusicGen has resulted in many interchangeable text-to-music products, but ours is embraced by musicians. By keeping an open dialog with our users, we've been able to satisfy many needs, including the ability to specify BPM and key, one-shot instrument samples (so musicians can write their own melodies), and drag-and-drop support for digital audio workstations via our desktop app and VST. To hear some of the awesome songs made with our product, take a listen to our community showcases below!

https://soundcloud.com/soundry-ai/sets/community-showcases

We hope you enjoy our tool, and we look forward to discussion in the comments.
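Returning to the text-to-sound model described earlier, here is a minimal, hypothetical PyTorch sketch of the three-component setup: a latent transformer conditioned on token embeddings from a frozen text encoder (such as T5), predicting frames in the latent space of a frozen audio autoencoder (such as EnCodec). Dimensions, layer counts, and module names are placeholders, not Soundry's actual architecture.

```python
import torch
import torch.nn as nn

class LatentTransformer(nn.Module):
    """Predicts audio-latent frames conditioned on text embeddings (sketch only)."""

    def __init__(self, latent_dim=128, text_dim=768, d_model=512, n_layers=8):
        super().__init__()
        self.latent_proj = nn.Linear(latent_dim, d_model)  # audio-autoencoder latents -> model width
        self.text_proj = nn.Linear(text_dim, d_model)      # text-encoder embeddings -> model width
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=n_layers)
        self.out = nn.Linear(d_model, latent_dim)          # next latent frame

    def forward(self, latents, text_emb):
        # latents:  (batch, frames, latent_dim) from the frozen audio autoencoder
        # text_emb: (batch, tokens, text_dim) from the frozen text embedder
        tgt = self.latent_proj(latents)
        memory = self.text_proj(text_emb)
        mask = nn.Transformer.generate_square_subsequent_mask(latents.size(1))
        hidden = self.decoder(tgt, memory, tgt_mask=mask)
        return self.out(hidden)  # decode predicted frames back to audio with the autoencoder
```

In a setup like this, a domain gap in the frozen autoencoder or text embedder only shows up downstream as poor generations, which is easy to confuse with a failure of the latent transformer itself, the debugging trap the post describes.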
Url: https://soundry.ai/
Post by: kantthpel
Comments:
chaosprint: Congratulations on your launch!

I started music DRL (https://github.com/chaosprint/RaveForce) a few years ago. At that time, SOTA was still the "traditional" method of GANSynth.

Later, I mainly turned to Glicol (https://glicol.org) and tried to combine it with RaveForce.

There are many kinds of music generation nowadays, such as Suno AI, but I think the biggest pain point is the lack of controllability. I mean, after generation, if you can't fine-tune the parameters, it's going to be really painful. And for pros, most of the generated results are still unusable. This is why I wanted to try DRL in the first place. Also worth checking:

https://forum.ircam.fr/projects/detail/rave-vst/

If this is your direction, I'm wondering if you have compared it with methods that generate MIDI? After all, generated MIDI and parameters can be adjusted quickly, it also comes in the form of a loop, and it can be lossless.

In addition, I saw that the demo on your official website was edited at 0:41, so how long does it take to generate a loop? Is this best quality or average quality?

Anyway, I hope you succeed.
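As a small illustration of the comment's point about MIDI staying adjustable and lossless after generation (this uses the mido library and is not from the thread), a loop kept in symbolic form can be transposed or re-timed after the fact without any audio degradation:

```python
import mido

notes = [60, 63, 67, 70]   # suppose the model generated a C minor 7 arpeggio as a one-bar loop
transpose = 2              # post-generation tweak: shift the whole loop up two semitones

mid = mido.MidiFile()
track = mido.MidiTrack()
mid.tracks.append(track)
for n in notes:
    track.append(mido.Message('note_on', note=n + transpose, velocity=90, time=0))
    track.append(mido.Message('note_off', note=n + transpose, velocity=0, time=mid.ticks_per_beat))
mid.save('generated_loop.mid')  # still symbolic: loopable, editable, and lossless
```

A raw audio sample offers no equivalent of this kind of exact, after-the-fact edit, which is the controllability gap the comment is pointing at.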
nkko: While the generative AI tools for musicians seem promising, I'm skeptical about how much they can enhance creativity and originality in music production. There's a risk of over-reliance on AI, leading to more formulaic and homogeneous music. The human element and the "soul" of music creation shouldn't be lost in pursuing technological convenience.
frankdenbow: Love this. Used Jukedeck in the past and did a comp sci for music class at CMU way back in the day. After reading, I understand your focus may be on people who would already classify themselves as musicians, but I think there's definitely a world where you make it easier for amateurs who make music for recreation or are musicians in training (the same market as Artiphon, whom I have worked with). One element of the UX as you describe it is that text may be difficult; would you imagine having input described the way some artists do it, with humming and audio descriptions? Something along the lines of this: https://www.youtube.com/watch?v=yhOsxMhe8eo

I put together my initial thoughts here: https://www.youtube.com/watch?v=nAZAWBw7c7o
antidnan: Very cool!

I'm particularly excited by the idea of gen AI creating entirely new sounds, sort of becoming its own kind of instrument instead of generating or emulating the samples it was trained on.

Somewhat analogous to how the MPC etc. enabled a generation of musicians to chop, pitch, and arrange soul samples into new types of hip hop music. Not super familiar with the history, but I don't believe they thought it would be used like that.

I'd imagine a gen AI musical instrument just needs a lot more "knobs" to tweak, and eventually someone will find a particular "hallucination" sound interesting. Exciting times!
pea: Awesome work! One thing I'm curious about in this space is why people generally generate the waveform directly. I always imagined you'd get better results teaching the model to output parameters that you could feed into synths (wavetable/FM/granular/VA), samplers, and effects, alongside MIDI.

You'd imagine you could approximate most music this way with less compute and higher determinism and introspection. Is it because there isn't enough training data for the above?
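To illustrate what this comment is proposing (a sketch under assumptions, not anything from the thread), a model could emit a normalized synth patch plus MIDI instead of a waveform and leave rendering to a synth or sampler; the parameter names below are made up for the example:

```python
import torch
import torch.nn as nn

SYNTH_PARAMS = ["wavetable_pos", "filter_cutoff", "filter_resonance",
                "attack", "decay", "sustain", "release"]  # illustrative parameter set

class PatchHead(nn.Module):
    """Maps a conditioning embedding to normalized synth parameters in [0, 1]."""

    def __init__(self, cond_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(cond_dim, 256), nn.ReLU(),
            nn.Linear(256, len(SYNTH_PARAMS)), nn.Sigmoid(),
        )

    def forward(self, cond):
        values = self.net(cond)  # (batch, n_params), each squashed into [0, 1]
        return [dict(zip(SYNTH_PARAMS, row.tolist())) for row in values]

# The trade-off the comment hints at: rendering is cheap and every knob stays editable,
# but training needs paired (audio, patch/MIDI) data, which is far scarcer than raw audio.
```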