[Hacker News Repost] Launch HN: Regatta Storage (YC F24) – Turn S3 into a local-like, POSIX cloud FS
-
Title: Launch HN: Regatta Storage (YC F24) – Turn S3 into a local-like, POSIX cloud FS
Text: Hey HN, I’m Hunter, the founder of Regatta Storage (https://regattastorage.com). Regatta Storage is a new cloud file system that provides unlimited pay-as-you-go capacity, local-like performance, and automatic synchronization to S3-compatible storage. For example, you can use Regatta to instantly access massive data sets in S3 with Spark, PyTorch, or pandas without paying for large, local disks or waiting for the data to download.

Check out an overview of how the service works here: https://www.youtube.com/watch?v=xh1q5p7E4JY, and you can try it for free at https://regattastorage.com after signing up for an account. We wanted to let you try it without an account, but we figured that “Hacker News shares a file system and S3 bucket” wouldn’t be the best experience for the community.

I built Regatta after spending nearly a decade building and operating at-scale cloud storage at places like Amazon’s Elastic File System (EFS) and Netflix. During my 8 years at EFS, I learned a lot about how teams thought about their storage usage. Users frequently told me that they loved how simple and scalable EFS was, and -- like S3 -- they didn’t have to guess how much capacity they needed up front.

When I got to Netflix, I was surprised that there wasn’t more usage of EFS. If you looked around, it seemed like a natural fit. Every application needed a POSIX file system. Lots of applications had unclear or spiky storage needs. Often, developers wanted their storage to last beyond the lifetime of an individual instance or container. In fact, if you looked across all Netflix applications, some ridiculous amount of money was being spent on *empty storage space*, because each of these local drives had to be overprovisioned for potential usage.

However, in many cases, EFS wasn’t the perfect choice for these workloads. Moving workloads from local disks to NFS often encountered performance issues. Further, applications which treated their local disks as ephemeral would have to manually “clean up” leftover data in a persistent storage system.

At this point, I realized that there was a missing solution in the cloud storage market which wasn’t being filled by either block or file storage, and I decided to build Regatta.

Regatta is a pay-as-you-go cloud file system that automatically expands with your application. Because it automatically synchronizes with S3 using native file formats, you can connect it to existing data sets and use recently written file data directly from S3. When data isn’t actively being used, it’s removed from the Regatta cache, so you only pay for the backing S3 storage. Finally, we’re developing a custom file protocol which allows us to achieve local-like performance for small-file workloads *and* Lustre-like scale-out performance for distributed data jobs.

Under the hood, customers mount a Regatta file system by connecting to our fleet of caching instances over NFSv3 (soon, our custom protocol). Our instances then connect to the customer’s S3 bucket on the backend, and provide sub-millisecond cached-read and write performance. This durable cache allows us to provide a strongly consistent, efficient view of the file system to all connected file clients. We can perform challenging operations (like directory renaming) quickly and durably, while they asynchronously propagate to the S3 bucket.

We’re excited to see users share our vision for Regatta. We have teams who are using us to build totally serverless Jupyter notebook servers for their AI researchers who prefer to upload and share data using the S3 web UI. We have teams who are using us as a distributed caching layer on top of S3 for low-latency access to common files. We have teams who are replacing their thin-provisioned Ceph boot volumes with Regatta for significant savings. We can’t wait to see what other things people will build, and we hope you’ll give us a try at regattastorage.com.

We’d love to get any early feedback from the community, ideas for future direction, or experiences in this space. I’ll be in the comments for the next few hours to respond!
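[Editor's note: to make the Spark/PyTorch/pandas claim concrete, here is a minimal sketch of what access looks like once a file system is mounted. The mount command, endpoint, and data paths are illustrative assumptions, not Regatta's actual interface.]

    # Hypothetical sketch: access an S3-backed data set through a mounted
    # Regatta file system as if it were a local POSIX directory.
    # Assumes the file system was mounted with something like:
    #   mount -t nfs -o nfsvers=3 <regatta-endpoint>:/ /mnt/regatta
    # (endpoint and paths are illustrative, not Regatta's actual CLI).

    import pandas as pd

    # Reads go through the caching layer; cold data is pulled from the
    # backing S3 bucket on demand, so nothing is downloaded up front.
    df = pd.read_parquet("/mnt/regatta/datasets/events.parquet")
    print(df.describe())

    # Writes land on the durable cache first (strongly consistent for
    # all clients), then propagate asynchronously to S3 in native format.
    df[df["status"] == "ok"].to_csv("/mnt/regatta/outputs/clean.csv", index=False)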
Url:
Post by: huntaub
Comments:
garganzol: I used the same approach based on Rclone for a long time. I wondered what makes Regatta Storage different from Rclone. Here is the answer: "When performing mutating operations on the file system (including writes, renames, and directory changes), Regatta first stages this data on its high-speed caching layer to provide strong consistency to other file clients." [0]

Rclone, on the contrary, has no layer that would guarantee consistency among parallel clients.

[0] https://docs.regattastorage.com/details/architecture#overview
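[Editor's note: a much-simplified, single-node sketch of the staging pattern the quoted docs describe -- writes are acknowledged once staged in a consistent cache, then flushed to object storage in the background. The `WriteBackCache` class and `s3_put` callback are illustrative; a real service would additionally need durability and multi-node replication.]

    # Toy write-back cache (not Regatta's code): mutations are staged in
    # a consistent cache layer first, then uploaded to S3 asynchronously.
    import queue
    import threading

    class WriteBackCache:
        def __init__(self, s3_put):
            self._data = {}                  # staged state; source of truth for readers
            self._lock = threading.Lock()    # single serialization point => consistent view
            self._dirty = queue.Queue()      # keys awaiting async upload
            self._s3_put = s3_put            # callback that uploads one object
            threading.Thread(target=self._flush_loop, daemon=True).start()

        def write(self, path, contents):
            with self._lock:
                self._data[path] = contents  # acknowledged once staged, not once uploaded
            self._dirty.put(path)

        def read(self, path):
            with self._lock:
                return self._data[path]      # readers always see the latest staged write

        def _flush_loop(self):
            while True:
                path = self._dirty.get()
                with self._lock:
                    contents = self._data[path]
                self._s3_put(path, contents) # propagation to S3 happens in the background

    # Usage: cache = WriteBackCache(s3_put=lambda k, v: print("upload", k))
    # cache.write("/a.txt", b"hi"); cache.read("/a.txt") returns b"hi" immediately,
    # while the S3 upload proceeds asynchronously.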
memset: This is honestly the coolest thing I've seen coming out of YC in years. I have a bunch of questions which are basically related to "how does it work" -- and please pardon me if my questions are silly or naive!

1. If I had a local disk which was 10 GB, what happens when I try to contend with data in the 50 GB range (as in, more than could be cached locally)? Would I immediately see degradation, or thrashing, at the 10 GB mark?

2. Does this only work in practice on AWS instances? As in, I could run it on a different cloud, but in practice we only really get fast speeds due to running everything within AWS?

3. I've always had trouble with FUSE in different kinds of Docker environments. And it looks like you're using both FUSE and NFS mounts. How does all of that work?

4. Is the idea that I could literally run ClickHouse or Postgres with a Regatta volume as the backing store?

5. I have to ask -- how do you think about open source here?

6. Can I mount on multiple servers? What are the limits there? (i.e., a Lambda function.)

I haven't played with it yet, so maybe doing so would help answer these questions. But I'm really excited about this! I have tried using EFS for small projects in the past but -- and maybe I was holding it wrong -- I could not for the life of me figure out what I needed to get faster bandwidth, probably because I didn't know how to turn the knobs correctly.
zX41ZdbW: That is interesting, but I haven't read how it is implemented yet.

The hard part is a cache layer with immediate consistency. It likely requires Raft (or, otherwise, it works incorrectly). Integration of this cache layer with S3 (offloading cold data to S3) is easy (not interesting).

It should not be compared to s3fs, mountpoint, geesefs, etc., because they lack consistency, are slow, don't support full filesystem semantics, and break often.

It could be compared with AWS EFS, which is also slow (but I didn't try to tune it up to maximum numbers).

For ClickHouse, this system is unneeded, because ClickHouse is already distributed (it supports full replication or shared storage + cache), and it does not require full filesystem semantics (it pairs with blob storages nicely).
mritchie712: Pretty sure we're in your target market. We [0] currently use GCP Filestore to host DuckDB. Here's the pricing and performance at 10 TiB. Can you give me an idea of the pricing and performance for Regatta?

Service Tier: Zonal
Location: us-central1
10 TiB instance at $0.35/TiB/hr
Monthly cost: $2,560.00

Performance Estimate:
Read IOPS: 92,000
Write IOPS: 26,000
Read Throughput: 2,600 MiB/s
Write Throughput: 880 MiB/s

[0] https://www.definite.app/blog/duckdb-datawarehouse
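[Editor's note: as a quick sanity check, the quoted monthly figure follows from the hourly rate, assuming GCP's usual ~730 billable hours per month. These are the Filestore numbers from the comment, not Regatta pricing.]

    # Verify the quoted GCP Filestore monthly cost from the hourly rate.
    tib = 10
    rate_per_tib_hr = 0.35           # $/TiB/hr, zonal tier (quoted above)
    hours_per_month = 730            # GCP's standard billing convention
    monthly = tib * rate_per_tib_hr * hours_per_month
    print(f"${monthly:,.2f}/month")  # -> $2,555.00, matching the quoted ~$2,560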
jitl: I’m very interested in this as a backing disk for SQLite/DuckDB/Parquet, but I really want my cached reads to come straight from instance-local NVMe storage, and to have a way to “pin” and “unpin” some subdirectories from local cache.

Why local storage? We’re going to have multiple processes reading & writing to the files and need locking & shared memory semantics you can’t get w/ NFS. I could implement pin/unpin myself in user space by copying stuff between /mnt/magic-nfs and /mnt/instance-nvme, but at that point I’d just use S3 myself.

Any thoughts about providing a custom file system, or how to assemble this out of parts on top of the NFS mount?
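[Editor's note: the user-space pin/unpin idea is straightforward to sketch. This uses the hypothetical mount points from the comment (/mnt/magic-nfs for the NFS mount, /mnt/instance-nvme for local NVMe); a real version would also need locking and cache invalidation.]

    # Minimal user-space "pin"/"unpin" sketch: copy a subtree from the NFS
    # mount onto local NVMe so reads get full local-file semantics.
    import shutil
    from pathlib import Path

    NFS_ROOT = Path("/mnt/magic-nfs")      # hypothetical NFS mount point
    NVME_ROOT = Path("/mnt/instance-nvme") # hypothetical local NVMe mount

    def pin(subdir: str) -> Path:
        """Copy a subdirectory to local NVMe and return the local path."""
        src, dst = NFS_ROOT / subdir, NVME_ROOT / subdir
        shutil.copytree(src, dst, dirs_exist_ok=True)
        return dst

    def unpin(subdir: str, write_back: bool = True) -> None:
        """Optionally push local changes back to NFS, then drop the local copy."""
        src, dst = NVME_ROOT / subdir, NFS_ROOT / subdir
        if write_back:
            shutil.copytree(src, dst, dirs_exist_ok=True)
        shutil.rmtree(src)

    # e.g. work against pin("duckdb/warehouse") with mmap and POSIX locks,
    # then unpin("duckdb/warehouse") when done.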