[Hacker News repost] MusicBrainz: An open music encyclopedia

hackernews

Title: MusicBrainz: An open music encyclopedia

Text:

Url: https://musicbrainz.org/

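Sorry, I cannot access external websites such as `musicbrainz.org` directly. I can, however, describe the site based on the link provided and outline how its content could be analyzed.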

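MusicBrainz is a non-profit music database project that collects music metadata and makes it publicly available. It resembles Wikipedia but focuses on music-related information. The general steps for fetching and analyzing MusicBrainz content with JinaReader are as follows: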

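1. **Install JinaReader**:
   JinaReader is a Python-based tool that simplifies extracting text from web pages. First, make sure the `jina` package and any other required dependencies are installed.

   ```bash
   pip install jina
   ```

2. **Fetch the page content**:
   With JinaReader you can write a script that fetches the content of a MusicBrainz page automatically.

   ```python
   from jina import Client, Document

   # Create a Jina client. The address is a placeholder for a locally running
   # Jina server that is assumed to fetch whatever URL it receives.
   client = Client(host="https://127.0.0.1:5001")

   # Send the page URL to the server. The artist ID 12345 is a placeholder,
   # and the "/" endpoint path is an assumption about the server.
   response = client.post(
       on="/",
       inputs=[Document(text="https://musicbrainz.org/artist/12345")],
   )

   # Print the fetched text
   print(response[0].text)
   ```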

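3. **Analyze the content**:
   Once the content has been fetched, you can analyze the text with natural language processing (NLP) techniques. Possible analysis tasks include:

   - Text summarization: extract the key information from a long text or condense it.
   - Entity recognition: identify specific entities in the text, such as artist names, album titles, and genres.
   - Topic modeling: identify the main topics of the text.
   - Sentiment analysis: determine the emotional tone of the text.

   For example, you can use the JinaReader client to run sentiment analysis on the fetched text:

   ```python
   from jina import Client, Document

   # Create a Jina client (placeholder address for a locally running server)
   client = Client(host="https://127.0.0.1:5001")

   # Fetch the page content through the server
   response = client.post(
       on="/",
       inputs=[Document(text="https://musicbrainz.org/artist/12345")],
   )

   # Run sentiment analysis on the fetched text. The "/sentiment_analysis"
   # endpoint is hypothetical: it exists only if the server defines it.
   sentiment_response = client.post(
       on="/sentiment_analysis",
       inputs=[Document(text=response[0].text)],
   )

   # Print the sentiment analysis result
   print(sentiment_response[0].text)
   ```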

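4. **Translate the content**:
   If the fetched content is not in Chinese, you need a translation service to translate it into Chinese. This can be done by integrating a service such as the Google Translate API.

Note that the code examples above are hypothetical: they need an environment with a running Jina server and assume that the relevant NLP models and translation service are already set up. In practice, you will need to adapt the fetching and parsing logic to the actual structure and data of the MusicBrainz website.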
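
As an alternative that does not depend on a Jina server, MusicBrainz exposes a public JSON web service under `https://musicbrainz.org/ws/2/`. The following is a minimal sketch of an artist search against it using only the Python standard library. The helper names (`build_search_url`, `summarize_artists`, `search_artists`) and the `User-Agent` value are illustrative assumptions, not part of MusicBrainz itself; the service asks clients to identify themselves with a meaningful `User-Agent` and to keep request rates low.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://musicbrainz.org/ws/2"
# Hypothetical application name and contact address; replace with your own.
USER_AGENT = "ExampleTagger/0.1 (contact@example.org)"


def build_search_url(entity: str, query: str, limit: int = 5) -> str:
    """Build a search URL for an entity type such as "artist" or "release"."""
    params = urllib.parse.urlencode({"query": query, "fmt": "json", "limit": limit})
    return f"{API_ROOT}/{entity}?{params}"


def summarize_artists(payload: dict) -> list[tuple[str, str, str]]:
    """Reduce an artist search response to (MBID, name, country) tuples."""
    return [
        (artist.get("id", ""), artist.get("name", ""), artist.get("country", ""))
        for artist in payload.get("artists", [])
    ]


def search_artists(name: str, limit: int = 5) -> list[tuple[str, str, str]]:
    """Query the live web service; this performs a real network request."""
    request = urllib.request.Request(
        build_search_url("artist", f'artist:"{name}"', limit),
        headers={"User-Agent": USER_AGENT},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return summarize_artists(json.load(response))
```

Calling `search_artists("Nirvana")` would send one request and return the matching artists. URL construction and response parsing are kept in separate functions so they can be checked without network access.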

        
## Post by: mmh0000
        
### Comments: 
        
**stego-tech**: MusicBrainz and its software companion, Picard, are absolute blessings when it comes to micromanaging a music library in this day and age. It can't find _everything_ I have due to entire artists appearing and disappearing between the closure of Napster and the creation of YouTube, but it gets me to that 95% CI that puts me at ease and lets me enjoy my collection. The fact it's global instead of regional (like a lot of automated DB lookups that cannot find my JP/ZA/DE/FR/etc albums here in AMER) is also a big notch in its belt.

Which reminds me, it's about time for the yearly re-scan and re-tag.
            
**jabo**: If anyone's interested, a while ago I downloaded the MusicBrainz database and built a search-as-you-type experience here with about 32M songs:

https://songs-search.typesense.org

The dataset has been very helpful to benchmark Typesense across releases. So I'm grateful that it exists!
            
**dannyobrien**: I wrote about the history of MusicBrainz for the EFF in 2021, as part of a series looking at how "public interest internet" (ie commons-based work) survives outside of the constant coverage and mergers of bigger, more commercial projects:

https://www.eff.org/deeplinks/2021/06/organizing-public-interest-musicbrainz
            
**AndyKelley**: I recently ported [Chromaprint](https://github.com/acoustid/chromaprint/) to Zig. If there's interest I would be happy to extract it into a separately maintained package. For now it lives [here](https://codeberg.org/andrewrk/player/src/branch/main/player/chromaprint.zig). I also did a [semi-related talk about this](https://www.youtube.com/watch?v=SCLrNqc9jdE).

For context, Acoustid is a MusicBrainz-adjacent service for figuring out the MusicBrainz ID of a song based on the sonic content alone, even if it has been distorted or compressed. Chromaprint is the logic for computing an Acoustid given a song as input.
            
**mmh0000**: Make sure to check out Picard:

https://picard.musicbrainz.org/

Which uses the MusicBrainz DB to auto tag and correct audio file names. Makes it really easy to organize a large collection of (pirated) audio.