[Hacker News repost] Understanding and managing the impact of machine learning models on the web

hackernews

Title: Understanding and managing the impact of machine learning models on the web

Text:

Url: https://www.w3.org/reports/ai-web-impact/

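The document "AI & the Web: Understanding and managing the impact of Machine Learning models on the Web", written by Dominique Hazael-Massieux, analyzes the systemic impact of artificial intelligence systems on the Web, particularly those based on machine learning (ML) models, and the role Web standardization can play in managing that impact. It aims to structure the discussion of what may need to change at the standardization level to make the systemic impact of AI and ML models less harmful or more manageable. It covers the ethical, societal, and technical impacts of AI systems on the Web, and highlights areas where standardization, guidelines, and interoperability could help manage these changes. The document is intended to capture the current shared understanding of the W3C Team and does not represent any consensus of the W3C Membership. It seeks community feedback on proposals that could help make progress on these topics, and on other topics the document may have failed to identify.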

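The document's reference list covers standards and resources related to artificial intelligence, Web technologies, and accessibility. The key entries are: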

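1. **Artificial intelligence concepts and terminology**: An ISO/IEC standard published in July 2022 that defines and explains AI-related terms.

2. **Model Cards for Model Reporting**: A multi-author paper published in January 2019 that introduces a standard format for documenting machine learning models, their performance, and their creators.

3. **Robots Exclusion Protocol (RFC9309)**: An IETF Proposed Standard that describes how Web robots are expected to access websites and how access to certain parts of a site can be restricted.

4. **Schema.org**: A vocabulary maintained by the W3C Schema.org Community Group that websites use to provide metadata about their content; the referenced URL points to version 6.0.

5. **Accelerated Shape Detection in Images (SHAPE-DETECTION-API)**: A WICG draft specification intended to make detecting shapes in images on the Web more efficient.

6. **Web Speech API (SPEECH-API)**: A WICG specification that lets Web developers incorporate speech recognition and speech synthesis into their applications.

7. **Text and Data Mining Reservation Protocol (TDMRep)**: A W3C Community Group Final Report describing a protocol for reserving text and data mining rights.

8. **trust.txt**: A specification developed by JournalList.net that defines a method for websites to declare their policies regarding Web crawlers and archiving services.

9. **UNESCO Recommendation on the Ethics of Artificial Intelligence**: A UNESCO document offering recommendations on the ethical issues to consider when developing and using AI.

10. **Verifiable Credentials Data Model (VC-DATA-MODEL)**: A W3C Recommendation that specifies a data model for verifiable credentials, which are digital credentials that can be verified cryptographically.

11. **W3C Workshop Report on Web and Machine Learning**: A W3C report discussing the current state and future directions of machine learning on the Web.

12. **W3C Vision**: A W3C Working Group Note outlining the organization's vision for its role in the future of the Web.

13. **Artificial Intelligence (AI) and Accessibility Research Symposium 2023 (WAI-AI)**: An event hosted by the W3C Web Accessibility Initiative, focused on the intersection of AI and accessibility.

14. **WebAssembly Core Specification (WASM-CORE-2)**: A W3C Working Draft defining the core of WebAssembly, a binary instruction format for a stack-based virtual machine.

15. **WebGPU**: A W3C Working Draft defining a Web-level interface for programming GPU capabilities in Web browsers.

16. **Ethical Principles for Web Machine Learning (webmachinelearning-ethics)**: A W3C Working Group Note discussing ethical considerations for machine learning on the Web.

17. **Web Neural Network API (WEBNN)**: A W3C Candidate Recommendation that gives Web developers an API for running neural networks in Web applications.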

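Together, these resources span a broad range of topics across AI, Web technologies, and accessibility, offering guidance, specifications, and research findings for developers and researchers in these fields.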
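To make one of the listed mechanisms concrete, the Robots Exclusion Protocol (item 3) can be exercised with Python's standard-library `urllib.robotparser`. This is only a minimal sketch: the robots.txt content, the crawler names `ExampleAIBot` and `OtherBot`, and the `example.com` URLs are all made up for illustration and do not come from the W3C report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one (made-up) AI crawler entirely,
# and keep every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The named crawler is denied everywhere.
ai_allowed = parser.can_fetch("ExampleAIBot", "https://example.com/articles/1")
# Other crawlers fall back to the "*" group.
other_public = parser.can_fetch("OtherBot", "https://example.com/articles/1")
other_private = parser.can_fetch("OtherBot", "https://example.com/private/notes")

print(ai_allowed, other_public, other_private)  # False True False
```

The protocol only expresses the site's wishes; whether a given crawler honors them is up to the crawler.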

Post by: kaycebasques

Comments:

pmayrgundter: I agree with the general idea of tagging content to help classify.

I'd given this some thought via MIME and ended up with a kind of BioNFT.. so named bc it uses NFTs piecewise, but tracking the creation events and agents types (bio, ai, etc) as part of the content lifecycle

https://twitter.com/PMayrgundter/status/1638016474483683328

Highlight..

What if:

  - devices sign source creations with a biosignature
  - editing tools sign input
  - media types include that, effectively saying:

    ai_edited(human_created(photo))

and do this under the experimental namespace in MIME:

  image/x.bio(pablo@example.com/photo123).html
  image/x.adobe.photoai(http://x.bio(pablo@example.com/photo123)).html

MacsHeadroom: > the copyright system creates a (relatively) shared understanding between creators and consumers that, by default, content cannot be redistributed, remixed, adapted or built upon without creators' consent. This shared understanding made it possible for a lot of content to be openly distributed on the Web.

That is not remotely a shared understanding, is wrong, and has nothing to do with making it possible for a lot of content to be openly distributed on the web. Content is distributed quite widely without concern for copyright.

> A number of AI systems combine (1) automated large-scale consumption of Web content, and (2) production at scale of content, in ways that do not recognize or otherwise compensate content it was trained from.

> While some of these tensions are not new (as discussed below), systems based on Machine Learning are poised to upend the existing balance. Unless a new sustainable equilibrium is found, this exposes the Web to the following undesirable outcomes:

> Significantly less open distributed content (which would likely have a disproportionate impact on the less wealthy part of the population)

That's even more ridiculous. The wealthy stand the most to gain from restricting the flow of information to channels which collect rent on behalf of their capital. It's the "less wealthy" who routinely find ways to distribute content outside of rent-seeking channels. It's the "less wealthy" who benefit the most from the commoditization of creative content via generative algorithms.

Quite frankly, I expected better from W3C.

kaycebasques: This would have been a better link: https://www.w3.org/reports/ai-web-impact/
