[Hacker News Repost] Visualizing Attention, a Transformer's Heart [video]
-
Title: Visualizing Attention, a Transformer's Heart [video]
Text:
Url: https://www.3blue1brown.com/lessons/attention
Post by: rohitpaulk
Comments:
rollinDyno: Hold on, every predicted token is only a function of the previous token? I must have something wrong. That would mean the whole preceding context has to fit inside the embedding of "was", which is of length 12,288 in this example. Is it really possible that this space is so rich that a single point in it can encapsulate a whole novel?
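A minimal NumPy sketch of what this comment is reacting to (toy dimensions and random matrices chosen here for illustration, not anything from the video): after the transformer blocks have run, only the final position's vector is pushed through the unembedding matrix to produce next-token logits, so attention must already have folded the relevant context into that one vector.

```python
import numpy as np

# Toy sizes so the sketch runs instantly; the real d_model in the video's
# GPT-3 example is 12,288, and the vocabulary is much larger.
d_model, vocab_size, seq_len = 16, 100, 8

rng = np.random.default_rng(0)
# Stand-in for the contextualized embeddings produced by the transformer blocks.
# Attention is what lets the last vector absorb information from earlier tokens.
hidden_states = rng.standard_normal((seq_len, d_model))

# Unembedding matrix: projects a d_model-sized vector to one logit per vocab token.
W_unembed = rng.standard_normal((d_model, vocab_size))

last_vector = hidden_states[-1]          # only the final position is used
logits = last_vector @ W_unembed         # shape: (vocab_size,)
next_token_probs = np.exp(logits - logits.max())
next_token_probs /= next_token_probs.sum()
print(next_token_probs.argmax())         # index of the most likely next token
```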
promiseofbeans: His previous post 'But what is a GPT?' is also really good: https://www.3blue1brown.com/lessons/gpt
namelosw: You might also want to check out the other 3b1b videos on neural networks, since there is a sort of progression from one video to the next: https://www.3blue1brown.com/topics/neural-networks
bilsbie: I finally understand this! Why did every other video make it so confusing!
YossarianFrPrez: This video (with a slightly different title on YouTube) helped me realize that the attention mechanism isn't exactly a specific function so much as it is a meta-function. If I understand it correctly, attention plus learned weights effectively enables a Transformer to learn a semi-arbitrary function, one which involves a matching mechanism (i.e., the scaled dot product).
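For reference, a minimal NumPy sketch of single-head scaled dot-product self-attention (the names, toy dimensions, and random weights here are illustrative assumptions, not taken from the video): the learned matrices produce queries, keys, and values, and the query-key dot products are the matching mechanism the comment refers to.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention (no mask, no output projection)."""
    q = x @ W_q                       # queries
    k = x @ W_k                       # keys
    v = x @ W_v                       # values
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)   # the "matching mechanism": query-key dot products
    weights = softmax(scores, axis=-1)
    return weights @ v                # each position becomes a weighted mixture of values

# Toy dimensions; the learned matrices W_q, W_k, W_v are what make the
# resulting function "semi-arbitrary", as the comment puts it.
rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 16, 8
x = rng.standard_normal((seq_len, d_model))
W_q, W_k, W_v = (rng.standard_normal((d_model, d_head)) for _ in range(3))
print(self_attention(x, W_q, W_k, W_v).shape)   # (5, 8)
```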