【Hacker News搬运】您本可以设计最先进的位置编码
-
Title: You could have designed state of the art positional encoding
您本可以设计最先进的位置编码
Text:
Url: https://fleetwood.dev/posts/you-could-have-designed-SOTA-positional-encoding
## Post by: Philpax

### Comments:

**rgovostes**: Thanks to the author for clarifying something that's been a mystery to me for a few years. The positional encoding scheme in the "Attention Is All You Need" paper is only given half a page and the construction appears to come out of nowhere.

> **rgovostes**: 感谢作者澄清了困扰我几年的一个谜团。《Attention Is All You Need》论文中的位置编码方案只占了半页篇幅,其构造似乎是凭空而来的。

**valine**: One of the things I really love about rope is that it allows for a lot of interesting encoding schemes at inference time without model retraining. I've had a lot of fun playing with different relative positions. You can elicit a lot of interesting behaviors from the model when you use different rotations for keys vs queries, they don't always have to match.

For example exact position doesn't matter too much when tokens are spaced out. Let's say you use token position 100 for your query, you can shift all the keys around position 100, and the further they are back in the context the more freedom you have to play with the value.

> **valine**: 我非常喜欢 RoPE 的一点是,它允许在推理时使用许多有趣的编码方案,而无需重新训练模型。我在尝试不同的相对位置时获得了很多乐趣。当你对键(key)和查询(query)使用不同的旋转时,可以从模型中引出很多有趣的行为,两者并不总是必须匹配的。
>
> 例如,当 token 间隔较远时,确切位置并不太重要。假设你的查询使用 token 位置 100,你可以把所有键在位置 100 附近移动,而且它们在上下文中越靠后,你调整位置值的自由度就越大。

**throwawaymaths**: Maybe someone could answer this for me: it seems like encoding the positional embeddings as augmentations to the "natural" activations instead of as their own inputs (concatenated onto the activations) makes things like sliding a window much harder... I guess obviously the drawback is you have somewhat less textually derived information.

I recall an early transformers video where they tried both and it turned out that adding the position onto the existing vectors was no worse so they went with it... No further discussion about motivations happened in that video.

Is it worth revisiting that maybe now that activations have a gobsmackingly large dimension?
> **throwawaymaths**: 也许有人能为我解答这个问题:把位置嵌入编码为对“自然”激活的增量(加到激活上),而不是作为独立的输入(拼接到激活上),似乎会让滑动窗口之类的操作变得更困难……我想明显的代价是,你获得的来自文本本身的信息会少一些。
>
> 我记得在一个早期讲 Transformer 的视频里,他们两种方法都试过,结果发现把位置直接加到现有向量上效果并不更差,于是就采用了这种做法……视频中没有进一步讨论背后的动机。
>
> 既然现在激活的维度已经大得惊人,是否值得重新审视这个问题?
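rgovostes 提到的正弦位置编码在《Attention Is All You Need》中只占半页。下面是该构造的一个最小 NumPy 示意(函数名 `sinusoidal_pe` 和各维度取值均为笔者为演示而设,并非论文原始代码;假设 `d_model` 为偶数):

```python
import numpy as np

def sinusoidal_pe(seq_len, d_model, base=10000.0):
    """论文中的正弦位置编码:
    PE[pos, 2i]   = sin(pos / base**(2i/d_model))
    PE[pos, 2i+1] = cos(pos / base**(2i/d_model))"""
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]       # (1, d_model/2)
    angles = pos / base ** (2 * i / d_model)   # 每个维度对一个不同频率
    pe = np.empty((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)               # 偶数维用 sin
    pe[:, 1::2] = np.cos(angles)               # 奇数维用 cos
    return pe

pe = sinusoidal_pe(128, 64)
print(pe.shape)  # (128, 64)
```

每个位置得到一个由多个频率的正弦/余弦组成的向量,低维变化快、高维变化慢,类似二进制计数的连续版本。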
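valine 所说的推理时对键/查询使用不同位置旋转,依赖于 RoPE 的一个性质:注意力分数只取决于相对位置差。下面用一个简化的 `rope` 辅助函数演示这一点(函数与维度均为笔者的假设性写法,仅作示意):

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """对一维向量 x 施加位置 pos 处的旋转位置编码:
    相邻成对的分量 (x[2i], x[2i+1]) 按角度 pos / base**(2i/d) 旋转。"""
    d = x.shape[-1]
    freqs = pos / base ** (np.arange(d // 2) * 2.0 / d)
    cos, sin = np.cos(freqs), np.sin(freqs)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin
    out[1::2] = x1 * sin + x2 * cos
    return out

rng = np.random.default_rng(0)
q = rng.normal(size=64)
k = rng.normal(size=64)

# 两组绝对位置不同,但相对偏移都是 10,点积相同:
s1 = rope(q, 100) @ rope(k, 90)
s2 = rope(q, 10) @ rope(k, 0)
print(np.isclose(s1, s2))  # → True,分数只取决于相对位置
```

正因为旋转只在计算分数时施加,推理时可以像 valine 描述的那样,给键和查询分别指定任意位置索引而无需重新训练。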
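throwawaymaths 对比的两种做法(位置信息加到激活上 vs 作为独立输入拼接)可以用形状差异直观说明。下面是一个最小示意(随机向量仅作演示,并非任何真实模型的实现):

```python
import numpy as np

seq_len, d_model = 16, 64
rng = np.random.default_rng(0)
tok = rng.normal(size=(seq_len, d_model))  # 词元嵌入
pos = rng.normal(size=(seq_len, d_model))  # 位置嵌入(此处随机,仅示意)

# 论文的做法:直接相加,维度不变,位置与文本信息共享同一空间
added = tok + pos

# 评论中提到的替代方案:拼接,位置信息独立但宽度翻倍
concat = np.concatenate([tok, pos], axis=-1)

print(added.shape, concat.shape)  # (16, 64) (16, 128)
```

相加不增加参数量和计算量,但位置与文本信号混在同一维度里;拼接保持两者分离,代价是后续各层的输入宽度变大。这正是评论中权衡的核心。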