[Hacker News repost] Don't let dicts spoil your code
-
Title: Don't let dicts spoil your code
Text:
Url: https://roman.pt/posts/dont-let-dicts-spoil-your-code/
## Post by: juniperplant

### Comments:

**cardanome**: This is absolutely key advice.

Another way to look at it is the functional core, imperative shell pattern.

Wrapping up your dict in a value object (a dataclass or whatever the equivalent is in your language) early on means you handle the ugly stuff first. Parse, don't validate. Resist the temptation of optional fields. Is there really anything you can do if the field is null? No? Then don't make it optional. Let it crash early on. Clearly define your data.

If you have put your data in neat value objects, you know what is in them. You know the types. You know all required fields are there. You will be so much happier. No checking for null throughout the code, no checking for empty strings. You can just focus on the business logic.

Seriously, so much suffering can be avoided by just following this pattern.

**jimmytucson**: Here’s an out-there take, but one I’ve held loosely for a long time and haven’t shed yet: dicts are not appropriate for what people mostly use them for, which is named access to member attributes.

dict is an implementation of a hash table. Hash tables are designed for O(1) lookup of items. As such, they are arrays which are much bigger than the number of items they store, to allow hashing items into integers and sidestep collisions. They’re meant to act like an index that contains many records, not a single record.

A single record is more like a tuple, except you want named access instead of title = movie[0], release_year = movie[1], etc.
And Python had that, in NamedTuple, but it was kinda magical and no one used it (shoutout Raymond Hettinger).

Granted, this rant is pretty much the meme with the guy explaining something to a brick wall, in that dicts are so firmly entrenched as the "record" type of choice in Python (but not so in other languages: struct, case class, etc. and JSON doesn’t just deserialize to a weak type but I digress).

**bigstrat2003**: For better or for worse, Python doesn't do typing well. I don't disagree that I prefer well defined types, but if that is your desire then I think Python is perhaps not the correct choice of language.

**fhdsgbbcaA**: Seems like the issue is less using dicts than not treating external APIs as input that needs to be sanitized.

**cschneid**: I generally support this. When dealing with API endpoints especially I like to wrap them in a class that ends up being. I also like having nested data structures as their own class sometimes too. Depends on complexity & need of course.

<pre><code>class GetThingResult
  def initialize(json)
    @json = json
  end

  # single thing
  def thing_id
    @json.dig('wrapper', 'metadata', 'id')
  end

  # multiple things
  def history
    @json['history'].map { |h| ThingHistory.new(h) }
  end

  # ... two dozen more things
end</code></pre>
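
For readers who want the Python version of what the thread describes (wrap the raw dict in a value object at the boundary, and let missing fields fail right there), here is a minimal sketch. It borrows the `wrapper` / `metadata` / `id` / `history` keys from cschneid's Ruby snippet; the `event` field, the integer type of the id, and the `from_dict` helper names are assumptions made up for illustration, not anything from the article or the comments.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ThingHistory:
    """One entry of the 'history' list (the 'event' field is hypothetical)."""

    event: str

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> ThingHistory:
        # A missing 'event' key raises KeyError here, not deep in business logic.
        return cls(event=str(raw["event"]))


@dataclass(frozen=True)
class GetThingResult:
    """Typed view of the API response; every field is required."""

    thing_id: int
    history: tuple[ThingHistory, ...]

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> GetThingResult:
        # Parse once at the boundary: plain indexing fails loudly on missing keys,
        # and int() rejects ids that are not numeric (an assumed requirement).
        return cls(
            thing_id=int(raw["wrapper"]["metadata"]["id"]),
            history=tuple(ThingHistory.from_dict(h) for h in raw["history"]),
        )


# Example payload, shaped like the JSON in the Ruby snippet above.
payload = {
    "wrapper": {"metadata": {"id": "42"}},
    "history": [{"event": "created"}, {"event": "renamed"}],
}

result = GetThingResult.from_dict(payload)
print(result.thing_id)                     # 42
print([h.event for h in result.history])   # ['created', 'renamed']
```

Code past `from_dict` only ever sees `GetThingResult`, so it needs no `.get()` calls or null checks; a payload without `wrapper` raises `KeyError` at the parsing step instead.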