[Hacker News repost] Show HN: HTML for People
-
Title: Show HN: HTML for People
Text:
Url: https://htmlforpeople.com
## Post by: blakewatson

### Comments:

**simonw**: This is great. The decision to skip CSS by depending on https://simplecss.org/ is smart - CSS is a whole other thing, and having that on top of basic HTML would be pretty intimidating.

I did worry a bit about https://htmlforpeople.com/zero-to-internet-your-first-website/ - "Step 1. Create a folder on your computer" - because apparently a large number of people these days don't understand files and folders at all! https://www.theverge.com/22684730/students-file-folder-directory-structure-education-gen-z

Not sure how best to approach that though. Having a whole chapter of the book explaining files and folders feels pretty redundant. Maybe there's something good you could link to?

**mightybyte**: I think the fundamental approach being taken by this project is immensely valuable to the world. This kind of education about open standards might actually be the most powerful tool that can help us take steps in the direction away from giant opaque corporations and back towards the systems based on open standards that the internet originated from. I really hope this project continues to be updated and get more and more eyes and contributors.

If you feel the same way, I'd say at least throw it a GitHub star. https://github.com/blakewatson/htmlforpeople

(Note: I have nothing to do with this project thus far and have nothing to gain from saying this.)

**forbiddenvoid**: I love the idea and I'm thrilled to see more sites like this out there. But I do think this assumes a level of computer literacy that isn't consistent with typical, non-technical users.

Step 1 starts with:

> Pick a location on your computer and create a folder. Call it my-site or something similar.

You've already lost the vast majority of people right here. There are a shockingly large number of people out there that use computers EVERY day that won't know how to do this.

**aardvark179**: Why, “Start coding already!” rather than something like, “Start writing already?” I think half the barrier to people building sites is that they think they need to know how to code, and that seems scary, but they do know what they want to write, and that seems more approachable.

**Brajeshwar**: A few months back, someone asked for suggestions on which new AI options to learn to beef up his marketing career. I told him to learn HTML first. That will be useful for a long time and will likely last his lifetime. After that, he can start learning others.

I even x-ed it at https://xcancel.com/brajeshwar/status/1812149514632925525