【Hacker News Repost】12 Months of Mandarin
-
Title: 12 Months of Mandarin
Text:
Url: https://isaak.net/mandarin/
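Sorry, as an AI I can't directly access external links or use specific tools such as JinaReader to fetch and analyze web pages. I can, however, explain how to use JinaReader to fetch and analyze the content, and what to do if that content isn't in Chinese.

### Fetching and analyzing content with JinaReader

1. **Set up**: Jina Reader is a hosted service (`https://r.jina.ai/`), not a library you install; the `jina` package on PyPI is a different project (Jina's application framework). All you need locally is Python and an HTTP client such as `requests`:

```bash
pip install requests
```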
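2. **Write the fetch script**: Prefix the target URL with `https://r.jina.ai/` and request it; the response body is the page's main content converted to Markdown text.

```python
import requests

# Jina Reader: prepend r.jina.ai to the target URL to get the page as Markdown text
target_url = "https://isaak.net/mandarin/"
response = requests.get("https://r.jina.ai/" + target_url, timeout=30)
response.raise_for_status()  # raise an exception on HTTP 4xx/5xx

result = response.text
print(result)
```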
3. **Analyze the result**: JinaReader returns the extracted page content as text; you can then process or summarize it further.
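As a sketch of that further processing, the function below separates the metadata lines from the page body. It assumes the default plain-text Reader response, which starts with lines such as `Title:` and `URL Source:` followed by a `Markdown Content:` marker; check this against a real response before relying on it. The name `split_reader_output` and the sample text are hypothetical.

```python
def split_reader_output(text: str) -> tuple[dict[str, str], str]:
    """Split a Jina Reader plain-text response into header fields and body.

    Assumes "Key: value" header lines come before a line reading
    "Markdown Content:"; everything after that marker is the page body.
    """
    headers: dict[str, str] = {}
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "Markdown Content:":
            body = "\n".join(lines[i + 1:]).strip()
            return headers, body
        # partition splits on the first colon only, so URLs in values stay intact
        key, sep, value = line.partition(":")
        if sep and key.strip():
            headers[key.strip()] = value.strip()
    # No marker found: treat the whole response as the body
    return {}, text.strip()


# Usage with a made-up sample response
sample = (
    "Title: 12 Months of Mandarin\n\n"
    "URL Source: https://isaak.net/mandarin/\n\n"
    "Markdown Content:\n"
    "Body text goes here.\n"
)
headers, body = split_reader_output(sample)
print(headers["Title"])  # 12 Months of Mandarin
print(body)              # Body text goes here.
```

The body can then be passed on to a summarizer or to the translation step below.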
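### Translating non-Chinese content

If the fetched content isn't in Chinese, you'll need a translation service to translate it into Chinese. The usual steps are: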
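1. **Integrate a translation API**: Use the Google Cloud Translation API or another translation service. Here is a snippet using the Google Cloud Translation API:

```python
# Requires `pip install google-cloud-translate` and Google Cloud credentials
# (for example, via the GOOGLE_APPLICATION_CREDENTIALS environment variable).
from google.cloud import translate_v2 as translate

client = translate.Client()

# Translate the text into Chinese; the result is a dict
result = client.translate("Hello, world!", target_language="zh")
print("Translation: {}".format(result["translatedText"]))
```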
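2. **Add a translation step to the JinaReader pipeline**: After fetching the content, add a step that checks whether it is in Chinese and, if not, calls the translation API.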
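A minimal sketch of that step, assuming a simple heuristic: treat text as Chinese when at least 30% of its non-whitespace characters fall in the CJK Unified Ideographs block (U+4E00 to U+9FFF). The threshold and the helper names `cjk_ratio` and `translate_if_needed` are assumptions, and a proper language-detection service would be more robust. The translator is passed in as a function, so the check can be tested without network access.

```python
import re
from typing import Callable

# Characters in the CJK Unified Ideographs block (U+4E00 to U+9FFF)
CJK_PATTERN = re.compile(r"[\u4e00-\u9fff]")


def cjk_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are CJK ideographs."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(1 for c in chars if CJK_PATTERN.match(c)) / len(chars)


def translate_if_needed(
    text: str,
    translate: Callable[[str], str],
    threshold: float = 0.3,  # assumed cut-off, tune for your content
) -> str:
    """Return text unchanged if it already looks Chinese, else translate it."""
    if cjk_ratio(text) >= threshold:
        return text
    return translate(text)


# Usage with a stand-in translator (no network needed)
def fake_translate(s: str) -> str:
    return "[zh] " + s


print(translate_if_needed("Hello, world!", fake_translate))  # [zh] Hello, world!
print(translate_if_needed("你好世界", fake_translate))        # 你好世界
```

In a real pipeline, `translate` could wrap the Google client, e.g. `lambda s: client.translate(s, target_language="zh")["translatedText"]`.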
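Note that these code examples need proper error handling and configuration before they will work in a real environment. If you need detailed steps for processing https://isaak.net/mandarin/ specifically, please provide more information or your specific scraping and analysis requirements.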
## Post by: misiti3780

### Comments:

**msvan**: I kind of see myself from ten years ago in this blog post! I also obsessively studied Mandarin Chinese in my late teens for the sheer fun of it, before doing a math undergrad. I even wrote comments on Hacker News about it a decade ago: https://news.ycombinator.com/item?id=7622940.

At the time I had seemingly limitless motivation for grinding away on flashcards and other learning materials. My progress was strong and I passed the HSK6 after a year and a half or so of studying, which at the time was the highest level of certification offered. I think they changed the system since and added more levels beyond 6. You can do amazing things if you're dedicated!

Today my Chinese is absolutely unusable, and my views on China have soured to the extent that I don't really want to revive my old skills. My takeaway is that learning one of these languages, the CJK languages, Arabic, or similarly weird languages, is just too much effort and I don't think it's worth it. I clearly had a lot of excess energy at the time that I could've directed towards something better. Knowing Chinese is about as useful as juggling and you might as well get really good at juggling if you're bored. It'll save you a few thousand hours.

**wantsanagent**: I don't care about learning Mandarin, I want to find out how this guy's motivation system works and then download it into my brain.

Doing a PhD and learning Mandarin as a *side project*?! Doing hours of Anki practice and new note taking, some of it while running on a treadmill? There's just a crazy amount of drive (and what sounds like an epic memory) here.

I don't think people consider base motivation enough when thinking about processes and this guy won some kind of biological and/or upbringing lottery.

**lxe**: I don't think I can do SRS. My dopamine system is at a point where I can't do anything for a long time that isn't interesting, has immediate or intermediate rewards, or can capture attention for a long time. And on top of that, repeating that habit requires all these criteria.

Examples:

Scrolling on the phone?: Basically direct dopamine injected into my brain. Can do indefinitely. Not good.

Programming? Sure I can put a few hours in, or even days if building quick prototypes where the payoff is imminent.

Reading? Can go on indefinitely, depending on the book: it's just continual stream of interesting immersive stuff

Exercise? Well that depends on the activity. Running indoors without any stimulation: absolutely cannot do.
Cycling or running or walking outside with an audiobook, or music? Absolutely: constant stimulation plus endorphins.

Learning Piano? Only if I can bang out a few good tunes immediately in the session, then I can allow myself to struggle with the difficult stuff in between. Absolutely cannot and won't do rote deliberate practice. This hinders my progress significantly, but at least I have fun.

Learning a language? Well, unless I can get imminent rewards, or be continually interested and engaged, there's just no way I'll be able to do this. And I feel like rote, deliberate practice is just impossible for me to build a habit out of.

One way I know for a fact that I can learn another language is through necessity to communicate with it. Let's say I'm thrown into an environment where the ONLY way I can get anything done is through having to communicate directly, without the aid of translators or tools. I think this is how babies learn.

**edent**: While I haven't the same proficiency, I had the same "local celebrity" experience when visiting Beijing. While it is fun at first seeing people double take and then ask to take a photo with you - it gets old fast!

Mind you, I'll never tire of (partially) understanding what people say about me when they think I don't understand.

One thing not mentioned is that it is often a good idea to have some *formal* testing. Friends and tutors may overlook your mistakes.
A dispassionate exam board likely won't.

**cmuguythrow**: I’ve been learning Mandarin via Comprehensible Input (CI) for about 9 months and really admire OP’s dedication and consistency. In the first 4-5 months of being truly consistent with ~1hr a day of Anki and Peppa pig I got to around 2,000 words and was able to have a great experience when I traveled to Taiwan, so I can vouch for the core methodology in this post. It’s not “easy”, but it’s definitely the most effective way to learn a foreign language that I know of.

The CI community has come a long way in the last ~5 or so years - the general consensus looks a lot like OP’s methods, which I would summarize as:

1. Brute force [premade Anki flashcard decks](https://ankiweb.net/shared/info/810519009) for the first ~1k most common words
2. Start watching comprehensible input as soon as you can, ideally for an hour a day or more
3. [Sentence mine](https://www.youtube.com/watch?v=QBcQJESGQvc) the comprehensible input and add it to the daily SRS flashcard grind

The best summary of these methods that I’ve found is https://refold.la/

Self plug: I’ve been working on a way to generate Mandarin audio comprehensible input using LLMs/TTS models. The idea is that there aren’t many great CI options between 500 words and ~3k-5k words - OP himself mentions that when he started watching Scissor Seven 刺客伍六七 he barely understood anything, which is pretty hard to “push through” without some hardcore willpower.
My project https://plusonechinese.com makes Mandarin audio stories that are 85% comprehensible at any level from 400 words all the way to 8k or more words and then auto-imports the audio snippets into SRS flashcards, which makes a CI workflow like this a lot easier to engage with at a lower level and without advanced willpower. Still working on making the content _truly_ interesting, but would love some feedback!
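To make the "85% comprehensible" and sentence-mining ideas in the comment above concrete, here is a sketch of one common reading of them: coverage is the fraction of a sentence's words the learner already knows, and a sentence is worth mining when it contains exactly one unknown word (an "i+1" sentence). It assumes the Chinese text has already been segmented into words (for example with a segmenter such as jieba); `coverage` and `mine_sentences` are hypothetical names, not part of Refold or plusonechinese.com.

```python
def coverage(words: list[str], known: set[str]) -> float:
    """Fraction of word tokens in a sentence that the learner already knows."""
    if not words:
        return 0.0
    return sum(1 for w in words if w in known) / len(words)


def mine_sentences(sentences: list[list[str]], known: set[str]) -> list[list[str]]:
    """Keep sentences with exactly one distinct unknown word (i+1 sentences)."""
    mined = []
    for words in sentences:
        unknown = {w for w in words if w not in known}
        if len(unknown) == 1:
            mined.append(words)
    return mined


# Usage with pre-segmented toy sentences
known = {"我", "喜欢", "吃", "苹果", "你"}
sentences = [
    ["我", "喜欢", "吃", "苹果"],  # all words known
    ["我", "喜欢", "吃", "榴莲"],  # one unknown word: 榴莲
    ["你", "喜欢", "看", "电影"],  # two unknown words
]
print(coverage(sentences[1], known))      # 0.75
print(mine_sentences(sentences, known))   # [['我', '喜欢', '吃', '榴莲']]
```

An 85% target would correspond to `coverage(words, known) >= 0.85`; the one-unknown-word rule is a stricter filter aimed at picking sentences for new flashcards.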
-