[Hacker News repost] USGS uses machine learning to show large lithium potential in Arkansas

hackernews

Title: USGS uses machine learning to show large lithium potential in Arkansas

Text:

Url: https://www.usgs.gov/news/national-news-release/unlocking-arkansas-hidden-treasure-usgs-uses-machine-learning-show-large

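Because I cannot access external websites directly, I cannot fetch and analyze the linked page myself. I can, however, outline how to use Jina Reader (a service that converts a web page into plain text) to analyze the page, and how to translate the content into Chinese if it is in another language.

The steps below cover analyzing a web page with Jina Reader and translating non-Chinese content: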

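1. **Set up Jina Reader**:
   - First, make sure you have access to Jina Reader and are familiar with its basic operation.
   - Create a new workspace or project.

2. **Fetch the web page**:
   - Pass the URL above to Jina Reader to fetch the page.
   - Check that the page is parsed correctly and that its text is extracted.

3. **Analyze the fetched content**:
   - Run natural language processing (NLP) on the fetched text.
   - Keyword extraction, sentiment analysis, and topic modeling are all options.

4. **Handle non-Chinese content**:
   - If the fetched content is not in Chinese, run it through machine translation.
   - Choose a translation API that can translate the source language into Chinese, such as the Google Translate API.
   - Configure the translation API and send the fetched text to it.

5. **Summarize the content**:
   - After translating into Chinese, use a text summarization tool to generate a summary.
   - This lets you grasp the main points of the page without reading the full text.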
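
For the fetch step, a minimal sketch follows. It assumes the public Jina Reader endpoint, which returns a page's text when the target URL is appended to `https://r.jina.ai/`. The helper names `reader_url` and `fetch_text` are illustrative, and rate limits or an API key may apply.

```python
import urllib.request

READER_PREFIX = "https://r.jina.ai/"  # assumed public Jina Reader endpoint


def reader_url(target_url: str) -> str:
    """Build the Reader URL by prefixing the target page's URL."""
    return READER_PREFIX + target_url


def fetch_text(target_url: str, timeout: float = 30.0) -> str:
    """Fetch the page through the Reader endpoint and return its text."""
    request = urllib.request.Request(
        reader_url(target_url),
        headers={"User-Agent": "Mozilla/5.0"},  # some hosts reject the default agent
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Usage (requires network access):
# text = fetch_text("https://www.usgs.gov/news/national-news-release/"
#                   "unlocking-arkansas-hidden-treasure-usgs-uses-machine-learning-show-large")
# print(text[:500])
```

The returned text can then be handed to whichever translation and summarization tools you choose for the later steps.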

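Below is a simplified code example of fetching, translating, and summarizing a web page with Jina Reader. It is only a sketch: the `jinareader` module and its classes are assumed, and the actual Jina Reader API calls may differ.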

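```python
# Assumes JinaReader provides the following API
# (the module and class names are hypothetical)
from jinareader import WebScraper, Translator, TextSummary

# Fetch the web page content
scraper = WebScraper()
web_content = scraper.scrape("https://www.usgs.gov/news/national-news-release/unlocking-arkansas-hidden-treasure-usgs-uses-machine-learning-show-large")

# Translate non-Chinese content into Chinese
translator = Translator(api_key='YOUR_API_KEY')
translated_content = translator.translate(web_content, target_language='zh')

# Summarize the translated content
summary = TextSummary()
summary_text = summary.summarize(translated_content)

print(summary_text)
```

In this example, replace `YOUR_API_KEY` with the key for the translation API you use. The steps above show how to use Jina Reader to analyze a web page, translating it into Chinese and summarizing it even when the original is in another language.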

        
## Post by: antidnan
        
### Comments: 
        
**folli**: From the paper's method section, a bit more about which type of ML algo was used:

An RF machine-learning model was developed to predict lithium concentrations in Smackover Formation brines throughout southern Arkansas. The model was developed by (i) assigning explanatory variables to brine samples collected at wells, (ii) tuning the RF model to make predictions at wells and assess model performance, (iii) mapping spatially continuous predictions of lithium concentrations across the Reynolds oolite unit of the Smackover Formation in southern Arkansas, and (iv) inspecting the model for explanatory variable importance and influence. Initial model tuning used the tidymodels framework (52) in R (53) to test XGBoost, K-nearest neighbors, and RF algorithms; RF models consistently had higher accuracy and lower bias, so they were used to train the final model and predict lithium.

Explanatory variables used to tune the RF model included geologic, geochemical, and temperature information for Jurassic and Cretaceous units. The geologic framework of the model domain is expected to influence brine chemistry both spatially and with depth. Explanatory variables used to train the RF model must be mapped across the model domain to create spatially continuous predictions of lithium. Thus, spatially continuous subsurface geologic information is key, although these digital resources are often difficult to acquire.

Interesting to me that RF performed better than XGBoost; I would have expected at least a similar outcome if tuned correctly.
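
The comparison step quoted above (trying several algorithms and keeping the one with higher accuracy and lower bias) can be sketched with k-fold cross-validation. This is a dependency-free illustration, not the paper's workflow: the data are synthetic, the two candidates (a K-nearest-neighbors regressor and a predict-the-mean baseline) stand in for the tuned XGBoost, KNN, and RF models, and all function names are hypothetical.

```python
import math
import random


def knn_predict(train, x, k=3):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mean_predict(train, x):
    """Baseline: ignore x and predict the training mean."""
    return sum(y for _, y in train) / len(train)


def cross_validate(data, predict, folds=5):
    """Return (RMSE, bias) of out-of-fold predictions."""
    errors = []
    for i in range(folds):
        test = data[i::folds]
        train = [row for j, row in enumerate(data) if j % folds != i]
        errors += [predict(train, x) - y for x, y in test]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    bias = sum(errors) / len(errors)
    return rmse, bias


# Synthetic "wells": two explanatory variables and a smooth target plus noise.
rng = random.Random(0)
data = []
for _ in range(200):
    x = (rng.uniform(0, 10), rng.uniform(0, 10))
    y = 5.0 * math.sin(x[0]) + 2.0 * x[1] + rng.gauss(0, 0.5)
    data.append((x, y))

results = {
    "knn": cross_validate(data, knn_predict),
    "mean": cross_validate(data, mean_predict),
}
best = min(results, key=lambda name: results[name][0])  # lowest RMSE wins
for name, (rmse, bias) in results.items():
    print(f"{name}: RMSE={rmse:.2f} bias={bias:+.2f}")
print("selected:", best)
```

Note that the folds here are assigned without regard to location, which matters for spatial data where nearby samples tend to be similar.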
            
**Animats**: There's also a big lithium deposit in Nevada, and preparations for mining are underway there.[1] General Motors put in $650 million for guaranteed access to the output of this Thacker Mine.

It's in a caldera in a mountain that I-80 bypassed to go through Winnemucca, Nevada. Nearest town is Mill City, NV, which is listed as a ghost town, despite being next to I-80 and a main line railroad track. The mine site is about 12 km from Mill City on a dirt road not tracked by Google Street View.

Google Earth shows signs of development near Mill City. Looks like a trailer park and a truck stop. The road to the mine looks freshly graded. Nothing at the mine site yet.

It's a good place for a mine. There are no neighbors for at least 10 km, but within 15 km, there's good road and rail access.

[1] https://en.wikipedia.org/wiki/Thacker_Pass_lithium_mine
            
**_heimdall**: Well, I guess this is a good win for short-term energy infrastructure, though I'm always pretty torn when it's at the cost of ripping open huge swaths of earth to get at the raw material.

It is interesting to see how much of this data could be modelled based on wastewater brines from other industries in the area. Assuming we go on to mine the lithium, it will say a lot if the ML predictions prove accurate.

One thing I couldn't tell, and it's probably just a limitation of how much time I could spend reading the source paper, is what method would be needed to extract the bulk of the lithium expected to be there. If processing brine water is sufficient, it may be easier to control externalities than if they have to strip mine and get all the overburden out of the way first.
            
**tommykins**: Ah, spatial autocorrelation, my old friend.

Very good work, but typically we don't build prospectivity models this way (or rather, we don't validate them this way anymore). Great to see the USGS starting to dip their toe back in this though; they and the GSC were long the leaders in this, but have dropped it in the last 5-7 years.
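
The comment does not say which validation scheme is preferred now. One common guard against spatial autocorrelation is to hold out whole spatial blocks rather than random samples, so that test wells do not share a block with training wells. A minimal sketch of the fold assignment, assuming planar coordinates in kilometres and a hypothetical 10 km block size (the function name and the well coordinates are made up for illustration):

```python
from collections import defaultdict


def spatial_block_folds(points, block_size, n_folds):
    """Assign each point index to a fold by the grid block it falls in.

    All points in the same block share a fold, so a held-out fold never
    contains a point from a block that is also used for training.
    """
    blocks = defaultdict(list)
    for index, (x, y) in enumerate(points):
        blocks[(int(x // block_size), int(y // block_size))].append(index)

    folds = [[] for _ in range(n_folds)]
    # Deal blocks out round-robin in sorted order for reproducibility.
    for position, block in enumerate(sorted(blocks)):
        folds[position % n_folds].extend(blocks[block])
    return folds


# Hypothetical well coordinates (km); the first two are near neighbors.
wells = [(1.0, 1.0), (1.5, 2.0), (12.0, 3.0), (25.0, 8.0), (13.0, 14.0), (27.0, 22.0)]
folds = spatial_block_folds(wells, block_size=10.0, n_folds=3)
print(folds)
```

Wells on either side of a block boundary can still land in different folds, so in practice a buffer distance between training and test blocks is often added on top of this.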
            
**greenie_beans**: ugh i really don't want people to mine in the mobile basin. that's one of the most diverse ecosystems in north america. https://www.youtube.com/watch?v=8j9coyJeB4Q