【Hacker News repost】Mergiraf: a syntax-aware merge driver for Git
-
Title: Mergiraf: a syntax-aware merge driver for Git
Text:
Url: https://mergiraf.org/
Post by: p4bl0
Comments:
DarkPlayer: Looking at the architecture, they will probably run into some issues. We are doing something similar with SemanticDiff [1] and also started out using tree-sitter grammars for parsing and GumTree for matching. Both choices turned out to be problematic.
Tree-sitter grammars are primarily written to support syntax highlighting and often use a best-effort approach to parsing. This is perfectly fine for syntax highlighting, since the worst that can happen is that a few characters are highlighted incorrectly. However, when diffing or modifying code you really want the code to be parsed according to the upstream grammar, not something that mostly resembles it. We are currently in the process of moving away from tree-sitter and instead using the parsers provided by the languages themselves where possible.
GumTree is good at returning a result quickly, but there are quite a few cases where it always returned bad matches for us, no matter how many follow-up papers with improvements we tried to implement. In the end we switched over to a Dijkstra-based approach that tries to minimize the cost of the mapping, which is more computationally expensive but gives much better results. Difftastic uses a similar approach as well.
[1]: https://semanticdiff.com/
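The "Dijkstra-based approach that tries to minimize the cost of the mapping" can be illustrated on a much simpler problem. The sketch below is a simplification under stated assumptions: it diffs two flat token lists instead of syntax trees, charges 0 for an unchanged token and 1 for each inserted or deleted token, and the function name `min_cost_diff` is hypothetical. It is not SemanticDiff's or Difftastic's implementation; it only shows the core idea of treating a diff as a shortest-path search over pairs of positions (i in the left input, j in the right input).

```python
import heapq


def min_cost_diff(lhs, rhs, novel_cost=1):
    """Hypothetical sketch: shortest path from (0, 0) to (len(lhs), len(rhs)).

    Each vertex is a pair of positions; each edge either keeps a matching
    token (cost 0), deletes a left token, or inserts a right token.
    Returns (total_cost, list_of_ops).
    """
    start, goal = (0, 0), (len(lhs), len(rhs))
    dist = {start: 0}
    prev = {}  # vertex -> (previous vertex, op used to get here)
    heap = [(0, start)]
    while heap:
        cost, (i, j) = heapq.heappop(heap)
        if (i, j) == goal:
            break
        if cost > dist[(i, j)]:
            continue  # stale heap entry; a cheaper path was already found
        moves = []
        if i < len(lhs) and j < len(rhs) and lhs[i] == rhs[j]:
            moves.append(((i + 1, j + 1), 0, ("keep", lhs[i])))
        if i < len(lhs):
            moves.append(((i + 1, j), novel_cost, ("del", lhs[i])))
        if j < len(rhs):
            moves.append(((i, j + 1), novel_cost, ("ins", rhs[j])))
        for nxt, step, op in moves:
            new_cost = cost + step
            if new_cost < dist.get(nxt, float("inf")):
                dist[nxt] = new_cost
                prev[nxt] = ((i, j), op)
                heapq.heappush(heap, (new_cost, nxt))
    # Walk the predecessor links back from the goal to recover the edit script.
    ops = []
    node = goal
    while node != start:
        node, op = prev[node]
        ops.append(op)
    ops.reverse()
    return dist[goal], ops


cost, ops = min_cost_diff(["a", "b", "c"], ["a", "c"])
print(cost, ops)  # 1 [('keep', 'a'), ('del', 'b'), ('keep', 'c')]
```

On flat lists with unit costs this finds the same answer a classic edit-distance table would; the appeal of the shortest-path formulation, as the comment suggests, is that the cost model can be made richer (for example, when vertices are positions in two syntax trees) at the price of more computation.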
Game_Ender: The tool has an excellent architecture section [0] that goes into how it works under the hood. It stands out to me that a complex tool has an overview of this depth, one that lets you grasp conceptually how it works.
0 - https://mergiraf.org/architecture.html
nathell: ‘Why the giraffe? Two reasons. First, it can see farther due to its height; second, it has one of the biggest hearts of all land mammals. Besides, its ossicones make you believe it listens to you when you look at it.’ – My NVC teacher
Kudos for the nonviolence.
chrismorgan: Going through the sorts of conflicts it solves, and the limitations in that, I find it claiming that in some insertions, order doesn’t matter <https://mergiraf.org/conflicts.html#neighbouring-insertions-and-deletions-of-elements-whose-order-does-not-matter>.
I *really* don’t like that. At the *language* level, order may not matter, but quite frequently in such cases the order *does* matter, insofar as almost every human would put the two things in a particular order, or a particular convention is in force. If you automatically merge the two sides in a *different* order from that, doing it automatically has become *harmful*.
My clearest example: take Base
  struct Foo; struct Bar;
Then, between these two items, Left inserts
  impl Foo { }
and Right inserts
  struct Baz;
To the computer, the difference doesn’t matter, but merging it as
  struct Foo; struct Baz; impl Foo { } struct Bar;
is *obviously* bad to a human. This is the problem: it’s handling language *syntax* semantics, but can’t be aware of *logical* semantics. (Hope you can grasp what I’m trying to convey, not sure of the best words.) Left was not inserting something between Foo and Bar; it was attaching something to the end of Foo. Right, on the other hand, was probably inserting something between Foo and Bar, but maybe it was even inserting something before Bar. You perceive that these are all different things, *logically*.
Another example where this will quickly go wrong: in CSS rulesets, some will sort the declarations by property name lexicographically, some by property name length (seriously, it’s frequently *so pretty*), some will group by different types of property… you can’t know.
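The objection can be made concrete with a toy model of two neighbouring insertions into the same gap. In the sketch below, `fixed_order_merge` treats the insertions as order-independent and always places Right first (an arbitrary choice made here to reproduce the example above; it is not a claim about Mergiraf's actual ordering), while `anchored_merge` applies a hypothetical "logical" heuristic that an `impl X` block belongs directly after `struct X;`. All function names and the heuristic itself are illustrative assumptions operating on plain strings, not real Rust parsing.

```python
import re


def attaches_to(item, anchor):
    """Hypothetical heuristic: an `impl X ...` block belongs right after `struct X;`."""
    impl = re.match(r"impl\s+(\w+)", item)
    struct = re.match(r"struct\s+(\w+)", anchor)
    return bool(impl and struct and impl.group(1) == struct.group(1))


def fixed_order_merge(base, gap, left_items, right_items):
    """Order-insensitive merge: both insertions land in `gap`, Right always first."""
    return base[:gap] + right_items + left_items + base[gap:]


def anchored_merge(base, gap, left_items, right_items):
    """Puts whichever side 'attaches' to the preceding item first; defaults to Left."""
    anchor = base[gap - 1] if gap > 0 else ""
    left_sticks = any(attaches_to(x, anchor) for x in left_items)
    right_sticks = any(attaches_to(x, anchor) for x in right_items)
    if right_sticks and not left_sticks:
        return base[:gap] + right_items + left_items + base[gap:]
    return base[:gap] + left_items + right_items + base[gap:]


base = ["struct Foo;", "struct Bar;"]
left = ["impl Foo { }"]  # Left attaches an impl to Foo
right = ["struct Baz;"]  # Right adds a new item between Foo and Bar

print(" ".join(fixed_order_merge(base, 1, left, right)))
# struct Foo; struct Baz; impl Foo { } struct Bar;
print(" ".join(anchored_merge(base, 1, left, right)))
# struct Foo; impl Foo { } struct Baz; struct Bar;
```

Both outputs are equally valid syntax, which is exactly the point being made: choosing between them needs knowledge of intent or project conventions. The CSS example shows why a single hard-coded heuristic like `attaches_to` would not generalize.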
lucasoshiro: Happy to see something being developed for merge drivers. They are an underrated Git feature that could save a lot of trouble, since the standard three-way merge of file contents is not aware of the language and can create some problems. For example, if you have this valid Python code:
  x = input()

  if x == 'x':
      print('foo')

      print('bar')
If you delete the first print in one branch, delete the other print in another branch, then merge the two branches, you'll have this:
  x = input()

  if x == 'x':

Both branches delete a portion of the code inside the if block, leaving it with only whitespace. In Python that is not valid code, as empty scopes need to be declared with pass.
I installed Mergiraf to see if it can solve this situation, but sadly, it doesn't support Python...
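The failure described above can be reproduced with a small sketch. Assumptions: this is a simplified line-based merge built on Python's `difflib` that only handles pure deletions (it is not Git's actual merge algorithm), and the helper names `deleted_indices`, `naive_merge` and `is_valid_python` are hypothetical. After merging, it asks Python's own parser (`ast.parse`) whether the result is still valid code.

```python
import ast
import difflib


def deleted_indices(base, side):
    """Indices of base lines removed in `side`; this sketch supports pure deletions only."""
    matcher = difflib.SequenceMatcher(a=base, b=side, autojunk=False)
    deleted = set()
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "delete":
            deleted.update(range(i1, i2))
        elif tag != "equal":
            raise ValueError("this sketch only handles deletions")
    return deleted


def naive_merge(base, left, right):
    """Line-based merge: keep every base line that neither branch deleted."""
    gone = deleted_indices(base, left) | deleted_indices(base, right)
    return [line for i, line in enumerate(base) if i not in gone]


def is_valid_python(lines):
    """Ask CPython's own parser whether the merged text is syntactically valid."""
    try:
        ast.parse("\n".join(lines))
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False
    return True


base = ["x = input()", "", "if x == 'x':", "    print('foo')", "", "    print('bar')"]
left = [line for line in base if line != "    print('foo')"]   # branch 1 deletes the first print
right = [line for line in base if line != "    print('bar')"]  # branch 2 deletes the second print

merged = naive_merge(base, left, right)
print(merged)                   # ['x = input()', '', "if x == 'x':", '']
print(is_valid_python(merged))  # False
```

Each branch on its own parses fine, and the two deletions do not overlap, so a text-level merge sees nothing to conflict on. A language-aware driver could run a check like `is_valid_python` after merging and report a conflict (or insert `pass`) instead of silently producing broken code; the thread does not say whether any particular tool does this.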