Can I have Mac, Windows, and Linux share one git repo without worrying about line-ending horrors?



Okay everyone: I'm setting up a git repository for researchers to share scripts and data for a research project. The researchers aren't programmers or particularly git-savvy, so I'm hoping to point desktop git clients at a shared repository — everyone has access to this in their local filesystem.


The problem: line endings. We have people using:

  • Windows (mostly R) (CRLF)
  • Linux and Mac scripts (mostly R and Python) (LF only)
  • Excel on the Mac, saving as .CSV (CR only; yes, this is an actual thing)


git's autocrlf doesn't understand Mac line endings for some reason, so that doesn't work well for me.


First, I want to track changes to these files without telling people "you can't use the tools you're familiar with" because then they will just store the data and scripts somewhere outside of the repo.

Second, I want to have the git repo not be full of stupid line ending commits and merge conflicts, because I will probably need to solve all the merge conflicts that happen.


Third, I'd like people to not have to manually run some "fix all the line endings" script because that would suck. If this is what I need to do... whatever, I guess.


Assuming "first, normalize the line endings" is the answer, any sense of which ones I should choose?


I'd thought about a pre-commit hook, but it sounds like this would involve somehow getting the same script to run on both Windows and unix, and that sounds terrible. Maybe this is a secretly practical option?


First, as Marek Vitek said in the comments, you may need to write at least a tiny bit of code.


Second, for a bit of clarity, here's how Git itself deals—or doesn't deal—with data transformation:

  • Data (files) inside commits is sacrosanct. It literally can't be changed, so once something is inside a commit, it is forever.[1]

  • Data in the work-tree can and should be in a "host friendly" format. That is, if you're on a Mac running program P_mac that requires that lines end with CR, the data can be in that format. If you're on a Windows box running the equivalent P_windows that requires that lines end with CR+LF, the data can be in that format.


Conversions to "host format" happen when files move from the index/staging-area to the work-tree. Conversions from "host format" to "internal storage format" happen when files move from the work-tree to the index/staging area.


Most of Git's built-in filters do only CRLF-to-LF or LF-to-CRLF transformations. There is one "bigger" built-in filter, called ident (not to be confused with indent), and you can define your own filters, called clean and smudge, which can do arbitrary things. This means you can define a smudge filter that, on the Mac (but not on Windows), will (e.g.) change LF to CR. The corresponding Mac-only clean filter would then change CR back to LF.
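To make the clean/smudge idea concrete, here is a minimal sketch of such a filter pair in Python. It is only an illustration under assumptions: the function names (`clean`, `smudge`, `run_filter`) and the default of CR as the host ending are mine, and I have not run this against a real repository. Git feeds a filter command the file content on standard input and takes the converted content from standard output, which is what `run_filter` models.

```python
def clean(data: bytes) -> bytes:
    """Work-tree -> index: store every line ending as LF."""
    # Replace CRLF first so that a CRLF pair does not become two LFs.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def smudge(data: bytes, host_eol: bytes = b"\r") -> bytes:
    """Index -> work-tree: rewrite stored LFs as the host's line ending.

    host_eol defaults to CR (the Excel-on-Mac case); that default is an
    assumption of this sketch, not something Git chooses for you.
    """
    # Normalize first so input that was already converted is not mangled.
    return clean(data).replace(b"\n", host_eol)


def run_filter(mode, stdin, stdout, host_eol=b"\r"):
    """Read all of stdin, convert, and write the result to stdout."""
    data = stdin.read()
    if mode == "clean":
        stdout.write(clean(data))
    elif mode == "smudge":
        stdout.write(smudge(data, host_eol))
    else:
        raise ValueError("mode must be 'clean' or 'smudge'")
```

A small wrapper script (call it `eolfilter.py`, a hypothetical name) would call `run_filter(sys.argv[1], sys.stdin.buffer, sys.stdout.buffer)`. On the Mac only, you would then register it under a filter name of your choosing, say `maccsv`, by setting `filter.maccsv.clean` and `filter.maccsv.smudge` with `git config`, and select the files with a `.gitattributes` line such as `*.csv filter=maccsv`.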


Note that many transformations are not data-preserving on raw binary data: there might be a byte that happens to resemble an LF or a CR, or two in a row that resemble CRLF, but are not meant to be interpreted that way. If you change these, you wreck the binary data. So it's important to apply filtering only to files where a byte that seems to be one of these things really is one of these things. You can use .gitattributes path name matching, e.g., *.suffix, to select which files get which filters applied.
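As a rough sketch of that selection step, the following Python combines suffix matching with a simple NUL-byte check. The pattern list and both function names are assumptions for illustration. The NUL-byte test is a common heuristic that I am adding as a second line of defense; it is not what `.gitattributes` does, since that matches on path names only.

```python
from fnmatch import fnmatchcase

# Assumed set of text suffixes for this project; adjust to taste.
TEXT_PATTERNS = ("*.R", "*.py", "*.csv")


def looks_binary(data: bytes, sample_size: int = 8000) -> bool:
    """Heuristic: a NUL byte near the start suggests binary content."""
    return b"\0" in data[:sample_size]


def should_filter(path: str, data: bytes) -> bool:
    """True if the file name matches a text pattern and the content looks like text."""
    name = path.rsplit("/", 1)[-1]  # match on the file name, not the directory
    matches = any(fnmatchcase(name, pattern) for pattern in TEXT_PATTERNS)
    return matches and not looks_binary(data)
```

With these definitions, `should_filter("data/raw.csv", b"a,b\r1,2\r")` is true, while a PNG file is skipped because its name matches none of the patterns.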


The correct filtering actions to apply will, of course, depend on the host.

When doing a merge, Git normally just takes the files directly from the pure versions inside each of the commits involved. Since it's Git (and git diff) doing interpretation of lines, you generally want these to have Git's preferred "line" format, i.e., ending with LF (it's OK if they have or lack a CR before the LF as long as all three versions feeding into a three-way merge all have the same CR-before-LF-ness). You can use the "renormalize" setting, though, to make Git do a virtual pass through your smudge-and-then-clean filters before it does the three-way merging. You would need this only when existing commits (base and two branch tips) that you now intend to merge, were stored in a different way from the way you have all agreed now to keep inside the permanent commits. (I have not actually tried any of this, but the principle is straightforward enough.)
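To see why the CR-before-LF-ness has to agree, here is a small Python illustration. It is not Git's merge algorithm, just a line-by-line comparison with made-up file contents and helper names. It shows that a CRLF copy with one real edit looks changed on every line until both sides are normalized, which is roughly the effect the renormalize option (`git merge -X renormalize`, or the `merge.renormalize` setting) is meant to achieve.

```python
def lf_lines(data: bytes):
    """Split on LF only, so a CR before the LF stays part of the line."""
    return data.split(b"\n")


def changed_lines(old: bytes, new: bytes) -> int:
    """Count line positions at which the two versions differ."""
    a, b = lf_lines(old), lf_lines(new)
    differing = sum(1 for x, y in zip(a, b) if x != y)
    return differing + abs(len(a) - len(b))


def normalize(data: bytes) -> bytes:
    """Convert CRLF and lone CR to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# Made-up example: the same three-line script, once with LF endings and
# once as a CRLF copy in which only the second line was really edited.
base = b"x <- 1\ny <- 2\nz <- 3\n"
theirs = b"x <- 1\r\ny <- 20\r\nz <- 3\r\n"

raw = changed_lines(base, theirs)                           # 3 lines differ
renorm = changed_lines(normalize(base), normalize(theirs))  # 1 line differs
print(raw, renorm)
```

Without normalization every line counts as changed, so any edit on the other branch would conflict; after normalization only the real edit remains.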

[1] You can remove a commit, but to do so, you must also remove all of that commit's descendants. In practice, this means that commits that have been shared or pushed generally never go away; only private commits can go away or be replaced with new-and-improved commits. It's difficult to get everyone who has commit a9f3c34... to ditch it in favor of the new and improved 07115c3..., even if you can get this word out to everyone.