Paul Burba
2011-07-18 15:48:24 UTC
http://blogs.collab.net/subversion/2008/07/subversion-merg/
I found the article to be a good overview of the issues. I think that we
need help from Mark. On the other hand, I have seen that Mark sometimes
makes discouraging comments. My work is apparently “hand wavey” and
“proprietary”. I’m used to this treatment because I have 25 developers who
work for me who often think that I am full of crap. However, it might have
a discouraging effect on other contributors. For example, you can see in
this great ticket thread -
http://subversion.tigris.org/issues/show_bug.cgi?id=2897 - he states "I do
not think it is possible in this design....I think we need to accept the
limitations of the current design and work towards doing the best we can
within that design." Apparently that was enough to kill progress. I think
we should keep a more open mind going forward.
I’m going to make some claims that some problems have “straightforward”
solutions. That doesn’t mean they are simple solutions. Handling all of
the merge cases is going to be hard. However, they are straightforward in
the sense that we can discuss the strategy at the high level used in Mark’s
article.
Let’s consider three issues: subtree mergeinfo, cyclic merge, and tree
change operations.
SUBTREE MERGEINFO
Mark notes that reintegrate does not work if you have subtree mergeinfo.
The subtrees potentially make the top-level mergeinfo inaccurate.
Hi Andy,
The blog you reference is three years old. A lot has changed since
then. You might want to take a look at
https://svn.apache.org/repos/asf/subversion/branches/1.7.x/CHANGES for
the items related to merging.
So, basically everyone that has looked at merge problems in the past four
years, including Mark, has tried to get rid of subtree mergeinfo.
On that note: As far back as 1.5.5 (12/2008) reintegrate merges
tolerated subtree mergeinfo if all previous sync merges occurred at
the root of the branch. The forthcoming 1.7 release supports
reintegrate merges even if some of the sync merges were done at the
subtree level, as long as the branch is effectively synced -- see this
issue for more info:
http://subversion.tigris.org/issues/show_bug.cgi?id=3577
This is why one of my first requests on your earlier thread was for
the *specific* use-cases where the current merge-tracking
implementation doesn't work (and preferably some test patches for
these use cases).
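As a reference point for this discussion, here is a minimal Python sketch of how an svn:mergeinfo value can be read and how subtree mergeinfo can be compared against the branch root. The function names and sample paths are hypothetical, and the comparison is deliberately naive: it only reports revisions recorded at the root but not on the subtree, and does not attempt the "effectively synced" check that reintegrate performs.

```python
def parse_mergeinfo(prop_value):
    """Parse an svn:mergeinfo value into {source_path: set_of_revisions}."""
    merged = {}
    for line in prop_value.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last colon so source paths containing ':' still parse.
        path, _, ranges = line.rpartition(":")
        revs = set()
        for item in ranges.split(","):
            # A trailing '*' marks a non-inheritable range; ignored here.
            item = item.strip().rstrip("*")
            if not item:
                continue
            if "-" in item:
                start, end = item.split("-")
                revs.update(range(int(start), int(end) + 1))
            else:
                revs.add(int(item))
        merged.setdefault(path, set()).update(revs)
    return merged


def missing_from_subtree(root_value, subtree_value, root_source, subtree_source):
    """Hypothetical helper: revisions recorded on the root but not the subtree."""
    root = parse_mergeinfo(root_value).get(root_source, set())
    sub = parse_mergeinfo(subtree_value).get(subtree_source, set())
    return root - sub


if __name__ == "__main__":
    # Sample values are made up for illustration.
    print(sorted(missing_from_subtree(
        "/trunk:3-8", "/trunk/lib:3-4,6-8", "/trunk", "/trunk/lib")))
```

A non-empty result only says the recorded ranges differ; whether the missing revisions actually touched the subtree is a separate question.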
It’s amazing that
Subversion still tries to support this feature. It can’t be supported in
NewMerge.
To clarify, when you say "This feature" you mean what?
A) The ability to do subtree merges at all
B) Recording mergeinfo describing subtree merges (i.e. subtree
merges are still allowed, but not tracked).
C) Other
In the following sections, we will also see that the mergeinfo data is too
sparse, and we need to replace it with something bigger and more extensible.
Sidebar: You might want to use another term besides "sparse" when
talking about mergeinfo granularity. We typically use "sparse" to
refer to the depth of a working copy (e.g. 'svn co %URL% --depth
immediates wc-path') or the operative depth of an operation (e.g. 'svn
propget svn:mergeinfo %URL% --depth empty'). Sparse merges and sparse
merge targets can both result in subtree mergeinfo, so there is the
potential for some confusion for anybody following this casually.
CYCLIC MERGE
The case where we merge back and forth between a development or deployment
branch, and trunk, is the base case for merge. It should be supported.
Subversion only supports it with special instructions. This is the “cyclic
merge” problem.
It seems that we have two basic ways to do a merge. We can grab all of the
changes that we are trying to merge in one big diff between the branch we
are merging from and the branch we are merging into - the reintegrate merge
as described in Mark’s article. Or, we can sequentially apply or “replay”
each of the changes that we want to merge into our working copy - the
“recursive” strategy that is the default for git.
If each of these sequential "replays" is a separate editor drive (see
the svn_delta_editor_t in
https://svn.apache.org/repos/asf/subversion/trunk/subversion/include/svn_delta.h)
you might need to consider the performance impact for merges over a
WAN. If a single editor drive turns into multiple editor drives it
will certainly be slower. How much slower obviously depends on a lot
of factors, but it is something to keep in mind since this isn't git
and we don't have the entire repository history locally available.
It seems to me that the “one big diff” and the replay strategy are closely
related. When you are replaying, you grab all of the changes in any
sequence of revisions that doesn’t include a merge as one big diff. So, the
current “one big diff” strategy is a special case of the replay strategy
that applies when there are no intermediate merges from other branches or
cherrypicks.
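The grouping described in the paragraph above can be sketched in a few lines of Python. This is only an illustration under the assumption that we already know which source revisions are merge commits; the function name and revision numbers are hypothetical and nothing here reflects how Subversion is implemented.

```python
def split_into_runs(revisions, merge_revisions):
    """Group revisions into (first, last) runs, isolating each merge commit."""
    runs = []
    current = []
    for rev in sorted(revisions):
        if rev in merge_revisions:
            # Close the run of ordinary revisions before the merge commit.
            if current:
                runs.append((current[0], current[-1]))
                current = []
            # A merge commit is replayed on its own.
            runs.append((rev, rev))
        else:
            current.append(rev)
    if current:
        runs.append((current[0], current[-1]))
    return runs


if __name__ == "__main__":
    # One intermediate merge at r13 splits the work into three replays.
    print(split_into_runs([10, 11, 12, 13, 14, 15], {13}))
    # With no intermediate merges the result is a single "one big diff".
    print(split_into_runs([10, 11, 12, 13, 14, 15], set()))
```

Each returned run would correspond to one diff to apply, so the number of runs is also a rough count of how many separate passes a replay would need.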
But wait! According to this article, we can’t use the replay strategy
because we are missing part of the replay. We lose information that was
used to resolve a merge when composing merge commits. If we had that
information, we could replay individual merges, and handle a higher
percentage of the cyclic merge cases.
This problem seems to have a straightforward solution. When we commit the
merge, we can stuff the changeset that represents the difference between the
merge, and the commit, into the merge_history. We just need an extensible
merge_history format to hold it.
It’s totally not clear to me why you need to say “reintegrate” when you
merge to trunk,
You don't need to; reintegrate is simply a shortcut for a two-URL
merge. The blog you reference explains this.
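To make the two-URL form concrete, the sketch below builds the equivalent 'svn merge' command line in Python. It assumes the last trunk revision synced to the branch is already known (Subversion works this out from mergeinfo), and the function name and URLs are hypothetical. It also leaves out the safety checks that reintegrate performs.

```python
def reintegrate_as_two_url_merge(trunk_url, branch_url, last_synced_rev, wc_path="."):
    """Return the argv for a two-URL merge that mimics a reintegrate.

    The left side is trunk at the last revision synced to the branch;
    the right side is the branch at HEAD. The difference between the two
    is applied to a trunk working copy at wc_path.
    """
    left = f"{trunk_url}@{last_synced_rev}"
    right = f"{branch_url}@HEAD"
    return ["svn", "merge", left, right, wc_path]


if __name__ == "__main__":
    # Hypothetical repository layout and revision number.
    argv = reintegrate_as_two_url_merge(
        "http://svn.example.com/repos/trunk",
        "http://svn.example.com/repos/branches/feature",
        1234,
        "trunk-wc",
    )
    print(" ".join(argv))
```

The list form can be handed to subprocess.run() as-is; it is printed here so nothing is executed against a real repository.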
and why you need to update the branch after you do a
reintegrate merge from it. The computer should be able to remember the
history of merges and it should be obvious which things have been merged and
which revisions have been committed on both branches. The only reason that
I can think of is that the mergeinfo is so sparse that the computer
doesn’t remember enough about the merge history. Would a bigger and more
extensible data format give us a straightforward way to solve that problem?
I think you are asking, at a very high level, "The svn:mergeinfo
property tracks merges down to the revision level. Would a
finer-grained tracking solution allow us to solve the remaining cyclic
merge problems and do away with the need for the reintegrate option
and the workflow that goes with it?"
If that is a fair representation of your core question then I can say,
at an equally high level, "yes, it probably can".
TREE CHANGE
We can identify tree changes by pattern matching. This is the same tactic
that git uses, without any other tree change tracking.
Could you provide a bit more explanation of what you mean here? For
those of us not familiar with the inner workings of git.
We can identify when
this match is successful because the match is applied, examined by the
merger, and then the merge is committed.
In this case we could write the
tree map into the merge_history so that we can map changes bi-directionally
during future merges without guessing again. This is another case of
saving information that we need to replay a merge.
I think we could get a similar effect by generating a move operation (normal
copy & delete form) as part of the merge. I think that this mapping would
need to be done by updates as well as by explicit merges.
EXPERTISE
Who on this list knows enough about the core algorithm used in merge to
critique these suggestions and point to places in the code or documentation?
If you haven't already read this I'd start here:
http://subversion.apache.org/docs/community-guide/
See svn_client_merge_peg4, svn_client_merge4, and
svn_client_merge_reintegrate in
https://svn.apache.org/repos/asf/subversion/trunk/subversion/include/svn_client.h
As I mentioned above, see the svn_delta_editor_t in
https://svn.apache.org/repos/asf/subversion/trunk/subversion/include/svn_delta.h
See all of https://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_client/merge.c.
Though much of this is dedicated to the current merge tracking
implementation, so you can probably skip a lot of it since you want to
be rid of it :-)
You might want to try some merges with the --ignore-ancestry option.
This disables merge tracking and performs the merge in what amounts to
a pre-1.5 manner and will allow you to get your head around the
essentials of how Subversion does a merge without worrying about
mergeinfo stuffs.
Paul