Re: Subversion history problem



On Mon, 2006-02-27 at 16:00 +0800, James Henstridge wrote:
> Ross Golder wrote:
> > I'm considering a (hopefully) cleaner alternative.
> > 
> > Assuming that on 2003-09-11, the server clock read 1997-01-04, we should
> > be able to calculate (roughly) the drift as a fixed number of
> > days/seconds. 
> 
> The full list of discontinuities I posted a link to covers more than
> just the 2003-09-11 -> 1997-01-04 problem.  That was the only one that
> affected the jhbuild module where I discovered that something had gone
> wrong.
> 
> There were a number of other occasions when the clock was set back to
> January 1997, which most likely reflects the default date set by the
> BIOS if the date gets lost when the CVS server got rebooted.
> 

OK, that wouldn't have worked then :)
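
For the record, the single-offset idea would have amounted to something
like the rough sketch below. The two dates are the ones quoted above;
only the day is known, so the figure is approximate, and `apply_drift`
is a made-up helper name, not anything in cvs2svn.

```python
from datetime import datetime

# Dates taken from the quoted text: what the calendar said versus what
# the server clock read. Time of day is unknown, so midnight is assumed.
real = datetime(2003, 9, 11)
skewed = datetime(1997, 1, 4)

drift = real - skewed
drift_seconds = drift.days * 86400

def apply_drift(timestamp):
    """Shift a skewed Unix timestamp forward by the fixed drift (hypothetical helper)."""
    return timestamp + drift_seconds

print("drift: %d days (%d seconds)" % (drift.days, drift_seconds))
```

Since the clock was reset on several occasions, one constant like this
could only ever repair one of the discontinuities.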

> > Around about line 4519 in cvs2svn is a piece of code that checks that
> > the previous revision's timestamp is lower than the current revision's
> > timestamp, and that the current revision's timestamp is lower than the
> > next revision's timestamp and spits out a warning if not. We can add our
> > hook here to adjust the current revision's timestamp by our fixed
> > amount.
> 
> I don't know the cvs2svn code base, so can't say for sure.  Is that part
> of the code working with CVS-style per-file revisions, or
> Subversion-style tree-wide revisions?
> 

Actually, the lines I pointed out were in the second pass. I was looking
in the wrong place. Since then, I've hacked the first pass a bit to do
the date resyncing using a slightly improved algorithm. The comments in
the patch attached should explain how it's different.

Before, it worked from the latest revision back to the first, checking
the continuity of the timestamps. If a previous revision was dated after
the current one, it changed the previous revision's timestamp to be a
second or two before the current one. The problem is that, on reaching
the clockskew in September 2003, it pushes all prior revisions back to
January 1997. You can see this here:

http://svn.gnome.org/viewsvn/gtkhtml/trunk/src/gtkhtml.c
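
To make the failure mode concrete, here is a minimal sketch of the old
backward resync on a toy single-file history. The name
`resync_backward` and the sample timestamps are invented for
illustration; the real cvs2svn code works on its own revision tables,
this just shows the same idea in a few lines.

```python
def resync_backward(times):
    """Walk from the latest revision to the first; whenever the previous
    revision is not strictly earlier than the current one, pull the
    previous one back to one second before the current one."""
    fixed = list(times)
    for i in range(len(fixed) - 1, 0, -1):
        if fixed[i - 1] >= fixed[i]:
            fixed[i - 1] = fixed[i] - 1
    return fixed

# Hypothetical timestamps in seconds: three good commits, one commit made
# while the clock had been reset far into the past, then a good one again.
history = [1000, 1010, 1020, 5, 1040]
print(resync_backward(history))
```

On this input the single skewed commit keeps its bogus time and every
earlier commit is dragged back to just before it, which matches what
the gtkhtml log shows.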

My version uses a different approach. When it finds the clockskew, it
works its way back up toward the latest revision, aligning them towards
the top (iyswim). Actually, it might be a useful upstream patch if it
works.
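
As a rough sketch of that forward approach, again on a toy list of
timestamps: `resync_forward` is an invented name, and the list-based
walk is a simplification of the per-revision links the patch uses, so
treat it as an illustration rather than the patch itself.

```python
def resync_forward(times):
    """Walk from the latest revision to the first. On finding a previous
    revision that is not earlier than the current one, push the current
    one to just after it, then ripple towards the latest revision until
    the next one is already later."""
    fixed = list(times)
    for i in range(len(fixed) - 1, 0, -1):
        if fixed[i - 1] >= fixed[i]:
            fixed[i] = fixed[i - 1] + 1
            j = i
            while j + 1 < len(fixed) and fixed[j + 1] <= fixed[j]:
                fixed[j + 1] = fixed[j] + 1
                j += 1
    return fixed

# Hypothetical timestamps: commits at 5, 6 and 7 were made while the
# clock was wrong; everything else is good.
history = [1000, 1010, 5, 6, 7, 1040]
print(resync_forward(history))
```

Only the skewed commits move, and they end up packed just after the
last good commit before the skew, so the older history keeps its dates.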

> If it is the latter, then is this before or after the revisions have
> been sorted?  I did notice that the dates in the Subversion import had
> been adjusted, but it looked like the adjustments were to give
> increasing time to the revision ordering that it had chosen.
> 
> Adjusting the times after sorting would not necessarily help if the
> revision sorting was done incorrectly beforehand.
> 

I copied two example modules down to my laptop and ran the latest
cvs2svn, with my patch, against them. The modules are 'jhbuild' and
'pyIDL' (the others listed look a bit big for my modem today). I've
processed them locally and uploaded them to container:/svn/gnome, so
they should be reflected in the viewcvs.

If I recall, you mentioned that Guilherme had manually edited the ',v'
files for jhbuild, so that's probably not a great example, as it will
have been fixed at source already. So, using pyIDL as a reference for
some spot-checks:

http://svn.gnome.org/viewsvn/pyIDL/trunk/src/pyIDL.py?view=log
http://svn.gnome.org/viewsvn/pyIDL/trunk/tests/Makefile.am?view=log

Your list showed these as having discontinuities before, but they seem
to be re-aligned OK after being re-migrated with the updated script.

> > It might need testing a few times on a couple of the identified modules
> > to fine-tune our fixed amount to a resolution of at least a few minutes
> > (or until there are no more warnings).
> 
> Sure.  The data I provided may help in checking if the solution works:
> if there are going to be problems, they'll most likely be visible at one
> of the discontinuities.
> 

Unfortunately, I'm still struggling to get a version of python > 2.2
onto container, so the cvs2svn script won't actually run for me on the
server at the moment :( Building a local python (2.3.4 or 2.4.2) in my
home directory gives the following error when cvs2svn is run:

Traceback (most recent call last):
  File "./cvs2svn", line 5291, in ?
    main()
  File "./cvs2svn", line 5283, in main
    convert(start_pass, end_pass)
  File "./cvs2svn", line 4903, in convert
    _passes[i]()
  File "./cvs2svn", line 4189, in pass1
    cd = CollectData()
  File "./cvs2svn", line 1423, in __init__
    DB_OPEN_NEW)
  File "./cvs2svn", line 810, in __init__
    self.db = anydbm.open(filename, mode)
  File "/home/users/rossg/local/lib/python2.4/anydbm.py", line 83, in open
    return mod.open(file, flag, mode)
  File "/home/users/rossg/local/lib/python2.4/dbhash.py", line 16, in open
    return bsddb.hashopen(file, flag, mode)
  File "/home/users/rossg/local/lib/python2.4/bsddb/__init__.py", line 285, in hashopen
    e = _openDBEnv()
  File "/home/users/rossg/local/lib/python2.4/bsddb/__init__.py", line 339, in _openDBEnv
    e.open('.', db.DB_PRIVATE | db.DB_CREATE | db.DB_THREAD | db.DB_INIT_LOCK | db.DB_INIT_MPOOL)
bsddb._db.DBError: (38, 'Function not implemented -- process-private: unable to initialize environment lock: Function not implemented')

Any ideas welcome :) Not sure exactly what causes it, but to me that
looks like it's an out-of-date db library, so it's probably more than
just python that needs upgrading on the server :( I Googled and found
someone else with the same problem (also RHEL3) but no solution.
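
One thing that might be worth trying, purely as an assumption: the
traceback shows anydbm choosing dbhash/bsddb, so a pure-Python dbm
backend would sidestep the Berkeley DB environment lock entirely. The
sketch below only probes that such a backend opens and round-trips a
value; `open_fallback_db` is a made-up name, and whether cvs2svn's
database layer would be happy (or fast enough) with it is untested.

```python
import os
import tempfile

try:
    import dbm.dumb as dumbdbm   # module name on Python 3
except ImportError:
    import dumbdbm               # module name on Python 2

def open_fallback_db(filename):
    """Open a new pure-Python dbm database, bypassing bsddb (hypothetical helper)."""
    return dumbdbm.open(filename, 'n')

# Probe: create a database in a scratch directory and read a value back.
tmpdir = tempfile.mkdtemp()
db = open_fallback_db(os.path.join(tmpdir, 'probe'))
db[b'key'] = b'value'
stored = db[b'key']
db.close()
print(stored)
```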

At the moment, my best option seems to be to hope and pray that Matt
finds time soonish to get at least one of the new servers installed with
RHEL4 and on-line next to container, so maybe we can make use of a
recent python on that for running the migration script. Otherwise, if I
can't find a usable historical version of cvs2svn that will still run on
python2.2, we might have to postpone the migration for a while, which
would be a shame.

--
Ross
Index: cvs2svn
===================================================================
--- cvs2svn	(revision 1803)
+++ cvs2svn	(working copy)
@@ -1683,59 +1683,72 @@
     # time before rev 1.35.  If we inserted 1.35 *first* (due to the time-
     # sorting), and then tried to insert 1.34, we'd be screwed.
 
-    # to perform the analysis, we'll simply visit all of the 'previous'
-    # links that we have recorded and validate that the timestamp on the
-    # previous revision is before the specified revision
+    # what we need to do is to work backwards, checking that each previous
+    # revision's timestamp is before the current one. On meeting a previous
+    # revision with a timestamp after the current one, we need to wind the
+    # current one (and any other related parts in this changeset) forward,
+    # and the next one and the next, until the next one is after the current
+    # one, then we carry on back towards the initial revision.
 
+    # btw, this could (did) happen because of clockskew in the history
+
     # if we have to resync some nodes, then we restart the scan. just keep
     # looping as long as we need to restart.
-    while 1:
-      for current, prev in self.prev_rev.items():
-        if not prev:
-          # no previous revision exists (i.e. the initial revision)
-          continue
-        t_c = self.rev_data[current][0]
-        t_p = self.rev_data[prev][0]
-        if t_p >= t_c:
-          # the previous revision occurred later than the current revision.
-          # shove the previous revision back in time (and any before it that
-          # may need to shift).
+    for current, prev in self.prev_rev.items():
+      if not prev:
+        # no previous revision exists (i.e. the initial revision)
+        continue
+      t_c = self.rev_data[current][0]
+      t_p = self.rev_data[prev][0]
+      if t_p >= t_c:
+        # the previous revision occurred later than the current revision.
+        # shove this revision forward past the previous one
+        self.rev_data[current][0] = t_p + 1  # new timestamp
+        self.rev_data[current][2] = t_c  # old timestamp
+        delta = t_p + 1 - t_c  # new timestamp minus old timestamp
+        msg =  "PASS1 RESYNC: '%s' (%s): old time='%s' new time='%s' delta=%ds" \
+              % (self.cvs_path, current,
+                time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(t_c)),
+                time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(t_p + 1)),
+                delta)
+        Log().write(LOG_VERBOSE, msg)
+        if (delta > COMMIT_THRESHOLD
+            or delta < (COMMIT_THRESHOLD * -1)):
+          str = "%s: Significant timestamp change for '%s' (%d seconds)"
+          Log().write(LOG_WARN,
+                      str % (warning_prefix, self.cvs_path, delta))
 
-          # We sync backwards and not forwards because any given CVS
-          # Revision has only one previous revision.  However, a CVS
-          # Revision can *be* a previous revision for many other
-          # revisions (e.g., a revision that is the source of multiple
-          # branches).  This becomes relevant when we do the secondary
-          # synchronization in pass 2--we can make certain that we
-          # don't resync a revision earlier than it's previous
-          # revision, but it would be non-trivial to make sure that we
-          # don't resync revision R *after* any revisions that have R
-          # as a previous revision.
-          while t_p >= t_c:
-            self.rev_data[prev][0] = t_c - 1	# new timestamp
-            self.rev_data[prev][2] = t_p	# old timestamp
-            delta = t_c - 1 - t_p
-            msg =  "PASS1 RESYNC: '%s' (%s): old time='%s' delta=%ds" \
-                  % (self.cvs_path, prev, time.ctime(t_p), delta)
-            Log().write(LOG_VERBOSE, msg)
-            if (delta > COMMIT_THRESHOLD
-                or delta < (COMMIT_THRESHOLD * -1)):
-              str = "%s: Significant timestamp change for '%s' (%d seconds)"
-              Log().write(LOG_WARN,
-                          str % (warning_prefix, self.cvs_path, delta))
-            current = prev
-            prev = self.prev_rev[current]
-            if not prev:
-              break
-            t_c = t_c - 1		# self.rev_data[current][0]
-            t_p = self.rev_data[prev][0]
+        self.tree_completed_reorder_clockskewed_revisions(current)
 
-          # break from the for-loop
-          break
-      else:
-        # finished the for-loop (no resyncing was performed)
-        return
+  def tree_completed_reorder_clockskewed_revisions(self, current):
+    "Adjust the timestamps for a set of patches where clockskew has thrown them out of sync."
+    
+    # work back towards the latest revision shoving them forward
+    # until the next one is already after the current one
+    while current:
+      try:
+        nextrev = self.next_rev[current]
+      except KeyError:
+        return
+      t_c = self.rev_data[current][0]
+      t_n = self.rev_data[nextrev][0]
+      if t_n > t_c:
+        return
 
+      # the next revision is not after the current revision.
+      # shove it forward to just after the current one
+      self.rev_data[nextrev][0] = t_c + 1  # new timestamp
+      self.rev_data[nextrev][2] = t_n  # old timestamp
+      delta = t_c + 1 - t_n  # new timestamp minus old timestamp
+      msg =  "PASS1 RESYNCED: '%s' (%s): old time='%s' new time='%s' delta=%ds" \
+            % (self.cvs_path, nextrev,
+                time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(t_n)),
+                time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(t_c + 1)),
+                delta)
+      Log().write(LOG_VERBOSE, msg)
+
+      current = nextrev
+	
   def set_revision_info(self, revision, log, text):
     timestamp, author, old_ts = self.rev_data[revision]
     digest = sha.new(log + '\0' + author).hexdigest()

