From:
[email protected]
Package: libsvn0
Version: 1.0.6-1.1
Severity: important
Recently (since last weekend) a recurring problem started plaguing the
SchoolTool Subversion repository at http://source.schooltool.org/. The
symptoms are quite different from the ones described in bugs #266314 and
#252974: all processes that try to access the repository (including svn,
svnadmin, svnserve, python for viewcvs.cgi, and apache2 with mod_dav_svn)
just hang in an infinite loop. Running strace shows that they all
execute the same loop, repeatedly calling
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
ltrace on those pids reports nothing.
fuser shows that all of those processes are accessing db.lock from the repository. lsof shows that they all hold read locks on that file.
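For anyone trying to reproduce the fuser check without fuser installed, here is a minimal sketch that walks /proc and lists the pids that have a given file open. The function name pids_with_open_file and the proc_root parameter are my own inventions for illustration; it assumes a Linux-style /proc layout and enough permissions to read other processes' fd directories. It does not report lock state the way lsof does.

```python
import os


def pids_with_open_file(target, proc_root="/proc"):
    """Return the sorted PIDs that hold an open descriptor on `target`.

    Hypothetical helper: relies on the Linux /proc/<pid>/fd symlinks.
    """
    # Resolve symlinks so that two different paths to the same file compare equal.
    target = os.path.realpath(target)
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                link = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed while we were looking
            if os.path.realpath(link) == target:
                pids.append(int(entry))
                break  # one match per process is enough
    return sorted(pids)
```

Something like pids_with_open_file("/var/lib/svn/schooltool/locks/db.lock") should then give roughly the same pid list that fuser reports for the lock file.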
Looking at the stack trace with gdb reveals that all of them have the
same topmost 9 frames:
#0 0x4036bdd2 in select () from /lib/libc.so.6
#1 0x401bc448 in db_xa_switch_4002 () from /usr/lib/libdb-4.2.so
#2 0x4019d5c9 in __memp_sync_int_4002 () from /usr/lib/libdb-4.2.so
#3 0x4019cd24 in __memp_sync_4002 () from /usr/lib/libdb-4.2.so
#4 0x401a46ee in __txn_checkpoint_4002 () from /usr/lib/libdb-4.2.so
#5 0x401a43cd in __txn_checkpoint_pp_4002 () from /usr/lib/libdb-4.2.so
#6 0x40048e60 in svn_fs_list_transactions () from /usr/lib/libsvn_fs-1.so.0
#7 0x40048f1c in svn_fs_list_transactions () from /usr/lib/libsvn_fs-1.so.0
#8 0x4004904f in svn_fs__retry_txn () from /usr/lib/libsvn_fs-1.so.0
Killing all processes that are accessing the repository does not help --
any newly started processes also hang. Stracing svnadmin verify shows
that the last syscalls performed before the select loop are:
...
stat64("/svn/schooltool/db/uuids", {st_mode=S_IFREG|0664, st_size=8192, ...}) = 0
open("/svn/schooltool/db/uuids", O_RDWR|O_LARGEFILE) = 11
fcntl64(11, F_SETFD, FD_CLOEXEC) = 0
fstat64(11, {st_mode=S_IFREG|0664, st_size=8192, ...}) = 0
time(NULL) = 1093011345
time([1093011345]) = 1093011345
stat64("/svn/schooltool/db/log.0000000275", {st_mode=S_IFREG|0664, st_size=590372, ...}) = 0
open("/svn/schooltool/db/log.0000000275", O_RDWR|O_CREAT|O_LARGEFILE, 0666) = 12
fcntl64(12, F_SETFD, FD_CLOEXEC) = 0
read(12, "\202\372\17\0\34\0\0\0$(\21\324\210\t\4\0\10\0\0\0\0\0"..., 28) = 28
_llseek(12, 590372, [590372], SEEK_SET) = 0
write(12, "\322\1\t\0R\0\0\0\275\23\255\336\2\0\0\0\265\201\7\200"..., 1002) = 1002
fsync(12) = 0
pwrite(7, "\370\0\0\0!\24\10\0\0\0\0\0b1\5\0\t\0\0\0\0\20\0\0\0\t"..., 4096, 0) = 4096
pwrite(6, "\20\1\0\0c\344\n\0\0\0\0\0b1\5\0\t\0\0\0\0\20\0\0\0\t\0"..., 4096, 0) = 4096
pwrite(5, "\371\0\0\0\266c\3\0\0\0\0\0b1\5\0\t\0\0\0\0\20\0\0\0\t"..., 4096, 0) = 4096
pwrite(8, "\23\1\0\0\23B\4\0\0\0\0\0b1\5\0\t\0\0\0\0\20\0\0\0\t\0"..., 4096, 0) = 4096
pwrite(10, "\23\1\0\0\220)\5\0\0\0\0\0b1\5\0\t\0\0\0\0\20\0\0\0\t\0"..., 4096, 0) = 4096
pwrite(9, "\22\1\0\0\362w\t\0\0\0\0\0b1\5\0\t\0\0\0\0\20\0\0\0\t\0"..., 4096, 0) = 4096
pwrite(11, "\0\0\0\0\1\0\0\0\0\0\0\0b1\5\0\t\0\0\0\0\20\0\0\0\t\0\0"..., 4096, 0) = 4096
pwrite(4, "\23\1\0\0\362Q\3\0\0\0\0\0b1\5\0\t\0\0\0\0\20\0\0\0\t\0"..., 4096, 0) = 4096
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
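In case it helps with spotting the wedge automatically, here is a rough sketch that counts how many consecutive sleep-style select calls end an strace log. The name trailing_sleep_polls is hypothetical, and the regular expression only matches the exact select line shown in this report; other strace versions may format it differently.

```python
import re

# Matches the timeout-only select call seen in the trace above, e.g.
#   select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
SLEEP_POLL = re.compile(
    r"^select\(0, NULL, NULL, NULL, \{\d+, \d+\}\)\s*= 0 \(Timeout\)$"
)


def trailing_sleep_polls(lines):
    """Count the consecutive sleep-style select calls at the end of `lines`."""
    count = 0
    for line in reversed(lines):
        if not SLEEP_POLL.match(line.strip()):
            break  # first line from the end that is not a sleep poll
        count += 1
    return count
```

A count that keeps growing between two snapshots of the same strace log would suggest the process is stuck in the loop described here rather than doing real work.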
The file descriptors at that time are (from ls -l /proc/$pid/fd):
0 -> /dev/pts/4
1 -> /dev/pts/4
2 -> /dev/pts/4
3 -> /var/lib/svn/schooltool/locks/db.lock
4 -> /var/lib/svn/schooltool/db/nodes
5 -> /var/lib/svn/schooltool/db/revisions
6 -> /var/lib/svn/schooltool/db/transactions
7 -> /var/lib/svn/schooltool/db/copies
8 -> /var/lib/svn/schooltool/db/changes
9 -> /var/lib/svn/schooltool/db/representations
10 -> /var/lib/svn/schooltool/db/strings
11 -> /var/lib/svn/schooltool/db/uuids
12 -> /var/lib/svn/schooltool/db/log.0000000275
Killing all processes that are accessing the repository and then running svnadmin recover helps.
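The workaround can be sketched as a small wrapper that refuses to run svnadmin recover while any process still holds the repository open. This is only an illustration: recover_repository, holder_pids and the injectable run parameter are names I made up, and the caller is assumed to have collected the pid list by some other means (fuser, for example).

```python
import subprocess


def recover_repository(repo_path, holder_pids, run=subprocess.run):
    """Run `svnadmin recover` on repo_path, but only if nothing holds it open.

    holder_pids: PIDs still accessing the repository (assumed to be
    gathered by the caller). `run` is injectable so the function can be
    exercised without a real repository.
    """
    if holder_pids:
        # Recovery needs exclusive access, so stop here instead of guessing.
        raise RuntimeError(
            "repository still in use by pids: %s"
            % ", ".join(str(p) for p in holder_pids)
        )
    # check=True makes a failed recovery raise CalledProcessError.
    return run(["svnadmin", "recover", repo_path], check=True)
```

On this machine the call would look like recover_repository("/var/lib/svn/schooltool", []) once all of the hung processes have been killed.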
It is possible, but not certain, that these wedges are triggered by
msnbot indexing the schooltool repository, as indicated by apache's
access.log. There are no problems registered in error.log at the time of
the wedge, although there are a number of error messages about problems
closing the Berkeley DB execution environment from earlier days when I
had resorted to killall and killall -9.
-- System Information:
Debian Release: 3.0
APT prefers testing
APT policy: (300, 'testing'), (200, 'unstable')
Architecture: i386 (i686)
Kernel: Linux 2.4.26-1-686
Locale: LANG=C, LC_CTYPE=lt_LT.UTF-8
Versions of packages libsvn0 depends on:
ii  libapr0        2.0.50-9         The Apache Portable Runtime
ii  libc6          2.3.2.ds1-13     GNU C Library: Shared libraries an
ii  libdb4.2       4.2.52-16        Berkeley v4.2 Database Libraries [
ii  libexpat1      1.95.6-8         XML parsing C library - runtime li
ii  libldap2       2.1.30-2         OpenLDAP libraries
ii  libneon24      0.24.7.dfsg-0.1  An HTTP and WebDAV client library
ii  libperl5.8     5.8.4-2          Shared Perl library.
ii  libssl0.9.7    0.9.7d-4         SSL shared libraries
ii  libswig1.3.21  1.3.21-5         Runtime support libraries for swig
ii  libxml2        2.6.11-3         GNOME XML library
ii  python2.3      2.3.4-5          An interactive high-level object-o
ii  zlib1g         1:1.2.1.1-5      compression library - runtime
-- no debconf information
Marius Gedminas
--
We have an advanced scalable groupware communication environment (email)
-- Alan Cox