Discussion:
XFS Kernel 2.6.27.7 oopses
Ralf Liebenow
2009-01-30 22:23:59 UTC
Hello !

I heavily use XFS for an incremental backup server (using rsync's --link-dest option
to create hardlinks to unchanged files) and therefore have about 10 million files
on my 1 TB hard disk. To remove old versions, a nightly "rm -rf" deletes about a
million hardlinks/files.
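
[Editor's note: the hardlink-snapshot trick behind rsync --link-dest can be sketched with plain GNU coreutils; `cp -al` hardlinks every file, which is the same effect --link-dest produces for unchanged files. Paths here are made up for illustration.]

```shell
#!/bin/sh
# Minimal sketch of hardlink-based snapshots (illustrative paths).
# `cp -al` hardlinks every file, so an unchanged file shares one inode
# across snapshot trees -- the same effect rsync --link-dest produces.
set -e
mkdir -p src
echo "payload" > src/file.txt
cp -al src snap              # snapshot via hardlinks: no file data copied
stat -c %h src/file.txt      # link count is now 2 (src + snap)
```

With rsync the equivalent snapshot step would be something like `rsync -a --link-dest=../daily.1 source/ daily.0/`.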

After a while I had regular oopses, so I updated the system to make sure it's
on a current version.

It is now SuSE 11.1 (64-bit) with SuSE's kernel 2.6.27.7-9-default.

The server is a quad-core Intel 64-bit machine with 8 GB RAM running a 64-bit Linux.
(I have VMware Server 2 installed, so those modules can be seen in the kernel log,
but the oops also happens without them.)

Now the "rm -rf" job sometimes oopses the kernel and gets stuck (there is no
other measurable I/O traffic on that system). /proc/kmsg gives:

cat /proc/kmsg
<0>general protection fault: 0000 [1] SMP
<0>last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
<4>CPU 3
<4>Modules linked in: snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device binfmt_misc vmnet(N) vsock(N) vmci(N) vmmon(N) nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs microcode fuse loop dm_mod snd_hda_intel st r8169 snd_pcm snd_timer osst snd_page_alloc ppdev iTCO_wdt mii shpchp button rtc_cmos snd_hwdep pci_hotplug parport_pc rtc_core sky2 ohci1394 intel_agp rtc_lib snd i2c_i801 iTCO_vendor_support ieee1394 parport pcspkr i2c_core sg soundcore raid456 async_xor async_memcpy async_tx xor raid0 sd_mod crc_t10dif ehci_hcd uhci_hcd usbcore edd raid1 xfs fan ahci libata dock aic79xx scsi_transport_spi scsi_mod thermal processor thermal_sys hwmon
<4>Supported: No
<4>Pid: 5176, comm: xfssyncd Tainted: G 2.6.27.7-9-default #1
<4>RIP: 0010:[<ffffffff80230865>] [<ffffffff80230865>] __wake_up_common+0x29/0x76
<4>RSP: 0018:ffff880114df9d30 EFLAGS: 00010086
<4>RAX: 7fff8800255b8a70 RBX: ffff8800255b8a60 RCX: 0000000000000000
<4>RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff8800255b8a68
<4>RBP: ffff880114df9d60 R08: 7fff8800255b8a58 R09: 0000000000000282
<4>R10: 0000000000000002 R11: ffff8800255b87c0 R12: 0000000000000001
<4>R13: 0000000000000282 R14: ffff8800255b8a70 R15: 0000000000000000
<4>FS: 0000000000000000(0000) GS:ffff88012fba0ec0(0000) knlGS:0000000000000000
<4>CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>CR2: 00007f28d42a2000 CR3: 0000000124e34000 CR4: 00000000000006e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process xfssyncd (pid: 5176, threadinfo ffff880114df8000, task ffff88012bc1e0c0)
<4>Stack: 0000000300000000 ffff8800255b8a60 ffff8800255b8a68 0000000000000282
<4> ffff88012d802000 0000000000000001 ffff880114df9d90 ffffffff8023219a
<4> 0000000000000286 0000000000000000 ffff88006ef1d240 ffff88012aca3800
<4>Call Trace:
<4> [<ffffffff8023219a>] complete+0x38/0x4b
<4> [<ffffffffa00f5316>] xfs_iflush+0x73/0x2ab [xfs]
<4> [<ffffffffa010a7a2>] xfs_finish_reclaim+0x12a/0x168 [xfs]
<4> [<ffffffffa010a871>] xfs_finish_reclaim_all+0x91/0xcb [xfs]
<4> [<ffffffffa010925c>] xfs_syncsub+0x50/0x22b [xfs]
<4> [<ffffffffa0118a3a>] xfs_sync_worker+0x17/0x36 [xfs]
<4> [<ffffffffa01189d4>] xfssyncd+0x15d/0x1ac [xfs]
<4> [<ffffffff8025434d>] kthread+0x47/0x73
<4> [<ffffffff8020d7b9>] child_rip+0xa/0x11
<4>
<4>
<0>Code: c9 c3 55 48 89 e5 41 57 4d 89 c7 41 56 4c 8d 77 08 41 55 41 54 41 89 d4 53 48 83 ec 08 89 75 d4 89 4d d0 48 8b 47 08 4c 8d 40 e8 <49> 8b 40 18 48 8d 58 e8 eb 2d 45 8b 28 4c 89 f9 8b 55 d0 8b 75
<1>RIP [<ffffffff80230865>] __wake_up_common+0x29/0x76
<4> RSP <ffff880114df9d30>
<4>---[ end trace a069bd11f2b4e6ab ]---

It _always_ gets stuck at the same place, in "complete" called from xfssyncd, so I
don't think it's hardware-related.

I also ran xfs_repair after every oops and reboot, so the filesystem itself
should be consistent.

I initially used default settings for mkfs.xfs and mount. Now I use different
settings but get the same oops again, so it seems to be unrelated to them.

What do you recommend? Has this bug already been addressed in one of the
hundreds of fixes I've seen on the mailing list? Shall I try a stock 2.6.28
kernel?

Thanks in advance !

Ralf
--
theCode AG
HRB 78053, Amtsgericht Charlottenburg
USt-IdNr.: DE204114808
Executive board: Ralf Liebenow, Michael Oesterreich, Peter Witzel
Supervisory board chairman: Wolf von Jaduczynski
Oranienstr. 10-11, 10997 Berlin
fon +49 30 617 897-0 fax -10
***@theCo.de http://www.theCo.de
Dave Chinner
2009-02-01 00:37:44 UTC
Post by Ralf Liebenow
Hello !
I heavily use XFS for an incremental backup server (using rsync's --link-dest option
to create hardlinks to unchanged files) and therefore have about 10 million files
on my 1 TB hard disk. To remove old versions, a nightly "rm -rf" deletes about a
million hardlinks/files.
After a while I had regular oopses, so I updated the system to make sure it's
on a current version.
It is now a SuSE 11.1 64Bit with SuSE's Kernel 2.6.27.7-9-default
What kernel did you originally see this problem on?
Post by Ralf Liebenow
<4> [<ffffffff8023219a>] complete+0x38/0x4b
<4> [<ffffffffa00f5316>] xfs_iflush+0x73/0x2ab [xfs]
<4> [<ffffffffa010a7a2>] xfs_finish_reclaim+0x12a/0x168 [xfs]
<4> [<ffffffffa010a871>] xfs_finish_reclaim_all+0x91/0xcb [xfs]
<4> [<ffffffffa010925c>] xfs_syncsub+0x50/0x22b [xfs]
<4> [<ffffffffa0118a3a>] xfs_sync_worker+0x17/0x36 [xfs]
<4> [<ffffffffa01189d4>] xfssyncd+0x15d/0x1ac [xfs]
<4> [<ffffffff8025434d>] kthread+0x47/0x73
<4> [<ffffffff8020d7b9>] child_rip+0xa/0x11
That may be a use-after-free. I know Lachlan fixed a few in this
area, but I'm not sure what release those fixes ended up in....
Post by Ralf Liebenow
What do you recommend? Has this bug already been addressed in one of the
hundreds of fixes I've seen on the mailing list? Shall I try a stock 2.6.28
kernel?
Try the latest 2.6.28.x stable kernel (*not* the straight 2.6.28 release,
as there's a directory traversal bug that is fixed in 2.6.28.1) and
see if the problem persists.

Cheers,

Dave.
--
Dave Chinner
***@fromorbit.com
Ralf Liebenow
2009-02-05 05:38:47 UTC
Hello !

Finally I found the time to compile and test the latest stable 2.6.28.3 kernel,
but I can still reproduce it:

Feb 5 03:00:19 up kernel: general protection fault: 0000 [#1] SMP
Feb 5 03:00:19 up kernel: last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
Feb 5 03:00:19 up kernel: CPU 2
Feb 5 03:00:19 up kernel: Modules linked in: vmnet parport_pc vsock vmci vmmon nfsd lockd nfs_acl auth_rpcgss snd_pcm_oss sunrpc snd_mixer_oss exportfs snd_seq snd_seq_device binfmt_misc microcode fuse loop dm_mod snd_hda_intel osst st snd_pcm snd_timer snd_page_alloc ppdev shpchp rtc_cmos i2c_i801 rtc_core button snd_hwdep r8169 rtc_lib pcspkr ohci1394 intel_agp mii i2c_core parport sky2 pci_hotplug iTCO_wdt ieee1394 iTCO_vendor_support snd sg soundcore raid456 async_xor async_memcpy async_tx xor raid0 sd_mod crc_t10dif ehci_hcd uhci_hcd usbcore edd raid1 xfs fan ahci libata aic79xx scsi_transport_spi scsi_mod thermal processor thermal_sys hwmon [last unloaded: vmnet]
Feb 5 03:00:19 up kernel: Pid: 1462, comm: xfssyncd Not tainted 2.6.28.3-9-default #1
Feb 5 03:00:19 up kernel: RIP: 0010:[<ffffffff802327a1>] [<ffffffff802327a1>] __wake_up_common+0x29/0x76
Feb 5 03:00:19 up kernel: RSP: 0018:ffff88012e56fcf0 EFLAGS: 00010086
Feb 5 03:00:19 up kernel: RAX: 7fff8800255b8a70 RBX: ffff8800255b8a60 RCX: 0000000000000000
Feb 5 03:00:19 up kernel: RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff8800255b8a68
Feb 5 03:00:19 up kernel: RBP: ffff88012e56fd20 R08: 7fff8800255b8a58 R09: ffff880129d02e18
Feb 5 03:00:19 up kernel: R10: 0000000000000002 R11: 0000000300000000 R12: 0000000000000001
Feb 5 03:00:19 up kernel: R13: 0000000000000286 R14: ffff8800255b8a70 R15: 0000000000000000
Feb 5 03:00:19 up kernel: FS: 0000000000000000(0000) GS:ffff88012fb2e8c0(0000) knlGS:0000000000000000
Feb 5 03:00:19 up kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Feb 5 03:00:19 up kernel: CR2: 00007f075ee9ab00 CR3: 0000000000201000 CR4: 00000000000006e0
Feb 5 03:00:19 up kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 5 03:00:19 up kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb 5 03:00:19 up kernel: Process xfssyncd (pid: 1462, threadinfo ffff88012e56e000, task ffff88012c842640)
Feb 5 03:00:19 up kernel: Stack:
Feb 5 03:00:19 up kernel: 0000000300000000 ffff8800255b8a60 ffff8800255b8a68 0000000000000286
Feb 5 03:00:19 up kernel: ffff88012b922000 ffff88012a1eb000 ffff88012e56fd50 ffffffff8023410a
Feb 5 03:00:19 up kernel: ffff8800255b87c0 0000000000000000 ffff8800255b8980 ffff88004dc64140
Feb 5 03:00:19 up kernel: Call Trace:
Feb 5 03:00:20 up kernel: [<ffffffff8023410a>] complete+0x38/0x4c
Feb 5 03:00:20 up kernel: [<ffffffffa01a2424>] xfs_iflush+0x7a/0x2b2 [xfs]
Feb 5 03:00:20 up kernel: [<ffffffff802241cc>] ? default_spin_lock_flags+0x17/0x1b
Feb 5 03:00:20 up kernel: [<ffffffffa01b7cf9>] xfs_finish_reclaim+0x136/0x175 [xfs]
Feb 5 03:00:20 up kernel: [<ffffffffa01b7dd0>] xfs_finish_reclaim_all+0x98/0xd4 [xfs]
Feb 5 03:00:20 up kernel: [<ffffffffa01b694c>] xfs_syncsub+0x55/0x22f [xfs]
Feb 5 03:00:20 up kernel: [<ffffffffa01b6b68>] xfs_sync+0x42/0x47 [xfs]
Feb 5 03:00:20 up kernel: [<ffffffffa01c55fd>] xfs_sync_worker+0x1f/0x41 [xfs]
Feb 5 03:00:20 up kernel: [<ffffffffa01c558f>] xfssyncd+0x15d/0x1ac [xfs]
Feb 5 03:00:20 up kernel: [<ffffffffa01c5432>] ? xfssyncd+0x0/0x1ac [xfs]
Feb 5 03:00:20 up kernel: [<ffffffff802563e5>] kthread+0x49/0x76
Feb 5 03:00:20 up kernel: [<ffffffff8020d659>] child_rip+0xa/0x11
Feb 5 03:00:20 up kernel: [<ffffffff8025639c>] ? kthread+0x0/0x76
Feb 5 03:00:20 up kernel: [<ffffffff8020d64f>] ? child_rip+0x0/0x11
Feb 5 03:00:20 up kernel: Code: c9 c3 55 48 89 e5 41 57 4d 89 c7 41 56 4c 8d 77 08 41 55 41 54 41 89 d4 53 48 83 ec 08 89 75 d4 89 4d d0 48 8b 47 08 4c 8d 40 e8 <49> 8b 40 18 48 8d 58 e8 eb 2d 45 8b 28 4c 89 f9 8b 55 d0 8b 75
Feb 5 03:00:20 up kernel: RIP [<ffffffff802327a1>] __wake_up_common+0x29/0x76
Feb 5 03:00:20 up kernel: RSP <ffff88012e56fcf0>
Feb 5 03:00:20 up kernel: ---[ end trace a0fbe14899a3ce1c ]---

So it's not SuSE's fault, and this is the latest stable kernel from kernel.org ....

Hmmm ... can I do something to help you find the problem? I can
reproduce it by creating some million hardlinks to files and then removing some
million hardlinks with one "rm -rf".

The filesystem is 1 TB in size.
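
[Editor's note: this is not the original workload, but the failing pattern can be scaled down into a self-contained coreutils-only sketch. FILECOUNT is tiny here; reproducing the oops needs millions of entries, and `cp -al` stands in for the rsync --link-dest step.]

```shell
#!/bin/sh
# Scaled-down sketch of the workload: build a tree of files, hardlink
# the whole tree a second time, then bulk-unlink one tree with rm -rf.
set -e
FILECOUNT=${FILECOUNT:-100}   # millions on the real system
mkdir -p tree.1
i=0
while [ "$i" -lt "$FILECOUNT" ]; do
    echo "data" > "tree.1/f$i"
    i=$((i + 1))
done
cp -al tree.1 tree.0          # second tree of hardlinks (like --link-dest)
rm -rf tree.1                 # the bulk unlink that triggers the oops at scale
echo "files left in tree.0: $(ls tree.0 | wc -l)"
```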

Settings:
meta-data=/dev/sdd1              isize=256    agcount=32, agsize=7630937 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=244189984, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=65536  blocks=0, rtextents=0

[I originally had log version=1 but saw the same problem. The problem occurs
both with barriers on and with barriers off.]

I have not tried running the system with one CPU core yet; that may be something
I can check tomorrow ...

Thanks for your help
Ralf
Post by Dave Chinner
Post by Ralf Liebenow
Hello !
I heavily use XFS for an incremental backup server (using rsync's --link-dest option
to create hardlinks to unchanged files) and therefore have about 10 million files
on my 1 TB hard disk. To remove old versions, a nightly "rm -rf" deletes about a
million hardlinks/files.
After a while I had regular oopses, so I updated the system to make sure it's
on a current version.
It is now SuSE 11.1 (64-bit) with SuSE's kernel 2.6.27.7-9-default.
What kernel did you originally see this problem on?
Post by Ralf Liebenow
<4> [<ffffffff8023219a>] complete+0x38/0x4b
<4> [<ffffffffa00f5316>] xfs_iflush+0x73/0x2ab [xfs]
<4> [<ffffffffa010a7a2>] xfs_finish_reclaim+0x12a/0x168 [xfs]
<4> [<ffffffffa010a871>] xfs_finish_reclaim_all+0x91/0xcb [xfs]
<4> [<ffffffffa010925c>] xfs_syncsub+0x50/0x22b [xfs]
<4> [<ffffffffa0118a3a>] xfs_sync_worker+0x17/0x36 [xfs]
<4> [<ffffffffa01189d4>] xfssyncd+0x15d/0x1ac [xfs]
<4> [<ffffffff8025434d>] kthread+0x47/0x73
<4> [<ffffffff8020d7b9>] child_rip+0xa/0x11
That may be a use-after-free. I know Lachlan fixed a few in this
area, but I'm not sure what release those fixes ended up in....
Post by Ralf Liebenow
What do you recommend? Has this bug already been addressed in one of the
hundreds of fixes I've seen on the mailing list? Shall I try a stock 2.6.28
kernel?
Try the latest 2.6.28.x stable kernel (*not* the straight 2.6.28 release,
as there's a directory traversal bug that is fixed in 2.6.28.1) and
see if the problem persists.
Cheers,
Dave.
--
Dave Chinner
_______________________________________________
xfs mailing list
http://oss.sgi.com/mailman/listinfo/xfs
Dave Chinner
2009-02-10 09:50:45 UTC
Post by Ralf Liebenow
Hello !
Finally I found the time to compile and test the latest stable 2.6.28.3 kernel
OK.

....
Post by Ralf Liebenow
Hmmm ... can I do something to help you find the problem? I can
reproduce it by creating some million hardlinks to files and then removing some
million hardlinks with one "rm -rf".
Interesting. Sounds like a race between writing back the inode and
it being freed. How long does it take to reproduce the problem?
Do you have a script that you could share?

Next question - what is the setting of ikeep/noikeep in your mount
options? If you dump /proc/self/mounts on 2.6.28 it will tell us
if inode clusters are being deleted or not....
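
[Editor's note: to make the check concrete, on 2.6.28 the effective XFS options appear in /proc/self/mounts; grepping the filesystem's line shows whether noikeep is active. The sample line below is fabricated for illustration, not taken from the reporter's system.]

```shell
#!/bin/sh
# Show how the ikeep/noikeep state would be read from /proc/self/mounts.
# The sample line is fabricated; on a live box you would use:
#   grep ' /backup ' /proc/self/mounts
line='/dev/sdc1 /backup xfs rw,noikeep,logbufs=8,noquota 0 0'
case "$line" in
    *noikeep*) echo "noikeep: inode clusters are freed on delete" ;;
    *)         echo "ikeep: inode clusters are kept" ;;
esac
```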

Cheers,

Dave.
--
Dave Chinner
***@fromorbit.com
Ralf Liebenow
2009-02-10 19:41:03 UTC
Hello !

Here are my mount settings:

cat /proc/self/mounts

..
/dev/sdc1 /backup xfs rw,nobarrier,logbufs=8,logbsize=256k,noquota 0 0

This is my current setting, but it also happened before I changed the
settings. Before, I had this:

/dev/sdc1 /backup xfs rw,noquota 0 0

The problem occurred independently of the settings I changed.

Shall I try setting ikeep/noikeep (what's the default for that)?

At the moment I have no time to create a minimal script to
reproduce it, but essentially I do the following:
- I have a tree with about 2 million files in it called daily.1
- I create a new tree daily.0 with rsync --link-dest=daily.1,
so that most of those million files (the unchanged ones)
just get hardlinked to the ones in daily.1, and only the
changed ones are created anew.
- Every day daily.1 gets renamed to daily.2 and daily.0 gets
renamed to daily.1 (currently I rotate up to daily.14).
The oldest daily.X folder gets removed by "rm -rf", which
is where the oops sometimes (not every time, but often
enough to reproduce) happens.

So the setting is: I have about 2 million files, and most of
them are multiply hardlinked, so I have more than 20 million inodes
on this system. Every night about 2 million of those links
get removed, most of them pointing to files which have other
hardlinks and therefore are not really removed.
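
[Editor's note: the rotation described above, sketched as a script. The directory names are the ones from this thread; DAYS would be 14 on the real system, and daily.0 is assumed to have just been created by the rsync --link-dest run.]

```shell
#!/bin/sh
# Sketch of the nightly rotation: drop the oldest snapshot, then shift
# daily.N-1 -> daily.N down to daily.0 -> daily.1.
set -e
DAYS=${DAYS:-14}
rm -rf "daily.$DAYS"          # the bulk "rm -rf" where the oops hits
d=$DAYS
while [ "$d" -gt 0 ]; do
    prev=$((d - 1))
    if [ -d "daily.$prev" ]; then
        mv "daily.$prev" "daily.$d"
    fi
    d=$prev
done
```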
Post by Dave Chinner
How long does it take to reproduce the problem?
On my system I just need to make a new rsync and remove
some million files/hardlinks, but it takes some hours until it happens.
Sometimes it even runs through successfully ...

As I said before, xfs_check/xfs_repair does not detect any
inconsistencies after the problem happens. (But the rm process
hangs and the filesystem cannot be unmounted any more.)

I need to see if the problem is tied to the massive hardlinking,
or if it can be reproduced just by creating 20 million files
and removing them in one sweep ... I will check it when I have
the time.

Greets
Ralf
Post by Dave Chinner
Post by Ralf Liebenow
Hello !
Finally I found the time to compile and test the latest stable 2.6.28.3 kernel
OK.
.....
Post by Ralf Liebenow
Hmmm ... can I do something to help you find the problem? I can
reproduce it by creating some million hardlinks to files and then removing some
million hardlinks with one "rm -rf".
Interesting. Sounds like a race between writing back the inode and
it being freed. How long does it take to reproduce the problem?
Do you have a script that you could share?
Next question - what is the setting of ikeep/noikeep in your mount
options? If you dump /proc/self/mounts on 2.6.28 it will tell us
if inode clusters are being deleted or not....
Cheers,
Dave.
--
Dave Chinner
Ralf Liebenow
2009-02-17 12:33:46 UTC
Hello !

More testing reveals the same problem with a different oops.
I did the remove again, and that worked without an oops, but the oops
happened shortly afterwards, when the machine needed to swap/reorganize memory
and kswapd tried to clean up/reclaim inode space.

It looks like there are invalid (nulled) inodes in a (freed?) inode
list, which generates oopses whenever a process tries to clean up/reclaim them.

Is there a debugging/compile-time option I can use to check that an
inode pointer is valid and usable?

Thanks !!

Ralf

Feb 17 12:13:53 up kernel: general protection fault: 0000 [#1] SMP
Feb 17 12:13:53 up kernel: last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
Feb 17 12:13:53 up kernel: CPU 1
Feb 17 12:13:53 up kernel: Modules linked in: vmnet vsock vmci vmmon snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device binfmt_misc nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs microcode fuse loop dm_mod snd_hda_intel osst snd_pcm st snd_timer rtc_cmos ppdev snd_page_alloc shpchp r8169 snd_hwdep parport_pc rtc_core i2c_i801 ohci1394 iTCO_wdt snd parport mii intel_agp button rtc_lib ieee1394 pcspkr pci_hotplug iTCO_vendor_support i2c_core sky2 sg soundcore raid456 async_xor async_memcpy async_tx xor raid0 sd_mod crc_t10dif ehci_hcd uhci_hcd usbcore edd raid1 xfs fan ahci libata aic79xx scsi_transport_spi scsi_mod thermal processor thermal_sys hwmon
Feb 17 12:13:53 up kernel: Pid: 38, comm: kswapd0 Not tainted 2.6.28.3-9-default #1
Feb 17 12:13:53 up kernel: RIP: 0010:[<ffffffffa01a1cf3>] [<ffffffffa01a1cf3>] xfs_idestroy_fork+0x1f/0xca [xfs]
Feb 17 12:13:53 up kernel: RSP: 0018:ffff88012bb05bd0 EFLAGS: 00010202
Feb 17 12:13:53 up kernel: RAX: ffff8800813dcb80 RBX: 1000000000000000 RCX: ffff8800813dcb00
Feb 17 12:13:53 up kernel: RDX: ffff8800813dcb80 RSI: 0000000000000001 RDI: ffff8800813dcb00
Feb 17 12:13:53 up kernel: RBP: ffff88012bb05bf0 R08: ffff88012bb05d1b R09: a55a5a5a5a5a5a5a
Feb 17 12:13:53 up kernel: R10: ffa5a5a5a5a5a5a5 R11: 0000000300000000 R12: ffff8800813dcb00
Feb 17 12:13:53 up kernel: R13: 0000000000000001 R14: ffff88012bb05d1b R15: ffff88012dc81000
Feb 17 12:13:53 up kernel: FS: 0000000000000000(0000) GS:ffff88012fac22c0(0000) knlGS:0000000000000000
Feb 17 12:13:53 up kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Feb 17 12:13:53 up kernel: CR2: 00007f9e560ef000 CR3: 00000000993b2000 CR4: 00000000000006e0
Feb 17 12:13:53 up kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 17 12:13:53 up kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb 17 12:13:53 up kernel: Process kswapd0 (pid: 38, threadinfo ffff88012bb04000, task ffff88012bb02180)
Feb 17 12:13:53 up kernel: Stack:
Feb 17 12:13:53 up kernel: ffff8800813dcb00 ffff8800813dcb00 ffff8800813dcb00 ffff88009fa38240
Feb 17 12:13:53 up kernel: ffff88012bb05c20 ffffffffa01a1dec ffff8800813dcb00 ffff8800813dcb00
Feb 17 12:13:53 up kernel: ffff88009fa38240 ffff88012bb05d1b ffff88012bb05c40 ffffffffa019f4c6
Feb 17 12:13:53 up kernel: Call Trace:
Feb 17 12:13:53 up kernel: [<ffffffffa01a1dec>] xfs_idestroy+0x4e/0xbc [xfs]
Feb 17 12:13:53 up kernel: [<ffffffffa019f4c6>] xfs_ireclaim+0x83/0x87 [xfs]
Feb 17 12:13:53 up kernel: [<ffffffffa01b7d5e>] xfs_finish_reclaim+0x167/0x175 [xfs]
Feb 17 12:13:53 up kernel: [<ffffffffa01b7eb6>] xfs_reclaim+0x76/0x10e [xfs]
Feb 17 12:13:53 up kernel: [<ffffffffa01c41db>] xfs_fs_clear_inode+0xf1/0x115 [xfs]
Feb 17 12:13:53 up kernel: [<ffffffff802d225f>] clear_inode+0x79/0xd2
Feb 17 12:13:53 up kernel: [<ffffffff802d236f>] dispose_list+0x68/0x138
Feb 17 12:13:53 up kernel: [<ffffffff802d264a>] shrink_icache_memory+0x20b/0x241
Feb 17 12:13:53 up kernel: [<ffffffff802961eb>] shrink_slab+0xe3/0x158
Feb 17 12:13:53 up kernel: [<ffffffff802969b2>] kswapd+0x4b2/0x63d
Feb 17 12:13:53 up kernel: [<ffffffff80294011>] ? isolate_pages_global+0x0/0x22d
Feb 17 12:13:53 up kernel: [<ffffffff80256758>] ? autoremove_wake_function+0x0/0x38
Feb 17 12:13:53 up kernel: [<ffffffff80296500>] ? kswapd+0x0/0x63d
Feb 17 12:13:53 up kernel: [<ffffffff802563e5>] kthread+0x49/0x76
Feb 17 12:13:53 up kernel: [<ffffffff8020d659>] child_rip+0xa/0x11
Feb 17 12:13:53 up kernel: [<ffffffff8025639c>] ? kthread+0x0/0x76
Feb 17 12:13:53 up kernel: [<ffffffff8020d64f>] ? child_rip+0x0/0x11
Feb 17 12:13:53 up kernel: Code: be 03 00 00 00 e8 9a 24 09 e0 c9 c3 55 48 89 e5 41 55 41 89 f5 41 54 49 89 fc 53 48 8d 5f 60 48 83 ec 08 85 f6 74 04 48 8b 5f 58 <48> 8b 7b 08 48 85 ff 74 0d e8 81 9e 01 00 48 c7 43 08 00 00 00
Feb 17 12:13:53 up kernel: RIP [<ffffffffa01a1cf3>] xfs_idestroy_fork+0x1f/0xca [xfs]
Feb 17 12:13:53 up kernel: RSP <ffff88012bb05bd0>
Feb 17 12:13:53 up kernel: ---[ end trace 564bbbd2e5103836 ]---
Post by Dave Chinner
Post by Ralf Liebenow
Hello !
Finally I found the time to compile and test the latest stable 2.6.28.3 kernel
OK.
.....
Post by Ralf Liebenow
Hmmm ... can I do something to help you find the problem? I can
reproduce it by creating some million hardlinks to files and then removing some
million hardlinks with one "rm -rf".
Interesting. Sounds like a race between writing back the inode and
it being freed. How long does it take to reproduce the problem?
Do you have a script that you could share?
Next question - what is the setting of ikeep/noikeep in your mount
options? If you dump /proc/self/mounts on 2.6.28 it will tell us
if inode clusters are being deleted or not....
Cheers,
Dave.
--
Dave Chinner
Christoph Hellwig
2009-02-10 09:56:12 UTC
Post by Ralf Liebenow
Hmmm ... can I do something to help you find the problem? I can
reproduce it by creating some million hardlinks to files and then removing some
million hardlinks with one "rm -rf".
Can you isolate that test case into a simple shell script?