Discussion:
[BUG, 3.17-rc4] dentry still in use during unmount
Dave Chinner
2014-09-16 21:53:36 UTC
Hi Al,

One of my xfstest rigs tripped over this last night when running
xfs/301 on a pair of 4G ramdisks during an auto group run:

BUG: Dentry ffff8803c14fc870{i=0,n=dir} still in use (-127) [unmount of xfs ram1]
------------[ cut here ]------------
WARNING: CPU: 4 PID: 27856 at fs/dcache.c:1319 umount_check+0x7f/0x90()
Modules linked in:
CPU: 4 PID: 27856 Comm: umount Tainted: G W 3.17.0-rc4-dgc+ #479
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
0000000000000009 ffff88025eeefd80 ffffffff81cf7327 0000000000000000
ffff88025eeefdb8 ffffffff810933bd ffff8803c14fc870 ffff88011aeba800
ffffffff81db2860 ffff88023acf72c0 00000000002353d2 ffff88025eeefdc8
Call Trace:
[<ffffffff81cf7327>] dump_stack+0x45/0x56
[<ffffffff810933bd>] warn_slowpath_common+0x7d/0xa0
[<ffffffff8109349a>] warn_slowpath_null+0x1a/0x20
[<ffffffff811c1baf>] umount_check+0x7f/0x90
[<ffffffff811c2ea6>] d_walk+0x66/0x2c0
[<ffffffff811c1b30>] ? d_lru_del+0xa0/0xa0
[<ffffffff811c3286>] do_one_tree+0x26/0x40
[<ffffffff811c43ba>] shrink_dcache_for_umount+0x5a/0x90
[<ffffffff811aebe1>] generic_shutdown_super+0x21/0xf0
[<ffffffff811aefdc>] kill_block_super+0x3c/0x90
[<ffffffff811af309>] deactivate_locked_super+0x49/0x60
[<ffffffff811af8b6>] deactivate_super+0x46/0x60
[<ffffffff811cb04a>] mntput_no_expire+0xca/0x120
[<ffffffff811cc5ce>] SyS_umount+0x8e/0x100
[<ffffffff81d025e9>] system_call_fastpath+0x16/0x1b
---[ end trace 15254c3c565abf1a ]---
VFS: Busy inodes after unmount of ram1. Self-destruct in 5 seconds. Have a nice day...

I can't reproduce it easily:

$ sudo MKFS_OPTIONS="-m crc=1,finobt=1" ./check xfs/301
FSTYP -- xfs (debug)
PLATFORM -- Linux/x86_64 test4 3.17.0-rc4-dgc+
MKFS_OPTIONS -- -f -m crc=1,finobt=1 /dev/ram1
MOUNT_OPTIONS -- /dev/ram1 /mnt/scr

xfs/301 12s ... 12s
Ran: xfs/301
Passed all 1 tests
$

Even running it in a loop for 20 minutes hasn't triggered the error.
I haven't seen this before, so I thought I'd better give you a
heads-up just in case.

Cheers,

Dave.
--
Dave Chinner
***@fromorbit.com
Al Viro
2014-09-16 22:30:44 UTC
Post by Dave Chinner
Hi Al,
One of my xfstest rigs tripped over this last night when running
BUG: Dentry ffff8803c14fc870{i=0,n=dir} still in use (-127) [unmount of xfs ram1]
Umm... -127 == "already got past the beginning of __dentry_kill()". And if
it had been seen by d_walk() callback, it must have gotten past the point where
__dentry_kill() unlocks that sucker.
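
To spell out the arithmetic behind that number, here is a minimal userspace sketch. It assumes the dead marker is -128, which to my understanding is what lockref_mark_dead() stores in the refcount. The struct and function names are hypothetical stand-ins, not kernel API, and the thread does not establish who took the extra reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the refcount half of the kernel's struct lockref. */
struct model_lockref {
	int count;
};

/* Assumption: a dead refcount is marked by storing -128. */
#define MODEL_DEAD_COUNT (-128)

/* Models the kill path marking the refcount dead. */
void model_mark_dead(struct model_lockref *ref)
{
	ref->count = MODEL_DEAD_COUNT;
}

/* Models an unconditional increment that does not check for a dead count. */
void model_get(struct model_lockref *ref)
{
	ref->count++;
}

/* A negative count means the object has already entered the kill path. */
bool model_is_dead(const struct model_lockref *ref)
{
	return ref->count < 0;
}
```

Under that assumption, marking a refcount dead and then incrementing it once leaves -127, the value in the BUG line, and model_is_dead() still reports it as killed.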

Very interesting... I don't see how that could happen, TBH - __dentry_kill()
is called with parent and victim locked; it sets DCACHE_DENTRY_KILLED and
removes the victim from parent's ->d_subdirs before dropping either lock.
Moreover, the victim can't have any children at that point - it must have
had the last reference held by the caller of __dentry_kill() and each child
would've contributed to refcount.

And d_walk() goes through the list of children with parent kept locked.
It does unlock the parent after walking one level deeper, but on the
way back it
* checks that there had been no renames
* checks that child isn't marked with DCACHE_DENTRY_KILLED
after relocking the parent. In case of anything fishy it restarts the
whole thing with renames excluded. If those tests succeed, we are guaranteed
that we'll continue walking the parent's list of children with parent locked,
AFAICS, not that there could legitimately be anything playing with the
dentry tree modifications in parallel with fs shutdown...
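
The two checks made on the way back up can be sketched as a small userspace model. This is only an illustration of the decision described above, not the d_walk() source: the type, flag value, and function names are hypothetical, and the rename check is modelled as a comparison of two sequence numbers.

```c
#include <assert.h>

/* Hypothetical stand-in for DCACHE_DENTRY_KILLED; the value is arbitrary. */
#define WALK_DENTRY_KILLED 0x1u

/* Hypothetical stand-in for the part of struct dentry the check looks at. */
struct walk_dentry {
	unsigned int flags;
};

enum walk_ascend {
	WALK_CONTINUE,          /* keep walking the parent's list of children */
	WALK_RESTART_EXCLUSIVE  /* restart the whole walk with renames excluded */
};

/*
 * Decision made after the parent has been relocked on the way back up:
 * any rename since the walk started, or a child that has been killed in
 * the meantime, forces a restart.
 */
enum walk_ascend walk_ascend_check(const struct walk_dentry *child,
				   unsigned int seq_at_start,
				   unsigned int seq_now)
{
	if (seq_now != seq_at_start)
		return WALK_RESTART_EXCLUSIVE;
	if (child->flags & WALK_DENTRY_KILLED)
		return WALK_RESTART_EXCLUSIVE;
	return WALK_CONTINUE;
}
```

Only when both checks pass does the walk continue from the child's position in the parent's list, which is why a killed dentry should never be handed to the callback this way.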

It might be interesting to slap WARN_ON(dentry->d_flags & DCACHE_DENTRY_KILLED)
for dentry and target in __d_move() and for anon in __d_materialise_dentry(),
after dentry_lock_for_move() in both functions. And see if it triggers.
IOW, whether it's possible for doomed dentry to be readded to someone's
->d_subdirs after it has entered __dentry_kill().
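
A rough userspace model of what such a debug check would test, for illustration only. It is not the actual patch: dbg_warn_on() is a hypothetical stand-in for the kernel's WARN_ON(), the flag value is arbitrary, and dbg_move_check() stands for the point in __d_move() right after dentry_lock_for_move().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for DCACHE_DENTRY_KILLED; the value is arbitrary. */
#define DBG_DENTRY_KILLED 0x1u

/* Hypothetical stand-in for the part of struct dentry the check looks at. */
struct dbg_dentry {
	unsigned int flags;
};

/* Number of warnings raised so far, so that a caller can inspect it. */
int dbg_warnings;

/* Stand-in for WARN_ON(): report and count, but keep running. */
int dbg_warn_on(bool cond, const char *what)
{
	if (cond) {
		dbg_warnings++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Warn if either side of a move has already entered the kill path. */
void dbg_move_check(const struct dbg_dentry *dentry,
		    const struct dbg_dentry *target)
{
	dbg_warn_on(dentry->flags & DBG_DENTRY_KILLED,
		    "moving a killed dentry");
	dbg_warn_on(target->flags & DBG_DENTRY_KILLED,
		    "move target is a killed dentry");
}
```

If a warning like this ever fired in a real kernel, it would point at a doomed dentry being moved and so re-added to some parent's ->d_subdirs.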
Dave Chinner
2014-09-16 22:40:56 UTC
Ok, I'll add a debug patch to my test kernels that adds these and
I'll let you know if anything triggers.

Cheers,

Dave.
--
Dave Chinner
***@fromorbit.com