Discussion:
[PATCH v3 00/10] xfs_ioc_bulkstat code refactoring and consolidation
Jeff Liu
2014-06-03 09:17:42 UTC
Permalink
Hi folk,

This is the revised patches for xfs_ioc_bulkstat consolidation and code
refactoring. As per Christoph's comments, I'm not include the per AG
inumber patch in this series given that I don't actually introduce the
relevant inumbers interface now. Similar to that reason, I also dropped
the per AG bulkstat patch, it would be included in parallel quota check
series.


v3->v2:
- one major bug fix is at xfs_bulkstat_ag_ichunk() regarding the user buffer
pointer operations, it should be defined as a pointer-to-pointer since it
would be updated inside xfs_bulkstat_ag_ichunk().

- separate xfs_inumber consolidate patch into two patches, the first one
fix the formater function return value and consolidate the codes, another
one does the actual logic changes for better error handling.

- Add a separate patch to get rid of the redundant user buffer count
checks at xfs_bulkstat()

- fixed agino calculation issue at xfs_bulkstat_grab_ichunk().

v2: http://oss.sgi.com/archives/xfs/2014-04/msg00554.html
v1: http://oss.sgi.com/archives/xfs/2013-12/msg00901.html


Any comments are welcome!


Cheers,
-Jeff
Dave Chinner
2014-07-29 23:27:48 UTC
Permalink
Post by Jeff Liu
Hi folk,
This is the revised patches for xfs_ioc_bulkstat consolidation and code
refactoring. As per Christoph's comments, I'm not include the per AG
inumber patch in this series given that I don't actually introduce the
relevant inumbers interface now. Similar to that reason, I also dropped
the per AG bulkstat patch, it would be included in parallel quota check
series.
- one major bug fix is at xfs_bulkstat_ag_ichunk() regarding the user buffer
pointer operations, it should be defined as a pointer-to-pointer since it
would be updated inside xfs_bulkstat_ag_ichunk().
- separate xfs_inumber consolidate patch into two patches, the first one
fix the formater function return value and consolidate the codes, another
one does the actual logic changes for better error handling.
- Add a separate patch to get rid of the redundant user buffer count
checks at xfs_bulkstat()
- fixed agino calculation issue at xfs_bulkstat_grab_ichunk().
v2: http://oss.sgi.com/archives/xfs/2014-04/msg00554.html
v1: http://oss.sgi.com/archives/xfs/2013-12/msg00901.html
Any comments are welcome!
Hi Jeff, I ported this to the current dev tree based on the
xfs-libxfs-restructure branch, and I keep seeing fsstress failing
with memory corruption after random bulkstat ioctls. I see regular
failures with generic/013, generic/068, xfs/167 and the other
fstress tests also randomly fail. The typical failure is glibc
detected memory heap corruption on freeing the bulkstat structure
after the ioctl:

generic/068 42s ...*** Error in `./ltp/fsstress': double free or corruption (!prev): 0x00007f0224000b70 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x7aa26)[0x7f022bce3a26]
/lib/x86_64-linux-gnu/libc.so.6(+0x7b7a3)[0x7f022bce47a3]
./ltp/fsstress[0x40260e]
./ltp/fsstress[0x402db5]
./ltp/fsstress[0x402359]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f022bc8a995]
./ltp/fsstress[0x402477]
======= Memory map: ========
00400000-0040c000 r-xp 00000000 08:01 511772 /home/dave/src/xfstests-dev/ltp/fsstress
0060c000-0060d000 rw-p 0000c000 08:01 511772 /home/dave/src/xfstests-dev/ltp/fsstress
0060d000-0062e000 rw-p 00000000 00:00 0 [heap]
7f0224000000-7f0224021000 rw-p 00000000 00:00 0
7f0224021000-7f0228000000 ---p 00000000 00:00 0
7f022ba53000-7f022ba68000 r-xp 00000000 08:01 244777 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f022ba68000-7f022bc68000 ---p 00015000 08:01 244777 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f022bc68000-7f022bc69000 rw-p 00015000 08:01 244777 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f022bc69000-7f022be0b000 r-xp 00000000 08:01 245384 /lib/x86_64-linux-gnu/libc-2.17.so
7f022be0b000-7f022c00b000 ---p 001a2000 08:01 245384 /lib/x86_64-linux-gnu/libc-2.17.so
7f022c00b000-7f022c00f000 r--p 001a2000 08:01 245384 /lib/x86_64-linux-gnu/libc-2.17.so
7f022c00f000-7f022c011000 rw-p 001a6000 08:01 245384 /lib/x86_64-linux-gnu/libc-2.17.so
7f022c011000-7f022c015000 rw-p 00000000 00:00 0
7f022c015000-7f022c02c000 r-xp 00000000 08:01 245380 /lib/x86_64-linux-gnu/libpthread-2.17.so
7f022c02c000-7f022c22b000 ---p 00017000 08:01 245380 /lib/x86_64-linux-gnu/libpthread-2.17.so
7f022c22b000-7f022c22c000 r--p 00016000 08:01 245380 /lib/x86_64-linux-gnu/libpthread-2.17.so
7f022c22c000-7f022c22d000 rw-p 00017000 08:01 245380 /lib/x86_64-linux-gnu/libpthread-2.17.so
7f022c22d000-7f022c231000 rw-p 00000000 00:00 0
7f022c231000-7f022c232000 r-xp 00000000 08:01 325472 /lib/x86_64-linux-gnu/libaio.so.1.0.1
7f022c232000-7f022c431000 ---p 00001000 08:01 325472 /lib/x86_64-linux-gnu/libaio.so.1.0.1
7f022c431000-7f022c432000 r--p 00000000 08:01 325472 /lib/x86_64-linux-gnu/libaio.so.1.0.1
7f022c432000-7f022c433000 rw-p 00001000 08:01 325472 /lib/x86_64-linux-gnu/libaio.so.1.0.1
7f022c433000-7f022c437000 r-xp 00000000 08:01 324661 /lib/x86_64-linux-gnu/libattr.so.1.1.0
7f022c437000-7f022c636000 ---p 00004000 08:01 324661 /lib/x86_64-linux-gnu/libattr.so.1.1.0
7f022c636000-7f022c637000 r--p 00003000 08:01 324661 /lib/x86_64-linux-gnu/libattr.so.1.1.0
7f022c637000-7f022c638000 rw-p 00004000 08:01 324661 /lib/x86_64-linux-gnu/libattr.so.1.1.0
7f022c638000-7f022c659000 r-xp 00000000 08:01 245381 /lib/x86_64-linux-gnu/ld-2.17.so
7f022c844000-7f022c848000 rw-p 00000000 00:00 0
7f022c855000-7f022c856000 rw-p 00000000 00:00 0
7f022c856000-7f022c859000 rw-p 00000000 00:00 0
7f022c859000-7f022c85a000 r--p 00021000 08:01 245381 /lib/x86_64-linux-gnu/ld-2.17.so
7f022c85a000-7f022c85c000 rw-p 00022000 08:01 245381 /lib/x86_64-linux-gnu/ld-2.17.so
7fffd18b7000-7fffd18d8000 rw-p 00000000 00:00 0 [stack]
7fffd19fc000-7fffd19fe000 r-xp 00000000 00:00 0 [vdso]
7fffd19fe000-7fffd1a00000 r--p 00000000 00:00 0 [vvar]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]


I haven't been able to track down the issue yet, so could you pass
an eye over my updated patchset and see if you can spot the mistake
I made? The patchset is in the xfs-bulkstat-refactor-NEEDS-FIXING
branch in the xfsdev repo which I just pushed.

Cheers,

Dave.
--
Dave Chinner
***@fromorbit.com
Jeff Liu
2014-07-31 07:55:43 UTC
Permalink
Hi Dave,
Post by Dave Chinner
Post by Jeff Liu
Hi folk,
This is the revised patches for xfs_ioc_bulkstat consolidation and code
refactoring. As per Christoph's comments, I'm not include the per AG
inumber patch in this series given that I don't actually introduce the
relevant inumbers interface now. Similar to that reason, I also dropped
the per AG bulkstat patch, it would be included in parallel quota check
series.
- one major bug fix is at xfs_bulkstat_ag_ichunk() regarding the user buffer
pointer operations, it should be defined as a pointer-to-pointer since it
would be updated inside xfs_bulkstat_ag_ichunk().
- separate xfs_inumber consolidate patch into two patches, the first one
fix the formater function return value and consolidate the codes, another
one does the actual logic changes for better error handling.
- Add a separate patch to get rid of the redundant user buffer count
checks at xfs_bulkstat()
- fixed agino calculation issue at xfs_bulkstat_grab_ichunk().
v2: http://oss.sgi.com/archives/xfs/2014-04/msg00554.html
v1: http://oss.sgi.com/archives/xfs/2013-12/msg00901.html
Any comments are welcome!
Hi Jeff, I ported this to the current dev tree based on the
xfs-libxfs-restructure branch, and I keep seeing fsstress failing
with memory corruption after random bulkstat ioctls. I see regular
failures with generic/013, generic/068, xfs/167 and the other
fstress tests also randomly fail. The typical failure is glibc
detected memory heap corruption on freeing the bulkstat structure
generic/068 42s ...*** Error in `./ltp/fsstress': double free or corruption (!prev): 0x00007f0224000b70 ***
======= Backtrace: =========
Sorry for my late response and thanks for help porting this patch series.

Now I can reproduce this issue frequently with generic/013, will try to fix
it ASAP.

Cheers,
-Jeff
Jeff Liu
2014-08-01 06:09:12 UTC
Permalink
Post by Jeff Liu
Hi Dave,
Post by Dave Chinner
Post by Jeff Liu
Hi folk,
This is the revised patches for xfs_ioc_bulkstat consolidation and code
refactoring. As per Christoph's comments, I'm not include the per AG
inumber patch in this series given that I don't actually introduce the
relevant inumbers interface now. Similar to that reason, I also dropped
the per AG bulkstat patch, it would be included in parallel quota check
series.
- one major bug fix is at xfs_bulkstat_ag_ichunk() regarding the user buffer
pointer operations, it should be defined as a pointer-to-pointer since it
would be updated inside xfs_bulkstat_ag_ichunk().
- separate xfs_inumber consolidate patch into two patches, the first one
fix the formater function return value and consolidate the codes, another
one does the actual logic changes for better error handling.
- Add a separate patch to get rid of the redundant user buffer count
checks at xfs_bulkstat()
- fixed agino calculation issue at xfs_bulkstat_grab_ichunk().
v2: http://oss.sgi.com/archives/xfs/2014-04/msg00554.html
v1: http://oss.sgi.com/archives/xfs/2013-12/msg00901.html
Any comments are welcome!
Hi Jeff, I ported this to the current dev tree based on the
xfs-libxfs-restructure branch, and I keep seeing fsstress failing
with memory corruption after random bulkstat ioctls. I see regular
failures with generic/013, generic/068, xfs/167 and the other
fstress tests also randomly fail. The typical failure is glibc
detected memory heap corruption on freeing the bulkstat structure
generic/068 42s ...*** Error in `./ltp/fsstress': double free or corruption (!prev): 0x00007f0224000b70 ***
======= Backtrace: =========
Sorry for my late response and thanks for help porting this patch series.
Now I can reproduce this issue frequently with generic/013, will try to fix
it ASAP.
Hi Dave,

Could you please check the patch below for this issue?


From: Jie Liu <***@oracle.com>
Subject: xfs: always updating acp elements at xfs_bulkstat_ag_ichunk

After processing the inodes in chunk, we should always updating the
last inode number and the left user buffer info no matter anything
wrong while formatting an inode to the user buffer. Otherwise, the
related info might be updated improperly at xfs_bulkstat() for the
next round of inode processing.

Signed-off-by: Jie Liu <***@oracle.com>
---
fs/xfs/xfs_itable.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index 9959a05..f71be9c 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -328,11 +328,9 @@ xfs_bulkstat_ag_ichunk(
lastino = ino;
}

- if (!error) {
- acp->ac_lastino = lastino;
- acp->ac_ubleft = ubleft;
- acp->ac_ubelem = ubelem;
- }
+ acp->ac_lastino = lastino;
+ acp->ac_ubleft = ubleft;
+ acp->ac_ubelem = ubelem;

return error;
}
--
1.8.3.2



Cheers,
-Jeff
Dave Chinner
2014-08-03 00:07:05 UTC
Permalink
Post by Jeff Liu
Post by Jeff Liu
Hi Dave,
Post by Dave Chinner
Post by Jeff Liu
Hi folk,
This is the revised patches for xfs_ioc_bulkstat consolidation and code
refactoring. As per Christoph's comments, I'm not include the per AG
inumber patch in this series given that I don't actually introduce the
relevant inumbers interface now. Similar to that reason, I also dropped
the per AG bulkstat patch, it would be included in parallel quota check
series.
- one major bug fix is at xfs_bulkstat_ag_ichunk() regarding the user buffer
pointer operations, it should be defined as a pointer-to-pointer since it
would be updated inside xfs_bulkstat_ag_ichunk().
- separate xfs_inumber consolidate patch into two patches, the first one
fix the formater function return value and consolidate the codes, another
one does the actual logic changes for better error handling.
- Add a separate patch to get rid of the redundant user buffer count
checks at xfs_bulkstat()
- fixed agino calculation issue at xfs_bulkstat_grab_ichunk().
v2: http://oss.sgi.com/archives/xfs/2014-04/msg00554.html
v1: http://oss.sgi.com/archives/xfs/2013-12/msg00901.html
Any comments are welcome!
Hi Jeff, I ported this to the current dev tree based on the
xfs-libxfs-restructure branch, and I keep seeing fsstress failing
with memory corruption after random bulkstat ioctls. I see regular
failures with generic/013, generic/068, xfs/167 and the other
fstress tests also randomly fail. The typical failure is glibc
detected memory heap corruption on freeing the bulkstat structure
generic/068 42s ...*** Error in `./ltp/fsstress': double free or corruption (!prev): 0x00007f0224000b70 ***
======= Backtrace: =========
Sorry for my late response and thanks for help porting this patch series.
Now I can reproduce this issue frequently with generic/013, will try to fix
it ASAP.
Hi Dave,
Could you please check the patch below for this issue?
Subject: xfs: always updating acp elements at xfs_bulkstat_ag_ichunk
After processing the inodes in chunk, we should always updating the
last inode number and the left user buffer info no matter anything
wrong while formatting an inode to the user buffer. Otherwise, the
related info might be updated improperly at xfs_bulkstat() for the
next round of inode processing.
---
fs/xfs/xfs_itable.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index 9959a05..f71be9c 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -328,11 +328,9 @@ xfs_bulkstat_ag_ichunk(
lastino = ino;
}
- if (!error) {
- acp->ac_lastino = lastino;
- acp->ac_ubleft = ubleft;
- acp->ac_ubelem = ubelem;
- }
+ acp->ac_lastino = lastino;
+ acp->ac_ubleft = ubleft;
+ acp->ac_ubelem = ubelem;
Yes, that looks like it fixes the issues I'm seeing here. I'll fold
this into the patch that introduces this code (the last patch, if
I'm not mistaken) so we don't have a point in a bisection where
bulkstat is busted. I'll run it through some more testing and if
everything goes well I'll push it into for-next early in the week.

Thanks for the quick turn-around and finding the problem for me,
Jeff!

Cheers,

Dave.
--
Dave Chinner
***@fromorbit.com
Loading...