Brian Foster
2014-08-08 18:49:24 UTC
Hi all,
I've seen collapse range fall over during some recent stress testing.
I'm running fsx and 16 fsstress threads in parallel to reproduce. Note
that the fsstress workload doesn't need to be on the same fs (I suspect
a sync() is a trigger). These patches are what has fallen out so far...
The first patch stems from the fact that the error caused an fs shutdown
that appeared to be unnecessary. I was initially going to skip the inode
log on any error, but on closer inspection it seems like we expect to
abort/shutdown if something has in fact been changed, so the patch instead
narrows the shutdown window to errors that occur after an extent has
actually been modified. The second patch deals with the actual collapse
failure by fixing up the locking.
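To illustrate the idea behind the first patch, here is a toy model in Python. It is not the kernel code: ToyTransaction, shift_extents, finish and the fail_at parameter are all hypothetical names made up for this sketch, and the assumption is simply that cancelling a dirty transaction is what forces the shutdown.

```python
class ToyTransaction:
    """Hypothetical stand-in for a transaction; not an XFS API."""

    def __init__(self):
        self.dirty = False

    def log_inode(self):
        # Logging the inode marks the transaction dirty.
        self.dirty = True


def shift_extents(extents, shift, trans, fail_at=None):
    """Shift each (start, length) extent left by `shift` blocks.

    `fail_at` is a made-up knob that injects an error before the
    extent at that index is touched. Returns an error string or None.
    """
    modified = False
    error = None
    for i, (start, length) in enumerate(extents):
        if fail_at == i:
            error = "EIO"
            break
        extents[i] = (start - shift, length)
        modified = True
    # Log the inode only if at least one extent was actually changed.
    if modified:
        trans.log_inode()
    return error


def finish(trans, error):
    """Assumed policy: an error on a dirty transaction means shutdown."""
    if error is None:
        return "commit"
    return "shutdown" if trans.dirty else "cancel"
```

In this model, an error injected before the first extent is shifted leaves the transaction clean, so it can be cancelled without a shutdown. An error after the first shift still ends in a shutdown, which matches the expectation described above.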
Note that I still reproduced at least one collapse failure even with
these fixes, so there could be more at play here with the
implementation:
XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 5535 of file fs/xfs/libxfs/xfs_bmap.c. Caller xfs_collapse_file_space+0x1af/0x280 [xfs]
This took significantly longer to reproduce and I don't yet have a feel
for how reproducible it is in general. In the meantime, these two seemed
relatively straightforward and incremental...
Brian
Brian Foster (2):
  xfs: don't log inode unless extent shift makes extent modifications
  xfs: hold the inode lock across a full file collapse

 fs/xfs/libxfs/xfs_bmap.c | 18 ++++++++++--------
 fs/xfs/xfs_bmap_util.c   |  5 +++--
 2 files changed, 13 insertions(+), 10 deletions(-)
--
1.8.3.1