Discussion:
mkfs.xfs fails with raid5 and smaller chunk sizes
Brian Hemme
2014-09-16 22:03:08 UTC
Hello all,

I am having some odd problems with mkfs.xfs when used on a RAID 5
array. The array is built from six 960GB SSDs, all connected to SATA
ports on the motherboard, and created with mdadm. If I use a chunk size
any smaller than 512K, mkfs.xfs just hangs forever. It continues to use
CPU, and so does the raid array, but it never completes. If the system
is left running for an extended length of time, the whole OS eventually
locks up. I have tried this on three different systems with the same
results. I have searched all over for someone with similar issues,
without success. I am hoping I am just doing something obviously wrong
and you all can set me straight quickly.

Some specifics:
Arch Linux with a 3.14.1 kernel
mkfs.xfs version 3.1.11
mdadm - v3.3 - 3rd September 2013
mdadm --create /dev/md0 --chunk=64K --level=5 --raid-devices=6
/dev/sd[a-f]
mkfs.xfs /dev/md0
** This command fails and locks up

I have also tried specifying the mkfs.xfs arguments explicitly, with
the same results (a rough example of what I mean is below). Building a
4-drive array seems to require a chunk size of 1M or greater to work. I
get the same results if I make a partition on the array and create the
filesystem there.
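
For example, for the 6-drive / 64K-chunk case above, I would expect the
explicit geometry to look roughly like the following (su being the
chunk size and sw the number of data disks, so 5 here); treat the exact
values as an illustration rather than a record of what I ran:

  # illustrative only: explicit stripe geometry for a 6-disk RAID 5
  # with a 64K chunk (5 data disks)
  mkfs.xfs -d su=64k,sw=5 /dev/md0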

Any help would be appreciated
Thanks
Brian
Dave Chinner
2014-09-16 22:17:38 UTC
Post by Brian Hemme
[...]
If I use a chunk size any smaller than 512K, mkfs.xfs just hangs
forever. It continues to use CPU, and so does the raid array, but it
never completes.
[...]
mdadm --create /dev/md0 --chunk=64K --level=5 --raid-devices=6
/dev/sd[a-f]
mkfs.xfs /dev/md0
** This command fails and locks up
[...]
mkfs.xfs really should only take a couple of seconds to complete.
Seeing as you are using SSDs, my first suspicion is that md or the
SSDs are having problems with discard. Hence you should first try
'mkfs.xfs -K /dev/md0' (-K skips discarding the device blocks at mkfs
time) and see if that completes quickly.

Otherwise, the dmesg output after 'echo w > /proc/sysrq-trigger' would
be a good start, as would a 'perf top -G -U' snapshot (run it for 30s
or so, starting at least a minute after mkfs.xfs starts) to tell us
what is burning CPU.
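
If it comes to that, a rough capture sequence would be something like
the below, run as root in a second shell while mkfs.xfs is stuck (the
output filename is just a placeholder):

  # enable sysrq if it isn't already, then dump blocked-task stacks
  echo 1 > /proc/sys/kernel/sysrq
  echo w > /proc/sysrq-trigger
  dmesg > blocked-tasks.txt

  # sample where the kernel is burning CPU
  perf top -G -U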

Cheers,

Dave.
--
Dave Chinner
***@fromorbit.com
Brian Hemme
2014-09-16 22:47:43 UTC
Post by Dave Chinner
[...]
Seeing as you are using SSDs, my first suspicion is that md or the
SSDs are having problems with discard. Hence you should first try
'mkfs.xfs -K /dev/md0' (-K skips discarding the device blocks at mkfs
time) and see if that completes quickly.
[...]
Thanks for the quick response!

Adding -K seemed to do the trick. However, for my education, why is it
needed in this case? mkfs.xfs works without it for larger chunk sizes,
or for RAID 0 instead of RAID 5, and it also worked on our old install
with a 3.1.6 kernel. And why would not using -K cause enough of a
problem that the whole machine hangs? I'm just trying to understand
this well enough to make sure I don't run into problems down the road.
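
Is there anything on the array itself I should be checking here? I am
guessing at the exact sysfs attributes, but I had something like this
in mind, to see what discard limits md and the member SSDs actually
advertise:

  # discard limits advertised by the md device and one member SSD
  cat /sys/block/md0/queue/discard_max_bytes
  cat /sys/block/md0/queue/discard_granularity
  cat /sys/block/sda/queue/discard_max_bytes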

Thanks again,
Brian
