Discussion: Improving XFS file system inode performance
Jesse Stroik
2010-11-22 21:59:51 UTC
XFS community,

I have a couple of medium-sized file systems on an ftp server (10TB file
system mounted within a 20TB file system). The load on these file
systems is getting pretty high because we have many users mirroring
datasets from the server. As far as I can tell, the main issue is with
inode performance. For example, an 'ls' on a directory may take 20
seconds to complete. At any given time there are more than 50 FTP STAT,
LIST or NLST commands in flight, some of which list entire directories
or expand wildcards.

Sadly, the file system was created with 32-bit inodes. I've remounted
it with the inode64 option, but I assume the benefit will mostly show
up as old files are replaced with new ones. Is there anything I can do
to improve performance now?

I'm also using noatime and logbufs=8.

Performance was fine before the file system was filled -- last week ~8TB
showed up and filled the 20TB file system. Since then, it has been
performing poorly.

I'd also be interested in inode cache tuning options specific to XFS.
I've been having trouble finding documentation on this particular issue.

This is a production file system, so please frame your suggestions
with that in mind. It is a RHEL 5.5 system running xfsprogs-2.9.4.1
and the CentOS/Red Hat kernel 2.6.18-194.17.1, which includes a
variety of backported XFS fixes.

Best,
Jesse
Jesse Stroik
2010-11-22 22:32:06 UTC
Post by Emmanuel Florac
Post by Jesse Stroik
Performance was fine before the file system was filled -- last week
~8TB showed up and filled the 20TB file system. Since then, it has been
performing poorly.
Maybe it got fragmented? What does fragmentation look like?
I wasn't able to resolve this in a reasonable amount of time. Part of
the issue is that we're dealing with files spread across about 100k
directories. I'll attempt to get the fragmentation numbers overnight.
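For reference, a minimal sketch of pulling the fragmentation factor with
xfs_db (untested; /dev/sdXN is a placeholder for the real block device).
xfs_db -r opens the device read-only, so it won't modify anything on the
mounted production filesystem, though the figure from a busy mounted fs
is only approximate:

#!/usr/bin/env python
# Minimal sketch: report the file fragmentation factor via xfs_db.
# /dev/sdXN is a placeholder -- substitute the real block device.
# "frag" walks every inode, so expect it to take a long time on a
# 20TB filesystem with ~100k directories.
import re
import subprocess

DEVICE = "/dev/sdXN"

proc = subprocess.Popen(["xfs_db", "-r", "-c", "frag", DEVICE],
                        stdout=subprocess.PIPE)
out = proc.communicate()[0].decode("ascii", "replace")

# Typical output: "actual 123456, ideal 120000, fragmentation factor 2.80%"
match = re.search(r"fragmentation factor ([0-9.]+)%", out)
if match:
    print("fragmentation factor: %s%%" % match.group(1))
else:
    print("unexpected xfs_db output: %r" % out)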

I suspect the regularly listed set of files on this fs exceeds the inode
cache. Where can I see the cache misses, and how can I tune the file
system?

Best,
Jesse
Dave Chinner
2010-11-22 23:44:19 UTC
Post by Jesse Stroik
Post by Emmanuel Florac
Post by Jesse Stroik
Performance was fine before the file system was filled -- last week
~8TB showed up and filled the 20TB file system. Since then, it has
been performing poorly.
Maybe it got fragmented? What does fragmentation look like?
I wasn't able to resolve this in a reasonable amount of time. Part of
the issue is that we're dealing with files spread across about 100k
directories. I'll attempt to get the fragmentation numbers overnight.
I suspect the regularly listed set of files on this fs exceeds the
inode cache. Where can I see the cache misses, and how can I tune the
file system?
Yup, that would be my guess, too.

You can use slabtop to find out how many inodes are cached and the
memory they use, and /proc/meminfo to determine the amount of memory
used by the page cache.
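
If you'd rather script it than eyeball slabtop, a minimal sketch that
pulls roughly the same numbers straight from /proc (reading
/proc/slabinfo usually requires root, and on older kernels such as
2.6.18 the dentry slab is named dentry_cache rather than dentry):

#!/usr/bin/env python
# Minimal sketch: cached XFS inode / dentry counts from /proc/slabinfo
# plus page cache size from /proc/meminfo. Active objects * object size
# understates the true per-inode overhead, but it shows the trend.

def slab_usage(name):
    # /proc/slabinfo columns: name active_objs num_objs objsize ...
    for line in open("/proc/slabinfo"):
        fields = line.split()
        if fields and fields[0] == name:
            active, objsize = int(fields[1]), int(fields[3])
            return active, active * objsize
    return None

def meminfo_kb(key):
    for line in open("/proc/meminfo"):
        if line.startswith(key + ":"):
            return int(line.split()[1])   # values are reported in kB
    return 0

for slab in ("xfs_inode", "dentry", "dentry_cache"):
    usage = slab_usage(slab)
    if usage:
        print("%-12s %10d objects  %8d kB" % (slab, usage[0], usage[1] // 1024))
print("page cache   %8d kB" % meminfo_kb("Cached"))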

For cache hits and misses, there's a statistics file,
/proc/fs/xfs/stat, that contains inode cache hits and misses
amongst other things. Those stats are somewhat documented here:

http://xfs.org/index.php/Runtime_Stats

and you want to look at the inode operation stats. This script:

http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/xfsmisc/xfs_stats.pl?rev=1.7;content-type=text%2Fplain

makes it easy to view them, even though it doesn't handle many of
the more recent additions.
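
If you don't want to pull the script down, a minimal sketch that derives
a rough hit rate from the "ig" (inode get) line; the field order below
is my reading of the Runtime_Stats page above, so sanity-check it
against your kernel. The counters are cumulative, so sample twice a few
minutes apart and diff them to see the current miss rate:

#!/usr/bin/env python
# Minimal sketch: rough inode cache hit rate from the "ig" line of
# /proc/fs/xfs/stat. Assumed field order (per the Runtime_Stats page):
# attempts, found, frecycle, missed, dup, reclaims, attrchg.

NAMES = ("attempts", "found", "frecycle", "missed",
         "dup", "reclaims", "attrchg")

ig = {}
for line in open("/proc/fs/xfs/stat"):
    if line.startswith("ig "):
        ig = dict(zip(NAMES, [int(v) for v in line.split()[1:]]))
        break

if ig:
    lookups = ig["found"] + ig["missed"]
    hit_pct = 100.0 * ig["found"] / max(lookups, 1)
    print("inode cache: %d found, %d missed (%.1f%% hit rate), %d reclaims"
          % (ig["found"], ig["missed"], hit_pct, ig["reclaims"]))
else:
    print("no ig line found -- is this an XFS system?")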

As to tuning the size of the cache - it's pretty much a crap-shoot.
Firstly, you've got to have enough memory - XFS needs approximately
1-1.5GB RAM per million cached inodes (double that if you've got
lock debugging turned on).
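
To put rough numbers on that rule of thumb (the working-set figure below
is purely hypothetical; substitute whatever your mirroring clients
actually touch between cache evictions):

# Back-of-the-envelope sizing for the 1-1.5GB-per-million-inodes figure.
hot_inodes = 8 * 1000 * 1000       # hypothetical hot set of cached inodes
low_gb = hot_inodes / 1e6 * 1.0    # optimistic end of the estimate
high_gb = hot_inodes / 1e6 * 1.5   # pessimistic end of the estimate
print("budget roughly %.0f-%.0f GB of RAM to keep that set cached"
      % (low_gb, high_gb))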

The amount of RAM used by the inode cache is then dependent on
memory pressure. There's one knob that sometimes makes a difference
- it changes the balance between page cache and inode cache
reclamation: /proc/sys/vm/vfs_cache_pressure. From
Documentation/sysctl/vm.txt:

At the default value of vfs_cache_pressure=100 the kernel
will attempt to reclaim dentries and inodes at a "fair" rate
with respect to pagecache and swapcache reclaim. Decreasing
vfs_cache_pressure causes the kernel to prefer to retain
dentry and inode caches. When vfs_cache_pressure=0, the
kernel will never reclaim dentries and inodes due to memory
pressure and this can easily lead to out-of-memory
conditions. Increasing vfs_cache_pressure beyond 100 causes
the kernel to prefer to reclaim dentries and inodes.

So you want to decrease vfs_cache_pressure to try to preserve the
inode cache rather than the page cache.
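
For example, a minimal sketch that applies a lower value at runtime (50
is an arbitrary starting point, not a recommendation; a
vm.vfs_cache_pressure line in /etc/sysctl.conf makes it persist across
reboots):

#!/usr/bin/env python
# Minimal sketch: lower vfs_cache_pressure so the kernel prefers to keep
# dentries and inodes over page cache. Needs root. Avoid 0 -- as the
# vm.txt excerpt above says, that can OOM the machine.
PRESSURE = 50   # arbitrary starting value; tune while watching the ig stats

knob = open("/proc/sys/vm/vfs_cache_pressure", "w")
knob.write("%d\n" % PRESSURE)
knob.close()

print("vfs_cache_pressure is now %s"
      % open("/proc/sys/vm/vfs_cache_pressure").read().strip())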

Cheers,

Dave.
--
Dave Chinner
***@fromorbit.com
Jesse Stroik
2010-11-23 14:49:35 UTC
Dave,

Thanks. This is precisely what I was looking for. I'll let you know
how it turns out.

As this file system is likely to continue to increase in number of files
at a fairly rapid rate, we're going to need a long term strategy. I
suspect it may be necessary in the near future to double or quadruple
the memory to 32GB or 64GB, but the uncertainty in the formula makes me
nervous.

For a situation like this, it would be ideal if we could specify an
inode cache size.

Thanks,
Jesse
Dave Chinner
2010-11-23 20:27:21 UTC
Post by Jesse Stroik
Dave,
Thanks. This is precisely what I was looking for. I'll let you
know how it turns out.
As this file system is likely to continue to increase in number of
files at a fairly rapid rate, we're going to need a long term
strategy. I suspect it may be necessary in the near future to
double or quadruple the memory to 32GB or 64GB, but the uncertainty
in the formula makes me nervous.
For a situation like this, it would be ideal if we could specify an
inode cache size.
That's the third request in a few weeks I've had for being able to
fix the inode cache size, either to prevent it from growing too large
or to prevent it from being reclaimed prematurely. I doubt I'll ever
be able to get the VFS cache capped (people have tried in the past
with no success), so I'm going to look at providing a way for XFS
to limit the size of its inode cache.

Cheers,

Dave.
--
Dave Chinner
***@fromorbit.com
Dave Chinner
2010-11-23 20:39:06 UTC
Post by Michael Monnerie
Post by Jesse Stroik
As this file system is likely to continue to increase in number of
files at a fairly rapid rate, we're going to need a long term
strategy. I suspect it may be necessary in the near future to
double or quadruple the memory to 32GB or 64GB, but the uncertainty
in the formula makes me nervous.
Using tools like munin helps you see the problem in graphs, so you can
easily check what's going on. Munin graphs a lot of system parameters,
and it has helped me find problems I couldn't have found otherwise.
For XFS-specific stats, PCP is your friend. It knows all about the
stats in /proc/fs/xfs/stat as well as most system-level stats that
other tools also collect and display...

Cheers,

Dave.
--
Dave Chinner
***@fromorbit.com
Michael Monnerie
2010-11-23 20:27:36 UTC
Post by Jesse Stroik
As this file system is likely to continue to increase in number of
files at a fairly rapid rate, we're going to need a long term
strategy. I suspect it may be necessary in the near future to
double or quadruple the memory to 32GB or 64GB, but the uncertainty
in the formula makes me nervous.
Using tools like munin helps you see the problem in graphs, so you can
easily check what's going on. Munin graphs a lot of system parameters,
and it has helped me find problems I couldn't have found otherwise.
--
with kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services: Protéger
http://proteger.at [pronounced: Prot-e-schee]
Tel: +43 660 / 415 6531

// ****** Radio interview on the topic of spam ******
// http://www.it-podcast.at/archiv.html#podcast-100716
//
// House for sale: http://zmi.at/langegg/
Emmanuel Florac
2010-11-22 22:25:28 UTC
Post by Jesse Stroik
Performance was fine before the file system was filled -- last week
~8TB showed up and filled the 20TB file system. Since then, it has been
performing poorly.
Maybe it got fragmented? What does fragmentation look like?
--
------------------------------------------------------------------------
Emmanuel Florac | Direction technique
| Intellique
| <***@intellique.com>
| +33 1 78 94 84 02
------------------------------------------------------------------------