xfs_check - out of memory | xfs_repair - superblock error reading

Discussion:

christian gattermair

2006-09-18 13:19:18 UTC

hi!

after a reboot of our box (debian sarge, 3ware controller, raid 5 - 3tb xfs)
we can not mount it any more.

from syslog:

Sep 18 12:51:36 localhost kernel: SGI XFS with ACLs, security attributes,
realtime, large block numbers, no debug enabled
Sep 18 12:51:36 localhost kernel: SGI XFS Quota Management subsystem
Sep 18 12:51:53 localhost kernel: attempt to access beyond end of device
Sep 18 12:51:53 localhost kernel: sdb1: rw=0, want=6445069056,
limit=2150101796
Sep 18 12:51:53 localhost kernel: I/O error in filesystem ("sdb1") meta-data
dev sdb1 block 0x18027f2ff ("xfs_read_buf") error 5 buf count 512
Sep 18 12:51:53 localhost kernel: XFS: size check 2 failed

xfs_check fails with:

xfs_check /dev/sdb1
XFS: totally zeroed log
xfs_check: out of memory

there is a lot of space (i tried more swap)

Mem: 1011 1006 5 0 750 111
-/+ buffers/cache: 144 867
Swap: 57812 0 57812

does xfs_check only look at the memory or also at swap? is there any way to
make it use the swap?

second question:

xfs_repair works but can not find any superblock. any hints?

xfs_repair /dev/sdb1
Phase 1 - find and verify superblock...
error reading superblock 11 -- seek to offset 1134332153856 failed
couldn't verify primary superblock - bad magic number !!!

attempting to find secondary superblock...
...
...
..............found candidate secondary superblock...
error reading superblock 11 -- seek to offset 1134332153856 failed
unable to verify superblock, continuing...

the whole system ran for one year without any errors. only today there was one
shutdown for changing the usv ....

thanks for any hint!

with friendly greetings,

christian gattermair

l***@oss.sgi.com

2006-09-18 14:41:41 UTC

Post by christian gattermair
xfs_repair works but can not find any superblock. any hints?
xfs_repair /dev/sdb1
Phase 1 - find and verify superblock...
error reading superblock 11 -- seek to offset 1134332153856 failed
couldn't verify primary superblock - bad magic number !!!
attempting to find secondary superblock...
...
...
..............found candidate secondary superblock...
error reading superblock 11 -- seek to offset 1134332153856 failed

That's about a terabyte into your 3t fs, but you can't seek to it? Any kernel
messages when this happens? What do /proc/partitions and/or parted say about
the size of /dev/sdb1? Seems like maybe your device itself is not as expected.
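
For reference, the failing offset can be converted to sectors and compared with the device limit from the original syslog excerpt. This is only a minimal Python sketch: it assumes 512-byte sectors, and the variable names are illustrative.

```python
# Numbers copied from the thread; the 512-byte sector size is an assumption.
SECTOR_BYTES = 512
TIB = 2 ** 40

seek_offset_bytes = 1134332153856   # from the xfs_repair error message
limit_sectors = 2150101796          # "limit=" in the syslog excerpt

seek_sector = seek_offset_bytes // SECTOR_BYTES
offset_tib = seek_offset_bytes / TIB
limit_tib = limit_sectors * SECTOR_BYTES / TIB
beyond_end = seek_sector >= limit_sectors

print(f"seek target: sector {seek_sector} ({offset_tib:.2f} TiB)")
print(f"device end:  sector {limit_sectors} ({limit_tib:.2f} TiB)")
print("seek lies beyond the reported end of the device:", beyond_end)
```

Under that assumption the seek target sits just past the end of what the kernel reports for the partition, which would be enough to make the seek fail.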

-Eric

christian gattermair

2006-09-19 10:03:21 UTC

hi!

thanks for all your answers

parted:

Disk geometry for /dev/sda: 0.000-3147012,000 megabytes
Disk label type: msdos
Minor Start End Type Filesystem Flags
1 0,031 1049854,423 primary xfs

cat /proc/partitions

8 0 3222540288 sda
8 1 1075050898 sda1

tw_cli (3ware raid tool)

Unit UnitType Status %Cmpl Stripe Size(GB) Cache AVerify IgnECC
------------------------------------------------------------------------------
u0 RAID-5 OK - 64K 3073.25 ON OFF OFF

Port Status Unit Size Blocks Serial
---------------------------------------------------------------
p0 OK u0 279.46 GB 586072368 3NF04F4D
p1 OK u0 279.46 GB 586072368 3NF09YMR
p2 OK u0 279.46 GB 586072368 3NF0AGN2
p3 OK u0 279.46 GB 586072368 3NF0D6ZM
p4 OK u0 279.46 GB 586072368 3NF0BG47
p5 OK u0 279.46 GB 586072368 3NF09YQT
p6 OK u0 279.46 GB 586072368 3NF02SMD
p7 OK u0 279.46 GB 586072368 3NF01YL5
p8 OK u0 279.46 GB 586072368 3NF02J46
p9 OK u0 279.46 GB 586072368 3NF04EYE
p10 OK u0 279.46 GB 586072368 3NF0ECRG
p11 OK u0 279.46 GB 586072368 3NF071X5

Name OnlineState BBUReady Status Volt Temp Hours LastCapTest
---------------------------------------------------------------------------
bbu On Yes OK OK OK 255 23-Aug-2005

dmesg:

3ware 9000 Storage Controller device driver for Linux v2.26.02.001.
ACPI: PCI interrupt 0000:04:02.0[A] -> GSI 52 (level, low) -> IRQ 185
scsi_proc_hostdir_add: proc_mkdir failed for <NULL>
3w-9xxx: scsi0: AEN: INFO (0x04:0x0055): <NULL>:.
3w-9xxx: scsi0: AEN: INFO (0x04:0x0053): <NULL>:.
scsi0 : 3ware 9000 Storage Controller
3w-9xxx: scsi0: Found a 3ware 9000 Storage Controller at 0xfeaf0000, IRQ: 185.
3w-9xxx: scsi0: Firmware FE9X 2.06.00.009, BIOS BE9X 2.03.01.051, Ports: 12.
Vendor: AMCC Model: 9500S-12 DISK Rev: 2.06
Type: Direct-Access ANSI SCSI revision: 03
sda : very big device. try to use READ CAPACITY(16).
SCSI device sda: 6445080576 512-byte hdwr sectors (3299881 MB)
SCSI device sda: drive cache: write back, no read (daft)
/dev/scsi/host0/bus0/target0/lun0: p1
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0

what is a usv? sorry, a german abbreviation. i mean ups.
no, i have not changed anything on the system. normal shutdown for changing the ups
and then boot. and now i can not mount the drive ....

yes it is the same kernel (debian 2.6.8-2-686-smp)

i have now updated xfsprogs to 2.8.11 and i get a new error message with
xfs_check:

xfs_check /dev/sda1
XFS: Log inconsistent (didn't find previous header)
XFS: failed to find log head
ERROR: cannot find log head/tail, run xfs_repair

i will try ... and hope

with friendly greetings,

christian gattermair

Eric Sandeen

2006-09-19 15:02:50 UTC

Post by christian gattermair
hi!
thanks for all your answers
Disk geometry for /dev/sda: 0.000-3147012,000 megabytes
Disk label type: msdos
Minor Start End Type Filesystem Flags
1 0,031 1049854,423 primary xfs
cat /proc/partitions
8 0 3222540288 sda
8 1 1075050898 sda1

sda1 is only 1 terabyte. sda itself appears to be about 3 terabytes.

I think you need to sort out where your filesystem is living...

Did someone repartition sda?

Post by christian gattermair
Vendor: AMCC Model: 9500S-12 DISK Rev: 2.06
Type: Direct-Access ANSI SCSI revision: 03
sda : very big device. try to use READ CAPACITY(16).

Or perhaps this is the 2T lun problem... although you say it was working
before. At any rate this isn't looking like an xfs problem at this
stage - your kernel thinks that your storage is smaller than you think
it is.
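
The sizes quoted in this thread can be cross-checked with a little arithmetic. The Python sketch below is only an observation on the posted numbers, not a diagnosis anyone in the thread confirmed. It assumes 512-byte sectors and relies on the fact that an msdos partition table stores sector counts in 32-bit fields.

```python
SECTOR_BYTES = 512
KIB = 1024
WRAP = 2 ** 32                 # 32-bit sector-count field in an msdos label

sda_kib = 3222540288           # /proc/partitions, 1 KiB blocks
sda1_kib = 1075050898
want_sectors = 6445069056      # "want=" from the original syslog excerpt
dmesg_sectors = 6445080576     # "SCSI device sda: ... 512-byte hdwr sectors"

sda_sectors = sda_kib * KIB // SECTOR_BYTES
sda1_sectors = sda1_kib * KIB // SECTOR_BYTES

# /proc/partitions and dmesg agree on the size of the whole device.
print("sda sectors: ", sda_sectors, "matches dmesg:", sda_sectors == dmesg_sectors)

# Hypothesis only: if the true partition length exceeded 2**32 sectors,
# a 32-bit field would keep just the remainder.
unwrapped = sda1_sectors + WRAP
print("sda1 sectors:", sda1_sectors)
print("sda1 + 2**32:", unwrapped)
print("filesystem wants sector:", want_sectors)
print("slack if unwrapped:", unwrapped - want_sectors, "sectors")
```

With these figures the unwrapped length ends up 36 sectors beyond the last sector the filesystem asks for, which is at least consistent with a partition size that does not fit a 32-bit field. Whether that is what actually happened on this machine is not established here.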

-Eric

Post by christian gattermair
SCSI device sda: 6445080576 512-byte hdwr sectors (3299881 MB)
SCSI device sda: drive cache: write back, no read (daft)
/dev/scsi/host0/bus0/target0/lun0: p1
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0

christian gattermair

2006-09-20 15:35:17 UTC

hi!

thanks for all your answers

Post by Eric Sandeen

Post by christian gattermair
Disk geometry for /dev/sda: 0.000-3147012,000 megabytes
Disk label type: msdos
Minor Start End Type Filesystem Flags
1 0,031 1049854,423 primary xfs
cat /proc/partitions
8 0 3222540288 sda
8 1 1075050898 sda1

sda1 is only 1 terabyte. sda itself appears to be about 3 terabytes.
I think you need to sort out where your filesystem is living...
Did someone repartition sda?

no repartition of sda or other partitions .... sda1 was the whole space of sda
(3t)

Post by Eric Sandeen

Post by christian gattermair
Vendor: AMCC Model: 9500S-12 DISK Rev: 2.06
Type: Direct-Access ANSI SCSI revision: 03
sda : very big device. try to use READ CAPACITY(16).

how can i bring back the right size? today i have compiled a 2.6.18 kernel.
same error ....

any ideas?

thanks for help

with friendly greetings,

christian gattermair

Post by Eric Sandeen
-Eric

David Chinner

2006-09-18 22:23:48 UTC

Post by christian gattermair
hi!
after a reboot of our box (debian sarge, 3ware controller, raid 5 - 3tb xfs)
we can not mount it any more.
Sep 18 12:51:36 localhost kernel: SGI XFS with ACLs, security attributes,
realtime, large block numbers, no debug enabled
Sep 18 12:51:36 localhost kernel: SGI XFS Quota Management subsystem
Sep 18 12:51:53 localhost kernel: attempt to access beyond end of device
Sep 18 12:51:53 localhost kernel: sdb1: rw=0, want=6445069056,
limit=2150101796
Sep 18 12:51:53 localhost kernel: I/O error in filesystem ("sdb1") meta-data
dev sdb1 block 0x18027f2ff ("xfs_read_buf") error 5 buf count 512
Sep 18 12:51:53 localhost kernel: XFS: size check 2 failed

I/O error - something is not right with your raid controller i think.
Are there any other errors in dmesg? What does /proc/partitions tell
you about the size of the device?
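
As a quick way to answer the size question: the #blocks column in /proc/partitions is in 1 KiB units. A small Python sketch that parses text in that format is shown below, using the values posted elsewhere in this thread. The function name `partition_sizes` is made up for this example.

```python
def partition_sizes(text):
    """Map device name -> size in bytes from /proc/partitions-style text."""
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        # Data lines look like: major minor #blocks name
        if len(fields) == 4 and fields[2].isdigit():
            sizes[fields[3]] = int(fields[2]) * 1024  # blocks are 1 KiB
    return sizes

sample = """major minor  #blocks  name

   8     0 3222540288 sda
   8     1 1075050898 sda1
"""

sizes = partition_sizes(sample)
for name, size in sizes.items():
    print(f"{name}: {size / 2**40:.2f} TiB")
```

On a live system the same function could be given the contents of /proc/partitions instead of the sample string.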

Post by christian gattermair
xfs_check /dev/sdb1
XFS: totally zeroed log
xfs_check: out of memory

3TB filesystem - you won't be able to xfs_check that on a 32 bit system,
and you'll need >6GiB RAM to check it on a 64bit system.

Post by christian gattermair
there is a lot of space (i tried more swap)
Mem: 1011 1006 5 0 750 111
-/+ buffers/cache: 144 867
Swap: 57812 0 57812
does xfs_check only look at the memory or also at swap? is there any way to
make it use the swap?

Sounds like a 32 bit system where a process can't use more than 2-3GB of RAM.
No amount of swap will help if the process requires more than the maximum
that can be addressed per process.
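
The point about swap can be put in numbers. This is a rough Python sketch with three assumptions: a 32-bit process gets at most about 3 GiB of user address space (the upper end of the 2-3GB figure), the >6GiB estimate is taken as the requirement, and the "free" output is in megabytes. The helper name is hypothetical.

```python
GIB = 2 ** 30
MIB = 2 ** 20

def can_address(required_bytes, per_process_limit_bytes):
    """A process cannot use more memory than it can address,
    however much RAM plus swap the machine has."""
    return required_bytes <= per_process_limit_bytes

required = 6 * GIB                       # lower bound quoted for a 3TB filesystem
limit_32bit = 3 * GIB                    # assumed best case for a 32-bit process
ram_plus_swap = (1011 + 57812) * MIB     # "Mem" and "Swap" totals, assumed to be MB

print("RAM + swap available: %.1f GiB" % (ram_plus_swap / GIB))
print("fits in RAM + swap:", required <= ram_plus_swap)
print("fits in 32-bit address space:", can_address(required, limit_32bit))
```

Under these assumptions the machine has enough RAM plus swap in total, but a single 32-bit process still cannot map that much, so adding swap changes nothing.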

As already commented, that's about 1TB into 3TB volume. I'd suggest
raid controller problems....

Did you boot the same kernel you'd been running previously?

Post by christian gattermair
the whole system ran for one year without any errors. only today there was one
shutdown for changing the usv ....

What's a usv? Did you change anything else?

Cheers,

Dave.

--
Dave Chinner
Principal Engineer
SGI Australian Software Group

Stephan Jansen

2006-09-18 22:56:10 UTC

Hi,

Post by David Chinner

Post by christian gattermair
hi!
after a reboot of our box (debian sarge, 3ware controller, raid 5 - 3tb xfs)
we can not mount it any more.
Sep 18 12:51:36 localhost kernel: SGI XFS with ACLs, security
attributes,
realtime, large block numbers, no debug enabled
Sep 18 12:51:36 localhost kernel: SGI XFS Quota Management subsystem
Sep 18 12:51:53 localhost kernel: attempt to access beyond end of device
Sep 18 12:51:53 localhost kernel: sdb1: rw=0, want=6445069056,
limit=2150101796
Sep 18 12:51:53 localhost kernel: I/O error in filesystem ("sdb1") meta-data
dev sdb1 block 0x18027f2ff ("xfs_read_buf") error 5 buf
count 512
Sep 18 12:51:53 localhost kernel: XFS: size check 2 failed

I/O error - something is not right with your raid controller i think.
Are there any other errors in dmesg? What does /proc/partitions tell
you about the size of the device?

Post by christian gattermair
xfs_check /dev/sdb1
XFS: totally zeroed log
xfs_check: out of memory

3TB filesystem - you won't be able to xfs_check that on a 32 bit system,
and you'll need >6GiB RAM to check it on a 64bit system.

I was just going to create a 3TB filesystem on a 32 bit system. So
xfs_check will not work? How about xfs_repair? I assume that will
work but would like to know beforehand.

[stuff deleted]

Post by David Chinner
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

--

----- Stephan

Barry Naujok

2006-09-19 00:33:31 UTC

Post by Stephan Jansen
I was just going to create a 3TB filesystem on a 32 bit system. So
xfs_check will not work? How about xfs_repair? I assume that will
work but would like to know beforehand.

Just to let you know, I'm currently working on memory optimisations for
xfs_repair and when it's released, it should work on your system. Memory
usage will then grow with inode count and free-space fragmentation rather
than with filesystem size, as it currently does.

The first set of changes has been done and is currently being tested.

Frank Hellmann

2006-09-19 07:56:18 UTC

Hi,

Post by Stephan Jansen
Hi,
I was just going to create a 3TB filesystem on a 32 bit system. So
xfs_check will not work? How about xfs_repair? I assume that will
work but would like to know beforehand.
[stuff deleted]
----- Stephan

I have a couple of 3.1TB FC arrays here and I had no real problems with
xfs on them so far. We have to reset the machines from time to time,
'cause nvidia drivers and data moving lock up the complete machine, but
even then no data losses occurred. The latest kernels we use (2.6.16)
check the filesystems/logs without any hiccups during the mount phase.

Commandline xfs_check won't work (out of memory error), but checking
with xfs_repair works fine here. It will need a lot of memory though, so
have at least 3GB RAM installed.

Cheers,
Frank...

--
--------------------------------------------------------------------------
Frank Hellmann Optical Art GmbH Waterloohain 7a
DI Supervisor http://www.opticalart.de 22769 Hamburg
***@opticalart.de Tel: ++49 40 5111051 Fax: ++49 40 43169199

christian gattermair

2006-09-19 13:19:13 UTC

hi!

xfs_repair

.....................................Sorry, could not find valid secondary
superblock
Exiting now.

data loss, or is there any option to recover anything?

thanks for any tip or hint!

with friendly greetings,

christian gattermair