Discussion:
Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
Carlos E. R.
2014-07-02 09:57:25 UTC
Hi,

I got this error:


<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.186436] r8169 0000:06:00.0 eth0: link up
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.615073] PM: restore of devices complete after 2735.034 msecs
<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626346] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c39fe9
<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626346]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626348] CPU: 0 PID: 28875 Comm: kworker/0:2 Tainted: P O 3.11.10-11-desktop #1
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626348] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7516/MS-7516, BIOS V1.5 10/10/2008
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626388] Workqueue: xfs-eofblocks/sde5 xfs_eofblocks_worker [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626390] 0000000000000002 ffffffff815a0252 00000000002a61c2 ffffffffa0c38996
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626391] ffff8800b7025680 ffff88022eb74180 ffff880121c3fe50 0000000000000002
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626393] 0000000000000000 0000000100000000 0000000000000000 0000000000000001
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626393] Call Trace:
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626403] [<ffffffff81004a28>] dump_trace+0x88/0x310
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626406] [<ffffffff81004d80>] show_stack_log_lvl+0xd0/0x1d0
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626408] [<ffffffff810061bc>] show_stack+0x1c/0x50
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626411] [<ffffffff815a0252>] dump_stack+0x50/0x89
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626425] [<ffffffffa0c38996>] xfs_free_ag_extent+0x226/0x860 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626468] [<ffffffffa0c39fe9>] xfs_free_extent+0xb9/0xf0 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626510] [<ffffffffa0c4c39e>] xfs_bmap_finish+0x11e/0x170 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626560] [<ffffffffa0c6b4c0>] xfs_itruncate_extents+0x190/0x340 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626623] [<ffffffffa0c33633>] xfs_free_eofblocks+0x1e3/0x260 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626659] [<ffffffffa0c291ef>] xfs_inode_free_eofblocks+0x6f/0x150 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626688] [<ffffffffa0c27f82>] xfs_inode_ag_walk.isra.10+0x1c2/0x310 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626716] [<ffffffffa0c28a8e>] xfs_inode_ag_iterator_tag+0x6e/0xb0 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626744] [<ffffffffa0c28d82>] xfs_eofblocks_worker+0x12/0x20 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626763] [<ffffffff8106ac78>] process_one_work+0x168/0x490
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626765] [<ffffffff8106b914>] worker_thread+0x114/0x3a0
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626768] [<ffffffff81071c3f>] kthread+0xaf/0xc0
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626771] [<ffffffff815addfc>] ret_from_fork+0x7c/0xb0
<0.5> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626776] XFS (sde5): xfs_do_force_shutdown(0x8) called from line 916 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_bmap.c. Return address = 0xffffffffa0c4c3d8
<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.706440] XFS (sde5): Corruption of in-memory data detected. Shutting down filesystem
<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.706440] XFS (sde5): Please umount the filesystem and rectify the problem(s)


Brief description:


* It happens only on restore from hibernation.
* It happens randomly, spaced a month or two.
* It happens always on the same partition, the one that holds /home
(I have 10 XFS partitions spread on 4 internal hard disks, and a few
more external). It is a new disk, 2 TB, traditional MBR partitions.
* Disk has no defects, or at least so says smartctl long test.
* When it happens, recovery is impossible: xfs_repair does not seem to
find anything, or maybe it does, silently; but once the system is back
in use, it crashes again, quickly.
* Thus the recovery procedure is to use "xfsdump" to get a backup copy,
reformat the partition, and restore the files with xfsrestore.
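
The dump / reformat / restore cycle described in the last bullet could be sketched roughly as below. This is only a dry-run outline under assumptions: the function name, the device, the mount point, the dump path and the session/media labels are all placeholders, not values from this report, and the dump file has to live on a different filesystem. By default it only prints the commands.

```shell
# Dry-run sketch of the dump / reformat / restore cycle.
# Prints the commands; set RUN=1 to execute them.
# WARNING: with RUN=1, mkfs.xfs destroys the contents of the partition.
rebuild_home() {
    dev=$1    # partition to rebuild (placeholder, e.g. /dev/sdX5)
    mnt=$2    # where it is mounted (placeholder, e.g. /home)
    dump=$3   # dump file on ANOTHER filesystem (placeholder)
    step() { if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi; }
    step xfsdump -l 0 -L home -M backup -f "$dump" "$mnt" &&
    step umount "$mnt" &&
    step mkfs.xfs -f "$dev" &&
    step mount "$dev" "$mnt" &&
    step xfsrestore -f "$dump" "$mnt"
}
```

Calling "rebuild_home /dev/sdX5 /home /backup/home.dump" without RUN=1 just lists the five steps, which makes it easy to review them before running anything for real.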


The worst issue for me is that "xfs_repair" fails to repair it.

I have no more information than what appears in the logs, but it has
happened four times (on two different kernels):

***@Telcontar:~> zgrep XFS_WANT_CORRUPTED_GOTO /var/log/messages*xz
/var/log/messages-20140402.xz:<0.1> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111787] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1629 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c54fe9
/var/log/messages-20140402.xz:<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c54fe9
/var/log/messages-20140506.xz:<0.1> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.851374] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c54fe9
/var/log/messages-20140629.xz:<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626346] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c39fe9
***@Telcontar:~>
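
To compare such hits at a glance, the matching log lines can be reduced to date, time and source location. The following is a small sketch, not part of any XFS tooling: the function name is made up, and the pattern assumes the syslog layout shown above (a "<facility.severity>" prefix followed by the timestamp). Lines whose file path has been cut short by the logger will not match.

```shell
# Reduce XFS_WANT_CORRUPTED_GOTO log lines on stdin to
# "date time file.c:line". Assumes the syslog layout shown above.
summarize_xfs_hits() {
    sed -n 's|.*> \([0-9-]* [0-9:]*\) .*XFS_WANT_CORRUPTED_GOTO at line \([0-9]*\) of file .*/\([^/ ]*\.c\)\..*|\1 \3:\2|p'
}
```

It would be fed from the same search as above, for example "zgrep -h XFS_WANT_CORRUPTED_GOTO /var/log/messages*xz | summarize_xfs_hits".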


The first time this happened I used a rescue USB stick (openSUSE
13.1 XFCE). xfs_repair said to mount the partition to force a replay of
the log. When I did, mount hung. It was unkillable. Rebooting the system
hung too. I then used "xfs_repair -L" on that disk, which succeeded with
no error report. Once back in use, the system crashed soon after: you can
see two entries on the same day above.

This last time, I simply rebooted to runlevel 3, logged on as root, and
performed the backup, format, and restore. I did no testing; I was in a
real hurry, and even so it took hours.


I suppose that to diagnose this further you will want data extracted from
the filesystem: you have to tell me what operations to perform to obtain
that data the next time it happens, without me having to ask here for your
help. It may happen tomorrow, or in two months time, so I have to be
prepared for it. And as usual, it may happen at the worst time, when I
have work to be done in a hurry, as this last time (or I would have asked
you).

The only data I have is the system logs.

I don't suppose that the "xfsdump" archive contains anything of interest?

From what I have googled, one suspect is something wrong in that
partition. It was created using gparted, as the rest of the disk. This
last time I used "YaST" to reformat it, not mkfs.xfs.



Wait! I have a "dd" copy of the entire partition (500 GB), made on March
16th, 5 AM, so hard data could be obtained from there. I had
forgotten. I'll get something for you now:


Telcontar:/data/storage_d/old_backup # xfs_info xfs_copy_home
meta-data=/dev/sdf2              isize=256    agcount=4, agsize=122341568 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=489366272, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=238948, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Telcontar:/data/storage_d/old_backup #


I could do a "xfs_metadump" on it - just tell me what options to use, and
where the result can be uploaded, if it is big.



Current versions:

Linux Telcontar 3.11.10-11-desktop #1 SMP PREEMPT Mon May 12 13:37:06 UTC 2014 (3d22b5f) x86_64 x86_64 x86_64 GNU/Linux

xfs_repair version 3.1.11

CPU: Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz

System: openSUSE Linux 13.1, 64 bit.


--
Cheers
Carlos E. R.

(from 13.1 x86_64 "Bottle" at Telcontar)
Brian Foster
2014-07-02 12:04:43 UTC
This is the background eofblocks scanner attempting to free preallocated
space on a file. The scanner looks for files that have been recently
grown and since been flushed to disk (i.e., no longer concurrently being
written to) and trims the post-eof preallocation that comes along with
growing files.

The corruption errors at xfs_alloc.c:1602,1629 on v3.11 fire if the
extent we are attempting to free is already accounted for in the
by-block allocation btree. IOW, this is attempting to free an extent
that the allocation metadata thinks is already free.
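
As a toy illustration of that condition (not the kernel code): an extent being freed must not overlap any record that the free-space index already holds. The sketch below models extents as a start block plus a length; the function name and the numbers in the usage note are invented for the example.

```shell
# overlaps_free START LEN FREE_START FREE_LEN
# Exit 0 if the extent [START, START+LEN) being freed overlaps a
# record [FREE_START, FREE_START+FREE_LEN) that is already free.
# Extents that merely touch (adjacent) do not count as overlapping.
overlaps_free() {
    [ "$1" -lt $(( $3 + $4 )) ] && [ "$3" -lt $(( $1 + $2 )) ]
}
```

For instance, "overlaps_free 100 10 105 20" succeeds (blocks 105 to 109 would be freed twice), while "overlaps_free 100 10 110 5" fails because the two extents are only adjacent, which is the normal case where neighbouring free records get merged.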
* It happens only on restore from hibernation.
Interesting, could you elaborate a bit more on the behavior this system
is typically subjected to? i.e., is this a server that sees a constant
workload that is also frequently hibernated/awakened?
* It happens randomly, spaced a month or two.
* It happens always on the same partition, the one that holds /home
(I have 10 XFS partitions spread on 4 internal hard disks, and a few
more external). It is a new disk, 2 TB, traditional MBR partitions.
* Disk has no defects, or at least so says smartctl long test.
* When it happens, recovery is impossible: xfs_repair does not seem to
find anything, or maybe it does, silently; but on system reuse,
it crashes again, fast.
* Thus recovery procedure is to use "xfsdump" to get a backup copy,
reformat the partition, and recover the files with xfsrestore.
The worst issue for me is that "xfs_repair" fails to repair it.
I do not have more info than what appears on the logs, but four times (two
different kernels):
/var/log/messages-20140402.xz:<0.1> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111787] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1629 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c54fe9
/var/log/messages-20140402.xz:<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c54fe9
/var/log/messages-20140506.xz:<0.1> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.851374] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c54fe9
/var/log/messages-20140629.xz:<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626346] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c39fe9
The first time that this happened I used a rescue usb stick (openSUSE 13.1
xfce). xfs_repair said to mount the partition to force re-play the log. When
I did, mount hung. It was unkillable. Reboot of system hung. I then used
"xfs_repair -L" on that disk, which succeeded with no error report. On
reuse, the system crashed soon: you can see above two entries on the same
day.
This last time, I simply rebooted to runlevel 3, logon as root, perform the
backup, format, restore. No testing, I was in a real hurry, and even so took
hours.
So you have reproduced this, reformatted with mkfs, restored from
backups and continued to reproduce the problem? And still only on this
particular partition?

This is interesting because the corruption appears to be associated with
post-eof space, which is generally transient. The worst case is that
this space is trimmed off files when they are evicted from cache, such
as during a umount. To me, that seems to correlate with a more
recent/runtime problem rather than something that might be lingering on
disk, but we don't really know for sure.
I suppose that to diagnose this further you will want data extracted from
the filesystem: you have to tell me what operations to perform to obtain
that data the next time it happens, without me having to ask here for your
help. It may happen tomorrow, or in two months time, so I have to be
prepared for it. And as usual, it may happen at the worst time, when I have
work to be done in a hurry, as this last time (or I would have asked you).
The only data I have is the system logs.
I don't suppose that the "xfsdump" archive contains anything of interest?
From what I have googled, one suspect is something wrong in that
partition. It was created using gparted, as the rest of the disk. This last
time I used "YaST" to reformat it, not mkfs.xfs.
Wait! I have a "dd" copy of the entire partition (500 GB), made on March
16th, 5 AM, so hard data could be obtained from there. I had forgotten. I'll
get something for you now:
Telcontar:/data/storage_d/old_backup # xfs_info xfs_copy_home
meta-data=/dev/sdf2              isize=256    agcount=4, agsize=122341568 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=489366272, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=238948, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Telcontar:/data/storage_d/old_backup #
I could do a "xfs_metadump" on it - just tell me what options to use, and
where can the result be uploaded to, if big.
A metadump would be helpful, though that only gives us the on-disk
state. What was the state of this fs at the time the dd image was
created? I'm curious if something like an 'rm -rf *' on the metadump
would catch any other corruptions or if this is indeed limited to
something associated with recent (pre)allocations.

Run 'xfs_metadump <src> <tgtfile>' to create a metadump that will
obfuscate filenames by default. It should also be compressible. In the
future, it's probably worth grabbing a metadump as a first step (before
repair, zeroing the log, etc.) so we can look at the fs in the state
most recent to the crash.

Brian
_______________________________________________
xfs mailing list
http://oss.sgi.com/mailman/listinfo/xfs
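
Brian's suggestion to grab a metadump before touching anything could be prepared in advance along these lines. This is a hedged sketch: the function name, device and output path are placeholders, and whether each tool copes with a dirty log right after a forced shutdown may depend on the xfsprogs version. As written it only prints the commands unless RUN=1 is set.

```shell
# Sketch: capture XFS metadata before any repair attempt.
# Prints the commands; set RUN=1 to execute them.
capture_xfs_state() {
    dev=$1   # affected partition (placeholder, e.g. /dev/sdX5)
    out=$2   # metadump target on another filesystem (placeholder)
    run() { if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi; }
    run umount "$dev" &&               # metadump wants the fs unmounted, read-only or frozen
    run xfs_metadump "$dev" "$out" &&  # filenames are obfuscated by default
    run xz "$out" &&                   # metadumps usually compress well
    run xfs_repair -n "$dev"           # no-modify mode: reports, changes nothing
}
```

Keeping something like "capture_xfs_state /dev/sdX5 /data/home.metadump" at hand means the first reaction after a shutdown can be to save the state rather than to repair or reformat.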
Mark Tinguely
2014-07-02 13:07:51 UTC
Post by Brian Foster
Post by Carlos E. R.
The worst issue for me is that "xfs_repair" fails to repair it.
What version of xfs_repair? Did you try mounting to replay the log
before repair?

Besides Brian's good advice, is kdump configured to dump vmcore?

--Mark.
Carlos E. R.
2014-07-03 02:54:26 UTC
Post by Mark Tinguely
Post by Carlos E. R.
The worst issue for me is that "xfs_repair" fails to repair it.
what version of xfs_repair?
xfs_repair version 3.1.11


which is what comes with openSUSE 13.1
Post by Mark Tinguely
Did you try to mount to replay the log before
repair?
Sure.

This last time, I first tried to "umount" the partition, which initially
failed because, despite it being read-only, some applications still had
files open on it (I was already in runlevel 1). I found them with
lsof, killed them, umounted, mounted, and the system crashed. I had to hit
the reset button on the machine.

After the reboot, the partition was mounted automatically, so the log was
replayed then. Then umount, repair (which found nothing, as far as I can
see), backup, format, restore.
Post by Mark Tinguely
Besides Brian's good advice, is kdump configured to dump vmcore?
I'm not sure I understand the question :-?


If you want me to run the system for a month, waiting for this to happen
again, in some special kernel debug mode... I don't know if that will be
feasible :-}


--
Cheers,
Carlos E. R.
(from 13.1 x86_64 "Bottle" at Telcontar)
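
For context on Mark's kdump question: kdump boots a small reserved "crash kernel" when the running kernel crashes and saves a memory image (vmcore) for later analysis. Whether it is set up can be checked roughly as follows. The helper name is made up for this sketch; the two paths in the comments are the standard procfs and sysfs entries.

```shell
# has_crashkernel CMDLINE
# Exit 0 if the given kernel command line reserves memory for a
# crash kernel (a prerequisite for kdump).
has_crashkernel() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical use on the running system:
#   has_crashkernel "$(cat /proc/cmdline)" && echo "crashkernel= is set"
#   cat /sys/kernel/kexec_crash_loaded    # 1 means a crash kernel is loaded
```

If neither check is positive, no vmcore would be written on a crash, which is presumably what Mark wanted to find out.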
Carlos E. R.
2014-07-03 03:00:47 UTC
...
Post by Brian Foster
This is the background eofblocks scanner attempting to free preallocated
space on a file. The scanner looks for files that have been recently
grown and since been flushed to disk (i.e., no longer concurrently being
written to) and trims the post-eof preallocation that comes along with
growing files.
The corruption errors at xfs_alloc.c:1602,1629 on v3.11 fire if the
extent we are attempting to free is already accounted for in the
by-block allocation btree. IOW, this is attempting to free an extent
that the allocation metadata thinks is already free.
Post by Carlos E. R.
* It happens only on restore from hibernation.
Interesting, could you elaborate a bit more on the behavior this system
is typically subjected to? i.e., is this a server that sees a constant
workload that is also frequently hibernated/awakened?
It is a desktop machine I use for work at home. I typically have many
applications opened on different workspaces in XFCE. Say one has terminals,
another has Thunderbird/Pine, another Firefox, another LibreOffice;
another may have gimp, another may be kbabel or lokalize, another may have
vmplayer, etc, whatever. When I go out or go to sleep, I hibernate the
machine, instead of powering down, because it is much faster than reboot,
login, and start the wanted applications, and I want to conserve some
electricity.

I also use the machine for testing configurations, but these I try to do
on virtual machines, instead of my work partition.


The machine may be used anywhere from 4 to 16 hours a day, and hibernated
at least once a day, perhaps three times if I have to go out several
times. It makes no sense to me to leave the machine powered doing nothing,
if hibernating is so easy and reliable - till now. If I have to leave for
more than a week, I tend to do a full "halt".



By the way, this started happening when I replaced an old 500 GB hard disk
(Seagate ST3500418AS) with a 2 TB new unit (Seagate ST2000DM001-1CH164).
Smartctl long test says fine (and seatools from Windows, too).
Post by Brian Foster
Post by Carlos E. R.
I do not have more info than what appears on the logs, but four times (two
different kernels):
/var/log/messages-20140402.xz:<0.1> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111787] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1629 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c54fe9
/var/log/messages-20140402.xz:<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c54fe9
/var/log/messages-20140506.xz:<0.1> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.851374] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c54fe9
/var/log/messages-20140629.xz:<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626346] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c39fe9
So you have reproduced this, reformatted with mkfs, restored from
backups and continued to reproduce the problem? And still only on this
particular partition?
Right. Exactly that.

Except that I cannot reproduce the issue at will; it happens about once a
month, randomly.

AFAIK, xfsdump cannot carry over a filesystem corruption, right?



**** LONG DESCRIPTION and LOGS start here ********


The first time was on 2014-03-15 03:35:17, instantly after thawing:


<0.7> 2014-03-15 03:35:14 Telcontar kernel - - - [37682.109726] PM: Basic memory bitmaps freed
<3.6> 2014-03-15 03:35:14 Telcontar systemd 1 - - Time has been changed
<3.4> 2014-03-15 03:35:14 Telcontar rtkit-daemon 4169 - - The canary thread is apparently starving. Taking action.
<3.6> 2014-03-15 03:35:14 Telcontar rtkit-daemon 4169 - - Demoting known real-time threads.
<3.5> 2014-03-15 03:35:14 Telcontar rtkit-daemon 4169 - - Successfully demoted thread 4175 of process 4168 (/usr/bin/pulseaudio).
<3.5> 2014-03-15 03:35:14 Telcontar rtkit-daemon 4169 - - Successfully demoted thread 4174 of process 4168 (/usr/bin/pulseaudio).
<3.5> 2014-03-15 03:35:14 Telcontar rtkit-daemon 4169 - - Successfully demoted thread 4168 of process 4168 (/usr/bin/pulseaudio).
<3.5> 2014-03-15 03:35:14 Telcontar rtkit-daemon 4169 - - Demoted 3 threads.
<3.6> 2014-03-15 03:35:16 Telcontar acpid - - - 1 client rule loaded
<0.1> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111787] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1629 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_allo
<0.1> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111787]
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111792] CPU: 1 PID: 5245 Comm: thunderbird-bin Tainted: P O 3.11.10-7-desktop #1
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111793] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7516/MS-7516, BIOS V1.5 10/10/2008
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111795] 0000000000000002 ffffffff8159ff82 000000000027610d ffffffffa0c53996
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111799] ffff8802303533c0 ffff8802344e4300 ffff8802263a1f20 0000000000000002
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111801] 0000000000000000 ffff8801a08bfa8c 0000000000000000 0027611300000001
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111804] Call Trace:
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111815] [<ffffffff81004a18>] dump_trace+0x88/0x310
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111818] [<ffffffff81004d70>] show_stack_log_lvl+0xd0/0x1d0
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111821] [<ffffffff810061ac>] show_stack+0x1c/0x50
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111825] [<ffffffff8159ff82>] dump_stack+0x50/0x89
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111861] [<ffffffffa0c53996>] xfs_free_ag_extent+0x226/0x860 [xfs]
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111905] [<ffffffffa0c54fe9>] xfs_free_extent+0xb9/0xf0 [xfs]
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111948] [<ffffffffa0c6739e>] xfs_bmap_finish+0x11e/0x170 [xfs]
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111999] [<ffffffffa0c864c0>] xfs_itruncate_extents+0x190/0x340 [xfs]
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.112073] [<ffffffffa0c4935b>] xfs_setattr_size+0x41b/0x4a0 [xfs]
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.112107] [<ffffffffa0c4940e>] xfs_vn_setattr+0x2e/0x40 [xfs]
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.112130] [<ffffffff811a060c>] notify_change+0x1dc/0x360
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.112135] [<ffffffff811845ee>] do_truncate+0x5e/0x90
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.112139] [<ffffffff81193c53>] do_last+0x253/0xec0
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.112142] [<ffffffff81194976>] path_openat+0xb6/0x670
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.112145] [<ffffffff81195cb5>] do_filp_open+0x35/0x80
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.112147] [<ffffffff81185599>] do_sys_open+0x129/0x210
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.112151] [<ffffffff815adbed>] system_call_fastpath+0x1a/0x1f
<0.4> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.112157] [<00007f6ec359078d>] 0x7f6ec359078c
<0.5> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.112976] XFS (sdd5): xfs_do_force_shutdown(0x8) called from line 916 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_b
<0.1> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.163643] XFS (sdd5): Corruption of in-memory data detected. Shutting down filesystem
<0.1> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.163648] XFS (sdd5): Please umount the filesystem and rectify the problem(s)
<0.4> 2014-03-15 03:35:18 Telcontar kernel - - - [37686.496013] XFS (sdd5): xfs_log_force: error 5 returned.
<3.5> 2014-03-15 03:35:18 Telcontar dbus 1005 - - [system] Activating service name='org.freedesktop.PackageKit' (using servicehelper)
<5.4> 2014-03-15 03:35:18 Telcontar pm-utils - - - Thawing (95)...
<1.5> 2014-03-15 03:35:22 Telcontar network 11556 - - redirecting to "systemctl restart network.service"




I somehow managed to halt and reboot. The log says that the partition
passed the automatic checks at boot (excerpted):


<0.5> 2014-03-15 03:49:42 Telcontar kernel - - - [ 19.173599] XFS (sdd5): Mounting Filesystem
<0.5> 2014-03-15 03:49:42 Telcontar kernel - - - [ 19.377918] XFS (sdd5): Starting recovery (logdev: internal)
<0.5> 2014-03-15 03:49:42 Telcontar kernel - - - [ 19.747914] XFS (sdd5): Ending recovery (logdev: internal)


But soon after, it oopses:


<3.6> 2014-03-15 03:53:01 Telcontar systemd 4987 - - Starting Default.
<3.6> 2014-03-15 03:53:01 Telcontar systemd 4987 - - Reached target Default.
<3.6> 2014-03-15 03:53:01 Telcontar systemd 4987 - - Startup finished in 57ms.
<3.6> 2014-03-15 03:53:01 Telcontar systemd 1 - - Started User Manager for 9.
<0.1> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.857523] XFS: Internal error XFS_WANT_CORRUPTED_RETURN at line 350 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_all
<0.1> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.857523]
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.857530] CPU: 3 PID: 57 Comm: kworker/3:1 Tainted: P O 3.11.10-7-desktop #1
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.857532] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7516/MS-7516, BIOS V1.5 10/10/2008
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.857570] Workqueue: xfsalloc xfs_bmapi_allocate_worker [xfs]
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.857572] 0000000000000000 ffffffff8159ff82 ffff880192c89080 ffffffffa0c50ee9
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.857576] 0000003d30691240 00000000a0c55781 ffff880234917d58 ffff880192c89080
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.857579] 000000000000003d 000000000000003d 0000000000000002 0000000000022dab
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.857583] Call Trace:
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.857583] Call Trace:
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.857596] [<ffffffff81004a18>] dump_trace+0x88/0x310
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.857600] [<ffffffff81004d70>] show_stack_log_lvl+0xd0/0x1d0
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.857604] [<ffffffff810061ac>] show_stack+0x1c/0x50
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.857609] [<ffffffff8159ff82>] dump_stack+0x50/0x89
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.857630] [<ffffffffa0c50ee9>] xfs_alloc_fixup_trees+0x1f9/0x340 [xfs]
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.857689] [<ffffffffa0c5344e>] xfs_alloc_ag_vextent_near+0x9ee/0xcd0 [xfs]
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.857751] [<ffffffffa0c5408d>] xfs_alloc_ag_vextent+0xbd/0x100 [xfs]
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.857810] [<ffffffffa0c54cd6>] xfs_alloc_vextent+0x4e6/0x740 [xfs]
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.857870] [<ffffffffa0c60447>] xfs_bmap_btalloc+0x2a7/0x7a0 [xfs]
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.857937] [<ffffffffa0c63ecd>] __xfs_bmapi_allocate+0xbd/0x2d0 [xfs]
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.858002] [<ffffffffa0c64107>] xfs_bmapi_allocate_worker+0x27/0x50 [xfs]
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.858069] [<ffffffff8106ac68>] process_one_work+0x168/0x490
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.858074] [<ffffffff8106b904>] worker_thread+0x114/0x3a0
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.858079] [<ffffffff81071c2f>] kthread+0xaf/0xc0
<0.4> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.858084] [<ffffffff815adb3c>] ret_from_fork+0x7c/0xb0
<0.1> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.858095] XFS (sdd5): page discard on page ffffea0005357d98, inode 0x602084fd, offset 339968.
<0.1> 2014-03-15 03:54:12 Telcontar kernel - - - [ 326.896051] XFS: Internal error XFS_WANT_CORRUPTED_RETURN at line 350 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_all
<0.1> 2014-03-15 03:54:12 Telcontar kernel - - - [ 326.896051]
<0.4> 2014-03-15 03:54:12 Telcontar kernel - - - [ 326.896056] CPU: 2 PID: 56 Comm: kworker/2:1 Tainted: P O 3.11.10-7-desktop #1
<0.4> 2014-03-15 03:54:12 Telcontar kernel - - - [ 326.896057] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7516/MS-7516, BIOS V1.5 10/10/2008
<0.4> 2014-03-15 03:54:12 Telcontar kernel - - - [ 326.896091] Workqueue: xfsalloc xfs_bmapi_allocate_worker [xfs]
<0.4> 2014-03-15 03:54:12 Telcontar kernel - - - [ 326.896093] 0000000000000000 ffffffff8159ff82 ffff880192c89150 ffffffffa0c50ee9
<0.4> 2014-03-15 03:54:12 Telcontar kernel - - - [ 326.896096] 0000003c30691240 00000000a0c55781 ffff88023490fd58 ffff880192c89150
<0.4> 2014-03-15 03:54:12 Telcontar kernel - - - [ 326.896098] 000000000000003c 000000000000003c 0000000000000002 0000000000022dab
<0.4> 2014-03-15 03:54:12 Telcontar kernel - - - [ 326.896100] Call Trace:


and pages and pages of log entries (which I'm unsure I saw at the time)
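
With that many entries, a small filter helps to pull out only the XFS lines that matter. The following is a minimal sketch, not something I ran at the time: it assumes the rsyslog line layout seen in the excerpts in this post, and the list of markers and the names `LINE_RE`, `KEY_MARKERS` and `xfs_events` are my own choices for illustration.

```python
import re

# Assumed line layout, taken from the excerpts in this post:
#   <facility.severity> DATE TIME HOST kernel - - - [uptime] message
LINE_RE = re.compile(
    r"^<\d+\.\d+>\s+(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"\S+\s+kernel\s+- - -\s+\[\s*(?P<uptime>\d+\.\d+)\]\s*(?P<msg>.*)$"
)

# Hypothetical selection of message fragments, picked from the excerpts.
KEY_MARKERS = (
    "XFS: Internal error",
    "xfs_do_force_shutdown",
    "Corruption of in-memory data detected",
    "Starting recovery",
    "Ending clean mount",
)


def xfs_events(lines):
    """Return (date, time, message) for kernel lines containing a key marker."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue  # not a kernel line in the assumed layout
        msg = match.group("msg")
        if any(marker in msg for marker in KEY_MARKERS):
            events.append((match.group("date"), match.group("time"), msg))
    return events


if __name__ == "__main__":
    sample = [
        "<3.6> 2014-03-15 03:35:16 Telcontar acpid - - - 1 client rule loaded",
        "<0.1> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.163643] "
        "XFS (sdd5): Corruption of in-memory data detected. Shutting down filesystem",
        "<0.4> 2014-03-15 03:35:18 Telcontar kernel - - - [37686.496013] "
        "XFS (sdd5): xfs_log_force: error 5 returned.",
        "<0.6> 2014-03-15 04:09:55 Telcontar kernel - - - [ 1270.681282] "
        "XFS (sdd5): Ending clean mount",
    ]
    for date, time, msg in xfs_events(sample):
        print(date, time, msg)
```

The repeated "xfs_log_force: error 5 returned" lines are left out on purpose, since they only repeat every 30 seconds after the shutdown; add that fragment to `KEY_MARKERS` if you want them.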

Apparently, I logged in in text mode, without rebooting, and mounted home
again (perhaps systemd mounted it automatically, I do not remember). It
is possible that I ran xfs_repair in the interval; it is not logged.



<0.4> 2014-03-15 04:06:09 Telcontar kernel - - - [ 1044.485279] [<ffffffff815adb3c>] ret_from_fork+0x7c/0xb0
<0.1> 2014-03-15 04:06:09 Telcontar kernel - - - [ 1044.486104] XFS (sdd5): page discard on page ffffea00053b68e0, inode 0x602084fd, offset 749568.
<3.6> 2014-03-15 04:07:39 Telcontar systemd 1 - - Starting Session 9 of user root.
<4.6> 2014-03-15 04:07:39 Telcontar systemd-logind 1002 - - New session 9 of user root.
<10.5> 2014-03-15 04:07:39 Telcontar login - - - ROOT LOGIN ON tty2
<3.6> 2014-03-15 04:08:01 Telcontar systemd 1 - - Starting Session 10 of user news.
<0.5> 2014-03-15 04:09:55 Telcontar kernel - - - [ 1270.594691] XFS (sdd5): Mounting Filesystem
<0.6> 2014-03-15 04:09:55 Telcontar kernel - - - [ 1270.681282] XFS (sdd5): Ending clean mount
<3.6> 2014-03-15 04:10:02 Telcontar acpid - - - 1 client rule loaded
<3.6> 2014-03-15 04:11:41 Telcontar acpid - - - 1 client rule loaded
<3.6> 2014-03-15 04:11:47 Telcontar systemd 1 - - Starting Session 11 of user cer.
<4.6> 2014-03-15 04:11:47 Telcontar systemd-logind 1002 - - New session 11 of user cer.
<4.6> 2014-03-15 04:11:47 Telcontar systemd-logind 1002 - - Linked /tmp/.X11-unix/X0 to /run/user/1000/X11-display.
<3.4> 2014-03-15 04:11:47 Telcontar kdm - - - :0 '[5904]: Cannot update authorization file in home dir /home/cer
<3.3> 2014-03-15 04:11:47 Telcontar kdm - - - :0 '[5904]: Cannot chdir to cer's home /home/cer: No such file or directory


But as you can see, despite the log saying that it was a "clean mount", my
"/home/cer/", i.e., my HOME, is not visible.


<0.5> 2014-03-15 04:12:03 Telcontar kernel - - - [ 1397.853848] XFS (sdd5): Mounting Filesystem
<0.6> 2014-03-15 04:12:03 Telcontar kernel - - - [ 1397.932327] XFS (sdd5): Ending clean mount
<3.6> 2014-03-15 04:12:25 Telcontar systemd 1 - - Starting Getty on tty3...
<3.6> 2014-03-15 04:12:25 Telcontar systemd 1 - - Started Getty on tty3.
<3.6> 2014-03-15 04:12:29 Telcontar systemd 1 - - Starting Session 12 of user cer.
<4.6> 2014-03-15 04:12:29 Telcontar systemd-logind 1002 - - New session 12 of user cer.
<10.6> 2014-03-15 04:12:29 Telcontar login - - - LOGIN ON tty3 BY cer


and this time I apparently managed to log in graphical mode:


<3.6> 2014-03-15 04:13:24 Telcontar systemd 1 - - Starting Session 14 of user cer.
<4.6> 2014-03-15 04:13:24 Telcontar systemd-logind 1002 - - New session 14 of user cer.
<4.6> 2014-03-15 04:13:24 Telcontar systemd-logind 1002 - - Linked /tmp/.X11-unix/X0 to /run/user/1000/X11-display.
<23.4> 2014-03-15 04:13:24 Telcontar checkproc - - - checkproc: can not get session id for process 4131!
<4.5> 2014-03-15 04:13:25 Telcontar gnome-keyring-daemon 6210 - - Gkm: using old keyring directory: /home/cer/.gnome2/keyrings
<4.5> 2014-03-15 04:13:25 Telcontar gnome-keyring-daemon 6210 - - Gkm: using old keyring directory: /home/cer/.gnome2/keyrings


It being late, and confident that the issue was solved (which was wrong; maybe
I did not see those XFS_WANT_CORRUPTED_RETURN errors above), I hibernated:


<5.4> 2014-03-15 04:23:41 Telcontar pm-utils - - - Hibernating (1)...
<1.5> 2014-03-15 04:23:41 Telcontar network 7779 - - redirecting to "systemctl --signal=9 kill network.service"

... next morning:

<5.4> 2014-03-15 13:23:41 Telcontar pm-utils - - - Thawing (95)...

... afternoon:

<5.4> 2014-03-15 17:50:45 Telcontar pm-utils - - - Hibernating (1)...
...
<5.4> 2014-03-15 19:47:58 Telcontar pm-utils - - - Thawing (95)...


... once more, and then a crash!


<5.4> 2014-03-15 20:20:56 Telcontar pm-utils - - - Hibernating (1)...
...
<5.4> 2014-03-15 22:20:21 Telcontar pm-utils - - - Thawing (95)...
<5.4> 2014-03-15 22:20:32 Telcontar pm-utils - - - Thawing (1)...
<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_allo
<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298351] CPU: 0 PID: 28877 Comm: kworker/0:7 Tainted: P O 3.11.10-7-desktop #1
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298353] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7516/MS-7516, BIOS V1.5 10/10/2008
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298388] Workqueue: xfs-eofblocks/sdd5 xfs_eofblocks_worker [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298391] 0000000000000000 ffffffff8159ff82 0000000000007121 ffffffffa0c53996
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298395] ffff880151e21cc0 ffff880234093600 ffff88023016bbe0 0000000000000000
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298398] 0000000000000000 0000000100000000 0000000000000000 0000000000000001
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298402] Call Trace:
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298415] [<ffffffff81004a18>] dump_trace+0x88/0x310
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298419] [<ffffffff81004d70>] show_stack_log_lvl+0xd0/0x1d0
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298423] [<ffffffff810061ac>] show_stack+0x1c/0x50
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298428] [<ffffffff8159ff82>] dump_stack+0x50/0x89
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298449] [<ffffffffa0c53996>] xfs_free_ag_extent+0x226/0x860 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298511] [<ffffffffa0c54fe9>] xfs_free_extent+0xb9/0xf0 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298571] [<ffffffffa0c6739e>] xfs_bmap_finish+0x11e/0x170 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298643] [<ffffffffa0c864c0>] xfs_itruncate_extents+0x190/0x340 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298734] [<ffffffffa0c4e633>] xfs_free_eofblocks+0x1e3/0x260 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298786] [<ffffffffa0c441ef>] xfs_inode_free_eofblocks+0x6f/0x150 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298828] [<ffffffffa0c42f82>] xfs_inode_ag_walk.isra.10+0x1c2/0x310 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298868] [<ffffffffa0c43a8e>] xfs_inode_ag_iterator_tag+0x6e/0xb0 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298909] [<ffffffffa0c43d82>] xfs_eofblocks_worker+0x12/0x20 [xfs]
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298937] [<ffffffff8106ac68>] process_one_work+0x168/0x490
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298942] [<ffffffff8106b904>] worker_thread+0x114/0x3a0
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298946] [<ffffffff81071c2f>] kthread+0xaf/0xc0
<0.4> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298952] [<ffffffff815adb3c>] ret_from_fork+0x7c/0xb0
<0.5> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298959] XFS (sdd5): xfs_do_force_shutdown(0x8) called from line 916 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_b
<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.331745] XFS (sdd5): Corruption of in-memory data detected. Shutting down filesystem
<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.331748] XFS (sdd5): Please umount the filesystem and rectify the problem(s)
<4.5> 2014-03-15 22:20:40 Telcontar gnome-keyring-daemon 6210 - - Gkm: couldn't stat directory: /home/cer/.gnome2/keyrings: Input/output error
<4.4> 2014-03-15 22:20:40 Telcontar gnome-keyring-daemon 6210 - - GLib-GObject: invalid unclassed pointer in cast to 'GkmObject'
<4.3> 2014-03-15 22:20:40 Telcontar gnome-keyring-daemon 6210 - - Gkm: gkm_object_expose_full: assertion 'GKM_IS_OBJECT (self)' failed
<4.5> 2014-03-15 22:20:40 Telcontar gnome-keyring-daemon 6210 - - Gkm: couldn't stat directory: /home/cer/.gnome2/keyrings: Input/output error
<4.5> 2014-03-15 22:20:40 Telcontar gnome-keyring-daemon 6210 - - Gkm: couldn't stat directory: /home/cer/.gnome2/keyrings: Input/output error
<4.4> 2014-03-15 22:20:40 Telcontar gnome-keyring-daemon 6210 - - Gkm: couldn't create temporary file for: /home/cer/.gnome2/keyrings/login_1.keyring: Input/output error
<4.4> 2014-03-15 22:20:40 Telcontar gnome-keyring-daemon 6210 - - couldn't create login keyring: An error occurred on the device
<10.3> 2014-03-15 22:20:40 Telcontar unix2_chkpwd - - - gkr-pam: the password for the login keyring was invalid.
<0.4> 2014-03-15 22:20:50 Telcontar kernel - - - [20168.032019] XFS (sdd5): xfs_log_force: error 5 returned.
<5.4> 2014-03-15 22:20:57 Telcontar router - - - (Thawing 1) Logging the current IP= 83.41.119.142
<0.4> 2014-03-15 22:21:20 Telcontar kernel - - - [20198.112018] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-03-15 22:21:50 Telcontar kernel - - - [20228.192016] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-03-15 22:22:21 Telcontar kernel - - - [20258.272013] XFS (sdd5): xfs_log_force: error 5 returned.
<10.5> 2014-03-15 22:22:31 Telcontar polkitd 4115 - - Unregistered Authentication Agent for unix-session:14 (system bus name :1.93, object path /org/gnome/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8
<3.3> 2014-03-15 22:22:37 Telcontar kdm 3931 - - X server for display :0 terminated unexpectedly
<3.4> 2014-03-15 22:22:37 Telcontar kdm - - - :0[31291]: Cannot update authorization file in home dir /home/cer
<0.7> 2014-03-15 22:22:37 Telcontar kernel - - - [20275.208508] nvidia 0000:01:00.0: irq 48 for MSI/MSI-X
<3.6> 2014-03-15 22:22:38 Telcontar acpid - - - 1 client rule loaded
<0.4> 2014-03-15 22:22:51 Telcontar kernel - - - [20288.352018] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-03-15 22:23:01 Telcontar systemd 1 - - Starting Session 126 of user news.
<0.4> 2014-03-15 22:23:21 Telcontar kernel - - - [20318.432014] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-03-15 22:23:51 Telcontar kernel - - - [20348.512013] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-03-15 22:24:21 Telcontar kernel - - - [20378.592014] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-03-15 22:24:51 Telcontar kernel - - - [20408.672014] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-03-15 22:25:19 Telcontar systemd 1 - - Stopping User Manager for 9...
<3.6> 2014-03-15 22:25:19 Telcontar systemd 1 - - Stopping Disk Manager...
<3.6> 2014-03-15 22:25:19 Telcontar systemd 1 - - Stopping Daemon for power management...
<3.6> 2014-03-15 22:25:19 Telcontar systemd 1 - - Stopping Bluetooth service...


I was attempting to reboot, I think.


<3.6> 2014-03-15 22:25:20 Telcontar systemd 1 - - Starting Rescue Shell...
<3.6> 2014-03-15 22:25:20 Telcontar systemd 1 - - Started Rescue Shell.
<3.6> 2014-03-15 22:20:19 Telcontar systemd 3976 - - message repeated 3 times: [ Time has been changed]
<3.6> 2014-03-15 22:25:20 Telcontar systemd 3976 - - Stopping Default.
<3.6> 2014-03-15 22:20:19 Telcontar systemd 4987 - - message repeated 3 times: [ Time has been changed]
<3.6> 2014-03-15 22:25:20 Telcontar systemd 4987 - - Stopping Default.
<3.6> 2014-03-15 22:25:20 Telcontar systemd 3976 - - Stopped target Default.
<3.6> 2014-03-15 22:25:20 Telcontar systemd 4987 - - Stopped target Default.
<3.6> 2014-03-15 22:25:20 Telcontar systemd 3976 - - Starting Shutdown.
<3.6> 2014-03-15 22:25:20 Telcontar systemd 4987 - - Starting Shutdown.
<3.6> 2014-03-15 22:25:20 Telcontar systemd 3976 - - Reached target Shutdown.
<3.6> 2014-03-15 22:25:20 Telcontar systemd 4987 - - Reached target Shutdown.
<3.6> 2014-03-15 22:25:20 Telcontar systemd 3976 - - Starting Exit the Session...
<3.6> 2014-03-15 22:25:20 Telcontar systemd 4987 - - Starting Exit the Session...
<0.5> 2014-03-15 22:25:20 Telcontar kernel - - - [20437.920075] type=1131 audit(1394918720.685:1133): pid=1 uid=0 auid=4294967295 ses=4294967295
<0.5> 2014-03-15 22:25:20 Telcontar kernel - - - [20437.920075] msg=' comm="auditd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
<0.5> 2014-03-15 22:25:20 Telcontar kernel - - - [20437.920273] type=1131 audit(1394918720.685:1134): pid=1 uid=0 auid=4294967295 ses=4294967295
<0.5> 2014-03-15 22:25:20 Telcontar kernel - - - [20437.920273] msg=' comm="systemd-logind" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
<0.5> 2014-03-15 22:25:20 Telcontar kernel - - - [20437.920490] type=1131 audit(1394918720.685:1135): pid=1 uid=0 auid=4294967295 ses=4294967295
<0.5> 2014-03-15 22:25:20 Telcontar kernel - - - [20437.920490] msg=' comm="smb" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
<0.5> 2014-03-15 22:25:21 Telcontar kernel - - - [20438.525253] type=1131 audit(1394918721.290:1136): pid=1 uid=0 auid=4294967295 ses=4294967295
<0.5> 2014-03-15 22:25:21 Telcontar kernel - - - [20438.525253] msg=' comm="cron" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
<0.5> 2014-03-15 22:25:21 Telcontar kernel - - - [20438.525643] type=1131 audit(1394918721.290:1137): pid=1 uid=0 auid=4294967295 ses=4294967295
<0.5> 2014-03-15 22:25:21 Telcontar kernel - - - [20438.525643] msg=' comm="avahi-daemon" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
<0.5> 2014-03-15 22:25:21 Telcontar kernel - - - [20438.525937] type=1131 audit(1394918721.290:1138): pid=1 uid=0 auid=4294967295 ses=4294967295
<0.5> 2014-03-15 22:25:21 Telcontar kernel - - - [20438.525937] msg=' comm="console-kit-daemon" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
<0.5> 2014-03-15 22:25:21 Telcontar kernel - - - [20438.526359] type=1131 audit(1394918721.291:1139): pid=1 uid=0 auid=4294967295 ses=4294967295
<0.5> 2014-03-15 22:25:21 Telcontar kernel - - - [20438.526359] msg=' comm="polkit" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
<0.5> 2014-03-15 22:25:21 Telcontar kernel - - - [20438.526577] type=1131 audit(1394918721.291:1140): pid=1 uid=0 auid=4294967295 ses=4294967295
<0.5> 2014-03-15 22:25:21 Telcontar kernel - - - [20438.526577] msg=' comm="rtkit-daemon" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
<0.5> 2014-03-15 22:25:21 Telcontar kernel - - - [20438.527021] type=1131 audit(1394918721.292:1141): pid=1 uid=0 auid=4294967295 ses=4294967295
<0.5> 2014-03-15 22:25:21 Telcontar kernel - - - [20438.527021] msg=' comm="bluetooth" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
<0.4> 2014-03-15 22:25:21 Telcontar kernel - - - [20438.752008] XFS (sdd5): xfs_log_force: error 5 returned.
<5.6> 2014-03-15 22:25:22 Telcontar rsyslogd - - - [origin software="rsyslogd" swVersion="7.4.7" x-pid="1067" x-info="http://www.rsyslog.com"] exiting on signal 15.
2014-03-15 22:25:23+01:00 - Halting the system now =========================================== uptime: 22:25pm up 18:36, 2 users, load average: 2.08, 1.04, 0.78
2014-03-15 22:25:31+01:00 - Booting the system now ================================================================================ Linux Telcontar 3.11.10-7-desktop #1 SMP PREEMPT Mon Feb 3 09:41:24 UTC
<5.6> 2014-03-15 22:25:39 Telcontar rsyslogd - - - [origin software="rsyslogd" swVersion="7.4.7" x-pid="32300" x-info="http://www.rsyslog.com"] start
<3.6> 2014-03-15 22:25:39 Telcontar systemd 1 - - Stopping Rescue Shell...


This time, the system detects problems:


<0.4> 2014-03-15 22:25:51 Telcontar kernel - - - [20468.832024] XFS (sdd5): xfs_log_force: error 5 returned.
...
<3.6> 2014-03-15 22:26:16 Telcontar systemd 1 - - Started Console Manager.
<10.5> 2014-03-15 22:26:16 Telcontar login - - - ROOT LOGIN ON tty1
<3.6> 2014-03-15 22:26:16 Telcontar systemd 878 - - Mounted /sys/fs/fuse/connections.
<3.6> 2014-03-15 22:26:16 Telcontar systemd 878 - - Stopped target Sound Card.
<3.6> 2014-03-15 22:26:16 Telcontar systemd 878 - - Starting Default.
<3.6> 2014-03-15 22:26:16 Telcontar systemd 878 - - Reached target Default.
<3.6> 2014-03-15 22:26:16 Telcontar systemd 878 - - Startup finished in 316ms.
<3.6> 2014-03-15 22:26:16 Telcontar systemd 1 - - Started User Manager for 0.
<0.4> 2014-03-15 22:26:21 Telcontar kernel - - - [20498.912018] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-03-15 22:26:51 Telcontar kernel - - - [20528.992014] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-03-15 22:27:21 Telcontar kernel - - - [20559.072014] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-03-15 22:27:51 Telcontar kernel - - - [20589.152013] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-03-15 22:28:01 Telcontar systemd 1 - - Starting user-9.slice.


But apparently I decided to abort:


2014-03-15 22:28:03+01:00 - Halting the system now =========================================== uptime: 22:28pm up 18:39, 0 users, load average: 0.70, 1.40, 1.01
2014-03-16 14:07:21+01:00 - Booting the system now ================================================================================ Linux Telcontar 3.11.10-7-desktop #1 SMP PREEMPT Mon Feb 3 09:41:24 UTC


Judging from the time of the next boot, I guess that it was here that I
decided to use the live system and reformat.

The cloned image I have of the filesystem is dated Mar 16 05:42, so it
was made somewhere around here. Those are late hours, considering that I
started to attempt recovery at about 22:30 (I used dd, rsync, and
xfsdump, so that took time).

Unfortunately, I do not remember where I placed my notes on the repair
procedure, so I do not know for certain at which point in my attempts
to repair I took the photo. Seeing that I probably started around
midnight, and the file is dated 05:42, I guess I did it too late. But
that surprises me, as I'm absolutely sure I took the photo to be able
to provide it for investigation.

As it was evident by now that xfs_repair had failed to repair the partition,
which crashed soon after the "repair", and as it was still mountable, I
decided to make both an rsync copy and an xfsdump copy. I then reformatted
the affected partition, but I don't remember whether I used gparted
(probably) or mkfs.xfs. When done, I copied the data back from the backup
made just an hour before, with xfsrestore. I remember I also used rsync to
verify the copy, and it was correct.
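
For anyone wanting to double check a restore without rsync, the same kind of verification can be sketched in Python by hashing both trees and comparing. This is not what I ran (I used rsync); it is a minimal sketch that assumes both trees are mounted and readable, and the function names are mine. It compares file contents only, not ownership, permissions, timestamps or extended attributes, and it skips symlinks and special files.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of one file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each regular file's path (relative to root) to its digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            digests[os.path.relpath(full, root)] = file_digest(full)
    return digests


def compare_trees(backup_root, restored_root):
    """Return (missing, extra, changed) relative paths, each sorted.

    missing: in the backup but not in the restored tree
    extra:   in the restored tree but not in the backup
    changed: in both, but with different contents
    """
    backup = tree_digests(backup_root)
    restored = tree_digests(restored_root)
    missing = sorted(set(backup) - set(restored))
    extra = sorted(set(restored) - set(backup))
    changed = sorted(
        path for path in set(backup) & set(restored)
        if backup[path] != restored[path]
    )
    return missing, extra, changed
```

Calling `compare_trees("/mnt/backup/home", "/home")` (both paths are placeholders) should return three empty lists when the restored contents match the backup.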



And the procedure succeeded:

<0.5> 2014-03-16 14:07:23 Telcontar kernel - - - [ 20.239542] XFS (sdd5): Mounting Filesystem
<0.5> 2014-03-16 14:07:23 Telcontar kernel - - - [ 20.280604] XFS (sdd8): Mounting Filesystem
<0.6> 2014-03-16 14:07:23 Telcontar kernel - - - [ 20.450123] XFS (sdd8): Ending clean mount
<0.6> 2014-03-16 14:07:23 Telcontar kernel - - - [ 20.459463] XFS (sdd5): Ending clean mount


The next log entry related to "sdd5" came days later, all normal:

<3.6> 2014-03-19 00:18:12 Telcontar dbus-daemon 1004 - - **** ADDING /sys/devices/pci0000:00/0000:00:1f.2/ata10/host9/target9:0:0/9:0:0:0/block/sdd/sdd5







The next crash event happened on 2014-04-17 22:47:08, after 15 successful
hibernation cycles:


<5.4> 2014-04-17 20:15:56 Telcontar pm-utils - - - Hibernating (1)...
<1.5> 2014-04-17 20:15:56 Telcontar network 314 - - redirecting to "systemctl --signal=9 kill network.service"
<3.5> 2014-04-17 20:15:56 Telcontar systemd 1 - - ***@eth0.service: main process exited, code=killed, status=9/KILL
<5.4> 2014-04-17 20:15:56 Telcontar pm-utils - - - Hibernating (95)...
<0.7> 2014-04-17 20:15:59 Telcontar kernel - - - [280263.870791] PM: Marking nosave pages: [mem 0x0009f000-0x000fffff]
<0.7> 2014-04-17 20:15:59 Telcontar kernel - - - [280263.870797] PM: Marking nosave pages: [mem 0xbff90000-0xffffffff]
<0.7> 2014-04-17 20:15:59 Telcontar kernel - - - [280263.871414] PM: Basic memory bitmaps created
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280264.493703] Syncing filesystems ... done.
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280265.043237] Freezing user space processes ... (elapsed 0.002 seconds) done.
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280265.046032] PM: Preallocating image memory... done (allocated 1140779 pages)
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.609430] PM: Allocated 4563116 kbytes in 1.56 seconds (2925.07 MB/s)
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.609554] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.611525] Suspending console(s) (use no_console_suspend to debug)
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.612352] serial 00:05: disabled
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.812165] PM: freeze of devices complete after 200.520 msecs
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.812452] PM: late freeze of devices complete after 0.285 msecs
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.812999] PM: noirq freeze of devices complete after 0.544 msecs
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.812999] Disabling non-boot CPUs ...
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.814329] smpboot: CPU 1 is now offline
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.816455] smpboot: CPU 2 is now offline
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.818199] smpboot: CPU 3 is now offline
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.818656] PM: Creating hibernation image:
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.819191] PM: Need to copy 923283 pages
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.819191] PM: Normal pages needed: 923283 + 1024, available pages: 1173501
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.819191] Enabling non-boot CPUs ...
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.819191] smpboot: Booting Node 0 Processor 1 APIC 0x1
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.832336] CPU1 is up
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.832467] smpboot: Booting Node 0 Processor 2 APIC 0x2
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.845865] CPU2 is up
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.846034] smpboot: Booting Node 0 Processor 3 APIC 0x3
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.859609] CPU3 is up
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.887223] PM: noirq restore of devices complete after 22.590 msecs
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.887356] PM: early restore of devices complete after 0.107 msecs
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.059840] uhci_hcd 0000:00:1a.0: setting latency timer to 64
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.059859] usb usb3: root hub lost power or was reset
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.059869] uhci_hcd 0000:00:1a.1: setting latency timer to 64
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.059885] usb usb4: root hub lost power or was reset
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.059893] uhci_hcd 0000:00:1a.2: setting latency timer to 64
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.059910] usb usb5: root hub lost power or was reset
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.059919] ehci-pci 0000:00:1a.7: setting latency timer to 64
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.059937] usb usb1: root hub lost power or was reset
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.061145] uhci_hcd 0000:00:1d.0: setting latency timer to 64
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.061167] usb usb6: root hub lost power or was reset
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.061177] uhci_hcd 0000:00:1d.1: setting latency timer to 64
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.061196] usb usb7: root hub lost power or was reset
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.061205] uhci_hcd 0000:00:1d.2: setting latency timer to 64
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.061225] usb usb8: root hub lost power or was reset
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.061236] ehci-pci 0000:00:1d.7: setting latency timer to 64
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.061254] usb usb2: root hub lost power or was reset
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.062031] pci 0000:00:1e.0: setting latency timer to 64
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.062123] ata_piix 0000:00:1f.2: setting latency timer to 64
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.062182] ata_piix 0000:00:1f.5: setting latency timer to 64
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.063832] ehci-pci 0000:00:1a.7: cache line size of 32 is not supported
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.065134] ehci-pci 0000:00:1d.7: cache line size of 32 is not supported
<0.3> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.162023] pciehp 0000:00:1c.4:pcie04: Device 0000:06:00.0 already exists at 0000:06:00, cannot hot-add
<0.3> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.162025] pciehp 0000:00:1c.4:pcie04: Cannot add device at 0000:06:00
<0.3> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.162047] pciehp 0000:00:1c.2:pcie04: Device 0000:04:00.0 already exists at 0000:04:00, cannot hot-add
<0.3> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.162049] pciehp 0000:00:1c.2:pcie04: Cannot add device at 0000:04:00
<0.3> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.162051] pciehp 0000:00:1c.5:pcie04: Device 0000:07:00.0 already exists at 0000:07:00, cannot hot-add
<0.3> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.162053] pciehp 0000:00:1c.5:pcie04: Cannot add device at 0000:07:00
<0.3> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.162098] pciehp 0000:00:1c.0:pcie04: Device 0000:02:00.0 already exists at 0000:02:00, cannot hot-add
<0.3> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.162100] pciehp 0000:00:1c.0:pcie04: Cannot add device at 0000:02:00
<0.3> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.162123] pciehp 0000:00:1c.3:pcie04: Device 0000:05:00.0 already exists at 0000:05:00, cannot hot-add
<0.3> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.162125] pciehp 0000:00:1c.3:pcie04: Cannot add device at 0000:05:00
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.162308] pata_jmicron 0000:05:00.1: setting latency timer to 64
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.163546] serial 00:05: activated
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.164041] pata_jmicron 0000:04:00.1: setting latency timer to 64
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.173271] r8169 0000:06:00.0 eth0: link down
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.386975] ata11: SATA link down (SStatus 0 SControl 300)
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.467054] ata2: SATA link down (SStatus 0 SControl 300)
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.468030] ata1: SATA link down (SStatus 0 SControl 300)
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.481019] usb 1-2: reset high-speed USB device number 3 using ehci-pci
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.485262] r8169 0000:07:00.0 eth1: link down
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.538037] ata12: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.541148] ata12.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES) filtered out
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.541149] ata12.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) filtered out
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.541151] ata12.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.563113] ata12.00: configured for UDMA/100
<0.5> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.621020] firewire_core 0000:08:02.0: rediscovered device fw0
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.622018] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.624027] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.624176] ata3.00: configured for UDMA/133
<0.5> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.624207] sd 2:0:0:0: [sda] Starting disk
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.625665] ata4.00: configured for UDMA/133
<0.5> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.626090] sd 3:0:0:0: [sdb] Starting disk
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.656005] /dev/vmmon[0]: HostIFReadUptimeWork: detected settimeofday: fixed uptimeBase old 18445346595345864640 new 18445346586286024561 attempts 1
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.833055] ata9.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.833064] ata9.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.836117] ata9.01: ACPI cmd ef/03:45:00:00:00:b0 (SET FEATURES) filtered out
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.836119] ata9.01: ACPI cmd ef/03:0c:00:00:00:b0 (SET FEATURES) filtered out
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.836296] ata9.01: ACPI cmd c6/00:10:00:00:00:b0 (SET MULTIPLE MODE) succeeded
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.836298] ata9.01: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.842067] ata10.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.842082] ata10.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.842175] ata9.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES) filtered out
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.842176] ata9.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) filtered out
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.842344] ata9.00: ACPI cmd c6/00:10:00:00:00:a0 (SET MULTIPLE MODE) succeeded
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.842345] ata9.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.845187] ata10.01: ACPI cmd ef/03:45:00:00:00:b0 (SET FEATURES) filtered out
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.845189] ata10.01: ACPI cmd ef/03:0c:00:00:00:b0 (SET FEATURES) filtered out
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.845378] ata10.01: ACPI cmd c6/00:10:00:00:00:b0 (SET MULTIPLE MODE) succeeded
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.845380] ata10.01: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.847015] usb 3-1: reset low-speed USB device number 2 using uhci_hcd
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.851234] ata10.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES) filtered out
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.851235] ata10.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) filtered out
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.851359] ata9.00: configured for UDMA/133
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.851456] ata10.00: ACPI cmd c6/00:10:00:00:00:a0 (SET MULTIPLE MODE) succeeded
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.851458] ata10.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.857339] ata9.01: configured for UDMA/133
<0.5> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.857369] sd 8:0:0:0: [sdc] Starting disk
<0.5> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.857371] sd 8:0:1:0: [sdd] Starting disk
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.879326] ata10.00: configured for UDMA/133
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.885331] ata10.01: configured for UDMA/133
<0.5> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.885365] sd 9:0:0:0: [sde] Starting disk
<0.5> 2014-04-17 22:47:08 Telcontar kernel - - - [280267.885369] sd 9:0:1:0: [sdf] Starting disk
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280268.242014] usb 2-5: reset high-speed USB device number 2 using ehci-pci
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280268.608013] usb 8-2: reset low-speed USB device number 2 using uhci_hcd
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280268.959113] usb 2-5.4: reset high-speed USB device number 4 using ehci-pci
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280269.287977] r8169 0000:06:00.0 eth0: link up
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280269.796130] PM: restore of devices complete after 2736.343 msecs
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.081655] Restarting kernel threads ... done.
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.086714] Restarting tasks ... done.
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.115233] PM: Basic memory bitmaps freed
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.191345] bridge-eth0: disabling the bridge
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.196021] bridge-eth0: down
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.196026] bridge-eth0: detached
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.762859] /dev/vmnet: open called by PID 3122 (vmnet-bridge)
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.762873] /dev/vmnet: hub 0 does not exist, allocating memory.
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.762888] /dev/vmnet: port on hub 0 successfully opened
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.762899] bridge-eth0: up
<0.7> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.762904] bridge-eth0: attached
<0.5> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.396460] userif-2: sent link down event.
<0.5> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.396463] userif-2: sent link up event.
<0.1> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.851374] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c54fe9
<0.1> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.851374]
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.851709] CPU: 0 PID: 27785 Comm: kworker/0:4 Tainted: P O 3.11.10-7-desktop #1
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.851864] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7516/MS-7516, BIOS V1.5 10/10/2008
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.852074] Workqueue: xfs-eofblocks/sde5 xfs_eofblocks_worker [xfs]
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.852211] 0000000000000000 ffffffff8159ff82 0000000000216bae ffffffffa0c53996
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.852486] ffff88019907e0c0 ffff880234160740 ffff88012e9e5cb0 0000000000000000
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.852638] 0000000000000000 0000000100000000 0000000000000000 0000000000000001
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.852790] Call Trace:
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.852847] [<ffffffff81004a18>] dump_trace+0x88/0x310
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.852947] [<ffffffff81004d70>] show_stack_log_lvl+0xd0/0x1d0
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.853063] [<ffffffff810061ac>] show_stack+0x1c/0x50
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.853164] [<ffffffff8159ff82>] dump_stack+0x50/0x89
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.853275] [<ffffffffa0c53996>] xfs_free_ag_extent+0x226/0x860 [xfs]
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.853439] [<ffffffffa0c54fe9>] xfs_free_extent+0xb9/0xf0 [xfs]
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.853594] [<ffffffffa0c6739e>] xfs_bmap_finish+0x11e/0x170 [xfs]
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.853761] [<ffffffffa0c864c0>] xfs_itruncate_extents+0x190/0x340 [xfs]
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.853950] [<ffffffffa0c4e633>] xfs_free_eofblocks+0x1e3/0x260 [xfs]
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.854110] [<ffffffffa0c441ef>] xfs_inode_free_eofblocks+0x6f/0x150 [xfs]
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.854268] [<ffffffffa0c42f82>] xfs_inode_ag_walk.isra.10+0x1c2/0x310 [xfs]
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.854428] [<ffffffffa0c43a8e>] xfs_inode_ag_iterator_tag+0x6e/0xb0 [xfs]
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.854585] [<ffffffffa0c43d82>] xfs_eofblocks_worker+0x12/0x20 [xfs]
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.854725] [<ffffffff8106ac68>] process_one_work+0x168/0x490
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.854835] [<ffffffff8106b904>] worker_thread+0x114/0x3a0
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.854941] [<ffffffff81071c2f>] kthread+0xaf/0xc0
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.855037] [<ffffffff815adb3c>] ret_from_fork+0x7c/0xb0
<0.5> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.855142] XFS (sde5): xfs_do_force_shutdown(0x8) called from line 916 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_bmap.c. Return address = 0xffffffffa0c673d8
<0.1> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.901296] XFS (sde5): Corruption of in-memory data detected. Shutting down filesystem
<0.1> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.901447] XFS (sde5): Please umount the filesystem and rectify the problem(s)
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280272.480011] XFS (sde5): xfs_log_force: error 5 returned.
<3.4> 2014-04-17 22:47:06 Telcontar rtkit-daemon 4749 - - The canary thread is apparently starving. Taking action.
<3.6> 2014-04-17 22:47:06 Telcontar rtkit-daemon 4749 - - Demoting known real-time threads.
<3.5> 2014-04-17 22:47:06 Telcontar rtkit-daemon 4749 - - Successfully demoted thread 31337 of process 31334 (/usr/bin/pulseaudio).
<3.5> 2014-04-17 22:47:06 Telcontar rtkit-daemon 4749 - - Successfully demoted thread 31336 of process 31334 (/usr/bin/pulseaudio).
<3.5> 2014-04-17 22:47:06 Telcontar rtkit-daemon 4749 - - Successfully demoted thread 31334 of process 31334 (/usr/bin/pulseaudio).
<3.5> 2014-04-17 22:47:06 Telcontar rtkit-daemon 4749 - - Demoted 3 threads.
<3.6> 2014-04-17 22:47:06 Telcontar vmnetBridge - - - RTM_NEWLINK: name:eth0 index:2 flags:0x00001003
<3.6> 2014-04-17 22:47:06 Telcontar vmnetBridge - - - Removing interface eth0 index:2
<3.6> 2014-04-17 22:47:06 Telcontar vmnetBridge - - - Stopped bridge eth0 to virtual network 0.
<3.6> 2014-04-17 22:47:06 Telcontar vmnetBridge - - - RTM_NEWLINK: name:eth0 index:2 flags:0x00011043
<3.6> 2014-04-17 22:47:07 Telcontar vmnet-natd - - - RTM_NEWLINK: name:eth0 index:2 flags:0x00001003
<3.6> 2014-04-17 22:47:08 Telcontar systemd 1 - - Time has been changed
<3.6> 2014-04-17 22:47:11 Telcontar acpid - - - 1 client rule loaded
<3.5> 2014-04-17 22:47:12 Telcontar dbus 1013 - - [system] Activating service name='org.freedesktop.PackageKit' (using servicehelper)
<5.4> 2014-04-17 22:47:12 Telcontar pm-utils - - - Thawing (95)...
<3.5> 2014-04-17 22:47:14 Telcontar dbus 1013 - - [system] Activated service 'org.freedesktop.PackageKit' failed: Cannot launch daemon, file not found or permissions invalid
<1.5> 2014-04-17 22:47:16 Telcontar network 788 - - redirecting to "systemctl restart network.service"
<3.6> 2014-04-17 22:47:16 Telcontar systemd 1 - - Stopping ifup managed network interface eth1...
<3.6> 2014-04-17 22:47:16 Telcontar systemd 1 - - Stopping ifup managed network interface eth0...
<3.6> 2014-04-17 22:47:16 Telcontar systemd 1 - - Stopping LSB: Configure network interfaces and set up routing...


Apparently, I rebooted:


2014-04-17 23:27:32+02:00 - Halting the system now =========================================== uptime: 23:27pm up 6 days 19:54, 1 user, load average: 12.51, 3.63, 1.38
2014-04-17 23:32:17+02:00 - Booting the system now ================================================================================ Linux Telcontar 3.11.10-7-desktop #1 SMP PREEMPT Mon Feb 3 09:41:24 UTC

<10.5> 2014-04-17 23:33:13 Telcontar login - - - ROOT LOGIN ON tty1
<10.5> 2014-04-17 23:39:17 Telcontar login - - - ROOT LOGIN ON tty2
<10.5> 2014-04-17 23:43:14 Telcontar login - - - ROOT LOGIN ON tty3
<10.5> 2014-04-17 23:43:21 Telcontar login - - - ROOT LOGIN ON tty4



Looking at my logs, I have reason to believe that I restored my home
here using the same procedure, but from this work system and with text
mode tools, instead of from the rescue live stick (oS 13.1 XFCE). Thus I
guess this time I used plain mkfs.xfs. Afterwards I see dozens of
hibernate cycles until I halted normally about two weeks later, on
2014-05-02, so the procedure succeeded.
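
For anyone who wants to pull these events out of a similar log, here is a minimal sketch in Python. It assumes the exact line layout quoted in this thread ("<fac.sev> date time host kernel - - - [uptime] message"), and the names LINE_RE and find_xfs_errors are made up for illustration. It only reports each "XFS: Internal error" line together with how many seconds passed since the last "PM: restore of devices complete" line; it is not a diagnosis tool.

```python
import re

# Assumed layout, taken from the kernel lines quoted in this thread:
# "<fac.sev> YYYY-MM-DD HH:MM:SS host kernel - - - [uptime] message"
LINE_RE = re.compile(
    r"^<\d+\.\d+> (?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"\S+ kernel - - - \[\s*(?P<uptime>\d+\.\d+)\] (?P<msg>.*)$"
)


def find_xfs_errors(lines):
    """Return one (date, time, seconds_since_resume, message) tuple per
    'XFS: Internal error' kernel line. seconds_since_resume is None when
    no resume line was seen before the error."""
    events = []
    last_resume = None  # kernel uptime of the most recent resume, if any
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # not a kernel line in the assumed layout
        uptime = float(m.group("uptime"))
        msg = m.group("msg")
        if msg.startswith("PM: restore of devices complete"):
            last_resume = uptime
        elif "XFS: Internal error" in msg:
            delta = None if last_resume is None else round(uptime - last_resume, 3)
            events.append((m.group("date"), m.group("time"), delta, msg))
    return events
```

It could be fed the lines of whatever file holds the messages, for example by opening the file and passing the file object to find_xfs_errors().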






Next crash event was this Sunday:


Hibernating and thawing sequence, complete:


<3.4> 2014-06-29 04:51:49 Telcontar pm-utils - - - Hibernating the system now (04)...
<3.5> 2014-06-29 04:51:49 Telcontar pm-utils - - - There appears not be any pending nntp post to be sent. I just checked :-)
<1.5> 2014-06-29 04:51:50 Telcontar network 29169 - - redirecting to "systemctl --signal=9 kill network.service"
<3.5> 2014-06-29 04:51:50 Telcontar systemd 1 - - ***@eth0.service: main process exited, code=killed, status=9/KILL
<3.4> 2014-06-29 04:51:50 Telcontar pm-utils - - - Hibernating (95)...
<0.7> 2014-06-29 04:51:53 Telcontar kernel - - - [212878.926048] PM: Marking nosave pages: [mem 0x0009f000-0x000fffff]
<0.7> 2014-06-29 04:51:53 Telcontar kernel - - - [212878.926052] PM: Marking nosave pages: [mem 0xbff90000-0xffffffff]
<0.7> 2014-06-29 04:51:53 Telcontar kernel - - - [212878.927502] PM: Basic memory bitmaps created
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212879.561676] Syncing filesystems ... done.
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212880.077132] Freezing user space processes ... (elapsed 0.002 seconds) done.
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212880.080024] PM: Preallocating image memory... done (allocated 1140811 pages)
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.351277] PM: Allocated 4563244 kbytes in 7.27 seconds (627.68 MB/s)
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.351400] Freezing remaining freezable tasks ... (elapsed 0.080 seconds) done.
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.432284] Suspending console(s) (use no_console_suspend to debug)
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.433051] serial 00:05: disabled
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.633138] PM: freeze of devices complete after 200.734 msecs
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.633370] PM: late freeze of devices complete after 0.230 msecs
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.633913] PM: noirq freeze of devices complete after 0.541 msecs
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.633913] Disabling non-boot CPUs ...
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.635222] smpboot: CPU 1 is now offline
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.637153] smpboot: CPU 2 is now offline
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.639195] smpboot: CPU 3 is now offline
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.639658] PM: Creating hibernation image:
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.640186] PM: Need to copy 923219 pages
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.640186] PM: Normal pages needed: 923219 + 1024, available pages: 1173563
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.640186] microcode: CPU0 sig=0x1067a, pf=0x10, revision=0xa0b
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.640186] Enabling non-boot CPUs ...
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.640186] smpboot: Booting Node 0 Processor 1 APIC 0x1
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.653119] microcode: CPU1 sig=0x1067a, pf=0x10, revision=0xa0b
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.653307] CPU1 is up
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.653440] smpboot: Booting Node 0 Processor 2 APIC 0x2
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.666704] microcode: CPU2 sig=0x1067a, pf=0x10, revision=0xa0b
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.666844] CPU2 is up
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.667011] smpboot: Booting Node 0 Processor 3 APIC 0x3
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.680398] microcode: CPU3 sig=0x1067a, pf=0x10, revision=0xa0b
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.680598] CPU3 is up
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.708225] PM: noirq restore of devices complete after 22.576 msecs
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.708358] PM: early restore of devices complete after 0.109 msecs
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.880083] uhci_hcd 0000:00:1a.0: setting latency timer to 64
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.880086] uhci_hcd 0000:00:1a.1: setting latency timer to 64
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.880107] usb usb3: root hub lost power or was reset
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.880110] usb usb4: root hub lost power or was reset
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.880120] uhci_hcd 0000:00:1a.2: setting latency timer to 64
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.880124] ehci-pci 0000:00:1a.7: setting latency timer to 64
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.880139] usb usb5: root hub lost power or was reset
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.880188] usb usb1: root hub lost power or was reset
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.880243] uhci_hcd 0000:00:1d.0: setting latency timer to 64
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.880265] usb usb6: root hub lost power or was reset
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.880275] uhci_hcd 0000:00:1d.1: setting latency timer to 64
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.880296] usb usb7: root hub lost power or was reset
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.880306] uhci_hcd 0000:00:1d.2: setting latency timer to 64
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.880326] usb usb8: root hub lost power or was reset
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.880338] ehci-pci 0000:00:1d.7: setting latency timer to 64
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.880349] usb usb2: root hub lost power or was reset
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.881094] pci 0000:00:1e.0: setting latency timer to 64
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.881199] ata_piix 0000:00:1f.2: setting latency timer to 64
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.881237] ata_piix 0000:00:1f.5: setting latency timer to 64
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.884086] ehci-pci 0000:00:1a.7: cache line size of 32 is not supported
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.884236] ehci-pci 0000:00:1d.7: cache line size of 32 is not supported
<0.3> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.981023] pciehp 0000:00:1c.0:pcie04: Device 0000:02:00.0 already exists at 0000:02:00, cannot hot-add
<0.3> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.981025] pciehp 0000:00:1c.2:pcie04: Device 0000:04:00.0 already exists at 0000:04:00, cannot hot-add
<0.3> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.981026] pciehp 0000:00:1c.0:pcie04: Cannot add device at 0000:02:00
<0.3> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.981028] pciehp 0000:00:1c.2:pcie04: Cannot add device at 0000:04:00
<0.3> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.981032] pciehp 0000:00:1c.5:pcie04: Device 0000:07:00.0 already exists at 0000:07:00, cannot hot-add
<0.3> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.981034] pciehp 0000:00:1c.5:pcie04: Cannot add device at 0000:07:00
<0.3> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.981058] pciehp 0000:00:1c.3:pcie04: Device 0000:05:00.0 already exists at 0000:05:00, cannot hot-add
<0.3> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.981059] pciehp 0000:00:1c.3:pcie04: Cannot add device at 0000:05:00
<0.3> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.981089] pciehp 0000:00:1c.4:pcie04: Device 0000:06:00.0 already exists at 0000:06:00, cannot hot-add
<0.3> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.981090] pciehp 0000:00:1c.4:pcie04: Cannot add device at 0000:06:00
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.981220] pata_jmicron 0000:04:00.1: setting latency timer to 64
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.982188] serial 00:05: activated
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.982714] pata_jmicron 0000:05:00.1: setting latency timer to 64
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.186275] r8169 0000:06:00.0 eth0: link down
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.192270] r8169 0000:07:00.0 eth1: link down
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.206012] ata11: SATA link down (SStatus 0 SControl 300)
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.286032] ata1: SATA link down (SStatus 0 SControl 300)
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.287030] ata4: SATA link down (SStatus 0 SControl 300)
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.357035] ata12: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.360116] ata12.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES) filtered out
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.360118] ata12.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) filtered out
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.360119] ata12.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.366112] ata12.00: configured for UDMA/100
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.440022] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.5> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.440024] firewire_core 0000:08:02.0: rediscovered device fw0
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.442190] ata3.00: configured for UDMA/133
<0.5> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.442223] sd 2:0:0:0: [sdb] Starting disk
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.450017] usb 8-2: reset low-speed USB device number 2 using uhci_hcd
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.659048] ata9.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.659058] ata9.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.661048] ata10.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.661058] ata10.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.662114] ata9.01: ACPI cmd ef/03:45:00:00:00:b0 (SET FEATURES) filtered out
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.662115] ata9.01: ACPI cmd ef/03:0c:00:00:00:b0 (SET FEATURES) filtered out
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.662293] ata9.01: ACPI cmd c6/00:10:00:00:00:b0 (SET MULTIPLE MODE) succeeded
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.662295] ata9.01: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.664113] ata10.01: ACPI cmd ef/03:45:00:00:00:b0 (SET FEATURES) filtered out
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.664114] ata10.01: ACPI cmd ef/03:0c:00:00:00:b0 (SET FEATURES) filtered out
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.664326] ata10.01: ACPI cmd c6/00:10:00:00:00:b0 (SET MULTIPLE MODE) succeeded
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.664327] ata10.01: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.668112] ata9.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES) filtered out
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.668113] ata9.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) filtered out
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.668293] ata9.00: ACPI cmd c6/00:10:00:00:00:a0 (SET MULTIPLE MODE) succeeded
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.668294] ata9.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.670113] ata10.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES) filtered out
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.670114] ata10.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) filtered out
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.670323] ata10.00: ACPI cmd c6/00:10:00:00:00:a0 (SET MULTIPLE MODE) succeeded
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.670324] ata10.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.677300] ata9.00: configured for UDMA/133
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.683286] ata9.01: configured for UDMA/133
<0.5> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.683311] sd 8:0:0:0: [sdc] Starting disk
<0.5> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.683369] sd 8:0:1:0: [sdd] Starting disk
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.698321] ata10.00: configured for UDMA/133
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.704335] ata10.01: configured for UDMA/133
<0.5> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.704361] sd 9:0:0:0: [sde] Starting disk
<0.5> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.704418] sd 9:0:1:0: [sdf] Starting disk
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.829028] usb 2-5: reset high-speed USB device number 2 using ehci-pci
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.901026] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.903237] ata2.00: configured for UDMA/133
<0.5> 2014-06-29 12:32:18 Telcontar kernel - - - [212888.903279] sd 1:0:0:0: [sda] Starting disk
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212889.045020] usb 1-2: reset high-speed USB device number 3 using ehci-pci
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212889.411014] usb 3-1: reset low-speed USB device number 2 using uhci_hcd
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212889.778047] usb 2-5.4: reset high-speed USB device number 4 using ehci-pci
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.186436] r8169 0000:06:00.0 eth0: link up
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.615073] PM: restore of devices complete after 2735.034 msecs
<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626346] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c39fe9
<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626346]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626348] CPU: 0 PID: 28875 Comm: kworker/0:2 Tainted: P O 3.11.10-11-desktop #1
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626348] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7516/MS-7516, BIOS V1.5 10/10/2008
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626388] Workqueue: xfs-eofblocks/sde5 xfs_eofblocks_worker [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626390] 0000000000000002 ffffffff815a0252 00000000002a61c2 ffffffffa0c38996
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626391] ffff8800b7025680 ffff88022eb74180 ffff880121c3fe50 0000000000000002
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626393] 0000000000000000 0000000100000000 0000000000000000 0000000000000001
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626393] Call Trace:
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626403] [<ffffffff81004a28>] dump_trace+0x88/0x310
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626406] [<ffffffff81004d80>] show_stack_log_lvl+0xd0/0x1d0
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626408] [<ffffffff810061bc>] show_stack+0x1c/0x50
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626411] [<ffffffff815a0252>] dump_stack+0x50/0x89
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626425] [<ffffffffa0c38996>] xfs_free_ag_extent+0x226/0x860 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626468] [<ffffffffa0c39fe9>] xfs_free_extent+0xb9/0xf0 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626510] [<ffffffffa0c4c39e>] xfs_bmap_finish+0x11e/0x170 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626560] [<ffffffffa0c6b4c0>] xfs_itruncate_extents+0x190/0x340 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626623] [<ffffffffa0c33633>] xfs_free_eofblocks+0x1e3/0x260 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626659] [<ffffffffa0c291ef>] xfs_inode_free_eofblocks+0x6f/0x150 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626688] [<ffffffffa0c27f82>] xfs_inode_ag_walk.isra.10+0x1c2/0x310 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626716] [<ffffffffa0c28a8e>] xfs_inode_ag_iterator_tag+0x6e/0xb0 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626744] [<ffffffffa0c28d82>] xfs_eofblocks_worker+0x12/0x20 [xfs]
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626763] [<ffffffff8106ac78>] process_one_work+0x168/0x490
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626765] [<ffffffff8106b914>] worker_thread+0x114/0x3a0
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626768] [<ffffffff81071c3f>] kthread+0xaf/0xc0
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626771] [<ffffffff815addfc>] ret_from_fork+0x7c/0xb0
<0.5> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626776] XFS (sde5): xfs_do_force_shutdown(0x8) called from line 916 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_bmap.c. Return address = 0xffffffffa0c4c3d8
<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.706440] XFS (sde5): Corruption of in-memory data detected. Shutting down filesystem
<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.706440] XFS (sde5): Please umount the filesystem and rectify the problem(s)
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212891.026207] usb 1-6: USB disconnect, device number 4
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212891.025944] Restarting kernel threads ... done.
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212891.026371] Restarting tasks ... done.
<0.7> 2014-06-29 12:32:18 Telcontar kernel - - - [212891.079743] PM: Basic memory bitmaps freed
<3.4> 2014-06-29 12:32:19 Telcontar rtkit-daemon 4287 - - The canary thread is apparently starving. Taking action.
<3.6> 2014-06-29 12:32:20 Telcontar rtkit-daemon 4287 - - Demoting known real-time threads.
<3.5> 2014-06-29 12:32:20 Telcontar rtkit-daemon 4287 - - Successfully demoted thread 4293 of process 4286 (/usr/bin/pulseaudio).
<3.5> 2014-06-29 12:32:20 Telcontar rtkit-daemon 4287 - - Successfully demoted thread 4292 of process 4286 (/usr/bin/pulseaudio).
<3.5> 2014-06-29 12:32:20 Telcontar rtkit-daemon 4287 - - Successfully demoted thread 4286 of process 4286 (/usr/bin/pulseaudio).
<3.5> 2014-06-29 12:32:20 Telcontar rtkit-daemon 4287 - - Demoted 3 threads.
<3.6> 2014-06-29 12:32:20 Telcontar systemd 1 - - Time has been changed
<3.3> 2014-06-29 12:32:21 Telcontar systemd-udevd 29550 - - inotify_add_watch(7, /dev/sdg, 10) failed: No such file or directory
<3.3> 2014-06-29 12:32:21 Telcontar systemd-udevd 29551 - - inotify_add_watch(7, /dev/sdh, 10) failed: No such file or directory
<0.4> 2014-06-29 12:32:25 Telcontar kernel - - - [212898.656011] XFS (sde5): xfs_log_force: error 5 returned.
<3.5> 2014-06-29 12:32:26 Telcontar dbus 1033 - - [system] Activating service name='org.freedesktop.PackageKit' (using servicehelper)
<3.4> 2014-06-29 12:32:27 Telcontar pm-utils - - - Thawing (95)...
<3.5> 2014-06-29 12:32:29 Telcontar dbus 1033 - - [system] Activated service 'org.freedesktop.PackageKit' failed: Cannot launch daemon, file not found or permissions invalid
<1.5> 2014-06-29 12:32:30 Telcontar network 29606 - - redirecting to "systemctl restart network.service"
<3.6> 2014-06-29 12:32:30 Telcontar systemd 1 - - Stopping ifup managed network interface eth1...
<3.6> 2014-06-29 12:32:30 Telcontar systemd 1 - - Stopping ifup managed network interface eth0...
<3.6> 2014-06-29 12:32:30 Telcontar systemd 1 - - Stopping LSB: Configure network interfaces and set up routing...
<3.6> 2014-06-29 12:32:31 Telcontar systemd 1 - - Starting LSB: Configure network interfaces and set up routing...
<3.6> 2014-06-29 12:32:32 Telcontar acpid - - - 1 client rule loaded
<3.6> 2014-06-29 12:32:32 Telcontar ifdown 29624 - - eth1 device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<3.6> 2014-06-29 12:32:32 Telcontar ifdown 29625 - - eth0 device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<1.5> 2014-06-29 12:32:32 Telcontar ifdown 29624 - - eth1 device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<1.5> 2014-06-29 12:32:32 Telcontar ifdown 29625 - - eth0 device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<3.6> 2014-06-29 12:32:32 Telcontar network 29638 - - Setting up network interfaces:
<3.6> 2014-06-29 12:32:34 Telcontar network 29638 - - lo
<1.5> 2014-06-29 12:32:34 Telcontar ifup 30165 - - lo
<1.5> 2014-06-29 12:32:34 Telcontar ifup 30165 - - lo
<1.5> 2014-06-29 12:32:34 Telcontar ifup 30165 - - IP address: 127.0.0.1/8
<3.6> 2014-06-29 12:32:34 Telcontar network 29638 - - lo IP address: 127.0.0.1/8
<1.5> 2014-06-29 12:32:34 Telcontar ifup 30165 - -
<0.6> 2014-06-29 12:32:49 Telcontar kernel - - - [212922.866033] Chrome_ChildThr[14100]: segfault at 0 ip 00007fd3d820d596 sp 00007fd3cbc5c410 error 6 in libmozalloc.so[7fd3d820c000+2000]
<16.3> 2014-06-29 12:32:49 Telcontar dhcpcd 30417 - - eth1: dhcpcd not running
<16.6> 2014-06-29 12:32:49 Telcontar dhcpcd 30417 - - eth1: exiting
<3.6> 2014-06-29 12:32:49 Telcontar avahi-daemon 1020 - - Interface eth0.IPv6 no longer relevant for mDNS.
<3.6> 2014-06-29 12:32:49 Telcontar avahi-daemon 1020 - - Leaving mDNS multicast group on interface eth0.IPv6 with address fc00::14.
<3.6> 2014-06-29 12:32:49 Telcontar avahi-daemon 1020 - - Interface eth0.IPv4 no longer relevant for mDNS.
<3.6> 2014-06-29 12:32:49 Telcontar avahi-daemon 1020 - - Leaving mDNS multicast group on interface eth0.IPv4 with address 192.168.1.14.
<3.6> 2014-06-29 12:32:49 Telcontar avahi-daemon 1020 - - Withdrawing address record for fc00::14 on eth0.
<3.6> 2014-06-29 12:32:49 Telcontar avahi-daemon 1020 - - Withdrawing address record for 192.168.1.14 on eth0.
<3.5> 2014-06-29 12:32:49 Telcontar systemd 1 - - Unit ***@eth0.service entered failed state.
<3.6> 2014-06-29 12:32:49 Telcontar systemd 1 - - Starting ifup managed network interface eth0...
<3.6> 2014-06-29 12:32:49 Telcontar ifup 30485 - - eth0 device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<1.5> 2014-06-29 12:32:49 Telcontar ifup 30485 - - eth0 device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<0.6> 2014-06-29 12:32:49 Telcontar kernel - - - [212923.549298] r8169 0000:06:00.0 eth0: link down
<0.6> 2014-06-29 12:32:49 Telcontar kernel - - - [212923.549323] r8169 0000:06:00.0 eth0: link down
<0.6> 2014-06-29 12:32:49 Telcontar kernel - - - [212923.549369] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
<3.6> 2014-06-29 12:32:49 Telcontar avahi-daemon 1020 - - Joining mDNS multicast group on interface eth0.IPv4 with address 192.168.1.14.
<3.6> 2014-06-29 12:32:49 Telcontar avahi-daemon 1020 - - New relevant interface eth0.IPv4 for mDNS.
<3.6> 2014-06-29 12:32:49 Telcontar avahi-daemon 1020 - - Registering new address record for 192.168.1.14 on eth0.IPv4.
<3.6> 2014-06-29 12:32:50 Telcontar systemd 1 - - Starting ifup managed network interface eth1...
<3.6> 2014-06-29 12:32:50 Telcontar ifplugd(eth1) 30800 - - ifplugd 0.28 initializing.
<3.6> 2014-06-29 12:32:50 Telcontar ifplugd(eth1) 30800 - - Using interface eth1/00:21:85:16:2D:0C with driver <r8169> (version: 2.3LK-NAPI)
<0.6> 2014-06-29 12:32:50 Telcontar kernel - - - [212924.375304] r8169 0000:07:00.0 eth1: link down
<0.6> 2014-06-29 12:32:50 Telcontar kernel - - - [212924.375373] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
<3.6> 2014-06-29 12:32:50 Telcontar ifplugd(eth1) 30800 - - Using detection mode: SIOCETHTOOL
<3.6> 2014-06-29 12:32:50 Telcontar ifplugd(eth1) 30800 - - Initialization complete, link beat not detected.
<3.6> 2014-06-29 12:32:50 Telcontar ifup 30780 - - eth1 device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<3.6> 2014-06-29 12:32:50 Telcontar ifup 30780 - - eth1 is controlled by ifplugd
<1.5> 2014-06-29 12:32:50 Telcontar ifup 30780 - - eth1 device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<1.5> 2014-06-29 12:32:50 Telcontar ifup 30780 - - eth1 is controlled by ifplugd
<3.6> 2014-06-29 12:32:50 Telcontar systemd 1 - - Started ifup managed network interface eth1.
<0.6> 2014-06-29 12:32:52 Telcontar kernel - - - [212925.693147] r8169 0000:06:00.0 eth0: link up
<0.6> 2014-06-29 12:32:52 Telcontar kernel - - - [212925.693155] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
<3.6> 2014-06-29 12:32:53 Telcontar avahi-daemon 1020 - - Joining mDNS multicast group on interface eth0.IPv6 with address fe80::221:85ff:fe16:2d0b.
<3.6> 2014-06-29 12:32:53 Telcontar avahi-daemon 1020 - - New relevant interface eth0.IPv6 for mDNS.
<3.6> 2014-06-29 12:32:53 Telcontar avahi-daemon 1020 - - Registering new address record for fe80::221:85ff:fe16:2d0b on eth0.*.
<3.6> 2014-06-29 12:32:53 Telcontar avahi-daemon 1020 - - Leaving mDNS multicast group on interface eth0.IPv6 with address fe80::221:85ff:fe16:2d0b.
<3.6> 2014-06-29 12:32:53 Telcontar avahi-daemon 1020 - - Joining mDNS multicast group on interface eth0.IPv6 with address fc00::14.
<3.6> 2014-06-29 12:32:53 Telcontar avahi-daemon 1020 - - Registering new address record for fc00::14 on eth0.*.
<3.6> 2014-06-29 12:32:53 Telcontar avahi-daemon 1020 - - Withdrawing address record for fe80::221:85ff:fe16:2d0b on eth0.
<3.6> 2014-06-29 12:32:54 Telcontar avahi-daemon 1020 - - Withdrawing workstation service for eth1.
<3.6> 2014-06-29 12:32:54 Telcontar avahi-daemon 1020 - - Withdrawing address record for 192.168.1.14 on eth0.
<3.6> 2014-06-29 12:32:54 Telcontar avahi-daemon 1020 - - Withdrawing workstation service for eth0.
<3.6> 2014-06-29 12:32:54 Telcontar avahi-daemon 1020 - - Withdrawing workstation service for lo.
<3.4> 2014-06-29 12:32:54 Telcontar avahi-daemon 1020 - - Host name conflict, retrying with Telcontar-2
<3.6> 2014-06-29 12:32:54 Telcontar avahi-daemon 1020 - - Registering new address record for fc00::14 on eth0.*.
<3.6> 2014-06-29 12:32:54 Telcontar avahi-daemon 1020 - - Registering new address record for 192.168.1.14 on eth0.IPv4.
<3.6> 2014-06-29 12:32:54 Telcontar avahi-daemon 1020 - - Registering HINFO record with values 'X86_64'/'LINUX'.
<0.4> 2014-06-29 12:32:55 Telcontar kernel - - - [212928.736057] XFS (sde5): xfs_log_force: error 5 returned.
<3.6> 2014-06-29 12:32:55 Telcontar avahi-daemon 1020 - - Server startup complete. Host name is Telcontar-2.local. Local service cookie is 580789639.
<4.6> 2014-06-29 12:32:55 Telcontar SuSEfirewall2 - - - Setting up rules from /etc/sysconfig/SuSEfirewall2 ...
<4.6> 2014-06-29 12:32:55 Telcontar SuSEfirewall2 - - - using default zone 'ext' for interface eth1
<4.6> 2014-06-29 12:32:55 Telcontar SuSEfirewall2 - - - Firewall customary rules loaded from /etc/sysconfig/scripts/SuSEfirewall2-custom
<3.6> 2014-06-29 12:32:56 Telcontar avahi-daemon 1020 - - Service "Telcontar-2" (/etc/avahi/services/udisks.service) successfully established.
<3.6> 2014-06-29 12:32:56 Telcontar avahi-daemon 1020 - - Service "Telcontar-2" (/etc/avahi/services/ssh.service) successfully established.
<3.6> 2014-06-29 12:32:56 Telcontar avahi-daemon 1020 - - Service "Telcontar-2" (/etc/avahi/services/sftp-ssh.service) successfully established.
<4.6> 2014-06-29 12:32:58 Telcontar SuSEfirewall2 - - - Firewall rules successfully set
<3.6> 2014-06-29 12:32:58 Telcontar avahi-autoipd(eth0) 31694 - - Found user 'avahi-autoipd' (UID 495) and group 'avahi-autoipd' (GID 491).
<3.6> 2014-06-29 12:32:58 Telcontar avahi-autoipd(eth0) 31694 - - Successfully called chroot().
<3.6> 2014-06-29 12:32:58 Telcontar avahi-autoipd(eth0) 31694 - - Successfully dropped root privileges.
<3.6> 2014-06-29 12:32:58 Telcontar avahi-autoipd(eth0) 31694 - - Starting with address 169.254.3.89
<3.6> 2014-06-29 12:32:58 Telcontar avahi-autoipd(eth0) 31694 - - Routable address already assigned, sleeping.
<3.6> 2014-06-29 12:32:58 Telcontar systemd 1 - - Started ifup managed network interface eth0.
<3.6> 2014-06-29 12:32:58 Telcontar systemd 1 - - Started ifup managed network interface eth1.
<3.6> 2014-06-29 12:32:58 Telcontar network 29638 - - ..done..done..done ppp0 Startmode is 'manual' -> skipping
<1.5> 2014-06-29 12:32:58 Telcontar ifup 31756 - - ppp0 Startmode is 'manual' -> skipping
<3.6> 2014-06-29 12:32:58 Telcontar network 29638 - - ..skippedSetting up service network . . . . . . . . . . . . ...done
<3.6> 2014-06-29 12:32:58 Telcontar systemd 1 - - Started LSB: Configure network interfaces and set up routing.
<3.4> 2014-06-29 12:32:58 Telcontar pm-utils - - - Thawing the system now (04)...
<3.6> 2014-06-29 12:33:01 Telcontar systemd 1 - - Starting Session 1605 of user news.
<3.4> 2014-06-29 12:33:21 Telcontar router - - - (Thawing 04) Logging the current IP= 79.150.228.90
<0.4> 2014-06-29 12:33:25 Telcontar kernel - - - [212958.816015] XFS (sde5): xfs_log_force: error 5 returned.
<0.4> 2014-06-29 12:33:55 Telcontar kernel - - - [212988.896014] XFS (sde5): xfs_log_force: error 5 returned.
<0.4> 2014-06-29 12:34:25 Telcontar kernel - - - [213018.976015] XFS (sde5): xfs_log_force: error 5 returned.
<0.4> 2014-06-29 12:34:55 Telcontar kernel - - - [213049.056014] XFS (sde5): xfs_log_force: error 5 returned.
<3.6> 2014-06-29 12:35:01 Telcontar systemd 1 - - Starting Session 1606 of user news.
<0.4> 2014-06-29 12:35:25 Telcontar kernel - - - [213079.136015] XFS (sde5): xfs_log_force: error 5 returned.
<0.4> 2014-06-29 12:35:55 Telcontar kernel - - - [213109.216011] XFS (sde5): xfs_log_force: error 5 returned.
<0.4> 2014-06-29 12:36:25 Telcontar kernel - - - [213139.296014] XFS (sde5): xfs_log_force: error 5 returned.
<0.4> 2014-06-29 12:36:55 Telcontar kernel - - - [213169.376016] XFS (sde5): xfs_log_force: error 5 returned.
<0.4> 2014-06-29 12:37:25 Telcontar kernel - - - [213199.456013] XFS (sde5): xfs_log_force: error 5 returned.
<0.4> 2014-06-29 12:37:55 Telcontar kernel - - - [213229.536014] XFS (sde5): xfs_log_force: error 5 returned.
<3.6> 2014-06-29 12:38:01 Telcontar systemd 1 - - Starting Session 1607 of user news.
<0.4> 2014-06-29 12:38:25 Telcontar kernel - - - [213259.616018] XFS (sde5): xfs_log_force: error 5 returned.
<0.4> 2014-06-29 12:38:56 Telcontar kernel - - - [213289.696014] XFS (sde5): xfs_log_force: error 5 returned.
<0.4> 2014-06-29 12:39:26 Telcontar kernel - - - [213319.776019] XFS (sde5): xfs_log_force: error 5 returned.
<0.4> 2014-06-29 12:39:56 Telcontar kernel - - - [213349.856014] XFS (sde5): xfs_log_force: error 5 returned.
<3.6> 2014-06-29 12:40:01 Telcontar systemd 1 - - Starting Session 1608 of user cer.
...
<5.6> 2014-06-29 12:48:34 Telcontar rsyslogd - - - [origin software="rsyslogd" swVersion="7.4.7" x-pid="1111" x-info="http://www.rsyslog.com"] exiting on signal 15.
2014-06-29 12:48:35+02:00 - Halting the system now =========================================== uptime: 12:48pm up 4 days 8:43, 33 users, load average: 1.40, 0.53, 0.67
2014-06-29 12:57:41+02:00 - Booting the system now ================================================================================ Linux Telcontar 3.11.10-11-desktop #1 SMP PREEMPT Mon May 12 13:37:06 UTC 2014 (3d22b5f) x86_64 x86_64 x86_64 GNU/Linux

(it does not show in the log that I had to hit the hardware reset button,
the machine refused to reboot normally, apparently)


(If you ask why I took so long to notice the problem after thawing,
my routine is to power up the machine, then go prepare tea. :-)
When I come back with the mug, I'm dismayed to see I can not
start working; and this day I was in a hurry)


So I reboot (text mode, level 3), umount home, run xfs_repair, mount again,
do xfsdump, do simultaneously an rsync (it is a file by file copy, in case
of problems with dump), umount, use YaST in text mode to reformat the
partition, mount, and then xfsrestore. It did not occur to me to make a
'dd' photo this time: I was tired and busy.

Maybe next time I can take the photo with dd before doing anything else
(it takes about 80 minutes), or simply do an "xfs_metadump", which should
be faster. And I might not have then 500 GiB of free space to make a dd
copy, anyway.






Question.

As this always happens on recovery from hibernation, and seeing the message
"Corruption of in-memory data detected", could it be that thawing does a bad
memory recovery from the swap? I thought that the procedure includes some
checksum, but I don't know for sure.
Post by Brian Foster
This is interesting because the corruption appears to be associated with
post-eof space, which is generally transient. The worst case is that
this space is trimmed off files when they are evicted from cache, such
as during a umount. To me, that seems to correlate with a more
recent/runtime problem rather than something that might be lingering on
disk, but we don't really know for sure.
Dunno.

To me, there are two problems:

1) The corruption itself.
2) That xfs_repair fails to repair the filesystem. In fact, I believe
it does not detect it!

To me, #2 is the worst, and it is what makes me do the backup, format,
restore cycle for recovery. An occasional kernel crash is somewhat
acceptable :-}
Post by Brian Foster
Post by Carlos E. R.
Wait! I have a "dd" copy of the entire partition (500 GB), made on March
16th, 5 AM, so hard data could be obtained from there. I had forgotten. I'll
...
Post by Brian Foster
Post by Carlos E. R.
I could do a "xfs_metadump" on it - just tell me what options to use, and
where can the result be uploaded to, if big.
A metadump would be helpful, though that only gives us the on-disk
state. What was the state of this fs at the time the dd image was
created?
I'm sorry, I'm not absolutely sure. I believe it is corrupted, but I
cannot vouch for it.
Post by Brian Foster
I'm curious if something like an 'rm -rf *' on the metadump
would catch any other corruptions or if this is indeed limited to
something associated with recent (pre)allocations.
Sorry, run 'rm -rf *' where???
Post by Brian Foster
Run 'xfs_metadump <src> <tgtfile>' to create a metadump that will
obfuscate filenames by default. It should also be compressible. In the
future, it's probably worth grabbing a metadump as a first step (before
repair, zeroing the log, etc.) so we can look at the fs in the state
most recent to the crash.
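
As a rough sketch of the capture step described above, the following Python builds the two commands (metadump, then compression) without running them, so they can be reviewed first. The device name /dev/sde5 comes from the logs in this thread; the output path and the helper name are hypothetical, and xz is assumed only because it is the compressor mentioned later in the thread.

```python
import shlex


def metadump_commands(src, target):
    """Return the commands to capture and compress a metadump, without running them."""
    return [
        # Plain 'xfs_metadump <src> <tgtfile>'; filenames are obfuscated by default.
        ["xfs_metadump", src, target],
        # xz writes target + ".xz" and removes the uncompressed file.
        ["xz", target],
    ]


if __name__ == "__main__":
    # The output path is a made-up example; run the printed commands from a
    # rescue system with the filesystem unmounted.
    for cmd in metadump_commands("/dev/sde5", "/tmp/home.metadump"):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Printing rather than executing is deliberate: the commands need root and an unmounted filesystem, so it seems safer to check them by eye before running anything.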
I will take that photo next time, using a rescue system in order to prevent
the system from mounting the partition and replaying the log. Dunno how
long that will take to happen, though... usually a month - but at least
now I know how to do it.




Meanwhile, I have done a xfs_metadump of the image, and compressed it with
xz. It has 10834536 bytes. What do I do with it? I'm not sure I can email
that, and even less to a mail list.

Do you still have a bugzilla system where I can upload it? I had an
account at <http://oss.sgi.com/bugzilla/>, made in 2010. I don't know if
it still runs :-?

If you don't, I can try to create a bugzilla on openSUSE instead, and
tell you the number... but I don't know if it takes files that big. If it
doesn't, I'll fragment the file. You need to have an account there, I
think, to retrieve the attachment, and I would prefer to mark the bug
private, or at least the attachment.




I did the following.

First I made a copy, with "dd", of the partition image, all 489G of it. On
this copy I ran "xfs_check", "xfs_repair -n", and "xfs_repair", with these
results:


Telcontar:/data/storage_d/old_backup # xfs_check xfs_copy_home_workonit
xfs_check is deprecated and scheduled for removal in June 2014.
Please use xfs_repair -n <dev> instead.
Telcontar:/data/storage_d/old_backup # xfs_repair -n xfs_copy_home_workonit
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
Telcontar:/data/storage_d/old_backup # time xfs_repair xfs_copy_home_workonit
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
Phase 5 - rebuild AG headers and trees...
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

real 0m28.058s
user 0m1.692s
sys 0m2.265s
Telcontar:/data/storage_d/old_backup #


Maybe the image was made after repair, or maybe xfs_repair doesn't detect
anything, which as far as I remember, was the case.



I recreate the copy, to try "mount" on an unaltered copy.


Telcontar:/data/storage_d/old_backup # time dd if=xfs_copy_home of=xfs_copy_home_workonit && mount -v xfs_copy_home_workonit mount/
1024000000+0 records in
1024000000+0 records out
524288000000 bytes (524 GB) copied, 4662.7 s, 112 MB/s

real 77m43.697s
user 3m1.420s
sys 28m41.958s
mount: /dev/loop0 mounted on /data/storage_d/old_backup/mount.
Telcontar:/data/storage_d/old_backup #


So it mounts...





- --
Cheers,
Carlos E. R.
(from 13.1 x86_64 "Bottle" at Telcontar)
Dave Chinner
2014-07-03 09:43:47 UTC
Permalink
Brian Foster
2014-07-03 17:40:08 UTC
Permalink
Post by Dave Chinner
Post by Carlos E. R.
...
Post by Brian Foster
This is the background eofblocks scanner attempting to free preallocated
space on a file. The scanner looks for files that have been recently
grown and since been flushed to disk (i.e., no longer concurrently being
written to) and trims the post-eof preallocation that comes along with
growing files.
The corruption errors at xfs_alloc.c:1602,1629 on v3.11 fire if the
extent we are attempting to free is already accounted for in the
by-block allocation btree. IOW, this is attempting to free an extent
that the allocation metadata thinks is already free.
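
To make that check concrete, here is a toy model in Python. It is only an illustration under stated assumptions, not the XFS code: the class and exception names are invented, the free-space index is a plain sorted list rather than a btree, and merging of adjacent free extents is left out. It shows the idea of refusing to free an extent that overlaps space already recorded as free.

```python
import bisect


class CorruptionDetected(Exception):
    """Raised when an extent being freed is already recorded as free."""


class FreeSpaceMap:
    """Toy by-block free-space index: sorted, non-overlapping (start, length) extents."""

    def __init__(self):
        self.extents = []

    def free_extent(self, start, length):
        end = start + length
        # Position of the first recorded extent starting at or after 'start'.
        i = bisect.bisect_left(self.extents, (start, 0))
        # The left neighbour must end at or before the extent being freed.
        if i > 0:
            prev_start, prev_len = self.extents[i - 1]
            if prev_start + prev_len > start:
                raise CorruptionDetected("extent overlaps free space on the left")
        # The right neighbour must start at or after the end of the extent being freed.
        if i < len(self.extents):
            next_start, _ = self.extents[i]
            if next_start < end:
                raise CorruptionDetected("extent overlaps free space on the right")
        self.extents.insert(i, (start, length))


if __name__ == "__main__":
    fsmap = FreeSpaceMap()
    fsmap.free_extent(100, 10)
    try:
        fsmap.free_extent(105, 2)  # already free in this toy map
    except CorruptionDetected as exc:
        print("shutdown:", exc)
```

In this sketch the second call fails because blocks 105-106 already lie inside the free extent starting at 100, which is the same kind of inconsistency the paragraph above describes.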
Post by Carlos E. R.
* It happens only on restore from hibernation.
Interesting, could you elaborate a bit more on the behavior this system
is typically subjected to? i.e., is this a server that sees a constant
workload that is also frequently hibernated/awakened?
....
Post by Carlos E. R.
The machine may be used anywhere from 4 to 16 hours a day, and
hibernated at least once a day, perhaps three times if I have to go
out several times. It makes no sense to me to leave the machine
powered doing nothing, if hibernating is so easy and reliable - till
now. If I have to leave for more than a week, I tend to do a full
"halt".
Hibernation has always been suspect w.r.t. flushing filesystem
metadata. It does not guarantee that the filesystem is quiesced
and idle, it just does a sync() and hopes that is sufficient to get
the filesystem into a consistent state. The mess that this leaves is
then left to filesystem developers to play whack-a-mole with when
users have problems.
Point of note: there is no oops or crash occurring. XFS dumps the
stack when a corruption occurs to tell us where it was detected
and then shuts down the filesystem. Your system is still just fine
apart from not being able to access that filesystem until you
unmount it, repair it and mount it again.
Post by Carlos E. R.
3 PID: 57 Comm: kworker/3:1 Tainted: P O 3.11.10-7-desktop
What's tainting your kernel? If you remove that taint, does the
problem still occur?
....
Post by Carlos E. R.
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.819191] Enabling non-boot CPUs ...
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.819191] smpboot: Booting Node 0 Processor 1 APIC 0x1
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.832336] CPU1 is up
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.832467] smpboot: Booting Node 0 Processor 2 APIC 0x2
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.845865] CPU2 is up
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.846034] smpboot: Booting Node 0 Processor 3 APIC 0x3
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280266.859609] CPU3 is up
....
Post by Carlos E. R.
<0.6> 2014-04-17 22:47:08 Telcontar kernel - - - [280269.796130] PM: restore of devices complete after 2736.343 msecs
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.081655] Restarting kernel threads ... done.
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.086714] Restarting tasks ... done.
.....
Post by Carlos E. R.
<0.1> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.851374] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c54fe9
So the corruption occurred within 2s of the kernel restarting tasks
after a hibernation. It's really looking like a hibernation issue.
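
The "within 2s" figure can be checked from the kernel timestamps in the two April lines quoted above. A minimal sketch in Python, assuming the bracketed number is seconds since boot; the helper name is made up and the error line is shortened here for readability:

```python
import re

# Kernel timestamp in square brackets, e.g. "[280270.086714]".
KERNEL_TS = re.compile(r"\[\s*(\d+\.\d+)\]")


def kernel_time(line):
    """Extract the kernel timestamp (seconds since boot) from a syslog line."""
    match = KERNEL_TS.search(line)
    if match is None:
        raise ValueError("no kernel timestamp in line")
    return float(match.group(1))


# Lines copied from the April excerpt; the second one is truncated after "line 1602".
restart = "<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.086714] Restarting tasks ... done."
error = "<0.1> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.851374] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602"

delta = kernel_time(error) - kernel_time(restart)

if __name__ == "__main__":
    print("error logged %.3f s after tasks were restarted" % delta)
```

The difference comes out at roughly 1.76 seconds, consistent with the estimate above.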
Post by Carlos E. R.
<3.4> 2014-06-29 04:51:50 Telcontar pm-utils - - - Hibernating (95)...
.....
Post by Carlos E. R.
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212887.640186] Enabling non-boot CPUs ...
.....
Post by Carlos E. R.
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.615073] PM: restore of devices complete after 2735.034 msecs
<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626346] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c39fe9
.....
Post by Carlos E. R.
<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.706440] XFS (sde5): Corruption of in-memory data detected. Shutting down filesystem
<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.706440] XFS (sde5): Please umount the filesystem and rectify the problem(s)
<0.6> 2014-06-29 12:32:18 Telcontar kernel - - - [212891.026207] usb 1-6: USB disconnect, device number 4
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212891.025944] Restarting kernel threads ... done.
<0.4> 2014-06-29 12:32:18 Telcontar kernel - - - [212891.026371] Restarting tasks ... done.
Well, there's the smoking gun. The XFS kworker is running and
reporting errors before the thawing process has restarted
void thaw_kernel_threads(void)
{
	struct task_struct *g, *p;
	pm_nosig_freezing = false;
	printk("Restarting kernel threads ... ");
	thaw_workqueues();
....
Which points to the fact that we probably need WQ_FREEZABLE on some
of our workqueues. Brian, do you want to have a look at this?
Yeah, I'll look into it. I might see if I can try to reproduce this by
suspending a vm. It sounds like a preallocating workload and a reduced
eofblocks scan timer test might be worth a shot. Thanks Dave.

Brian
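
The ordering Dave points at (XFS error logged before "Restarting kernel threads") can be checked mechanically against a saved log. A small sketch in Python, assuming kernel timestamps in square brackets as in the excerpts above; the function names are invented, and the sample lines are shortened copies of the June 29 entries:

```python
import re

KERNEL_TS = re.compile(r"\[\s*(\d+\.\d+)\]")


def find_time(lines, needle):
    """Kernel timestamp of the first line containing 'needle', or None if absent."""
    for line in lines:
        if needle in line:
            match = KERNEL_TS.search(line)
            if match:
                return float(match.group(1))
    return None


def error_before_thaw(lines):
    """True if the XFS error was logged before kernel threads were restarted."""
    err = find_time(lines, "XFS_WANT_CORRUPTED_GOTO")
    thaw = find_time(lines, "Restarting kernel threads")
    if err is None or thaw is None:
        return None  # one of the two messages is missing from the log
    return err < thaw


# Shortened copies of the June 29 lines quoted in this thread.
june = [
    "[212890.626346] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602",
    "[212891.025944] Restarting kernel threads ... done.",
]

if __name__ == "__main__":
    print(error_before_thaw(june))
```

Comparing the kernel timestamps rather than line order matters here, because the syslog lines in this thread are not always written in timestamp order.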
Post by Dave Chinner
Post by Carlos E. R.
Question.
As this always happens on recovery from hibernation, and seeing the message
"Corruption of in-memory data detected", could it be that thawing does a bad
memory recovery from the swap? I thought that the procedure includes some
checksum, but I don't know for sure.
It's the fact that the filesystem is still running and modifying
state when the snapshot is being taken that results in the snapshot
image containing an inconsistent snapshot. That then gets loaded
on thaw and it goes boom.
Post by Carlos E. R.
1) The corruption itself.
2) That xfs_repair fails to repair the filesystem. In fact, I believe
it does not detect it!
That's because the filesystem is likely to be consistent on disk.
The issue is in-memory corruption, not on-disk corruption, like
XFS (sde5): Corruption of in-memory data detected.
Basically, XFS is catching a bad state in memory and preventing it
from being propagated to disk. If it gets to disk, then you are
likely to lose data. IOWs, XFS is behaving as designed and is
actually preventing data loss in this situation.
Cheers,
Dave.
--
Dave Chinner
_______________________________________________
xfs mailing list
http://oss.sgi.com/mailman/listinfo/xfs
Carlos E. R.
2014-07-03 23:34:52 UTC
Permalink
Post by Dave Chinner
Post by Carlos E. R.
...
hibernated at least once a day, perhaps three times if I have to go
out several times. It makes no sense to me to leave the machine
powered doing nothing, if hibernating is so easy and reliable - till
now. If I have to leave for more than a week, I tend to do a full
"halt".
Hibernation has always been suspect w.r.t. flushing filesystem
metadata. It does not guarantee that the filesystem is quiesced
and idle, it just does a sync() and hopes that is sufficient to get
the filesystem into a consistent state. The mess that this leaves is
then left to filesystem developers to play whack-a-mole with when
users have problems.
Ah, but my problem would then not always happen on the same partition. It
would affect others, would it not?
Post by Dave Chinner
Point of note: there is no oops or crash occurring. XFS dumps the
stack when a corruption occurs to tell us where it was detected
and then shuts down the filesystem. Your system is still just fine
apart from not being able to access that filesystem until you
unmount it, repair it and mount it again.
Ok, true, there is no formal "Oops".

But no, the system does not remain fine, I had to hit the hardware reset
or power off button to get out.
Post by Dave Chinner
Post by Carlos E. R.
3 PID: 57 Comm: kworker/3:1 Tainted: P O 3.11.10-7-desktop
What's tainting your kernel? If you remove that taint, does the
problem still occur?
Sorry, I can't find that out. It is either the nvidia driver, or the
vmware kernel module. I can temporarily remove it for some days, but
hardly for a month. I agree that it might have unknown influence on the
initial corruption, but not on doing the repair, which I do in text mode,
or with another boot partition that doesn't have that driver.

That is, it would not have influence on "xfs_repair", when done on a non
tainted system.


I don't know of a way to provoke the problem at will, in order to remove
the taint for a brief period :-?
Post by Dave Chinner
Post by Carlos E. R.
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.081655] Restarting kernel threads ... done.
<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.086714] Restarting tasks ... done.
.....
Post by Carlos E. R.
<0.1> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.851374] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c54fe9
So the corruption occurred within 2s of the kernel restarting tasks
after a hibernation. It's really looking like a hibernation issue.
It's got to be related, of course.
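The "within 2s" figure can be checked directly from the bracketed kernel timestamps in the two quoted log lines. A minimal sketch, assuming those bracketed values are seconds since boot; the variable names are made up for illustration:

```python
from decimal import Decimal

# Kernel timestamps copied from the log lines quoted above
# (assumed to be seconds since boot).
restarting_tasks = Decimal("280270.086714")    # "Restarting tasks ... done."
xfs_internal_error = Decimal("280271.851374")  # "XFS: Internal error XFS_WANT_CORRUPTED_GOTO"

# Decimal keeps the subtraction exact instead of introducing binary float noise.
delay = xfs_internal_error - restarting_tasks
print(f"error raised {delay} s after tasks were restarted")
```

The difference comes out a little under two seconds, which matches the statement that the error fired almost immediately after the thaw.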
Post by Dave Chinner
Post by Carlos E. R.
Question.
As this always happens on recovery from hibernation, and seeing the message
"Corruption of in-memory data detected", could it be that thawing does a bad
memory recovery from the swap? I thought that the procedure includes some
checksum, but I don't know for sure.
It's the fact that the filesystem is still running and modifying
state when the snapshot is being taken that results in the snapshot
image containing an inconsistent snapshot. That then gets loaded
on thaw and it goes boom.
But it only happens on the /home partition, not on the email partition,
for instance, also in the same hard disk.

Unless... there are probably more things writing on the home partition
than on the mail partition any time.
Post by Dave Chinner
Post by Carlos E. R.
1) The corruption itself.
2) That xfs_repair fails to repair the filesystem. In fact, I believe
it does not detect it!
That's because the filesystem is likely to be consistent on disk.
The issue is in-memory corruption, not on-disk corruption, like
No, the on disk filesystem is not healthy. If I continue using it, after
reboot and using "xfs_repair" several times, it fails again within a day.

I got after booting (the first event):

<0.1> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.857523] XFS: Internal error XFS_WANT_CORRUPTED_RETURN at line 350 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_all


And some hours later:

<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_allo


So, instead of using xfs_repair, I re-formatted and restored backup, which
worked for a month till next event.



--
Cheers,
Carlos E. R.
(from 13.1 x86_64 "Bottle" at Telcontar)
Dave Chinner
2014-07-04 00:04:26 UTC
Permalink
Post by Dave Chinner
Post by Carlos E. R.
...
hibernated at least once a day, perhaps three times if I have to go
out several times. It makes no sense to me to leave the machine
powered doing nothing, if hibernating is so easy and reliable - till
now. If I have to leave for more than a week, I tend to do a full
"halt".
Hibernation has always been suspect w.r.t. flushing filesystem
metadata. It does not guarantee that the filesystem is quiesced
and idle, it just does a sync() and hopes that is sufficient to get
the filesystem into a consistent state. The mess that this leaves is
then left to filesystem developers to play whack-a-mole with when
users have problems.
Ah, but then my problem would not always happen on the same
partition. It would affect others, would it not?
It needs a busy/dirty filesystem. If the other filesystems are
mostly idle, then they are unlikely to trip over the problem.
Post by Dave Chinner
Point of note: there is no oops or crash occurring. XFS dumps the
stack when a corruption occurs to tell us where it was detected
and then shuts down the filesystem. Your system is still just fine
apart from not being able to access that filesystem until you
unmount it, repair it and mount it again.
Ok, true, there is no formal "Oops".
But no, the system does not remain fine, I had to hit the hardware
reset or power off button to get out.
That usually only happens when the root filesystem is shut down and
you can't access any of the binaries needed to run the system. Is
the filesystem that is shutting down the root?
Post by Dave Chinner
Post by Carlos E. R.
Question.
As this always happens on recovery from hibernation, and seeing the message
"Corruption of in-memory data detected", could it be that thawing does a bad
memory recovery from the swap? I thought that the procedure includes some
checksum, but I don't know for sure.
It's the fact that the filesystem is still running and modifying
state when the snapshot is being taken that results in the snapshot
image containing an inconsistent snapshot. That then gets loaded
on thaw and it goes boom.
But it only happens on the /home partition, not on the email
partition, for instance, also in the same hard disk.
/home is typically where all the applications have open files and are
writing data to.

Email partitions are unlikely to have problems because email
programs are pretty good about using fsync() to ensure your email
doesn't go missing and so aren't dirty at the time of a hibernation.
Unless... there are probably more things writing on the home
partition than on the mail partition any time.
*nod*
Post by Dave Chinner
Post by Carlos E. R.
1) The corruption itself.
2) That xfs_repair fails to repair the filesystem. In fact, I believe
it does not detect it!
That's because the filesystem is likely to be consistent on disk.
The issue is in-memory corruption, not on-disk corruption, like
No, the on disk filesystem is not healthy. If I continue using it,
after reboot and using "xfs_repair" several times, it fails again
within a day.
After at least one hibernation and thaw cycle, right?

FWIW, to rule out other issues with repair, you should probably
upgrade to the 3.2.0 xfsprogs release...

Cheers,

Dave.
--
Dave Chinner
***@fromorbit.com
Carlos E. R.
2014-07-04 01:29:31 UTC
Permalink
Post by Dave Chinner
Post by Carlos E. R.
Ah, but then my problem would not always happen on the same
partition. It would affect others, would it not?
It needs a busy/dirty filesystem. If the other filesystems are
mostly idle, then they are unlikely to trip over the problem.
Right...
Post by Dave Chinner
Post by Carlos E. R.
Ok, true, there is no formal "Oops".
But no, the system does not remain fine, I had to hit the hardware
reset or power off button to get out.
That usually only happens when the root filesystem is shut down and
you can't access any of the binaries needed to run the system. Is
the filesystem that is shutting down the root?
No, it is not. Root is separate and using ext4. The problematic one is
/home.


What I did, as far as I remember, was, when I noticed that home had failed
and was read only, to switch to runlevel 1, umount /home (killing the apps
that were still using it), then tried to mount it again to replay the log,
prior to using xfs_repair on it. Mount hung. Ctrl-Alt-Del failed, or
appeared to fail. So reset button...
Post by Dave Chinner
Post by Carlos E. R.
But it only happens on the /home partition, not on the email
partition, for instance, also in the same hard disk.
/home is typically where all the applications have open files and are
writing data to.
Email partitions are unlikely to have problems because email
programs are pretty good about using fsync() to ensure your email
doesn't go missing and so aren't dirty at the time of a hibernation.
Ok, understood.
Post by Dave Chinner
Post by Carlos E. R.
No, the on disk filesystem is not healthy. If I continue using it,
after reboot and using "xfs_repair" several times, it fails again
within a day.
After at least one hibernation and thaw cycle, right?
Yes. 3, I think.

But there were kernel errors right after boot (XFS_WANT_CORRUPTED_RETURN).
Post by Dave Chinner
FWIW, to rule out other issues with repair, you should probably
upgrade to the 3.2.0 xfsprogs release...
I may try that... I see it is available on http://download.opensuse.org/repositories/filesystems/openSUSE_13.1/,
version xfsprogs-3.2.0


Ok, I'll work on it.


- --
Cheers,
Carlos E. R.
(from 13.1 x86_64 "Bottle" at Telcontar)
Dave Chinner
2014-07-04 01:40:08 UTC
Permalink
Post by Carlos E. R.
Post by Dave Chinner
Post by Carlos E. R.
Ok, true, there is no formal "Oops".
But no, the system does not remain fine, I had to hit the hardware
reset or power off button to get out.
That usually only happens when the root filesystem is shut down and
you can't access any of the binaries needed to run the system. Is
the filesystem that is shutting down the root?
No, it is not. Root is separate and using ext4. The problematic one
is /home.
What I did, as far as I remember, was, when I noticed that home had
failed and was read only, to switch to runlevel 1, umount /home
(killing the apps that were still using it), then tried to mount it
again to replay the log, prior to using xfs_repair on it. Mount
hung. Ctrl-Alt-Del failed, or appeared to fail. So reset button...
That's a completely different issue to having a shutdown filesystem
hang your system. That's a mount problem, and likely a known issue.
You need to be specific when describing a problem, otherwise we
waste time going down the wrong paths.
Post by Carlos E. R.
Post by Dave Chinner
Post by Carlos E. R.
No, the on disk filesystem is not healthy. If I continue using it,
after reboot and using "xfs_repair" several times, it fails again
within a day.
After at least one hibernation and thaw cycle, right?
Yes. 3, I think.
Then hibernation has caused the corruption. It may take some time
for the corruption to be detected, but there isn't any doubt in my
mind that hibernation is the cause of your problems.

So, until we have kernel fixes, you'd do best to turn off
hibernation. If you can't live with leaving your machine powered up
or switching it off, then use suspend-to-ram rather than
suspend-to-disk to avoid the problematic snapshot/restore
situation....

Cheers,

Dave.
--
Dave Chinner
***@fromorbit.com
Carlos E. R.
2014-07-04 02:42:44 UTC
Permalink
Post by Dave Chinner
Post by Carlos E. R.
No, it is not. Root is separate and using ext4. The problematic one
is /home.
What I did, as far as I remember, was, when I noticed that home had
failed and was read only, to switch to runlevel 1, umount /home
(killing the apps that were still using it), then tried to mount it
again to replay the log, prior to using xfs_repair on it. Mount
hung. Ctrl-Alt-Del failed, or appeared to fail. So reset button...
That's a completely different issue to having a shutdown filesystem
hang your system. That's a mount problem, and likely a known issue.
You need to be specific when describing a problem, otherwise we
waste time going down the wrong paths.
Sorry for the misunderstanding.

But halt/reboot did hang, even if it was after a failed mount. I was
trying to recover the system, remember, and I'm trying to remember what
exactly I did do, from memory, not written records.
Post by Dave Chinner
Post by Carlos E. R.
Post by Dave Chinner
Post by Carlos E. R.
No, the on disk filesystem is not healthy. If I continue using it,
after reboot and using "xfs_repair" several times, it fails again
within a day.
After at least one hibernation and thaw cycle, right?
Yes. 3, I think.
Then hibernation has caused the corruption. It may take some time
for the corruption to be detected, but there isn't any doubt in my
mind that hibernation is the cause of your problems.
Wait.

The sequence was:

healthy system
several hibernation cycles.
failure on return from hibernation, with kernel error: XFS_WANT_CORRUPTED_GOTO.

reboot - kernel error messages: XFS_WANT_CORRUPTED_RETURN, which I probably did not see.
repair filesystem
several hibernation cycles during some hours.
failure on return from hibernation, with kernel error: XFS_WANT_CORRUPTED_GOTO


See that there were kernel error messages right after rebooting, which I
think I did not see at the time, because had I seen them I would have
rebooted again, and I did not.


From the log, already posted:

<0.5> 2014-03-15 03:49:42 Telcontar kernel - - - [ 19.173599] XFS (sdd5): Mounting Filesystem
<0.5> 2014-03-15 03:49:42 Telcontar kernel - - - [ 19.377918] XFS (sdd5): Starting recovery (logdev: internal)
<0.5> 2014-03-15 03:49:42 Telcontar kernel - - - [ 19.747914] XFS (sdd5): Ending recovery (logdev: internal)

<3.6> 2014-03-15 03:53:01 Telcontar systemd 4987 - - Starting Default.
<3.6> 2014-03-15 03:53:01 Telcontar systemd 4987 - - Reached target Default.
<3.6> 2014-03-15 03:53:01 Telcontar systemd 4987 - - Startup finished in 57ms.
<3.6> 2014-03-15 03:53:01 Telcontar systemd 1 - - Started User Manager for 9.
<0.1> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.857523] XFS: Internal error XFS_WANT_CORRUPTED_RETURN at line 350 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_all



Then I think I ran xfs_repair, which did not complain, and I continued
working. Within the day, after 3 hibernations, it failed again with
XFS_WANT_CORRUPTED_GOTO, and I decided I had to reboot, backup, reformat,
restore.
Post by Dave Chinner
So, until we have kernel fixes, you'd do best to turn off
hibernation. If you can't live with leaving your machine powered up
or switching it off, then use suspend-to-ram rather than
suspend-to-disk to avoid the problematic snapshot/restore
situation....
Impossible... this is a desktop, not a laptop. Suspend to ram is high
risk, even if it works (which I think it doesn't).

If the failure is unavoidable, I'll reformat the partition as ext4
instead... which I do not like, but such is life.


But before that, I'll try upgrading xfsprogs.


--
Cheers,
Carlos E. R.
(from 13.1 x86_64 "Bottle" at Telcontar)
Carlos E. R.
2014-07-04 03:12:00 UTC
Permalink
Post by Dave Chinner
So, until we have kernel fixes, you'd do best to turn off
hibernation. If you can't live with leaving your machine powered up
or switching it off, then use suspend-to-ram rather than
suspend-to-disk to avoid the problematic snapshot/restore
situation....
Forgot to mention:

I have been working the same way for years on this same machine, and with
the same software versions for some months. Only when I replaced the hard
disk that contains home, mail, and some other things, the problem started.

The partitions were not cloned; I partitioned and formatted fresh (much bigger
partitions), with gparted, and copied files over with rsync.

--
Cheers,
Carlos E. R.
(from 13.1 x86_64 "Bottle" at Telcontar)
Brian Foster
2014-07-04 12:40:49 UTC
Permalink
Post by Dave Chinner
Post by Carlos E. R.
Post by Dave Chinner
Post by Carlos E. R.
Ok, true, there is no formal "Oops".
But no, the system does not remain fine, I had to hit the hardware
reset or power off button to get out.
That usually only happens when the root filesystem is shut down and
you can't access any of the binaries needed to run the system. Is
the filesystem that is shutting down the root?
No, it is not. Root is separate and using ext4. The problematic one
is /home.
What I did, as far as I remember, was, when I noticed that home had
failed and was read only, to switch to runlevel 1, umount /home
(killing the apps that were still using it), then tried to mount it
again to replay the log, prior to using xfs_repair on it. Mount
hung. Ctrl-Alt-Del failed, or appeared to fail. So reset button...
That's a completely different issue to having a shutdown filesystem
hang your system. That's a mount problem, and likely a known issue.
You need to be specific when describing a problem, otherwise we
waste time going down the wrong paths.
Post by Carlos E. R.
Post by Dave Chinner
Post by Carlos E. R.
No, the on disk filesystem is not healthy. If I continue using it,
after reboot and using "xfs_repair" several times, it fails again
within a day.
After at least one hibernation and thaw cycle, right?
Yes. 3, I think.
Then hibernation has caused the corruption. It may take some time
for the corruption to be detected, but there isn't any doubt in my
mind that hibernation is the cause of your problems.
So, until we have kernel fixes, you'd do best to turn off
hibernation. If you can't live with leaving your machine powered up
or switching it off, then use suspend-to-ram rather than
suspend-to-disk to avoid the problematic snapshot/restore
situation....
FWIW, I ran through a bunch of hibernation tests yesterday and couldn't
seem to reproduce anything interesting. I ran a preallocating workload
while constantly hibernating and waking a vm. I also tried using a hack
to avoid the eofblocks trim on release to make the test more effective,
and another to invoke the hibernation from the eofblocks background
scanner to "improve" the chances of conflict. I also ran a truncate test
to stress xfs_itruncate_extents() during hibernation cycles (there's
actually an instance of this in Carlos' reported output that doesn't
seem to involve a workqueue, attributed to thunderbird iirc) and ran
these similar tests going back to v3.11.0 as well as the latest
3.16.0-rc2.

None of this really means anything outside of there isn't quite enough
information to reproduce. It looks simple enough to enable freezing on
the eofblocks (or other xfs) workqueues by setting a flag, so we could
go and do that, but that still isn't definite. E.g., that thunderbird
truncate instance of failure stands out a bit to me.

Carlos,

You've indicated in your previous replies that you have reproduced this
repeatedly or more easily after you hit the problem and before you run a
reformat and restore sequence, enough to give you the impression at
least that the reformat is necessary. If you have the time, could you
run some of your typical activities through some hibernation cycles in
an attempt to narrow down what might contribute to this? E.g., perhaps
this only occurs with thunderbird or some other particular application
running, etc. If you have the ability to try a more recent kernel for a
period of time, that could be interesting as well.

Brian
Post by Dave Chinner
Cheers,
Dave.
--
Dave Chinner
Carlos E. R.
2014-07-04 13:36:47 UTC
Permalink
On 2014-07-04 14:40, Brian Foster wrote:


Thanks.



Yes, that's right.



Yes, certainly. I can do more hibernation cycles to try to trigger it
again. Thunderbird is an application that I use a lot, it is always
open. I have several remote imap accounts, and one local imap account,
using a local dovecot daemon on another partition (which has not been
affected so far). It also pulls nntp from a local daemon (leafnode),
which uses a different partition, on reiserfs.

It is a complex setup, you see :-)



I'll investigate if it is possible.

Meanwhile, I have upgraded the xfsprogs package to version 3.2.0, and
the kernel has got an update to 3.11.10 (openSUSE policy is to
backport security patches, while maintaining the same kernel version
through the lifetime of a release, so that this kernel has in fact
additions and patches from more advanced versions).

Having upgraded xfsprogs, I'm right now in the process of a
backup-format-restore of this home partition again, to take advantage of
any modification this new xfsprogs package may have. I think that this
time I will use rsync instead of xfsrestore, although it is much slower,
unless you ask me to use xfsrestore.

--
Cheers / Saludos,

Carlos E. R.

(from 13.1 x86_64 "Bottle" (Minas Tirith))
Brian Foster
2014-07-03 17:39:17 UTC
Permalink
...
Post by Brian Foster
This is the background eofblocks scanner attempting to free preallocated
space on a file. The scanner looks for files that have been recently
grown and since been flushed to disk (i.e., no longer concurrently being
written to) and trims the post-eof preallocation that comes along with
growing files.
The corruption errors at xfs_alloc.c:1602,1629 on v3.11 fire if the
extent we are attempting to free is already accounted for in the
by-block allocation btree. IOW, this is attempting to free an extent
that the allocation metadata thinks is already free.
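The check described above can be illustrated with a toy model. This is only a sketch of the invariant (an extent being freed must not overlap space already recorded as free); it is not the XFS code, and the class and exception names are hypothetical:

```python
class FreeSpaceCorruption(Exception):
    """Raised when an extent being freed is already recorded as free."""


class ToyFreeSpace:
    """Toy stand-in for a by-block free-space index: sorted (start, length) extents."""

    def __init__(self):
        self.extents = []

    def free(self, start, length):
        end = start + length
        for s, l in self.extents:
            # Any overlap means the metadata already believes (part of)
            # this range is free, so freeing it again is inconsistent.
            if s < end and start < s + l:
                raise FreeSpaceCorruption(
                    f"extent [{start}, {end}) overlaps free extent [{s}, {s + l})"
                )
        self.extents.append((start, length))
        self.extents.sort()


fs = ToyFreeSpace()
fs.free(100, 8)          # fine: blocks 100..107 become free
try:
    fs.free(104, 8)      # overlaps 104..107, which are already free
except FreeSpaceCorruption as err:
    print("shutdown:", err)
```

In this toy version the second call fails because part of the range is already in the free list, which is the same kind of contradiction the kernel message reports before shutting the filesystem down.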
Post by Carlos E. R.
* It happens only on restore from hibernation.
Interesting, could you elaborate a bit more on the behavior this system
is typically subjected to? i.e., is this a server that sees a constant
workload that is also frequently hibernated/awakened?
It is a desktop machine I use for work at home. I typically have many
applications opened on different workspaces in XFCE. Say one has terminals,
another has Thunderbird/Pine, another Firefox, another LibreOffice; another
may have gimp, another may be kbabel or lokalize, another may have vmplayer,
etc, whatever. When I go out or go to sleep, I hibernate the machine,
instead of powering down, because it is much faster than reboot, login, and
start the wanted applications, and I want to conserve some electricity.
I also use the machine for testing configurations, but these I try to do on
virtual machines, instead of my work partition.
The machine may be used anywhere from 4 to 16 hours a day, and hibernated at
least once a day, perhaps three times if I have to go out several times. It
makes no sense to me to leave the machine powered doing nothing, if
hibernating is so easy and reliable - till now. If I have to leave for more
than a week, I tend to do a full "halt".
By the way, this started happening when I replaced an old 500 GB hard disk
(Seagate ST3500418AS) with a 2 TB new unit (Seagate ST2000DM001-1CH164).
Smartctl long test says fine (and seatools from Windows, too).
Ok, so there's a lot going on. I was mainly curious to see what was
causing lingering preallocations, but it could be anything extending a
file multiple times.
Post by Brian Foster
Post by Carlos E. R.
I do not have more info than what appears on the logs, but four times (two
/var/log/messages-20140402.xz:<0.1> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111787] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1629 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c54fe9
/var/log/messages-20140402.xz:<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c54fe9
/var/log/messages-20140506.xz:<0.1> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.851374] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c54fe9
/var/log/messages-20140629.xz:<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626346] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c39fe9
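The four events listed above can be tallied by check name and source line with a short script. A minimal sketch, assuming the log lines keep the layout shown; the regular expression and the function name `summarize` are illustrative only:

```python
import re
from collections import Counter

# The four matching lines quoted above, copied verbatim.
LOG_LINES = [
    r"/var/log/messages-20140402.xz:<0.1> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111787] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1629 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c54fe9",
    r"/var/log/messages-20140402.xz:<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c54fe9",
    r"/var/log/messages-20140506.xz:<0.1> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.851374] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c54fe9",
    r"/var/log/messages-20140629.xz:<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626346] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c39fe9",
]

PATTERN = re.compile(
    r"(?P<when>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) .*?"
    r"XFS: Internal error (?P<check>\w+) at line (?P<line>\d+) of file"
)


def summarize(lines):
    """Count events per (check name, source line number)."""
    hits = Counter()
    for text in lines:
        match = PATTERN.search(text)
        if match:
            hits[(match["check"], int(match["line"]))] += 1
    return hits


for (check, line), count in summarize(LOG_LINES).items():
    print(f"{check} at line {line}: {count} event(s)")
```

Grouping by source line makes it easy to see that three of the four events come from the same check at line 1602.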
So you have reproduced this, reformatted with mkfs, restored from
backups and continued to reproduce the problem? And still only on this
particular partition?
Right. Exactly that.
Only that I can not reproduce the issue at will, but about once a month,
randomly.
AFAIK, xfsdump can not carry over a filesystem corruption, right?
I think that's accurate, though it might complain/fail in the act of
dumping an fs that is corrupted. The behavior here suggests there might
not be on disk corruption, however.
**** LONG DESCRIPTION and LOGS start here ********
...
<5.6> 2014-06-29 12:48:34 Telcontar rsyslogd - - - [origin software="rsyslogd" swVersion="7.4.7" x-pid="1111" x-info="http://www.rsyslog.com"] exiting on signal 15.
2014-06-29 12:48:35+02:00 - Halting the system now =========================================== uptime: 12:48pm up 4 days 8:43, 33 users, load average: 1.40, 0.53, 0.67
2014-06-29 12:57:41+02:00 - Booting the system now ================================================================================ Linux Telcontar 3.11.10-11-desktop #1 SMP PREEMPT Mon May 12 13:37:06 UTC 2014 (3d22b5f) x86_64 x86_64 x86_64 GNU/Linux
(it does not show in the log that I had to hit the hardware reset button,
the machine refused to reboot normally, apparently)
(If you ask why I took so long to notice the problem after thawing,
my routine is to power up the machine, then go prepare tea. :-)
When I come back with the mug, I'm dismayed to see I can not
start working; and this day I was in a hurry)
So I reboot (text mode, level 3), umount home, run xfs_repair, mount again,
do xfsdump, do simultaneously an rsync (it is a file by file copy, in case
of problems with dump), umount, use YaST in text mode to reformat the
partition, mount, and then xfsrestore. It did not occur to me to make a
'dd' photo this time: I was tired and busy.
Maybe next time I can take the photo with dd before doing anything else (it
takes about 80 minutes), or simply do an "xfs_metadump", which should be
faster. And I might not have then 500 GiB of free space to make a dd copy,
anyway.
xfs_metadump should be faster. It will grab the metadata only and
obfuscate filenames so as to hide sensitive information.
Question.
As this always happens on recovery from hibernation, and seeing the message
"Corruption of in-memory data detected", could it be that thawing does a bad
memory recovery from the swap? I thought that the procedure includes some
checksum, but I don't know for sure.
Not sure, though if so I would think that might be a more common source
of problems.
Post by Brian Foster
This is interesting because the corruption appears to be associated with
post-eof space, which is generally transient. The worst case is that
this space is trimmed off files when they are evicted from cache, such
as during a umount. To me, that seems to correlate with a more
recent/runtime problem rather than something that might be lingering on
disk, but we don't really know for sure.
Dunno.
1) The corruption itself.
2) That xfs_repair fails to repair the filesystem. In fact, I believe
it does not detect it!
To me, #2 is the worst, and it is what makes me do the backup, format,
restore cycle for recovery. An occasional kernel crash is somewhat
acceptable :-}
Well it could be that the "corruption" is gone at the point of a
remount. E.g., something becomes inconsistent in memory, the fs detects
it and shuts down before going any further. That's actually a positive.
;)

That also means it's probably not necessary to do a full backup,
reformat and restore sequence as part of your routine here. xfs_repair
should scour through all of the allocation metadata and yell if it finds
something like free blocks allocated to a file.
Post by Brian Foster
Post by Carlos E. R.
Wait! I have a "dd" copy of the entire partition (500 GB), made on March
16th, 5 AM, so hard data could be obtained from there. I had forgotten. I'll
...
Post by Brian Foster
Post by Carlos E. R.
I could do a "xfs_metadump" on it - just tell me what options to use, and
where can the result be uploaded to, if big.
A metadump would be helpful, though that only gives us the on-disk
state. What was the state of this fs at the time the dd image was
created?
I'm sorry, I'm not absolutely sure. I believe it is corrupted, but I can not
vouch for it.
Post by Brian Foster
I'm curious if something like an 'rm -rf *' on the metadump
would catch any other corruptions or if this is indeed limited to
something associated with recent (pre)allocations.
Sorry, run 'rm -rf *' where???
On the metadump... mainly just to see whether freeing all of the used
blocks in the fs triggered any other errors (i.e., a brute force way to
check for further corruptions).
Post by Brian Foster
Run 'xfs_metadump <src> <tgtfile>' to create a metadump that will
obfuscate filenames by default. It should also be compressible. In the
future, it's probably worth grabbing a metadump as a first step (before
repair, zeroing the log, etc.) so we can look at the fs in the state
most recent to the crash.
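The capture step described above could be wrapped in a small helper so it is ready to run right after the next failure. This is a sketch only: it builds the command lines without executing them, the device path and file name are placeholders, and the compression step with xz is taken from what Carlos reports using later in the thread rather than from Brian's instructions.

```python
import shlex


def metadump_plan(source, target):
    """Return the argv lists for capturing and then compressing a metadump."""
    return [
        # Form given in the thread: xfs_metadump <src> <tgtfile>
        ["xfs_metadump", source, target],
        # xz compresses in place, leaving <tgtfile>.xz
        ["xz", target],
    ]


# Placeholder names; substitute the real partition and output file.
for argv in metadump_plan("/dev/sdX5", "home.metadump"):
    print(shlex.join(argv))
```

Printing the commands first gives a chance to confirm the device name before anything is run; the filesystem should not be mounted read-write while the dump is taken.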
I will take that photo next time, using a rescue system in order to impede
the system from mounting the partition and replaying the log. Dunno how long
that will take to happen, though... usually a month - but at least now I
know how to do it.
Meanwhile, I have done a xfs_metadump of the image, and compressed it with
xz. It has 10834536 bytes. What do I do with it? I'm not sure I can email
that, and even less to a mail list.
Do you still have a bugzilla system where I can upload it? I had an account
at <http://oss.sgi.com/bugzilla/>, made in 2010. I don't know if it still
runs :-?
I think http://bugzilla.redhat.com should allow you to file a bug and
attach the file.

Brian
If you don't, I can try to create a bugzilla entry on openSUSE instead, and
tell you the number... but I don't know if it takes files that big. If it
doesn't, I'll fragment the file. You need to have an account there, I think,
to retrieve the attachment, and I would prefer to mark the bug private, or
at least the attachment.
I did the following.
First I made a copy, with "dd", of the partition image, all 489G of it. On
this copy I ran "xfs_check", "xfs_repair -n", and "xfs_repair", with these
results:
Telcontar:/data/storage_d/old_backup # xfs_check xfs_copy_home_workonit
xfs_check is deprecated and scheduled for removal in June 2014.
Please use xfs_repair -n <dev> instead.
Telcontar:/data/storage_d/old_backup # xfs_repair -n xfs_copy_home_workonit
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
Telcontar:/data/storage_d/old_backup # time xfs_repair xfs_copy_home_workonit
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
Phase 5 - rebuild AG headers and trees...
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
real 0m28.058s
user 0m1.692s
sys 0m2.265s
Telcontar:/data/storage_d/old_backup #
Maybe the image was made after repair, or maybe xfs_repair doesn't detect
anything, which as far as I remember, was the case.
I recreate the copy, to try "mount" on an unaltered copy.
Telcontar:/data/storage_d/old_backup # time dd if=xfs_copy_home of=xfs_copy_home_workonit && mount -v xfs_copy_home_workonit mount/
1024000000+0 records in
1024000000+0 records out
524288000000 bytes (524 GB) copied, 4662.7 s, 112 MB/s
real 77m43.697s
user 3m1.420s
sys 28m41.958s
mount: /dev/loop0 mounted on /data/storage_d/old_backup/mount.
(reverse-i-search)`mount': time dd if=xfs_copy_home
Telcontar:/data/storage_d/old_backup #
So it mounts...
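The figures dd printed above are internally consistent, which is easy to verify. A minimal check, assuming dd ran with its default 512-byte block size (no bs= option appears in the command):

```python
records = 1_024_000_000   # "records out" reported by dd
block_size = 512          # dd's default block size in bytes (assumption: no bs= was given)
elapsed_s = 4662.7        # seconds reported by dd

total_bytes = records * block_size
size_gib = total_bytes / 2**30
rate_mb_s = total_bytes / elapsed_s / 1e6

print(total_bytes)            # 524288000000
print(round(size_gib, 2))     # size in GiB
print(round(rate_mb_s))       # throughput in MB/s (decimal megabytes)
```

The byte count matches the dd summary line, the size in GiB is close to the "489G" mentioned for the image, and the rate agrees with the reported 112 MB/s.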
--
Cheers,
Carlos E. R.
(from 13.1 x86_64 "Bottle" at Telcontar)
Carlos E. R.
2014-07-04 21:32:26 UTC
[This email has been delayed, while I thought about where to upload
metadata file - see near the end]
Post by Brian Foster
Ok, so there's a lot going on. I was mainly curious to see what was
causing lingering preallocations, but it could be anything extending a
file multiple times.
Right.
Post by Brian Foster
Post by Carlos E. R.
AFAIK, xfsdump can not carry over a filesystem corruption, right?
I think that's accurate, though it might complain/fail in the act of
dumping an fs that is corrupted. The behavior here suggests there might
not be on disk corruption, however.
At least, not a detectable one.

If I don't do that backup-format-restore, I get issues soon, and it
crashes within a day. This is what I got after booting (the first event):

<0.1> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.857523] XFS: Internal error XFS_WANT_CORRUPTED_RETURN at line 350 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_all

And some hours later:

<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_allo


It was here that I decided to backup-format-restore instead.
Post by Brian Foster
Post by Carlos E. R.
Maybe next time I can take the photo with dd before doing anything else (it
takes about 80 minutes), or simply do an "xfs_metadump", which should be
faster. And I might not have then 500 GiB of free space to make a dd copy,
anyway.
xfs_metadump should be faster. It will grab the metadata only and
obfuscate filenames so as to hide sensitive information.
Ok, I have a post-it label on the monitor so that I remember - my notes
are typically stored in the home partition :-)


But the obfuscation is not complete, I can recognize file names:


00008DC0 .leeme.kfPTgt . ....... .2aujzfJ.%;u. . .0...
00008DF0 ***@.. . .......
00008E20 .amyN3xYjaldFXYpeUry. 3;&.K.. .. .0... !.pepe_j
00008E50 ust_created.tar.bz2.JlyD0W .. ***@....... .NGb0URO
00008E80 C0Bh9cHwp-hBh.6wMS .. .p . ... ..registro.0DPzS
00008EB0 G .. . ....... .8n-.w$.9. .. . .8... +.suse_u
00008EE0 pgrade_to_102_pkglist-bis.txt.tcFUKq. . .......
00008F10 #B-XqcrWP4cqsw77yv8UsYbcCa-D76q..(#.. .. .8...
00008F40 '.suse_upgrade_to_102_pkglist.txt.0KTuDa 7.. .8


I just had a quick look with 'mc'; the dump is too large to inspect it
all.
Post by Brian Foster
Post by Carlos E. R.
Question.
As this always happens on recovery from hibernation, and seeing the message
"Corruption of in-memory data detected", could it be that thawing does a bad
memory recovery from the swap? I thought that the procedure includes some
checksum, but I don't know for sure.
Not sure, though if so I would think that might be a more common source
of problems.
And it only affects my /home partition - although it may be the busiest
one.
Post by Brian Foster
Post by Carlos E. R.
1) The corruption itself.
2) That xfs_repair fails to repair the filesystem. In fact, I believe
it does not detect it!
To me, #2 is the worst, and it is what makes me do the backup, format,
restore cycle for recovery. An occasional kernel crash is somewhat
acceptable :-}
Well it could be that the "corruption" is gone at the point of a
remount. E.g., something becomes inconsistent in memory, the fs detects
it and shuts down before going any further. That's actually a positive.
;)
That also means it's probably not necessary to do a full backup,
reformat and restore sequence as part of your routine here. xfs_repair
should scour through all of the allocation metadata and yell if it finds
something like free blocks allocated to a file.
No, if I don't backup-format-restore it happens again within a day. There
is something lingering. Unless that was just chance... :-?

It is true that during that day I hibernated several times more than
needed to see if it happened again - and it did.
Post by Brian Foster
Post by Carlos E. R.
Post by Brian Foster
I'm curious if something like an 'rm -rf *' on the metadump
would catch any other corruptions or if this is indeed limited to
something associated with recent (pre)allocations.
Sorry, run 'rm -rf *' where???
On the metadump... mainly just to see whether freeing all of the used
blocks in the fs triggered any other errors (i.e., a brute force way to
check for further corruptions).
Sorry, but I fail to see how to do it. I may be thick, or I lack the context.

If I run:

Telcontar:/data/storage_d/old_backup # ls -lh
total 604G
drwxr-xr-x 22 root root 4.0K Mar 8 20:30 home
drwxr-xr-x 3 root root 16 Sep 25 2010 home1
drwxr-xr-x 2 root root 6 Jul 3 02:36 mount
-rw-r--r-- 1 root root 45 Jul 3 04:25 procedure
-rw-r--r-- 1 root root 388M Jul 3 02:42 tgtfile
-rw-r--r-- 1 root root 11M Jul 3 02:50 tgtfile2.xz
-rw-r--r-- 1 root users 489G Mar 16 05:42 xfs_copy_home
-rw-r--r-- 1 root root 489G Jul 3 04:40 xfs_copy_home_workonit
-rw-r--r-- 1 root users 39G Mar 16 05:49 xfsdump__home
-rw-r--r-- 1 root users 39G Mar 16 05:57 xfsdump__home1
Telcontar:/data/storage_d/old_backup # rm -rf *


that would destroy my entire backup!


If you mean:

rm -rf tgtfile

I fail to see what that would accomplish, except to remove a file that is actually on a different partition, not home.

However, I can do:

Telcontar:/data/storage_d/old_backup # mount -v xfs_copy_home_workonit mount/
mount: /dev/loop0 mounted on /data/storage_d/old_backup/mount.
Telcontar:/data/storage_d/old_backup # cd mount
Telcontar:/data/storage_d/old_backup/mount # time rm -r /data/storage_d/old_backup/mount/*
Telcontar:/data/storage_d/old_backup/mount # time rm -r /data/storage_d/old_backup/mount/*

real 2m45.380s
user 0m0.265s
sys 0m6.878s
Telcontar:/data/storage_d/old_backup/mount #
Telcontar:/data/storage_d/old_backup/mount # ls -la
total 4
drwxr-xr-x 2 root root 6 Jul 4 01:56 .
drwxr-xr-x 5 root root 4096 Jul 3 04:25 ..
Telcontar:/data/storage_d/old_backup/mount #
Telcontar:/data/storage_d/old_backup/mount # df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/loop0 489G 33M 489G 1% /data/storage_d/old_backup/mount
Telcontar:/data/storage_d/old_backup/mount #


And I do not see anything on the log, only that it mounted cleanly.
Post by Brian Foster
Post by Carlos E. R.
Meanwhile, I have done an xfs_metadump of the image, and compressed it with
xz. It has 10834536 bytes. What do I do with it? I'm not sure I can email
that, and even less to a mail list.
Do you still have a bugzilla system where I can upload it? I had an account
at <http://oss.sgi.com/bugzilla/>, made in 2010. I don't know if it still
runs :-?
I have an active bugzilla account at <http://oss.sgi.com/bugzilla/>, I'm
logged in there now. I haven't checked if I can create a bug, not being
sure what parameters to use (product, component, whom to assign to). I
think that would be the most appropriate place.

Meanwhile, I have uploaded the file to my google drive account, so I can
share it with anybody on request - ie, it is not public, I need to add a
gmail address to the list of people that can read the file.

Alternatively, I could just email the file to people asking for it,
offlist, but not in a single email, in chunks limited to 1.5 MB per
email.
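
The chunked-email option can be sketched as follows. This is only an illustration, assuming a `split` that accepts the `K` size suffix (GNU coreutils does); the function names and the 1400K default piece size are made up for the example and do not come from the thread. Note that mail attachments are normally base64-encoded, which grows them by about a third, so the pieces may need to be smaller if the limit applies to the encoded message.

```shell
#!/bin/sh
# Hypothetical helpers: split a file into mail-sized pieces and rebuild it.

# split_for_mail FILE PREFIX [SIZE]
# Writes PREFIXaa, PREFIXab, ... each at most SIZE bytes.
# The 1400K default is an assumed value chosen to stay under 1.5 MB.
split_for_mail() {
    split -b "${3:-1400K}" "$1" "$2"
}

# join_from_mail PREFIX OUTPUT
# Concatenates the pieces in suffix order.
# OUTPUT must not itself start with PREFIX, or the glob would match it.
join_from_mail() {
    cat "$1"* > "$2"
}
```

For example, the sender could run `split_for_mail tgtfile2.xz tgtfile2.xz.part.` and the receiver `join_from_mail tgtfile2.xz.part. tgtfile2.xz`; comparing a checksum of the original and the rebuilt file confirms nothing was lost on the way.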
Post by Brian Foster
I think http://bugzilla.redhat.com should allow you to file a bug and
attach the file.
Sorry, I don't have an account there...

I do have one at openSUSE, though, and it does allow me to attach files, up
to a limit. If the file is too big, it can be split into pieces. But I
will not use it unless you people say that you have an account there.

For using a bugzilla, the most appropriate one would be at SGI, IMHO, if
they are still supporting this project.

--
Cheers,
Carlos E. R.
(from 13.1 x86_64 "Bottle" at Telcontar)
Brian Foster
2014-07-05 12:28:33 UTC
[This email has been delayed, while I thought about where to upload metadata
file - see near the end]
Post by Brian Foster
Ok, so there's a lot going on. I was mainly curious to see what was
causing lingering preallocations, but it could be anything extending a
file multiple times.
Right.
Post by Brian Foster
Post by Carlos E. R.
AFAIK, xfsdump can not carry over a filesystem corruption, right?
I think that's accurate, though it might complain/fail in the act of
dumping an fs that is corrupted. The behavior here suggests there might
not be on disk corruption, however.
At least, not a detectable one.
If I don't do that backup-format-restore, I get issues soon, and it crashes
I echo Dave's previous question... within a day of doing what? Just
using the system or doing more hibernation cycles?
<0.1> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.857523] XFS: Internal error XFS_WANT_CORRUPTED_RETURN at line 350 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_all
<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_allo
It was here that I decided to backup-format-restore instead.
Post by Brian Foster
Post by Carlos E. R.
Maybe next time I can take the photo with dd before doing anything else (it
takes about 80 minutes), or simply do an "xfs_metadump", which should be
faster. And I might not have then 500 GiB of free space to make a dd copy,
anyway.
xfs_metadump should be faster. It will grab the metadata only and
obfuscate filenames so as to hide sensitive information.
Ok, I have a post-it label on the monitor so that I remember - my notes are
typically stored in the home partition :-)
00008DC0 .leeme.kfPTgt . ....... .2aujzfJ.%;u. . .0...
00008E20 .amyN3xYjaldFXYpeUry. 3;&.K.. .. .0... !.pepe_j
00008E80 C0Bh9cHwp-hBh.6wMS .. .p . ... ..registro.0DPzS
00008EB0 G .. . ....... .8n-.w$.9. .. . .8... +.suse_u
00008EE0 pgrade_to_102_pkglist-bis.txt.tcFUKq. . .......
00008F10 #B-XqcrWP4cqsw77yv8UsYbcCa-D76q..(#.. .. .8...
00008F40 '.suse_upgrade_to_102_pkglist.txt.0KTuDa 7.. .8
I just had a quick look with 'mc'; the dump is too large to inspect it all.
Post by Brian Foster
Post by Carlos E. R.
Question.
As this always happens on recovery from hibernation, and seeing the message
"Corruption of in-memory data detected", could it be that thawing does a bad
memory recovery from the swap? I thought that the procedure includes some
checksum, but I don't know for sure.
Not sure, though if so I would think that might be a more common source
of problems.
And it only affects my /home partition - although it may be the busiest one.
Post by Brian Foster
Post by Carlos E. R.
1) The corruption itself.
2) That xfs_repair fails to repair the filesystem. In fact, I believe
it does not detect it!
To me, #2 is the worst, and it is what makes me do the backup, format,
restore cycle for recovery. An occasional kernel crash is somewhat
acceptable :-}
Well it could be that the "corruption" is gone at the point of a
remount. E.g., something becomes inconsistent in memory, the fs detects
it and shuts down before going any further. That's actually a positive.
;)
That also means it's probably not necessary to do a full backup,
reformat and restore sequence as part of your routine here. xfs_repair
should scour through all of the allocation metadata and yell if it finds
something like free blocks allocated to a file.
No, if I don't backup-format-restore it happens again within a day. There is
something lingering. Unless that was just chance... :-?
It is true that during that day I hibernated several times more than needed
to see if it happened again - and it did.
This depends on what causes this to happen, not how frequently it happens.
Does it continue to happen along with hibernation, or do you start
seeing these kinds of errors during normal use?

If the latter, that could suggest something broken on disk. If the
former, that could simply suggest the fs (perhaps on-disk) has made it
into some kind of state that makes this easier to reproduce, for
whatever reason. It could be timing, location of metadata,
fragmentation, or anything really for that matter, but it doesn't
necessarily mean corruption (even though it doesn't rule it out).
Perhaps the clean regeneration of everything by a from-scratch recovery
simply makes this more difficult to reproduce until the fs naturally
becomes more aged/fragmented, for example.

This probably makes a pristine, pre-repair metadump of the reproducing
fs more interesting. I could try some of my previous tests against a
restore of that metadump.
Post by Brian Foster
Post by Carlos E. R.
Post by Brian Foster
I'm curious if something like an 'rm -rf *' on the metadump
would catch any other corruptions or if this is indeed limited to
something associated with recent (pre)allocations.
Sorry, run 'rm -rf *' where???
On the metadump... mainly just to see whether freeing all of the used
blocks in the fs triggered any other errors (i.e., a brute force way to
check for further corruptions).
Sorry, but I fail to see how to do it. I may be thick, or I lack the context.
Telcontar:/data/storage_d/old_backup # ls -lh
total 604G
drwxr-xr-x 22 root root 4.0K Mar 8 20:30 home
drwxr-xr-x 3 root root 16 Sep 25 2010 home1
drwxr-xr-x 2 root root 6 Jul 3 02:36 mount
-rw-r--r-- 1 root root 45 Jul 3 04:25 procedure
-rw-r--r-- 1 root root 388M Jul 3 02:42 tgtfile
-rw-r--r-- 1 root root 11M Jul 3 02:50 tgtfile2.xz
-rw-r--r-- 1 root users 489G Mar 16 05:42 xfs_copy_home
-rw-r--r-- 1 root root 489G Jul 3 04:40 xfs_copy_home_workonit
-rw-r--r-- 1 root users 39G Mar 16 05:49 xfsdump__home
-rw-r--r-- 1 root users 39G Mar 16 05:57 xfsdump__home1
Telcontar:/data/storage_d/old_backup # rm -rf *
that would destroy my entire backup!
I was somewhat thinking out loud originally discussing this topic. I was
suggesting to run this against a restored metadump, not the primary
dataset or a backup.

The metadump creates an image of the metadata of the source fs in a file
(no data is copied). This metadump image can be restored at will via
'xfs_mdrestore.' This allows restoring to a file, mounting the file
loopback, and performing experiments or investigation on the fs
generally as it existed when the shutdown was reproducible.

So basically:

- xfs_mdrestore <mdimgfile> <tmpfileimg>
- mount <tmpfileimg> /mnt
- rm -rf /mnt/*

... was what I was suggesting. <tmpfileimg> can be recreated from the
metadump image afterwards to get back to square one.
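
As a rough sketch, the three steps above could be wrapped like this. The function names, the `DRY_RUN` switch and the argument names are assumptions made for the example, not something from the thread. `find ... -mindepth 1 -delete` (a GNU/BSD find extension) is swapped in for `rm -rf /mnt/*` so that hidden entries are freed as well, and a read-only `xfs_repair -n -f` pass on the image is added at the end.

```shell
#!/bin/sh
# Sketch of the restore / mount / delete experiment on a scratch image.
# Needs root when run for real. DRY_RUN=1 prints the commands instead.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# metadump_rm_test MDIMG SCRATCH_IMG MOUNTPOINT  (all three are placeholders)
metadump_rm_test() {
    mdimg=$1 img=$2 mnt=$3
    # Refuse to run with a missing argument, so find never sees an empty path.
    [ -n "$mdimg" ] && [ -n "$img" ] && [ -n "$mnt" ] || return 2

    run xfs_mdrestore "$mdimg" "$img" || return 1
    run mount -o loop "$img" "$mnt" || return 1
    # Free every block in the restored fs, hidden entries included.
    run find "$mnt" -mindepth 1 -delete
    run umount "$mnt" || return 1
    # Read-only check afterwards (-n: no modify, -f: target is a regular file).
    run xfs_repair -n -f "$img"
}
```

A trial run such as `DRY_RUN=1 metadump_rm_test tgtfile scratch.img /mnt` only prints the commands. After a real run, the kernel log is the place to look for any new XFS errors triggered by the deletions.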
rm -rf tgtfile
I fail to see what that would accomplish, except to remove a file that is actually on a different partition, not home.
Telcontar:/data/storage_d/old_backup # mount -v xfs_copy_home_workonit mount/
mount: /dev/loop0 mounted on /data/storage_d/old_backup/mount.
Telcontar:/data/storage_d/old_backup # cd mount
Telcontar:/data/storage_d/old_backup/mount # time rm -r /data/storage_d/old_backup/mount/*
Telcontar:/data/storage_d/old_backup/mount # time rm -r /data/storage_d/old_backup/mount/*
real 2m45.380s
user 0m0.265s
sys 0m6.878s
Telcontar:/data/storage_d/old_backup/mount #
Telcontar:/data/storage_d/old_backup/mount # ls -la
total 4
drwxr-xr-x 2 root root 6 Jul 4 01:56 .
drwxr-xr-x 5 root root 4096 Jul 3 04:25 ..
Telcontar:/data/storage_d/old_backup/mount #
Telcontar:/data/storage_d/old_backup/mount # df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/loop0 489G 33M 489G 1% /data/storage_d/old_backup/mount
Telcontar:/data/storage_d/old_backup/mount #
And I do not see anything on the log, only that it mounted cleanly.
Post by Brian Foster
Post by Carlos E. R.
Meanwhile, I have done an xfs_metadump of the image, and compressed it with
xz. It has 10834536 bytes. What do I do with it? I'm not sure I can email
that, and even less to a mail list.
Do you still have a bugzilla system where I can upload it? I had an account
at <http://oss.sgi.com/bugzilla/>, made in 2010. I don't know if it still
runs :-?
I have an active bugzilla account at <http://oss.sgi.com/bugzilla/>, I'm
logged in there now. I haven't checked if I can create a bug, not being sure
what parameters to use (product, component, whom to assign to). I think that
would be the most appropriate place.
Meanwhile, I have uploaded the file to my google drive account, so I can
share it with anybody on request - ie, it is not public, I need to add a
gmail address to the list of people that can read the file.
Alternatively, I could just email the file to people asking for it, offlist,
but not in a single email, in chunks limited to 1.5 MB per email.
Either of the bugzilla or google drive options works Ok for me.

Brian
Post by Brian Foster
I think http://bugzilla.redhat.com should allow you to file a bug and
attach the file.
Sorry, I don't have an account there...
I do have one at openSUSE, though, and it does allow me to attach files, up
to a limit. If the file is too big, it can be split into pieces. But I
will not use it unless you people say that you have an account there.
For using a bugzilla, the most appropriate one would be at SGI, IMHO, if
they are still supporting this project.
--
Cheers,
Carlos E. R.
(from 13.1 x86_64 "Bottle" at Telcontar)
Carlos E. R.
2014-07-12 00:30:45 UTC
Post by Brian Foster
If I don't do that backup-format-restore, I get issues soon, and it crashes
I echo Dave's previous question... within a day of doing what? Just
using the system or doing more hibernation cycles?
It is in the long post with the logs I posted.

The first time it crashed, I rebooted, got some errors I probably did not
see, managed to mount the device, and I used the machine normally, doing
several hibernation cycles. On one of these, it crashed, within the day.
Post by Brian Foster
<0.1> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.857523] XFS: Internal error XFS_WANT_CORRUPTED_RETURN at line 350 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_all
<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_allo
It was here that I decided to backup-format-restore instead.
Post by Brian Foster
That also means it's probably not necessary to do a full backup,
reformat and restore sequence as part of your routine here. xfs_repair
should scour through all of the allocation metadata and yell if it finds
something like free blocks allocated to a file.
No, if I don't backup-format-restore it happens again within a day. There is
something lingering. Unless that was just chance... :-?
It is true that during that day I hibernated several times more than needed
to see if it happened again - and it did.
This depends on what causes this to happen, not how frequently it happens.
Does it continue to happen along with hibernation, or do you start
seeing these kinds of errors during normal use?
Except the first time that this happened, the sequence is this:

I use the machine for weeks, without event, booting once, then hibernating
at least once per day. I finally reboot when I have to apply some
system update, or something special.

Till one day, this "thing" happens. It happens immediately after coming
out from hibernation, and puts the affected partition, always /home, in
read only mode. When it happens, I reboot, repair partition manually if
needed, then I back up the files, format it, and replace all the files
from the backup just made, with xfsdump. Well, this last time, I used
rsync instead.


It has happened "only" four times:

2014-03-15 03:35:17
2014-03-15 22:20:34
2014-04-17 22:47:08
2014-06-29 12:32:18
Post by Brian Foster
If the latter, that could suggest something broken on disk.
That was my first thought, because it started happening after replacing the
hard disk, but also after a kernel update. But I have tested that disk
several times, with smartctl and with the manufacturer test tool, and
nothing came out.
Post by Brian Foster
If the
former, that could simply suggest the fs (perhaps on-disk) has made it
into some kind of state that makes this easier to reproduce, for
whatever reason. It could be timing, location of metadata,
fragmentation, or anything really for that matter, but it doesn't
necessarily mean corruption (even though it doesn't rule it out).
Perhaps the clean regeneration of everything by a from-scratch recovery
simply makes this more difficult to reproduce until the fs naturally
becomes more aged/fragmented, for example.
This probably makes a pristine, pre-repair metadump of the reproducing
fs more interesting. I could try some of my previous tests against a
restore of that metadump.
Well, I suggest that, unless you can find something on the metadata (I
just sent you the link via email from google), we wait till the next
event. I will at that time take an intact metadata photo. But this can
take a month or two to happen again, if the pattern holds.
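
A possible way to take that capture at the next event, before any repair attempt, is sketched below. The helper names and the `DRY_RUN` switch are assumptions for the example. `xfs_metadump` is meant to be pointed at a filesystem that is unmounted, mounted read-only or frozen; `-g` shows progress, and file names are obfuscated by default.

```shell
#!/bin/sh
# Sketch: capture a metadump before any repair attempt, then compress it.
# DRY_RUN=1 prints the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# capture_metadump DEVICE OUTFILE
# DEVICE would be the affected partition, OUTFILE a path on another filesystem.
capture_metadump() {
    dev=$1 out=$2
    [ -n "$dev" ] && [ -n "$out" ] || return 2
    # Metadata only; -g prints progress while dumping.
    run xfs_metadump -g "$dev" "$out" || return 1
    # -k keeps the uncompressed file next to OUTFILE.xz.
    run xz -k "$out"
}
```

On the machine in this thread the call might look like `capture_metadump /dev/sde5 /data/storage_d/old_backup/home.metadump`, where the output file name is invented for the example.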
Post by Brian Foster
I was somewhat thinking out loud originally discussing this topic. I was
suggesting to run this against a restored metadump, not the primary
dataset or a backup.
The metadump creates an image of the metadata of the source fs in a file
(no data is copied). This metadump image can be restored at will via
'xfs_mdrestore.' This allows restoring to a file, mounting the file
loopback, and performing experiments or investigation on the fs
generally as it existed when the shutdown was reproducible.
Ah... I see.
Post by Brian Foster
- xfs_mdrestore <mdimgfile> <tmpfileimg>
- mount <tmpfileimg> /mnt
- rm -rf /mnt/*
... was what I was suggesting. <tmpfileimg> can be recreated from the
metadump image afterwards to get back to square one.
I see.

Well, I tried this on a copy of the 'dd' image days ago, and nothing
happened. I guess the procedure above would give the same result.
Post by Brian Foster
I have an active bugzilla account at <http://oss.sgi.com/bugzilla/>, I'm
logged in there now. I haven't checked if I can create a bug, not being sure
what parameters to use (product, component, whom to assign to). I think that
would be the most appropriate place.
Meanwhile, I have uploaded the file to my google drive account, so I can
share it with anybody on request - ie, it is not public, I need to add a
gmail address to the list of people that can read the file.
Alternatively, I could just email the file to people asking for it, offlist,
but not in a single email, in chunks limited to 1.5 MB per email.
Either of the bugzilla or google drive options works Ok for me.
It's here:

<https://drive.google.com/file/d/0Bx2OgfTa-XC9UDBnQzZIMTVyN0k/edit?usp=sharing>

Whoever wants to read it, has to tell me the address to add to it, access
is not public.


--
Cheers,
Carlos E. R.
(from 13.1 x86_64 "Bottle" at Telcontar)
Carlos E. R.
2014-07-12 01:30:17 UTC
[xfs_metadump]
Post by Carlos E. R.
<https://drive.google.com/file/d/0Bx2OgfTa-XC9UDBnQzZIMTVyN0k/edit?usp=sharing>
Whoever wants to read it, has to tell me the address to add to it, access
is not public.
Wait.

I just found out that I did something very wrong. That xfs_metadump file
looks very wrong; it does not appear to be the correct one.


The info on it says:


Telcontar:/data/storage_d/old_backup # xfs_info tgtfile
meta-data=/dev/sdf2 isize=256 agcount=4, agsize=122341568 blks
= sectsz=512 attr=2, projid32bit=0
= crc=0
data = bsize=4096 blocks=489366272, imaxpct=5
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=0
log =internal bsize=4096 blocks=238948, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0


while the currently mounted home says:


Telcontar:~ # mount | grep home
/dev/sde5 on /home type xfs (rw,noatime,attr2,inode64,noquota)

Telcontar:~ # xfs_info /dev/sde5
meta-data=/dev/sde5 isize=256 agcount=4, agsize=32000000 blks
= sectsz=512 attr=2, projid32bit=1
= crc=0
data = bsize=4096 blocks=128000000, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=0
log =internal bsize=4096 blocks=62500, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Telcontar:~ # mount | grep /home




So, please wait till I verify things again. Tomorrow, it is 3 AM here.
Sorry :-(


Unless "xfs_info tgtfile" gives the information about the device where
"tgtfile" is stored (/dev/sdf2), not on the image file itself :-?


I'm very confused.


--
Cheers,
Carlos E. R.
(from 13.1 x86_64 "Bottle" at Telcontar)
Carlos E. R.
2014-07-12 01:45:07 UTC
Post by Carlos E. R.
So, please wait till I verify things again. Tomorrow, it is 3 AM here.
Sorry :-(
Unless "xfs_info tgtfile" gives the information about the device where
"tgtfile" is stored (/dev/sdf2), not on the image file itself :-?
I'm very confused.
False alarm. See:


Telcontar:/data/storage_c/tmp_borrar # xfs_info tgtfile
meta-data=/dev/sde18 isize=256 agcount=4, agsize=35770496 blks
= sectsz=512 attr=2, projid32bit=0
= crc=0
data = bsize=4096 blocks=143081984, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=0
log =internal bsize=4096 blocks=69864, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Telcontar:/data/storage_c/tmp_borrar #

Telcontar:/data/storage_d/old_backup # xfs_info tgtfile
meta-data=/dev/sdf2 isize=256 agcount=4, agsize=122341568 blks
= sectsz=512 attr=2, projid32bit=0
= crc=0
data = bsize=4096 blocks=489366272, imaxpct=5
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=0
log =internal bsize=4096 blocks=238948, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Telcontar:/data/storage_d/old_backup #

Telcontar:/data/storage_d/old_backup # file tgtfile
tgtfile: XFS filesystem metadump image
Telcontar:/data/storage_d/old_backup #


It appears that the command "xfs_info" analyzes the current, underlying
filesystem, not the one given on the command line. Or something along those
lines, I'm too sleepy. I hope you can understand my meaning better than my
words...


So the uploaded file is the correct one.


--
Cheers,
Carlos E. R.
(from 13.1 x86_64 "Bottle" at Telcontar)
Brian Foster
2014-07-12 14:26:38 UTC
Post by Carlos E. R.
So, please wait till I verify things again. Tomorrow, it is 3 AM here.
Sorry :-(
Unless "xfs_info tgtfile" gives the information about the device where
"tgtfile" is stored (/dev/sdf2), not on the image file itself :-?
I'm very confused.
Telcontar:/data/storage_c/tmp_borrar # xfs_info tgtfile
meta-data=/dev/sde18 isize=256 agcount=4, agsize=35770496 blks
= sectsz=512 attr=2, projid32bit=0
= crc=0
data = bsize=4096 blocks=143081984, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=0
log =internal bsize=4096 blocks=69864, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Telcontar:/data/storage_c/tmp_borrar #
Telcontar:/data/storage_d/old_backup # xfs_info tgtfile
meta-data=/dev/sdf2 isize=256 agcount=4, agsize=122341568 blks
= sectsz=512 attr=2, projid32bit=0
= crc=0
data = bsize=4096 blocks=489366272, imaxpct=5
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=0
log =internal bsize=4096 blocks=238948, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Telcontar:/data/storage_d/old_backup #
Telcontar:/data/storage_d/old_backup # file tgtfile
tgtfile: XFS filesystem metadump image
Telcontar:/data/storage_d/old_backup #
It appears that the command "xfs_info" analyzes the current, underlying
filesystem, not the one given on the command line. Or something along those
lines, I'm too sleepy. I hope you can understand my meaning better than my
words...
xfs_info reports on the mounted fs. If you check out 'man xfs_info',
you'll see it specifies the mountpoint as a parameter, but it can query
the fs info from the actual mountpoint or any file therein. So it
doesn't know anything about a metadump file, and pointing it at one will
just report on the fs that contains the file.

If you wanted to test an actual metadump image, restore the metadump to
an fs image, mount and test that:

xfs_mdrestore ./metadump ./mynewfsimage
mount ./mynewfsimage /mnt -o loop
xfs_info /mnt/

Brian
So the uploaded file is the correct one.
--
Cheers,
Carlos E. R.
(from 13.1 x86_64 "Bottle" at Telcontar)
Brian Foster
2014-07-12 14:19:25 UTC
Post by Brian Foster
If I don't do that backup-format-restore, I get issues soon, and it crashes
I echo Dave's previous question... within a day of doing what? Just
using the system or doing more hibernation cycles?
It is in the long post with the logs I posted.
The first time it crashed, I rebooted, got some errors I probably did not
see, managed to mount the device, and I used the machine normally, doing
several hibernation cycles. On one of these, it crashed, within the day.
That still suggests something could be going on at runtime during the
hibernation or wakeup cycle. Identifying some kind of runtime error or
metadata inconsistency without involving hibernation would be a smoking
gun for a general corruption. So far we have no evidence of reproduction
without hibernation and no evidence of a persistent corruption. That
doesn't rule out something going on on-disk, but it certainly suggests a
runtime corruption during hibernation/wake is more likely.
Post by Brian Foster
<0.1> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.857523] XFS: Internal error XFS_WANT_CORRUPTED_RETURN at line 350 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_all
<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_allo
It was here that I decided to backup-format-restore instead.
Post by Brian Foster
That also means it's probably not necessary to do a full backup,
reformat and restore sequence as part of your routine here. xfs_repair
should scour through all of the allocation metadata and yell if it finds
something like free blocks allocated to a file.
No, if I don't backup-format-restore it happens again within a day. There is
something lingering. Unless that was just chance... :-?
It is true that during that day I hibernated several times more than needed
to see if it happened again - and it did.
This depends on what causes this to happen, not how frequently it happens.
Does it continue to happen along with hibernation, or do you start
seeing these kinds of errors during normal use?
I use the machine for weeks, without event, booting once, then hibernating
at least once per day. I finally reboot when I have to apply some system
update, or something special.
Till one day, this "thing" happens. It happens immediately after coming out
from hibernation, and puts the affected partition, always /home, in read
only mode. When it happens, I reboot, repair partition manually if needed,
then I back up the files, format it, and replace all the files from the
backup just made, with xfsdump. Well, this last time, I used rsync instead.
2014-03-15 03:35:17
2014-03-15 22:20:34
2014-04-17 22:47:08
2014-06-29 12:32:18
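The spacing between the events listed above can be worked out with a few lines of shell. This is a sketch that assumes GNU date (for the -d option); days_between is a name invented here.

```shell
# Sketch: whole days between two calendar dates (YYYY-MM-DD).
# Assumes GNU date; -u keeps daylight saving changes out of the result.
days_between() {
    a=$(date -u -d "$1" +%s) || return 1
    b=$(date -u -d "$2" +%s) || return 1
    echo $(( (b - a) / 86400 ))
}

# Hypothetical usage:
#   days_between 2014-03-15 2014-04-17
#   days_between 2014-04-17 2014-06-29
```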
Post by Brian Foster
If the latter, that could suggest something broken on disk.
That was my first thought, because it started happening after replacing the
hard disk, but also after a kernel update. But I have tested that disk
several times, with smartctl and with the manufacturer test tool, and
nothing came out.
I was referring to a potential on-disk corruption, but that's good to
know as well.
Post by Brian Foster
If the
former, that could simply suggest the fs (perhaps on-disk) has made it
into some kind of state that makes this easier to reproduce, for
whatever reason. It could be timing, location of metadata,
fragmentation, or anything really for that matter, but it doesn't
necessarily mean corruption (even though it doesn't rule it out).
Perhaps the clean regeneration of everything by a from-scratch recovery
simply makes this more difficult to reproduce until the fs naturally
becomes more aged/fragmented, for example.
This probably makes a pristine, pre-repair metadump of the reproducing
fs more interesting. I could try some of my previous tests against a
restore of that metadump.
Well, I suggest that, unless you can find something in the metadata (I just
sent you the link via email from google), we wait till the next event. I
will at that time take an intact metadata snapshot. But this can take a month
or two to happen again, if the pattern holds.
That would be a good idea. I'll take a look at the metadump when I have
a chance. If there is nothing out of the ordinary, the next best option
is to metadump the fs that reproduces the behavior. I could retry some
of my previous vm hibernation tests against that. As mentioned
previously, once you have a more reliably reproducing state, that's also
a good opportunity to see if you can narrow down which of the things you
have running against the fs appear to trigger this.
Post by Brian Foster
I was somewhat thinking out loud originally discussing this topic. I was
suggesting to run this against a restored metadump, not the primary
dataset or a backup.
The metadump creates an image of the metadata of the source fs in a file
(no data is copied). This metadump image can be restored at will via
'xfs_mdrestore.' This allows restoring to a file, mounting the file
loopback, and performing experiments or investigation on the fs
generally as it existed when the shutdown was reproducible.
Ah... I see.
Post by Brian Foster
- xfs_mdrestore <mdimgfile> <tmpfileimg>
- mount <tmpfileimg> /mnt
- rm -rf /mnt/*
... was what I was suggesting. <tmpfileimg> can be recreated from the
metadump image afterwards to get back to square one.
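The three steps above could be wrapped roughly as follows. This is a sketch under stated assumptions, not a tested procedure: the function names and the DRY_RUN switch are invented here, the image is mounted with "-o loop", and "find ... -delete" stands in for the "rm -rf /mnt/*" step so that the mount point is passed explicitly. It needs root and xfsprogs when run for real.

```shell
# Sketch: restore a metadump to a scratch image, mount it loopback and
# delete everything, which exercises the extent-freeing paths.
# run, mdrestore_experiment and DRY_RUN are hypothetical names.
run() {
    # With DRY_RUN=1 only print the command, otherwise execute it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

mdrestore_experiment() {
    mdimg=$1 img=$2 mnt=$3
    # Refuse to continue without an explicit mount point.
    [ -n "$mnt" ] || return 1
    run xfs_mdrestore "$mdimg" "$img" &&
    run mount -o loop "$img" "$mnt" &&
    run find "$mnt" -mindepth 1 -delete &&
    run umount "$mnt"
}

# Hypothetical usage, printing the commands without running them:
#   DRY_RUN=1; mdrestore_experiment home.md home.img /mnt/scratch
```

Running xfs_mdrestore again on the same metadump recreates the image, so the experiment can be repeated from the same starting state.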
I see.
Well, I tried this on a copy of the 'dd' image days ago, and nothing
happened. I guess the procedure above would be the same.
A dd of the raw block device will preserve the metadata, so yeah that's
effectively the same test. If there were an obvious free space
corruption, the fs probably would have shut down. I can retry the same
test via the metadump on a debug kernel as well.

Brian
Post by Brian Foster
I have an active bugzilla account at <http://oss.sgi.com/bugzilla/>, I'm
logged in there now. I haven't checked whether I can create a bug, as I was
not sure what parameters to use (product, component, whom to assign to). I
think that would be the most appropriate place.
Meanwhile, I have uploaded the file to my google drive account, so I can
share it with anybody on request - i.e., it is not public, I need to add a
gmail address to the list of people that can read the file.
Alternatively, I could just email the file to people asking for it, offlist,
but not in a single email, in chunks limited to 1.5 MB per email.
Either of the bugzilla or google drive options works Ok for me.
<https://drive.google.com/file/d/0Bx2OgfTa-XC9UDBnQzZIMTVyN0k/edit?usp=sharing>
Whoever wants to read it, has to tell me the address to add to it, access is
not public.
- -- Cheers,
Carlos E. R.
(from 13.1 x86_64 "Bottle" at Telcontar)
Carlos E. R.
2014-08-11 14:23:01 UTC
Permalink
Happened again, I'm in the middle of recovery procedures, and using my laptop
to post.


The system did not "die", I could still use xterms owned by root. So I
tried to use xfs_metadump before rebooting, but it refused, said that the
partition was mounted (and I know from previous times that umounting fails
or locks the machine). It also said that it could not initialize the XFS
library.

So I logged out, and issued "reboot" on tty1 as root. No go, it got stuck
somewhere, and I had to hit the physical reset button on the machine. I
have not looked at the logs yet.

I am now running the machine off a live usb stick (13.1 XFCE rescue
system) to keep the automatic fsck away from the home partition, and I
already obtained an xfs_metadump of it.

I post this in case you have some suggestion before I nuke the partition
(rsync, reformat, etc). It should take some hours.

- --
Cheers
Carlos E. R.

(from 13.1 x86_64 "Bottle" (Minas Tirith))
Brian Foster
2014-08-11 14:44:00 UTC
Permalink
Happened again, I'm in the middle of recovery procedures, and using my laptop to
post.
The system did not "die", I could still use xterms owned by root. So I tried
to use xfs_metadump before rebooting, but it refused, said that the
partition was mounted (and I know from previous times that umounting fails
or locks the machine). It also said that it could not initialize the XFS
library.
So I logged out, and issued "reboot" on tty1 as root. No go, it got stuck
somewhere, and I had to hit the physical reset button on the machine. I have
not looked at the logs yet.
I am now running the machine off a live usb stick (13.1 XFCE rescue system)
to keep the automatic fsck away from the home partition, and I already
obtained an xfs_metadump of it.
I post this in case you have some suggestion before I nuke the partition
(rsync, reformat, etc). It should take some hours.
Assuming you already have a pre-repair metadump, I'd suggest to
xfs_repair, capture and post the repair output to the list and leave it
at that (for now at least). I think you mentioned previously that the
problem hits more frequently at this point, so I wonder if you could try
to reproduce and get a better idea of what might contribute to the
failure.
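One way to capture the repair output for posting is to run the command through tee, so it is shown on the terminal and saved at the same time. A minimal sketch; capture and the log file name are hypothetical.

```shell
# Sketch: run a command, show its output and keep a copy in a log file.
# stderr is merged into stdout so that error messages are saved too.
# Note: the exit status returned is that of tee, not of the command.
capture() {
    log=$1
    shift
    "$@" 2>&1 | tee "$log"
}

# Hypothetical usage (device name as it appears later in the thread):
#   capture xfs_repair-20140811.log xfs_repair -v /dev/sdd5
```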

For example, can you actively reproduce at this point? Perhaps get some
work going on all of the applications you typically have running and run
some hibernation cycles..? While a reformat might spare you from the
issue for a bit, it's going to make it that much harder to get more
information on what's going on.

Brian
- -- Cheers
Carlos E. R.
(from 13.1 x86_64 "Bottle" (Minas Tirith))
Carlos E. R.
2014-08-11 14:58:47 UTC
Permalink
Post by Brian Foster
Post by Carlos E. R.
I post this in case you have some suggestion before I nuke the partition
(rsync, reformat, etc). It should take some hours.
Assuming you already have a pre-repair metadump, I'd suggest to
xfs_repair, capture and post the repair output to the list and leave it
at that (for now at least). I think you mentioned previously that the
problem hits more frequently at this point, so I wonder if you could try
to reproduce and get a better idea of what might contribute to the
failure.
For example, can you actively reproduce at this point? Perhaps get some
work going on all of the applications you typically have running and run
some hibernation cycles..? While a reformat might spare you from the
issue for a bit, it's going to make it that much harder to get more
information on what's going on.
Ok, will do.

I will create a backup of my partition, with xfsdump, after attempting
repair of the partition, and reboot, and see (without the reformat cycle).

At this instant I'm doing a full dd of the partition, just in case it
becomes useful.

- --
Cheers
Carlos E. R.

(from 13.1 x86_64 "Bottle" (Minas Tirith))
Carlos E. R.
2014-08-11 17:05:00 UTC
Permalink
Post by Carlos E. R.
Ok, will do.
I will create a backup of my partition, with xfsdump, after attempting
repair of the partition, and reboot, and see (without the reformat cycle).
At this instant I'm doing a full dd of the partition, just in case it
becomes useful.
linux:/run/media/linux/d_storage/xfs_disaster_home/20140811 # xfs_repair -V
xfs_repair version 3.1.11

It is a live system, so I can't update it. If I boot from the main
system, which has a more modern xfs_repair, systemd will attempt to mount
the partition and run an automated repair, and we will not get any logs.

linux:/run/media/linux/d_storage/xfs_disaster_home/20140811 # xfs_repair -v /dev/sdd5
Phase 1 - find and verify superblock...
- block cache size set to 753952 entries
Phase 2 - using internal log
- zero log...
zero_log: head block 65662 tail block 65607
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair. If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
linux:/run/media/linux/d_storage/xfs_disaster_home/20140811 #


linux:/run/media/linux/d_storage/xfs_disaster_home/20140811 # mount -v /dev/sdd5 mnt/
mount: /dev/sdd5 mounted on /run/media/linux/d_storage/xfs_disaster_home/20140811/mnt.
linux:/run/media/linux/d_storage/xfs_disaster_home/20140811 # umount mnt
linux:/run/media/linux/d_storage/xfs_disaster_home/20140811 #

dmesg:

[10266.034290] XFS (sdd5): Mounting Filesystem
[10266.073739] XFS (sdd5): Starting recovery (logdev: internal)
[10266.690325] XFS (sdd5): Ending recovery (logdev: internal)
***@linux:~>

dmesg --ctime

[Mon Aug 11 16:47:12 2014] XFS (sdd5): Mounting Filesystem
[Mon Aug 11 16:47:12 2014] XFS (sdd5): Starting recovery (logdev: internal)
[Mon Aug 11 16:47:12 2014] XFS (sdd5): Ending recovery (logdev: internal)
***@linux:~>
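The step shown above (mount once so the kernel replays the log, unmount, then run xfs_repair again) can be sketched as a small function. The names run, replay_log_then_repair and DRY_RUN are made up for illustration; the commands themselves are the ones used in this post, and root is needed to run them for real.

```shell
# Sketch: replay a dirty XFS log by mounting, then repair.
run() {
    # With DRY_RUN=1 only print the command, otherwise execute it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

replay_log_then_repair() {
    dev=$1 mnt=$2
    # Mounting replays the log; the filesystem must be unmounted
    # again before xfs_repair will work on it.
    run mount "$dev" "$mnt" &&
    run umount "$mnt" &&
    run xfs_repair -v "$dev"
}

# Hypothetical usage, printing the commands only:
#   DRY_RUN=1; replay_log_then_repair /dev/sdd5 mnt
```

The chain stops at the first failing step, so a failed mount never leads to a repair attempt on an unreplayed log.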



linux:/run/media/linux/d_storage/xfs_disaster_home/20140811 # xfs_repair -v /dev/sdd5
Phase 1 - find and verify superblock...
- block cache size set to 753952 entries
Phase 2 - using internal log
- zero log...
zero_log: head block 65700 tail block 65700
- scan filesystem freespace and inode maps...
block (1,12608397-12608397) multiply claimed by cnt space tree, state - 2
agf_freeblks 27745492, counted 27745496 in ag 1
sb_fdblocks 115565042, counted 115565046
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 2
- agno = 1
- agno = 3
- agno = 0
Phase 5 - rebuild AG headers and trees...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...

XFS_REPAIR Summary Mon Aug 11 16:48:08 2014

Phase Start End Duration
Phase 1: 08/11 16:47:49 08/11 16:47:49
Phase 2: 08/11 16:47:49 08/11 16:47:52 3 seconds
Phase 3: 08/11 16:47:52 08/11 16:48:07 15 seconds
Phase 4: 08/11 16:48:07 08/11 16:48:07
Phase 5: 08/11 16:48:07 08/11 16:48:07
Phase 6: 08/11 16:48:07 08/11 16:48:07
Phase 7: 08/11 16:48:07 08/11 16:48:07

Total run time: 18 seconds
done
linux:/run/media/linux/d_storage/xfs_disaster_home/20140811 #


I don't understand all it says, but apparently it does not detect any
problem.
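For reading long xfs_repair output, a filter like the one below pulls out the lines where a recorded value and the counted value differ, or where a block is claimed more than once. It is a sketch that only knows the three message shapes seen in the Phase 2 output above; repair_findings is an invented name, and other xfs_repair messages are not covered.

```shell
# Sketch: summarise findings in captured xfs_repair output
# (reads the given files, or stdin if none are given).
repair_findings() {
    awk '
        /multiply claimed/ { print "DUP: " $0 }
        /^[ \t]*(agf_freeblks|sb_fdblocks) [0-9]+, counted [0-9]+/ {
            rec = $2
            sub(/,/, "", rec)            # strip the trailing comma
            print "COUNT: " $1 " off by " ($4 - rec)
        }
    ' "$@"
}

# Hypothetical usage: repair_findings xfs_repair-20140811.log
```

Fed the Phase 2 lines from the run above, it reports the multiply claimed block and a difference of 4 blocks for both agf_freeblks and sb_fdblocks.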

dmesg doesn't have any more entries.


Now I'm going to create an xfsdump of it, and reboot, without rebuilding.
Then I'll upload the metadata to google drive.

- --
Cheers
Carlos E. R.

(from 13.1 x86_64 "Bottle" (Minas Tirith))
Carlos E. R.
2014-08-11 21:31:44 UTC
Permalink
Post by Carlos E. R.
linux:/run/media/linux/d_storage/xfs_disaster_home/20140811 # xfs_repair -V
xfs_repair version 3.1.11
It is a live system, so I can't update it. If I boot from the main
...
Post by Carlos E. R.
Now I'm going to create an xfsdump of it, and reboot, without rebuilding.
Then I'll upload the metadata to google drive.
I have just booted the main system, in text mode, logged in as root. Look:

Telcontar:/data/storage_d/xfs_disaster_home/20140811 # time xfs_metadump -g -w /dev/sdc5 tgtfile_20140811_obfus_after_repair_bis
Copied 231552 of 231552 inodes (3 of 4 AGs)
xfs_metadump: invalid dqblk inode number (-1)
Copying log

real 0m20.044s
user 0m1.527s
sys 0m1.174s
Telcontar:/data/storage_d/xfs_disaster_home/20140811 #
Telcontar:/data/storage_d/xfs_disaster_home/20140811 # xfs_metadump -V
xfs_metadump version 3.2.1


And that was after running xfs_repair 3.2.1, which found nothing...


Does that give any ideas?


- --
Cheers
Carlos E. R.

(from 13.1 x86_64 "Bottle" (Minas Tirith))
Carlos E. R.
2014-08-11 22:01:16 UTC
Permalink
Post by Carlos E. R.
Does that give any ideas?
Which version of Linux?
Telcontar:~ # cat /etc/os-release
NAME=openSUSE
VERSION="13.1 (Bottle)"
VERSION_ID="13.1"
PRETTY_NAME="openSUSE 13.1 (Bottle) (x86_64)"
ID=opensuse
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:opensuse:opensuse:13.1"
BUG_REPORT_URL="https://bugs.opensuse.org"
HOME_URL="https://opensuse.org/"
ID_LIKE="suse"
Telcontar:~ #
Telcontar:~ # uname -a
Linux Telcontar 3.11.10-17-desktop #1 SMP PREEMPT Mon Jun 16 15:28:13 UTC 2014 (fba7c1f) x86_64 x86_64 x86_64 GNU/Linux
Telcontar:~ # rpm -q xfsprogs
xfsprogs-3.2.1-40.1.x86_64
Did you get a metadata dump before the xfs_repair?
Yes, sure. I said so in another post. I was in the process of starting up
the machine when I noticed this error:

xfs_metadump: invalid dqblk inode number (-1)


As this is the first time I have seen that error, I'm wondering whether to
go ahead with mounting and using the partition, as explained in other posts
today, or to wait for different instructions from you people.

I'll try meanwhile to upload the metadata files using another machine.

- --
Cheers
Carlos E. R.

(from 13.1 x86_64 "Bottle" (Minas Tirith))
Mark Tinguely
2014-08-11 14:57:58 UTC
Permalink
Happened again, I'm in the middle of recovery procedures, and using my
laptop to post.
The system did not "die", I could still use xterms owned by root. So I
tried to use xfs_metadump before rebooting, but it refused, said that
the partition was mounted (and I know from previous times that umounting
fails or locks the machine). It also said that it could not initialize
the XFS library.
So I logged out, and issued "reboot" on tty1 as root. No go, it got
stuck somewhere, and I had to hit the physical reset button on the
machine. I have not looked at the logs yet.
I am now running the machine off a live usb stick (13.1 XFCE rescue
system) to keep the automatic fsck away from the home partition, and I
already obtained an xfs_metadump of it.
I post this in case you have some suggestion before I nuke the partition
(rsync, reformat, etc). It should take some hours.
- -- Cheers
Carlos E. R.
Where in the filesystem did the XFS_WANT_CORRUPTED_GOTO happen?

I am interested in the metadata dump.

Also, someone hit back-to-back duplicate block allocation
XFS_WANT_CORRUPTED_GOTO bugs, so you may want to do a metadata dump before
and after the xfs_repair in case you hit it again soon.

If this is a duplicate block allocation, some user blocks will have
overwritten the metadata.
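Taking a metadata dump before and after the repair could be scripted along these lines. This is a sketch only: the function names, the DRY_RUN switch and the output file naming are assumptions, the -g and -w options are the ones used elsewhere in this thread, and the filesystem has to be unmounted.

```shell
# Sketch: metadump, repair, metadump again, keeping both images.
run() {
    # With DRY_RUN=1 only print the command, otherwise execute it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

metadump_around_repair() {
    dev=$1 prefix=$2
    run xfs_metadump -g -w "$dev" "$prefix.pre-repair.md" &&
    run xfs_repair -v "$dev" &&
    run xfs_metadump -g -w "$dev" "$prefix.post-repair.md"
}

# Hypothetical usage, printing the commands only:
#   DRY_RUN=1; metadump_around_repair /dev/sdd5 home_20140811
```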

--Mark.
Carlos E. R.
2014-08-11 15:34:46 UTC
Permalink
Post by Mark Tinguely
Where in the filesystem did the XFS_WANT_CORRUPTED_GOTO happen?
This time?
Did not look at the log yet. Let me see...

Here is the full log of the event. It starts prior to hibernating, all
things nominal. And ends on shutdown (had to hit reset button, despite
what log says). If you want to see entries prior to that, since boot, I
can do that.


<3.6> 2014-08-11 05:15:01 Telcontar systemd 1 - - Starting Session 556 of user news.
<3.6> 2014-08-11 05:18:01 Telcontar systemd 1 - - Starting Session 557 of user news.
<3.6> 2014-08-11 05:20:01 Telcontar systemd 1 - - Starting Session 558 of user cer.
<3.4> 2014-08-11 05:22:25 Telcontar pm-utils - - - Hibernating the system now (04)...
<3.5> 2014-08-11 05:22:25 Telcontar pm-utils - - - There appears not be any pending nntp post to be sent. I just checked :-)
<1.5> 2014-08-11 05:22:25 Telcontar network 5840 - - redirecting to "systemctl --signal=9 kill network.service"
<3.5> 2014-08-11 05:22:25 Telcontar systemd 1 - - ***@eth0.service: main process exited, code=killed, status=9/KILL
<3.6> 2014-08-11 05:22:25 Telcontar systemd 1 - - Stopping LSB: Network time protocol daemon (ntpd)...
<3.6> 2014-08-11 05:22:25 Telcontar ntp 5867 - - Shutting down network time protocol daemon (NTPD)..done
<3.6> 2014-08-11 05:22:25 Telcontar systemd 1 - - Stopped LSB: Network time protocol daemon (ntpd).
<3.4> 2014-08-11 05:22:25 Telcontar pm-utils - - - Hibernating (95)...
<0.7> 2014-08-11 05:22:30 Telcontar kernel - - - [73220.857511] PM: Marking nosave pages: [mem 0x0009f000-0x000fffff]
<0.7> 2014-08-11 05:22:30 Telcontar kernel - - - [73220.857516] PM: Marking nosave pages: [mem 0xbff90000-0xffffffff]
<0.7> 2014-08-11 05:22:30 Telcontar kernel - - - [73220.858132] PM: Basic memory bitmaps created
<0.4> 2014-08-11 15:17:18 Telcontar kernel - - - [73221.946553] Syncing filesystems ... done.
<0.4> 2014-08-11 15:17:18 Telcontar kernel - - - [73222.682396] Freezing user space processes ... (elapsed 0.002 seconds) done.
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73222.685031] PM: Preallocating image memory... done (allocated 1140745 pages)
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.046524] PM: Allocated 4562980 kbytes in 5.36 seconds (851.30 MB/s)
<0.4> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.046645] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
<0.4> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.048553] Suspending console(s) (use no_console_suspend to debug)
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.049663] serial 00:05: disabled
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.260091] PM: freeze of devices complete after 211.420 msecs
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.260391] PM: late freeze of devices complete after 0.298 msecs
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.260939] PM: noirq freeze of devices complete after 0.545 msecs
<0.4> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.260940] Disabling non-boot CPUs ...
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.262294] smpboot: CPU 1 is now offline
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.264134] smpboot: CPU 2 is now offline
<0.5> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.265056] Broke affinity for irq 16
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.266103] smpboot: CPU 3 is now offline
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.266614] PM: Creating hibernation image:
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.267097] PM: Need to copy 920142 pages
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.267097] PM: Normal pages needed: 920142 + 1024, available pages: 1176633
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.267097] microcode: CPU0 sig=0x1067a, pf=0x10, revision=0xa0b
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.267097] Enabling non-boot CPUs ...
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.267097] smpboot: Booting Node 0 Processor 1 APIC 0x1
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.280111] microcode: CPU1 sig=0x1067a, pf=0x10, revision=0xa0b
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.280300] CPU1 is up
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.280425] smpboot: Booting Node 0 Processor 2 APIC 0x2
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.293688] microcode: CPU2 sig=0x1067a, pf=0x10, revision=0xa0b
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.293828] CPU2 is up
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.293918] smpboot: Booting Node 0 Processor 3 APIC 0x3
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.307216] microcode: CPU3 sig=0x1067a, pf=0x10, revision=0xa0b
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.307358] CPU3 is up
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.335219] PM: noirq restore of devices complete after 22.779 msecs
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.335354] PM: early restore of devices complete after 0.110 msecs
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.508789] uhci_hcd 0000:00:1a.0: setting latency timer to 64
<0.4> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.508809] usb usb3: root hub lost power or was reset
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.508819] uhci_hcd 0000:00:1a.1: setting latency timer to 64
<0.4> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.508836] usb usb4: root hub lost power or was reset
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.508844] uhci_hcd 0000:00:1a.2: setting latency timer to 64
<0.4> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.508861] usb usb5: root hub lost power or was reset
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.508871] ehci-pci 0000:00:1a.7: setting latency timer to 64
<0.4> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.508889] usb usb1: root hub lost power or was reset
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.510138] uhci_hcd 0000:00:1d.0: setting latency timer to 64
<0.4> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.510159] usb usb6: root hub lost power or was reset
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.510168] uhci_hcd 0000:00:1d.1: setting latency timer to 64
<0.4> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.510187] usb usb7: root hub lost power or was reset
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.510196] uhci_hcd 0000:00:1d.2: setting latency timer to 64
<0.4> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.510215] usb usb8: root hub lost power or was reset
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.510225] ehci-pci 0000:00:1d.7: setting latency timer to 64
<0.4> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.510235] usb usb2: root hub lost power or was reset
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.512778] ehci-pci 0000:00:1a.7: cache line size of 32 is not supported
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.512784] pci 0000:00:1e.0: setting latency timer to 64
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.512879] ata_piix 0000:00:1f.2: setting latency timer to 64
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.512934] ata_piix 0000:00:1f.5: setting latency timer to 64
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.514123] ehci-pci 0000:00:1d.7: cache line size of 32 is not supported
<0.3> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.611029] pciehp 0000:00:1c.3:pcie04: Device 0000:05:00.0 already exists at 0000:05:00, cannot hot-add
<0.3> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.611032] pciehp 0000:00:1c.0:pcie04: Device 0000:02:00.0 already exists at 0000:02:00, cannot hot-add
<0.3> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.611035] pciehp 0000:00:1c.2:pcie04: Device 0000:04:00.0 already exists at 0000:04:00, cannot hot-add
<0.3> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.611036] pciehp 0000:00:1c.3:pcie04: Cannot add device at 0000:05:00
<0.3> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.611037] pciehp 0000:00:1c.0:pcie04: Cannot add device at 0000:02:00
<0.3> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.611039] pciehp 0000:00:1c.2:pcie04: Cannot add device at 0000:04:00
<0.3> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.611061] pciehp 0000:00:1c.4:pcie04: Device 0000:06:00.0 already exists at 0000:06:00, cannot hot-add
<0.3> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.611062] pciehp 0000:00:1c.4:pcie04: Cannot add device at 0000:06:00
<0.3> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.611086] pciehp 0000:00:1c.5:pcie04: Device 0000:07:00.0 already exists at 0000:07:00, cannot hot-add
<0.3> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.611087] pciehp 0000:00:1c.5:pcie04: Cannot add device at 0000:07:00
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.611172] pata_jmicron 0000:05:00.1: setting latency timer to 64
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.611249] pata_jmicron 0000:04:00.1: setting latency timer to 64
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.614064] serial 00:05: activated
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.775267] r8169 0000:06:00.0 eth0: link down
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.837013] ata11: SATA link down (SStatus 0 SControl 300)
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.837220] r8169 0000:07:00.0 eth1: link down
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.916030] ata2: SATA link down (SStatus 0 SControl 300)
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.916069] ata1: SATA link down (SStatus 0 SControl 300)
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.920031] ata4: SATA link down (SStatus 0 SControl 300)
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.930018] usb 1-2: reset high-speed USB device number 3 using ehci-pci
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.988036] ata12: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.991149] ata12.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES) filtered out
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.991151] ata12.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) filtered out
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.991152] ata12.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73228.997133] ata12.00: configured for UDMA/100
<0.5> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.069020] firewire_core 0000:08:02.0: rediscovered device fw0
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.074017] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.076182] ata3.00: configured for UDMA/133
<0.5> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.076210] sd 2:0:0:0: [sda] Starting disk
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.146014] usb 2-5: reset high-speed USB device number 2 using ehci-pci
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.284050] ata9.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.284060] ata9.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.284167] ata10.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.284177] ata10.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.287190] ata9.01: ACPI cmd ef/03:45:00:00:00:b0 (SET FEATURES) filtered out
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.287191] ata9.01: ACPI cmd ef/03:0c:00:00:00:b0 (SET FEATURES) filtered out
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.287241] ata10.01: ACPI cmd ef/03:45:00:00:00:b0 (SET FEATURES) filtered out
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.287242] ata10.01: ACPI cmd ef/03:0c:00:00:00:b0 (SET FEATURES) filtered out
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.287362] ata9.01: ACPI cmd c6/00:10:00:00:00:b0 (SET MULTIPLE MODE) succeeded
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.287364] ata9.01: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.287457] ata10.01: ACPI cmd c6/00:10:00:00:00:b0 (SET MULTIPLE MODE) succeeded
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.287459] ata10.01: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.293185] ata10.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES) filtered out
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.293186] ata10.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) filtered out
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.293236] ata9.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES) filtered out
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.293237] ata9.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) filtered out
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.293378] ata10.00: ACPI cmd c6/00:10:00:00:00:a0 (SET MULTIPLE MODE) succeeded
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.293379] ata10.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.293443] ata9.00: ACPI cmd c6/00:10:00:00:00:a0 (SET MULTIPLE MODE) succeeded
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.293445] ata9.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.302319] ata9.00: configured for UDMA/133
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.308304] ata9.01: configured for UDMA/133
<0.5> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.308337] sd 8:0:0:0: [sdb] Starting disk
<0.5> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.308338] sd 8:0:1:0: [sdc] Starting disk
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.318322] ata10.00: configured for UDMA/133
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.324321] ata10.01: configured for UDMA/133
<0.5> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.324351] sd 9:0:1:0: [sde] Starting disk
<0.5> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.324352] sd 9:0:0:0: [sdd] Starting disk
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73229.512018] usb 3-1: reset low-speed USB device number 2 using uhci_hcd
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73230.057013] usb 8-2: reset low-speed USB device number 2 using uhci_hcd
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73230.408094] usb 2-5.4: reset high-speed USB device number 4 using ehci-pci
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73230.798419] r8169 0000:06:00.0 eth0: link up
<0.6> 2014-08-11 15:17:18 Telcontar kernel - - - [73231.245103] PM: restore of devices complete after 2736.365 msecs
<0.4> 2014-08-11 15:17:18 Telcontar kernel - - - [73231.514298] Restarting kernel threads ... done.
<0.4> 2014-08-11 15:17:18 Telcontar kernel - - - [73231.518736] Restarting tasks ... done.
<0.7> 2014-08-11 15:17:18 Telcontar kernel - - - [73231.562307] PM: Basic memory bitmaps freed
<3.4> 2014-08-11 15:17:19 Telcontar rtkit-daemon 4535 - - The canary thread is apparently starving. Taking action.
<3.6> 2014-08-11 15:17:19 Telcontar rtkit-daemon 4535 - - Demoting known real-time threads.
<3.5> 2014-08-11 15:17:19 Telcontar rtkit-daemon 4535 - - Successfully demoted thread 4541 of process 4534 (/usr/bin/pulseaudio).
<3.5> 2014-08-11 15:17:19 Telcontar rtkit-daemon 4535 - - Successfully demoted thread 4540 of process 4534 (/usr/bin/pulseaudio).
<3.5> 2014-08-11 15:17:19 Telcontar rtkit-daemon 4535 - - Successfully demoted thread 4534 of process 4534 (/usr/bin/pulseaudio).
<3.5> 2014-08-11 15:17:19 Telcontar rtkit-daemon 4535 - - Demoted 3 threads.
<0.1> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.439809] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.1
<0.1> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.439809].
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440155] CPU: 0 PID: 6255 Comm: kworker/0:7 Tainted: P O 3.11.10-17-desktop #1
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440322] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7516/MS-7516, BIOS V1.5 10/10/2008
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440361] Workqueue: xfs-eofblocks/sdd5 xfs_eofblocks_worker [xfs]
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440364] 0000000000000001 ffffffff815a0402 000000000010c9d3 ffffffffa0c38996
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440365] ffff880211412b00 ffff88023448dd80 ffff88023fb95cb0 0000000000000001
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440366] 0000000000000000 0000000100000000 0000000000000000 0000000000000001
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440367] Call Trace:
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440377] [<ffffffff81004a28>] dump_trace+0x88/0x310
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440380] [<ffffffff81004d80>] show_stack_log_lvl+0xd0/0x1d0
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440382] [<ffffffff810061bc>] show_stack+0x1c/0x50
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440385] [<ffffffff815a0402>] dump_stack+0x50/0x89
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440399] [<ffffffffa0c38996>] xfs_free_ag_extent+0x226/0x860 [xfs]
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440442] [<ffffffffa0c39fe9>] xfs_free_extent+0xb9/0xf0 [xfs]
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440484] [<ffffffffa0c4c39e>] xfs_bmap_finish+0x11e/0x170 [xfs]
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440534] [<ffffffffa0c6b4c0>] xfs_itruncate_extents+0x190/0x340 [xfs]
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440597] [<ffffffffa0c33633>] xfs_free_eofblocks+0x1e3/0x260 [xfs]
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440633] [<ffffffffa0c291ef>] xfs_inode_free_eofblocks+0x6f/0x150 [xfs]
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440662] [<ffffffffa0c27f82>] xfs_inode_ag_walk.isra.10+0x1c2/0x310 [xfs]
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440690] [<ffffffffa0c28a8e>] xfs_inode_ag_iterator_tag+0x6e/0xb0 [xfs]
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440718] [<ffffffffa0c28d82>] xfs_eofblocks_worker+0x12/0x20 [xfs]
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440737] [<ffffffff8106ac78>] process_one_work+0x168/0x490
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440739] [<ffffffff8106b914>] worker_thread+0x114/0x3a0
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440742] [<ffffffff81071c3f>] kthread+0xaf/0xc0
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440746] [<ffffffff815adfbc>] ret_from_fork+0x7c/0xb0
<0.5> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440751] XFS (sdd5): xfs_do_force_shutdown(0x8) called from line 916 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-
<0.1> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.498979] XFS (sdd5): Corruption of in-memory data detected. Shutting down filesystem
<0.1> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.499136] XFS (sdd5): Please umount the filesystem and rectify the problem(s)
<3.6> 2014-08-11 15:17:22 Telcontar systemd 1 - - Time has been changed
<3.6> 2014-08-11 15:17:27 Telcontar acpid - - - 1 client rule loaded
<3.4> 2014-08-11 15:17:29 Telcontar pm-utils - - - Thawing (95)...
<3.5> 2014-08-11 15:17:30 Telcontar dbus 1020 - - [system] Activating service name='org.freedesktop.PackageKit' (using servicehelper)
<3.6> 2014-08-11 15:17:30 Telcontar systemd 1 - - Starting LSB: Network time protocol daemon (ntpd)...
<0.4> 2014-08-11 15:17:30 Telcontar kernel - - - [73244.256012] XFS (sdd5): xfs_log_force: error 5 returned.
<3.5> 2014-08-11 15:17:31 Telcontar dbus 1020 - - [system] Activated service 'org.freedesktop.PackageKit' failed: Cannot launch daemon, file not found or permissions invalid
<1.5> 2014-08-11 15:17:31 Telcontar network 6315 - - redirecting to "systemctl restart network.service"
<3.6> 2014-08-11 15:17:32 Telcontar systemd 1 - - Stopping ifup managed network interface eth1...
<3.6> 2014-08-11 15:17:32 Telcontar systemd 1 - - Stopping ifup managed network interface eth0...
<3.6> 2014-08-11 15:17:32 Telcontar systemd 1 - - Stopping LSB: Configure network interfaces and set up routing...
<3.6> 2014-08-11 15:17:32 Telcontar systemd 1 - - Starting LSB: Configure network interfaces and set up routing...
<3.6> 2014-08-11 15:17:32 Telcontar ifdown 6352 - - touch: cannot touch ‘/dev/.sysconfig/network/tmp/if-eth0.6352’: No such file or directory
<3.6> 2014-08-11 15:17:32 Telcontar ifdown 6352 - - scripts/functions: line 1221: /dev/.sysconfig/network/tmp/if-eth0.6352.tmp: No such file or directory
<3.6> 2014-08-11 15:17:32 Telcontar ifdown 6352 - - scripts/functions: line 1239: /dev/.sysconfig/network/tmp/if-eth0.6352.tmp: No such file or directory
<3.6> 2014-08-11 15:17:32 Telcontar ifdown 6352 - - cat: /dev/.sysconfig/network/tmp/if-eth0.6352: No such file or directory
<3.6> 2014-08-11 15:17:34 Telcontar ntp 6314 - - 11 Aug 15:17:34 sntp[6505]: Started sntp
<3.6> 2014-08-11 15:17:34 Telcontar ifdown 6352 - - eth0 device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<1.5> 2014-08-11 15:17:34 Telcontar ifdown 6352 - - eth0 device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<3.6> 2014-08-11 15:17:34 Telcontar ifdown 6351 - - eth1 device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<1.5> 2014-08-11 15:17:34 Telcontar ifdown 6351 - - eth1 device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<3.6> 2014-08-11 15:17:34 Telcontar network 6384 - - Setting up network interfaces:
<3.6> 2014-08-11 15:17:34 Telcontar network 6384 - - lo
<1.5> 2014-08-11 15:17:34 Telcontar ifup 6924 - - lo
<1.5> 2014-08-11 15:17:35 Telcontar ifup 6924 - - lo
<1.5> 2014-08-11 15:17:35 Telcontar ifup 6924 - - IP address: 127.0.0.1/8
<3.6> 2014-08-11 15:17:35 Telcontar network 6384 - - lo IP address: 127.0.0.1/8
<1.5> 2014-08-11 15:17:35 Telcontar ifup 6924 - -.
<16.3> 2014-08-11 15:17:38 Telcontar dhcpcd 7162 - - eth1: dhcpcd not running
<16.6> 2014-08-11 15:17:38 Telcontar dhcpcd 7162 - - eth1: exiting
<3.5> 2014-08-11 15:17:38 Telcontar systemd 1 - - Unit ***@eth0.service entered failed state.
<3.6> 2014-08-11 15:17:38 Telcontar systemd 1 - - Starting ifup managed network interface eth0...
<3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - - Interface eth0.IPv6 no longer relevant for mDNS.
<3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - - Leaving mDNS multicast group on interface eth0.IPv6 with address fc00::14.
<3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - - Interface eth0.IPv4 no longer relevant for mDNS.
<3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - - Leaving mDNS multicast group on interface eth0.IPv4 with address 192.168.1.14.
<3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - - Withdrawing address record for fc00::14 on eth0.
<3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - - Withdrawing address record for 192.168.1.14 on eth0.
<3.6> 2014-08-11 15:17:38 Telcontar ifup 7226 - - eth0 device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<1.5> 2014-08-11 15:17:38 Telcontar ifup 7226 - - eth0 device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<0.6> 2014-08-11 15:17:38 Telcontar kernel - - - [73251.792336] r8169 0000:06:00.0 eth0: link down
<0.6> 2014-08-11 15:17:38 Telcontar kernel - - - [73251.792353] r8169 0000:06:00.0 eth0: link down
<0.6> 2014-08-11 15:17:38 Telcontar kernel - - - [73251.792366] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
<3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - - Joining mDNS multicast group on interface eth0.IPv4 with address 192.168.1.14.
<3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - - New relevant interface eth0.IPv4 for mDNS.
<3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - - Registering new address record for 192.168.1.14 on eth0.IPv4.
<3.6> 2014-08-11 15:17:39 Telcontar systemd 1 - - Starting ifup managed network interface eth1...
<3.6> 2014-08-11 15:17:39 Telcontar ifplugd(eth1) 7541 - - ifplugd 0.28 initializing.
<0.6> 2014-08-11 15:17:39 Telcontar kernel - - - [73252.646313] r8169 0000:07:00.0 eth1: link down
<0.6> 2014-08-11 15:17:39 Telcontar kernel - - - [73252.646341] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
<3.6> 2014-08-11 15:17:39 Telcontar ifplugd(eth1) 7541 - - Using interface eth1/00:21:85:16:2D:0C with driver <r8169> (version: 2.3LK-NAPI)
<3.6> 2014-08-11 15:17:39 Telcontar ifplugd(eth1) 7541 - - Using detection mode: SIOCETHTOOL
<3.6> 2014-08-11 15:17:39 Telcontar ifplugd(eth1) 7541 - - Initialization complete, link beat not detected.
<1.5> 2014-08-11 15:17:39 Telcontar ifup 7521 - - eth1 device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<1.5> 2014-08-11 15:17:39 Telcontar ifup 7521 - - eth1 is controlled by ifplugd
<3.6> 2014-08-11 15:17:39 Telcontar ifup 7521 - - eth1 device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<3.6> 2014-08-11 15:17:39 Telcontar ifup 7521 - - eth1 is controlled by ifplugd
<3.6> 2014-08-11 15:17:39 Telcontar systemd 1 - - Started ifup managed network interface eth1.
<0.6> 2014-08-11 15:17:40 Telcontar kernel - - - [73253.958299] r8169 0000:06:00.0 eth0: link up
<0.6> 2014-08-11 15:17:40 Telcontar kernel - - - [73253.958306] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
<3.6> 2014-08-11 15:17:41 Telcontar avahi-daemon 1007 - - Joining mDNS multicast group on interface eth0.IPv6 with address fe80::221:85ff:fe16:2d0b.
<3.6> 2014-08-11 15:17:41 Telcontar avahi-daemon 1007 - - New relevant interface eth0.IPv6 for mDNS.
<3.6> 2014-08-11 15:17:41 Telcontar avahi-daemon 1007 - - Registering new address record for fe80::221:85ff:fe16:2d0b on eth0.*.
<3.6> 2014-08-11 15:17:42 Telcontar avahi-daemon 1007 - - Leaving mDNS multicast group on interface eth0.IPv6 with address fe80::221:85ff:fe16:2d0b.
<3.6> 2014-08-11 15:17:42 Telcontar avahi-daemon 1007 - - Joining mDNS multicast group on interface eth0.IPv6 with address fc00::14.
<3.6> 2014-08-11 15:17:42 Telcontar avahi-daemon 1007 - - Registering new address record for fc00::14 on eth0.*.
<3.6> 2014-08-11 15:17:42 Telcontar avahi-daemon 1007 - - Withdrawing address record for fe80::221:85ff:fe16:2d0b on eth0.
<3.6> 2014-08-11 15:17:44 Telcontar ntp 6314 - - 11 Aug 15:17:44 sntp[6505]: Received no useable packet from 192.168.1.15!
<3.6> 2014-08-11 15:17:44 Telcontar ntp 6314 - - 11 Aug 15:17:44 sntp[7926]: Started sntp
<3.6> 2014-08-11 15:17:44 Telcontar systemd 1 - - Time has been changed
<3.6> 2014-08-11 15:17:44 Telcontar ntp 6314 - - 2014-08-11 15:17:44.656291 (-0100) -0.112718 +/- 0.037338 secs
<3.6> 2014-08-11 15:17:44 Telcontar ntp 6314 - - 2014-08-11 15:17:44.604369 (-0100) +0.0081 +/- 0.069473 secs
<3.6> 2014-08-11 15:17:44 Telcontar ntp 6314 - - Time synchronized with 0.pool.ntp.org
<4.6> 2014-08-11 15:17:45 Telcontar SuSEfirewall2 - - - Setting up rules from /etc/sysconfig/SuSEfirewall2 ...
<4.6> 2014-08-11 15:17:45 Telcontar SuSEfirewall2 - - - using default zone 'ext' for interface eth1
<4.6> 2014-08-11 15:17:45 Telcontar SuSEfirewall2 - - - Firewall customary rules loaded from /etc/sysconfig/scripts/SuSEfirewall2-custom
<3.5> 2014-08-11 15:17:45 Telcontar ntpd 7991 - - ntpd ***@1.2349-o Tue Jul 22 08:26:41 UTC 2014 (1)
<3.6> 2014-08-11 15:17:45 Telcontar ntp 6314 - - Starting network time protocol daemon (NTPD)..done
<3.6> 2014-08-11 15:17:44 Telcontar systemd 1 - - Time has been changed
<3.6> 2014-08-11 15:17:45 Telcontar systemd 1 - - Started LSB: Network time protocol daemon (ntpd).
<3.5> 2014-08-11 15:17:45 Telcontar ntpd 8017 - - proto: precision = 1.613 usec
<3.7> 2014-08-11 15:17:46 Telcontar ntpd 8017 - - ntp_io: estimated max descriptors: 1024, initial socket boundary: 16
<3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - - Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
<3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - - Listen and drop on 1 v6wildcard :: UDP 123
<3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - - Listen normally on 2 lo 127.0.0.1 UDP 123
<3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - - Listen normally on 3 eth0 192.168.1.14 UDP 123
<3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - - Listen normally on 4 lo ::1 UDP 123
<3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - - Listen normally on 5 eth0 fe80::221:85ff:fe16:2d0b UDP 123
<3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - - Listen normally on 6 eth0 fc00::14 UDP 123
<3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - - peers refreshed
<3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - - Listening on routing socket on fd #23 for interface updates
<3.5> 2014-08-11 15:17:46 Telcontar ntpd 8017 - - logging to file /var/log/ntp
<4.6> 2014-08-11 15:17:48 Telcontar SuSEfirewall2 - - - Firewall rules successfully set
<3.6> 2014-08-11 15:17:48 Telcontar avahi-autoipd(eth0) 8434 - - Found user 'avahi-autoipd' (UID 495) and group 'avahi-autoipd' (GID 491).
<3.6> 2014-08-11 15:17:48 Telcontar avahi-autoipd(eth0) 8434 - - Successfully called chroot().
<3.6> 2014-08-11 15:17:48 Telcontar avahi-autoipd(eth0) 8434 - - Successfully dropped root privileges.
<3.6> 2014-08-11 15:17:48 Telcontar avahi-autoipd(eth0) 8434 - - Starting with address 169.254.3.89
<3.6> 2014-08-11 15:17:48 Telcontar avahi-autoipd(eth0) 8434 - - Routable address already assigned, sleeping.
<3.6> 2014-08-11 15:17:50 Telcontar systemd 1 - - Started ifup managed network interface eth0.
<3.6> 2014-08-11 15:17:50 Telcontar systemd 1 - - Started ifup managed network interface eth1.
<3.6> 2014-08-11 15:17:50 Telcontar network 6384 - - ..done..done..done ppp0 Startmode is 'manual' -> skipping
<1.5> 2014-08-11 15:17:50 Telcontar ifup 8500 - - ppp0 Startmode is 'manual' -> skipping
<3.6> 2014-08-11 15:17:50 Telcontar network 6384 - - ..skippedSetting up service network . . . . . . . . . . . . ...done
<3.6> 2014-08-11 15:17:50 Telcontar systemd 1 - - Started LSB: Configure network interfaces and set up routing.
<3.4> 2014-08-11 15:17:52 Telcontar pm-utils - - - Thawing the system now (04)...
<0.6> 2014-08-11 15:17:55 Telcontar kernel - - - [73268.481672] Chrome_ChildThr[5680]: segfault at 0 ip 00007ffcedf71598 sp 00007ffce1821410 error 6 in libmozalloc.so[7ffcedf7
<0.4> 2014-08-11 15:18:00 Telcontar kernel - - - [73274.336014] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:18:01 Telcontar systemd 1 - - Starting Session 559 of user news.
<3.4> 2014-08-11 15:18:16 Telcontar router - - - (Thawing 04) Logging the current IP= 79.159.63.177
<0.4> 2014-08-11 15:18:31 Telcontar kernel - - - [73304.416012] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:19:01 Telcontar kernel - - - [73334.496014] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:19:31 Telcontar kernel - - - [73364.576016] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:20:01 Telcontar kernel - - - [73394.656015] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:20:01 Telcontar systemd 1 - - Starting Session 560 of user cer.
<0.4> 2014-08-11 15:20:31 Telcontar kernel - - - [73424.736049] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:21:01 Telcontar kernel - - - [73454.816016] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:21:31 Telcontar kernel - - - [73484.896015] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:22:01 Telcontar kernel - - - [73514.976016] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:22:31 Telcontar kernel - - - [73545.056018] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:23:01 Telcontar systemd 1 - - Starting Session 561 of user news.
<0.4> 2014-08-11 15:23:01 Telcontar kernel - - - [73575.136025] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:23:31 Telcontar kernel - - - [73605.216014] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:23:52 Telcontar smartd 1013 - - Device: /dev/sdb [SAT], Temperature changed -5 Celsius to 33 Celsius (Min/Max 19/38)
<0.4> 2014-08-11 15:24:01 Telcontar kernel - - - [73635.296078] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:24:32 Telcontar kernel - - - [73665.376020] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:25:01 Telcontar systemd 1 - - Starting Session 562 of user news.
<0.4> 2014-08-11 15:25:02 Telcontar kernel - - - [73695.456011] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:25:32 Telcontar kernel - - - [73725.536015] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:26:02 Telcontar kernel - - - [73755.616017] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:26:32 Telcontar kernel - - - [73785.696017] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:27:02 Telcontar kernel - - - [73815.776016] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:27:32 Telcontar kernel - - - [73845.856021] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:28:01 Telcontar systemd 1 - - Starting Session 563 of user news.
<0.4> 2014-08-11 15:28:02 Telcontar kernel - - - [73875.936014] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:28:32 Telcontar kernel - - - [73906.016015] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:29:02 Telcontar kernel - - - [73936.096017] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:29:32 Telcontar kernel - - - [73966.176012] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:30:01 Telcontar systemd 1 - - Starting Session 564 of user root.
<3.6> 2014-08-11 15:30:01 Telcontar systemd 1 - - Starting Session 565 of user cer.
<1.6> 2014-08-11 15:30:01 Telcontar run-crons 8974 - - suse.de-snapper: OK
<4.5> 2014-08-11 15:30:01 Telcontar su - - - (to root) root on (null)
<10.3> 2014-08-11 15:30:01 Telcontar su - - - pam_systemd(su-l:session): pam_putenv: delete non-existent entry; XDG_RUNTIME_DIR
<0.4> 2014-08-11 15:30:02 Telcontar kernel - - - [73996.256010] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:30:32 Telcontar kernel - - - [74026.336013] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:31:03 Telcontar kernel - - - [74056.416012] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:31:33 Telcontar kernel - - - [74086.496011] XFS (sdd5): xfs_log_force: error 5 returned.
<4.5> 2014-08-11 15:31:59 Telcontar gnome-keyring-daemon 4381 - - Gkm: couldn't stat directory: /home/cer/.gnome2/keyrings: Input/output error
<4.4> 2014-08-11 15:32:00 Telcontar gnome-keyring-daemon 4381 - - GLib-GObject: invalid unclassed pointer in cast to 'GkmObject'
<4.3> 2014-08-11 15:32:00 Telcontar gnome-keyring-daemon 4381 - - Gkm: gkm_object_expose_full: assertion 'GKM_IS_OBJECT (self)' failed
<4.5> 2014-08-11 15:32:00 Telcontar gnome-keyring-daemon 4381 - - Gkm: couldn't stat directory: /home/cer/.gnome2/keyrings: Input/output error
<4.5> 2014-08-11 15:32:00 Telcontar gnome-keyring-daemon 4381 - - Gkm: couldn't stat directory: /home/cer/.gnome2/keyrings: Input/output error
<4.4> 2014-08-11 15:32:00 Telcontar gnome-keyring-daemon 4381 - - Gkm: couldn't create temporary file for: /home/cer/.gnome2/keyrings/login.keyring: Input/output error
<4.4> 2014-08-11 15:32:00 Telcontar gnome-keyring-daemon 4381 - - couldn't create login keyring: An error occurred on the device
<10.3> 2014-08-11 15:32:00 Telcontar unix2_chkpwd - - - gkr-pam: the password for the login keyring was invalid.
<0.4> 2014-08-11 15:32:03 Telcontar kernel - - - [74116.576018] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:32:33 Telcontar kernel - - - [74146.656011] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:33:01 Telcontar systemd 1 - - Starting Session 566 of user news.
<0.4> 2014-08-11 15:33:03 Telcontar kernel - - - [74176.736068] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:33:33 Telcontar kernel - - - [74206.816012] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:34:03 Telcontar kernel - - - [74236.896017] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:34:33 Telcontar kernel - - - [74266.976014] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:35:01 Telcontar systemd 1 - - Starting Session 567 of user news.
<0.4> 2014-08-11 15:35:03 Telcontar kernel - - - [74297.056012] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:35:33 Telcontar kernel - - - [74327.136015] XFS (sdd5): xfs_log_force: error 5 returned.
<1.6> 2014-08-11 15:35:56 Telcontar run-crons 8974 - - leafnode: OK
<3.6> 2014-08-11 15:35:56 Telcontar systemd 1 - - Reloading System Logging Service.
<3.6> 2014-08-11 15:35:57 Telcontar systemd 1 - - Reloaded System Logging Service.
<5.6> 2014-08-11 15:35:57 Telcontar rsyslogd - - - [origin software="rsyslogd" swVersion="7.4.7" x-pid="1081" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
<3.6> 2014-08-11 15:36:02 Telcontar systemd 1 - - Reloading System Logging Service.
<3.6> 2014-08-11 15:36:02 Telcontar systemd 1 - - Reloaded System Logging Service.
<0.4> 2014-08-11 15:36:03 Telcontar kernel - - - [74357.216013] XFS (sdd5): xfs_log_force: error 5 returned.
<1.6> 2014-08-11 15:36:06 Telcontar run-crons 8974 - - logrotate: OK
<3.2> 2014-08-11 15:36:06 Telcontar mdadm 9290 - - DegradedArray event detected on md device /dev/md0
<1.6> 2014-08-11 15:36:06 Telcontar run-crons 8974 - - mdadm: OK
<4.5> 2014-08-11 15:36:06 Telcontar su - - - (to root) root on (null)
<1.4> 2014-08-11 15:36:25 Telcontar run-crons 8974 - - mlocate.cron returned 143
<1.6> 2014-08-11 15:36:25 Telcontar run-crons 8974 - - packagekit-background.cron: OK
<1.6> 2014-08-11 15:36:26 Telcontar run-crons 8974 - - suse-clean_catman: OK
<0.4> 2014-08-11 15:36:33 Telcontar kernel - - - [74387.296018] XFS (sdd5): xfs_log_force: error 5 returned.
<1.6> 2014-08-11 15:36:41 Telcontar run-crons 8974 - - suse-do_mandb: OK
<1.6> 2014-08-11 15:36:57 Telcontar run-crons 8974 - - suse-texlive: OK
<1.6> 2014-08-11 15:36:57 Telcontar run-crons 8974 - - suse.cron-sa-update: OK
<1.6> 2014-08-11 15:36:58 Telcontar run-crons 8974 - - suse.de-backup-rc.config: OK
<0.4> 2014-08-11 15:37:04 Telcontar kernel - - - [74417.376010] XFS (sdd5): xfs_log_force: error 5 returned.
<1.6> 2014-08-11 15:37:34 Telcontar run-crons 8974 - - suse.de-backup-rpmdb: OK
<0.4> 2014-08-11 15:37:34 Telcontar kernel - - - [74447.456013] XFS (sdd5): xfs_log_force: error 5 returned.
<1.6> 2014-08-11 15:37:34 Telcontar run-crons 8974 - - suse.de-check-battery: OK
<1.6> 2014-08-11 15:37:34 Telcontar run-crons 8974 - - suse.de-cron-local: OK
<1.6> 2014-08-11 15:37:34 Telcontar run-crons 8974 - - suse.de-faxcron: OK
<1.6> 2014-08-11 15:37:34 Telcontar run-crons 8974 - - suse.de-snapper: OK
<3.6> 2014-08-11 15:38:01 Telcontar systemd 1 - - Starting Session 568 of user news.
<0.4> 2014-08-11 15:38:04 Telcontar kernel - - - [74477.536013] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:38:34 Telcontar kernel - - - [74507.616019] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:39:04 Telcontar kernel - - - [74537.696013] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:39:34 Telcontar kernel - - - [74567.776014] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:40:01 Telcontar systemd 1 - - Starting Session 569 of user cer.
<0.4> 2014-08-11 15:40:04 Telcontar kernel - - - [74597.856013] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:40:34 Telcontar kernel - - - [74627.936021] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:41:04 Telcontar kernel - - - [74658.016012] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:41:34 Telcontar kernel - - - [74688.096019] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:42:04 Telcontar kernel - - - [74718.176018] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:42:34 Telcontar kernel - - - [74748.256017] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:43:01 Telcontar systemd 1 - - Starting Session 570 of user news.
<0.4> 2014-08-11 15:43:04 Telcontar kernel - - - [74778.336012] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:43:35 Telcontar kernel - - - [74808.416013] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:44:05 Telcontar kernel - - - [74838.496014] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:44:35 Telcontar kernel - - - [74868.576013] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:45:01 Telcontar systemd 1 - - Starting Session 571 of user root.
<3.6> 2014-08-11 15:45:01 Telcontar systemd 1 - - Starting Session 572 of user news.
<0.4> 2014-08-11 15:45:05 Telcontar kernel - - - [74898.656019] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:45:35 Telcontar kernel - - - [74928.736017] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:46:05 Telcontar kernel - - - [74958.816015] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:46:35 Telcontar kernel - - - [74988.896026] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:47:05 Telcontar kernel - - - [75018.976014] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:47:35 Telcontar kernel - - - [75049.056013] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:48:01 Telcontar systemd 1 - - Starting Session 573 of user news.
<0.4> 2014-08-11 15:48:05 Telcontar kernel - - - [75079.136016] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:48:35 Telcontar kernel - - - [75109.216014] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:49:05 Telcontar kernel - - - [75139.296014] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:49:36 Telcontar kernel - - - [75169.376013] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:50:06 Telcontar kernel - - - [75199.456012] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:50:36 Telcontar kernel - - - [75229.536011] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:51:06 Telcontar kernel - - - [75259.616013] XFS (sdd5): xfs_log_force: error 5 returned.
<0.6> 2014-08-11 15:51:09 Telcontar kernel - - - [75262.721354] xfce4-session[4520]: segfault at 8 ip 00000000004164dc sp 00007fffdc291dc0 error 4 in xfce4-session[400000+2b00
<4.6> 2014-08-11 15:51:18 Telcontar systemd-logind 1021 - - Removed session 8.
<10.5> 2014-08-11 15:51:18 Telcontar polkitd 4314 - - Unregistered Authentication Agent for unix-session:10 (system bus name :1.69, object path /org/gnome/PolicyKit1/Authenti
<0.7> 2014-08-11 15:51:28 Telcontar kernel - - - [75282.132776] nvidia 0000:01:00.0: irq 48 for MSI/MSI-X
<3.6> 2014-08-11 15:51:29 Telcontar acpid - - - 1 client rule loaded
<4.6> 2014-08-11 15:51:30 Telcontar systemd-logind 1021 - - Removed session 10.
<0.6> 2014-08-11 15:51:30 Telcontar kernel - - - [75284.176020] usb 1-6: new high-speed USB device number 4 using ehci-pci
<3.6> 2014-08-11 15:51:30 Telcontar systemd 1 - - Starting Session 574 of user lightdm.
<4.6> 2014-08-11 15:51:30 Telcontar systemd-logind 1021 - - New session 574 of user lightdm.
<4.6> 2014-08-11 15:51:30 Telcontar systemd-logind 1021 - - Linked /tmp/.X11-unix/X0 to /run/user/127/X11-display.
<0.6> 2014-08-11 15:51:30 Telcontar kernel - - - [75284.291822] usb 1-6: New USB device found, idVendor=8564, idProduct=1000
<0.6> 2014-08-11 15:51:30 Telcontar kernel - - - [75284.291825] usb 1-6: New USB device strings: Mfr=1, Product=2, SerialNumber=3
<0.6> 2014-08-11 15:51:30 Telcontar kernel - - - [75284.291828] usb 1-6: Product: Mass Storage Device
<0.6> 2014-08-11 15:51:30 Telcontar kernel - - - [75284.291829] usb 1-6: Manufacturer: JetFlash
<0.6> 2014-08-11 15:51:30 Telcontar kernel - - - [75284.291831] usb 1-6: SerialNumber: 346YLQ4L0G5H8S2F
<1.6> 2014-08-11 15:51:31 Telcontar mtp-probe - - - checking bus 1, device 4: "/sys/devices/pci0000:00/0000:00:1a.7/usb1/1-6"
<1.6> 2014-08-11 15:51:31 Telcontar mtp-probe - - - bus: 1, device: 4 was not an MTP device
<0.6> 2014-08-11 15:51:31 Telcontar kernel - - - [75284.667399] usb-storage 1-6:1.0: USB Mass Storage device detected
<0.6> 2014-08-11 15:51:31 Telcontar kernel - - - [75284.667502] scsi12 : usb-storage 1-6:1.0
<0.6> 2014-08-11 15:51:31 Telcontar kernel - - - [75284.667606] usbcore: registered new interface driver usb-storage
<0.5> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.794904] scsi 12:0:0:0: Direct-Access JetFlash Transcend 4GB 1100 PQ: 0 ANSI: 4
<0.6> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.794976] scsi 12:0:0:0: alua: supports implicit and explicit TPGS
<0.6> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.796262] scsi 12:0:0:0: alua: No target port descriptors found
<0.3> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.796265] scsi 12:0:0:0: alua: not attached
<0.5> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.796396] sd 12:0:0:0: Attached scsi generic sg6 type 0
<0.5> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.796888] sd 12:0:0:0: [sdf] 7913472 512-byte logical blocks: (4.05 GB/3.77 GiB)
<0.5> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.797634] sd 12:0:0:0: [sdf] Write Protect is off
<0.7> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.797637] sd 12:0:0:0: [sdf] Mode Sense: 43 00 00 00
<0.3> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.798386] sd 12:0:0:0: [sdf] No Caching mode page found
<0.3> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.798388] sd 12:0:0:0: [sdf] Assuming drive cache: write through
<0.3> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.801508] sd 12:0:0:0: [sdf] No Caching mode page found
<0.3> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.801511] sd 12:0:0:0: [sdf] Assuming drive cache: write through
<0.6> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.802147] sdf: sdf1 sdf2 sdf3
<0.3> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.805642] sd 12:0:0:0: [sdf] No Caching mode page found
<0.3> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.805645] sd 12:0:0:0: [sdf] Assuming drive cache: write through
<0.5> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.805648] sd 12:0:0:0: [sdf] Attached SCSI removable disk
<0.4> 2014-08-11 15:51:36 Telcontar kernel - - - [75289.696019] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:51:41 Telcontar systemd 1 - - Starting Getty on tty2...
<3.6> 2014-08-11 15:51:41 Telcontar systemd 1 - - Started Getty on tty2.
<3.6> 2014-08-11 15:51:42 Telcontar systemd 1 - - Starting Getty on tty3...
<3.6> 2014-08-11 15:51:42 Telcontar systemd 1 - - Started Getty on tty3.
<3.6> 2014-08-11 15:51:43 Telcontar systemd 1 - - Starting Getty on tty6...
<3.6> 2014-08-11 15:51:43 Telcontar systemd 1 - - Started Getty on tty6.
<3.6> 2014-08-11 15:51:44 Telcontar systemd 1 - - Starting Getty on tty5...
<3.6> 2014-08-11 15:51:44 Telcontar systemd 1 - - Started Getty on tty5.
<3.6> 2014-08-11 15:51:45 Telcontar systemd 1 - - Starting Getty on tty4...
<3.6> 2014-08-11 15:51:45 Telcontar systemd 1 - - Started Getty on tty4.
<0.4> 2014-08-11 15:52:06 Telcontar kernel - - - [75319.776023] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Unmounting /data/raid...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Unmounting /data/cripta...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping /sys/devices/virtual/block/dm-0.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - message repeated 5 times: [ Stopping /sys/devices/virtual/block/dm-0.]
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping Session 574 of user lightdm.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopped Session 574 of user lightdm.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping Session 7 of user root.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopped Session 7 of user root.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping user-0.slice.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Removed slice user-0.slice.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping Stop Read-Ahead Data Collection 10s After Completed Startup.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopped Stop Read-Ahead Data Collection 10s After Completed Startup.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping User Manager for 1000...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping User Manager for 9...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping User Manager for 127...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping CUPS Printing Service...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping ifup managed network interface eth1...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping ifup managed network interface eth0...
<3.6> 2014-08-11 15:17:44 Telcontar systemd 4377 - - message repeated 14 times: [ Time has been changed]
<3.3> 2014-08-11 15:52:08 Telcontar systemd 4377 - - Failed to enqueue exit.target job: Unit exit.target failed to load: Input/output error.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping Graphical Interface.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopped target Graphical Interface.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping LSB: X Display Manager...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping helloworld.service...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopped helloworld.service.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping Multi-User System.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopped target Multi-User System.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping LSB: virus scanner daemon...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping LSB: Start the hddtemp daemon...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping LSB: mdadmd daemon monitoring MD devices...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping LSB: This services starts and stops the USB Arbitrator....
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping LSB: Supports the direct execution of binary formats....
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping LSB: irqbalance daemon providing irq balancing on MP-machines...
<3.6> 2014-08-11 15:52:09 Telcontar systemd 1 - - Stopping LSB: Set up analog joysticks...
<0.4> 2014-08-11 15:52:10 Telcontar kernel - - - [75323.547122] nfsd: last server has exited, flushing export cache
<5.6> 2014-08-11 15:36:02 Telcontar rsyslogd - - - [origin software="rsyslogd" swVersion="7.4.7" x-pid="1081" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
<5.6> 2014-08-11 15:52:11 Telcontar rsyslogd - - - [origin software="rsyslogd" swVersion="7.4.7" x-pid="1081" x-info="http://www.rsyslog.com"] exiting on signal 15.
2014-08-11 15:52:12+02:00 - Halting the system now =========================================== uptime: 15:52pm up 1 day 20:54, 1 user, load average: 5.94, 2.47, 1.22
Post by Mark Tinguely
I am interested in the metadata dump.
Ok, sure, no problem. I'm working on that, but I need to have lunch first ;-)
Post by Mark Tinguely
Also, someone hit back-to-back duplicate block allocation
XFS_WANT_CORRUPTED_GOTO bugs, you may want to do a metadata dump before and
after the xfs_repair in case you hit it again soon.
I already have a metadata dump, and I have not attempted the repair yet
(I'm doing a full dd copy of the partition, and it is 400 Gigs). I will obtain
another metadump after the repair, and I can upload both to Google Drive.
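
For reference, the dump / repair / dump sequence discussed here could be sketched roughly as below. This is only an illustration, not what was actually run: the function name `dump_repair_dump`, the `RUN` switch, and the output file names are made up for the example, and it assumes the filesystem is unmounted and that `xfs_metadump` and `xfs_repair` from xfsprogs are installed. By default it only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: print (or, with RUN=1, execute) a metadata dump,
# an xfs_repair, and a second metadata dump, in that order.
# Arguments: $1 = unmounted XFS device, $2 = directory for the dump files.
# Paths must not contain spaces, because each step is word-split when run.
dump_repair_dump() {
    dev="$1"
    out="$2"
    for step in \
        "xfs_metadump $dev $out/before-repair.metadump" \
        "xfs_repair $dev" \
        "xfs_metadump $dev $out/after-repair.metadump"
    do
        echo "$step"
        if [ "${RUN:-0}" = "1" ]; then
            $step || return 1
        fi
    done
}
```

Calling `dump_repair_dump /dev/sdd5 /some/dir` lists the three steps for review; setting `RUN=1` first would execute them as root, stopping at the first failure.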

But first I need sustenance :-)

(At least this time I do not have any pressing thing to do on the
computer...)

- --
Cheers
Carlos E. R.

(from 13.1 x86_64 "Bottle" (Minas Tirith))
Brian Foster
2014-08-11 16:14:46 UTC
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
Post by Mark Tinguely
Where in the filesystem did the XFS_WANT_CORRUPTED_GOTO happen?
This time?
Did not look at the log yet. Let me see...
Here is the full log of the event. It starts prior to hibernating, with all
things nominal, and ends at shutdown (I had to hit the reset button, despite
what the log says). If you want to see entries prior to that, since boot, I
can do that.
...
<0.1> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.439809] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.1
<0.1> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.439809].
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440155] CPU: 0 PID: 6255 Comm: kworker/0:7 Tainted: P O 3.11.10-17-desktop #1
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440322] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7516/MS-7516, BIOS V1.5 10/10/2008
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440361] Workqueue: xfs-eofblocks/sdd5 xfs_eofblocks_worker [xfs]
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440364] 0000000000000001 ffffffff815a0402 000000000010c9d3 ffffffffa0c38996
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440365] ffff880211412b00 ffff88023448dd80 ffff88023fb95cb0 0000000000000001
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440366] 0000000000000000 0000000100000000 0000000000000000 0000000000000001
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440377] [<ffffffff81004a28>] dump_trace+0x88/0x310
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440380] [<ffffffff81004d80>] show_stack_log_lvl+0xd0/0x1d0
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440382] [<ffffffff810061bc>] show_stack+0x1c/0x50
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440385] [<ffffffff815a0402>] dump_stack+0x50/0x89
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440399] [<ffffffffa0c38996>] xfs_free_ag_extent+0x226/0x860 [xfs]
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440442] [<ffffffffa0c39fe9>] xfs_free_extent+0xb9/0xf0 [xfs]
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440484] [<ffffffffa0c4c39e>] xfs_bmap_finish+0x11e/0x170 [xfs]
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440534] [<ffffffffa0c6b4c0>] xfs_itruncate_extents+0x190/0x340 [xfs]
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440597] [<ffffffffa0c33633>] xfs_free_eofblocks+0x1e3/0x260 [xfs]
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440633] [<ffffffffa0c291ef>] xfs_inode_free_eofblocks+0x6f/0x150 [xfs]
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440662] [<ffffffffa0c27f82>] xfs_inode_ag_walk.isra.10+0x1c2/0x310 [xfs]
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440690] [<ffffffffa0c28a8e>] xfs_inode_ag_iterator_tag+0x6e/0xb0 [xfs]
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440718] [<ffffffffa0c28d82>] xfs_eofblocks_worker+0x12/0x20 [xfs]
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440737] [<ffffffff8106ac78>] process_one_work+0x168/0x490
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440739] [<ffffffff8106b914>] worker_thread+0x114/0x3a0
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440742] [<ffffffff81071c3f>] kthread+0xaf/0xc0
<0.4> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440746] [<ffffffff815adfbc>] ret_from_fork+0x7c/0xb0
<0.5> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.440751] XFS (sdd5): xfs_do_force_shutdown(0x8) called from line 916 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-
<0.1> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.498979] XFS (sdd5): Corruption of in-memory data detected. Shutting down filesystem
<0.1> 2014-08-11 15:17:22 Telcontar kernel - - - [73235.499136] XFS (sdd5): Please umount the filesystem and rectify the problem(s)
This reminds me that it might be interesting to tune the eofblocks
scanner to be more aggressive and see if that helps reproduce. This
thread that's running here normally runs every 5 minutes by default, but
it can be tuned to run at a user-defined interval via the following
/proc file:

# cat /proc/sys/fs/xfs/speculative_prealloc_lifetime
300

I wonder if setting it to 30s or so ('echo 30 > /proc/...') and running
some hibernation cycles would help...
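
A minimal sketch of that tuning step, for anyone who wants to try it: the helper below saves the current interval, writes the new one, and prints the old value so it can be restored afterwards. The function name `set_prealloc_lifetime` and the `TUNABLE` override are invented for this example; only the /proc path and the 30 second value come from the text above. Writing to the file needs root.

```shell
#!/bin/sh
# Path of the tunable as given above; TUNABLE may be overridden
# (for example, pointed at a scratch file) to try the helper safely.
TUNABLE="${TUNABLE:-/proc/sys/fs/xfs/speculative_prealloc_lifetime}"

# Hypothetical helper: set the interval (in seconds) to $1 and
# print the previous value on stdout.
set_prealloc_lifetime() {
    new="$1"
    old=$(cat "$TUNABLE") || return 1
    printf '%s\n' "$new" > "$TUNABLE" || return 1
    printf '%s\n' "$old"
}
```

Typical use would be `old=$(set_prealloc_lifetime 30)` before the hibernation cycles and `set_prealloc_lifetime "$old"` afterwards to put the default back.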

Brian
<3.6> 2014-08-11 15:17:22 Telcontar systemd 1 - - Time has been changed
<3.6> 2014-08-11 15:17:27 Telcontar acpid - - - 1 client rule loaded
<3.4> 2014-08-11 15:17:29 Telcontar pm-utils - - - Thawing (95)...
<3.5> 2014-08-11 15:17:30 Telcontar dbus 1020 - - [system] Activating service name='org.freedesktop.PackageKit' (using servicehelper)
<3.6> 2014-08-11 15:17:30 Telcontar systemd 1 - - Starting LSB: Network time protocol daemon (ntpd)...
<0.4> 2014-08-11 15:17:30 Telcontar kernel - - - [73244.256012] XFS (sdd5): xfs_log_force: error 5 returned.
<3.5> 2014-08-11 15:17:31 Telcontar dbus 1020 - - [system] Activated service 'org.freedesktop.PackageKit' failed: Cannot launch daemon, file not found or permissions invalid
<1.5> 2014-08-11 15:17:31 Telcontar network 6315 - - redirecting to "systemctl restart network.service"
<3.6> 2014-08-11 15:17:32 Telcontar systemd 1 - - Stopping ifup managed network interface eth1...
<3.6> 2014-08-11 15:17:32 Telcontar systemd 1 - - Stopping ifup managed network interface eth0...
<3.6> 2014-08-11 15:17:32 Telcontar systemd 1 - - Stopping LSB: Configure network interfaces and set up routing...
<3.6> 2014-08-11 15:17:32 Telcontar systemd 1 - - Starting LSB: Configure network interfaces and set up routing...
<3.6> 2014-08-11 15:17:32 Telcontar ifdown 6352 - - touch: cannot touch ‘/dev/.sysconfig/network/tmp/if-eth0.6352’: No such file or directory
<3.6> 2014-08-11 15:17:32 Telcontar ifdown 6352 - - scripts/functions: line 1221: /dev/.sysconfig/network/tmp/if-eth0.6352.tmp: No such file or directory
<3.6> 2014-08-11 15:17:32 Telcontar ifdown 6352 - - scripts/functions: line 1239: /dev/.sysconfig/network/tmp/if-eth0.6352.tmp: No such file or directory
<3.6> 2014-08-11 15:17:32 Telcontar ifdown 6352 - - cat: /dev/.sysconfig/network/tmp/if-eth0.6352: No such file or directory
<3.6> 2014-08-11 15:17:34 Telcontar ntp 6314 - - 11 Aug 15:17:34 sntp[6505]: Started sntp
<3.6> 2014-08-11 15:17:34 Telcontar ifdown 6352 - - eth0 device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<1.5> 2014-08-11 15:17:34 Telcontar ifdown 6352 - - eth0 device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<3.6> 2014-08-11 15:17:34 Telcontar ifdown 6351 - - eth1 device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<1.5> 2014-08-11 15:17:34 Telcontar ifdown 6351 - - eth1 device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<3.6> 2014-08-11 15:17:34 Telcontar network 6384 - - lo
<1.5> 2014-08-11 15:17:34 Telcontar ifup 6924 - - lo
<1.5> 2014-08-11 15:17:35 Telcontar ifup 6924 - - lo
<1.5> 2014-08-11 15:17:35 Telcontar ifup 6924 - - IP address: 127.0.0.1/8
<3.6> 2014-08-11 15:17:35 Telcontar network 6384 - - lo IP address: 127.0.0.1/8
<1.5> 2014-08-11 15:17:35 Telcontar ifup 6924 - -.
<16.3> 2014-08-11 15:17:38 Telcontar dhcpcd 7162 - - eth1: dhcpcd not running
<16.6> 2014-08-11 15:17:38 Telcontar dhcpcd 7162 - - eth1: exiting
<3.6> 2014-08-11 15:17:38 Telcontar systemd 1 - - Starting ifup managed network interface eth0...
<3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - - Interface eth0.IPv6 no longer relevant for mDNS.
<3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - - Leaving mDNS multicast group on interface eth0.IPv6 with address fc00::14.
<3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - - Interface eth0.IPv4 no longer relevant for mDNS.
<3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - - Leaving mDNS multicast group on interface eth0.IPv4 with address 192.168.1.14.
<3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - - Withdrawing address record for fc00::14 on eth0.
<3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - - Withdrawing address record for 192.168.1.14 on eth0.
<3.6> 2014-08-11 15:17:38 Telcontar ifup 7226 - - eth0 device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<1.5> 2014-08-11 15:17:38 Telcontar ifup 7226 - - eth0 device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<0.6> 2014-08-11 15:17:38 Telcontar kernel - - - [73251.792336] r8169 0000:06:00.0 eth0: link down
<0.6> 2014-08-11 15:17:38 Telcontar kernel - - - [73251.792353] r8169 0000:06:00.0 eth0: link down
<0.6> 2014-08-11 15:17:38 Telcontar kernel - - - [73251.792366] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
<3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - - Joining mDNS multicast group on interface eth0.IPv4 with address 192.168.1.14.
<3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - - New relevant interface eth0.IPv4 for mDNS.
<3.6> 2014-08-11 15:17:38 Telcontar avahi-daemon 1007 - - Registering new address record for 192.168.1.14 on eth0.IPv4.
<3.6> 2014-08-11 15:17:39 Telcontar systemd 1 - - Starting ifup managed network interface eth1...
<3.6> 2014-08-11 15:17:39 Telcontar ifplugd(eth1) 7541 - - ifplugd 0.28 initializing.
<0.6> 2014-08-11 15:17:39 Telcontar kernel - - - [73252.646313] r8169 0000:07:00.0 eth1: link down
<0.6> 2014-08-11 15:17:39 Telcontar kernel - - - [73252.646341] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
<3.6> 2014-08-11 15:17:39 Telcontar ifplugd(eth1) 7541 - - Using interface eth1/00:21:85:16:2D:0C with driver <r8169> (version: 2.3LK-NAPI)
<3.6> 2014-08-11 15:17:39 Telcontar ifplugd(eth1) 7541 - - Using detection mode: SIOCETHTOOL
<3.6> 2014-08-11 15:17:39 Telcontar ifplugd(eth1) 7541 - - Initialization complete, link beat not detected.
<1.5> 2014-08-11 15:17:39 Telcontar ifup 7521 - - eth1 device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<1.5> 2014-08-11 15:17:39 Telcontar ifup 7521 - - eth1 is controlled by ifplugd
<3.6> 2014-08-11 15:17:39 Telcontar ifup 7521 - - eth1 device: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 02)
<3.6> 2014-08-11 15:17:39 Telcontar ifup 7521 - - eth1 is controlled by ifplugd
<3.6> 2014-08-11 15:17:39 Telcontar systemd 1 - - Started ifup managed network interface eth1.
<0.6> 2014-08-11 15:17:40 Telcontar kernel - - - [73253.958299] r8169 0000:06:00.0 eth0: link up
<0.6> 2014-08-11 15:17:40 Telcontar kernel - - - [73253.958306] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
<3.6> 2014-08-11 15:17:41 Telcontar avahi-daemon 1007 - - Joining mDNS multicast group on interface eth0.IPv6 with address fe80::221:85ff:fe16:2d0b.
<3.6> 2014-08-11 15:17:41 Telcontar avahi-daemon 1007 - - New relevant interface eth0.IPv6 for mDNS.
<3.6> 2014-08-11 15:17:41 Telcontar avahi-daemon 1007 - - Registering new address record for fe80::221:85ff:fe16:2d0b on eth0.*.
<3.6> 2014-08-11 15:17:42 Telcontar avahi-daemon 1007 - - Leaving mDNS multicast group on interface eth0.IPv6 with address fe80::221:85ff:fe16:2d0b.
<3.6> 2014-08-11 15:17:42 Telcontar avahi-daemon 1007 - - Joining mDNS multicast group on interface eth0.IPv6 with address fc00::14.
<3.6> 2014-08-11 15:17:42 Telcontar avahi-daemon 1007 - - Registering new address record for fc00::14 on eth0.*.
<3.6> 2014-08-11 15:17:42 Telcontar avahi-daemon 1007 - - Withdrawing address record for fe80::221:85ff:fe16:2d0b on eth0.
<3.6> 2014-08-11 15:17:44 Telcontar ntp 6314 - - 11 Aug 15:17:44 sntp[6505]: Received no useable packet from 192.168.1.15!
<3.6> 2014-08-11 15:17:44 Telcontar ntp 6314 - - 11 Aug 15:17:44 sntp[7926]: Started sntp
<3.6> 2014-08-11 15:17:44 Telcontar systemd 1 - - Time has been changed
<3.6> 2014-08-11 15:17:44 Telcontar ntp 6314 - - 2014-08-11 15:17:44.656291 (-0100) -0.112718 +/- 0.037338 secs
<3.6> 2014-08-11 15:17:44 Telcontar ntp 6314 - - 2014-08-11 15:17:44.604369 (-0100) +0.0081 +/- 0.069473 secs
<3.6> 2014-08-11 15:17:44 Telcontar ntp 6314 - - Time synchronized with 0.pool.ntp.org
<4.6> 2014-08-11 15:17:45 Telcontar SuSEfirewall2 - - - Setting up rules from /etc/sysconfig/SuSEfirewall2 ...
<4.6> 2014-08-11 15:17:45 Telcontar SuSEfirewall2 - - - using default zone 'ext' for interface eth1
<4.6> 2014-08-11 15:17:45 Telcontar SuSEfirewall2 - - - Firewall customary rules loaded from /etc/sysconfig/scripts/SuSEfirewall2-custom
<3.6> 2014-08-11 15:17:45 Telcontar ntp 6314 - - Starting network time protocol daemon (NTPD)..done
<3.6> 2014-08-11 15:17:44 Telcontar systemd 1 - - Time has been changed
<3.6> 2014-08-11 15:17:45 Telcontar systemd 1 - - Started LSB: Network time protocol daemon (ntpd).
<3.5> 2014-08-11 15:17:45 Telcontar ntpd 8017 - - proto: precision = 1.613 usec
<3.7> 2014-08-11 15:17:46 Telcontar ntpd 8017 - - ntp_io: estimated max descriptors: 1024, initial socket boundary: 16
<3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - - Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
<3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - - Listen and drop on 1 v6wildcard :: UDP 123
<3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - - Listen normally on 2 lo 127.0.0.1 UDP 123
<3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - - Listen normally on 3 eth0 192.168.1.14 UDP 123
<3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - - Listen normally on 4 lo ::1 UDP 123
<3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - - Listen normally on 5 eth0 fe80::221:85ff:fe16:2d0b UDP 123
<3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - - Listen normally on 6 eth0 fc00::14 UDP 123
<3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - - peers refreshed
<3.6> 2014-08-11 15:17:46 Telcontar ntpd 8017 - - Listening on routing socket on fd #23 for interface updates
<3.5> 2014-08-11 15:17:46 Telcontar ntpd 8017 - - logging to file /var/log/ntp
<4.6> 2014-08-11 15:17:48 Telcontar SuSEfirewall2 - - - Firewall rules successfully set
<3.6> 2014-08-11 15:17:48 Telcontar avahi-autoipd(eth0) 8434 - - Found user 'avahi-autoipd' (UID 495) and group 'avahi-autoipd' (GID 491).
<3.6> 2014-08-11 15:17:48 Telcontar avahi-autoipd(eth0) 8434 - - Successfully called chroot().
<3.6> 2014-08-11 15:17:48 Telcontar avahi-autoipd(eth0) 8434 - - Successfully dropped root privileges.
<3.6> 2014-08-11 15:17:48 Telcontar avahi-autoipd(eth0) 8434 - - Starting with address 169.254.3.89
<3.6> 2014-08-11 15:17:48 Telcontar avahi-autoipd(eth0) 8434 - - Routable address already assigned, sleeping.
<3.6> 2014-08-11 15:17:50 Telcontar systemd 1 - - Started ifup managed network interface eth0.
<3.6> 2014-08-11 15:17:50 Telcontar systemd 1 - - Started ifup managed network interface eth1.
<3.6> 2014-08-11 15:17:50 Telcontar network 6384 - - ..done..done..done ppp0 Startmode is 'manual' -> skipping
<1.5> 2014-08-11 15:17:50 Telcontar ifup 8500 - - ppp0 Startmode is 'manual' -> skipping
<3.6> 2014-08-11 15:17:50 Telcontar network 6384 - - ..skippedSetting up service network . . . . . . . . . . . . ...done
<3.6> 2014-08-11 15:17:50 Telcontar systemd 1 - - Started LSB: Configure network interfaces and set up routing.
<3.4> 2014-08-11 15:17:52 Telcontar pm-utils - - - Thawing the system now (04)...
<0.6> 2014-08-11 15:17:55 Telcontar kernel - - - [73268.481672] Chrome_ChildThr[5680]: segfault at 0 ip 00007ffcedf71598 sp 00007ffce1821410 error 6 in libmozalloc.so[7ffcedf7
<0.4> 2014-08-11 15:18:00 Telcontar kernel - - - [73274.336014] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:18:01 Telcontar systemd 1 - - Starting Session 559 of user news.
<3.4> 2014-08-11 15:18:16 Telcontar router - - - (Thawing 04) Logging the current IP= 79.159.63.177
<0.4> 2014-08-11 15:18:31 Telcontar kernel - - - [73304.416012] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:19:01 Telcontar kernel - - - [73334.496014] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:19:31 Telcontar kernel - - - [73364.576016] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:20:01 Telcontar kernel - - - [73394.656015] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:20:01 Telcontar systemd 1 - - Starting Session 560 of user cer.
<0.4> 2014-08-11 15:20:31 Telcontar kernel - - - [73424.736049] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:21:01 Telcontar kernel - - - [73454.816016] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:21:31 Telcontar kernel - - - [73484.896015] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:22:01 Telcontar kernel - - - [73514.976016] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:22:31 Telcontar kernel - - - [73545.056018] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:23:01 Telcontar systemd 1 - - Starting Session 561 of user news.
<0.4> 2014-08-11 15:23:01 Telcontar kernel - - - [73575.136025] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:23:31 Telcontar kernel - - - [73605.216014] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:23:52 Telcontar smartd 1013 - - Device: /dev/sdb [SAT], Temperature changed -5 Celsius to 33 Celsius (Min/Max 19/38)
<0.4> 2014-08-11 15:24:01 Telcontar kernel - - - [73635.296078] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:24:32 Telcontar kernel - - - [73665.376020] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:25:01 Telcontar systemd 1 - - Starting Session 562 of user news.
<0.4> 2014-08-11 15:25:02 Telcontar kernel - - - [73695.456011] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:25:32 Telcontar kernel - - - [73725.536015] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:26:02 Telcontar kernel - - - [73755.616017] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:26:32 Telcontar kernel - - - [73785.696017] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:27:02 Telcontar kernel - - - [73815.776016] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:27:32 Telcontar kernel - - - [73845.856021] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:28:01 Telcontar systemd 1 - - Starting Session 563 of user news.
<0.4> 2014-08-11 15:28:02 Telcontar kernel - - - [73875.936014] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:28:32 Telcontar kernel - - - [73906.016015] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:29:02 Telcontar kernel - - - [73936.096017] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:29:32 Telcontar kernel - - - [73966.176012] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:30:01 Telcontar systemd 1 - - Starting Session 564 of user root.
<3.6> 2014-08-11 15:30:01 Telcontar systemd 1 - - Starting Session 565 of user cer.
<1.6> 2014-08-11 15:30:01 Telcontar run-crons 8974 - - suse.de-snapper: OK
<4.5> 2014-08-11 15:30:01 Telcontar su - - - (to root) root on (null)
<10.3> 2014-08-11 15:30:01 Telcontar su - - - pam_systemd(su-l:session): pam_putenv: delete non-existent entry; XDG_RUNTIME_DIR
<0.4> 2014-08-11 15:30:02 Telcontar kernel - - - [73996.256010] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:30:32 Telcontar kernel - - - [74026.336013] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:31:03 Telcontar kernel - - - [74056.416012] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:31:33 Telcontar kernel - - - [74086.496011] XFS (sdd5): xfs_log_force: error 5 returned.
<4.5> 2014-08-11 15:31:59 Telcontar gnome-keyring-daemon 4381 - - Gkm: couldn't stat directory: /home/cer/.gnome2/keyrings: Input/output error
<4.4> 2014-08-11 15:32:00 Telcontar gnome-keyring-daemon 4381 - - GLib-GObject: invalid unclassed pointer in cast to 'GkmObject'
<4.3> 2014-08-11 15:32:00 Telcontar gnome-keyring-daemon 4381 - - Gkm: gkm_object_expose_full: assertion 'GKM_IS_OBJECT (self)' failed
<4.5> 2014-08-11 15:32:00 Telcontar gnome-keyring-daemon 4381 - - Gkm: couldn't stat directory: /home/cer/.gnome2/keyrings: Input/output error
<4.5> 2014-08-11 15:32:00 Telcontar gnome-keyring-daemon 4381 - - Gkm: couldn't stat directory: /home/cer/.gnome2/keyrings: Input/output error
<4.4> 2014-08-11 15:32:00 Telcontar gnome-keyring-daemon 4381 - - Gkm: couldn't create temporary file for: /home/cer/.gnome2/keyrings/login.keyring: Input/output error
<4.4> 2014-08-11 15:32:00 Telcontar gnome-keyring-daemon 4381 - - couldn't create login keyring: An error occurred on the device
<10.3> 2014-08-11 15:32:00 Telcontar unix2_chkpwd - - - gkr-pam: the password for the login keyring was invalid.
<0.4> 2014-08-11 15:32:03 Telcontar kernel - - - [74116.576018] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:32:33 Telcontar kernel - - - [74146.656011] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:33:01 Telcontar systemd 1 - - Starting Session 566 of user news.
<0.4> 2014-08-11 15:33:03 Telcontar kernel - - - [74176.736068] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:33:33 Telcontar kernel - - - [74206.816012] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:34:03 Telcontar kernel - - - [74236.896017] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:34:33 Telcontar kernel - - - [74266.976014] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:35:01 Telcontar systemd 1 - - Starting Session 567 of user news.
<0.4> 2014-08-11 15:35:03 Telcontar kernel - - - [74297.056012] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:35:33 Telcontar kernel - - - [74327.136015] XFS (sdd5): xfs_log_force: error 5 returned.
<1.6> 2014-08-11 15:35:56 Telcontar run-crons 8974 - - leafnode: OK
<3.6> 2014-08-11 15:35:56 Telcontar systemd 1 - - Reloading System Logging Service.
<3.6> 2014-08-11 15:35:57 Telcontar systemd 1 - - Reloaded System Logging Service.
<5.6> 2014-08-11 15:35:57 Telcontar rsyslogd - - - [origin software="rsyslogd" swVersion="7.4.7" x-pid="1081" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
<3.6> 2014-08-11 15:36:02 Telcontar systemd 1 - - Reloading System Logging Service.
<3.6> 2014-08-11 15:36:02 Telcontar systemd 1 - - Reloaded System Logging Service.
<0.4> 2014-08-11 15:36:03 Telcontar kernel - - - [74357.216013] XFS (sdd5): xfs_log_force: error 5 returned.
<1.6> 2014-08-11 15:36:06 Telcontar run-crons 8974 - - logrotate: OK
<3.2> 2014-08-11 15:36:06 Telcontar mdadm 9290 - - DegradedArray event detected on md device /dev/md0
<1.6> 2014-08-11 15:36:06 Telcontar run-crons 8974 - - mdadm: OK
<4.5> 2014-08-11 15:36:06 Telcontar su - - - (to root) root on (null)
<1.4> 2014-08-11 15:36:25 Telcontar run-crons 8974 - - mlocate.cron returned 143
<1.6> 2014-08-11 15:36:25 Telcontar run-crons 8974 - - packagekit-background.cron: OK
<1.6> 2014-08-11 15:36:26 Telcontar run-crons 8974 - - suse-clean_catman: OK
<0.4> 2014-08-11 15:36:33 Telcontar kernel - - - [74387.296018] XFS (sdd5): xfs_log_force: error 5 returned.
<1.6> 2014-08-11 15:36:41 Telcontar run-crons 8974 - - suse-do_mandb: OK
<1.6> 2014-08-11 15:36:57 Telcontar run-crons 8974 - - suse-texlive: OK
<1.6> 2014-08-11 15:36:57 Telcontar run-crons 8974 - - suse.cron-sa-update: OK
<1.6> 2014-08-11 15:36:58 Telcontar run-crons 8974 - - suse.de-backup-rc.config: OK
<0.4> 2014-08-11 15:37:04 Telcontar kernel - - - [74417.376010] XFS (sdd5): xfs_log_force: error 5 returned.
<1.6> 2014-08-11 15:37:34 Telcontar run-crons 8974 - - suse.de-backup-rpmdb: OK
<0.4> 2014-08-11 15:37:34 Telcontar kernel - - - [74447.456013] XFS (sdd5): xfs_log_force: error 5 returned.
<1.6> 2014-08-11 15:37:34 Telcontar run-crons 8974 - - suse.de-check-battery: OK
<1.6> 2014-08-11 15:37:34 Telcontar run-crons 8974 - - suse.de-cron-local: OK
<1.6> 2014-08-11 15:37:34 Telcontar run-crons 8974 - - suse.de-faxcron: OK
<1.6> 2014-08-11 15:37:34 Telcontar run-crons 8974 - - suse.de-snapper: OK
<3.6> 2014-08-11 15:38:01 Telcontar systemd 1 - - Starting Session 568 of user news.
<0.4> 2014-08-11 15:38:04 Telcontar kernel - - - [74477.536013] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:38:34 Telcontar kernel - - - [74507.616019] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:39:04 Telcontar kernel - - - [74537.696013] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:39:34 Telcontar kernel - - - [74567.776014] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:40:01 Telcontar systemd 1 - - Starting Session 569 of user cer.
<0.4> 2014-08-11 15:40:04 Telcontar kernel - - - [74597.856013] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:40:34 Telcontar kernel - - - [74627.936021] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:41:04 Telcontar kernel - - - [74658.016012] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:41:34 Telcontar kernel - - - [74688.096019] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:42:04 Telcontar kernel - - - [74718.176018] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:42:34 Telcontar kernel - - - [74748.256017] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:43:01 Telcontar systemd 1 - - Starting Session 570 of user news.
<0.4> 2014-08-11 15:43:04 Telcontar kernel - - - [74778.336012] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:43:35 Telcontar kernel - - - [74808.416013] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:44:05 Telcontar kernel - - - [74838.496014] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:44:35 Telcontar kernel - - - [74868.576013] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:45:01 Telcontar systemd 1 - - Starting Session 571 of user root.
<3.6> 2014-08-11 15:45:01 Telcontar systemd 1 - - Starting Session 572 of user news.
<0.4> 2014-08-11 15:45:05 Telcontar kernel - - - [74898.656019] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:45:35 Telcontar kernel - - - [74928.736017] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:46:05 Telcontar kernel - - - [74958.816015] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:46:35 Telcontar kernel - - - [74988.896026] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:47:05 Telcontar kernel - - - [75018.976014] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:47:35 Telcontar kernel - - - [75049.056013] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:48:01 Telcontar systemd 1 - - Starting Session 573 of user news.
<0.4> 2014-08-11 15:48:05 Telcontar kernel - - - [75079.136016] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:48:35 Telcontar kernel - - - [75109.216014] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:49:05 Telcontar kernel - - - [75139.296014] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:49:36 Telcontar kernel - - - [75169.376013] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:50:06 Telcontar kernel - - - [75199.456012] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:50:36 Telcontar kernel - - - [75229.536011] XFS (sdd5): xfs_log_force: error 5 returned.
<0.4> 2014-08-11 15:51:06 Telcontar kernel - - - [75259.616013] XFS (sdd5): xfs_log_force: error 5 returned.
<0.6> 2014-08-11 15:51:09 Telcontar kernel - - - [75262.721354] xfce4-session[4520]: segfault at 8 ip 00000000004164dc sp 00007fffdc291dc0 error 4 in xfce4-session[400000+2b00
<4.6> 2014-08-11 15:51:18 Telcontar systemd-logind 1021 - - Removed session 8.
<10.5> 2014-08-11 15:51:18 Telcontar polkitd 4314 - - Unregistered Authentication Agent for unix-session:10 (system bus name :1.69, object path /org/gnome/PolicyKit1/Authenti
<0.7> 2014-08-11 15:51:28 Telcontar kernel - - - [75282.132776] nvidia 0000:01:00.0: irq 48 for MSI/MSI-X
<3.6> 2014-08-11 15:51:29 Telcontar acpid - - - 1 client rule loaded
<4.6> 2014-08-11 15:51:30 Telcontar systemd-logind 1021 - - Removed session 10.
<0.6> 2014-08-11 15:51:30 Telcontar kernel - - - [75284.176020] usb 1-6: new high-speed USB device number 4 using ehci-pci
<3.6> 2014-08-11 15:51:30 Telcontar systemd 1 - - Starting Session 574 of user lightdm.
<4.6> 2014-08-11 15:51:30 Telcontar systemd-logind 1021 - - New session 574 of user lightdm.
<4.6> 2014-08-11 15:51:30 Telcontar systemd-logind 1021 - - Linked /tmp/.X11-unix/X0 to /run/user/127/X11-display.
<0.6> 2014-08-11 15:51:30 Telcontar kernel - - - [75284.291822] usb 1-6: New USB device found, idVendor=8564, idProduct=1000
<0.6> 2014-08-11 15:51:30 Telcontar kernel - - - [75284.291825] usb 1-6: New USB device strings: Mfr=1, Product=2, SerialNumber=3
<0.6> 2014-08-11 15:51:30 Telcontar kernel - - - [75284.291828] usb 1-6: Product: Mass Storage Device
<0.6> 2014-08-11 15:51:30 Telcontar kernel - - - [75284.291829] usb 1-6: Manufacturer: JetFlash
<0.6> 2014-08-11 15:51:30 Telcontar kernel - - - [75284.291831] usb 1-6: SerialNumber: 346YLQ4L0G5H8S2F
<1.6> 2014-08-11 15:51:31 Telcontar mtp-probe - - - checking bus 1, device 4: "/sys/devices/pci0000:00/0000:00:1a.7/usb1/1-6"
<1.6> 2014-08-11 15:51:31 Telcontar mtp-probe - - - bus: 1, device: 4 was not an MTP device
<0.6> 2014-08-11 15:51:31 Telcontar kernel - - - [75284.667399] usb-storage 1-6:1.0: USB Mass Storage device detected
<0.6> 2014-08-11 15:51:31 Telcontar kernel - - - [75284.667502] scsi12 : usb-storage 1-6:1.0
<0.6> 2014-08-11 15:51:31 Telcontar kernel - - - [75284.667606] usbcore: registered new interface driver usb-storage
<0.5> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.794904] scsi 12:0:0:0: Direct-Access JetFlash Transcend 4GB 1100 PQ: 0 ANSI: 4
<0.6> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.794976] scsi 12:0:0:0: alua: supports implicit and explicit TPGS
<0.6> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.796262] scsi 12:0:0:0: alua: No target port descriptors found
<0.3> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.796265] scsi 12:0:0:0: alua: not attached
<0.5> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.796396] sd 12:0:0:0: Attached scsi generic sg6 type 0
<0.5> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.796888] sd 12:0:0:0: [sdf] 7913472 512-byte logical blocks: (4.05 GB/3.77 GiB)
<0.5> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.797634] sd 12:0:0:0: [sdf] Write Protect is off
<0.7> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.797637] sd 12:0:0:0: [sdf] Mode Sense: 43 00 00 00
<0.3> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.798386] sd 12:0:0:0: [sdf] No Caching mode page found
<0.3> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.798388] sd 12:0:0:0: [sdf] Assuming drive cache: write through
<0.3> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.801508] sd 12:0:0:0: [sdf] No Caching mode page found
<0.3> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.801511] sd 12:0:0:0: [sdf] Assuming drive cache: write through
<0.6> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.802147] sdf: sdf1 sdf2 sdf3
<0.3> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.805642] sd 12:0:0:0: [sdf] No Caching mode page found
<0.3> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.805645] sd 12:0:0:0: [sdf] Assuming drive cache: write through
<0.5> 2014-08-11 15:51:32 Telcontar kernel - - - [75285.805648] sd 12:0:0:0: [sdf] Attached SCSI removable disk
<0.4> 2014-08-11 15:51:36 Telcontar kernel - - - [75289.696019] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:51:41 Telcontar systemd 1 - - Starting Getty on tty2...
<3.6> 2014-08-11 15:51:41 Telcontar systemd 1 - - Started Getty on tty2.
<3.6> 2014-08-11 15:51:42 Telcontar systemd 1 - - Starting Getty on tty3...
<3.6> 2014-08-11 15:51:42 Telcontar systemd 1 - - Started Getty on tty3.
<3.6> 2014-08-11 15:51:43 Telcontar systemd 1 - - Starting Getty on tty6...
<3.6> 2014-08-11 15:51:43 Telcontar systemd 1 - - Started Getty on tty6.
<3.6> 2014-08-11 15:51:44 Telcontar systemd 1 - - Starting Getty on tty5...
<3.6> 2014-08-11 15:51:44 Telcontar systemd 1 - - Started Getty on tty5.
<3.6> 2014-08-11 15:51:45 Telcontar systemd 1 - - Starting Getty on tty4...
<3.6> 2014-08-11 15:51:45 Telcontar systemd 1 - - Started Getty on tty4.
<0.4> 2014-08-11 15:52:06 Telcontar kernel - - - [75319.776023] XFS (sdd5): xfs_log_force: error 5 returned.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Unmounting /data/raid...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Unmounting /data/cripta...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping /sys/devices/virtual/block/dm-0.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - message repeated 5 times: [ Stopping /sys/devices/virtual/block/dm-0.]
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping Session 574 of user lightdm.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopped Session 574 of user lightdm.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping Session 7 of user root.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopped Session 7 of user root.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping user-0.slice.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Removed slice user-0.slice.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping Stop Read-Ahead Data Collection 10s After Completed Startup.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopped Stop Read-Ahead Data Collection 10s After Completed Startup.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping User Manager for 1000...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping User Manager for 9...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping User Manager for 127...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping CUPS Printing Service...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping ifup managed network interface eth1...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping ifup managed network interface eth0...
<3.6> 2014-08-11 15:17:44 Telcontar systemd 4377 - - message repeated 14 times: [ Time has been changed]
<3.3> 2014-08-11 15:52:08 Telcontar systemd 4377 - - Failed to enqueue exit.target job: Unit exit.target failed to load: Input/output error.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping Graphical Interface.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopped target Graphical Interface.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping LSB: X Display Manager...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping helloworld.service...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopped helloworld.service.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping Multi-User System.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopped target Multi-User System.
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping LSB: virus scanner daemon...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping LSB: Start the hddtemp daemon...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping LSB: mdadmd daemon monitoring MD devices...
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping LSB: This services starts and stops the USB Arbitrator....
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping LSB: Supports the direct execution of binary formats....
<3.6> 2014-08-11 15:52:08 Telcontar systemd 1 - - Stopping LSB: irqbalance daemon providing irq balancing on MP-machines...
<3.6> 2014-08-11 15:52:09 Telcontar systemd 1 - - Stopping LSB: Set up analog joysticks...
<0.4> 2014-08-11 15:52:10 Telcontar kernel - - - [75323.547122] nfsd: last server has exited, flushing export cache
<5.6> 2014-08-11 15:36:02 Telcontar rsyslogd - - - [origin software="rsyslogd" swVersion="7.4.7" x-pid="1081" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
<5.6> 2014-08-11 15:52:11 Telcontar rsyslogd - - - [origin software="rsyslogd" swVersion="7.4.7" x-pid="1081" x-info="http://www.rsyslog.com"] exiting on signal 15.
2014-08-11 15:52:12+02:00 - Halting the system now =========================================== uptime: 15:52pm up 1 day 20:54, 1 user, load average: 5.94, 2.47, 1.22
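For anyone following the log: the "error 5" in the repeated xfs_log_force lines above is an errno value. A minimal Python sketch to decode it, assuming a Linux host (the helper name describe_errno is made up for illustration and is not part of any XFS tool):

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for a numeric errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


# 5 is the value printed by "xfs_log_force: error 5 returned."
print(describe_errno(5))
```

On Linux this reports EIO, which matches the "Input/output error" that systemd logs when it fails to load exit.target.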
Post by Mark Tinguely
I am interested in the metadata dump.
Ok, sure, no problem. I'm working on that, but I need to have lunch first ;-)
Post by Mark Tinguely
Also, someone hit back-to-back duplicate block allocation
XFS_WANT_CORRUPTED_GOTO bugs, you may want to do a metadata dump before
and after the xfs_repair in case you hit it again soon.
I already have a metadata dump, and I have not attempted to repair yet (I'm
doing a full dd copy of the partition, and it is 400 Gigs). I will obtain
another metadata dump after repair, and I can upload both to google drive.
But first I need sustenance :-)
(At least this time I do not have any pressing thing to do on the
computer...)
- -- Cheers
Carlos E. R.
(from 13.1 x86_64 "Bottle" (Minas Tirith))
_______________________________________________
xfs mailing list
http://oss.sgi.com/mailman/listinfo/xfs
Carlos E. R.
2014-08-11 17:08:53 UTC
Permalink
Post by Brian Foster
This reminds me that it might be interesting to tune the eofblocks
scanner to be more aggressive and see if that helps reproduce. This
thread that's running here normally runs every 5 minutes by default, but
it can be tuned to run at a user-defined interval via the following
# cat /proc/sys/fs/xfs/speculative_prealloc_lifetime
300
I wonder if setting it to 30s or so ('echo 30 > /proc/...') and running
some hibernation cycles would help...
Ok, I can try that.
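
A minimal Python sketch of the tuning Brian describes, for anyone who wants to script it. The procfs path is the one quoted above; the function names and the "positive integer" check are my own assumptions, and writing the real file needs root and a loaded xfs module:

```python
from pathlib import Path

# Path quoted in Brian's message; it only exists when the xfs module is loaded.
TUNABLE = Path("/proc/sys/fs/xfs/speculative_prealloc_lifetime")


def get_lifetime(path: Path = TUNABLE) -> int:
    """Read the current eofblocks scan interval, in seconds."""
    return int(path.read_text().strip())


def set_lifetime(seconds: int, path: Path = TUNABLE) -> int:
    """Write a new interval and return the previous one, so it can be restored."""
    if seconds <= 0:
        # Assumption: only positive intervals make sense here.
        raise ValueError("interval must be a positive number of seconds")
    previous = get_lifetime(path)
    path.write_text(f"{seconds}\n")
    return previous


if TUNABLE.exists():
    # Read-only demo; call set_lifetime(30) as root to apply the change.
    print("current interval:", get_lifetime(), "seconds")
```

Keeping the previous value makes it easy to put the default (300 in the output above) back after the hibernation tests.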

- --
Cheers
Carlos E. R.

(from 13.1 x86_64 "Bottle" (Minas Tirith))
Mark Tinguely
2014-08-11 21:27:12 UTC
Permalink
Post by Mark Tinguely
Where in the filesystem did the XFS_WANT_CORRUPTED_GOTO happen?
This time?
Did not look at the log yet. Let me see...
Here is the full log of the event. It starts prior to hibernating, all
things nominal. And ends on shutdown (had to hit reset button, despite
what log says). If you want to see entries prior to that, since boot, I
can do that.
...

so XFS gave a forced shutdown after the machine came back from
hibernation. After replaying the log, there were no errors in xfs_repair.

We should have quiesced the metadata/log before freezing xfs. Was there
a lot of items in the log?

--Mark.
Carlos E. R.
2014-08-11 21:50:20 UTC
Permalink
so XFS gave a forced shutdown after the machine came back from hibernation.
After replaying the log, there were no errors in xfs_repair.
We should have quiesced the metadata/log before freezing xfs. Was there a lot
of items in the log?
Sorry, what log? The /var/log/messages file? I posted it in full, from
before the hibernation to powerdown.

- --
Cheers
Carlos E. R.

(from 13.1 x86_64 "Bottle" (Minas Tirith))
Mark Tinguely
2014-08-11 21:56:57 UTC
Permalink
Post by Mark Tinguely
so XFS gave a forced shutdown after the machine came back from
hibernation. After replaying the log, there were no errors in xfs_repair.
We should have quiesced the metadata/log before freezing xfs. Was
there a lot of items in the log?
Sorry, what log? The /var/log/messages file? I posted it in full, from
before the hibernation to powerdown.
- -- Cheers
Carlos E. R.
Sorry, I was referring to the XFS log.

If you had a metadata dump before mounting/xfs_repair, then you can
display the xfs log using the xfs_logprint.

--Mark.
Carlos E. R.
2014-08-11 22:36:31 UTC
Permalink
Post by Mark Tinguely
Post by Carlos E. R.
Post by Mark Tinguely
We should have quiesced the metadata/log before freezing xfs. Was
there a lot of items in the log?
Sorry, what log? The /var/log/messages file? I posted it in full, from
before the hibernation to powerdown.
Sorry, I was referring to the XFS log.
If you had a metadata dump before mounting/xfs_repair, then you can display
the xfs log using the xfs_logprint.
Ah! Ok :-)



Telcontar:/data/storage_d/xfs_disaster_home/20140811 # xfs_logprint -f tgtfile_20140811
xfs_logprint:
data device: 0xffffffffffffffff
log device: 0xffffffffffffffff daddr: 0 length: 820476

cycle: 3 version: 2 lsn: 3,65730 tail_lsn: 3,65561
length of Log Record: 1024 prev offset: 65667 num ops: 10
uuid: 3a35756d-1b63-4b9b-9b3a-c12c8951b678 format: little endian linux
h_size: 32768
----------------------------------------------------------------------------
Oper (0): tid: b80486ac len: 0 clientid: TRANS flags: START
----------------------------------------------------------------------------
Oper (1): tid: b80486ac len: 16 clientid: TRANS flags: none
TRAN: type: CHECKPOINT tid: b80486ac num_items: 7
----------------------------------------------------------------------------
Oper (2): tid: b80486ac len: 56 clientid: TRANS flags: none
INODE: #regs: 3 ino: 0x20fcbc83 flags: 0x5 dsize: 96
blkno: 264281664 len: 16 boff: 768
Oper (3): tid: b80486ac len: 96 clientid: TRANS flags: none
INODE CORE
magic 0x494e mode 0100644 version 2 format 2
nlink 1 uid 1000 gid 100
atime 0x53b9d2fd mtime 0x53c05b34 ctime 0x53c05b34
size 0x90f8 nblocks 0xa extsize 0x0 nextents 0x6
naextents 0x0 forkoff 0 dmevmask 0x0 dmstate 0x0
flags 0x0 gen 0xd2610167
Oper (4): tid: b80486ac len: 96 clientid: TRANS flags: none
EXTENTS inode data
----------------------------------------------------------------------------
Oper (5): tid: b80486ac len: 56 clientid: TRANS flags: none
INODE: #regs: 2 ino: 0x6048329c flags: 0x1 dsize: 0
blkno: 770365760 len: 16 boff: 7168
Oper (6): tid: b80486ac len: 96 clientid: TRANS flags: none
INODE CORE
magic 0x494e mode 0100600 version 2 format 2
nlink 1 uid 1000 gid 100
atime 0x53c05b25 mtime 0x53c05b34 ctime 0x53c05b34
size 0x0 nblocks 0x0 extsize 0x0 nextents 0x0
naextents 0x0 forkoff 0 dmevmask 0x0 dmstate 0x0
flags 0x0 gen 0x69e7b261
----------------------------------------------------------------------------
Oper (7): tid: b80486ac len: 56 clientid: TRANS flags: none
INODE: #regs: 2 ino: 0x600814ef flags: 0x1 dsize: 0
blkno: 768264816 len: 16 boff: 3840
Oper (8): tid: b80486ac len: 96 clientid: TRANS flags: none
INODE CORE
magic 0x494e mode 0100600 version 2 format 2
nlink 1 uid 1000 gid 100
atime 0x53b6ef00 mtime 0x53c05b34 ctime 0x53c05b34
size 0x1 nblocks 0x1 extsize 0x0 nextents 0x1
naextents 0x0 forkoff 0 dmevmask 0x0 dmstate 0x0
flags 0x0 gen 0x97ccc0ee
----------------------------------------------------------------------------
Oper (9): tid: b80486ac len: 0 clientid: TRANS flags: COMMIT

============================================================================
cycle: 3 version: 2 lsn: 3,65733 tail_lsn: 3,65561
length of Log Record: 32256 prev offset: 65730 num ops: 176
uuid: 3a35756d-1b63-4b9b-9b3a-c12c8951b678 format: little endian linux
h_size: 32768
**********************************************************************
* ERROR: data block=379316 *
**********************************************************************
Bad data in log
Telcontar:/data/storage_d/xfs_disaster_home/20140811 #



But I have no idea what any of that means.
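
Some of the INODE CORE fields in the output above can at least be made human-readable. A small Python sketch, assuming that "mode" is the usual octal st_mode and that atime/mtime are hexadecimal seconds since the Unix epoch (that is my reading of the output, not something confirmed in this thread); decode_inode_core is a made-up helper name:

```python
import stat
from datetime import datetime, timezone


def decode_inode_core(mode_octal: str, atime_hex: str, mtime_hex: str) -> dict:
    """Turn raw xfs_logprint inode core fields into readable values."""
    mode = int(mode_octal, 8)
    return {
        "perms": stat.filemode(mode),
        "atime": datetime.fromtimestamp(int(atime_hex, 16), tz=timezone.utc),
        "mtime": datetime.fromtimestamp(int(mtime_hex, 16), tz=timezone.utc),
    }


# Values copied from Oper (3) in the output above.
core = decode_inode_core("0100644", "0x53b9d2fd", "0x53c05b34")
print(core["perms"], core["atime"].isoformat(), core["mtime"].isoformat())
```

Under those assumptions, mode 0100644 is a regular file with rw-r--r-- permissions, and the timestamps give an idea of when the logged inodes were last touched.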


Notice that the metadata was obtained using tools version 3.1.11, but the
print above was made using tools version 3.2.1 - in case that has any
relevance.



And, same operation on the metadata obtained after running repairs:



Telcontar:/data/storage_d/xfs_disaster_home/20140811 # xfs_logprint -f tgtfile_20140811_after_repair
xfs_logprint:
data device: 0xffffffffffffffff
log device: 0xffffffffffffffff daddr: 0 length: 820428

Log inconsistent or not a log (last==0, first!=1)
xfs_logprint: after 7 zeroed blocks
**********************************************************************
* ERROR: found data after zeroed blocks block=13 *
**********************************************************************
Bad log - data after zeroed blocks
Telcontar:/data/storage_d/xfs_disaster_home/20140811 # xfs_logprint -f tgtfile_20140811_after_repair_bis
xfs_logprint:
data device: 0xffffffffffffffff
log device: 0xffffffffffffffff daddr: 0 length: 820428

Log inconsistent or not a log (last==0, first!=1)
xfs_logprint: after 7 zeroed blocks
**********************************************************************
* ERROR: found data after zeroed blocks block=13 *
**********************************************************************
Bad log - data after zeroed blocks
Telcontar:/data/storage_d/xfs_disaster_home/20140811 #




Telcontar:/data/storage_d/xfs_disaster_home/20140811 # file tgtfile_20140811*
tgtfile_20140811: XFS filesystem metadump image
tgtfile_20140811_after_repair: XFS filesystem metadump image
tgtfile_20140811_after_repair_bis: XFS filesystem metadump image
tgtfile_20140811_obfus: XFS filesystem metadump image
tgtfile_20140811_obfus_after_repair: XFS filesystem metadump image
tgtfile_20140811_obfus_after_repair_bis: XFS filesystem metadump image
Telcontar:/data/storage_d/xfs_disaster_home/20140811 #


tgtfile_20140811 is the metadata obtained before any mount or
repairs, using tools 3.1.11.

tgtfile_20140811_after_repair is the metadata obtained after
mount and repair, using tools 3.1.11.

tgtfile_20140811_after_repair_bis is the metadata obtained after
mount and repair, using tools 3.2.1



I will now attempt to upload the three obfuscated files. Sizes are quite
different, after compression:

Telcontar:/data/storage_d/xfs_disaster_home/20140811/tmp # ls -lh
total 51M
26M Aug 11 16:21 tgtfile_20140811_obfus.xz
13M Aug 11 18:54 tgtfile_20140811_obfus_after_repair.xz
13M Aug 11 23:16 tgtfile_20140811_obfus_after_repair_bis.xz

but all of them are about 401M before compression. The upload will take
long, my ADSL upload is 0.3M/s at most.


- --
Cheers
Carlos E. R.

(from 13.1 x86_64 "Bottle" (Minas Tirith))
Carlos E. R.
2014-08-12 00:17:00 UTC
Permalink
Post by Carlos E. R.
but all of them are about 401M before compression. The upload will take
long, my ADSL upload is 0.3M/s at most.
I have shared (view) on google drive a folder with the three files. Both
Brian Foster and Mark Tinguely should have got a link on the mail from me.
If somebody else wants access, just tell me.

- --
Cheers
Carlos E. R.

(from 13.1 x86_64 "Bottle" (Minas Tirith))
Brian Foster
2014-08-12 16:51:43 UTC
Permalink
Post by Carlos E. R.
but all of them are about 401M before compression. The upload will take
long, my ADSL upload is 0.3M/s at most.
I have shared (view) on google drive a folder with the three files. Both
Brian Foster and Mark Tinguely should have got a link on the mail from me.
If somebody else wants access, just tell me.
I see the same thing from repair that was in your repair output:

block (1,12608397-12608397) multiply claimed by cnt space tree, state - 2

If I take a look at the btrees as is, I see "235:[12608397,10]" included
in the bnobt (fsb 0x200aa55) and "270:[12608397,10]" in the cntbt (fsb
0x2000781). If I skip the mount, zero the log and repair, everything
seems Ok. I can allocate the remainder of available space and rm -rf
everything in the fs without an error.

Once I replay the log, I see "272:[12608397,10] 273:[12608397,10]" in
the cntbt, which is clearly a duplicate entry. This is what repair
detects and cleans up and seems to lead to the shutdown. E.g., if I
mount and use the fs, I can hit an assert or failure just by attempting
to allocate the rest of the space in the fs. If that is the state of the
fs on disk, it's only a matter of time before we explode due to allocating and
freeing that range of space or possibly attempting to allocate that
space twice.
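
The duplicate check described above can be sketched in a few lines of Python. This assumes the records are given as text in the "index:[startblock,blockcount]" form quoted in this message; the function names are illustrative only and this is not how xfs_repair itself is implemented:

```python
import re
from collections import Counter

# Matches records such as "272:[12608397,10]".
RECORD = re.compile(r"(\d+):\[(\d+),(\d+)\]")


def parse_records(text: str) -> list:
    """Return (index, startblock, blockcount) tuples found in the text."""
    return [(int(i), int(s), int(c)) for i, s, c in RECORD.findall(text)]


def duplicate_extents(records: list) -> list:
    """Return every (startblock, blockcount) extent that appears more than once."""
    counts = Counter((start, count) for _, start, count in records)
    return [extent for extent, n in counts.items() if n > 1]


before_replay = parse_records("270:[12608397,10]")
after_replay = parse_records("272:[12608397,10] 273:[12608397,10]")
print(duplicate_extents(before_replay))  # no duplicates
print(duplicate_extents(after_replay))   # the doubled free-space record
```

A free-space btree should hold each free extent exactly once, so any extent reported here is the kind of entry that repair flags as "multiply claimed".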

Mark mentioned that he didn't see the superblock item in the log with
regard to the freeze. I don't see that either... which perhaps suggests
that this all happens during the wake-from-hibernate sequence..? My
understanding is that we should freeze on hibernate, thus force
everything out to the log, write an unmount record and then dirty the
log with a superblock transaction. Therefore, that should be the only
item in the log post-freeze. Here, we have various items in the log
including several logged buffers that correspond to the cntbt block that
ends up corrupted (daddr 0xf427c08).

Given the failure occurs on freeing an extent via the xfs_eofblocks
scanner, perhaps this extent was initially allocated as speculative
preallocation and the eofblocks scanner is where we happen to first
identify the corrupted cntbt. What is strange is that, as mentioned
previously, the space appears to be free if I zero the log, so that
means it was probably free before the freeze. It seems highly unlikely
for a file to gain preallocation, be written out and then get trimmed by
the scanner all on wake-from-hibernate.

Carlos,

How long after hibernate does the shutdown/crash typically occur? Do you
basically wake-up and within a few seconds the filesystem crashes, or is
it some time (minutes) later?

If the former, I wonder if it's possible that the scanner returns to
life pointing to a stale or freed incore inode and does something bogus
based on that.

Brian
- -- Cheers
Carlos E. R.
(from 13.1 x86_64 "Bottle" (Minas Tirith))
Carlos E. R.
2014-08-12 21:17:36 UTC
Permalink
Post by Carlos E. R.
block (1,12608397-12608397) multiply claimed by cnt space tree, state - 2
Is it possible to find out what file uses that block?
I have a non-obfuscated copy of the metadata. Knowing the file, we can
know what application is involved - and that might help, or perhaps not.
Post by Carlos E. R.
If I take a look at the btrees as is, I see "235:[12608397,10]" included
in the bnobt (fsb 0x200aa55) and "270:[12608397,10]" in the cntbt (fsb
0x2000781). If I skip the mount, zero the log and repair, everything
seems Ok. I can allocate the remainder of available space and rm -rf
everything in the fs without an error.
Once I replay the log, I see "272:[12608397,10] 273:[12608397,10]" in
the cntbt, which is clearly a duplicate entry. This is what repair
detects and cleans up and seems to lead to the shutdown. E.g., if I
mount and use the fs, I can hit an assert or failure just by attempting
to allocate the rest of the space in the fs. If that is the state of the
fs on disk, it's only a matter of time before we explode due to allocating and
freeing that range of space or possibly attempting to allocate that
space twice.
I'm not sure if I follow you.

The sequence of events here is:

a) hibernate
b) thaw
c) immediately, in memory corruption found and kernel error message.
Filesystem is switched to read only.
System is unstable, has to be halted or rebooted.
Umount is impossible.

d) (¬) Reboot
e) Mount (¬), manual umount, xfs_repair (¬), mount
(photos of metadata taken at the appropriate points (marked with ¬))


This is the point I'm at now. Are you saying that the filesystem can explode
at any time now? I have not written any files, beyond what the desktop
does automatically.



What I have not done (on your request), this time, is:

f) backup, format, restore.
Post by Carlos E. R.
Mark mentioned that he didn't see the superblock item in the log with
regard to the freeze. I don't see that either... which perhaps suggests
that this all happens during the wake-from-hibernate sequence..? My
understanding is that we should freeze on hibernate, thus force
everything out to the log, write an unmount record and then dirty the
log with a superblock transaction. Therefore, that should be the only
item in the log post-freeze. Here, we have various items in the log
including several logged buffers that correspond to the cntbt block that
ends up corrupted (daddr 0xf427c08).
Given the failure occurs on freeing an extent via the xfs_eofblocks
scanner, perhaps this extent was initially allocated as speculative
preallocation and the eofblocks scanner is where we happen to first
identify the corrupted cntbt. What is strange is that, as mentioned
previously, the space appears to be free if I zero the log, so that
means it was probably free before the freeze. It seems highly unlikely
for a file to gain preallocation, be written out and then get trimmed by
the scanner all on wake-from-hibernate.
Well, I understand little of that, but if you do, and can do whatever
modifications need to be done to the code, that's fine with me :-)
Post by Carlos E. R.
Carlos,
How long after hibernate does the shutdown/crash typically occur? Do you
basically wake-up and within a few seconds the filesystem crashes, or is
it some time (minutes) later?
Instantly during the wake-up (thaw), according to the log.

I'm typically not present when it happens: my routine is switch on the
computer, then go make coffee/tea, and then return and start using the
machine. It takes a minute or two to wake up from hibernation, and then
the machine is sluggish for a minute or two more while processes start
doing things and claiming chunks from swap, mail is fetched, etc.

And instead of starting work, I find the machine in a bad state.


Look, here is an excerpt from the last event (the full log is in another post
from yesterday), taken from another log file with finer-grained timestamps:


<30>1 2014-08-11T05:22:25.861413+02:00 Telcontar ntp 5867 - - Shutting down network time protocol daemon (NTPD)..done
<30>1 2014-08-11T05:22:25.917520+02:00 Telcontar systemd 1 - - Stopped LSB: Network time protocol daemon (ntpd).
<28>1 2014-08-11T05:22:25.977431+02:00 Telcontar pm-utils - - - Hibernating (95)...
<7>1 2014-08-11T05:22:30.605714+02:00 Telcontar kernel - - - [73220.857511] PM: Marking nosave pages: [mem 0x0009f000-0x000fffff]
<7>1 2014-08-11T05:22:30.605728+02:00 Telcontar kernel - - - [73220.857516] PM: Marking nosave pages: [mem 0xbff90000-0xffffffff]
<7>1 2014-08-11T05:22:30.605729+02:00 Telcontar kernel - - - [73220.858132] PM: Basic memory bitmaps created
<4>1 2014-08-11T15:17:18.911655+02:00 Telcontar kernel - - - [73221.946553] Syncing filesystems ... done.
<4>1 2014-08-11T15:17:18.911744+02:00 Telcontar kernel - - - [73222.682396] Freezing user space processes ... (elapsed 0.002 seconds) done.
<6>1 2014-08-11T15:17:18.911746+02:00 Telcontar kernel - - - [73222.685031] PM: Preallocating image memory... done (allocated 1140745 pages)


The "Hibernating (95)" line is written by a script of mine,
"/etc/pm/sleep.d/95cosas", whose main purpose is to write that line to the
log.

Then the machine wakes up, hours later, although the timestamps on these
lines do not show it (the time jump appears in the lines above instead):


<6>1 2014-08-11T15:17:18.911768+02:00 Telcontar kernel - - - [73228.307358] CPU3 is up
<6>1 2014-08-11T15:17:18.911769+02:00 Telcontar kernel - - - [73228.335219] PM: noirq restore of devices complete after 22.779 msecs
<6>1 2014-08-11T15:17:18.911770+02:00 Telcontar kernel - - - [73228.335354] PM: early restore of devices complete after 0.110 msecs
<7>1 2014-08-11T15:17:18.911771+02:00 Telcontar kernel - - - [73228.508789] uhci_hcd 0000:00:1a.0: setting latency timer to 64
<4>1 2014-08-11T15:17:18.911771+02:00 Telcontar kernel - - - [73228.508809] usb usb3: root hub lost power or was reset

...


<6>1 2014-08-11T15:17:18.911838+02:00 Telcontar kernel - - - [73230.798419] r8169 0000:06:00.0 eth0: link up
<6>1 2014-08-11T15:17:18.911839+02:00 Telcontar kernel - - - [73231.245103] PM: restore of devices complete after 2736.365 msecs
<4>1 2014-08-11T15:17:18.911839+02:00 Telcontar kernel - - - [73231.514298] Restarting kernel threads ... done.
<4>1 2014-08-11T15:17:18.911842+02:00 Telcontar kernel - - - [73231.518736] Restarting tasks ... done.
<7>1 2014-08-11T15:17:18.911843+02:00 Telcontar kernel - - - [73231.562307] PM: Basic memory bitmaps freed
<28>1 2014-08-11T15:17:19.946945+02:00 Telcontar rtkit-daemon 4535 - - The canary thread is apparently starving. Taking action.
<30>1 2014-08-11T15:17:19.947259+02:00 Telcontar rtkit-daemon 4535 - - Demoting known real-time threads.
<29>1 2014-08-11T15:17:19.951276+02:00 Telcontar rtkit-daemon 4535 - - Successfully demoted thread 4541 of process 4534 (/usr/bin/pulseaudio).
<29>1 2014-08-11T15:17:19.951546+02:00 Telcontar rtkit-daemon 4535 - - Successfully demoted thread 4540 of process 4534 (/usr/bin/pulseaudio).
<29>1 2014-08-11T15:17:19.951799+02:00 Telcontar rtkit-daemon 4535 - - Successfully demoted thread 4534 of process 4534 (/usr/bin/pulseaudio).
<29>1 2014-08-11T15:17:19.952033+02:00 Telcontar rtkit-daemon 4535 - - Demoted 3 threads.
<20>1 2014-08-11T15:17:20.808125+02:00 Telcontar dovecot - - - imap: Warning: Time jumped forwards 33996 seconds
<20>1 2014-08-11T15:17:20.840771+02:00 Telcontar dovecot - - - imap: Warning: Time jumped forwards 35660 seconds
<22>1 2014-08-11T15:17:20.841006+02:00 Telcontar dovecot - - - imap(cer): Disconnected for inactivity in=237010 out=9273919
<1>1 2014-08-11T15:17:22.173611+02:00 Telcontar kernel - - - [73235.439809] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c39fe9
<1>1 2014-08-11T15:17:22.173625+02:00 Telcontar kernel - - - [73235.439809]


...


<5>1 2014-08-11T15:17:22.174493+02:00 Telcontar kernel - - - [73235.440751] XFS (sdd5): xfs_do_force_shutdown(0x8) called from line 916 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_bmap.c. Return address = 0xffffffffa0c4c3d8
<1>1 2014-08-11T15:17:22.232589+02:00 Telcontar kernel - - - [73235.498979] XFS (sdd5): Corruption of in-memory data detected. Shutting down filesystem
<1>1 2014-08-11T15:17:22.232594+02:00 Telcontar kernel - - - [73235.499136] XFS (sdd5): Please umount the filesystem and rectify the problem(s)
<30>1 2014-08-11T15:17:22.716184+02:00 Telcontar systemd 1 - - Time has been changed
<30>1 2014-08-11T15:17:27.171188+02:00 Telcontar acpid - - - 1 client rule loaded
<28>1 2014-08-11T15:17:29.413944+02:00 Telcontar pm-utils - - - Thawing (95)...
<29>1 2014-08-11T15:17:30.048264+02:00 Telcontar dbus 1020 - - [system] Activating service name='org.freedesktop.PackageKit' (using servicehelper)
<30>1 2014-08-11T15:17:30.833496+02:00 Telcontar systemd 1 - - Starting LSB: Network time protocol daemon (ntpd)...
<4>1 2014-08-11T15:17:30.990470+02:00 Telcontar kernel - - - [73244.256012] XFS (sdd5): xfs_log_force: error 5 returned.
<29>1 2014-08-11T15:17:31.324585+02:00 Telcontar dbus 1020 - - [system] Activated service 'org.freedesktop.PackageKit' failed: Cannot launch daemon, file not found or permissions invalid


As you see, the corruption is detected instantly after waking up, before
pm-utils scripts have a chance to run.
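
The timing can be read straight from the excerpt. A rough Python sketch, assuming the log format shown above (ISO 8601 wall-clock stamp in the second field, kernel uptime in square brackets); the helper names are made up, and the XFS line is truncated here for brevity:

```python
import re
from datetime import datetime

KERNEL_STAMP = re.compile(r"kernel - - - \[\s*(\d+\.\d+)\]")


def wall_clock(line: str) -> datetime:
    """Parse the ISO 8601 timestamp in the second whitespace-separated field."""
    return datetime.fromisoformat(line.split()[1])


def kernel_uptime(line: str) -> float:
    """Extract the kernel's own uptime stamp, e.g. [73235.439809]."""
    match = KERNEL_STAMP.search(line)
    if match is None:
        raise ValueError("not a kernel log line")
    return float(match.group(1))


restart = "<4>1 2014-08-11T15:17:18.911842+02:00 Telcontar kernel - - - [73231.518736] Restarting tasks ... done."
xfs_err = "<1>1 2014-08-11T15:17:22.173611+02:00 Telcontar kernel - - - [73235.439809] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602"
thawing = "<28>1 2014-08-11T15:17:29.413944+02:00 Telcontar pm-utils - - - Thawing (95)..."

# Kernel lines are compared by uptime, since their wall-clock stamps
# were all written in one batch at resume.
tasks_to_error = kernel_uptime(xfs_err) - kernel_uptime(restart)
error_to_thaw = (wall_clock(thawing) - wall_clock(xfs_err)).total_seconds()

print(f"tasks restarted -> XFS error: {tasks_to_error:.3f} s")
print(f"XFS error -> pm-utils thaw:   {error_to_thaw:.3f} s")
```

By these stamps the XFS error comes about four seconds after tasks are restarted, and roughly seven seconds before the pm-utils "Thawing" line.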
Post by Carlos E. R.
If the former, I wonder if it's possible that the scanner returns to
life pointing to a stale or freed incore inode and does something bogus
based on that.
Well, as I said, that's above my understanding ;-)



- --
Cheers,
Carlos E. R.
(from 13.1 x86_64 "Bottle" at Telcontar)
Brian Foster
2014-08-13 12:04:51 UTC
Permalink
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Post by Carlos E. R.
block (1,12608397-12608397) multiply claimed by cnt space tree, state - 2
Is it possible to find out what file uses that block?
I have a non-obfuscated copy of the metadata. Knowing the file, we can know
what application is involved - and that might help, or perhaps not.
I don't see how given the current situation. The space appears to be
free initially, so zeroing the log contents on repair puts the fs in a
state where the space is not allocated to any particular file. Perhaps
there is some incremental state created by the log that can provide this
information (e.g., space is free, space is preallocated, extent is
converted, eofblocks are trimmed all in a single checkpoint), but that
could be difficult to trace back since iirc the btree had grown as well.
Post by Carlos E. R.
If I take a look at the btrees as is, I see "235:[12608397,10]" included
in the bnobt (fsb 0x200aa55) and "270:[12608397,10]" in the cntbt (fsb
0x2000781). If I skip the mount, zero the log and repair, everything
seems Ok. I can allocate the remainder of available space and rm -rf
everything in the fs without an error.
Once I replay the log, I see "272:[12608397,10] 273:[12608397,10]" in
the cntbt, which is clearly a duplicate entry. This is what repair
detects and cleans up and seems to lead to the shutdown. E.g., if I
mount and use the fs, I can hit an assert or failure just by attempting
to allocate the rest of the space in the fs. If that is the state of the
fs on disk, it's only a matter of time before we explode due to allocating and
freeing that range of space or possibly attempting to allocate that
space twice.
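
The duplicate described above can be illustrated with a toy check. This is only a sketch in Python and not how xfs_repair works: it models one leaf of the by-size free space btree as a list of (startblock, blockcount) tuples, and the function name `find_duplicates` is made up. The extent [12608397,10] and the record numbers 272 and 273 are from the output quoted above; the neighbouring records are invented filler.

```python
from collections import Counter

def find_duplicates(records):
    """Return every (startblock, blockcount) record that appears more than once."""
    counts = Counter(records)
    return [rec for rec, n in counts.items() if n > 1]

# Toy cntbt leaf after log replay, ordered by blockcount. Only the
# [12608397, 10] pair comes from the thread; the other records are invented.
cntbt_after_replay = [
    (500, 9),            # hypothetical neighbour
    (12608397, 10),      # record 272 in the thread
    (12608397, 10),      # record 273 in the thread: the same extent again
    (7000, 11),          # hypothetical neighbour
]

print(find_duplicates(cntbt_after_replay))
```

A free space tree should list each free extent exactly once, so any non-empty result is the kind of state that repair reports as "multiply claimed".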
I'm not sure if I follow you.
a) hibernate
b) thaw
c) immediately, in memory corruption found and kernel error message.
Filesystem is switched to read only.
System is unstable, has to be halted or rebooted.
Umount is impossible.
Ok, so the crash is fairly immediate after the wake (also according to
the log output below).
d) (¬) Reboot
e) Mount (¬), manual umount, xfs_repair (¬), mount
(photos of metadata taken at the appropriate points (marked with ¬))
This is the point I'm at now. Are you saying that the filesystem can explode at
any time now? I have not written any files, beyond what the desktop does
automatically.
No, the filesystem has been fixed by repair. I'm just saying that
somehow the fs creates a duplicate free space record in one of the free
space trees. That particular condition means it's only a matter of time
before some block allocation operation trips up on that inconsistent
state and shuts down the fs. You happen to hit it immediately due to
that space being involved with speculative preallocation.

The current theory is that this is probably due to XFS workqueues not
being freezable, which lets them make changes on disk after the dump
image is created. This seems logical to me, but I'd still like to see
some kind of verification of the potential fix if possible. I can repeat
some vm hibernate testing with that in mind. Alternatively, would you
have the ability to test a patch? Have you been able to reproduce this
again since the most recent instance?

Brian
f) backup, format, restore.
Post by Carlos E. R.
Mark mentioned that he didn't see the superblock item in the log with
regard to the freeze. I don't see that either... which perhaps suggests
that this all happens during the wake-from-hibernate sequence..? My
understanding is that we should freeze on hibernate, thus force
everything out to the log, write an unmount record and then dirty the
log with a superblock transaction. Therefore, that should be the only
item in the log post-freeze. Here, we have various items in the log
including several logged buffers that correspond to the cntbt block that
ends up corrupted (daddr 0xf427c08).
Given the failure occurs on freeing an extent via the xfs_eofblocks
scanner, perhaps this extent was initially allocated as speculative
preallocation and the eofblocks scanner is where we happen to first
identify the corrupted cntbt. What is strange is that, as mentioned
previously, the space appears to be free if I zero the log, so that
means it was probably free before the freeze. It seems highly unlikely
for a file to gain preallocation, be written out and then get trimmed by
the scanner all on wake-from-hibernate.
Well, I understand little of that, but if you do, and can do whatever
modifications need to be done to the code, that's fine with me :-)
Post by Carlos E. R.
Carlos,
How long after hibernate does the shutdown/crash typically occur? Do you
basically wake-up and within a few seconds the filesystem crashes, or is
it some time (minutes) later?
Instantly during the wake-up (thaw), according to the log.
I'm typically not present when it happens: my routine is switch on the
computer, then go make coffee/tea, and then return and start using the
machine. It takes a minute or two to wake up from hibernation, and then the
machine is sluggish for a minute or two more while processes start doing
things and claiming chunks from swap, mail is fetched, etc.
And instead of starting work, I find the machine in a bad state.
Look, an excerpt from the last event (the full log is in another post):
<30>1 2014-08-11T05:22:25.861413+02:00 Telcontar ntp 5867 - - Shutting down network time protocol daemon (NTPD)..done
<30>1 2014-08-11T05:22:25.917520+02:00 Telcontar systemd 1 - - Stopped LSB: Network time protocol daemon (ntpd).
<28>1 2014-08-11T05:22:25.977431+02:00 Telcontar pm-utils - - - Hibernating (95)...
<7>1 2014-08-11T05:22:30.605714+02:00 Telcontar kernel - - - [73220.857511] PM: Marking nosave pages: [mem 0x0009f000-0x000fffff]
<7>1 2014-08-11T05:22:30.605728+02:00 Telcontar kernel - - - [73220.857516] PM: Marking nosave pages: [mem 0xbff90000-0xffffffff]
<7>1 2014-08-11T05:22:30.605729+02:00 Telcontar kernel - - - [73220.858132] PM: Basic memory bitmaps created
<4>1 2014-08-11T15:17:18.911655+02:00 Telcontar kernel - - - [73221.946553] Syncing filesystems ... done.
<4>1 2014-08-11T15:17:18.911744+02:00 Telcontar kernel - - - [73222.682396] Freezing user space processes ... (elapsed 0.002 seconds) done.
<6>1 2014-08-11T15:17:18.911746+02:00 Telcontar kernel - - - [73222.685031] PM: Preallocating image memory... done (allocated 1140745 pages)
The "Hibernating (95)" line is written by a script of mine,
"/etc/pm/sleep.d/95cosas", whose main purpose is to write that line to
the log.
Then the machine wakes up, hours later - despite the timestamp not saying so
<6>1 2014-08-11T15:17:18.911768+02:00 Telcontar kernel - - - [73228.307358] CPU3 is up
<6>1 2014-08-11T15:17:18.911769+02:00 Telcontar kernel - - - [73228.335219] PM: noirq restore of devices complete after 22.779 msecs
<6>1 2014-08-11T15:17:18.911770+02:00 Telcontar kernel - - - [73228.335354] PM: early restore of devices complete after 0.110 msecs
<7>1 2014-08-11T15:17:18.911771+02:00 Telcontar kernel - - - [73228.508789] uhci_hcd 0000:00:1a.0: setting latency timer to 64
<4>1 2014-08-11T15:17:18.911771+02:00 Telcontar kernel - - - [73228.508809] usb usb3: root hub lost power or was reset
...
<6>1 2014-08-11T15:17:18.911838+02:00 Telcontar kernel - - - [73230.798419] r8169 0000:06:00.0 eth0: link up
<6>1 2014-08-11T15:17:18.911839+02:00 Telcontar kernel - - - [73231.245103] PM: restore of devices complete after 2736.365 msecs
<4>1 2014-08-11T15:17:18.911839+02:00 Telcontar kernel - - - [73231.514298] Restarting kernel threads ... done.
<4>1 2014-08-11T15:17:18.911842+02:00 Telcontar kernel - - - [73231.518736] Restarting tasks ... done.
<7>1 2014-08-11T15:17:18.911843+02:00 Telcontar kernel - - - [73231.562307] PM: Basic memory bitmaps freed
<28>1 2014-08-11T15:17:19.946945+02:00 Telcontar rtkit-daemon 4535 - - The canary thread is apparently starving. Taking action.
<30>1 2014-08-11T15:17:19.947259+02:00 Telcontar rtkit-daemon 4535 - - Demoting known real-time threads.
<29>1 2014-08-11T15:17:19.951276+02:00 Telcontar rtkit-daemon 4535 - - Successfully demoted thread 4541 of process 4534 (/usr/bin/pulseaudio).
<29>1 2014-08-11T15:17:19.951546+02:00 Telcontar rtkit-daemon 4535 - - Successfully demoted thread 4540 of process 4534 (/usr/bin/pulseaudio).
<29>1 2014-08-11T15:17:19.951799+02:00 Telcontar rtkit-daemon 4535 - - Successfully demoted thread 4534 of process 4534 (/usr/bin/pulseaudio).
<29>1 2014-08-11T15:17:19.952033+02:00 Telcontar rtkit-daemon 4535 - - Demoted 3 threads.
<20>1 2014-08-11T15:17:20.808125+02:00 Telcontar dovecot - - - imap: Warning: Time jumped forwards 33996 seconds
<20>1 2014-08-11T15:17:20.840771+02:00 Telcontar dovecot - - - imap: Warning: Time jumped forwards 35660 seconds
<22>1 2014-08-11T15:17:20.841006+02:00 Telcontar dovecot - - - imap(cer): Disconnected for inactivity in=237010 out=9273919
<1>1 2014-08-11T15:17:22.173611+02:00 Telcontar kernel - - - [73235.439809] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c39fe9
<1>1 2014-08-11T15:17:22.173625+02:00 Telcontar kernel - - - [73235.439809]
...
<5>1 2014-08-11T15:17:22.174493+02:00 Telcontar kernel - - - [73235.440751] XFS (sdd5): xfs_do_force_shutdown(0x8) called from line 916 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_bmap.c. Return address = 0xffffffffa0c4c3d8
<1>1 2014-08-11T15:17:22.232589+02:00 Telcontar kernel - - - [73235.498979] XFS (sdd5): Corruption of in-memory data detected. Shutting down filesystem
<1>1 2014-08-11T15:17:22.232594+02:00 Telcontar kernel - - - [73235.499136] XFS (sdd5): Please umount the filesystem and rectify the problem(s)
<30>1 2014-08-11T15:17:22.716184+02:00 Telcontar systemd 1 - - Time has been changed
<30>1 2014-08-11T15:17:27.171188+02:00 Telcontar acpid - - - 1 client rule loaded
<28>1 2014-08-11T15:17:29.413944+02:00 Telcontar pm-utils - - - Thawing (95)...
<29>1 2014-08-11T15:17:30.048264+02:00 Telcontar dbus 1020 - - [system] Activating service name='org.freedesktop.PackageKit' (using servicehelper)
<30>1 2014-08-11T15:17:30.833496+02:00 Telcontar systemd 1 - - Starting LSB: Network time protocol daemon (ntpd)...
<4>1 2014-08-11T15:17:30.990470+02:00 Telcontar kernel - - - [73244.256012] XFS (sdd5): xfs_log_force: error 5 returned.
<29>1 2014-08-11T15:17:31.324585+02:00 Telcontar dbus 1020 - - [system] Activated service 'org.freedesktop.PackageKit' failed: Cannot launch daemon, file not found or permissions invalid
As you see, the corruption is detected instantly after waking up, before
pm-utils scripts have a chance to run.
Post by Carlos E. R.
If the former, I wonder if it's possible that the scanner returns to
life pointing to a stale or freed incore inode and does something bogus
based on that.
Well, as I said, that's above my understanding ;-)
- -- Cheers,
Carlos E. R.
(from 13.1 x86_64 "Bottle" at Telcontar)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iEYEARECAAYFAlPqhHwACgkQtTMYHG2NR9WmrwCglBRRHEMgU9mCEHkU9iHqYehX
+1AAn2oUn8/M3Rfb7mLWapLqYxDfvHNv
=9Yft
-----END PGP SIGNATURE-----
_______________________________________________
xfs mailing list
http://oss.sgi.com/mailman/listinfo/xfs
Mark Tinguely
2014-08-13 13:29:41 UTC
Permalink
Post by Brian Foster
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Post by Carlos E. R.
block (1,12608397-12608397) multiply claimed by cnt space tree, state - 2
Is it possible to find out what file uses that block?
I have a non-obfuscated copy of the metadata. Knowing the file, we can know
what application is involved - and that might help, or perhaps not.
I don't see how given the current situation. The space appears to be
free initially, so zeroing the log contents on repair puts the fs in a
state where the space is not allocated to any particular file. Perhaps
there is some incremental state created by the log that can provide this
information (e.g., space is free, space is preallocated, extent is
converted, eofblocks are trimmed all in a single checkpoint), but that
could be difficult to trace back since iirc the btree had grown as well.
Post by Carlos E. R.
If I take a look at the btrees as is, I see "235:[12608397,10]" included
in the bnobt (fsb 0x200aa55) and "270:[12608397,10]" in the cntbt (fsb
0x2000781). If I skip the mount, zero the log and repair, everything
seems Ok. I can allocate the remainder of available space and rm -rf
everything in the fs without an error.
Once I replay the log, I see "272:[12608397,10] 273:[12608397,10]" in
the cntbt, which is clearly a duplicate entry. This is what repair
detects and cleans up and seems to lead to the shutdown. E.g., if I
mount and use the fs, I can hit an assert or failure just by attempting
to allocate the rest of the space in the fs. If that is the state of the
fs on disk, it's only a matter of time before we explode due to allocating and
freeing that range of space or possibly attempting to allocate that
space twice.
I'm not sure if I follow you.
a) hibernate
b) thaw
c) immediately, in memory corruption found and kernel error message.
Filesystem is switched to read only.
System is unstable, has to be halted or rebooted.
Umount is impossible.
Ok, so the crash is fairly immediate after the wake (also according to
the log output below).
d) (¬) Reboot
e) Mount (¬), manual umount, xfs_repair (¬), mount
(photos of metadata taken at the appropriate points (marked with ¬))
This is the point I'm at now. Are you saying that the filesystem can explode at
any time now? I have not written any files, beyond what the desktop does
automatically.
No, the filesystem has been fixed by repair. I'm just saying that
somehow the fs creates a duplicate free space record in one of the free
space trees. That particular condition means it's only a matter of time
before some block allocation operation trips up on that inconsistent
state and shuts down the fs. You happen to hit it immediately due to
that space being involved with speculative preallocation.
The current theory is that this is probably due to XFS workqueues not
being freezable, which lets them make changes on disk after the dump
image is created. This seems logical to me, but I'd still like to see
some kind of verification of the potential fix if possible. I can repeat
some vm hibernate testing with that in mind. Alternatively, would you
have the ability to test a patch? Have you been able to reproduce this
again since the most recent instance?
Brian
I am still digging through the xfs log:
I do not see anything in that extent range 46162829-46162839 being
freed in the log. (or anything close to it).
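
For anyone matching these numbers against the xfs_repair message: the range 46162829-46162839 appears to be the same extent, [12608397,10] in AG 1, expressed as a filesystem block number. XFS builds that number as (agno << agblklog) | agbno. The Python sketch below assumes agblklog = 25, which is inferred from the numbers in this thread (46162829 - 12608397 = 2^25) and not read from the superblock; the helper name `agb_to_fsb` is made up.

```python
AGBLKLOG = 25  # assumption: inferred from the thread's numbers, not from the superblock

def agb_to_fsb(agno, agbno, agblklog=AGBLKLOG):
    """Combine an AG number and an AG-relative block into a filesystem block number."""
    return (agno << agblklog) | agbno

start = agb_to_fsb(1, 12608397)
end = start + 10  # extent length taken from the "[12608397,10]" record
print(start, end)

# The btree blocks mentioned earlier (fsb 0x200aa55 and 0x2000781) land in
# the same AG under the same assumption.
print(0x200aa55 >> AGBLKLOG, 0x2000781 >> AGBLKLOG)
```

Under that assumption the first line of output reproduces the range given above, and both btree blocks decode to AG 1.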

Late in the log, there is a write (op 27 of tid e9f15120) of a big
portion of the AG1 cnt btree we are interested in. So we know that it is
good at that point.

The next two writes (op 66 of tid 6ed362ea and op 25 of tid
6281c8b) that write entry "8d63c000 a000000" to that block are the
beginning of the 16 byte log write. Depending on the offset, it is
possible that one of these writes could insert a duplicate entry.
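
The two words in that log entry can be checked by hand. On-disk XFS metadata is big-endian, and the log dump appears to print each 32-bit word as read on a little-endian host, so byte-swapping each word should recover the record. A Python sketch under that assumption (the helper name `swab32` is borrowed from kernel naming; nothing here is xfs_logprint code):

```python
import struct

def swab32(word_hex):
    """Byte-swap a 32-bit word given as hex text, as printed in the log dump."""
    value = int(word_hex, 16)
    return struct.unpack("<I", struct.pack(">I", value))[0]

startblock = swab32("8d63c000")
blockcount = swab32("a000000")   # leading zero dropped in the dump: 0x0a000000
print(startblock, blockcount)
```

If the assumption holds, the entry decodes to start block 12608397 and length 10, i.e. the same [12608397,10] record that shows up twice in the cntbt.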

I will chase it further and see where and why this duplicate happens
from a log perspective.

--Mark.
Post by Brian Foster
f) backup, format, restore.
Post by Carlos E. R.
Mark mentioned that he didn't see the superblock item in the log with
regard to the freeze. I don't see that either... which perhaps suggests
that this all happens during the wake-from-hibernate sequence..? My
understanding is that we should freeze on hibernate, thus force
everything out to the log, write an unmount record and then dirty the
log with a superblock transaction. Therefore, that should be the only
item in the log post-freeze. Here, we have various items in the log
including several logged buffers that correspond to the cntbt block that
ends up corrupted (daddr 0xf427c08).
Given the failure occurs on freeing an extent via the xfs_eofblocks
scanner, perhaps this extent was initially allocated as speculative
preallocation and the eofblocks scanner is where we happen to first
identify the corrupted cntbt. What is strange is that, as mentioned
previously, the space appears to be free if I zero the log, so that
means it was probably free before the freeze. It seems highly unlikely
for a file to gain preallocation, be written out and then get trimmed by
the scanner all on wake-from-hibernate.
Well, I understand little of that, but if you do, and can do whatever
modifications need to be done to the code, that's fine with me :-)
Post by Carlos E. R.
Carlos,
How long after hibernate does the shutdown/crash typically occur? Do you
basically wake-up and within a few seconds the filesystem crashes, or is
it some time (minutes) later?
Instantly during the wake-up (thaw), according to the log.
I'm typically not present when it happens: my routine is switch on the
computer, then go make coffee/tea, and then return and start using the
machine. It takes a minute or two to wake up from hibernation, and then the
machine is sluggish for a minute or two more while processes start doing
things and claiming chunks from swap, mail is fetched, etc.
And instead of starting work, I find the machine in a bad state.
Look, an excerpt from the last event (the full log is in another post):
<30>1 2014-08-11T05:22:25.861413+02:00 Telcontar ntp 5867 - - Shutting down network time protocol daemon (NTPD)..done
<30>1 2014-08-11T05:22:25.917520+02:00 Telcontar systemd 1 - - Stopped LSB: Network time protocol daemon (ntpd).
<28>1 2014-08-11T05:22:25.977431+02:00 Telcontar pm-utils - - - Hibernating (95)...
<7>1 2014-08-11T05:22:30.605714+02:00 Telcontar kernel - - - [73220.857511] PM: Marking nosave pages: [mem 0x0009f000-0x000fffff]
<7>1 2014-08-11T05:22:30.605728+02:00 Telcontar kernel - - - [73220.857516] PM: Marking nosave pages: [mem 0xbff90000-0xffffffff]
<7>1 2014-08-11T05:22:30.605729+02:00 Telcontar kernel - - - [73220.858132] PM: Basic memory bitmaps created
<4>1 2014-08-11T15:17:18.911655+02:00 Telcontar kernel - - - [73221.946553] Syncing filesystems ... done.
<4>1 2014-08-11T15:17:18.911744+02:00 Telcontar kernel - - - [73222.682396] Freezing user space processes ... (elapsed 0.002 seconds) done.
<6>1 2014-08-11T15:17:18.911746+02:00 Telcontar kernel - - - [73222.685031] PM: Preallocating image memory... done (allocated 1140745 pages)
The "Hibernating (95)" line is written by a script of mine,
"/etc/pm/sleep.d/95cosas", whose main purpose is to write that line to
the log.
Then the machine wakes up, hours later - despite the timestamp not saying so
<6>1 2014-08-11T15:17:18.911768+02:00 Telcontar kernel - - - [73228.307358] CPU3 is up
<6>1 2014-08-11T15:17:18.911769+02:00 Telcontar kernel - - - [73228.335219] PM: noirq restore of devices complete after 22.779 msecs
<6>1 2014-08-11T15:17:18.911770+02:00 Telcontar kernel - - - [73228.335354] PM: early restore of devices complete after 0.110 msecs
<7>1 2014-08-11T15:17:18.911771+02:00 Telcontar kernel - - - [73228.508789] uhci_hcd 0000:00:1a.0: setting latency timer to 64
<4>1 2014-08-11T15:17:18.911771+02:00 Telcontar kernel - - - [73228.508809] usb usb3: root hub lost power or was reset
...
<6>1 2014-08-11T15:17:18.911838+02:00 Telcontar kernel - - - [73230.798419] r8169 0000:06:00.0 eth0: link up
<6>1 2014-08-11T15:17:18.911839+02:00 Telcontar kernel - - - [73231.245103] PM: restore of devices complete after 2736.365 msecs
<4>1 2014-08-11T15:17:18.911839+02:00 Telcontar kernel - - - [73231.514298] Restarting kernel threads ... done.
<4>1 2014-08-11T15:17:18.911842+02:00 Telcontar kernel - - - [73231.518736] Restarting tasks ... done.
<7>1 2014-08-11T15:17:18.911843+02:00 Telcontar kernel - - - [73231.562307] PM: Basic memory bitmaps freed
<28>1 2014-08-11T15:17:19.946945+02:00 Telcontar rtkit-daemon 4535 - - The canary thread is apparently starving. Taking action.
<30>1 2014-08-11T15:17:19.947259+02:00 Telcontar rtkit-daemon 4535 - - Demoting known real-time threads.
<29>1 2014-08-11T15:17:19.951276+02:00 Telcontar rtkit-daemon 4535 - - Successfully demoted thread 4541 of process 4534 (/usr/bin/pulseaudio).
<29>1 2014-08-11T15:17:19.951546+02:00 Telcontar rtkit-daemon 4535 - - Successfully demoted thread 4540 of process 4534 (/usr/bin/pulseaudio).
<29>1 2014-08-11T15:17:19.951799+02:00 Telcontar rtkit-daemon 4535 - - Successfully demoted thread 4534 of process 4534 (/usr/bin/pulseaudio).
<29>1 2014-08-11T15:17:19.952033+02:00 Telcontar rtkit-daemon 4535 - - Demoted 3 threads.
<20>1 2014-08-11T15:17:20.808125+02:00 Telcontar dovecot - - - imap: Warning: Time jumped forwards 33996 seconds
<20>1 2014-08-11T15:17:20.840771+02:00 Telcontar dovecot - - - imap: Warning: Time jumped forwards 35660 seconds
<22>1 2014-08-11T15:17:20.841006+02:00 Telcontar dovecot - - - imap(cer): Disconnected for inactivity in=237010 out=9273919
<1>1 2014-08-11T15:17:22.173611+02:00 Telcontar kernel - - - [73235.439809] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c39fe9
<1>1 2014-08-11T15:17:22.173625+02:00 Telcontar kernel - - - [73235.439809]
...
<5>1 2014-08-11T15:17:22.174493+02:00 Telcontar kernel - - - [73235.440751] XFS (sdd5): xfs_do_force_shutdown(0x8) called from line 916 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_bmap.c. Return address = 0xffffffffa0c4c3d8
<1>1 2014-08-11T15:17:22.232589+02:00 Telcontar kernel - - - [73235.498979] XFS (sdd5): Corruption of in-memory data detected. Shutting down filesystem
<1>1 2014-08-11T15:17:22.232594+02:00 Telcontar kernel - - - [73235.499136] XFS (sdd5): Please umount the filesystem and rectify the problem(s)
<30>1 2014-08-11T15:17:22.716184+02:00 Telcontar systemd 1 - - Time has been changed
<30>1 2014-08-11T15:17:27.171188+02:00 Telcontar acpid - - - 1 client rule loaded
<28>1 2014-08-11T15:17:29.413944+02:00 Telcontar pm-utils - - - Thawing (95)...
<29>1 2014-08-11T15:17:30.048264+02:00 Telcontar dbus 1020 - - [system] Activating service name='org.freedesktop.PackageKit' (using servicehelper)
<30>1 2014-08-11T15:17:30.833496+02:00 Telcontar systemd 1 - - Starting LSB: Network time protocol daemon (ntpd)...
<4>1 2014-08-11T15:17:30.990470+02:00 Telcontar kernel - - - [73244.256012] XFS (sdd5): xfs_log_force: error 5 returned.
<29>1 2014-08-11T15:17:31.324585+02:00 Telcontar dbus 1020 - - [system] Activated service 'org.freedesktop.PackageKit' failed: Cannot launch daemon, file not found or permissions invalid
As you see, the corruption is detected instantly after waking up, before
pm-utils scripts have a chance to run.
Post by Carlos E. R.
If the former, I wonder if it's possible that the scanner returns to
life pointing to a stale or freed incore inode and does something bogus
based on that.
Well, as I said, that's above my understanding ;-)
- -- Cheers,
Carlos E. R.
(from 13.1 x86_64 "Bottle" at Telcontar)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iEYEARECAAYFAlPqhHwACgkQtTMYHG2NR9WmrwCglBRRHEMgU9mCEHkU9iHqYehX
+1AAn2oUn8/M3Rfb7mLWapLqYxDfvHNv
=9Yft
-----END PGP SIGNATURE-----
Dave Chinner
2014-08-13 21:04:09 UTC
Permalink
Post by Brian Foster
This is the point I'm at now. Are you saying that the filesystem can explode at
any time now? I have not written any files, beyond what the desktop does
automatically.
No, the filesystem has been fixed by repair. I'm just saying that
somehow the fs creates a duplicate free space record in one of the free
space trees.
Simple answer: the block is being freed twice, i.e. once from a workqueue
during the hibernate process after the relevant memory has been
snapshotted (because the workqueue was not frozen), and again after
thaw, when the memory image is restored to RAM, the workqueue is
started up again and it runs the same work a second time.
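
That sequence can be modelled in a few lines. The following is a toy Python simulation, not kernel code: the "disk" is a list of free-extent records, the "memory image" is a dict holding a queue of pending work, and all names are invented for illustration. It shows how work that runs once after the snapshot and once more after resume leaves the same extent in the free list twice.

```python
import copy

def run_pending_work(memory, disk_free_list):
    """Run queued 'free this extent' work items against the on-disk free list."""
    while memory["pending_frees"]:
        extent = memory["pending_frees"].pop(0)
        disk_free_list.append(extent)   # toy model: no check for an existing record

disk_free_list = []
memory = {"pending_frees": [(12608397, 10)]}

# 1. Hibernate: the memory image is snapshotted with the work still queued.
image = copy.deepcopy(memory)

# 2. The workqueue was not frozen, so it runs after the snapshot and
#    changes the disk behind the image's back.
run_pending_work(memory, disk_free_list)

# 3. Resume: the stale image is restored and the same work runs again.
memory = copy.deepcopy(image)
run_pending_work(memory, disk_free_list)

print(disk_free_list)
```

The point of the model is only that the disk and the saved memory image disagree about whether the work has already been done; freezing the workqueue before the snapshot removes step 2.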

Cheers,

Dave.
--
Dave Chinner
***@fromorbit.com
Eric Sandeen
2014-08-12 21:27:58 UTC
Permalink
Post by Carlos E. R.
Post by Carlos E. R.
Post by Carlos E. R.
but all of them are about 401M before compression. The upload will take
long, my ADSL upload is 0.3M/s at most.
I have shared (view) on google drive a folder with the three files. Both
Brian Foster and Mark Tinguely should have got a link on the mail from me.
If somebody else wants access, just tell me.
Post by Carlos E. R.
block (1,12608397-12608397) multiply claimed by cnt space tree, state - 2
If I take a look at the btrees as is, I see "235:[12608397,10]" included
in the bnobt (fsb 0x200aa55) and "270:[12608397,10]" in the cntbt (fsb
0x2000781). If I skip the mount, zero the log and repair, everything
seems Ok. I can allocate the remainder of available space and rm -rf
everything in the fs without an error.
Once I replay the log, I see "272:[12608397,10] 273:[12608397,10]" in
the cntbt, which is clearly a duplicate entry. This is what repair
detects and cleans up and seems to lead to the shutdown. E.g., if I
mount and use the fs, I can hit an assert or failure just by attempting
to allocate the rest of the space in the fs. If that is the state of the
fs on disk, it's only a matter of time before we explode due to allocating and
freeing that range of space or possibly attempting to allocate that
space twice.
Mark mentioned that he didn't see the superblock item in the log with
regard to the freeze. I don't see that either... which perhaps suggests
that this all happens during the wake-from-hibernate sequence..? My
understanding is that we should freeze on hibernate, thus force
everything out to the log, write an unmount record and then dirty the
log with a superblock transaction. Therefore, that should be the only
item in the log post-freeze. Here, we have various items in the log
including several logged buffers that correspond to the cntbt block that
ends up corrupted (daddr 0xf427c08).
What freeze? Look at hibernate(), nothing but a sync:

/**
 * hibernate - Carry out system hibernation, including saving the image.
 */
int hibernate(void)
{
        ...
        printk(KERN_INFO "PM: Syncing filesystems ... ");
        sys_sync();
        printk("done.\n");

        error = freeze_processes();
        if (error)
                goto Exit;

AFAIK there is no freeze call involved.

-Eric
Brian Foster
2014-08-12 21:59:43 UTC
Permalink
Post by Eric Sandeen
Post by Carlos E. R.
Post by Carlos E. R.
Post by Carlos E. R.
but all of them are about 401M before compression. The upload will take
long, my ADSL upload is 0.3M/s at most.
I have shared (view) on google drive a folder with the three files. Both
Brian Foster and Mark Tinguely should have got a link on the mail from me.
If somebody else wants access, just tell me.
Post by Carlos E. R.
block (1,12608397-12608397) multiply claimed by cnt space tree, state - 2
If I take a look at the btrees as is, I see "235:[12608397,10]" included
in the bnobt (fsb 0x200aa55) and "270:[12608397,10]" in the cntbt (fsb
0x2000781). If I skip the mount, zero the log and repair, everything
seems Ok. I can allocate the remainder of available space and rm -rf
everything in the fs without an error.
Once I replay the log, I see "272:[12608397,10] 273:[12608397,10]" in
the cntbt, which is clearly a duplicate entry. This is what repair
detects and cleans up and seems to lead to the shutdown. E.g., if I
mount and use the fs, I can hit an assert or failure just by attempting
to allocate the rest of the space in the fs. If that is the state of the
fs on disk, it's only a matter of time before we explode due to allocating and
freeing that range of space or possibly attempting to allocate that
space twice.
Mark mentioned that he didn't see the superblock item in the log with
regard to the freeze. I don't see that either... which perhaps suggests
that this all happens during the wake-from-hibernate sequence..? My
understanding is that we should freeze on hibernate, thus force
everything out to the log, write an unmount record and then dirty the
log with a superblock transaction. Therefore, that should be the only
item in the log post-freeze. Here, we have various items in the log
including several logged buffers that correspond to the cntbt block that
ends up corrupted (daddr 0xf427c08).
/**
 * hibernate - Carry out system hibernation, including saving the image.
 */
int hibernate(void)
{
        ...
        printk(KERN_INFO "PM: Syncing filesystems ... ");
        sys_sync();
        printk("done.\n");
        error = freeze_processes();
        if (error)
                goto Exit;
AFAIK there is no freeze call involved.
Eep, not sure why I was thinking there was a freeze there. It appears
not. I guess that explains why the log contains what it does. Thanks for
pointing that out...

Brian
Post by Eric Sandeen
-Eric
Eric Sandeen
2014-08-12 22:21:58 UTC
Permalink
Post by Brian Foster
Post by Eric Sandeen
Post by Carlos E. R.
Post by Carlos E. R.
Post by Carlos E. R.
but all of them are about 401M before compression. The upload will take
long, my ADSL upload is 0.3M/s at most.
I have shared (view) on google drive a folder with the three files. Both
Brian Foster and Mark Tinguely should have got a link on the mail from me.
If somebody else wants access, just tell me.
Post by Carlos E. R.
block (1,12608397-12608397) multiply claimed by cnt space tree, state - 2
If I take a look at the btrees as is, I see "235:[12608397,10]" included
in the bnobt (fsb 0x200aa55) and "270:[12608397,10]" in the cntbt (fsb
0x2000781). If I skip the mount, zero the log and repair, everything
seems Ok. I can allocate the remainder of available space and rm -rf
everything in the fs without an error.
Once I replay the log, I see "272:[12608397,10] 273:[12608397,10]" in
the cntbt, which is clearly a duplicate entry. This is what repair
detects and cleans up and seems to lead to the shutdown. E.g., if I
mount and use the fs, I can hit an assert or failure just by attempting
to allocate the rest of the space in the fs. If that is the state of the
fs on disk, it's only a matter of time before we explode due to allocating and
freeing that range of space or possibly attempting to allocate that
space twice.
Mark mentioned that he didn't see the superblock item in the log with
regard to the freeze. I don't see that either... which perhaps suggests
that this all happens during the wake-from-hibernate sequence..? My
understanding is that we should freeze on hibernate, thus force
everything out to the log, write an unmount record and then dirty the
log with a superblock transaction. Therefore, that should be the only
item in the log post-freeze. Here, we have various items in the log
including several logged buffers that correspond to the cntbt block that
ends up corrupted (daddr 0xf427c08).
/**
 * hibernate - Carry out system hibernation, including saving the image.
 */
int hibernate(void)
{
        ...
        printk(KERN_INFO "PM: Syncing filesystems ... ");
        sys_sync();
        printk("done.\n");
        error = freeze_processes();
        if (error)
                goto Exit;
AFAIK there is no freeze call involved.
Eep, not sure why I was thinking there was a freeze there.
because it seems so logical. :)
Post by Brian Foster
It appears
not. I guess that explains why the log contains what it does. Thanks for
pointing that out...
but as I was saying on IRC, I think in theory it's not necessary; the fs state
on disk + fs state in memory (saved to disk during hibernate) needs to be
consistent, and it's conceivable that this could be done without freeze
(or even sync for that matter).

A freeze sure sounds nice though, to be sure the fs really is consistent
on disk, in case resume fails.

The thing I was wondering about is what makes sure disk caches are flushed
before disks lose power when hibernate completes. (I'm just handwaving
here, though...)

Anyway, Dave's mention of making threads freezable makes the most sense.
Documentation/power/freezing-of-tasks.txt
makes it pretty clear that any thread which might change fs state
Post by Brian Foster
We therefore freeze tasks that might
cause the on-disk filesystems' data and metadata to be modified after the
hibernation image has been created and before the system is finally powered off.
The majority of these are user space processes, but if any of the kernel threads
may cause something like this to happen, they have to be freezable.
jbd/jbd2 explicitly handle this freezing in the kjournald/kjournald2 threads.

-Eric
Post by Brian Foster
Brian
Post by Eric Sandeen
-Eric
Dave Chinner
2014-08-12 23:16:29 UTC
Post by Eric Sandeen
Post by Brian Foster
Post by Eric Sandeen
Post by Carlos E. R.
Post by Carlos E. R.
Post by Carlos E. R.
but all of them are about 401M before compression. The upload will take
long, my ADSL upload is 0.3M/s at most.
I have shared (view) on google drive a folder with the three files. Both
Brian Foster and Mark Tinguely should have got a link on the mail from me.
If somebody else wants access, just tell me.
Post by Carlos E. R.
block (1,12608397-12608397) multiply claimed by cnt space tree, state - 2
If I take a look at the btrees as is, I see "235:[12608397,10]" included
in the bnobt (fsb 0x200aa55) and "270:[12608397,10]" in the cntbt (fsb
0x2000781). If I skip the mount, zero the log and repair, everything
seems Ok. I can allocate the remainder of available space and rm -rf
everything in the fs without an error.
Once I replay the log, I see "272:[12608397,10] 273:[12608397,10]" in
the cntbt, which is clearly a duplicate entry. This is what repair
detects and cleans up and seems to lead to the shutdown. E.g., if I
mount and use the fs, I can hit an assert or failure just by attempting
to allocate the rest of the space in the fs. If that is the state of the
fs on disk, it's only a matter of time we explode due to allocating and
freeing that range of space or possibly attempting to allocate that
space twice.
Mark mentioned that he didn't see the superblock item in the log with
regard to the freeze. I don't see that either... which perhaps suggests
that this all happens during the wake-from-hibernate sequence..? My
understanding is that we should freeze on hibernate, thus force
everything out to the log, write an unmount record and then dirty the
log with a superblock transaction. Therefore, that should be the only
item in the log post-freeze. Here, we have various items in the log
including several logged buffers that correspond to the cntbt block that
ends up corrupted (daddr 0xf427c08).
/**
 * hibernate - Carry out system hibernation, including saving the image.
 */
int hibernate(void)
{
...
	printk(KERN_INFO "PM: Syncing filesystems ... ");
	sys_sync();
	printk("done.\n");
	error = freeze_processes();
	if (error)
		goto Exit;
AFAIK there is no freeze call involved.
Eep, not sure why I was thinking there was a freeze there.
because it seems so logical. :)
Post by Brian Foster
It appears
not. I guess that explains why the log contains what it does. Thanks for
pointing that out...
but as I was saying on IRC, I think in theory it's not necessary; the fs state
on disk + fs state in memory (saved to disk during hibernate) needs to be
consistent, and it's conceivable that this could be done without freeze
(or even sync for that matter).
Well, the sync is necessary for hibernate - it needs to shrink the
amount of memory that is saved to disk to as small as possible. If
your memory is full of dirty page cache, why would you save that to
the hibernate image, only to have to load it back off, then write it
to the filesystem after resume? Why wouldn't you write it straight
to disk before hibernation, then remove it from memory so you've
then got free memory to allocate the hibernation image that gets
written to disk?
Post by Eric Sandeen
A freeze sure sounds nice though, to be sure the fs really is consistent
on disk, in case resume fails.
The thing I was wondering about is what makes sure disk caches are flushed
before disks lose power when hibernate completes. (I'm just handwaving
here, though...)
That usually happens in the driver power-down sequence.
Post by Eric Sandeen
Anyway, Dave's mention of making threads freezable makes the most sense.
Documentation/power/freezing-of-tasks.txt
makes it pretty clear that any thread which might change fs state
Post by Brian Foster
We therefore freeze tasks that might
cause the on-disk filesystems' data and metadata to be modified after the
hibernation image has been created and before the system is finally powered off.
The majority of these are user space processes, but if any of the kernel threads
may cause something like this to happen, they have to be freezable.
jbd/jbd2 explicitly handle this freezing in the kjournald/kjournald2 threads.
As we do for the xfsaild kernel thread. We used to use kernel
threads for functionality that we now use workqueues for - the
xfssyncd and the xfsbufd - and those kernel threads used to also
freeze like the xfsaild does. We lost that when moving to
workqueues.

The stupid part about all this is we actually stop periodic
workqueue processing for workqueues that can modify state when the
filesystem freezes. i.e. if the hibernation code froze the
filesystem we wouldn't need to mark workqueues as freezable because
XFS already manages everything in the manner that hibernation
requires....

Cheers,

Dave.
--
Dave Chinner
***@fromorbit.com
Carlos E. R.
2014-08-13 00:07:42 UTC
On Wednesday, 2014-08-13 at 09:16 +1000, Dave Chinner wrote:

...
Post by Dave Chinner
Well, the sync is necessary for hibernate - it needs to shrink the
amount of memory that is saved to disk to as small as possible. If
your memory is full of dirty page cache, why would you save that to
the hibernate image, only to have to load it back off, then write it
to the filesystem after resume? Why wouldn't you write it straight
to disk before hibernation, then remove it from memory so you've
then got free memory to allocate the hibernation image that gets
written to disk?
You can see that this happens by looking at the output of "free" before
and after hibernation. Even issuing the command after getting the desktop
back, I can see a big difference (the amount of buffers and cache).
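
For anyone who wants to compare the numbers without eyeballing "free", here is a small sketch that pulls a single field out of /proc/meminfo-style text. The function name is made up for illustration, and it assumes the usual "Key:   value kB" line layout.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the value (in kB) of a "Key:   12345 kB" line found in
 * meminfo-style text, or -1 if no line starts with that key.
 * Hypothetical helper, not part of any existing tool.
 */
long meminfo_kb(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		/* Match only at the start of a line, followed by ':' */
		if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
			long val;

			if (sscanf(line + klen + 1, "%ld", &val) == 1)
				return val;
			return -1;
		}
		line = strchr(line, '\n');
		if (line)
			line++;	/* step past the newline */
	}
	return -1;
}
```

Read /proc/meminfo into a buffer before hibernating and again after resume, then call this for "Buffers" and "Cached" on each copy; the difference is the memory that was written out and dropped rather than saved in the image.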

--
Cheers,
Carlos E. R.
(from 13.1 x86_64 "Bottle" at Telcontar)
Dave Chinner
2014-08-12 21:57:05 UTC
Post by Eric Sandeen
Post by Carlos E. R.
Post by Carlos E. R.
Post by Carlos E. R.
but all of them are about 401M before compression. The upload will take
long, my ADSL upload is 0.3M/s at most.
I have shared (view) on google drive a folder with the three files. Both
Brian Foster and Mark Tinguely should have got a link on the mail from me.
If somebody else wants access, just tell me.
Post by Carlos E. R.
block (1,12608397-12608397) multiply claimed by cnt space tree, state - 2
If I take a look at the btrees as is, I see "235:[12608397,10]" included
in the bnobt (fsb 0x200aa55) and "270:[12608397,10]" in the cntbt (fsb
0x2000781). If I skip the mount, zero the log and repair, everything
seems Ok. I can allocate the remainder of available space and rm -rf
everything in the fs without an error.
Once I replay the log, I see "272:[12608397,10] 273:[12608397,10]" in
the cntbt, which is clearly a duplicate entry. This is what repair
detects and cleans up and seems to lead to the shutdown. E.g., if I
mount and use the fs, I can hit an assert or failure just by attempting
to allocate the rest of the space in the fs. If that is the state of the
fs on disk, it's only a matter of time we explode due to allocating and
freeing that range of space or possibly attempting to allocate that
space twice.
Mark mentioned that he didn't see the superblock item in the log with
regard to the freeze. I don't see that either... which perhaps suggests
that this all happens during the wake-from-hibernate sequence..? My
understanding is that we should freeze on hibernate, thus force
everything out to the log, write an unmount record and then dirty the
log with a superblock transaction. Therefore, that should be the only
item in the log post-freeze. Here, we have various items in the log
including several logged buffers that correspond to the cntbt block that
ends up corrupted (daddr 0xf427c08).
/**
 * hibernate - Carry out system hibernation, including saving the image.
 */
int hibernate(void)
{
...
	printk(KERN_INFO "PM: Syncing filesystems ... ");
	sys_sync();
	printk("done.\n");
	error = freeze_processes();
	if (error)
		goto Exit;
AFAIK there is no freeze call involved.
Yes, that's a problem I've been pointing out for years. TuxOnIce
freezes the filesystems, but the kernel hibernation maintainers have
steadfastly refused to even acknowledge that it is necessary.

As it is, I'm pretty sure this is being caused by the XFS workqueues
not being frozen appropriately, i.e. WQ_FREEZABLE needs to be added
to various workqueue definitions so that work gets halted when
kernel threads get halted.
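
As a rough illustration of why the flag matters, here is a toy user-space model. It is not kernel code and does not use the kernel workqueue API; every name in it is hypothetical. It only models the idea that a queue marked freezable holds its work while the system is frozen, whereas an unmarked queue keeps running and can still modify the filesystem after the image has been taken.

```c
#include <stdbool.h>

/* All names are hypothetical; this models the idea, not the kernel API. */
#define TOY_WQ_FREEZABLE 0x1u

struct toy_wq {
	unsigned int flags;
	int pending;	/* work items queued but not yet run */
	int executed;	/* work items that ran (and may have dirtied the fs) */
};

/* Set while the "system" is between image creation and power-off. */
bool toy_system_frozen = false;

void toy_queue_work(struct toy_wq *wq)
{
	wq->pending++;
}

/* Run pending work unless the queue is freezable and the system is frozen. */
void toy_run_pending(struct toy_wq *wq)
{
	if (toy_system_frozen && (wq->flags & TOY_WQ_FREEZABLE))
		return;	/* work stays queued until thaw */

	wq->executed += wq->pending;
	wq->pending = 0;
}
```

With toy_system_frozen set, a queue created without the flag still executes its item, which is the analogue of an eofblocks worker changing on-disk state behind the hibernation image. The flagged queue keeps the item pending and runs it only after the flag is cleared.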

Cheers,

Dave.
--
Dave Chinner
***@fromorbit.com