Discussion:
kernel crash 2.6.31.14-0.1-xen from openSUSE 11.2
Michael Monnerie
2010-10-09 21:55:16 UTC
I just had a bug in one VM, and don't know if this is XFS related, but it seems so.

[1172140.926859] BUG: unable to handle kernel paging request at 00000003d03dd178
[1172140.926877] IP: [<ffffffff80034872>] dequeue_task+0x72/0x110
[1172140.926897] PGD 1b54b067 PUD 0
[1172140.926897] Thread overran stack, or stack corrupted
[1172140.926897] Oops: 0000 [#1] SMP
[1172140.926897] last sysfs file: /sys/devices/xen/vbd-51744/block/xvdc/stat
[1172140.926897] CPU 0
[1172140.926897] Modules linked in: bluetooth rfkill af_packet ipt_REJECT ipt_LOG xt_multiport xt_tcpudp ipt_ECN iptable_mangle xt_state iptable_filter ip_tables nf_conntrack_slp nf_conntrack_sip nf_conntrack_netbios_ns nf_conntrack_proto_udplite nf_conntrack_proto_dccp nf_conntrack_irc nf_conntrack_ftp
nf_conntrack_tftp nf_conntrack_netlink nfnetlink nf_conntrack_sane ts_kmp nf_conntrack_amanda nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_h323 xt_conntrack x_tables nf_conntrack_proto_sctp nf_conntrack_ipv6 nf_conntrack_ipv4 nf_conntrack
nf_defrag_ipv4 nfs lockd fscache nfs_acl ipv6 auth_rpcgss sunrpc ramzswap xvmalloc lzo_decompress lzo_compress fuse loop dm_mod xenblk cdrom xennet ext4 jbd2 crc16 xfs exportfs reiserfs
[1172140.926897] Pid: 28690, comm: pdflush Not tainted 2.6.31.14-0.1-xen #1
[1172140.926897] RIP: e030:[<ffffffff80034872>] [<ffffffff80034872>] dequeue_task+0x72/0x110
[1172140.926897] RSP: e02b:ffff880001230380 EFLAGS: 00010016
[1172140.926897] RAX: 000000000000a380 RBX: ffff880004f84180 RCX: 0000000089f8b3ff
[1172140.926897] RDX: 0000000000000001 RSI: ffff880004f84180 RDI: ffffc9000000a380
[1172140.926897] RBP: ffff8800012303a0 R08: ffff880001230000 R09: 0000000000000000
[1172140.926897] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[1172140.926897] R13: ffff8800012304f0 R14: 000002a400003000 R15: ffff8800285fa600
[1172140.926897] FS: 00007f6c27bb06f0(0000) GS:ffffc90000000000(0000) knlGS:0000000000000000
[1172140.926897] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[1172140.926897] CR2: 00000003d03dd178 CR3: 000000001b4cd000 CR4: 0000000000000660
[1172140.926897] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1172140.926897] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[1172140.926897] Process pdflush (pid: 28690, threadinfo ffff880001230000, task ffff880004f84180)
[1172140.926897] Stack:
[1172140.926897] ffff8800012303d0 0000000089f8b3ff 0000000000001000 ffffc9000000a380
[1172140.926897] <0> ffff8800012303d0 ffffffff800349a8 ffffc9000000a380 0000000089f8b3ff
[1172140.926897] <0> ffff8800012304f0 ffffc9000000a380 ffff8800012304e0 ffffffff8046ca5c
[1172140.926897] Call Trace:
[1172140.926897] [<ffffffff800349a8>] deactivate_task+0x38/0x60
[1172140.926897] [<ffffffff8046ca5c>] thread_return+0x157/0x3fb
[1172140.926897] [<ffffffffa009b5ef>] _sv_wait+0x8f/0xc0 [xfs]
[1172140.926897] [<ffffffffa009ec69>] xlog_state_sync+0x2b9/0x2e0 [xfs]
[1172140.926897] [<ffffffffa009ecf5>] _xfs_log_force+0x65/0xa0 [xfs]
[1172140.926897] [<ffffffffa009ed52>] xfs_log_force+0x22/0x60 [xfs]
[1172140.926897] [<ffffffffa005f2fa>] xfs_alloc_search_busy+0xea/0x110 [xfs]
[1172140.926897] [<ffffffffa0061244>] xfs_alloc_ag_vextent+0x154/0x160 [xfs]
[1172140.926897] [<ffffffffa0061a04>] xfs_alloc_vextent+0x204/0x4b0 [xfs]
[1172140.926897] [<ffffffffa0071c8b>] xfs_bmap_btalloc+0x18b/0xae0 [xfs]
[1172140.926897] [<ffffffffa007260f>] xfs_bmap_alloc+0x2f/0x70 [xfs]
[1172140.926897] [<ffffffffa0073272>] xfs_bmapi+0xc22/0x1320 [xfs]
[1172140.926897] [<ffffffffa0098248>] xfs_iomap_write_allocate+0x1d8/0x3f0 [xfs]
[1172140.926897] [<ffffffffa0099089>] xfs_iomap+0x2c9/0x300 [xfs]
[1172140.926897] [<ffffffffa00b61b8>] xfs_map_blocks+0x38/0x60 [xfs]
[1172140.926897] [<ffffffffa00b793a>] xfs_page_state_convert+0x3fa/0x720 [xfs]
[1172140.926897] [<ffffffffa00b7de4>] xfs_vm_writepage+0x84/0x160 [xfs]
[1172140.926897] [<ffffffff800e36b3>] pageout+0x143/0x2b0
[1172140.926897] [<ffffffff800e51fe>] shrink_page_list+0x26e/0x650
[1172140.926897] [<ffffffff800e58b3>] shrink_inactive_list+0x2d3/0x7c0
[1172140.926897] [<ffffffff800e5dfb>] shrink_list+0x5b/0x110
[1172140.926897] [<ffffffff800e6021>] shrink_zone+0x171/0x250
[1172140.926897] [<ffffffff800e6183>] shrink_zones+0x83/0x120
[1172140.926897] [<ffffffff800e62be>] do_try_to_free_pages+0x9e/0x380
[1172140.926897] [<ffffffff800e66b7>] try_to_free_pages+0x77/0xa0
[1172140.926897] [<ffffffff800dc053>] __alloc_pages_slowpath+0x2d3/0x5c0
[1172140.926897] [<ffffffff800dc491>] __alloc_pages_nodemask+0x151/0x160
[1172140.926897] [<ffffffff8010efdc>] T.583+0x4c/0x150
[1172140.926897] [<ffffffff8010f1d7>] T.581+0xf7/0x380
[1172140.926897] [<ffffffff8010f6c0>] cache_alloc_refill+0x260/0x2a0
[1172140.926897] [<ffffffff8010f86c>] kmem_cache_alloc+0x16c/0x180
[1172140.926897] [<ffffffffa00b53e2>] kmem_zone_alloc+0xa2/0x110 [xfs]
[1172140.926897] [<ffffffffa00b5472>] kmem_zone_zalloc+0x22/0x60 [xfs]
[1172140.926897] [<ffffffffa00abadd>] _xfs_trans_alloc+0x3d/0x90 [xfs]
[1172140.926897] [<ffffffffa00abcd2>] xfs_trans_alloc+0xa2/0xd0 [xfs]
[1172140.926897] [<ffffffffa00982b1>] xfs_iomap_write_allocate+0x241/0x3f0 [xfs]
[1172140.926897] [<ffffffffa0099089>] xfs_iomap+0x2c9/0x300 [xfs]
[1172140.926897] [<ffffffffa00b61b8>] xfs_map_blocks+0x38/0x60 [xfs]
[1172140.926897] [<ffffffffa00b793a>] xfs_page_state_convert+0x3fa/0x720 [xfs]
[1172140.926897] [<ffffffffa00b7de4>] xfs_vm_writepage+0x84/0x160 [xfs]
[1172140.926897] [<ffffffff800dd9e1>] __writepage+0x21/0x60
[1172140.926897] [<ffffffff800dec35>] write_cache_pages+0x215/0x520
[1172140.926897] [<ffffffff800def70>] generic_writepages+0x30/0x50
[1172140.926897] [<ffffffffa00b6d0c>] xfs_vm_writepages+0x7c/0xb0 [xfs]
[1172140.926897] [<ffffffff800defc5>] do_writepages+0x35/0x70
[1172140.926897] [<ffffffff80140c2b>] writeback_single_inode+0x10b/0x3f0
[1172140.926897] [<ffffffff801413e8>] generic_sync_sb_inodes+0x1a8/0x5f0
[1172140.926897] [<ffffffff80141875>] sync_sb_inodes+0x45/0x60
[1172140.926897] [<ffffffff801419a4>] writeback_inodes+0x54/0x150
[1172140.926897] [<ffffffff800de3ae>] background_writeout+0xbe/0x110
[1172140.926897] [<ffffffff800dffb0>] __pdflush+0x1b0/0x3b0
[1172140.926897] [<ffffffff800e0207>] pdflush+0x57/0x80
[1172140.926897] [<ffffffff8006fb06>] kthread+0xb6/0xc0
[1172140.926897] [<ffffffff8000d3ea>] child_rip+0xa/0x20
[1172140.926897] Code: 00 48 29 c8 48 29 f0 48 c1 f8 03 48 01 f0 48 89 83 a0 00 00 00 4c 8b 43 08 4c 8b 8b 18 02 00 00 48 c7 c0 80 a3 00 00 41 8b 48 18 <48> 8b 0c cd 80 31 78 80 48 8b b4 08 40 08 00 00 31 c9 48 c7 83
[1172140.926897] RIP [<ffffffff80034872>] dequeue_task+0x72/0x110
[1172140.926897] RSP <ffff880001230380>
[1172140.926897] CR2: 00000003d03dd178
[1172140.927989] ---[ end trace 3d59bc6e1b53fd3a ]---
--
Best regards,
Michael Monnerie, Ing. BSc

it-management Internet Services
http://proteger.at [pronounced: Prot-e-schee]
Tel: 0660 / 415 65 31

****** Current radio interview! ******
http://www.it-podcast.at/aktuelle-sendung.html

// We currently have two houses for sale:
// http://zmi.at/langegg/
// http://zmi.at/haus2009/
Dave Chinner
2010-10-10 01:56:28 UTC
Post by Michael Monnerie
I just had a bug in one VM, and don't know if this is XFS related, but it seems so.
[1172140.926859] BUG: unable to handle kernel paging request at 00000003d03dd178
[1172140.926877] IP: [<ffffffff80034872>] dequeue_task+0x72/0x110
[1172140.926897] PGD 1b54b067 PUD 0
[1172140.926897] Thread overran stack, or stack corrupted
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The stack trace shows pdflush doing writeback, and way down the
stack doing a memory allocation that triggered direct reclaim, which
caused writeback to occur, which blew the stack.....
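
For anyone reading along, the re-entry is visible directly in the trace: xfs_vm_writepage shows up twice, once from pdflush and once again from direct reclaim underneath the memory allocation. Below is a minimal Python sketch, not part of the original report, that picks this out of an excerpt of the call trace and computes how far RSP was from the threadinfo base at the time of the oops. The helper names (frame_names, reentered, stack_headroom) are made up for illustration, and the headroom figure assumes thread_info sits at the lowest address of the kernel stack, as I believe it did on x86-64 kernels of that era.

```python
import re
from collections import Counter

# Excerpt of the call trace from the report above (innermost frame first).
TRACE = """\
[<ffffffffa009ed52>] xfs_log_force+0x22/0x60 [xfs]
[<ffffffffa00b7de4>] xfs_vm_writepage+0x84/0x160 [xfs]
[<ffffffff800e36b3>] pageout+0x143/0x2b0
[<ffffffff800e66b7>] try_to_free_pages+0x77/0xa0
[<ffffffff8010f86c>] kmem_cache_alloc+0x16c/0x180
[<ffffffffa00abcd2>] xfs_trans_alloc+0xa2/0xd0 [xfs]
[<ffffffffa00b7de4>] xfs_vm_writepage+0x84/0x160 [xfs]
[<ffffffff800dd9e1>] __writepage+0x21/0x60
[<ffffffff800e0207>] pdflush+0x57/0x80
"""

# Matches "[<address>] symbol+0xoffset/0xsize" and captures the symbol name.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+")


def frame_names(trace):
    """Return the symbol names in the order they appear in the trace."""
    return FRAME_RE.findall(trace)


def reentered(trace):
    """Return symbols that occur more than once, i.e. were entered again
    while an earlier call to the same function was still on the stack."""
    counts = Counter(frame_names(trace))
    return sorted(name for name, n in counts.items() if n > 1)


def stack_headroom(rsp, threadinfo):
    """Bytes between the stack pointer and the threadinfo base.

    Assumption: the stack grows down towards thread_info, which lives at
    the lowest address of the stack area.
    """
    return rsp - threadinfo


if __name__ == "__main__":
    print(reentered(TRACE))  # ['xfs_vm_writepage']
    # RSP and threadinfo values are taken from the oops above.
    print(stack_headroom(0xFFFF880001230380, 0xFFFF880001230000))  # 896
```

With the values from the oops, RSP was only 0x380 (896) bytes above the threadinfo base when the fault was taken, which fits the "Thread overran stack, or stack corrupted" line.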

Cheers,

Dave.
--
Dave Chinner
***@fromorbit.com
Michael Monnerie
2010-10-11 04:57:55 UTC
Post by Dave Chinner
Post by Michael Monnerie
[1172140.926859] BUG: unable to handle kernel paging request at 00000003d03dd178
[1172140.926877] IP: [<ffffffff80034872>] dequeue_task+0x72/0x110
[1172140.926897] PGD 1b54b067 PUD 0
[1172140.926897] Thread overran stack, or stack corrupted
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The stack trace shows pdflush doing writeback, and way down the
stack doing a memory allocation that triggered direct reclaim, which
caused writeback to occur, which blew the stack.....
Thanks. So what now? Haven't we had this before? I sent a mail on May 21,
2010, with the subject "kernel crash: scheduling while atomic", and Eric
Sandeen said "I'm guessing you blew the stack".

So what should I do about it? I think I read about a patch that should
fix this. Seems Novell/openSUSE didn't backport it, so could someone
please guide me where to find it? And I guess I should report upstream,
right?
Eric Sandeen
2010-10-13 22:07:56 UTC
Post by Dave Chinner
Post by Michael Monnerie
I just had a bug in one VM, and don't know if this is XFS related, but it seems so.
[1172140.926859] BUG: unable to handle kernel paging request at 00000003d03dd178
[1172140.926877] IP: [<ffffffff80034872>] dequeue_task+0x72/0x110
[1172140.926897] PGD 1b54b067 PUD 0
[1172140.926897] Thread overran stack, or stack corrupted
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Awesome, that's the 2nd time in like 4 years I've seen that stack canary
I put in actually be useful ;)

-Eric
Post by Dave Chinner
The stack trace shows pdflush doing writeback, and way down the
stack doing a memory allocation that triggered direct reclaim, which
caused writeback to occur, which blew the stack.....
Cheers,
Dave.
Michael Monnerie
2010-10-14 05:18:37 UTC
Post by Eric Sandeen
Post by Dave Chinner
Post by Michael Monnerie
[1172140.926859] BUG: unable to handle kernel paging request at 00000003d03dd178
[1172140.926877] IP: [<ffffffff80034872>] dequeue_task+0x72/0x110
[1172140.926897] PGD 1b54b067 PUD 0
[1172140.926897] Thread overran stack, or stack corrupted
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Awesome, that's the 2nd time in like 4 years I've seen that stack
canary I put in actually be useful ;)
And both times reported by me :-)
Not that I'm especially excited to have that problem though :-(

What should I report, and where (upstream?), to get this fixed?
Michael Monnerie
2010-10-15 09:13:27 UTC
Post by Michael Monnerie
Post by Eric Sandeen
Post by Dave Chinner
Post by Michael Monnerie
[1172140.926859] BUG: unable to handle kernel paging request at 00000003d03dd178
[1172140.926877] IP: [<ffffffff80034872>] dequeue_task+0x72/0x110
[1172140.926897] PGD 1b54b067 PUD 0
[1172140.926897] Thread overran stack, or stack corrupted
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Awesome, that's the 2nd time in like 4 years I've seen that stack
canary I put in actually be useful ;)
And both times reported by me :-)
Not that I'm especially excited to have that problem though :-(
What should I report where (upstream?) to get that fixed?
Is this the same bug as the one below? If so, it has already been fixed:
https://bugzilla.novell.com/show_bug.cgi?id=614670