[RFC, PATCH 0/4] xfs: clean up xlog_recover_process

Discussion:

[RFC, PATCH 0/4] xfs: clean up xlog_recover_process_data

Dave Chinner

2014-08-26 01:21:37 UTC

Hi folks,

The log recovery use-after-free that Brian posted a patch for has
had several previous attempts sent and none have completed review.
So let's put this one to bed for good.

This patch set addresses the previous review feedback for fixing
this problem. It factors xlog_recover_process_data() to make it
cleaner and isolate the context of the transaction recvoery
structure that is causing problems. It fixes a leak of the structure
in an error condition that I noticed while factoring, as well as the
double free that Brian and others have identified and tried to fix
in the past. It then re-arranges the recovery item management code
to put it all in one place, rather than in 3 separate parts of the
file separated by several hundred lines of unrelated code.

Comments?

-Dave.

Dave Chinner

2014-08-26 01:21:39 UTC

Permalink

From: Dave Chinner <***@redhat.com>

It aborts recovery without freeing the current trans structure that
we are decoding.

Signed-off-by: Dave Chinner <***@redhat.com>
---
fs/xfs/xfs_log_recover.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 1970732f..460cf98 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -3587,8 +3587,9 @@ xlog_recovery_process_ophdr(
/* unexpected flag values */
case XLOG_UNMOUNT_TRANS:
xfs_warn(log->l_mp, "%s: Unmount LR", __func__);
- error = 0;
- break;
+ xlog_recover_free_trans(trans);
+ return 0;
+
case XLOG_START_TRANS:
xfs_warn(log->l_mp, "%s: bad transaction 0x%x", __func__, tid);
ASSERT(0);

--
2.0.0

Brian Foster

2014-08-26 12:41:57 UTC

Permalink

Post by Dave Chinner
It aborts recovery without freeing the current trans structure that
we are decoding.

What do you mean by "aborts recovery?" I don't see anything in the code
that reflects that behavior. Do you mean it's an on-disk marker for
completion?

Post by Dave Chinner
---
fs/xfs/xfs_log_recover.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 1970732f..460cf98 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -3587,8 +3587,9 @@ xlog_recovery_process_ophdr(
/* unexpected flag values */
xfs_warn(log->l_mp, "%s: Unmount LR", __func__);
- error = 0;
- break;
+ xlog_recover_free_trans(trans);
+ return 0;
+

The change to return here seems superfluous. It's fine, but just to
check, were you intending to alter behavior in some way (e.g., return
from xlog_recover_process_data())?

Brian

Post by Dave Chinner
xfs_warn(log->l_mp, "%s: bad transaction 0x%x", __func__, tid);
ASSERT(0);
--
2.0.0
_______________________________________________
xfs mailing list
http://oss.sgi.com/mailman/listinfo/xfs

Dave Chinner

2014-08-26 01:21:38 UTC

Permalink

From: Dave Chinner <***@redhat.com>

Clean up xlog_recover_process_data() structure in preparation for
fixing the allocationa nd freeing context of the transaction being
recovered.

Signed-off-by: Dave Chinner <***@redhat.com>
---
fs/xfs/xfs_log_recover.c | 151 ++++++++++++++++++++++++++---------------------
1 file changed, 84 insertions(+), 67 deletions(-)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 01becbb..1970732f 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -3531,12 +3531,78 @@ out:
}

STATIC int
-xlog_recover_unmount_trans(
- struct xlog *log)
+xlog_recovery_process_ophdr(
+ struct xlog *log,
+ struct hlist_head rhash[],
+ struct xlog_rec_header *rhead,
+ struct xlog_op_header *ohead,
+ xfs_caddr_t dp,
+ xfs_caddr_t lp,
+ int pass)
{
- /* Do nothing now */
- xfs_warn(log->l_mp, "%s: Unmount LR", __func__);
- return 0;
+ struct xlog_recover *trans;
+ xlog_tid_t tid;
+ int error;
+ unsigned long hash;
+ uint flags;
+ unsigned int hlen;
+
+ hlen = be32_to_cpu(ohead->oh_len);
+ tid = be32_to_cpu(ohead->oh_tid);
+ hash = XLOG_RHASH(tid);
+ trans = xlog_recover_find_tid(&rhash[hash], tid);
+ if (!trans) {
+ /* add new tid if this is a new transaction */
+ if (ohead->oh_flags & XLOG_START_TRANS) {
+ xlog_recover_new_tid(&rhash[hash], tid,
+ be64_to_cpu(rhead->h_lsn));
+ }
+ return 0;
+ }
+
+ error = -EIO;
+ if (dp + hlen > lp) {
+ xfs_warn(log->l_mp, "%s: bad length 0x%x", __func__, hlen);
+ WARN_ON(1);
+ goto out_free;
+ }
+
+ flags = ohead->oh_flags & ~XLOG_END_TRANS;
+ if (flags & XLOG_WAS_CONT_TRANS)
+ flags &= ~XLOG_CONTINUE_TRANS;
+
+ switch (flags) {
+ /* expected flag values */
+ case 0:
+ case XLOG_CONTINUE_TRANS:
+ error = xlog_recover_add_to_trans(log, trans, dp, hlen);
+ break;
+ case XLOG_WAS_CONT_TRANS:
+ error = xlog_recover_add_to_cont_trans(log, trans, dp, hlen);
+ break;
+ case XLOG_COMMIT_TRANS:
+ error = xlog_recover_commit_trans(log, trans, pass);
+ break;
+
+ /* unexpected flag values */
+ case XLOG_UNMOUNT_TRANS:
+ xfs_warn(log->l_mp, "%s: Unmount LR", __func__);
+ error = 0;
+ break;
+ case XLOG_START_TRANS:
+ xfs_warn(log->l_mp, "%s: bad transaction 0x%x", __func__, tid);
+ ASSERT(0);
+ break;
+ default:
+ xfs_warn(log->l_mp, "%s: bad flag 0x%x", __func__, flags);
+ ASSERT(0);
+ break;
+ }
+
+out_free:
+ if (error)
+ xlog_recover_free_trans(trans);
+ return error;
}

/*
@@ -3556,14 +3622,10 @@ xlog_recover_process_data(
xfs_caddr_t dp,
int pass)
{
+ struct xlog_op_header *ohead;
xfs_caddr_t lp;
int num_logops;
- xlog_op_header_t *ohead;
- xlog_recover_t *trans;
- xlog_tid_t tid;
int error;
- unsigned long hash;
- uint flags;

lp = dp + be32_to_cpu(rhead->h_len);
num_logops = be32_to_cpu(rhead->h_num_logops);
@@ -3573,69 +3635,24 @@ xlog_recover_process_data(
return -EIO;

while ((dp < lp) && num_logops) {
- ASSERT(dp + sizeof(xlog_op_header_t) <= lp);
- ohead = (xlog_op_header_t *)dp;
- dp += sizeof(xlog_op_header_t);
+ ASSERT(dp + sizeof(struct xlog_op_header) <= lp);
+
+ ohead = (struct xlog_op_header *)dp;
+ dp += sizeof(*ohead);
+
if (ohead->oh_clientid != XFS_TRANSACTION &&
ohead->oh_clientid != XFS_LOG) {
xfs_warn(log->l_mp, "%s: bad clientid 0x%x",
- __func__, ohead->oh_clientid);
+ __func__, ohead->oh_clientid);
ASSERT(0);
return -EIO;
}
- tid = be32_to_cpu(ohead->oh_tid);
- hash = XLOG_RHASH(tid);
- trans = xlog_recover_find_tid(&rhash[hash], tid);
- if (trans == NULL) { /* not found; add new tid */
- if (ohead->oh_flags & XLOG_START_TRANS)
- xlog_recover_new_tid(&rhash[hash], tid,
- be64_to_cpu(rhead->h_lsn));
- } else {
- if (dp + be32_to_cpu(ohead->oh_len) > lp) {
- xfs_warn(log->l_mp, "%s: bad length 0x%x",
- __func__, be32_to_cpu(ohead->oh_len));
- WARN_ON(1);
- return -EIO;
- }
- flags = ohead->oh_flags & ~XLOG_END_TRANS;
- if (flags & XLOG_WAS_CONT_TRANS)
- flags &= ~XLOG_CONTINUE_TRANS;
- switch (flags) {
- case XLOG_COMMIT_TRANS:
- error = xlog_recover_commit_trans(log,
- trans, pass);
- break;
- case XLOG_UNMOUNT_TRANS:
- error = xlog_recover_unmount_trans(log);
- break;
- case XLOG_WAS_CONT_TRANS:
- error = xlog_recover_add_to_cont_trans(log,
- trans, dp,
- be32_to_cpu(ohead->oh_len));
- break;
- case XLOG_START_TRANS:
- xfs_warn(log->l_mp, "%s: bad transaction",
- __func__);
- ASSERT(0);
- error = -EIO;
- break;
- case 0:
- case XLOG_CONTINUE_TRANS:
- error = xlog_recover_add_to_trans(log, trans,
- dp, be32_to_cpu(ohead->oh_len));
- break;
- default:
- xfs_warn(log->l_mp, "%s: bad flag 0x%x",
- __func__, flags);
- ASSERT(0);
- error = -EIO;
- break;
- }
- if (error) {
- xlog_recover_free_trans(trans);
- return error;
- }
- }
+
+ error = xlog_recovery_process_ophdr(log, rhash, rhead, ohead,
+ dp, lp, pass);
+ if (error)
+ return error;
+
dp += be32_to_cpu(ohead->oh_len);
num_logops--;
}

--
2.0.0

Christoph Hellwig

2014-08-26 04:09:21 UTC

Permalink

Post by Dave Chinner
@@ -3556,14 +3622,10 @@ xlog_recover_process_data(
xfs_caddr_t dp,
int pass)
{
+ struct xlog_op_header *ohead;
xfs_caddr_t lp;
int num_logops;
int error;
lp = dp + be32_to_cpu(rhead->h_len);
num_logops = be32_to_cpu(rhead->h_num_logops);
@@ -3573,69 +3635,24 @@ xlog_recover_process_data(
return -EIO;
while ((dp < lp) && num_logops) {
+ ASSERT(dp + sizeof(struct xlog_op_header) <= lp);
+
+ ohead = (struct xlog_op_header *)dp;
+ dp += sizeof(*ohead);

Using sizeof type and sizeof variable for the same thing right next
to each other seems weird. Also why duplicate the addition instead
of moving it below the assignment:

ohead = (struct xlog_op_header *)dp;
dp += sizeof(*ohead);

ASSERT(dp <= lp);

Dave Chinner

2014-08-26 04:55:09 UTC

Permalink

Post by Christoph Hellwig

Using sizeof type and sizeof variable for the same thing right next
to each other seems weird. Also why duplicate the addition instead

Oh, I missed converting the one in the ASSERT.

Post by Christoph Hellwig
ohead = (struct xlog_op_header *)dp;
dp += sizeof(*ohead);
ASSERT(dp <= lp);

Yup, that makes sense.

Cheers,

Dave.

--
Dave Chinner
***@fromorbit.com

Brian Foster

2014-08-26 12:41:13 UTC

Permalink

Post by Dave Chinner
Clean up xlog_recover_process_data() structure in preparation for
fixing the allocationa nd freeing context of the transaction being
recovered.
---
fs/xfs/xfs_log_recover.c | 151 ++++++++++++++++++++++++++---------------------
1 file changed, 84 insertions(+), 67 deletions(-)
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 01becbb..1970732f 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
}
STATIC int
-xlog_recover_unmount_trans(
- struct xlog *log)
+xlog_recovery_process_ophdr(
+ struct xlog *log,
+ struct hlist_head rhash[],
+ struct xlog_rec_header *rhead,
+ struct xlog_op_header *ohead,
+ xfs_caddr_t dp,
+ xfs_caddr_t lp,
+ int pass)
{
- /* Do nothing now */
- xfs_warn(log->l_mp, "%s: Unmount LR", __func__);
- return 0;
+ struct xlog_recover *trans;
+ xlog_tid_t tid;
+ int error;
+ unsigned long hash;
+ uint flags;
+ unsigned int hlen;
+
+ hlen = be32_to_cpu(ohead->oh_len);
+ tid = be32_to_cpu(ohead->oh_tid);
+ hash = XLOG_RHASH(tid);
+ trans = xlog_recover_find_tid(&rhash[hash], tid);
+ if (!trans) {
+ /* add new tid if this is a new transaction */
+ if (ohead->oh_flags & XLOG_START_TRANS) {
+ xlog_recover_new_tid(&rhash[hash], tid,
+ be64_to_cpu(rhead->h_lsn));
+ }
+ return 0;
+ }
+

Overall this looks pretty good to me. I wonder if we can clean this up
to separate state management from error detection while we're at it. I
don't see any reason this code to find trans has to be up here.

Post by Dave Chinner
+ error = -EIO;
+ if (dp + hlen > lp) {
+ xfs_warn(log->l_mp, "%s: bad length 0x%x", __func__, hlen);
+ WARN_ON(1);
+ goto out_free;
+ }
+
+ flags = ohead->oh_flags & ~XLOG_END_TRANS;
+ if (flags & XLOG_WAS_CONT_TRANS)
+ flags &= ~XLOG_CONTINUE_TRANS;
+

/* we should find a trans for anything other than a start op */
trans = xlog_recover_find_tid(&rhash[hash], tid);
if (((ohead->oh_flags & XLOG_START_TRANS) && trans) ||
(!(ohead->oh_flags & XLOG_START_TRANS) && !trans)) {
xfs_warn(log->l_mp, "%s: bad transaction 0x%x oh_flags 0x%x trans %p",
__func__, tid, ohead->oh_flags, trans);
ASSERT(0);
return -EIO;
}

Maybe returning error here is not the right thing to do because we want
the recovery to proceed. We could still dump a warning and return 0
though.

Post by Dave Chinner
+ switch (flags) {
+ /* expected flag values */
+ error = xlog_recover_add_to_trans(log, trans, dp, hlen);
+ break;
+ error = xlog_recover_add_to_cont_trans(log, trans, dp, hlen);
+ break;
+ error = xlog_recover_commit_trans(log, trans, pass);
+ break;
+
+ /* unexpected flag values */
+ xfs_warn(log->l_mp, "%s: Unmount LR", __func__);
+ error = 0;
+ break;
+ xfs_warn(log->l_mp, "%s: bad transaction 0x%x", __func__, tid);
+ ASSERT(0);
+ break;

xlog_recover_new_tid(&rhash[hash], tid, be64_to_cpu(rhead->h_lsn)
error = 0;
break;

Brian

Post by Dave Chinner
+ xfs_warn(log->l_mp, "%s: bad flag 0x%x", __func__, flags);
+ ASSERT(0);
+ break;
+ }
+
+ if (error)
+ xlog_recover_free_trans(trans);
+ return error;
}
/*
@@ -3556,14 +3622,10 @@ xlog_recover_process_data(
xfs_caddr_t dp,
int pass)
{
+ struct xlog_op_header *ohead;
xfs_caddr_t lp;
int num_logops;
- xlog_op_header_t *ohead;
- xlog_recover_t *trans;
- xlog_tid_t tid;
int error;
- unsigned long hash;
- uint flags;
lp = dp + be32_to_cpu(rhead->h_len);
num_logops = be32_to_cpu(rhead->h_num_logops);
@@ -3573,69 +3635,24 @@ xlog_recover_process_data(
return -EIO;
while ((dp < lp) && num_logops) {
- ASSERT(dp + sizeof(xlog_op_header_t) <= lp);
- ohead = (xlog_op_header_t *)dp;
- dp += sizeof(xlog_op_header_t);
+ ASSERT(dp + sizeof(struct xlog_op_header) <= lp);
+
+ ohead = (struct xlog_op_header *)dp;
+ dp += sizeof(*ohead);
+
if (ohead->oh_clientid != XFS_TRANSACTION &&
ohead->oh_clientid != XFS_LOG) {
xfs_warn(log->l_mp, "%s: bad clientid 0x%x",
- __func__, ohead->oh_clientid);
+ __func__, ohead->oh_clientid);
ASSERT(0);
return -EIO;
}
- tid = be32_to_cpu(ohead->oh_tid);
- hash = XLOG_RHASH(tid);
- trans = xlog_recover_find_tid(&rhash[hash], tid);
- if (trans == NULL) { /* not found; add new tid */
- if (ohead->oh_flags & XLOG_START_TRANS)
- xlog_recover_new_tid(&rhash[hash], tid,
- be64_to_cpu(rhead->h_lsn));
- } else {
- if (dp + be32_to_cpu(ohead->oh_len) > lp) {
- xfs_warn(log->l_mp, "%s: bad length 0x%x",
- __func__, be32_to_cpu(ohead->oh_len));
- WARN_ON(1);
- return -EIO;
- }
- flags = ohead->oh_flags & ~XLOG_END_TRANS;
- if (flags & XLOG_WAS_CONT_TRANS)
- flags &= ~XLOG_CONTINUE_TRANS;
- switch (flags) {
- error = xlog_recover_commit_trans(log,
- trans, pass);
- break;
- error = xlog_recover_unmount_trans(log);
- break;
- error = xlog_recover_add_to_cont_trans(log,
- trans, dp,
- be32_to_cpu(ohead->oh_len));
- break;
- xfs_warn(log->l_mp, "%s: bad transaction",
- __func__);
- ASSERT(0);
- error = -EIO;
- break;
- error = xlog_recover_add_to_trans(log, trans,
- dp, be32_to_cpu(ohead->oh_len));
- break;
- xfs_warn(log->l_mp, "%s: bad flag 0x%x",
- __func__, flags);
- ASSERT(0);
- error = -EIO;
- break;
- }
- if (error) {
- xlog_recover_free_trans(trans);
- return error;
- }
- }
+
+ error = xlog_recovery_process_ophdr(log, rhash, rhead, ohead,
+ dp, lp, pass);
+ if (error)
+ return error;
+
dp += be32_to_cpu(ohead->oh_len);
num_logops--;
}
--
2.0.0
_______________________________________________
xfs mailing list
http://oss.sgi.com/mailman/listinfo/xfs

Dave Chinner

2014-08-26 22:34:07 UTC

Permalink

Post by Brian Foster

/* we should find a trans for anything other than a start op */
trans = xlog_recover_find_tid(&rhash[hash], tid);
if (((ohead->oh_flags & XLOG_START_TRANS) && trans) ||
(!(ohead->oh_flags & XLOG_START_TRANS) && !trans)) {
xfs_warn(log->l_mp, "%s: bad transaction 0x%x oh_flags 0x%x trans %p",
__func__, tid, ohead->oh_flags, trans);
ASSERT(0);
return -EIO;
}
Maybe returning error here is not the right thing to do because we want
the recovery to proceed. We could still dump a warning and return 0
though.

Urk. Try understanding why that logic exists in a couple of years
time when you've forgetten all the context. :/

Post by Brian Foster

xlog_recover_new_tid(&rhash[hash], tid, be64_to_cpu(rhead->h_lsn)
error = 0;
break;

I like the idea, but I don't like the suggested implementation. I
was in two minds as to whether I should factor
xlog_recover_find_tid() further. There's only one caller of it and
only one caller of xlog_recover_new_tid() and the happen within
three lines of each other. Hence I'm thinking that it makes more
sense to wrap the "find or allocate trans" code in a single helper
and lift all that logic clean out of this function. That helper can
handle all the XLOG_START_TRANS logic more cleanly, I think....

Actually, that makes the factoring I've already done a little
inconsistent. Let me rework this a bit.

Cheers,

Dave.

--
Dave Chinner
***@fromorbit.com

Brian Foster

2014-08-27 11:19:10 UTC

Permalink

Post by Dave Chinner

Post by Brian Foster

/* we should find a trans for anything other than a start op */
trans = xlog_recover_find_tid(&rhash[hash], tid);
if (((ohead->oh_flags & XLOG_START_TRANS) && trans) ||
(!(ohead->oh_flags & XLOG_START_TRANS) && !trans)) {
xfs_warn(log->l_mp, "%s: bad transaction 0x%x oh_flags 0x%x trans %p",
__func__, tid, ohead->oh_flags, trans);
ASSERT(0);
return -EIO;
}
Maybe returning error here is not the right thing to do because we want
the recovery to proceed. We could still dump a warning and return 0
though.

Urk. Try understanding why that logic exists in a couple of years
time when you've forgetten all the context. :/

Heh, that's the problem I had with the current code. The error checking
and state machine management is split between here and below. The above
is just an error check, and fwiw, it adds an extra check that doesn't
exist in the current code. Hide the flag busy-ness and effectively the
logic is:

/* verify a tx is in progress or we're starting a new one */
if (trans && is_start_header(ohead) ||
!trans && !is_start_header(ohead))
return -EIO;

... which seems straightforward to me, but I'm sure there are other ways
to refactor things as well.

Post by Dave Chinner

Post by Brian Foster

xlog_recover_new_tid(&rhash[hash], tid, be64_to_cpu(rhead->h_lsn)
error = 0;
break;

Fair enough, that sounds reasonable so long as it isn't pulling the core
state management off into disparate functions. What I like about the
above (combined with the result of the rest of this series) is that the
lifecycle is obvious and contained in a single block of code.

Brian

Post by Dave Chinner
Actually, that makes the factoring I've already done a little
inconsistent. Let me rework this a bit.
Cheers,
Dave.
--
Dave Chinner

Dave Chinner

2014-08-28 00:47:17 UTC

Permalink

Post by Brian Foster

Post by Dave Chinner

Post by Brian Foster

xlog_recover_new_tid(&rhash[hash], tid, be64_to_cpu(rhead->h_lsn)
error = 0;
break;

Well, it's not "state" as in "state machine". What we are doing is
decoding ophdrs, not walking through a state machine, and so I think
that the "start" ophdrs need to be treated differently because all
the other types of ophdrs have a dependency on the trans structure
existing.

Indeed, I suspect the correct thing to do is check for the start
flag *first*, before we do the lookup to find the trans structure,
because the start flag implies that we should not find an existing
trans structure for that tid. And once we've done that, we're
completely finished processing that ophdr, and hence should
return and not run any of the other code we run for all the
remaining ophdrs.

Cheers,

Dave.

--
Dave Chinner
***@fromorbit.com

Dave Chinner

2014-08-26 01:21:41 UTC

Permalink

From: Dave Chinner <***@redhat.com>

The code for managing transactions anf the items for recovery is
spread across 3 different locations in the file. Move them all
together so that it is easy to read the code without needing to jump
long distances in the file.

Signed-off-by: Dave Chinner <***@redhat.com>
---
fs/xfs/xfs_log_recover.c | 358 +++++++++++++++++++++++------------------------
1 file changed, 179 insertions(+), 179 deletions(-)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 23895d5..b36d96a 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1441,160 +1441,6 @@ xlog_clear_stale_blocks(
******************************************************************************
*/

-STATIC xlog_recover_t *
-xlog_recover_find_tid(
- struct hlist_head *head,
- xlog_tid_t tid)
-{
- xlog_recover_t *trans;
-
- hlist_for_each_entry(trans, head, r_list) {
- if (trans->r_log_tid == tid)
- return trans;
- }
- return NULL;
-}
-
-STATIC void
-xlog_recover_new_tid(
- struct hlist_head *head,
- xlog_tid_t tid,
- xfs_lsn_t lsn)
-{
- xlog_recover_t *trans;
-
- trans = kmem_zalloc(sizeof(xlog_recover_t), KM_SLEEP);
- trans->r_log_tid = tid;
- trans->r_lsn = lsn;
- INIT_LIST_HEAD(&trans->r_itemq);
-
- INIT_HLIST_NODE(&trans->r_list);
- hlist_add_head(&trans->r_list, head);
-}
-
-STATIC void
-xlog_recover_add_item(
- struct list_head *head)
-{
- xlog_recover_item_t *item;
-
- item = kmem_zalloc(sizeof(xlog_recover_item_t), KM_SLEEP);
- INIT_LIST_HEAD(&item->ri_list);
- list_add_tail(&item->ri_list, head);
-}
-
-STATIC int
-xlog_recover_add_to_cont_trans(
- struct xlog *log,
- struct xlog_recover *trans,
- xfs_caddr_t dp,
- int len)
-{
- xlog_recover_item_t *item;
- xfs_caddr_t ptr, old_ptr;
- int old_len;
-
- if (list_empty(&trans->r_itemq)) {
- /* finish copying rest of trans header */
- xlog_recover_add_item(&trans->r_itemq);
- ptr = (xfs_caddr_t) &trans->r_theader +
- sizeof(xfs_trans_header_t) - len;
- memcpy(ptr, dp, len); /* d, s, l */
- return 0;
- }
- /* take the tail entry */
- item = list_entry(trans->r_itemq.prev, xlog_recover_item_t, ri_list);
-
- old_ptr = item->ri_buf[item->ri_cnt-1].i_addr;
- old_len = item->ri_buf[item->ri_cnt-1].i_len;
-
- ptr = kmem_realloc(old_ptr, len+old_len, old_len, KM_SLEEP);
- memcpy(&ptr[old_len], dp, len); /* d, s, l */
- item->ri_buf[item->ri_cnt-1].i_len += len;
- item->ri_buf[item->ri_cnt-1].i_addr = ptr;
- trace_xfs_log_recover_item_add_cont(log, trans, item, 0);
- return 0;
-}
-
-/*
- * The next region to add is the start of a new region. It could be
- * a whole region or it could be the first part of a new region. Because
- * of this, the assumption here is that the type and size fields of all
- * format structures fit into the first 32 bits of the structure.
- *
- * This works because all regions must be 32 bit aligned. Therefore, we
- * either have both fields or we have neither field. In the case we have
- * neither field, the data part of the region is zero length. We only have
- * a log_op_header and can throw away the header since a new one will appear
- * later. If we have at least 4 bytes, then we can determine how many regions
- * will appear in the current log item.
- */
-STATIC int
-xlog_recover_add_to_trans(
- struct xlog *log,
- struct xlog_recover *trans,
- xfs_caddr_t dp,
- int len)
-{
- xfs_inode_log_format_t *in_f; /* any will do */
- xlog_recover_item_t *item;
- xfs_caddr_t ptr;
-
- if (!len)
- return 0;
- if (list_empty(&trans->r_itemq)) {
- /* we need to catch log corruptions here */
- if (*(uint *)dp != XFS_TRANS_HEADER_MAGIC) {
- xfs_warn(log->l_mp, "%s: bad header magic number",
- __func__);
- ASSERT(0);
- return -EIO;
- }
- if (len == sizeof(xfs_trans_header_t))
- xlog_recover_add_item(&trans->r_itemq);
- memcpy(&trans->r_theader, dp, len); /* d, s, l */
- return 0;
- }
-
- ptr = kmem_alloc(len, KM_SLEEP);
- memcpy(ptr, dp, len);
- in_f = (xfs_inode_log_format_t *)ptr;
-
- /* take the tail entry */
- item = list_entry(trans->r_itemq.prev, xlog_recover_item_t, ri_list);
- if (item->ri_total != 0 &&
- item->ri_total == item->ri_cnt) {
- /* tail item is in use, get a new one */
- xlog_recover_add_item(&trans->r_itemq);
- item = list_entry(trans->r_itemq.prev,
- xlog_recover_item_t, ri_list);
- }
-
- if (item->ri_total == 0) { /* first region to be added */
- if (in_f->ilf_size == 0 ||
- in_f->ilf_size > XLOG_MAX_REGIONS_IN_ITEM) {
- xfs_warn(log->l_mp,
- "bad number of regions (%d) in inode log format",
- in_f->ilf_size);
- ASSERT(0);
- kmem_free(ptr);
- return -EIO;
- }
-
- item->ri_total = in_f->ilf_size;
- item->ri_buf =
- kmem_zalloc(item->ri_total * sizeof(xfs_log_iovec_t),
- KM_SLEEP);
- }
- ASSERT(item->ri_total > item->ri_cnt);
- /* Description region is ri_buf[0] */
- item->ri_buf[item->ri_cnt].i_addr = ptr;
- item->ri_buf[item->ri_cnt].i_len = len;
- item->ri_cnt++;
- trace_xfs_log_recover_item_add(log, trans, item, 0);
- return 0;
-}
-
/*
* Sort the log items in the transaction.
*
@@ -3250,31 +3096,6 @@ xlog_recover_do_icreate_pass2(
return 0;
}

-/*
- * Free up any resources allocated by the transaction
- *
- * Remember that EFIs, EFDs, and IUNLINKs are handled later.
- */
-STATIC void
-xlog_recover_free_trans(
- struct xlog_recover *trans)
-{
- xlog_recover_item_t *item, *n;
- int i;
-
- list_for_each_entry_safe(item, n, &trans->r_itemq, ri_list) {
- /* Free the regions in the item. */
- list_del(&item->ri_list);
- for (i = 0; i < item->ri_cnt; i++)
- kmem_free(item->ri_buf[i].i_addr);
- /* Free the item itself */
- kmem_free(item->ri_buf);
- kmem_free(item);
- }
- /* Free the transaction recover structure */
- kmem_free(trans);
-}
-
STATIC void
xlog_recover_buffer_ra_pass2(
struct xlog *log,
@@ -3528,6 +3349,185 @@ out:
return error ? error : error2;
}

+
+STATIC xlog_recover_t *
+xlog_recover_find_tid(
+ struct hlist_head *head,
+ xlog_tid_t tid)
+{
+ xlog_recover_t *trans;
+
+ hlist_for_each_entry(trans, head, r_list) {
+ if (trans->r_log_tid == tid)
+ return trans;
+ }
+ return NULL;
+}
+
+STATIC void
+xlog_recover_new_tid(
+ struct hlist_head *head,
+ xlog_tid_t tid,
+ xfs_lsn_t lsn)
+{
+ xlog_recover_t *trans;
+
+ trans = kmem_zalloc(sizeof(xlog_recover_t), KM_SLEEP);
+ trans->r_log_tid = tid;
+ trans->r_lsn = lsn;
+ INIT_LIST_HEAD(&trans->r_itemq);
+
+ INIT_HLIST_NODE(&trans->r_list);
+ hlist_add_head(&trans->r_list, head);
+}
+
+STATIC void
+xlog_recover_add_item(
+ struct list_head *head)
+{
+ xlog_recover_item_t *item;
+
+ item = kmem_zalloc(sizeof(xlog_recover_item_t), KM_SLEEP);
+ INIT_LIST_HEAD(&item->ri_list);
+ list_add_tail(&item->ri_list, head);
+}
+
+STATIC int
+xlog_recover_add_to_cont_trans(
+ struct xlog *log,
+ struct xlog_recover *trans,
+ xfs_caddr_t dp,
+ int len)
+{
+ xlog_recover_item_t *item;
+ xfs_caddr_t ptr, old_ptr;
+ int old_len;
+
+ if (list_empty(&trans->r_itemq)) {
+ /* finish copying rest of trans header */
+ xlog_recover_add_item(&trans->r_itemq);
+ ptr = (xfs_caddr_t) &trans->r_theader +
+ sizeof(xfs_trans_header_t) - len;
+ memcpy(ptr, dp, len);
+ return 0;
+ }
+ /* take the tail entry */
+ item = list_entry(trans->r_itemq.prev, xlog_recover_item_t, ri_list);
+
+ old_ptr = item->ri_buf[item->ri_cnt-1].i_addr;
+ old_len = item->ri_buf[item->ri_cnt-1].i_len;
+
+ ptr = kmem_realloc(old_ptr, len+old_len, old_len, KM_SLEEP);
+ memcpy(&ptr[old_len], dp, len);
+ item->ri_buf[item->ri_cnt-1].i_len += len;
+ item->ri_buf[item->ri_cnt-1].i_addr = ptr;
+ trace_xfs_log_recover_item_add_cont(log, trans, item, 0);
+ return 0;
+}
+
+/*
+ * The next region to add is the start of a new region. It could be
+ * a whole region or it could be the first part of a new region. Because
+ * of this, the assumption here is that the type and size fields of all
+ * format structures fit into the first 32 bits of the structure.
+ *
+ * This works because all regions must be 32 bit aligned. Therefore, we
+ * either have both fields or we have neither field. In the case we have
+ * neither field, the data part of the region is zero length. We only have
+ * a log_op_header and can throw away the header since a new one will appear
+ * later. If we have at least 4 bytes, then we can determine how many regions
+ * will appear in the current log item.
+ */
+STATIC int
+xlog_recover_add_to_trans(
+ struct xlog *log,
+ struct xlog_recover *trans,
+ xfs_caddr_t dp,
+ int len)
+{
+ xfs_inode_log_format_t *in_f; /* any will do */
+ xlog_recover_item_t *item;
+ xfs_caddr_t ptr;
+
+ if (!len)
+ return 0;
+ if (list_empty(&trans->r_itemq)) {
+ /* we need to catch log corruptions here */
+ if (*(uint *)dp != XFS_TRANS_HEADER_MAGIC) {
+ xfs_warn(log->l_mp, "%s: bad header magic number",
+ __func__);
+ ASSERT(0);
+ return -EIO;
+ }
+ if (len == sizeof(xfs_trans_header_t))
+ xlog_recover_add_item(&trans->r_itemq);
+ memcpy(&trans->r_theader, dp, len);
+ return 0;
+ }
+
+ ptr = kmem_alloc(len, KM_SLEEP);
+ memcpy(ptr, dp, len);
+ in_f = (xfs_inode_log_format_t *)ptr;
+
+ /* take the tail entry */
+ item = list_entry(trans->r_itemq.prev, xlog_recover_item_t, ri_list);
+ if (item->ri_total != 0 &&
+ item->ri_total == item->ri_cnt) {
+ /* tail item is in use, get a new one */
+ xlog_recover_add_item(&trans->r_itemq);
+ item = list_entry(trans->r_itemq.prev,
+ xlog_recover_item_t, ri_list);
+ }
+
+ if (item->ri_total == 0) { /* first region to be added */
+ if (in_f->ilf_size == 0 ||
+ in_f->ilf_size > XLOG_MAX_REGIONS_IN_ITEM) {
+ xfs_warn(log->l_mp,
+ "bad number of regions (%d) in inode log format",
+ in_f->ilf_size);
+ ASSERT(0);
+ kmem_free(ptr);
+ return -EIO;
+ }
+
+ item->ri_total = in_f->ilf_size;
+ item->ri_buf =
+ kmem_zalloc(item->ri_total * sizeof(xfs_log_iovec_t),
+ KM_SLEEP);
+ }
+ ASSERT(item->ri_total > item->ri_cnt);
+ /* Description region is ri_buf[0] */
+ item->ri_buf[item->ri_cnt].i_addr = ptr;
+ item->ri_buf[item->ri_cnt].i_len = len;
+ item->ri_cnt++;
+ trace_xfs_log_recover_item_add(log, trans, item, 0);
+ return 0;
+}
+/*
+ * Free up any resources allocated by the transaction
+ *
+ * Remember that EFIs, EFDs, and IUNLINKs are handled later.
+ */
+STATIC void
+xlog_recover_free_trans(
+ struct xlog_recover *trans)
+{
+ xlog_recover_item_t *item, *n;
+ int i;
+
+ list_for_each_entry_safe(item, n, &trans->r_itemq, ri_list) {
+ /* Free the regions in the item. */
+ list_del(&item->ri_list);
+ for (i = 0; i < item->ri_cnt; i++)
+ kmem_free(item->ri_buf[i].i_addr);
+ /* Free the item itself */
+ kmem_free(item->ri_buf);
+ kmem_free(item);
+ }
+ /* Free the transaction recover structure */
+ kmem_free(trans);
+}
+
STATIC int
xlog_recovery_process_ophdr(
struct xlog *log,

--
2.0.0

Dave Chinner

2014-08-26 01:21:40 UTC

Permalink

From: Dave Chinner <***@redhat.com>

When an error occurs during buffer submission in
xlog_recover_commit_trans(), we free the trans structure twice. Fix
it by only freeing the structure in the caller regardless of the
success or failure of the function.

Signed-off-by: Dave Chinner <***@redhat.com>
---
fs/xfs/xfs_log_recover.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 460cf98..23895d5 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -3524,8 +3524,6 @@ out:
if (!list_empty(&done_list))
list_splice_init(&done_list, &trans->r_itemq);

- xlog_recover_free_trans(trans);
-
error2 = xfs_buf_delwri_submit(&buffer_list);
return error ? error : error2;
}
@@ -3571,6 +3569,11 @@ xlog_recovery_process_ophdr(
if (flags & XLOG_WAS_CONT_TRANS)
flags &= ~XLOG_CONTINUE_TRANS;

+ /*
+ * Callees must not free the trans structure. We own it, so we'll decide
+ * if we need to free it or not based on the operation being done and
+ * it's result.
+ */
switch (flags) {
/* expected flag values */
case 0:
@@ -3582,7 +3585,8 @@ xlog_recovery_process_ophdr(
break;
case XLOG_COMMIT_TRANS:
error = xlog_recover_commit_trans(log, trans, pass);
- break;
+ xlog_recover_free_trans(trans);
+ return error;

/* unexpected flag values */
case XLOG_UNMOUNT_TRANS:

--
2.0.0

Brian Foster

2014-08-26 12:42:31 UTC

Permalink

Post by Dave Chinner
When an error occurs during buffer submission in
xlog_recover_commit_trans(), we free the trans structure twice. Fix
it by only freeing the structure in the caller regardless of the
success or failure of the function.
---
fs/xfs/xfs_log_recover.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 460cf98..23895d5 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
if (!list_empty(&done_list))
list_splice_init(&done_list, &trans->r_itemq);
- xlog_recover_free_trans(trans);
-
error2 = xfs_buf_delwri_submit(&buffer_list);
return error ? error : error2;
}
@@ -3571,6 +3569,11 @@ xlog_recovery_process_ophdr(
if (flags & XLOG_WAS_CONT_TRANS)
flags &= ~XLOG_CONTINUE_TRANS;
+ /*
+ * Callees must not free the trans structure. We own it, so we'll decide
+ * if we need to free it or not based on the operation being done and
+ * it's result.

its

Post by Dave Chinner
+ */
switch (flags) {
/* expected flag values */
@@ -3582,7 +3585,8 @@ xlog_recovery_process_ophdr(
break;
error = xlog_recover_commit_trans(log, trans, pass);
- break;
+ xlog_recover_free_trans(trans);
+ return error;
/* unexpected flag values */
--
2.0.0
_______________________________________________
xfs mailing list
http://oss.sgi.com/mailman/listinfo/xfs

Brian Foster

2014-08-26 12:40:55 UTC

Permalink

Post by Dave Chinner
Hi folks,
The log recovery use-after-free that Brian posted a patch for has
had several previous attempts sent and none have completed review.
So let's put this one to bed for good.
This patch set addresses the previous review feedback for fixing
this problem. It factors xlog_recover_process_data() to make it
cleaner and isolate the context of the transaction recvoery
structure that is causing problems. It fixes a leak of the structure
in an error condition that I noticed while factoring, as well as the
double free that Brian and others have identified and tried to fix
in the past. It then re-arranges the recovery item management code
to put it all in one place, rather than in 3 separate parts of the
file separated by several hundred lines of unrelated code.
Comments?

This looks pretty good to me. I like the cleanups and it fixes the
problem. A few minor comments/questions to follow.

Brian

Post by Dave Chinner
-Dave.
_______________________________________________
xfs mailing list
http://oss.sgi.com/mailman/listinfo/xfs