Discussion:
Hello, I have a question about XFS File System
Yongmin
2014-03-06 09:15:27 UTC
Hello.

My name is Yongmin Park and I am a graduate student at Ajou University (Korea).
My research area is digital forensics.
I am currently trying to understand the structure of the XFS filesystem, because XFS is one of the best-known filesystems for very large storage these days.

I have already found and read 'XFS Filesystem Structure 2nd Edition Revision 1' on the Internet, which was written by Silicon Graphics Inc. in 2006, and it is very well written and easy to understand.

But my main focus is "Deleted File Recovery", so the journaling part is really important to me; regretfully, though, there is no specific guide to the journaling part...
Also, a newer version (perhaps a 3rd Edition) has not appeared for more than 5 years.

So is there no guide to the journaling part of XFS?
How can I get one - do I have to buy it? Or is analysing the source code the only way to study it?

Thank you for your attention.

=======================
from Yongmin Park
=======================
Stan Hoeppner
2014-03-06 20:30:57 UTC
Post by Yongmin
Hello.
My name is Yongmin Park and I am a graduate student at Ajou
University (Korea). My research area is digital forensics. I am
currently trying to understand the structure of the XFS filesystem,
because XFS is one of the best-known filesystems for very large
storage these days.
I have already found and read 'XFS Filesystem Structure 2nd Edition
Revision 1' on the Internet, which was written by Silicon Graphics
Inc. in 2006, and it is very well written and easy to understand.
But my main focus is "Deleted File Recovery", so the journaling
part is really important to me; regretfully, though, there is no
specific guide to the journaling part... Also, a newer version
(perhaps a 3rd Edition) has not appeared for more than 5 years.
So is there no guide to the journaling part of XFS? How can I get
one - do I have to buy it? Or is analysing the source code the only
way to study it?
The journal only contains in flight transactional metadata for recovery
purposes after a system crash or power loss, to prevent filesystem, i.e.
metadata, corruption. The journal does not contain file data. During
normal operation, once the metadata has been written into an allocation
group the transactional entry in the journal is removed. Thus,
recovering deleted files has nothing to do with the journal.

This may be helpful:
http://xfs.org/index.php/XFS_FAQ#Q:_Does_the_filesystem_have_an_undelete_capability.3F
--
Stan
Stan Hoeppner
2014-03-07 22:19:44 UTC
Please reply to the mailing list as well as the individual.

Note that you stated:

'...my main focus is "Deleted File Recovery"'
Yes! There is no actual file data in the journaling part.
BUT, by analyzing the journal, we can get the inode core information of a file which was deleted.
In the inode core there is a lot of information about the actual data, i.e. start address, file length, etc.
Analyzing the journal code may inform you about structures, but it won't
inform you about on disk locations of the structures and how to find
them. If a file has been deleted, no information about that is going to
exist in the journal for more than a few seconds before the transaction
is committed and the entry removed from the journal.
By using that information, recovering a deleted file can be done.
So analysis of the journaling part is absolutely needed.
I disagree. Again, the journal log is unrelated to "deleted file
recovery" in a forensics scenario.

I think Dave and Jeff both missed the fact that you're interested only
in deleted file recovery, not in learning how the journal works for the
sake of learning how the journal works.
=======================
from Yongmin Park
=======================
Post by Stan Hoeppner
Post by Yongmin
Hello.
My name is Yongmin Park and I am a graduate student at Ajou
University (Korea). My research area is digital forensics. I am
currently trying to understand the structure of the XFS filesystem,
because XFS is one of the best-known filesystems for very large
storage these days.
I have already found and read 'XFS Filesystem Structure 2nd Edition
Revision 1' on the Internet, which was written by Silicon Graphics
Inc. in 2006, and it is very well written and easy to understand.
But my main focus is "Deleted File Recovery", so the journaling
part is really important to me; regretfully, though, there is no
specific guide to the journaling part... Also, a newer version
(perhaps a 3rd Edition) has not appeared for more than 5 years.
So is there no guide to the journaling part of XFS? How can I get
one - do I have to buy it? Or is analysing the source code the only
way to study it?
The journal only contains in flight transactional metadata for recovery
purposes after a system crash or power loss, to prevent filesystem, i.e.
metadata, corruption. The journal does not contain file data. During
normal operation, once the metadata has been written into an allocation
group the transactional entry in the journal is removed. Thus,
recovering deleted files has nothing to do with the journal.
http://xfs.org/index.php/XFS_FAQ#Q:_Does_the_filesystem_have_an_undelete_capability.3F
--
Stan
Shaun Gosse
2014-03-07 22:40:31 UTC
Stan,

If I understand what you're saying here correctly, it sounds like there would still be a very tiny window where the journal could be relevant, those "few seconds" before it's committed as you said. So it would be a rather small corner case, but there might be some use. And I think it was already stated to be an academic project...

This does make me curious in turn about how difficult it would be to recover journal entries. At a guess, if a person knows the structure and it hasn't been overwritten, it'll still be there? Or is it automatically overwritten/zeroed when the entry is removed from the journal, perhaps as the very mechanism of removal? And presumably this window, if any, would also be rather small assuming an active filesystem (an inactive one presumably being irrelevant...unless, perhaps, it was one where the last action, arbitrarily long ago, was a critical delete operation...).

Cheers,
-Shaun

-----Original Message-----
From: xfs-***@oss.sgi.com [mailto:xfs-***@oss.sgi.com] On Behalf Of Stan Hoeppner
Sent: Friday, March 07, 2014 4:20 PM
To: Yongmin; ***@oss.sgi.com
Subject: Re: Hello, I have a question about XFS File System

Please reply to the mailing list as well as the individual.

Note that you stated:

'...my main focus is "Deleted File Recovery"'
Yes! There is no actual file data in the journaling part.
BUT, by analyzing the journal, we can get the inode core information of a file which was deleted.
In the inode core there is a lot of information about the actual data, i.e. start address, file length, etc.
Analyzing the journal code may inform you about structures, but it won't inform you about on disk locations of the structures and how to find them. If a file has been deleted, no information about that is going to exist in the journal for more than a few seconds before the transaction is committed and the entry removed from the journal.
By using that information, recovering a deleted file can be done.
So analysis of the journaling part is absolutely needed.
I disagree. Again, the journal log is unrelated to "deleted file recovery" in a forensics scenario.

I think Dave and Jeff both missed the fact that you're interested only in deleted file recovery, not in learning how the journal works for the sake of learning how the journal works.
=======================
from Yongmin Park
=======================
Post by Stan Hoeppner
Post by Yongmin
Hello.
My name is Yongmin Park and I am a graduate student at Ajou
University (Korea). My research area is digital forensics. I am
currently trying to understand the structure of the XFS filesystem,
because XFS is one of the best-known filesystems for very large
storage these days.
I have already found and read 'XFS Filesystem Structure 2nd Edition
Revision 1' on the Internet, which was written by Silicon Graphics
Inc. in 2006, and it is very well written and easy to understand.
But my main focus is "Deleted File Recovery", so the journaling
part is really important to me; regretfully, though, there is no
specific guide to the journaling part... Also, a newer version
(perhaps a 3rd Edition) has not appeared for more than 5 years.
So is there no guide to the journaling part of XFS? How can I get
one - do I have to buy it? Or is analysing the source code the only
way to study it?
The journal only contains in flight transactional metadata for
recovery purposes after a system crash or power loss, to prevent filesystem, i.e.
metadata, corruption. The journal does not contain file data. During
normal operation, once the metadata has been written into an
allocation group the transactional entry in the journal is removed.
Thus, recovering deleted files has nothing to do with the journal.
http://xfs.org/index.php/XFS_FAQ#Q:_Does_the_filesystem_have_an_undelete_capability.3F
--
Stan

_______________________________________________
xfs mailing list
***@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
Stan Hoeppner
2014-03-08 02:22:09 UTC
Post by Shaun Gosse
Stan,
If I understand what you're saying here correctly, it sounds like
there would still be a very tiny window where the journal could be
relevant, those "few seconds" before it's committed as you said. So
it would be a rather small corner case, but there might be some use.
And I think it was already stated to be an academic project...
It could be in the log for milliseconds, many minutes, hours, or even
days, or months, depending on the rate of metadata write activity. XFS
is still primarily for "large and lots". Most organizations using XFS
probably don't have idle journal logs, but very active ones.
Post by Shaun Gosse
This does makes me curious in turn about how difficult it would be to
recover journal entries. At a guess, if a person knows the structure
and it hasn't been overwritten, it'll still be there? Or is it
automatically overwritten/zero'd when the entry is removed from the
journal, perhaps as the very mechanism of removal? And presumably
this window, if any, would also be rather small assuming an active
filesystem (and an inactive one presumably irrelevant...unless,
perhaps, it was one where the last action, arbitrarily long ago, was
a critical delete operation...).
How often are forensics experts brought in within minutes, hours, or
days of an incident of such magnitude prompting them to be hired?
Forensics is typically performed long after the fact, in which case
there's almost zero chance any relevant information will be in the
filesystem journal.
--
Stan
Dave Chinner
2014-03-07 23:09:15 UTC
Post by Stan Hoeppner
Please reply to the mailing list as well as the individual.
'...my main focus is "Deleted File Recovery"'
Yes! There is no actual file data in the journaling part.
BUT, by analyzing the journal, we can get the inode core information of a file which was deleted.
In the inode core there is a lot of information about the actual data, i.e. start address, file length, etc.
Analyzing the journal code may inform you about structures, but it won't
inform you about on disk locations of the structures and how to find
them. If a file has been deleted, no information about that is going to
exist in the journal for more than a few seconds before the transaction
is committed and the entry removed from the journal.
Well, we don't actually "remove" information from the log. We update
pointers that indicate what the active region is, but we never
physically "remove" anything from it. IOWs, the information is in
the journal until it wraps around and is overwritten by new
checkpoints....
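
A toy sketch of that idea (illustration only - these are made-up
structures and names, not the real XFS log code):

/* toy_log.c - illustration only, not the real XFS structures.
 * The point: "completing" a record just moves the tail pointer;
 * the record's bytes stay in the circular buffer until the head
 * wraps around and overwrites them with new records. */
#include <stdio.h>

#define LOG_SIZE 64

struct toy_log {
        unsigned char buf[LOG_SIZE];
        size_t head;    /* where the next record will be written */
        size_t tail;    /* start of the active region */
};

static void log_write(struct toy_log *log, const char *rec, size_t len)
{
        for (size_t i = 0; i < len; i++)
                log->buf[(log->head + i) % LOG_SIZE] = (unsigned char)rec[i];
        log->head = (log->head + len) % LOG_SIZE;
}

static void log_complete(struct toy_log *log, size_t len)
{
        /* nothing is erased or zeroed here - only the pointer moves */
        log->tail = (log->tail + len) % LOG_SIZE;
}

int main(void)
{
        struct toy_log log = { { 0 }, 0, 0 };

        log_write(&log, "unlink inode 1029", 17);
        log_complete(&log, 17);

        /* the "completed" record is still sitting in the buffer */
        printf("%.17s\n", (char *)log.buf);
        return 0;
}
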
Post by Stan Hoeppner
By using that information, recovering a deleted file can be done.
So analysis of the journaling part is absolutely needed.
I disagree. Again, the journal log is unrelated to "deleted file
recovery" in a forensics scenario.
I think Dave and Jeff both missed the fact that you're interested only
in deleted file recovery, not in learning how the journal works for the
sake of learning how the journal works.
Oh, no, I saw it and didn't think it was worth commenting on. I
think it's a brain-dead concept trying to do undelete in the
filesystem. "recoverable delete" was a problem solved 30 years ago -
it's commonly known as a trash bin and you do it in userspace with a
wrapper around unlink that calls rename(2) instead. And then "empty
trashbin" is what does the unlink and permanently deletes the files.

Besides, from a conceptual point of view after-the-fact filesystem
based undelete is fundamentally flawed. i.e. the journal is a
write-ahead logging journal and so can only be used to roll the
filesystem state forward in time. Undelete requires having state
and data in the journal that allows the filesystem to be rolled
*backwards in time*. XFS simply does not record such information in
the log and so parsing the log to "undelete files by transaction
rollback" just doesn't work.

Cheers,

Dave.
--
Dave Chinner
***@fromorbit.com
Greg Freemyer
2014-03-08 00:38:15 UTC
Post by Dave Chinner
Post by Stan Hoeppner
Please reply to the mailing list as well as the individual.
'...my main focus is "Deleted File Recovery"'
Yes! There is no actual file data in the journaling part.
BUT, by analyzing the journal, we can get the inode core information of a file which was deleted.
In the inode core there is a lot of information about the actual data, i.e. start address, file length, etc.
Analyzing the journal code may inform you about structures, but it won't
inform you about on disk locations of the structures and how to find
them. If a file has been deleted, no information about that is going to
exist in the journal for more than a few seconds before the transaction
is committed and the entry removed from the journal.
Well, we don't actually "remove" information from the log. We update
pointers that indicate what the active region is, but we never
physically "remove" anything from it. IOWs, the information is in
the journal until it wraps around and is overwritten by new
checkpoints....
Post by Stan Hoeppner
By using that information, recovering a deleted file can be done.
So analysis of the journaling part is absolutely needed.
I disagree. Again, the journal log is unrelated to "deleted file
recovery" in a forensics scenario.
I think Dave and Jeff both missed the fact that you're interested only
in deleted file recovery, not in learning how the journal works for the
sake of learning how the journal works.
Oh, no, I saw it and didn't think it was worth commenting on. I
think it's a brain-dead concept trying to do undelete in the
filesystem. "recoverable delete" was a problem solved 30 years ago -
it's commonly known as a trash bin and you do it in userspace with a
wrapper around unlink that calls rename(2) instead. And then "empty
trashbin" is what does the unlink and permanently deletes the files.
As a practicing forensicator, I can say "potentially recoverable
files" is a heavily used concept.

I don't know the XFS on-disk structure in detail, so I'll comment about
HFS+ and NTFS.

NTFS uses a simple linear on-disk directory structure and inode (MFT)
structure. Forensic tools look at existing directory structures that
have directory entries marked invalid, and they also scan the
unallocated space for directory remnants that were left behind by a
delete operation. When either is found, they assume the data pointed
at is "potentially recoverable".

I think some tools also parse the NTFS journal, but it is a much less
useful mechanism.

Note that it is understood that the data pointed at by filesystem
remnants may have been overwritten by new data, so it is unreliable.

As a last resort, data carving tools equivalent to photorec are used.
Photorec depends on the files not being fragmented and also fails to
recover any filesystem metadata such as filenames, timestamps, etc.
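
For what it's worth, the core of that kind of signature carving is
roughly this (a toy sketch for a single file type; real carvers handle
many formats, look for footers, validate structure, and cope with
signatures that span buffer boundaries):

/* carve.c - toy signature scan: report every offset in a raw image
 * where a JPEG header (FF D8 FF) starts. */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
        static const unsigned char magic[] = { 0xff, 0xd8, 0xff };
        static unsigned char buf[1 << 20];
        unsigned long long base = 0;
        size_t n;
        FILE *f;

        if (argc != 2 || !(f = fopen(argv[1], "rb"))) {
                fprintf(stderr, "usage: %s <raw-image>\n", argv[0]);
                return 1;
        }

        while ((n = fread(buf, 1, sizeof(buf), f)) > 0) {
                for (size_t i = 0; i + sizeof(magic) <= n; i++)
                        if (!memcmp(buf + i, magic, sizeof(magic)))
                                printf("possible JPEG at offset %llu\n",
                                       base + i);
                base += n;
        }
        fclose(f);
        return 0;
}
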

Macs use HFS+ which uses a btree structure for the directory
structure. I'm not aware of any forensic tool that attempts to look
for invalid directory remnants with HFS+. Instead they go straight to
data carving. It turns out with HFS+ the files are often not
fragmented, so data carving is very successful, but you still don't
get any filesystem metadata.

I believe XFS is similar to HFS+ in this sense. Data carving would
likely have good results, but looking for filesystem metadata remnants
is a waste of time. If I'm right, there is not really any research
needed on the deleted file recovery side.

At the same time, I'm not familiar with any forensic tool, free or
commercial, that parses the XFS filesystem for live files and
filesystem metadata such as timestamps. (I could be wrong, it is not
something I've had to research in the last few years.)
Post by Dave Chinner
Besides, from a conceptual point of view after-the-fact filesystem
based undelete is fundamentally flawed. i.e. the journal is a
write-ahead logging journal and so can only be used to roll the
filesystem state forward in time. Undelete requires having state
and data in the journal that allows the filesystem to be rolled
*backwards in time*. XFS simply does not record such information in
the log and so parsing the log to "undelete files by transaction
rollback" just doesn't work.
I suspect you are assuming the goal is "reliable undelete". The
typical goal is the identification of potentially recoverable files.
I don't know if parsing the XFS journal can help with that.

Greg
Dave Chinner
2014-03-09 00:28:19 UTC
Post by Greg Freemyer
Post by Dave Chinner
Post by Stan Hoeppner
Please reply to the mailing list as well as the individual.
'...my main focus is "Deleted File Recovery"'
Yes! There is no actual file data in the journaling part.
BUT, by analyzing the journal, we can get the inode core information of a file which was deleted.
In the inode core there is a lot of information about the actual data, i.e. start address, file length, etc.
Analyzing the journal code may inform you about structures, but it won't
inform you about on disk locations of the structures and how to find
them. If a file has been deleted, no information about that is going to
exist in the journal for more than a few seconds before the transaction
is committed and the entry removed from the journal.
Well, we don't actually "remove" information from the log. We update
pointers that indicate what the active region is, but we never
physically "remove" anything from it. IOWs, the information is in
the journal until it wraps around and is overwritten by new
checkpoints....
Post by Stan Hoeppner
By using that information, recovering a deleted file can be done.
So analysis of the journaling part is absolutely needed.
I disagree. Again, the journal log is unrelated to "deleted file
recovery" in a forensics scenario.
I think Dave and Jeff both missed the fact that you're interested only
in deleted file recovery, not in learning how the journal works for the
sake of learning how the journal works.
Oh, no, I saw it and didn't think it was worth commenting on. I
think it's a brain-dead concept trying to do undelete in the
filesystem. "recoverable delete" was a problem solved 30 years ago -
it's commonly known as a trash bin and you do it in userspace with a
wrapper around unlink that calls rename(2) instead. And then "empty
trashbin" is what does the unlink and permanently deletes the files.
As a practicing forensicator, I can say "potentially recoverable
files" is a heavily used concept.
"forensicator"? I'm so behind on my terminology. :/

Despite not having some fancy title, I do forensic analysis of
filesystem corpses *every day*. I have to do this in far more
detail than a typical forensic analysis for data recovery or
security breach post-mortem purposes because I've got to find the
bug in the metadata that led to the corruption or data loss in the
first place....
Post by Greg Freemyer
I don't know XFS on disk structure in detail, so I'll comment about
HFS+ and NTFS.
[snip]

XFS metadata is fully dynamic, so take the problems you have with
HFS btree directory structures and apply them to *every* metadata
structure in XFS.
Post by Greg Freemyer
As a last resort, data carving tools equivalent to photorec are used.
Photorec depends on the files not being fragmented and also fails to
recover any filesystem metadata such as filenames, timestamps, etc.
Well, yes - anyone can grep a disk image for digital signatures and
then hope the files have been stored contiguously.

More advanced users write utilities like xfs_irecover or xfsr that
walk the filesystem looking for inode magic numbers and manually
decode the inode metadata themselves to recover the data they
reference even when the files are not contiguous. These only work on
disconnected but still-intact inodes, though, because unlinked inodes
don't have any useful metadata left in them:

xfs_db> inode 1029
xfs_db> p
core.magic = 0x494e
core.mode = 0
core.version = 2
core.format = 2 (extents)
core.nlinkv2 = 0
core.onlink = 0
core.projid_lo = 0
core.projid_hi = 0
core.uid = 0
core.gid = 0
core.flushiter = 2
core.atime.sec = Sun Mar 9 10:44:22 2014
core.atime.nsec = 821908000
core.mtime.sec = Sun Mar 9 10:44:22 2014
core.mtime.nsec = 821908000
core.ctime.sec = Sun Mar 9 10:45:02 2014
core.ctime.nsec = 273908000
core.size = 0
core.nblocks = 0
core.extsize = 0
core.nextents = 0
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.immutable = 0
core.append = 0
core.sync = 0
core.noatime = 0
core.nodump = 0
core.rtinherit = 0
core.projinherit = 0
core.nosymlinks = 0
core.extsz = 0
core.extszinherit = 0
core.nodefrag = 0
core.filestream = 0
core.gen = 1
next_unlinked = null
u = (empty)
xfs_db> daddr 0x202
xfs_db> p
....
100: 494e0000 02020000 00000000 00000000 00000000 00000000 00000000 00000002
120: 531bab56 30fd5220 531bab56 30fd5220 531bab7e 10538120 00000000 00000000
140: 00000000 00000000 00000000 00000000 00000002 00000000 00000000 00000001
160: ffffffff 00000000 00000000 00000000 00000000 00000000 00000000 00000000
180: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
1a0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
1c0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
1e0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

So, it has non-zero timestamps in it, but everything else is zeroed.
IOWs, you can't get any information about what it contained before
it was unlinked by looking at the unlinked inode on disk.

But that wasn't the original question being asked, though. ;)
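
For illustration, the core of the magic-number walk mentioned above is
roughly this (a sketch only; the 512-byte alignment is an assumption,
and a real tool reads the superblock for the actual inode size and
then decodes the inode core and extent list):

/* inoscan.c - toy scan of a raw XFS image for inode candidates.
 * XFS inode cores start with the two-byte magic "IN" (0x494e, as in
 * the xfs_db dump above). */
#include <stdio.h>

#define STEP 512        /* assumed inode size/alignment */

int main(int argc, char **argv)
{
        unsigned char buf[STEP];
        unsigned long long off = 0;
        FILE *f;

        if (argc != 2 || !(f = fopen(argv[1], "rb"))) {
                fprintf(stderr, "usage: %s <raw-image>\n", argv[0]);
                return 1;
        }

        while (fread(buf, 1, STEP, f) == STEP) {
                if (buf[0] == 0x49 && buf[1] == 0x4e)   /* "IN" */
                        printf("inode candidate at offset %llu\n", off);
                off += STEP;
        }
        fclose(f);
        return 0;
}
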
Post by Greg Freemyer
At the same time, I'm not familiar with forensic tool, free or
commercial, that parses the XFS filesystem for live files and
filesystem metadata such as timestamps. (I could be wrong, it is not
something I've had to research in the last few years.)
Finding lost and/or unreferenced *live* metadata is what xfs_repair
does. I've mentioned other tools above that can also be used for
recovering stuff that xfs_repair won't find because it doesn't do a
magic number search of every filesystem block in the fs...
Post by Greg Freemyer
Post by Dave Chinner
Besides, from a conceptual point of view after-the-fact filesystem
based undelete is fundamentally flawed. i.e. the journal is a
write-ahead logging journal and so can only be used to roll the
filesystem state forward in time. Undelete requires having state
and data in the journal that allows the filesystem to be rolled
*backwards in time*. XFS simply does not record such information in
the log and so parsing the log to "undelete files by transaction
rollback" just doesn't work.
I suspect you are assuming the goal is "reliable undelete". The
Not at all.
Post by Greg Freemyer
typical goal is the identification of potentially recoverable files.
I don't know if parsing the XFS journal can help with that.
That's exactly the question I answered: journal parsing is mostly
useless for forensic analysis of XFS filesystems with respect to
recovery of deleted files.

Cheers,

Dave.
--
Dave Chinner
***@fromorbit.com
Jay Ashworth
2014-03-10 17:53:42 UTC
----- Original Message -----
Post by Dave Chinner
Post by Greg Freemyer
As a practicing forensicator, I can say "potentially recoverable
files" is a heavily used concept.
"forensicator"? I'm so behind on my terminology. :/
This fall, on HBO: David Duchovny in Califorensicator.

Cheers,
-- jra
--
Jay R. Ashworth Baylink ***@baylink.com
Designer The Things I Think RFC 2100
Ashworth & Associates http://www.bcp38.info 2000 Land Rover DII
St Petersburg FL USA BCP38: Ask For It By Name! +1 727 647 1274
Stan Hoeppner
2014-03-08 02:08:56 UTC
Post by Dave Chinner
Post by Stan Hoeppner
Please reply to the mailing list as well as the individual.
'...my main focus is "Deleted File Recovery"'
Yes! There is no actual file data in the journaling part.
BUT, by analyzing the journal, we can get the inode core information of a file which was deleted.
In the inode core there is a lot of information about the actual data, i.e. start address, file length, etc.
Analyzing the journal code may inform you about structures, but it won't
inform you about on disk locations of the structures and how to find
them. If a file has been deleted, no information about that is going to
exist in the journal for more than a few seconds before the transaction
is committed and the entry removed from the journal.
Well, we don't actually "remove" information from the log. We update
pointers that indicate what the active region is, but we never
physically "remove" anything from it. IOWs, the information is in
the journal until it wraps around and is overwritten by new
checkpoints....
Quite right. I sacrificed some technical accuracy to drive home the
larger point, that the journal shouldn't be relied upon for forensic
retrieval of deleted files.
Post by Dave Chinner
Post by Stan Hoeppner
By using that information, recovering a deleted file can be done.
So analysis of the journaling part is absolutely needed.
I disagree. Again, the journal log is unrelated to "deleted file
recovery" in a forensics scenario.
I think Dave and Jeff both missed the fact that you're interested only
in deleted file recovery, not in learning how the journal works for the
sake of learning how the journal works.
Oh, no, I saw it and didn't think it was worth commenting on. I
think it's a brain-dead concept trying to do undelete in the
filesystem. "recoverable delete" was a problem solved 30 years ago -
it's commonly known as a trash bin and you do it in userspace with a
wrapper around unlink that calls rename(2) instead. And then "empty
trashbin" is what does the unlink and permanently deletes the files.
Besides, from a conceptual point of view after-the-fact filesystem
based undelete is fundamentally flawed. i.e. the journal is a
write-ahead logging journal and so can only be used to roll the
filesystem state forward in time. Undelete requires having state
and data in the journal that allows the filesystem to be rolled
*backwards in time*. XFS simply does not record such information in
the log and so parsing the log to "undelete files by transaction
rollback" just doesn't work.
Sometimes context gets lost. In his first paragraph he stated he's a
graduate student and his research area is digital forensics. So the
discussion about "deleted file recovery" needs to be in the forensics
context. As you explain above, and as Greg Freemyer pointed out,
looking at filesystem metadata or journals for information that will
assist in the recovery of previously deleted files is usually not going
to be fruitful.
--
Stan
Eric Sandeen
2014-03-08 03:24:53 UTC
Post by Stan Hoeppner
Post by Dave Chinner
Post by Stan Hoeppner
Please reply to the mailing list as well as the individual.
'...my main focus is "Deleted File Recovery"'
Yes! There is no actual file data in the journaling part.
BUT, by analyzing the journal, we can get the inode core information of a file which was deleted.
In the inode core there is a lot of information about the actual data, i.e. start address, file length, etc.
Analyzing the journal code may inform you about structures, but it won't
inform you about on disk locations of the structures and how to find
them. If a file has been deleted, no information about that is going to
exist in the journal for more than a few seconds before the transaction
is committed and the entry removed from the journal.
Well, we don't actually "remove" information from the log. We update
pointers that indicate what the active region is, but we never
physically "remove" anything from it. IOWs, the information is in
the journal until it wraps around and is overwritten by new
checkpoints....
Quite right. I sacrificed some technical accuracy to drive home the
larger point, that the journal shouldn't be relied upon for forensic
retrieval of deleted files.
Post by Dave Chinner
Post by Stan Hoeppner
By using that information, recovering a deleted file can be done.
So analysis of the journaling part is absolutely needed.
I disagree. Again, the journal log is unrelated to "deleted file
recovery" in a forensics scenario.
I think Dave and Jeff both missed the fact that you're interested only
in deleted file recovery, not in learning how the journal works for the
sake of learning how the journal works.
Oh, no, I saw it and didn't think it was worth commenting on. I
think it's a brain-dead concept trying to do undelete in the
filesystem. "recoverable delete" was a problem solved 30 years ago -
it's commonly known as a trash bin and you do it in userspace with a
wrapper around unlink that calls rename(2) instead. And then "empty
trashbin" is what does the unlink and permanently deletes the files.
Besides, from a conceptual point of view after-the-fact filesystem
based undelete is fundamentally flawed. i.e. the journal is a
write-ahead logging journal and so can only be used to roll the
filesystem state forward in time. Undelete requires having state
and data in the journal that allows the filesystem to be rolled
*backwards in time*. XFS simply does not record such information in
the log and so parsing the log to "undelete files by transaction
rollback" just doesn't work.
Sometimes context gets lost. In his first paragraph he stated he's a
graduate student and his research area is digital forensics. So the
discussion about "deleted file recovery" needs to be in the forensics
context. As you explain above, and as Greg Freemyer pointed out,
looking at filesystem metadata or journals for information that will
assist in the recovery of previously deleted files is usually not going
to be fruitful.
Well, I think that's a good point. "Looking at the log to see if there
are any clues for manual recovery" is a very different problem from
"Use the log for automated, guaranteed successful undelete" :)

-Eric
Dave Chinner
2014-03-06 22:59:47 UTC
Post by Yongmin
Hello.
My name is Yongmin Park and I am a graduate student at Ajou University (Korea).
My research area is digital forensics.
I am currently trying to understand the structure of the XFS filesystem, because XFS is one of the best-known filesystems for very large storage these days.
I have already found and read 'XFS Filesystem Structure 2nd Edition Revision 1' on the Internet, which was written by Silicon Graphics Inc. in 2006, and it is very well written and easy to understand.
But my main focus is "Deleted File Recovery", so the journaling part is really important to me; regretfully, though, there is no specific guide to the journaling part...
Also, a newer version (perhaps a 3rd Edition) has not appeared for more than 5 years.
So is there no guide to the journaling part of XFS?
How can I get one - do I have to buy it? Or is analysing the source code the only way to study it?
There is some documentation about some of the logging concepts and
design. eg:

http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/xfs-documentation.git;a=blob;f=design/xfs-delayed-logging-design.asciidoc

But the only way to learn about the actual structure of the log is to
read the code and use xfs_logprint to study the contents of the log.
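
For example, on an unmounted filesystem, something along these lines
(see xfs_logprint(8) for the exact options; the device name here is
only a placeholder):

# xfs_logprint -t /dev/sdb1      (transactional view of the log)
# xfs_logprint -d /dev/sdb1      (dump of the raw log records)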

Cheers,

Dave.
--
Dave Chinner
***@fromorbit.com
Jeff Liu
2014-03-07 02:23:38 UTC
Post by Dave Chinner
Post by Yongmin
Hello.
My name is Yongmin Park and I am a graduate student at Ajou University (Korea).
My research area is digital forensics.
I am currently trying to understand the structure of the XFS filesystem, because XFS is one of the best-known filesystems for very large storage these days.
I have already found and read 'XFS Filesystem Structure 2nd Edition Revision 1' on the Internet, which was written by Silicon Graphics Inc. in 2006, and it is very well written and easy to understand.
But my main focus is "Deleted File Recovery", so the journaling part is really important to me; regretfully, though, there is no specific guide to the journaling part...
Also, a newer version (perhaps a 3rd Edition) has not appeared for more than 5 years.
So is there no guide to the journaling part of XFS?
How can I get one - do I have to buy it? Or is analysing the source code the only way to study it?
There is some documentation about some of the logging concepts and
http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/xfs-documentation.git;a=blob;f=design/xfs-delayed-logging-design.asciidoc
I'm not sure if anyone else also thinks that the XFS journal design is the
stumbling block to getting involved in development... but in the past year I
heard "I'm really confused by the design of delayed logging, I had to give up
after reading the document for about 2 or 3 weeks..." from 2 Chinese
developers, though nothing helps without a great deal of patience.
Post by Dave Chinner
But the only way to learn about the actual structure of the log is to
read the code and use xfs_logprint to study the contents of the log.
To Yongmin,

For your information only.

I'm trying to understand the XFS journal via the following steps:

1) Download the Linux 2.6.34 source and read the journal code.
Understand the original design, as there was no delayed-logging support at that time.

FYI, two obsolete documents can be found at:
http://oss.sgi.com/projects/xfs/design_docs/xfsdocs93_pdf/log_mgr-overview.pdf
http://oss.sgi.com/projects/xfs/design_docs/xfsdocs93_pdf/log_mgr.pdf

2) Download the Linux 2.6.35 source, and read the journal code and the delayed-logging design doc as
per Dave's suggestion, because this big change landed in that version.

3) Play with xfs_logprint against the XFS mainline source, and read the threads on the XFS
mailing list from the past several years that are related to the journal....

4) Nothing, just have fun. :)


Thanks,
-Jeff
Dave Chinner
2014-03-07 04:19:17 UTC
Post by Jeff Liu
Post by Dave Chinner
Post by Yongmin
Hello.
My name is Yongmin Park and I am a graduate student at Ajou University (Korea).
My research area is digital forensics.
I am currently trying to understand the structure of the XFS filesystem, because XFS is one of the best-known filesystems for very large storage these days.
I have already found and read 'XFS Filesystem Structure 2nd Edition Revision 1' on the Internet, which was written by Silicon Graphics Inc. in 2006, and it is very well written and easy to understand.
But my main focus is "Deleted File Recovery", so the journaling part is really important to me; regretfully, though, there is no specific guide to the journaling part...
Also, a newer version (perhaps a 3rd Edition) has not appeared for more than 5 years.
So is there no guide to the journaling part of XFS?
How can I get one - do I have to buy it? Or is analysing the source code the only way to study it?
There is some documentation about some of the logging concepts and
http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/xfs-documentation.git;a=blob;f=design/xfs-delayed-logging-design.asciidoc
I'm not sure if anyone else also thinks that the XFS journal design is the
stumbling block to getting involved in development... but in the past year I
heard "I'm really confused by the design of delayed logging, I had to give up
after reading the document for about 2 or 3 weeks..." from 2 Chinese
developers, though nothing helps without a great deal of patience.
Let me put it this way: it took me *5 years* of working deep in the
XFS code to really understand how the XFS transaction and
journalling subsystems are supposed to function. Delayed logging
took me 3 failed design attempts over 3 years before I had learnt
enough to come up with a design that worked. It's by far the most
complex part of XFS - expecting to understand how it works by
spending a couple of weeks reading the code is unrealistic.

Fundamentally, understanding delayed logging means you have to first
understand why relogging is necessary in the XFS journal. To
understand why relogging is necessary, you first need to understand
the transaction subsystem, the log space reservation subsystem, log
recovery constraints, how tail pushing works, the physical log
interface code, the on-disk log format, etc., and how they all
interact.

IOWs, delayed logging is the last thing in the journalling layer
that anyone should try to understand because understanding it fully
requires a high level of knowledge about the XFS metadata and
logging subsystem architecture and fundamental principles....
Post by Jeff Liu
Post by Dave Chinner
But the only way to learn about the actual structure of the log is to
read the code and use xfs_logprint to study the contents of the log.
To Yongmin,
For your information only.
1) Download Linux-2.6.34 source, read the journal code.
Understand the original design as there is no delayed-logging support at that time.
Delayed logging changes neither the journal nor the transaction
layer code or design. If you can't understand the fundamental
principles behind those subsystems from the current code, then
looking at the older code won't make it any clearer because it is
exactly the same...
Post by Jeff Liu
FYI, two obsolete documents can be found at:
http://oss.sgi.com/projects/xfs/design_docs/xfsdocs93_pdf/log_mgr-overview.pdf
http://oss.sgi.com/projects/xfs/design_docs/xfsdocs93_pdf/log_mgr.pdf
The first of those really doesn't contain much useful information.

The second really only documents the physical log format. That might
be useful as a first step, but it doesn't document any of the
algorithms that the log uses, and that is where all the complexity
lies.

Reading code will only get you so far - the only way to continue the
learning process is by trying to modify the code and fixing what you
break, along with asking questions about things you don't understand
on the list so that people who do understand them can teach you
things that aren't obvious from the code and aren't documented
anywhere other than the code.

Cheers,

Dave.
--
Dave Chinner
***@fromorbit.com
Jeff Liu
2014-03-07 05:23:53 UTC
<snip>
Post by Dave Chinner
Post by Jeff Liu
Post by Dave Chinner
There is some documentation about some of the logging concepts and
http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/xfs-documentation.git;a=blob;f=design/xfs-delayed-logging-design.asciidoc
I'm not sure if anyone else also thinks that the XFS journal design is the
stumbling block to getting involved in development... but in the past year I
heard "I'm really confused by the design of delayed logging, I had to give up
after reading the document for about 2 or 3 weeks..." from 2 Chinese
developers, though nothing helps without a great deal of patience.
Let me put it this way: it took me *5 years* of working deep in the
XFS code to really understand how the XFS transaction and
journalling subsystems are supposed to function. Delayed logging
took me 3 failed design attempts over 3 years before I had learnt
enough to come up with a design that worked. It's by far the most
complex part of XFS - expecting to understand how it works by
spending a couple of weeks reading the code is unrealistic.
Hah, this will help me relax a lot when I feel frustrated trying to understand
something in XFS :-P.
Post by Dave Chinner
Fundamentally, understanding delayed logging means you have to first
understand why relogging is necessary in the XFS journal. To
understand why relogging is necessary, you first need to understand
the transaction subsystem, the log space reservation subsystem, log
recovery constraints, how tail pushing works, the physical log
interface code, the on-disk log format, etc., and how they all
interact.
IOWs, delayed logging is the last thing in the journalling layer
that anyone should try to understand because understanding it fully
requires a high level of knowledge about the XFS metadata and
logging subsystem architecture and fundamental principles....
Thanks for the nice guidance.
Post by Dave Chinner
Post by Jeff Liu
Post by Dave Chinner
But the only way to learn about the actual structure of the log is to
read the code and use xfs_logprint to study the contents of the log.
To Yongmin,
For your information only.
1) Download Linux-2.6.34 source, read the journal code.
Understand the original design as there is no delayed-logging support at that time.
Delayed logging changes neither the journal nor the transaction
layer code or design. If you can't understand the fundamental
principles behind those subsystems from the current code, then
looking at the older code won't make it any clearer because it is
exactly the same...
Post by Jeff Liu
FYI, two obsolete documents can be found at:
http://oss.sgi.com/projects/xfs/design_docs/xfsdocs93_pdf/log_mgr-overview.pdf
http://oss.sgi.com/projects/xfs/design_docs/xfsdocs93_pdf/log_mgr.pdf
The first of those really doesn't contain much useful information.
The second really only documents the physical log format. That might
be useful as a first step, but it doesn't document any of the
algorithms that the log uses, and that is where all the complexity
lies.
Actually, both documents were only a little useful to me when I began to
understand the semantics of in-core logs.
Post by Dave Chinner
Reading code will only get you so far - the only way to continue the
learning process is by trying to modify the code and fixing what you
break, along with asking questions about things you don't understand
on the list so that people who do understand them can teach you
things that aren't obvious from the code and aren't documented
anywhere other than the code.
Thanks,
-Jeff