PLFS/FUSE fails with fsx #326

Open
lionkov opened this issue Nov 5, 2013 · 14 comments

Comments

@lionkov

lionkov commented Nov 5, 2013

Running the File System Exerciser (fsx, http://codemonkey.org.uk/projects/fsx/ltp-fsx.c) on a PLFS/FUSE mount fails:

hop0:~/tmp/plfs$ fsx -P /tmp -W -R te
mapped writes DISABLED
truncating to largest ever: 0x32740
READ BAD DATA: offset = 0xa935, size = 0x85fd
OFFSET GOOD BAD RANGE
0x10f13 0x0d02 0x016b 0x 1ff8
operation# (mod 256) for the bad data may be 107
LOG DUMP (13 total operations):
1(1 mod 256): WRITE 0x1f7e6 thru 0x2250e (0x2d29 bytes) HOLE _**WWWW
2(2 mod 256): WRITE 0x10f13 thru 0x16b11 (0x5bff bytes) __WWWW
3(3 mod 256): READ 0x1f48a thru 0x2250e (0x3085 bytes)
4(4 mod 256): READ 0x16511 thru 0x2250e (0xbffe bytes)
5(5 mod 256): READ 0x21699 thru 0x2250e (0xe76 bytes)
6(6 mod 256): READ 0x300 thru 0xe2e8 (0xdfe9 bytes)
7(7 mod 256): READ 0x891 thru 0x9083 (0x87f3 bytes)
8(8 mod 256): READ 0x19c62 thru 0x20cbd (0x705c bytes)
9(9 mod 256): TRUNCATE UP from 0x2250f to 0x32740
10(10 mod 256): READ 0xad21 thru 0xd152 (0x2432 bytes)
11(11 mod 256): WRITE 0x293c2 thru 0x34b07 (0xb746 bytes) EXTEND
12(12 mod 256): TRUNCATE DOWN from 0x34b08 to 0x1977a
13(13 mod 256): READ 0xa935 thru 0x12f31 (0x85fd bytes) _RRRR

Correct content saved for comparison
(maybe hexdump "te" vs "te.fsxgood")
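
To locate the damage without reading two hexdumps side by side, a minimal Python sketch along these lines could report where the two files first diverge. The function names are hypothetical and not part of fsx.

```python
def first_mismatch(actual: bytes, expected: bytes):
    """Return (offset, run_length) of the first differing run, or None if equal."""
    limit = min(len(actual), len(expected))
    start = next((i for i in range(limit) if actual[i] != expected[i]), None)
    if start is None:
        # Same bytes over the common prefix; a length difference still counts.
        if len(actual) == len(expected):
            return None
        return limit, abs(len(actual) - len(expected))
    end = start
    while end < limit and actual[end] != expected[end]:
        end += 1
    return start, end - start


def compare_files(actual_path, expected_path):
    """Read both files fully and report the first differing run."""
    with open(actual_path, "rb") as a, open(expected_path, "rb") as e:
        return first_mismatch(a.read(), e.read())
```

Calling `compare_files("te", "te.fsxgood")`, with the second path adjusted to wherever fsx saved the good copy, returns the offset and length of the first bad run. The length is only the first contiguous run of differing bytes and need not match the RANGE column that fsx prints.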

@brettkettering
Contributor

So, after searching for various strings, I'm guessing that the failure was that it read bad data. The output doesn't make it easy to figure out which test failed, or what the expected result was versus what was obtained. Oh, how I wish people wrote some English explanation of what they're trying to do in their code rather than letting the source be the documentation. I don't enjoy playing compiler.

So, Lucho, what do you want us to do with this information?

Brett

@lionkov
Author

lionkov commented Nov 5, 2013

Download and run fsx. Unlike some other tests, it produces both the "bad"
and "good" file contents, as well as a log of the file operations that
produced the error.

@lionkov
Author

lionkov commented Nov 5, 2013

Additional information:

Run on a single node; the underlying file system is ext4.

Configuration file .plfsrc:

- global_params:
  threadpool_size: 64
  compress_contiguous: 0
- mount_point: /home/lucho/tmp/plfs
  workload: shared_file
  backends:
  - location: /tmp/plfs_store

@thewacokid
Contributor

Looks like a failure in truncate somewhere (most likely due to calling a partial truncate on a file that's open in read/write mode).

Do we care?

I get different results, btw, on my MacBook:

pn1245359:plfs_n1 dbonnie$ ./a.out -P /tmp -W -R test.file
mapped writes DISABLED
truncating to largest ever: 0x32740
Size error: expected 0x1977a stat 0x17000 seek 0x17000
LOG DUMP (12 total operations):
1(1 mod 256): WRITE 0x1f7e6 thru 0x2250e (0x2d29 bytes) HOLE
2(2 mod 256): WRITE 0x10f13 thru 0x16b11 (0x5bff bytes)
3(3 mod 256): READ 0x1f48a thru 0x2250e (0x3085 bytes)
4(4 mod 256): READ 0x16511 thru 0x2250e (0xbffe bytes)
5(5 mod 256): READ 0x21699 thru 0x2250e (0xe76 bytes)
6(6 mod 256): READ 0x300 thru 0xe2e8 (0xdfe9 bytes)
7(7 mod 256): READ 0x891 thru 0x9083 (0x87f3 bytes)
8(8 mod 256): READ 0x19c62 thru 0x20cbd (0x705c bytes)
9(9 mod 256): TRUNCATE UP from 0x2250f to 0x32740
10(10 mod 256): READ 0xad21 thru 0xd152 (0x2432 bytes)
11(11 mod 256): WRITE 0x293c2 thru 0x34b07 (0xb746 bytes) EXTEND
12(12 mod 256): TRUNCATE DOWN from 0x34b08 to 0x1977a
Correct content saved for comparison
(maybe hexdump "test.file" vs "test.file.fsxgood")

pn1245359:plfs_n1 dbonnie$ hexdump test.file
0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0019770

pn1245359:plfs_n1 dbonnie$ hexdump /tmp//test.file.fsxgood
0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0010f10 00 00 00 0d 02 6a 02 6b 02 17 02 df 02 79 02 e6
0010f20 02 4b 02 5e 02 4e 02 13 02 f0 02 93 02 2e 02 cd
0010f30 02 2a 02 fd 02 3c 02 4d 02 50 02 f2 02 6c 02 04
0010f40 02 fc 02 53 02 ff 02 6f 02 e1 02 43 02 34 02 2f
.
.
.
SNIPPED FOR BREVITY
.
.
.
0016af0 02 a0 02 f2 02 d7 02 82 02 71 02 21 02 31 02 91
0016b00 02 e9 02 9b 02 bd 02 04 02 a1 02 f5 02 fc 02 e2
0016b10 02 68 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0016b20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0019770
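
The "Size error" line above compares an expected size with what stat and seek report. A rough Python sketch of that kind of check, assuming a plain POSIX file descriptor, might look like this. `size_report` and `demo_size` are made-up names, not fsx functions.

```python
import os
import tempfile


def size_report(fd, expected):
    """Compare the expected size with what fstat and lseek report for fd."""
    stat_size = os.fstat(fd).st_size
    saved = os.lseek(fd, 0, os.SEEK_CUR)      # remember the current offset
    seek_size = os.lseek(fd, 0, os.SEEK_END)  # offset of end-of-file
    os.lseek(fd, saved, os.SEEK_SET)          # restore the offset
    return {
        "expected": expected,
        "stat": stat_size,
        "seek": seek_size,
        "ok": stat_size == seek_size == expected,
    }


def demo_size():
    """Truncate a scratch file up to 0x32740, then down to 0x1977a, and check it."""
    with tempfile.TemporaryFile() as f:
        f.truncate(0x32740)
        f.truncate(0x1977a)
        return size_report(f.fileno(), 0x1977a)
```

On a local file system `demo_size()["ok"]` should be true. Running the same check against a file on the PLFS mount is what would show whether stat and seek disagree with the expected size.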

@thewacokid
Contributor

The test passed completely with the "-L" flag to ignore truncate operations.

@brettkettering
Contributor

We tell people to use O_RDONLY or O_WRONLY, but not O_RDWR. So, as long as we can open a file with O_TRUNC and then write to it, I think we're OK.

Brett
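
The access pattern described here (truncate on open, write-only, then read back through a separate read-only open) can be sketched in a few lines of Python. This is only an illustration on ordinary file descriptors; `write_then_read` and `demo_write_only` are hypothetical names, and the scratch path would need to point at a PLFS mount to say anything about PLFS.

```python
import os
import tempfile


def write_then_read(path, payload):
    """Overwrite path via O_WRONLY|O_TRUNC, then read it back via O_RDONLY."""
    # Seed the file with longer stale content so O_TRUNC has something to discard.
    with open(path, "wb") as f:
        f.write(b"x" * (len(payload) + 16))

    fd = os.open(path, os.O_WRONLY | os.O_TRUNC)
    try:
        os.write(fd, payload)
    finally:
        os.close(fd)

    fd = os.open(path, os.O_RDONLY)
    try:
        # Ask for more than was written so leftover stale bytes would show up.
        return os.read(fd, len(payload) + 16)
    finally:
        os.close(fd)


def demo_write_only():
    """Run the pattern on a scratch file in a temporary directory."""
    with tempfile.TemporaryDirectory() as d:
        return write_then_read(os.path.join(d, "te"), b"hello plfs")
```

If the read returns exactly the payload, the O_TRUNC open discarded the old content and no O_RDWR handle was involved at any point.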

@dshrader
Contributor

dshrader commented Nov 6, 2013

We already have tests that test O_TRUNC, don't we? I'll check with Alfred. If we do, then I'm not sure what this test provides beyond what we have already (regression suite and posix test suite).

@lionkov
Author

lionkov commented Nov 6, 2013

If a test fails and the tests you already have don't, obviously it provides
something beyond what you already have :)

@johnbent
Member

johnbent commented Nov 6, 2013

I think we prefer to be as POSIX compliant as possible without sacrificing performance for our core workloads and without massive internal code complexity. This should be low on the priority list, but it's higher than zero, I think.

@dshrader
Contributor

dshrader commented Nov 6, 2013

We actually already have a lot of tests that fail. The POSIX test suite that we benchmark against reports a lot of failures that we will never fix, because PLFS is not 100% POSIX compliant and has no plans to be. The issue with O_RDWR and O_TRUNC is known and reproducible in our existing tests. It seems that, so far, fsx has told us nothing new. We need to analyze the remaining tests that fsx runs to find out whether it covers anything new.

To follow up on the source: O_RDWR is always used when O_TRUNC is used on an open call in ltp-fsx.c. So using the -L command line parameter that David B. pointed out is a good idea unless we modify the source.

@lionkov
Author

lionkov commented Nov 6, 2013

I don't think the issue is only with O_TRUNC; it is with truncate. I
removed O_TRUNC from the fsx source and it still fails. The problem could be
the same one that makes plfs fail for O_TRUNC, though.
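
A stripped-down reproducer for this, independent of fsx, could replay only the write and truncate steps from the log in the original report (operations 1, 2, 9, 11 and 12) on a single O_RDWR descriptor and compare the result with an in-memory model. This is a sketch under assumptions: `replay` and `replay_in_tempdir` are hypothetical names, the fill bytes are arbitrary rather than fsx's data pattern, and the READ operations are skipped.

```python
import os
import tempfile

# (operation, offset or new size, length) taken from the fsx log dump.
OPS = [
    ("write", 0x1f7e6, 0x2d29),
    ("write", 0x10f13, 0x5bff),
    ("truncate", 0x32740, None),
    ("write", 0x293c2, 0xb746),
    ("truncate", 0x1977a, None),
]


def replay(path):
    """Replay OPS on one O_RDWR descriptor; return True if the file matches the model."""
    model = bytearray()
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for n, (op, a, b) in enumerate(OPS, start=1):
            if op == "write":
                data = bytes([n]) * b          # arbitrary, distinct fill per write
                os.pwrite(fd, data, a)
                if len(model) < a + b:         # writing past EOF leaves a zero hole
                    model.extend(bytes(a + b - len(model)))
                model[a:a + b] = data
            else:
                os.ftruncate(fd, a)
                if a < len(model):
                    del model[a:]              # truncate down discards the tail
                else:
                    model.extend(bytes(a - len(model)))  # truncate up zero-fills
        size = os.fstat(fd).st_size
        content = os.pread(fd, size, 0)
    finally:
        os.close(fd)
    return size == len(model) and content == bytes(model)


def replay_in_tempdir():
    """Sanity check on a local scratch file."""
    with tempfile.TemporaryDirectory() as d:
        return replay(os.path.join(d, "te"))
```

`replay_in_tempdir()` is only a baseline on a local file system. Passing a path under the PLFS mount point to `replay` is the actual experiment.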

@thewacokid
Contributor

Exactly - truncate, used in combination with O_RDWR, is a known issue.

@dshrader
Contributor

dshrader commented Nov 6, 2013

It looks like every file that ltp-fsx opens, except for the log file, is opened with O_RDWR. We should still take a look at what else ltp-fsx does to make sure we cover all of that functionality somewhere. I don't know if we can include fsx directly in our testing suite due to licensing (I really hate licensing red tape), but it would still be good to make sure we test the same things.

@lionkov
Author

lionkov commented Nov 6, 2013

BTW, I think the right way to "fix" this is to make truncate fail in case
of O_RDWR, instead of producing a file with incorrect content.
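
As a rough illustration of that idea on a plain POSIX file descriptor, a guard could check the access mode before truncating. This is not PLFS code, and the choice of EINVAL as the error is an assumption. `guarded_ftruncate` and `demo_guard` are hypothetical names.

```python
import errno
import fcntl
import os
import tempfile

# Mask covering the access-mode bits of the open flags.
ACCESS_MASK = os.O_RDONLY | os.O_WRONLY | os.O_RDWR


def guarded_ftruncate(fd, length):
    """Truncate fd, but refuse outright if it was opened O_RDWR."""
    access = fcntl.fcntl(fd, fcntl.F_GETFL) & ACCESS_MASK
    if access == os.O_RDWR:
        # Fail loudly instead of risking a file with incorrect content.
        raise OSError(errno.EINVAL, "truncate refused on O_RDWR descriptor")
    os.ftruncate(fd, length)


def demo_guard():
    """Try the guard with a write-only and a read/write descriptor."""
    results = {}
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "te")
        with open(path, "wb") as f:
            f.write(b"x" * 100)
        for name, flags in (("O_WRONLY", os.O_WRONLY), ("O_RDWR", os.O_RDWR)):
            fd = os.open(path, flags)
            try:
                guarded_ftruncate(fd, 10)
                results[name] = "ok"
            except OSError as exc:
                results[name] = errno.errorcode[exc.errno]
            finally:
                os.close(fd)
    return results
```

With this guard the write-only truncate goes through and the read/write one is rejected, so a caller sees an error rather than silently corrupted data.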
