
Getting bad_alloc Error on N-1 Strided using Certain Block Sizes #251

Open
atorrez opened this issue May 30, 2013 · 40 comments

@atorrez
Contributor

atorrez commented May 30, 2013

In running a sweep of N-1 strided performance using fs_test and PLFS version 2.4rc2 on rrz, I got bad_alloc errors when opening the file for read. This happened with a 32k block size; other block sizes produced a mix of errors and successes.

Here is the command line I used:
mpirun -n 32 /users/atorrez/Testing-ext/fs_test/fs_test/fs_test.rrz.x -strided 1 -target plfs:/var/tmp/plfs.atorrez1/atorrez/out.%s -io mpi -time 300 -barriers aopen -shift -type 2 -sync -size 32768

Dave Bonnie is seeing the same errors on smog and rrz using smaller block sizes and shorter -time parameters.

@thewacokid
Contributor

So here are some preliminary findings:

On smog:

aprun -n 16 ./fs_test.smog.x -target plfs://users/dbonnie/plfs_n1/out.%s -io mpi -time 5 -barriers aopen -shift -type 2 -sync -size 1 -nodb -strided 1

A single node writing 1-byte blocks will cause this with only a 5-second write time when using a glib backend (because glib is MUCH faster than posix).

On rrz:

mpirun -n 16 ./fs_test.localhost.x -strided 1 -target plfs:/users/dbonnie/plfs_n1/out.%s -io mpi -time 30 -barriers aopen -shift -type 2 -sync -size 1 -nodb

This causes a failure in 30 seconds (the longer write time is needed because PanFS, even with glib, is much slower than smog's Lustre).

I have a sneaking feeling this is an index bug or something, because these runs create millions and millions of index entries. I have confirmed that this also occurs with compress_contiguous set to false in the plfsrc.

My plfsrc files contain very similar entries between the two systems:

rrz:

- global_params:
    threadpool_size: 1
    num_hostdirs: 27
- mount_point: /users/dbonnie/plfs_n1
  backends:
    - location: glib:///panfs/scratch1/vol20/dbonnie/n1

smog:

- mount_point: /users/dbonnie/plfs_n1
  workload: n-1
  backends:
    - location: glib:///lustre/lscratch/dbonnie/plfs_1/n1
- global_params:
    threadpool_size: 1
    num_hostdirs: 27

So my first thought is that this may be present in older releases as well, but it's hard to test. I think Alfred or I will try to confirm that soon with a longer test suite for 2.3.

@thewacokid
Contributor

Also, the errors change between runs, which makes me think there is memory corruption in some form or another, although bad_alloc seems to be the most common failure.

@thewacokid
Contributor

I've narrowed this down a bit more -- I'm almost positive the problem lies in the indexes when they get to a certain size.

So far the following causes the error:

128 procs (16 nodes), 70,000 1-byte writes per proc. Each index is 3.3 MB, so total memory usage per node for indexes should be about 3.4 GB (each node has 16 GB)...so this probably shouldn't fail.

16 procs (2 nodes), 400,000 1-byte writes per proc. Each index is 28 MB, so total memory usage for indexes should be about 3.5 GB per node (each node has 16 GB)...so this probably shouldn't fail either.

1 proc, 25,000,000 1-byte writes. Total index is 1.2 GB.

So clearly something fishy is going on here. I'm going to test at larger scales and with an older version of PLFS to see if I can narrow this down a bit better.

@thewacokid
Contributor

Updates:

512 procs (64 nodes), 15,000 1-byte writes each. Each index is 704 KB, total usage per node is ~2.75 GB.

All of the above are from RRZ. Smog is a bit different but fails at similar counts per node (though memory usage for indexes at 1088 procs on Smog results in ~12 GB...way more than what fails on RRZ).

The strided access is what's causing every write to be a separate index entry, but it shouldn't be crashing where it is.

@thewacokid
Contributor

Single node on RRZ: 8 procs, 1,000,000 objects each; roughly the same memory usage in aggregate is needed to crash.

@thewacokid
Contributor

Confirmed that the exact same workload fails on PLFS 2.3...about 46 MB of index per proc, so loading it on a single node uses around 2.9 GB of RAM. There's 16 GB on a node on RRZ.
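For reference, here is a rough back-of-envelope check of that single-node case. This is only a sketch: the ~46 bytes per on-disk entry comes from the 46 MB droppings for 1,000,000 writes reported above, while the ×3 in-memory expansion factor and the assumption that every proc ends up holding a full copy of the merged global index are guesses, not measurements from the PLFS code.

// minimal sketch: estimate index memory for 8 procs x 1,000,000 one-byte strided writes
#include <cstdio>

int main() {
    const double procs_per_node  = 8;
    const double writes_per_proc = 1000000;         // one index entry per strided write
    const double bytes_per_entry = 46e6 / 1000000;  // ~46 bytes, from the 46 MB droppings above
    const double expansion       = 3;               // ASSUMED in-memory expansion factor

    double ondisk_per_proc = writes_per_proc * bytes_per_entry;          // ~46 MB
    double global_index    = procs_per_node * ondisk_per_proc;           // ~368 MB merged
    // ASSUMPTION: every proc holds its own copy of the merged global index
    double node_rss        = procs_per_node * global_index * expansion;  // ~8.8 GB

    printf("on-disk index per proc:            %.0f MB\n", ondisk_per_proc / 1e6);
    printf("merged global index on disk:       %.0f MB\n", global_index / 1e6);
    printf("node memory if each proc copies it: %.1f GB\n", node_rss / 1e9);
    return 0;
}

Under those assumptions a 16 GB node would need roughly 8 x 368 MB x 3 ≈ 8.8 GB just for indexes, which, together with fs_test's own buffers and heap fragmentation, is plausibly enough to exhaust memory.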

@johnbent
Member

johnbent commented Jun 3, 2013

I'm just curious how our regression failed to find this in 2.3 but did for 2.4?


@thewacokid
Contributor

2.4 is faster than 2.3. The test was right on the edge of failing with 2.3 in terms of object counts after 300 seconds of writing.

@johnbent
Member

johnbent commented Jun 3, 2013

That's pretty awesome actually.

Why faster?


@thewacokid
Contributor

Various little things all adding up I think.

I might have jumped the gun a bit on 2.3 failing - I had linked things weirdly for my custom MPI and I think I hooked into that accidentally for my 2.3 test. I'll update when I figure out what I managed to muck up. :)

@thewacokid
Contributor

Okay - fixed my install and I've gotten 2.3 to fail with OpenMPI 1.6.4 on RRZ in the exact same manner as 2.4rc2.

@thewacokid
Contributor

Watching a run on a single node I was able to see what's happening.

  1. The write phase runs and completes normally with low memory usage.
  2. The call to MPI_File_open causes a single proc to load the index. Memory usage exceeds 5 GB for this proc.
  3. The proc that loaded the index then (I assume) distributes this to the other procs which causes the available memory to drop to 0.
  4. Hammer...err...Crash time.

Here's what top shows as the memory is rising on one proc:

11459 dbonnie 20 0 134m 13m 4636 R 100.2 0.1 2:45.79 fs_test.localho
11460 dbonnie 20 0 135m 13m 4668 R 100.2 0.1 2:45.92 fs_test.localho
11463 dbonnie 20 0 134m 13m 4632 R 99.9 0.1 2:45.71 fs_test.localho
11457 dbonnie 20 0 134m 13m 4620 R 99.6 0.1 2:45.31 fs_test.localho
11461 dbonnie 20 0 134m 13m 4632 R 99.6 0.1 2:46.01 fs_test.localho
11462 dbonnie 20 0 134m 13m 4616 R 99.6 0.1 2:45.25 fs_test.localho
11456 dbonnie 20 0 2372m 2.2g 5636 R 98.9 14.0 2:45.71 fs_test.localho
11458 dbonnie 20 0 134m 13m 4644 R 98.9 0.1 2:46.02 fs_test.localho

That proc will rise to just over 5.6 GB, then the following happens:

11525 dbonnie 20 0 1581m 1.4g 4860 R 100.6 9.1 3:47.22 fs_test.localho
11523 dbonnie 20 0 4656m 4.4g 5780 R 99.6 28.1 3:46.44 fs_test.localho
11524 dbonnie 20 0 1629m 1.5g 4844 R 99.6 9.3 3:47.21 fs_test.localho
11526 dbonnie 20 0 1531m 1.4g 4824 R 99.6 8.7 3:46.80 fs_test.localho
11527 dbonnie 20 0 1658m 1.5g 4872 R 99.6 9.5 3:46.64 fs_test.localho
11528 dbonnie 20 0 1575m 1.4g 4828 R 99.6 9.0 3:44.96 fs_test.localho
11529 dbonnie 20 0 1612m 1.5g 4844 R 99.6 9.2 3:47.24 fs_test.localho
11530 dbonnie 20 0 1664m 1.5g 4796 R 99.6 9.6 3:47.19 fs_test.localho

The proc that was using 5+ GB of memory starts to drop in memory usage slightly but every other proc's memory usage skyrockets to the point of a crash.
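The pattern in steps 2-3 looks like the usual aggregate-then-broadcast scheme. The following is only a sketch of that general pattern, not the actual ad_plfs code; the serialization helper is hypothetical, and the point is simply that after the broadcast every rank holds a full copy of the global index, which matches the top output above.

// sketch of an aggregate-and-broadcast index distribution (not the real ad_plfs code)
#include <mpi.h>
#include <stdlib.h>

void distribute_index(MPI_Comm comm) {
    int rank;
    MPI_Comm_rank(comm, &rank);

    char *buf = NULL;
    long  len = 0;
    if (rank == 0) {
        /* hypothetical: rank 0 reads all index droppings, merges them, and
           serializes the result into buf/len (this would be the ~5 GB spike) */
        /* len = serialize_global_index(&buf); */
    }

    /* every rank receives the full serialized global index */
    MPI_Bcast(&len, 1, MPI_LONG, 0, comm);
    if (rank != 0) buf = (char *)malloc(len);
    /* for multi-GB indexes the broadcast would have to be chunked, since count is an int */
    MPI_Bcast(buf, (int)len, MPI_BYTE, 0, comm);

    /* each rank then rebuilds its own in-memory index from buf, so node memory
       grows by roughly (procs per node) x (global index size) */
}

If that is indeed what happens, it explains the observation: one proc balloons while merging and serializing the index, then every other proc balloons as it receives and rebuilds its own copy.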

@johnbent
Member

johnbent commented Jun 4, 2013

How did this pass regression in 2.3 then? It no longer sounds like a race caused by increased performance.


@thewacokid
Contributor

The workload I established isn't time-based, so it hits the failure regardless of speed.

@chuckcranor
Contributor

How big are the dropping files on the backend after the write phase completes? Just curious...

chuck


@thewacokid
Contributor

Here's the contents of the hostdir:

-rw-r--r-- 1 dbonnie dbonnie 0 Jun 4 12:32 dropping.data.1370370765.643511.rrz004.localdomain.0
-rw-r--r-- 1 dbonnie dbonnie 977K Jun 4 12:35 dropping.data.1370370765.685680.rrz004.localdomain.0
-rw-r--r-- 1 dbonnie dbonnie 977K Jun 4 12:35 dropping.data.1370370765.697148.rrz004.localdomain.7
-rw-r--r-- 1 dbonnie dbonnie 977K Jun 4 12:35 dropping.data.1370370765.697149.rrz004.localdomain.1
-rw-r--r-- 1 dbonnie dbonnie 977K Jun 4 12:35 dropping.data.1370370765.697153.rrz004.localdomain.3
-rw-r--r-- 1 dbonnie dbonnie 977K Jun 4 12:35 dropping.data.1370370765.697154.rrz004.localdomain.4
-rw-r--r-- 1 dbonnie dbonnie 977K Jun 4 12:35 dropping.data.1370370765.697155.rrz004.localdomain.5
-rw-r--r-- 1 dbonnie dbonnie 977K Jun 4 12:35 dropping.data.1370370765.697180.rrz004.localdomain.2
-rw-r--r-- 1 dbonnie dbonnie 977K Jun 4 12:35 dropping.data.1370370765.697778.rrz004.localdomain.6
-rw-r--r-- 1 dbonnie dbonnie 0 Jun 4 12:32 dropping.index.1370370765.643511.rrz004.localdomain.0
-rw-r--r-- 1 dbonnie dbonnie 46M Jun 4 12:35 dropping.index.1370370765.685680.rrz004.localdomain.0
-rw-r--r-- 1 dbonnie dbonnie 46M Jun 4 12:35 dropping.index.1370370765.697148.rrz004.localdomain.7
-rw-r--r-- 1 dbonnie dbonnie 46M Jun 4 12:35 dropping.index.1370370765.697149.rrz004.localdomain.1
-rw-r--r-- 1 dbonnie dbonnie 46M Jun 4 12:35 dropping.index.1370370765.697153.rrz004.localdomain.3
-rw-r--r-- 1 dbonnie dbonnie 46M Jun 4 12:35 dropping.index.1370370765.697154.rrz004.localdomain.4
-rw-r--r-- 1 dbonnie dbonnie 46M Jun 4 12:35 dropping.index.1370370765.697155.rrz004.localdomain.5
-rw-r--r-- 1 dbonnie dbonnie 46M Jun 4 12:35 dropping.index.1370370765.697180.rrz004.localdomain.2
-rw-r--r-- 1 dbonnie dbonnie 46M Jun 4 12:35 dropping.index.1370370765.697778.rrz004.localdomain.6

I wouldn't imagine this is enough data to cause an issue unless the index entries expand drastically in-memory.

@chuckcranor
Contributor


Yeah, I don't see how you get to 5 GB from that. And if you complete the write phase, then it closes the file, so everything should be written out to the backend before it starts the read phase, right? (You could run fs_test with "-op write" and then do a second "-op read" run to ensure it is all down to the backend...)

chuck

@thewacokid
Contributor

It's definitely all hitting the backend before doing the open call (I did a write-only test and the results match up with the above backend listing).

@chuckcranor
Contributor


What happens if you generate a global index after a write (e.g. with the "plfs_flatten_index" tool)?

chuck

@thewacokid
Contributor

Flattening the index results in a 367 MB global index. Trying to read it now.

@thewacokid
Contributor

Looks like the same behavior on read. Is there a way to use the global index when doing an MPI read or is that handled automatically?

@chuckcranor
Contributor


It should look for the global index first and use it if it is found.

You could try writing a simple PLFS API program that does a plfs_open for RDONLY and look at the memory usage of that process before and after the open. That might help isolate the problem (e.g. is it in the MPI code, or the main lib?).

e.g. something like

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include "plfs.h"

int main() {
    Plfs_fd *fd = NULL;
    int ret;
    printf("before open\n");
    system("ps axuwwww");   /* or maybe use getpid() to just look at this proc */
    ret = plfs_open(&fd, "/big/file", O_RDONLY, getpid(), 0666, NULL);
    printf("after open, ret was %d\n", ret);
    system("ps axuwwww");
    return 0;
}

chuck

@thewacokid
Contributor

So trying to use the "plfs_flatten_close" MPI-IO hint results in an instant crash on close in fs_test due to a seg fault.

The backtrace:
fs_test.localhost.x:557 terminated with signal 11 at PC=2ba3477729fa SP=7fffe032cd58. Backtrace:
/lib64/libc.so.6(memcpy+0x1da)[0x2ba3477729fa]
/usr/lib64/libstdc++.so.6(_ZNSt15basic_streambufIcSt11char_traitsIcEE6xsputnEPKcl+0x7b)[0x2ba3459abafb]
/usr/lib64/libstdc++.so.6(_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l+0x1b5)[0x2ba3459a2935]
/users/dbonnie/plfs/lib/libplfs.so.2.4(_ZN5Index16global_to_streamEPPvPm+0x233)[0x2ba34568c293]
/users/dbonnie/plfs/lib/libplfs.so.2.4(container_index_stream+0x73)[0x2ba34567a893]
/users/dbonnie/source/plfs-regression/inst/mpi/lib/libmpi.so.1(flatten_then_close+0x56)[0x2ba345d3a366]
/users/dbonnie/source/plfs-regression/inst/mpi/lib/libmpi.so.1(ADIOI_PLFS_Close+0x174)[0x2ba345d3a994]
/users/dbonnie/source/plfs-regression/inst/mpi/lib/libmpi.so.1(ADIO_Close+0x3e)[0x2ba345d29f4e]
/users/dbonnie/source/plfs-regression/inst/mpi/lib/libmpi.so.1(mca_io_romio_dist_MPI_File_close+0xef)[0x2ba345d4b6cf]
/users/dbonnie/source/plfs-regression/inst/mpi/lib/libmpi.so.1(+0x8a38e)[0x2ba345ca038e]
/users/dbonnie/source/plfs-regression/inst/mpi/lib/libmpi.so.1(ompi_file_close+0x59)[0x2ba345ca02a9]
/users/dbonnie/source/plfs-regression/inst/mpi/lib/libmpi.so.1(PMPI_File_close+0x78)[0x2ba345ccdf18]
./fs_test.localhost.x(close_file+0x5c4)[0x407296]
./fs_test.localhost.x(read_write_file+0x619)[0x40879f]
./fs_test.localhost.x(main+0x1ae)[0x403f67]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x2ba347707cdd]
./fs_test.localhost.x[0x403c69]

Mostly just putting that here for future reference. Still digging.

@brettkettering
Contributor

The "flatten" flag for fs_test is only valid if you use I/O type of PLFS.

This is only used if the "io" is 'plfs'. If non-zero it tells PLFS to flatten the index file for better performance.

"flatten" : [ 1 ],

I'm not sure about any hint to pass. I don't see anything in the code or template files that mentions using such a hint.

So, you could try to change the io type to plfs and then turn the flatten flag on and see if that works.

Brett

@thewacokid
Contributor

[ignore this]Actually, using fs_test "--flatten 1" results in a crash too, even with a fairly small workload.[/ignore this]

EDIT: I found the flatten flag by looking in ad_plfs -- so it should work for MPI-IO, no?

@brettkettering
Contributor

Using io of "plfs"? I didn't look at the code closely, but if you use io that is not "plfs" you should just get an exit from the application without any I/O happening and a note that you can use flatten with an io that is not "plfs".

Brett

@thewacokid
Contributor

With either plfs or mpi io, the -flatten 1 flag crashes with a seg fault.

Using the mpi-io hint "plfs_flatten_close" clearly starts the flattening process but then crashes with a seg fault as well.

@johnbent
Member

johnbent commented Jun 5, 2013

I'm assuming you've checked that workload=n-1 in the plfsrc. Flatten
doesn't make sense in the other modes; it shouldn't crash it though.

John


@brettkettering
Contributor

OK. So, it's a bug for fs_test to let you use the --flatten flag with io of "mpi".

Do you see this line printed?

"INFO RANK[%d]: Flattening plfs index"

where the rank is zero (0)?

Only rank zero is allowed to call plfs_flatten_index.

Look at the flatten_file function in fs_test.c. Put some debug statements in there to make sure the target path is correct to determine if fs_test is messing up or plfs_flatten_index is messing up.

Thanks,
Brett
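Something along these lines could work as the debug statement (a hypothetical drop-in helper; the function and parameter names are placeholders, so adjust them to the actual variables used in flatten_file):

#include <stdio.h>
#include <mpi.h>

/* hypothetical helper to drop into fs_test.c's flatten_file */
static void debug_flatten_target(const char *target) {
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    fprintf(stderr, "DEBUG RANK[%d]: flatten target path = %s\n", rank, target);
}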

@thewacokid
Contributor

Yeah, it prints that for rank 0 only (though with the MPI-IO hint I believe it's trying to flatten on all ranks). This is to an N-1 mount with a single backend, nothing crazy in terms of parameters.

I'll keep digging.

@brettkettering
Contributor

If the hint is trying to flatten on all ranks, then that must be in the ad_* code because fs_test's flatten_file will only call plfs_flatten_index if it is rank 0 or if it's N-N.

Brett

@thewacokid
Contributor

A note on memory usage (running plfs_map) for reading an index from two procs writing 12,000,000 objects of 1-byte each, strided. Each index dropping is ~550 MB.

Someone correct me if I'm wrong, but shouldn't I expect to see approximately 550 MB * [2 nodes] in terms of memory usage when reading the index with plfs_map? Instead, I see nearly 10 GB.

Reading back the index with "--uniform 0" to only read a single rank also results in a huge amount of memory usage in relation to the size of the index (4.6 GB).

@johnbent
Member

johnbent commented Jun 5, 2013

Sounds like a gigantic memory leak.

John


@chuckcranor
Contributor

Yeah. If it happens in plfs_map that's cool, because it rules out stuff and is easy to run and debug...

chuck


@thewacokid
Contributor

I'm tracking down where the memory is being allocated. I don't think it's a leak (as in allocated but not freed), but more an issue of how the memory is handled during global index creation.

@thewacokid
Contributor

Just an update on this - to reduce memory usage there really just needs to be a rewrite of the global index creation (I don't believe there is really a bug here, just higher memory usage than I initially expected). Perhaps a shared-memory approach on a per-node basis?
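One way a per-node shared-memory approach could look (purely a sketch of the idea, not anything in PLFS today; the segment name and function are made up): one rank per node creates a POSIX shared-memory segment and deserializes the flattened index into it once, and the other ranks on that node map the same segment read-only instead of each holding a private copy.

// sketch: one shared, read-only copy of the flattened index per node (hypothetical)
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

void *map_shared_index(int node_rank, size_t index_bytes) {
    const char *name = "/plfs_global_index";            // made-up segment name
    if (node_rank == 0) {
        int fd = shm_open(name, O_CREAT | O_RDWR, 0600); // node leader creates the segment
        if (fd < 0 || ftruncate(fd, (off_t)index_bytes) != 0) return NULL;
        void *base = mmap(NULL, index_bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        /* leader would deserialize the flattened index into base here,
           then signal the other local ranks (e.g. with a node-local barrier) */
        return base;
    } else {
        int fd = shm_open(name, O_RDONLY, 0600);         // other local ranks just attach
        if (fd < 0) return NULL;
        return mmap(NULL, index_bytes, PROT_READ, MAP_SHARED, fd, 0);
    }
}

That would keep the per-node cost at one copy of the index rather than one copy per proc, at the price of doing lookups through a read-only mapping.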

@chuckcranor
Contributor

Have you been able to quantify the memory allocation? Where is it all going?

Earlier you said "Flattening the index results in a 367 MB global index", so how many GB of RAM was the 367 MB index taking? Was it like 5 GB?

chuck


@thewacokid
Contributor

The in-memory map accounts for most of the memory pressure (over 90% of the in-use memory is from the global index data structures). The on-disk size roughly triples in memory per process, and that, coupled with the various buffers and mmap'd files, causes memory exhaustion in the cases I highlighted above.

A lot of the memory usage when using plfs_map was initially caused by a massive string construction for the output. I removed that and memory usage fell dramatically (though it was still higher than I initially expected) to the quantities above.
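To illustrate why the in-memory form can cost 2-3x the on-disk form (this is a generic illustration, not PLFS's actual entry layout or container code): a record that is just field bytes on disk picks up red-black-tree node overhead and allocator padding once it lives in a std::map.

// generic illustration of on-disk vs in-map entry cost (hypothetical record layout)
#include <cstdio>
#include <cstddef>
#include <cstdint>
#include <map>

struct OnDiskEntry {          // made-up record, roughly the size of a PLFS index entry
    uint64_t logical_offset;
    uint64_t physical_offset;
    uint64_t length;
    uint64_t timestamp;
    uint32_t id;
};

int main() {
    // each std::map node stores the key/value pair plus three pointers and a color
    // field for the red-black tree, and the allocator typically rounds up as well
    size_t payload  = sizeof(std::pair<const uint64_t, OnDiskEntry>);
    size_t tree_ovh = 3 * sizeof(void *) + sizeof(int);   // parent/left/right + color (approx.)

    printf("on-disk record:        %zu bytes\n", sizeof(OnDiskEntry));
    printf("in-map payload:        %zu bytes\n", payload);
    printf("approx. node overhead: %zu bytes\n", tree_ovh);
    return 0;
}

With malloc bookkeeping added per node, an entry that is ~46 bytes on disk can easily cost 100+ bytes in the map, which lands in the 2-3x range being reported here.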

@johnbent
Member

What is the ratio of in-memory size to on-disk size?

John


@thewacokid
Contributor

In my limited tests it's 2-3 times larger in-memory than on-disk.
