Remora 1.8.4 hanging #60

Open
laytonjbgmail opened this issue Nov 10, 2020 · 3 comments

@laytonjbgmail

Good morning,

I'm testing remora 1.8.4 with a simple serial application and it is hanging. Without remora the code runs correctly and finishes in about 1 minute. With remora it hangs (I've let it run for an hour without it finishing).

Looking at the output folder, I see a few text files (they appear to be configuration information), but there is no data in the subfolders.

Any suggestions on how to debug this?

Thanks!

Jeff

@laytonjbgmail
Author

Just a few things to add. This is an Ubuntu 18.04 system running bash 4.4.20 (just making sure it's not a bash version problem).

I went back and tried remora 1.8.2 and it hangs too (as does 1.8.3).

I'm also getting errors in install.sh (note: I'm building for MPI).

gcc -o mpstat -g -O2 -Wall -Wstrict-prototypes -pipe -O2 mpstat.o librdstats_light.a libsyscom.a -s
Installing mpstat ...
./install.sh: 78: ./install.sh: Syntax error: Bad fd number
./install.sh: 73: [: unexpected operator
./install.sh: 83: ./install.sh: Syntax error: Bad fd number
./install.sh: 78: [: unexpected operator
./install.sh: 86: ./install.sh: Syntax error: Bad fd number
./install.sh: 87: [: 1: unexpected operator
./install.sh: 100: [: 1: unexpected operator

WARNING : mpicc / mpif77 not found
WARNING : REMORA will be built without MPI support

./install.sh: 111: [: 0: unexpected operator
Copying all scripts to installation folder ...
'./src/aux/extra' -> '/home/laytonjb/bin/remora-1.8.2/bin/aux/extra'
'./src/aux/report' -> '/home/laytonjb/bin/remora-1.8.2/bin/aux/report'
'./src/aux/scheduler' -> '/home/laytonjb/bin/remora-1.8.2/bin/aux/scheduler'
'./src/aux/sql_functions' -> '/home/laytonjb/bin/remora-1.8.2/bin/aux/sql_functions'
'./src/config/fs_blacklist' -> '/home/laytonjb/bin/remora-1.8.2/bin/config/fs_blacklist'
'./src/config/modules' -> '/home/laytonjb/bin/remora-1.8.2/bin/config/modules'
'./src/modules/cpu' -> '/home/laytonjb/bin/remora-1.8.2/bin/modules/cpu'
'./src/modules/dvs' -> '/home/laytonjb/bin/remora-1.8.2/bin/modules/dvs'
'./src/modules/eth' -> '/home/laytonjb/bin/remora-1.8.2/bin/modules/eth'
'./src/modules/gpu' -> '/home/laytonjb/bin/remora-1.8.2/bin/modules/gpu'
'./src/modules/ib' -> '/home/laytonjb/bin/remora-1.8.2/bin/modules/ib'
'./src/modules/impi' -> '/home/laytonjb/bin/remora-1.8.2/bin/modules/impi'
'./src/modules/lnet' -> '/home/laytonjb/bin/remora-1.8.2/bin/modules/lnet'
'./src/modules/lustre' -> '/home/laytonjb/bin/remora-1.8.2/bin/modules/lustre'
'./src/modules/memory' -> '/home/laytonjb/bin/remora-1.8.2/bin/modules/memory'
'./src/modules/modules_utils' -> '/home/laytonjb/bin/remora-1.8.2/bin/modules/modules_utils'
'./src/modules/mv2' -> '/home/laytonjb/bin/remora-1.8.2/bin/modules/mv2'
'./src/modules/network' -> '/home/laytonjb/bin/remora-1.8.2/bin/modules/network'
'./src/modules/numa' -> '/home/laytonjb/bin/remora-1.8.2/bin/modules/numa'
'./src/modules/power' -> '/home/laytonjb/bin/remora-1.8.2/bin/modules/power'
'./src/modules/temperature' -> '/home/laytonjb/bin/remora-1.8.2/bin/modules/temperature'
'./src/remora' -> '/home/laytonjb/bin/remora-1.8.2/bin/remora'
'./src/remora_mem_safe' -> '/home/laytonjb/bin/remora-1.8.2/bin/remora_mem_safe'
'./src/remora_post' -> '/home/laytonjb/bin/remora-1.8.2/bin/remora_post'
'./src/remora_post_crash' -> '/home/laytonjb/bin/remora-1.8.2/bin/remora_post_crash'
'./src/scripts/remora_collect.sh' -> '/home/laytonjb/bin/remora-1.8.2/bin/scripts/remora_collect.sh'
'./src/scripts/remora_finalize.sh' -> '/home/laytonjb/bin/remora-1.8.2/bin/scripts/remora_finalize.sh'
'./src/scripts/remora_init.sh' -> '/home/laytonjb/bin/remora-1.8.2/bin/scripts/remora_init.sh'
'./src/scripts/remora_monitor.sh' -> '/home/laytonjb/bin/remora-1.8.2/bin/scripts/remora_monitor.sh'
'./src/scripts/remora_monitor_memory.sh' -> '/home/laytonjb/bin/remora-1.8.2/bin/scripts/remora_monitor_memory.sh'
'./src/scripts/remora_mpi_post.sh' -> '/home/laytonjb/bin/remora-1.8.2/bin/scripts/remora_mpi_post.sh'
'./src/scripts/remora_remote_post.sh' -> '/home/laytonjb/bin/remora-1.8.2/bin/scripts/remora_remote_post.sh'
'./src/scripts/remora_report.sh' -> '/home/laytonjb/bin/remora-1.8.2/bin/scripts/remora_report.sh'
'./src/scripts/remora_report_mic.sh' -> '/home/laytonjb/bin/remora-1.8.2/bin/scripts/remora_report_mic.sh'
./install.sh: 122: [: 1: unexpected operator
./install.sh: 122: [: 1: unexpected operator
./install.sh: 126: [: -1: unexpected operator

Installation of REMORA v1.8.2 completed.
For a fully functional installation make sure to:

export PATH=$PATH:/home/laytonjb/bin/remora-1.8.2/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/laytonjb/bin/remora-1.8.2/lib
export REMORA_BIN=/home/laytonjb/bin/remora-1.8.2/bin

Good Luck!

I'm guessing these are primarily bash issues?
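For anyone hitting the same messages: "Syntax error: Bad fd number" and "[: 1: unexpected operator" have the form that dash prints when it meets bash-only syntax, so one thing worth checking is whether the script is being run by /bin/sh (dash by default on Ubuntu) rather than bash. The snippet below is only a sketch of that check, not code taken from install.sh. The variable names `sh_target` and `found` are made up for illustration, and the `mpicc` test is just an example of writing such a check in a form that both shells accept.

```shell
#!/bin/sh
# Which shell does /bin/sh resolve to? On Ubuntu it is usually dash.
sh_target=$(readlink -f /bin/sh 2>/dev/null || echo /bin/sh)
echo "/bin/sh -> $sh_target"

# Bash-only forms that dash rejects, with POSIX equivalents:
#   cmd >& file        ->  cmd > file 2>&1     (dash: "Bad fd number")
#   [ "$a" == "$b" ]   ->  [ "$a" = "$b" ]     (dash: "unexpected operator")

# Illustrative portable check (hypothetical, not the installer's own code).
found=0
if command -v mpicc > /dev/null 2>&1; then
    found=1
fi

if [ "$found" = "1" ]; then
    echo "mpicc found"
else
    echo "mpicc not found"
fi
```

If /bin/sh does turn out to be dash, running the installer explicitly as `bash ./install.sh` might sidestep the errors, assuming the script is otherwise valid bash. I have not confirmed that this is the cause here.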

I also tried using a build of OpenMPI (4.0.3) to get around some of these issues, but it gives exactly the same error messages.

Jeff

@milfeld
Collaborator

milfeld commented Nov 12, 2020

The hang is probably due to the shell problem that was fixed (see #59).
Download the latest remora and try again. If the hang persists, it might be
caused by a specific module failing during the (remote) collection or reporting phase.
In that case, please report back with the output from a run with "export REMORA_VERBOSE=1".
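
A minimal sketch of capturing that output to a file, so it can be attached to this issue. The application name `./my_app`, the log file name, and the 10-minute limit are placeholders chosen for illustration. `timeout` is the coreutils command and is only there so a hung run still leaves a log behind.

```shell
#!/bin/sh
# Hypothetical names; replace APP with the real serial application.
APP=${APP:-./my_app}
LOG=${LOG:-remora_verbose.log}

export REMORA_VERBOSE=1   # the setting requested above

if command -v remora > /dev/null 2>&1; then
    # Stop the run after 600 seconds so a hang does not block forever;
    # stdout and stderr both go to the log file.
    timeout 600 remora "$APP" > "$LOG" 2>&1
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "run stopped by timeout after 600 s" >> "$LOG"
    fi
else
    # remora is not on PATH; record that instead of failing silently.
    echo "remora not found in PATH; check the export PATH line printed by install.sh" > "$LOG"
fi

echo "output saved to $LOG"
```

The time limit is arbitrary. For an application that normally finishes in about a minute, anything comfortably longer than that should be enough to tell a hang from a slow run.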

@milfeld milfeld self-assigned this Nov 12, 2020
@milfeld
Collaborator

milfeld commented Nov 12, 2020 via email
