Hang on collecting GRBM_GUI_ACTIVE in LAMMPS #165

Open
skyreflectedinmirrors opened this issue Sep 21, 2022 · 2 comments

skyreflectedinmirrors commented Sep 21, 2022

Using:

OMNITRACE_CONFIG_FILE                              = 
OMNITRACE_USE_PERFETTO                             = true
OMNITRACE_USE_TIMEMORY                             = false
OMNITRACE_USE_SAMPLING                             = false
OMNITRACE_USE_PROCESS_SAMPLING                     = false
OMNITRACE_USE_ROCTRACER                            = true
OMNITRACE_USE_ROCM_SMI                             = true
OMNITRACE_USE_KOKKOSP                              = false
OMNITRACE_USE_PID                                  = true
OMNITRACE_USE_RCCLP                                = false
OMNITRACE_USE_ROCPROFILER                          = true
OMNITRACE_USE_ROCTX                                = false
OMNITRACE_OUTPUT_PATH                              = omnitrace-%tag%-output
OMNITRACE_OUTPUT_PREFIX                            = 
OMNITRACE_CRITICAL_TRACE                           = false
OMNITRACE_PAPI_EVENTS                              = PAPI_TOT_CYC
OMNITRACE_PERFETTO_BACKEND                         = inprocess
OMNITRACE_PERFETTO_BUFFER_SIZE_KB                  = 1024000
OMNITRACE_PERFETTO_FILL_POLICY                     = discard
OMNITRACE_PROCESS_SAMPLING_DURATION                = -1
OMNITRACE_PROCESS_SAMPLING_FREQ                    = 0
OMNITRACE_ROCM_EVENTS                              = GRBM_GUI_ACTIVE
OMNITRACE_SAMPLING_CPUS                            = all
OMNITRACE_SAMPLING_DELAY                           = 0.5
OMNITRACE_SAMPLING_DURATION                        = 0
OMNITRACE_SAMPLING_FREQ                            = 200
OMNITRACE_SAMPLING_GPUS                            = 0,1
OMNITRACE_TIME_OUTPUT                              = true
OMNITRACE_TIMEMORY_COMPONENTS                      = wall_clock
OMNITRACE_VERBOSE                                  = 0
OMNITRACE_ENABLED                                  = true
OMNITRACE_SUPPRESS_CONFIG                          = false
OMNITRACE_SUPPRESS_PARSING                         = false

hangs on the first kernel call:

$ AMD_LOG_LEVEL=3 /home/nicurtis/lammps_benchmarking/install/tpl/openmpi/bin/mpirun --mca pml ucx --mca btl ^vader,tcp,openib,uct -np 1 ./lmp -k on g 1 -sf kk -pk kokkos cuda/aware on neigh half neigh/qeq full newton on -v x 6 -v y 6 -v z 8 -v steps 25 -in in.reaxc.hns -nocite -log TheraC63/reaxff//log.lammps
[omnitrace][omnitrace_init_tooling] Instrumentation mode: Trace


      ______   .___  ___. .__   __.  __  .___________..______          ___       ______  _______
     /  __  \  |   \/   | |  \ |  | |  | |           ||   _  \        /   \     /      ||   ____|
    |  |  |  | |  \  /  | |   \|  | |  | `---|  |----`|  |_)  |      /  ^  \   |  ,----'|  |__
    |  |  |  | |  |\/|  | |  . `  | |  |     |  |     |      /      /  /_\  \  |  |     |   __|
    |  `--'  | |  |  |  | |  |\   | |  |     |  |     |  |\  \----./  _____  \ |  `----.|  |____
     \______/  |__|  |__| |__| \__| |__|     |__|     | _| `._____/__/     \__\ \______||_______|

    
[066.998]       perfetto.cc:55910 Configured tracing session 1, #sources:1, duration:0 ms, #buffers:1, total buffer size:1024000 KB, total sessions:1, uid:0 session name: ""

[omnitrace][pid=30219] MPI rank: 0 (0), MPI size: 1 (1)
LAMMPS (23 Jun 2022 - Update 1)
KOKKOS mode is enabled (src/KOKKOS/kokkos.cpp:105)
  will use up to 1 GPU(s) per node
:3:rocdevice.cpp            :416 : 81067696131 us: 30219: [tid:0x7f68d9031280] Initializing HSA stack.
:3:comgrctx.cpp             :33  : 81067696207 us: 30219: [tid:0x7f68d9031280] Loading COMGR library.
:3:rocdevice.cpp            :207 : 81067696378 us: 30219: [tid:0x7f68d9031280] Numa selects cpu agent[2]=0x5b88910(fine=0x5b88b60,coarse=0x5b97640) for gpu agent=0x5df3880
:3:rocdevice.cpp            :1611: 81067696802 us: 30219: [tid:0x7f68d9031280] HMM support: 1, xnack: 0, direct host access: 0

:3:rocdevice.cpp            :207 : 81067697588 us: 30219: [tid:0x7f68d9031280] Numa selects cpu agent[2]=0x5b88910(fine=0x5b88b60,coarse=0x5b97640) for gpu agent=0x5e30cb0
:3:rocdevice.cpp            :1611: 81067697802 us: 30219: [tid:0x7f68d9031280] HMM support: 1, xnack: 0, direct host access: 0

:3:rocdevice.cpp            :207 : 81067698438 us: 30219: [tid:0x7f68d9031280] Numa selects cpu agent[2]=0x5b88910(fine=0x5b88b60,coarse=0x5b97640) for gpu agent=0x5e6e3d0
:3:rocdevice.cpp            :1611: 81067698628 us: 30219: [tid:0x7f68d9031280] HMM support: 1, xnack: 0, direct host access: 0

:3:rocdevice.cpp            :207 : 81067699255 us: 30219: [tid:0x7f68d9031280] Numa selects cpu agent[2]=0x5b88910(fine=0x5b88b60,coarse=0x5b97640) for gpu agent=0x5eabad0
:3:rocdevice.cpp            :1611: 81067699441 us: 30219: [tid:0x7f68d9031280] HMM support: 1, xnack: 0, direct host access: 0

:3:rocdevice.cpp            :207 : 81067700248 us: 30219: [tid:0x7f68d9031280] Numa selects cpu agent[6]=0x5b9bfc0(fine=0x5b9c1e0,coarse=0x5b9c960) for gpu agent=0x5ee91e0
:3:rocdevice.cpp            :1611: 81067700432 us: 30219: [tid:0x7f68d9031280] HMM support: 1, xnack: 0, direct host access: 0

:3:rocdevice.cpp            :207 : 81067701884 us: 30219: [tid:0x7f68d9031280] Numa selects cpu agent[6]=0x5b9bfc0(fine=0x5b9c1e0,coarse=0x5b9c960) for gpu agent=0x5f26930
:3:rocdevice.cpp            :1611: 81067702074 us: 30219: [tid:0x7f68d9031280] HMM support: 1, xnack: 0, direct host access: 0

:3:rocdevice.cpp            :207 : 81067703320 us: 30219: [tid:0x7f68d9031280] Numa selects cpu agent[6]=0x5b9bfc0(fine=0x5b9c1e0,coarse=0x5b9c960) for gpu agent=0x5f64010
:3:rocdevice.cpp            :1611: 81067703500 us: 30219: [tid:0x7f68d9031280] HMM support: 1, xnack: 0, direct host access: 0

:3:rocdevice.cpp            :207 : 81067704752 us: 30219: [tid:0x7f68d9031280] Numa selects cpu agent[6]=0x5b9bfc0(fine=0x5b9c1e0,coarse=0x5b9c960) for gpu agent=0x5fa1710
:3:rocdevice.cpp            :1611: 81067704929 us: 30219: [tid:0x7f68d9031280] HMM support: 1, xnack: 0, direct host access: 0

:3:hip_context.cpp          :50  : 81067706380 us: 30219: [tid:0x7f68d9031280] Direct Dispatch: 1
:3:hip_device_runtime.cpp   :517 : 81067708010 us: 30219: [tid:0x7f68d9031280] hipGetDeviceCount: Returned hipSuccess : 
:3:hip_device.cpp           :346 : 81067708019 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties ( 0x3e5c2e0, 0 )
:3:hip_device.cpp           :348 : 81067708219 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties: Returned hipSuccess : 
:3:hip_device.cpp           :346 : 81067708237 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties ( 0x3e5c5f8, 1 )
:3:hip_device.cpp           :348 : 81067708254 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties: Returned hipSuccess : 
:3:hip_device.cpp           :346 : 81067708258 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties ( 0x3e5c910, 2 )
:3:hip_device.cpp           :348 : 81067708286 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties: Returned hipSuccess : 
:3:hip_device.cpp           :346 : 81067708298 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties ( 0x3e5cc28, 3 )
:3:hip_device.cpp           :348 : 81067708312 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties: Returned hipSuccess : 
:3:hip_device.cpp           :346 : 81067708316 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties ( 0x3e5cf40, 4 )
:3:hip_device.cpp           :348 : 81067708329 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties: Returned hipSuccess : 
:3:hip_device.cpp           :346 : 81067708333 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties ( 0x3e5d258, 5 )
:3:hip_device.cpp           :348 : 81067708356 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties: Returned hipSuccess : 
:3:hip_device.cpp           :346 : 81067708367 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties ( 0x3e5d570, 6 )
:3:hip_device.cpp           :348 : 81067708380 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties: Returned hipSuccess : 
:3:hip_device.cpp           :346 : 81067708385 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties ( 0x3e5d888, 7 )
:3:hip_device.cpp           :348 : 81067708395 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties: Returned hipSuccess : 
:3:hip_device_runtime.cpp   :530 : 81067708403 us: 30219: [tid:0x7f68d9031280] hipSetDevice ( 0 )
:3:hip_device_runtime.cpp   :535 : 81067708424 us: 30219: [tid:0x7f68d9031280] hipSetDevice: Returned hipSuccess : 
:3:hip_memory.cpp           :493 : 81067708445 us: 30219: [tid:0x7f68d9031280] hipMalloc ( 0x7fff288c3f20, 8448 )
:3:rocdevice.cpp            :2093: 81067708474 us: 30219: [tid:0x7f68d9031280] device=0x653dda0, freeMem_ = 0xfeffdf00
:3:hip_memory.cpp           :495 : 81067708478 us: 30219: [tid:0x7f68d9031280] hipMalloc: Returned hipSuccess : 0x7f6051b00000: duration: 33 us
:3:hip_memory.cpp           :1225: 81067708487 us: 30219: [tid:0x7f68d9031280] hipMemcpyAsync ( 0x7f6051b00000, 0x7fff288c40c0, 256, hipMemcpyDefault, stream:<null> )
:3:rocdevice.cpp            :2686: 81067708503 us: 30219: [tid:0x7f68d9031280] number of allocated hardware queues with low priority: 0, with normal priority: 0, with high priority: 0, maximum per priority is: 4
:3:rocdevice.cpp            :2757: 81067721343 us: 30219: [tid:0x7f68d9031280] created hardware queue 0x7f68680ca000 with size 4096 with priority 1, cooperative: 0
:3:devprogram.cpp           :2675: 81067924077 us: 30219: [tid:0x7f68d9031280] Using Code Object V4.
:3:devprogram.cpp           :2978: 81067925217 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_fillImage
:3:devprogram.cpp           :2978: 81067925223 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_fillBufferAligned2D
:3:devprogram.cpp           :2978: 81067925225 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_fillBufferAligned
:3:devprogram.cpp           :2978: 81067925227 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_copyImage1DA
:3:devprogram.cpp           :2978: 81067925228 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_copyBufferAligned
:3:devprogram.cpp           :2978: 81067925229 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_streamOpsWait
:3:devprogram.cpp           :2978: 81067925230 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_copyBuffer
:3:devprogram.cpp           :2978: 81067925232 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_streamOpsWrite
:3:devprogram.cpp           :2978: 81067925233 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_copyBufferRectAligned
:3:devprogram.cpp           :2978: 81067925234 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_gwsInit
:3:devprogram.cpp           :2978: 81067925236 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_copyBufferRect
:3:devprogram.cpp           :2978: 81067925237 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_copyImageToBuffer
:3:devprogram.cpp           :2978: 81067925238 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_copyBufferToImage
:3:devprogram.cpp           :2978: 81067925239 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_copyImage
:3:rocvirtual.hpp           :62  : 81067925542 us: 30219: [tid:0x7f68d9031280] Host active wait for Signal = (0x7f686811d180) for 100000 ns
:3:rocvirtual.cpp           :143 : 81067925558 us: 30219: [tid:0x7f68d9031280] Signal = (0x7f686811d180), start = 81067925545769, end = 81067925547369
:3:hip_memory.cpp           :1226: 81067925567 us: 30219: [tid:0x7f68d9031280] hipMemcpyAsync: Returned hipSuccess : : duration: 217080 us
:3:hip_stream.cpp           :450 : 81067925582 us: 30219: [tid:0x7f68d9031280] hipStreamSynchronize ( stream:<null> )
:3:rocdevice.cpp            :2636: 81067925599 us: 30219: [tid:0x7f68d9031280] No HW event
:3:hip_stream.cpp           :451 : 81067925601 us: 30219: [tid:0x7f68d9031280] hipStreamSynchronize: Returned hipSuccess : 
:3:hip_memory.cpp           :2461: 81067925613 us: 30219: [tid:0x7f68d9031280] hipMemset ( 0x7f6051b00100, 0, 8192 )
:3:rocvirtual.cpp           :679 : 81067925626 us: 30219: [tid:0x7f68d9031280] Arg3: ulong* bufULong = ptr:0x7f6051b00000 obj:[0x7f6051b00000-0x7f6051b02100]
:3:rocvirtual.cpp           :679 : 81067925628 us: 30219: [tid:0x7f68d9031280] Arg4: uchar* pattern = ptr:0x7f686807c080 obj:[0x7f686807c000-0x7f686807d000]
:3:rocvirtual.cpp           :753 : 81067925630 us: 30219: [tid:0x7f68d9031280] Arg5: uint patternSize = val:1
:3:rocvirtual.cpp           :753 : 81067925631 us: 30219: [tid:0x7f68d9031280] Arg6: ulong offset = val:32
:3:rocvirtual.cpp           :753 : 81067925633 us: 30219: [tid:0x7f68d9031280] Arg7: ulong size = val:1024
:3:rocvirtual.cpp           :2723: 81067925634 us: 30219: [tid:0x7f68d9031280] ShaderName : __amd_rocclr_fillBufferAligned
:3:rocvirtual.hpp           :62  : 81067935725 us: 30219: [tid:0x7f68d9031280] Host active wait for Signal = (0x7f686811d080) for -1 ns
# hangs here forever

skyreflectedinmirrors commented

Observed on commit 472e96a.

skyreflectedinmirrors commented

There is no hang with OMNITRACE_PAPI_EVENTS, but it doesn't show up in the trace either.
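
One way to compare configurations without waiting on a run that never returns is to wrap the launch in a time limit. The following is a minimal sketch, assuming GNU coreutils `timeout` is available and that the OMNITRACE_* settings are picked up from the environment. `check_hang` is a hypothetical helper name, the 300-second limit is arbitrary, and the launch line in the usage comment is shortened from the command above purely for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of omnitrace): run a command under a time
# limit and report whether it finished on its own or had to be stopped.
check_hang() {
    local limit="$1"
    shift
    # GNU timeout exits with status 124 when the time limit is reached.
    timeout "$limit" "$@" >/dev/null 2>&1
    local status=$?
    if [ "$status" -eq 124 ]; then
        echo "hang"
    else
        echo "exit:$status"
    fi
}

# Usage sketch (commented out; substitute the full launch line from the report):
# check_hang 300 env OMNITRACE_ROCM_EVENTS=GRBM_GUI_ACTIVE ./lmp -k on g 1 -sf kk -in in.reaxc.hns
# check_hang 300 env OMNITRACE_ROCM_EVENTS= ./lmp -k on g 1 -sf kk -in in.reaxc.hns
```

If the first call prints `hang` and the second prints `exit:0`, that would point at the hardware counter collection rather than at the rest of the tracing setup. A process stuck in a device wait may not respond to the default SIGTERM, in which case `timeout -k` could be added, with the caveat that the exit status is then 137 rather than 124.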
