Removing conditional check for -arch=native flag. #210

Merged

Treece-Burgess
Contributor

@Treece-Burgess Treece-Burgess commented Jul 29, 2024

Pull Request Description

This PR removes the conditional block that decided whether the -arch=native flag was used. Setting the flag or omitting it makes no apparent difference. The test output below was collected after the conditional block was removed.

Test: running simpleMultiGPU in the CUDA component

Command line:

./simpleMultiGPU cuda:::dram__bytes.avg

Output:

Starting simpleMultiGPU
PAPI version: 7.1.0
CUDA-capable device count: 8
CUDA Device 0: Tesla V100-SXM2-32GB : computeCapability 7.0 runtimeVersion 12.1 driverVersion 12.5
CUDA Device 1: Tesla V100-SXM2-32GB : computeCapability 7.0 runtimeVersion 12.1 driverVersion 12.5
CUDA Device 2: Tesla V100-SXM2-32GB : computeCapability 7.0 runtimeVersion 12.1 driverVersion 12.5
CUDA Device 3: Tesla V100-SXM2-32GB : computeCapability 7.0 runtimeVersion 12.1 driverVersion 12.5
CUDA Device 4: Tesla V100-SXM2-32GB : computeCapability 7.0 runtimeVersion 12.1 driverVersion 12.5
CUDA Device 5: Tesla V100-SXM2-32GB : computeCapability 7.0 runtimeVersion 12.1 driverVersion 12.5
CUDA Device 6: Tesla V100-SXM2-32GB : computeCapability 7.0 runtimeVersion 12.1 driverVersion 12.5
CUDA Device 7: Tesla V100-SXM2-32GB : computeCapability 7.0 runtimeVersion 12.1 driverVersion 12.5
Generating input data...
Setup PAPI counters internally (PAPI)
Found CUDA Component at id 2
Add event success: 'cuda:::dram__bytes.avg:device=0' GPU 0
Add event success: 'cuda:::dram__bytes.avg:device=1' GPU 1
Add event success: 'cuda:::dram__bytes.avg:device=2' GPU 2
Add event success: 'cuda:::dram__bytes.avg:device=3' GPU 3
Add event success: 'cuda:::dram__bytes.avg:device=4' GPU 4
Add event success: 'cuda:::dram__bytes.avg:device=5' GPU 5
Add event success: 'cuda:::dram__bytes.avg:device=6' GPU 6
Add event success: 'cuda:::dram__bytes.avg:device=7' GPU 7
Computing with 8 GPUs...
Process GPU results on 8 GPUs...
PAPI counterValue        26491 		 --> cuda:::dram__bytes.avg:device=0 
PAPI counterValue        24344 		 --> cuda:::dram__bytes.avg:device=1 
PAPI counterValue        24144 		 --> cuda:::dram__bytes.avg:device=2 
PAPI counterValue        23583 		 --> cuda:::dram__bytes.avg:device=3 
PAPI counterValue        22175 		 --> cuda:::dram__bytes.avg:device=4 
PAPI counterValue        22783 		 --> cuda:::dram__bytes.avg:device=5 
PAPI counterValue        21681 		 --> cuda:::dram__bytes.avg:device=6 
PAPI counterValue        21450 		 --> cuda:::dram__bytes.avg:device=7 
  GPU Processing time: 3.043000 (ms)
Computing the same result with Host CPU...
  CPU Processing time: 4.527000 (ms) (speedup 1.49X)
Comparing GPU and Host CPU results...
  GPU sum: 777239.750000
  CPU sum: 777239.896331
  Relative difference: 1.882697E-07 
PASSED


Author Checklist

  • Description
    Why this PR exists. Reference all relevant information, including background, issues, test failures, etc.
  • Commits
    Commits are self contained and only do one thing
    Commits have a header of the form: module: short description
    Commits have a body (whenever relevant) containing a detailed description of the addressed problem and its solution
  • Tests
    The PR needs to pass all the tests

@Treece-Burgess Treece-Burgess marked this pull request as draft July 29, 2024 21:04
@Treece-Burgess Treece-Burgess marked this pull request as ready for review July 29, 2024 21:13
Collaborator

@jagode jagode left a comment

Now that the initial assignment of NVCFLAGS has been removed, you should also drop the + from this line:
NVCFLAGS += -g -ccbin='$(CC)' $(PAPI_FLAG)
so that it reads:
NVCFLAGS = -g -ccbin='$(CC)' $(PAPI_FLAG)
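
A minimal sketch of why the plain assignment matters, assuming GNU Make. Only the NVCFLAGS line is taken from the comment above. The `show` target is hypothetical, and the values of CC and PAPI_FLAG are placeholders, not the actual contents of src/components/cuda/tests/Makefile.

```makefile
# Hypothetical stand-alone Makefile for illustration, not the PAPI one.
# Placeholder values (assumptions): the real Makefile defines these elsewhere.
CC        = gcc
PAPI_FLAG = -DPAPI

# With no earlier definition in the Makefile, '+=' behaves like '=',
# except that it appends to any NVCFLAGS inherited from the environment.
# A plain '=' makes this line the only source of the value.
NVCFLAGS = -g -ccbin='$(CC)' $(PAPI_FLAG)

# Hypothetical helper target that prints the resulting flags.
show:
	@echo "NVCFLAGS = $(NVCFLAGS)"
```

With this sketch, `make show` prints the flags exactly as assigned. If the line used `+=` instead, running `NVCFLAGS=-O3 make show` would be expected to prepend `-O3` from the environment, which is the kind of leftover dependency the plain `=` avoids.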

src/components/cuda/tests/Makefile (outdated, resolved)
@Treece-Burgess Treece-Burgess merged commit 2151088 into icl-utk-edu:master Sep 9, 2024
26 checks passed