Abhijeet per camera gpu #253

craigwarner-ufastro · 2023-09-19T04:30:13Z

Use batch rebin, transmission_Lyman, and calc_zchi2 operations.

Speed gains: without archetypes, redrock on 4 GPU / 4 CPU takes 14.8s
reported total run time, 7.3s of which is in the fine redshift scan.
Comparatively with 64 CPU and 0 GPU the base redrock runs in 40.0s
reported total run time with 6.7s spent in fine redshift scan.

Adding the base archetypes option (without per-camera or nearest
neighbor) raises CPU times to 63.8s overall and 28.1s in fine z scan so
about 60% increase overall and about 4x slower in fine z scan.

With the new code the "batch" CPU mode slightly improves this to 60.0s
and 24.2s.

With the new GPU code, it runs on 4 GPU / 4 CPU in 22.8s total and 14.3s
for fine z scan, a 50% overall increase but only 2x speed increase in
the fine z scan, a big improvement from CPU times.

Also updated transmission_Lyman to return None when given scalar z to
match behavior when given an array of redshifts, in the case where the
wavelength range is not affected by Lyman transmission. There is no
need in this case to calculate an array of all ones and then
additionally multiply the rebinned data by it.

Have yet to update --per-camera and -n_nearest options so right now
there are placeholders so that it does not crash that simply loop over
the existing CPU-mode logic. These will be updated shortly.

craigwarner-ufastro · 2023-09-26T20:42:31Z

Timing update - rows after the blank line are new results

Mode            CPU     GPU     Findz   Total   Clock
Orig             4      4        7.3    14.8    18.6
Orig            64      0        6.7    40.0    49.1
Arch only       64      0       28.1    63.8    71.7
Arch only batch 64      0       24.2    60.0    68.0
Arch only GPU    4      4       14.3    22.8    27.4
Arch+PC+NN      64      0       40.1    76.4    93.5
Arch+PerCam      4      4       376.7  389.0   400.6

Arch only batch 64      0       24.3    60.1    67.9
Arch only GPU    4      4       12.7    21.8    26.0
Arch+NN batch   64      0       24.6    60.3    69.7
Arch+NN GPU      4      4       15.5    23.9    28.1

Modified solve_matrices algorithm to accept PCA or BVLS as a method. Re-orgianized logic for GPU to take advantage of existing calc_zchi2_batch algorithm. For some strange reason refactoring tdata to add rows for color data in batch in 3d array on CPU takes way longer than a python loop over 2d arrays so I left two methods for per_camera - one for GPU and one for CPU - for now while this gets resolved. This is not the cleanest but it is the fastest and we can revisit if we change our mind on that.

cleanly without a speed loss in runtime. Removed old methods Tbs_for_archetypes, per_camera_coeff_with_least_square, and per_camera_coeff_with_least_square_cpu and renamed the remaining method to per_camera_coeff_with_least_square_batch. Updated 1->n_nbh where Abhijeet pointed out errors had been made.

coveralls · 2023-11-02T18:59:42Z

coverage: 35.613% (-0.5%) from 36.147%
when pulling bd5983a on abhijeet_per_camera_gpu
into 201399a on abhijeet_per_camera.

Rolled back changes by Abhijeet done while attempting to merge to restore per camera batch method.

craigwarner-ufastro · 2023-11-15T19:18:15Z

Added NNLS to solve_matrices and restored per_camera_coeff_with_least_square_batch in zscan.py from my previous version.

sbailey · 2023-11-28T22:44:15Z

See #259 . I think we want to merge these changes into the abhijeet_per_camera_priors_gpu branch, rather than the abhijeet_per_camera branch which appears to be superseded.

…et_per_camera_prior_gpu. Added prior as optional arg to calc_zchi2_batch to do this.

sbailey · 2024-01-04T01:05:15Z

changes in this PR were incorporated into PR #255, so closing this one without merging.

craigwarner-ufastro changed the base branch from main to abhijeet_per_camera September 19, 2023 04:33

craigwarner-ufastro and others added 7 commits October 4, 2023 15:05

prior_sigma arg added

d430674

print removed

bdc961f

case for zero ivard added to bvls

5ba6769

Update README.rst

af37f26

Merge branch 'abhijeet_per_camera' into abhijeet_per_camera_gpu

0cec7ee

abhi0395 and others added 3 commits November 3, 2023 09:37

nearest nbh argument corrected

ced3c01

nearest nbh argument corrected

c1528e1

Added NNLS as an option to solve_matrices in zscan.

a4b8b80

Rolled back changes by Abhijeet done while attempting to merge to restore per camera batch method.

Merged abhijeet_per_camera_gpu and abhijeet_per_camera_gpu and abhije…

bd5983a

…et_per_camera_prior_gpu. Added prior as optional arg to calc_zchi2_batch to do this.

sbailey closed this Jan 4, 2024

sbailey deleted the abhijeet_per_camera_gpu branch January 4, 2024 01:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Abhijeet per camera gpu #253

Abhijeet per camera gpu #253

craigwarner-ufastro commented Sep 19, 2023

craigwarner-ufastro commented Sep 26, 2023

coveralls commented Nov 2, 2023 •

edited

Loading

craigwarner-ufastro commented Nov 15, 2023

sbailey commented Nov 28, 2023

sbailey commented Jan 4, 2024

Abhijeet per camera gpu #253

Abhijeet per camera gpu #253

Conversation

craigwarner-ufastro commented Sep 19, 2023

craigwarner-ufastro commented Sep 26, 2023

coveralls commented Nov 2, 2023 • edited Loading

craigwarner-ufastro commented Nov 15, 2023

sbailey commented Nov 28, 2023

sbailey commented Jan 4, 2024

coveralls commented Nov 2, 2023 •

edited

Loading