Abhijeet per camera gpu #253
Timing update - rows after the blank line are new results
Modified the solve_matrices algorithm to accept PCA or BVLS as a method. Reorganized the GPU logic to take advantage of the existing calc_zchi2_batch algorithm. For some strange reason, refactoring tdata on the CPU to add rows for color data in batch in a 3D array takes far longer than a Python loop over 2D arrays, so for now I left two per_camera methods, one for GPU and one for CPU, until this is resolved. This is not the cleanest approach, but it is the fastest, and we can revisit it if we change our minds.
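A minimal sketch of what a method switch of this kind could look like, assuming a single square system `M x = y` per spectrum. The function name `solve_matrices_sketch`, its signature, and the `bounds` argument are hypothetical and are not redrock's actual API. NNLS is included only because a later commit in this PR mentions adding it. The SciPy calls (`scipy.optimize.lsq_linear` with `method="bvls"` and `scipy.optimize.nnls`) are standard.

```python
import numpy as np
from scipy.optimize import lsq_linear, nnls


def solve_matrices_sketch(M, y, method="PCA", bounds=None):
    """Solve M x = y for one spectrum (hypothetical helper, not redrock's API)."""
    method = method.upper()
    if method == "PCA":
        # Unconstrained linear solve; coefficients may be negative.
        return np.linalg.solve(M, y)
    if method == "BVLS":
        # Bounded-variable least squares; unbounded unless bounds are given.
        if bounds is None:
            bounds = (-np.inf, np.inf)
        return lsq_linear(M, y, bounds=bounds, method="bvls").x
    if method == "NNLS":
        # Non-negative least squares; nnls returns (solution, residual norm).
        return nnls(M, y)[0]
    raise NotImplementedError(f"unknown method: {method}")


if __name__ == "__main__":
    M = np.array([[2.0, 0.0], [0.0, 1.0]])
    y = np.array([2.0, -1.0])
    for name in ("PCA", "NNLS"):
        print(name, solve_matrices_sketch(M, y, name))
    print("BVLS", solve_matrices_sketch(M, y, "BVLS", bounds=(0, np.inf)))
```

On this toy system the unconstrained solve returns a negative second coefficient, while the bounded and non-negative solvers clamp it at zero.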
… cleanly without a speed loss in runtime. Removed old methods Tbs_for_archetypes, per_camera_coeff_with_least_square, and per_camera_coeff_with_least_square_cpu and renamed the remaining method to per_camera_coeff_with_least_square_batch. Updated 1->n_nbh where Abhijeet pointed out errors had been made.
Rolled back the changes Abhijeet made while attempting to merge, restoring the per-camera batch method.
Added NNLS to …
See #259. I think we want to merge these changes into the abhijeet_per_camera_priors_gpu branch, rather than the abhijeet_per_camera branch, which appears to be superseded.
…et_per_camera_prior_gpu. Added prior as optional arg to calc_zchi2_batch to do this.
Changes in this PR were incorporated into PR #255, so closing this one without merging.
Use batch rebin, transmission_Lyman, and calc_zchi2 operations.
Speed gains: without archetypes, redrock on 4 GPU / 4 CPU takes 14.8s
reported total run time, 7.3s of which is in the fine redshift scan.
By comparison, with 64 CPU and 0 GPU the base redrock runs in 40.0s
reported total run time, with 6.7s spent in the fine redshift scan.
Adding the base archetypes option (without per-camera or nearest
neighbor) raises CPU times to 63.8s overall and 28.1s in the fine z
scan, so about a 60% increase overall and about 4x slower in the fine
z scan.
With the new code the "batch" CPU mode slightly improves this to 60.0s
and 24.2s.
With the new GPU code, it runs on 4 GPU / 4 CPU in 22.8s total and 14.3s
for the fine z scan: about a 50% increase overall but only about a 2x
slowdown in the fine z scan, a big improvement over the CPU times.
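The overhead figures quoted above can be re-derived from the raw timings. The sketch below does only that arithmetic. The dictionary keys are labels made up here, and the numbers are copied from the text.

```python
# Timings quoted above, as (total seconds, fine redshift scan seconds).
timings = {
    "cpu64_base": (40.0, 6.7),
    "cpu64_archetypes": (63.8, 28.1),
    "cpu64_archetypes_batch": (60.0, 24.2),
    "gpu4_base": (14.8, 7.3),
    "gpu4_archetypes": (22.8, 14.3),
}


def overhead(with_arch, base):
    """Return (percent increase in total time, fine-scan time ratio)."""
    total_pct = 100.0 * (with_arch[0] / base[0] - 1.0)
    fine_ratio = with_arch[1] / base[1]
    return total_pct, fine_ratio


cpu = overhead(timings["cpu64_archetypes"], timings["cpu64_base"])
gpu = overhead(timings["gpu4_archetypes"], timings["gpu4_base"])
print(f"CPU archetypes: +{cpu[0]:.1f}% total, {cpu[1]:.2f}x fine z scan time")
print(f"GPU archetypes: +{gpu[0]:.1f}% total, {gpu[1]:.2f}x fine z scan time")
```

Here the ratio is archetype time divided by base time, so the CPU run comes out near +60% overall and about 4.2x in the fine z scan, and the GPU run near +54% overall and just under 2x.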
Also updated transmission_Lyman to return None when given scalar z to
match behavior when given an array of redshifts, in the case where the
wavelength range is not affected by Lyman transmission. There is no
need in this case to calculate an array of all ones and then
additionally multiply the rebinned data by it.
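A rough sketch of the None-return convention described here, assuming the check is whether any observed pixel falls blueward of rest-frame Lyman-alpha. The functions `transmission_sketch` and `apply_transmission` are hypothetical, and the constant 0.5 attenuation is a placeholder rather than the model redrock uses.

```python
import numpy as np

LYA_REST = 1215.67  # Lyman-alpha rest wavelength in Angstrom


def transmission_sketch(z, wave_obs):
    """Return None if no pixel is blueward of Lyman-alpha at any z,
    otherwise a placeholder transmission array (hypothetical helper)."""
    wave_obs = np.asarray(wave_obs, dtype=float)
    z_arr = np.atleast_1d(np.asarray(z, dtype=float))
    # Rest-frame wavelengths, shape (nz, nwave).
    wave_rest = wave_obs[None, :] / (1.0 + z_arr[:, None])
    affected = wave_rest < LYA_REST
    if not affected.any():
        # Same answer for scalar and array z: nothing to apply.
        return None
    # Placeholder attenuation, NOT a physical model.
    T = np.where(affected, 0.5, 1.0)
    return T[0] if np.ndim(z) == 0 else T


def apply_transmission(flux, T):
    """Skip the multiply entirely when there is no transmission array."""
    return flux if T is None else flux * T


if __name__ == "__main__":
    wave = np.linspace(3600.0, 9800.0, 5)
    flux = np.ones_like(wave)
    print(apply_transmission(flux, transmission_sketch(0.5, wave)))
    print(apply_transmission(flux, transmission_sketch(3.0, wave)))
```

The caller tests for None once and returns the rebinned flux untouched, which avoids both allocating an all-ones array and the extra multiply.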
The --per-camera and -n_nearest options have not been updated yet, so
for now there are placeholders that simply loop over the existing
CPU-mode logic so that it does not crash. These will be updated shortly.