-
Notifications
You must be signed in to change notification settings - Fork 280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ray object dl for SPH data: interpolate kernel table for (b/hsml)**2 instead of b/hsml #4783
Conversation
Hi! Welcome, and thanks for opening this pull request. We have some guidelines for new pull requests, and soon you'll hear back about the results of our tests and continuous integration checks. Thank you for your contribution! |
I believe that the analysis in #4781 is correct! Thank you for your detailed report, @nastasha-w -- this is a great find, and very appreciated. I am going to mark approve, but I think for a change of this magnitude (even if it is just one line!) we do need another set of eyes. |
This is completely out of my expertise so I'll leave it to others to review; I just wanted to ask to anybody who'd want to push the button to please triage this to 4.3.1 or 4.4.0 depending on the expected impact ! thanks ! |
@matthewturk Thanks! It agree it's best to be sure about this. I did add a test for Ray objects to this pull request. I don't think it's entirely complete; for one thing, the mass / density factor in the The new test is in test_rays.py: |
pre-commit.ci autofix |
@matthewturk @jzuhone @langmm @chummels could you take a look at this pull request and approve it or give me some feedback? This slipped my mind for quite a while, but I do think there is an error in the code that this fixes. I merged in the current (well, very recent at least) main branch, and the tests are passing now. |
@neutrinoceros can I ask what the 4.4.0 milestone means? The github docs didn't seem very helpful on this. |
It means I think it's reasonable to expect this will be merged before we release yt 4.4.0, but it's more of a wishlist and less of a hard requirement. |
@nastasha-w I am satisfied with this. @chummels let us know what you think, @nastasha-w added some tests which I think are helpful. |
@jzuhone thanks! |
@chummels we really need you to have a look at this. We should absolutely merge it early next week. |
marking this as a blocker as per John's message on the mailing list. |
pre-commit.ci autofix |
The latest |
genuine question: do you mean the current state of this PR ? (#4783) |
nevermind, the message you wrote on the issue is unambiguous ... and this PR is already linked. |
@nastasha-w could you rebase this on |
I seem to have completely messed something up here...github seems to now be attributing all the changes on main to this branch, and some of the pre-commit.ci comments suggest some files got messed up too. Update: ok, the messed up file seems to have been one function that just got completely duplicated in the same file somewhere. That's fixed now. |
See issue 4781 for a description of the (possible) bug this fixes.
7872048
to
1e17256
Compare
pre-commit.ci autofix |
for more information, see https://pre-commit.ci
Welp, I ended up just locally saving copies of the files I actually changed in a different directory and pretty much rebase-nuking the whole branch back to the first commit. It worked well enough given the small number of changes involved. This does still incorporate the issue #4991 solution, since in the new history, I jumped straight to importing the functions I needed for the Ray tests from the I'm sorry to anyone whose reviews or tests got messed up. |
OK, apologies for all of the delays on this. The main hold up with this PR on my side has been getting an old PR I implemented a few years back to work properly. I realize that code-wise, this PR is very minor, but it has significant implications for the behavior of the The primary way I can envision testing this change to ensure consistent behavior between grid-based and particle-based snapshots, is simply to replicate a grid-based simulation snapshot into a particle-based simulation snapshot, apply the same code on both, and see if the PR gives the same results between the two. I wrote some code a few years ago that does just this, converting a grid-based sim output into a particle-based sim output, but it never ended up getting merged. PR #2187 . It works by simply monte carlo sampling a grid-based code into discrete particles, then loading those particles into a dataset using the load_particles() functionality. Amazingly, the PR basically works despite its age (one just has to change the import for the load_particles() call into yt.loaders ). One gets good behavior with this PR when one just increases the number of particles beyond the 1e5 val to 1e6 or greater. The main challenge I’m facing is that when I send a ray object through a dataset that has been created using If anyone has any ideas on how to get datasets produced using For reference, this is the code I'm trying to use:
And the error that it invokes is:
|
@chummels yeah, I ran into that same problem in my own tests... I think the check on which dt/dts generation method to use doesn't include the StreamParticleDataset option in the things that would make something an SPH simulation, so it tries to use the grid methods, which then fail. ds = fake_sph_grid_ds(hsml_factor=1.0)
ds.kernel_name = "cubic"
## Ray through the one particle at (0.5, 0.5, 0.5):
## test basic kernel integration
eps = 0.0 # 1e-7
start0 = np.array((1.0 + eps, 0.0, 0.5))
end0 = np.array((0.0, 1.0 + eps, 0.5))
ray0 = ds.ray(start0, end0)
b0 = np.array([np.sqrt(2.0) * eps])
hsml0 = np.array([0.05])
len0 = np.sqrt(np.sum((end0 - start0) ** 2))
# for a ParticleDataset like this one, the Ray object attempts
# to generate the 't' and 'dts' fields using the grid method
ray0.field_data["t"] = ray0.ds.arr(ray0._generate_container_field_sph("t"))
ray0.field_data["dts"] = ray0.ds.arr(ray0._generate_container_field_sph("dts")) |
Specifically, in the field = data_source._determine_fields(field)[0]
finfo = data_source.ds.field_info[field]
particle_datasets = (ParticleDataset, StreamParticlesDataset)
is_sph_field = finfo.is_sph_field
elif isinstance(data_source.ds, particle_datasets) and is_sph_field:
# do SPH stuff (Note that Would it be worth changing that bit of the code in |
@nastasha-w : Thanks for the suggestions! I tried to add in the "by-hand" call to generate the
|
@chummels That's what this line is for: ds.kernel_name = "cubic" the |
Guys, as much as it pains me to say this: if this is indeed a bugfix, it could easily be release in yt 4.4.1. Should this really block 4.4.0 ? we are more than a month beyond "schedule" (there's no actual schedule, but this finalizing yt 4.4.0 has been on my todo list for 2 months now and I'd really like to get it out the door). |
I think the motivation for getting this out in a feature release is that it significantly changes the output for these methods so it feels a bit bigger than a standard bugfix -- even though the api isn't breaking it is in some sense breaking expected behavior relative to prior behavior? that said IMO it'd be OK to release in 4.4.1 but it'd be good to send out some broader messages to the email list and slack to note the change when/if it goes in (which I'm happy to help send out). And if an email goes out for 4.4.0 it might be worth adding an extra note to bring attention to this PR just to put it on people's radar for 4.4.1. |
@neutrinoceros this is a big enough bug it warrants inclusion in a feature release. We need to wait on 4.4.0 for this. I will chat with @chummels about this today. |
I mean nothing forbids doing a 4.5.0 soon after 4.4.0. |
@neutrinoceros I talked to @chummels today and I can predict with confidence that we can get this into 4.4.0 |
OK, I was at speaking at a dark sky festival all weekend, which precluded me from working on this until today. But thanks to @nastasha-w 's suggestion, I was able to get my testing code to work. As discussed above, I take a grid-based dataset and create a mirrored particle based dataset, populating it by use of a monte carlo method. Then I simply place a number of random The column density distribution should be the same in the grid and the particle datasets, so we can use this test to identify whether the PR addresses the problem and whether there is a problem or not. I tested this out on two datasets, the In both datasets, we see the same behavior, that prior to this PR, the bug revealed itself with particle-based datasets reporting column densities too low by a factor of 3 or greater. After the PR, these differences between particle- and grid-based datasets are statistically speaking gone. Thus, I am convinced that this PR addresses the bug, and it should absolutely be included in the codebase. Many apologies for all of the holdups on my end, and thank you for your patience. I just wanted to ensure that this was correct, as this will change previously published results for a number of papers that used Trident to generate spectra with particle-based datasets like FIRE, TNG, Eagle, etc. I think we need to have a conversation about how to report this bugfix to the community, but I don't want it to hold up this PR any longer. Many many thanks go out to @nastasha-w for catching this subtle and deep bug, and for coming up with a way to resolve the problem. Great work! |
@chummels Thanks for doing this detailed testing! |
thanks all !
yes please. I usually just include the PR title in the release notes, except for highlights, and this definetly feels like one. I'd like to include a 2 or 3 lines summary of the change, though I trust it would be best handled by you guys. |
Fixes a (possible) bug in the calculation of path lengths dl for SPH/non-grid data in the YT Ray objects. This affects e.g., column density and absorption spectrum calculations, like those done in the Trident package.
PR Summary
Replace
dl = itab.interpolate_array(b / hsml) * mass / dens / hsml**2
by
dl = itab.interpolate_array((b / hsml)**2) * mass / dens / hsml**2
in the
_generate_container_field_sph
method of the YT Ray object (https://github.com/yt-project/yt/blob/main/yt/data_objects/selection_objects/ray.py).This is the proposed fixed for the possible bug mentioned in issue #4781.
In short, the
_generate_container_field_sph
retrieves path lengths dl for normalized SPH kernels by linearly interpolating values calculated for a smallish set of (impact parameter / smoothing length) values. However, if I'm not mistaken, the table actually stores those dl values for different values of (impact parameter / smoothing length)^2. That means we need to input (b / hsml)**2 into the interpolation function, not (b / hsml).PR Checklist
The only documentation issue I can see here is that if this was indeed a bug, we should probably send out some message warning people who might have used/be using Ray objects for papers about the issue. As for tests, @chummels metioned a possible test on the slack channel involving SPH-ifying a grid dataset and comparing the surface/column densities retrieved from both.