
Find parsing bottlenecks. #137

Closed
MicahGale opened this issue Feb 22, 2023 · 12 comments
Labels: code improvement (A feature request that will improve the software and its maintainability, but be invisible to users), good first issue (Good for newcomers), performance 🐌 (Issues related to speed and memory)
Comments

@MicahGale
Collaborator

Currently an ATR whole-core model takes 40 seconds to parse. This would be a good task: identify what the bottleneck is in reading large models, and then implement some optimizations.

@MicahGale
Collaborator Author

I suspect this is mostly sly being slow, because parsing is difficult. One option would be to multi-thread the parsing, since we won't be resource-constrained by the file I/O.

@tjlaboss
Collaborator

sly is slow, but it isn't the bottleneck. Here are some highlights from preliminary profiling I did on the model I've been working with.

>>> p.sort_stats(SortKey.CUMULATIVE, SortKey.TIME).print_stats(20, 'sly')
Mon Aug 26 08:32:48 2024    Profile1

         521393177 function calls (502227196 primitive calls) in 157.902 seconds

   Ordered by: cumulative time, internal time
   List reduced from 1903 to 20 due to restriction <20>
   List reduced from 20 to 1 due to restriction <'sly'>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    19497    2.517    0.000   15.187    0.001 sly/yacc.py:2064(parse)
>>> p.sort_stats(SortKey.CUMULATIVE, SortKey.TIME).print_stats(20, 'montepy')
Mon Aug 26 08:32:48 2024    Profile1

         521393177 function calls (502227196 primitive calls) in 157.902 seconds

   Ordered by: cumulative time, internal time
   List reduced from 1903 to 20 due to restriction <20>
   List reduced from 20 to 14 due to restriction <'montepy'>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000  149.982  149.982 montepy/input_parser/input_reader.py:6(read_input)
        1    0.113    0.113  149.981  149.981 montepy/mcnp_problem.py:236(parse_input)
        1    0.006    0.006   74.486   74.486 montepy/mcnp_problem.py:304(__update_internal_pointers)
     1396    0.810    0.001   59.804    0.043 montepy/data_inputs/material.py:173(update_pointers)
   378833    0.143    0.000   57.314    0.000 montepy/data_inputs/material.py:259(__eq__)
   378833    8.856    0.000   56.984    0.000 montepy/data_inputs/material.py:243(__hash__)
    34037    2.443    0.000   49.055    0.001 montepy/numbered_object_collection.py:192(append)
 21970084   10.758    0.000   46.409    0.000 montepy/numbered_object_collection.py:75(numbers)
 52957827   25.651    0.000   45.834    0.000 montepy/utilities.py:76(getter)
 24587278   11.843    0.000   16.076    0.000 montepy/data_inputs/isotope.py:221(__lt__)
    97257    0.161    0.000   15.782    0.000 montepy/mcnp_object.py:37(__init__)
  7322705   10.445    0.000   15.606    0.000 montepy/data_inputs/isotope.py:213(__str__)
    19497    0.051    0.000   15.258    0.001 montepy/input_parser/parser_base.py:133(parse)
    10368    0.083    0.000   11.501    0.001 montepy/cells.py:22(__setup_blank_cell_modifiers)

Updating internal pointers takes about half the runtime. For my ATR experiment depletion model, the lion's share of that was the material pointers, which makes sense given the large number of complicated materials. This involves a huge number of hashing operations, each of which presently requires sorting material isotopes. This is a bottleneck that we could address in or after #507.

Another thing that sticks out is the time spent in NumberedObjectCollection.numbers.

Third, there are the magic property utilities. The decorator getter was called 53 million times in the example above. Heavy usage of utilities.make_prop_val_node() might be putting too much syntactic sugar in the proverbial gas tank.
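The sorting-inside-hashing cost described above can be amortized by caching the hash and invalidating it on mutation. A minimal sketch, with a toy stand-in class whose names are hypothetical (not montepy's actual API):

```python
class Material:
    """Toy stand-in for montepy's Material; names here are hypothetical."""

    def __init__(self, number, isotopes):
        self.number = number
        self._isotopes = list(isotopes)
        self._hash_cache = None  # invalidated whenever the isotope list changes

    def __hash__(self):
        # Sorting on every __hash__ call is the cost the profile above shows;
        # caching amortizes it to one sort per mutation.
        if self._hash_cache is None:
            self._hash_cache = hash(tuple(sorted(self._isotopes)))
        return self._hash_cache

    def add_isotope(self, isotope):
        self._isotopes.append(isotope)
        self._hash_cache = None
```

The trade-off is that every mutating method must remember to clear the cache, so this fits best once the isotope list is behind a narrow API.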

@MicahGale
Collaborator Author

So my first thought: It would be interesting to run this profile for every release, and see if we can find specific changes that are very costly.
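Running the same profile per release could be scripted with the stdlib cProfile/pstats pair that produced the transcripts above. A sketch, with a stand-in workload (the real call would be montepy reading a large input file):

```python
import cProfile
import io
import pstats
from pstats import SortKey

def parse_model():
    # Stand-in for reading a large model; any CPU-bound
    # call is profiled the same way.
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
parse_model()
profiler.disable()

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
# Same sort/filter style as the transcripts in this thread:
# top 20 by cumulative time, restricted to matching function names.
stats.sort_stats(SortKey.CUMULATIVE, SortKey.TIME).print_stats(20, "parse_model")
report = stream.getvalue()
```

Dumping `stats` to a file per release would give diffable snapshots for spotting regressions.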

@MicahGale
Collaborator Author

MicahGale commented Aug 26, 2024

My thoughts so far on these results.

  1. I realized a very bad implementation detail: MCNP_Problem.update_pointers effectively has a for data_input in self.data_inputs loop.
    1. This then calls Material.update_pointers, which has its own for data_input in data_inputs: loop.
    2. That's right: I wrote an O(N^2) operation! 🎉 🎉 I found the worst possible implementation.
    3. This operation links M to MT. If we did it the other way around, MT could use hashing to make this an O(1) lookup.
    4. This is an easy fix (see Fix O(N^2) operation in Material #510).
  2. Why is Material.__hash__ getting called so much? I thought NumberedObjectCollection was meant to hash on numbers, not objects. Something weird is going on here that needs further investigation.
  3. NumberedObjectCollection.append is called a lot and is expensive. I suspect this is because the num_cache dict has to be regrown a lot, which likely requires rehashing everything. This could be fixed by building a temporary list and then filling the collection all at once, so hashing operations are done once and no resizing is needed: https://stackoverflow.com/questions/1298636/how-to-set-initial-size-for-a-dictionary-in-python.
    1. We may need to create/optimize a function for inserting a whole list of items such that internal checks and hashing (.numbers) are minimized.
  4. Isotope.__lt__ is only used for sorting, AFAIK. Why are we sorting MaterialComponent so much?
  5. utilities.make_prop_val_node(): why do you think this is causing issues, @tjlaboss? I don't see it showing up in these stats at all. I believe the syntactic sugar should be on par with @property for performance, though it may be moving some of the __getattr__ code out of CPython. I don't know how @property is implemented under the hood or whether we are thrashing it.
  6. I think we should have a profiling test where we profile at least every PR. The hardest part will be coming up with a large MCNP model that can be publicly distributed.
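The fix sketched in item 1 amounts to indexing materials by number once, so each MT card links to its M in O(1) instead of scanning every data input. A minimal sketch, with hypothetical class and attribute names (not montepy's actual API):

```python
class Material:
    """Illustrative stand-in for an M card."""
    def __init__(self, number):
        self.number = number
        self.thermal_scattering = None

class ThermalScattering:
    """Illustrative stand-in for an MT card."""
    def __init__(self, number):
        self.number = number

def link_mt_to_m(materials, mt_cards):
    # One O(N) pass to index materials by number...
    by_number = {mat.number: mat for mat in materials}
    # ...then each MT card links in O(1) instead of scanning all inputs.
    for mt in mt_cards:
        mat = by_number.get(mt.number)
        if mat is not None:
            mat.thermal_scattering = mt
```

Total work drops from O(N·M) pairwise scans to O(N + M) with one dict build.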

@tjlaboss
Collaborator

4. Isotopes are sorted every time the parent material is hashed.

5. 52957827 25.651 0.000 45.834 0.000 montepy/utilities.py:76(getter)

6. VERA or BEAVRS are two possibilities.

@MicahGale
Collaborator Author

MicahGale commented Aug 26, 2024

Do you know when Material is getting hashed?

Ohh dang, I missed that montepy/utilities line. Yeah, this is worth looking into to see if it can be optimized. I think part of it is that almost all properties go through it, so it just gets hammered. Worth seeing if it is actually that sub-optimal: I'm getting 4e-7 seconds in the method and 8.65e-7 seconds total per call.

Are there MCNP models freely available of BEAVRS/VERA? My 2 second search yielded nothing.
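The "on par with @property" question is easy to micro-benchmark with stdlib timeit. A sketch comparing plain attribute reads against descriptor-based access (the classes here are hypothetical proxies, not montepy code; make_prop_val_node would add at least one more closure layer on top of the @property case):

```python
import timeit

class WithProperty:
    """Access through @property: descriptor protocol + a Python-level call."""
    def __init__(self):
        self._x = 1

    @property
    def x(self):
        return self._x

class PlainAttr:
    """Direct instance-dict attribute access."""
    def __init__(self):
        self.x = 1

wp = WithProperty()
pa = PlainAttr()
# At tens of millions of calls (as in the profile above), per-access
# overhead of a few hundred nanoseconds adds up to whole seconds.
t_property = timeit.timeit(lambda: wp.x, number=100_000)
t_attribute = timeit.timeit(lambda: pa.x, number=100_000)
```

Comparing the two timings against the ~8.65e-7 s/call figure above would show how much of that is inherent descriptor overhead versus the extra wrapper.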

@tjlaboss
Collaborator

tjlaboss commented Aug 26, 2024

Do you know when Material is getting hashed?

Material.update_pointers -> data_inputs.remove -> list.remove -> __eq__ -> __hash__
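One way out of that chain: list.remove compares elements with ==, so if __eq__ delegates to the expensive isotope-based hash, every removal hashes much of the list. Checking cheap fields first avoids that. A toy sketch (the real montepy classes differ; the hash counter is instrumentation for illustration):

```python
class Material:
    """Toy model of the __eq__ -> __hash__ chain described above."""
    hash_calls = 0  # instrumentation so the effect can be counted

    def __init__(self, number, isotopes):
        self.number = number
        self.isotopes = tuple(sorted(isotopes))

    def __hash__(self):
        Material.hash_calls += 1  # stands in for the expensive sort/hash
        return hash(self.isotopes)

    def __eq__(self, other):
        if self is other:
            return True
        if not isinstance(other, Material):
            return NotImplemented
        # Comparing cheap fields first means list.remove never needs
        # to trigger the expensive isotope-based hash.
        return self.number == other.number
```

With this __eq__, removing a material from a list of N materials does N cheap integer comparisons and zero hashes.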

@MicahGale
Collaborator Author

Oooofff, yikes. I'm going to add that to the scope of #510.

@MicahGale
Collaborator Author

I have noticed that append_renumber is very slow. I suspect that it's just thrashing like crazy checking all existing numbers. I wonder if there is some way to check only the numbers that have changed.
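The bulk-insert idea from item 3 above can be sketched as an extend method that snapshots the existing numbers once instead of re-deriving them per append. All names here are hypothetical stand-ins for the montepy classes:

```python
class Cell:
    """Minimal numbered object for the example."""
    def __init__(self, number):
        self.number = number

class NumberedCollection:
    """Toy stand-in for NumberedObjectCollection (hypothetical API)."""

    def __init__(self):
        self._objects = {}

    @property
    def numbers(self):
        # In the real class this is reportedly expensive; a fresh set
        # per call mimics that per-append cost.
        return set(self._objects)

    def append(self, obj):
        # Per-item path: re-derives the numbers cache on every insert.
        if obj.number in self.numbers:
            raise ValueError(f"number {obj.number} already in use")
        self._objects[obj.number] = obj

    def extend(self, objs):
        # Bulk path: snapshot the numbers once, validate everything,
        # then insert, so the lookup structure is built a single time
        # and a duplicate never leaves the collection half-updated.
        objs = list(objs)
        taken = self.numbers
        for obj in objs:
            if obj.number in taken:
                raise ValueError(f"number {obj.number} already in use")
            taken.add(obj.number)
        for obj in objs:
            self._objects[obj.number] = obj
```

Validate-then-insert also keeps the collection consistent if a duplicate number is found partway through the batch.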

@tjlaboss added the performance 🐌 label Aug 27, 2024
@tjlaboss
Collaborator

With #518, there is a marked decrease in MontePy runtime, largely in __update_internal_pointers due to improvements in M/MT linking:

>>> p.sort_stats(SortKey.CUMULATIVE, SortKey.TIME).print_stats(20, 'montepy')
Tue Aug 27 20:37:09 2024    Profile2

         334195605 function calls (329836095 primitive calls) in 103.127 seconds

   Ordered by: cumulative time, internal time
   List reduced from 1919 to 20 due to restriction <20>
   List reduced from 20 to 10 due to restriction <'montepy'>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   82.291   82.291 montepy/input_parser/input_reader.py:6(read_input)
        1    0.097    0.097   82.290   82.290 montepy/mcnp_problem.py:236(parse_input)
    34037    2.280    0.000   44.285    0.001 montepy/numbered_object_collection.py:192(append)
 21970084   10.021    0.000   41.825    0.000 montepy/numbered_object_collection.py:75(numbers)
 45370107   19.781    0.000   33.022    0.000 montepy/utilities.py:76(getter)
        1    0.005    0.005   14.499   14.499 montepy/mcnp_problem.py:304(__update_internal_pointers)
    97257    0.149    0.000   14.066    0.000 montepy/mcnp_object.py:37(__init__)
        1    0.000    0.000   13.797   13.797 montepy/__init__.py:2(<module>)
    19497    0.043    0.000   13.627    0.001 montepy/input_parser/parser_base.py:133(parse)
        1    0.000    0.000   13.200   13.200 montepy/input_parser/__init__.py:2(<module>)

@MicahGale
Collaborator Author

I think numbered_object_collection is the next target for optimization.

@MicahGale
Collaborator Author

I think this has been fixed in 0.5.0 with #510 and #556. It has been resolved to the point that sly is now the primary suspect, and we can keep tracking that in #509.
