Index and other optimisations #922

Merged
merged 24 commits into from
Feb 24, 2024

Commits on Feb 23, 2024

  1. Indices: optimise _calc_endemism_absolute

    Take advantage of the label hash global precalc,
    and use hash aliases instead of refs.
    shawnlaffan committed Feb 23, 2024
    11a9bbe
  2. 501bd7b
  3. TreeNode.pm: use a linear scan for get_hash_lists_below

    Might as well avoid any recursion overheads.
    shawnlaffan committed Feb 23, 2024
    f9bcf98
  4. Add an array args version of set_basedata_ref

    And stop throwing errors when ref is undefined
    in get_basedata_ref.
    shawnlaffan committed Feb 23, 2024
    1e7deb8
  5. Trees: clone_without_caches also clears parameters

    This way we avoid cloning basedata refs,
    analysis args and the like.
    shawnlaffan committed Feb 23, 2024
    4dec15f
  6. Common::get_zscore_from_comp_results - avoid a lot of grepping

    No need to find the index names when they are
    in the base_list_ref already.
    
    Also use refaliasing to avoid some derefs
    and declutter loop variables.
    shawnlaffan committed Feb 23, 2024
    965c1ad
  7. optimise Tree::convert_comparisons_to_significances

    Passing in the base list allows fewer grep comparisons.
    This makes a large difference when there are many lists
    with many keys.
    shawnlaffan committed Feb 23, 2024
    6ee1109
  8. optimise Spatial::convert_comparisons_to_significances

    Passing in the base list allows fewer grep comparisons.
    This makes a large difference when there are many lists
    with many keys.
    shawnlaffan committed Feb 23, 2024
    Configuration menu
    Copy the full SHA
    80bc505 View commit details
    Browse the repository at this point in the history
  9. Indices: add a hierarchical mode flag

    This allows future optimisations when
    calculating indices for cluster trees.
    shawnlaffan committed Feb 23, 2024
    81f38b3
  10. Indices: Support hierarchical calculations

    This allows several indices to be optimised when
    calculated for cluster nodes, provided they
    are calculated starting from the tips.
    
    PE has been optimised in this commit.
    shawnlaffan committed Feb 23, 2024
    1c6a509
  11. Indices: calc_labels_not_on_tree: return early if nothing to work with

    Avoids a lot of hash creation and deletion
    with large datasets.
    shawnlaffan committed Feb 23, 2024
    5d6a8b4
  12. Indices: add a hierarchical variant of get_path_lengths_to_root_node

    Speeds up PD calcs for cluster trees.
    shawnlaffan committed Feb 23, 2024
    67f655e
  13. Indices: _calc_endemism_hier_part: avoid some method calls

    Use a treenode method that caches, rather
    than repeatedly calling methods to get
    the same answer.
    shawnlaffan committed Feb 23, 2024
    6d9c553
  14. 8a69ac2
  15. delete commented code

    shawnlaffan committed Feb 23, 2024
    a37dbb3
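
Commit f9bcf98 in the list above swaps recursion for a linear scan in get_hash_lists_below. The project itself is Perl and its code is not shown on this page, so what follows is only a language-neutral sketch in Python of the general idea: collect the names of the hash lists held by a node and its descendants without recursive calls. `Node`, `list_names_below_recursive` and `list_names_below_iterative` are hypothetical names, and the explicit stack is an assumption about one way to do it, not the TreeNode.pm implementation.

```python
class Node:
    """Hypothetical tree node: a name, a dict of named hash lists, and children."""

    def __init__(self, name, lists=None, children=None):
        self.name = name
        self.lists = lists if lists is not None else {}
        self.children = children if children is not None else []


def list_names_below_recursive(node):
    """Recursive version: one function call per node in the subtree."""
    names = set(node.lists)
    for child in node.children:
        names |= list_names_below_recursive(child)
    return names


def list_names_below_iterative(node):
    """Same result using an explicit stack, so there is no call per node
    and no dependence on the interpreter's recursion limit."""
    names = set()
    stack = [node]
    while stack:
        current = stack.pop()
        names.update(current.lists)      # adds the dict's keys
        stack.extend(current.children)   # visit children later
    return names
```

Both functions include the starting node's own lists. Whether the real method does so is not stated in the commit message.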

Commits on Feb 24, 2024

  1. Indices: refactor hierarchical node details

    It is cleaner to pack the node and child names
    in their own structure.  That also enables
    later additions without adding yet more
    top level arguments.
    shawnlaffan committed Feb 24, 2024
    9b1846d
  2. 744c430
  3. TreeNode::add_to_lists: optimise

    Use direct assignment if starting with empty list.
    shawnlaffan committed Feb 24, 2024
    7586528
  4. Cluster spatial calcs: add lists by ref

    Avoids a lot of copying.
    shawnlaffan committed Feb 24, 2024
    0c34d12
  5. compare_lists_by_item: lift a var outside the loop

    This can be a _very_ hot loop so even small
    differences add up.
    shawnlaffan committed Feb 24, 2024
    4168546
  6. Indices: cache the current results from each sub

    These are cleared as we go to avoid leakage.
    shawnlaffan committed Feb 24, 2024
    b2ee3ed
  7. Indices: reuse whole and central endemism results when appropriate

    If the second neighbour set is empty then the
    whole and central variants return the same results.
    So short circuit in these cases.
    shawnlaffan committed Feb 24, 2024
    e173a47
  8. formatting

    shawnlaffan committed Feb 24, 2024
    16a4f86
  9. 62fd8a1
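
Commit e173a47 in the list above notes that when the second neighbour set is empty, the whole and central endemism variants return the same results, so one can be reused for the other. A minimal Python sketch of that short circuit follows. The scoring function is a toy stand-in (each label contributes the reciprocal of its global range), and `endemism_score`, `central_and_whole` and their parameters are hypothetical names, not the project's Perl API.

```python
def endemism_score(labels, global_ranges):
    """Toy score: each label contributes 1 / (its global range)."""
    return sum(1.0 / global_ranges[label] for label in labels)


def central_and_whole(set1_labels, set2_labels, global_ranges,
                      score=endemism_score):
    """Return (central, whole) scores for two neighbour sets of labels.

    In this simplified model the central variant uses the labels in the
    first neighbour set only, and the whole variant uses the labels in both.
    """
    central = score(set1_labels, global_ranges)
    if not set2_labels:
        # Nothing in the second set: the whole variant would see exactly
        # the same labels, so reuse the central result.
        return central, central
    whole = score(set1_labels | set2_labels, global_ranges)
    return central, whole
```

With an empty second set the scoring function runs once instead of twice, which is where the saving comes from when many groups have no second neighbour set.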