Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow recursive keep_keys #8571

Closed
wants to merge 1 commit into from
Closed

Allow recursive keep_keys #8571

wants to merge 1 commit into from

Conversation

MPStudyly
Copy link

@MPStudyly MPStudyly commented Jul 1, 2024

SUMMARY

This PR adds a parameter and functionality to the keep_keys filter introduced by #8438 to allow recursive searches over nested lists of dicts. This is a draft for now, as it for sure needs some polishing. Also I was unsure if the checks performed should be moved to the according plugins_utils file. remove_keys and replace_keys are also not touched by this (yet), as I'd like to have some input on this first.

Due to no issue existing yet, I was unaware on how to name/number a changelog fragment appropriately. If I can simply use the ref number of the PR, I'll create one with that. If I have to open a feature request ticket, I'll do that.

ISSUE TYPE
  • Feature Pull Request
COMPONENT NAME

keep_keys

ADDITIONAL INFORMATION
- name: Debug before change
  vars:
    router_static_routes:
    - address_families:
        - afi: "ipv4"
          routes:
            - description: "WANv4"
              dest: "0.0.0.0/0" # default route
              next_hops:
                - forward_router_address: "192.168.0.1"
        - afi: "ipv6"
          routes:
            - description: "WANv6"
              dest: "::/0" # default route
              next_hops:
                - forward_router_address: "2001:db8::1"
  run_once: true
  ansible.builtin.debug:
    msg: {{ groups.all | map('extract', hostvars) | community.general.keep_keys(target=['dest', 'forward_router_address']) }}

gives

TASK [Debug static routes non recursive] ****************************************
ok: [localhost] => 
  msg: []

while

- name: Debug adter change
  vars:
    router_static_routes:
    - address_families:
        - afi: "ipv4"
          routes:
            - description: "WANv4"
              dest: "0.0.0.0/0" # default route
              next_hops:
                - forward_router_address: "192.168.0.1"
        - afi: "ipv6"
          routes:
            - description: "WANv6"
              dest: "::/0" # default route
              next_hops:
                - forward_router_address: "2001:db8::1"
  run_once: true
  ansible.builtin.debug:
    msg: {{ groups.all | map('extract', hostvars) | community.general.keep_keys(target=['dest', 'forward_router_address'], recurse_lists=true)) }}

gives

TASK [Debug static routes recursive] ************************************
ok: [localhost] => 
  msg:
  - router_static_routes:
    - address_families:
      - routes:
        - dest: 0.0.0.0/0
          next_hops:
          - forward_router_address: 192.168.0.1
      - routes:
        - dest: ::/0
          next_hops:
          - forward_router_address: 2001:db8::1

Add a parameter to allow recursive search of keys to be kept
Adjust result yielding to loop over subsequent lists of dicts if requested
@MPStudyly MPStudyly marked this pull request as draft July 1, 2024 17:36
@MPStudyly MPStudyly marked this pull request as ready for review July 1, 2024 17:38
@MPStudyly MPStudyly marked this pull request as draft July 1, 2024 17:38
@ansibullbot

This comment was marked as outdated.

@ansibullbot
Copy link
Collaborator

cc @vbotka
click here for bot help

@ansibullbot ansibullbot added WIP Work in progress ci_verified Push fixes to PR branch to re-run CI feature This issue/PR relates to a feature request filter filter plugin new_contributor Help guide this first time contributor plugins plugin (any type) labels Jul 1, 2024
@vbotka
Copy link
Contributor

vbotka commented Jul 2, 2024

Get the lists you want to transform and map the filter

    t: [key_in_list, other]
    rl: "{{ testvar |
            map(attribute='list_of_dicts') |
            map('community.general.keep_keys', target=t) }}"

gives

  rl:
    - - {key_in_list: one, other: one other}
      - {key_in_list: two, other: two other}
    - - {key_in_list: three, other: three other}
      - {key_in_list: four, other: four other}

Use the filter dict_kv if you want a list of dictionaries

  rd: "{{ rl | map('community.general.dict_kv', 'list_of_dicts' ) }}"

gives what you want

  rd:
    - list_of_dicts:
      - {key_in_list: one, other: one other}
      - {key_in_list: two, other: two other}
    - list_of_dicts:
      - {key_in_list: three, other: three other}
      - {key_in_list: four, other: four other}

This use case is not consistent and can be solved with current tools. I'd propose to reject this PR.

@MPStudyly
Copy link
Author

MPStudyly commented Jul 2, 2024

Get the lists you want to transform and map the filter

    t: [key_in_list, other]
    rl: "{{ testvar |
            map(attribute='list_of_dicts') |
            map('community.general.keep_keys', target=t) }}"

gives

  rl:
    - - {key_in_list: one, other: one other}
      - {key_in_list: two, other: two other}
    - - {key_in_list: three, other: three other}
      - {key_in_list: four, other: four other}

You are right, it does. But as another lever gets added, this won't work anymore. I should have made a more accurate example of the situation I'm facing to show how this PR makes sense.

Use the filter dict_kv if you want a list of dictionaries

  rd: "{{ rl | map('community.general.dict_kv', 'list_of_dicts' ) }}"

gives what you want

  rd:
    - list_of_dicts:
      - {key_in_list: one, other: one other}
      - {key_in_list: two, other: two other}
    - list_of_dicts:
      - {key_in_list: three, other: three other}
      - {key_in_list: four, other: four other}

Again, in this case, it does. It still requires me to "hardcode" a new dict key name to restore the original layout. Also this breaks, when at least one other level is added.

This use case is not consistent and can be solved with current tools. I'd propose to reject this PR.

As explained, there is no meaningful way yet to apply this filter to deeply nested dicts with lists of dicts. Mapping the way through a dict is a possible way around (map(attribute=...)), but heavily limits flexibility, as the whole dict structure has to be known beforehand. I will update my example code to better reflect the actual problem I try to solve.

EDIT: I've updated the example @vbotka, please have another look.

@vbotka
Copy link
Contributor

vbotka commented Jul 2, 2024

There are recursive filters ansible.utils (keep_keys, remove_keys, and replace_keys) . The design of the community.general filters is simple, compact, and clear. To keep it that way, the filters are intentionally not recursive. The complexity of your PR would increase the maintenance effort not worth the use case benefit.

Instead, the filter ansible.utils.keep_keys is recursive. Quoting:

This plugin keeps only specified keys from provided data recursively

There is a bug:

Instead of fixing the filter ansible.utils.keep_keys complexity I decided to write the simple non-recursive filter community.general.keep_keys. I'm sure you understand that merging your PR would be counterproductive. If you need the recursion I suggest you fix the ansible.utils filters or write your recursive filters, e.g. keep_keys_recursive. Please feel free to use the code.

To avoid a monolithic design the code must be properly structured. If you decide to write your filter reuse the code from plugins/filter/keep_keys.py

    if matching_parameter == 'equal':
        def keep_key(key):
            return key in tt
    elif matching_parameter == 'starts_with':
        def keep_key(key):
            return key.startswith(tt)
    elif matching_parameter == 'ends_with':
        def keep_key(key):
            return key.endswith(tt)
    elif matching_parameter == 'regex':
        def keep_key(key):
            return tt.match(key) is not None

    return [dict((k, v) for k, v in d.items() if keep_key(k)) for d in data]

and create function _keep_keys in plugins/plugin_utils/keys_filter.py

def _keep_keys(data,tt, matching_parameter):
    ...

Then, import the functions in your code

from ansible_collections.community.general.plugins.plugin_utils.keys_filter import (
    _keys_filter_params,
    _keys_filter_target_str,
    _keep_keys)

@MPStudyly
Copy link
Author

MPStudyly commented Jul 11, 2024

Sorry for my late response.

There are recursive filters ansible.utils (keep_keys, remove_keys, and replace_keys) . The design of the community.general filters is simple, compact, and clear. To keep it that way, the filters are intentionally not recursive. The complexity of your PR would increase the maintenance effort not worth the use case benefit.

Okay, while I don't like the naming collision of these two filters that basically do different stuff (I know, the FQDN is different, but still), I accept that keeping complexity low is a valid reason to not merge this.

Instead, the filter ansible.utils.keep_keys is recursive. Quoting:

This plugin keeps only specified keys from provided data recursively

There is a bug:

* keep_keys doesn't exclude a key if the value is list.
  [keep_keys doesn't exclude a key if the value is list. ansible.utils#353](https://github.com/ansible-collections/ansible.utils/issues/353)

That's the exact reason we resorted to the community.general variant. We first misunderstood this to be recursive as well, which it wasn't. As we had a look at both filters, we found this one way easier to adapt than fixing the other one.

Instead of fixing the filter ansible.utils.keep_keys complexity I decided to write the simple non-recursive filter community.general.keep_keys. I'm sure you understand that merging your PR would be counterproductive. If you need the recursion I suggest you fix the ansible.utils filters or write your recursive filters, e.g. keep_keys_recursive. Please feel free to use the code.

I'll see if I can have another look at the ansible.utils filter. We found that code unnecessarily hard to grasp. For now we're working with a local copy of this filter only, as that suffices our needs.

To avoid a monolithic design the code must be properly structured. If you decide to write your filter reuse the code from plugins/filter/keep_keys.py

    if matching_parameter == 'equal':
        def keep_key(key):
            return key in tt
    elif matching_parameter == 'starts_with':
        def keep_key(key):
            return key.startswith(tt)
    elif matching_parameter == 'ends_with':
        def keep_key(key):
            return key.endswith(tt)
    elif matching_parameter == 'regex':
        def keep_key(key):
            return tt.match(key) is not None

    return [dict((k, v) for k, v in d.items() if keep_key(k)) for d in data]

and create function _keep_keys in plugins/plugin_utils/keys_filter.py

def _keep_keys(data,tt, matching_parameter):
    ...

Then, import the functions in your code

from ansible_collections.community.general.plugins.plugin_utils.keys_filter import (
    _keys_filter_params,
    _keys_filter_target_str,
    _keep_keys)

Thx for the input! This is why the whole PR is tagged as a draft, this code wasn't meant to be merged as is anyway, as there is quite some stuff missing (e.g. remove_keys, replace_keys adaptions, tests, etc.) :)

How should we proceed with this? Should we close the PR and move on at ansible.utils or is this still of relevance for the community.general collection?

EDIT: I just noticed your issue/PR which referenced this. As far as the examples go, it looks like it is doing the exact same thing, but I don't really get the naming there.

@ansibullbot ansibullbot added has_issue stale_ci CI is older than 7 days, rerun before merging labels Jul 11, 2024
@vbotka
Copy link
Contributor

vbotka commented Jul 11, 2024

How should we proceed with this? Should we close the PR and move on at ansible.utils or is this still of relevance for the community.general collection?

Yes. I think this PR should be closed. I think the justification to solve this use case by this filter is weak. IMO, if the data are reported by a device you might want to parse it first. There is a new universe in Parsing semi-structured text with Ansible. If this is configuration data you might find a better structure that fits your use case. A custom filter might be a solution too. Generally, the below structure is wrong

    - ld:
      - {key: 1, other: other1}
      - {key: 2, other: other2}
    - ld:
      - {key: 3, other: other3}
      - {key: 4, other: other4}

This can be reduced to what you ultimately need (optionally, including other dictionaries)

  ld:
    1: other1
    2: other2
    3: other3
    4: other4

, or if you need the batches

  ld:
    - 1: other1
      2: other2
    - 3: other3
      4: other4

EDIT: I just noticed your issue/PR which referenced this. As far as the examples go, it looks like it is doing the exact same thing, but I don't really get the naming there.

Yes. The filter community.general.reveal_ansible_type and the test community.general.ansible_type should help to solve use cases like yours. For more details see the plugin examples or the integration test. There are also my public notes.

@MPStudyly
Copy link
Author

MPStudyly commented Jul 15, 2024

How should we proceed with this? Should we close the PR and move on at ansible.utils or is this still of relevance for the community.general collection?

Yes. I think this PR should be closed. I think the justification to solve this use case by this filter is weak. IMO, if the data are reported by a device you might want to parse it first. There is a new universe in Parsing semi-structured text with Ansible. If this is configuration data you might find a better structure that fits your use case. A custom filter might be a solution too. Generally, the below structure is wrong

    - ld:
      - {key: 1, other: other1}
      - {key: 2, other: other2}
    - ld:
      - {key: 3, other: other3}
      - {key: 4, other: other4}

The structure is actually required by the collection we are using to configure routers. Though, we came to a similar conclusion that treating this structure as source of truth is not optimal. Some restructuring already cleared most of the issues we had in this regard.

Principally I don't see why such structures should be "wrong" per se, as nested lists of dicts are way easier to traverse in a generic way than nested dicts. I know there is dict2items, but that often unnecessarily bloats templates, at least in our use cases.

EDIT: I just noticed your issue/PR which referenced this. As far as the examples go, it looks like it is doing the exact same thing, but I don't really get the naming there.

Yes. The filter community.general.reveal_ansible_type and the test community.general.ansible_type should help to solve use cases like yours. For more details see the plugin examples or the integration test. There are also my public notes.

I'm not 100% convinced yet that it would really help us/our use case (if we didn't work around it yet), as it is still required to generate a list of dicts from a list of lists using a statically defined key. I appreciate your work though!

@MPStudyly MPStudyly closed this Jul 15, 2024
@vbotka
Copy link
Contributor

vbotka commented Jul 15, 2024

    - ld:
      - {key: 1, other: other1}
      - {key: 2, other: other2}
    - ld:
      - {key: 3, other: other3}
      - {key: 4, other: other4}

... I don't see why such structures should be "wrong" per se, as nested lists of dicts are way easier to traverse
in a generic way than nested dicts. I know there is dict2items, but that often unnecessarily bloats templates

The dictionary's lower complexity O(1) should be chosen over list O(n) when possible. See TimeComplexity. The benefit of lower complexity exceeds the cost of Ansible filter dict2items, or Python method .items(). The tools (filters, tests, methods, ...) are designed to cover rational choices.

@MPStudyly
Copy link
Author

    - ld:
      - {key: 1, other: other1}
      - {key: 2, other: other2}
    - ld:
      - {key: 3, other: other3}
      - {key: 4, other: other4}

... I don't see why such structures should be "wrong" per se, as nested lists of dicts are way easier to traverse
in a generic way than nested dicts. I know there is dict2items, but that often unnecessarily bloats templates

The dictionary's lower complexity O(1) should be chosen over list O(n) when possible. See TimeComplexity. The benefit of lower complexity exceeds the cost of Ansible filter dict2items, or Python method .items(). The tools (filters, tests, methods, ...) are designed to cover rational choices.

Thank you for explaining this, I was completely ignorant about these types of complexity in this context 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci_verified Push fixes to PR branch to re-run CI feature This issue/PR relates to a feature request filter filter plugin has_issue new_contributor Help guide this first time contributor plugins plugin (any type) stale_ci CI is older than 7 days, rerun before merging WIP Work in progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants