Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Add ability to set things to be visible if loaded in dbgap #19

Closed
wants to merge 5 commits into from

Conversation

chris-s-friedman
Copy link
Collaborator

@chris-s-friedman chris-s-friedman commented Mar 9, 2022

see #18

I added a flag to dbgapconsent so that every sample that is loaded in dbgap for a particular study (along with descendants and associated participants) be set to be visible in the dataservice.

@fiendish
Copy link
Contributor

fiendish commented Mar 9, 2022

You should update the visibility rules declared in the readme.

@fiendish fiendish closed this Mar 9, 2022
@fiendish fiendish reopened this Mar 9, 2022
@fiendish fiendish changed the title ✨ add ability to set things to be visible if loaded in dbgap ✨ Add ability to set things to be visible if loaded in dbgap Mar 9, 2022
@fiendish
Copy link
Contributor

fiendish commented Mar 9, 2022

Also, I know it's annoying, but if you want emojis to work right in your release notes, you need to either use the actual glyph in the PR title and not the :sparkles: shortcode or make changes to https://github.com/d3b-center/d3b-release-maker

kf_update_dbgap_consent/app/cli.py Outdated Show resolved Hide resolved
kf_update_dbgap_consent/app/cli.py Outdated Show resolved Hide resolved
self.db_url or self.api_url,
"biospecimens",
list(unhidden_specimens.keys()),
ignore_gfs_with_hidden_external_contribs=False,
Copy link
Contributor

@fiendish fiendish Mar 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ignore_gfs_with_hidden_external_contribs should be True here. You don't want to unhide a multi-specimen GF that has other specimens in it that are still hidden. You want to follow the internal behavior of unhide_descendants_by_kfids https://github.com/kids-first/kf-utils-python/blob/9ff5997566204a7375e8c68176d1e7e982f8b5c1/kf_utils/dataservice/descendants.py#L506 and note how it differs from its sibling hide function.

Copy link
Collaborator Author

@chris-s-friedman chris-s-friedman Mar 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

That said, i don't think ignore_gfs_with_hidden_external_contribs should be True. I think the correct action is to get gfs_with_hidden_external_contribs and then remove them from my list of files to hide if one of the external contribs is not loaded in dbgap.

Otherwise, i could have a situation where an external_contribs is currently hidden but should be made visible because it's loaded in dbgap and just not visible at the time of descendant discovery

chris-s-friedman and others added 2 commits March 10, 2022 09:13
Co-authored-by: Avi Kelman <patcherton.fixesthings@gmail.com>
@chris-s-friedman chris-s-friedman self-assigned this Mar 10, 2022
@chris-s-friedman chris-s-friedman added the enhancement New feature or request label Mar 10, 2022
how to handle gfs with hidden_external_contribs that will end up being not hidden? maybe a way to do this is to tell the user "run the tool twice"? i don't like that. Maybe get all the descendants of a study and then work through them to see if they should or should not be hidden in the end....
@chris-s-friedman
Copy link
Collaborator Author

I've been thinking about this some last week and over the weekend. I no longer think it's a good idea to bundle setting visibility in with dbgapconsent.

The foremost reason why I'm closing this PR is that dbgapconsent can't know when things - files, specimens, participants, or families 1 - should remain hidden even if the associated specimen is loaded into dbgap. The way we've been setting visibility before has been by running a one-off script written by the data analyst to set visibility appropriately and then run the acl tool. If we were to continue moving forward with this idea of setting visibility in the acl tool, in many instances, a data analyst would still need to go back and set visibility for files, specimens, participants, or families that should not be on the portal.

Perhaps there should be some functionality in kf-utils to generate a list of entities taking as input the dbgap xml file that a data analyst can then use to set visibility based on whatever conditions are appropriate.

Because of the above issue, i'm going to close this PR.

Just to document the paths that I was going down before I came to the above realization about needing to o some manual-ish work with visibility:

I'm really glad that @fiendish brought up genomic files with hidden contributing samples. Where I got stuck with handling these cases was how those files would be gathered. I explored two paths:

  1. gather descendants of specimens loaded into dbgap with ignore_gfs_with_hidden_external_contribs=False. What would then have to happen is iterating through each genomic_file to check if all its associated specimens are loaded in dbgap. If not, then the genomic file and all of its descendants should be left to whatever visibility they have 2.
  2. gather descendants of specimens loaded into dbgap with ignore_gfs_with_hidden_external_contribs=True... twice: Once where it happens now and then again after specimens that should be visible are set to visible 3.

The downside with option one is that it is.... complicated and intensive because for each genomic file, there would need to be a query to see what biospecimens that genomic_file is descendant from. I'm worried that making all those queries would slow down the tool too much.

The most prominent downside to option 2 is that the dry_run wouldn't give a full picture of what actually would be patched. because some genomic files wouldn't be able to be gathered until patching is done.

Footnotes

  1. e.g, file is corrupted or was regenerated by the sequencing center or the bix team, specimen failed bix QC, participant doesn't have sufficient/ valid clinical data, or there was a peddy issue related to the family in question.

  2. When I first was putting this together, i didn't think of setting things not in dbgap to hidden... perhaps that is the way this should have gone: don't just set things that should be visible to visible; set things that should be hidden to hidden too.

  3. I'm leaving when that second set of gathering, patch construction, and patch sending happens vague. Would it happen at the tail end of sample_status.py? Maybe a second call to get_patches_for_study() in the cli? maybe just explicitly tell users in multiple places "run dbgapconsent twice to get all possible things."

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants