Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugs in is_registry_path #456

Closed
nsheff opened this issue Feb 15, 2024 · 11 comments
Closed

Bugs in is_registry_path #456

nsheff opened this issue Feb 15, 2024 · 11 comments

Comments

@nsheff
Copy link
Contributor

nsheff commented Feb 15, 2024

this is poorly done:

looper/looper/utils.py

Lines 578 to 595 in 1468956

def is_registry_path(input_string: str) -> bool:
"""
Check if input is a registry path to pephub
:param str input_string: path to the PEP (or registry path)
:return bool: True if input is a registry path
"""
try:
if input_string.endswith(".yaml"):
return False
except AttributeError:
raise RegistryPathException(
msg=f"Malformed registry path. Unable to parse {input_string} as a registry path."
)
try:
registry_path = RegistryPath(**parse_registry_path(input_string))
except (ValidationError, TypeError):
return False
return True

I'm trying to pass it a path to a csv file which is a valid PEP, config/demo_fasta.csv, and it's interpreting it as a registry path!

This should prioritize if it's a file, and only if the file doesn't exist, then it should check to see if it could be a registry path.

This bug makes looper basically unusable unless you're using PEPhub.

@nsheff nsheff added the bug label Feb 15, 2024
@nsheff
Copy link
Contributor Author

nsheff commented Feb 15, 2024

I've added a minimal reproducible example in hello_looper -- here: https://github.com/pepkit/hello_looper/

the csv example shows the behavior. This should actually be an example in hello looper to show a legitmate use case.

Maybe having that example would have prevented this bug. It should be integrated into some smoketests.

@nsheff
Copy link
Contributor Author

nsheff commented Feb 15, 2024

the docs make it seem like you're supposed to say pephub:: to indicate a registry path, but this does not appear to be what is happening.

image

@nsheff nsheff added this to the v1.8.0 milestone Feb 16, 2024
@nsheff
Copy link
Contributor Author

nsheff commented Feb 16, 2024

Related to #426

@donaldcampbelljr
Copy link
Contributor

I attempted a simple fix in the above draft PR, where the function is_registry_path would also return False if the path ended with .csv. I confirmed that this will go on to load a Project via:

else:
try:
p = Project(
cfg=subcommand_args.config_file,
amendments=subcommand_args.amend,
divcfg_path=divcfg,
runp=subcommand_name == "runp",
**{
attr: getattr(subcommand_args, attr)
for attr in CLI_PROJ_ATTRS
if attr in subcommand_args
},
)
except yaml.parser.ParserError as e:
_LOGGER.error(f"Project config parse failed -- {e}")
sys.exit(1)

However, Looper goes on to later call prj.name and it fails with the exception:
Project name inference isn't supported on a project that lacks a config file.

@nsheff
Copy link
Contributor Author

nsheff commented Mar 14, 2024

I attempted a simple fix in the above draft PR, where the function is_registry_path would also return False if the path ended with .csv. I confirmed that this will go on to load a Project via:

No! don't hack it more...

Can we not rely on the registry path functions we already wrote and have been using for years in ubiquerg?

@nsheff
Copy link
Contributor Author

nsheff commented Mar 14, 2024

However, Looper goes on to later call prj.name and it fails with the exception: Project name inference isn't supported on a project that lacks a config file.

This would be a separate bug, then, because those are valid projects.

@donaldcampbelljr
Copy link
Contributor

Ok, I investigated parse_registry_path mentioned in #426 from Ubiquerg. However, that function parses an input string ending with .yaml just fine (which is not a registry path, right?). In Looper, this parsed input string is handed back to RegistryPath from PEPHubClient which validates this parsed string without issue.
image

Should we be checking the parsed subitem for common file extensions during validation? Should we check during parse_registry_path and raise an exception?

I also investigated is_registry_path from PEPhubclient but then realized its very similar to Looper's function and just checks for the file extension before calling: registry_path = RegistryPath(**parse_registry_path(input_string))

@donaldcampbelljr
Copy link
Contributor

I meant to add this question: Is it desired that we should just check the provided config_file (basically move the check for .yaml or .csv out of that function) and if its not a file, then proceed to check if its a registry path. Essentially, reverse the check during project initialization:

# Initialize project
if is_registry_path(subcommand_args.config_file):
if vars(subcommand_args)[SAMPLE_PL_ARG]:
p = Project(
amendments=subcommand_args.amend,
divcfg_path=divcfg,
runp=subcommand_name == "runp",
project_dict=PEPHubClient()._load_raw_pep(
registry_path=subcommand_args.config_file
),
**{
attr: getattr(subcommand_args, attr)
for attr in CLI_PROJ_ATTRS
if attr in subcommand_args
},
)
else:
raise MisconfigurationException(
f"`sample_pipeline_interface` is missing. Provide it in the parameters."
)
else:
try:
p = Project(
cfg=subcommand_args.config_file,
amendments=subcommand_args.amend,
divcfg_path=divcfg,
runp=subcommand_name == "runp",
**{
attr: getattr(subcommand_args, attr)
for attr in CLI_PROJ_ATTRS
if attr in subcommand_args
},
)
except yaml.parser.ParserError as e:
_LOGGER.error(f"Project config parse failed -- {e}")
sys.exit(1)

@donaldcampbelljr
Copy link
Contributor

After discussion:

  • We should check first if a file was provided, then check if its a proper registry path, then send to PEPHUB.
  • Write a function in ubiquerg (or update the existing one) such that the user can pass flags, e.g. require_namespace=True and have it return the parsed path (or a False if it could not parse)

@donaldcampbelljr
Copy link
Contributor

Ah, interestingly, if I give the following string to parse_registry_path, it returns None. Example: '/tmp/tmp3j_05ulh/basic/project/project_config.yaml'

But during initial testing setup I found that this works fine: something/example.yaml

@donaldcampbelljr
Copy link
Contributor

I've merged changes and marked this as completed. However, related issues pepkit/ubiquerg#43 and #484 are still WIP.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Status: Done
Development

No branches or pull requests

3 participants