Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Bug: Ersilia Catalog when no models are available #1072

Closed
GemmaTuron opened this issue Mar 13, 2024 · 28 comments
Closed

🐛 Bug: Ersilia Catalog when no models are available #1072

GemmaTuron opened this issue Mar 13, 2024 · 28 comments
Assignees
Labels
bug Something isn't working

Comments

@GemmaTuron
Copy link
Member

Is your feature request related to a problem? Please describe.

When you try to run ersilia catalog --local but no models are available locally, instead of telling you this it outputs an error:

Traceback (most recent call last):
  File "/home/gturon/anaconda3/envs/ersilia/bin/ersilia", line 33, in <module>
    sys.exit(load_entry_point('ersilia', 'console_scripts', 'ersilia')())
  File "/home/gturon/anaconda3/envs/ersilia/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/gturon/anaconda3/envs/ersilia/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/gturon/anaconda3/envs/ersilia/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/gturon/anaconda3/envs/ersilia/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/gturon/anaconda3/envs/ersilia/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/gturon/github/ersilia-os/ersilia/ersilia/cli/commands/__init__.py", line 22, in wrapper
    return func(*args, **kwargs)
  File "/home/gturon/github/ersilia-os/ersilia/ersilia/cli/commands/catalog.py", line 47, in catalog
    catalog = mc.local().as_json()
AttributeError: 'NoneType' object has no attribute 'as_json'

Describe the solution you'd like.

We could add a better info - Not prioritary at all though

Describe alternatives you've considered

No response

Additional context.

No response

@GemmaTuron GemmaTuron added the enhancement New feature or request label Mar 13, 2024
@Adhivp
Copy link

Adhivp commented Mar 13, 2024

can you assign me this issue @DhanshreeA ?, I would be happy to learn and work on it

@GemmaTuron
Copy link
Member Author

Hello @Adhivp

If you are working on the Outreachy contribution period please follow the guidelines and the tasks set out on the specific issue, we are a small community and cannot provide support outside those.
Thanks for your understanding

@Adhivp
Copy link

Adhivp commented Mar 13, 2024

It's ok , hope to see support in the future, Have a nice day

@DhanshreeA DhanshreeA added bug Something isn't working and removed enhancement New feature or request labels Mar 14, 2024
@DhanshreeA DhanshreeA changed the title 📑 Feature Request: Ersilia Catalog when no models are available 🐛 Bug Report: Ersilia Catalog when no models are available May 28, 2024
@DhanshreeA DhanshreeA changed the title 🐛 Bug Report: Ersilia Catalog when no models are available 🐛 Bug: Ersilia Catalog when no models are available May 28, 2024
@DhanshreeA DhanshreeA self-assigned this May 28, 2024
@GemmaTuron
Copy link
Member Author

I am pretty sure there was an issue for that but I cannot find it now. The catalog used to provide a nice table with the Eos Identifier, Slug and model name, whereas now I only see something like:

[
    {
        "Identifier": "eos9taz"
    },
    {
        "Identifier": "eos8bhe"
    }
]

Can we go back to having the nice table?
The command I am running is
ersilia catalog --local

@GemmaTuron
Copy link
Member Author

GemmaTuron commented Jun 17, 2024

Also:

catalog --local --more

gives:

[
    {
        "Identifier": "eos6tg8",
        "Slug": "natural-product-fingerprint",
        "Title": "Natural product fingerprint"
    },
    {
        "Identifier": "eos43at",
        "Slug": "molgrad-herg",
        "Title": "Coloring molecules for hERG blockade"
    },
    {
        "Identifier": "eos9ei3",
        "Slug": "sa-score",
        "Title": "Synthetic accessibility score"
    },
    {
        "Identifier": "eos4zfy",
        "Slug": "maip-malaria",
        "Title": "MAIP: antimalarial activity prediction"
    }
]

We could have something more informative, like whether the model is from docker or not

@DhanshreeA
Copy link
Member

DhanshreeA commented Jun 24, 2024

Hey @dzumii and @Malikbadmus this issue is also an interesting one with potential for increased exposure to the Ersilia CLI. Let me know if either of you wants to take this up after your tracking tasks and before we jump into writing unit tests. :)

We have two tasks here:

I think you can both work on one of these tasks. LMK what you think!

@DhanshreeA DhanshreeA removed their assignment Jun 24, 2024
@dzumii
Copy link
Contributor

dzumii commented Jun 24, 2024

Yeah, interesting.
I'll get started with the first task once we conclude the tracking functionality changes

@Malikbadmus
Copy link
Contributor

I'll start working on the second task.

@Malikbadmus
Copy link
Contributor

I am pretty sure there was an issue for that but I cannot find it now. The catalog used to provide a nice table with the Eos Identifier, Slug and model name, whereas now I only see something like:

[
    {
        "Identifier": "eos9taz"
    },
    {
        "Identifier": "eos8bhe"
    }
]

Can we go back to having the nice table? The command I am running is ersilia catalog --local

@DhanshreeA , the table format previously used to display model information when you run the catalog command was removed in the last commit, which can be found here, should we restore the table format or improve the current format by adding the additional information we are working on to the current dictionary output?

@DhanshreeA
Copy link
Member

DhanshreeA commented Jun 26, 2024

I am pretty sure there was an issue for that but I cannot find it now. The catalog used to provide a nice table with the Eos Identifier, Slug and model name, whereas now I only see something like:

[
    {
        "Identifier": "eos9taz"
    },
    {
        "Identifier": "eos8bhe"
    }
]

Can we go back to having the nice table? The command I am running is ersilia catalog --local

@DhanshreeA , the table format previously used to display model information when you run the catalog command was removed in the last commit, which can be found here, should we restore the table format or improve the current format by adding the additional information we are working on to the current dictionary output?

Thanks @Malikbadmus this is useful.

@miquelduranfrigola requesting you to weigh in here - is there a particular reason why the use of tabulate library was removed from printing local model catalog? I see that a lot of work was done in a related issue a while back but it was removed in the commit referenced by Malik here as part of cleaning up catalog.

@miquelduranfrigola
Copy link
Member

Thanks all.

A few comments:

  1. We definitely want the JSON format to be available. Having a tabular option is obviously desirable.
  2. We did not want to at yet another library as a dependency, this is why we removed tabulate. Ideally, we should be doing this print with pure python or with default python libraries.
  3. The reason why we do not give too much information is to make this command fast (asking for --more is slower). We can certainly re-consider this, especially now that @carcablop has worked on a nice CSV file available in S3.

@DhanshreeA
Copy link
Member

DhanshreeA commented Jul 1, 2024

@miquelduranfrigola

  1. We can leave it up to the user to decide if they would like the output in a tabular format for which they can install the additional dependency. Rewriting this functionality in pure python would be unnecessarily reinventing the wheel.
  2. I agree, this will be very helpful in not compromising on performance. @Malikbadmus could you profile how long ersilia catalog --local takes, and ersilia catalog --local --more takes with something like 5, 10, 15 models on the system?

@GemmaTuron
Copy link
Member Author

Hi, chipping in.

I do not think that asking the user to manually decide if they want the tabular format is a good solution for this particular function, does not make much sense for a small info to leave the decision up to the user. I am not asking for much information, but to print at least:

  1. Model identifier
  2. Slug
  3. Source (github, s3 or dockerhub)

@Malikbadmus
Copy link
Contributor

Malikbadmus commented Jul 1, 2024

@miquelduranfrigola

  1. We can leave it up to the user to decide if they would like the output in a tabular format for which they can install the additional dependency. Rewriting this functionality in pure python would be unnecessarily reinventing the wheel.
  2. I agree, this will be very helpful in not compromising on performance. @Malikbadmus could you profile how long ersilia catalog --local takes, and ersilia catalog --local --more takes with something like 5, 10, 15 models on the system?

ersilia catalog --local consistently maintains a fluctuating response time between 0.6s and 0.9s, regardless of the number of models added to the system. In contrast, ersilia catalog --local --more takes increasingly longer to display results as more models are added, it becomes 9 times slower compared to ersilia --local with 5 models, 18 times slower with 10 models, and 27 times slower with 15 models. This slowdown persists regardless of the model service class.

@miquelduranfrigola
Copy link
Member

miquelduranfrigola commented Jul 4, 2024

This is very very useful, @Malikbadmus

There is a way to solve the issue, but it will be time consuming (and fun).

Basically, when we do --more we are fetching information from the web (previously, AirTable, now possibly via S3). This is slow of course. However, if you think about it, we already have the information locally, stored in the eos folder as dest/$MODEL_ID/information.json. Threfore, potentially, we can get all the information available in --more from these files, for each model.

Alternatively, we could think of having a dynamically updated table locally, with all the information we need already there, stored as eos/models.csv or eos/models.json. That'd be super fast of course but will need to be synchronized with the fetch and delete commands in Ersilia.

@DhanshreeA, thoughts?

@Malikbadmus
Copy link
Contributor

Yeah, I think this will be an interesting and fun task @miquelduranfrigola .

Can I work on this @DhanshreeA ?

@DhanshreeA
Copy link
Member

Awesome @Malikbadmus, which approach are you thinking of taking - using information.json for the model or building a local metadata store (ie eos/models.csv)? I am more in favor of utilizing information.json and simply caching the result to that function call in an lru_cache. Alternatively, there is an on-disk SQL table we utilise which we could potentially populate with this data and use that, instead of creating another data store. Let me know whichever approach you decide to use. :)

@Malikbadmus
Copy link
Contributor

Malikbadmus commented Jul 9, 2024

Awesome @Malikbadmus, which approach are you thinking of taking - using information.json for the model or building a local metadata store (ie eos/models.csv)? I am more in favor of utilizing information.json and simply caching the result to that function call in an lru_cache. Alternatively, there is an on-disk SQL table we utilise which we could potentially populate with this data and use that, instead of creating another data store. Let me know whichever approach you decide to use. :)

@DhanshreeA, while I would have loved to explore and work with the SQL-based solution, I agree with you that using information.json with with lru_cache is the best choice to implement here.

@DhanshreeA
Copy link
Member

Presently, the catalog command has been modified by @Malikbadmus and @dzumii in the following ways:

  1. When no models are available locally, ersilia catalog --local exits gracefully and does not crash
  2. When run with the more flag, as ersilia catalog --local --more, the command reports the fetching method for models available locally in addition to the information previously reported, and it does so by reading from a local information.json file as @miquelduranfrigola pointed out, and is therefore quite fast. Related PR: Modifying the local Catalog functionality #1195

Closing this issue as completed.

@GemmaTuron
Copy link
Member Author

GemmaTuron commented Jul 15, 2024

Hi @DhanshreeA and @Malikbadmus

I do not see the functionality as described, so I am reopening the issue. I just pulled Ersilia's last code, and upon running ersilia catalog --local --more I get the attached catalog.txt - see that all the models are specified "conda" except one that is "null". They are in my DockerHub, also the one that shows "null", and I thought we agreed that the information of where are they fetched from was important so show.
catalog.txt

@DhanshreeA
Copy link
Member

@Malikbadmus I think I understand what's happening here - every model will consistently show as being run using a conda-environment because we copy the information.json from inside the model's docker container (where the model is running as part of a conda environment) to the host file system - which is not a correct representation of how the model is actually running on the host machine, or even the model's size. Let me rethink this a little bit, please feel free to explore other solutions.

@Malikbadmus
Copy link
Contributor

Malikbadmus commented Jul 15, 2024

@Malikbadmus I think I understand what's happening here - every model will consistently show as being run using a conda-environment because we copy the information.json from inside the model's docker container (where the model is running as part of a conda environment) to the host file system - which is not a correct representation of how the model is actually running on the host machine, or even the model's size. Let me rethink this a little bit, please feel free to explore other solutions.

Oh, I get it, should I return the model service class value to being fetched from service_class.txt instead of from information.json for the meantime? While we work towards getting the correct informations to the information.json file.

@DhanshreeA
Copy link
Member

DhanshreeA commented Jul 15, 2024

@Malikbadmus I suppose that could work, yes. Let's do this - fetch a couple of models, one for each of: from_github, repo_path, from_dockerhub, from_s3, and see what gets stored in service_class.txt? If the only thing that it really captures if whether the model environment is pulled_docker/ system/conda, then we might have to come up with an entirely different approach. Perhaps it might even be helpful to just create another field in the information.json, or some other file, to reflect this.

@Malikbadmus
Copy link
Contributor

@Malikbadmus I suppose that could work, yes. Let's do this - fetch a couple of models, one for each of: from_github, repo_path, from_dockerhub, from_s3, and see what gets stored in service_class.txt? If the only thing that it really captures if whether the model environment is pulled_docker/ system/conda, then we might have to come up with an entirely different approach. Perhaps it might even be helpful to just create another field in the information.json, or some other file, to reflect this.

Utilizing the service_class.txt file to get the model service class value gave me this.

  • from_github: conda
  • from_dockerhub: pulled_docker
  • from_s3: coda
  • repo_path:conda

see the attached file for more information
catalog.txt

So it looks like we will be utilizing a new approach.

@DhanshreeA
Copy link
Member

That makes sense, thanks for the testing @Malikbadmus! Let's get some clarity from others first. :)

@miquelduranfrigola
Copy link
Member

Hi, this is interesting. Thanks @Malikbadmus. I am tempted to suggest that we use the information.json file as @DhanshreeA has mentioned.
The approach of keeping in eos a local_catalog.csv or local_catalog.json file would also make sense. This file could be updated every time we fetch or delete a model.

@GemmaTuron
Copy link
Member Author

Hi @DhanshreeA

There is a merged PR related to this. Can you summarise how does the catalog command work now and then close the issue if it is completed?

@DhanshreeA
Copy link
Member

Hi @GemmaTuron

Presently the output from ersilia catalog --local --more looks like so when models are present in your system (ie through a populated eos directory):

[
    {
        "Identifier": "eos6ru3",
        "Slug": "whales-qmug",
        "Title": "WHALES similarity search on 600k molecules from Q-Mug",
        "Status": "In progress",
        "Input": "Compound",
        "Output": "Compound",
        "Model Source": "DockerHub",
        "Service Class": "pulled_docker"
    },
    {
        "Identifier": "eos2b6f",
        "Slug": "pkasolver",
        "Title": "Microstate pKa values",
        "Status": "Ready",
        "Input": "Compound",
        "Output": "Experimental value",
        "Model Source": "DockerHub",
        "Service Class": "pulled_docker"
    },
    {
        "Identifier": "eos44zp",
        "Slug": "ncats-cyp450",
        "Title": "CYP450 metabolism",
        "Status": "Ready",
        "Input": "Compound",
        "Output": "Probability",
        "Model Source": "DockerHub",
        "Service Class": "pulled_docker"
    }
]

And in the case there are no models in your system, or the eos directory has been removed, this is what the command returns:

No local model is available. Please fetch a model by running 'ersilia fetch' command

We have addressed a few things in this issue:

  1. Ersilia signalling that no models are available when it does not find any on the system, instead of running into an error.
  2. Now the catalog command shows model source, ie, where the model was fetched.

It is safe to close this issue now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Archived in project
Development

No branches or pull requests

6 participants