Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix build and deployment #6

Closed
luciansmith opened this issue Jun 23, 2023 · 33 comments
Closed

Fix build and deployment #6

luciansmith opened this issue Jun 23, 2023 · 33 comments

Comments

@luciansmith
Copy link
Contributor

The building and deployment of biosimultors_PySCeS is currently broken. Nothing works, and it's actually kind of hard to see how it used to work.

I have created the 'use-conda' branch in an attempt to get things working that way (one of PySCeS's dependencies is available on conda, but not pip)--everything now installs, but running the tests results in a segmentation fault. The next step would be to reproduce that environment on a local machine to see what caused the crash, as the github action can't be interacted with, and it's impossible to see what step or piece caused the crash. Or we could try to make the old pip way work again, somehow?

@jmrohwer
Copy link
Contributor

jmrohwer commented Jun 30, 2023

I have never actually worked in detail on the build of this, but have only provided assistance in getting the GH actions working from the PySCeS repo. So I'm not sure how the build actually works or not but have been getting notifications that it fails, so am interested and prepared to help to try to get this working!

The pip way would entail building from source since the latest version of assimulo on PyPi is only provided as a source tarball (this is v3.0). This is very finicky to get working, but then again we might as well use the latest version from github.

I have managed to get this to compile on Linux using the following commands (in the unpacked assimulo source directory):

sed -i "s|BLASname_t+'.a'|BLASname_t+'.so'|g" setup.py
python setup.py install --optimize=1 --extra-fortran-link-flags="-shared" --sundials-home=/usr --blas-home=/usr/lib --lapack-home=/usr --extra-fortran-compile-flags="-fallow-argument-mismatch" --extra-c-flags="-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION"

This is on Arch linux (manjaro). The location of the sundials, blas and lapack libraries might have to be updated for Ubuntu. The compiled and thus installed version works to use with PySCeS, equally well as the Anaconda provided one.

It also requires pre-installed sundials, lapack and blas. I am not sure of the GH build system (are you using GH actions?), if it is Ubuntu based there should be packages available that could be installed via apt during the build process, and of course gcc and gfortran are also needed. I would be happy to assist further but as mentioned the GH action (all 421 lines of it!) is not very clear to me.

@jmrohwer
Copy link
Contributor

Also, in this build are you using the PySCeS source or binary wheels? The PySCeS build in itself is also not trivial to get working since we have to compile two fortran modules and currently do this using scikit-build. There is a GH action that automatically does this after each new release (in the PySCeS repo), so you may want to look at that, or alternatively wait 1h after a new release until the wheels have been built (they are manylinux2014 wheels) and directly use these.

@jmrohwer
Copy link
Contributor

jmrohwer commented Aug 7, 2023

I've fixed the Assimulo build on CI, and now PySCeS can get installed, the Docker image gets built successfully as well. However, 2 of the 11 tests fail (9 pass).

@luciansmith could you take a look at the output from the latest CI workflow run? I am not at all familiar with the details of the biosimulators tests, and am assuming they would be the same for the different simulation backends? At least we seem to be getting somewhere here. Happy to test further if you could give me some pointers.

@luciansmith
Copy link
Contributor Author

OK, I finally got to this! There were only two failures, and one of them was because the test wasn't set up correctly (any more), so I fixed that, leaving only a single test that fails.

However, that failure is more mysterious: it failed when trying to run https://github.com/biosimulators/Biosimulators_PySCeS/blob/dev/tests/fixtures/Parmar-BMC-Syst-Biol-2017-iron-distribution.omex in the Docker image.

The command that fails is:

E subprocess.CalledProcessError: Command '['docker', 'run', '--mount', 'type=bind,source=/tmp/tmp4955k8m9,target=/tmp/in,readonly', '--mount', 'type=bind,source=/tmp/tmp1bddu_rm/out,target=/tmp/out', '--env', 'REPORT_FORMATS=h5', '--user', '1001', '--tty', '--rm', 'ghcr.io/biosimulators/biosimulators_pysces/pysces:latest', '-i', '/tmp/in/Parmar-BMC-Syst-Biol-2017-iron-distribution.omex', '-o', '/tmp/out']' returned non-zero exit status 1.

Sadly, that's the only information there; but maybe it's enough if I send it back to you for more? Are you able to create docker images and run that on it?

@jmrohwer
Copy link
Contributor

jmrohwer commented Sep 4, 2023

I fixed this. It was a bug in PySCeS with a variable name being an exact substring of another variable name in the Parmar model, resulting in incorrect string substitution. It works now on the latest commit on https://github.com/PySCeS/pysces/tree/development. I have also built the docker image locally and the tests pass using that image.

@luciansmith is there a way to test this on Github CI without making a new release of PySCeS? Specifically I'd like to test whether there are any other issues downstream in ci.yml. Otherwise it will have to wait for the PySCeS release.

@luciansmith
Copy link
Contributor Author

Oh, that's great news! Thanks for tracking that down.

I assume there's some way to point the github action at something that gets PySCeS from there instead of the official source, but I feel like that's probably too much work, given how hard it's been already to try to get everything up and running? I'm happy to wait for a new release if you are.

@luciansmith
Copy link
Contributor Author

I thought of a simpler way to check downstream elements: just comment out the failing test! This led to my discovery that, just like every single other biosimulators project, 'sphinxprettysearchresults' needed to be removed, since the current release is buggy. But with that, everything else passed: https://github.com/biosimulators/Biosimulators_PySCeS/actions/runs/6090029562/job/16524048586

So, I made the same fix for dev, so everything should be fine whenever pysces gets a new release.

@jmrohwer
Copy link
Contributor

jmrohwer commented Sep 6, 2023

Cool 😃

Might be a week or 2 before I can cut a release, got a meeting coming up next week and quite a bit of travel in September.

@luciansmith
Copy link
Contributor Author

@jmrohwer : Did Pysces get a new release, as foretold?

@jmrohwer
Copy link
Contributor

@luciansmith Still waiting on final code review but basically ready to go.

@jmrohwer
Copy link
Contributor

@luciansmith I finally managed to get a new PySCeS release out, but there was an issue with the CI generating the wheels due to updated GH actions. As a result there were no new release files on PyPI and the CI run failed. I have fixed this, the wheel files are now on PyPI and I triggered the "release-biosimulators" action on the PySCeS repo by hand. Is this now sorted?

@jmrohwer
Copy link
Contributor

I see this time-outs on the following:

Run emilioschepis/wait-for-endpoint@v1.0.0
  with:
    url: https://pypi.org/pypi/pysces/1.2.0/json
    method: GET
    expected-status: 200
    timeout: 3600000
    interval: 60000
Error: Waiting exceeded timeout.

However, I'm not sure how to debug this since it is a perfectly valid URL and resolves if I enter it in a browser.

@luciansmith
Copy link
Contributor Author

I swear github and python updates make maintaining anything an absolute nightmare.

I was able to get past the problem you found by updating the wait-for-endpoint task to 1.0.3. This let things get much further, until a new error:

'lxml.etree.XPath' object has no attribute 'evaluate'

which is the same error I now get for biosimulators_tellurium, so it's another 'python updated in the background and now everything breaks again' problem. With both of us now having that problem, at least we can fix it once and everyone will benefit. I'll look into it this afternoon.

@jmrohwer
Copy link
Contributor

I was able to get past the problem you found by updating the wait-for-endpoint task to 1.0.3. This let things get much further, until a new error:

BTW you can just specify emilioschepis/wait-for-endpoint@v1 and it will default to the latest version. This is useful as certain github actions may update to fix certain bugs. Or you can choose emilioschepis/wait-for-endpoint@v1.0 if you want to stick in the 1.0 series and not update to 1.1 once it comes out.

Heads up: there have been a lot of updates to GH actions, specifically checkout, upload-artifact, download-artifact, setup-python. This has to do with Nodejs 16 being deprecated and the move to enforce Nodejs 20. At the moment the old ones still work and only raise a warning, but I'm not sure for how long. There may be others.

Maybe a good idea to go through all the actions and see if there are updates. I know this can be done with dependabot to always keep the actions up to date, but I've never looked into this.

@luciansmith
Copy link
Contributor Author

I tried @v1, but that failed; there may be some other format (@v1.*?) that works, but for now I just put it at 1.0.3 again.

I updated biosimulators_utils to use the latest lxml and fixed the xpath.execute() error. Now, the tests are complaining that scipy.zeros doesn't exist; presumably, scipy dropped it because it's part of numpy, and people can just use that?

As far as I can tell, this is a genuine issue with PySCeS, not the test framework, but should be straightforward to fix (replace scipy.zeros with numpy.zeros).

@luciansmith
Copy link
Contributor Author

@jmrohwer Any update on this? I think pysces is using scipy.zeros, which seems to no longer exist.

@luciansmith
Copy link
Contributor Author

@jmrohwer (ping!)

@jmrohwer
Copy link
Contributor

@luciansmith I see the latest workflow run still installs PySCeS 1.1.1, whereas the latest release is 1.2.0. However, the workflow was triggered (Feb 17) AFTER 1.2.0 was released (Feb 14). These bugs with scipy.zeros were fixed in 1.2.0. There must be a problem with the version not being detected correctly in the workflow action.

@luciansmith
Copy link
Contributor Author

Aha! OK, thanks for that, and sorry for not noticing it myself. I managed to update everything to 1.2.0, and now everything runs through. I even managed to get it submitted to biosimulators, so hopefully the latest version will be up soon. Thank you so much!

Also, for what it's worth, https://pysces.sourceforge.net/ still claims that 1.1.1 is the latest version.

@jmrohwer
Copy link
Contributor

Excellent! Thanks for your efforts.

As regards the PySCeS website, the new website is hosted on github: https://pysces.github.io/
That one is up to date. The sourceforge one should actually just be a redirect to the GH one, instead of the current duplication, but I don't have permissions to change that. @bgoli could you look into that?

@luciansmith
Copy link
Contributor Author

OK, we're even closer, but one last test case is failing, this time from the biosimulators_test_suite:

https://github.com/biosimulators/Biosimulators_test_suite/tree/dev/examples/sbml-core/Ciliberto-J-Cell-Biol-2003-morphogenesis-checkpoint-continuous

The error message is:

"Could not find GLIMDA."

which seems to come from Assimilo. I don't know why it wants GLIMDA? That seems to be for solving algebraic rules, and there's no algebraic rules in the model, so I'm not sure what's going on.

The full test suite results can be seen at biosimulators/Biosimulators#690 Most of them succeed!

@jmrohwer
Copy link
Contributor

jmrohwer commented May 4, 2024

I've seen this error before on some Assimulo builds, but not all. The error/warning is thrown during the assimulo import, and PySCeS imports assimulo upon startup, which would explain why the error/warning shows up. Is the test suite treating this as a warning or error, i.e. is this the reason for the fail?

GLIMDA is built during the Assimulo build from Fortran sources (distributed with Assimulo source) using F2PY. As you point out, however, it is not needed, as the only algo PySCeS needs from Assimulo is CVODE. In spite of this warning, simulations using PySCeS with CVODE will succeed so I have never bothered about this.

Is this causing the test to error out? In the rest of the error log I just see warnings, no errors, so I can't really see where it fails. Is there a way to test this off line? (PySCeS unfortunately does not have the ability to read SEDML).

@luciansmith
Copy link
Contributor Author

Oh, hmm. If it's just a warning, maybe it's a red herring, and we're failing for some other reason. Is it possible to just run a normal time course simulation on biomodels 297? The SED-ML file simulation is just:

<uniformTimeCourse id="simulation_1" name="simulation 1" initialTime="0" outputStartTime="0" outputEndTime="140" numberOfPoints="200">

@jmrohwer
Copy link
Contributor

jmrohwer commented May 6, 2024

Argh, it is a bug in PySCeS. Has to do with parsing of symbols in assignment rules when events are present. I think I fixed it but will have to cut a new release for this workflow to run.
@luciansmith can you check whether the attached simulation results are as expected?
BIOMD297.xlsx

@luciansmith
Copy link
Contributor Author

Sorry we didn't catch it in this repo's tests, and only in the submission tests! But I'm glad we found it eventually.

The only thing that needs to happen as far as the literal test that failed is that the execution should succeed; it doesn't care if the numbers are right or not.

I don't have exact expected results, but I can give you Tellurium's results and Copasi's results for the same experiment:

te_297.csv

copasi_297.csv

There are some differences? But there are also differences between Tellurium and Copasi themselves; it may just be a matter of degree.

But again, we can push this later and work on numerical consistency after the pipeline runs and we can run the full suite of Biomodels on Pysces and multiple other simulators.

@jmrohwer
Copy link
Contributor

jmrohwer commented May 7, 2024

Okay, I have looked into the numerics. The following shows the percentage difference (absolute value) between the datasets, listing the highest value for each model variable over the simulated time course.

>>> pcdiff(pscdata, cpsdata)
BE       0.059497
Cdc20    0.103844
Cdc20a   1.211851
Cdh1     0.267624
Clb      0.231881
Cln      0.079118
IE       1.087345
Mcm      0.419004
Mih1a    0.404092
PClb     0.417781
PSwe1    0.891808
PSwe1M   1.854822
PTrim    0.232609
SBF      0.500795
Sic      0.060238
Swe1     1.547339
Swe1M    2.262981
Time     0.000461
Trim     0.137942
mass     0.000454

>>>pcdiff(tedata, cpsdata)
BE       0.000532
Cdc20    0.001590
Cdc20a   0.374366
Cdh1     0.006378
Clb      0.235497
Cln      0.000816
IE       0.358881
Mcm      0.419061
Mih1a    0.404147
PClb     0.418080
PSwe1    0.403695
PSwe1M   0.444031
PTrim    0.165286
SBF      0.008386
Sic      0.000629
Swe1     0.002289
Swe1M    0.003319
Time     0.000461
Trim     0.000999
mass     0.000454

>>>pcdiff(pscdata, tedata)
BE       0.058970
Cdc20    0.104432
Cdc20a   1.264842
Cdh1     0.265918
Clb      0.123133
Cln      0.078474
IE       1.173094
Mcm      0.122169
Mih1a    0.123204
PClb     0.225029
PSwe1    0.890975
PSwe1M   1.850942
PTrim    0.232669
SBF      0.492367
Sic      0.059905
Swe1     1.545015
Swe1M    2.259737
Time     0.000000
Trim     0.137756
mass     0.000000

As you can see the Te and CPS data are closer, differing at most by 0.5%. The variation with the PSC data, some variables differ by up to 2.2%. However, this is still so small that when plotting the simulation, the results are indistinguishable between the 3 platforms.

Incidentally I also compared the data to the expected results (HDF5 file in the repo). The caveat is that those results consist of 201 points, so I could not directly compare the numerical value to the Te and CPS data, which have 200 points. (I think it is not clear for simulators if they should include zero in the points count in an SEDML document or not. All 3 our platforms seem to do so, but the results on the repo do not). I redid the analysis with PySCeS with 201 points:

>>>pcdiff(pscdata2, test_data)
Time      0.000000
Cdc20a    1.688207
BE        0.093424
Cdc20     0.163695
Trim      0.236627
Cdh1      0.410526
Clb       0.210876
Cln       0.124502
IE        1.660586
mass      0.000000
Mcm       0.210273
Mih1a     0.212358
Swe1M    16.939740
PTrim     0.400177
PClb      0.386282
PSwe1     2.833552
PSwe1M   10.023572
SBF       0.779963
Sic       0.091964
Swe1      2.953427
Swe1T     0.056478

The discrepancies are much bigger, up to 17%. Part of the discrepancy may stem from the fact that this model has numerous events and "checkpoints", the exact time in the simulation that this is triggered may depend on the internal step size of the solver, for example, and the specific implementation.

The fact that the 3 platforms (Te, CPS, PSC) agree more closely than the reference data lets me think that a) at least I have fixed the bug in PySCeS, and b) I wouldn't worry too much further about numerics at this stage.

@luciansmith
Copy link
Contributor Author

Good job finding the reference data; it had eluded me. And I agree that it's clear that essentially, all three platforms are doing 'the same thing'. I am now curious where the reference data came from (maybe VCell?) but anyway.

I will admit that I ran tellurium and copasi by hand, so set the number of points to match your data. The problem with SED-ML is that earlier versions named an attribute "numberOfPoints", but then defined it as not including zero, which was the wrong thing to do. As of SED-ML L1v4, the attribute is now named "numberOfSteps", which matches the actual definition.

Glad everything else got sorted out, and I'm happy to work with things on my end to get the wrapper pushed to biosimulations again. It'll be nice to finally have a working pipeline again!

@jmrohwer
Copy link
Contributor

jmrohwer commented May 8, 2024

@luciansmith are there any other tests in the submission pipeline that have not yet been run because 297 failed? If so, please provide me the details so I can test them offline before drafting a release, and thus preventing having to draft a further release should any of the other tests fail.

@luciansmith
Copy link
Contributor Author

That's a good question, but no, as biosimulators/Biosimulators#690 shows, all of the tests were run, and only the test with 297 failed (though many were skipped). The tests come from https://github.com/biosimulators/Biosimulators_test_suite

@luciansmith
Copy link
Contributor Author

Actually, hey: would you like me to mark that test as a 'skip' for now so that we can get a version of PySCeS up and running on biosimulations now, even though it'll fail on models like that? And then work on all the Biomodels failures; that bug and others, so you don't have to go through the release cycle an extra time?

@jmrohwer
Copy link
Contributor

jmrohwer commented May 9, 2024

I think I'll actually to a release. Should be done today or tomorrow. This is quite a serious bug and i don't want it hanging around unnecessarily long while it's been fixed. All our wheels and Anaconda packages are generated automatically by CI, so it's not too much of a deal.

@jmrohwer
Copy link
Contributor

New version 1.2.1 of PySCeS released and I see that the CI workflow ran successfully (finally!) 😃
@luciansmith Can you test the deployment to biosimulators (or is that automatic)?

@luciansmith
Copy link
Contributor Author

The CI is indeed supposed to test everything, but it's always worth checking by hand. Which I did, and it worked! Congratulations!

https://run.biosimulations.org/runs/663eabc116d2b6badabee790

I also tried an updated OMEX file with 297 in it, which successfully ran, but which failed to produce output because of one of the updates: it asks for output from variables with assignment rules (i.e. 'Cdh1in'), and that failed. I'm not sure if that's a problem with the wrapper or PySCeS itself?

https://run.biosimulations.org/runs/663eac7216d2b6badabee7e1#tab=log

The following variable targets cannot not be recorded:
  - /sbml:sbml/sbml:model/sbml:listOfParameters/sbml:parameter[@id='Cdh1in']
  - /sbml:sbml/sbml:model/sbml:listOfParameters/sbml:parameter[@id='IEin']
  - /sbml:sbml/sbml:model/sbml:listOfParameters/sbml:parameter[@id='Mcmin']
  - /sbml:sbml/sbml:model/sbml:listOfParameters/sbml:parameter[@id='Mih1']
  - /sbml:sbml/sbml:model/sbml:listOfParameters/sbml:parameter[@id='SBFin']
  - /sbml:sbml/sbml:model/sbml:listOfParameters/sbml:parameter[@id='Swe1T']
  - /sbml:sbml/sbml:model/sbml:listOfParameters/sbml:parameter[@id='kmih']
  - /sbml:sbml/sbml:model/sbml:listOfParameters/sbml:parameter[@id='kswe']

Targets must have one of the following SBML ids:
  - BE
  - BUD
  - Cdc20
  - Cdc20a
  - Cdh1
  - Cdh1tot
[...]

However, this problem pales in comparison to the fact that I filed this issue almost a year ago, and the issue has been solved! The build system now runs fine, everything gets pushed to biosimulations, and the results can be compared to other simulators again! Thank you so much for all the work you put into making this possible; I was very lost at the beginning. I'm going to close this, re-run all the biomodels from my revamped temp-biomodels repository, and update #7 with the new list. Thank you again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants