artifactory is broken #9

Open
ViliusSutkus89 opened this issue Jul 26, 2024 · 63 comments

@ViliusSutkus89
Contributor

ERROR: HTTPSConnectionPool(host='artifactory.opendocument.app', port=443): Max retries exceeded with url: /artifactory/api/conan/conan/v1/ping (Caused by ResponseError('too many 503 error responses')). [Remote: odr]

Could this be because of too much traffic?

@ViliusSutkus89
Contributor Author

I've checked https://artifactory.opendocument.app/ and it keeps showing that animation.

image

Hello, btw

@andiwand
Member

Rebooting. @TomTasche it appears we are closing in on the 50 GB storage limit

@ViliusSutkus89
Contributor Author

Thanks for the restart. There's a new NDK version, I think that means a lot of new binaries :D

@andiwand
Member

Looks like it is still loading? I think something more serious broke down. I see a couple of errors in the log

@ViliusSutkus89
Contributor Author

Yep, still loading. Do you mean artifactory log or our build log?

@andiwand
Member

The artifactory log. Looks like there was a breaking change on their side. I'll try to pin the previous version

@andiwand
Member

Should be back up now. I had to create a fresh instance. Also retriggered a build now https://github.com/opendocument-app/conan-odr-index/actions/runs/10122612090

@ViliusSutkus89
Contributor Author

Cool, I can get back to flooding the server again :D

@ViliusSutkus89
Contributor Author

@andiwand, did I break it again?

ERROR: HTTPSConnectionPool(host='artifactory.opendocument.app', port=443): Max retries exceeded with url: /artifactory/api/conan/conan/v1/ping (Caused by ResponseError('too many 502 error responses')). [Remote: odr]

@ViliusSutkus89 ViliusSutkus89 reopened this Aug 2, 2024
@andiwand
Member

andiwand commented Aug 3, 2024

[ERROR] /opt/jfrog/artifactory/var is running with 2% free storage. Free up space or increase volume size and try again. Exiting

@TomTasche can we get more storage on the machine?

@andiwand
Member

andiwand commented Aug 3, 2024

@ViliusSutkus89 alternatively we have to switch from RelWithDebInfo to Release for the chunky packages. This is definitely using a lot of space

@andiwand
Member

andiwand commented Aug 3, 2024

@ViliusSutkus89 I was able to clean out some stuff. I fear that is not a permanent solution tho

@ViliusSutkus89
Contributor Author

I just did some clean local builds and ~/.conan2/p size difference between Release and RelWithDebInfo is like 10%. I've checked with only one arch though, maybe some particular arch is more taxing on space usage

@ViliusSutkus89
Contributor Author

Just built pdf2htmlex for all 4 arches with both build types.
RelWithDebInfo: du -hs ~/.conan2/p is 8.2G
Release: du -hs ~/.conan2/p is 6.8G

I don't think this is the biggest issue
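
For reference, the relative difference between those two totals can be worked out as below. This is only a quick sketch using the two du figures above; the variable names are made up for illustration.

```python
# Sizes reported by `du -hs ~/.conan2/p` above, in GB.
rel_with_deb_info_gb = 8.2
release_gb = 6.8

saved_gb = rel_with_deb_info_gb - release_gb

# Share of the RelWithDebInfo cache that switching to Release would free.
saved_fraction = saved_gb / rel_with_deb_info_gb

# How much bigger RelWithDebInfo is, relative to Release.
overhead_fraction = saved_gb / release_gb

print(f"saved: {saved_gb:.1f} GB ({saved_fraction:.0%} of RelWithDebInfo)")
print(f"overhead: {overhead_fraction:.0%} over Release")
```

So for this four-arch build the saving is a bit under a fifth of the cache, which is noticeable but not enough on its own to explain a full disk.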

@andiwand
Member

andiwand commented Aug 3, 2024

Ok then we don't bother with this. Thanks for checking! I was able to clean out 30 GB so we should be fine for a couple of weeks

@TomTasche
Member

Upgraded from 50 to 100GB. Not sure what exactly is going to happen now 😂 I think they are going to do a manual upgrade next week?

@andiwand
Member

andiwand commented Aug 3, 2024

I think that usually takes a reboot for the root volume. let's see

@andiwand
Member

andiwand commented Aug 3, 2024

Filesystem         Size  Used Avail Use% Mounted on
/dev/ploop62525p1   99G   19G   76G  20% /

well that was quick

@TomTasche
Member

Nice! We can get 250GB next if necessary, but it costs 3€ more per month. That seems excessive to me...

@ViliusSutkus89
Contributor Author

100GB should be enough, for now, until we switch to ndk2. We could also look into running a cron job to garbage collect oldest builds, I think conan has some functionality for that

@ViliusSutkus89
Contributor Author

I broke it again, lol

@andiwand
Member

andiwand commented Aug 7, 2024

wtf... should be back up now

@ViliusSutkus89
Contributor Author

I have 26 conan installs running in parallel and each of them takes 15 minutes. 15 minutes just to download the binaries. It's a read only operation from the artifactory server. Could it be that it somehow overloads the server? I understand that write operations could be expensive, but just downloading packages shouldn't do that much to the server. I'm starting to think that this is because the server is written in java.

@ViliusSutkus89
Contributor Author

I've changed my previously mentioned workflow to download conan binaries only once and artifact them for other jobs in the same pipeline. But this is still kind of wrong

@andiwand
Member

andiwand commented Aug 7, 2024

Maybe it is the network bandwidth which is throttling. Not sure why the thing is dying so often. The log was not really helpful this time
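
One client-side mitigation, sketched here as an idea rather than something already in our workflows: poll the ping endpoint with exponential backoff before starting the heavy downloads, so jobs wait for the server instead of failing on the first 502/503. The URL is the one from the error messages in this thread; the function names, attempt count and delays are assumptions.

```python
import time
import urllib.error
import urllib.request

# Ping endpoint taken from the error messages earlier in this thread.
PING_URL = "https://artifactory.opendocument.app/artifactory/api/conan/conan/v1/ping"


def http_status(url):
    """Return the HTTP status code for a GET on url (0 if unreachable)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 502 / 503 while the server is struggling
    except (urllib.error.URLError, OSError):
        return 0


def wait_until_healthy(url, attempts=5, base_delay=2.0,
                       get_status=http_status, sleep=time.sleep):
    """Poll url with exponential backoff; return True once it answers 2xx."""
    for attempt in range(attempts):
        if 200 <= get_status(url) < 300:
            return True
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 2s, 4s, 8s, ...
    return False
```

A job could call wait_until_healthy(PING_URL) first and bail out early with a clear message if it returns False. This does not fix whatever makes the server fall over; it only makes the failures on our side less noisy.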

@andiwand andiwand closed this as completed Aug 7, 2024
@TomTasche
Member

Shall I increase the CPU resources for the machine?

@ViliusSutkus89
Contributor Author

Ahhh, too bad. Well, swapping is write heavy and it does consume SSD life, I remember hearing something about Macbook SSDs dying, because the low ram models swapped constantly. So it's kind of expected from the hosting provider, also considering that the configurable swap may prevent a hosting plan upgrade

@ViliusSutkus89
Contributor Author

@andiwand I know you said you wiped the package, but I'm still getting the same output. Could it be that it's cached somehow?

$ conan list --remote=odr "odrcore/4.1.1:*"
odr
  odrcore
    odrcore/4.1.1
      revisions
        a82aa16c3e9b0eb3f3e9854651fd418d (2024-08-10 06:33:01 UTC)
          packages
            df4ccd70934170c72e308ded1fbc7fddba76a12d
              info
                settings
                  arch: x86_64
                  build_type: Release
                  compiler: msvc
                  compiler.cppstd: 20
                  compiler.runtime: dynamic
                  compiler.runtime_type: Release
                  compiler.version: 194
                  os: Windows
                options
                  shared: False
                requires
                  cryptopp/8.8.Z
                  miniz/3.0.Z
                  nlohmann_json/3.11.3#45828be26eb619a2e04ca517bb7b828d:da39a3ee5e6b4b0d3255bfef95601890afd80709
                  pugixml/1.14.Z
                  uchardet/0.0.Z
                  utfcpp/4.0.4#6d93b29490c5cba2a9b4c06ae0c89cfd:da39a3ee5e6b4b0d3255bfef95601890afd80709
                  vincentlaucsb-csv-parser/2.1.3#cdcc68cfc02a61cfbe8210e48b61d23e:da39a3ee5e6b4b0d3255bfef95601890afd80709

@andiwand
Member

wiped it again. is there a similar problem for the other versions?

@ViliusSutkus89
Contributor Author

This time it's gone. I don't think other versions are affected. I'll see if the problem comes back after republishing first without the msvc compiler and then with it

@ViliusSutkus89
Contributor Author

OK, 100% confirmed that the MSVC package is messing up the artifactory. I built odrcore with all 12 conan configurations and it was all fine; then the next run built with MSVC, and on upload it just removed all 12 previous binaries.

https://github.com/opendocument-app/conan-odr-index/actions/runs/10341952942/job/28624331527

How do we solve this?

@andiwand
Member

how the f 😄 do you think this is a conan or artifactory bug? I can also restart the artifactory

@ViliusSutkus89
Contributor Author

Can the conan client remove packages? I mean, is that even allowed from the conan client?

@ViliusSutkus89
Contributor Author

msvc-1939 and msvc-1940 binaries seem to coexist just fine in the artifactory. https://github.com/opendocument-app/conan-odr-index/actions/runs/10354867660/job/28661187522

Looked at the docs a bit, turns out I can remove packages using conan remove. What's interesting, when it removes packages, it gives this output:

$ conan remove --remote=odr "odrcore/4.1.1:*" --confirm
Listing binaries of odrcore/4.1.1#a82aa16c3e9b0eb3f3e9854651fd418d in odr (1/2)
Remove summary:
odr
  odrcore/4.1.1#6898b4d7e5ed9cb73d9a717e50dd07a9: Removed binaries: ['b0500433fda3230d8d88f6ab110b61b2eda9456a', 'a31fa0d5e7b222e2daf7ed8b1f47b7faded53b29', 'b6174f0c529d39d90faddbe569be83d86b98f061', '2e1745237b29007e0419c1d5bb349480ebe403c6', '6c7d40e1b1f74df0911bf0ee34c8629bb55b14d4', '42c65bdb2fdac74ef5afef7dad22cd790bc4eed2', 'd8624c74bef45da6d92af21c4b9481076299ba2f', 'e0d867c05d3e43c41d74ad9f2cafc176d71133fb', '14a08b1f30452f377ceba9b92905211552e78217', 'e71058af4fa5aefdc07c8dbf350b4afe20a7a4db', '6338f165fcfc0de177b749b364d107748a034396', 'da47ff3417e691a6a9d2420de85994784fd01beb']
  odrcore/4.1.1#a82aa16c3e9b0eb3f3e9854651fd418d: Removed binaries: ['df4ccd70934170c72e308ded1fbc7fddba76a12d', 'e40aa7bc7c1f7f34fab9559b9f13bcc6fb71d25c']

df4ccd70934170c72e308ded1fbc7fddba76a12d - this was msvc-1940 binary, e40aa7bc7c1f7f34fab9559b9f13bcc6fb71d25c was msvc-1939 binary. First block is all 12 unix binaries.

Artifactory still has all the binaries. The problem is that the unix and msvc builds end up with different recipe revisions. Conan always resolves to the latest revision, which is the msvc one, while the unix builds keep uploading to the unix revision. Since that one is not the most recent, no conan install command can find the unix binaries.

@ViliusSutkus89
Contributor Author

It's probably different line endings in git checkout causing different revisions

@ViliusSutkus89
Contributor Author

yes, it was different line endings. Solved with .gitattributes
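
To illustrate why line endings matter here: as far as I understand, the recipe revision is by default derived from a hash of the exported recipe files, so the same recipe checked out with CRLF on Windows and LF on unix hashes differently. The sketch below uses SHA-1 over a made-up recipe snippet as a stand-in; it is not Conan's actual revision algorithm.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content hash used as a stand-in for a recipe revision."""
    return hashlib.sha1(data).hexdigest()


# Hypothetical recipe content, as checked out with LF line endings.
recipe_lf = b'from conan import ConanFile\n\nclass Pkg(ConanFile):\n    name = "demo"\n'

# The same file as a Windows checkout with CRLF conversion would see it.
recipe_crlf = recipe_lf.replace(b"\n", b"\r\n")

# Same text, different bytes, so a different hash.
same_before = digest(recipe_lf) == digest(recipe_crlf)

# Forcing one line-ending style everywhere makes the hashes agree again.
normalized = recipe_crlf.replace(b"\r\n", b"\n")
same_after = digest(normalized) == digest(recipe_lf)

print(same_before, same_after)
```

A .gitattributes rule that forces LF on checkout (for example `* text=auto eol=lf`, which is a common choice and not necessarily the exact rule used in our repo) achieves that normalization at the git level.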

@ViliusSutkus89
Contributor Author

What are our options for upgrading the storage space?

ERROR: 
Error uploading file: conan_package.tgz, '400: No space left on device'
ERROR: Execute upload again to retry upload the failed files: conan_package.tgz. [Remote: odr]
Error: Process completed with exit code 1.

@andiwand
Member

I think if we cannot make it fit in 100GB, we would fill 1TB in a very short time as well. I would rather be in favor of cleaning out old artifacts automatically, but I'm not sure how, or what the criteria would be

@ViliusSutkus89
Contributor Author

I could write a scheduled job on GH to clean out packages with non-latest versions
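
A rough sketch of the selection logic such a job might use. Everything here is an assumption: the function names are made up, the version ordering is deliberately naive (it does not treat pre-releases like rc1 specially, so Conan's own version ordering would be the safer choice in a real job), and the commands are only built, not executed. The `#*` suffix mirrors the remove pattern used elsewhere in this thread.

```python
import re
from collections import defaultdict


def version_key(version):
    """Naive sort key: numeric chunks compare as ints, other chunks as text."""
    parts = re.findall(r"\d+|[A-Za-z]+", version)
    return [(0, int(p), "") if p.isdigit() else (1, 0, p) for p in parts]


def non_latest(refs):
    """Given 'name/version' strings, return every ref except the newest per name."""
    by_name = defaultdict(list)
    for ref in refs:
        name, version = ref.split("/", 1)
        by_name[name].append(version)
    stale = []
    for name, versions in sorted(by_name.items()):
        ordered = sorted(set(versions), key=version_key)
        stale.extend(f"{name}/{v}" for v in ordered[:-1])  # keep only the last
    return stale


def removal_commands(refs, remote="odr"):
    """Build one conan remove argument list per stale ref (nothing is run)."""
    return [["conan", "remove", f"--remote={remote}", f"{ref}#*", "--confirm"]
            for ref in non_latest(refs)]
```

The list of refs would have to come from the remote, for example by parsing conan list output, and each command could then be passed to subprocess.run once a dry run looks right.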

@ViliusSutkus89
Contributor Author

But that scheduled job would still have some limitations:

Run conan remove --remote=odr "pdf2htmlex/*#*" --confirm
  
Found 2 pkg/version recipes matching pdf2htmlex/* in odr
ERROR: HTTPSConnectionPool(host='artifactory.opendocument.app', port=443): Max retries exceeded with url: /artifactory/api/conan/conan/v2/conans/pdf2htmlex/0.18.8.rc1-20240805-git/_/_/revisions/4c44a45877555080e62fd840c16e151a (Caused by ResponseError('too many 500 error responses')). [Remote: odr]
Error: Process completed with exit code 1.

@andiwand
Member

It might be easier to schedule this directly on the artifactory machine. If you can provide me with a script, I can set up cron. Maybe we can even create a private repo under our org and I just clone it on the VM
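
As a starting point for the VM side, a cron-run script could first check how full the volume is and only clean up when free space drops below a threshold. A minimal sketch, assuming the path from the Artifactory error above and an arbitrary 20% threshold; the function names are made up and the actual cleanup step is left out.

```python
import shutil

# Path from the Artifactory "free storage" error earlier in this thread.
ARTIFACTORY_VAR = "/opt/jfrog/artifactory/var"


def free_percent(total_bytes, free_bytes):
    """Free space as a percentage of the volume size."""
    return 100.0 * free_bytes / total_bytes


def needs_cleanup(path, min_free_percent=20.0, disk_usage=shutil.disk_usage):
    """True when the volume holding path has less free space than the threshold."""
    usage = disk_usage(path)  # named tuple with total, used, free (bytes)
    return free_percent(usage.total, usage.free) < min_free_percent
```

The script would call needs_cleanup(ARTIFACTORY_VAR) and run whatever removal logic we settle on only when it returns True, so a daily cron entry stays cheap while the disk is mostly empty.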

@ViliusSutkus89
Contributor Author

ViliusSutkus89 commented Aug 22, 2024

Can you clean out the server manually? That would give me a few days of a responsive server to come up with a proper cleaning script

@andiwand
Member

andiwand commented Aug 22, 2024

Should be clean now - I hope I didn't kill it in the process. Restarting

@ViliusSutkus89
Contributor Author

Something's still wrong with the server. Whenever I try to conan list or conan remove, I keep getting these errors

conan list --remote=odr '*'
odr
  ERROR: 400: org.jfrog.storage.binstore.exceptions.BinaryNotFoundException: Binary provider has no content for '5450269607ab1cc3a52eb08de4e8848af8abf850'. [Remote: odr]

I think it's expecting a binary, which was deleted. No idea which package 5450269607ab1cc3a52eb08de4e8848af8abf850 is

@andiwand
Member

cleared it again

@ViliusSutkus89
Contributor Author

Cool, I hope it works now :D thanks

@ViliusSutkus89
Contributor Author

Hey. Me again. As we all know, I haven't set up a cleanup cron job yet. And now the server is full and unresponsive again :(

@andiwand
Member

should be up again

@ViliusSutkus89
Contributor Author

artifactory is still problematic :( When I try to access it, it gives ERROR: 400: org.jfrog.storage.binstore.exceptions.BinaryNotFoundException: Binary provider has no content for 'dab99ff11ff3b49c952eebcd93bd740e8c05ea05'. [Remote: odr]. Same error when I tried to clear all packages through our actions workflow

@ViliusSutkus89
Contributor Author

odr.droid workflow succeeds, but only on GitHub. When I try to build it locally, my conan asks for and errors out about dab99ff11ff3b49c952eebcd93bd740e8c05ea05, whichever package that is

@andiwand
Member

I tried to nuke everything again. Can you try again?

@ViliusSutkus89
Contributor Author

All good, no error now


3 participants