The webdriver_manager seems to be sending repeated HTTP requests #656

Open
stanislaw opened this issue Jan 20, 2024 · 1 comment

@stanislaw

Hello,

First of all, thank you for creating this library! We are using it to print PDF documents from HTML using Chrome Driver.

To explore how the library works, we followed the documentation and created our own custom HTTP client. We can now see in our logs the GET requests that the library makes.

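    from webdriver_manager.chrome import ChromeDriverManager
    from webdriver_manager.core.download_manager import WDMDownloadManager

    # HTML2PDF_HTTPClient is our own custom HTTP client; it logs every GET request.
    http_client = HTML2PDF_HTTPClient()
    download_manager = WDMDownloadManager(http_client)
    path_to_chrome = ChromeDriverManager(
        download_manager=download_manager
    ).install()
    print(f"HTML2PDF: Chrome Driver available at path: {path_to_chrome}")  # noqa: T201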
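For anyone who wants to reproduce this kind of logging without our code, here is a minimal sketch of a client that records every requested URL and reports the repeated ones. `RecordingHTTPClient` and its `fetch` argument are hypothetical names for illustration only. This is not our actual `HTML2PDF_HTTPClient`, and a client meant for webdriver_manager would have to follow the interface described in the library's documentation.

```python
from collections import Counter
from typing import Callable, Dict


class RecordingHTTPClient:
    """Hypothetical logging client; not part of webdriver_manager."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch  # function that performs the real request
        self.requested: Counter = Counter()  # URL -> number of GET requests

    def get(self, url: str) -> bytes:
        # Count the request, log it, then delegate to the real fetcher.
        self.requested[url] += 1
        print(f"GET #{self.requested[url]}: {url}")
        return self._fetch(url)

    def repeated(self) -> Dict[str, int]:
        """Return only the URLs that were requested more than once."""
        return {url: n for url, n in self.requested.items() if n > 1}
```

Passing the fetcher in as an argument keeps the sketch usable in tests, where a fake function can stand in for the network.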

One thing we noticed is that the request URLs repeat. Is this behavior by design, or is something in the code doing the same work twice?

This is what a run looks like when webdriver_manager has to download a new driver:

HTML2PDF: creating Chrome Driver service.
HTML2PDF_HTTPClient: sending GET request attempt 1: https://googlechromelabs.github.io/chrome-for-testing/latest-patch-versions-per-build.json
HTML2PDF_HTTPClient: sending GET request attempt 1: https://googlechromelabs.github.io/chrome-for-testing/latest-patch-versions-per-build.json
HTML2PDF_HTTPClient: sending GET request attempt 1: https://googlechromelabs.github.io/chrome-for-testing/latest-patch-versions-per-build.json
HTML2PDF_HTTPClient: sending GET request attempt 1: https://googlechromelabs.github.io/chrome-for-testing/known-good-versions-with-downloads.json
HTML2PDF_HTTPClient: sending GET request attempt 1: https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/120.0.6099.109/mac-x64/chromedriver-mac-x64.zip
HTML2PDF_HTTPClient: sending GET request attempt 1: https://googlechromelabs.github.io/chrome-for-testing/latest-patch-versions-per-build.json
HTML2PDF: Chrome Driver available at path: /Users/Stanislaw/workspace/projects/strictdoc-project/strictdoc/strictdoc/export/html2pdf/.wdm/drivers/chromedriver/mac64/120.0.6099.109/chromedriver-mac-x64/chromedriver

This is what a hot run looks like when the driver has already been downloaded:

HTML2PDF: creating Chrome Driver service.
HTML2PDF_HTTPClient: sending GET request attempt 1: https://googlechromelabs.github.io/chrome-for-testing/latest-patch-versions-per-build.json
HTML2PDF_HTTPClient: sending GET request attempt 1: https://googlechromelabs.github.io/chrome-for-testing/latest-patch-versions-per-build.json
HTML2PDF: Chrome Driver available at path: /Users/Stanislaw/workspace/projects/strictdoc-project/strictdoc/strictdoc/export/html2pdf/.wdm/drivers/chromedriver/mac64/120.0.6099.109/chromedriver-mac-x64/chromedriver

Is this behavior expected? We ask because we exercise our HTML2PDF function in our tests, and we would like those tests to make fewer HTTP requests to the internet.

Thank you.

@cortlepp

Hey,
(for clarification: I am neither a maintainer nor a contributor, just another interested user)
For unrelated reasons, I have had some problems with this library's HTTP calls when instantiating the driver, so I investigated a bit. My findings:

  • You could at least reduce the calls to https://googlechromelabs.github.io/chrome-for-testing/latest-patch-versions-per-build.json to 1 by remembering the response. This is actually already supposed to happen, but it is currently broken (I would happily make a PR for this, but I don't think there is anybody who could merge it).
  • You cannot get to 0 without changing the behaviour of the library, because webdriver_manager checks on every instantiation that it has the latest driver installed (this is done in DriverCacheManager.find_driver()). For this it of course needs to call the Google API. Actually, even if it doesn't find a later version, it will download the latest driver again if the cached one is older than x days (by default x is 1).
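To illustrate what "remembering the response" in the first point could look like on the user side, here is a rough sketch of a wrapper that memoizes GET responses by URL. `MemoizingHTTPClient` and `StubClient` are hypothetical names of mine, not library classes. I am assuming the wrapped client exposes a `get(url, **kwargs)` method, and a real version would have to match whatever client interface webdriver_manager expects. It does not fix the caching inside the library itself.

```python
class MemoizingHTTPClient:
    """Hypothetical wrapper: returns a stored response for URLs already fetched."""

    def __init__(self, inner) -> None:
        self._inner = inner    # any object with a get(url, **kwargs) method
        self._responses = {}   # URL -> first response received

    def get(self, url, **kwargs):
        # Only the first request for a given URL reaches the wrapped client.
        if url not in self._responses:
            self._responses[url] = self._inner.get(url, **kwargs)
        return self._responses[url]


class StubClient:
    """Stand-in for a real HTTP client, used only to show the effect."""

    def __init__(self) -> None:
        self.calls = 0

    def get(self, url, **kwargs):
        self.calls += 1
        return f"body of {url}"
```

Note that the memo lives only as long as the wrapper object, so this helps within one process but not across separate runs. It would also hold a downloaded driver archive in memory, so in practice one might restrict it to the JSON metadata URLs.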

I don't know why the policy here is so aggressive, and why webdriver-manager is not just happy once it has downloaded a driver that is suitable for the currently installed browser. Is it really necessary to always be using the very latest patch? Maybe a maintainer/contributor can shed some light on this, but it seems a bit excessive to me.
