SSL Proxy Support #9

Open
Chetan11-dev opened this issue Dec 23, 2023 · 0 comments

Hi, I have created a package named botasaurus-proxy-authentication, which enables SSL support for proxies requiring authentication.

For instance, when you use an authenticated proxy with a tool like seleniumwire to scrape a Cloudflare-protected website such as G2.com, a non-SSL connection typically gets blocked.

To illustrate, first install the required packages:

python -m pip install selenium_wire chromedriver_autoinstaller

Then, execute this Python script:

from seleniumwire import webdriver
from chromedriver_autoinstaller import install

# Define the proxy
proxy_options = {
    'proxy': {
        'http': 'http://username:password@proxy-provider-domain:port', # Replace with your proxy
        'https': 'http://username:password@proxy-provider-domain:port', # Replace with your proxy
    }
}

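# Install a matching chromedriver and add it to PATH
install()
# Selenium 4.10+ no longer accepts the driver path as a positional argument,
# so let Selenium find the chromedriver that install() placed on PATH
driver = webdriver.Chrome(seleniumwire_options=proxy_options)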

# Navigate to the desired URL
link = 'https://www.g2.com/products/github/reviews'
driver.get("https://www.google.com/")
driver.execute_script(f'window.location.href = "{link}"')

# Wait for user input
input("Press Enter to exit...")

# Clean up
driver.quit()
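
The options dictionary in the script above repeats the same proxy URL for both the `http` and `https` keys. As a minimal sketch, a small helper can build that dictionary from one URL and reject a URL that still contains the unreplaced placeholders. The helper name `build_proxy_options` and the example host `proxy.example.com` are illustrative assumptions, not part of selenium-wire or of the original report.

```python
from urllib.parse import urlsplit


def build_proxy_options(proxy_url: str) -> dict:
    """Build a selenium-wire style options dict from a single proxy URL.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(proxy_url)
    # Accessing .port raises ValueError when the port is not numeric,
    # for example when the literal "port" placeholder was never replaced.
    if not (parts.scheme and parts.hostname and parts.port):
        raise ValueError("proxy URL must look like scheme://user:pass@host:port")
    if not (parts.username and parts.password):
        raise ValueError("proxy URL is missing a username or password")
    # Route both plain HTTP and HTTPS traffic through the same proxy.
    return {"proxy": {"http": proxy_url, "https": proxy_url}}


# Example with a made-up proxy address
proxy_options = build_proxy_options("http://username:password@proxy.example.com:8000")
```

The resulting `proxy_options` can be passed as `seleniumwire_options` exactly as in the script above.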

You'll likely be blocked by Cloudflare:

[Screenshot: blocked]

However, using botasaurus_proxy_authentication with the same proxy circumvents this problem. First, install the package:

python -m pip install botasaurus-proxy-authentication

Then run the following code and notice the difference:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from chromedriver_autoinstaller import install
from botasaurus_proxy_authentication import add_proxy_options

# Define the proxy settings
proxy = 'http://username:password@proxy-provider-domain:port'  # Replace with your proxy

# Set Chrome options
chrome_options = Options()
add_proxy_options(chrome_options, proxy)

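# Install a matching chromedriver and add it to PATH
install()
# Selenium 4.10+ no longer accepts the driver path as a positional argument,
# so let Selenium find the chromedriver that install() placed on PATH
driver = webdriver.Chrome(options=chrome_options)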

# Navigate to the desired URL
link = 'https://www.g2.com/products/github/reviews'
driver.get("https://www.google.com/")
driver.execute_script(f'window.location.href = "{link}"')

# Wait for user input
input("Press Enter to exit...")

# Clean up
driver.quit()

Result:

[Screenshot: not blocked]
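
Both scripts rely on looking at the browser window to tell whether the request was blocked. A rough sketch of an automated check is to inspect the page title after navigation. The marker strings below are assumptions based on the titles Cloudflare commonly shows on its challenge and block pages ("Just a moment..." and "Attention Required! | Cloudflare"). They are a heuristic, not something stated in this issue, and the function name `looks_blocked` is hypothetical.

```python
# Title fragments commonly seen on Cloudflare challenge/block pages (heuristic assumption)
CLOUDFLARE_TITLE_MARKERS = ("just a moment", "attention required")


def looks_blocked(page_title: str) -> bool:
    """Return True if the page title resembles a Cloudflare challenge or block page."""
    title = page_title.lower()
    return any(marker in title for marker in CLOUDFLARE_TITLE_MARKERS)
```

In either script, this could be called as `looks_blocked(driver.title)` once the page has loaded, before the `input(...)` prompt.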

I suggest using botasaurus_proxy_authentication for its SSL support for authenticated proxies. It improves the success rate of scraping Cloudflare-protected websites, which could in turn increase revenue for Oxylabs.
Also, thanks to Oxylabs for your great work on proxies.
Good luck to the team.
