
Commit

Merge pull request #138 from algo7/57-update-readmemd-to-reflect-the-addition-of-proxy_pool-service

57 update readmemd to reflect the addition of proxy pool service
algo7 committed Dec 9, 2023
2 parents 9ac17d8 + 67802b6 commit 3f97c3d
Showing 2 changed files with 21 additions and 5 deletions.
18 changes: 17 additions & 1 deletion README.md
@@ -17,6 +17,8 @@ A simple scraper for TripAdvisor reviews.
- [Run the container provisioner](#run-the-container-provisioner)
- [Visit the UI](#visit-the-ui)
- [Live Demo](#live-demo)
- [Proxy Pool](#proxy-pool)
- [Running the Proxy Pool](#running-the-proxy-pool)

## How to Install Docker:
1. [Windows](https://docs.docker.com/desktop/windows/install/)
Expand Down Expand Up @@ -83,4 +85,18 @@ The `docker-compose.yml` for the provisioner is located in the `container_provis
The UI is accessible at `http://localhost:3000`.

## Live Demo
A live demo of the container provisioner is available at [https://algo7.tools](https://algo7.tools).

# Proxy Pool
Proxy Pool is a Docker image that runs both HTTP and SOCKS5 proxies over OpenVPN. You supply the OpenVPN config through Docker bind mounts. Inside the container, `supervisord` manages `sockd`, `squid`, and the `openvpn` client. The service integrates with the Container Provisioner to provide a pool of proxies for the scraper. The Container Provisioner uses Docker Compose labels to distinguish between proxies, and it currently supports connecting to the Proxy Pool over HTTP proxies only. Each service in the `docker-compose.yml` file represents a single proxy in the pool. The `docker-compose.yml` file for the Proxy Pool is located in the `proxy_pool` folder.
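
Because proxies are distinguished by labels, you can inspect the pool from the host with the standard `docker ps` label filter. The snippet below is a minimal sketch, not the provisioner's own logic: it assumes your containers carry the `TaskOwner=PROXY` and `vpn.region` labels used in the compose file under `proxy_pool`, and `list_proxies` is a hypothetical helper name.

```bash
# List running Proxy Pool containers together with their region label.
# Assumption: containers are labelled TaskOwner=PROXY and vpn.region=<code>,
# as in the compose file in the proxy_pool folder.
# list_proxies is a hypothetical helper, not part of this repository.
list_proxies() {
  docker ps \
    --filter "label=TaskOwner=PROXY" \
    --format '{{.Names}}: {{.Label "vpn.region"}}'
}
```

Calling `list_proxies` prints one line per running proxy container, with the container name followed by its region label.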

The Proxy Pool service can also be used directly with the scraper. Just make sure that the `PROXY_ADDRESS` environment variable is set in the scraper's `docker-compose.yml` file.

## Running the Proxy Pool
1. Pull the latest Proxy Pool Docker image
```bash
docker pull ghcr.io/algo7/tripadvisor-review-scraper/vpn_worker:latest
```
2. Create a `docker-compose.yml` file containing the configuration for each proxy (see the `docker-compose.yml` provided in the `proxy_pool` folder).
3. Place the OpenVPN config file of each proxy in the corresponding bind mount folder specified in the `docker-compose.yml` file.
4. Run `docker-compose up` to start the containers.
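
Once the containers are up, one quick way to confirm that a proxy works is to send a request through it. The sketch below assumes the HTTP proxy is reachable from the host at the address you pass in; which port each service publishes depends on your `docker-compose.yml`. The helper name `check_proxy` and the default IP echo URL are illustrative choices, not part of this repository.

```bash
# Fetch a URL through one proxy from the pool and print the response body.
# $1: proxy URL (hypothetical example: http://127.0.0.1:3128)
# $2: URL to fetch (optional; defaults to an IP echo service, an assumption)
check_proxy() {
  curl --silent --show-error --max-time 15 \
    --proxy "$1" \
    "${2:-https://api.ipify.org}"
}
```

For example, `check_proxy http://127.0.0.1:3128` should print the VPN exit IP rather than your own, provided your compose file publishes `squid` on its default port 3128 (an assumption).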
8 changes: 4 additions & 4 deletions proxy_pool/docker-compose-dev.yml
@@ -1,7 +1,7 @@
version: '3.9'
services:
vpnch:
image: su:latest
image: ghcr.io/algo7/tripadvisor-review-scraper/vpn_worker:latest
labels:
# This label is used by the container_provisioner to identify the containers that are part of the proxy pool.
- 'TaskOwner=PROXY'
@@ -44,7 +44,7 @@ services:
create_host_path: true

vpnse:
image: su:latest
image: ghcr.io/algo7/tripadvisor-review-scraper/vpn_worker:latest
labels:
- 'TaskOwner=PROXY'
- 'vpn.region=SE'
@@ -79,7 +79,7 @@ services:
create_host_path: true

vpnuk:
image: su:latest
image: ghcr.io/algo7/tripadvisor-review-scraper/vpn_worker:latest
labels:
- 'TaskOwner=PROXY'
- 'vpn.region=UK'
@@ -114,7 +114,7 @@ services:
create_host_path: true

vpnbe:
image: su:latest
image: ghcr.io/algo7/tripadvisor-review-scraper/vpn_worker:latest
labels:
- 'TaskOwner=PROXY'
- 'vpn.region=BE'