Commit
Merge pull request #22 from bgpkit/crawler
`bgpkit-broker` cli with crawler + api + search
digizeph authored Jul 30, 2023
2 parents f662280 + f552ba1 commit f5a88cb
Showing 19 changed files with 2,604 additions and 140 deletions.
20 changes: 20 additions & 0 deletions .dockerignore
@@ -0,0 +1,20 @@
# flyctl launch added from .gitignore
target
**/Cargo.lock
**/.idea

**/*.sqlite3*
**/*.duckdb*
**/*.parquet
.env

# flyctl launch added from .idea/.gitignore
# Default ignored files
.idea/shelf
.idea/workspace.xml
# Editor-based HTTP Client requests
.idea/httpRequests
# Datasource local storage ignored files
.idea/dataSources
.idea/dataSources.local.xml
fly.toml
4 changes: 2 additions & 2 deletions .github/workflows/rust.yml
@@ -18,5 +18,5 @@ jobs:
- uses: actions/checkout@v3
- name: Build
run: cargo build --verbose
- name: Run tests
run: cargo test --verbose
- name: Test SDK
run: cargo test --no-default-features --verbose
5 changes: 5 additions & 0 deletions .gitignore
@@ -1,3 +1,8 @@
/target
Cargo.lock
.idea

*.sqlite3*
*.duckdb*
*.parquet
/.env
59 changes: 55 additions & 4 deletions Cargo.toml
@@ -1,24 +1,75 @@
[package]
name = "bgpkit-broker"
version = "0.6.2"
version = "0.7.0-alpha.1"
edition = "2018"
authors = ["Mingwei Zhang <mingwei@bgpkit.com>"]
readme = "README.md"
license = "MIT"
repository = "https://github.com/bgpkit/bgpkit-broker"
documentation = "https://docs.rs/bgpkit-broker"
description = """
A library to access BGPKIT Broker API and enable searching for BGP data archive files over time from public available
data sources.
A library and command-line to provide indexing and searching functionalities for public BGP data archive files over time.
"""
keywords = ["bgp", "bgpkit", "api"]

[[bin]]
path= "src/cli/broker.rs"
name="bgpkit-broker"
required-features = ["cli"]

[dependencies]

#############################################
# Core Broker Rust SDK dependencies
#############################################
chrono = { version = "0.4", features = ["serde"] }
log="0.4"
reqwest = {version = "0.11.17", features = ["blocking", "json"]}
serde={version="1", features = ["derive"]}
serde_json = "1"
thiserror = "1.0"
tracing = "0.1"

#############################################
# Optional dependencies
#############################################

# command-line interface dependencies
clap = {version= "4.3", features=["derive"], optional=true}
dirs = {version="5", optional=true}
envy = {version = "0.4", optional = true }
humantime = {version="2.1", optional = true}
num_cpus = {version="1.15", optional=true}
tabled = {version = "0.13", optional = true}
tracing-subscriber = {version="0.3", optional=true}

# crawler dependencies
futures = {version="0.3", optional = true}
oneio = {version="0.11.0", features = ["lib", "s3"], optional = true}
regex = { version = "1", optional = true }
scraper = { version = "0.17", optional = true }
tokio = {version="1", optional = true, features = ["full"] }

# api dependencies
poem = {version="1", optional = true}
poem-openapi = {version= "3", features=['swagger-ui', 'chrono'], optional = true}

# database dependencies
duckdb = {version="0.8", optional = true, features = ["bundled", "parquet", "httpfs", "r2d2"]}
r2d2 = {version="0.8", optional = true}

[features]
default=[]
cli = [
# command-line interface
"clap", "dirs", "envy", "humantime", "num_cpus", "tracing-subscriber", "tabled",
# crawler
"futures", "oneio", "regex", "scraper", "tokio",
# database
"duckdb", "r2d2",
# RESTful API
"poem", "poem-openapi",
]

[dev-dependencies]
anyhow = "1.0"
tracing-subscriber = "0.3.17"
22 changes: 22 additions & 0 deletions Dockerfile
@@ -0,0 +1,22 @@
# select build image
FROM rust:1.70 as build

# create a new empty shell project
RUN USER=root cargo new --bin my_project
WORKDIR /my_project

# copy your source tree
COPY ./src ./src
COPY ./Cargo.toml .

# build for release
RUN cargo build --release

# our final base
FROM debian:bullseye

# copy the build artifact from the build stage
COPY --from=build /my_project/target/release/bgpkit-broker /usr/local/bin/bgpkit-broker
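# DEBIAN_FRONTEND=noninteractive is exported so that it applies to every apt call in this RUN step
RUN export DEBIAN_FRONTEND=noninteractive; apt update; apt install -y curl libssl-dev ca-certificates tzdata cron; rm -rf /var/lib/apt/lists/*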

ENTRYPOINT bash -c '/usr/local/bin/bgpkit-broker serve'
153 changes: 133 additions & 20 deletions README.md
@@ -17,22 +17,31 @@
[mastodon-url]: https://infosec.exchange/@bgpkit
[mastodon-badge]: https://img.shields.io/mastodon/follow/109852506691103147?domain=https%3A%2F%2Finfosec.exchange&style=social

[BGPKIT Broker](https://bgpkit.com/broker) is an online data API service that allows users to search for publicly available BGP archive
files by time, collector, project, or data type. The service indexes the archives in close to real-time (delay is
less than 5 minutes). Currently, we are indexing BGP table dump and updates files from RIPE RIS and RouteViews.
[BGPKIT Broker](https://bgpkit.com/broker) is an online data API service that allows users to search for publicly available BGP archive files by time, collector, project, or data type. The service indexes the archives in close to real-time (delay is less than 5 minutes). Currently, we are indexing BGP table dump and updates files from [RIPE RIS][ripe-ris] and [RouteViews][route-views].

This Rust library provides access to the BGPKIT Broker API with the capability to search and paginate results.
[ripe-ris]: https://www.ripe.net/analyse/internet-measurements/routing-information-service-ris/ris-data-access/mrt-files-store
[route-views]: http://archive.routeviews.org/

For more details, please check out the BGPKIT API docs at <https://api.bgpkit.com/docs>.
This Rust library provides SDK access to the BGPKIT Broker API as well as a command-line tool to start a self-hosted broker instance.
The documentation for the current BGPKIT Broker API is available at <https://api.bgpkit.com/docs>.

## Usage
BGPKIT Broker is used in production at [Cloudflare Radar][radar], powering its [routing page][routing] and projects like [BGP hijack detection][hijack] and [route leak detection][route-leak].

[radar]: https://radar.cloudflare.com/
[route-leak]: https://blog.cloudflare.com/route-leak-detection-with-cloudflare-radar/
[hijack]: https://blog.cloudflare.com/bgp-hijack-detection/
[routing]: https://blog.cloudflare.com/radar-routing/

## Broker Rust SDK

### Usage

Add the following dependency line to your project's `Cargo.toml` file:
```yaml
bgpkit-broker = "0.6"
bgpkit-broker = "0.7.0-alpha.1"
```

## Example
### Example

You can run the following example with `cargo run --example query` ([source code](./examples/query.rs)).

@@ -57,24 +66,128 @@ pub fn main() {
}
```

## Contribution
## `bgpkit-broker` CLI Tool

### Issues and Pull Requests
`bgpkit-broker` is a command-line application that packages many functionalities to allow users to self-host a BGPKIT Broker instance with ease.

If you found any issues of this Rust library or would like to contribute to the code base, please feel free to open an
issue or pull request. Code or documentation issues/PRs are both welcome.
`bgpkit-broker` has the following subcommands:

### Data Provider
```text
A library and command-line to provide indexing and searching functionalities for public BGP data archive files over time.
If you have publicly available data and want to be indexed BGPKIT Broker service, please send us an email at
data@bgpkit.com. Our back-end service is designed to be flexible and should be able to adapt to most data archiving
approaches.
## On-premise Deployment
Usage: bgpkit-broker [OPTIONS] <COMMAND>
Commands:
serve Serve the Broker content via RESTful API
update Update the Broker database
config Print out current configuration
bootstrap Bootstrap the Broker database
backup Export broker database to parquet file
search Search MRT files in Broker db
help Print this message or the help of the given subcommand(s)
Options:
--no-log disable logging
--bootstrap-parquet bootstrap from parquet file instead of DuckDB file
-h, --help Print help
-V, --version Print version
```

### `serve`
`bgpkit-broker serve` is the main command to start the BGPKIT Broker service. It starts a web server that serves the API endpoints and periodically updates the local database unless the `--no-updater` flag is set.

```text
Serve the Broker content via RESTful API
Usage: bgpkit-broker serve [OPTIONS]
Options:
-i, --update-interval <UPDATE_INTERVAL> update interval in seconds [default: 300]
--no-log disable logging
--bootstrap-parquet bootstrap from parquet file instead of DuckDB file
-h, --host <HOST> host address [default: 0.0.0.0]
-p, --port <PORT> port number [default: 40064]
-r, --root <ROOT> root path, useful for configuring docs UI [default: /]
--no-updater disable updater service
--no-api disable API service
-h, --help Print help
-V, --version Print version
```
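
The periodic update described above can be pictured with a small standard-library sketch. This is not the actual `bgpkit-broker` implementation: `ServeOptions`, `spawn_updater`, and the `max_rounds` bound are hypothetical names introduced only for illustration, and the closure stands in for whatever the real updater does.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical subset of the `serve` options shown above.
struct ServeOptions {
    /// Time between updates (`--update-interval`, default 300 seconds).
    update_interval: Duration,
    /// Mirrors the `--no-updater` flag.
    no_updater: bool,
}

/// Spawn a background thread that calls `update` once per interval.
/// Returns `None` when the updater is disabled.
/// `max_rounds` bounds the loop so this sketch terminates; a real service
/// would keep looping until shutdown.
fn spawn_updater<F>(
    opts: &ServeOptions,
    max_rounds: usize,
    mut update: F,
) -> Option<thread::JoinHandle<usize>>
where
    F: FnMut() + Send + 'static,
{
    if opts.no_updater {
        return None;
    }
    let interval = opts.update_interval;
    Some(thread::spawn(move || {
        let mut rounds = 0;
        while rounds < max_rounds {
            update();
            rounds += 1;
            // Sleep only between rounds, not after the last one.
            if rounds < max_rounds {
                thread::sleep(interval);
            }
        }
        rounds
    }))
}

fn main() {
    // A short interval keeps the demo fast; the CLI default is 300 seconds.
    let opts = ServeOptions {
        update_interval: Duration::from_millis(10),
        no_updater: false,
    };
    let counter = Arc::new(AtomicUsize::new(0));
    let c = Arc::clone(&counter);
    let handle = spawn_updater(&opts, 3, move || {
        // Stand-in for a real database update.
        c.fetch_add(1, Ordering::SeqCst);
    })
    .expect("updater is enabled");
    let rounds = handle.join().expect("updater thread panicked");
    println!("rounds: {}, updates: {}", rounds, counter.load(Ordering::SeqCst));
}
```

Setting `no_updater: true` makes `spawn_updater` return `None`, which corresponds to serving the API without background updates.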

### `update`
`bgpkit-broker update` triggers a local database update manually. This command **cannot** be run at the same time as `serve` because the active API will lock the database file.

```text
Update the Broker database
Usage: bgpkit-broker update [OPTIONS]
Options:
--no-log disable logging
--bootstrap-parquet bootstrap from parquet file instead of DuckDB file
-h, --help Print help
-V, --version Print version
```

We provide service to allow companies to host their own BGP Broker backend on-premise to allow maximum
performance and customization. If you are interested in deploying one, please contact us at contact@bgpkit.com.
### `config`
`bgpkit-broker config` displays current configuration, e.g. local database path, update interval, etc.

## Built with ❤️ by BGPKIT Team
```text
Print out current configuration
Usage: bgpkit-broker config [OPTIONS]
Options:
--no-log disable logging
--bootstrap-parquet bootstrap from parquet file instead of DuckDB file
-h, --help Print help
-V, --version Print version
```

### `backup`
`bgpkit-broker backup` runs a database backup and exports the database to a DuckDB file and a Parquet file. This *can* be run while `serve` is running.

```text
Export broker database to parquet file
Usage: bgpkit-broker backup [OPTIONS]
Options:
--no-log disable logging
--bootstrap-parquet bootstrap from parquet file instead of DuckDB file
-h, --help Print help
-V, --version Print version
```

### `search`
`bgpkit-broker search` queries for MRT files using the default production API unless specified otherwise.

```text
Search MRT files in Broker db
Usage: bgpkit-broker search [OPTIONS]
Options:
--no-log disable logging
-t, --ts-start <TS_START> Start timestamp
--bootstrap-parquet bootstrap from parquet file instead of DuckDB file
-T, --ts-end <TS_END> End timestamp
-p, --project <PROJECT> filter by route collector projects, i.e. `route-views` or `riperis`
-c, --collector-id <COLLECTOR_ID> filter by collector IDs, e.g. 'rrc00', 'route-views2. use comma to separate multiple collectors
-d, --data-type <DATA_TYPE> filter by data types, i.e. 'update', 'rib'
--page <PAGE> page number
--page-size <PAGE_SIZE> page size
-u, --url <URL>
-j, --json print out search results in JSON format instead of Markdown table
-h, --help Print help
-V, --version Print version
```
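
To show how the search filters above might combine into a single request against a self-hosted instance, here is a minimal standard-library sketch. The `/search` path and the query parameter names are assumptions that simply mirror the CLI flags with underscores; `SearchFilters` and `build_search_url` are hypothetical helpers, not part of the SDK. Check the API docs of your instance for the real endpoint.

```rust
/// Filters mirroring the `bgpkit-broker search` options shown above.
#[derive(Default)]
struct SearchFilters {
    ts_start: Option<String>,
    ts_end: Option<String>,
    project: Option<String>,
    collector_id: Option<String>,
    data_type: Option<String>,
    page: Option<u32>,
    page_size: Option<u32>,
}

/// Build a query URL; only filters that are set become query parameters.
/// Values are NOT percent-encoded, so this sketch assumes URL-safe input.
fn build_search_url(base: &str, f: &SearchFilters) -> String {
    let mut params: Vec<String> = Vec::new();
    let mut push = |key: &str, value: Option<String>| {
        if let Some(v) = value {
            params.push(format!("{}={}", key, v));
        }
    };
    push("ts_start", f.ts_start.clone());
    push("ts_end", f.ts_end.clone());
    push("project", f.project.clone());
    push("collector_id", f.collector_id.clone());
    push("data_type", f.data_type.clone());
    push("page", f.page.map(|p| p.to_string()));
    push("page_size", f.page_size.map(|p| p.to_string()));

    // Avoid a double slash when the base URL already ends with one.
    let base = base.trim_end_matches('/');
    if params.is_empty() {
        format!("{}/search", base)
    } else {
        format!("{}/search?{}", base, params.join("&"))
    }
}

fn main() {
    let filters = SearchFilters {
        ts_start: Some("1690675200".to_string()),
        collector_id: Some("rrc00".to_string()),
        data_type: Some("rib".to_string()),
        page_size: Some(10),
        ..Default::default()
    };
    // 40064 is the default port of `bgpkit-broker serve`.
    println!("{}", build_search_url("http://localhost:40064", &filters));
}
```

Keeping every filter optional matches the CLI, where each option can be omitted independently.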

## Data Provider

If you have publicly available data and want it to be indexed by the BGPKIT Broker service, please send us an email at
data@bgpkit.com. Our back-end service is designed to be flexible and should be able to adapt to most data archiving
approaches.

<a href="https://bgpkit.com"><img src="https://bgpkit.com/Original%20Logo%20Cropped.png" alt="https://bgpkit.com/favicon.ico" width="200"/></a>