Commit 47bdd4f: update for 2.0 (#61)
* update for 2.0

* update schema
radiofreejohn authored Jul 25, 2024
1 parent 4a56977 · commit 47bdd4f
Showing 4 changed files with 28 additions and 38 deletions.

**docs/chains.md** (9 changes: 6 additions & 3 deletions)
````diff
@@ -6,11 +6,14 @@ sidebar_position: 3
 
 Currently, the following chains are supported, the name in parenthesis is the path name in R2.
 
-- Ethereum (ethereum)
+- Arweave (arweave) - blocks and transactions are still backfilling
+- Ethereum (ethereum) - decoded logs are still backfilling
 - Gnosis (gnosis)
-- Base (base-goerli)
+- Arweave (arweave)
+- Linea (linea)
 - Optimism (optimism)
-- PGN (pgn)
+- zkSync (zksync) - decoded logs are still backfilling
+- Zora (zora)
 
 # Requesting Support
 
````

**docs/dataset/awscli.md** (8 changes: 4 additions & 4 deletions)
````diff
@@ -6,8 +6,8 @@
 
 ```
 [indexedxyz]
-aws_access_key_id = 43c31ff797ec2387177cabab6d18f15a
-aws_secret_access_key = afb354f05026f2512557922974e9dd2fdb21e5c2f5cbf929b35f0645fb284cf7
+aws_access_key_id = 094c97e8d9532a90e8b04a910e27e34b
+aws_secret_access_key = 9ecf4202fe4c67127e1ce6656626f094585e27494a51d57f457cfff410307ef4
 ```
````
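
These credentials belong in the AWS CLI's shared credentials file. A quick way to add the profile, not part of the commit, assuming a Unix shell and the default `~/.aws/credentials` location:

```bash
# Append the indexedxyz profile from the docs to the AWS CLI credentials file.
cat >> ~/.aws/credentials <<'EOF'
[indexedxyz]
aws_access_key_id = 094c97e8d9532a90e8b04a910e27e34b
aws_secret_access_key = 9ecf4202fe4c67127e1ce6656626f094585e27494a51d57f457cfff410307ef4
EOF
```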

## Downloading with the AWS CLI tools
````diff
@@ -17,9 +17,9 @@
 To retrieve the files using the AWS cli tools, you can then run the following command in a terminal with the provided credentials:
 
 ```bash
-$ aws s3 cp --endpoint-url https://data.indexed.xyz --profile indexedxyz s3://indexed-xyz/ethereum/decoded/logs/v1.2.0/partition_key=9d/ . --recursive
+$ aws s3 cp --endpoint-url https://ed5d915e0259fcddb2ab1ce5592040c3.r2.cloudflarestorage.com --profile indexedxyz s3://indexed-xyz-wnam/ethereum/raw/logs/v2.0.0/dt=2020-02-20/ . --recursive
 ```
 
 This will download the Parquet files into the current directory.
 
-> Keep in mind that since the partition keys are only two digits, the partitions will contain data for multiple smart contracts, not necessarily just the one that you’re looking for.
+> Keep in mind that since the data is partitioned by day, the download will contain data for multiple smart contracts, not necessarily just the one that you’re looking for.
````
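
Since a day's partition mixes logs from every contract, a post-download filter is useful. A minimal sketch, not part of the commit, assuming the DuckDB CLI is installed and using a placeholder contract address:

```bash
# Keep only the rows for one contract from the downloaded day of logs.
# The address is a placeholder — swap in the contract you care about.
duckdb -c "
COPY (
  SELECT *
  FROM read_parquet('*.parquet')
  WHERE address = lower('0x22c1f6050e56d2876009903609a2cc3fef83b415')
) TO 'filtered.parquet' (FORMAT PARQUET);
"
```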

**docs/dataset/rclone.md** (18 changes: 9 additions & 9 deletions)
````diff
@@ -8,20 +8,20 @@
 [r2]
 type = s3
 provider = Cloudflare
-access_key_id = 43c31ff797ec2387177cabab6d18f15a
-secret_access_key = afb354f05026f2512557922974e9dd2fdb21e5c2f5cbf929b35f0645fb284cf7
+access_key_id = 094c97e8d9532a90e8b04a910e27e34b
+secret_access_key = 9ecf4202fe4c67127e1ce6656626f094585e27494a51d57f457cfff410307ef4
 region = auto
-endpoint = https://data.indexed.xyz
+endpoint = https://ed5d915e0259fcddb2ab1ce5592040c3.r2.cloudflarestorage.com
 ```
 
 ## rclone configuration inputs
 
 Follow the instructions linked above. You will need three inputs specific to the indexed.xyz R2 bucket:
 
 ```
-access_key_id = 43c31ff797ec2387177cabab6d18f15a
-secret_access_key = afb354f05026f2512557922974e9dd2fdb21e5c2f5cbf929b35f0645fb284cf7
-endpoint = https://data.indexed.xyz
+access_key_id = 094c97e8d9532a90e8b04a910e27e34b
+secret_access_key = 9ecf4202fe4c67127e1ce6656626f094585e27494a51d57f457cfff410307ef4
+endpoint = https://ed5d915e0259fcddb2ab1ce5592040c3.r2.cloudflarestorage.com
 ```
````

## Downloading with rclone
````diff
@@ -31,9 +31,9 @@ endpoint = https://data.indexed.xyz
 To retrieve the files using the rclone cli tool, you can then run the following command in a terminal with the provided credentials:
 
 ```bash
-$ rclone copy r2://indexed-xyz/ethereum/decoded/logs/v1.2.0/partition_key=9d/ .
+$ rclone copy r2://indexed-xyz-wnam/ethereum/raw/logs/v2.0.0/dt=2020-02-20/ .
 ```
 
-This will download the Parquet files into the current directory.
+This will download the Parquet files for the specified date into the current directory.
 
-> Keep in mind that since the partition keys are only two digits, the partitions will contain data for multiple smart contracts, not necessarily just the one that you’re looking for.
+> Keep in mind that data partitioned by date contains all rows for that particular day. If you want to filter for specific contracts, then you can do it after downloading the data.
````
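
Before copying a whole day, it can help to see which `dt=` partitions exist. A sketch, not part of the commit, assuming the `r2` remote from the config above and rclone's usual `remote:bucket/path` form:

```bash
# List the dt=YYYY-MM-DD partitions available for Ethereum raw logs.
rclone lsd r2:indexed-xyz-wnam/ethereum/raw/logs/v2.0.0/
```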

**docs/schema.md** (31 changes: 9 additions & 22 deletions)
````diff
@@ -11,30 +11,13 @@
 
 The prefix structure for data in R2 is:
 
-`s3://indexed-xyz/<chain>/(decoded|raw)/logs/v1.2.0/partition_key=<XX>/dt=<YYYY>`
+`s3://indexed-xyz-wnam/<chain>/(decoded|raw)/logs/v2.0.0/dt=<yyyy-MM-dd>`
 
-For now, the only chain available is ethereum (without the angle brackets), though we will be expanding that as we go, if there's a chain you would like to see, shoot us an [email](mailto:support@goldsky.com) and we'll consider adding it.
+Right now [these chains](chains.md) are supported, but if you'd like to see other chains here, shoot us an [email](mailto:support@goldsky.com) and we'll consider adding it.
 
 You'll probably want the decoded files, as that's what this document describes.
 
-The partition key is a two digit hexadecimal value that's chosen based on the lower-cased md5 hash of the lower-cased smart contract address.
-
-For example:
-
-```javascript
-const crypto = require('crypto');
-const contract = '0x22c1f6050e56d2876009903609a2cc3fef83b415';
-const prefix = crypto
-  .createHash('md5')
-  .update(`${contract}`)
-  .digest('hex')
-  .slice(-2);
-
-console.log(prefix);
-// e4
-```
-
-Finally, the data is further partitioned by year. In most tools you can leave that part of the prefix off and download all years recursively, but to limit downloads and local storage, you may want to pull a smaller subset of the data to get started.
+The data is partitioned by day. In most tools you can leave that part of the prefix off and download all data recursively, but to limit downloads and local storage, you may want to pull a smaller subset of the data to get started.
````
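
As a concrete illustration of the v2.0.0 prefix, not part of the commit, this lists one chain's files for a single day, reusing the endpoint, profile, and bucket from the AWS CLI doc above:

```bash
# List the Parquet files for one chain and one day under the 2.0 layout.
aws s3 ls --endpoint-url https://ed5d915e0259fcddb2ab1ce5592040c3.r2.cloudflarestorage.com \
  --profile indexedxyz \
  s3://indexed-xyz-wnam/ethereum/raw/logs/v2.0.0/dt=2020-02-20/
```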

## Decoded Logs

````diff
@@ -44,7 +27,7 @@ The Parquet file scheme we're using is:
 
 | column_name       | column_type |
 | ----------------- | ----------- |
-| block_time        | BIGINT      |
+| block_timestamp   | BIGINT      |
 | address           | VARCHAR     |
 | event_signature   | VARCHAR     |
 | event_params      | VARCHAR[]   |
````

````diff
@@ -56,6 +39,7 @@ The Parquet file scheme we're using is:
 | data              | VARCHAR     |
 | topics            | VARCHAR     |
 | id                | VARCHAR     |
+| dt                | VARCHAR     |
 
 Here’s an example from one of the files, queried using [DuckDB](https://duckdb.org):
````

````diff
@@ -96,6 +80,7 @@ Some caveats to keep in mind:
 | timestamp         | BIGINT      |
 | transaction_count | BIGINT      |
 | base_fee_per_gas  | BIGINT      |
+| dt                | VARCHAR     |
````

## Raw Transactions

````diff
@@ -117,12 +102,13 @@ Some caveats to keep in mind:
 | max_priority_fee_per_gas | VARCHAR |
 | transaction_type         | BIGINT  |
 | block_timestamp          | BIGINT  |
+| dt                       | VARCHAR |
 
 ## Raw Logs
 
 | column_name       | column_type |
 | ----------------- | ----------- |
-| block_time        | BIGINT      |
+| block_timestamp   | BIGINT      |
 | block_number      | BIGINT      |
 | block_hash        | VARCHAR     |
 | transaction_hash  | VARCHAR     |
````

````diff
@@ -132,6 +118,7 @@ Some caveats to keep in mind:
 | data              | VARCHAR     |
 | topics            | VARCHAR     |
 | id                | VARCHAR     |
+| dt                | VARCHAR     |
````

## Arweave Raw Blocks

