Commit 47bdd4f: update for 2.0 (#61)
* update for 2.0

* update schema
radiofreejohn authored Jul 25, 2024
1 parent 4a56977 · commit 47bdd4f
Showing 4 changed files with 28 additions and 38 deletions.

**docs/chains.md** (9 changes: 6 additions & 3 deletions)
````diff
@@ -6,11 +6,14 @@ sidebar_position: 3
 
 Currently, the following chains are supported, the name in parenthesis is the path name in R2.
 
-- Ethereum (ethereum)
+- Arweave (arweave) - blocks and transactions are still backfilling
+- Ethereum (ethereum) - decoded logs are still backfilling
 - Gnosis (gnosis)
-- Base (base-goerli)
+- Arweave (arweave)
+- Linea (linea)
 - Optimism (optimism)
-- PGN (pgn)
+- zkSync (zksync) - decoded logs are still backfilling
+- Zora (zora)
 
 # Requesting Support
 
````

**docs/dataset/awscli.md** (8 changes: 4 additions & 4 deletions)
````diff
@@ -6,8 +6,8 @@
 
 ```
 [indexedxyz]
-aws_access_key_id = 43c31ff797ec2387177cabab6d18f15a
-aws_secret_access_key = afb354f05026f2512557922974e9dd2fdb21e5c2f5cbf929b35f0645fb284cf7
+aws_access_key_id = 094c97e8d9532a90e8b04a910e27e34b
+aws_secret_access_key = 9ecf4202fe4c67127e1ce6656626f094585e27494a51d57f457cfff410307ef4
 ```
````
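
These credentials belong in the AWS CLI's shared credentials file. A quick way to add the profile, not part of the commit, assuming a Unix shell and the default `~/.aws/credentials` location:

```bash
# Append the indexedxyz profile from the docs to the AWS CLI credentials file.
cat >> ~/.aws/credentials <<'EOF'
[indexedxyz]
aws_access_key_id = 094c97e8d9532a90e8b04a910e27e34b
aws_secret_access_key = 9ecf4202fe4c67127e1ce6656626f094585e27494a51d57f457cfff410307ef4
EOF
```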

## Downloading with the AWS CLI tools
````diff
@@ -17,9 +17,9 @@
 To retrieve the files using the AWS cli tools, you can then run the following command in a terminal with the provided credentials:
 
 ```bash
-$ aws s3 cp --endpoint-url https://data.indexed.xyz --profile indexedxyz s3://indexed-xyz/ethereum/decoded/logs/v1.2.0/partition_key=9d/ . --recursive
+$ aws s3 cp --endpoint-url https://ed5d915e0259fcddb2ab1ce5592040c3.r2.cloudflarestorage.com --profile indexedxyz s3://indexed-xyz-wnam/ethereum/raw/logs/v2.0.0/dt=2020-02-20/ . --recursive
 ```
 
 This will download the Parquet files into the current directory.
 
-> Keep in mind that since the partition keys are only two digits, the partitions will contain data for multiple smart contracts, not necessarily just the one that you’re looking for.
+> Keep in mind that since the data is partitioned by day, the download will contain data for multiple smart contracts, not necessarily just the one that you’re looking for.
````
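
Since a day's partition mixes logs from every contract, a post-download filter is useful. A minimal sketch, not part of the commit, assuming the DuckDB CLI is installed and using a placeholder contract address:

```bash
# Keep only the rows for one contract from the downloaded day of logs.
# The address is a placeholder — swap in the contract you care about.
duckdb -c "
COPY (
  SELECT *
  FROM read_parquet('*.parquet')
  WHERE address = lower('0x22c1f6050e56d2876009903609a2cc3fef83b415')
) TO 'filtered.parquet' (FORMAT PARQUET);
"
```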

**docs/dataset/rclone.md** (18 changes: 9 additions & 9 deletions)
````diff
@@ -8,20 +8,20 @@
 [r2]
 type = s3
 provider = Cloudflare
-access_key_id = 43c31ff797ec2387177cabab6d18f15a
-secret_access_key = afb354f05026f2512557922974e9dd2fdb21e5c2f5cbf929b35f0645fb284cf7
+access_key_id = 094c97e8d9532a90e8b04a910e27e34b
+secret_access_key = 9ecf4202fe4c67127e1ce6656626f094585e27494a51d57f457cfff410307ef4
 region = auto
-endpoint = https://data.indexed.xyz
+endpoint = https://ed5d915e0259fcddb2ab1ce5592040c3.r2.cloudflarestorage.com
 ```
 
 ## rclone configuration inputs
 
 Follow the instructions linked above. You will need three inputs specific to the indexed.xyz R2 bucket:
 
 ```
-access_key_id = 43c31ff797ec2387177cabab6d18f15a
-secret_access_key = afb354f05026f2512557922974e9dd2fdb21e5c2f5cbf929b35f0645fb284cf7
-endpoint = https://data.indexed.xyz
+access_key_id = 094c97e8d9532a90e8b04a910e27e34b
+secret_access_key = 9ecf4202fe4c67127e1ce6656626f094585e27494a51d57f457cfff410307ef4
+endpoint = https://ed5d915e0259fcddb2ab1ce5592040c3.r2.cloudflarestorage.com
 ```
````

## Downloading with rclone
````diff
@@ -31,9 +31,9 @@ endpoint = https://data.indexed.xyz
 To retrieve the files using the rclone cli tool, you can then run the following command in a terminal with the provided credentials:
 
 ```bash
-$ rclone copy r2://indexed-xyz/ethereum/decoded/logs/v1.2.0/partition_key=9d/ .
+$ rclone copy r2://indexed-xyz-wnam/ethereum/raw/logs/v2.0.0/dt=2020-02-20/ .
 ```
 
-This will download the Parquet files into the current directory.
+This will download the Parquet files for the specified date into the current directory.
 
-> Keep in mind that since the partition keys are only two digits, the partitions will contain data for multiple smart contracts, not necessarily just the one that you’re looking for.
+> Keep in mind that data partitioned by date contains all rows for that particular day. If you want to filter for specific contracts, then you can do it after downloading the data.
````
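
Before copying a whole day, it can help to see which `dt=` partitions exist. A sketch, not part of the commit, assuming the `r2` remote from the config above and rclone's usual `remote:bucket/path` form:

```bash
# List the dt=YYYY-MM-DD partitions available for Ethereum raw logs.
rclone lsd r2:indexed-xyz-wnam/ethereum/raw/logs/v2.0.0/
```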

**docs/schema.md** (31 changes: 9 additions & 22 deletions)
````diff
@@ -11,30 +11,13 @@
 
 The prefix structure for data in R2 is:
 
-`s3://indexed-xyz/<chain>/(decoded|raw)/logs/v1.2.0/partition_key=<XX>/dt=<YYYY>`
+`s3://indexed-xyz-wnam/<chain>/(decoded|raw)/logs/v2.0.0/dt=<yyyy-MM-dd>`
 
-For now, the only chain available is ethereum (without the angle brackets), though we will be expanding that as we go, if there's a chain you would like to see, shoot us an [email](mailto:support@goldsky.com) and we'll consider adding it.
+Right now [these chains](chains.md) are supported, but if you'd like to see other chains here, shoot us an [email](mailto:support@goldsky.com) and we'll consider adding it.
 
 You'll probably want the decoded files, as that's what this document describes.
 
-The partition key is a two digit hexadecimal value that's chosen based on the lower-cased md5 hash of the lower-cased smart contract address.
-
-For example:
-
-```javascript
-const crypto = require('crypto');
-const contract = '0x22c1f6050e56d2876009903609a2cc3fef83b415';
-const prefix = crypto
-  .createHash('md5')
-  .update(`${contract}`)
-  .digest('hex')
-  .slice(-2);
-
-console.log(prefix);
-// e4
-```
-
-Finally, the data is further partitioned by year. In most tools you can leave that part of the prefix off and download all years recursively, but to limit downloads and local storage, you may want to pull a smaller subset of the data to get started.
+The data is partitioned by day. In most tools you can leave that part of the prefix off and download all data recursively, but to limit downloads and local storage, you may want to pull a smaller subset of the data to get started.
````
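
As a concrete illustration of the v2.0.0 prefix, not part of the commit, this lists one chain's files for a single day, reusing the endpoint, profile, and bucket from the AWS CLI doc above:

```bash
# List the Parquet files for one chain and one day under the 2.0 layout.
aws s3 ls --endpoint-url https://ed5d915e0259fcddb2ab1ce5592040c3.r2.cloudflarestorage.com \
  --profile indexedxyz \
  s3://indexed-xyz-wnam/ethereum/raw/logs/v2.0.0/dt=2020-02-20/
```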

## Decoded Logs

````diff
@@ -44,7 +27,7 @@ The Parquet file scheme we're using is:
 
 | column_name       | column_type |
 | ----------------- | ----------- |
-| block_time        | BIGINT      |
+| block_timestamp   | BIGINT      |
 | address           | VARCHAR     |
 | event_signature   | VARCHAR     |
 | event_params      | VARCHAR[]   |
````

````diff
@@ -56,6 +39,7 @@ The Parquet file scheme we're using is:
 | data              | VARCHAR     |
 | topics            | VARCHAR     |
 | id                | VARCHAR     |
+| dt                | VARCHAR     |
 
 Here’s an example from one of the files, queried using [DuckDB](https://duckdb.org):
````

````diff
@@ -96,6 +80,7 @@ Some caveats to keep in mind:
 | timestamp         | BIGINT      |
 | transaction_count | BIGINT      |
 | base_fee_per_gas  | BIGINT      |
+| dt                | VARCHAR     |
````

## Raw Transactions

````diff
@@ -117,12 +102,13 @@ Some caveats to keep in mind:
 | max_priority_fee_per_gas | VARCHAR |
 | transaction_type         | BIGINT  |
 | block_timestamp          | BIGINT  |
+| dt                       | VARCHAR |
 
 ## Raw Logs
 
 | column_name       | column_type |
 | ----------------- | ----------- |
-| block_time        | BIGINT      |
+| block_timestamp   | BIGINT      |
 | block_number      | BIGINT      |
 | block_hash        | VARCHAR     |
 | transaction_hash  | VARCHAR     |
````

````diff
@@ -132,6 +118,7 @@ Some caveats to keep in mind:
 | data              | VARCHAR     |
 | topics            | VARCHAR     |
 | id                | VARCHAR     |
+| dt                | VARCHAR     |
````

## Arweave Raw Blocks

