LWMS ID 11389: HeatWave Lakehouse Quaterly update (#439)
* LWMS ID 11389: HeatWave Lakehouse Quaterly update

LWMS ID 11389: HeatWave Lakehouse Quaterly update
LWMS ID 11389: Updated Lab 6 to read column names from headers in data files with MySQL HeatWave Lakehouse

* LWMS ID 11389: Fixed Lab1 Title

LWMS ID 11389: Fixed Lab1 Title
plforacle committed Nov 8, 2023
1 parent 0b3b227 commit 9ba8f73
Showing 11 changed files with 48 additions and 129 deletions.
@@ -14,7 +14,7 @@ A set of files have been created for you to use in this workshop. You will creat

- An Oracle Trial or Paid Cloud Account
- Some Experience with MySQL Shell
- Completed Lab 3
- Completed Lab 5

## Task 1: Download and unzip Sample files

@@ -41,19 +41,19 @@ A set of files have been created for you to use in this workshop. You will creat
3. Download sample files

```bash
<copy>wget https://objectstorage.us-ashburn-1.oraclecloud.com/p/nnsIBVX1qztFmyAuwYIsZT2p7Z-tWBcuP9xqPCdND5LzRDIyBHYqv_8a26Z38Kqq/n/mysqlpm/b/plf_mysql_customer_orders/o/lakehouse/lakehouse-order.zip</copy>
<copy>wget https://objectstorage.us-ashburn-1.oraclecloud.com/p/11vOOD1Z73v4baInYk3QlKOOZWb1BMo4gIcogWrO0jS4GQ29yFaQxwW9Jl6ufOFm/n/mysqlpm/b/mysql_customer_orders/o/lakehouse/lakehouse-orders-v3.zip</copy>
```

4. Unzip lakehouse-order.zip file which will generate folder datafiles with 4 files
4. Unzip the lakehouse-orders-v3.zip file, which generates the data folder with 4 files

```bash
<copy>unzip lakehouse-order.zip</copy>
<copy>unzip lakehouse-orders-v3.zip</copy>
```

5. Go into the lakehouse/datafiles folder and list all of the files
5. Go into the lakehouse/data folder and list all of the files

```bash
<copy>cd ~/lakehouse/datafiles</copy>
<copy>cd ~/lakehouse/data</copy>
```
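
As an optional check after unzipping, the sketch below counts the files in the extracted folder. It assumes the archive unpacks to `~/lakehouse/data` and that the 4 files sit directly in that folder, as the step text states; `count_files` is a hypothetical helper, not part of the workshop files.

```bash
#!/usr/bin/env bash
# count_files is a hypothetical helper: it counts the regular files that sit
# directly inside a directory (subfolders and their contents are not counted).
count_files() {
  local dir="$1"
  find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Assumed location and expected count, both taken from the lab text.
data_dir="${HOME}/lakehouse/data"
expected=4

if [ -d "$data_dir" ]; then
  actual="$(count_files "$data_dir")"
  if [ "$actual" -eq "$expected" ]; then
    echo "OK: found $actual files in $data_dir"
  else
    echo "Unexpected file count in $data_dir: $actual (expected $expected)"
  fi
else
  echo "Directory $data_dir not found; unzip the archive first"
fi
```

If the archive layout differs (for example, files placed in subfolders), adjust `data_dir` or drop `-maxdepth 1` accordingly.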

```bash
@@ -93,10 +93,10 @@ A set of files have been created for you to use in this workshop. You will creat
## Task 3: Add files into the Bucket using the saved PAR URL
1. Go into the lakehouse/datafiles folder and list all of the files
1. Go into the lakehouse/data folder and list all of the files
```bash
<copy>cd ~/lakehouse/datafiles</copy>
<copy>cd ~/lakehouse/data</copy>
```
```bash
Binary file modified heatwave-lakehouse/load-csv-data/images/load-delivery-table.png
Binary file modified heatwave-lakehouse/load-csv-data/images/load-script-dryrun.png
Binary file modified heatwave-lakehouse/load-csv-data/images/set-table-example.png
157 changes: 38 additions & 119 deletions heatwave-lakehouse/load-csv-data/load-csv-data.md
@@ -26,27 +26,28 @@ We will now load the DELIVERY_ORDERS table from the Object Store. This is a larg

## Task 1: Create the PAR Link for the "delivery_order" files

1. To create a PAR URL
- Go to menu **Storage —> Buckets**
![Bucket menu](./images/storage-bucket-menu.png "storage bucket menu")
1. Create a PAR URL for all of the **order folder** objects with a prefix

- Select **lakehouse-files —> order** folder.
2. Select the first file —> **delivery-orders-1.csv** and click the three vertical dots.
3. Click on **Create Pre-Authenticated Request**
- a. From your OCI console, navigate to your lakehouse-files bucket in OCI.
- b. Select the folder —> order and click the three vertical dots.

![delivery-orders-1.csv 3 dots](./images/storage-create-par-orders.png "storage create par orders")
![Select folder](./images/storage-delivery-orders-folder.png "storage delivery order folder")

- c. Click on ‘Create Pre-Authenticated Request’
- d. Click to select the ‘Objects with prefix’ option under ‘Pre-Authenticated Request Target’.
- e. Leave the ‘Access Type’ option as-is: ‘Permit object reads on those with the specified prefix’.
- g. Click to select the ‘Enable Object Listing’ checkbox.
- h. Click the ‘Create Pre-Authenticated Request’ button.

4. The **Object** option will be pre-selected.
5. Keep **Permit object reads** selected
6. Kep the other options for **Access Type** unchanged.
7. Click the **Create Pre-Authenticated Request** button.
![Create Folder PAR](./images/storage-delivery-orders-folder-page.png "storage delivery order folder page")

![Create PAR](./images/storage-create-par-orders-page.png "storage create par orders page")
- i. Click the ‘Copy’ icon to copy the PAR URL.
- j. Save the generated PAR URL; you will need it later.
- k. You can test the URL out by pasting it in your browser. It should return output like this:

8. Click the **Copy** icon to copy the PAR URL.
![Copy PAR](./images/storage-create-par-orders-page-copy.png "storage create par orders page copy")
![List folder file](./images/storage-delivery-orders-folder-list.png "storage delivery order folder list")

9. Save the generated PAR URL; you will need it in the next task
2. Save the generated PAR URL; you will need it in the next task

## Task 2: Connect to your MySQL HeatWave system using Cloud Shell

@@ -92,7 +93,7 @@ We will now load the DELIVERY_ORDERS table from the Object Store. This is a larg

## Task 3: Run Autoload to infer the schema and estimate capacity required for the DELIVERY table in the Object Store

1. Part of the DELIVERY information for orders is contained in the delivery-orders-1.csv file in object store for which we have created a PAR URL in the earlier task. In a later task, we will load the other files for the DELIVER_ORDERS table into MySQL HeatWave. Enter the following commands one by one and hit Enter.
1. The DELIVERY information for orders is contained in the delivery-orders-*1*.csv files in object store for which we have created a PAR URL in the earlier task. Enter the following commands one by one and hit Enter.

2. This sets the schema we will load table data into. Don’t worry if this schema has not been created. Autopilot will generate the commands for you to create this schema if it doesn’t exist.

@@ -107,14 +108,16 @@ We will now load the DELIVERY_ORDERS table from the Object Store. This is a larg
"db_name": "mysql_customer_orders",
"tables": [{
"table_name": "delivery_orders",
"dialect":
{
"format": "csv",
"field_delimiter": "\\t",
"record_delimiter": "\\n"
},
"file": [{"par": "(PAR URL)"}]
}] }]';</copy>
"dialect": {
"format": "csv",
"field_delimiter": "\\t",
"record_delimiter": "\\r\\n",
"has_header": true,
"is_strict_mode": false},
"file": [{"par": "(PAR URL)"}]
}
]}
]';</copy>
```
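
Pasting the PAR URL into the statement by hand is easy to get wrong, so here is a minimal sketch of generating it from the shell. `build_dl_tables` is a hypothetical helper and the URL in the usage line is a placeholder. The opening `SET @dl_tables = '[{` line is assumed from the closing lines of the statement, since this hunk does not show it.

```bash
#!/usr/bin/env bash
# build_dl_tables is a hypothetical helper (not part of MySQL or MySQL Shell).
# It prints the SET @dl_tables statement with the given PAR URL filled in.
build_dl_tables() {
  local par_url="$1"
  # The heredoc is unquoted so that $par_url expands. In that mode the shell
  # reduces each pair of backslashes to one, so four backslashes below are
  # printed as two, matching the statement shown in the lab.
  cat <<EOF
SET @dl_tables = '[{
  "db_name": "mysql_customer_orders",
  "tables": [{
    "table_name": "delivery_orders",
    "dialect": {
      "format": "csv",
      "field_delimiter": "\\\\t",
      "record_delimiter": "\\\\r\\\\n",
      "has_header": true,
      "is_strict_mode": false},
    "file": [{"par": "$par_url"}]
  }
  ]}
]';
EOF
}

# Placeholder URL for illustration only; substitute the PAR URL you saved.
build_dl_tables "https://example.invalid/p/TOKEN/n/NAMESPACE/b/lakehouse-files/o/order/"
```

The printed statement can then be pasted into the MySQL Shell session in place of the hand-edited one.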

- It should look like the following example (be sure to include the PAR link inside a pair of quotes ("")):
@@ -124,7 +127,7 @@ We will now load the DELIVERY_ORDERS table from the Object Store. This is a larg
4. This command populates all the options needed by Autoload:

```bash
<copy>SET @options = JSON_OBJECT('mode', 'dryrun', 'policy', 'disable_unsupported_columns', 'external_tables', CAST(@dl_tables AS JSON));</copy>
<copy>SET @options = JSON_OBJECT('mode', 'dryrun', 'policy', 'disable_unsupported_columns', 'external_tables', CAST(@dl_tables AS JSON));</copy>
```

5. Run this Autoload command:
@@ -147,31 +150,17 @@ We will now load the DELIVERY_ORDERS table from the Object Store. This is a larg

![Dryrun script](./images/load-script-dryrun.png "load script dryrun")

8. The execution result conatins the SQL statements needed to create the table and then load this table data from the Object Store into HeatWave.
8. The execution result contains the SQL statements needed to create the table and then load this table data from the Object Store into HeatWave.

![Create Table](./images/create-delivery-order.png "create delivery order")

9. Copy the **CREATE TABLE** command from the results. It should look like the following example

![autopilot create table with no field name](./images/create-table-no-fieldname.png "autopilot create table with no field name")

10. Modify the **CREATE TABLE** command to replace the generic column names, such as **col\_1**, with descriptive column names. Use the following values:

- `col_1 : orders_delivery`
- `col_2 : order_id`
- `col_3 : customer_id`
- `col_4 : order_status`
- `col_5 : store_id`
- `col_6 : delivery_vendor_id`
- `col_7 : estimated_time_hours`

11. Your modified **CREATE TABLE** command should look like the following example:

![autopilot create table with field name](./images/create-table-fieldname.png "autopilot create table with field name")

12. Execute the modified **CREATE TABLE** command to create the delivery_orders table.
10. Execute the **CREATE TABLE** command to create the delivery_orders table.

13. The create command and result should look like this
11. The create command and result should look like this

![Delivery Table create](./images/create-delivery-table.png "create delivery table")

@@ -191,102 +180,32 @@ We will now load the DELIVERY_ORDERS table from the Object Store. This is a larg
<copy> ALTER TABLE `mysql_customer_orders`.`delivery_orders` SECONDARY_LOAD; </copy>
```

3. Once Autoload completes,point to the schema

```bash
<copy>use mysql_customer_orders</copy>
```

4. Check the number of rows loaded into the table.
3. Check the number of rows loaded into the table.

```bash
<copy>select count(*) from delivery_orders;</copy>
```

5. View a sample of the data in the table.
The DELIVERY table has 34 million rows.

4. View a sample of the data in the table.

```bash
<copy>select * from delivery_orders limit 5;</copy>
```

a. Join the delivery_orders table with other table in the schema
5. Join the delivery_orders table with other table in the schema

```bash
<copy> select o.* ,d.* from orders o
join delivery_orders d on o.order_id = d.order_id
where o.order_id = 93751524; </copy>
```
6. Your output for steps 2 thru 5 should look like this:
6. Output of steps 2 through 5
![Add data to table](./images/load-delivery-table.png "load delivery table")
7. Your DELIVERY table is now ready to be used in queries with other tables. In the next lab, we will see how to load additional data for the DELIVERY table from the Object Store using different options.
## Task 5: Load all data for DELIVERY table from Object Store
The DELIVERY table contains data loaded from one file so far. If new data arrives as more files, we can load those files too. The first option is by specifying a list of the files in the table definition. The second option is by specifying a prefix and have all files with that prefix be source files for the DELIVERY table. The third option is by specifying the entire folder in the Object Store to be the source file for the DELIVERY table.
We will use the second option which Loads the data by specifying a PAR URL for all objects with a prefix.
1. First unload the DELIVERY table from HeatWave:
```bash
<copy>ALTER TABLE delivery_orders SECONDARY_UNLOAD;</copy>
```
2. Create a PAR URL for all objects with a prefix
- a. From your OCI console, navigate to your lakehouse-files bucket in OCI.
- b. Select the folder —> order and click the three vertical dots.
![Select folder](./images/storage-delivery-orders-folder.png "storage delivery order folder")
- c. Click on ‘Create Pre-Authenticated Request’
- d. Click to select the ‘Objects with prefix’ option under ‘Pre-Authenticated Request Target’.
- e. Leave the ‘Access Type’ option as-is: ‘Permit object reads on those with the specified prefix’.
- g. Click to select the ‘Enable Object Listing’ checkbox.
- h. Click the ‘Create Pre-Authenticated Request’ button.
![Create Folder PAR](./images/storage-delivery-orders-folder-page.png "storage delivery order folder page")
- i. Click the ‘Copy’ icon to copy the PAR URL.
- j. Save the generated PAR URL; you will need it later.
- k. You can test the URL out by pasting it in your browser. It should return output like this:
![List folder file](./images/storage-delivery-orders-folder-list.png "storage delivery order folder list")
3. Since we have already created the table, we will not run Autopilot again. Instead we will simply go ahead and change the table definition to point it to this new PAR URL as the table source.
4. Copy this command and replace the **(PAR URL)** with the one you saved earlier. It will be the source for the DELIVERY table:
```bash
<copy>ALTER TABLE `mysql_customer_orders`.`delivery_orders` ENGINE_ATTRIBUTE='{"file": [{"par": "(PAR URL)"}], "dialect": {"format": "csv", "field_delimiter": "\\t", "record_delimiter": "\\n"}}'; </copy>
```
5. Your command should look like the following example. Now Execute your modified command
![autopilot alter table](./images/alter-table.png "autopilot alter table")
**Output**
![Add data to Table](./images/load-all-delivery-table.png "load all delivery table")
6. Load data into the DELIVERY table:
```bash
<copy>alter table delivery_orders secondary_load;</copy>
```
7. View the number of rows in the DELIVERY table:
```bash
<copy>select count(*) from delivery_orders;</copy>
```
The DELIVERY table now has 34 million rows.
8. Output of steps 6 and 7
![Add data to tabel](./images/load-final-delivery-table.png "load final delivery table")
7. Your DELIVERY table is now ready to be used in queries with other tables.
You may now **proceed to the next lab**
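
To cross-check the row count returned by `select count(*)` against the source files, the sketch below sums the data rows in a set of local CSV files. It assumes each file starts with one header line (as `has_header` implies) and ends with a newline; `count_data_rows` and the path in the usage line are hypothetical.

```bash
#!/usr/bin/env bash
# count_data_rows is a hypothetical helper: it sums the lines of the given
# files, subtracting one header line per non-empty file.
count_data_rows() {
  local total=0 lines file
  for file in "$@"; do
    # Skip arguments that are not regular files (e.g. an unmatched glob).
    [ -f "$file" ] || continue
    # wc -l counts newline characters, so a final row without a trailing
    # newline would not be counted.
    lines="$(wc -l < "$file" | tr -d ' ')"
    if [ "$lines" -gt 0 ]; then
      total=$((total + lines - 1))
    fi
  done
  echo "$total"
}

# Assumed local path to the delivery order files; adjust to your layout.
count_data_rows "${HOME}"/lakehouse/data/order/delivery-orders-*.csv
```

The total should match the table's row count if every file in the folder was loaded.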
2 changes: 1 addition & 1 deletion heatwave-lakehouse/workshops/freetier/manifest.json
@@ -15,7 +15,7 @@
},

{
"title": "Lab 1: Create Compartment, VCN and MySQL HeatWave DB System while loading DB Data",
"title": "Lab 1: Create Compartment, VCN and MySQL HeatWave DB System",
"filename": "../../create-heatwave-vcn-db/create-heatwave-vcn-db.md"
},

2 changes: 1 addition & 1 deletion heatwave-lakehouse/workshops/ocw23-freetier/manifest.json
@@ -15,7 +15,7 @@
},

{
"title": "Lab 1: Create Compartment, VCN and MySQL HeatWave DB System while loading DB Data",
"title": "Lab 1: Create Compartment, VCN and MySQL HeatWave DB ",
"filename": "../../create-heatwave-vcn-db/create-heatwave-vcn-db.md"
},
