How to import TPC-C data to TiDB

Currently, if you want to import the TPC-C dataset into a database via go-tpc, there are two ways:

  1. Use go-tpc prepare to load data into the database directly. This works for both MySQL and TiDB.
  2. Output the data to CSV files, then import the CSV files into TiDB with TiDB Lightning (TiDB only).

This document explains how to use both of the above ways to import data into TiDB. For simplicity, we will start a small TiDB cluster using tiup playground.

~ tiup playground
Starting /Users/yeya24/.tiup/components/playground/v0.0.9/playground
Playground Bootstrapping...
Starting /Users/yeya24/.tiup/components/pd/v4.0.0-rc.1/pd-server --name=pd-0 --data-dir=/Users/yeya24/.tiup/data/RyBAqGx/pd-0/data --peer-urls=http://127.0.0.1:2380 --advertise-peer-urls=http://127.0.0.1:2380 --client-urls=http://127.0.0.1:2379 --advertise-client-urls=http://127.0.0.1:2379 --log-file=/Users/yeya24/.tiup/data/RyBAqGx/pd-0/pd.log --initial-cluster=pd-0=http://127.0.0.1:2380
Starting /Users/yeya24/.tiup/components/tidb/v4.0.0-rc.1/tidb-server -P 4000 --store=tikv --host=127.0.0.1 --status=10080 --path=127.0.0.1:2379 --log-file=/Users/yeya24/.tiup/data/RyBAqGx/tidb-0/tidb.log
Starting /Users/yeya24/.tiup/components/tikv/v4.0.0-rc.1/tikv-server --addr=127.0.0.1:20160 --status-addr=127.0.0.1:20180 --pd=http://127.0.0.1:2379 --config=/Users/yeya24/.tiup/data/RyBAqGx/tikv-0/tikv.toml --data-dir=/Users/yeya24/.tiup/data/RyBAqGx/tikv-0/data --log-file=/Users/yeya24/.tiup/data/RyBAqGx/tikv-0/tikv.log
.....
CLUSTER START SUCCESSFULLY, Enjoy it ^-^
To connect TiDB: mysql --host 127.0.0.1 --port 4000 -u root
To view the dashboard: http://127.0.0.1:2379/dashboard

Load data directly

This way is easy and straightforward. However, it is slower than the CSV approach because it loads data by executing SQL INSERT statements against the database.

go-tpc tpcc prepare --warehouses 100 -D test -H 127.0.0.1 -P 4000 -T 10

Here the --warehouses flag controls the data size (the number of TPC-C warehouses), and the -T flag sets how many threads to use for loading data.
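
Once the load finishes, you can run a quick sanity check from the SQL side. This is just a sketch, assuming the connection settings printed by tiup playground above (user root, no password) and the default table names created by go-tpc:

mysql --host 127.0.0.1 --port 4000 -u root -D test -e "SELECT COUNT(*) FROM warehouse"

With --warehouses 100, the query should return 100, one row per warehouse.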

Using CSV files and TiDB Lightning

This method involves two steps:

  1. Export CSV files.
  2. Import the CSV files into TiDB using TiDB Lightning.

Export CSV files

go-tpc tpcc prepare --warehouses 100 -D test -H 127.0.0.1 -P 4000 -T 16 --output-type csv --output-dir csv/

To export CSV files, we need to specify two extra flags: --output-type sets the type of file to export, and --output-dir sets the directory where the exported CSV files are written.

ls csv/
test.customer.0.csv    test.district.14.csv   test.history.6.csv     test.order_line.1.csv  test.orders.15.csv     test.stock.7.csv
test.customer.1.csv    test.district.15.csv   test.history.7.csv     test.order_line.10.csv test.orders.2.csv      test.stock.8.csv
test.customer.10.csv   test.district.2.csv    test.history.8.csv     test.order_line.11.csv test.orders.3.csv      test.stock.9.csv
test.customer.11.csv   test.district.3.csv    test.history.9.csv     test.order_line.12.csv test.orders.4.csv      test.warehouse.0.csv
test.customer.12.csv   test.district.4.csv    test.item.0.csv        test.order_line.13.csv test.orders.5.csv      test.warehouse.1.csv
test.customer.13.csv   test.district.5.csv    test.new_order.0.csv   test.order_line.14.csv test.orders.6.csv      test.warehouse.10.csv
test.customer.14.csv   test.district.6.csv    test.new_order.1.csv   test.order_line.15.csv test.orders.7.csv      test.warehouse.11.csv
test.customer.15.csv   test.district.7.csv    test.new_order.10.csv  test.order_line.2.csv  test.orders.8.csv      test.warehouse.12.csv
test.customer.2.csv    test.district.8.csv    test.new_order.11.csv  test.order_line.3.csv  test.orders.9.csv      test.warehouse.13.csv
test.customer.3.csv    test.district.9.csv    test.new_order.12.csv  test.order_line.4.csv  test.stock.0.csv       test.warehouse.14.csv
test.customer.4.csv    test.history.0.csv     test.new_order.13.csv  test.order_line.5.csv  test.stock.1.csv       test.warehouse.15.csv
test.customer.5.csv    test.history.1.csv     test.new_order.14.csv  test.order_line.6.csv  test.stock.10.csv      test.warehouse.2.csv
test.customer.6.csv    test.history.10.csv    test.new_order.15.csv  test.order_line.7.csv  test.stock.11.csv      test.warehouse.3.csv
test.customer.7.csv    test.history.11.csv    test.new_order.2.csv   test.order_line.8.csv  test.stock.12.csv      test.warehouse.4.csv
test.customer.8.csv    test.history.12.csv    test.new_order.3.csv   test.order_line.9.csv  test.stock.13.csv      test.warehouse.5.csv
test.customer.9.csv    test.history.13.csv    test.new_order.4.csv   test.orders.0.csv      test.stock.14.csv      test.warehouse.6.csv
test.district.0.csv    test.history.14.csv    test.new_order.5.csv   test.orders.1.csv      test.stock.15.csv      test.warehouse.7.csv
test.district.1.csv    test.history.15.csv    test.new_order.6.csv   test.orders.10.csv     test.stock.2.csv       test.warehouse.8.csv
test.district.10.csv   test.history.2.csv     test.new_order.7.csv   test.orders.11.csv     test.stock.3.csv       test.warehouse.9.csv
test.district.11.csv   test.history.3.csv     test.new_order.8.csv   test.orders.12.csv     test.stock.4.csv
test.district.12.csv   test.history.4.csv     test.new_order.9.csv   test.orders.13.csv     test.stock.5.csv
test.district.13.csv   test.history.5.csv     test.order_line.0.csv  test.orders.14.csv     test.stock.6.csv

After the export finishes, we can inspect the files in the directory. All CSV files follow the naming scheme <db name>.<table name>.<thread number>.csv.

Please note that no matter how many threads you use, there is only one test.item.0.csv for the item table, because that table is generated by a single thread.
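
As another rough sanity check, the TPC-C item table always contains 100,000 rows regardless of the warehouse count, so the single item file should have 100,000 lines (assuming one CSV line per row, i.e. no embedded newlines in any field):

wc -l csv/test.item.0.csv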

Import data using TiDB Lightning

Since TiUP does not support Lightning so far, we have to download the binary or build it from source. For simplicity, this document does not cover that part; please refer to the TiDB Lightning documentation for more details.

With the Lightning binary in place, importing the data is easy. We also provide an example config for Lightning, so you can just execute the command below:

lightning -c tidb-lightning.toml

Please note the following about the example config; a minimal sketch of such a config is shown after this list:

  1. The example config uses tidb as the Lightning backend instead of importer. Please update the related configs if you are using the importer backend.
  2. Change the data-source-dir field to the CSV directory you used in the previous step.
  3. Update the [tidb] section if your cluster uses a different setup.
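
For reference, here is roughly what such a tidb-lightning.toml could look like for the CSV files produced above. This is only a sketch, not the exact config shipped with go-tpc: the [mydumper.csv] settings in particular are assumptions about the CSV format, and the paths, ports, and addresses should be adjusted to your environment.

# tidb-lightning.toml -- minimal sketch, adjust to your setup
[lightning]
level = "info"
file = "tidb-lightning.log"

[tikv-importer]
# Import by executing SQL against TiDB (the "tidb" backend).
backend = "tidb"

[mydumper]
# Directory that holds the <db>.<table>.<n>.csv files exported by go-tpc.
data-source-dir = "./csv"

[mydumper.csv]
# Assumed CSV format; verify against the files go-tpc actually produced.
separator = ','
delimiter = '"'
header = false

[tidb]
# Connection settings for the tiup playground cluster started above.
host = "127.0.0.1"
port = 4000
user = "root"
password = ""
status-port = 10080
pd-addr = "127.0.0.1:2379"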

To check the status of the import process, look at tidb-lightning.log. If you see log lines like the ones below, the import finished successfully:

[2020/05/03 12:51:25.004 -04:00] [INFO] [backend.go:265] ["engine close start"] [engineTag=`test`.`stock`:-1] [engineUUID=5565f8ab-07bc-5dfb-a64c-717945dd3a64]
[2020/05/03 12:51:25.004 -04:00] [INFO] [backend.go:267] ["engine close completed"] [engineTag=`test`.`stock`:-1] [engineUUID=5565f8ab-07bc-5dfb-a64c-717945dd3a64] [takeTime=210ns] []
[2020/05/03 12:51:25.004 -04:00] [INFO] [restore.go:1422] ["import and cleanup engine start"] [engineTag=`test`.`stock`:-1] [engineUUID=5565f8ab-07bc-5dfb-a64c-717945dd3a64]
[2020/05/03 12:51:25.004 -04:00] [INFO] [backend.go:279] ["import start"] [engineTag=`test`.`stock`:-1] [engineUUID=5565f8ab-07bc-5dfb-a64c-717945dd3a64] [retryCnt=0]
[2020/05/03 12:51:25.004 -04:00] [INFO] [backend.go:282] ["import completed"] [engineTag=`test`.`stock`:-1] [engineUUID=5565f8ab-07bc-5dfb-a64c-717945dd3a64] [retryCnt=0] [takeTime=304ns] []
[2020/05/03 12:51:25.004 -04:00] [INFO] [backend.go:294] ["cleanup start"] [engineTag=`test`.`stock`:-1] [engineUUID=5565f8ab-07bc-5dfb-a64c-717945dd3a64]
[2020/05/03 12:51:25.004 -04:00] [INFO] [backend.go:296] ["cleanup completed"] [engineTag=`test`.`stock`:-1] [engineUUID=5565f8ab-07bc-5dfb-a64c-717945dd3a64] [takeTime=189ns] []
[2020/05/03 12:51:25.004 -04:00] [INFO] [restore.go:1429] ["import and cleanup engine completed"] [engineTag=`test`.`stock`:-1] [engineUUID=5565f8ab-07bc-5dfb-a64c-717945dd3a64] [takeTime=54.46µs] []
[2020/05/03 12:51:25.004 -04:00] [INFO] [restore.go:602] ["restore table completed"] [table=`test`.`stock`] [takeTime=29.720372962s] []
[2020/05/03 12:51:25.004 -04:00] [INFO] [restore.go:697] ["restore all tables data completed"] [takeTime=38.919570374s] []
[2020/05/03 12:51:25.004 -04:00] [INFO] [restore.go:475] ["everything imported, stopping periodic actions"]
[2020/05/03 12:51:25.004 -04:00] [INFO] [restore.go:1072] ["skip full compaction"]
[2020/05/03 12:51:25.014 -04:00] [INFO] [restore.go:1241] ["clean checkpoints start"] [keepAfterSuccess=false] [taskID=1588524646072446000]
[2020/05/03 12:51:25.014 -04:00] [INFO] [restore.go:1248] ["clean checkpoints completed"] [keepAfterSuccess=false] [taskID=1588524646072446000] [takeTime=152.037µs] []
[2020/05/03 12:51:25.014 -04:00] [INFO] [restore.go:283] ["the whole procedure completed"] [takeTime=38.936017956s] []
[2020/05/03 12:51:25.014 -04:00] [INFO] [main.go:77] ["tidb lightning exit"]
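
After Lightning reports that the whole procedure completed, you may also want to verify the imported data. One option is go-tpc's built-in consistency check; the exact flags may differ between go-tpc versions, so treat this as a sketch:

go-tpc tpcc check --warehouses 100 -D test -H 127.0.0.1 -P 4000

This runs the TPC-C consistency checks against the imported tables, which is a stronger signal than row counts alone.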