Using Oligos Guide
Oligos is a profiling tool that extracts a Myriad data generator specification from the available meta-information of a reference database (you can think of the reference database as a gold standard that the Myriad data generator should approximate). To do that, Oligos collects schema information and database statistics (table and column cardinalities, most frequent values, and distribution statistics) and converts this information into a Myriad XML specification.
To use it, run the compile:oligos assistant tool task from your Myriad project. The only prerequisite for running the compile:oligos task is to configure the path to the JDBC driver for your database (currently, the only database that we support is DB2, but in the future we plan to add support for other widely used databases). There are two ways to configure the path to your JDBC driver:
- add the path to the $CLASSPATH environment variable, or
- set the path in the MYRIAD_OLIGOS_CP property in the .myriad-settings file of your Myriad project.
Once you have done that, you can start using Oligos from your Myriad project! The basic syntax of the compile:oligos task is as follows:
myriad-assistant compile:oligos -h [HOST] -P [PORT] -D [DATABASE] -u [USER] -p [SCHEMA_SPEC]
The task has the following (optional) parameters:
- -j,--jdbc Driver Vendor - the vendor of the JDBC driver (oracle or db2),
- -h,--host HOST - the hostname of your database,
- -P,--port PORT - the database port,
- -D,--database DATABASE - the name of the database,
- -u,--username USERNAME - the name of the database user, and
- -p - a boolean flag indicating that the specified database user requires a password for authentication.
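As a rough illustration of how these parameters fit together, the following Python sketch assembles the argument list for a compile:oligos call. The function name oligos_argv and its keyword arguments are hypothetical helpers, not part of Myriad. The sketch assumes that -p takes no value and that the schema specification is passed as the last, positional argument, as the basic syntax above suggests.

```python
def oligos_argv(host, port, database, username, schema_spec,
                vendor="db2", needs_password=True):
    """Assemble the argument list for a compile:oligos call.

    Hypothetical helper: only the flags documented above are used.
    """
    argv = [
        "myriad-assistant", "compile:oligos",
        "-j", vendor,
        "-h", host,
        "-P", str(port),   # command-line arguments must be strings
        "-D", database,
        "-u", username,
    ]
    if needs_password:
        argv.append("-p")  # assumed to be a boolean flag without a value
    argv.append(schema_spec)  # assumed to be the trailing positional argument
    return argv


if __name__ == "__main__":
    print(oligos_argv("localhost", 60000, "TPCH", "DB2INST1", "DB2INST1"))
```

Passing the resulting list to subprocess.run avoids shell quoting problems with the parentheses in the schema specification.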
The SCHEMA_SPEC parameter is a sequence of schema specifications that defines which schemata, tables, and columns should be profiled by the Oligos task. It has the following syntax (given in EBNF, with SCHEMA_SEQUENCE as the start symbol):
SCHEMA_SEQUENCE = SCHEMA_DEF { "," SCHEMA_DEF }
SCHEMA_DEF = SCHEMA_ID [ "(" TABLE_SEQUENCE ")" ]
TABLE_SEQUENCE = TABLE_DEF { "," TABLE_DEF }
TABLE_DEF = TABLE_ID [ "(" COLUMN_SEQUENCE ")" ]
COLUMN_SEQUENCE = COLUMN_ID { "," COLUMN_ID }
A schema specification consists of at least one SCHEMA_ID, which is the name of the schema you want to profile. Optionally, a SCHEMA_ID is followed by a sequence of table definitions enclosed in parentheses and separated by commas. Each TABLE_DEF in turn contains a mandatory TABLE_ID and an optional sequence of COLUMN_IDs. Omitting the TABLE_SEQUENCE or COLUMN_SEQUENCE clause is interpreted as a wildcard to profile all tables (resp. columns).
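To make the grammar concrete, here is a minimal recursive-descent parser sketch in Python. It is not the parser Oligos itself uses. The function names are hypothetical, and since the grammar does not define identifiers lexically, the sketch assumes that identifiers consist of letters, digits, and underscores and that whitespace between tokens is ignored. A value of None in the result stands for the wildcard (all tables or all columns).

```python
import re

# Assumption: identifiers are letters, digits and underscores; the EBNF
# above does not define SCHEMA_ID, TABLE_ID or COLUMN_ID lexically.
_TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|[(),])")
_PUNCT = {"(", ")", ","}


def tokenize(spec):
    tokens, pos = [], 0
    spec = spec.rstrip()
    while pos < len(spec):
        match = _TOKEN.match(spec, pos)
        if match is None:
            raise ValueError(f"cannot tokenize {spec[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_schema_spec(spec):
    """Return {schema: None | {table: None | [columns]}}; None means 'all'."""
    tokens = tokenize(spec)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        # With expected=None, consume an identifier; otherwise that exact token.
        nonlocal pos
        tok = peek()
        ok = tok is not None and (
            tok == expected if expected is not None else tok not in _PUNCT
        )
        if not ok:
            raise ValueError(f"expected {expected or 'identifier'}, got {tok!r}")
        pos += 1
        return tok

    def sequence(item):
        # Implements the pattern  X { "," X }
        items = [item()]
        while peek() == ",":
            take(",")
            items.append(item())
        return items

    def definition(parse_children):
        # Implements  ID [ "(" children ")" ]
        name, children = take(), None
        if peek() == "(":
            take("(")
            children = parse_children()
            take(")")
        return name, children

    def table_def():
        return definition(lambda: sequence(take))

    def schema_def():
        return definition(lambda: dict(sequence(table_def)))

    result = dict(sequence(schema_def))
    if peek() is not None:
        raise ValueError(f"unexpected trailing token {peek()!r}")
    return result


if __name__ == "__main__":
    print(parse_schema_spec("DB2INST1(ORDERS(O_ORDERDATE,O_TOTALPRICE))"))
```

Each grammar rule maps to one small function, so the optional parenthesised parts of SCHEMA_DEF and TABLE_DEF share the same definition helper.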
Take a look at the following examples of concrete calls of the compile:oligos task. All of the examples use the schema of the TPC-H benchmark. The schema is as follows:
[Figure: TPC-H schema]
We assume that an IBM DB2 instance is running on localhost on port 60000, the name of the database is TPCH, and both the username and the default user schema are DB2INST1.
The first example will generate a data generator specification only for the O_ORDERDATE and O_TOTALPRICE columns from the ORDERS table:
myriad-assistant compile:oligos -j db2 -h localhost -P 60000 -D TPCH -u DB2INST1 -p "DB2INST1(ORDERS(O_ORDERDATE,O_TOTALPRICE))"
The next example will generate a specification for the ORDERS and CUSTOMER tables using all columns in these two tables:
myriad-assistant compile:oligos -j db2 -h localhost -P 60000 -D TPCH -u DB2INST1 -p "DB2INST1 (ORDERS, CUSTOMER)"
The third example gives the most general schema specification, where Oligos will consider all tables and all columns located in the DB2INST1 schema:
myriad-assistant compile:oligos -j db2 -h localhost -P 60000 -D TPCH -u DB2INST1 -p "DB2INST1"
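If the schema specifications are generated by a script rather than typed by hand, a small helper can render them from a nested structure. The following Python sketch is only an illustration. The function render_schema_spec and its input shape (a dict that maps each schema to None or to a dict of tables, where each table maps to None or to a list of columns, and None means "all") are assumptions made for this example, not part of Oligos.

```python
def render_schema_spec(schemas):
    """Render {schema: None | {table: None | [columns]}} as a SCHEMA_SPEC string.

    Hypothetical helper; None stands for the wildcard (all tables / all columns).
    """
    def render_table(table, columns):
        if columns is None:
            return table
        if not columns:
            # The grammar requires at least one COLUMN_ID inside parentheses.
            raise ValueError(f"empty column list for table {table!r}")
        return f"{table}({','.join(columns)})"

    def render_schema(schema, tables):
        if tables is None:
            return schema
        if not tables:
            # The grammar requires at least one TABLE_DEF inside parentheses.
            raise ValueError(f"empty table list for schema {schema!r}")
        inner = ",".join(render_table(t, c) for t, c in tables.items())
        return f"{schema}({inner})"

    return ",".join(render_schema(s, t) for s, t in schemas.items())


if __name__ == "__main__":
    # The three specifications used in the examples above (without spaces).
    print(render_schema_spec({"DB2INST1": {"ORDERS": ["O_ORDERDATE", "O_TOTALPRICE"]}}))
    print(render_schema_spec({"DB2INST1": {"ORDERS": None, "CUSTOMER": None}}))
    print(render_schema_spec({"DB2INST1": None}))
```

The helper emits no whitespace between tokens, so its output for the second example differs from the hand-written call only in spacing.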