
Using Oligos Guide

carabolic edited this page Oct 15, 2014 · 9 revisions

About Oligos

Oligos is a profiling tool that extracts a Myriad data generator specification from the available meta-information of a reference database (you can think of the reference database as a gold standard that the Myriad data generator should approximate). To do that, Oligos collects schema information as well as database statistics (table and column cardinality, most frequent values, and distribution statistics) and converts this information into a Myriad XML specification.

Using Oligos from the Myriad Assistant Tool

To use Oligos, run the compile:oligos assistant tool task from your Myriad project. The only prerequisite for running the compile:oligos task is to configure the path to the JDBC driver for your database (currently, the only database that we support is DB2, but in the future we plan to add support for other widely used databases). There are two ways to configure the path to your JDBC driver:

  • add the path to the $CLASSPATH environment variable, or
  • set the MYRIAD_OLIGOS_CP property in the .myriad-settings file of your Myriad project to the path.

Once you have done that, you can start using Oligos from your Myriad project! The basic syntax of the compile:oligos task is as follows:

myriad-assistant compile:oligos -h [HOST] -P [PORT] -D [DATABASE] -u [USER] -p [SCHEMA_SPEC]

The task has the following (optional) parameters:

  • -j,--jdbc Driver Vendor - the vendor of the JDBC driver (oracle or db2)
  • -h,--host HOST - the hostname of your database,
  • -P,--port PORT - the database port,
  • -D,--database DATABASE - the name of the database,
  • -u,--username USERNAME - the name of the database user, and
  • -p - a boolean flag indicating that the specified database user requires a password for authentication.

The SCHEMA_SPEC parameter is a sequence of schema specifications that defines which schemata, tables, and columns should be profiled by the Oligos task. The SCHEMA_SPEC parameter has the following syntax (given in EBNF):

SCHEMA_SEQUENCE   = SCHEMA_DEF { "," SCHEMA_DEF }
SCHEMA_DEF        = SCHEMA_ID [ "(" TABLE_SEQUENCE ")" ]
TABLE_SEQUENCE    = TABLE_DEF { "," TABLE_DEF }
TABLE_DEF         = TABLE_ID [ "(" COLUMN_SEQUENCE ")" ]
COLUMN_SEQUENCE   = COLUMN_ID { "," COLUMN_ID }

A schema specification consists of at least one SCHEMA_DEF. Each SCHEMA_DEF starts with a SCHEMA_ID, which is the name of the schema you want to profile. Optionally, the SCHEMA_ID is followed by a sequence of table definitions enclosed in parentheses and separated by commas. Each TABLE_DEF in turn contains a mandatory TABLE_ID and an optional sequence of COLUMN_IDs, also enclosed in parentheses and separated by commas. Omitting the TABLE_SEQUENCE or COLUMN_SEQUENCE clause is interpreted as a wildcard to profile all tables (resp. columns).
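To make the grammar concrete, here is a minimal recursive-descent sketch in Python that parses a SCHEMA_SPEC string into nested dictionaries. It is not part of Oligos; the function names are hypothetical. Two things are assumptions, because the grammar above does not define them: identifiers are taken to consist of letters, digits, and underscores, and whitespace between tokens is ignored. A value of None stands for the wildcard (all tables, resp. all columns).

```python
import re

# Assumption: the grammar does not define the identifier alphabet, so we
# accept letters, digits, and underscores. Punctuation is "(", ")", ",".
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|[(),]")
PUNCTUATION = {"(", ")", ","}


def tokenize(spec):
    """Split a SCHEMA_SPEC string into identifiers and punctuation."""
    tokens, pos = [], 0
    while pos < len(spec):
        if spec[pos].isspace():  # assumption: whitespace is insignificant
            pos += 1
            continue
        match = TOKEN.match(spec, pos)
        if match is None:
            raise ValueError(f"unexpected character {spec[pos]!r} at {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


def parse_schema_spec(spec):
    """Return {schema: None | {table: None | [columns]}}; None means 'all'."""
    tokens = tokenize(spec)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        # With expected=None an identifier is required, otherwise that token.
        nonlocal pos
        token = peek()
        ok = token is not None and (
            token == expected if expected else token not in PUNCTUATION
        )
        if not ok:
            raise ValueError(f"expected {expected or 'identifier'}, got {token!r}")
        pos += 1
        return token

    def sequence(item):
        # X { "," X }
        items = [item()]
        while peek() == ",":
            take(",")
            items.append(item())
        return items

    def optional_group(item):
        # [ "(" X { "," X } ")" ]
        if peek() != "(":
            return None
        take("(")
        items = sequence(item)
        take(")")
        return items

    def table_def():
        # TABLE_DEF = TABLE_ID [ "(" COLUMN_SEQUENCE ")" ]
        return take(), optional_group(take)

    def schema_def():
        # SCHEMA_DEF = SCHEMA_ID [ "(" TABLE_SEQUENCE ")" ]
        name = take()
        tables = optional_group(table_def)
        return name, None if tables is None else dict(tables)

    schemas = dict(sequence(schema_def))
    if peek() is not None:
        raise ValueError(f"unexpected trailing token {peek()!r}")
    return schemas
```

For instance, `parse_schema_spec("S(T(A,B),U)")` yields `{"S": {"T": ["A", "B"], "U": None}}`: table T is restricted to columns A and B, while U is profiled with all of its columns.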

Examples

Take a look at the following examples of some concrete calls of the compile:oligos task. All of the examples use the schema of the TPC-H benchmark. The schema is as follows:

![TPCH Schema](https://www.ki.informatik.hu-berlin.de/wbi/teaching/archive/sose04/fosem/tpch-schema.png/image)

We assume that the database is running IBM DB2 on localhost on port 60000, the name of the database is TPCH, and both the username and the default user schema are DB2INST1.

The first example will generate a data generator specification only for the O_ORDERDATE and O_TOTALPRICE columns from the ORDERS table:

myriad-assistant compile:oligos -j db2 -h localhost -P 60000 -D TPCH -u DB2INST1 -p "DB2INST1(ORDERS(O_ORDERDATE,O_TOTALPRICE))"

The next example will generate a specification for the ORDERS and CUSTOMER tables using all columns in these two tables:

myriad-assistant compile:oligos -j db2 -h localhost -P 60000 -D TPCH -u DB2INST1 -p "DB2INST1 (ORDERS, CUSTOMER)"

The third example gives the most general schema specification, where Oligos will consider all tables and all columns located in the DB2INST1 schema:

myriad-assistant compile:oligos -j db2 -h localhost -P 60000 -D TPCH -u DB2INST1 -p "DB2INST1"
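The three calls above differ only in their SCHEMA_SPEC argument. As a sketch, the following Python helpers (hypothetical names, not part of Myriad or Oligos) render a SCHEMA_SPEC string from nested dictionaries and assemble the corresponding argument list. The default connection values are the ones assumed in this section; None again stands for "all tables" or "all columns".

```python
import shlex


def render_table(table, columns):
    """TABLE_DEF: a table name, optionally restricted to some columns."""
    if columns is None:
        return table
    if not columns:
        raise ValueError("an empty column list is not valid in the grammar")
    return f"{table}({','.join(columns)})"


def render_schema(schema, tables):
    """SCHEMA_DEF: a schema name, optionally restricted to some tables."""
    if tables is None:
        return schema
    if not tables:
        raise ValueError("an empty table list is not valid in the grammar")
    inner = ",".join(render_table(t, cols) for t, cols in tables.items())
    return f"{schema}({inner})"


def render_schema_spec(schemas):
    """SCHEMA_SEQUENCE: one or more schema definitions separated by commas."""
    if not schemas:
        raise ValueError("at least one schema is required")
    return ",".join(render_schema(s, tables) for s, tables in schemas.items())


def oligos_command(schemas, vendor="db2", host="localhost", port=60000,
                   database="TPCH", user="DB2INST1"):
    """Build the argument list for the compile:oligos task."""
    return [
        "myriad-assistant", "compile:oligos",
        "-j", vendor, "-h", host, "-P", str(port),
        "-D", database, "-u", user, "-p",
        render_schema_spec(schemas),
    ]


if __name__ == "__main__":
    first = {"DB2INST1": {"ORDERS": ["O_ORDERDATE", "O_TOTALPRICE"]}}
    second = {"DB2INST1": {"ORDERS": None, "CUSTOMER": None}}
    third = {"DB2INST1": None}
    for schemas in (first, second, third):
        # shlex.join quotes the specification where the shell requires it.
        print(shlex.join(oligos_command(schemas)))
```

Passing the argument list to the process as separate items, rather than as one concatenated string, avoids having to quote the parentheses in the specification by hand.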