DBMS core = relational engine, with implementations of:
- Selection, projection, join, set operations.
- Scanning, sorting, grouping, aggregation.
Relational operations are supported by low level functions:
- Relation openRel(db, name)
:
- Gets handle on relation name
in database db
.
- ``Page request_page(rel, pid)``:
- Get page ``pid`` from relation ``rel``, return buffer containing page.
- ``Record get_record(buf, tid)``:
- Return record ``tid`` from page ``buf``.
- ``Tuple mkTuple(rel, rec)``:
- Convert record rec to a tuple, based on ``rel`` schema.
Example of using low level functions:
// scan a relation Employees
Page p; // current page
Tuple t; // current tuple
Relation r = relOpen(db, "Employees");
for (int i = 0; i < nPages(r); i++) {
p = request_page(rel,i);
for (int j = 0; j < nRecs(p); j++)
t = mkTuple(r, get_record(p, j));
... process tuple t ...
}
}
There are two dimensions of variations to relational operations:
- Which relation operation (e.g. Sel, Proj, Join, Sort, ...).
- Which access-method (e.g. file structure: heap, indexed, hashed).
Each query method involves an operator and a file structure. Examples:
- Primary-key selection on a hashed file.
- Primary-key selection on indexed file.
- Join on ordered heap file.
- Join on hashed file.
- Two-dimensional range query on R-tree indexed file.
Two main measures of relational operation cost:
- Time Cost: total time taken to execute method.
- Affected by many factors such as:
- Speed of IO devices.
- Load on machine.
- We do not consider time cost in our analyses.
- Affected by many factors such as:
- Page Cost: number of pages read and/or written.
- Better for comparing methods:
- Identifies workload imposed by method.
- However is clearly affected by buffering.
- Better for comparing methods:
Quantities in analysis:
- A relation is a set of
r
tuples with average tuple sizeR
bytes. - Tuples are stored in
b
data pages on disk. - Each page has size
B
bytes and contains up toc
tuples. - Tuples which answer query
q
are contained inb_q
pages. - Data is transferred between disk and memory in whole pages.
- Cost of disk and memory transfer
T_r/w
is very high.