Persistent storage is:
- Large, cheap, relatively slow, accessed in blocks.
- Used for long-term storage of data.
Computational storage is:
- Small, expensive, fast, accessed by byte/word.
- Used for all analysis of data.
Property | RAM | HDD | SSD |
---|---|---|---|
Read latency | ~1 µs | ~10 ms | ~50 µs |
Write latency | ~1 µs | ~10 ms | ~900 µs |
Read unit | byte | block (e.g. 1 KB) | byte |
Writing | byte | write a block | write to an empty block |
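To show how large these latency gaps are in practice, the sketch below estimates total access time from the approximate figures in the table above. It assumes every access is independent (random) and charges one full latency per access. It also ignores the difference between byte and block granularity, so it is only a back-of-the-envelope model. The function name `io_time_ms` is hypothetical.

```python
# Approximate per-access latencies taken from the table above, in microseconds.
# These are order-of-magnitude figures, not measurements of any real device.
LATENCY_US = {
    "RAM": {"read": 1, "write": 1},
    "HDD": {"read": 10_000, "write": 10_000},
    "SSD": {"read": 50, "write": 900},
}


def io_time_ms(device: str, reads: int, writes: int) -> float:
    """Estimated time in milliseconds for independent reads and writes on a device."""
    lat = LATENCY_US[device]
    return (reads * lat["read"] + writes * lat["write"]) / 1000


for dev in ("RAM", "HDD", "SSD"):
    # 1000 reads plus 100 writes on each device
    print(dev, io_time_ms(dev, reads=1000, writes=100))
# RAM 1.1
# HDD 11000.0
# SSD 140.0
```

The same workload takes about 11 seconds on the HDD and about 1 ms in RAM. This is why the cost model later in these notes counts disk block transfers and treats in-memory work as negligible.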
Aims of storage management in DBMS:
- Provide view of data as collection of pages/tuples.
- Map from database objects (e.g. tables) to disk files.
- Manage transfer of data to/from disk storage.
- Use buffers to minimize disk/memory transfers.
- Interpret loaded data as tuples/records.
- Provide the basis for the file structures used by access methods.
Main data structures used in storage management:
- DB (handle on an authorized/opened database).
- Rel (handle on an opened relation).
- Page (memory buffer to hold the contents of a disk block).
- Tuple (memory holding the data values from one tuple).
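The four handles DB, Rel, Page and Tuple can be pictured as the minimal Python sketch below. Everything in it is hypothetical: the 8 KB `PAGE_SIZE`, the one-file-per-relation naming scheme, and the `open_rel` method are illustrative assumptions, not the layout of any particular DBMS. The sketch only shows how the handles relate to one another.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 8192  # assumed page size in bytes (hypothetical; real DBMSs vary)


@dataclass
class Page:
    """In-memory buffer holding the raw contents of one disk block."""
    data: bytearray = field(default_factory=lambda: bytearray(PAGE_SIZE))


@dataclass
class Tuple:
    """Data values from one tuple, already decoded from a page."""
    values: list


@dataclass
class Rel:
    """Handle on an opened relation: its name and (assumed) backing file."""
    name: str
    file_path: str


@dataclass
class DB:
    """Handle on an opened database, caching the relations opened through it."""
    name: str
    open_rels: dict = field(default_factory=dict)

    def open_rel(self, name: str) -> Rel:
        # Hypothetical naming scheme: one file per relation under the DB's directory.
        if name not in self.open_rels:
            self.open_rels[name] = Rel(name, f"{self.name}/{name}.dat")
        return self.open_rels[name]


db = DB("uni")
rel = db.open_rel("student")
print(rel.file_path)  # uni/student.dat
```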
Addressing in DBMSs:
- PageId = FileId + Offset identifies a block of data.
  - Where Offset gives the location of the block within the file.
- TupleId = PageId + Index identifies a single tuple.
  - Where Index gives the location of the tuple within the page.
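In these addressing formulas, "+" means combining two fields into one identifier, not arithmetic addition. The minimal sketch below models the identifiers as Python named tuples. It assumes that Offset is a block number, not a byte offset, and that blocks have a fixed hypothetical size of 8 KB. Under those assumptions, the byte position of a block within its file is the block number times the block size.

```python
from typing import NamedTuple

PAGE_SIZE = 8192  # assumed fixed block size in bytes (hypothetical)


class PageId(NamedTuple):
    file_id: int  # which file holds the block
    offset: int   # assumed: block number within that file, not a byte offset


class TupleId(NamedTuple):
    page_id: PageId
    index: int    # position (slot) of the tuple within the page


def byte_position(pid: PageId) -> int:
    """Byte offset in the file at which the block identified by pid starts."""
    return pid.offset * PAGE_SIZE


tid = TupleId(PageId(file_id=3, offset=5), index=2)
print(byte_position(tid.page_id))  # 40960
```

To fetch a tuple, the DBMS would use `file_id` to choose the file, `offset` to locate and read the block, and `index` to find the tuple inside the loaded page.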
Topics in storage management:
- Disks and files.
  - Performance issues and organization of disk files.
- Buffer management.
  - Using caching to improve DBMS throughput.
- Tuple/Page management.
  - How tuples are represented within disk pages.
- DB Object Management (Catalog).
  - How tables, views, functions, types, etc. are represented.
Important aspects in determining costs of DB operations:
- Data is always transferred to/from disk as whole blocks (pages).
- Cost of manipulating tuples in memory is negligible.
- Overall cost is determined primarily by the number of data blocks read/written.
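Under this cost model, estimating an operation's cost reduces to counting page transfers. The sketch below assumes a fixed number of tuples per page and no buffering, which deliberately ignores the complications that follow. With those assumptions, a relation of r tuples with c tuples per page occupies ⌈r/c⌉ pages. The function names are hypothetical.

```python
def pages_needed(num_tuples: int, tuples_per_page: int) -> int:
    """Pages occupied by a relation, assuming a fixed number of tuples per page."""
    # Integer ceiling division: a partially filled last page still counts.
    return (num_tuples + tuples_per_page - 1) // tuples_per_page


def scan_cost(num_tuples: int, tuples_per_page: int) -> int:
    """Full scan: every page is read once; in-memory tuple work is treated as free."""
    return pages_needed(num_tuples, tuples_per_page)


def copy_cost(num_tuples: int, tuples_per_page: int) -> int:
    """Copy a relation: every page is read once and written once."""
    b = pages_needed(num_tuples, tuples_per_page)
    return b + b


print(scan_cost(10_000, 40))  # 250
print(copy_cost(10_001, 40))  # 502 (one extra tuple needs a 251st page)
```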
Complicating factors in determining costs:
- Not all page accesses require disk access (buffer pool).
- Tuples typically have variable size, so the number of tuples per page is not fixed.
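The first complication (a page request can often be served from the buffer pool) is illustrated by the minimal sketch below. It assumes a pool of a fixed number of frames with least-recently-used (LRU) replacement. The notes do not specify a replacement policy, so LRU is an assumption here. The class, its methods, and the placeholder page contents are hypothetical. The sketch counts only the requests that actually go to disk.

```python
from collections import OrderedDict


class BufferPool:
    """Fixed number of frames with (assumed) LRU replacement; counts real disk reads."""

    def __init__(self, num_frames: int):
        self.num_frames = num_frames
        self.frames = OrderedDict()  # page id -> page contents, least recently used first
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)  # hit: no disk access needed
            return self.frames[page_id]
        self.disk_reads += 1                  # miss: the block must be read from disk
        if len(self.frames) >= self.num_frames:
            self.frames.popitem(last=False)   # evict the least recently used page
        self.frames[page_id] = f"<contents of page {page_id}>"  # stand-in for real I/O
        return self.frames[page_id]


pool = BufferPool(num_frames=3)
for pid in [1, 2, 3, 1, 2, 4, 1]:
    pool.get_page(pid)
print(pool.disk_reads)  # 4 disk reads for 7 page requests
```

In this trace, three of the seven requests hit pages already in the pool. This is why counting page *requests* overestimates the real I/O cost.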