Dominance-based queries on Apache Spark.

Skyline queries are a popular and powerful paradigm for extracting interesting objects from a multi-dimensional dataset. Given a set D of d-dimensional objects (or points), the skyline set of D is the set of Pareto-optimal, or undominated, points in D. A point p dominates a point q if p is at least as good as q in every dimension and strictly better in at least one.
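The definition above can be illustrated with a minimal plain-Python sketch (no Spark involved). It assumes that smaller values are better in every dimension, which this README does not specify; the names `dominates` and `naive_skyline` and the sample data are illustrative only and are not taken from the repository.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(p: Point, q: Point) -> bool:
    """Return True if p dominates q (assumption: smaller is better)."""
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def naive_skyline(points: List[Point]) -> List[Point]:
    """Quadratic-time skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical 2-d data: (price, distance to the beach), lower is better.
hotels = [(50, 8), (60, 3), (70, 5), (80, 2), (90, 9)]
print(naive_skyline(hotels))  # [(50, 8), (60, 3), (80, 2)]
```

Here (70, 5) is dominated by (60, 3) and (90, 9) is dominated by (50, 8), so neither belongs to the skyline.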

Algorithms

  1. Skyline query based on the Sort Filter Skyline (SFS) algorithm.

  2. Top-k dominating query based on the Skyline-based Top-k Dominating (STD) algorithm.

  3. Top-k dominating query on the skyline.
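The three queries listed above can be sketched on a single machine as follows. This is a simplified, non-distributed illustration and not the repository's Spark implementation. It assumes smaller values are better, uses the coordinate sum as the SFS sort key, and reads the third query as "rank only the skyline points by how many points they dominate"; all function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q (assumption: smaller is better in every dimension)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def sfs_skyline(points: List[Point]) -> List[Point]:
    """Sort Filter Skyline: presort by a monotone score, then filter in one pass.

    If q dominates p then sum(q) < sum(p), so a point can only be dominated
    by points that appear earlier in the sorted order.
    """
    skyline: List[Point] = []
    for p in sorted(points, key=sum):
        if not any(dominates(s, p) for s in skyline):
            skyline.append(p)
    return skyline


def dominance_score(p: Point, points: List[Point]) -> int:
    """Number of points in the dataset that p dominates."""
    return sum(1 for q in points if dominates(p, q))


def std_top_k(points: List[Point], k: int) -> List[Tuple[Point, int]]:
    """Simplified skyline-based top-k dominating query.

    The best remaining point always lies on the skyline of the remaining
    points, so only skyline points are scored in each round.
    """
    remaining = list(points)
    result: List[Tuple[Point, int]] = []
    while remaining and len(result) < k:
        candidates = sfs_skyline(remaining)
        best = max(candidates, key=lambda p: dominance_score(p, points))
        result.append((best, dominance_score(best, points)))
        remaining.remove(best)
    return result


def top_k_on_skyline(points: List[Point], k: int) -> List[Tuple[Point, int]]:
    """Rank only the skyline points by dominance score and keep the top k."""
    scored = [(p, dominance_score(p, points)) for p in sfs_skyline(points)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


data = [(1, 9), (2, 2), (3, 3), (4, 1), (5, 5), (6, 4), (9, 8)]
print(sfs_skyline(data))          # [(2, 2), (4, 1), (1, 9)]
print(std_top_k(data, 3))         # [((2, 2), 4), ((4, 1), 3), ((3, 3), 3)]
print(top_k_on_skyline(data, 2))  # [((2, 2), 4), ((4, 1), 3)]
```

The two top-k variants can differ: (3, 3) is not a skyline point, yet it dominates three points, so it appears in the result of `std_top_k` but can never appear in the result of `top_k_on_skyline`.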

Datasets

Synthetic datasets in four distributions are available for running the algorithms, with dimensionality ranging from 2 to 10.

  1. Correlated
  2. Uniform
  3. Normal
  4. Anti-correlated
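For readers who want comparable test data, the following is one possible way to generate points with these four shapes in the unit hypercube. It is only a sketch under stated assumptions: the noise levels, the `generate` and `pearson` helpers, and the construction of the anti-correlated case (points placed near the hyperplane where the coordinates sum to d/2) are illustrative choices, not necessarily how the datasets used with this project were produced.

```python
import random
from typing import List


def _clamp(x: float) -> float:
    """Keep a coordinate inside [0, 1]."""
    return min(1.0, max(0.0, x))


def generate(distribution: str, n: int, d: int, seed: int = 0) -> List[List[float]]:
    """Generate n points with d dimensions (hypothetical helper)."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        if distribution == "uniform":
            p = [rng.random() for _ in range(d)]
        elif distribution == "normal":
            p = [_clamp(rng.gauss(0.5, 0.15)) for _ in range(d)]
        elif distribution == "correlated":
            # All coordinates stay close to one shared base value.
            base = rng.random()
            p = [_clamp(base + rng.gauss(0.0, 0.05)) for _ in range(d)]
        elif distribution == "anti-correlated":
            # Shift a random point so its coordinates sum to roughly d / 2:
            # being good in one dimension forces being worse in another.
            raw = [rng.random() for _ in range(d)]
            shift = (d / 2 + rng.gauss(0.0, 0.05) - sum(raw)) / d
            p = [_clamp(x + shift) for x in raw]
        else:
            raise ValueError(f"unknown distribution: {distribution}")
        points.append(p)
    return points


def pearson(xs: List[float], ys: List[float]) -> float:
    """Sample Pearson correlation, used here only as a sanity check."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


for name in ("correlated", "uniform", "normal", "anti-correlated"):
    pts = generate(name, 1000, 2, seed=1)
    r = pearson([p[0] for p in pts], [p[1] for p in pts])
    print(f"{name}: correlation between the first two dimensions = {r:.2f}")
```

Anti-correlated data typically produces the largest skylines, since few points dominate each other, while correlated data produces the smallest, which makes the pair useful as worst-case and best-case inputs.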