graphameleon-ds: an RDF dataset for process mining on Web navigation traces. The dataset comes from the Graphameleon Web extension.
This dataset was built using the Graphameleon Web extension, an open-source plug-in that captures Web navigation traces and transforms them into an RDF graph for further exploration (e.g., process mining of navigation traces, Web browser and server behavior analysis, network topology analysis).
The RDF dataset implements the concepts of micro-activity and macro-activity (see below) on the basis of the UCO (Unified Cyber Ontology) vocabulary for the semantic representation of user activities. UCO is a popular community-developed ontology for the cybersecurity domain, covering a wide range of concepts such as agents, resources, and actions. Since UCO is also used in NORIA-O, an ontology for describing IT networks, Graphameleon data and network topology data can be combined in a single knowledge graph, making it possible to link the two and access additional contextual information.
The sub-folders contain the following data:
- exp-01: RDF triples generated by Graphameleon during the initial connection to a set of websites. We refer to this data as the "Website complexity clustering" experiment, in which we sought to understand how much the behavior of a website during a first connection contributes to building a footprint usable for later anomaly detection.
- exp-02: RDF triples generated by Graphameleon during three pre-defined Web navigation scenarios on a simulated online bookstore website. We refer to this data as the "Navigation trace classification" experiment, in which we sought to classify Web navigation traces as either normal or abnormal behavior.
Typical use of the provided data involves:
- Cloning the repository to your computer,
- Querying the data with SPARQL (e.g., using the Apache Jena CLI toolset),
- Analysing the user activities with higher-level tools (e.g., using the PM4Py Python package).
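As a rough illustration of the analysis step, the sketch below computes a directly-follows graph, a basic process-mining structure, from rows of (trace, activity, timestamp) such as a SPARQL SELECT query over the dataset might return. The row layout, the sample values, and the function name `directly_follows` are assumptions made for illustration only; they are not taken from the dataset, and PM4Py provides its own discovery functions for this purpose.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical rows, shaped like the result of a SPARQL SELECT that returns
# one (trace id, activity label, ISO timestamp) binding per activity.
# These values are invented and do not come from the dataset.
ROWS = [
    ("trace-1", "open_home", "2023-01-01T10:00:00"),
    ("trace-1", "search_book", "2023-01-01T10:00:05"),
    ("trace-1", "view_book", "2023-01-01T10:00:09"),
    ("trace-2", "open_home", "2023-01-01T11:00:00"),
    ("trace-2", "view_book", "2023-01-01T11:00:04"),
    ("trace-2", "search_book", "2023-01-01T11:00:02"),
]


def directly_follows(rows):
    """Count how often one activity is immediately followed by another within a trace."""
    # Group events by trace, parsing timestamps so they sort chronologically.
    traces = defaultdict(list)
    for trace_id, activity, timestamp in rows:
        traces[trace_id].append((datetime.fromisoformat(timestamp), activity))

    counts = Counter()
    for events in traces.values():
        events.sort()  # order each trace by timestamp
        activities = [activity for _, activity in events]
        # Pair each activity with its immediate successor.
        for current, following in zip(activities, activities[1:]):
            counts[(current, following)] += 1
    return counts


if __name__ == "__main__":
    for (source, target), n in sorted(directly_follows(ROWS).items()):
        print(f"{source} -> {target}: {n}")
```

In practice the rows would come from a SPARQL query against the files in exp-01 or exp-02, and the resulting event log could be passed to a process-mining library rather than counted by hand.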
Semantic representation of a macro-activity:
Semantic representation of a micro-activity:
If you use this dataset in a scientific publication, please cite:
Lionel Tailhardat, Benjamin Stach, Yoan Chabot, and Raphaël Troncy. 2024. Graphameleon: Relational Learning and Anomaly Detection on Web Navigation Traces Captured as Knowledge Graphs. In The Web Conference 2024, WWW '24, Singapore, May 13–17, 2024, Proceedings. https://doi.org/10.1145/3589335.3651447
BibTeX format:
@inproceedings{graphemeleon-2024,
title = {{Graphameleon: Relational Learning and Anomaly Detection on Web Navigation Traces Captured as Knowledge Graphs}},
author = {{Lionel Tailhardat} and {Benjamin Stach} and {Yoan Chabot} and {Rapha\"el Troncy}},
booktitle = {{The Web Conference 2024, WWW '24, Singapore, May 13--17, 2024, Proceedings}},
year = {2024},
doi = {10.1145/3589335.3651447}
}
Copyright (c) 2023, Orange. All rights reserved.