Aggregators to improve data access across many pods, a social media perspective #99

maartyman · 2023-02-23T09:11:02Z

Pitch

This challenge is an extension of Challenge 24. Applications that need to aggregate data across many pods can suffer from slow response times, because retrieving and processing data from a large number of pods adds latency. This is typically the case in a social media scenario, where each user's timeline is curated based on the activities of their contacts. Computing these timelines at the moment users open their social media application is typically not feasible due to latency constraints. Therefore, the timelines should be precomputed as a form of aggregation. The SolidBench.js benchmark will be used to simulate data pods with social media data.

Desired solution

This challenge has the same desired solution as Challenge 24, with one change: instead of re-evaluating the query when resources change, the aggregator should use incremental query evaluation techniques. To complete this challenge, Comunica should be altered so that it can guard the resources and calculate the changes in the query results based on the changes in the resources.
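To make the difference with re-evaluation concrete, here is a minimal sketch of one incremental operator, a join that keeps its inputs indexed and only emits the result changes caused by a single input change. The `Binding`, `Delta` and `IncrementalJoin` names are hypothetical and chosen for illustration; they are not part of Comunica's API, and a real implementation would also need to handle multiple join variables and duplicates more carefully.

```typescript
// Hypothetical types for illustration; this is not Comunica's API.
type Binding = Record<string, string>;
interface Delta {
  binding: Binding;
  added: boolean; // true = addition, false = deletion
}

// Two bindings are equal when they have the same variables and values.
function sameBinding(a: Binding, b: Binding): boolean {
  const keys = Object.keys(a);
  return keys.length === Object.keys(b).length && keys.every(k => a[k] === b[k]);
}

// Incremental join on a single shared variable.
class IncrementalJoin {
  private readonly joinVar: string;
  private readonly left = new Map<string, Binding[]>();
  private readonly right = new Map<string, Binding[]>();

  constructor(joinVar: string) {
    this.joinVar = joinVar;
  }

  // Process one change on one input and return the changes in the join output.
  update(side: 'left' | 'right', delta: Delta): Delta[] {
    const own = side === 'left' ? this.left : this.right;
    const other = side === 'left' ? this.right : this.left;
    const key = delta.binding[this.joinVar] ?? '';
    const bucket = own.get(key) ?? [];
    if (delta.added) {
      bucket.push(delta.binding);
      own.set(key, bucket);
    } else {
      const index = bucket.findIndex(b => sameBinding(b, delta.binding));
      if (index === -1) {
        return []; // deleting a binding that was never added has no effect
      }
      bucket.splice(index, 1);
    }
    // Only the matches of the changed binding are touched, not the full result.
    return (other.get(key) ?? []).map(match => ({
      binding: { ...match, ...delta.binding },
      added: delta.added,
    }));
  }
}

// Usage: a post and a name for the same person produce one added result.
const join = new IncrementalJoin('person');
join.update('left', { binding: { person: 'alice', post: 'post1' }, added: true });
const joined = join.update('right', { binding: { person: 'alice', name: 'Alice' }, added: true });
// joined holds one added binding: { person: 'alice', post: 'post1', name: 'Alice' }
```

The cost of an update depends on the number of matching bindings for the changed key, not on the total size of the inputs, which is where the expected latency gain over query re-execution comes from.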

Guarding means checking the resources for changes. This can be done by pushing (websockets v0.1) or pulling (polling). When a resource changes, Comunica should determine the added and deleted triples. These added and deleted triples can then be used in the query engine to determine the changes in the query result. This can be done with incremental query techniques and incremental SPARQL operators, which calculate the changes in an operator's result based on the changes in its input.
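As an illustration of the guarding step, the sketch below compares two snapshots of a resource and derives the added and deleted triples. It assumes the resource has already been fetched and parsed into plain `Triple` objects (for example after each poll or after each notification); `Triple`, `diffTriples` and `ResourceGuard` are hypothetical names, and blank nodes are not handled, since they cannot be compared by label across two parses.

```typescript
// Hypothetical triple representation: terms are compared as plain strings.
type Triple = { subject: string; predicate: string; object: string };

// Serialize a triple so it can be used as a Set key.
const tripleKey = (t: Triple): string => JSON.stringify([t.subject, t.predicate, t.object]);

// Compare two snapshots of a resource and return the changes between them.
function diffTriples(previous: Triple[], current: Triple[]): { added: Triple[]; deleted: Triple[] } {
  const previousKeys = new Set(previous.map(tripleKey));
  const currentKeys = new Set(current.map(tripleKey));
  return {
    added: current.filter(t => !previousKeys.has(tripleKey(t))),
    deleted: previous.filter(t => !currentKeys.has(tripleKey(t))),
  };
}

// Keeps the last seen snapshot of one resource.
class ResourceGuard {
  private snapshot: Triple[] = [];

  // Call with the freshly retrieved triples; returns what changed since the last call.
  check(current: Triple[]): { added: Triple[]; deleted: Triple[] } {
    const changes = diffTriples(this.snapshot, current);
    this.snapshot = current;
    return changes;
  }
}
```

The `added` and `deleted` arrays are what would be fed into the incremental operators of the query engine. Whether the snapshot is refreshed by polling or by a notification only changes when `check` is called, not how the changes are computed.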

Acceptance criteria

Show the latency improvement of an aggregator that uses the incremental approach over one that re-executes the query, where latency is the time between a change in the data and the corresponding change in the query results. Show this with the SolidBench (https://github.com/SolidBench/SolidBench.js) benchmark.
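One possible way to collect these latencies is sketched below. It assumes each aggregator under test is wrapped behind a hypothetical `ObservableAggregator` interface whose single method applies a change to the data and resolves once the query result reflects that change; how that is detected is left to the implementation. The same harness would then be run against the re-executing and the incremental aggregator with the same list of changes.

```typescript
// Hypothetical wrapper around an aggregator under test.
interface ObservableAggregator {
  // Applies a change to the data and resolves once the query result reflects it.
  applyChangeAndAwaitResult(change: string): Promise<void>;
}

// Measure, per change, the time in milliseconds between the data change and the result update.
async function measureLatencies(aggregator: ObservableAggregator, changes: string[]): Promise<number[]> {
  const latencies: number[] = [];
  for (const change of changes) {
    const start = Date.now();
    await aggregator.applyChangeAndAwaitResult(change);
    latencies.push(Date.now() - start);
  }
  return latencies;
}

// Reduce a list of latencies to the numbers that are compared between aggregators.
function summarize(latencies: number[]): { mean: number; max: number } {
  if (latencies.length === 0) {
    return { mean: 0, max: 0 };
  }
  const sum = latencies.reduce((total, value) => total + value, 0);
  return { mean: sum / latencies.length, max: Math.max(...latencies) };
}
```

Changes are applied one at a time so that each measurement covers exactly one update; measuring concurrent changes would need a different setup.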

A demo that showcases this solution would need to be able to:

Add elements (comments, posts, friends, noise, ...) to a CSS (Community Solid Server) that is running with SolidBench data.
Show that the aggregator updates its result when these elements are added. Additionally, show the difference in latency between the two techniques (re-executing the query and incremental evaluation) for a variety of queries.

Assumptions

As aggregation is still a novel research topic, a number of assumptions were made:

The challenge "Long term server-side authenticated sessions" (#13) has been solved, so the authentication part of this challenge is not taken into account.
The registered queries are SPARQL SELECT queries.

maartyman added challenge technical problem applied to a use case proposal: pending ❓ labels Feb 23, 2023

maartyman assigned RubenVerborgh Feb 23, 2023

pheyvaer unassigned RubenVerborgh Mar 23, 2023

pheyvaer added proposal: approved ✅ and removed proposal: pending ❓ labels Mar 23, 2023

maartyman commented Feb 23, 2023 • edited by pheyvaer
