Aggregators to improve data access across many pods, a social media perspective #99

Open
maartyman opened this issue Feb 23, 2023 · 0 comments
Labels: challenge, technical problem applied to a use case, proposal: approved ✅

Pitch

This challenge is an extension of Challenge 24. Applications that need to aggregate data across many pods can suffer from slow response times, caused by the latency of retrieving and processing data from a large number of pods. This is typically the case in a social media scenario, where users' timelines are curated based on the activities of their contacts. Computing these timelines at the moment users open their social media application is typically not feasible due to latency constraints. Therefore, the timelines should be precomputed as a form of aggregation. The SolidBench.js benchmark will be used to simulate data pods containing social media data.

Desired solution

This challenge has the same desired solution as Challenge 24, except that instead of re-evaluating the query whenever resources change, you should use incremental query evaluation techniques. To complete this challenge, Comunica should be modified so that it can guard the resources and compute the changes in the query results from the changes in the resources.
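To illustrate the difference with re-evaluation, here is a minimal sketch of one incremental operator: a join that keeps both of its inputs as state and, for each changed triple, emits only the change in the join result. It assumes a timeline-style pattern `?person <knows> ?friend . ?post <hasCreator> ?friend`; the predicate names, the `IncrementalTimelineJoin` class and its `apply` method are hypothetical and are not part of Comunica. Deletions are handled with signed multiplicities (counting), which is one common technique, not necessarily the one the challenge expects.

```typescript
// A triple in a simple string form (subject, predicate, object).
interface Triple {
  s: string;
  p: string;
  o: string;
}

// A change to the input: +1 for an added triple, -1 for a deleted one.
interface Change {
  triple: Triple;
  sign: 1 | -1;
}

// Join key -> (value -> multiplicity).
type Index = Map<string, Map<string, number>>;

// Add `by` to the multiplicity of `value` under `key`, dropping zero entries.
function bump(index: Index, key: string, value: string, by: number): void {
  const inner = index.get(key) ?? new Map<string, number>();
  const next = (inner.get(value) ?? 0) + by;
  if (next === 0) inner.delete(value);
  else inner.set(value, next);
  if (inner.size === 0) index.delete(key);
  else index.set(key, inner);
}

// Hypothetical incremental join for:
//   ?person <knowsPred> ?friend .  ?post <creatorPred> ?friend .
class IncrementalTimelineJoin {
  private readonly knows: Index = new Map(); // friend -> persons
  private readonly created: Index = new Map(); // friend -> posts

  constructor(
    private readonly knowsPred: string,
    private readonly creatorPred: string,
  ) {}

  // Consume a batch of triple changes and return only the change in the
  // result, keyed "person friend post" with a signed multiplicity.
  apply(changes: Change[]): Map<string, number> {
    const delta = new Map<string, number>();
    const emit = (person: string, friend: string, post: string, m: number): void => {
      const key = `${person} ${friend} ${post}`;
      const next = (delta.get(key) ?? 0) + m;
      if (next === 0) delta.delete(key);
      else delta.set(key, next);
    };
    for (const { triple, sign } of changes) {
      if (triple.p === this.knowsPred) {
        const person = triple.s;
        const friend = triple.o;
        // Join the changed binding against the current state of the other side.
        (this.created.get(friend) ?? new Map<string, number>()).forEach((m, post) => {
          emit(person, friend, post, sign * m);
        });
        bump(this.knows, friend, person, sign);
      } else if (triple.p === this.creatorPred) {
        const post = triple.s;
        const friend = triple.o;
        (this.knows.get(friend) ?? new Map<string, number>()).forEach((m, person) => {
          emit(person, friend, post, sign * m);
        });
        bump(this.created, friend, post, sign);
      }
    }
    return delta;
  }
}
```

Because each change is joined against the state as it is at that moment and only then applied, changes to both sides within the same batch are combined correctly, and a triple that is added and deleted in one batch yields no net change in the result.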

Guarding means monitoring the resources for changes. This can be done by pushing (websockets v0.1) or by pulling (polling). When the resources change, Comunica should determine the added and deleted triples. These added and deleted triples can then be used by the query engine to determine the changes in the query result. This can be done with incremental query techniques and incremental SPARQL operators, which calculate the changes in an operator's result from the changes in its input.
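A minimal sketch of the pulling variant of guarding, assuming each resource can be fetched as a set of triples in a canonical string form (for example one N-Triples line per triple, ignoring blank-node relabelling). The `fetchSnapshot` callback, `diffSnapshots` and `pollResource` are hypothetical names used for illustration only. The diff produces exactly the added and deleted triples that an incremental operator would consume.

```typescript
// Added and deleted triples between two snapshots of a resource.
interface TripleDiff {
  added: string[];
  deleted: string[];
}

// Compare two snapshots; triples are assumed to be in a canonical string form.
function diffSnapshots(previous: Set<string>, current: Set<string>): TripleDiff {
  const added = Array.from(current).filter((t) => !previous.has(t));
  const deleted = Array.from(previous).filter((t) => !current.has(t));
  return { added, deleted };
}

// Hypothetical polling guard: fetchSnapshot returns the current triples of a
// resource; onChange is only called when something was added or deleted.
// Returns a function that stops polling.
function pollResource(
  fetchSnapshot: () => Promise<Set<string>>,
  onChange: (diff: TripleDiff) => void,
  intervalMs: number,
): () => void {
  let previous = new Set<string>(); // first poll reports everything as added
  let running = true;
  const tick = async (): Promise<void> => {
    if (!running) return;
    try {
      const current = await fetchSnapshot();
      const diff = diffSnapshots(previous, current);
      if (diff.added.length > 0 || diff.deleted.length > 0) onChange(diff);
      previous = current;
    } catch (err) {
      // A failed fetch produces no diff this round; keep the old snapshot.
      console.error(err);
    } finally {
      if (running) setTimeout(tick, intervalMs);
    }
  };
  void tick();
  return () => {
    running = false;
  };
}
```

Polling is simple but its latency is bounded below by `intervalMs`, and every poll transfers the whole resource; a push-based guard avoids both, at the cost of depending on server support for notifications.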

Acceptance criteria

Show the latency improvement (the time between a change in the data and the corresponding change in the query results) of an aggregator that uses the incremental approach over one that re-executes the query. Show this with the SolidBench (https://github.com/SolidBench/SolidBench.js) benchmark.
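The latency defined above can be summarised per approach with a small helper, sketched here under the assumption that each measurement records when the data changed and when the aggregator's result first reflected that change, both in milliseconds from the same clock. The `Sample` shape and the function names are hypothetical; the median is used because it is less sensitive to occasional slow runs than the mean.

```typescript
// One measurement: when the data changed and when the result reflected it (ms).
interface Sample {
  changedAt: number;
  resultUpdatedAt: number;
}

// Latency of each sample: time between the data change and the result change.
function latencies(samples: Sample[]): number[] {
  return samples.map((s) => s.resultUpdatedAt - s.changedAt);
}

function median(values: number[]): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Ratio of median latencies; a value above 1 means the incremental
// aggregator reflects data changes faster than query re-execution.
function speedup(reExecution: Sample[], incremental: Sample[]): number {
  return median(latencies(reExecution)) / median(latencies(incremental));
}
```

For example, re-execution latencies of 300, 100 and 200 ms (median 200 ms) against incremental latencies of 40 and 60 ms (median 50 ms) give a ratio of 4.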

A demo that showcases this solution would need to be able to:

  • Add elements (comments, posts, friends, noise, ...) to a Community Solid Server (CSS) instance that is running with SolidBench data.
  • Show that the aggregator updates its result when these elements are added. Additionally, show the difference in latency between the two techniques (query re-execution and incremental) for a variety of queries.
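For the demo, one way to add elements to a pod is an HTTP `PATCH` with a SPARQL Update body, which the Community Solid Server accepts. The sketch below assumes the target resource is writable without authentication (real pods normally require an authenticated fetch) and that terms are passed already in N-Triples syntax; `buildInsertData`, `addTriples` and any IRIs used with them are hypothetical.

```typescript
// Terms are assumed to be in N-Triples syntax, e.g. "<http://ex.org/p1>" or "\"hi\"".
type NTriple = [string, string, string];

// Build a SPARQL Update INSERT DATA request body for the given triples.
function buildInsertData(triples: NTriple[]): string {
  const lines = triples.map(([s, p, o]) => `  ${s} ${p} ${o} .`);
  return `INSERT DATA {\n${lines.join("\n")}\n}`;
}

// Send the update to a resource; assumes no authentication is required.
async function addTriples(resourceUrl: string, triples: NTriple[]): Promise<void> {
  const res = await fetch(resourceUrl, {
    method: "PATCH",
    headers: { "Content-Type": "application/sparql-update" },
    body: buildInsertData(triples),
  });
  if (!res.ok) throw new Error(`PATCH failed with status ${res.status}`);
}
```

Timestamping just before `addTriples` resolves gives the `changedAt` moment needed for the latency comparison between the two techniques.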

Assumptions

As aggregation is still a novel research topic, a number of assumptions were made:
