I've started using Algebird in conjunction with Google Dataflow, which works great! My main data structure is a deeply nested case class that is implicitly a monoid – it contains a bunch of stuff, mainly time-sorted Max values (such that I only keep the most recent version of some value).
My needs are a bit specific, though, and the built-in time window aggregation semantics are causing some complications. I'd like to ask if you have any experience implementing windowed aggregations using monoids, or whether you'd be interested in accepting such an implementation were I to write one.
My current idea is loosely based on TopK – let's just call it SlidingWindow. It keeps k "slots", each containing an instance of another monoid and a timestamp (the timestamp is normalized to the period of the time window, e.g. start-of-day). When two SlidingWindow values are added, slots that share a timestamp are combined using the inner monoid, and only the k most recent slots are retained. A method would return the "sum" of all the slots, representing, say, the total count of some event during the last k days/minutes/whatever.
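A minimal sketch of the idea, to make the question concrete. Everything here is hypothetical: `SlidingWindow`, `SlidingWindowMonoid`, and `slotStart` are illustrative names, not existing Algebird types, and the local `Monoid` trait is only a stand-in for Algebird's `Monoid` so that the snippet is self-contained. It assumes timestamps are epoch milliseconds and that the inner monoid is associative.

```scala
// Stand-in for Algebird's Monoid so the sketch compiles on its own (assumption).
trait Monoid[T] {
  def zero: T
  def plus(a: T, b: T): T
}

// Hypothetical helper: normalize a timestamp to the start of its period.
def slotStart(tsMillis: Long, periodMillis: Long): Long =
  tsMillis - java.lang.Math.floorMod(tsMillis, periodMillis)

// Hypothetical type: each slot maps a normalized timestamp to a partial aggregate.
final case class SlidingWindow[V](slots: Map[Long, V]) {
  // "Sum" of all retained slots, folded oldest to newest.
  def total(implicit m: Monoid[V]): V =
    slots.toSeq.sortBy(_._1).map(_._2).foldLeft(m.zero)(m.plus)
}

final class SlidingWindowMonoid[V](k: Int)(implicit m: Monoid[V])
    extends Monoid[SlidingWindow[V]] {
  require(k > 0, "k must be positive")

  val zero: SlidingWindow[V] = SlidingWindow(Map.empty[Long, V])

  def plus(a: SlidingWindow[V], b: SlidingWindow[V]): SlidingWindow[V] = {
    // Combine slots that share a timestamp using the inner monoid.
    val merged = b.slots.foldLeft(a.slots) { case (acc, (ts, v)) =>
      acc.updated(ts, acc.get(ts).map(m.plus(_, v)).getOrElse(v))
    }
    // Retain only the k most recent slots.
    SlidingWindow(merged.toSeq.sortBy(_._1).takeRight(k).toMap)
  }
}

// Usage: count events per day, keeping the last 2 days.
implicit val longSum: Monoid[Long] = new Monoid[Long] {
  val zero: Long = 0L
  def plus(a: Long, b: Long): Long = a + b
}

val day = 86400000L
val window = new SlidingWindowMonoid[Long](2)
val a = SlidingWindow(Map(0L -> 5L, day -> 3L))
val b = SlidingWindow(Map(day -> 4L, 2 * day -> 1L))
val sum = window.plus(a, b)
// sum.slots == Map(day -> 7L, 2 * day -> 1L); sum.total == 8L
```

In this sketch a slot is dropped only when at least k newer timestamps are present, and a slot that survives the final cut also survives every intermediate one, so the grouping of additions should not change the result. Note that "most recent" is relative to the newest slot seen so far, not to wall-clock time.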
Does this design make sense? I've tried looking, but I have seen little mention of time as a factor here, although it's critical in many streaming applications.
Sorry for the bother if this is the wrong place to ask.