Replacing Redis with direct holepunched communication #10

rdettai · 2023-02-01T14:40:33Z

I am the main author of the https://github.com/cloudfuse-io/lambdatization repository. I would like to get in touch with you regarding your decision to use Redis as a communication medium. I think this defeats the point of having a serverless compute engine and might not play well with queries that need very large shuffles between some stages (sorts, joins...). I was wondering if you knew about the research at ETH Zurich on creating direct connections between lambdas using TCP hole punching: https://arxiv.org/pdf/2202.06646.pdf
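
For readers new to the technique, here is a toy model of the hole punching idea. It is not the implementation from the linked paper and it does no real networking. It assumes a NAT that reuses one public port per private endpoint and only admits packets from remotes the inner host has already sent to; the `ToyNat` class, the addresses, and the rendezvous endpoint are all hypothetical, and whether the NAT in front of AWS Lambda behaves this way is not something the sketch establishes.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


@dataclass
class ToyNat:
    """Hypothetical NAT: one public port per private endpoint,
    inbound packets accepted only from remotes we already sent to."""
    public_ip: str
    next_port: int = 40000
    mappings: Dict[Endpoint, int] = field(default_factory=dict)
    allowed: Set[Tuple[int, Endpoint]] = field(default_factory=set)

    def outbound(self, private: Endpoint, remote: Endpoint) -> Endpoint:
        """Translate an outgoing packet and remember that `remote` may answer."""
        if private not in self.mappings:
            self.mappings[private] = self.next_port
            self.next_port += 1
        port = self.mappings[private]
        self.allowed.add((port, remote))
        return (self.public_ip, port)

    def inbound(self, public_port: int, remote: Endpoint) -> bool:
        """Return True if a packet from `remote` would be let through."""
        return (public_port, remote) in self.allowed


def punch() -> Tuple[bool, bool]:
    nat_a, nat_b = ToyNat("198.51.100.1"), ToyNat("203.0.113.1")
    rendezvous = ("192.0.2.10", 9000)  # assumed coordination service
    a_priv, b_priv = ("10.0.0.5", 5000), ("10.0.0.7", 5000)

    # Step 1: each peer contacts the rendezvous point, which observes
    # the peer's public endpoint and can hand it to the other side.
    a_pub = nat_a.outbound(a_priv, rendezvous)
    b_pub = nat_b.outbound(b_priv, rendezvous)

    # Before punching, a packet from B is dropped by A's NAT.
    before = nat_a.inbound(a_pub[1], b_pub)

    # Step 2: both peers send to each other's public endpoint,
    # which opens the "hole" in each NAT.
    nat_a.outbound(a_priv, b_pub)
    nat_b.outbound(b_priv, a_pub)

    after = nat_a.inbound(a_pub[1], b_pub) and nat_b.inbound(b_pub[1], a_pub)
    return before, after


if __name__ == "__main__":
    print(punch())
```

The point of the model is that a small coordination service is still needed to exchange public endpoints, while the bulk data can then flow directly between the two functions.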

ghalimi · 2023-02-01T16:40:43Z

Maintainer

@rdettai This is really, really cool. We still need Redis for synchronization, but adding something like that as a shortcut could be really powerful. Would you be in a position to make an introduction to that team by any chance? I would love to work with them. I have many good friends who came from ETH, and they tend to be really good at what they do...

1 reply

rdettai Feb 2, 2023
Author

We still need Redis for synchronization

I would be curious to know why it needs to be Redis. I find it a real pity to break the serverless model by using long-running Amazon ElastiCache instances 🙂. I understand your design required something beefy for shuffling, but if you reduced its scope to just synchronization, are you sure something more lightweight like SQS+SNS wouldn't make the cut?

Would you be in a position to make an introduction to that team by any chance?

Not really, I don't have a tight relationship with them yet.

After a bit of thought, I also realized that because you are building your own engine, you don't necessarily need the full scope of what Boxer provides. In particular you don't need the interposition library that makes the hole punching transparent. So you might also be interested in the work of another group at ETH: http://spcl.inf.ethz.ch/Publications/.pdf/2022_copik_serverless_collectives_report.pdf. They worked on hole punching as well, but from a message passing perspective, and the big advantage is that their work is actually open source (https://github.com/OpenCoreCH/FMI).

Last point, I am currently working on a re-implementation of something heavily inspired by the Boxer paper. It is still at a very early stage and not yet published, but I hope I'll have something to showcase before the end of the month.

Feel free to reach out in private if you want to discuss this further.

ghalimi · 2023-02-02T08:27:55Z

Maintainer

@rdettai I would not worry too much about Redis not being purely serverless. We must be pragmatic there. As long as it's managed by the cloud provider, it's good enough. Shuffling is critical if you want to support interesting SQL queries, and SNS+SQS have unacceptable latencies (100 ms or more, versus sub-millisecond for Redis). Redis will give us a very solid foundation to build upon, at a very low cost. Once we have the platform running, we can spend time optimizing things on that front. But if we try to build everything at once, we won't deliver anything good anytime soon. Let's not re-invent the wheel...
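
As a rough illustration of why the per-message latency matters, here is a back-of-envelope calculation. The number of sequential synchronization rounds is a hypothetical figure, and the per-round latencies are simply the bounds quoted above (100 ms for SNS+SQS, 1 ms as an upper bound for "sub-millisecond" Redis); real numbers would have to be measured.

```python
def added_latency_ms(sync_rounds: int, per_round_ms: float) -> float:
    """Total latency added by sequential synchronization rounds."""
    return sync_rounds * per_round_ms


rounds = 10  # hypothetical number of sequential barriers in one query

queue_ms = added_latency_ms(rounds, 100.0)  # SNS+SQS, lower bound from above
redis_ms = added_latency_ms(rounds, 1.0)    # Redis, upper bound from above

print(f"SNS+SQS: at least {queue_ms:.0f} ms, Redis: under {redis_ms:.0f} ms")
```

Under these assumptions the queue-based path adds at least a full second per query, against less than 10 ms for Redis.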

Thanks a ton for your help. I will reach out directly.

1 reply

rdettai Feb 2, 2023
Author

Shuffling is critical if you want to support interesting SQL queries

Yes, agreed. I was considering the case where shuffling would be handled by direct TCP hole-punched communication.

Once we have the platform running, we can spend time optimizing things on that front

Makes a lot of sense. The FMI project I shared with you above seems to be incredibly in line with what you are doing, provided that you manage to abstract yourself away from the HPC lingo 😄

ghalimi · 2023-02-02T10:48:58Z

Maintainer

Thanks!

mcopik · 2023-02-16T23:26:58Z

@ghalimi @rdettai I'm one of the authors of the FMI project, and I am the current maintainer of the project since the student working with us moved into a different domain. While our work focuses primarily on the high-performance computing domain, other use cases, such as data processing and analytics, are also close to ours, and we want to explore them one day. In the end, many parallel computing principles stay the same across domains :-)

Shuffling and data exchange should benefit from the latency and bandwidth of direct connections. It also fits use cases that might not require persistent data storage, e.g., Spark RDDs and other intermediate containers. FMI is focused on collective operations for parallel computing, but it should be relatively easy to adjust it towards more generic operations; we support P2P communication natively.
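
To make the link between shuffling and point-to-point communication concrete, here is a minimal sketch of a hash-partitioned shuffle expressed as individual sends. It does not use the FMI API; the names `bucket_of` and `shuffle` are hypothetical, and in-memory lists stand in for the direct channels between functions.

```python
import zlib
from typing import Any, List, Tuple

Row = Tuple[Any, Any]  # (key, value)


def bucket_of(key: Any, num_workers: int) -> int:
    """Pick the destination worker with a hash that is stable across processes."""
    return zlib.crc32(str(key).encode()) % num_workers


def shuffle(inputs: List[List[Row]]) -> List[List[Row]]:
    """inputs[i] holds the rows of worker i; returns the rows each worker
    owns after the exchange. Every append models one point-to-point send."""
    n = len(inputs)
    mailboxes: List[List[Row]] = [[] for _ in range(n)]
    for rows in inputs:
        for key, value in rows:
            mailboxes[bucket_of(key, n)].append((key, value))
    return mailboxes


if __name__ == "__main__":
    inputs = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)], [("b", 5)]]
    for worker, rows in enumerate(shuffle(inputs)):
        print(worker, rows)
```

After the exchange, all rows sharing a key sit on the same worker, which is what a join or a grouped aggregation needs. With direct connections each send would go straight to the destination function rather than through an intermediate store.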

Please let me know if you have any questions, or if you find the project interesting and useful for your product, even at later stages when you are optimizing Redis-based communication. I would be happy to discuss it further and hear about your experiences in building distributed computations on top of serverless functions.

1 reply

ghalimi Feb 17, 2023
Maintainer

@mcopik I think you're right. We should definitely consider this. I will reach out directly.
