Push-model data - How does OPAL handle when an OPA container is restarted or gets out-of-sync with source system #385
-
Hi @phil-lee-kb - this is quite a bit to unpack; I'll try my best to cover all the key points.
In the best of cases, OPA can handle up to 5GB of data; 2GB is already a struggle in most cases.
To be frank, I think the pull model in OPA is very bad - I wouldn't touch it unless you really have to.
Cool: OPAL can listen to Kafka directly: https://docs.opal.ac/tutorials/run_opal_with_kafka#triggering-events-directly-from-kafka
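To give a rough idea of what travels from Kafka to OPAL, here is a minimal Python sketch that turns a change event from the source database into a message shaped like an OPAL data update (a list of `entries`, each with a `url` to fetch from, `topics` and a `dst_path` in OPA's data document, plus a `reason`). The event fields (`table`, `id`, `op`), the base URL and the `policy_data` topic are assumptions for illustration; check the tutorial linked above for the exact format OPAL expects when it reads from Kafka.

```python
import json

# Assumption: change events on the Kafka topic look like
# {"table": "users", "id": "42", "op": "update"}; this is NOT a real schema.


def event_to_data_update(event: dict, base_url: str) -> dict:
    """Map one hypothetical change event to a message shaped like an OPAL data update."""
    table = event["table"]
    record_id = event["id"]
    return {
        "entries": [
            {
                # Where the OPAL client should fetch the fresh record from (hypothetical API)
                "url": f"{base_url}/{table}/{record_id}",
                # Clients subscribed to this topic receive the update
                "topics": ["policy_data"],
                # Where the record lands in OPA's data document
                "dst_path": f"/{table}/{record_id}",
            }
        ],
        "reason": f"{event['op']} on {table}/{record_id}",
    }


if __name__ == "__main__":
    evt = {"table": "users", "id": "42", "op": "update"}
    print(json.dumps(event_to_data_update(evt, "https://source.example.com/api"), indent=2))
```

Note that the message only tells clients *where* to fetch from; the clients then pull the record themselves, which keeps the Kafka messages small.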
OPA instances don't handle any of this - that's where OPAL comes in.
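Put simply, when an OPA instance (re)starts or reconnects, OPAL assumes its state is lost and loads everything afresh, then keeps it current with incremental updates. The toy class below models that behaviour; it is a conceptual sketch only, and `PolicyDataMirror`, `fetch_full` and the callback names are hypothetical, not OPAL client internals.

```python
from typing import Callable

Snapshot = dict[str, dict]


class PolicyDataMirror:
    """Toy model of reload-on-(re)connect: local state is never trusted across restarts."""

    def __init__(self, fetch_full: Callable[[], Snapshot]):
        self._fetch_full = fetch_full  # hypothetical: returns a complete, fresh snapshot
        self.data: Snapshot = {}
        self.connected = False

    def on_connect(self) -> None:
        # Assume anything held before is lost or stale and replace it wholesale
        self.data = self._fetch_full()
        self.connected = True

    def on_disconnect(self) -> None:
        # Updates published while disconnected are missed; the next connect repairs that
        self.connected = False

    def on_update(self, key: str, value: dict) -> None:
        # Incremental updates are only applied on top of a fresh snapshot
        if self.connected:
            self.data[key] = value
```

The design choice is that correctness never depends on having seen every incremental update: a missed update costs at most one full reload.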
As I wrote, OPA only needs a subset of your data: the part relevant to the policy (you can transform the data as needed using custom data fetchers).
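A data fetcher is a natural place to cut a multi-gigabyte source down to what the policy actually reads. The sketch below shows only that transformation step in plain Python; it is not OPAL's custom fetch-provider API, and the field names in `POLICY_FIELDS` are hypothetical.

```python
from typing import Any

# Hypothetical: the only fields the policy actually reads.
POLICY_FIELDS = {"id", "role", "tenant"}


def project_for_policy(
    records: list[dict[str, Any]], fields: set[str] = POLICY_FIELDS
) -> dict[str, dict[str, Any]]:
    """Keep only policy-relevant fields and index records by id for direct lookup in Rego."""
    out: dict[str, dict[str, Any]] = {}
    for rec in records:
        slim = {k: v for k, v in rec.items() if k in fields}
        out[str(rec["id"])] = slim
    return out
```

Indexing by id (rather than shipping a list) also lets a policy look up `data.users[input.user_id]` directly instead of iterating.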
Yes, basically. You can manage these pods/clusters via the health checks OPAL exposes (just don't forget to turn them on).
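A minimal readiness check against OPA could look like the sketch below. It assumes OPAL's health-check policy has been turned on (the `OPAL_OPA_HEALTH_CHECK_POLICY_ENABLED` setting), which should make OPA answer `/v1/data/system/opal/ready`; verify the exact setting and path against the OPAL docs for your version. The HTTP getter is injectable so the logic can be exercised without a running OPA.

```python
import json
import urllib.request
from typing import Callable

# Assumption: OPAL's health-check policy is enabled, so OPA serves this document.
READY_PATH = "/v1/data/system/opal/ready"


def http_get_json(url: str, timeout: float = 2.0) -> dict:
    """Plain GET returning the parsed JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def opa_is_ready(opa_url: str, get_json: Callable[[str], dict] = http_get_json) -> bool:
    """Return True only if OPA reports that OPAL has loaded policy and data."""
    try:
        body = get_json(opa_url.rstrip("/") + READY_PATH)
    except (OSError, ValueError):
        # Connection refused, timeout, or a non-JSON body: treat as not ready
        return False
    # OPA's Data API wraps the value in "result"; a missing key means the document is undefined
    return body.get("result") is True
```

A Kubernetes readiness probe could point at the same path; keeping traffic away from an instance until it reports ready covers the "freshly restarted, still loading" window.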
It assumes that state is lost and loads everything afresh according to […].
OPAL has a lot of interesting little elements in its design, which address most of the concerns people have when they start thinking about this problem space. I suggest going through the tutorials one by one to learn OPAL gradually. Hope this helps,
-
The pull model is very risky: you couple the stability, latency, performance, and availability of your services to a data source (e.g. an application-data SQL database) in one of your most critical chains. This can make the behavior of your entire application unpredictable, and it becomes especially apparent once you consider that most of these data sources weren't built for critical performance, just for run-of-the-mill application data queries. Even something as simple as changing the schema or indexing on such a source can throw the entire thing out of balance.
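The coupling argument can be made concrete with a back-of-the-envelope calculation. If every authorization decision needs a live query to the data source, OPA and the database are effectively in series, so (assuming independent failures) their availabilities multiply. The figures below are illustrative, not measurements:

$$
A_{\text{decision}} \approx A_{\text{OPA}} \times A_{\text{DB}} = 0.9995 \times 0.999 \approx 0.9985
$$

That is roughly 13 hours of expected downtime per year (0.0015 × 8760 h), versus about 4.4 hours for OPA alone (0.0005 × 8760 h), before counting the latency the database query adds to every decision.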
In the future, yes. For now, you don't need all the data in a single static source, but you do need to be able to point the client at multiple sources which, fetched in aggregate, produce the up-to-date picture.
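Fetching "in aggregate" from several sources amounts to writing each source's result at its own path in one data document. The sketch below is loosely modelled on OPAL's data-source entries (each with a `url` and a `dst_path`); `aggregate`, `set_at_path` and the injected `fetch` function are hypothetical helpers, not OPAL code.

```python
from typing import Any, Callable


def set_at_path(tree: dict[str, Any], dst_path: str, value: Any) -> None:
    """Write value into a nested dict at a slash-separated path such as '/authz/roles'."""
    parts = [p for p in dst_path.split("/") if p]
    if not parts:
        raise ValueError("dst_path must name at least one key")
    node = tree
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = value


def aggregate(entries: list[dict[str, Any]], fetch: Callable[[str], Any]) -> dict[str, Any]:
    """Fetch each configured source and merge the results into one data document."""
    tree: dict[str, Any] = {}
    for entry in entries:
        set_at_path(tree, entry["dst_path"], fetch(entry["url"]))
    return tree
```

Giving each source its own non-overlapping `dst_path` means no single source has to hold the whole picture, which is the point made above.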
-
One more question: whilst I can see the benefits of having OPAL pull policy directly from a git repo, our organisation would likely require a separate release process for policies, i.e. policy changes get made and merged into main, but we want to release them via a separate pipeline. We've done a POC of that with vanilla OPA using the bundle API, where we deploy policies via a separate pipeline and OPA then picks them up in due course. Does OPAL support this? That is, disable OPAL's automatic retrieval of policies from a git repo, and instead let OPA retrieve policy from a separately updated bundle API (something like nginx, or directly from S3 or another HTTP-accessible file store).
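For reference, the vanilla-OPA side of this setup (OPA polling a bundle server on its own schedule) comes down to a `services` plus `bundles` section in OPA's configuration. The sketch builds that section in Python and prints it as JSON; the service name `policy-store`, the bundle name `authz`, the URL, the resource path and the polling intervals are all placeholders. It shows only the OPA side, not how OPAL would be configured alongside it.

```python
import json


def opa_bundle_config(
    service_url: str, resource: str, min_delay: int = 60, max_delay: int = 120
) -> dict:
    """Build an OPA config that polls a bundle server for policy (names are placeholders)."""
    if min_delay > max_delay:
        raise ValueError("min_delay must not exceed max_delay")
    return {
        "services": {"policy-store": {"url": service_url}},
        "bundles": {
            "authz": {
                "service": "policy-store",
                "resource": resource,
                # OPA re-downloads the bundle at a random interval in this range
                "polling": {
                    "min_delay_seconds": min_delay,
                    "max_delay_seconds": max_delay,
                },
            }
        },
    }


if __name__ == "__main__":
    cfg = opa_bundle_config("https://policies.example.com", "bundles/authz.tar.gz")
    # JSON is valid YAML, so this can be saved and passed to `opa run -s -c <file>`
    print(json.dumps(cfg, indent=2))
```

With this pattern the release pipeline only has to publish a new bundle file; OPA picks it up on its next poll.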
-
Hi there,
I'm looking into OPAL as an option for a push-model scenario where we have a relatively large (5GB) source database that gets updated in real time, and we're deciding whether to use the pull model in OPA or use OPAL to handle the push model. We would likely use Kafka to mediate between the source DB and our cloud services such as OPAL, meaning there would be a Kafka topic containing all events from the DB to read from.
Something I'm finding hard to understand is how OPA instances handle startup and restart scenarios under the push model, and also cases where, say, network issues leave an OPA instance without the latest data. This also applies if we have OPAL in the picture. If we consider the scenarios one by one:
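One generic way to notice that a consumer has fallen behind (independent of OPAL) is to stamp each change event with a monotonically increasing sequence number, for example the Kafka offset within a single partition, and force a full reload whenever a gap appears. The sketch below is a hypothetical illustration of that idea, not a description of how OPA or OPAL behave; `SequencedApplier` and `full_reload` are made-up names, and it assumes the reloaded snapshot is at least as new as the event that revealed the gap.

```python
from typing import Any, Callable


class SequencedApplier:
    """Apply change events in order; fall back to a full reload if any are missed."""

    def __init__(self, full_reload: Callable[[], tuple[dict, int]]):
        # full_reload returns (snapshot, sequence number the snapshot reflects)
        self._full_reload = full_reload
        self.data: dict = {}
        self.last_seq = -1
        self.reloads = 0

    def resync(self) -> None:
        self.data, self.last_seq = self._full_reload()
        self.reloads += 1

    def apply(self, seq: int, key: str, value: Any) -> None:
        if seq <= self.last_seq:
            return  # duplicate, or already reflected in the current snapshot
        if seq != self.last_seq + 1:
            # Gap: at least one event was missed, so the local copy can't be trusted.
            # Assumption: the snapshot is at least as new as this event.
            self.resync()
            return
        self.data[key] = value
        self.last_seq = seq
```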
Happy to be pointed at the relevant sections of the documentation if these are already answered :-)
Thanks,
Phil