
hub online synchronization #82

Open · wants to merge 10 commits into master from hub_online_sync
Conversation

@rjmateus (Member) commented Nov 28, 2023

This is an RFC to improve content synchronization for HUB scenarios. It also relates to Inter-Server Synchronization.

Rendered version

@rjmateus rjmateus marked this pull request as draft November 28, 2023 12:08
@rjmateus rjmateus force-pushed the hub_online_sync branch 6 times, most recently from b440755 to d4594dd on November 28, 2023 21:33
@rjmateus rjmateus marked this pull request as ready for review November 29, 2023 12:27
@mcalmer (Contributor) commented Nov 30, 2023

@rjmateus Would it make sense to also add all the other ISSv2 features to this RFC and plan possible replacements? This would give a full picture, and we could still do the implementation in steps; it would not prevent us from starting with channels first.
But if the channels need a special feature that has to be implemented anyway for the full replacement, it would be better to know that beforehand.

@rjmateus (Member, Author) replied:

@mcalmer Good point, Michael.
I will add that to the next-steps section, with some detail in the solution.
This HUB online synchronization will not be able to fully replace ISSv2 because of disconnected environments. However, we may think about a solution that uses RMT to sync the data and then exports it to the disconnected environment (though I'm not sure I like it).

The SUSE Manager server needs a set of metadata to be able to operate. Currently that metadata is provided directly by SCC or, in the case of PAYG, by the cloud RMT infrastructure. We should also provide this data on the SUSE Manager HUB to be consumed by the peripheral servers.
The minimal endpoints to be provided are:
- "/suma/product_tree.json"
- "/connect/organizations/products/unscoped"
@admd (Contributor) commented on the diff:

To list the correct migration path, we use upgrade_paths.json in combination with some information from SCC. We need a way to have this information on the peripheral servers in order to make product migration work correctly.

@mcalmer am I missing anything, or have I misunderstood something here?

@rjmateus (Member, Author) replied:

@admd I think that data is static, coming from the file "/usr/share/susemanager/scc/upgrade_paths.json", which is part of the package "susemanager-sync-data", but MC can confirm this assumption.

@mcalmer (Contributor) replied:

upgrade_paths.json is static, and I think we just dropped it for 5.0 as it contains only outdated OSes, mainly SLES 10 systems.

@admd (Contributor) left a review:

This is lovely, and I appreciate it. I only have a few questions, but aside from that, everything appears satisfactory from my perspective.

5 commits pushed. Signed-off-by: Ricardo Mateus <rmateus@suse.com>
Comment on lines 109 to 110
We can follow a similar approach to what exists on ISSv1. On the hub side we can define multiple peripheral servers to connect to by providing the FQDN and an authentication token.
On the peripheral side we also need to define the Hub server FQDN and an authentication token.
A Contributor commented:

The Hub will have the peripheral registered and will generate the associated auth token, right?

Why do we need to provide the peripheral FQDN? Wouldn't a generic peripheral name (maybe the FQDN) and a generated token be enough? Or do we approach this as a username/password scenario?

I am assuming that the connection will always be from the peripheral to the hub, right?

@rjmateus (Member, Author) replied:

It should be used for authentication from the HUB to the peripheral.
Communication will be bi-directional. In some cases the peripheral will call the hub (for example, to synchronize software channels and to call the SCC endpoints); in other cases the HUB will call the peripheral API (for example, to create channels, push configuration channels, etc.).
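
To make the bi-directional credentials concrete, a small sketch of the paired records this implies, with one token per direction; all names and fields are illustrative, not the actual design:

```python
# Sketch of the paired credentials a bi-directional setup implies.
# Field and variable names are illustrative only.
from dataclasses import dataclass

@dataclass
class PeripheralLink:
    peripheral_fqdn: str           # where the hub reaches the peripheral API
    hub_to_peripheral_token: str   # used when the HUB calls the peripheral
    peripheral_to_hub_token: str   # used when the peripheral calls the hub
                                   # (repo-sync, SCC-style endpoints)

# Defined on the hub side, one entry per registered peripheral.
links = [
    PeripheralLink("suma-dev.example.com", "tok-hub2dev", "tok-dev2hub"),
    PeripheralLink("suma-prod.example.com", "tok-hub2prod", "tok-prod2hub"),
]
```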


## Peripheral software channels creation

We need a mechanism to create the channels on the peripheral servers (vendor and CLM ones) in the desired organization. The peripheral channel creation must be done automatically from the HUB server through an API. Since this is a special kind of channel creation (defined next), those API methods should be available for server-to-server communication only.
@mcalmer (Contributor) commented:

I think it would be good to define in a little more detail how this server-to-server API should look.
It should also say something about the existing API namespaces sync.master and sync.slave.

  • What namespace should be used for it?
  • One namespace or multiple?
  • Design it with an outlook to the future and what else needs to be added to this API later, e.g. activation keys, config channels, images, formulas, etc.
  • How should the authentication work?

@rjmateus (Member, Author) replied:

Hey Michael. I added some clarification about the API namespace and use cases. I didn't add details about the exact API methods to develop, because that looks to me like an implementation detail.
Could you have a look and see if it's clearer now? Thank you.
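
For illustration only, a sketch of what a hub-to-peripheral channel-creation call could look like over the existing XML-RPC endpoint (/rpc/api and the auth.login/auth.logout methods exist today; the sync.hub namespace, the method name, and its arguments are hypothetical placeholders for whatever the implementation settles on):

```python
# Hedged sketch: hub -> peripheral channel creation over XML-RPC.
from xmlrpc.client import ServerProxy

PERIPHERAL_FQDN = "suma-dev.example.com"  # hypothetical peripheral

client = ServerProxy(f"https://{PERIPHERAL_FQDN}/rpc/api")
# auth.login exists today; a token-based server-to-server login
# variant would be an addition and is an assumption here.
session = client.auth.login("hub-service-user", "secret")

# Hypothetical server-to-server-only method; the namespace and
# signature are placeholders, not the final API.
client.sync.hub.createChannel(session, {
    "label": "sles15-sp5-pool-x86_64",
    "org_id": 1,
    "parent_label": "",
})

client.auth.logout(session)
```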

4 commits pushed. Signed-off-by: Ricardo Mateus <rmateus@suse.com>
@aaannz (Contributor) left a review:

I am missing one important section, and that is failure scenarios:

  • What happens when a peripheral is due to be synced but is unavailable?

It seems this scenario can happen particularly in channel creation and CLM, where the sync should be done automatically.
Since in this case the connection direction is expected to be:

other cases will be HUB calling the peripheral API like creating channels, pushing configuration channels, etc)

So if a peripheral is unavailable, do we keep track of what was updated and what was not? And how?

  • What happens when the peripheral or the hub crashes during the sync?

With ISSv2 everything was just one transaction, so inconsistencies should not happen.
We should, however, check the ACIDity of our APIs: not only individual API calls, but also the sequences of them that we will use for the sync, and define the expected failure modes.
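
One possible shape for the tracking question, sketched under the assumption that the hub persists a per-peripheral task state so an unavailable peripheral is simply retried on the next run; the table, states, and retry policy are illustrative, not the RFC's design:

```python
# Sketch: persistent per-peripheral sync state on the hub, so work
# lost to an offline peripheral or a crash can be re-driven later.
# Schema and states are assumptions for illustration.
import sqlite3

db = sqlite3.connect("hub_sync_state.db")
db.execute("""
    CREATE TABLE IF NOT EXISTS sync_tasks (
        peripheral TEXT,
        action     TEXT,              -- e.g. 'create_channel:sles15-sp5-pool'
        state      TEXT,              -- 'pending' | 'done' | 'failed'
        attempts   INTEGER DEFAULT 0,
        PRIMARY KEY (peripheral, action)
    )
""")

def mark(peripheral: str, action: str, state: str) -> None:
    """Record the outcome of one sync step; the upsert keeps it idempotent."""
    db.execute(
        "INSERT INTO sync_tasks (peripheral, action, state, attempts) "
        "VALUES (?, ?, ?, 1) "
        "ON CONFLICT(peripheral, action) DO UPDATE "
        "SET state = excluded.state, attempts = attempts + 1",
        (peripheral, action, state),
    )
    db.commit()

# Anything not 'done' is re-driven on the next sync run, which is why
# the individual API calls themselves must be idempotent.
pending = db.execute(
    "SELECT peripheral, action FROM sync_tasks WHERE state != 'done'"
).fetchall()
```

Replaying a partially applied sequence is only safe if the individual calls are idempotent, which is exactly the ACIDity concern raised above.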


An implementation example is available from the community [link](https://github.com/uyuni-project/contrib/blob/main/os-image-tools/osimage-import-export.py).

Communication can be done from the HUB to the peripheral server to synchronize all the necessary data.
@aaannz (Contributor) commented:

I would like to point out that not everyone would like to sync from the HUB to the peripherals.

One case is a user of ours with different SUMAs for different environments: prod, qa, dev.
They build their images using the SUMA in the dev environment; once an image passes basic validation, it is exported and imported into the qa SUMA using the above-mentioned script. There the image goes through more thorough testing, and once it passes and a maintenance window opens, it is moved again to prod. This process ensures no further changes are made to the image, as the import/export does not modify the image in any way.

A centrally managed hub network would help them ensure the same configuration of those SUMAs; however, they would certainly need the ability to either:

  • sync images from a peripheral to another peripheral and to the HUB (this can be done outside the HUB architecture with existing APIs)
  • prevent auto-syncing images from the peripheral to the HUB and/or overwriting peripheral images from the HUB

@rjmateus (Member, Author) replied:

Good point here, Ondrej.
Let me go through it in parts, starting from the end.

The idea is for users to be able to define on the HUB whether to synchronize all data or only a selected set. This way they could control when an image lands on each peripheral server.

The HUB server could also have the ability to build images: it will have all the channels, so users could create a build host assigned to the HUB server.

Under those two assumptions, it would make sense to build the image on the HUB server, transfer it to the dev peripheral server, and run all the necessary tests there. After the dev tests pass, we transfer the new image version to the other environments (qa, prod) and make it available to all.

If this is not the case, then we can always use the script you mentioned, or ISSv2, since it will stay around.

The goal of this solution is scalability only; other use cases will stay around and may need a different implementation and components.

1 commit pushed. Signed-off-by: Ricardo Mateus <rmateus@suse.com>
@rjmateus (Member, Author) commented:

@aaannz I added a section about failure scenarios. Do you think it is clear enough, or should it be more complete?

@srbarrios srbarrios self-requested a review July 5, 2024 10:28
@cbosdo commented Jul 25, 2024

Remember that a lazy repo-sync has been started. This may have an impact on the design.

@rjmateus (Member, Author) replied:

Remember that a lazy repo-sync has been started. This may have an impact on the design.

The design of this RFC should not be affected. I would say it is the other way around: the existence of this RFC, and the possibility of having a chained SUSE Manager server that may not have the packages synchronized, is something that will impact the new repo-sync.

[alternatives]: #alternatives

- Create a new UI and use ISSv2 to synchronize data
- Solve the existing and known problems of ISSv2
A Member commented on the diff:

I'd not discard this point, in addition to implementing ISSv3.
As ISSv2 is going to be used for disconnected environments, we can still bring a better user experience to that use case. For example, can we consider some improvements around performing parallel SQL queries?

- All peripherals can start synchronizing at the same time
- This can be problematic if we have several peripherals performing a full synchronization at the same time. However, we can configure the peripherals to run repo-sync at different hours and spread the load
- This is only a problem on the first sync, since subsequent syncs only transfer differences
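
The load-spreading mentioned in the second bullet above could be as simple as assigning each peripheral an evenly spaced repo-sync start hour; the peripheral names and the 24-hour spread below are illustrative:

```python
# Sketch: stagger repo-sync start times so full syncs do not hit the
# hub from every peripheral at once. Names are illustrative.
peripherals = ["suma-dev", "suma-qa", "suma-prod", "suma-emea", "suma-apac"]

for i, name in enumerate(peripherals):
    hour = (i * 24 // len(peripherals)) % 24
    # Feed this into each peripheral's repo-sync schedule.
    print(f"{name}: start repo-sync at {hour:02d}:00")
```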

@srbarrios (Member) commented Sep 19, 2024:

When synchronizing the channels from the SCC service we rely, in theory, on a service with HA.
In this proposal the peripherals will rely on a single Hub instance, with custom and vendor channels pointing to that machine. What happens if it goes down, or the Hub disk burns? I would also consider how we can recover in those cases.
