Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Cached KV-based store #384

Closed
wants to merge 93 commits into from

Conversation

levb
Copy link
Contributor

@levb levb commented Nov 15, 2022

Summary

A cluster aware, plugin-KV based list store for things like App's and Subscription's.

  • added an initial implementation, based on pluginapi.cluster.Mutex + local tests
  • moved the AppStore to use it

Ticket Link

TODO - several tickets

TODO

  • Add Clean() to the CachedStore API to use in debug clean
  • Document the design
  • Test command that can be executed on bigger clusters
  • Implement data migration
  • Restore MessageHasBeenPosted, separate PR?
  • Alternate, single-writer implementation based OnClusterLeaderChanged + benchmark (ideally in a cluster)
  • Manually test with cluster
  • Move the subscription store

@levb levb added 2: Dev Review Requires review by a core committer Do Not Merge Should not be merged until this label is removed Work In Progress Not yet ready for review labels Nov 15, 2022
@levb levb added this to the v1.3.0 milestone Nov 15, 2022
@hanzei
Copy link
Contributor

hanzei commented Nov 15, 2022

@levb Before I jump into the code: Is there a rough overview of the design of the cached store?

@levb
Copy link
Contributor Author

levb commented Nov 15, 2022

@hanzei not yet, still WIP

in brief - for a given CachedStore (e.g. apps, subscriptions) a single index key in the KV holds a list of (key, data-SHA) and the individual items are stored as keyed, in the prefixed namespace. The in-memory index/cache is a sync.Map. KV writes are from whatever host received the request, protected for exclusivity by cluster.Mutex. Plugin cluster messages are sent to all hosts to notify them of new cache entries (PUT, DELETE).

The upcoming single-writer version will send the same cluster messages, but the writes will be serialized on the master, with no need for clusterMutex.Lock()

@codecov-commenter
Copy link

codecov-commenter commented Nov 15, 2022

Codecov Report

Base: 21.06% // Head: 23.73% // Increases project coverage by +2.66% 🎉

Coverage data is based on head (b285fd3) compared to base (d6253ea).
Patch coverage: 33.23% of modified lines in pull request are covered.

Additional details and impacted files
@@                         Coverage Diff                          @@
##           lev-MM-48793-bot-subs-simplified     #384      +/-   ##
====================================================================
+ Coverage                             21.06%   23.73%   +2.66%     
====================================================================
  Files                                    81       85       +4     
  Lines                                  5473     5579     +106     
====================================================================
+ Hits                                   1153     1324     +171     
+ Misses                                 4181     4091      -90     
- Partials                                139      164      +25     
Impacted Files Coverage Δ
apps/app.go 0.00% <0.00%> (ø)
apps/manifest.go 91.17% <0.00%> (-2.77%) ⬇️
server/config/service.go 0.00% <0.00%> (ø)
server/proxy/bindings.go 33.96% <0.00%> (ø)
server/proxy/enable.go 4.41% <0.00%> (ø)
server/proxy/install.go 0.00% <0.00%> (ø)
server/proxy/invoke_oauth2.go 0.00% <0.00%> (ø)
server/proxy/notify.go 0.00% <0.00%> (ø)
server/proxy/service.go 6.89% <0.00%> (-0.09%) ⬇️
server/proxy/synchronize.go 0.00% <0.00%> (ø)
... and 20 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@hanzei
Copy link
Contributor

hanzei commented Nov 15, 2022

The upcoming single-writer version will send the same cluster messages, but the writes will be serialized on the master, with no need for clusterMutex.Lock()

What are the advantages of that approach vs the current usterMutex.Lock() one?

@levb
Copy link
Contributor Author

levb commented Nov 15, 2022

What are the advantages of that approach vs the current usterMutex.Lock() one?

Hypothetically:

  • Fewer DB queries
  • Better reliability (TL;DR - cluster leader is "server-native", trustworthy, and cheap. cluster.Mutex - homegrown. I specifically do not understand how cluster.Mutex is reset if the locking plugin terminates). The argument about distributing the load - I do not 100% agree with. Specifically, 1/5 consolidating DB writes to a single host may actually achieve better performance; definitely not sub-par unless other overhead (CPU, memory, IO) requires distributed writers.

1/5 I'd much prefer to submit both implementations, with tests to demonstrate the differences.

Copy link
Contributor

@hanzei hanzei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments. I will post my high-level ones in Mattermost for a broader discussion.

server/plugin.go Outdated Show resolved Hide resolved
Data []IndexEntry[T]
}

type Index[T any] map[string]*IndexEntry[T]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does Index need to be part of the API?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, there's no real external API since these are wrapped in AppStore and the likes, but I am making some types public with the intent of go-documenting them.


func MakeCachedStore[T any](name string, api plugin.API, mmapi *pluginapi.Client, logger utils.Logger) (*CachedStore[T], error) {
s := &CachedStore[T]{
name: name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can namebe an empty string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 will resolve. In theory, it could - in practice it should not be.

type cachedStoreEvent[T any] struct {
Key string `json:"key"`
Method string `json:"method"`
SentAt time.Time `json:"sent_at"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is SentAt used for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing ATM, debugging and metrics later.

Comment on lines 121 to 124
prevIndex, err := s.syncFromKV(true)
if err != nil {
return err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the sync with the KV store needed here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just verifies that the in-memory cache is indeed the same as the KV state, based on the index hash; will not reload unless the hashes differ (which they never should)

Copy link
Contributor

@javaguirre javaguirre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some small details I want to discuss, but I have to spend a bit more time reviewing the code.

server/config/service.go Outdated Show resolved Hide resolved
conf.AllowHTTPApps = true
}

conf.AWSAccessKey = os.Getenv(upaws.AccessEnvVar)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not separating code that has different responsibilities? I understand it may be because It's not used anywhere else. It could help for legibility (AWS Cloud config (142-171?), Proxy setup (103-122).

if cachedIndex.hash() != prevPersistedIndex.hash() {
add, change, remove := cachedIndex.compareTo(prevPersistedIndex)
if logWarnings {
s.logger.Warnf("stale cache for %s, rebuilding. extra keys: %v, missing keys:%v, different values for keys:%v", s.name, remove, add, change)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we log the warning and control from the outside if we want to see that level of logging and remove logWarnings?

Co-authored-by: Ben Schumacher <ben.schumacher@mattermost.com>
@levb levb changed the base branch from master to lev-MM-48793-bot-subs-simplified January 31, 2023 01:45
@@ -52,7 +52,7 @@ type service struct {
mattermostConfig *model.Config
}

func MakeService(mm *pluginapi.Client, pliginManifest model.Manifest, botUserID string, telemetry *telemetry.Telemetry, i18nBundle *i18n.Bundle, log utils.Logger) (Service, error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hanzei I took out your logging fix because I think the logging changes in this PR already handle the invalid config use case. I tested with a missing SiteURL, and get the initial logging (ensureBot), followed by the expected activate error (invalid config) logged at INFO, then exit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@levb
Copy link
Contributor Author

levb commented Feb 19, 2023

This PR will be closed, re-submitted as a series of PRs: #457, #458, #459

@levb levb closed this Feb 19, 2023
@mattermost-build mattermost-build removed Work In Progress Not yet ready for review Do Not Merge Should not be merged until this label is removed labels Feb 19, 2023
@hanzei hanzei added Invalid This doesn't seem right and removed 2: Dev Review Requires review by a core committer labels Feb 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Invalid This doesn't seem right
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants