Optimize/cache the packages.json index #81

Open
AaronHolbrook opened this issue Sep 21, 2018 · 24 comments

Comments

@AaronHolbrook

As stated here:

#19 (comment)

This is a pretty big hit performance-wise. Since there's no strong reason NOT to cache this, I vote that we figure out a good static/cached file that gets regenerated any time a package is updated.

@bradyvercher
Member

There are a few places where caching could be implemented and various ways to do it, so I'm open to any suggestions. Caching the output would likely provide the greatest benefit since checksums are generated for every release any time packages.json is loaded, but that wouldn't benefit the admin screen.

A few things that need to be considered:

  • The output could vary based on who is accessing packages.json, so the cache key probably needs to consider the logged-in status and current user/API key
  • How many packages people are managing and how large the packages.json file is getting. I don't have any idea, but I've heard some people are managing 80+ plugins.
  • When to invalidate the cache and how to do that if it's based on the current user?
  • Transient, object cache, or file-based cache? Leave it to the server or something like Batcache?
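If the output does vary per viewer, as the first and third points suggest it might, one way to keep a cache safe is to fold the viewer's identity into the cache key. The following is a minimal PHP sketch; `packages_cache_key()` is a hypothetical helper, and how SatisPress resolves the current user and API key is assumed rather than shown:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not a SatisPress API): derive a packages.json cache
 * key from who is asking, so per-viewer output is never shared across viewers.
 * $user_id is null for anonymous requests (assumption).
 */
function packages_cache_key(?int $user_id, ?string $api_key): string
{
    if ($user_id === null) {
        return 'satispress_packages_anon';
    }

    // Hash the API key so the raw secret never appears in a cache key,
    // and so two API keys belonging to the same user stay separate.
    $key_hash = substr(hash('sha256', (string) $api_key), 0, 16);

    return 'satispress_packages_' . $user_id . '_' . $key_hash;
}
```

The cost is one cache entry per distinct API key. If most clients actually see identical output, keying on the set of packages a viewer is allowed to see would let them share entries.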

It might also be possible to see some gains by caching the result of the directory scan so that doesn't have to be done on each load. That would be easier to invalidate, but wouldn't help with the checksums (I'm guessing that's slow, but I don't have any benchmarks).

Does anyone else have any thoughts?

@AaronHolbrook
Author

We're now at the point where requests time out because of how long it takes to build the packages.json file.

  • I don't think the output should depend on who is accessing packages.json at all; it should be a full representation of the plugins that are in SatisPress.
  • We're managing 100 plugins.
  • I would invalidate the cache any time a plugin is updated.
  • I would use a file-based cache: actually create a real packages.json file.
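A real packages.json file that is rebuilt whenever a plugin changes mostly comes down to three operations: write it safely, read it, and delete it on invalidation. A minimal PHP sketch, assuming the caller already has the generated index as an array; the function names are hypothetical and not part of SatisPress:

```php
<?php
declare(strict_types=1);

// Hypothetical helpers (not SatisPress APIs) for a file-based packages.json cache.

/** Write the index to $path without readers ever seeing a half-written file. */
function write_packages_json(string $path, array $index): void
{
    $json = json_encode($index, JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR);
    $tmp  = $path . '.' . bin2hex(random_bytes(6)) . '.tmp';

    if (file_put_contents($tmp, $json) === false) {
        throw new RuntimeException("Could not write $tmp");
    }
    // rename() swaps the file in one step when both paths are on the same
    // filesystem, so readers get either the old index or the new one.
    if (!rename($tmp, $path)) {
        @unlink($tmp);
        throw new RuntimeException("Could not move $tmp to $path");
    }
}

/** Return the cached index, or null if it is missing (i.e. invalidated). */
function read_packages_json(string $path): ?array
{
    // @ hides the warning when the file does not exist.
    $json = @file_get_contents($path);
    if ($json === false) {
        return null;
    }
    $data = json_decode($json, true);
    return is_array($data) ? $data : null;
}

/** Invalidate on plugin/theme update: delete, and let the next request rebuild. */
function invalidate_packages_json(string $path): void
{
    if (is_file($path)) {
        @unlink($path);
    }
}
```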

@bradyvercher
Member

If you're at the point where the request is timing out, it sounds like caching may only be part of the solution. Wouldn't the request time out when the cache goes stale and prevent any output?
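A common mitigation for that stale-cache problem (a general technique swapped in here, not something this thread shows SatisPress doing) is to keep serving the stale copy while exactly one request rebuilds it under a lock. A plain-PHP sketch; `get_packages_index()` and the `$build` callback are hypothetical, and because `flock()` is advisory, every writer has to go through the same lock file:

```php
<?php
declare(strict_types=1);

/**
 * Sketch: serve a stale packages.json while a single request rebuilds it.
 * $build stands in for whatever generates the index (hypothetical).
 */
function get_packages_index(string $path, int $ttl, callable $build): array
{
    $is_fresh = static function () use ($path, $ttl): bool {
        clearstatcache(true, $path);
        return is_file($path) && (time() - filemtime($path)) < $ttl;
    };

    if ($is_fresh()) {
        return json_decode(file_get_contents($path), true);
    }

    $has_stale = is_file($path);
    $lock = fopen($path . '.lock', 'c');
    if ($lock === false) {
        throw new RuntimeException("Cannot open lock file for $path");
    }

    // With a stale copy on disk, don't wait: if another request is already
    // rebuilding (the non-blocking lock fails), return the stale copy.
    if (!flock($lock, $has_stale ? LOCK_EX | LOCK_NB : LOCK_EX)) {
        fclose($lock);
        return json_decode(file_get_contents($path), true);
    }

    try {
        // Another request may have finished a rebuild while this one waited.
        if ($is_fresh()) {
            return json_decode(file_get_contents($path), true);
        }
        $index = $build();
        $tmp = $path . '.tmp';
        file_put_contents($tmp, json_encode($index));
        rename($tmp, $path); // atomic swap: readers never see a partial file
        return $index;
    } finally {
        flock($lock, LOCK_UN);
        fclose($lock);
    }
}
```

The request that wins the lock still waits for the whole build; running the build from a cron job or WP-CLI command instead would take it off the request path entirely.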

I've had a few people express interest in varying the output based on logged-in status and the identity of the user/client. Doing that is possible right now, but if the cache doesn't account for it, it removes a lot of flexibility, so I think it really does need to be considered.

What you're requesting is definitely the simplest solution and would probably be adequate for most people, so maybe a basic add-on/extension plugin would work?

@AaronHolbrook
Author

I think the real win here would be batch or chunk processing or optimizing the build of the json file in some way.

I think caching still makes sense, though, so the file is available at a moment's notice and overall build/composer install time goes down.

@bradyvercher
Member

One reason I asked about the number of plugins being managed is because using provider-includes and providers-url could be one way to try to optimize the build if people are managing a lot of packages.
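With `provider-includes` and `providers-url`, Composer downloads a small root file and a provider list, then only the per-package files it actually needs. A rough sketch of splitting an index that way, assuming `$packages` maps `vendor/name` to its version metadata and that the repository is served under `/satispress/` (both assumptions). This is the Composer 1-era provider format; Composer 2 introduced `metadata-url`, which takes priority over `provider-includes`/`providers-url` when both are present, so treat this as illustrative:

```php
<?php
declare(strict_types=1);

/**
 * Sketch: split one large index into provider files.
 * Returns relative path => file contents for every file to publish.
 */
function build_provider_files(array $packages, string $base_path = '/satispress/'): array
{
    $files     = [];
    $providers = [];

    ksort($packages); // stable ordering => stable hashes between builds

    foreach ($packages as $name => $versions) {
        $json = json_encode(['packages' => [$name => $versions]], JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR);
        $hash = hash('sha256', $json);
        // File name follows providers-url: %package% => name, %hash% => sha256.
        $files['p/' . $name . '$' . $hash . '.json'] = $json;
        $providers[$name] = ['sha256' => $hash];
    }

    $list = json_encode(['providers' => $providers], JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR);
    $files['providers.json'] = $list;

    $files['packages.json'] = json_encode([
        'packages'          => [],
        'providers-url'     => $base_path . 'p/%package%$%hash%.json',
        'provider-includes' => [
            'providers.json' => ['sha256' => hash('sha256', $list)],
        ],
    ], JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR);

    return $files;
}
```

Because each file name embeds its own SHA-256, a changed package gets a new URL while unchanged files can be cached indefinitely. This doesn't remove the checksum work on its own, but only a changed package's file needs rebuilding.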

I agree caching by default would be worthwhile, but I do want to make sure it doesn't limit flexibility. I don't have a lot of time to dedicate to this, so I may look into putting together a basic add-on for now, but I'm happy to review any pull requests or make suggestions.

@Tawmu

Tawmu commented Mar 23, 2020

I realise this issue is a couple of years old now, but has anyone managed to speed up SatisPress at all? We're managing ~85 plugins with it. SatisPress has been huge for us, but it's getting very slow at this point.

@tyrann0us
Contributor

@Tawmu, maybe this is useful for you: #117 (comment).

We still haven't tried it on our live SatisPress instance but it should work in theory and improve performance significantly.

@bradyvercher
Member

@Tawmu I'm stoked to hear that SatisPress has been helpful! There should be several ways to speed it up depending on how y'all are using it. If you're interested in sponsoring improvements, feel free to reach out to discuss what you need.

@timnolte

timnolte commented Aug 14, 2020

In regards to performance and reliability, I just floated an idea to our DevOps/SysAdmin team: hosting the .zip files in an S3 bucket, with SatisPress presenting the S3 bucket URL as the source of the .zip files instead of itself. I don't see any documentation on doing anything like this, or any available hooks/filters to try it. This would move the resource requirements off the SatisPress server, since requests for the packages themselves, especially those listed in composer.lock files, would never hit the SatisPress server.

I recognize that what I'm talking about is different from the performance of serving up many plugins in packages.json; I'm looking a bit farther down the path. Maybe it warrants a new issue.

@bradyvercher
Member

@timnolte The storage layer in SatisPress was abstracted so that you could use other services; you would just need to write an adapter that implements the Storage interface. I wrote the majority of an adapter for S3 a while back, but it's not totally complete.

I don't think it'd have a noticeable impact on performance one way or another unless the SatisPress server was being used by a large internal team or publicly. I think the main problem is that as the number of plugins and cached releases grows, it takes more and more time to generate packages.json.

@timnolte

@bradyvercher right, I recognized in my post revision that this topic is more about packages.json. We haven't quite seen the packages.json issue yet. Our issue has been the SatisPress server somehow going down and failing all sorts of builds because they can't pull the plugins.

I did start poking around and saw the Storage interface. However, it seems like there isn't any sort of hook to provide a custom storage adapter which is what I think would be required here:

$container['storage.packages'] = function( $container ) {
	$path = path_join( $container['storage.working_directory'], 'packages/' );
	return new Storage\Local( $path );
};

Perhaps there is another way to provide an adapter that I'm just not finding.

@retrorism

Hi @bradyvercher, first of all, thank you for all your work on SatisPress, it's a vital part of our CI/CD workflow and we wouldn't know what we'd do without it! I was happy to see that my colleagues @tyrann0us and @widoz have already contributed to SatisPress and even this discussion. (@tyrann0us is no doubt going to laugh at the length of this reply and seeing it's from me, so be it 😇)

The SatisPress instance we run has grown to provide access to ~175 plugins and themes. I should add that this count could probably be brought down by 20 to 30% if we did some proper housekeeping. Still, a number below 100 seems like a stretch. In our case, the generated JSON currently weighs over 750KB.

We're pretty sure that caching the output for the /satispress/packages.json endpoint would help us make our Composer-based CI/CD processes more performant and reliable. After a recent migration to a somewhat leaner server with more control/visibility over logs and performance, we could more easily establish that the endpoint is responsible for a good chunk of the load on the server.

As @tyrann0us already pointed out referring to this comment, switching to a multisite approach to manage 'categories' of plugins helps us with the WordPress dashboard performance.
Assuming that it's not possible to have a separate Composer endpoint for each subsite (or is it?), without caching, our SatisPress instance is the bottleneck when one of our developers runs a composer require or composer install command for a package that relies on a third-party plugin.

Our DevOps person's first attempt at coming up with a caching mechanism independent of WordPress/PHP failed today.
We'd like to help out to bring the caching to SatisPress itself, but there's one thing I don't quite understand yet:

In your opening reply, you wrote

The output could vary based on who is accessing packages.json, so the cache key probably needs to consider the logged in status and current user/API key
and
When to invalidate the cache and how to do that if it's based on the current user?

How will the output vary apart from the fact that user / API key combos that aren't registered simply are denied access?
I apologize if this is something obvious I missed.

@perforsberg77

Hi!

Any progress on this issue, or any solution to suggest? SatisPress with ~100 plugins runs really slowly for us. The multisite approach looks promising if you think of performance, but I don't like the idea of having several SatisPress repos in the composer.json file, and it's confusing to know which plugin is in what repo.

Has anyone found a way around this?

@tyrann0us
Contributor

Hi @perforsberg77

The multisite approach looks promising if you think of performance, but I don't like the idea of having several SatisPress repos in the composer.json file, and it's confusing to know which plugin is in what repo.

This is a misunderstanding; SatisPress is network-active, so you only have one repository (one vendor name), even if the underlying WordPress installation is a multisite.

@perforsberg77

Hi @tyrann0us!

This is a misunderstanding; SatisPress is network-active, so you only have one repository (one vendor name), even if the underlying WordPress installation is a multisite.

Aha ok, so I install SatisPress on the main site so it will be network-active and active on all underlying sites. Then I create underlying sites and name them as I want (for example by vendor, as you did in some test), and install plugins/themes in those underlying sites (max 10-20 plugins per site). SatisPress on the main site will keep track of all of them, so I only need one repo in the composer.json file. This will also solve the problem of plugins not being compatible with each other: just install them on different underlying sites. Great!

No performance issues with too many underlying sites so far?

Do you recommend to convert existing SatisPress installation to multisite, or start from scratch with a new one?

@timnolte

@perforsberg77 for clarity: with a WordPress multisite, plugins are installed and managed at the network level, not the site level. The performance aspect is just having plugins active on a per-site basis, in order to trigger automatic updates and such.

@perforsberg77

Aha ok, thanks @timnolte!

@perforsberg77

@perforsberg77 for clarity: with a WordPress multisite, plugins are installed and managed at the network level, not the site level. The performance aspect is just having plugins active on a per-site basis, in order to trigger automatic updates and such.

I have all plugins at the network level and then 12 subsites where I have activated 5-10 plugins per subsite. My problem now is that only themes/plugins activated on the main site are automatically updated, not any theme/plugin activated on a subsite. How can I enable auto updates for all network installed plugins that are not network enabled, but subsite activated?

@tyrann0us
Contributor

How can I enable auto updates for all network installed plugins that are not network enabled, but subsite activated?

Please see this comment #117 (comment):

The only difference to the current setup would be that the update cronjobs need to run for each site.

In practice, this means that you need to set up a bash script like this:

#!/bin/bash

LOG_FILE="/path/to/logs/auto-update.log"

echo "#### Upgrading WordPress core #####" >> "$LOG_FILE"
wp core update &>> "$LOG_FILE"

# Iterate over every site URL in the network so updates and cron run per site.
for SITE in $(wp site list --field=url 2>/dev/null); do
    echo "#### Processing ${SITE} #####" >> "$LOG_FILE"
    wp --url="${SITE}" theme update --all &>> "$LOG_FILE"
    wp --url="${SITE}" plugin update --all &>> "$LOG_FILE"
    wp --url="${SITE}" cron event run --due-now &>> "$LOG_FILE"
done

Then, execute this script from a cronjob. We have the cronjob set to run every four hours.

After having this setup in place for quite some time, we noticed that some plugins sometimes don't get updated. We haven't checked yet what exactly is causing this, or whether their not being network-active could be the reason. But in general, this approach works fine.

@kimmenbert

Jumping in on this. We have been caching packages.json for some time now, and I'm looking into a better approach: I'm thinking Cloudflare, nginx caching, etc.

As with all caching, the issue is how and when to invalidate. The simplest way is of course to just delete the transient/file at a fixed interval x, but then you have to wait up to x after adding/updating a plugin.

The better way would be to invalidate on plugin install or update; I'll investigate that some more.

The current solution I'm looking into uses FastCGI caching, with the following in the nginx config (thank you ChatGPT 🙈).

FYI, our satispress has 315 plugins 😳

http {
    fastcgi_cache_path /var/cache/nginx/satispress levels=1:2 keys_zone=satispresscache:10m inactive=60m use_temp_path=off;
}

server {
    location ~* ^/satispress/packages\.json$ {
        fastcgi_cache satispresscache;
        fastcgi_cache_valid 200 10m;
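        # Include the Authorization header in the key so a response generated
        # for one API key is never served to a different (or anonymous) client.
        fastcgi_cache_key $scheme$request_uri$http_authorization;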
        add_header X-Cache-Status $upstream_cache_status;

        fastcgi_pass unix:/run/php/php8.3-fpm.sock;
        fastcgi_index index.php;
        include fastcgi_params;

        fastcgi_param SCRIPT_FILENAME $document_root/index.php;
        fastcgi_param QUERY_STRING $query_string;

        fastcgi_ignore_headers Cache-Control Expires Set-Cookie;
    }
}

The previous solution involved changing the SatisPress plugin itself (SatisPress\Transformer\ComposerRepositoryTransformer::transform()):

$upload_dir = wp_upload_dir();
$upload_dir = $upload_dir['basedir'] . '/satispress';

// Serve the cached file while the transient says it is still valid.
if ( false !== get_transient( 'satispress_packages' ) && file_exists( $upload_dir . '/packages.json' ) ) {
	$items = json_decode( file_get_contents( $upload_dir . '/packages.json' ), true );

	if ( ! empty( $items ) ) {
		return [ 'packages' => $items ];
	}
}

And

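if ( ! file_exists( $upload_dir ) ) {
	wp_mkdir_p( $upload_dir );
}

$packages_json = json_encode( $items );

// Write the file first, then mark it valid for 24 hours, so the transient
// never points at a file that hasn't been written yet.
if ( false !== file_put_contents( $upload_dir . '/packages.json', $packages_json ) ) {
	set_transient( 'satispress_packages', 'load-file', 60 * 60 * 24 );
}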

@aaronware
Contributor

aaronware commented Dec 30, 2024

@kimmenbert have you explored overriding 'transformer.composer_repository'? I'm using a similar technique, based on the example from @bradyvercher, to override the storage to use R2, so in theory you could do the same here. I haven't had the chance to circle back, but below is a super quick example; with it you wouldn't need to edit the main plugin.

/**
 * Hook into SatisPress to register an extended ComposerRepositoryTransformer.
 */
add_action( 'satispress_compose', function( $satispress, $container ) {
	// Override the repository transformer service (not 'storage.packages').
	$container['transformer.composer_repository'] = function( $container ) {
		return new YourExtendedComposerRepositoryTransformer(
			$container['transformer.composer_package'],
			$container['release.manager'],
			$container['version.parser'],
			$container['logger']
		);
	};
}, 10, 2 );

See my original here #216

@kimmenbert

@aaronware No, I have not. My latest attempt to cache packages.json is not doing too well either. Generating it when the cache is invalidated is just too heavy sometimes.

It currently takes 40-50 seconds to generate packages.json, so any composer command is really slow. We really need to cache it and will continue testing different approaches here.

@bradyvercher
Member

@kimmenbert @aaronware When I looked into this a couple years ago, overriding the ComposerRepositoryTransformer was one good option. I guess it depends on how big the output is whether you'd want to store it in memory, a file, or somewhere else.

Another option is to do caching in the SatisPress\Route\Composer::handle() method. That class could be extended or overridden with a custom implementation.

Invalidating the cache would need to be done in a couple of places (the actual implementation would look different, this is just a quick example):

/**
 * Clear the SatisPress cache.
 */
function satispress_clear_cache() {
	// Invalidate the cache.
}

// Clear the cache when a package is added or removed from the repository.
add_action( 'update_option_satispress_plugins', 'satispress_clear_cache' );
add_action( 'update_option_satispress_themes', 'satispress_clear_cache' );

Then you would also need to clear the cache when a new version is archived by overriding the ReleaseManager class.

use SatisPress\Release;
use SatisPress\ReleaseManager;

class CustomReleaseManager extends ReleaseManager {
	public function archive( Release $release ): Release {
		if ( ! $this->exists( $release ) ) {
			satispress_clear_cache();
		}

		return parent::archive( $release );
	}
}
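One way to fill in the `satispress_clear_cache()` placeholder when the cache varies per user: rather than finding and deleting every per-user entry, include a generation counter in every cache key and bump it on invalidation, so all old entries stop matching at once. A plain-PHP sketch; the function names and the counter file are hypothetical, and inside WordPress an option could hold the counter instead:

```php
<?php
declare(strict_types=1);

/** Current generation; 0 if the counter has never been bumped. */
function cache_generation(string $counter_file): int
{
    // @ hides the warning when the counter file doesn't exist yet.
    $value = @file_get_contents($counter_file);
    return $value === false ? 0 : (int) $value;
}

/** Invalidate every cached entry by moving to the next generation. */
function bump_cache_generation(string $counter_file): int
{
    $lock = fopen($counter_file . '.lock', 'c');
    if ($lock === false) {
        throw new RuntimeException("Cannot open lock for $counter_file");
    }
    flock($lock, LOCK_EX); // serialize concurrent bumps
    try {
        $next = cache_generation($counter_file) + 1;
        $tmp  = $counter_file . '.tmp';
        file_put_contents($tmp, (string) $next);
        rename($tmp, $counter_file); // readers see the old or new value, never an empty file
        return $next;
    } finally {
        flock($lock, LOCK_UN);
        fclose($lock);
    }
}

/** Append the generation to any base key (e.g. a per-user key). */
function generational_key(string $base, string $counter_file): string
{
    return $base . ':g' . cache_generation($counter_file);
}
```

Old entries are simply never read again; stored as expiring transients or in a periodically swept cache directory, they age out on their own.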

@kimmenbert

Thanks!

I did some profiling today and found that most of the work was being done in SatisPress\Storage\Local::checksum(), so now I'm attempting a solution where I cache the file hashes.

In our case, this runs 4,333 times, and that number will only increase going forward unless we clean up plugin versions that aren't being used.

This changes packages.json load time from ≈ 50 seconds to ≈ 5 seconds
I clear and build up this cache twice per hour (might not be needed to do it this often)

I haven't verified it, but I guess this would also make new versions and new plugins appear immediately, instead of only after the next packages.json cache flush.

I've never seen the shasum of an existing package change, so I think I should be fine doing this, but I'll let this run for a while and see if any colleagues hit issues.

For now I have hardcoded it into the plugin itself, so the checksum() func now looks like this:

public function checksum( string $algorithm, string $file ): string {
	$filename = $this->get_absolute_path( $file );

	if ( ! file_exists( $filename ) ) {
		throw FileNotFound::forInvalidChecksum( $filename );
	}

	$cache_key = 'file_hash_' . md5( $algorithm . $filename . filemtime( $filename ) );

	$cached_hash = get_transient( $cache_key );
	if ( $cached_hash !== false ) {
		return $cached_hash;
	}

	$hash = hash_file( $algorithm, $filename );
	set_transient( $cache_key, $hash );

	return $hash;
}

I'm not setting an expiration on the transients because transients without one are autoloaded, which saves me the thousands of DB queries I would otherwise make.
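A variation on the same idea keeps every hash in a single JSON index file, so a request reads one file instead of thousands of rows. It is sketched here as a hypothetical standalone class (`ChecksumCache`), not a patch to `SatisPress\Storage\Local`, and it makes the same assumption as the change above: an archive whose size and modification time are unchanged has an unchanged hash. It requires PHP 8.0+ (constructor property promotion):

```php
<?php
declare(strict_types=1);

/** Hypothetical memo of hash_file() results, persisted as one JSON file. */
final class ChecksumCache
{
    /** @var array<string, array{stamp: string, hash: string}> */
    private array $entries = [];
    private bool $dirty = false;

    public function __construct(private string $indexPath)
    {
        if (is_file($indexPath)) {
            $data = json_decode((string) file_get_contents($indexPath), true);
            $this->entries = is_array($data) ? $data : [];
        }
    }

    public function checksum(string $algorithm, string $filename): string
    {
        clearstatcache(true, $filename);
        $size  = filesize($filename);
        $mtime = filemtime($filename);
        if ($size === false || $mtime === false) {
            throw new RuntimeException("Cannot stat $filename");
        }
        $stamp = $size . ':' . $mtime;
        $key   = $algorithm . ':' . $filename;

        // Reuse the stored hash only while size and mtime are unchanged.
        if (($this->entries[$key]['stamp'] ?? null) === $stamp) {
            return $this->entries[$key]['hash'];
        }

        $hash = hash_file($algorithm, $filename);
        if ($hash === false) {
            throw new RuntimeException("Cannot hash $filename");
        }
        $this->entries[$key] = ['stamp' => $stamp, 'hash' => $hash];
        $this->dirty = true;

        return $hash;
    }

    /** Persist new entries; call once after building packages.json. */
    public function save(): void
    {
        if ($this->dirty) {
            file_put_contents($this->indexPath, json_encode($this->entries), LOCK_EX);
            $this->dirty = false;
        }
    }
}
```

Concurrent builds can overwrite each other's index, which only costs a recomputation rather than a wrong hash, because every entry is checked against the file's current size and mtime before it is reused.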

10 participants