Performance and scalability questionnaire

When designing or developing a new feature, or improving an existing one, use this set of questions to identify potential performance problem areas.

Task execution time and frequency

How much time does it take to execute the task and how does it impact users?

The first thing to know is how much time it takes to execute the task. One can use log information, or a timer when executing a command. After this, we should analyze how often the feature is used and the impact it could have on user experience.
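As a minimal sketch of the timer approach, the snippet below measures the wall-clock time of a block of code. The names `timed` and `example_task` are made up for illustration, and the workload is only a placeholder for the real task.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def example_task():
    # Placeholder workload standing in for the real task being measured.
    return sum(i * i for i in range(100_000))


timings = {}
with timed("example_task", timings):
    example_task()

print(f"example_task took {timings['example_task']:.4f} s")
```

A single run on a developer machine is only a rough indication; repeating the measurement with realistic data sizes gives a more useful number.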

Note: High execution time is not always a problem. If a task takes 1 minute but is only executed once, it could be acceptable. On the other hand, if a task takes 1 second but is executed on each Web UI page request, it will be a performance problem.

Note: Some tasks will need to be executed per-client, or per-channel, or per-product, or per some other quantity we can estimate. Keep in mind:

thousands of clients are normal these days. Tens of thousands is possible - depending on the feature it might or might not be relevant for users with an infrastructure that large
more than a thousand channels is becoming commonplace
tens of products is considered normal
we rarely see more than a few tens of users, rarely more than ten concurrently active
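The two notes above boil down to a back-of-the-envelope estimate: time per execution multiplied by how often, and for how many items, the task runs. The sketch below illustrates that arithmetic; the function name, the request count and the per-client time are assumptions chosen only for the example.

```python
def total_seconds(seconds_per_run, runs_per_trigger, triggers_per_day=1):
    """Estimate the cumulative time a task costs per day."""
    return seconds_per_run * runs_per_trigger * triggers_per_day


# A 1-minute task that is executed once a day.
rare_slow = total_seconds(60, runs_per_trigger=1)

# A 1-second task executed on every Web UI page request
# (5,000 requests per day is an assumed figure).
frequent_fast = total_seconds(1, runs_per_trigger=1, triggers_per_day=5_000)

# A 0.5-second per-client task on an installation with 10,000 clients
# (both figures are assumptions).
per_client = total_seconds(0.5, runs_per_trigger=10_000)

print(rare_slow, frequent_fast, per_client)
```

Under these assumed numbers, the "fast" tasks cost far more in total than the "slow" one, which is the point the notes make.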

Example 1

A feature that auto-generates a bootstrap repository for each product at the end of repo-sync (values and considerations are just examples to make a point).

How long does it take to execute the task?

In a local test, with one product it takes ~30 seconds every time to create the bootstrap repository.

Will the time be the same for all products?

It will depend on the product size and how many packages are needed for the initial bootstrap. So it can be higher.

How often is the feature used/called?

It will execute once per channel each time spacewalk-repo-sync is called. spacewalk-repo-sync can be called when a new channel is added, and users then want to use the channel as soon as possible.

How many channels do customers normally have?

Hundreds.

Will it impact user experience?

spacewalk-repo-sync is a task that takes time to perform. Several optimizations were made over time to improve it. Improving performance here will have an impact on user experience.

Conclusion

This feature will probably have an impact on the user's experience and needs at least some attention to its performance impact.

Example 2

Feature: Load all users from LDAP at start time and ensure they exist in the database.

How long does it take to execute the task?

In a local test, with 100 LDAP users, it takes ~5 seconds at startup time. With 1,000 LDAP users it will take ~50 seconds.

How often is the feature used/called?

Only at startup time.

Will it impact user experience?

It will impact startup time. The documentation could mention that using this feature increases startup time.

Conclusion

We should be safe to merge it as is.

Database interactions

Is the database, and in particular Hibernate, used properly?

Are indexes in place?

We almost universally want indexes on any column that appears in a WHERE condition, and we definitely want them on any key column or set of columns. If your change adds tables or columns, are indexes added as well?
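One way to check whether a query benefits from an index is to look at the query plan before and after creating it. The sketch below uses Python's built-in sqlite3 module as a stand-in for the real database; the table, column and index names are hypothetical, and the plan output of other databases looks different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE client_channel (client_id INTEGER, channel_id INTEGER)")
conn.executemany(
    "INSERT INTO client_channel VALUES (?, ?)",
    [(i, i % 100) for i in range(10_000)],
)


def query_plan(connection, sql, params=()):
    """Return the planner's description of how it would run `sql`."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


sql = "SELECT client_id FROM client_channel WHERE channel_id = ?"

# Without an index the planner has to scan the whole table.
plan_before = query_plan(conn, sql, (42,))

conn.execute(
    "CREATE INDEX idx_client_channel_channel ON client_channel (channel_id)"
)

# With the index in place the planner can search it instead.
plan_after = query_plan(conn, sql, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The same kind of check can be done on the production database engine with its own EXPLAIN facility.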

Is the Hibernate cache going to grow too large?

Every time a Hibernate object is created as a result of some query, it enters a collection called the Hibernate cache. We are going to face problems if this collection contains several thousand objects or more. So if your code mass-loads multiple objects per client, and users might operate on many or all of their clients, that might be problematic.

Is Hibernate code being composed?

Looping over code that uses Hibernate to interact with the database is usually a performance risk - Java coders are used to composing code this way, but pushing computation to the SQL layer is normally much faster. Whenever you write a loop around something that uses Hibernate, consider the maximum number of loops.
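The difference between looping in application code and pushing the computation to SQL can be illustrated without Hibernate. The sketch below uses Python's built-in sqlite3 module as a stand-in; the `client` table, its columns and the `CountingConnection` wrapper are hypothetical and exist only to count how many statements each approach sends.

```python
import sqlite3


class CountingConnection:
    """Thin wrapper that counts the statements sent to the database."""

    def __init__(self, connection):
        self.connection = connection
        self.statements = 0

    def execute(self, sql, params=()):
        self.statements += 1
        return self.connection.execute(sql, params)


raw = sqlite3.connect(":memory:")
raw.execute(
    "CREATE TABLE client (id INTEGER PRIMARY KEY, channel_id INTEGER, "
    "outdated_packages INTEGER)"
)
raw.executemany(
    "INSERT INTO client VALUES (?, ?, ?)",
    [(i, i % 10, i % 7) for i in range(1_000)],
)


def total_outdated_looping(db, channel_id):
    # One query to list the clients, then one more query per client.
    ids = [
        row[0]
        for row in db.execute(
            "SELECT id FROM client WHERE channel_id = ?", (channel_id,)
        )
    ]
    total = 0
    for client_id in ids:
        (count,) = db.execute(
            "SELECT outdated_packages FROM client WHERE id = ?", (client_id,)
        ).fetchone()
        total += count
    return total


def total_outdated_in_sql(db, channel_id):
    # A single query; the database does the aggregation.
    (total,) = db.execute(
        "SELECT COALESCE(SUM(outdated_packages), 0) FROM client "
        "WHERE channel_id = ?",
        (channel_id,),
    ).fetchone()
    return total


looping_db = CountingConnection(raw)
sql_db = CountingConnection(raw)

looping_total = total_outdated_looping(looping_db, 3)
sql_total = total_outdated_in_sql(sql_db, 3)

print(looping_total == sql_total)
print(looping_db.statements, sql_db.statements)
```

Both functions return the same result, but the looping version issues one statement per client in the channel plus one to list them, so its cost grows with the number of clients.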

Is lazy loading handled correctly?

When loading an object, Hibernate might eager-load any referenced objects or wait to load them (with an additional SQL command) only if and when they are used later. In case you need all referenced objects anyway, it's best to configure Hibernate for eager loading.

Is explicit locking used?

Explicit locking at the database level is seldom really needed. Use with care because it carries a higher chance of deadlocks.

Is explicit committing used?

Committing happens at the end of each Web UI HTTP request, XMLRPC API call or Taskomatic task automatically. Committing in other places is normally discouraged and not needed, unless there are very specific reasons.

Salt interactions

Every time we call Salt some performance costs are incurred, and if the Salt call actually targets minions we have to keep in mind network delays, potentially slow minions and dropped connections.

How many Salt calls are being performed?

Whenever Salt calls have to be performed per-minion we are facing a potential performance problem, unless the particular use case is exclusively about a few minions. We should strive to use Salt calls targeting multiple minions in one shot whenever possible.
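The sketch below illustrates why targeting multiple minions in one call matters. It does not use the real Salt API: `FakeSalt` is a hypothetical stand-in that only counts how many calls are made and pretends every minion answers.

```python
class FakeSalt:
    """Hypothetical stand-in for a Salt client; it only counts calls."""

    def __init__(self):
        self.calls = 0

    def run(self, targets, function):
        # Each invocation models one round trip, however many minions
        # are targeted.
        self.calls += 1
        return {minion: True for minion in targets}


minions = [f"minion-{n}" for n in range(500)]

# One call per minion: the number of calls grows with the number of minions.
per_minion = FakeSalt()
per_minion_results = {}
for minion in minions:
    per_minion_results.update(per_minion.run([minion], "test.ping"))

# One call targeting all minions at once.
batched = FakeSalt()
batched_results = batched.run(minions, "test.ping")

print(per_minion.calls, batched.calls)
```

Both approaches collect the same results, but the per-minion loop pays the fixed cost of a call 500 times in this example, while the batched version pays it once.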

When are Salt calls performed?

Blocking the UI while waiting for one or more minions to come back with a result is usually not ideal, while making calls in the background is normally ok.

Actions

Many pieces of functionality are exposed via Actions to users (appearing under "Schedule" in the main menu).

Is only one Action created, targeting multiple clients?

Action handling can have a significant cost if one Action is created per targeted client. The cost is much lower if only one Action (or a small set of Actions) is created, each targeting a potentially large number of clients.
