
Server performance issues #258

Open
connorhsm opened this issue Dec 17, 2024 · 5 comments
Labels: bug, help wanted


connorhsm commented Dec 17, 2024

A continuation of #151

See the past 14-day view of server metrics:
[screenshot: 14-day metrics]

  • Server configuration: 2 vCPU + 4 GB memory.
  • Note the multiple issues below:

1. CPU usage (and thus load) spiking and staying at a high baseline

This is a long-standing issue and is of lesser concern than point 2. It typically rears its head roughly two weeks after a game server start: the update (and restart) preceding this occurrence was on December 3rd, and the spiking began around December 11th. The usage is caused directly by the game server process, which sits at 100% on a single core, hence the ~50% overall baseline on a 2 vCPU machine.

There isn't any measurable impact on player experience when this issue appears.

This can also be seen via htop (screenshot below). Oddly, the process reports an uptime of 8 days, which lines up roughly with when the spiking began. Coincidence?
[screenshot: htop output]
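Next time the spike appears, it may be worth checking which thread is pinning the core. A minimal sketch with standard tools (the PID here is illustrative):

```sh
# Per-thread CPU usage for the game server process (PID 12345 is illustrative).
top -H -p 12345

# Or sample per-thread usage once per second with pidstat from the sysstat package.
pidstat -t -p 12345 1
```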

2. Memory usage continually marching up

Memory usage has always been a slight concern, though, as far as I recall, we first started having significant problems in the lead-up to the most recent wipe (3rd November). We put this down to the map size and waited for the wipe. The rise in memory early in this map's lifetime was concerning; we hoped it would level out, but it appears that is not the case.

This inevitably causes the host machine to run out of memory and kill the game server process. In previous testing, adding swap appeared to mitigate this to some degree. Suspected memory leak; now to find it 🙃
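To put rougher numbers on the growth between metric snapshots, a minimal sketch that logs the server's resident memory periodically (the process name is a placeholder):

```sh
#!/bin/sh
# Log a timestamped RSS sample (in KiB) for the game server process every 5 minutes.
# "game-server" is a placeholder for the real process name.
while true; do
  pid=$(pgrep -o game-server)
  echo "$(date -Is) $(ps -o rss= -p "$pid")" >> rss.log
  sleep 300
done
```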

Action

  • I've taken a backup of the live server and will run it without any player load on a smaller server configuration (1 vCPU, 1 GB). I'll let this run for at least the remainder of the year to allow time for the CPU issue to appear, but will focus on debugging memory.
  • I'll add 2 GB of swap (sketch below), given memory appears to be growing at roughly 5 percentage points per day (currently at 80%).
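For reference, a standard Linux swap file setup along these lines should do it (the path is illustrative):

```sh
# Create and enable a 2 GB swap file (path is illustrative).
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# Persist across reboots.
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
```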
connorhsm added the bug and help wanted labels on Dec 17, 2024
connorhsm self-assigned this on Dec 17, 2024
connorhsm commented:

Here's a Valgrind dump capturing the server startup, using a copy of the current map. No players joined during this run.
valgrind-out.txt
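For reproducibility, a log like this can be produced with an invocation along these lines (the binary name and launch arguments below are placeholders):

```sh
# Run the server under Valgrind's leak checker and write the report to valgrind-out.txt.
# ./game-server and its arguments are placeholders for the real launch command.
valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes \
         --log-file=valgrind-out.txt ./game-server --map current-map-copy
```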

connorhsm commented:

Effects of enabling swap: higher disk usage and lower memory usage. I didn't expect it to be used so much.
[screenshots: disk and memory usage after enabling swap]
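For anyone checking the same thing directly on the host, standard tools give the numbers:

```sh
# Current RAM and swap usage in human-readable units.
free -h

# Ongoing paging activity every 5 seconds; the si/so columns are swap-in/swap-out.
vmstat 5
```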

connorhsm commented:

Two weeks later, we're back at 86% memory usage, with the 2 GB of swap at 100%. I'll be adding another 2 GB of swap (sketch below) to prolong uptime and debug further.
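Since the first swap file is already active, the extra 2 GB goes in a second file, roughly like this (filenames are illustrative):

```sh
# Add a second 2 GB swap file alongside the existing one (filenames are illustrative).
sudo fallocate -l 2G /swapfile2
sudo chmod 600 /swapfile2
sudo mkswap /swapfile2
sudo swapon /swapfile2
swapon --show   # confirm both swap areas are active
```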

The separate test server, running with no players, has not produced any results. I'll consider running a test with real players and observing it with Valgrind.
[screenshot: memory and swap metrics]

connorhsm commented:

Immediate reduction in memory usage (the new swap was taken up).
[screenshot: memory and swap usage after adding more swap]

connorhsm commented:

Looks like I was initially wrong about the test server results: Valgrind did find a notable memory leak, which should be handled soon, as linked above.
