
gnocchi-metricd service RAM usage increasing #1294

Open
aleksei-mv opened this issue Feb 7, 2023 · 2 comments
@aleksei-mv


Which version of Gnocchi are you using

Affected Gnocchi versions: 4.4.1-4.4.2
Kolla-based OpenStack Yoga installation. The gnocchi_api and gnocchi_metricd containers are built from source.

How to reproduce your problem

  1. Container-based (Kolla) OpenStack Yoga installation
  2. Gnocchi version: 4.4.1-4.4.2
  3. Redis as incoming storage
  4. S3 as persistent storage
  5. Ceilometer as metric collector

What is the result that you get

  1. ~800 MB RAM usage per metricd worker, and growing.
  2. ~2 GB of additional RAM usage per 24 hours with a 12-worker configuration

What is the result that you expected

~150 MB RAM usage per metricd worker, as mentioned in issue #606

Additional info

Hi, everyone!

I ran into a problem where my Gnocchi installation's RAM usage continuously increases after the service starts. Specifically, I found that the problem is in the metricd module.
A 12-worker installation starts using ~2 GB more RAM daily. The RAM is freed after a service restart, but fills up again afterwards.

Here is cAdvisor telemetry for the gnocchi-metricd container:

  1. RAM usage a few hours after service start: [screenshot]
  2. RAM usage after a few days of the service running: [screenshot]

Enabling debug mode and inspecting the logs did not reveal anything; no error logs were found. Metricd processes all metrics from the incoming storage, so no metrics are stuck.

Here are the gnocchi.conf sections I'm using:

...
[metricd]
workers = 12
metric_processing_delay = 60
metric_reporting_delay = -1
metric_cleanup_delay = 60
processing_replicas = 3
cleanup_batch_size = 10000
...
[storage]
driver = s3
s3_endpoint_url = <url>
s3_access_key_id = <id>
s3_secret_access_key = <key>
s3_bucket_prefix = gnocchi
s3_check_consistency_timeout = 30
s3_max_pool_connections = 100
...

Here is the ps_mem output for the gnocchi user on the host system:

ps_mem -p $(pgrep -d, -u 42416)
 Private  +   Shared  =  RAM used	Program

 72.0 KiB +  20.5 KiB =  92.5 KiB	dumb-init
 35.6 MiB +   4.7 MiB =  40.3 MiB	gnocchi-metricd
607.0 MiB +  20.0 MiB = 627.0 MiB	apache2 (5)
 11.6 GiB +  30.0 MiB =  11.7 GiB	python3.8 (13)
---------------------------------
                         12.3 GiB
=================================
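To confirm the growth trend independently of cAdvisor, the per-user RSS can be sampled directly with ps. This is a diagnostic sketch only, not part of Gnocchi; UID 42416 is taken from the ps_mem call above, and the log path is an arbitrary choice:

```shell
#!/bin/sh
# Sum the resident set size (RSS, in KiB) of every process owned by a UID.
# Hypothetical diagnostic helper; UID 42416 is the gnocchi user from the
# ps_mem output above.
sample_rss() {
    ps -o rss= -u "$1" | awk '{sum += $1} END {print sum + 0}'
}

# Append one timestamped sample; running this from cron every few minutes
# shows whether the total keeps climbing between metricd restarts.
echo "$(date -u) $(sample_rss 42416) KiB" >> /tmp/metricd-rss.log
```

Comparing successive samples over a day or two should reproduce the ~2 GB/day slope without relying on container telemetry.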

Unfortunately, my Python/programming skills are at quite a low level, so I'm not able to debug such a large app on my own. :(

@tobias-urdin
Contributor

Hello 👋 I don't have any great ideas; the only known issue we've had recently is the memory bug in the ujson library, so I suggest checking which version of it you're using, see #1136

Other than that, you would need to troubleshoot further to get more information. We are not using the S3 storage, so I don't know whether there might be an issue with that or whether it's something else.
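A quick way to act on this suggestion is to print the installed ujson version from the metricd environment. A sketch, assuming a Python environment reachable from the shell (on a Kolla deployment you would typically wrap this in `docker exec` against the gnocchi_metricd container):

```shell
#!/bin/sh
# Print the installed ujson version, or a notice if it is missing.
# Sketch only: on Kolla you would run the python3 invocation via
# docker exec inside the gnocchi_metricd container instead.
ujson_version=$(python3 -c 'import ujson; print(ujson.__version__)' 2>/dev/null \
    || echo "ujson not installed")
echo "ujson: ${ujson_version}"
```

The reported version can then be compared against the fixed releases discussed in #1136.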

@daydrim

daydrim commented Aug 27, 2023

Hello, we've found that a huge number of file descriptors (300K+) are opened and never closed. The RAM usage is probably increasing because the number of file descriptors constantly grows.

The gnocchi-metricd process opens eventpoll descriptors and does not close them.

At first we thought this was a problem with the boto3 library, but we did not find any bugs directly related to it.
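The descriptor growth can be checked per worker directly from /proc. A hedged sketch (the pgrep pattern is an assumption based on the process names shown earlier in this thread):

```shell
#!/bin/sh
# Count open file descriptors per gnocchi-metricd worker, and how many of
# them are anonymous eventpoll inodes. The pgrep pattern is an assumption
# based on the process names shown earlier; adjust it for your deployment.
for pid in $(pgrep -f gnocchi-metricd); do
    total=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
    epoll=$(ls -l "/proc/$pid/fd" 2>/dev/null | grep -c 'eventpoll')
    echo "pid=$pid total_fds=$total eventpoll_fds=$epoll"
done
```

If the eventpoll count climbs steadily between samples while total work stays flat, that would support the leaked-descriptor theory.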
