new(libsinsp): inspector thread pool #1949

Merged
merged 13 commits into falcosecurity:master from sinsp-thread-pool on Sep 5, 2024

Conversation

mrgian
Contributor

@mrgian mrgian commented Jul 5, 2024

What type of PR is this?

Uncomment one (or more) /kind <> lines:

/kind bug

/kind cleanup

/kind design

/kind documentation

/kind failing-test

/kind feature

Any specific area of the project related to this PR?

Uncomment one (or more) /area <> lines:

/area API-version

/area build

/area CI

/area driver-kmod

/area driver-bpf

/area driver-modern-bpf

/area libscap-engine-bpf

/area libscap-engine-gvisor

/area libscap-engine-kmod

/area libscap-engine-modern-bpf

/area libscap-engine-nodriver

/area libscap-engine-noop

/area libscap-engine-source-plugin

/area libscap-engine-savefile

/area libscap

/area libpman

/area libsinsp

/area tests

/area proposals

Does this PR require a change in the driver versions?

/version driver-API-version-major

/version driver-API-version-minor

/version driver-API-version-patch

/version driver-SCHEMA-version-major

/version driver-SCHEMA-version-minor

/version driver-SCHEMA-version-patch

What this PR does / why we need it:
Adds a thread pool capable of running non-blocking recurring routines.
Routines can be subscribed both by plugins and by the inspector itself.

It also provides a way for plugins to know when the inspector opens/closes the event capture.
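
To make the "recurring routine" model concrete, here is a minimal, self-contained sketch of a pool that keeps re-running a routine until it asks to stop. It is illustrative only: the class and member names below are invented for this example and are not the libsinsp implementation (the actual implementation and its CMake wiring are discussed later in the thread).

```cpp
// Illustrative sketch only: a tiny pool of worker threads that runs
// "recurring" routines. A routine returns true to be scheduled again,
// false to stop. None of these names belong to the libsinsp API.
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

class demo_routine_pool {
public:
    using routine_t = std::function<bool()>;

    explicit demo_routine_pool(std::size_t nthreads) {
        for (std::size_t i = 0; i < nthreads; i++) {
            m_workers.emplace_back([this] { worker_loop(); });
        }
    }

    ~demo_routine_pool() {
        {
            std::lock_guard<std::mutex> lk(m_mtx);
            m_stop = true;
        }
        m_cv.notify_all();
        for (auto& t : m_workers) {
            t.join();
        }
    }

    // Subscribe a routine; it will be picked up by one of the fixed workers.
    void subscribe(routine_t r) {
        {
            std::lock_guard<std::mutex> lk(m_mtx);
            m_queue.push_back(std::move(r));
        }
        m_cv.notify_one();
    }

private:
    void worker_loop() {
        for (;;) {
            routine_t r;
            {
                std::unique_lock<std::mutex> lk(m_mtx);
                m_cv.wait(lk, [this] { return m_stop || !m_queue.empty(); });
                if (m_stop) {
                    return;
                }
                r = std::move(m_queue.front());
                m_queue.pop_front();
            }
            // Run one iteration; if the routine wants to keep running,
            // put it back at the end of the queue so other routines get a turn.
            if (r()) {
                subscribe(std::move(r));
            }
        }
    }

    std::vector<std::thread> m_workers;
    std::deque<routine_t> m_queue;
    std::mutex m_mtx;
    std::condition_variable m_cv;
    bool m_stop = false;
};
```

A plugin (or the inspector) would then call subscribe() with a lambda that does one slice of work and returns whether it should run again, instead of spawning and managing its own thread.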

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

new(libsinsp): inspector thread pool


github-actions bot commented Jul 19, 2024

Perf diff from master - unit tests

     1.28%     +1.09%  [.] std::_Hashtable<long, std::pair<long const, std::shared_ptr<sinsp_threadinfo> >, std::allocator<std::pair<long const, std::shared_ptr<sinsp_threadinfo> > >, std::__detail::_Select1st, std::equal_to<long>, std::hash<long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_find_before_node
     3.54%     -0.87%  [.] sinsp_thread_manager::find_thread
     4.17%     -0.85%  [.] gzfile_read
     2.00%     +0.80%  [.] scap_event_decode_params
     3.17%     -0.75%  [.] sinsp_thread_manager::get_thread_ref
     0.88%     +0.64%  [.] scap_event_encode_params_v
     1.22%     -0.61%  [.] sinsp_evt::get_direction
     0.61%     +0.53%  [.] scap_event_has_large_payload
     1.16%     -0.51%  [.] scap_next
     0.55%     -0.41%  [.] libsinsp::runc::match_container_id

Heap diff from master - unit tests

peak heap memory consumption: -16B
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Heap diff from master - scap file

peak heap memory consumption: 0B
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Benchmarks diff from master

Comparing gbench_data.json to /root/actions-runner/_work/libs/libs/build/gbench_data.json
Benchmark                                                         Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------------
BM_sinsp_split_mean                                            +0.0385         +0.0386           145           151           145           151
BM_sinsp_split_median                                          +0.0437         +0.0439           145           152           145           152
BM_sinsp_split_stddev                                          +1.2455         +1.2443             1             2             1             2
BM_sinsp_split_cv                                              +1.1622         +1.1609             0             0             0             0
BM_sinsp_concatenate_paths_relative_path_mean                  +0.1263         +0.1264            42            47            42            47
BM_sinsp_concatenate_paths_relative_path_median                +0.1540         +0.1541            42            49            42            49
BM_sinsp_concatenate_paths_relative_path_stddev                +5.6023         +5.5989             0             2             0             2
BM_sinsp_concatenate_paths_relative_path_cv                    +4.8621         +4.8584             0             0             0             0
BM_sinsp_concatenate_paths_empty_path_mean                     -0.0113         -0.0112            17            17            17            17
BM_sinsp_concatenate_paths_empty_path_median                   -0.0168         -0.0167            17            17            17            17
BM_sinsp_concatenate_paths_empty_path_stddev                   +0.5396         +0.5388             0             0             0             0
BM_sinsp_concatenate_paths_empty_path_cv                       +0.5572         +0.5562             0             0             0             0
BM_sinsp_concatenate_paths_absolute_path_mean                  +0.0714         +0.0715            43            46            43            46
BM_sinsp_concatenate_paths_absolute_path_median                +0.0629         +0.0630            43            45            43            45
BM_sinsp_concatenate_paths_absolute_path_stddev                +3.7232         +3.7205             0             1             0             1
BM_sinsp_concatenate_paths_absolute_path_cv                    +3.4084         +3.4055             0             0             0             0
BM_sinsp_split_container_image_mean                            +0.0104         +0.0105           349           352           349           352
BM_sinsp_split_container_image_median                          +0.0088         +0.0090           350           353           349           353
BM_sinsp_split_container_image_stddev                          -0.4461         -0.4469             3             2             3             2
BM_sinsp_split_container_image_cv                              -0.4518         -0.4527             0             0             0             0


Perf diff from master - unit tests

Warning:
Processed 439738 events and lost 518 chunks!

Check IO/CPU overload!

     5.16%     -1.25%  [.] sinsp_parser::process_event
     2.74%     +1.17%  [.] sinsp_thread_manager::get_thread_ref
     7.19%     -1.14%  [.] sinsp::next
    10.18%     -1.10%  [.] sinsp_parser::reset
     2.09%     -0.74%  [.] sinsp::fetch_next_event

Perf diff from master - scap file

    15.61%     -5.50%  [.] sinsp_filter_check::extract_nocache
    10.85%     -4.47%  [.] sinsp_filter_check::tostring
     3.62%     +2.86%  [.] sinsp_evt_formatter::tostring_withformat
     5.28%     -1.33%  [.] sinsp_evt::get_category
     3.61%     +1.21%  [.] sinsp_evt::get_param_as_str
     7.47%     -1.05%  [.] sinsp_filter_check::rawval_to_string
     7.63%     +0.63%  [.] sinsp_filter_check::apply_transformers
     9.34%     -0.54%  [.] 0x00000000000a76c4
     3.61%     -0.42%  [.] next
     3.62%     -0.42%  [.] sinsp_evt::get_type

Heap diff from master - unit tests

total runtime: -2.03s.
calls to allocation functions: -54491 (26895/s)
temporary memory allocations: -2648 (1307/s)
peak heap memory consumption: -39.99K
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Heap diff from master - scap file

total runtime: -0.01s.
calls to allocation functions: -133 (9500/s)
temporary memory allocations: 1 (-71/s)
peak heap memory consumption: -37.75K
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B


codecov bot commented Jul 19, 2024

Codecov Report

Attention: Patch coverage is 23.31606% with 148 lines in your changes missing coverage. Please review.

Project coverage is 74.03%. Comparing base (e2c5174) to head (ae81bb6).
Report is 27 commits behind head on master.

Files with missing lines Patch % Lines
userspace/libsinsp/test/plugins/routines.cpp 0.00% 70 Missing ⚠️
userspace/libsinsp/plugin.cpp 28.86% 69 Missing ⚠️
userspace/libsinsp/sinsp.cpp 46.66% 8 Missing ⚠️
userspace/libsinsp/test/events_plugin.ut.cpp 85.71% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1949      +/-   ##
==========================================
- Coverage   74.31%   74.03%   -0.28%     
==========================================
  Files         253      254       +1     
  Lines       30966    31111     +145     
  Branches     5403     5417      +14     
==========================================
+ Hits        23011    23032      +21     
- Misses       7942     8070     +128     
+ Partials       13        9       -4     
Flag Coverage Δ
libsinsp 74.03% <23.31%> (-0.28%) ⬇️

Flags with carried forward coverage won't be shown.


@mrgian mrgian marked this pull request as ready for review July 19, 2024 15:17
@poiana poiana requested a review from gnosek July 19, 2024 15:17

Perf diff from master - unit tests

Warning:
Processed 440897 events and lost 552 chunks!

Check IO/CPU overload!

    10.13%     -1.37%  [.] sinsp_parser::reset
     7.16%     -1.28%  [.] sinsp::next
     5.13%     -1.15%  [.] sinsp_parser::process_event
     5.11%     -1.03%  [.] next
     2.73%     +0.63%  [.] sinsp_thread_manager::get_thread_ref

Perf diff from master - scap file

    11.99%     -6.18%  [.] std::_Hashtable<long, std::pair<long const, std::shared_ptr<sinsp_threadinfo> >, std::allocator<std::pair<long const, std::shared_ptr<sinsp_threadinfo> > >, std::__detail::_Select1st, std::equal_to<long>, std::hash<long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_find_before_node
    10.17%     -5.00%  [.] sinsp::next
     7.98%     -3.58%  [.] 0x00000000000a76c4
     6.52%     -2.58%  [.] sinsp_filter_check::apply_transformers
     6.38%     -2.48%  [.] sinsp_filter_check::rawval_to_string
     8.94%     -2.47%  [.] std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char const*>
    13.35%     -2.00%  [.] sinsp_filter_check::extract_nocache
     3.07%     +1.17%  [.] std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release
     3.85%     +0.90%  [.] sinsp_parser::process_event
     9.28%     +0.75%  [.] sinsp_filter_check::tostring

Heap diff from master - unit tests

total runtime: -2.06s.
calls to allocation functions: -54591 (26461/s)
temporary memory allocations: -2758 (1336/s)
peak heap memory consumption: -39.99K
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Heap diff from master - scap file

total runtime: 0.00s.
calls to allocation functions: -133 (-33250/s)
temporary memory allocations: 1 (250/s)
peak heap memory consumption: -37.75K
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B


Perf diff from master - unit tests

Warning:
Processed 444904 events and lost 546 chunks!

Check IO/CPU overload!

     5.13%     -1.09%  [.] sinsp_parser::process_event
     1.72%     -0.83%  [.] libsinsp::sinsp_suppress::process_event
    10.13%     -0.78%  [.] sinsp_parser::reset
     2.73%     -0.72%  [.] sinsp_thread_manager::get_thread_ref
     1.54%     +0.70%  [.] 0x00000000000e83b0

Perf diff from master - scap file

     3.42%     +5.80%  [.] sinsp_evt_formatter::tostring_withformat
    11.27%     -3.81%  [.] sinsp::next
    13.28%     -3.55%  [.] std::_Hashtable<long, std::pair<long const, std::shared_ptr<sinsp_threadinfo> >, std::allocator<std::pair<long const, std::shared_ptr<sinsp_threadinfo> > >, std::__detail::_Select1st, std::equal_to<long>, std::hash<long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_find_before_node
    14.78%     -3.08%  [.] sinsp_filter_check::extract_nocache
     7.07%     -2.78%  [.] sinsp_filter_check::rawval_to_string
     8.84%     -2.44%  [.] 0x00000000000a76c4
     3.42%     +2.04%  [.] sinsp_evt::get_type
    10.28%     -1.70%  [.] sinsp_filter_check::tostring
     3.42%     +1.48%  [.] next
     7.22%     -1.40%  [.] sinsp_filter_check::apply_transformers

Heap diff from master - unit tests

total runtime: -2.14s.
calls to allocation functions: -55177 (25795/s)
temporary memory allocations: -3260 (1524/s)
peak heap memory consumption: -39.99K
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Heap diff from master - scap file

total runtime: 0.00s.
calls to allocation functions: -133 (-133000/s)
temporary memory allocations: 1 (1000/s)
peak heap memory consumption: -37.75K
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Contributor

@gnosek gnosek left a comment


I love the plugin API changes but honestly the thread pool scares me. I must be missing some context about why we need it in the first place too (why can't a plugin just start a thread/pool itself? we expect async plugins to do this already).

Exposing this from sinsp feels like encouraging people to do stuff in the thread pool, which is fine, until somebody tries to access tables from there (they're very much not thread safe AFAIK)

So, why do we want this?

@mrgian
Contributor Author

mrgian commented Jul 23, 2024

So, why do we want this?

For multiple reasons:

  • Better performance: we allocate a fixed number of threads during init, and then we use the same threads to run different async computations from different plugins.
  • Makes plugins cleaner/easier to develop: they don't need to worry about allocating new threads, they just need to subscribe routine functions.
  • Concurrency can be controlled from the inspector side: not currently implemented, but we can do things like removing routines that take too long to execute and/or setting priorities in the thread pool.

@gnosek
Contributor

gnosek commented Jul 23, 2024

Do we have a specific use case in mind?

So, why do we want this?

For multiple reasons:

  • Better performance: we allocate a fixed number of threads during init, and then we use the same threads to run different async computations from different plugins.

With a fixed number of threads, what happens if you run out of them? It takes just a single routine (that keeps rescheduling itself) to block a thread.

  • Makes plugins cleaner/easier to develop: they don't need to worry about allocating new threads, they just need to subscribe routine functions.

Are we saving anything except a pthread_create call? I suppose passing parameters is easier that way.

  • Concurrency can be controlled from the inspector side: not currently implemented, but we can do things like removing routines that take too long to execute and/or setting priorities in the thread pool.

Cancelling a thread is far from trivial, especially if the executed code does not cooperate (if it does cooperate, it doesn't matter which thread pool it runs in)

One thing I can imagine this being useful for would be assorted deferred or periodic cleanups (run a lambda in the background whenever we have a moment) in plugins that aren't multithreaded otherwise (see the sketch after this comment), but then:

  • we'd need to prevent the routines from rescheduling themselves immediately (just make them all one-shot or executed on a fixed interval)
  • technically we don't even need a thread pool for this (though we may want one), just fire the callbacks on SCAP_TIMEOUT

Also, I'm really afraid of pushing multithreading to plugin users before we have a consistent multithreading story ourselves (especially with tables)
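
For illustration, the "no thread pool" alternative described above (one-shot callbacks fired from the capture loop when it has an idle moment, e.g. on SCAP_TIMEOUT) could look roughly like the sketch below. All names here are hypothetical and none of this is the libsinsp API.

```cpp
// Hypothetical sketch: one-shot deferred callbacks with no extra threads.
// The capture loop drains the queue whenever it gets an idle moment
// (for example when sinsp::next() returns SCAP_TIMEOUT).
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

class deferred_tasks {
public:
    using task_t = std::function<void()>;

    // Queue a callback to run at the next idle moment (one-shot by design,
    // so a task cannot starve anything by rescheduling itself immediately).
    void defer(task_t t) {
        std::lock_guard<std::mutex> lk(m_mtx);
        m_pending.push_back(std::move(t));
    }

    // Called from the capture loop on a timeout: run everything queued so far,
    // exactly once each.
    void run_pending() {
        std::vector<task_t> batch;
        {
            std::lock_guard<std::mutex> lk(m_mtx);
            batch.swap(m_pending);
        }
        for (auto& t : batch) {
            t();
        }
    }

private:
    std::mutex m_mtx;
    std::vector<task_t> m_pending;
};
```

The trade-off versus a thread pool is that the callbacks must stay short because they run on the capture thread; in exchange, they execute on the same thread as event processing, which sidesteps the thread-safety concerns about tables raised above.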

mrgian and others added 11 commits August 29, 2024 09:42
Signed-off-by: Gianmatteo Palmieri <mail@gian.im>
Signed-off-by: Gianmatteo Palmieri <mail@gian.im>
Signed-off-by: Gianmatteo Palmieri <mail@gian.im>
Signed-off-by: Gianmatteo Palmieri <mail@gian.im>
Co-authored-by: Jason Dellaluce <jasondellaluce@gmail.com>
Signed-off-by: Gianmatteo Palmieri <mail@gian.im>
Signed-off-by: Gianmatteo Palmieri <mail@gian.im>
Signed-off-by: Gianmatteo Palmieri <mail@gian.im>
Signed-off-by: Gianmatteo Palmieri <mail@gian.im>
Signed-off-by: Gianmatteo Palmieri <mail@gian.im>
Signed-off-by: Gianmatteo Palmieri <mail@gian.im>
Signed-off-by: Gianmatteo Palmieri <mail@gian.im>
@mrgian mrgian force-pushed the sinsp-thread-pool branch 2 times, most recently from 349c5bc to d3a2977 on August 29, 2024 08:05
@mrgian
Contributor Author

mrgian commented Aug 29, 2024

I made a bunch of changes according to the review, most importantly:

  • ss_plugin_routine_vtable's unsubscribe now returns ss_plugin_rc with value SS_PLUGIN_SUCCESS if the routine has been correctly unsubscribed.
  • I added a new cmake option, ENABLE_THREAD_POOL.
    If enabled, it pulls in the needed dependency and sets thread_pool_bs as the thread pool implementation.
    If disabled, it sets m_thread_pool to nullptr but still allows adopters to provide their own thread pool implementation via set_thread_pool (as long as it implements the thread_pool base class); a sketch of what that could look like follows below.
    This option is disabled by default.

WDYT? @jasondellaluce @gnosek
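
For illustration, letting an adopter plug in its own implementation through set_thread_pool could look roughly like the sketch below. The virtual interface shown here is an assumption made for the example; the actual thread_pool base class in libsinsp may expose different methods.

```cpp
// Assumed shape of the interface, for illustration only; the real
// thread_pool base class in libsinsp may differ.
#include <cstdint>
#include <functional>

class demo_thread_pool_iface {
public:
    using routine_t = std::function<bool()>;   // return true to keep running
    using routine_id_t = std::uintptr_t;

    virtual ~demo_thread_pool_iface() = default;
    virtual routine_id_t subscribe(const routine_t& r) = 0;
    virtual bool unsubscribe(routine_id_t id) = 0;
};

// An adopter could back this with an executor it already owns
// (a process-wide pool, an event loop, a fiber scheduler, ...).
class my_thread_pool : public demo_thread_pool_iface {
public:
    routine_id_t subscribe(const routine_t& r) override {
        // Hand the routine to the adopter's own scheduler here.
        (void)r;
        return 0;
    }
    bool unsubscribe(routine_id_t id) override {
        (void)id;
        return true;
    }
};

// Hypothetical wiring, mirroring the description above:
//   inspector.set_thread_pool(<a my_thread_pool instance>);
// while the bundled thread_pool_bs implementation is only built with:
//   cmake -DENABLE_THREAD_POOL=ON ..
```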

//
// Required: yes if capture_open is defined
// Arguments:
// - s: the plugin state, returned by init(). Can be NULL.
Contributor


Same thing. I assume this is a copy-paste thing? But also, what do we do if ->capture_close returns an error? Is the return value meaningful here?

Contributor Author


what do we do if ->capture_close returns an error? Is the return value meaningful here?

Currently the return value of capture_open/capture_close is ignored by the inspector.
But if in the future we need to know whether something went wrong in the plugin during capture_open/capture_close, we will be able to do that without introducing a breaking change in the plugin API.

Contributor


Yes, but what are you going to do about it? It's like a destructor returning an error. I don't think there's any way to handle an error here, except for feeling a little bit sad and moving on (you're not going to cancel the capture stop anyway).

Still, not going to protest too hard about it ;) it just seems useless, not actively harmful

(an error from capture_open should presumably tear down the whole inspector though, or maybe disable the plugin or something; in any case, it's definitely actionable and should be reported)
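
To make the distinction concrete: a failure in capture_open is actionable, while a failure in capture_close is mostly informational. A minimal sketch (with invented types and names, not the libsinsp plugin API) might look like this:

```cpp
// Self-contained sketch with invented types; not the libsinsp plugin API.
#include <cstdio>
#include <stdexcept>
#include <string>
#include <vector>

enum class demo_rc { success, failure };

struct demo_plugin {
    std::string name;
    demo_rc (*on_capture_open)();
    demo_rc (*on_capture_close)();
};

void open_capture(const std::vector<demo_plugin>& plugins) {
    for (const auto& p : plugins) {
        if (p.on_capture_open() != demo_rc::success) {
            // Actionable: the capture has not started yet, so we can refuse to
            // proceed (or disable the plugin) and surface the error.
            throw std::runtime_error("capture_open failed in plugin " + p.name);
        }
    }
}

void close_capture(const std::vector<demo_plugin>& plugins) {
    for (const auto& p : plugins) {
        if (p.on_capture_close() != demo_rc::success) {
            // Not really actionable: the capture is stopping regardless,
            // so the best we can do is log it and move on.
            std::fprintf(stderr, "capture_close failed in plugin %s, ignoring\n",
                         p.name.c_str());
        }
    }
}
```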

Contributor

@jasondellaluce jasondellaluce left a comment


Just left two other minor suggestions! Looks good now!

userspace/libsinsp/CMakeLists.txt (outdated review thread, resolved)
userspace/libsinsp/thread_pool_bs.h (outdated review thread, resolved)
Signed-off-by: Gianmatteo Palmieri <mail@gian.im>
Co-authored-by: Jason Dellaluce <jasondellaluce@gmail.com>
Contributor

@jasondellaluce jasondellaluce left a comment


Amazing work!

@poiana poiana added the lgtm label Sep 2, 2024
@poiana
Contributor

poiana commented Sep 2, 2024

LGTM label has been added.

Git tree hash: 3714b104ac9ddd395f24253a87eabc984cae980c

@FedeDP
Contributor

FedeDP commented Sep 5, 2024

/approve
We only need the approved label, since we already have 2 GitHub approvals.

@poiana
Contributor

poiana commented Sep 5, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: FedeDP, mrgian

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@poiana poiana added the approved label Sep 5, 2024
@poiana poiana merged commit 8f6f9df into falcosecurity:master Sep 5, 2024
44 of 46 checks passed