
Optimize ConcurrentLru read throughput #645

Merged
bitfaster merged 2 commits into main from users/alexpeck/barrier on Nov 20, 2024

Conversation

@bitfaster (Owner) commented Nov 20, 2024

LruItem.WasAccessed was previously volatile to ensure that a write by thread A marking an item as accessed is visible to thread B cycling the cache. Under the covers, volatile equates to a half fence on reads and writes.
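For context, a minimal sketch of the previous shape of the item (illustrative only; the real LruItem has more members, and the exact property layout here is assumed):

```csharp
// Sketch of the prior design: the accessed flag is declared volatile so that
// a write on the reading thread is visible to the thread cycling the cache.
// Every read of WasAccessed is an acquire, every write is a release.
internal class LruItem<K, V>
{
    private volatile bool wasAccessed;

    public LruItem(K key, V value)
    {
        Key = key;
        Value = value;
    }

    public K Key { get; }

    public V Value { get; set; }

    public bool WasAccessed
    {
        get => this.wasAccessed;
        set => this.wasAccessed = value;
    }
}
```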

From the .NET memory model:

  • Volatile reads have "acquire semantics" - no read or write that is later in the program order may be speculatively executed ahead of a volatile read.
  • Volatile writes have "release semantics" - the effects of a volatile write will not be observable before effects of all previous, in program order, reads and writes become observable.
  • Full-fence operations have "full-fence semantics" - effects of reads and writes must be observable no later or no earlier than a full-fence operation according to their relative program order.
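As a rough C# illustration of those three levels (not code from this PR; field names are arbitrary):

```csharp
using System.Threading;

class FenceSemantics
{
    private bool flag;
    private int counter;

    public void Examples()
    {
        // Acquire semantics: no later read or write may be moved ahead of this read.
        bool observed = Volatile.Read(ref this.flag);

        // Release semantics: all earlier reads and writes become observable
        // before this write becomes observable.
        Volatile.Write(ref this.flag, true);

        // Full fence: reads and writes cannot be reordered across an
        // interlocked operation in either direction.
        Interlocked.Increment(ref this.counter);
    }
}
```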

Immediately before calling ConcurrentLruCore.Cycle, there is always an interlocked call. We can thus piggy-back on its full fence and avoid the half fences.
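A hedged sketch of that pattern, with hypothetical field and method names standing in for the real ConcurrentLruCore bookkeeping:

```csharp
using System.Threading;

// Illustrative only: the queue count is updated with an interlocked operation
// immediately before Cycle runs, so the full fence it provides replaces the
// ordering that the volatile accessed flag used to supply.
internal class CycleSketch
{
    private int hotCount;            // hypothetical stand-in for the hot queue count
    private bool itemWasAccessed;    // hypothetical stand-in for LruItem.WasAccessed

    public void AfterAdd()
    {
        Interlocked.Increment(ref this.hotCount);   // full fence
        Cycle();
    }

    private void Cycle()
    {
        // Plain (non-volatile) read; ordering is already guaranteed by the
        // interlocked increment that always precedes this call.
        if (this.itemWasAccessed)
        {
            // ... requeue the item instead of discarding it ...
        }
    }
}
```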

Without the check in MarkAccessed, this does not give the same throughput boost as #643, because x64 has a strong memory model: even the plain write has release semantics and generates traffic to maintain CPU cache coherence.
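For comparison with the sketch above, the revised shape with the field no longer volatile and the read-before-write check the description refers to (again illustrative; the exact member layout is assumed):

```csharp
// Sketch of the revised item: the field is an ordinary bool, and MarkAccessed
// skips the store when the flag is already set, so repeated hits on a hot item
// cost only a plain read and generate no cache-coherence write traffic.
internal class LruItem<K, V>
{
    private bool wasAccessed;   // no longer volatile

    public bool WasAccessed
    {
        get => this.wasAccessed;
        set => this.wasAccessed = value;
    }

    public void MarkAccessed()
    {
        if (!this.wasAccessed)
        {
            this.wasAccessed = true;
        }
    }
}
```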

Before

[benchmark chart: Results_Read_500_base]

After

[benchmark chart: Results_Read_500]

@coveralls commented Nov 20, 2024

Coverage Status

coverage: 99.218% (+0.07%) from 99.149% when pulling 7c55eba on users/alexpeck/barrier into aeae236 on main.

@bitfaster (Owner, Author) commented:

7c55ebaab8329906cf9d5d5eeead46f2f266be4c
BenchmarkDotNet v0.14.0, Windows 11 (10.0.26100.2314)
Intel Xeon W-2133 CPU 3.60GHz, 1 CPU, 12 logical and 6 physical cores
| Method | Runtime | Mean | Error | Ratio | Code Size |
|--- |--- |---: |---: |---: |---: |
| ConcurrentDictionary | .NET 6.0 | 7.375 ns | 0.0925 ns | 1.00 | 1,521 B |
| FastConcurrentLru | .NET 6.0 | 8.569 ns | 0.0252 ns | 1.16 | 7,039 B |
| ConcurrentLru | .NET 6.0 | 15.214 ns | 0.0378 ns | 2.06 | 7,286 B |
| AtomicFastLru | .NET 6.0 | 27.285 ns | 0.0655 ns | 3.70 | NA |
| FastConcurrentTLru | .NET 6.0 | 11.830 ns | 0.0299 ns | 1.60 | 6,222 B |
| FastConcLruAfterAccess | .NET 6.0 | 12.148 ns | 0.2415 ns | 1.65 | 8,001 B |
| FastConcLruAfter | .NET 6.0 | 13.938 ns | 0.1153 ns | 1.89 | 8,083 B |
| ConcurrentTLru | .NET 6.0 | 16.870 ns | 0.0669 ns | 2.29 | 7,752 B |
| ConcurrentLfu | .NET 6.0 | 27.989 ns | 0.5687 ns | 3.80 | NA |
| ClassicLru | .NET 6.0 | 43.475 ns | 0.0806 ns | 5.90 | NA |
| RuntimeMemoryCacheGet | .NET 6.0 | 111.069 ns | 0.3001 ns | 15.06 | 89 B |
| ExtensionsMemoryCacheGet | .NET 6.0 | 47.346 ns | 0.3871 ns | 6.42 | 119 B |
| ConcurrentDictionary | .NET Framework 4.8 | 15.274 ns | 0.1652 ns | 1.00 | 4,127 B |
| FastConcurrentLru | .NET Framework 4.8 | 15.951 ns | 0.0542 ns | 1.04 | 27,388 B |
| ConcurrentLru | .NET Framework 4.8 | 20.185 ns | 0.1386 ns | 1.32 | 27,692 B |
| AtomicFastLru | .NET Framework 4.8 | 37.835 ns | 0.2130 ns | 2.48 | 358 B |
| FastConcurrentTLru | .NET Framework 4.8 | 28.312 ns | 0.2128 ns | 1.85 | 27,572 B |
| FastConcLruAfterAccess | .NET Framework 4.8 | 30.603 ns | 0.1348 ns | 2.00 | 358 B |
| FastConcLruAfter | .NET Framework 4.8 | 32.583 ns | 0.2912 ns | 2.13 | 358 B |
| ConcurrentTLru | .NET Framework 4.8 | 32.897 ns | 0.0534 ns | 2.15 | 27,924 B |
| ConcurrentLfu | .NET Framework 4.8 | 52.025 ns | 0.4951 ns | 3.41 | NA |
| ClassicLru | .NET Framework 4.8 | 56.101 ns | 0.5643 ns | 3.67 | NA |
| RuntimeMemoryCacheGet | .NET Framework 4.8 | 297.775 ns | 1.1600 ns | 19.50 | 79 B |
| ExtensionsMemoryCacheGet | .NET Framework 4.8 | 93.033 ns | 0.3655 ns | 6.09 | 129 B |

@bitfaster changed the title from "Remove volatile from LruItem" to "Optimize ConcurrentLru read throughput" on Nov 20, 2024
@bitfaster marked this pull request as ready for review on November 20, 2024 04:02
@bitfaster (Owner, Author) commented Nov 20, 2024

Adds 2 instructions to GetOrAdd:

[image: GetOrAdd disassembly showing the two added instructions]

@bitfaster merged commit 25ea2bd into main on Nov 20, 2024
13 checks passed
@bitfaster deleted the users/alexpeck/barrier branch on November 20, 2024 23:23