Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ElementLocation optimisations #10029

Open
wants to merge 29 commits into
base: main
Choose a base branch
from

Conversation

drewnoakes
Copy link
Member

@drewnoakes drewnoakes commented Apr 18, 2024

  • Null annotate file
  • Remove some redundant code
  • Consolidate validation
  • Reduce branching during construction
  • Documentation updates
  • Generally reduce the number of diagnostics displayed in the IDE in this file

For Int32, GetHashCode just returns the value directly.
The SmallElementLocation class exists because very few element locations require 32 bits to store the line/column values. It uses ushort (2 bytes) instead of int (4 bytes) for each value, in an attempt to reduce memory usage.

However the CLR aligns ushort fields on classes at four-byte boundaries on most (all?) architectures, meaning the optimisation has no effect.

This commit explicitly packs the two values into four bytes to ensure that four bytes is saved per instance.
@drewnoakes drewnoakes added performance Performance-Scenario-General This issue affects performance in general. labels Apr 18, 2024
The caller performs this validation already, and no other code can call this. Avoid some indirection and branching.
The compiler will generate slightly better code from this switch statement, in cases where either line or column is zero.
There was inconsistent handling of validation between implementations. This moves it all into the `Create` method so that it can be handled in one place, consistently.
@drewnoakes drewnoakes force-pushed the element-location-perf branch from 92e2897 to fc82d47 Compare April 18, 2024 02:20
@ladipro
Copy link
Member

ladipro commented Apr 18, 2024

However the CLR aligns ushort fields on classes at four-byte boundaries on most (all?) architectures, meaning the optimisation has no effect.

This is surprising. The alignment requirement of primitives tends to be their size, so two-byte integers will typically be aligned at two-byte boundaries. Inspecting the x64 NetFx MSBuild with SOS supports it:

image

On which architecture did you see them unnecessarily padded?

@drewnoakes
Copy link
Member Author

Yes I was also surprised. I tested here, which is x64. Maybe the approach taken in that code is incorrect.

@drewnoakes
Copy link
Member Author

Here's a diff between field positions that shows four bytes offset between consecutive ushort fields.

@drewnoakes
Copy link
Member Author

Inspecting the x64 NetFx MSBuild with SOS supports it:

The highlighted bit of your screenshot shows the string name field is two bytes, which looks suspicious to me.

@ladipro
Copy link
Member

ladipro commented Apr 18, 2024

I believe the test gives misleading results because classes are always aligned at a pointer-size boundary. A class with two 2-byte integers will be padded with 4 extra bytes, making it use the same amount of space as a class with two 4-byte integers. If I tweak the test to use four fields instead of two, the sizes come out different.

@ladipro
Copy link
Member

ladipro commented Apr 18, 2024

Inspecting the x64 NetFx MSBuild with SOS supports it:

The highlighted bit of your screenshot shows the string name field is two bytes, which looks suspicious to me.

I believe this output uses hex numbers.

@drewnoakes
Copy link
Member Author

Here's a diff between field positions that shows four bytes offset between consecutive ushort fields.

Oh I used the wrong type here. Changing it to C2 works correctly.

@drewnoakes
Copy link
Member Author

Thank you for double checking. I've closed the original issue.

There are some changes here to reduce CPU as well which you might consider, given these seem to be highly used types/methods. I'll update the PR tomorrow. Let me know if any of the other changes seem problematic, and I'll back them out too.

@drewnoakes drewnoakes marked this pull request as draft April 18, 2024 10:25
@drewnoakes drewnoakes changed the title Fix optimisation in SmallElementLocation ElementLocation optimisations Apr 18, 2024
@ladipro
Copy link
Member

ladipro commented Apr 18, 2024

There's actually still a reasonable opportunity to reduce the size of this class on 64-bit platforms. The two 2-byte integers end up being padded to 8 bytes so it's no better than two 4-byte integers, as you found out. If we switched file from a reference (8 bytes) to a 4-byte index into a "file table", the size of the class would be reduced from 32 to 24 bytes (including the method table pointer and syncblock).

We could go further. Instead of a class this could be a struct with a size of 8 bytes (4 byte for file index, 4 bytes for line/column), allocated from a special pool. What would then be passed around instead of a reference to the class would be an index of the struct in the pool. But this would be quite a pervasive change, not sure if it would be worth the churn.

src/Build/ElementLocation/ElementLocation.cs Outdated Show resolved Hide resolved
@danmoseley
Copy link
Member

(Sometimes felt like the MSBuild engine is basically string manipulation, file IO, map lookups, and caches...) These could certainly be cached, but then you have interning and cache invalidation to deal with, which has been challenging in several places, such as the XML cache and the original string interning cache.

@drewnoakes
Copy link
Member Author

Pulling the 8-byte string reference out makes some sense. I wondered if we could find a way to remove it completely, if the locations are always used within the context of a document, or an element that can reach a document.

Converting to a struct is also possible, though we'd need to allocate the maximum size for line/column, as the small/regular approach used here relies on polymorphism which doesn't work with structs.

In both cases, I assumed we couldn't really consider them as possibilities given this is public API. If there's some leeway there then we can revisit.

The CLR does in fact pack these fields adjacent to one another, so we don't have to do this in code ourselves.
@drewnoakes drewnoakes marked this pull request as ready for review April 19, 2024 05:07
@ladipro
Copy link
Member

ladipro commented Apr 19, 2024

Ah, that's true, they're public. Definitely limits the options but "compressing" the file reference should still be possible.

@drewnoakes
Copy link
Member Author

If the string cache is a global singleton then yes it'd be straightforward. Would we consider such a cache? I think it'd be per-process, grow-only, with no eviction. The assumption being that there's a limited number of files involved in build operations.

If there's interest, I'll push an update to explore the idea.

@ladipro
Copy link
Member

ladipro commented Apr 19, 2024

I think it depends on the impact. I would probably first measure how much we can save relative to all MSBuild allocations.

@drewnoakes
Copy link
Member Author

dotnet new console temp
msbuild /getProperty:MyProperty temp.csproj

image

For evaluation of an empty console app, it looks like SmallElementLocation is just over 2% of allocated bytes.

We believe SmallElementLocation objects are currently 24 bytes each:

  • 8 header
  • 8 string reference
  • 4 line
  • 4 column

Instead, we could have 16 bytes to cover almost everything:

  • Small
    • 8 header
    • 2 bytes index
    • 2 column
    • 4 line

So roughly an 0.7% reduction for evalution. But please check my numbers!

@ladipro
Copy link
Member

ladipro commented Apr 19, 2024

Thank you. The overhead of being an object is 16 bytes on 64-bit so I think it would go from 32 to 24 bytes actually taken up on the GC heap but I guess the delta is the important number and that's still -8 bytes.

For evaluation, I suspect some of the allocations seen during command-line execution of MSBuild.exe are one-time initialization that wouldn't happen again on subsequent evaluations. So the relative reduction in e.g. VS scenarios would probably be slightly more. I'd say it's worth it - assuming that the implementation does not add too much complexity and doesn't slow down construction of these objects.

Adds new subtypes for `ElementLocation` that pack to multiples of eight bytes, to avoid wasting space at runtime on padding between instances of this class in memory.

The primary gain here comes from being able to use a smaller value for the `File` value. With this change, there's a lock-free cache of file paths which are then stored by index. When the index is small, as it usually will be, it can be packed for efficiently (e.g. in 2 bytes) than a string reference (8 bytes on 64-bit architectures).

See code comment for more details.

Also remove file IO from unit tests so they run faster.
// Line and column are good enough
return Line.GetHashCode() ^ Column.GetHashCode();
// We don't include the file path in the hash
return (Line * 397) ^ Column;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This should be unchecked for perf and in very extreme cases also for correctness.

// When combinedValue is negative, it implies that either line or column were negative.
ErrorUtilities.VerifyThrow(combinedValue > -1, "Use zero for unknown");

// TODO store the last run's value and check if this is for the same file. If so, skip the dictionary lookup (tree walk).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if storing the last file in a thread-static variable wouldn't amortize the cost of full lookup/add enough that we could use a conventional data structure with a simple lock.

@rainersigwald rainersigwald added this to the VS 17.13 milestone Sep 11, 2024
@JanKrivanek
Copy link
Member

Relevant for #11160 - as this has potential to improve incremental eval perf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance-Scenario-General This issue affects performance in general.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants