Add block key masking #11

raphaelthegreat · 2024-03-13T13:46:52Z

Having so much state in the block key leads to cases where a block is compiled multiple times even though the actual emitted code is the same. Not every block uses all of the state provided by the key. To address this each time a block is compiled a mask of used registers is also emitted. This matches well with the switch to the lookup table as the mask is inherently location specific.

Block key state is accessed through methods that automatically mask the appropriate member, so this wont break in the future if we forget to manually mask during block generation. It leads to about a 5x reduction in compiled blocks (from 6000 to 1500 during first 10 seconds if 3D Land intro in Surround Sound mode). The max lookup iteration also dropped from 15 to 7 meaning the dispatcher is now faster too.

Initially I wanted to optimize the key matching using AVX2 intrinsics like so, which generates very good asm, but it seems to cause crashes most of the time launching which I'm not sure why

bool KeyCompare(const Key& lhs, const Key& rhs, u256 mask_lo, u256 mask_hi) {
    const u256* lhsp = (const u256*)&lhs;
    const u256* rhsp = (const u256*)&rhs;
    const u256 l = _mm256_cmpeq_epi16(_mm256_and_si256(lhsp[0], mask_lo), rhsp[0]);
    const u256 r = _mm256_cmpeq_epi16(_mm256_and_si256(lhsp[1], mask_hi), rhsp[1]);
    return (_mm256_movemask_epi8(l) & _mm256_movemask_epi8(r)) == std::numeric_limits<u32>::max();
}

wheremyfoodat · 2024-03-13T20:41:26Z

Can you post the generated code here? AVX/2 are more lax on memory alignment requirements but you can still get some crashes from that if the compiler decides to use vmovaps or similar

wheremyfoodat · 2024-03-13T20:46:29Z

src/jit_no_ir.h

+    };
+
+    union KeyMask {
+        std::array<u16, 32> words{};


This feels a bit scary and I'm not sure if it's 100% standard-legal, might be better to use a constructor?

raphaelthegreat · 2024-03-13T20:54:16Z

Here's a godbolt link with a demo https://godbolt.org/z/xP59G38bE

raphaelthegreat · 2024-03-13T20:55:21Z

Ignore the fact it has the entire robin map headers, wanted to compare against the map lookup

wheremyfoodat · 2024-03-13T21:05:15Z

I do not see anything wrong with that and it seems to run fine. Can you try marking the keys as alignas(32) maybe? Though the godbolt link doesn't have any alignment-sensitive instructions...

raphaelthegreat · 2024-03-13T21:10:08Z

Can you test locally on windows in case this is a linux specific issue?

wheremyfoodat · 2024-03-13T21:37:37Z

Works here, tested with several games

wheremyfoodat · 2024-03-13T21:41:00Z

Locally generated asm. I'd look out in case that vmovdqu is a vmovdqa on your setup

raphaelthegreat · 2024-03-13T21:54:36Z

Thanks for conforming, will checkout what asm this generates here for clang xD

Add block masking

b781af4

wheremyfoodat reviewed Mar 13, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add block key masking #11

Add block key masking #11

raphaelthegreat commented Mar 13, 2024 •

edited

Loading

wheremyfoodat commented Mar 13, 2024

wheremyfoodat Mar 13, 2024

raphaelthegreat commented Mar 13, 2024

raphaelthegreat commented Mar 13, 2024

wheremyfoodat commented Mar 13, 2024 •

edited

Loading

raphaelthegreat commented Mar 13, 2024

wheremyfoodat commented Mar 13, 2024

wheremyfoodat commented Mar 13, 2024 •

edited

Loading

raphaelthegreat commented Mar 13, 2024

Add block key masking #11

Are you sure you want to change the base?

Add block key masking #11

Conversation

raphaelthegreat commented Mar 13, 2024 • edited Loading

wheremyfoodat commented Mar 13, 2024

wheremyfoodat Mar 13, 2024

Choose a reason for hiding this comment

raphaelthegreat commented Mar 13, 2024

raphaelthegreat commented Mar 13, 2024

wheremyfoodat commented Mar 13, 2024 • edited Loading

raphaelthegreat commented Mar 13, 2024

wheremyfoodat commented Mar 13, 2024

wheremyfoodat commented Mar 13, 2024 • edited Loading

raphaelthegreat commented Mar 13, 2024

raphaelthegreat commented Mar 13, 2024 •

edited

Loading

wheremyfoodat commented Mar 13, 2024 •

edited

Loading

wheremyfoodat commented Mar 13, 2024 •

edited

Loading