This report follows the structure of the revised agreement.
Make requested revisions to Paolo's upstream patches:
- COMPLETE.
- no negative feedback on first patch (optimizing small load/store);
- more changes required by reviewers for second patch (use of target vectorized memcpy); and
- it is not clear the second patch will ever be accepted.
SiFive benchmarks:
- COMPLETE.
- 2 benchmarks fail with SIGILL for VLEN=1024 and LMUL=8; and
- these benchmarks will be tested on BananaPi by the Embecosm team.
Generate TCG Ops for vector load/store
- STARTED
- design work.
Improve first-fault handling for vector load/store helper functions.
- NOT STARTED.
- work is affected by Max's version 6 patch.
Improve strided load/store helper functions.
- NOT STARTED.
Reports attached. Ranges for speedup are VLEN=128/LMUL=1 and VLEN=1024/LMUL=8. Commits in the rise-rvv-tcg-qemu repository:
- master: f15f7273ea
- max-v6-20241106: 07d7bd62b2
- embecosm-max-v6-20241106: 7e9817d530
Max V6 patch v. upstream master:
- no impact: none
- improvement:
memchr
(1.4-1.5x),memcmp
(2-7x),memcpy
(3-130x),memmove
(3-130x),memset
(2.5-100x),strcat
(1.5x),strchr
(1.3-1.5x),strcmp
(1.2-1.5x),strcpy
(1.3-1.9x),strncat
(2.5-2.9x),strncpy
(3x),strnlen
(3.3-4.5x) - no data:
strlen
,strncmp
Embecosm pathches + Max V6 patch v. Max V6 patch.
- no impact:
memchr
,strcat
,strchr
,strcmp
, - improvement:
memcpy
(1.3x),memmove
(1.3x),memset
(3-2x),strcpy
(2-1.5x),strncat
(2x),strncpy
(2.5-3x),strnlen
(2.5-3x) - no data:
strlen
,strncmp
,memcmp