Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Setting inputs is rather slow #8

Open
MichaelBell opened this issue Jul 8, 2024 · 9 comments
Open

Setting inputs is rather slow #8

MichaelBell opened this issue Jul 8, 2024 · 9 comments

Comments

@MichaelBell
Copy link
Contributor

MichaelBell commented Jul 8, 2024

This code

t_start = time.ticks_us()
for i in range(256):
    tt.input_byte = i
t_taken = time.ticks_us() - t_start
print(f"{t_taken/1000}ms = {t_taken/256}us per set")

takes 1.6 seconds to run - output at stock 125MHz RP2040 clock:

1606.953ms = 6277.16us per set

This seems extremely slow!

Using the Python gpio directly gets that down to 46ms:

inputs = [Pin(i, Pin.OUT) for i in (9, 10, 11, 12, 17, 18, 19, 20)]
t_start = time.ticks_us()
for i in range(256):
    for j in range(8):
        inputs[j].value(i & (1 << j))
t_taken = time.ticks_us() - t_start
print(f"{t_taken/1000}ms = {t_taken/256}us per set")
45.799ms = 178.9023us per set
@MichaelBell
Copy link
Contributor Author

This is horrible, but it's fast, and more importantly it's atomic, so all the pins change value together:

@micropython.native
def set_input_byte(val):
    val = ((val & 0xF) << 9) | ((val & 0xF0) << 17-4)
    val = (machine.mem32[0xd0000010] ^ val) & 0x1E1E00
    machine.mem32[0xd000001c] = val

@urish
Copy link
Member

urish commented Jul 8, 2024

This is horrible, but it's fast, and more importantly it's atomic, so all the pins change value together:

I wouldn't say this is horrible - it gets the job done, while providing a nice interface for the user.

One change that could potentially make it more "future-proof" would be to split it into two functions in a way that we could test that the pin numbers match the ui_in pins defined in the SDK code upon startup.

@psychogenic
Copy link
Collaborator

Yeah, it's horrible in the sense that it's hard-coded and difficult to understand.
But hard-coded is often the nature of optimization, c'est la vie. I'd like more comment to actually grok what's going on, but the current implementation was just a naive extension of all the pin magiks, so it's definitely suboptimal and I'll replace it.

@psychogenic
Copy link
Collaborator

Hey @MichaelBell: how is this working? I've replaced the default function with the @micropython.native stuff and I can't pass the boot-up stage (it fails to get the chip rom data etc). This is on a TT05 board, not sure how to debug it as it's pretty opaque.

@psychogenic
Copy link
Collaborator

Looks like pilot error on my part. Testing...

@psychogenic
Copy link
Collaborator

Well, it's an improvement, but I'm not clear on where the slowdown is... Sticking a straight call to the native stuff in the input_byte setter gives this:


def doit():
    t_start = time.ticks_us()
    for i in range(256):
        tt.input_byte = i
    t_taken = time.ticks_us() - t_start
    print(f"{t_taken/1000}ms = {t_taken/256}us per set")

>>> doit()
799.519ms = 3123.121us per set

So maybe twice as fast, but really far from sub 200us.

@psychogenic
Copy link
Collaborator

ok, solved that mystery -- it's the indirection on the property... calling direct into the pins is surprisingly good (?)

>>> def doitdirect():
...     t_start = time.ticks_us()
...     for i in range(256):
...         tt.pins.input_byte = i
...     t_taken = time.ticks_us() - t_start
...     print(f"{t_taken/1000}ms = {t_taken/256}us per set")
... 
>>> 
>>> doitdirect()
13.015ms = 50.83984us per set

@MichaelBell
Copy link
Contributor Author

Wow, that redirect is really slow! Maybe should document somewhere that tt.pins.xyz access is significantly faster?

And to explain that magic. It is just what the pico-sdk gpio_put_masked method does: https://github.com/raspberrypi/pico-sdk/blob/master/src/rp2_common/hardware_gpio/include/hardware/gpio.h#L729C20-L729C35

machine.mem32[0xd0000010] is sio_hw->gpio_out (the current value of the outputs), machine.mem32[0xd000001c] is sio_hw->gpio_togl, which is a register which toggles any output GPIOs that have a 1 bit in the mask.

The rest is just getting the values in the right place and XOR'ing and masking appropriately.

@psychogenic
Copy link
Collaborator

I put in your set_input and did similar for other ports. The implementations are in ttboard.util.platform. I renamed things to try and make it clear: if your are getting values, it returns only bits for GPIO that are inputs, if you are setting values it sets all the output bits (but won't impact input gpio unless they are switched to outs).
So the native functions are called read_* and write_* and no one has to care about them as they are accessed by the tt.*_byte property stuff under the hood.
Leaving open for now in case bugz discovered, but looking good so far.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants