Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

target/riscv: fix halt reason for targets that do not support hit bit on triggers #1056

Merged
merged 1 commit into from
Jun 4, 2024

Conversation

aap-sc
Copy link
Collaborator

@aap-sc aap-sc commented Apr 23, 2024

Before this patch the following behavior is observed on targets that do not support hit bit:

bp 0x80000004 4 hw
resume 0x80000000
riscv.cpu halted due to watchpoint

This happens because the current implementation relies on the presence of hit bit way too much. While working on this patch few defects in hit bit-based trigger detection were discovered, added appropriate TODOs.

@aap-sc
Copy link
Collaborator Author

aap-sc commented Apr 23, 2024

Related to #1029

@aap-sc
Copy link
Collaborator Author

aap-sc commented Apr 23, 2024

@JanMatCodasip , @MarekVCodasip could you take a look?

Copy link
Collaborator

@MarekVCodasip MarekVCodasip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch.

It looks good. I just suggest some further changes/improvements.

src/target/riscv/riscv.c Show resolved Hide resolved
src/target/riscv/riscv.c Outdated Show resolved Hide resolved
Comment on lines +1555 to +1563
/* FIXME: this function assumes that we have only one trigger that can
* have hit bit set. Debug spec allows hit bit to bit set if a trigger has
* matched but did not fire. Such targets will receive erroneous results.
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In these cases. I think we should clear all hit bits. Since we cannot know which trigger triggered the target halt first and is target dependent. Can you please add it to the FIXME? Or if you really like, you can implement it right away. Thanks.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth. hit bits are cleared by OpenOCD already. I've checked this visually. by inspecting the code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see that the trigger hit bit is cleared on line 1615: https://github.com/riscv-collab/riscv-openocd/pull/1056/files#diff-b4aa16f9e42cb8f0934baa7c8e0ec9c70a369bef98b99b26ae2e896c8aa95d6aR1615

However, the loop is exited when the first hit is found, so subsequent hit bits will not be cleared.

We believe the algorithm should continue clearing the "hit" bits of all triggers.

Copy link
Collaborator Author

@aap-sc aap-sc May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see that the trigger hit bit is cleared on line 1615

Actually, I was talking about a different case. When we resume the target, we clear all hit bits anyway, since we temporally disable triggers when single-step is performed. If needed I can provide the exact lines. So technically we are fine for now.

We believe the algorithm should continue clearing the "hit" bits of all triggers.

That being said, I agree that it would not hurt to continue the loop in this case. I'm just not sure if this should be part of a subsequent or this commit - will take a look once again.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for explanation. I missed the fact that the temporary disabling of watchpoints on resume or step clears the "hit" bits as a by-product.

I would still prefer the clearing of all hit bits to occur in riscv_trigger_detect_hit_bits(), because:

  • It is very unclear to the code reader that the clearing of "hit" occurs at a different place.
  • We could make debug prints when more watchpoints have been hit - for easier troubleshooting.
  • The disabling of triggers before step/resume is somewhat questionable because the debugger (GDB, LLDB) is attempting the same operation, so we are often duplicate in OpenOCD what the debugger is already attempting. For that reason, the disabling of triggers on step/resume may one day change - in which case we would lose the "hit" bit clearing (perhaps without even noticing).

In any case, this can be done in a separate PR.

Copy link
Collaborator

@JanMatCodasip JanMatCodasip Jun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that clearing of hit bits likely should happen in a dedicated function.

I like that proposal. Agreed.

JanMatCodasip: The disabling of triggers before step/resume is somewhat questionable (...)

aap-sc: It's not that questionable, actually. It is a matter of usability/convenience. (...)

I agree that for convenience of TCL users, clearing of the breakpoints is strongly preferred. Side question: Would you know what other targets in OpenOCD do on TCL resume or step commands? Do they disable the breakpoints for the user or not? I wonder if the targets behave consistently in this regard. I haven't researched this yet.

What I was concerned (and considered questionable) is that riscv_openocd_step_impl() conditionally disables the current breakpoint (based on handle_breakpoints flag) but unconditionally disables all triggers - that doesn't seem consistent, and looks suspicious at the very least. For that reason, future changes in this part of the code are likely. And so we should not rely on disable_triggers() to clear the hit bits.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(No need to block the merge request due to this open thread with broader discussion.)

Copy link
Collaborator Author

@aap-sc aap-sc Jun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you know what other targets in OpenOCD do on TCL resume or step commands? Do they disable the breakpoints for the user or not?

AFAIK most of the targets do that. Last time I checked (at the time I've created #741). These were: arm, aarch64, mips, stm - all the major ones.

I don't see that other targets do any special processing with respect to trigger-like objects (maybe they don't need it? more on that below). Some targets have a code that handles masking of ISR, though.

You can search it like this:

git grep handle_breakpoints | grep if | awk -F: '{ print $1 }' | sort | uniq

What I was concerned (and considered questionable) is that riscv_openocd_step_impl() conditionally disables the current breakpoint (based on handle_breakpoints flag) but unconditionally disables all triggers - that doesn't seem consistent, and looks suspicious at the very least.

The reason is quite simple, I guess. For triggers you can't figure out which instruction is affected . I guess this is the reason. Whether or not other target require such behavior for trigger-like entities is a separate question, since it heavily depends on how single-step is implemented by hw/architecture (like target architecture can explicitly state that watch points are not fired during hw-assisted single step if a certain bit is set). It should be noted that the code that disabled triggers for RISC-V openocd was there for a while and I always took it for granted.

To summarize:

  1. I think we can conclude that the removal of SW breakpoint at the current address during step/resume is reasonable. Other targets do that too.
  2. As for other types of triggers... Now that you've mentioned this I kind of share the concern. I think we should at least investigate the behavior for other targets. And I think it would not hurt to ask in OpenOCD mailing list regarding the expected semantics of 'step/resume' for the case of watchpoints. I'll create a separate OS issue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the reply and I agree with your conclusions/summary.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ticket for further investigations: #1079

src/target/riscv/riscv.c Outdated Show resolved Hide resolved
src/target/riscv/riscv.c Show resolved Hide resolved
src/target/riscv/riscv.c Show resolved Hide resolved
@aap-sc aap-sc force-pushed the aap-sc/no_hit_bit_status branch 2 times, most recently from eff47de to 02597d8 Compare May 10, 2024 08:26
@aap-sc aap-sc requested a review from MarekVCodasip May 10, 2024 08:27
@aap-sc
Copy link
Collaborator Author

aap-sc commented May 10, 2024

@MarekVCodasip could you take a look once again?

src/target/riscv/riscv.c Outdated Show resolved Hide resolved
Comment on lines +1555 to +1563
/* FIXME: this function assumes that we have only one trigger that can
* have hit bit set. Debug spec allows hit bit to bit set if a trigger has
* matched but did not fire. Such targets will receive erroneous results.
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see that the trigger hit bit is cleared on line 1615: https://github.com/riscv-collab/riscv-openocd/pull/1056/files#diff-b4aa16f9e42cb8f0934baa7c8e0ec9c70a369bef98b99b26ae2e896c8aa95d6aR1615

However, the loop is exited when the first hit is found, so subsequent hit bits will not be cleared.

We believe the algorithm should continue clearing the "hit" bits of all triggers.

src/target/riscv/riscv.c Show resolved Hide resolved
@aap-sc
Copy link
Collaborator Author

aap-sc commented May 13, 2024

@JanMatCodasip for whatever reason my replies are not displayed properly, so I'll re-post them here.

Regarding to your latest comments:

I can see that the trigger hit bit is cleared on line 1615:

Actually, I was talking about a different case. When we resume the target, we clear all hit bits anyway, since we temporally disable triggers when single-step is performed. If needed I can provide the exact lines. So technically we are fine for now.

We believe the algorithm should continue clearing the "hit" bits of all triggers.

That being said, I agree that it would not hurt to continue the loop in this case. I'm just not sure if this should be part of a subsequent or this commit - will take a look once again.

The fact that OpenOCD users went great lengths to implement this mechanism shows that it was important to them.

Well, yeah. Probably there are cases when it is useful. Probably, my opinion is way too strong :).

The mechanism need not be perfect to help users.

Yes, there are cases when it is terrible (see below). So I want to have an option to avoid these memory reads to reduce invasiveness. For example if we debug ONLY with OpenOCD without involvement of GDB we don't really need this functionality.

Reading memory -- one of the fundamental debugger operations -- should work reliably.

The thing is that it may not not work reliably in all cases. For example: I see design that is capable to hang if there is something wrong with the system bus. For example, the underlying device does not send a response after memory reference. We must not forget that people rarely use debugging with OpenOCD when everything is fine. One uses a debugger when something is (terribly) wrong. Reading memory may be considered way to intrusive for some scenarios.

JanMatCodasip
JanMatCodasip previously approved these changes May 14, 2024
… on triggers

Before this patch the following behavior is observed on targets that do
not support hit bit:

```
bp 0x80000004 4 hw
resume 0x80000000
riscv.cpu halted due to watchpoint
```

This happens because the current implementation relies on the presence
of hit bit way too much. While working on this patch few defects in
hit bit-based trigger detection were discovered, added appropriate
TODOs.
@aap-sc
Copy link
Collaborator Author

aap-sc commented May 30, 2024

@MarekVCodasip there is a pending change request from your side... @JanMatCodasip any chance you have capability to double-check if we should wait for Marek's reply or we can land the patch as is?

@JanMatCodasip
Copy link
Collaborator

MarekVCodasip there is a pending change request from your side... JanMatCodasip any chance you have capability to double-check if we should wait for Marek's reply or we can land the patch as is?

@aap-sc I have just checked with Marek and he does not have any further comments. Please feel free to go on with the merge. Thanks.

@aap-sc aap-sc merged commit b548653 into riscv-collab:riscv Jun 4, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants