Debug Module did not become active #1041

Closed
erhankur opened this issue Apr 11, 2024 · 14 comments

@erhankur
Contributor

We have a new RISC-V chip with a dual-core CPU (ESP32-P4).

The first core (LP) is in sleep mode by default, so it is out of scope for now. The second core (HP) has 2 harts.

The two cores are connected in a daisy-chain scheme: one DTM controls the LP core and the other DTM controls the HP core.

What is the expected way of configuring 2 harts in the same DM as SMP targets? I tried the config below, but it gives an error.

transport select jtag

set _CHIPNAME_0 lp
set _CHIPNAME_1 hp

jtag newtap $_CHIPNAME_0 cpu -irlen 5 -expected-id 0x0012c25
jtag newtap $_CHIPNAME_1 cpu -irlen 5 -expected-id 0x0012c25

set _TAPNAME_0 $_CHIPNAME_0.cpu
set _TAPNAME_1 $_CHIPNAME_1.cpu

set _TARGETNAME_0 $_CHIPNAME_0.cpu
set _TARGETNAME_1 $_CHIPNAME_1.cpu0
set _TARGETNAME_2 $_CHIPNAME_1.cpu1

target create $_TARGETNAME_1 riscv -chain-position $_TAPNAME_1 -coreid 0 -rtos FreeRTOS
target create $_TARGETNAME_2 riscv -chain-position $_TAPNAME_1 -coreid 1 -rtos FreeRTOS
target smp $_TARGETNAME_1 $_TARGETNAME_2
Info : Listening on port 6666 for tcl connections
Info : Listening on port 4444 for telnet connections
Info : clock speed 10000 kHz
Info : JTAG tap: lp.cpu tap/device found: 0x00012c25 (mfg: 0x612 (Espressif Systems), part: 0x0012, ver: 0x0)
Info : JTAG tap: hp.cpu tap/device found: 0x00012c25 (mfg: 0x612 (Espressif Systems), part: 0x0012, ver: 0x0)
Info : [hp.cpu0] dbgbase=0x0 coreid:0
Info : [hp.cpu0] dtmcontrol=0x70a1
Info : [hp.cpu0] Searching for DM with DMI base address (dbgbase) = 0x0
Info : [hp.cpu0] datacount=1 progbufsize=2
Info : [hp.cpu0] Detected 2 harts.
Info : [hp.cpu0]  XLEN=32, misa=0x40901125
Info : [hp.cpu0] Core 0 made part of halt group 1.
Info : [hp.cpu0] Examined RISC-V core; found 2 harts
Info : [hp.cpu0]  XLEN=32, misa=0x40901125
Info : [hp.cpu0] Examination succeed
Info : [hp.cpu1] dbgbase=0x0 coreid:1
Info : [hp.cpu1] dtmcontrol=0x70a1
Info : [hp.cpu1] Searching for DM with DMI base address (dbgbase) = 0x0
Error: [hp.cpu1] Debug Module did not become active. dmcontrol=0x0
Error: [hp.cpu1] Examination failed
Warn : target hp.cpu1 examination failed

In our fork, we have FreeRTOS SMP support and are mostly aligned with this repo.

log.txt

@TommyMurphyTM1234
Collaborator

TommyMurphyTM1234 commented Apr 11, 2024

Does the stuff from Espressif not help?

Edit: ah - sorry - you ARE Espressif! I didn't realise... :-D

@erhankur
Contributor Author

We have initial support for the chip, but I haven't been able to configure SMP properly yet. Before diving in, I'd like to make sure the riscv target already includes what we need.

@en-sc
Collaborator

en-sc commented Apr 11, 2024

@erhankur, I'm a bit confused by the log. It seems that in the version you use, logging from dump_field() is disabled on -d3.
(https://github.com/espressif/openocd-esp32/blob/master/src/target/riscv/riscv-013.c#L381-L383)
Please enable the logging in dump_field() and attach a new log (-d4 will be fine) -- without the info about the scans it is quite hard to be sure the issue is what I think it is.

For now, it seems this is the same issue as #1024 (TL;DR):

  • The first hart was successfully examined:
Info : 179 14 riscv-013.c:1929 examine(): [hp.cpu0] Debug Module @0x1211206a0
Debug: 180 14 riscv-013.c:495 dmi_scan(): [hp.cpu0] reset_delays_wait done
Debug: 181 14 riscv-013.c:466 increase_dmi_busy_delay(): [hp.cpu0] dtmcs_idle=7, dmi_busy_delay=1, ac_busy_delay=0
Debug: 182 14 riscv-013.c:453 dtmcontrol_scan(): DTMCS: 0x10000 -> 0x7ca1
Debug: 183 15 riscv-013.c:466 increase_dmi_busy_delay(): [hp.cpu0] dtmcs_idle=7, dmi_busy_delay=2, ac_busy_delay=0
Debug: 184 15 riscv-013.c:453 dtmcontrol_scan(): DTMCS: 0x10000 -> 0x7ca1
Info : 185 15 riscv-013.c:1934 examine(): [hp.cpu0] dm->was_reset
Debug: 186 15 riscv-013.c:5174 riscv013_invalidate_cached_progbuf(): [hp.cpu0] Invalidating progbuf cache
Info : 187 15 riscv-013.c:1958 examine(): [hp.cpu0] Debug Module dmcontrol=0x4010001

While examination of the second hart failed:

Info : 266 24 riscv-013.c:1929 examine(): [hp.cpu1] Debug Module @0x1211206a0
Debug: 267 24 riscv-013.c:5174 riscv013_invalidate_cached_progbuf(): [hp.cpu1] Invalidating progbuf cache
Debug: 268 24 riscv-013.c:495 dmi_scan(): [hp.cpu1] reset_delays_wait done
Debug: 269 24 riscv-013.c:466 increase_dmi_busy_delay(): [hp.cpu1] dtmcs_idle=7, dmi_busy_delay=1, ac_busy_delay=0
Debug: 270 24 riscv-013.c:453 dtmcontrol_scan(): DTMCS: 0x10000 -> 0x7ca1
Debug: 271 24 riscv-013.c:466 increase_dmi_busy_delay(): [hp.cpu1] dtmcs_idle=7, dmi_busy_delay=2, ac_busy_delay=0
Debug: 272 24 riscv-013.c:453 dtmcontrol_scan(): DTMCS: 0x10000 -> 0x7ca1
Error: 273 24 riscv-013.c:1954 examine(): [hp.cpu1] Debug Module did not become active. dmcontrol=0x0

It is hard to verify that the issue is indeed the same without knowing on which exact scan "reset_delays_wait done" occurred.

Also, to verify that the issue is with the write to dtmcs.dmireset, you can try removing the call to riscv reset_delays as a temporary workaround.
If you use riscv-tests/debug, the call is made here:
https://github.com/riscv-software-src/riscv-tests/blob/fd01e46314636666b5bed1f08465f734751a4cda/debug/testlib.py#L837
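
For background on why removing that call can help: riscv reset_delays N makes OpenOCD forget its learned Run-Test/Idle delays after N scans, deliberately re-triggering the busy -> dtmcs.dmireset recovery path (the very path suspected of being mishandled here). A simplified model of the mechanism, using illustrative names rather than the actual riscv-013.c internals:

#include <stdio.h>

/* Simplified model of OpenOCD's "riscv reset_delays N" test hook.
 * Names and structure are illustrative assumptions, not the actual
 * riscv-013.c internals. */
struct delays_model {
    int reset_delays_wait;  /* scans until delays are reset; -1 = off */
    int dmi_busy_delay;     /* learned Run-Test/Idle cycles per scan  */
};

static void before_each_scan(struct delays_model *d)
{
    if (d->reset_delays_wait >= 0 && d->reset_delays_wait-- == 0) {
        /* Forget the learned delay: the next DMI access will likely
         * come back busy, and recovery writes dtmcs.dmireset -- the
         * exact sequence the suspected HW bug trips over. */
        d->dmi_busy_delay = 0;
    }
}

int main(void)
{
    struct delays_model d = { .reset_delays_wait = 3, .dmi_busy_delay = 7 };
    for (int scan = 0; scan < 6; scan++) {
        before_each_scan(&d);
        printf("scan %d: dmi_busy_delay=%d\n", scan, d.dmi_busy_delay);
    }
    return 0;
}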

@erhankur
Contributor Author

@en-sc, thanks for the answer. Here is the -d4 log:

logd4.txt

@en-sc
Collaborator

en-sc commented Apr 11, 2024

Edit: there is an error in this analysis. Please see #1041 (comment)

Thanks!

This is not the same as #1024.

Seems like an OpenOCD bug:

Debug: 11161 48 riscv-013.c:453 dtmcontrol_scan(): DTMCS: 0x10000 -> 0x7ca1
Debug: 11225 48 riscv-013.c:395 dump_field(): 44b - 00000000 @10 -> + 00000000 @10; 2i
Debug: 11226 48 riscv-013.c:409 dump_field(): read: dmcontrol=0 {}
Debug: 11325 49 riscv-013.c:395 dump_field(): 44b w 00000001 @10 -> + 00000000 @10; 2i
Debug: 11326 49 riscv-013.c:403 dump_field(): write: dmcontrol=1 {dmactive=active}
Debug: 11327 49 riscv-013.c:409 dump_field(): read: dmcontrol=0 {}
...
Debug: 11391 49 riscv-013.c:395 dump_field(): 44b - 00000000 @10 -> + 00000001 @10; 2i
Debug: 11392 49 riscv-013.c:409 dump_field(): read: dmcontrol=1 {dmactive=active}
Error: 11393 49 riscv-013.c:1954 examine(): [hp.cpu1] Debug Module did not become active. dmcontrol=0x0

We successfully read dmcontrol.dmactive=active, yet report that the Debug Module did not become active.

I have refactored the relevant code recently. Can you try back-porting the commits from #1023 and checking whether the issue is still present?

@erhankur
Contributor Author

I cherry-picked 3 commits:


34d6fe367 (HEAD -> en-sc/check-ac-busy) target/riscv: check `abstractcs.busy`
8319eee9e target/riscv: introduce `examine_dm()` function
67b2a5a95 target/riscv: cache `abstractcs.busy` in `dm013_info_t`

The result is different now.

Info : Listening on port 6666 for tcl connections
Info : Listening on port 4444 for telnet connections
Info : clock speed 1000 kHz
Info : JTAG tap: lp.cpu tap/device found: 0x00012c25 (mfg: 0x612 (Espressif Systems), part: 0x0012, ver: 0x0)
Info : JTAG tap: hp.cpu tap/device found: 0x00012c25 (mfg: 0x612 (Espressif Systems), part: 0x0012, ver: 0x0)
Info : [hp.cpu0] datacount=1 progbufsize=2
Info : [hp.cpu0] Core 0 made part of halt group 1.
Info : [hp.cpu0] Examined RISC-V core; found 1 harts
Info : [hp.cpu0]  XLEN=32, misa=0x40901125
Info : [hp.cpu0] Examination succeed
Error: OpenOCD only supports Debug Module version 2 (0.13) and 3 (1.0), not 0 (dmstatus=0x0). This error might be caused by a JTAG signal issue. Try reducing the JTAG clock speed.
Error: [hp.cpu1] Examination failed
Warn : target hp.cpu1 examination failed

logd4-2.txt

@en-sc
Collaborator

en-sc commented Apr 11, 2024

@erhankur, sorry. I've made an error in the previous analysis.

I'm almost certain this is the same HW issue as #1024.

What was wrong in the first analysis:

Going step by step through the log snippet:

  • Write all ones to hartsello & hartselhi to determine hartsellen:
Debug: 10618 46 riscv-013.c:395 dump_field(): 44b w 07ffffc1 @10 -> + 00000000 @32; 0i
Debug: 10619 46 riscv-013.c:403 dump_field(): write: dmcontrol=0x7ffffc1 {dmactive=active hartsello=0x3ff hasel=multiple hartselhi=0x3ff}
Debug: 10620 46 riscv-013.c:409 dump_field(): read:
Debug: 10676 47 riscv-013.c:395 dump_field(): 44b - 00000000 @10 -> b 07ffffc1 @10; 0i
  • Returns busy. Increase delays.
Debug: 10677 47 riscv-013.c:466 increase_dmi_busy_delay(): [hp.cpu1] dtmcs_idle=7, dmi_busy_delay=1, ac_busy_delay=0
Debug: 10804 47 riscv-013.c:453 dtmcontrol_scan(): DTMCS: 0x10000 -> 0x7ca1
Debug: 10868 47 riscv-013.c:395 dump_field(): 44b - 00000000 @10 -> + 00000000 @10; 1i
Debug: 10869 47 riscv-013.c:409 dump_field(): read: dmcontrol=0 {}
  • Returns success. Now read back:
Debug: 10968 48 riscv-013.c:395 dump_field(): 44b r 00000000 @10 -> + 00000000 @10; 1i
Debug: 10969 48 riscv-013.c:409 dump_field(): read: dmcontrol=0 {}
Debug: 11033 48 riscv-013.c:395 dump_field(): 44b - 00000000 @10 -> b 00000000 @10; 1i
  • Returns busy. Increase delays.
Debug: 11034 48 riscv-013.c:466 increase_dmi_busy_delay(): [hp.cpu1] dtmcs_idle=7, dmi_busy_delay=2, ac_busy_delay=0
Debug: 11161 48 riscv-013.c:453 dtmcontrol_scan(): DTMCS: 0x10000 -> 0x7ca1
Debug: 11225 48 riscv-013.c:395 dump_field(): 44b - 00000000 @10 -> + 00000000 @10; 2i
Debug: 11226 48 riscv-013.c:409 dump_field(): read: dmcontrol=0 {}
  • Returns success. dmcontrol is zero -- HW issue. Seems like the read was aborted by dtmcs.dmireset.
  • OpenOCD writes dmactive to dmcontrol here, which is pointless. An error could have been reported earlier, and after the refactor it is reported earlier in such cases.
Debug: 11325 49 riscv-013.c:395 dump_field(): 44b w 00000001 @10 -> + 00000000 @10; 2i
Debug: 11326 49 riscv-013.c:403 dump_field(): write: dmcontrol=1 {dmactive=active}
Debug: 11327 49 riscv-013.c:409 dump_field(): read: dmcontrol=0 {}
Debug: 11391 49 riscv-013.c:395 dump_field(): 44b - 00000000 @10 -> + 00000001 @10; 2i
Debug: 11392 49 riscv-013.c:409 dump_field(): read: dmcontrol=1 {dmactive=active}
Error: 11393 49 riscv-013.c:1954 examine(): [hp.cpu1] Debug Module did not become active. dmcontrol=0x0
  • And here is the point where I made the error in the previous analysis. The current operation is a write, so the value returned by the scan is meaningless. But the decoding logs read: dmcontrol=1 {dmactive=active}. Here read does not mean the value was read from the DM; it means this is the value OpenOCD got back from the DMI scan (and that value is garbage when dmi.op on the previous scan was a write). This is exactly how I made the mistake in Debug Module did not become active #1041 (comment). See the sketch below.
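
To make the decoding pitfall concrete: on a JTAG DTM, each DMI scan shifts in a new request and simultaneously shifts out the response that completes the previous request. A minimal runnable sketch of this pipelining (a conceptual model with illustrative names, not the riscv-013.c implementation):

#include <stdint.h>
#include <stdio.h>

/* Conceptual simulation of JTAG DMI pipelining: the value shifted OUT
 * of the DR completes the PREVIOUS operation, not the request being
 * shifted IN. Illustrative only. */
enum dmi_op { DMI_NOP = 0, DMI_READ = 1, DMI_WRITE = 2 };

static uint32_t dm_regs[0x40];              /* fake DM register file */
static enum dmi_op pending_op = DMI_NOP;
static uint32_t pending_addr, pending_data;

/* One scan: shift in (op, addr, data); shift out the result of the
 * previously pending operation. */
static uint32_t dmi_scan(enum dmi_op op, uint32_t addr, uint32_t data)
{
    uint32_t shifted_out = 0;
    if (pending_op == DMI_READ)
        shifted_out = dm_regs[pending_addr];
    else if (pending_op == DMI_WRITE)
        dm_regs[pending_addr] = pending_data;  /* nothing to read back */
    pending_op = op;
    pending_addr = addr;
    pending_data = data;
    return shifted_out;
}

int main(void)
{
    /* Scan 1: request WRITE dmcontrol (0x10) = 1 (dmactive). */
    dmi_scan(DMI_WRITE, 0x10, 0x00000001);
    /* Scan 2: NOP, only to collect the write's completion. The value
     * shifted out is NOT a DM read -- logging it as
     * "read: dmcontrol=1 {dmactive=active}" is what misled me above. */
    printf("after write, shifted-out value = 0x%x (garbage)\n",
           dmi_scan(DMI_NOP, 0, 0));
    /* A real read needs its own READ scan plus one more scan. */
    dmi_scan(DMI_READ, 0x10, 0);
    printf("dmcontrol = 0x%x\n", dmi_scan(DMI_NOP, 0, 0));
    return 0;
}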

@en-sc
Collaborator

en-sc commented Apr 11, 2024

That a HW issue is at fault is also confirmed by the second log, after the cherry-pick.

Here, examination of the DM is skipped the second time, since OpenOCD has already learned hartsellen and so on by examining the DM while examining the first hart.

  • So now we select the hart (write dmcontrol.hartsello to 1):
Debug: 10114 44 riscv-013.c:400 dump_field(): 44b w 00010001 @10 -> + 00000000 @32; 0i
Debug: 10115 44 riscv-013.c:408 dump_field(): write: dmcontrol=0x10001 {dmactive=active hartsello=1}
Debug: 10116 44 riscv-013.c:414 dump_field(): read:
Debug: 10172 45 riscv-013.c:400 dump_field(): 44b - 00000000 @10 -> b 00010001 @10; 0i
  • Returned busy. Increase delays.
Debug: 10173 45 riscv-013.c:471 increase_dmi_busy_delay(): [hp.cpu1] dtmcs_idle=7, dmi_busy_delay=1, ac_busy_delay=0
Debug: 10300 45 riscv-013.c:458 dtmcontrol_scan(): DTMCS: 0x10000 -> 0x7ca1
Debug: 10364 45 riscv-013.c:400 dump_field(): 44b - 00000000 @10 -> + 00000000 @10; 1i
Debug: 10365 45 riscv-013.c:414 dump_field(): read: dmcontrol=0 {}
  • Returned success. Now the progbuf cache is invalidated:
Debug: 10366 45 riscv-013.c:5321 riscv013_invalidate_cached_progbuf(): [hp.cpu1] Invalidating progbuf cache
  • And read dmstatus:
Debug: 10465 45 riscv-013.c:400 dump_field(): 44b r 00000000 @11 -> + 00000000 @10; 1i
Debug: 10466 45 riscv-013.c:414 dump_field(): read: dmcontrol=0 {}
Debug: 10530 46 riscv-013.c:400 dump_field(): 44b - 00000000 @11 -> b 00000000 @11; 1i
  • Returned busy. Increase delays.
Debug: 10531 46 riscv-013.c:471 increase_dmi_busy_delay(): [hp.cpu1] dtmcs_idle=7, dmi_busy_delay=2, ac_busy_delay=0
Debug: 10658 46 riscv-013.c:458 dtmcontrol_scan(): DTMCS: 0x10000 -> 0x7ca1
Debug: 10722 46 riscv-013.c:400 dump_field(): 44b - 00000000 @11 -> + 00000000 @11; 2i
Debug: 10723 46 riscv-013.c:414 dump_field(): read: dmstatus=0 {}
  • The same result for a different register. Now, after dtmcs.dmireset was written, the read was aborted (or at least this is how I see it), and dmstatus is 0 (before the patches it was dmcontrol). So an error is returned:
Error: 10724 46 riscv-013.c:809 dmstatus_read_timeout(): OpenOCD only supports Debug Module version 2 (0.13) and 3 (1.0), not 0 (dmstatus=0x0). This error might be caused by a JTAG signal issue. Try reducing the JTAG clock speed.
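
For reference on the error itself: it comes from decoding dmstatus.version, which the RISC-V Debug Specification puts in dmstatus[3:0] (0 = no DM present, 1 = spec 0.11, 2 = spec 0.13, 3 = spec 1.0). A minimal sketch of such a check, using an illustrative function name rather than the actual dmstatus_read_timeout() code:

#include <stdint.h>
#include <stdio.h>

/* dmstatus.version lives in bits [3:0] per the RISC-V Debug Spec:
 * 0 = no DM present, 1 = spec 0.11, 2 = spec 0.13, 3 = spec 1.0.
 * Illustrative check only, not the actual riscv-013.c code. */
static int check_dm_version(uint32_t dmstatus)
{
    unsigned version = dmstatus & 0xf;
    if (version == 2 || version == 3)
        return 0;
    fprintf(stderr, "OpenOCD only supports Debug Module version 2 (0.13) "
            "and 3 (1.0), not %u (dmstatus=0x%x)\n", version, dmstatus);
    return -1;
}

int main(void)
{
    /* The aborted scan returned all zeroes, so dmstatus decodes as
     * version 0 and the examination fails with the error above. */
    check_dm_version(0x0);
    return 0;
}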

@erhankur
Contributor Author

@en-sc, thanks a lot for the analysis. I will confirm the bug with the HW team.

@TommyMurphyTM1234
Collaborator

  • The same result for a different register. Now, after dtmcs.dmireset was written, the read was aborted (or at least this is how I see it), and dmstatus is 0 (before the patches it was dmcontrol). So an error is returned:

Just curious - do the two (or more?) RISC-V implementations that seem to have this issue share any common heritage, such as using the same upstream HDL implementation, by any chance?

@erhankur
Contributor Author

The bug was confirmed with the HW team. I am closing the issue. Thanks for helping.

@TommyMurphyTM1234
Collaborator

The bug was confirmed with the HW team. I am closing the issue. Thanks for helping.

Thanks for the update @erhankur.

Are you at liberty to answer the question below, or to outline the origin of your RISC-V hardware implementation - e.g. whether it's based on some public/open-source HDL implementation? It might be useful to know whether this problem, which has been reported separately several times, arises from some specific RISC-V implementation.

Just curious - do the two (or more?) RISC-V implementations that seem to have this issue have any sort of common heritage such as using the same upstream HDL implementation by any chance?

@erhankur
Contributor Author

@TommyMurphyTM1234 I am sorry, I am not authorized to disclose this information.

@TommyMurphyTM1234
Collaborator

@TommyMurphyTM1234 I am sorry, I am not authorized to disclose this information.

Ok, thanks.
