Commit
Add a new function `nvme_mi_control` to perform a Control Primitive command. This function is used to control the servicing state of the NVMe device. The Control Primitive command is primarily used to resolve the case where the NVMe Command Servicing State cannot return to the `Idle` state.

Why the Control Primitive command is needed:

For example, when an NVMe device transmits messages to the Management Controller (such as a BMC, hereinafter referred to as BMC), the state changes to `Transmit`. Some NVMe devices remain in this state until the message transmission completes, after which they revert to `Idle`.

In cases where multiple packets are needed to assemble a complete MI message, such as an Admin Identify response with an MTU of 64, receiving 4096 bytes takes approximately 60-70 packets. If the BMC stops receiving halfway through, the NVMe device remains in the `Transmit` state without transitioning to `Idle`, preventing further communication. The Control Primitive command can now be used to abort the transmission and return to the `Idle` state.

Scenarios in which the BMC stops receiving halfway:
1. BMC reboot.
2. The BMC actively aborts the read, for example with Ctrl-C.
3. ... (other unexpected scenarios)

TL;DR: An incomplete command may leave the NVMe device stuck in `Transmit`; we need a way to abort the command and return to `Idle`.

For details, see `Out-of-Band Message Servicing Model`, `Figure 34: Command Servicing State Diagram`.

Tested using the example below:
1. ~# while true; do mi-mctp 1 20 identify 0; done
2. Use `Ctrl+C` to break the command. (PS: my NVMe device is behind an I2C mux, and the mux gets switched to another channel by other commands, so an error is highly likely.)
3. Check the NVMe state (command added in the next commit); it is always in `Transmit`.
~# mi-mctp 1 20 control-primitive get-state
```
NVMe control primitive
Get State : cspr is 0x840b
Slot Command Servicing State: Transmit
...
Pause Flag: Yes
```
4. Send the identify command again; there is no response.
~# mi-mctp 1 20 identify 0
~# mi-mctp: can't perform Admin Identify command: Connection timed out
5. Use the new function to abort the command.
~# mi-mctp 1 20 control-primitive abort
6. The NVMe state returns to `Idle`, and the identify command can be executed again.

Signed-off-by: Jian Zhang <zhangjian.3032@bytedance.com>
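The `cspr is 0x840b` value in step 3 can be decoded by hand to confirm the stuck state. The following is a minimal sketch, not code from this commit. It assumes that bits 1:0 of the Get State response word carry the slot 0 Command Servicing State (0 = Idle, 1 = Receive, 2 = Process, 3 = Transmit) and that bit 15 is the Pause Flag. That layout is an assumption that agrees with the output above but should be checked against the NVMe-MI specification. The names `slot_state`, `get_state_info`, `slot_state_name`, and `decode_get_state` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical enum: assumed encoding of the Command Servicing State. */
enum slot_state {
	SLOT_IDLE = 0,
	SLOT_RECEIVE = 1,
	SLOT_PROCESS = 2,
	SLOT_TRANSMIT = 3,
};

/* Hypothetical container for the two fields decoded here. */
struct get_state_info {
	enum slot_state slot0; /* assumed: bits 1:0 of the response word */
	bool pause_flag;       /* assumed: bit 15 of the response word */
};

/* Map a state value to the name used in the mi-mctp output. */
const char *slot_state_name(enum slot_state s)
{
	switch (s) {
	case SLOT_IDLE:     return "Idle";
	case SLOT_RECEIVE:  return "Receive";
	case SLOT_PROCESS:  return "Process";
	case SLOT_TRANSMIT: return "Transmit";
	}
	return "Unknown";
}

/* Extract the slot 0 state and Pause Flag from a Get State response word. */
struct get_state_info decode_get_state(uint16_t cpsr)
{
	struct get_state_info info;

	info.slot0 = (enum slot_state)(cpsr & 0x3);
	info.pause_flag = ((cpsr >> 15) & 0x1) != 0;
	return info;
}
```

Under these assumptions, `decode_get_state(0x840b)` reports slot 0 in `Transmit` with the Pause Flag set, which matches the `get-state` output shown in step 3.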