
IPv6 connection issues #38

Open
tbzatek opened this issue Apr 25, 2024 · 6 comments

Comments

tbzatek commented Apr 25, 2024

I'm having trouble getting discovery from (static) IPv6 working during the pre-OS phase. Tested with the current timberland_upstream-dev-full_nbft-population-fixes branch (#35). For my setup and the resulting NBFT table, please see linux-nvme/libnvme#821.

Taking the second Discovery Descriptor URI nvme+tcp://[4321::BBBB:1]:4420/ I have no problem reaching it from Linux with networking corresponding to the HFI Descriptor records.
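
For reference, the Linux-side check can be done with plain nvme-cli; a sketch, not necessarily the exact command line used, with the values taken from the URI above:

    # Discover the target from Linux over the same IPv6 network
    nvme discover --transport=tcp --traddr=4321::bbbb:1 --trsvcid=4420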

This looks like a timeout: I noticed the EFI boot process being stuck for a minute or two, with qemu eating 100% CPU, before it eventually booted from the first (IPv4) boot attempt. Might be related to the lost Host address prefix reported in #37.

tbzatek changed the title from "Discovery from IPv6 issues" to "IPv6 connection issues" on Apr 26, 2024

tbzatek commented Apr 26, 2024

This is not limited to discovery from IPv6; I tested with a specific subsysnqn and hit the same problem. It looks like a general EFI networking stack issue. Thankfully the failed boot attempt is still recorded as an SSNS record, marked unavailable, and nvme-cli still connects fine:

Apr 26 14:30:28 localhost.localdomain kernel: nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.boot.poc:test-target", addr 192.168.122.1:4420
Apr 26 14:30:28 localhost.localdomain kernel: nvme nvme1: new ctrl: NQN "nqn.2014-08.org.nvmexpress.boot.poc:test-target", addr [4321:0000:0000:0000:0000:0000:bbbb:0001]:4420
      {
        "index":5,
        "num_hfis":1,
        "hfis":[
          2
        ],
        "transport":"tcp",
        "traddr":"4321::bbbb:1",
        "trsvcid":"4420",
        "subsys_port_id":0,
        "nsid":0,
        "nid":"",
        "subsys_nqn":"nqn.2014-08.org.nvmexpress.boot.poc:test-target",
        "controller_id":0,
        "asqsz":0,
        "pdu_header_digest_required":0,
        "data_digest_required":0,
        "discovered":0,
        "unavailable":1
      }
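
(For comparison, a manual connect using the values from the SSNS record above would look roughly like this; a sketch, not necessarily the exact nvme-cli invocation used:)

    nvme connect --transport=tcp --traddr=4321::bbbb:1 --trsvcid=4420 \
        --nqn=nqn.2014-08.org.nvmexpress.boot.poc:test-target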

@trevor-cockrell

I believe this is due to a few things --

  • The tcp6 stack requires IPv6 configuration of the network interface before it will start.
  • The NVMeoF driver's mimicry of iSCSI's IPv6 handling does not actually initialize/configure the IPv6 stack with any manual host configuration.
    • This nuance is shared between the two drivers, yet iSCSI correctly provides no HII configuration for IPv6 host settings -- iSCSI only supports host IPv6 via DHCP6.

At this time, you should be able to connect via IPv6 if you configure the NIC's IPv6 settings prior to attempting a target connection: set the desired interface to manual and configure it with a valid IPv6 address and gateway, OR configure it as auto with a valid DHCP6 server set up.
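
For the "auto" path, a minimal host-side DHCP6 server could be set up as below; this is a sketch assuming dnsmasq, with the bridge name and prefix borrowed from this issue rather than confirmed as anyone's actual setup:

    # Hypothetical DHCPv6 + router advertisements via dnsmasq;
    # virbr1 and the 4321::/64 prefix are assumptions based on this issue.
    dnsmasq --interface=virbr1 --enable-ra \
        --dhcp-range=4321::100,4321::1ff,64,12h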

I'm looking into a resolution that will allow IPv6 configuration from the nvmeof HII without needing to configure the interface outside of the nvmeof menu.


tbzatek commented Jun 3, 2024

Thanks Trevor. I've made another attempt and configured IPv6 addresses for the second network interface first (Device Manager -> Network Device List -> IPv6 Network Configuration -> Host addresses and Route Table), but it seems to have no effect. I still couldn't get the initiator working. Also tried with clean efivars.

@trevor-cockrell

How are you setting up/providing NICs for your qemu invocation?
I had a lot of trouble with qemu's IPv6 networking until I set up a TAP for qemu.
I think I also had to forward IPv6 via sysctl.


tbzatek commented Jun 11, 2024

> How are you setting up/providing NICs for your qemu invocation?

My qemu network is very simple:

    --netdev bridge,id=net0,br=virbr0 --device virtio-net-pci,netdev=net0,mac=52:54:00:72:c5:ae
    --netdev bridge,id=net1,br=virbr1 --device virtio-net-pci,netdev=net1,mac=52:54:00:72:c5:af

This way qemu defaults to creating tap interfaces and adds them to the target bridges. Each bridge on the host has its own address in an isolated subnet, and the kernel nvme target is bound to that. Sysctl net.ipv6.conf.all.forwarding is set to 1 and no firewall is enabled.
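
(For completeness, the host-side bridge setup described above can be reproduced roughly as follows; a sketch, with the bridge name and target address from this issue and the /64 prefix length assumed:)

    # Hedged reconstruction of the second (IPv6) bridge on the host
    ip link add virbr1 type bridge
    ip link set virbr1 up
    ip -6 addr add 4321::bbbb:1/64 dev virbr1
    # allow routing of the guests' IPv6 traffic
    sysctl -w net.ipv6.conf.all.forwarding=1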

The thing is that IPv6 works fine in Linux, and even discovery from the NBFT discovery record works just fine. Obviously the network stacks are different; it would be good to get some logs or diagnostic information from the UEFI side.

The /0 prefix size issue for the second HFI record still persists:

    "hfi":[
...
      {
        "index":2,
        "transport":"tcp",
        "pcidev":"0:0:4.0",
        "mac_addr":"52:54:00:72:c5:af",
        "vlan":0,
        "ip_origin":1,
        "ipaddr":"4321::bbbb:2",
        "subnet_mask_prefix":0,
        "gateway_ipaddr":"::",
        "route_metric":0,
        "primary_dns_ipaddr":"::",
        "secondary_dns_ipaddr":"::",
        "dhcp_server_ipaddr":"",
        "this_hfi_is_default_route":1,
        "dhcp_override":0
      }
    ],


tbzatek commented Jun 21, 2024

Hmmm, after lots of (other) testing, this looks like an issue on the Linux kernel target side (kernel 6.8.1). After resetting (clearing and setting up) the Linux target, UEFI connections are immediate and successful. The timeouts are observed after the guest VM reboots; the same happens when powering the VM off and starting it again. The nvmet keepalive timeout is set to 5 seconds and I can see the old connections expiring, but still no luck even after waiting a while.
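
(For reference, "clearing and setting up" the target via the nvmet configfs interface looks roughly like this; a sketch assuming port 1 and the standard nvmet configfs layout, not the exact script used:)

    # Hedged sketch of resetting the nvmet TCP port; the port number (1)
    # is an assumption, NQN and address are taken from this issue.
    cd /sys/kernel/config/nvmet
    rm ports/1/subsystems/nqn.2014-08.org.nvmexpress.boot.poc:test-target
    rmdir ports/1
    mkdir ports/1
    echo tcp          > ports/1/addr_trtype
    echo ipv6         > ports/1/addr_adrfam
    echo 4321::bbbb:1 > ports/1/addr_traddr
    echo 4420         > ports/1/addr_trsvcid
    ln -s /sys/kernel/config/nvmet/subsystems/nqn.2014-08.org.nvmexpress.boot.poc:test-target \
          ports/1/subsystems/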

Needs to be retested against some other NVMe/TCP target.

Also tested kernel 6.9.6, no difference.
