-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
expose SRIOV information #230
Conversation
Example output, PF device:
|
Example output, VF device:
|
README.md
Outdated
*This API is PROVISIONAL! ghw will try hard to not make breaking changes to this API, but still users are advice this new API is not | ||
declared stable yet. We expect to declare it stable with ghw version 1.0.0* | ||
|
||
SRIOV (Single-Root Input/Output Virtualization) is a class of PCI devices that ghw treats explicitely, like gpus; but unlike |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/treats/handles/
7e97f20
to
5e67349
Compare
We are converging about which information we should expose about SRIOV devices. The current set of attributes represents what other established components, for example the k8s SRIOV operator exposes . Please note that in case of ghw, the same amount of data is split between the The biggest problem however is how to represent the SRIOV devices proper. Should SRIOV be a subpackage of the |
bbe0fb9
to
8b33ae8
Compare
Had a chat with @jaypipes and the my takeaways are:
|
8b33ae8
to
8cf64a0
Compare
2f4aee4
to
b146139
Compare
ok, test failure is unexpected. I'll have a look ASAP. |
weird, can't reproduce inside a xenial container:
|
same with go1.14.15 (yep, copypasta here #230 (comment)) |
9b6c782
to
f88d4a5
Compare
Interesting. We had apparently-random CI failures on xenial only.
Now I wonder if the default behaviour of |
Should the pcidb pkg fail to find a local pci database, it optionally fetches an updated copy from the network. Even though this is unlikely to happen, and even though this is a nice feature to have, this is enabled by default and can lead to hard-to-debug issues. we seen this first with failing CI on xenial in travis (jaypipes#230 (comment)) It's possible the ultimate cause is out of date test VM, or any CI failure; but it is also true this a symptom of a deeper issue: test environments should be tightly controlled in order to be reproducible. Fetching data from the network, even though from a safe and trusted location, voids this assumption. So we need to force this off in the test path. Unfortunately this patch has its own drawback: it changes the pcidb default behaviour, so test code and production code have different preferred code paths. To really fix this behaviour completely we should probably file a pcidb issue (+ PR) to change its defaults. Signed-off-by: Francesco Romani <fromani@redhat.com>
0f62884
to
62a092d
Compare
Well this still have some merits, turns out it was in turn a symptom. The real issue -with is fix- is captured in 62a092d |
080eb0b
to
1b41001
Compare
1b41001
to
1e9aa06
Compare
ea52d44
to
4715e44
Compare
4715e44
to
cf6f510
Compare
back to WIP: need to finish the rebase, and I'd like to land also #281 before to finish this PR. |
cf6f510
to
40ad1e1
Compare
All the deps of this PR have been merged! removing the WIP status |
40ad1e1
to
ce233e2
Compare
@jaypipes at last! all the dependencies of this PR have been merged, CI is green and it's ready for a new review round! |
Fixes: #92 |
Example output (long):
|
ce233e2
to
901f511
Compare
Add support to report SRIOV devices. Much like GPU devices, support is added using a new top-level package. SRIOV devices are either Physical Functions or Virtual functions. The preferred representation for ghw is Physical Functions, whose dependent devices will be Virtual Functions; however, for the sake of practicality, the API also exposes soft references to Virtual Functions, so consumers of the API can access them directly and not navigating the parent devices. This patch also adds support in `ghwc`, to report the sriov information, and in the `snapshot` package, to make sure to capture all the files in sysfs that ghw cares about. Last but not least, lacking access to suitable non-linux systems, support is provided only on linux OS, even though the API tries hard not to be linux-specific. Resolves: jaypipes#92 Signed-off-by: Francesco Romani <fromani@redhat.com>
901f511
to
9058f61
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fromanirh, the more I look at this patch, the more I think these changes belong in the pkg/pci
package and not as a new pkg/sriov
package. The fact is, SR-IOV is specific to PCI Express and therefore should really be a set of additional attributes on the existing pkg/pci.Device
struct.
See inline for more explanation.
@@ -1115,6 +1115,67 @@ information | |||
`ghw.TopologyNode` struct if you'd like to dig deeper into the NUMA/topology | |||
subsystem | |||
|
|||
### SRIOV | |||
|
|||
*This API is PROVISIONAL! ghw will try hard to not make breaking changes to this API, but still users are advice this new API is not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recommend removing this. :) The interfaces we publish are specific to a Git tag or SHA1 per Go package versioning and semver behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no problem! will remove.
"virtfn*", | ||
} | ||
|
||
ignoreSet := newKeySet( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's necessary to create a new keySet
typedef. We can just use map[string]bool
here and where you're doing the ignoreSet.Contains
call below, just do:
if _, ignored := ignoreSet[globbedEntry]; ignored {
continue
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure thing, will simplify
} | ||
} | ||
|
||
func findNetworks(ctx *context.Context, devPath string) []string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are they network names or are they network interface names? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, they are actually network interface names. Will clarify the function name.
type Device struct { | ||
Interfaces []string `json:"interfaces"` | ||
// the PCI address where the SRIOV instance can be found | ||
Address *pciaddr.Address `json:"address"` | ||
PCI *pci.Device `json:"pci"` | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of creating a new pkg/sriov
package and pkg/sriov.Device
structI think we should just modify the
pkg/pci.Device` struct to read SR-IOV (and network interface) information when the PCI device is a PF or VF.
We should add a new Interface
struct to pkg/net
:
// Interface represents a physical or virtual network interface on the host
type Interface struct {
// Name is the name of the interface on the host
Name string `json:"name"`
}
We can then modify the pkg/pci.Device
struct to add the following:
...
import (
...
"github.com/jaypipes/ghw/pkg/net"
...
)
...
type Device struct {
...
// NetworkInterfaces is the collection of network interfaces housed on this
// PCI device.
NetworkInterfaces []net.Interface `json:"network_interfaces,omitempty"`
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting idea. I'd like to process the concept a bit more to fully grok it but doesn't look bad at all. However this also mean that we aim to absorb all the PCI device type (= gpu) into the PCI package in the long run?
Or it's just SRIOV which is not special enough? (again no strong feeling, but wondering about the general direction)
} | ||
|
||
type PhysicalFunction struct { | ||
Device |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here, we'd inherit from pkg/pci.Device
instead.
} | ||
|
||
type VirtualFunction struct { | ||
Device |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
|
||
type VirtualFunction struct { | ||
Device | ||
ID int `json:"id"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the index of the VF within the PF? If so, should we call this Index
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, it should be called probably Index
and I should clarify it with a code comment.
// Address of the (parent) Physical Function this Virtual Function pertains to. | ||
ParentAddress *pciaddr.Address `json:"parent_address,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just store a pointer to the parent pkg/pci.Device
struct instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Come to think of it, could we just use a single Function
struct to describe all of the SR-IOV functions, both physical and virtual?
// Function describes an SR-IOV physical or virtual function. Physical functions
// will have no Parent Function struct pointer and will have one or more Function
// structs in the Functions field.
type Function struct {
Device // All Functions are PCI Devices
// Parent contains a pointer to the parent physical function.
// Will be empty when this is a physical function
Parent *Function `json:"parent,omitempty"`
// MaxVFs contains the maximum number of supported virtual
// functions for this physical function
MaxVFs int `json:"max_vfs"
// Functions contains the physical function's virtual functions
Functions []*Function `json:"functions"`
}
You could create a simple IsPhysical
utility method on pkg/pci.Function
:
// IsPhysical returns true if the PCIe function is a physical function, false
// if it is a virtual function
func (f *Function) IsPhysical() bool {
return f.Parent == nil
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting angle, I don't have again strong feelings about this idea. With respect the current approach, we lose something, we gain something. I'll file a separate PR so we can compare the two approaches, and I'll close the one we like less (with rationale of course).
sorry for the delay, caused by winter holidays plus crazy january. I'll resume work here ASAP, let's try this variation! |
(at LONG last I finally got some time to resume working on this PR!) Again, I don't have strong feelings here, but this (perceived?) inconsistency makes me think. |
at long last, the alternate implementation is available: #315 |
we agreed to move forward with #315 |
Expose informations about SRIOV devices. After some design iterations, we fixed these goals:
There are few more noteworthy items in this PR:
Fixes: #92
Signed-off-by: Francesco Romani fromani@redhat.com