-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
context: Fix directory leakage for snapshots contexts #236
Conversation
thanks for the PR. We indeed leak the unpacked snapshot directory. it seems we do only on |
This fix will expose a bug in a couple tests, we'll need to fix them first - I'll post a PR soon(ish). |
The PR jaypipes#236 fixes a snapshot directory leakage issue, but also highlight a context lifecycle/ownership issue. In general in ghw each package creates its own context, including the cases on which a package is consumed by another, for example `topology` and `pci`. However, in this case, it would make more sense to reuse the context from the parent package, because the two packages are tightly coupled. Before the introduction of the the transparent snapshot support (jaypipes#202), the above mechanism worked, because each context, each time, was consuming over and over again the same parameters (e.g. environment variables); besides some resource waste, this had no negative effect. When introducing snapshots in the picture, repeatedly unpacking the same snapshot to consume the same data is much more wasteful. So it makes sense to introduce explicitly the concept of dependent context. We add a functio to create a context subordinate to another. The subordinate context will effectively borrow all the resources from the parent one. The parent is in charge to perform any setup/teardown (e.g. for snapshot, to learn which chroot should be used, if at all) and so forth. Making this dependency explicit allow the client code to manage correctly the lifecycle of the topmost context, hence of all the resources. Signed-off-by: Francesco Romani <fromani@redhat.com>
9840479
to
a084180
Compare
The test failures are expected (#236 (comment)) and fixed by b7b2bc6 |
pkg/context/context.go
Outdated
if ctx.SnapshotRoot != "" { | ||
// if the client code provided the unpack directory, | ||
// then it is also in charge of the cleanup. | ||
if len(ctx.snapshotUnpackedPath) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mostly curious, why just not
if ctx.snapshotUnpackedPath == "" {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because of a personal preference, I don't like to use string literals. If golang would have had a strings.Empty constant and specify it is the default string value, it would be fine, otherwise it has a "smell" (using strings literals in code - for comparision).
The joke is on me since the integer 0 is also a literal.
You can also have a look on the 'tabs vs spaces' like (https://www.youtube.com/watch?v=SsoOG6ZeyUI) "check empty strings" google chat:
https://groups.google.com/g/golang-nuts/c/7Ks1iq2s7FA?pli=1
here is an interesting quote from Russ Cox extracted from the above discussion:
The one that makes the code clear. If I'm about to look at element x I typically write len(s) > x, even for x == 0, but if I care about "is it this specific string" I tend to write s == "". It's reasonable to assume that a mature compiler will compile len(s) == 0 and s == "" into the same, efficient code. Right now 6g etc do compile s == "" into a function call while len(s) == 0 is not, but that's been on my to-do list to fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, makes sense. I don't have a strong opinion here so I don't mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marcel-apf my rule of thumb is to generally use the style of code already present in the codebase :) If you wouldn't mind using the == ""
style, I'd appreciate that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Add a simple helper to find a device among the `pci.Info.Devices`. Up until now, client code had to reimplement the find function every time, or to call `GetDevice`. Problem is: `GetDevice` access the system every time. In most cases, besides a tiny performance hit, this is no issue. When consuming snapshots, however, this may lead to surprising behaviour because the caller need to ensure the snapshot is available when `GetDevice` is called, or it will return unpredictable result (!!!). The wrong code looks like that: 1. PCI info is created supplying snapshot data with automatic management 1.a ghw unpacks the snapshot on a temporary dir 1.b ghw loads all the info from the snapshot. info.Devices contains correct data 1.c ghw cleans up the snapshot (see: jaypipes#236) 2. the client (test) code calls GetDevice 2.a GetDevice will try to access the unpacked snapshot data, which is gone 2.b GetDevice will fail Signed-off-by: Francesco Romani <fromani@redhat.com>
The PR jaypipes#236 fixes a snapshot directory leakage issue, but also highlight a context lifecycle/ownership issue. In general in ghw each package creates its own context, including the cases on which a package is consumed by another, for example `topology` and `pci`. However, in this case, it would make more sense to reuse the context from the parent package, because the two packages are tightly coupled. Before the introduction of the the transparent snapshot support (jaypipes#202), the above mechanism worked, because each context, each time, was consuming over and over again the same parameters (e.g. environment variables); besides some resource waste, this had no negative effect. When introducing snapshots in the picture, repeatedly unpacking the same snapshot to consume the same data is much more wasteful. So it makes sense to introduce explicitly the concept of dependent context. We add a functio to create a context subordinate to another. The subordinate context will effectively borrow all the resources from the parent one. The parent is in charge to perform any setup/teardown (e.g. for snapshot, to learn which chroot should be used, if at all) and so forth. Making this dependency explicit allow the client code to manage correctly the lifecycle of the topmost context, hence of all the resources. Signed-off-by: Francesco Romani <fromani@redhat.com>
The PR jaypipes#236 fixes a snapshot directory leakage issue, but also highlight a context lifecycle/ownership issue. In general in ghw each package creates its own context, including the cases on which a package is consumed by another, for example `topology` and `pci`. However, in this case, it would make more sense to reuse the context from the parent package, because the two packages are tightly coupled. Before the introduction of the the transparent snapshot support (jaypipes#202), the above mechanism worked, because each context, each time, was consuming over and over again the same parameters (e.g. environment variables); besides some resource waste, this had no negative effect. When introducing snapshots in the picture, repeatedly unpacking the same snapshot to consume the same data is much more wasteful. So it makes sense to introduce explicitly the concept of dependent context. We add a functio to create a context subordinate to another. The subordinate context will effectively borrow all the resources from the parent one. The parent is in charge to perform any setup/teardown (e.g. for snapshot, to learn which chroot should be used, if at all) and so forth. Making this dependency explicit allow the client code to manage correctly the lifecycle of the topmost context, hence of all the resources. Signed-off-by: Francesco Romani <fromani@redhat.com>
Add a simple helper to find a device among the `pci.Info.Devices`. This is needed because `GetDevice` accesses the system every time. In most cases, besides a tiny performance hit, this is no issue. When consuming snapshots, however, this may lead to surprising behaviour because the caller need to ensure the snapshot is available when `GetDevice` is called, or it will return unpredictable result (!!!). The wrong code looks like that: 1. PCI info is created supplying snapshot data with automatic management 1.a ghw unpacks the snapshot on a temporary directory. 1.b ghw loads all the info from the snapshot. info.Devices contains correct data. 1.c ghw cleans up the snapshot (see: jaypipes#236) 2. the client (test) code calls GetDevice 2.a GetDevice will try to access the unpacked snapshot data, which is gone. 2.b GetDevice will fail. Signed-off-by: Francesco Romani <fromani@redhat.com>
a084180
to
12f738a
Compare
I think this is a good fix. Thanks for the updates! |
@@ -111,12 +115,10 @@ func (ctx *Context) Setup() error { | |||
// You should always call `Teardown` if you called `Setup` to free any resources | |||
// acquired by `Setup`. Check `Do` for more automated management. | |||
func (ctx *Context) Teardown() error { | |||
if ctx.SnapshotRoot != "" { | |||
// if the client code provided the unpack directory, | |||
// then it is also in charge of the cleanup. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @marcel-apf, can you please put the above code comment back into the file? It helps to explain why we're doing this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pkg/context/context.go
Outdated
if ctx.SnapshotRoot != "" { | ||
// if the client code provided the unpack directory, | ||
// then it is also in charge of the cleanup. | ||
if len(ctx.snapshotUnpackedPath) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marcel-apf my rule of thumb is to generally use the style of code already present in the codebase :) If you wouldn't mind using the == ""
style, I'd appreciate that.
Fix an error preventing the auto-unpacked snapshot files to be deleted. Add a unit-test to expose the issue. Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
12f738a
to
7dce933
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @marcel-apf! :)
update to the tip of ghe main branch to pull in the latest fixes, most notably to avoid to leak temporary directories when consuming snapshots: jaypipes/ghw#236 jaypipes/ghw#240 Signed-off-by: Francesco Romani <fromani@redhat.com>
update to the tip of ghe main branch to pull in the latest fixes, most notably to avoid to leak temporary directories when consuming snapshots: jaypipes/ghw#236 jaypipes/ghw#240 Signed-off-by: Francesco Romani <fromani@redhat.com>
The PR jaypipes#236 fixes a snapshot directory leakage issue, but also highlight a context lifecycle/ownership issue. In general in ghw each package creates its own context, including the cases on which a package is consumed by another, for example `topology` and `pci`. However, in this case, it would make more sense to reuse the context from the parent package, because the two packages are tightly coupled. Before the introduction of the the transparent snapshot support (jaypipes#202), the above mechanism worked, because each context, each time, was consuming over and over again the same parameters (e.g. environment variables); besides some resource waste, this had no negative effect. When introducing snapshots in the picture, repeatedly unpacking the same snapshot to consume the same data is much more wasteful. So it makes sense to introduce explicitly the concept of dependent context. We add a functio to create a context subordinate to another. The subordinate context will effectively borrow all the resources from the parent one. The parent is in charge to perform any setup/teardown (e.g. for snapshot, to learn which chroot should be used, if at all) and so forth. Making this dependency explicit allow the client code to manage correctly the lifecycle of the topmost context, hence of all the resources. Signed-off-by: Francesco Romani <fromani@redhat.com>
update to the tip of ghe main branch to pull in the latest fixes, most notably to avoid to leak temporary directories when consuming snapshots: jaypipes/ghw#236 jaypipes/ghw#240 Signed-off-by: Francesco Romani <fromani@redhat.com>
The PR jaypipes#236 fixes a snapshot directory leakage issue, but also highlight a context lifecycle/ownership issue. In general in ghw each package creates its own context, including the cases on which a package is consumed by another, for example `topology` and `pci`. However, in this case, it would make more sense to reuse the context from the parent package, because the two packages are tightly coupled. Before the introduction of the the transparent snapshot support (jaypipes#202), the above mechanism worked, because each context, each time, was consuming over and over again the same parameters (e.g. environment variables); besides some resource waste, this had no negative effect. When introducing snapshots in the picture, repeatedly unpacking the same snapshot to consume the same data is much more wasteful. So it makes sense now to introduce explicitly the concept of dependent context. We add a function to create a context subordinate to another. The subordinate context will effectively borrow all the resources from the parent one. The parent is in charge to perform any setup/teardown (e.g. for snapshot, to learn which chroot should be used, if at all) and so forth. Making this dependency explicit allow the client code to manage correctly the lifecycle of the topmost context, hence of all the resources. Signed-off-by: Francesco Romani <fromani@redhat.com>
The PR jaypipes#236 fixes a snapshot directory leakage issue, but also highlight a context lifecycle/ownership issue. In general in ghw each package creates its own context, including the cases on which a package is consumed by another, for example `topology` and `pci`. However, in this case, it would make more sense to reuse the context from the parent package, because the two packages are tightly coupled. Before the introduction of the the transparent snapshot support (jaypipes#202), the above mechanism worked, because each context, each time, was consuming over and over again the same parameters (e.g. environment variables); besides some resource waste, this had no negative effect. When introducing snapshots in the picture, repeatedly unpacking the same snapshot to consume the same data is much more wasteful. So it makes sense now to introduce explicitly the concept of dependent context. We add a function to create a context subordinate to another. The subordinate context will effectively borrow all the resources from the parent one. The parent is in charge to perform any setup/teardown (e.g. for snapshot, to learn which chroot should be used, if at all) and so forth. Making this dependency explicit allow the client code to manage correctly the lifecycle of the topmost context, hence of all the resources. Signed-off-by: Francesco Romani <fromani@redhat.com>
The PR jaypipes#236 fixes a snapshot directory leakage issue, but also highlight a context lifecycle/ownership issue. In general in ghw each package creates its own context, including the cases on which a package is consumed by another, for example `topology` and `pci`. However, in this case, it would make more sense to reuse the context from the parent package, because the two packages are tightly coupled. Before the introduction of the the transparent snapshot support (jaypipes#202), the above mechanism worked, because each context, each time, was consuming over and over again the same parameters (e.g. environment variables); besides some resource waste, this had no negative effect. When introducing snapshots in the picture, repeatedly unpacking the same snapshot to consume the same data is much more wasteful. So it makes sense now to introduce explicitly the concept of dependent context. To move forward and make the ownership more explicit, we add an internal reference count in context. This is used only to track the context setup/teardown process. When contexts are reused, we always have a parent module -which creates and owns the context- which passes its context to a subordinate module, because the parent wants to consume the service offered by the subordinate. The subordinate (possibly further nested, or more than one) should never attempt Setup() and Teardown(). These are responsabilities of the parent, outer module, which owns the context. It's impractical to conditionally call Setup() and Teardown(), so the functions internally use and check the reference count to do the right thing automatically. The following are are idempotent (and safe and quickly return success) - calling multiple times Setup() on a ready context - calling multiple times Teardown() on a un-ready context Signed-off-by: Francesco Romani <fromani@redhat.com>
The PR jaypipes#236 fixes a snapshot directory leakage issue, but also highlight a context lifecycle/ownership issue. In general in ghw each package creates its own context, including the cases on which a package is consumed by another, for example `topology` and `pci`. However, in this case, it would make more sense to reuse the context from the parent package, because the two packages are tightly coupled. Before the introduction of the the transparent snapshot support (jaypipes#202), the above mechanism worked, because each context, each time, was consuming over and over again the same parameters (e.g. environment variables); besides some resource waste, this had no negative effect. When introducing snapshots in the picture, repeatedly unpacking the same snapshot to consume the same data is much more wasteful. So it makes sense now to introduce explicitly the concept of dependent context. To move forward and make the ownership more explicit, we add an internal reference count in context. This is used only to track the context setup/teardown process. When contexts are reused, we always have a parent module -which creates and owns the context- which passes its context to a subordinate module, because the parent wants to consume the service offered by the subordinate. The subordinate (possibly further nested, or more than one) should never attempt Setup() and Teardown(). These are responsabilities of the parent, outer module, which owns the context. It's impractical to conditionally call Setup() and Teardown(), so the functions internally use and check the reference count to do the right thing automatically. The following are are idempotent (and safe and quickly return success) - calling multiple times Setup() on a ready context - calling multiple times Teardown() on a un-ready context Signed-off-by: Francesco Romani <fromani@redhat.com>
update to the tip of ghe main branch to pull in the latest fixes, most notably to avoid to leak temporary directories when consuming snapshots: jaypipes/ghw#236 jaypipes/ghw#240 Signed-off-by: Francesco Romani <fromani@redhat.com>
The PR jaypipes#236 fixes a snapshot directory leakage issue, but also highlight a context lifecycle/ownership issue. In general in ghw each package creates its own context, including the cases on which a package is consumed by another, for example `topology` and `pci`. However, in this case, it would make more sense to reuse the context from the parent package, because the two packages are tightly coupled. Before the introduction of the the transparent snapshot support (jaypipes#202), the above mechanism worked, because each context, each time, was consuming over and over again the same parameters (e.g. environment variables); besides some resource waste, this had no negative effect. When introducing snapshots in the picture, repeatedly unpacking the same snapshot to consume the same data is much more wasteful. So it makes sense now to introduce explicitly the concept of dependent context. To move forward and make the ownership more explicit, we add an internal reference count in context. This is used only to track the context setup/teardown process. When contexts are reused, we always have a parent module -which creates and owns the context- which passes its context to a subordinate module, because the parent wants to consume the service offered by the subordinate. The subordinate (possibly further nested, or more than one) should never attempt Setup() and Teardown(). These are responsabilities of the parent, outer module, which owns the context. It's impractical to conditionally call Setup() and Teardown(), so the functions internally use and check the reference count to do the right thing automatically. The following are are idempotent (and safe and quickly return success) - calling multiple times Setup() on a ready context - calling multiple times Teardown() on a un-ready context Signed-off-by: Francesco Romani <fromani@redhat.com>
The PR jaypipes#236 fixes a snapshot directory leakage issue, but also highlight a context lifecycle/ownership issue. In general in ghw each package creates its own context, including the cases on which a package is consumed by another, for example `topology` and `pci`. However, in this case, it would make more sense to reuse the context from the parent package, because the two packages are tightly coupled. Before the introduction of the the transparent snapshot support (jaypipes#202), the above mechanism worked, because each context, each time, was consuming over and over again the same parameters (e.g. environment variables); besides some resource waste, this had no negative effect. When introducing snapshots in the picture, repeatedly unpacking the same snapshot to consume the same data is much more wasteful. So it makes sense now to introduce explicitly the concept of dependent context. To move forward and make the ownership more explicit, we add an internal reference count in context. This is used only to track the context setup/teardown process. When contexts are reused, we always have a parent module -which creates and owns the context- which passes its context to a subordinate module, because the parent wants to consume the service offered by the subordinate. The subordinate (possibly further nested, or more than one) should never attempt Setup() and Teardown(). These are responsabilities of the parent, outer module, which owns the context. It's impractical to conditionally call Setup() and Teardown(), so the functions internally use and check the reference count to do the right thing automatically. The following are are idempotent (and safe and quickly return success) - calling multiple times Setup() on a ready context - calling multiple times Teardown() on a un-ready context Signed-off-by: Francesco Romani <fromani@redhat.com>
The PR jaypipes#236 fixes a snapshot directory leakage issue, but also highlight a context lifecycle/ownership issue. In general in ghw each package creates its own context, including the cases on which a package is consumed by another, for example `topology` and `pci`. However, in this case, it would make more sense to reuse the context from the parent package, because the two packages are tightly coupled. Before the introduction of the the transparent snapshot support (jaypipes#202), the above mechanism worked, because each context, each time, was consuming over and over again the same parameters (e.g. environment variables); besides some resource waste, this had no negative effect. When introducing snapshots in the picture, repeatedly unpacking the same snapshot to consume the same data is much more wasteful. So it makes sense now to introduce explicitly the concept of dependent context. To move forward and make the ownership more explicit, we make the context management more explict. We refactor the Do method of context.Context as a top level function so we can pass it through in all the NewWithContext functions. With these changes: - the top-level packages, which owns the context, will keep working as usual with just a bit of extra verbosity - it will need to pass around `context.Do` which as implicit before - the auxiliary packages, which do NOT own the context and thus should not act upon it, will be created by the top-level packages using `NewWithContext` with the new `context.Bypass` helper, which is a NOP from the context setup/teardown perspective. Signed-off-by: Francesco Romani <fromani@redhat.com>
The PR jaypipes#236 fixes a snapshot directory leakage issue, but also highlight a context lifecycle/ownership issue. In general in ghw each package creates its own context, including the cases on which a package is consumed by another, for example `topology` and `pci`. However, in this case, it would make more sense to reuse the context from the parent package, because the two packages are tightly coupled. Before the introduction of the the transparent snapshot support (jaypipes#202), the above mechanism worked, because each context, each time, was consuming over and over again the same parameters (e.g. environment variables); besides some resource waste, this had no negative effect. When introducing snapshots in the picture, repeatedly unpacking the same snapshot to consume the same data is much more wasteful. So it makes sense now to introduce explicitly the concept of dependent context. To move forward and make the ownership more explicit, we add a new function `NewWithContext` in all the subpackages. The function `NewWithContext` will assume the context it gets as argument is ready to be consumed and it will use as-is, attempting no setup or teardown in any circumstances. The usage rules are pretty simple: 1. In your client code, you most likely just need to use `New` as usual. 2. When consuming a package from another (e.g. `toppology` from `pci`), always use the `NewWithContext` function to get a handle for the auxiliary package, passin through the context from the top-level package. 3. Within a `NewWithContext` function, ally calls to other `NewWithContext` functions should be made to preserve the properties. Signed-off-by: Francesco Romani <fromani@redhat.com>
The PR jaypipes#236 fixes a snapshot directory leakage issue, but also highlight a context lifecycle/ownership issue. In general in ghw each package creates its own context, including the cases on which a package is consumed by another, for example `topology` and `pci`. However, in this case, it would make more sense to reuse the context from the parent package, because the two packages are tightly coupled. Before the introduction of the the transparent snapshot support (jaypipes#202), the above mechanism worked, because each context, each time, was consuming over and over again the same parameters (e.g. environment variables); besides some resource waste, this had no negative effect. When introducing snapshots in the picture, repeatedly unpacking the same snapshot to consume the same data is much more wasteful. So it makes sense now to introduce explicitly the concept of dependent context. To move forward and make the ownership more explicit, we add a new function `NewWithContext` in all the subpackages. The function `NewWithContext` will assume the context it gets as argument is ready to be consumed and it will use as-is, attempting no setup or teardown in any circumstances. The usage rules are pretty simple: 1. In your client code, you most likely just need to use `New` as usual. 2. When consuming a package from another (e.g. `toppology` from `pci`), always use the `NewWithContext` function to get a handle for the auxiliary package, passin through the context from the top-level package. 3. Within a `NewWithContext` function, ally calls to other `NewWithContext` functions should be made to preserve the properties. Signed-off-by: Francesco Romani <fromani@redhat.com>
Fix an error preventing the auto-unpacked snapshot files to be deleted.
Add a unit-test to expose the issue.
Signed-off-by: Marcel Apfelbaum marcel@redhat.com