Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache TTLs for individual IP addresses in DNS responses. #6732

Merged
merged 14 commits into from
Nov 5, 2024

Conversation

hkiiita
Copy link
Member

@hkiiita hkiiita commented Oct 11, 2024

This is a draft PR for work towards #6722 .

pkg/agent/controller/networkpolicy/fqdn.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/fqdn.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/fqdn.go Outdated Show resolved Hide resolved
test/e2e/fqdn_dns_cache_test.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/fqdn.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/fqdn.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/fqdn.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/fqdn.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/fqdn.go Outdated Show resolved Hide resolved
Comment on lines 492 to 496
ipMetaDataHolder[ipStr] = ipWithTTL{
ip: ipMeta.ip,
expirationTime: ipMeta.expirationTime,
}
minTimeToReQuery = minTime(ipMeta.expirationTime, minTimeToReQuery)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since thees lines appear four times at least, might make sense to make it a helper.

Copy link
Member Author

@hkiiita hkiiita Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Dyanngg Yes, that was a nice improvement.

@@ -76,11 +76,16 @@ func (fs *fqdnSelectorItem) matches(fqdn string) bool {
// expirationTime of the records, which is the DNS response
// receiving time plus lowest applicable TTL.
type dnsMeta struct {
expirationTime time.Time
//expirationTime time.Time
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove if no longer needed

pkg/agent/controller/networkpolicy/fqdn.go Outdated Show resolved Hide resolved
@@ -253,8 +258,8 @@ func (f *fqdnController) getIPsForFQDNSelectors(fqdns []string) []net.IP {
}
for fqdn := range fqdnsMatched {
if dnsMeta, ok := f.dnsEntryCache[fqdn]; ok {
for _, ip := range dnsMeta.responseIPs {
matchedIPs = append(matchedIPs, ip)
for _, ipWithMetaData := range dnsMeta.responseIPs {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/ipWithMetaData/ipData


currentTime := time.Now()
ipWithTTLMap := make(map[string]ipWithTTL)
minTimeToReQuery := time.Unix(1<<63-62135596801, 999999999)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what time is this supposed to be? At the very least there should be a comment to explain. But based on usage, I am not sure why we are not just using the zero value for time.Time:

var minTimeToReQuery time.Time

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antoninbas Actually ,

minTimeToReQuery := time.Unix(1<<63-62135596801, 999999999)

i had chosen above expression to establish a reference point for tracking the minimum time to re-query DNS requests as IPs expire. Since our goal in this PR for this bug/issue is to send requests as and when an IP expires, initializing minTimeToReQuery to a maximum time value allows us to compare and update it as we process various time values , ultimately allowing us to determine the earliest re-query time.

Copy link
Contributor

@Dyanngg Dyanngg Oct 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you could do though is to make earlierOf take time.Time pointers as parameters, and return the non-nil value if one of t1/t2 is nil. That way you would not need to initialize minTimeToReQuery. Note that in that case the return value should also be a pointer IMO (since earlierOf(nil, nil) is nil), and when it's used eventually you should check that minTimeToReQuery is not nil

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Dyanngg , yes i think using that approach we can avoid initialising minTimeToReQuery .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with the pointer solution, but it seems to me that we could just pick a reasonable time such as now + 24*time.Hour and use that (with an appropriate comment of course). The time is only used if len(ipWithTTLMap) > 0 anyway. And it seems reasonable to resend a query after 24h even if we don't really need to? At least it won't impact correctness.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, i think this was the point i was missing . Maybe i had set it to too large a value under assumption that i needed a very large value. The 24 hour time frame seems more reasonable as a replacement to existing large value that i had put.

minTimeToReQuery := time.Unix(1<<63-62135596801, 999999999)
addressUpdate := false

ipMapFiller := func(ip string, ipMeta ipWithTTL, minTime *time.Time) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name of this function (ipMapFiller) is really hurting the readability of the code. I would suggest something like addIPtoMap / addIPToCache / addIPToNewCache / setIPInMap / ...

Additionally, minTime does not need to be a parameter to this function: you are always using &minTimeToReQuery as the argument value, and minTimeToReQuery is captured by the closure (by reference).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes , my bad. The closure captures the variable minTimeToReQuery and should not be passed as an argument.

t.Errorf("Expected TTL for %s to be %v, got %v", ipStr, expectedTTL, ipMeta.expirationTime)
}
}
println("\n\n")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use a print statement in a test like this

Comment on lines 604 to 605
// sample IP with time simulating expired time, i thought that using negative time will
// simulate expired time as it will always equate to before when compared to any time.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

be more concise in your comments:

// expired IP that should be removed from the cache when onDNSResponse is called

// we get new IP
"192.0.2.3": {ip: net.ParseIP("192.0.2.3"), expirationTime: currentTime.Add(10 * time.Second)},
// and an exisiting IP
"192.0.2.1": {ip: net.ParseIP("192.0.2.1"), expirationTime: currentTime.Add(10 * time.Second)},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't the test more useful if the new TTL for this IP is different from the current one? If the new TTL in the response is greater than the existing one, the new one should be sued. If it is less than the existing one, the current one should remain in use.

},
},
{
name: "old IP expired",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a strict subset of the previous test, and doesn't need to be included

there are 2 possible approaches here:

  1. have a single unit test case with multiple different IPs, ensuring that all the branches / cases of the onDNSResponse are tested
  2. have multiple test cases, each testing a different branch / scenario

Usually, we prefer the second approach because it makes test failures easier to identify / troubleshoot.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these changes are unrelated to the rest of the PR, and should be removed

f.setFQDNMatchSelector(fqdn, selectorItem)
for ipStr, ipMeta := range newDNSresponseIPs {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not make sense to be in part of the for loop... We don't want to call addIPToCache for as many times as the number of selectorItems matching the fqdn

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Dyanngg yes updated it.

if mustCacheResponse {

if len(ipWithTTLMap) > 0 {
addressUpdate = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct. Isn't addressUpdate supposed to be determined by whether FQDN <-> IP mappings have changed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Dyanngg updated .

@@ -405,80 +409,102 @@ func (f *fqdnController) deleteRuleSelectedPods(ruleID string) error {

func (f *fqdnController) onDNSResponse(
fqdn string,
responseIPs map[string]net.IP,
lowestTTL uint32,
newDNSresponseIPs map[string]ipWithTTL,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the camel format, it should be newDNSResponseIPs, however, I feel DNS and response is kind of duplicate with the function name, and given you name the struct ipWithTTL, I would just name the parameter newIPWithTTLs, or newIPWithTTLMap to correspond to the variable at L424.

Comment on lines 426 to 427
//minTimeToReQuery Establishes a maximum reference time for tracking the minimum re-query time to DNS, as IPs expire.
minTimeToReQuery := time.Now().Add(24 * time.Hour)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//minTimeToReQuery Establishes a maximum reference time for tracking the minimum re-query time to DNS, as IPs expire.
minTimeToReQuery := time.Now().Add(24 * time.Hour)
// minTimeToRequery establishes a maximum reference time for tracking the minimum requery time to DNS, as IPs expire.
minTimeToRequery := time.Now().Add(24 * time.Hour)

// This IP entry has already expired and not seen in the latest DNS response.
// It should be removed from the cache.

for cachedIpStr, cachedIpMeta := range cachedDNSMeta.responseIPs {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for cachedIpStr, cachedIpMeta := range cachedDNSMeta.responseIPs {
for cachedIPStr, cachedIPMeta := range cachedDNSMeta.responseIPs {

// It should be removed from the cache.

for cachedIpStr, cachedIpMeta := range cachedDNSMeta.responseIPs {
if _, exist := newDNSresponseIPs[cachedIpStr]; !exist {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if _, exist := newDNSresponseIPs[cachedIpStr]; !exist {
if newIPMeta, exist := newDNSresponseIPs[cachedIpStr]; !exist {

then use it at L461, instead of querying the map another time.

if mustCacheResponse {

if len(ipWithTTLMap) > 0 {
addressUpdate = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when no IP is changed, I think the map is not empty, why this is set to true?

responseIPs map[string]ipWithTTL
}

type ipWithTTL struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be ipWithExpiration (we store the expiration time, not the TTL)

@@ -405,80 +408,115 @@ func (f *fqdnController) deleteRuleSelectedPods(ruleID string) error {

func (f *fqdnController) onDNSResponse(
fqdn string,
responseIPs map[string]net.IP,
lowestTTL uint32,
newIPWithTTLMap map[string]ipWithTTL,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment: it should be newIPWithExpirationMap.

currentTime := time.Now()
ipWithTTLMap := make(map[string]ipWithTTL)

//minTimeToRequery establishes a maximum reference time for tracking the minimum re-query time to DNS, as IPs expire.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: leave whitespace after //

Comment on lines 444 to 446
if !addressUpdate {
addressUpdate = true
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typically, you just unconditionally write addressUpdate = true in this case. I'm pretty sure there is no advantage in checking the value first. If anything, the extra branching probably makes it worse.

Comment on lines 484 to 486
if !addToCache {
addToCache = true
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

pkg/agent/controller/networkpolicy/fqdn.go Outdated Show resolved Hide resolved
Comment on lines 680 to 691
if !exists {
t.Errorf("Expected %s to be found in dns cache.", ipStr)
continue
}

if cachedIPMeta.expirationTime.Before(time.Now()) {
t.Errorf("Expected %s to be found with a valid TTL,but it has expired.", ipStr)
}

if !cachedIPMeta.expirationTime.Equal(expectedTTL) {
t.Errorf("Expected TTL for %s to be %v, got %v", ipStr, expectedTTL, cachedIPMeta.expirationTime)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use testify assertions for consistency, not t.Error

Comment on lines 143 to 146
randomPatchAnnotationKey = "test.antrea.io/random-value"
annotationValueLen = 8
iperfPort = 5201
iperfSvcPort = 9999
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I pointed this out before: please remove these unrelated changes from the PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reminder

//minTimeToRequery establishes a maximum reference time for tracking the minimum re-query time to DNS, as IPs expire.
minTimeToRequery := currentTime.Add(24 * time.Hour)

addIPtoIPWithTTLMap := func(ip string, ipMeta ipWithTTL) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some nits: addIPtoIPWithTTLMap -> updateMapWithIP, also minTimeToRequery -> timeToRequery as the former sounds like a duration rather than a timestamp

} else {
// First time seeing this domain.
// check if this needs to be tracked, by checking its presence in the datapath rules.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// check if this needs to be tracked, by checking its presence in the datapath rules.
// check if this needs to be tracked, by checking if it matches any Antrea FQDN policy selectors.

Comment on lines 474 to 475
// If a FQDN policy had been applied then there must be rule records but because it's
// not in cache hence its FQDN:SelectorItem mapping may not be present.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand this comment, but I'm pretty sure that it can be covered by the previous one

Comment on lines 426 to 434
// timeToRequery establishes a maximum reference time for tracking the minimum re-query time to DNS, as IPs expire.
timeToRequery := currentTime.Add(24 * time.Hour)

updateMapWithIP := func(ip string, ipMeta ipWithExpiration) {
ipWithExpirationMap[ip] = ipMeta
if ipMeta.expirationTime.Before(timeToRequery) {
timeToRequery = ipMeta.expirationTime
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// timeToRequery establishes a maximum reference time for tracking the minimum re-query time to DNS, as IPs expire.
timeToRequery := currentTime.Add(24 * time.Hour)
updateMapWithIP := func(ip string, ipMeta ipWithExpiration) {
ipWithExpirationMap[ip] = ipMeta
if ipMeta.expirationTime.Before(timeToRequery) {
timeToRequery = ipMeta.expirationTime
}
}
// timeToRequery establishes a maximum reference time for tracking the minimum re-query time to DNS, as IPs expire.
var timeToRequery *time.Time
updateMapWithIP := func(ip string, ipMeta ipWithExpiration) {
ipWithExpirationMap[ip] = ipMeta
if timeToRequery == nil || ipMeta.expirationTime.Before(*timeToRequery) {
timeToRequery = &ipMeta.expirationTime
}
}

To avoid introducing the vague duration 24 * time.Hour

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way is to move least expiration calculation to a function of dnsMeta, then there is no need to have this function, and it can save a copy in "First time seeing this domain" case

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome. I also had earlier thought on similar lines but i thought of it using as :

var timeToRequery time.Time

but as you know it meant timeToRequery gets set to its zero value and would not have worked because its zero value would have always been lesser, making me to set an arbitrary value to it.

// timeToRequery establishes a maximum reference time for tracking the minimum re-query time to DNS, as IPs expire.
timeToRequery := currentTime.Add(24 * time.Hour)

updateMapWithIP := func(ip string, ipMeta ipWithExpiration) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
updateMapWithIP := func(ip string, ipMeta ipWithExpiration) {
updateIPWithExpiration := func(ip string, ipMeta ipWithExpiration) {

// check for presence of already cached IPs in the new response.
for cachedIPStr, cachedIPMeta := range cachedDNSMeta.responseIPs {
if newIPMeta, exist := newIPWithExpiration[cachedIPStr]; !exist {
// An already cached IP has been found in new response.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says the opposite of the code.

@@ -38,6 +39,12 @@ func newMockFQDNController(t *testing.T, controller *gomock.Controller, dnsServe
if dnsServer != nil {
dnsServerAddr = *dnsServer
}
mockDnsQueryQueue := workqueue.NewTypedRateLimitingQueueWithConfig(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mockDnsQueryQueue := workqueue.NewTypedRateLimitingQueueWithConfig(
mockDNSQueryQueue := workqueue.NewTypedRateLimitingQueueWithConfig(

Comment on lines 143 to 146
randomPatchAnnotationKey = "test.antrea.io/random-value"
annotationValueLen = 8
iperfPort = 5201
iperfSvcPort = 9999
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reminder

@tnqn
Copy link
Member

tnqn commented Oct 29, 2024

Could you remove Draft PR for #6722 -- from the PR title? The issue link can be found in the PR description.

Also, it's perhaps time to squash the commits and write a good commit message.

{
name: "new IP added",
existingDNSCache: map[string]dnsMeta{
"fqdn-test-pod.lfx.test": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can use testFQDN.

Comment on lines 704 to 746
assert.Equal(t, len(cachedDnsMetaData.responseIPs), len(tc.expectedIPs), "Expected %d IPs in cache, got %d", len(tc.expectedIPs), len(cachedDnsMetaData.responseIPs))

for ipStr, expectedTTL := range tc.expectedIPs {
cachedIPMeta, exists := cachedDnsMetaData.responseIPs[ipStr]

assert.True(t, exists, "Expected %s to be found in dns cache.", ipStr)
assert.False(t, cachedIPMeta.expirationTime.Before(currentTime), "Expected %s to have a valid TTL, but it has expired.", ipStr)
assert.Equal(t, expectedTTL, cachedIPMeta.expirationTime, "Expected TTL for %s to be %v, got %v", ipStr, expectedTTL, cachedIPMeta.expirationTime)

}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could just change tc.dnsResponseIPs to map[string]ipWithExpiration, then one line code can validate the result and cover all cases.

assert.Equal(t, tc.expectedIPs, cachedDNSMetaData.responseIPs)

Comment on lines 43 to 46
workqueue.NewTypedItemExponentialFailureRateLimiter[string](1*time.Millisecond, 100*time.Millisecond),
workqueue.TypedRateLimitingQueueConfig[string]{
Name: "fqdn",
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should use a fake Clock to speed up enqueue, see

fakeController := newFakeController(t, fakeClock)


}

if tc.expectedItem == testFQDN {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if tc.expectedItem == testFQDN {
if tc.expectedItem != "" {

@hkiiita hkiiita changed the title Draft PR for #6722 -- cache TTLs for individual IP addresses in DNS responses. cache TTLs for individual IP addresses in DNS responses. Oct 29, 2024
@antoninbas antoninbas marked this pull request as ready for review October 29, 2024 18:44
…onses.

- Implement tracking of individual IP TTLs for FQDNs in the DNS cache.
- Upon expiration of an IP's TTL, trigger a new DNS query.
- Evict only those IPs that are no longer present in the latest DNS response and have exceeded their original TTL.

Signed-off-by: Hemant <hkbiet@gmail.com>
}
f.dnsQueryQueue.AddAfter(fqdn, recordTTL.Sub(time.Now()))
f.dnsQueryQueue.AddAfter(fqdn, timeToReQuery.Sub(currentTime))
Copy link
Contributor

@Dyanngg Dyanngg Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment explaining why timeToReQuery cannot be nil here. Theoretically I think it cannot, but
it is not obvious at all

Comment on lines 598 to 599
fakeClock := clocktesting.NewFakeClock(time.Now())
currentTime := fakeClock.Now()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

each test case should have its own clock, otherwise with your current code, one test advancing the clock will impact subsequent test cases

you can do something like this:

currentTime := time.Now()
// use currentTime to define your test cases, as you are doing now
// declare fakeClock := clocktesting.NewFakeClock(currentTime) at the beginning of each test case

currentTime := time.Now()
ipWithExpirationMap := make(map[string]ipWithExpiration)

// timeToReQuery establishes a maximum reference time for tracking the minimum re-query time to DNS, as IPs expire.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// timeToReQuery establishes a maximum reference time for tracking the minimum re-query time to DNS, as IPs expire.
// timeToRequery establishes the time after which a new DNS query should be sent for the FQDN.
// timeToRequery will be set to the minimum expiration time of all IPs added or updated in the cache.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggesting this as I think the current comment is not very helpful or accurate

ipWithExpirationMap := make(map[string]ipWithExpiration)

// timeToReQuery establishes a maximum reference time for tracking the minimum re-query time to DNS, as IPs expire.
var timeToReQuery *time.Time
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we rename timeToReQuery to timeToRequery?

Comment on lines 724 to 733
f, _ := newMockFQDNController(t, controller, nil)
mockDNSQueryQueue := workqueue.NewTypedRateLimitingQueueWithConfig(
workqueue.NewTypedItemExponentialFailureRateLimiter[string](minRetryDelay, maxRetryDelay),
workqueue.TypedRateLimitingQueueConfig[string]{
Name: "fqdn-test",
Clock: fakeClock,
},
)
f.dnsQueryQueue = mockDNSQueryQueue
f.dnsEntryCache = tc.existingDNSCache
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while this may work, a better approach is to have a clock.Clock as a field in fqdnController, and make sure the entire implementation of fqdnController uses that clock (e.g., calls clock.Now() instead of time.Now()). The default implementation will use clock.RealClock{}, but newMockFQDNController should support injecting a custom clock (fake clock).
with this approach, the test function won't need to build a custom workqueue

}

if tc.expectedItem != "" {
fakeClock.Step(5 * time.Second)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this 5 * time.Second values comes out of nowhere. I know it is based on the inputs to the test case (existingDNSCache, etc.) but it should itself be a part of the test case definition, otherwise it looks like some magical value

what I would do is remove tc.expectedItem and add tc.expectedRequeryAfter *time.Duration as a test case input. This way you only have one input for this. You already know that the expected item is testFQDN, so this doesn't need to be an input.

I'm also fixing a couple of issues in the snippet below:

if tc.expectedRequeryAfter != nil {
        fakeClock.Step(*tcexpectedRequeryAfter)
        // needed to avoid blocking on Get() in case of failure
        require.Eventually(t, func() bool { f.dnqQueryQueue.Len() > 0 }, 1*time.Second, 10*time.Millisecond)
        item, _ := f.dnsQueryQueue.Get()
        f.dnsQueryQueue.Done(item)
        assert.Equal(t, testFqdn, item)
} else {
        // make sure that there is no requery
        assert.Never(t, func() bool { f.dnqQueryQueue.Len() > 0 }, 100*time.Millisecond, 10*time.Millisecond)
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for feedback and proper outline on this . I got some logical insights from this, for instance, I was directly calling Get() on the queue after advancing the time, without checking if the queue really contained any items or not because i "assumed" that it really must contain the item. This could have lead to blocking indefinitely until an item was actually available, and using require.Eventually helps ensure that the queue actually has items before attempting to retrieve them.

 - Replace time.Now() usages with the clock set in fqdnController.
 - Inject fakeClock for tests.
Removed mockDnsQueryQueue implementation.

Signed-off-by: Hemant <hkbiet@gmail.com>
@Dyanngg Dyanngg changed the title cache TTLs for individual IP addresses in DNS responses. Cache TTLs for individual IP addresses in DNS responses. Oct 29, 2024
@@ -17,7 +17,7 @@ package networkpolicy
import (
"context"
"fmt"
"math"
"k8s.io/utils/clock"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be in the second import group

@@ -152,6 +156,12 @@ type fqdnController struct {
ipv4Enabled bool
ipv6Enabled bool
gwPort uint32
// clock allows for time-dependent operations within the fqdnController.
// By using a clock.Clock as a field, we ensure that all time-related calls throughout the
// implementation of the fqdnController utilize this clock, for example , calls to clock.Now() instead of time.Now()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// implementation of the fqdnController utilize this clock, for example , calls to clock.Now() instead of time.Now()
// implementation of the fqdnController utilize this clock

Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"This patch enables the caching of TTLs for individual IPs in DNS responses." is not a good commit message title. Please follow the "git commit" link in https://github.com/antrea-io/antrea/blob/main/CONTRIBUTING.md#getting-reviewers to write a proper one.

{
name: "existing DNS cache not impacted",
existingDNSCache: map[string]dnsMeta{
"fqdn-test-pod.lfx.test": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"fqdn-test-pod.lfx.test": {
testFQDN: {

Same comment applies to other cases

expectedIPs: map[string]ipWithExpiration{
"192.1.1.3": {ip: net.ParseIP("192.1.1.3"), expirationTime: currentTime.Add(5 * time.Second)},
},
expectedRequeryAfter: getDuration(5 * time.Second),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
expectedRequeryAfter: getDuration(5 * time.Second),
expectedRequeryAfter: ptr.To(5 * time.Second),

and no need to create getDuration

},
},
{
name: "old IP resent in DNS response is retained with an updated TTL fetched from response",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could add another IP to existingDNSCache and dnsResponseIPs, the former has shorter expirationTime than the latter, to validate it's always the later expiration time is used.

pkg/agent/controller/networkpolicy/fqdn_test.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/fqdn_test.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/fqdn_test.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/fqdn_test.go Outdated Show resolved Hide resolved
…upport

- Implement tracking of TTLs for individual IPs linked to FQDNs in the DNS cache.
- Automatically trigger a new DNS query when an IP's TTL expires.
- Evict only those IPs that are no longer present in the latest DNS response and have surpassed their original TTL.
- Added clock field of type clock.Clock in fqdnController.
- Pass clock while initializing fqdn controller.

Signed-off-by: Hemant <hkbiet@gmail.com>
Signed-off-by: Hemant <hkbiet@gmail.com>
Signed-off-by: Hemant <hkbiet@gmail.com>
pkg/agent/controller/networkpolicy/fqdn.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/fqdn.go Outdated Show resolved Hide resolved
Comment on lines 505 to 507
// timeToRequery passed below is guaranteed to be non-nil here because updateIPWithExpiration() is called on each IP in
// newIPWithExpiration that either isn't cached in the existing DNS cache or it remains unexpired. This ensures that timeToReQuery is
// always set to the earliest expiration time in the response, hence also enabling proper DNS re-query scheduling.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try to wrap lines to 100 chars at the max

pkg/agent/controller/networkpolicy/fqdn.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/fqdn_test.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/fqdn.go Outdated Show resolved Hide resolved
…tuality test to 1 second

Signed-off-by: Hemant <hkbiet@gmail.com>
pkg/agent/controller/networkpolicy/fqdn.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/fqdn.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/fqdn.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/fqdn.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/fqdn.go Show resolved Hide resolved
pkg/agent/controller/networkpolicy/fqdn_test.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/fqdn_test.go Outdated Show resolved Hide resolved
…ueue.

Updated onDNSResponse() to requeue DNS queries even when same IPs are receieved in a response.
Added and updated comments as per review.

Signed-off-by: Hemant <hkbiet@gmail.com>
Signed-off-by: Hemant <hkbiet@gmail.com>
Updated comment clarifying the case when ipWithExpirationMap is of zero length in onDNSResponse.
Updated test case which ensures the requery time to be the later time amongst the TTL already cached and TTL received in DNS response for cases involving same IPs, as present in DNS cache, being received.

Signed-off-by: Hemant <hkbiet@gmail.com>
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only one small comment on my side, otherwise LGTM

pkg/agent/controller/networkpolicy/fqdn_test.go Outdated Show resolved Hide resolved
regplaced dnsQueryQueue.AddAfter logic as before.

Signed-off-by: Hemant <hkbiet@gmail.com>
@luolanzone luolanzone added this to the Antrea v2.2 release milestone Nov 4, 2024
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except a format issue

pkg/agent/controller/networkpolicy/fqdn.go Outdated Show resolved Hide resolved
…ts from external imports

Signed-off-by: Hemant <hkbiet@gmail.com>
antoninbas
antoninbas previously approved these changes Nov 4, 2024
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

…NSResponse test.

- Seemingly a race condition between the event of creation of Timer by DelayeingQueue and Step function of fakeClock is failing the test.
- The fix Uses the idea in antrea-io#6407
- The fix uses the ,already implemented wrapper, around fakeClock with a method that counts the number of times DelayedQueue's Timer has been called.

Signed-off-by: Hemant <hkbiet@gmail.com>
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

pkg/agent/controller/networkpolicy/fqdn_test.go Outdated Show resolved Hide resolved
Signed-off-by: Hemant <hkbiet@gmail.com>
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@antoninbas
Copy link
Contributor

/test-all

@antoninbas antoninbas merged commit 232d582 into antrea-io:main Nov 5, 2024
56 of 57 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants