Skip to content
This repository has been archived by the owner on Jul 21, 2021. It is now read-only.

Commandeer#188 Update DNS Lookup Behavior #191

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Yu-Xie
Copy link

@Yu-Xie Yu-Xie commented Mar 23, 2018

This PR commandeer #188 and finishes the remaining work.
When some of servers fail net.lookup, it used to error out.
Instead we should consider it as success as long as any one server is alive, and start a separate go-routine to keep trying the lookup and if successful add server to the list.

@Yu-Xie Yu-Xie changed the title Update DNS lookup behavior Commandeer#188 Update DNS Lookup Behavior Mar 23, 2018
@coveralls
Copy link

coveralls commented Mar 23, 2018

Coverage Status

Coverage increased (+0.3%) to 80.346% when pulling fb52b7d on Yu-Xie:dns into c4fab1a on samuel:master.

}
hp.mu.Unlock()

time.Sleep(time.Millisecond * 5)
Copy link
Contributor

@jdef jdef Mar 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[EDIT] sleeps like this can lead to test flakes in busy CI envs. consider implementing a retry loop that fails after a longer period of non-success (like 30s)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for review @jdef . I've updated the commit.

// Starts a 30s retry loop to wait servers list to be updated
startRetryLoop := time.Now()
for {
hp.mu.Lock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add back the sleep as the first step of this for loop, to avoid pegging the CPU. sleeps are useful in tests like these, they just need to be wrapped in a loop like this to avoid CI flakes :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Fixed.

host, port, err := net.SplitHostPort(server)
if err != nil {
return err
return false, err
}
addrs, err := lookupHost(host)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one more suggestion: lookupHost may take a while. you might not want to hold the lock while it's executing. consider releasing it, and re-locking once it returns (if there's more work to do)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess you could just hold off on obtaining the lock until you start manipulating shared state (starting with hp.servers = append ...)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a very good suggestion @jdef . However this optimization makes unresolvedServers, sleep, and lookupHost not thread safe. It is actually acceptable, since they are only used during lookup.

I updated the commit based on your suggestion, and also the mutex now only protects servers, curr, and last.

Also updated the test to use channel instead to simulate a server that gets back online later. Previously I changed lookupHost inflight which would be a race in the new implementation.

@Yu-Xie
Copy link
Author

Yu-Xie commented Mar 28, 2018

Fixed all the comments. @jdef

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants