-
-
Notifications
You must be signed in to change notification settings - Fork 30.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
urllib.request resolves the host before checking it against the system's proxy bypass list [Security: LOW, minor info leak] #115197
Comments
I have created a fix for this problem internally. If it's okay, I can submit that as a pull request. |
Yes, please add a pull request. If it's a relatively simple fix, then this will be less controversial. For what it's worth, we don't usually consider web browsers to be the canonical behaviour for our APIs. If another low-level library similar to urllib is doing it, that is helpful guidance (e.g. urllib3). |
@zooba Yes, the solution is simply not to resolve the host before checking it against the system proxy bypass list. I don't recall any HTTP tools, for example, curl, has this behavior. I did some research and found that the |
Some other datapoints.
Gitlab blog about no_proxy: https://about.gitlab.com/blog/2021/01/27/we-need-to-talk-no-proxy/, the links above are based on that blog post (but point to main branch except for wget which I didn't change). The blog points out some other inconstencies between the implementations as well. Given the references above most major tools seem to not resolve names, which t.b.h. is not something I had expected. |
@ronaldoussoren Thank you for the investigation. This aligns more or less with what I thought. Even Python itself does not resolve the hostname when handling the Lines 2516 to 2548 in 31633f4
The affected code here is slightly different, as Python handles the proxy bypass list sourced from the operating system. To my knowledge, |
The Names are not what we get from a resolution process, so that would argue that at least on Windows, we should check the host name without resolving it. Whether we can safely make that change in terms of backwards compatibility is a separate question. My instinct is that we should check both for now and plan to stop checking addresses after a deprecation period (and probably add some way for users to stop checking them now). But I'm not sure if that's best. @gpshead, I expect you know more about this area (proxies) than I do - any thoughts? |
@zooba Thank you for the inspiration on how to find the documentation. I discovered that the corresponding documentation for macOS Does the "name" here imply that macOS shouldn't resolve the name as well? |
I agree with the majority of tools: A proxy bypass lists should always be a simple string match without name resolution. In proxy environments the point of the proxy is that it is entirely up to the proxy to do all resolving and remote connecting (or MITMing, denying, rejecting, etc). So we should not be doing any name resolution (or reverse name lookups of an address) in order to determine if a proxy should be used. @sethmlarson may also have an opinion. |
TLDR: Indeed, on macOS we shouldn't resolve the name to match system behaviour. I did some testing with Apple's system libraries and those don't resolve names when determining if a proxy should be used. That is, given the following script (requires PyObjC to be installed): import CFNetwork
import Cocoa
import SystemConfiguration
import socket
def main():
hn = "www.bol.com"
print(f"{hn} -> {socket.gethostbyname(hn)}")
url = Cocoa.CFURLCreateWithString(None, f"http://{hn}/", None)
print(f"{url=}")
settings = SystemConfiguration.SCDynamicStoreCopyProxies(None)
print(f"{settings=}")
proxies = CFNetwork.CFNetworkCopyProxiesForURL(url, settings)
print(f"{proxies=}")
if __name__ == "__main__":
main() I get the following output with a proxy configured an 34.35.121.47 added to the no-proxy list (the other two are part of the default configuration):
This shows that the "http://www.bol.com" will be proxied, even though its IP address is on the proxy list. That means that we shouldn't resolve names to match system behaviour. Not that any of this is clearly documented (sigh...). |
@zooba I can update the code to print a deprecation warning when the hostname does not match directly, but the resolved result does, ensuring backward compatibility for now. However, based on the current evidence, I personally believe this should be categorized as a bug. How would you advise handling this issue? |
The problem is that if the hostname doesn't match anything in the list, then we should be passing it to the proxy and not resolving it. So if we keep trying to resolve it, we could be creating a security issue (an information leak, to be specific). Probably the best thing to do is to just fix it and add a proper item in Arguably, in light of the potential for being part of an exfiltration chain, we should backport this to security releases as well. My "working for a big enterprise" bias says we should, because Python ought to integrate into existing infrastructure properly. But I may be overlooking impact on other users that would argue against doing that. Count me +0 for backporting with no deprecation warning. (We can't warn on IP addresses in the bypass list, or being given an IP address, so the only option I see is to resolve when we shouldn't for the sake of warning accurately, which I don't like...) |
I agree with backporting this one, at least through bugfix releases. I'd leave it to RMs to decide for older ones. It hasn't been raised until now and client side behavior like this is not where real security gets implemented, but it is a possible information leak vs what most proxy configurations expects to happen. I don't know how often proxy bypass lists actually get used in practice. (I'm used to the idea of all or nothing - but I haven't been within any application level proxy environments in decades) |
This problem affects almost all users who use the system proxy on macOS, as macOS automatically populates the proxy exception list with BTW, is there anything I need to update in the pull request for backporting? I am using the |
Nope, don't worry about that, we'll do it when backporting as far as we decide (which wouldn't be further than 3.8 if we even go that far). |
[History] for posterity and to save others spelunking version control: The host resolution on proxy_bypass checking for Windows was originally added in 2001 by an unidentified contributor adding windows registry proxy bypass list lookup code: #33859 commit 55c12d4 There was no discussion about the name resolution at the time, it's just the code that got written. The approach was probably copied when adding the equivalent macOS sysconf bypass code while de-Carbon-ing stuff to use modern APIs in ~2.6 in 9dd6b1d. |
Thanks for the ping @gpshead, I'm +1 on removing the name resolution as it definitely represents an information leak if done on the originating machine instead of by the proxy. Also +1 for a backport into security releases. |
…ythonGH-115210) Use of a proxy is intended to defer DNS for the hosts to the proxy itself, rather than a potential for information leak of the host doing DNS resolution itself for any reason. Proxy bypass lists are strictly name based. Most implementations of proxy support agree. (cherry picked from commit c43b26d) Co-authored-by: Weii Wang <weii.wang@canonical.com>
…ythonGH-115210) Use of a proxy is intended to defer DNS for the hosts to the proxy itself, rather than a potential for information leak of the host doing DNS resolution itself for any reason. Proxy bypass lists are strictly name based. Most implementations of proxy support agree. (cherry picked from commit c43b26d) Co-authored-by: Weii Wang <weii.wang@canonical.com>
…ythonGH-115210) Use of a proxy is intended to defer DNS for the hosts to the proxy itself, rather than a potential for information leak of the host doing DNS resolution itself for any reason. Proxy bypass lists are strictly name based. Most implementations of proxy support agree. (cherry picked from commit c43b26d) Co-authored-by: Weii Wang <weii.wang@canonical.com>
…ythonGH-115210) Use of a proxy is intended to defer DNS for the hosts to the proxy itself, rather than a potential for information leak of the host doing DNS resolution itself for any reason. Proxy bypass lists are strictly name based. Most implementations of proxy support agree. (cherry picked from commit c43b26d) Co-authored-by: Weii Wang <weii.wang@canonical.com>
…ythonGH-115210) Use of a proxy is intended to defer DNS for the hosts to the proxy itself, rather than a potential for information leak of the host doing DNS resolution itself for any reason. Proxy bypass lists are strictly name based. Most implementations of proxy support agree. (cherry picked from commit c43b26d) Co-authored-by: Weii Wang <weii.wang@canonical.com>
thanks! |
…H-115210) gh-115197: Stop resolving host in urllib.request proxy bypass (GH-115210) Use of a proxy is intended to defer DNS for the hosts to the proxy itself, rather than a potential for information leak of the host doing DNS resolution itself for any reason. Proxy bypass lists are strictly name based. Most implementations of proxy support agree. (cherry picked from commit c43b26d) Co-authored-by: Weii Wang <weii.wang@canonical.com>
…H-115210) gh-115197: Stop resolving host in urllib.request proxy bypass (GH-115210) Use of a proxy is intended to defer DNS for the hosts to the proxy itself, rather than a potential for information leak of the host doing DNS resolution itself for any reason. Proxy bypass lists are strictly name based. Most implementations of proxy support agree. (cherry picked from commit c43b26d) Co-authored-by: Weii Wang <weii.wang@canonical.com>
…ythonGH-115210) Use of a proxy is intended to defer DNS for the hosts to the proxy itself, rather than a potential for information leak of the host doing DNS resolution itself for any reason. Proxy bypass lists are strictly name based. Most implementations of proxy support agree.
…H-115210) (GH-116070) Use of a proxy is intended to defer DNS for the hosts to the proxy itself, rather than a potential for information leak of the host doing DNS resolution itself for any reason. Proxy bypass lists are strictly name based. Most implementations of proxy support agree. (cherry picked from commit c43b26d) Co-authored-by: Weii Wang <weii.wang@canonical.com>
…H-115210) (GH-116068) Use of a proxy is intended to defer DNS for the hosts to the proxy itself, rather than a potential for information leak of the host doing DNS resolution itself for any reason. Proxy bypass lists are strictly name based. Most implementations of proxy support agree. (cherry picked from commit c43b26d) Co-authored-by: Weii Wang <weii.wang@canonical.com>
…H-115210) (GH-116069) Use of a proxy is intended to defer DNS for the hosts to the proxy itself, rather than a potential for information leak of the host doing DNS resolution itself for any reason. Proxy bypass lists are strictly name based. Most implementations of proxy support agree. (cherry picked from commit c43b26d) Co-authored-by: Weii Wang <weii.wang@canonical.com>
…ythonGH-115210) Use of a proxy is intended to defer DNS for the hosts to the proxy itself, rather than a potential for information leak of the host doing DNS resolution itself for any reason. Proxy bypass lists are strictly name based. Most implementations of proxy support agree.
…ythonGH-115210) Use of a proxy is intended to defer DNS for the hosts to the proxy itself, rather than a potential for information leak of the host doing DNS resolution itself for any reason. Proxy bypass lists are strictly name based. Most implementations of proxy support agree.
Bug report
Bug description:
When system proxy bypass list is set, the urllib.request library on macOS and Windows resolves the hostname to an IP address and the IP address to a hostname (on Windows) before checking it against the system proxy bypass list (see here and here).
This causes DNS leak and HTTP requests to hang while waiting for DNS timeout in some air-gaped environments. This behavior also differs from other system applications (tested on macOS Sonoma with Safari and Windows Server 2022 with the Edge browser).
Test process on macOS and Windows:
Creating an A record from
<my-test-domain>.net
to<my-test-ip>
.macOS with Safari:
In the system network setting:
<my-test-ip>
In Safari:
http://<my-test-ip>
: does not use the proxyhttp://<my-test-domain>.net
: uses the proxyWindows Server 2022 with Edge browser:
in system network setting:
<my-test-ip>
In Edge browser:
http://<my-test-ip>
: does not use the proxyhttp://<my-test-domain>.net
: uses the proxyurllib.request on Windows also resolves the IP address back to FQDN before check, here's a test for that:
Windows Server 2022 with Edge browser:
Update the Host file so the IP address can be resolved back to FQDN (
socket.getfqdn("<my-test-ip>") == "<my-test-domain>.net"
).In system network setting:
In Edge browser:
http://<my-test-ip>
: uses the proxyhttp://<my-test-domain>.net
: does not use the proxyCPython versions tested on:
CPython main branch
Operating systems tested on:
macOS, Windows
Linked PRs
The text was updated successfully, but these errors were encountered: