Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

urllib.request resolves the host before checking it against the system's proxy bypass list [Security: LOW, minor info leak] #115197

Closed
weiiwang01 opened this issue Feb 9, 2024 · 17 comments · Fixed by #115210
Assignees
Labels
3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes 3.12 bugs and security fixes 3.13 bugs and security fixes OS-mac OS-windows type-bug An unexpected behavior, bug, or error type-security A security issue

Comments

@weiiwang01
Copy link
Contributor

weiiwang01 commented Feb 9, 2024

Bug report

Bug description:

When system proxy bypass list is set, the urllib.request library on macOS and Windows resolves the hostname to an IP address and the IP address to a hostname (on Windows) before checking it against the system proxy bypass list (see here and here).

This causes DNS leak and HTTP requests to hang while waiting for DNS timeout in some air-gaped environments. This behavior also differs from other system applications (tested on macOS Sonoma with Safari and Windows Server 2022 with the Edge browser).

Test process on macOS and Windows:

Creating an A record from <my-test-domain>.net to <my-test-ip>.

macOS with Safari:

In the system network setting:

  • "Web proxy (HTTP)" is set to 172.16.0.1:8000
  • "Secure web proxy (HTTPS)" is set to 172.16.0.1:8000
  • "Bypass proxy settings" is set to <my-test-ip>

In Safari:

  • visiting http://<my-test-ip>: does not use the proxy
  • visiting http://<my-test-domain>.net: uses the proxy

Windows Server 2022 with Edge browser:

in system network setting:

  • "HTTP proxy" is set to 172.16.0.1:8000
  • "Do not use proxy server" is set to <my-test-ip>

In Edge browser:

  • visiting http://<my-test-ip>: does not use the proxy
  • visiting http://<my-test-domain>.net: uses the proxy

urllib.request on Windows also resolves the IP address back to FQDN before check, here's a test for that:

Windows Server 2022 with Edge browser:

Update the Host file so the IP address can be resolved back to FQDN (socket.getfqdn("<my-test-ip>") == "<my-test-domain>.net").

In system network setting:

  • "HTTP proxy" is set to 172.16.0.1:8000
  • "Do not use proxy server" is set to .net

In Edge browser:

  • visiting http://<my-test-ip>: uses the proxy
  • visiting http://<my-test-domain>.net: does not use the proxy

CPython versions tested on:

CPython main branch

Operating systems tested on:

macOS, Windows

Linked PRs

@weiiwang01 weiiwang01 added the type-bug An unexpected behavior, bug, or error label Feb 9, 2024
@weiiwang01
Copy link
Contributor Author

I have created a fix for this problem internally. If it's okay, I can submit that as a pull request.

@zooba
Copy link
Member

zooba commented Feb 9, 2024

Yes, please add a pull request. If it's a relatively simple fix, then this will be less controversial.

For what it's worth, we don't usually consider web browsers to be the canonical behaviour for our APIs. If another low-level library similar to urllib is doing it, that is helpful guidance (e.g. urllib3).

@weiiwang01
Copy link
Contributor Author

Yes, please add a pull request. If it's a relatively simple fix, then this will be less controversial.

For what it's worth, we don't usually consider web browsers to be the canonical behaviour for our APIs. If another low-level library similar to urllib is doing it, that is helpful guidance (e.g. urllib3).

@zooba Yes, the solution is simply not to resolve the host before checking it against the system proxy bypass list. I don't recall any HTTP tools, for example, curl, has this behavior. I did some research and found that the requests library has already created a patch inside requests to stop resolving for Windows, but it is still unfixed for macOS. Fixing this behavior in Python could definitely help.

@ronaldoussoren
Copy link
Contributor

ronaldoussoren commented Feb 9, 2024

Some other datapoints.

Gitlab blog about no_proxy: https://about.gitlab.com/blog/2021/01/27/we-need-to-talk-no-proxy/, the links above are based on that blog post (but point to main branch except for wget which I didn't change). The blog points out some other inconstencies between the implementations as well.

Given the references above most major tools seem to not resolve names, which t.b.h. is not something I had expected.

@weiiwang01
Copy link
Contributor Author

most

@ronaldoussoren Thank you for the investigation. This aligns more or less with what I thought. Even Python itself does not resolve the hostname when handling the no_proxy environment variable, just like wget, go, and curl.

cpython/Lib/urllib/request.py

Lines 2516 to 2548 in 31633f4

def proxy_bypass_environment(host, proxies=None):
"""Test if proxies should not be used for a particular host.
Checks the proxy dict for the value of no_proxy, which should
be a list of comma separated DNS suffixes, or '*' for all hosts.
"""
if proxies is None:
proxies = getproxies_environment()
# don't bypass, if no_proxy isn't specified
try:
no_proxy = proxies['no']
except KeyError:
return False
# '*' is special case for always bypass
if no_proxy == '*':
return True
host = host.lower()
# strip port off host
hostonly, port = _splitport(host)
# check if the host ends with any of the DNS suffixes
for name in no_proxy.split(','):
name = name.strip()
if name:
name = name.lstrip('.') # ignore leading dots
name = name.lower()
if hostonly == name or host == name:
return True
name = '.' + name
if hostonly.endswith(name) or host.endswith(name):
return True
# otherwise, don't bypass
return False

The affected code here is slightly different, as Python handles the proxy bypass list sourced from the operating system. To my knowledge, Ruby, curl, and go do not even attempt to read the proxy bypass list from the operating system. And there is essentially no documentation related to the operating system's proxy bypass list on Windows and macOS. This is why I ran experiments with the operating system's default browser and attempted to test the "canonical" interpretation of the system proxy bypass list. From the results, it appears that macOS and Windows do not resolve the hostname.

@zooba
Copy link
Member

zooba commented Feb 9, 2024

The WINHTTP_PROXY_INFO documentation says that "proxy bypass list contains one or more server names".

Names are not what we get from a resolution process, so that would argue that at least on Windows, we should check the host name without resolving it.

Whether we can safely make that change in terms of backwards compatibility is a separate question. My instinct is that we should check both for now and plan to stop checking addresses after a deprecation period (and probably add some way for users to stop checking them now). But I'm not sure if that's best. @gpshead, I expect you know more about this area (proxies) than I do - any thoughts?

@weiiwang01
Copy link
Contributor Author

@zooba Thank you for the inspiration on how to find the documentation. I discovered that the corresponding documentation for macOS kSCPropNetProxiesExceptionsList says that "Host name patterns that should bypass the proxy."

Does the "name" here imply that macOS shouldn't resolve the name as well?

@gpshead
Copy link
Member

gpshead commented Feb 9, 2024

I agree with the majority of tools: A proxy bypass lists should always be a simple string match without name resolution. In proxy environments the point of the proxy is that it is entirely up to the proxy to do all resolving and remote connecting (or MITMing, denying, rejecting, etc). So we should not be doing any name resolution (or reverse name lookups of an address) in order to determine if a proxy should be used.

@sethmlarson may also have an opinion.

@ronaldoussoren
Copy link
Contributor

@zooba Thank you for the inspiration on how to find the documentation. I discovered that the corresponding documentation for macOS kSCPropNetProxiesExceptionsList says that "Host name patterns that should bypass the proxy."

Does the "name" here imply that macOS shouldn't resolve the name as well?

TLDR: Indeed, on macOS we shouldn't resolve the name to match system behaviour.

I did some testing with Apple's system libraries and those don't resolve names when determining if a proxy should be used.

That is, given the following script (requires PyObjC to be installed):

import CFNetwork
import Cocoa
import SystemConfiguration
import socket


def main():
    hn = "www.bol.com"
    print(f"{hn} -> {socket.gethostbyname(hn)}")
    url = Cocoa.CFURLCreateWithString(None, f"http://{hn}/", None)
    print(f"{url=}")
    settings = SystemConfiguration.SCDynamicStoreCopyProxies(None)
    print(f"{settings=}")
    proxies = CFNetwork.CFNetworkCopyProxiesForURL(url, settings)
    print(f"{proxies=}")


if __name__ == "__main__":
    main()

I get the following output with a proxy configured an 34.35.121.47 added to the no-proxy list (the other two are part of the default configuration):

% python repro.py
www.bol.com -> 34.36.121.47
url=http://www.bol.com/
settings={
    ExceptionsList =     (
        "*.local",
        "169.254/16",
        "34.36.121.47"
    );
    FTPPassive = 1;
    HTTPEnable = 1;
    HTTPPort = 8080;
    HTTPProxy = "127.0.0.1";
    HTTPSEnable = 0;
    "__SCOPED__" =     {
        en0 =         {
            ExceptionsList =             (
                "*.local",
                "169.254/16",
                "34.36.121.47"
            );
            FTPPassive = 1;
            HTTPEnable = 1;
            HTTPPort = 8080;
            HTTPProxy = "127.0.0.1";
            HTTPSEnable = 0;
        };
    };
}
proxies=(
        {
        kCFProxyHostNameKey = "127.0.0.1";
        kCFProxyPortNumberKey = 8080;
        kCFProxyTypeKey = kCFProxyTypeHTTP;
    }
)

This shows that the "http://www.bol.com" will be proxied, even though its IP address is on the proxy list.

That means that we shouldn't resolve names to match system behaviour.

Not that any of this is clearly documented (sigh...).

@weiiwang01
Copy link
Contributor Author

@zooba I can update the code to print a deprecation warning when the hostname does not match directly, but the resolved result does, ensuring backward compatibility for now. However, based on the current evidence, I personally believe this should be categorized as a bug. How would you advise handling this issue?

@zooba
Copy link
Member

zooba commented Feb 12, 2024

The problem is that if the hostname doesn't match anything in the list, then we should be passing it to the proxy and not resolving it. So if we keep trying to resolve it, we could be creating a security issue (an information leak, to be specific).

Probably the best thing to do is to just fix it and add a proper item in Doc/whatsnew/3.13.rst so that people have a better chance of finding out about it.

Arguably, in light of the potential for being part of an exfiltration chain, we should backport this to security releases as well. My "working for a big enterprise" bias says we should, because Python ought to integrate into existing infrastructure properly. But I may be overlooking impact on other users that would argue against doing that.

Count me +0 for backporting with no deprecation warning. (We can't warn on IP addresses in the bypass list, or being given an IP address, so the only option I see is to resolve when we shouldn't for the sake of warning accurately, which I don't like...)

@gpshead
Copy link
Member

gpshead commented Feb 12, 2024

I agree with backporting this one, at least through bugfix releases. I'd leave it to RMs to decide for older ones.

It hasn't been raised until now and client side behavior like this is not where real security gets implemented, but it is a possible information leak vs what most proxy configurations expects to happen.

I don't know how often proxy bypass lists actually get used in practice. (I'm used to the idea of all or nothing - but I haven't been within any application level proxy environments in decades)

@weiiwang01
Copy link
Contributor Author

I agree with backporting this one, at least through bugfix releases. I'd leave it to RMs to decide for older ones.

It hasn't been raised until now and client side behavior like this is not where real security gets implemented, but it is a possible information leak vs what most proxy configurations expects to happen.

I don't know how often proxy bypass lists actually get used in practice. (I'm used to the idea of all or nothing - but I haven't been within any application level proxy environments in decades)

This problem affects almost all users who use the system proxy on macOS, as macOS automatically populates the proxy exception list with *.local,169.254/16 by default, triggering this issue. On Windows, if a user selects the option to not use a proxy for local addresses, the proxy override list will include an <local> entry, which also triggers this problem.

BTW, is there anything I need to update in the pull request for backporting? I am using the ipaddress library, which is only available after 3.3. Is that okay, considering all releases before 3.3 are EOL? Sorry, this is my first time contributing here.

@gpshead
Copy link
Member

gpshead commented Feb 13, 2024

BTW, is there anything I need to update in the pull request for backporting?

Nope, don't worry about that, we'll do it when backporting as far as we decide (which wouldn't be further than 3.8 if we even go that far).

@gpshead gpshead added type-security A security issue 3.11 only security fixes 3.10 only security fixes 3.9 only security fixes 3.8 (EOL) end of life 3.12 bugs and security fixes 3.13 bugs and security fixes labels Feb 13, 2024
@gpshead
Copy link
Member

gpshead commented Feb 13, 2024

[History] for posterity and to save others spelunking version control: The host resolution on proxy_bypass checking for Windows was originally added in 2001 by an unidentified contributor adding windows registry proxy bypass list lookup code: #33859 commit 55c12d4

There was no discussion about the name resolution at the time, it's just the code that got written.

The approach was probably copied when adding the equivalent macOS sysconf bypass code while de-Carbon-ing stuff to use modern APIs in ~2.6 in 9dd6b1d.

@sethmlarson
Copy link
Contributor

Thanks for the ping @gpshead, I'm +1 on removing the name resolution as it definitely represents an information leak if done on the originating machine instead of by the proxy. Also +1 for a backport into security releases.

@gpshead gpshead changed the title urllib.request resolves the host before checking it against the system's proxy bypass list urllib.request resolves the host before checking it against the system's proxy bypass list [Security: LOW, minor info leak] Feb 13, 2024
gpshead pushed a commit that referenced this issue Feb 28, 2024
)

Use of a proxy is intended to defer DNS for the hosts to the proxy itself, rather than a potential for information leak of the host doing DNS resolution itself for any reason.  Proxy bypass lists are strictly name based.  Most implementations of proxy support agree.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Feb 28, 2024
…ythonGH-115210)

Use of a proxy is intended to defer DNS for the hosts to the proxy itself, rather than a potential for information leak of the host doing DNS resolution itself for any reason.  Proxy bypass lists are strictly name based.  Most implementations of proxy support agree.
(cherry picked from commit c43b26d)

Co-authored-by: Weii Wang <weii.wang@canonical.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Feb 28, 2024
…ythonGH-115210)

Use of a proxy is intended to defer DNS for the hosts to the proxy itself, rather than a potential for information leak of the host doing DNS resolution itself for any reason.  Proxy bypass lists are strictly name based.  Most implementations of proxy support agree.
(cherry picked from commit c43b26d)

Co-authored-by: Weii Wang <weii.wang@canonical.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Feb 28, 2024
…ythonGH-115210)

Use of a proxy is intended to defer DNS for the hosts to the proxy itself, rather than a potential for information leak of the host doing DNS resolution itself for any reason.  Proxy bypass lists are strictly name based.  Most implementations of proxy support agree.
(cherry picked from commit c43b26d)

Co-authored-by: Weii Wang <weii.wang@canonical.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Feb 28, 2024
…ythonGH-115210)

Use of a proxy is intended to defer DNS for the hosts to the proxy itself, rather than a potential for information leak of the host doing DNS resolution itself for any reason.  Proxy bypass lists are strictly name based.  Most implementations of proxy support agree.
(cherry picked from commit c43b26d)

Co-authored-by: Weii Wang <weii.wang@canonical.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Feb 28, 2024
…ythonGH-115210)

Use of a proxy is intended to defer DNS for the hosts to the proxy itself, rather than a potential for information leak of the host doing DNS resolution itself for any reason.  Proxy bypass lists are strictly name based.  Most implementations of proxy support agree.
(cherry picked from commit c43b26d)

Co-authored-by: Weii Wang <weii.wang@canonical.com>
@gpshead
Copy link
Member

gpshead commented Feb 28, 2024

thanks!

gpshead pushed a commit that referenced this issue Feb 28, 2024
…H-115210)

gh-115197: Stop resolving host in urllib.request proxy bypass (GH-115210)

Use of a proxy is intended to defer DNS for the hosts to the proxy itself, rather than a potential for information leak of the host doing DNS resolution itself for any reason.  Proxy bypass lists are strictly name based.  Most implementations of proxy support agree.
(cherry picked from commit c43b26d)

Co-authored-by: Weii Wang <weii.wang@canonical.com>
gpshead pushed a commit that referenced this issue Feb 28, 2024
…H-115210)

gh-115197: Stop resolving host in urllib.request proxy bypass (GH-115210)

Use of a proxy is intended to defer DNS for the hosts to the proxy itself, rather than a potential for information leak of the host doing DNS resolution itself for any reason.  Proxy bypass lists are strictly name based.  Most implementations of proxy support agree.
(cherry picked from commit c43b26d)

Co-authored-by: Weii Wang <weii.wang@canonical.com>
woodruffw pushed a commit to woodruffw-forks/cpython that referenced this issue Mar 4, 2024
…ythonGH-115210)

Use of a proxy is intended to defer DNS for the hosts to the proxy itself, rather than a potential for information leak of the host doing DNS resolution itself for any reason.  Proxy bypass lists are strictly name based.  Most implementations of proxy support agree.
ambv pushed a commit that referenced this issue Mar 19, 2024
…H-115210) (GH-116070)

Use of a proxy is intended to defer DNS for the hosts to the proxy itself, rather than a potential for information leak of the host doing DNS resolution itself for any reason.  Proxy bypass lists are strictly name based.  Most implementations of proxy support agree.
(cherry picked from commit c43b26d)

Co-authored-by: Weii Wang <weii.wang@canonical.com>
ambv pushed a commit that referenced this issue Mar 19, 2024
…H-115210) (GH-116068)

Use of a proxy is intended to defer DNS for the hosts to the proxy itself, rather than a potential for information leak of the host doing DNS resolution itself for any reason.  Proxy bypass lists are strictly name based.  Most implementations of proxy support agree.
(cherry picked from commit c43b26d)

Co-authored-by: Weii Wang <weii.wang@canonical.com>
ambv pushed a commit that referenced this issue Mar 19, 2024
…H-115210) (GH-116069)

Use of a proxy is intended to defer DNS for the hosts to the proxy itself, rather than a potential for information leak of the host doing DNS resolution itself for any reason.  Proxy bypass lists are strictly name based.  Most implementations of proxy support agree.
(cherry picked from commit c43b26d)

Co-authored-by: Weii Wang <weii.wang@canonical.com>
adorilson pushed a commit to adorilson/cpython that referenced this issue Mar 25, 2024
…ythonGH-115210)

Use of a proxy is intended to defer DNS for the hosts to the proxy itself, rather than a potential for information leak of the host doing DNS resolution itself for any reason.  Proxy bypass lists are strictly name based.  Most implementations of proxy support agree.
diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024
…ythonGH-115210)

Use of a proxy is intended to defer DNS for the hosts to the proxy itself, rather than a potential for information leak of the host doing DNS resolution itself for any reason.  Proxy bypass lists are strictly name based.  Most implementations of proxy support agree.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes 3.12 bugs and security fixes 3.13 bugs and security fixes OS-mac OS-windows type-bug An unexpected behavior, bug, or error type-security A security issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants