Fatal error with gaih_inet.constprop #410

Closed
devxzero opened this issue Nov 30, 2021 · 5 comments
Labels
bug Something isn't working

devxzero commented Nov 30, 2021

Summary

Since one of our Tomcat websites was upgraded from JDK 15 to JDK 17, there have been random fatal JVM crashes on a daily to weekly basis. These crashes occur under both Azul JDK 17 and Adoptium JDK 17.

The code that triggers the crash uses java.net.InetAddress to resolve a hostname from a given IP address, and then performs a reverse lookup (hostname back to IP address) to check that the hostname is correct (not spoofed). The hostname is considered correct if the IP address found in the reverse lookup matches the original IP address.
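
The round-trip check described above (often called forward-confirmed reverse DNS) could look roughly like the sketch below. It is not the code from this project: the class name `ForwardConfirmedLookup` and the two resolver functions are hypothetical stand-ins for the `InetAddress` calls, so that the logic can be run without touching the network.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical helper; the resolver functions stand in for the InetAddress calls. */
final class ForwardConfirmedLookup {

    /**
     * Returns the hostname only if looking it up again yields the original IP.
     *
     * @param ip            address the request came from
     * @param ipToHostname  IP to hostname lookup (assumed, e.g. getCanonicalHostName())
     * @param hostnameToIps hostname to addresses lookup (assumed, e.g. getAllByName())
     */
    static Optional<String> verifiedHostname(
            String ip,
            Function<String, Optional<String>> ipToHostname,
            Function<String, List<String>> hostnameToIps) {
        return ipToHostname.apply(ip)
                // getCanonicalHostName() can return the textual IP when nothing resolves
                .filter(host -> !host.equals(ip))
                // accept the hostname only if it maps back to the original address
                .filter(host -> hostnameToIps.apply(host).contains(ip));
    }

    public static void main(String[] args) {
        // Fake resolvers using documentation-only addresses and names.
        Function<String, Optional<String>> ipToHostname =
                ip -> ip.equals("192.0.2.10") ? Optional.of("crawler.example.org") : Optional.empty();
        Function<String, List<String>> hostnameToIps =
                host -> host.equals("crawler.example.org") ? List.of("192.0.2.10") : List.of();

        System.out.println(verifiedHostname("192.0.2.10", ipToHostname, hostnameToIps));
        System.out.println(verifiedHostname("192.0.2.99", ipToHostname, hostnameToIps));
    }
}
```

Comparing against all addresses of the hostname, rather than a single one, is a design choice of this sketch; a host with several A/AAAA records would otherwise fail the check intermittently.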

This code runs for every new IP address that accesses the website, which can be about one to three times per second. Because of concurrency, a hostname lookup can happen multiple times in parallel for a single IP address (this should be improved to perform only one lookup), until the hostname is resolved and stored in a MongoDB database.
When an HTTP request from a new IP address comes in, the hostname resolving code runs in a separate thread using Reactor.
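
One possible way to get down to a single lookup per IP address is to share the in-flight result between concurrent callers. The sketch below is only an illustration of that idea using plain `CompletableFuture` instead of Reactor; the class name `InFlightLookups` and the `startLookup` function are hypothetical and not part of the application.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical helper: callers asking for the same key share one pending lookup. */
final class InFlightLookups<V> {
    private final ConcurrentMap<String, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();
    private final Function<String, CompletableFuture<V>> startLookup;

    InFlightLookups(Function<String, CompletableFuture<V>> startLookup) {
        this.startLookup = startLookup;
    }

    CompletableFuture<V> lookup(String ip) {
        CompletableFuture<V> created = new CompletableFuture<>();
        CompletableFuture<V> existing = inFlight.putIfAbsent(ip, created);
        if (existing != null) {
            return existing; // another caller already started this lookup
        }
        try {
            // This caller owns the entry: start the real lookup and forward its outcome.
            startLookup.apply(ip).whenComplete((value, error) -> {
                inFlight.remove(ip, created); // later callers start a fresh lookup
                if (error != null) {
                    created.completeExceptionally(error);
                } else {
                    created.complete(value);
                }
            });
        } catch (RuntimeException e) {
            inFlight.remove(ip, created);
            created.completeExceptionally(e);
        }
        return created;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        CompletableFuture<String> pending = new CompletableFuture<>();
        InFlightLookups<String> lookups = new InFlightLookups<String>(ip -> {
            calls.incrementAndGet();
            return pending;
        });

        CompletableFuture<String> first = lookups.lookup("192.0.2.10");
        CompletableFuture<String> second = lookups.lookup("192.0.2.10");
        pending.complete("crawler.example.org");

        System.out.println(first == second); // true: both callers share one future
        System.out.println(calls.get());     // 1: the lookup was started once
        System.out.println(second.join());   // crawler.example.org
    }
}
```

The entry is removed as soon as the lookup finishes, on the assumption that the result is then stored elsewhere (here, the MongoDB database) and found there by later requests.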

Because of this problem, I have logged all resolved IP addresses and hostnames, so that I have timestamps.
Currently, I don't think that a single IP address is causing these crashes, although I haven't fully analyzed it. I think it is more likely a threading or concurrency problem inside or outside the JVM, which makes it difficult to reproduce.

I don't know how to fully interpret the attached error report, but it seems that one of the method calls to java.net.InetAddress fails internally. The methods called are:
InetAddress.getByName(String)
InetAddress.getCanonicalHostName()
InetAddress.getHostAddress()

One of these methods internally calls the native method
Inet6AddressImpl.lookupAllHostAddr(hostname)
which then calls the native getaddrinfo function in libc (known as "gai"), after which the gaih_inet.constprop problem occurs.

Steps to reproduce

I haven't yet found a way to reproduce this problem. It happens in an environment with multiple threads and clients. Simply running the code against a list of IP addresses isn't enough.
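
For anyone who wants to try, a multi-threaded stress run might be a starting point. The harness below is a sketch only, with no evidence that it triggers the crash; the class name `LookupStress`, the thread count, and the round count are arbitrary assumptions. It releases all worker threads at the same moment so that lookups overlap as much as possible.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical stress harness: runs the same lookup from many threads at once. */
final class LookupStress {

    /** Returns the number of lookup attempts that were made. */
    static int run(List<String> ips, int threads, int rounds, Consumer<String> lookup) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger attempts = new AtomicInteger();
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    start.await(); // hold every worker until all are ready
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int r = 0; r < rounds; r++) {
                    for (String ip : ips) {
                        try {
                            lookup.accept(ip);
                        } catch (RuntimeException ignored) {
                            // a failed lookup is not interesting here, only a JVM crash is
                        }
                        attempts.incrementAndGet();
                    }
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.MINUTES)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return attempts.get();
    }

    public static void main(String[] args) {
        // Same InetAddress calls as in the report; IP addresses are taken from the command line.
        Consumer<String> lookup = ip -> {
            try {
                InetAddress.getByName(ip).getCanonicalHostName();
            } catch (UnknownHostException e) {
                // unresolved addresses are expected in a stress run
            }
        };
        System.out.println("lookup attempts: " + run(List.of(args), 8, 100, lookup));
    }
}
```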

Triaging info

Java version:

$ java -version
openjdk version "17.0.1" 2021-10-19
OpenJDK Runtime Environment Temurin-17.0.1+12 (build 17.0.1+12)
OpenJDK 64-Bit Server VM Temurin-17.0.1+12 (build 17.0.1+12, mixed mode, sharing)

Other specs:
Machine: a VPS
OS: CentOS Stream release 8
JDK: OpenJDK Runtime Environment Temurin-17.0.1+12
Tomcat: Apache Tomcat 9.0.54
Spring Framework boot-dependencies 2.6.0
Languages: Java mixed with Kotlin 1.6.0
Spring Reactor 3.4.8

How did you install Java?
I used: https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17.0.1%2B12/OpenJDK17U-jdk_x64_linux_hotspot_17.0.1_12.tar.gz

Did it work before?
Yes, under JDK 15 and lower.

Source code

@Service
class IpAccessService() {
	private fun updateDns(ip: String, runParallel: Boolean) {
		NetworkUtils.getHostnameAndReverseIpAsync(ip)
				.timeout(Duration.ofSeconds(8))
				.doOnError {
					logger.info("Unable to resolve hostname for: ${ip}. Reason: ${it.message}")
				}
				.doOnSuccessOrError { _, _ ->
					updateLastDnsCheck(ip)
				}.run {
					if (runParallel) subscribeOn(Schedulers.parallel()) else this
				}.subscribe {
					updateDns(ip, it.hostname.get())
					it.reverseIp.ifPresent { updateDnsIp(ip, it) }
				}
	}
}

public class NetworkUtils {
    public static Optional<String> getHostname(String ip) {
        try {
            logger.debug("Resolving IP to hostname. IP='{}'", ip);
            InetAddress inetAddress = InetAddress.getByName(ip);
            String hostName = inetAddress.getCanonicalHostName();
            logger.debug("Hostname resolved. IP='{}', Hostname='{}'", ip, hostName);

            return Optional.of(hostName);
        } catch (UnknownHostException e) {
            logger.debug("Unable to resolve hostname for ip: {}, {}", ip, e.getMessage());
            return Optional.empty();
        }
    }

    public static Optional<String> getIp(String hostname) {
        try {
            logger.debug("Resolving hostname to IP. Hostname='{}'", hostname);
            InetAddress inetAddress = InetAddress.getByName(hostname);
            String ipAddress = inetAddress.getHostAddress();
            logger.debug("Ip resolved. Hostname='{}', IP='{}'", hostname, ipAddress);

            return Optional.of(ipAddress);
        } catch (UnknownHostException e) {
            logger.debug("Unable to resolve reverse ip for hostname: {}, {}", hostname, e.getMessage());
            return Optional.empty();
        }
    }
	
    public static Mono<Optional<String>> getHostnameAsync(String ip) {
        return Mono.fromCallable(() -> getHostname(ip));
    }
    public static Mono<Optional<String>> getIpAsync(String hostname) {
        return Mono.fromCallable(() -> getIp(hostname));
    }

    public static Mono<HostnameAndReverseIp> getHostnameAndReverseIpAsync(String ip) {
        return getHostnameAsync(ip)
                .flatMap(hostnameO -> {
                    if (!hostnameO.isPresent()) {
                        return Mono.empty();
                    }

                    String hostname = hostnameO.get();

                    return getIpAsync(hostname)
                            .map(dnsIpO -> new HostnameAndReverseIp(hostname, dnsIpO.orElse(null)))
                            .onErrorReturn(new HostnameAndReverseIp(hostname, null));
                });
    }
}

Partial Error report

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f3c5f4cb3a1, pid=135735, tid=135872
#
# JRE version: OpenJDK Runtime Environment Temurin-17.0.1+12 (17.0.1+12) (build 17.0.1+12)
# Java VM: OpenJDK 64-Bit Server VM Temurin-17.0.1+12 (17.0.1+12, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  [libc.so.6+0xe53a1]  gaih_inet.constprop.7+0x311
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e" (or dumping to /apache-tomcat-9.0.54/bin/core.135735)
#
# If you would like to submit a bug report, please visit:
#   https://github.com/adoptium/adoptium-support/issues
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

---------------  S U M M A R Y ------------

Command Line: --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.rmi/sun.rmi.transport=ALL-UNNAMED -Djava.util.logging.config.file=/apache-tomcat-9.0.54/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djdk.tls.ephemeralDHKeySize=2048 -Djava.protocol.handler.pkgs=org.apache.catalina.webresources -Dorg.apache.catalina.security.SecurityListener.UMASK=0022 -Djava.rmi.server.hostname=localhost -Dwicket.configuration=deployment -Xmx2000m -Dignore.endorsed.dirs= -Dcatalina.base=/apache-tomcat-9.0.54 -Dcatalina.home=/apache-tomcat-9.0.54 -Djava.io.tmpdir=/apache-tomcat-9.0.54/temp org.apache.catalina.startup.Bootstrap start

Host: Intel Core Processor (Haswell, no TSX, IBRS), 8 cores, 31G, CentOS Stream release 8
Time: Tue Nov 30 14:57:44 2021 CET elapsed time: 109978.619224 seconds (1d 6h 32m 58s)

---------------  T H R E A D  ---------------

Current thread (0x00007f3b880991b0):  JavaThread "parallel-2" daemon [_thread_in_native, id=135872, stack(0x00007f3b325d4000,0x00007f3b326d5000)]

Stack: [0x00007f3b325d4000,0x00007f3b326d5000],  sp=0x00007f3b326d2d80,  free space=1019k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libc.so.6+0xe53a1]  gaih_inet.constprop.7+0x311
C  [libc.so.6+0xe6e3b]  getaddrinfo+0x12b
C  [libnet.so+0x5d26]  Java_java_net_Inet6AddressImpl_lookupAllHostAddr+0x96
J 46056  java.net.Inet6AddressImpl.lookupAllHostAddr(Ljava/lang/String;)[Ljava/net/InetAddress; java.base@17.0.1 (0 bytes) @ 0x00007f3c4b471487 [0x00007f3c4b4713c0+0x00000000000000c7]
J 62344 c2 java.net.InetAddress$NameServiceAddresses.get()[Ljava/net/InetAddress; java.base@17.0.1 (209 bytes) @ 0x00007f3c4af43f18 [0x00007f3c4af43e00+0x0000000000000118]
J 62536 c2 java.net.InetAddress.getHostFromNameService(Ljava/net/InetAddress;Z)Ljava/lang/String; java.base@17.0.1 (109 bytes) @ 0x00007f3c4d099208 [0x00007f3c4d098040+0x00000000000011c8]
J 55962 c2 reactor.core.publisher.MonoFlatMap.subscribeOrReturn(Lreactor/core/CoreSubscriber;)Lreactor/core/CoreSubscriber; (41 bytes) @ 0x00007f3c4c4d762c [0x00007f3c4c4d7520+0x000000000000010c]
J 48285 c2 reactor.core.publisher.Mono.subscribe(Lorg/reactivestreams/Subscriber;)V (89 bytes) @ 0x00007f3c4ac49dc8 [0x00007f3c4ac49940+0x0000000000000488]
J 56420 c2 reactor.core.scheduler.WorkerTask.call()Ljava/lang/Void; (208 bytes) @ 0x00007f3c4b827888 [0x00007f3c4b827780+0x0000000000000108]
J 55388 c2 reactor.core.scheduler.WorkerTask.call()Ljava/lang/Object; (5 bytes) @ 0x00007f3c4c4088d4 [0x00007f3c4c4088a0+0x0000000000000034]
J 60589% c2 java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V java.base@17.0.1 (187 bytes) @ 0x00007f3c4acffb1c [0x00007f3c4acff940+0x00000000000001dc]
j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 java.base@17.0.1
j  java.lang.Thread.run()V+11 java.base@17.0.1
v  ~StubRoutines::call_stub
V  [libjvm.so+0x84d434]  JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x334
V  [libjvm.so+0x84ef1c]  JavaCalls::call_virtual(JavaValue*, Handle, Klass*, Symbol*, Symbol*, JavaThread*)+0x20c
V  [libjvm.so+0x90b760]  thread_entry(JavaThread*, JavaThread*)+0x70
V  [libjvm.so+0xefe6f0]  JavaThread::thread_main_inner()+0x180
V  [libjvm.so+0xf01b62]  Thread::call_run()+0xe2
V  [libjvm.so+0xc3ccd1]  thread_native_entry(Thread*)+0xe1

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 46056  java.net.Inet6AddressImpl.lookupAllHostAddr(Ljava/lang/String;)[Ljava/net/InetAddress; java.base@17.0.1 (0 bytes) @ 0x00007f3c4b471414 [0x00007f3c4b4713c0+0x0000000000000054]
J 62344 c2 java.net.InetAddress$NameServiceAddresses.get()[Ljava/net/InetAddress; java.base@17.0.1 (209 bytes) @ 0x00007f3c4af43f18 [0x00007f3c4af43e00+0x0000000000000118]
J 62536 c2 java.net.InetAddress.getHostFromNameService(Ljava/net/InetAddress;Z)Ljava/lang/String; java.base@17.0.1 (109 bytes) @ 0x00007f3c4d099208 [0x00007f3c4d098040+0x00000000000011c8]
J 55962 c2 reactor.core.publisher.MonoFlatMap.subscribeOrReturn(Lreactor/core/CoreSubscriber;)Lreactor/core/CoreSubscriber; (41 bytes) @ 0x00007f3c4c4d762c [0x00007f3c4c4d7520+0x000000000000010c]
J 48285 c2 reactor.core.publisher.Mono.subscribe(Lorg/reactivestreams/Subscriber;)V (89 bytes) @ 0x00007f3c4ac49dc8 [0x00007f3c4ac49940+0x0000000000000488]
J 56420 c2 reactor.core.scheduler.WorkerTask.call()Ljava/lang/Void; (208 bytes) @ 0x00007f3c4b827888 [0x00007f3c4b827780+0x0000000000000108]
J 55388 c2 reactor.core.scheduler.WorkerTask.call()Ljava/lang/Object; (5 bytes) @ 0x00007f3c4c4088d4 [0x00007f3c4c4088a0+0x0000000000000034]
J 60589% c2 java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V java.base@17.0.1 (187 bytes) @ 0x00007f3c4acffb1c [0x00007f3c4acff940+0x00000000000001dc]
j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 java.base@17.0.1
j  java.lang.Thread.run()V+11 java.base@17.0.1
v  ~StubRoutines::call_stub

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000400

Register to memory mapping:

RAX=0x00007f3b326d2d80 is pointing into the stack for thread: 0x00007f3b880991b0
RBX=0x00007f3b326d35b0 is pointing into the stack for thread: 0x00007f3b880991b0
RCX=0x0000000000007f3b is an unknown value
RDX=0x00007f3bbc019340 points into unknown readable memory: 0x0000078000000002 | 02 00 00 00 80 07 00 00
RSP=0x00007f3b326d2d80 is pointing into the stack for thread: 0x00007f3b880991b0
RBP=0x00007f3b326d3010 is pointing into the stack for thread: 0x00007f3b880991b0
RSI=0x00007f3bbc019370 points into unknown readable memory: 0x00007f3b00000780 | 80 07 00 00 3b 7f 00 00
RDI=0x0000000000000780 is an unknown value
R8 =0x0000000000000007 is an unknown value
R9 =0x00007f3bbc004390 points into unknown readable memory: 0x0006000600070003 | 03 00 07 00 06 00 06 00
R10=0x00007f3c5f7a5bc0: <offset 0x00000000003bfbc0> in /lib64/libc.so.6 at 0x00007f3c5f3e6000
R11=0x0000000000000007 is an unknown value
R12=0x00007f3b326d35b0 is pointing into the stack for thread: 0x00007f3b880991b0
R13=0x0 is NULL
R14=0x00000000000003f0 is an unknown value
R15=0x0 is NULL

The full error report is: errorreport.txt
The error reports always look very similar and always contain this line: C [libc.so.6+0xe53a1] gaih_inet.constprop.7+0x311
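
To check this across many error reports without reading each one, the frame can be extracted from the line that follows "# Problematic frame:" in the report header. The sketch below assumes that header layout, as seen in the partial report above; the class name `ProblematicFrame` is hypothetical.

```java
import java.util.Optional;

/** Hypothetical helper: pulls the problematic frame out of an hs_err report. */
final class ProblematicFrame {

    static Optional<String> problematicFrame(String report) {
        String[] lines = report.split("\\R");
        for (int i = 0; i + 1 < lines.length; i++) {
            if (lines[i].trim().equals("# Problematic frame:")) {
                // Drop the leading "#" and collapse runs of whitespace.
                String frame = lines[i + 1].replaceFirst("^#\\s*", "").trim();
                return Optional.of(frame.replaceAll("\\s+", " "));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String report = String.join("\n",
                "#",
                "# A fatal error has been detected by the Java Runtime Environment:",
                "#",
                "# Problematic frame:",
                "# C  [libc.so.6+0xe53a1]  gaih_inet.constprop.7+0x311",
                "#");
        // Prints: C [libc.so.6+0xe53a1] gaih_inet.constprop.7+0x311
        System.out.println(problematicFrame(report).orElse("(no problematic frame found)"));
    }
}
```

Collecting the extracted frames of all reports into a set would show at a glance whether every crash really shares the same frame.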

@devxzero devxzero added the bug Something isn't working label Nov 30, 2021
@devxzero
Author

What would be the best place to report this bug? The problem isn't limited to Adoptium JDK 17; it also occurs with Azul JDK 17, but it doesn't occur with JDK 15.

@karianna
Contributor

Have you tried with 17.0.2?

@devxzero
Author

I'll give 17.0.2 a try and report back when it crashes. (The production server will usually crash within a week. I haven't used JDK 17 anymore since I reported this issue.)

@devxzero
Author

I haven't seen the problem since upgrading from 17.0.1_12 to 17.0.2_8, so from my perspective it seems that something was fixed in the JDK 👍
So I'll close this ticket.

JPL1988 commented Dec 26, 2024

I encountered the same problem in JDK 1.8.0_302, but I don't know how to solve it because it is difficult to reproduce.
