A JVM Crash Analysis


1.1 Background

Crash time: 2023-01-30, around 15:30, after a half-month vacation.

Here's the simplified architecture:

Note: The system is deployed on a LAN.

                                        +--------------+
                                        |              |
                               +------> |   service01  |     down
                               |        |              |
                               |        +--------------+
                               |
                               |        +--------------+
         +--------------+      |        |              |
         |              |      +----->  |   service02  |     up
         |              |      |        |              |
+------> |    load      +------+        +--------------+
         |    balance   |      |
         |              |      |        +--------------+
         +--------------+      |        |              |
              IP Hash          +----->  |   service03  |     up
                               |        |              |
                               |        +--------------+
                               |
                               |        +--------------+
                               |        |              |
                               +----->  |   service04  |     up
                                        |              |
                                        +--------------+

One of the four Spring Boot-based Java services in the cluster crashed suddenly, and an hs_err_pid.log file was produced on the server hosting the crashed service. The other three services kept working normally.

All of the services run on CentOS 7.6 and connect to the same database through a HikariCP data source.

1.2 hs_err_pid.log

The log file contains a lot of information. The environment details are:

  1. JVM

    • JRE version: OpenJDK Runtime Environment (8.0_222-b10) (build 1.8.0_222-b10)

    • Java VM: OpenJDK 64-Bit Server VM (25.222-b10 mixed mode linux-amd64 compressed oops)

  2. OS

    • CentOS Linux release 7.6.1810 (Core)

    • CPU: total 16

    • MemTotal: 32 GB

1.2.2 Analyze the file

The file begins with a summary of the crash.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f629c217070, pid=764, tid=0x00007f6272f75700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_222-b10) (build 1.8.0_222-b10)
# Java VM: OpenJDK 64-Bit Server VM (25.222-b10 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libfreetype.so.6+0x20070] TT_Load_Glyph_Header+0x20
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.

We can see the Problematic frame:

# Problematic frame:
# C  [libfreetype.so.6+0x20070] TT_Load_Glyph_Header+0x20

So it seems clear that the JVM crashed while executing native code: the frame is marked C (a native frame) and points into libfreetype.so.6, not into Java code.
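The frame line packs several facts into one row: the frame type, the library, the offset inside the library, and the symbol. As a rough sketch, it can be pulled apart with a regular expression. The class name ProblematicFrame and its fields are made up for this post (they are not a JDK API), and the pattern only handles bracketed frames like the one in our log.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the JDK: parses a "Problematic frame" line from an hs_err log.
final class ProblematicFrame {
    // Handles only bracketed frames such as:
    //   # C  [libfreetype.so.6+0x20070] TT_Load_Glyph_Header+0x20
    private static final Pattern FRAME = Pattern.compile(
            "^#\\s+([A-Za-z])\\s+\\[(.+)\\+(0x[0-9a-fA-F]+)\\]\\s*(.*)$");

    final char type;       // 'C' marks a native frame
    final String library;  // e.g. libfreetype.so.6
    final String offset;   // offset inside the library, e.g. 0x20070
    final String symbol;   // e.g. TT_Load_Glyph_Header+0x20

    private ProblematicFrame(char type, String library, String offset, String symbol) {
        this.type = type;
        this.library = library;
        this.offset = offset;
        this.symbol = symbol;
    }

    /** Returns the parsed frame, or null if the line is not a bracketed frame line. */
    static ProblematicFrame parse(String line) {
        Matcher m = FRAME.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        return new ProblematicFrame(
                m.group(1).charAt(0), m.group(2), m.group(3), m.group(4).trim());
    }

    /** True when the crash happened in native code rather than in Java or VM code. */
    boolean isNative() {
        return type == 'C';
    }

    public static void main(String[] args) {
        ProblematicFrame f =
                parse("# C  [libfreetype.so.6+0x20070] TT_Load_Glyph_Header+0x20");
        System.out.println(f.library + " native=" + f.isNative() + " symbol=" + f.symbol);
    }
}
```

This is handy when a cluster produces more than one hs_err file and we want to group crashes by library quickly.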

What is libfreetype.so.6, and what does it do?

On Linux, dynamic library files follow the libxxx.so naming convention. 'so' stands for Shared Object, meaning a *.so file can be loaded and shared by multiple programs.

FreeType: A freely available software library to render fonts.

Linux uses FreeType, and the file libfreetype.so.6 can be found in /usr/lib64. Our services use FreeType too, so libfreetype.so.6 is loaded into their processes as well.

Checking the FreeType source code, we can find the function TT_Load_Glyph_Header() in truetype/ttgload.c.


After searching the Internet for a while, I found a similar problem described on the Red Hat site. According to it, the JVM crashes because its temp files are removed while it is still running. The JVM's temp files are stored in /tmp, which CentOS 7 cleans up with a default expiry rule of 10 days (10d).
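To get a feeling for which files are exposed to that expiry rule, here is a small diagnostic sketch. It is my own illustration, not something from the Red Hat article: the class name TmpAgeCheck is made up, the 10-day threshold is simply the default mentioned above, and it only looks at the last-modified time, which is a simplification of what the OS cleaner really checks.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;

// Hypothetical diagnostic: lists entries in the JVM temp dir that look older than the expiry age.
final class TmpAgeCheck {
    // Assumption: 10 days, matching the default /tmp expiry described in the text.
    static final Duration MAX_AGE = Duration.ofDays(10);

    /** True if a file last modified at lastModified is older than maxAge at time now. */
    static boolean isStale(Instant lastModified, Instant now, Duration maxAge) {
        return Duration.between(lastModified, now).compareTo(maxAge) > 0;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
        Instant now = Instant.now();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(tmp)) {
            for (Path entry : entries) {
                // Simplification: only the modification time is considered here.
                Instant modified = Files.getLastModifiedTime(entry).toInstant();
                if (isStale(modified, now, MAX_AGE)) {
                    System.out.println("at risk of cleanup: " + entry);
                }
            }
        }
    }
}
```

A service that sat idle for half a month would have its temp files well past the threshold, which matches what we saw.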

1.2.3 Summary

The crashed service wasn't accessed during the half-month vacation, so Linux automatically deleted its temp files. When someone then used a feature that needs those temp files (in this case, printing), the service crashed.

1.2.4 Solution

After a crash, restarting the service usually fixes the problem. But to avoid this kind of problem once and for all, we need to set the JVM system property -Djava.io.tmpdir=<custom location> at startup (it is a JVM option, not an OS environment variable) so that temporary font files are no longer created in /tmp.
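To make sure the setting actually took effect, the service can check it at startup. The following is a minimal sketch under my own assumptions: the class name TmpDirCheck and the warning messages are made up, and it treats only /tmp as the directory to avoid.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical startup check: warns when the JVM temp dir is missing or still under /tmp.
final class TmpDirCheck {
    /** True if dir is /tmp itself or anything below it. */
    static boolean isUnderSystemTmp(Path dir) {
        return dir.toAbsolutePath().normalize().startsWith(Paths.get("/tmp"));
    }

    /** True if dir exists as a directory and the process can write to it. */
    static boolean isUsable(Path dir) {
        return Files.isDirectory(dir) && Files.isWritable(dir);
    }

    public static void main(String[] args) {
        Path tmpDir = Paths.get(System.getProperty("java.io.tmpdir"));
        System.out.println("java.io.tmpdir = " + tmpDir);
        if (!isUsable(tmpDir)) {
            // The custom directory has to be created before the JVM starts.
            System.err.println("WARNING: temp dir is missing or not writable: " + tmpDir);
        }
        if (isUnderSystemTmp(tmpDir)) {
            System.err.println("WARNING: temp dir is under /tmp and may be cleaned by the OS");
        }
    }
}
```

For example, starting the service with -Djava.io.tmpdir=/opt/app/tmp (a made-up path) should print that path and no warnings, as long as the directory already exists and is writable.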