Troubleshooting high CPU usage in Java applications

Problem Description

CPU usage in the test environment suddenly spiked, and the logs showed nothing abnormal. JVM debugging tools can be used to locate the problem.


Problem Analysis

What should you do when a Java application drives the CPU to 100%?

JVM tuning generally comes into play when load testing shows that a single node has become the bottleneck for the service it provides, and it is only one part of the work. Before tuning the JVM, I usually analyze CPU consumption, memory consumption, disk IO, network IO, and problems in the program itself. Only when these indicators are normal do I turn to JVM tuning. The tuning mainly depends on whether the memory size is allocated reasonably, whether the ratios between memory regions are reasonable, and whether the chosen garbage collector suits the characteristics of the system. Relevant JVM thresholds are then adjusted (the promotion threshold, JIT [Just In Time] thresholds), data is collected by enabling JVM logs, and the process is repeated.

A performance bottleneck usually shows up in one of three ways: excessive resource consumption, insufficient performance of an external system that the application depends on, or low resource consumption while the response speed still fails to meet requirements.
Resources are consumed mainly in CPU, file IO, network IO, and memory. A machine's resources are limited, so when one of them is consumed too heavily the system usually responds slowly.
When resource consumption is low but the response speed still falls short, the main causes are that the code does not run efficiently enough, the resources are not fully used, or the program structure is unreasonable.

The rest of this article works through a practical case: for a Java application, what should be done when CPU consumption is too high?

Analysis of excessive CPU consumption

On Linux, CPU time is spent mainly on interrupts, kernel work, and user processes, in the priority order interrupt > kernel > user process.

Context switch

Each CPU (or each core of a multi-core CPU) can execute only one thread at a time. Linux uses preemptive scheduling and allocates a time slice to each thread. When the time slice expires, when the thread blocks on IO, or when a higher-priority thread needs to run, Linux switches the executing thread. During the switch it must save the execution state of the current thread and restore the state of the thread about to run. This process is called context switching.
For Java applications, a thread typically enters a blocked or sleeping state when it performs file IO, performs network IO, waits for a lock, or calls Thread.sleep, and each of these triggers a context switch. Too many context switches cause the kernel to use more CPU and reduce the application's response speed.
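
One way to check whether context switching is a factor is to read the kernel's own counter. The sketch below assumes a Linux host with a /proc filesystem; the function names ctxt_total and ctxt_per_sec are made up for this example and are not part of any tool.

```shell
# Hypothetical helpers (not from the article): read the system-wide
# context-switch counter that Linux exposes as the "ctxt" line in /proc/stat.

# Print the total number of context switches since boot.
# $1: stat file to read (defaults to /proc/stat)
ctxt_total() {
    awk '$1 == "ctxt" { print $2 }' "${1:-/proc/stat}"
}

# Print the approximate number of context switches per second,
# measured over $1 seconds (default 1).
ctxt_per_sec() {
    interval="${1:-1}"
    before=$(ctxt_total)
    sleep "$interval"
    after=$(ctxt_total)
    echo $(( (after - before) / interval ))
}

# Usage on a Linux host:
#   ctxt_per_sec 5
```

The cs column of vmstat 1 reports a comparable per-second figure without any scripting.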

Run queue

Each CPU maintains a queue of runnable threads.
The longer the run queue, the longer a thread takes to complete.
It is recommended to keep the run queue at 1 to 3 threads per CPU core.
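
A rough way to compare the current load with the 1 to 3 per core guideline is to divide the load average by the core count. This is a sketch under assumptions: it relies on /proc/loadavg and nproc being available, and load_per_core is a name invented here.

```shell
# Hypothetical helper (not from the article): divide the 1-minute load
# average by the number of cores to approximate the per-core run queue.
# Note: the Linux load average also counts tasks in uninterruptible sleep,
# so treat the result as a rough estimate rather than an exact queue length.
# $1: loadavg file (defaults to /proc/loadavg)
# $2: number of cores (defaults to the output of nproc)
load_per_core() {
    file="${1:-/proc/loadavg}"
    cores="${2:-$(nproc)}"
    awk -v cores="$cores" '{ printf "%.2f\n", $1 / cores }' "$file"
}

# Usage on a Linux host:
#   load_per_core
```

For an instantaneous reading, the r column of vmstat 1 shows the number of runnable tasks.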

Utilization

CPU utilization is the percentage of CPU time spent in five categories: user processes, kernel, interrupt handling, IO wait, and idle. These five values are the key indicators for analyzing CPU consumption.
It is recommended that the ratio of user process CPU consumption to kernel CPU consumption be around 65%~70% to 30%~35%.
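
The user-to-kernel ratio can be estimated from the counters in /proc/stat. The sketch below is one possible reading of that ratio, not a standard tool: user_kernel_split is a hypothetical name, and it counts only the user, nice, and system fields.

```shell
# Hypothetical helper (not from the article): split busy CPU time into
# user and kernel shares using the aggregate "cpu" line of /proc/stat.
# Fields after the "cpu" label: user nice system idle iowait irq softirq ...
# The counters are cumulative since boot, so this is a long-run average;
# for a current reading, sample twice and diff the counters.
# $1: stat file to read (defaults to /proc/stat)
user_kernel_split() {
    awk '$1 == "cpu" {
        us = $2 + $3        # user + nice
        sy = $4             # system (kernel)
        total = us + sy
        if (total > 0)
            printf "us=%.0f%% sy=%.0f%%\n", 100 * us / total, 100 * sy / total
    }' "${1:-/proc/stat}"
}

# Usage on a Linux host:
#   user_kernel_split
```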

Problem Resolution

Approach:
Locate the Java service process -> locate the Java thread -> locate the code block

Locate the Java service process

A server may run multiple Java services. The top command shows which service has the higher CPU usage.

top

The output of the top command can be divided into two parts: the first half is system statistics, and the second half is process information.

First half (system statistics):
The first line: the current system time, the system uptime, and the number of users currently logged in. load average is the average load of the system, that is, the average length of the task queue, over the last 1, 5, and 15 minutes respectively.
The second line: process statistics, including the number of running processes, the number of sleeping processes, the number of stopped processes, and the number of zombie processes.
The third line: CPU statistics. us is the percentage of CPU used in user space, sy is the percentage used in kernel space, ni is the percentage used by user-space processes whose priority (nice value) has been changed, id is the idle percentage, wa is the percentage of time spent waiting for IO, hi is the percentage spent on hardware interrupts, and si is the percentage spent on software interrupts.

A few concepts:

RES: resident memory usage
(1) The amount of memory the process is currently using, not including memory that has been swapped out
(2) Includes memory shared with other processes
(3) If a process requests 100 MB of memory but actually uses 10 MB, RES grows by only 10 MB, unlike VIRT
(4) For libraries, it counts only the memory occupied by the library files that are actually loaded
RES = CODE + DATA

VIRT: virtual memory usage
(1) The amount of virtual memory the process "requires", including the libraries, code, and data it uses
(2) If a process requests 100 MB of memory but actually uses only 10 MB, VIRT grows by 100 MB, not by the amount actually used
VIRT = SWAP + RES
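
To see the two figures for one process without opening top, the kernel's per-process status file can be read directly. This is a minimal sketch assuming Linux; virt_and_res is a hypothetical name, and the VmSize and VmRSS fields are taken here as the counterparts of VIRT and RES.

```shell
# Hypothetical helper (not from the article): print the virtual size and
# resident set size of a process from its /proc status file.
# VmSize corresponds to what top shows as VIRT, VmRSS to RES (both in kB).
# $1: status file, e.g. /proc/<PID>/status
virt_and_res() {
    awk '$1 == "VmSize:" { virt = $2 }
         $1 == "VmRSS:"  { res = $2 }
         END { printf "VIRT=%skB RES=%skB\n", virt, res }' "$1"
}

# Usage on a Linux host:
#   virt_and_res /proc/<PID>/status
```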

Press 1 in the top view to display CPU consumption per core.

Note the PID of the process with the highest CPU usage. To look at that process alone:

ps aux | grep <PID>
or
top -p <PID>

Additional notes:
When CPU consumption is severe, it shows up mainly as high us, sy, wa, or hi values. A high wa value is caused by IO wait; a high hi value is caused mainly by hardware interrupts, for example when the network card receives data frequently.
For Java applications, severe CPU consumption shows up mainly in the us and sy values.
A high us value means that the running application is consuming most of the CPU. In that case, the most important step for a Java application is to find the code being executed by the thread that is consuming the CPU.

Locate the Java thread

Each Java service runs many threads. First find which thread has the highest CPU usage. There are three ways:

  1. Via the ps command
ps H -eo pid,tid,%cpu --sort=%cpu |grep <PID>
  2. Via the top command
top -H -p <PID>
  3. Via the ps command with per-thread output
ps -mp <PID> -o THREAD,tid,time
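
When a service has hundreds of threads, it can help to reduce the listing to the few busiest ones. The sketch below assumes procps ps and a "TID %CPU" input format; busiest_threads is a name invented for this example.

```shell
# Hypothetical helper (not from the article): read "TID %CPU" pairs on
# stdin and print the N busiest threads, highest CPU first.
# $1: number of threads to print (default 5)
busiest_threads() {
    sort -k2,2nr | head -n "${1:-5}"
}

# Usage on a Linux host with procps ps (each "=" suppresses a column header):
#   ps -L -p <PID> -o tid= -o pcpu= | busiest_threads 3
```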

Whichever method you use, once you have found the TID of the thread with high CPU usage, run the following command:

printf "0x%x\n" <TID>

Purpose: convert the thread TID to hexadecimal so that it can be searched for in the jstack output later.
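
The conversion can be wrapped in a small function so it is easy to reuse in later steps. This is only a convenience sketch; tid_to_hex is a hypothetical name.

```shell
# Hypothetical helper (not from the article): wrap the printf conversion
# so a decimal thread ID becomes the 0x-prefixed hexadecimal form.
# $1: decimal TID as reported by ps or top
tid_to_hex() {
    printf '0x%x\n' "$1"
}

# Example: tid_to_hex 12345 prints 0x3039
```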

Locate code blocks

Use jstack to generate a thread snapshot of the virtual machine at the current moment:

jstack -l <pid> >> jstackLog.out

Purpose: save the current stack information to a file, then find the code block where the problem lies by searching for the hexadecimal TID.

The search returns the corresponding thread's information, where you can directly see the code that is causing the problem.
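
The lookup itself can be sketched as a grep over the saved dump. The function below makes two assumptions: the dump prints the native thread ID in hexadecimal as nid=0x..., and 20 lines of context are enough to show the relevant frames. stack_for_tid is a name made up for this example.

```shell
# Hypothetical helper (not from the article): print the stack of the thread
# whose native ID matches a decimal TID, from a saved jstack dump.
# Assumes the dump prints the native thread ID as "nid=0x..." in hex.
# $1: decimal TID, $2: jstack output file, $3: lines of context (default 20)
stack_for_tid() {
    nid=$(printf '0x%x' "$1")
    # The trailing space avoids matching a longer ID that shares the prefix.
    grep -A "${3:-20}" "nid=$nid " "$2"
}

# Usage:
#   stack_for_tid 12345 jstackLog.out
```

If the search returns nothing, check how the nid value is formatted in your dump before assuming the thread is missing.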

