On a weekend or a quiet day in your environment, you might see NetScaler showing high CPU usage (more than 80%) on the hypervisor. If your hypervisor team reports this to you, don't worry. Nothing is wrong.
The NetScaler packet processing engine is always “looking for work”, even when there is no work to be done. Therefore, it will do everything it can to take control of the CPU and not release it.
On a server running NetScaler VPX and nothing else, this makes it look as though NetScaler is consuming the entire CPU. Looking at the CPU utilization from “inside NetScaler” (by using the CLI or the GUI) gives the real picture of how much NetScaler CPU capacity is being used.
When we run the top command on NetScaler, we see output like the following:
    PID  USERNAME THR PRI NICE  SIZE   RES STATE C   TIME    WCPU COMMAND
    1163 root       1  44   r0 1291M 1292M CPU2  2 1643.1 100.00% NSPPE-01
    1164 root       1  44   r0 1291M 1292M CPU3  3 1643.1 100.00% NSPPE-02
    1162 root       1  44   r0 1292M 1292M CPU1  1 1643.1 100.00% NSPPE-00
By contrast, the stat cpu command shows the true value:
    Netscaler> stat cpu
    CPU statistics
    ID    Usage
    1     1
    Done
CPU is a finite resource. Like many resources, there are limits to a CPU’s capacity. The NetScaler appliance generally has two kinds of CPUs: the Management CPU and the Packet CPU.
The Management CPU processes all management traffic on the appliance, while the Packet CPU(s) handle all data traffic, for example TCP and SSL.
When diagnosing a complaint involving high CPU, start by gathering the following fundamental facts:
- CPUs impacted: nsppe (one or all) & management.
- Approximate time stamp/duration.
The output of the following commands is essential for troubleshooting high CPU issues:
- Output of the top command: gives the CPU utilization percentage of each process running on the NetScaler.
- Output of the stat system memory command: gives the memory utilization percentage, which can also contribute to CPU utilization.
- Output of the stat system cpu command: gives the current total CPU utilization on the appliance.
Sample output of the stat cpu command:
    > stat cpu
    CPU statistics
    ID    Usage
    1     29
The above output indicates that there is only one CPU (used for both management and data traffic) and that its utilization is 29%.
The CPU ID is 1.
There are also multi-core (nCore) appliances, where more than one core is allocated to the appliance. On these, the “stat system cpu” output lists multiple CPU IDs.
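When several CPU IDs are listed, it can help to read the listing into a small table before comparing cores. The following is a minimal Python sketch, assuming the multi-core listing keeps the same "ID Usage" layout as the single-CPU sample above. The function name parse_stat_cpu and the three-CPU sample values are hypothetical and only there for illustration.

```python
def parse_stat_cpu(text):
    """Return {cpu_id: usage_percent} parsed from `stat cpu` style text.

    Assumes a header line "ID Usage" followed by rows of two integers,
    as in the single-CPU sample shown earlier.
    """
    usage = {}
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if fields[:2] == ["ID", "Usage"]:
            in_table = True  # data rows start after the header
            continue
        if in_table and len(fields) == 2 and all(f.isdigit() for f in fields):
            usage[int(fields[0])] = int(fields[1])
    return usage


# Hypothetical multi-core listing; the values are made up for illustration.
sample_stat_cpu = """\
> stat cpu
CPU statistics
ID    Usage
1     29
2     31
3     27
Done
"""

for cpu_id, pct in sorted(parse_stat_cpu(sample_stat_cpu).items()):
    print(f"CPU {cpu_id}: {pct}%")
```

The parser ignores the prompt, the "CPU statistics" title and the trailing "Done", so the same text can be pasted in straight from a terminal session.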
Note: The high CPU seen when running the “top” command does not impact the performance of the box. It also does not mean that the NetScaler is running at high CPU or consuming all of the CPU. The NetScaler kernel runs on top of BSD, and the BSD view is what top shows. Although NetScaler appears to be using the full CPU, it actually is not.
The following steps help in understanding the CPU usage:
- Check the following counters to understand CPU usage.
CLASSIC:
(If AppFW or CMP is configured, then looking at slave_cpu_use also makes sense for classic)
(For an 8 Core system)
mgmt_cpu_use (CPU0 – nscollect runs here)
master_cpu_use (average of cpu(1) through cpu(7))
- How to look for CPU use for a particular CPU?
Use the nsconmsg command to search for cc_cpu_use, then grep for the CPU you are interested in.
The output looks like the following:
    Index rtime totalcount-val delta rate/sec symbol-name&device-no
      320     0            209    15        2 cc_cpu_use cpu(8)
      364     0            205    -6        0 cc_cpu_use cpu(8)
      375     0            222    17        2 cc_cpu_use cpu(8)
      386     0            212   -10       -1 cc_cpu_use cpu(8)
      430     0            216     6        0 cc_cpu_use cpu(8)
      440     0            201   -15       -2 cc_cpu_use cpu(8)
      450     0            208     7        1 cc_cpu_use cpu(8)
      461     0            202    -6        0 cc_cpu_use cpu(8)
      471     0            209     7        1 cc_cpu_use cpu(8)
      482     0            238    29        4 cc_cpu_use cpu(8)
      492     0            257    19        2 cc_cpu_use cpu(8)
- Look at the total count (third) column and divide by 10 to get the CPU percentage. For example, in the last line above, 257 means that 257/10 = 25.7% of CPU(8) is in use.
Run the following commands to investigate the nsconmsg counters for a CPU issue:
    nsconmsg -K newnslog -g cpu_use -s totalcount=600 -d current
    nsconmsg -K newnslog -d current | grep cc_cpu_use
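The divide-by-10 rule for the totalcount column can be applied to a saved copy of the cc_cpu_use output with a short script. The following is a minimal Python sketch, assuming the rows keep the seven-field layout shown in the sample output (Index, rtime, totalcount-val, delta, rate/sec, symbol name, device). The names ROW_PATTERN and cpu_percentages are hypothetical, and the sample text reuses the last two rows of that output.

```python
import re
from collections import defaultdict

# Matches data rows such as: "492 0 257 19 2 cc_cpu_use cpu(8)"
ROW_PATTERN = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(-?\d+)\s+(-?\d+)\s+cc_cpu_use\s+cpu\((\d+)\)\s*$"
)


def cpu_percentages(text):
    """Map each CPU number to its list of usage percentages (totalcount / 10)."""
    samples = defaultdict(list)
    for line in text.splitlines():
        match = ROW_PATTERN.match(line)
        if match:
            totalcount = int(match.group(3))  # third column
            cpu_number = int(match.group(6))  # number inside cpu(...)
            samples[cpu_number].append(totalcount / 10)
    return dict(samples)


# Last two rows of the sample output, plus the header line (which is skipped).
sample_nsconmsg = """\
Index rtime totalcount-val delta rate/sec symbol-name&device-no
  482     0            238    29        4 cc_cpu_use cpu(8)
  492     0            257    19        2 cc_cpu_use cpu(8)
"""

for cpu_number, values in cpu_percentages(sample_nsconmsg).items():
    print(f"cpu({cpu_number}): latest {values[-1]}%, peak {max(values)}%")
```

Grouping the samples per CPU makes it easy to compare the packet CPUs against each other, or to average them when checking a figure such as master_cpu_use.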
- Look at traffic, memory and CPU together. If high CPU usage is sustained, we may be hitting platform limits. Try to establish whether the CPU has gone up because of traffic. If so, try to establish whether it is genuine traffic or some sort of attack.
- We can also check the profiler output to understand what is consuming the CPU.
For details on the profiler output and logs, refer to the article below:
- For more details, we can also use the CPU counters mentioned in the article below: