Using `top` command shows PHP FPM processes using more memory than available

I've read this answer about understanding top, as well as man top, but I still have trouble turning the data top presents into actual information.

I'm logged in to an Amazon EC2 instance that is part of a load-balanced group. It's running PHP FPM, and I've filtered top to show only the php-fpm processes:

top - 11:27:43 up 18:59,  2 users,  load average: 0.59, 0.79, 0.74
Tasks: 171 total,   1 running, 125 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  9.1 sy,  0.0 ni, 90.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   1935.6 total,    177.7 free,    692.3 used,   1065.7 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   1052.0 avail Mem

  PID USER            PR    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                   
 1566 root            20  356.2m  30.5m  23.5m S        1.6   0:02.14 php-fpm: master process (/etc/php-fpm.conf)
 2188 webapp          20  442.2m  51.3m  31.8m S        2.7   2:18.47 php-fpm: pool www
 2189 webapp          20  442.1m  50.8m  30.4m S        2.6   2:19.21 php-fpm: pool www
 2190 webapp          20  444.2m  52.6m  30.4m S        2.7   2:20.14 php-fpm: pool www
 2191 webapp          20  442.1m  51.1m  31.3m S        2.6   2:22.69 php-fpm: pool www
 2192 webapp          20  442.2m  50.8m  31.6m S        2.6   2:19.64 php-fpm: pool www
 2193 webapp          20  436.3m  46.5m  32.0m S        2.4   2:17.29 php-fpm: pool www
 2194 webapp          20  452.2m  60.6m  30.4m S        3.1   2:19.82 php-fpm: pool www
 2195 webapp          20  438.1m  48.0m  31.6m S        2.5   2:17.78 php-fpm: pool www
 2197 webapp          20  442.2m  50.9m  30.6m S        2.6   2:18.28 php-fpm: pool www
12626 webapp          20  443.6m  50.4m  28.4m S        2.6   0:37.02 php-fpm: pool www
12627 webapp          20  443.5m  50.3m  28.6m S        2.6   0:35.68 php-fpm: pool www
12628 webapp          20  438.0m  45.1m  29.1m S        2.3   0:36.28 php-fpm: pool www
12629 webapp          20  443.9m  51.4m  29.3m S        2.7   0:35.26 php-fpm: pool www
12630 webapp          20  441.6m  48.9m  29.5m S        2.5   0:34.90 php-fpm: pool www
12631 webapp          20  443.6m  50.5m  28.7m S        2.6   0:34.93 php-fpm: pool www
12632 webapp          20  436.0m  43.2m  29.1m S        2.2   0:36.01 php-fpm: pool www
12635 webapp          20  441.6m  48.1m  28.3m S        2.5   0:34.56 php-fpm: pool www
12636 webapp          20  446.1m  55.0m  30.8m S        2.8   0:37.10 php-fpm: pool www
12637 webapp          20  441.9m  48.8m  29.0m S        2.5   0:35.16 php-fpm: pool www
12639 webapp          20  443.6m  50.3m  28.5m S        2.6   0:34.23 php-fpm: pool www
12640 webapp          20  438.0m  44.7m  28.9m S        2.3   0:36.33 php-fpm: pool www
12641 webapp          20  442.8m  49.5m  28.4m S        2.6   0:35.51 php-fpm: pool www
12642 webapp          20  443.8m  50.8m  29.1m S        2.6   0:36.22 php-fpm: pool www
12643 webapp          20  438.0m  44.2m  29.2m S        2.3   0:33.49 php-fpm: pool www
12644 webapp          20  440.0m  47.4m  29.3m S        2.5   0:36.44 php-fpm: pool www
12645 webapp          20  441.6m  48.7m  29.0m S        2.5   0:34.38 php-fpm: pool www
12646 webapp          20  441.6m  48.5m  28.8m S        2.5   0:34.53 php-fpm: pool www
12647 webapp          20  437.6m  44.5m  28.5m S        2.3   0:34.73 php-fpm: pool www
12648 webapp          20  437.7m  44.4m  28.6m S        2.3   0:33.64 php-fpm: pool www
12649 webapp          20  440.0m  46.9m  29.0m S        2.4   0:35.81 php-fpm: pool www
12651 webapp          20  444.1m  51.1m  29.0m S        2.6   0:34.77 php-fpm: pool www
12652 webapp          20  439.8m  47.0m  29.1m S        2.4   0:35.02 php-fpm: pool www
12657 webapp          20  443.9m  51.7m  29.9m S        2.7   0:35.35 php-fpm: pool www
12658 webapp          20  438.0m  45.5m  29.3m S        2.3   0:34.81 php-fpm: pool www
12667 webapp          20  441.9m  49.6m  29.6m S        2.6   0:34.21 php-fpm: pool www

Here's what I don't understand:

  1. What does the VIRT column actually display? I understand that it's virtual memory, but on line 2 of the memory summary I can see that there's 1052.0 avail Mem, while the values in the VIRT column sum to far more than that.
  2. How is there 1052.0 of available swap/virtual memory, while there's 0.0 total?
  3. How is there only 692.3 memory used, when the sum of the %MEM column is 88.6%? Shouldn't it be at least 1935 * 0.886 = 1714? It looks like I get close to that number if I add in the 1065.7 buff/cache to those 692.3, but what's the reason behind that?
  4. Why is the master process sleeping (as indicated by the S column)? Shouldn't it at least be running?

My main concern

The memory_limit setting in php.ini is set to 256M. That should mean that each child process spawned by FPM could use up to that much memory, correct? If that's right and I have 35 child processes (as currently shown by top), then theoretically they could use (or attempt to use) up to 35 * 256 = 8960M of memory. That's far more than the 1935M I have in total.

Also, the sum of the RES column is 1749.6. In other words, the used memory is 1749.6/1935.6, or about 90.4%. But that's when all processes are in a sleep state! If there's a spike in traffic and memory consumption suddenly goes up, that sounds disastrous.

I'm thinking about reducing the allowed memory from 256M to 128M and reducing pm.max_children from the default 50 to, say, 12. This way, the total possible memory consumption of PHP FPM would be 12 * 128 = 1536M, which would be within the actual available memory on the system. Does that make sense?


A few things to consider. First, top reports 692.3 used, but you also need to factor in the 1065.7 of buff/cache. If you sum the RES column, the result should roughly align with the total of used and buff/cache. This is because buff/cache holds data kept on behalf of those processes; that memory is not immediately free, but it can be made available shortly when needed.
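To make that comparison concrete, here is a quick check using only the figures quoted in the question (all in MiB). It is plain arithmetic on the pasted numbers, not a measurement of the running system.

```python
# Figures copied from the top output in the question (all in MiB).
used = 692.3
buff_cache = 1065.7
res_sum = 1749.6  # sum of the RES column, as stated in the question

used_plus_cache = used + buff_cache
gap = used_plus_cache - res_sum

print(f"used + buff/cache = {used_plus_cache:.1f} MiB")
print(f"sum of RES        = {res_sum:.1f} MiB")
print(f"difference        = {gap:.1f} MiB")
```

The two totals land within a few MiB of each other, which is the alignment described above.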

As for memory_limit and pm.max_children: you can run the most resource-hungry script in your app and measure its memory usage with memory_get_peak_usage().

Take that value, add in some capacity for growth of your dataset, and now you have a realistic maximum memory needed per script. With that figure you can now estimate the pm.max_children your server can handle. Adjust this figure down to allow for other processes on your server.
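The estimate described above can be sketched as a small calculation. This is only an illustration: the function name `estimate_max_children`, the 64 MiB peak, the 20% growth headroom and the 512 MiB reserved for other processes are all made-up assumptions, so substitute your own measurements. Only the 1935.6 MiB total comes from the question.

```python
import math


def estimate_max_children(total_mib, reserved_mib, peak_mib, headroom=0.2):
    """Rough upper bound for pm.max_children (hypothetical helper).

    total_mib    -- total RAM on the server
    reserved_mib -- RAM set aside for everything that is not PHP-FPM
    peak_mib     -- measured peak memory of the hungriest script
    headroom     -- fractional allowance for dataset growth (0.2 = 20%)
    """
    if total_mib <= 0 or peak_mib <= 0 or reserved_mib < 0 or headroom < 0:
        raise ValueError("inputs must be positive (reserved/headroom may be 0)")
    per_child = peak_mib * (1 + headroom)   # realistic maximum per script
    usable = total_mib - reserved_mib       # what PHP-FPM may consume
    return max(0, math.floor(usable / per_child))


# Assumed example inputs; only the total comes from the question.
print(estimate_max_children(1935.6, 512, 64, 0.2))  # 18 with these inputs
```

Rounding down is deliberate: a result that is slightly too low wastes a little capacity, while one that is too high risks running out of memory under load.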

You can also now set a more accurate memory_limit, but be aware that exceeding this value will cause your script to die. I find it often works best as a control to stop runaway scripts from killing your service, so I set it higher than my most memory-demanding scripts to allow for future, more complex jobs.

You should also check the FPM process manager (pm.*) settings and adjust them accordingly.

