many hanging zmu processes are not exiting, process leak

Discussions related to the 1.36.x series of ZoneMinder
fmeili1
Posts: 14
Joined: Fri Jun 11, 2021 4:15 pm

many hanging zmu processes are not exiting, process leak

Post by fmeili1 »

Hi,
I'm running ZoneMinder v1.36.4 in a docker container (zoneminderhq/zoneminder:latest-ubuntu18.04) on an Ubuntu 20.10 docker host system. So far everything runs perfectly with 5 cameras installed (3 different models; 1 WiFi, 4 PoE).

Two of the cameras sometimes do not respond (or do not respond fast enough) and the stream breaks up, for whatever reason (I think it's a camera firmware bug). When I try to watch the stream with VLC at the same time, I see the same stream problem, so it's not a ZoneMinder issue. This situation sometimes lasts for a couple of minutes.

BUT, when this happens (ZoneMinder logs it as a "Failed to capture image from monitor ..." entry), I suddenly get a new hanging zmu process which NEVER ends. So over time I end up with many dozens of "dead" apache2 processes with zmu child processes. If the number of these processes gets too high, the ZoneMinder web UI no longer responds and I have to restart ZoneMinder or kill all the hanging zmu processes manually. Even when the camera later responds correctly again, these dead zmu processes never end. In fact, I get 3 new processes each time, because "apache2 -k start" starts "sh -c /usr/bin/zmu -m<number> -s", which starts "zmu -m<number> -s".

This is an example excerpt of such a process subtree, e.g. for monitor 3:
...
root 439 1 0 Jul13 ? 00:00:03 /usr/sbin/apache2 -k start
...
www-data 292448 439 0 09:16 ? 00:00:12 /usr/sbin/apache2 -k start
www-data 312554 292448 0 10:24 ? 00:00:00 sh -c /usr/bin/zmu -m3 -s
www-data 312555 312554 0 10:24 ? 00:00:00 /usr/bin/zmu -m3 -s
www-data 299050 439 0 09:37 ? 00:00:08 /usr/sbin/apache2 -k start
www-data 310367 299050 0 10:16 ? 00:00:00 sh -c /usr/bin/zmu -m3 -s
www-data 310368 310367 0 10:16 ? 00:00:00 /usr/bin/zmu -m3 -s
...
All of these dead apache2 processes belong to one parent (in this example, pid 439).
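
To see how many of these processes pile up per monitor, the process list can be summarized. This is only a minimal sketch, assuming procps `ps` and that the hanging processes show up with the `/usr/bin/zmu -m<number> -s` command line from the excerpt above; the function name `count_zmu_by_monitor` is made up for illustration.

```shell
# count_zmu_by_monitor: read `ps -eo pid,ppid,args` style lines on stdin and
# print "<monitor-id> <count>" for every monitor that has zmu processes.
count_zmu_by_monitor() {
    awk '
        # Field 3 is the command; this skips the "sh -c ..." wrappers.
        $3 ~ /(^|\/)zmu$/ {
            for (i = 4; i <= NF; i++)
                if ($i ~ /^-m[0-9]+$/) count[substr($i, 3)]++
        }
        END { for (m in count) print m, count[m] }
    ' | sort -n
}

# Usage (inside the container):
#   ps -eo pid,ppid,args | count_zmu_by_monitor
```

Running this from time to time shows which monitor the leaked processes belong to, which may help to match them against the "Failed to capture image" log entries.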

Maybe I can solve this by configuration, but I was not able to find a useful option for the zmu command that would help here (e.g. a timeout). It looks like problematic error handling in the zmu process for this case, which results in the process never exiting.

Any ideas about how to resolve this situation? Am I the only one with this problem?

Thanks,
Frank
jsylvia007
Posts: 116
Joined: Wed Mar 11, 2009 8:32 pm

Re: many hanging zmu processes are not exiting, process leak

Post by jsylvia007 »

I'm going to follow this post, because I believe this is related to what I just posted as well.
fmeili1
Posts: 14
Joined: Fri Jun 11, 2021 4:15 pm

SOLVED: many hanging zmu processes are not exiting, process leak

Post by fmeili1 »

IT'S SOLVED!

Hi again,

I have some new findings. It looks like I was digging in the completely wrong direction and my initial assumptions were all wrong. The root cause of all these problems was resource constraints on the system which runs my Zoneminder docker instance!

I played around with the ulimits inside the docker container and also on the docker host system (nofile and memlock). Finally, after increasing the nofile and memlock values in my /etc/security/limits.conf, the problems no longer exist. Even the stream problems of my 2 cameras from one vendor (which I could also see in VLC while they were ongoing) are gone now. Maybe the resource constraints in my Zoneminder system made these cameras misbehave (maybe the camera started to behave wrong because of never terminated connections from Zoneminder to them?!).

After this change, everything works and even my load on the system is now way lower than before.

Hope this helps someone else too,

Frank
mikb
Posts: 595
Joined: Mon Mar 25, 2013 12:34 pm

Re: SOLVED: many hanging zmu processes are not exiting, process leak

Post by mikb »

fmeili1 wrote: Fri Jul 16, 2021 4:00 am (maybe the camera started to behave wrong because of never terminated connections from Zoneminder to them?!).
That is entirely possible -- some cameras' firmware isn't tested to cover all sorts of weird conditions. You could have found a "denial of service" bug there, in the camera, in that you ran the camera out of its limited resources. Of course it shouldn't happen, but flaky firmware ... :)

It also shows why using a tried-and-tested TCP stack that has had (most) of the stupid mistakes knocked out of it, is a good idea, rather than "I know, we'll write our own, how hard can it be?" :)
jsylvia007
Posts: 116
Joined: Wed Mar 11, 2009 8:32 pm

Re: SOLVED: many hanging zmu processes are not exiting, process leak

Post by jsylvia007 »

fmeili1 wrote: Fri Jul 16, 2021 4:00 am Finally, after increasing the nofile and memlock values in my /etc/security/limits.conf, the problems no longer exist.

After this change, everything works and even my load on the system is now way lower than before.
Hey Frank. I'd like to investigate this for myself and see if it solves my problem as well... Can you describe your setup (how many monitors (cameras), CPU info, memory info) and what you eventually changed the values FROM and TO? Assuming this was the only change you made, I'd like some insight into which direction I may need to shift my values.

Thanks!
fmeili1
Posts: 14
Joined: Fri Jun 11, 2021 4:15 pm

Re: many hanging zmu processes are not exiting, process leak

Post by fmeili1 »

Hi,

@mikb:
You're right, this is not how a camera (OS/firmware/IP stack) should behave (it's a Reolink RLC-510A). It's also always a good idea to block these cameras from internet access entirely... no one knows what else may be going on under the hood...

@jsylvia007:
I run the whole system on a quite old (Q4'13) Intel NUC D34010WYK (4th Gen Intel Core i3-4010U, 16GB RAM, 2 cores, 4 threads). The base system is Kubuntu 20.10 (I know, it's not ideal for running background server processes, but I also use it from time to time as a video conferencing system).

I'm using 5 cameras so far (2 Reolink RLC-510A PoE, 2 Revotech I706-(2)-P PoE, 1 Amcrest IP2M-841B-V3 WiFi). I run a typical setup: each camera has a LowRes monitor stream in Modect mode with low FPS (5-9) and a corresponding HiRes monitor stream in Nodect mode, linked to the LowRes monitor and also with low FPS (5-8). All zones are defined in the LowRes monitors. For the 2 Reolink monitors I'm using RTMP (to reduce smearing); for the others I'm using RTSP.

There are four docker containers running permanently on this system:
- OpenHAB 3.0.0 (smart home) via docker image openhab/openhab:3.0.2
- Mosquitto 2.0.10 (MQTT message broker) via docker image eclipse-mosquitto:latest
- Jellyfin 10.7.5 (media server) via docker image jellyfin/jellyfin:latest
- Zoneminder v1.36.4 (video surveillance) via docker image zoneminderhq/zoneminder:latest-ubuntu18.04

My old settings in /etc/security/limits.conf (when the error sometimes occurred) only set nofile, to 655350 as both soft and hard limit. The others were at their default values, which resulted in:

old:
- nofile 655350 (set by me)
- memlock 4098375 (not set by me, but it was the default on my system)
- nproc 12768 (not set by me, but it was the default on my system)

For my new settings, I multiplied nofile and nproc by 10 and increased memlock by about 1 GB, to the following values (for both soft and hard):
new:
- nofile 6553500
- memlock 5120000
- nproc 127680

I'm pretty sure these values are not ideal, but they work for me. I also have not had enough time to experiment with them and find out which one really caused the problem.

I have not changed anything else in my setup. These limit changes alone solved my hanging process issues, and my load is now usually around 0.7 to 1.2, where it was between 2.0 and 2.5 before!

Just to mention, I've used the option --shm-size="2g" to create my zoneminder docker instance!

I hope this helps a bit.

Frank
jsylvia007
Posts: 116
Joined: Wed Mar 11, 2009 8:32 pm

Re: many hanging zmu processes are not exiting, process leak

Post by jsylvia007 »

fmeili1 wrote: Fri Jul 16, 2021 5:35 pm For my new settings, I multiplied nofile and nproc by 10 and increased memlock by about 1 GB, to the following values (for both soft and hard):
new:
- nofile 6553500
- memlock 5120000
- nproc 127680
Frank - This is very helpful.

By any chance, do you know how to query what the values currently are?

I know I can use "ulimit -n" (for nofile) and "ulimit -u" (for nproc) to check the hard (-H) and soft (-S) values, but I'm not sure what the commands are for memlock.

Interestingly, here are my outputs:

nofile:
SOFT: 1024
HARD: 1048576

nproc:
SOFT: 128118
HARD: 128118

Thanks!
fmeili1
Posts: 14
Joined: Fri Jun 11, 2021 4:15 pm

Re: many hanging zmu processes are not exiting, process leak

Post by fmeili1 »

I see one error in my previous posts. I mixed up the number of open files with that of a different system which I'm working on in parallel; sorry about the confusion (one should not do too many tasks in parallel :wink: ). The correct new number for my system is 90000; before, I only used 4096. I think the specific value does not really matter; it should just be large enough to leave plenty of reserve so that the limit is never reached. I was not able to set it to unlimited, so I used a "large" number...

I use
ulimit -a

core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 63251
max locked memory (kbytes, -l) 5120000
max memory size (kbytes, -m) unlimited
open files (-n) 90000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 127680
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
and the content of my /etc/security/limits.conf is
* soft nofile 90000
* hard nofile 90000
* soft memlock 5120000
* hard memlock 5120000
* soft nproc 127680
* hard nproc 127680
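
Note that `ulimit -a` only reports the limits of the shell it is typed in (memlock is the `-l` row). The limits a running process actually received can be read from `/proc/<pid>/limits`. The following is a minimal sketch, assuming a Linux /proc filesystem; the function name `proc_limits` is made up for illustration.

```shell
# proc_limits: print the soft and hard values of nofile, memlock and nproc
# from a /proc/<pid>/limits file given as $1 (defaults to /proc/self/limits).
proc_limits() {
    awk '
        /^Max open files/    { print "nofile",  $4, $5 }
        /^Max locked memory/ { print "memlock", $4, $5, "(bytes)" }
        /^Max processes/     { print "nproc",   $3, $4 }
    ' "${1:-/proc/self/limits}"
}

# Usage, with <pid> being e.g. one of the apache2 or zmu processes:
#   proc_limits /proc/<pid>/limits
```

Keep in mind that /proc reports memlock in bytes, while `ulimit -l` and limits.conf use kbytes.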
jsylvia007
Posts: 116
Joined: Wed Mar 11, 2009 8:32 pm

Re: many hanging zmu processes are not exiting, process leak

Post by jsylvia007 »

fmeili1 wrote: Fri Jul 16, 2021 10:32 pm
* soft nofile 90000
* hard nofile 90000
* soft memlock 5120000
* hard memlock 5120000
* soft nproc 127680
* hard nproc 127680
Frank - Once again thank you!!

My original values were:
max locked memory (kbytes, -l) 65536
open files (-n) 1024
max user processes (-u) 128118
I used your numbers blindly and I'm going to monitor how things go. Interestingly enough, my nproc was already higher than yours by default, so I think I set that to an even 150000...

Will report back!
jsylvia007
Posts: 116
Joined: Wed Mar 11, 2009 8:32 pm

Re: many hanging zmu processes are not exiting, process leak

Post by jsylvia007 »

Interesting... After a reboot, it seems as though my limits.conf file is being ignored...

ulimit -a is showing the old values, not the new ones...

Any pointers?

EDIT: Might not actually be an issue... apparently when using * as the domain, the root user is unaffected, and currently, zoneminder runs as www-data on my box. I will do more testing later if the monitoring determines that the process counts are still rising.
Last edited by jsylvia007 on Sat Jul 17, 2021 12:03 am, edited 1 time in total.
fmeili1
Posts: 14
Joined: Fri Jun 11, 2021 4:15 pm

Re: many hanging zmu processes are not exiting, process leak

Post by fmeili1 »

Try different values in a shell first, because the largest possible nofile differs from system to system; that way no reboot is required for each try.

1. list current values with "ulimit -a"
2. set a new nofile limit which is higher (e.g. try to double it for each try) than the current listed value with "ulimit -n <newValue>"
3. check again if the new value was really used (ulimit -a)
4. iterate steps 1 to 3 until you come close to your limit

If you have found a number which is close to your highest possible value, take that number, put it in the limits.conf and reboot the system. After the reboot, check one last time with "ulimit -a" to see if it really worked.
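
The doubling steps above can also be scripted. This is just a rough sketch, not a tested tool: the function name `probe_max_nofile` is made up, and each candidate is tried in a subshell so that the limits of the calling shell stay untouched.

```shell
# probe_max_nofile: starting from the current nofile limit, keep doubling the
# value and try each candidate in a subshell. Prints the largest value that
# was accepted.
probe_max_nofile() {
    cur=$(ulimit -n)
    # "unlimited" (or anything non-numeric) cannot be doubled; print it as-is.
    case $cur in ''|*[!0-9]*) echo "$cur"; return 0 ;; esac
    tries=0
    while [ "$tries" -lt 40 ]; do
        next=$((cur * 2))
        # `ulimit -n N` fails when N is above what this user may set.
        ( ulimit -n "$next" ) 2>/dev/null || break
        cur=$next
        tries=$((tries + 1))
    done
    echo "$cur"
}

# Usage:
#   probe_max_nofile
```

Because the value is only doubled, the result can be up to a factor of 2 below the real maximum; it is meant as a starting point for the value to put into limits.conf.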

Btw., with the command "sudo sysctl fs.file-nr" you can see the number of currently allocated file handles (the first of the three numbers in the output). (On my system it's about 12000, so my limit of 90000 should be more than ok.)
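
The same three numbers are also available in /proc/sys/fs/file-nr. A small sketch (the function name `file_nr_usage` is made up) that puts the first number in relation to the third one, the system-wide maximum:

```shell
# file_nr_usage: read the three fs.file-nr numbers (allocated, unused,
# system-wide maximum) from stdin and print the allocated share in percent.
file_nr_usage() {
    awk '{ printf "%s of %s file handles allocated (%.1f%%)\n", $1, $3, 100 * $1 / $3 }'
}

# Usage:
#   file_nr_usage < /proc/sys/fs/file-nr
```

Note that fs.file-nr counts file handles for the whole system, while nofile is a per-process limit. The number of files a single process has open can be checked with "ls /proc/<pid>/fd | wc -l".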
Last edited by fmeili1 on Sat Jul 17, 2021 12:05 am, edited 1 time in total.
jsylvia007
Posts: 116
Joined: Wed Mar 11, 2009 8:32 pm

Re: many hanging zmu processes are not exiting, process leak

Post by jsylvia007 »

I just edited my post above... Looks like it might actually be working. It's related to the root user and the * domain. Going to monitor. Will report back. It only takes a few hours for the graph to trend noticeably upward.
jsylvia007
Posts: 116
Joined: Wed Mar 11, 2009 8:32 pm

Re: many hanging zmu processes are not exiting, process leak

Post by jsylvia007 »

Looks like this made no difference for me. Things are still climbing.
fmeili1
Posts: 14
Joined: Fri Jun 11, 2021 4:15 pm

Re: many hanging zmu processes are not exiting, process leak

Post by fmeili1 »

It's really strange... after a couple of days without these problems (after increasing my limits), the problem started again?!

As a workaround, I now use a cronjob which runs every 3h and kills all (hanging) zmu processes, like this:

Code:

0 */3 * * * docker exec -i zoneminder pkill --signal kill "^zmu$" >> /dev/null 2>&1
Until this problem gets fixed, it helps me prevent the "hang" situation... maybe it can help you too.
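
The cronjob above kills every zmu process, including ones that were started a moment ago. A possible refinement is to kill only processes above a certain age. This is only a sketch, assuming a procps `ps` that supports the `etimes` (elapsed seconds) column; the function name `stale_zmu_pids` and the 600 second threshold are made up for illustration.

```shell
# stale_zmu_pids: read `ps -eo pid,etimes,comm` lines on stdin and print the
# PIDs of zmu processes that have been running longer than $1 seconds.
stale_zmu_pids() {
    awk -v max_age="$1" '$3 == "zmu" && $2 + 0 > max_age + 0 { print $1 }'
}

# Usage inside the container (600 s is an arbitrary example threshold):
#   ps -eo pid,etimes,comm | stale_zmu_pids 600 | xargs -r kill -9
```

A zmu call that works normally should finish within a few seconds, so a threshold of some minutes should only hit the hanging ones.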
jsylvia007
Posts: 116
Joined: Wed Mar 11, 2009 8:32 pm

Re: many hanging zmu processes are not exiting, process leak

Post by jsylvia007 »

Ha... I resorted to a cronjob that restarts the apache2 service every day at 3am as well. Same effective result.