Cameras randomly dropping out - 1.30.4 unique system install

Forum for questions and support relating to the 1.30.x releases only.
Locked
calmor15014
Posts: 20
Joined: Mon May 09, 2016 4:12 pm

Post by calmor15014 »

I have a unique system install with a lot of potential gotchas, and I'm at a loss as to where to start debugging. Can anyone help?

I had a functional system installed on Ubuntu 16.04, on bare metal, on an HP server with 24 cores and 48GB of RAM, with four Hikvision cameras streaming RTSP at 1280x720, 24-bit color, 15fps. The cameras are on their own VLAN on a managed PoE switch, and the server had one interface also on that VLAN. No router is involved, just the switch, the PC, and the cameras. Everything worked well, though I am only about 90% sure I had upgraded to 1.30.x. The MySQL database was on the same machine. Image and event storage was on an NFS mount on another HP server running FreeNAS; it was bind-mounted to the normal folder before ZoneMinder started. I never had any major errors or data loss, though I'd occasionally get a camera warning about incomplete frames, which still persists.

I wanted to virtualize because I was running too many services on the same machine, so I installed Proxmox and set up an LXC container for ZoneMinder. MySQL is now on a separate full VM. I clean-installed ZoneMinder in the container; Proxmox mounts the NFS share and bind-mounts it into the container, and I can see all of the old events (from the previous install) and write new ones. I copied the entire MySQL database over to the new server. There are no file access errors in the logs.

The camera setup was stored in the SQL database, so it was literally identical. The cameras start up and everything is normal for a minute or two after filling the buffer; then they start dropping out frequently and randomly, sometimes showing a blue screen and sometimes a completely black screen with no ZM timestamp.

Using VLC, I determined that the actual resolution was 1280x738, so I made that adjustment, with no change. I was getting buffer errors, so I raised the buffer size to 1000 frames. I gave the container 12GB of RAM and unlimited CPU access. CPU load never exceeds 2.5 (out of 24 cores), and the Proxmox host never exceeds about 3, so CPU isn't an issue. SHM usage is 30-35% even with a 1000-frame buffer.
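For a rough sense of scale, the image ring buffer for one monitor needs about width x height x bytes-per-pixel x buffer-count bytes of shared memory. The Python sketch below computes that lower bound; the function names are illustrative, and it assumes only raw frames are counted. ZoneMinder's actual memory map also holds shared state and per-frame metadata, so the real file will be somewhat larger.

```python
# Rough lower-bound estimate of ZoneMinder shared-memory use per monitor.
# Assumption: only the raw image ring buffer is counted; ZM's real memory map
# also contains shared state and per-frame metadata, so actual usage is higher.


def frame_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of one uncompressed frame (24-bit colour = 3 bytes per pixel)."""
    return width * height * bytes_per_pixel


def ring_buffer_bytes(width: int, height: int, bytes_per_pixel: int,
                      buffer_count: int) -> int:
    """Approximate image ring buffer size for one monitor, in bytes."""
    return frame_bytes(width, height, bytes_per_pixel) * buffer_count


if __name__ == "__main__":
    per_monitor = ring_buffer_bytes(1280, 720, 3, 1000)
    total = per_monitor * 4  # four cameras with identical settings
    print(f"per monitor: {per_monitor / 2**30:.2f} GiB")
    print(f"four monitors: {total / 2**30:.2f} GiB")
```

With the settings described (1280x720, 24-bit, 1000 frames) this comes to roughly 2.6 GiB per monitor, or a bit over 10 GiB for four, so it may be worth comparing against the size of /dev/shm inside the container.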

I tried FFmpeg as the source type instead of RTP, and I also tried all of the available RTP access methods. The same thing happens.

I tried switching to plain Monitor mode to see if that would help, but there was no change. VLC on a separate computer temporarily placed on the camera network doesn't seem to have any of the same issues; I watched a feed for 10 minutes with no glitches.

Hard-rebooting the cameras didn't help either.

I bumped up the HTTP timeout as well, with no change. I confirmed that shmmax and shmall are high (they report numbers of 20 digits or more).
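One detail worth keeping in mind when reading those values: on Linux, `kernel.shmmax` is expressed in bytes, but `kernel.shmall` is expressed in pages, so its effective limit is that number multiplied by the page size. Since the logs show ZM using memory-mapped files under `/dev/shm` (`/dev/shm/zm.mmap.N`), the size of that tmpfs is arguably the more relevant limit, and inside an LXC container it may be mounted separately from the host's. The sketch below reads both through the standard `/proc` and `statvfs` interfaces; the function names are illustrative.

```python
import os


def read_int(path: str) -> int:
    """Read a single integer from a /proc/sys file."""
    with open(path) as f:
        return int(f.read().strip())


def shmall_bytes(shmall_pages: int, page_size: int) -> int:
    """kernel.shmall is counted in pages; convert it to bytes."""
    return shmall_pages * page_size


def tmpfs_usage(path: str = "/dev/shm") -> tuple:
    """Return (total bytes, used bytes, percent used) for a mounted filesystem."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize
    free = st.f_bfree * st.f_frsize
    used = total - free
    percent = 100.0 * used / total if total else 0.0
    return total, used, percent


if __name__ == "__main__":
    page = os.sysconf("SC_PAGE_SIZE")
    shmmax = read_int("/proc/sys/kernel/shmmax")
    shmall = read_int("/proc/sys/kernel/shmall")
    print(f"shmmax: {shmmax} bytes")
    print(f"shmall: {shmall} pages = {shmall_bytes(shmall, page)} bytes")
    total, used, pct = tmpfs_usage("/dev/shm")
    print(f"/dev/shm: {used} of {total} bytes used ({pct:.1f}%)")
```

Running it both on the Proxmox host and inside the container would show whether the container sees a different `/dev/shm` size.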

There don't appear to be any networking issues; Proxmox doesn't even get an IP address on this network, and it just forwards traffic to this container.
The storage server is also a 24-core 48GB machine and isn't reporting any CPU, memory, or abnormal network stress. The SQL server isn't exceeding any memory, I/O, or CPU limits.

I suppose the next step might be to try a new VM, maybe one with a GUI installed so I can run VLC on the Proxmox host and see if there are network issues. I'm kind of at a loss.

Here are some log entries from while running in Monitor mode; I can post the full log if needed.

Code:

2017-11-30 11:46:17.225046	zmc_m2		6792	WAR	Discarding incomplete frame 2, 0 bytes	zm_rtp_source.cpp	336
2017-11-30 11:46:17.202445	zmc_m2		6792	WAR	Discarding partial frame 2, 104102 bytes	zm_rtp_source.cpp	345
2017-11-30 11:46:17.178617	zmc_m2		6792	WAR	Packet in sequence, gap 47	zm_rtp_source.cpp	127
2017-11-30 11:46:17.135133	zmc_m1		6790	WAR	Discarding incomplete frame 3, 0 bytes	zm_rtp_source.cpp	336
2017-11-30 11:46:17.085286	zmc_m1		6790	WAR	Discarding partial frame 3, 104102 bytes	zm_rtp_source.cpp	345
2017-11-30 11:46:17.062804	zmc_m2		6792	WAR	Discarding partial frame 1, 104102 bytes	zm_rtp_source.cpp	345
2017-11-30 11:46:17.041421	zmc_m1		6790	WAR	Packet in sequence, gap 86	zm_rtp_source.cpp	127
2017-11-30 11:46:17.038823	zmc_m2		6792	WAR	Packet in sequence, gap 53	zm_rtp_source.cpp	127
2017-11-30 11:46:17.000192	zmc_m1		6790	WAR	Discarding partial frame 2, 104102 bytes	zm_rtp_source.cpp	345
2017-11-30 11:46:16.971692	zmc_m1		6790	WAR	Packet in sequence, gap 90	zm_rtp_source.cpp	127
2017-11-30 11:46:16.953886	zmc_m2		6792	WAR	Discarding frame 0	zm_rtp_source.cpp	349
2017-11-30 11:46:16.921397	zmc_m2		6792	WAR	Sequence in probation 2, out of sequence	zm_rtp_source.cpp	112
2017-11-30 11:46:16.889444	zmc_m1		6790	WAR	Discarding partial frame 1, 104102 bytes	zm_rtp_source.cpp	345
2017-11-30 11:46:16.861932	zmc_m1		6790	WAR	Packet in sequence, gap 97	zm_rtp_source.cpp	127
2017-11-30 11:46:16.791738	zmc_m1		6790	WAR	Discarding frame 0	zm_rtp_source.cpp	349
2017-11-30 11:46:16.725879	zmc_m1		6790	WAR	Sequence in probation 2, out of sequence	zm_rtp_source.cpp	112
2017-11-30 11:46:04.427240	zmdc		2557	ERR	'zmc -m 2' exited abnormally, exit status 255	zmdc.pl	
2017-11-30 11:46:04.221390	zmdc		2557	ERR	'zmc -m 1' exited abnormally, exit status 255	zmdc.pl	
2017-11-30 11:46:04.186220	zmwatch		2641	ERR	Memory map file '/dev/shm/zm.mmap.2' does not exist. zmc might not be running.	zmwatch.pl	
2017-11-30 11:46:03.980250	zmwatch		2641	ERR	Memory map file '/dev/shm/zm.mmap.1' does not exist. zmc might not be running.	zmwatch.pl	
2017-11-30 11:45:58.249100	zmdc		2557	ERR	'zmc -m 2' exited abnormally, exit status 255	zmdc.pl	
2017-11-30 11:45:57.327974	zmc_m2		6614	ERR	Failed to pre-capture monitor 2 735333652 (1/1)	zmc.cpp	312
2017-11-30 11:45:48.314813	zmc_m2		6632	ERR	RTP timed out	zm_rtp_data.cpp	93
2017-11-30 11:45:44.069816	zmc_m1		6622	ERR	RTP timed out	zm_rtp_data.cpp	93
2017-11-30 11:41:40.591940	zmdc		2557	ERR	'zmc -m 4' exited abnormally, exit status 255	zmdc.pl	
2017-11-30 11:41:40.471222	zmc_m4		6645	FAT	No RTSP sources	zm_remote_camera_rtsp.cpp	162
2017-11-30 11:41:30.130890	zmdc		2557	ERR	'zmc -m 3' exited abnormally, exit status 255	zmdc.pl	
2017-11-30 11:41:30.032548	zmc_m3		6636	FAT	No RTSP sources	zm_remote_camera_rtsp.cpp	162
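To see at a glance which monitors produce which warnings, a log export like the one above can be tallied with a short script. This is a sketch that assumes the tab-separated column layout shown in these excerpts (timestamp, component, an empty column, PID, level, message, source file, source line); the export format may differ between ZM versions, and the script name in the usage comment is hypothetical.

```python
import re
import sys
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class LogEntry(NamedTuple):
    timestamp: str
    component: str
    pid: int
    level: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one tab-separated ZM log line, or return None if it doesn't fit.

    Assumes the column order seen in the excerpts: timestamp, component,
    an empty column, pid, level, message, source file, [source line].
    Empty columns are dropped before indexing.
    """
    fields = [f for f in line.rstrip("\n").split("\t") if f.strip()]
    if len(fields) < 5 or not fields[2].isdigit():
        return None
    ts, component, pid, level, message = fields[:5]
    return LogEntry(ts, component, int(pid), level, message)


def summarize(lines: Iterable[str]) -> Counter:
    """Count (component, level, message) with digits normalized to 'N'.

    Normalizing means "gap 47" and "gap 53" are grouped as one message.
    """
    counts: Counter = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            key = (entry.component, entry.level,
                   re.sub(r"\d+", "N", entry.message))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Usage (hypothetical file names): python zm_log_summary.py < exported_log.tsv
    for (component, level, message), n in summarize(sys.stdin).most_common():
        print(f"{n:5d}  {component:<8} {level}  {message}")
```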
calmor15014

Post by calmor15014 »

I should also note that I just tried disabling the other three cameras and running only one in Modect, to see if it was a bandwidth issue or something similar, and got a segfault error that I've only seen rarely. It occurred after an RTP timeout.

Code:

2017-11-30 12:05:54.128150	zmdc		2557	ERR	'zmc -m 1' exited abnormally, exit status 255	zmdc.pl	
2017-11-30 12:05:53.878850	zmwatch		2641	ERR	Memory map file '/dev/shm/zm.mmap.1' should have been 896 but was instead 0	zmwatch.pl	
2017-11-30 12:05:53.845331	web_php		625	ERR	Timed out waiting for msg /var/run/zm/zms-289569s.sock	/usr/share/zoneminder/www/includes/functions.php	2033
2017-11-30 12:05:53.271115	zms		7544	ERR	Backtrace 6: /usr/lib/zoneminder/cgi-bin/nph-zms(_start+0x29) [0x555b9d90ef69]	zm_signal.cpp	102
2017-11-30 12:05:53.248564	zms		7544	ERR	Backtrace 5: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f2a02378830]	zm_signal.cpp	102
2017-11-30 12:05:53.226000	zms		7544	ERR	Backtrace 4: /usr/lib/zoneminder/cgi-bin/nph-zms(main+0xb9e) [0x555b9d90e06e]	zm_signal.cpp	102
2017-11-30 12:05:53.203482	zms		7544	ERR	Backtrace 3: /usr/lib/zoneminder/cgi-bin/nph-zms(_ZN13MonitorStream9runStreamEv+0x48) [0x555b9d947a38]	zm_signal.cpp	102
2017-11-30 12:05:53.180876	zms		7544	ERR	Backtrace 2: /usr/lib/zoneminder/cgi-bin/nph-zms(_ZNK7Monitor6GetFPSEv+0xd) [0x555b9d93ad9d]	zm_signal.cpp	102
2017-11-30 12:05:53.156566	zms		7544	ERR	Backtrace 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390) [0x7f2a05c87390]	zm_signal.cpp	102
2017-11-30 12:05:53.134165	zms		7544	ERR	Backtrace 0: /usr/lib/zoneminder/cgi-bin/nph-zms(_Z14zm_die_handleriP9siginfo_tPv+0x78) [0x555b9d98dfa8]	zm_signal.cpp	102
2017-11-30 12:05:53.111415	zms		7544	ERR	Signal address is (nil), from 0x555b9d93ad9d	zm_signal.cpp	81
2017-11-30 12:05:53.090831	zms		7544	ERR	Got signal 11 (Segmentation fault), crashing	zm_signal.cpp	50
2017-11-30 12:05:53.070439	zms		7544	ERR	Got empty memory map file size 0, is the zmc process for this monitor running?	zm_monitor.cpp	533
2017-11-30 12:05:53.018681	web_js		891	ERR	getStreamCmdResponse stream error: socket_sendto( /var/run/zm/zms-289569s.sock ) failed: Connection refused - checkStreamForErrors()	?view=watch	
2017-11-30 12:05:52.869227	web_php		891	ERR	socket_sendto( /var/run/zm/zms-289569s.sock ) failed: Connection refused	/usr/share/zoneminder/www/includes/functions.php	2033
2017-11-30 12:05:52.499412	zmu		7541	ERR	Can't connect to capture daemon: 1 FrontDoor	zmu.cpp	495
2017-11-30 12:05:52.471618	zmu		7541	ERR	Got empty memory map file size 0, is the zmc process for this monitor running?	zm_monitor.cpp	533
2017-11-30 12:05:52.132052	zms		7535	ERR	Backtrace 6: /usr/lib/zoneminder/cgi-bin/nph-zms(_start+0x29) [0x5598f582bf69]	zm_signal.cpp	102
2017-11-30 12:05:52.109499	zms		7535	ERR	Backtrace 5: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fc9e58d0830]	zm_signal.cpp	102
2017-11-30 12:05:52.086905	zms		7535	ERR	Backtrace 4: /usr/lib/zoneminder/cgi-bin/nph-zms(main+0xb9e) [0x5598f582b06e]	zm_signal.cpp	102
2017-11-30 12:05:52.064363	zms		7535	ERR	Backtrace 3: /usr/lib/zoneminder/cgi-bin/nph-zms(_ZN13MonitorStream9runStreamEv+0x48) [0x5598f5864a38]	zm_signal.cpp	102
2017-11-30 12:05:52.005551	zms		7535	ERR	Backtrace 2: /usr/lib/zoneminder/cgi-bin/nph-zms(_ZNK7Monitor6GetFPSEv+0xd) [0x5598f5857d9d]	zm_signal.cpp	102
2017-11-30 12:05:51.983041	zms		7535	ERR	Backtrace 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390) [0x7fc9e91df390]	zm_signal.cpp	102
2017-11-30 12:05:51.960663	zms		7535	ERR	Backtrace 0: /usr/lib/zoneminder/cgi-bin/nph-zms(_Z14zm_die_handleriP9siginfo_tPv+0x78) [0x5598f58aafa8]	zm_signal.cpp	102
2017-11-30 12:05:51.937926	zms		7535	ERR	Signal address is (nil), from 0x5598f5857d9d	zm_signal.cpp	81
2017-11-30 12:05:51.915527	zms		7535	ERR	Got signal 11 (Segmentation fault), crashing	zm_signal.cpp	50
2017-11-30 12:05:51.895382	zms		7535	ERR	Got empty memory map file size 0, is the zmc process for this monitor running?	zm_monitor.cpp	533
2017-11-30 12:05:51.839829	web_js		4727	ERR	getStreamCmdResponse stream error: socket_sendto( /var/run/zm/zms-289569s.sock ) failed: Connection refused - checkStreamForErrors()	?view=watch	
2017-11-30 12:05:51.734575	web_php		4727	ERR	socket_sendto( /var/run/zm/zms-289569s.sock ) failed: Connection refused	/usr/share/zoneminder/www/includes/functions.php	2033
2017-11-30 12:05:51.002984	zms		7534	ERR	Backtrace 6: /usr/lib/zoneminder/cgi-bin/nph-zms(_start+0x29) [0x563c721f4f69]	zm_signal.cpp	102
2017-11-30 12:05:50.980427	zms		7534	ERR	Backtrace 5: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f8c02f7e830]	zm_signal.cpp	102
2017-11-30 12:05:50.957884	zms		7534	ERR	Backtrace 4: /usr/lib/zoneminder/cgi-bin/nph-zms(main+0xb9e) [0x563c721f406e]	zm_signal.cpp	102
2017-11-30 12:05:50.937853	zms		7534	ERR	Backtrace 3: /usr/lib/zoneminder/cgi-bin/nph-zms(_ZN13MonitorStream9runStreamEv+0x48) [0x563c7222da38]	zm_signal.cpp	102
2017-11-30 12:05:50.914449	zms		7534	ERR	Backtrace 2: /usr/lib/zoneminder/cgi-bin/nph-zms(_ZNK7Monitor6GetFPSEv+0xd) [0x563c72220d9d]	zm_signal.cpp	102
2017-11-30 12:05:50.891135	zms		7534	ERR	Backtrace 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390) [0x7f8c0688d390]	zm_signal.cpp	102
2017-11-30 12:05:50.868010	zms		7534	ERR	Backtrace 0: /usr/lib/zoneminder/cgi-bin/nph-zms(_Z14zm_die_handleriP9siginfo_tPv+0x78) [0x563c72273fa8]	zm_signal.cpp	102
2017-11-30 12:05:50.844504	zms		7534	ERR	Signal address is (nil), from 0x563c72220d9d	zm_signal.cpp	81
2017-11-30 12:05:50.821262	zms		7534	ERR	Got signal 11 (Segmentation fault), crashing	zm_signal.cpp	50
2017-11-30 12:05:50.797370	zms		7534	ERR	Got empty memory map file size 0, is the zmc process for this monitor running?	zm_monitor.cpp	533
2017-11-30 12:05:50.743450	web_js		6695	ERR	getStreamCmdResponse stream error: socket_bind( /var/run/zm/zms-289569w.sock ) failed: Address already in use - checkStreamForErrors()	?view=watch	
2017-11-30 12:05:50.605043	web_php		6695	ERR	socket_bind( /var/run/zm/zms-289569w.sock ) failed: Address already in use	/usr/share/zoneminder/www/includes/functions.php	2033
2017-11-30 12:05:48.691500	zmdc		2557	ERR	'zmc -m 1' exited abnormally, exit status 255	zmdc.pl	
2017-11-30 12:05:47.842617	zma_m1		7399	WAR	Signal: Lost	zm_monitor.cpp	1429
2017-11-30 12:05:47.837223	zmc_m1		7248	ERR	Failed to pre-capture monitor 1 -678355692 (1/1)	zmc.cpp	312
2017-11-30 12:05:44.826202	zmc_m1		7253	ERR	RTP timed out	zm_rtp_data.cpp	93
bbunge
Posts: 2934
Joined: Mon Mar 26, 2012 11:40 am
Location: Pennsylvania

Post by bbunge »

Have you tried forcing the cameras to use RTSP over TCP? The messages about incomplete frames lead me to believe UDP is not getting all the data from the cameras to ZM. You might also lower the frame rate to 5 FPS and adjust the key frame interval so a full frame is sent more often.
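One quick way to test the TCP suggestion outside ZoneMinder is to probe the stream with FFmpeg's `ffprobe`, forcing the RTSP transport to TCP with its `-rtsp_transport tcp` option. The sketch below builds and runs that command from Python; it assumes `ffprobe` is installed in the container, and the camera URL is a placeholder. It also shows the key frame arithmetic: the key frame interval (in frames) divided by the frame rate gives the time between full frames, which roughly bounds how long a stream may stay corrupted after a dropped packet.

```python
import subprocess
from typing import List


def ffprobe_tcp_cmd(url: str) -> List[str]:
    """Build an ffprobe command that forces RTSP over TCP instead of UDP."""
    return [
        "ffprobe",
        "-v", "error",             # only print errors
        "-rtsp_transport", "tcp",  # interleave RTP inside the RTSP TCP connection
        "-show_entries", "stream=codec_name,width,height,avg_frame_rate",
        "-of", "default=noprint_wrappers=1",
        url,
    ]


def probe(url: str, timeout_s: float = 15.0) -> str:
    """Run the probe; raises CalledProcessError or TimeoutExpired on failure."""
    result = subprocess.run(ffprobe_tcp_cmd(url), capture_output=True,
                            text=True, timeout=timeout_s, check=True)
    return result.stdout


def seconds_between_keyframes(fps: float, keyframe_interval_frames: int) -> float:
    """Time between full (I) frames for a given frame rate and interval."""
    return keyframe_interval_frames / fps


if __name__ == "__main__":
    # Placeholder address and credentials; substitute the camera's real RTSP URL.
    print(probe("rtsp://user:password@192.0.2.10:554/stream1"))
    # Hypothetical settings: 5 fps with a key frame every 10 frames.
    print(seconds_between_keyframes(5, 10))
```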

While I am not a fan of using a VM to run ZoneMinder, it seems you have the power to run it several times over. For troubleshooting, it might be best to go back to your "bare metal" machine and run the database and storage locally. When you want to go back to remote storage, give the systemd mount process a try (see the wiki for a how-to), as it can keep ZM from running if the remote storage fails to mount. It has been reported that an SMB mount may work more reliably than NFS.
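The idea behind the systemd mount approach is that ZoneMinder should not start writing events unless the remote storage is actually mounted. The same guard can be sketched in Python with `os.path.ismount`; this is an illustrative pre-start check, not the wiki's procedure, and the default events path below is an assumption to be replaced with whatever directory your install actually uses. Inside an LXC container with a bind mount, the check should still succeed when the bind mount is in place, though that is worth verifying.

```python
import os
import sys


def storage_ready(path: str) -> bool:
    """True if `path` is a mount point (i.e. the remote storage is attached)."""
    return os.path.ismount(path)


def main(path: str) -> int:
    """Return a shell-style exit code: 0 if mounted, 1 if not."""
    if not storage_ready(path):
        print(f"{path} is not mounted; refusing to start ZoneMinder",
              file=sys.stderr)
        return 1
    print(f"{path} is mounted")
    return 0


if __name__ == "__main__":
    # Assumed events directory; pass the real path as the first argument.
    default = "/var/cache/zoneminder/events"
    sys.exit(main(sys.argv[1] if len(sys.argv) > 1 else default))
```

A check like this could run as an `ExecStartPre=` step in a systemd override for the ZoneMinder service, so a failed check blocks startup.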
calmor15014

Post by calmor15014 »

Thanks for your comments. I did run RTSP over TCP, with no improvement, unfortunately. I will try the frame rate adjustment for troubleshooting purposes. I'm not sure why zms is segfaulting or why the frame rate would be an issue.

While I was on bare metal, it worked well with the systemd mount process (ZM waiting for the NFS mount), though the database was local. I never had an issue there once I got the mount sequence down. In the virtual environment, the host takes care of the NFS mount and ZM is none the wiser; LXC bind-mounts the storage location into the container.

I know the VM/container approach is typically not preferred, nor is it officially supported, but if I can make it work, it would be far better than having the cameras go down every time Comcast makes a change that MythTV doesn't like and it takes the system down.
calmor15014

Post by calmor15014 »

Actually, now that I think about it, I tried FFmpeg RTSP over TCP, but not native RTSP over TCP. How is that enabled? I tried all of the camera options in the web frontend. Possibly a path switch?
calmor15014

Post by calmor15014 »

So I'm traveling, but I have been analyzing the logs with just one monitor on Modect, and noticed that I get an RTP timeout from zmc_m1 almost exactly every five minutes.

They are happening at xx:01:08, xx:06:08, and so on, and they never vary by more than +/- 3 seconds. Even when one doesn't occur, the next one happens on the same schedule.
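A pattern like this is easy to confirm numerically: take the timestamps of the "RTP timed out" entries, compute the gaps between consecutive ones, and check how far each gap is from a whole number of 300-second periods. A skipped timeout then shows up as a gap of about two periods rather than breaking the pattern. The timestamps in the usage example are illustrative, not taken from the actual logs, and the function names are hypothetical.

```python
from datetime import datetime
from typing import List, Tuple

FMT = "%Y-%m-%d %H:%M:%S.%f"  # timestamp format used in the ZM log excerpts


def gaps_seconds(timestamps: List[str]) -> List[float]:
    """Seconds between consecutive log timestamps (assumed sorted ascending)."""
    times = [datetime.strptime(t, FMT) for t in timestamps]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


def period_fit(gaps: List[float], period: float = 300.0) -> List[Tuple[int, float]]:
    """For each gap, return (whole periods, deviation in seconds).

    A skipped event appears as 2 periods with a small deviation.
    """
    result = []
    for g in gaps:
        n = max(1, round(g / period))
        result.append((n, g - n * period))
    return result


if __name__ == "__main__":
    # Illustrative timestamps following the xx:01:08 / xx:06:08 pattern.
    ts = [
        "2017-12-05 10:01:08.120000",
        "2017-12-05 10:06:09.400000",
        "2017-12-05 10:16:07.900000",  # the ~10:11 timeout was skipped
    ]
    for n, dev in period_fit(gaps_seconds(ts)):
        print(f"{n} period(s), off by {dev:+.1f} s")
```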

I even adjusted the path, and it has kept the same schedule.

This tells me it's something in zmc or zmdc more so than the camera settings. zmdc is set to check for failures every 6000 seconds, so that doesn't seem to be it. Filters are set to report every 300 seconds, but there are no filter errors.
calmor15014

Post by calmor15014 »

Further investigation:

I dropped one monitor down to 12fps and am still having the exactly-5-minutes error. I also tried two monitors, and both logged the RTP timeout at 5-minute intervals, but on different minutes. Perhaps it is related to the camera itself, but I'm not sure what it would be. I haven't looked at the RTSP code to see what causes that event.

In VLC, there are a number of warnings about slow images on the RTSP feed, but 30 minutes of watching it didn't produce any major errors or any timeouts.

They are Hikvision cameras and could stand to be updated (the firmware is two versions behind and would be exploitable if they had internet access), but I didn't have issues with the old install. I'll try updating one once I can spin up a Windows VM that will run the antiquated web plugin required to upgrade the firmware (!)
calmor15014

Post by calmor15014 »

I still haven't been able to update the cameras, but I disabled three and dropped the remaining one to 4fps just to see if that would help. I still get the "RTP timed out" error exactly every 5 minutes.

I enabled debug logs and didn't notice anything special, though I didn't see the level of logging that the source code seemed to indicate I should. I did note that the source hard-fails if there is no source ("No RTSP sources"), and perhaps a retry loop could get past a one-off failure, but I don't know enough about the code to know why it was written that way.
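For illustration, the retry idea might look something like this: instead of failing hard on the first error, retry a few times with an increasing delay before giving up. This is a generic Python sketch with a hypothetical `flaky_connect` callable standing in for the RTSP setup; it is not ZoneMinder's code, and whether a retry is safe at that point in `zm_remote_camera_rtsp.cpp` would need checking against the actual source.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(op: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `op`, retrying on exception with exponential backoff.

    Delays are base_delay, 2*base_delay, 4*base_delay, ... between attempts.
    The last exception is re-raised if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for i in range(attempts):
        try:
            return op()
        except Exception:
            if i == attempts - 1:
                raise
            sleep(base_delay * (2 ** i))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_connect() -> str:
        # Hypothetical stand-in for opening the RTSP session.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("No RTSP sources")
        return "connected"

    print(retry(flaky_connect, attempts=5, base_delay=0.1))
```

Passing `sleep` as a parameter is only there so the backoff can be tested without actually waiting.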

The only interesting thing I noticed in the logs is that the RTP address eventually drops the username and password from the camera URL. I'm assuming this is normal, but maybe it isn't, and perhaps a login timeout is what's causing the 5-minute syndrome.
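To make the observation concrete, this is what stripping the user:password part from an RTSP URL looks like, using Python's standard `urllib.parse`; whether ZM does this deliberately (for logging, say) or as part of session setup isn't clear from the logs. The function name and URLs are made-up examples.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_credentials(url: str) -> str:
    """Return `url` with any user:password@ portion removed."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:  # IPv6 literal: restore the brackets urlsplit removed
        host = f"[{host}]"
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query,
                       parts.fragment))


if __name__ == "__main__":
    print(strip_credentials("rtsp://admin:secret@192.0.2.10:554/stream1"))
```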

Another consideration is that the LXC container runs Ubuntu while the Proxmox host is Debian. I may set up a Debian container and try the install again, and check the camera settings for any login timeout setting (or even try disabling authentication temporarily for RTSP viewing). If that fails, I may build from source and try to fix the timeout.
calmor15014

Post by calmor15014 »

Another clue: I set debug down to level DB4 and am getting more information, as expected. One thing that got me thinking was a log entry by zmwatch, followed by a new zmc process starting:

Code:

2017-12-12 17:00:26.102790	zmwatch		6299	INF	Restarting capture daemon for Approach, shared data not valid	zmwatch.pl
2017-12-12 17:00:46.515458	zmc_m3		6414	INF	Starting Capture version 1.30.4	zmc.cpp	247
2017-12-12 17:00:47.122930	zmc_m3		6425	DB2	Starting data thread 1952721004 on port 40200	zm_rtp_data.cpp	66
2017-12-12 17:01:14.311402	zma_m3		6431	INF	In mode 3/1, warming up	zma.cpp	142

This also triggers every 5 minutes. I believe this might be what's behind the issues.

I haven't had any luck searching for what "shared data not valid" means, and I don't know what might be affecting the shared data.
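Given the earlier zmwatch errors ("Memory map file ... should have been 896 but was instead 0" and "... does not exist"), one concrete thing to watch is the state of the `/dev/shm/zm.mmap.N` files themselves. The sketch below only approximates that kind of check from outside (file present, non-empty); it does not reproduce zmwatch's actual validity test, which presumably also inspects the shared data contents. Leaving it running across the xx:01 and xx:06 marks could show whether the file is emptied or removed just before the restart. The function names are hypothetical.

```python
import os
import time
from datetime import datetime
from typing import Optional


def mmap_size(monitor_id: int, shm_dir: str = "/dev/shm") -> Optional[int]:
    """Size in bytes of ZM's memory-map file for a monitor, or None if missing."""
    path = os.path.join(shm_dir, f"zm.mmap.{monitor_id}")
    try:
        return os.path.getsize(path)
    except FileNotFoundError:
        return None


def describe(size: Optional[int]) -> str:
    """Human-readable state of the memory-map file."""
    if size is None:
        return "missing"
    if size == 0:
        return "empty"
    return f"{size} bytes"


if __name__ == "__main__":
    last = None
    while True:
        state = describe(mmap_size(1))
        if state != last:  # only log changes, with a timestamp
            print(f"{datetime.now():%H:%M:%S} zm.mmap.1: {state}", flush=True)
            last = state
        time.sleep(1)
```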
calmor15014

Post by calmor15014 »

I installed ZM in a full VM to test whether that would change anything. All four cameras are running normally (three at 15fps, one still testing at 4fps) with no issues at all. It's the same host handling network and memory management duties, the same NFS-mounted event and image storage, and the same "remote" SQL server (on the same host as the LXC container/VM). Since the SQL server is the same, all of the settings were also the same.

I gave the VM 6 cores and 12GB of RAM, which seems to be overkill. Load is 1.84 and shm is 26% (a hair lower than LXC and bare metal due to the one camera running slower).

So... it would seem that something about Proxmox, or LXC, doesn't play nicely with ZM 1.30.4 and its handling of shared memory? Is it worth filing a bug report?