VMware "best practices" for 2020

anomaly0617
Posts: 1
Joined: Wed Mar 25, 2020 3:32 pm


Post by anomaly0617 » Wed Mar 25, 2020 4:10 pm

Hi there,

I'm not certain this is the right forum for this post, so if it isn't, mods, please feel free to move it.

For a little background, I'm a Systems Engineer with 20+ years in the IT industry. My background ranges from small and medium-sized networks all the way up to large-scale data centers with high availability and redundant power, HVAC, and internet connections. My first Linux distro was the original Red Hat v4 in the '90s, and I've stayed current on Windows, Linux (multiple distros), AIX, and (back in the day) NetWare. So, to summarize, I'm no spring chicken when it comes to Windows, Linux, firewalling, switching, networking, video management systems, H.264 RTSP streams, PBXes and Voice over IP, etc.

What I'm looking for is a "best practices" guide from someone who is very knowledgeable about ZoneMinder's resource needs in virtualized environments. In my case, this means VMware ESXi hypervisors, but it could also apply to any other hypervisor, such as KVM.

I've got a few physical ZoneMinder servers set up, and they are generally rock solid, with no lockups.

But I've also got a small ZoneMinder virtual server set up on an HP ProLiant ML350 G6 with 96 GB of RAM. It was set up for research purposes only, in my company's R&D facility. I can't go into the details of what I'm specifically doing, but ZoneMinder is one VMS I'm working with, and there are 2-3 others I work with regularly.

At that location I started out with a dedicated Ubuntu 18.04.3 LTS virtual server with 4 vCPUs, 12 GB of RAM, and a 200 GB thick-provisioned, lazily zeroed virtual disk. It's running the distro's open-vm-tools and is fully updated. I set up 4 cameras on it using default settings. Within 6 hours, the server hard locks up: no kernel panic, just no responsiveness at all. VMware reports that VMware Tools has stopped responding. If I hard-reset the virtual server, it runs for another 4-6 hours and then hard locks up again.

Thinking this was a fluke, I powered that server down. I then took a known-stable Ubuntu 18.04.4 LTS server that has been running some of my company's analysis software for months. After taking a snapshot backup, I bumped its resources up to 4 vCPUs, 24 GB of RAM, and 600 GB of thick-provisioned, lazily zeroed disk space. I reserved all of its RAM so that nothing else can share it: when the VM is powered on, it reserves 24 GB of RAM regardless of what it is actually using, and no other VM on the host can touch that RAM. After installing ZoneMinder from the distro packages, following the instructions on the ZoneMinder website, I added one single camera, a Bosch MIC IP Starlight 7010.

It locked up within 2 hours.

After a few of these lockups and hard resets, I worked on optimizing the virtual machine. I ran the mysqltuner optimization tool and followed its recommendations. Per the recommendations, frame analysis is down to 5 FPS. I'm only monitoring at this point, not recording. I am running the camera at 1920x1080, but that is to be expected: this is a $10k camera, and people expect $10k worth of performance out of it. I've met or exceeded the recommended memory allocations for MySQL and PHP. The system still locks up.
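For a sense of scale, here is a rough back-of-the-envelope sketch of the raw (uncompressed) image ring buffer that, as I understand it, ZoneMinder keeps in shared memory for each monitor. It assumes 4 bytes per pixel (32-bit colour) and a 50-frame image buffer; both numbers are assumptions, so substitute the colour depth and buffer size actually configured on your monitor. Real usage adds per-frame and per-monitor overhead that this sketch ignores.

```python
# Rough estimate of the raw image ring buffer for one monitor.
# Assumptions (not taken from a real ZoneMinder config): 4 bytes per pixel
# and a 50-frame buffer. Overhead beyond the raw pixels is ignored.

def ring_buffer_bytes(width, height, bytes_per_pixel, buffer_frames):
    """Raw bytes needed to hold `buffer_frames` uncompressed images."""
    return width * height * bytes_per_pixel * buffer_frames

def to_mib(n_bytes):
    """Convert bytes to MiB (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)

if __name__ == "__main__":
    per_frame = ring_buffer_bytes(1920, 1080, 4, 1)
    total = ring_buffer_bytes(1920, 1080, 4, 50)
    print("one frame: {:.1f} MiB".format(to_mib(per_frame)))
    print("50 frames: {:.1f} MiB".format(to_mib(total)))
```

Under those assumptions this prints about 7.9 MiB per frame and about 395.5 MiB for 50 frames, so the raw frame buffer for a single 1080p camera is small next to 24 GB of RAM.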

I've run top looking for a smoking gun, and zms is the only process I see that could be a likely culprit. CPU and memory usage are well within tolerance; in fact, I'd go so far as to say they are ridiculously under-utilized.
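Because the guest hard locks rather than panicking, top only shows what is happening while someone is watching. One way to see what the system looked like just before a freeze is a small sampler that appends the load average and a few /proc/meminfo fields to a log every 30 seconds, calling fsync after each line so the last sample has the best chance of surviving the hard reset. This is a minimal sketch, not a ZoneMinder tool; the script name, log path, interval, and the chosen fields (MemAvailable, Shmem, SwapFree) are my own assumptions.

```python
import os
import sys
import time
from datetime import datetime, timezone

# /proc/meminfo fields to record (values there are in kB). Shmem counts tmpfs
# pages such as /dev/shm, where ZoneMinder's mmap buffers live by default
# (assumed default; check your ZM_PATH_MAP setting).
MEMINFO_KEYS = ("MemAvailable", "Shmem", "SwapFree")

def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)

def parse_meminfo(text, keys=MEMINFO_KEYS):
    """Return {field: kB} for the requested /proc/meminfo fields."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name in keys:
            values[name] = int(rest.split()[0])
    return values

def format_sample(load, mem, now=None):
    """Build one log line: UTC timestamp, load averages, memory fields."""
    now = now or datetime.now(timezone.utc)
    mem_part = " ".join("{}={}kB".format(k, v) for k, v in sorted(mem.items()))
    return "{} load={:.2f},{:.2f},{:.2f} {}".format(
        now.isoformat(timespec="seconds"), load[0], load[1], load[2], mem_part)

def read(path):
    with open(path) as f:
        return f.read()

def run(log_path, interval=30):
    """Append one sample per interval until interrupted."""
    while True:
        line = format_sample(parse_loadavg(read("/proc/loadavg")),
                             parse_meminfo(read("/proc/meminfo")))
        with open(log_path, "a") as log:
            log.write(line + "\n")
            log.flush()
            os.fsync(log.fileno())  # push the line to disk before sleeping
        time.sleep(interval)

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: lockup_watch.py /path/to/logfile")
    else:
        run(sys.argv[1])
```

Run it with something like `python3 lockup_watch.py /var/log/lockup-watch.log` (hypothetical path) in a second SSH session or under a service manager; after the next hard reset, the tail of the log shows the trend leading up to the freeze.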

If I disable ZoneMinder (systemctl disable zoneminder) and reboot the server, it stays up and running for days (5+ days so far). Based on this VM's past performance without ZoneMinder, I have no doubt it will run for weeks or months without any human intervention.

If I enable ZoneMinder, or even just start the service (systemctl start zoneminder), the system locks up within 4 hours.

So I'm now at the point where I need to ask the ZoneMinder gurus and the people running ZM in virtualized environments: what am I doing wrong, and what should I be doing to get it right?

Thanks in advance for all of your advice and help.
-Anomaly0617 (Paul)

ABigHead
Posts: 1
Joined: Fri Mar 27, 2020 1:27 pm

Re: VMware "best practices" for 2020

Post by ABigHead » Fri Mar 27, 2020 1:46 pm

I hope that my reply is more than anecdotal for you, but YMMV.

I'm currently running ZoneMinder 1.34.7 on Ubuntu 18.04 LTS on ESXi 6.7U3. It's been running with no issues on really old hardware, in a VM with 16 GB of RAM assigned to it. I'm assuming that you take a snapshot, fire up the VM, and run it for X amount of time, and then it freezes up on its own, correct? If that is the case, it is likely caused by the snapshot.

As far as I understand it, snapshots are meant to be temporary: you take a snapshot, update packages, try new configurations, etc., and if you blow something up, you roll it back. If your changes don't break anything, you then go back into the ESXi interface and use 'Delete All' to remove and consolidate all of your snapshots. This sounds counter-intuitive, but (this is my personal understanding) deleting them merges the current state of the VM back into the original disk. THIS CAN TAKE A WHILE, as it keeps all the newest changes from your latest snapshot and merges them down into the original capture.

When I have forgotten to do this, my system would lock up after anywhere from a few hours to a day of running, and I'd have to kill it from the ESXi web interface, as the VM would become almost, if not completely, unresponsive. My understanding is that when you create a snapshot, ESXi creates a 'delta disk', a separate virtual disk that holds all the changes you're making. If you don't 'Delete All' to consolidate your snapshots once your changes are done, this delta disk fills up and freezes your OS/VM.

Here is the link to the VMware docs explaining how the delete process works; read the whole thing for a better explanation of the above:
https://docs.vmware.com/en/VMware-vSphe ... 1AE25.html
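To check whether a VM is still running on snapshot delta disks, you can look in the VM's folder on the datastore. The sketch below lists files that match the usual snapshot naming, for example `zm-000001-delta.vmdk` (VMFSsparse) or `zm-000001-sesparse.vmdk` (SEsparse), along with their sizes. The naming pattern, the hypothetical `zm` VM name, and where you run it (for example the ESXi shell, assuming a Python 3 interpreter is available there) are assumptions on my part; the snapshot list in the ESXi interface remains the authoritative view.

```python
import os
import re
import sys

# Assumed naming for snapshot delta extents: "<vm>-NNNNNN-delta.vmdk" (VMFSsparse)
# or "<vm>-NNNNNN-sesparse.vmdk" (SEsparse). Base extents ("<vm>-flat.vmdk") and
# small descriptor files ("<vm>-000001.vmdk") do not match.
DELTA_PATTERN = re.compile(r"-\d{6}-(delta|sesparse)\.vmdk$")

def is_delta_disk(filename):
    """True if the filename looks like a snapshot delta extent."""
    return DELTA_PATTERN.search(filename) is not None

def find_delta_disks(vm_dir):
    """Return (filename, size_in_bytes) for each snapshot delta extent in vm_dir."""
    return [(name, os.path.getsize(os.path.join(vm_dir, name)))
            for name in sorted(os.listdir(vm_dir)) if is_delta_disk(name)]

def report(vm_dir):
    """Print each delta file and the combined size."""
    deltas = find_delta_disks(vm_dir)
    if not deltas:
        print("{}: no snapshot delta disks found".format(vm_dir))
        return
    for name, size in deltas:
        print("{}: {:.2f} GiB".format(name, size / 2**30))
    total = sum(size for _, size in deltas)
    print("{} delta file(s), {:.2f} GiB total".format(len(deltas), total / 2**30))

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: find_deltas.py /vmfs/volumes/<datastore>/<vm-folder>")
    else:
        report(sys.argv[1])
```

For example, `python3 find_deltas.py /vmfs/volumes/datastore1/zm` (hypothetical datastore and folder names). If delta files are still listed long after you finished testing, that VM is still writing all of its changes into snapshot deltas.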
