VMware "best practices" for 2020
Posted: Wed Mar 25, 2020 4:10 pm
Hi there,
I'm not certain this is the appropriate forum to post this on, so if not, mods please feel free to move it to the appropriate forum.
For a little background, I'm a Systems Engineer with 20+ years in the IT industry. My background includes small and medium-sized networks all the way up to large-scale data centers with high availability and redundant power, HVAC, and internet connections. My first distro of Linux was the original Red Hat v4 in the 90's, and I've stayed current on Windows, Linux (multiple distros), AIX, and (in the day) NetWare. So to summarize, I'm no spring chicken when it comes to Windows, Linux, Firewalling, Switching, Networking, Video Management Systems, H.264 RTSP streams, PBXes and Voice over IP, etc.
What I'm looking for is a "best practices" guide from someone who is very knowledgeable on ZoneMinder's resource needs when it relates to virtualized environments. In my case, this means VMware ESXi Hypervisors, but this could also apply to any Hypervisor software such as KVM or others.
I've got a few ZoneMinder physical servers set up, and they are generally rock solid. No issues with lock ups.
But - I've got a small ZoneMinder virtual server set up on an HP ProLiant ML350 G6 with 96GB of RAM. This was set up for research purposes only in my company's R&D facility. I can't really go into the details of what I'm specifically doing, but ZoneMinder is one VMS I'm working with, and there are 2-3 others I work with regularly.
At that location I started out with a dedicated Ubuntu 18.04.3 LTS virtual server with 4 vCPUs, 12GB of RAM, and a 200GB Thick Provisioned, Lazily Zeroed hard drive. It's running the distro's open-vm-tools and is fully updated. I set up 4 cameras on it using default settings. Within 6 hours, the server hard locks up: no kernel panic, just no responsiveness at all. VMware indicates that the vm-tools have stopped responding. If I hard reset the virtual server, it will run for another 4-6 hours and then hard lock up again.
Thinking this was a fluke instance, I powered this server down. I then took a known stable Ubuntu 18.04.4 LTS server that has been running some of my company's analysis software for months. After making a snapshot backup, I bumped the virtual server resources up to 4 vCPUs, 24 GB of RAM, and 600 GB of Thick Provisioned, Lazily Zeroed hard drive space. I locked all the RAM in place so that nothing else can try to share it. This means that when the VM is powered on, it reserves 24 GB of RAM regardless of what it is actually using, and no other VM on the server can touch that RAM. After installing ZoneMinder from the distro and per the ZoneMinder website, I put one single camera on it, a Bosch MIC IP Starlight 7010.
It locked up within 2 hours.
After a few of these lockups/hard resets, I worked to optimize the virtual machine. I ran the mysqltuner optimization tool and followed its recommendations. My frame analysis is down to 5 FPS, per recommendations. I'm monitoring only, not recording at this point. I am running the camera at 1920x1080 resolution, but that should be expected as this is a $10k camera and people expect to get $10k worth of performance out of it. I've either met or exceeded the recommendations for memory allocation for MySQL and PHP. The system still locks up.
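To put rough numbers on that, here is a back-of-the-envelope sketch of the raw frame data a single 1920x1080 monitor at 5 FPS implies. The 4 bytes per pixel and the 50-frame buffer depth are assumptions for illustration only, not values taken from the configuration described here.

```python
# Rough raw-frame arithmetic for one monitor. Values marked "assumption"
# are illustrative and do not come from the setup described in this post.
WIDTH, HEIGHT = 1920, 1080   # camera resolution from the post
ANALYSIS_FPS = 5             # analysis frame rate from the post
BYTES_PER_PIXEL = 4          # assumption: 32-bit colour (3 for 24-bit, 1 for greyscale)
BUFFER_FRAMES = 50           # assumption: hypothetical in-memory frame buffer depth


def frame_bytes(width, height, bytes_per_pixel):
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


def to_mib(n_bytes):
    """Convert a byte count to mebibytes."""
    return n_bytes / (1024 * 1024)


one_frame = frame_bytes(WIDTH, HEIGHT, BYTES_PER_PIXEL)
per_second = one_frame * ANALYSIS_FPS
buffered = one_frame * BUFFER_FRAMES

print(f"one raw frame:     {to_mib(one_frame):8.2f} MiB")
print(f"analysed per sec:  {to_mib(per_second):8.2f} MiB/s")
print(f"{BUFFER_FRAMES}-frame buffer:   {to_mib(buffered):8.2f} MiB")
```

Even with generous assumptions, the totals come to a few hundred MiB, which is small next to the 24 GB reserved for the VM.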
I've run top looking for a smoking gun, and zms is the only thing I see that could be a likely culprit. CPU and memory resources are well within tolerance - in fact I'd go so far as to say they are ridiculously under-utilized.
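Because top's output is gone once the VM hard locks, one option is to append a few readings to a file and force them to disk, so the last samples survive the reset. This is a minimal sketch, assuming a Linux guest with /proc mounted. The log path, the chosen fields, and the function names are all hypothetical.

```python
import os
import time


def parse_meminfo(text):
    """Return {field: value in kB} parsed from /proc/meminfo-style text."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def format_line(timestamp, load1, mem):
    """Build one log line from a timestamp, 1-minute load, and meminfo dict."""
    return "{} load1={} MemAvailable={}kB SwapFree={}kB".format(
        timestamp, load1, mem.get("MemAvailable"), mem.get("SwapFree"))


def snapshot():
    """Read the current load average and memory figures from /proc."""
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    with open("/proc/loadavg") as f:
        load1 = f.read().split()[0]
    return format_line(time.strftime("%Y-%m-%d %H:%M:%S"), load1, mem)


def log_samples(path, interval=10, count=None):
    """Append a snapshot every `interval` seconds; count=None runs until killed."""
    written = 0
    with open(path, "a") as log:
        while count is None or written < count:
            log.write(snapshot() + "\n")
            log.flush()
            os.fsync(log.fileno())  # push the line to disk before a possible lockup
            written += 1
            time.sleep(interval)
```

Calling `log_samples("/var/tmp/zm_watch.log")` (a hypothetical path) in a background session before starting the service would leave a trail up to the last sample written before the freeze.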
If I disable ZoneMinder (systemctl disable zoneminder) and reboot the server, it stays up and running for days (so far, I'm at 5+ days). I have no doubt it will run for weeks or months without any human intervention based on past performance of the VM without ZoneMinder loaded on it.
If I enable ZoneMinder, even just to start the service (systemctl start zoneminder), the system locks up within 4 hours.
So, I'm now at the point where I think I need to ask the ZoneMinder gurus and people that are running ZM in virtualized environments -- what am I doing wrong, and what should I be doing to get it right?
Thanks, in advance, for all of your advice and help.
-Anomaly0617 (Paul)