ZM Corrupting VM+ Host

Forum for questions and support relating to the 1.30.x releases only.
StarMonkey
Posts: 7
Joined: Tue Oct 03, 2017 8:31 pm

ZM Corrupting VM+ Host

Post by StarMonkey »

Hi, weird problem with ZM. I originally had this problem around 6 months ago. I couldn't figure it out, so I went on to other things. Now I want to get ZM up and running again and was hoping something got fixed while I was away.

I'm running ZM 1.30 from the iconnor/zoneminder PPA on Ubuntu Server 16.04 64-bit. This is running inside a VM on ESXi 6.5. The host is an HP Microserver N36L with 4GB RAM and several TB of HDD.

Installed ZM following http://zoneminder.readthedocs.io/en/lat ... untu-16-04

The install went smoothly and configuration of the monitors was fine. Originally I was getting a lot of errors in the ZM web log about dropped frames, but switching to ffmpeg fixed this.

Cameras are 2 IP cams, both at 720p. Load was around 0.4-0.7 depending on motion.

After several days of running with no problems, when I try to visit the ZM web front end I get: "Unable to connect to ZM db.SQLSTATE[HY000] [2002] No such file or directory".

Ubuntu is showing that the SQL service has failed.

On rebooting, the VM basically eats itself and crashes the host. I then cannot recover the VM, so I can't get any log output. Restoring the VM from a snapshot fixes it until the same thing happens again.

This same error was happening with ZM 1.29 and with ESXi 6.0.

Any ideas? Troubleshooting steps? With no access to logs once the error happens I'm a bit stuck!

Thanks
knight-of-ni
Posts: 2404
Joined: Thu Oct 18, 2007 1:55 pm
Location: Shiloh, IL

Re: ZM Corrupting VM+ Host

Post by knight-of-ni »

Your SQL server is crashing, so check your logs under /var/log, including but not limited to /var/log/mysql, to figure out why.

This might provide a clue:

systemctl -l status mysql

While the system is running normally, use mysqltuner to verify that your SQL config is set correctly.
Visit my blog for ZoneMinder related projects using the Raspberry Pi, Orange Pi, Odroid, and the ESP8266
All of these can be found at https://zoneminder.blogspot.com/
StarMonkey

Re: ZM Corrupting VM+ Host

Post by StarMonkey »

OK, so I restored the VM from a snapshot taken before I had any monitors connected and left it for a couple of days. It crashed again with no monitors running.

I'm seeing a lot of blk_update_request: I/O error, dev sda block (various block numbers) on the console as it crashes.

Output of systemctl -l status mysql is:
[Attachment: systemctl -l status mysql output.png]

I'm not seeing much wrong in the SQL log except for the timeouts (which I guess are from after the crash):
[Attachment: sql log.png]

Syslog has Buffer I/O errors. My filesystem has gone read-only, so I can't install ssh and grab the log files.

I'll restore from snapshot, install ssh and wait for the crash. Should be able to get the logs then.

SQL was configured per the instructions. Is there anything I might have done wrong that's fatal to mysql?
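
For anyone in the same position once the logs can be captured: a minimal sketch for counting kernel I/O errors per device, to see whether the errors are confined to one disk. It assumes the log lines look like the console message quoted in this post ("I/O error, dev sda ...") or the syslog "Buffer I/O error on dev ..." form. The function name count_io_errors is made up for this example.

```shell
# Count kernel I/O error lines per device from log text on stdin.
# Assumption: each relevant line contains "I/O error, dev <name>" or
# "I/O error on dev <name>", as in the messages quoted in this thread.
count_io_errors() {
    awk '
        /I\/O error(,| on) dev / {
            for (i = 1; i < NF; i++) {
                if ($i == "dev") {
                    dev = $(i + 1)
                    sub(/,$/, "", dev)   # strip a trailing comma, if any
                    count[dev]++
                    break
                }
            }
        }
        END { for (d in count) print d, count[d] }
    ' | LC_ALL=C sort
}

# Typical use on a live system (needs read access to the kernel log):
#   dmesg | count_io_errors
#   count_io_errors < /var/log/syslog
```

Reading from stdin means the same function works on dmesg output, a saved syslog, or text copied off the console.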
knight-of-ni

Re: ZM Corrupting VM+ Host

Post by knight-of-ni »

The additional data you've provided indicates the sql failure is just a symptom of the underlying problem.

You're almost there. Just Google the I/O error:
https://www.google.com/search?client=ub ... 8&oe=utf-8

You've either got a bad hard drive, or you've got some kind of bottleneck in your virtual environment that is causing reads/writes to time out.
StarMonkey

Re: ZM Corrupting VM+ Host

Post by StarMonkey »

Thanks for the help - SMART is reporting no problems with the physical drive.

I'm installing smartmontools to monitor.

It's a default ESXi install so there shouldn't be any bottlenecks. It's a 2TB physical disk with a 1.9TB virtual disk on it. The drive is only used for ZM and there are no other VMs running on the host at this point.

I'll see what I get from smartmontools.
StarMonkey

Re: ZM Corrupting VM+ Host

Post by StarMonkey »

Ah here's something I forgot though - smartmontools won't work as it's a virtual disk!

Checking through the ESXi CLI, the physical disk is reporting OK, and all disks in the server are reporting about the same values.

The output on first look appears to be bad (Read Error Count at 100 for example) but I'm not sure those are true values.


Parameter                     Value  Threshold  Worst
----------------------------  -----  ---------  -----
Health Status                 OK     N/A        N/A
Media Wearout Indicator       N/A    N/A        N/A
Write Error Count             76     0          76
Read Error Count              100    51         100
Power-on Hours                100    0          100
Power Cycle Count             100    0          100
Reallocated Sector Count      252    10         252
Raw Read Error Rate           100    0          100
Drive Temperature             N/A    N/A        N/A
Driver Rated Max Temperature  N/A    N/A        N/A
Write Sectors TOT Count       N/A    N/A        N/A
Read Sectors TOT Count        N/A    N/A        N/A
Initial Bad Block Count       N/A    N/A        N/A
mikb
Posts: 600
Joined: Mon Mar 25, 2013 12:34 pm

Re: ZM Corrupting VM+ Host

Post by mikb »

It's not a count, those are normalised values.

"Read Error Count at 100" = 100% of health, i.e. good. The worst it has ever been is 100% (good). And if it falls to 51% health, *that's* a warning. Well, a fail.

However "Write error count" at 76% is more interesting. What can't be writ?
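
To make that reading of the table mechanical, here is a rough sketch that compares each normalised value with its threshold. It assumes the table layout posted above (attribute name, then Value, Threshold and Worst as the last three columns) and the usual SMART convention that an attribute has failed once its value is at or below the threshold. The name smart_check is invented for the example.

```shell
# Read a SMART table on stdin and report every attribute that has numeric
# Value and Threshold columns. An attribute is flagged FAIL when its
# normalised value has dropped to or below its threshold.
# Assumption: Value, Threshold and Worst are the last three columns.
smart_check() {
    awk '
        NF >= 4 && $(NF-2) ~ /^[0-9]+$/ && $(NF-1) ~ /^[0-9]+$/ {
            name = $1
            for (i = 2; i <= NF - 3; i++) name = name " " $i
            value = $(NF-2) + 0
            threshold = $(NF-1) + 0
            status = (value <= threshold) ? "FAIL" : "ok"
            printf "%s: value=%d threshold=%d %s\n", name, value, threshold, status
        }
    '
}
```

Rows with N/A or OK in the Value column are skipped. With the numbers posted above, every numeric attribute is still above its threshold, so nothing is flagged, although Write Error Count at 76 has clearly moved from its starting point.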
StarMonkey

Re: ZM Corrupting VM+ Host

Post by StarMonkey »

Ah, ok thanks, that makes sense.

I'm unsure about the 76%. I was wondering if those were the errors from ZM attempting to write and then getting I/O errors.

I've pulled the disk, stuck it in a USB caddy and run Hard Disk Sentinel on it. It all comes back as OK but shows the disk has limited life left.

I'm now doing a vMotion from that disk to another one that I've tested and that has 100% life left, just in case. I'm pretty sure that last time I had this same error ZM was installed on a different drive.

The Microserver has a backplane so faulty wiring is unlikely.
mikb

Re: ZM Corrupting VM+ Host

Post by mikb »

Can you clarify if the SMART table you posted is for 1) a physical hard drive (spinning rust), 2) an SSD, or 3) for a virtual/notional pretend hard-drive that is the figment of a VM's imagination and is just a big file/chain of files on a 1) or 2)?

I'm suspicious of all the "N/A's" you posted, almost like "all that isn't supported, and we're emulating the rest ..." :) E.g. no idea about temperature?

An I/O error at the SMART level on a REAL hard drive is a failure in the drive. An I/O error in the SMART of an SSD may be running out of write cycles, in fact, that might be what the counter means: Starts at 100% and descends to 0% as the SSD is written to death. Both of these are at the hardware level.

A write I/O error on a "virtual hard drive" in a VM may mean a failure to write to the file *pretending* to be the disk image. E.g. disk full, or file system corruption -- not necessarily a hardware fault.
StarMonkey

Re: ZM Corrupting VM+ Host

Post by StarMonkey »

These are actual spinning disks. I've run them through Hard Disk Sentinel and the disk that the ZoneMinder VM is installed on is OK. All the other disks also tested OK, apart from one that tested as running out of write cycles.

I've run a 24 hour memtest and that's come back ok.

I've removed all the disks from the server except the one with ZM on.

I've reinstalled ZM on a different physical disk.

I've reinstalled vSphere (6.5).

Same problem each time. After a period of time ZM corrupts and the VM is toast.

Running out of things to test!
StarMonkey

Re: ZM Corrupting VM+ Host

Post by StarMonkey »

Right, I'm going to install directly onto hardware and skip the VMware layer entirely to rule out problems there.
SteveGilvarry
Posts: 494
Joined: Sun Jun 29, 2014 1:12 pm
Location: Melbourne, AU

Re: ZM Corrupting VM+ Host

Post by SteveGilvarry »

StarMonkey wrote (Tue Oct 24, 2017 5:58 pm):
"Same problem each time. After a period of time ZM corrupts and the VM is toast."

You wouldn't happen to be recording, with that period of time being long enough to fill your drive?

I run ZM as a VM on ESXi 6.5 without issues, using StorageAreas to provide disk from another VM.
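
A quick way to test the "drive filling up" theory is to watch usage on the filesystem that holds the events. The sketch below is only an illustration: the function names are invented, and the events path in the usage comment is an assumption about the Ubuntu package layout, so substitute whatever path your install actually records to.

```shell
# Print the usage percentage (without the % sign) of the filesystem
# that contains the given path, using POSIX-format df output.
disk_use_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Warn (and return 1) when usage is at or above the given limit.
# Returns 2 if the usage could not be determined.
check_disk() {
    path=$1
    limit=$2
    used=$(disk_use_percent "$path")
    [ -n "$used" ] || return 2
    if [ "$used" -ge "$limit" ]; then
        echo "WARNING: $path is ${used}% full (limit ${limit}%)"
        return 1
    fi
    echo "ok: $path is ${used}% full"
}

# Example (path is an assumption, adjust to your install):
#   check_disk /var/cache/zoneminder/events 90
```

Run from cron every few minutes with output appended to a file on a different disk, this would show whether usage climbs towards 100% in the days before the crash.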
Production Zoneminder 1.37.x (Living dangerously)
Random Selection of Cameras (Dahua and Hikvision)