I have about twenty servers running different webapps. An rsnapshot task runs
every 4 hours and backs all of them up to a backup server.
Today I discovered by accident that the backups had been failing for the last 4 days because of an input/output error in the file system. fsck
fixed the issue, but 4 days of backups are lost.
Is there any way to check if backups are ok?
Right now I use the munin monitoring system, if that matters, though it checks only server health (memory, CPU, HDD, etc.) without any software checks.
I can integrate a script that checks for FATAL ERROR entries in the rsnapshot log, but I'm not sure whether that will be enough.
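A minimal sketch of such a log check, in Python. The log path passed to `check_log` and the assumption that failure lines contain the word `ERROR` (which also covers `FATAL ERROR`) are illustrative and should be checked against the actual rsnapshot configuration; the function names are hypothetical.

```python
import re
from typing import Iterable, List

# Assumption: failed runs leave a line containing "ERROR" in the log.
ERROR_PATTERN = re.compile(r"\bERROR\b")


def find_error_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that look like errors, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if ERROR_PATTERN.search(line)]


def check_log(path: str) -> int:
    """Print matching lines; return 0 if none were found, 2 otherwise."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        errors = find_error_lines(handle)
    for line in errors:
        print(line)
    return 2 if errors else 0
```

A clean log does not prove that a backup actually ran, so a check like this is probably best paired with a check on the age of the newest snapshot.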
Maybe there is a system for bootstrapping an environment from a backup to check its integrity. Unfortunately, I couldn't find enough information about it.
Make sure you are also monitoring filesystem free space, system logs (for critical / severe messages), SMART output for your disks, and the network and backup services (ssh / rsync).
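Two of these checks can be sketched in a few lines of Python: free space on the backup volume, and the age of the newest snapshot. This is only an illustration. The function names are hypothetical, and the staleness check assumes that the modification time of the newest snapshot directory reflects the last successful run, which should be verified for your setup.

```python
import os
import shutil
import time
from typing import Optional


def free_fraction(path: str) -> float:
    """Return the fraction (0.0 to 1.0) of free space on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def is_stale(path: str, max_age_seconds: float, now: Optional[float] = None) -> bool:
    """Return True if path was last modified more than max_age_seconds ago."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) > max_age_seconds
```

With backups every 4 hours, a threshold somewhat above 4 hours (say 5 or 6) would flag a missed run the same day instead of 4 days later.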
As for verifying your backups, you may want to set up a parallel copy of your webapp environment and restore your backups into it periodically. Your backups are only as good as your recovery.
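One part of such a restore test can be automated: comparing the restored files against the source tree by checksum. The sketch below is in Python and uses hypothetical function names. It covers file contents only, not permissions, ownership, or whether the webapp actually starts, and files that changed on the live system after the backup was taken will be reported as differences.

```python
import hashlib
import os
from typing import Dict, List


def hash_tree(root: str) -> Dict[str, str]:
    """Map each regular file under root (by relative path) to its SHA-256 digest."""
    digests: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.isfile(full):
                continue  # skip broken symlinks and special files
            digest = hashlib.sha256()
            with open(full, "rb") as handle:
                # Read in 1 MiB chunks so large files do not fill memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            digests[os.path.relpath(full, root)] = digest.hexdigest()
    return digests


def diff_trees(source: str, restored: str) -> List[str]:
    """Return sorted relative paths that are missing on one side or differ in content."""
    left, right = hash_tree(source), hash_tree(restored)
    return sorted(p for p in left.keys() | right.keys() if left.get(p) != right.get(p))
```

An empty list from `diff_trees` means every regular file matched; any other result lists the paths worth inspecting by hand.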