Indie Archive File Integrity And Notification
Posted on 2025-01-11
If all you're doing is backing up with no file integrity checks, then you can never be sure whether a file that's important to you has been corrupted, accidentally edited, or maybe even just removed. That's why file integrity checks are important to any archival system. We want to detect divergence in our file sets before we copy a corrupt file over all of the good copies.
And hand in hand with file integrity checks is notification. What good are the checks if you aren't notified about discrepancies?
I'm using rsync to do the file integrity checks daily. I use the -n switch to force a dry run so the file integrity checks don't change anything. And I use the -c switch to force checksum. It is possible for a file to be corrupted while still showing the same time, date, and size, so the checksum switch is the safest way to go, when comparing two files.
I also use the -r recursive switch and the -v verbose switch. A typical file integrity check looks like this.
rsync -rnvc --delete --exclude '.Trash-1000' --exclude 'lost+found' /home/indiearchive/primaryfiles/ 192.168.2.12:/home/indiearchive/secondaryfiles > logfile.txt
(logfile.txt will be named with the date and the directory being checked and the logfiles will be stored on the boot drive by default)
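Here's a minimal sketch of how a log file named with the date and the directory being checked could be put together. The LOG_DIR location, the function names log_name and run_check, and the exact filename layout are assumptions made up for illustration, not necessarily what Indie Archive uses.

```shell
#!/bin/bash
# Sketch only: LOG_DIR, log_name, and run_check are hypothetical names.
LOG_DIR="${LOG_DIR:-$HOME/integritylogs}"

# Build a log path from a date and the name of the directory being checked.
log_name() {
    local day="$1" target="$2"
    printf '%s/%s-%s.log\n' "$LOG_DIR" "$day" "$target"
}

# Dry-run checksum comparison; nothing is copied or deleted because of -n.
run_check() {
    local src="$1" dest="$2" target="$3"
    mkdir -p "$LOG_DIR"
    rsync -rnvc --delete --exclude '.Trash-1000' --exclude 'lost+found' \
        "$src" "$dest" > "$(log_name "$(date +%F)" "$target")"
}

# Example call (commented out so the sketch has no side effects):
# run_check /home/indiearchive/primaryfiles/ \
#     192.168.2.12:/home/indiearchive/secondaryfiles secondaryfiles
```

Keeping the name building in its own function means the same scheme can be reused for each of the directories being compared.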
The command above compares the primaryfiles directory with the secondaryfiles directory, which I am accessing with ssh across the LAN. I include the --delete switch so rsync will report any file it would delete, meaning a file that's in the secondaryfiles directory but not in the primaryfiles directory.
rsync also reports when the opposite is true by listing just the filename, meaning that if this wasn't a dry run, rsync would be copying that file from the primaryfiles directory over to the secondaryfiles directory.
And beyond just checking for missing files the checksum switch checks that the files are actually the same and rsync reports that too.
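As a rough sketch, a captured log could be split into the two kinds of report described above. The function names are hypothetical, and the patterns assume rsync's usual verbose layout (a "sending incremental file list" header, "deleting " prefixes, and the "sent" and "total size" summary lines), so a file whose name happens to look like one of those lines would be misclassified.

```shell
#!/bin/bash
# Sketch only: the function names are hypothetical.

# Files rsync would delete: on the destination but missing from the source.
only_on_destination() {
    grep '^deleting ' "$1" | sed 's/^deleting //'
}

# Files rsync would copy: missing from the destination or failing the checksum.
would_be_copied() {
    grep -v -e '^deleting ' -e '^sending incremental file list$' \
            -e '^sent .* bytes' -e '^total size is ' -e '^$' "$1"
}
```

Plain -v output doesn't say whether a listed file is missing or merely different. rsync's -i (--itemize-changes) switch gives more detail per file if that distinction matters.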
Running file integrity from the primaryfiles directory I can compare to three other directories, primarysnapshots, secondaryfiles, and secondarysnapshots. So if three of them are the same and one is different it definitely focuses my attention right at the problem.
File integrity is also run daily on the remote system comparing the remotefiles directory with three other directories remotesnapshots, secondaryfiles, and secondarysnapshots. You can see how having the ssh server on the secondary system makes it the hub that ties the primary and the remote system together.
The primary system pushes files from the primaryfiles directory across the LAN to the secondaryfiles directory. The remote system pulls the files across the internet from the secondaryfiles directory to the remotefiles directory.
The file integrity checks work the same way with the primary system running file integrity checks on the primary and secondary systems daily and the remote system running file integrity checks on the remote and secondary systems daily.
I create a log file from each file integrity check by capturing the rsync output. If there are no missing or divergent files the log has five lines of text. If there are more than five lines of text then a notification needs to be sent.
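The five-line rule could be checked with something like the sketch below. CLEAN_LINES and needs_notification are hypothetical names, and the threshold of five simply mirrors the clean log length mentioned above.

```shell
#!/bin/bash
# Sketch only: CLEAN_LINES and needs_notification are hypothetical names.
CLEAN_LINES=5   # line count of a log with no discrepancies, per the text

# Succeeds (exit status 0) when the log is longer than a clean run.
needs_notification() {
    local lines
    lines=$(wc -l < "$1")
    [ "$lines" -gt "$CLEAN_LINES" ]
}

# Example use (logfile is whatever log was just written):
# if needs_notification "$logfile"; then
#     echo "discrepancies found in $logfile"
# fi
```

Counting lines is crude but it avoids having to parse the rsync output at all. Any extra line means something needs a look.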
I chose email notifications because they're the easiest to implement and the least expensive.
After a quick search I decided to try smtp2go for my smtp service. It's really quite easy to get it working if you know how to add CNAME records to your domain provider. They have example code using curl so I was able to use their service quite quickly to send an email.
Using curl to send emails is great because you don't have to install email software on your system.
The problem was getting the logfile into the email.
I'm sure there are better ways of doing it but I couldn't figure any of them out so I resorted to something I know how to do, concatenating files.
I took my example script and saved everything up to the actual text of the body of the email to a file called smtp2go1.sh. And I saved everything after the body of the email to smtp2go2.sh.
And I put the body of the email into test.log.
Then I concatenated smtp2go1.sh, test.log, and smtp2go2.sh to a file I called smtpcat.sh.
And there were only two problems.
Since I used cat to concatenate the files there were line feeds between the files, that is each file started on a new line.
Well I didn't want that because it didn't work. So I searched around a little bit and found a way to concatenate files without the extra line feed using the head command.
Which worked great, but I still had line feeds in my test.log file because there will be line feeds in the actual log files. So I replaced all the linefeeds with html <br> notation.
Which worked great but when I got the email it showed me all the html break syntax.
Did I say there were only two problems? There's never only two problems. But I was expecting this one.
I searched around some more to find out how to send html emails. This was an easy one. I just replaced text_body with html_body in the smtp2go1.sh file.
So I lumped all of this together into an smtpsend.sh file that looks like this.
#!/bin/bash
sed -i'' ':a; $!N; s|\n|<br>|; ta; P;D' test.log
head -c -1 -q smtp2go1.sh test.log smtp2go2.sh > smtpcat.sh
./smtpcat.sh
The sed command uses a whole bunch of gobbledegook to replace all of the linefeeds with html breaks. Don't ask me how it works but it does, in fact, work.
Then the head command concatenates the three files into smtpcat.sh. That -c -1 in there tells it to skip the last character of each file, which leaves out the trailing linefeed, and the -q keeps head from printing a filename header in front of each file.
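To see both steps on something harmless, here's a small demonstration using throwaway files. The file names and contents are made up for the example, and it assumes GNU sed and GNU head, since a negative byte count for head is a GNU extension.

```shell
#!/bin/bash
# Sketch only: file names and contents are invented for the demonstration.
# Assumes GNU sed and GNU head (head -c with a negative count is GNU-only).
workdir=$(mktemp -d)

printf 'before [\n'           > "$workdir/part1"
printf 'line one\nline two\n' > "$workdir/body"
printf '] after\n'            > "$workdir/part2"

# Join every line of the body into one line, separated by <br>.
sed -i'' ':a; $!N; s|\n|<br>|; ta; P;D' "$workdir/body"

# Concatenate the pieces, dropping the final byte (the linefeed) of each file.
joined=$(head -c -1 -q "$workdir/part1" "$workdir/body" "$workdir/part2")
echo "$joined"
# prints: before [line one<br>line two] after

rm -r "$workdir"
```

In the sed script, N appends the next input line to the one already being held, s swaps the linefeed between them for <br>, and ta jumps back to the :a label for as long as a substitution was made. Once the last line has been read there is nothing left to replace, so P prints the joined line and D clears it.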
But the real trick here is that I use a bash script to write a bash script and then execute that bash script which sends the email.
Pretty cool, huh?
Now, I haven't installed this yet on the primary and the remote systems. I don't see why it wouldn't work since it's working great with my test files. I sure hope it's not as much work to get it going on The Indie Archive as it took to figure it out this far.
Puzzles are puzzling and I love working the puzzles. (In fact, I get obsessed.)
Solving the puzzles is important but having the vision, knowing what you want and why, that's the real important part. That's the big puzzle.