I feel better, now

Oops! The whole problem started when I wanted to install a new program on the “entertainment computer” that’s connected to the TV. The existing version of Ubuntu was no longer supported, so I started what I thought would be a simple upgrade. Oops, it deleted the entire /var directory, including the contents of the second drive mounted there. That drive contained all of our photos and music. Not a problem, I thought. It’s all backed up nightly onto a USB drive. It’s just a bit of bother to copy all of that over the USB interface.

Oops! There are no photos beyond last March. I felt sick to my stomach. That’s a lot of family memories to have evaporated. Whew! The program we’ve been using to download photos from the cameras also makes a local backup copy on the machine that does the downloading. I’ve got all the pictures back on the photo server, and now just have to sort them into directories again.

But first, I need to fix the backup script. Why didn’t it warn me when it started to fail?

I copied a version from a different fileserver to that machine and edited it for the different source and destination. It’s a simple shell script that mounts the USB drive, runs rsync to duplicate the data, and then unmounts the USB drive. I’d run into a problem once before where the drive wasn’t getting mounted properly, so I’d added some code to detect that and email me using msmtp if it failed. That only worked for mount errors, though, not for rsync errors.
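For context, the script looked roughly like this. (This is a reconstruction rather than the exact file; the paths and mount point are placeholders.)

    #!/bin/sh
    # Reconstruction of the original backup script. Placeholder values.
    SRC=/var/media
    DEST=/mnt/backup
    MAILTO=george@example.com

    mount "$DEST"
    if [ $? -ne 0 ]; then
        # msmtp takes a complete mail message on stdin.
        printf "Subject: backup: mount of %s failed\n\nInvestigate.\n" "$DEST" \
            | msmtp "$MAILTO"
        exit 1
    fi

    # Note the gap: rsync's exit status was never checked.
    rsync -a --delete "$SRC/" "$DEST/"

    umount "$DEST"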

As I hacked in detection and notification of rsync errors, I felt ill at ease. I was duplicating some of the script to send emails. The script was getting harder to read. The script was increasingly inconvenient to test. And I was nervous about introducing new errors.

I started practicing Test Driven Development a decade ago. It has become second nature to me to test-drive the code I write. Why was this code just hacked together? Mostly because I hadn’t been thinking of it as code. It was just a script. At first, it just replicated the commands I typed on the command line. I put those into a script to make it easy to run, and then to automate running it via the cron daemon. When I replicated it to another machine, I extracted the paths into variables to make the differences easier to change. When it had failed to mount a drive, I’d added a check for that and a line to send me an email. That line surely crossed the border from “a file of commands” to “a program,” but I didn’t notice it at the time. Or, if I did notice it, I didn’t pay attention to it. After all, I don’t generally program in bash scripts and haven’t developed any rigor for doing so. It’s those “little” changes that “can’t possibly” hurt that bite you the hardest.

The script now “seems to be working,” but I’m not satisfied. Wouldn’t it be nice to get advance notice when the disk is getting close to full, rather than waiting until it is?

I now know I’m programming in sh, not just collecting a bunch of command-lines from my .history file. Adding the email notification of rsync errors was harder than I expected due to tiny errors made in an unfamiliar programming environment. In addition to being hard, it just felt “dirty.”
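The hacked-in rsync check came out something like this (recreated from memory, and exactly the kind of near-duplicate of the mount check that was bothering me):

    rsync -a --delete "$SRC/" "$DEST/"
    if [ $? -ne 0 ]; then
        # Nearly the same mail-sending lines as the mount check.
        printf "Subject: backup: rsync failed\n\nInvestigate.\n" \
            | msmtp "$MAILTO"
        umount "$DEST"
        exit 1
    fi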

A little googling led me to shunit2. After a little difficulty creating a tiny virtual disk under control of the tests, I quickly had a function that extracted the “% full” output of the df command and returned it as a number. Great! Now I can use that number to provide an early warning. And working test-driven is teaching me better shell scripting habits.
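The function and its first test look something like this (a simplified sketch rather than my exact code; shunit2 provides the assert functions and the test runner, and its install path varies by system):

    #!/bin/sh
    # percent_full: report how full a filesystem is, as a bare number.
    percent_full() {
        df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
    }

    testPercentFullIsANumber() {
        pct=$(percent_full /)
        assertTrue "expected a number, got '$pct'" \
            "echo '$pct' | grep -q '^[0-9][0-9]*\$'"
    }

    # Adjust the path to wherever your system installs shunit2.
    . /usr/bin/shunit2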

I’ve still got work to do, pulling the existing script into tested functions, but already I feel better. I’m on my way to a reliable and maintainable script.
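The direction I’m headed looks something like this sketch (the function names and the 90% threshold are just illustrative):

    notify() {
        # The one tested place that knows how to send me mail.
        printf "Subject: backup: %s\n\n%s\n" "$1" "$2" | msmtp "$MAILTO"
    }

    pct=$(percent_full "$DEST")
    if [ "$pct" -ge 90 ]; then
        notify "disk ${pct}% full" "Time to prune or buy a bigger drive."
    fi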

13 Replies to “I feel better, now”

  1. Well, that is exactly what you get for using such an ancient, outdated operating system. You really should join the modern world.

  2. @Brad – Not a helpful statement… I’m a huge ubuntu / linux fan.

    George – This was seriously one of the most refreshing posts I’ve read in months. Yes. MONTHS.

    As a developer myself, this all makes SO much sense. Thanks for taking us through your trials. I’ve been through similar “at-home” destructions of personal data as well.

  3. As sysadmins, we live and die by our restores (no one ever cares about backups).

    cron (not chron) will send your email whenever a job writes to stdout or stderr. It sends email to the owner of the account by default, but you can use the MAILTO variable to get cron to send mail elsewhere.

    MAILTO=george@example.com
    0 * * * * /usr/local/bin/backup.sh

    That will send you email using the system MTA (yay for simplicity and tools that work with each other).

    And in the script, you can write to stdout and exit non-zero when a step fails, which triggers that mail:

    if [ $? -ne 0 ]; then
        echo "Backup aborted"
        exit 1
    fi

    Alternatively, you could use one of the multiple, pre-written backup systems out there.

    1. Thanks, Devdas, for catching my typo for “cron.” And I appreciate the suggestions. Getting mail every time the job writes to stdout or stderr wouldn’t be helpful unless I sent some of the output to /dev/null. I don’t want to get mail every day, just when there’s a problem that needs investigating. And I use msmtp because there’s no MTA running on that system.

      Running someone else’s backup system just moves the problem around a bit. I’d still have to configure it to do what I wanted. And I’d have to test that it behaved the way I expected after I configured it. And therein lies another example of the same problem. Do you take the same precautions when configuring an off-the-shelf program that you do when you’re programming? If not, why not?

  4. Oh yes, please catch me on Twitter as @f3ew if you want further input (since I no longer follow blog streams).

    Or ask on irc (I’m on Freenode, one or more of #lopsa, ##infra-talk, ##devops should be able to help you out).

  5. Actually, yes. There are a bunch of tests that happen for all programs, because you never know what sort of subtle interactions will cause things to break (yay, enterprise apps).

    The problem of getting too much email is easily fixed by having your tools write output only when they hit an error (aka the Rule of Silence). Most Unix systems will have a local MTA (or an equivalent providing a sendmail binary).

    rsync and mount/umount are all silent by default, so under normal circumstances, you’ll get no output.

    #!/bin/sh

    mount /backup && rsync /home/media /backup && umount /backup

    That should stay silent and only fail verbosely if something goes wrong. (/backup needs to match the USB device in /etc/fstab)

  6. Devdas, have you tested that? What about the case where /backup is already mounted?

    I think you’re doing an excellent job of illustrating the point of this blog post. You’re making a fair number of assumptions about what won’t go wrong.

    (And as I said before, this system is not running an MTA because it has no need for it.)

  7. Only the newest mounted filesystem will be visible to the OS. The backup will work correctly, depending on whether rsync works correctly.

    df may give you more interesting results, but that won’t affect rsync.

    (As for the assumptions, keep in mind that a lot of this knowledge comes from making those mistakes and learning from them. If anything goes wrong, this will write to stdout, if not, it’ll silently succeed).

  8. Devdas, you are making my point. If the filesystem is already mounted, mount will return an error.

    Even when you’ve got a lot of experience, it’s worth testing your knowledge.
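    One way to tolerate an already-mounted drive is to test for it first (a sketch; mountpoint(1) ships with util-linux on most Linux systems):

    mountpoint -q /backup || mount /backup || exit 1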

  9. When I wanted to write an rsync-based backup script, I decided to make a game of it and wrote a crude sort of ‘bash_spec’ BDD tool. I was going to upload it with this comment, but there doesn’t seem to be a way. Drop me an email if you want a copy.

  10. So have you managed to vacuum up all the yak hair yet? A most delightful post. Thanks for sharing the experience.
