[cs615asa] HW5 notes

Sat May 16 13:23:13 EDT 2015

Hello,

I will be sending out grades for HW5 later today, but in the meantime
here are a few notes on common issues with the assignment and the
submissions:

It seems that a lot of you had trouble understanding the 'tar + dd'
backup method.  The objective here is _not_ to create a single archive
file and back up that file on a remote filesystem, but rather to write
the archive to raw disk.  That is, the EBS volume you create is treated
as a simple storage block device, and no filesystem is created.

If there is no filesystem, then there is also no need to create a single
"backup.tar" (or "backup.tar.gz") file to store there.  Instead, you
should write the data directly to the device.

Please review Lecture 02 and Lecture 07 for the related concepts.

Secondly, you must not create a local tar archive file: first of all,
it's unnecessary if you're writing the data to a block device instead of
storing it as a file, but more importantly, you cannot back up a
filesystem to a remote location by first storing a copy of it on
itself.  Suppose you were to back up the entire filesystem (i.e., "/"),
but stored a copy of the archive on the local filesystem: you would be
likely to run out of disk space locally.

So in short, you want to run (more or less) the following:

tar cf - $directory | ssh instance "dd of=/dev/whatever"

(Note: "/dev/", not "/mnt/some-filesystem".)
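
To make the pipeline concrete, here is a minimal sketch you can run
without an EC2 instance.  It assumes a regular file ("fake-device", a
made-up name) standing in for the raw block device and drops the ssh
hop; the scratch directory exists only for the sake of the
demonstration.

```sh
# Scratch area for the demonstration only.
workdir=$(mktemp -d) || exit 1
device="$workdir/fake-device"    # stand-in for /dev/whatever
mkdir "$workdir/src" "$workdir/restore"
echo "hello" > "$workdir/src/file.txt"

# Back up: stream the archive straight into the "device".  No backup.tar
# is ever written, and no filesystem is involved on the target side.
tar cf - -C "$workdir" src | dd of="$device" 2>/dev/null

# Restore: read the raw bytes back and unpack them.
dd if="$device" 2>/dev/null | tar xf - -C "$workdir/restore"

restored=$(cat "$workdir/restore/src/file.txt")
rm -rf "$workdir"
```

Against a real instance, the restore direction would presumably mirror
the backup command, along the lines of:

ssh instance "dd if=/dev/whatever" | tar xf -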

--

When using the rsync backup method, you do need a filesystem on the
volume.  The point of allowing this method is to let the user perform an
incremental backup, i.e., only back up files that have changed.  (When
using 'tar + dd', you necessarily copy _all_ data every time you run the
command; when using 'rsync', it can copy only files that have changed
since the last time you ran the command.)

Incremental backups become entirely impossible if you always create a
new filesystem on the target volume, thus overwriting any previously
existing data.  That is, if the user chooses 'rsync' as the backup
method, and the user specifies a volume, then creating a new filesystem
on the specified volume destroys the older data and leads to 'rsync'
performing a full, rather than an incremental, backup.
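
The decision could be sketched roughly as follows.  The function name
and the method names ("dd", "rsync") are hypothetical, and the sketch
assumes that a user-supplied volume already carries a filesystem from an
earlier run; a real tool would still have to deal with a user-supplied
volume that has never been used before.

```sh
# plan_volume METHOD USER_VOLUME
# Prints what to do with the target volume (hypothetical helper).
plan_volume() {
    method=$1
    user_volume=$2
    case "$method" in
    dd)
        echo "raw"          # write to the raw device, never mkfs
        ;;
    rsync)
        if [ -n "$user_volume" ]; then
            echo "mount"    # existing volume: mount it, keep old data
        else
            echo "mkfs"     # volume we just created: needs a filesystem
        fi
        ;;
    *)
        echo "unknown method: $method" >&2
        return 1
        ;;
    esac
}
```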

--

Creating files while backing up a directory is another issue many of you
had.  As noted above, you run into the problem of running out of
disk space when you try to back up a directory containing 60 GB of data
when the filesystem has a total capacity of, say, 80 GB.

But another issue is that you cannot assume that you can write to the
current working directory from which your command may be invoked.
Remember that we are writing general-purpose tools, and you cannot make
such assumptions, since your tool could be invoked from anywhere on the
filesystem to back up any portion of the filesystem.  For example,
consider the following invocation:

cd /usr
ec2-backup .

Many of your submissions would fail, because they're trying to write a
file to the current working directory ("/usr" in this case).

Handling temporary files is much trickier than inexperienced systems
programmers usually think.  You need to find a location on the
filesystem that you can write to.  "/tmp" is a good idea, but it may
have insufficient disk space or open up the possibility of a symlink
attack: if I know ec2-backup writes to /tmp/ec2-backup.tar, then I can
create a link from /tmp/ec2-backup.tar to your ~/.ssh/authorized_keys or
any other file in your home directory, and the next time you run the
command you overwrite your file.  Creating safe, unguessable temporary
files isn't trivial, and writing data to the invoking user's home
directory may run into quota issues, etc.

Finally, if your program doesn't succeed, you need to remember to remove
the file.  This includes cases where the user interrupts the program,
meaning you have to install an exit handler to clean up after yourself.
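
If you truly cannot avoid scratch space, a minimal sketch of the usual
shell idiom looks like this.  The "ec2-backup.XXXXXX" template and the
"state" file are made-up names, and the sketch does nothing about the
disk space or quota problems mentioned above.

```sh
# mktemp(1) creates the directory with an unpredictable name and
# owner-only permissions, which avoids the fixed-path symlink attack.
tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/ec2-backup.XXXXXX") || exit 1

cleanup() {
    rm -rf "$tmpdir"
}

# Run cleanup whenever the shell exits ...
trap cleanup EXIT
# ... and turn the usual termination signals into a regular exit, so
# that the EXIT trap also fires when the user hits ^C.
trap 'exit 1' HUP INT TERM

# Any scratch files go inside the private directory.
echo "scratch data" > "$tmpdir/state"
```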

In short, when you write any unix tools, try to avoid creating temporary
files.

--

Backing up a directory may include files that you do not have read
permissions for.  Your program needs to be able to handle this situation
by backing up what it can and warning the user about the files it cannot
back up.
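
One way to produce such warnings up front is sketched below; the
function name and the message format are hypothetical.  tar(1) itself
generally reports files it cannot open on stderr and carries on, so
simply not discarding its stderr already goes a long way.

```sh
# warn_unreadable DIR
# Prints a warning on stderr for every entry under DIR that the current
# user cannot read.  Symlinks are skipped, since test -r would follow
# them.  File names containing newlines are not handled in this sketch.
warn_unreadable() {
    find "$1" ! -type l -exec test ! -r {} \; -print 2>/dev/null |
    while IFS= read -r path; do
        echo "ec2-backup: warning: cannot read '$path'" >&2
    done
}
```

Invoked as "warn_unreadable /usr", it stays silent when everything is
readable.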

--

When your program terminates (whether successfully, unsuccessfully, or
because it was interrupted), it needs to shut down any EC2 instances it
may have created.
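
A minimal sketch of how this could look in a shell implementation; the
variable and function names are hypothetical, and a real tool would also
want to check whether the termination request succeeded.

```sh
instance_id=""    # set as soon as the instance has been launched

cleanup() {
    if [ -n "$instance_id" ]; then
        aws ec2 terminate-instances --instance-ids "$instance_id" \
            --output text >/dev/null
        instance_id=""    # do not try to terminate it twice
    fi
}

# Covers successful and unsuccessful termination ...
trap cleanup EXIT
# ... as well as interruption by the user.
trap 'exit 1' HUP INT TERM
```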

--

After some back and forth on the mailing list, most of you seem to have
understood the idea of allowing the user to influence the program via
the EC2_BACKUP_ environment flags.  However, most of you are still
making assumptions about the user's environment or behaviour of the AWS
tools.

If you are processing the output of the aws(1) commands, then you need
to make sure that the output matches the format you expect.  The aws(1)
commands allow the user to specify an output format (json, text, or
table), and several of your tools failed to explicitly specify which
output format they expected.

The volume and instance you use need to be in the same availability
zone; some of you explicitly specified the zone for the volume, but not
for the instance.  This may have accidentally worked for you, but would
not work for another user who may have a different default availability
zone.

Remember to always be explicit.
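
As a sketch of what "explicit" might look like: both helpers below pin
the output format and the availability zone instead of relying on the
user's defaults.  The function names and their arguments are
hypothetical; the AMI and instance type are whatever your tool decides
to use.

```sh
# create_volume ZONE SIZE_GB
# Prints only the ID of the new volume.
create_volume() {
    aws ec2 create-volume --availability-zone "$1" --size "$2" \
        --output text --query 'VolumeId'
}

# start_instance ZONE AMI TYPE
# Prints only the ID of the new instance, placed in the same zone.
start_instance() {
    aws ec2 run-instances --image-id "$2" --instance-type "$3" \
        --placement "AvailabilityZone=$1" \
        --output text --query 'Instances[0].InstanceId'
}
```

Passing the same zone to both calls is what keeps the volume attachable
to the instance, regardless of what the user has configured.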

--

Some of you submitted programs that contained syntax errors when run on
linux-lab.cs.stevens.edu.  I don't understand how you may end up doing
that, but it should be obvious that your program needs to actually run
on the platform specified in order to get a decent grade.

--

There is a lot of code duplication in your programs.  That is, you
frequently repeat code blocks that are nearly identical.  You need to
learn to modularize your code and split it into well-defined functions.

You also need to learn to write code in a unified, consistent style when
working with others.  Indentation, use of braces, brackets, operators,
comments etc. all should be consistent.  Readability counts.

--

After a long semester of me insisting that you do all your work on
linux-lab.cs.stevens.edu, several of you are _still_ submitting files
(such as the README) that were created on a different platform.  I fail
to understand your thought process behind this.

--

The manual page provided to you is specific about the output your
program generates.  Deviations from this requirement are a bad idea.
The reason for the output (only the volume ID in question) is that one
could conceivably build other tools around ec2-backup(1).  This becomes
harder if your program generates output like:

"The Volume ID is: vol-12345"

instead of:

vol-12345

Stick to the program definition and always consider that a unix tool you
built will be used by other tools.
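
A minimal sketch of the usual way to keep the two apart: diagnostics go
to stderr, and stdout carries nothing but the volume ID.  The helper
names are hypothetical.

```sh
# Diagnostics go to stderr, never to stdout.
info() {
    echo "ec2-backup: $*" >&2
}

# Final step of the tool: stdout carries the volume ID only.
report_volume() {
    info "backup complete"
    echo "$1"
}

# A caller can then capture the ID without any parsing:
volume=$(report_volume vol-12345)
```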

The manual page also provides example invocations.  It's a fairly good
idea to make sure that your tools can handle at least those use cases.

--

Overall, I hope that in this assignment you learned a bit about how
convoluted automating a common sysadmin task can become when you try to
turn something that might work for one user into a generally useful
tool.  Almost all of you would do well to work on cleaning up your code,
making it more modular, reducing repetition, ensuring messages are
properly spelled, and cleaning up formatting.  Writing simple,
easy-to-understand tools is something you only learn with practice, so
the next time you have to put together a script for yourself, try to
evolve it into a general-purpose utility.

-Jan