[cs615asa] [git commit] CS615 EBS-BACKUP; backup a directory into Elastic Block Storage (EBS) branch main updated. 4a8ed3e0ab073ee057404a0dcb6665ce2929ea52

Git Owner jschauma at stevens.edu
Wed Apr 14 18:14:13 EDT 2021


This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "CS615 EBS-BACKUP; backup a directory into Elastic Block Storage (EBS)".

The branch, main has been updated
       via  4a8ed3e0ab073ee057404a0dcb6665ce2929ea52 (commit)
      from  bebad79e0b831045e2ead467cb7a7bad8b208ce3 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 4a8ed3e0ab073ee057404a0dcb6665ce2929ea52
Author: camatang <camatang at stevens.edu>
Date:   Tue Apr 13 20:54:51 2021 -0400

    end-to-end progress

diff --git a/ebs-backup.txt b/ebs-backup.txt
new file mode 100644
index 0000000..4e55c9b
--- /dev/null
+++ b/ebs-backup.txt
@@ -0,0 +1,172 @@
+EBS-BACKUP(1)		NetBSD General Commands Manual		 EBS-BACKUP(1)
+
+NAME
+     ebs-backup -- backup a directory into Elastic Block Storage (EBS)
+
+SYNOPSIS
+     ebs-backup [-h] [-l filter] [-r filter] [-v volume-id] dir
+
+DESCRIPTION
+     The ebs-backup tool performs a backup of the given directory into Amazon
+     Elastic Block Storage (EBS).  This is achieved by creating a volume of
+     the appropriate size, attaching it to an EC2 instance and finally copying
+     the files from the given directory onto this volume.
+
+OPTIONS
+     ebs-backup accepts the following command-line flags:
+
+     -h		   Print a usage statement and exit.
+
+     -l filter	   Pass data through the given filter command on the local
+		   host before copying the data to the remote system.
+
+     -r filter	   Pass data through the given filter command on the remote
+		   host before writing the data to the volume.
+
+     -v volume-id  Use the given volume instead of creating a new one.
+
+DETAILS
+     ebs-backup will perform a backup of the given directory to an EBS volume.
+     The backup is done with the help of the tar(1) command on the local host,
+     writing the resulting archive directly to the block device.  That is,
+     ebs-backup does not create or use a filesystem on the volume.  Instead,
+     ebs-backup utilizes the dd(1) command to write out the data to the raw
+     volume.  In essence, ebs-backup wraps the following pipeline:
+
+	   tar cf - <dir> [| local-filter] | ssh ec2-instance "[remote-filter |] dd of=/dev/xbd2d"
+
+     Here, "/dev/xbd2d" stands for the suitable raw disk device, which may
+     differ depending on the instance type.
+
+     ebs-backup does not use any temporary files, nor does it create a local
+     copy of the archive it writes to the volume.
+
+     ebs-backup can pass the archive it creates through a filter command on
+     either the local or the remote host.  This allows the user to e.g. per-
+     form encryption of the backup.
+
+     Unless the -v flag is specified, ebs-backup will create a new volume, the
+     size of which will be at least two times the size of the directory to be
+     backed up.
+
+     ebs-backup will create an instance suitable to perform the backup, attach
+     the volume in question and then back up the data from the given direc-
+     tory.  Afterwards, ebs-backup will terminate the instance it created.
+
+     ebs-backup will not create or modify any other AWS resources.
+
+VERIFICATION
+     By default, ebs-backup simply writes the data to the volume.  To verify
+     that this was successful, the user may manually perform the following
+     tasks:
+
+	   aws ec2 run-instances
+	   aws ec2 attach-volume
+	   ssh instance "dd if=/dev/xbd2d" | tar tvf -
+
+OUTPUT
+     If successful, ebs-backup will print the volume-id of the volume to which
+     it backed up the data as the only output.
+
+     Unless the EBS_BACKUP_VERBOSE environment variable is set, ebs-backup
+     will not generate any other output unless an error is encountered.  If
+     that variable is set, ebs-backup may print useful information about the
+     steps it is currently performing.
+
+     Any errors encountered cause a meaningful error message to be printed to
+     STDERR.
+
+ENVIRONMENT
+     ebs-backup assumes that the user has set up their environment for general
+     use with the EC2 tools and ssh(1) without any special flags on the com-
+     mand-line.	 That is, the user has a suitable section in their ~/.ssh/con-
+     fig file to ensure that running 'ssh ec2-instance.amazonaws.com' suc-
+     ceeds.
+
+     To accomplish this, the user has created an SSH key pair named 'ebs-
+     backup' and configured their SSH setup to use that key to connect to EC2
+     instances.
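+
+     For example, a minimal ~/.ssh/config entry satisfying this might look
+     like the following (the host pattern and key path shown here are illus-
+     trative only):
+
+           Host *.amazonaws.com
+                User root
+                IdentityFile ~/.ssh/ebs-backup
+                StrictHostKeyChecking no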
+
+     Therefore, ebs-backup will neither set nor modify the variables AWS_CON-
+     FIG_FILE, EC2_CERT, EC2_HOME or EC2_PRIVATE_KEY.
+
+     ebs-backup allows the user to add custom flags to the commands related to
+     starting a new EC2 instance via the EBS_BACKUP_FLAGS_AWS environment
+     variable.
+
+     ebs-backup also assumes that the user has set up their ~/.ssh/config file
+     to access instances in EC2 via ssh(1) without any additional settings.
+     It does allow the user to add custom flags to the ssh(1) commands it
+     invokes via the EBS_BACKUP_FLAGS_SSH environment variable.
+
+     As noted above, the EBS_BACKUP_VERBOSE variable may cause ebs-backup to
+     generate informational output as it runs.
+
+EXIT STATUS
+     The ebs-backup utility will exit with a return status of 0 under normal
+     circumstances.  If an error occurs, ebs-backup will exit with a value >0.
+
+EXAMPLES
+     The following examples illustrate common usage of this tool.
+
+     To back up the entire filesystem:
+
+	   $ ebs-backup /
+	   vol-1a2b3c4d
+
+     To create a complete backup of the current working directory (using
+     defaults) to the volume with the ID vol-1a2b3c4d, possibly overwriting
+     any data previously stored there:
+
+	   $ ebs-backup -v vol-1a2b3c4d .
+	   vol-1a2b3c4d
+
+     To do the same thing again, but having the program tell us what it's
+     doing, the user can set the EBS_BACKUP_VERBOSE environment variable.
+     Possible diagnostic messages generated when that variable is set are
+     shown below:
+
+	   $ EBS_BACKUP_VERBOSE=1 ebs-backup -v vol-1a2b3c4d .
+	   Verifying volume...
+	   Volume vol-1a2b3c4d is in availability zone 'us-east-1a'.
+	   Creating a suitable instance in 'us-east-1a'...
+	   Attaching volume vol-1a2b3c4d to instance i-123abcd456...
+	   Performing backup...
+	   Terminating instance i-123abcd456...
+	   Backup complete, 1.7 GB of data written to:
+           vol-1a2b3c4d
+
+     Suppose a user has their ~/.ssh/config set up to use the private key
+     ~/.ec2/stevens but wishes to use the key ~/.ssh/ec2-key instead:
+
+	   $ export EBS_BACKUP_FLAGS_SSH="-i ~/.ssh/ec2-key"
+	   $ ebs-backup .
+	   vol-1a2b3c4d
+
+     To force creation of an instance of type t1.micro instead of whatever
+     defaults might apply:
+
+	   $ export EBS_BACKUP_FLAGS_AWS="--instance-type t1.micro"
+	   $ ebs-backup .
+	   vol-1a2b3c4d
+
+     To locally encrypt the backup of the '/var/secrets' directory:
+
+	   $ ebs-backup -l 'gpg -e -r 9BED3DD7' /var/secrets
+	   vol-1a2b3c4d
+
+     The same as above, but perform encryption on the remote system:
+
+	   $ ebs-backup -r 'gpg -e -r 9BED3DD7' /var/secrets
+	   vol-1a2b3c4d
+
+SEE ALSO
+     aws help, cat(1), dd(1), ssh(1), tar(1)
+
+HISTORY
+     ebs-backup was originally assigned by Jan Schaumann
+     <jschauma at cs.stevens.edu> as a homework assignment for the class "Aspects
+     of System Administration" at Stevens Institute of Technology in the
+     Spring of 2011.
+
+NetBSD 8.0		       February 7, 2021			    NetBSD 8.0
\ No newline at end of file
diff --git a/src/README b/src/README
index 8b9254a..a91a05c 100644
--- a/src/README
+++ b/src/README
@@ -25,9 +25,10 @@ https://docs.python.org/3/library/venv.html
 
 **Running outside of PyCharm**
 ```
-# activating
-> source venv/bin/activate
+# activating (Linux / macOS)
+$ source venv/bin/activate
+# activating (Windows)
+C:\> venv\Scripts\activate.bat
 # deactivating
 > deactivate
 ```
-
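+
+Creating the venv in the first place (one-time setup; assumes `python3` is
+available on the PATH):
+```
+$ python3 -m venv venv
+```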
diff --git a/src/argument_parsing.py b/src/argument_parsing.py
index 2195f7e..38ec315 100644
--- a/src/argument_parsing.py
+++ b/src/argument_parsing.py
@@ -39,6 +39,16 @@ def parse_args(args):
       type=str,
       help='Use the given volume instead of creating a new one.',
   )
+
+  # TODO: remove this when done w/ development
+  parser.add_argument(
+      '-i',
+      metavar='instance-id',
+      type=str,
+      help='FOR DEBUGGING PURPOSES: Use this existing ec2 instance rather than creating a new one. \
+           If set, will not shutdown/terminate afterwards.',
+  )
+
   parser.add_argument(
       "dir",
       help="Directory to backup",
diff --git a/src/ec2.py b/src/ec2.py
index 77d4b49..0e45e72 100644
--- a/src/ec2.py
+++ b/src/ec2.py
@@ -3,6 +3,10 @@ import os
 from botocore.exceptions import ClientError
 
 
+def parse_overrides(override_string):
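+    # A possible shape for this helper (hypothetical sketch, not wired in
+    # yet): split the flag string and map known flags onto run_instances()
+    # keyword arguments, e.g. "--instance-type t1.micro" ->
+    # {"InstanceType": "t1.micro"}, so EC2.__init__ below can merge the
+    # result into its defaults.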
+    pass
+
+
 class EC2(object):
     session = None
     ec2_client = None
@@ -15,22 +19,33 @@ class EC2(object):
         self.session = session
         self.ec2_client = self.session.client('ec2')
 
-        # TODO: need to parse 'EBS_BACKUP_FLAGS_AWS' in
-        #  order to override defaults...
-        if 'EBS_BACKUP_VERBOSE' in os.environ:
-            overrides = os.environ['EBS_BACKUP_VERBOSE']
+        if ('instance_id' in config) and config['instance_id'] is not None:
+            for i in session.resource('ec2').instances.all():
+                if i.instance_id == config['instance_id']:
+                    self.instance = i
+                    self.instance_id = i.instance_id
+            if self.instance is None:
+                raise Exception('Could not find an instance with ID: ' + config['instance_id'])
+        else:
+            # TODO: need to parse 'EBS_BACKUP_FLAGS_AWS' in
+            #  order to override defaults...
+            if os.getenv('EBS_BACKUP_FLAGS_AWS') is not None:
+                parse_overrides(os.getenv('EBS_BACKUP_FLAGS_AWS'))
 
-        instance = self.ec2_client.run_instances(
-            ImageId='ami-0018b2d98332ba7e3',
-            MinCount=1,
-            MaxCount=1,
-            InstanceType='t2.micro',
-            Placement={"AvailabilityZone": config.get('zone_id', 'us-east-1a')}
-        )
-        # TODO: make sure we are getting the correct instance
-        #  (and not an existing instance on the account)
-        self.instance = instance["Instances"][0]
-        self.instance_id = self.instance["InstanceId"]
+            instance = self.ec2_client.run_instances(
+                ImageId='ami-0018b2d98332ba7e3',
+                MinCount=1,
+                MaxCount=1,
+                InstanceType='t2.micro',
+                KeyName='ebs-backup',
+                Placement={"AvailabilityZone": config.get('zone_id', 'us-east-1a')}
+            )
+            # keep a handle on the boto3 Instance resource for the instance
+            # we just created, matching on the InstanceId that run_instances
+            # returned (so we never pick up a pre-existing instance)
+            self.instance_id = instance["Instances"][0]["InstanceId"]
+            for i in session.resource('ec2').instances.all():
+                if i.instance_id == self.instance_id:
+                    self.instance = i
 
     def get_a_zone(self):
         try:
@@ -54,6 +69,9 @@ class EC2(object):
         return self.instance_id
 
     def get_ip(self):
+        # Seems like we need to call reload otherwise the DNS name is blank
+        # https://stackoverflow.com/a/60880894/1613023
+        self.instance.reload()
         return self.instance.public_ip_address
 
     def wait_for_instance(self):
@@ -65,19 +83,18 @@ class EC2(object):
 
     def cleanup(self):
         try:
-            res1 = self.ec2_client.stop_instances(
-                InstanceIds=[self.instance_id],
-                DryRun=False
-            )
-            res2 = self.ec2_client.filter(
-                InstanceIds=[self.instance_id],
-            ).terminate()
+            # Terminate rather than stopping then terminating
+            # otherwise non-graceful exits could result in a
+            # build-up of useless stopped instances
+
+            # res1 = self.ec2_client.stop_instances(
+            #     InstanceIds=[self.instance_id],
+            #     DryRun=False
+            # )
+            res = self.instance.terminate()
 
             if 'EBS_BACKUP_VERBOSE' in os.environ and os.environ['EBS_BACKUP_VERBOSE']:
-                print(res1)
-                print(res2)
+                print(res)
         except ClientError as e:
             if 'EBS_BACKUP_VERBOSE' in os.environ and os.environ['EBS_BACKUP_VERBOSE']:
                 print(e)
-
-
diff --git a/src/main.py b/src/main.py
index aa5a0bb..445b8d1 100644
--- a/src/main.py
+++ b/src/main.py
@@ -1,14 +1,14 @@
 import os
 import subprocess
 import sys
+import atexit
 
 import boto3
 
-from src.ec2 import EC2
-from src.volume import Volume
+from ec2 import EC2
+from volume import Volume
 
 from argument_parsing import parse_args
-from env_parsing import parse_env
 
 #### TODO's ####
 # [ ] determine whether to use subprocess or python logic for various operations
@@ -19,8 +19,8 @@ from env_parsing import parse_env
 ### Questions's ###
 # [ ] can we have both local and remote filters?
 
-
 ZONE = 'us-east-1'
+KEY_FLAGS = os.getenv('EBS_BACKUP_FLAGS_SSH') or ''
 
 # Windows alternative to ~/.aws/credentials & ~/.aws/config (for local/testing purposes)
 # achieves the same result of boto3.[resource|client]('service') via
@@ -31,61 +31,78 @@ session = boto3.Session(
     region_name=ZONE,
 )
 
-
-def upload(user, host, dir):
+def upload(user, host, _dir):
     # scp directory to be backed up to ec2 instance
     # flags = os.environ['EBS_BACKUP_VERBOSE']
-    subprocess.call('scp {flags} {dir} {user}@{host}:/tmp'.format(
-        flags='', user=user, host=host, dir=dir
+    # call (rather than Popen) blocks until the copy completes; split()
+    # rather than split(' ') drops the empty token when KEY_FLAGS is unset
+    subprocess.call('scp -v -o StrictHostKeyChecking=no {key} {dir}.tar {user}@{host}:/tmp/'.format(
+        user=user, host=host, dir=_dir, key=KEY_FLAGS
+    ).split())
 
-# TODO: ensure we aren't overwriting something
-def tar(user, host, dir):
-    subprocess.call('tar czf - {dir} > {dir}.tgz'.format(
-        user=user, host=host, dir=dir,
-    ).split(' '))
+    # ssh = SSHClient()
+    # ssh.load_system_host_keys()
+    # ssh.connect(hostname=host, username=user, key_filename=KEY_NAME)
+    # scp = SCPClient(ssh.get_transport())
+    # scp.put('{dir}.tar'.format(dir=_dir), remote_path='~')
+    # scp.close()
 
-    # # Result of this tar will get written to our the volume
-    # flags = os.environ['EBS_BACKUP_VERBOSE']
-    # if flags:
-    #     subprocess.call('ssh {flags} {user}@{host} tar czf - tmp/{dir} > {dir}.tgz'.format(
-    #         flags=flags, user=user, host=host, dir=dir,
-    #     ).split(' '))
-    # else:
-    #     subprocess.call('ssh {user}@{host} tar czf - tmp/{dir} > {dir}.tgz'.split(' '))
-
-def backup(user, host, dir):
-    subprocess.call('ssh {user}@{host} dd if=/tmp/{dir}.tgz of=/dev/sdf'.format(
-        user=user, host=host, dir=dir
+
+# TODO: the spec says ebs-backup must not use temporary files nor create a
+# local copy of the archive; for now this writes {dir}.tar locally and should
+# eventually stream the tar output straight over ssh into dd
+def tar(dir):
+    subprocess.call('tar -cvf {dir}.tar {dir}'.format(dir=dir).split())
+
+
+def backup_data(user, host, dir):
+    # TODO: make sure key exists
+    subprocess.call('ssh -v {key} {user}@{host} dd if=/tmp/{dir}.tar of=/dev/xbd1'.format(
+        user=user, host=host, dir=dir, key=KEY_FLAGS
+    ).split())
 
+
 def calculate_dir_size(dir):
     # TODO: make sure this is x-platform compatible
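+    # NOTE: os.scandir() only sums files at the top level of `dir`; nested
+    # directories are not counted.  A recursive alternative (hypothetical
+    # sketch) would be:
+    #   sum(f.stat().st_size for f in pathlib.Path(dir).rglob('*') if f.is_file())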
     total = sum(d.stat().st_size for d in os.scandir(dir) if d.is_file())
-    print(total)
     return total
 
+
+### Always cleanup ###
+def exit(ec2):
+    def handle():
+        ec2.cleanup()
+
+    return handle
+
+
 def backup(args):
     ec2 = None
-
     try:
         size = calculate_dir_size(args.dir)
 
-        # 1. create ec2 ...
-        ec2 = EC2(session, {
-            'zone_id': ZONE + 'a',
-        })
+        if args.i is None:
+            # 1. create ec2 ...
+            ec2 = EC2(session, {
+                'zone_id': ZONE + 'a',
+            })
+            # If the EC2 instance gets created let's be sure to cleanup
+            # regardless of manner of exit
+            atexit.register(exit(ec2))
+        else:
+            ec2 = EC2(session, {
+                'instance_id': args.i,
+            })
 
         # 2. volume [creation] attachment ...
+
         vol = Volume(session, {
             'volume_id': args.v,
-            'size': size * 2,
+            'size': size,
         })
         ec2.wait_for_instance()
+
         vol.attach_to_instance(ec2.instance_id)
 
         # 3. tar ...
-        tar('root', ec2.get_ip(), args.dir)
+        tar(args.dir)
 
         if args.l is not None:
             # 4. apply local-filter ...
@@ -99,12 +116,25 @@ def backup(args):
             pass
 
         # 7. copy to volume ...
-        backup('root', ec2.get_ip(), args.dir)
+        upload('root', ec2.get_ip(), args.dir)  # stage the archive in /tmp on the instance
+        backup_data('root', ec2.get_ip(), args.dir)
+
+        vol.cleanup(ec2.get_id())
 
         # 8. teardown ...
         ec2.cleanup()
-    except:
-        ec2.cleanup()
+
+        # remove the local tar created above now that it has been uploaded
+        os.remove('{dir}.tar'.format(dir=args.dir))
+
+        # If successful, ebs-backup prints the volume-id of the volume to
+        # which it backed up the data as the only output.
+        print(vol.id())
+    except Exception as ex:
+        # print a meaningful error message to STDERR, clean up, and exit >0
+        print(ex, file=sys.stderr)
+        if ec2 is not None:
+            ec2.cleanup()
+        sys.exit(1)
+
 
 if __name__ == '__main__':
     args = parse_args(sys.argv[1:])
diff --git a/src/volume.py b/src/volume.py
index 694761b..546d97f 100644
--- a/src/volume.py
+++ b/src/volume.py
@@ -10,6 +10,10 @@ class Volume(object):
     ec2_volume = None
     volume_id = None
 
+    # chosen from: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/device_naming.html
+    # note: the name of the device on the machine doesn't necessarily match the volume
+    device_name = "/dev/xbd1"  # Just for example. Should determine based on instance type
+
     def __init__(self, session, config=None):
         if config is None:
             config = {}
@@ -18,13 +22,21 @@ class Volume(object):
 
         # for new volumes
         if not ('volume_id' in config) or config['volume_id'] is None:
+
+            # volume sizes are specified in whole gigabytes: reserve twice
+            # the directory size, rounded up, with a minimum of 1 GB
+            true_size = max(1, -(-config['size'] * 2 // 1000 ** 3))
+
             res = self.ec2_client.create_volume(
                 AvailabilityZone='us-east-1a',
                 # TODO: this needs to be 2x the dir that is being backed up
-                Size=config['size'],
+                Size=true_size,
             )
             self.volume_id = res['VolumeId']
+        else:
+            self.volume_id = config['volume_id']
         # for existing volumes
+        # TODO: what if the existing volume is not large enough
         self.ec2_volume = session.resource('ec2').Volume(self.volume_id)
         if self.ec2_volume is None:
             raise Exception("The provided volume ID does not exist")
@@ -34,29 +46,22 @@ class Volume(object):
     # After volume is attached, it must be made available
     # TODO: Must find out how to determine suitable raw disk device, may differ depending on the instance type
     def attach_to_instance(self, instance_id):
-        # chosen from: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/device_naming.html
-        device_name = "/dev/sdf"  # Just for example. Should determine based on instance type
-        try:
-            res = self.ec2_volume.attach_to_instance(
-                Device=device_name,
-                InstanceId=instance_id,
-                DryRun=True
-            )
-            # Note: potential metadata we might be interested in
-            res['ResponseMetadata']
-        except ClientError as e:
-            if 'DryRunOperation' not in str(e):
-                raise
-        try:
-            res = self.ec2_volume.attach_to_instance(
-                Device=device_name,
-                InstanceId=instance_id,
-                DryRun=False
-            )
-            # Note: potential metadata we might be interested in
-            res['ResponseMetadata']
-        except ClientError as e:
-            print(e)
+        res = self.ec2_volume.attach_to_instance(
+            Device=self.device_name,
+            InstanceId=instance_id,
+            DryRun=False
+        )
+        # Note: potential metadata we might be interested in
+        res['ResponseMetadata']
+
+    def cleanup(self, instance_id):
+        self.ec2_volume.detach_from_instance(
+            Device=self.device_name,
+            InstanceId=instance_id,
+        )
+
+    def id(self):
+        return self.volume_id
 
     def delete(self):
         # TODO: ensure `volume_id` is set

-----------------------------------------------------------------------

Summary of changes:
 ebs-backup.txt          | 172 ++++++++++++++++++++++++++++++++++++++++++++++++
 src/README              |   7 +-
 src/argument_parsing.py |  10 +++
 src/ec2.py              |  69 +++++++++++--------
 src/main.py             | 102 ++++++++++++++++++----------
 src/volume.py           |  53 ++++++++-------
 6 files changed, 324 insertions(+), 89 deletions(-)
 create mode 100644 ebs-backup.txt


hooks/post-receive
-- 
CS615 EBS-BACKUP; backup a directory into Elastic Block Storage (EBS)

