[cs615asa] Homework #2 revisited

Jan Schaumann jschauma at stevens.edu
Wed Mar 18 21:02:10 EDT 2015


Hello,

I've finished grading homework #2, and I will send out grades later
today.  Since there were many aspects where several of you came up with
the same solution or ran into the same problem, I will try to cover the
common issues in this email rather than in individual feedback.  Please
do take the time to carefully read this email and review your homework
submission to see what you might be able to improve and to better
understand the concept of package management.

As noted in the assignment, the objective was for you to produce a
package for 'awscli'; as is usual for any assignment in this class (and,
more generally, any task related to System Administration), there are a
number of hidden requirements and other surprises lurking.

The assignment should have given you an opportunity to build an RPM
package.  As before, your submission was to include the documentation as
well as the spec file used to create the package.  The resulting package
should install cleanly on a new system.

--

I may have mentioned this before, but you should create your
documentation on linux-lab.cs.stevens.edu and submit it in plain text or
HTML format.  DO NOT SEND ME .docx, .PDF, .zip, or .rar FILES.  If you
do, or if your HTML is generated by Microsoft Word (or something
similar), you will lose points.

This is not just evil pedantry on my side: if you copy and paste your
.spec file into a Word document and send that to me, it may end up with
Windows end-of-line characters and as a result not work when I try to
use it.
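If you want to check a file before sending it, file(1) will typically
report "with CRLF line terminators" for such a file.  Below is a minimal
POSIX shell sketch to detect and strip the carriage returns; the function
names are made up for illustration, and since strip_crlf rewrites the
file in place, keep a copy of the original if in doubt.

```shell
# Sketch: detect and remove Windows (CR-LF) line endings, e.g. in a .spec
# file that went through a word processor. Function names are illustrative.

# has_crlf FILE: succeed if FILE contains a carriage return.
has_crlf() {
    grep -q "$(printf '\r')" "$1"
}

# strip_crlf FILE: delete all carriage returns from FILE (rewrites it).
strip_crlf() {
    tr -d '\r' < "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}
```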

Please do not do your work on your laptop (regardless of its OS), but
instead on linux-lab.cs.stevens.edu.

--

Many of you submitted documentation with typos and errors.  For English
text, please use a spell-checker.  For commands and other content,
please double-check that what you are writing is correct.  If you
provide documentation and the commands you're providing are a mere
approximation of what users need to type, then your documentation is not
going to be very useful.  Pretend you are writing the docs for a general
audience, not just for me.

Similarly, make sure that the steps you provide are complete.  Don't
assume the user knows that the software archive must be placed in
~/rpmbuild/SOURCES; your documentation should be sufficient to follow
step by step to produce the package without any second-guessing.
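As an illustration of what "complete" might look like for the setup part,
here is a shell sketch of creating the build tree and staging the inputs.
It assumes rpmbuild's default top directory of ~/rpmbuild (as on current
RPM versions); TOPDIR, setup_tree, stage_sources and the file name
awscli.spec are hypothetical names used only for this example.  The
rpmdevtools package provides rpmdev-setuptree for a similar purpose.

```shell
# Sketch: prepare the rpmbuild tree and put the inputs where rpmbuild
# looks for them. TOPDIR, setup_tree and stage_sources are illustrative
# names; rpmbuild's default top directory is ~/rpmbuild.
TOPDIR=${TOPDIR:-$HOME/rpmbuild}

setup_tree() {
    for d in BUILD RPMS SOURCES SPECS SRPMS; do
        mkdir -p "$TOPDIR/$d" || return 1
    done
}

# stage_sources ARCHIVE SPECFILE: source archive into SOURCES/,
# spec file into SPECS/.
stage_sources() {
    cp "$1" "$TOPDIR/SOURCES/" && cp "$2" "$TOPDIR/SPECS/"
}

# Then build (awscli.spec is a hypothetical file name):
#   rpmbuild -ba "$TOPDIR/SPECS/awscli.spec"
```

If you point TOPDIR somewhere other than ~/rpmbuild, rpmbuild has to be
told as well, e.g. with rpmbuild --define "_topdir $TOPDIR" -ba ...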

--

Many of you found the useful "How to create an RPM package"
documentation at
https://fedoraproject.org/wiki/How_to_create_an_RPM_package.  Many of
you then proceeded to copy verbatim the first section of this document.
As I've mentioned before, I'm asking you to please read and _understand_
all the steps and commands you are running.  For example, most of you
explained that one should create a new user and add it to the 'mock'
group:

usermod -a -G mock makerpm

Not a single one of you explained why that would be necessary.  None of
you made use of any resources owned by a 'mock' group.

Please do not blindly copy or follow commands without understanding why
you should run them.

--

Many of you dutifully copied the paragraph on never building packages as
the 'root' user from that URL... and then proceeded to invoke sudo(8) as
the build user to install files into the running system:

%install
sudo ./awscli-1.7.13/install -i /usr/local/aws -b /usr/bin/aws

Again, please make sure to understand the words that you're copying into
your documentation.  The point of not building a package as 'root' is to
avoid changing the system on which you are building the package (the
"build system").  Installing the files you wish to package up on the
build system using sudo(8) defeats the point of having a dedicated build
user.

The RPM package system (like most other package managers) is
specifically designed to let you build a new package in a specified
destination directory, the so-called "build root".  By doing this, you
(a) avoid polluting the build system, (b) identify all the required
dependencies, and (c) allow the package to generate and verify the list
of files it should install (aka the "manifest").
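To make the build root idea concrete, the following shell sketch installs
files under a staging directory rather than under /, and then derives the
manifest from what ended up there.  In a real .spec file the staging
directory is %{buildroot}, and the install step is whatever the software
provides (for a Python distribution such as awscli, typically
"python setup.py install --root=%{buildroot}"); install_into and manifest
are hypothetical helpers.

```shell
# Sketch: install into a build root, never into the live system.
# install_into and manifest are hypothetical helpers; in a .spec file the
# build root is %{buildroot}.

# install_into BUILDROOT SRC DEST: copy SRC to BUILDROOT/DEST, creating
# parent directories (GNU install -D); nothing outside BUILDROOT changes.
install_into() {
    install -D -m 0755 "$2" "$1$3"
}

# manifest BUILDROOT: print installed files as paths on the target system.
manifest() {
    (cd "$1" && find . -type f -o -type l) | sed 's|^\.||' | sort
}
```

rpmbuild performs a comparable check on its own: files that land in the
build root but are not listed in %files cause an "Installed (but
unpackaged) file(s) found" error.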

--

The 'awscli' software is made available under the Apache License.  Not
the GPL.  This is a rather important difference, as the software license
defines what you can and cannot do with the software in question.  It
appears that many of you assume that if software is available in source
form, it must necessarily be "Open Source", and that "Open Source" is
equivalent to being licensed under the GNU General Public License.

Please use your favorite search engine to research the meaning of the
terms "Open Source", "Apache License", and "GNU General Public License".

--

The 'awscli' software is available via different methods.  Most of you
chose to retrieve it from one of the following locations:

- via curl(1) or wget(1) from
  https://s3.amazonaws.com/aws-cli/awscli-bundle.zip
- via git(1) from https://github.com/aws/aws-cli.git

This will get you the latest version of the code.  However, when
creating a software package, you want a specific version, so that you
can deploy the same version of the software across all your systems
rather than getting a different version each time you build the package.

When using the unversioned code -- i.e., the software development
branch, or 'HEAD' -- you also run into a number of problems: your .spec
file may assume that the (unversioned) .zip file extracts into a
versioned subdirectory (awscli-1.7.13, for example), or that the files
found in the archive are of a specific version.  Trying to build your
package will then fail every time the code or distribution is changed,
which is outside of your control.  As a result, several of you had a
package that you claimed contained version 1.7.12 or 1.7.13 of
'awscli', but by the time I tried to build it, it was already version
1.7.14 (with several of its bundled dependencies also having moved to
newer versions, so that the build necessarily failed).

(Some of you didn't pay attention to the version numbers at all, and
simply called your package "awscli-1.0", which is quite simply wrong.)

A better approach was to retrieve a fixed (or 'pinned') version of the
code, for example from:

- https://github.com/aws/aws-cli/archive/1.7.13.tar.gz
- https://pypi.python.org/packages/source/a/awscli/awscli-1.7.12.tar.gz

This way, you are sure that the files in the distribution do not change
in between your creating the package and somebody else doing so.  That
is, what you package as version 1.7.13 will in fact contain version
1.7.13 of the software.
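One way to keep the version in a single place, and to notice if the
"same" archive ever changes, is sketched below in shell.  The GitHub URL
pattern is the one quoted above; the version variable, the function names
and the checksum file awscli.sha256 are assumptions for illustration.  In
the .spec file itself, the equivalent is a Version: tag plus a Source0:
URL that uses %{version}.

```shell
# Sketch: pin one version and verify the downloaded archive.
# AWSCLI_VERSION, source_url, verify_archive and awscli.sha256 are
# illustrative names, not part of any tool.
AWSCLI_VERSION=1.7.13

# source_url VERSION: the tagged (pinned) GitHub archive for that version.
source_url() {
    echo "https://github.com/aws/aws-cli/archive/$1.tar.gz"
}

# verify_archive SUMFILE: check files against recorded "sha256  name" lines;
# fails if the archive differs from the one you originally packaged.
verify_archive() {
    sha256sum -c "$1"
}
```

You would download once (GitHub redirects, so use curl -L), record the
checksum with sha256sum 1.7.13.tar.gz > awscli.sha256, and run
verify_archive awscli.sha256 before every subsequent build.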

--

Several of you identified the command rpmlint(1) as being helpful.
Unfortunately, all of you who did so still had rpmlint(1) generate
warnings when run against your .spec file.  Just like with warnings
generated by a compiler or any other tool you run, you should pay
attention to them and address them.
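A rough sketch of treating rpmlint warnings as failures, in shell.  It
assumes rpmlint's usual closing summary line (e.g. "0 packages and 1
specfiles checked; 0 errors, 2 warnings.") and that a non-zero exit
status signals errors; the exact wording varies between rpmlint versions,
so check it against the version you have installed.  count_warnings and
lint_clean are hypothetical names.

```shell
# Sketch: fail a build step when rpmlint reports any warnings.
# ASSUMPTION: rpmlint ends with a summary such as
#   "0 packages and 1 specfiles checked; 0 errors, 2 warnings."

# count_warnings: read rpmlint output on stdin, print the warning count
# from the summary line (prints nothing if no summary line is found).
count_warnings() {
    sed -n 's/.*checked;.* \([0-9][0-9]*\) warning.*/\1/p' | tail -n 1
}

# lint_clean SPECFILE: show rpmlint's output; succeed only if rpmlint
# exits 0 (assumed to mean no errors) and the summary reports 0 warnings.
lint_clean() {
    out=$(rpmlint "$1")
    status=$?
    printf '%s\n' "$out"
    [ "$status" -eq 0 ] &&
        [ "$(printf '%s\n' "$out" | count_warnings)" = 0 ]
}
```

If the summary line is missing, count_warnings prints nothing and
lint_clean fails, which errs on the side of caution.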

--

Several of you noted that it's necessary to add the following line to
your ~/.rpmmacros file:

%__arch_install_post /usr/lib/rpm/check-rpaths /usr/lib/rpm/check-buildroot

None of you explained why.  What problem does this solve?  Why is this
necessary?

Again, make sure you understand (and in the case of providing
documentation, explain) all the commands you're using.

--

'awscli' requires a few things to be able to run.  The task of the
packager (i.e. you) is to explicitly describe what other packages are
needed, so that the tools (rpm(8), yum(8), etc.) can automatically
resolve them.

With 'awscli' this is a little tricky: this software depends on some
python modules, but not all of them are available as RPMs.  However,
some of them are.  Here you need to find a way to provide those that are
not available (e.g. botocore, bcdoc), while using the existing
'Requires' mechanism to define the ones that are available (e.g.
python-docutils).
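A sketch of sorting dependencies into those two groups, written as shell.
It assumes that "yum -q info NAME" exits non-zero when no package of that
name is available in your configured repositories, and package names such
as python-botocore are hypothetical; PKG_CHECK and classify_deps are
illustrative names.  The output is meant to be pasted into the .spec file
or turned into a to-do list.

```shell
# Sketch: split dependencies into "Requires:" lines and ones that still
# need their own RPM.
# ASSUMPTION: "yum -q info NAME" exits non-zero if no such package exists
# in the configured repositories. Override PKG_CHECK to use another test.
PKG_CHECK=${PKG_CHECK:-"yum -q info"}

classify_deps() {
    for dep in "$@"; do
        # $PKG_CHECK is deliberately unquoted so it splits into command + args.
        if $PKG_CHECK "$dep" >/dev/null 2>&1; then
            echo "Requires: $dep"
        else
            echo "# no RPM available for $dep; it needs its own package"
        fi
    done
}
```

For example, classify_deps python-docutils python-botocore prints one line
per name; which group each lands in depends on what your repositories
actually offer.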

There is no perfect solution for this other than building RPMs for those
dependencies without existing RPMs.  Several of you chose to use the
'easy_install' or 'pip' commands to install the missing packages in the
'%pre' or '%post' section of the RPM.  This works in most cases, but has
the following drawbacks:

- If the system on which you're installing the RPM is not able to talk
  to the internet, the software can't be installed; this is not as
  unlikely as you may think: letting production systems randomly talk to
  the internet is not a good idea and firewall rules often prohibit
  this.

- You rely on another tool to pull in random files from the internet at
  package installation time.  If you provide a package yourself, you have
  full control over the software and the ability to QA it.

- You assume that 'easy_install' or 'pip' are installed on the system.
  This assumption does not always hold; you'd have to make the software
  packages that provide these tools a dependency.

- Your package does not declare a dependency on these files; if you
  later check for the existence of files on your host that do not belong
  to any package and identify the 'botocore' python package, for
  example, you might choose to remove it, because nothing on your system
  uses this software, as far as you can tell.  That is, you cannot
  possibly know all the software dependencies on your host -- after all,
  that is what the package manager is for.  Removing this software will
  break 'awscli'.  Similarly, you may not even be aware that the
  'botocore' software is installed.  If a vulnerability in 'botocore' is
  announced, you won't patch your systems, because you think you are not
  affected.
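As a concrete version of the audit described in the last point, this
shell sketch lists the files under a directory that no installed RPM
claims.  It relies on rpm -qf exiting non-zero for files not owned by any
package; unowned_files and OWNER_CHECK are hypothetical names, and the
directory you would scan (e.g. your Python site-packages directory)
depends on the system.

```shell
# Sketch: list files under a directory that no installed RPM owns.
# "rpm -qf FILE" exits non-zero for files not owned by any package.
# OWNER_CHECK is a hypothetical override hook (e.g. for testing).
OWNER_CHECK=${OWNER_CHECK:-"rpm -qf"}

unowned_files() {
    find "$1" -type f | sort | while IFS= read -r f; do
        # Print the path only when the ownership check fails.
        $OWNER_CHECK "$f" >/dev/null 2>&1 || echo "$f"
    done
}
```

Files that pip or easy_install dropped in from a %post script would show
up in this list, which is exactly why an administrator might be tempted
to remove them.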

Some of you tried to solve the dependency and packaging problem by using
this approach in the .spec file:

%install
curl "https://bootstrap.pypa.io/get-pip.py" -o "get-pip.py"
sudo python get-pip.py
sudo pip install awscli 

This defeats the point of having an RPM altogether, and is not a
reasonable solution.  (It is even worse if you do not explain in your
documentation how you arrived at this solution or why you think it's
reasonable.)

Some of you decided to install the software (in the buildroot) using the
'install' script provided in the awscli package:

%install
rm -rf $RPM_BUILD_ROOT
./install -i $RPM_BUILD_ROOT%{_prefix}/aws
mkdir %{buildroot}/usr/bin
ln -s ../aws/bin/aws $RPM_BUILD_ROOT/usr/bin/aws

This solves the problem of specifying dependencies, since this bundles
everything 'aws' needs under the $PREFIX/aws directory.

However, this also means that you end up with a whole other copy of
'python' and all possible dependencies in this location.  That is, your
package does not take advantage of many of the things a package manager
is good at (such as identifying already installed dependencies), and you
again lose an accurate record of which software, at which version, is
present on your system: if you upgrade 'python', you are not going to
(be able to / remember to) update this bundled copy.

Any time your software wants its own directory somewhere with all sorts
of additional things installed, that hints at a packaging problem.

So how should we solve this problem?  As I said, the correct solution
would be to build RPMs for all the dependencies that do not exist in RPM
format.  I understand that you didn't have the time (or desire) to do
so, but I was looking for some thoughts from you on how to address this
alongside your solution.

--

Once you have produced an RPM from the sources, make sure to test it on
a new system (i.e. not the build
system).  This helps you verify that you have outlined all your
dependencies and are not (unknowingly) depending on software you may
have previously installed on the build system.

When installing a package, there should be no warnings.  You most
certainly must not use "--force" to install the package -- that breaks
many of the benefits of using a package manager to begin with.

Finally, after you have installed the package, make sure that the
command actually works.
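A minimal outline of those checks as a shell function, meant to be run on
the fresh test machine.  smoke_test is a hypothetical name, it assumes
sudo and yum are available there, and "aws --version" is only the most
basic check that the command runs at all.

```shell
# Sketch: post-install checks on a fresh machine (NOT the build system).
# Usage: smoke_test path/to/awscli-VERSION.noarch.rpm   (file name hypothetical)
smoke_test() {
    # 1. Review the dependencies the package declares.
    rpm -qp --requires "$1" || return 1
    # 2. Install through yum so declared dependencies are resolved;
    #    no --force, and any warnings here deserve attention.
    sudo yum -y localinstall "$1" || return 1
    # 3. Make sure the installed command actually runs.
    aws --version
}
```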

--

As you can see, and have probably found out during the exercise, what
may have seemed like a simple task can lead to a surprising number of
problems and turn out to be more difficult than it first appeared.
Welcome to System Administration.

The next time you find yourself needing to create a package for a piece
of software, try to keep in mind that the objective is to create a
well-defined package that accurately describes the software in question
with sufficient granularity and explicit dependencies.  Getting there
isn't always easy.

-Jan

