[cs615asa] end of the semester

Jan Schaumann jschauma at stevens.edu
Sat May 14 13:20:44 EDT 2022


All,

I've just submitted the final grades based on the CtF
(all of you who actively participated received full
credit), your meetup, presentation, and quality of
your course notes together with the homework grades
you already received and weighted as outlined on the
course website.

In the last pre-class checkpoint, I had asked if there
were any questions you still had, so let me take the
opportunity here to try to answer those questions
below.

This, then, concludes the semester.  Best of luck in
your further career, academic or otherwise!

-Jan


Remaining questions:

> If a preshared secret is leaked, what protections
> are there to prevent encrypted packets from being
> decrypted?

In general: none.  A leaked symmetric key by
definition implies a compromise of the data protected
by that key.  (The same holds for asymmetric keys,
although for bi-directional decryption capabilities
you'd need to have gained access to the private key of
both parties.)

Some protocols do attempt to _minimize_ the impact of
such a compromise by ensuring a shared key is only
used for one communication stream and is rotated
frequently.  Within TLS, for example, there exists the
concept of so-called "forward secrecy".  Recall that
TLS uses asymmetric key cryptography to negotiate a
symmetric key for the specific session; if instead the
same symmetric key was used for all communications,
then a passive attacker collecting the encrypted data
could, at a later point when they managed to
compromise the private key in question, decrypt all of
the previously observed encrypted data.

For this reason, TLS uses cipher suites that are based
on the Diffie-Hellman key exchange (DHE) algorithm
(including elliptic curve, or ECDHE); the Signal
messaging app (and WhatsApp, which implements that
same algorithm) uses the Double Ratchet Algorithm.  With
these, each session uses a unique symmetric key,
thereby making it impossible for an attacker to
retroactively decrypt all the content they observed.

See also:
https://en.wikipedia.org/wiki/Forward_secrecy
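The per-session key idea above can be sketched with a toy
Diffie-Hellman exchange.  This is illustrative only -- the prime is
tiny and nothing here is cryptographically secure; real DHE uses
2048+ bit groups or elliptic curves:

```python
# Toy Diffie-Hellman sketch: each session derives a fresh shared
# secret from ephemeral private keys, so a later compromise of one
# session's key reveals nothing about other sessions.
import secrets

P = 4294967291  # a small prime (2**32 - 5), for illustration only
G = 5           # generator

def dh_session_key():
    a = secrets.randbelow(P - 2) + 1  # Alice's ephemeral private key
    b = secrets.randbelow(P - 2) + 1  # Bob's ephemeral private key
    A = pow(G, a, P)                  # Alice sends A in the clear
    B = pow(G, b, P)                  # Bob sends B in the clear
    k_alice = pow(B, a, P)            # both sides compute the same
    k_bob = pow(A, b, P)              # shared secret independently
    assert k_alice == k_bob
    return k_alice

k1 = dh_session_key()  # session 1's key
k2 = dh_session_key()  # session 2 uses fresh ephemeral keys
```

Since the ephemeral private keys are discarded after each session, an
attacker who records the traffic and later steals a long-term private
key still cannot reconstruct the per-session secrets.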

> say we have a file system with 100 free inodes, and
> we create a virtual disk with its own file system of
> 100 inodes. Would adding files to the virtual disk
> take up inodes on the host?

No.  Inodes are filesystem specific and only exist
within that filesystem.  If you create a virtual disk,
then that disk -- effectively a file -- uses up one
inode on the filesystem that it resides on.  Creating
a new filesystem on the virtual disk will not utilize
any inodes of the parent.

So in a way, you could overcome the problem of having
run out of inodes, so long as you still have storage
space and a file that already exists: overwrite an
existing file and turn it into a virtual disk, create
a new filesystem on that virtual disk, then use a
loopback mount to mount the virtual disk in the
current hierarchy, and you now can store new files on
that virtual disk.

However, as funny as that may seem, this really is no
different from you mounting any other new filesystem,
whether it's backed by a physical disk, over the
network or anything else: you are not creating new
files on the existing filesystem (which remains out of
inodes), but you are adding a new filesystem with its
own set of inodes.

> Looking back to the boot process, why is the MBR
> written in little endian? Does this provide some
> optimization?

Like so many things, this is a historical accident:
the MBR originated within the IBM Personal Computer
XT; those CPUs were little-endian, and hence so was
the MBR in all IBM compatible PCs.

Other boot sectors on other hardware architectures
were different and may have used other endiannesses,
although UEFI currently also only supports
little-endian processors.
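You can see the convention directly in the MBR's boot signature: the
bytes 0x55 0xAA stored at offset 510 of the boot sector are the
16-bit value 0xAA55 when read little-endian.  A small sketch using
Python's struct module:

```python
# The MBR boot signature illustrates byte order: the same two bytes
# on disk decode to different 16-bit values depending on endianness.
import struct

signature_bytes = b"\x55\xaa"  # as laid out on disk at offset 510

(le,) = struct.unpack("<H", signature_bytes)  # little-endian 16-bit
(be,) = struct.unpack(">H", signature_bytes)  # big-endian 16-bit

print(hex(le))  # 0xaa55 -- how an x86 CPU reads it
print(hex(be))  # 0x55aa -- how a big-endian CPU would read it
```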

>  We talked about email addresses and how they can
>  basically just be anything.  Why have they not been
>  regulated to only allow certain characters/must
>  follow a certain pattern?

They do follow a certain pattern.  It just happens to
be a pattern much more complex than most people
consider. :-)  The protocol was originally specified
in RFC821 (from 1982), from a time when mail (and
internet services in general) were much more
permissive, allowing relaying and forwarding, for
example.  It also tried to retain compatibility with
other messaging- or communication systems that people
were used to (e.g., UUCP).

On top of that, email ties in with, but is not
equivalent to, local account names, which reflect the
different limitations of different systems; at the
same time, email is _also_ tied to what humans think
of as "names".

Coupled with plain text as the data format, it evolved
to allow many different edge cases as time progressed.
Were one to create a new communications system, then
I'm sure the format would be much more strictly
defined, but as so often, in order to retain backwards
compatibility with older systems, SMTP has to continue
to support those weird scenarios.

The good news is that most of the time one doesn't
need to care about that, and on some systems it's
entirely reasonable to put additional restrictions in
place and simply _not_ allow everything that's
_technically_ permitted.  But as people working with
SMTP on the administrative level, it's important to
understand what is possible.
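As a sketch of such a local restriction: the pattern below is
deliberately _stricter_ than what RFC 5321 permits, so it rejects
addresses with quoted local parts that are technically legal.  The
pattern is my own illustration, not a standard:

```python
# A restrictive local policy: plain dot-atom local parts only.
# Addresses like '"weird address"@example.com' (quoted local part
# containing a space) are legal per the RFC grammar, but a site may
# reasonably refuse to create or accept them.
import re

POLICY = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

def acceptable(addr: str) -> bool:
    """Accept only addresses matching our stricter local policy."""
    return POLICY.match(addr) is not None

print(acceptable("jschauma@stevens.edu"))         # True
print(acceptable('"weird address"@example.com'))  # False: legal, rejected
```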

> One subject I would like more clarification on would
> be the current situation with the transition from
> IPv4 to IPv6, particularly how quickly different
> sites and services have picked up IPv6, and how its
> adoption and availability are progressing.

In a word: poorly.  As we discussed, the IPv4 space is
already exhausted, but organizations are slow to adopt
and migrate to IPv6.  This is in part because it costs
a lot of time and effort (i.e., money) to perform
this migration, but businesses see no immediate
benefit, as they tend to remain focused on short term
gains.

On the other hand, there are many ways to continue to
operate in the current IPv4 world, by way of
reclaiming unused space, deploying NAT, a market for
IPv4 space, etc.  This doesn't solve the problem, but
lets you get away with not making difficult changes
for the time being.

Why are businesses and other organizations not
prioritizing this work, knowing that it is inevitable
and necessary in the long term?

Well, as an analogy, consider climate change: we all
know -- and have known for decades -- what we have to
do, but nobody is willing to drastically cut carbon
emissions, eliminate fossil fuels, change
transportation or eating habits, etc.  The experts are
loudly telling everybody that we can't go on like
this, yet nobody acts.

> What sort of functionality do services like Datadog
> provide and why are they so essential to modern
> systems administration? Also, how do systems
> administrators use these tools and how does it
> influence their workflow and day-to-day operations?

There is a lot of data that is collected in every
organization of any size.  As discussed in class, you
can collect metrics and datapoints for just about any
aspect of your system with the goal to gain visibility
into the operations, into trends and usage.

But collecting this data isn't always straightforward,
and combining different datapoints into
meaningful metrics is something that many people need
to do, so naturally there developed an industry around
this need.

Monitoring services promise to make it easy for you to
not only collect information, but to process it, slice
and dice it as you need, to present you with the
findings in an easy to consume manner, and to let you
further process the information they extract.  This
can be a tremendously helpful source of information
for you.

System Administrators interact with these services
generally in a programmatic fashion via an API.  That
is, they install an agent or data collector on the
endpoints or data aggregation systems that forwards
the data (possibly after some anonymization) to the
service, which then exposes the desired metrics in a
programmatic fashion as well as via some graphical
dashboards.

Most operations centers have a few walls of graphs and
trends derived from this information, to allow you to
quickly spot visually where outliers occur, but in the
back you also tend to then hook this monitoring system
into others: if a certain metric goes above a certain
threshold, it might for example trigger a page out to
the current oncall engineer to react; certain usage
patterns can even lead to automated scaling of
resources, with new systems spun up or shut down to
meet the need.

The fun part then is that these systems become
critical enough for you that you then need to monitor
the monitoring systems...
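The threshold-to-page pattern described above boils down to a simple
comparison; real monitoring services express this as monitor
configurations rather than inline code, so the following is only a
minimal sketch with a hypothetical metric name:

```python
# Minimal sketch of threshold alerting: compare a collected metric
# against a limit and signal that the oncall engineer should be paged.
def check_metric(name: str, value: float, threshold: float) -> bool:
    """Return True (page oncall) if the metric exceeds its threshold."""
    if value > threshold:
        print(f"ALERT: {name}={value} exceeds threshold {threshold}")
        return True
    return False

check_metric("disk_used_pct", 94.0, threshold=90.0)  # would page
check_metric("disk_used_pct", 42.0, threshold=90.0)  # fine, no page
```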

>  If there is no specific job description for a
>  System Administrator, then how  do companies hire
>  for them?

Hiring in the tech industry is... not well defined.
This applies equally to almost all jobs in the sector,
not just System Administration, by the way.

Companies generally hire by way of broad job
descriptions that describe the current and anticipated
duties and the desired background knowledge.  If you
browse the open positions in this field, you will find
that of course there's some overlap (understanding of
Unix, TCP/IP networking, scripting, some programming),
but that they then quickly differ in specifics.

Resume-ingesting systems and initial recruiters
screening the submissions generally perform some sort
of rough keyword matching and then run through a short
phone screen to verify that candidates can answer some
basic questions relevant to the position, before then
moving the candidate on to an interview panel where
people generally probe deeper based on their own
experience in the job.

But again, this really isn't all _that_ different from
interviewing for a software engineering or a program
manager position, for example.  Only in System
Administration, there is oftentimes a bit more
flexibility on background and more weight on previous
experiences, perhaps.

> Regarding filesystem snapshotting tools: Given that
> some of these tools work by restoring the root inode
> of a filesystem, the actual data itself would not
> actually be restored properly, correct?

Correct, if all you did was copy the inode itself but
didn't keep track of the disk blocks, then you would
lose that data indeed.  That is, if data on the data
blocks is actively overwritten, then it will be lost
in this scenario; otherwise, the data remains
available.

But I should note that filesystem snapshotting does
not just track the inode data, but also the data
blocks -- the illustrations in the videos may have
been simplifying things too much for better visual
understanding.

You can think of snapshotting filesystems as also
tracking data blocks in the same manner.  Visualize
them as a linked list of blocks: if one of the blocks
is modified, it is first copied, and the linked list
is updated to point to the new copy where you make
your changes; the original reference list of data
blocks remains available, so that when you want to
roll back, you restore the original list, thereby
again pointing to the original data blocks.
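That copy-on-write block list can be sketched in a few lines.  The
class below is purely illustrative -- a snapshot copies only the list
of block references, and a write copies the block before modifying
the live reference list, so rollback just restores the old list:

```python
# Sketch of copy-on-write snapshots: data blocks are never modified
# in place, so a saved reference list can always be restored.
class CowFS:
    def __init__(self, blocks):
        self.store = list(blocks)             # "disk": block contents
        self.refs = list(range(len(blocks)))  # live list of block indices
        self.snapshots = []

    def snapshot(self):
        self.snapshots.append(list(self.refs))  # copy references, not data

    def write(self, i, data):
        self.store.append(data)               # copy-on-write: new block...
        self.refs[i] = len(self.store) - 1    # ...live list points at it

    def rollback(self):
        self.refs = self.snapshots.pop()      # originals were never touched

    def read(self):
        return [self.store[i] for i in self.refs]

fs = CowFS(["aaa", "bbb"])
fs.snapshot()
fs.write(0, "AAA")
print(fs.read())  # ['AAA', 'bbb']
fs.rollback()
print(fs.read())  # ['aaa', 'bbb']
```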

Now if you make changes to the data blocks _outside
the filesystem_ (e.g., by writing data to the raw
disk), then of course the data there can be lost /
overwritten without the ability to restore it from a
snapshot, and you'd have to go back to the offline
backup.  You do have an offline backup, don't you?
