[cs615asa] New York SRE Tech Talks Report

szhong2 szhong2 at stevens.edu
Mon May 1 15:42:20 EDT 2017


Hi everyone,

I attended the New York SRE Tech Talk last week, and here are my summary, 
thoughts and comments on the event. I chose this tech talk because I 
have always been interested in the field of SRE. I think this field is 
not only about site reliability; its methodologies and architectures can 
be applied to many other kinds of software engineering. Plus, it’s 
awesome to have a chance to take a little tour of Facebook’s New York 
office.

Back to the tech talk. The first topic of the night was "Linux Network 
Switches", which was the part I was most interested in. This topic was 
mainly about the Clos topology, which Shapeways uses to manage their two 
data centers.

Compared to traditional "tree-like" topologies, the Clos topology has 
some advantages. In a tree-like topology, each physical host on a rack 
is connected to a single switch, which is likely mounted on the same 
rack. The switch on each rack is connected to a parent switch in the 
data center or server farm, so that different racks can communicate with 
each other. If there is more than one data center, a root router routes 
traffic across the data centers. So this type of topology has at least 
three layers: root router, parent switches and hosts. If a host wants to 
exchange data with a host in a different data center, the traffic takes 
at least three hops.

With the Clos topology, things are simpler. There are only two layers in 
the network: spine and leaf. Leaf means the physical hosts in the 
network, and the spine plays the role of the root router. It is possible 
to have more than one spine. In the abstract, each leaf host is directly 
connected to all the spines, and the spines handle all the routing 
between hosts. So in the Clos topology, only one hop (through a spine) 
is needed when a leaf host wants to talk to any other leaf host.
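
To make the hop comparison concrete, here is a minimal sketch in Python. 
It is only an illustration: the device names and link lists are made up, 
and a "hop" is counted as a device sitting strictly between the two end 
hosts on a shortest path, which is how the counting above is interpreted 
here.

```python
from collections import deque


def intermediate_devices(links, src, dst):
    """Count devices strictly between src and dst on a shortest path (BFS)."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    distance = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return distance[node] - 1
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return None  # dst is unreachable from src


# Hypothetical tree: host -> rack switch -> parent switch -> root router.
tree_links = [
    ("host-a", "rack-sw-1"), ("rack-sw-1", "parent-sw-1"),
    ("parent-sw-1", "root"), ("root", "parent-sw-2"),
    ("parent-sw-2", "rack-sw-2"), ("rack-sw-2", "host-b"),
]

# Hypothetical Clos network: every leaf is linked to every spine.
clos_links = [
    (leaf, spine)
    for leaf in ("leaf-a", "leaf-b", "leaf-c")
    for spine in ("spine-1", "spine-2")
]

tree_hops = intermediate_devices(tree_links, "host-a", "host-b")
clos_hops = intermediate_devices(clos_links, "leaf-a", "leaf-b")
print(tree_hops, clos_hops)
```

In this toy tree the path between the two hosts crosses five devices, 
while any two leaves in the Clos example are separated by a single 
spine.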

Also, the simplicity of the Clos topology brings stability and 
scalability. Because there can be more than one spine and every leaf 
host is connected to every spine, redundancy can easily be achieved by 
duplicating the spine layer. If one spine is down, the other spines keep 
working and the whole network still functions normally. In tree-like 
topologies, because there are multiple layers and each layer has its own 
scope and function, redundancy is much more expensive to achieve.
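
The redundancy argument can be checked with a small reachability sketch. 
Again this is only an illustration under assumed names: the topologies 
are hypothetical, and a failed device is modeled simply by dropping 
every link that touches it.

```python
def reachable(links, src, failed=frozenset()):
    """Return the set of devices reachable from src, ignoring failed ones."""
    graph = {}
    for a, b in links:
        if a in failed or b in failed:
            continue  # a link to a failed device is unusable
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen = {src}
    stack = [src]
    while stack:
        node = stack.pop()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


leaves = ["leaf-a", "leaf-b", "leaf-c"]
spines = ["spine-1", "spine-2"]
# Hypothetical Clos network: every leaf is linked to every spine.
clos_links = [(leaf, spine) for leaf in leaves for spine in spines]

# Hypothetical tree with a single parent switch joining two racks.
tree_links = [("rack-sw-1", "parent-sw"), ("rack-sw-2", "parent-sw")]

# One spine down: all leaves can still reach each other via the other spine.
clos_ok = set(leaves) <= reachable(clos_links, "leaf-a", failed={"spine-1"})

# Parent switch down: the two racks are cut off from each other.
tree_ok = "rack-sw-2" in reachable(tree_links, "rack-sw-1", failed={"parent-sw"})

print(clos_ok, tree_ok)
```

With two spines the Clos example survives the loss of either one, while 
the tree example loses connectivity as soon as its only parent switch 
fails. Making the tree survive would mean duplicating switches at every 
layer.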

The speaker from Shapeways also mentioned that in their Clos network, 
they use BGP route advertisements to update routing information. When a 
new leaf host comes up or a new spine is added, an advertisement 
propagates the new routing information to the rest of the network. This 
reduces the two-way queries hosts would otherwise need to locate one 
another.
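
As a rough illustration of the advertising idea, here is a toy 
path-vector sketch in Python. It is not real BGP and not Shapeways' 
configuration: the Node class, the node names and the prefix are all 
hypothetical. The only things borrowed from BGP are that a route 
advertisement carries the path it has travelled, and that a node ignores 
an advertisement whose path already contains itself.

```python
class Node:
    """Toy router: learns routes only from advertisements sent by peers."""

    def __init__(self, name):
        self.name = name
        self.peers = []
        self.routes = {}  # prefix -> path to the origin, next hop first

    def connect(self, other):
        self.peers.append(other)
        other.peers.append(self)

    def announce(self, prefix):
        """Originate a prefix and advertise it to every peer."""
        self.routes[prefix] = []
        for peer in self.peers:
            peer.receive(prefix, [self.name])

    def receive(self, prefix, path):
        if self.name in path:
            return  # loop prevention: this node is already on the path
        known = self.routes.get(prefix)
        if known is not None and len(known) <= len(path):
            return  # an equal or shorter path is already installed
        self.routes[prefix] = path
        for peer in self.peers:
            if peer.name not in path:
                peer.receive(prefix, [self.name] + path)


spines = [Node("spine-1"), Node("spine-2")]
leaves = [Node("leaf-a"), Node("leaf-b"), Node("leaf-c")]
for leaf in leaves:
    for spine in spines:
        leaf.connect(spine)

PREFIX = "10.0.3.0/24"  # hypothetical prefix owned by leaf-c
leaves[2].announce(PREFIX)

for node in spines + leaves:
    print(node.name, node.routes[PREFIX])
```

After the single announcement every node has a route to the prefix 
without having asked for it: the spines reach leaf-c directly, and the 
other leaves reach it through a spine.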

In conclusion, the Clos topology reduces the cost of the hardware layers 
and is more efficient, scalable and stable than some traditional network 
topologies. I think the Clos topology is useful not only as a network 
topology; the architecture can also be ported to other fields. For 
example, the two-layer architecture can easily be ported to a 
distributed system, in which the spines represent the master nodes and 
the leaves represent the slave nodes. The advertising mechanism can be 
used to maintain things like the “finger table” in a distributed 
system.

The other two topics of the night were “Automating The Linux Kernel 
Validation” and “Dependency Traps and Gotchas”. The first was about 
testing the Linux kernel and CI/CD for the kernel at Facebook. The 
latter was about traps, like deadlocks, in a system consisting of 
different microservices. These two topics were not as interesting to me 
as the first one, so I am not going to elaborate on them in this report. 
I am sure classmates who attended the same event will have more 
detailed descriptions of these two topics.

That's all, thanks for reading.


Shaoliang Zhong

