[cs615asa] Meetup Summary

bsakthiv bsakthiv at stevens.edu
Tue Apr 24 13:01:35 EDT 2018


Meetup Information:
Date: April 12, 2018
Topics: Efficient Autoscaling at HBO & Traffic Control Strategies
with Envoy
Link: 
https://www.meetup.com/New-York-Kubernetes-Meetup/events/248205155/

I recently attended this meetup in NYC, organized by the New York
Kubernetes Meetup group. My primary reason for attending was my
interest in learning about handling heavy network traffic, and in
getting more details about traffic congestion and the tools used to
control it. Most of the attendees in this group were SysAdmin
professionals.

There were two talks, by James Polera and Mark McBride.
Before the talks, the host Ariel Jatib presented two products:
1. Sysdig Secure - Identify, block, and analyze unauthorized activity
anywhere in your system. Built on deep container visibility, combined
with Kubernetes, Docker, and Mesos integration, to better defend your
services. It is primarily used to monitor and resolve incidents
faster. The metrics collected are used for network forensic analysis.
2. Tigera - The first secure application connectivity solution designed 
from the ground-up for cloud-native environments, including multi-cloud 
deployments, that also connects and protects legacy applications running 
on virtual machines and bare metal hosts.

Talk1: "Efficient Autoscaling at HBO"
James Polera is a Site Reliability Engineer at HBO Digital Products 
where he works alongside his teammates across different engineering 
teams, with a focus towards making the underlying platform and 
microservice architecture performant, scalable and reliable for HBOGO 
and HBONOW.

Polera started by showing how Kubernetes Services are implemented at
HBO. kube-proxy is responsible for writing the iptables rules that
implement K8s Services, and he showed how to inspect them by running
commands like: iptables-save | grep <service-ip>
He then proceeded to discuss using HTTP keep-alive in a microservice
architecture, focusing on how keep-alive configuration can affect
autoscaling.
Service-to-service communication:
Without keep-alive, the API gateway pod needs to do the following for
every request to an accounts pod:
- DNS lookup for accounts
- Create an HTTP connection
- Make a request
To cut down on this overhead, enable keep-alives in the API gateway
to accounts interaction, and reuse the existing connections for
requests to accounts:
- DNS lookup for accounts
- Create an HTTP connection
- Persist the connection
- Make requests over the persisted connection
He spoke about a K8s feature that lets deployments resize
automatically based on demand: the Horizontal Pod Autoscaler (HPA).
Long-lived keep-alive connections are preferable when there is
periodic heavy load, which he demonstrated using real-world metrics
graphs.

He referred to some keep-alive examples used in client-side
settings:
https://www.npmjs.com/package/keep-alive
https://golang.org/pkg/net/http/#DefaultTransport
https://golang.org/pkg/net/http/#DialerTransport


Talk2: "Traffic Control Strategies with Envoy"
Mark McBride is founder and CEO of Turbine Labs, makers of Houston, a
modern traffic management plane. Prior to Turbine Labs, he ran
server-side engineering at Nest, and before that he worked at Twitter
on migrating their Rails code base to JVM-based equivalents.

Mark talked about improving traffic control using Envoy with
Kubernetes on AWS. He mentioned the general benefits Kubernetes
brings:
-Creating new services is easier
-Deploying new service versions is easier
-Deploying smaller services is easier

He specified two major goals of traffic control:
1. Resilience - Distributed systems are never fully up; dealing with
failures should be straightforward.
2. Routing - Introducing new code into the call chain is a common
operation; it should be straightforward.
Envoy provides utilities such as service discovery, load balancing,
rate limiting, circuit breaking, stats, logging, and tracing for
modern service-oriented architectures. Envoy is open source and used
by Google, Lyft, Apple, etc.
He also mentioned the tools that come with Envoy:
-Stats on listeners, clusters, protocols, and more
-An admin server for direct observation and control

Mark demonstrated some examples using the wrk and curl commands to
measure the latency and success rate for each thread and connection
running against the clusters. He also showed statistics illustrating
how pending requests are queued one by one rather than overflowing.
Usually, when a system faces more traffic than it can handle, it
crashes. Envoy addresses this by supporting request limits on a
per-cluster basis. It also supports two priority groups, allowing
slots to be reserved for important traffic.
He showed a few more examples of safe route retries and traffic
shifting across multiple clusters and routes.

Example slides on Google Drive:
https://drive.google.com/open?id=1_CYLRs6McST2Tu-tKdCq6xhFn0mVmt7w

Finally, the speakers shared email addresses for applying to SysAdmin
positions. I thought they might be useful for anyone interested:
Stacey.Lowenwirth at hbo.com
mark at turbinelabs.io

Overall, I learned how persistent connections using HTTP keep-alive
help with load balancing traffic, and about traffic control
strategies with Envoy in a very large service-oriented architecture.
Many companies, such as Google, Apple, Lyft, Microsoft, Netflix,
Verizon, and Salesforce, rely on Envoy for their traffic control, and
in the future many more will likely do so.

Relevant links:
https://sysdig.com/product/secure/
https://www.tigera.io/




More information about the cs615asa mailing list