[cs615asa] Meetup Summary
bsakthiv
bsakthiv at stevens.edu
Tue Apr 24 13:01:35 EDT 2018
Meetup Information:
Date: April 12, 2018
Topics: Efficient Autoscaling at HBO & Traffic Control Strategies with
Envoy
Link:
https://www.meetup.com/New-York-Kubernetes-Meetup/events/248205155/
I recently attended this meetup in NYC, organized by the New York
Kubernetes Meetup group. My primary reason for attending was that I am
interested in learning how very large volumes of network traffic are
handled, and I wanted more detail about traffic congestion and the tools
used to control it. Most of the attendees in this group were SysAdmin
professionals.
There were two talks, by James Polera and Mark McBride.
Before the talks, the host, Ariel Jatib, presented two products:
1. Sysdig Secure - Identify, block, and analyze unauthorized activity
anywhere in your system. Built on deep container visibility, combined
with Kubernetes, Docker, and Mesos integration to better defend your
services. It is primarily used to monitor and resolve incidents faster.
The metrics collected are used for network forensic analysis.
2. Tigera - The first secure application connectivity solution designed
from the ground-up for cloud-native environments, including multi-cloud
deployments, that also connects and protects legacy applications running
on virtual machines and bare metal hosts.
Talk1: "Efficient Autoscaling at HBO"
James Polera is a Site Reliability Engineer at HBO Digital Products
where he works alongside his teammates across different engineering
teams, with a focus towards making the underlying platform and
microservice architecture performant, scalable and reliable for HBOGO
and HBONOW.
Polera started by showing how Kubernetes Services are implemented at
HBO, covering K8s Services and kube-proxy. Kube-proxy is responsible for
writing the iptables rules that implement K8s Services. He showed how to
inspect them by filtering the iptables-save output for a Service's IP
address: iptables-save | grep <IP address>
He then moved on to using HTTP keep-alive in a microservice
architecture, focusing on how keep-alive configuration can have an
impact on autoscaling.
Service-to-Service communication:
In this scenario, the API gateway pod needs to do the following to reach
an accounts pod:
- DNS lookup for accounts
- Create an HTTP connection
- Make a request
To cut down on this overhead, keep-alives can be enabled in the API
gateway to accounts interaction. Instead of opening a new connection
each time, the gateway reuses existing connections for requests to
accounts:
- DNS lookup for accounts
- Create an HTTP connection
- Persist the connection
- Make a request
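The difference between the two sequences above can be sketched with
Python's standard library. This is only a local illustration, not HBO's
setup: the demo server, the helper names get_without_keepalive and
get_with_keepalive, and the request count are all assumptions, and the
server listens on 127.0.0.1 so no DNS lookup is involved.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

accepted_connections = 0  # TCP connections accepted by the demo server

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 keeps connections open by default

    def setup(self):
        # setup() runs once per accepted TCP connection, not once per request.
        global accepted_connections
        accepted_connections += 1
        super().setup()

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

def get_without_keepalive(n):
    """Open a fresh connection for every request."""
    for _ in range(n):
        conn = http.client.HTTPConnection(host, port)
        conn.request("GET", "/", headers={"Connection": "close"})
        conn.getresponse().read()
        conn.close()

def get_with_keepalive(n):
    """Open one connection and reuse it for every request."""
    conn = http.client.HTTPConnection(host, port)
    for _ in range(n):
        conn.request("GET", "/")
        conn.getresponse().read()  # drain the body so the socket can be reused
    conn.close()

get_without_keepalive(5)
without = accepted_connections
get_with_keepalive(5)
with_keepalive = accepted_connections - without
print("connections without keep-alive:", without)
print("connections with keep-alive:", with_keepalive)

server.shutdown()
server.server_close()
```

The second helper pays the connection setup cost once and then sends
every request over the same socket, which is the saving the talk
described.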
He spoke about a Kubernetes tool that automatically resizes deployments
based on demand: the Horizontal Pod Autoscaler (HPA). He said that
long-lived keep-alive is preferable when there is periodic heavy load,
and demonstrated this using graphs of real-world metrics.
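For context on how the HPA decides how far to resize, the Kubernetes
documentation gives the core rule as desiredReplicas = ceil(currentReplicas
* currentMetricValue / desiredMetricValue). A minimal sketch of that rule
follows. The function name and the sample numbers are assumptions for
illustration, not figures from the talk, and the sketch ignores the
tolerance, min/max replica bounds, and stabilization behavior of the real
controller.

```python
import math

def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float) -> int:
    """Replica count from the documented HPA rule, without tolerance or bounds."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    return math.ceil(current_replicas * current_metric / target_metric)

# Hypothetical numbers: 4 pods averaging 90% CPU against a 60% target.
scale_up = desired_replicas(4, 90.0, 60.0)
# Hypothetical numbers: 4 pods averaging 30% CPU against a 60% target.
scale_down = desired_replicas(4, 30.0, 60.0)
print(scale_up, scale_down)
```

Because the rule works on the average metric across pods, how evenly
requests are spread over connections can affect what the autoscaler sees.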
He referred to some keep-alive examples for client-side settings:
https://www.npmjs.com/package/keep-alive
https://golang.org/pkg/net/http/#DefaultTransport
https://golang.org/pkg/net/http/#DialerTransport
Talk2: "Traffic Control Strategies with Envoy"
Mark McBride is founder and CEO of Turbine Labs, makers of Houston, a
modern traffic management plane. Prior to Turbine Labs, he ran
server-side engineering at Nest. Before that he worked at Twitter,
migrating their Rails code base to JVM-based equivalents.
Mark talked about better traffic control using Envoy with Kubernetes on
AWS. He mentioned the general good things Kubernetes leads to:
- Creating new services is easier
- Deploying new service versions is easier
- Deploying smaller services is easier
He described two major goals of traffic control:
1. Resilience - Distributed systems are never entirely "up". Dealing
with failures should be straightforward.
2. Routing - Introducing new code to the call chain is a common
operation. It should be straightforward.
Envoy provides utilities such as service discovery, load balancing, rate
limiting, circuit breaking, stats, logging, and tracing for modern
service-oriented architectures. Envoy is an open-source proxy used by
Google, Lyft, Apple, and others.
He also mentioned the tools that come with Envoy:
- Stats on listeners, clusters, protocols and more
- An admin server for direct observation and control
Mark demonstrated some examples using the wrk and curl commands to
measure latency and success rate across the threads and connections
running against the clusters. He also showed statistics illustrating how
pending requests are queued one by one rather than overflowing.
Usually, when a system receives more traffic than it can handle, it
crashes. Envoy addresses this by supporting request limits on a
per-cluster basis. It also supports two priority groups, which allows
slots to be reserved for important traffic.
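The idea of a per-cluster request limit with two priority groups can be
sketched as a small in-flight counter. This is an illustrative model
only, not Envoy's implementation or its configuration format. The class
name ClusterLimiter, the priority labels, and the limits are assumptions.

```python
class ClusterLimiter:
    """Tracks in-flight requests per priority and rejects once a limit is hit."""

    def __init__(self, max_default: int, max_high: int):
        # Each priority group has its own budget, so ordinary traffic
        # cannot use up the slots reserved for important traffic.
        self.limits = {"default": max_default, "high": max_high}
        self.in_flight = {"default": 0, "high": 0}

    def try_acquire(self, priority: str = "default") -> bool:
        if self.in_flight[priority] >= self.limits[priority]:
            return False  # over the limit: fail fast instead of piling on work
        self.in_flight[priority] += 1
        return True

    def release(self, priority: str = "default") -> None:
        if self.in_flight[priority] > 0:
            self.in_flight[priority] -= 1

limiter = ClusterLimiter(max_default=2, max_high=1)
default_results = [limiter.try_acquire() for _ in range(3)]
high_result = limiter.try_acquire("high")
print(default_results, high_result)

limiter.release()  # one default request finishes, freeing a slot
after_release = limiter.try_acquire()
```

Rejecting the third default request right away, while the high-priority
request is still admitted, is the behavior that keeps an overloaded
cluster from being pushed over.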
He showed a few more examples covering safe route retries and traffic
shifting across multiple clusters and routes.
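Traffic shifting is commonly done by weighting routes so that a small
share of requests reaches a new version. A minimal sketch of that
weighting follows, assuming two hypothetical clusters named v1 and v2
with a 90/10 split. The names, weights, and the pick_cluster helper are
illustrative and are not taken from the talk or from Envoy's API.

```python
import random

def pick_cluster(weights: dict[str, int], roll: int) -> str:
    """Map a roll in [0, total weight) onto a cluster in proportion to its weight."""
    upto = 0
    for name, weight in weights.items():
        upto += weight
        if roll < upto:
            return name
    raise ValueError("roll must be less than the total weight")

weights = {"v1": 90, "v2": 10}  # hypothetical split: 10% of traffic to v2
total = sum(weights.values())

rng = random.Random(42)  # seeded so the demo is repeatable
counts = {name: 0 for name in weights}
for _ in range(10_000):
    counts[pick_cluster(weights, rng.randrange(total))] += 1
print(counts)
```

Shifting more traffic to the new version is then a matter of changing
the weights, for example to 50/50 and finally 0/100.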
Example slides on Google Drive:
https://drive.google.com/open?id=1_CYLRs6McST2Tu-tKdCq6xhFn0mVmt7w
Finally, the speakers shared email addresses for applying to SysAdmin
positions. I thought they might be useful for anyone interested:
Stacey.Lowenwirth at hbo.com
mark at turbinelabs.io
Overall, I learned how persistent connections using HTTP keep-alive are
useful in load balancing traffic, and about traffic control strategies
with Envoy in a very large service-oriented architecture. Many
companies, such as Google, Apple, Lyft, Microsoft, Netflix, Verizon, and
Salesforce, rely on Envoy for their traffic control, and I expect many
more will do so in the future.
Relevant links:
https://sysdig.com/product/secure/
https://www.tigera.io/