[cs615asa] Meetup Summary

Matthew Gomez mgomez1 at stevens.edu
Thu Apr 26 12:24:24 EDT 2018


Meetup Information:
Date: April 24, 2018
Topics: Big and Streaming Data on AWS
Link: https://www.meetup.com/DevOps-Efficiency-on-AWS-NYC/events/249379457/

Talk 1:
Keeping compute costs down as data pipelines grow can be complex but it
doesn't have to be.
Speaker: Yuval Dovrat, Director of Solutions Architecture, Spotinst.
https://spotinst.com/

Spotinst is a cloud-based orchestration service that leverages the fact
that cloud providers sell their excess capacity at a discounted rate,
anywhere from 70% to 90% off the on-demand price. However, these spot
instances have one big disadvantage: they can be terminated at any time
with very short notice, typically only a few minutes, so they are not
usually ideal for running production workloads.

Spotinst solves this problem by using a prediction algorithm to proactively
move clustered instances that are about to be terminated. It moves the
instances by taking snapshots to ensure that no data is lost. It also
attempts to minimize service interruptions by moving public and private IPs
and by ensuring that no more than one instance in an availability zone is
moving at the same time. Spotinst can simplify the movement process for
workloads running in a primary/secondary configuration by having the
primary server run as a normal on-demand instance and having all of the
secondary instances run as spot instances. That way, all Spotinst has to do
is ensure that no more than one instance is moved at the same time.
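
The one-move-per-availability-zone rule described above can be sketched as
a small scheduling function. This is only an illustration of the constraint,
not Spotinst's actual algorithm or API; the function name, the input format,
and the instance IDs are all hypothetical.

```python
from collections import defaultdict
from itertools import zip_longest


def plan_migration_rounds(at_risk):
    """Group at-risk instances into rounds with at most one per zone.

    at_risk is a list of (instance_id, availability_zone) tuples.
    Both the input shape and this function are hypothetical.
    """
    by_zone = defaultdict(list)
    for instance_id, zone in at_risk:
        by_zone[zone].append(instance_id)

    rounds = []
    # Take one instance from each zone per round; zones that run out
    # of instances contribute None, which is filtered away.
    for batch in zip_longest(*by_zone.values()):
        rounds.append([i for i in batch if i is not None])
    return rounds


if __name__ == "__main__":
    example = [
        ("i-a1", "us-east-1a"),
        ("i-a2", "us-east-1a"),
        ("i-b1", "us-east-1b"),
    ]
    for number, batch in enumerate(plan_migration_rounds(example), 1):
        print(f"round {number}: {batch}")
```

Each round moves instances from different zones in parallel, while two
instances in the same zone always land in separate rounds.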

Spotinst also supports running your own 'serverless' functions using spot
instances as the backend. This allows Spotinst to pass along the cost
savings of spot instances while abstracting some of the complexity of
managing them away from the end user.

Spotinst claims that it saves its customers 70%-90% of the cost of using
on-demand instances exclusively.
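
To put the savings figure in context, the arithmetic for a mixed fleet can
be sketched as follows. This is a rough illustration that assumes every
instance is the same type and runs for the same number of hours; the
function name and its parameters are hypothetical and were not part of the
talk.

```python
def blended_savings(on_demand_count, spot_count, spot_discount):
    """Fraction saved versus running every instance on demand.

    Assumes identical instance types and identical running hours,
    so only the instance counts and the spot discount matter.
    """
    total = on_demand_count + spot_count
    if total == 0:
        raise ValueError("fleet must contain at least one instance")
    # On-demand instances save nothing; each spot instance saves
    # spot_discount of its on-demand price.
    return spot_count * spot_discount / total


if __name__ == "__main__":
    # All-spot fleet versus one on-demand primary with nine spot secondaries.
    for on_demand, spot in [(0, 10), (1, 9)]:
        saved = blended_savings(on_demand, spot, 0.70)
        print(f"{on_demand} on-demand + {spot} spot: {saved:.0%} saved")
```

Under these assumptions, a primary/secondary cluster with one on-demand
primary and nine spot secondaries at a 70% discount saves about 63%
overall, slightly less than the discount on the spot instances themselves.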

Talk 2:
Benefits of a Serverless Data Pipeline
Speaker: Itamar Ben Hemo, CEO Rivery
https://rivery.io/

Rivery is a serverless data pipeline. It acts as an abstraction layer that
sits on top of various APIs. Rivery attempts to solve the problem of
rapidly changing APIs by making the API calls for you and loading the data
into a database that you specify. It attempts to save companies the cost of
building a big data platform by performing the same service with a simple
pay-per-use model, with the additional advantages of being always up to
date with no maintenance costs. Rivery runs on EC2 and uses DynamoDB.
Additionally, it uses ECS for scaling, which it accomplishes in two ways.
It achieves fine scaling by using orchestration to increase or decrease
the number of Docker containers running on instances, and it achieves
large scaling by using orchestration to increase or decrease the number of
running instances.
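
The two levels of scaling described above can be sketched as a small
capacity calculation. This is a minimal illustration, not Rivery's
implementation or an ECS API call; the function and all of its parameters
(jobs per container, containers per instance, minimum instance count) are
assumptions made for the example.

```python
import math


def plan_capacity(pending_jobs, jobs_per_container,
                  containers_per_instance, min_instances=1):
    """Return (containers, instances) needed for the pending work.

    Fine scaling: the container count tracks the job backlog.
    Large scaling: the instance count changes only when the
    containers no longer fit on the current number of instances.
    """
    containers = math.ceil(pending_jobs / jobs_per_container)
    instances = max(min_instances,
                    math.ceil(containers / containers_per_instance))
    return containers, instances


if __name__ == "__main__":
    for backlog in (0, 25, 95):
        containers, instances = plan_capacity(backlog, 10, 4)
        print(f"{backlog} jobs: {containers} containers "
              f"on {instances} instance(s)")
```

With these example numbers, a small increase in the backlog only adds
containers, and a new instance is started once more than four containers
per instance would be required.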

Talk 3:
Comparing big data architectures on AWS
Speaker: Ori Rafael, CEO of Upsolver.
https://www.upsolver.com/

Cancelled due to technical difficulties. The planned abstract read:

In this talk, we will review a traditional data warehouse approach, a
modern big data architecture on AWS and how Upsolver ties it all together.

Takeaways:

I think the most significant takeaway applicable to system administration
is Spotinst. The ability to save 70%-90% on infrastructure costs would be a
significant advantage to any company.