EMR with EC2 spots

The on-demand pricing model can be very expensive. The EC2 Spot model, on the other hand, can be much cheaper, but it introduces a risk you have to deal with: AWS can reclaim spot instances at short notice. I am going to show you how to use EC2 spots to run non-critical EMR/Spark jobs.

On-demand vs EC2 spots

It is easy to check the spot price history in the AWS console. The chart shows that prices did not exceed 60% of the on-demand price, and in some AZs they even stayed at around the 40% level.
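The same comparison can be done in code. The sketch below is only an illustration: the function name, the sample records and the on-demand price are made up, and the records merely mimic the shape of the `SpotPriceHistory` entries returned by boto3's `describe_spot_price_history`, where `SpotPrice` is a string.

```python
from collections import defaultdict


def max_spot_ratio_by_az(price_history, on_demand_price):
    """Return the highest spot price per AZ as a fraction of the on-demand price.

    Hypothetical helper: `price_history` is a list of dicts with at least
    "AvailabilityZone" and "SpotPrice" (a string) keys.
    """
    peaks = defaultdict(float)
    for record in price_history:
        az = record["AvailabilityZone"]
        peaks[az] = max(peaks[az], float(record["SpotPrice"]))
    return {az: peak / on_demand_price for az, peak in peaks.items()}


# Made-up sample records, NOT real prices. Real ones could come from
# boto3.client("ec2").describe_spot_price_history(...)["SpotPriceHistory"].
sample_history = [
    {"AvailabilityZone": "eu-west-1a", "SpotPrice": "0.10"},
    {"AvailabilityZone": "eu-west-1a", "SpotPrice": "0.12"},
    {"AvailabilityZone": "eu-west-1b", "SpotPrice": "0.08"},
]

# Assumed on-demand price of 0.20 per hour, purely for illustration.
ratios = max_spot_ratio_by_az(sample_history, on_demand_price=0.20)
for az, ratio in sorted(ratios.items()):
    print(f"{az}: peak spot price is {ratio:.0%} of on-demand")
```

Looking at the peak rather than the average is a deliberately cautious choice here: it hints at how high a bid percentage would have been needed to keep the instances over the whole period.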

Running EMR

To use EC2 spots, only a few things need to be adjusted in the RunJobFlow request: TargetSpotCapacity, BidPriceAsPercentageOfOnDemandPrice and LaunchSpecifications, as in the following snippet:
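A minimal sketch of such a request, written as a plain Python dict that could be passed to boto3's `run_job_flow`, might look like this. It is not the original configuration: the helper name, cluster name, release label, instance type, role names, timeout and the 60% bid are all assumptions chosen for illustration.

```python
def build_spot_job_flow_request(name, core_spot_units, bid_percent=60.0):
    """Build RunJobFlow arguments for a cluster running on spot instance fleets.

    Hypothetical helper; every concrete value below is an assumption.
    """
    # LaunchSpecifications: what to do if spot capacity is not provisioned in time.
    launch_specs = {
        "SpotSpecification": {
            "TimeoutDurationMinutes": 20,
            "TimeoutAction": "TERMINATE_CLUSTER",
        }
    }
    # Bid as a percentage of the on-demand price of the instance type.
    instance_type_configs = [
        {
            "InstanceType": "m5.xlarge",
            "WeightedCapacity": 1,
            "BidPriceAsPercentageOfOnDemandPrice": bid_percent,
        }
    ]

    def fleet(fleet_type, spot_units):
        return {
            "Name": f"{fleet_type.lower()}-fleet",
            "InstanceFleetType": fleet_type,
            "TargetSpotCapacity": spot_units,
            "InstanceTypeConfigs": instance_type_configs,
            "LaunchSpecifications": launch_specs,
        }

    return {
        "Name": name,
        "ReleaseLabel": "emr-5.30.0",
        "Applications": [{"Name": "Spark"}],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
        "Instances": {
            "KeepJobFlowAliveWhenNoSteps": False,
            # The master fleet always has a target capacity of exactly one.
            "InstanceFleets": [fleet("MASTER", 1), fleet("CORE", core_spot_units)],
        },
        # Steps (the actual Spark jobs) are omitted from this sketch.
    }


request = build_spot_job_flow_request("spot-demo", core_spot_units=4)
# With AWS credentials configured, the request could be submitted with:
#   import boto3
#   boto3.client("emr").run_job_flow(**request)
```

Here the master fleet also runs on spot, which fits non-critical jobs only. A more cautious setup could give the master fleet a TargetOnDemandCapacity of 1 instead, so that losing a spot instance never takes down the whole cluster.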

Outcome

A successfully configured EMR cluster should show the following hardware:

And in the EC2 service, all instances are shown as:

And even the savings summary is available: