The on-demand pricing model is expensive. The EC2 Spot model, on the other hand, can be much cheaper, but it introduces a risk you have to deal with: AWS can reclaim spot instances at short notice. I am going to show you how to use EC2 spot instances to run non-critical EMR/Spark jobs.
On-demand vs EC2 spots
It is easy to check the spot pricing history in the AWS console. The chart shows that prices did not exceed 60% of the on-demand price, and in some AZs they even remained at the 40% level.
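The same comparison can be done in code. Below is a minimal sketch that takes price records shaped like the entries returned by the EC2 spot price history API (the `AvailabilityZone` and `SpotPrice` fields) and reports, per AZ, the highest spot price as a fraction of the on-demand price. The function name `max_spot_ratio_by_az`, the sample records and the on-demand price of 0.20 are made up for illustration and are not taken from the chart.

```python
from collections import defaultdict


def max_spot_ratio_by_az(records, on_demand_price):
    """Return {availability_zone: highest spot price / on-demand price}."""
    if on_demand_price <= 0:
        raise ValueError("on_demand_price must be positive")
    peaks = defaultdict(float)
    for rec in records:
        # The EC2 API returns SpotPrice as a string, so convert it first.
        price = float(rec["SpotPrice"])
        az = rec["AvailabilityZone"]
        peaks[az] = max(peaks[az], price)
    return {az: peak / on_demand_price for az, peak in peaks.items()}


# Hypothetical sample data, not real prices.
sample = [
    {"AvailabilityZone": "us-east-1a", "SpotPrice": "0.10"},
    {"AvailabilityZone": "us-east-1a", "SpotPrice": "0.12"},
    {"AvailabilityZone": "us-east-1b", "SpotPrice": "0.08"},
]
ratios = max_spot_ratio_by_az(sample, on_demand_price=0.20)
for az, ratio in sorted(ratios.items()):
    print(f"{az}: peak spot price is {ratio:.0%} of on-demand")
```

With boto3, real records could be fetched with `describe_spot_price_history` on the EC2 client and passed in as the `SpotPriceHistory` list from the response.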
To use EC2 spot instances, only a few things need to be adjusted in the RunJobFlow request: TargetSpotCapacity, BidPriceAsPercentageOfOnDemandPrice and LaunchSpecifications, like in the following snippet:
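As a rough sketch of what such a request fragment can look like, the Python below builds the `Instances` part of a RunJobFlow request using instance fleets. The helper name `spot_fleet`, the instance type `m5.xlarge`, the capacities, the 60% bid and the 20-minute timeout are all assumptions chosen for illustration, not the author's actual values.

```python
def spot_fleet(fleet_type, instance_type, spot_units, bid_percent,
               timeout_minutes=20, timeout_action="SWITCH_TO_ON_DEMAND"):
    """Build one EMR instance fleet that requests only spot capacity."""
    return {
        "Name": f"{fleet_type.lower()}-spot",
        "InstanceFleetType": fleet_type,  # MASTER, CORE or TASK
        # Number of spot units to provision for this fleet.
        "TargetSpotCapacity": spot_units,
        "InstanceTypeConfigs": [
            {
                "InstanceType": instance_type,
                # Maximum spot price, as a percentage of the on-demand price.
                "BidPriceAsPercentageOfOnDemandPrice": bid_percent,
            }
        ],
        "LaunchSpecifications": {
            "SpotSpecification": {
                # How long to wait for spot capacity before giving up...
                "TimeoutDurationMinutes": timeout_minutes,
                # ...and what to do then: SWITCH_TO_ON_DEMAND or TERMINATE_CLUSTER.
                "TimeoutAction": timeout_action,
            }
        },
    }


# Hypothetical cluster layout: one master and four core units, all on spot.
instances = {
    "InstanceFleets": [
        spot_fleet("MASTER", "m5.xlarge", spot_units=1, bid_percent=60.0),
        spot_fleet("CORE", "m5.xlarge", spot_units=4, bid_percent=60.0),
    ],
    "KeepJobFlowAliveWhenNoSteps": False,
}
```

With boto3, this dictionary would be passed as the `Instances` argument of `run_job_flow` on the EMR client, next to the usual cluster name, release label, roles and steps. The `SWITCH_TO_ON_DEMAND` timeout action is one way to handle the spot risk: if no spot capacity arrives in time, the job still runs, only at the on-demand price.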
A successfully configured EMR cluster should show the following hardware:
And in the EC2 console, all instances appear as:
A savings summary is available as well: