I’ve been experimenting with Docker on Amazon ECS recently, and using “spot” server instances to host my Docker containers. Spot instances are described as follows by AWS:
Amazon EC2 Spot instances are spare compute capacity in the AWS cloud available to you at steep discounts compared to On-Demand prices. EC2 Spot enables you to optimize your costs on the AWS cloud and scale your application’s throughput up to 10X for the same budget. By simply selecting Spot when launching EC2 instances, you can save up-to 90% on On-Demand prices.
The only difference between On-Demand instances and Spot Instances is that Spot instances can be interrupted by EC2 with two minutes of notification when EC2 needs the capacity back. You can use EC2 Spot for various fault-tolerant and flexible applications, such as test & development environments, stateless web servers, image rendering, video transcoding, and to run analytics, machine learning and high-performance computing (HPC) workloads. EC2 Spot is tightly integrated with other AWS services including EMR, Auto Scaling, Elastic Container Service (ECS), CloudFormation, Data Pipeline and AWS Batch, providing you freedom of choice in how you launch and maintain your applications running on Spot instances.
AWS also offers Spot Fleet, which automates the management of Spot instances. You simply tell Spot Fleet how much capacity you need and Fleet does the rest.
I set up my ECS cluster to automatically create a “Spot Fleet” with a maximum cost per hour for my spot instances. This worked OK at first, when I was only using 1 spot instance at a time. When I needed more than one spot instance at a time, I started getting the following error:
The status reported for the request was “spotInstanceCountLimitExceeded”. I already had a spot instance active in a different region, and that prevented me from obtaining more in that region or any other.
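For anyone who would rather check for this status from a script than from the console, here is a minimal sketch in Python. It is not what I actually ran at the time. It assumes the boto3 library is installed and AWS credentials are configured, and the helper names (`find_limit_errors`, `fetch_history`) are made up for illustration. The filtering function works on plain dictionaries shaped like the `HistoryRecords` entries returned by the EC2 `describe_spot_fleet_request_history` call, so it can be tried without touching AWS at all.

```python
from typing import Any, Dict, Iterable, List

# The status string shown in the Spot request history.
LIMIT_SUBTYPE = "spotInstanceCountLimitExceeded"


def find_limit_errors(history_records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the history records whose event sub-type is the Spot limit error."""
    return [
        record
        for record in history_records
        if record.get("EventInformation", {}).get("EventSubType") == LIMIT_SUBTYPE
    ]


def fetch_history(fleet_request_id: str, region: str, start_time) -> List[Dict[str, Any]]:
    """Fetch the first page of history records for one Spot Fleet request.

    Hypothetical helper: boto3 is imported here so that find_limit_errors
    stays usable on machines without boto3. Pagination (NextToken) is
    ignored to keep the sketch short.
    """
    import boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_spot_fleet_request_history(
        SpotFleetRequestId=fleet_request_id,
        StartTime=start_time,  # a datetime; only later events are returned
    )
    return response["HistoryRecords"]
```

To use it, pass the ID of the failing Spot Fleet request, its region, and a start time to `fetch_history`, then hand the result to `find_limit_errors`. A non-empty result suggests the account limit is the problem rather than the bid price or capacity.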
I did some Google searches, and found that an AWS account has a very low initial limit for spot instances. In order to raise the limit, you simply need to open a support case in the AWS Support Center.
I used the category: “Service Limit Increase, EC2 Instances”.
This was the description that I provided:
Hi AWS Support,
I would like to raise the Spot Request limit on my account to 5. I currently have one Spot instance in service in the Canada (Central) region. When I attempt to create any Spot instances in the N. Virginia region, I see the following error status in the “History” tab of the “Spot Requests” page:
Please let me know if you require any further information.
In less than 24 hours, I received a reply from AWS support that my request had been approved and implemented. Unfortunately, it wasn’t really fixed. I messed around for a while, creating and recreating ECS Cluster configurations with Spot instances selected as my EC2 container host type. It still didn’t work.
I sent a follow-up message on my still-open Support Case:
The problem is still occurring for me. I have attached a screenshot to illustrate.
I’m using ECS, configured to use Spot instances. The ECS Cluster creation automatically created a Spot Fleet request for this. The whole process worked for me on the Canada (Central) region, and is still running there. I’m doing the same process on the N. Virginia region, but it fails with this error.
The failing Spot Fleet Request is: sfr-fc253f0b-8c94-41c6-9c59-962883068be7
Please let me know if I’m doing something wrong, or if this is related to some other AWS account limit that I’m not familiar with.
This time around, I got this reply back from AWS Support:
Thank you for reaching out about this.
In order to find the best solution for you, I have reached out to our Service team. They have the necessary tools to investigate this in more detail so that we can best assist you.
I will hold on to your case while they investigate, and will update you as soon as they respond to my internal ticket. Rest assured that I will insist on regular updates until we can get your issue resolved.
You are welcome to reach out at any time with further questions or concerns.
Thank you for your patience while we work to resolve your problem.
Amazon Web Services
And then, a couple of days later, I got this:
Thank you for your patience while we reviewed your query.
I received information back from the service team as the increase requested has now been processed correctly.
Please keep in mind that it can sometimes take up to 15 minutes for the limit to take effect and become available for use.
If you’re still having trouble, please let me know and I’ll investigate further.
Amazon Web Services
And now, back to ECS to recreate my Cluster configuration one more time and see if things are truly fixed.
The experience with AWS Support was fine for me, because I wasn’t in a huge rush to get the problem fixed. In all, the turnaround time was about four days.
My only feedback to Amazon would be about the visibility of the “Spot Limit” on my account. I don’t think there is anywhere in the AWS Management Console that I am able to see that value. That made the whole issue more difficult to diagnose and fix than it needed to be.