Web, API and builds are currently offline
Incident Report for Buildkite
Postmortem

We published a full account of what happened over on our blog:

https://building.buildkite.com/outage-post-mortem-for-august-23rd-82b619a3679b#.8b05xlp8b

Posted Aug 23, 2016 - 14:45 UTC

Resolved
We've overcome the AWS issues that were causing us problems spinning up new servers this morning. Everything is looking stable again and requests aren't timing out anymore. Agents are executing jobs as normal and builds are flowing through the system like before.

We've got some cleanup stuff we need to take care of on our end, and we'll be doing that throughout the day. After that we'll get started on a technical postmortem to detail exactly what happened, and what we're going to do about it.

Tim, Sam and I want to apologise for what happened, and want you to know that we're going to make sure this problem doesn't happen again.

Thanks so much for your patience and understanding today!

- Keith
Posted Aug 23, 2016 - 01:32 UTC
Update
We’re still trying to push out proper fixes to the servers. We lost all the servers from our load balancer that was servicing buildkite.com requests - which is why the site went down. We’ve stolen servers from the Agent API to handle requests so you can get your work done.

It seems that AWS are now having issues (eughh) so we’re finding it hard to get everything stable again. We’re all online and working hard to get it all healthy again!

We’re so sorry that it was down for so long - really uncool on our part. We’re gonna write a technical postmortem when we’re in the clear and detail exactly what happened, and what we’re gonna do about it.

If you have any questions, feel free to email me directly at keith@buildkite.com - or hop onto our Slack channel here: http://chat.buildkite.com and DM me @keithpitt
Posted Aug 23, 2016 - 01:03 UTC
Update
We're working on rolling out proper fixes to the problem that occurred earlier today. You may see some slow requests or timeouts when using the UI - we're trying to squash these problems now. Builds and API requests should continue to work without a hitch (although some of those requests may be a little slow as well).
Posted Aug 22, 2016 - 23:20 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Aug 22, 2016 - 22:20 UTC
Investigating
We are currently investigating this issue.
Posted Aug 22, 2016 - 21:10 UTC