Resolved -
Processing of the backlog is complete.
May 15, 07:35 UTC
Monitoring -
Ingestion of Test Engine execution data from an internal queue to a data store stalled. It has been resumed and is working through the backlog. Visibility of test executions from recent hours will be delayed for approximately one more hour.
This has been a recurring issue; an architectural change is coming soon to eliminate this failure mode.
May 15, 06:51 UTC
Resolved -
Additional capacity was added to our Redis caches. This triggered a failover between 15:10 and 15:14 UTC, and there was a spike of errors on the REST and GraphQL APIs. Customers would also have seen some errors in the Buildkite UI during this period.
We have been monitoring the situation since then and things have returned to baseline.
May 13, 15:34 UTC
Investigating -
We've spotted that something has gone wrong. We're currently investigating the issue, and will provide an update soon.
May 13, 15:14 UTC
Resolved -
The fix was successful and the backlog has now been cleared.
May 12, 16:09 UTC
Monitoring -
We are currently experiencing delayed processing of Test Engine data. We have identified and applied a fix for the issue, but expect delays to continue while we clear the ingestion backlog.
May 12, 12:59 UTC