Errors when creating Artifacts
Incident Report for Buildkite
Postmortem

Service impact

Jobs running on v3.83.0 or v3.83.1 of the Agent could not create artifacts for 25 minutes.

Incident summary

At 2024-10-17 05:02 UTC, we deployed a change to the Agent API that was intended to prevent older versions of the Buildkite Agent, which are incompatible with a soon-to-be-released feature, from using that feature. The change contained a bug that affected versions 3.83.0 and 3.83.1 of the Agent. The bug passed through our CI process due to a gap in integration specs in an older part of our code base.

The team responsible for the change was monitoring the deployment, noticed the errors immediately and started our incident response process. Together with our on-call engineers, they triggered an emergency rollback to a known working commit.

The emergency rollback finished at 2024-10-17 05:27 UTC and service was restored. We then deployed a revert of the broken change so that a later deploy could not reintroduce the bug.

Changes we’re making

We have since shipped an improved version of the code change with additional test coverage, including deeper integration tests covering the edge case that was missed.
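
As an illustration only, the sketch below shows the general shape of a version gate and the kind of boundary checks that exercise it. It is not the actual Agent API code, and it does not describe the specific bug. The minimum version (3.84.0) and the names `parse_version`, `MIN_FEATURE_VERSION` and `supports_new_feature` are hypothetical.

```python
# Hypothetical sketch of a version gate; not the real Agent API implementation.

def parse_version(version: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' into integers.

    An optional leading 'v' and any pre-release or build suffix are ignored.
    """
    core = version.removeprefix("v").split("-", 1)[0].split("+", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return (major, minor, patch)


# Assumed threshold for illustration; the real cut-off is not stated here.
MIN_FEATURE_VERSION = parse_version("3.84.0")


def supports_new_feature(agent_version: str) -> bool:
    """Return True if the agent is new enough for the gated feature.

    Comparing integer tuples avoids the string-comparison trap where
    '3.9.0' sorts after '3.84.0'.
    """
    return parse_version(agent_version) >= MIN_FEATURE_VERSION


# Boundary cases on both sides of the (hypothetical) threshold.
for version in ["3.9.0", "3.83.0", "3.83.1", "3.84.0", "3.100.2"]:
    print(version, supports_new_feature(version))
```

In a gate like this, agents below the threshold should only lose access to the new feature. Tests at the boundary should also confirm that existing operations, such as creating artifacts, keep working for those agents.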

Appendix - Supporting materials

Example of the artifact upload failure in a job log

Posted Oct 25, 2024 - 04:32 UTC

Resolved
This incident has been resolved.
Posted Oct 17, 2024 - 05:33 UTC
Monitoring
The affected service has been reverted to a known good version. We are monitoring the impact.
Posted Oct 17, 2024 - 05:26 UTC
Identified
A recent deploy has introduced an error when creating build artifacts. We are reverting to a known good version.
Posted Oct 17, 2024 - 05:18 UTC