Jobs running on v3.83.0 or v3.83.1 of the Agent could not create artifacts for 25 minutes.
Beginning at 2024-10-17 05:02 UTC, a change was deployed to the Agent API. It was intended to stop older versions of the Buildkite Agent, which are incompatible with a soon-to-be-released feature, from using that feature. The change contained a bug that affected versions 3.83.0 and 3.83.1 of the agent. The bug passed through our CI process because of a gap in integration specs in an older part of our code base.
The team responsible for the change was monitoring the deployment, noticed the errors immediately, and started our incident response process. Together with our on-call engineers, they triggered an emergency rollback to a known working commit.
The emergency rollback finished at 2024-10-17 05:27 UTC and service was restored. We then deployed a revert of the broken change to ensure we did not return to a broken state.
We have since shipped an improved version of the code change with additional test coverage, including deeper integration tests covering the edge case that was missed.
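This report does not describe the bug or the actual fix, so the following is only a minimal sketch of how a version gate of this kind might be written and tested at its boundaries. The names `MIN_FEATURE_VERSION`, `parse_version`, and `supports_feature` are hypothetical, and the threshold value is an assumption chosen purely for illustration. It is not the Agent API's real code or cutoff.

```python
import re

# Hypothetical threshold, for illustration only; the real cutoff is not
# stated in this report.
MIN_FEATURE_VERSION = (3, 84, 0)

# Accepts "3.83.1" or "v3.83.1"; anything after the patch number is ignored.
_VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)")


def parse_version(raw):
    """Return (major, minor, patch) as integers, or None if unparseable."""
    match = _VERSION_RE.match(raw.strip())
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def supports_feature(raw_version):
    """Return True only for agents at or above the minimum version.

    Unparseable versions are treated as unsupported instead of raising,
    so the gate can only disable the new feature and cannot fail the
    surrounding request.
    """
    parsed = parse_version(raw_version)
    return parsed is not None and parsed >= MIN_FEATURE_VERSION
```

Comparing integer tuples instead of strings avoids ordering mistakes such as treating 3.9.0 as newer than 3.83.0. Boundary tests on the versions just below and at the threshold, plus malformed input, are the kind of cases an integration spec for such a gate would need to cover.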
Example of the artifact upload failure in a job log