tag:www.buildkitestatus.com,2005:/history Buildkite Status - Incident History 2024-03-28T18:40:08Z Buildkite
tag:www.buildkitestatus.com,2005:Incident/20381879 2024-03-27T23:38:02Z 2024-03-27T23:38:02Z Delayed dispatch<p><small>Mar <var data-var='date'>27</var>, <var data-var='time'>23:38</var> UTC</small><br><strong>Resolved</strong> - Job dispatch has recovered and has remained stable for over 2 hours.</p><p><small>Mar <var data-var='date'>27</var>, <var data-var='time'>22:02</var> UTC</small><br><strong>Monitoring</strong> - The delays have subsided, and we're currently monitoring job dispatch times.</p><p><small>Mar <var data-var='date'>27</var>, <var data-var='time'>21:04</var> UTC</small><br><strong>Investigating</strong> - We are investigating delays to job processing.</p>
tag:www.buildkitestatus.com,2005:Incident/20230952 2024-03-13T06:20:41Z 2024-03-13T06:20:42Z GitHub commit statuses are failing to be dispatched<p><small>Mar <var data-var='date'>13</var>, <var data-var='time'>06:20</var> UTC</small><br><strong>Resolved</strong> - We've successfully rolled back the recently deployed change that caused GitHub commit statuses to fail for repositories connected with our GitHub App. All failed statuses have entered our retry queue and will be sent out shortly.</p><p><small>Mar <var data-var='date'>13</var>, <var data-var='time'>06:04</var> UTC</small><br><strong>Monitoring</strong> - We've rolled back the recently deployed change that caused GitHub commit statuses to fail for repositories connected with our GitHub App. All failed statuses have entered our retry queue and should be sent soon.</p><p><small>Mar <var data-var='date'>13</var>, <var data-var='time'>05:54</var> UTC</small><br><strong>Update</strong> - We're continuing to revert a recently deployed change that is causing GitHub commit statuses to fail for repositories connected with our GitHub App.
All failed statuses have entered our retry queue and should be sent after the fix has been deployed.</p><p><small>Mar <var data-var='date'>13</var>, <var data-var='time'>05:26</var> UTC</small><br><strong>Identified</strong> - We have identified an error in a recently deployed change that is causing GitHub commit statuses to fail for repositories connected with our GitHub App. We are reverting the change. All failed statuses have entered our retry queue and should be sent after the fix has been deployed.</p><p><small>Mar <var data-var='date'>13</var>, <var data-var='time'>05:19</var> UTC</small><br><strong>Investigating</strong> - We are currently investigating an issue in which GitHub commit statuses are failing to be dispatched.</p>
tag:www.buildkitestatus.com,2005:Incident/19969925 2024-02-13T00:08:06Z 2024-02-13T00:08:06Z Degraded Performance<p><small>Feb <var data-var='date'>13</var>, <var data-var='time'>00:08</var> UTC</small><br><strong>Resolved</strong> - We have successfully disabled the background job that was causing the load issues. We have also identified and mitigated a separate database performance problem that we believe contributed to this incident.</p><p><small>Feb <var data-var='date'>12</var>, <var data-var='time'>22:29</var> UTC</small><br><strong>Update</strong> - We have identified a specific background job that causes excessive database load.
We have paused all background jobs while we deploy a change that disables only the problematic job.</p><p><small>Feb <var data-var='date'>12</var>, <var data-var='time'>21:37</var> UTC</small><br><strong>Monitoring</strong> - We've implemented a mitigation, and performance has improved. We are continuing to investigate the cause.</p><p><small>Feb <var data-var='date'>12</var>, <var data-var='time'>21:27</var> UTC</small><br><strong>Investigating</strong> - We are experiencing poor performance across the application and are investigating.</p>
tag:www.buildkitestatus.com,2005:Incident/19914592 2024-02-05T19:22:57Z 2024-02-05T19:22:57Z Degraded Notification<p><small>Feb <var data-var='date'> 5</var>, <var data-var='time'>19:22</var> UTC</small><br><strong>Resolved</strong> - Notification latency has returned to normal.</p><p><small>Feb <var data-var='date'> 5</var>, <var data-var='time'>19:01</var> UTC</small><br><strong>Update</strong> - We're still seeing a small increase in notification latency and are actively investigating the root cause.</p><p><small>Feb <var data-var='date'> 5</var>, <var data-var='time'>18:42</var> UTC</small><br><strong>Update</strong> - We're still actively investigating notification latency. Performance has improved slightly, but our investigation is ongoing.</p><p><small>Feb <var data-var='date'> 5</var>, <var data-var='time'>18:21</var> UTC</small><br><strong>Investigating</strong> - We are currently investigating a delay in notification processing.</p>
tag:www.buildkitestatus.com,2005:Incident/19881292 2024-02-04T03:24:09Z 2024-02-04T03:24:09Z Database maintenance<p><small>Feb <var data-var='date'> 4</var>, <var data-var='time'>03:24</var> UTC</small><br><strong>Completed</strong> - Maintenance has been completed.</p><p><small>Feb <var data-var='date'> 4</var>, <var data-var='time'>03:21</var> UTC</small><br><strong>Verifying</strong> - Maintenance has gone as expected.
We're currently verifying the results.</p><p><small>Feb <var data-var='date'> 4</var>, <var data-var='time'>03:01</var> UTC</small><br><strong>Update</strong> - Scheduled maintenance is currently in progress. We will provide updates as necessary.</p><p><small>Feb <var data-var='date'> 4</var>, <var data-var='time'>03:00</var> UTC</small><br><strong>In progress</strong> - Scheduled maintenance is currently in progress. We will provide updates as necessary.</p><p><small>Feb <var data-var='date'> 1</var>, <var data-var='time'>03:35</var> UTC</small><br><strong>Scheduled</strong> - We will perform a minor version update to one of our databases. We expect 5–10 minutes of downtime during this window, during which webhook requests will not be acknowledged. We will process existing webhooks and outbound notifications once the maintenance window is complete.</p>
tag:www.buildkitestatus.com,2005:Incident/19871731 2024-01-31T03:23:20Z 2024-01-31T03:23:20Z Increased latency for REST API<p><small>Jan <var data-var='date'>31</var>, <var data-var='time'>03:23</var> UTC</small><br><strong>Resolved</strong> - This incident has been resolved.</p><p><small>Jan <var data-var='date'>31</var>, <var data-var='time'>01:52</var> UTC</small><br><strong>Monitoring</strong> - REST API latency has returned to normal levels. We continue to monitor the situation.</p><p><small>Jan <var data-var='date'>31</var>, <var data-var='time'>01:47</var> UTC</small><br><strong>Update</strong> - Request latency has dropped following our increase in resource allocation. We are allocating further resources to bring latency back to normal levels.</p><p><small>Jan <var data-var='date'>31</var>, <var data-var='time'>01:44</var> UTC</small><br><strong>Identified</strong> - We have identified an increased number of requests to a particular endpoint and are working to mitigate the impact of the additional load.
We have increased the available resources and continue to work on further mitigations.</p><p><small>Jan <var data-var='date'>31</var>, <var data-var='time'>01:22</var> UTC</small><br><strong>Investigating</strong> - We are currently investigating increased load on our REST API. The increase in load is causing elevated latency and request timeouts for some users.</p>
tag:www.buildkitestatus.com,2005:Incident/19866219 2024-01-30T12:55:38Z 2024-01-30T12:55:38Z Issue with delivery of email notifications<p><small>Jan <var data-var='date'>30</var>, <var data-var='time'>12:55</var> UTC</small><br><strong>Resolved</strong> - This incident has been resolved.</p><p><small>Jan <var data-var='date'>30</var>, <var data-var='time'>12:52</var> UTC</small><br><strong>Update</strong> - We are no longer experiencing errors with email delivery. We continue to monitor the situation.</p><p><small>Jan <var data-var='date'>30</var>, <var data-var='time'>12:50</var> UTC</small><br><strong>Monitoring</strong> - We continued to see a small number of mail delivery failures, so we have switched to our backup mail provider.</p><p><small>Jan <var data-var='date'>30</var>, <var data-var='time'>12:38</var> UTC</small><br><strong>Identified</strong> - We have identified an issue connecting to our upstream mail provider. Error rates have decreased; however, we are still experiencing issues with mail delivery.</p><p><small>Jan <var data-var='date'>30</var>, <var data-var='time'>12:26</var> UTC</small><br><strong>Investigating</strong> - We are experiencing errors with the delivery of email notifications and have begun investigating.</p>
tag:www.buildkitestatus.com,2005:Incident/19862446 2024-01-30T00:46:58Z 2024-01-30T05:31:29Z Increased job dispatch latency<p><small>Jan <var data-var='date'>30</var>, <var data-var='time'>00:46</var> UTC</small><br><strong>Resolved</strong> - We’ve applied the known mitigations and have seen an improvement in database query plan performance.
We continue to work on a long-term fix.</p><p><small>Jan <var data-var='date'>30</var>, <var data-var='time'>00:30</var> UTC</small><br><strong>Monitoring</strong> - A fix has been implemented and we are monitoring the results.</p><p><small>Jan <var data-var='date'>30</var>, <var data-var='time'>00:18</var> UTC</small><br><strong>Identified</strong> - We're investigating an issue that is causing increased job dispatch latency.<br />We suspect this is a recurrence of a previous incident caused by poor Postgres query performance due to an inefficient query plan.<br />We're now applying mitigations to improve system performance.</p>
tag:www.buildkitestatus.com,2005:Incident/19839601 2024-01-26T19:53:39Z 2024-01-31T04:02:22Z Elevated Latency<p><small>Jan <var data-var='date'>26</var>, <var data-var='time'>19:53</var> UTC</small><br><strong>Resolved</strong> - This incident has been resolved.</p><p><small>Jan <var data-var='date'>26</var>, <var data-var='time'>19:43</var> UTC</small><br><strong>Monitoring</strong> - A fix has been implemented and we are monitoring the results.</p><p><small>Jan <var data-var='date'>26</var>, <var data-var='time'>19:27</var> UTC</small><br><strong>Identified</strong> - We have identified elevated load on a couple of endpoints and are actively working on mitigation.</p><p><small>Jan <var data-var='date'>26</var>, <var data-var='time'>18:30</var> UTC</small><br><strong>Update</strong> - We are continuing to investigate the latency issue with the Agent and REST APIs.</p><p><small>Jan <var data-var='date'>26</var>, <var data-var='time'>18:01</var> UTC</small><br><strong>Update</strong> - We are continuing to investigate the latency issue with the Agent and REST APIs.</p><p><small>Jan <var data-var='date'>26</var>, <var data-var='time'>17:44</var> UTC</small><br><strong>Investigating</strong> - We are investigating increased latency in our Agent
API.</p>
tag:www.buildkitestatus.com,2005:Incident/19754553 2024-01-18T19:12:25Z 2024-01-18T19:12:27Z Elevated Latency<p><small>Jan <var data-var='date'>18</var>, <var data-var='time'>19:12</var> UTC</small><br><strong>Resolved</strong> - This incident has been resolved.</p><p><small>Jan <var data-var='date'>18</var>, <var data-var='time'>17:59</var> UTC</small><br><strong>Update</strong> - We have investigated recent reports of increased latency; the issue has since subsided. We are continuing to monitor the situation.</p><p><small>Jan <var data-var='date'>18</var>, <var data-var='time'>17:24</var> UTC</small><br><strong>Monitoring</strong> - Latency and Agent API performance have improved. We are continuing to monitor the situation and identify the root cause.</p><p><small>Jan <var data-var='date'>18</var>, <var data-var='time'>17:10</var> UTC</small><br><strong>Update</strong> - We are actively investigating this issue.</p><p><small>Jan <var data-var='date'>18</var>, <var data-var='time'>16:53</var> UTC</small><br><strong>Investigating</strong> - We are investigating reports of increased latency.</p>
tag:www.buildkitestatus.com,2005:Incident/19505160 2023-12-22T10:11:45Z 2023-12-22T10:11:46Z Webhooks not being processed<p><small>Dec <var data-var='date'>22</var>, <var data-var='time'>10:11</var> UTC</small><br><strong>Resolved</strong> - Webhooks have been processed normally since 9:20 UTC.<br /><br />Webhooks sent to Buildkite between 8:49 UTC and 9:20 UTC will not have been processed.</p><p><small>Dec <var data-var='date'>22</var>, <var data-var='time'>09:34</var> UTC</small><br><strong>Update</strong> - We are continuing to monitor for any further issues.</p><p><small>Dec <var data-var='date'>22</var>, <var data-var='time'>09:34</var> UTC</small><br><strong>Monitoring</strong> - Webhooks are now being processed.
<br /><br />Webhooks sent to Buildkite between 8:49 UTC and 9:20 UTC will not have been processed.<br /><br />We are continuing to investigate the cause.</p><p><small>Dec <var data-var='date'>22</var>, <var data-var='time'>09:21</var> UTC</small><br><strong>Investigating</strong> - We are currently investigating webhooks not being processed.</p>
tag:www.buildkitestatus.com,2005:Incident/19237076 2023-11-28T21:42:37Z 2023-11-28T21:42:37Z Elevated Latency<p><small>Nov <var data-var='date'>28</var>, <var data-var='time'>21:42</var> UTC</small><br><strong>Resolved</strong> - This incident has been resolved.</p><p><small>Nov <var data-var='date'>28</var>, <var data-var='time'>21:22</var> UTC</small><br><strong>Monitoring</strong> - We had a period of elevated load on one database that ended at 20:22 UTC, resulting in elevated latency and errors for some customers. We're continuing to investigate the cause, but latency and error rates have stabilized.</p><p><small>Nov <var data-var='date'>28</var>, <var data-var='time'>20:47</var> UTC</small><br><strong>Update</strong> - We are continuing to investigate this issue.</p><p><small>Nov <var data-var='date'>28</var>, <var data-var='time'>20:44</var> UTC</small><br><strong>Investigating</strong> - We are investigating reports of increased latency and timeouts.</p>
tag:www.buildkitestatus.com,2005:Incident/19235766 2023-11-28T18:11:29Z 2023-11-28T18:11:29Z Webhooks not being received<p><small>Nov <var data-var='date'>28</var>, <var data-var='time'>18:11</var> UTC</small><br><strong>Resolved</strong> - GitHub is reporting webhook status as fully operational, and webhooks are being received normally again.</p><p><small>Nov <var data-var='date'>28</var>, <var data-var='time'>17:57</var> UTC</small><br><strong>Update</strong> - GitHub has reported that the issue with webhooks not being delivered between 16:23 and 17:12 UTC has been resolved, and they are working to reprocess events from the affected time period.
The issue has been resolved, but we will continue to monitor the situation.</p><p><small>Nov <var data-var='date'>28</var>, <var data-var='time'>17:38</var> UTC</small><br><strong>Monitoring</strong> - We are aware of an issue with GitHub webhooks and are currently monitoring the situation. GitHub has posted an incident about degraded webhook performance: https://www.githubstatus.com/incidents/dsb3kn1zfdl7</p><p><small>Nov <var data-var='date'>28</var>, <var data-var='time'>17:24</var> UTC</small><br><strong>Investigating</strong> - We are investigating reports of webhooks not being received from GitHub.</p>
tag:www.buildkitestatus.com,2005:Incident/19189637 2023-11-22T00:01:01Z 2023-11-22T05:30:33Z Agent API degraded<p><small>Nov <var data-var='date'>22</var>, <var data-var='time'>00:01</var> UTC</small><br><strong>Resolved</strong> - This incident has been resolved.</p><p><small>Nov <var data-var='date'>21</var>, <var data-var='time'>23:11</var> UTC</small><br><strong>Monitoring</strong> - We've identified and resolved the cause of the high levels of Agent API activity. Agent activity looks normal again, and we're currently monitoring.</p><p><small>Nov <var data-var='date'>21</var>, <var data-var='time'>22:41</var> UTC</small><br><strong>Identified</strong> - We are seeing a recurrence of the same unusually high level of Agent API activity and are working to mitigate it.</p><p><small>Nov <var data-var='date'>21</var>, <var data-var='time'>21:20</var> UTC</small><br><strong>Monitoring</strong> - We’ve identified and resolved the cause of the high database load.
Agent activity looks normal again, and we’re currently monitoring.</p><p><small>Nov <var data-var='date'>21</var>, <var data-var='time'>21:04</var> UTC</small><br><strong>Update</strong> - We're identifying the source of the high database load and are working to relieve it.</p><p><small>Nov <var data-var='date'>21</var>, <var data-var='time'>20:43</var> UTC</small><br><strong>Update</strong> - We are seeing high database load, which we are working to bring back to normal levels.</p><p><small>Nov <var data-var='date'>21</var>, <var data-var='time'>20:25</var> UTC</small><br><strong>Update</strong> - We are continuing to investigate this issue.</p><p><small>Nov <var data-var='date'>21</var>, <var data-var='time'>20:20</var> UTC</small><br><strong>Update</strong> - We are continuing to investigate this issue.</p><p><small>Nov <var data-var='date'>21</var>, <var data-var='time'>20:00</var> UTC</small><br><strong>Update</strong> - Performance is degraded across the system. We’ve identified some unusually high usage as a likely cause and are continuing to investigate.</p><p><small>Nov <var data-var='date'>21</var>, <var data-var='time'>19:54</var> UTC</small><br><strong>Update</strong> - Performance is degraded across the system. We've identified some unusually high usage.
We are continuing to investigate.</p><p><small>Nov <var data-var='date'>21</var>, <var data-var='time'>19:24</var> UTC</small><br><strong>Investigating</strong> - We are currently investigating this issue.</p>
tag:www.buildkitestatus.com,2005:Incident/19029684 2023-11-06T01:02:05Z 2023-11-06T01:02:06Z Delay in job dispatching<p><small>Nov <var data-var='date'> 6</var>, <var data-var='time'>01:02</var> UTC</small><br><strong>Resolved</strong> - After monitoring our exception tracking and key metrics dashboards, we've determined that the issue is fully resolved.</p><p><small>Nov <var data-var='date'> 6</var>, <var data-var='time'>00:25</var> UTC</small><br><strong>Monitoring</strong> - A fix has been implemented and we are monitoring the results.</p><p><small>Nov <var data-var='date'> 6</var>, <var data-var='time'>00:25</var> UTC</small><br><strong>Investigating</strong> - We've identified an error that was causing delays in dispatching new jobs and sending notifications. We've reverted the responsible change and are monitoring.</p>
tag:www.buildkitestatus.com,2005:Incident/18948747 2023-10-27T20:28:36Z 2023-10-27T20:28:36Z Elevated error rate on Agent API<p><small>Oct <var data-var='date'>27</var>, <var data-var='time'>20:28</var> UTC</small><br><strong>Resolved</strong> - We experienced a spike in errors on the Agent API due to higher-than-normal load on our Redis cluster, which caused some agents to fail to connect.
Load has returned to normal, and we are no longer experiencing errors.</p>
tag:www.buildkitestatus.com,2005:Incident/18877553 2023-10-21T12:17:47Z 2023-10-21T12:17:48Z Elevated error rates on Test Analytics collector uploads<p><small>Oct <var data-var='date'>21</var>, <var data-var='time'>12:17</var> UTC</small><br><strong>Resolved</strong> - The issue is now resolved.</p><p><small>Oct <var data-var='date'>21</var>, <var data-var='time'>12:04</var> UTC</small><br><strong>Monitoring</strong> - We have addressed the issue and are now monitoring.</p><p><small>Oct <var data-var='date'>21</var>, <var data-var='time'>11:39</var> UTC</small><br><strong>Investigating</strong> - We're observing elevated error rates on Test Analytics collector uploads. We are currently investigating the issue.</p>
tag:www.buildkitestatus.com,2005:Incident/18844784 2023-10-18T21:30:51Z 2023-10-18T23:34:13Z Delayed outbound job event notifications<p><small>Oct <var data-var='date'>18</var>, <var data-var='time'>21:30</var> UTC</small><br><strong>Resolved</strong> - This incident has been resolved.</p><p><small>Oct <var data-var='date'>18</var>, <var data-var='time'>21:26</var> UTC</small><br><strong>Monitoring</strong> - We have worked through the backlog of delayed job notifications. We will continue monitoring to ensure they are all processed successfully.</p><p><small>Oct <var data-var='date'>18</var>, <var data-var='time'>20:59</var> UTC</small><br><strong>Identified</strong> - We are continuing to work through the backlog of notifications.</p><p><small>Oct <var data-var='date'>18</var>, <var data-var='time'>20:29</var> UTC</small><br><strong>Investigating</strong> - We've identified a backlog of delayed job event notifications and are currently working through it.
All other notification events (including new job events) are unaffected.</p>
tag:www.buildkitestatus.com,2005:Incident/18843555 2023-10-18T18:57:39Z 2023-10-18T18:57:40Z Delayed outbound notifications<p><small>Oct <var data-var='date'>18</var>, <var data-var='time'>18:57</var> UTC</small><br><strong>Resolved</strong> - This incident has been resolved.</p><p><small>Oct <var data-var='date'>18</var>, <var data-var='time'>18:51</var> UTC</small><br><strong>Monitoring</strong> - We have increased capacity, and new notifications are being processed normally. Notifications generated during the incident are expected to be delivered with a small delay.</p><p><small>Oct <var data-var='date'>18</var>, <var data-var='time'>18:34</var> UTC</small><br><strong>Identified</strong> - We are experiencing high load that exceeds our internal rate limits. We are increasing capacity to accommodate the increased load. We will provide an update within 15 minutes.</p><p><small>Oct <var data-var='date'>18</var>, <var data-var='time'>18:25</var> UTC</small><br><strong>Investigating</strong> - We've seen notification delivery latency increase by over 20 minutes. We are working to identify the cause. We will provide an update in 15 minutes.</p>
tag:www.buildkitestatus.com,2005:Incident/18821699 2023-10-17T03:56:55Z 2023-10-17T04:08:02Z Delay in notification dispatch<p><small>Oct <var data-var='date'>17</var>, <var data-var='time'>03:56</var> UTC</small><br><strong>Resolved</strong> - This incident has been resolved.</p><p><small>Oct <var data-var='date'>17</var>, <var data-var='time'>03:55</var> UTC</small><br><strong>Update</strong> - New notifications are expected to be delivered normally. Any notifications dispatched during the outage are expected to be delivered but may be delayed.</p><p><small>Oct <var data-var='date'>17</var>, <var data-var='time'>03:49</var> UTC</small><br><strong>Monitoring</strong> - The error has been fixed, and notifications are now being processed.
We will continue monitoring.</p><p><small>Oct <var data-var='date'>17</var>, <var data-var='time'>03:40</var> UTC</small><br><strong>Identified</strong> - We have identified an error that is delaying notification dispatch. This affects notifications such as GitHub commit status updates, Slack notifications, webhooks, and email. We are working to fix the bug causing the error. We will provide a status update in 10 minutes.</p>
tag:www.buildkitestatus.com,2005:Incident/18777275 2023-10-13T12:18:26Z 2023-10-18T03:48:41Z Agents not processing jobs<p><small>Oct <var data-var='date'>13</var>, <var data-var='time'>12:18</var> UTC</small><br><strong>Resolved</strong> - Jobs are being dispatched to agents for all customers, and the system is performing normally. We will continue to monitor closely and will share details once we've completed a detailed investigation.</p><p><small>Oct <var data-var='date'>13</var>, <var data-var='time'>11:56</var> UTC</small><br><strong>Monitoring</strong> - A fix has been implemented and we are monitoring the results.</p><p><small>Oct <var data-var='date'>13</var>, <var data-var='time'>11:39</var> UTC</small><br><strong>Identified</strong> - We have identified the cause of the issue and are beginning to see jobs being processed.</p><p><small>Oct <var data-var='date'>13</var>, <var data-var='time'>11:02</var> UTC</small><br><strong>Update</strong> - We are continuing to investigate this issue.</p><p><small>Oct <var data-var='date'>13</var>, <var data-var='time'>10:31</var> UTC</small><br><strong>Update</strong> - We have received reports that agents are not picking up any work and are continuing to investigate this issue.</p><p><small>Oct <var data-var='date'>13</var>, <var data-var='time'>10:05</var> UTC</small><br><strong>Investigating</strong> - We have received reports of elevated wait times for job dispatch to agents and are investigating the
issue.</p>
tag:www.buildkitestatus.com,2005:Incident/18578763 2023-09-22T22:45:23Z 2023-09-29T06:13:51Z Elevated Latency and Error Rates<p><small>Sep <var data-var='date'>22</var>, <var data-var='time'>22:45</var> UTC</small><br><strong>Resolved</strong> - This incident has been resolved.</p><p><small>Sep <var data-var='date'>22</var>, <var data-var='time'>21:59</var> UTC</small><br><strong>Update</strong> - We have seen Agent API latency return to expected levels and are continuing to monitor the situation closely.</p><p><small>Sep <var data-var='date'>22</var>, <var data-var='time'>21:30</var> UTC</small><br><strong>Monitoring</strong> - We have seen Agent API latency return to expected levels and are continuing to monitor the situation closely.</p><p><small>Sep <var data-var='date'>22</var>, <var data-var='time'>21:00</var> UTC</small><br><strong>Update</strong> - Our mitigations have reduced Agent API latency to normal levels; we continue to actively monitor the situation.</p><p><small>Sep <var data-var='date'>22</var>, <var data-var='time'>20:33</var> UTC</small><br><strong>Update</strong> - We are currently attempting to mitigate the increased system load by reducing concurrent HTTP processing capacity. We are continuing to investigate this issue.</p><p><small>Sep <var data-var='date'>22</var>, <var data-var='time'>20:00</var> UTC</small><br><strong>Update</strong> - We continue to investigate this issue and to apply mitigations to shed database load.</p><p><small>Sep <var data-var='date'>22</var>, <var data-var='time'>19:22</var> UTC</small><br><strong>Update</strong> - We are continuing to investigate this issue; we are still experiencing elevated latency.</p><p><small>Sep <var data-var='date'>22</var>, <var data-var='time'>18:41</var> UTC</small><br><strong>Update</strong> - We are still seeing elevated response times and are taking action to shed database load.
We continue to investigate the root cause.</p><p><small>Sep <var data-var='date'>22</var>, <var data-var='time'>18:05</var> UTC</small><br><strong>Update</strong> - We are still observing high database load, which is causing performance issues with the API. We are investigating to identify the root cause.</p><p><small>Sep <var data-var='date'>22</var>, <var data-var='time'>17:35</var> UTC</small><br><strong>Investigating</strong> - We are currently investigating a high-latency issue.</p>
tag:www.buildkitestatus.com,2005:Incident/18539532 2023-09-18T22:33:25Z 2023-09-22T06:20:28Z Degraded performance on Scheduled Builds<p><small>Sep <var data-var='date'>18</var>, <var data-var='time'>22:33</var> UTC</small><br><strong>Resolved</strong> - This incident has been resolved.</p><p><small>Sep <var data-var='date'>18</var>, <var data-var='time'>22:12</var> UTC</small><br><strong>Monitoring</strong> - Scheduled builds are functional again. We are currently monitoring the service.</p><p><small>Sep <var data-var='date'>18</var>, <var data-var='time'>21:56</var> UTC</small><br><strong>Investigating</strong> - We are currently investigating an issue with scheduled builds not being triggered.</p>
tag:www.buildkitestatus.com,2005:Incident/18177176 2023-08-17T23:51:15Z 2023-08-17T23:51:15Z Periodic latency spikes in artifact upload<p><small>Aug <var data-var='date'>17</var>, <var data-var='time'>23:51</var> UTC</small><br><strong>Resolved</strong> - This incident has been resolved.</p><p><small>Aug <var data-var='date'>17</var>, <var data-var='time'>23:01</var> UTC</small><br><strong>Monitoring</strong> - Our cloud provider has put a workaround in place, and we are currently monitoring its results.
Artifact upload latency is back to normal levels.</p><p><small>Aug <var data-var='date'>17</var>, <var data-var='time'>22:16</var> UTC</small><br><strong>Identified</strong> - We are still seeing latency spikes on artifact uploads. We've escalated the issue to our cloud provider and are working with them on a resolution. Retrying jobs that failed due to artifact errors will likely succeed. We'll provide an update in an hour.</p><p><small>Aug <var data-var='date'>17</var>, <var data-var='time'>20:16</var> UTC</small><br><strong>Investigating</strong> - Our database that handles artifact metadata is experiencing latency spikes approximately every 18 minutes due to a hardware issue, causing some artifact uploads to time out. We are working with our cloud provider to resolve this. Jobs that fail due to artifact errors will likely succeed if retried.</p>
tag:www.buildkitestatus.com,2005:Incident/18165368 2023-08-16T23:55:45Z 2023-08-22T05:07:09Z Degraded performance due to latency<p><small>Aug <var data-var='date'>16</var>, <var data-var='time'>23:55</var> UTC</small><br><strong>Resolved</strong> - Latency has returned to normal levels, and system performance has been restored.</p><p><small>Aug <var data-var='date'>16</var>, <var data-var='time'>23:23</var> UTC</small><br><strong>Monitoring</strong> - Latency has returned to normal levels. We will continue to deploy improvements to reduce database contention throughout the day.</p><p><small>Aug <var data-var='date'>16</var>, <var data-var='time'>22:50</var> UTC</small><br><strong>Update</strong> - Latency has returned to normal levels. We will continue to deploy improvements to reduce database contention throughout the day.</p><p><small>Aug <var data-var='date'>16</var>, <var data-var='time'>22:12</var> UTC</small><br><strong>Update</strong> - Latency has improved but is still elevated.
We are deploying another change to reduce database lock contention.</p><p><small>Aug <var data-var='date'>16</var>, <var data-var='time'>22:00</var> UTC</small><br><strong>Update</strong> - We have blocked some problematic patterns of bot requests manually and will shortly block them automatically. We are working on more comprehensive fixes at the same time. We will provide another status update in 30 minutes.</p><p><small>Aug <var data-var='date'>16</var>, <var data-var='time'>21:28</var> UTC</small><br><strong>Update</strong> - We have blocked some problematic patterns of bot requests manually and will shortly block them automatically. We are working on more comprehensive fixes at the same time.</p><p><small>Aug <var data-var='date'>16</var>, <var data-var='time'>20:54</var> UTC</small><br><strong>Update</strong> - We continue to apply mitigations to shed database load. We will provide another status update in 30 minutes.</p><p><small>Aug <var data-var='date'>16</var>, <var data-var='time'>20:22</var> UTC</small><br><strong>Update</strong> - We continue to apply mitigations to shed database load. We will provide another status update in 30 minutes.</p><p><small>Aug <var data-var='date'>16</var>, <var data-var='time'>19:52</var> UTC</small><br><strong>Update</strong> - We continue to apply mitigations to shed database load. We will provide another status update in 30 minutes.</p><p><small>Aug <var data-var='date'>16</var>, <var data-var='time'>19:31</var> UTC</small><br><strong>Update</strong> - Latency has reduced somewhat but the system has not yet recovered. We continue to apply mitigations to shed database load.</p><p><small>Aug <var data-var='date'>16</var>, <var data-var='time'>19:26</var> UTC</small><br><strong>Update</strong> - Latency on the Agent API has further degraded. 
We continue to apply mitigations to shed database load.</p><p><small>Aug <var data-var='date'>16</var>, <var data-var='time'>19:23</var> UTC</small><br><strong>Update</strong> - We continue to apply mitigations to shed database load. We will provide another status update in 30 minutes.</p><p><small>Aug <var data-var='date'>16</var>, <var data-var='time'>18:54</var> UTC</small><br><strong>Update</strong> - We are applying mitigations to shed database load. We will provide status updates on our progress every 30 minutes.</p><p><small>Aug <var data-var='date'>16</var>, <var data-var='time'>18:46</var> UTC</small><br><strong>Update</strong> - We are observing high database load, which is causing performance issues with the API and Web. We are taking action to shed database load. We continue to investigate the root cause.</p><p><small>Aug <var data-var='date'>16</var>, <var data-var='time'>18:31</var> UTC</small><br><strong>Investigating</strong> - We are currently investigating this issue.</p>