We are writing to provide an update on the temporary interruption that affected our services earlier this morning.
Our Network Operations Centre (NOC) first detected an increase in error rates at 06:24 UTC.
On investigation, we found that the issue was caused by an unexpected storage bottleneck within our infrastructure. One of our object storage daemons (OSDs), the component that manages an individual storage drive, had not cleared out old data as it normally would and eventually reached its capacity limit.
When this occurred, the affected OSD automatically switched to a ‘read-only’ safety mode. While this impacted service availability, it is a built-in safeguard designed specifically to protect the integrity of the data.
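For customers who operate similar storage clusters, the sketch below illustrates the kind of early-warning check that catches a filling OSD before it reaches the threshold at which writes are blocked. It is a minimal, illustrative example only: it assumes a Ceph-style `ceph osd df --format json` command and a per-OSD `utilization` field in its output (field names can vary by release), and the thresholds shown are placeholders rather than our production values.

```python
#!/usr/bin/env python3
"""Minimal sketch: warn before an OSD fills up and flips to read-only.

Assumptions (not taken from this notice): a Ceph cluster, availability of
`ceph osd df --format json`, and a JSON payload whose `nodes` entries carry
a percentage `utilization` field. Treat this as illustrative only.
"""
import json
import subprocess

NEARFULL = 85.0  # placeholder: warn well before the full threshold
FULL = 95.0      # placeholder: around here Ceph stops accepting writes


def osd_utilisation() -> list:
    """Return per-OSD usage as reported by `ceph osd df`."""
    raw = subprocess.run(
        ["ceph", "osd", "df", "--format", "json"],
        capture_output=True, check=True, text=True,
    ).stdout
    return json.loads(raw).get("nodes", [])


def main() -> None:
    for osd in osd_utilisation():
        used = float(osd.get("utilization", 0.0))
        name = osd.get("name", f"osd.{osd.get('id', '?')}")
        if used >= FULL:
            print(f"CRITICAL: {name} at {used:.1f}% - writes may be blocked")
        elif used >= NEARFULL:
            print(f"WARNING: {name} at {used:.1f}% - check cleanup jobs")


if __name__ == "__main__":
    main()
```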
We have now resolved the immediate issue and services are running normally. Our engineering team is investigating why the automated cleanup process did not run as expected, and we are reviewing the relevant system logic to ensure this remains an isolated incident.
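Part of that work is making sure a silent failure of the cleanup job is noticed long before capacity becomes a problem. The sketch below shows one common pattern, an independent watchdog that alerts when a scheduled job has not completed within its expected window; the marker-file path, six-hour interval, and print-based alert are all illustrative assumptions, not details of our actual tooling.

```python
#!/usr/bin/env python3
"""Minimal sketch of a watchdog for a scheduled cleanup job.

Everything here is hypothetical: the marker file path, the expected
interval, and the alerting approach are illustrative stand-ins.
"""
from datetime import datetime, timedelta, timezone
from pathlib import Path

# Hypothetical: the cleanup job touches this file each time it completes.
MARKER = Path("/var/run/storage-cleanup.last-success")
MAX_AGE = timedelta(hours=6)  # assumed cleanup cadence


def cleanup_is_overdue() -> bool:
    """True if the cleanup job has not completed within MAX_AGE."""
    if not MARKER.exists():
        return True
    last_run = datetime.fromtimestamp(MARKER.stat().st_mtime, tz=timezone.utc)
    return datetime.now(tz=timezone.utc) - last_run > MAX_AGE


if __name__ == "__main__":
    if cleanup_is_overdue():
        # A real deployment would page the on-call engineer here
        # rather than just print a message.
        print("ALERT: storage cleanup has not completed recently")
```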
We sincerely apologise for any frustration or inconvenience this may have caused you today.
Thank you for your patience while we get to the bottom of this.