Starting around 12:35pm MST on January 27, 2021, some customers began experiencing occasional errors and increased latency while using CometChat. Around 12:45pm MST, errors increased rapidly and CometChat became unusable for most customers with apps hosted in our US region.
Around 12:45pm MST, we began migrating our customers to a separate database cluster. As the migration progressed, some customers started seeing improvements. By 1:35pm MST, the migration was complete and all customers were able to use CometChat again.
A root cause analysis revealed that our backup policies coincided with an infrastructure issue at our cloud vendor's end. As a result, our I/O operations were suspended for an extended period of time. Our cloud vendor's internal monitoring tools detected this behavior, which eventually caused the underlying hosts to be replaced. While the hosts were being replaced, a backlog of transactions built up, which ultimately led to the outage.
Our current priority is working alongside our cloud vendor and putting safeguards in place to prevent similar problems from happening again. We're truly sorry for the disruption.