Resolved
After an extended period of monitoring, the system remains stable, so we are resolving the incident.
Monitoring
We are continuing to investigate degraded service and increased latency impacting product analytics. We are monitoring for further spikes and working to improve stability.
Monitoring
We're continuing to monitor the occasional spikes in query timeouts and errors. Recovery from the spikes has been happening more quickly.
Investigating
We're still seeing occasional spikes in query timeouts and errors, but the system is recovering much more quickly. We're continuing to monitor and investigate.
Resolved
We've resolved the query timeouts/failures incident. Cluster stability has improved and recent configuration changes to table profiles have been applied. Queries and the app are operating normally, with no backlog observed in distribution queues after the updates.
Monitoring
We've seen some smaller spikes in failed queries, so we'll keep monitoring closely for a while before we mark this one as resolved.
Monitoring
Load has been looking much better, but we will keep monitoring for a while before we mark this one as resolved. Thank you for your patience.
Monitoring
We are still monitoring the cluster to ensure stability. The failure rate for insight queries has dropped, and customer-facing query errors have decreased, but we are continuing to watch for further issues.
Monitoring
We have resumed event ingestion and are monitoring load on the cluster.
Identified
We fixed the root cause and are monitoring the load on our ClickHouse cluster.
Identified
We have identified the issue and are working on a fix.
Investigating
Unfortunately, the issue has recurred and we have begun a second investigation.
-
We’ve identified an issue affecting analytics queries (timeouts and/or failures). Dashboards, insights, and other query-driven views may be slow or unavailable.
Ingestion is also impacted, so recent data may be delayed in query results.
Resolved
The load spike has been resolved and systems are operating normally.
Investigating
We’ve identified an issue affecting analytics queries (timeouts and/or failures). Dashboards, insights, and other query-driven views may be slow or unavailable.
Ingestion is also impacted, so recent data may be delayed in query results.