Monitoring
We have completed the backfill of the available logs backup data (all logs since 16th February 21:00 UTC), and the logs service is now operating normally.
The logs database and infrastructure are completely isolated from the rest of PostHog's data. Events, replays, and all other PostHog data are in a separate, more mature cluster with robust backups.
We are working urgently on improving backup and disaster recovery for the logs database to prevent anything like this from happening again.
A post-mortem with more details will be posted soon.
Identified
Regrettably, we have confirmed data loss for all customer logs for the new Logs product in the US cloud region up until 16th February 21:00 UTC (roughly 3 days ago). No analytics events, replays, or other PostHog data are affected.
We have removed the corrupted database files and restarted the system, and recent logs can be queried as normal.
We are working on backfilling the available backup data (which only includes the last 3 days of logs) and expect it to be restored within the next 24 hours.
We will follow up with a post-mortem soon.
Identified
We are investigating potential data loss.
It appears that a bug in our database may have triggered some deletes, which may not be recoverable. We are investigating possible recovery scenarios.
Identified
We've identified an issue with logs ingestion in the US region: log events may be delayed by up to 30 minutes.
We believe we have resolved the root cause and are seeing the lag come down. We expect ingestion to be caught up soon.