It appears that everything is back to normal. Simple quick and dirty post-mortem:
1. Jacob was getting the renewal emails and he was off this week so didn't see the final emails. We're moving the admin accounts to a shared alias. 2. We need to create better alerts for when throughput drops. We weren't aware of the issue until users contacted us. We should have caught it immediately, but it took us 40 minutes to resolve. 3. We've moved the certificate renewal to a DNS record based approach, which will be much more reliable than one based on emailing the CEO while they are on vacation.
Posted Oct 23, 2020 - 13:09 UTC
The dashboard and API have been restored. We're going through everything to make sure we didn't miss something.
We're also going to do a thorough post-mortem to understand how this happened and ensure it doesn't happen again.
Sorry everyone. We'll update with the scope of the outage but it was on the order of minutes.
Posted Oct 23, 2020 - 12:48 UTC
We've renewed the certificate for the API and are now waiting for it to propagate. We're working on the dashboard now.
Posted Oct 23, 2020 - 12:46 UTC
Our SSL certificate has expired. We're actively investigating, all hands on deck right now trying to fix it.
Posted Oct 23, 2020 - 12:00 UTC
This incident affected: API Uptime and Dashboard (Overall).