Incident Lifecycle
Incidents progress through these statuses:

| Status | Description |
|---|---|
| Investigating | You’re aware of an issue and looking into it |
| Identified | You’ve found the root cause |
| Monitoring | A fix has been applied and you’re monitoring the results |
| Resolved | The issue is fully resolved |
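The lifecycle in the table can be sketched as an ordered enumeration. This is only an illustration: the names `Status`, `LIFECYCLE`, and `next_status` are hypothetical and not part of any product API, and the sketch assumes incidents move strictly forward one status at a time, which the table does not state.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    """Incident statuses, in the order listed in the table."""
    INVESTIGATING = "Investigating"
    IDENTIFIED = "Identified"
    MONITORING = "Monitoring"
    RESOLVED = "Resolved"


# Enum definition order doubles as lifecycle order (an assumption of this sketch).
LIFECYCLE = list(Status)


def next_status(current: Status) -> Optional[Status]:
    """Return the status that follows `current`, or None once resolved."""
    position = LIFECYCLE.index(current)
    if position + 1 < len(LIFECYCLE):
        return LIFECYCLE[position + 1]
    return None
```

If your process allows skipping a status or reopening an incident, a transition table would fit better than a simple ordered list.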
Creating an Incident
You can create an incident from the Dashboard, the CLI, or the API. To create one from the Dashboard:
- Go to Dashboard > Incidents
- Click Create Incident
- Fill in the title, severity, and affected services
- Add an initial update message
- Click Create
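If you prepare incidents in a script before entering them, the form fields listed above could be modeled roughly as follows. This is a hypothetical sketch, not the product's API schema: the class `IncidentDraft`, its field names, and the assumption that every field is required are all illustrative.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IncidentDraft:
    """Hypothetical container mirroring the Create Incident form fields."""
    title: str
    severity: str
    affected_services: List[str] = field(default_factory=list)
    initial_update: str = ""

    def missing_fields(self) -> List[str]:
        """List the fields that are still empty (assumes all are required)."""
        missing = []
        if not self.title.strip():
            missing.append("title")
        if not self.severity.strip():
            missing.append("severity")
        if not self.affected_services:
            missing.append("affected_services")
        if not self.initial_update.strip():
            missing.append("initial_update")
        return missing


draft = IncidentDraft(title="Checkout errors", severity="Major")
print(draft.missing_fields())
```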
Severity Levels
Choose the appropriate severity based on impact:

| Severity | Use When |
|---|---|
| Minor | Small number of users affected, workarounds available |
| Major | Significant impact, core functionality degraded |
| Critical | Complete outage, all users affected |
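One way to apply the table consistently is to reduce it to two yes/no questions. The sketch below is an assumption about how a team might encode the table; `Severity` and `choose_severity` are hypothetical names, and real incidents may need more judgment than two flags can capture.

```python
from enum import Enum


class Severity(Enum):
    MINOR = "Minor"
    MAJOR = "Major"
    CRITICAL = "Critical"


def choose_severity(complete_outage: bool, core_functionality_degraded: bool) -> Severity:
    """Map two impact questions onto the severity table, most severe first."""
    if complete_outage:
        return Severity.CRITICAL  # complete outage, all users affected
    if core_functionality_degraded:
        return Severity.MAJOR  # significant impact, core functionality degraded
    return Severity.MINOR  # small number of users affected, workarounds available
```

Checking the most severe condition first means a complete outage is never downgraded just because core functionality is also flagged as degraded.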
Posting Updates
Keep users informed by posting updates as the situation evolves.
Best Practices
Be transparent
Users appreciate honesty. If you don’t know the cause yet, say so.
Update frequently
Even if there’s no new information, post updates every 30-60 minutes during active incidents.
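A simple reminder check can help a team keep to that cadence. The following is a minimal sketch, assuming you track the time of the last posted update yourself; `update_is_due` and `DEFAULT_INTERVAL` are hypothetical names, and the 30-minute default is just the short end of the suggested range.

```python
from datetime import datetime, timedelta, timezone

# Assumed default: the short end of the 30-60 minute range.
DEFAULT_INTERVAL = timedelta(minutes=30)


def update_is_due(
    last_update: datetime,
    now: datetime,
    interval: timedelta = DEFAULT_INTERVAL,
) -> bool:
    """Return True when at least `interval` has passed since the last update."""
    return now - last_update >= interval


last_posted = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(update_is_due(last_posted, last_posted + timedelta(minutes=45)))  # True
```

Passing `now` explicitly, instead of reading the clock inside the function, keeps the check easy to test.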
Include ETAs carefully
Only provide time estimates if you’re confident. “We expect to have more information within the hour” is better than a specific time you might miss.
Post a retrospective
After major incidents, consider adding a final update explaining what happened and what you’re doing to prevent a recurrence.