Scribe: Jake Vanderdray
- Take the approach that systems are complex and that you can’t blame an individual, so that you can honestly find out what happened.
- “Blame is the expression of discomfort” - Is blame the right word? Maybe concentrate simply on there being no repercussions.
- Maybe focus not on the individual and what happened, but on how we can build a system that doesn’t allow that failure.
- If the problem stemmed from over-work, how do we fix the over-work problem?
- What type of mechanisms can be put in place to prevent mistakes from being possible?
- If you’re not doing post-mortems, start with a formatted post-mortem.
- Ask “What happened?” followed by “How?” instead of “Why?”
- John Allspaw - Great post(s) on this
- We need to use this to accelerate learning
- Re-brand as “Learning Reviews”?
- Can be a tool for restoring work/life balance
- Difference between being accountable and being responsible
- accountable - being able to explain what happened - this is the goal
- responsible - taking blame
- The focus should be on looking forward rather than backward: how do we keep this from happening again, rather than just why did it happen?
- What to do when management steps in during a problem, making it worse, and then during the post-mortem steers the discussion toward blame?
- Maybe help measure the cost of blaming a person instead of making process changes to prevent recurrence.
- Point out that places like Etsy are successful because they use these methods to learn from mistakes.
- Try not to fight stupid.
- Field Guide to Human Error (Sidney Dekker)
- The Human Side of Post Mortems - http://www.oreilly.com/webops-perf/free/the-human-side-of-postmortems.csp
- VictorOps Blog 1 - https://victorops.com/blog/blameless-post-mortems-essential/
- VictorOps Blog 2 - https://victorops.com/blog/post-mortem-reporting/