
Critical Crisis Moments: Making Bad Situations Better

Introduction

I have heard it said that sometimes the purpose of your life is simply to serve as a warning to others. Companies large and small will, at some point, have a "Critical Crisis Moment": a situation you have either prepared for or not. These moments often either cause, or are caused by, significantly negative customer-facing events. When a critical crisis moment hits your company, the eventual outcome depends on several factors: your team's training and competency, the quality of the processes in place and the team's ability to follow them, and, lastly, each individual's ability to rise up and meet the moment. Many large and otherwise successful companies have had their critical crisis moments come and find them wanting, as JetBlue discovered on Valentine's Day 2007. When that time comes for your organization, how will your team react? What will the headlines say about your company?

Sometimes Things Just Go Terribly Wrong

Large-scale system outages, like last month's global outage caused by CrowdStrike, are critical crisis moments that are largely preventable. Looking at the common causes of a typical outage, several key factors come to the fore: people failing to follow process, power failures taking out data centers, and networking issues. According to the Uptime Institute's 2022 Outage Analysis, a significant proportion of outages are linked to human error, often because team members fail to follow established processes and procedures. This factor accounts for many incidents in which inadequate or ignored policies lead to major disruptions.

Power-related problems, especially failures in uninterruptible power supply (UPS) systems, are another critical cause. These issues constitute a significant portion of major outages, underscoring the need for robust power management strategies in data centers.

Networking issues, exacerbated by the complexities of modern, distributed architectures, are also a leading cause of IT service downtime. The increasing reliance on hybrid and cloud technologies adds to the complexity, raising the frequency of outages related to network failures.

The full report is available from the Uptime Institute.

Recruit and Retain Talented People and Train Them Effectively

For the issues you can control, training your employees and regularly refreshing them on policies and procedures will pay significant dividends. Do you remember Dr. David Dao being forcibly removed from a United Airlines flight? The 69-year-old was sitting in the seat he had paid for and been assigned on a full flight from Chicago to Louisville. The airline asked him to deplane so his seat could be given to someone else. He declined. In the end, he was dragged off the plane by airport security officers and ended up in the hospital. The incident highlighted the serious consequences of inadequate crisis management and poor employee training, and it caused significant damage to United's brand. Clearly some poor policy decisions were made, and they sent a paying passenger to the hospital. United settled with Dr. Dao out of court for an undisclosed amount. Hindsight is always 20/20, but imagine how differently this might have played out had United focused on improving its policies and investing in customer service training.

At the end of the day, it is always about people. Equipment failures, software defects, and network issues are all outputs of organizations made up of people. If we want great results, we need to hire great people and treat them fairly and authentically. We also need to train them well, with structured onboarding and ongoing training programs that expand their knowledge and keep them sharp on topics relevant to their current or future roles.

Standardize Key Policies and Ensure Compliance

There have been many major policy failures in recent history, and some have tragically cost lives. The response to Hurricane Katrina in 2005 is often cited as a failure of crisis management, largely because adequate standardized policies and preparations were absent. Another example is the 1984 gas leak at the Union Carbide India Limited plant in Bhopal, which killed thousands of people, a toll worsened by inadequate safety protocols and emergency response measures.

Hospitals and airlines have significantly improved their safety records and incident response through the disciplined use of policies, procedures, and checklists that confirm each step of the process has been followed. More companies across the tech industry should adopt this philosophy to get similar results.

At a previous organization, we had a poor track record of releasing software that negatively affected customers' environments. Root cause analysis showed that most of the time the problem was something we had done wrong, which careful review and planning before release could easily have prevented. We then implemented a release readiness review to confirm that the release checklist was complete and that all of our policies had been followed. This not only significantly reduced customer-facing incidents but also made the team more willing to point out where the standardized process could be hardened or streamlined. The team became emotionally engaged and intellectually invested in the outcome, which is exactly what you want.
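To make the idea of a checklist-driven release gate concrete, here is a minimal Python sketch of a go/no-go readiness review. It is an illustration under stated assumptions, not the actual process used in that organization: the `ChecklistItem` structure, the `readiness_review` function, and the sample checklist entries are all hypothetical. The only rule it encodes is that a release is a GO only when every checklist item has been completed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ChecklistItem:
    """One item on a hypothetical release readiness checklist."""
    description: str
    owner: str
    done: bool = False


def readiness_review(items: List[ChecklistItem]) -> Tuple[bool, List[ChecklistItem]]:
    """Return (go, outstanding).

    go is True only if the checklist is non-empty and every item is done.
    An empty checklist is treated as NO-GO, because nothing was reviewed.
    """
    outstanding = [item for item in items if not item.done]
    go = bool(items) and not outstanding
    return go, outstanding


if __name__ == "__main__":
    # Illustrative checklist; item names and owners are assumptions.
    checklist = [
        ChecklistItem("Code review completed", "dev lead", done=True),
        ChecklistItem("Regression tests passed in staging", "QA", done=True),
        ChecklistItem("Rollback plan documented and tested", "ops"),
        ChecklistItem("Release notes approved", "support", done=True),
    ]
    go, outstanding = readiness_review(checklist)
    if go:
        print("GO: all checklist items complete")
    else:
        print("NO-GO: outstanding items:")
        for item in outstanding:
            print(f"  - {item.description} (owner: {item.owner})")
```

Treating an empty checklist as a NO-GO is a deliberate design choice in this sketch: a gate that passes when nothing was checked would quietly defeat the purpose of the review.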

Foster an Engaged and Proactive Team

We have all heard about the benefits of an engaged workforce. From lower turnover to higher customer satisfaction, research shows that engaged employees have a positive impact on an organization and the value it delivers to the marketplace. In a crisis, an engaged workforce can mean the difference between a minor blip and a global tech catastrophe. During a critical situation, engaged workers stay informed about developments, are more likely to follow instructions diligently, and contribute to decision-making. They also tend to maintain morale and stay motivated to see the problem resolved, and they are willing to be more flexible and creative as they drive toward a resolution.

Imagine being an airline gate agent with an entire flight of disgruntled passengers in your line when your computer system goes down and you have to switch to pen and paper. An engaged team member will work to defuse the situation as best they can, and the passengers will feel it and appreciate it. Even when JetBlue stranded over 100,000 passengers for days at JFK airport in 2007, the dedication of engaged employees who went above and beyond in their customer service roles helped mitigate the damage.

It takes consistent, authentic leadership to build a culture where team members are motivated to exceed expectations. Leaders have to be "out front," visibly invested and proactive, in order to build an engaged and proactive team. They must "walk the walk," not just "talk the talk."

Some Final Thoughts

So, what is the bottom line? Simply put: hire the right people, train them as an empowered team, give them a well-thought-out and well-maintained set of standard processes, and trust and encourage them to be their best selves. It is simple to write but very difficult to do in practice. Organizations large and small need to be prepared for the critical situations that will inevitably arise. Every industry has its own challenges and potential crises, and it is up to organizational leadership to prepare the team to work together effectively to mitigate the situation and restore normal operations.

Is your organization ready for its critical crisis moment? If your systems go down or operations go offline, what business continuity plan is your team prepared to execute? As a saying often attributed to General George S. Patton puts it, "He who sweats more in training bleeds less in battle." Get your team ready now, lest your critical crisis moment find you unprepared.