While many airlines were affected by the winter storms that swept the nation over Christmas, most of them were back to running at full capacity a day after the storm passed. But even a week after the storms passed, Southwest Airlines was still canceling more than 50% of its daily flights. Beginning on December 22nd, more than 15,000 of its flights were canceled, with daily cancellations of 2,300 flights or more [1]. The meltdown left an estimated several hundred thousand travelers stranded for days during the Christmas break. How is it that other airlines resumed their usual schedules as soon as the weather improved, yet one airline experienced such a devastating outage?

While many factors combine to create a problem as severe as the one Southwest faced, one of the main reasons cited is technical debt. Often discussed in the context of software development, technical debt is the implied cost of additional rework incurred by choosing a quick path now rather than a better strategy that would take more time. As with monetary debt, technical debt that is not repaid can accumulate "interest", making it harder to implement the changes that would avoid future problems [2].

For Southwest Airlines, the technical debt was an antiquated scheduling system used to coordinate pilots and crews with flights. Most people would assume that, in today’s age of technology, an airline would have an app where pilots and flight attendants could check their assignments or update their availability after a flight cancellation, sickness or other event. Southwest’s app, however, is prone to failure and easily overwhelmed. As described by Lyn Montgomery, the president of Southwest’s flight attendants’ union, when hiccups or weather events happen, employees have to go through a burdensome, arduous process to get things sorted, because Southwest hadn’t sufficiently modernized its crew-scheduling systems [3]. For example, if crew members from Buffalo don’t arrive in Baltimore because their flight was canceled, they have to call in manually to let the company know where they are and to get hotels arranged for them. While this is annoying when one flight is canceled, it is catastrophic when 2,300 flights are canceled. During the meltdown, personnel were left on hold for 8 hours or more waiting to discuss their schedules, by which time they may have “timed out” and no longer been available to fly under FAA regulations.

Of course, the issue of technical debt is not limited to airlines. In almost every field there are examples where the easier, faster solution was chosen over the better, more technical approach. This is often the case when older manual systems remain in place long after their intended lifetime instead of being replaced by a better automated solution. A manual system can be effective at small scale, but, as the Southwest scheduling example shows, it will be overwhelmed as the operation grows. The question then becomes not if the manual system will fail, only when. At some point the technical debt has to be paid.

In the multi-tenant data center (MTDC) space, a clear example of a manual process that is still in place is performing cross connects. One example of the technical debt incurred by using manual rather than automated cross connects is the lack of inventory accuracy. This may not sound bad if the company is still billing a client for an inactive cross connect. But if a reconfiguration is performed in error because of incorrect records, it can take hours to find and repair the errors by manually tracing connections. Another problem caused by technical debt is the difficulty of adding new features on top of the manual system. For MTDCs, the manual cross connect process prevents the operator from offering remote testing services, where the client could test their circuit before reporting an issue. Another service that could be implemented with an automated system is physical fiber bandwidth-on-demand, where a client could schedule bandwidth for short periods of time, for example to distribute a large file or to run periodic back-ups.
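The inventory-accuracy problem described above can be illustrated with a minimal sketch. It assumes an operator has two lists: the cross connects recorded in an inventory database and the cross connects actually found on the floor (by manual tracing or by an automated system). The `CrossConnect` type, the port labels and the `audit` function are hypothetical names chosen for illustration and do not represent any vendor's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrossConnect:
    """One fiber cross connect between two patch-panel ports (hypothetical model)."""
    port_a: str
    port_z: str


def normalize(cc: CrossConnect) -> CrossConnect:
    # A cross connect is undirected, so A->Z and Z->A are treated as the same link.
    a, z = sorted((cc.port_a, cc.port_z))
    return CrossConnect(a, z)


def audit(recorded, observed):
    """Compare inventory records against what is physically patched."""
    rec = {normalize(c) for c in recorded}
    obs = {normalize(c) for c in observed}
    return {
        "stale_records": rec - obs,  # in the database, but not on the floor
        "undocumented": obs - rec,   # on the floor, but not in the database
    }


# Illustrative data only: port labels are made up for this example.
recorded = [CrossConnect("P1-01", "P7-12"), CrossConnect("P2-03", "P9-04")]
observed = [CrossConnect("P7-12", "P1-01"), CrossConnect("P3-05", "P8-08")]
report = audit(recorded, observed)
```

In this made-up data set the audit would flag one stale record (a connection that may still be billed but no longer exists) and one undocumented connection (a link that a technician could disturb by mistake). With a manual process, producing the `observed` list is itself the hours-long tracing exercise.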

Of course, there is a solution available today to replace the manual cross connect process. Telescent has developed a robotic cross connect system to automate fiber management. The Telescent system is purely fiber-based, offering low loss and a latching design that matches manual patch panel performance. Test equipment such as power meters and OTDRs can be included with the Telescent system to allow monitoring of any fiber, either through automated scheduling or on demand. The new Robust™ configuration simplifies implementation of the Telescent robotic system and scales easily as more automation is required. The Telescent system also meets the reliability requirements for data centers: it has passed NEBS Level 3 certification as well as multiple customer trials, and has over 1 billion port hours in operation.

While the recent storms may have been a "black swan" event that exposed the technical debt in Southwest's scheduling system, shouldn't you avoid the reputational damage of an outage on that scale by applying automation today? Contact Telescent now to discuss how to bring fiber automation to your data center.

[1]  Opinion | The Shameful Open Secret Behind Southwest’s Failure - The New York Times (nytimes.com)

[2]  Technical debt - Wikipedia

[3]  What’s the problem with Southwest Airlines scheduling system? (dallasnews.com)