Lyndon Bird FBCI |
My first reaction to hearing that the Royal Bank of Scotland had serious IT problems with a software upgrade was one of disbelief. Surely not, I told my colleagues; there has to be more to it than this, perhaps a cyber-attack or internal sabotage. When no rumours of such dramatic events emerged and the banking group kept extending its expected recovery period, I had to reluctantly admit I was wrong. As I write, the banking group is still not totally operational at its Ulster Bank subsidiary after four weeks of disruption, and even customers of the bank’s main brands (RBS and NatWest) experienced delays of a week or more.
Put in context, this is all the more bizarre. Most banks define their “Recovery Time Objectives” in hours or even minutes for particular services. A four-week interruption in access to client accounts (i.e. basic banking functionality) would have been unthinkable.
For those non-UK residents who are not too familiar with this particular financial institution, it is sufficient to say that it was bailed out by the UK Government in 2009 and is now 82% owned by the UK taxpayer. It has continued to have problems since then: not just the destruction of its share price and its failure to contain its losses, but also damage to its reputation and image. There was the arbitrary removal of the previous Chairman’s knighthood by the Queen, and a damaging, long-running dispute about whether the current CEO should be allowed to take his bonus. It has also faced much government and opposition criticism for failing to lend sufficient money to small businesses despite being entirely protected itself by public money.
Against this backdrop, the last thing needed was an operational failure. It appears that the IT problem started on a Tuesday evening when a routine update of a software component failed and prevented access to customer accounts. It took until Friday to understand the problem fully; in the meantime no transactions had been handled, and a backlog of over 100 million transactions remained to be processed. RBS certainly had commercial problems but, like all major banks, it has expensive, sophisticated, low-risk and highly protected computer systems.
Speculation on the reasons for the failure was, of course, widespread and highly imaginative. Some postulated that it was caused by the outsourcing of computer operations to India. The argument that RBS was running its computer operations in a risky manner just to save money was a popular line, though no evidence was presented. As the disruption continued, a new explanation seemed to gain credence: it was not really about any specific failure, but about the complexity of the technical infrastructure that had grown up over the past decade. The view was that no-one could possibly understand the full potential consequences of a single change on the overall infrastructure. RBS had been unlucky, but it could (and would) happen to others on an increasingly regular basis. Leading ICT consultants called for a fundamentally different approach to the way large organizations manage the performance of their IT systems, recognizing that everyone now relies on such services in their day-to-day lives.
Those of us who were less than convinced by this argument were then hit by a different situation, but one in which the experts again blamed the complexity of technical integration. The mobile operator O2 (itself owned by the struggling Spanish telecom company Telefonica) was out of action for a minimum of 17 hours for many customers, and service was then restored only to downgraded 2G whilst work continued on recovery of the 3G network. The network, which uses the slogan "we're better connected", had not issued a timetable for full recovery two days after the incident. Mobile operators set their acceptable downtimes in minutes, not days, so again the impact on profits and reputation will be massive.
Of these two examples, O2 probably has the most to lose. RBS already has a poor reputation, and is protected against loss by the UK taxpayer. It is also more difficult to change banks than to change mobile operators, so RBS is unlikely to lose customers in large numbers. Other banking scandals have already overtaken RBS in the news agenda. O2 has no such protection and only limited brand loyalty.
Just when the Business Continuity community felt that IT disaster recovery (ITDR) was now a routine business process and that our attention should turn to helping deal with business-related threats (such as the risk of a Eurozone breakup or political upheaval in the Middle East), the oldest BCM issue of all comes back to bite us. Technology recovery is back on our radar and, with cyber threats increasing, it is likely to remain so.
I wrote a similar article for my company's blog - http://blog.onyx.net - focusing on the complexity of the systems we take for granted and the need for testing. The ongoing electricity problems in India may also be symptomatic of not fully understanding the complexity of large interconnected systems.