Could the embarrassment caused to Fifa and South Africa when the ticketing system for the 2010 World Cup crashed not once, but twice, have been avoided?
“Probably,” says David Rogers, CEO at Qualica Technologies, an international provider of optimised network and application services and solutions.
“Effective stress testing, as well as throughput and latency testing and even user experience testing before the system goes live, can help to identify potential bottlenecks or problem areas across an entire system: the network, the applications and the server infrastructure.
“In addition, cloud computing can often assist by quickly offering up huge additional processing power that can be thrown at an overload problem in an emergency,” he adds.
Rogers points out that the Fifa World Cup ticketing fiasco was not the first time key IT infrastructure had fallen over as a result of capacity issues.
In South Africa, ticket sales for the 2003 Cricket World Cup were also held up when the system was overwhelmed by demand on the first day of bookings at the stadiums. Another IT system bottleneck resulted in the wrong contestant being crowned the winner of the SA Idols 2008 competition.
Internationally, the first and possibly most famous crash occurred in 1999 when a webcast of the Victoria’s Secret fashion show in New York City fell over, leaving millions of viewers frustrated.
In 2008, talk show queen Oprah Winfrey was forced to apologise to the millions of fans whose efforts to log in to her self-empowerment webcast overwhelmed Internet servers.
“Capacity-related performance issues on websites are more common than many people realise, but only a few make headlines. For example, most local airlines experience some sort of load problem on their websites when running special offers,” Rogers adds.
One common reason for overload problems is that website infrastructure, both the hardware and the architecture of the application, is not built to handle many simultaneous sessions or exceptionally high volumes of connections.
There could also be a problem with the capacity or bandwidth of the link to the server on which the application runs.
However, Rogers continues, it would be too simplistic to suggest that capacity problems arise simply because of a miscalculation about the traffic that will be generated during a specific event.
Usually, the problem stems from a key ‘weak’ link in the chain that has not been identified or understood. For example, it could stem from an international site that is ‘allowed’ to download advertising onto high-load local sites. A problem then arises on the local site, not because the website was not properly built to cater for high load, but because the implications of international bandwidth for local website performance were not understood.
“Unfortunately, there is no single formula or simple answer to deal with the problem of system capacity. Investing in additional capacity in case of a surge of demand could be an expensive waste.
“However, if one is serious about preventing loss of business or reputation as a result of a system crash, the sensible alternative would be to invest in effective, targeted testing that would help to identify potential bottlenecks or problem areas in the system before it is subjected to high activity loads,” Rogers concludes.