Wednesday
Feb 11 2009

NLA eClips Service Incident - Report

Problem:

 

Some core content for publication date 11th Feb 2009 was missing from the eClips database, and from eClips feeds, by the NLA KPI deadlines. Certain pages were missing from three core titles:

Daily Mirror

The Daily Telegraph

The Guardian

 

 

Cause:

 

A core automated NLA service malfunctioned on Tuesday 10th February 2009 at 19:24. NLA engineers were alerted to the malfunction at 19:52 and had the service running normally by 20:16. Unfortunately, the malfunction corrupted pages that publishers delivered to the NLA during this period.

 

When the corruption became apparent, publishers were asked to retransmit the affected pages. The normal NLA escalation process was not followed when recovering pages from The Daily Telegraph, so these pages were made available much later than expected. All the missing pages have now been re-delivered to the NLA and are being processed.

 

Solution:

 

To mitigate the risk and effects of a similar event occurring in future, the automated monitoring strategy for the affected service will be modified to alert NLA engineers of impending failure, rather than upon failure. The publisher escalation process will be reviewed to reduce the possibility of deviation from process under similar circumstances. Finally, the architecture of the affected service will be enhanced to make it more robust.
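
As an illustration of alerting on impending failure rather than upon failure, the following is a minimal sketch only. It assumes the service exposes some numeric health metric (for example a queue depth or disk usage percentage); the metric, the `Thresholds` values, and the function names are hypothetical and are not taken from the NLA's monitoring system.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    warning: float   # hypothetical level at which engineers are alerted early
    critical: float  # hypothetical level at which the service is treated as failed


def classify(value: float, t: Thresholds) -> str:
    """Map a metric reading to an alert level."""
    if value >= t.critical:
        return "critical"
    if value >= t.warning:
        return "warning"
    return "ok"


def minutes_until_critical(samples, t: Thresholds):
    """Estimate minutes until the critical level is reached.

    `samples` is a list of (minute, value) pairs. The estimate is a straight
    line through the first and last sample. Returns None when the metric is
    not rising or when there is too little data to judge.
    """
    if len(samples) < 2:
        return None
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 == t0:
        return None
    rate = (v1 - v0) / (t1 - t0)
    if rate <= 0:
        return None
    return max(0.0, (t.critical - v1) / rate)
```

A monitor built this way could page engineers at the "warning" level, or when the estimated time to the critical level falls below an agreed margin, instead of waiting for the service to stop.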

 

Monday
Feb 02 2009

NLA London Office

Due to the severe weather and its impact on public transport in Central London, NLA staff will be working from home today. Clients who require assistance should email clientservices@nla.co.uk.

Thank you.

Tuesday
Nov 04 2008

NLA eClips Service Incident - Report

Problem:

 

Loading of The Daily Mail into the NLA database failed on Tuesday 4th November 2008. This meant that the distribution of feeds to NLA clients did not contain 1st edition Daily Mail content by the target time of 01:00. Loading and distribution of other titles were unaffected.

 

Cause:

 

A momentary connectivity failure between the server running the loading module and the storage device to which loading takes place caused a single thread of the loading module to loop erroneously.

 

Solution:

 

NLA engineers restarted the loading module, which resulted in the loading and distribution of all the 1st edition Daily Mail content by 01:15.

 

Analysis of the loading module's source code has identified areas where modifications can be made to prevent a similar incident in future. These modifications will be scheduled soon.
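
The report does not say what the modifications will be. As one possible illustration, the sketch below bounds the number of retries after a connectivity error, so that a thread gives up and raises an error instead of looping indefinitely. All names here (`load_with_retries`, `write_page`, `StorageUnavailable`) are hypothetical and do not describe the actual loading module.

```python
import time


class StorageUnavailable(Exception):
    """Raised when the storage device cannot be reached after all retries."""


def load_with_retries(write_page, page, max_attempts=3, delay_seconds=1.0,
                      sleep=time.sleep):
    """Call write_page(page), retrying a bounded number of times.

    A ConnectionError triggers a retry after a growing delay. Once
    max_attempts is exhausted, StorageUnavailable is raised so that the
    failure is visible to monitoring rather than hidden in a loop.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return write_page(page)
        except ConnectionError as exc:
            last_error = exc
            if attempt < max_attempts:
                sleep(delay_seconds * attempt)  # simple linear back-off
    raise StorageUnavailable(
        f"gave up after {max_attempts} attempts"
    ) from last_error
```

With a bounded loop, a momentary failure is absorbed by a retry, while a persistent one surfaces as an explicit error that can be alerted on.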

Saturday
Oct 18 2008

NLA eClips Service Incident - Report

 

Problem:

 

Certain eClips customers had intermittent access to NLA web and FTP services from 7:35am to 9:00am and from 9:19am to 9:33am on Saturday 18th October 2008.

 

Cause:

 

The owners of NLA's London hosting facility were carrying out the first phase of a planned, annual, power-down exercise on Saturday 18th October. This involved disabling one of the two power feeds which supply the NLA infrastructure. The NLA's infrastructure can usually tolerate removal of one power feed as it has a dual-fed, clustered architecture. In this instance, the automated failover of one clustered network component did not complete successfully.

 

Solution:

 

The failover process for the affected network component required manual intervention by engineers, who ensured that it completed successfully. The engineers also made some configuration changes to the cluster which should reduce the risk of a similar event occurring in future.

Monday
Oct 06 2008

NLA eClips Service Incident - Report

Problem:

At approximately 9:30 this morning an incident occurred which impacted eClips service delivery. The incident was resolved at 10:00. During the incident, clients' ability to view eClips content was impacted, as the service was intermittently unavailable.

Cause:

The root cause of the incident is still under investigation by NLA engineers. Early indications are that, when serving a higher than normal number of requests, the eClips database license checking process became less responsive. This process is being investigated as a potential area for optimization.

Solution:

NLA engineers are now reviewing the eClips core code related to this aspect of the service, with the aim of discovering the root cause and optimizing the code to prevent recurrence.

The NLA engineering team is also preparing to deploy a new database architecture which will be more resilient and scalable. This should also have the benefit of preventing such an incident from occurring.

NLA Service Operations Management