
How to maximise system uptime

Mark Hall - 09 October 2019

Customers expect online systems to be available 24 hours a day, 7 days a week, and any downtime is increasingly seen as a failure of the organisation to invest in the equipment and skilled personnel needed to achieve this. Even scheduled maintenance windows are becoming harder to arrange, with customers expecting systems to continue running on alternative platforms while maintenance is carried out.

At OWA we aim to provide our customers with 100% uptime for the systems we run and support on their behalf, so we would like to provide some insight into how we go about achieving this.

1. Choose a reliable home to host your systems

Organisations now have many choices about where to host their systems, and they can even choose to place their applications in the Cloud without any knowledge of where they are physically located. Data protection legislation, such as GDPR, requires organisations to ensure the security of the personally identifiable data which they store on behalf of their customers. It is therefore important to understand where the servers which host your data are physically located, even if you do decide to place everything in the Cloud.

At OWA we use two completely independent UK-based data centres to host our physical equipment. This allows us to replicate our critical applications and those of our clients across the data centres, reducing the recovery time needed in the event of a catastrophic failure.

2. Invest in good quality equipment and manage the lifecycle

If you require your systems to be up and running 24 hours a day, 7 days a week, without needing too much emergency attention, then you have to invest in good quality equipment which is designed to run reliably. Where possible, single points of failure should be avoided and critical applications should be replicated.

No matter how well maintained, equipment does not last forever, so it is important to manage the lifecycle. At OWA we have just completed the replacement of all of our production servers, something we do every 3 years. We strongly believe that replacing equipment before it becomes unreliable helps us avoid unscheduled downtime and ultimately reduces the costs involved in recovering from such events.

We are of course mindful of the environmental impact of replacing equipment which is still operational. As part of our equipment lifecycle programme, production servers are re-purposed and used for up to 3 further years as development, staging and backup equipment.
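As a rough illustration of the lifecycle described above, the sketch below works out which stage a server is in on a given date. The 3 year production period and the further 3 years in secondary roles come from this article; the function names, the stage labels and the handling of 29 February are assumptions made for the example, not a description of OWA's actual tooling.

```python
from datetime import date

PRODUCTION_YEARS = 3  # from the article: production servers are replaced every 3 years
SECONDARY_YEARS = 3   # from the article: up to 3 further years in secondary roles


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only raised when d is 29 February and the target year is not a leap year.
        return d.replace(year=d.year + years, day=28)


def lifecycle_stage(commissioned: date, today: date) -> str:
    """Return a hypothetical stage label for a server commissioned on a given date."""
    end_of_production = add_years(commissioned, PRODUCTION_YEARS)
    end_of_secondary = add_years(end_of_production, SECONDARY_YEARS)
    if today < commissioned:
        return "not yet in service"
    if today < end_of_production:
        return "production"
    if today < end_of_secondary:
        return "secondary (development, staging or backup)"
    return "due for retirement"
```

Calling lifecycle_stage(date(2019, 10, 9), date.today()) for each server in an asset register would give a simple list of equipment approaching its replacement date.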

3. Invest in good quality virtualisation and management applications

Having reliable equipment is only part of what is needed to provide 100% uptime. Good quality, reliable virtualisation software is essential to ensure that hosted applications continue to run and can be managed in a way that minimises downtime.

At OWA we use VMware to provide our virtualised environment. VMware is a very mature virtualisation platform which allows us to monitor and manage the performance of our environments effectively across both of our data centres.

VMware vCenter includes the necessary tools to migrate virtual machines between physical hosts and storage, allowing critical applications to remain operational even while they are being migrated.

vSphere Replication allows us to create replicated copies of critical servers across different hosts and data centres, which reduces the time needed to recover from an issue.

4. Monitor and check system performance

At OWA our own team directly manage the systems to ensure that any problems are identified before they result in unscheduled downtime. To achieve this we have procedures and checks which we carry out on a daily, weekly and monthly basis to ensure that any potential issues are picked up at an early stage.

In addition to our manual checks, we also run automated monitoring of the servers and hosted applications to ensure that they are online and available. If an issue is identified, our team are alerted and respond appropriately, 24 hours a day, 7 days a week.
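To give a flavour of what an automated availability check can look like, here is a minimal sketch in Python. It simply tests whether a TCP connection to each service can be opened and returns the ones that fail. The function names and the idea of passing in a list of (host, port) pairs are assumptions made for this example; it is not the monitoring system OWA runs, and a production setup would normally rely on a dedicated monitoring product with alerting built in.

```python
import socket
from typing import Callable, Iterable, List, Tuple

Target = Tuple[str, int]  # (hostname, TCP port) of a service to check


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def find_unreachable(
    targets: Iterable[Target],
    probe: Callable[[str, int], bool] = is_reachable,
) -> List[Target]:
    """Return the targets whose probe failed, in the order they were given."""
    return [(host, port) for host, port in targets if not probe(host, port)]
```

Run on a schedule against a list such as [("app.example.com", 443)] (a placeholder hostname), any entries returned by find_unreachable would be passed to whatever alerting mechanism notifies the on-call team. The probe is a parameter so that a deeper check, for example an HTTP request to a health page, can be swapped in without changing the rest of the code.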

5. Apply patches regularly

To ensure that systems continue to run reliably and securely, it has never been more important to apply patches regularly to applications and their underlying operating systems. At OWA we run a monthly patching cycle and also closely monitor for zero-day vulnerabilities. Zero-day vulnerabilities sometimes require us to put additional mitigation measures in place before a full patch is released.

6. Prepare in advance to recover

No matter how reliable your infrastructure and how well you manage your systems, there is always a risk that something will fail, so you need a robust recovery plan in place. At OWA we operate from two independent UK-based data centres and replicate our key applications between the two sites. We replicate all of the servers we host across virtualised platforms so we always have a copy on standby. We also take weekly tape backups of all of our key data, store them securely off site and rotate the copies every 6 weeks. Tape backups are used as a last resort; however, they do have the benefit of being offline and therefore less prone to ransomware attacks.
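One simple way to implement a rotation like the weekly tape cycle above is to number the tape sets and derive the set to use from the calendar week. The sketch below assumes six tape sets, one per week, each reused every 6 weeks; that reading of the rotation, along with the function name and the numbering, is an assumption made for illustration only.

```python
from datetime import date

TAPE_SETS = 6  # assumption: one set per week, each reused every 6 weeks


def tape_set_for(backup_date: date, sets: int = TAPE_SETS) -> int:
    """Return the tape set number (1..sets) to use for the week containing backup_date."""
    # date.toordinal() counts days from 0001-01-01, which is a Monday, so this
    # integer division gives the same week index for every day from Monday to Sunday.
    week_index = (backup_date.toordinal() - 1) // 7
    return week_index % sets + 1
```

Because the result depends only on the date, anyone can work out which set should be in the drive, and which sets should be off site, without keeping a separate log.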

OWA are able to support, host and further develop applications which clients have had created elsewhere, so please do get in touch if you would like to benefit from our knowledge and experience.