HIGH AVAILABILITY IT PLATFORM

VOLKSWAGEN GROUP

GOALS AND OBJECTIVES

  • BUSINESS OBJECTIVE

    To ensure uninterrupted operations, prevent downtime caused by emergencies, and eliminate the associated financial losses.

  • IT OBJECTIVE

    To build a high-availability IT infrastructure that ensures fault tolerance for all services, including production services.
    To develop a standard solution for protecting critical systems and to document the required SLAs.
    To ensure continuous monitoring of the availability of all IT system components.

SOLUTION

  • A distributed, virtualized computing complex spanning two data centers
  • The solution includes virtual server farms, database clusters, storage networks and backup systems from several vendors (EMC, IBM and VMware)
  • An analytical monitoring system

IMPLEMENTATION

The new high-availability IT platform is a distributed, virtualized computing system spanning two data centers, which operate in active-active mode during normal operation. To keep the platform running smoothly, multilevel protection of IT services (including a cluster architecture for storage systems and database servers) was designed in detail.

All data is mirrored between the sites, and virtual machines can be moved quickly from one site to the other. The largest systems (several terabytes each) are also replicated to additional storage, which significantly reduces service recovery time in an emergency because nothing has to be restored from backup.

For security, service systems are separated by a firewall. Every system is backed up, with the backup frequency configured individually according to how critical the service is.
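As a rough illustration of why backup frequency is tied to criticality: when a service can only be restored from backup, the worst-case data loss is about one backup interval. The sketch below assumes hypothetical criticality tiers and intervals (the project's actual schedules are not published) and picks the least frequent schedule that still keeps worst-case loss within a given limit.

```python
from datetime import timedelta

# Hypothetical backup schedules per criticality tier (illustration only;
# the intervals actually used in the project are not published).
BACKUP_INTERVALS = {
    "critical": timedelta(hours=1),
    "important": timedelta(hours=6),
    "standard": timedelta(hours=24),
}


def worst_case_loss(tier: str) -> timedelta:
    """Worst-case data loss when a service is restored from backup alone.

    A failure just before the next scheduled backup loses everything
    written since the previous one, i.e. one full backup interval.
    """
    return BACKUP_INTERVALS[tier]


def least_frequent_tier(max_acceptable_loss: timedelta) -> str:
    """Return the least frequent (cheapest) schedule whose worst-case
    data loss still fits within the acceptable limit."""
    candidates = [
        (interval, tier)
        for tier, interval in BACKUP_INTERVALS.items()
        if interval <= max_acceptable_loss
    ]
    if not candidates:
        # No backup schedule is frequent enough; mirroring/replication is needed.
        raise ValueError("no backup schedule meets this limit; replication is needed")
    return max(candidates)[1]


print(least_frequent_tier(timedelta(hours=8)))  # important
print(least_frequent_tier(timedelta(days=2)))   # standard
```

When the acceptable loss is shorter than any backup interval, the function raises an error, which mirrors the reasoning behind replicating the largest systems instead of relying on backups alone.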

A Disaster Recovery Plan has been developed and documented. It describes in detail the methods and steps for eliminating failures in force majeure situations, including the required team of specialists and their tasks.

Continuous recording of changes on the disk arrays protects databases from logical errors and makes it possible to recreate the system exactly as it was at a specific point in time before an incident. An analytical monitoring system (covering storage systems, virtual machines and network infrastructure) tracks the operation of the IT infrastructure in real time.
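The point-in-time recovery described above comes down to choosing the last recovery point captured before the logical error occurred. The sketch below shows only that selection step, assuming a sorted list of hypothetical recovery-point timestamps; the disk arrays' own vendor-specific mechanisms for capturing and rolling back changes are not modeled.

```python
from bisect import bisect_left
from datetime import datetime


def latest_point_before(recovery_points: list[datetime], incident: datetime) -> datetime:
    """Return the newest recovery point strictly earlier than the incident.

    recovery_points must be sorted in ascending order. bisect_left finds the
    index of the first point at or after the incident, so the entry just
    before it is the last state captured before the error.
    """
    i = bisect_left(recovery_points, incident)
    if i == 0:
        raise ValueError("no recovery point precedes the incident")
    return recovery_points[i - 1]


# Hypothetical recovery-point timestamps, for illustration only.
points = [
    datetime(2024, 7, 1, 10, 0),
    datetime(2024, 7, 1, 10, 5),
    datetime(2024, 7, 1, 10, 10),
]
bad_update = datetime(2024, 7, 1, 10, 7)  # time an erroneous update hit the database
print(latest_point_before(points, bad_update))  # 2024-07-01 10:05:00
```

Choosing a point strictly before the incident is the conservative option: a point taken at the exact moment of the error may already contain it.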


PROJECT RESULTS

The new IT infrastructure provides the required fault tolerance and operates almost around the clock. System availability and performance, equipment health, the operability of system software and the DBMS, and the sufficiency of resources are analyzed continuously. For clarity, all the necessary indicators are brought together on unified dashboards, providing a complete, real-time picture of the IT environment.

Detailed operation and restoration instructions for each system, with predictable SLA parameters, make it possible to streamline how specialists work and interact and provide efficient control mechanisms.

In total, 16 typical emergency scenarios were tested (failure of the virtual infrastructure at one of the sites, complete or partial database destruction, loss of SAN configuration, etc.), together with the recovery actions for each.

Jet Infosystems specialists achieved an exceptionally short recovery time for Volkswagen’s primary production IT services: in the event of isolated failures, recovery takes no more than 40 minutes. Potential data loss in the event of damage has also been minimized (the Recovery Point Objective is almost zero).

The technologies applied and the platform service standards developed during the project can be extended to meet the needs of any production line, however large.

  • 21 hours per day

    Daily operating time of the production conveyor

  • 40 minutes or less

    Recovery time objective (RTO) for critical IT services in the event of an incident

  • 16 typical emergency situations

    Tested during trial system runs

  • → 0

    Data loss (RPO) in the event of a disaster approaches zero

CUSTOMER REVIEW

The only possible time for migrating Volkswagen’s production systems to the new infrastructure was the three-week factory vacation. When we began work, there were just a couple of months left for deploying and testing the new computing complex. Although this was only about half the usual timeframe, the experience and best practices gained on earlier projects meant that we were able to design and implement the infrastructure components almost simultaneously. As a result, we met the extremely tight deadline.

Alexey Kulpin

Manager for work with Corporate Clients, Jet Infosystems
