Business continuity on cloud enterprise data centers

(12) United States Patent
Hemachandran et al.

(10) Patent No.: US 8,805,989 B2
(45) Date of Patent: Aug. 12, 2014

(54) BUSINESS CONTINUITY ON CLOUD ENTERPRISE DATA CENTERS

(75) Inventors: Satish K. Hemachandran, Newnan, GA (US); Christopher T. Sears, Avondale Estates, GA (US)

(73) Assignee: Sungard Availability Services, LP, Wayne, PA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 13/531,744

(22) Filed: Jun. 25, 2012

(65) Prior Publication Data
US 2013/0346573 A1, Dec. 26, 2013

(51) Int. Cl.: G06F 15/173 (2006.01)

(52) U.S. Cl.: USPC 709/223

(58) Field of Classification Search: None
See application file for complete search history.

Primary Examiner: Ninos Donabed

(74) Attorney, Agent, or Firm: Cesari and McKenna, LLP

(57) ABSTRACT

Business continuity services in a data processing environment where a service provider offers virtual data center services to numerous customers.

19 Claims, 16 Drawing Sheets

[Representative drawing: cloud site 1 (102-1) hosting virtual data center 1 for customer 1. Site-1 cloud management holds the customer 1 configuration and policies: firewall policies 1 to n for firewalls 1 and 2, load balancer policies 1 to n for load balancers 1 and 2, and service policies 1 to n (e.g., backup, OS patching, monitoring). The virtual data center contains guest VMs 1 to n, each with an application, an operating system, and an IP address, attached to network 1 (VLAN 1) and network 2 (VLAN 2).]


BUSINESS CONTINUITY ON CLOUD ENTERPRISE DATA CENTERS

BACKGROUND

The users of data processing equipment increasingly find the cloud-based infrastructure-as-a-service, or IaaS, model to be a flexible, easy, and affordable way to access the IT infrastructure they need. By moving servers and applications into logical units referred to as Virtual Data Centers (VDCs), which can be easily deployed with an IaaS provider, these customers are free to build out equipment that exactly fits their requirements at the outset, while having the option to adjust with changing future needs on a “pay as you go” basis. VDCs, like other cloud-based services, bring this promise of scalability to allow expanding servers and applications as business needs grow, without having to spend for unneeded hardware resources in advance. Additional benefits provided by professional level cloud service providers include access to equipment with superior performance, security, disaster recovery, and easy access to information technology consulting services.

Beyond simply moving hardware resources to a remote location accessible in the cloud via a network connection, multiple virtualization technologies provide further abstraction layers within VDCs that make them attractive. Server virtualization decouples physical hardware from the operating system and other information technology and resources. Server virtualization allows multiple virtual machines with different operating systems and applications to run in isolation side by side on the same physical machine. A virtual machine is a software representation of a physical machine, specifying its own set of virtual hardware resources such as processors, memory, storage, network interfaces, and so forth upon which an operating system and applications are run.

SUMMARY

Increasingly, cloud service providers are offering additional value-added services to IaaS customers as a way of retaining existing customers and attracting new ones. Services being offered to customers include, for example, business continuity services. These services are optional but subscribing to them may be beneficial to the use and operation of each individual VDC.

Subscribing to a business continuity service helps protect virtual machines operating in the customer’s VDC from interruptions in the availability of the service provider’s infrastructure.

With business continuity services enabled, the service provider can now respond to a disaster at the primary site, such as a network outage or power failure, by transitioning customer systems to run out of a secondary site, thereby minimizing the disruption to application availability. This transition, known as a “fail over”, can be done on a per-customer, per-VDC, or per-VM basis. By doing so, business continuity services are implemented in a more orderly fashion from the perspective of the service provider and the cloud customer.

In one embodiment, a data processing system is therefore provided for hosting virtual machines in a cloud computing environment. A primary production cloud site, operated from a first location, provides a set of virtual machines to a set of customers. A second production site operates at a second location. The second location also operates as a continuity production cloud for the set of customers. A cloud management service both (a) maintains configuration of the set of virtual machines as one or more Virtual Data Centers (VDCs); and (b) permits selective enablement of a business continuity service for failing over selected elements of the production cloud to the continuity production cloud on a per-customer, per-VDC, or per-VM basis.

In specific implementations, additional features may include:
virtual data processors, firewalls, load balancers, and virtual local area networks as elements of the VDCs;
a replication service that provides data replication between the first and second locations;
a network interface that provides secure communication between the production and continuity clouds, such that the first customer is prevented from accessing production or continuity clouds provided for other customers; and
if included, the replication service operating independently of the production cloud and the continuity cloud.

The cloud management service can further enable the first customer to specify Service Level Agreement (SLA) information including one or more of cost, Recovery Point Objective (RPO) and Recovery Time Objective (RTO).

The cloud management service can also further enable the first customer to specify which one of several possible data processing platforms at several locations is to provide the target production cloud for the first user.

Optionally, in the event of a disaster:
network addresses are re-assigned;
firewall rules are updated;
virtual private networks are created;
load-balancing options are configured;
virtual local area networks are created;
standby network interfaces are activated;
a recovery plan is executed for each continuity enabled VDC to bring online VMs as specified by the user in an order of recovery; and
the recovered VMs are rebalanced.

Furthermore, in the event of a test, it is possible that:
virtual machine disks are cloned;
firewall rules are updated;
virtual private networks are created;
load-balancing options are configured;
virtual local area networks are created;
standby network interfaces are activated;
a recovery plan is executed for each continuity enabled VDC to bring online VMs as specified by the user in an order of recovery; and
DNS updates are initiated for the recovered VMs.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing will be apparent from the following more particular description of example embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating embodiments of the present invention.

FIG. 1 is a high level diagram of a service provider who offers enterprise cloud services with optional business continuity to a number of customers.

FIG. 2 illustrates a Virtual Data Center (VDC) in more detail.

FIG. 3 is a data structure maintained by the service provider to represent information concerning which VDCs have associated business continuity services enabled.

FIG. 4 illustrates replication services implemented between various sites.

FIG. 5 shows that a result of operating replication services is to store VDC replicas at failover sites.

FIG. 6 shows the state immediately after an outage at cloud site one.

FIG. 7 illustrates the state after the backup VDC images are promoted to production mode.

FIG. 8 illustrates an initial state once a cloud site is brought back online.

FIG. 9 is an intermediate state after the cloud site is brought online but where some of the VDCs are still serviced from the backup site.

FIG. 10 is the state after all VDCs are again active at the original site.

FIG. 11 shows the cloud management database in more detail.

FIG. 12 illustrates detail of how replication occurs between two sites.

FIG. 13 is an example user interface for specifying business continuity options.

FIG. 14 is a user interface for configuring a virtual data center in the enterprise cloud.

FIG. 15 illustrates a sequence of steps performed in the event of a disaster.

FIG. 16 illustrates a sequence of steps performed at time of test.

DETAILED DESCRIPTION

FIG. 1 is a high level diagram of a typical cloud based information technology (IT) environment 100 in which improved business continuity procedures and apparatus described herein may be used. It should be understood that this is but one example cloud environment and many others are possible.

Of particular interest here is that users can request and configure business continuity services for enterprise cloud(s) on a per-VDC or per-VM basis. The business continuity service allows for site-to-site recovery across multiple data centers that can be placed at geographically diverse sites. By selecting this business continuity service, the customer can be assured that in the event of a failure of the physical infrastructure at a given site, his enterprise cloud(s), on a VDC by VDC basis, will be brought back online at another site according to a service level agreement (SLA). For example, as part of enabling the business continuity service for certain VDCs, the customer may specify a Recovery Time Objective (RTO) and Recovery Point Objective (RPO).

The business continuity service is made available to customers on a per VDC basis. Thus, after the customer specifies configuration of his VDC (including any virtual machines, firewalls, load balancers, etc.) he can then treat his entire VDC configuration as a single entity to which continuity services are applied. The service provider is then entirely responsible for configuring the details of replicating the VDC, managing the data that specifies the replication, isolating that detail from the customer, and bringing the VDC back online at the time of a disaster. Examples of conditions under which a disaster might be declared could include a network outage, power outage, or complete site failure.

More particularly now, the cloud environment 100 illustrated in FIG. 1 is operated by a cloud service provider. The environment 100 includes equipment located at several different physical locations or sites 102. For example, a first cloud site 102-1 may be located in Philadelphia, Pennsylvania, USA, a second cloud site 102-2 may be located in London, England, UK and a third cloud site 102-3 may be located in Pune, Maharashtra, India.

An example cloud site 102 is responsible for hosting infrastructure equipment that provides cloud services to many different customers. In the case of cloud site 102-1 there are n customers 104-1-1 through 104-1-n. Cloud site 102-2 is servicing m customers 104-2-1, . . . , 104-2-m, and cloud site 102-3 hosts p customers 104-3-1, . . . , 104-3-p. It should be understood that there is often overlap in the customers here such that a given customer 104 can request cloud services from multiple sites 102-1, 102-2, and/or 102-3.

One type of cloud service provided is a Virtual Data Center (VDC) 110. An example VDC 110 may include many different types of virtual data processing resources such as virtual firewalls, virtual load balancers, virtual local area networks, virtual data processing machines, virtual memory, virtual disk storage, and software resources such as operating systems and applications. It should also be understood that although an example customer one 104-1-1 shown in FIG. 1 appears to have specified exactly four (4) VDCs (110-1-1-1, 110-1-1-2, . . . , 110-1-1-n), in reality any given customer 104-1-1, . . . , 104-3-p may have more or fewer than the four VDCs illustrated in FIG. 1.

The VDCs 110-1 served from site one 102-1 for customer one 104-1 serve as a production cloud for specific customers 104-1-1, 104-1-2, . . . , 104-1-n. Likewise, the VDCs 110-1 served from site two 102-2 for other customers 104-2-1, 104-2-2, . . . , 104-2-m serve as a production cloud for those other customers 104-2-1, 104-2-2, . . . , 104-2-m.

The VDCs 110 include virtual computing resources that are physically implemented at each particular service provider site 102 but are remotely accessed by the respective customers 104 over network connection(s). The service provider thus operates a number of physical machines at the various provider sites 102-1, 102-2, 102-3 including networking equipment such as switches, routers, and other types of internetworking equipment such as physical firewalls, and multiple physical data processors, storage servers, storage area networks, and other data processing machines as needed to provide the functions required by the VDCs 110. The details of configuration and operation of this physical data processing equipment are hidden from the customers 104; this data processing model is sometimes referred to as Infrastructure as a Service (IaaS).

An administrative user typically associated with each service customer 104 does however have access to a cloud management function 120 at one or more sites 102. The cloud management interface allows administrative users to interact with and configure the elements of their VDCs available to them from the cloud site 102 as well as additional services. Cloud management components, at least some of which are located at each cloud site 102, may also be provided from a central location (not shown in FIG. 1). For example, the service provider may allow each customer to use the cloud management interface 120 to specify policies or other services on a per customer, per VDC or per virtual machine basis. An example of a custom service policy might be a backup policy that schedules backups of all virtual machines (VMs) at a given time each day, for example at midnight Pacific Standard Time (PST).

As will be understood from the description below, the business continuity service offered by the service provider in the environment 100 allows each customer to specify optional services to be provided on a per VDC 110 basis. One of the services of interest is a business continuity service that enables a selected VDC to be brought back online at an alternate site 102-2, 102-3 in the event that a selected cloud site 102-1 fails, goes off-line, or otherwise becomes unavailable.
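The per-VDC selection described above can be pictured with a small data model. The following Python sketch is illustrative only: the class names, field names, and the RTO/RPO values are assumptions made for this example and are not taken from the patent, which does not prescribe any particular representation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ContinuityPolicy:
    """Hypothetical per-VDC business continuity settings."""
    target_site: str   # alternate site chosen by the customer, e.g. "102-2"
    rto_minutes: int   # Recovery Time Objective (illustrative unit)
    rpo_minutes: int   # Recovery Point Objective (illustrative unit)


@dataclass
class VDC:
    vdc_id: str
    vms: List[str] = field(default_factory=list)
    # None means the customer has not enabled business continuity for this VDC.
    continuity: Optional[ContinuityPolicy] = None


@dataclass
class Customer:
    customer_id: str
    vdcs: Dict[str, VDC] = field(default_factory=dict)

    def enable_continuity(self, vdc_id: str, policy: ContinuityPolicy) -> None:
        """Attach a continuity policy to one VDC, treating the VDC as a single entity."""
        self.vdcs[vdc_id].continuity = policy

    def protected_vdcs(self) -> List[str]:
        """Return the IDs of VDCs that have business continuity enabled."""
        return sorted(v.vdc_id for v in self.vdcs.values() if v.continuity is not None)


# Customer one at site one with four VDCs; continuity is enabled for two of them.
customer = Customer("104-1-1", {f"VDC{i}": VDC(f"VDC{i}") for i in range(1, 5)})
customer.enable_continuity("VDC1", ContinuityPolicy("102-2", rto_minutes=240, rpo_minutes=60))
customer.enable_continuity("VDC4", ContinuityPolicy("102-2", rto_minutes=240, rpo_minutes=60))
print(customer.protected_vdcs())  # ['VDC1', 'VDC4']
```

Keeping the policy on the VDC object, rather than on the customer or on individual VMs, mirrors the idea that the whole VDC configuration is the unit to which continuity services are applied.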

A typical VDC is shown in more detail in FIG. 2, and includes a number of virtual machines 201-1, 201-2, 201-3, . . . , 201-n. An example VM 201 has associated with it a network address such as an Internet Protocol (IP) address, an operating system 203, and one or more applications 204. The VMs 201 may be further interconnected into one or more Virtual Local Area Networks (VLANs) 210-1, 210-2.

Although FIG. 2 illustrates a single operating system 203 and single application 204 for each VM 201, it should be understood that multiple operating systems 203 and multiple applications 204 may be implemented in each VM 201.

The example VDC 110 also may have one or more virtual firewalls 211, virtual load balancers 220 and other services 230.

Virtual firewalls 211-1 and 211-2 may each have a number of associated policies 212-1-1, . . . 212-2-m.

Likewise, the virtual load balancers 220-1 and 220-2 also have associated policies 221-1-1 through 221-2-m.

The services 230 associated with each VDC 110 are selectively chosen by the customer and specified via cloud management 120. The service provider may choose to charge additional fees for activating these optional services. For example a given VDC 110 may have a backup policy 230-1, an operating system patching policy 230-2, and monitoring policies 230-3. Of interest herein, the customer can specify a business continuity (BC) policy 230-4 on a per-VDC basis.

FIG. 3 is a high-level conceptual diagram of an example cloud site one 102-1 and how the customers 104-1-1, 104-1-2, 104-1-3, and 104-1-n it is responsible for have specified business continuity services for each of their respective VDCs. In this example, customer one 104-1-1 of cloud site one 102-1 has specified that business continuity services should be enabled for his VDC 1 (110-1-1-1) and his VDC 4 (110-1-1-4) but not for his other VDC 2, VDC 3, and VDC n.

Similarly, customer two 104-1-2 has specified that business continuity services should be enabled for his VDC 2 (110-1-2-2) but not for any of his other VDC 1, VDC 3, VDC 4, . . . , or VDC m.

Information concerning which VDCs have business continuity services enabled is maintained in the cloud management information 120-1 associated with each site 102 as will be described in more detail below.

What is important to recognize here is that each customer 104 specifies, on a per VDC basis, and not on a lower level (such as a per-VM basis) or on a higher level (such as a per-customer basis), the enablement of business continuity services.

FIG. 4 is a diagram similar to that of FIG. 1 but illustrating that to provide the requested business continuity services there are replication services put in place between the various cloud sites 102. For example, a first replication service 400-1-2 operates to replicate information between site one 102-1 and site two 102-2, a second replication service 400-2-3 replicates data between site two 102-2 and site three 102-3, and a third replication service 400-1-3 replicates data between site one 102-1 and site three 102-3. These replication services can be implemented using any convenient replication technology, but operate independently of the customers 104 and other operations of the sites 102.

FIG. 5 shows the outcome of implementing these replication services. As mentioned above, customer one 104-1 has requested that his VDC 1 and VDC 4 have business continuity services enabled; likewise customer two 104-2 has requested that only his VDC 2 be subjected to the business continuity service. As a result, due to the operation of replication services 400-1-2 and 400-1-3, the VDC 1 and VDC 4 belonging to customer one 104-1 at cloud site one 102-1 will eventually be replicated at cloud site two 102-2. Similarly, another customer n 104-n of site one 102-1 will have his VDC 4 replicated at site two 102-2. Also apparent in FIG. 5 is that site one 102-1 has a customer two 104-1-2 that has requested business continuity services for his VDC 2, but that site three 102-3 be used for this. So the replication service 400-1-3 causes an image of his VDC 2 to be created at site three 102-3 as VDC image 110-3-1-2.

These replicated VDCs (110-2-1-1-1, 110-2-1-1-4 and 110-2-1-n-4) will exist as images (e.g., as replicas or dormant copies) and will not yet be in an active production mode; this fact is indicated by the use of dashed lines in FIG. 5. As with the prior figures, the VDCs shown with solid lines are in an active production mode.

It is therefore the case that while site one 102-1 serves as a production cloud for customer one, customer one also has access to one or more other sites, such as site two 102-2. These other sites serve as a business continuity cloud for customer one from which selected VDCs will be served in the event of a failure at site one. These other sites also serve as primary production clouds for other customers at the same time.

As a further option specified to cloud management 120-1, customers can specify at which site their respective business continuity elements are located; this option can be specified on the same user interface screen when the administrative user specifies the configuration of his corresponding business continuity services for each VDC.

Also at this cloud management configuration screen (to be shown in detail below), a customer can specify further aspects of the business continuity service such as Recovery Time Objective (RTO) and Recovery Point Objective (RPO), according to an available Service Level Agreement (SLA) entered into between the customer and the service provider. These SLA parameters will dictate how often the replication services 400 store and update the images of the various VDCs as well as how quickly they must be brought into production mode in the event of a disaster.

At a time illustrated in FIG. 6, site one 102-1 is now experiencing an outage 601 of some type that makes it unavailable to customer one 104-1 and other customers. When cloud site one 102-1 goes down, these associated customers may be notified of the outage in a manner that has been pre-arranged such as by e-mail, mobile text message, phone call, etc.

Initially, as indicated in FIG. 6, there is no change in the operational state of the other sites 102-2, 102-3. This may be because the other sites are operating normally, or because some time is permitted for the outage to resolve itself at the primary site.

However, eventually a point is reached in FIG. 7 where the VDCs for site one customer one 104-1-1 and site one customer n 104-1-n are brought online at cloud site two 102-2. Also at this time the VDCs for site one customer two are brought online at cloud site three 102-3. With these VDCs brought back in production mode, the customers 104-1, 104-2 are again sent a notice, this time that their VDCs (e.g., VDCs 110-2-1-1-1, 110-2-1-1-4, . . . , 110-2-n-1-4, 110-3-1-2-2) are brought back online at the respective alternate sites 102-2, 102-3.

In a next state, as shown in FIG. 8, a point might be reached where site one 102-1 again comes back online. At this point, site one 102-1 is not yet hosting any production VDCs as it does not yet have access to the information needed to bring them back online, and therefore customer one 104-1 and customer two 104-2 continue to have their VDCs hosted from the alternate locations 102-2, 102-3.
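The progression from replicated images to promoted production VDCs (FIGS. 5 through 8) can be sketched as a simple state change. This is a minimal illustration under stated assumptions: `VdcRecord`, `fail_over`, and the record names are hypothetical, and the sketch ignores replication, notification, and the network reconfiguration steps that an actual fail over would involve.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VdcRecord:
    """Hypothetical cloud management record for one VDC."""
    name: str
    home_site: str
    # Site holding the dormant replica; None if continuity is not enabled.
    replica_site: Optional[str] = None
    # Site currently serving the VDC in production mode; None if unavailable.
    active_site: Optional[str] = field(init=False, default=None)

    def __post_init__(self) -> None:
        self.active_site = self.home_site


def fail_over(records: List[VdcRecord], failed_site: str) -> List[str]:
    """Promote replicas of VDCs served from failed_site; return the promoted names."""
    promoted = []
    for rec in records:
        if rec.active_site != failed_site:
            continue  # VDCs served from other sites are unaffected
        if rec.replica_site is None:
            rec.active_site = None  # no replica exists, so the VDC stays offline
        else:
            rec.active_site = rec.replica_site  # dormant image becomes production
            promoted.append(rec.name)
    return promoted


# Records loosely following the example: customer one protects VDC 1 and VDC 4
# at site two, customer two protects VDC 2 at site three.
records = [
    VdcRecord("cust1-VDC1", "102-1", replica_site="102-2"),
    VdcRecord("cust1-VDC2", "102-1"),
    VdcRecord("cust1-VDC4", "102-1", replica_site="102-2"),
    VdcRecord("cust2-VDC2", "102-1", replica_site="102-3"),
]
promoted = fail_over(records, "102-1")
print(promoted)  # ['cust1-VDC1', 'cust1-VDC4', 'cust2-VDC2']
```

Note that nothing in this sketch moves a VDC back when the failed site returns, which matches the FIG. 8 state in which site one is online again but the promoted VDCs are still served from the alternate sites.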