US008805989B2
(12) United States Patent
Hemachandran et al.
(10) Patent No.: US 8,805,989 B2
(45) Date of Patent: Aug. 12, 2014
(54) BUSINESS CONTINUITY ON CLOUD ENTERPRISE DATA CENTERS
(75) Inventors: Satish K. Hemachandran, Newnan, GA (US); Christopher T. Sears, Avondale Estates, GA (US)
(73) Assignee: Sungard Availability Services, LP, Wayne, PA (US)
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.
(21) Appl. No.: 13/531,744
(22) Filed: Jun. 25, 2012
(65) Prior Publication Data
US 2013/0346573 A1    Dec. 26, 2013
(51) Int. Cl. G06F 15/173 (2006.01)
(52) U.S. Cl. USPC 709/223
(58) Field of Classification Search None
See application file for complete search history.
(56) References Cited
U.S. PATENT DOCUMENTS
7,349,961 B2      3/2008    Yamamoto
7,363,382 B1 *    4/2008    Bakke et al.        709/230
7,609,619 B2      10/2009   Naseh et al.
7,992,031 B2      8/2011    Chavda et al.
8,037,187 B2 *    10/2011   Dawson et al.       709/226
8,069,242 B2 *    11/2011   Hadar et al.        709/224
8,103,906 B1 *    1/2012    Alibakhsh et al.    714/13
8,209,415 B2 *    6/2012    Wei                 709/224
8,291,036 B2 *    10/2012   Poluri et al.       709/217
8,606,938 B1 *    12/2013   Chong et al.        709/228
8,607,242 B2 *    12/2013   Clarke              718/104
2006/0098790 A1   5/2006    Mendonca et al.
2009/0249284 A1*  10/2009   Antosz et al.       717/104
2009/0276771 A1   11/2009   Nickolov et al.
2010/0100879 A1   4/2010    Katiyar
2010/0131324 A1*  5/2010    Ferris              705/8
2010/0251329 A1*  9/2010    Wei                 726/1
2011/0022642 A1   1/2011    deMilo et al.
2011/0145413 A1*  6/2011    Dawson et al.       709/226
2011/0191296 A1   8/2011    Wall et al.
2011/0258481 A1   10/2011   Kern
2011/0289119 A1*  11/2011   Hu et al.           707/803
2012/0047107 A1*  2/2012    Doddavula et al.    707/620
2012/0110186 A1*  5/2012    Kapur et al.        709/226
2012/0137002 A1*  5/2012    Ferris et al.       709/226
2012/0215901 A1*  8/2012    Jog et al.          709/223
2012/0303740 A1*  11/2012   Ferris              709/217
2013/0132768 A1*  5/2013    Kulkarni            714/6.22
OTHER PUBLICATIONS
“HP Cloud Service Automation,” Hewlett-Packard Development Company, LP; 4 pages, created Apr. 2011.
* cited by examiner
Primary Examiner - Ninos Donabed
(74) Attorney, Agent, or Firm - Cesari and McKenna, LLP
(57) ABSTRACT
Business continuity services in a data processing environment where a service provider offers virtual data center services to numerous customers.
19 Claims, 16 Drawing Sheets
[Front-page drawing: Virtual Data Center 1 of Customer 1 at Cloud Site 1 (102-1). Labeled elements: Site-1 cloud management software with a customer 1 config/policy and VDC services data store (cloud configuration database); Firewall 1 and Firewall 2 with F/W policies 1 to n; Load Balancer 1 and Load Balancer 2 with L/B policies 1 to n; SVC policies 1 to n (e.g., backup, OS patching, monitoring); Guest VMs 1 to n, each with an OS, an application, and an IP address, attached to Network VLAN 1 and Network VLAN 2.]
[Drawing Sheets 11, 13, 14, and 16 of 16: figure text illegible in source.]
BUSINESS CONTINUITY ON CLOUD ENTERPRISE DATA CENTERS
BACKGROUND
The users of data processing equipment increasingly find the cloud-based infrastructure-as-a-service, or IaaS, model to be a flexible, easy, and affordable way to access the IT infrastructure they need. By moving servers and applications into logical units referred to as Virtual Data Centers (VDCs), which can be easily deployed with an IaaS provider, these customers are free to build out equipment that exactly fits their requirements at the outset, while having the option to adjust with changing future needs on a “pay as you go” basis. VDCs, like other cloud-based services, bring this promise of scalability to allow expanding servers and applications as business needs grow, without having to spend for unneeded hardware resources in advance. Additional benefits provided by professional level cloud service providers include access to equipment with superior performance, security, disaster recovery, and easy access to information technology consulting services.
Beyond simply moving hardware resources to a remote location accessible in the cloud via a network connection, multiple virtualization technologies provide further abstraction layers within VDCs that make them attractive. Server virtualization decouples physical hardware from the operating system and other information technology resources. Server virtualization allows multiple virtual machines with different operating systems and applications to run in isolation side by side on the same physical machine. A virtual machine is a software representation of a physical machine, specifying its own set of virtual hardware resources such as processors, memory, storage, network interfaces, and so forth upon which an operating system and applications are run.
SUMMARY
Increasingly, cloud service providers are offering additional value-added services to IaaS customers as a way of retaining existing customers and attracting new ones. Services being offered to customers include, for example, business continuity services. These services are optional, but subscribing to them may be beneficial to the use and operation of each individual VDC.
Subscribing to a business continuity service helps protect virtual machines operating in the customer’s VDC from interruptions in the availability of the service provider’s infrastructure.
With business continuity services enabled, the service provider can now respond to a disaster at the primary site, such as a network outage or power failure, by transitioning customer systems to run out of a secondary site, thereby minimizing the disruption to application availability. This transition, known as a “fail over”, can be done on a per-customer, per-VDC, or per-VM basis. By doing so, business continuity services are implemented in a more orderly fashion from the perspective of the service provider and the cloud customer.
In one embodiment, a data processing system is therefore provided for hosting virtual machines in a cloud computing environment. A primary production cloud site, operated from a first location, provides a set of virtual machines to a set of customers. A second production site operates at a second location. The second location also operates as a continuity production cloud for the set of customers. A cloud management service both (a) maintains configuration of the set of virtual machines as one or more Virtual Data Centers (VDCs); and (b) permits selective enablement of a business continuity service for failing over selected elements of the production cloud to the continuity production cloud on a per-customer, per-VDC, or per-VM basis.
In specific implementations, additional features may include:
virtual data processors, firewalls, load balancers, and virtual local area networks as elements of the VDCs;
a replication service that provides data replication between the first and second locations;
a network interface that provides secure communication between the production and continuity clouds, such that the first customer is prevented from accessing production or continuity clouds provided for other customers; and
if included, the replication service operating independently of the production cloud and the continuity cloud.
The cloud management service can further enable the first customer to specify Service Level Agreement (SLA) information including one or more of cost, Recovery Point Objective (RPO) and Recovery Time Objective (RTO).
The cloud management service can also further enable the first customer to specify which one of several possible data processing platforms at several locations is to provide the target production cloud for the first user.
Optionally, in the event of a disaster:
network addresses are re-assigned;
firewall rules are updated;
virtual private networks are created;
load-balancing options are configured;
virtual local area networks are created;
standby network interfaces are activated;
a recovery plan is executed for each continuity-enabled VDC to bring online VMs as specified by the user in an order of recovery; and
the recovered VMs are rebalanced.
Furthermore, in the event of a test, it is possible that:
virtual machine disks are cloned;
firewall rules are updated;
virtual private networks are created;
load-balancing options are configured;
virtual local area networks are created;
standby network interfaces are activated;
a recovery plan is executed for each continuity-enabled VDC to bring online VMs as specified by the user in an order of recovery; and
DNS updates are initiated for the recovered VMs.
BRIEF DESCRIPTION OF THE DRAWINGS
The foregoing will be apparent from the following more particular description of example embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating embodiments of the present invention.
FIG. 1 is a high level diagram of a service provider who offers enterprise cloud services with optional business continuity to a number of customers.
FIG. 2 illustrates a Virtual Data Center (VDC) in more detail.
FIG. 3 is a data structure maintained by the service provider to represent information concerning which VDCs have associated business continuity services enabled.
FIG. 4 illustrates replication services implemented between various sites.
FIG. 5 shows that a result of operating replication services is to store VDC replicas at failover sites.
FIG. 6 shows the state immediately after an outage at cloud site one.
FIG. 7 illustrates the state after the backup VDC images are promoted to production mode.
FIG. 8 illustrates an initial state once a cloud site is brought back online.
FIG. 9 is an intermediate state after the cloud site is brought online but where some of the VDCs are still serviced from the backup site.
FIG. 10 is the state after all VDCs are again active at the original site.
FIG. 11 shows the cloud management database in more detail.
FIG. 12 illustrates detail of how replication occurs between two sites.
FIG. 13 is an example user interface for specifying business continuity options.
FIG. 14 is a user interface for configuring a virtual data center in the enterprise cloud.
FIG. 15 illustrates a sequence of steps performed in the event of a disaster.
FIG. 16 illustrates a sequence of steps performed at time of test.
DETAILED DESCRIPTION
FIG. 1 is a high level diagram of a typical cloud based information technology (IT) environment 100 in which improved business continuity procedures and apparatus described herein may be used. It should be understood that this is but one example cloud environment and many others are possible.
Of particular interest here is that users can request and configure business continuity services for enterprise cloud(s) on a per-VDC or per-VM basis. The business continuity service allows for site-to-site recovery across multiple data centers that can be placed at geographically diverse sites. By selecting this business continuity service, the customer can be assured that in the event of a failure of the physical infrastructure at a given site, his enterprise cloud(s), on a VDC-by-VDC basis, will be brought back online at another site according to a service level agreement (SLA). For example, as part of enabling the business continuity service for certain VDCs, the customer may specify a Recovery Time Objective (RTO) and Recovery Point Objective (RPO).
The business continuity service is made available to customers on a per-VDC basis. Thus, after the customer specifies the configuration of his VDC (including any virtual machines, firewalls, load balancers, etc.) he can then treat his entire VDC configuration as a single entity to which continuity services are applied. The service provider is then entirely responsible for configuring the details of replicating the VDC, managing the data that specifies the replication, isolating that detail from the customer, and bringing the VDC back online at the time of a disaster. Examples of conditions under which a disaster might be declared could include a network outage, power outage, or complete site failure.
More particularly now, the cloud environment 100 illustrated in FIG. 1 is operated by a cloud service provider. The environment 100 includes equipment located at several different physical locations or sites 102. For example, a first cloud site 102-1 may be located in Philadelphia, Pennsylvania, USA, a second cloud site 102-2 may be located in London, England, UK and a third cloud site 102-3 may be located in Pune, Maharashtra, India.
An example cloud site 102 is responsible for hosting infrastructure equipment that provides cloud services to many different customers. In the case of cloud site 102-1 there are n customers 104-1-1 through 104-1-n. Cloud site 102-2 is servicing m customers 104-2-1, . . . , 104-2-m, and cloud site 102-3 hosts p customers 104-3-1, . . . , 104-3-p. It should be understood that there is often overlap in the customers here such that a given customer 104 can request cloud services from multiple sites 102-1, 102-2, and/or 102-3.
One type of cloud service provided is a Virtual Data Center (VDC) 110. An example VDC 110 may include many different types of virtual data processing resources such as virtual firewalls, virtual load balancers, virtual local area networks, virtual data processing machines, virtual memory, virtual disk storage, and software resources such as operating systems and applications. It should also be understood that although an example customer one 104-1-1 shown in FIG. 1 appears to have specified exactly four (4) VDCs (110-1-1-1, 110-1-1-2, . . . , 110-1-1-n), in reality any given customer 104-1-1, . . . , 104-3-p may have more or fewer than the four VDCs illustrated in FIG. 1.
The VDCs 110-1 served from site one 102-1 for customer one 104-1 serve as a production cloud for specific customers 104-1-1, 104-1-2, . . . , 104-1-n. Likewise, the VDCs 110-1 served from site two 102-2 for other customers 104-2-1, 104-2-2, . . . , 104-2-m serve as a production cloud for those other customers 104-2-1, 104-2-2, . . . , 104-2-m.
The VDCs 110 include virtual computing resources that are physically implemented at each particular service provider site 102 but are remotely accessed by the respective customers 104 over network connection(s). The service provider thus operates a number of physical machines at the various provider sites 102-1, 102-2, 102-3 including networking equipment such as switches, routers, and other types of internetworking equipment such as physical firewalls, and multiple physical data processors, storage servers, storage area networks, and other data processing machines as needed to provide the functions required by the VDCs 110. The details of configuration and operation of this physical data processing equipment are hidden from the customers 104; this data processing model is sometimes referred to as Infrastructure as a Service (IaaS).
An administrative user typically associated with each service customer 104 does however have access to a cloud management function 120 at one or more sites 102. The cloud management interface allows administrative users to interact with and configure the elements of their VDCs available to them from the cloud site 102 as well as additional services. Cloud management components, at least some of which are located at each cloud site 102, may also be provided from a central location (not shown in FIG. 1). For example, the service provider may allow each customer to use the cloud management interface 120 to specify policies or other services on a per-customer, per-VDC or per-virtual-machine basis. An example of a custom service policy might be a backup policy that schedules backups of all virtual machines (VMs) at a given time each day, for example at midnight Pacific Standard Time (PST).
As will be understood from the description below, the business continuity service offered by the service provider in the environment 100 allows each customer to specify optional services to be provided on a per-VDC 110 basis. One of the services of interest is a business continuity service that enables a selected VDC to be brought back online at an alternate site 102-2, 102-3 in the event that a selected cloud site 102-1 fails, goes off-line, or otherwise becomes unavailable.
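The per-VDC enablement described above can be sketched as a small data model. This is only an illustrative sketch, not the provider's actual schema: the class names `ContinuityPolicy` and `Vdc`, the function `vdcs_to_recover`, the choice of minutes as the RTO/RPO unit, and the sample values are all assumptions made for the example. The identifiers reuse the reference numerals of FIG. 1 purely as labels.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContinuityPolicy:
    """Hypothetical per-VDC business continuity settings."""
    failover_site: str   # alternate site chosen for this VDC, e.g. "102-2"
    rto_minutes: int     # Recovery Time Objective (unit is an assumption)
    rpo_minutes: int     # Recovery Point Objective (unit is an assumption)


@dataclass
class Vdc:
    vdc_id: str          # e.g. "110-1-1-1"
    customer_id: str     # e.g. "104-1-1"
    home_site: str       # production site, e.g. "102-1"
    continuity: Optional[ContinuityPolicy] = None  # None: service not enabled


def vdcs_to_recover(vdcs, failed_site):
    """Group the continuity-enabled VDCs of a failed site by failover site."""
    plan = {}
    for vdc in vdcs:
        if vdc.home_site == failed_site and vdc.continuity is not None:
            plan.setdefault(vdc.continuity.failover_site, []).append(vdc.vdc_id)
    return plan


# Made-up inventory: only some VDCs have the optional service enabled.
inventory = [
    Vdc("110-1-1-1", "104-1-1", "102-1", ContinuityPolicy("102-2", 240, 60)),
    Vdc("110-1-1-2", "104-1-1", "102-1"),
    Vdc("110-1-2-2", "104-1-2", "102-1", ContinuityPolicy("102-3", 480, 120)),
    Vdc("110-2-1-1", "104-2-1", "102-2"),
]
plan = vdcs_to_recover(inventory, "102-1")
```

Keeping the policy on the VDC record, rather than on the customer or on individual VMs, mirrors the idea that the whole VDC configuration is treated as a single entity for continuity purposes.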
A typical VDC is shown in more detail in FIG. 2, and includes a number of virtual machines 201-1, 201-2, 201-3, . . . , 201-n. An example VM 201 has associated with it a network address such as an Internet Protocol (IP) address, an operating system 203, and one or more applications 204. The VMs 201 may be further interconnected into one or more Virtual Local Area Networks (VLANs) 210-1, 210-2.
Although FIG. 2 illustrates a single operating system 203 and single application 204 for each VM 201, it should be understood that multiple operating systems 203 and multiple applications 204 may be implemented in each VM 201.
The example VDC 110 also may have one or more virtual firewalls 211, virtual load balancers 220 and other services 230. Virtual firewalls 211-1 and 211-2 may each have a number of associated policies 212-1-1, . . . 212-2-m.
Likewise, the virtual load balancers 220-1 and 220-2 also have associated policies 221-1-1 through 221-2-m.
The services 230 associated with each VDC 110 are selectively chosen by the customer and specified via cloud management 120. The service provider may choose to charge additional fees for activating these optional services. For example, a given VDC 110 may have a backup policy 230-1, an operating system patching policy 230-2, and monitoring policies 230-3. Of interest herein, the customer can specify a business continuity (BC) policy 230-4 on a per-VDC basis.
FIG. 3 is a high-level conceptual diagram of an example cloud site one 102-1 and how the customers 104-1-1, 104-1-2, 104-1-3, and 104-1-n it is responsible for have specified business continuity services for each of their respective VDCs. In this example, customer one 104-1-1 of cloud site one 102-1 has specified that business continuity services should be enabled for his VDC 1 (110-1-1-1) and his VDC 4 (110-1-1-4) but not for his other VDC 2, VDC 3, and VDC n.
Similarly, customer two 104-1-2 has specified that business continuity services should be enabled for his VDC 2 (110-1-2-2) but not for any of his other VDC 1, VDC 3, VDC 4, . . . , or VDC m.
Information concerning which VDCs have business continuity services enabled is maintained in the cloud management information 120-1 associated with each site 102 as will be described in more detail below.
What is important to recognize here is that each customer 104 specifies, on a per-VDC basis, and not on a lower level (such as a per-VM basis) or on a higher level (such as a per-customer basis), the enablement of business continuity services.
FIG. 4 is a diagram similar to that of FIG. 1 but illustrating that to provide the requested business continuity services there are replication services put in place between the various cloud sites 102. For example, a first replication service 400-1-2 operates to replicate information between site one 102-1 and site two 102-2, a second replication service 400-2-3 replicates data between site two 102-2 and site three 102-3, and a third replication service 400-1-3 replicates data between site one 102-1 and site three 102-3. These replication services can be implemented using any convenient replication technology, but operate independently of the customers 104 and other operations of the sites 102.
FIG. 5 shows the outcome of implementing these replication services. As mentioned above, customer one 104-1 has requested that his VDC 1 and VDC 4 have business continuity services enabled; likewise customer two 104-2 has requested that only his VDC 2 be subjected to the business continuity service. As a result, due to the operation of replication services 400-1-2 and 400-1-3, the VDC 1 and VDC 4 belonging to customer one 104-1 at cloud site one 102-1 will eventually be replicated at cloud site two 102-2. Similarly, another customer n 104-n of site one 102-1 will have his VDC 4 replicated at site two 102-2. Also apparent in FIG. 5 is that site one 102-1 has a customer two 104-1-2 that has requested business continuity services for his VDC 2, but that site three 102-3 be used for this. So the replication service 400-1-3 causes an image of his VDC 2 to be created at site three 102-3 as VDC image 110-3-1-2.
These replicated VDCs (110-2-1-1-1, 110-2-1-1-4 and 110-2-1-n-4) will exist as images (e.g., as replicas or dormant copies) and will not yet be in an active production mode; this fact is indicated by the use of dashed lines in FIG. 5. As with the prior figures, solid lines are used to indicate that those VDCs are in an active production mode.
It is therefore the case that while site one 102-1 serves as a production cloud for customer one, customer one also has access to one or more other sites, such as site two 102-2. These other sites serve as a business continuity cloud for customer one from which selected VDCs will be served in the event of a failure at site one. These other sites also serve as primary production clouds for other customers at the same time.
As a further option specified to cloud management 120-1, customers can specify at which site their respective business continuity elements are located; this option can be specified on the same user interface screen when the administrative user specifies the configuration of his corresponding business continuity services for each VDC.
Also at this cloud management configuration screen (to be shown in detail below), a customer can specify further aspects of the business continuity service such as Recovery Time Objective (RTO) and Recovery Point Objective (RPO), according to an available Service Level Agreement (SLA) entered into between the customer and the service provider. These SLA parameters will dictate how often the replication services 400 store and update the images of the various VDCs as well as how quickly they must be brought into production mode in the event of a disaster.
At a time illustrated in FIG. 6, site one 102-1 is now experiencing an outage 601 of some type that makes it unavailable to customer one 104-1 and other customers.
When cloud site one 102-1 goes down, these associated customers may be notified of the outage in a manner that has been pre-arranged such as by e-mail, mobile text message, phone call, etc.
Initially, as indicated in FIG. 6, there is no change in the operational state of the other sites 102-2, 102-3. This may be because the other sites are operating normally, or because some time is permitted for the outage to resolve itself at the primary site.
However, eventually a point is reached in FIG. 7 where the VDCs for site one customer one 104-1-1 and site one customer n 104-1-n are brought online at cloud site two 102-2. Also at this time the VDCs for site one customer two are brought online at cloud site three 102-3. With these VDCs brought back in production mode, the customers 104-1, 104-2 are again sent a notice, this time that their VDCs (e.g., VDCs 110-2-1-1-1, 110-2-1-1-4, . . . , 110-2-n-1-4, 110-3-1-2-2) are brought back online at the respective alternate sites 102-2, 102-3.
In a next state, as shown in FIG. 8, a point might be reached where site one 102-1 again comes back online. At this point, site one 102-1 is not yet hosting any production VDCs as it does not yet have access to the information needed to bring them back online, and therefore customer one 104-1 and customer two 104-2 continue to have their VDCs hosted from the alternate locations 102-2, 102-3.
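The progression of states in FIGS. 5 through 8 can be sketched as a simple state tracker. This is a minimal sketch under stated assumptions, not the patented implementation: the class `FailoverTracker`, its method names, the enum values, and the notice strings are hypothetical, and the replica identifiers are used only as labels taken from the description.

```python
from enum import Enum


class ReplicaState(Enum):
    DORMANT = "dormant"        # replicated image only (dashed lines in FIG. 5)
    PRODUCTION = "production"  # promoted and serving the customer


class SiteState(Enum):
    ONLINE = "online"
    OUTAGE = "outage"


class FailoverTracker:
    """Hypothetical tracker for the replicas of one primary site's VDCs."""

    def __init__(self, replicas):
        # replicas maps a replica VDC id to the alternate site holding it
        self.replica_site = dict(replicas)
        self.replica_state = {r: ReplicaState.DORMANT for r in replicas}
        self.primary = SiteState.ONLINE
        self.notices = []  # stands in for e-mail, text message, or phone call

    def declare_outage(self):
        # FIG. 6: the primary is down; alternate sites are unchanged for now
        self.primary = SiteState.OUTAGE
        self.notices.append("outage at primary site")

    def promote_replicas(self):
        # FIG. 7: dormant images are brought into production mode
        if self.primary is not SiteState.OUTAGE:
            raise RuntimeError("promotion only applies during an outage")
        for replica in self.replica_state:
            self.replica_state[replica] = ReplicaState.PRODUCTION
        self.notices.append("VDCs online at alternate sites")

    def primary_returns(self):
        # FIG. 8: the primary is reachable again but hosts no production VDCs
        self.primary = SiteState.ONLINE

    def serving_sites(self):
        """Return the replicas currently in production and their sites."""
        return {r: s for r, s in self.replica_site.items()
                if self.replica_state[r] is ReplicaState.PRODUCTION}


tracker = FailoverTracker({
    "110-2-1-1-1": "102-2",
    "110-2-1-1-4": "102-2",
    "110-3-1-2-2": "102-3",
})
tracker.declare_outage()
before_promotion = tracker.serving_sites()
tracker.promote_replicas()
tracker.primary_returns()
after_return = tracker.serving_sites()
```

In this sketch the return of the primary site deliberately leaves the replicas in production, matching the FIG. 8 state in which customers continue to be served from the alternate locations.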