US 20100333116A1

(19) United States
(12) Patent Application Publication
(10) Pub. No.: US 2010/0333116 A1
(43) Pub. Date: Dec. 30, 2010
Prahlad et al.

(54) CLOUD GATEWAY SYSTEM FOR MANAGING DATA STORAGE TO CLOUD STORAGE SITES

(76) Inventors: Anand Prahlad, Bangalore (IN); Marcus S. Muller, Tinton Falls, NJ (US); Rajiv Kottomtharayil, Marlboro, NJ (US); Srinivas Kavuri, Miyapur (IN); Parag Gokhale, Ocean, NJ (US); Manoj Vijayan, Marlboro, NJ (US)

Correspondence Address:
PERKINS COIE LLP
PATENT-SEA
P.O. BOX 1247
SEATTLE, WA 98111-1247 (US)

(21) Appl. No.: 12/751,953

(22) Filed: Mar. 31, 2010

Related U.S. Application Data

(60) Provisional application No. 61/299,313, filed on Jan. 28, 2010, provisional application No. 61/221,993, filed on Jun. 30, 2009, provisional application No. 61/223,695, filed on Jul. 7, 2009.

Publication Classification

(51) Int. Cl.
G06F 9/44 (2006.01)
G06F 15/167 (2006.01)
H04L 29/06 (2006.01)

(52) U.S. Cl. ........ 719/328; 709/216; 713/153

(57) ABSTRACT

Systems and methods are disclosed for performing data storage operations, including content-indexing, containerized deduplication, and policy-driven storage, within a cloud environment. The systems support a variety of clients and cloud storage sites that may connect to the system in a cloud environment that requires data transfer over wide area networks, such as the Internet, which may have appreciable latency and/or packet loss, using various network protocols, including HTTP and FTP. Methods are disclosed for content indexing data stored within a cloud environment to facilitate later searching, including collaborative searching. Methods are also disclosed for performing containerized deduplication to reduce the strain on a system namespace, effectuate cost savings, etc. Methods are disclosed for identifying suitable storage locations, including suitable cloud storage sites, for data files subject to a storage policy. Further, systems and methods for providing a cloud gateway and a scalable data object store within a cloud environment are disclosed, along with other features.
[Representative drawing: clients 130, each with a data agent 195; secondary storage computing devices 165; http/https/ftp protocols; cloud storage sites A (115A), B (115B), ... N (115N).]
Patent Application Publication    Dec. 30, 2010    Sheet 1 of 33
[FIG. 2 (Sheet 2 of 33): block diagram. A storage manager containing management, jobs, interface, and network agents and an index; clients 130, each with a data agent 195, a network agent, and a metabase; secondary storage computing devices 165, each with a network agent, a content indexing component 205, a deduplication module 299, and a media file system agent with a cloud storage submodule; deduplication databases 297; and storage devices 115 (e.g., cloud storage sites).]
[FIG. 3A (Sheet 3 of 33): flowchart. 340 Receive a file system request to write data to a target cloud storage site; 350 Add data associated with the received file system request to a buffer; Buffer full?; Convert file system requests to vendor-specific API calls; 380 Transmit buffer using vendor-specific API calls; Transmission successful?]
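The FIG. 3A write path (buffer incoming file system writes, convert them to vendor-specific API calls once the buffer is full, transmit, and check for success) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the disclosed implementation: `CloudWriteBuffer`, the `vendor_put` callable, the object naming scheme, the flush threshold, and the retry limit are all hypothetical, and a real gateway would call each vendor's own SDK.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical vendor adapter: uploads one object and returns True on success.
VendorPut = Callable[[str, bytes], bool]


@dataclass
class CloudWriteBuffer:
    """Buffers file system writes bound for one target cloud storage site."""

    vendor_put: VendorPut
    capacity: int = 4 * 1024 * 1024  # assumed flush threshold in bytes
    max_retries: int = 3             # assumed retry policy
    _chunks: List[bytes] = field(default_factory=list, init=False)
    _size: int = field(default=0, init=False)
    _sequence: int = field(default=0, init=False)

    def write(self, data: bytes) -> None:
        # Step 350: add data from the file system request to the buffer.
        self._chunks.append(data)
        self._size += len(data)
        # "Buffer full?": flush once the threshold is reached.
        if self._size >= self.capacity:
            self.flush()

    def flush(self) -> None:
        if not self._chunks:
            return
        # Convert the buffered requests into one vendor-specific call;
        # naming the object by a sequence number is an assumption.
        key = f"object-{self._sequence:08d}"
        payload = b"".join(self._chunks)
        for _ in range(self.max_retries):
            # Step 380: transmit the buffer; "Transmission successful?"
            if self.vendor_put(key, payload):
                self._chunks.clear()
                self._size = 0
                self._sequence += 1
                return
        # Buffered data is kept so that a later flush can try again.
        raise IOError(f"transmitting {key} failed after {self.max_retries} attempts")
```

Passing the vendor call in as a function keeps the buffering logic independent of any one cloud vendor, mirroring the figure's separation between buffering and the vendor-specific conversion step.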
[FIG. 3B (Sheet 4 of 33): flowchart 300. 310 Receive copy of an original data set from a file system; 320 Index data; 330 Deduplicate data and store deduplicated data on cloud storage; Return.]
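A minimal in-memory sketch of the FIG. 3B sequence (receive a copy, index it, then deduplicate and store it) follows. The whitespace tokenizer, the SHA-256 content hash, and the dict standing in for cloud storage are assumptions chosen for illustration; `process_copy` is a hypothetical name.

```python
import hashlib
from typing import Dict, List, Set


def process_copy(files: Dict[str, bytes],
                 content_index: Dict[str, Set[str]],
                 cloud_store: Dict[str, bytes]) -> List[str]:
    """Applies FIG. 3B steps 310-330 to an in-memory copy of a data set.

    Returns the content hashes that were newly written to cloud_store.
    """
    newly_stored = []
    for name, data in files.items():  # 310: the received copy
        # 320: index data (assumed: lowercase whitespace tokens -> file names).
        for token in data.decode("utf-8", errors="ignore").lower().split():
            content_index.setdefault(token, set()).add(name)
        # 330: deduplicate by content hash, then store only unseen content.
        digest = hashlib.sha256(data).hexdigest()
        if digest not in cloud_store:
            cloud_store[digest] = data
            newly_stored.append(digest)
    return newly_stored
```

Running it a second time on content that is already stored writes nothing new, because every hash is already present in `cloud_store`.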
[FIG. 4 (Sheet 5 of 33): block diagram 400. Clients 1, 2, ... n (130); a deduplication module 299 with identifier generation, criteria evaluation, and identifier comparison components; a deduplication database 297; and a storage device 115.]
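The three FIG. 4 components can be modeled as methods on one class. In this sketch, which is an assumption rather than the disclosed design, identifier generation uses SHA-256, criteria evaluation is a hypothetical minimum block size, and the deduplication database is a dict of reference counts; blocks that fail the criterion are written without consulting the database.

```python
import hashlib
from typing import Dict, List, Tuple


class DeduplicationModule:
    """Toy model of FIG. 4: identifier generation, criteria evaluation,
    and identifier comparison against a deduplication database."""

    def __init__(self, min_size: int = 4096) -> None:
        self.min_size = min_size            # hypothetical criterion
        self.database: Dict[str, int] = {}  # identifier -> reference count
        self.written: List[bytes] = []      # stand-in for storage device 115

    @staticmethod
    def generate_identifier(block: bytes) -> str:
        # Identifier generation (SHA-256 is an assumption).
        return hashlib.sha256(block).hexdigest()

    def meets_criteria(self, block: bytes) -> bool:
        # Criteria evaluation: only blocks of at least min_size bytes are deduplicated.
        return len(block) >= self.min_size

    def store(self, block: bytes) -> Tuple[str, bool]:
        """Returns (identifier, written); written is False for a duplicate."""
        ident = self.generate_identifier(block)
        if self.meets_criteria(block):
            # Identifier comparison against the deduplication database.
            if ident in self.database:
                self.database[ident] += 1
                return ident, False
            self.database[ident] = 1
        self.written.append(block)
        return ident, True
```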
[FIG. 5A (Sheet 6 of 33): data structure 500. A chunk folder 502 containing a metadata file 504, an N file 506, and an S file 508.]
[FIG. 5B: chunk folder 1 (502) containing metadata file 1 (504), N file 1 (506), and S file 1 (508); chunk folder 2 (510) containing metadata file 2 and N file 2.]
[FIG. 5C (Sheet 7 of 33): data structure 520. A sequence of stream header 1, stream data 1, stream header 2, stream data 2, ... stream header n, stream data n (stream headers 522, stream data 524).]
[FIG. 5D: elements C0, C1, C2, C3, ... Cn (reference numerals 542, 544).]
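FIG. 5C shows stream data segments, each preceded by a stream header. The header contents are not recoverable from the figure text, so the sketch below assumes a hypothetical header that holds only an 8-byte big-endian payload length, which is enough to write and re-parse the alternating header/data layout.

```python
import io
import struct
from typing import Iterator, List

# Hypothetical header: an 8-byte big-endian payload length.
HEADER = struct.Struct(">Q")


def write_stream(segments: List[bytes]) -> bytes:
    out = io.BytesIO()
    for data in segments:
        out.write(HEADER.pack(len(data)))  # stream header i
        out.write(data)                    # stream data i
    return out.getvalue()


def read_stream(blob: bytes) -> Iterator[bytes]:
    view = io.BytesIO(blob)
    while True:
        header = view.read(HEADER.size)
        if not header:
            return  # clean end of stream
        if len(header) != HEADER.size:
            raise ValueError("truncated stream header")
        (length,) = HEADER.unpack(header)
        data = view.read(length)
        if len(data) != length:
            raise ValueError("truncated stream data")
        yield data
```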
[FIG. 6 (Sheet 8 of 33): flowchart 600, "Prune". 605 Receive selection of an archive file to prune; 610 Perform lookup of archive file; 615 Does archive file have references out?; 620 Delete the references out; Do archive files referenced by references out have other references in?; 630 Prune archive files referenced by references out; 635 Does archive file have references in?; 640 Delete references in; 645/650 Prune archive file and add reference to archive file to deleted archive file table; 655 Add deleted time stamp to archive file table.]
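The sketch below implements one reading of the FIG. 6 pruning flow, in which every step runs in order and the decision boxes only skip work when there are no references. The `ArchiveFile` record, the two reference sets, the `deleted_table` list, and the cascade order are assumptions; the branch structure of the original flowchart is not fully recoverable from the extracted text.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class ArchiveFile:
    refs_out: Set[str] = field(default_factory=set)  # archive files this one points to
    refs_in: Set[str] = field(default_factory=set)   # archive files pointing to this one


def prune(name: str,
          archives: Dict[str, ArchiveFile],
          deleted_table: List[Tuple[str, float]],
          now: Optional[float] = None) -> None:
    stamp = time.time() if now is None else now
    af = archives.get(name)            # 610: look up the archive file
    if af is None:
        return                         # already pruned
    cascade = []
    for target in sorted(af.refs_out):        # 615/620: delete references out
        archives[target].refs_in.discard(name)
        # 625/630: a referenced file left with no other references in is pruned too
        if not archives[target].refs_in:
            cascade.append(target)
    af.refs_out.clear()
    for source in sorted(af.refs_in):         # 635/640: delete references in
        archives[source].refs_out.discard(name)
    af.refs_in.clear()
    del archives[name]                        # 650: prune the archive file
    deleted_table.append((name, stamp))       # 645/655: record it with a time stamp
    for target in cascade:
        prune(target, archives, deleted_table, stamp)
```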
[FIG. 8 (Sheet 11 of 33): data structures for two chunk folders. Chunk_001 contains a metadata file with non-SI data, a metadata index file (an index to the metadata file), container files 001 and 002 holding blocks B1, B2, B3, ... Bn, and a container index file with one 0/1 entry per block (e.g., 001_B1 = 0, 001_B2 = 1, 002_B1 = 1, 002_Bn = 0). Chunk_002 contains a metadata file with non-SI data and links, a metadata index file, a container file 001 holding blocks B1 ... Bn, and a container index file.]
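The container index file in FIG. 8 keeps one 0/1 entry per block, keyed like `001_B1`. The sketch below assumes, consistently with FIG. 9 (which zeroes an entry when a block's reference count reaches zero), that 1 marks a block still in use and 0 one that is not; the `Chunk` class and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Chunk:
    """One chunk folder (e.g. Chunk_001): container files plus their container index file."""

    containers: Dict[str, List[bytes]] = field(default_factory=dict)  # '001' -> [B1, B2, ...]
    container_index: Dict[str, int] = field(default_factory=dict)     # '001_B1' -> 0 or 1

    def append_block(self, container_id: str, data: bytes) -> str:
        blocks = self.containers.setdefault(container_id, [])
        blocks.append(data)
        key = f"{container_id}_B{len(blocks)}"
        self.container_index[key] = 1  # assumed: 1 = block still referenced
        return key

    def release(self, key: str) -> None:
        self.container_index[key] = 0  # assumed: 0 = block no longer referenced

    def live_blocks(self, container_id: str) -> List[str]:
        prefix = f"{container_id}_"
        return [k for k, v in self.container_index.items()
                if k.startswith(prefix) and v == 1]
```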
[FIG. 9 (Sheet 12 of 33): flowchart 900. 905 Receive selection of a job to be pruned; 907 Determine archive file, volume folders, and chunk folders corresponding to job; 910 Delete metadata files and metadata index files in chunk folders; 915 Access container file in chunk folders; 920 For the block in the container file, is its reference count in primary table equal to zero?; Set corresponding entry in container index file equal to zero; More blocks in container file?; 932 Are all entries in container index file corresponding to the container equal to zero?; 933 Delete container file; Free up space in container files; More container files in chunk folders?; Return.]
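Steps 915 through 933 of FIG. 9 can be sketched as a pass over the container index entries of a chunk folder. This sketch assumes the primary table's reference counts already exclude the pruned job's references, and it omits steps 905 through 910 (locating the job's folders and deleting metadata files); `prune_containers` and its argument shapes are hypothetical.

```python
from typing import Dict, List, Tuple


def prune_containers(container_index: Dict[str, Dict[str, int]],
                     ref_counts: Dict[Tuple[str, str], int]
                     ) -> Tuple[List[str], List[Tuple[str, str]]]:
    """container_index: container id -> {block id: 0/1 entry}.
    ref_counts: (container id, block id) -> reference count in the primary table.
    Returns (deleted container files, blocks freed inside surviving containers).
    """
    deleted, freed = [], []
    for cid in sorted(container_index):        # 915: access each container file
        entries = container_index[cid]
        newly_zeroed = []
        for block in sorted(entries):          # 920: reference count equal to zero?
            if entries[block] == 1 and ref_counts.get((cid, block), 0) == 0:
                entries[block] = 0             # set container index entry to zero
                newly_zeroed.append((cid, block))
        if all(v == 0 for v in entries.values()):  # 932: all entries zero?
            deleted.append(cid)                    # 933: delete container file
        else:
            freed.extend(newly_zeroed)             # free up space in container file
    for cid in deleted:
        del container_index[cid]
    return deleted, freed
```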
[FIG. 10 (Sheet 13 of 33): flowchart "Index content". 1010 Select copy of data set; 1020 Identify content; 1030 Update content index; Return.]
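A minimal sketch of FIG. 10 as an inverted index follows, assuming a lowercase alphanumeric tokenizer and a hypothetical (copy ID, file name) location format; neither is specified by the figure.

```python
import re
from collections import defaultdict
from typing import DefaultDict, Dict, Set, Tuple

Location = Tuple[str, str]  # hypothetical (copy id, file name) location


def index_copy(copy_id: str, files: Dict[str, str],
               index: DefaultDict[str, Set[Location]]) -> int:
    """For a selected copy (1010), identify content (1020) and update the index (1030).

    Returns the number of new (term, location) postings added.
    """
    added = 0
    for name, text in files.items():
        loc = (copy_id, name)
        for term in set(re.findall(r"[a-z0-9]+", text.lower())):  # 1020
            if loc not in index[term]:
                index[term].add(loc)                              # 1030
                added += 1
    return added
```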
[FIG. 12 (Sheet 15 of 33): flowchart 1200, "Restore". 1205 Receive selection of a file to restore; 1210 Determine archive file ID and offset; 1215 Access secondary storage; 1220 Open chunk folder; 1225 Parse metadata file; 1230 Determine location of file from metadata; 1235 Open file; 1240 Restore file; Return.]
[Sheet 16 of 33: index tables. Table 1300 has columns Archive File ID (1310), File ID (1320), and Offset (1330), with sample entries AF1; F1, F2, F3, ... FN; OF1, OF2, OF3, ... OFn. A second table (FIG. 13B) has columns Archive File ID (1350), Media, Chunk (1370), and Start (1380), with a sample key "C, J, Cycle, AF", media/chunk values (M1, C1), (M, C2), (M2, C3), and start values (AF1, OF1, Size), (AF1, OF2, Size), (AF1, OF3, Size).]
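The FIG. 12 restore steps can be sketched as two table lookups, in the spirit of the index tables on Sheet 16 (file ID to archive file ID and offset, then archive file ID to media and chunk), followed by a read through the chunk folder's metadata. The `ChunkFolder` layout, in which the metadata maps an offset to a (start, size) pair within concatenated data, is an assumption for illustration, as is the `restore` signature.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class ChunkFolder:
    # Assumed layout: the metadata file maps an offset recorded in the index
    # to a (start, size) pair within the chunk's concatenated file data.
    metadata: Dict[int, Tuple[int, int]]
    data: bytes


def restore(file_id: str,
            file_table: Dict[str, Tuple[str, int]],
            archive_table: Dict[str, Tuple[str, str]],
            storage: Dict[Tuple[str, str], ChunkFolder]) -> bytes:
    archive_id, offset = file_table[file_id]  # 1210: archive file ID and offset
    media, chunk = archive_table[archive_id]  # 1215: which media and chunk to access
    folder = storage[(media, chunk)]          # 1220: open chunk folder
    start, size = folder.metadata[offset]     # 1225/1230: parse metadata, locate file
    return folder.data[start:start + size]    # 1235/1240: read and restore the file
```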
[FIG. 14 (Sheet 17 of 33): flowchart "Search Index". 1410 Receive Search Request; 1420 Search Content Index; 1425 Generate Search Results; 1430 Get Next Search Result; Archived?; Retrieve Archived Content; More Results?; 1460 Provide Search Results; Return.]
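The FIG. 14 search loop, which checks each result for archived status and retrieves archived content when needed, might look like the sketch below. Treating a multi-word query as an AND of its terms, and passing `is_archived` and `retrieve` in as callables, are assumptions; the figure does not specify query semantics, and `search` is a hypothetical name.

```python
from typing import Callable, Dict, List, Set


def search(query: str,
           content_index: Dict[str, Set[str]],
           is_archived: Callable[[str], bool],
           retrieve: Callable[[str], str]) -> List[Dict[str, str]]:
    terms = query.lower().split()                  # 1410: receive search request
    if not terms:
        return []
    # 1420/1425: search the content index; items must match every term.
    hits = set.intersection(*(content_index.get(t, set()) for t in terms))
    results = []
    for item in sorted(hits):                      # 1430: get next search result
        result = {"item": item}
        if is_archived(item):                      # "Archived?"
            result["content"] = retrieve(item)     # retrieve archived content
        results.append(result)
    return results                                 # 1460: provide search results
```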