(19) United States

US 20100333116A1

(12) Patent Application Publication
(10) Pub. No.: US 2010/0333116 A1
(43) Pub. Date: Dec. 30, 2010

Prahlad et al.

(54) CLOUD GATEWAY SYSTEM FOR MANAGING DATA STORAGE TO CLOUD STORAGE SITES

(76) Inventors: Anand Prahlad, Bangalore (IN); Marcus S. Muller, Tinton Falls, NJ (US); Rajiv Kottomtharayil, Marlboro, NJ (US); Srinivas Kavuri, Miyapur (IN); Parag Gokhale, Ocean, NJ (US); Manoj Vijayan, Marlboro, NJ (US)

Correspondence Address:
PERKINS COIE LLP
PATENT-SEA
P.O. BOX 1247
SEATTLE, WA 98111-1247 (US)

(21) Appl. No.: 12/751,953

(22) Filed: Mar. 31, 2010

Related U.S. Application Data

(60) Provisional application No. 61/299,313, filed on Jan. 28, 2010, provisional application No. 61/221,993, filed on Jun. 30, 2009, provisional application No. 61/223,695, filed on Jul. 7, 2009.

Publication Classification

(51) Int. Cl.
G06F 9/44 (2006.01)
G06F 15/167 (2006.01)
H04L 29/06 (2006.01)

(52) U.S. Cl. 719/328; 709/216; 713/153

(57) ABSTRACT

Systems and methods are disclosed for performing data storage operations, including content-indexing, containerized deduplication, and policy-driven storage, within a cloud environment. The systems support a variety of clients and cloud storage sites that may connect to the system in a cloud environment that requires data transfer over wide area networks, such as the Internet, which may have appreciable latency and/or packet loss, using various network protocols, including HTTP and FTP. Methods are disclosed for content indexing data stored within a cloud environment to facilitate later searching, including collaborative searching. Methods are also disclosed for performing containerized deduplication to reduce the strain on a system namespace, effectuate cost savings, etc. Methods are disclosed for identifying suitable storage locations, including suitable cloud storage sites, for data files subject to a storage policy. Further, systems and methods for providing a cloud gateway and a scalable data object store within a cloud environment are disclosed, along with other features.

[Representative drawing: clients 130, each with a data agent 195; secondary storage computer devices 165; and cloud storage sites A, B, ..., N (115A, 115B, ..., 115N), connected using http/https/ftp protocols.]

Patent Application Publication    Dec. 30, 2010    Sheet 1 of 33

[Drawing on this sheet: text not recoverable from the extraction.]

Patent Application Publication    Dec. 30, 2010    Sheet 2 of 33    US 2010/0333116 A1

FIG. 2: block diagram. Legible labels:
- storage manager 105, with a jobs agent, management agent, interface agent, network agent, and index
- clients 130, each with a data agent 195, a network agent, and a client metabase
- secondary storage computing devices 165, each with a network agent, content indexing component 205, index, deduplication module 299, and media file system agent with cloud storage submodule (numerals 240 and 236)
- deduplication databases 297
- storage devices 115 (e.g., cloud storage site)
Other reference numerals appearing in the drawing (150, 211, 220, 225, 233, 235, 245, 247, 255, 260, 261, 270) could not be matched to labels in the extraction.

Patent Application Publication    Dec. 30, 2010    Sheet 3 of 33    US 2010/0333116 A1

FIG. 3A: flowchart. Legible steps (branch arrows and some step numerals not recoverable):
340  Receive a file system request to write data to a target cloud storage site
350  Add data associated with received file system request to buffer
     Buffer full?
     Convert file system requests to vendor-specific API calls
380  Transmit buffer using vendor-specific API calls
     Transmission successful?
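The steps legible in FIG. 3A suggest a buffer-then-transmit pattern. The following is a minimal sketch of that pattern only, not the publication's implementation. The class name, the `upload` callable (a stand-in for a vendor-specific API call), the byte-count threshold, and the retry-on-failure behavior are all assumptions, since the flowchart's branch arrows did not survive extraction.

```python
from typing import Callable


class BufferedCloudWriter:
    """Collects write requests in a buffer and transmits the buffer when it is full."""

    def __init__(self, upload: Callable[[bytes], bool], capacity: int, max_attempts: int = 3):
        self.upload = upload              # hypothetical stand-in for a vendor-specific API call
        self.capacity = capacity          # assumed "buffer full" threshold, in bytes
        self.max_attempts = max_attempts  # assumed retry limit; not stated in the figure
        self.buffer = bytearray()

    def write(self, data: bytes) -> None:
        """Add the data of one file system write request to the buffer."""
        self.buffer.extend(data)
        if len(self.buffer) >= self.capacity:   # "Buffer full?"
            self.flush()

    def flush(self) -> None:
        """Transmit the buffer; retry while the transmission is unsuccessful."""
        if not self.buffer:
            return
        payload = bytes(self.buffer)
        for _ in range(self.max_attempts):
            if self.upload(payload):            # "Transmission successful?"
                self.buffer.clear()
                return
        raise IOError("transmission failed after %d attempts" % self.max_attempts)
```

A caller would pass its own `upload` function, for example one that wraps an HTTP PUT to a storage vendor, and call `write` once per incoming request.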

Patent Application Publication    Dec. 30, 2010    Sheet 4 of 33    US 2010/0333116 A1

FIG. 3B: flowchart of process 300.
310  Receive copy of an original data set from a file system
320  Index data
330  Deduplicate data and store deduplicated data on cloud storage
     Return

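Patent Application Publication    Dec. 30, 2010    Sheet 5 of 33    US 2010/0333116 A1

FIG. 4: block diagram of system 400. Legible labels:
- clients 130 (Client 1, Client 2, ..., Client n)
- deduplication module 299, with components labeled "...tion", "... generation", "Identifier comparison", and "Criteria evaluation" (numerals 410, 420, 425, 430; the first two labels are truncated in the extraction)
- deduplication database 297
- storage device 115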
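The component labels in FIG. 4 (identifier generation and comparison against a deduplication database) can be illustrated with a small sketch. This is an illustration under stated assumptions, not the publication's method: SHA-256 as the identifier, the in-memory dictionaries, and the names `generate_identifier` and `DedupStore` are all hypothetical. The "criteria evaluation" component is omitted because its meaning cannot be recovered from the drawing.

```python
import hashlib
from typing import Dict, Tuple


def generate_identifier(block: bytes) -> str:
    """Identifier generation: here a SHA-256 digest (an assumed choice of hash)."""
    return hashlib.sha256(block).hexdigest()


class DedupStore:
    """In-memory stand-ins for a deduplication database and a storage device."""

    def __init__(self) -> None:
        self.database: Dict[str, int] = {}    # identifier -> reference count
        self.storage: Dict[str, bytes] = {}   # identifier -> the single stored copy

    def store(self, block: bytes) -> Tuple[str, bool]:
        """Return (identifier, True if a new copy was written to storage)."""
        ident = generate_identifier(block)
        if ident in self.database:            # identifier comparison: already stored
            self.database[ident] += 1
            return ident, False
        self.database[ident] = 1
        self.storage[ident] = block
        return ident, True
```

Storing the same block from two clients writes one copy and leaves a reference count of 2 for its identifier.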

Patent Application Publication    Dec. 30, 2010    Sheet 6 of 33    US 2010/0333116 A1

FIG. 5A (numeral 500): a chunk folder (502) containing a metadata file (504), an N file (506), and an S file (508).

FIG. 5B (numeral 510): chunk folder 1 (502) containing metadata file 1 (504), N file 1 (506), and S file 1 (508); and chunk folder 2 containing metadata file 2 (504) and N file 2 (506). The numeral for chunk folder 2 is garbled in the extraction.

Patent Application Publication    Dec. 30, 2010    Sheet 7 of 33    US 2010/0333116 A1

FIG. 5C (numeral 520): a sequence of stream headers (522) and stream data (524): Stream Header 1, Stream Data 1, Stream Header 2, Stream Data 2, ..., Stream Header n, Stream Data n.

FIG. 5D: entries labeled C0, C1, C2, C3, ..., Cn (542), each paired with a value (544). Legible values: C1 5, C2 10, C3 15, ..., Cn 65. The value for C0 is garbled in the extraction.

Patent Application Publication    Dec. 30, 2010    Sheet 8 of 33    US 2010/0333116 A1

FIG. 6: flowchart of process 600 (Prune). Legible steps (branch arrows not recoverable; the assignment of numerals 645 and 650 is approximate):
605  Receive selection of an archive file to prune
610  Perform lookup of archive file
615  Does archive file have references out?
620  Delete the references out
     [text truncated] archive files referenced by references out have other references in?
630  Prune archive files referenced by references out
635  Does archive file have references in?
640  Delete references in
645  Add reference to archive file to deleted archive file table
650  Prune archive file
655  Add deleted time stamp to archive file table

Patent Application Publication    Dec. 30, 2010    Sheet 11 of 33    US 2010/0333116 A1

FIG. 8: data structure diagram (numeral 802 also appears). Legible contents:

Chunk_001 (804)
- Metadata file (806): Non-SI data
- Metadata index file (808): Index to metadata file
- Container file 001 (810): B1, B2, B3, ..., Bn
- Container file 002 (811): B1, B2, B3, ..., Bn
- Container index file (812): 001_B1 0; 001_B2 1; ...; 002_B1 1; 002_Bn 0

Chunk_002 (805)
- Metadata file (807): Non-SI data, Link, Non-SI data, Link
- Metadata index file (809): Index to metadata file
- Container file 001 (813): B1, B2, B3, B4, B5, ..., Bn
- Container index file (814): entries garbled in the extraction ("0011_B1 001o_B2 ... 0011_Bn")

Patent Application Publication    Dec. 30, 2010    Sheet 12 of 33    US 2010/0333116 A1

FIG. 9: flowchart of process 900. Legible steps, in numeric order where numerals survive (branch arrows not recoverable; placement of unnumbered boxes is approximate; some box text is truncated):
905  Receive selection of a job to be pruned
907  Determine archive file, volume folders, and chunk folders corresponding to job
910  Delete metadata files and metadata index files in chunk folders
915  Access container file in chunk folders
920  For the block in the container file, is its reference count in primary table equal [text truncated]?
     Set corresponding entry in container index file equal to zero
     More blocks in container file?
932  [text truncated] entries in container index file corresponding to the container equal to zero?
933  Delete container file
     Free up space in container files
     More container files in chunk folders?
     Return
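The block-level portion of FIG. 9 can be sketched as follows. This is a sketch under assumptions rather than the disclosed procedure: it assumes the truncated test at step 920 reads "equal to zero", that a container index entry of zero marks an unreferenced block, and that a container file may be deleted once all of its entries are zero. The function name and the dictionary layout are hypothetical.

```python
from typing import Dict, List


def prune_chunk_folder(
    containers: Dict[str, List[str]],
    ref_counts: Dict[str, int],
    container_index: Dict[str, Dict[str, int]],
) -> List[str]:
    """Zero out index entries for unreferenced blocks; return deletable container files.

    containers:      container file name -> ids of the blocks it holds
    ref_counts:      block id -> reference count (stand-in for the primary table)
    container_index: container file name -> {block id -> 0 or 1}
    """
    deletable = []
    for name, blocks in containers.items():          # "More container files in chunk folders?"
        index = container_index[name]
        for block_id in blocks:                      # "More blocks in container file?"
            if ref_counts.get(block_id, 0) == 0:     # assumed reading of step 920
                index[block_id] = 0
        if all(index[b] == 0 for b in blocks):       # step 932
            deletable.append(name)                   # candidate for step 933
    return deletable
```

The function only reports which container files qualify; actually deleting files and freeing space is left to the caller.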

Patent Application Publication    Dec. 30, 2010    Sheet 13 of 33    US 2010/0333116 A1

FIG. 10: flowchart (Index content).
1010  Select copy of data set
1020  Identify content
1030  Update content index
      Return
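As a rough illustration of the "identify content" and "update content index" steps in FIG. 10, the sketch below maintains a simple inverted index. The tokenization rule, the function name, and the index layout (word to set of document ids) are assumptions made for the example; the drawing does not say how content is identified.

```python
import re
from typing import Dict, Set


def update_content_index(index: Dict[str, Set[str]], doc_id: str, text: str) -> None:
    """Identify words in `text` and record that `doc_id` contains each of them."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))   # assumed notion of "content"
    for word in words:
        index.setdefault(word, set()).add(doc_id)
```

Calling the function once per copied file builds an index that a later search can consult without reading the stored data again.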

Patent Application Publication    Dec. 30, 2010    Sheet 14 of 33    US 2010/0333116 A1

[Drawing on this sheet: text not recoverable from the extraction.]

Patent Application Publication    Dec. 30, 2010    Sheet 15 of 33    US 2010/0333116 A1

FIG. 12: flowchart of process 1200 (Restore).
1205  Receive selection of a file to restore
1210  Determine archive file ID and offset
1215  Access secondary storage
1220  Open chunk folder
1225  Parse metadata file
1230  Determine location of file from metadata
1235  Open file
1240  Restore file
      Return
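The later steps of FIG. 12 (open chunk folder, parse metadata file, determine location, open, restore) can be sketched against a local directory. Everything concrete here is an assumption for illustration: the JSON metadata format, the file names `metadata.json` and `N_file`, and the `container`/`offset`/`length` fields are hypothetical and not taken from the publication.

```python
import json
import tempfile
from pathlib import Path


def restore_file(chunk_folder: Path, file_id: str, destination: Path) -> bytes:
    """Restore one file from a chunk folder using its metadata file."""
    # Parse the metadata file (hypothetical JSON layout).
    metadata = json.loads((chunk_folder / "metadata.json").read_text())
    entry = metadata[file_id]                       # location of the file's data
    # Open the file that holds the data and read the recorded byte range.
    with open(chunk_folder / entry["container"], "rb") as container:
        container.seek(entry["offset"])
        data = container.read(entry["length"])
    destination.write_bytes(data)                   # restore
    return data


def demo_restore() -> bytes:
    """Build a throwaway chunk folder and restore file F1 from it."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp) / "chunk_001"
        folder.mkdir()
        (folder / "N_file").write_bytes(b"xxxxhelloyyyy")
        (folder / "metadata.json").write_text(
            json.dumps({"F1": {"container": "N_file", "offset": 4, "length": 5}})
        )
        return restore_file(folder, "F1", Path(tmp) / "F1.restored")
```

In a cloud setting the two reads would be remote fetches rather than local file opens; the local directory is used only to keep the example self-contained.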

Patent Application Publication    Dec. 30, 2010    Sheet 16 of 33    US 2010/0333116 A1

Table 1300:
Archive File ID (1310)    File ID (1320)    Offset (1330)
AF1                       F1                OF1
                          F2                OF2
                          F3                OF3
                          Fn                OFn

Table 1350 (column numerals 1370, 1380, and 1390 appear but cannot be matched to columns in the extraction):
Archive File ID     Media Chunk    Start
C, J, Cycle, AF     M1, C1         AF1, OF1, Size
                    M, C2          AF1, OF2, Size
                    M2, C3         AF1, OF3, Size

FIG. 13B
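One way to read the two tables is as a two-stage lookup: the first maps a file to an archive file ID and offset, and the second maps an archive file offset range to a media and chunk. The sketch below follows that reading, which is an assumption, and uses hypothetical numeric offsets and sizes in place of the symbolic OF1, OF2, OF3, and Size values; the function name and tuple layouts are also illustrative.

```python
from typing import List, Tuple

# Rows of the first table: (archive file ID, file ID, offset). Offsets are made up.
FileRow = Tuple[str, str, int]
# Rows of the second table: (media, chunk, archive file ID, start offset, size).
ChunkRow = Tuple[str, str, str, int, int]


def locate(file_id: str, file_rows: List[FileRow], chunk_rows: List[ChunkRow]) -> Tuple[str, int, str, str]:
    """Return (archive file ID, offset, media, chunk) for a file."""
    for archive_id, fid, offset in file_rows:
        if fid != file_id:
            continue
        for media, chunk, chunk_archive_id, start, size in chunk_rows:
            # The chunk whose offset range covers the file's offset holds the file.
            if chunk_archive_id == archive_id and start <= offset < start + size:
                return archive_id, offset, media, chunk
    raise KeyError(file_id)


file_rows = [("AF1", "F1", 0), ("AF1", "F2", 100), ("AF1", "F3", 250)]
chunk_rows = [("M1", "C1", "AF1", 0, 100), ("M1", "C2", "AF1", 100, 100), ("M2", "C3", "AF1", 200, 100)]
```

With these sample rows, `locate("F3", file_rows, chunk_rows)` resolves to media M2, chunk C3.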

Patent Application Publication    Dec. 30, 2010    Sheet 17 of 33    US 2010/0333116 A1

FIG. 14: flowchart (Search Index). Legible steps (branch arrows and some step numerals not recoverable):
1410  Receive Search Request
1420  Search Content Index
1425  Generate Search Results
1430  Get Next Search Result
      Archived?
      Retrieve Archived Content
      More Results?
1460  Provide Search Results
      Return
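The search loop in FIG. 14 can be sketched as below. It assumes an inverted index of word to document ids, treats a request as a set of words that must all match, and assumes the "Archived?" branch leads to "Retrieve Archived Content" on yes. The `retrieve` callable and the result shape are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple


def search(
    index: Dict[str, Set[str]],
    request: str,
    archived: Set[str],
    retrieve: Callable[[str], str],
) -> List[Tuple[str, Optional[str]]]:
    """Return (document id, retrieved content or None) for every matching document."""
    words = request.lower().split()
    if not words:
        return []
    # Search the content index: documents containing every requested word.
    matches = set.intersection(*(index.get(w, set()) for w in words))
    results = []
    for doc_id in sorted(matches):                        # "Get Next Search Result"
        content = retrieve(doc_id) if doc_id in archived else None   # "Archived?"
        results.append((doc_id, content))
    return results                                        # "Provide Search Results"
```

Documents that are not archived are returned with `None` as content, on the assumption that the caller can read them directly.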

Patent Application Publication    Dec. 30, 2010    Sheet 18 of 33

[Drawing on this sheet: text not recoverable from the extraction.]
