Data mining

Interpretation of Data from Permanent Downhole Gauges, Using Data Mining Approaches

Yang Liu and Roland N. Horne

ConocoPhillips, Stanford University

March 26, 2014

SUPRI-D

Table of Contents
• Introduction
• Kernelized Data Mining
• Application
• Performance Analysis
• Summary and Future Work


Permanent Downhole Gauges
• Long duration (up to 10 years), high frequency (1 measurement per 10 s)
• Huge volume (>1 GB/year)
• Noisy (< 1% noise)

M. Konopczynski and C. McKay, 2009


Pressure and Flow Rate

[Figure: Typical pressure (psi) and flow rate (STB/d) data versus time (hours), measured from a permanent downhole gauge (PDG).]

• Measurements were obtained and stored, but not used as much as they could be.
• Hidden in the PDG data, there should be useful information that would help to better describe the reservoir.


Problems
• The data are very noisy.
• Commonly only buildup (zero flow rate) data are utilized.
• Flow rate information is not incorporated in the interpretation.
• A predefined physical model is required in advance.

[Figure: Pressure (psi) and flow rate (STB/d) versus time (hours) over the full record, with zoomed views between about 319.4 and 321.4 hours.]


Not All “Noise” is Noise

[Figure: Pressure (p) and flow (q) traces marking two events: A, a flow event, and B, a noise event.]

Research Target (1)

[Figure: Pressure (psi) and flow rate (STB/d) versus time (hours); panels "Data" and "Cleaned Data".]

• How can the whole noisy data set be used for interpretation?


Research Target (2)

[Figure: Pressure (psi) and flow rate (STB/d) versus time (hours); panels "Data" and "Reservoir Model".]

• The reservoir model should be revealed without knowing it in advance.


Data Mining Overview
• Spam email classification
• Handwritten zip code identification
• Google Translate
• Data mining is the process of extracting patterns (reservoir models) from data (PDG data).


Work Flow

[Figure: Workflow diagram. PDG data, pressure $p(t)$ (psi) and flow rate $q(t)$ (STB/d) versus time (hours), are fed to the training process, which yields an understanding of the reservoir (in the data mining algorithm), $p = f(q, t)$. A new flow rate history $q(t)$ is then used to predict the pressure $p(t)$.]


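Key Elements of Data Mining

Model as pattern structure: the underlying functional form sought from data.

$$y^{\mathrm{pred}} = h_{\boldsymbol{\theta}}(\mathbf{x}) = \boldsymbol{\theta}^T \phi(\mathbf{x})$$

Cost function: judging the quality of fit of the model to the data.

$$J(\boldsymbol{\theta}) = \frac{1}{2} \sum_{i=1}^{m} \left( h_{\boldsymbol{\theta}}(\mathbf{x}^{(i)}) - y^{(i)} \right)^2$$

Optimization method: minimizing the cost function on the training data set.

Conjugate gradient optimization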
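The model and cost function above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature map `phi` (a constant term plus the raw input), the sample data, and the function names are hypothetical and do not come from the slides.

```python
import numpy as np

def phi(x):
    """Hypothetical feature map: prepend a constant 1 to the raw input."""
    return np.concatenate(([1.0], np.atleast_1d(x)))

def cost(theta, X, y):
    """J(theta) = 1/2 * sum_i (theta^T phi(x_i) - y_i)^2."""
    Phi = np.array([phi(x) for x in X])
    residual = Phi @ theta - y
    return 0.5 * residual @ residual

def gradient(theta, X, y):
    """Gradient of J with respect to theta: Phi^T (Phi theta - y)."""
    Phi = np.array([phi(x) for x in X])
    return Phi.T @ (Phi @ theta - y)

# Hypothetical noise-free data generated by y = 2 + 3 x.
X = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 + 3.0 * X
theta_exact = np.array([2.0, 3.0])
```

At `theta_exact` both the cost and its gradient vanish, which is the condition an optimizer such as conjugate gradient searches for.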


Linearity in Feature Space

$$y = a_0 + a_1 z + a_2 z^2 + a_3 z^3$$

• $y = \boldsymbol{\theta}^T \mathbf{x}$ with $\mathbf{x} = (1, z)^T$: only captures the linearity.
• $y = \boldsymbol{\theta}^T \phi(\mathbf{x})$ with $\phi(\mathbf{x}) = (1, z, z^2, z^3)^T$: captures the 3rd-order nonlinearity.
• Performance: the more nonlinearity you want to capture, the more complex $\phi(\mathbf{x})$ will be.
• How to run data mining in feature space without explicitly writing out $\phi(\mathbf{x})$?
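A small numerical sketch may help show why a model that is linear in the features can still capture a cubic trend. The coefficients, the sample grid, and the use of `numpy.linalg.lstsq` as the fitting routine are assumptions made for illustration only.

```python
import numpy as np

def phi_cubic(z):
    """Feature map phi(x) = (1, z, z^2, z^3)^T for a scalar input z."""
    return np.array([1.0, z, z**2, z**3])

z = np.linspace(-2.0, 2.0, 21)
a = np.array([1.0, -2.0, 0.5, 3.0])      # hypothetical coefficients a0..a3
y = a[0] + a[1] * z + a[2] * z**2 + a[3] * z**3

X_lin = np.column_stack([np.ones_like(z), z])   # x = (1, z)^T
Phi = np.array([phi_cubic(v) for v in z])       # phi(x) = (1, z, z^2, z^3)^T

# Both fits are linear least squares; only the input vector differs.
theta_lin, *_ = np.linalg.lstsq(X_lin, y, rcond=None)
theta_cub, *_ = np.linalg.lstsq(Phi, y, rcond=None)

err_lin = np.max(np.abs(X_lin @ theta_lin - y))
err_cub = np.max(np.abs(Phi @ theta_cub - y))
```

The fit on `(1, z)` leaves a large residual, while the fit on `phi_cubic` recovers the coefficients, at the price of a longer feature vector.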


Kernel Function
• Use a kernel function to separate the calculation from the learning process.
• Defined by an inner product: $K(\mathbf{x}, \mathbf{z}) = \phi(\mathbf{x})^T \phi(\mathbf{z})$.
• Calculate the kernel function using $\mathbf{x}$ and $\mathbf{z}$ only.
• Calculate in the low-dimensional space (the space of $\mathbf{x}$), while learning in the high-dimensional space (the space of $\phi(\mathbf{x})$).

Example: $K(\mathbf{x}, \mathbf{z}) = (\mathbf{x}^T \mathbf{z})^2$, with $\mathbf{x} = (x_1, x_2, x_3)^T$ and $\mathbf{z} = (z_1, z_2, z_3)^T$.

Proof:

$$
\begin{aligned}
K(\mathbf{x}, \mathbf{z}) &= (\mathbf{x}^T \mathbf{z})^2 = (x_1 z_1 + x_2 z_2 + x_3 z_3)^2 \\
&= x_1^2 z_1^2 + x_2^2 z_2^2 + x_3^2 z_3^2 + 2 x_1 z_1 x_2 z_2 + 2 x_1 z_1 x_3 z_3 + 2 x_2 z_2 x_3 z_3 \\
&= (x_1 x_1, x_1 x_2, x_1 x_3, x_2 x_1, x_2 x_2, x_2 x_3, x_3 x_1, x_3 x_2, x_3 x_3)\,(z_1 z_1, z_1 z_2, z_1 z_3, z_2 z_1, z_2 z_2, z_2 z_3, z_3 z_1, z_3 z_2, z_3 z_3)^T
\end{aligned}
$$

Considering $K(\mathbf{x}, \mathbf{z}) = \phi(\mathbf{x})^T \phi(\mathbf{z})$, we have

$$\phi(\mathbf{x}) = (x_1 x_1, x_1 x_2, x_1 x_3, x_2 x_1, x_2 x_2, x_2 x_3, x_3 x_1, x_3 x_2, x_3 x_3)^T$$

$$\phi(\mathbf{z}) = (z_1 z_1, z_1 z_2, z_1 z_3, z_2 z_1, z_2 z_2, z_2 z_3, z_3 z_1, z_3 z_2, z_3 z_3)^T$$

We are making the calculation in 3-dimensional space (the space of vector $\mathbf{x}$), while running the learning algorithm in the 9-dimensional space (the space of $\phi(\mathbf{x})$).
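The identity can be checked numerically. The sketch below evaluates the kernel once with a 3-dimensional inner product and once with the explicit 9-dimensional feature vectors; the two sample vectors are arbitrary values chosen for illustration.

```python
import numpy as np

def phi(x):
    """Explicit 9-dimensional feature map: all pairwise products x_a * x_b."""
    return np.outer(x, x).ravel()

def kernel(x, z):
    """Kernel K(x, z) = (x^T z)^2, evaluated in the original 3-D space."""
    return float(x @ z) ** 2

x = np.array([1.0, 2.0, 3.0])
z = np.array([4.0, 5.0, 6.0])

k_direct = kernel(x, z)              # uses only a 3-D inner product
k_feature = float(phi(x) @ phi(z))   # uses the 9-D feature vectors
```

Both routes give the same number, but `kernel` never builds the 9-dimensional vectors, which is the saving the slide refers to.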


Convolution Kernel
• Created in artificial linguistic study
• Word comparison: “move” vs. “remove”
  – $u_i \in$ {parts of "move"}: m, o, v, e, mo, ov, ve, mov, ove, move
  – $v_j \in$ {parts of "remove"}: r, e, m, o, v, re, em, mo, ov, ve, rem, emo, mov, ove, remo, emov, move, remov, emove, remove
  – Compare parts using a given kernel: $k(u_i, v_j)$
  – Sum all kernels of parts to form a new kernel: $K(\text{"move"}, \text{"remove"}) = \sum_{i=1}^{10} \sum_{j=1}^{20} k(u_i, v_j)$
• PDG data:
  – Decompose the pressure transient into a series of pressure responses to the previous flow rate change events.
  – Treat all previous points as breakpoints so that no breakpoint detection is needed.
  – The superposition is then reflected as the summation of simple kernels evaluated on all parts (hence superposition over kernelization).
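The word example can be reproduced with a short sketch. The slides do not say which base kernel $k(u_i, v_j)$ is used for the parts, so an exact-match kernel (1 if two parts are identical, 0 otherwise) is assumed here purely for illustration; the function names are hypothetical.

```python
def parts(word):
    """Return the distinct contiguous substrings of word."""
    n = len(word)
    return {word[a:b] for a in range(n) for b in range(a + 1, n + 1)}

def base_kernel(u, v):
    """Assumed base kernel k(u, v): 1 if the parts match exactly, else 0."""
    return 1.0 if u == v else 0.0

def convolution_kernel(s, t):
    """K(s, t) = sum over all parts u of s and v of t of k(u, v)."""
    return sum(base_kernel(u, v) for u in parts(s) for v in parts(t))

similarity = convolution_kernel("move", "remove")
```

With this base kernel the result simply counts the parts shared by the two words; every one of the 10 parts of "move" also occurs among the 20 parts of "remove".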


Convolution Kernel (Method D)

$$K(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}) = \sum_{k=1}^{i} \sum_{l=1}^{j} k(\mathbf{x}_k^{(i)}, \mathbf{x}_l^{(j)})$$

$$k(\mathbf{x}_k^{(i)}, \mathbf{x}_l^{(j)}) = (\mathbf{x}_k^{(i)})^T \mathbf{x}_l^{(j)}$$

$$\mathbf{x}_k^{(i)} = \left( q_k^{(i)},\; q_k^{(i)} \log t_k^{(i)},\; q_k^{(i)} t_k^{(i)},\; q_k^{(i)} / t_k^{(i)} \right)^T$$

where $q_k^{(i)}$ is the $k$-th flow rate change and $t_k^{(i)}$ is the time elapsed from the $k$-th flow rate change.

[Figure: Flow rate (STB/d) versus time (hours) for sample $\mathbf{x}^{(i)}$, marking $q_1^{(i)}$, $q_2^{(i)}$, $t_1^{(i)}$ and $t_2^{(i)}$.]
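A rough sketch of how such a convolution kernel might be evaluated on flow rate data follows. Several points are assumptions rather than the authors' implementation: each part is taken to have the features $(q, q \log t, q t, q / t)$, the $k$-th rate change is taken as the difference between consecutive samples (with zero rate before the first sample), and each change is taken to start at the previous sample time. The sample times and rates are made up.

```python
import numpy as np

def part_features(t, q, i):
    """Parts x_k^(i), k = 0..i, of sample i (0-based).

    Assumption: the k-th rate change is q[k] - q[k-1] (with q[-1] = 0) and
    takes effect at the previous sample time (0 for k = 0), so every elapsed
    time is strictly positive.
    """
    dq = np.diff(q[: i + 1], prepend=0.0)     # rate change of each part
    start = np.concatenate(([0.0], t[:i]))    # time at which each change begins
    dt = t[i] - start                         # elapsed time since each change
    return np.column_stack([dq, dq * np.log(dt), dq * dt, dq / dt])

def kernel_double_sum(t, q, i, j):
    """K(x_i, x_j) as the explicit double sum of linear part kernels."""
    Xi, Xj = part_features(t, q, i), part_features(t, q, j)
    return sum(float(xk @ xl) for xk in Xi for xl in Xj)

def gram_matrix(t, q):
    """Same kernel via sum_k sum_l x_k^T x_l = (sum_k x_k)^T (sum_l x_l)."""
    S = np.array([part_features(t, q, i).sum(axis=0) for i in range(len(t))])
    return S @ S.T

t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])        # hours, hypothetical
q = np.array([20.0, 20.0, 60.0, 60.0, 0.0])    # STB/d, hypothetical
K = gram_matrix(t, q)
```

Because the part kernel is a plain inner product, the double sum collapses to an inner product of summed part features, so the full matrix can be built without the nested loops. Treating every earlier sample as a possible breakpoint is harmless here: samples with no rate change contribute zero-valued parts.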

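Training and Prediction
• Training: $\mathbf{K} \boldsymbol{\beta} = \mathbf{y}$, where $\mathbf{K}_{i,j} = K(\mathbf{x}^{(i)}, \mathbf{x}^{(j)})$ and $\boldsymbol{\beta} = (\beta_1, \beta_2, \ldots, \beta_m)^T$
• Prediction: $y^{\mathrm{pred}} = \sum_{i=1}^{m} \beta_i K(\mathbf{x}^{(i)}, \mathbf{x}^{\mathrm{pred}})$
• Essentially, we are using $K(\mathbf{x}^{(i)}, \cdot)$ as the basis to describe the PDG data space.
• The conjugate gradient method is used to solve $\mathbf{K} \boldsymbol{\beta} = \mathbf{y}$.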
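The training and prediction steps can be sketched end to end on a toy problem. Everything specific here is an assumption for illustration: the stand-in kernel $(1 + \mathbf{x}^T \mathbf{z})^2$ replaces the convolution kernel, the three training points are invented, and the conjugate gradient routine is a textbook version written out by hand.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A by conjugate gradients."""
    x = np.zeros_like(b)
    r = b - A @ x            # residual
    p = r.copy()             # search direction
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def kernel(x, z):
    """Assumed stand-in kernel: inhomogeneous quadratic (1 + x^T z)^2."""
    return (1.0 + x @ z) ** 2

# Hypothetical training set: three scalar inputs, targets from a quadratic.
X_train = np.array([[-1.0], [0.0], [1.0]])
y_train = 1.0 + 2.0 * X_train[:, 0] - 3.0 * X_train[:, 0] ** 2

# Training: build K and solve K beta = y.
K = np.array([[kernel(a, b) for b in X_train] for a in X_train])
beta = conjugate_gradient(K, y_train)

def predict(x_new):
    """y_pred = sum_i beta_i K(x_i, x_new)."""
    return sum(b_i * kernel(x_i, x_new) for b_i, x_i in zip(beta, X_train))

y_new = predict(np.array([0.5]))
```

The prediction at 0.5 matches the underlying quadratic because that function lies in the span of the three kernel basis functions. Real PDG data would give a much larger and possibly ill-conditioned matrix, where some form of regularization may be needed.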


Simple Synthetic Case: Training Data

[Figure: ΔPressure (psi) and flow rate (STB/d) versus time (hours), 0 to 200 hours; true data and noisy data.]

Wellbore effect, skin, radial flow, and constant pressure boundary are all present.


Simple Synthetic Case: Test Data

[Figure: ΔPressure (psi) and flow rate (STB/d) versus time (hours) for the test flow rate histories. Panel note: constant pressure boundary with wellbore effect and skin factor.]

Simple Synthetic Case: Test Results

[Figure: ΔPressure (psi) versus time (hours), true data compared with Method D, including a log-log plot of pressure change and pressure derivative. Panel note: constant pressure boundary with wellbore effect and skin factor.]

Complicated Synthetic Case

Wellbore effect, skin, radial flow, and constant pressure boundary are all present.

[Figure: ΔPressure (psi) and flow rate (STB/d) versus time (hours), true data and noisy data, with Method D results compared against the true data, including a log-log plot of pressure change and pressure derivative.]

Semi-Real Case

Wellbore effect, skin, radial flow, and constant pressure boundary are all present.

[Figure: ΔPressure (psi) and flow rate (STB/d) versus time (hours), true data and noisy data, with Method D results compared against the true data, including a log-log plot of pressure change and pressure derivative.]

Real Case (I)

[Figure: ΔPressure (psi) and flow rate (STB/d) versus time (days), real data compared with Method D, including a log-log plot of the Method D pressure change and pressure derivative.]

Real Case (II)

[Figure: ΔPressure (psi) and flow rate (STB/d) versus time (days), real data compared with Method D, including a log-log plot of the Method D pressure change and pressure derivative.]


Performance Analysis
• Outliers
• Aberrant segments
• Incomplete production history
• Unknown initial pressure
• Sampling frequency
• Evolution of learning


Outliers

6% of pressure and 3% of flow rate training data are outliers; 3% artificial normal noise everywhere.

[Figure: ΔPressure (psi) and flow rate (STB/d) versus time (hours), true data and noisy data, with Method D results compared against the true data, including a log-log plot of pressure change and pressure derivative.]

Aberrant Segments (I)

8% of pressure training data lie in an aberrant segment; 3% artificial normal noise everywhere.

[Figure: ΔPressure (psi) and flow rate (STB/d) versus time (hours), true data and noisy data with the aberrant segment marked, and Method D results compared against the true data, including a log-log plot of pressure change and pressure derivative.]

Aberrant Segments (II)

8% of pressure training data lie in an aberrant segment; 3% artificial normal noise everywhere.

[Figure: ΔPressure (psi) and flow rate (STB/d) versus time (hours), true data and noisy data with the aberrant segment marked, and Method D results compared against the true data, including a log-log plot of pressure change and pressure derivative.]

Aberrant Segments (III)

8% of pressure training data lie in an aberrant segment (removed); 3% artificial normal noise everywhere.

[Figure: ΔPressure (psi) and flow rate (STB/d) versus time (hours), true data and noisy data with the removed data marked, and Method D results compared against the true data, including a log-log plot of pressure change and pressure derivative.]

Flow Rate (STB/d) Pressure (psi)

Incomplete Production History 500 0 -500

True Data Noisy Data

-1000 -1500

March 26, 2014

0

50

100

150

200

250

300

250

300

Time (hours) 100

True Data Noisy Data 50

0 0

50

100

150

200

Time (hours)

SUPRI-D

34

Effective Rate Correction

$$q_{\mathrm{eff}} = \frac{Q^{(1)}}{t^{(1)}}, \qquad q_1 = q_{\mathrm{eff}}$$

[Figure: Pressure (psi) versus time (hours), true data compared with Method D, each with a log-log plot of pressure and pressure derivative; one pair of panels without effective rate correction and one pair with effective rate correction.]

Table of Contents
• Introduction
• Kernelized Data Mining
• Application
• Performance Analysis
• Scalability
• Simple Kernel Methods
• Summary and Future Work


Summary
• The nonparametric data mining algorithms do not require any physical model or mathematical assumption ahead of time. As long as the algorithm puts all the possible features in the input vector, the data mining methods will find a suitable functional form in the high-dimensional space and thereby discover the most appropriate reservoir model in the process.
• The data mining approaches cointerpret the pressure and flow rate data simultaneously by utilizing both the pressure and the flow rate in the training process. This provides a way to make use of flow rate measurements that can now be recorded with some modern PDG tools.
• The data mining methods do not require constant flow rate, and utilize the whole set of variable flow rate PDG data. The procedures also work well in the absence of any shut-in periods, which are generally required for present analysis techniques.
• The data mining methods tolerate noise in the data set naturally. No denoising procedure is required in advance, and in fact the procedure provides a robust way of removing noise without removing the reservoir response signal.
• The data mining method offers the flexibility to extract the reservoir model using other measured data from the PDG, e.g. temperature.
• Among Methods A-D, Method D is preferable because of its good accuracy and stability, and because it does not require breakpoints to be known in advance.


Future Work
• Unsynchronized data
• Other data sources, e.g. temperature
• Multiphase flow
• Multiple wells
• Parallel computation


Questions

SPE-147298 (2011)
SPE-166440 (2013)
SPE-165346 (2013)

Thank you
