Bitcoin UTXO Lifespan Prediction - CS 229

Robert Konrad

Stephen Pinto

Motivating

Beginning Transaction: Alice pays Bob 1 BTC.

Bob’s Unspent Transaction Output (UTXO): Only Bob has the key to spend this.

Ending Transaction: Bob spends his UTXO.

Can we predict how long Bob will wait before spending his UTXO? If so, we could identify possible fraud, predict trading volume, predict price volatility, model individual spending habits, etc.

Figure: SVM with radial basis kernel.

Figure: Histogram of Collected UTXO Lifespan Data (x-axis: Hours).

90% of collected lifespans are less than 315 hours, but the tail of the distribution extends to 4.5 years.

Collecting

All Bitcoin transaction history is publicly available, and online services provide an interface to access data and statistics. There are hundreds of millions of spent TXOs ready to serve as training and testing data. A script queries one such service's API at a polite rate to steadily collect data from this effectively infinite spigot.
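A minimal sketch of polite, rate-limited collection is given below. The poster does not name the endpoint, the request rate, or the record format, so `fetch`, `min_interval_s`, and `lifespan_hours` are hypothetical names chosen for illustration, and the one-second default is only a placeholder.

```python
import time
from typing import Callable, Iterable, Iterator


def polite_collect(
    tx_ids: Iterable[str],
    fetch: Callable[[str], dict],
    min_interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[dict]:
    """Yield one record per transaction id, spacing requests out in time.

    `fetch` is a hypothetical stand-in for one call to a block-explorer API;
    the real service and its rate limit are not given in the source.
    """
    last_request = None
    for tx_id in tx_ids:
        if last_request is not None:
            # Wait out whatever remains of the minimum interval.
            wait = min_interval_s - (clock() - last_request)
            if wait > 0:
                sleep(wait)
        last_request = clock()
        yield fetch(tx_id)


def lifespan_hours(created_unix: int, spent_unix: int) -> float:
    """Assumed label: hours between the beginning and ending transactions."""
    return (spent_unix - created_unix) / 3600.0
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting or network access.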

Classifying

Labelling

Lifespans are grouped into three ℓ1 clusters (less than three months, more than 1.5 years, and in between), and each cluster is fitted to an Exponential, Laplace, or Gaussian distribution, whichever is most likely.
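The clustering and maximum-likelihood selection described above might look roughly like the sketch below. It assumes that "ℓ1 clusters" means k-means with an ℓ1 objective (k-medians on the scalar lifespans) and that each candidate distribution is fitted by its closed-form maximum-likelihood estimate on the raw cluster values. Neither detail is confirmed by the poster, and the function names are illustrative.

```python
import math
from statistics import mean, median


def assign(xs, centers):
    """Put each value in the group of its nearest center (absolute distance)."""
    groups = [[] for _ in centers]
    for x in xs:
        j = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
        groups[j].append(x)
    return groups


def k_medians_1d(xs, centers, iters=50):
    """L1 clustering of scalars: the median minimizes summed absolute error."""
    centers = list(centers)
    for _ in range(iters):
        groups = assign(xs, centers)
        updated = [median(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return centers, assign(xs, centers)


def best_fit(xs):
    """Return (name, log-likelihoods) for the most likely of three families."""
    n = len(xs)
    fits = {}
    m = mean(xs)
    if min(xs) >= 0 and m > 0:
        # Exponential MLE: rate = 1 / mean.
        fits["exponential"] = -n * (math.log(m) + 1.0)
    med = median(xs)
    b = mean(abs(x - med) for x in xs)
    if b > 0:
        # Laplace MLE: location = median, scale = mean absolute deviation.
        fits["laplace"] = -n * (math.log(2.0 * b) + 1.0)
    var = mean((x - m) ** 2 for x in xs)
    if var > 0:
        # Gaussian MLE: mean and (biased) variance.
        fits["gaussian"] = -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)
    return max(fits, key=fits.get), fits
```

Because each family's log-likelihood at its own maximum-likelihood estimate has a closed form, choosing the most likely family reduces to comparing three numbers per cluster.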

The Features:
• Weekday of beginning transaction
• Unix time of beginning transaction
• Number of inputs to beginning transaction
• Number of outputs from beginning transaction
• Value of TXO
• Transaction volume on creation date
• BTC to USD exchange rate on creation date
• 2nd-order polynomial fit parameters of previous week’s BTC to USD exchange rate
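Two of the features listed above are derived rather than read directly from the transaction. The sketch below shows one way to compute them. It assumes the weekday is taken in UTC and that the previous week's exchange rate is sampled once per day at t = 0, 1, ..., n-1; the poster specifies neither, and the function names are illustrative.

```python
from datetime import datetime, timezone


def weekday_utc(unix_time: int) -> int:
    """Weekday of a Unix timestamp in UTC (Monday = 0 ... Sunday = 6)."""
    return datetime.fromtimestamp(unix_time, tz=timezone.utc).weekday()


def quadratic_fit(ys):
    """Least-squares (a, b, c) for y ~ a*t^2 + b*t + c with t = 0..n-1.

    Needs at least three points; seven daily prices is the assumed input.
    """
    ts = range(len(ys))
    s = [sum(t ** k for t in ts) for k in range(5)]                  # sum of t^k
    r = [sum(y * t ** k for t, y in zip(ts, ys)) for k in range(3)]  # sum of y*t^k
    # Augmented normal equations for the basis [t^2, t, 1].
    A = [[s[4], s[3], s[2], r[2]],
         [s[3], s[2], s[1], r[1]],
         [s[2], s[1], s[0], r[0]]]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(3):
            if row != col:
                f = A[row][col] / A[col][col]
                A[row] = [x - f * p for x, p in zip(A[row], A[col])]
    return tuple(A[i][3] / A[i][i] for i in range(3))
```

The three returned coefficients would then be appended to the feature vector as a compact summary of the recent price level, trend, and curvature.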

Figure: Maximum Likelihood Distribution for Each Cluster.

Figure: Numerical CDF of Fitted Distribution (x-axis: Hours).

The overall distribution defines ten equally probable ranges to serve as the labels for SVM classification.

Figure: Equal Probability Subdomains on CDF.

Validating

Features                                  Accuracy
Weekday of creation                       70.64%
+ Unix time of creation                   92.45%
+ Number of inputs to Begin TXN           92.64%
+ Number of outputs from Begin TXN        92.66%
+ Value of TXO in BTC                     92.66%
+ TXN volume on day of creation           93.62%
+ Polynomial of BTC/USD rate              93.62%
+ BTC/USD rate on day of creation         93.59%
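The ten equally probable label ranges and the accuracy figures above could be computed along the following lines. The sketch swaps in empirical quantiles of the collected lifespans for the numerical CDF of the fitted distribution that the poster uses, purely to stay self-contained. All function names are illustrative.

```python
from bisect import bisect_right


def equal_probability_edges(samples, n_bins=10):
    """Interior cut points that split the samples into n_bins equal-mass ranges."""
    xs = sorted(samples)
    return [xs[(len(xs) * k) // n_bins] for k in range(1, n_bins)]


def label_of(lifespan, edges):
    """Index (0 .. len(edges)) of the range that contains `lifespan`."""
    return bisect_right(edges, lifespan)


def accuracy(predicted, actual):
    """Fraction of predicted labels that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

Because every range carries the same probability mass, a classifier that always guesses one label would score about 10%, which gives the accuracies in the table a baseline.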

Figure: Error vs training set size. Curves: training error and testing error. x-axis: m (training set size); y-axis: Error Rate.

Collect Data → Curate Features → k-Means ℓ1 Cluster → Fit Individual Distributions → Define n Equal Probability Spaces → Optimize γ and C → Ready to run on new data!
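The poster does not say how γ (the width parameter of the radial basis kernel) and C (the soft-margin penalty) were optimized. One common approach is an exhaustive search over a logarithmic grid, sketched below as an assumption. `validation_error` is a hypothetical callable that would train the SVM with the given pair and return its held-out error.

```python
from itertools import product


def log_grid(low_exp, high_exp, base=10.0):
    """Values base**low_exp ... base**high_exp for inclusive integer exponents."""
    return [base ** e for e in range(low_exp, high_exp + 1)]


def grid_search(validation_error, gammas, cs):
    """Return (gamma, C, error) with the lowest validation error on the grid.

    `validation_error(gamma, C)` is a hypothetical stand-in for training an
    RBF-kernel SVM with those hyperparameters and scoring it on held-out data.
    """
    best = None
    for gamma, c in product(gammas, cs):
        err = validation_error(gamma, c)
        if best is None or err < best[2]:
            best = (gamma, c, err)
    return best
```

A coarse grid such as `log_grid(-4, 1)` for γ and `log_grid(-1, 3)` for C is a typical starting point, and the search can be repeated on a finer grid around the best pair.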