Bitcoin UTXO Lifespan Prediction
Robert Konrad, Stephen Pinto
[email protected], [email protected]

Motivating
[Figure: Beginning Transaction: Alice pays Bob 1 BTC, creating Bob's Unspent Transaction Output (UTXO). Only Bob has the key to spend this. Ending Transaction: Bob spends his UTXO.]
Can we predict how long Bob will wait before spending his UTXO? If so, we could identify possible fraud, predict trading volume, predict price volatility, model individual spending habits, etc.

[Figure: Histogram of Collected UTXO Lifespan Data (three panels at different scales; x-axis: Hours)]
90% of collected lifespans are shorter than 315 hours, but the distribution tails out to 4.5 years.
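A UTXO's lifespan is the time between the beginning (creating) and ending (spending) transactions. The sketch below shows how a statistic like the 90% figure can be computed from collected timestamps; it assumes Unix-second timestamps and uses the nearest-rank percentile, and the function names are illustrative, not taken from the authors' code.

```python
import math
from typing import List

HOURS_PER_YEAR = 365 * 24  # ignores leap days; a rough conversion only


def lifespan_hours(begin_unix: int, end_unix: int) -> float:
    """Hours between the creating and the spending transaction (Unix seconds in)."""
    if end_unix < begin_unix:
        raise ValueError("spending transaction precedes creating transaction")
    return (end_unix - begin_unix) / 3600.0


def nearest_rank_percentile(values: List[float], q: float) -> float:
    """Smallest observed value v such that at least q% of values are <= v."""
    if not values or not 0 < q <= 100:
        raise ValueError("need non-empty values and 0 < q <= 100")
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]
```

For scale, 4.5 years is about 4.5 × 8760 = 39,420 hours, which matches the histogram's x-axis extending to roughly 4 × 10^4 hours.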
Collecting
All Bitcoin transaction history is publicly available, and several online services provide an interface to access the data and statistics. There are hundreds of millions of spent TXOs ready to serve as training and testing data. A script queries one such service's API at a polite rate to steadily collect information from this effectively infinite spigot.
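A minimal sketch of a "polite rate" collection loop, assuming a caller-supplied `fetch(tx_id)` function (hypothetical; the service and endpoint the authors used are not named here) and a 10-second minimum gap between requests chosen only for illustration. The clock and sleep functions are injectable so the pacing logic can be checked without waiting.

```python
import time
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def polite_collect(
    fetch: Callable[[str], T],
    tx_ids: Iterable[str],
    min_interval_s: float = 10.0,  # illustrative value, not the authors' setting
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[T]:
    """Call fetch(tx_id) for each id, keeping at least min_interval_s between calls."""
    results: List[T] = []
    last_call: Optional[float] = None
    for tx_id in tx_ids:
        if last_call is not None:
            # Wait only for whatever part of the interval has not already elapsed.
            wait = min_interval_s - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        results.append(fetch(tx_id))
    return results
```

In practice `fetch` would wrap an HTTP request and parse the response; keeping it separate leaves the rate limit independent of any particular API.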
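The Features:
• Weekday of beginning transaction
• Unix time of beginning transaction
• Number of inputs to beginning transaction
• Number of outputs from beginning transaction
• Value of TXO
• Transaction volume on creation date
• BTC to USD exchange rate on creation date
• 2nd-order polynomial fit parameters of the previous week's BTC to USD exchange rate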
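Two of the listed features need a little computation. The sketch below assumes weekdays are taken in UTC and that the previous week's exchange rate arrives as daily values indexed t = 0, 1, ..., 6; both conventions and the function names are illustrative assumptions, not the authors' code. It fits y ≈ a·t² + b·t + c by ordinary least squares, solving the 3×3 normal equations, and returns (a, b, c) as the three polynomial features.

```python
from datetime import datetime, timezone
from typing import Sequence, Tuple


def weekday_utc(unix_time: int) -> int:
    """Weekday of a Unix timestamp in UTC (Monday = 0 ... Sunday = 6)."""
    return datetime.fromtimestamp(unix_time, tz=timezone.utc).weekday()


def quadratic_fit(ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit y ~ a*t^2 + b*t + c with t = 0, 1, ..., len(ys) - 1."""
    n = len(ys)
    if n < 3:
        raise ValueError("need at least 3 points for a 2nd-order fit")
    ts = range(n)
    s = [sum(t ** k for t in ts) for k in range(5)]                # S_k = sum t^k
    r = [sum((t ** k) * y for t, y in zip(ts, ys)) for k in range(3)]  # R_k = sum t^k y
    # Normal equations for unknowns (c, b, a): row k is sum_j S_{k+j} * coef_j = R_k.
    m = [[s[k], s[k + 1], s[k + 2], r[k]] for k in range(3)]
    # Forward elimination with partial pivoting on the 3x4 augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, 3):
            f = m[row][col] / m[col][col]
            for j in range(col, 4):
                m[row][j] -= f * m[col][j]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for row in (2, 1, 0):
        acc = m[row][3] - sum(m[row][j] * coef[j] for j in range(row + 1, 3))
        coef[row] = acc / m[row][row]
    c, b, a = coef
    return a, b, c
```

A fit over a fixed window gives every TXO the same three-number summary of recent price movement: level (c), trend (b), and curvature (a).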
Labelling
Three ℓ1 clusters (less than three months, more than 1.5 years, and in between) are each fitted to an Exponential, Laplace, or Gaussian distribution, whichever is most likely.
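k-means under the ℓ1 distance is usually called k-medians. Each point goes to the center with the smallest absolute distance, and each center moves to the median of its members, because the median minimizes the sum of absolute deviations. This sketch assumes the clustering runs on the one-dimensional lifespans (in hours) and that the caller supplies initial centers; both are assumptions, since the poster reports only the resulting ranges.

```python
from statistics import median
from typing import List, Tuple


def _assign(xs: List[float], centers: List[float]) -> List[int]:
    """Index of the nearest center under |x - c| for every point."""
    return [min(range(len(centers)), key=lambda k: abs(x - centers[k])) for x in xs]


def k_medians_1d(
    xs: List[float], centers: List[float], iters: int = 100
) -> Tuple[List[float], List[int]]:
    """k-means with the l1 distance on 1-D data; returns (centers, labels)."""
    centers = list(centers)
    for _ in range(iters):
        labels = _assign(xs, centers)
        new_centers = []
        for k in range(len(centers)):
            members = [x for x, lab in zip(xs, labels) if lab == k]
            # The l1-optimal center is the median; an empty cluster keeps its center.
            new_centers.append(median(members) if members else centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, _assign(xs, centers)
```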
[Figure: Maximum Likelihood Distribution for Each Cluster (x-axis: Hours)]
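A minimal sketch of choosing "whichever is most likely" for one cluster. It assumes the criterion is the raw maximized log-likelihood, with no penalty for the Gaussian and Laplace having two parameters to the Exponential's one, and an unshifted Exponential supported on x ≥ 0. All three families have closed-form maximum-likelihood estimates, so no numerical optimizer is needed. The functions assume at least two distinct, positive lifespans; the names are illustrative.

```python
import math
from statistics import fmean, median
from typing import Callable, Dict, List, Tuple

Fit = Tuple[Dict[str, float], float]  # (parameters, maximized log-likelihood)


def fit_exponential(xs: List[float]) -> Fit:
    """Exponential MLE: rate = 1 / mean. Negative data is impossible under this model."""
    if any(x < 0 for x in xs):
        return {"rate": float("nan")}, -math.inf
    rate = 1.0 / fmean(xs)
    return {"rate": rate}, len(xs) * math.log(rate) - rate * sum(xs)


def fit_laplace(xs: List[float]) -> Fit:
    """Laplace MLE: location = median, scale = mean absolute deviation from it."""
    loc = median(xs)
    abs_dev = [abs(x - loc) for x in xs]
    scale = fmean(abs_dev)
    ll = -len(xs) * math.log(2 * scale) - sum(abs_dev) / scale
    return {"loc": loc, "scale": scale}, ll


def fit_gaussian(xs: List[float]) -> Fit:
    """Gaussian MLE: sample mean and the biased (1/n) variance."""
    mu = fmean(xs)
    sq = [(x - mu) ** 2 for x in xs]
    var = fmean(sq)
    ll = -0.5 * len(xs) * math.log(2 * math.pi * var) - sum(sq) / (2 * var)
    return {"mean": mu, "std": math.sqrt(var)}, ll


FITTERS: Dict[str, Callable[[List[float]], Fit]] = {
    "exponential": fit_exponential,
    "laplace": fit_laplace,
    "gaussian": fit_gaussian,
}


def most_likely_distribution(xs: List[float]) -> Tuple[str, Dict[str, float], float]:
    """Fit every candidate family and keep the one with the highest log-likelihood."""
    fits = {name: fitter(xs) for name, fitter in FITTERS.items()}
    best = max(fits, key=lambda name: fits[name][1])
    params, ll = fits[best]
    return best, params, ll
```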
[Figure: Numerical CDF of Fitted Distribution]
The overall fitted distribution is divided into ten equally probable ranges of lifespan, which serve as the labels for SVM classification.
[Figure: Equal Probability Subdomains on CDF (x-axis: Hours)]
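One way to turn the fitted distributions into ten equally probable labels, assuming the overall distribution is a mixture of the per-cluster fits weighted by each cluster's share of UTXOs (an assumption; the poster shows only a numerical CDF). Each boundary x_i solves CDF(x_i) = i/10 by bisection, and a lifespan's label is the number of boundaries at or below it. Helper names and the default search interval are illustrative.

```python
import bisect
import math
from typing import Callable, List, Sequence, Tuple

Cdf = Callable[[float], float]


def exponential_cdf(rate: float) -> Cdf:
    return lambda x: (1.0 - math.exp(-rate * x)) if x > 0 else 0.0


def laplace_cdf(loc: float, scale: float) -> Cdf:
    def cdf(x: float) -> float:
        z = (x - loc) / scale
        # Both branches keep the exponent <= 0, so exp never overflows.
        return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)
    return cdf


def gaussian_cdf(mean: float, std: float) -> Cdf:
    return lambda x: 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def mixture_cdf(components: Sequence[Tuple[float, Cdf]]) -> Cdf:
    """Weighted sum of component CDFs; the weights are assumed to sum to 1."""
    return lambda x: sum(w * cdf(x) for w, cdf in components)


def equal_probability_boundaries(
    cdf: Cdf, n: int = 10, lo: float = 0.0, hi: float = 1e6, iters: int = 200
) -> List[float]:
    """The n - 1 lifespans x_i with cdf(x_i) = i / n, found by bisection on [lo, hi]."""
    bounds = []
    for i in range(1, n):
        p = i / n
        a, b = lo, hi
        if not cdf(a) <= p <= cdf(b):
            raise ValueError(f"[lo, hi] does not bracket quantile {p}")
        for _ in range(iters):
            mid = 0.5 * (a + b)
            if cdf(mid) < p:
                a = mid
            else:
                b = mid
        bounds.append(0.5 * (a + b))
    return bounds


def lifespan_label(hours: float, bounds: List[float]) -> int:
    """Class label in 0..n-1: how many boundaries lie at or below this lifespan."""
    return bisect.bisect_right(bounds, hours)
```

As a check, for a single Exponential with rate 1 the fifth boundary is the median, ln 2 ≈ 0.693.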
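Classifying
SVM with radial basis kernel

Validating
Accuracy as features are added (each row adds one feature to those above it):
Features                                | Accuracy
Weekday of creation                     | 70.64%
+ Unix time of creation                 | 92.45%
+ Number of inputs to Begin TXN         | 92.64%
+ Number of outputs from Begin TXN      | 92.66%
+ Value of TXO in BTC                   | 92.66%
+ TXN volume on day of creation         | 93.62%
+ Polynomial of BTC/USD rate            | 93.62%
+ BTC/USD rate on day of creation       | 93.59%

[Figure: Error vs training set size (x-axis: m (training set size); y-axis: Error Rate; curves: training error, testing error)]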
Collect Data → Curate Features → k-Means ℓ1 Cluster → Fit Indiv. Distributions → Define n Equal Prob. Spaces → Optimize γ and C → Ready to run on new data!
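For reference, the standard binary soft-margin SVM with a radial basis kernel is written below, with m training examples as in the error plot. γ sets the kernel width (larger γ gives a more local decision boundary), and C trades margin width against training errors ξᵢ. Ten labels require a multi-class scheme such as one-vs-one or one-vs-rest, which the poster does not specify. "Optimize γ and C" is commonly done by a grid search scored on held-out error; that procedure is an assumption here, not a detail from the source.

```latex
K(\mathbf{x}, \mathbf{z}) = \exp\!\left(-\gamma \,\lVert \mathbf{x} - \mathbf{z} \rVert_2^2\right), \qquad \gamma > 0

\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \; \tfrac{1}{2}\lVert \mathbf{w} \rVert_2^2 + C \sum_{i=1}^{m} \xi_i
\quad \text{s.t.} \quad y_i\left(\mathbf{w}^\top \phi(\mathbf{x}_i) + b\right) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, \dots, m

f(\mathbf{x}) = \operatorname{sign}\!\left(\sum_{i=1}^{m} \alpha_i y_i \, K(\mathbf{x}_i, \mathbf{x}) + b\right)
```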