Ryuichi YAMAMOTO, Shinji SAKO, Tadashi KITAMURA (Nagoya Institute of Technology, JAPAN)
1. Motivation
2. The Idea
How to improve the robustness against polyphonic signals?
■ Audio-to-score alignment has been well studied, but the robustness of its ON-LINE algorithms still has room for improvement.
■ The most popular dynamic programming approaches, such as DTW and HMMs, rely on OFF-LINE optimization algorithms, so some approximation is required in ON-LINE settings.
A delayed decision estimates the past score position more reliably than an instant decision.
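The difference can be illustrated with a minimal sketch. It assumes a toy frame-level left-to-right model in place of the poster's SCRFs. An instant decision commits to the best state at the current frame. A delayed decision runs the same Viterbi recursion, backtracks D frames, and commits to the state at frame t - D. The function `online_viterbi`, the transition probabilities, and the observation values are hypothetical illustrations, not the authors' implementation.

```python
import math

NEG_INF = -math.inf


def online_viterbi(log_obs, log_trans, delay):
    """Return (instant, delayed) decisions of an online Viterbi pass.

    log_obs[t][j]   : log-likelihood of frame t at score position j (toy input)
    log_trans[i][j] : log transition probability from position i to j
    delay           : number of frames D by which a decision is postponed
    """
    n_frames, n_states = len(log_obs), len(log_obs[0])
    # Assumption: the alignment starts at score position 0.
    score = [log_obs[0][0]] + [NEG_INF] * (n_states - 1)
    backptr = []  # backptr[t - 1][j]: best predecessor of state j at frame t
    instant, delayed = [], []
    for t in range(n_frames):
        if t > 0:
            new_score, ptr = [], []
            for j in range(n_states):
                i_best = max(range(n_states), key=lambda i: score[i] + log_trans[i][j])
                new_score.append(score[i_best] + log_trans[i_best][j] + log_obs[t][j])
                ptr.append(i_best)
            score = new_score
            backptr.append(ptr)
        best = max(range(n_states), key=lambda j: score[j])
        # Instant decision: commit to the best state at the current frame t.
        instant.append(best)
        # Delayed decision: backtrack D frames from the current best state and
        # commit to frame t - D, which has already seen D frames of later evidence.
        if t >= delay:
            state = best
            for k in range(t, t - delay, -1):
                state = backptr[k - 1][state]
            delayed.append(state)
    return instant, delayed


def log(p):
    return math.log(p) if p > 0 else NEG_INF


# Toy example: 3 score positions; each frame either stays or advances by one.
N = 3
log_trans = [[log(0.5) if j in (i, i + 1) else NEG_INF for j in range(N)] for i in range(N)]
log_trans[N - 1][N - 1] = 0.0  # the last position can only stay
# Frame 1 is a spurious observation (e.g. an interfering note) favouring position 1.
obs = [[0.8, 0.1, 0.1], [0.3, 0.6, 0.1], [0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.8, 0.1]]
log_obs = [[log(p) for p in frame] for frame in obs]

instant, delayed = online_viterbi(log_obs, log_trans, delay=1)
print(instant)  # [0, 1, 0, 1, 1]  (jumps ahead at frame 1, then back)
print(delayed)  # [0, 0, 0, 1]     (decisions for frames 0..3, monotonic)
```

With D = 0 the two decision sequences coincide. Any D > 0 trades latency for decisions that the later frames have confirmed.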
Anticipating the future score position using tempo will also be useful.
How to find both score position and tempo?
We propose a delayed decision and anticipation framework that jointly optimizes score position and tempo, based on Segmental Conditional Random Fields (SCRFs) and a Linear Dynamical System (LDS).
Problem: ON-LINE approximations sometimes give worse results than OFF-LINE optimization.
4. SCRFs with Tempo Model
Conditional Random Fields (CRFs)
[Figure: model based on SCRFs; tempo dynamics based on LDS]
6. Experiments
■ We test delay times of 0 s, 0.5 s, 1.0 s, and 1.5 s. The result with a delay time of 0 s serves as the baseline.
■ Databases:
  Classic (MAPS): piano, 60 recordings (about 4 hours)
  Jazz (RWC): multiple instruments including percussion, 50 recordings (about 3.5 hours)
■ Delayed-decision Viterbi algorithm to find a reliable past score position
■ Tempo induction from the estimated path (Kalman filter)
■ Future anticipation using the past-decided score position and tempo
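Steps 2 and 3 can be sketched as follows. The sketch assumes a constant-tempo LDS whose state is (score position in beats, tempo in beats/s) and whose observations are the past-decided positions. A standard Kalman filter estimates the tempo. The current position is then anticipated as the decided position plus D times the estimated tempo. The noise variances, initial values, and the names `kalman_tempo` and `anticipate` are hypothetical, not the parameters or code used in the poster.

```python
def kalman_tempo(times, positions, tempo0=1.0, q_pos=1e-4, q_tempo=1e-3, r=1e-2):
    """Estimate (score position, tempo) with a constant-tempo linear dynamical system.

    times     : times (s) at which past score positions were decided
    positions : the decided score positions (beats), e.g. from a delayed-decision Viterbi
    tempo0    : initial tempo guess (beats/s); q_pos, q_tempo, r are noise variances
    Returns the filtered (position, tempo) at times[-1].
    """
    p, v = positions[0], tempo0
    P = [[r, 0.0], [0.0, 1.0]]  # state covariance: the initial tempo is very uncertain
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        # Predict: x <- F x with F = [[1, dt], [0, 1]]; P <- F P F^T + Q.
        p = p + dt * v
        P = [[P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + q_pos,
              P[0][1] + dt * P[1][1]],
             [P[1][0] + dt * P[1][1], P[1][1] + q_tempo]]
        # Update with the decided position (observation matrix H = [1, 0]).
        innovation = positions[k] - p
        s = P[0][0] + r
        k0, k1 = P[0][0] / s, P[1][0] / s  # Kalman gain
        p, v = p + k0 * innovation, v + k1 * innovation
        P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
             [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
    return p, v


def anticipate(position, tempo, delay):
    """Anticipate the current score position `delay` seconds after the decided one."""
    return position + delay * tempo


# Decided positions of a performance at a steady 2 beats/s (120 BPM), delay D = 0.5 s.
times = [0.0, 0.5, 1.0, 1.5, 2.0]
positions = [0.0, 1.0, 2.0, 3.0, 4.0]
p, v = kalman_tempo(times, positions)
# The tempo estimate moves from the 1.0 guess toward 2.0, and the anticipated
# current position (at t = 2.5 s) lands close to beat 5.
print(round(v, 2), round(anticipate(p, v, 0.5), 2))
```

Because the filter runs on positions that the delayed decision has already confirmed, a single noisy frame perturbs the tempo much less than it would with instant decisions.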
7. Results and Discussion
EFFECT OF DELAY-TIME (CLASSIC)
ONSET RECOGNITION RATE (%) by DELAY-TIME
Tolerance   0.0 s   0.5 s          1.0 s          1.5 s
δ=300 ms    80.3    91.7 (+11.4)   92.8 (+1.1)    92.5 (-0.3)
δ=100 ms    46.9    78.8 (+31.9)   73.2 (-5.6)    65.8 (-7.4)
Proposed (OFF-LINE): 96.4 (δ=300 ms)
(Values in parentheses: change from the previous delay time.)
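The poster does not spell out how the onset recognition rate is computed. A common score-following convention is assumed in this sketch: an onset counts as recognized when the time reported for that score event lies within ±δ of the reference onset. The function name and the toy data are hypothetical.

```python
def onset_recognition_rate(reference, estimated, tolerance):
    """Percentage of score onsets aligned within +/- tolerance seconds.

    reference : true onset times (s), one per score event
    estimated : times (s) reported by the aligner for the same events (None if missed)
    tolerance : the tolerance delta in seconds, e.g. 0.1 or 0.3
    """
    hits = sum(
        1
        for ref, est in zip(reference, estimated)
        if est is not None and abs(est - ref) <= tolerance
    )
    return 100.0 * hits / len(reference)


reference = [0.0, 0.5, 1.0, 1.5]
estimated = [0.02, 0.68, 1.05, None]
print(onset_recognition_rate(reference, estimated, 0.1))  # 50.0
print(onset_recognition_rate(reference, estimated, 0.3))  # 75.0
```

A smaller δ is stricter, which is why the δ=100 ms rows sit well below the δ=300 ms rows for the same delay time.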
[Figure labels: Tempo model, Duration control]
EFFECT OF DELAY-TIME (JAZZ)
ONSET RECOGNITION RATE (%) by DELAY-TIME
Tolerance   0.0 s   0.5 s          1.0 s          1.5 s
δ=300 ms    59.5    78.7 (+19.2)   80.3 (+1.6)    79.6 (-0.7)
δ=100 ms    30.2    60.2 (+30.0)   54.7 (-5.5)    47.8 (-6.9)
Proposed (OFF-LINE): 87.0 (δ=300 ms)
(Values in parentheses: change from the previous delay time.)
■ The results show improvement on both the classical and the jazz database.
■ The results with a delay time get close to the OFF-LINE results.
■ A large delay time (1.0 s or more) worsens the results at the small tolerance (δ=100 ms).
■ Score position and tempo are jointly optimized by minimizing the anticipation error.
■ The delayed decision finds a reliable score position.
■ A reliable future position is anticipated using the adaptively estimated tempo.
■ Our algorithm is intermediate between ON-LINE and OFF-LINE.
8. Conclusion
■ A robust ON-LINE algorithm for score alignment based on a delayed decision and anticipation framework
■ The combined model of SCRFs and LDS provides a unified framework to find both score position and tempo.
■ Our framework with a large delay time gets better results in general.
Segmental CRFs (SCRFs)
Duration distribution
Advantages
[Figure: chord transition model over chord labels; audio observation features]
■ Segment-level Markov chain (NOT frame-level)
■ More flexible feature design than classical Hidden Markov Models: chroma features, onset features, duration features, etc.
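Segment-level decoding can be sketched as a semi-Markov Viterbi, the kind of inference a segmental model performs. Each score event occupies one segment. The segment's score adds its frame feature scores and a duration term, so transitions are scored once per segment rather than once per frame. The function `segmental_viterbi`, the toy feature scores, and the flat duration function are hypothetical. The poster's actual features and duration distribution are not reproduced here.

```python
import math


def segmental_viterbi(frame_scores, dur_logprob, max_dur):
    """Best in-order segmentation of T frames into J score events (semi-Markov Viterbi).

    frame_scores[t][j] : feature score of event j at frame t (e.g. a chroma match)
    dur_logprob(j, d)  : log-score of event j lasting d frames (duration distribution)
    max_dur            : longest segment considered, in frames
    Returns (best total score, [(event, start, end_exclusive), ...]).
    """
    T, J = len(frame_scores), len(frame_scores[0])
    # Prefix sums: the feature score of any segment is then an O(1) difference.
    cum = [[0.0] * (T + 1) for _ in range(J)]
    for j in range(J):
        for t in range(T):
            cum[j][t + 1] = cum[j][t] + frame_scores[t][j]
    # best[j][t]: best score of events 0..j covering exactly frames [0, t).
    best = [[-math.inf] * (T + 1) for _ in range(J)]
    back = [[0] * (T + 1) for _ in range(J)]
    for j in range(J):
        for t in range(1, T + 1):
            for d in range(1, min(max_dur, t) + 1):
                start = t - d
                if j == 0:
                    prev = 0.0 if start == 0 else -math.inf
                else:
                    prev = best[j - 1][start]
                # A transition is scored only at a segment boundary, not at every frame.
                s = prev + (cum[j][t] - cum[j][start]) + dur_logprob(j, d)
                if s > best[j][t]:
                    best[j][t], back[j][t] = s, start
    if best[J - 1][T] == -math.inf:
        raise ValueError("no segmentation fits within max_dur")
    segments, t = [], T
    for j in range(J - 1, -1, -1):
        start = back[j][t]
        segments.append((j, start, t))
        t = start
    return best[J - 1][T], segments[::-1]


def flat_duration(j, d):
    return 0.0  # hypothetical: no duration preference


# Two chords over five frames: chord 0 matches frames 0-1, chord 1 matches frames 2-4.
frame_scores = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
print(segmental_viterbi(frame_scores, flat_duration, max_dur=4))
# (5.0, [(0, 0, 2), (1, 2, 5)])
```

Replacing `flat_duration` with a duration distribution lets the model reject segmentations whose lengths are implausible for the score, which a frame-level HMM can only express indirectly through self-transition probabilities.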
5. Proposed Algorithm
Semi-Markov
How to find a reliable current score position?
E.g., instant decision (t = t + 1)
3. Segmental CRFs
Vancouver Convention & Exhibition Centre May 26 - 31, 2013, Vancouver, Canada
ROBUST ON-LINE ALGORITHM FOR REAL-TIME AUDIO-TO-SCORE ALIGNMENT BASED ON A DELAYED DECISION AND ANTICIPATION FRAMEWORK
9. Future Direction
■ Dynamic optimization of the delay time
■ Learning the model from real data
■ Application to automatic accompaniment¹
¹ “Ryry: Real-time Score Following and Automatic Accompaniment.” Demonstration videos are available on YouTube.