Ryuichi YAMAMOTO, Shinji SAKO, Tadashi KITAMURA (Nagoya Institute of Technology, JAPAN)

1. Motivation

2. The Idea

How to improve the robustness against polyphonic signals?



■ Audio-to-score alignment has been well studied, but the robustness of its ON-LINE algorithms still has room for improvement
■ The most popular dynamic programming approaches, such as DTW and HMMs, rely on OFF-LINE optimization algorithms
■ Some approximations are required in ON-LINE settings

Delayed decision improves the reliability of estimating past score position compared with an instant decision
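The difference between an instant and a delayed decision can be illustrated with a minimal sketch. It assumes a plain frame-level, left-to-right Viterbi recursion as a stand-in for the segmental model of the poster; the function name `fixed_lag_decode`, the transition scores, and the likelihood values are all hypothetical.

```python
import math


def fixed_lag_decode(log_obs, log_stay, log_move, lag):
    """Left-to-right Viterbi decoding with a fixed decision delay.

    log_obs[t][s] is the log-likelihood of frame t at score position s.
    At each frame t >= lag, the position at frame t - lag is read off the
    best path ending at frame t (lag = 0 is an instant decision).
    """
    n_states = len(log_obs[0])
    # Assumption: the performance starts at the first score position.
    delta = [log_obs[0][0]] + [-math.inf] * (n_states - 1)
    backptrs = []  # backptrs[t - 1][s]: best predecessor of s at frame t
    decided = []
    for t, frame in enumerate(log_obs):
        if t > 0:
            new_delta, ptr = [], []
            for s in range(n_states):
                stay = delta[s] + log_stay
                move = delta[s - 1] + log_move if s > 0 else -math.inf
                best, prev = (move, s - 1) if move > stay else (stay, s)
                new_delta.append(best + frame[s])
                ptr.append(prev)
            delta = new_delta
            backptrs.append(ptr)
        if t >= lag:
            # Trace back `lag` frames from the currently best position.
            s = max(range(n_states), key=lambda i: delta[i])
            for k in range(lag):
                s = backptrs[t - 1 - k][s]
            decided.append(s)
    return decided


# Hypothetical likelihoods for 2 score positions over 4 frames.
# Frame 1 is ambiguous and slightly favors the wrong position.
likelihoods = [(0.9, 0.1), (0.4, 0.6), (0.9, 0.1), (0.1, 0.9)]
log_obs = [[math.log(p) for p in frame] for frame in likelihoods]
instant = fixed_lag_decode(log_obs, math.log(0.5), math.log(0.5), lag=0)
delayed = fixed_lag_decode(log_obs, math.log(0.5), math.log(0.5), lag=1)
```

In this toy input the instant decision jumps to position 1 at the ambiguous frame and then falls back, while the one-frame delay keeps frames 0 to 2 at position 0. The sketch stores every backpointer for simplicity; a real-time version would keep only the last `lag` frames.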




Future anticipation using tempo will be useful

How to find both score position and tempo?

We propose a delayed decision and anticipation framework that jointly optimizes score position and tempo, based on Segmental Conditional Random Fields (SCRFs) and a Linear Dynamical System (LDS)

Problem:

ON-LINE approximations sometimes give worse results than OFF-LINE optimization

4. SCRFs with Tempo Model

Conditional Random Fields (CRFs)

based on SCRFs

based on LDS

6. Experiments

Tempo dynamics

We test with various delay-times: 0 s, 0.5 s, 1.0 s, and 1.5 s. The result with a delay-time of 0 s is the baseline.

Database
MAPS (Classic): piano, 60 recordings (about 4 hours)
RWC (Jazz): multiple instruments including percussion, 50 recordings (about 3.5 hours)

Settings
Evaluation measure: Onset recognition rate
Onset detection tolerance δ: 100 ms, 300 ms
Hop-size: 10 ms
Model parameters: Tuned by a grid search
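The evaluation measure can be sketched as follows. This is a minimal illustration that assumes the onset recognition rate is the percentage of score onsets whose estimated time lies within the tolerance δ of the reference time; the poster does not spell out the definition, and the function name and the onset times are hypothetical.

```python
def onset_recognition_rate(reference, estimated, tolerance):
    """Percentage of onsets estimated within `tolerance` seconds.

    Assumes reference[i] and estimated[i] refer to the same score onset;
    an estimate of None means the onset was never reported.
    """
    if len(reference) != len(estimated):
        raise ValueError("reference and estimated must have the same length")
    if not reference:
        return 0.0
    hits = sum(
        1
        for ref, est in zip(reference, estimated)
        if est is not None and abs(est - ref) <= tolerance
    )
    return 100.0 * hits / len(reference)


# Hypothetical onset times in seconds (not data from the experiments).
reference = [0.50, 1.00, 1.50, 2.00]
estimated = [0.55, 1.20, 1.52, None]
rate_100 = onset_recognition_rate(reference, estimated, 0.100)
rate_300 = onset_recognition_rate(reference, estimated, 0.300)
```

With these made-up times, two of four onsets fall within 100 ms and three within 300 ms, which shows why the rate at δ=300 ms is never lower than the rate at δ=100 ms.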

Delayed decision: the Viterbi algorithm finds the reliable past score position


Tempo induction from the estimated path (Kalman Filter)

Future anticipation using the past-decided score position and tempo
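Tempo induction and anticipation can be sketched as below. The sketch assumes a scalar random-walk tempo model (seconds per beat) tracked by a one-dimensional Kalman filter, followed by linear extrapolation; the actual LDS of the poster may differ, and the function names, noise variances, and decided path are hypothetical.

```python
def kalman_tempo_update(mean, var, observed, process_var, obs_var):
    """One predict/update step of a scalar Kalman filter on seconds per beat."""
    # Predict: the tempo follows a random walk, so only the uncertainty grows.
    var_pred = var + process_var
    # Update: blend the prediction with the newly observed beat period.
    gain = var_pred / (var_pred + obs_var)
    return mean + gain * (observed - mean), (1.0 - gain) * var_pred


def anticipate_time(last_time, last_beat, target_beat, sec_per_beat):
    """Extrapolate when `target_beat` will be reached at the current tempo."""
    return last_time + (target_beat - last_beat) * sec_per_beat


# Hypothetical decided path: (time in seconds, score position in beats).
decided_path = [(0.0, 0.0), (0.5, 1.0), (1.1, 2.0), (1.7, 3.0)]
tempo, var = 0.5, 1.0  # assumed prior: 0.5 s per beat, very uncertain
for (t0, b0), (t1, b1) in zip(decided_path, decided_path[1:]):
    observed = (t1 - t0) / (b1 - b0)
    tempo, var = kalman_tempo_update(tempo, var, observed, 0.01, 0.01)

last_time, last_beat = decided_path[-1]
next_onset = anticipate_time(last_time, last_beat, 4.0, tempo)
```

The performer in this toy path slows from 0.5 to 0.6 s per beat, and the filtered tempo moves toward 0.6 without jumping there at once, so the anticipated onset of beat 4 reflects the smoothed tempo rather than the last raw interval.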

7. Results and Discussion

EFFECT OF DELAY-TIME (CLASSIC)

ONSET RECOGNITION RATE by DELAY-TIME (change from the previous delay-time in parentheses)

DELAY-TIME   δ=100 ms       δ=300 ms
0.0 s        46.9           80.3
0.5 s        78.8 (+31.9)   91.7 (+11.4)
1.0 s        73.2 (-5.6)    92.8 (+1.1)
1.5 s        65.8 (-7.4)    92.5 (-0.3)

Proposed (OFF-LINE): 96.4 (δ=300 ms)

Tempo model


Duration control

EFFECT OF DELAY-TIME (JAZZ)

ONSET RECOGNITION RATE by DELAY-TIME (change from the previous delay-time in parentheses)

DELAY-TIME   δ=100 ms       δ=300 ms
0.0 s        30.2           59.5
0.5 s        60.2 (+30.0)   78.7 (+19.2)
1.0 s        54.7 (-5.5)    80.3 (+1.6)
1.5 s        47.8 (-6.9)    79.6 (-0.7)

Proposed (OFF-LINE): 87.0 (δ=300 ms)

■ The results show improvement on both the classical and jazz databases
■ The results with a delay time get close to the OFF-LINE results
■ A large delay time (over 1.0 s) causes the results to worsen at the small tolerance (δ=100 ms)

■ Score position and tempo are jointly optimized by minimizing the anticipation error
■ Delayed decision can find a reliable score position
■ A reliable future position is anticipated using the adaptively estimated tempo
■ Our algorithm is intermediate between ON-LINE and OFF-LINE

Segmental CRFs (SCRFs)

8. Conclusion

■ Robust ON-LINE algorithm for score alignment based on a delayed decision and anticipation framework
■ The combined model of SCRFs and LDS provides a unified framework to find both score position and tempo
■ Our framework with a large delay time gets better results in general

Duration distribution

Advantages

Chord transition model Audio observation:



■ Segment-level Markov chain (NOT frame-level)
■ More flexible feature design than classical Hidden Markov Models
■ Chroma features, onset features, duration features, etc.
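A segment-level score can combine such features in a log-linear way. The following is only a sketch under stated assumptions: the feature definitions (mean cosine similarity to a chord template, log-ratio duration penalty), the weights, and the names `cosine` and `segment_score` are hypothetical and are not taken from the poster.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def segment_score(frames, template, expected_frames, weights):
    """Log-linear score of one segment: one chord held over `frames`."""
    # Chroma feature: mean similarity of the frames to the chord template.
    chroma = sum(cosine(f, template) for f in frames) / len(frames)
    # Duration feature: penalize deviation from the expected segment length.
    duration = -abs(math.log(len(frames) / expected_frames))
    return weights["chroma"] * chroma + weights["duration"] * duration


def chord_chroma(pitch_classes):
    """12-dimensional binary chroma vector for a set of pitch classes."""
    return [1.0 if pc in pitch_classes else 0.0 for pc in range(12)]


weights = {"chroma": 1.0, "duration": 1.0}  # assumed values, not tuned
c_major = chord_chroma({0, 4, 7})
d_minor = chord_chroma({2, 5, 9})
# A 4-frame segment that matches the chord and its expected length ...
good = segment_score([c_major] * 4, c_major, 4, weights)
# ... versus a 2-frame segment of a different chord.
bad = segment_score([d_minor] * 2, c_major, 4, weights)
```

Because the score is computed per segment rather than per frame, a duration feature like this one can look at the whole segment length, which a frame-level Markov chain cannot express directly.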

Delay-time

Chord



Features

5. Proposed Algorithm


Semi Markov

How to find a reliable current score position?

E.g. Instant Decision t=t+1

3. Segmental CRFs


Vancouver Convention & Exhibition Centre May 26 - 31, 2013, Vancouver, Canada

ROBUST ON-LINE ALGORITHM FOR REAL-TIME AUDIO-TO-SCORE ALIGNMENT BASED ON A DELAYED DECISION AND ANTICIPATION FRAMEWORK

9. Future Direction

■ Dynamic optimization of delay-time
■ Learning the model from real data
■ Application to automatic accompaniment [1]

[1] “Ryry: Real-time Score Following and Automatic Accompaniment,” demonstration movies are found on YouTube