Section A
Dynamic Time Warping is a robust algorithm for determining similarity between two timeseries sequences and can be applied to a large range of problems in the fields of shape recognition, handwriting recognition, and speech recognition. It allows for more flexibility than the Euclidean distance metric, and is used on data that has a large amount of variance. See the figure below for a visual depiction of how these two algorithms compare two similar timeseries by comparing individual data points. Note that the Dynamic Time Warping algorithm matches data points that we would consider to be most suitable for comparison, while the Euclidean matching algorithm is a bit more naive.
One modern application of this algorithm is the classification of animal speech, in particular the classification of birdsongs. In this problem you’ll work through the details of such a classification system.
A young bird learning birdsongs can be compared to a toddler learning their first language. Initially the bird experiments with vocal abilities without organization, but through observation of parents and other members of the local community the bird slowly refines its song until it can reproduce the tune nearperfectly. Due to the reliance on learning from other local birds, different communities of birds of the same species can have slightly differing birdsongs. These locality and age factors can make it difficult to accurately recognize a bird’s species from it’s birdsong, but one successful method of solving this problem is by applying Dynamic Time Warping.
To construct a classification system, researchers first collect a large amount of data concerning birdsongs across many species, ages, and localities and construct a representative database. For this simplified problem, our database is comprised of the two species below: a yellow wagtail and a skylark. The waveforms show the typical form of the birdsong, while the timeseries data represents the measurements that are actually in our database.
When a new birdsong from an unknown bird is recorded, the Dynamic Time Warping algorithm is applied to compare the new reading against birdsongs in the database, and the most similar bird is used as the classification label.
How do we actually perform the algorithm? Given two time series sequences A and B, a matrix of size (A x B) is initialized, and each element γ( i,j ) is computed using the following formula:
Fig 3: Application on example sequences A and B
Each element γ( i,j ) represents the total DTW distance if elements , are matched together. The most important value in this matrix is at the very top right, the value that indicates the total distance when all elements of both sequences have been matched; the total distance between A and B. To find which data points have been matched together, also known as the warping path or global alignment, you can work the algorithm backwards. Starting at the top right element, follow the neighbor that you used to calculate that element until you reach the bottom left. Given a tie in neighboring values, move diagonally, and continue to move diagonally until the tie is resolved. Every pair (i,j) in this path is the optimal pairing used to find the total distance.
Fig 4: Example of Warping Path
A1. Recorded birdsong is given to you with a frequency timeseries sequence of:
XAxis Point 
YAxis Points 
8 

8 

6 

7 

5 

3 

0 

9 

2 
XAxis Point 
YAxis Points 
8 

8 

6 

7 

5 

3 

0 

9 

2 
This naive implementation of DTW has a time complexity of ( · ) and a space complexity of ( · ), where N and M are the lengths of the two time sequences. This is one of the most cited reasons to not use DTW. Another common reason is that the algorithm may be too flexible. For example, it may be realistic to match one datapoint from A to three data points from B, but it would almost certainly be inaccurate to match one datapoint from A to half the data points from B. Take a look at the example paths below, where the green path is acceptable and the red path is not very useful. SakoeChiba bands are a prevalent solution to both these issues. In this section, you are going to compute the distance between two time series using this technique.
Fig 5: Comparison of DTW Paths
The path directly from the bottom left to top right of the above matrix represents a strict pairing where each point from A can be paired with only one point from B. This is inflexible, and a naive DTW allows us to disregard this constraint. However, SakoeChiba bands impose the constraint that any element on the optimal path cannot be too far from this strict pairing. Below are examples of SakoeChiba bands of degree N, where we will only evaluate the matrix elements a distance N away from the strict pairing line. Examine the examples below, where the gray boxes are disregarded in DTW calculations:
Fig 6: Visualization of SakoeChiba bands
B.1 Using the same timeseries data used in the previous problems, [8,8,6,7,5,3,0,9,2]:
XAxis Point 
YAxis Points 
8 

8 

6 

7 

5 

3 

0 

9 

2 
XAxis Point 
YAxis Points 
8 

8 

6 

7 

5 

3 

0 

9 

2 