We now have the full video of our piece "Dangum", which was presented at Listening Machines 2008, held at the Eyedrum Art Gallery in downtown Atlanta:
http://vimeo.com/1727884
Here is a detailed description of the system. As mentioned in one of my previous posts, the system can improvise with a human mridangam player (the mridangam is the main percussion instrument in South Indian classical music; see the Wikipedia entry).
The system has four components:
- Onset Detector
- Stroke Classifier
- Improviser Module
- Synthesizer Module
1. Onset Detector:
The onset detector module reads the microphone input, captures what the human mridangam player plays, and performs onset detection on it. After trying a number of different onset detection algorithms, we finally settled on the "spectral difference" method described in this paper by Bello et al. We construct a detection function from the spectral difference and do peak picking on that. Once onsets are detected, a short window of samples following each onset is extracted and passed to the Stroke Classifier module (the exact number of samples kept changing with different settings and had to be tuned constantly).
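Here is a minimal sketch of a spectral-difference detection function with simple peak picking, assuming NumPy and SciPy; the hop size, FFT size and threshold are illustrative placeholders, not the values our system actually used:

    import numpy as np
    from scipy.signal import stft

    def spectral_difference_onsets(x, sr, hop=512, nfft=1024, threshold=0.3):
        # Magnitude spectrogram of the input signal.
        _, _, Z = stft(x, fs=sr, nperseg=nfft, noverlap=nfft - hop)
        mag = np.abs(Z)
        # Detection function: half-wave rectified spectral difference
        # between consecutive frames (as in Bello et al.).
        diff = np.diff(mag, axis=1)
        sd = np.sum(np.maximum(diff, 0.0) ** 2, axis=0)
        sd /= sd.max() + 1e-12  # normalize to [0, 1]
        # Simple peak picking: local maxima above a fixed threshold.
        peaks = [i for i in range(1, len(sd) - 1)
                 if sd[i] > threshold and sd[i] >= sd[i - 1] and sd[i] > sd[i + 1]]
        # Convert frame indices to onset times in seconds.
        return [p * hop / sr for p in peaks]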
2. Stroke Classifier:
The Stroke Classifier module takes these samples and classifies the kind of stroke. We used an SVM (Support Vector Machine) classifier for the task. The classifier was trained to identify 10 different strokes of the mridangam using a 13-dimensional MFCC (mel-frequency cepstral coefficient) feature vector. There are different strokes like Nam, Dhin, Chappu, Tha, Thi and Thom that can be played on the mridangam, and this module classifies the stroke samples it gets from the onset detector into one of these strokes.
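A minimal sketch of the feature extraction and classification step, assuming librosa and scikit-learn; train_clips and train_labels are hypothetical placeholders for the annotated stroke database described below:

    import librosa
    import numpy as np
    from sklearn.svm import SVC

    def mfcc_features(clip, sr):
        # 13 MFCCs per frame, averaged over the clip into a single vector.
        mfcc = librosa.feature.mfcc(y=clip, sr=sr, n_mfcc=13)
        return mfcc.mean(axis=1)

    def train_classifier(train_clips, train_labels, sr):
        X = np.array([mfcc_features(c, sr) for c in train_clips])
        return SVC(kernel="rbf").fit(X, train_labels)

    def classify_stroke(model, clip, sr):
        # Returns one of the stroke labels, e.g. "nam", "dhin", "thi", ...
        return model.predict([mfcc_features(clip, sr)])[0]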
We had earlier built a database of about 10,000 strokes from different mridangam players (both professional artists and Atlanta mridangists). We used part of this database to train the classifier. Once a stroke is identified, a sequence of strokes and their onset times is built and sent to the Improviser module. The sequence looks like this:
(Time in secs) 0.5 1.2 1.9 2.6
(Stroke) Nam Dhim Dhim Nam
Before getting into the Improviser module, it makes sense to discuss the database we built (if for nothing else, then because I spent one full semester gathering data for it :) )
Database Description:
About 13 different recordings were used – 6 from professional mridangam players like Palghat Mani Iyer, Trichy Sankaran, Guruvayur Dorai and Umayalpuram Sivaraman. The other 7 were recordings of me and Santosh Chandru, another mridangist at Georgia Tech.
All the recordings were normalized, had some external noise removed, and were exported as .wav files. Sonic Visualiser with the aubio onset plugin was then used to run onset detection on all these .wav files, and an annotated .txt file was exported from that. Audacity was used to import the .wav files along with their corresponding annotated .txt files and to do the manual annotation. In all, about 9,200 strokes were labeled with one of the following 13 labels:
1 : nam
2 : dhin
3 : thi
4 : ta
5 : cha
6 : namm (nam with thom)
7 : dhim (dhin with thom)
8 : tha
9 : thom
10 : gum
11 : thlam
12 : namG (nam with gum)
13 : dhimG (dhin with gum)
Here is the confusion matrix generated by running an SVM classifier on the entire database:
    a    b    c    d    e    f    g    h    i    j    k    l    m   <-- classified as
  976   59  204    9    7   34   22    9    7    0    0    0    0 |  a = 1 nam
   54  898  223    0   11    1   83    7    1    0    0    0    0 |  b = 2 dhin
  179  194 1735   48   29   23   56   64   48    0    0    0    0 |  c = 3 thi
   40   33  283  340    4    5   11   24    7    0    0    0    0 |  d = 4 ta
   81  137   86    2  214    1   22   15    0    0    0    0    0 |  e = 5 cha
   26   13   99    0    3  174   10   19   36    0    0    0    0 |  f = 6 namm
   60   92  145    1   26    4  363   51   37    0    0    0    0 |  g = 7 dhim
   36   35  270   15    4   16   19  469   83    0    0    0    0 |  h = 8 tha
   14   11  102    2    4    3   26   93  408    0    0    0    0 |  i = 9 thom
    7    1   12    2    0    1   17   14   15    0    0    0    0 |  j = 10 gum
    3    1    5    0    1    0    3    0    4    0    0    0    0 |  k = 11 namG
    3    0    1    0    0    0    7    0    4    0    0    0    0 |  l = 12 dhimG
    6    0    2    0    0    0   20    0    1    0    0    0    0 |  m = 13 thlam
3. Improviser Module:
Once the improviser module receives the sequence of strokes and their onset times, it applies a few basic "improvisations" to it. Improvisations include adding strokes, skipping strokes, doubling the speed, and slowing the speed down. Which improvisation the module chooses at a particular instant is stochastic, and this gives an element of surprise to the human playing along.
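A minimal sketch of how such stochastic transformations might be applied, assuming the sequence is a list of (onset time, stroke) pairs; the transformation set and the skip probability are illustrative, not the system's actual values:

    import random

    def double_speed(seq):
        return [(t / 2.0, s) for t, s in seq]

    def half_speed(seq):
        return [(t * 2.0, s) for t, s in seq]

    def skip_strokes(seq):
        # Randomly drop roughly a quarter of the strokes.
        return [(t, s) for t, s in seq if random.random() > 0.25]

    def add_strokes(seq):
        # Insert an extra stroke halfway between consecutive onsets.
        out = list(seq)
        for (t1, s1), (t2, _) in zip(seq, seq[1:]):
            out.append(((t1 + t2) / 2.0, s1))
        return sorted(out)

    def improvise(seq):
        # Pick one transformation at random for this call-and-response turn.
        return random.choice([double_speed, half_speed, skip_strokes, add_strokes])(seq)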
4. Synthesizer Module:
This is a PD (Pure Data) patch that waits for the "improvised strokes" to be sent to it. The improvised stroke sequence is sent from the Improviser module as OSC (Open Sound Control) messages. Once it receives them, the patch plays the sequence using its own electronic sounds.
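A minimal sketch of the handoff to a PD patch over OSC, assuming the python-osc package; the OSC address "/stroke" and port 9000 are hypothetical, not the values our patch actually listened on:

    from pythonosc.udp_client import SimpleUDPClient

    # PD can receive these via [udpreceive 9000] plus an OSC-unpacking object.
    client = SimpleUDPClient("127.0.0.1", 9000)

    improvised = [(0.5, "nam"), (1.2, "dhim"), (1.9, "dhim"), (2.6, "nam")]
    for onset_time, stroke in improvised:
        # Each message carries the stroke label and its onset time in seconds.
        client.send_message("/stroke", [stroke, float(onset_time)])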
The entire call and response between the human and the system follows a text file that is fed as input to the system prior to the performance. This text file acts as a score for both the human and the system.
Monday, September 15, 2008
MTG becomes GTCMT
With new people, new funding and new collaborators, our group, the "Music Technology Group", has been renamed the "Georgia Tech Center for Music Technology". I was fairly involved with the name selection process, in the sense that my fellow lab mate Andrew Beck and I counted the votes that decided the center's name :) Some close competitors were "Georgia Tech Music Labs" and "Georgia Tech Music Research Center".
Meanwhile, each of the individual professors now has a group of his own. Dr. Parag Chordia has named his group the "Music Intelligence Group". We'll have to wait and see what names Dr. Gil Weinberg and Dr. Jason Freeman come up with.