Monday, September 15, 2008

Dangum

We now have the full video of our piece "Dangum", which was presented at Listening Machines 2008, held at the Eyedrum art gallery in Downtown Atlanta:

http://vimeo.com/1727884

Here is a detailed description of the system. As mentioned in one of my previous posts, the system can improvise along with a human mridangam player (the mridangam is the main percussion instrument in South Indian Classical music; here is the link to the Wikipedia entry).

The system has four components:
- Onset Detector
- Stroke Classifier
- Improviser Module
- Synthesizer Module

1. Onset Detector:
The onset detector module opens the microphone input, captures what the human mridangam player plays and performs onset detection on it. After trying a number of different onset detection algorithms, we finally settled on the "spectral difference" method described in this paper by Bello, JP et al. We construct a detection function from the spectral difference and do peak picking on it. Once an onset is detected, a short run of samples following the onset is extracted and passed to the Stroke Classifier module (the exact number of samples kept changing with different settings and had to be tuned constantly).
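For illustration, here is a minimal Python sketch of such a spectral-difference detection function with simple peak picking. It assumes a mono signal x and sample rate sr; the frame size, hop size and threshold are illustrative values, not the settings we actually used:

import numpy as np

def spectral_difference_onsets(x, sr, frame_size=1024, hop=512, threshold=0.1):
    """Detect onsets via a spectral-difference detection function.
    Returns onset times in seconds."""
    window = np.hanning(frame_size)
    n_frames = 1 + (len(x) - frame_size) // hop
    prev_mag = np.zeros(frame_size // 2 + 1)
    detection = np.zeros(n_frames)

    for i in range(n_frames):
        frame = x[i * hop:i * hop + frame_size] * window
        mag = np.abs(np.fft.rfft(frame))
        # Half-wave rectified spectral difference: only increases in energy count
        diff = mag - prev_mag
        detection[i] = np.sum(np.maximum(diff, 0.0))
        prev_mag = mag

    # Normalise the detection function and pick local maxima above a threshold
    detection /= (detection.max() + 1e-12)
    onsets = []
    for i in range(1, n_frames - 1):
        if (detection[i] > threshold and
                detection[i] >= detection[i - 1] and
                detection[i] > detection[i + 1]):
            onsets.append(i * hop / sr)
    return onsets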

2. Stroke Classifier:
The Stroke Classifier module takes those samples and classifies the kind of stroke. We used an SVM (Support Vector Machine) classifier for the task. The classifier was trained to identify 10 different strokes of the mridangam using a 13-coefficient MFCC (mel-frequency cepstral coefficients) feature vector. The mridangam has different strokes such as Nam, Dhin, Chappu, Tha, Thi and Thom, and this module classifies the stroke samples it receives from the onset detector into one of these strokes.
We had earlier built a database of about 10,000 strokes from different mridangam players (both professional artists and Atlanta mridangists) and used a part of this database to train the classifier. Once a stroke is identified, a sequence of strokes and their onset times is assembled and sent to the Improviser module. The sequence of strokes and onset times looks like this:
(Time in secs) 0.5 1.2 1.9 2.6
(Stroke) Nam Dhim Dhim Nam
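As a rough sketch of this stage (the real system was not necessarily built on these libraries), here is how a 13-coefficient MFCC feature vector per stroke and an SVM classifier could be put together with librosa and scikit-learn; the RBF kernel and the mean-over-frames pooling are assumptions:

import numpy as np
import librosa
from sklearn.svm import SVC

def mfcc_vector(segment, sr):
    """13-dimensional MFCC feature vector for one stroke segment
    (mean of the frame-wise MFCCs over the segment)."""
    mfcc = librosa.feature.mfcc(y=segment, sr=sr, n_mfcc=13)
    return mfcc.mean(axis=1)

def train_stroke_classifier(train_segments, train_labels, sr):
    """train_segments: audio snippets following each onset;
    train_labels: their annotated stroke names, e.g. "nam", "dhin", ..."""
    X = np.array([mfcc_vector(seg, sr) for seg in train_segments])
    clf = SVC(kernel="rbf")
    clf.fit(X, train_labels)
    return clf

def classify_stroke(clf, segment, sr):
    """Classify one onset-aligned segment into a stroke label."""
    return clf.predict([mfcc_vector(segment, sr)])[0]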

Before getting into the Improviser module, it makes sense to describe the database we built (if for nothing else, then at least because I spent a full semester gathering data for it :) )

Database Description:
About 13 different recordings were used – 6 from professional mridangam players such as Palghat Mani Iyer, Trichy Sankaran, Guruvayur Dorai and Umayalpuram Sivaraman, and another 7 were recordings of me and Santosh Chandru, another mridangist at Georgia Tech. All the recordings were normalized, had external noise removed and were exported as .wav files. Sonic Visualiser with the aubio onset plugin was then used to run onset detection on all these .wav files, and an annotated .txt file was obtained from that. Audacity was used to import the .wav files along with their corresponding annotated .txt files and to do the manual annotation. In all, about 9200 strokes were labeled with one of the following 13 labels:
1 : nam
2 : dhin
3 : thi
4 : ta
5 : cha
6 : namm (nam with thom)
7 : dhim (dhin with thom)
8 : tha
9 : thom
10 : gum
11 : thlam
12 : namG (nam with gum)
13 : dhimG (dhin with gum)

Here is the confusion matrix generated using an SVM classifier on the entire database:

    a    b    c    d    e    f    g    h    i    j    k    l    m   <-- classified as
  976   59  204    9    7   34   22    9    7    0    0    0    0 | a =  1 nam
   54  898  223    0   11    1   83    7    1    0    0    0    0 | b =  2 dhin
  179  194 1735   48   29   23   56   64   48    0    0    0    0 | c =  3 thi
   40   33  283  340    4    5   11   24    7    0    0    0    0 | d =  4 ta
   81  137   86    2  214    1   22   15    0    0    0    0    0 | e =  5 cha
   26   13   99    0    3  174   10   19   36    0    0    0    0 | f =  6 namm
   60   92  145    1   26    4  363   51   37    0    0    0    0 | g =  7 dhim
   36   35  270   15    4   16   19  469   83    0    0    0    0 | h =  8 tha
   14   11  102    2    4    3   26   93  408    0    0    0    0 | i =  9 thom
    7    1   12    2    0    1   17   14   15    0    0    0    0 | j = 10 gum
    3    1    5    0    1    0    3    0    4    0    0    0    0 | k = 11 namG
    3    0    1    0    0    0    7    0    4    0    0    0    0 | l = 12 dhimG
    6    0    2    0    0    0   20    0    1    0    0    0    0 | m = 13 thlam
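For reference, a confusion matrix like this one can be produced from the labeled feature vectors with scikit-learn's cross-validated predictions; the 10-fold setting below is an assumption and not necessarily how the matrix above was generated:

from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix
from sklearn.svm import SVC

def stroke_confusion_matrix(X, y):
    """X: N x 13 matrix of MFCC feature vectors, y: N stroke labels (1..13).
    Returns the confusion matrix of cross-validated SVM predictions."""
    predictions = cross_val_predict(SVC(kernel="rbf"), X, y, cv=10)
    return confusion_matrix(y, predictions)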


3. Improviser Module
Once it receives the sequence of strokes and their onset times, the Improviser module applies a few basic "improvisations" to it. Improvisations include adding strokes, doubling the speed, skipping strokes and slowing the speed down. The improvisation the module chooses at a particular instant is stochastic in nature, and this gives an element of surprise to the human playing along.
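A minimal sketch of such a stochastic improviser, operating on a list of (onset_time, stroke) pairs; the particular transformations and the uniform random choice are illustrative, not the exact rules used in the piece:

import random

def improvise(sequence):
    """Apply one randomly chosen transformation to a list of
    (onset_time, stroke) pairs."""
    if len(sequence) < 2:
        return list(sequence)
    choice = random.choice(["double", "slow", "skip", "add"])
    if choice == "double":              # double the speed
        return [(t / 2.0, s) for t, s in sequence]
    if choice == "slow":                # slow the speed down
        return [(t * 2.0, s) for t, s in sequence]
    if choice == "skip":                # skip every other stroke
        return sequence[::2]
    # "add": insert an extra copy of each stroke halfway to the next onset
    out = []
    for (t, s), (t_next, _) in zip(sequence, sequence[1:]):
        out.extend([(t, s), ((t + t_next) / 2.0, s)])
    out.append(sequence[-1])
    return out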

4. Synthesizer Module
This is a PD (Pure Data) patch that waits for the "improvised stroke" sequence to be sent to it. The improvised stroke sequence is sent from the Improviser module as OSC (Open Sound Control) messages. Once it receives them, it plays the sequence using its own electronic sounds.
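As an illustration of that handoff, here is how the improvised sequence could be sent as OSC messages from Python using the python-osc package; the host, port and address pattern "/dangum/stroke" are hypothetical, and the PD patch would need a matching OSC receiver (e.g. the mrpeach [udpreceive]/[unpackOSC] objects) on its end:

from pythonosc.udp_client import SimpleUDPClient

# Hypothetical host, port and OSC address; adjust to match the PD patch.
client = SimpleUDPClient("127.0.0.1", 9000)

def send_improvised_sequence(sequence):
    """sequence: list of (onset_time, stroke) pairs from the improviser."""
    for onset_time, stroke in sequence:
        client.send_message("/dangum/stroke", [float(onset_time), stroke])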

The entire call and response of the system follows a text file that is fed to the system as input prior to the performance. This text file acts as a score for both the human and the system.

MTG becomes GTCMT

With new people, new funding and new collaborators, our group, the "Music Technology Group", has been renamed the "Georgia Tech Center for Music Technology". I was fairly involved in the name selection process, in the sense that my fellow lab mate Andrew Beck and I counted the votes that decided the center's name :) Some close competitors were "Georgia Tech Music Labs" and "Georgia Tech Music Research Center".

Meanwhile, each of the professors now has a group of their own. Dr. Parag Chordia has named his group the "Music Intelligence Group". We'll have to wait and see what names Dr. Gil Weinberg and Dr. Jason Freeman come up with.

Saturday, May 3, 2008

Listening Machines 2008

Listening Machines, the annual concert and exhibition organized by the Music Technology and Digital Media programs at Georgia Tech was held on April 24th, 2008 at the Eyedrum gallery located in Downtown Atlanta. The event showcases music and art projects that explore the creative space of human-machine interaction.

Alex Rae and I presented our piece "Dangum". It is a piece involving a human and the computer - an attempt at machine musicianship, as I discussed in my earlier posts. Watch the video coverage of the concert at the YouTube link below (an excerpt of the Dangum piece starts at 1:20):

Listening Machines 2008


More about the piece in future posts !

Thursday, April 3, 2008

Carnatic Raag Classification

Part of my research project last semester was building a Carnatic raag database. Once a substantial raag database was ready, we ran a number of experiments to check how well PCDs (pitch-class distributions) and PCDDs (pitch-class dyad distributions) perform on the task of classifying raags. It turned out that PCDs and PCDDs are very effective for this task: we achieved 92.5% classification accuracy on 30 target raags using a Bayesian classifier. This shows that even though raags in Carnatic music differ from raags in North Indian Classical music - in melody, presentation and ornamentation - PCDs and PCDDs are still effective. (Using PCDs and PCDDs for North Indian raag classification is described in Dr. Parag Chordia's ISMIR '07 paper.)
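As a rough sketch of the PCD side of this (the PCDD features and the actual Bayesian classifier from the experiments are not reproduced here), this is how a 12-bin pitch-class distribution relative to a manually annotated tonic could be computed from a frame-wise pitch track, with a multinomial naive Bayes classifier from scikit-learn standing in as the classifier:

import numpy as np
from sklearn.naive_bayes import MultinomialNB

def pitch_class_distribution(f0_hz, tonic_hz, bins=12):
    """Fold a frame-wise pitch track (Hz, unvoiced frames <= 0) into a
    normalised pitch-class distribution relative to the tonic."""
    f0 = np.asarray(f0_hz, dtype=float)
    voiced = f0[f0 > 0]
    semitones = 12.0 * np.log2(voiced / tonic_hz)
    classes = np.mod(np.rint(semitones), bins).astype(int)
    pcd = np.bincount(classes, minlength=bins).astype(float)
    return pcd / pcd.sum()

def train_raag_classifier(pcds, labels):
    """pcds: one PCD per recording; labels: the raag of each recording.
    Scale the PCDs to pseudo-counts so MultinomialNB can model them."""
    X = np.array(pcds) * 1000.0
    clf = MultinomialNB()
    clf.fit(X, labels)
    return clf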

The Database:
The table below shows the various raags and their corresponding scale degrees:


The following list describes the database audio. Each entry is a separate audio file, with its tonic frequency and artist information included in the file name. The list also gives the type and duration of each audio file.

(Artist legend: AKC-AKC Natarajan; KVN-Palghat KV Narayanaswamy; DKP-DK Pattammal; DKJ-DK Jayaraman; Nedunuri-Nedunuri Krishnamurthy; SSI-Semmangudi Srinivasa Iyer; TNK-Prof TN Krishnan; MS-MS Subbulakshmi; GNB-GN Balasubramanian; MMI-Madurai Mani Iyer; Sanjay-Sanjay Subrahmanyan; TNS-Madurai TN Seshagopalan; NS-Neyveli Santhanagopalan; Ramani-Flute Ramani; Kadri-Kadri Gopalnath; Ravikiran-Chitraveena Ravikiran; Hari-Shenkottai Hari)

1-Karaharapriya-AKC-307.wav 6:38 Clarinet
1-Karaharapriya-KVN-254.wav 8:01 Male Vocal
2-Kalyani-DKP-308.wav 6:37 Female Vocal
2-Kalyani-Nedunuri-262.wav 7:04 Male Vocal
3-Todi-SSI-260.wav 6:22 Male Vocal
3-Todi-TNK-260.wav 4:15 Violin
4-Sankarabharnam-MS-379.wav 10:39 Female Vocal
4-Sankarabharnam-SSI-265.wav 18:55 Male Vocal
5-Shanmugapriya-NS-274.wav 14:35 Male Vocal
5-Shanmugapriya-Nedunuri-263.wav 1:58 Male Vocal
6-Nattakurinji-GNB-251.wav 10:43 Male Vocal
6-Nattakurinji-Sanjay-293.wav 11:55 Male Vocal
6-Nattakurinji-Sanjay-violin-293.wav 4:30 Violin
7-Kambhoji-Lalgudi-260.wav 7:50 Violin
7-Kambhoji-Nedunuri-260.wav 17:41 Male Vocal
8-Mayamalavagowla-NS-273.wav 0:42 Male Vocal
9-Keeravani-MMI-266.wav 13:24 Male Vocal
10-SimhendraMadhyamam-270.wav 10:31 Male Vocal
11-Khamas-GNB-254.wav 8:27 Male Vocal
11-Khamas-Hari-263.wav 14:28 Male Vocal
12-Hamsadwani-Nedunuri-263.wav 1:35 Male Vocal
13-Mohanam-DKJ-283.wav 15:49 Male Vocal
13-Mohanam-DKP-297.wav 6:23 Female Vocal
14-Bilahari-HydBros-279.wav 1:35 Male Vocal
14-Bilahari-Violin-279.wav 3:37 Violin
15-Nalinakanthi-Violin-283.wav 1:24 Violin
16-Sahana-TNS-254.wav 11:25 Male Vocal
17-Bhairavi-Ramani-323.wav 18:59 Bamboo Flute
17-Bhairavi-SSI-262.wav 6:35 Male Vocal
18-Sriranjani-Kadri-243.wav 3:40 Saxophone
19-AnandhaBhairavi-SSI-262.wav 5:33 Male Vocal
20-Atana-NS-272.wav 1:10 Male Vocal
21-Dwijavanti-SSI-266.wav 1:43 Male Vocal
22-Dhanyasi-KVN-267.wav 10:15 Male Vocal
23-Hindolam-Flute-305.wav 5:50 Bamboo Flute
24-Varali-Vocal-267.wav 4:30 Male Vocal
25-Reethigowlai-Ravikiran-1-263.wav 1:08 Chitraveena
25-Reethigowlai-Ravikiran.wav 1:30 Chitraveena
26-Abheri-NS-273.wav 1:40 Male Vocal
27-Madhyamavathi-Ramani-325.wav 5:55 Bamboo Flute
28-Kaanada-Flute-195.wav 1:15 Bamboo Flute
29-Purvikalyani-SSI-254.wav 10:11 Male Vocal
30-Pantuvarali-Sanjay-290.wav 6:19 Male Vocal


Pitch Tracking:
The YIN algorithm was used for pitch tracking, and the pitch-class distributions (PCDs) of the audio files were computed from the resulting pitch tracks. Since the tonic frequency varies between recordings, each audio file's tonic was manually annotated. The figures below show the discriminative power of a single scale degree. This is the boxplot of scale degree D across all target raags:



Here is the boxplot of scale degree Eb:



Wednesday, February 13, 2008

Why a machine musician?

I'll first try giving my thoughts on this question before getting into explaining my work (this is, in fact, a question many of my friends ask me every day). Here is my thought:

Over the years - at high school, at my undergrad university, at work and here at Georgia Tech - I have met so many people who at some point learned 2-3 years of some kind of formal music, but never continued with it for various reasons. Taking music from that 2-3 years of learning to a concert performance level is definitely not easy. I feel that taking one's music to the next level needs collaboration - this is when one actually needs to "jam" and practice with friends or fellow musicians. Not everyone finds the right people to do this with. And here comes the need for a "machine musician": a software application with which you can jam, practice and produce good music. This application can listen to you, respond to you and even correct the mistakes in your music!

Friday, February 8, 2008

Machine Musician

My research with Dr. Parag Chordia is about using MIR (Music Information Retrieval) techniques to train the computer to listen to and emulate a human musician. The research started with trying to make the computer listen to and comprehend the different strokes (about 13 of them) of the mridangam, the primary South Indian drum. It was a semester-long effort, and in subsequent posts I'll share the results and an explanation of it.

I am currently working on a live performance piece based on this, to be presented at the Listening Machines concert here at Georgia Tech. It will be a 7-8 minute piece in the traditional "tani avarthanam" or "jugal bandhi" style.