Harbin University of Science and Technology Bachelor's Degree Thesis

% hemean=swmean(he,10,5);
% plot(t,hemean,'r');

% Plot the spectral flatness measure curve
axes(handles.axes4);
plot(t,msf);
axis([0 2210 -20 0]);
% hold on;
% msfmean=swmean(msf,10,5);
% plot(t,msfmean,'r'); axis tight

% Plot the power spectral density curve
axes(handles.axes5);
plot(t,psd);
axis tight;
% hold on;
% psdmean=swmean(psd,10,5);
% plot(t,psdmean,'r');

% --- Executes on button press in pushbutton6 (play / resume playback).
function pushbutton6_Callback(hObject, eventdata, handles)
global WAVALL FS player flag;
if flag==1
    % Previously paused: continue from the paused position
    resume(player);
    flag=0;
else
    % Not paused: start playback from the beginning
    player=audioplayer(WAVALL,FS);
    play(player);
    flag=0;
end

% --- Executes on button press in pushbutton7 (stop playback).
function pushbutton7_Callback(hObject, eventdata, handles)
global player flag;
stop(player);
flag=0;   % clear the paused state so the next play restarts from the top

% --- Executes on button press in pushbutton8 (pause playback).
function pushbutton8_Callback(hObject, eventdata, handles)
global player flag;
pause(player);
flag=1;
Appendix B
A MARKOV-CHAIN MONTE-CARLO APPROACH TO
MUSICAL AUDIO SEGMENTATION
Christophe Rhodes, Michael Casey
Goldsmiths College, University of London
Queen Mary, University of London

ABSTRACT
This paper describes a method for automatically segmenting and labelling sections in recordings of musical audio. We incorporate the user's expectations for segment duration as an explicit prior probability distribution in a Bayesian framework, and demonstrate experimentally that this method can produce accurate labelled segmentations for popular music.
1. INTRODUCTION

This paper describes a method for incorporating our prior expectations about the size of musical structures or segments into a system for producing labelled segmentations of musical audio. Automatically-generated segmentations have applications in audio fingerprinting, content-based retrieval systems, summary generation and user interface provision for navigation in audio editors.

Previous studies in segmentation have used various spectral features such as timbre or chroma to generate time series of feature vectors. We do not address the choice of audio features here, but instead examine the typical subsequent use of the series of vectors. Several studies [2, 5, 6, 7, 8] compute pairwise similarity matrices between feature vectors with some distance measure for individual frames, then apply a filter of some form. Others
(e.g. [4]) perform k-means clustering between frames, and then post-process this clustering by using an HMM with fewer states, or generate labels by using an HMM directly on the feature vectors and then average over a window. These filtering or post-processing stages are introduced to reduce noise in the classification, and to reintroduce the notion of temporal closeness which was lost in the clustering; ideally, the classification would be informed of our expectations and so not produce noise in the first place, and would have temporal coherence built in.

(This research was supported by EPSRC grant GR/S84750/01, Hierarchical Segmentation and Semantic Markup of Musical Signals.)

In our previous work [9] we have avoided the need for a post-processing stage by performing clustering with large (of the order of 3 s) analysis windows, but this is an equally ad hoc way to address the expected scale of segment size, and does not address the issue of temporal closeness beyond
[Figure: Overview of our segmentation method.]

the coherence of an individual feature frame. We therefore introduce a prior probability distribution on the sizes of segments, and adjust the classification algorithms to incorporate this prior; this probability distribution explicitly encodes our assumptions regarding the segmentation.

This paper continues with a description of our method for generating labelled segmentations of audio in section 2; we present some empirical results in section 3, and draw conclusions in section 4.
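The pairwise-similarity-and-filter approach surveyed in the introduction can be sketched as follows. This is our illustration, not the method of any particular cited study: cosine similarity and simple neighbour averaging stand in for the various distance measures and filters, and the function names are ours.

```python
import numpy as np

def similarity_matrix(feats):
    """Cosine similarity between every pair of feature-vector frames."""
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    unit = feats / np.maximum(norms, 1e-12)
    return unit @ unit.T

def smooth_rows(sim, width=5):
    """A simple stand-in 'filter': average each row of the similarity
    matrix over its temporal neighbours to suppress frame-level noise."""
    n = sim.shape[0]
    out = np.empty_like(sim)
    for k in range(n):
        lo, hi = max(0, k - width), min(n, k + width + 1)
        out[k] = sim[lo:hi].mean(axis=0)
    return out

# 100 frames of 21-dimensional features (random stand-ins for real audio features)
feats = np.random.default_rng(1).standard_normal((100, 21))
smoothed = smooth_rows(similarity_matrix(feats))
```

Blocks of high similarity along the diagonal of such a matrix are what the cited filtering stages try to expose; the smoothing trades temporal resolution for robustness, which is exactly the ad hoc compromise the present paper replaces with an explicit prior.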
The processing chain begins with a uniformly sampled monophonic audio signal (from a single channel, as far as that is possible) and breaks it into a sequence of short overlapping fragments; our sample data was 16-bit mono at a 11.025 kHz sample rate, and we used a window size of 400 ms with a hop size of 200 ms over these audio samples to generate a constant-Q power spectrum with 1/12th-octave resolution. This power spectrum is log-normalized and 20 principal components are extracted, which along with the envelope magnitude form 21-dimensional feature vectors.
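The front end described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: a plain FFT power spectrum stands in for the constant-Q transform, PCA is done by SVD of the centred log spectra, an RMS envelope supplies the 21st dimension, and `extract_features` is our name.

```python
import numpy as np

def extract_features(signal, fs=11025, win_s=0.4, hop_s=0.2, n_pc=20):
    """Overlapping 400 ms windows at a 200 ms hop, log power spectra,
    PCA to n_pc components, plus an RMS envelope magnitude."""
    win, hop = int(win_s * fs), int(hop_s * fs)
    n_frames = 1 + (len(signal) - win) // hop
    frames = np.stack([signal[i * hop : i * hop + win] for i in range(n_frames)])
    # Log power spectrum per frame (FFT stands in for the constant-Q transform)
    spec = np.abs(np.fft.rfft(frames * np.hanning(win), axis=1)) ** 2
    logspec = np.log(spec + 1e-12)
    logspec -= logspec.mean(axis=0)          # centre before PCA
    # PCA via SVD: project onto the top n_pc right singular vectors
    _, _, vt = np.linalg.svd(logspec, full_matrices=False)
    pcs = logspec @ vt[:n_pc].T
    envelope = np.sqrt((frames ** 2).mean(axis=1, keepdims=True))  # RMS envelope
    return np.hstack([pcs, envelope])        # (n_frames, 21) feature matrix

# Ten seconds of noise at 11.025 kHz yields 49 frames of 21-dimensional features
feats = extract_features(np.random.default_rng(0).standard_normal(10 * 11025))
```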
Our segmentation algorithm models a segment as a sequence of samples of HMM state histograms drawn from class-specific probability distributions, with the boundaries of the segment being where the probability distribution changes; we can perform segmentation given a flat prior probability by performing deterministic annealing on an Expectation-Maximization optimization over frame label assignments and the class probability distributions. However, except for the case of overly large analysis windows (of the order of 3 seconds in [9]), this method with a uniform prior p(c) fails to find long segments corresponding to sections such as the verse or the chorus of songs; instead the segmentations generated correspond to changes in low-level audio features themselves.

In order to encode our interest in higher-level segments into the segmentation algorithm, we incorporate an explicit non-flat prior probability distribution p(c) on segmentations into the segmentation procedure. central
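The effect of a non-flat duration prior can be illustrated with a toy dynamic-programming segmenter. This is our sketch under stated assumptions, not the paper's annealed EM or Markov-chain Monte-Carlo procedure: frames are modelled as unit-variance Gaussians around a per-segment mean, and the prior on segment duration is a lognormal with hypothetical parameters `mean_len` and `sigma`.

```python
import numpy as np
from math import log

def segment(feats, mean_len=20.0, sigma=0.6, max_len=60):
    """Toy DP segmenter: per-segment Gaussian frame log-likelihood
    plus a lognormal log-prior on each segment's duration."""
    n = len(feats)

    def seg_score(i, j):
        x = feats[i:j]
        resid = x - x.mean(axis=0)            # fit each segment's own mean
        loglik = -0.5 * (resid ** 2).sum()    # unit-variance Gaussian frames
        # Lognormal duration prior: short segments are penalized heavily
        logprior = -0.5 * ((log(j - i) - log(mean_len)) / sigma) ** 2
        return loglik + logprior

    best = [0.0] + [-np.inf] * n              # best score for frames [0, j)
    back = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            s = best[i] + seg_score(i, j)
            if s > best[j]:
                best[j], back[j] = s, i
    bounds, j = [], n
    while j > 0:                              # trace the best boundary set back
        bounds.append(j)
        j = back[j]
    return sorted(bounds)

# Two regimes with clearly different means: a boundary should appear at frame 30
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0, 1, (30, 3)), rng.normal(5, 1, (30, 3))])
bounds = segment(feats)
```

With a flat prior (drop `logprior`) the optimizer happily fragments the signal at every noise fluctuation; the duration prior makes short segments expensive, which is the role p(c) plays in the method described above.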