Zespół Przetwarzania Sygnałów [DSP AGH] - Example of a scientific...
Transcript of Zespół Przetwarzania Sygnałów [DSP AGH] - Example of a scientific...
Data
CORPORA – Polish speech database (Grocholewski, 1997) 114 short sentences each spoken by 45 males, females and children 5130 utterances totally 163 274 cases of Polish phonemes Some part was hand segmented into phonemes The rest were segmented automatically
Statistics of phoneme durations in Polish were computed and presented. The concept of applying them for sentence modelling was considered and evaluated. Around 37% of sentence ends can be detected by analysis of phoneme length with only around 2.5% rate of false detections because phonemes __in the sentence ends tend to be longer then other ones. This observation can be used in speech recognition for automatic insertation of dots and sentence modelling.
Results
Conclusions
Length of Phonemes in a Context of Their Positions in Polish Sentences
Magdalena Igras, Bartosz Ziółko, Mariusz Ziółko AGH University of Science and Technology
Kraków, Poland
{migras,bziolko,ziolko}@agh.edu.pl
www.dsp.agh.edu.pl
The research was funded by the National Science Centre allocated on the basis of a decision DEC-2011/03/D/ST6/00914
Polish phonemes durations statistics Preboundary lenghtening
The statistics were collected from MLF (Master Label Files) For each of 37 Polish phonemes classes, mean and standard __duration values of duration were computed Phonemes duration were normalized:
Mean value of duration in ith phoneme class
Duration of nth phoneme from ith phoneme class
im
inin
d
dd
,
,,
"*/jc1m1001.lab" 7900000 8550000 a
0 100000 sil 8600000 9600000 sz
150000 850000 l 9650000 10300000 o
900000 1500000 u 10350000 10900000 w
1550000 2200000 b 10950000 11650000 y
2250000 2950000 i 11700000 12750000 p
3000000 3950000 si 12800000 13350000 l
4000000 5350000 cz 13400000 15450000 a
5400000 6400000 a 15500000 17850000 s
6450000 7150000 r 17900000 17950000 sil
7200000 7850000 d .
An example MLF file
/a/ /r/
0 50 100 150 200 2500
50
100
150
200
250
300
350
400
0 50 100 150 200 250 3000
100
200
300
400
500
600
700
2
2)
2
ln(
22
1)(
x
ex
xf
Histograms of durations [ms] of example phoneme classes
Average duration of Polish phonemes
CORPORA/SAMPA – notation standard of the phonemes No - number of appearance in database μ –average duration [ms] (in brackets divided by sum of all average durations) dev - standard deviations - ratio between deviation and average
Example relative phoneme durations within a sentence (blue squares) with a dynamic average of the ratio (red line with standard deviations as column markers)
For each position i in each sentence we computed the mean value (μ) and standard deviation (σ) of the ratios dynamically, as the mean of ratios from position i = 1 to the current i - th position. Then we verifed, for which sentences the current value of μ, μ + σ, μ + 2 σ and μ + 3 σ was exceeded by latest one, two, three or four phonemes in the sentence. We checked also for which sentence the thresholds were crossed before the end of the sentence.
An experiment to evaluate possibility of applying the preboundary lengthening measure for punctuation detection
No
of
occ
ure
nce
s
Distribution of duration ratios of all phonemes
Distribution of duration ratios of the last phonemes in the sentences
Mean:1.54
88.5%
Statistical approach to the phenomenon of preboundary lenghtening Lognormal distribution of phonemes durations
0 50 100 150 200 250 300 3500
50
100
150
200
The linguistic knowledge and statistic parameters are important part of speech technology applications. Phoneme durations could be used effectively in speech modelling for both, text-to-speech and speech-to-text systems. Parameter of phoneme length is also often used in prosodic models for systems of automatic detection of punctuation and segmentation into sentences or topics. It was observed that speakers usually tend to slow down their speech toward the ends of utterance units. The phenomenon was descirbed for other languages, e.g. English, Hungarian, Czech, Japanese.
Rat
ios
of
ph
on
emes
du
rati
on
s
Mean durations and their standard deviations of phonemes from the database
/a/
/r/
/sz/
The weighted average of phoneme duration
was 104 ms.
The average durations vary from 74 ms to
181 ms.