Zespół Przetwarzania Sygnałów [DSP AGH] - Example of a scientific...

1
Data CORPORA Polish speech database (Grocholewski, 1997) 114 short sentences each spoken by 45 males, females and children 5130 utterances totally 163 274 cases of Polish phonemes Some part was hand segmented into phonemes The rest were segmented automatically Statistics of phoneme durations in Polish were computed and presented. The concept of applying them for sentence modelling was considered and evaluated. Around 37% of sentence ends can be detected by analysis of phoneme length with only around 2.5% rate of false detections because phonemes __in the sentence ends tend to be longer then other ones. This observation can be used in speech recognition for automatic insertation of dots and sentence modelling. Results Conclusions Length of Phonemes in a Context of Their Positions in Polish Sentences Magdalena Igras, Bartosz Ziółko, Mariusz Ziółko AGH University of Science and Technology Krak ów, Poland {migras,bziolko,ziolko}@agh.edu.pl www.dsp.agh.edu.pl The research was funded by the National Science Centre allocated on the basis of a decision DEC-2011/03/D/ST6/00914 Polish phonemes durations statistics Preboundary lenghtening The statistics were collected from MLF (Master Label Files) For each of 37 Polish phonemes classes, mean and standard __duration values of duration were computed Phonemes duration were normalized: Mean value of duration in i th phoneme class Duration of n th phoneme from i th phoneme class i m i n i n d d d , , , "*/jc1m1001.lab" 7900000 8550000 a 0 100000 sil 8600000 9600000 sz 150000 850000 l 9650000 10300000 o 900000 1500000 u 10350000 10900000 w 1550000 2200000 b 10950000 11650000 y 2250000 2950000 i 11700000 12750000 p 3000000 3950000 si 12800000 13350000 l 4000000 5350000 cz 13400000 15450000 a 5400000 6400000 a 15500000 17850000 s 6450000 7150000 r 17900000 17950000 sil 7200000 7850000 d . An example MLF file 0 50 100 150 200 250 0 50 100 150 200 250 300 350 400 0 50 100 150 200 250 300 0 100 200 300 400 500 600 700 2 2 ) 2 ln ( 2 2 1 ) ( x e x x f Histograms of durations [ms] of example phoneme classes Average duration of Polish phonemes CORPORA/SAMPA notation standard of the phonemes No - number of appearance in database μ average duration [ms] (in brackets divided by sum of all average durations) dev - standard deviations - ratio between deviation and average Example relative phoneme durations within a sentence (blue squares) with a dynamic average of the ratio (red line with standard deviations as column markers) For each position i in each sentence we computed the mean value (μ) and standard deviation (σ) of the ratios dynamically, as the mean of ratios from position i = 1 to the current i - th position. Then we verifed, for which sentences the current value of μ, μ + σ, μ + 2 σ and μ + 3 σ was exceeded by latest one, two, three or four phonemes in the sentence. We checked also for which sentence the thresholds were crossed before the end of the sentence. An experiment to evaluate possibility of applying the preboundary lengthening measure for punctuation detection No of occurences Distribution of duration ratios of all phonemes Distribution of duration ratios of the last phonemes in the sentences Mean:1.54 88.5% Statistical approach to the phenomenon of preboundary lenghtening Lognormal distribution of phonemes durations 0 50 100 150 200 250 300 350 0 50 100 150 200 The linguistic knowledge and statistic parameters are important part of speech technology applications. Phoneme durations could be used effectively in speech modelling for both, text-to-speech and speech-to-text systems. Parameter of phoneme length is also often used in prosodic models for systems of automatic detection of punctuation and segmentation into sentences or topics. It was observed that speakers usually tend to slow down their speech toward the ends of utterance units. The phenomenon was descirbed for other languages, e.g. English, Hungarian, Czech, Japanese. Ratios of phonemes durations Mean durations and their standard deviations of phonemes from the database /a/ /r/ /sz/ The weighted average of phoneme duration was 104 ms. The average durations vary from 74 ms to 181 ms.

Transcript of Zespół Przetwarzania Sygnałów [DSP AGH] - Example of a scientific...

Page 1: Zespół Przetwarzania Sygnałów [DSP AGH] - Example of a scientific posterdo_druku_sigmap_poster.pdf · 2013-08-03 · Data CORPORA – Polish speech database (Grocholewski, 1997)

Data

CORPORA – Polish speech database (Grocholewski, 1997) 114 short sentences each spoken by 45 males, females and children 5130 utterances totally 163 274 cases of Polish phonemes Some part was hand segmented into phonemes The rest were segmented automatically

Statistics of phoneme durations in Polish were computed and presented. The concept of applying them for sentence modelling was considered and evaluated. Around 37% of sentence ends can be detected by analysis of phoneme length with only around 2.5% rate of false detections because phonemes __in the sentence ends tend to be longer then other ones. This observation can be used in speech recognition for automatic insertation of dots and sentence modelling.

Results

Conclusions

Length of Phonemes in a Context of Their Positions in Polish Sentences

Magdalena Igras, Bartosz Ziółko, Mariusz Ziółko AGH University of Science and Technology

Kraków, Poland

{migras,bziolko,ziolko}@agh.edu.pl

www.dsp.agh.edu.pl

The research was funded by the National Science Centre allocated on the basis of a decision DEC-2011/03/D/ST6/00914

Polish phonemes durations statistics Preboundary lenghtening

The statistics were collected from MLF (Master Label Files) For each of 37 Polish phonemes classes, mean and standard __duration values of duration were computed Phonemes duration were normalized:

Mean value of duration in ith phoneme class

Duration of nth phoneme from ith phoneme class

im

inin

d

dd

,

,,

"*/jc1m1001.lab" 7900000 8550000 a

0 100000 sil 8600000 9600000 sz

150000 850000 l 9650000 10300000 o

900000 1500000 u 10350000 10900000 w

1550000 2200000 b 10950000 11650000 y

2250000 2950000 i 11700000 12750000 p

3000000 3950000 si 12800000 13350000 l

4000000 5350000 cz 13400000 15450000 a

5400000 6400000 a 15500000 17850000 s

6450000 7150000 r 17900000 17950000 sil

7200000 7850000 d .

An example MLF file

/a/ /r/

0 50 100 150 200 2500

50

100

150

200

250

300

350

400

0 50 100 150 200 250 3000

100

200

300

400

500

600

700

2

2)

2

ln(

22

1)(

x

ex

xf

Histograms of durations [ms] of example phoneme classes

Average duration of Polish phonemes

CORPORA/SAMPA – notation standard of the phonemes No - number of appearance in database μ –average duration [ms] (in brackets divided by sum of all average durations) dev - standard deviations - ratio between deviation and average

Example relative phoneme durations within a sentence (blue squares) with a dynamic average of the ratio (red line with standard deviations as column markers)

For each position i in each sentence we computed the mean value (μ) and standard deviation (σ) of the ratios dynamically, as the mean of ratios from position i = 1 to the current i - th position. Then we verifed, for which sentences the current value of μ, μ + σ, μ + 2 σ and μ + 3 σ was exceeded by latest one, two, three or four phonemes in the sentence. We checked also for which sentence the thresholds were crossed before the end of the sentence.

An experiment to evaluate possibility of applying the preboundary lengthening measure for punctuation detection

No

of

occ

ure

nce

s

Distribution of duration ratios of all phonemes

Distribution of duration ratios of the last phonemes in the sentences

Mean:1.54

88.5%

Statistical approach to the phenomenon of preboundary lenghtening Lognormal distribution of phonemes durations

0 50 100 150 200 250 300 3500

50

100

150

200

The linguistic knowledge and statistic parameters are important part of speech technology applications. Phoneme durations could be used effectively in speech modelling for both, text-to-speech and speech-to-text systems. Parameter of phoneme length is also often used in prosodic models for systems of automatic detection of punctuation and segmentation into sentences or topics. It was observed that speakers usually tend to slow down their speech toward the ends of utterance units. The phenomenon was descirbed for other languages, e.g. English, Hungarian, Czech, Japanese.

Rat

ios

of

ph

on

emes

du

rati

on

s

Mean durations and their standard deviations of phonemes from the database

/a/

/r/

/sz/

The weighted average of phoneme duration

was 104 ms.

The average durations vary from 74 ms to

181 ms.