MPEG-1 / MPEG-2 BC Audio - tu-ilmenau.de · Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de...

Post on 16-Oct-2018

217 views 0 download

Transcript of MPEG-1 / MPEG-2 BC Audio - tu-ilmenau.de · Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de...

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 1

MPEG-1 / MPEG-2 BC Audio

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 2

The Basic Paradigm of T/F Domain Audio Coding

Digital Audio Input

Filter Bank

Psychoacoustic Model

Bitstream Formatting

Signal to Mask Ratio

Quantized Samples

Encoded BitstreamBit or Noise

Allocation

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 3

MPEG Audio Standardization Philosophy (1)• Definition of a complete transmission chain

consists of specification of– Encoding algorithm– Bitstream format– Decoding algorithm

• ITU-T standardizes all three parts ⇒ Encoder output predictable

• MPEG standardizes only bitstream format anddecoder, not the encoder (“informative part”)

ITU-T Approach

MPEG Approach

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 4

MPEG Audio Standardization Philosophy (2)

• Motivation: open for further improvements, room for specific corporate know-how

• But: No sound quality guaranteed !

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 5

MPEG-1/2 Audio

• MPEG-1 Audio– Audio coding 32 - 48 kHz, mono/stereo– Layer 1, 2, 3– Layer-3 (aka .mp3) optimized for lower bit-

rates– Copy protection via SCMS included

• MPEG-2 Audio– Low sampling frequencies audio

add 16 - 24 kHz to Layer 1, 2, 3– Multichannel audio, BC

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 6

MPEG-1 Development History

IRT (MASCAM)

Philips

CCETT

Thomson Brandt (MSC)

AT&T (PXFM)

Fraunhofer-IIS UNI-Erl.(OCF)

MUSICAM

ASPEC

LAYER 1

LAYER 2

LAYER 3

Psychoacousticmodel 2 (optional)

simplified version

CNET

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 7

MPEG-1 Audio• Developed Dec. 88 to Nov. 92• Coding of mono and stereo signals• Bitrates from 32 kbit/s to 448 kbit/s• Three "Layers":

– Layer 1: lowest complexity– Layer 2: increased complexity and quality– Layer 3: highest complexity and quality at

low bit-rates• Target bitrates 384 kbits/s, 256 kbit/s,

< 192 kbit/s for Layers 1, 2, 3 respectively• Sampling frequencies supported:

– 48 kHz, 44.1 kHz, 32 kHz

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 8

The main building blocks• Perceptual model

– using psychoacoustics, mostly proprietary

• Filter bank– subdividing the input signal into spectral

components– more lines more coding gain– longer impulse response pre-echo artifacts

• Quantization and coding– this is the step introducing quantization noise– spectral shape of quantization noise

determines the audibility

– can be designed to leave encoding methods optional

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 9

MPEG Audio - Short Description of the Layers (1)

• Layer I– Frame length: 384 samples (8 ms@ 48 kHz)– Frequency resolution: 32 subbands– Quantization: Block-companding (12 samples),

amplitude of subband samples indicated by “scalefactors” (SCF); 2 dB resolution

• Layer II– Frame length: 1152 samples (24 ms@ 48 kHz)– Frequency resolution: 32 subbands– Quantization: Block-companding (12 samples)– Use of Scalefactor select information

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 10

MPEG Audio - Short Description of the Layers (2)

• Layer III– Standard frame length: 1152 samples (24 ms @

48 kHz)– Frequency resolution: 576/192 subbands– Quantization: non-uniform with Huffman coding– Use of Scalefactor Select Information

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 11

MPEG - Layer-1, -2 and -3 Compression: Header

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 12

MPEG-1 Layer 1

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 13

Block Diagram MPEG-1 Layer 1

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 14

MPEG Audio - Layer-1

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 15

MPEG Audio - Layer-1 (2)

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 16

MPEG Audio Layer-1 Signal-to-Noise Ratios

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 17

MPEG-1 Audio Layer 1 Bitstream Syntax

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 18

MPEG-1 Layer 2

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 19

MPEG Audio - Layer-2 (1)

• Processing of 1152 sample long frames• 32 Sub-bands used; grouping in 3 granules of

12 subband samples• Layer II additionally offers coding of bit

allocation, scalefactors and samples• The number of necessary scalefactors can vary• Use of different windows• Theoretically Minimum of the Coder-/Decoder-

Delays around 35 ms

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 20

MPEG-1 Audio Layer 2 Bitstream Syntax

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 21

MPEG-1 Audio Layer 2 Bitstream Syntax (2)

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 22

MPEG-1 Layer 3

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 23

MPEG Audio – Layer 3• MP3 is the nickname of the ISO/IEC-Standard

for audio compression described by ISO/IEC IS 11172-3 (MPEG-1 Layer-3) and 13818-3 (MPEG-2 Layer-3)

• The benefits of MP3-formats is that it is a headerless file format, which means that it is not necessary to have the header to play the music

• Allows MP3 streaming• Theoretic minimum delay of the Coder/Decoder

is around 59 ms

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 24

Block diagram Layer-332 MDCT‘s

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 25

MPEG Layer-3• Same basic configuration as Layer-2:

– Frame length 24 ms at 48 kHz– Polyphase filter bank

• Specifics of Layer-3:– Hybrid filter bank (32*18 = 576 subbands or

32*6=192 subbands)– nonuniform quantization (implicit noise

shaping) with a power law ( ^.75)

– Huffman coding– Analysis-by-Synthesis structure– Bitreservoir (short time buffer)– Support for variable bitrate (mandatory)

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 26

Psychoacoustic mode

Block Size

Hybrid Filter bank

MPEG-1 Layer 3

• often used: “Psychoacoustic model 2”

• 1 Frame = 1152 samples = 36 samples per Polyphase filter bank subband

• Additional Modified Discrete Cosine Transform (MDCT), and aliasing reduction stage

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 27

Hybrid Filter Bank & Aliasing (1)

e.g. 32kHz e.g. ≈32Hz32 bands

6-18 bands

32*18=576 bands

• Filter bank is critically sampled• Problem of aliasing in the analysis filter bank

Aliasing in the subbands

32 bands6 or 18

pre-echo avoidance

better compression

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 28

Hybrid Filter Bank & Aliasing (2)

Nyquist frequencies/channel boundaries

ky

signal mirrored inside

3dB

aliasing cannot be neglected

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 29

Hybrid Filter Bank & Aliasing (3)

• Mirroring of the original signals greater than the nyquist frequency

– occurs on downsampling

Nyquist frequency pf

mirrored original

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 30

Problem of Aliasing in a Cascaded Filter Bank (1)• Result of cascading:

– Far off frequencies are aliased into the SB of the 2nd filter bank

Nyquist frequency

attenuation ofthe filter

subband of 1st FB

subbands of 2nd FB

signal aliased signal

first stagefilter bank

Observe:Aliasing spreadsover severalsubbands ofsecond stage!Not the case for only singlestage FB (practically)

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 31

Problem of Aliasing in a Cascaded Filter Bank (2)• Frequency response contains peaks from the

aliasing:

• Signal in many subbands• Must be coded in many subband

– Worse coding efficiency

Reduced by alias reduction

passband

aliasing aliasing

not just onepassband, but several, onlyslightly attenuated

peaks continue

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 32

32bands

6/18bands

6/18bands

--

n

Aliasing Reduction Structure (MP3)similar approach as synthesis FBuses to reduce/cancel aliasing

(-1)n (1,-1,1,-1,…),high / low frequenciesare mirrored everysecond subband

suitable factors (fixed),obtained from frequency response of filters, for equal level of aliasing

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 33

Problem of Aliasing in a Cascaded Filter Bank (3)

• Solution:– Less downsampling in first stage (non-critical

sampling)– better filter– Aliasing reduction with subtraction from

neighboring bands (e.g.MP3)– no cascaded filter bank (e.g. AAC)

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 34

Huffman Coding Details• Vector coding

– One codeword for 2 or 4 subbands

• Escape sequence– Large words are coded as sum of

ESC-pointer + difference

• Adaptive table selection (signal table number)– Table selection according to maximum– Table selection according to local statistics

• Adaptive sectioning (4 sections)– Each section defines Huffman table

• Coding efficiency approaches theoretical limits !

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 35

Layer-3 Iteration Loops

Rate/DistortionControl Process

ScaleFactors

Quantizer

NoiselessCoding

Iteration Loops

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 36

Possibilities to control the Quantization

• Two basic ways– Bit Allocation– Noise Allocation

• Detailed possibilities:– Bit allocation between fixed „Worst-Case“ and

„Maximum-SNR“ situations– Bit allocation is calculated from the Threshold

Estimation– Direct calculation of the allowed noise (Noise

Allocation)– Simplified „Noise Allocation“

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 37

Layer-3 : Outer Loop• Distortion Loop (control of the distortion)

• Saves the unquantized spectral values

• Compares the reconstructed values with the original

• Builds the actual distortion in the frequency domain

• Scaling by frequency groups with the amount of distortion

• Convergence of the iteration is not guaranteed

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 38

Layer-3 : Inner Loop• Rate Loop (Data rate controller)

• Entropy coding: Data rate depends on actual data set

• Buffer Control: Controls the necessary bits

• Convergence through iterations: is always possible

• Beginning level: Calculated from SFM (Spectral Flatness Measure)

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 39

MPEG Audio - Layer-3: Bitstream• Organization of the bit streams• Fixed length of bytes: 17 at mono, 32 at

stereo, independent of the bitrate

• Constant Section– Header (ISO Standard, like with Layer-1 and -2)– Additional information for a frame (e.g. Pointer to the

variable section)– Additional information for each granule (e.g. Number

of the Huffman-Code table)• Variable Section

– Scalefactors– Huffman-coded frequency lines– Additional Data

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 40

MPEG Audio - Layer-3: Bitstream (2)

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 41

MPEG Audio - Layer-3: Bitstream (3)

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 42

MPEG Audio - Layer-3: Bitstream (4)

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 43

MPEG-1 Audio Decoder

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 44

MPEG Audio – General Decoder Structure

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 45

MPEG - Audio Decoder Process (1) Layer-1 and -2 Decoder flow chart

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 46

MPEG - Audio Decoder Process (2) Layer-3 Decoder flow chart

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 47

MPEG - Audio Decoder Process (3) Synthesis of the subband filter

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 48

MPEG - Audio Decoder Process (4) Layer-3 Decoder Diagramm

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 49

Stereo Coding in MPEG-1/-2

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 50

MPEG Audio - Layer-1 and Layer-2

• For better stereo performance, Intensity Stereo (IS) can be applied

• With IS-Coding, a mono signal is transmitted in higher frequency bands and the decoder places it close to the original stereo position

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 51

MPEG Audio - Layer-3 (3)

• "Joint Stereo"-Coding to additionally improve compression: Mid/Side coding (M/S) and Intensity Stereo (IS)

• M/S coding: two channels are coded where left is actually the sum of the original left and right and the right channel is the difference

• Either both channels are separately coded ("stereo"- mode) or "Intensity Stereo"-Coding is used

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 52

MPEG-2 Audio

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 53

MPEG-2 Audio LSR extensions:• Run coder at half the sampling rate• Only minor changes required (e.g. tables)⇒ Increased frequency resolution⇒Less bitrate for side information⇒Better coding efficiency⇒Only one half of the workload required

(SW implementations!)• But: Transparent coding not possible due

due restriction in bandwidth (max. 12 kHz)• All current MP3 decoders support this mode

Technical Background

Implications

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 54

• Supports 3/2+1 multi-channel sound (ITU-R): Left, right, center, left surround, right surround

• Low frequency enhancement channel (LFE)

• “Downward compatibility”: Also 3/1, 3/0, 2/2, 2/1, 1/0 supported

• Multi-Lingual capabilityUp to 7 channels e.g. for different languages,commentary channel, “clean dialog” etc.

Sound Formats

L C R

LS RS

MPEG-2 MC Features

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 55

MPEG - 2 Audio MultichannelStructure of the ISO 13818- 3 Layer II multichannel extension, backwards compatible with ISO 11172- 3 Layer II

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 56

Annex: Abbreviations and Companies

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 57

Abbreviations and Companies (1)• AAC: Advanced Audio Coding• ASPEC: Adaptive Spectral Perceptual

Entropy Coding

• AT&T:American Telephone and Telegraph Company

• CCETT: Centre Commun d’Etudes de Télédiffusion et Télécommunication

• CNET: Research and Development Center of France Télécom

• FhG-IIS: Fraunhofer Gesellschaft/Institut fürIntegrierte Schaltungen (Erlangen)

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 58

Abbreviations and Companies (2)• IRT: Institut für Rundfunktechnik GmbH,

München, Research and Development Instituteof ARD, ZDF, DLR, ORF and SRG

• ITU-R: International Telecommunication Union – Radio Communication Sector

• MASCAM: Masking-pattern Adapted SubbandCoding and Multiplexing AT&T:AmericanTelephone and Telegraph Company

• MUSICAM: Masking-pattern Universal Subband Integrated Coding and Multiplexing

Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de Page 59

Abbreviations and Companies (3)

• NTT: Nippon Telegraph and Telephone Corp./Human Interface Laboratories

• Thomson: Thomson, Telefunken, Saba, RCA, GE, ProScan

• TwinVQ: Transform-domain Weighted Interleave Vector Quantization