
A technical discussion on the mp3 format (not the illegal download debate)



  • Members

So can anyone tell me how I can explain this to a bunch of Sound Design students of mine in layman's terms?

 

To start my proposed lecture, I'll tell them that we commonly use PCM WAV as our recording format, which lets us explain how a digital waveform is formed. A CD has a 44.1 kHz sampling rate and a bit resolution of 16 bits. That means the digital signal is composed of discrete samples, 44,100 of them per second. As far as bit depth is concerned, each sample can take one of 2^16 (65,536) different positions (or volume levels), from -32768 through 0 to 32767.
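For anyone who wants to show this on screen, here is a minimal Python sketch of the sampling and quantizing described above. The 440 Hz test tone and the helper names `quantize` and `sine_samples` are made up for illustration; they are not part of any audio library.

```python
import math

SAMPLE_RATE = 44_100   # samples per second (CD audio)
BIT_DEPTH = 16         # bits per sample (CD audio)

def quantize(value, bit_depth=BIT_DEPTH):
    """Map a float in [-1.0, 1.0] to the nearest signed integer level."""
    max_level = 2 ** (bit_depth - 1) - 1      # 32767 for 16-bit
    min_level = -(2 ** (bit_depth - 1))       # -32768 for 16-bit
    level = round(value * max_level)
    return max(min_level, min(max_level, level))

def sine_samples(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Return the quantized samples of a full-scale sine tone."""
    count = int(sample_rate * duration_s)
    return [quantize(math.sin(2 * math.pi * freq_hz * n / sample_rate))
            for n in range(count)]

# One second of a 440 Hz tone (the frequency is an arbitrary choice).
one_second = sine_samples(440.0, 1.0)
print(len(one_second))                    # how many samples one second holds
print(2 ** BIT_DEPTH)                     # how many distinct levels 16 bits give
print(min(one_second), max(one_second))   # range this particular tone uses
```

Each entry in `one_second` is one discrete sample, and every value falls inside the 16-bit range.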

 

I did lecture on amplitude, wavelength, and all that other physics stuff before.

 

But now, they want me to explain the mp3 format.

 

I am no computer programmer, but the way I understand it, mp3s have two basic parameters: bitrate and sampling frequency. Why does mp3 sound very close to CD quality despite its small size? Is the idea of mp3 to take the collection of all the digital samples and pick out the ones that represent the loudest frequency among the frequencies co-existing at a particular time? I checked out www.howstuffworks.com but found it inadequate, and I still have a problem comprehending.

 

 

mp3-waves.gif

 

Can anyone help?

 


  • Members

I'm not sure exactly how it works, but my understanding is similar to the pic you posted. Think of it kinda as a frequency gate. If there is a bunch of activity (volume) around 1 kHz, the highs and lows are attenuated. Part of the loss is hidden by the masking of the louder frequency response of the mids, but as you compress more and more you will begin to hear bits drop as more of the masked frequencies are attenuated.


  • Members

From articles I've read in the past, my understanding is that the mp3 encoding performs 2 main operations:

1-It will compress data

2-It will remove data that seems to be less useful

 

Operation 1: Compress Data

--------------------------

Compressing data is the easier operation. It can be compared to zipping a file so it gets smaller. There is no loss of data in this process.

 

Here is a naive way to look at compression. Let's say you have a text file you want to compress. In the text file, the word "recording" appears 20 times. The encoder will replace "recording" with the character % and write a note in the encoded file that % means "recording". Since the word is repeated often, the size of the file is gonna be reduced. It is much more complex than that in reality, but the analogy helps to understand.
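The analogy above can be tried out directly. This is only a toy sketch of the substitution idea, not how zip or mp3 actually work; the function names `encode` and `decode` are invented for the example.

```python
def encode(text, word, token="%"):
    """Replace every occurrence of `word` with a one-character token."""
    if token in text:
        raise ValueError("token already appears in the text")
    # The 'note' that tells the decoder what the token stands for
    # is stored alongside the shortened body.
    return {"token": token, "word": word, "body": text.replace(word, token)}

def decode(packed):
    """Put the original word back wherever the token appears."""
    return packed["body"].replace(packed["token"], packed["word"])

original = "recording " * 20
packed = encode(original, "recording")

# Size of what must be stored: the shortened body plus the note.
stored_size = len(packed["body"]) + len(packed["token"]) + len(packed["word"])
print(len(original), stored_size)   # prints "200 50"

# Nothing was lost: decoding gives back exactly the original text.
assert decode(packed) == original
```

The text shrinks to a quarter of its size and comes back bit for bit, which is what "no loss of data" means here.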

 

And since audio wave files have specific characteristics in common, there are some optimal ways to compress their data. But that alone is not enough to get a 12:1 ratio on a wave file.

 

Operation 2: Remove useless data

--------------------------------

In audio data, there is a lot of stuff going on, and in a lot of different frequencies. But due to human hearing limitations, not all of it can actually be heard, although all this information is kept in the wave file. Unlike the compression process previously discussed, there is some loss of audio data in this operation. From here, it will get a little more complex.

 

The mp3 encoder splits the audio track into a number of frequency bands, often 32 bands. At each small interval of time in the song, it will check the loudness of these individual bands and also their relevance compared to the other ones. If there is something loud happening in some frequency bands, the encoder is gonna keep more bits of data for these. The other bands are gonna be allowed far fewer bits, since their information is less relevant.
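To make the "more bits for the loud bands" idea concrete, here is a rough Python sketch. It is a toy under stated assumptions, not the real mp3 algorithm: it uses 4 bands instead of 32, the loudness values are made up, the function `allocate_bits` is hypothetical, and a real encoder decides relevance with a psychoacoustic model rather than raw loudness. The only borrowed fact is the rule of thumb that each extra bit buys roughly 6 dB of precision.

```python
def allocate_bits(band_levels_db, total_bits, db_per_bit=6.0):
    """Hand out bits one at a time to whichever band currently needs them most."""
    bits = [0] * len(band_levels_db)
    # 'need' starts as the band's loudness; every bit granted to a band
    # lowers its remaining need by about 6 dB.
    need = list(band_levels_db)
    for _ in range(total_bits):
        neediest = max(range(len(need)), key=lambda i: need[i])
        bits[neediest] += 1
        need[neediest] -= db_per_bit
    return bits

# Made-up loudness per band in dB: two loud bands, two quiet ones.
levels = [60.0, 20.0, 55.0, 10.0]
print(allocate_bits(levels, 16))   # prints [8, 1, 7, 0]
```

The two loud bands take almost the whole budget, while the quietest band gets nothing at all for this interval.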

 

So here is another cheap analogy. If you encode a wave recording of footsteps during a thunderstorm, the footsteps will sound worse whenever lightning strikes with a loud noise, since the footsteps are a much less relevant sound at that moment. But the footsteps are gonna sound just fine when there is no lightning strike.
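Why do the footsteps sound worse when they get fewer bits? A small Python sketch, assuming a single made-up sample value and a hypothetical `quantize` helper, shows the rounding error growing as the bit count drops. Real mp3 encoders quantize frequency-domain values rather than raw samples, so treat this only as an illustration of the principle.

```python
def quantize(value, bits):
    """Round a float in [-1.0, 1.0] to one of the levels available with `bits` bits."""
    steps = 2 ** (bits - 1) - 1
    return round(value * steps) / steps

footstep = 0.1234   # made-up value of one quiet sample

# Store the same quiet sample with fewer and fewer bits.
errors = {}
for bits in (16, 8, 4):
    errors[bits] = abs(quantize(footstep, bits) - footstep)
    print(bits, errors[bits])
```

The error gets larger at each step down. While the thunder is loud that extra roughness is hidden, which is the trade the encoder is betting on.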

 

This is a complete oversimplification, but I thought it could help you or anybody else.

 

CrazyEdo


  • Members

 

Originally posted by wbcsound

I'm not sure exactly how it works, but my understanding is similar to the pic you posted. Think of it kinda as a frequency gate. If there is a bunch of activity (volume) around 1 kHz, the highs and lows are attenuated. Part of the loss is hidden by the masking of the louder frequency response of the mids, but as you compress more and more you will begin to hear bits drop as more of the masked frequencies are attenuated.

 

 

OK, so where does the difference in bitrates come in and why do higher bitrates sound more CD-like than the lower bitrates?


  • Members

 

Originally posted by skunky_funk



OK, so where does the difference in bitrates come in and why do higher bitrates sound more CD-like than the lower bitrates?

 

 

 

sample frequency (44,100 Hz) * bit depth (16) * 2 channels (stereo) = bits per second (1,411,200 for CD quality)

 

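Reducing the bitrate means the encoder has fewer bits per second to describe the audio, so it has to compress harder. An mp3 encoder does this mainly by storing the frequency content more coarsely, and at very low bitrates it may also lower the sample frequency, rather than by simply cutting the bit depth.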
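The CD figure can be checked with a few lines of Python, and the same arithmetic shows how much smaller an mp3 stream is. The mp3 bitrates 128, 192 and 320 kbit/s are just common settings picked for illustration, and `pcm_bitrate` is a made-up helper name.

```python
def pcm_bitrate(sample_rate_hz, bit_depth, channels):
    """Bits per second of uncompressed PCM audio."""
    return sample_rate_hz * bit_depth * channels

cd = pcm_bitrate(44_100, 16, 2)
print(cd)   # prints 1411200

# Compare against a few common mp3 bitrates (illustrative choices).
ratios = {}
for mp3_kbps in (128, 192, 320):
    ratios[mp3_kbps] = cd / (mp3_kbps * 1000)
    print(f"{mp3_kbps} kbit/s is about {ratios[mp3_kbps]:.1f} times smaller than CD")
```

The lower the bitrate, the bigger the ratio, which means the encoder has to throw more away. That is why the higher bitrates sound more CD-like.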

 

For an in-depth definition, check out this site


