
A cool article on the importance of methodology when conducting double-blind listening tests...



A TV channel often does food tests, cheap vs. mid-price vs. expensive, and compares them to finally make a statement about whether paying more is really worth it.

 

One time they compared fish sticks.

They did a quality comparison of the ingredients, a chemical lab analysis, and other scientific stuff.

Finally, they had a family of five do a blind taste test to see which ones they liked most.

 

Then the surprising result came out: while all the scientific tests showed the most expensive fish sticks to be the best, the test family liked the cheapest ones best.

 

Was it really a surprise? No: the test family had always bought and eaten the cheap brand being tested, so they were used to it, and for them that was how fish sticks were supposed to taste.

They had never experienced the "better" quality; they only knew it didn't taste like what they were used to, so they judged it worse.

 

Such blind testing is very, very subjective to the people doing the testing, no matter if it is food tasting or listening, and therefore it is almost impossible to get right.


 

Such blind testing is very, very subjective to the people doing the testing, no matter if it is food tasting or listening, and therefore it is almost impossible to get right.

 

 

I think I see your point, but double-blind listening tests are not always a matter of preferences and "getting it right" so much as they are about the ability to differentiate between the items under consideration and pick one from the other consistently. Can you hear a difference whenever it's present? If so, you should be able to identify it, whether you "like" the sound or not, just as someone without color blindness can differentiate red from blue regardless of which color they prefer.

 

If the test isn't set up properly, it's very easy to introduce opportunities for biases to creep into the experiment, or to build in other problems that can skew the results one way or the other. Audio tastes don't always matter in such tests, although experience and knowing how to listen (and what to listen for) certainly do. I may like Example A better than Example B, but that's not really the question. The question is whether I can consistently identify Examples A and B when they're presented to me, even if I have no way of knowing which is which other than by the sound.
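To put a number on "consistently": a common way to score this kind of test is to count correct identifications over repeated trials and ask how likely that count would be by pure guessing. Here's a minimal sketch (my own illustration, not anything from the article):

```python
# Score an ABX-style run: how likely is it that `correct` answers
# out of `trials` came from coin-flipping (one-sided binomial,
# chance = 0.5)? A small value means the listener is probably
# really hearing a difference, whatever their preference.
from math import comb

def guessing_probability(correct: int, trials: int) -> float:
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2**trials

print(guessing_probability(12, 16))  # ~0.038 for 12 of 16 correct
```

By the usual convention, a guessing probability below about 5% is taken as evidence that the listener can genuinely tell the two apart.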

 

I used to love to take the "Pepsi challenge" whenever I'd go to the county fair. Even though they'd try to get you to pick Pepsi over Coke as the one you preferred, I knew very well how both soft drinks tasted, and would consistently identify them. It really doesn't matter which one I prefer - I can tell which is which by taste alone.

 

For the record, I prefer Coke. YMMV.


 

 

I used to love to take the "Pepsi challenge" whenever I'd go to the county fair. Even though they'd try to get you to pick Pepsi over Coke as the one you preferred, I knew very well how both soft drinks tasted, and would consistently identify them. It really doesn't matter which one I prefer - I can tell which is which by taste alone.

 

For the record, I prefer Coke. YMMV.

 

The Pepsi challenge was a flawed test from the beginning, and it led to one of the biggest business blunders in American history when the Coca-Cola Company decided to change its 100-year-old formula. And we all know what happened next.

 

The thing they were not taking into consideration is that, when presented with two blind taste-test samples, people tend to choose the sweeter one over the less sweet one. Pepsi tastes sweeter than Coke, yet Coke has always outsold Pepsi.

 

I think people have the same reaction when comparing compressed audio with uncompressed audio. When you slap a compressor or limiter over the mix buss, it can sound bigger and louder. Many people who are not experienced in audio production will choose the loud version over the uncompressed version. But that bigger and louder sound can become fatiguing over time.

 

I use the New Coke fiasco whenever I try to explain the loudness wars to people. I tell them to take the compressed version and an uncompressed version and listen over time. Then, after you've lived with them for a while, make your decision. A lot of good music has been ruined because people unfamiliar with how humans perceive sound samples will make the wrong choice on the spur of the moment.
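One way to keep "louder" from winning by default in that kind of comparison is to level-match the two versions before listening. A rough sketch of the idea, assuming numpy and two equal-length mono arrays:

```python
# Level-match two versions of a mix so that average loudness
# can't decide the comparison; only the character of the
# compression is left to judge. This is a simple RMS match;
# proper loudness normalization (e.g., LUFS) would be closer
# to how the ear weighs level.
import numpy as np

def match_rms(x: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """Scale x so its RMS level equals that of ref."""
    rms = lambda s: np.sqrt(np.mean(np.square(s)))
    return x * (rms(ref) / rms(x))

# compressed = match_rms(compressed, uncompressed)  # then A/B them
```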

 

For the record, I actually liked New Coke. :lol:


 

If the test isn't set up properly, it's very easy to introduce opportunities for biases to creep into the experiment, or to build in other problems that can skew the results one way or the other. Audio tastes don't always matter in such tests, although experience and knowing how to listen (and what to listen for) certainly do.

For the record, I prefer Coke. YMMV.

 

I also get your point, but isn't having the experience and/or knowledge to be good at a test also a bias?

 

On one hand you have scientific facts, which can be measured and proven;

on the other you have feelings and impressions, which depend on experience and knowledge, and which are not absolute and are hard to compare.

 

 


A/B comparisons in audio are tough because people hear sounds on many different planes.

 

Audio engineers and musicians in general "usually" have a multifaceted ability to focus their hearing selectively. For example, a guitarist develops the ability to hear his own instrument and block all the others out. This ability takes a good deal of ear training; many of the things heard can be shadowed by other sounds, yet the mind draws in the dotted lines behind those sounds.

 

Engineers may be musicians who have developed this same kind of selective focus, but it's not a requirement. In fact, I believe it can be a restrictive skill, preventing them from hearing the mix as a whole rather than as a bunch of individual parts.

 

I do believe the better engineers have an added skill above normal listeners and many musicians. Because an engineer spends so much time manipulating sounds with various effects, he often develops the ability to do what I call hearing what's between the notes. This can be distortion, dissonant harmonics, noise, reflections, pitch variations, digital artifacts, etc. Many of these things are what prevent the individual parts from being as transparent within a mix as they could or should be.

 

A normal listener may indeed notice a difference between a mix that is loaded with these negatives and one that isn't, but they may not know why one sounds better. Others may not notice any difference at all. Their minds may be so clouded with thoughts that hearing clearly isn't possible, or they may simply block it out, as they would when hearing music on a lo-fi playback system.

 

Musicians can usually hear more of a difference here because they become highly familiar with their playback systems and develop a better ear for small details. They may know what affects an instrument's tonal qualities, but again, the ability to decipher the exact causes of good or poor quality may not be there, basically from a lack of experience dealing with the causes of those issues.

 

This all comes back to the ability to hear differences in an A/B comparison. An engineer is likely to have better monitors, which bring out details. But even when these details are small enough to be easily overlooked, it still comes down to one major fact.

 

Bad audio is rarely caused by a single issue. Most of the time it's a collection of many smaller issues that add up to a bigger problem. The inverse is true for good audio: a collection of small improvements can equal one big improvement. We may not even hear an improvement very well on its own, but it may make a difference further downstream when heavier processing is involved. Upsampling a wave file before applying audio effects is an example: distortion products that would fold back into the audible band at the lower sample rate instead land above it and are removed when downsampling, so you get fewer noticeable artifacts than you would at the lower rate.
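To make that example concrete, here's a rough sketch of the oversampling idea, assuming numpy/scipy are available; the tanh() curve is just a stand-in for any nonlinear effect:

```python
# Distort a tone at 4x the sample rate vs. at the base rate.
import numpy as np
from scipy.signal import resample_poly

fs = 48000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 5000 * t)     # 5 kHz test tone

drive = lambda s: np.tanh(4 * s)     # stand-in for a nonlinear effect

y_naive = drive(x)                                          # process at 48 kHz
y_os = resample_poly(drive(resample_poly(x, 4, 1)), 1, 4)   # process at 192 kHz

# In y_naive, harmonics of the tanh curve that land above 24 kHz
# fold (alias) back into the audible band. In y_os they were
# generated at 192 kHz and removed by the decimation filter instead.
```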

 

Again, this is why A/B comparisons are one of the tougher things to quantify. When it comes to digital, you aren't dealing with analog waves; it's all about crunching numbers. If you do hear a difference, then obviously there's a good deal of improvement, but not being able to hear the smaller details does "not" immediately disqualify something's ability to provide benefits. It may simply be subtle enough to be one part of a collective improvement.

 

Of course, it can be snake oil too. You'd want some way of determining whether the improvement is real. If you can see the difference on an audio analyzer, for example, and verify that the improvement actually exists, then it can be useful even if it's too small to be audibly apparent to the ears.
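One concrete form of that verification is a null test: subtract one version from the other and measure what's left. A minimal sketch, assuming two sample-aligned, gain-matched numpy arrays:

```python
# Null test: the level of (a - b) relative to a, in dB. A residual
# far below the program level means the difference is real but may
# be too small to hear. Real recordings usually need time alignment
# and gain matching before this number means anything.
import numpy as np

def residual_db(a: np.ndarray, b: np.ndarray) -> float:
    rms = lambda s: np.sqrt(np.mean(np.square(s)))
    return 20 * np.log10(rms(a - b) / rms(a))

# Synthetic example: b is a with a tiny amount of noise added.
a = np.sin(2 * np.pi * 1000 * np.arange(48000) / 48000)
b = a + 1e-4 * np.random.randn(a.size)
print(f"{residual_db(a, b):.1f} dB")  # around -77 dB
```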


I have rarely been so horrified at the state of humanity as when I discovered there is a subset of the audiophile community that insists that blind testing makes a comparison of audio components invalid.

 

Is there really? :philpalm:

 

Again, if the tests aren't properly set up and conducted there are definitely opportunities for errors and problems (as the article in the OP points out), but to dismiss such tests out of hand - even when properly conducted - seems to be pretty closed-minded.

 

What is their "logic" or reasoning that they use in support of that position?

What is their "logic" or reasoning that they use in support of that position?

 

It was pretty Alice in Wonderland. Something to the effect that if you don't know what to listen for, you could miss it. Basically, the reason why you need a double-blind test was the reason they were opposed to it. I think it really boiled down to the fact that their sacred cows (probably magic cables or something) were proved inaudible, and they *knew* they made a huge difference in *their* experience; therefore the test must be wrong.


"We know our ears/brains have very short term memory. The same can be said for our eyes. They tend to fill in some blanks for us. The only way an ABX test would truly work, is with a switch box and short passages of a song. Flip the switch back and forth after maybe a 15-20 sec sample. Then our brain and ears don't have time to forget. Listening to a whole song and taking time to switch inputs or cables is just too long to be meaningful and not fair to our ears/brains.

Think of HDTV's. All the decent ones look good by themselves until you put them side by side. Then the differences are very apparent. Black levels, color accuracy, noise, ect. The eyes/brain don't get a chance to forget or fill in some blanks when side by side. You can't do that with an ABX audsio test, so ABX in itself is flawed to some degree.

 

Common sense people...."

http://www.head-fi.org/t/486598/testing-audiophile-claims-and-myths/60#post_6640111

http://www.head-fi.org/t/486598/testing-audiophile-claims-and-myths/60
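Stripped to its essentials, the quick-switch protocol described in that quote amounts to something like the sketch below. Here play and ask are hypothetical stand-ins for the switch box and the listener's response; the important parts are that the hidden assignment is re-randomized on every trial and never revealed until scoring:

```python
# A bare-bones ABX trial loop. Returns the number of correct
# identifications, which can then be scored against chance.
import random

def run_abx(clip_a, clip_b, play, ask, trials: int = 16) -> int:
    correct = 0
    for _ in range(trials):
        x_is_a = random.random() < 0.5        # hidden coin flip
        play(clip_a)                          # short reference A
        play(clip_b)                          # short reference B
        play(clip_a if x_is_a else clip_b)    # the unknown X
        answer = ask("X sounded like... [a/b]: ").strip().lower()
        correct += (answer == "a") == x_is_a
    return correct
```

Feeding the returned count into a binomial check, as in the earlier sketch, tells you whether the listener beat chance.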

 


The same forum had to ban discussion of DBT from their cable forum.

 

"Due to the flame wars that erupt as a result, this, and the other forums (other than the Sound Science forum), are DBT and ABX-free zones and posts about either will be moved or deleted. See Jude's original post on the matter."

 

http://www.audioasylum.com/audio/dbt.html

 

"Why are DBT discussions not allowed?

Quite simply, the reason is that these topics rarely spark a productive exchange. While a vast majority of Asylum inmates are firmly in the middle ground, the topics of DBT and ABX tend to force polarization and quickly degrade into death spiraling flame wars.

Is DBT bad?

Some think so. Others do not. From a strictly scientific viewpoint, DBT has proved to be the only method that is generally accepted for determining the audibility of small differences between audio equipment and cables. It does work.

A problem exists when these tests are either done poorly or when specific results are extrapolated erroneously. DBT is also not necessary for determining personal preferences.

What's the difference between DBT and ABX?

DBT is a scientific methodology. Some believe that ABX is closer to a religion. ABX is actually a sub-set of DBT. Years ago, a group of audiophiles developed a box that was dubbed the ABX Double Blind Comparator. The purpose of the box is to allow fast switching between two things to be compared, A versus B.

Fast instantaneous switching has many advantages. One's audio memory is short. Research has shown that one is more accurate in detecting small differences when the time between test stimuli is reduced.

The controversy then becomes the question of what effect the ABX box may have on the audio signal. Is the box totally transparent? What effect does adding additional cabling have on the results? Proponents would argue that these issues are irrelevant and that the advantages of fast switching outweigh any possible problems, which are non-existent anyway. Opponents simply roll their eyes and respond with, 'Get a clue'. Discussions get nastier quickly.

Isn't this being unscientific?

Again, some may think so. In reality, there are many methods for determining preferences and accuracy of audio components. Measurement data is far more accurate than one's ears. DBT is simply one procedure. DBT does a great job in removing bias from comparisons. However, DBT does not imply that differences do not exist, only that these differences, in this test, are below the levels of general audibility.

Many people feel that the true character of individual components is only realized after long term listening and living with the component in question. These people would argue that it takes time to fully appreciate or understand certain subtle differences that exist in various audio components."


I think the heart of the preceding post is this:

 

"From a strictly scientific viewpoint, DBT has proved to be the only method that is generally accepted for determining the audibility of small differences between audio equipment and cables. It does work.

A problem exists when these tests are either done poorly or when specific results are extrapolated erroneously. DBT is also not necessary for determining personal preferences."

 

The two statements I highlighted are key - yes, double-blind tests are effective - but they have to be set up and conducted properly. Just because someone says they've conducted a DBT doesn't mean I'd automatically assume the results are accurate - I would want to know what the setup and methodology were at the very minimum. As your posts point out, if there's too much delay between comparisons, or any of a number of other potential issues with the test, it can skew the results and invalidate the accuracy of the testing.


There's that, which is fair irrespective of what subject the test is performed upon. However, there is also this kind of thing:

 

"Many people feel that the true character of individual components is only realized after long term listening and living with the component in question. These people would argue that it takes time to fully appreciate or understand certain subtle differences that exist in various audio components."

 


 

That statement appears (to me) to fall more into the area of "your test produces results I don't like, and therefore I reject it." More so considering how often the topic is something that measurably cannot have an audible effect, like a network cable, pure unobtainium power cords, or magic control knobs.


Pretty good article, but Meyer and Moran again... arrrgh! Their study has been debunked 7 ways to Sunday. I hope one day the audiophile community will get over that study once and for all. It does serve as a good example of how NOT to do a blind test.


There's that, which is fair irrespective of what subject the test is performed upon. However, there is also this kind of thing:

 

"Many people feel that the true character of individual components is only realized after long term listening and living with the component in question. These people would argue that it takes time to fully appreciate or understand certain subtle differences that exist in various audio components."

 


 

 

Fair enough, but after they've been given sufficient acclimation time, and have owned and "lived with" the component in question for long enough to supposedly "know what to listen for", shouldn't they be able to identify it when they do hear it in a blind listening test if it really does make an audible difference? It's not as if we're asking someone who has never used them to identify the sound of their magic cable trestles and specially lacquered wood knobs or anything like that.


I know for a fact my ears change on a daily basis. I have days when my ears, along with a clear mind between those ears, can perceive very small details. Other days my ear sensitivity may be lacking, possibly from listening to loud music, or my focus may be clouded because my mind is distracted.

 

I can see where both short-term and long-term evaluations are needed to make testing valid.

 

You also have excellent audio test tools that can detect differences well below most people's hearing thresholds. Manufacturers use them all the time. They easily eliminate the human factors, including the ears and personal preferences, when determining whether one sample is of better quality.

 

They can't convince you or your ears that something is preferable, however. Everyone's earlobes, ear canals, etc. are different. The best you can do is build something with a broad enough range of quality that it appeals to the greatest variety of ears.

 

The biggest problem, again, is defining "what is better". Considering that most people's hearing begins before birth, tuned to the mother's voice, and is then educated by the sounds heard after birth and while growing up, the sounds and voices that make people happy, calm, or excited may be the ones most likely to be considered the best.

 

No two people have the same aural experiences, so personal preferences will differ when you get down to small differences in an A/B comparison. Audio tools can tell you if one signal is more detailed, has a flatter frequency response, or has lower distortion, but there are no tools to tell you whether one sound is better. Well, maybe there are: biofeedback can tell you whether the mind likes what it hears, but there are so many other influences involved that such testing is only good while it's occurring and is unlikely to produce similar results on a repeated basis.

 

So again, I say A/B tests can produce different results because the mind and ears change on a regular basis, so both short- and long-term comparisons would be needed to come up with any worthwhile results. The mind is just too deceptive and can imagine things that may not exist, which can bias any quick switch unless the differences are blatantly apparent. When the differences become smaller, it will take testing under many different conditions and repeated comparisons before some individuals can decide they like one over the other.

 

A quick test may reveal a difference between the two, but it may take much more than that to determine which is more suitable for that individual.


 

... but to dismiss such tests out of hand - even when properly conducted - seems to be pretty closed-minded.

 

 

Not dismissing them at all, but one should always take the results with a grain of salt.

As you pointed out later: always question the test setup, look for transparency where possible, ask whether the test setup is completely reproducible, and so on.

 

As an IT professional working with big systems, I'm frequently confronted with performance problems, and question number one is: are they reproducible, and if so, how do we measure them to get reliable results?

 

Being closed-minded about this might give you great performance results at isolated points, while the "end users" might still be very unhappy :)

 

