How we test sound for reviews - a GadgetGuy Guide

A reader has asked about our use of sound signatures in reviews of headphones, TVs, soundbars, speakers, smartphones and other sound devices. The answer is that a sound signature is a consistent way to describe a device’s ability to reproduce sound. By comparison, a reviewer’s subjective comments from listening to their favourite soundtracks, although important, can vary wildly.

Why sound signature? Audio quality depends on so many things – the bit rate, sample rate, file format, and speaker construction. It also depends on the ability of the encoder to get the important bits right and the reviewer’s ears!

Signature 101 and why we use it to rate sound devices

Frequency response – the human hearing range

Humans generally hear frequencies from 20Hz (bass) to 20kHz (treble). As you get older, the top end usually falls off progressively – it is not uncommon for the upper limit of an elderly person’s hearing to drop as low as 3kHz. That is why hearing aids generally boost the lost frequency ranges rather than overall volume.

But sound is a combination of hearing (tones and harmonics within the eardrum) and feeling (subjective impressions that depend on musical associations, your mood at the moment, and a bunch of oh-so-human indefinables). Oh, and throw in spatial variables like sound stage/separation (left/right, up/down, forward/behind), echo (the reflection off walls), speed of sound and timbre and you wonder how we can hear at all via two small ear canals.

Well, we have the world’s most powerful non-AI computer to post-process, fill in the gaps and make the most of whatever we hear.

That frequency response covers:

Deep Bass: 16/20-40Hz – which you can often feel more in your body than you hear in your ears
Midbass: 40-100Hz – if this is intact, you will be getting just about all the musically important bass
Upper Bass: 100 to 200Hz – most small sound devices, like portable Bluetooth speakers, start here
Mid: 200Hz-4kHz – this is where the action is; it covers the human voice and is the area where our ears are most sensitive … even as we age.
Upper Treble: 4-10kHz – this defines the character of the sound. Its absence makes the sound dull.
Dog whistle – top octave: 10-20kHz – you can’t generally hear this, but you know if it is missing. Its presence improves the sense of direction in the sound and provides a feeling of “air”, a reality as though the music were really there, rather than merely reproduced.
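To make the bands above concrete, here is a minimal Python sketch that maps a frequency to the band it falls in. The function name band_for and the choice to give a shared edge (40Hz, 100Hz and so on) to the lower band are our assumptions for illustration; the band edges themselves come from the list above.

```python
# Upper edge of each band in Hz, taken from the list above.
# Assumption: a frequency that sits exactly on an edge goes to the lower band.
BANDS = [
    (40, "Deep Bass"),
    (100, "Midbass"),
    (200, "Upper Bass"),
    (4_000, "Mid"),
    (10_000, "Upper Treble"),
    (20_000, "Top octave"),
]

LOWEST_HZ = 16  # lowest frequency named in the list above


def band_for(freq_hz: float) -> str:
    """Return the name of the band a frequency falls in."""
    if freq_hz < LOWEST_HZ:
        return "outside range"
    for upper_hz, name in BANDS:
        if freq_hz <= upper_hz:
            return name
    return "outside range"


for freq in (30, 80, 150, 1_000, 6_000, 15_000):
    print(f"{freq} Hz -> {band_for(freq)}")
```

A small portable speaker that only starts at 100-200Hz would, on this mapping, be missing the Deep Bass and Midbass bands entirely.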

Six sound signatures describe the natural state of the sound device.

Of course, a device can combine two or more, and many have equaliser and sound profile apps that can change the signature entirely, often resulting in ‘frankensound’.

Balanced: (bass boosted, mid recessed, treble boosted) also called V-shaped and the default on many devices – despised by audiophiles
Bass: (bass boosted, mid/treble recessed) – for bass music but can sound boomy or muddy compared to warm and sweet
Warm and Sweet: (bass/mid boosted, treble recessed) – the nirvana for most music and movies
Mid: (bass recessed, mid boosted, treble recessed) – for clear voice
Bright Vocal: (bass recessed, mid/treble boosted) – thought to be for vocal and string instruments, but actually makes them harsh
Analytical: (bass/mid recessed; treble boosted) – crisp but can be overly harsh and not pleasant for most music
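The six signatures above boil down to which of bass, mid and treble are boosted or recessed, so they can be written as a lookup table. The sketch below is illustrative only: the function name signature and the rule that a band counts as "boosted" when its level is above the average of the three bands are our simplifying assumptions, not a description of how we measure devices.

```python
# (bass boosted, mid boosted, treble boosted) -> signature name,
# copied from the list above.
SIGNATURES = {
    (True, False, True): "Balanced (V-shaped)",
    (True, False, False): "Bass",
    (True, True, False): "Warm and Sweet",
    (False, True, False): "Mid",
    (False, True, True): "Bright Vocal",
    (False, False, True): "Analytical",
}


def signature(bass_db: float, mid_db: float, treble_db: float) -> str:
    """Name the signature from three band levels in dB.

    Assumption: a band is 'boosted' if it is above the mean of the
    three levels, otherwise it is 'recessed'.
    """
    mean = (bass_db + mid_db + treble_db) / 3
    key = tuple(level > mean for level in (bass_db, mid_db, treble_db))
    return SIGNATURES.get(key, "unclassified")


print(signature(3, -2, 2))   # bass and treble above the mean
print(signature(4, 2, -3))   # bass and mid above the mean
print(signature(0, 0, 0))    # nothing stands out
```

Three equal levels match none of the six entries, so the sketch reports them as unclassified.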

We like to add a seventh – flat or neutral that neither adds nor subtracts from the native music, but this is rare – if it exists at all! Where possible, we test with the Equaliser (EQ) set to flat.

Flat means just that – and we seldom find it

So, if we say something is warm and sweet, you can count on it for good music.

Most sound devices are naturally mid-centric and use some form of psycho-acoustic trickery via ‘tuning’ the DAC (Digital-to-Analogue Converter) or an EQ to boost specific frequencies by several dB. This can add a little bass, mid or treble, but it is not the speaker’s native signature. We often refer to this as ‘synthetic sound’ – it is not bad, but it is not entirely natural. There is an excellent article here that delves into sound signature nuances, although it uses slightly different terms.

Speaker types

Speakers range from small 4-6mm earphone transducers to monster cones. But we have yet to find a single speaker that can do it all – reproduce the full frequency range. Physics precludes this in loudspeakers: you need a larger speaker to push volumes of air for bass and a smaller, ‘shriller’ speaker for treble. Some headphones and earphones come closer to the ideal, and some get great bass because they have an excellent over-the-ear seal – less air to push. So, in a decent soundbar or hi-fi system, you will have separate speakers (and amps) for bass (sub-woofer), mid, and treble (tweeters).

Then there’s the separate issue of surround sound. Typically:

1.0 – mono speaker; some use passive radiators to increase bass or reflectors to spread the sound over 180-360°
2.0 – stereo speakers (L/R)
2.1 – L/R and subwoofer
3.1 – L/R, Centre (usually tuned for clear voice frequencies) and subwoofer
5.1 – Front L/R, Centre, surround (rear) L/R and a sub-woofer
5.1.2 – As above plus two upwards-firing (or ceiling) speakers, tuned for spatial effects instead of reproducing front sound (Minimum for Dolby Atmos)
7.1.4 – Front L/R, Centre, side L/R, rear L/R and a sub-woofer, plus four up-firing (or ceiling) speakers (tuned for spatial effects instead of reproducing front or rear sound)
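The dotted notation follows the usual convention: the first number counts ear-level speakers, the second counts subwoofers and the optional third counts height (up-firing or ceiling) speakers. Here is a minimal sketch that splits a layout string into those counts; the function name parse_layout and the dictionary keys are our own, chosen purely for illustration.

```python
def parse_layout(layout: str) -> dict:
    """Split a layout such as '5.1.2' into speaker counts.

    First number: ear-level speakers. Second: subwoofers.
    Optional third: height (up-firing or ceiling) speakers.
    """
    parts = [int(part) for part in layout.split(".")]
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"expected a layout like '5.1' or '5.1.2', got {layout!r}")
    ear_level, subwoofers = parts[0], parts[1]
    height = parts[2] if len(parts) == 3 else 0
    return {
        "ear_level": ear_level,
        "subwoofers": subwoofers,
        "height": height,
        "total": ear_level + subwoofers + height,
    }


for layout in ("2.0", "3.1", "5.1.2", "7.1.4"):
    print(layout, parse_layout(layout))
```

On this reading, a 7.1.4 system has twelve speakers in total, while a 2.0 TV has just two.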

The above cover the usual channels that a sound device can down-mix to. For example, if a 2.0 TV or soundbar claims Dolby Atmos compatibility, it means that it takes the native 5.1.2 to 7.1.4 signal and downmixes an approximation to the physical number of amps and speakers. Conversely, DTS:X upmixes (emulates) 1.0 or higher to the speakers’ capacity.

Most sound sources are better than most human hearing anyway

Definitions: (The higher, the better for all)

Bit rate – the amount of data used for each second of audio, in kilobits per second (kbps). It determines the finished file size and relates to audio quality.
Sample rate in kHz – the number of times per second the sound is sampled. The highest frequency captured is half the sample rate, so 44.1kHz covers 5Hz-22.05kHz.
Bit depth – 8-bit (256 levels or 48dB), 16-bit (65,536 levels or 96dB), 24-bit (16,777,216 levels or 144dB), 32-bit or more is simply the granularity of data stored and relates to the dynamic range (dB). We perceive lower bit depths as more noise. It is usually a simple hiss, but at low signal levels it can produce nasty effects. Those are usually dealt with by adding some dither noise.
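The figures in these definitions follow from two simple rules: the highest frequency a sample rate can capture is half that rate, and each extra bit doubles the number of levels, adding roughly 6dB of dynamic range. A minimal sketch, with function names of our own choosing:

```python
import math


def highest_frequency_khz(sample_rate_khz: float) -> float:
    """Highest frequency a sample rate can capture (half the rate)."""
    return sample_rate_khz / 2


def levels(bit_depth: int) -> int:
    """Number of distinct values a sample can take."""
    return 2 ** bit_depth


def dynamic_range_db(bit_depth: int) -> float:
    """Theoretical dynamic range: 20 * log10(number of levels)."""
    return 20 * math.log10(levels(bit_depth))


print(f"44.1kHz sampling reaches {highest_frequency_khz(44.1):.2f}kHz")
for bits in (8, 16, 24):
    print(f"{bits}-bit: {levels(bits):,} levels, about {dynamic_range_db(bits):.0f}dB")
```

The loop reproduces the 48dB, 96dB and 144dB figures quoted for 8, 16 and 24-bit audio.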

Codec rates

MP3 uses a bit rate from 8-320kbps (typically 128kbps or ‘radio quality’) at a sample rate from 8-48kHz (generally 44.1kHz). It compresses large music files into smaller ones, but the compression is lossy (the file keeps only a fraction of the original sound information). For example, a typical MP3 uses 128 kilobits for every second of music (files are actually slightly larger, thanks to metadata including album covers).

AAC has a variable bit rate of 8-256kbps (typically 230kbps per channel) and a sample rate of 8-96kHz (almost always 44.1kHz). It is widely used by Apple and can be less lossy. But it is the only Bluetooth codec that makes use of psychoacoustic modelling* to transmit data, so it is a very processing-heavy codec.

CD sound requires 16-bit/44.1kHz (44,100 samples a second) sampling and 1,411kbps data rate. Some audiophiles comment that CD sampling only covers 80% of the original sound information.
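The 1,411kbps figure is simply sample rate × bit depth × channels. The sketch below works that out and compares the size of a four-minute track at CD quality with the same track as a 128kbps MP3. The function names and the four-minute track length are our assumptions for illustration, and metadata is ignored.

```python
def pcm_bit_rate_kbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Bit rate of uncompressed audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


def file_size_mb(bit_rate_kbps: float, seconds: float) -> float:
    """Approximate audio size in megabytes (1MB = 1,000,000 bytes)."""
    return bit_rate_kbps * 1000 * seconds / 8 / 1_000_000


TRACK_SECONDS = 4 * 60  # assumed track length

cd_kbps = pcm_bit_rate_kbps(44_100, 16, 2)  # stereo CD audio
print(f"CD bit rate: {cd_kbps:,.1f}kbps")
print(f"CD track:  {file_size_mb(cd_kbps, TRACK_SECONDS):.1f}MB")
print(f"MP3 track: {file_size_mb(128, TRACK_SECONDS):.1f}MB")
```

At these settings the MP3 is roughly a tenth of the size of the CD-quality track, which is the trade-off lossy compression makes.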

DVD and Blu-ray audio are typically 24-bit/96 or 192kHz and cover almost all the original sound information. By comparison, telephone quality is from 200Hz-3.2kHz and uses 8 or 12-bit and a 64-96kbps data rate, so you can see the voice frequency range is quite limited. Depending on your content type, the sound quality and frequency response vary. All tests should be at CD quality to be fair to the device. You can read more and listen to different bit rate clips here.

*Psychoacoustic modelling determines which sounds won’t be heard. For example, some sounds within a few milliseconds of louder sounds – even if they come first – won’t be heard. Models are used to determine those; then the encoder abandons them. Conceptually it’s the same for all lossy compression systems: MP3, AAC and so on. Some just do it better than others.

Then there is the device interface type

Most devices have 3.5mm (cable) audio, RCA, optical Toslink, HDMI, USB, Thunderbolt 3, Bluetooth or Wi-Fi interfaces. 3.5mm (or analogue) audio inputs may make their way in pure analogue format to the speakers, but there’s no guarantee of that. Many devices simply convert analogue to digital for processing, before converting back to analogue for output.

In theory, pure analogue should be the best test of the speaker’s capability, but in practice, digital-to-analogue and analogue-to-digital conversion are so good that it’s largely indistinguishable from pure analogue, if competently performed.

An ADC (analogue-to-digital converter) takes the smooth analogue signal and samples it; a DAC later plays back what it thinks the sound is. The higher the sample rate, the more accurate the conversion.

But most music is digital and needs a chip to convert from digital to analogue (called a DAC or digital-to-analogue converter). These can vary enormously in quality and in their low and high filter capabilities.

Let’s remember that MP3 is 128-320kbps, and CD quality is 1,411kbps (1.411Mbps) (Source).

The difference is that MP3 – and other lossy compression systems – toss out the content that the psychoacoustic models judge to be inaudible. The higher the bitrate for a given codec, typically the better quality.

Bluetooth Codecs (typical or maximum rate with 44.1kHz quality):

Standard Bluetooth codec (Sub-band coding or SBC) is 127 (mono) to 328kbps (stereo)
aptX (mono/stereo) is 128/256/352kbps, aptX LL (low latency) is 352kbps, aptX HD is 192/384/529kbps and aptX Adaptive is 276-420kbps
Advanced Audio Coding (AAC) is 8-576kbps (stereo) but typically 256/320kbps over BT – good on iPhone (Apple AAC) but not so good on Android that uses the Fraunhofer AAC codec
LDAC is variable: 303/606/990kbps
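One way to read these rates is as a share of the 1,411kbps that CD quality needs. The sketch below does that arithmetic using the top rate quoted for each codec above (and the higher typical Bluetooth rate for AAC). It is a rough comparison only: a better codec can sound better at a lower bit rate, so the percentage is not a quality score. The names CD_KBPS, TOP_KBPS and share_of_cd are ours.

```python
CD_KBPS = 1_411  # CD-quality data rate quoted earlier

# Top rate quoted for each codec in the list above, in kbps.
TOP_KBPS = {
    "SBC": 328,
    "aptX": 352,
    "aptX HD": 529,
    "AAC (typical over BT)": 320,
    "LDAC": 990,
}


def share_of_cd(kbps: float) -> float:
    """Codec bit rate as a percentage of the CD data rate."""
    return 100 * kbps / CD_KBPS


for name, kbps in TOP_KBPS.items():
    print(f"{name}: {kbps}kbps is {share_of_cd(kbps):.1f}% of CD")
```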

Codecs also suffer latency (lag):

SBC: 150-250 ms (typically 175ms)
aptX: 130-180 ms (typically 166ms and aptX LL tries to keep this under 50ms)
AAC: 190-240 ms
LDAC: 160-210 ms
Wired: <5-7ms

Interface speeds:

Bluetooth 1/2/3/4/5 is 1/3/25/25/50Mbps, but the codec will slow it down
USB 1/2 is 12/480Mbps, and 3/4 is 5/40Gbps
Thunderbolt 3 is 20/40Gbps
Optical is 3.072Mbps
HDMI and DisplayPort are 36.863Mbps
Ethernet up to 1Gbps
Wi-Fi ranges from 50Mbps to as high as AX 11Gbps

The bottom line is that you need to test as a cabled device (if possible) to get the native sound signature and then preferably use a high-res BT codec like LDAC and USB 3.0 to get the range of sound signatures.

Now back to where we started – sound signatures

Sorry for the long tome, but merely listening to a favourite music track does not cut it – it is plain wrong for all the reasons above. We use a tone generator to see the original signature of the speaker and measure the result with a frequency response meter. There are tone generators for Windows, Android, macOS and iOS, as well as tone tests on YouTube.
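For the curious, a basic tone generator needs very little code. The sketch below uses only Python’s standard library to write one-second sine tones as 16-bit mono WAV files at the CD sample rate. It is not the tool we use; the test frequencies (one per band), the half-scale amplitude and the file names are all assumptions for illustration.

```python
import math
import struct
import wave

SAMPLE_RATE = 44_100  # CD-quality sample rate in Hz


def sine_tone(freq_hz: float, seconds: float = 1.0, amplitude: float = 0.5) -> bytes:
    """Return a sine tone as 16-bit little-endian mono PCM frames."""
    peak = int(amplitude * 32_767)  # 32,767 is the largest 16-bit sample value
    frame_count = int(seconds * SAMPLE_RATE)
    samples = (
        int(peak * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(frame_count)
    )
    return b"".join(struct.pack("<h", sample) for sample in samples)


def write_wav(path: str, frames: bytes) -> None:
    """Write PCM frames to a mono, 16-bit WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(frames)


if __name__ == "__main__":
    # One assumed test tone per frequency band.
    for freq in (30, 80, 150, 1_000, 6_000, 15_000):
        write_wav(f"tone_{freq}hz.wav", sine_tone(freq))
```

Play each file through the device at the same volume and note which tones come through loud, quiet or not at all; a measuring microphone and meter turn that impression into numbers.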

BUT, in deference to all reviewers who may have better ears than I, here are the tracks I always use

The Blues Brothers Peter Gunn Theme

Those magnificent trumpets over a deep bass backbeat – just the facts, ma’am! I could listen to Blues Brothers jazz all day long.

Next track is the Beach Boys Fun, Fun, Fun

It is a vocal track with electric guitars and synthesiser behind it.

Finally, Manhattan Transfer Twilight Zone

It mixes voice and heavy bass as well as using the complete directional sound stage. Here is a 432kHz version (almost high res).

GadgetGuy’s take

A sound signature is the best guide to what speakers sound like. It is like describing food: you could say salty, sweet, bitter, sour, soft, hard, mushy, like chicken and so on. Avoid any reviews that simply talk about the reviewer’s favourite tracks. I hope you can see that it’s all a science! Sound is a very important category at GadgetGuy, so we hope this helps you understand how you can rely on our reviews.