Raspberry Pi Audio: DSP Filter thoughts

I'm reading lots about DSP-implemented filters...

Initial Post

This thread has a lot of useful thoughts from people who appear to be pretty clued-up.

Linear phase - all frequencies are delayed by the same amount of time (constant group delay), so all bands stay essentially in phase across all the outputs of an N-way crossover. The phase angle itself does still change with frequency, but linearly (phase = -2π·f·τ for a delay τ), which is exactly what a pure time delay looks like. Also [1] says

"Linearity in phase response requires symmetry in impulse response."

which is of course the gotcha that brings on the pre-ringing - see below.
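A minimal sketch of what that symmetry buys, in plain Python (standard library only). The Hann-windowed sinc design, the 101 taps and the 3 kHz cutoff are my own illustrative assumptions, not anything from the thread or the paper. Removing a pure delay of (N-1)/2 samples should leave a purely real frequency response, i.e. the phase is linear, and the energy before the main peak equals the energy after it, which is the pre-ringing.

```python
import cmath
import math


def windowed_sinc_lowpass(num_taps, cutoff_hz, fs_hz):
    """Hypothetical helper: Hann-windowed sinc lowpass FIR.

    num_taps should be odd so the centre of symmetry falls on a whole sample.
    """
    centre = (num_taps - 1) // 2
    fc = cutoff_hz / fs_hz  # normalised cutoff in cycles per sample
    taps = []
    for n in range(num_taps):
        x = n - centre
        if x == 0:
            ideal = 2 * fc
        else:
            ideal = math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def frequency_response(taps, freq_hz, fs_hz):
    """Evaluate H(f) = sum over n of h[n] * exp(-j*w*n) at a single frequency."""
    w = 2 * math.pi * freq_hz / fs_hz
    return sum(h * cmath.exp(-1j * w * n) for n, h in enumerate(taps))


if __name__ == "__main__":
    fs = 44100
    taps = windowed_sinc_lowpass(101, 3000, fs)
    centre = (len(taps) - 1) // 2
    for f in (500, 2000, 3000, 6000):
        h = frequency_response(taps, f, fs)
        # Remove a pure delay of `centre` samples. If the phase is linear,
        # what is left is a real number (its imaginary part is ~0).
        undelayed = h * cmath.exp(1j * 2 * math.pi * f / fs * centre)
        print(f"{f:5d} Hz  |H| = {abs(h):.3f}  imag after undelay = {undelayed.imag:.1e}")
    # Symmetry means the energy before the main peak equals the energy after it.
    before = sum(t * t for t in taps[:centre])
    after = sum(t * t for t in taps[centre + 1:])
    print(f"energy before peak = {before:.5f}, after peak = {after:.5f}")
```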

Minimum phase - frequencies are delayed by different amounts through a crossover (the group delay varies with frequency); that's effectively how it works.

The choice of linear phase (implemented using an FIR approach) or minimum phase (generally implemented using an IIR approach) is potentially critical, although since I haven't heard the two compared in otherwise equal settings, I don't know how important it is to me.
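To make "delayed by different amounts" concrete, here's a rough sketch comparing group delay (in samples) for a 2nd-order Butterworth lowpass biquad, using the RBJ Audio EQ Cookbook coefficient formulas as a stand-in for a typical minimum-phase IIR crossover section, against a trivial symmetric 3-tap FIR. The 3 kHz cutoff, 44.1 kHz rate and helper names are assumptions for illustration.

```python
import cmath
import math


def biquad_lowpass(f0_hz, fs_hz, q=1 / math.sqrt(2)):
    """2nd-order lowpass (Butterworth when q = 1/sqrt(2)), RBJ cookbook form.

    Returns (b, a): coefficients of z^0, z^-1, z^-2 for numerator and denominator.
    """
    w0 = 2 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2 * q)
    cw = math.cos(w0)
    b = [(1 - cw) / 2, 1 - cw, (1 - cw) / 2]
    a = [1 + alpha, -2 * cw, 1 - alpha]
    return b, a


def response(b, a, w):
    """H(e^jw) = B(e^jw) / A(e^jw), with w in radians per sample."""
    zinv = cmath.exp(-1j * w)  # z^-1 on the unit circle
    num = sum(bk * zinv ** k for k, bk in enumerate(b))
    den = sum(ak * zinv ** k for k, ak in enumerate(a))
    return num / den


def group_delay_samples(b, a, freq_hz, fs_hz, dw=1e-5):
    """Group delay -dphi/dw (in samples) via a central difference.

    Taking the phase of a ratio of nearby responses avoids phase-wrapping issues.
    """
    w = 2 * math.pi * freq_hz / fs_hz
    ratio = response(b, a, w + dw) / response(b, a, w - dw)
    return -cmath.phase(ratio) / (2 * dw)


if __name__ == "__main__":
    fs = 44100
    iir_b, iir_a = biquad_lowpass(3000, fs)
    fir_b, fir_a = [0.25, 0.5, 0.25], [1.0]  # symmetric 3-tap FIR
    for f in (100, 1000, 3000, 10000):
        gd_iir = group_delay_samples(iir_b, iir_a, f, fs)
        gd_fir = group_delay_samples(fir_b, fir_a, f, fs)
        print(f"{f:6d} Hz  IIR: {gd_iir:5.2f} samples   FIR: {gd_fir:5.2f} samples")
```

The IIR's delay should range from a few samples at low frequencies to under one sample at 10 kHz, while the symmetric FIR sits at exactly (3-1)/2 = 1 sample everywhere.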

The Rational Audiophile (RA) swears by a linear-phase, FIR-based, time-aligned approach. Logically, this appears to make sense - there's no way that time alignment can be ignored, for a start. However, steep linear-phase FIR filters have a significant amount of pre-ringing, which comes from the mathematics and construction of the filter. Pre-ringing (essentially, some amount of "pre-echo"-like ringing preceding an impulse) is possible because a linear-phase FIR filter effectively "looks into the future" to do its work: if the current sample is X, the output for X depends on input up to (N-1)/2 samples after X, where N is the "number of taps". Thus the future can affect the present.

In the real world this means the signal is delayed by (at least) (N-1)/2 samples, e.g. at a sample rate of 44.1 kHz a 44-tap linear-phase filter (in practice, they are usually much larger) delays the signal by about 0.5 ms, plus whatever time the actual processing and buffering take. So - the RA says he has a delay of 500 ms, which is about 22k samples at 44.1 kHz; if that were all filter latency, it would imply linear-phase FIR filters of around 44k taps, which is a Big Number. What happens to the pre-ringing? Is it audible?
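The latency arithmetic, as a small sketch. It assumes the only delay is the filter's own (N-1)/2 samples; real systems (block-based or FFT convolution, sound-card buffers) add more, so these figures are lower bounds.

```python
def linear_phase_latency_ms(num_taps, fs_hz):
    """Inherent delay of a symmetric (linear-phase) FIR: (N - 1) / 2 samples, in ms."""
    return (num_taps - 1) / 2 / fs_hz * 1000


def taps_for_latency(latency_ms, fs_hz):
    """Longest odd-length linear-phase FIR whose inherent delay fits in latency_ms."""
    delay_samples = int(latency_ms * fs_hz / 1000)
    return 2 * delay_samples + 1


if __name__ == "__main__":
    fs = 44100
    print(f"44 taps -> {linear_phase_latency_ms(44, fs):.3f} ms")  # 44 taps -> 0.488 ms
    print(f"500 ms  -> up to {taps_for_latency(500, fs)} taps")     # 500 ms  -> up to 44101 taps
```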

There is a paper on this referred to by the forum posts above [1]. This claims

"The results of the listening experiment were analysed with auditory correlates of group delay distortion (phase errors) and smoothed third-octave spectrum (magnitude error). These correlates explain the results of the listening tests to some extent, but with high-order linear-phase FIR crossover filters, correlation seemed not to always exist. Thus auditory analysis that was based on the function of hearing was used for analysis. It seemed to show qualitatively the reasons for perceived phase errors.

It was discovered that high-order, linear-phase FIR crossover filters offer apparently ”ideal” properties in magnitude and phase reproduction for crossover filters, but they cause clearly audible degradations as ”ringing” in the audio samples, when the flight-time difference between low- and highpass outputs is not zero. The crossover frequency between low- and highpass bands being 3 kHz, it was noticed on the grounds of the listening experiments that filter orders above 600 produce audible errors with linear-phase FIR crossover filters. "

Hmm. Of course, I now need to read the paper, because I'm not sure what is meant by "flight-time difference", which I'm sure is explained within. "Filter orders above 600" refers to the number of taps (an FIR of order N has N+1 taps, so near enough the same thing) - tens of thousands of taps is a HUGE number compared with that! The minimum number of taps for a specified dB/octave roll-off also depends heavily on the frequencies involved: the lower the crossover frequency relative to the sample rate, the more taps are needed. So one could use different tap counts for different bands, and then add fixed sample-count delays to the bands with the shorter filters so that everything stays time-aligned.
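One way to put numbers on that, sketched under assumptions: Kaiser's well-known length estimate for window-designed FIRs (order = (A - 8) / (2.285 * dw), with A the stopband attenuation in dB and dw the transition width in radians per sample), a 60 dB stopband, and a transition width of half the crossover frequency. Those choices and the helper names are hypothetical; other design methods give different counts, but the trend (10x lower crossover, roughly 10x the taps) should hold.

```python
import math


def kaiser_num_taps(transition_hz, fs_hz, atten_db=60.0):
    """Kaiser's length estimate for a windowed FIR: order = (A - 8) / (2.285 * dw).

    dw is the transition width in radians per sample. The order is rounded up
    to an even number so the tap count is odd and the delay is a whole sample.
    """
    dw = 2 * math.pi * transition_hz / fs_hz
    order = math.ceil((atten_db - 8) / (2.285 * dw))
    if order % 2:
        order += 1
    return order + 1


def alignment_delays(tap_counts):
    """Extra delay (samples) per band so all bands match the longest filter's latency."""
    longest = max(tap_counts)
    return [(longest - n) // 2 for n in tap_counts]


if __name__ == "__main__":
    fs = 44100
    # Hypothetical bands: transition width taken as half the crossover frequency.
    crossovers_hz = [300, 3000]
    taps = [kaiser_num_taps(xo / 2, fs) for xo in crossovers_hz]
    delays = alignment_delays(taps)
    for xo, n, d in zip(crossovers_hz, taps, delays):
        print(f"{xo:5d} Hz crossover: {n:5d} taps, pad with {d} samples of delay")
```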

Reading the Paper

The paper reads a bit like early postgraduate work, and is written by someone whose first language is not English, although that's only an observation, not a criticism, since I'm not sure whether some of the ambiguities are intellectual or linguistic!

There's a great summary of the properties required of a crossover:

To conclude all these demands for crossover filters and their transfer functions, we end up with the following goals, as Linkwitz [6], and Lipshitz and Vanderkooy [7] did in their articles:
1. Flatness in the magnitude response. That is, the output signals from woofer and tweeter sum up to unity on the main listening axis; there are no dips or peaks at any frequency.
2. Adequately steep cutoff rates of the low- and highpass filters. This is to ensure that the drivers operate on their optimal range, and to minimize the interference between the drivers.
3. Phase difference is zero between the woofer output and the tweeter output at the crossover frequency. This prevents tilting in the loudspeaker’s radiation pattern.
4. Ideal polar response of the loudspeaker by having the same phase difference between outputs at all frequencies. That is, the reproduction of the loudspeaker is symmetrical as a function of angle and it requires the same group delay from low- and highpass filters.

Other points:

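A considerable phenomenon with steep cutoff rates in the FIR case is the ringing of their off-axis response - how does this work? Why would there be a change in the physical speaker behaviour because of a steep cutoff?? Presumably the speaker itself doesn't change: off-axis, the woofer and tweeter are at different distances from the listener, so their outputs arrive with a small relative delay, and the low- and highpass ringing that cancels when the two sum exactly in time no longer cancels. The paper's figures show the writer's simulation of:

3.21 An on-axis impulse response of two drivers summed - lovely!
3.22 Impulse responses of two drivers summed with 0.2 ms delay between them, i.e. off-axis. Hmm.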
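A small simulation in the spirit of those two figures, not a reproduction of them. Assumptions: a 2001-tap Hann-windowed sinc lowpass at 3 kHz (odd length, so roughly the paper's 2000 taps), a highpass defined as a unit impulse minus that lowpass so the pair sums perfectly on-axis, and the 0.2 ms offset rounded to 9 samples at 44.1 kHz. On-axis, nothing at all arrives before the main impulse; with the 9-sample offset, the two filters' ringing no longer cancels and energy turns up well before it.

```python
import math


def lowpass_taps(num_taps, cutoff_hz, fs_hz):
    """Hypothetical helper: Hann-windowed sinc lowpass, symmetric, odd length."""
    centre = (num_taps - 1) // 2
    fc = cutoff_hz / fs_hz
    taps = []
    for n in range(num_taps):
        x = n - centre
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def complementary_highpass(lp):
    """Highpass = delayed unit impulse minus the lowpass, so LP + HP is a pure delay."""
    centre = (len(lp) - 1) // 2
    return [(1.0 if n == centre else 0.0) - h for n, h in enumerate(lp)]


def acoustic_sum(lp, hp, hp_delay):
    """Add woofer (LP) and tweeter (HP) outputs, the tweeter arriving hp_delay samples late."""
    out = [0.0] * (len(lp) + hp_delay)
    for n, h in enumerate(lp):
        out[n] += h
    for n, h in enumerate(hp):
        out[n + hp_delay] += h
    return out


def energy_before(signal, index, guard=10):
    """Energy arriving more than `guard` samples before `index`."""
    return sum(s * s for s in signal[:max(0, index - guard)])


if __name__ == "__main__":
    fs = 44100
    lp = lowpass_taps(2001, 3000, fs)
    hp = complementary_highpass(lp)
    centre = (len(lp) - 1) // 2
    on_axis = acoustic_sum(lp, hp, 0)
    off_axis = acoustic_sum(lp, hp, 9)  # 9 samples is roughly 0.2 ms at 44.1 kHz
    print("on-axis pre-arrival energy: ", energy_before(on_axis, centre))
    print("off-axis pre-arrival energy:", energy_before(off_axis, centre))
```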

The temporal resolution of the ear was found to be approximately 2.5 ms for signals having identical energy spectra and differing only in phase. This experiment was done by Patterson and Green in 1970.

OK... at roughly 343 m/s that's about 0.86 m (34 inches) of path difference, which is a heck of a lot. Do we REALLY hear time-alignment, unless the difference is egregious?
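The conversion, assuming sound travels at about 343 m/s (dry air at roughly 20 °C):

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at about 20 degrees C
METRES_PER_INCH = 0.0254


def delay_to_distance_m(delay_ms, c=SPEED_OF_SOUND_M_S):
    """Distance sound travels in delay_ms milliseconds."""
    return c * delay_ms / 1000


def distance_to_delay_ms(distance_m, c=SPEED_OF_SOUND_M_S):
    """Time sound takes to travel distance_m metres, in milliseconds."""
    return distance_m / c * 1000


if __name__ == "__main__":
    d = delay_to_distance_m(2.5)
    print(f"2.5 ms -> {d:.2f} m ({d / METRES_PER_INCH:.1f} in)")
    print(f"6.6 in -> {distance_to_delay_ms(6.6 * METRES_PER_INCH):.2f} ms")
```

2.5 ms comes out at about 0.86 m (34 in), whereas 6.6 in corresponds to only about 0.49 ms.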

Schouten has defined the recognition of an ”auditory object” to depend on the following factors [48]:

1. Sound periodicity in the range from 20 to 20000 Hz vs. noise-like, irregular sound
2. Waveform is constant vs. waveform fluctuates as a function of time; fluctuations are similar/dissimilar
3. Other aspects of the sound, like spectrum or periodicity, are changing as a function of time
4. What are the preceding and the following sounds like?

From the audio engineer’s point of view, group delay distortions have been used as a measure of phase errors for a long time. The most comprehensive paper on the group delay matters in the audio field is probably the paper by Lipshitz, Pocock and Vanderkooy [14], supported by [15, 16, 17, 18, 19, 20, 21, 22]. Hoshino and Takegahara [49] have defined the permissible values for group delay at high frequencies to be roughly 2 ms from 10 kHz, though this has more importance in general audio processing than in crossover design, because usually the crossover frequencies are way below 10 kHz. A recent study defines the audible value of group delay to be 1.6 ms [22]. Møller et al. describe the group delay errors as ”ringing” or ”pitchiness” and importantly state that ”the ringing is detected in the individual ear and not as part of binaural processing”.

Localization means judging the direction and distance of a sound event... There are differences in signals arriving at the ears, and the concepts of inter-aural time difference (ITD) and inter-aural level difference (ILD) between the ears are of the greatest importance.

"The delay between low- and highpass outputs was implemented to simulate the elevation of the listening angle, because most of the problems exist, when the vertical angle is changed. With quite a normal separation of drivers of a two-way loudspeaker (0.25 m), the delay was limited to 0-0.5 ms to simulate far-field elevation angles between 0 and 45 degrees"

Hmm - assuming 0.25 m driver separation, 3 m to the listener and a right-angled triangle, the path lengths differ by ~10 mm, or about 0.03 ms. Let's read on...
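Checking that arithmetic with a quick sketch, again assuming 343 m/s. The first function is the exact right-angled-triangle calculation above; the second is the far-field approximation (extra path = separation × sin(elevation)), which appears to be what the paper's 0-0.5 ms range for 0-45 degrees is based on. Both helper names are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at about 20 degrees C


def exact_path_difference_m(separation_m, distance_m):
    """Extra path to the farther driver, with the listener's ears level with one
    driver at distance_m and the other driver separation_m above or below it."""
    return math.hypot(distance_m, separation_m) - distance_m


def far_field_delay_ms(separation_m, elevation_deg, c=SPEED_OF_SOUND_M_S):
    """Far-field approximation: the paths are nearly parallel, so the extra
    path is separation * sin(elevation)."""
    return separation_m * math.sin(math.radians(elevation_deg)) / c * 1000


if __name__ == "__main__":
    extra = exact_path_difference_m(0.25, 3.0)
    print(f"0.25 m at 3 m: {extra * 1000:.1f} mm -> "
          f"{extra / SPEED_OF_SOUND_M_S * 1000:.3f} ms")
    for angle in (0, 15, 30, 45):
        print(f"{angle:2d} deg elevation: {far_field_delay_ms(0.25, angle):.3f} ms")
```

0.25 m at 3 m gives about 10.4 mm, or 0.030 ms, and the far-field figure at 45 degrees is about 0.52 ms, consistent with the paper's upper limit of 0.5 ms.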
"The woofer was 78cm, and the tweeter was 92 cm above the floor" i.e. 0.14m different!

"The large number of different scenarios forced to cut out inaudible (to the author at least) samples to make the test reasonable in size and duration"

Aha - so the samples were pre-screened by the author to keep the test manageable, which assumes the author can hear every difference the other listeners can, rather than the other way round.

"Qualitative inspections from the subjects suggested that the ringing of FIR crossovers was highly critical to the listening place. Slight changes in subject’s head position could make the phenomenon either audible or inaudible. There were also differences between test subjects. Remarkable is that the errors were clearly audible with a real signal and a real loudspeaker in a listening room. This suggests carefulness in designing and using digital crossover filters, especially linear phase FIR crossovers with higher orders."

He's using delays of 0.03 ms, i.e. about 10 mm of path-length difference between woofer and tweeter, with a 2000-tap FIR in the simulated test.
So it's not surprising that small head movements can make a big difference in the real loudspeaker case.

Given what this paper says, you'd be remiss not to consider some of these points, at least to the extent of seeing what makes a difference for you.


Monday, 19 September 2016
