Raspberry Pi Audio: September 2016

Thursday, 22 September 2016

Controlling sound propagation with DSP - some simulations

Background

One of the coolest ideas in the Kii Audio speakers, and one I'm very impressed by, is the directional control of sound distribution, achieved by phase control of the signals in the DSP. Basically:

For frequencies up to 250Hz, they achieve a cardioid distribution, minimising rear and side reflections of bass frequencies
For mids/highs, they also achieve an effective point source, focused on the midrange unit, which a) ensures a more coherent wavefront for the listener, improving transient response, and b) again reduces reflections that would otherwise interfere with the original transmission

All via DSP magic... How do they do that? I want to know... And I happen to have a couple of mid/bass units knocking around from the unfinished van speakers project, maybe I could do something with them... But what? I need to think about how this could work first.

So I started to try to draw some interference patterns at a specific frequency, using coloured pencils and a compass. Well, it was probably therapeutic (I'd had a pint or two with Mick for lunch!), but it didn't really get me anywhere. The only useful way to explore this is obviously to do it on a computer, where parameters can be easily changed, and graphs/pictures saved etc. At this point, lacking funds to purchase MATLAB, gnuplot is your friend! [I've since looked up MATLAB, and you can get a student basic version for £29+VAT, or the whole schemozzle for £55+VAT - not bad really!].

Basics

OK, so gnuplot has a magnificent CLI :-), just the ticket! I stole a demo function for sinc(x,y) in 3D, chopped out the bits I didn't want, and experimented with ranges, the surface iso line density and so on. Here are some of the results...

Basic wave propagation from single source

2 sources, separated by 1 wavelength

2 sources, 2 wavelengths

2 sources, 3 wavelengths

This isn't really moving me on much, but it was pretty interesting and a good start for late in the evening after the lunchtime pint had worn off! What about a 3rd source, located behind the other two?

3 sources, 2 level 1 wavelength apart, 1 to rear 2 wavelengths from axis

3 sources, 2 level 2 wavelengths apart, 1 to rear 4 wavelengths from axis

All very pretty, but not really achieving much. I think I need to introduce some considerably shorter differences, and some adjustment of relative phase.

The gnuplot code that does this is variations on the following - this produces the very first image:

# set terminal png transparent nocrop enhanced size 450,320 font "arial,8"
# set output 'surface1.16.png'
set dummy u, v
set view 70, 20, 1, 1
set samples 51, 51
set isosamples 101,101
set style data lines
set ztics -1.00000,0.25,1.00000 norangelimit
set xlabel "X axis"
set xlabel offset character -3, -2, 0 font "" textcolor lt -1 norotate
set xrange [ -1.00000 : 1.00000 ] noreverse nowriteback
set ylabel "Y axis"
set ylabel offset character 3, -2, 0 font "" textcolor lt -1 rotate by -270
set yrange [ -1.00000 : 1.00000 ] noreverse nowriteback
set zlabel "Z axis"
set zlabel offset character -5, 0, 0 font "" textcolor lt -1 norotate
set zrange [ -5.00000 : 5.00000 ] noreverse nowriteback
sinc(u,v) = sin(sqrt(u**2+v**2))
GPFUN_sinc = "sinc(u,v) = sin(sqrt(u**2+v**2))"
x = 0.0
## Last datafile plotted: "$grid"
w = 2 * pi
splot [-10*w:10*w] [-10*w:10*w] sinc(u,v)

Of course, some of this is not relevant, since I cribbed it from a demo, but I can sort that later if required. The function name is left as "sinc" since it isn't important.
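For the multi-source pictures, the same function is presumably just evaluated from each source position and summed. Here's a minimal sketch of that idea in Python rather than gnuplot (purely so the numbers can be checked without plotting). The names `field` and `pair` are made up for illustration, distances are in wavelengths, and like the gnuplot function it ignores any fall-off of level with distance.

```python
import math

def field(x, y, sources, wavelength=1.0):
    """Snapshot of the summed sin(k*r) waves from every source (no 1/r fall-off)."""
    k = 2 * math.pi / wavelength
    return sum(math.sin(k * math.hypot(x - sx, y - sy)) for sx, sy in sources)

def pair(separation):
    """Two sources on the x axis, centred on the origin."""
    return [(-separation / 2, 0.0), (separation / 2, 0.0)]

# Sample a point on the line through both sources, well outside them.
# One wavelength apart: the paths differ by a whole wavelength, so the waves add.
on_axis_full = field(5.3, 0.0, pair(1.0))
# Half a wavelength apart: the paths differ by half a wavelength, so they cancel.
on_axis_half = field(5.3, 0.0, pair(0.5))
```

Along that line the one-wavelength pair gives twice the single-source value, while the half-wavelength pair cancels completely, which is the sort of spacing-dependent behaviour the pictures show.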

Now I'm trying to decide what fractional wavelength to use, and what the effect of adding sine waves is anyway - here's a plot of sin(x), sin(x-w/4) and their sum, overlaid on sqrt(2)*sin(x-w/8) - exactly the same shape of signal, but bigger and phase shifted by half the difference between the originals! Clearly the physical separation is important - can I see this?

sin(x), sin(x-w/4), their sum and sqrt(2)*sin(x-w/8)
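The overlay in that plot is an instance of the identity sin(x) + sin(x - p) = 2*cos(p/2)*sin(x - p/2). A quick numerical check, sketched in Python (the function names are only for illustration, and `w` is one full cycle as in the gnuplot script):

```python
import math

w = 2 * math.pi  # one full cycle, as in the gnuplot script

def summed(x):
    """The two original sine waves added together."""
    return math.sin(x) + math.sin(x - w / 4)

def single(x):
    """The single, larger sine wave they are overlaid on."""
    return math.sqrt(2) * math.sin(x - w / 8)

def amplitude(phase_difference):
    """Amplitude of sin(x) + sin(x - phase_difference)."""
    return 2 * math.cos(phase_difference / 2)

# Largest disagreement between the two curves over roughly 1.6 cycles.
max_error = max(abs(summed(i * 0.01) - single(i * 0.01)) for i in range(1000))
```

The same formula shows why relative phase matters so much: at a phase difference of w/2 the amplitude drops to zero, and at zero difference it doubles.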

Breakthrough - at least in how to think about this! Rather than be concerned about the specific frequency, and the actual physical separation of the bass units, I have merely to concern myself with the required relative distance between woofers to achieve a cardioid response. This is because the DSP can vary the phase/effective distance between woofers by frequency!! Cool... I wonder if Linkwitz has anything about this...
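One textbook way to get a cardioid from two sources (not necessarily what Kii do) is to invert the rear source and delay it by the time sound takes to travel the spacing between the two. A minimal far-field sketch in Python, assuming ideal point sources, a speed of sound of 343 m/s, and a made-up 0.1m spacing; `response` is a hypothetical helper name:

```python
import cmath
import math

C = 343.0  # assumed speed of sound in m/s

def response(theta_deg, freq_hz, spacing_m):
    """Far-field magnitude of a front source plus a delayed, inverted rear source.

    theta_deg is measured from straight ahead (0) round to directly behind (180).
    """
    k = 2 * math.pi * freq_hz / C
    half = k * spacing_m / 2 * math.cos(math.radians(theta_deg))
    delay_phase = k * spacing_m  # rear source delayed by spacing_m / C seconds
    front = cmath.exp(1j * half)
    rear = -cmath.exp(-1j * delay_phase) * cmath.exp(-1j * half)
    return abs(front + rear)

on_axis = response(0, 100, 0.1)
to_side = response(90, 100, 0.1)
behind = response(180, 100, 0.1)
```

With these assumptions the rear output cancels at every frequency, and the side output is about half the front output while the spacing is small compared with the wavelength, which is the cardioid shape. The on-axis level also falls as frequency drops, so some equalisation would be needed on top.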

Monday, 19 September 2016

DSP Filter thoughts

I'm reading lots about DSP-implemented filters...

Initial Post

This thread has a lot of useful thoughts from people who appear to be pretty clued-up.

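Linear phase - all frequencies are delayed by the same amount of time, so all bands are essentially in phase across all the outputs of an N-way crossover. However, I think (TBC) that the phase does still shift across frequencies; it's just that the shift grows in proportion to frequency (linear) rather than in some more complicated way (non-linear), which is what a constant time delay looks like. Also [1] says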

"Linearity in phase response requires symmetry in impulse response."

which is of course the gotcha that brings on the pre-ringing - see below.
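The symmetry point can be checked numerically. A small Python sketch, using a made-up 5-tap symmetric impulse response: if the filter's only phase effect is a fixed delay of (N-1)/2 samples, then undoing that delay should leave a purely real frequency response.

```python
import cmath

def freq_response(taps, omega):
    """Frequency response of an FIR filter at omega radians per sample."""
    return sum(h * cmath.exp(-1j * omega * n) for n, h in enumerate(taps))

taps = [1.0, 3.0, 5.0, 3.0, 1.0]   # hypothetical symmetric impulse response
delay = (len(taps) - 1) / 2        # expected constant delay, in samples

# Undo a pure delay of `delay` samples at a few frequencies;
# for a symmetric filter nothing imaginary should be left over.
residual = [freq_response(taps, w) * cmath.exp(1j * w * delay)
            for w in (0.1, 0.5, 1.0, 2.0)]
worst_imag = max(abs(r.imag) for r in residual)

# The same check on a lopsided (non-symmetric) response leaves a clear residue.
lopsided = [5.0, 3.0, 1.0, 0.0, 0.0]
lopsided_imag = max(abs((freq_response(lopsided, w) * cmath.exp(1j * w * delay)).imag)
                    for w in (0.1, 0.5, 1.0, 2.0))
```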

Minimum phase - frequencies are differentially delayed through a crossover, that's effectively how it works.

The choice of linear-phase (implemented using FIR approach) or minimum-phase (generally implemented using IIR approach) is potentially critical, although since I haven't heard the two compared in otherwise equal settings, I don't know how important it is to me.

The Rational Audiophile (RA) swears by the linear-phase, FIR-based, time-aligned approach. Logically, this appears to make sense - there's no way that time alignment can be ignored for a start. However, strong FIR-based filters have a significant amount of pre-ringing, which comes from the mathematics and construction of the filter. Pre-ringing (essentially, some amount of "pre-echo"-like noise preceding an impulse) is possible because FIR filters essentially "look into the future" to perform their work, i.e. if the current sample is X, then they will look ahead into the input from X for up to N samples, where N is the "number of taps" (a symmetric linear-phase filter looks about N/2 samples each way). Since the future can't really affect the present, in the real world the signal is delayed to make room for the lookahead: by (N-1)/2 samples for a symmetric filter, or by N samples if you allow for lookahead over the whole filter length. E.g. if the sample rate is 44kHz, then a 44 tap filter (in practice, they are usually much larger) will cause a delay of somewhere between 0.5 and 1msec (at least - actual processing takes real time too). So - the RA says he has a delay of 500msec, which is about 22k samples and implies FIR filters of at least 22k taps (nearer 44k if they are symmetric), which is a Big Number. What happens to the pre-ringing? Is it audible?
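To put rough numbers on the taps/delay relationship, here's a small Python sketch. It assumes a symmetric linear-phase FIR, whose latency is (N-1)/2 samples, and a 44.1kHz sample rate; the function names are invented for illustration and processing time is ignored.

```python
def linear_phase_delay_ms(taps, sample_rate_hz):
    """Latency of a symmetric N-tap FIR: (N - 1) / 2 samples, in milliseconds."""
    return (taps - 1) / 2 / sample_rate_hz * 1000.0

def taps_for_delay(delay_ms, sample_rate_hz):
    """Odd tap count of a symmetric FIR whose latency is closest to delay_ms."""
    return 2 * round(delay_ms / 1000.0 * sample_rate_hz) + 1

short_delay_ms = linear_phase_delay_ms(45, 44100)   # roughly half a millisecond
big_filter_taps = taps_for_delay(500, 44100)        # taps behind a 500 msec delay
```

On these assumptions a 500msec delay corresponds to about 44k taps, twice what a one-sample-per-tap estimate gives, so the figure depends on which convention is meant.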

There is a paper on this referred to by the forum posts above [1]. This claims

"The results of the listening experiment were analysed with auditory correlates of group delay distortion (phase errors) and smoothed third-octave spectrum (magnitude error). These correlates explain the results of the listening tests to some extent, but with high-order linear-phase FIR crossover filters, correlation seemed not to always exist. Thus auditory analysis that was based on the function of hearing was used for analysis. It seemed to show qualitatively the reasons for perceived phase errors.

It was discovered that high-order, linear-phase FIR crossover filters offer apparently ”ideal” properties in magnitude and phase reproduction for crossover filters, but they cause clearly audible degradations as ”ringing” in the audio samples, when the flight-time difference between low- and highpass outputs is not zero. The crossover frequency between low- and highpass bands being 3 kHz, it was noticed on the grounds of the listening experiments that filter orders above 600 produce audible errors with linear-phase FIR crossover filters. "

Hmm. Of course, I now need to read the paper, because I'm not sure what is meant by "flight-time difference", which I'm sure is explained within. "Filter orders above 600" refers, near enough, to the number of taps (an order-N FIR has N+1 taps) - 22K is a HUGE number compared with that! The minimum number of taps for a specified dB/octave roll-off is highly dependent on the frequencies involved too, so one could vary the taps for different bands, and then apply specific sample-count delays to the bands with fewer taps to keep everything time-aligned.
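A sketch of how those per-band delays might be worked out, in Python. The tap counts are invented purely for illustration, and it assumes every band uses a symmetric FIR with an odd number of taps, so each band's latency is a whole number of samples.

```python
def alignment_delays(band_taps):
    """Extra delay (in samples) each band needs to match the slowest band.

    band_taps maps a band name to its (odd) FIR tap count.
    """
    latencies = {band: (taps - 1) // 2 for band, taps in band_taps.items()}
    longest = max(latencies.values())
    return {band: longest - latency for band, latency in latencies.items()}

# Hypothetical three-way crossover: long filter for the bass, shorter ones above.
extra = alignment_delays({"bass": 2001, "mid": 401, "treble": 101})
```

Here the bass filter sets the overall latency (1000 samples), and the mid and treble bands get 800 and 950 samples of plain delay added respectively.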

Reading the Paper

The paper is a bit early postgrad, and written by someone whose first language is not English, although that's only an observation not a criticism, on the basis that I'm not sure if some of the ambiguities are intellectual or linguistic!

There's a great summary of the properties required of a crossover:

To conclude all these demands for crossover filters and their transfer functions, we end up with the following goals, as Linkwitz [6], and Lipshitz and Vanderkooy [7] did in their articles:
1. Flatness in the magnitude response. That is, the output signals from woofer and tweeter sum up to unity on the main listening axis; there are no dips or peaks at any frequency.
2. Adequately steep cutoff rates of the low- and highpass filters. This is to ensure that the drivers operate on their optimal range, and to minimize the interference between the drivers.
3. Phase difference is zero between the woofer output and the tweeter output at the crossover frequency. This prevents tilting in the loudspeaker’s radiation pattern.
4. Ideal polar response of the loudspeaker by having the same phase difference between outputs at all frequencies. That is, the reproduction of the loudspeaker is symmetrical as a function of angle and it requires the same group delay from low- and highpass filters.

Other points:

A considerable phenomenon with steep cutoff rates in the FIR case is the ringing of their off-axis response - how does this work? Why would there be a change in the physical speaker behaviour because of a steep cutoff? The screen shot shows the writer's simulation of:

3.21 An on-axis impulse response of two drivers summed - lovely!
3.22 Impulse responses of two drivers summed with 0.2msec delay between them i.e. off-axis. Hmm.

The temporal resolution of ear was discovered to be approximately 2.5 ms for signals having identical energy spectra and differenting only by phase. This experiment was done by Patterson and Green in 1970

OK... at 343 m/s that is about 0.86m (roughly 34 inches) of path length, which is a heck of a lot. Do we REALLY hear time-alignment, unless the difference is egregious?

Schouten has defined the recognition of an ”auditory object” to depend on the following factors [48]:

1. Sound periodicity in the range from 20 to 20000 Hz vs. noise-like, irregular sound
2. Waveform is constant vs. waveform fluctuates as a function of time; fluctuations are similar/dissimilar
3. Other aspect of sound, like spetrum or periodicity is changing as a function of time
4. What are the preceeding and the following sounds like?

From the audio engineer’s point of view, group delay distortions have been used as a measure of phase errors for a long time. The most comprehensive paper on the group delay matters in the audio field is probably the paper by Lipshitz, Pocock and Vanderkooy [14], supported by [15, 16, 17, 18, 19, 20, 21, 22]. Hoshino and Takegahara [49] have defined the permissible values for group delay at high frequencies to be roughly 2 ms from 10 kHz, though this has more importance in general audio processing than in crossover design, because usually the crossover frequencies are way below 10 kHz. A recent study defines the audible value of group delay to be 1.6 ms [22]. Møller et al. describe the group delay errors as ”ringing” or ”pitchiness” and importantly state that ”the ringing is detected in the individual ear and not as part of binaural processing”
Localization means judging the direction and distance of a sound event... There are differences in signals arriving at the ears, and the concepts of inter-aural time difference (ITD) and inter-aural level difference (ILD) between the ears are of the greatest importance.
"The delay between low- and highpass outputs was implemented to simulate the elevation of the listening angle, because most of the problems exist, when the vertical angle is changed. With quite a normal separation of drivers of a two-way loudspeaker (0.25 m), the delay was limited to 0-0.5 ms to simulate far-field elevation angles between 0 and 45 degrees"

Hmm - assuming 0.25m, 3m to listener, right angle triangle, path length is ~10mm different, or 0.03msec. Let's read on...
"The woofer was 78cm, and the tweeter was 92 cm above the floor" i.e. 0.14m different!
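The right-angle-triangle estimate and the paper's far-field figure can both be checked with a few lines of Python. This is only a sketch: it assumes a speed of sound of 343 m/s, puts the listener level with one of the two drivers, and the function names are made up.

```python
import math

C = 343.0  # assumed speed of sound in m/s

def near_field_difference(driver_spacing_m, listener_distance_m):
    """Extra path length (m) and delay (ms) to the further of two drivers."""
    extra_m = math.hypot(listener_distance_m, driver_spacing_m) - listener_distance_m
    return extra_m, extra_m / C * 1000.0

def far_field_delay_ms(driver_spacing_m, elevation_deg):
    """Delay (ms) between two drivers for a distant listener at a given elevation."""
    return driver_spacing_m * math.sin(math.radians(elevation_deg)) / C * 1000.0

extra_m, delay_ms = near_field_difference(0.25, 3.0)
paper_max_ms = far_field_delay_ms(0.25, 45)
```

With 0.25m between drivers and a listener 3m away, the extra path is a little over 10mm, or about 0.03msec. The far-field formula at 45 degrees gives roughly 0.5msec, which matches the upper end of the paper's 0-0.5 ms range.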

"The large number of different scenarios forced to cut out inaudible (to the author at least) samples to make the test reasonable in size and duration"

Aha - so the samples were pre-filtered by the author to simplify the test, which assumes the author can hear every difference the other listeners can (and not the other way round).

"Qualitative inspections from the subjects suggested that the ringing of FIR crossovers was highly critical to the listening place. Slight changes in subject’s head position could make the phenomenon either audible or inaudible. There were also differences between test subjects. Remarkable is that the errors were clearly audible with a real signal and a real loudspeaker in a listening room. This suggests carefulness in designing and using digital crossover filters, especially linear phase FIR crossovers with higher orders."

He's using delays of 0.03msec, i.e. about 10mm difference in woofer/tweeter path length, on a 2000 tap FIR in the simulated test
So it's not surprising that small head movements can make a big difference in the loudspeaker case

Given what this paper says, you'd have to be deficient if you weren't to consider some of the points, at least to the extent of seeing what makes a difference for you.

Thursday, 15 September 2016

Commercial options

Whilst researching this digital audio business, I've bumped into a number of interesting commercial options in the active speaker area - thanks largely to therationalaudiophile.wordpress.com and comments therein.

Kii Audio THREE

Kii Audio provide the Kii THREE, which is quite a tour de force in the active speaker arena. Kii was formed by Bruno Putzeys (MolaMola, Hypex, Grimm - a real BSD in the amp and digital processing sphere), Bart van der Laan (DSP expert) and a couple of other guys for product management and production control.

Very small for such a premium product (16"x16"x8" or so), each box contains 6 (yes, that's 6!) 250W Class D amps, 4 woofers (two at the back, one each side), a mid and a tweeter (both in the front). The DSP is the heart of things, providing crossovers, equalisation and a "phased array" mechanism for the woofers that tailors the bass off-axis output to suit the speaker's location or your taste. There's also a "motional feedback" mechanism on the woofers to correct distortion. Pretty hairy stuff, and obviously only possible in the digital domain.

I went to hear the Kii THREEs (not sure how important the Capitalisation is but all their literature uses them ;-) ) at Purite Audio, Keith Cooper's home-run audio dealership. Excellent guy, yes a dealer, been in the business a relatively long time, but at least pretty open and friendly in person. A house full of interesting audio! We listened to the Kiis for maybe 4 hours (!), sitting on a comfy couch, my selection of music (Janet Baker, Mahler, Joe Jackson's Body and Soul (a revelation!), and others). I was massively impressed by the capability of the speakers - see comments below.

Good things

Wonderful natural non-boxy sound
Great imaging
Very natural unforced bass
Ran ridiculously loud with no strain or effort (had to turn down after we realised we couldn't hear each other!)
Real people in real acoustic spaces for those recordings that attempt to portray that

Drawbacks

£8000+! But then, you wouldn't need any preamps or amps...
Only accept AES/EBU (professional standard digital wire protocol) at the moment, so require something to convert e.g. Weiss INT202 or similar, which Keith reckoned he'd throw in (not insignificant, maybe £800 retail!). There is a Grimm-alike desktop console for other input protocols coming apparently, but no idea when.
Possibly small hot spot, at least in the room and conditions I auditioned - Keith didn't get the same impression sat on the other end of the couch - no different from Quads then!
Is that all?! Where's my credit card...

So, a revelatory experience. Fortunately I controlled myself, and ran away before ordering on the spot. However, these are definitely on the list. I could sell the Chords, the Quads (for £200 probably!) and presumably the Naim CDS2 which I won't eventually need. Cheap!

Code Acoustics System-1

CONTROL-1 DSP/Amp Box

SYSTEM-1 Speakers

Descriptions available here, these look a bit more home-made than the Kiis, and there's no provenance for the designer/builder. Basically, a separate 6-channel DSP/Amp box, with "normal" boxes and speakers, I assume, since there's nothing on the website to indicate otherwise. About £5k, with a money back guarantee, not unexpected from someone trying to launch a business and product with no provenance. The Control-1 has a bigger range of inputs, USB, AES, SPDIF, TosLink and 6? XLR balanced analogue inputs. Not sure what those last are for, I guess all will become clear shortly.

I haven't heard these yet, but I'm expecting to arrange a home demo with the manufacturer/designer/sales person Ceri shortly. Ooh!! My initial reaction is "how different are these to something I could build myself?" And then again, how necessary is the Kii bass directionality control? And all of these are more expensive than the Linkwitz LX-Mini with his subwoofers added, which I also haven't heard, and which does its in-room behaviour control with the physical characteristics of the chassis speakers and enclosures.

Other Choices

There are lots more active options around, most of which I haven't really explored, for various reasons

Grimm LS-1 - also available from Purite, but EXPENSIVE!!
Manger - also more expensive, and not sure they are better than Kiis, for example, although not listened to them
Genelecs - more pro and probably expensive
Devialet Phantom - Not seen any of these around
etc...!