Experimental G-Format Decodes

By Richard Elen, updated May 2006

A paper, "Getting Ambisonics Around", detailing the process used to render various source material including these examples is available here (PDF, 953K).

New: download an extract of the Nimbus recording of Elgar's Cello Concerto in DTS or DTS-CD format.

Introduction

For the past several months, Peter Carbines and I have been developing a workflow for decoding UHJ and B-Format files to planar speaker feeds, "Nimbus style", and then encoding them to DTS-CD format. DTS-CD is a convenient, inexpensive and widely playable distribution format, despite the disadvantage of employing lossy compression.

We started off with archive 2-channel UHJ sources, but we have now extended the work to include B-format content, and as a result I thought I might share some of the test results we've come up with so far and ask for comments, especially from the people who created the original source material.

The material is from AmbisonicBootlegs and Angelo Farina's "Public B-Format" sources - and many thanks to Etienne and Angelo for making these available - plus one file sourced from 2-channel UHJ from our own archives. I am assuming that there are no copyright issues in presenting these experiments based on the above sources. If there are, I will delete the file(s) as soon as I am informed of the problem. Theoretically the KPM piece is protected by traditional copyright, but I am assuming that as long as this is used for private study and research and not made available commercially, there will not be a problem.

The files are DTS-encoded 'pseudo-wav' files, and to listen to them the simplest thing to do is to use the free, cross-platform VLC Media Player. With version 0.8.5 at least, you can open a DTS-WAV or raw ".dts" file directly and play it, and it will use the loudspeaker configuration you have defined in your system. Alternatively, burn one or more of the files on to a CD and play that back via any CD or DVD player with a digital output into a DTS-equipped home theatre surround processor.

There are also a couple of other ways of going about it.

Hypercube Software offers a free tool, "wav2dts.exe", for converting DTS-WAV files (like those available here, produced by DTS-CD encoders like SurCode Pro DTS) into regular DTS streams (".dts" files) that can be played with a DTS-enabled computer-based media player such as PowerDVD, or even by Windows Media Player if a proper DTS-decoder filter has been installed. Note that the Hypercube site is a bit idiosyncratic and you might have to wait a while before you can start downloading the utility. The usage is simple: from the command line, type wav2dts filename.wav and hit return, and the tool will generate a file called filename.dts, which contains the raw DTS data and will be a little smaller than the original file.

Another possibility, for owners of Nero CD-burning software (this may also be true of other equivalent apps), is to burn a .NRG image file of an audio CD employing Nero's Image Recorder, and mount it with Nero Imagedrive. Both PowerDVD and WinDVD will play this 'virtual CD' without problems, and a 'real' CD burner is not required.
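
For anyone curious about what is inside these 'pseudo-wav' files, the following Python sketch looks for a DTS sync word in a block of bytes taken from the audio payload. It is only an illustration and is not part of wav2dts or any other tool mentioned here; the function name and the search limit are assumptions made for the example. The four byte patterns are the commonly documented DTS core sync words in their 16-bit and 14-bit, big- and little-endian forms.

```python
# Commonly documented DTS core sync words, keyed by their on-disk byte pattern.
DTS_SYNC_WORDS = {
    b"\x7f\xfe\x80\x01": "16-bit big-endian",
    b"\xfe\x7f\x01\x80": "16-bit little-endian",
    b"\x1f\xff\xe8\x00": "14-bit big-endian",
    b"\xff\x1f\x00\xe8": "14-bit little-endian",
}


def find_dts_sync(payload, search_limit=8192):
    """Return (offset, description) of the earliest sync word, or None.

    Only the first `search_limit` bytes are examined (an arbitrary choice
    for this sketch).
    """
    window = payload[:search_limit]
    earliest = None
    for pattern, description in DTS_SYNC_WORDS.items():
        position = window.find(pattern)
        if position != -1 and (earliest is None or position < earliest[0]):
            earliest = (position, description)
    return earliest
```

Audio data read from a DTS-WAV file would be expected to report a sync word close to its start, whereas ordinary PCM audio normally would not.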

Goals

The purpose of these experiments has been to develop and evaluate a process for generating widely-distributable, low cost planar surround recordings rendered from Ambisonic (2-channel UHJ and planar B-Format) source material - without the traditional challenge of requiring the listener to have access to an Ambisonic decoder - by delivering Ambisonically-decoded speaker feeds (commonly referred to as "G-Format") using a commonly-available surround distribution format.

Our primary original involvement with Ambisonics was to develop technology for creating Ambisonic mixes from multitrack sources, and as a result we were particularly concerned that our method would work well for such material, requiring as it does a generally more precise level of localisation than may be expected of a natural recording of an acoustic event. Indeed, the work described here is part of a larger project to establish accessible and effective methods of producing and distributing Ambisonic recordings, particularly of mainstream, widely popular material, which by definition would be mixed from conventional multitrack sources.

Due to its accessibility and affordability, the DTS CD was chosen as a distribution medium. Although it is lossy compared to, for example, MLP/DVD-A technology, DTS CD is a very popular medium and is playable by virtually anyone with a 5.1 system. In addition, the DTS encoding process has previously been shown to carry Ambisonic localisation information effectively. It has also been used experimentally for this purpose before, though not to our knowledge with a file-based "off-line" workflow.

It is appreciated that presenting material in this way is subject to a number of compromises, notably:

Degradation of audio quality and possibly localisation accuracy as a result of employing lossy audio compression
Degradation of localisation accuracy as a result of lack of congruence between the listener's speaker array and the decode target array

Needless to say, this process does not support height information, so where present it has regrettably had to be discarded.

However, in our view the results are at least worth a listen.

Workflow Outline

Most of the source material was originally 16-bit, sampled at 44.1 kHz. In these cases the workflow is fairly straightforward. In the case of the B-Format material, the sources were individual multichannel WAV files, including those with an ".amb" extension conforming to the WAVE-EX Ambisonic file format definition, giving the following workflow:

source -> 'disentangle' to mono WAV files with channelx -> four-square B-decode -> DTS CD encode

In these cases, the signal is treated as 16-bit throughout.
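
The 'disentangle' step can be pictured with a short Python sketch. This is not channelx, merely an illustration of de-interleaving. It assumes the PCM frames have already been read from the file as raw bytes, with the channels interleaved in the order W, X, Y (and Z where present); the function name is hypothetical.

```python
import struct


def deinterleave(frames, n_channels, sample_width):
    """Split interleaved PCM bytes into one bytes object per channel.

    frames       -- raw interleaved sample data
    n_channels   -- e.g. 3 for planar B-Format (W, X, Y)
    sample_width -- bytes per sample (2 for 16-bit, 3 for 24-bit)
    """
    frame_size = n_channels * sample_width
    if len(frames) % frame_size != 0:
        raise ValueError("data does not contain a whole number of frames")
    channels = [bytearray() for _ in range(n_channels)]
    for start in range(0, len(frames), frame_size):
        for index in range(n_channels):
            offset = start + index * sample_width
            channels[index] += frames[offset:offset + sample_width]
    return [bytes(channel) for channel in channels]


# Two frames of 16-bit W, X, Y samples: (1, 2, 3) then (4, 5, 6).
interleaved = struct.pack("<6h", 1, 2, 3, 4, 5, 6)
w_data, x_data, y_data = deinterleave(interleaved, n_channels=3, sample_width=2)
```

Keeping the channels in a fixed, named order in code like this is one way to reduce the risk of accidentally transposing files later in the chain.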

In the case of John Leonard's Stanbrook Abbey recording, which is 24-bit at 48kHz sampling, there are a few differences:

source -> 'disentangle' to mono WAV files with channelx -> r8brain SRC to 44.1/24 -> four-square B-decode with 24-bit output -> DTS CD encode

Note that while a CD is of course 44.1/16, the DTS encoder will encode up to 24 bits of information in the source file. As a result, for this track the output word length of the B-format decode was set to 24-bit. For the other files the decode output word length was unspecified, and in this case the decoder outputs the same word length as it is given, so the decoded files are 16-bit.

Unfortunately, the r8brain sample rate converter generates an error when presented with a ".amb" file, so the SRC operation had to be carried out on the individual mono files after disentangling. In fact it could be said that solely within the context of these experiments, the ".amb" format is a distinct disadvantage, adding one additional stage and complicating a second. On the other hand, there are numerous opportunities for error at every stage in this process, notably transposition of source files at every step following the untangling process, because the apps involved (or those that follow them) cannot handle multichannel files. Hopefully accidental transpositions have not occurred in these examples.

The UHJ example has a similar, but much more streamlined workflow:

analogue tape -> 2-ch digital, 44.1kHz/<=24-bit -> UHJ-Decode -> DTS CD encode

Decoder parameters

The Meridian Audio command-line decoder application used in these experiments is capable of rendering 2-channel UHJ and planar B-Format WAV files to four speaker feeds corresponding to the corners of a rectangle (LF, RF, LB, RB). Following the lead of Nimbus and their experiments, we decoded to a square array, which is the default aspect ratio.
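
To give a feel for what a four-square decode does, here is a minimal Python sketch of a basic, frequency-independent first-order decode to a square array. It is not the Meridian decoder, whose internals may differ in every detail, and practical decoders typically add frequency-dependent processing that is omitted here. The sketch assumes the conventional B-Format scaling in which W carries the signal at -3 dB, azimuths measured anticlockwise from centre front, and speakers at the corners of a square. All names are hypothetical.

```python
import math

# Assumed speaker azimuths in degrees, anticlockwise from centre front.
SQUARE_AZIMUTHS_DEG = {"LF": 45.0, "RF": -45.0, "LB": 135.0, "RB": -135.0}


def encode_b_format(sample, azimuth_deg):
    """Encode one mono sample to planar B-Format (W, X, Y)."""
    phi = math.radians(azimuth_deg)
    return sample / math.sqrt(2.0), sample * math.cos(phi), sample * math.sin(phi)


def decode_square(w, x, y):
    """Basic decode of one W, X, Y sample to four speaker feeds."""
    feeds = {}
    for name, azimuth_deg in SQUARE_AZIMUTHS_DEG.items():
        theta = math.radians(azimuth_deg)
        directional = x * math.cos(theta) + y * math.sin(theta)
        feeds[name] = 0.25 * (math.sqrt(2.0) * w + 2.0 * directional)
    return feeds


# A unit-amplitude source placed exactly at the left-front speaker.
feeds = decode_square(*encode_b_format(1.0, 45.0))
```

With this particular decode the four feeds always sum to the original sample, and a source placed at a speaker position produces its largest feed from that speaker, with a small out-of-phase contribution from the opposite corner.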

It has been suggested that in fact the majority of home listeners with surround replay capability do not have an ITU-style 5.1 array, with the rear speaker angle significantly wider than the front, and that instead they tend towards a rectangular array, although the listening position may generally be too far back (Farina et al). Our own limited investigations strongly support this view, and this does not come as a surprise: Ambisonics was originally designed with practicality of installation in an average living room as a significant requirement, and the original basic rectangular array reflects this. As a result we feel confident that decoding to a rectangular array will successfully minimise errors of congruence between target decode and listener speaker positions.

However, we cannot tell whether listeners will have a rectangular array with an aspect ratio greater than 1 (longer, front-to-back, than wide) or less than 1. The square target array is thus a useful compromise and has been shown in tests by Nimbus Records to produce the most broadly applicable result.
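
The relationship between aspect ratio and speaker angle can be made concrete with a little geometry. The Python sketch below assumes a listener at the centre of the rectangle and takes the aspect ratio to be depth (front to back) divided by width, as described above; the function name is hypothetical and this is not how any particular decoder is configured.

```python
import math


def corner_azimuths(aspect=1.0):
    """Return speaker azimuths in degrees (anticlockwise from centre front)
    for a rectangle whose depth/width ratio is `aspect`."""
    if aspect <= 0:
        raise ValueError("aspect ratio must be positive")
    # Half-width is 1 unit, half-depth is `aspect` units.
    front = math.degrees(math.atan2(1.0, aspect))
    rear = 180.0 - front
    return {"LF": front, "RF": -front, "LB": rear, "RB": -rear}
```

A square (aspect 1) puts the front speakers at 45 degrees either side of centre; a deeper room narrows that angle and a wider one opens it out, so a square target sits between the two cases.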

In fact it should be noted that in much of this work we are following in the footsteps of Nimbus Records, though they have somewhat different goals and employ DVD-Audio as the release format, and we are extremely grateful for their support and in particular that of Caractacus Downes, along with that of Rhonda Wilson and others at Meridian Audio.

The decoder does not have provision for a centre (front) channel, and in our view, as Ambisonics already offers excellent front stage imaging, the only down-side of omitting a centre channel is that listeners might expect to hear a signal from it. This concern can be alleviated by noting it in the documentation accompanying a recording (eg sleeve notes).

In addition, we left the "row" setting at default: this is a parameter for selecting forward dominance, based on the idea of defining a row of seats in a concert hall. We used Row A. Some historical Ambisonic recordings made with a Soundfield mic or equivalent array have been criticised for being too ambient when replayed in stereo (undecoded UHJ). To alleviate this, it may be that some recordists have taken to placing the mic closer to the musicians than might normally be the case. If the row value is increased, any feeling of being too close to the musicians when listening to the G-Format may be alleviated, but we have yet to do any work in this area.
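
For readers unfamiliar with dominance, the following Python sketch shows one widely quoted formulation of forward dominance applied to a planar B-Format sample. It is offered only as background: the parameter lam is hypothetical, no mapping from the decoder's row letters to a value of lam is implied, and the decoder used here may implement the idea quite differently. The sketch assumes the conventional scaling in which W carries the signal at -3 dB.

```python
import math


def forward_dominance(w, x, y, lam):
    """Apply forward dominance to one W, X, Y sample.

    lam > 1 emphasises sounds from the front, lam < 1 emphasises the rear,
    and lam == 1 leaves the signal unchanged.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    total = 0.5 * (lam + 1.0 / lam)
    difference = lam - 1.0 / lam
    w_out = total * w + difference * x / math.sqrt(8.0)
    x_out = total * x + difference * w / math.sqrt(2.0)
    return w_out, x_out, y
```

In this formulation a source at centre front is scaled by lam and a source directly behind by 1/lam, which shifts the balance between the front and the rear of the soundfield.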

Test material

The material is as follows:

Widor - Kyrie (from Messe pour 2 Choeurs et 2 Orgues) - Paul Hodges
Duruflé - Ubi Caritas (from Quatre Motets, Op10) - Paul Hodges
Langlais - Sanctus (from Messe Solennelle) - Paul Hodges
Stanbrook Abbey Choir - John Leonard
Midas Studios Choir
Midas Studios String Quartet
Spanish Flea - Henry Walmsley
Tijuana Taxi - Henry Walmsley
"Organ BG"
Tony Hymas - Saturn - special mix

Elgar Cello Concerto extract (Nimbus) - This item has its own page with production and background information

The Walmsley pieces are synthesised and as a result have very precise localisation. 'Spanish Flea' for example includes an element that pans slowly around the room and is a good test that everything is behaving itself.

The 'organ bg' item has a good sense of a large space and includes a number of extraneous sounds which localise very nicely. I also like the Midas items as everything is very up-front and locations are easily identified.

John Leonard's Stanbrook Abbey recording has some of the best localisation I've found on an SFM-style recording, and Paul Hodges' Exeter College recordings are very impressive: 'Sanctus' and 'Kyrie' in particular make my hair stand on end!

For comparison, we have included an additional file sourced from 2-channel UHJ. This is a special mix of the Tony Hymas composition 'Saturn' (KPM Music) which was mixed from multitrack in early 1981 for the demonstration of the original Boots Ambisonic Microsystem at the Cunard Hi-Fi Show in London that year. Intended to open the demo, it has some deliberately excessive effects, starting in mono and ending up in rather exaggerated surround. It is one of the first-ever Ambisonically-mixed recordings and was made with the prototypes of the Audio & Design Ambisonic Mastering System: while it doesn't have the subtlety of mixes done with the production units (which had additional features), it at least has some antique value.

Obviously, apart from the last of these, we were not present at the recordings, and as a result we would be most grateful for any observations, especially from those who made the recordings we've used - and thanks again to all those who made them available.

The files are mostly 15-30MB in size. As they are DTS-encoded and behave like noise, ZIPping them does not save a great deal of space.
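
The point about ZIPping can be illustrated with a small Python sketch using the standard zlib module, which implements the same DEFLATE method commonly used in ZIP files. Random bytes stand in here for DTS-encoded data; that substitution is an assumption made for the sake of the example, and real files may compress very slightly.

```python
import os
import zlib


def compression_ratio(data):
    """Return compressed size divided by original size (1.0 = no saving)."""
    return len(zlib.compress(data, 9)) / len(data)


noise_like = os.urandom(100_000)   # stand-in for DTS-encoded data
digital_silence = bytes(100_000)   # highly redundant data, for contrast

noise_ratio = compression_ratio(noise_like)
silence_ratio = compression_ratio(digital_silence)
```

The noise-like data comes out at essentially its original size, while the redundant data shrinks to a tiny fraction of it.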

What exactly is G-Format?

These examples essentially represent "G-Format" renderings as discussed elsewhere on this site. Strictly speaking, the term "G-Format" applies to 5.1 speaker-feed decodes of B-Format source material. However, the process for rendering 2-channel UHJ content is virtually identical to that used for B-Format, and is essentially a special case: 3-channel UHJ recordings, if any exist, would offer exactly the same amount of data as a planar B-Format recording, the difference between 2- and 3-channel UHJ being a single channel of information. As a result, it seems a little unnecessary to think up a different, special label just for this purpose, and we tend to describe renderings of 2-channel UHJ to speaker feeds as "2G-Format", generalising the meaning of "G-Format" in the process.

Thus we propose that the term "G-Format" be expanded to describe generally a set of speaker feeds rendered from an Ambisonic recording. If one wishes to be more specific, the source could be specified as B (for a B-Format source: "BG-Format") or 2 (for a 2-channel UHJ source: "2G-Format"). One could extend this further and also specify the base destination array to some extent: "2G4" would represent a 2-channel UHJ source rendered for a 4-speaker array as used here, for example.
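
As a compact restatement of the proposed naming, the Python sketch below builds a label from a source code and an optional speaker count. The function name and its argument names are hypothetical, and the only forms taken from the text above are "G-Format", "BG-Format", "2G-Format" and "2G4"; anything else it produces is an extrapolation of the same pattern.

```python
VALID_SOURCES = ("", "B", "2")  # unspecified, B-Format, 2-channel UHJ


def g_format_label(source="", speakers=None):
    """Build a label such as 'BG-Format' or '2G4'."""
    if source not in VALID_SOURCES:
        raise ValueError("source must be '', 'B' or '2'")
    if speakers is None:
        return source + "G-Format"
    if speakers < 1:
        raise ValueError("speaker count must be positive")
    return source + "G" + str(speakers)
```

The files presented here would thus be "BG4" for the B-Format sources and "2G4" for the UHJ-sourced item.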