
Exploring Spatial Audio for Immersive Experiences
Spatial audio, in simple words, is an imitation of how we hear sound in our everyday environment: we can point out the direction of a sound source, tell how far away it is, predict the direction of its movement, and so on. When combined with immersive experiences such as VR games and films, spatial audio can provide a more authentic imitation of the real world. Apart from realism, it can also be used for expressive purposes.
Modern sound reproduction techniques allow us to record spatial audio, store it and play it back whenever we wish. Several hi-fi techniques, such as binaural recording and reproduction systems, make this possible.
In this project, I set out to understand the concept of spatial audio and how it is produced. Along the way, I did numerous experiments that helped me understand the implementation side of spatial audio on a digital platform. I then compiled my learnings into a final deliverable: a short VR music video that uses spatial audio.
Exploration Outcome
The video above shows the final outcome of this exploration: a music video I composed that utilises spatial audio and VR.
Music composition, audio processing, video direction: Priyank Sagar
Vocals: Samanda Pyngrope
Cast: Anjali Singh, Priyank Sagar
Camera, video editing: Amal Dev Chandran
Aim
This project aimed to study various technical aspects of sound and audio production and to find their application in spatial audio. This should help find methods to make an immersive experience more engaging, expressive and realistic.
These technical aspects include:
- Audio signals and terminology: bit depth, sample rate, etc.
- Electronic sound generation and music production: DAWs, virtual instruments, etc.
- Synthesis: synthesizers, VSTs, Audio Units, etc.
- Studio: construction, acoustics
- Hardware: preamp, audio interface, ASIO PCI, DAC, monitors, mic, etc.
- Recording, sampling, mixing, mastering
- Binaural recording
- Mapping sounds in an immersive environment
Project deliverables were:
- Producing audio for a 360 film.
- Experiments depicting different methods of producing spatial audio, and their documentation.
Basics of Sound and Audio Engineering
I started by learning about the fundamentals of sound and audio engineering with the help of existing literature.
Getting Used to New Workflow
Following that, I used online resources to learn about the tools that were necessary for this exploration.
Throughout this process, I was faced with several questions about the engineering side of sound, such as: in FB360 (a spatialisation tool by Facebook, now Meta), what are the different ways to effectively distinguish sources in the vertical plane? Or, how do sound sources imitate distance when the FB360 plugin is used?
The next step was to conduct experiments to clarify these questions.
Tests and Experiments
I started experimenting to try out my ideas and understand the concepts better. I compiled these tests and published them online so that others can make use of my learnings.
Following are the experiments I conducted.
Test 1 - Front, Back, Left, Right: 4 Channels, 4 Metres
In this test, samples of sounds (speech, drums, piano, exhaust fan) were placed in four directions (front, back, left and right).
This test was meant to get a tentative idea of how easily one can point out the direction of a sound source, and of the resolution of the transition between positions. As YouTube was going to be the publishing platform, I also had to find out about its first-order ambisonics capabilities.
First-order ambisonics is a spatial audio format with four channels. In a recording setup these typically come from four microphone capsules placed close together in a tetrahedral array, and together the channels capture sound arriving from every direction around the listener: front, back, left, right, top and bottom.
Typically, such recording setups cannot mimic how the shape of the ear reflects and filters sound, which makes it difficult to tell sources apart along the front/back and top/bottom axes. This led me to the next test.
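To make the format concrete, here is a minimal sketch of encoding a mono signal into the four first-order channels by weighting it with the gains for a chosen direction. It is written in Python/NumPy using the ACN/SN3D ("ambiX") convention that YouTube's spatial audio expects; it is an illustration of the format, not part of my production chain.

```python
import numpy as np

def encode_foa(mono, azimuth_deg, elevation_deg):
    """Encode a mono signal into first-order ambisonics (B-format).

    Channel order W, Y, Z, X with SN3D normalisation (ambiX convention).
    Azimuth is measured anticlockwise from the front, elevation upward
    from the horizontal plane.
    """
    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    w = mono                                # omnidirectional component
    y = mono * np.sin(az) * np.cos(el)      # left(+) / right(-)
    z = mono * np.sin(el)                   # up(+) / down(-)
    x = mono * np.cos(az) * np.cos(el)      # front(+) / back(-)
    return np.stack([w, y, z, x])

# Example: a 1 kHz tone placed directly to the listener's left
fs = 48000
t = np.arange(fs) / fs
tone = 0.5 * np.sin(2 * np.pi * 1000 * t)
b_format = encode_foa(tone, azimuth_deg=90, elevation_deg=0)  # shape (4, fs)
```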
Insights
This test let me judge the spatialising resolution of YouTube’s first-order ambisonics, i.e. how well its mapping and head tracking work in the horizontal plane. It also helped me figure out the basics of FB360 and the things I needed to keep in mind during the process.
Test 2 - Front, Top, Back: Source Distinguishing Test, 4 Metres
In this test, I experimented with different settings of Facebook’s FB360 plugin to see which setting helps distinguish between sound sources in the vertical plane. No additional effects were applied, and the mix volume is the same across all three settings. The image on the right shows the parameters I experimented with.
Insights
This test was helpful in trying out ways in which FB360 can best be used to distinguish sound sources in the vertical plane.
The methods I used are not entirely sufficient to mimic how sound is reflected and filtered by our ears. Several binaural microphones try to mimic this, but since the shape of the ear is unique to each individual, it is hard to build a single ear model that works for everyone. Research on this factor, especially for the front/back and top/bottom directions, is still ongoing.
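For intuition on why vertical separation is hard in this format: in first-order ambisonics the only thing that distinguishes the front, top and back positions used in this test is how the signal is weighted across the X (front-back) and Z (up-down) channels; there are no pinna-like spectral differences. A small sketch, using the same encoding convention as the earlier example:

```python
import numpy as np

def foa_gains(azimuth_deg, elevation_deg):
    """Per-channel gains (W, Y, Z, X order, SN3D) for a given direction."""
    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    return {"W": 1.0,
            "Y": np.sin(az) * np.cos(el),
            "Z": np.sin(el),
            "X": np.cos(az) * np.cos(el)}

for name, (az, el) in {"front": (0, 0), "top": (0, 90), "back": (180, 0)}.items():
    print(name, foa_gains(az, el))
# front: X = +1, Z = 0;  top: X = 0, Z = +1;  back: X = -1, Z = 0
# The W (omni) channel is identical in all three cases and no spectral
# (pinna-like) cue is added, which matches the difficulty described above.
```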

Test 3 - Distance
In this test, I wanted to find out how different sources sound when FB360 is used to create a sense of distance versus when mere volume control is used.
Insights
There is indeed a difference in the tonality of the sound when FB360 is used instead of plain volume control to convey distance. FB360’s distance algorithm changes more than just the overall level of the sound.
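FB360's exact distance model isn't reproduced here; the sketch below only contrasts the two general approaches, assuming a simple inverse-distance gain and an illustrative air-absorption-style low-pass whose cutoff falls with distance.

```python
import numpy as np
from scipy.signal import butter, lfilter

def distance_by_gain(signal, distance_m, ref_m=1.0):
    """Naive approach: only attenuate the level with the inverse-distance law."""
    return signal * (ref_m / max(distance_m, ref_m))

def distance_with_filtering(signal, distance_m, fs=48000, ref_m=1.0):
    """Attenuation plus a gentle low-pass that mimics the loss of high
    frequencies (air absorption) as distance grows. Illustrative only;
    not FB360's actual algorithm."""
    gain = ref_m / max(distance_m, ref_m)
    # cutoff falls from ~20 kHz nearby towards ~4 kHz at 50 m and beyond
    cutoff = float(np.interp(distance_m, [ref_m, 50.0], [20000.0, 4000.0]))
    b, a = butter(2, min(cutoff, 0.45 * fs), btype="low", fs=fs)
    return gain * lfilter(b, a, signal)
```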
Test 4 - Motion
Using automation in REAPER to generate a feeling of motion and tracking.
In this test, I played with REAPER’s automation settings for the FB360 plugin to mimic sound sources in motion. A single sample track was used for this.
Insights
This test helped me understand the limitations of the FB360 plugin when it comes to tracking the motion of a sound source. One can’t rely on FB360’s automatic tracking feature alone for small objects; writing automation and changing the position manually is a better choice in such scenarios.
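In REAPER this amounts to drawing automation envelopes for the panner's azimuth and elevation over time. Purely to illustrate the underlying idea, here is a sketch (same first-order gains as the earlier example, but with a direction that changes every sample) of a source that circles the listener; the function name and parameters are mine, not part of any plugin.

```python
import numpy as np

def encode_circling_source(mono, revolutions=1.0):
    """Encode a mono clip so the source circles the listener in the
    horizontal plane (W, Y, Z, X gains as before, but the azimuth now
    changes every sample instead of being fixed)."""
    az = np.linspace(0.0, 2.0 * np.pi * revolutions, len(mono))
    w = mono
    y = mono * np.sin(az)          # sweeps the source through left and right
    z = np.zeros_like(mono)        # elevation stays at 0
    x = mono * np.cos(az)          # sweeps it through front and back
    return np.stack([w, y, z, x])

# e.g. one full lap around the listener over a 4-second clip at 48 kHz
fs = 48000
clip = 0.3 * np.sin(2 * np.pi * 220 * np.arange(4 * fs) / fs)
moving = encode_circling_source(clip, revolutions=1.0)
```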
Test 5 - Music Introduced
In this test, I compiled a set of four electronically generated beats with the help of synthesizers and distributed them across four different channels. I then spatialised them and used automation to vary their motion and distance over time.
The VSTs I used for generating the sounds were Arturia Analog Lab and VCV Rack.
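For context on what distributing beats across channels and spatialising them amounts to: a first-order ambisonic mix is simply the sum of the individually encoded stems. The sketch below uses stand-in tones instead of the actual synth parts, and the placements are hypothetical.

```python
import numpy as np

def encode_foa(mono, azimuth_deg, elevation_deg):
    """First-order encode (W, Y, Z, X, SN3D), as in the earlier sketch."""
    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    return np.stack([mono,
                     mono * np.sin(az) * np.cos(el),
                     mono * np.sin(el),
                     mono * np.cos(az) * np.cos(el)])

fs = 48000
t = np.arange(4 * fs) / fs
# stand-in tones in place of the actual synth stems
stems = {name: 0.2 * np.sin(2 * np.pi * f * t)
         for name, f in [("kick", 60), ("snare", 200), ("hats", 4000), ("lead", 440)]}
# hypothetical placements: (azimuth, elevation) in degrees
positions = {"kick": (0, 0), "snare": (180, 0), "hats": (90, 0), "lead": (270, 30)}

# the spatial mix is the sum of the individually encoded stems
b_format_mix = sum(encode_foa(stems[k], *positions[k]) for k in stems)
```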
Insights
This test helped me understand the role of psychoacoustics. There was a disconnect between the visuals and the sound, and only after this test did I understand how much music in a spatialised environment depends on the visuals. Without proper semantics, a spatialised sound simply becomes noise and makes no sense. One needs to understand what a sound resembles and what impact it will have on the listener’s perception with respect to the visuals being played on the screen.
Recording Studio Visit
Visiting Auralmayhem, Mumbai (India)
I got an opportunity to meet Mr. Nitin Gera, founder of Auralmayhem Studios, Andheri. The meeting was very helpful; I realised how much there is still to learn and how vast this new medium is. We talked about psychoacoustics and how important it is to first observe and understand how humans perceive different kinds of sounds; how overdoing spatialisation can at times be overkill; how a sound’s frequency relates to its directionality (very low frequencies are effectively direction-less, and the higher the frequency, the more directional it becomes); and about point sources versus sources spread over a space. I learnt many details that had gone missing in my studies. One first needs to observe sounds in the real world, then understand the basics of sound engineering, and only then come to spatial audio, since spatial audio is one step above these basics.
Application
Storytelling and Spatial Audio
Conducting the above tests helped me clear up several confusions, identify software limitations and find ways to work around them. It also sensitised me to how one should, and should not, apply spatial audio in an environment. Following is how, with my current understanding of the subject, I think spatial audio can be used in immersive media and affect storytelling:
- It helps us imitate our day-to-day surroundings, making the virtual environment more realistic.
- We can play with this imitation and alter it to give the virtual environment an atypical feel.
- Spatial audio is being used in numerous fields other than VR because of its ability to guide people through space using sound.
- The same visuals, paired with music of a different emotion, result in an altogether different story.
- Electronically generated sounds, spatialised in a VR environment, give us a lot of scope for experimentation and expression.
- One needs to understand psychoacoustics first in order to communicate as intended in a spatial environment.
- Playing with the semantics of the sounds used opens up many new ideas for storytelling.
- Acoustic instruments are a better choice than synthesised sounds if one is trying to achieve a realistic environment; they feel like a familiar part of a natural setting, whereas synthesised sounds can come across as bizarre, alien or new.
360 Music Video
Final Deliverable
For the final output, I decided to make a music video with spatial audio as its focus. Keeping the audio in focus was one of the reasons I chose a music video rather than a film. Beyond that, music lets us play with sound and understand its semantics with respect to the visuals while keeping the sound itself at the centre, which is the main goal of this project.
Music
Recording Music Track
The song consisted of two instrument parts and one vocal part, along with samples of wind and rain that act as ambient sound. The following tools were used for recording.
Tools used for recording, mixing and mastering:
- Drums were recorded using a VST called UJAM PHAT.
- For the strings, I used the electric guitar that I had made (refer to the Projects section to know more).
- The mix was monitored on a pair of studio monitors.
- All tracks were processed through an audio interface.
- A multiband EQ was added in order to shape the frequency response.
- The desired amount of saturation and distortion was added using FabFilter Saturn.
- Vocal tracks were passed through a de-esser to tame harsh ‘S’ and ‘P’ sounds, which otherwise leave a hiss or ‘puff’ in the signal (see the sketch after this list for the basic idea).
- Reverb was added with FabFilter Pro-R.
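As a side note on what the de-esser stage is doing: it is essentially a compressor keyed on the sibilance band. The sketch below is a deliberately crude illustration of that idea in Python/SciPy, not the algorithm of the plugin used in the mix; the band, threshold and ratio values are placeholders.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def simple_deesser(vocal, fs, band=(5000.0, 9000.0), threshold=0.05, ratio=4.0):
    """Crude de-esser sketch: follow the energy in the sibilance band and
    turn the signal down when that band gets loud."""
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    sibilance = sosfilt(sos, vocal)
    # simple envelope follower over ~5 ms windows
    win = max(1, int(0.005 * fs))
    env = np.convolve(np.abs(sibilance), np.ones(win) / win, mode="same")
    gain = np.ones_like(env)
    loud = env > threshold
    gain[loud] = (threshold / env[loud]) ** (1.0 - 1.0 / ratio)  # compressor-style reduction
    return vocal * gain
```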
Following this, the video shoot took place. I needed to shoot five shots and stitch them together later, which imposed several constraints: the camera position, lighting, the placement of objects in the scene and so on had to stay unchanged. However, the shoot spanned two days, over which these factors drifted, which made the video difficult to edit later.
Final Output
The video and the spatial-audio soundtrack were compiled together to produce the final video.
There were two outcomes of this project:
- A compilation of tests, put together in a YouTube playlist, that documents the exploration in this project. Other researchers can refer to these videos in the future and use them for further development in the field of spatial audio.
Go to experiments
- A music video that exhibits all the learnings from this project.
Go to music video