Procedural Audio On the Web

Originally published on Medium, on Aug 17, 2019

This article series was originally meant to be an online talk for ProcJam 2018. I would like to thank Mike Cook for his understanding and support as I converted that talk into an article series that took far too long to publish, but here it is.

PureData Patch of Andy Farnell’s Fan Noise Implementation

There has never been a better time to create truly interactive multimedia content for the web, and make audio an important part of this experience.

Thanks to the processing power of the hardware we use, in tandem with the wide implementation of the Web Audio API, we are able to create and modify audio content on the fly.

From oscillators to effects to spatialization options, plenty of opportunities to create new audio worlds are at our fingertips, ready to be used on a platform that is accessible to millions of people.

While there are several ways to create and use audio on the web currently, my aim is to demonstrate why procedural audio techniques are ideal in some cases, due to their flexibility and low data/network usage.


Procedural audio has been used in various mediums, from avant-garde music to video game audio, for the past few decades. In this series of articles, I will be focusing on two main use cases for the web:

  • As aural user interface indicators for web pages and applications
  • Usage in games, interactive arts and other multimedia applications on the web that aim to create experiences on the more artistic and expressive side.

In this first part, we will be going over the basics of procedural audio and how we can use it on the web, while walking through a WebVR experience created using A-Frame.

While I will not be going over topics like the history of procedural audio, or history of web audio and audio in video games, I will provide several resources at the end of this post, in case you want to learn more about them.


What is Procedural Audio?

In the words of Andy Farnell, who wrote one of the seminal books in the field of procedural audio and sound design:

“Procedural audio is non-linear, often synthetic sound, created in real time according to a set of programmatic rules and live input.”

Let’s dissect this explanation a little bit to get a clearer understanding of how it all applies to audio applications and what it enables us to do.

Non-Linear

What does it mean for audio to be non-linear? First, let’s take a look at its counterpart, linear audio.

Traditional, recorded audio has a set duration: it has a fixed beginning and end point, and whatever kind of audio journey we take between those two points, it is always the same journey with regard to its content and duration.

We always hear the same elements, in the same order, and at the same point in time and space, inside that specific, recorded soundscape.

Although we have had access to techniques such as pitch and speed manipulation for a while now (which allow us to manipulate the source material even at application run-time), they still don’t change the linear-in-time nature of prerecorded audio material, which is essentially frozen in time.

On the other hand, non-linear audio (in this case, procedurally created audio) is created in real time during the experience. It doesn’t have a set, predefined duration, and more importantly, we can change the order, pitch or timbre of its elements however we desire, whenever we desire.

But how do we create and manipulate audio content on the fly? That brings us to…

Synthetic Sound

“Synthetic sounds are produced by electronic hardware or digital hardware simulations of oscillators and filters. Sound is created entirely from nothing using equations which express some functions of time and there need not be any other data.”

The interactivity procedural audio provides stems from it being artificially generated by computers, synthesizers and the like, which we can program and give instructions to. These tools enable us to split an audio object into the elements it consists of, and then use that granularity to control various aspects of a sound. Whether driven by a generative algorithm, a set of specific instructions modeled after a real-world sound object, or user interaction, synthesizing sounds from scratch in real time gives us opportunities for interactivity that traditional methods don’t.
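To make this concrete, here is a minimal sketch using the plain Web Audio API (which Tone.js wraps) of a sound created entirely from an equation of time, with no recorded data involved. Keep in mind that modern browsers only let audio start in response to a user gesture such as a click:

// A pure 220 Hz sine tone, computed by the browser's oscillator node
// from a function of time; no audio files are loaded at any point.
const audioCtx = new AudioContext();
const osc = audioCtx.createOscillator();
osc.type = 'sine';            // waveform: sin(2π · f · t)
osc.frequency.value = 220;    // f = 220 Hz
osc.connect(audioCtx.destination);
osc.start();                  // call this from inside a click handler (autoplay policy)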

There’s of course a certain give and take in this situation: especially in cases where the acoustic source material is hard to imitate by artificial means, a sound created procedurally might not always be perceived to be as realistic as its recorded counterpart. But in return, we get more freedom and range in how we implement and integrate these sounds.

Programmatic Rules and Live Input

Last but not least, procedural audio relies on a set of programmatic rules: it needs an algorithm, a recipe that clearly defines the properties of the audio object we want to create.

This means we usually first need to create abstract models of the objects we want to recreate using procedural audio, where a model is “a simplification of the properties and behaviors of the object”, as Andy Farnell puts it.

Additionally, to vary the properties and behaviors of this object, we may either rely on live input from the user, or create a more autonomous system that doesn’t need user input to be triggered or changed during its lifetime.
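As a hypothetical sketch of what such an abstract model might look like in code (the object and property names here are mine, not from the demo), the air conditioner we will build below boils down to a handful of parameter values per speed setting, which the user’s live input, a button click, selects:

// A simplified abstract model of the air conditioner: each speed setting
// is just a set of parameter values for the motor and the fan noise.
// (Names are illustrative; the values match the setValues calls later on.)
const acModel = {
  slow:   { motorVolume: -9, motorFrequency: 20, noiseVolume: -12 },
  fast:   { motorVolume: -3, motorFrequency: 30, noiseVolume: -6 },
  faster: { motorVolume:  3, motorFrequency: 40, noiseVolume:  0 }
};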

Now let’s see these principles in practice by walking through a real example. I chose to use Tone.js, an easy-to-use abstraction library built on top of the Web Audio API, but everything we are going to build for this environment can also be achieved with the core Web Audio API directly.
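As a quick illustration of what that abstraction buys us, the single-oscillator sketch from the previous section shrinks to a couple of lines in Tone.js (a sketch using the Tone.Master-style API of the Tone.js version current when this article was written):

// The same pure sine tone as before, expressed with Tone.js.
const toneOsc = new Tone.Oscillator(220, 'sine').toMaster();
toneOsc.start();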


Hardware Room

This is a 3D environment I’ve created using A-Frame and a few models from the Google Poly library. You can take a look at it yourself here. Fair warning: due to the models being used, it will take some time to load, depending on your connection speed.

Hardware-Room

In this environment the user can interact with the Air Conditioner in the room and choose from one of the three speed settings.

Now in light of what we learned about procedural audio, let’s see how we create and control the air conditioner sound for this environment.

Putting the Pieces Together

Our air conditioner sound has two main parts that are controlled separately: the motor and the fan. Although we could divide it into further components, for this example I chose to vary just those two.

TIP: For a more detailed approach you can take a look at Andy Farnell’s Fans patch for PureData and read a detailed explanation of his process in the dedicated chapter of Designing Sound.

As we control the operating speed of the air conditioner, these values will also change and adapt to the new settings in real-time.

But how are we accomplishing that?

If we look at our audio code, we can see that we are using a Tone.js Synth component to create an oscillator as the base low-frequency component of the motor sound.

// Create the Fan Motor
const fan = new Tone.Synth({
  oscillator: {
    frequency: 22,
    type: 'triangle'
  },
  envelope: {
    attack: 2,
    decay: 2,
    sustain: 1,
    release: 1
  }
});
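The snippets in this article focus on the components themselves; routing the motor to the output and starting it isn’t shown, but would presumably look something like this sketch (the 22 Hz trigger value is taken from the oscillator settings above):

// Route the motor synth to the master output and hold the low motor tone.
fan.connect(Tone.Master);
fan.triggerAttack(22);   // sustain a 22 Hz tone for the motor rumble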

And for the fan noise, we use a Tone.js Noise component, together with a dedicated Volume node as a fader and an AutoFilter to shape that noise and give it some movement.

// Create Fan Noise
const noise = new Tone.Noise({
  type: 'pink',
  fadeIn: '5'
});

// Use AutoFilter to give movement and shape to the Noise
const autoFilter = new Tone.AutoFilter();

// Create dedicated fan noise fader
const noiseFader = new Tone.Volume(-8);

Then we route the noise through its dedicated fader and the filter before sending the output to our master volume channel, using a Tone.js chain.

noise.chain(noiseFader, autoFilter, Tone.Master);
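One detail worth noting: the Noise source needs to be started before you’ll hear anything, and the AutoFilter’s internal LFO needs to be started for the filter to actually sweep. That step isn’t shown in these snippets, but would presumably be something like:

// Start the noise source and the AutoFilter's LFO so the filter sweeps.
noise.start();
autoFilter.start();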

Now that we have separate audio components whose parameters we can control, we just need to decide which parameters to control and then pass them the appropriate values on user interaction.

For this demo I chose to control the volume of the fan motor and the fan noise, along with the motor frequency, to imitate the fan blades turning faster and blowing more air at higher speeds.

// Send the new values to our motor and noise audio components

const setValues = (rotorVol, rotorFreq, noiseVol) => {
  setRotorVolume(parseFloat(rotorVol));
  setRotorFrequency(parseFloat(rotorFreq));
  setNoiseVolume(parseFloat(noiseVol));
};

const slowSpeed = () => {
  setValues(-9, 20, -12);
};

const fastSpeed = () => {
  setValues(-3, 30, -6);
};

const fasterSpeed = () => {
  setValues(3, 40, 0);
};
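The setter helpers called above (setRotorVolume, setRotorFrequency and setNoiseVolume) aren’t shown in the article; a minimal sketch of them using Tone.js parameter ramps might look like this, where the one-second ramp time is an assumption chosen so the speed change sounds gradual rather than abrupt:

// Ramp each parameter to its new value instead of jumping, so the
// transition between speed settings sounds smooth.
const setRotorVolume = (db) => fan.volume.rampTo(db, 1);        // motor level in dB
const setRotorFrequency = (hz) => fan.frequency.rampTo(hz, 1);  // motor pitch in Hz
const setNoiseVolume = (db) => noiseFader.volume.rampTo(db, 1); // fan noise level in dB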

Now when a user clicks on one of the buttons, they will hear the AC sound change accordingly.
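The wiring of the buttons to these functions isn’t shown here; one hypothetical way to do it (the element IDs are my own, not from the demo) is with ordinary click listeners, which also work on A-Frame entities when a cursor component is present:

// Hook the three speed buttons up to the speed functions.
document.querySelector('#speed-slow').addEventListener('click', slowSpeed);
document.querySelector('#speed-fast').addEventListener('click', fastSpeed);
document.querySelector('#speed-faster').addEventListener('click', fasterSpeed);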

What If?

As a final step, let’s imagine what would happen if we had to follow a non-procedural approach for this audio environment, and solidify our understanding of procedural audio principles in the process.

If we didn’t use a procedural approach, first and foremost we would have to prepare (record) at least three separate audio files for the three speed options of the AC. Let’s pretend they are named ac_slow_speed, ac_fast_speed, and ac_faster_speed, respectively.

To begin with, these audio files would have the linearity problem we talked about before, meaning they would always have the same audio characteristics whenever we triggered them, which would make them really repetitive. If we wanted them to sound less repetitive, we would have to create at least two iterations of each speed.

They would have to be edited in a way to loop seamlessly, so our audience wouldn’t be able to tell when the file starts playing again from the beginning.

On the other hand, when we decided to create a procedural audio environment, all we had to do was design an algorithm (a set of programmatic rules) modeled after the object we wanted to emulate (an air conditioner with a fan and a motor), create the necessary audio components (non-linear, synthetic sound), and then feed the parameters to these components in real time (with live input) depending on the user interaction.

This approach also gives us flexibility during development: we can easily try different values for the speeds, instead of having to record completely new audio files each time we want to add a new speed option or a new element to our audio object.


Other concerns would be the file sizes of the dedicated audio files we would use, which would have to be loaded before we could interact with the system. Not to mention, our file formats would have to be compatible with all the browsers we want to support, which in some cases would mean creating several different versions of the same file.

Last but not least, we would have to use more space and load more files per speed, and wait for those files to load or be swapped whenever the user clicks a button. Consequently, our audio wouldn’t be as responsive and would require more data and network bandwidth, which would be problematic on limited connection speeds.


Of course, procedural audio is not without its disadvantages: having to come up with dedicated algorithms to re-create the properties of audio objects, the processing power it requires for more complex use cases (as opposed to the space and load time that prerecorded files require), and the lack of acoustic authenticity are among the most prominent hurdles we can come across.

So, procedural audio techniques might not always be the best choice for every use case, but combining pre-recorded material with procedural audio can give us the flexibility we need while keeping our acoustic soundscape as believable as possible for our audience.

In the second part of this series, we will take a look at how we can use procedural audio to create user interface sounds on the web, and how we can take advantage of it to signify application states, such as loading, progress levels or other gradual processes, that we can’t easily convey with prerecorded sounds.

Thanks for reading!

Interactive Example

Resources

Tools