This is a guest post by David Pfahler and Stephan Bönnemann of

Why you’d like to use HTML5 audio in iOS 5

It’s all about apps these days. The question is whether you want to commit to (and invest in) learning a new language to program natively for one platform, or leverage the web dev skills you already have. HTML5 is the way to go if you lean toward the latter.

Let’s face it: the main problem for web developers who want to make apps is delivering a great user experience. Hence, you’d like to use as much of today’s web technology as possible to enhance the UX. One effective way of doing this is audio feedback. For example, many Twitter apps play one sound when new tweets come in and another when no new tweets could be found. This kind of response is intuitive to the user. This is what you should go for.

Also, imagine a game without sounds. Would be kind of boring, wouldn’t it?

Naive Assumptions: What you’d expect

Fortunately in iOS we have rich HTML5 support built into mobile Safari. Initially, Steve Jobs wanted all apps to be web apps! So what you’d assume regarding audio is the presence of the HTML5 audio tag. And guess what, there is an implementation of the audio tag in iOS. Hurray!

So what do you expect from an audio tag implementation? You’d want to:

  • call .load(), .play(), .pause() and other methods via JavaScript
  • use remote files, dataURIs, cached files and other sources
  • play several audio files / tags at once
  • mix audio from several sources with different volumes, etc.

Reality checks: What should work but doesn’t

It becomes clear pretty fast that it’s not that simple. With iOS it’s a disaster. The bad news first:

It is not possible to load or play audio without user interaction.

Sad, but true. Apple deliberately decided that a touch event is mandatory to load and play audio. There is no workaround for this.

If we accept the fact that we need a touch event to play the audio, is it possible to use the audio tag as expected? Hmm, no, not really. So what’s broken as well?

Latency: Loading audio files on demand (after the touch event has fired) is unusably slow. The desired user experience cannot be achieved using this technique.

One audio tag at a time: You can never play more than one audio tag at once. This makes it impossible to mix sounds on the fly, which would be necessary for games that have background music and must also play sounds dynamically (e.g. when the character jumps).

Audio files only: You cannot use anything as the source besides audio files (uncompressed WAV and AIF audio, MP3 audio, and AAC-LC or HE-AAC audio). If you could use dataURIs, it would be possible to preload the audio.

Workarounds: How to play anyway (at least somehow)

We invested a lot of time researching the audio tag on iOS 5. If you want to know how we play audio, read on.

So what do you do with this crippled audio tag? Well, you trick it a little bit and use it as much as possible. Here are several ideas that work best for different scenarios:

Bind touchstart event to body: As you have to play on a touch event, we bind a touchstart handler on the body. So wherever the user touches first, we can use this event to load the audio file. Even better, from now on, we can load and play different sources from this audio tag.

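var bodyEl = document.body, audioNode = document.querySelector("audio");
bodyEl.addEventListener("touchstart", function() {
  audioNode.load(); // the first touch "activates" the audio tag
}, false);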
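A slightly fuller sketch of the same idea follows. It only needs the first touch, so it detaches the handler once the audio tag has been activated. The helper name `unlockAudioOnFirstTouch` and the optional `onUnlocked` callback are our own assumptions for illustration, not part of any library.

```javascript
// Sketch: "activate" an audio element on the first touch, then stop listening.
// `target` is anything with addEventListener/removeEventListener
// (e.g. document.body); `audio` is an audio element. Names are hypothetical.
function unlockAudioOnFirstTouch(target, audio, onUnlocked) {
  function handleFirstTouch() {
    // Only the first touch is needed, so detach the handler right away.
    target.removeEventListener("touchstart", handleFirstTouch, false);
    // load() runs inside the touch handler, as the restriction requires.
    audio.load();
    if (typeof onUnlocked === "function") {
      onUnlocked(audio);
    }
  }
  target.addEventListener("touchstart", handleFirstTouch, false);
}

// Usage in a page (assumes an <audio> element exists in the document):
// unlockAudioOnFirstTouch(document.body, document.querySelector("audio"));
```

Passing the target and the audio element in as arguments keeps the helper independent of the page, which also makes it easy to try out with stand-in objects.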

Hot swapping sources: We can now change the “activated” audio tag’s source on the fly. So it is now possible – without user interaction (!) – to load and play different sources. But you can only play one source at a time, of course.

audioNode.addEventListener("ended", function() {
  audioNode.src = "newSource.mp3"; // swap the source on the activated tag
  audioNode.play(); // no further touch event needed
}, false);
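Hot swapping generalizes to a small queue: play one source, and when it ends, swap in the next. The sketch below assumes the audio tag has already been activated by a touch; `playQueue` and the file names in the usage comment are hypothetical.

```javascript
// Sketch: play a list of sources one after another on a single,
// already "activated" audio element. `playQueue` is a hypothetical helper.
function playQueue(audio, sources) {
  var index = 0;

  function playCurrent() {
    audio.src = sources[index];
    audio.load();
    audio.play();
  }

  function handleEnded() {
    index += 1;
    if (index < sources.length) {
      playCurrent(); // hot swap: same element, new source
    } else {
      // Queue is finished, so stop reacting to "ended".
      audio.removeEventListener("ended", handleEnded, false);
    }
  }

  if (sources.length === 0) {
    return;
  }
  audio.addEventListener("ended", handleEnded, false);
  playCurrent();
}

// Usage (file names are placeholders):
// playQueue(audioNode, ["intro.mp3", "newSource.mp3"]);
```

Only one source plays at any moment, so this does not get around the one-tag-at-a-time limit. It only removes the need for another touch between sources.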

Use audio sprites: The guys over at Zynga use audio sprites in their Jukebox. This means that you have one large audio file containing a sound that you want to play, then one second of silence, then another sound, and so on.

Example for a sound sprite structure:
1 second silence
First sound
1 second silence
Second sound
1 second silence
Third sound
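
To jump to a sound you need to know where it starts and ends in the file. Given the layout above, those offsets follow from the sound durations and the length of the silence. The sketch below is our own illustration; `buildSpriteMap` is a hypothetical helper and the durations are made-up numbers.

```javascript
// Sketch: compute where each sound sits in a sprite laid out as
// silence, sound, silence, sound, ... All durations are in seconds
// and are invented for the example.
function buildSpriteMap(sounds, gapSeconds) {
  var map = {};
  var cursor = gapSeconds; // the file starts with one gap of silence
  sounds.forEach(function (sound) {
    map[sound.name] = { start: cursor, end: cursor + sound.duration };
    // Skip past this sound and the silence that follows it.
    cursor += sound.duration + gapSeconds;
  });
  return map;
}

var spriteMap = buildSpriteMap([
  { name: "first", duration: 2 },
  { name: "second", duration: 0.5 },
  { name: "third", duration: 1.5 }
], 1);
// spriteMap.first  -> { start: 1, end: 3 }
// spriteMap.second -> { start: 4, end: 4.5 }
// spriteMap.third  -> { start: 5.5, end: 7 }
```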

In this file you include all the audio that you might want to play in your app. You only load the source once (as described above) and then jump to the relevant parts in the file. So the file is always playing, but most of the time it just plays silence. When you need a sound, you can quickly jump to the part of the file that contains it. This solution is quick (much faster than hot swapping!) but requires a long preloading phase before you can play most of the sounds, because the file is bigger.
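
A minimal sketch of the jumping part could look like the following. It assumes an already activated audio element whose source is the sprite file, plus a map from sound names to start and end times in seconds. `createSpritePlayer` and the map shape are our own assumptions, not Jukebox's API. As a simplification, this version pauses at the end of each sound instead of letting the file keep running through silence.

```javascript
// Sketch: jump around inside one long sprite file on an already
// activated audio element. Helper name and map shape are hypothetical.
function createSpritePlayer(audio, spriteMap) {
  var current = null;

  audio.addEventListener("timeupdate", function () {
    // Once the requested sound has finished, pause so the next
    // sound in the file does not bleed in.
    if (current && audio.currentTime >= current.end) {
      current = null;
      audio.pause();
    }
  }, false);

  return {
    play: function (name) {
      var sprite = spriteMap[name];
      if (!sprite) {
        return false; // unknown sound name
      }
      current = sprite;
      audio.currentTime = sprite.start; // jump to the sound
      audio.play();
      return true;
    }
  };
}

// Usage (offsets are placeholders):
// var player = createSpritePlayer(audioNode, { jump: { start: 1, end: 3 } });
// player.play("jump");
```

The silence between sounds gives the `timeupdate` check some slack, since that event does not fire on every frame.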


So audio in iOS is pretty much broken. If you are using a native wrapper like PhoneGap, you could use native APIs via plugins like SoundPlug. We’ll keep testing what is possible in upcoming versions of iOS. Please let us know your ideas and feedback in the comments.


Regarding Philip’s comment, this is how we think it could be.