What are we doing?

Data sitting on a computer somewhere is pretty dull. If you're working with data, it's a good idea to find lots of ways to interact with it, and if it's a type of data specific to your field, you can probably already think of plenty of ways to do so.

For example, if it's images, look at them. If you transform your data for any reason, look at it before and after the transformation. It sounds obvious, but it can be overlooked by machine learning engineers / data scientists because building tools or bespoke visualisations to interact with data can sometimes feel out of scope for their responsibilities.

Ok, preaching aside, let's create something that helps people who work with audio in Jupyter notebooks interact with it: a way to listen to audio in Python alongside any plots they have for it, e.g. the output of a neural network.

The end goal is an interactive audio visualisation plot like the one in this tweet. Credit to this StackOverflow post for sharing a HoloViews audio plot with a playhead.

Here's a version of the final widget that works in a browser. Note: the plot is only clickable if you run it yourself.

Hear and Look at Audio

First things first, we want to be able to hear the audio. Conveniently, IPython comes with lots of out-of-the-box ways to display data. Here's one for audio:

#collapse-show
from IPython import display
audio_path = "./my_icons/blah.wav"
display.Audio(filename=audio_path)

Although this lets us hear the audio, what if we want to see it? Let's first look at what's inside it:

#collapse-show
from scipy.io import wavfile
sr, wav_data = wavfile.read(audio_path)
print(sr)
print(wav_data.shape)
48000
(775922, 2)

This shows the sample rate is 48000 Hz and there are 775922 samples across 2 channels.
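
As a quick sanity check, the clip length in seconds is just the number of samples divided by the sample rate:

print(wav_data.shape[0] / sr)  # 775922 / 48000 ≈ 16.2 seconds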

wav_data[:,0] # first channel
array([-2, -3,  0, ..., -3, -1,  0], dtype=int16)

Seeing audio in a big numpy array isn't very useful. But what if we plot the values:

#collapse-show
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot(wav_data)
plt.show()

The two channels are on top of each other. We can split them like so:

#collapse-show
fig, axs = plt.subplots(2)
axs[0].plot(wav_data[:,0])
axs[1].plot(wav_data[:,1])
plt.show()

Although this is nice, I'd like the x-axis to be seconds rather than samples. We can use numpy.linspace to do this. It just gives us evenly spaced numbers between a start and an end, and we can decide how many numbers we want.

The duration is just the number of samples divided by the sample rate, and we want the same number of points (to match our y axis).

#collapse-show
import numpy as np
fig, axs = plt.subplots(2)
duration = len(wav_data)/sr
x = np.linspace(0, duration, len(wav_data))
axs[0].plot(x, wav_data[:,0])
axs[1].plot(x, wav_data[:,1]) # second channel
plt.show()

Ok, that's better, but is there a better way to view audio than the amplitude of the waveform?

Spectrograms

Smarter people than me came up with viewing audio by its frequencies rather than its amplitudes. 'Spectrograms' are used to display this: they are visualisations of how the frequency content changes over time. We'll just use one channel from now on for simplicity.

#collapse-show
audio_data = wav_data[:,0] # just use one channel from now on
plt.specgram(audio_data, Fs=sr)
plt.show()
display.Audio(audio_path)

We can do the same thing by using scipy to first compute the spectrogram and then matplotlib to plot the log of the spectrogram with a colormesh.

#collapse-show
from scipy.signal import spectrogram
f, t, sxx = spectrogram(audio_data, sr)
plt.pcolormesh(t, f, np.log10(sxx))
plt.show()
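
One caveat: np.log10 will complain (and return -inf) wherever sxx is exactly zero. If you hit divide-by-zero warnings, a common workaround is to add a tiny offset before taking the log. A minimal sketch, reusing the t, f and sxx from above:

eps = 1e-12  # small offset so log10 never sees an exact zero
plt.pcolormesh(t, f, np.log10(sxx + eps))
plt.ylabel("Frequency (Hz)")
plt.xlabel("Time (s)")
plt.show()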

Add More Interactivity

That's getting us close to what we want, but what we really want is to be able to interact with the plot and hear the audio at the point we interact with.

For more interactivity, we're going to reach for tools other than matplotlib and IPython.display. HoloViews and Panel from the Anaconda team are very nice for custom interactivity. Conveniently for us, Panel's Audio pane and HoloViews' Image component play nicely together and allow us to build more interactive visualisations.

#hide_output
import holoviews as hv 
import panel as pn
hv.extension("bokeh", logo=False)

spec_gram = hv.Image((t, f, np.log10(sxx)), ["Time (s)", "Frequency (Hz)"]).opts(width=600)
audio = pn.pane.Audio(audio_data, sample_rate=sr, name='Audio', throttle=500)
pn.Column(spec_gram, audio)

Here we create an Image from the same data we passed to matplotlib's plt.pcolormesh, and a pn.pane.Audio from audio_data, the first channel of what scipy.io.wavfile.read(audio_path) gave us. Finally, we put them together in a pn.Column so that the spectrogram is displayed above the audio player.
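
As an aside, the Audio pane exposes its playback position as a time parameter (in seconds), and the throttle=500 we passed controls how often (in milliseconds) that value is pushed back to Python while the audio plays. The playhead below leans on this, but you can also poke at it directly, for example:

print(audio.time)  # current playback position in seconds
audio.time = 2.0   # jump the player to the 2 second mark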

Add Playhead

We want the playhead to update as the time changes while the audio is playing. To do this, we'll use a HoloViews DynamicMap. It sounds complicated, but put simply, it links a stream with a callback function.

In this case, the stream comes from audio.param.time, and the callback is update_playhead, which we create to return a VLine (the playhead). We use the * operator to overlay the returned VLine playhead on the image.

#hide_output
def update_playhead(time):
    return hv.VLine(time)

dmap_time = hv.DynamicMap(update_playhead, streams=[audio.param.time]).opts(width=600)
pn.Column(audio,
          spec_gram * dmap_time)

Note: The slider underneath is because of how I made it work on a static HTML web page. If you run it yourself, there’ll be no slider.

Add Click to Update Playhead

That works great, but we also want to be able to click the plot and update the playhead. We do this by merging two streams to trigger one update_playhead callback within the DynamicMap. The SingleTap stream captures clicks on the plot, and we use a Params stream to expose audio.param.time to the merged callback, renamed to t. Within update_playhead, we just check whether x (the x position of the click) is None; if it is, we draw the playhead at the current playback time t, otherwise we move the audio to the clicked position x.

#collapse-show
def update_playhead(x,y,t):
    if x is None:
        return hv.VLine(t)
    else:
        audio.time = x
        return hv.VLine(x)

tap_stream = hv.streams.SingleTap(transient=True)
time_play_stream = hv.streams.Params(parameters=[audio.param.time], rename={'time': 't'})
dmap_time = hv.DynamicMap(update_playhead, streams=[time_play_stream, tap_stream])
out = pn.Column(audio,
                spec_gram * dmap_time)

Note: This will work when you run the notebook yourself, but the interactivity is lost when hosted on a static HTML web page. You can link it with a Python backend, but that’s not happening here because it requires a bit of work that I haven’t done.
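
If you do want the click interactivity without a notebook, a minimal sketch (assuming the code above lives in a script) is to let Panel serve the layout so a live Python process keeps handling the callbacks:

out.show()      # opens the app in a browser tab, backed by a local Bokeh server
# or mark it servable and launch with `panel serve my_script.py`
# (my_script.py being whatever you called your file):
# out.servable()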

All the code in one place

#collapse-hide
from scipy.io import wavfile
from scipy.signal import spectrogram
import numpy as np
import holoviews as hv
import panel as pn
hv.extension("bokeh", logo=False)

audio_path = "./my_icons/blah.wav"
sr, wav_data = wavfile.read(audio_path)
audio_data = wav_data[:,0] # first channel
f, t, sxx = spectrogram(audio_data, sr)
spec_gram = hv.Image((t, f, np.log10(sxx)), ["Time (s)", "Frequency (Hz)"]).opts(width=600)
audio = pn.pane.Audio(audio_data, sample_rate=sr, name='Audio', throttle=500)

def update_playhead(x,y,t):
    if x is None:
        return hv.VLine(t)
    else:
        audio.time = x
        return hv.VLine(x)

tap_stream = hv.streams.SingleTap(transient=True)
time_play_stream = hv.streams.Params(parameters=[audio.param.time], rename={'time': 't'})
dmap_time = hv.DynamicMap(update_playhead, streams=[time_play_stream, tap_stream])
out = pn.Column(audio,
                spec_gram * dmap_time)

Bonus: Make it work on a static HTML page

I won't really dive into this, but you can remove the need for a Python server by using jslink to rely on your browser's JavaScript alone. That's actually how I made the above plots display in your browser. I'd be interested to hear if there's a nicer way to do this, and how easy it would be to add a click event.

#hide_output
from bokeh.resources import INLINE

slider = pn.widgets.FloatSlider(end=duration)
line = hv.VLine(0)
slider.jslink(audio, value='time', bidirectional=True)
slider.jslink(line, value='glyph.location')
pn.Column(spec_gram * line,  slider, audio).save('redo', embed=True, resources=INLINE)
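
For reference, that .save call writes the layout out to an HTML file; as far as I understand, embed=True bakes the widget state into the page and resources=INLINE bundles the Bokeh JS/CSS so nothing needs to be fetched from a CDN, which is what lets it work on a static page like this one.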

Play with it yourself!

You can view and run all the code yourself from here.

I personally love learning about these kinds of visualisations and finding ways to create interactivity. What do you think about this type of widget for interacting with data? Did you learn a bit about creating interactive visualisations in Python by reading this article? If so, feel free to share it, and you’re also more than welcome to contact me (via Twitter) if you have any questions, comments, or feedback.

Thanks for reading! :rocket:

Follow me on Twitter here for more stuff like this.