Robin Sloan
the lab
August 2018

Expressive temperature

This is a very niche post documenting a technique I think might be useful to artists — especially musicians — who want to work with material sampled from machine learning systems.

There’s a primer below for people who are intrigued but not steeped in these tools. If you are, ahem, well-steeped, skip directly to the heading Turning the knob. Otherwise, proceed!

A wild horn

A primer on temperature

In machine learning systems designed to generate text or audio, there’s often a final step between the system’s model of the training data and some concrete output that a human can evaluate or enjoy. Technically, this is the sampling of a multinomial distribution, but call it rolling a giant die: many, many-sided, and also weighted. Weird dice.

To that final dice roll, there’s a factor applied that is called, by convention, the “sampling temperature.” Its default value is 1.0, which means we roll the weighted dice just as the model has provided it. But we can choose other values, too. We can tamper with the dice.
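
In code, the tampering is a single division before the roll. Here’s a minimal sketch in Python with NumPy (not SampleRNN’s actual sampling code, just the shape of the operation), assuming we have the model’s log-weights, the “logits,” for the next choice in hand:

```python
import numpy as np

rng = np.random.default_rng()

def sample_with_temperature(logits, temperature=1.0):
    """Roll the weird weighted dice, tampered with by a temperature."""
    scaled = logits / temperature             # the division described above
    weights = np.exp(scaled - scaled.max())   # back to (unnormalized) weights
    probs = weights / weights.sum()
    return rng.choice(len(probs), p=probs)    # the dice roll itself
```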

It’s easiest to understand with examples.

Here’s some text sampled from a model trained on science fiction stories. First, let’s set the sampling temperature to 0.1. The weights get divided by the temperature, so a value less than 1.0 will make heavy weights much heavier — which is to say, more likely — giving us a drone of the training data’s most basic themes:

It was a strange thing. The strange statement was that the state of the planet was a strange state of special problems.

You could sample pages and pages of text at this temperature and you wouldn’t read about much besides planets and strangeness.

By contrast, if we set the temperature to 1.0, we get a nice, surprising sample:

It was here yesterday. There was a double fictional device with crocodile shoulders and phoney oceans and Katchanni.

Katchanni! Okay! You will never, ever find Katchanni at temperature 0.1.

We can crank the sampling temperature up even higher. Remember, we’re dividing, so values above 1.0 mean the weights get smoothed out, making our dice “fairer” — weirder things become possible. At 1.5, the text begins to feel like a pot left unattended on the stove, hissing and rattling:

It was Winstead, my balked, old-fashioned 46 fodetes ratted.

And maybe that’s an effect we want sometimes! This is the point. I shall put it in bold. Often, when you’re using a machine learning system to generate material, sampling temperature is a key expressive control.
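
To make the division concrete, here’s a toy demonstration in Python (the three-sided dice and its weights are my own invention) showing how one set of weights shifts at the three temperatures above:

```python
import numpy as np

def apply_temperature(probs, temperature):
    # dividing log-weights by T is the same as raising probabilities to 1/T
    weights = probs ** (1.0 / temperature)
    return weights / weights.sum()

dice = np.array([0.7, 0.2, 0.1])   # a toy three-sided weighted dice
for t in (0.1, 1.0, 1.5):
    print(t, apply_temperature(dice, t).round(3))

# roughly:
# 0.1 [1.    0.    0.   ]   the favorite becomes a near-certainty: planets, strangeness
# 1.0 [0.7   0.2   0.1  ]   the model's weights, untouched
# 1.5 [0.586 0.254 0.16 ]   the dice gets "fairer"; the pot starts to rattle
```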

Sooo … let’s control it.

Turning the knob

If I repeated the exercise above but substituted samples of audio for text, you would recognize the same pattern: at low (“cool”?) temperatures, caution and repetition; around 1.0, a solid attempt to represent the diversity of the training data; beyond, Here Be Dragons.

It occurred to me, deep into some experiments with audio, that it should be possible to change the temperature not just between samples, but during them. In other words, to treat sampling temperature the way you might a filter knob on a synthesizer, sweeping it up or down to lend your sound movement and drama.
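
SampleRNN doesn’t expose this as a knob, but if you control the sampling loop, the change is small: consult a curve of temperatures, one per step, instead of a constant. A sketch, where model.next_logits is a hypothetical stand-in for whatever one-step prediction call your system provides:

```python
import numpy as np

rng = np.random.default_rng()

def generate(model, seed, temp_curve):
    """Generate one sample per entry in temp_curve,
    sweeping the temperature 'knob' as we go."""
    out = list(seed)
    for temperature in temp_curve:        # the knob position at this step
        logits = model.next_logits(out)   # hypothetical one-step API
        scaled = logits / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        out.append(rng.choice(len(probs), p=probs))
    return np.array(out)
```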

Let’s play around.

The toolkit:

The audio snippets below are all straight out of SampleRNN, with no processing or editing, but they are unapologetically cherry-picked. They all have a sound that’s characteristic of this system: noisy, jangly, a bit wobbly. If you don’t like that sound, it’s likely you won’t find any of this particularly compelling, and … you should probably go listen to something else!

Finally — I feel like I always end up including some version of this caveat-slash-manifesto — I’m attracted to these techniques because they produce material with interesting (maybe unique?) characteristics that an author or artist can then edit, remix, and/or cast aside. Please consider the samples below in that context. Other people — researchers and tinkerers alike — are more motivated by the dream of a system that can write a whole song end-to-end. As they progress toward that goal … I will happily misappropriate their tools and bend them to my purposes 😎

Okay! To begin, here’s a sample generated The Normal Way, at constant temperature 1.0.

I think it sounds nice, but/and it has the characteristic “meander” of samples generated by these systems, text and audio alike. They lack long-range structure; they’re not “going” anywhere. It’s not a bad meander, and there are definitely ways to use this material creatively and productively.

But what if we want something different?

Here’s another sample — same model, same everything — generated by sweeping the temperature from 0.75 to 1.1 and back:

You can hear the difference, right? It’s not better or worse, just different, with more motion — a crescendo into complexity.
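
The sweep itself is nothing exotic. Assuming 16 kHz audio and a ten-second clip (both numbers mine, not the post’s), it might be built like this:

```python
import numpy as np

n = 16000 * 10   # ten seconds at 16 kHz: one temperature per sample
temp_curve = np.concatenate([
    np.linspace(0.75, 1.10, n // 2),       # sweep the knob up...
    np.linspace(1.10, 0.75, n - n // 2),   # ...and back down
])
```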

Let’s go even further.

This next sample was generated using (1) that same temperature sweep, and also (2) a “guiding track” — a tiny looping snippet of audio copied from the training data. You’ll hear it. At low temperatures, the guiding track is used to “nudge” the model. (Guardrail? Straightjacket?) As the temperature increases, the guiding track’s influence fades until the model isn’t being nudged at all, and is free … to rock out.
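
The post doesn’t spell out the nudging mechanism, so what follows is one plausible sketch, not the actual implementation: boost the weight of whatever value the guide loop would play next, with a strength that fades to zero as the temperature rises.

```python
import numpy as np

rng = np.random.default_rng()

def nudged_sample(logits, guide_value, temperature,
                  t_low=0.75, t_high=1.1, strength=5.0):
    """One sampling step with a 'guiding track' nudge.

    guide_value: the quantized sample the looping guide track would
    play at this step. The constants, and the mixing scheme itself,
    are assumptions, not SampleRNN's code.
    """
    # influence runs from 1.0 at the coolest setting to 0.0 at the hottest
    influence = np.clip((t_high - temperature) / (t_high - t_low), 0.0, 1.0)
    nudged = logits.copy()
    nudged[guide_value] += strength * influence   # the straightjacket, loosening
    scaled = nudged / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)
```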

At this point, we aren’t using SampleRNN remotely the way it was intended; is this even machine learning anymore? If all we were getting out of this computational detour was a sine wave, it would be fair to say we were using the wrong tool for the job.

But … we’re getting something quite different!

Of course, we can turn the temperature knob however we want. Let’s try a more complex curve:
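
A curve like that can be composed from simple segments. The particular shape below is my own invention, just to show the idiom:

```python
import numpy as np

sr = 16000   # assumed sample rate
temp_curve = np.concatenate([
    np.full(2 * sr, 0.8),                                       # hold steady
    np.linspace(0.8, 1.1, 3 * sr),                              # long sweep up
    1.1 + 0.15 * np.sin(np.linspace(0, 8 * np.pi, 3 * sr)),     # wobble around 1.1
    np.linspace(1.1, 0.7, 2 * sr),                              # cool back down
])
```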

And here’s one more sample generated using (1) that same temperature curve, but (2) a different model, this one trained on a bundle of synth-y film music. Again, this is straight out of SampleRNN, so expect some burps and squawks:

I mean … a person could make something out of that!

None of this sampling works in real time, at least not outside of Google. You cannot yet turn a literal knob and listen while the sound changes — seeking, adjusting, conducting — but that “yet” has an expiration date, and I think we’re going to see a very new kind of synthesizer appear very soon now.

This post is also available in Russian.

August 2018, Oakland