Playing music when someone is typing

It all started when Carmen asked me something like: "Couldn't you play that music from Murder, She Wrote at work whenever you're typing?" Given I was on a plane at the time, with nothing much else to do, why not think about it?

Requirement: Play music when someone is typing

Basic idea: play that random annoying music from Murder, She Wrote in the office whenever someone is typing

Parts: detect typing + interpret as instructions of when to play music + play music

The hard part is detecting the typing, not because you couldn’t write a program to listen to key presses, but more because people wouldn’t want to install it.

All we actually care about is a ‘key stream’ of activity, which indicates someone is typing, so we don’t care about the individual key presses, or even which keys they were. We can also accept a high false positive rate on individual key presses, as long as we smooth out any variability over time.

Some ideas that don't involve having direct access to a key-press stream:

  • Listen to the noise of people typing. Probably technically possible, but hard to get right. It would vary a lot across keyboards and would need calibration.
  • Detect when people’s hands are over the keyboard and moving.

This last one has legs. We could actually look for moving flesh, i.e. we look at the keyboard area, detect the pixels which are probably from someone’s hands, and see whether the number of them changes rapidly over time. This is a very low-level detector which would ‘detect’ lots of other things if it weren’t looking at a keyboard, but it’s probably good enough for this problem.
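To make that concrete, here's a rough sketch in Python of what the low-level detector might look like. It assumes OpenCV and NumPy, and that we know roughly where the keyboard sits in the frame (marked out by hand once the camera is fixed); the HSV skin thresholds are illustrative guesses, not tuned values from the experiment below.

```python
import cv2
import numpy as np

# Illustrative HSV bounds for skin tones -- guesses that would need tuning
# per camera and lighting, not values from the original experiment.
SKIN_LOW = np.array([0, 40, 60], dtype=np.uint8)
SKIN_HIGH = np.array([25, 180, 255], dtype=np.uint8)

def skin_mask(frame_bgr, keyboard_roi):
    """Return a boolean mask of 'probably flesh' pixels inside the keyboard area.

    keyboard_roi is (x, y, width, height) in pixels -- assumed to be known
    in advance, e.g. marked out by hand once the camera is in place.
    """
    x, y, w, h = keyboard_roi
    roi = frame_bgr[y:y + h, x:x + w]
    hsv = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, SKIN_LOW, SKIN_HIGH)
    return mask > 0

def skin_fraction(mask):
    """Fraction of the keyboard area covered by flesh-coloured pixels."""
    return float(mask.mean())
```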

We could perhaps validate this by taking an existing flesh-detector, feeding it pictures from a camera looking at a keyboard, and seeing whether it picks up changes. The sampling rate would probably need to be quite high because people’s fingers flit over a keyboard very fast.

To smooth out the noise, we could interpret the percentage of fleshy pixels that are moving as ‘energy’. This could be used with a simple threshold, or treated like the energy of a 1D ball thrown up in the air and left to decay, so that small gaps in activity don’t show up.
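As a sketch of that smoothing: given successive boolean skin masks from the keyboard region (as above), the fraction of pixels whose flesh/not-flesh classification flipped between frames can feed a decaying energy value. The decay factor here is an arbitrary illustration, not a measured constant.

```python
import numpy as np

class TypingEnergy:
    """Decaying 'energy' driven by how many skin pixels change between frames."""

    def __init__(self, decay=0.9):
        self.decay = decay      # how quickly energy falls when nothing moves (illustrative)
        self.energy = 0.0
        self.prev_mask = None

    def update(self, mask):
        """Feed the next boolean skin mask; returns the current energy."""
        if self.prev_mask is not None:
            # Fraction of keyboard-area pixels whose flesh/not-flesh
            # classification flipped since the last frame.
            moving = float(np.logical_xor(mask, self.prev_mask).mean())
        else:
            moving = 0.0
        self.prev_mask = mask
        # Let the old energy decay, then add the new burst of movement.
        self.energy = self.decay * self.energy + moving
        return self.energy
```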

We then just need to turn this into either a 'stop playing' or 'start playing' signal. And hire an orchestra. Job done. (Carmen adds: "Also, somebody needs to make a hairdresser’s appointment for everyone typing so that they can all have blue rinse perms done.")
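Turning the energy into that start/stop signal could be as simple as two thresholds (a bit of hysteresis), so brief dips in energy don't keep toggling the music. The thresholds and the play/stop stubs below are placeholders, not anything from the original experiment.

```python
class MusicController:
    """Start music when energy climbs above one threshold, stop when it falls
    below a lower one, so small wobbles in energy don't toggle playback."""

    def __init__(self, start_above=0.5, stop_below=0.1):
        self.start_above = start_above  # illustrative thresholds, would need tuning
        self.stop_below = stop_below
        self.playing = False

    def update(self, energy):
        if not self.playing and energy > self.start_above:
            self.playing = True
            self.start_music()
        elif self.playing and energy < self.stop_below:
            self.playing = False
            self.stop_music()

    def start_music(self):
        # Placeholder: cue the Murder, She Wrote theme (or the orchestra).
        print("start playing")

    def stop_music(self):
        print("stop playing")
```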

Having a go

Since I'm on holiday, and I've done something similar before, I decided to have a go at the 'flesh-detector' part. Overall, it basically works. The video below shows me using a mirror to get my laptop's camera to see both the keyboard and my hands:

The output is very noisy, and shows a reasonable number of non-skin pixels, but given I've skipped over quite a few of the steps of the Fleck and Forsyth algorithm (described in "Naked People Skin Filter"), I'm sure it could be cleaned up. For example, I don't try to filter out skin-pixels that aren't surrounded by other skin-pixels.
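One cheap way to approximate that missing clean-up step is a morphological opening on the skin mask, which throws away small isolated blobs of 'skin'. This is a sketch assuming OpenCV, not the full Fleck and Forsyth treatment.

```python
import cv2
import numpy as np

def clean_skin_mask(mask, kernel_size=3):
    """Remove isolated skin pixels by eroding then dilating the mask
    (a morphological 'opening'): lone speckles vanish, while larger
    hand-sized regions survive roughly intact."""
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    mask_u8 = mask.astype(np.uint8) * 255
    opened = cv2.morphologyEx(mask_u8, cv2.MORPH_OPEN, kernel)
    return opened > 0
```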

Next steps?

I'm not gonna take this any further, even though it was fun, because I've got plenty of other projects to make progress on. If I were, I would most likely look for an existing library and see if that works better. If I couldn't find one, the minimum to turn this into a 'typing detector' is probably to diff successive skin-pixel snapshots and look for motion; something very similar to the method in the article I used as a template.

So, in summary, no need to book the orchestra quite yet.