I’ve been interested in computer vision for quite some time, and the opportunity to tinker with face detection (in an ongoing project to weaponise our beer fridge) was just the thing to get to grips with the rather awesome bit of technology that is OpenCV.

What is OpenCV?

OpenCV is an open source computer vision project (it’s in the name!) that’s been around for quite some time and is one of the most powerful – and easiest to get started with – libraries for working with computer vision without paying for it.

What makes OpenCV even more powerful is that it has Python bindings, which means you can use almost all of its functionality seamlessly in Python. The syntax and API map one-to-one, so getting to grips with the documentation is pretty straightforward, and there are plenty of examples around to get started with simple object detection.

What is Face.com?

As part of the PyOfSauron project, we make use of the Face.com API – Face.com is a free-to-use face recognition API that (quite handily) has an excellent Python wrapper to get you started.

The Project: PyOfSauron

As mentioned earlier, PyOfSauron was initially meant to help us count how many beer cans were left in the beer fridge – and although that project is still ongoing – a much more fun side-project emerged: protecting the beer fridge with an impromptu security system.

So what does the system do? The main module runs off watch_me.py, which kicks off a loop that grabs images from the first webcam on the system and runs each frame through one of OpenCV’s detection algorithms. These are pre-set and can be changed on the command line.

The easiest way to detect a face (or any other body part) in OpenCV is to use a Haar detector. These are handily encapsulated in easy-to-transport XML files that will do all the work for you. It’s even possible to develop your own Haar profiles with enough sample imagery and patience.
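To give a sense of what that looks like in practice, here’s a minimal sketch of the kind of loop watch_me.py runs – the cascade file, detection parameters and callback name below are illustrative stand-ins, not lifted from the project:

```python
import cv2

# Load a stock frontal-face Haar cascade (ships with OpenCV) and open
# the first webcam on the system.
cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
capture = cv2.VideoCapture(0)

def on_detect(frame, rects):
    # Stand-in for the project's callback modules.
    print('detected %d object(s)' % len(rects))

try:
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Each detection comes back as an (x, y, w, h) rectangle.
        rects = cascade.detectMultiScale(grey, scaleFactor=1.2, minNeighbors=5)
        on_detect(frame, rects)
except KeyboardInterrupt:
    pass
finally:
    capture.release()
```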

As it passes the images through this filter, the script will trigger a callback event with the details of the detected objects as parameters.

The callbacks are modular, so you can develop your own callback modules to do with the image and data what you like. In this case, we’ve developed the FaceClipper module the most.

The FaceClipper module will take the detected coordinates, create an identity for each (so it can be tracked over time) and draw a crop square around the detected object to feed back to the screen.
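A hypothetical callback in the spirit of FaceClipper might look something like this – the function name and colour are mine, and the real module does rather more (identities, clipping, uploads):

```python
import cv2

def face_clipper_callback(frame, rects):
    """Draw a box around each detected object and feed the frame back to the screen."""
    for (x, y, w, h) in rects:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow('PyOfSauron', frame)
    cv2.waitKey(1)  # give the window a chance to refresh
```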

In the background, if a face comes close enough to the screen, a capture is taken and uploaded to Amazon S3 for processing by the Face.com API – if the recognition succeeds, the script then assigns the identity of that face to the tracked object in memory.

Highlights and cool bits

1. Giving a detected object a lifespan – inventing visual memory

One of the key problems with object detection is that the computer is completely ignorant of what it has detected – when the callback module gets passed a list of the detected objects and their coordinates, it has no way of knowing whether the item in position 1 is the same one it saw in the previous frame. The detected objects have no identity.

To solve this, @errkk and I (mainly @errkk) came up with a simple way to decide (or rather, assume) whether a detected object is the same as one from the previous frame.

To do this, the system defines a ‘maximum area’ that a face can move within before it is considered new. Using some simple geometry (an actual use for Pythagoras!), we measure how far the face’s centre has moved and compare it against the objects from the last frame – if there’s a match within a certain maximum distance, the face is assigned that object’s ID and will (hopefully) be identified again next time.
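Here’s a rough sketch of that matching step – the threshold, the names and the shape of the `previous` dict are my own assumptions, not the project’s actual code:

```python
import math

MAX_MOVE = 80  # hypothetical maximum distance (in pixels) a face may move between frames

def centre(rect):
    """Centre point of an (x, y, w, h) detection rectangle."""
    x, y, w, h = rect
    return (x + w / 2.0, y + h / 2.0)

def match_to_previous(rect, previous):
    """Return the id of the nearest object from the last frame if it's within
    MAX_MOVE pixels; otherwise None (i.e. treat this as a new face).
    `previous` maps object id -> centre point from the last frame."""
    cx, cy = centre(rect)
    best_id, best_dist = None, MAX_MOVE
    for obj_id, (px, py) in previous.items():
        dist = math.hypot(cx - px, cy - py)  # good old Pythagoras
        if dist < best_dist:
            best_id, best_dist = obj_id, dist
    return best_id
```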

Each of the identified (remembered) objects has a memory decay time, so it is erased from the ‘memory’ list if it hasn’t been renewed (re-identified) within a second. This is configurable, but a second seems to work well – mainly for those occasions when a face moves out of frame for a moment or looks sideways, stopping detection for a frame or two.
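The decay itself can be as simple as pruning anything that hasn’t been seen recently – again, a sketch under my own assumptions about how the tracked objects are stored:

```python
import time

MEMORY_TTL = 1.0  # seconds a remembered object survives without being re-identified

def prune_memory(tracked):
    """Drop any remembered object whose 'last_seen' timestamp has gone stale.
    `tracked` is assumed to map object id -> {'last_seen': ..., ...}, with
    'last_seen' refreshed each time the object is re-identified."""
    now = time.time()
    for obj_id in list(tracked):
        if now - tracked[obj_id]['last_seen'] > MEMORY_TTL:
            del tracked[obj_id]
```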

2. Recognising a face

Identifying that there is an object in the frame is one thing, but if you are looking for faces, and then want those faces identified, you’re looking at a whole new hyper-complicated ballgame. It can be done using something called eigenfaces, but I’d rather use someone else’s tweaked API than try to bend my head around that.

This is where Face.com comes in – Face.com has an excellent face recognition system that is very easy to implement (and fast as well).

The FaceClipper module takes a cropped snapshot of a detected face once it reaches a certain size; the file is then sent to Amazon S3 for online storage, and Face.com is used to try to identify the face.
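The crop itself is just numpy slicing on the frame – something along these lines (the size threshold and filename are illustrative):

```python
import cv2

MIN_FACE_SIZE = 120  # hypothetical 'close enough' threshold, in pixels

def clip_face(frame, rect, path='face_snapshot.jpg'):
    """Crop a detected face out of the frame once it's big enough to be
    worth sending off for recognition. Returns the saved path, or None."""
    x, y, w, h = rect
    if w < MIN_FACE_SIZE or h < MIN_FACE_SIZE:
        return None
    crop = frame[y:y + h, x:x + w]  # numpy slicing: rows first, then columns
    cv2.imwrite(path, crop)
    return path
```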

To ensure a smooth frame rate, this whole process happens in a thread that writes its results back to a queue that is checked on every frame. Initially, face recognition took ~5 seconds per face – which is good, but not ideal given how much people move about in frame.
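The thread-and-queue arrangement is roughly this shape – Python 3 spelling of the queue module, with `upload_and_recognise` standing in for whatever does the S3 upload and Face.com call:

```python
import threading
from queue import Queue

results = Queue()  # drained once per frame in the main loop

def recognise_async(snapshot_path, obj_id, upload_and_recognise):
    """Run the S3 upload and Face.com lookup off the main thread so the
    frame rate doesn't stall; the answer comes back via the queue."""
    def worker():
        name = upload_and_recognise(snapshot_path)
        results.put((obj_id, name))
    threading.Thread(target=worker, daemon=True).start()

def drain_results(tracked):
    """Called once per frame: attach any finished recognitions to the
    corresponding tracked object."""
    while not results.empty():
        obj_id, name = results.get_nowait()
        if obj_id in tracked:
            tracked[obj_id]['name'] = name
```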

By keeping the HTTP connection to S3 open from the moment the app starts, we managed to shave the recognition time down to ~1.5 seconds – which is much better. Naturally, when the connection times out you’ll be back to a ~5 second recognition for one frame.

Getting your hands on the code

The whole project is hosted on BitBucket here – have a go, make sure all the prerequisites are installed, and if you happen to make a module that works with PyOfSauron, drop me a line on Twitter for some show and tell.

happy hacking,
Martin