Input normalization for pattern recognition

Hello codenauts and programmies!!

About a month ago, while preparing the Thanksgiving dinner and thinking about nothing and everything, I suddenly remembered an old website for online games (solo, individual, private games, no one's mom was in danger…): Nitrome.

They had their own games, developed, I think, in Flash, Shockwave and JavaScript, with a pixely look to them (you gotta love those kinds of games). The ideas behind the games were simple but quite entertaining; they totally nailed it.

Driven by curiosity, I checked the Play Store to see what they had and, marvelously, they had developed games for mobile too! The first one that caught my attention, and the one I want to analyze in this post, is named Magic Touch.


It is quite an entertaining game: knights fall from the sky hanging from balloons, trying to reach the tower of your castle. You are a wizard casting spells to pop the balloons and prevent the knights from getting to your tower.

The touch part of it is that each balloon has "the spell" that must be cast to pop it written on it. Those spells are weird patterns that you have to draw on the screen; in the mobile version you draw them with your finger.

After spending a couple of hours getting addicted to playing that thing, I started to think about how this game could be implemented.

Quick and dirty: neural networks

Almost immediately, we can think of training a neural network to recognize whatever is drawn on the screen and match it against the known patterns.

Quite simple, and just like character recognition: we let the user draw on the screen while capturing the path (using the x, y coordinates of where the finger glides over the screen) in a two-dimensional matrix; the pixels activated by the path are ones against a background of zeros. When the user raises their finger (ending the path), we pass the matrix to the neural network and let it do its magic.
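As a minimal sketch of that capture step (the screen size and the handler names `on_touch_move`/`on_touch_up` are assumptions of mine, not any particular framework's API; a real app would wire these into its UI toolkit's touch events):

```python
# Sketch of capturing a touch path into a binary matrix.
# WIDTH/HEIGHT and the handler names are hypothetical.

WIDTH, HEIGHT = 320, 480  # assumed screen size in pixels

# background of zeros; pixels touched by the path become ones
grid = [[0] * WIDTH for _ in range(HEIGHT)]

def on_touch_move(x, y):
    """Called with each (x, y) the finger glides over."""
    if 0 <= x < WIDTH and 0 <= y < HEIGHT:
        grid[y][x] = 1

def on_touch_up():
    """Finger raised: the path is done, hand the matrix over."""
    return grid  # this is what we'd feed to the neural network
```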

Whichever pattern the network finds, we raise a signal or event (in a signals-and-slots or observer pattern) and let the knights on screen receive it and pop their balloon if it matches (morbidly, they are going to kill themselves…).

What’s a good size?

Thinking of a feed-forward backpropagation network with M-N-O layers (M input units, N hidden units and O output units), the first step would be to determine the size of the input layer.

Without getting into deep learning or convolutional networks, we would say that there is going to be a matrix containing the drawing, and each cell of that matrix will be mapped to an input unit.

The first challenge I can think of in determining that final size is the size of the drawing itself:


The three drawings above represent the same pattern (a single loop starting from the top); however, they are all drawn differently, and the resulting matrices certainly have different shapes and sizes too. That for sure would not work for the predefined M inputs.

What if we normalize the picture? Whatever shape the user draws, we scale and resize it to something more standard for our problem, like a square matrix of a fixed size (40×40 in the images shown below):


Each pattern shows below it its corresponding scaling to the "normalized" matrix. What I want to discuss here (and, as an important note, without any background in image processing) is how to achieve that normalization.

Cutting the cake

The simplest way I can think of to do the resizing is a shameless direct mapping.

Basically, we cut the big picture into the same number of pieces we want in the small one and then fill the small matrix accordingly. Let's take the first drawing as an example. Its current size is 150 × 184 pixels:


For the sake of the exercise, let's map that picture onto an 8 × 8 matrix. With some basic math, we can define a cell size:

150 ÷ 8 = 18.75
184 ÷ 8 = 23

Thus, if we cut the big cake every ~19 pixels in width and every 23 in height, we get a cake cut into 8 × 8 pieces:


Every section of 19 x 23 pixels of the original picture will be mapped into a single cell of the output matrix:


For example, the cell (0,0) in the output matrix corresponds to all the pixels of the original picture in the area between (0,0) and (19,23).
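That cell-size math can be sketched like this (the function name `cell_bounds` is mine, and rounding the fractional cell width to the nearest pixel is an assumption):

```python
def cell_bounds(width, height, n):
    """Return the pixel area covered by each cell of an n x n grid,
    as ((x0, y0), (x1, y1)) pairs in row-major order."""
    cw, ch = width / n, height / n  # cell size, e.g. 150 / 8 = 18.75
    bounds = []
    for row in range(n):
        for col in range(n):
            x0, x1 = round(col * cw), round((col + 1) * cw)
            y0, y1 = round(row * ch), round((row + 1) * ch)
            bounds.append(((x0, y0), (x1, y1)))
    return bounds

bounds = cell_bounds(150, 184, 8)
# cell (0, 0) covers the original pixels from (0, 0) up to (19, 23)
```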

The simplest map

I gave quite a bit of thought to what we can call the mapping strategy (how we translate the original pixels into the output ones), and the simplest one I could think of is this:

For each cell in the output matrix: is there any active pixel (not white) in the corresponding partition of the original?

  • Yes: paint cell
  • No: leave it white

As an example, for cell (0,0) in the output: is there any active pixel between (0,0) and (19,23) of the original picture? The answer is yes; thus, the output cell (0,0) will be painted. As simple as that!

The full mapping is quite close to what we could expect from it; the top-right image is a zoom of the 8×8 result shown in the tiny picture at the bottom:



A fine-tuning point is the width of the line: with this approach, as long as a single pixel is drawn we map it, but that may hurt the definition of the mapped image and lead to some fake results.

The size sometimes matters

Yeah, the subtitle is misleading…

Originally we considered the original image to be a matrix of 150 × 184 cells, using a pixel as the measurement unit, but the line used to draw is not one pixel wide. We can do one of two things:

  • We display the line wide but save the path as if it were drawn one pixel wide; or
  • We use the point size for a subsequent calculation.

With the first option, on screen we will see a wide line, but in reality we will be saving a thin path:


So our original drawing gets mapped into something like this (with 2 pixels less noise than the original mapping):



On the other hand, if we take the point size as our counting unit, we have to understand and transform the image and its dimensions slightly differently.

The line is 7 pixels wide, so instead of counting the dimensions one pixel per cell, we count squares of 7 pixels:



Transforming the dimensions:

Original size: 150 × 184
Point-size scale: 21 × 26

To transform the image to its point-size scale, we apply an averaging strategy: if a cell to be transformed is more than 50% occupied, we paint it in the resulting image.
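A sketch of that averaging step, under my own assumptions (the function name is mine, and I round the output dimensions to the nearest whole point, which drops any leftover strip narrower than half a point, giving roughly 21 × 26 for a 150 × 184 image):

```python
def to_point_scale(pixels, point=7):
    """Shrink a binary pixel matrix by the line width ("point size"):
    each point x point square becomes one cell, painted only if it is
    more than 50% occupied by active pixels."""
    h, w = len(pixels), len(pixels[0])
    # 150 x 184 with a 7-pixel point gives roughly 21 x 26 cells
    rows, cols = round(h / point), round(w / point)
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            block = [pixels[y][x]
                     for y in range(r * point, min((r + 1) * point, h))
                     for x in range(c * point, min((c + 1) * point, w))]
            if sum(block) / len(block) > 0.5:  # averaging strategy
                out[r][c] = 1
    return out
```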

Applying these rules to the input we get this mapping:


To the resulting image we apply the cake cut into an 8×8 grid, then the simple mapping, to get our final normalization:


Processing cost and Improvements

Speed vs Resolution

The three strategies reviewed above give different levels of resolution in the final mapping to the 8×8 matrix, but higher resolution comes at a cost of processing the pixels of the image, and the bigger the image, the more processing is required.

Although our gut tells us that going from 150 × 184 pixels to an 8 × 8 matrix will surely lose some information, the third mapping process is the one that gives the best correlation between the original image and the final compressed one.

The drawback of this last process is that it requires extra processing steps, since we have to map the original picture to the point-size scale, then apply the cake cutting, and finally the simple mapping to the output.

Certainly, the second strategy gives a good balance between resolution and processing cost, while the first one is possibly the fastest because of the simple-map strategy, at the lowest resolution.

Mapping algorithm

The criterion for the simple mapping is to find whether any pixel within the partition is activated, to mark it as active in the output; this sounds like a search problem: we have a bunch of data and we look for the first appearance of an activated pixel.

We could take the bulk of data in the partition, sort it, and check the first element after the sort:


But that's expensive: even a quicksort has to read every element of the data, so it's no better than a plain linear scan.

Another option is divide and conquer: split the partition into smaller search batches; when one of the batches finds an active pixel, we don't have to look any further. The trick now is how to do those subdivisions:


However, let’s take a step back and take a look at what we are trying to accomplish with the simple map.

We are taking a drawing made by hand on the screen and, regardless of its shape or size, scaling it to a normalized size. The drawing is made in a single continuous motion; when the user loses contact with the screen, we consider the drawing done. Therefore, one of these drawings won't have interruptions or isolated pixels.

With that in mind, to determine whether a partition has to be active in the mapping (i.e., whether it contains any active pixel), we only need to check the border cells.

If a partition is going to have an active pixel, we know it will appear on the borders, since that's where the partitions connect to each other (the one exception is a partition holding only the very start or end of the stroke):


This way we can improve the speed of the simple mapping.
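A sketch of that border-only check (the helper name is mine; it assumes the partition bounds are half-open pixel ranges, like the cake cut above):

```python
def border_active(pixels, x0, y0, x1, y1):
    """Check only the border pixels of the partition
    [x0, x1) x [y0, y1).  Because the drawing is one continuous
    stroke, a partition containing ink almost always has ink on its
    border (the stroke must enter or leave through it)."""
    for x in range(x0, x1):  # top and bottom rows
        if pixels[y0][x] or pixels[y1 - 1][x]:
            return True
    for y in range(y0, y1):  # left and right columns
        if pixels[y][x0] or pixels[y][x1 - 1]:
            return True
    return False
```

For a 19 × 23 partition this touches ~80 pixels instead of all 437, which is where the speedup comes from.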


For something as fun as the "Magic Touch" game, or pretty much any other game, we can do some abstraction and we will for sure get interesting results; it also helps us appreciate the genius of whoever coded it in the first place.

For something as simple as "capture the drawing and feed it to a neural network," we found multiple ways to achieve it: some more efficient or faster than others, some pretty specific to the problem (like the border search), and some that gave us a strategy to use in other problems (like the cake-cutting strategy).

Some other ideas that can help improving this approach are:

  • We could implement a set of perceptron units for the partitions and map them in parallel.
  • When capturing the drawing, we could save just the direction changes and complete the lines using Bresenham's algorithm.
  • Or map the direction changes to normalized coordinates (this could actually be really cool to implement) and then complete them with Bresenham's algorithm in the output.
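For the Bresenham idea, the classic integer line algorithm would reconstruct the full path from just the saved direction-change points; a standard sketch:

```python
def bresenham(x0, y0, x1, y1):
    """Integer pixels of the line between two direction-change points,
    so only the turning points of the path need to be stored."""
    points = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy  # accumulated error between the ideal and drawn line
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy
    return points
```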

What other ideas did you get from all this? Which other games have you analyzed?

