Monday, June 4, 2012

Code Obscura: Executing pictures of code

@YaronNaveh

Pen and paper have always been loyal friends to humans. Smartphones and tablets are trying to change this reality. Instead, they should embrace it!

Take a look at the following concept video I made:



(the beginning of the video also introduces CamScanner by IntSig Information, a great scanner app)

Cool! Is it useful?
Suppose you're sitting in a restaurant with some friends. Suddenly you're debating what the 26th Fibonacci number is. What's easier than taking out a piece of paper and writing this:


What if you could pull out your smartphone, take a picture of the code, and then... this:


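For the curious, the kind of snippet you would scribble on paper - and that the prototype would execute on node.js - might look like this (a sketch; the function name is mine):

```javascript
// Iterative Fibonacci - short enough to write on a napkin.
// Here fib(1) = fib(2) = 1.
function fib(n) {
  var a = 1, b = 1;
  for (var i = 3; i <= n; i++) {
    var next = a + b;
    a = b;
    b = next;
  }
  return b;
}

console.log(fib(26)); // settles the restaurant debate: 121393
```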
Now assume you've just finished a design meeting at your day job and the whiteboard is full of complex algorithms:


Don't you wish you could just *run* them?

There are plenty of other occasions where executing written code can come in handy: job interviews, university courses, and all things serendipity.

One known limitation of my approach is that Instagram photos are not supported. Live with that :)


How does it work?
First note that Code Obscura is a concept plus a prototype. For now it is not a product you can install. I'd love to get your feedback on the concept.

As you can see, Code Obscura takes pictures of code and performs OCR on them. It then executes the recognized text as code on a node.js instance and reports back the results. Most of this takes place in the cloud - you just need your mobile nearby.

The hardest part in implementing a product like this is performing OCR on handwritten text. I have not seen a product or a library that does this in a rock-solid manner. Fortunately, we do not need to recognize arbitrary text. We should be fine telling our users (us) to write the code using well-separated letters and to be careful with some known ambiguous letters. Nevertheless, OCR is still the Achilles' heel of the concept.

Building your own CodeObscura
About 95% of your time should be dedicated to the OCR part. I chose to use Tesseract - an open source OCR library originally developed by HP Labs and today maintained by Google. Tesseract will not identify your handwriting out of the box. You will need to train it. Since I've been training Tesseract for some time now, I know you will appreciate these tips:

  • The Tesseract training page is your friend. Follow the instructions!

  • The instructions work. Even if they are full of details and you are sure you (or the authors) got something wrong, remember that if you follow them carefully they will work for you.

  • One can't overestimate the importance of a high-quality picture for the OCR process. Use a good camera and make sure there is enough light. Good paper and a thick pen are also very important.

  • Scanner apps for mobile (CamScanner is my favorite) are critical, but they do not replace the need for a quality picture.

  • If you write on blank paper, all text should be well aligned to virtual lines, and no letter should stick out of its line (for example, watch out not to write the letter 'p' too high relative to the other letters).

I promise to come up with a more comprehensive Tesseract cheat sheet soon.

OCR-H - An OCR-friendly font for humans
Understanding handwritten text means coping with many inherent ambiguities. OCR-A is a font invented in the late 60s to make it easy for OCR-enabled devices to scan and understand text. It was a machine font, so we cannot expect humans to follow it. To accelerate the recognition of handwritten text by commodity OCR libraries I have invented OCR-H - a "font" meant to be written by humans. Of course, two letters written even by the same human are never identical, so OCR-H is more of a high-level style and shape for characters, making them unique enough for a computer.

OCR-H rules:

  • letters a-z are written as you typically write them
  • one exception is the letter n, which is written with an underscore
  • all numbers are written with an underscore, as are many punctuation signs
  • the signs ; + - * are written with a circle around them

For example:



This is just a first pass at OCR-H. I have found it to increase the success rate of commodity OCR libraries.

Mobile app
This part is pretty straightforward so I will not go into details. I used Android, so the key part is to register the app for the "Send" intent so that it appears in the list of options when you share a picture:





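An intent filter along these lines in AndroidManifest.xml does the registration (a sketch; the activity name is mine):

```xml
<!-- Declares that this activity can receive images shared via "Send" -->
<activity android:name=".CodeObscuraActivity">
    <intent-filter>
        <action android:name="android.intent.action.SEND" />
        <category android:name="android.intent.category.DEFAULT" />
        <data android:mimeType="image/*" />
    </intent-filter>
</activity>
```

The `image/*` MIME type keeps the app out of the share list for non-image content.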
Then you can access the image when your activity starts like this:


All you need to do now is send the image to your cloud server as a binary HTTP payload and display the message you get back to the user.

Server side
I used a very basic node.js server here. Not a lot to say about it, except that at the moment it calls Tesseract as a separate process, which is not very scalable. Also, eval() may raise some security concerns. You can see the rest here:


Now what?
Code Obscura already has a prototype I have written. It is pretty cool to take photos on a mobile phone - far away from any PC - and execute them on the fly. Sure, there are a few humorous use cases for it, but I believe there's a real reason to take this idea a step further. It is a fact that writing on paper is much faster than typing on a mobile device. Combine that with a strong OCR library, and I wonder if our next IDE will be a pen and paper.

@YaronNaveh

What's next? Get this blog's RSS updates or register for mail updates!