Take a look at the following concept video I made:
(the beginning of the movie also introduces CamScanner by IntSig Information, a great scanner app)
Cool! Is it useful?
Suppose you're sitting in a restaurant with some friends. Suddenly you're debating what the 26th Fibonacci number is. What's easier than taking out a piece of paper and writing this:
What if you could pull out your smart phone, take a picture of the code, and then...this:
Now assume you've just finished a design meeting at your day job and the whiteboard is full of complex algorithms:
Don't you wish you could just *run* them?
There are plenty of other occasions where executing written code can come in handy, like job interviews, university courses, and all things serendipity.
One known limitation of my approach is that Instagram photos are not supported. Live with that :)
How does it work?
First note that CodeObscura is a concept plus a prototype. For now it is not a product which you can install. I'd love to get your feedback on this concept.
As you can see, CodeObscura takes pictures of code and performs OCR on them. It then executes the recognized text as code on a Node.js instance and reports back the results. Most of this takes place in the cloud; you just need to stay close to your phone.
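To make the execution step concrete, here is a minimal sketch of how recognized text could be run and turned into a reply. It is an illustration, not the prototype's code: the helper name `runRecognizedCode` is hypothetical, and it swaps in Node's built-in `vm` module with a timeout. Note that `vm` hides `require` and `process` by default but is not a real security boundary.

```javascript
const vm = require("vm");

// Hypothetical helper (not from the prototype): run OCR output as JavaScript
// and return something printable for the phone to display.
function runRecognizedCode(recognizedText) {
  try {
    // Run in a fresh, empty context with a time limit so a runaway
    // snippet cannot block the server forever.
    const result = vm.runInNewContext(recognizedText, {}, { timeout: 1000 });
    return String(result);
  } catch (err) {
    // OCR mistakes usually surface as syntax errors; report them instead of crashing.
    return "Error: " + err.message;
  }
}

// Stand-in for text that came back from the OCR step.
const recognized = `
function fib(n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }
fib(26)
`;
console.log(runRecognizedCode(recognized)); // 121393
```

The value of the last expression in the snippet becomes the result, which is why a bare `fib(26)` on the final line is enough.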
The hardest part of implementing a product like this is performing OCR on handwritten text. I have not seen a product or a library that does this reliably. Fortunately, we do not need to recognize arbitrary text. We should be fine telling our users (us) to write the code using well-separated letters and to be careful with some known ambiguous letters. Nevertheless, OCR is still the Achilles' heel of the concept.
Building your own CodeObscura
About 95% of your time should be dedicated to the OCR part. I chose to use Tesseract - an open source OCR library originally developed by HP Labs and today maintained by Google. Tesseract will not recognize your handwriting out of the box; you will need to train it. Since I've been training Tesseract for some time now, I think you will appreciate these tips:
I promise to come up with a more extensive Tesseract cheat sheet soon.
OCR-H - An OCR-friendly font for humans
Recognizing handwritten text means coping with many inherent ambiguities. OCR-A is a font invented in the late 60s to make it easy for OCR-enabled devices to scan and understand text. It is a machine font, so we cannot expect humans to write it by hand. To improve the recognition of handwritten text by commodity OCR libraries I have invented OCR-H - a "font" meant to be written by humans. Of course, two letters written even by the same person are never identical, so OCR-H is more of a high-level style and shape for characters, meant to make each one distinct enough for a computer.
OCR-H rules:
For example:
This is just a first pass at OCR-H. I have found it to increase the success rate of commodity OCR libraries.
Mobile app
This part is pretty straightforward so I will not go into details. I used Android, so the key part is to register the app for the "Send" action (android.intent.action.SEND) so that it appears in the list of options when you share a picture:
Then you can access the image when your activity starts like this:
All you need to do now is to send the image to your cloud server as a binary HTTP payload and display the message you get back to the user.
Server side
I used a very basic Node.js server here. There is not a lot to say about it, except that at the moment it calls Tesseract as a separate process, which is not very scalable. Also, eval() raises security concerns, since the recognized text runs with the server's own privileges. You can see the rest here:
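As a rough illustration of the setup described above (this is not the original server code), here is a minimal sketch under a few assumptions: the image arrives as the raw request body, the `tesseract` command-line tool is installed and on the PATH, and the helper names (`tesseractArgs`, `recognize`, `execute`, `createServer`) and the temp-file naming are made up for this example.

```javascript
const http = require("http");
const fs = require("fs");
const os = require("os");
const path = require("path");
const { execFile } = require("child_process");

// Arguments for the tesseract CLI: read the image, print recognized text to stdout.
function tesseractArgs(imagePath) {
  return [imagePath, "stdout"];
}

// Run tesseract as a separate process (one process per request, hence not very scalable).
function recognize(imagePath, callback) {
  execFile("tesseract", tesseractArgs(imagePath), (err, stdout) => callback(err, stdout));
}

// Evaluate the recognized text. eval() mirrors the post; it is unsafe on untrusted input.
function execute(code) {
  try {
    return String(eval(code));
  } catch (err) {
    return "Error: " + err.message;
  }
}

// Build (but do not start) a server that takes an image as the raw request body.
function createServer() {
  return http.createServer((req, res) => {
    const chunks = [];
    req.on("data", (chunk) => chunks.push(chunk));
    req.on("end", () => {
      // Assumed naming scheme for the temporary image file.
      const imagePath = path.join(os.tmpdir(), `codeobscura-${Date.now()}.png`);
      fs.writeFile(imagePath, Buffer.concat(chunks), (writeErr) => {
        if (writeErr) {
          res.statusCode = 500;
          return res.end("Could not store image");
        }
        recognize(imagePath, (ocrErr, text) => {
          fs.unlink(imagePath, () => {}); // best-effort cleanup
          res.setHeader("Content-Type", "text/plain");
          if (ocrErr) {
            res.statusCode = 500;
            return res.end("OCR failed: " + ocrErr.message);
          }
          res.end(execute(text));
        });
      });
    });
  });
}
```

To try it, call `createServer().listen(8080)` (the port is arbitrary) and post a photo with something like `curl --data-binary @photo.png http://localhost:8080/`.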
Now what?
CodeObscura already has a prototype I have written. It is pretty cool to take photos on a mobile phone -