Sunday, June 10, 2012

12 common wcf interop confusions

@YaronNaveh

I get emails almost daily from people asking how to build a Wcf client to consume a service built on framework X (usually axis or wsit, but others as well). After getting hundreds of these emails in recent years I have concluded that there is a single setting which most people need. There are also common confusions that a lot of people stumble on in their first try. In this post I will present the common setting, and what can (and will) go wrong.

The mails I get usually start with this soap sample which people want wcf to send:


Optionally ssl is also used.

The wcf setting required here is a custom binding with an authentication mode of "mutualCertificate":


(where https may be used instead of http)

Confusion 1: Using the wrong soap version can cause the server to return different kinds of exceptions. Make sure the "messageVersion" attribute on the textMessageEncoding element fits your needs. In particular, if no ws-addressing headers are used (To, ReplyTo, MessageID and Action) then use "Soap11" (as above) or "Soap12", i.e. a value without any ws-addressing version.

Confusion 2: The proxy may throw this exception:

The client certificate is not provided. Specify a client certificate in ClientCredentials.

That's an easy one: you must configure a client certificate, which is pretty basic for a signing scenario. You can do it from code or config. Here is the config version:


Confusion 3: You still get the same error after you have defined the certificate. In this case make sure you have configured the endpoint to use the behavior (through the endpoint's behaviorConfiguration attribute):


Confusion 4: When I use mutualCertificate authentication mode I define my client certificate. I do not have a server certificate to define. My proxy is not sending anything and throws this error:

The service certificate is not provided for target 'http://localhost/MyWebServices/Services/SimpleService.asmx'. Specify a service certificate in ClientCredentials.

The issue is that mutualCertificate always requires you to define a server certificate, even in cases where you do not really need one. In such cases it is ok to define some dummy certificate as the server certificate; it can even be the same certificate you use for the client:


Of course you may also do so from code.

Confusion 5: You may get this error:

"The X.509 certificate CN=WSE2QuickStartServer chain building failed. The certificate that was used has a trust chain that cannot be verified. Replace the certificate or change the certificateValidationMode. A certificate chain processed, but terminated in a root certificate which is not trusted by the trust provider.\r\n"

This typically means the server certificate you have defined is not trusted by your machine. If you have defined a dummy server certificate (see confusion 4), or in other cases - at your own risk and for testing purposes only - you can turn off this validation by setting certificateValidationMode to None.


Confusion 6: I am getting a good response from the server but the proxy throws this exception:

The incoming message was signed with a token which was different from what used to encrypt the body. This was not expected.


Congratulations, it turns out you need to define a real server certificate after all (so the dummy certificate from confusion 4 will not do). You should get it from the service author. If you can't, there is a nice trick to infer the certificate: extract the value of the binary security token from the message and save it to disk (in binary form), as alluded here.
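As a rough illustration of that trick (a sketch, not code from the linked post): the node.js snippet below pulls the base64 text of the first BinarySecurityToken out of a captured message, decodes it, and writes the raw bytes to disk. The function names and file names are made up for the example, and the regular expression is only a stand-in for proper Xml parsing.

```javascript
const fs = require("fs");

// Return the decoded bytes of the first BinarySecurityToken in a soap message.
// A regex keeps the sketch dependency-free; a real Xml parser is more robust.
function extractBinarySecurityToken(soapXml) {
  const match = soapXml.match(
    /<(?:[\w.-]+:)?BinarySecurityToken\b[^>]*>([^<]+)<\/(?:[\w.-]+:)?BinarySecurityToken>/
  );
  if (!match) {
    throw new Error("No BinarySecurityToken element found in the message");
  }
  // The token text is base64 and may be wrapped over several lines.
  return Buffer.from(match[1].replace(/\s+/g, ""), "base64");
}

// Hypothetical helper: save the token as a binary certificate file.
function saveToken(soapXml, outputPath) {
  fs.writeFileSync(outputPath, extractBinarySecurityToken(soapXml));
}
```

Assuming the token's ValueType is X509v3, calling something like saveToken(capturedMessage, "server.cer") should give you a file you can import into the certificate store and reference as the service certificate.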


Confusion 7: I am getting a good response from the server but the proxy throws this exception:

Security processor was unable to find a security header in the message. This might be because the message is an unsecured fault or because there is a binding mismatch between the communicating parties. This can occur if the service is configured for security and the client is not using security.

This means the service is not signing the response even though you sent a signed request. In .Net 4+ you can turn off the secured response requirement by setting enableUnsecuredResponse to true on the security element of your custom binding:


Confusion 8: When I use mutualCertificate I see my proxy sends a message in a format very different from what I need. In particular there is no signature but only encryption, something like this:


What you need to know is that by default messages are signed AND encrypted, and moreover the encryption also covers the signature and "hides" it from your eyes. The solution is to set the correct protection level on your contract (ProtectionLevel.Sign for signature only):


btw while interoperating with some java stacks you will know you are in confusion #8 if you get this error:

General security error (WSSecurityEngine: No crypto property file supplied for decryption)

Confusion 9: After applying the mitigation to confusion 8 the outgoing message is still not in the desired format. In particular the message is signed not by the binary token but by a derived token, and there is a primary and a secondary signature instead of just one:



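For all things interop wssecurity10 is your friend and wssecurity11 is the enemy. Keep your friends close! Make sure the messageSecurityVersion attribute has a value that starts with wssecurity10, for example WSSecurity10WSTrustFebruary2005WSSecureConversationFebruary2005WSSecurityPolicy11BasicSecurityProfile10: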


Confusion 10: You get this error:

Identity check failed for outgoing message. The expected DNS identity of the remote endpoint was 'localhost' but the remote endpoint provided DNS claim 'WSE2QuickStartServer'. If this is a legitimate remote endpoint, you can fix the problem by explicitly specifying DNS identity 'WSE2QuickStartServer' as the Identity property of EndpointAddress when creating channel proxy.

You fell for the oldest trick in the book! Just do exactly what the error tells you to do. Yes, it's ok...

Confusion 11: You get a good response from the server but the proxy throws this error:

No Timestamp is available in security header to do replay detection

or this one:

The security header element ‘timestamp’ with ‘Timestamp-xxxx’ id must be signed.

These may happen when you send the server a signed timestamp, so wcf expects to get one back AND to have it signed. So either you do not get one back or it is not signed. To start, try setting the includeTimestamp attribute on the "security" binding element to false. But this will not work if the server actually requires a timestamp. If it requires one but unsigned, write a custom encoder for your proxy and manually generate and push the timestamp header into the request. If the server requires a signed timestamp then your only hope is to set enableUnsecuredResponse to true (.net 4 only):



AND to strip out ANY remains of the "security" tag from the response (not just the timestamp) using a custom encoder. If WCF sees the security tag it will be very defensive and try to validate it. Of course if the security tag you removed contains a signature, you will not be able to validate it, which is a shame. I'm not familiar with any better workaround at this moment, so I'm investigating a few directions.

Confusion 12: Ssl is used, and you try certificateOverTransport instead of mutualCertificate as the authentication mode on your custom binding. You may get away with the request, since it is similar, but once the response comes back you may experience:

Cannot find a token authenticator for the 'System.IdentityModel.Tokens.X509SecurityToken' token type. Tokens of that type cannot be accepted according to current security settings.

What's going on here? certificateOverTransport assumes the client authenticates with a message level certificate, but the server authenticates with its transport ssl certificate. However a more common use case is that the server also authenticates with a message level certificate, in addition to its transport one. You can identify such a scenario by seeing a signature element in the server response. This means you need a mutualCertificate authentication mode together with an https transport binding element:


Summary
When Wcf consumes third party services, the most common authenticationMode would be "mutualCertificate". Make sure you have tried all combinations of this setting before trying other settings. Of course if you are in a situation where mutualCertificate clearly does not apply (e.g. username authentication) then this is not relevant for you. But even when usernames are used they may still be combined with a client certificate, in which case it would still make sense to use SecurityBindingElement.CreateMutualCertificateBindingElement() for bootstrap and add the username as a supporting token.

@YaronNaveh

What's next? get this blog rss updates or register for mail updates!

Thursday, June 7, 2012

Tesseract training cheatsheet

@YaronNaveh

As I wrote last time, tesseract-ocr is an open source ocr library originally developed by hp labs and today maintained by google. You can train tesseract to support more languages and fonts. I have trained tesseract to read my handwriting and got a success rate of over 90% - though this still means that one in every 10 characters or so is an error. This page explains how you can train tesseract by yourself. This post will share some of the conclusions and pitfalls I have found from my experiment.





  • the tesseract training page is your friend. Follow the instructions!

  • The instructions work. Even if they are full of details and you are sure you (or the authors) got something wrong, remember that if you follow them carefully they will work for you.

  • One can't overestimate the importance of a high quality picture for the ocr process. Use a good camera and make sure there is enough light. If you control the written text then good paper and a thick pen are also helpful.

  • Scanner apps for mobile (CamScanner is my favorite) are critical, though they do not replace the need for a quality picture.

  • If you write on a blank paper, all text should be well aligned to virtual lines. Also no letter should stand out of the line (for example watch out not to write the letter 'p' too high over other letters.)

  • If you control the written text, consider developing your own "font" so some of the ambiguous letters are really differentiated. For example I have decided to put an underline below all numbers and also under the letter n which can be mistaken for h or r.

  • I used the jTessBoxEditor for the box files. Its advantage over the other editors is that it supports multi page tiff files, which can be a good process to follow.

  • When you auto generate the box file it may fail to identify some letters. Unlike getting a letter wrong, here it does not detect that there is a letter at all. From my experience there is no point in manually adding a box for such a letter since it will never be identified. If too many letters are missed you need to improve the quality of the photo or the ink, and also make sure these letters are aligned on the same line as the other letters.

  • if text lines start in the middle of a row, or if they are not nicely aligned one under the other, then there is a good chance tesseract will get them wrong.

  • sometimes tesseract got the last line wrong, but when I added a dummy line below it the real last line worked well.

  • when trying to recognize multi line text I got better results than when trying a single line.

  • when trying a single line I got better results when the image was not too large (so if the camera creates big pictures it is better to resize them)

  • when you create a box file make sure to use some existing language dictionary (if there is one) to bootstrap the identification. It does not matter which language you use since tesseract only uses it to generate the box file and it will not affect the final dictionary.

  • ImageMagick can be used to add some image to a multipage tiff file:

    convert.exe img1.bmp img2.jpg -adjoin res.tiff


    Common Errors



    Error: Illegal short name for a feature!
    signal_termination_handler:Error:Signal_termination_handler called:Code 2000

    I got this error after the .box file got corrupted for some reason. I have opened it and using "binary search" I deleted a different part of it every time and tried to build it again, until I found the wrong line. Typically the wrong line is because tesseract is identifying some very tiny dots as letters.
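The "binary search" over the box file can be automated. The node.js sketch below assumes exactly one line is responsible for the failure; buildsOk is a hypothetical callback you would write yourself (save the given lines as the .box file, rerun the failing training step, return true if it succeeds).

```javascript
// Find the index of the single line that breaks the build by halving the
// suspect range [lo, hi) until one line is left.
function findBadLine(lines, buildsOk) {
  let lo = 0;            // the bad line index is known to be >= lo
  let hi = lines.length; // ... and < hi
  while (hi - lo > 1) {
    const mid = Math.floor((lo + hi) / 2);
    // Try building with the upper half [mid, hi) of the suspect range removed.
    const candidate = lines.slice(0, mid).concat(lines.slice(hi));
    if (buildsOk(candidate)) {
      lo = mid; // removing the upper half fixed it, so the bad line is in there
    } else {
      hi = mid; // still broken, so the bad line is in the lower half
    }
  }
  return lo;
}
```

With n lines this needs roughly log2(n) rebuilds instead of n, which matters when each training run takes a while.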

    Writing Merged Microfeat ...Warning: no protos/configs for { in

    CreateIntTemplates()
    Class->NumConfigs == this->fontset_table_.get(Class->font_set_id).size:Error:Assert failed:in file ..\classify\intproto.cpp, line 1312

    As stated here, tesseract 3.0.1 only supports one image per font. It actually crashes when you try to use another image (exp2). You may want to use a multipage tiff file if you need multiple images. This way you can always push more images to an existing font without losing the previous coordinates. Generating a box file for the new tiff will override the existing one (which you have probably manually fixed), so I have built a utility to back up the previous one and copy all values from the previous tiff pages to the newly generated box file.
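The utility itself is not shown in this post. A minimal node.js sketch of the idea could look like the following, assuming each box line has the form "symbol left bottom right top page" and that new images are always appended as later pages of the tiff. The function names are invented for the example.

```javascript
// The page number is the last field of a box line: "<symbol> <left> <bottom> <right> <top> <page>".
function pageOf(boxLine) {
  const fields = boxLine.trim().split(/\s+/);
  return Number(fields[fields.length - 1]);
}

// Keep the hand-corrected boxes for pages that already existed and take the
// freshly generated boxes only for pages that were added since.
function mergeBoxFiles(previousText, regeneratedText) {
  const toLines = (text) => text.split(/\r?\n/).filter((l) => l.trim() !== "");
  const previous = toLines(previousText);
  const regenerated = toLines(regeneratedText);
  const knownPages = new Set(previous.map(pageOf));
  const newPageLines = regenerated.filter((l) => !knownPages.has(pageOf(l)));
  return previous.concat(newPageLines).join("\n") + "\n";
}
```

Back up the corrected box file first, regenerate, then write the merged text over the regenerated file. Note that a page for which the old file had no boxes at all would be treated as new.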

    read_params_file: Can't open batch.nochop

    The Windows executable package does not include the configs. You will need to copy the 'tessdata' folder from the source distribution to the same directory as tesseract.exe to perform training (the source has two folders under tessdata which we need: configs and tessconfigs).

    tessdata_manager.SeekToStart(TESSDATA_INTTEMP):Error:Assert failed:in file adaptmatch.cpp, line 512 Segmentation fault

    You did not follow the documentation - before combining everything into the traineddata file you need to:
    "All you need to do now is collect together all (normproto, Microfeat, inttemp, pffmtable) the files and rename them with a lang. prefix..."

    Monday, June 4, 2012

    Code Obscura: Executing pictures of code

    @YaronNaveh

    Pen and paper have always been loyal friends to humans. Smartphones and tablets are trying to change this reality. Instead, they should embrace it!

    Take a look at the following concept video I made:



    (the beginning of the movie also introduces CamScanner by IntSig Information, a great scanner app)

    Cool! Is it useful?
    Suppose you're sitting in a restaurant with some friends. Suddenly you're debating what the 26th fibonacci number is. What's easier than taking out a piece of paper and writing this:


    What if you could pull out your smart phone, take a picture of the code, and then...this:


    Now assume you've just finished some design meeting in your day job and the white board is full of complex algorithms:


    Don't you wish you could just *run* them?

    There are plenty of other occasions where executing written code can come in handy like job interviews, university courses, and all things serendipity.

    One known limitation of my approach is that instagram photos are not supported. Live with that :)


    How does it work?
    First note that CodeObscura is a concept plus a prototype. For now it is not a product which you can install. I'd love to get your feedback on this concept.

    As you can see CodeObscura takes pictures of code and performs ocr on them. It then executes the recognized text as code on a node.js instance and reports back the results. Most of this takes place on a cloud - you just need to stay tuned near your mobile.

    The hardest part in implementing a product like this is to perform ocr on handwritten text. I have not seen a product or a library that does this in a rock solid manner. Fortunately, we do not need to recognize arbitrary text. We should be fine telling our users (us) to write the code using well separated letters and to be careful with some known ambiguous letters. Nevertheless ocr is still the achilles heel of the concept.

    Building your own CodeObscura
    About 95% of your time should be dedicated to the ocr part. I chose to use Tesseract - an open source ocr library originally developed by HP labs and today maintained by google. Tesseract will not identify your handwriting up front. You will need to train it. Since I've been training tesseract for some time now I know you would appreciate these tips:

  • the tesseract training page is your friend. Follow the instructions!

  • The instructions work. Even if they are full of details and you are sure you (or the authors) got something wrong, remember that if you follow them carefully they will work for you.

  • One can't overestimate the importance of a high quality picture for the ocr process. Use a good camera and make sure there is enough light. Good paper and a thick pen are also very important.

  • Scanner apps for mobile (CamScanner is my favorite) are critical, but do not replace the need for a quality picture.

  • If you write on a blank paper, all text should be well aligned to virtual lines. Also no letter should stand out of the line (for example watch out not to write the letter 'p' too high over other letters.)

    I promise to come up with a more massive tesseract cheatsheet soon.

    OCR-H - An ocr-friendly font for humans
    The problem of understanding hand written text needs to cope with many inherent ambiguities. OCR-A is a font invented in the late 60s to make it easy for ocr enabled devices to scan and understand text. This was a machine font so we cannot expect humans to follow it. To accelerate the recognition of hand written texts by commodity ocr libraries I have invented OCR-H - a "font" meant to be written by humans. Of course two letters written even by the same human are never the same, so OCR-H is more of a high level style and shape for characters to make them unique enough for a computer.

    OCR-H rules:

  • letters a-z are written as you typically write them
  • one exception is the letter n which is written with an underscore
  • all numbers are written with underscore as well as many punctuation signs
  • the signs ; + - * are written with a circle around them

    For example:



    This is just the first brush on ocr-h. I have found it to increase the success rate of commodity ocr libraries.

    Mobile app
    This part is pretty straight forward so I will not go into details. I used android, so the key part is to register the app for the "Send" event so that it appears in the list of options when you share a picture:





    then you can access the image when your activity starts like this:


    all you need to do now is to send the image to your cloud server as binary http payload and display the message you get back to the user.

    Server side
    I used a very basic node.js server side here. Not a lot to say about it except that at the moment it calls tesseract as a separate process which is not very scalable. Also eval() may raise some security concerns. You can see the rest here:
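The execution step on its own can be sketched in a few lines. This is an independent illustration rather than the prototype's code: it swaps eval() for node's built-in vm module with a timeout, and the function name and sample input are made up. The vm module is not a real security boundary, so this only protects against runaway loops.

```javascript
const vm = require("vm");

// Run the text returned by the ocr step as javascript and return the value
// of the last expression. The timeout stops infinite loops; it does NOT make
// untrusted code safe to run.
function runRecognizedCode(recognizedText) {
  const sandbox = {}; // fresh global object for every snippet
  return vm.runInNewContext(recognizedText, sandbox, { timeout: 1000 });
}

// What a photographed snippet might look like after ocr (hypothetical input).
const recognized = `
function fib(n) {
  var a = 0, b = 1;
  for (var i = 0; i < n; i++) { var t = a + b; a = b; b = t; }
  return a;
}
fib(26);
`;

console.log(runRecognizedCode(recognized)); // 121393
```

The server would call tesseract first, pass its text output to a function like this, and send the result (or the error message) back to the phone.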


    Now what?
    Code Obscura already has a prototype I have written. It is pretty cool to take photos of code on a mobile phone - meters away from a PC - and execute them on the fly. Sure, there are a few humorous use cases for it, but I believe there's a real reason to take this idea a step further. It is a fact that writing on paper is much faster than typing on a mobile device. Combine that with a strong ocr library - I wonder if our next IDE will be a pen and a paper.

    Saturday, May 26, 2012

    Wcf.js message level signature? Check.

    @YaronNaveh

    This is a very exciting moment for Wcf.js. It now supports one of the most common WS-Security scenarios - x.509 digital signatures. This is the first WS-Security implementation ever in javascript to support this. This implementation relies on xml-crypto, which I told you about last time.

    Look at any of the following Wcf bindings:



    Assume only signatures are used (no encryption):


    Then a soap request would look like this:


    You can now interoperate with such services from javascript using Wcf.js with this code:


    Note that a pem formatted certificate needs to be used. Wcf likes pfx formats more, so check out the instructions here on how to do the conversion.

    You should also be aware that Wcf.js by default does not validate incoming signatures from the server. If you wish to validate them check out the sample here.


    Sunday, May 20, 2012

    Xml-Crypto: An Xml digital signature library for Node.js

    @YaronNaveh

    Get xml-crypto on github

    Node.js does not always have the right libraries for Xml operations. When such libraries exist they are not always cross platform (read: work on windows). I've just published xml-crypto, the first xml digital signature library for node. As a bonus this library is written in pure javascript so it is cross platform.

    What is Xml Digital signature?
    There's a tl;dr version here. The essence is that dig-sig lets us protect content from unauthorized modification by telling us who created that content and whether anyone has altered it since. Xml dig-sig is a special flavour which has some interesting implementation aspects.
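To make the idea concrete without any Xml specifics, here is a small sketch that uses only node's built-in crypto module, not xml-crypto. It signs a string with RSA-SHA256 (chosen for the demo) and shows that changing the content invalidates the signature. It deliberately skips canonicalization and references, which are the parts that make the Xml flavour interesting. The helper names are made up.

```javascript
const crypto = require("crypto");

// Throwaway key pair for the demo; a real setup would load .pem files.
const { privateKey, publicKey } = crypto.generateKeyPairSync("rsa", {
  modulusLength: 2048,
});

// Only the holder of the private key can produce this value.
function sign(content) {
  return crypto.createSign("RSA-SHA256").update(content).sign(privateKey, "base64");
}

// Anyone with the public key can check that the content was not altered.
function isUntouched(content, signature) {
  return crypto
    .createVerify("RSA-SHA256")
    .update(content)
    .verify(publicKey, signature, "base64");
}

const book = "<book><name>Harry Potter</name></book>";
const signature = sign(book);

console.log(isUntouched(book, signature)); // true
console.log(isUntouched(book.replace("Harry", "Larry"), signature)); // false
```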

    A typical xml signature looks like this:


    Installing Xml-Crypto

    Install with npm:

    npm install xml-crypto

    A prerequisite is to have openssl installed and its /bin on the system path. I used version 1.0.1c but it should work on older versions too.

    Signing an xml document

    Use this code:


    The result will be:


    Note:

    sig.getSignedXml() returns the original xml document with the signature pushed as the last child of the root node (as above). This assumes you are not signing the root node but only sub node(s) otherwise this is not valid. If you do sign the root node call sig.getSignatureXml() to get just the signature part and sig.getOriginalXmlWithIds() to get the original xml with Id attributes added on relevant elements (required for validation).

    Verifying a signed document

    You can use any dom parser you want in your code (or none, depending on your usage). This sample uses xmldom so you should install it first:

    npm install xmldom

    Then run:


    Note:

    The xml-crypto api requires you to supply it separately the xml signature ("<Signature>...</Signature>", in loadSignature) and the signed xml (in checkSignature). The signed xml may or may not contain the signature in it, but you are still required to supply the signature separately.

    Supported Algorithms

    The first release always uses the following algorithms:

  • Exclusive Canonicalization http://www.w3.org/2001/10/xml-exc-c14n#
  • SHA1 digests http://www.w3.org/2000/09/xmldsig#sha1
  • RSA-SHA1 signature algorithm http://www.w3.org/2000/09/xmldsig#rsa-sha1

    You can extend xml-crypto with further algorithms. I will author a post about it soon.

    Key formats

    You need to use .pem formatted certificates for both signing and validation. If you have pfx x.509 certificates there's an easy way to convert them to pem. I will author a post about this soon.

    The code

    Get xml-crypto on github

    Saturday, May 5, 2012

    How to fix Wcf cache of dynamic Wsdls

    @YaronNaveh

    One of the least used Wcf extension points is IWsdlExportExtension. This extension lets you customize the wsdl document which Wcf emits. Since you rarely want to do that, this extension is not commonly used. When it is used, it is usually in the context of flattening the wsdl. A different use case I have recently seen is to push dynamic content into the wsdl. More specifically, a user was trying to generate xsd schemas from a live database table and put them into the wsdl so clients would always get the latest schema. The Wcf service itself was treating the request as Xml anyway so it did not care about such changes. The requirement was for the wsdl to reflect the latest db changes at any time. Our problem was that once the wsdl was generated for the first time it would not be regenerated. This resulted in a stale schema.

    This is how we created the wsdl exporter:


    When we run this service and open the wsdl we get this:


    When we refresh the wsdl after a few seconds we still get this:


    This is not a browser or proxy cache. Wcf does not recreate the wsdl - which can also be seen by putting a breakpoint (which is only called once) on the exporter.

    This behavior makes sense when you consider the case where there is no exporter extension - then the wsdl is generated based on the data contract assembly, and as long as that assembly does not change the wsdl will not change either. However we chose to put dynamic logic in the ExportEndpoint method, so that default behavior did not work well for us.

    One way to fix that is to use a message inspector to update the wsdl before it is sent to the client. In this case IWsdlExportExtension is not required at all. This approach is described here.
    An alternative could be to build a Wcf rest endpoint in the same service to act as a proxy to the real wsdl.


    Tweets of the week: It's DevOps Borat

    @YaronNaveh

    Borat likes DevOps:



    Soap / Rest / Wsdl still a hot topic:




