In SOAP, byte arrays are encoded as base64 strings, so a value can look like this:
or, with line breaks after every 73 characters, like this:
Both options are valid according to the base64 RFC:
OK, so it does not really advocate this... But it is a fact that many SOAP stacks still use this MIME-originated format, and WCF supports it as well.
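To make the two forms concrete, here is a small Python sketch (not WCF code). It encodes the same bytes once as a single line and once in MIME style. One assumption to note: Python's `base64.encodebytes` breaks lines every 76 characters (the RFC 2045 limit) rather than the 73 shown above, and the CRLF endings are added by hand.

```python
import base64

# A payload long enough to span several lines in the MIME style.
payload = bytes(range(256)) * 2  # 512 bytes

# Single-line form: plain base64 with no line breaks at all.
single_line = base64.b64encode(payload).decode("ascii")

# MIME-style form: encodebytes inserts "\n" after every 76 output
# characters; we turn those into CRLF to mimic the SOAP stacks above.
mime_style = base64.encodebytes(payload).decode("ascii").replace("\n", "\r\n")

# A lenient decoder (b64decode's default, validate=False) discards the
# CR/LF characters, so both forms carry exactly the same bytes.
assert base64.b64decode(single_line) == payload
assert base64.b64decode(mime_style) == payload
```

Any lenient decoder reads both forms the same way. The rest of this post is about how much each form *costs* to decode.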
So what is the problem?
It seems that when WCF receives a message containing base64 with CRLF line breaks, processing becomes a few seconds(!) slower. Drilling down shows that the problem lies in the DataContract serializer. Take a look at this program:
For those of you interested in testing this, the files are here and here.
The output is:
This clearly reveals a performance problem.
Why does this happen?
While debugging the .NET source code, I found this in the XmlBaseReader class (the code comments are from the original source, not mine):
So the DataContract serializer tries to read the base64 string, but for some reason it succeeds only if the string contains no whitespace (we could debug further to see exactly why, but that is too exhausting for one post :). When that fails, the serializer removes all the whitespace (which requires copying the buffer again) and tries again. This is definitely a performance issue.
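The control flow described above (strict attempt, then strip whitespace and retry) can be modeled in a few lines of Python. This is only an illustration of the pattern under the assumption that the reader behaves as described. It is not the actual XmlBaseReader code. The function name `decode_with_fallback` is hypothetical.

```python
import base64
import binascii


def decode_with_fallback(text: str) -> tuple[bytes, bool]:
    """Model of the described behavior: strict decode first; on failure,
    build a whitespace-free copy of the whole string and decode again.

    Returns (decoded_bytes, used_slow_path).
    """
    try:
        # validate=True raises binascii.Error on any character outside the
        # base64 alphabet, which includes CR and LF.
        return base64.b64decode(text, validate=True), False
    except binascii.Error:
        # Slow path: the first attempt's work is thrown away, every
        # character is visited and copied into a new string, and the
        # decode runs a second time.
        compact = "".join(text.split())
        return base64.b64decode(compact, validate=True), True
```

For example, a single-line string takes the fast path. The same data with MIME line breaks pays for the failed attempt, the extra copy, and the second decode:

```python
payload = b"\x00\x01 binary data " * 20
print(decode_with_fallback(base64.b64encode(payload).decode("ascii"))[1])      # False
print(decode_with_fallback(base64.encodebytes(payload).decode("ascii"))[1])    # True
```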
Notes:
I have reported this on Microsoft Connect; you are welcome to vote it up.
Workarounds
There are a few workarounds. The trade-offs are generally about API convenience (or "where you prefer to put the 'dirty' stuff").
1. As Valery noticed, you can change the contract to use String instead of byte[]. Then Convert.FromBase64String will give you the byte array.
2. Change your contracts to use the XmlSerializer instead of the DataContract serializer. The former does not suffer from this issue. The XmlSerializer is generally slower (when no base64 is involved), so that is what you lose. In exchange you get a better API, since clients do not need to manipulate the base64 string.
3. The best option, of course, is to change the service implementation to return base64 without line breaks. Also, if large binaries are returned anyway, it may be a better idea to use MTOM.
4. A custom WCF encoder can strip the whitespace from the message before it is deserialized. However, this also involves copying strings, so it pays off only in rare cases.
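The idea behind workaround 4 can be sketched outside WCF as a pre-processing pass over the message XML. The Python below is a model, not a WCF encoder. It assumes the encoder knows which elements carry base64 (the namespace and the `Data` element name are hypothetical), and it uses the standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical: the qualified names of elements whose content is base64
# (in a real service these would correspond to byte[] members).
BASE64_ELEMENTS = {"{http://example.org/hypothetical}Data"}


def strip_base64_whitespace(message_xml: str) -> str:
    """Remove all whitespace inside known base64 elements of a message.

    The whole message is parsed and re-serialized, which is exactly the
    kind of extra copying that limits when this workaround pays off.
    """
    root = ET.fromstring(message_xml)
    for elem in root.iter():
        if elem.tag in BASE64_ELEMENTS and elem.text:
            # str.split() with no arguments splits on any whitespace run,
            # so joining the pieces drops spaces, CR and LF alike.
            elem.text = "".join(elem.text.split())
    return ET.tostring(root, encoding="unicode")
```

Re-serializing may change namespace prefixes (for example to `ns0`), but the qualified names and the cleaned base64 text are preserved.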