Removing Non-Printable Characters

Hi Folks,

I have noticed that BizTalk does not like certain non-printable characters when dealing with XML data, so I use a pipeline component to strip them out of the stream.

Here is some sample code that you can use in a pipeline component:

using System;
using System.Collections.Generic;
using System.Text;

namespace FileManagement.BOL.Helper
{
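    public class NonPrintableCharacters
    {
        public static readonly List<char> charList = new List<char>();

        // Static constructor: fills the shared list exactly once, so creating
        // several instances does not append duplicate entries.
        static NonPrintableCharacters()
        {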
            //Refer to http://nemesis.lonestar.org/reference/telecom/codes/ascii.html
            charList.Add((char)Convert.ToInt16("0x01", 16)); //1 Start Of Heading
            charList.Add((char)Convert.ToInt16("0x02", 16)); //2 Start Of Text
            charList.Add((char)Convert.ToInt16("0x03", 16)); //3 End Of Text
            charList.Add((char)Convert.ToInt16("0x04", 16)); //4 End Of Transmission
            charList.Add((char)Convert.ToInt16("0x05", 16)); //5 Enquiry, Also known as WRU (Who aRe You), HERE IS, and Answerback
            charList.Add((char)Convert.ToInt16("0x06", 16)); //6 Acknowledge
            charList.Add((char)Convert.ToInt16("0x07", 16)); //7 Bell
            charList.Add((char)Convert.ToInt16("0x08", 16)); //8 Backspace
            //Horizontal Tab 0x09 and Line Feed 0x0A are allowed
            charList.Add((char)Convert.ToInt16("0x0B", 16)); //11 Vertical Tabulation
            charList.Add((char)Convert.ToInt16("0x0C", 16)); //12 Form Feed
            //Carriage Return 0x0D is allowed
            charList.Add((char)Convert.ToInt16("0x0E", 16)); //14 Shift Out
            charList.Add((char)Convert.ToInt16("0x0F", 16)); //15 Shift In
            charList.Add((char)Convert.ToInt16("0x10", 16)); //16 Data Link Escape
            charList.Add((char)Convert.ToInt16("0x11", 16)); //17 Device Control 1,Also known as X-ON
            charList.Add((char)Convert.ToInt16("0x12", 16)); //18 Device Control 2
            charList.Add((char)Convert.ToInt16("0x13", 16)); //19 Device Control 3,Also known as X-OFF
            charList.Add((char)Convert.ToInt16("0x14", 16)); //20 Device Control 4
            charList.Add((char)Convert.ToInt16("0x15", 16)); //21 Negative Acknowledge
            charList.Add((char)Convert.ToInt16("0x16", 16)); //22 Synchronous Idle
            charList.Add((char)Convert.ToInt16("0x17", 16)); //23 End of Transmission Block
            charList.Add((char)Convert.ToInt16("0x18", 16)); //24 Cancel
            charList.Add((char)Convert.ToInt16("0x19", 16)); //25 End of Medium
            charList.Add((char)Convert.ToInt16("0x1A", 16)); //26 Substitute
            charList.Add((char)Convert.ToInt16("0x1B", 16)); //27 Escape
            charList.Add((char)Convert.ToInt16("0x1C", 16)); //28 File Separator
            charList.Add((char)Convert.ToInt16("0x1D", 16)); //29 Group Separator
            charList.Add((char)Convert.ToInt16("0x1E", 16)); //30 Record Separator
            charList.Add((char)Convert.ToInt16("0x1F", 16)); //31 Unit Separator
            charList.Add((char)Convert.ToInt16("0x7F", 16)); //127 Delete, Also known as RUB OUT
        }

        public string ReplaceInvalidChars(string mystring, char newChar)
        {
            foreach (char c in charList)
                mystring = mystring.Replace(c, newChar);
            return mystring;
        }

    }
}

You can call this class while looping through the stream, for example:

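// Requires "using System.IO;". stream and myEncoding come from your pipeline component.
NonPrintableCharacters cleanChars = new NonPrintableCharacters();
StreamReader reader = new StreamReader(stream, myEncoding);
string RecordLine;

while ((RecordLine = reader.ReadLine()) != null)
{
    RecordLine = cleanChars.ReplaceInvalidChars(RecordLine, ' ');
    // process or write out the cleaned RecordLine here
}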

The code above replaces each non-printable character with a space.

The reason I use this is that the XML libraries in .NET accept certain non-printable characters that BizTalk cannot tolerate.
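If you want to see which characters in a piece of text fall outside what XML 1.0 itself permits, a small check like the one below may help. It is only a sketch: the class name XmlCharCheck and the method FirstInvalidXmlChar are made up for illustration, and it relies on XmlConvert.IsXmlChar, which is available from .NET Framework 4.0 onwards. It says nothing about BizTalk's own rules, which may be stricter.

```csharp
using System;
using System.Xml;

// Hypothetical helper, not part of the original component.
public static class XmlCharCheck
{
    // Returns the index of the first character that is not a valid
    // XML 1.0 character, or -1 if every character is acceptable.
    public static int FirstInvalidXmlChar(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            // A well-formed surrogate pair encodes one valid character
            // above U+FFFF, so step over both halves.
            if (char.IsSurrogatePair(text, i))
            {
                i++;
                continue;
            }

            if (!XmlConvert.IsXmlChar(text[i]))
                return i;
        }
        return -1;
    }
}
```

Note that this check follows the XML 1.0 definition, so it will not flag every character that the class above removes. Delete (0x7F), for instance, is a legal XML character even though the list above strips it.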

Hope this is useful. You could do this in a cleverer way by looping over the decimal character codes if you like; I wrote it out like this for simplicity.
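For anyone who prefers the loop, here is a rough sketch of how that might look. The class name ControlCharCleaner and its members are hypothetical, and the use of a HashSet and a StringBuilder is my own assumption (it needs .NET 3.5 or later). It is meant to cover the same codes as the list above: 1 to 31 except tab (9), line feed (10) and carriage return (13), plus delete (127).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical loop-based variant of the NonPrintableCharacters class.
public static class ControlCharCleaner
{
    // Built once when the class is first used.
    private static readonly HashSet<char> Unwanted = BuildUnwantedSet();

    private static HashSet<char> BuildUnwantedSet()
    {
        var set = new HashSet<char>();
        for (int code = 1; code <= 31; code++)
        {
            // Tab, line feed and carriage return are allowed.
            if (code == 9 || code == 10 || code == 13)
                continue;
            set.Add((char)code);
        }
        set.Add((char)127); // Delete
        return set;
    }

    // Replaces every unwanted character in a single pass over the text.
    public static string ReplaceInvalidChars(string text, char newChar)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
            sb.Append(Unwanted.Contains(c) ? newChar : c);
        return sb.ToString();
    }
}
```

Calling ControlCharCleaner.ReplaceInvalidChars(RecordLine, ' ') inside the read loop would then take the place of the instance call. The single pass with a StringBuilder also avoids creating a new string for every character in the list.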
