Wrox Programmer Forums
C# 2008 aka C# 3.0 Discuss the Visual C# 2008 (aka C# 3.0) language
Welcome to the p2p.wrox.com Forums.

You are currently viewing the C# 2008 aka C# 3.0 section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
 
March 10th, 2010, 07:55 AM
Authorized User
 
Join Date: Jul 2007
Posts: 29
Thanks: 2
Thanked 0 Times in 0 Posts
Using Regular expressions to get xml tag values

Hello everyone,
I am trying to write a utility in C# to clean up XML tag values that contain special characters such as
&, ', ", < and >.

I do not control how the XML file is created, hence this approach.

The idea is to first read the contents of the XML file into a string, then find and replace any special characters with their equivalent escape sequences (&amp;, &apos;, &quot;, &lt; and &gt;).

The second step would be to use any of the .NET XML classes, or even LINQ to XML, to parse and process the cleaned XML string according to the application logic.

Can anyone guide me through cleaning the source XML of special characters? If the file's contents are not cleaned, the parsers will not even load the file, let alone process it.
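A minimal C# sketch of the second step, assuming .NET 3.5 or later with LINQ to XML; the class XmlLoadCheck and its method are hypothetical names, not something from the thread. It illustrates why the cleanup has to come first: XDocument.Parse throws an XmlException on a raw '&', while the escaped text loads, and XElement.Value hands back the original, unescaped characters.

```csharp
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: loads an XML string with LINQ to XML and returns the
// text of every element that has no child elements.
public static class XmlLoadCheck
{
    // Returns null when the parser rejects the string (e.g. an unescaped '&').
    public static string[] TryReadLeafValues(string xml)
    {
        try
        {
            XDocument doc = XDocument.Parse(xml);
            return doc.Descendants()
                      .Where(e => !e.HasElements)
                      .Select(e => e.Value)   // Value returns the unescaped text
                      .ToArray();
        }
        catch (XmlException)
        {
            return null;                      // the "parser will not even load it" case
        }
    }
}
```

For example, "<Contact><Name>Tom & Jerry</Name></Contact>" is rejected, while "<Contact><Name>Tom &amp; Jerry</Name></Contact>" loads and yields the single value "Tom & Jerry".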

Regards.
 
March 10th, 2010, 09:48 PM
Friend of Wrox
 
Join Date: Dec 2008
Posts: 238
Thanks: 2
Thanked 20 Times in 19 Posts

Check out String.Replace; let me know if you need further help.
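String.Replace is safe when it is applied to a single tag value rather than to the whole document. A minimal sketch (ValueEscaper is a hypothetical name): '&' has to be replaced first, otherwise the '&' introduced by "&lt;" and the other entities would be escaped a second time. Text that is already escaped will still be escaped again, so this assumes the raw values contain no entities. The framework method System.Security.SecurityElement.Escape covers the same five characters if you would rather not maintain the list yourself.

```csharp
// Hypothetical helper: escapes the five XML special characters in a tag VALUE.
// Do not run this over a whole document, or the tag delimiters get escaped too.
public static class ValueEscaper
{
    public static string EscapeValue(string value)
    {
        return value.Replace("&", "&amp;")    // must come first
                    .Replace("<", "&lt;")
                    .Replace(">", "&gt;")
                    .Replace("\"", "&quot;")
                    .Replace("'", "&apos;");
    }
}
```

For example, EscapeValue("<Elke123@><") returns "&lt;Elke123@&gt;&lt;".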
 
March 10th, 2010, 11:57 PM
Authorized User
 
Join Date: Jul 2007
Posts: 29
Thanks: 2
Thanked 0 Times in 0 Posts

Hi Peter,
String.Replace works fine for all the special characters except '<' and '>'. Because those characters also delimit the tags themselves, String.Replace would replace them there as well.
Or is there some other functionality of String.Replace that I am not aware of?
Please guide.

Regards.
 
March 11th, 2010, 12:54 AM
Friend of Wrox
 
Join Date: Dec 2008
Posts: 238
Thanks: 2
Thanked 20 Times in 19 Posts

Too bad that you don't have control over the generation of that (invalid) XML file. Let's take '<' as an example; you can work out '>' yourself by following a similar approach.

Your valid tags follow this pattern: '<foo' followed by a blank (or by '>' when the tag has no attributes). What you can do is:
1) Use String.IndexOf to find the first '<';
2) Use String.IndexOf again to find the first blank (or '>') after that '<', passing in the index you found in step 1;
3) Use String.Substring to get what's between the '<' and the blank, and check whether it is one of your valid tags. If it is a valid tag, do nothing; if it is not, replace that '<' with '&lt;'.

Repeat the above steps until there are no more '<' characters to process. Important: on each subsequent iteration, always continue searching from just after the last '<' found. Otherwise you will run into an infinite loop, finding the same '<' over and over.
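The steps above could be sketched in C# roughly as follows. This is an illustration under stated assumptions, not code from the thread: validTags is a hypothetical list of the tag names the file is supposed to contain, '>' is accepted as the end of a tag name as well as a blank, and a leading '/' is ignored so that closing tags are recognized too. A declaration such as '<?xml' would need to be added to the list or handled separately.

```csharp
using System.Collections.Generic;
using System.Text;

public static class LessThanFixer
{
    // Escapes every '<' that does not start one of the known tags.
    // validTags: hypothetical list of expected tag names (assumption).
    public static string EscapeStrayLessThan(string xml, ICollection<string> validTags)
    {
        StringBuilder sb = new StringBuilder();
        int copied = 0;                        // xml[0..copied) is already in sb
        int lt = xml.IndexOf('<');             // step 1: first '<'
        while (lt >= 0)
        {
            // Step 2: end of the tag name (a blank or '>', per the assumption above).
            int end = xml.IndexOfAny(new[] { ' ', '>' }, lt + 1);
            // Step 3: the text between '<' and that character, minus a leading '/'.
            string name = end < 0 ? "" : xml.Substring(lt + 1, end - lt - 1).TrimStart('/');

            sb.Append(xml, copied, lt - copied);
            sb.Append(validTags.Contains(name) ? "<" : "&lt;");
            copied = lt + 1;

            // Continue after the '<' just handled, otherwise the loop never ends.
            lt = xml.IndexOf('<', lt + 1);
        }
        sb.Append(xml, copied, xml.Length - copied);
        return sb.ToString();
    }
}
```

With validTags = { "ContactFirstName" }, the line "<ContactFirstName><Elke123@><</ContactFirstName>" becomes "<ContactFirstName>&lt;Elke123@>&lt;</ContactFirstName>". The remaining bare '>' in the value is legal in XML text content, although you may still want to escape it as described for '>'.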

For '>', take a look at the pattern of your XML, and do something similar.

Good luck!
 
March 11th, 2010, 02:23 AM
Authorized User
 
Join Date: Jul 2007
Posts: 29
Thanks: 2
Thanked 0 Times in 0 Posts

Hi Peter,
thanks again for your time and guidance.

I was looking into regular expressions and found an expression that gives me the value of a tag. One catch: the expression fails when the value is something like this:
Code:
<ContactFirstName><Elke123@><</ContactFirstName>
It just returns the '<' character.
Other than that, the expression returns the tag value as expected.
In my source XML there are no nested tags, so I guess I am safe for now.
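One alternative, not the approach used in this thread and only a sketch, is to capture the text between an opening tag and its matching closing tag instead of stripping everything that looks like a tag. The class name below is hypothetical, and the pattern assumes one element per line, no attributes, and tag names made of \w characters. The \1 backreference forces the closing tag to match the opening one, and the greedy (.*) lets the value itself contain '<' and '>'.

```csharp
using System.Text.RegularExpressions;

public static class TagValueReader
{
    // ^\s*<name>  value  </name>\s*$   (single-line element, no attributes)
    private static readonly Regex LinePattern =
        new Regex(@"^\s*<(\w+)>(.*)</\1>\s*$");

    // Returns the raw value between the tags, or null if the line does not fit the pattern.
    public static string GetValue(string line)
    {
        Match m = LinePattern.Match(line);
        return m.Success ? m.Groups[2].Value : null;
    }
}
```

On the problem line above, GetValue returns "<Elke123@><" rather than just '<'.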
This is how it looks now:
Code:
// Requires: using System; using System.Text; using System.Text.RegularExpressions;
// Assumes one element per line and no nested tags, as in my source XML.
private string CleanupXml(string[] incidentXml)
{
    StringBuilder sb = new StringBuilder();

    foreach (string original in incidentXml)
    {
        string line = original;

        // Strip the tags to get the tag value.
        string value = Regex.Replace(line, "<[^<>]+>", "").Trim();
        if (value.Length > 0)
        {
            // Escape '&' first so the entities added below are not escaped again,
            // and apply every replacement (not else-if) so a value containing
            // several special characters is fully escaped. '>' maps to "&gt;".
            string escaped = value.Replace("&", "&amp;")
                                  .Replace("'", "&apos;")
                                  .Replace("\"", "&quot;")
                                  .Replace("<", "&lt;")
                                  .Replace(">", "&gt;");

            // Put the escaped value back after the opening tag instead of
            // overwriting the whole line, so the tags themselves are kept.
            int start = line.IndexOf('>') + 1;
            int pos = line.IndexOf(value, start, StringComparison.Ordinal);
            if (pos >= 0)
            {
                line = line.Substring(0, pos) + escaped + line.Substring(pos + value.Length);
            }
        }

        sb.Append(line);
    }

    return sb.ToString();
}
I was running out of time and had to come up with some solution,
hence the above implementation.

Your approach should let me fine-tune the search and replace without any hassle. I will keep that for the second release.

Many thanks again.

Regards.
 
March 11th, 2010, 11:08 AM
Friend of Wrox
 
Join Date: Dec 2008
Posts: 238
Thanks: 2
Thanked 20 Times in 19 Posts

The best option really is to ask the other side to fix their XML.
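For reference, if the producer built the file with an XML API rather than by concatenating strings, the escaping would happen automatically. A minimal sketch with LINQ to XML; the class and method names are hypothetical, and ContactFirstName is simply the tag from the example above.

```csharp
using System.Xml.Linq;

public static class ProducerSide
{
    // XElement escapes special characters in the text when it is serialized,
    // so the consumer never sees a raw '&' or '<' inside a value.
    public static string BuildContact(string firstName)
    {
        XElement contact = new XElement("ContactFirstName", firstName);
        return contact.ToString(SaveOptions.DisableFormatting);
    }
}
```

BuildContact("Tom & Jerry") yields "<ContactFirstName>Tom &amp; Jerry</ContactFirstName>", which any XML parser will load.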










Copyright (c) 2020 John Wiley & Sons, Inc.