Wrox Programmer Forums

#1
March 10th, 2010, 07:55 AM
Authorized User
Join Date: Jul 2007
Posts: 29
Thanks: 2
Thanked 0 Times in 0 Posts
Using regular expressions to get XML tag values

Hello everyone,
I am trying to write a utility in C# to clean up XML tag values that contain special characters like
&, ', ", < and >.

The creation of the XML file is not controlled by me, hence this approach.

The idea is to first read the contents of the XML file into a string, then find any special characters and replace them with their equivalent escape sequences.

The second step would be to use any of the .NET XML classes, or even LINQ, to parse and process the cleaned XML string according to the application logic.

Can anyone guide me through cleaning the source XML of special characters? If the file's contents are not cleaned, the parsers will not even load the file, let alone process it.

Regards.
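
To see the load failure described above, here is a minimal sketch that tries to parse a string with LINQ to XML and reports whether the parser accepts it. The class and method names (XmlLoadCheck, TryParse) are made up for illustration and do not come from the utility itself.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: checks whether a string is well-formed XML.
public static class XmlLoadCheck
{
    // Returns true if the text parses; otherwise false, with the parser's message in 'error'.
    public static bool TryParse(string xmlText, out string error)
    {
        try
        {
            XDocument.Parse(xmlText);
            error = null;
            return true;
        }
        catch (XmlException ex)
        {
            // A bare '&' or a stray '<' inside a value ends up here.
            error = ex.Message;
            return false;
        }
    }
}
```

With this, "<Name>Tom & Jerry</Name>" is rejected, while "<Name>Tom &amp; Jerry</Name>" loads.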
#2
March 10th, 2010, 09:48 PM
Friend of Wrox
Join Date: Dec 2008
Posts: 238
Thanks: 2
Thanked 20 Times in 19 Posts

Check out String.Replace, let me know if you need further help.
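
A rough sketch of what that could look like for a single value. The names ValueEscaper and EscapeValue are hypothetical, and the sketch assumes the value has already been separated from its tags and contains no entities yet (an existing "&amp;" would be escaped a second time).

```csharp
// Hypothetical helper: escapes the five XML special characters in one text value.
public static class ValueEscaper
{
    public static string EscapeValue(string value)
    {
        return value
            .Replace("&", "&amp;")    // must run first, or the '&' in later entities gets escaped again
            .Replace("<", "&lt;")
            .Replace(">", "&gt;")
            .Replace("\"", "&quot;")
            .Replace("'", "&apos;");
    }
}
```

The order matters: if '&' were replaced last, the "&lt;" produced earlier would turn into "&amp;lt;".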
#3
March 10th, 2010, 11:57 PM
Authorized User
Join Date: Jul 2007
Posts: 29
Thanks: 2
Thanked 0 Times in 0 Posts

Hi Peter,
String.Replace would work fine for all the special characters, except the '<' and '>' characters. Since these are part of the tag names themselves, String.Replace would replace these also.
Or is there some other functionality of String.Replace that I am not aware of?
Please guide.

Regards.
#4
March 11th, 2010, 12:54 AM
Friend of Wrox
Join Date: Dec 2008
Posts: 238
Thanks: 2
Thanked 20 Times in 19 Posts

Too bad that you don't have control over the generation of that (invalid) XML file. Let's take '<' as an example. You can work out '>' yourself by following a similar approach.

Your valid tags follow this pattern: '<foo(blank)'. What you can do is:
1) Use String.IndexOf to find the first '<';
2) Use String.IndexOf again to find the first blank after that '<'. You need to pass in the index you found in step 1;
3) Use String.Substring to get what's between the '<' and the blank, and check whether it is one of your valid tags. If it is a valid tag, do nothing; if it is not, replace that '<'.

Repeat the above steps until there are no more '<' characters to process. Important: each subsequent iteration must continue from just after the last '<' found. Otherwise you will end up in an infinite loop, finding the same '<' over and over.

For '>', take a look at the pattern of your XML, and do something similar.

Good luck!
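
The steps above can be sketched roughly as follows. This is only an illustration under some assumptions: validTags is a hypothetical list of the element names your file is known to use, and besides a blank the sketch also treats '>', '/' and '<' as the end of a tag name so that tags without attributes and closing tags are recognized.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: escapes every '<' that does not start a known tag.
public static class LessThanEscaper
{
    // Characters assumed to end a tag name (the blank is from the steps; the rest are additions).
    private static readonly char[] NameTerminators = { ' ', '>', '/', '<' };

    public static string EscapeStrayLessThan(string xml, ICollection<string> validTags)
    {
        StringBuilder result = new StringBuilder(xml.Length);
        int pos = 0;
        while (pos < xml.Length)
        {
            int lt = xml.IndexOf('<', pos);                  // step 1: next '<'
            if (lt < 0)
            {
                result.Append(xml, pos, xml.Length - pos);   // no more '<': copy the rest
                break;
            }
            result.Append(xml, pos, lt - pos);               // copy the text before the '<'

            int nameStart = lt + 1;
            if (nameStart < xml.Length && xml[nameStart] == '/')
            {
                nameStart++;                                 // closing tag: skip the slash
            }

            int nameEnd = xml.IndexOfAny(NameTerminators, nameStart);   // step 2: end of the name
            if (nameEnd < 0) nameEnd = xml.Length;

            string name = xml.Substring(nameStart, nameEnd - nameStart); // step 3: candidate name
            result.Append(validTags.Contains(name) ? "<" : "&lt;");

            pos = lt + 1;                                    // continue just after this '<'
        }
        return result.ToString();
    }
}
```

For the input <ContactFirstName><Elke123@><</ContactFirstName> with "ContactFirstName" as the only valid tag, this gives <ContactFirstName>&lt;Elke123@>&lt;</ContactFirstName>. The remaining '>' is left for the second pass.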
#5
March 11th, 2010, 02:23 AM
Authorized User
Join Date: Jul 2007
Posts: 29
Thanks: 2
Thanked 0 Times in 0 Posts

Hi Peter,
thanks again for your time and guidance.

I was looking at the regular expression way and found an expression that gives me the value of the tag. One catch here: this expression fails when the value is something like this:
Code:
<ContactFirstName><Elke123@><</ContactFirstName>
It just returns the '<' character.
Other than that this expression returns the tag value as expected.
Now, in my source xml there are no nested tags. Guess I am saved for now.
This is how it looks now:
Code:
// Requires: using System.Text; using System.Text.RegularExpressions;
private string CleanupXml(string[] incidentXml)
        {
            StringBuilder sb = new StringBuilder();

            foreach (string line in incidentXml)
            {
                // Walk the line piece by piece: a tag, a run of text, or a stray '<'.
                // Tags are kept as they are; everything else is escaped.
                // Known limitation: a value that looks like a tag, such as
                // "<Elke123@>", still matches the tag pattern and is left alone.
                sb.Append(Regex.Replace(line, "<[^<>]+>|[^<]+|<", delegate(Match m)
                {
                    bool isTag = m.Value.Length > 1 && m.Value[0] == '<';
                    return isTag ? m.Value : EscapeText(m.Value);
                }));
            }

            return sb.ToString();
        }

        private static string EscapeText(string text)
        {
            // '&' goes first so the entities added below are not escaped again.
            // The calls are chained so every special character is handled,
            // not only the first kind found.
            return text.Replace("&", "&amp;")
                       .Replace("'", "&apos;")
                       .Replace("\"", "&quot;")
                       .Replace("<", "&lt;")
                       .Replace(">", "&gt;");
        }
I was running out of time and had to come up with some solution.
Hence the above implementation.

Your implementation should allow me to fine-tune the search and replace without any hassles. I will keep that for the second release.

Many thanks again.

Regards.
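
For values like the one that fails above, one possible alternative is to capture the opening tag, the value and the closing tag separately and escape only the value. This is a sketch, not part of the posted utility: it assumes each line holds exactly one flat element without attributes, that values contain no entities yet, and the names ElementLineCleaner and CleanLine are made up.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: escapes the value of a single-line element such as <Name>value</Name>.
public static class ElementLineCleaner
{
    // Group 1: opening tag, group 2: element name, group 3: value, group 4: closing tag.
    // The backreference \2 forces the closing tag to carry the same name as the opening tag.
    private static readonly Regex ElementLine =
        new Regex(@"^(\s*<([A-Za-z_][\w.\-]*)>)(.*)(</\2>\s*)$", RegexOptions.Singleline);

    public static string CleanLine(string line)
    {
        Match m = ElementLine.Match(line);
        if (!m.Success)
        {
            return line;   // declarations, container tags and the like pass through untouched
        }

        string value = m.Groups[3].Value
            .Replace("&", "&amp;")    // first, so later entities are not escaped again
            .Replace("<", "&lt;")
            .Replace(">", "&gt;")
            .Replace("\"", "&quot;")
            .Replace("'", "&apos;");

        return m.Groups[1].Value + value + m.Groups[4].Value;
    }
}
```

Because the value is whatever sits between the first opening tag and the matching closing tag, the line <ContactFirstName><Elke123@><</ContactFirstName> comes out as <ContactFirstName>&lt;Elke123@&gt;&lt;</ContactFirstName>.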
#6
March 11th, 2010, 11:08 AM
Friend of Wrox
Join Date: Dec 2008
Posts: 238
Thanks: 2
Thanked 20 Times in 19 Posts

The best option really is to ask the others to fix their XML.
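
If the producing side happens to be .NET as well (an assumption, since nothing in this thread says how the file is generated), building the elements through LINQ to XML instead of string concatenation would take care of the escaping. A minimal sketch with a made-up ContactWriter class:

```csharp
using System.Xml.Linq;

// Hypothetical producer-side helper: lets the XML API escape the value.
public static class ContactWriter
{
    public static string BuildContactElement(string firstName)
    {
        // XElement escapes special characters in the text content when it is serialized.
        XElement element = new XElement("ContactFirstName", firstName);
        return element.ToString(SaveOptions.DisableFormatting);
    }
}
```

The string it returns can be parsed again with XElement.Parse, and the Value of the parsed element is the original text, special characters included.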
 


© 2013 John Wiley & Sons, Inc.