Alternative
Tip/Trick
Posted 15 Feb 2012


Remove all HTML tags and display only the plain text inside (in case the XML is not well formed)

15 Feb 2012, CPOL
I think the following combination of Regex.Replace and HttpUtility.HtmlDecode would do:

C#
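// Requires: using System.Text.RegularExpressions; using System.Web;
string html = ...;
// Remove comments and tags (quoted attribute values may contain '>'), then decode entities.
string textonly = HttpUtility.HtmlDecode(
         Regex.Replace(html, @"<!--[\S\s]*?-->|<(?:"".*?""|'.*?'|[\S\s])*?>", ""));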


Is there any HTML construct that this would not strip properly?
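
One way to probe that question is to run the pattern against a few small inputs. The following is only a sketch under stated assumptions: the sample strings and the `HtmlText.Strip` helper are hypothetical and not part of the original tip, and `System.Net.WebUtility.HtmlDecode` is swapped in for `HttpUtility.HtmlDecode` so that the snippet compiles in a plain console project without a reference to System.Web.

```csharp
using System;
using System.Net;                       // WebUtility.HtmlDecode
using System.Text.RegularExpressions;   // Regex

// Hypothetical sample inputs, not taken from the original tip.
string[] samples =
{
    "<p title=\"a > b\">Tom &amp; Jerry</p>",   // '>' inside a quoted attribute value
    "a<!-- <b>x</b> -->b",                      // tags nested inside a comment
    "&lt;b&gt; is the bold tag",                // escaped markup is decoded to text
    "<script>alert(1)</script>Hi",              // element content is NOT removed
};

foreach (string sample in samples)
{
    Console.WriteLine(HtmlText.Strip(sample));
}

// Hypothetical helper that wraps the one-liner from the tip.
static class HtmlText
{
    // Same pattern as in the tip: either an HTML comment, or a tag in which
    // quoted attribute values are consumed as a unit, so that a '>' inside
    // quotes does not end the tag early.
    private static readonly Regex TagOrComment = new Regex(
        @"<!--[\S\s]*?-->|<(?:"".*?""|'.*?'|[\S\s])*?>");

    public static string Strip(string html)
    {
        // Assumption: WebUtility.HtmlDecode stands in for HttpUtility.HtmlDecode.
        return WebUtility.HtmlDecode(TagOrComment.Replace(html, ""));
    }
}
```

With these inputs the program prints `Tom & Jerry`, `ab`, `<b> is the bold tag`, and `alert(1)Hi`. The last line shows one construct the pattern does not handle: only the tags are removed, so the content of `<script>` (and likewise `<style>`) elements is left in the output as if it were text.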

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Andreas Gieriet
Founder eXternSoft GmbH
Switzerland
I feel comfortable on a variety of systems (UNIX, Windows, cross-compiled embedded systems, etc.) in a variety of languages, environments, and tools.
I have a particular affinity for computer language analysis and testing, as well as quality management.

More information about what I do for a living can be found at my LinkedIn Profile and on my company's web page (German only).
