Tip/Trick
Posted 7 Oct 2014

How to Automatically Close Un-closed HTML Tags using C# for ASP.NET Web Applications

This script is useful for database-driven ASP.NET websites that show stored HTML content on post pages.

Introduction

Sometimes we want to show a summary of a full HTML article or post on the main page by displaying only its first few lines. If we cut the HTML string in the middle like a regular string, we can be left with many unclosed HTML tags, and the browser cannot find the correct closing tags for them. For example, if the summary contains an unclosed <div>, we should close it. Otherwise, that <div> will be closed by the next </div> outside the post area, so the following posts end up nested inside it and the layout breaks.
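To make the problem concrete, here is a small sketch of naive truncation. The class and method names (TruncationDemo, NaiveSummary) and the sample markup are illustrative assumptions, not part of the original script.

```csharp
using System;

public static class TruncationDemo
{
    // Cuts the HTML like a plain string, ignoring tag boundaries.
    // (Hypothetical helper, used only to illustrate the problem.)
    public static string NaiveSummary(string html, int maxLength)
    {
        return html.Length <= maxLength ? html : html.Substring(0, maxLength);
    }
}
```

Calling NaiveSummary("<div><p>Hello <b>world</b></p></div>", 17) returns <div><p>Hello <b>, which leaves <div>, <p>, and <b> open.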

Background

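In this simple script, I use two regular expressions to extract and compare tags: one for start tags and one for end tags. Then I reverse the order of the start tag list, so that the innermost open tag comes first. In the table below, "End Tag (false)" lists the closing tags actually found in the input, and "End Tag (true)" lists the closing tags the finished HTML should contain: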

Order  Start Tag List                                        End Tag (false)  End Tag (true)
       Normal                     Reverse                    Normal           Normal
1      <html>                     <p>                        </p>             </p>
2      <div>                      <input>                    </input>         </input>
3      <span style="color:red;">  <form>                     </form>          </form>
4      <form>                     <span style="color:red;">  NO END TAG       </span>
5      <input>                    <div>                      NO END TAG       </div>
6      <p>                        <html>                     NO END TAG       </html>

The code is as follows:

C#
// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
public static string AutoCloseHtmlTags(string inputHtml)
{
    // Matches an opening tag, with or without attributes. h1 through h6 are listed
    // explicitly, and [^>]* keeps a single match from spanning several tags.
    // The comment and DOCTYPE placeholders are dropped because they never need a closing tag.
    var regexStartTag = new Regex(@"<(a|abbr|" +
          @"acronym|address|applet|area|article|aside|audio|b|base|basefont|bdi|bdo|big" +
          @"|blockquote|body|br|button|canvas|caption|center|cite|code|col|colgroup|" +
          @"command|datalist|dd|del|details|dfn|dialog|dir|div|dl|dt|em|embed|fieldset|" +
          @"figcaption|figure|font|footer|form|frame|frameset|h1|h2|h3|h4|h5|h6|head|" +
          @"header|hr|html|i|iframe|img|input|ins|kbd|keygen|label|legend|li|link|" +
          @"map|mark|menu|meta|meter|nav|noframes|noscript|object|ol|optgroup|option|" +
          @"output|p|param|pre|progress|q|rp|rt|ruby|s|samp|script|section|select|small|" +
          @"source|span|strike|strong|style|sub|summary|sup|table|tbody|td|textarea|" +
          @"tfoot|th|thead|time|title|tr|track|tt|u|ul|var|video|wbr)(\s[^>]*)?>");
    var startTagCollection = regexStartTag.Matches(inputHtml);
    // Matches a closing tag such as </div>; h1 through h6 are listed explicitly.
    var regexCloseTag = new Regex(@"</(a|abbr|" +
          @"acronym|address|applet|area|article|aside|audio|b|base|basefont|bdi|bdo|" +
          @"big|blockquote|body|br|button|canvas|caption|center|cite|code|col|colgroup|" +
          @"command|datalist|dd|del|details|dfn|dialog|dir|div|dl|dt|em|embed|fieldset|" +
          @"figcaption|figure|font|footer|form|frame|frameset|h1|h2|h3|h4|h5|h6|head|header" +
          @"|hr|html|i|iframe|img|input|ins|kbd|keygen|label|legend|li|link|map|mark|menu|" +
          @"meta|meter|nav|noframes|noscript|object|ol|optgroup|option|output|p|param|pre|" +
          @"progress|q|rp|rt|ruby|s|samp|script|section|select|small|source|span|strike|" +
          @"strong|style|sub|summary|sup|table|tbody|td|textarea|tfoot|th|thead|" +
          @"time|title|tr|track|tt|u|ul|var|video|wbr)>");
    var closeTagCollection = regexCloseTag.Matches(inputHtml);
    var startTagList = new List<string>();
    var closeTagList = new List<string>();
    var resultClose = "";
    foreach (Match startTag in startTagCollection)
    {
        startTagList.Add(startTag.Value);
    }
    foreach (Match closeTag in closeTagCollection)
    {
        closeTagList.Add(closeTag.Value);
    }
    startTagList.Reverse();
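    // Walk the reversed start-tag list and compare each expected closing tag
    // with the closing tag found at the same position (see the table above).
    for (int i = 0; i < startTagList.Count; i++)
    {
        // Strip the attributes: "<span style=...>" becomes "<span".
        string tagName = startTagList[i].TrimEnd('>');
        int indexOfSpace = tagName.IndexOfAny(new[] { ' ', '\t', '\r', '\n' });
        if (indexOfSpace >= 0)
        {
            // Remove returns a new string, so the result must be assigned.
            tagName = tagName.Remove(indexOfSpace);
        }
        string expectedCloseTag = tagName.Replace("<", "</") + ">";
        // Append the closing tag when it is missing or does not match.
        if (i >= closeTagList.Count || expectedCloseTag != closeTagList[i])
        {
            resultClose += expectedCloseTag;
        }
    }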
    return inputHtml + resultClose;
} 
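
The position-by-position comparison shown in the table works when all unclosed tags sit at the outer end of the nesting chain, but it can misjudge inputs that contain repeated sibling elements, such as two already closed <p> blocks followed by an open <span>. One common alternative is to scan the tags in document order and track the open ones on a stack. The following is a minimal sketch of that idea, not the article's method. The names HtmlTagCloser, CloseOpenTags, TagRegex, and VoidElements are hypothetical, the tag regex is deliberately simplified (it assumes attribute values contain no ">"), and a tag that was cut off mid-way (for example "<div cla") is left untouched.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class HtmlTagCloser
{
    // Matches an opening or closing tag and captures the optional slash and the tag name.
    // Simplified on purpose: comments, DOCTYPE, and incomplete tags are not matched.
    private static readonly Regex TagRegex = new Regex(
        @"<(?<slash>/?)(?<name>[a-zA-Z][a-zA-Z0-9]*)\b[^>]*>",
        RegexOptions.Compiled);

    // Void elements never take a closing tag, so they are not tracked.
    private static readonly HashSet<string> VoidElements = new HashSet<string>(
        StringComparer.OrdinalIgnoreCase)
    {
        "area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"
    };

    public static string CloseOpenTags(string inputHtml)
    {
        var openTags = new Stack<string>();
        foreach (Match tag in TagRegex.Matches(inputHtml))
        {
            string name = tag.Groups["name"].Value.ToLowerInvariant();
            bool selfClosed = tag.Value.EndsWith("/>", StringComparison.Ordinal);
            if (VoidElements.Contains(name) || selfClosed)
            {
                continue; // nothing to close for <br>, <img ... />, and similar
            }
            if (tag.Groups["slash"].Value.Length == 0)
            {
                openTags.Push(name); // opening tag: remember it
            }
            else if (openTags.Count > 0 && openTags.Peek() == name)
            {
                openTags.Pop(); // closing tag that matches the innermost open tag
            }
            // A stray closing tag that matches nothing is ignored in this sketch.
        }

        // Close whatever is still open, innermost first.
        var result = new StringBuilder(inputHtml);
        while (openTags.Count > 0)
        {
            result.Append("</").Append(openTags.Pop()).Append('>');
        }
        return result.ToString();
    }
}
```

For the sample in the table above, <html><div><span style="color:red;"><form><input><p></p></input></form>, this sketch appends </span></div></html>, the same tags the "End Tag (true)" column marks as missing. Unlike the table, it skips <input> because it is a void element.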

Please let me know about your ideas...

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Milad Ashrafi
Software Developer (Senior)
Iran (Islamic Republic of)
