
Simple Rule-Driven Smart Formatting for HTML Textarea

3 Feb 2021 · CPOL
JavaScript helps to write code without the usual irritation

Contents

  1. Motivation
  2. Rule Set
  3. Bra and Ket
  4. Indent
  5. Auto Bracket
  6. Tidy
  7. Auto-Completion
  8. Features
  9. Live Demo
  10. Conclusion

1 Motivation

In one interesting book on industrial design, I found a piece of advice which did not look too obvious: make textarea elements on a Web page bigger, because most users are not software engineers; they won't just paste text into the textarea as you would do, most of them will actually type text there. At that time, I did not expect that I would be spending considerable time typing text into a textarea myself, but these days I really do it pretty often: I enter and test some JavaScript code, as if Visual Studio wasn't enough :-). On a regular basis, I use my JavaScript calculator, because it helps me write small scripts and try them out faster. I really made it convenient enough, at least for myself.

And yet, I finally lost my patience, and quickly figured out the reason for my irritation. It is the lack of so-called "smart indent", like in Visual Studio and some other IDEs (hence the picture at the top of the present article). A small thing, one would say, but it really makes a difference. Note the lack of the tabulator: Tab is busy with navigation on the page, and taking this function away would be totally wrong. With "smart indent", this problem becomes insignificant. And along with this important feature, some less important ones come in nicely, first of all, some tidying up of the entered text, which mostly means fixing the blank spaces.

So, with my patience supply depleted, I decided to fill in the gap. I did not find any available code which would satisfy me, not even remotely, only some ideas. But, come to think of it, textarea is used everywhere, and yet it provides a text entry facility too rudimentary to be bearable; at the same time, it's not at all hard to manipulate text input with JavaScript. Let's see how simple it can be.

But first of all,

Disclaimer:

The script presented in this article is extremely simple and does not pretend to play the role of any comprehensive text or language processor for any programming language. It is not context-sensitive, is fully unaware of language syntax or semantics and does not provide anything like syntax coloring. It applies to a pure-text textarea. The main goal is simplicity, rudimentary level of convenience, the ease of customization and absence of any 3rd-party code.

I used the code described in the present article in the second version of my JavaScript Calculator. I also used it in its spin-off, "JavaScript Playground", which I use to demonstrate JavaScript code samples. Such sample demonstrations can be found in my recent articles on passing function arguments by name: Named Arguments for JavaScript Functions and Yet Another Approach and Named Arguments for JavaScript Functions, Part Two: Going Structured.

2 Rule Set

The whole solution can be found in the code provided with this article. It comes in just two files, "index.html" and "smartFormatting.js". The universal part of the code is placed in the JavaScript file as a single function, setSmartFormatting, while "index.html", which also contains some JavaScript code, is only a usage sample.

As all the text handling is set up in this single function, there is no separate function to switch text handling off, and there is no need for one. Instead, this function can be called again and again with the same textarea argument and different options. All textarea event handlers will be removed if the option set says so.

All the implemented formatting behavior is controlled by the set of rules described by the object defaultOptions. Some or all rules of this default rule set can be overridden by the user. The procedure of such customization is explained in my previous article, Named Arguments for JavaScript Functions, but the basic usage can be understood from the usage example shown in "index.html".

By customizing the rules, the user can adapt them pretty widely to different needs, first of all, to different languages or coding styles. My default option set is oriented to JavaScript programming.

The outline of the rule sets can be shown like this:

JavaScript
const defaultOptions = {
    features: {
        // the members define which sets of rules to apply
        // also define tab characters
    },
    formattingRules: {
        indent: [
            // smart indent rules
        ],
        autoBracket: [
            // automatically adding closing brackets
        ],
        tidy: [
            // replacement of some characters/words with other words
            // to improve readability of text
        ],
        tidyVerbatim: [
            // define exclusions for tidy rule,
            // to treat content inside string literals verbatim 
        ],
        autoComplete: [
            // define text patterns which can optionally be auto-completed
            // after some part of them is entered
        ]
    }
}
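
Overriding a part of such a rule set can be pictured as a recursive overlay of the user's object on the defaults. The following is only a minimal sketch under that assumption: mergeOptions and sampleDefaults are hypothetical names which are not taken from "smartFormatting.js", and the actual mechanism used there is the one described in Named Arguments for JavaScript Functions.

```javascript
// Hypothetical helper, not part of "smartFormatting.js": recursively overlays
// user-supplied rules on a default rule set. Arrays are replaced as a whole.
const mergeOptions = (defaults, overrides) => {
    if (overrides === undefined) return defaults;
    const isPlainObject = value =>
        value !== null && typeof value === "object" && !Array.isArray(value);
    if (!isPlainObject(defaults) || !isPlainObject(overrides)) return overrides;
    const result = { ...defaults };
    for (const key of Object.keys(overrides))
        result[key] = mergeOptions(defaults[key], overrides[key]);
    return result;
};

// Abbreviated stand-in for the default rule set (values are illustrative only)
const sampleDefaults = {
    features: { useSmartIndent: true, useTabs: true, tabSize: 4 },
    formattingRules: { indent: [{ bra: "[", ket: "]" }, { bra: "{", ket: "}" }] },
};

// Override only what differs, for example, blank spaces instead of tabs
const effective = mergeOptions(sampleDefaults, {
    features: { useTabs: false, tabSize: 2 },
});
console.log(effective.features);
```

With this approach, everything not mentioned in the override object keeps its default value, so a customization stays as short as the difference from the defaults.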

See the full rule set in "smartFormatting.js" for all the detail. Below, I'll explain the usage of each rule.

3 Bra and Ket

First, I want to clarify the object naming used in a big part of the rules. Not only is it a convenient naming for the parts of brackets, it is also a tribute to the notation used in quantum mechanics, introduced in 1939 by Paul Dirac: https://en.wikipedia.org/wiki/Bra%E2%80%93ket_notation.

4 Indent

Indent rules are the most practically important rules, the ones for which I started this whole activity. This is a couple of rules for that very "smart indent" feature. The default rule set has two identical rules for the pairs of brackets [] and {}:

JavaScript
{
    bra: "[", ket: "]",
    endOfLineKet: true, matchLeftWord: false, matchRightWord: false
}

These rules define the smart indent created on the press of the Enter key when the text insertion point is located between bra and ket. The Boolean elements of the rule certainly need some explanation.

The first one, endOfLineKet, means that the second bracket of the pair (ket, ']' in this example) can be missing, and yet smart indent will be applied if the insertion point is located exactly at the end of the line. In other words, the line ends with '[' and the insertion point is placed at the end of the line. I hope the need for this option is obvious: it is needed when one needs to add a new indented line after the bracket, such as in

const myArray = [
    |
    2,
    3];

In all input text samples, I denote the insertion point with the symbol '|'. This example shows the result of smart formatting just after the Enter key is pressed. It is helpful when you first added 2 and 3 to some array and decided to add 1 later.

The other two rule elements, matchLeftWord and matchRightWord, are related to the concept of "match whole word". When such an option is set to true, the bra or ket word only enables smart indent if it stands apart, separated from the rest of the text by one of the blank space characters. In other words, for example,

// If matchLeftWord and matchRightWord are true,
// smart indent won't be done with this
const myObject = someArray[|]
// or this line:
const myObject = someArray [|];
// but will be applied to 
const myObject = someArray [|] ?

In my default rule set, I don't use these two options; that is, smart indent is always used when Enter is pressed between bra and ket.

Now, how can this kind of formatting produce smart indentation and, importantly, the nested indentation usually used in program texts? It is based on the indentation of the present or the next line, which is determined by the leftmost blank space and/or tab characters in that line. When you press Enter between bra and ket on a line without indentation, it creates two new lines, with indentation on the first one and ket on the last one. Type on this line, press Enter between bra and ket again, and new indentation is added to the existing indentation. The added indentation is defined by the property features of the option set, see the section Features.

The implementation of smart indentation is fairly simple. The existing indentation is detected by the function countTabs(string). All the rules are implemented based on parsing the existing textarea text into an object which carries the insertion point location and the text elements to the left and right of this point: character, word and line. This is done by the function parseCursorContext:
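
The behavior described above can be sketched as a pure function working on a string and an insertion point position. This is only an illustration and not the code of "smartFormatting.js": the names leadingBlanks and smartIndentOnEnter are hypothetical, the rule shape is taken from the rule shown earlier, and the options matchLeftWord and matchRightWord are left out for brevity.

```javascript
// Hypothetical sketch, not the code of "smartFormatting.js".
// Rule shape follows the article: { bra, ket, endOfLineKet }.
const leadingBlanks = line => line.match(/^[ \t]*/)[0];

// Returns { text, cursor } for the Enter key, or null if no indent rule applies
const smartIndentOnEnter = (text, pos, rules, indentUnit) => {
    const left = text.substring(0, pos);
    const lineStart = left.lastIndexOf("\n") + 1;
    // existing indentation of the line holding the insertion point
    const currentIndent = leadingBlanks(left.substring(lineStart));
    const atEndOfLine = pos === text.length || text[pos] === "\n";
    for (const rule of rules) {
        if (!left.endsWith(rule.bra)) continue;
        const inner = "\n" + currentIndent + indentUnit;
        if (text.startsWith(rule.ket, pos)) {
            // Enter between bra and ket: a deeper line, then ket on its own line
            const outer = "\n" + currentIndent;
            return { text: left + inner + outer + text.substring(pos), cursor: pos + inner.length };
        }
        if (rule.endOfLineKet && atEndOfLine)
            // ket is missing, but bra ends the line: only the deeper line is added
            return { text: left + inner + text.substring(pos), cursor: pos + inner.length };
    }
    return null;
};

const indentRules = [
    { bra: "[", ket: "]", endOfLineKet: true },
    { bra: "{", ket: "}", endOfLineKet: true },
];
console.log(JSON.stringify(smartIndentOnEnter("const a = [];", 11, indentRules, "\t")));
```

Because the new indentation is always the existing indentation of the current line plus one indentation unit, nesting comes for free: every Enter pressed between brackets on an already indented line goes one level deeper.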

JavaScript
const parseCursorContext = editor => {
    const pos = getCursor(editor);
    const allText = editor.value;
    const leftChar = allText[pos - 1];
    const rightChar = allText[pos];
    // nothing to analyze if there is no character to the left of the insertion point
    if (!leftChar) return;
    // narrow the text on both sides of the insertion point down to the current line
    let left = allText.substring(0, pos);
    let right = allText.substring(pos);
    const rightmost = right.indexOf(newLine);
    if (rightmost >= 0)
        right = right.substring(0, rightmost);
    const leftmost = left.lastIndexOf(newLine);
    if (leftmost >= 0)
        left = left.substring(leftmost + newLine.length);
    const leftWord = findWords(left, false);
    const rightWord = findWords(right, true);
    return {
        cursor: pos,
        left: { char: leftChar, word: leftWord, line: left },
        right: { char: rightChar, word: rightWord, line: right }
    };
};

Most of the rules, including smart formatting, are applied in the event handler keyDownHandler, but the whole set of rules uses the events keydown, keypress, keyup, click and paste. Please see "smartFormatting.js" for the complete implementation.

5 Auto Bracket

Auto bracket rules simply add a ket word immediately after a bra word is entered. Such a rule has one additional element, endOfLineOnly. I found it very annoying when you already have, for example, text like "[]" and try to insert an index expression inside it. So, in my default rule set, the two formatting rules only apply if one types bra at the end of a line.
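
As an illustration only, the auto bracket behavior could look like the following sketch. The function autoBracket is a hypothetical name, it is assumed to be called right after the bra word has been typed, and the actual implementation in "smartFormatting.js" may differ.

```javascript
// Hypothetical sketch, not the code of "smartFormatting.js".
// Rule shape follows the article: { bra, ket, endOfLineOnly }.
// text already contains the typed bra; pos is the insertion point after it.
const autoBracket = (text, pos, rules) => {
    const left = text.substring(0, pos);
    const atEndOfLine = pos === text.length || text[pos] === "\n";
    for (const rule of rules) {
        if (!left.endsWith(rule.bra)) continue;
        if (rule.endOfLineOnly && !atEndOfLine) continue;
        // ket is added after the insertion point, which stays between bra and ket
        return { text: left + rule.ket + text.substring(pos), cursor: pos };
    }
    return { text, cursor: pos }; // no rule applies, nothing changes
};

const autoBracketRules = [
    { bra: "[", ket: "]", endOfLineOnly: true },
    { bra: "{", ket: "}", endOfLineOnly: true },
];
console.log(autoBracket("const a = [", 11, autoBracketRules).text);
```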

6 Tidy

The "tidy" feature, by default, just modifies the set of blank space characters to make the code look nicer and more readable, but it can define a number of different kinds of replacements. Tidy rules are represented by two arrays of rule objects: tidy and tidyVerbatim. The first array is an array of rules for the replacement of certain strings with modified strings; in my default set, I use three array elements, for adding a blank space before, after and on both sides of the target strings defined by the array before. The set of rules tidyVerbatim is the set of rules "blocking" the application of tidy rules. As my default rule set is designed for JavaScript code, it has such rules blocking the application of tidy rules inside expressions in quotation marks, single and double. But this is not a very trivial thing.

I use Regular Expressions via the native JavaScript RegExp object. Regular Expressions can be nice to write in many cases, but they are notoriously unreadable and, hence, unsuitable for a rule-driven (read: highly customizable) approach. Therefore, I generate the Regular Expressions for these rules on the fly, during initialization (once per call to setSmartFormatting), for each before element of the rule, but I leave Regular Expression syntax in the after element, the one which should substitute the before element when the rule is applied. For example, " $1 " means that a blank space should be added before and after the original before word, and " $1" or "$1 " add the blank space only before or only after, respectively. Note that one untold rule is actually applied at the very end: all the blank characters are "normalized" to eliminate any duplication. This is the implementation of the generation of the RegExp objects on the fly out of the arrays of before elements:

JavaScript
const tidyRegex = (function createRegexTidyRules(rules) {
    const regex = [];
    for (let rule in rules) {
        const newRule = { before: constants.empty, after: rules[rule].after };
        for (let wordIndex = 0; wordIndex < rules[rule].before.length; ++wordIndex) {
            let word = constants.empty;
            for (let charIndex in rules[rule].before[wordIndex])
                word += "\\" + rules[rule].before[wordIndex][charIndex];
            if (newRule.before != constants.empty)
                newRule.before += "|";
            newRule.before += word;
        } //loop word
        newRule.before = "(" + newRule.before + ")";
        newRule.before = new RegExp(newRule.before, "g");
        regex.push(newRule);
    } //loop rule
    return regex;
})(options.formattingRules.tidy);

These rules are also applied on the press of the Enter key.
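
To show how such generated rules could be applied to a line of text, here is a minimal sketch. It is not the code of "smartFormatting.js": buildTidyRules, tidyLine and the sample rules are hypothetical, the escaping is swapped for the commonly used escapeRegExp idiom (only Regular Expression metacharacters are escaped), and keeping the leading indentation intact during the final normalization is my assumption.

```javascript
// Hypothetical sketch, not the code of "smartFormatting.js".
// Escapes only Regular Expression metacharacters (a widely used idiom)
const escapeRegExp = s => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Each rule: { before: [strings], after: replacement using $1 }
const buildTidyRules = rules => rules.map(rule => ({
    before: new RegExp("(" + rule.before.map(escapeRegExp).join("|") + ")", "g"),
    after: rule.after,
}));

const tidyLine = (line, compiledRules) => {
    // assumption: leading indentation is kept as is
    const indent = line.match(/^[ \t]*/)[0];
    let result = line.substring(indent.length);
    for (const rule of compiledRules)
        result = result.replace(rule.before, rule.after);
    // the final "untold" rule: collapse repeated blank spaces
    return indent + result.replace(/ {2,}/g, " ");
};

// Illustrative rules only, not the default rule set
const sampleTidyRules = buildTidyRules([
    { before: ["+", "*"], after: " $1 " },
    { before: [","], after: "$1 " },
]);
console.log(tidyLine("    f(a,b+c)", sampleTidyRules));
```

The final normalization is what makes the rules idempotent: a second pass over already tidy text adds blank spaces again, and the duplicates are collapsed right away.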

Likewise, Regular Expressions are generated on the fly out of the tidyVerbatim rules. These rules just define the brackets marking the context where the tidy rules should not be applied:

JavaScript
{ bra: "/*", ket: "*/" },
{ bra: "//", ket: null },
{ bra: "'", ket: "'" },
{ bra: "`", ket: "`" },
{ bra: "\"", ket: "\"" }

When a Regular Expression is created out of these declarations, it is assumed that bra == null denotes the match with the start of a line and ket == null the match with the end of a line. In these cases, unescaped line anchors are used, caret (^) or dollar ($), respectively.
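
A possible way to build and use such a Regular Expression is sketched below, under the assumption that the text is processed line by line. The names buildVerbatimRegex and applyOutsideVerbatim are hypothetical and are not taken from "smartFormatting.js"; the sketch also ignores escaped quotation marks inside string literals.

```javascript
// Hypothetical sketch, not the code of "smartFormatting.js".
const escapeRegExp = s => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// null bra or ket becomes an unescaped line anchor, as described above
const buildVerbatimRegex = rules => {
    const alternatives = rules.map(rule => {
        const bra = rule.bra === null ? "^" : escapeRegExp(rule.bra);
        const ket = rule.ket === null ? "$" : escapeRegExp(rule.ket);
        return bra + ".*?" + ket;
    });
    return new RegExp(alternatives.join("|"), "g");
};

// Applies transform to the parts of a single line outside of verbatim fragments
const applyOutsideVerbatim = (line, verbatimRegex, transform) => {
    let result = "";
    let last = 0;
    for (const match of line.matchAll(verbatimRegex)) {
        result += transform(line.substring(last, match.index)) + match[0];
        last = match.index + match[0].length;
    }
    return result + transform(line.substring(last));
};

const verbatimRules = [
    { bra: "/*", ket: "*/" },
    { bra: "//", ket: null },
    { bra: "'", ket: "'" },
    { bra: "`", ket: "`" },
    { bra: "\"", ket: "\"" },
];
const verbatimRegex = buildVerbatimRegex(verbatimRules);
const spaceAfterComma = s => s.replace(/,/g, ", "); // stand-in for the tidy rules
console.log(applyOutsideVerbatim('f(a,"b,c") // x,y', verbatimRegex, spaceAfterComma));
```

The order of the rules matters in such a design: the alternatives are tried from left to right at every position, so a quotation mark inside a comment is consumed by the comment rule and does not start a string literal.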

7 Auto-Completion

Auto-complete rules define patterns which can be completed based on partially entered text. Let's consider just one example:

JavaScript
{ pattern: "do* {|} while ()", breakPoint: "*", insertPoint: "|" }

This is just the desired complete text defined by the rule element pattern. In its string value, two "special" characters are used: '*' marks the place where the feature becomes available after the left part of the pattern is entered, and then the right part is added; the '|' character defines where the insertion point should be moved after the auto-completed text is added. But what to do if one or both of these characters need to be a part of the pattern? For this purpose, they are made configurable through two other rule elements, breakPoint and insertPoint.

Auto-completion is also performed on the press of the Enter key. To show the user when this option is active, the function setSmartFormatting has the function parameter autoCompleteMatchNotification. Then the user has three options: 1) ignore it and keep typing; if the pattern still matches, auto-completion can still be used; 2) press Enter to perform auto-completion; 3) press Escape. The only practical need for Escape is to allow the user to press Enter without auto-completion, to utilize the default Enter function.
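
The way a rule like the one shown above could be decomposed and applied is sketched below. The names parseAutoCompleteRule and applyAutoComplete are hypothetical, and the sketch assumes that insertPoint, if present, is located to the right of breakPoint; the real code in "smartFormatting.js" may work differently.

```javascript
// Hypothetical sketch, not the code of "smartFormatting.js".
// Splits the pattern into the part to be typed and the part to be added
const parseAutoCompleteRule = rule => {
    const breakIndex = rule.pattern.indexOf(rule.breakPoint);
    if (breakIndex < 0) return null; // the pattern has no break point
    const trigger = rule.pattern.substring(0, breakIndex);
    const remainder = rule.pattern.substring(breakIndex + rule.breakPoint.length);
    const insertIndex = remainder.indexOf(rule.insertPoint);
    if (insertIndex < 0) // no insert point: the cursor goes to the end
        return { trigger, completion: remainder, cursorOffset: remainder.length };
    const completion = remainder.substring(0, insertIndex)
        + remainder.substring(insertIndex + rule.insertPoint.length);
    return { trigger, completion, cursorOffset: insertIndex };
};

// Returns { text, cursor }, or null if the text before pos does not match
const applyAutoComplete = (text, pos, parsed) => {
    const left = text.substring(0, pos);
    if (!left.endsWith(parsed.trigger)) return null;
    return {
        text: left + parsed.completion + text.substring(pos),
        cursor: pos + parsed.cursorOffset,
    };
};

const doWhile = parseAutoCompleteRule(
    { pattern: "do* {|} while ()", breakPoint: "*", insertPoint: "|" });
console.log(JSON.stringify(applyAutoComplete("do", 2, doWhile)));
```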

8 Features

This element of the rule set simply defines which sets of rules should be applied. For smart indentation rules, the indentation characters are also defined. Notably, they are only applied to newly entered text. In other words, already typed text is never modified. However, its look can change if tab characters are contained in the existing text. It happens because tabSize is also set for tab characters, and this is done via the textarea style. I hope the structure of this object is self-explanatory:

JavaScript
features: {
    useSmartIndent: true, useTabs: true, tabSize: 4,
    useAutoBracket: true, useTidy: true, useCodeCompletion: true
}
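
As a hedged illustration of how these properties could be interpreted, the sketch below derives the indentation unit and sets the visual tab size. The helper names indentUnitFor and applyTabSize are hypothetical, and the assumption that useTabs set to false means tabSize blank spaces per indentation level is mine; tab-size itself is a standard CSS property.

```javascript
// Hypothetical helpers, not the code of "smartFormatting.js";
// the property names come from the features object above.
// Assumption: without tabs, one indentation level is tabSize blank spaces
const indentUnitFor = features =>
    features.useTabs ? "\t" : " ".repeat(features.tabSize);

// Sets the width used to render tab characters;
// element is a textarea or any object with a style member
const applyTabSize = (element, features) => {
    element.style.tabSize = String(features.tabSize);
};

const sampleFeatures = { useSmartIndent: true, useTabs: false, tabSize: 4 };
console.log(JSON.stringify(indentUnitFor(sampleFeatures)));
```

Since tab-size only changes how tab characters are rendered, the text itself stays untouched, which agrees with the remark above that only the look of already typed text can change.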

9 Live Demo

For the application of the Smart Formatting, please see the Live Demo of JavaScript Playground and Playground API demo.

See also the Code Project article JavaScript Playground.

10 Conclusion

It just works. My irritation vanished as I refined the rules and their implementation step by step. I hope the behavior won't irritate my readers and may even look convenient. I will be very grateful for reasonably argued criticism (not necessarily constructive) and any suggestions.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)