trønderen wrote:I sure wish that was true for everybody creating new languages! (Note that I did not refer explicitly to C and all the languages derived from it.)
C? Compiler theory applies to any language, interpreted ones included.
trønderen wrote:Neither if, then, si, alors, om or så, are reserved words in the language. The language would define non-text tokens, call them [if] and [then] if you like, but the representation is binary, independent of any text.
That is a non-starter.
A human needs to write the code. Token representations that the user is responsible for memorizing would not work. If the user at any time types something like 'if' and 'then', then those are keywords of the language. That is how it works, just as it does in natural languages. Changing the surface language (English to something else) does not alter what a system that must eventually run the code has to do: it still must convert the keywords into something else.
And defining keywords is necessary for any computer language, because parsing it is not deterministic otherwise.
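A rough sketch of that point, with made-up names (the Tok enum, the KEYWORDS table and the tokenize function are all hypothetical, not from any real compiler): whichever spelling the user types, the lexer still needs a table of reserved words to map it to a token.

```python
from enum import Enum, auto

class Tok(Enum):
    IF = auto()
    THEN = auto()
    IDENT = auto()

# Hypothetical keyword tables: each spelling maps to the same token.
KEYWORDS = {
    "en": {"if": Tok.IF, "then": Tok.THEN},
    "fr": {"si": Tok.IF, "alors": Tok.THEN},
    "no": {"om": Tok.IF, "så": Tok.THEN},
}

def tokenize(source, lang):
    """Split on whitespace and classify each word as keyword or identifier."""
    table = KEYWORDS[lang]
    return [(table.get(word, Tok.IDENT), word) for word in source.split()]

english = [kind for kind, _ in tokenize("if ready then go", "en")]
french = [kind for kind, _ in tokenize("si ready alors go", "fr")]
print(english == french)  # True: same token stream either way
```

The token stream is independent of the spelling, but the words in each table are still keywords: the user cannot use them as identifiers in that language setting.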
trønderen wrote:The display representation of the binary [if] token could be e.g. as (boldface) if, or as [if], si, [si], om, [o] or some other way to visually highlight that this is not a user identifier but a control statement token.
Errr...no idea what you are talking about.
The 'bold' just becomes part of the textual representation of the keyword. No different than requiring that the keyword is in lower case.
You seem to think that because you use bold on a keyword it is no longer a keyword. It doesn't matter how the language specification differentiates it; it is still a keyword.
And no developer is going to work in a language where they need to make keywords by switching to bold and back.
trønderen wrote:Why can't the parser define a binary 'comment' token,
Because the content of the comment is NOT the token that tells the compiler that it is a comment. The content of the comment is the text that the comment contains. So in a line such as '// some text', the value of the comment is the text, not the '//'.
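To make the distinction concrete, here is a minimal sketch (lex_line is a hypothetical helper, and it assumes '//' always starts a comment, which ignores string literals). The marker only decides the token kind; the text after it is the value.

```python
def lex_line(line):
    """Split one source line into (kind, value) pairs.

    Assumes '//' starts a comment that runs to the end of the line.
    A real lexer would also have to handle '//' inside string literals.
    """
    code, marker, text = line.partition("//")
    tokens = [("CODE", code.strip())] if code.strip() else []
    if marker:
        # The '//' itself is dropped; only the comment text is kept.
        tokens.append(("COMMENT", text.strip()))
    return tokens

print(lex_line("x = 1 // set the counter"))
# [('CODE', 'x = 1'), ('COMMENT', 'set the counter')]
```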
trønderen wrote:I have been working with third party APIs with French method and parameter names
Only when named parameters are supported and used can the parameter names matter.
And when you work in English, exactly how are you going to use that method unless you have something in English that tells you how to use it?
trønderen wrote:Assuming that you refer to language features, introducing new keywords. If there are no keywords, the problem you are pointing to, vanishes
I have studied Compiler Theory formally and informally for a long time. That statement, by itself, is not possible.
As I said, tokenization is not something new in Compiler Theory. It has been there for a very long time. That very word names the process of converting keywords to tokens. You seem to think you are going to be able to remove keywords from the definition of the language, but you fail to describe, in detail, how a user is then going to be able to do anything without using keywords.
trønderen wrote:Furthermore, you can move a document from an English MS Word to a French one and then to a Swedish one: The menu texts, help texts etc. change language, yet an edit made in one language version is equally valid in other language versions of MS Word.
You do understand that an MS Word doc is a binary file which has embedded symbols in it which define the format?
The text of the document is NOT the relevant part. The analogy to code for a Word doc is that all of the text that you see in MS Word is a 'comment'.
However, when you write code, most of what you write and what you debug is not comments. So you are proposing that the keywords of the language would be entered using key combinations, for every single thing that one writes.
trønderen wrote:I certainly can imagine a programming language, and its parse tree storage format, being designed along the same principles.
Knock yourself out. It is called BNF: Backus-Naur Form notation.
The Java example (however, it has bugs in it):
Chapter 18. Syntax
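For a feel of how a BNF rule turns into a parser, here is a toy sketch. The grammar is made up for illustration and is not taken from the Java chapter above; parse_stmt and parse_name are hypothetical names. Note that the rule only works because 'if' and 'then' are reserved.

```python
# Toy grammar in BNF (invented for this example):
#   <stmt> ::= "if" <name> "then" <stmt> | <name>

def parse_name(tokens, pos):
    """A <name> is any word that is not reserved."""
    if pos >= len(tokens) or tokens[pos] in ("if", "then"):
        raise SyntaxError("expected a name")
    return tokens[pos]

def parse_stmt(tokens, pos=0):
    """Parse one <stmt> starting at tokens[pos]; return (tree, next_pos)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] == "if":
        cond = parse_name(tokens, pos + 1)
        if pos + 2 >= len(tokens) or tokens[pos + 2] != "then":
            raise SyntaxError("expected 'then'")
        body, end = parse_stmt(tokens, pos + 3)
        return ("if", cond, body), end
    return parse_name(tokens, pos), pos + 1

tree, _ = parse_stmt("if a then if b then c".split())
print(tree)  # ('if', 'a', ('if', 'b', 'c'))
```

If 'then' were allowed as an ordinary name, the parser could not tell where the condition ends, which is the determinism problem mentioned earlier.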