Evolution of compilers and counters

The name "compiler" seems to have been coined in 1952 by Grace Hopper. It was around this time that what can be considered a parser was first implemented. However, Grace Hopper's compiler has nothing to do with what is now called a compiler: a tool that converts one language into another, typically of a lower level. Assemblers existed before compilers, but assembler is only a text representation of binary code, translation is carried out with a simple match table.

Meta-circular compiler (1951)

Corrado Böhm in his dissertation describes a program that translates a compact destination language and expression into machine code. A meta-circular means that the compiler is written in the language it compiles.

Autocode (1952)

This is a set of macro instructions that are translated into machine language. There is no special technique behind this tool .

IT (1956)

Carnegie Institute of Technology has unveiled an IT compiler running on the IBM 650. The language has arithmetic expressions, but does not precede operators .

Chomsky hierarchy and non-contact grammar (1956)

In a monograph published in 1956, Chomsky presents the structure of the present compiler:

At the same time, he invents a non-contact grammatical context: it consists of rules where x is defined by a group of elements, and x is independent of the context in which it appears. Such a grammar is easily written in BNF, a format that will appear later.

Fortran (1957)

The first language of evolution, created by John Backus, does not use the Chomsky structure, or a special technique, but processes programs line by line. To obtain a precedent, he uses cunning, surrounds parts of the expression around the operator with pairs of brackets, the number of which indicates their importance, and the decorativeness of the expressions leads to the preceding multiplication by addition.
Most Fortran designs appeared only in the following versions.

Lisp (1958)

The first Lisp compiler is no more thoughtful than the Fortran compiler, as the programmer himself must add parentheses to tell him how to transform the code.

BPF (1958)

John Backus and Peter Naur use the original notation they will use to describe the grammar of Algol 58: The Backus-Naur form, or BNF.

Example BNF grammar:

<ifelse> ::= <if> [ { else <if> } ]

<if> ::= if "(" <condition> ")" ( <instruction> | "{" { <insttuction> } "}" ) 

Precedent (1959)

The theoretical realization of operator predisposition is demonstrated by Samuelson and Bauer using the stack.

Algol (1960)

The first structured language based on the grammar described in BNF. ALGOL parser description is published by Ned Irons .
Algo 60 has instruction blocks and supports recursiveness, then new concepts. It also contains dynamic tables, so C and Pascal were regressions in this regard.

Recursive Parser Down (1961)

Peter Lucas and Ned Irons both describe a recursive hereditary parser. It consists of an initial rule that refers to other rules and thus interprets the language, following the hierarchy of their composting. In addition, a rule may refer to itself and be recursive, as is the case in the BNF description above, when an instruction that is part of an "if" rule is also an "if" rule.

Deikstra's Shuttling Yard Algorithm (1961)

From an artithmetic expression with parentheses or not (and therefore by precedent recognition), this algorithm produces an abstract syntactic tree. It uses a stack and produces an incorrect (so-called Polish) notation or the operator follows a pair of operands with higher priority operands at the end of the line.

Parser LR (1965) and LALR (1971 )

Invented by Donald Knuth, the LR partner reads the non-contact grammar of the left side to the right.

You can get SLR (Simple Left-Right), invented in 1969 by Frank DeRemer, or LALR (Look-Ahead Left-Right), invented by the same in 1971. In the second case, it tries to match the grammar rule with the source code instruction and, in case of failure, returns to the first element and tries an alternative. An error message is issued when the alternatives do not match.
LR (k) or k stands for the number of "lookaheads," the levels that the parser examines in the source code before deciding to apply the rule to the instruction.

In 1968, Donald Knuth perfects the concept of the synthetic attribute Iron (1968) with inherited attributes. While synthetic attributes define a node from the bottom nodes, there is an inherited attribute when the node uses the attributes of the parent node.

Parser LL (1968) and LR

The LL principle proposed by Lewis and Sturns applies to grammar intended for analysis by a simplified partner and writing by hand.

The LL analyzer analyzes the instruction from the left and right and from top to bottom, it builds a representation of the instruction, replacing each node with a group of nodes that define it.

The LR parser also analyzes the instruction from left to right, but from bottom to top, that is, considers terminal nodes and combines them to find a higher-level node.

CYK algorithm (1960 +

)

Designed to parse non-contact grams, using bottom-up analysis and dynamic programming, it proves effective in specific figure cases, but other methods outperform it on average in cases.

Parser Early (1970)

Jay Early in 1968 invented an algorithm capable of analyzing most non-contact grams, which he simplified in 1970. However, it was slow to perform and other techniques were imposed on popular compilers. A faster version was found later.

Parser GLR (1974)

The GLR (Generalized LR) parser is an LR extension for ambiguous grammars like grammar C. The algorithm implementing it has been described by Bernard Lang and improved since 1974.
The C , C++, etc. languages are controlled by the version of this algorithm by most compilers.

Jakk (1975)

Based on LALR, it is the first automatic parser generator shot by Stephen Johnson on ATT for Unix.
The C compiler switches from a simple recursive parser down to LALR. But Yacc would not be available to the public until 1979 .

Yacc was used by GCC, then for Awk, R, Ruby, etc.

GCC и CLang

GCC has long used Yacc, but in 2004 the C and Objective-C compilers replace Yacc with a recursive, hereditary, hand-written parser. It provides a minimum speed increase of 1.5%. This was followed by the C++ compiler.
CLang uses a similar parser for all languages.

ANTLR (1989)

Based on the LL (k) algorithm, ANTLR, created by Terence Parr, is probably the most commonly used partner forming tool. Version 4, which calls itself ALL (k), and is a combination of LL (k) and GLR, works with either listeners or visitors, and allows you to analyze almost any grammar. Writing a compiler becomes much easier and in the reader's version also facilitates multiple target languages.

See also:

By Denis Suro, creator of Script and Cryonie.