For most of us, the best way to learn a programming language is to start reading a book on it. This gives enough information to understand the syntax and constructs; and one can possibly start writing useful programs using the language within a short time. Now imagine a compiler writer who is responsible to write a parser for the same language. Is it enough for the compiler writer to read the same book on the programming language to write a compiler parser? Ceratinly not, since a programming book does not give a formal, unambigous specification for the syntax of the language. A specification here means something that specifies every valid syntax of the language in mathematical form, with no room for any ambiguity. This also means that we need another language to specify the programming language itself.
The Backus-Naur Form (BNF or Backus-Naur Form) is a formal mathematical way to describe the grammar of a programming language.
BNF is used to formally define the grammar of a language unambiguously. The definitions of a programming language in BNF can be made to be so precise that a parser for the language may be created automatically from the BNF using compiler-compilers such as YACC (Yet Another Compiler Compiler), PRECC, COSY etc.
BNF specifies the language grammar as a set of rules. Each rule is called a ‘production’, and have the general form:
Non-Terminal ::= Alternative1 | Alternative2 …
A non-terminal symbol is a symbol that may be expanded further into terminal and/or non-terminal symbols. A terminal symbol cannot be further expanded, and ends the production. Each alternative on the RHS may be a terminal or non-terminal and are separated by ‘|’. All strings are treated as case sensitive.
Some examples:
a) while_statement ::= while logic_expression do statement
b) The ‘for’ loop in C++ may be represented as:
for ( InitExpression ; CondExpression ; LoopExpression ) Statement
The InitExpression is executed once in the beginning and then the CondExpression is executed. If CondExpression evaluates to true, then the Statement is executed. At the end of the iteration (execution of Statement), the LoopExpression is executed and CondExpression is evaluated and the iteration repeats until the value of the CondExpression is false.
All the three expressions (InitExpression, CondExpression, LoopExpression) are optional. For eg. the following is a valid for loop that executes Statement infinitely:
for(;;) { Statement }
With above information on for loop, the BNF production may be written as follows:
ForStatement ::= for ( [ InitExpression ] ; [ CondExpression ] ; [ LoopExpression ] ) Statement
InitExpression ::= expression | declaration
LoopExpression ::= expression
CondExpression ::= expression