RIP Philo

compiler construction

Lexer reads the source code and recognizes all the words passing on a list of tokens/words to the

Parser which takes token/word list and turns it into a tree of program nodes.

So the command  x = x + 1

gets Lexed into  [var x] [equal_op =] [var x] [plus_op +] [const 1]

then parsed into ?????
Permalink puzzled compiler kid 
March 11th, 2007 12:44pm
>then parsed into ?????

typically an abstract syntax tree (AST), which is a data structure that represents the code. From there either a compiler either creates machine code (or byte code, or whatever you want to generate) optionally performing some optimizations on the AST beforehand, or and interpreter evaluates the AST and calculates a result.
Permalink Send private email a2800276 
March 11th, 2007 12:49pm
Right.  How does the parser do that?  The [=] should be at the top of the tree?  How does the parser know to skip the [x] and go back and get it when it needs a target for [=]?
Permalink puzzled compiler kid 
March 11th, 2007 12:55pm
There's two common techniques for the compiler to do this, LALR and LL. It's sort of tricky to explain the difference in a few words, but I'm sure there's entries on wikipedia that explain the technicalities.
Permalink Send private email a2800276 
March 11th, 2007 1:02pm
Or, in more common terms, LALR is a bottom-up technique which would try to match say <program> and then figure that to match program it will need to match either <statements> or <expressions>, etc. (this is also known as shift reduce).

LL is a top-down technique that starts with the individual tokens and tries out what it can construct out of the, in the example above, it would see it matched a bunch of <statements> which would satisfy the rule for <program> (this is aka recursive decent).

Take the above with a grain of salt, it's very generalised and probably mostly wrong :)
Permalink Send private email a2800276 
March 11th, 2007 1:08pm
Permalink puzzled compiler kid 
March 11th, 2007 1:35pm
It really bothers me that I used to know this stuff cold as I worked on a few compilers. Now it's just grab bag of familiar terms.
Permalink son of parnas 
March 11th, 2007 1:55pm
The big piece which is missing here is called "semantic actions".  "Semantic actions" tell the compiler what code to "emit" based on what "syntax rules" it has "recognized".

One of the "semantic actions" is creating an entry in the symbol table.  Another "semantic action" is manipulating the "symbol stack" to both push 'X' on to it for future processing, and pop 'X' off of it when it is to be used.

Both LL and LALR syntax trees (and Lexers) have different ways of encoding the "Semantic actions".  If "Yacc" is used, the "Semantic actions" are usually done as 'C' code.
Permalink SaveTheHubble 
March 12th, 2007 12:30pm
ANTLR is a handy toolkit, too.
Permalink Send private email Aaron F Stanton 
March 12th, 2007 1:17pm

This topic is archived. No further replies will be accepted.

Other topics: March, 2007 Other topics: March, 2007 Recent topics Recent topics