Title |
Creating a Lexical Analyzer in C |
Author |
fahad bader al-buhairi |
Author Email |
xxxq8xxx [at] hotmail.com |
Description |
The token classes correspond to the following regular definitions, except <letter> and <digit>, plus the class <keyword>. (The ellipsis "..." is used in the usual sense of "and so on". The spaces in the definitions are there for better readability; they are not valid parts of the definitions.)
::= ;
::= :
::= ,
::= + | - | * | /
::= < | <= | = | <> | >= | >
::= ( | )
::= % | ! | @ | ~ | $
<letter> ::= a | A | b | B | ... | z | Z
<digit> ::= 0 | 1 | ... | 9
<identifier> ::= <letter> ( <letter> | <digit> )*
<num> ::= <digit>+ (. digit+ )? (E(+|-)?
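The identifier and number definitions can be sketched as two small scanners that return the length of the longest matching prefix. This is only an illustration, not the author's code: the function names are hypothetical, identifiers are assumed to be a letter followed by letters or digits, and because the number definition above is cut off after `(E(+|-)?`, the exponent is assumed to end in one or more digits.

```c
#include <stddef.h>

/* letter: a | A | ... | z | Z */
static int is_letter(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* digit: 0 | 1 | ... | 9 */
static int is_digit(char c) {
    return c >= '0' && c <= '9';
}

/* Length of the longest identifier at the start of s, or 0 if none.
   Assumed shape: a letter followed by any number of letters or digits. */
size_t scan_identifier(const char *s) {
    size_t n = 0;
    if (!is_letter(s[0])) return 0;
    while (is_letter(s[n]) || is_digit(s[n])) n++;
    return n;
}

/* Length of the longest number at the start of s, or 0 if none. */
size_t scan_number(const char *s) {
    size_t n = 0, m;
    while (is_digit(s[n])) n++;
    if (n == 0) return 0;
    /* optional fraction: '.' followed by at least one digit */
    if (s[n] == '.' && is_digit(s[n + 1])) {
        n++;
        while (is_digit(s[n])) n++;
    }
    /* optional exponent (ASSUMED completion of the truncated definition):
       'E', an optional sign, then at least one digit */
    if (s[n] == 'E') {
        m = n + 1;
        if (s[m] == '+' || s[m] == '-') m++;
        if (is_digit(s[m])) {
            while (is_digit(s[m])) m++;
            n = m;
        }
    }
    return n;
}
```

If the fraction or exponent is incomplete (for example `3.` or `7E+`), the scanners fall back to the shorter prefix that still matches, so the remaining characters are left for the next token.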
The following lexemes should be recognized as keywords, not as identifiers:
procedure, is, begin, end, var, cin, cout, if, then, else, and, or, not, loop, exit, when, while, until
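One simple way to separate keywords from identifiers is to scan an identifier first and then look it up in a fixed table. The sketch below assumes a case-sensitive comparison, which the assignment does not state; the names `KEYWORDS` and `is_keyword` are hypothetical.

```c
#include <string.h>

/* The keyword list from the assignment text. */
static const char *const KEYWORDS[] = {
    "procedure", "is", "begin", "end", "var", "cin", "cout", "if", "then",
    "else", "and", "or", "not", "loop", "exit", "when", "while", "until"
};

/* Returns 1 if the first len characters of lexeme are exactly one of the
   keywords, 0 otherwise. ASSUMPTION: the match is case-sensitive. */
int is_keyword(const char *lexeme, size_t len) {
    size_t i, count = sizeof KEYWORDS / sizeof KEYWORDS[0];
    for (i = 0; i < count; i++) {
        if (strlen(KEYWORDS[i]) == len &&
            strncmp(KEYWORDS[i], lexeme, len) == 0)
            return 1;
    }
    return 0;
}
```

Passing the length separately lets the caller test a lexeme in place inside the input line, without copying it into a NUL-terminated buffer first. A linear search is enough for 18 keywords.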
The whitespace characters " " (space symbol) and "\n" (end-of-line symbol) are to be skipped. Comments (any text enclosed between braces "{" and "}") are to be skipped as well. Comments do not extend over several lines (they do not contain the end-of-line symbol).
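The skipping rule could be sketched as a helper that advances past spaces, end-of-line symbols and brace comments before each token is read. The name `skip_blanks` is hypothetical, and the handling of an unclosed comment (stop at the end of the line) is an assumption, since the input is stated to be correctly formed.

```c
/* Returns a pointer to the first character of s that is not a space,
   an end-of-line symbol, or part of a {...} comment. */
const char *skip_blanks(const char *s) {
    for (;;) {
        if (*s == ' ' || *s == '\n') {
            s++;
        } else if (*s == '{') {
            /* A comment ends at '}' and never spans lines.
               ASSUMPTION: an unclosed comment ends at the end of the line. */
            while (*s != '\0' && *s != '}' && *s != '\n') s++;
            if (*s == '}') s++;
        } else {
            return s;
        }
    }
}
```

Only the two whitespace characters named in the assignment are skipped; tabs and carriage returns are deliberately not handled here.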
The input for the lexical analyzer is a text file, SOURCE.TXT, consisting of several lines of text (a "program") that form a correctly formed sequence of lexemes corresponding to the above definitions, whitespace and comments.
The output of your lexical analyzer consists of 2 text files, ST.TXT and TOKENS.TXT.
1. ST.TXT is the symbol table created by the lexical analyzer. Each line consists of three parts:
- line number
- the lexeme (string)
- type (string), being one of the following: keyword, identifier, num
2. TOKENS.TXT is the list of tokens produced by the lexical analyzer with the following structure:
- one line of input (in the order of appearance in SOURCE.TXT)
- corresponding pairs "token, attribute", each on a separate line, in the order in which they occur in the line
- blank line
The attribute of a keyword, identifier or number is its line number in the symbol table. The attribute of any other token is the lexeme itself. The longest prefix of the input that can match any regular expression p_i is taken as the next token. |
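The symbol-table attribute and the longest-prefix rule might be sketched as follows. Everything here is illustrative: `st_install`, `scan_relop`, `ST_MAX` and `LEXEME_MAX` are hypothetical names and limits, and the sketch assumes that each distinct lexeme is stored only once, so a repeated lexeme gets the line number of its first entry (the assignment does not say this explicitly).

```c
#include <string.h>

#define ST_MAX 256      /* hypothetical table capacity */
#define LEXEME_MAX 64   /* hypothetical maximum lexeme size, including NUL */

struct st_entry {
    char lexeme[LEXEME_MAX];
    const char *type;   /* "keyword", "identifier" or "num" */
};

static struct st_entry st[ST_MAX];
static int st_count = 0;

/* Returns the 1-based symbol-table line of the lexeme, adding it if it is
   not present yet. Returns 0 if the table is full or the lexeme is too
   long. ASSUMPTION: each distinct lexeme is stored once. */
int st_install(const char *lexeme, size_t len, const char *type) {
    int i;
    if (len >= LEXEME_MAX) return 0;
    for (i = 0; i < st_count; i++) {
        if (strlen(st[i].lexeme) == len &&
            strncmp(st[i].lexeme, lexeme, len) == 0)
            return i + 1;
    }
    if (st_count == ST_MAX) return 0;
    memcpy(st[st_count].lexeme, lexeme, len);
    st[st_count].lexeme[len] = '\0';
    st[st_count].type = type;
    return ++st_count;
}

/* Longest-prefix rule for the relational operators: the two-character
   operators are tried before the one-character ones.
   Returns the length of the match, or 0 if s does not start with one. */
size_t scan_relop(const char *s) {
    if (s[0] == '<' && (s[1] == '=' || s[1] == '>')) return 2;
    if (s[0] == '>' && s[1] == '=') return 2;
    if (s[0] == '<' || s[0] == '=' || s[0] == '>') return 1;
    return 0;
}
```

The value returned by `st_install` is what would be written as the attribute in TOKENS.TXT, and the entries of `st`, numbered from 1, are what would be written to ST.TXT. For an operator such as `<=`, `scan_relop` returns 2, so the input is not split into `<` followed by `=`.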
Category |
C » Beginners / Lab Assignments |
Hits |
510463 |
Code |