% -------------------------------------------------------------------------
\subsection{Extending the Parser}
\label{extendall}

We describe the key changes necessary to add new tokens and new rules
to the parser. As a running example, we add a new element \texttt{Foo}
that takes a \texttt{double} value and an \texttt{integer} value.

\subsubsection{Extend the Scanner in \texttt{benchmark\_lexer.l}}

We jump to the section with the word \texttt{Token} in a comment. It
looks roughly like this:

\begin{verbatim}
 /* Tokens */
 /* --------- */
"FileFormat"        { return FileFormat;}
"BenchmarkName"     { return BenchmarkName; }
"List"              { return List;}
"Rational"          { return Rational;}
\end{verbatim}

\noindent
and several more lines like this. We add our new element on its own
line anywhere in this section, for example:

\begin{verbatim}
"Foo"               { return Foo;}
\end{verbatim}

\noindent
Note that the string \texttt{"Foo"} on the left and the identifier
\texttt{Foo} on the right do not have to be the same word: the string
is the representation in the file format, whereas the identifier is
the name of the token in the program code. We use the same word for
simplicity.

\subsubsection{Extend the Parser in \texttt{benchmark\_parser.y}}
\label{extend}

We jump to the section with the words \texttt{Header tokens} in a
comment. It looks roughly like this:

\begin{verbatim}
/* Structure tokens */
/* ---------------- */
%token FileFormat
%token BenchmarkName
%token List
%token Rational
\end{verbatim}

\noindent
We add our new token anywhere in this block, like this:

\begin{verbatim}
%token Foo
\end{verbatim}

\noindent
We continue and add new rules in the grammar section. Two cases are
most likely: either the new token takes a fixed number of parameters
(possibly with a few optional ones), or it takes a variable-length
list of arguments. In the former case, we add one rule that includes
all arguments and calls a single new function in the visitor; see the
rule for \texttt{Rational(...)} as an example. We also have to decide
in which part of the grammar the new token should be accepted. Likely
places for extensions are \texttt{stmt} or
\texttt{file\_header\_option}. We add our example rule for
\texttt{Foo} in the \texttt{stmt} block right after the
\texttt{Rational} rule:

\begin{verbatim}
  | Foo '(' double_val ',' INTEGER ')'
      { visitor->accept_foo( atof( $3.c_str()), $5); }
\end{verbatim}

\noindent
The parameters are enclosed in parentheses and separated by
commas. The double value could be either an \texttt{INTEGER} or an
\texttt{FNUMBER}, so we use the production \texttt{double\_val} that
accepts both. Its semantic value, denoted by \texttt{\$3}, is a
\texttt{std::string} that contains the number representation. We
convert it with the \texttt{atof()} function to simplify the work for
the visitor. We could do the same for the \texttt{INTEGER} parameter
in \texttt{\$5} if we expected only small bounded integers, but for
unbounded long integers we pass the original \texttt{std::string} and
let the visitor do the conversion to the long integer type that is
only known to the application.

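
As a rough illustration of this division of labor, the following
self-contained C++ sketch mimics what the semantic action does with
\texttt{\$3} and \texttt{\$5}. It is not code from the parser:
\texttt{Foo\_recorder} and \texttt{foo\_action} are hypothetical names,
and \texttt{long} together with \texttt{std::strtol} merely stands in
for whatever long integer type and conversion the application provides.

\begin{verbatim}
#include <cstdlib>
#include <string>

// Hypothetical stand-in for a visitor; not the real Benchmark_visitor.
struct Foo_recorder {
    double d = 0.0;
    long   i = 0;
    // Same parameter list as the accept_foo function of the visitor.
    void accept_foo( double value, std::string digits) {
        d = value;
        // The visitor chooses the integer type; long is only an
        // assumption made for this illustration.
        i = std::strtol( digits.c_str(), nullptr, 10);
    }
};

// Mimics the semantic action: both values arrive as strings, the
// double is converted here, the integer is passed on unconverted.
inline void foo_action( Foo_recorder& v,
                        const std::string& s3, const std::string& s5) {
    v.accept_foo( std::atof( s3.c_str()), s5);
}
\end{verbatim}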

Now let us assume we would like \texttt{Foo} to take a variable-length
list of arguments. We add one rule with a production that refers to
the variable-length list of arguments. The rule calls one new function
of the visitor before the parser enters this production and a second
new function of the visitor at its end; see the rule for
\texttt{Polynomial\_1(...)} as an example. We again add the rule to
the \texttt{stmt} production:

\begin{verbatim}
  | Foo { visitor->begin_foo(); }
      '(' integer_sequence ')' { visitor->end_foo(); }
\end{verbatim}

\noindent
The \texttt{begin\_foo} function is thus called before the integers are
parsed. Once all integers are parsed (and the corresponding
function \texttt{accept\_integer} has been called for each integer), the
function \texttt{end\_foo} is called, which effectively creates a proper
bracketing structure around the integer sequence.

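
This call order can be replayed outside the parser. The sketch below
assumes an input such as \texttt{Foo(1,2,3)}; \texttt{Logging\_visitor}
and \texttt{replay\_foo} are hypothetical names introduced only for this
illustration, and the \texttt{std::string} parameter of
\texttt{accept\_integer} is an assumption rather than the signature of
the real visitor.

\begin{verbatim}
#include <string>
#include <vector>

// Hypothetical stand-in that records visitor calls in order.
struct Logging_visitor {
    std::vector<std::string> log;
    void begin_foo() { log.push_back( "begin_foo"); }
    void accept_integer( std::string s) {
        log.push_back( "accept_integer " + s);
    }
    void end_foo() { log.push_back( "end_foo"); }
};

// Replays the calls that the rule would issue for the given integers.
inline std::vector<std::string>
replay_foo( const std::vector<std::string>& ints) {
    Logging_visitor v;
    v.begin_foo();                  // action before the argument list
    for ( const std::string& s : ints)
        v.accept_integer( s);       // one call per parsed integer
    v.end_foo();                    // action after the closing ')'
    return v.log;
}
\end{verbatim}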

\subsubsection{Extend the Visitor in \texttt{benchmark\_visitor.h}}
The visitor follows the visitor design pattern
in~\cite{cgal:ghjv-dpero-95}; see Section~\ref{visitor}.
Depending on which of the two options from the previous section we
chose, we add to the \texttt{Benchmark\_visitor} class either one
member function

\begin{verbatim}
    virtual void accept_foo( double d, std::string i) { tnh( "Foo"); }
\end{verbatim}

\noindent
or two member functions

\begin{verbatim}
    virtual void begin_foo() { tnh( "Begin_foo"); }
    virtual void end_foo() { tnh( "End_foo"); }
\end{verbatim}

\noindent
We implement these member functions with calls to the \texttt{tnh}
function, which is a shortcut for the \texttt{token\_not\_handled}
member function that, by default, issues an error message about an
unhandled token. A derived visitor class that does not override
these virtual \texttt{foo} functions therefore still works on files
that do not contain any \texttt{Foo}, and reports an error by default
on files that do. The error message can be suppressed; see the
\texttt{check\_syntax} example program in Chapter~\ref{checker} for a
useful application.
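
A derived visitor that handles the fixed-parameter variant might look
like the following sketch. To keep it self-contained,
\texttt{Visitor\_base} is a simplified, hypothetical stand-in for
\texttt{Benchmark\_visitor}: its \texttt{tnh} only counts unhandled
tokens instead of issuing an error message, and \texttt{Foo\_visitor}
is likewise an illustrative name.

\begin{verbatim}
#include <string>

// Simplified, hypothetical stand-in for Benchmark_visitor.
struct Visitor_base {
    int errors = 0;
    virtual ~Visitor_base() {}
    // Stand-in for token_not_handled: counts instead of printing.
    virtual void tnh( std::string /*token*/) { ++errors; }
    virtual void accept_foo( double, std::string) { tnh( "Foo"); }
    virtual void begin_foo() { tnh( "Begin_foo"); }
    virtual void end_foo()   { tnh( "End_foo"); }
};

// Overrides only accept_foo; begin_foo and end_foo keep the default
// behavior and are still reported as unhandled.
struct Foo_visitor : Visitor_base {
    double last = 0.0;
    void accept_foo( double d, std::string) override { last = d; }
};
\end{verbatim}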