% -------------------------------------------------------------------------
\subsection{Extending the Parser}
\label{extendall}
We describe the key changes necessary to add new tokens and new rules
to the parser. We assume that we want to add a new element \texttt{Foo}
that needs a \texttt{double} value and an \texttt{integer} value.
\subsubsection{Extend the Scanner in \texttt{benchmark\_lexer.l}}
We jump to the section with the word \texttt{Token} in a comment. It
looks roughly like this:
\begin{verbatim}
/* Tokens */
/* --------- */
"FileFormat" { return FileFormat;}
"BenchmarkName" { return BenchmarkName; }
"List" { return List;}
"Rational" { return Rational;}
\end{verbatim}
\noindent
and several more lines like this. We add our new element on its own
line anywhere in this section, for example:
\begin{verbatim}
"Foo" { return Foo;}
\end{verbatim}
\noindent
Note that the left \texttt{"Foo"} and the right \texttt{Foo} do not
have to be the same word: the left string is the representation in the
file format, while the right word is its identifier in the program
code. We use the same word for simplicity.
\subsubsection{Extend the Parser in \texttt{benchmark\_parser.y}}
\label{extend}
We jump to the section with the words \texttt{Header tokens} in a
comment. It looks roughly like this:
\begin{verbatim}
/* Structure tokens */
/* ---------------- */
%token FileFormat
%token BenchmarkName
%token List
%token Rational
\end{verbatim}
\noindent
We add our new token anywhere in this block:
\begin{verbatim}
%token Foo
\end{verbatim}
\noindent
We continue and add new rules in the grammar section. Two cases are
most likely: either the new token has a fixed number of parameters
(possibly with a few optional ones), or it allows a variable-length
list of arguments. In the former case, we add one rule that includes
all arguments and calls a single new function in the visitor;
see the rule for \texttt{Rational(...)} as an example. We also have to
decide in which part of the grammar the new token should be
accepted. Likely places for extensions are \texttt{stmt} or
\texttt{file\_header\_option}. We add our example rule for
\texttt{Foo} in the \texttt{stmt} block right after the
\texttt{Rational} rule:
\begin{verbatim}
| Foo '(' double_val ',' INTEGER ')'
{ visitor->accept_foo( atof( $3.c_str()), $5); }
\end{verbatim}
\noindent
The parameters are enclosed in parentheses and separated by
commas. The double value could be either an \texttt{INTEGER} or an
\texttt{FNUMBER}, so we use the production \texttt{double\_val} that
accepts both. Its semantic value, denoted by \texttt{\$3}, is a
\texttt{std::string} that contains the number representation. We use
the \texttt{atof()} function in the rule to simplify the work for the
visitor. We could do the same for the \texttt{INTEGER} parameter in
\texttt{\$5} if we expected only small bounded integers, but for
unbounded long integers we pass the original \texttt{std::string} and
let the visitor do the conversion to the long integer type, which is
known only to the application.
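The division of work between the parser action and the visitor can be
sketched as follows. This is only an illustration under stated
assumptions: \texttt{Foo\_visitor\_sketch} and
\texttt{parser\_action\_sketch} are hypothetical names that are not
part of the package, and \texttt{long} stands in for whatever long
integer type the application actually uses.
\begin{verbatim}
#include <cstdlib>
#include <string>

// Hypothetical stand-in for a visitor; not the real Benchmark_visitor.
struct Foo_visitor_sketch {
    double d_value = 0.0;
    long   i_value = 0;

    // Receives the double already converted by the parser action
    // and the integer still in its textual form.
    void accept_foo( double d, const std::string& i) {
        d_value = d;
        // Assumption: the value fits into a long. An application
        // with an unbounded integer type would build it from i.
        i_value = std::strtol( i.c_str(), nullptr, 10);
    }
};

// Mimics the parser action: both values arrive as std::string.
inline void parser_action_sketch( Foo_visitor_sketch& v,
                                  const std::string& s3,
                                  const std::string& s5) {
    v.accept_foo( std::atof( s3.c_str()), s5);
}
\end{verbatim}
\noindent
Calling \texttt{parser\_action\_sketch( v, "1.5", "42")} corresponds
to parsing \texttt{Foo(1.5,42)}: the double is converted once in the
action, while the integer conversion stays with the visitor.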
Now let us assume we would like \texttt{Foo} to take a
variable-length list of arguments. We add one rule with a production
that refers to the variable-length list of arguments. We call one new
function of the visitor before we enter this production and a second
new function of the visitor at its end; see the rule for
\texttt{Polynomial\_1(...)} as an example.
We add the rule again in the \texttt{stmt} production.
\begin{verbatim}
| Foo { visitor->begin_foo(); }
'(' integer_sequence ')' { visitor->end_foo(); }
\end{verbatim}
\noindent
Thus, the \texttt{begin\_foo} function is called before the integers
are parsed. Once all integers are parsed (and the corresponding
function \texttt{accept\_integer} has been called for each integer),
the function \texttt{end\_foo} is called, which creates a proper
bracketing structure around the integer sequence.
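A visitor can use this bracketing to collect the integers that belong
to one \texttt{Foo}. The following is a minimal sketch, not the
package's implementation: \texttt{Foo\_list\_visitor\_sketch} is a
hypothetical stand-alone class, and the signature of
\texttt{accept\_integer} taking a \texttt{std::string} is an
assumption made for this example.
\begin{verbatim}
#include <string>
#include <vector>

// Hypothetical stand-in for a visitor; not the real Benchmark_visitor.
struct Foo_list_visitor_sketch {
    // true between begin_foo() and end_foo()
    bool in_foo = false;
    // integers of the Foo currently being parsed
    std::vector<std::string> current;
    // all completed Foo elements
    std::vector<std::vector<std::string> > foos;

    void begin_foo() { in_foo = true; current.clear(); }

    // Assumed signature: the integer arrives in textual form.
    void accept_integer( const std::string& s) {
        if ( in_foo) current.push_back( s);
    }

    void end_foo() { foos.push_back( current); in_foo = false; }
};
\end{verbatim}
\noindent
Integers reported outside a \texttt{begin\_foo}/\texttt{end\_foo} pair
are ignored by this sketch; a real visitor would route them to
whatever other element is currently open.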
\subsubsection{Extend the Visitor in \texttt{benchmark\_visitor.h}}
The visitor follows the visitor design
pattern~\cite{cgal:ghjv-dpero-95}; see Section~\ref{visitor}.
Depending on which of the two options from the previous section we
chose, we add to the \texttt{Benchmark\_visitor} class either one
member function
\begin{verbatim}
virtual void accept_foo( double d, std::string i) { tnh( "Foo"); }
\end{verbatim}
\noindent
or two member functions
\begin{verbatim}
virtual void begin_foo() { tnh( "Begin_foo"); }
virtual void end_foo() { tnh( "End_foo"); }
\end{verbatim}
\noindent
We implement these member functions with calls to the \texttt{tnh}
function, which is a shortcut for the \texttt{token\_not\_handled}
member function that by default issues an error message about an
unhandled token. So a derived visitor class that does not override
these virtual \texttt{foo} functions still works on files that
contain no \texttt{Foo}; on files that do contain \texttt{Foo}, it
generates error messages by default. These messages can be
suppressed; see the \texttt{check\_syntax} example program in
Chapter~\ref{checker} for a useful application.
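
The effect of the default implementation can be sketched with a
reduced stand-in for the base class. All class names below are
hypothetical, and counting unhandled tokens replaces the error message
of the real \texttt{token\_not\_handled} function, whose exact
behavior is not reproduced here.
\begin{verbatim}
#include <string>

// Hypothetical, reduced stand-in for the base class; the real
// Benchmark_visitor has more members and reports errors itself.
struct Base_visitor_sketch {
    int unhandled = 0;  // counts tokens that nobody handled
    virtual ~Base_visitor_sketch() {}
    virtual void token_not_handled( const std::string&) {
        ++unhandled;
    }
    void tnh( const std::string& s) { token_not_handled( s); }
    virtual void accept_foo( double, const std::string&) {
        tnh( "Foo");
    }
};

// Overrides accept_foo, so Foo is handled.
struct Handles_foo : Base_visitor_sketch {
    double last = 0.0;
    void accept_foo( double d, const std::string&) override {
        last = d;
    }
};

// Does not override accept_foo, so the default reports it.
struct Ignores_foo : Base_visitor_sketch {};
\end{verbatim}
\noindent
In this sketch, \texttt{Ignores\_foo} records an unhandled token only
when \texttt{accept\_foo} is actually called, that is, only for input
that contains a \texttt{Foo}.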