diff --git a/helpfulMaterials/Floats (LaTeX2e unofficial reference manual (January 2023)).html b/helpfulMaterials/Floats (LaTeX2e unofficial reference manual (January 2023)).html new file mode 100644 index 00000000..e16a94c5 --- /dev/null +++ b/helpfulMaterials/Floats (LaTeX2e unofficial reference manual (January 2023)).html @@ -0,0 +1,309 @@ + + + + +Floats (LaTeX2e unofficial reference manual (January 2023)) + + + + + + + + + + + + + + + + + + + +
+ +
+

5.7 Floats

+ +

Some typographic elements, such as figures and tables, cannot be broken +across pages. They must be typeset outside of the normal flow of text, +for instance floating to the top of a later page. +

+

LaTeX can have a number of different classes of floating material. +The default is the two classes, figure (see figure) and +table (see table), but you can create a new class with the +package float. +

+

Within any one float class LaTeX always respects the order, so that +the first figure in a document source must be typeset before the second +figure. However, LaTeX may mix the classes, so it can happen that +while the first table appears in the source before the first figure, it +appears in the output after it. +

+

The placement of floats is subject to parameters, given below, that +limit the number of floats that can appear at the top of a page, and +the bottom, etc. If so many floats are queued that the limits prevent +them all from fitting on a page then LaTeX places what it can and +defers the rest to the next page. In this way, floats may end up +being typeset far from their place in the source. In particular, a +float that is big may migrate to the end of the document. In which +event, because all floats in a class must appear in sequential order, +every following float in that class also appears at the end. +

+ + +

In addition to changing the parameters, for each float you can tweak +where the float placement algorithm tries to place it by using its +placement argument. The possible values are a sequence of the +letters below. The default for both figure and table, in +both article and book classes, is tbp. +

+
+
t
+

(Top)—at the top of a text page. +

+
+
b
+

(Bottom)—at the bottom of a text page. (However, b is not +allowed for full-width floats (figure*) with double-column +output. To ameliorate this, use the stfloats or +dblfloatfix package, but see the discussion of caveats in the +FAQ: https://www.texfaq.org/FAQ-2colfloat.) +

+
+
h
+

(Here)—at the position in the text where the figure environment +appears. However, h is not allowed by itself; t is +automatically added. +

+ + + + +

To absolutely force a float to appear “here”, you can +\usepackage{float} and use the H specifier which it +defines. For further discussion, see the FAQ entry at +https://www.texfaq.org/FAQ-figurehere. +

+
+
p
+

(Page of floats)—on a separate float page, which is a page +containing no text, only floats. +

+
+
!
+

Used in addition to one of the above; for this float only, LaTeX +ignores the restrictions on both the number of floats that can appear +and the relative amounts of float and non-float text on the page. +The ! specifier does not mean “put the float here”; +see above. +

+
+
+ +

Note: the order in which letters appear in the placement argument +does not change the order in which LaTeX tries to place the float; +for instance, btp has the same effect as tbp. All that +placement does is that if a letter is not present then the +algorithm does not try that location. Thus, LaTeX’s default of +tbp is to try every location except placing the float where it +occurs in the source. +
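
As an illustration of the placement argument, here is a minimal sketch. The document body, captions, and labels are made up for the example, and the float package is assumed to be loaded only so that the H specifier is available.

```latex
\documentclass{article}
\usepackage{float}  % assumed here only to provide the H specifier

\begin{document}
Text before the floats.

% Permit all four locations; `!' relaxes the count and fraction limits
% for this one float only.
\begin{figure}[!htbp]
  \centering
  \fbox{Placeholder for the figure body}
  \caption{A figure with a permissive placement argument}
  \label{fig:permissive}
\end{figure}

% H (from the float package) typesets the table exactly here,
% so it no longer floats.
\begin{table}[H]
  \centering
  \begin{tabular}{ll}
    left & right
  \end{tabular}
  \caption{A table forced to this position}
  \label{tab:forced}
\end{table}

Text after the floats.
\end{document}
```

Writing the first argument as [pbth!] instead would behave the same, since the order of the letters is not significant.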

+

To prevent LaTeX from moving floats to the end of the document or a +chapter you can use a \clearpage command to start a new page and +insert all pending floats. If a pagebreak is undesirable then you can +use the afterpage package and issue +\afterpage{\clearpage}. This will wait until the current page +is finished and then flush all outstanding floats. +
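
A minimal sketch of the afterpage approach follows; the section titles and the placeholder figure are invented for the example.

```latex
\documentclass{article}
\usepackage{afterpage}

\begin{document}
\section{Results}
Discussion that refers to several figures.

\begin{figure}[tbp]
  \centering
  \fbox{Placeholder figure}
  \caption{A figure that might otherwise drift}
\end{figure}

% Finish the current page normally, then flush every pending float
% before any further text is set.
\afterpage{\clearpage}

More text, which continues on the current page without a forced break.

\section{Discussion}
Later material.
\end{document}
```

Using a bare \clearpage in place of the \afterpage line would also flush the floats, but it would end the current page at that point.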

+ + + +

LaTeX can typeset a float before where it appears in the source +(although on the same output page) if there is a t specifier in +the placement parameter. If this is not desired, and deleting +the t is not acceptable as it keeps the float from being placed +at the top of the next page, then you can prevent it by either using +the flafter package or using the command + +\suppressfloats[t], which causes floats for the top position on +this page to be moved to the next page. +
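
The following sketch shows the local variant with \suppressfloats; the surrounding text and the figure are placeholders. Loading flafter in the preamble is the document-wide alternative and is only indicated in a comment.

```latex
\documentclass{article}
% Document-wide alternative: \usepackage{flafter}

\begin{document}
\section{Method}
% Floats declared from here on may not be placed at the top
% of the current page.
\suppressfloats[t]
The method is illustrated in the figure that follows.

\begin{figure}[tbp]
  \centering
  \fbox{Placeholder figure}
  \caption{A figure that will not be set above its first mention}
\end{figure}

Remaining text of the section.
\end{document}
```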

+

Parameters relating to fractions of pages occupied by float and +non-float text (change them with +\renewcommand{parameter}{decimal between 0 and 1}): +

+
+
\bottomfraction
+
+

The maximum fraction of the page allowed to be occupied by floats at +the bottom; default ‘.3’. +

+
+
\floatpagefraction
+
+

The minimum fraction of a float page that must be occupied by floats; +default ‘.5’. +

+
+
\textfraction
+
+

Minimum fraction of a page that must be text; if floats take up too +much space to preserve this much text, floats will be moved to a +different page. The default is ‘.2’. +

+
+
\topfraction
+
+

Maximum fraction at the top of a page that may be occupied by +floats; default ‘.7’. +

+
+ +
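
For instance, a preamble fragment along these lines makes the fraction limits more permissive. The specific values are illustrative assumptions, not recommendations from this manual; note that \floatpagefraction is kept below \topfraction so that a float too large for the top of a text page can still fill a float page.

```latex
% Preamble fragment: illustrative, more permissive values (not the defaults)
\renewcommand{\topfraction}{0.85}       % floats may fill up to 85% at the top
\renewcommand{\bottomfraction}{0.7}     % and up to 70% at the bottom
\renewcommand{\textfraction}{0.15}      % at least 15% of a text page is text
\renewcommand{\floatpagefraction}{0.66} % float pages must be at least 66% full
```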

Parameters relating to vertical space around floats (change them with a +command of the form \setlength{parameter}{length +expression}): +

+
+
\floatsep
+
+

Space between floats at the top or bottom of a page; default +‘12pt plus2pt minus2pt’. +

+
+
\intextsep
+
+

Space above and below a float in the middle of the main text; default +‘12pt plus2pt minus2pt’ for 10 point and 11 point documents, +and ‘14pt plus4pt minus4pt’ for 12 point documents. +

+
+
\textfloatsep
+
+

Space between the last (first) float at the top (bottom) of a page +and the text below (above) it; default ‘20pt plus2pt minus4pt’. +

+
+ +
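
As a sketch, the vertical-space parameters can be tightened in the preamble like this. The lengths are arbitrary illustrative values; keeping some stretch and shrink lets LaTeX adjust page heights.

```latex
% Preamble fragment: slightly tighter spacing around floats (illustrative)
\setlength{\floatsep}{10pt plus 2pt minus 2pt}     % between adjacent floats
\setlength{\textfloatsep}{16pt plus 2pt minus 4pt} % between floats and text
\setlength{\intextsep}{10pt plus 2pt minus 2pt}    % around an `h' float
```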

Counters relating to the number of floats on a page (change them with a +command of the form \setcounter{ctrname}{natural +number}): +

+
+
bottomnumber
+
+

Maximum number of floats that can appear at the bottom of a text page; +default 1. +

+
+
dbltopnumber
+
+

Maximum number of full-sized floats that can appear at the top of a +two-column page; default 2. +

+
+
topnumber
+
+

Maximum number of floats that can appear at the top of a text page; +default 2. +

+
+
totalnumber
+
+

Maximum number of floats that can appear on a text page; default 3. +

+
+ +
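
A preamble fragment such as the following raises the float counters. The numbers are illustrative assumptions; totalnumber is chosen to be at least as large as the top and bottom limits so that those limits can actually be reached.

```latex
% Preamble fragment: allow more floats per page (illustrative values)
\setcounter{topnumber}{3}     % up to 3 floats at the top of a text page
\setcounter{bottomnumber}{2}  % up to 2 floats at the bottom
\setcounter{totalnumber}{5}   % up to 5 floats on a text page overall
\setcounter{dbltopnumber}{3}  % up to 3 full-width floats in two-column mode
```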

The principal TeX FAQ entry relating to floats +https://www.texfaq.org/FAQ-floats contains +suggestions for relaxing LaTeX’s default parameters to reduce the +problem of floats being pushed to the end. A full explanation of the +float placement algorithm is in Frank Mittelbach’s article “How to +influence the position of float environments like figure and table in +LaTeX?” +(https://www.latex-project.org/publications/2014-FMi-TUB-tb111mitt-float-placement.pdf). +

+ + + +
+ + +
Unofficial LaTeX2e reference manual
+ + + \ No newline at end of file diff --git a/helpfulMaterials/table (LaTeX2e unofficial reference manual (January 2023)).html b/helpfulMaterials/table (LaTeX2e unofficial reference manual (January 2023)).html new file mode 100644 index 00000000..d19a0d35 --- /dev/null +++ b/helpfulMaterials/table (LaTeX2e unofficial reference manual (January 2023)).html @@ -0,0 +1,174 @@ + + + + +table (LaTeX2e unofficial reference manual (January 2023)) + + + + + + + + + + + + + + + + + + + + +
+ +
+

8.22 table

+ + + + + + + +

Synopsis: +

+
+
\begin{table}[placement]
+  table body
+  \caption[lottitle]{title}  % optional
+  \label{label}              % also optional
+\end{table}
+
+ +

A class of floats (see Floats). They cannot be split across pages +and so they are not typeset in sequence with the normal text but instead +are floated to a convenient place, such as the top of a following page. +

+

This example table environment contains a tabular +

+
+
\begin{table}
+  \centering\small
+  \begin{tabular}{ll}
+    \multicolumn{1}{c}{\textit{Author}}
+      &\multicolumn{1}{c}{\textit{Piece}}  \\ \hline
+    Bach            &Cello Suite Number 1  \\
+    Beethoven       &Cello Sonata Number 3 \\
+    Brahms          &Cello Sonata Number 1
+  \end{tabular}
+  \caption{Top cello pieces}
+  \label{tab:cello}
+\end{table}
+
+ +

but you can put many different kinds of content in a table: +the table body may contain text, LaTeX commands, graphics, etc. It is +typeset in a parbox of width \textwidth. +

+

For the possible values of placement and their effect on the +float placement algorithm, see Floats. +

+

The label is optional; it is used for cross references (see Cross references). + +The \caption command is also optional. It specifies caption +text title for the table (see \caption). By default it is +numbered. If its optional lottitle is present then that text is +used in the list of tables instead of title (see Table of contents, list of figures, list of tables). +

+

In this example the table and caption will float to the bottom of a page, +unless they are pushed to a float page at the end. +

+
+
\begin{table}[b]
+  \centering
+  \begin{tabular}{r|p{2in}} \hline
+    One &The loneliest number \\
+    Two &Can be as sad as one.
+         It's the loneliest number since the number one.
+  \end{tabular}
+  \caption{Cardinal virtues}
+  \label{tab:CardinalVirtues}
+\end{table}
+
+ + +
+ + +
+ + + \ No newline at end of file diff --git a/helpfulMaterials/tabular (LaTeX2e unofficial reference manual (January 2023)).html b/helpfulMaterials/tabular (LaTeX2e unofficial reference manual (January 2023)).html new file mode 100644 index 00000000..ba400f9a --- /dev/null +++ b/helpfulMaterials/tabular (LaTeX2e unofficial reference manual (January 2023)).html @@ -0,0 +1,335 @@ + + + + +tabular (LaTeX2e unofficial reference manual (January 2023)) + + + + + + + + + + + + + + + + + + + + +
+ +
+

8.23 tabular

+ + + + + + + +

Synopsis: +

+
+
\begin{tabular}[pos]{cols}
+  column 1 entry  &column 2 entry  ...  &column n entry \\
+  ...
+\end{tabular}
+
+ +

or +

+
+
\begin{tabular*}{width}[pos]{cols}
+  column 1 entry  &column 2 entry  ...  &column n entry \\
+  ...
+\end{tabular*}
+
+ +

Produce a table, a box consisting of a sequence of horizontal rows. +Each row consists of items that are aligned vertically in columns. This +illustrates many of the features. +

+
+
\begin{tabular}{l|l}
+  \textit{Player name}  &\textit{Career home runs}  \\ 
+  \hline
+  Hank Aaron  &755 \\
+  Babe Ruth   &714
+\end{tabular}
+
+ +

The output will have two left-aligned columns with a vertical bar +between them. This is specified in tabular’s argument +{l|l}. + +Put the entries into different columns by separating them with an +ampersand, &. The end of each row is marked with a double +backslash, \\. Put a horizontal rule below a row, after a double +backslash, with \hline. + +After the last row the \\ is optional, unless an \hline +command follows to put a rule below the table. +

+

The required and optional arguments to tabular consist of: +

+
+
pos
+

Optional. Specifies the table’s vertical position. The default is to +align the table so its vertical center matches the baseline of the +surrounding text. There are two other possible alignments: t +aligns the table so its top row matches the baseline of the surrounding +text, and b aligns on the bottom row. +

+

This only has an effect if there is other text. In the common case of a +tabular alone in a center environment this option makes +no difference. +

+
+
cols
+

Required. Specifies the formatting of columns. It consists of a +sequence of the following specifiers, corresponding to the types of +column and intercolumn material. +

+
+
l
+

A column of left-aligned items. +

+
+
r
+

A column of right-aligned items. +

+
+
c
+

A column of centered items. +

+
+
|
+

A vertical line the full height and depth of the environment. +

+
+
@{text or space}
+

Insert text or space at this location in every row. The text +or space material is typeset in LR mode. This text is fragile +(see \protect). +

+

If between two column specifiers there is no @-expression then +LaTeX’s book, article, and report classes will +put on either side of each column a space of width \tabcolsep, +which by default is 6pt. That is, by default adjacent columns are +separated by 12pt (so \tabcolsep is misleadingly named +since it is only half of the separation between tabular columns). In +addition, a space of \tabcolsep also comes before the first +column and after the final column, unless you put a @{...} +there. +

+

If you override the default and use an @-expression then LaTeX does +not insert \tabcolsep so you must insert any desired space +yourself, as in @{\hspace{1em}}. +

+

An empty expression @{} will eliminate the space. In +particular, sometimes you want to eliminate the space before the first +column or after the last one, as in the example below where the +tabular lines need to lie on the left margin. +

+
+
\begin{flushleft}
+  \begin{tabular}{@{}l}
+    ...
+  \end{tabular}
+\end{flushleft}
+
+ +

The next example shows text, a decimal point between the columns, +arranged so the numbers in the table are aligned on it. +

+
+
\begin{tabular}{r@{$.$}l}
+  $3$ &$14$  \\
+  $9$ &$80665$
+\end{tabular}
+
+ + +

An \extracolsep{wd} command in an @-expression causes an +extra space of width wd to appear to the left of all subsequent +columns, until countermanded by another \extracolsep. Unlike +ordinary intercolumn space, this extra space is not suppressed by an +@-expression. An \extracolsep command can be used only in an +@-expression in the cols argument. Below, LaTeX inserts the +right amount of intercolumn space to make the entire table 4 inches +wide. +

+
+
\begin{tabular*}{4in}{l@{\extracolsep{\fill}}l}
+  Seven times down, eight times up \ldots 
+  &such is life!
+\end{tabular*}
+
+ +

To insert commands that are automatically executed before a given +column, load the array package and use the >{...} +specifier. +

+
+
p{wd}
+

Each item in the column is typeset in a parbox of width wd, as if +it were the argument of a \parbox[t]{wd}{...} command. +

+

A line break double backslash \\ may not appear in the item, +except inside an environment like minipage, array, or +tabular, or inside an explicit \parbox, or in the scope of +a \centering, \raggedright, or \raggedleft +declaration (when used in a p-column element these declarations +must appear inside braces, as with {\centering .. \\ +..}). Otherwise LaTeX will misinterpret the double backslash as +ending the tabular row. Instead, to get a line break in there use +\newline (see \newline). +

+
+
*{num}{cols}
+

Equivalent to num copies of cols, where num is a +positive integer and cols is a list of specifiers. Thus the +specifier \begin{tabular}{|*{3}{l|r}|} is equivalent to +the specifier \begin{tabular}{|l|rl|rl|r|}. Note that +cols may contain another *-expression. +

+
+
+ +
+
width
+

Required for tabular*, not allowed for tabular. Specifies +the width of the tabular* environment. The space between columns +should be rubber, as with @{\extracolsep{\fill}}, to allow +the table to stretch or shrink to make the specified width, or else you +are likely to get the Underfull \hbox (badness 10000) in alignment +... warning. +

+
+
+ +
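
The arguments described above can be combined as in this sketch. The table contents are invented; the first paragraph contrasts the t and b values of pos, and the second uses tabular* with a *-expression and rubber intercolumn space.

```latex
\documentclass{article}
\begin{document}
% pos: align the top row, then the bottom row, with the text baseline
Baseline text
\begin{tabular}[t]{l} one \\ two \end{tabular}
and
\begin{tabular}[b]{l} one \\ two \end{tabular}

% width: the table is stretched to 0.8\textwidth;
% *{3}{c} expands to ccc, and \extracolsep{\fill} supplies rubber space
\noindent
\begin{tabular*}{0.8\textwidth}{@{\extracolsep{\fill}}l*{3}{c}}
  Item & A & B & C \\
  x    & 1 & 2 & 3
\end{tabular*}
\end{document}
```

Without the \extracolsep{\fill} the second table would have no stretchable space and LaTeX would likely report an underfull box.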

Parameters that control formatting: +

+
+
\arrayrulewidth
+

A length that is the thickness of the rule created by |, +\hline, and \vline in the tabular and array +environments. The default is ‘.4pt’. Change it as in +\setlength{\arrayrulewidth}{0.8pt}. +

+
+
\arraystretch
+

A factor by which the spacing between rows in the tabular and +array environments is multiplied. The default is ‘1’, for +no scaling. Change it as \renewcommand{\arraystretch}{1.2}. +

+
+
\doublerulesep
+

A length that is the distance between the vertical rules produced by the +|| specifier. The default is ‘2pt’. +

+
+
\tabcolsep
+

A length that is half of the space between columns. The default is +‘6pt’. Change it with \setlength. +

+
+
+ +
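
A short sketch of changing these parameters for a single table; the values and the table contents are illustrative. The settings are placed inside the center environment so that they stay local to it.

```latex
\documentclass{article}
\begin{document}
\begin{center}
  % These changes are local to the center environment.
  \renewcommand{\arraystretch}{1.2}   % 20% more space between rows
  \setlength{\tabcolsep}{4pt}         % 8pt in total between columns
  \setlength{\arrayrulewidth}{0.8pt}  % thicker rules
  \setlength{\doublerulesep}{3pt}     % gap inside the || double rule
  \begin{tabular}{|l||r|}
    \hline
    Name  & Value \\ \hline
    alpha & 1     \\
    beta  & 2     \\ \hline
  \end{tabular}
\end{center}
\end{document}
```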

The following commands can be used inside the body of a tabular +environment, the first two inside an entry and the second two between +lines: +

+ + + +
+ + +
+ + + \ No newline at end of file diff --git a/inspirations/ECOTE_project_documentation.pdf b/inspirations/ECOTE_project_documentation.pdf new file mode 100644 index 00000000..6ec7b244 Binary files /dev/null and b/inspirations/ECOTE_project_documentation.pdf differ diff --git a/inspirations/ECOTEproject_CanerKaya.pdf b/inspirations/ECOTEproject_CanerKaya.pdf new file mode 100644 index 00000000..c2053d35 Binary files /dev/null and b/inspirations/ECOTEproject_CanerKaya.pdf differ diff --git a/inspirations/PreliminaryProjectTomkiewicz.pdf b/inspirations/PreliminaryProjectTomkiewicz.pdf new file mode 100644 index 00000000..c891c13b Binary files /dev/null and b/inspirations/PreliminaryProjectTomkiewicz.pdf differ diff --git a/inspirations/godBlessLachcim/lachcim.pdf b/inspirations/godBlessLachcim/lachcim.pdf new file mode 100644 index 00000000..764819b4 Binary files /dev/null and b/inspirations/godBlessLachcim/lachcim.pdf differ diff --git a/inspirations/godBlessLachcim/lachcim.tex b/inspirations/godBlessLachcim/lachcim.tex new file mode 100644 index 00000000..32bebfcd --- /dev/null +++ b/inspirations/godBlessLachcim/lachcim.tex @@ -0,0 +1,426 @@ +\documentclass{article} + +\usepackage{graphicx} +\usepackage{pdfpages} +\usepackage{hyperref} +\setlength{\parskip}{1em} + +\begin{document} + + \title{ECOTE preliminary report:\\ + Top-down parser with backtracking} + \author{Michał Szopiński 300182} + \date{May 11, 2022} + \maketitle + + \section{General overview and assumptions} + + The goal of this project is to write a program to parse and produce a syntax + tree for an arbitrary input file using an arbitrary grammar. + + The parsing is to be implemented using a top-down recursive descent + algorithm, i.e. one that attempts to find a combination of productions + matching the input token sequence, starting from the root production. + Backtracking means that the algorithm may abandon previously chosen + productions if it discovers that they cannot lead to a match. 
+ + Because a parser operates on tokens, which are produced during the lexical + analysis stage, the program must have a built-in lexer utility. To reduce + complexity, the lexeme recognition algorithm is hard-coded and not + customizable. The built-in lexer recognizes tokens that are common to + popular C-like languages. + + As mentioned before, the program checks arbitrary inputs against arbitrary + grammars. This implies that the user supplies two files, one containing + the input and one containing a description of the grammar. + + The program tokenizes both files using the built-in lexer and parses the + grammar description file using a hard-coded grammar description + meta-language. The produced syntax tree is then validated and transformed + into a grammar descriptor object, which is in turn used to parse the input + file. As such, the same parser may be used to process both input files. + + The program implements rudimentary diagnostics and error handling. In + particular, the user may receive lexical, parse and semantic errors during + each stage of processing. Changes in the syntax tree are also displayed + as they occur. + + \section{Functional requirements} + + The programming language of choice for this project is Python. Its dynamic + typing makes it suitable for straightforward operations on complex data + types. The previous proposal of using C/C++ has been withdrawn. + + \subsection{Lexical analysis} + + Because the lexical analyser is hard-coded, it must strive to resemble the + lexical ruleset of mainstream C-like languages, so as to match user + expectations. A set of popular token categories is defined: + + \begin{center} + \begin{tabular}{ |c|p{2.5cm}|p{6cm}| } + \hline + Category & Examples & Description \\ + \hline + Identifier & \texttt{hello\_world123} & Used for variable names and keywords. \\ + \hline + Operator & \texttt{\$ ++ ===} & Used to define multiple-character non-identifier entities. 
\\ + \hline + Separator & \texttt{, ; ( \}} & Used to define single-character non-identifier entities, typically neighboring each other. \\ + \hline + String literal & \texttt{"can't" 'won\textbackslash't'} & Incorporates rules for string enclosure and escaping. \\ + \hline + Number literal & \texttt{123 +1.0} & Incorporates rules for digit sequences, sign prefixes and decimal points. \\ + \hline + Comment & \texttt{//hello \newline /* world */} & Incorporates rules for single-line and multi-line comments. \\ + \hline + Invalid & \texttt{123abc "hello} & Marks lexical errors. Used for diagnostics. \\ + \hline + End of file & & Denotes the end of the input file. Used for grammar description. \\ + \hline + \end{tabular} + \end{center} + + \subsubsection{Scanning and evaluation} + + Most of the above tokens are produced during the scanning phase. The + end-of-file token is appended at the end of the token sequence during the + evaluation phase. Comment tokens are removed from the sequence before they + reach the parser. The presence of invalid tokens prevents the program from + progressing to the parsing phase. + + \subsection{Grammar description meta-language} + + Once the grammar description file has been tokenized using the universal + lexer, the program applies a predefined meta-grammar to parse the file into + a syntax tree for further processing. + + At the top level, the meta-language is a set of definitions describing each + production in the language. The fundamental building blocks for definitions + are binary \textbf{compound expressions} and \textbf{terminal expressions}. + + Compound expressions are the framework for backtracking recursive descent + logic. They accept two arguments and define the logical relation between + them. Three such expressions are defined: + + \begin{enumerate} + \item \textbf{Concatenation} - accepts if both arguments accept. + \item \textbf{Optional concatenation} - accepts if either both or only + the second argument accepts. 
+ \item \textbf{Alternative} - accepts if either argument accepts. + \end{enumerate} + + Terminal expressions are used to describe the terminal symbols of the + language. Three kinds of such tokens may be discerned: + + \begin{enumerate} + \item \textbf{String literal} - accepts a token of any category whose + value is equal to that enclosed in the literal. + \item \textbf{Identifier} + \begin{enumerate} + \item \textbf{Reserved identifier} - identifier belonging to the set \texttt{identifier string\_literal number\_literal end\_of\_file}. + Accepts a token of any value belonging to the matching category. + \item \textbf{Arbitrary identifier} - resolves to a different definition in the grammar. + \end{enumerate} + \end{enumerate} + + \subsubsection{Formal description of the meta-language} + + The following is a formal description of the above rules, written as a + grammar description object using Python syntax: + + \scriptsize\begin{verbatim}meta_grammar = { + "root": Alternative( + "definitions", + Terminal("end_of_file") + ), + "definitions": Concatenation( + "definition", + Alternative( + "definitions", + Terminal("end_of_file") + ) + ), + "definition": Concatenation( + "definition_key", + Concatenation( + Terminal("operator", "="), + Concatenation( + "definition_expression", + Terminal("separator", ";") + ) + ) + ), + "definition_key": Terminal("identifier"), + "definition_expression": "expression", + "expression": Alternative( + "concat_expression", + Alternative( + "opt_concat_expression", + Alternative( + "alt_expression", + Alternative( + "expr_identifier", + "expr_string_literal" + ) + ) + ) + ), + "expr_identifier": Terminal("identifier"), + "expr_string_literal": Terminal("string_literal"), + "concat_expression": Concatenation( + Terminal("identifier", "concat"), + "argument" + ), + "opt_concat_expression": Concatenation( + Terminal("identifier", "opt_concat"), + "argument" + ), + "alt_expression": Concatenation( + Terminal("identifier", "alt"), + 
"argument" + ), + "argument": Concatenation( + Terminal("separator", "("), + Concatenation( + "expr_arg1", + Concatenation( + Terminal("separator", ","), + Concatenation( + "expr_arg2", + Terminal("separator", ")") + ) + ) + ) + ), + "expr_arg1": "expression", + "expr_arg2": "expression" +}\end{verbatim} + + \normalsize There are two additional semantic constraints: (1) there must + be a definition named \texttt{root}, and (2) there mustn't be any + definitions whose names belong to the set of reserved identifiers. + + \subsection{Top-down parser} + + The parser is the core feature of the software. It takes the root production + of the given grammar and attempts to find a set of productions stemming from + the root which could accept all the tokens in the sequence. It does so by + implementing the logical rules of the three compound expressions discussed + earlier. + + Each step of the parser is a recursive call to a function which processes + a single binary or terminal production. If it is determined that the set of + logical rules for that production can not yield a combination of productions to + parse the entire token sequence, the function generates an exception and returns + control to its caller. + + Exceptions don't originate at compound productions, they are merely propagated + upwards by them. All exceptions stem from terminal productions at the leaves + of the production tree. A terminal symbol matches the current token in the + sequence against its signature and either increments the token iterator + (''accepts" the token), or raises an error to be handled by the logic of + compound productions higher in the syntax tree. + + Backtracking is achieved by remembering the state of the token iterator at + the initialization of a compound production. If one path fails to parse + the token sequence, the iterator is reset and a different path is tried. 
+ If neither path succeeds, the error from the later path is propagated + upwards, where backtracking may occur as well. If both paths are exhausted + at the root level, the token tree is declared unparseable. + + The above algorithm merely checks the validity of the token sequence against + the grammar. To build a parse tree, each call to the parsing function may + additionally result in the addition of a node to a data structure mirroring + the history of chosen productions. Backtracking rules apply. + + \subsection{Grammar generator} + + Parsing the grammar description file against the meta-grammar yields a + syntax tree containing named and anonymous nodes corresponding to various + productions. The grammar generator searches this tree for definitions + and recursively parses them to build a dictionary of named productions + (a grammar description object) for the input file. + + \section{Implementation} + + \subsection{General architecture} + + The program is divided into the entry point script and several modules, + each providing a separate layer of functionality. + + \begin{center} + \begin{tabular}{ |c|p{8.5cm}| } + \hline + Module & Description \\ + \hline + Entry point & Handles user interaction, file I/O and data flow between the main modules of the program. \\ + \hline + Diagnostic & Contains functions for displaying data, visualizing data structures and printing diagnostic messages. \\ + \hline + Lexer & Implements a finite-state machine to parse the raw input into tokens. \\ + \hline + Lexer handlers & Defines the delta function of the finite-state machine. \\ + \hline + Meta-language & Contains the hard-coded grammar description object for the meta-language. \\ + \hline + Productions & Defines classes for compound and terminal productions. \\ + \hline + Parser & Utilities for initializing a top-down recursive descent. \\ + \hline + Parser handlers & Logical rules for parsing productions. 
\\ + \hline + Grammar & Syntax tree analysis and grammar description object generation. \\ + \hline + \end{tabular} + \end{center} + + \subsection{Data structures} + + \subsubsection{Productions} + + Four classes are defined to describe the three non-terminal and the single + terminal production types: \texttt{Concatenation}, + \texttt{OptionalConcatenation}, \texttt{Alternative} and \texttt{Terminal}. + + The non-terminal productions hold two slots for their children nodes. They + are separate because the parser function looks at the type of the production + to invoke the appropriate handler. + + The terminal production holds a slot for the category and the value of the + token it matches against. Each may be null to disable verification for that + field. A method is provided for matching against tokens. + + \subsubsection{Syntax node} + + The \texttt{Node} class holds a single node of the syntax tree. It has a + name field for named productions and a children field. It may hold other + nodes, representing compound productions, or tokens, representing terminal + productions. Named terminal productions are wrapped in a single-child + \texttt{Node} object. + + To facilitate backtracking, the class exposes methods for adding and + removing children without directly accessing the children field. + + \subsubsection{State classes} + + The classes \texttt{MachineState} and \texttt{ParserState} are data + aggregates representing the internal state of the lexer and the parser, + respectively. + + The \texttt{MachineState} class contains an assortment of states necessary + to provide context for tokenization. + + The \texttt{ParserState} class holds the token sequence and the grammar + that the parser is currently operating on, as well as the token iterator. + + \subsection{Detailed implementation} + + \subsubsection{Lexer} + + The lexer is a finite-state machine. The lexing process begins by + initializing the state. 
The input file is then scanned character by character + to determine which characters constitute which tokens. On the boundary between + tokens and non-tokens (or neighboring tokens), the currently recognized token + is appended to the output sequence. + + Once the entire input is scanned, an evaluation phase occurs, where transformations + are performed on the output sequence. Comments are removed and the + end-of-file token is appended. + + \subsubsection{Parser} + + The parser is initialized by creating a ``super-root'' node and invoking + the parser function on the first token in the sequence. + + The parser function accepts three arguments: + \begin{enumerate} + \item The current parser state, \texttt{ParserState}. + \item The prescribed production, either one of the four production types + or a string to be resolved from the grammar description object. + \item The parent node, where the parsed production is to be added as a + child node. + \end{enumerate} + The root element is parsed by specifying the prescribed production as + \texttt{"root"} and the parent node as the super-root. Upon exit, the + entry point function returns the first child of the super-root, i.e. the + root node. + + If the production is specified as a string, the main parser function + performs name resolution to obtain the corresponding production class. + The specified production string then becomes the name for the node to be + appended to the parent node. Named productions aid in syntax tree analysis. + + Once the production class is resolved, the main function looks up and + invokes the appropriate handler for that production. + + \subsubsection{Terminal handler} + + Terminal handlers accept input tokens and are the source of syntax errors, + crucial to the backtracking mechanism. The root node may be a terminal node, + in which case the language only accepts a single token. 
+ + The terminal handler resolves the token at the current index and compares + it against the production's signature. In case of category or value + mismatch, a syntax error is raised and propagated upwards in the call stack. + + Upon success, the token iterator is incremented and a token is added to the + parent node. If the terminal production is a named production, the token + is wrapped in a single-child named node first. + + \subsubsection{Non-terminal handlers} + + The concatenation handler parses its two children in sequence. If any of + them fails, the error is propagated. No backtracking occurs in this handler. + + The optional concatenation handler tries two paths: one where the first + child is skipped and one where it is not. If both paths fail, the error + from the second child is propagated. + + Backtracking is implemented by saving the token iterator before attempting + the first path. If the first path fails, the iterator is restored and the + second path is attempted. A new node is created for each of the paths. + If a path succeeds, the corresponding node is appended to the parent. + + The alternative handler is implemented in a similar way, the only difference + being the logical rules of the attempted paths. + + \subsection{Grammar generator} + + The grammar generator traverses the syntax tree of the parsed grammar file + in search of all named nodes corresponding to definitions. + + For each definition, it searches nearby descendant nodes for the definition + key and expression. The expression is evaluated recursively until all + terminal productions are found. Found compound productions are translated + into their production classes. String literals are translated into tokens + with the given value. Identifiers are translated into tokens of the given + category or into references to other definitions. 
+ + Once all definitions are evaluated, and before the entry point + function returns, semantic rules are validated: the grammar must define a root + and it must not use reserved identifiers as keys. + + \section{Test cases} + + The most important test case validates backtracking. Given the following + production: + + \begin{verbatim}root = concat( + "alpha", + opt_concat( + identifier, + "beta" + ) +) +\end{verbatim} + + It must be able to recognize the string \texttt{alpha beta}. A naive greedy + algorithm would consume the token \texttt{beta} as the identifier rather + than the token \texttt{"beta"}, leaving \texttt{opt\_concat} unable to + consume \texttt{beta} as its second child, thus failing the validation. + + A more exhaustive test case would be to provide a grammar for JSON and + successfully validate a file against it. + +\end{document} \ No newline at end of file diff --git a/inspirations/mskarzyn ECOTE documentation.pdf b/inspirations/mskarzyn ECOTE documentation.pdf new file mode 100644 index 00000000..d5ae951e Binary files /dev/null and b/inspirations/mskarzyn ECOTE documentation.pdf differ diff --git a/preliminaryReport/actualReport/labNotes.tx b/preliminaryReport/actualReport/labNotes.tx new file mode 100644 index 00000000..089238fc --- /dev/null +++ b/preliminaryReport/actualReport/labNotes.tx @@ -0,0 +1,9 @@ +Every error message +Every possible input +Design code for errors +Design of test cases to introductory document +Execution must be presented during hours scheduled in laboratory +Write code easy to modify + +Test cases: +Input data and result of test (input/OUTPUT of data) -> both correct and incorrect data \ No newline at end of file diff --git a/preliminaryReport/actualReport/labNotes.txt b/preliminaryReport/actualReport/labNotes.txt new file mode 100644 index 00000000..de412f2e --- /dev/null +++ b/preliminaryReport/actualReport/labNotes.txt @@ -0,0 +1,11 @@ +Every error message +Every possible input +Design code for errors +Design of 
test cases to introductory document +Execution must be presented during hours scheduled in laboratory +Write code easy to modify + +Test cases: +Input data and result of test (input/OUTPUT of data) -> both correct and incorrect data +decide whether to use ANTLR +Backus one \ No newline at end of file diff --git a/preliminaryReport/actualReport/report.pdf b/preliminaryReport/actualReport/report.pdf new file mode 100644 index 00000000..a90a0f0c Binary files /dev/null and b/preliminaryReport/actualReport/report.pdf differ diff --git a/preliminaryReport/actualReport/report.tex b/preliminaryReport/actualReport/report.tex new file mode 100644 index 00000000..c193d4d1 --- /dev/null +++ b/preliminaryReport/actualReport/report.tex @@ -0,0 +1,45 @@ +\documentclass[12pt]{article} + +\date{\today} +\title{ECOTE - preliminary project \\ +Translator of a LaTeX subset to HTML +} +\author{Krzysztof Rudnicki, 307585 \\ +Semester: 2023L} + +\begin{document} +\maketitle +\section{General overview and assumptions} +initial task proposals (at least: assumptions, variant selection, implementation technology, scope, etc.). \\ +My task is to create a translator of a \LaTeX \, subset to a selected text format, with a focus on \LaTeX \, tables. \\ +I decided to make it a translator of a \LaTeX \, subset to HTML, since I know \LaTeX \, very well and HTML relatively well; HTML is easy, a little different from \LaTeX, and popular, which makes this translator a practical tool. 
+\subsection{Assumptions} +\begin{itemize} + \item No \LaTeX \, (\%) comments in the script + \item There are no extra packages in the \LaTeX \, script (provided with the \\ \textbackslash usepackage keyword) besides the ones distributed with \LaTeX + \item There are no extra classes in the \LaTeX \, script besides the ones distributed with \LaTeX + \item There is nothing between the \textbackslash documentclass keyword and the \\ \textbackslash begin\{document\} keyword + \item No standard \LaTeX \, instructions are modified in the script + \item ``Tables'' will be represented using the \LaTeX \, \emph{table} environment +\end{itemize} +\section{Functional requirements} +\subsection{\LaTeX \, subset} +This project will focus almost exclusively on the \emph{table} environment, \\ +more specifically a table environment containing a tabular environment inside it. +\section{Implementation} +I decided to use Python as the language in which I will implement my solution. \\ +The reasons for using Python are as follows: +\begin{enumerate} + \item It is the easiest language among those that I know + \item I know it well enough to be confident in my ability to implement this solution in Python + \item I want to learn Python better through this project +\end{enumerate} +The main drawback of Python, its slow execution, does not bother me, as I believe the project scope will not be big enough for this to become an issue. + +\subsection{General architecture} +\subsection{Data structures} +\subsection{Module descriptions} +\subsection{Input/output description} +\subsection{Others} +\section{Functional test cases} +\end{document} \ No newline at end of file diff --git a/preliminaryReport/teamsMaterials/ECOTE_TaskAssignmentG101&104_2023Lv2.pdf b/preliminaryReport/teamsMaterials/ECOTE_TaskAssignmentG101&104_2023Lv2.pdf new file mode 100644 index 00000000..fbee51bc Binary files /dev/null and b/preliminaryReport/teamsMaterials/ECOTE_TaskAssignmentG101&104_2023Lv2.pdf differ diff --git 
a/preliminaryReport/teamsMaterials/ECOTE_TasksG101&104_2023Lv2.pdf b/preliminaryReport/teamsMaterials/ECOTE_TasksG101&104_2023Lv2.pdf new file mode 100644 index 00000000..2f622c0f Binary files /dev/null and b/preliminaryReport/teamsMaterials/ECOTE_TasksG101&104_2023Lv2.pdf differ diff --git a/preliminaryReport/teamsMaterials/ECOTE_labIntro101_104.pdf b/preliminaryReport/teamsMaterials/ECOTE_labIntro101_104.pdf new file mode 100644 index 00000000..7af8ea57 Binary files /dev/null and b/preliminaryReport/teamsMaterials/ECOTE_labIntro101_104.pdf differ diff --git a/preliminaryReport/teamsMaterials/ECOTEproject_pattern.doc b/preliminaryReport/teamsMaterials/ECOTEproject_pattern.doc new file mode 100644 index 00000000..15ec7a22 Binary files /dev/null and b/preliminaryReport/teamsMaterials/ECOTEproject_pattern.doc differ