From 3b27436f177e54fd88d5c1603c5c6b039957f41b Mon Sep 17 00:00:00 2001 From: Marcin Chrzanowski Date: Thu, 14 Oct 2021 13:08:02 +0200 Subject: Expand introduction --- mgr.tex | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 82 insertions(+), 11 deletions(-) (limited to 'mgr.tex') diff --git a/mgr.tex b/mgr.tex index 899986f..fd09164 100644 --- a/mgr.tex +++ b/mgr.tex @@ -61,7 +61,47 @@ \tableofcontents \chapter{Introduction} -% \addcontentsline{toc}{chapter}{Introduction} + +The model checking problem is a classic problem in theoretical computer science +in which we are tasked with answering whether a given structure $S$ satisfies a +sentence $\varphi$ of some logic. Several other interesting computational +problems arise when we generalize to formulae with free variables. + +A natural computational problem one could consider is \definedterm{query +evaluation} in which, when given a formula $\varphi(\vec{X})$ and a structure +$S$, the expected output is all the valuations of $\vec{X}$ satisfying +$\varphi$. While in many respects this seems like a good generalization of model +checking, the size of the entire output could be exponential in the size of the +structure and query, thus waiting for an algorithm to output all the answers +might not always be feasible or desirable. + +Already model checking is known to be computationally difficult for various +interesting logics, but the situation can improve when restricting the class of +structures considered. For example, MSO model checking is NP-hard in general, +but Courcelle showed that deciding if an MSO sentence holds in a structure of +bounded treewidth can be done in time linear in the size of the structure. % TODO: referencja do Courcella + +Similarly, query evaluation of MSO queries on bounded treewidth structures was +shown to be polynomial in the size of the structure and the output by Courcelle +and Mosbah. Their result was improved by Frick, Flum, and Grohe to linear time. + +In this work, instead of evaluation, we will instead consider \definedterm{query +answering}. In this setting, we are insterested in alrogithms that first +preprocess the structure $S$ and formula $\varphi(\vec{X})$, and later are able +to quickly answer individual queries of the form ``does the valuation $\vec{W}$ +of $\vec{X}$ satisfy $\varphi$?'' + +\section{Main result} + +Our main result is that, for trees, after a preprocessing step linear in the +size of the tree, we can answer queries in time linearithmic with respect to the +query size (in particular, the time to answer a query does not depend on the +size of the tree). + +More explicitly, we will show how to solve Problem +\ref{mso-query-answering-problem}, with a preprocessing algorithm working in +time $O(n)$ (where $n$ is the size of the input tree), and then answer queries +in time $O(m \log m)$ (where $m$ is the size of the query): \queryproblem[% an MSO formula $\varphi(\vec{X})$ over trees with $k$ free @@ -74,11 +114,17 @@ given a $k$-tuple of subsets of $T$'s vertices $\vec{W} \in \mathcal{P}(V(T))^{k}$, is $\vec{W}$ a satisfying assignment to $\vec{X}$? In other words, does $T \models \varphi(\vec{W})$? -} +}\label{mso-query-answering-problem} Though we speak about MSO formulae with second-order variables only, first-order variables are also supported simply by restricting to queries in which sets -assigned to single-order variables are singletons. +assigned to single-order variables are singletons. Note that in the case when +the input formula has only first-order free variables, queries are answered in +constant time. + +Through a series of reductions, we will show that MSO query answering reduces to +the following problem about tree automata, which we solve in Chapter +\ref{mso-query-answering}: \queryproblem[% a deterministic bottom-up tree automaton $A$ over ranked alphabet $\Sigma$. @@ -93,13 +139,38 @@ assigned to single-order variables are singletons. label modified to the corresponding $a_i$. } -\chapter{Preliminaries}\label{r:pojecia} - -% \begin{quote} - % Blaba, który jest blaba, nie jest prawdziwym blaba. - -% \raggedleft\slshape tłum. z~chińskiego Robert Blarbarucki -% \end{quote} +\section{Related Work} + +A different generalization of model checking to the case of formulae with free +variables is \definedterm{query enumeration}. In query enumeration, as in query +evaluation, we are interested in outputing all the valuations satisfying a +formula. However, instead of considering the global time of the algorithm to +output every single result, we are interested in enumerating the valuations one +by one with as small a delay between individual answers as possible. In other +words, we split the entire algorithm into a preprocessing algorithm which builds +an indexing structure, and an enumeration algorithm that, given the indexing +structure, outputs the next answer and prepares the indexing structure for the +next step. + +The complexity class \constantdelaylin\ refers to those enumeration +problems in which the preprocessing step takes linear time and the delay between +each successive answer (i.e. the time complexity of the second algorithm) is +constant. \lineardelaylin\ is a generalization of \constantdelaylin\ where the +time complexity of the second algorithm is linear in the size of the answer +being output. \textcite{bagan2006} introduced this second notion and showed that +enumerating MSO queries is \lineardelaylin\ on trees. This in particular means +that enumerating MSO queries for formulae whose free variables are all first +order is in \constantdelaylin, and \textcite{kazana2013} revisit this result, +proving it using Colcombet's deterministic factorization forests rather than +Bagan's intricate indexing structure. + +Colcombet's result could likely be used to solve the problem of branch infix +regular queries, which we introduce as a subproblem of MSO query answering in +Chapter \ref{branchinfix}. However, Colcombet's factorization depends on fairly +deep and complex applications of semigroup theory. We present a much more +straightforward algorithmic approach. + +\chapter{Preliminaries} \section{Definitions} \subsection{Trees} @@ -539,7 +610,7 @@ down. Handling each jump takes constant time, and the number of jumps is bounded by $|Q|$ since we can only jump to a lower color. Thus we can answer branch infix regular queries in constant time. -\chapter{MSO Query Answering on Trees} +\chapter{MSO Query Answering on Trees}\label{mso-query-answering} We now have all the pieces necessary to present our \qptime{$O(n)$}{$O(m \log m)$} algorithm for MSO query answering on trees. We fix an MSO formula -- cgit v1.2.3