From fce0c20992e6235680b53621cf749fd142f81dd1 Mon Sep 17 00:00:00 2001 From: Marcin Chrzanowski Date: Mon, 9 Aug 2021 20:50:02 -0400 Subject: Expand query problems section --- mgr.tex | 84 +++++++++++++++++++++++++++++++++++++++++++++-------------------- 1 file changed, 59 insertions(+), 25 deletions(-) diff --git a/mgr.tex b/mgr.tex index a7bb567..8929626 100644 --- a/mgr.tex +++ b/mgr.tex @@ -170,8 +170,34 @@ We note that the converse of this theorem is also true (i.e. that for every tree automaton, there is a corresponding MSO formula), however we will use only the MSO to automata direction in this work. +\section{Query answering problems} + +Consider a computational problem whose inputs are of the form $(S, q) \in +\mathcal{S} \times Q$ for some set of \definedterm{structures} $\mathcal{S}$ and +some set of \definedterm{queries} $Q$. This induces a \definedterm{query +answering problem} which is divided into two phases: + +\begin{description} + \item[preprocessing] An input structure $S \in \mathcal{S}$ is given and a + \definedterm{preprocessing algorithm} outputs an indexing structure + $S'$. + \item[queries] On-line queries $q_1, \ldots$ from $Q$ about $S$ need to be + handled, with access to the preprocessing output $S'$. +\end{description} + +We are interested in the time complexities of both phases. We use the following +notation for algorithms that have both a preprocessing and query phase: If +it takes $f(n)$ time to complete the preprocessing step for an input of size +$n$, and $g(n, m)$ time to then handle a query of size $m$, we say the algorithm +has time complexity \qptime{$f(n)$}{$g(n, m)$}. + +We turn to a discussion of several query problems with known solutions, both to +serve as examples and because we will be using them in our algorithm. + +\subsection{Lowest Common Ancestor} + \queryproblem{% - Least Common Ancestor (LCA) Queries + Lowest Common Ancestor (LCA) Queries }{% a tree $T$. }{% @@ -180,23 +206,28 @@ MSO to automata direction in this work. ancestor. } -\subsection{Range Minimum Query} -f - -\chapter{Branch Infix Regular Queries}\label{r:branchinfix} +\textcite{tarjan1984} were the first to show an optimal \qptime{$O(n)$}{$O(1)$} +algorithm for LCA. \textcite{schieber1988} used a similar approach but +simplified the indexing structure, keeping the same time complexities. +\textcite{berkman1993} showed a completely new approach to the problem, which +relies on answering range minimum queries (see below) about an array of properly +ordered tree vertices. \textcite{bender2000} offer a simpler presentation of the +algorithm in \cite{berkman1993} and note the equivalence between the LCA and RMQ +problems. -In this chapter we will present a solution to the following query problem: +\subsection{Range Minimum Query (RMQ)} -\queryproblem[% - regular language $L$ over alphabet $\Sigma$. -]{Branch Infix Regular Queries}{% - a $\Sigma$-labeled tree $T$. +\queryproblem{Range Minimum Queries}{% + an array $A$ of integers. }{% - given a vertex $x$ and its descendant $y$, does the word given by labels on - the path from $x$ to $y$ belong to $L$? + given indices $i$ and $j$, return the index of the smallest + element in the subarray $A[i, j]$. } -\section{Word infix regular queries} +As mentioned above, \textcite{bender2000} show an \qptime{$O(n)$}{$O(1)$} +algorithm for the RMQ problem. + +\subsection{Word infix regular queries} \queryproblem[% regular language $L$ over alphabet $\Sigma$. @@ -211,7 +242,7 @@ There is a simple and well-known data structure that, given a fixed automaton $A$, preprocesses an input word $w$ in time $O(n)$, and afterwards is able to answer in time $O(1)$ questions of the form ``Is the infix $w[i, j]$ accepted by $A$?''. We present the full construction as we will be generalizing its -internals for the tree case. +internals for the tree case in \Ref{branchinfix}. Take $A$ that is deterministic. We begin by replacing each letter of $w$ with the set of states of $A$, $Q$. @@ -254,6 +285,19 @@ The construction is as follows: Additionally, in each vertex, we store the index of the next copy of $Q$ in which the path of this vertex's color is broken by a lower color. +\chapter{Branch Infix Regular Queries}\label{branchinfix} + +In this chapter we will present a solution to the following query problem: + +\queryproblem[% + regular language $L$ over alphabet $\Sigma$. +]{Branch Infix Regular Queries}{% + a $\Sigma$-labeled tree $T$. +}{% + given a vertex $x$ and its descendant $y$, does the word given by labels on + the path from $x$ to $y$ belong to $L$? +} + \section{Generalizing to trees} We generalize the above construction from regular infix queries on a @@ -280,17 +324,7 @@ to handle such queries in constant time. The structure is heavily inspired by \textcite{bender2000}, where a simple algorithm for computing LCA queries is presented. -First, let's define an auxillary problem we will use, that of Range Minimum -Queries (RMQ): - -\queryproblem{Range Minimum Queries}{% - an array $A$ of integers. -}{% - given indices $i$ and $j$, return the index of the smallest - element in the subarray $A[i, j]$. -} - -Fact: RMQ queries can be answered in constant time after linear preprocessing of +Recall that RMQ queries can be answered in constant time after linear preprocessing of the input array. We turn to describing the index structure for our tree problem. -- cgit v1.2.3