m-chrzan.xyz
aboutsummaryrefslogtreecommitdiff
path: root/mgr.tex
blob: cf7bb8bcde16da4d3aa6e63febce02327d1769b3 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
% dodaj opcję [licencjacka] dla pracy licencjackiej
% dodaj opcję [en] dla wersji angielskiej (mogą być obie: [licencjacka,en])
\documentclass[en]{pracamgr}

% Dane magistranta:
\autor{Marcin Chrzanowski}{370754}

\title{MSO Query Answering on Trees}
\titlepl{Odpowiadanie na zapytania MSO na drzewach}
\kierunek{Computer Science}

% Praca wykonana pod kierunkiem:
% (podać tytuł/stopień imię i nazwisko opiekuna
% Instytut
% ew. Wydział ew. Uczelnia (jeżeli nie MIM UW))
\opiekun{dr hab. Szymon Toruńczyk\\
  Institute of Informatics\\
  }

% miesiąc i~rok:
\date{August 2021}

%Podać dziedzinę wg klasyfikacji Socrates-Erasmus:
\dziedzina{
11.3 Informatyka\\
}

%Klasyfikacja tematyczna według ACM
\klasyfikacja{% TODO
  D. Software\\
  D.127. Blabalgorithms\\
  D.127.6. Numerical blabalysis}

\keywords{MSO, query answering, tree automata, RMQ}

% Tu jest dobre miejsce na Twoje własne makra i~środowiska:
\newtheorem{definition}{Definition}[chapter]

\begin{document}
\maketitle

%tu idzie streszczenie na strone poczatkowa
\begin{abstract}
    We define $k$-relabel queries on trees, which, via the known equivalence
    between tree automata and MSO formulae on trees, happens to be a
    generalization of the MSO query answering problem on trees. We show these
    queries can be performed in constant time after preprocessing the input tree
    in linear time. Along the way, we show an algorithm for handling queries of
    the form ``Does the infix of a tree branch between nodes $x$ and $y$ belong
    to the regular language $L$'' (for a previously fixed regular language $L$)
    in the same time complexities. Our approach is much simpler in presentation
    than a previously known solution due to Colcombet.
\end{abstract}

\tableofcontents
%\listoffigures
%\listoftables

\chapter{Introduction}
% \addcontentsline{toc}{chapter}{Introduction}

\chapter{Preliminaries}\label{r:pojecia}

% \begin{quote}
  % Blaba, który jest blaba, nie jest prawdziwym blaba.

% \raggedleft\slshape tłum. z~chińskiego Robert Blarbarucki
% \end{quote}

\section{Definitions}
\subsection{Monadic Second Order Logic}
a
\subsection{Trees}
b
\subsection{Tree automata}
c
\subsection{Query answering problems}
d
\section{Known algorithms we will use}
\subsection{Least Common Ancestor}
e
\subsection{Range Minimum Query}
f

\chapter{Branch Infix Regular Queries}\label{r:branchinfix}

\section{Word infix regular queries}

There is a simple and well-known data structure that, given a fixed automaton
$A$, preprocesses an input word $w$ in time $O(n)$, and afterwards is able to
answer in time $O(1)$ questions of the form ``Is the infix $w[i, j]$ accepted by
$A$?''. We present the full construction as we will be generalizing its
internals for the tree case.

Take $A$ that is deterministic. We begin by replacing each letter of $w$ with
the set of states of $A$, $Q$.

Each letter $a$ of $w$ defines an injective function on $Q$, and we draw these
functions as directed edges between successive copies of $Q$. For example, if
$A$ in state $q$, reading $a$, would move to state $q'$, then, for any copy of
$Q$ corresponding to an instance of $a$ in the original word, there will be an
edge from $q$ there to $q'$ in the successive copy of $Q$.

Now we will color the vertices of the graph we just constructed with the colors
$1, 2, \ldots, |Q|$ in such a way that

\begin{enumerate}
    \item every copy of $Q$ has one vertex of each of the $|Q|$ colors;
    \item when a vertex of color $i$ has an edge to a vertex of color $j$ in a
        successive copy of $Q$, then $i \geq j$.
\end{enumerate}

The second point basically means that we will be trying to draw single-color
paths, but when paths need to join, it is the higher-colored path that gets cut
off.

The construction is as follows:

\begin{enumerate}
    \item Color an arbitrary vertex of the first copy of $Q$ with the color $1$.
    \item Follow the deterministic edges to the end of the word, coloring all
        vertices along this path with $1$.
    \item Now color another uncolored vertex of the first copy of $Q$ with the
        color $2$.
    \item Try following edges as far as possible, coloring all vertices with
        $2$.
    \item If your run into an already colored vertex, pick an arbitrary
        uncolored vertex in this copy of $Q$ to color with $2$ and continue from
        here.
    \item Repeat steps 3.-5. for each successive color up to $|Q|$.
\end{enumerate}

Additionally, in each vertex, we store the index of the next copy of $Q$ in
which the path of this vertex's color is broken by a lower color.

\section{Least Common Ancestor via Range Minimum Query}

\section{Highest Black Descendant on Path}

We can now reduce the problem to the following:

\emph{Given}: a tree $T$ with distinguished black vertices.

\emph{Queries}: given a vertex $x$, its descendant $y$, find the node $z$ that
is the highest black node on the path between $x$ and $y$.

We will build an index structure, constructible in linear time, that allows us
to handle such queries in constant time. The structure is heavily inspired by
Bender and Farach-Colton's presentation of an algorithm originally described by
Berkman and Vishkin for computing LCA queries.

First, let's define an auxillary problem we will use, that of Range Minimum
Queries (RMQ):

\emph{Given}: an array $A$ of integers.

\emph{Queries}: given indices $i$ and $j$, return the index of the smallest
element in the subarray $A[i, j]$.

Fact: RMQ queries can be answered in constant time after linear preprocessing of
the input array.

We turn to describing the index structure for our tree problem.

First, we create the array $POST$ of length $n$, which is the post-order of the
nodes.

Next, we label each node of the tree with its pre-order number. We create the
array $PRE$ with the corresponding pre-order labels of the nodes in $POST$, i.e.
if $POST[i] = v$, then $PRE[i]$ is $v$'s pre-order number.

Finally, for each node of the tree, we record its index in $POST$ in the array
$INDEX$.

Observation: given a node $x$ and its descendant $y$, looking at the range
$PRE[INDEX[y], INDEX[x] - 1]$, all the values in this range are descendants of
$x$, and the values smaller than $PRE[INDEX[y]]$ correspond to ancestors of $y$.
In particular, the minimum of that range corresponds to the highest ancestor of
$y$ that's a descendant of $x$.

In our problem, we only care about ancestors of $y$ that are colored black. So
we perform one final modification of our data structure: for all non-black
vertices $v$, we change $PRE[INDEX[v]]$ to $\infty$ (which can be represented by
$n+1$, an integer greater than any node's pre-order label). With $PRE$ modified
like this, we observe that now the minimum value of $PRE[INDEX[y], INDEX[x] -
1]$ corresponds exactly to the answer of our queries -- the highest black node
between $x$ and $y$.

We preprocess $PRE$ for RMQ queries in linear time.

Now when given a query $x$, $y$, we:

\begin{enumerate}
    \item Lookup $i := INDEX[y]$ and $j := INDEX[x] - 1$.
    \item Perform an RMQ query on $PRE[i, j]$, giving us index $k$ of the
        minimal value in that range.
    \item Lookup the corresponding vertex as $z := POST[k]$. This is the
        answer to our query.
\end{enumerate}

\chapter{$k$-relabel Queries on Trees}
h

\chapter{Conclusions}

\end{document}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% coding: latin-2
%%% End: