presentation/presentation.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198

% MSO Query Answering on Trees
% Marcin Chrzanowski
% December 20, 2022

## Question Answering

* Fix a function $f : \mathcal{S} \times \mathcal{Q} \to \mathcal{A}$
* A *preprocessing* algorithm that on input $S \in \mathcal{S}$ outputs an
  indexing structure $S'$ (doesn't need to be in $\mathcal{S}$).
* An *answering* algorithm that on input $Q \in \mathcal{Q}$, together with
  $S'$, outputs $f(S, Q)$.

## Question Answering - Time Complexity

* Let $n = |S|$, $m = |Q|$.
* If the preprocessing algorithm works in time $f(n)$ and the answering
  algorithm works in time $g(n, m)$, we notate the full algorithm's time
  complexity as $\langle f(n), g(n, m) \rangle$.

## Example: LCA

### Least Common Ancestor problem (LCA)

**Given**: a tree $T$.

**Questions**: for two vertices $x$ and $y$, what is the lowest (furthest from the
root) vertex that is an ancestor of both $x$ and $y$?

\

Classic Harel and Tarjan result: this can be solved in $\langle O(n), O(1)
\rangle$.

## MSO Query Answering on Trees

**Fixed**: an MSO formula $\varphi(\vec{X})$ over trees with $k$ free
second-order variables.

**Given**: a tree T.

**Questions**: is a given $k$-tuple of subsets of $T$'s vertices $\vec{W}$, a
satisfying assignment to $\vec{X}$? I.e., does $T \models \varphi(\vec{W})$.

Note: first-order variables are supported by restricting input sets to
singletons.

# Prior Work

## Naïve Solutions - Model Checking

* Introduce unary relations $U_i$.
* For question $\vec{W}$, label elements of $W_i$ with $U_i$.
* If we can answer MSO model checking on trees in time $f(n)$, we can answer
  questions in $O(f(n))$.

## Naïve Solution - Courcelle

* Courcelle: MSO model checking over structures of bounded treewidth is
  $O_{\varphi}(n)$.
* So we can solve MSO query answering in $\langle O(1), O_{\varphi}(n) \rangle$.

## Naïve Solution - Amarilli et al.

* Amarilli et al.: algorithm that preprocess a tree in time $O_{\varphi}(n)$,
  then allows relabeling in time $O_{\varphi}(\log{n})$, answers model checking
  in $O(1)$ for the updated tree.
* Preprocess tree according to Amarilli.
* For a question $\vec{W}$, run an update for each vertex in $\vec{W}$, get the
  answer.
* So an $\langle O_{\varphi}(n), O_{\varphi}(m \log{n}) \rangle$ solution.

## Kazana

* In his PhD thesis, solved MSO query answering (there called query *testing*)
  in $\langle O_{\varphi}(n), O_{\varphi}(1) \rangle$.
* But limited to formulae whose free variables are all first-order.
* Uses Colcombet's factorization forests.

# Our solution

##

* I show an $\langle O_{\varphi}(n), O_{\varphi}(m \log{m}) \rangle$ solution to
  MSO query answering.
* Answering is $O_{\varphi}(1)$ when restricting to only first-order free
  variables, matching Kazana's result.
* Should be understandable by a CS student who has taken undergraduate
  algorithmics and automata theory courses.

## Reductions

* Reduce from MSO to tree automaton (nonelementary wrt to $\varphi$, linear wrt
  to tree).
* Transform to a binary tree.

## Relabel Regular Questions on Trees

**Fixed**: a deterministic bottom-up tree automaton $A$ over $\Sigma$.

**Given**: a tree T labeled with $\Sigma$.

**Questions**: given $m$ relabelings $v_1 \mapsto a_1, \ldots, v_m \mapsto a_m$,
for $v_i \in V(T)$ and $a_i \in \Sigma$, what state does $A$ arrive at in the
root of $T'$, where $T'$ is $T$ with each $v_i$'s label modified to the
corresponding $a_i$?

## Answering Questions

Let $W$ be the set of relabeled vertices, $m := |W|$.

We'll partition the tree such that:

* There will be $O(m)$ rooted parts.
* Each element of $W$ will be the root of some part (there may be other parts
  not rooted in an element of $W$).
* Working bottom-up, we can compute $A$'s state in the root of each part in
  $O(1)$.

Note: computing the partition takes time $O(m \log{m})$ as it uses a sort.

## Types of Parts

* Subtree.
* Singleton.
* Subtree with a hole.

## Types of Parts

![](figures/partition.pdf)

## Computing Roots of Parts - Subtree

* During preprocessing, we precomputed a run of $A$ over $T$.
* In a subtree part, no vertices other than the root were relabeled.
* So we can apply $A$'s transition function to the precomputed states of the
  root's children, and the root's new label.

## Computing Roots of Parts - Singleton

* Both of the singleton's children are roots of parts.
* Since we're working bottom-up, we've already computed the states in the roots
  of those.
* Apply $A$'s transition function to those and the singleton's new label.

## Computing Roots of Parts - Subtree With a Hole

* Nontrivial case.
* Idea: can be computed by a DFA walking up from hole to root.
* Chapter 3: Branch Infix Regular Questions
    * Show how to solve this in $\langle O_A(n), O_A(1) \rangle$.
    * Generalization of a clever algorithm on words.

## Computing Roots of Parts - Subtree With a Hole

![Information needed for subtree of $v$ with hole $w$](figures/subtree-with-hole.svg){height=80%}

# Branch Infix Regular Questions

## Branch Infix Regular Questions

**Fixed**: regular language $L$ over alphabet $\Sigma$, given by DFA $A$.

**Given**: a tree T labeled with $\Sigma$.

**Questions**: given a vertex $x$ and its descendant $y$, does the word given by
labels on the path from $x$ to $y$ belong to $L$?

## The Word Case

![](figures/word.svg)

## Generalizing to Trees


![](figures/tree.pdf){width=40%} ![](figures/tree-graph.pdf){width=40%}

## Jumping Down in a Tree

* Need to be able to decide which node to jump down to when color path breaks.
* For each color, mark nodes where the color breaks.
* We can compute the highest marked descendant on a path between two nodes in
  $\langle O(n), O(1) \rangle$
    * Method inspired by RMQ algorithm by Bender and Farach-Colton.

## Highest Marked Descendant on Path

* `pre`: pre-order numbers of $T$'s vertices, arranged in post-order.
* `index[`$v$`]`: $v$'s index in `pre`.
* For $x$ and its descendant $y$, consider the range
    `pre[index[`$y$`], index[`$x$`] - 1]`:
    * All the values correspond to descendants of $x$.
    * Values smaller than `pre[index[`$y$`]]` correspond to ancestors of $y$.
* For unmarked nodes, set their value in `pre` to $\infty$.
* Now a range minimum query over the above range gives us the answer.

## Highest Marked Descendant on Path

![`pre` for example tree](figures/tree-marked.pdf){height=70%}