From a227286e829bfd87c78830317002f8abceae336c Mon Sep 17 00:00:00 2001 From: Marcin Chrzanowski Date: Sun, 18 Dec 2022 12:46:08 +0100 Subject: Presentation, first version --- presentation/presentation.md | 198 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 198 insertions(+) create mode 100644 presentation/presentation.md diff --git a/presentation/presentation.md b/presentation/presentation.md new file mode 100644 index 0000000..ec491e9 --- /dev/null +++ b/presentation/presentation.md @@ -0,0 +1,198 @@ +% MSO Query Answering on Trees +% Marcin Chrzanowski +% December 20, 2022 + +## Question Answering + +* Fix a function $f : \mathcal{S} \times \mathcal{Q} \to \mathcal{A}$ +* A *preprocessing* algorithm that on input $S \in \mathcal{S}$ outputs an + indexing structure $S'$ (doesn't need to be in $\mathcal{S}$). +* An *answering* algorithm that on input $Q \in \mathcal{Q}$, together with + $S'$, outputs $f(S, Q)$. + +## Question Answering - Time Complexity + +* Let $n = |S|$, $m = |Q|$. +* If the preprocessing algorithm works in time $f(n)$ and the answering + algorithm works in time $g(n, m)$, we notate the full algorithm's time + complexity as $\langle f(n), g(n, m) \rangle$. + +## Example: LCA + +### Least Common Ancestor problem (LCA) + +**Given**: a tree $T$. + +**Questions**: for two vertices $x$ and $y$, what is the lowest (furthest from the +root) vertex that is an ancestor of both $x$ and $y$? + +\ + +Classic Harel and Tarjan result: this can be solved in $\langle O(n), O(1) +\rangle$. + +## MSO Query Answering on Trees + +**Fixed**: an MSO formula $\varphi(\vec{X})$ over trees with $k$ free +second-order variables. + +**Given**: a tree T. + +**Questions**: is a given $k$-tuple of subsets of $T$'s vertices $\vec{W}$, a +satisfying assignment to $\vec{X}$? I.e., does $T \models \varphi(\vec{W})$. + +Note: first-order variables are supported by restricting input sets to +singletons. + +# Prior Work + +## Naïve Solutions - Model Checking + +* Introduce unary relations $U_i$. +* For question $\vec{W}$, label elements of $W_i$ with $U_i$. +* If we can answer MSO model checking on trees in time $f(n)$, we can answer + questions in $O(f(n))$. + +## Naïve Solution - Courcelle + +* Courcelle: MSO model checking over structures of bounded treewidth is + $O_{\varphi}(n)$. +* So we can solve MSO query answering in $\langle O(1), O_{\varphi}(n) \rangle$. + +## Naïve Solution - Amarilli et al. + +* Amarilli et al.: algorithm that preprocess a tree in time $O_{\varphi}(n)$, + then allows relabeling in time $O_{\varphi}(\log{n})$, answers model checking + in $O(1)$ for the updated tree. +* Preprocess tree according to Amarilli. +* For a question $\vec{W}$, run an update for each vertex in $\vec{W}$, get the + answer. +* So an $\langle O_{\varphi}(n), O_{\varphi}(m \log{n}) \rangle$ solution. + +## Kazana + +* In his PhD thesis, solved MSO query answering (there called query *testing*) + in $\langle O_{\varphi}(n), O_{\varphi}(1) \rangle$. +* But limited to formulae whose free variables are all first-order. +* Uses Colcombet's factorization forests. + +# Our solution + +## + +* I show an $\langle O_{\varphi}(n), O_{\varphi}(m \log{m}) \rangle$ solution to + MSO query answering. +* Answering is $O_{\varphi}(1)$ when restricting to only first-order free + variables, matching Kazana's result. +* Should be understandable by a CS student who has taken undergraduate + algorithmics and automata theory courses. + +## Reductions + +* Reduce from MSO to tree automaton (nonelementary wrt to $\varphi$, linear wrt + to tree). +* Transform to a binary tree. + +## Relabel Regular Questions on Trees + +**Fixed**: a deterministic bottom-up tree automaton $A$ over $\Sigma$. + +**Given**: a tree T labeled with $\Sigma$. + +**Questions**: given $m$ relabelings $v_1 \mapsto a_1, \ldots, v_m \mapsto a_m$, +for $v_i \in V(T)$ and $a_i \in \Sigma$, what state does $A$ arrive at in the +root of $T'$, where $T'$ is $T$ with each $v_i$'s label modified to the +corresponding $a_i$? + +## Answering Questions + +Let $W$ be the set of relabeled vertices, $m := |W|$. + +We'll partition the tree such that: + +* There will be $O(m)$ rooted parts. +* Each element of $W$ will be the root of some part (there may be other parts + not rooted in an element of $W$). +* Working bottom-up, we can compute $A$'s state in the root of each part in + $O(1)$. + +Note: computing the partition takes time $O(m \log{m})$ as it uses a sort. + +## Types of Parts + +* Subtree. +* Singleton. +* Subtree with a hole. + +## Types of Parts + +![](figures/partition.pdf) + +## Computing Roots of Parts - Subtree + +* During preprocessing, we precomputed a run of $A$ over $T$. +* In a subtree part, no vertices other than the root were relabeled. +* So we can apply $A$'s transition function to the precomputed states of the + root's children, and the root's new label. + +## Computing Roots of Parts - Singleton + +* Both of the singleton's children are roots of parts. +* Since we're working bottom-up, we've already computed the states in the roots + of those. +* Apply $A$'s transition function to those and the singleton's new label. + +## Computing Roots of Parts - Subtree With a Hole + +* Nontrivial case. +* Idea: can be computed by a DFA walking up from hole to root. +* Chapter 3: Branch Infix Regular Questions + * Show how to solve this in $\langle O_A(n), O_A(1) \rangle$. + * Generalization of a clever algorithm on words. + +## Computing Roots of Parts - Subtree With a Hole + +![Information needed for subtree of $v$ with hole $w$](figures/subtree-with-hole.svg){height=80%} + +# Branch Infix Regular Questions + +## Branch Infix Regular Questions + +**Fixed**: regular language $L$ over alphabet $\Sigma$, given by DFA $A$. + +**Given**: a tree T labeled with $\Sigma$. + +**Questions**: given a vertex $x$ and its descendant $y$, does the word given by +labels on the path from $x$ to $y$ belong to $L$? + +## The Word Case + +![](figures/word.svg) + +## Generalizing to Trees + + +![](figures/tree.pdf){width=40%} ![](figures/tree-graph.pdf){width=40%} + +## Jumping Down in a Tree + +* Need to be able to decide which node to jump down to when color path breaks. +* For each color, mark nodes where the color breaks. +* We can compute the highest marked descendant on a path between two nodes in + $\langle O(n), O(1) \rangle$ + * Method inspired by RMQ algorithm by Bender and Farach-Colton. + +## Highest Marked Descendant on Path + +* `pre`: pre-order numbers of $T$'s vertices, arranged in post-order. +* `index[`$v$`]`: $v$'s index in `pre`. +* For $x$ and its descendant $y$, consider the range + `pre[index[`$y$`], index[`$x$`] - 1]`: + * All the values correspond to descendants of $x$. + * Values smaller than `pre[index[`$y$`]]` correspond to ancestors of $y$. +* For unmarked nodes, set their value in `pre` to $\infty$. +* Now a range minimum query over the above range gives us the answer. + +## Highest Marked Descendant on Path + +![`pre` for example tree](figures/tree-marked.pdf){height=70%} -- cgit v1.2.3