% MSO Query Answering on Trees % Marcin Chrzanowski % December 20, 2022 ## Question Answering * Fix a function $f : \mathcal{S} \times \mathcal{Q} \to \mathcal{A}$ * A *preprocessing* algorithm that on input $S \in \mathcal{S}$ outputs an indexing structure $S'$ (doesn't need to be in $\mathcal{S}$). * An *answering* algorithm that on input $Q \in \mathcal{Q}$, together with $S'$, outputs $f(S, Q)$. ## Question Answering - Time Complexity * Let $n = |S|$, $m = |Q|$. * If the preprocessing algorithm works in time $f(n)$ and the answering algorithm works in time $g(n, m)$, we notate the full algorithm's time complexity as $\langle f(n), g(n, m) \rangle$. ## Example: LCA ### Least Common Ancestor problem (LCA) **Given**: a tree $T$. **Questions**: for two vertices $x$ and $y$, what is the lowest (furthest from the root) vertex that is an ancestor of both $x$ and $y$? \ Classic Harel and Tarjan result: this can be solved in $\langle O(n), O(1) \rangle$. ## MSO Query Answering on Trees **Fixed**: an MSO formula $\varphi(\vec{X})$ over trees with $k$ free second-order variables. **Given**: a tree T. **Questions**: is a given $k$-tuple of subsets of $T$'s vertices $\vec{W}$, a satisfying assignment to $\vec{X}$? I.e., does $T \models \varphi(\vec{W})$. Note: first-order variables are supported by restricting input sets to singletons. # Prior Work ## Naïve Solutions - Model Checking * Introduce unary relations $U_i$. * For question $\vec{W}$, label elements of $W_i$ with $U_i$. * If we can answer MSO model checking on trees in time $f(n)$, we can answer questions in $O(f(n))$. ## Naïve Solution - Courcelle * Courcelle: MSO model checking over structures of bounded treewidth is $O_{\varphi}(n)$. * So we can solve MSO query answering in $\langle O(1), O_{\varphi}(n) \rangle$. ## Naïve Solution - Amarilli et al. * Amarilli et al.: algorithm that preprocess a tree in time $O_{\varphi}(n)$, then allows relabeling in time $O_{\varphi}(\log{n})$, answers model checking in $O(1)$ for the updated tree. * Preprocess tree according to Amarilli. * For a question $\vec{W}$, run an update for each vertex in $\vec{W}$, get the answer. * So an $\langle O_{\varphi}(n), O_{\varphi}(m \log{n}) \rangle$ solution. ## Kazana * In his PhD thesis, solved MSO query answering (there called query *testing*) in $\langle O_{\varphi}(n), O_{\varphi}(1) \rangle$. * But limited to formulae whose free variables are all first-order. * Uses Colcombet's factorization forests. # Our solution ## * I show an $\langle O_{\varphi}(n), O_{\varphi}(m \log{m}) \rangle$ solution to MSO query answering. * Answering is $O_{\varphi}(1)$ when restricting to only first-order free variables, matching Kazana's result. * Should be understandable by a CS student who has taken undergraduate algorithmics and automata theory courses. ## Reductions * Reduce from MSO to tree automaton (nonelementary wrt to $\varphi$, linear wrt to tree). * Transform to a binary tree. ## Relabel Regular Questions on Trees **Fixed**: a deterministic bottom-up tree automaton $A$ over $\Sigma$. **Given**: a tree T labeled with $\Sigma$. **Questions**: given $m$ relabelings $v_1 \mapsto a_1, \ldots, v_m \mapsto a_m$, for $v_i \in V(T)$ and $a_i \in \Sigma$, what state does $A$ arrive at in the root of $T'$, where $T'$ is $T$ with each $v_i$'s label modified to the corresponding $a_i$? ## Answering Questions Let $W$ be the set of relabeled vertices, $m := |W|$. We'll partition the tree such that: * There will be $O(m)$ rooted parts. * Each element of $W$ will be the root of some part (there may be other parts not rooted in an element of $W$). * Working bottom-up, we can compute $A$'s state in the root of each part in $O(1)$. Note: computing the partition takes time $O(m \log{m})$ as it uses a sort. ## Types of Parts * Subtree. * Singleton. * Subtree with a hole. ## Types of Parts ![](figures/partition.pdf) ## Computing Roots of Parts - Subtree * During preprocessing, we precomputed a run of $A$ over $T$. * In a subtree part, no vertices other than the root were relabeled. * So we can apply $A$'s transition function to the precomputed states of the root's children, and the root's new label. ## Computing Roots of Parts - Singleton * Both of the singleton's children are roots of parts. * Since we're working bottom-up, we've already computed the states in the roots of those. * Apply $A$'s transition function to those and the singleton's new label. ## Computing Roots of Parts - Subtree With a Hole * Nontrivial case. * Idea: can be computed by a DFA walking up from hole to root. * Chapter 3: Branch Infix Regular Questions * Show how to solve this in $\langle O_A(n), O_A(1) \rangle$. * Generalization of a clever algorithm on words. ## Computing Roots of Parts - Subtree With a Hole ![Information needed for subtree of $v$ with hole $w$](figures/subtree-with-hole.svg){height=80%} # Branch Infix Regular Questions ## Branch Infix Regular Questions **Fixed**: regular language $L$ over alphabet $\Sigma$, given by DFA $A$. **Given**: a tree T labeled with $\Sigma$. **Questions**: given a vertex $x$ and its descendant $y$, does the word given by labels on the path from $x$ to $y$ belong to $L$? ## The Word Case ![](figures/word.svg) ## Generalizing to Trees ![](figures/tree.pdf){width=40%} ![](figures/tree-graph.pdf){width=40%} ## Jumping Down in a Tree * Need to be able to decide which node to jump down to when color path breaks. * For each color, mark nodes where the color breaks. * We can compute the highest marked descendant on a path between two nodes in $\langle O(n), O(1) \rangle$ * Method inspired by RMQ algorithm by Bender and Farach-Colton. ## Highest Marked Descendant on Path * `pre`: pre-order numbers of $T$'s vertices, arranged in post-order. * `index[`$v$`]`: $v$'s index in `pre`. * For $x$ and its descendant $y$, consider the range `pre[index[`$y$`], index[`$x$`] - 1]`: * All the values correspond to descendants of $x$. * Values smaller than `pre[index[`$y$`]]` correspond to ancestors of $y$. * For unmarked nodes, set their value in `pre` to $\infty$. * Now a range minimum query over the above range gives us the answer. ## Highest Marked Descendant on Path ![`pre` for example tree](figures/tree-marked.pdf){height=70%}