From 81fdd3f346ed286cbf413515e66f21fcd3c28d27 Mon Sep 17 00:00:00 2001 From: Marcin Chrzanowski Date: Tue, 7 Dec 2021 13:21:08 +0100 Subject: Formally show automaton --- mgr.tex | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 81 insertions(+), 9 deletions(-) (limited to 'mgr.tex') diff --git a/mgr.tex b/mgr.tex index c2d3405..1c1644b 100644 --- a/mgr.tex +++ b/mgr.tex @@ -961,15 +961,87 @@ path from the hole $w$ up to the root $v$. What we claim is that the question ``if $A$'s state in $w$ is $q$, is the state in $v$ going to be $q'$?'' can be answered with a branch infix regular query. -This is the case because a finite automaton can compute $A$'s states along the -path from $w$ to $v$ when given access to each of the node's label, state of its -children in the run of $A$ on the original labeling of $T$, and the information -whether this node is a left or right child of its parent. Thus the language -$L_{q, q'}$ over alphabet $Q \times \Sigma \times Q \times \{ left, right \}$ of -proper semgents of a run of $A$ that start in state $q$ and end in state $q'$ is -a regular language. Our algorithm for branch infix regular queries dealt with -top-down queries, but regular languages are reversible. So we preprocess $T$ for -branch infix regular queries for each language $L_{q, q'}^R$. +\subsection{Computing the root state of a subtree with a hole} + +Consider the subtree of $x$ with hole $y$. It can be divided into the following +components: + +\begin{itemize} + \item $y$'s subtree + \item A path from $y$ up to $x$, notate it as $x_0 = y, x_1, \ldots, x_p = + x$. + \item For each $i$ from 1 to $p$, the subtree of $x_i$'s child that doesn't + lie on the path from $y$ to $x$. Let $x_i'$ be the child of $x_i$ that + doesn't lie on the $y$ to $x$ path. +\end{itemize} + +With our bottom-up computation, we've already computed the new state in $y$. +There have been no relabelings in the subtrees of the $x_i'$'s, so we can get +their states from the precomputed run of $A$ on the original tree. Now consider +how, using all this information, we could easily compute $x$'s new state. + +To compute the state in $x_1$, all we need is $x_1$'s label and the states of +$x_0$ and $x_1'$ -- $x_1$'s children. This is all information we already have. +So going up the path from $y$ to $x$, we can compute each $x_i$'s new state +from its label and the states of $x_{i-1}$ and $x_i'$. The computation as just +described would take time linear in $p$, the length of the path, but the +computation is simple enough that it could be performed by a finite state word +automaton, allowing us to use our result from Chapter \ref{branchinfix}. To show +this formally, consider the alphabet $\Sigma' := Q \times \Sigma \times Q \times +\{ \leftsymbol, \rightsymbol \}$. A word over this alphabet can be used to +represent a subtree with a hole. A subtree with hole as described above would be +represented by word $a_0 a_1 \ldots a_p$ where + +\begin{itemize} + \item Each $a_i$ is a quadruple from $\Sigma'$, $\langle p_i, b_i, q_i, d_i + \rangle$ (with $p_i, q_i \in Q$, $b_i \in \Sigma$, $d_i$ either + $\leftsymbol$ or $\rightsymbol$). + \item $b_i$ is the original label of $x_i$. + \item $p_i$ and $q_i$ are the states of $x_i$'s children in the precomputed + run of $A$ over $T$. + \item $d_i$ encodes the information of whether $x_i$ is the left or right + child of its parent. +\end{itemize} + +Note that this is slightly more information than we previously mentioned: in our +description of the computation above we only cared about the state of the child +not on the $y$ to $x$ path. The information about the other child is redundant +when considering just one subtree with hole and can be ignored when doing +computation on that particular subtree with hole. But we will keep information +about both children so that we can handle questions about any subtree with hole. + +Now consider the language $L_{q,q'} \subseteq \Sigma'^*$ of words $a_0 \ldots +a_p$ such that the subtree with hole encoded by such a word, if the hole's state +were set to $q$, would lead to $A$ arriving at state $q'$ after running up it. + +\begin{lemma} + $L_{q,q'}$ is a regular language. +\end{lemma} + +Let's explicitly construct the automaton recognizing $L_{q,q'}$. Its state +space will be $\{ \mathsf{initial} \} \cup Q \times \{ \leftsymbol, \rightsymbol +\}$, states other than $\mathsf{initial}$ encoding, for the vertex represented +by the previously read letter, what state we arrived in at it and whether it was +the current vertex's left or right child. The automaton works as follows: + +\begin{enumerate} + \item In the initial state $\mathsf{initial}$, read $a_0 = \langle p_0, b_0, + q_0, d_0 \rangle$, set the state to $\langle q, d_0 \rangle$. + \item Now when reading letter $\langle p_i, b_i, q_i, d_i \rangle$ in state + $\langle r, d \rangle$, if $d$ if $\leftsymbol$, use $A$'s transition + function $\delta$ to compute the new tree state as $\delta(r, b_i, q_i)$ + (the previously computed tree state is of our left child's, and our + right child's comes from the current letter). If $d$ is $\rightsymbol$, + the new tree state will instead be $\delta(p_i, b_i, r)$. Store this and + $d_i$ as the new state. + \item Accept if the final tree state is $q'$. +\end{enumerate} + +Our algorithm for branch infix regular queries dealt with top-down queries, but +regular languages are reversible. So we preprocess $T$ for branch infix regular +queries for each language $L_{q, q'}^R$. Now a query about whether the word on +the path from $x$ down to $y$ belongs to $L_{q, q'}^R$ is the same as asking if +the word from $y$ up to $x$ belongs to $L{q,q'}$. Now to compute $v$'s state after the relabeling of $T$, given we've already computed $w$'s new state $q$, take $v'$ -- $v$'s child in the direction of $w$, -- cgit v1.2.3