Covers automata theory, formal language theory, grammars, and combinatorics on words. This roughly corresponds to ACM Subject Classes F.1.1, and F.4.3. Papers dealing with computational complexity should go to cs.CC; papers dealing with logic should go to cs.LO.
Automata on linear orderings are finite-state automata introduced by Bruy\`ere and Carton as a broad generalization of finite, infinite and transfinite-word automata. In this context, a word is defined as a function from a linear ordering to a finite alphabet. This general definition can make automata on linear orderings difficult to reason about. In this work, we introduce constructible words as an intuitive way of tackling this difficulty. These words can be obtained by a finite number of applications of simple operators and thus admit a finite notation. We show that a rational language of words indexed by scattered (countable and uncountable) linear orderings is characterized by its constructible words. Our proof of this result relies on an interesting theorem of semigroup theory due to Colcombet. We expect this property to be useful in future theoretical developments about automata on scattered linear orderings.
We present an algebraic method for analyzing probabilistic programs with counters and discrete states, Generalized Constant Probability (GCP) programs. We define the operational semantics of GCP in terms of the runs of a type of probabilistic pushdown automata (pPDAs). We characterize the resulting (sub-)probability generating function (pgf) $\Delta(z)$ as an algebraic function, representable via the roots of a kernel polynomial associated with the program. Next, we provide algorithms that, leveraging this information, compute under mild algebraic conditions the dominant singularities and the exact radius of convergence of $\Delta(z)$, leading to an exact asymptotic expansion and to exponential bounds for its coefficients. Our approach is sound for GCP programs and complete for the single-state subclass.
Unordered trees appear in applications where the order among child vertices is insignificant, such as abstract syntax trees and chemical structures. To describe patterns in such trees, we propose unordered term tree patterns, which employ height-constrained variables that restrict trunk length and subtree height. We formalize the pattern matching problem between an unordered term tree pattern and an unordered tree, and present an $O(N \cdot \max\{nD^{3/2}, \mathcal{S}\})$-time algorithm, where $n$ and $N$ are the numbers of vertices in the pattern and tree, $D$ is the maximum vertex degree, and $\mathcal{S}$ is the sum of trunk constraints. Computational results show that the algorithm runs efficiently in practice.
Bounded ambiguity yields faster algorithms for acceptance, emptiness and equivalence in both models.
abstractclick to expand
Symmetric difference automata (XNFA) are a variant of standard finite automata in which an input word is accepted iff the number of accepting runs is odd. Equivalently, these are weighted automata over the two-element field. We study the fine-grained complexity of the basic decision problems for XNFA: acceptance, emptiness, and equivalence, aiming to optimise the degree of the polynomial in their running-time bounds.
Under the assumption of polynomial ambiguity, we provide a randomised reduction of NFA acceptance to XNFA acceptance. For automata of bounded ambiguity (e.g., unambiguous automata), we show that acceptance for both NFA and XNFA can be decided faster than in the general case. Without ambiguity assumptions, we give faster algorithms for the verification of suitable certificates for (non)emptiness and (non)equivalence of XNFA. Several of our results extend to weighted automata over other semirings and fields.
Quantum contextuality is widely recognized as an essential non-classical resource underlying quantum technology, yet illuminating the precise mechanisms through which it translates into unconditional computational advantages remains an ongoing challenge. We demonstrate an exponential, noise-resilient memory advantage for quantum finite automata arising from graph-theoretic approaches to contextuality. We define a promise problem on an exclusivity graph $G$ for which any classical deterministic automaton acts as a non-contextual hidden variable model requiring at least $N=\chi(G)$ states, where $\chi(G)$ is the graph's chromatic number. In contrast, by exploiting a structural phenomenon we term \textit{representational contextuality}, a QFA solves this task using a memory of dimension at most $d=\xi(G)+1$, where $\xi(G)$ is the graph's orthogonal rank. This separation scales exponentially ($d=\mathcal O(n)$ versus $N=2^{\Omega(n)}$) for Boolean-orthogonality graphs. Crucially, this memory advantage maintains an $\mathcal{O}(1)$ threshold against both depolarizing and coherent noise.
Two-way deterministic versions place the same problems in P while nondeterministic versions remain in ELEMENTARY.
abstractclick to expand
A two-dimensional automaton is able to move its input head through its input word in four directions: upward, downward, leftward, and rightward. If we prevent the input head from moving upward, then we obtain a three-way two-dimensional automaton; preventing both upward and leftward movements results in a two-way two-dimensional automaton. While much is known about the decidability and complexity properties of the two-dimensional automaton model, the unary variant of this model is less studied.
We show that the universality, equivalence, and inclusion problems for unary three-way deterministic two-dimensional automata are coNP-hard, while for the corresponding two-way model, the universality, equivalence, inclusion, and disjointness problems are in P. We further show that the universality, equivalence, and inclusion problems for unary two-way nondeterministic two-dimensional automata are coNP-hard and in ELEMENTARY; and the disjointness problem for the same model is NL-hard and in ELEMENTARY. Finally, we establish the decidability of a bounded variant of the universality problem for unary three-way nondeterministic two-dimensional automata, and show that this variant problem is coNP-complete.
We study a finite-state symbolic controller for systems in which the admissible visible transitions are fixed in advance and each visible state carries a minimum dwell requirement. The resulting model, which we call a destination-labeled self-looping system with dwell (DLSL system), records the visible graph together with local decision maps; dwell memory appears only after phase expansion.
The main structural issue is that, once dwell is imposed, the current visible state no longer determines whether a departure is allowed. This leads to the converse problem: which deterministic transducers arise as phase-expanded realizations of DLSL systems over a fixed visible graph? We show that the answer is exactly the class of fiber-linear graph-respecting transducers. Under natural reachability and realizable-departure assumptions, equivalent accessible realizations over the same visible graph are isomorphic; in particular, the visible transduction determines the dwell vector and the local decision maps. We also prove that any graph-preserving deterministic realization enforcing dwell values $(d_i)$ requires exactly $\sum_i d_i$ control states. Finally, we give an $O(|Q||\Omega|)$ recognition and reconstruction procedure, and extend the analysis to an edge-entry variant in which transitions may enter interior phases of successor fibers.
Time Window Temporal Logic (TWTL) is a rich specification language for cyber-physical systems that can compactly express sequential tasks with explicit timing constraints. In this paper, we consider the problem of synthesizing control inputs for discrete-time linear systems subject to TWTL task specifications. Building on the quantitative semantics (robustness) recently introduced for TWTL in [1], we encode the robust satisfaction of a TWTL formula as a set of Mixed-Integer Linear constraints and pose synthesis as a Mixed Integer Linear Program (MILP) that maximizes the robustness degree. We prove that any feasible solution with positive objective value guarantees Boolean satisfaction of the specification. We address two synthesis settings: an \emph{open-loop} formulation that optimizes the full control sequence from the initial state, and a \emph{closed-loop} receding-horizon Model Predictive Controller (MPC) formulation that re-solves the MILP at each step using the current measured state. A key feature of our MPC formulation is a \emph{task-adaptive horizon} that exploits the TWTL Deterministic Finite Automaton (DFA) to determine the active sub-task at each step, limiting the prediction horizon to the remaining window of the current task rather than the full formula horizon, this makes each re-solve significantly cheaper than the initial open-loop solve.
A Pisot numeration system $U$ for $\mathbb N$ is a sequence of natural numbers
generated by an integral homogeneous linear recurrence whose
characteristic polynomial is the minimal polynomial of a Pisot number.
The purpose of this paper is to introduce the analogue of the group of
$p$-adic integers for such numerations when they \emph{preserve zeros},
which is equivalent to the `Condition F' introduced by Frougny and
Solomyak for $\beta$-numerations. We show that these topological groups $\mathbb Z_U$
project homomorphically onto a torus. Equipping $\mathbb Z_U$ with the
appropriate topology, we also show that if $U$ is unimodular, then $\mathbb Z_U$
is continuously isomorphic to a torus.
An LTL formula lies in PCTL exactly when it is recognized by a deterministic Büchi word automaton, making the common fragment decidable.
abstractclick to expand
A central goal of language theory is to compare formalisms by understanding their relative expressive power. One challenging question in this direction is the problem of determining the \emph{common fragment} of two formalisms $F_1$ and $F_2$, that is, effectively characterise the class $F_1\cap F_2$ of properties that can be expressed in both formalisms. A question closely related to this is the \emph{membership problem}, denoted $F_1 \membership F_2$, which asks whether a property expressed in $F_1$ can be also expressed in $F_2$. These problems become particularly difficult when \emph{branching-time} formalisms are involved. In this work, we prove that $\LTL \cap \PCTL$ is decidable, where \PCTL denotes \CTL extended with \emph{past operators}. We do this by showing that both membership problems, $\LTL \membership \PCTL$ and $\PCTL \membership \LTL$, are decidable. The direction $\PCTL \membership \LTL$ follows from suitable combinations of known results. The converse direction, $\LTL \membership \PCTL$, requires an automata-theoretic characterisation of $\PCTL$. Specifically, we introduce a new class of automata, called \emph{counter-free hesitant weak tree automata} ($\HWTcf$) that capture precisely the expressiveness of $\PCTL$, and that are obtained by combining two orthogonal restrictions on alternating parity tree automata, namely, \emph{counter-free hesitancy} and \emph{weakness}. We prove that, for every word language $L$ defined by an \LTL formula, the associated tree language $\triangle[L]$ is recognisable by an \HWTcf if and only if $L$ is recognized by a \DBW. Since the latter recognisability problem is decidable, so is the former. This result advances the longstanding open problem of deciding $\LTL \cap \CTL$. Indeed, that problem can now be reduced to $\PCTL \membership \CTL$, that is, the question of when past operators can be eliminated.
All eight variants show the split, with hardness even on acyclic automata.
abstractclick to expand
Vector Addition System with States (VASS) are a ubiquitous model of infinite-state systems consisting of a set of non-negative counters which can be incremented and decremented. It is known that the reachability problem for VASS is Ackermann-complete. Because of this huge complexity, various over-approximations of VASS have been studied in the literature. One such over-approximation is continuous VASS (CVASS), in which the counters are (non-negative) rational numbers and whenever a vector is added to the current counter values, it is first scaled with an arbitrarily chosen rational factor between zero and one. It is known that the reachability problem for CVASS is $\mathsf{NP}$-complete.
In this paper, we initiate the study of fixed-dimensional CVASS, i.e., CVASS with a fixed number of counters. We study both the reachability and coverability problems, under both unary and binary encodings as well as over both the non-negative and the rational semantics. This gives rise to a collection of eight different problems. As our main result, we prove a complexity dichotomy for all of these eight problems when the transition vectors are over the rationals: For dimension 1, all of the eight problems are in $\mathsf{AC}^1$, whereas for any dimension at least 2, all of the eight problems are $\mathsf{NP}$-complete. Furthermore, the hardness holds even when the underlying automaton is acyclic. To achieve this result, we present a new technique called the Egyptian prime fractions technique.
Finally, we also study these problems when the transition vectors are over the integers. Except for dimension 2, we classify the complexity of these problems over the non-negative semantics: For dimension 1, all of the problems are in $\mathsf{AC}^1$, whereas for dimensions 3 and above, all of the problems are $\mathsf{NP}$-complete.
Suppose we have a deterministic finite-state transducer $A$ and an infinite word $x$, and run $A$ on $x$ to obtain an infinite word $A(x)$. Which properties of $x$ are guaranteed to also hold for $A(x)$? In this paper, we study this preservation question for various well-known combinatorial properties, e.g., recurrence, being morphic, and having factor frequencies. The celebrated Krohn-Rhodes theorem provides the framework for proving our preservation results, and our techniques are based on the ergodic theory of symbolic dynamical systems, i.e., shift spaces.
A language of one sort is recognizable precisely when regular, given finitary assumptions on the free algebra.
abstractclick to expand
In this work, we generalize Kleene's theorem from free single-sorted algebras to free many-sorted algebras. Our main result establishes that, under appropriate finitary assumptions, a language of a given sort in a free many-sorted algebra is recognizable if and only if it is regular.
Mossel-Peres equivalence fails for multiple variables; a concrete counterexample admits a general factory but not a finite one.
abstractclick to expand
Mossel and Peres (2005) established a comprehensive framework for designing Bernoulli factories. Notably, they demonstrated that a single-variable function admits a finite-automata Bernoulli factory if and only if it is a rational function. Their Theorem 2.9 claims an extension of this result to multivariable functions, but it contains a subtle technical oversight in the application of P\'olya's Theorem. We provide a direct counterexample: a rational function in three variables that admits a general Bernoulli factory but cannot be implemented by a finite-automata Bernoulli factory.
Finite-state transducers and Wheeler automata are two well-established frameworks in formal language theory. While transducers extend finite-state automata by associating output words to input words, Wheeler automata are automata whose underlying graph admits a co-lexicographic sorting of states, giving rise to the class of Wheeler languages, a proper subclass of star-free regular languages with efficient indexing properties.
In this work, we introduce the notion of sequential Wheeler transducers, a class of deterministic one-way transducers combining the Wheeler condition on the underlying automaton with a monotonicity requirement on the output function. We establish several fundamental properties of this class: closure under composition, and closure of Wheeler languages under inverse image of Wheeler transductions. We then develop a minimization theory by refining Choffrut's syntactic equivalence $\sim_f$ into a relation $\sim_f^c$, and prove a Myhill-Nerode-style theorem characterizing exactly the functions realizable by a sequential Wheeler transducer. Finally, we give a machine-independent characterization of Wheeler functions in terms of the behavior of the function. These results lay the groundwork for a broader structural theory of Wheeler transducers, and we outline open problems concerning decidability, complexity, non-deterministic extensions, and logical characterizations.
Sound complete reduction lets existing model checkers verify any number of processes with unbounded rounds.
abstractclick to expand
Traditional model-checking techniques typically verify distributed algorithms only for a fixed number of finite-state processes. Parameterized model checking generalizes this to any number of processes, while still typically assuming that each process is finite-state. In this work, we consider asynchronous round-based distributed algorithms in which each process is infinite-state since it can execute for an infinite number of rounds.
We show that the parameterized verification problem for asynchronous round-based distributed algorithms is undecidable, already for simple specifications. Nevertheless, as our main contribution, we provide a reduction to LTL model checking over finite-counter systems and prove that it is sound and complete. This enables the use of off-the-shelf, mature symbolic model checkers for finite-counter systems. We demonstrate the practical applicability of this reduction by verifying safety and liveness properties of several asynchronous round-based consensus and leader-election algorithms using the nuXmv model checker.
States keep small distance under any small probability change exactly when they are robustly bisimilar, with a fast algorithm to check it.
abstractclick to expand
The probabilistic bisimilarity distance provides a quantitative measure of behavioural difference for labelled Markov chains, but it may be discontinuous under perturbations of the transition probabilities. This lack of continuity undermines its applicability to empirically derived models, where transition probabilities are often approximations. Recently, we (CAV 2025) introduced robust probabilistic bisimilarity as a sufficient condition for continuity at distance zero. In this paper, we show that it is also a necessary condition, that is, two states are robustly probabilistic bisimilar if and only if their probabilistic bisimilarity distance is small for any small enough perturbation of the transition probabilities. We further extend robustness to non-bisimilar state pairs to establish a complete characterization for continuity of the probabilistic bisimilarity distance. Based on this characterization, we develop a polynomial time algorithm to decide continuity. Finally, we complement our theoretical contributions with an experimental evaluation demonstrating the proposed approach in practice. Our results show that the extra step of deciding continuity requires minimal additional cost when compared to computing the probabilistic bisimilarity distance.
Alternation bounds between upward and downward synchronizations decide safety on arbitrary-depth trees.
abstractclick to expand
Parameterized verification of finite-state processes with rendez-vous synchronization is notoriously undecidable when processes are linearly ordered. In this paper we study two kinds of bounds under which we determine the complexity of safety checking over tree topologies. When bounding the depth we obtain that the complexity is related to the fast growing hierarchy. Our second bound limits the alternations between upwards and downwards synchronizations in the tree (phases), and occurs naturally in many concrete settings. If we fix the number of phases then the complexity of safety checking is EXPSPACE complete, and if the number of phases is part of the input it is 2EXPSPACE complete (both for arbitrary depth).
Builds from source alone without backward steps to create structure-aligned invariants, including periodic ones for periodic systems.
abstractclick to expand
The reachability problem for Vector Addition Systems (VAS) is a central decision problem in the theory of infinite-state systems, first solved by Kosaraju and Mayr in the 1980s. An alternative, conceptually simpler approach introduced by Leroux shows that non-reachability is always witnessed by semilinear inductive invariants, yielding a decision procedure by combining an enumeration of runs with a search for such invariants. However, the construction of these invariants relies on a back-and-forth scheme that depends symmetrically on the source and the target. As a result, the invariants are not guaranteed to reflect the structural properties of the VAS, and the construction is difficult to extend to asymmetric models such as Branching VAS.
We introduce a new forward-only construction of semilinear inductive invariants for VAS. Our method builds invariants from the source configuration alone and avoids the need for backward reasoning. This yields invariants that are more canonical and better aligned with the structure of the system. In particular, our method produces periodic inductive invariants for periodic VAS.
Beyond its intrinsic interest, our approach provides a step toward extending invariant-based techniques to Branching VAS.
Generalizing the elevator structure from Buchi cases produces an algorithm with better asymptotic complexity than general Emerson-Lei method
abstractclick to expand
B\"uchi elevator automata naturally appear in several areas of formal methods as a structural expressibly-equivalent subclass of B\"uchi automata where every strongly connected component is either deterministic or inherently weak. It was shown that this class contains the majority of B\"uchi automata generated in practical applications, including LTL model-checking and verification of hyperproperties. Moreover, the elevator subclass enables more efficient complementation and determinization algorithms than unrestricted B\"uchi automata. In this paper, we introduce Emerson-Lei elevator automata, which is a generalization of B\"uchi elevator automata to richer acceptance conditions. We provide a complementation algorithm with a significantly better asymptotic complexity than the best known algorithm for unrestricted Emerson-Lei automata. The practical efficiency of our algorithm is demonstrated by an experimental comparison with the popular state-of-the-art tool Spot. Our work is, to the best of our knowledge, the first step towards practical algorithms for complementing, determinizing, and testing universality and inclusion of Emerson-Lei automata with rich acceptance conditions.
The profile observer makes deterministic and nondeterministic polynomial classes coincide inside standard P, separating structural blindness
abstractclick to expand
We introduce the \emph{observational hierarchy}, a new axis of classification for formal languages, orthogonal to the Chomsky hierarchy. An observer is a function $O : \Sigma^* \to S$ that determines which information about the input is accessible to a computational system. The order-blind automaton, which perceives the input as a multiset of symbols rather than a sequence, constitutes the paradigmatic case. We prove that the class of languages recognisable by any machine equipped with such an observer coincides exactly with the permutation-closed languages. We then define a partial order on observers that induces a hierarchy of language classes parametrised not by the computational power of the machine, but by the structure of the observer. We prove that this hierarchy has the structure of a partial order with a diamond-shaped profile sub-lattice, comprising the length branch $O_\bot \prec O_{\mathrm{len}} \prec O_{\mathrm{prof}} \prec O_\top$ and the parity branch $O_\bot \prec O_{\mathrm{par}} \prec O_{\mathrm{prof}} \prec O_\top$, with $O_{\mathrm{len}}$ and $O_{\mathrm{par}}$ incomparable, and an infinite subsequence branch $O_\bot \prec O_1 \prec O_2 \prec \cdots \prec O_\top$, both converging to the complete observer. We prove that the observational hierarchy is strictly incomparable with the Chomsky hierarchy, and introduce the notion of \emph{observational complexity} of a language. We further define observer-parametrised complexity classes $\mathbf{P}_O$ and $\mathbf{NP}_O$, and show that computational hardness and structural blindness are two independent phenomena. In particular, $\mathbf{P}_{O_{\mathrm{prof}}} = \mathbf{NP}_{O_{\mathrm{prof}}}$ holds as a structural collapse strictly inside $\mathbf{P}$.
In the past literature, families of two-way finite automata and pushdown automata having limited state complexity (i.e., the total number of inner states) and stack-state complexity (i.e., the total number of inner states multiplied by the total number of strings "pushable" to a stack), have been studied in direct connection to (mainstream) space-bounded complexity classes equipped with Karp-Lipton style advice of limited size when all inputs given to the automata have bounded length. Here, we acknowledge two major factors -- size and ceiling -- of such families, which have a significant impact on the complexity of finite and pushdown automata families, where the "size" refers to (stack-)state complexity and the "ceiling" refers to an input's length bound. In this line of study, we further explore those effects caused by different sizes and ceilings.
Parallel Communicating Finite Automata (PCFA) are systems of several finite automata that can communicate by requesting the state of another automaton. As an attempt to make PCFA as defined in [8] and [2] more realistic, the non-forgetting model is introduced where automata retain their own states while querying. As in previous publications, several variants of these automata systems are considered. The computational capacity of the non-forgetting model is investigated and compared to ''forgetting'' systems and multi-head finite automata. It is shown that most variants of non-forgetting PCFA are as powerful as multi-head automata in the deterministic and nondeterministic cases where the number of automata equals the number of heads. The only exception is the special case of deterministic centralized systems in non-returning mode. Here, a strict inclusion is proved. In the course of this proving, a proof from [2] is completed. With that, this paper answers some questions for nfPCFA that are open for the classical ''forgetting'' case.
In this paper, we continue the research on the power of contextual grammars with selection languages from subfamilies of the family of regular languages. We investigate infix-, prefix-, and suffix-free languages (referred to as idefix-free languages) and compare such language families to some other subregular families of languages (finite, monoidal, nilpotent, combinational, (symmetric) definite, ordered, non-counting, power-separating, commutative, circular, union-free, star, and comet languages). Further, we compare the families of the hierarchies obtained for external contextual grammars with the language families defined by these new types for the selection. In this way, we extend the existing hierarchies by new language families.
We introduce and study a family of two-head finite automata called two head returning finite automata (2-HRFA) operating on rectangular arrays of picture languages, in which both heads move in opposite directions. We show that the class of picture languages accepted by 2-HRFA is incomparable with the class of languages generated by context-free matrix grammars (CFMG), while it forms a proper subset of the class of languages accepted by returning pushdown automata (RPDA). In addition, we define a constrained variant, both head stepping two head returning finite automata (B2-HRFA), in which both heads are required to move in a synchronized, stepwise fashion. We prove that the class of languages accepted by returning finite automata (RFA) is a proper subset of the class of languages accepted by B2-HRFA, which in turn is a proper subset of the class of languages accepted by 2-HRFA. Closure properties for both the families of languages accepted by 2-HRFA and B2-HRFA are also investigated.
Regular expressions with counting operations (c-regexes) offer a compact representation of repeating patterns by allowing numerical bounds to be added to subexpressions. Recent work introduced the counting-set data structure, which allows simultaneous updates of multiple counter values for efficient matching. However, this approach suffers from a performance bottleneck when counting-sets must be replicated due to the presence of branching transitions. We propose a sparse counting-set approach, which reduces the replication overhead by maintaining only essential counter values, thereby yielding a more efficient matching algorithm.
Backtracking regular expression matchers are widely used due to their expressive power but may exhibit exponential worst-case matching time. Memoization provides a principled method for eliminating redundant computation and ensuring linear matching time, but full memoization is memory-intensive and impractical. We introduce the Minimum Feedback Node (MFN) memoization scheme, a selective memoization strategy based on computing a minimum feedback vertex set of an automaton. We establish relationships with existing memoization schemes and analyze their behaviour under both Thompson and Glushkov automaton constructions.
Linear upper bounds proven tight for height two and three rectangles via column analysis.
abstractclick to expand
Langton's ant is a simple two-dimensional cellular automaton whose long-term behavior exhibits remarkable complexity. While it is known that the ant eventually escapes any finite connected region of the grid, the quantitative aspects of this escape remain poorly understood. In this paper, we study the escaping time of Langton's ant, defined as the maximum number of steps the ant can perform within a finite connected domain before leaving it.
We establish general upper bounds on the escaping time as a function of the domain size, and derive improved bounds for rectangular domains. In particular, we obtain a factorial upper bound for square domains via an inductive decomposition argument. We also obtain linear upper bounds for rectangular domains of height two and three via a column-by-column analysis. More generally, for rectangular domains with a fixed height, we establish a polynomial upper bound in the number of columns. These results are complemented by exact values computed through an optimized simulation algorithm that exploits the geometric symmetries of the grid and employs a backtracking branching strategy to avoid exhaustive search over all color configurations. We also provide lower-bound constructions, proving that the linear upper bounds for rectangular domains of heights two and three are asymptotically optimal.
In this paper, we formulate a scenario that an agent can never be sure that another agent can uniquely determine the state of a finite-state automaton based on its observations to the automaton at the current and any past time as the property of order-2 bygone-state opacity. Based on our concurrent composition and the classical observer, we derive a tool to verify this property in doubly exponential time. The interest of this result lies in that we extend inference of finite automata from a single agent to two ordered agents.
Quadratic conversion to unions of deterministic automata proves hardness for determinisation and register minimisation.
abstractclick to expand
We consider weighted automata over the tropical semiring $\mathbb{Z}_\infty(min, +)$. Recently, it was shown that determinisation is decidable; in this paper we focus on the complexity when the alphabet is unary. In 2001, Lombardy showed this problem is decidable, a close inspection of his proof yields a coNP upper bound on the complexity. Earlier Gaubert showed that every weighted automaton in this setting can be effectively turned into an equivalent union of deterministic weighted automata. We prove Gaubert's result efficiently, presenting it as a generalisation of Chrobak's normal form for unary NFA. In particular, we prove that the equivalent union of deterministic weighted automata can be represented by a weighted automaton of quadratic size in the size of the original one, and this representation can be computed in polynomial time. Building on this, we show that determinisation, and even register minimisation (which generalises determinisation), is coNP-complete. We complete the paper with observations that the boundedness problem is also coNP-complete by reductions with determinisation. Lastly, we provide evidence that all of these problems are not FPT (by proving $coW_1$-hardness) when parametrised by the number of deterministic automata in the union.
Runtime verification must use ε-approximate monitors whose memory and observation needs are bounded explicitly in terms of the allowed error
abstractclick to expand
Runtime monitoring of quantitative signals faces a fundamental trade-off between volatility and over-aggregation: instantaneous observations are noisy, while long-run averages obscure local structure. Localisation measures such as discounted averages offer a principled middle ground, yet remain poorly understood in runtime verification. This paper studies discounted sums from a monitoring perspective, in both deterministic and stochastic settings. We formalize the discounted monitoring problem and show that exact, sound monitoring of discounted sums cannot be achieved with finite memory. To overcome this impossibility, we introduce $\varepsilon$-approximately sound monitoring, deriving explicit bounds on memory and observation requirements. We then extend the framework to stochastic processes via expected discounted sums, defining pointwise and uniform $(\varepsilon,\delta)$-soundness notions, establishing statistical optimality, and proving impossibility beyond a precision threshold. We also formalize the resource complexity of deterministic discounted monitoring via affine register machines and prove a tight worst-case lower bound. Finally, we present a specification language for arithmetic expressions over multiple discounted sums with synchronous and asynchronous semantics, and evaluate our approach on practical scenarios including algorithmic fairness.
The sixteenth edition of the series on non-classical automata models took place in France under University of Rouen organization.
abstractclick to expand
The Sixteenth International Workshop on Non-Classical Models of Automata and Applications (NCMA 2026) was held in Rouen, France, on June 29 and 30, 2026, organized by the University of Rouen. The NCMA workshop series was established in 2009 as an annual event for researchers working on non-classical and classical models of automata, grammars or related devices. Such models are investigated both as theoretical models and as formal models for applications from different points of view.
Bayesian update transitions and causal output distributions emerge from the general construction on Mealy machines over monads with support-
abstractclick to expand
The powerset construction is the classical determinisation procedure for nondeterministic finite automata. In the coalgebraic setting, this construction has been generalised to structured coalgebras, which are coalgebras equipped with extra data. For stochastic Moore machines over the distribution monad, a type of structured coalgebra, the determinisation construction induces a semantics assigning to each finite input word a distribution on the current output. This semantics is appropriate when only the current output matters, but it is too coarse for settings in which intermediate observations must also be taken into account, as is typical for agents solving POMDPs in control theory and reinforcement learning. In these contexts, agents need to condition on all realised observations, not just the final one, so to better plan for the future. This has been addressed from a category theoretic perspective through a procedure called ``unifilarisation'', which (in our context) takes a stochastic Mealy machine and produces a machine whose states are priors over the original state space and whose transitions are given by Bayesian filtering. Here we show that unifilarisation is an instance of coalgebraic determinisation. We work with Mealy machines over monads equipped with extra structure generalising the notion of the support of a distribution. We show that in this setting, unifilarisation arises from the general determinisation procedure. We then compare the resulting final coalgebra semantics with the Moore-style one. Instead of assigning only a distribution on current outputs to each finite input word, it yields causal stochastic behaviours, that is, families mapping input words to distributions on output words compatible with the ``causality'' constraint that outputs cannot depend on future inputs.
Business Process Management (BPM) is concerned with the systematic design, execution, monitoring, and improvement of business processes. Formal grammars have emerged as a particularly fruitful formalism for BPM, offering generative, declarative, and analytical capabilities that are uniquely well-suited to process-oriented concerns. This paper presents a systematic literature review of 34 primary studies at the intersection of formal grammars and BPM. We identify seven research streams: (i) process grammars for organizational process design; (ii) process modeling languages evaluated as grammars under the Bunge-Wand-Weber ontological framework; (iii) production-rule grammars for process structural specification and variant management; (iv) attribute grammars for the declarative specification and distributed execution of workflows; (v) graph grammars for the transformation, generation, and semantic analysis of process models; (vi) grammatical inference for process mining and discovery; and (vii) process algebras as grammar-like compositional frameworks for behavioral specification and verification. For each stream, we synthesize contributions, formalisms employed, and limitations. The review reveals that formal grammars have influenced BPM across every lifecycle phase (from organizational design to formal verification and data-driven discovery) yet the seven streams have developed largely in parallel, without cross-stream synthesis. We identify five corpus-grounded open challenges and argue that a deeper, unified exploitation of grammatical theory holds significant promise for advancing the state of the art in BPM.
Balanced product construction keeps locality constant and membership access constant-time after single-symbol changes.
abstractclick to expand
A regular language is recognized by a finite monoid, but a locally checkable explanation of that recognition can have a nontrivial update geometry. We study exact bounded-arity annotations for regular word languages under one-symbol substitutions. The cost of an edit is the number of annotation cells that a canonical locally accepted representation must change, together with the corresponding bit movement and the number of local constraints that must be revalidated. For every morphism recognizing a regular language, the balanced product annotation gives constant locality, linear size, O(log n) edit stability, O(log n) revalidation, and constant access to the membership value. The matching lower bound proved here is restricted to product decompositions that expose an edit-active nontrivial group quotient as ordered product labels; in that setting one substitution changes every quotient label on an ancestor path. We also show that annotation-free bounded-window recognition is exactly strict locality, prove closure properties for a two-sided total decision variant, and formulate the remaining constant-stability boundary as a finite obstruction problem. The ancillary files include Lean, CP-SAT, and CUDA certificates, including a context-free interval-chart experiment.
Markov jump linear systems (MJLSs) model dynamical phenomena subject to random switching among multiple linear modes, driven by an underlying Markov chain. Classical notions such as mean and mean-square stability characterize the long-term asymptotic behaviour of the first and second moments of an MJLS, but they can be overly conservative or even misleading when only a specific subset of initial conditions is of interest. We tackle this challenge through the lens of model checking, where reasoning about specific sets of initial conditions is intrinsic to the approach. We begin by formalizing probabilistic computation tree logic (PCTL) on MJLSs, enabling the specification of state-based temporal properties for these systems. Building on this foundation, we extend the logic to capture moment-based stability properties relative to a prescribed set of initial states. While we ultimately do not obtain a decision procedure for the entire base logic, the logical extensions can be handled -- albeit with some technical subtleties -- by exploiting linear-algebraic techniques.
Upper and lower bounds agree in the small-error limit, giving the exact leading-order certification cost.
abstractclick to expand
Wang~\cite{Wang2026} introduced the Stochastic-Oracle Turing Machine (SOTM) framework and defined token complexity as the minimum expected cost of interacting with a stochastic oracle needed to attain a specified solution quality for a task. This paper develops an analogous notion for certifying the reliability of a stochastic oracle on a given domain. Certification token complexity is the minimum expected token cost required, with controlled error probability, to distinguish oracles that meet a target reliability level from those that fall below a lower reliability threshold.
We construct an SPRT-based certification SOTM that queries the oracle, computes binary correctness scores, and stops when the accumulated log-likelihood evidence crosses a decision threshold. The SOTM halts almost surely, satisfies the desired two-sided error guarantee over the reliability regions to be certified, and yields an explicit upper bound on certification token complexity in terms of the reliability thresholds, the error bound, and the expected per-turn token cost. We then establish a matching information-theoretic lower bound: even with adaptive queries, every error-bounded certification SOTM must incur the same leading-order expected token cost as the SPRT-based construction as the prescribed error bound tends to zero. Together, these bounds characterize the leading-order certification token complexity in the small-error regime.
Vibe coding -- accepting LLM-generated source from natural-language intent with minimal review -- is fast and may be adequate for low-criticality consumer software. But for safety-critical systems governed by DO-178C, IEC 61508, or ISO 26262, it offers no path to certification: large language models (LLMs) provide no formal correctness guarantees, and existing remedies target verification-aware languages (Dafny, Verus, Lean) that are scarce in pretraining data and absent from industrial toolchains.
This paper closes the gap. We present Forge (Formal method Oriented Refinement loop for GEnerated code): a closed-loop pipeline that guides vibe coding through formal verification using established Model-Driven Engineering (MDE) infrastructure. Through vibe coding, we generate Java source code; our pipeline then extracts -- via model transformations -- formal artefacts in three different formalisms, each checked by a complementary verifier: deductive verification (Dafny), Communicating Sequential Processes (CSP) refinement via the Failures-Divergences Refinement checker (FDR4), and theorem proving using Z-Machines in Isabelle; every verification failure becomes a structured correction prompt that drives the next code-generation iteration. The LLM is the draft generator, the MDE chain is the discriminator, and the developer never has to read the formal models.
Empirically, we find that the pipeline produces standards-relevant verification evidence for LLM-generated Java -- a step toward certification.
Extends to unambiguous families and yields single-exponential LTL translation versus double-exponential for deterministic versions.
abstractclick to expand
Families of deterministic finite automata (FDFA) have been introduced as a concise automaton model that characterizes $\omega$-regular languages by processing their ultimately periodic words. FDFA are known to enjoy many good properties and can be exponentially more succinct than deterministic $\omega$-automata with Rabin, Streett or parity acceptance. This paper addresses two main questions: (1) Are FDFA suitable for probabilistic model checking purposes? and (2) Is it possible to obtain an even more compact representation of $\omega$-regular languages by allowing the components of an FDFA to be unambiguous instead of deterministic? Question (1) is answered in the affirmative by presenting the first polynomial-time algorithm for computing the probability that a discrete-time Markov chain satisfies an $\omega$-regular property represented as an FDFA. Question (2) is motivated by the fact that unambiguous finite automata may require exponentially fewer states than deterministic ones. This paper introduces a model of families of unambiguous finite automata (FUFA) that captures the class of $\omega$-regular languages. FUFA can be exponentially more succinct than both FDFA and unambiguous B\"uchi automata, and there is a single-exponential translation from linear temporal logic (LTL) to FUFA. This stands in contrast to a double-exponential lower bound for the translation from LTL to FDFA. Moreover, the polynomial-time probabilistic model checking algorithm for discrete-time Markov chains against FDFA-specifications is extended to the case where the property is represented by an FUFA with a deterministic leading automaton.
Synthesizing controllers for real-time systems under both timing uncertainty and adversarial environments requires exploring prohibitively large symbolic state spaces. While zone inclusion checking has been applied to Parametric Timed Games, more aggressive abstractions from the Parametric Timed Automata and Timed Games literature -- double inclusion, zone merging, hull abstractions, and location-based abstraction -- have not yet been lifted to the parametric game setting. We define a general abstraction framework for Parametric Timed Games, instantiate it with each of the aforementioned abstractions, and prove that the framework preserves correctness of parameter synthesis and winning strategies. Experimental results on an established production cell benchmark and a novel adversarial IoT case study show that the abstractions significantly improve scalability, solving instances previously intractable for existing techniques.
When label agreement probability is below one and each admits three symbols, the prefix-dependent equivalence collapses to the identity almo
abstractclick to expand
We study prefix-dependent congruences for random deterministic transition systems with state outputs. In this setting, the admissible continuations used to compare two states may depend on the observed prefix, and two states are identified only if no common admissible continuation distinguishes their future outputs. The framework includes probabilistic deterministic finite automata as a motivating special case. We analyze the random transition model in which all transition values are independent and uniform. Each state is also assigned an independent label that specifies both its output and its set of admissible symbols. If two independent labels agree with probability strictly less than one, and every label has at least three admissible symbols, then the induced congruence is trivial with high probability. The proof combines a pruning process on pairs, a collision-free exploration controlling its early evolution, and a first-moment argument showing that the remaining pairs cannot organize into nontrivial equivalence classes.
We present the AL*RTA algorithm for learning alternating real-time automata (ARTAs) using membership and equivalence queries. AL*RTA combines ideas from AL*for learning alternating finite automata and NL*RTA for learning nondeterministic real-time automata. We first define ARTAs and show that alternation improves succinctness, although it does not increase expressive power. We then present AL*RTA and show its termination. Our empirical evaluation suggests that AL*RTA generally learns smaller automata than NL*RTA at the cost of more queries.
For strings $u, D \in \Sigma^*$, a subsequence embedding of $u$ in $D$ is a function $e \colon \{1, 2, \ldots, |u|\} \to \{1, 2, \ldots, |D|\}$ with $e(i) < e(i+1)$ for every $i \in \{1, 2, \ldots, |u|-1\}$ and the $i$-th symbol of $u$ equals the $e(i)$-th symbol of $D$. A gap-constraint for $u$ is a triple $(i, j, L)$ with $1 \leq i < j \leq |u|$ and $L$ is a regular language over $\Sigma$. An embedding $e$ satisfies a gap-constraint $(i, j, L)$ if the factor of $D$ strictly between positions $e(i)$ and $e(j)$ is a word from $L$. We investigate the subsequence matching problem with gap-constraints, which is relevant in the context of complex event recognition (CER): given $u, D \in \Sigma^*$ and a set $C$ of gap-constraints, find an embedding of $u$ in $D$ that satisfies all gap-constraints from $C$.
In general, subsequence matching is NP-complete and the only known tractable variants restrict the interval structure of the gap-constraints. In this work, we show that we can solve subsequence matching with gap-constraints with an arbitrary interval structure rather efficiently (in fact, optimally under SETH) in time $O(|D| (|u| + |C|))$ if the gap-constraint languages satisfy a property which we dub left-convexity: whenever $u v w \in L$ and $v \in L$, then also $uv \in L$. Left-convex languages are sufficiently expressive to model interesting real-world scenarios considered in CER, e.g., length constraints $L = \{w \mid a \leq |w| \leq b\}$ for $a, b \in \mathbb{N}$. We also show how our algorithm can be used in order to efficiently enumerate all satisfying embeddings, which is particularly relevant for possible applications in CER. Finally, we show how non-left-convex languages can lead to intractability, i.e., if in addition to length constraints we allow $\{aa, \epsilon\}$ as the only non-left-convex constraint language, then the problem is NP-complete again.
Hypergraphic automata are automata whose state sets and output symbol sets are hypergraphs invariant under the actions of the transition and output functions. Universally attracting objects in the category of such automata are called universal hypergraphic automata; their semigroups of input symbols are algebras of mappings whose properties are tightly linked to the algebraic structure of the automata themselves. This paper establishes a complete characterisation of epimorphisms of universal hypergraphic automata and of their semigroups of input symbols. A central contribution is the introduction of two distinct notions of epimorphism for hypergraphs including weak, strong and the proof that these notions diverge in general but necessarily coincide for the important subclass of $p^*$-hypergraphs, which includes automata whose state hypergraphs and output hypergraphs are projective or affine planes. The main results give necessary and sufficient conditions for a triple $(f, \mathbb{P}_s, g)$ to be an epimorphism of universal hypergraphic automata, expressed in terms of the component maps on the state and output hypergraphs.
Two theorems bound firing error and guarantee long-term silence, making formal checks feasible on networks that were previously intractable.
abstractclick to expand
Spiking Neural Networks (SNNs) model biological neural dynamics more faithfully than classical artificial networks, but their stochastic, event-driven computation -- rooted in ion-channel noise and unreliable synaptic vesicle release -- demands probabilistic models for which deterministic abstractions are mathematically inadequate. Formal verification of such models via probabilistic model checking faces a fundamental barrier: the state space explosion problem, where the Discrete-Time Markov Chain (DTMC) encoding grows exponentially with the number of neurons. General-purpose quotient model abstractions [1] can in principle mitigate this growth by partitioning membrane potentials into equivalence classes, but a na\"ive application to SNNs discards synaptic weight information, limiting the properties that can be verified. This paper introduces a weight-discretized quotient model abstraction that maps continuous synaptic weights to a compact integer range while preserving the relative contribution of each synapse, and presents CogSpike, a unified workbench that integrates SNN design, simulation, and PRISM-based formal verification within a single isomorphic tool chain. The discretization is accompanied by formal correctness guarantees: a two-sided fidelity theorem confines any firing disagreement to a bounded gray zone around threshold, and an Asymptotic Silence theorem gives the exact limit guarantee that unforced neurons fall permanently silent. A topology-dependent scaling analysis shows that the state space reduction compounds exponentially -- approximately $17\times$ per neuron for discretization parameter $W = 3$ -- enabling verification of networks that are otherwise intractable, as confirmed empirically across seven canonical topologies.
Runtime enforcement has emerged as a promising approach for ensuring the safety of autonomous and cyber-physical systems operating in uncertain and dynamic environments. Unlike traditional runtime verification, runtime enforcement actively intervenes during execution to prevent property violations by modifying unsafe system behaviors. Existing enforcement frameworks primarily focus on untimed or discrete-time specifications and are often limited to delaying or suppressing events, making them inadequate for reactive systems exhibiting complex continuous dynamics. In this paper, we propose a runtime enforcement framework where safety requirements are modeled using Hybrid Automata (HA). The framework combines discrete-event editing with continuous-time monitoring to support enforcement actions such as suppression, delay, and insertion of events at arbitrary time instants. Upon observing environmental inputs, the automaton is initialized, and runtime reachability analysis is used to synthesize safe corrective actions. We formally define the enforcement problem for safety hybrid automata, establish enforceability conditions, and present an online enforcement algorithm for reactive systems. A detailed case study on an Adaptive Cruise Control (ACC) system demonstrates the effectiveness of the proposed approach in maintaining safety properties under unsafe controller behaviors. Experimental results show that the framework introduces minimal computational overhead while ensuring continuous compliance with safety requirements in real time.
The bandwidth of a timed language characterizes the quantity of information per time unit (with a finite observation precision $\varepsilon$). The asymptotic behavior of the bandwidth as $\varepsilon \to 0$ classifies timed regular languages in three classes: meager, normal, and obese. Normal timed automata have a bounded frequency of events and some non-punctual transitions, and, up to now, were the only class of timed automata for which no algorithm was available for computing their bandwidth. In this article, we compute the bandwidth of any such automaton in the form $\approx\alpha\log{1/\varepsilon}$. Our approach reduces this problem to computing the best reward-to-cost ratio in a weighted finite graph constructed from the given timed automaton.
Reductions from Turing non-halting locate the bi-infinite PCP at the second level of the arithmetical hierarchy.
abstractclick to expand
In the bi-infinite Post Correspondence Problem ($\Z$PCP), it is asked whether the same bi-infinite word can be constructed correspondingly from a given finite set of pairs of words. In this article, we study its complexity with respect to the arithmetical hierarchy and prove that it is in $\Si^0_2 \setminus (\Pi^0_1 \cup \Si^0_1)$ and, therefore, at the level 2 of the arithmetical hierarchy. For the proof, we present a sequence of reductions starting from the nonhalting of the Turing machine all the way to $\Z$PCP via infinite PCP, an $s$-shift infinite PCP and $s$-shift $\Z$PCP for all natural numbers $s$. In the process, we prove that the infinite PCP is undecidable for injective morphisms, and that the infinite injective PCP, $s$-shift infinite PCP, $s$-shift $\Z$PCP and the non-termination problem for (deterministic and reversible) semi-Thue systems are all $\Pi^0_1$-complete.
The last decade of research on the LOCAL model has seen tremendous progress in understanding locally checkable labeling (LCL) problems, culminating in an almost complete classification of the possible complexities LCL problems can exhibit. In particular, on undirected trees, Chang and Pettie showed that there is no LCL problem with complexity between $\omega(\log n)$ and $n^{o(1)}$ and Chang showed that, for every positive integer $k$, there is no LCL problem with complexity between $\omega(n^{1/(k+1)})$ and $o(n^{1/k})$; additionally, which side of each gap a problem is found on is decidable.
While the class of LCL problems - which, roughly speaking, consists of problems for which the correctness of a solution can be described by a finite set of allowed node configurations, which in turn can be locally verified by a constant-time algorithm - includes many important problems, it has one major restriction: problems can be defined only on bounded degree graphs, which consequently restricts all the classification and gap results mentioned above.
In this work, we propose a generalization of LCL problems to unbounded degree using Presburger monadic second-order (PMSO) formulas; more specifically, we consider what we call Local PMSO (LPMSO) problems, i.e., problems whose correct solutions are both finitely described by a PMSO formula and locally verifiable by a LOCAL algorithm in constant time - this class contains many of the important problems studied in the LOCAL model but defines them on unbounded degree graphs. As our main result we prove that, on unbounded degree rooted trees, the aforementioned $\omega(\log n)$ - $n^{o(1)}$ and $\omega(n^{1/(k+1)})$ - $o(n^{1/k})$ complexity gaps (and their decidability) extend to the class of LPMSO problems.
The model separates recognition from faithful online enforcement and pins desync failures to specific marker factors under reorder congruenc
abstractclick to expand
Layered cybersecurity pipelines transform evidence before they decide on it, and the order of those transformations determines which security facts become visible to which layer. This paper gives layer order a finite-state semantics built from a layer-order automaton, deterministic sequential security transducers, evidence markers, and a final decision automaton. The worked case is HTTP request desynchronization: front-end and back-end processors compute incompatible request boundaries, and the same trace is detected or missed according to whether framing evidence reaches the parser-differential layer before it commits. The results separate completed-trace recognition, online editing, decision synthesis, and faithful enforcement; characterize faithful online enforcement as the regular prefix-closed case under causal visibility; and show that regular policies beyond that boundary remain recognizable without becoming deployable enforcers. The framework is monolithically equivalent to finite-output deterministic edit automata, while preserving layer-local invariants such as marker birth, marker survival, and reorder-sensitive visibility. A concrete parser-pair semantics identifies the forbidden marker factor with CL.TE, TE.CL, TE.TE, and HTTP/2-downgrade boundary disagreement under the stated abstraction, and a contextual reorder congruence classifies which component permutations induce the same decision language. The result is an automata-theoretic account of order-sensitive security failures and a compositional vocabulary for auditing, synthesizing, and comparing layered enforcement pipelines.
Finite-state transducers (FSTs) are essential for modeling string rewriting in computational linguistics and natural language processing (NLP), particularly for phonological and morphological rewrite rules. Compiling general rewrite rules of the form $A \to B / L \, \_ \, R$, where $A$, $B$, $L$, and $R$ are arbitrary regular languages, is complex due to overlapping matches and context constraints. Traditional methods, such as those by Kaplan and Kay or Karttunen, rely on intricate transducer compositions with auxiliary markers. This paper presents a compact compilation scheme based on the "worsening trick'': generate all legal rewrite candidates, then filter candidates that are worse than another candidate for the same input. Implemented as the built-in rewrite compiler in PyFoma, the construction supports multiple contexts, arbitrary transductions, markup, directed rewriting, weights, and parallel rewriting. The resulting formulas are short and uniform, and where semantics coincide, they reproduce the same rule transducers as earlier approaches while remaining easier to extend. The implementation has been validated against foma on both a substantial collection of rewrite grammars and an automated regression suite covering the major rewrite modalities, with the resulting transducers matching exactly apart from state numbering.
Language models, as multi-task learners, acquire a wide range of abilities during training. A fundamental question is how much task-specific data is needed to learn a given task. Answering this for natural language is difficult: tasks are hard to delineate and can confound one another. To rigorously investigate the relationship between data frequency and learnability, we turn to a controlled setting using formal languages induced from probabilistic finite automata. These serve as a methodological testbed to demonstrate that standard correlational evaluation practices are inherently flawed. To enable causal analysis, we introduce the binning semiring, an algebraic object that lets us control how often a targeted property occurs in a sampled corpus. We formulate the experimental pipeline as a causal graphical model and derive decomposed Kullback-Leibler divergence metrics to measure the learnability of specific sub-tasks. Our experiments show that evaluating learnability without causal intervention leads to incorrect conclusions due to confounders in correlational analysis, and serve as a warning about correlational pitfalls in natural-language settings.
The classic paradigm of language identification in the limit models learning as a game between an adversary, who reveals strings from an unknown target language, and a learner tasked with identifying that language. The recently introduced framework of language generation in the limit shifted the objective to better reflect modern language modeling, requiring the learner to produce valid, unseen strings from the target language. Related work highlighted a fundamental tension: a broad coverage of the target often comes at the cost of validity. We introduce a new notion of precision and recast this problem as the classic recall-precision trade-off. We analyze generation in the limit under varying constraints on enumeration, novelty, and validity, aimed at reflecting settings closer to those encountered by large language models. A key contribution is our analysis of learners that are not eventually valid: we allow infinitely many mistakes, provided their frequency tends to zero so that precision remains one. We show that this relaxation can strictly increase recall when the adversary permanently withholds a large portion of the target language. We also study a continuous relaxation of the novelty constraint that requires only a fixed fraction of outputs to be novel. Taken together, our results move toward a more realistic model of language generation where occasional errors and repetitions are unavoidable, but their rates are controlled.
Complex collective dynamics in cellular automata are usually associated with local-neighborhood combinatorics, yet it remains unclear whether long-lived dynamical organization requires such explicit local interaction structure. Here, we introduce a Separable-Field Cellular Automaton (SFCA), a normalized-field cellular automaton in which local neighbor counting is replaced by a rank-one-like row-column field. Each cell is updated according to a normalized field, with survival and birth governed by two threshold intervals. Systematic scans over interval widths and positions revealed four outcome classes: extinction, fixed points, cycles, and long transients. The outcome phase diagram was organized by the relative geometry of the survival and birth intervals: fixed points dominated when born interval was contained in survival interval, whereas long transients concentrated near the boundary between partial overlap and no overlap. A fine scan along this transition showed that the long-transient region forms a narrow but persistent ridge separating two qualitatively distinct cycle-dominated regimes. One side produced dense, high-change-rate cycles approximating global period-2 alternation, whereas the other produced sparse, low-change-rate, stripe-like cycles. Damage-spreading further supported a basin-competition interpretation, in which the long-transient ridge reflects delayed selection between two cyclic attractor families rather than random nonconvergence, while finite-size analysis shows that the long-transient ridge remains robust across tested grid sizes. These results show that structured long-transient dynamics can arise under compressed separable field coupling, suggesting that nontrivial collective organization does not necessarily require full local-neighborhood combinatorics.
Parsing underpins a vast range of software engineering tasks, from compilers and static analyzers to language servers and fuzz testing tools. Yet most parsers deployed in practice are deterministic (LL or LR), forcing developers not only to contort their grammars to fit the parser, but to simplify the very languages they design sacrificing expressiveness for the sake of parseability. General context-free parsers eliminate this constraint. Yet, despite decades of algorithmic development, no rigorous head-to-head comparison exists across the major families of parsing algorithms.
We present the first unified, controlled benchmark of six generalized parsing algorithms: CYK, Valiant, Earley, GLL, RNGLR, and BRNGLR, plus deterministic LL(1) and LR(1) baselines, all implemented in Rust with shared data structures and parse-tree extraction, and evaluated across 22 grammars ranging from simple expressions to full C++ and Java. Our results show that the cost of generality is lower than widely assumed. On deterministic grammars, the GLR family incurs only a 3x median slowdown over LR(1), with a narrow and predictable variance. GLR is the clear performance winner among generalized parsers and a practical default choice for software engineering tools.
In this work we study offline reinforcement learning (RL) under temporally extended task constraints expressed in Linear Temporal Logic over finite traces (LTLf). Recently, transformer-based approaches such as Trajectory Transformers and Decision Transformers have been adopted to address RL as a sequence modeling problem. However, these methods optimize purely for reward and do not account for high-level temporal requirements. Here, we introduce a neurosymbolic framework that injects LTLf background knowledge into such transformer-based RL policies. Our approach compiles LTLf formulas into deterministic finite automata (DFAs) and integrates them into the learning process through a differentiable representation and a logic-based loss function. In particular, we derive differentiable satisfaction signals from DFA progression and use them as a regularization term during training. The resulting method is architecture-agnostic across different models. We evaluate the proposed framework on navigation environments with specification suites covering combinations of safety and reachability temporal properties. Experimental results show that incorporating background knowledge not only improves constraint satisfaction, but also maintains competitive return compared to vanilla baselines.
The scaled ratio of palindrome complexity to factor complexity tends to zero at infinity for words that are not ultimately periodic.
abstractclick to expand
Let ${\bf x} = (a_i)_{i \geq 0}$ be an infinite word over a finite alphabet $\Sigma$. Let $\rho (n)$ be the factor complexity function for $\bf x$ and ${\rm Pal}(n)$ be the palindrome complexity function for $\bf x$. We give a new relationship between these two quantities; namely, if $\bf x$ is not ultimately periodic, then $$ \lim_{n \rightarrow \infty} {{ {\rm Pal} (n) \log ({\rm Pal} (n) + 1)} \over {\rho (n)}} = 0. $$ Furthermore, we prove that the numerator in this result is essentially optimal.
An iterator for node output lets the algorithm decide returns and discards as soon as the stream guarantees the outcome.
abstractclick to expand
Streaming allows executing queries over massive JSON or XML documents whose size makes it infeasible to fully parse them into a tree. Earliest query answering is a radical approach to reducing latency and memory footprint. To minimize latency, a document node must be returned as soon as the node is guaranteed to be an answer regardless of how the document ends. Similarly, to minimize memory footprint, a node must be discarded as soon as it cannot become an answer regardless of how the document ends. For simple queries that select nodes based on the path from the root, the decision for each node can be made on the spot, but practical languages such as XPath or JSONpath support filters, which allow selecting nodes based on information collected from various parts of the document, possibly further down the stream. This makes earliest query answering a challenging task, as candidate nodes must be kept in memory until it becomes clear that they can be safely returned or discarded. We show that this can be done for all unary queries expressible in monadic second order logic (MSO), while ensuring constant update time -- provided that nodes are returned by passing a suitable iterator, rather than one by one.
State tracking exposes a sharp limitation of sequence models: the relevant signal is often not a summary of observed tokens, but an ordered latent state that evolves through non-commutative transformations. We introduce a held-out transition-pair falsifier for finite non-Abelian group tracking. The protocol forbids selected ordered generator pairs during training and requires the same local patterns during evaluation, blocking one direct local-transition memorization pathway. In a controlled $S_3 \times S_3$ benchmark, a projected recurrent state model trained only on length-8 sequences produces error-free final-state predictions (perfect 250/250 per horizon) through evaluation horizons up to 1,048,576 tokens across five seeds. Matched native-readout baselines, including bag, GRU, and a single-configuration structured state-space model, remain near floor under the same protocol. Projection-matched GRU, structured SSM, and bag baselines equipped with analogous finite-group prototype readouts also remain near chance under the same split. Mechanism diagnostics show that hard projection coincides with low homomorphism error, low state-consistency drift, and non-trivial commutator separation, while softened projection collapses final-state accuracy. Clean-split audits verify zero verbatim reduced-word overlap and zero structural-template overlap between training and evaluation partitions. The evidence is scoped to this controlled finite-group falsifier rather than to a general architecture ranking. Within that regime, explicit projected non-commutative state composition acts as a useful inductive bias for long-horizon hidden-state tracking.
We point out three inaccuracies in paper [M.V. Moreira, T.C. Jesus, and J.C. Basilio. Polynomial time verification of decentralized diagnosability of discrete event systems. IEEE Transactions on Automatic Control, 56(7):1679-1684, July 2011]. First, the authors wrongly claimed that their algorithm for verifying (co-)diagnosability of labeled finite-state automata (LFSAs) did not depend on assumptions. We give an LFSA that is not deadlock-free or divergence-free such that their algorithm cannot correctly verify its diagnosability. Because diagnosability is a special case of co-diagnosability, their algorithm cannot correctly verify co-diagnosability either when LFSAs are not deadlock-free or divergence-free. Second, they wrongly claimed that adding at each dead state an unobservable self-loop can help verifying diagnosability for an LFSA that is not deadlock-free or divergence-free, but this is wrong, because such a modification sometimes changes the diagnosability of an LFSA. Third, they wrongly claimed that their algorithm for verifying co-diagnosability ran in polynomial time. A polynomial-time algorithm unlikely exists, because the problem of verifying co-diagnosability of LFSAs is PSPACE-hard.
Symbolic automata extend classical finite-state automata to handle large or infinite alphabets by labeling transitions by predicates coming from a boolean algebra. Many results from automata theory have been lifted to this model, and it has proved its usefulness for example in multiple software verification applications. Here, we tackle the passive learning problem of identification in the limit, i.e. learning a model from a sample without access to an oracle to query. We provide an algorithm, SAI, that efficiently identifies in the limit symbolic automata over any monotonic algebra where predicates labeling transitions are of the form a <= x < b. The algorithm extends the RPNI framework for passive learning of finite-state automata to symbolic automata thanks to a new splitting operation inspired by RTI, a passive learning algorithm for deterministic real-time automata, a subclass of timed automata. The learning algorithm combines merging of states and splitting of states allowing to infer the predicates on transitions in a top-down fashion. We prove that SAI admits polynomial size characteristic samples.
Real-time systems require the careful handling of timing aspects in their models. For critical applications, this entails the use of time-aware formal methods. Currently, such formal methods only account for timing and communication layers, excluding functional aspects. Thus, they are intended to be used as a posteriori analysis methods, on systems that have already been developed.
In contrast, methods such as Event-B have been designed to build systems incrementally using a correct-by-construction approach, but are not equipped with the ability to express timing aspects and constraints.
We propose a non-intrusive, tool-supported embedding of time and clocks in Event-B inspired by the features and semantics of timed automata. This enables the design of complex real-time systems while benefiting from the entire ecosystem and tooling support of the method. Refinement is extended to also take time into account, making it possible to design complex systems gradually in a correct-by-construction manner while integrating timing aspects from the top level.
The embedding and associated methodology are illustrated on a case study, showcasing both how timed Event-B models may be derived from timed automata, how the extended expressivity of first-order logic and set theory at the core of Event-B enables finer modelling, and how timed refinement may be used to establish complex timing properties.
The pairs of starting indices yielding identical segments of length s are described and their quantity is computed exactly.
abstractclick to expand
In this article, we consider the words with cyclic indices. For given $s$, we consider the pair $(\iota,\kappa)$ of indices such that the word of length $s$ from $\iota$ is equal to the word of length $s$ from $\kappa$. We give a characterization of such pairs for a cyclic Fibonacci word, and give the number of them.
What formal languages can a recurrent neural language model recognize? Formal results in the literature conflict: some authors report Turing-completeness, while others show equivalence to regular languages. The reason for this discrepancy is that the underlying arithmetic model differs. The paper develops a unified algebraic account of the expressivity of recurrent neural networks, starting with a formal account of various arithmetic models. This account reduces expressivity to an algebraic question, e.g., whether a network's syntactic monoid divides a certain wreath product. As a case study, the paper revisits diagonal state-space models: the same architecture cannot implement an even-modulus counter once floating-point recurrences are enforced, yet realizes every even-modulus counter under unsigned-integer quantization.
We introduce Grid Programs, a novel model of computation in which programs are finite two-dimensional arrangements of instructions on an integer grid rather than linear sequences of statements. Three properties distinguish this model fundamentally from classical frameworks: (i) programs are planar structures through which an instruction pointer moves in the four cardinal directions; (ii) there are no syntax constraints, so any assignment of instructions to grid cells constitutes a valid program; and (iii) the model uses no named variables or explicit memory addresses. Program state is maintained through a data stack, an address stack, and a circularly doubly linked list accessed via three named pointers. Control flow is achieved spatially, with branching encoded as perpendicular turns of the instruction pointer. The address stack stores triplets (cell row, cell column, direction), enabling precise restoration of both position and heading after branches, loops, and function calls. We give a formal operational semantics, present a representative instruction set covering arithmetic, control flow, and linked-list manipulation, and work through several detailed examples, including an absolute-value function, a factorial computation, a linear-search algorithm, a string-reversal program, and a while-loop summation. We establish that Grid Programs are Turing-complete by simulating an arbitrary register machine, and we discuss their relationship to prior two-dimensional languages such as Befunge and Funge-98, to stack-based languages such as Forth and PostScript, and to dataflow and spatial computation models. Grid Programs offer a fresh vantage point for exploring the design space of computation, with potential applications in visual programming environments, cellular-automaton-inspired hardware, and obfuscation-resistant code.
Co-transcriptional splicing generates RNA sequences from a DNA template by deleting subsequences nondeterministically. Recent work showed how to encode an NFA into such a template, but the construction requires deleting subsequences whose length grows with the distance between states, which makes such deletions unlikely under the local nature of co-transcriptional splicing. We introduce $k$-bandwidth NFAs, in which transitions span at most $k$ states. These automata form a strict hierarchy of language classes. For finite languages, bandwidth $2$ suffices, and bandwidth $1$ can be decided in polynomial-time when the language is presented as a list of words. Minimizing the bandwidth is NP-hard even for fixed $k \geq 2$.
Recent work describes what transformers can and cannot compute through connections to boolean circuits, but existing results lack exact characterizations and are sensitive to modeling choices. Padded transformers -- to whose input filler symbols such as ``...'' are appended -- emerge as a useful gadget for establishing equivalences to circuit classes by providing polynomial space for adaptive parallel computation. However, only a limited set of padded transformer idealizations has been studied, leaving open how robustly these equivalences hold under changes to attention type, model width, and uniformity. We find that, under practical assumptions, padded transformers are surprisingly robust to all of these, and identify numeric precision and model depth as the main factors affecting expressivity. Concretely, we prove that polynomially padded $\text{L-uniform}$ constant-precision transformers are equivalent to $\text{L-uniform AC}^0$, while growing-precision ones achieve $\text{L-uniform TC}^0$ regardless of width. Furthermore, looping enables sequential processing analogous to circuits: $\log^d N$-looped constant-precision transformers reach $\text{FO-uniform AC}^d$, and growing-precision ones reach $\text{FO-uniform TC}^d$. Interestingly, growing width or precision beyond logarithmic does not increase expressivity, and all our results hold for both softmax and average hard attention transformers.
In the setting of multi-head finite-state dimensions, trailing heads lag behind a leading head, accessing past data to aid a finite-state gambler placing bets on successive bits read by the leading head. Cruz, Glashausser, Li, and Lutz (2026) proved that, for any fixed number of trailing heads, adaptive (data-dependent) movement rules can strictly outperform oblivious (data-independent) movement schedules. In this paper we strengthen that separation by proving that a single trailing head with adaptive movements can outperform, by a large and uniform margin, arbitrarily many trailing heads with oblivious movements. Formally, our main theorem states that there is a binary sequence whose adaptive two-head finite-state strong dimension is less than its oblivious multi-head finite-state dimension, and that the gap is greater than 0.3.
The longest inclusion chain length for any pattern may equal 2 times length minus variables minus one, but this is unproven and the general
abstractclick to expand
Pattern languages are a classical model in formal language theory and algorithmic learning theory. This note formulates the problem of computing the inclusion depth of a pattern language: the length of the longest strict inclusion chain from the universal pattern language to the language generated by a given pattern. Inclusion depth captures the mind-change complexity of pattern identification from positive data. The central open question is whether the inclusion depth ID_Sigma(p) is computable for every pattern p over every finite alphabet Sigma with at least two symbols, and whether it is computable in polynomial time. A simple conjectured formula, ID_Sigma(p) = 2|p| - #var(p) - 1, would imply a linear-time algorithm. The problem connects pattern language inclusion, combinatorics on words, language identification in the limit, and mind-change-bounded learning.
The smallest number that guarantees an overlap pattern in any concatenation over two letters, while nine is possible and three letters allow
abstractclick to expand
We prove that every concatenation of $10$ or more binary squares contains an overlap. The bound $10$ is best possible. In contrast, over a ternary alphabet, there are infinitely long overlap-free words that consist of a concatenation of squares.
Quantifying worst-case and best-case performance under complex market scenarios is a persistent challenge in financial risk management and the verification of path-dependent financial instruments, such as exotic options and structured products. Simulation-based methods are well suited for probabilistic estimation, but they do not directly provide exhaustive guarantees over all admissible scenarios or explicit witnesses for extremal outcomes.
To address this, we introduce a quantitative automata-based framework for the exact extremal analysis of financial systems under declarative scenario constraints. At the core of our approach are event history automata (EHAs), a new formal model that integrates regular-expression event patterns with admissible numerical intervals to represent constrained event histories with memory. Quantitative payoffs are represented by weighted finance finite automata (WFFAs), which allow transition weights to depend on observed market values. By computing the synchronized product of EHAs and WFFAs, our framework enables the exact calculation of upper and lower payoff bounds. Furthermore, the method automatically extracts interpretable witness event histories that realize these extremal outcomes.
We demonstrate the practical viability of the approach through a case study of an autocallable structured product with path-dependent mechanisms. The case study analyzes how different scenario constraints affect coupon accumulation, early redemption, and protection-loss outcomes. Scalability experiments indicate that the framework's execution remains computationally feasible for practical contract horizons and nontrivial constraint configurations. Overall, this approach provides a mathematically rigorous complement to standard financial simulation methods.
Extreme rays of the dual cone classify residuals into sign cells to produce canonical carriers from order cone geometry on the residual span
abstractclick to expand
We study the observation congruences induced by rational polyhedral cones on vector-valued quantitative languages. The extreme rays of the dual cone define intrinsic covectors, and these covectors classify every incremental residual future by a finite sign cell: negative, tight, or positive along each extremal Farkas direction. The resulting carrier is the right-stable carrier of this cone-induced observation family, whose source is canonical: the restricted covector geometry of the order cone on the residual span of the language. We organize this construction through an observation-refinement correspondence, a cone-refinement calculus, and a separation between the qualitative conic observation quotient and the numerical residual carrier needed for potential certificates. A bounded-horizon fragment is fully computable by enumeration of accumulated futures, and reproducible evaluation runs show how the conic layer detects qualitative obstruction cells before numerical refinement.
The equivalence turns defect-budget verification into a standard acceptance question on a nearby tree.
abstractclick to expand
Automata acceptance can, in several situations of interest, be captured game-theoretically via acceptance games. The existence of a winning strategy for Verifier then captures the existence of a winning run-tree of a given automaton over a model. However, such acceptance is rigid, in that it does not allow a measurable defect budget, which can be a challenge in software verification. In this paper, we draw inspiration from how bisimulation distance can be defined as an extension of bisimilarity to define epsilon-acceptance games. Our main theorem shows that a tree T is epsilon-accepted iff there is a tree T' that is accepted in the traditional (rigid) sense and the bisimulation distance of T' and T is at most epsilon. Our work also suggests a strong connection with measure theory, of which we give a preliminary exploration via appropriate examples. Our framework is defined over binary trees with leaves and infinite branches, and strictly contains the case in which binary nodes are seen as probabilistic choice and the defect measures the probability of the set of rejected branches.
This paper investigates the new notion of $2$-word-$\pi$-repre\-sentable graphs: the nodes of the graph correspond to the letters of the two words and there exists an edge between two nodes if the projections of any two letters of both words are equal. The benefit of not only using one word for a representation as introduced by Kitaev and Pyatkin is that every graph is $2$-word-$\pi$-representable. We present an algorithm that returns two representing words for any graph. Aside, we show that every permutation graph is representable by two $1$-uniform words and give constructions how graph operations on $2$-word-$\pi$-representable graphs can be realised on their representing words which give further insights into the representation of cographs.
It is well known that liveness properties cannot be proven using standard simulation arguments. This issue has been mitigated by extending standard notions of simulation for transition systems to fairness-preserving simulations for systems equipped with an additional fairness condition modeling liveness assumptions and/or liveness requirements. In the context of automated verification of finite-state systems, proofs by simulation are an appealing method as there exist efficient algorithms to find a simulation between two systems. However, applications of fair simulation to interactive verification have been much less studied. Perhaps one reason is that the definitions of fair simulation relations typically involve non-trivial nestings of inductive and coinductive relations, making them particularly difficult to use and to reason about. In this paper, we argue that in many cases, stronger notions of fair simulation involving more controlled alternations of fixed points are sufficient. Starting from known fair simulation techniques, we progressively build up a family of almost fair simulation relations for transition systems equipped with a Buechi fairness condition. The simulation relations we present can all be equipped with intuitive reasoning rules, leading to elegant deductive systems to prove fair trace inclusion. We mechanized our simulation relations and their associated deductive systems in the Rocq proof assistant, proved their soundness, and we demonstrate their use through a selection of examples.
The algebraic view generalizes McNaughton-Papert by tying first-order definability of concurrent languages to aperiodic categories.
abstractclick to expand
Higher-dimensional automata (HDA) are a model of concurrency that models simultaneous execution of events using higher dimensional cells. HDA recognize languages of pomsets, a generalization of finite words whose letters are partially ordered. We prove a new algebraic characterization of HDA languages: a language of pomsets is regular if and only if it is the inverse image of a functor from the category of pomsets into a finite category. Furthermore, the language is definable in first-order logic exactly when it is recognized by an aperiodic category, generalizing the McNaughton-Papert theorem to HDA languages. We also investigate a notion of counter-free HDA, and show that if a language is accepted by a counter-free HDA, it must be definable in first-order logic. The converse, however, is still open.
This note presents AGDES, a tool for Automatic Generation of Dependent Event Sequences. Each event sequence is either generated as the output word of a deterministic finite automaton (DFA), or produced as the output word of a DFA called transducer that reads events from one or more input sequences, and produces an output sequence.
Chains of equivariant subspaces in its free vector spaces are bounded, extending earlier results for sets and orders.
abstractclick to expand
An infinite structure has the finite length property (over a given field) if, for each of its finite powers, chains of equivariant subspaces in the corresponding free vector space are bounded in length. Prior work showed that the countable pure set and the countable dense linear order without endpoints have this property. We generalise these results to (a) any structure approximated by finite substructures with few orbits, provided the field is of characteristic zero, and (b) any Fra\"iss\'e limit with free amalgamation in a finite vocabulary consisting of unary and binary relations, possibly expanded with a generic total order. As a special case, we deduce the finite length property of the Rado graph using both methods. We also describe some connections with function spaces, weighted register automata, and orbit-finite systems of linear equations.
Stochastic cyber-physical systems (CPS) permeate critical infrastructure, from autonomous vehicles to medical devices. Yet, tools for runtime verification of such systems capturing the probabilistic dynamics in stochastic systems remain generally absent despite theoretical foundations established nearly a decade ago. In this paper, we present SENTIL, a novel runtime verification tool with provable statistical guarantees for the runtime monitoring of requirements expressed as Probabilistic Signal Temporal Logic (PrSTL). SENTIL combines an efficient Rust core with universal ecosystem integration, delivering performance exceeding existing deterministic monitors while providing rigorous probabilistic guarantees through statistical model checking, sequential probability ratio testing, and adaptive rare event estimation. SENTIL employs streaming algorithms for incremental robustness computation, parallel Monte Carlo sampling, and a language-agnostic C-ABI enabling seamless deployment across ROS, Apollo, MATLAB Simulink, and AUTOSAR platforms, and direct integration in C, C++, Python, and Java. To validate the effectiveness of the proposed tool, we validate SENTIL across various scenarios spanning autonomous vehicle monitoring, medical device validation, and biological networks, demonstrating 10-1,000$\times$ performance improvements over existing tools while maintaining provable confidence intervals. SENTIL is open source (\href{https://github.com/sedislab/SENTIL}{\texttt{sedislab/SENTIL}}) and it positions probabilistic runtime verification as a deployable infrastructure for all real-world safety-critical stochastic systems.