Regular Splicing Languages, constants and synchronizing words

Paola Bonizzoni (Università degli Studi di Milano-Bicocca)

giovedì 26 gennaio 2012 - 16.00 Sala Seminari del DISCO (I piano)



Since its introduction, splicing, as an operation on words has been shown to be intimately connected to a wide range of concepts and notions in formal language theory. The motivation for the splicing operation comes from molecular biology. Certain restriction enzymes recognize a given sequence of nucleotides in the DNA and cleave the molecule leaving short overhang. Then molecules with complementary overhangs can be ligated and produce cross-over in the molecules. In the original paper, T. Head asked the question about the generative power of the splicing operation. There, he pointed out a connection between splicing and a well known notion of a constant in a language (introduced by Schutzenberger) and proved that when the overhangs left by the enzymes are constants, then the set of molecules generated by a finite set of enzymes represents a strictly locally testable language.

Formally, a splicing system (or H -system) is a triple H = (A, I , R), where A is a finite alphabet, I ⊆ A is the initial language and R ⊆ (A*)^4  is the set of rules. Splicing by a rule r ∈ R, denoted r = (u1 |u2 , u3 |u4 ) is obtained when r is applied to two words x1 u1 u2 x2 and y1 u3 u4 y2 producing words x1 u1 u4 y2 and y1 u3 u2 x2 . The formal language generated by the splicing system is the smallest language containing I and closed under all possible splicing defined by R. This definition, reformulated by G. Paun, is nowadays a standard. 

Theoretical results in splicing systems theory have contributed to the growth of a new research field in formal language theory focused on the modeling of biochemical processes and systems. It was proven that, given a finite initial set of words and a finite set of rules, the language generated by the splicing system is regular. However, there are simple examples of regular languages (for example (aa)*) that cannot be generated by any splicing system. Unfortunately, characterization of the class of regular splicing languages remains elusive.

There have been successes in characterizing certain subclasses of splicing languages, for example those generated by reflexive rules (if (u1 |u2 , u3 |u4 ) is in R then both (u1 |u2 , u1 |u2 ) and (u3 |u4 , u3 |u4 ) are in R) and those generated by symmetric rules (if (u1 |u2 , u3 |u4 ) is in R then (u3 |u4 , u1 |u2) is in R). Reflexivity and symmetry are natural properties for splicing systems.

An example of regular splicing language that is neither reflexive nor symmetric was provided, and it had been proven that it is decidable whether a given regular language is a reflexive splicing language. A quite different characterization of reflexive symmetric splicing languages was given and it had been extended to the general class of reflexive regular languages. Not surprisingly, in this characterization the notion of constant plays an essential rule. 

It seems that the notion of a constant may play vital role in solving the open problem of characterizing of the whole class of regular splicing languages. Indeed, it has been conjectured as folklore that a necessary condition for a regular language to be splicing is that it must have a constant. Recently, we solved this question by proving that the conjecture is true. In this talk we briefly sketch the main steps of the proof and point out the strict connection with synchronizing words and properties of the minimal finite state automaton for a regular splicing language.


(This talk is based on a joint work with Natasha Jonoska, University of South Florida -Tampa-)