RE2DFA README: -------------- STEPHEN ROLLER scroller Instructions to use: -------------------- cd into the directory containing re2dfa.py eos% add python24 eos% python re2dfa.py Verbose output will be put in output.txt. The minimized DFA will be print to stdout. The program will then allow input of words. Each will be accepted and rejected. Press control-c to exit. VALID CHARACTERS: a b u U o O * EXAMPLE: eos% python re2dfa.py Enter regex: oo*bb*a Infix: (a*) o (b o (b*)) Output in output.txt. Final DFA: Q = {1, 2, 3} q0 = 3 F = {2} d = 1: a -> 1 1: b -> 2 2: b -> 2 2: b -> 2 3: a -> 1 3: b -> 2 (^c to exit) input?: Reject (^c to exit) input?: aa Reject (^c to exit) input?: bbbb Accept (^c to exit) input?: aaabb Accept (^c to exit) input?: ^C eos% **Make sure to check output.txt for the verbose output.** DESIGN DOCUMENT: ---------------- The program begins by reading in the regex in prefix form and converts it to a tree in the form of lists. For example, U a b gets parsed as ['U', 'a', 'b'] and oa*Uab gets parsed as ['o', ['*', 'a'], ['U', 'a', 'b']]. This tree is then traversed in postfix fashion, "compiling" each node as it goes. Nodes in the alphabet are compiled directly into a two state DFA. Stars, unions and concatenations are performed by merging the sets between the two NFAs and adding all the appropriate transition functions. A compiled NFA has 5 members: Q, F, q0, d and re. Q is the set of all states for the NFA. F is the set of all final states. q0 is the starting state, and d is a dictionary of delta functions. re is the regular expression that this NFA recognizes and is used for output only. The dictionary looks up all the delta functions by current state and gives them as a list of tuples. For example, if state 1 can move to state 2 via 'a', d would look like {'1': [('a', 2)]}. Null transitions are specified by using an empty string, ''. Along each step of the compilation, the intermediate NFA is printed to output. After the tree is completely compiled, the NFA is then converted to a DFA by running the toDFA function. This function works just as explain in class: it uses null closures to emulate going through all null transitions at once. This can create up to 2**n-1 new states. This new DFA is now efficient, but not minimal. simplify() is run on the DFA, which causes all equivalent states to merge. The DFA is now minimal, but the names of each state are still ugly. They were formed by concatenating state names from the NFA during the conversion. For this, renumber() is executed, which gives each state a nicer name. This minimized and beautified DFA is then printed to output. At this point, the program will prompt you for input. This allows you to feed words into the DFA, which it will either accept or reject. (It wouldn't be a very good regex program if it couldn't actually match stuff.) To exit, just press ctrl-C. Each of the basic regular expression operations are found in the functions mempty, mnone, exact, union, star, and concat. Without the parse tree, these could be used alone to create an NFA: stephen regex $ python Python 2.5.1 (r251:54863, Jan 17 2008, 19:35:17) [GCC 4.0.1 (Apple Inc. build 5465)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import re2dfa >>> dfa = re2dfa.union(re2dfa.exact('a'), re2dfa.exact('b')).toDFA() >>> dfa.accepts('a') True >>> dfa.accepts('b') True >>> dfa.accepts('') False Finally, there are two classes, NFA and DFA. NFA.toDFA() creates a DFA class, which inherits from NFA. DFA has a method called accepts, which allows the machine to test against a word. In order to avoid support for branching, NFA does not have this ability. The rest of the code is just helper functions.