%include Prolog \Title{Ein einfacher Übersetzer: |Replace|} \begin{document} \maketitle %------------------------------------------------------------------------------ \section{Einführung} %------------------------------------------------------------------------------ |Replace| ist ein einfaches Beispiel für einen Übersetzer: das Program überführt ein Skript der Form (Muster/Ersetzung) % \begin{verbatim} he/er she/sie his/sein hers/ihres \end{verbatim} % in ein C-Programm % \begin{verbatim} #include #include int main() { int c = getchar(); state: switch (c) { case EOF: exit(EXIT_SUCCESS); <..> case 'h': c = getchar(); goto stateh; <..> } <..> stateh: switch (c) { case 'e': c = getchar(); goto statehe; case 'i': c = getchar(); goto statehi; default: putchar('h'); goto state; } statehe: switch (c) { case 'r': c = getchar(); goto stateher; default: putchar('e'); putchar('r'); goto state; } <..> } \end{verbatim} `Longest and first match' Semantik: |"xx"|, |"xx"| (nicht |"xrsx"|), |"xrsx"| (nicht |"xsx"|; diese Ersetzung läßt sich erreichen, wenn man die Regel |"shers/siers"| ergänzt). Ist die Ersetzung |"/dot"| zulässig (leere linke Seite)? \paragraph{Übersetzung und Anwendung} \begin{verbatim} ghci -Wall -package rh Replace.lhs ghc --make -Wall -o replace -package rh Replace.lhs ./replace table gcc -Wall table.c ./a.out gcc -Wall -S table.c \end{verbatim} \paragraph{Literatur} D.E. Knuth, J.H. Morris, V.R. Pratt: Fast Pattern Matching in Strings. SIAM Journal of Computing 6, 2, 323--350 (1977). A.V. Aho, M.J. Corasick: Efficient String Matching: An Aid to Bibliographic Search. Communications of the ACM 18, 6, 333--340 (1975). %if False %------------------------------------------------------------------------------ \section{Module header} %------------------------------------------------------------------------------ > import Char > import List (sort) > import System > > import Prettier hiding (concat) > import qualified Prettier as Pretty > import Unique (unique) %endif %------------------------------------------------------------------------------ \section{Source language: abstract and concrete syntax} %------------------------------------------------------------------------------ Abstract syntax. > type Script = [(String, String)] Concrete to abstract syntax. > script :: String -> Script > script x = [ (r, s) | l <- lines x, let (r, '/' : s) = break (=='/') l ] \todo{Escape Mechanismus für Ersetzungen, die |'/'| beinhalten.} > ascii :: [Char] > ascii = ['\NUL' .. '\127'] %------------------------------------------------------------------------------ \section{Intermediate representation} %------------------------------------------------------------------------------ Tries with failure links (algorithm AC). > data Trie = Root [(Char, Trie)] > | Node String [(Char, Trie)] Action Trie > deriving (Show) A |Node| contains a label (the prefix of a pattern), a list of labelled edges and a failure link (possibly including a substitution). > label :: Trie -> String > label (Root _ ) = "" > label (Node p _ _ _ ) = p > > succs :: Trie -> [(Char, Trie)] > succs (Root ts ) = ts > succs (Node _ ts _ _ ) = ts > > data Action = Skip | Replace String String > deriving (Show) > instance Pretty Trie where > prettyPrec d (Root ts) = condParens (d > 9) (string "Root" <> sp <> block 5 (pretty ts)) > prettyPrec d (Node s ts a f) = condParens (d > 9) (string "Node" <> sp <> block 5 > (pretty s pretty ts prettyPrec 10 a prettyLink f)) > > prettyLink :: Trie -> Doc > prettyLink (Root _) = string "@root" > prettyLink (Node s _ _ _) = string ("@" ++ s) > > instance Pretty Action %------------------------------------------------------------------------------ \section{Translation to intermediate representation} %------------------------------------------------------------------------------ Abstract syntax to intermediate representation. > trie :: Script -> Trie > trie scr = root > where > root = Root [ (c, make [c] (c .\ scr) Skip root) | c <- ascii ] -- oder: |['a' .. 'z']| > > make p r a f = Node p [ (c, make (p ++ [c]) (c .\ r) a' (follow c f')) | c <- first r ] a' f' > where (a', f') = (lookup [] r `use` \ y -> (Replace p y, root)) ? (a, f) The nested function |make| takes four arguments: a prefix, a script (regulärer Ausdruck), and a failure link. In a success node (|lookup [] r = Just y|), the failure link points to the root (the search starts afresh). Follow the failure link ($\epsilon$-closure). > follow :: Char -> Trie -> Trie > follow c root@(Root ts) = lookup c ts ? root > follow c (Node _ ts _ f) = lookup c ts ? follow c f Division operator (or derivative). > (.\) :: Char -> Script -> Script > c .\ r = [ (as, s) | (a : as, s) <- r, a == c ] First set. > first :: Script -> [Char] > first r = unique (sort [ a | (a : _, _) <- r ]) %------------------------------------------------------------------------------ \section{Target language: abstract and concrete syntax} %------------------------------------------------------------------------------ Abstract C. > data Decl = Include String -- include directive > | Function Type String [Type] [Decl] [Stat] -- function declaration > | Var Type String -- variable declarations > deriving (Show) \todo{|pretty| statt |prettyPrec _d|}. > instance Pretty Decl where > pretty (Include s) = string "#include" <> sp <> string s <> char '\n' > pretty (Function t x ts ds ss) > = group ( prettyPrec 0 t <> sp <> string x <> parens (prettySequ 2 ts) <> nl > <> pretty (Compound ds ss)) > pretty (Var t x) = group (prettyPrec 0 t <> nl <> string x <> char ';') > data Stat = ExprC Expr -- expression statement > | Goto String -- goto and labelled statements > | Label String Stat > | Case Expr Stat > | Default Stat > | Break -- break statement > | Switch Expr Stat -- switch statement > | Compound [Decl] [Stat] -- compound statement > deriving (Show) Smart constructor. > compound [s] = s > compound ss = Compound [] ss > instance Pretty Stat where > pretty (ExprC e) = prettyPrec 0 e <> char ';' > pretty (Goto l) = group (string "goto" <> nl <> string l <> char ';') > pretty (Label l s) = group (string l <> char ':' <> nest 2 (nl <> pretty s)) > pretty (Case e s) = group ( string "case" <> sp <> prettyPrec 0 e <> char ':' > <> nest 2 (nl <> pretty s)) > pretty (Default s) = group (string "default" <> char ':' <> nest 2 (nl <> pretty s)) > pretty (Break) = string "break" <> char ';' > pretty (Switch e s) = group ( string "switch" <> sp <> parens (prettyPrec 0 e) > <> nest 2 (nl <> pretty s)) > pretty (Compound ds ss) > = group ( char '{' > <> nest 2 (Pretty.concat [ nl <> pretty s | s <- ds ]) > <> nest 2 (Pretty.concat [ nl <> pretty s | s <- ss ]) > <> nl <> char '}') > data Expr = Const Char -- constant expressions > | IdentC String -- identifier > | Assign Expr Expr -- assignment > | Call Expr [Expr] -- function call > deriving (Show) Smart constructor. > call x es = Call (IdentC x) es > instance Pretty Expr where > prettyPrec _ (Const c) = string (cChar c) > prettyPrec _ (IdentC x) = string x > prettyPrec d (Assign e1 e2) = condParens (d > 1) ( group (prettyPrec 2 e1 <> nl <> char '=' > <> nl <> prettyPrec 1 e2)) > prettyPrec _ (Call e es) = group (prettyPrec 10 e <> parens (prettySequ 2 es)) \todo{Does this really generate a legal C character?} > cChar :: Char -> String > cChar c > | c < ' ' || '~' < c || c `elem` "\\'" = "'\\x" ++ hex (fromEnum c) ++ "'" > | otherwise = "'" ++ [c] ++ "'" > data Type = Void > | IntC > deriving (Show) > instance Pretty Type where > prettyPrec _ Void = string "void" > prettyPrec _ IntC = string "int" > prettyC :: [Decl] -> String > prettyC cProgram = render (Page 40) (Pretty.concat (map pretty cProgram) <> char '\n') %------------------------------------------------------------------------------ \section{Translation to target language} %------------------------------------------------------------------------------ Intermediate representation to target language. > abstractC :: Trie -> [Decl] > abstractC t = [ Include "" > , Include "" > , Function IntC "main" [] > [ Var IntC "c" ] > ( [ nextchar, Goto (cName "") ] ++ > [ code u | u <- states t ]) > ] > states :: Trie -> [Trie] > states t = t : [ s | (_, t') <- succs t, s <- states t' ] > code :: Trie -> Stat > code (Root ts) > = Label (cName "") ( > Switch (IdentC "c") (compound ( > [ Case (IdentC "EOF") (ExprC (call "exit" [IdentC "EXIT_SUCCESS"])) ] -- |EOF = '\255'| > ++ [ Case (Const c) (compound [nextchar, Goto (state t)]) > | (c, t) <- ts ] ))) > code (Node p ts a f) > = Label (cName p) ( > Switch (IdentC "c") (compound ( > [ Case (Const c) (compound [nextchar, Goto (state t)]) > | (c, t) <- ts ] > ++ [ Default (compound (exec a ++ [ Goto (state f) ])) ] ))) > where > p' = label f > exec Skip = puts (revTake (length p') p) > exec (Replace x y) = puts (y ++ revTake (length p') (drop (length x) p)) Ausgabe: |y ++ x .\ p ./ p'|, also |p| ohne den Präfix |x| und ohne den Suffix |p'|. > state :: Trie -> String > state t = cName (label t) > cName :: String -> String > cName s = "state" ++ concatMap mangle s > where > mangle c | 'A' <= c && c <= 'Z' || 'a' <= c && c <= 'z' || isDigit c = [c] > | otherwise = '_' : show (fromEnum c) > nextchar = ExprC (Assign (IdentC "c") (call "getchar" [])) > putchar e = ExprC (call "putchar" [e]) > puts s = [ putchar (Const c) | c <- s ] %------------------------------------------------------------------------------ \section{Helper functions} %------------------------------------------------------------------------------ Exception handling: > (?) :: Maybe aT -> aT -> aT > Nothing ? b = b > Just a ? _ = a A variant of |fmap|: > use :: Maybe aT -> (aT -> bT) -> Maybe bT > use (Nothing) _ = Nothing > use (Just a) f = Just (f a) > prettySequ :: (Pretty aT) => > Int -> [aT] -> Doc > prettySequ d as = intersperse (char ',' <> nl) [ prettyPrec d a | a <- as ] > hex :: Int -> String > hex = map digit . pad 2 . reverse . digits 16 Digits of a \emph{natural} number (for a given base). > digits :: (Integral aT) => > aT -> aT -> [aT] > digits _ 0 = [] > digits base n = r : digits base q > where (q, r) = divMod n base > digit :: Int -> Char > digit n > | n < 10 = toEnum (n + fromEnum '0') > | otherwise = toEnum (n - 10 + fromEnum 'A') > pad :: (Num aT) => Int -> [aT] -> [aT] > pad n x = replicate (n - length x) 0 ++ x > revTake :: Int -> [aT] -> [aT] > revTake n x = take (length x - n) x %------------------------------------------------------------------------------ \section{Test suite} %------------------------------------------------------------------------------ > pp :: (Pretty aT) => aT -> IO () > pp a = putStrLn (render (Page 60) (pretty a)) > ex1 = [("he", "er"), ("she", "sie"), ("his", "sein"), ("hers", "ihres")] > > test r = prettyC (abstractC (trie r)) \begin{verbatim} putStrLn (test ex1) putStrLn (test [("", ".")]) putStrLn (test [(".&", "urgs")]) putStrLn (test [("hello world", "Hallo!")]) writeFile "xx.c" (test ex1) gcc -Wall xx.c \end{verbatim} Test der einzelnen Phasen: < let inp = "he/er\nshe/sie\nhis/sein\nhers/ihres" < putStrLn inp < let src = script inp < src < let ir = trie src < pp ir < let trg = abstractC ir < let out = prettyC trg < putStrLn out %------------------------------------------------------------------------------ \section{Putting it all together} %------------------------------------------------------------------------------ > compile :: String -> String > compile inp = out > where src = script inp -- Syntaxanalyse > ir = trie src -- Generierung von Zwischencode > trg = abstractC ir -- Generierung von C > out = prettyC trg -- Ausgabe von C (`pretty printing') > main :: IO () > main = do args <- getArgs > case args of > [filename] -> do contents <- readFile filename > let cSource = compile contents > writeFile (filename ++ ".c") cSource > -- |system ("gcc -Wall " ++ filename ++ ".c")| > exitWith ExitSuccess > _ -> do putStrLn "usage: replace