Blogged by Ujihisa. Standard methods of programming and thoughts including Clojure, Vim, LLVM, Haskell, Ruby and Mathematics written by a Japanese programmer. github/ujihisa

Monday, December 26, 2011

Compiling a Language to Whitespace

Whitespace is a widely-known instruction set on a stack-machine virtual machine.

The below is a sample hello world in Whitespace intermediate language. I added some notes that begin with #.

Push 0
Push 72
Store
# storing the value 72 to 0th heap
Push 1
Push 101
Store
# storing the value 101 to 1st heap
Push 2
Push 108
Store
Push 3
Push 108
Store
Push 4
Push 111
Store
Push 5
Push 44
Store
Push 6
Push 32
Store
Push 7
Push 119
Store
Push 8
Push 111
Store
Push 9
Push 114
Store
Push 10
Push 108
Store
Push 11
Push 100
Store
Push 12
Push 32
Store
Push 13
Push 111
Store
Push 14
Push 102
Store
Push 15
Push 32
Store
Push 16
Push 115
Store
Push 17
Push 112
Store
Push 18
Push 97
Store
Push 19
Push 99
Store
Push 20
Push 101
Store
Push 21
Push 115
Store
# (cont.d...)
Push 22
Push 33
Store
Push 23
Push 0
Store
Push 0
Call "\t \t  \t\t   \t \t\t\t \t  \t \t\t  \t  \t\t\t \t\t\t \t\t\t "
# Jumping to the ☃ mark with remembering this place to return later
Call "\t \t  \t\t  \t\t\t \t\t \t  \t \t\t   \t\t \t\t \t\t\t \t\t\t \t \t  \t\t  \t\t\t \t\t "
# Jumping to the ☁ mark with remembering this place to return later
End
# Terminating program.
Label "  \t  \t\t   \t  \t\t \t    \t\t "
Infix Plus
Return
Label "\t \t  \t\t   \t \t\t\t \t  \t \t\t  \t  \t\t\t \t\t\t \t\t\t "
# ☃
Dup
# Copying the top value of the stack. It's 0 (if you see here for the first time.)
Retrieve
# Getting the 0th value of the heap. It's 72 (if you see here for the first time.)
Dup
If Zero "  \t  \t\t  \t\t\t \t\t \t \t  \t\t \t\t\t\t\t \t \t \t  \t\t   \t \t\t\t \t  \t \t\t  \t  \t\t\t \t\t\t \t\t\t "
OutputChar
Push 1
Infix Plus
Jump "\t \t  \t\t   \t \t\t\t \t  \t \t\t  \t  \t\t\t \t\t\t \t\t\t "
Label "  \t  \t\t  \t\t\t \t\t \t \t  \t\t \t\t\t\t\t \t \t \t  \t\t   \t \t\t\t \t  \t \t\t  \t  \t\t\t \t\t\t \t\t\t "
Discard
Discard
Return
# Ending the function call and going back to the previous place.
Label "  \t  \t\t \t    \t\t \t \t  \t\t  \t  \t\t\t "
Dup
Dup
ReadChar
Retrieve
Dup
Push 10
Infix Minus
If Zero "  \t  \t\t  \t\t\t \t\t \t \t  \t\t \t\t\t\t\t \t   \t  \t\t \t    \t\t \t \t  \t\t  \t  \t\t\t "
Discard
Push 1
Infix Plus
Jump "  \t  \t\t \t    \t\t \t \t  \t\t  \t  \t\t\t "
Label "  \t  \t\t  \t\t\t \t\t \t \t  \t\t \t\t\t\t\t \t   \t  \t\t \t    \t\t \t \t  \t\t  \t  \t\t\t "
Discard
Push 1
Infix Plus
Push 0
Store
Return
Label "\t \t  \t\t  \t\t\t \t\t \t  \t \t\t   \t\t \t\t \t\t\t \t\t\t \t \t  \t\t  \t\t\t \t\t "
# ☁
Push 10
Push 13
OutputChar
OutputChar
Return

And the result is

Hello, world of spaces!

You can run it with a little bit fixes of the official whitespace implementation wspace 0.3 written in Haskell.

A Language

I made an experimental small language and it's compiler.

(begin
  (def 0 98)
  (f 90 7)
  (putc (ref 0))
  (end)
  (defn f (x y)
        (putc (g x y)))
  (defn g (x y)
        (+ x y)))

Syntax: an expression begins with "(" and a name, arguments for it, and end with ")". If an expression has a value, you have to use the value with enclosing another expression. Otherwise the behavior is undefined. If an expression does not have a value, you must not enclose it as an argument. Otherwise the behavior is undefined.

  • begin
    • combines some expressions that don't have return values
  • def {index} {value}
    • assigns the {value} to {index}th slot of heap
  • ref {index} -> {value}
    • returns the {value} of {index}th slot of heap
  • putc {value}
    • outputs a character which ASCII code is {value}
  • end
    • terminates program
  • defn {name} ({args}) {body}
    • defines a function
    • if a program come to defn without using call, the behavior is undefined.

* + {value} {value} -> {value} * obviously

You can "call" a function you made just with ({name} {arg1} {arg2}). You can use arguments of the function just by an identifier like x.

Sample code

(begin
  (putc (+ 50 40))
  (end))

That is compiled to

Push 50
Push 40
Infix Plus
OutputChar
End

and shows 'Z'.

I little bit complicated example

(begin
  (def 0 98)
  (f 90 7)
  (putc (ref 0))
  (end)
  (defn f (x y)
        (putc (g x y)))
  (defn g (x y)
        (+ x y)))

compiled to

Push 0
Push 98
Store
Push 90
Push 7
Call "f"
Push 0
Retrieve
OutputChar
End
Label "f"
Push (-1)
Swap
Store
Push (-2)
Swap
Store
Push (-1)
Retrieve
Push (-2)
Retrieve
Call "g"
OutputChar
Return
Label "g"
Push (-3)
Swap
Store
Push (-4)
Swap
Store
Push (-3)
Retrieve
Push (-4)
Retrieve
Infix Plus
Return

and shows "ab".

You may have noticed that the function argument is in negative number of heaps. If you updates them like by (def -3 100) it may result in breaking something, but since this implementation doesn't support negative literals, it remains safe.

The compiler is below, written in Haskell.

import qualified VM as V
import qualified Text.Parsec as P
import Control.Applicative ((<|>), (<$>))
import qualified Control.Monad.State as S
import qualified Data.Map as M
import Data.Maybe (fromJust)

data Intermediate = Comment String
  | Inst V.Instruction
  | Paramdef String
  | Paramref String
  deriving Show

type LispParser = P.ParsecT String () (S.State String)
type ParamMap = M.Map String Integer

main :: IO ()
main = do
  code <- readFile "hworld.lisp"
  --mapM_ print $ parse code
  let runtime = compile (parse code)
  mapM_ print runtime
  putStrLn "--"
  V.vm (V.VM runtime (V.Stack []) (V.Stack []) M.empty 0)

parse :: String -> [Intermediate]
parse str = either (error . show) id $
  S.evalState (P.runPT parseExpr () "lisp" str) "toplevel"

parseExpr :: LispParser [Intermediate]
parseExpr = P.try parseInt
         <|> parseDefn
         <|> parseBuiltin
         <|> parseApply
         <|> parseVar

parseInt :: LispParser [Intermediate]
parseInt = do
  x <- P.many1 P.digit
  return [Inst $ V.Push $ read x]

parseAtom :: LispParser String
parseAtom = P.many1 $ P.noneOf " \t\n()"

parseDefn :: LispParser [Intermediate]
parseDefn = do
  P.try $ do
    ignoringSpaces $ P.char '('
    ignoringSpaces $ P.string "defn"
  fname <- requireSpaces parseAtom
  S.lift $ S.put fname

  ignoringSpaces $ P.char '('
  names <- ignoringSpaces $ parseAtom `P.sepBy` P.skipMany1 P.space
  ignoringSpaces $ P.char ')'

  body <- ignoringSpaces parseExpr
  ignoringSpaces $ P.char ')'
  S.lift $ S.put "toplevel"
  return $
    Comment "(defn" :
    Inst (V.Label fname) :
    map (Paramdef . ((fname ++ "/") ++)) names ++
    body ++ [Inst V.Return] ++ [Comment ")"]

parseBuiltin :: LispParser [Intermediate]
parseBuiltin = P.try $ do
  (fname, xs) <- atomAndArgs
  x <- case (fname, length xs) of
       ("+", 2) -> return [Inst $ V.Infix V.Plus]
       ("putc", 1) -> return [Inst V.OutputChar]
       ("def", 2) -> return [Inst V.Store]
       ("ref", 1) -> return [Inst V.Retrieve]
       ("end", 0) -> return [Inst V.End]
       ("begin", _) -> return []
       _ -> fail "omg"
  return $ Comment ('(' : fname) : concat xs ++ x ++ [Comment ")"]

parseApply :: LispParser [Intermediate]
parseApply = do
  (fname, xs) <- atomAndArgs
  return $ concat xs ++ [Inst $ V.Call fname]

atomAndArgs :: LispParser (String, [[Intermediate]])
atomAndArgs = do
  ignoringSpaces $ P.char '('
  fname <- ignoringSpaces parseAtom
  xs <- ignoringSpaces $ parseExpr `P.sepBy` P.many1 P.space
  P.char ')'
  return (fname, xs)

parseVar :: LispParser [Intermediate]
parseVar = do
  name <- ignoringSpaces $ P.many1 $ P.noneOf " \t\n()"
  fname <- S.lift S.get
  return [Paramref $ fname ++ '/' : name]

ignoringSpaces :: LispParser a -> LispParser a
ignoringSpaces f = P.skipMany P.space >> f

requireSpaces :: LispParser a -> LispParser a
requireSpaces f = P.skipMany1 P.space >> f

compile :: [Intermediate] -> [V.Instruction]
compile inters = concat $ S.evalState (mapM compile' inters) M.empty

compile' :: Intermediate -> S.State ParamMap [V.Instruction]
compile' (Comment _) = return []
compile' (Inst x) = return [x]
compile' (Paramdef name) = do
  idx <- pred . negate . fromIntegral . M.size <$> S.get
  S.modify $ M.insert name idx
  return [V.Push idx, V.Swap, V.Store]
compile' (Paramref name) = do
  idx <- fromJust . M.lookup name <$> S.get
  return [V.Push idx, V.Retrieve]

This code depends on VM.hs from wspace-0.3 to share the data structure of VM Instruction and to execute the compiled program. If you only want to compile given programs, you don't need VM.hs but just to add the following definition.

data Instruction =
       Push Integer
     | Dup
     | Ref Int
     | Slide Int
     | Swap
     | Discard
     | Infix Op
     | Store
     | Retrieve
     | Label Label
     | Call Label
     | Jump Label
     | If Test Label
     | Return
     | OutputChar
     | OutputNum
     | ReadChar
     | ReadNum
     | End
   deriving (Show,Eq)

By the way wspace-0.3 had an issue that it can only handle sequential indices of heap. You can store values in 0th, 1st and 2nd slots of heap, but you cannot store in 100th without completing all indices between 0 to 100. I wrote a patch to allow any index. Feel free to use it.

diff --git VM.hs VM.hs
index c9e96ab..bb74374 100644
--- VM.hs
+++ VM.hs
@@ -1,6 +1,8 @@
 module VM where

 import IO
+import qualified Data.Map as M
+import Data.Maybe (fromJust)

 {- Stack machine for running whitespace programs -}

@@ -35,7 +37,7 @@ type Loc = Integer

 type Program = [Instruction]
 newtype Stack = Stack [Integer]
-type Heap = [Integer]
+type Heap = M.Map Integer Integer

 data VMState = VM {
         program :: Program,
@@ -130,13 +132,7 @@ findLabel' m (_:xs) i = findLabel' m xs (i+1)
 -- Heap management

 retrieve :: Integer -> Heap -> IO Integer
-retrieve x heap = return (heap!!(fromInteger x))
+retrieve x heap = return $ fromJust $ M.lookup x heap

 store :: Integer -> Integer -> Heap -> IO Heap
-store x 0 (h:hs) = return (x:hs)
-store x n (h:hs) = do hp <- store x (n-1) hs
-             return (h:hp)
-store x 0 [] = return (x:[])
-store x n [] = do hp <- store x (n-1) [] 
-         return (0:hp)
-
+store x n h = return $ M.insert n x h

No comments:

Post a Comment

Followers