Blogged by Ujihisa. Standard methods of programming and thoughts including Clojure, Vim, LLVM, Haskell, Ruby and Mathematics written by a Japanese programmer. github/ujihisa

Monday, December 26, 2011

Compiling a Language to Whitespace

Whitespace is a widely-known instruction set on a stack-machine virtual machine.

The below is a sample hello world in Whitespace intermediate language. I added some notes that begin with #.

Push 0
Push 72
Store
# storing the value 72 to 0th heap
Push 1
Push 101
Store
# storing the value 101 to 1st heap
Push 2
Push 108
Store
Push 3
Push 108
Store
Push 4
Push 111
Store
Push 5
Push 44
Store
Push 6
Push 32
Store
Push 7
Push 119
Store
Push 8
Push 111
Store
Push 9
Push 114
Store
Push 10
Push 108
Store
Push 11
Push 100
Store
Push 12
Push 32
Store
Push 13
Push 111
Store
Push 14
Push 102
Store
Push 15
Push 32
Store
Push 16
Push 115
Store
Push 17
Push 112
Store
Push 18
Push 97
Store
Push 19
Push 99
Store
Push 20
Push 101
Store
Push 21
Push 115
Store
# (cont.d...)
Push 22
Push 33
Store
Push 23
Push 0
Store
Push 0
Call "\t \t  \t\t   \t \t\t\t \t  \t \t\t  \t  \t\t\t \t\t\t \t\t\t "
# Jumping to the ☃ mark with remembering this place to return later
Call "\t \t  \t\t  \t\t\t \t\t \t  \t \t\t   \t\t \t\t \t\t\t \t\t\t \t \t  \t\t  \t\t\t \t\t "
# Jumping to the ☁ mark with remembering this place to return later
End
# Terminating program.
Label "  \t  \t\t   \t  \t\t \t    \t\t "
Infix Plus
Return
Label "\t \t  \t\t   \t \t\t\t \t  \t \t\t  \t  \t\t\t \t\t\t \t\t\t "
# ☃
Dup
# Copying the top value of the stack. It's 0 (if you see here for the first time.)
Retrieve
# Getting the 0th value of the heap. It's 72 (if you see here for the first time.)
Dup
If Zero "  \t  \t\t  \t\t\t \t\t \t \t  \t\t \t\t\t\t\t \t \t \t  \t\t   \t \t\t\t \t  \t \t\t  \t  \t\t\t \t\t\t \t\t\t "
OutputChar
Push 1
Infix Plus
Jump "\t \t  \t\t   \t \t\t\t \t  \t \t\t  \t  \t\t\t \t\t\t \t\t\t "
Label "  \t  \t\t  \t\t\t \t\t \t \t  \t\t \t\t\t\t\t \t \t \t  \t\t   \t \t\t\t \t  \t \t\t  \t  \t\t\t \t\t\t \t\t\t "
Discard
Discard
Return
# Ending the function call and going back to the previous place.
Label "  \t  \t\t \t    \t\t \t \t  \t\t  \t  \t\t\t "
Dup
Dup
ReadChar
Retrieve
Dup
Push 10
Infix Minus
If Zero "  \t  \t\t  \t\t\t \t\t \t \t  \t\t \t\t\t\t\t \t   \t  \t\t \t    \t\t \t \t  \t\t  \t  \t\t\t "
Discard
Push 1
Infix Plus
Jump "  \t  \t\t \t    \t\t \t \t  \t\t  \t  \t\t\t "
Label "  \t  \t\t  \t\t\t \t\t \t \t  \t\t \t\t\t\t\t \t   \t  \t\t \t    \t\t \t \t  \t\t  \t  \t\t\t "
Discard
Push 1
Infix Plus
Push 0
Store
Return
Label "\t \t  \t\t  \t\t\t \t\t \t  \t \t\t   \t\t \t\t \t\t\t \t\t\t \t \t  \t\t  \t\t\t \t\t "
# ☁
Push 10
Push 13
OutputChar
OutputChar
Return

And the result is

Hello, world of spaces!

You can run it with a little bit fixes of the official whitespace implementation wspace 0.3 written in Haskell.

A Language

I made an experimental small language and it's compiler.

(begin
  (def 0 98)
  (f 90 7)
  (putc (ref 0))
  (end)
  (defn f (x y)
        (putc (g x y)))
  (defn g (x y)
        (+ x y)))

Syntax: an expression begins with "(" and a name, arguments for it, and end with ")". If an expression has a value, you have to use the value with enclosing another expression. Otherwise the behavior is undefined. If an expression does not have a value, you must not enclose it as an argument. Otherwise the behavior is undefined.

  • begin
    • combines some expressions that don't have return values
  • def {index} {value}
    • assigns the {value} to {index}th slot of heap
  • ref {index} -> {value}
    • returns the {value} of {index}th slot of heap
  • putc {value}
    • outputs a character which ASCII code is {value}
  • end
    • terminates program
  • defn {name} ({args}) {body}
    • defines a function
    • if a program come to defn without using call, the behavior is undefined.

* + {value} {value} -> {value} * obviously

You can "call" a function you made just with ({name} {arg1} {arg2}). You can use arguments of the function just by an identifier like x.

Sample code

(begin
  (putc (+ 50 40))
  (end))

That is compiled to

Push 50
Push 40
Infix Plus
OutputChar
End

and shows 'Z'.

I little bit complicated example

(begin
  (def 0 98)
  (f 90 7)
  (putc (ref 0))
  (end)
  (defn f (x y)
        (putc (g x y)))
  (defn g (x y)
        (+ x y)))

compiled to

Push 0
Push 98
Store
Push 90
Push 7
Call "f"
Push 0
Retrieve
OutputChar
End
Label "f"
Push (-1)
Swap
Store
Push (-2)
Swap
Store
Push (-1)
Retrieve
Push (-2)
Retrieve
Call "g"
OutputChar
Return
Label "g"
Push (-3)
Swap
Store
Push (-4)
Swap
Store
Push (-3)
Retrieve
Push (-4)
Retrieve
Infix Plus
Return

and shows "ab".

You may have noticed that the function argument is in negative number of heaps. If you updates them like by (def -3 100) it may result in breaking something, but since this implementation doesn't support negative literals, it remains safe.

The compiler is below, written in Haskell.

import qualified VM as V
import qualified Text.Parsec as P
import Control.Applicative ((<|>), (<$>))
import qualified Control.Monad.State as S
import qualified Data.Map as M
import Data.Maybe (fromJust)

data Intermediate = Comment String
  | Inst V.Instruction
  | Paramdef String
  | Paramref String
  deriving Show

type LispParser = P.ParsecT String () (S.State String)
type ParamMap = M.Map String Integer

main :: IO ()
main = do
  code <- readFile "hworld.lisp"
  --mapM_ print $ parse code
  let runtime = compile (parse code)
  mapM_ print runtime
  putStrLn "--"
  V.vm (V.VM runtime (V.Stack []) (V.Stack []) M.empty 0)

parse :: String -> [Intermediate]
parse str = either (error . show) id $
  S.evalState (P.runPT parseExpr () "lisp" str) "toplevel"

parseExpr :: LispParser [Intermediate]
parseExpr = P.try parseInt
         <|> parseDefn
         <|> parseBuiltin
         <|> parseApply
         <|> parseVar

parseInt :: LispParser [Intermediate]
parseInt = do
  x <- P.many1 P.digit
  return [Inst $ V.Push $ read x]

parseAtom :: LispParser String
parseAtom = P.many1 $ P.noneOf " \t\n()"

parseDefn :: LispParser [Intermediate]
parseDefn = do
  P.try $ do
    ignoringSpaces $ P.char '('
    ignoringSpaces $ P.string "defn"
  fname <- requireSpaces parseAtom
  S.lift $ S.put fname

  ignoringSpaces $ P.char '('
  names <- ignoringSpaces $ parseAtom `P.sepBy` P.skipMany1 P.space
  ignoringSpaces $ P.char ')'

  body <- ignoringSpaces parseExpr
  ignoringSpaces $ P.char ')'
  S.lift $ S.put "toplevel"
  return $
    Comment "(defn" :
    Inst (V.Label fname) :
    map (Paramdef . ((fname ++ "/") ++)) names ++
    body ++ [Inst V.Return] ++ [Comment ")"]

parseBuiltin :: LispParser [Intermediate]
parseBuiltin = P.try $ do
  (fname, xs) <- atomAndArgs
  x <- case (fname, length xs) of
       ("+", 2) -> return [Inst $ V.Infix V.Plus]
       ("putc", 1) -> return [Inst V.OutputChar]
       ("def", 2) -> return [Inst V.Store]
       ("ref", 1) -> return [Inst V.Retrieve]
       ("end", 0) -> return [Inst V.End]
       ("begin", _) -> return []
       _ -> fail "omg"
  return $ Comment ('(' : fname) : concat xs ++ x ++ [Comment ")"]

parseApply :: LispParser [Intermediate]
parseApply = do
  (fname, xs) <- atomAndArgs
  return $ concat xs ++ [Inst $ V.Call fname]

atomAndArgs :: LispParser (String, [[Intermediate]])
atomAndArgs = do
  ignoringSpaces $ P.char '('
  fname <- ignoringSpaces parseAtom
  xs <- ignoringSpaces $ parseExpr `P.sepBy` P.many1 P.space
  P.char ')'
  return (fname, xs)

parseVar :: LispParser [Intermediate]
parseVar = do
  name <- ignoringSpaces $ P.many1 $ P.noneOf " \t\n()"
  fname <- S.lift S.get
  return [Paramref $ fname ++ '/' : name]

ignoringSpaces :: LispParser a -> LispParser a
ignoringSpaces f = P.skipMany P.space >> f

requireSpaces :: LispParser a -> LispParser a
requireSpaces f = P.skipMany1 P.space >> f

compile :: [Intermediate] -> [V.Instruction]
compile inters = concat $ S.evalState (mapM compile' inters) M.empty

compile' :: Intermediate -> S.State ParamMap [V.Instruction]
compile' (Comment _) = return []
compile' (Inst x) = return [x]
compile' (Paramdef name) = do
  idx <- pred . negate . fromIntegral . M.size <$> S.get
  S.modify $ M.insert name idx
  return [V.Push idx, V.Swap, V.Store]
compile' (Paramref name) = do
  idx <- fromJust . M.lookup name <$> S.get
  return [V.Push idx, V.Retrieve]

This code depends on VM.hs from wspace-0.3 to share the data structure of VM Instruction and to execute the compiled program. If you only want to compile given programs, you don't need VM.hs but just to add the following definition.

data Instruction =
       Push Integer
     | Dup
     | Ref Int
     | Slide Int
     | Swap
     | Discard
     | Infix Op
     | Store
     | Retrieve
     | Label Label
     | Call Label
     | Jump Label
     | If Test Label
     | Return
     | OutputChar
     | OutputNum
     | ReadChar
     | ReadNum
     | End
   deriving (Show,Eq)

By the way wspace-0.3 had an issue that it can only handle sequential indices of heap. You can store values in 0th, 1st and 2nd slots of heap, but you cannot store in 100th without completing all indices between 0 to 100. I wrote a patch to allow any index. Feel free to use it.

diff --git VM.hs VM.hs
index c9e96ab..bb74374 100644
--- VM.hs
+++ VM.hs
@@ -1,6 +1,8 @@
 module VM where

 import IO
+import qualified Data.Map as M
+import Data.Maybe (fromJust)

 {- Stack machine for running whitespace programs -}

@@ -35,7 +37,7 @@ type Loc = Integer

 type Program = [Instruction]
 newtype Stack = Stack [Integer]
-type Heap = [Integer]
+type Heap = M.Map Integer Integer

 data VMState = VM {
         program :: Program,
@@ -130,13 +132,7 @@ findLabel' m (_:xs) i = findLabel' m xs (i+1)
 -- Heap management

 retrieve :: Integer -> Heap -> IO Integer
-retrieve x heap = return (heap!!(fromInteger x))
+retrieve x heap = return $ fromJust $ M.lookup x heap

 store :: Integer -> Integer -> Heap -> IO Heap
-store x 0 (h:hs) = return (x:hs)
-store x n (h:hs) = do hp <- store x (n-1) hs
-             return (h:hp)
-store x 0 [] = return (x:[])
-store x n [] = do hp <- store x (n-1) [] 
-         return (0:hp)
-
+store x n h = return $ M.insert n x h

Thursday, December 22, 2011

Continuous-Passing Conversion in Haskell

http://en.wikipedia.org/wiki/Continuation-passing_style

Convert from

(+ (f 0 (g 1)) 2)

to

(g' (lambda (r0) (f' (lambda (r1) (+ r1 2)) 0 r0)) 1)

where data structure internally in Haskell is like

data AST = Node [AST] | Leaf Value
data Value = IntVal Int | Plus | Atom String | Lambda [String]

Implementation and description

import qualified Control.Monad.State as S

data AST = Node [AST] | Leaf Value
instance Show AST where
  show (Node xs) = "(" ++ unwords (map show xs) ++ ")"
  show (Leaf v) = show v

data Value = IntVal Int | Plus | Atom String | Lambda [String]
instance Show Value where
  show (IntVal i) = show i
  show Plus = "+"
  show (Atom name) = name
  show (Lambda names) = "lambda (" ++ unwords names ++ ")"

-- (+ (f 0 (g 1)) 2)
-- (g' (lambda (r0) (f' (lambda (r1) (+ r1 2)) 0 r0)) 1)
program :: AST
program = Node [Leaf Plus,
  Node [Leaf (Atom "f"), Leaf (IntVal 0), Node [Leaf (Atom "g"), Leaf (IntVal 1)]],
  Leaf (IntVal 2)]

main = do
  print program
  print $ cps program

cps :: AST -> AST
cps ast =
  let (newAst, modifiers) = S.runState (cps' ast) [] in
      foldl (flip ($)) newAst modifiers

cps' :: AST -> S.State [AST -> AST] AST
cps' (Node (Leaf (Atom f) : xs)) = do
  xs' <- mapM cps' xs
  n <- length `fmap` S.get
  let name = 'r' : show n
  append $ \root -> Node $
    (Leaf . Atom $ f ++ "'") :
    Node [Leaf (Lambda [name]), root] :
    xs'
  return $ Leaf (Atom name)
cps' (Node xs) = Node `fmap` mapM cps' xs
cps' c@(Leaf _) = return c

append x = S.modify (x :)

This converts correctly.

I used State Monad to modify given tree. The function cps starts state and the actual function cps' traverses given AST subtrees recursively.

(+ (f 0 (g 1)) 2)
   ^^^^^^^^^^^

When cps' sees this subtree, oh yes the first item of the list is a user-defined function and it's not tail-call, so cps' wants to replace the part with a new variable (say r), and enclose whole tree with new function f' and the arguments.

(f' (lambda (r) ((+ r 2) 0 (g 1))))
^^^^^^^^^^^^^^^^^   ^   ^^^^^^^^^^^

It's easy to change subtree but it's not trivial to change outside the subtree. But fortunately we already know that we only have to enclose something around the whole tree, so you can just save a function in state.

After cps' process is done, you apply all functions that the state has accumulatively to enclose trees. That will complete the job.

Tuesday, December 20, 2011

Unlambda Interpreter in Haskell

Unlambda is a minimal, "nearly pure"[1] functional programming language invented by David Madore. It is based on combinatory logic, a version of the lambda calculus that omits the lambda operator. It relies mainly on two built-in functions (s and k) and an "apply" operator (written `, the backquote character). These alone make it Turing-complete, but there are also some I/O functions to make it possible to interact with the user, some shortcut functions and a function for lazy evaluation. There are no variables in the language.

http://en.wikipedia.org/wiki/Unlambda

import qualified Text.Parsec as P
import Control.Applicative ((*>), (<$>), (<*>))

data AST = Apply AST AST | Val Value
instance Show AST where
  show (Apply a b) = "(" ++ show a ++ " " ++ show b ++ ")"
  show (Val (Dot c)) = "put-" ++ [c]
  show (Val (Builtin c)) = [c]

data Value = Dot Char
  | Builtin Char
  | PendingK Value
  | PendingS1 Value
  | PendingS2 Value Value
  deriving Show

main = do
  let helloworld = "`r```````````.H.e.l.l.o. .w.o.r.l.di"
  let fibonacci = "```s``s``sii`ki`k.*``s``s`ks``s`k`s`ks``s``s`ks``s`k`s`kr``s`k`sikk`k``s`ksk"
  print $ desugar $ parse helloworld
  eval $ desugar $ parse helloworld
  --eval $ desugar $ parse fibonacci

parse :: String -> AST
parse = either (error . show) id . P.parse parse' "unlambda"

parse' = P.try (P.char '`' *> (Apply <$> parse' <*> parse'))
         P.<|>
         P.try (P.char '.' *> (Val . Dot <$> P.anyChar))
         P.<|>
         P.try (Val . Builtin <$> P.anyChar)

desugar :: AST -> AST
desugar (Apply a b) = Apply (desugar a) (desugar b)
desugar (Val (Builtin 'r')) = Val (Dot '\n')
desugar (Val (Builtin 'i')) = Apply (Apply (Val (Builtin 's')) (Val (Builtin 'k'))) (Val (Builtin 'k')) -- i = ``skk
desugar x = x

eval :: AST -> IO (Value)
eval (Apply a b) = do
  a' <- eval a
  b' <- eval b
  apply a' b'
eval (Val x) = return x

apply :: Value -> Value -> IO Value
apply (Dot c) x = putChar c >> return x
apply (Builtin 'k') x = return $ PendingK x
apply (Builtin 's') x = return $ PendingS1 x
apply (PendingK x) y = return $ x
apply (PendingS1 x) y = return $ PendingS2 x y
apply (PendingS2 x y) z = do
  a <- apply x z
  b <- apply y z
  apply a b
  1. parse the given string to abstract syntax tree
  2. desugar the ast; expanding macros like r or i.
  3. interpreter evaluates all nodes!

AST

(put-\n (((((((((((put-H put-e) put-l) put-l) put-o) put- ) put-w) put-o) put-r) put-l) put-d) ((s k) k)))

Result of helloworld

Hello world

Result of fibonacci

*
*
**
***
*****
********
*************
*********************
**********************************
*******************************************************
*****************************************************************************************

(added at Tue Dec 20 23:57:12 PST 2011)

I also made a stackmachine-based virtual machine and a compiler for it.

https://gist.github.com/1505131

This was actually much simpler/easier than I thought. There's a difference between pure interpreter and this virtualmachine, but it's not very big.

For example very short program "hi" that shows "hi" is "``.h.ii" in unlambda. First this parser converts the text to AST.

((put-h put-i) ((s k) k))

Then the compiler converts the tree to sequence of instructions.

IPush (Dot 'h')
IPush (Dot 'i')
IApply
IPush (Builtin 's')
IPush (Builtin 'k')
IApply
IPush (Builtin 'k')
IApply
IApply

Then the virtualmachine runtime will run it.

hi

Sunday, December 18, 2011

Read Raw POST Request Body in Compojure

Brief introduction to Compojure

Compojure gives you a shortcut to make a monofunctional web application. For example a Lingr bot is a web application that only need to be responsible with one single endpoint that handles POST request. The below is a web application that only shows "hello" in / endpoint with GET request.

(defroutes hello
           (GET "/" [] "hello"))
(run-jetty hello {:port 80})

Note that it requires you to be the root of the system if you are going to run a web app on port 80.

The main part of the app is just 3 lines of code. That reminds me of code examples for Sinatra, a Ruby web library.

get '/' do
  'hello'
end

Anyways the Compojure example code doesn't work only with the main logic. You are supposed to make a Leiningen project usually to manage the app and its dependent libraries.

$ lein new hello
$ cd hello

project.clj

(defproject hello "1.0.0-SNAPSHOT"
            :main hello.core
            :description "FIXME: write description"
            :dependencies [[org.clojure/clojure "1.2.1"]
                           [compojure "1.0.0-RC2"]
                           [ring "1.0.1"]])

src/hello/core.clj

(ns hello.core
  (:use
    compojure.core
    ring.adapter.jetty))

(defroutes hello
           (GET "/" [] "hello"))
(run-jetty hello {:port 80})

then

$ lein deps
$ lein run

it should work.

Parameters

(defroutes hello
           (GET "/" [] "hello"))
(run-jetty hello {:port 80})

The 2nd argument of GET, [] in this case, is parameter list for the expression you give in 3rd argument, which mostly for referring GET parameters. That's actually a hashmap that contains :params key which value is also a hashmap of GET parameters. Ditto in POST.

How can we get the raw post parameter?

(POST "/" {params :params} (...))

In that way you cannot get raw data because it's after the process. You can reconstruct the raw data only when the given parameter is like proper a=1\nb=2 form. These days some web apis are required to handle raw POST data, which is mostly in JSON, like a Lingr Bot API.

The answer is in :body of the parameter, but it's not a String but a mysterious HttpParser.Input object, assuming you are using ring as the middleware.

http://jetty.codehaus.org/jetty/jetty-6/apidocs/org/mortbay/jetty/HttpParser.Input.html

This class looks weird because even though this has read() method the return value type isn't String but int. The other read() looks like you are supposed to pass a mutable data and refer the changed data.

Fortunately we can use slurp Clojure function to hide this complicated behaviour.

(defroutes hello
           (POST "/" {body :body} (slurp body)))

This shows the given raw POST parameter!

Followers