Monday, December 26, 2011

Guile II

I've started learning Scheme, and my system of choice is Guile, a version of Scheme designed to be used as a scripting language for Unix-like operating systems. I'm using the older 1.8 version that is available in Ubuntu right now. The new, much improved 2.0 version will be available in the next Ubuntu version but for now 1.8 is plenty for me to learn the basics.

I have been programming in one way or another for all of my adult life. My experience ranges from embedded systems to very large machines and in a variety of languages. I have never really used Scheme, though, and never tried to work in a largely functional programming language before. The code and all the explanations here is the work of a beginner trying to learn about Scheme programming, and it's bound to contain a lot of bad coding style, suboptimal solutions, misconceptions and outright errors.

So why do I do this? The most effective way to learn something is perhaps to explain it to others. It forces you to make everything explicit, and will expose any doubts or misunderstandings you may have. Now, I don't expect anyone to actually follow these posts in any detail; you, collectively, are the imaginary audience I need for me to to write this.

Guile is Scheme. What does Scheme look like? Here's a bit of scheme code:

;; create a list of values from low to high, inclusive
(define (range low high)
    (if (> low high)
      (cons low (range (+ low 1) high))))

This defines a function called range that will give you a list of numbers, from low to high:

>(range 1 5)
=> (1 2 3 4 5)

This is a pretty useful function, and not surprisingly a variant (named iota for obscure reasons) is available in a Guile library already, along with a more powerful function called list-tabulate that lets you use any kind of list-generating function.

First, note that there's plenty of parentheses. Lots of them, in fact. Both programs and data use the same notation, and it's quite easy to treat a program as data or the other way around. On the other hand, all those parentheses get quite confusing, so a good editor that shows you where each pair belongs is really helpful if you're going to write Scheme code. I use Vim; other people like Emacs.

Scheme, like LISP, uses prefix notation. That is, first we give the operation, then the data to operate on. So if we want to add two numbers we'd do (+ 2 2), and the expression would be replaced with 41. The if-statement comparison shows the same order: if (> low high) would be if (low>high) in many other languages.

Simple statements like numbers or variables can stand by themselves. Any operator or function has to be the first element in a list followed by its parameters. When the function returns the whole thing is replaced by the result. Makes sense, I think, as that's the only way to know what arguments belong to an operator. The range call in the last line, (range (+ low 1) high) has two parameters, where the variable high simply gets replaced with its value, while (+ low 1) is a statement with operator + and two arguments for + to add.

The fundamental way to loop in Scheme is by recursion. We wrap the code we want to loop over in a function2. We repeat the body of this function by calling the function again at the end. So we call range with two parameters low and high. If low is smaller, we make a list pair with low as the first element, and the result of calling range with the next value of low and high as parameters. Once they're equal we return back up and build the list on the way. Proper lists end with the empty list value, so we return '() for the final element ("'" is a quote, so the list is not evaluated).

This sounds inefficient perhaps, but it is not. When the recursive call happens at the end this becomes just as fast and efficient as a regular loop. Scheme has other ways to create loops, such as do and while statements, but they are all defined by tail recursion like this.

This is what a non-recursive version could look like, using a while loop:

;; list range function, iterative version
(define (range-iter low high)
  (let ((acc (list high)))
    (while (< low high)
    (set! high (1- high))
    (set! acc (cons high acc)))

The let statement creates new local variables. We need an accumulator to store our list, so we start by setting it to a list consisting of the high value. Then, while low is smaller than high, we decrement high, concatenate the new high value to the front of acc, then store that new list in acc again. The final statement is acc which gets substituted with the list it contanins, and becomes the return value of the function.

To me this is perhaps easier to understand, but it's not as elegant as the first version above, and it was more difficult for me to get right.

Now, I said that when the recursive call happens at the end, Scheme can optimize it so it's just as efficient as a regular loop. When the recursive call happens last there is nothing left to do at that level. Scheme doesn't have to keep track of each level, and can return directly to the top at the end. But if you look at the first version, the range is not, in fact, the last statement; cons is. The first version is not properly tail-recursive in other words. Something like

> (range 1 50000)

will fail with a (stack-overflow) error. Scheme has to keep track of each level so it can do the cons at the end. We have to rearrange things so that the cons happens on the way down, not when going back up. We assemble the next part of the list and send it along to the next iteration with an extra parameter. We also define a wrapper function without the extra value just to make it neat — remember, functions are cheap.

;; list range function, properly tail-recursive
(define (inner-range low high acc)
  (if (> low high)
    (inner-range low (- high 1) (cons high acc))))

(define (range low high)
    (inner-range low high '()))

This works as expected. It turns out though, that the iterative while version is actually slightly but consistently faster than the recursive one for calls up to about two million elements. At that point both versions start slowing down (memory allocation and management starts to become a real problem), but the recursive one slows down more. while is conceptually defined in terms of recursion in the standard, but I guess in practice it's implemented and optimized separately in the interpreter for efficiency.

Note that this is for the ageing 1.8 version of Guile specifically; the newer 2.0 version of Guile or other Scheme implementations may well have faster recursion. And in practice, a 10% difference in speed isn't very important compared to readability and code clarity. If speed really is critical, you're not coding in a dynamic language anyway.


#1 We normally use infix notation, where we put the operator between the values: 2 + 2. Why prefix notation? There's a number of reasons, but one is that you're not limited to two values; you can do (+ 1 2 3 4 5 6 7) and get 28 in one go. Also, it makes all operators and functions behave the same, which I guess is sort of important for the kind of person that keeps their canned goods sorted and arranges the family toothbrushes by color and size.

Oh, and is there a postfix notation too? Oh, yes indeed there is, and it's surprisingly useful and easy to work with. I don't care much for prefix notation, but postfix is my favourite way of doing calculations; I guess it mimics the way we do arithmetic in our heads already.

#2 functions are really cheap and easy in Scheme. Defining lots of them at every turn seems to be quite normal and using more but smaller functions seem to be encouraged. It's called "functional programming" for a reason.


Jan Moren said...

Agh, sorry! sakR9 posted a comment, and shortly after a spammer posted another one. By mistake I deleted both comments. Here is what sakR9 wrote:

This went amazingly and intriguingly over my head. I also respect people who know computer languages, like them, study them, write them.
Or speak about them.

So you've got some respect from me even though it means a large portion of misunderstanding.

In your previous entry about starting to learn Guile, I like how you said that everyone has different interests...
What I liked the most about this post was the fact that you said that the best way to learn something is to teach it! Something I truly believe in, good on you!


Jan Moren said...

sakR9: I did say I would probably lose most of my readers with this ^_^ Just see these posts as line noise or as written in a different language.

Now, programming is actually not very difficult. It's just that like with everything else you do need to start from the beginning. When you learn the piano you start with Twinkle Little Star, not a Chopin sonata.

Scheme is a pretty good beginners language, actually (it's been used in many beginning programming courses). But I'm not taking that perspective here. I'm writing about learning Scheme from the point of view of somebody who already is a programmer, but doesn't know about Scheme specifically.