Tag Archive | ceylon

Writing an interpreter in Ceylon

Ceylon is a statically typed language for the Java Virtual Machine (JVM) that comes with a powerful set of features that the Java language does not have or is missing (depending on your point of view). However, about three years after the first release and about one year after the version 1.0.0 release it is time to take a closer look.

A good way to get in touch with a new language is to start with a simple project, just for fun. One thing that came into my mind was to implement an interpreter for the programming language Brainfuck. The language consists of only eight simple commands and one instruction pointer. More details about Brainfuck can be found here.

This first surprising point about Ceylon is that we do not have the typically “main” method we know from other C-like languages. We can just implement a “global” method (here we call it “run”):

"Run the module `brainfuck`."
shared void run() {
	String? programFile = process.namedArgumentValue("program");
	if (exists programFile) {
		value bytes = readFile(programFile);
		BrainFuckInterpreter interpreter = BrainFuckInterpreter(bytes);
		interpreter.run();
	} else {
		print("Please provide the argument 'program'.");
	}
}

The string before the method definition is just the short form for the doc(“…”) annotation, the counterpart of the javadoc feature. To format the comments we can use the means of Markdown formatting.

Another surprising fact is the access modifier “shared” that we see in this example. Instead of using public in order to make the method accessible by other code, we use the keyword shared. In Ceylon the visibility hierarchy starts with modules that consist of one or more packages. A package consists of one or more source files, which again can share their elements. Sharing an element from one level of the hierarchy means to make it visible to the next higher level. This way even packages can be invisible to other modules, as we see in the following snippet from the package.celyon file, which is placed within each package and which declares in our case the package brainfuck as shared:

shared package brainfuck;

By the way: The package declarations that all Java classes start with are not necessary in Ceylon. Instead the package name is derived from the names of the folders the source file resides in, hence the redundant information can be left out. But let’s go back to the run() method above. As we do not have a special method which gets the command line arguments from the runtime environment passed in, we ask the global process instance for the named argument “program”. The returned String is an “optional” String, i.e. the value may be or may not be set. In Java that would be modelled by a String reference that may be null. But in order to circumvent possible NullPointerExceptions, the compiler ensures that you can only access the value of an “optional” references when you have checked them using the exists keyword.

Now that we have retrieved the path to our brainfuck “code”, we can read the bytes from the file using the following method:

ArrayList<Byte> readFile(String programFile) {
	value bytes = ArrayList<Byte>();
	print("Reading file '" + programFile + "'.");
	Path path = parsePath(programFile);
	OpenFile openfile = newOpenFile(path.resource);
	ByteBuffer buffer = newByteBuffer(1024);
	variable Integer bytesRead = openfile.read(buffer);
	print("bytesRead=" + bytesRead.string);
	while (bytesRead != -1) {
		buffer.flip();
		for (Byte byte in buffer) {
			bytes.add(byte);
		}
		buffer.clear();
		bytesRead = openfile.read(buffer);
	}
	openfile.close();
	return bytes;
}

As a storage for the bytes read from the program file, we utilize an ArrayList from the Java SDK. In order to do that, we have to import them in our module definition. This one resides in our module.celyon file:

module brainfuck "1.0.0" {
	shared import ceylon.io "1.1.0";
	shared import java.base "8";
}

Next to the name of the module and its version we can import other modules. Line 3 actually imports the java “base” module, i.e. the Java packages java.lang, java.util, java.io, java.net, java.text, NIO and security. Now that we have imported the java.base module, we can import the ArrayList at the beginning of our source file:

import java.util {
	ArrayList
}

Interestingly Ceylon emphasizes immutability, hence you can assign a value to every reference per default only once. If you want to change its value, you have to mark the reference as mutable using the keyword variable. You can see this in the code of the readFile() method as the reference to path, openfile and buffer are not marked as variable. In these cases the value is only assigned once. In contrast to that the number of bytes read from the file is of course variable, hence we mark it so.

After having read the code of the brainfuck program into memory, we can start its execution. This is done by creating an instance of the Brainfuck interpreter and calling its run() method:

BrainFuckInterpreter interpreter = BrainFuckInterpreter(bytes);
interpreter.run();

The beginning of this class is shown in the following:

shared class BrainFuckInterpreter(ArrayList<Byte> buffer) {
	ArrayList<Integer> field = ArrayList<Integer>();
	variable Integer ptr = 0;
	
	shared void run() {
		variable Integer bufferPos = 0;
		while (bufferPos >= 0 && bufferPos < buffer.size() - 1) {
			value oldBufferPos = bufferPos;
			value byte = buffer.get(bufferPos);
			switch (byte.unsigned)
			case (43) {
				ensureFieldCapacity();
				value oldValue = field.get(ptr);
				field.set(ptr, oldValue + 1);
			}
			...
			else {
				print("Ignoring char " + byte.string);
			}
			bufferPos++;
		}
		...
	}
	...
}

If you are used to Java, you might be missing the constructor. Where do we assign the passed in argument buffer to the corresponding member variable? The Ceylon feature here is that we can define some of our member variables within the braces after the class name. In our case we define a member with the name buffer of type ArrayList. The constructor we would have to write in Java that assigns the given value to the member variable is obsolete in Ceylon. The compiler creates some kind of “default” constructor that does this job for us. Additional members of the class, that are not passed to the constructor, can be defined as we would do that in Java (see field and ptr in our example). And as long as we do not declare them to be shared, they are only visible to the class members. Defining the “signature” of the constructor already within the class name also means that we can have only one constructor per class. In case we also want to execute additional code during the “construction”, we can put that code into the class body.

Our run() method now iterates of the buffer with the instructions that we have read from the file. Depending on which value the current byte has, we execute the appropriate operation. This is implemented by a switch statement. In contrast to Java we can omit the curly braces around all cases, but we cannot omit the “default” case, which is modelled in Ceylon as “else” clause of the switch statement. In this “else” case we just print the input byte that we are ignoring. Interesting about this point for Java developers is the fact that we cannot just “add” the byte to the previous String, as we would do in Java. Instead we have to access the String representation of the byte by writing byte.string. An implicit invocation of a method like toString() in Java does not exist in Ceylon.

Conclusion: The interpreter does not utilize all language features, but at least it covers a few interesting points. As someone who has worked with Java for years, the Ceylon features seem to tackle many well known issues of the Java language (versioned modules, package visibility, immutability, optional pattern, etc.). Although features like the absence of multiple constructors is a thing you have to get used to, I must admit that I liked to work with Ceylon.

PS: The sources can be found here.