Tons o’garbage

The vi emulation script needs to get positions from the built-in editor object. At the very least, this is necessary to store “marks” in the file and return to them later when the user asks. But it turns out that most motion commands in vi are better implemented in two separate steps: first getting the target position, and then setting it later on.

For example, the ‘w’ command advances to the beginning of the next word. Although the built-in interface may provide a “GotoNextWord” function, the fact is that ‘dw’ deletes everything up to the beginning of the next word, and ‘yw’ copies everything up to the beginning of the next word without moving the cursor. The same goes for many other commands.
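The split between computing a target and acting on it can be sketched in C++ as follows. This is only an illustration, not NGEDIT's actual interface: `Buffer`, `NextWordStart`, `MoveTo`, `DeleteTo` and `YankTo` are hypothetical names, the buffer is a single line, and a "word" is simply a run of non-whitespace characters (closer to vi's ‘W’ than to ‘w’).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical single-line buffer; a real editor position is far richer.
struct Buffer {
    std::string text;
    std::size_t cursor = 0;
    std::string yanked;   // stand-in for the unnamed register
};

// The motion only computes where the cursor would land; it changes nothing.
std::size_t NextWordStart(const Buffer& b) {
    std::size_t i = b.cursor;
    // Skip the rest of the current word...
    while (i < b.text.size() && !std::isspace(static_cast<unsigned char>(b.text[i]))) ++i;
    // ...and then the whitespace that follows it.
    while (i < b.text.size() && std::isspace(static_cast<unsigned char>(b.text[i]))) ++i;
    return i;
}

// Each command decides what to do with the target (forward targets only).
void MoveTo(Buffer& b, std::size_t target) {      // like 'w'
    assert(target <= b.text.size());
    b.cursor = target;
}

void DeleteTo(Buffer& b, std::size_t target) {    // like 'dw'
    assert(target >= b.cursor && target <= b.text.size());
    b.text.erase(b.cursor, target - b.cursor);
}

void YankTo(Buffer& b, std::size_t target) {      // like 'yw': cursor stays put
    assert(target >= b.cursor && target <= b.text.size());
    b.yanked = b.text.substr(b.cursor, target - b.cursor);
}
```

With this shape, the three commands share one motion function and differ only in what they do with the position it returns.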

Now, a “position” within a text buffer, as handled by NGEDIT, may be quite complex. If you are editing a simple ASCII file, or a file in your local code page as long as it is a single-byte-per-character codepage, things get simpler. If, on top of that, your file does not contain a single tab character (ASCII 9, which I call “hard” tabs), and you are not using a variable width font in order to edit the file, a position can be as simple as a line number and an offset within that line.

But if you are editing, for example, UTF-8 text (in which a single character may occupy anything from 1 to 6 bytes, although the Unicode consortium promises they will not be using the 5- and 6-byte encodings), which does include tabs (so the count of characters does not match the column on the screen), and you are even using a proportional font to display the file, then a position may hold the above-mentioned line and offset, but also a column count and a horizontal display coordinate in pixels.
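To make the gap between byte offset and screen column concrete, here is a rough C++ sketch. The `Position` layout and the `ColumnAt` helper are assumptions made for illustration, not NGEDIT's real data structures. The helper assumes a fixed-width font, counts each UTF-8 code point as one column (it ignores wide and combining characters), and leaves the pixel coordinate uncomputed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical shape of a "full" position as described in the text.
struct Position {
    std::size_t line = 0;     // line number
    std::size_t offset = 0;   // byte offset within the line
    std::size_t column = 0;   // display column, after tab expansion
    int pixelX = 0;           // horizontal pixel coordinate (proportional fonts)
};

// Display column reached after the first `offset` bytes of a UTF-8 line.
std::size_t ColumnAt(const std::string& line, std::size_t offset, std::size_t tabWidth) {
    assert(tabWidth > 0);
    std::size_t column = 0;
    for (std::size_t i = 0; i < offset && i < line.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(line[i]);
        if (c == '\t') {
            column += tabWidth - (column % tabWidth);   // jump to the next tab stop
        } else if ((c & 0xC0) != 0x80) {
            ++column;   // count lead bytes only; 10xxxxxx bytes continue a character
        }
    }
    return column;
}
```

Even in this simplified form, two bytes can map to one column and one byte can map to eight, which is why the script is better off treating a position as an opaque value.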

So, when a script wants to grab, for instance, the current cursor position, it actually has no idea what it is trying to grab. This is great for the script, because it needn’t worry about the underlying complexity: these positions are only manipulated by the built-in editor object.

The complexity with this is that a position has to be returned as an object from the built-in engine.

The NGS language, the same as JavaScript and others, doesn’t have pointers. You can’t pass a pointer to an object to any function, be it another script function or a built-in function. I had a look at the Microsoft .NET CLR specification, together with their IL bytecode-based language, and it turns out that the model is quite similar to what I am doing for NGS. They have pointers, but most of the code doesn’t need them. It probably came to them as a necessity to be able to generate managed C++ code, which will of course require shuffling and passing and squeezing all types of pointers around.

Besides,

  var pos;
  s_Editor.GetCursorPos(&pos);

is of course much, much uglier than

  var pos = s_Editor.GetCursorPos();

By the way, I think the Java VM doesn’t manipulate pointers either. Of course, “object” values are actually pointers to objects in the heap and can be thought of as a kind of pointer – but pointers to general variables are just not there (please correct me if memory fails.)

The fact is, I had to return objects. That meant the built-in engine creating them, and then the script manipulating them: keeping them in a variable, operating on them and discarding them, whatever.

I had avoided complex memory management of objects until now. ‘new’ and ‘delete’ were there for manual memory management, which I was doing fine with – vi emulation does not involve any heavy object shuffling, so I thought I could leave it for later.

Not any more. I had to do Garbage Collection.

I had never implemented a GC scheme – I had an idea of how a mark-n-sweep algorithm works, and it seemed simple enough, so off I went to implement it.

And as it turns out, it took less than a couple of hours to have a decent memory management system working. Reclaimed objects are reused on demand and only freed when a script context is unloaded, so it does reduce the amount of memory shuffling with the CRT.

All allocated objects are kept in a doubly-linked list, hand-written mind you, whose root is at the TScript object. There are actually two lists: one is the list of the “busy” objects and the other one holds the “free” objects. When someone asks for a new object (be it script code, the built-in editor object, or whoever), I first check if there are any objects in the “free” list. If not, I perform a GC cycle. This involves first un-marking all known supposedly-busy objects, then iterating over all script variables and the running stack and marking the objects they refer to (if indeed they do refer to objects), applying the same step recursively to whatever those objects refer to, and voilà! We can now pass all unmarked objects from the busy list to the free list. If nothing turns up, we can always fall back on a good-ole-fashioned new Object; within the C++ code.
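The cycle just described can be sketched in C++ roughly as follows. This is a minimal illustration under stated assumptions, not the actual NGS code: `Heap` stands in for the TScript context, `std::list` replaces the hand-written doubly-linked list, `roots` stands in for the script variables and the running stack, and `Object`, `Allocate`, `Collect` and `Mark` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical script object: a mark bit plus references to other objects.
struct Object {
    bool marked = false;
    std::vector<Object*> refs;
};

// Stand-in for the script context that owns the busy and free lists.
class Heap {
public:
    Heap() = default;
    Heap(const Heap&) = delete;
    Heap& operator=(const Heap&) = delete;

    // Memory goes back to the C++ runtime only when the context is unloaded.
    ~Heap() {
        for (Object* o : busy_) delete o;
        for (Object* o : free_) delete o;
    }

    // Assumed root set: whatever script variables and stack slots point to.
    std::vector<Object*> roots;

    Object* Allocate() {
        if (free_.empty()) Collect();       // collect only when the free list is dry
        Object* obj = nullptr;
        if (!free_.empty()) {
            obj = free_.front();            // reuse a reclaimed object
            free_.pop_front();
            assert(!obj->marked);           // swept objects are always unmarked
            obj->refs.clear();
        } else {
            obj = new Object;               // nothing was reclaimed: plain new
        }
        busy_.push_back(obj);
        return obj;
    }

    void Collect() {
        for (Object* o : busy_) o->marked = false;   // 1. un-mark everything
        for (Object* r : roots) Mark(r);             // 2. mark from the roots
        for (auto it = busy_.begin(); it != busy_.end();) {
            if (!(*it)->marked) {                    // 3. sweep unmarked objects
                free_.push_back(*it);
                it = busy_.erase(it);
            } else {
                ++it;
            }
        }
    }

    std::size_t BusyCount() const { return busy_.size(); }
    std::size_t FreeCount() const { return free_.size(); }

private:
    // Recursive marking; a very deep object graph would want an explicit stack.
    static void Mark(Object* o) {
        if (o == nullptr || o->marked) return;
        o->marked = true;
        for (Object* child : o->refs) Mark(child);
    }

    std::list<Object*> busy_;
    std::list<Object*> free_;
};
```

One consequence of collecting inside `Allocate` is worth noting: in this sketch, an object that is not yet reachable from a root when the next allocation happens may be swept and handed out again, so a fresh object has to land in a variable or on the stack before anything else is allocated.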

Now I can even write such insulting code as:

  function fn()
  {
    var v = new Whatever();
  }

and be called a lazy programmer 🙂

2 Responses to “Tons o’garbage”

  1. catwalk Says:

    Looking forward to seeing a trial version of the editor… Looks like it's going to be a step further in the programming tools arena.

  2. J Says:

    Thanks for the compliment 🙂
