Focusing my development effort

November 24th, 2005

Long time readers of my blog already know about my tendency to get carried away with stuff. I’ve got carried away with something in the past, just to have to retract the following day. The second post mostly deals with this tendency to get carried away. To sum up: I don’t think the lesson I need to learn is “refrain more”, as that takes away a lot of the energy as well – “learn to acknowledge my mistakes happily and as early as possible” seems a much more valuable lesson for me. And that applies in many other fields.

I’ve also talked about my inability to write short blog posts, and failed miserably to do so almost systematically in the past.

Anyway, to get to the point, this (of course) also applies in my dedication to development. I tend to drift off too easily, especially when the goal involves developing a complex piece of software like NGEDIT. Although I’ve posted in the past about my strategy in the development of NGEDIT, I find that I have to revisit that topic really often – mostly in the messy and hyperactive context of my thoughts, but I thought I’d post about it as it may also apply to other fellow developer-entrepreneurs.

I recently posted about how I had found out the best way to focus my development efforts on NGEDIT. To sum up: try to use it, and implement the features as the need for them becomes evident (I’m fortunate enough that I am 100% a future user of my own product). As the first task coming out of that, I found myself working on getting NGEDIT to open a file from the command line. That was weeks ago, and I have only almost implemented it. How come? It should be simple enough to implement! (At least, given that opening the file through the file-open dialog was already functional).

Well, the thing is that my tendency to drift off, my ambition, and my yearning for beautiful code kicked in. Instead of a simple solution, I found myself implementing the “ultimate” command line (of course). It’s already pretty much fully architected, and about half-working (although opening files from the command line ended up being just a small part of the available functionality). As I did this, I also started refactoring the part of the code that handles file loading into using my C++ string class that doesn’t suck, which is great, but it’s quite an effort by itself. Meanwhile, I found myself whining that I didn’t want to have all that code written using the non-portable Windows API (as a shortcut I took before summer, NGEDIT code is uglily using the Windows API directly in way too many places), so I started implementing an OS-independence layer (I know, I know, these things are better done from day 1, but you sometimes have to take shortcuts and that was one of many cases). Of course, with the OS-independence layer using said generic string class for the interface. And establishing a super-flexible application framework for NGEDIT, which was a bit cluttered to my taste. And sure, I started trying to establish the ultimate error-handling policy, which took me to posting about and researching C++ exceptions and some other fundamental problems of computing…

If that’s not getting carried away, then I don’t know what is!

Today’s conclusion, after going out for a coffee and a walk to the cool air of the winter, is that I should refrain from tackling fundamental problems of computing if I am to have an NGEDIT beta in a few months’ time. The code of NGEDIT 1.0 is bound to have some ugliness to it, and I need to learn to live happily with that. Even if I will have to rewrite some code afterwards, business-wise it doesn’t make sense to have the greatest framework, the most beautiful code, and no product to offer!

In any case, I hope I have improved my ShortPostRank score, even if definitely not among world-class short-post bloggers, and you can see I’ve had some fun with self-linking. Something nice to do after starting beta testing for ViEmu 1.4, which will probably be out later this week.

The lie of C++ exceptions

November 17th, 2005

As part of the ongoing work on NGEDIT, I’m now establishing the error management policy. The same way that I’m refactoring the existing code to use my new encoding-independent string management classes, I’m also refactoring it to a more formal error handling policy. Of course, I’m designing along the way.

Probably my most solid program (or, the one in which I felt most confident) was part of a software system I developed for a distribution company about 9 years ago. The system allowed salesmen to connect back to the company headquarters via modem (the internet wasn’t everywhere back then!) and pass on customers’ orders every evening. I developed both the DOS program that ran on their laptops, and the server that ran on AIX. I developed the whole system in C++ – gcc on AIX, I can’t remember what compiler on the DOS side. Lots of portable classes to manage things on both sides. As a goodie, I threw in a little e-mail system to communicate between them and with hq, which was out of spec – and I managed to stay on schedule! It was a once-and-only-once experience, as almost all my other projects have suffered from delays – but the project I had done just before was so badly underscheduled and underbudgeted that I spent weeks nailing the specs to avoid falling into the same trap.

The part I felt was most important to keep solid was the server part – salesmen could always redial or retry, as it was an interactive process. The server part was composed of a daemon that served incoming calls on a serial port, and a batch process that was configured to run periodically and export the received files to some internal database system.

How did I do the error management? I thought through every single line in the process, and provided meaningful behavior. Not based on exceptions, mind you. Typical processing would involve sending out a warning to a log file, cleaning up whatever was left (which required its own thinking through), and returning to a well-known state (which was the part that required the most thinking through). I did this for e-v-e-r-y s-i-n-g-l-e high-level statement in the code. This meant: opening a file, reading, writing, closing a file (everyone typically checks file opens, but in such a case I felt a failure in closing a file was important to handle), memory management, all access to the modem, etc…
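To give an idea of what this kind of explicit handling looks like, here is a minimal sketch. It is not the code from that system: the helper name `save_text` and the choice to delete the partial file on failure are assumptions made only for illustration. Every step is checked, including the close, and on failure the function logs a warning, cleans up, and leaves things in a well-known state (no half-written file).

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical helper (not from the original system): writes 'text' to 'path',
// checking every step. On any failure it logs a warning, removes the partial
// file and returns false.
bool save_text(const char* path, const char* text)
{
    std::FILE* f = std::fopen(path, "wb");
    if (!f)
    {
        std::fprintf(stderr, "warning: cannot open %s\n", path);
        return false;                      // nothing to clean up yet
    }

    const std::size_t len = std::strlen(text);
    const bool written = std::fwrite(text, 1, len, f) == len;

    // fclose can fail too (for example when buffered data cannot be flushed),
    // so its result is checked as well. The handle is released either way.
    const bool closed = std::fclose(f) == 0;

    if (!written || !closed)
    {
        std::fprintf(stderr, "warning: cannot write %s\n", path);
        std::remove(path);                 // well-known state: no partial file
        return false;
    }
    return true;
}
```

The caller only has to check one boolean, but all the thinking about what "cleaned up" means had to be done inside the function.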

C++ brought exceptions. I’m not 100% sure yet, but I think exceptions are another lie of C++ (I believe it has many lies which I haven’t found documented anywhere). It promises being able to handle errors with much less effort, and it also promises to allow you to build rock-solid programs.

The deal is that exceptions are just a mechanism, and this mechanism allows you to implement a sensible error handling policy. You need a rock solid policy if you really want to get failproof behavior, and I haven’t seen many examples of such policies. What’s worse, I haven’t yet been able to figure out exactly what it should look like.

Furthermore, exceptions have a runtime cost, but the toughest point is that they force you to write your code in a certain way. All your code has to be written such that if the stack is unwound, stuff gets back automatically to a well-known-state. This means that you need to use the RAII technique: Resource-Acquisition-Is-Initialization. This covers the fact that you have to relinquish the resources you have acquired, such that it doesn’t leak them. But that is only part of returning to a well-known state! If you are doing manipulation of a complex data structure, it’s quite probable that you will need to allocate several chunks of memory, and any one of them may fail. It can be argued that you can allocate all memory in advance and only act if all that memory is actually available – but then, this would force your design around this: either you concentrate resource acquisition in a single place for each complex operation, or you design every single action in your design in two phases – first one to perform all necessary resource acquisition, second one to actually perform the operation.
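The two-phase idea can be sketched with standard containers. This is only an illustration under stated assumptions: `append_pair` is a hypothetical helper, and the elements are plain `int`s so that copying them cannot throw. Phase one acquires all the memory and may throw; phase two performs only operations that cannot fail once the capacity is there.

```cpp
#include <vector>

// Hypothetical example: append one value to each of two vectors so that
// either both are updated or neither is.
void append_pair(std::vector<int>& a, std::vector<int>& b, int x, int y)
{
    // Phase 1: acquire every resource the operation needs.
    // reserve() may throw std::bad_alloc, but it leaves the contents unchanged.
    a.reserve(a.size() + 1);
    b.reserve(b.size() + 1);

    // Phase 2: perform the operation. With capacity already available,
    // push_back of an int does not reallocate and so cannot throw.
    a.push_back(x);
    b.push_back(y);
}
```

Note that reserving exactly one extra slot on every call can defeat the vector's usual growth strategy, so this shape trades some efficiency for the all-or-nothing behavior.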

This reminds me of something… yeah, it is similar to what transaction-based databases do. Only elevated to the Nth degree, as a database has a quite regular structure, and your code usually doesn’t. There are collections, collections within collections, external resources accessed through different APIs, caches to other data-structures, etc…

So, I think in order to implement a nice exception-based policy, you have to design a two-phase access to everything – either that, or an undo operation is available. And you better wrap that up as a tentative resource acquisition – which requires a new class with its own name, scope, declaration, etc…

Not to talk about interaction between threads, which elevates this to a whole new level…

For an exceptions-based error-handling policy, I don’t think it is a good design to have and use a simple “void Add()” method to add something to a collection. Why? Because if this operation is part of some other larger operation, something else may fail and the addition has to be undone. This means either calling a “Remove()” method, which will turn into explicit error management, or using a “TTentativeAdder” class wrapping it around, so that it can be disguised as a RAII operation. This means any collection should have a “TTentativeAdder” (or, more in line with std C++’s naming conventions, “tentative_adder”).
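A minimal sketch of what such a guard might look like for std::vector follows. Nothing like it exists in the standard library; the class below is hypothetical, written as a free-standing template rather than a nested type, and it assumes nobody else modifies the vector between the construction and the destruction of the guard (so the tentative item is still the last element when it is undone).

```cpp
#include <cassert>
#include <vector>

// Hypothetical guard: adds an item on construction and removes it again
// on destruction unless commit() was called.
template <class T>
class tentative_adder
{
public:
    tentative_adder(std::vector<T>& v, const T& item)
        : m_v(v), m_committed(false)
    {
        m_v.push_back(item);    // if this throws, nothing was added
    }

    ~tentative_adder()
    {
        if (!m_committed)
        {
            assert(!m_v.empty());
            m_v.pop_back();     // undo the addition; pop_back does not throw
        }
    }

    void commit() { m_committed = true; }

private:
    tentative_adder(const tentative_adder&);             // not copyable
    tentative_adder& operator=(const tentative_adder&);

    std::vector<T>& m_v;
    bool m_committed;
};
```

If the enclosing scope is left through an exception before commit() is reached, stack unwinding runs the destructor and the item disappears again.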

I don’t see STL containers having something like that. They seem to be exception-aware because they throw when something fails, but that’s the easy part. I would really like to see a failproof system built on top of C++ exceptions.

Code to add something to a container among other things often looks like this:

void function(void)
{
  //... do potentially failing stuff with RAII techniques ...

  m_vector_whatever.push_back(item);

  // ... do other potentially failing stuff with more RAII techniques
}

At first, I thought it should actually look like this:

void function(void)
{
  //... do potentially failing stuff with RAII techniques ...

  std::vector<item>::tentative_adder add_op(m_vector_whatever, item);

  // ... do other potentially failing stuff with more RAII techniques

  add_op.commit();
}

But after thinking a bit about this, this wouldn’t work either. The function calling this one may throw after returning, so all the committing should be delayed to a controllably final stage. So we would need a system-wide “commit” policy and a way to interact with it…

The other option I see is to split everything in very well defined chunks that affect only controlled areas of the program’s data, such that each one can be tentatively done safely… which I think requires thinking everything through in as much detail as without exceptions.

The only accesses which can be done normally are those guaranteed to only touch local objects, as those will be destroyed if any exception is thrown (or, if we catch the exception, we can explicitly handle the situation).

And all this is apart from how difficult it is to spot exception-correct code. Anyway, if everything has to be done transaction-like, it should be easier to spot it – suddenly all code would only consist in a sequence of tentatively-performing object constructions, and a policy to commit everything at the “end”, whatever the “end” is in a given program.

I may be missing something, and there is some really good way to write failproof systems based on exceptions – but, to date, I haven’t seen a single example.

I’ll keep trying to think up a good system-wide error handling policy based on exceptions, but for now I’ll keep my explicit management – at least, I can write code without enabling everything to transaction-like processing, and be able to explicitly return stuff to a well-known safe state.

This was my first attempt at a shorter blog entry – and I think I can safely say I failed miserably!

On blogging, payment processing, and the finite nature of time

November 16th, 2005

I think I will stop apologizing for not posting often. I’d love to post frequently, but both software development and business development take so much time.

Some people are able to post almost daily to their blogs. Of course, that depends on each person’s circumstances. If you are setting up a software business, you have sw development to do, setting up and running the business also take a lot of time, so blogging usually comes third after these. Most starting microisv’s (I don’t like the term, but everyone’s using it so why complain) don’t post that much to their blogs (with some notable exceptions which I think are all linked from the sidebar here). What they all do is put a solid amount of work into their products and websites.

For some reason, it takes me much more thought to post on the blog than to post in a forum. Probably because I kind of like to post interesting, well written, content-rich blog entries – and that takes its own time to do. If I allowed myself to post more undigested stuff I would post more often.

As well, when I get into my “writing” mood, I like it and posts grow and grow and grow and…

One other thing is that, when you’re setting up a business, there’s probably information you don’t want to disclose. At least, I still think it’s worthwhile for my business strategy to not disclose some things. Future business opportunities, etc… apart from regular business info – I’ve thought more than once about posting actual ViEmu sales figures, but I think it could be damaging in the long run. I’m sure people are curious. For those curious, it seems it’s actually taking off a bit, although nothing that makes it qualify as a major revenue stream.

More than one person asked about my experience with adwords. Again, I feel I should dig a bit into the logs and post actual stats, rather than just my impression. So I end up posting nothing, due to lack of time for proper research. The summary: they help. There is fraud, but the cost for low-competition keywords such as mine covers for it. How do I know there is clickfraud to my site? Because some hits only ever request the html page – not even the CSS or graphics get requested!

Anyway, I was going to post about payment processing. Mainly due to a thread at JoS started by the wonderfully informative Andy Brice of PerfectTablePlan fame (a piece of software to solve your reception seating arrangement problems), I’ve been researching payment methods (yes, the previous link to Andy’s page was designed to help with search engine results, as he’s been so nice sharing so much info and his company seems so serious).

To the point, it seems using paypal to process your payments can help in getting commissions much lower than other services such as share-it, the one I’m currently using. I’ll be looking into setting up paypal for ViEmu, and I’ll report back on how it works.

But there was another piece of advice I wanted to pass.

When I set up my share-it account, it let me choose whether to process my statements in euros or US dollars. Given that I’m euro based, euros seemed more reasonable, but their fees were lower for US dollars. $3 + 5% for accounts in US$, €3 + 5% for accounts in euros. Given that US$3 is cheaper than €3, I chose US$.

Only to find out that the currency exchange on the monthly wire transfer killed me – it was charged both by the originating bank and at my end (the receiving bank).

No need to say that I promptly switched to an account in EUR (a non-automatic process that the share-it people solved nicely after requesting, their service being usually quite responsive).

Just so that you don’t make the same mistake.

Anyway, just to recap, I wanted to share my problems finding time to blog and share some interesting info regarding payment processing. Nothing interesting for hardcore C++ programmers today.

No promises, but I intend to post some time soon (or not too far in the future) about my experiences with:

  • NGEDIT development (with which I’ve been pretty much all time since I released ViEmu 1.3 last week)…
  • … which will include the evolution of the C++ string class that doesn’t suck (but is sucking life out of me)…
  • …product development and release strategy for NGEDIT (I can post really often about this, as I can refine or redesign the strategy so many times before I can release the editor)…
  • …possibly on adwords (if I ever get to dig the weblogs properly)…
  • …web site traffic / marketing (although you can read the meat of the information at this JoS post)…
  • …and too many other issues to name, including open source, the software industry, and the now so popular google bashing, including the many meanings of the word evil

I wish you all the best of luck with your own projects.

Enlightenment

November 7th, 2005

See, I had an enlightening experience yesterday. I now know exactly what the best roadmap to follow with NGEDIT is.

I’ve just finished ViEmu 1.3 with the powerful regular expressions and ex command line support. I haven’t been able to release it yet, as Microsoft has changed the way you get the PLK (“package load key”) which you have to include in every new version of your product. It was automated beforehand, so you filled the online form and received the PLK in an e-mail 30 seconds afterwards. But with the release of Visual Studio 2005, they now want to actually approve your product before issuing the key. This means I’m stuck with the code and the whole web info revamp at home. Ah well, I’m a bit angry with that, but then you’re at their mercy – and the important lesson is: I could have filled out forms for ViEmu 1.3, 1.4, 1.5, 1.6, … and even 2.0 last month, and I would have the keys nicely stored in my hard disk. Something to watch out for in the future: anything you use which is a web service is not under your control, remove that dependency if you can.

Anyway, going back to the point, now that ViEmu 1.3 is ready, I am putting ViEmu development to a secondary position, and getting focus back on NGEDIT. ViEmu is now at a more than acceptable level, and although I’ll keep improving it, I better focus on NGEDIT which is the product with most potential.

I’ve already talked in the past about how to tackle the development of NGEDIT. I give a lot of thought to how I invest my time – not in order to avoid mistakes; I make many of them and it’s not really a problem. But I like to think and rethink what my goals are, what the best way to reach them is, what the tasks are, and how (and in what order) it makes most sense to work. Once I developed ViEmu, it was quite clear to me that I had to keep putting in a lot of effort until the product reached a “serious” level. Slowing down beforehand wouldn’t make any sense: even if the vi-lovers audience is a small one, the only way to actually discover its size is to get a decent product out there. A so-so product would leave me thinking, if sales are not big, that the problem was the product’s quality.

And now that I’m focusing on NGEDIT, deciding exactly how to work on it is no easy feat. I’ve already talked about this in the past, but I had already decided that I would focus on an NGEDIT 1.0 which had some of the compelling innovative parts – no sense to release YATE (“yet another text editor”) and hope it will bring a decently sized audience. I actually started designing and coding some of the most interesting parts of NGEDIT, even if many of the core parts are yet incomplete.

But now that I’ve already started opening the NGEDIT workspace from Visual Studio, the amount of things that I can do (and that I have to do at some point) is mind-boggling. Just to list a very few:

  • Make the menus configurable… they actually are, but the UI is not there. Of course, this is one-in-one-hundred little things that need to be done.
  • Integrate it well with windows: registry entries, shell extension, etc…
  • Let the user configure the editor window colors – now all the UI elements can be configured, with the exception of the actual text-editing window itself (funny, as that’s the most important part).
  • Finally finish off implementing that DBCS support for Japanese/etc… – it’s designed in, but it’s not implemented. Either that, or properly remove support for it for 1.0.
  • Now that ViEmu is so complete… port back the emulation functionality to NGEDIT. The vi/vim emulation in NGEDIT is written in my scripting language, it’s already 4 months old, and really basic compared to what ViEmu does. While I’m at it, properly structure it so that the vi/vim emulation core is shared code between ViEmu and NGEDIT – once again template based, text-encoding independent, and supporting integration with both mammoth and invasive environments, such as VS (where the vi/vim emu core is just a “slave”), and also with a friendlier environment like NGEDIT, which is already designed to provide custom editor emulation.
  • Clean up many things in the code base – you know, after you’ve been months away from the code, you come back and see many things which are plain malformed. Sure, you also kind of know about them when you are actually coding, but you have to get the job done and you do take shortcuts. I believe in shortcuts, it’s the way to actually advance, but then you want to properly pave those shortcuts into proper roads.
  • Really study all that maddeningly beautiful mac software, understand what the heck makes it so incredibly and undeniably beautiful, and try to bring some of that beauty to NGEDIT.
  • Actually work in the most innovative aspects of NGEDIT – the parts with which I hope to create an outstanding product.
  • etc…

You can see… this is just a sample. Not to mention the many things that I have in my todo list, in my various “NGEDIT 1.0 FEATURE LIST” lists, in my handwritten sketches, etc…

It can be quite overwhelming… where do I start? And worse than that, you need motivation to actually be productive. See, one thing is that I’m determined to work on NGEDIT. Another thing is that I need to give myself something concrete to focus on, with which I feel comfortable strategy-wise, so that I will put all my energy on it.

Not having a single idea on how to tackle a problem is blocking. Having too many ideas or tasks can be as blocking.

This is a common problem when you are developing – sometimes, the only way out of such blocking crossroads is to start from the top of the list, even if it is alphabetically sorted, and work on items one by one. I have found that this works best for me when I am about to finish a release of a product or a development cycle. When, after digging for weeks or months, you see the light at the end of the tunnel, but there is still a lot of digging to be done, a similar phenomenon happens. And what I usually do is visualize the image of the finished product on my mind, and then just work on the items in sequential order (not sequential by priority, sequential by whatever random order they ended up listed on the todo list).

But I couldn’t see myself doing this for NGEDIT now – it’s still too far from any end of any tunnel.

I decided to just start browsing around the source code, thinking about code-architecture issues (I have never stopped thinking about many NGEDIT product details even while I was working 100% on ViEmu), and just spending my time with NGEDIT until the mud would clear up.

And it’s happened – I’ve seen exactly the right way to approach it.

What is it? I just need to start actually using NGEDIT. I’m so fortunate that I use a text editor daily during many hours, for many tasks, and I can just use that time on NGEDIT – and just implement the stuff I need along the way! It’s clear, I just need to make it the best editor for me, and the rest will follow naturally. No need to prioritize features, no need to do heavy code spelunking, just start using it and implement the stuff as I go.

Which ones have been the first tasks to come out of this? Two quite obvious ones – first, I needed to fix a bug in the BOM autodetection code, as I tried to use NGEDIT to edit some registry-export files, and I just noticed the bug while using it. And second, and the reason I was working on registry-exports, I need to implement associating file extensions to NGEDIT so that I can just double-click on files! And that requires that I implement command line parsing in NGEDIT (which, of course, was buried somewhere around #53 in the todo list!). Why? Because if I am to use NGEDIT for all my tasks, I just need to open files efficiently now.
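For readers who haven’t dealt with byte-order marks, this is roughly what BOM autodetection involves. The sketch below is not NGEDIT’s code: the enum and the function name `detect_bom` are made up for illustration, and it deliberately ignores UTF-32 (whose little-endian mark starts with the same two bytes as the UTF-16 one).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type for this sketch.
enum EBom { BOM_NONE, BOM_UTF8, BOM_UTF16_LE, BOM_UTF16_BE };

// Inspects the first bytes of a buffer and reports which byte-order mark,
// if any, it starts with. UTF-32 marks are not handled here.
EBom detect_bom(const unsigned char* p, std::size_t n)
{
    assert(p != 0 || n == 0);

    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return BOM_UTF8;        // EF BB BF
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        return BOM_UTF16_LE;    // FF FE
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF)
        return BOM_UTF16_BE;    // FE FF
    return BOM_NONE;
}
```

The length checks matter: a one-byte or empty file must not be read past its end.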

It’s incredible how this path is already starting to work – I’m writing this blog post in NGEDIT, and the most important part is that I already feel confident. Confident that I’m on the right track for the earliest possible release date. And that lets me relax and focus on actual work.

Long time no see

October 28th, 2005

I’ve been swamped with work in the past few days, so I didn’t have any time to blog. But just yesterday I made available the first alpha version of ViEmu 1.3, which provides a single star feature: regular expressions and ex command line emulation. This means my regular expression engine is working nicely (on top of my encoding-independent C++ string template class!). And that ViEmu is starting to bring the full power of vi/vim to Visual Studio. I hope to iron out the remaining bugs and release it to the public around next week.

I think I will have more time to blog starting next week, and I have a line-up of stuff I want to blog about. Part of it thanks to Baruch Even, who’s started the very nice Planet uISV blog aggregator (be sure to check it out if you’re interested in other small software shops and start-ups!).

Visual Studio 2005 has just been released (finally!) and I’m downloading it through msdn subscriptions, but I already have news from some customer that the build I provide for VS2005-beta-2 seems to work nicely with it. I will finally prepare a version of ViEmu that installs dually to both VS.NET 2003 and VS2005. Some customers were already using ViEmu with VS2005 beta versions quite happily, but I had some “false positives” on ViEmu problems where VS2005 was actually the culprit – I hope everything of importance will be fixed now and I’ll be able to release ViEmu for VS2005 “officially”.

To finish this post, I’ll extract the information on what ViEmu 1.3 brings – be warned, it is for heavy vi users and regex experts, so skip it as soon as you have a doubt whether you’re interested in it.

Summary of what's contained in ViEmu 1.3-a-1:


  - Regular expression support for '/' and '?' searches
  - Command line editing, with command history
    (use the cursor arrows, BACKSPACE and DEL)
  - '< , '> marks for the last active visual selection
  - gv normal mode command to restore the last visual selection
  - The following ex commands:
    - :set - basic implementation allowing [no]ig[norecase]/[no]ic,
      [no]sm[artcase]/[no]sc, and [no]ma[gic]
    - :d   - :[range]d[elete] [x] [count] to delete (x is the register)
    - :y   - :[range]y[ank] [x] [count] to yank (x is the register)
    - :j   - :[range]j[oin][!] to join the lines in the range,
               or default to the given line (or cursor line) and the next
    - :pu  - :[range]pu[t][!] [dest] to paste after (!=before) the given
               address
    - :co  - :[range]co[py] [dest] to copy the lines in range to the
               destination address (:t is a synonym for this)
    - :m   - :[range]m[ove] [dest] to move the lines in range to the
               destination address
    - :p   - :[range]p[rint] [count] to print the lines (send them to the
               output window) (:P is a synonym for this)
    - :nu  - :[range]nu[mber] [count] to print the lines (send them to the
               output window), w/line number (:# is a synonym for this)
    - :s   - :[range]s[ubstitute]/re/sub/[g] to substitute matches for the
               given regex with sub (do not give 'g' for only 1st match on
               each line)
    - :g   - :[range]g[lobal][!]/re/cmd to run ':cmd' on all lines matching
               the given regex (! = *not* matching)
    - :v   - :[range]v[global]/re/cmd to run ':cmd' on all lines *not*
               matching the given regex

You can now use :g/^/m0 to invert the file, :g/^$/d to remove
all empty lines, :%s/\s\+$// to remove all trailing whitespace,
and use many of your favorite vi/vim tricks.
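If the first of these tricks looks like magic: :g/^/ selects every line (every line has a start), and m0 is then run on each selected line in its original order, moving it to the top. Each line lands above all the lines moved before it, so the file ends up reversed. A small simulation of that process, using a hypothetical function name and a plain vector of strings rather than anything from ViEmu:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simulates ":g/^/m0": visit each line in its original order and move it
// to the top of the buffer built so far.
std::vector<std::string> global_move_to_top(const std::vector<std::string>& lines)
{
    std::vector<std::string> result;
    for (std::size_t i = 0; i < lines.size(); ++i)
        result.insert(result.begin(), lines[i]);   // "m0": place before the first line
    return result;
}
```

Inserting at the front of a vector is linear each time, which is fine for a demonstration but not how an editor would want to do it.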

In implementing the regular expression engine, I've gone through vim
documentation and implemented everything there. There are a few things
not implemented yet - I plan to add them later on. This is a summary of the
implemented features (for now, you can look at vim's documentation for
detail):

 - Regular matching characters
 - '.' for any character
 - Sets (full vim syntax): [abc], [^1-9a-z], [ab[:digit:]], ...
     (including '\_[' to include newline)
 - Standard repetitions: * for 0-or-more, \+ for 1-or-more, \= or \? for
     0 or 1
 - Counted repetitions: {1,2} for 1-to-2 repetitions, {1,} for 1-to-any,
     {,5} for 5 or less, {-1,} for non-greedy versions
 - Branches: foo\|bar matches either "foo" or "bar"
 - Concats: foobar\&.. matches the first two characters where 'foobar'
     matched
 - Subexpressions: \( and \) to delimit them ('\%(' to make them
     non-numbered)
 - ^ and $ for start- and end-of-line. (See the note on the limitation
     below)
 - \_^ and \_$ for s-o-l and eol anywhere in the pattern
 - \_. for any character including newline
 - \zs and \ze to mark the match boundaries
 - \< and \> for beg and end of word
 - Character classes: \s for space, \d for digit, \S for non-space, etc...
     and '\_x' for the '\x' class plus newline (all of them work)
 - Special chars: \n for newline, \e, \t, \r, \b
 - \1..\9 repeat matches
 - Regex control: \c to force ignore case, \C to force check case, \m for
     magic, \M for nomagic, \v for verymagic, \V for verynomagic

Full [very][no]magic is supported.

These are the vim regular expression features not yet implemented by ViEmu:

 - ~ to match last substitute string
 - \@>, \@=, \@!, \@<= and \@<! zero-width and dependent matches/non-matches
 - \%^ (beg-of-file), \%$ (end-of-file), \%# (cursor-pos), \%23l
     (specific line), \%23c (col) and \%23v (vcol)
 - optional tail "wh\%[atever]"
 - *NO PROTECTION* for repetitions of possibly zero-width matches, be
     careful! \zs* or \(a*\)* MAY HANG VIEMU!!!
 - ^ and $ are only detected as special at the very beginning and very end
     of the regular expression string, use \_^ or \_$
 - \Z (ignore differences in unicode combining chars)
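On the zero-width repetition hang: the usual kind of protection is to stop repeating as soon as one iteration consumes no input. The sketch below shows the idea on a deliberately simplified greedy loop without backtracking. It is not ViEmu's engine; the function names and the function-pointer interface are assumptions made for the example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sub-matcher interface: tries to match at 'pos' and returns the
// position after the match, or std::string::npos on failure. A sub-matcher
// is allowed to succeed while consuming zero characters.
typedef std::size_t (*TSubMatcher)(const std::string& text, std::size_t pos);

// Matches "a*" at pos: zero or more 'a' characters, so it never fails.
std::size_t match_a_star(const std::string& text, std::size_t pos)
{
    while (pos < text.size() && text[pos] == 'a')
        ++pos;
    return pos;
}

// Greedy repetition of 'sub', like the outer '*' in "\(a*\)*".
std::size_t match_repeat(TSubMatcher sub, const std::string& text, std::size_t pos)
{
    for (;;)
    {
        const std::size_t next = sub(text, pos);
        if (next == std::string::npos)
            break;              // sub-match failed: stop repeating
        if (next == pos)
            break;              // zero-width iteration: no progress, stop here
        pos = next;
    }
    return pos;
}
```

Without the `next == pos` check, repeating a sub-expression that keeps succeeding on the empty string would loop forever.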

Other limitations:

 - The :s replacement string does not yet understand the full vi/vim options,
     and cannot insert multi-line text. Only & and \1..\9 are recognized as
     special, and if some of them matched a multi-line range, only the regular
     characters will be inserted. You can't insert new line breaks by using \r
     either.

 - After-regular-expression displacement strings are not implemented
     ('/abc/+1' to go to the line after the match).

 - Ex-ranges accept everything (%, *, ., $, marks, searches) but not
     references to previous searches (\/, \?, \&) or +n/-n arithmetics.

 - The command line editing at the status bar looks a bit crude, with
     that improvised cursor, but it should make the ex emulation very
     usable.

 - :p and :# output is sent to a "ViEmu" pane on the output window.

Beautiful regular expressions code

October 9th, 2005

My regular expression engine is starting to work. I can already compile and match basic regular expressions, and the framework for the most complex features is already there, even if not completely implemented yet. The first use of the engine will be for ViEmu (I might go straight from 1.2 to 1.5, as I feel regex and ex command line support take it to the next level, and Firefox is already going to do a 1.0->1.5 jump so I shouldn’t do any less).

It’s probably the piece of code with which I’ve been most happy in a long time. The reason? Not that it is complex and it was hard to write (which it was), as several other pieces have been more complex, and many others have taken a lot more work. The actual reason is that it is free of tight bindings to anything else: it uses the generic string template framework I talked about, and so it can handle any variation of string encoding, format, storage, access-mechanism… whatever – without losing an ounce of efficiency compared to code written directly against ‘char’ or ‘wchar_t’.
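To illustrate the general idea of a matcher that is not tied to one character type, here is a toy example. It is in no way the engine described here: it is a tiny matcher in the style of Rob Pike’s classic one, supporting only literal characters, ‘.’ and ‘*’, and it is templated on a raw character type instead of a string abstraction. The names `tmatch_here` and `tmatch_star` are made up for the sketch.

```cpp
// Forward declaration: matches 're' against the beginning of 'text'.
template <class CharT>
bool tmatch_here(const CharT* re, const CharT* text);

// Matches zero or more occurrences of 'c', followed by the rest of 're'.
template <class CharT>
bool tmatch_star(CharT c, const CharT* re, const CharT* text)
{
    do
    {
        if (tmatch_here(re, text))
            return true;
    } while (*text != CharT(0) && (*text++ == c || c == CharT('.')));
    return false;
}

template <class CharT>
bool tmatch_here(const CharT* re, const CharT* text)
{
    if (re[0] == CharT(0))
        return true;                                // pattern exhausted: match
    if (re[1] == CharT('*'))
        return tmatch_star(re[0], re + 2, text);    // "x*" followed by the rest
    if (*text != CharT(0) && (re[0] == CharT('.') || re[0] == *text))
        return tmatch_here(re + 1, text + 1);       // one character matched
    return false;
}
```

The same source compiles to separate, direct code for each character type, so `tmatch_here("a.c*d", "abcccd")` and `tmatch_here(L"a.c*d", L"abd")` both work with no runtime dispatch.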

For one, I feel I will never need to write another regular expression engine, and that is a good feeling.

But, most important, when I look at the code, I get a feeling of beauty. And it is a feeling that I miss most of the time I write code. I don’t know how you feel about it, but it actually hurts me when I write code that is too tightly bound to some specific circumstance. Reusability is great, but the feeling of being right is a separate issue and that’s what makes me happiest.

I’m going to try to post the declaration here so that you can have a look at it. Let’s see if the beauty survives the adjustment to the blog’s width:

//-------------------------------------------------------------------------
//
// Regular expressions support for NGEDIT and ViEmu, templatized to
//  support text and input in any encoding (ViEmu only uses wchar_t)
//
//-------------------------------------------------------------------------

#ifndef _NGREGEXP_H_
#define _NGREGEXP_H_

#include "ngbase.h"
#include "vector.h"

namespace nglib
{

template<class TREADSTR>
class TRegExp
{
  public:
      typedef typename TREADSTR::TREF           TSREF;
      typedef typename TREADSTR::TREF::iterator TSITER;
      typedef typename TREADSTR::TPOS           TSPOS;
      typedef typename TREADSTR::TCACHE         TSCACHE;
      typedef typename TREADSTR::TCHAR          TSCHAR;

    //--------------------------------------------
    enum ECompileError
    {
      E_OK,
      E_SYNTAX_ERROR,
      E_NOT_IMPL,
      E_NO_MEM,
      E_UNCLOSED_SET,
      E_UNCLOSED_SUBEXPR,
    };

    struct TCompileError
    {
      ECompileError type;
      TSPOS         pos;
    };

    struct TMatchResult
    {
      TSPOS posStart, posAfterEnd;
      struct TSubMatch
      {
        TSPOS posStart, posAfterEnd;
      };
      TVector<TSubMatch> vSubmatches;
    };

    //--------------------------------------------
    TRegExp() { m_ok = false; }
    //~TRegExp() { }

    TRet Compile (
      TSREF rsRegExp,
      TCompileError *pCompileError = NULL,
      TSCHAR chTerm = TSCHAR::zero
    );

    // Methods for simple string matching
    bool TryToMatch (
      TSREF rsInput,
      TMatchResult *pMatchResult,
      bool bBOL = true,
      bool bEOL = true
    );

    bool Contains (
      TSREF rsInput,
      TMatchResult *pMatchResult,
      bool bBOL = true,
      bool bEOL = true
    );

    // Methods for possibly multi-line regexps
    template <class TTEXTBUF>
    bool TryToMatch (
      TTEXTBUF txtBuf,
      unsigned uStartLine,
      TSPOS pos,
      TMatchResult *pMatchResult
    ); // At a certain line pos

    template <class TTEXTBUF>
    bool Contains (
      TTEXTBUF txtBuf,
      unsigned uStartLine,
      TSPOS pos,
      TMatchResult *pMatchResult
    ); // Starting anywhere in that line

    //--------------------------------------------
  private:

    //--------------------------------------------
    enum ENodeType
    {
      NT_MANDATORY_JUMP,      // + 2 bytes signed offset
      NT_OPTIONAL_JUMP,       // + 2 bytes signed offset
      NT_OPTIONAL_JUMP_PREF,  // + 2 bytes signed offset
            // jump with preference (to control greediness)

      NT_MATCH,               // + 1 byte match_type + details
      NT_ACCEPT,
      NT_OPEN_SUBEXPR,        // + 1 byte subexpr #
      NT_CLOSE_SUBEXPR,       // + 1 byte subexpr #

      NT_SAVE_IPOS,    // + 1 byte pos-reg where to save
      NT_JUMPTO_IPOS,  // + 1 byte temp to read
      NT_SET_TEMP,     // + 1 byte temp where to save
      NT_INC_TEMP,     // + 1 byte temp to read
    };

    //--------------------------------------------
    enum EMatchType
    {
      MT_CHAR,          // any literal char (+ ENC_CHAR)
      MT_DOT,           // .                        
      MT_BOL,           // ^
      MT_EOL,           // $
      MT_NEXTLINE,      // \n
      MT_SET,           // [abc] or [a-zA-Z]
                        //+ (byte)num_chars
                        //+ (byte)num_ranges
                        //+ nc * ENC_CHAR
                        //+ 2 * nr * ENC_CHAR
      MT_NEGSET,        // [^abc] or [^a-zA-Z]  same
      MT_WHITESPACE,    // \s
      MT_NONWHITESPACE, // \S
      MT_WORD,          // \w
      MT_NONWORD        // \W
    };

    //--------------------------------------------
    // Compiler and matcher classes, not accessible externally
    class TCompiler;
    class TMatcher;

    bool          m_ok;
    TVector<byte> m_vbCompiledExpr;

};

} // end namespace

#include "ngregexp.inl"

#endif

Main things: the class is in a namespace that I use for all my common code, you can see how the interface uses the string’s specific types, the TCompiler and TMatcher classes are just declared and are unnecessary for the user of the class, and the only declarations – apart from the main interface – are for the node types, which need to be accessible both by the TCompiler and the TMatcher.
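As a rough illustration of why the node types have to be visible to both halves, here is a minimal sketch of a byte-coded program that one side emits and the other interprets. It is not the NGEDIT code: the opcode names, the one-byte offsets (the real header mentions two-byte offsets) and the hand-assembled program for `a*b` are all assumptions made for the example, and the matcher is a plain recursive backtracker.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified, hypothetical node set, loosely inspired by ENodeType.
enum Op : unsigned char
{
    OP_JUMP,    // + 1 byte signed offset, relative to the next node
    OP_SPLIT,   // + 1 byte signed offset; fall through first, jump on failure
    OP_CHAR,    // + 1 byte literal character
    OP_ANY,     // any single character
    OP_ACCEPT   // the whole expression matched
};

typedef std::vector<unsigned char> Program;

// Resolves the target of the jump-like node stored at pc.
static std::size_t JumpTarget(const Program& prog, std::size_t pc)
{
    const std::ptrdiff_t offset = static_cast<signed char>(prog[pc + 1]);
    return static_cast<std::size_t>(static_cast<std::ptrdiff_t>(pc) + 2 + offset);
}

// Backtracking interpreter: true if prog matches input starting at ip;
// *end receives the position just after the match.
bool Run(const Program& prog, std::size_t pc,
         const std::string& input, std::size_t ip, std::size_t* end)
{
    for (;;)
    {
        assert(pc < prog.size());
        switch (prog[pc])
        {
        case OP_JUMP:
            pc = JumpTarget(prog, pc);
            break;
        case OP_SPLIT:
            // Preferred branch first (this is what makes the loop greedy)
            if (Run(prog, pc + 2, input, ip, end)) return true;
            pc = JumpTarget(prog, pc);
            break;
        case OP_CHAR:
            if (ip >= input.size() ||
                input[ip] != static_cast<char>(prog[pc + 1])) return false;
            ++ip; pc += 2;
            break;
        case OP_ANY:
            if (ip >= input.size()) return false;
            ++ip; pc += 1;
            break;
        case OP_ACCEPT:
            *end = ip;
            return true;
        default:
            return false;
        }
    }
}

// What a compiler could emit for the expression a*b (assembled by hand here).
Program CompileAStarB()
{
    const unsigned char code[] = {
        OP_SPLIT, 4,                              // 0: try 'a', else go to 6
        OP_CHAR,  'a',                            // 2
        OP_JUMP,  static_cast<unsigned char>(-6), // 4: back to 0
        OP_CHAR,  'b',                            // 6
        OP_ACCEPT                                 // 8
    };
    return Program(code, code + sizeof(code));
}
```

Calling `Run(CompileAStarB(), 0, "aaab", 0, &end)` walks the loop three times and then accepts. The point is only that the emitting side and the interpreting side must agree on the meaning and layout of every node.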

Even if C++ templates force you to include the definition of the members at the point of use, I usually separate template-based code in a header file with the declarations and an “.inl” (inline) file with the definitions, which helps keep code sanity.
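A minimal sketch of that header/.inl split, shown as a single listing so that it compiles on its own. The `TStack` class and the file names are hypothetical, chosen only for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ---- tstack.h (hypothetical): declarations only ----
template <class T>
class TStack
{
  public:
    void        Push(const T& item);
    const T&    Top() const;
    std::size_t Size() const;

  private:
    std::vector<T> m_items;
};
// In a real header, the last line would be:  #include "tstack.inl"

// ---- tstack.inl (hypothetical): member definitions ----
template <class T>
void TStack<T>::Push(const T& item)
{
    m_items.push_back(item);
}

template <class T>
const T& TStack<T>::Top() const
{
    assert(!m_items.empty());   // Top() on an empty stack is a caller bug
    return m_items.back();
}

template <class T>
std::size_t TStack<T>::Size() const
{
    return m_items.size();
}
```

The compiler still sees everything at the point of use, but a reader who only wants the interface can stop at the end of the header part.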

The only actual types are bool (which is fair enough to use without loss of generality), byte, which allows generality through the TSCHAR::EncodeToBytes() and TSCHAR::DecodeFromBytes(), and the lonely unsigned to refer to a line number within a multi-line text buffer. I will probably get rid of that one by using an abstract TIXLINE (line-index type) in TTEXTBUF.
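The post does not show the signatures of EncodeToBytes() and DecodeFromBytes(), so the following is only a guess at what such a pair could look like. The `TWideChar` and `TNarrowChar` wrappers, the little-endian layout and the `EmitLiteral`/`ReadLiteral` helpers are all assumptions for the sake of the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef unsigned char byte;

// Hypothetical wrapper for a 16-bit code unit.
struct TWideChar
{
    unsigned short code;

    // Appends the character (little-endian); returns the bytes written.
    static std::size_t EncodeToBytes(TWideChar ch, std::vector<byte>& out)
    {
        out.push_back(static_cast<byte>(ch.code & 0xFF));
        out.push_back(static_cast<byte>(ch.code >> 8));
        return 2;
    }

    // Reads one character starting at p; returns the bytes consumed.
    static std::size_t DecodeFromBytes(const byte* p, TWideChar& ch)
    {
        assert(p != NULL);
        ch.code = static_cast<unsigned short>(p[0] | (p[1] << 8));
        return 2;
    }
};

// Hypothetical wrapper for a one-byte-per-char encoding.
struct TNarrowChar
{
    unsigned char code;

    static std::size_t EncodeToBytes(TNarrowChar ch, std::vector<byte>& out)
    {
        out.push_back(ch.code);
        return 1;
    }

    static std::size_t DecodeFromBytes(const byte* p, TNarrowChar& ch)
    {
        assert(p != NULL);
        ch.code = p[0];
        return 1;
    }
};

// Generic code only ever talks to the byte vector through the char type,
// so the compiled expression stays a plain vector of bytes.
template <class TCH>
void EmitLiteral(std::vector<byte>& program, TCH ch)
{
    TCH::EncodeToBytes(ch, program);
}

template <class TCH>
std::size_t ReadLiteral(const byte* p, TCH& ch)
{
    return TCH::DecodeFromBytes(p, ch);
}
```

With something along these lines, a literal character in the compiled expression takes one byte for a narrow string and two for a wide one, while the generic code never mentions either size.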

The only shortcut I took from my original idea is that both the regex definition string and the target text have to be of the same type, but templatizing on two string types seemed a bit overkill and I can always easily convert at the use point via special-purpose inter-string-type conversion inline functions (possibly even template-based to avoid too much rewriting).

Now that I think of it, if the matched target is a sparse or arbitrarily ordered disk-based text buffer, abstracted away through a very smart TTEXTBUF class, I will probably have to allow specifying the regex itself with another type.

It’s taken a long time to develop this C++ style, but I’m starting to feel really happy with how my code is looking – for the first time after over 10 years of C++ programming!

I haven’t posted much on the blog lately, as I’ve been focusing on development. Bringing ViEmu to maturity is taking quite some work, although the result is satisfying – and I’d like to blog more in the future, but we’ll have to see if development leaves time and energy for this…

ViEmu 1.2 released & next plans

September 30th, 2005

I released ViEmu 1.2 a short while ago. During the last few weeks I’ve felt fully energized, and I’ve completed quite a lot of work.

I’ve seen almost no traffic coming from the new articles, but then, Google doesn’t like my page much, so that’s understandable. Maybe I am in the dreaded “sandbox”, or maybe I’m missing some key stuff in the page. My consolation is that anyone who looks for “vi keystrokes visual studio” or anything remotely similar will certainly find it through the many mentions that appear on the first page of search results (my own page is nowhere to be seen in the first 20 or so pages).

I thought I’d share my plans for the next steps, especially since I haven’t been working much on NGEDIT lately.

Well, actually a lot of the current work will reflect directly on NGEDIT. I already have the regular expression framework that will power ViEmu, but the good thing is that it’s written using the C++ string classes I talked about a while ago, so it will transplant directly to NGEDIT’s multi-format text processing engine (even if ViEmu only uses UCS-2 two-bytes-per-char support). I will be porting the latest code I did for NGEDIT, which is among the most innovative stuff, to this string support, which is evolving within ViEmu.

My intention is to focus on ViEmu for a while longer – until I get it to a level with which I feel comfortable. That means, basically, customers’ requests and vi (ex) command line emulation. NGEDIT is a product with much more potential, but I feel I need to give ViEmu enough gas so that it will be able to work well. It’s very motivating to work directly on customers’ requests. And the fact that it is already able to generate income is also a great incentive (when comparing it with NGEDIT, which will still take some time until it can become a product).

I will estimate a time frame, given that I can always explain later why I missed it badly 🙂 I calculate that a bit over one month will be enough to get ViEmu to my desired level, and then I will be able to invest much more effort in NGEDIT while still improving ViEmu.

Revamp of the web site

September 26th, 2005

I’ve been very busy during the last week. For one, I’ve been working hard on ViEmu 1.2. It is now, hopefully, in the last beta stages, and should be out in a few days. I’ve corrected many things in previous versions, and it now provides much more reliable vi-style repeatable input, macros, etc… even if the focus was to get code folding and basic window navigation support, in the end it has become a release that “stabilizes” many features.

But during the last few days, I’ve finally made a general upgrade of the web site. Following the suggestion from reader Lena (thanks!), I found out about Cheetah, and I’ve been able to finally prepare a web site development framework that makes sense to me. Read on if you’d like to know more details.

What have I done to the main site? First, I have added a much needed ViEmu FAQ section. It answers the most common questions I get about ViEmu.

Second, I have changed the “tone” of some of the pages (about, etc…) getting rid of the ugly “About us”, “We are…”, etc… tone. I think there is no point in keeping a “corporate” look, especially since I am quite clear in the blog and I don’t really make an effort to keep a corporate face – having the web site like that was the simplest thing to do when I first prepared the web site.

And third, the most interesting – I have added an Articles section to the web site. There are links to the older posts in this blog which can be interesting, but I have also written about ten short new articles. You can find one describing how I’ve used Cheetah to structure the web site (be sure to check it out if you have a similar problem!), and some other articles on Visual Studio extensibility, learning vi/vim, etc…

On one hand, there are a lot of useful articles I can write quite easily, which wouldn’t make sense as blog posts. They are fine there for anyone searching for the topic. On the other hand, I think they will help drive some traffic to the main NGEDIT site, and I hope that will increase the exposure of ViEmu (and, eventually, NGEDIT).

I will be posting references to new articles on the blog, apart of course from all the regular postings sharing my evolutions of the products and the company.

ViEmu 1.2 Release Candidate out, & html macro language

September 19th, 2005

I have finished implementing and packaging ViEmu 1.2, and sent out an initial release to current customers and interested users. It includes folding support and window-command support as in vim (I think none of these was in the original vi). By the way, it is already using the C++ string class I talked about in the last post – not heavy use yet, but already using it. After a bit of testing of this release candidate (as the version has already been released), I will be announcing it and putting it up for download on the main page.

The main web site is built with simple, static html files. There is quite a lot of repetition, both for common elements like the navigation bar and for common parts such as general layout. I guess that must be the case with many sites. I’ve been wanting to add two new sections to the web site during the last weeks, but having to update those elements on all pages was not something I wanted to do. I am going to set up a sensible framework such that those elements don’t have to be updated in many places.

I think many sites use a dynamic mechanism, such as ASP or PHP, to avoid replicating such elements. This blog, for example, which is based on WordPress, does such things. But I do not want to switch the whole site to a dynamic system – it seems absurd to evaluate code on each page request when it can be a one-off step that generates the proper html files.

I do the html and css by hand, using vim, and I like to have that kind of control. I don’t know of any system that provides what I want – some kind of “macro preprocessor” for html pages. My idea is that I will be writing “.hs” (“html source”) files, and a preprocessor will be preprocessing them to generate the actual html files. There will be a “.hi” (“html include”) file with the common element definitions.
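To make the idea concrete, here is a minimal sketch of what such a preprocessor could look like. The post does not define any syntax, so everything here is an assumption: the `#define NAME` / `#end` block format for the “.hi” definitions, the `$(NAME)` reference syntax in the “.hs” source, and the function names. It works on strings in memory and leaves file handling out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

typedef std::map<std::string, std::string> MacroMap;

// Parses multi-line definitions from the text of a ".hi" file.
// Hypothetical syntax:   #define NAME
//                        ...any number of body lines...
//                        #end
MacroMap ParseDefinitions(const std::string& hiText)
{
    MacroMap macros;
    std::istringstream in(hiText);
    std::string line, name, body;
    bool inside = false;

    while (std::getline(in, line))
    {
        if (!inside && line.compare(0, 8, "#define ") == 0)
        {
            name = line.substr(8);
            body.clear();
            inside = true;
        }
        else if (inside && line == "#end")
        {
            // Drop the newline that follows the last body line
            if (!body.empty()) body.erase(body.size() - 1);
            macros[name] = body;
            inside = false;
        }
        else if (inside)
        {
            body += line;
            body += '\n';
        }
    }
    return macros;
}

// Replaces every $(NAME) in the ".hs" text with the macro body.
// Unknown names are copied through untouched.
std::string Expand(const std::string& hsText, const MacroMap& macros)
{
    std::string out;
    std::size_t pos = 0;

    for (;;)
    {
        const std::size_t open = hsText.find("$(", pos);
        if (open == std::string::npos) break;
        const std::size_t close = hsText.find(')', open + 2);
        if (close == std::string::npos) break;

        assert(open >= pos);
        out.append(hsText, pos, open - pos);

        const std::string name = hsText.substr(open + 2, close - open - 2);
        const MacroMap::const_iterator it = macros.find(name);
        if (it != macros.end())
            out += it->second;
        else
            out.append(hsText, open, close + 1 - open);

        pos = close + 1;
    }
    out.append(hsText, pos, std::string::npos);
    return out;
}
```

A page would then contain a line such as `$(NAVBAR)` wherever the navigation bar goes, and changing the bar means editing one definition and regenerating the html files.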

It’s not that I like to do stuff from scratch, but I’ve never heard of tools to do such a thing. I’ve checked the “m4” macro preprocessor, but the main problem I see is that it leans towards single-line macros – and most definitions that I’d be using will be multi-line. It needs to be comfortable to use in such a case.

Unless I find out about a tool that does this, I will be writing it myself. It should only take a couple of hours to get it working. If you know of such a tool, I’d be very grateful if you leave a comment here.

It’s good to see how, as months pass, I’m getting to automate common tasks and the general “work environment” is better every week. Starting from scratch, you have to live with many cumbersome methodologies for some time, but if you are patient it’s very satisfying to improve each part little by little: I can already develop text-encoding-independent text-processing code, I will be able to restructure the web site easily, … I’m dying to develop a dual installer for Visual Studio 2003 / Visual Studio 2005 (for ViEmu) and take out another thorn!

A C++ string class that doesn’t suck

September 14th, 2005

No, the title is actually not mine – although I loved it when I read it. Read on.

One stumbling block I’m finding is that I desperately want to write code which is common to both ViEmu and NGEDIT. I mean, code that deals with text. And this means all areas in which I’m developing: vi emulation, the new features of NGEDIT, etc…

On one hand, ViEmu works with the internal text buffer of Visual Studio, which works with 16-bit wchar_t, called simply ‘Unicode’ by Microsoft, but which is actually the UCS-2 little-endian encoded version of Unicode (I say 16-bit wchar_t because its size is actually not defined in the C++ standard, as happens with all built-in types, and GNU gcc for example implements it as a 4-byte value).
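A tiny sketch of how code can make that assumption explicit instead of silently relying on it. The helper names are made up for the example.

```cpp
#include <cassert>
#include <climits>

// Width of wchar_t in bits on the compiler at hand: typically 16 with
// Visual C++ on Windows and 32 with gcc on Linux.
inline unsigned WCharBits()
{
    return static_cast<unsigned>(sizeof(wchar_t) * CHAR_BIT);
}

// Guard for code that stores UCS-2 code units in wchar_t: it needs at
// least 16 bits, and must not assume that it gets exactly 16.
inline void AssertWCharHoldsUcs2()
{
    assert(WCharBits() >= 16);
}
```

Code that really depends on a two-byte wchar_t (for example, when copying a buffer straight into a UCS-2 file) could check `sizeof(wchar_t) == 2` at compile time instead and refuse to build elsewhere.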

On the other hand, NGEDIT deals with all types of underlying text encodings, including Unicode, platform-native one-byte-per-char, variable length UTF-8, etc… The underlying text store management code stores the text in fixed-size pages and is accessible through simple byte pointers and unsigned count of bytes – not null-terminated strings.

I’ve been working on some template classes to manipulate strings for months now. They are unfinished and only partly usable. The deal is, when you start actually trying to build complex template stuff in C++, it gets hairy very soon – and you find out about the many limitations of C++’s templates. Not only the syntax, but the sheer amount of red tape you have to write.

I think I’ve made some significant advances in this area. My goal is to be able to write, for example, a tokenizer, in such a way that I can instantiate TTokenizer on one string type and use it right away, and do the same with TTokenizer on any other string type, etc… I’ve been after this for quite some time and I’ve ended up with my head spinning several times. One other reason for the problems is that I don’t want to duplicate too much code, and TCountedNativeCharString and the other ones are also templates on more basic types.
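As a sketch of that goal (and not the actual NGEDIT classes), here is a whitespace tokenizer written once and instantiated on two different string types. Both string types, their `Length()`/`At()` interface and the `CountTokens` helper are hypothetical, invented for the example. One type is a null-terminated narrow string, the other a pointer plus count of wchar_t with no terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical read-only string type: null-terminated narrow chars.
struct TSzNarrowString
{
    typedef char TCHAR;
    const char* psz;
    std::size_t Length() const { return std::strlen(psz); }
    TCHAR At(std::size_t i) const { return psz[i]; }
};

// Hypothetical read-only string type: pointer + count, no terminator.
struct TCountedWideString
{
    typedef wchar_t TCHAR;
    const wchar_t* p;
    std::size_t count;
    std::size_t Length() const { return count; }
    TCHAR At(std::size_t i) const { return p[i]; }
};

// The tokenizer only uses TSTR::TCHAR, Length() and At().
template <class TSTR>
class TTokenizer
{
  public:
    struct TToken { std::size_t start, length; };

    explicit TTokenizer(const TSTR& s) : m_s(s), m_pos(0) {}

    // Finds the next run of non-blank characters; false when exhausted.
    bool Next(TToken* pTok)
    {
        assert(pTok != NULL);
        const std::size_t len = m_s.Length();
        while (m_pos < len && IsBlank(m_s.At(m_pos))) ++m_pos;
        if (m_pos >= len) return false;
        pTok->start = m_pos;
        while (m_pos < len && !IsBlank(m_s.At(m_pos))) ++m_pos;
        pTok->length = m_pos - pTok->start;
        return true;
    }

  private:
    static bool IsBlank(typename TSTR::TCHAR ch)
    {
        return ch == ' ' || ch == '\t' || ch == '\n';
    }

    TSTR        m_s;
    std::size_t m_pos;
};

// Generic helper: works unchanged for any conforming string type.
template <class TSTR>
std::size_t CountTokens(const TSTR& s)
{
    TTokenizer<TSTR> tok(s);
    typename TTokenizer<TSTR>::TToken t;
    std::size_t n = 0;
    while (tok.Next(&t)) ++n;
    return n;
}
```

The tokenizer never mentions `char` or `wchar_t` directly, so supporting a new encoding or storage scheme means writing one more small string type rather than another tokenizer.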

Anyway, I was Goo^H^H^Hresearching earlier this week, and stumbled into a similar initiative by Ryan Myers from Microsoft. He’s been documenting it in his blog, and although it is still unfinished, the 8 posts he did early this year are a very interesting read for C++ programmers. His goal is not exactly the same as mine, as he is developing a single template-based string class that manages both interpretation and memory needs, and I have separate assorted template classes for each need – I think it will be better for my code to separate those two. But his blog posts were an awesome read. His own words: “I’m out to create a string class for C++ that doesn’t suck”. And I couldn’t agree more.

I’m hoping that I will have the template stuff working by the end of the week, and this will in turn unblock my work on the common regular expression support for both applications. I will also be rewriting the new features I’ve started implementing for NGEDIT using the new common framework (among other reasons, like the fact that I hate writing code bound to work with ‘char’s or ‘wchar_t’s, because the features may some time make it into another VS .NET add-in).