A C++ string class that doesn’t suck

No, the title is actually not mine – although I loved it when I read it. Read on.

One stumbling block I’m finding is that I desperately want to write code which is common to both ViEmu and NGEDIT. I mean, code that deals with text. And this covers all the areas I’m working on: vi emulation, the new features of NGEDIT, etc…

On one hand, ViEmu works with the internal text buffer of Visual Studio, which works with 16-bit wchar_t’s. Microsoft simply calls this ‘Unicode’, but it is actually the UCS-2 little-endian encoding of Unicode. (I say 16-bit wchar_t’s because the size of wchar_t is not fixed by the C++ standard, as happens with all built-in types, and GNU gcc for example implements them as 4-byte values.)
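To make the wchar_t size point concrete, here is a minimal sketch that asks the compiler how wide wchar_t is. The names kWideCharBits and Utf16Unit are made up for this illustration, and char16_t assumes a C++11 compiler.

```cpp
#include <climits>
#include <cstddef>

// Number of bits in one wchar_t on the platform this is compiled for.
// Typically 16 with Visual C++ and 32 with GCC on Linux; the standard
// leaves the choice to the implementation.
constexpr std::size_t kWideCharBits = sizeof(wchar_t) * CHAR_BIT;

// Hypothetical alias (not from ViEmu or NGEDIT): a code unit type that is
// guaranteed to hold at least 16 bits on every platform (C++11).
using Utf16Unit = char16_t;

static_assert(sizeof(Utf16Unit) * CHAR_BIT >= 16,
              "char16_t holds at least 16 bits");
```

Code that must match Visual Studio's buffers on every compiler cannot assume wchar_t is the right width, which is one reason to keep the character type a template parameter.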

On the other hand, NGEDIT deals with all types of underlying text encodings, including Unicode, platform-native one-byte-per-char, variable-length UTF-8, etc… The underlying text store management code stores the text in fixed-size pages, and the text is accessible through simple byte pointers and an unsigned byte count – not null-terminated strings.
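As a rough illustration of what "byte pointer plus unsigned count" means in code, here is a minimal sketch. ByteSpan, Equal and Slice are hypothetical names invented for this example; they are not the actual NGEDIT types.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical counted byte range: a pointer plus an unsigned length.
// No terminator is assumed, so embedded zero bytes are ordinary data.
struct ByteSpan {
    const unsigned char* data;
    std::size_t size;
};

// Compare two ranges by length and content.
inline bool Equal(ByteSpan a, ByteSpan b) {
    return a.size == b.size &&
           (a.size == 0 || std::memcmp(a.data, b.data, a.size) == 0);
}

// Return the sub-range [offset, offset + count), clamped to the source range.
inline ByteSpan Slice(ByteSpan s, std::size_t offset, std::size_t count) {
    if (offset > s.size) offset = s.size;
    if (count > s.size - offset) count = s.size - offset;
    return ByteSpan{s.data + offset, count};
}
```

Since the length travels with the pointer, a slice of a page is just another ByteSpan, and nothing has to be copied or terminated to hand it to other code.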

I’ve been working on some template classes to manipulate strings for months now. They are unfinished and only partly usable. The deal is, when you start actually trying to build complex template stuff in C++, it gets hairy very soon – and you find out about the many limitations of C++’s templates. Not only the syntax, but the sheer amount of red tape you have to write.

I think I’ve made some significant advances in this area. My goal is to be able to write, for example, a tokenizer, in such a way that I can instantiate TTokenizer over one string type (TCountedNativeCharString, for example) and use it right away, and do the same with TTokenizer over any of the other string types, etc… I’ve been after this for quite some time and I’ve ended up with my head spinning several times. One other reason for the problems is that I don’t want to duplicate too much code, and TCountedNativeCharString and the other ones are also templates on more basic types.
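To show the shape of what I mean, here is a minimal sketch of a tokenizer written once and instantiated over different string types. Only the TTokenizer name comes from the text above. TCountedStringSketch, its Length/At/Sub interface and the space-separated token rule are assumptions made up for this example, not the real classes.

```cpp
#include <cstddef>

// Hypothetical counted string: pointer plus length, templated on the
// character type. It does not own the memory it points to.
template <typename TChar>
class TCountedStringSketch {
public:
    typedef TChar CharType;
    TCountedStringSketch(const TChar* p, std::size_t n) : m_p(p), m_n(n) {}
    std::size_t Length() const { return m_n; }
    TChar At(std::size_t i) const { return m_p[i]; }
    // Sub-string covering [from, to); the caller keeps indices in range.
    TCountedStringSketch Sub(std::size_t from, std::size_t to) const {
        return TCountedStringSketch(m_p + from, to - from);
    }
private:
    const TChar* m_p;
    std::size_t m_n;
};

// Tokenizer written once against the Length/At/Sub interface, so any
// string type providing those members (and CharType) can be plugged in.
template <typename TString>
class TTokenizer {
public:
    explicit TTokenizer(const TString& s) : m_s(s), m_pos(0) {}

    // Stores the next space-separated token in 'out'.
    // Returns false when no tokens are left.
    bool Next(TString& out) {
        typedef typename TString::CharType C;
        const C space = static_cast<C>(' ');
        while (m_pos < m_s.Length() && m_s.At(m_pos) == space) ++m_pos;
        if (m_pos == m_s.Length()) return false;
        const std::size_t start = m_pos;
        while (m_pos < m_s.Length() && m_s.At(m_pos) != space) ++m_pos;
        out = m_s.Sub(start, m_pos);
        return true;
    }
private:
    TString m_s;
    std::size_t m_pos;
};
```

With this, TTokenizer<TCountedStringSketch<char> > and TTokenizer<TCountedStringSketch<wchar_t> > share the same source, and the tokenizer never mentions 'char' or 'wchar_t' directly. A variable-length encoding such as UTF-8 would need more than an At(index) interface, which is part of where the real thing gets hairy.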

Anyway, I was Goo^H^H^Hresearching earlier this week, and stumbled into a similar initiative by Ryan Myers from Microsoft. He’s been documenting it in his blog, and although it is still unfinished, the 8 posts he did early this year are a very interesting read for C++ programmers. His goal is not exactly the same as mine, as he is developing a single template-based string class that manages both interpretation and memory needs, and I have separate assorted template classes for each need – I think it will be better for my code to separate those two. But his blog posts were an awesome read. His own words: “I’m out to create a string class for C++ that doesn’t suck”. And I couldn’t agree more.

I’m hoping that I will have the template stuff working by the end of the week, and this will in turn unblock my work on the common regular expression code for both applications. I will also be rewriting the new features I’ve started implementing for NGEDIT using the new common framework. Among other reasons, I hate writing code bound to work with ‘char’s or ‘wchar_t’s, and the features may some time make it into another VS .NET add-in.

4 Responses to “A C++ string class that doesn’t suck”

  1. Ritesh Nadhani Says:

    Hi,

    We will also be porting SQLyog to multi-lingual and Unicode from our next version, and we can definitely use that string class.

    Can you give me updates on it? I will be happy to use and promote it within my development team.

    Ritesh

  2. The growing pains of NGEDIT » Blog Archive » On blogging, payment processing, and the finite nature of time Says:

    […] … which will include the evolution of the C++ string class that doesn’t suck (but is sucking life out of me)… […]

  3. The growing pains of NGEDIT » Blog Archive » Focusing my development effort Says:

    […] Well, the thing is that my tendency to drift off, my ambition, and my yearning for beautiful code kicked in. Instead of a simple solution, I found myself implementing the “ultimate” command line (of course). It’s already pretty much fully architected, and about half-working (although opening files from the command line ended up being just a small part of the available functionality). As I did this, I also started refactoring the part of the code that handles file loading into using my C++ string class that doesn’t suck, which is great, but it’s quite an effort by itself. Meanwhile, I found myself whining that I didn’t want to have all that code written using the non-portable Windows API (as a shortcut I took before summer, NGEDIT code is uglily using the Windows API directly in way too many places), so I started implementing an OS-independence layer (I know, I know, these things are better done from day 1, but you sometimes have to take shortcuts and that was one of many cases). Of course, with the OS-independence layer using said generic string class for the interface. And establishing a super-flexible application framework for NGEDIT, which was a bit cluttered to my taste. And sure, I started trying to establish the ultimate error-handling policy, which took me to posting about and researching C++ exceptions and some other fundamental problems of computing… […]

  4. The growing pains of NGEDIT » Blog Archive » This is the year! Says:

    […] The new core is written using the same concepts as the C++ string class that doesn’t suck, that is, as very loosely-coupled template-based code. Of course, using those string classes themselves. And this, together with a powerful and flexible interaction design, allows it to be used both in VS and in NGEDIT. […]
