Tuesday, January 15, 2008

What to Expect in Delphi Unicode Support

Allen Bauer has a series of great blog posts (with more to come, no doubt) outlining some of the technical details of what to expect from the new changes to support Unicode in the upcoming Tiburon version of Delphi.

So what is the reaction? Some people are grumpy that some of their code might break. Delphi has a long history of backwards compatibility, and I am sure CodeGear will do what they can to make as much code as possible keep working, but this looks like it might cause some problems if you ever made assumptions about the size of a Char (which was generally discouraged) or used a string to store non-text data (which I am VERY guilty of; they are just so dang useful!). Personally, I am really looking forward to Unicode in Delphi, even if there are a few growing pains.
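To give a made-up example of the kind of thing that breaks (this is just a sketch, not anything from the VCL):

procedure CopyStringToBuffer(const S: string; var Buf: array of Byte);
begin
  // Wrong once Char is two bytes: copies Length(S) bytes,
  // which is only half of the string's data.
  Move(Pointer(S)^, Buf[0], Length(S));
end;

procedure CopyStringToBufferSafe(const S: string; var Buf: array of Byte);
begin
  // Char-size agnostic: copies the full character data
  // (assuming Buf is large enough, of course).
  Move(Pointer(S)^, Buf[0], Length(S) * SizeOf(Char));
end;

The second version behaves the same whether Char is one byte or two.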

6 comments:

Fernando Madruga said...

"about the side of a Char"

You surely mean "size", no? :)

stanleyxu (2nd) said...

The problem is more than a few growing pains if projects can no longer be built in ANSI mode. The switch to Unicode means not only incompatibility with old operating systems, but also an intensive review of existing code. If your project is not test-driven, it will be very risky, even though CodeGear's engineers have reduced the risk to a very low level.

I was involved in several Unicode migration projects for VC++ before. We migrated our modules step by step, instead of building the whole project in Unicode mode.

Anonymous said...

stanleyxu, get real! What incompatibility do you mean? With Win95 and 98, with their huge market share in the year 2008?? By that logic, Borland should have stayed with Win 3.x support too? ;)

Of course, this switch will cause some pain, especially because it is done so late. But there is no other way, as Allen clearly explained to us. At least, not if Delphi wants to survive. Actually, I would wish for even more "breaking" changes, also in the VCL etc., to bring it more up to date.

Chava GR said...

I am in exactly your position: I'm looking forward to it, and I'm pretty confident about the CodeGear R&D team's choices, based on recent changes.

Things have to change. Even Microsoft is breaking code a lot, more than ever. There are surely some things for which maintaining backward compatibility is simply not viable, and given that, the Delphi R&D team is certainly doing pretty well.

stanleyxu (2nd) said...

Hi Mike, you totally misunderstood me. I am not saying that old operating systems are still *very* valuable. I want to point out that you cannot guarantee that your existing code (especially third-party code without source) will build in Unicode mode without *potential* problems. (This is more than a few pains for big projects.)

For instance, the following function doesn't take the size of Char into account:

function TRegistry.DeleteKey(const Key: string): Boolean;
...
Len := Info.MaxSubKeyLen + 1;
...

The Unicode version of this function (created by TntControls v2.3.0):
function TTntRegistry.DeleteKey(const Key: WideString): Boolean;
...
Len := (Info.MaxSubKeyLen + 1) * 2;
...

You cannot expect D2008 to multiply the length of the string buffer by 2 automatically. If your project contains this kind of code in a library with no source, what can you do?
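For code that you do own, at least, the buffer size can be computed with SizeOf(Char), so the same line works under either mapping. A rough sketch of mine (not actual VCL or TNT code):

procedure UseKeyNameBuffer(MaxSubKeyLen: Cardinal);
var
  Buf: PChar;
begin
  // Allocate in bytes: (characters + terminator) * size of one Char.
  GetMem(Buf, (MaxSubKeyLen + 1) * SizeOf(Char));
  try
    // ... pass Buf and MaxSubKeyLen + 1 to RegEnumKeyEx etc. ...
  finally
    FreeMem(Buf);
  end;
end;

But that only helps when the source is yours to change.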

My wish[1] is simply to make the generic-text mapping *runtime* switchable, like:
...
{$UNICODE_MODE ON}
// string = UnicodeString now
{$UNICODE_MODE OFF}
// string = AnsiString now
...

BTW: Some VCL components really need to be updated; I hope the Delphi engineers can put more effort there. For instance: the chevron on toolbars, a marquee progress bar (Vista compatible), a native HintWindow[2], etc.

[1] http://stanleyxu2005.blogspot.com/2008/01/random-thoughts-on-unicode_10.html
[2] http://stanleyxu2005.blogspot.com/2008/01/native-hint-window-class.html

Unknown said...

I think one of the main reasons people use PChars and strings for binary data processing is that PChar is the only pointer type that supports the plus operator to adjust the pointer, rather than having to use the Inc() construction.
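Just to illustrate what I mean (hypothetical names):

procedure SkipHeader(Data: Pointer; HeaderSize: Integer);
var
  P: PChar;
begin
  // '+' compiles for PChar, so it gets (ab)used for raw byte offsets.
  // Once Char is two bytes, this skips 2 * HeaderSize bytes instead.
  P := PChar(Data) + HeaderSize;
  // ... read the payload through P here ...
  // With a plain Pointer you are stuck with Inc() and casts, e.g.
  // Inc(PByte(Data), HeaderSize);
end;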

I do support progress and think Unicode support is the way to go, but I think keeping WideString as it is (hey, you might have a problem here and there with BSTRs) while changing string is not the way to go.

Currently I am investigating several million lines of code for a customer of mine, and it will be hard to validate that code if Char and string suddenly refer to two-byte characters (and strings of them).

I think that from a technological point of view (as said in other comments, somewhat rhetorically) a cold-turkey switch can be seen as correct. But from a commercial point of view I cannot see it at all: hey, buy our new product, and yes, you will probably need to go through millions of lines of code for the coming months while you lose production. This will be a very, very expensive switch. Someone even said that for non-Unicode development you should stay with Delphi 2007. Does CodeGear want to lose customers???