Adug Melbourne Meeting, Monday 16 March 2026

Hi,

Sorry for the late notification.

This month John McDonald will be making a presentation.

Meeting location: Melbourne Men’s Shed under Federation Square, and on Zoom.

Meeting time: 6:00 pm for a 6:15 start with questions and discussion, followed by the presentation at 6:30, Melbourne time (AEDT).

Zoom links will be posted here just before the meeting.

John is going to be talking about Unicode.

I think we have all had experiences, both good and bad, with Unicode and its various encodings, so this should be interesting.


Sorry I’m going to miss it. I hope it makes its way to YouTube soon. :wink:


Topic: ADUG Melbourne March Meeting
Time: Mar 16, 2026 06:00 PM Canberra, Melbourne, Sydney
Join Zoom Meeting

Meeting ID: 826 5380 7391
Passcode: 862420


One tap mobile
+61871501149,82653807391#,*862420# Australia
+61280156011,82653807391#,*862420# Australia


John made a good presentation on Unicode last night.

He had run into problems with Delphi's built-in UTF-8 stream-reading functions, which raise an exception when they read invalid UTF-8.
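Delphi's stream classes aren't reproduced here. As a language-neutral sketch of the two behaviours in play, Python's incremental UTF-8 decoder can either raise on invalid input (strict mode, analogous to the exception described above) or substitute U+FFFD and carry on. The function name `read_utf8_stream` and the tiny chunk size are illustrative assumptions; the small chunk size deliberately splits the two-byte "é" across reads to show that a streaming decoder has to carry partial sequences between chunks.

```python
import codecs
import io


def read_utf8_stream(stream, errors="strict", chunk_size=4):
    """Decode a binary stream as UTF-8, a few bytes at a time.

    errors="strict" raises UnicodeDecodeError on malformed input;
    errors="replace" substitutes U+FFFD and keeps going.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors=errors)
    parts = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # final=True flushes (or reports) any incomplete trailing sequence
            parts.append(decoder.decode(b"", final=True))
            break
        parts.append(decoder.decode(chunk))
    return "".join(parts)


data = b"caf\xc3\xa9 \xff ok"   # 0xFF can never appear in UTF-8

print(read_utf8_stream(io.BytesIO(data), errors="replace"))
try:
    read_utf8_stream(io.BytesIO(data))
except UnicodeDecodeError as exc:
    print("strict mode failed:", exc.reason)
```

Which behaviour is right depends on the application: strict decoding suits data that must be trusted, while replacement suits displaying or salvaging text from imperfect sources.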

That led him deep into how Unicode is encoded.

First, he showed how UTF-16 is encoded, and how surrogate pairs are used for characters from U+10000 upwards.
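For reference, the surrogate-pair arithmetic can be sketched in a few lines of Python: subtract 0x10000 to get a 20-bit value, put the top 10 bits into a high surrogate (D800 to DBFF) and the bottom 10 bits into a low surrogate (DC00 to DFFF). The function names are hypothetical; the bit layout is the standard UTF-16 one.

```python
def to_surrogate_pair(cp):
    """Split a supplementary code point (U+10000 to U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000 to U+10FFFF need a surrogate pair")
    v = cp - 0x10000             # leaves a 20-bit value
    high = 0xD800 + (v >> 10)    # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)   # bottom 10 bits -> low (trail) surrogate
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a high/low surrogate pair into a code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# U+1F600 GRINNING FACE
hi, lo = to_surrogate_pair(0x1F600)
print(hex(hi), hex(lo))   # 0xd83d 0xde00
```

Because high and low surrogates occupy disjoint ranges, a decoder can always tell which half of a pair it is looking at.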

While he was showing us the details of UTF-16 encoding, there was some discussion about re-syncing with the byte stream if a byte was lost or corrupted. The general feeling was that it might be fiddly. But this morning I realised that any real text is going to have plenty of spaces, carriage returns, etc., and these would make synchronization fairly quick and easy, as the other byte of each of those characters is always 00. (Amazing how often a bit of a break can bring some clarity.)
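The re-sync idea above can be sketched as a simple heuristic, assuming little-endian UTF-16 (as on Windows) and mostly Latin text: characters such as space, CR and LF are stored as `XX 00`, so in a correctly aligned stream the zero bytes fall on odd offsets. If they mostly fall on even offsets, the reader is one byte out of step. The function name `guess_utf16le_offset` is hypothetical, and the heuristic says nothing useful about text with few ASCII characters.

```python
def guess_utf16le_offset(buf):
    """Guess whether UTF-16LE code units start at byte 0 or byte 1 of buf.

    Heuristic: ASCII characters (space, CR, LF, ...) encode as XX 00 in
    UTF-16LE, so when aligned, the 00 bytes should sit at odd offsets.
    """
    zeros_even = sum(1 for i in range(0, len(buf), 2) if buf[i] == 0)
    zeros_odd = sum(1 for i in range(1, len(buf), 2) if buf[i] == 0)
    # Zeros mostly on even offsets means we are one byte out of step.
    return 1 if zeros_even > zeros_odd else 0


text = "Hello world\r\nSecond line\r\n"
encoded = text.encode("utf-16-le")
damaged = encoded[1:]                 # simulate losing the first byte

offset = guess_utf16le_offset(damaged)
print(offset)
print(repr(damaged[offset:].decode("utf-16-le", errors="replace")))
```

In practice the check would be run over a window of bytes following the point where corruption was detected. Once byte alignment is restored, surrogate pairs resynchronize on their own, since a high surrogate can never be mistaken for a low one.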

Following this, there was some detail on UTF-8 encoding. Its bit pattern can potentially encode many more code points than the 2^20 + 2^16 (U+0000 to U+10FFFF) defined by Unicode, which is also the full range UTF-16 can encode.
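The capacity claim can be checked with a little arithmetic. In an n-byte UTF-8 sequence (n ≥ 2) the lead byte starts with n one-bits and a zero, leaving 7 - n payload bits, and each continuation byte (`10xxxxxx`) adds 6 more. The original UTF-8 definition (RFC 2279) allowed up to six bytes, i.e. 31 bits; RFC 3629 later capped UTF-8 at four bytes and U+10FFFF to match UTF-16. The helper name `payload_bits` is illustrative.

```python
def payload_bits(n):
    """Number of code-point bits carried by an n-byte UTF-8 bit pattern."""
    if n == 1:
        return 7                        # 0xxxxxxx
    # lead byte: n one-bits, a zero, then 7 - n payload bits;
    # each of the n - 1 continuation bytes is 10xxxxxx (6 bits)
    return (7 - n) + 6 * (n - 1)


for n in range(1, 7):
    bits = payload_bits(n)
    print(f"{n} byte(s): {bits:2} bits, max U+{2 ** bits - 1:X}")

# Unicode stops at U+10FFFF: 2**16 BMP code points plus 2**20 more
# reachable through UTF-16 surrogate pairs.
UNICODE_MAX = 0x10FFFF
print(UNICODE_MAX == 2 ** 20 + 2 ** 16 - 1)
```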

But there are many banned regions in UTF-8, such as the surrogate range U+D800 to U+DFFF and anything above U+10FFFF.
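One way to picture the banned regions is as two separate checks, sketched below in Python: which code points UTF-8 may carry at all (Unicode scalar values), and which byte values can never occur anywhere in well-formed UTF-8. Both follow RFC 3629; the names `is_encodable_scalar`, `has_never_valid_byte` and `NEVER_VALID_BYTES` are illustrative, not from any library.

```python
# Byte values that can never appear in well-formed UTF-8 (RFC 3629):
# C0 and C1 could only start overlong 2-byte forms, and F5..FF would start
# sequences above U+10FFFF or the obsolete 5- and 6-byte forms.
NEVER_VALID_BYTES = frozenset({0xC0, 0xC1}) | frozenset(range(0xF5, 0x100))


def is_encodable_scalar(cp):
    """True if code point cp may be encoded in UTF-8 (a Unicode scalar value)."""
    return 0 <= cp <= 0x10FFFF and not 0xD800 <= cp <= 0xDFFF


def has_never_valid_byte(data):
    """Cheap pre-check: does the byte string contain a byte banned everywhere?"""
    return any(b in NEVER_VALID_BYTES for b in data)


print(is_encodable_scalar(0x41), is_encodable_scalar(0xD800), is_encodable_scalar(0x110000))
print(has_never_valid_byte("plain text".encode("utf-8")), has_never_valid_byte(b"\xc0\xaf"))

# ED A0 80 follows the 3-byte bit pattern but would decode to U+D800,
# a surrogate, so a strict decoder must reject it.
try:
    b"\xed\xa0\x80".decode("utf-8")
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)
```

The byte check alone is not sufficient: ED A0 80 contains no globally banned byte, yet it is still invalid because of what it decodes to.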

Many characters, e.g. the apostrophe ' (U+0027), can be encoded in several different ways. However, all except the first (shortest) form are banned; the longer ones are known as overlong encodings.
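To make the overlong forms concrete, the sketch below builds all four bit patterns that would fit U+0027 (one to four bytes, padding the payload with leading zero bits) and feeds each to Python's strict decoder. The function name `all_forms` is hypothetical and only handles ASCII code points. Overlongs are banned partly for security: an overlong "/" (C0 AF) is the classic example of a byte sequence that slipped past filters looking only for the byte 2F.

```python
def all_forms(cp):
    """Build the 1- to 4-byte UTF-8 bit patterns for an ASCII code point.

    Only the first (shortest) form is legal; the others are overlong.
    """
    if not 0 <= cp < 0x80:
        raise ValueError("this sketch only handles U+0000 to U+007F")
    hi6, lo6 = cp >> 6, cp & 0x3F      # hi6 is always 0 for ASCII
    return [
        bytes([cp]),
        bytes([0xC0 | hi6, 0x80 | lo6]),
        bytes([0xE0, 0x80 | hi6, 0x80 | lo6]),
        bytes([0xF0, 0x80, 0x80 | hi6, 0x80 | lo6]),
    ]


for form in all_forms(0x27):           # U+0027 APOSTROPHE
    try:
        result = repr(form.decode("utf-8"))
    except UnicodeDecodeError:
        result = "rejected (overlong)"
    print(form.hex(" "), "->", result)
```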

He showed us a routine he had written for handling faulty UTF-8. It sets error codes and replaces the erroneous characters with the U+FFFD replacement character (�).
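John's routine itself isn't reproduced in this thread, so the following is only a hypothetical sketch of the same idea in Python: walk the bytes, record an error code and emit U+FFFD for each malformed sequence. The function name and the two error codes are invented for illustration. The allowed second-byte ranges follow the Unicode Standard's table of well-formed UTF-8 byte sequences, which rules out overlongs, surrogates and values above U+10FFFF in one place. It emits one U+FFFD per maximal invalid subpart, the policy the Unicode Standard recommends and the one Python's built-in decoder uses with `errors="replace"`.

```python
# Hypothetical error codes, for illustration only.
ERR_INVALID_LEAD = "invalid lead byte"
ERR_BAD_CONTINUATION = "bad or missing continuation byte"
REPLACEMENT = "\ufffd"


def decode_utf8_lenient(data):
    """Decode UTF-8 bytes, replacing each malformed sequence with U+FFFD.

    Returns (text, errors), where errors is a list of (byte_offset, code).
    """
    out, errors = [], []
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                       # ASCII: single byte
            out.append(chr(b))
            i += 1
            continue
        if 0xC2 <= b <= 0xDF:              # 2-byte sequence
            need, lo, hi, cp = 1, 0x80, 0xBF, b & 0x1F
        elif 0xE0 <= b <= 0xEF:            # 3-byte sequence
            need, cp = 2, b & 0x0F
            lo = 0xA0 if b == 0xE0 else 0x80   # E0: reject overlongs
            hi = 0x9F if b == 0xED else 0xBF   # ED: reject surrogates
        elif 0xF0 <= b <= 0xF4:            # 4-byte sequence
            need, cp = 3, b & 0x07
            lo = 0x90 if b == 0xF0 else 0x80   # F0: reject overlongs
            hi = 0x8F if b == 0xF4 else 0xBF   # F4: reject > U+10FFFF
        else:                              # 80..BF, C0, C1, F5..FF
            errors.append((i, ERR_INVALID_LEAD))
            out.append(REPLACEMENT)
            i += 1
            continue
        j, ok = i + 1, True
        for k in range(need):
            lower, upper = (lo, hi) if k == 0 else (0x80, 0xBF)
            if j >= n or not lower <= data[j] <= upper:
                ok = False
                break
            cp = (cp << 6) | (data[j] & 0x3F)
            j += 1
        if ok:
            out.append(chr(cp))
        else:
            # One U+FFFD for the valid prefix seen so far; resume at the
            # offending byte so it is examined as a new lead byte.
            errors.append((i, ERR_BAD_CONTINUATION))
            out.append(REPLACEMENT)
        i = j
    return "".join(out), errors


text, errs = decode_utf8_lenient(b"ok \xe2\x82 \xc0\xaf end")
print(repr(text))
print(errs)
```

Resuming at the offending byte, rather than skipping past it, matters: a space or a new lead byte that interrupts a truncated sequence is then decoded normally instead of being swallowed.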

He had a sample text page full of UTF-8 errors (Markus Kuhn's UTF-8 decoder capability and stress test), which UTF-8 decoders can use to check that they handle malformed input correctly. He ran his routine over it to make sure it worked.
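Markus Kuhn's file is designed to be opened and inspected by eye, but the same kind of check can also be automated. Below is a hypothetical harness with a handful of cases written in the same spirit (these byte strings are not copied from his file). The expected outputs assume one U+FFFD per maximal invalid subpart, so a decoder that emits one U+FFFD per bad byte would legitimately differ on some cases, such as the truncated sequence. Python's built-in decoder stands in for the routine under test.

```python
# A few hand-written cases in the spirit of Markus Kuhn's stress test (not
# taken from his file). Expected results assume one U+FFFD per maximal
# invalid subpart, the Unicode Standard's recommended practice.
CASES = [
    ("first 2-byte character U+0080", b"\xc2\x80", "\u0080"),
    ("last code point U+10FFFF", b"\xf4\x8f\xbf\xbf", "\U0010ffff"),
    ("lone continuation byte", b"a\x80b", "a\ufffdb"),
    ("impossible byte FE", b"a\xfeb", "a\ufffdb"),
    ("overlong '/' (C0 AF)", b"\xc0\xaf", "\ufffd\ufffd"),
    ("truncated 3-byte sequence", b"\xe2\x82", "\ufffd"),
    ("UTF-16 surrogate U+D800", b"\xed\xa0\x80", "\ufffd\ufffd\ufffd"),
]


def run_cases(decode):
    """Return (name, expected, got) for every case the decoder gets wrong."""
    failures = []
    for name, data, expected in CASES:
        got = decode(data)
        if got != expected:
            failures.append((name, expected, got))
    return failures


def builtin_replace(data):
    return data.decode("utf-8", errors="replace")


failures = run_cases(builtin_replace)
print("all cases passed" if not failures else failures)
```

Swapping in a different `decode` function is all it takes to point the same cases at another implementation.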

He mentioned his routine might still have an issue that needs reviewing.

Thank you, John.
