Re: Choosing the best way to go full UNICODE [message #48188 is a reply to message #48187] Wed, 31 May 2017 13:43
cbpporter
Without RLE, I think you can get around this by having positions represent sequence starts rather than characters.

As a mandatory condition, every time the position is updated, it is guaranteed to be at the start of a sequence.

As a more complicated example, let's say you have a selection of text, with a "begin" and "end" pos. You handle a key press. Taking your begin pos as a sequence start, you determine the sequence end. This means looking to see how many code units it is, seeking over combining marks and ligatures. Basically on-the-fly glyph analysis. With the sequence start and end, you know for a fact that everything between these two values must go. You do the same for the end position. As an optimization, you can mark everything for deletion between the begin sequence's start and the end sequence's end. The text marked to be replaced will then be replaced with multiple code units.
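To make that concrete, here is a rough sketch in plain C++ over a UTF-8 std::string (not U++ String). The names SequenceEnd and ReplaceSelection are made up for illustration, and the "glyph analysis" is deliberately naive: it assumes valid UTF-8 and only treats U+0300..U+036F as combining marks, so ligatures and most real-world clusters are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Byte length of a UTF-8 code point, given its lead byte (assumes valid UTF-8).
std::size_t CodePointLength(unsigned char lead) {
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    return 4;
}

// Decode the code point that starts at byte offset pos (assumes valid UTF-8).
char32_t DecodeAt(const std::string& s, std::size_t pos) {
    unsigned char lead = s[pos];
    std::size_t n = CodePointLength(lead);
    if (n == 1) return lead;
    char32_t cp = lead & (0xFF >> (n + 1));
    for (std::size_t i = 1; i < n; ++i)
        cp = (cp << 6) | (static_cast<unsigned char>(s[pos + i]) & 0x3F);
    return cp;
}

// ASSUMPTION: only the Combining Diacritical Marks block counts as "combining".
bool IsCombiningMark(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Given a sequence start, return the offset one past the end of that sequence:
// one base code point plus any combining marks that follow it.
std::size_t SequenceEnd(const std::string& s, std::size_t start) {
    assert(start < s.size());
    std::size_t pos = start + CodePointLength(static_cast<unsigned char>(s[start]));
    while (pos < s.size() && IsCombiningMark(DecodeAt(s, pos)))
        pos += CodePointLength(static_cast<unsigned char>(s[pos]));
    return pos;
}

// begin and end are both sequence starts; the sequence at end is part of the
// selection. Everything from begin up to the end of that last sequence goes.
std::string ReplaceSelection(const std::string& s, std::size_t begin,
                             std::size_t end, const std::string& replacement) {
    assert(begin <= end && end < s.size());
    std::size_t stop = SequenceEnd(s, end);
    return s.substr(0, begin) + replacement + s.substr(stop);
}
```

Note that only the end sequence needs scanning here: since begin is already a sequence start, everything from begin to the end sequence's end can be dropped in one go, which is the optimization mentioned above.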

The real challenge is to standardize these operations so you don't have to repeat them.

Maybe we need some GlyphInfoExtractor class or something. Something that, when given a random sequence of code units and a valid code point start, can handle such common operations?
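Just to sketch what such a class might look like: the following is a hypothetical GlyphInfoExtractor in plain C++ over UTF-8. None of these methods exist in U++; End, Prev and Count are names I picked for illustration, the input is assumed to be valid UTF-8, and only U+0300..U+036F is treated as combining.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: answers "where does the sequence at this start end?"
// and similar questions, so callers do not repeat the scanning logic.
class GlyphInfoExtractor {
public:
    // Keeps a copy of the text so the extractor cannot outlive its data.
    explicit GlyphInfoExtractor(const std::string& text) : s(text) {}

    // Offset one past the end of the sequence that starts at start.
    std::size_t End(std::size_t start) const {
        assert(start < s.size());
        std::size_t pos = start + Length(start);
        while (pos < s.size() && IsMark(Decode(pos)))
            pos += Length(pos);
        return pos;
    }

    // Start of the sequence that precedes the sequence start (or end of text) at start.
    std::size_t Prev(std::size_t start) const {
        assert(start > 0 && start <= s.size());
        std::size_t pos = start;
        do {
            --pos;                                          // step into previous code point
            while (pos > 0 && IsContinuation(pos)) --pos;   // find its lead byte
        } while (pos > 0 && IsMark(Decode(pos)));           // skip back over marks
        return pos;
    }

    // Number of sequences in the whole text.
    std::size_t Count() const {
        std::size_t n = 0;
        for (std::size_t pos = 0; pos < s.size(); pos = End(pos)) ++n;
        return n;
    }

private:
    // ASSUMPTION: only the Combining Diacritical Marks block counts as a mark.
    static bool IsMark(char32_t cp) { return cp >= 0x0300 && cp <= 0x036F; }

    bool IsContinuation(std::size_t pos) const {
        return (static_cast<unsigned char>(s[pos]) & 0xC0) == 0x80;
    }

    std::size_t Length(std::size_t pos) const {
        unsigned char lead = s[pos];
        if (lead < 0x80) return 1;
        if ((lead & 0xE0) == 0xC0) return 2;
        if ((lead & 0xF0) == 0xE0) return 3;
        return 4;
    }

    char32_t Decode(std::size_t pos) const {
        unsigned char lead = s[pos];
        std::size_t n = Length(pos);
        if (n == 1) return lead;
        char32_t cp = lead & (0xFF >> (n + 1));
        for (std::size_t i = 1; i < n; ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(s[pos + i]) & 0x3F);
        return cp;
    }

    std::string s;
};
```

A real version would need proper cluster rules instead of the single mark range, but the interface is the point: every method takes and returns sequence starts, so the "position is always at a sequence start" condition is preserved.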

Here is a sample from unicode.org:
[attached image: sample text from unicode.org]


This is 14 code units, 5 code points, 4 glyphs. The user will see and recognize 4 items as more or less "atomic", so we should focus on this.

We need an API that can locate each glyph start and allow us to replace glyphs 2 and 3 with an already properly encoded glyph sequence, like ſʒ, as an example.

This can be done on the fly with something high level like:
s = s.GlyphMid(0, 1) + "ſʒ" + s.GlyphMid(3, 1)
or
s.GlyphReplace(2, 3, "ſʒ")

or we can go lower level. Or we can go into multi-byte String territory.
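For the high-level variant, here is a minimal sketch of what GlyphMid and GlyphReplace could do, written as free functions over a UTF-8 std::string. The signatures are my assumption: both take a 0-based glyph index and a glyph count, so "replace glyphs 2 and 3" becomes index 1, count 2. As before, it assumes valid UTF-8 and only recognizes U+0300..U+036F as combining marks, checked directly on the bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// ASSUMPTION: a combining mark is U+0300..U+036F, which in UTF-8 is
// CC 80..CC BF (U+0300..U+033F) or CD 80..CD AF (U+0340..U+036F).
bool IsMarkAt(const std::string& s, std::size_t pos) {
    if (pos + 1 >= s.size()) return false;
    unsigned char a = s[pos], b = s[pos + 1];
    return (a == 0xCC && b >= 0x80 && b <= 0xBF) ||
           (a == 0xCD && b >= 0x80 && b <= 0xAF);
}

// Offset one past the end of the glyph sequence starting at start.
std::size_t SequenceEnd(const std::string& s, std::size_t start) {
    assert(start < s.size());
    unsigned char lead = s[start];
    std::size_t len = lead < 0x80 ? 1 : (lead & 0xE0) == 0xC0 ? 2
                    : (lead & 0xF0) == 0xE0 ? 3 : 4;
    std::size_t pos = start + len;
    while (IsMarkAt(s, pos)) pos += 2;  // every mark in the range is 2 bytes
    return pos;
}

// Byte offset of the glyph with the given 0-based index (s.size() if past the end).
std::size_t GlyphOffset(const std::string& s, std::size_t index) {
    std::size_t pos = 0;
    while (index > 0 && pos < s.size()) {
        pos = SequenceEnd(s, pos);
        --index;
    }
    return pos;
}

// Hypothetical GlyphMid: count glyphs starting at glyph index.
std::string GlyphMid(const std::string& s, std::size_t index, std::size_t count) {
    std::size_t b = GlyphOffset(s, index);
    std::size_t e = GlyphOffset(s, index + count);
    return s.substr(b, e - b);
}

// Hypothetical GlyphReplace: swap count glyphs at glyph index for replacement,
// which must already be a properly encoded sequence.
std::string GlyphReplace(const std::string& s, std::size_t index,
                         std::size_t count, const std::string& replacement) {
    std::size_t b = GlyphOffset(s, index);
    std::size_t e = GlyphOffset(s, index + count);
    return s.substr(0, b) + replacement + s.substr(e);
}
```

With a four-glyph string s, GlyphReplace(s, 1, 2, r) gives the same result as GlyphMid(s, 0, 1) + r + GlyphMid(s, 3, 1). Both walk the string from the start on every call, which is fine for a sketch but is exactly the repeated work a lower-level or cached approach would avoid.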

PS: the high level stuff is still StringWalker territory.



 