Re: Basic character set analyzer [message #19996 is a reply to message #19980] Sun, 08 February 2009 12:45
cbpporter
Messages: 1401
Registered: September 2007
Ultimate Contributor
luzr wrote on Sat, 07 February 2009 15:53


I apologize for delay; HasChar is now officially in Draw, implemented both for Win32 and X11.


Great! I'll check out the win version too, but right now all my efforts focus on X.

Quote:


Well, thinking about the issue - do you have any idea how to determine which fonts to search for replacements first?

I mean, for a missing char in Arial, I would check some sans-serif font first rather than trying e.g. Times New Roman.

Mirek

I've given this issue a lot of thought, but I don't think there is a good way to do this.

This is why I chose composition rather than substitution.

So basically we have the first 3 Unicode sections, which handle Latin. All characters are letters, punctuation and composites. There are basically two cases when a character is missing:
1. The font still has the basic Latin characters. In this case I apply composition. I managed to fine-tune diacritic placement, and the results are generally better than replacing the whole character with one from a different font, although at some sizes it will look ugly.
2. The font doesn't have basic Latin. Such fonts are quite rare, and in these cases I use substitution. Since these fonts don't really have any Latin support, I end up using substitution for every character, so the text still looks consistent.

But for further Unicode sections the situation changes. These characters start to look less and less Latin, so the problem of substituting with a similar font starts to diminish. I mean, using Arial to draw an 'a' next to a strange fork-like character might be a better choice than using Times, but it won't really make a difference. So here I will also use a single font for substitution.

So basically I divide the Unicode range into subranges, each containing roughly a single script. I'll have a vector of vectors of fonts, with the first vector indexed by the script number.
The algorithm in pseudocode:
get scriptindex;
if (scriptindex favors composition)
  compose characters
else
  traverse font list until character found; draw box if not found
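
The pseudocode above could be sketched in C++ roughly as follows. This is only an illustration, not the actual U++ code: `MockFont`, `ScriptEntry`, `ScriptIndex` and `Resolve` are hypothetical names, a font is modelled as just a name plus the set of code points it covers, and the script table is reduced to two toy entries (the Latin boundary at U+017F is an assumption about what "the first 3 Unicode sections" covers).

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a real font: a name and the code points it covers.
struct MockFont {
    std::string   name;
    std::set<int> glyphs;
    bool HasChar(int c) const { return glyphs.count(c) != 0; }
};

// One entry per script: whether composition is preferred, and the
// ordered list of fallback fonts to try for substitution.
struct ScriptEntry {
    bool                  favors_composition;
    std::vector<MockFont> fallbacks;
};

enum class Action { Compose, Substitute, DrawBox };

struct Resolution {
    Action      action;
    std::string font; // only meaningful for Action::Substitute
};

// Toy script lookup (assumed ranges): 0 = Latin up to U+017F, 1 = everything else.
inline int ScriptIndex(int c) { return c <= 0x17F ? 0 : 1; }

// Decide how to render a character that the requested font lacks.
inline Resolution Resolve(int c, const std::vector<ScriptEntry>& scripts)
{
    int si = ScriptIndex(c);
    assert(si >= 0 && si < static_cast<int>(scripts.size()));
    const ScriptEntry& e = scripts[si];
    if (e.favors_composition)
        return { Action::Compose, "" };
    for (const MockFont& f : e.fallbacks) // traverse the font list in order
        if (f.HasChar(c))
            return { Action::Substitute, f.name };
    return { Action::DrawBox, "" };       // nothing found: draw a box
}
```

Keeping the fallback list per script means the order of fonts only has to be tuned once for each script, not once for each requested font.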

This approach works right now, but since it only handles a small number of scripts, I can't really say yet whether it will be enough for the real needs of rendering internationalized text. Only time will tell.

I attached a screenshot showing the results when using one of the later Latin ranges. Cyan characters are substituted, which also gives a nice visual of how well average fonts support such characters. It also illustrates the problem of fonts that look ugly next to each other.

Also notice the two lines in Arabic on the right. This is very early support for RTL languages. I give the text in normal left to right order, containing "abc", some Arabic characters, and "def". The text output method detects that the Arabic should be right to left, and renders it as such. First line does plain rendering, second uses specific Arabic rendering rules. While the two texts contain the same characters, the first is wrong and only the second is a correct way to render the characters from the first string.
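
As a very rough sketch of the "plain" right-to-left step described above (not the Arabic-specific rendering rules, and not the actual implementation), logical order can be turned into visual order by reversing each run of RTL characters. The names `IsRtl` and `ToVisualOrder` are hypothetical, and treating only U+0600..U+06FF as RTL is a simplifying assumption; a complete solution would follow the Unicode Bidirectional Algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Assumption: only the basic Arabic block (U+0600..U+06FF) counts as RTL.
inline bool IsRtl(char32_t c) { return c >= 0x0600 && c <= 0x06FF; }

// Convert logical order to visual order for a left-to-right paragraph
// by reversing each maximal run of RTL characters.
inline std::u32string ToVisualOrder(std::u32string text)
{
    std::size_t i = 0;
    while (i < text.size()) {
        if (!IsRtl(text[i])) { ++i; continue; }
        std::size_t j = i;
        while (j < text.size() && IsRtl(text[j]))
            ++j;
        assert(j > i); // the run contains at least text[i]
        std::reverse(text.begin() + i, text.begin() + j);
        i = j;
    }
    return text;
}
```

With input "abc", three Arabic letters, "def", the Latin parts stay where they are and only the Arabic run is reversed.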

PS: I can't read Arabic, so I have no idea what I wrote there. I hope I didn't manage to randomly find a swear word or anything like that :). If we have users who can read Arabic, I would appreciate some help in this area.
  • Attachment: snapshot7.png
    (Size: 499.06KB, Downloaded 373 times)
Re: Basic character set analyzer [message #19997 is a reply to message #19996] Sun, 08 February 2009 12:55
cbpporter
A little info about the second and final change I need to Draw.

I've added some fields to CharMetrics and some methods to FontInfo. These changes are ad-hoc hacks, and I really need a better integrated solution with proper names.
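struct CharMetrics : Moveable<CharMetrics> {
	int  width;   // advance width (info.xOff)
	int  lspc;    // left spacing (-info.x)
	int  rspc;    // right spacing (info.xOff - info.width + info.x)
	int  y;       // glyph origin y (info.y)
	int  height;  // glyph height (info.height)
	int  x;       // glyph origin x (info.x)
	int  ew;      // glyph extent width (info.width)

	bool operator==(const CharMetrics& b) const
	     { return width == b.width && lspc == b.lspc && rspc == b.rspc; }
};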

	int        GetY(int c) const                  { return GetCM(c).y; }
	int        GetX(int c) const                  { return GetCM(c).x; }
	int        GetH(int c) const                  { return GetCM(c).height; }
	int        GetW(int c) const                  { return GetCM(c).ew; }

These methods are used to determine the exact glyph bounding box, using this method:
inline Rect GetCharRect(int x, int y, int buff, const FontInfo& fi)
{
	return Rect(x - fi.GetX(buff),
	            y - fi.GetY(buff) + fi.GetAscent(),
	            x - fi.GetX(buff) + fi.GetW(buff),
	            y - fi.GetY(buff) + fi.GetAscent() + fi.GetH(buff));
}

Also:
void FontInfo::Data::GetMetrics(CharMetrics *t, int from, int count)
{
	DrawLock __;
	LTIMING("GetMetrics");
	LLOG("GetMetrics " << font << " " << from << ", " << count);
	if(xftfont) {
		for(int i = 0; i < count; i++) {
			LTIMING("XftTextExtents16");
			wchar h = from + i;
			XGlyphInfo info;
			XftTextExtents16(Xdisplay, xftfont0, &h, 1, &info);
			t[i].width = info.xOff;
			t[i].lspc = -info.x;
			t[i].rspc = info.xOff - info.width + info.x;
			t[i].y = info.y;
			t[i].height = info.height;
			t[i].x = info.x;
			t[i].ew = info.width;
		}
	}
}

I don't know if we should cache this.

I'm posting this to show what I need, but before we apply it I really need to clean this up.
Re: Basic character set analyzer [message #20003 is a reply to message #19996] Mon, 09 February 2009 08:42
mirek
Messages: 13975
Registered: November 2005
Ultimate Member
cbpporter wrote on Sun, 08 February 2009 06:45

luzr wrote on Sat, 07 February 2009 15:53


I mean, for a missing char in Arial, I would check some sans-serif font first rather than trying e.g. Times New Roman.


I've given this issue a lot of thought, but I don't think there is a good way to do this.



Well, the first possibility is a fixed table of substitutions (the number of widely used fonts is quite small IMO).

Second is PANOSE:

http://en.wikipedia.org/wiki/PANOSE

http://msdn.microsoft.com/en-us/library/ms534014(VS.85).aspx

(-> Win32 provides PANOSE number)

- not sure whether FreeType supports PANOSE... Also, another question is whether fonts actually support it
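
To make the PANOSE idea a bit more concrete, here is a minimal sketch of picking the closest replacement by comparing the ten PANOSE digits. The distance function is a naive, unweighted heuristic chosen for illustration only (it is not the official PANOSE matching procedure), `PanoseDistance` and `ClosestFont` are hypothetical names, and the values used with it would have to come from the font system.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// The ten PANOSE digits (family type, serif style, weight, proportion, ...).
using Panose = std::array<unsigned char, 10>;

// Naive distance: sum of absolute digit differences. Digits set to
// 0 ("Any") or 1 ("No Fit") carry no information and are skipped.
inline int PanoseDistance(const Panose& a, const Panose& b)
{
    int d = 0;
    for (std::size_t i = 0; i < a.size(); i++) {
        if (a[i] <= 1 || b[i] <= 1)
            continue;
        d += std::abs(int(a[i]) - int(b[i]));
    }
    return d;
}

// Index of the candidate closest to the wanted font (first one wins on ties).
inline int ClosestFont(const Panose& wanted, const std::vector<Panose>& candidates)
{
    assert(!candidates.empty());
    int best = 0;
    for (std::size_t i = 1; i < candidates.size(); i++)
        if (PanoseDistance(wanted, candidates[i]) < PanoseDistance(wanted, candidates[best]))
            best = int(i);
    return best;
}
```

A real implementation would probably weight the digits differently, since a mismatch in serif style matters more visually than, say, a small difference in x-height.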

Third: perform matching based on visual appearance. That is a bit tough, but still possible.

It can be based on raster data processing or even curve processing...

E.g. by examining the raster of the "I" character, it should be quite easy to identify serifs and the width of the central line (and perhaps some other aspects I do not see yet...). By comparing the widths of 'l' and 'm', you should be able to tell whether the font is monospace (that is, if this is not stored as an attribute somewhere in the font).
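
The monospace check mentioned above is simple enough to sketch. `LooksMonospace` is a hypothetical helper, and the width lookup is passed in as a callback so that the example does not depend on any particular font API.

```cpp
#include <cassert>
#include <functional>

// Heuristic: a font is probably monospace when a narrow glyph ('l')
// and a wide glyph ('m') have the same advance width.
// `width_of` maps a character code to its advance width.
inline bool LooksMonospace(const std::function<int(int)>& width_of)
{
    assert(width_of); // a width lookup must be provided
    return width_of('l') == width_of('m');
}
```

In practice `width_of` would wrap whatever per-character width query the font system offers.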

Mirek
Re: Basic character set analyzer [message #20005 is a reply to message #19997] Mon, 09 February 2009 08:47
mirek
cbpporter wrote on Sun, 08 February 2009 06:55

A little info about the second and final change I need to Draw.

I've added some fields to CharMetrics and some methods to FontInfo. These changes are ad-hoc hacks, and I really need a better integrated solution with proper names.
struct CharMetrics : Moveable<CharMetrics> {
	int width;
	int lspc;
	int rspc;
	int y;
	int height;
	int x;
	int ew;

	bool operator==(const CharMetrics& b) const
	     { return width == b.width && lspc == b.lspc && rspc == b.rspc; }
};



Not really happy about it -> it makes CharMetrics too long.

I think we should read this directly rather than cache it.

I think that, in the end, we should in fact cache the 'direct translation'. At the heart of the font system there should be a function like:

struct RenderGlyph {
    int chr;
    Font fnt;
    int aux_chr; // == 0 -> no aux glyph
    Font aux_fnt;
    int16 aux_x, aux_y;
};


RenderGlyph GetRenderGlyph(int chr, Font fnt);

and we should cache it at this phase. We can easily afford to cache tens of thousands of such pairs.
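
A minimal sketch of such a cache, under several assumptions: `FontId` is a hypothetical integer stand-in for `Font`, `ResolveRenderGlyph` is a placeholder for the expensive lookup (here it just returns the character unchanged and counts its calls), and locking is left out entirely.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

using FontId = int; // hypothetical stand-in for Font

struct RenderGlyph {
    int          chr;
    FontId       fnt;
    int          aux_chr; // == 0 -> no aux glyph
    FontId       aux_fnt;
    std::int16_t aux_x, aux_y;
};

// Counts how often the slow path runs (for demonstration only).
inline int& ResolveCalls() { static int n = 0; return n; }

// Placeholder for the real work: font probing, composition, substitution.
inline RenderGlyph ResolveRenderGlyph(int chr, FontId fnt)
{
    ++ResolveCalls();
    return RenderGlyph{ chr, fnt, 0, 0, 0, 0 };
}

// Cached front end: each (chr, fnt) pair is resolved at most once.
inline RenderGlyph GetRenderGlyph(int chr, FontId fnt)
{
    static std::unordered_map<std::uint64_t, RenderGlyph> cache;
    assert(chr >= 0 && fnt >= 0);
    // Pack font id and character code into a single 64-bit key.
    std::uint64_t key = (std::uint64_t(std::uint32_t(fnt)) << 32) | std::uint32_t(chr);
    auto it = cache.find(key);
    if (it == cache.end())
        it = cache.emplace(key, ResolveRenderGlyph(chr, fnt)).first;
    return it->second;
}
```

Each entry is only a few dozen bytes, so tens of thousands of cached pairs stay in the low megabytes.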

Mirek
Re: Basic character set analyzer [message #20012 is a reply to message #20005] Mon, 09 February 2009 18:38
cbpporter
Quote:

Not really happy about it -> it makes CharMetrics too long.

I think we should read this directly rather than cache it.

OK. How about adding this to FontInfo::Data:
Rect FontInfo::Data::GetCharBoundsOffset(int c)
{
	Rect r;
	XGlyphInfo info;
	unsigned int h = static_cast<unsigned int>(c);
	XftTextExtents32(Xdisplay, xftfont0, &h, 1, &info);
	r.left = -info.x;
	r.top = -info.y + ascent;
	r.right = -info.x + info.width;
	r.bottom = -info.y + ascent + info.height;
	return r;
}

and this to FontInfo:
Rect       GetCharBoundsOffset(int c) const   { return ptr->GetCharBoundsOffset(c); }


It is much cleaner this way, plus I can rewrite:
inline Rect GetCharRect(int x, int y, int buff, const FontInfo& fi)
{
	return fi.GetCharBoundsOffset(buff).Offseted(x, y);
}
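
As a sanity check, the rewritten GetCharRect should give the same rectangle as the earlier field-by-field version. The sketch below verifies that with mock types: `Rect` and `GlyphExtents` are simplified stand-ins written for this example (not the U++ or Xft types), and `CharRectOld` / `CharRectNew` are hypothetical names for the two variants.

```cpp
#include <cassert>

// Simplified stand-in for the U++ Rect.
struct Rect {
    int left, top, right, bottom;
    Rect Offseted(int dx, int dy) const
    { return Rect{ left + dx, top + dy, right + dx, bottom + dy }; }
};

inline bool operator==(const Rect& a, const Rect& b)
{
    return a.left == b.left && a.top == b.top &&
           a.right == b.right && a.bottom == b.bottom;
}

// Mock of the glyph extents used here, plus the font ascent.
struct GlyphExtents { int x, y, width, height, ascent; };

// First version: every coordinate computed from x, y and the metrics.
inline Rect CharRectOld(int x, int y, const GlyphExtents& g)
{
    return Rect{ x - g.x, y - g.y + g.ascent,
                 x - g.x + g.width, y - g.y + g.ascent + g.height };
}

// Second version: position-independent bounds, then a single offset.
inline Rect CharBoundsOffset(const GlyphExtents& g)
{
    assert(g.width >= 0 && g.height >= 0);
    return Rect{ -g.x, -g.y + g.ascent,
                 -g.x + g.width, -g.y + g.ascent + g.height };
}

inline Rect CharRectNew(int x, int y, const GlyphExtents& g)
{
    return CharBoundsOffset(g).Offseted(x, y);
}
```

Since the bounds no longer depend on the drawing position, they could also be reused for several positions of the same glyph.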

