|
||||
Title: Does this code have a character? Post by fiziwig on Jan 23rd, 2007, 12:27pm

Some years ago, as part of a linguistics project, I devised a way of assigning numerical codes to written glyphs such as Chinese pictographs, hieroglyphics, or other written digits, letters, and characters. The idea was to be able to look up an unknown glyph in a "dictionary" or database by a numeric code that could be easily determined by looking at the glyph. Rather than visually scanning tens of thousands of characters for a match, one could work out the code for the unknown glyph, look up that code, and then scan only the few dozen symbols that share it.

The code consists of one digit for each of the following features:

A. Number of disconnected pieces (e.g. colon, semicolon, exclamation point, and lower case dotted i all have two disconnected pieces)
B. Number of enclosed spaces (e.g. "t" has none, "0" has one, "B" has two)
C. Number of intersections with an odd number of lines (e.g. "F" has one, "M" has none)
D. Number of intersections with an even number of lines (e.g. "X" has one, "#" has 4)
E. Number of unattached ends (e.g. "H" has 4, "T" has 3, "0" has none)

For example, the upper case letter "H" (sans serif) has one connected piece, no enclosed spaces, two odd intersections, no even intersections, and four ends, giving it the code 10204. The Greek letter pi has the same code and is topologically equivalent.

Upper case "A" (sans serif) has 1 piece, 1 enclosed space, 2 odd intersections, no even intersections, and two ends, for a code of 11202. The astronomical symbol for Mars (also used as the symbol for male, i.e. a circle with an attached arrow) has the code 11202 as well, but is not topologically equivalent to the upper case "A".

The number "8" has one piece, two enclosed spaces, one even intersection, and no ends, for a code of 12010. The code does not specify the shapes of the lines, so "S", "2", "5", "M", "Z", "V", etc. (all sans serif) have the same code, 10002. A dollar sign with a single upright bar has the code 12034.

The question is: given a random code between 00000 and 99999, how can one determine whether it is even possible for a glyph with that code to exist? (E.g. 10004 is an impossible code.) And how can one determine whether a given code is shared by more than one topologically non-equivalent glyph (as with "A" and the Mars symbol)?

Example: Draw a figure with the code 14310

Edited for readability |
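One way to make the five features concrete is to treat a glyph as a planar multigraph: stroke ends, junctions, and isolated dots are vertices, and strokes are edges. The Python sketch below assumes that reading, counting the edges that leave each vertex (the interpretation fiziwig confirms later in the thread). The function name `glyph_code` and the vertex labels are invented for illustration, and the enclosed-space count relies on Euler's formula for planar graphs (bounded faces = edges - vertices + pieces), so it is only valid for drawings whose strokes meet solely at listed vertices.

```python
def glyph_code(vertices, edges):
    """Return the code ABCDE for a glyph given as a planar multigraph.

    vertices: labels for stroke ends, junctions, and isolated dots
    edges:    (u, v) pairs; parallel edges and self-loops are allowed
    """
    vertices = list(vertices)
    degree = {v: 0 for v in vertices}
    parent = {v: v for v in vertices}  # union-find forest for components

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        degree[u] += 1
        degree[v] += 1  # a self-loop adds 2 to its single vertex
        parent[find(u)] = find(v)

    pieces = len({find(v) for v in vertices})                        # A
    enclosed = len(edges) - len(vertices) + pieces                   # B
    odd = sum(1 for d in degree.values() if d >= 3 and d % 2 == 1)   # C
    even = sum(1 for d in degree.values() if d >= 4 and d % 2 == 0)  # D
    ends = sum(1 for d in degree.values() if d == 1)                 # E
    return f"{pieces}{enclosed}{odd}{even}{ends}"


# "H": four stroke ends (a, b, c, d) and two junctions (p, q)
H = glyph_code("abcdpq", [("a", "p"), ("p", "b"), ("c", "q"), ("q", "d"), ("p", "q")])
# "A": apex t, crossbar junctions p and q, feet a and b
A = glyph_code("tpqab", [("t", "p"), ("t", "q"), ("p", "q"), ("p", "a"), ("q", "b")])
# "8": a single crossing point x carrying two loops
EIGHT = glyph_code("x", [("x", "x"), ("x", "x")])

print(H, A, EIGHT)  # 10204 11202 12010
```

Vertices of degree 2 (such as the apex of "A") are deliberately not counted as intersections, which matches the post's statement that "M" has none. If any feature exceeded 9, the string would grow past five characters; the original post does not say how that case should be handled.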
||||
Title: Re: Does this code have a character? Post by towr on Jan 24th, 2007, 1:44am It seems most of those features can be expressed in terms of graphs. (In terms of topology (as in mathematics) it wouldn't work as well, because A and O are topologically equivalent but don't have the same code) Although, in practice, I'd suspect some might be problematic occasionally. For example parts that ought to be detached might accidentally attach or vice versa. As a sidenote, it would be more easily readable if all those features had their own paragraph/line. |
||||
Title: Re: Does this code have a character? Post by Grimbal on Jan 24th, 2007, 1:45am It seems to me that C+E must be even, so I don't think there can be a symbol with code 14310. I concur with towr: 8 could be 12200 or 12010. |
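The parity argument can be written down as a quick filter. In graph terms, an unattached end is a vertex with one edge and an odd intersection is a vertex with an odd number of edges, and any graph has an even number of odd-degree vertices because each edge contributes to exactly two vertex degrees. The sketch below assumes codes are five-character digit strings; `passes_parity_check` is a made-up name. It is only a necessary condition, not a full validity test.

```python
def passes_parity_check(code):
    """Return True if the code satisfies Grimbal's condition that C+E is even.

    code: five-character string of digits ABCDE
    """
    if len(code) != 5 or not code.isdigit():
        raise ValueError("expected a five-digit code such as '10204'")
    _pieces, _enclosed, odd, _even, ends = (int(ch) for ch in code)
    return (odd + ends) % 2 == 0


print(passes_parity_check("14310"))  # False: C+E = 3 is odd
print(passes_parity_check("10204"))  # True: the code for "H"
print(passes_parity_check("10004"))  # True, although the first post calls it impossible
```

The last line shows why the check is not sufficient: 10004 passes the parity test, yet a single piece with no enclosed spaces and no intersections is just one stroke, which has two ends rather than four.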
||||
Title: Re: Does this code have a character? Post by fiziwig on Jan 24th, 2007, 3:15pm on 01/24/07 at 01:45:09, Grimbal wrote:
The "peace symbol" has code 14310.
on 01/24/07 at 01:45:09, Grimbal wrote:
That's true. Usually there is a canonical way of drawing the figure, however. Granted, the figure '8' can be drawn as two flattened circles sharing a line segment, or as two circles sharing a point, but conventionally it is drawn in one figure-8 motion of the pen. This canonical pen path can be taken to define the nature of the intersection. Pragmatically, a cross-reference dictionary can be provided to link alternate codes for what is essentially the same character drawn slightly differently. Thus you might find the dictionary entry "12200 (see also 12010)" |
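The cross-reference idea could look something like the following Python sketch. The table contents are taken from codes mentioned in this thread; the names `GLYPHS_BY_CODE`, `SEE_ALSO`, and `lookup` are hypothetical, and following only one level of "see also" links is an assumption rather than anything the post specifies.

```python
# Primary entries: canonical code -> glyphs filed under it
GLYPHS_BY_CODE = {
    "10204": ["H", "π"],
    "11202": ["A", "♂"],
    "12010": ["8"],
}

# Cross-references: alternate code -> canonical codes to check as well
SEE_ALSO = {
    "12200": ["12010"],  # "8" drawn as two circles sharing a line segment
}


def lookup(code):
    """Return candidate glyphs for a code, following one level of cross-references."""
    candidates = []
    for c in [code] + SEE_ALSO.get(code, []):
        candidates.extend(GLYPHS_BY_CODE.get(c, []))
    return candidates


print(lookup("12200"))  # ['8']
print(lookup("11202"))  # ['A', '♂']
```

A reader who works out 12200 for a hand-drawn "8" still lands on the right entry, and the short candidate list is then scanned by eye as the original scheme intends.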
||||
Title: Re: Does this code have a character? Post by pex on Jan 25th, 2007, 12:05am on 01/24/07 at 15:15:01, fiziwig wrote:
Wouldn't that be 14410? http://en.wikipedia.org/wiki/Peace_symbol |
||||
Title: Re: Does this code have a character? Post by towr on Jan 25th, 2007, 12:34am on 01/25/07 at 00:05:04, pex wrote:
|
||||
Title: Re: Does this code have a character? Post by pex on Jan 25th, 2007, 12:53am on 01/25/07 at 00:34:09, towr wrote:
That's what fiziwig seems to mean: (s?)he states that "F" has one intersection with an odd number of lines. This is also where my 14410 comes from. |
||||
Title: Re: Does this code have a character? Post by towr on Jan 25th, 2007, 3:41am on 01/25/07 at 00:53:58, pex wrote:
I just thought I'd mention it could be stated less ambiguously. And of course if you approach it as a (planar) graph problem the terminology would be appropriate. |
||||
Title: Re: Does this code have a character? Post by fiziwig on Jan 25th, 2007, 12:12pm I stand corrected. The peace symbol is 14410 as stated by pex. I apologize for the ambiguity. towr's interpretation is the intended one: "edges leading from a vertex". |
||||