Encoder Tool and Resources
WRITE Tool and Resources
Last updated: 13-DEC-2004 22:08
I am currently working on WRITE (Web-Ready Input Text Encoder), an encoder tool that uses custom dictionaries
to translate text between character sets, with Unicode on one side of each translation. In the dictionary list
further down, the arrows indicate the direction of translation (<-> means text can be translated in both
directions). Translation is fairly slow, especially on this server, so please be patient. There are currently
three versions of WRITE:
- encoder.php [source] – the original version (24 Nov 2004)
- encoder2.php – modified so that the CharClick box is loaded from a predefined HTML
file rather than dynamically, making each page load approximately twice as fast (3 Dec 2004)
- encoder3.php [source] – with an experimental new
recursive version of the str_preg_replace function, which turned out to be slower than the original version (3 Dec 2004)
An important note about the encoding algorithm: it only supports single replacement. That is, once one or more
characters are replaced due to a rule, these replacement characters cannot be replaced by any other rule. This
simplifies the algorithm and also precludes infinite repetition of replacements.
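To illustrate the single-replacement rule, here is a minimal sketch in Python. It is not the PHP code that WRITE actually uses: the encode function, the regex-alternation approach, and the sample rules are all assumptions made up for this example. The idea is to scan the original input once, so text produced by one rule is never looked at again by another rule.

```python
import re

def encode(text, rules):
    """Apply all rules in a single left-to-right pass over the input.

    The scan only ever reads the original text, so the output of one
    rule can never be replaced again by another rule.
    """
    if not rules:
        return text
    # Try longer source strings first so they win over shorter ones
    # that start at the same position.
    keys = sorted(rules, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: rules[m.group(0)], text)

# Hypothetical rules chosen so that naive sequential replacement chains.
chain_rules = {"a": "b", "b": "c"}
single_pass = encode("ab", chain_rules)

# For contrast: applying the rules one after another lets the second
# rule rewrite the output of the first.
chained = "ab"
for src, dst in chain_rules.items():
    chained = chained.replace(src, dst)

# Hypothetical shorthand rules (not WRITE's real dictionary entries).
symbols = {"+-": "±", "e'": "é"}
# A <-> dictionary can be run backwards by swapping keys and values.
reverse_symbols = {dst: src for src, dst in symbols.items()}

print(single_pass)                           # bc
print(chained)                               # cc
print(encode("cafe' +-1", symbols))          # café ±1
print(encode("café ±1", reverse_symbols))    # cafe' +-1
```

With the single pass, "ab" becomes "bc"; with sequential replacement it collapses to "cc" because the "b" produced by the first rule is replaced again. Two rules such as a -> b and b -> a would likewise be harmless in a single pass, whereas repeating the replacements until nothing changes would never stop.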
Here are the supported dictionaries at the moment (if anyone wants to implement Japanese or some other encoding, let me know!):
- ASCII-SYMBOLS <-> Common Symbols [CC] – Provides an ASCII shorthand
for common Latin-1 supplemental characters (Ì, é, Å, ü, etc.) as well as some
mathematical operators (∀, ∞, ±, ⊕ …) and other symbols. Supports most of the characters with defined HTML 4.0 named entities.
- X-SAMPA <-> IPA [CC] –
Supports the standard X-SAMPA ASCII encoding for IPA (International Phonetic Alphabet) symbols in Unicode.
For more information:
- Shebrew <-> Hebrew (Unicode) [CC] and
Shebrew (no vowels) <-> Hebrew (Unicode) [CC] –
Provides an ASCII shorthand for representing Hebrew letters and points (vowels). The "no vowels" version
ignores any vowels in the input when producing output. This ONLY works for the Unicode encoding of Hebrew
characters—it does not support characters encoded using Latin-1 supplemental characters under ISO-8859-8 or
similar. Note that Hebrew is a right-to-left language. For more information on encoding Hebrew with Unicode:
- ISO-8859 Conversion – Convert text from a particular ISO-8859 encoding to Unicode.
For more on ISO-8859, see ISO-8859: Alphabet Soup. Supported ISO-8859 encodings are:
French 24 Presentation
The Unicode Standard Version 4.0
Using Unicode characters with HTML
Resources for HTML, XHTML, CSS, Javascript, PHP, and Regular Expressions
Mozilla Firefox Browser
Mozilla Firefox version 1.0 was recently released,
and I highly recommend it for everyone. It does everything Internet Explorer can do, but is more secure
and has many really cool features (such as tabbed browsing, extensions you can install if you want to see a
weather summary in the corner of your browser or something, and a nice
Calendar program that you can download). It is also
better than IE at displaying Unicode characters and conforming to web standards. The installation program will
even import your Internet Explorer bookmarks for you, so there's no reason not to
download Firefox!
Other Links
Contact
If you'd like to contact me with questions/comments/suggestions related to this project, programming in general, or
anything else, my email address is neatnateNOSPAM@NOSPAMberkeley.edu. You can also
find me on Facebook.
—Nathan Schneider, 24 November 2004