| Author |
Message |
m4ri000
MD NewNew


Joined: June 13, 2006
Posts: 3
Member
|
Posted:
June 14, 2006 - 08:01 AM |
|
| Post subject: Encoding / charset UTF-8 issue / glitch |
Hi and hello to everybody, since I'm new to the MDPro community!
I have some problems with how UTF-8 encoded text is displayed in some modules (News and NS-AddStory so far). It could be a more general problem, so I'm posting it in this section of the support forum.
Facts:
1) My web site (11teza.net) needs to display a charset that supports the Croatian language. ISO-8859-2 can do that, but from what I've read on the Internet, UTF-8 is the better choice, maybe the best, because UTF-8 can represent a huge range of characters and, as far as I understand (I don't know exactly), PHP passes data between its functions in UTF-8. Accordingly:
2) The MySQL charset is set to UTF-8 (utf8), and I did a find/replace on every file in the root directory (where I installed my MDPro site), replacing every 'charset=<something>' string with 'charset=UTF-8', including the other files where it is necessary (like 'root/language/eng/global.php' and 'root/themes/name_of_theme/lang/eng/global.php'). Oh, and:
3) I'm running the local version of the site on Windows XP with EasyPHP (Apache 1.3.33, PHP 4.3.10, phpMyAdmin 2.6.1, MySQL 4.1.9), but this is probably not crucial, because I have the same problems in the online version hosted on a web server. (The MDPro version is 1.0.76.)
// Now let's dig closer to the problem, but not yet the problem itself!
4) I figured out that the best solution, before entering anything into the MDPro site, is to replace all Croatian-specific characters in the text with their HTML numeric character references (NCRs).
To give a picture of what I'm talking about, here is an example word:
1) [Eng version]: Toothpick.
2) [Croatian version]: Čačkalica.
3) [Croatian version where the Croatian-specific characters are replaced with NCRs]: &+#268;a&+#269;kalica.
[I put a '+' between '&' and '#' because otherwise you would see the same word in examples 2) and 3). The correct form is without the '+'!]
Then I checked: in the MySQL database the word is stored as shown in example 3) (with NCRs). Because web browsers translate embedded HTML entities into real characters, we see: Čačkalica. (In MDPro this happens only if 'Translate embedded HTML entities into real characters' is activated; I set it to 'Yes' in 'Administration Menu/Settings/Settings'.)
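The round trip described in fact 4) can be sketched in a few lines. This is an illustration in Python rather than MDPro's PHP, and the helper name `to_ncr` is made up for the example; it only shows how a word becomes numeric character references and how a browser-style decode turns it back.

```python
import html


def to_ncr(text: str) -> str:
    """Replace each non-ASCII character with a decimal numeric character reference."""
    # The built-in "xmlcharrefreplace" error handler emits &#NNN; for anything
    # that does not fit in ASCII and leaves ASCII characters untouched.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


word = "Čačkalica"
stored = to_ncr(word)            # the form that would be saved in the database
shown = html.unescape(stored)    # what a browser renders from that form

print(stored)                    # the word with NCRs in place of Č and č
print(shown == word)             # the decode gives the original word back
```

Plain ASCII words such as "Toothpick" pass through `to_ncr` unchanged, which is why only the Croatian-specific letters need this treatment.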
// Now you may ask yourself where the problem is, since things configured this way should work! Yes, and to make it worse, this configuration (with some minor hacks) works completely on a friend's MDPro site (kontra-punkt.info), which is on the same server as my online site and runs the same version of MDPro. So, here comes:
Glitch/problem:
1) *News* module:
When this module reads data from the MySQL database, it displays the 'Title' (class 'pn-title') exactly as the data is stored in MySQL (as HTML NCRs, like example 3) under fact 4)). You can see that here (the URL displayed in my web browser is:
| Code:
|
|
http://127.0.0.1/11teza/modules.php?op=modload&name=News&file=article&sid=2
|
As you can see, 'Story text' and 'Extended text' are displayed as they should be, and in the database all three strings of Croatian letters ('š đ č ć ž') are identical. But here they are not rendered identically. The 'Title' is displayed wrongly because (this is my assumption) the News module writes wrong output (if you look at the source of that HTML page):
| Code:
|
|
<a class="pn-title" href="modules.php?op=modload&amp;name=News&amp;file=article&amp;sid=2&amp;mode=thread&amp;order=0&amp;thold=0">&amp;#353; &amp;#273; &amp;#269; &amp;#263; &amp;#382;</a>
|
Instead of writing '&amp;' in front of each '#' in that HTML output, just the '&' character would be enough, and then everything would be all right.
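The symptom, a title that shows the stored references instead of the letters, is consistent with the ampersand being escaped a second time on output. The following Python sketch reproduces that effect; it is an assumption about the mechanism, not a trace of what the News module actually does.

```python
import html

# The title as it is stored in the database (numeric character references).
stored_title = "&#353; &#273; &#269; &#263; &#382;"

# Escaping the stored text once more turns every '&' into '&amp;'.
escaped_again = html.escape(stored_title)

# A browser decodes entities exactly once when it renders the page.
rendered_ok = html.unescape(stored_title)      # the five Croatian letters
rendered_bad = html.unescape(escaped_again)    # the literal references again

print(escaped_again)
print(rendered_bad == stored_title)
```

If the story text is emitted without the extra escaping step while the title goes through it, the two fields would render differently even though they are stored identically.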
Stranger still, when I click 'Edit' on that article, the new page has this output (here I cropped only the essential part of the page; the URL is:
| Code:
|
|
http://127.0.0.1/11teza/admin.php?module=NS-AddStory&op=EditStory&sid=2
|
And you can see that here everything is all right. Of course, this is another module, NS-AddStory, but it is not innocent either!
2) *NS-AddStory* module:
IMHO (again, this is my assumption) this time the guilty one is 'addstory_categories.php', only because I have problems naming and showing categories (with Croatian characters, of course!) properly. You can see (URL:
| Code:
|
|
http://127.0.0.1/11teza/admin.php?module=NS-AddStory&op=DelCategory&catid=0
|
how the name of a category called Čumez (&+#268;umez) is displayed.
OK. These are the major problems for now (I hope I'm just missing some BIG configuration setting, and then everything will be all right), and I hope I was as helpful as I could be in analysing this problem.
P.S. [29-06-2006]
I edited this only because I hadn't figured out how to put images into a post, so now I've changed that. |
|
|
 |
 |
m4ri000
MD NewNew


Joined: June 13, 2006
Posts: 3
bannato
|
 Posted:
June 30, 2006 - 09:05 AM |
|
| Post subject: Anybody, please?!? |
It has been almost three weeks since I posted about this very BIG encoding problem.
And all I could do was watch it drown on the 'Latest Posts' page...
Without solving it I really can't get my web site functioning as it should, at full capacity with all its data.
Nobody is answering anything, and I really want to know why that is?!? This is a matter not only of Croatian characters, but of all unusual UTF-8 characters in the most used module!!!
I know that this is an open-source community, and all work is voluntary, but hey, not even one reply?!
Everything would be easier if I could have MDLite and play with that... but, yes, I'm not in the MDBooster Club... heh...
I really like MDPro very much, but if I can't solve this issue, I'm afraid I will be forced to switch to another CMS.
And this is a fact, not a guilt trip.
So, I will appreciate any kind of help, it can't be so hard, come on!
Tnx,
mARio |
|
|
 |
 |
PeteBest
MD user level 5


Joined: Oct 06, 2003
Posts: 4845
bannato
|
 Posted:
June 30, 2006 - 09:18 AM |
|
|
| m4ri000 wrote: |
So, I will appreciate any kind of help, it can't be so hard, come on!  |
If it's not so hard, why not just fix the problem yourself??
I don't speak or have any dealings with any unicode based language, like almost all the other users here, hence the lack of replies! While this is international support the majority of our users have little/no need to worry about multibyte encoding.
Also, your first post is very confusing to read, I remember reading your post about 3 times and I still didn't fully understand what you were trying to ask. I'd recommend keeping your posts as short as possible, as the layout and the way you've tried to explain things doesn't make easy reading.
All I can suggest is to make sure PHP is compiled with the mbstring extension. Aside from that you're just going to have to wait until someone that has knowledge of unicode assists, or try to do something further yourself.
We have a few Japanese/Chinese users that use UTF-8 encoding without any problems, so I don't think the problem lies with MDPro anyway. If you want to install another CMS, go for it. |
_________________ Retired from official MAXdev duties |
|
|
 |
Wiseman
MD user level 5


Joined: Mar 15, 2005
Posts: 101
Location: Spain
bannato
|
 Posted:
July 05, 2006 - 11:09 PM |
|
|
M4ri000,
The Unicode/Universal Character Set is indeed the best choice (not just for websites but for everything; other character sets are obsolete). Unicode supports any written language known to man, as well as a wide set of useful characters, and supersets every existing character set. Out of the Unicode Transformation Formats, UTF-8 will work best with PHP, MySQL and MDPro, and it's optimal for European languages (but still good for Asian languages).
I have a lot of experience with character sets and encodings so I might be able to help you.
In order to setup UTF-8 properly to use in your website, you should do the following:
1. Set your database to use UTF-8. To do this, make sure your database, server and client are set to use UTF-8. You will need some knowledge of the database in order to make it use UTF-8; it's a bit too long for me to explain without more information (like whether you have access to mysqldump and if you already have data with international characters in it). To set the server to use UTF-8 by default, edit your database configuration file (usually /etc/my.cnf on Unices, and <Installation path>\my.cnf on Windows/ReactOS) and add the following switches under [mysqld]:
| Code:
|
default-character-set=utf8
default-collation=utf8_general_ci
|
To force using UTF-8 from MDPro itself (which is recommended), you can edit your includes/pnAPI.php file, at the end of the pnDBInit() function, right before the return true;, adding the following code:
| Code:
|
$dbconn->Execute("SET SESSION character_set_client='utf8'");
$dbconn->Execute("SET SESSION character_set_connection='utf8'");
$dbconn->Execute("SET SESSION character_set_results='utf8'");
|
but remember the database has to be UTF-8 internally to support all characters, otherwise glitches may happen.
2. Set PHP to use UTF-8. Edit your php.ini file (usually /etc/php.ini on Unices, <Installation path>\php.ini on Windows/ReactOS) and, in the extensions section, enable the mbstring extension (it's named something like mbstring.so or php_mbstring.dll); this assumes your PHP was compiled with mbstring. Then under the [mbstring] section (create one at the end of the file if it doesn't exist), set the following:
| Code:
|
mbstring.language = Neutral
mbstring.internal_encoding = UTF-8
mbstring.http_input = UTF-8,CP1252,ISO-8859-15,ASCII
mbstring.http_output = UTF-8
mbstring.encoding_translation = Off
mbstring.detect_order = UTF-8,CP1252,ISO-8859-15,ASCII
mbstring.func_overload = 6
|
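3. Make sure your application supports UTF-8. In the code, never use {} (or []) to access string characters, because that returns a single byte rather than a whole character; use substr instead (with mbstring.func_overload set as above, substr is handled by its multibyte-aware counterpart mb_substr). For example, if you want to retrieve the second character of a string $a, don't do $a{1}, do substr($a, 1, 1) instead.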
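To see why byte-wise access breaks on multibyte text, here is a small Python illustration (not PHP). Python's `bytes` type stands in for a PHP 4 string, which is a plain byte sequence, and `str` stands in for the multibyte-aware functions. The variable names are made up for the example.

```python
word = "Čačkalica"
raw = word.encode("utf-8")       # the byte sequence a byte-oriented script sees

char_count = len(word)           # counts characters
byte_count = len(raw)            # counts bytes; Č and č take two bytes each

second_char = word[1]            # character-aware indexing gives the letter a
second_byte = raw[1:2]           # byte indexing lands inside the encoding of Č


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` is a complete, well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(char_count, byte_count)
print(second_byte, is_valid_utf8(second_byte))
```

The lone byte taken from the middle of Č is not valid UTF-8 on its own, which is the kind of fragment that ends up as a garbled character on the page.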
Also, when using regular expressions, use PCRE (preg_* functions), which are recommended for a number of reasons, and use the u modifier to enable UTF-8 support (so a multibyte UTF-8 character counts as a single character; it enables other features too, check the docs).
MDPro has a fairly good support for UTF-8; I only had minor problems with it.
Be sure your language files are in the proper character set (UTF-8). If they are in anything different, you can convert them with recode (recode sourcecharset..utf8 file) or iconv (iconv --help).
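For readers without recode or iconv at hand, the same conversion can be sketched in Python. The function name `convert_to_utf8` and the sample file are hypothetical, and the source charset has to be known in advance; a wrong guess may still decode without an error and silently produce the wrong letters.

```python
import tempfile
from pathlib import Path


def convert_to_utf8(path: Path, source_charset: str) -> None:
    """Rewrite `path` in place as UTF-8, assuming it is currently in `source_charset`."""
    text = path.read_bytes().decode(source_charset)
    path.write_bytes(text.encode("utf-8"))


# Demonstration on a throwaway file; the content is made up for the example.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "global.php"
    sample.write_bytes("Čumez".encode("iso-8859-2"))   # one byte for Č
    convert_to_utf8(sample, "iso-8859-2")
    print(sample.read_bytes())                         # now two bytes for Č
```

Running the conversion twice on the same file would corrupt it, so it is worth keeping a backup of the language files first.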
If you have some problem with HTML entities, try setting/unsetting the HTML entities switch in MDPro Administration->Settings; I don't remember what it was called.
4. Make sure User-Agents (browsers) are instructed to display UTF-8 properly. Search and replace all instances of iso-8859-* in MDPro, changing it to "utf-8". (Notice that browsers call it "utf-8" while MySQL calls it "utf8".)
I strongly recommend that you force it from MDPro like this: in includes/xhtml.php, at the top of the xhtml_dtd_start function, add the following line:
| Code:
|
|
header('Content-Type: text/html; charset=utf-8');
|
If you don't do that, you'll need to edit your HTTP server configuration and set the default character set to UTF-8.
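The declared charset matters because it tells the browser how to turn the bytes it receives back into characters. A short Python illustration of the same bytes read under two different declarations (the variable names are made up for the example):

```python
# The bytes a server sends for a UTF-8 encoded page fragment.
body = "š đ č ć ž".encode("utf-8")

# Read according to a charset=utf-8 declaration.
as_declared_utf8 = body.decode("utf-8")

# Read according to a charset=iso-8859-1 declaration: every byte becomes
# its own character, so each two-byte letter turns into two wrong symbols.
as_declared_latin1 = body.decode("iso-8859-1")

print(as_declared_utf8 == "š đ č ć ž")
print(len(as_declared_utf8), len(as_declared_latin1))
```

The page content is identical in both cases; only the label differs, which is why a stray iso-8859-* declaration is enough to garble otherwise correct UTF-8 output.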
Hope this helps. |
Last edited by Wiseman on Aug 02, 2006 - 09:02 PM; edited 1 time in total |
|
|
 |
m4ri000
MD NewNew


Joined: June 13, 2006
Posts: 3
bannato
|
 Posted:
July 26, 2006 - 07:45 AM |
|
|
Well... here I am, with the problems solved!!!
First of all, thank you very much, Wiseman. Your answers were really detailed, but fortunately the issue wasn't so deep!
So we (another MDPro 1.0.76 site had the same problem!) fixed the problem with the help of your point 4:
| Wiseman wrote: |
| Search and replace all instances of iso-8859-* in MDPro, changing it to "utf-8". (Notice that browsers call it "utf-8" while MySQL calls it "utf8".) |
I think there is only iso-8859-1 in the whole of MDPro (1.0.76), so (if you are running your MDPro site on a GNU/Linux server with Perl installed) you just need to run the command below in the folder where you installed MDPro:
| Code:
|
|
perl -p -i -e 's/ISO-8859-1/utf-8/g' `grep -ril ISO-8859-1 *`
|
And all encoding problems are solved!!!
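For anyone without Perl, or who wants a slightly more cautious variant, the same search and replace can be sketched in Python. This is not the command we ran: the function name, the list of file suffixes and the sample file are assumptions for illustration. Unlike the Perl substitution it matches the label in any letter case, and it skips longer names such as ISO-8859-15.

```python
import re
import tempfile
from pathlib import Path

# Match the label in any letter case, but not longer names such as ISO-8859-15.
LATIN1_LABEL = re.compile(rb"iso-8859-1(?![0-9])", re.IGNORECASE)


def replace_charset_label(root: Path, suffixes=(".php", ".htm", ".html")) -> list:
    """Replace the ISO-8859-1 label with utf-8 in matching files under `root`.

    Returns the paths that were modified. Only the label is changed;
    the file contents themselves are not transcoded.
    """
    changed = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        original = path.read_bytes()
        updated = LATIN1_LABEL.sub(b"utf-8", original)
        if updated != original:
            path.write_bytes(updated)
            changed.append(path)
    return changed


# Demonstration on a throwaway directory; the file content is made up.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "global.php"
    sample.write_bytes(b"define('_CHARSET','ISO-8859-1');")
    print(len(replace_charset_label(Path(tmp))))
    print(sample.read_bytes())
```

As with the Perl command, it is safer to run this on a copy of the site first, since it rewrites files in place.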
@PeteBest:
| Wiseman wrote: |
| The Unicode/Universal Character Set is indeed the best choice (not just for websites but for everything; other character sets are obsolete). Unicode supports any written language known to man, as well as a wide set of useful characters, and supersets every existing character set. Out of the Unicode Transformation Formats, UTF-8 will work best with PHP, MySQL and MDPro, and it's optimal for European languages (but still good for Asian languages). |
I really agree with him, and I believe that your argument:
| PeteBest wrote: |
| While this is international support the majority of our users have little/no need to worry about multibyte encoding. |
is not true, because EVERYBODY deals with some kind of (multibyte) encoding. Even in MDPro the majority of files are configured for UTF-8, and the files that aren't are causing the problems I had.
IMHO MDPro does not handle encoding consistently, because some things are configured to use iso-8859-* and others to use UTF-8. As my problem shows, that is no good. And it makes no difference whether you are using English or some other language.
My proposal: it is better that EVERYTHING is configured to be encoded in UTF-8, and there are really a lot of arguments for that!!!
For example:
| Wikipedia wrote: |
| In June 2004, the ISO/IEC working group responsible for maintaining eight-bit coded character sets disbanded and ceased all maintenance of ISO 8859, including ISO 8859-1, in order to concentrate on the Universal Character Set and Unicode. In computing applications, encodings that provide full UCS support (such as UTF-8 and UTF-16) are finding increasing favor over encodings based on ISO 8859-1. link |
Or look here: Advantages and disadvantages (of using UTF-8)
So it would be really, really nice if future versions of MDPro used UTF-8 encoding everywhere, and I hope that the people who actually code (MDLite) will read this post.
Stay well,
mARio |
|
|
 |
 |
PeteBest
MD user level 5


Joined: Oct 06, 2003
Posts: 4845
bannato
|
 Posted:
July 26, 2006 - 09:54 AM |
|
|
1. Any UTF-8 in the standard MDPro 1.0.76 download from here is part of 3rd party code, so that was left in place for compatibility purposes. So no, the majority of files are not encoded with UTF-8 at all. The majority of the site is controlled by the language variable _CHARSET, which is set to ISO-8859-1.
2. If your language pack was an officially supported download, then your encoding would have been set correctly.
3. UTF-8 encoding for all packages may be investigated in the future, but as the majority of the user base have no need for any multibyte characters, it's right at the bottom of the list. Compatibility would have to be ensured with MySQL/PHP configurations that weren't configured for UTF-8 encoding.
Since I'm still one of the main coders for MDLite, I strongly doubt that this will get put in MDLite. If people want to create language packs for multibyte languages it's assumed that they will already know what they're doing, we won't risk breaking any existing setups for 1-2 users running unsupported language packs. While it may not cause any problems, we have to be sure, and as a development team we're already as stretched as we can be, so things like this will just have to wait. |
_________________ Retired from official MAXdev duties |
|
|
 |
Wiseman
MD user level 5


Joined: Mar 15, 2005
Posts: 101
Location: Spain
bannato
|
 Posted:
Aug 02, 2006 - 09:11 PM |
|
|
While official UTF-8 support would be a Good Thing, MDPro works perfectly fine with it if you do what's described in my post (hope interested people find it), so I wouldn't consider it a priority over adding features and ensuring stability and security, yes. |
|
|
|
 |
blackrat
MD NewNew

Joined: May 06, 2008
Posts: 1
bannato
|
 Posted:
May 10, 2008 - 02:45 PM |
|
| Post subject: charset |
Look at language\eng\global.php, then find and change this: define('_CHARSET', |
|
|
|
 |
Bonzo
MD user level 5


Joined: Sep 15, 2004
Posts: 52
Location: Rome - Italy
bannato
|
 Posted:
May 11, 2008 - 02:54 PM |
|
|
|
|
 |
|
|
|