m4ri000
MD NewNew


Joined: June 13, 2006
Posts: 3

Member
Posted: June 14, 2006 - 09:01 AM
Post subject: Encoding / charset UTF-8 issue / glitch

Hi and hello to everybody, since I'm new to the MDPro community! Smile

I have some problems with the display of UTF-8 encoded text in some modules (News and NS-AddStory so far), but it could be a more general problem, so I'm posting it in this section of the support forum.

Facts:
1) My web site (11teza.net) needs to display a charset that supports the Croatian language. ISO-8859-2 can do that, but from what I've read on the Internet, UTF-8 is the better choice, maybe the best, because it can represent a huge number of characters and (I don't know exactly) PHP passes data between its functions in UTF-8. Accordingly:
2) The MySQL charset is set to UTF-8 (utf8), and I did a find/replace on every file in the root directory (where I installed my MDPro web site), replacing every 'charset=<something>' string with 'charset=UTF-8', and of course I did the same in all the other necessary files (like 'root/language/eng/global.php' and 'root/themes/name_of_theme/lang/eng/global.php'). Oh, yes:
3) I'm running the local version of the site on Windows XP with EasyPHP (Apache 1.3.33, PHP 4.3.10, phpMyAdmin 2.6.1, MySQL 4.1.9), but maybe this is not crucial, because I have the same problems in the on-line version hosted on a web server. (The MDPro version is 1.0.76.)
//now let's dig closer to the problem, but not yet the problem itself! Smile
4) I figured out that the best solution, before inputting anything into the MDPro web site, is to replace all Croatian-specific characters in the text with their HTML Numeric Character References (NCR).
To give a picture of what I'm talking about, here is an example word:
1) [Eng version]: Toothpick.
2) [Croatian version]: Čačkalica.
3) [Croatian version with the Croatian-specific chars replaced by NCRs]: &+#268;a&+#269;kalica.
[I put a '+' between '&' and '#' because otherwise you would see the same word in examples 2) and 3). The correct form is without the '+' char! Smile]
Then I checked: in the MySQL database that word is stored as displayed in example 3) (with NCRs). Because the embedded HTML entities are translated into real characters in our web browsers, we see: Čačkalica. (In MDPro this happens only if 'Translate embedded HTML entities into real characters' is activated; I set it to 'Yes' in 'Administration Menu/Settings/Settings'.)
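For anyone who wants to reproduce the replacement described in fact 4), here is a minimal sketch. It is written in Python purely for illustration (MDPro itself is PHP), and the helper name to_ncr is made up for this example; it is not part of MDPro.

```python
def to_ncr(text: str) -> str:
    """Replace every non-ASCII character with its decimal numeric character reference."""
    # 'xmlcharrefreplace' turns each character that ASCII cannot encode into &#NNN;
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(to_ncr("Čačkalica"))
```

For the example word this gives the form shown in example 3), without the '+' characters. Plain ASCII text passes through unchanged.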
//now you may ask yourself where the problem is, since things configured this way ought to work! Worse, this configuration (with some minor hacks) works completely on a friend's MDPro site (kontra-punkt.info), which is on the same server as my on-line web site and runs the same version of MDPro. So, here comes:

Glitch/problem:
1) *News* module:
When this module takes data from the MySQL database, it displays the 'Title' (class 'pn-title') exactly as the data is stored in MySQL (as raw NCRs, like example 3) under fact 4)). You can see that here: Image. The URL displayed in my web browser is:
Code:
http://127.0.0.1/11teza/modules.php?op=modload&name=News&file=article&sid=2
As you can see, 'Story text' and 'Extended text' are displayed as they should be, and in the database all three strings of Croatian letters ('š đ č ć ž') are identical. But on the page they are not rendered identically. The 'Title' is displayed wrongly because (this is my assumption) the News module writes the wrong output, as you can see in the source of that HTML page:
Code:
<a class="pn-title" href="modules.php?op=modload&amp;name=News&amp;file=article&amp;sid=2&amp;mode=thread&amp;order=0&amp;thold=0">&amp;#353; &amp;#273; &amp;#269; &amp;#263; &amp;#382;</a>
In the link text, the '&' of every NCR has been written as '&amp;'. A plain '&' char would be enough, and then everything would be all right.
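The effect described above looks like a second round of HTML escaping applied to a title that already contains NCRs. The following sketch only reproduces that effect with Python's standard html module; it is an illustration, not MDPro's actual code, and the variable names are invented for the example.

```python
import html

# Title as stored in the database: NCRs for the letters "š đ č ć ž".
stored = "&#353; &#273; &#269; &#263; &#382;"

# Escaping the stored title once more turns each '&' into '&amp;' ...
double_escaped = html.escape(stored, quote=False)

# ... so a browser, which decodes entities exactly once, shows the raw references.
shown_for_title = html.unescape(double_escaped)

# Output that is not escaped a second time decodes to the real letters.
shown_for_body = html.unescape(stored)

if __name__ == "__main__":
    print(double_escaped)
    print(shown_for_title)
    print(shown_for_body)
```

If this is what happens in the News module, the fix would be to skip the extra escaping for the title, or to store real UTF-8 characters instead of NCRs so that escaping has nothing to break.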
Stranger still, when I click 'Edit' on that article, the new page has this output: Image (I cropped only the essential part of the page). The URL is:
Code:
http://127.0.0.1/11teza/admin.php?module=NS-AddStory&op=EditStory&sid=2
And you can see that here everything is all right. But, of course, this is another module, NS-AddStory, and it (or she?! Wink ) is not innocent either!

2) *NS-AddStory* module:
IMHO (again, this is my assumption) this time the guilty one is 'addstory_categories.php', simply because I have a problem naming and showing categories (with Croatian characters, of course!) properly. You can see it here: Image. The URL is:
Code:
http://127.0.0.1/11teza/admin.php?module=NS-AddStory&op=DelCategory&catid=0
The screenshot shows how the name of a category called Čumez (&+#268;umez) is displayed.


OK. These are the major problems for now (I hope I'm just missing some BIG configuration setting, and then everything will be all right), and I hope I was as helpful as I could be in analysing this problem. Wink



P.S. [29-06-2006]
I edited this post only because at first I hadn't figured out how to put images into a post, so now I've changed that.
m4ri000
MD NewNew


Joined: June 13, 2006
Posts: 3

bannato
Posted: June 30, 2006 - 10:05 AM
Post subject: Anybody, please?!?

It has been over two weeks since I posted about this very BIG encoding problem.
And all I could do was watch it sink down the 'Latest Posts' page...

Without solving it I really can't get my web site working as it should, at full capacity with all its data.

Nobody is answering anything, and I really want to know why that is?!? This is a matter not only of Croatian characters, but of all non-ASCII UTF-8 characters in the most used module!!!

I know that this is an open-source community and all work is voluntary, but hey, not even one reply?! Sad

Everything would be easier if I could have MDLite and play with that... but, yes, I'm not in the MDBooster Club... heh... Confused

I really like MDPro very much, but if I can't solve this issue, I'm afraid I will be forced to switch to another CMS.
And this is a fact, not a guilt-trip. Rolling Eyes

So, I will appreciate any kind of help, it can't be so hard, come on! Razz

Tnx,

mARio
PeteBest
MD user level 5


Joined: Oct 06, 2003
Posts: 4845

bannato
Posted: June 30, 2006 - 10:18 AM

m4ri000 wrote:
So, I will appreciate any kind of help, it can't be so hard, come on! Razz


Laughing If it's not so hard, why not just fix the problem yourself??

I don't speak or have any dealings with any unicode based language, like almost all the other users here, hence the lack of replies! While this is international support the majority of our users have little/no need to worry about multibyte encoding.

Also, your first post is very confusing to read, I remember reading your post about 3 times and I still didn't fully understand what you were trying to ask. I'd recommend keeping your posts as short as possible, as the layout and the way you've tried to explain things doesn't make easy reading.

All I can suggest is to make sure PHP is compiled with the mbstring extension. Aside from that, you're just going to have to wait until someone with knowledge of Unicode assists, or try to do something further yourself.

We have a few Japanese/Chinese users that use UTF-8 encoding without any problems, so I don't think the problem lies with MDPro anyway. If you want to install another CMS, go for it.

_________________
Retired from official MAXdev duties
Wiseman
MD user level 5


Joined: Mar 15, 2005
Posts: 103
Location: Spain
bannato
Posted: July 06, 2006 - 12:09 AM

M4ri000,

The Unicode/Universal Character Set is indeed the best choice (not just for websites but for everything; other character sets are obsolete). Unicode supports any written language known to man, as well as a wide set of useful characters, and supersets every existing character set. Out of the Unicode Transformation Formats, UTF-8 will work best with PHP, MySQL and MDPro, and it's optimal for European languages (but still good for Asian languages).

I have a lot of experience with character sets and encodings so I might be able to help you.

In order to set up UTF-8 properly for use in your website, you should do the following:

1. Set your database to use UTF-8. To do this, make sure your database, server and client are all set to use UTF-8. You will need some knowledge of the database to do this; it's a bit too long for me to explain without more information (like whether you have access to mysqldump and whether you already have data with international characters in it). To make the server use UTF-8 by default, edit your database configuration file (usually /etc/my.cnf on Unices, and <Installation path>\my.cnf or my.ini on Windows/ReactOS) and add the following options under [mysqld]:
Code:
default-character-set=utf8
default-collation=utf8_general_ci

To force using UTF-8 from MDPro itself (which is recommended), you can edit your includes/pnAPI.php file, at the end of the pnDBInit() function, right before the return true;, adding the following code:
Code:
$dbconn->Execute("SET SESSION character_set_client='utf8'");
$dbconn->Execute("SET SESSION character_set_connection='utf8'");
$dbconn->Execute("SET SESSION character_set_results='utf8'");

but remember the database has to be UTF-8 internally to support all characters, otherwise glitches may happen.
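To show the kind of glitch a charset mismatch produces, here is a small sketch. It is in Python only for illustration and assumes the mismatched side decodes the bytes as ISO-8859-1; the function names are invented for this example.

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as if they were ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair(garbled: str) -> str:
    """Undo the misreading; this works only if no bytes were lost on the way."""
    return garbled.encode("iso-8859-1").decode("utf-8")


if __name__ == "__main__":
    garbled = misread_as_latin1("š")
    print(garbled)          # two Latin-1 characters instead of one letter
    print(repair(garbled))  # the original letter again
```

Each two-byte UTF-8 letter turns into two unrelated Latin-1 characters, which is the typical symptom when the client, connection and table charsets do not agree.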

2. Set PHP to use UTF-8. Edit your php.ini file (usually /etc/php.ini on Unices, <Installation path>\php.ini on Windows/ReactOS) and enable the mbstring extension in the extensions section (the file is named something like mbstring.so or php_mbstring.dll). This only works if your PHP build includes mbstring. Then, under the [mbstring] section (create one at the end of the file if it doesn't exist), set the following:
Code:
mbstring.language = Neutral
mbstring.internal_encoding = UTF-8
mbstring.http_input = UTF-8,CP1252,ISO-8859-15,ASCII
mbstring.http_output = UTF-8
mbstring.encoding_translation = Off
mbstring.detect_order = UTF-8,CP1252,ISO-8859-15,ASCII
mbstring.func_overload = 6


3. Make sure your application supports UTF-8. In the code, never use {} (or [] ) to access string characters, because that reads single bytes; use substr instead, which the mbstring.func_overload setting above maps to the multibyte-aware mb_substr. For example, if you want to retrieve the second character of a string $a, don't do $a{1}, do substr($a, 1, 1) instead.

Also, when using regular expressions, use PCRE (the preg_* functions), which is recommended for a number of reasons, and use the u modifier to enable UTF-8 support (so that a multibyte UTF-8 character counts as a single character; it enables other features too, check the docs).
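The difference between byte access and character access can be seen in a short sketch. It uses Python rather than PHP, purely as an illustration: Python's bytes type behaves like a byte-oriented PHP string, and its str type behaves like the multibyte-aware functions.

```python
import re

word = "Čačkalica"
raw = word.encode("utf-8")  # what a byte-oriented function sees

chars = len(word)   # 9 characters
octets = len(raw)   # 11 bytes: 'Č' and 'č' take two bytes each

second_char = word[1]   # 'a'
second_byte = raw[1:2]  # b'\x8c', the tail of 'Č', not a character by itself

# '.' matches one byte in a bytes pattern but one character in a str pattern.
byte_matches = len(re.findall(rb".", raw))
char_matches = len(re.findall(r".", word))

if __name__ == "__main__":
    print(chars, octets, second_char, second_byte, byte_matches, char_matches)
```

Byte-wise indexing and byte-wise regular expressions cut multibyte characters in half, which is the problem that substr with func_overload and the u modifier are meant to avoid.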

MDPro has a fairly good support for UTF-8; I only had minor problems with it.

Be sure your language files are in the proper character set (UTF-8). If they are in anything different, you can convert them with recode (recode sourcecharset..utf8 file) or iconv (iconv --help).
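If neither recode nor iconv is available, the same conversion can be sketched in a few lines of Python. This is only an illustration: the function name convert_to_utf8 is made up, and the source encoding has to be known in advance (ISO-8859-2 is used below only as an example for Croatian text).

```python
from pathlib import Path


def convert_to_utf8(path: Path, source_encoding: str) -> None:
    """Rewrite the file at `path` in UTF-8, reading it as `source_encoding`."""
    # Work on bytes so that line endings are preserved exactly.
    text = path.read_bytes().decode(source_encoding)
    path.write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "global.php"
        sample.write_bytes("Čačkalica".encode("iso-8859-2"))
        convert_to_utf8(sample, "iso-8859-2")
        print(sample.read_bytes().decode("utf-8"))
```

Run it only once per file: converting a file that is already UTF-8 as if it were ISO-8859-2 would garble it.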

If you have some problem with HTML entities, try setting/unsetting the HTML entities switch in MDPro Administration->Settings; I don't remember what it was called.

4. Make sure User-Agents (browsers) are instructed to display UTF-8 properly. Search and replace all instances of iso-8859-* in MDPro, changing it to "utf-8". (Notice that browsers call it "utf-8" while MySQL calls it "utf8".)

I strongly recommend that you force it from MDPro like this: in includes/xhtml.php, at the top of the xhtml_dtd_start function, add the following line:
Code:
header('Content-Type: text/html; charset=utf-8');


If you don't do that, you'll need to edit your HTTP server configuration and set the default character set to UTF-8.


Hope this helps Smile .


Last edited by Wiseman on Aug 02, 2006 - 10:02 PM; edited 1 time in total
m4ri000
MD NewNew


Joined: June 13, 2006
Posts: 3

bannato
Posted: July 26, 2006 - 08:45 AM

Well... here I am, with the problems solved!!! Smile

First of all, thank you very much Wiseman; your answer was really detailed, but fortunately the issue wasn't so deep!

So we (another MDPro 1.0.76 web site had the same problem!) fixed the problem with the help of your point 4:
Wiseman wrote:
Search and replace all instances of iso-8859-* in MDPro, changing it to "utf-8". (Notice that browsers call it "utf-8" while MySQL calls it "utf8".)

I think there is only iso-8859-1 in the whole of MDPro (1.0.76), so (if you are running your MDPro site on a GNU/Linux server with perl installed) you just need to run the command below in the folder where you installed MDPro. The i flag makes the replacement case-insensitive, to match what grep -i finds:
Code:
perl -p -i -e 's/ISO-8859-1/utf-8/gi' `grep -ril ISO-8859-1 *`
And all encoding problems are solved!!!
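For servers without perl, roughly the same search-and-replace can be sketched in Python. This is an illustration under assumptions, not a tested MDPro tool: the function name replace_charset is invented, and unlike the shell * it also visits hidden files. Back up the installation before trying anything like it.

```python
import re
from pathlib import Path

# Case-insensitive, like `grep -i`; works on bytes so no file encoding is assumed.
PATTERN = re.compile(rb"iso-8859-1", re.IGNORECASE)


def replace_charset(root: Path) -> list:
    """Replace every ISO-8859-1 label under `root` with utf-8; return the changed files."""
    changed = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        data = path.read_bytes()
        new_data = PATTERN.sub(b"utf-8", data)
        if new_data != data:
            path.write_bytes(new_data)
            changed.append(path)
    return changed


if __name__ == "__main__":
    for changed_file in replace_charset(Path(".")):
        print(changed_file)
```

Note that this only changes the charset label. A file whose text actually contains ISO-8859-1 encoded characters still has to be converted to UTF-8 separately.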


@PeteBest:
Wiseman wrote:
The Unicode/Universal Character Set is indeed the best choice (not just for websites but for everything; other character sets are obsolete). Unicode supports any written language known to man, as well as a wide set of useful characters, and supersets every existing character set. Out of the Unicode Transformation Formats, UTF-8 will work best with PHP, MySQL and MDPro, and it's optimal for European languages (but still good for Asian languages).

I really agree with him, and I do believe that your argument:
PeteBest wrote:
While this is international support the majority of our users have little/no need to worry about multibyte encoding.
is not true, because EVERYBODY has to deal with some kind of (multibyte) encoding. Even in MDPro the majority of files are configured for UTF-8, and the files that don't have that setting caused the problems I had.
IMHO encoding is not handled consistently in MDPro, because some things are configured to use iso-8859-* and others to use UTF-8. As my problem shows, that is no good. And it makes no difference whether you are using English or some other language.
My proposal is: it would be better if EVERYTHING were configured to be encoded in UTF-8, and there are really lots of arguments for that!!!
For example:
Wikipedia wrote:
In June 2004, the ISO/IEC working group responsible for maintaining eight-bit coded character sets disbanded and ceased all maintenance of ISO 8859, including ISO 8859-1, in order to concentrate on the Universal Character Set and Unicode. In computing applications, encodings that provide full UCS support (such as UTF-8 and UTF-16) are finding increasing favor over encodings based on ISO 8859-1.

Or look here: Advantages and disadvantages (of using UTF-8)

So it would be really, really, really nice if future versions of MDPro used UTF-8 encoding everywhere, and I hope that the people who actually code (MDLite) will read this post.

Stay well,

mARio
PeteBest
MD user level 5


Joined: Oct 06, 2003
Posts: 4845

bannato
Posted: July 26, 2006 - 10:54 AM

1. Any UTF-8 in the standard MDPro 1.0.76 download from here is part of 3rd party code so that was left in place for compatibility purposes. So no, the majority of files are not encoded with UTF-8 at all. The majority of the site is controlled by the language variable _CHARSET which is set to ISO-8859-1

2. If your language pack was an officially supported download, then your encoding would have been set correctly.

3. UTF-8 encoding for all packages may be investigated in the future, but as the majority of the user base has no need for multibyte characters, it's right at the bottom of the list. Compatibility would have to be ensured with MySQL/PHP configurations that aren't set up for UTF-8 encoding.

Since I'm still one of the main coders for MDLite, I strongly doubt that this will get put in MDLite. If people want to create language packs for multibyte languages it's assumed that they will already know what they're doing, we won't risk breaking any existing setups for 1-2 users running unsupported language packs. While it may not cause any problems, we have to be sure, and as a development team we're already as stretched as we can be, so things like this will just have to wait.

_________________
Retired from official MAXdev duties
Wiseman
MD user level 5


Joined: Mar 15, 2005
Posts: 103
Location: Spain
bannato
Posted: Aug 02, 2006 - 10:11 PM

While official UTF-8 support would be a Good Thing, MDPro works perfectly fine with it if you do what's described in my post (hope interested people find it Wink ), so I wouldn't consider it a priority over adding features and ensuring stability and security, yes.
blackrat
MD NewNew


Joined: May 06, 2008
Posts: 1

bannato
Posted: May 10, 2008 - 03:45 PM
Post subject: charset

Look in language\eng\global.php, then find and change this: define('_CHARSET',
Bonzo
MD user level 5


Joined: Sep 15, 2004
Posts: 57
Location: Rome - Italy
bannato
Posted: May 11, 2008 - 03:54 PM

@blackrat

Have you seen the date of the last post?
The last post was about two years ago, in August 2006!!!!!

Bye

_________________
Bonzo (aka Matteo Carletti)

www.isartegiovagnoli.com - Istituto Statale d'Arte di Sansepolcro e Anghiari
www.agriturismoilsasso.it || www.agriturismoanghiari.it
www.beccacciaiditalia.com
All times are GMT + 13 Hours