PostgreSQL Non-Latin Characters
I've been working on a project that requires me to create a simple search form using PHP and PostgreSQL, importing the data from XML files. The tricky part is that the XML files must contain Latin, Cyrillic, Korean and Japanese characters. I figured that if I used UTF-8 encoding for both the XML/HTML pages and the database, everything would work just fine. The non-Latin characters look garbled when I view them directly in the database, but they display correctly on the page.
The problem comes with the searching. When I search for an English title, or anything in Latin characters, it works just fine, but when I enter a Cyrillic/Japanese/Korean search string, I get no results whatsoever. Any idea why that is happening and how I can fix it?
I had already enforced UTF-8 on the search page and thought that would be enough, but converting the string in PHP actually did the trick. Thanks.
Edit: It appears this isn't over yet, and things got even weirder. It now works with Japanese and Korean as well, but for Cyrillic it works only if I copy the word directly from the XML file. If I type it on the keyboard, I get no results. This doesn't make any sense to me, and I'm even more confused now. Any idea how to solve that?
P.S. Here's the title copied from the XML: Eндивaл Дoмът нa Хaoсa
Here's the same title typed using the keyboard: Ендивал Домът на Хаоса
I get the same result when echoing them: copied one - Eндивaл Дoмът нa Хaoсa, typed one - Ендивал Домът на Хаоса. I tried comparing them online using http://www.textdiff.com/, and the result is that they are 100% different. I'm not really sure why that is. Even if they use different encodings (like windows-1251 and UTF-8), I convert both of them to UTF-8 before searching, so that shouldn't be a problem. I'm really at a loss here.
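When two strings look identical on screen but do not match in a search, one way to find the difference is to list the Unicode code point and name of every character. The sketch below does that in Python purely for illustration (the project itself uses PHP). The helper names `describe` and `scripts_used` are made up for this example, and the two sample words are hypothetical stand-ins built from escape sequences, not the actual database contents. It assumes one possible cause that this thread does not confirm: Latin look-alike letters (such as Latin "E" or "a") mixed into an otherwise Cyrillic word.

```python
import unicodedata


def describe(text):
    """Return one (character, code point, Unicode name) tuple per character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
    ]


def scripts_used(text):
    """Return the set of script names found among the letters of `text`.

    The script is approximated by the first word of each character's
    Unicode name, e.g. "LATIN" or "CYRILLIC".
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()
    }


# Hypothetical sample: Latin "E" (U+0045) and Latin "a" (U+0061)
# mixed into an otherwise Cyrillic word.
mixed = "\u0045\u043D\u0434\u0438\u0432\u0061\u043B"

# The same word using only Cyrillic letters (U+0415 and U+0430 instead).
typed = "\u0415\u043D\u0434\u0438\u0432\u0430\u043B"

print(mixed == typed)               # False
print(sorted(scripts_used(mixed)))  # ['CYRILLIC', 'LATIN']
print(sorted(scripts_used(typed)))  # ['CYRILLIC']

# Show each character that differs between the two words.
for a, b in zip(describe(mixed), describe(typed)):
    if a != b:
        print(a[1], a[2], "vs", b[1], b[2])
```

If a check like this reports more than one script inside a single word, converting the string between encodings will not help, because both versions are already valid UTF-8. They simply contain different characters, so the stored text has to be retyped or corrected.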
SOLVED: In the end it was just bad luck, I guess. When trying Cyrillic I was always searching for the first entry, and I didn't try the others because I figured they wouldn't work either. It turned out they did, and the first one was the only one that wasn't working. Thinking back, I realized I had typed all of the others by hand, but for this one I was lazy and just copied the title from another site, which was apparently using a different encoding. I was converting it to UTF-8 anyway, but I guess that didn't work properly. It doesn't matter now: I updated the XML entry by typing it myself, and everything is OK.