php - Replace invalid UTF-8, not remove
Evening,
I have HTML files that I am cleaning in PHP. They contain some invalid Unicode characters, which appear in my text editor like:
\B7
I want to replace them with either their proper character or a replacement character of my choice. For example, the \B7 character is a middot, but I want to replace it with a full stop.
The function I am currently using removes invalid characters, but I cannot get it to do anything else with them, such as re-encoding them properly.
Your file is Windows-1252 (where 0xB7 decodes to ·), and gedit is decoding it as UTF-8 and showing the invalid UTF-8 bytes (0xB7 is invalid outside a specific multi-byte sequence in UTF-8) directly as their value. Text editors allow you to specify the encoding, which you should always do before using them, either in the 'save as' dialog or in some configuration setting.
You can fix the file in many ways. In PHP, you can do the following, which decodes the file as Windows-1252 and encodes it as UTF-8:
<?php
$file_contents = file_get_contents("brokenfile.txt");
$file_contents = mb_convert_encoding($file_contents, "UTF-8", "Windows-1252");
file_put_contents("brokenfile.txt", $file_contents);
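If some of your files might already be UTF-8, running that conversion on them would double-encode them. Here is a minimal sketch that guards against this and also swaps the middot for a full stop, as the question asks. It assumes that anything which is not valid UTF-8 is Windows-1252, and the function name `to_utf8_with_full_stops` is hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper (not from the original answer): convert to UTF-8 only
// when the input is not already valid UTF-8, then replace the middle dot.
function to_utf8_with_full_stops(string $bytes): string
{
    // Assumption: input that fails UTF-8 validation is Windows-1252.
    if (!mb_check_encoding($bytes, 'UTF-8')) {
        $bytes = mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252');
    }
    // U+00B7 MIDDLE DOT is the byte pair C2 B7 in UTF-8; str_replace is
    // byte-safe, so it can be used on the converted string directly.
    return str_replace("\u{00B7}", '.', $bytes);
}

// A lone 0xB7 byte is invalid UTF-8, so it gets converted and then replaced.
$fixed = to_utf8_with_full_stops("one \xB7 two");
echo $fixed, "\n"; // one . two
```

Short Windows-1252 strings can occasionally pass UTF-8 validation by coincidence, so treat the check as a heuristic rather than a guarantee.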
If you see garbled sequences such as Â· on your website after this conversion, it means that you are telling browsers that your content is Windows-1252 or ISO-8859-1, etc. You should tell browsers that your content is in UTF-8:
header("Content-Type: text/html; charset=utf-8");
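To see where that kind of garbling comes from, here is a small sketch that simulates the misreading. It takes the correct UTF-8 bytes for the middot and decodes them as if they were Windows-1252, which is roughly what a browser does when the declared charset is wrong. The variable names are only illustrative.

```php
<?php
// The middle dot as correct UTF-8: the two bytes C2 B7.
$utf8 = "\u{00B7}";

// Simulate a reader that wrongly treats those bytes as Windows-1252:
// each byte becomes its own character (0xC2 is Â, 0xB7 is ·).
$misread = mb_convert_encoding($utf8, 'UTF-8', 'Windows-1252');

echo bin2hex($utf8), "\n";    // c2b7
echo bin2hex($misread), "\n"; // c382c2b7, which displays as "Â·"
```

With the header above declaring UTF-8, the browser reads C2 B7 as a single middot instead.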