php - Replace invalid UTF-8 characters, not remove them -


Evening,

I have HTML files that I am cleaning up. They contain some invalid Unicode characters, which appear in my text editor like this:

\xB7

I want to replace them with either the character they stand for or a replacement character of my choice. For example, the \xB7 character is a middle dot, but I want to replace it with a full stop.

Function here:

It removes the invalid characters, but I do not know enough about encodings to make it do anything else.

Your file is in Windows-1252 (where 0xB7 decodes to ·), and gedit is decoding it as UTF-8. I think it displays invalid UTF-8 bytes directly as their hex value (0xB7 is invalid in UTF-8 outside a multi-byte sequence). You can fix the file in many ways; in PHP, you can do the following:

    <?php
    // Read the raw bytes of the Windows-1252 file.
    $file_contents = file_get_contents("brokenfile.txt");
    // Decode them as Windows-1252 and re-encode as UTF-8.
    $file_contents = mb_convert_encoding($file_contents, "UTF-8", "Windows-1252");
    file_put_contents("brokenfile.txt", $file_contents);

The above script will decode the file as Windows-1252 and encode it as UTF-8.
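
One caveat: running that conversion twice on the same file would double-encode it. Below is a minimal sketch of a guarded version, assuming the input is either already valid UTF-8 or Windows-1252 text. The function name `cp1252_to_utf8` and its `$replace_middle_dot` parameter are made up for illustration; the optional replacement covers the asker's wish to turn the middle dot into a full stop.

```php
<?php
// Hypothetical helper: not part of PHP, named here for illustration only.
// Assumes $bytes is either valid UTF-8 already or Windows-1252 text.
function cp1252_to_utf8(string $bytes, bool $replace_middle_dot = false): string
{
    // Skip the conversion if the input already decodes as UTF-8,
    // so that running this twice does not double-encode the text.
    if (!mb_check_encoding($bytes, "UTF-8")) {
        $bytes = mb_convert_encoding($bytes, "UTF-8", "Windows-1252");
    }
    if ($replace_middle_dot) {
        // U+00B7 MIDDLE DOT is the two bytes C2 B7 in UTF-8.
        $bytes = str_replace("\xC2\xB7", ".", $bytes);
    }
    return $bytes;
}

// 0xB7 on its own is not valid UTF-8, so it is treated as Windows-1252.
$fixed = cp1252_to_utf8("a\xB7b", true);
```

The validity check is only a heuristic: a Windows-1252 file whose bytes happen to form valid UTF-8 would be left untouched, so it is worth spot-checking the result.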

Text editors also let you specify the encoding, either in the 'Save as' dialog or in a configuration setting. You should always configure your editor's encoding before using it.

If you see Â· on your website after this conversion, it means that you are telling browsers that your content is in Windows-1252, ISO-8859-1 or similar. You need to tell browsers that your content is in UTF-8:

    header("Content-Type: text/html; charset=utf-8");
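
To see where that kind of garbage comes from, here is a small sketch that reproduces the mistake on purpose. It takes the correct UTF-8 bytes for the middle dot and decodes them as Windows-1252, which is roughly what a browser does when it is told the wrong charset. The variable names are illustrative only.

```php
<?php
// The middle dot (U+00B7) correctly encoded as UTF-8: bytes C2 B7.
$utf8_dot = "\xC2\xB7";

// Simulate a reader that wrongly treats those bytes as Windows-1252.
// 0xC2 decodes to "Â" and 0xB7 decodes to "·", giving two characters.
$misread = mb_convert_encoding($utf8_dot, "UTF-8", "Windows-1252");

// One character has turned into two: "Â" followed by "·".
$misread_length = mb_strlen($misread, "UTF-8");
```

If the page shows the two-character form, the bytes on disk are fine and only the declared charset needs fixing.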
