File encoding and malformed strings
I am working with a text file that has a lot of malformed strings like:

    VyplÅ<88>te prosÃ­m pole "jmÃ©no"

My editor says that the file encoding is Latin 1. The string should be a Czech sentence that contains some diacritics, so no wonder it is wrong. I have tried forcing UTF-8 and Latin 2 encoding in my editor, but it did not help. I have also tried using iconv to convert the file from Latin 1 to UTF-8 or Latin 2; that did not help either. I often run into issues like this and I do not know any other solution than manually rewriting the strings. Is there a better way to fix this?
Here is the hex dump of the part where the malformed string is:

    0002640: 6a6d 656e 6f22 5d20 3d20 2744 453a 2056  jmeno"] = 'DE: V
    0002650: 7970 6cc5 8874 6520 7072 6f73 c3ad 6d20  ypl..te pros..m 
    0002660: 706f 6c65 2022 6a6d c3a9 6e6f 222e 273b  pole "jm..no".';

EDIT2: The sentence above is actually valid UTF-8, as the answer below points out. But I have just spotted something strange. If I try to transcode the file from UTF-8 to UTF-8 (with iconv), I get an error at the word Postgebühr, on the character ü. If I look at the hex dump, this character is encoded as \xfc (252 in decimal), which is a valid Latin 1 single-byte encoding of ü but a totally invalid UTF-8 byte sequence. It seems that part of the file is in Latin 1 and part in UTF-8.

Here is a part of the file which is (probably) in Latin 1:

    0000250: 506f 7374 6765 62fc 6872 273b 0a09 0963  Postgeb.hr';...c
    0000260: 6f6e 665b 2277 6166 6572 7322 5d20 3d20  onf["wafers"] = 
    0000270: 2744 453a 206f 706c c3a1 746b 20c3 273b  'DE: opl..tk .';

Looking at it more closely, it does not seem to be valid Latin 1 either, because read as Latin 1 it is a mess (DE: oplÃ¡tk Ã instead of DE: oplatky za), so this part of the file contains some damaged text. I do not understand how to work out the encoding of this file.
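One way to confirm the suspicion that the file mixes encodings is to list the byte positions where UTF-8 decoding fails. The Python sketch below is an illustration only: the function name `invalid_utf8_offsets` is hypothetical, Python 3.9 or later is assumed, and the sample bytes are the three rows of the second hex dump above.

```python
def invalid_utf8_offsets(data: bytes) -> list[int]:
    """Return the offsets of bytes at which strict UTF-8 decoding fails."""
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the data is valid UTF-8
        except UnicodeDecodeError as exc:
            # exc.start is relative to the slice, so add pos back.
            offsets.append(pos + exc.start)
            pos += exc.start + 1  # skip the bad byte and keep scanning
    return offsets

# The 48 bytes shown at offsets 0000250-000027f of the dump.
dump = bytes.fromhex(
    "506f7374676562fc6872273b0a090963"
    "6f6e665b22776166657273225d203d20"
    "2744453a206f706cc3a1746b20c3273b"
)

for off in invalid_utf8_offsets(dump):
    print(f"invalid UTF-8 at file offset {0x250 + off:#09x}: byte {dump[off]:#04x}")
```

On this sample the scan reports the lone 0xfc in Postgebühr and the dangling 0xc3 before the closing quote, which matches the two problems described above.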
If the file is supposed to contain Latin 2 encoded text, then trying to convert it from Latin 1 or similar is going to mess it up. The problem is that your text editor cannot recognize the encoding automatically, because the single-byte Latin * encodings all look equally valid at the byte level. If your editor tells you the "encoding is Latin 1", what that means is that the file is currently being interpreted as Latin 1, which is apparently wrong. You need to tell your editor to treat the file as Latin 2 (via something like Open ... Latin 2, or however your editor offers this option), or convert the file from Latin 2 to an encoding your editor handles correctly.
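The point about single-byte encodings can be illustrated with a short Python sketch. The sample sentence and the helper name `try_decode` are assumptions chosen for illustration; the codec names `iso8859_2`, `iso8859_1`, and `utf-8` are Python's standard names for Latin 2, Latin 1, and UTF-8.

```python
# Encode a Czech sentence as Latin 2 to get bytes like a Latin 2 file would hold.
text = 'Vyplňte prosím pole "jméno".'
raw = text.encode("iso8859_2")

def try_decode(data: bytes, encoding: str) -> str:
    """Decode data, or describe the failure instead of raising."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"<invalid {encoding}: {exc.reason} at byte {exc.start}>"

# Interpret the same bytes three different ways.
results = {enc: try_decode(raw, enc) for enc in ("iso8859_2", "iso8859_1", "utf-8")}
for enc, decoded in results.items():
    print(f"{enc:10} {decoded}")
```

Both single-byte decodes succeed without any error, and only the Latin 2 one gives back the original sentence. The bytes alone do not say which reading is right, which is why an editor can only guess. UTF-8 is different: most single-byte text with accented letters is not a valid UTF-8 sequence, so a strict decoder rejects it.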
To better understand the encoding, I recommend you to read.
In response to your posted hex dump: This file is UTF-8 encoded.
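If a mostly UTF-8 file also contains a few stray Latin 1 lines, as the question's later edit suggests, one possible repair is to decode line by line and fall back to a single-byte encoding only when UTF-8 fails. This is a minimal Python sketch under that assumption. The function name `decode_mixed`, the choice of Latin 1 as the fallback, and the sample bytes are all illustrative.

```python
def decode_mixed(data: bytes, fallback: str = "iso8859_1") -> str:
    """Decode each line as UTF-8, falling back to a single-byte encoding."""
    lines = []
    for line in data.splitlines(keepends=True):
        try:
            lines.append(line.decode("utf-8"))
        except UnicodeDecodeError:
            # Assumption: any line that is not valid UTF-8 is in the fallback encoding.
            lines.append(line.decode(fallback))
    return "".join(lines)

# First line is UTF-8, second line holds a Latin 1 byte (0xfc).
sample = b"'DE: Vypl\xc5\x88te pros\xc3\xadm';\n'DE: Postgeb\xfchr';\n"
repaired = decode_mixed(sample)
print(repaired)

# The repaired text can now be written back out as clean UTF-8.
clean = repaired.encode("utf-8")
```

This only helps for lines that are entirely in one encoding. A line that mixes both, or that is truncated mid-character like the damaged one in the question, will still come out garbled and has to be fixed by hand.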