SVN Error: Can't convert string from native encoding to 'UTF-8'
I’ve got a post-commit hook script that performs an SVN update of a working copy when commits are made to the repository.
When users commit to the repository from their Windows machines using TortoiseSVN they get the following error:
post-commit hook failed (exit code 1) with output:
svn: Error converting entry in directory '/home/websites/devel/website/guides/Images' to UTF-8
svn: Can't convert string from native encoding to 'UTF-8':
svn: Teneriffa-S?\195?\188d.jpg
The file in question above is:
Teneriffa-Süd.jpg (note the u with an umlaut). The site is German, so the files have German names.
When executing an update on the working copy at the Linux command line, no errors are encountered. The above error only occurs when the post-commit hook is triggered by a commit from a Windows SVN client.
- Why would SVN try to change the encoding of a file?
- Are filenames allowed to contain chars that are outside the Windows standard ASCII ones?
It turns out that the file in question’s filename correctly displays as
Teneriffa-Süd.jpg when viewed from a Windows machine (via Samba) but when I view the filename from the Linux server (using SSH and PuTTY) where the file resides I get
11 solutions collected from the web for “SVN Error: Can't convert string from native encoding to 'UTF-8'”
- It does not change the encoding of the file. It changes the encoding of the filename (to something that every client can hopefully understand).
- Allowed by whom? NTFS stores file names as 16-bit (UTF-16) code units, and Windows can expose the file names in various encodings, depending on how you ask for them (it will try to convert them to the encoding you ask for). How you ask depends on the specific svn client you use. It sounds to me like a bug in TortoiseSVN.
Edit to add:
Ugh. I misunderstood the symptoms. The svn server stores everything in UTF-8 (and it seems that it did that successfully).
The post-commit hook is the bit that fails to convert from UTF-8. If I understand what you’re saying correctly, the post-commit hook on the server triggers an svn update to a shared drive (so the svn server starts an svn client against itself)? This means that the configuration that needs to be fixed is the one for the client on the server.
Check the LANG / LC_ALL settings in the environment that runs the svn server. As it happens, hooks are run in an empty environment (see Tip), so you should set the variable in the hook itself.
See also this page for info on how svn handles localisation
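To make the advice above concrete, here is a minimal sketch of a post-commit hook that sets the locale itself before updating the working copy. The locale name `en_US.UTF-8`, the `svn` path, and the working-copy path are assumptions for illustration, not values from the original setup; substitute ones that exist on your server.

```shell
#!/bin/sh
# Sketch of a post-commit hook; Subversion calls it as: post-commit REPOS-PATH REV

# Hooks start with an empty environment, so set the locale here.
# Assumption: en_US.UTF-8 is installed on this server (check `locale -a`).
LC_ALL=en_US.UTF-8
export LC_ALL

SVN=/usr/bin/svn                     # assumption: PATH is empty too, so use a full path
WORKING_COPY=/path/to/working-copy   # hypothetical path, replace with your own

# Update the working copy given as $1 without prompting.
update_working_copy() {
    "$SVN" update --non-interactive "$1"
}

# Run the update only when invoked as a hook (with its REPOS and REV arguments).
if [ "$#" -ge 2 ]; then
    update_working_copy "$WORKING_COPY"
fi
```

Setting `LC_ALL` inside the hook keeps the fix independent of whatever environment the server process happens to have.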
Yet another example:
$ svn update
svn: Error converting entry in directory '.' to UTF-8
svn: Can't convert string from native encoding to 'UTF-8':
$ export LC_CTYPE=en_US.UTF-8
$ svn update
(… and all is fine now)
If the error is:
[abc@288832-web3 public_html]$ svn update
svn: Error converting entry in directory 'images' to UTF-8
svn: Valid UTF-8 data (hex: 46 65 6e 65 72 62 61 68) followed by invalid UTF-8 sequence (hex: e7 65 2b 46)
Then do this.
[abc@288832-web3 public_html]$ printf "\x46\x65\x6e\x65\x72\x62\x61\x68\n"
Fenerbah
(This means that the system has some file name starting with “Fenerbah” in that folder.)
[abc@288832-web3 public_html]$ cd images
[abc@288832-web3 images]$ rm -rf Fenerbahçe+Forma+2.jpg
So you can see that the name contains a special character (ç) that is stored as the single byte e7, which is not valid UTF-8, so SVN could not convert it.
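As an alternative to decoding the hex bytes by hand, the following sketch lists the entries in a directory whose names are not valid UTF-8. It relies on `iconv` exiting with a non-zero status when its input is not valid in the source encoding; the function names are made up for this example.

```shell
# Succeeds if the argument is a valid UTF-8 byte sequence.
is_valid_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Print the entries of directory $1 (default: current directory)
# whose names are not valid UTF-8. Hidden files are skipped by the glob.
list_non_utf8_names() {
    for path in "${1:-.}"/*; do
        [ -e "$path" ] || continue      # glob matched nothing
        name=${path##*/}                # strip the directory part
        is_valid_utf8 "$name" || printf '%s\n' "$name"
    done
}
```

Running `list_non_utf8_names images` would point at the offending file without deleting anything. If the name turns out to be Latin-1 (an assumption; the byte e7 is ç in ISO-8859-1), `printf 'Fenerbah\347e' | iconv -f ISO-8859-1 -t UTF-8` shows what it should look like in UTF-8, so the file can be renamed instead of removed.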
Put this in your post-commit hook:
export LANG=xxxxx (your lang)
Don’t forget to generate those locales on your system.
Example for Russian:
locale-gen ru_RU.CP1251
locale-gen ru_RU.UTF-8
dpkg-reconfigure locales
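Before exporting a locale in the hook, it may help to check that the system actually has it. The sketch below compares a requested name against the output of `locale -a`; it normalizes both sides because glibc systems typically list `en_US.utf8` rather than `en_US.UTF-8`. The helper names are hypothetical.

```shell
# Reduce a locale name to a comparable form: lowercase, no hyphens,
# so that "en_US.UTF-8" and "en_US.utf8" compare equal.
normalize_locale_name() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g'
}

# Succeeds if the named locale appears in the output of `locale -a`.
has_locale() {
    wanted=$(normalize_locale_name "$1")
    locale -a 2>/dev/null \
        | tr '[:upper:]' '[:lower:]' \
        | sed 's/-//g' \
        | grep -qxF "$wanted"
}
```

For example, `has_locale ru_RU.UTF-8 || echo "generate it first"` reports a missing locale before the hook ever runs.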
It changes the encoding to a location-neutral encoding in case someone with a different encoding checks it out.
Of course. But it’s not “Windows” ASCII (Windows actually uses a legacy code page, such as CP1252 on Western European systems or CP1251 on Russian ones).
The best way to fix this is to make sure that your system uses UTF-8 whenever possible (check
Just use the following line in your script before executing any svn command.
Use the appropriate language code; in the following example I used Japanese.
It seems that all LC_ variables need .UTF-8 at the end. For example, I happened to have LC_ALL, LC_TIME, and LC_CTYPE defined. After setting LC_CTYPE the problem was not solved, so I needed to set LC_ALL as well, and then it worked:
LC_ALL=en_US.UTF-8
LC_TIME=en_DK.UTF-8
LC_CTYPE=en_US.UTF-8
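The behaviour described above is consistent with the standard POSIX lookup order: `LC_ALL`, when set and non-empty, overrides every individual `LC_*` variable, and those in turn override `LANG`. That likely explains why changing only `LC_CTYPE` had no effect. A small sketch (the function name is made up for this example) that reports which value wins for character encoding:

```shell
# Print the locale that applies to character handling (LC_CTYPE),
# following the POSIX precedence: LC_ALL, then LC_CTYPE, then LANG.
# Empty variables are treated as unset.
effective_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf 'C\n'    # default when nothing is set
    fi
}
```

If this prints something without a UTF-8 suffix while svn is complaining, the variable it came from is the one to change.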
To avoid the problem in future, I copied the file to a different name, removed the old one from svn, added the new one to svn, and sent a message to my collaborator asking them not to do this again.
I got a similar problem when running “svn add” on a directory, but the solution was different. I couldn’t see the “hex” digits using printf (actually no hex output was shown by svn), but this command allowed me to see the results, and fix it:
LC_ALL=C svn add probealign
I think, in general, sticking LC_ALL=C before your command allows you to see the offending files… and is a lot easier than pasting in a lot of \x72 stuff (which apparently may not be available).
In my case, I had the following setting in ~/.subversion/config:
log-encoding = ...
Commenting it out worked.