A while back we built a website that had a Chinese language version, and we had a few issues with URL segments not looking very nice. It took me a while to figure out that what I was looking for is called transliteration. I implemented a really hacky way of modifying the URL segment that EPiServer produces, so that I could inject my transliterated page name instead of the page name in Chinese.
Now in CMS 10, the old UrlSegment class has been removed, and instead we have IUrlSegmentGenerator, IUrlSegmentCreator and IUrlSegmentLocator. You can read more about that in the release note CMS-3824.
The default implementations of these interfaces are all internal, so it's still a bit hacky to extend them. But I've implemented a transliterating UrlSegmentGenerator and swapped out the default implementation, so you don't have to.
Let's say we have a page named "伤寒论 勘误" (I don't know what that means, it's just some Chinese text that I copied). The default UrlSegmentGenerator would produce the URL segment "-": everything except alphanumeric characters is stripped out, so the only thing that remains is the whitespace character in the name, which becomes a hyphen.
Using transliteration, the Chinese characters are converted to their alphanumeric equivalents, so the same input string "伤寒论 勘误" is converted to "Shang Han Lun Kan Wu", and the transliterating UrlSegmentGenerator then produces the URL segment "shang-han-lun-kan-wu".
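To make the difference concrete, here is a minimal Python sketch of the two behaviours described above. It is not the EPiServer code or the code in the package: `slugify` is a simplified stand-in for the default generator as I've described it, and `PINYIN` is a hypothetical hand-written table that only covers the characters in this example (a real transliterator would use a full mapping).

```python
import re

# Hypothetical lookup table, covering only the characters used in the example.
PINYIN = {"伤": "Shang", "寒": "Han", "论": "Lun", "勘": "Kan", "误": "Wu"}


def transliterate(text):
    """Replace each known character with its romanized form, space separated."""
    parts = []
    for ch in text:
        if ch in PINYIN:
            parts.append(" " + PINYIN[ch] + " ")
        else:
            parts.append(ch)
    # Collapse the extra spaces introduced above into single spaces.
    return " ".join("".join(parts).split())


def slugify(text):
    """Simplified model of the default behaviour: drop everything that is not
    an ASCII letter, digit or whitespace, then turn whitespace into hyphens."""
    kept = re.sub(r"[^A-Za-z0-9\s]", "", text)
    return re.sub(r"\s+", "-", kept).lower()


name = "伤寒论 勘误"
print(slugify(name))                  # -
print(transliterate(name))            # Shang Han Lun Kan Wu
print(slugify(transliterate(name)))   # shang-han-lun-kan-wu
```

The point is only the ordering: transliterate first, then apply the usual segment clean-up, so there is something left to build the segment from.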
Granted, I don't know Chinese, so I can't verify that this is 100% correct. But I do know that "shang-han-lun-kan-wu" is a better representation than "-", since three pages in Chinese, in the same location, would get the URL segments "-", "-1" and "-2" using the default generator.
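The collision case can be sketched in the same way. This is an assumption about the mechanism, inferred only from the "-", "-1", "-2" example above: `unique_segment` is a hypothetical helper that appends the first free counter to a segment that is already taken among sibling pages. The real EPiServer logic may differ in its details.

```python
def unique_segment(segment, taken):
    """Return the segment unchanged if it is free, otherwise append the
    first counter (1, 2, ...) that makes it unique among the taken ones."""
    if segment not in taken:
        return segment
    n = 1
    while f"{segment}{n}" in taken:
        n += 1
    return f"{segment}{n}"


# Three Chinese-named sibling pages that all collapse to "-" by default.
taken = set()
segments = []
for _ in range(3):
    seg = unique_segment("-", taken)
    taken.add(seg)
    segments.append(seg)

print(segments)  # ['-', '-1', '-2']
```

With transliterated names, the three pages would normally get three different, readable segments, and the counter would only kick in for pages that really share a name.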
This approach should work for all languages, not just Chinese, but you'll have to test it for yourself. If you find any bugs, please let us know by sending a pull request.
The code is available at https://github.com/creunaab/EPi.UrlTransliterator, and a package with the same name should be available in the EPiServer NuGet feed shortly.