Replace all Vietnamese letters with their corresponding plain English letters
Convert Vietnamese Letters to Plain English
Vietnamese letters with diacritics can be tricky to handle when processing text in JavaScript. The toPlainEnglish function below uses regex to convert every Vietnamese letter to its plain English (unaccented) counterpart.
toPlainEnglish Function
This function uses the replace() method in JavaScript, along with regex patterns, to replace all Vietnamese letters (both lowercase and uppercase) with their English counterparts.
const toPlainEnglish = (str) => {
  // Lowercase letters
  str = str.replace(/à|á|ạ|ả|ã|â|ầ|ấ|ậ|ẩ|ẫ|ă|ằ|ắ|ặ|ẳ|ẵ/g, 'a')
  str = str.replace(/è|é|ẹ|ẻ|ẽ|ê|ề|ế|ệ|ể|ễ/g, 'e')
  str = str.replace(/ì|í|ị|ỉ|ĩ/g, 'i')
  str = str.replace(/ò|ó|ọ|ỏ|õ|ô|ồ|ố|ộ|ổ|ỗ|ơ|ờ|ớ|ợ|ở|ỡ/g, 'o')
  str = str.replace(/ù|ú|ụ|ủ|ũ|ư|ừ|ứ|ự|ử|ữ/g, 'u')
  str = str.replace(/ỳ|ý|ỵ|ỷ|ỹ/g, 'y')
  str = str.replace(/đ/g, 'd')
  // Uppercase letters
  str = str.replace(/À|Á|Ạ|Ả|Ã|Â|Ầ|Ấ|Ậ|Ẩ|Ẫ|Ă|Ằ|Ắ|Ặ|Ẳ|Ẵ/g, 'A')
  str = str.replace(/È|É|Ẹ|Ẻ|Ẽ|Ê|Ề|Ế|Ệ|Ể|Ễ/g, 'E')
  str = str.replace(/Ì|Í|Ị|Ỉ|Ĩ/g, 'I')
  str = str.replace(/Ò|Ó|Ọ|Ỏ|Õ|Ô|Ồ|Ố|Ộ|Ổ|Ỗ|Ơ|Ờ|Ớ|Ợ|Ở|Ỡ/g, 'O')
  str = str.replace(/Ù|Ú|Ụ|Ủ|Ũ|Ư|Ừ|Ứ|Ự|Ử|Ữ/g, 'U')
  str = str.replace(/Ỳ|Ý|Ỵ|Ỷ|Ỹ/g, 'Y')
  str = str.replace(/Đ/g, 'D')
  // OPTIONAL - Remove special characters (keeps ASCII letters, digits, and whitespace)
  // str = str.replace(/[^a-zA-Z0-9\s]/g, '')
  return str
}
console.log(toPlainEnglish('Lấvkush Mấurya')) // => "Lavkush Maurya"
How It Works
- Regex Matching: The function uses regex patterns to match each set of Vietnamese letters (including their diacritic marks) and replaces them with the corresponding English letter.
- Case Sensitivity: The function handles lowercase and uppercase letters with separate patterns, so the case of each letter is preserved.
- Optional Special Character Removal: You can optionally remove every character that is not an ASCII letter, digit, or whitespace by uncommenting the commented-out replace() line just before the return statement.
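To make the optional cleanup step concrete, here is a minimal sketch that pulls it out into its own helper. The name stripSpecialChars is hypothetical and not part of the original function; the pattern is the one from the commented-out line, minus a redundant space.

```javascript
// Hypothetical helper mirroring the optional line in toPlainEnglish.
// Keeps ASCII letters, digits, and whitespace; drops everything else.
const stripSpecialChars = (str) => str.replace(/[^a-zA-Z0-9\s]/g, '')

console.log(stripSpecialChars('Xin chao, the gioi!')) // => "Xin chao the gioi"

// Order matters: on text that has not been converted yet, the accented
// letters are not in a-z/A-Z, so they are dropped along with the punctuation.
console.log(stripSpecialChars('Đà Nẵng'))
```

This is why the cleanup line sits after the letter replacements inside the function: run first, it would delete the Vietnamese letters instead of converting them.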
Usage Example
console.log(toPlainEnglish('Nguyễn Văn A')) // => "Nguyen Van A"
console.log(toPlainEnglish('Lấvkush Mấurya')) // => "Lavkush Maurya"
Applications
This function is particularly useful in scenarios such as:
- URL Slug Generation: Creating SEO-friendly slugs from Vietnamese titles by removing diacritics.
- Search Normalization: Ensuring that search results match regardless of whether the input includes Vietnamese accents or not.
- Text Processing: Preparing text for systems that do not support diacritics.
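The first two applications can be sketched as follows. So that the snippet runs on its own, it uses a compact stand-in called stripDiacritics, built on Unicode normalization rather than the regex list above; toSlug and matches are hypothetical helper names chosen for illustration. Note that the stand-in also strips accents from non-Vietnamese letters, which is an assumption you may or may not want.

```javascript
// Compact stand-in for toPlainEnglish: NFD splits each accented letter into
// a base letter plus combining marks, and the next step removes those marks.
// đ/Đ do not decompose, so they are replaced explicitly.
const stripDiacritics = (str) =>
  str
    .normalize('NFD')
    .replace(/[\u0300-\u036f]/g, '')
    .replace(/đ/g, 'd')
    .replace(/Đ/g, 'D')

// Hypothetical helper: lowercase, hyphen-separated slug.
const toSlug = (title) =>
  stripDiacritics(title)
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // any run of other characters becomes one hyphen
    .replace(/^-+|-+$/g, '') // trim leading and trailing hyphens

// Hypothetical helper: accent-insensitive, case-insensitive substring match.
const matches = (haystack, query) =>
  stripDiacritics(haystack).toLowerCase().includes(stripDiacritics(query).toLowerCase())

console.log(toSlug('Hướng dẫn lập trình JavaScript!')) // => "huong-dan-lap-trinh-javascript"
console.log(matches('Phở bò Hà Nội', 'pho bo')) // => true
```

For search, the key point is to normalize both the stored text and the query the same way before comparing them.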
Further Improvements
- Performance Optimization: For larger inputs, the fourteen separate replace() passes could be collapsed into a single pass backed by a more efficient lookup mechanism such as a mapping object.
- Handling Special Characters: Depending on the use case, you can expand the function to handle other special characters or replace them with spaces or other symbols.
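One possible shape for the mapping-based variant is sketched below. It is an illustration, not a benchmarked replacement: the names GROUPS, CHAR_MAP, and toPlainEnglishMapped are hypothetical, and like the original it assumes the input uses precomposed letters.

```javascript
// Each key is a plain letter; each value lists the Vietnamese variants that map to it.
const GROUPS = {
  a: 'àáạảãâầấậẩẫăằắặẳẵ',
  e: 'èéẹẻẽêềếệểễ',
  i: 'ìíịỉĩ',
  o: 'òóọỏõôồốộổỗơờớợởỡ',
  u: 'ùúụủũưừứựửữ',
  y: 'ỳýỵỷỹ',
  d: 'đ',
}

// Build a flat lookup table once, covering lowercase and uppercase.
// normalize('NFC') guards against this source file being saved in decomposed form.
const CHAR_MAP = new Map()
for (const [plain, variants] of Object.entries(GROUPS)) {
  for (const ch of variants.normalize('NFC')) {
    CHAR_MAP.set(ch, plain)
    CHAR_MAP.set(ch.toUpperCase(), plain.toUpperCase())
  }
}

// Single pass: every non-ASCII character is looked up; unknown ones are kept as-is.
const toPlainEnglishMapped = (str) =>
  str.replace(/[^\u0000-\u007f]/g, (ch) => CHAR_MAP.get(ch) ?? ch)

console.log(toPlainEnglishMapped('Nguyễn Văn A')) // => "Nguyen Van A"
```

Deriving the uppercase entries with toUpperCase() keeps the table half the size of the regex list, and adding another character only means extending one string in GROUPS.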
Conclusion
This simple yet effective function makes it easy to replace Vietnamese letters with plain English counterparts, enhancing text processing and ensuring compatibility with various systems.
Happy coding!