Replace All Vietnamese Letters with Their Plain English Equivalents

Convert Vietnamese Letters to Plain English

Vietnamese letters with diacritics can be tricky to handle when processing text in JavaScript. The function below converts them to their plain English (ASCII) counterparts using regular expressions.

Basic toPlainEnglish Function

This function uses the replace() method in JavaScript, along with regex patterns, to replace all Vietnamese letters (both lowercase and uppercase) with their English counterparts.

let toPlainEnglish = (str) => {
  // Lowercase letters
  str = str.replace(/à|á|ạ|ả|ã|â|ầ|ấ|ậ|ẩ|ẫ|ă|ằ|ắ|ặ|ẳ|ẵ/g, 'a')
  str = str.replace(/è|é|ẹ|ẻ|ẽ|ê|ề|ế|ệ|ể|ễ/g, 'e')
  str = str.replace(/ì|í|ị|ỉ|ĩ/g, 'i')
  str = str.replace(/ò|ó|ọ|ỏ|õ|ô|ồ|ố|ộ|ổ|ỗ|ơ|ờ|ớ|ợ|ở|ỡ/g, 'o')
  str = str.replace(/ù|ú|ụ|ủ|ũ|ư|ừ|ứ|ự|ử|ữ/g, 'u')
  str = str.replace(/ỳ|ý|ỵ|ỷ|ỹ/g, 'y')
  str = str.replace(/đ/g, 'd')
  // Uppercase letters
  str = str.replace(/À|Á|Ạ|Ả|Ã|Â|Ầ|Ấ|Ậ|Ẩ|Ẫ|Ă|Ằ|Ắ|Ặ|Ẳ|Ẵ/g, 'A')
  str = str.replace(/È|É|Ẹ|Ẻ|Ẽ|Ê|Ề|Ế|Ệ|Ể|Ễ/g, 'E')
  str = str.replace(/Ì|Í|Ị|Ỉ|Ĩ/g, 'I')
  str = str.replace(/Ò|Ó|Ọ|Ỏ|Õ|Ô|Ồ|Ố|Ộ|Ổ|Ỗ|Ơ|Ờ|Ớ|Ợ|Ở|Ỡ/g, 'O')
  str = str.replace(/Ù|Ú|Ụ|Ủ|Ũ|Ư|Ừ|Ứ|Ự|Ử|Ữ/g, 'U')
  str = str.replace(/Ỳ|Ý|Ỵ|Ỷ|Ỹ/g, 'Y')
  str = str.replace(/Đ/g, 'D')

  // OPTIONAL - Remove special characters
  // str = str.replace(/[^a-zA-Z0-9 \s]/g, "");

  return str
}

console.log(toPlainEnglish('Lấvkush Mấurya')) // => "Lavkush Maurya"

How It Works

  • Regex Matching: The function uses regex patterns to match each set of Vietnamese letters (including their diacritic marks) and replaces them with the corresponding English letter.
  • Case Sensitivity: The function handles both lowercase and uppercase letters.
  • Optional Special Character Removal: You can optionally strip every character that is not a letter, digit, or whitespace by uncommenting the replace() line just above the return statement.
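
To make the optional step concrete, here is a minimal sketch of that clean-up regex on its own. The helper name removeSpecialChars is hypothetical and not part of the original function. The second call suggests why this step belongs after the diacritics have been replaced: applied to raw Vietnamese text, it deletes the accented letters instead of converting them.

```javascript
// Hypothetical helper wrapping the optional clean-up regex from toPlainEnglish.
const removeSpecialChars = (str) => str.replace(/[^a-zA-Z0-9 \s]/g, '')

// On already-converted text, only punctuation is dropped.
console.log(removeSpecialChars('Xin chao, the gioi!')) // => "Xin chao the gioi"

// On raw text, accented letters fall outside a-z/A-Z, so they are deleted too.
console.log(removeSpecialChars('Việt Nam')) // => "Vit Nam"
```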

Usage Example

console.log(toPlainEnglish('Nguyễn Văn A')) // => "Nguyen Van A"
console.log(toPlainEnglish('Lấvkush Mấurya')) // => "Lavkush Maurya"

Applications

This function is particularly useful in scenarios such as:

  • URL Slug Generation: Creating SEO-friendly slugs from Vietnamese titles by removing diacritics.
  • Search Normalization: Ensuring that search results match regardless of whether the input includes Vietnamese accents or not.
  • Text Processing: Preparing text for systems that do not support diacritics.
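
The first two applications could be sketched roughly as follows. So that the snippet runs on its own, it uses a compact stand-in for toPlainEnglish built on String.prototype.normalize() rather than the regex list above; that substitution, and the helper names stripDiacritics, toSlug, and matchesQuery, are assumptions made for illustration.

```javascript
// Compact stand-in for toPlainEnglish (assumption: NFD decomposition,
// plus explicit handling of đ/Đ, which have no Unicode decomposition).
const stripDiacritics = (str) =>
  str
    .normalize('NFD')                // split letters from their combining marks
    .replace(/[\u0300-\u036f]/g, '') // drop the combining marks
    .replace(/đ/g, 'd')
    .replace(/Đ/g, 'D')

// Hypothetical slug helper: lowercase, hyphen-separated, ASCII only.
const toSlug = (title) =>
  stripDiacritics(title)
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse every other run of characters into one hyphen
    .replace(/^-+|-+$/g, '')     // trim leading and trailing hyphens

// Hypothetical search helper: accent- and case-insensitive substring match.
const matchesQuery = (text, query) =>
  stripDiacritics(text).toLowerCase().includes(stripDiacritics(query).toLowerCase())

console.log(toSlug('Hướng dẫn lập trình JavaScript'))   // => "huong-dan-lap-trinh-javascript"
console.log(matchesQuery('Nguyễn Văn A', 'nguyen van')) // => true
```

For search, the same normalization has to be applied to both the stored text and the query; otherwise an unaccented query will not match accented content.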

Further Improvements

  • Performance Optimization: For larger text processing, the function could be optimized by using a more efficient lookup mechanism like a mapping object.
  • Handling Special Characters: Depending on the use case, you can expand the function to handle other special characters or replace them with spaces or other symbols.
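
As a rough illustration of the mapping-object idea, the sketch below builds a lookup table once and then converts the string in a single replace() pass instead of fourteen. The names GROUPS, CHAR_MAP, VIETNAMESE_CHARS, and toPlainEnglishMapped are hypothetical, and the NFC normalization step is an added assumption so that decomposed input (a base letter followed by combining marks) is handled as well. Whether this is actually faster depends on the input, so it is worth measuring before switching.

```javascript
// Base letter -> its accented lowercase variants (same groups as the regex version).
const GROUPS = {
  a: 'àáạảãâầấậẩẫăằắặẳẵ',
  e: 'èéẹẻẽêềếệểễ',
  i: 'ìíịỉĩ',
  o: 'òóọỏõôồốộổỗơờớợởỡ',
  u: 'ùúụủũưừứựửữ',
  y: 'ỳýỵỷỹ',
  d: 'đ',
}

// Build a flat lookup table once: { 'à': 'a', 'À': 'A', ... }.
const CHAR_MAP = {}
for (const [base, variants] of Object.entries(GROUPS)) {
  // NFC keeps each variant as a single precomposed character.
  for (const ch of variants.normalize('NFC')) {
    CHAR_MAP[ch] = base
    CHAR_MAP[ch.toUpperCase()] = base.toUpperCase()
  }
}

// One character class covering every key, so the string is scanned only once.
const VIETNAMESE_CHARS = new RegExp(`[${Object.keys(CHAR_MAP).join('')}]`, 'g')

// Hypothetical single-pass variant of toPlainEnglish.
const toPlainEnglishMapped = (str) =>
  str.normalize('NFC').replace(VIETNAMESE_CHARS, (ch) => CHAR_MAP[ch])

console.log(toPlainEnglishMapped('Nguyễn Văn A')) // => "Nguyen Van A"
console.log(toPlainEnglishMapped('ĐƯỜNG'))        // => "DUONG"
```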

Conclusion

This small function replaces Vietnamese letters with their plain English counterparts, which simplifies text processing and helps when working with systems that do not support diacritics.

Happy coding!