How to do language-sensitive string comparisons with vanilla JavaScript
Today, we’re going to look at a few ways to check if two strings are equal, including how to handle strings that may or may not have character accents.
Let’s dig in!
The equality operators
The most straightforward way to compare strings is with the equality operators (== and ===).
The equals operator (==) checks if two items have the same value after converting them to a common type, while the strict equals operator (===) checks if they have the same value and are the same type.
// returns true
// They have the same value
let ex1 = 42 == '42';
// returns false
// They have the same value but not the same type
let ex2 = 42 === '42';
// returns true
// They have the same value and the same type
let ex3 = '42' === '42';
This works great for most string comparisons.
But what happens when you have a string that may or may not have special characters in it?
For example, “resume” is sometimes written as “resumé” or “résumé”. Depending on what your app is doing, you might consider them to all be the same word.
But the equality operators would return false when comparing one to the other.
// returns false
let ex4 = 'resume' === 'résumé';
The Intl.Collator object
The Intl API is a collection of language-aware functions and objects. One of those is the Intl.Collator API, which can conduct language-sensitive string comparisons.
To use it, you first create a new Intl.Collator() object.
It accepts a locale as an argument. There are a variety of acceptable formats for the locale:
- A two-letter language code. For example, en for English.
- A tag and subtag. For example, en-US for United States English or en-GB for Great Britain English.
- Multiple subtags. For example, de-CH-1996 for the modern Swiss variant of German.
You can find a full list of tags and subtags on the IANA Language Subtag Registry.
// Create a Collator object
let collator = new Intl.Collator('en');
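To get a feel for which locale strings are acceptable, here's a rough sketch. It assumes your browser or runtime supports Intl.getCanonicalLocales(), and describeLocale() is a made-up helper name used only for illustration.

```javascript
// Hypothetical helper: returns the canonical form of a locale tag,
// or null if the tag is not structurally valid
function describeLocale (tag) {
	try {
		return Intl.getCanonicalLocales(tag)[0];
	} catch (error) {
		// Malformed tags throw a RangeError
		return null;
	}
}

// returns 'en'
let english = describeLocale('en');

// returns 'en-US'
// The casing gets cleaned up for you
let american = describeLocale('en-us');

// returns 'de-CH-1996'
let swiss = describeLocale('de-CH-1996');

// returns null
// Underscores aren't allowed in a locale tag
let invalid = describeLocale('en_US');
```

Passing a malformed tag like en_US into new Intl.Collator() throws the same kind of RangeError, so a check like this can be handy if the locale comes from user input.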
It also accepts an object of options as a second argument.
There are a handful of options you can provide, but for our purposes, sensitivity is particularly useful. It tells the Intl.Collator object which characters to treat as different and which to ignore.
- base ignores accents and capitalization (a = á, a = A)
- accent ignores case only (a ≠ á, a = A)
- case ignores accents only (a = á, a ≠ A)
For our purposes, let’s ignore accents and case, and use a value of base.
// Create a Collator object
let collator = new Intl.Collator('en', {sensitivity: "base"});
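If you want to see how the sensitivity values differ, here's a small sketch. It relies on the compare() method returning 0 when two strings are treated as equal, and matches() is a hypothetical helper, not part of the Intl API. It also includes variant, a fourth sensitivity value that treats both accents and case as different.

```javascript
// Hypothetical helper: do two strings match under a given sensitivity?
function matches (a, b, sensitivity) {
	let collator = new Intl.Collator('en', {sensitivity: sensitivity});
	return collator.compare(a, b) === 0;
}

// For each sensitivity, check [a vs. á, a vs. A]
let results = {

	// [true, true]
	base: [matches('a', 'á', 'base'), matches('a', 'A', 'base')],

	// [false, true]
	accent: [matches('a', 'á', 'accent'), matches('a', 'A', 'accent')],

	// [true, false]
	case: [matches('a', 'á', 'case'), matches('a', 'A', 'case')],

	// [false, false]
	variant: [matches('a', 'á', 'variant'), matches('a', 'A', 'variant')]

};
```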
Now, we can run the Intl.Collator.prototype.compare() method, passing in our two strings as arguments.
If they’re equal, it will return a value of 0. If the first string comes first alphabetically, it returns a negative number (typically -1). If the second string comes first, it returns a positive number (typically 1).
// returns 0
// They're equal!
collator.compare('resume', 'résumé');
// returns a negative number (typically -1)
// b comes before z
collator.compare('b', 'z');
// returns a positive number (typically 1)
// a comes before b
collator.compare('b', 'a');
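Since compare() returns a number instead of true or false, you might wrap it in a small helper for equality checks. Here's one way that could look. The isSameWord() function is a hypothetical name, and the sketch assumes an English locale. The same compare() method can also be passed directly into Array.prototype.sort().

```javascript
let collator = new Intl.Collator('en', {sensitivity: 'base'});

// Hypothetical helper: returns true if the strings match,
// ignoring accents and case
function isSameWord (a, b) {
	return collator.compare(a, b) === 0;
}

// returns true
let same = isSameWord('Resume', 'résumé');

// returns false
let different = isSameWord('resume', 'resumes');

// returns ['apple', 'éclair', 'zebra']
let sorted = ['zebra', 'éclair', 'apple'].sort(collator.compare);

// returns ['apple', 'zebra', 'éclair']
// The default sort puts é after z
let defaultSorted = ['zebra', 'éclair', 'apple'].sort();
```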
More language-sensitive comparisons
The Intl API has a bunch of other language-sensitive functions, including ones you can use to format dates and times as well as numbers.