Google has grown smarter at recognizing variant spellings of the same entity, but columnist Paul Shapiro observes that it’s not perfect yet.
My wife came to me with a problem. She wanted festive, whimsical, and potentially matching Hanukkah pajamas. But there weren’t enough options coming up in Google under one spelling of the holiday’s name, so she told me she was systematically going through all spellings to compile her list of shopping items.
I was pretty surprised by this — I had expected Google to be smart enough to recognize that these were alternative spellings of the same thing, especially post-Hummingbird. Clearly, this was not the case.
Some background for those who don’t know: Hanukkah is actually a transliterated word from Hebrew. Since Hebrew has its own alphabet, there are numerous spellings that one can use to reference it: Hanukkah, Chanukah, and Channukah are all acceptable spellings of the same holiday.
So, when someone searches for “Hanukkah pajamas” or “Chanukah pajamas,” Google really should be smart enough to understand that they are different spellings of the same concept and provide nearly identical results. But Google does not! I imagine this happens for other holidays and names from other cultures, and I’d be curious to know if other readers experience the same problem with those.
Why am I surprised that Google is returning different results for different spellings? Well, with the introduction of the Knowledge Graph (and Hummingbird), Google signaled a change for SEO. More than ever before, we could start thinking about search queries not merely as keyword strings, but as interrelated real-world concepts.
What do I mean by this?
When someone searches for “Abraham Lincoln,” they’re more than likely searching for the entity representing the 16th president of the United States, rather than the appearance of the words “Abraham” and “Lincoln,” or their uncle, also named Abraham Lincoln. And if they search for “Lincoln party,” Google knows we’re likely discussing political parties, rather than parties in the town of Lincoln, Mass., because this is a concept in close association with the historical entity Abraham Lincoln.
Similarly, Google is certainly capable of understanding that when we use the keyword Hanukkah, it is in reference to the holiday entity and that the various spellings are also referring to the same entity. Despite different spellings, the different searches actually mean the same thing. But alas, as demonstrated by my wife’s need to run a different search for each spelling of the holiday in order to discover all of her Hanukkah pajama options, Google wasn’t doing the best job.
So, how widespread is the Chanukah/Hanukkah/Chanukkah search problem? Here are a couple of search results for Chanukah items:
As you can see from the first screen shot, some big box retailers like Target, Macy’s and JCPenney rank on page one of Google. In screen shot two, however, they are largely absent — and sites like PajamaGram and Etsy are dominating the different spelling’s SERP.
This means that stores targeting the already small demographic of Hanukkah shoppers are actually reducing the number of potential customers by only using one spelling on their page. (Indeed, according to my keyword tool of choice, although “Hanukkah” has the highest search volume of all variants at 301,100 global monthly searches, all other spellings combined still make up a sizeable 55,500 searches, meaning that retailers optimizing for every spelling could be seeing roughly 18 percent more traffic.)
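The 18 percent figure follows directly from the two search volumes quoted above. Here is a quick check of the arithmetic in Python (the variable names are just for illustration):

```python
# Search volumes quoted in the article (global monthly searches).
hanukkah_volume = 301_100        # the most popular spelling
other_spellings_volume = 55_500  # all other spellings combined

# Extra searches available, relative to targeting "Hanukkah" alone.
uplift = other_spellings_volume / hanukkah_volume
uplift_percent = round(uplift * 100, 1)

print(f"Potential uplift: {uplift_percent}%")  # Potential uplift: 18.4%
```

This treats search volume as a stand-in for traffic, which is a simplification: it assumes a page would capture the same share of searches for every spelling.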
Investigating spelling variations and observations
Since I’m an ever-curious person, I wanted to investigate this phenomenon a little further.
I built a small, simple tool to show how similar the search engine results pages (SERP) for two different queries are by examining which listings appear in both SERPs. If we look at five common spellings of Hanukkah, we see the following:
|Keyword 1||Keyword 2||SERP Similarity|
The tool shows something quite interesting here: Not only are the results different, but depending on spelling, the results may only be 20 percent identical, meaning eight out of 10 of the listings on page one are completely different.
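The tool itself isn't shown here, but the idea is easy to sketch. The version below is my own guess at one reasonable approach, not the original implementation: it assumes each SERP is available as a list of result URLs and defines similarity as the share of listings that appear on both pages. The function name `serp_similarity` and the example.com URLs are placeholders.

```python
def serp_similarity(serp_a, serp_b):
    """Return the percentage of listings shared by two SERPs.

    Each SERP is a list of result URLs. The shared count is divided by the
    size of the larger page, so the score is the same in either direction.
    """
    urls_a, urls_b = set(serp_a), set(serp_b)
    if not urls_a or not urls_b:
        return 0.0
    shared = urls_a & urls_b
    return 100.0 * len(shared) / max(len(urls_a), len(urls_b))


# Two made-up page-one result sets with 2 of 10 listings in common.
page_one = [f"https://example.com/a{i}" for i in range(10)]
page_two = page_one[:2] + [f"https://example.com/b{i}" for i in range(8)]

print(serp_similarity(page_one, page_two))  # 20.0
```

Comparing exact URLs is the strictest option. Comparing only domains would produce higher scores, since the same retailer may rank with a different page for each spelling.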
I then became curious about why the terms weren’t canonicalized to each other, so I looked at Wikidata, one of the primary data sources that Google uses for its Knowledge Graph. As it turns out, there is an entity with all of the variants accounted for:
I then checked the Google Knowledge Graph Search API, and it became very clear that Google may be confused:
|Query||resultScore||ID||Name||Description||Types|
|Channukah||8.081924||kg:/m/0vpq52||Channukah Love||Song by Ju-Tang||[MusicRecording, Thing]|
|Chanukah||16.334606||kg:/m/06xmqp_||A Rugrats Chanukah||?||[Thing]|
|Hannukah||11.404715||kg:/m/0zvjvwt||Hannukah||Song by Lorna||[MusicRecording, Thing]|
|Hannukkah||11.599854||kg:/m/06vrjy9||Hannukkah||Book by Jennifer Blizin Gillis||[Book, Thing]|
|Hanukkah||21.56493||kg:/m/02873z||Hanukkah Harry||Fictional character||[Thing]|
The resultScore values, which according to the API documentation indicate “how well the entity matched the request constraints,” are very low. In other words, the returned entities weren’t well matched to the queries. This would be consistent with the varying results, except that search itself returns a Knowledge Graph panel with the Freebase ID /m/022w4 for all of the spelling variants, and that ID is different from the ones the Knowledge Graph API returns. So, in this case, it seems that the API may not be a reliable means of assessing the problem. Let’s move on to some other observations.
It is interesting to note that when searching for Channukah, Google pushed users to Chanukah results. When searching Hannukah and Hannukkah, Google pushed users to Hanukkah results. So, Google does seem to group Hanukkah spellings together based on whether they start with an “H” or a “Ch.”
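For readers who want to repeat the lookup, here is a minimal sketch of querying the Knowledge Graph Search API from Python using only the standard library. The helper names (`build_kg_url`, `top_entity`, `lookup`) are my own, and you would need to supply your own API key. The sample response is reconstructed from the Hanukkah row of the table above rather than captured from a live call.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

KG_ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"


def build_kg_url(query, api_key, limit=1):
    """Build the request URL for a single entity search."""
    params = {"query": query, "key": api_key, "limit": limit}
    return f"{KG_ENDPOINT}?{urlencode(params)}"


def top_entity(response):
    """Pull the best match out of a parsed API response (a dict)."""
    items = response.get("itemListElement", [])
    if not items:
        return None
    first = items[0]
    result = first.get("result", {})
    return {
        "id": result.get("@id"),
        "name": result.get("name"),
        "description": result.get("description"),
        "types": result.get("@type", []),
        "score": first.get("resultScore"),
    }


def lookup(query, api_key):
    """Call the live API (requires network access and a valid key)."""
    with urlopen(build_kg_url(query, api_key)) as resp:
        return top_entity(json.load(resp))


# Offline example shaped like the API's JSON, using values from the table.
sample = {
    "itemListElement": [
        {
            "result": {
                "@id": "kg:/m/02873z",
                "name": "Hanukkah Harry",
                "description": "Fictional character",
                "@type": ["Thing"],
            },
            "resultScore": 21.56493,
        }
    ]
}
print(top_entity(sample))
```

Looping `lookup` over each spelling would produce one row per variant, which is how a table like the one above could be assembled.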
Chanukah, Hannukah, and Hanukkah were also the only variations that received the special treatment of the Hanukkah menorah graphic:
What a retailer selling Hanukkah products should do
Clearly, if we want full coverage of terms (and my wife to find your Hanukkah pajamas), we cannot rely on just optimizing for the highest search volume variation of the keyword, as Google doesn’t seem to view all variants as entirely the same. Your best bet is to include the actual string for each spelling variant somewhere on the page, rather than relying on Google to understand them as variations of the same thing.
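If you manage many product pages, a small script can flag which spellings a page is missing. The sketch below is illustrative only: it assumes you already have the page's visible text as a string, and the function name `missing_spellings` is hypothetical. The spelling list is the five variants examined earlier in this article.

```python
import re

# The five spelling variants examined in this article.
SPELLINGS = ["Hanukkah", "Hannukah", "Hannukkah", "Chanukah", "Channukah"]


def missing_spellings(page_text, spellings=SPELLINGS):
    """Return the spellings that never appear as whole words in page_text."""
    missing = []
    for spelling in spellings:
        # \b keeps a longer variant from matching inside another word.
        pattern = rf"\b{re.escape(spelling)}\b"
        if not re.search(pattern, page_text, flags=re.IGNORECASE):
            missing.append(spelling)
    return missing


text = "Shop our Hanukkah pajamas (also spelled Chanukah)."
print(missing_spellings(text))  # ['Hannukah', 'Hannukkah', 'Channukah']
```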
If you’re a smaller player, it may make sense to prioritize optimizations toward one of the less popular spelling variants, as the organic competition may not be as significant. (Of course, this does not bar you from using spelling variants in addition to that for the potential of winning for multiple spellings.)
At a bare minimum, you may opt to include a spelling beginning with H- and Ch- and hope that Google will direct users to the same SERP in most cases.
I started an experiment to see whether the inclusion of structured data with sameAs properties may be a potential avenue for getting Google to understand a single spelling as an entity, eliminating the need to include different spelling variations. As of now, it’s too early to draw conclusions from the test, and the results so far are inconclusive, but I look forward to sharing them in the future.
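To give a sense of what sameAs markup can look like, here is a minimal sketch that generates a JSON-LD snippet with Python. This is not the markup used in the experiment: the choice of the generic schema.org `Thing` type, the Wikipedia URL as the sameAs target, and the helper name `to_json_ld_script` are all my own assumptions for illustration.

```python
import json

# Illustrative entity markup: one spelling on the page, tied to a
# well-known reference URL via schema.org's sameAs property.
entity_markup = {
    "@context": "https://schema.org",
    "@type": "Thing",
    "name": "Hanukkah",
    "sameAs": ["https://en.wikipedia.org/wiki/Hanukkah"],
}


def to_json_ld_script(markup):
    """Wrap a dict as a JSON-LD script tag ready to paste into a page."""
    body = json.dumps(markup, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(to_json_ld_script(entity_markup))
```

Whether Google actually uses markup like this to merge spelling variants is exactly the open question the experiment is meant to answer.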
Opinions expressed in this article are those of the guest author and not necessarily Marketing Land.