Every few years, Google announces a significant update to its organic search system. From the inclusion of semantic search in Hummingbird to the announcement of machine learning in RankBrain, Google is trying to get better at two things: understanding the intent behind a query and understanding the language of webpages. The better Google gets at these two skills, the more people will use Google search.
Why does Google keep investing in its search engine? Because it simply cannot continue ranking pages on keywords alone. Users expect ever more from Google when it comes to answering their needs, and their queries are getting longer and more conversational. Google's only solution is constant improvement of its technology.
Let’s pause for a definition: Natural language processing (NLP) is a branch of artificial intelligence that helps computers understand, interpret and manipulate human language. NLP draws from many disciplines, including computer science and computational linguistics, in its pursuit to fill the gap between human communication and computer understanding. (Definition source.)
What do we know about BERT?
- BERT is an acronym for Bidirectional Encoder Representations from Transformers.
- BERT was previously released as an open-source base for pre-training deep learning models to ultimately boost natural language processing. It may sound complicated, but think of it as gathering the materials before training an employee to do a particular job.
- BERT was announced as part of Google’s systems by Pandu Nayak (VP of Search) on Oct 25, 2019.
- Pandu says, “This technology enables anyone to train their own state-of-the-art question answering system.” That “anyone” certainly includes Google.
The specific version of BERT that Google is using is very powerful. As Britney Muller explains in her BERT video on Moz, it’s a multi-tool that replaces several one-off tools. And it’s only going to get smarter.
Google started with Wikipedia as the corpus. BERT looks at clips of text and converts them to vectors. (Think of a vector as a translation for computers.) Then it uses a technique called masking to blank out a word. As part of the training protocol, BERT reviews the text before and after the masked word (and perhaps the entire clip of text) and tries to figure out what the missing word is. The more it practices, the better it gets at understanding the context of the whole clip of text.
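To make the masking idea concrete, here is a minimal toy sketch. Real BERT trains a deep neural network on Wikipedia-scale text; this example merely counts which words appear between the same neighboring words in a tiny hand-made corpus (both the corpus and the `fill_mask` helper are illustrative assumptions, not Google's implementation), yet it captures the core trick of guessing a blanked-out word from the context on both sides.

```python
from collections import Counter

# Tiny illustrative corpus (an assumption for brevity).
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat slept on the mat",
]

# Record how often each middle word appears between a given
# (left neighbor, right neighbor) pair during "training."
context_counts = Counter()
for sentence in corpus:
    words = sentence.split()
    for i in range(1, len(words) - 1):
        context_counts[(words[i - 1], words[i + 1], words[i])] += 1

def fill_mask(left, right):
    """Guess the masked word from the words on either side of it."""
    candidates = {
        word: count
        for (l, r, word), count in context_counts.items()
        if l == left and r == right
    }
    return max(candidates, key=candidates.get) if candidates else None

# "the [MASK] sat" -> guess the blanked-out word from both neighbors.
print(fill_mask("the", "sat"))
```

The key point the sketch demonstrates: the guess uses context from *both* directions, which is what the "Bidirectional" in BERT refers to; earlier models read text in only one direction.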
Additionally, the more Google can understand the question (query), and the more Google can understand the answers it finds around the web, the better Google can match the appropriate answer to a question.
Google says you cannot optimize for BERT. That makes sense. The training data is from Wikipedia. But if you write poor, chunky, keyword-stuffed copy as part of your SEO campaigns, you can certainly improve that copy for better natural language processing. Then a smarter Google can see your improved copy and consider using it as a result for more queries. If you churn out chunks of copy for eCommerce pages, or phone in your blog posts, I believe Google will now have a better idea of how low-quality your copy is.
As Google gets smarter, your content should keep in step. I still see plenty of poorly written copy on the web, or content so complicated that the context is lost on most normal readers. If you write for an average human, I believe you’ll have an advantage when it’s your turn to be interpreted by Google. And maybe that will raise your likelihood of earning a rich snippet.