What Is Google BERT? Definition, uses and how it works

Team TeachWiki

BERT is a neural network created by Google in 2018 that has already proven useful in a number of practical applications. With it you can solve a range of tasks: analyze text, answer questions, build translators, detect spam, create predictive text input systems, and so on. In October 2019, Google added the BERT neural network to the core of its search algorithms for English, and in December for more than 70 other languages. The search update is also called BERT and affects 10% of all search queries.

What is the BERT update?

Announced on October 24, 2019 and rolled out in Germany on December 12, 2019, BERT is, according to Google, "one of the greatest advances in the history of search". Every tenth search query is affected by this algorithm. BERT stands for Bidirectional Encoder Representations from Transformers. It is used to better understand the context of a word and thus the actual search intent of the user. Where prepositions such as "after" or "for" were previously ignored, Google now uses BERT to recognize their meaning within the search query.

What is Google BERT?

Google BERT (Bidirectional Encoder Representations from Transformers) is an algorithm of the Google search engine whose main function is to improve the relevance of search results. This became possible because analysis is carried out not on key phrases but on whole sentences.

The improved analysis is the result of introducing a neural network that gives the search engine an understanding of text in different languages of the world.

The main testing language is English. In the future, the developers plan to expand the list to include most of the languages used in the world.

  1. BERT is a neural network created by Google

    Teaching computers to understand natural text the way humans do is an interesting and extremely difficult task. Natural language contains many nuances that even people can struggle with. In computer science there is a whole subfield, Natural Language Processing (NLP), devoted to methods for processing natural language. NLP allows you to apply machine learning algorithms to text and speech.

    Today, many of us have smartphones with speech recognition - they use NLP. Also, many people use laptops and computers with speech recognition built into the OS.

    So, in 2018 Google announced its new neural network BERT (Bidirectional Encoder Representations from Transformers). BERT relies on transfer learning, which lets a wide variety of companies take its base language model and additionally train it for their own specific tasks. This means that neural network training happens in two stages. First the network is trained, slowly and expensively, on a huge corpus of billions of words (this is called pre-training). In the second stage, the network can be quickly fine-tuned for different tasks.
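    To make the two-stage idea concrete, here is a minimal sketch assuming the Hugging Face transformers library and PyTorch (this tooling and the toy spam task are illustrative assumptions, not part of the original article). The expensive pre-training stage is already done for us when we download a public BERT checkpoint; the sketch only shows the quick second stage, fine-tuning for a specific task.

```python
# A minimal fine-tuning sketch: stage 1 (pre-training) is already done,
# we simply download a ready BERT checkpoint; stage 2 adapts it to a
# toy spam-detection task. A real run needs far more data and epochs.
from transformers import BertTokenizer, BertForSequenceClassification
import torch

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # 2 classes: spam / not spam
)

texts = ["Win a free prize now!!!", "Meeting moved to 3 pm tomorrow"]
labels = torch.tensor([1, 0])  # 1 = spam, 0 = not spam

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few gradient steps, just to show the loop
    optimizer.zero_grad()
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
```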

    Previously, one of the main pre-training tools was something like a dictionary: a vector representation of words that described the relationships between words as numbers. However, a neural network pre-trained on such a vector dictionary did not understand the meaning of the words. From its point of view, the sentences "the man bit the dog" and "the dog bit the man" are identical.
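    A toy illustration of why such a vector dictionary misses word order (the vectors below are made up; real systems used embeddings such as word2vec or GloVe): averaging one fixed vector per word produces exactly the same representation for both sentences.

```python
import numpy as np

rng = np.random.default_rng(0)
# A toy "vector dictionary": one fixed vector per word, no context.
vocab = {w: rng.normal(size=8) for w in ["the", "man", "bit", "dog"]}

def sentence_vector(sentence):
    # Bag-of-words average: word order is thrown away entirely.
    return np.mean([vocab[w] for w in sentence.lower().split()], axis=0)

v1 = sentence_vector("the man bit the dog")
v2 = sentence_vector("the dog bit the man")
print(np.allclose(v1, v2))  # True: the two sentences look identical
```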

    Google has developed a unique pre-training system to provide the neural network with richer rules - not only vocabulary, but also syntax with context. Researchers have begun training neural networks on a more general task called language modeling, feeding vast amounts of text to the neural networks—billions of words arranged into grammatically correct sentences. After that, the neural network should be able to predict the next word in the text on its own.

    In essence, BERT consists of three essential components. First, it is a pre-trained language model. Second, it has an attention mechanism that decides which parts of a sentence are the most important. Third, unlike other pre-trained language models created by neural networks processing terabytes of text left to right, the BERT model reads both right to left and left to right at the same time, and learns to predict which words were randomly masked out of sentences.

    Each of these three components - a deep language model with pretraining, attention, and bidirectionality - existed separately before BERT. But until Google released their algorithm at the end of 2018, no one combined them in such a successful way.
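    The masked-word objective from the third component can be seen directly with a pre-trained model. Below is a minimal sketch assuming the Hugging Face transformers fill-mask pipeline (an assumption for illustration; Google's internal setup is not public): BERT reads the whole sentence, to the left and right of the gap, and predicts the hidden word.

```python
# A minimal sketch of BERT's masked-word objective using the
# Hugging Face "fill-mask" pipeline (illustrative assumption).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT sees the full sentence at once (left and right context)
# and must recover the hidden word.
for candidate in fill_mask("The man took his dog for a [MASK] in the park."):
    print(candidate["token_str"], round(candidate["score"], 3))
```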

  2. Introducing BERT into the core Google search algorithm

    In October 2019, Google rolled out an update to its core search algorithms called BERT. From then on, the neural network of the same name works as part of the core search algorithms. According to Google, this is the biggest core update since RankBrain.

    Many webmasters have not noticed traffic spikes after the implementation of the new algorithm, although Google assures that it affected 10% of all search queries. To understand why webmasters didn't experience significant fluctuations in traffic, let's take a look at how BERT works and what queries it affects.

    The typical webmaster mostly focuses on mid- or even high-frequency queries that are quite short (1-3 words). BERT is focused on handling long queries. That is why most webmasters have not yet noticed the effect BERT has had on the SERPs.

    So, BERT better understands and interprets low-frequency and micro-low-frequency queries, i.e. long-tail queries. This is confirmed in particular by doorway-site owners, who have experienced sharp fluctuations in traffic, and by webmasters who work closely with 3-5-word queries, which is typical for product affiliates, for example those working with Amazon.

    It must be understood that BERT is not a ranking factor. It does not directly affect the ranking of organic results. However, the neural network allows Google to better interpret user queries and better understand content. This is what can have a strong impact on your traffic, especially as the neural network receives further training.

    Just because you haven't noticed significant fluctuations in your search results doesn't mean there haven't been any. You just didn't look for them.

  3. How to find queries affected by BERT and how to optimize websites for them

    The technique for finding queries influenced by the BERT algorithm is quite standard, and I have described it many times over the past few years. For example, see the article about YMYL, in the "How to find low-quality content on the site" section.

    Another way is described directly in the section of the article about BERT where we search for queries using Search Console.

    In general, the algorithm is very simple. You can use Google Analytics or Google Search Console to search for queries that have been affected by BERT.

    We know the release dates of the algorithm: October 21, 2019 for English-language sites and December 9 for Russian-language sites. We take a window of 3-4 weeks (or more) after the release date and compare it with the equivalent preceding period.

    For filtering we use channels (Google organic search), and then we segment by keyword.

    We are interested in queries that existed in the previous period but have no impressions in the new period (after the rollout of the algorithm). This is how we find lost queries. By also sorting the result for queries that were missing in the previous period and appeared in the new one, we will see which queries appeared and how Google now interprets the content.

    If content was added to the site during the period in question, queries matching the new content should be excluded from the analysis.
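    As a sketch of this comparison, assuming two query reports exported from Search Console as CSV files (the file and column names here are hypothetical), the lost and newly gained queries can be separated like this:

```python
# Compare query impressions before and after the rollout using two
# hypothetical Search Console CSV exports with "query" and "impressions"
# columns. This is a sketch of the manual comparison described above.
import pandas as pd

before = pd.read_csv("queries_before_bert.csv")
after = pd.read_csv("queries_after_bert.csv")

merged = before.merge(after, on="query", how="outer",
                      suffixes=("_before", "_after"))

# Lost queries: had impressions before, none (or no row) after the rollout.
lost = merged[merged["impressions_after"].fillna(0) == 0]

# New queries: appeared only after the rollout.
gained = merged[merged["impressions_before"].isna()]

print(lost[["query", "impressions_before"]].head())
print(gained[["query", "impressions_after"]].head())
```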

    Having received a list of queries, you can start optimizing your site.

    As you know, Google employees claim that optimization for BERT is impossible. Danny Sullivan and John Mueller spoke about this as well.

    "Query queries are not something you can influence in terms of SEO," Mueller said.

    “If there’s one thing you can do to optimize for BERT, it’s to make sure your pages have natural text… Instead of using as many keywords as possible, write naturally.”

    Not much to go on, is it?

    So, in order to “optimize” for the BERT algorithm, I recommend examining your data set for acquired and lost keywords and doing traditional content optimization to improve or restore query positions.

    First, you need to identify the keywords you lost after the BERT update and start editing the content to restore them. In this case, you do not need to add these search queries to the content at all. Sometimes it is enough to add a few prepositions and rephrase a couple of sentences.

    In the case of "lost" queries, the page has most likely stopped ranking because of a "shift of emphasis" in the content, just like in the example about the teacher and student. The BERT update helps Google better understand semantics (the meaning of words and phrases). This means that if you previously ranked for a long-tail phrase but lost positions after BERT was introduced, the page probably matched the keywords in the query but did not actually match the search intent. You need to add meaning around those phrases and focus on the words that correspond to the user's intent.

    To put it simply, the page previously ranked undeservedly for the lost queries. The new BERT algorithm helped Google realize this. You will have to put in some work updating the content to win back the lost queries.

    Also improve your content for the keywords that appeared after the BERT update. Study what exactly competitors write and where they place their emphasis. Use your competitors' ideas to make your content more "valuable" than theirs. Your content needs to answer specific queries better than the competition does.

    A huge number of ways to optimize are given in my book SEO Monster 2020. On more than 700 pages, all the most important ranking factors and methods of influencing them are revealed with practical examples.

    Good growth was shown by informational sites built with a SILO structure. There, isolated query clusters answered low-frequency queries more accurately thanks to a significant number of "supporting" pages.

Conclusion

The introduction of the BERT neural network into the core of Google's search algorithms is the next step for the corporation to improve the understanding of user queries set in natural language.

The neural network will continue to develop at every level. I am sure it will undergo not only further fine-tuning but also renewed pre-training, and will be trained continuously, including with the help of the assessors who control the quality of organic results. All this will affect website promotion strategies and methods of influencing ranking factors in the future.

Note also that the Russian-language model, compared to the English-language one, went through far less thorough pre-training and shows much worse results. This means that major updates to both the language model and the trained BERT neural network are still to come.

Get ready and read the right literature to understand how to optimize your site.
