What is BERT? | Fox News


BERT is an open-source machine learning framework that is used for various natural language processing (NLP) tasks. It is designed to help computers better understand nuance in language by grasping the meaning of surrounding words in a text. The benefit is that context of a text can be understood rather than just the meaning of individual words. 

It is no secret that artificial intelligence impacts society in surprising ways. One way that most people have used AI without knowing it is by searching on Google. When doing so, the searcher has likely used BERT without realizing it, since Google said at launch that BERT would affect about 10% of searches. This framework has allowed Google to recognize how users search by better understanding words within their correct order and context. BERT is more than just a part of Google’s algorithm, though. As an open-source framework, anyone can use it for a wide array of machine-learning tasks.

What is BERT?

BERT, short for Bidirectional Encoder Representations from Transformers, is a machine learning model architecture pre-trained to handle a wide range of natural language processing (NLP) tasks. Google researchers introduced it in the academic paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al., 2018), and it has since had a major influence on the world of machine learning. Google Research also released it as open source, which means anyone can use BERT to train their own system to perform natural language processing tasks.


BERT became such a big deal in the machine learning community because instead of reading text sequentially, BERT models will look at all of the surrounding words to understand the context. It understands a word based on the company it keeps, as we do in natural language. For example, the term “rose” can carry different meanings depending on whether the surrounding words include “thorn,” “chair” or “power.” BERT can understand the target word based on the other words in the sentence, whether they come before or after. 

What can BERT do?

Part of what makes BERT unique is that it is pre-trained bidirectionally, so it can provide a contextual understanding of language and of ambiguous sentences, especially those made up of words with multiple meanings. It is, therefore, useful in language-based tasks.

BERT is used within chatbots to help them answer questions. It can help summarize long documents and distinguish between words with various meanings. As part of a Google algorithm update, it delivers better results in response to a user’s query.

Since Google has made the pre-trained BERT models available to others, the open source model is ready to be utilized, after fine-tuning takes place, for a wide variety of language-based tasks, such as question answering and named entity recognition. 

How is BERT used in Google’s search engine?

A year after the research paper was released, Google announced an algorithm update for English-language search queries. At launch, Google said BERT would impact 1 out of every 10 searches. Additionally, BERT affects featured snippets, the distinct boxes that answer the searcher directly rather than presenting a list of URLs.

Rather than replacing RankBrain (Google’s first AI-based search algorithm), BERT adds to the underlying search algorithm. It helps the search engine understand language the way humans speak to one another.

Consider the internet as the most extensive library in existence. If Google is a librarian, this algorithm update helps the search engine produce the most accurate results based on the request made by the searcher. Google uses BERT in its algorithm to help understand not just the definition of each word but what the words mean when put together in a sentence. BERT helps Google process language and understand a search phrase’s context, tone and intent as it is written.


This new algorithm layer also helps Google understand nuance in the query, which is increasingly vital as people conduct searches in the way they think and speak. 

Before BERT, Google would pull out the words it thought were the most important in a search, often leading to less-than-optimal results. Google fine-tuned its BERT algorithm update on natural language processing tasks, such as question answering, to help it understand the linguistic nuances of a searcher’s query. These nuances and smaller words, like “to” and “for,” are now considered when they are part of a search request.

Additionally, the technology takes cues from the order of the words in the query, similar to how humans communicate. Now, Google can better understand the meaning of a search rather than just the meaning of the words in the phrase.
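
To see why small words and word order matter, consider a toy comparison. This is only an illustrative sketch, not Google’s actual method: the stop-word list, the function names and the two example queries are all assumptions made up for the demonstration.

```python
# Hypothetical stop-word list; real search systems are far more sophisticated.
STOP_WORDS = {"to", "for", "a", "the", "from"}

def keywords_only(query):
    """Return the set of 'important' words, ignoring order and small words."""
    return {word for word in query.lower().split() if word not in STOP_WORDS}

def ordered_tokens(query):
    """Return every word in its original order, small words included."""
    return query.lower().split()

q1 = "flights from Brazil to USA"
q2 = "flights from USA to Brazil"

# With keywords alone, the two opposite trips look identical.
print(keywords_only(q1) == keywords_only(q2))    # True
# Keeping order and small words preserves the difference.
print(ordered_tokens(q1) == ordered_tokens(q2))  # False
```

The keyword-only view throws away exactly the information (“from,” “to” and word order) that tells the two requests apart.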

BERT is not used in every search, however. Google will put it to use when it thinks that the algorithm can better understand the search entry with its help. This algorithm layer may be called upon when the search query’s context needs to be clarified, such as if the searcher misspells a word. In this case, it can help locate the word it thinks the searcher was trying to spell. It is also used when a search entry includes synonyms for words that are in relevant documents. Google could employ BERT to match the synonyms and display the desired result. 

How is BERT trained?

BERT was pre-trained simultaneously on two tasks. The first is masked language modeling, in which the model learns by trying to predict masked words in a sequence. This training method randomly replaces some input words with a [MASK] token, and the model then predicts the original word at each masked position. Over time, the model learns the different meanings behind the words based on the other words around them and the order in which they appear in the sentence or phrase. Language modeling helps the framework develop an understanding of context.
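
The masking step can be sketched in a few lines. This is a simplified illustration rather than BERT’s real preprocessing: the function name, the seed and the example sentence are assumptions, and every selected word is replaced with [MASK], whereas the original paper also sometimes leaves a selected word unchanged or swaps in a random word. The 15% default follows the masking rate reported in the paper.

```python
import random

MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly replace tokens with [MASK] and record what was hidden.

    Returns (masked_tokens, labels), where labels maps each masked
    position to the original word the model would have to predict.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    labels = {}
    for i, token in enumerate(tokens):
        if rng.random() < mask_prob:
            masked[i] = MASK_TOKEN
            labels[i] = token
    return masked, labels

sentence = "the rose bush had a sharp thorn".split()
# A higher rate than the default, so a short sentence is likely to show masks.
masked, labels = mask_tokens(sentence, mask_prob=0.3, seed=1)
print(masked)   # the sentence with some words hidden
print(labels)   # position -> hidden word (the prediction targets)
```

During pre-training, the model sees only the masked sentence and is scored on how well it recovers the hidden words.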


The second task is next sentence prediction. Here, the model receives a pair of sentences as input and must predict whether the second follows the first in the original text. During this training, 50% of the time the second sentence really does follow the first, while the other 50% of the time it is chosen at random from the text corpus.
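
A rough sketch of how such sentence pairs could be assembled is shown below. The function name and the tiny four-sentence “corpus” are assumptions for illustration; the sketch also assumes at least three distinct sentences so that a random alternative is always available.

```python
import random

def make_nsp_examples(sentences, seed=0):
    """Build (first, second, is_next) triples from sentences in document order."""
    rng = random.Random(seed)
    examples = []
    for i in range(len(sentences) - 1):
        if rng.random() < 0.5:
            # Positive example: the sentence that really comes next.
            examples.append((sentences[i], sentences[i + 1], True))
        else:
            # Negative example: a random sentence that is neither the first
            # sentence itself nor its true successor.
            candidates = [s for j, s in enumerate(sentences) if j not in (i, i + 1)]
            examples.append((sentences[i], rng.choice(candidates), False))
    return examples

document = [
    "She planted a rose.",
    "It bloomed in June.",
    "The chair was broken.",
    "He rose to fix it.",
]
for first, second, is_next in make_nsp_examples(document, seed=3):
    print(is_next, "|", first, "->", second)
```

Each triple pairs an input with the yes-or-no answer the model is trained to predict.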

The final training stage is fine-tuning for a specific natural language processing task. Because BERT is already pre-trained on a large amount of text, it only requires a final output layer and a data set specific to the task the user is trying to perform. Anyone can do this, as BERT is open source.
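
The idea of a “final output layer” can be illustrated with a small sketch. Everything here is an assumption for demonstration purposes: the three-number sentence vector stands in for what a pre-trained encoder would produce, the weights are invented rather than learned, and the two labels are a hypothetical sentiment task.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def output_layer(sentence_vector, weights, biases):
    """Task-specific layer: one linear score per label, then softmax."""
    scores = [
        sum(w * x for w, x in zip(row, sentence_vector)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)

# Stand-in for the vector a pre-trained encoder would produce for a sentence.
sentence_vector = [0.2, -0.4, 0.9]
# Hypothetical weights for two labels; fine-tuning would learn these values.
weights = [[0.5, -1.0, 0.3], [-0.2, 0.8, -0.6]]
biases = [0.0, 0.1]
label_names = ["positive", "negative"]

probs = output_layer(sentence_vector, weights, biases)
print(dict(zip(label_names, probs)))
```

In fine-tuning, a small layer like this is trained on the task-specific data set, typically along with adjustments to the pre-trained model beneath it.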

What makes BERT ‘unsupervised’?

BERT’s pre-training process is considered unsupervised because it used raw, unlabeled text, with the training signal coming from the text itself rather than from human annotation. The pre-training data was a plain-text corpus drawn from English Wikipedia and a collection of books (BooksCorpus).


What does bidirectional mean in BERT?

BERT aims to overcome a limitation in how previous standard language models were pre-trained. Those models could only read text in one direction, from left to right or from right to left. As a result, a word’s context could not include the words on its other side.

BERT, by contrast, learns the context of a word from the words on both sides of it, so it can take in the entire sentence, or input sequence, at once rather than one word at a time. This is how humans understand the context of a sentence. This bidirectional learning is made possible by the way the framework is pre-trained on a Transformer-based architecture.
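
The difference between the two approaches can be shown with a toy example. The function name and the example sentence are assumptions for illustration; the sketch only shows which words are available as context, not how a model uses them.

```python
def visible_context(tokens, position, bidirectional):
    """Return the words a model may use to interpret tokens[position]."""
    left = tokens[:position]
    right = tokens[position + 1:] if bidirectional else []
    return left + right

tokens = "he rose from the chair".split()
target = tokens.index("rose")

# A left-to-right model sees only what came before the target word.
print(visible_context(tokens, target, bidirectional=False))  # ['he']
# A bidirectional model sees the words on both sides.
print(visible_context(tokens, target, bidirectional=True))   # ['he', 'from', 'the', 'chair']
```

With only “he” to go on, “rose” is ambiguous; with “from the chair” also visible, it is clearly a verb.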

What is a Transformer, and how does BERT use it?

The Transformer was introduced as an encoder-decoder architecture, and BERT uses its encoder portion to better understand the contextual relationships among the words in a text. In basic terms, the advantage is that Transformer models can learn similarly to humans: by identifying the most important parts of a sequence (or a sentence).


The use of self-attention layers in the Transformer architecture is how the machine can better understand context by relating specific input parts to others. As the name suggests, self-attention layers allow the encoder to focus on specific parts of the input. With self-attention, representation of a sentence is deciphered by relating words within the sentence. This self-attention layer is the main element of the transformer architecture within BERT. 
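
A stripped-down version of self-attention can be written in plain Python. This is a sketch under simplifying assumptions, not BERT’s implementation: the two-number word vectors are invented, and the queries, keys and values are all the raw input vectors, whereas a real Transformer layer applies learned projections and uses several attention heads.

```python
import math

def softmax(scores):
    """Convert raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = input."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Score the query word against every word in the sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # New representation: a weighted mix of all the word vectors.
        mixed = [
            sum(w * value[d] for w, value in zip(weights, vectors))
            for d in range(dim)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical 2-dimensional word vectors, invented for this example.
words = ["rose", "thorn", "chair"]
vectors = [[1.0, 0.2], [0.9, 0.1], [-0.5, 1.0]]

outputs, weights = self_attention(vectors)
for word, row in zip(words, weights):
    print(word, [round(w, 2) for w in row])
```

Each row of weights shows how much one word draws on every word in the sequence when its new representation is built; here “rose” puts more weight on “thorn” than on “chair” because their vectors are more alike.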

With this architecture, BERT can relate different words in the same sequence while identifying the context of the other words as they relate to one another. This technique helps the system understand a word based on context, such as understanding polysemous words, those with multiple meanings, and homographs, words that are spelled the same but have different meanings.

Is BERT better than GPT?

Generative Pre-trained Transformer (GPT) and BERT are two of the earliest pre-trained models built to perform natural language processing (NLP) tasks. The main architectural difference is that BERT is bidirectional while GPT is autoregressive, reading text from left to right.


The main practical difference between Google BERT and GPT-4, the model behind ChatGPT, is the type of task each is used for. GPT-4 is used primarily for conversational AI, such as within a chatbot. BERT handles question-answering and named-entity recognition tasks, which require context to be understood.

BERT is unique because it looks at all the text in a sequence and closely understands the context of a word as it relates to the others within that sequence. The Transformer architecture, along with BERT’s bidirectional pre-training, makes this possible.
