A Marketer’s Guide to TF-IDF Optimization for SEO

As digital marketers, content is a critical part of everything we do. And while analyzing and refreshing content may take a lot of time and effort, the results for generating more traffic and improving SEO are clear.

With the many things that go into creating content, such as competitor research, outreach and the technical aspects of content, improving older content frequently takes a back seat, which in most cases is a costly mistake.

What is TF-IDF?

While using the TF-IDF technique isn’t exclusive to the world of SEO, Moz defines it best:

TF-IDF stands for term frequency-inverse document frequency. It’s a text analysis technique that Google uses as a ranking factor — it signifies how important a word or phrase is to a document in a corpus (i.e. a blog on the internet). When used for SEO-purposes, it helps you look beyond keywords and into relevant content that can reach your audience.

On the surface, the formula may appear quite complex. So, let’s take a look at how to break things down in relation to content.

TF = (Number of times a term appears in a document) / (Total number of terms in the document)

For example, let’s assume that the term “log cabin” appears 12 times in a document of 100 words.

Applying the formula gives TF = 12 / 100 = 0.12. That solves the first part: the score of 0.12 represents the density of the term in our document.
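As a rough sketch of the calculation above, the snippet below counts a phrase in a piece of text and divides by the total word count. The function name `term_frequency`, the simple tokenizer and the filler document are assumptions made for illustration, not part of any particular SEO tool.

```python
import re


def term_frequency(text: str, term: str) -> float:
    """Occurrences of `term` divided by the total number of words in `text`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    term_words = term.lower().split()
    n = len(term_words)
    if not words or n == 0:
        return 0.0
    # Slide a window of n words across the text and count exact matches.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == term_words)
    return hits / len(words)


# Hypothetical 100-word document: 12 mentions of "log cabin" plus 76 filler words.
document = " ".join(["log cabin"] * 12 + ["filler"] * 76)
tf = term_frequency(document, "log cabin")
print(round(tf, 2))  # 0.12
```

Real pages need more careful tokenizing (HTML stripping, plurals, punctuation), so treat the result as an approximation.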

Now, we want to know how this term compares with rivals. We can calculate the IDF to obtain the comparison, by dividing the total number of documents in the search results by the number of documents the term appears in, and then taking the logarithm of the result (base 10 in this example):

IDF = log_10(Total number of documents / Number of documents with term in it)

Let’s put the second part of this formula to use. Say that out of 1,000,000 results, 409,000 mention “log cabin”.

Now let’s solve the logarithm:

IDF(log cabin) = log_10(1,000,000 / 409,000) ≈ 0.388, rounded down here to 0.38
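A quick way to check this arithmetic is a few lines of Python. The sketch below evaluates the same ratio with both a base-10 and a natural logarithm: the 0.38 figure used in this example matches the base-10 result, while the natural log of the same ratio comes out near 0.89. The variable names are illustrative.

```python
import math

total_documents = 1_000_000
documents_with_term = 409_000
ratio = total_documents / documents_with_term

idf_base10 = math.log10(ratio)   # base-10 logarithm
idf_natural = math.log(ratio)    # natural logarithm (base e)

print(f"{idf_base10:.3f}")   # 0.388
print(f"{idf_natural:.3f}")  # 0.894
```

The choice of base only rescales every IDF value by a constant, so it does not change which terms rank as more or less important, as long as the same base is used throughout.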

With that, we now have the density and the importance.

TF*IDF = Term Frequency * Inverse Document Frequency = 0.12 * 0.38 = 0.046

Next, calculate the same score for your own page. Say your page scores 0.017 for “log cabin”, while your rivals’ average is 0.046: the competition gives the term noticeably more weight than you do.
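To make the comparison concrete, here is a minimal sketch using the figures from this example. It assumes both scores share the same IDF of 0.38 (IDF depends on the search results, not on a single page), which lets us back out the term frequency your own page would need to have. The helper name `tf_idf` is hypothetical.

```python
def tf_idf(tf: float, idf: float) -> float:
    """Combine term frequency and inverse document frequency."""
    return tf * idf


idf = 0.38                       # IDF from the worked example
rival_score = tf_idf(0.12, idf)  # rivals' average from the worked example
own_score = 0.017                # your page's score from the example

gap_ratio = rival_score / own_score  # how much more weight rivals give the term
implied_own_tf = own_score / idf     # assumes the same IDF applies to your page

print(round(rival_score, 3))           # 0.046
print(round(gap_ratio, 1))             # 2.7
print(round(implied_own_tf * 100, 1))  # 4.5 mentions per 100 words
```

In other words, under these assumptions your page mentions the term about 4.5 times per 100 words against the rivals’ 12, which is a prompt to review coverage of the topic rather than a target to hit mechanically.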

The data gives you an indication that the term ‘log cabin’ is a common denominator in content that is ranking highly.

Is TF-IDF just keyword stuffing?

If you’ve been involved with SEO for some time, you’re likely aware of the concept of keyword stuffing, that is, the process of adding as many keywords as possible to improve your chances of ranking higher.

The thing is, keyword density was an early attempt to game the way Google applies TF-IDF. SEOs were trying to stuff their content with as many keywords as possible, and then Panda came and changed the rules of the game.

While keyword stuffing may have worked in the past, the data is clear that doing so now can significantly hurt your rankings.

No one gains value from seeing unnaturally worded terms and phrases added to content. While TF-IDF does help you better understand which words are often used in relation to a topic (SEO, for example), the purpose isn’t to then add those keywords randomly to your content. As always, Google continues to reward relevant content that tries to provide the best solution to a user’s query.

TF-IDF for SEO

In the world of SEO, TF-IDF involves scraping search results for a given keyword and collecting the data on the usage of those words and phrases.

For example, if you’re a SaaS owner and want to know how to attract more traffic using SEO, you’re likely interested in learning about the following topics.

An “SEO guide” could cover the following:

• SEO audit;
• Technical SEO;
• Page title;
• H1, H2.

But there are also other terms that are very important in SEO and should be considered:

• Tools;
• Reporting;
• SEO investment.
While search engines use many ranking factors, their algorithms naturally take note of how often certain words and phrases appear across the web. They also compare how often a term appears across all of the search results with how often other terms appear.

A TF-IDF “comparison score” shows you, as a percentage, how often a specific term appears.

To understand more with an example, these are the keywords that I want to target with a landing page for a real estate developer:

Using a TF-IDF tool, here are some of the words that are suggested to add to the copy, based on analysing the top 10 sites on Google search results:

• build home
• payments on a mortgage
• loan secured

There is a fundamental difference between retrieving variations of the same keyword and retrieving apparently unrelated, yet relevant, terms.

This is exactly what TF-IDF analysis does: it uncovers the terms that are consistently used to describe a topic well.

Hopefully, you’ll soon realise how important this type of information is. And because it doesn’t require any data retrieval skills, you can appreciate how much time it saves. For example, I recently used a TF-IDF tool that suggested new terms to better describe the topic, and rankings for my blog improved.

Inverse Document Frequency – the sweet spot between term frequency and content optimisation

How to use TF-IDF

To get the most from this exercise, make sure you’ve selected your articles and landing pages that are not performing as you’d like, for example, content you think is high quality but still stuck on page 2 or 3.

Next you’ll need to choose a TF-IDF tool to use with your website.

There are a number of tools available like this one or this one. I love to use SEMRush On Page SEO Checker (no affiliations). If you are advanced in Python, you can follow this guide to even build your own TF-IDF tool.
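For a sense of what such a home-made tool involves, here is a minimal sketch using only Python’s standard library. It assumes you have already collected the body text of the competing pages as plain strings. The function names and sample texts are hypothetical, and the IDF is a smoothed variant (the same form scikit-learn uses by default) rather than the plain formula shown earlier, so that a phrase appearing on every page doesn’t score zero.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tf_idf_scores(documents, n=2):
    """Return one {term: score} dict per document, scoring n-word phrases."""
    word_lists = [tokenize(doc) for doc in documents]
    term_lists = [
        [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
        for words in word_lists
    ]

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for terms in term_lists:
        doc_freq.update(set(terms))

    total_docs = len(documents)
    scores = []
    for words, terms in zip(word_lists, term_lists):
        doc_scores = {}
        for term, count in Counter(terms).items():
            tf = count / len(words)
            # Smoothed IDF: an assumption of this sketch, not the plain formula.
            idf = math.log((1 + total_docs) / (1 + doc_freq[term])) + 1
            doc_scores[term] = tf * idf
        scores.append(doc_scores)
    return scores


# Hypothetical page texts standing in for scraped search results.
pages = [
    "Book a log cabin with a hot tub",
    "Our log cabin sleeps six",
    "Cheap flights to Rome",
]
scores = tf_idf_scores(pages)
top_terms = sorted(scores[0].items(), key=lambda item: item[1], reverse=True)[:3]
print(top_terms)
```

Scoring two-word phrases (`n=2`) is a design choice that suits terms like “log cabin”. A fuller tool would also strip stop words and combine several phrase lengths.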

Enhanced keyword research

The biggest benefit of TF-IDF is that you can enrich your keyword research by adding not just those keywords people search for (hot tub breaks), but also keywords that Google found to appear quite often in search results.

Without a TF-IDF analysis, you wouldn’t be able to discover that terms like “romantic breaks”, “dog friendly” and “group of friends” were related to some of the best ranking content around hot tubs.

In-depth competitor research

If you’ve been doing SEO and content for a while, you’ve likely been in situations where you wonder why you’re ranking behind content that might otherwise be lower quality than yours.

We’ve been trained to think about getting better backlinks, longer content, more detailed content, internal links etc.

And while all of those points do matter, TF-IDF can give you a slight edge by helping you include words and phrases that add value to your content and are also associated with the terms you want to rank for.

For example, we’ve seen that having “log cabin” and “lodges with hot tub” should be considered in the body copy of a page that wants to rank high for “hot tub break.”

Again, the point isn’t to keyword stuff. That doesn’t work. The aim is to be relevant for the terms that appear consistently across the collection of top-ranking content.

One of the benefits of doing so is that you can uncover some interesting insights into how Google sees pages that are very similar: pages that have roughly the same number of backlinks, are optimised for the same keyword and have spot-on on-page SEO, but still rank in different places in the search results.

Once you have the data, you can see which terms your rivals use to describe a given topic and how often they use them, then optimize your content more effectively.

How to read a TF-IDF report

Now that you know which terms you are missing in your copy that would describe your topic more concisely, it’s time to read the report, understand the metrics and start implementing.

Here’s a breakdown of the important terms.

Word/Phrase: the top 20 words used by your competitors to describe the topic of “hot tub breaks UK”.

Rivals using this word: the number of your rivals in the top 10 results that use this word. The more rivals use it, the more important that word is.

Word/Phrase usage: compares how often, on average, this word is used in the body text on your page vs. your competition’s.

TF-IDF: the result of the TF-IDF formula for each term in the comparison. It’s a great start for a brainstorming session of keywords describing a topic.
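To show how the first three columns of such a report could be derived, here is a small sketch. It is not how any specific tool works: `count_term`, `build_report` and the sample texts are hypothetical, and the TF-IDF column is left out for brevity.

```python
import re


def count_term(text, term):
    """Count how many times a word or phrase appears in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = term.lower().split()
    n = len(target)
    if n == 0:
        return 0
    return sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)


def build_report(your_page, rival_pages, terms):
    """Build one row per term, sorted by how many rivals use it."""
    rows = []
    for term in terms:
        rival_counts = [count_term(page, term) for page in rival_pages]
        rows.append({
            "term": term,
            "rivals_using": sum(1 for count in rival_counts if count > 0),
            "your_usage": count_term(your_page, term),
            "avg_rival_usage": sum(rival_counts) / max(len(rival_pages), 1),
        })
    return sorted(rows, key=lambda row: row["rivals_using"], reverse=True)


# Hypothetical texts standing in for your page and the top-ranking rivals.
your_page = "Relax in a hot tub on your next break."
rival_pages = [
    "Stay in a log cabin with a private hot tub.",
    "Dog friendly log cabin breaks with a hot tub.",
    "Romantic breaks for two.",
]
report = build_report(your_page, rival_pages, ["log cabin", "hot tub", "romantic breaks"])
for row in report:
    print(row)
```

Rows where `rivals_using` is high and `your_usage` is zero are the ones worth reviewing first.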

What to do after the report

Now that you’ve used TF-IDF to improve your research and content, it’s time to show you an example of what copy looks like before and after.

I have added the terms on the right that my TF-IDF tool suggested to add to better describe the content.

As you can see, there isn’t a lot of difference. I haven’t deleted anything: I have simply added to the content that is currently published on the page and found a natural way to work those terms into the flow.

The results? They speak for themselves.

In Google Analytics, the same URL for the same period of time showed incredible growth, even though the travel and hospitality industry took a big hit from Covid-19.

Older content is the best candidate for TF-IDF optimization. If you repeat the same process for each piece of content on your website, you can get quite a lot of cumulative gains across many pages without putting tons of hours into upgrading content the “old way.”

How should I use TF-IDF?

There are two main instances in which TF-IDF can be helpful:

1. When you do keyword research.
2. When your content doesn’t rank on page 1 of Google search results.

When you do keyword research

Research your keywords to the best of your capabilities using the most common SEO tools at your disposal. Keep in mind that when researching these keywords, you are going to produce content that is not that much different from your competitors.

Chances are that something has already been written and Google shows millions of results for a topic.

Ranking well is not just about how long or thorough your content is; it’s also about how well you’re able to describe things. Your goal is to target not only the keywords people search for, but also the terms people want to see in the copy (based on your data).

After new content is published, most of the time it won’t rank on page 1 right away. Even if you have very high domain authority, a strong web presence and thousands of backlinks, there are no guarantees.

The connection between your topics and the new TF-IDF terms should be a natural addition to your content. It shouldn’t feel like you’re just stuffing keywords here and there. While it’s always beneficial to include variations of a keyword in a copy, the aim of TF-IDF isn’t to simply stuff each word into the copy a couple of times.

Use the information from a TF-IDF analysis to refine your content: look at the topics you haven’t covered yet and keep expanding on angles your content might have missed before.

For example, it could be that a product page is missing information about size and delivery costs, so adding a couple of paragraphs showing how size can impact delivery costs might make a big difference.

Ultimately, TF-IDF is a valuable tool that can help you take your content and rankings to the next level. It’s not a magic button by any means, but those small changes can add up.

Conclusion

• Start using TF-IDF to uncover more relevant terms, topics, and keywords instead of using your gut feelings on what Google deems as relevant content.
• Gather data around specific competitors, keywords and topics that you want to target.
• Continue to experiment with your learnings from TF-IDF analysis, understand the reports and what needs to be done to successfully optimise for it. The best way to do this is to test different changes over time.
• Spend more time analyzing which terms are important rather than spending too much time building backlinks. Results from your TF-IDF analysis can take some time.

Author: Luca Tagliaferro
