How to Measure the Quality of your Backlinks


In a recent study by Rand Fishkin, over 1500 industry professionals indicated that the second highest ranking factor is still “quality backlinks”.

But what is a quality backlink?

We have a hypothesis.

Google is the sole judge of what counts as a quality backlink.

However, they don’t provide any data on this.

Or do they?

What Google does provide is a list of the backlinks it has identified.

Our hypothesis is that if Google does not list a domain in Google Search Console, then possibly it does not rate that domain as quality.

Google crawls the web constantly, so if you have had a link on a domain for some time and Google still does not record it, then possibly Google does not consider that domain worthy of inclusion in the links list.

Therefore, if we have a list of these domains, we can build a database of domains that Google may not see as quality.

By the same logic, the domains Google does list are ones it sees as having some quality, or at least enough merit to record as a link.

So comparing the backlinks Google identifies with those that everyone else identifies should be a good starting place.

Further, if we can build a database of the domains Google recognises versus those it doesn’t, this might be a valuable tool for everyone.

To try and prove our hypothesis we will need your help.

We have done the analysis outlined below and found that our sample had 68 domains in the Google list, and only 12% of the domains in our Ahrefs list matched the Google list.

We would hypothesize that Google finds only 12% of the linking domains identified by Ahrefs “worthy” and, more importantly, it has found a whole other list of linking domains that are not in the Ahrefs list!

Wow!

First we need to show you how to do the analysis using Google Search Console and Ahrefs.

Then, in the comments at the bottom, post your results in the following format so we can run a straw poll to see whether there is any merit to our hypothesis.

Number of domains in the Google List- 68
Number of domains that matched – 12%  (measure of the quality of your links)

The percentage of matches provides a relative measure of the quality of your backlinks.  In the sample of websites we have analysed so far, the numbers range from 0% to 30%.

The second part of testing this hypothesis is that we need a sample of people’s CSV downloads from Google, as you cannot access this data through the API.

If you are happy to share your data plus your domain, we can run this analysis and build a database of the domains Google provides as a link source versus those it doesn’t.

This data and hypothesis are only useful if we can test them over a large sample of websites.

If you want to participate in the study please email your CSV file to us at support@omologist.com and please include your domain in the email.

 

Process for identifying backlinks recorded and not recorded by Google

 

Step 1 Download data from Google Search Console

 

 

The first step is to collect your data from Google Search Console.

  1. Log into Google Search Console
  2. Click “Links”
  3. In the “Top linking sites” box, click “More” (see image above)
  4. Once the page opens you will see a list of external domains that link to your website. To complete the process you will need to download this list.
  5. On the top right of this box, click the downward arrow to download the file (see image below). Choose CSV, as the process we describe below uses Excel.
 

 

Step 2 Collect your data from Ahrefs

 

 

  1. Log into Ahrefs
  2. Click on “Site Explorer” and type in your domain.
  3. Once the site page opens click “Backlinks”
  4. Leave the default settings for “Group Similar”
  5. In the right corner you will see “export”. Click “export” and then choose quick export.
  6. Depending on the size of the file it may take a few seconds to a minute or so. Ahrefs will provide a message when the file is ready to download. Download the file to your desktop.

 

Step 3 Bringing the data together

You should now have two spreadsheets.

The first of these will contain the data from Google Search Console and should look something like that below.

 

 

The second sheet will have the data from Ahrefs and should look something like that below.

 

 

As a first step, we want to take the “Referring Page URL” column from the Ahrefs file and copy it into the Google Search Console sheet.

For those who are not as familiar with Excel, we will provide some steps as we go.

Open the Ahrefs file and click once on the heading “Referring Page URL” so that the cell is selected.

Hold Ctrl and Shift and press the Down Arrow key to highlight the column.

Next, right click your mouse and click “Copy”.

You should get something like that below.

 

 

Now open the Excel sheet that has the Google Search Console data and click in cell E1.

Now right click your mouse and “paste” the data so that the Ahrefs URLs fill column E of this sheet.
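For readers who would rather script this step than copy and paste, the two columns can also be pulled out with Python’s standard csv module. This is a minimal sketch and not part of the process above: the sample data is made up, only the headings “Site” and “Referring Page URL” come from the exports described in this article, and a real export may need a different delimiter or encoding.

```python
import csv
import io

# Hypothetical sample exports. Real files will have more columns and rows,
# and the extra column names used here are placeholders only.
GOOGLE_CSV = "Site,Other column\nexample.com,12\nblog.example.org,4\n"
AHREFS_CSV = (
    "Referring Page URL,Other column\n"
    "https://www.example.com/post/1,55\n"
    "http://other.net/links,20\n"
)


def read_column(csv_text, column):
    """Return the non-empty values of one named column from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = []
    for row in reader:
        # Missing or short rows yield None, so fall back to an empty string.
        value = (row.get(column) or "").strip()
        if value:
            values.append(value)
    return values


google_sites = read_column(GOOGLE_CSV, "Site")
ahrefs_urls = read_column(AHREFS_CSV, "Referring Page URL")
print(google_sites)
print(ahrefs_urls)
```

With real files you would read the text from disk instead of using the inline strings.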

 

 

Step 4 Clean up the data for analysis

As Google only provides the domain, we need to clean up the URLs from Ahrefs in order to do the comparison.

The first thing we need to do is remove the http, https and www.

You might want to make column E wider.

To do this, double click the right-hand border of the E heading at the top and this will make the column as wide as the longest URL in the list.

Next, click once on E2. Hold Ctrl and Shift and press the Down Arrow key to highlight the column.

Now in your menu ribbon at the top click “Replace”.

 


 

In the “Find what” box, type https:// and leave the “Replace with” box empty. See the image below for an example.

 

 

Now click “Replace All” and this will remove https:// from all the URLs that have it.

In the same “Find what” box, remove the https:// and type in http:// (i.e. no s on http).

Now click “Replace All” to remove http:// from all the URLs.

In the same “Find what” box, remove http:// and type www. (don’t forget the dot at the end of the www).

Now click “Replace All”.

 

 

The front of the URLs in that column should now be clean.
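The same three replacements can be sketched in Python. This is an illustration only, and the function name strip_prefixes is our own. Note one difference: Excel’s Replace All removes the text wherever it appears in the cell, while this sketch only removes it from the front of the URL.

```python
def strip_prefixes(url):
    """Remove a leading https://, http:// and www. from a URL string."""
    # Order matters: the scheme has to come off before "www." is at the front.
    for prefix in ("https://", "http://", "www."):
        if url.startswith(prefix):
            url = url[len(prefix):]
    return url


# Made-up example URLs.
for raw in ("https://www.example.com/post/1", "http://other.net/links"):
    print(strip_prefixes(raw))
```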

 

 

Now we need to clean the other end of the URL.

Click your mouse on cell F2.

In cell F2 type the following.

=FIND("/",E2,1)

This finds the first forward slash in the URL and returns its character position. Make sure the quotes around the slash are plain straight quotes, or Excel will reject the formula.

Now click on cell G2

In cell G2 type the following

=LEFT(E2,F2-1)

This tells Excel to take the characters from the left of the URL, stopping at the character just before the forward slash.

You should now see a clean domain for the URL in that first row.
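The FIND and LEFT pair has a close Python equivalent, sketched here under the assumption that the scheme and www. have already been removed. The function name domain_only is hypothetical. Unlike the Excel version, this sketch returns the whole string when there is no slash at all.

```python
def domain_only(cleaned_url):
    """Keep everything before the first forward slash."""
    # str.find is zero-based and returns -1 when there is no slash.
    slash = cleaned_url.find("/")
    if slash == -1:
        return cleaned_url
    return cleaned_url[:slash]


# Made-up example values.
print(domain_only("example.com/post/1"))
print(domain_only("other.net"))
```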

 

 

Now we copy these two formulas down the page so that the same process is applied to all the other URLs.

To do this, highlight the two cells, F2 and G2.

You will see a green box around both cells, like the image below, with a green dot in the bottom right corner.

 

 

Double click the green dot.

That should then copy these two columns down the page for as many URLs as you have in your list.

So now you should have a nice clean list of domains.

 

 

The last step before we match is to remove duplicates, and Excel provides a function for this.

First we need to copy column G into column H as values, because column G contains formulas and we need plain text for the analysis.

Click on cell G2.

Hold Ctrl and Shift and press the Down Arrow key to highlight the column, then right click and click “Copy”.

Click on H2.

Right click your mouse and click “Paste Special”.

Now choose “Values” and click “OK”.

 

 

This will have pasted a copy of the values into column H.

Now click on H2.

Now hold Ctrl and Shift and press the Down Arrow key to highlight the column.

 

 

With the column highlighted, click “Data” in the top menu and then click “Remove Duplicates”.

This will ensure you have each domain only once.
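In a script, the same de-duplication can be sketched with a dictionary, which keeps the first occurrence of each domain in its original order. The function name unique_domains is our own, and lowercasing the values is an assumption of this sketch rather than something Excel is doing above.

```python
def unique_domains(domains):
    """Drop repeats while keeping the first occurrence of each domain."""
    # Assumption: lowercase everything, since domain names are case-insensitive.
    cleaned = (d.strip().lower() for d in domains if d.strip())
    # dict.fromkeys keeps insertion order and discards later repeats.
    return list(dict.fromkeys(cleaned))


# Made-up example values.
print(unique_domains(["example.com", "Example.com", "other.net", "example.com"]))
```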

Now in I2 type the following.

=MATCH(H2,$A$2:$A$69,0)

You will need to replace $A$69 with the last row of data you have in column A.

So, for example, if the Google data in column A goes down to row 250, then your match would be as follows.

=MATCH(H2,$A$2:$A$250,0)
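The lookup that MATCH performs can be sketched in Python as a set membership test, which also avoids editing a row range by hand. This is a rough equivalent only: the function name match_flags and the sample domains are hypothetical, and both lists are assumed to be cleaned to the same form already.

```python
def match_flags(ahrefs_domains, google_sites):
    """Return (domain, flag) pairs: 1 if Google also lists the domain, else 0."""
    google_set = set(google_sites)
    return [(domain, 1 if domain in google_set else 0) for domain in ahrefs_domains]


# Made-up example lists.
google_sites = ["example.com", "blog.example.org", "news.example.net"]
ahrefs_domains = ["example.com", "other.net"]
for domain, flag in match_flags(ahrefs_domains, google_sites):
    print(domain, flag)
```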

 

 

Step 5 Doing some analysis

In column I, we now have a list of matches (those with a number) and non-matches (those showing #N/A).

We now want to count the number of domains that match. If you have a short list you can count by eye, but to make the process easy for any list, let’s do the following.

Click the cell J2.

In J2 type the following.

=IF(ISNA(I2),0,1)

This checks whether the match in column I returned an #N/A error. If so, it places a 0 in the cell; if there is a number (no error), it places a 1.

Copy this cell down as far as you have domains in column H.

Now at the top of column J, click cell J1.

Type in the following in cell J1

=SUM(J2:J25)

You will likely need to replace J25 with J followed by the number of your last row of data.

For example, if your data goes to row 897, then edit the formula to be

=SUM(J2:J897)

You should now have a number at the top in J1 which is telling you how many domains matched the Google List.

 

 

Now we need to count the number of domains in our cleaned Ahrefs list as well as the number of domains in the Google list.

To do this we will use an Excel function called COUNTA.

Click your mouse on cell H1

In cell H1 type the following

=COUNTA(H2:H25)

As with the other columns, the number of rows of data will vary, so you will need to edit the H25 to be H followed by the number of your last row of data, as we have done previously.

 

 

Now do the same process for column A.

Click on cell A1 and hit delete to remove the column heading “Site”

Now type the following, once again amending the final number to the number of rows of domains you have.

=COUNTA(A2:A69)

Click cell I1.

Now type the following.

=J1/H1

Now set that cell to be a percentage by clicking the % symbol as per the image below.

 

 

The value in I1 is the share of your cleaned Ahrefs domains that matched the Google list. Now we repeat this process to get the share of domains in the Google list that matched.

Click cell K1.

Type the following into cell K1.

=J1/A1

Now turn this to a percentage as we did in the last step.
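The counts and the two percentages can also be computed in a few lines of Python. This is a sketch under the assumption that both lists are already cleaned; the function name match_rates and the sample domains are made up and are not our real data.

```python
def match_rates(google_sites, ahrefs_domains):
    """Return (matched, share of Ahrefs list, share of Google list)."""
    google_set = set(google_sites)
    ahrefs_set = set(ahrefs_domains)
    matched = len(google_set & ahrefs_set)
    # Guard against empty lists so the division cannot fail.
    share_of_ahrefs = matched / len(ahrefs_set) if ahrefs_set else 0.0
    share_of_google = matched / len(google_set) if google_set else 0.0
    return matched, share_of_ahrefs, share_of_google


# Made-up example lists.
google_sites = ["a.com", "b.com", "c.com", "d.com"]
ahrefs_domains = ["a.com", "x.com"]
matched, of_ahrefs, of_google = match_rates(google_sites, ahrefs_domains)
print(f"Matched: {matched}, of Ahrefs: {of_ahrefs:.0%}, of Google: {of_google:.0%}")
```

The first share corresponds to cell I1 and the second to cell K1 in the spreadsheet.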

Reading the Data

So in our example we had 68 domains in the Google List.

In our cleaned and de-duped Ahrefs list we had 24 domains.

Of the 24 domains in the Ahrefs list, only 12% matched the Google list!

Limitations on the hypothesis

There are some limitations on the hypothesis.

  1. Google will only provide you with data for the first 1000 links.  If you have a website with a lot of backlinks then it is harder for us to test this hypothesis on your website.
  2. The analysis assumes Ahrefs is really good at finding all backlinks.  However, this is not the case, because even in the sample in this article Google had identified a number of linking domains that Ahrefs had not.  Ahrefs does a pretty good job, so for this purpose it’s a good way to start testing the hypothesis.
  3. Google does not provide any metrics around its data that point to quality, which is why this is a hypothesis.  But we can take the conspiracy theory view and assert that Google does not allow you to download your links through the API, and limits the data to 1000, because the link data has a quality aspect.  Of course, it could be that they just don’t want us downloading mountains of data.
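Since the second limitation means each tool sees domains the other misses, it can help to list all three groups rather than only the matches. A minimal sketch, with a hypothetical function name and made-up domains:

```python
def compare_lists(google_sites, ahrefs_domains):
    """Split domains into those seen by both tools and by only one of them."""
    google_set = set(google_sites)
    ahrefs_set = set(ahrefs_domains)
    return {
        "both": sorted(google_set & ahrefs_set),
        "google_only": sorted(google_set - ahrefs_set),
        "ahrefs_only": sorted(ahrefs_set - google_set),
    }


# Made-up example lists.
result = compare_lists(["a.com", "b.com", "c.com"], ["a.com", "x.com"])
for group, domains in result.items():
    print(group, domains)
```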

Conclusion

So what are your numbers?  How good is the quality of your links?

Help us do some quick and dirty analysis and put your numbers, plus any perspectives, in the comments.

If you can, use the format:

Number of domains in the Google List- 68
Number of domains that matched – 12%  (measure of the quality of your links)

More importantly, if you are willing to share your data, please email your Google CSV file to us at support@omologist.com, along with the domain these links relate to. Our plan is to run this analysis at scale and provide a list of domains.

Thanks, and we look forward to seeing what other people get!