This week MajesticSEO released a new “wildcard” feature to their site explorer.
What this does is allow you to explore the links to all pages of a site under a specific folder or part URL.
This means that even if a web page is a 404 error page itself, if there are a range of pages under that URL syntax that you would like to analyse, then you now can.
The primary way I plan to use this tool is for content analysis and planning; In order to test this out I used it to analyse the popularity of every post (in terms of links) on the Moz blog in one spreadsheet (Moz blog being the most popular/trafficked resource in the SEO industry).
But first, the important stuff: Being British, the first site that I thought was ripe for testing this on was the premiership website to see which teams have the most links.
The URL of the above page is:http://www.premierleague.com/en-gb/clubs.html.
In order to check that the folder structure does not alter when you go into an individual club page I had a quick check on the URL’s:
http://www.premierleague.com/en-gb/clubs/profile.overview.html/everton
http://www.premierleague.com/en-gb/clubs/profile.overview.html/arsenal
Good – they are consistent.
This means that we can run the analysis using the URL syntax:http://www.premierleague.com/en-gb/clubs/
Another good feature of this tool is that it takes into consideration the trailing slash and the non trailing slash version and also can deal with SSL (HTTPS) pages. The screenshot below shows the set up for the report:
Steps:
- Go to site Explorer
- Select the radio button “Path (URL*)”
- Click on the “Pages” tab
As you can see from the screenshot above Hull City are more popular than Man U! I should contact the Hull Supporters club about this, surely I could get a link out of that 😉
Real life SEO uses for the Majestic Wildcard
1. Determining which sites like what types of content
Let’s start a little closer to home. I decided that I want to know which sites and blogs in the SEO community particularly like certain pieces of content. Therefore I ran an analysis on the 1,683 posts on the Moz.com/blog as this is the most popular resource in our industry.
So this is nothing new really – we can see all of the top resources on the Moz blog and how many links and how many referring domains they each have pointing at them.
You can click on the little icon to the right of the link icon to run a report to analyse the backlinks for each of these pages separately to see which sites link to them but what I really want is a report that tells me which sites link to which posts so that we can see aggregated data for each of the linking sites and see patterns. Moz.com has 6,036 sites linking to their blog in total (according to MajesticSEO) so which types of post do each of these sites tend to link to? What are the patterns?
This information can help you to quickly formulate target sites for outreach depending on the content that you are producing.
Fortunately; if you navigate to the “Reports Tab” in MajesticSEO then you can run a report on the Moz.com/blog syntax to get a report on all of the links to all of the pages in the blog: Set up like this:
So; just as a reminder of our aims: we want to find out which blogs are linking to which pieces of content on the Moz blog so that we can determine their interests.
The below screenshot shows that in total there are 6,036 sites linking to the Moz blog with a total of 294,765 links. There are obviously a lot of blogroll links there!
Next we need to get the report into Excel so that we can start playing with it.
The first thing that I wanted to do in Excel is to extract the linking domains from the actual links. This is easy to do using 2 functions in Excel:
- Run a find and replace (CTRL + F) on the link source column.
- Use the text to columns function using a trailing slash to extract the domain only and to get rid of all sub folder on the links – see screenshot below:
Running That function leaves you with a spreadsheet like this; from which you just need to delete all columns after the domain to the last one with something in it:
There are a few more steps to cleaning this data but I do not want this to turn into an Excel tutorial so I will just explain what I did and why, if anyone is confused or wants to know how something was done then please just leave a comment below and I will give the detail.
Once I had the domains I needed to remove the duplicate cases of the rss driven type links – where for example every page in sitexyz.com was linking to every post on the Moz blog. Those details tell us nothing really.
There were also many other things to clean up such as people linking to Moz.com posts with the google analytics tracking tags intact or certain resources with multiple pages etc. I cleaned all of this up bit by bit (did not take more than a few minutes).
What I wanted to see was which sites link to certain posts and which posts get the most interesting sites linking to them.
3 hours later
Arrrrgggghhhhhh – Unless you have a powerful computer that can handle very large Excel files then do not try running this analysis on a large site like Moz. I almost gave up with each crash of Excel. The great thing is that once you have the data in a pivot table it is really valuable; now I can see all kinds of things about what content pulls in links on the Moz blog. But – it is just incredibly slow to process any filters that I apply!
This is what the pivot table looks like:
This image shows just 9 articles and around 30 sites but there are in total 1,683 articles along the top (X Axis) and 6,036 sites linking to them (Y Axis). There is a shit load of data here but it is heavy in Excel.
What I would really like to do is to transpose it onto the opposite axis so that I can easily filter by specific keywords in the article titles so see which sites like to link to that type of content. But unfortunately in order to make the spreadsheet more flexible I deleted the working pivot sheet once I had copied it so would need to do the whole damn thing again in order to view it that way as the Excel transpose function will not work with this large a file.
Here are a few insights:
Sites that link to the Moz blog the most
Seeing which sites/blogs like certain posts
You can look at which sites linked to certain posts such as this one about tools for creating infographics. This will help you to determine sites that you could carry out outreach to if you are doing something similar such as you have a service or a tool to create infographics.
Or viewed another way – Say that you want a link from 20 or 30 sites in the industry and you want to get a quick snapshot of what content they like without doing a link analysis for each site individually. You can use this pivot table to filter by your target sites and see which posts they link to:
It is also possible to do this on aggregate by filtering the file for all posts that have “infographic” or “not provided”, or any other topic that you cover on your blog, in the title and to see which blogs link the most to these types of post. They are the ones that are most likely to link out to those topics on other blogs such as yours, unless of course they are just one of those sites that simply links to everything on Moz.com, but fortunately there seem to be few sites like that in our industry.
Moz.com have many themed posts such as “10 ways to …..” or posts that begin with “how to” or “why does” – just scanning through the post names you can see so many potential areas for analysis.
I do think that the best though is to filter the Moz.com/blog content by specific keywords that relate to whatever you are doing and find the sites that link the most to that type of content and to then reach out to them first.
No – I have not validated any of this or tested to see what success if any I get. I just saw the Majestic tool and wondered what I could do with it and Excel.
I am in the process of uploading the spreadsheet to Google Docs so that anyone who wants to take a look can freely do so. I will paste the link in here as soon as it has finished loading (if it finishes loading – more than 1 hour now)!
You can read Majestic’s blog post about the new feature here.