Whatever the reason, this is a perfect example of why raw correlation is flawed

Post by **zihadhasan019** » Sun Dec 22, 2024 6:19 am

This graph is the clearest illustration yet of why it's so important to build systems more advanced than simple, direct correlation. According to this chart, employing the query term in the path or filename of the URL is actually slightly negatively correlated with ranking highly, while the subdomain appears largely useless and the root domain has strong correlation. Granted, all of these (except the root domain) are on a very narrow band of the x-axis, but SEO experience tells us that using keywords in the name of a page is a very good thing, for both search rankings and click-through rate.

Whenever we see data like this, a number of hypotheses arise. The one we like best internally right now is that the URL path/filename data may be skewed by the root domain keyword usage. Essentially, when a root domain name already employs the keyword term, the engines may see those who also employ it in the path/filename as potentially keyword stuffing (a form of spam). It may also be that raw correlation sees a large number of pages with less well-optimized URLs performing well due to other factors (links, domain authority, etc.).

It's also true that most sites that employ the keyword in the path/filename don't use it in the root domain as well, so the negative of the one may be mixed in with the positive of the other.

Whatever the reason, this is a perfect example of why raw correlation is flawed, and why a greater depth of analysis - and much more sophisticated models - are critical to getting more value out of the data.

Can We Build a Ranking Model that Gives more Actionable Takeaways?

To get to a true representation of the potential value of any given SEO action, we need a model that imitates Google's.
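The mixing effect described above can be illustrated with a small Python sketch. Everything in it is invented for illustration: the page counts, the `ranking_score` formula, and the assumption that a root-domain keyword is worth far more than a path keyword. It is not the data or method behind the chart. It only shows how a factor that helps within every group can still come out negatively correlated overall when it mostly appears on pages that lack a stronger factor.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mean(values):
    return sum(values) / len(values)


# Hypothetical counts: (root_has_keyword, path_has_keyword, number_of_pages).
# Assumption taken from the post: pages with the keyword in the path
# mostly do NOT have it in the root domain.
CELLS = [(1, 0, 40), (1, 1, 10), (0, 0, 10), (0, 1, 40)]


def ranking_score(root, path):
    # Invented scoring rule (higher is better): both factors help,
    # but the root-domain keyword helps far more.
    return 5 * root + 1 * path


pages = [
    (root, path, ranking_score(root, path))
    for root, path, count in CELLS
    for _ in range(count)
]


def path_effect(rows):
    """Mean score of pages with the path keyword minus those without."""
    with_kw = [score for _, path, score in rows if path == 1]
    without_kw = [score for _, path, score in rows if path == 0]
    return mean(with_kw) - mean(without_kw)


# Raw view: ignore the root domain entirely.
raw_r = pearson([p for _, p, _ in pages], [s for _, _, s in pages])
raw_effect = path_effect(pages)

# Controlled view: compare only pages with the same root-domain status.
effect_by_root = {
    root: path_effect([row for row in pages if row[0] == root])
    for root in (0, 1)
}

print("raw correlation (path keyword vs score):", round(raw_r, 3))
print("raw mean difference:", raw_effect)
print("mean difference within each root-domain group:", effect_by_root)
```

In this toy setup the raw correlation for the path keyword is negative, yet within each root-domain group the pages that use it score higher. Holding the other factor fixed is the simplest form of the "greater depth of analysis" the post argues for; a real ranking model would have to do this across many factors at once.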