This has been a frequent topic, and a piece of received wisdom, in techie circles for the past five to ten years. Over time, Google (and other search engines) have slowly lost the war against SEO spam and gotten worse and worse at surfacing high-quality, 'organic' content. Reasonable minds can disagree about how much of this was allowed to happen versus how much they simply weren't capable of stopping, but the trend is clear.
At the same time, Google has been intentionally designed to be less and less literal with search queries, trying instead to infer what the user 'really' wants. That probably does (or at least did) improve the average quality of results for unsophisticated users, but it makes search much worse for power users and for more precise queries.
The real problem is that the Internet has just changed since the early days of Google, and the algorithms that used to do a good job of finding quality results no longer work on the present-day Internet, and nobody has figured out better algorithms that do work (other than “google, but append ‘reddit’ to the query”).
Most people on the Internet these days don’t remember what search was like before Google invented the PageRank algorithm, but the TL;DR is that it sucked. People are so used to Google now (or at least, Google as it used to be) that everyone thinks of quality search results as something you just get automatically; if Google’s search results are bad now, it must be because Google is intentionally not giving you the right results. But that’s not true! Getting quality search results is hard! (Consider the fact that searching for files on your hard drive (or even your Google Drive) has never worked as well as searching on the web. And corporate intranets invariably have terrible search result quality.)
The pre-Google search engines were pretty bad at figuring out which pages actually contained the information you were looking for. I remember that with AltaVista, the expected workflow was that you’d run a search, get back a handful of mediocre results, and then use a UI to refine the query (“more like this result, fewer like that one”). Users were expected to understand Boolean query operators (“python AND programming AND NOT (monty OR snake)”) and so on. And because the search engines were bad and the Internet was a lot smaller, many people found sites through directories that organized websites into hierarchical categories rather than by searching at all. (That’s what Yahoo! originally was.)

PageRank (Google’s algorithm for ranking search results by tracking which pages linked to which other pages) was a game changer; suddenly you could search for something, and Google would actually find exactly what you wanted on the first try! All of the older search engines basically went out of business immediately (or became wrappers around Google), and “google” became synonymous with “web search” (at least until enough of their patents expired that it was possible to compete with them).
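The link-counting idea behind PageRank can be sketched in a few lines of code. This is a toy power iteration over a made-up five-page link graph; the page names, damping factor, and iteration count are illustrative assumptions, not anything from Google's actual (and vastly more elaborate) implementation:

```python
# Toy PageRank via power iteration. A page's score is the probability that a
# "random surfer" is on it: with probability `damping` they follow a link
# from their current page, otherwise they jump to a random page.
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its "vote" evenly among its outbound links.
                share = damping * rank[page] / len(outgoing)
                for dest in outgoing:
                    new[dest] += share
            else:
                # Dangling page (no outbound links): spread its rank evenly.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank

# Hypothetical mini-web: several small pages all link to one useful site.
web = {
    "personal-home-page": ["good-site"],
    "fan-page": ["good-site"],
    "web-directory": ["good-site", "personal-home-page", "fan-page"],
    "good-site": [],
}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # prints "good-site"
```

Run on this toy graph, "good-site" ends up with the highest score because several independent pages point at it. That is precisely the "lots of people link to good sites" assumption the essay describes, and precisely the assumption that stops holding once most links are commercial.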
And for a while, things were good.
The problem, though, is that PageRank is not a “find exactly what a user is searching for in any data set” algorithm. It’s a “find exactly what a user is searching for in the 90s/00s Internet” algorithm. It depends on the idea that lots of people will link to good web sites and few people will link to bad web sites, in a way that was true of the 90s/00s-era Internet, with its personal home pages and fan sites and special-interest forums and giant hierarchical directories, but which is much less true of the present-day Internet, which is almost entirely dominated by people trying to make money and direct traffic only to other sites that they own.
And as a result, Google search has stopped working well. To some extent this is because there are fewer high-quality not-trying-to-sell-you-anything web pages out there to find results on, but it’s also because the basic “lots of people will link to good pages and few people will link to bad pages” assumption that is the foundation of PageRank is just completely not true any more. (To some extent, PageRank was even self-destroying: in the old days, you needed those giant Internet directories and such to point out the good data for people, and PageRank could consume that to figure out the good link destinations. But as everyone came to depend on search instead, there was less need for people to explicitly link to sites that they found useful, which in turn meant there was less data for PageRank to learn from.)
In theory, there might be some other algorithm that does a better job of extracting signal from noise on the present-day Internet, but at this point everyone has basically given up on trying to find it, and is hoping AI will save the day instead.
Have your Google search results changed?
Reviewed by Kanthala Raghu on January 14, 2025