the death of search

jul 26 2024

in 1998, Google revolutionized the way we browse the Web by creating the first actually good search engine. with an advanced page ranking algorithm, they provided better results than anyone else; and with a streamlined interface, you'd get everything you wanted, with nothing you didn't.

go to Google today, however, and you'll find quite the opposite. you'll be given a giant box of AI-generated lies, a page full of ads, and finally one or two "articles" of meaningless SEO sludge. if you try to look up anything remotely related to a product, you'll get nothing but results trying to sell things to you.

so where did it all go so terribly wrong?

for the first fifteen years or so of Google's existence, they fought hard against SEO. any time someone tried to game the system, they'd change the ranking algorithm to punish that behavior. from link farms to keyword stuffing, Google quickly caught on to these primitive techniques and stopped them from working.

by the mid-2010s, however, Google seemingly stopped caring. this is hard to prove, of course, since Google keeps its algorithms secret; but anyone who has been using the internet since then would probably agree. it makes sense why they would stop, too: with a comfortable monopoly on web search¹, why should they care?

even then, what makes Google miserable to use now isn't just that the results suck ass; it's that the whole page around the results sucks too. perhaps the most obvious example of this is the ads. originally, Google's ads were in fully highlighted boxes, making the sponsored content obviously different from the actual results. later, this was reduced to a small yellow label; now, the label is almost invisible.

but is Google really the only problem here?

think back to the early days of the Web. in order to express yourself, you'd simply make your own website. the pages were static and stupid; the most advanced feature you'd get was perhaps a guestbook. while fine for sharing your own ideas, these pages didn't let you communicate with others.

communicating through the Web required server-side tricks, and so this "Web 2.0" changed the relationship between server and client. no longer did the server send the same data to every client; now, the data sent to each client depends on who they are, and what they can prove.

now, this ability is not all bad. by letting different users see different things, a site can give each user the ability to post their own things directly, instead of just consuming the content. without it, the social media we know today might never have happened!²

instead, it ushered in a much subtler change: the ability to control who can access what information, and how. while seldom used at first, the ability to keep certain websites private has become one of the core uses of such account systems today. even "public" groups on Discord require you to register an account and prove you are "human" by default, whereas in the days of forums, such protection was almost never applied to the site as a whole.

with many types of online interaction pushed outside the Web and into dedicated apps, what else is there left to search? that brings us to the final issue at hand: the rise of generative AI.

language models are, of course, not new. and as computing power grows exponentially, so too does our ability to model language with it. but a funny thing started to happen around the release of OpenAI's ChatGPT: these language models got good enough to fool not just people, but search engines, too.

the more worrying trend, of course, was where the data to train these language models was coming from: crawling the Web. many webmasters were rightfully furious at the breach of trust, and retaliated by blocking crawlers from browsing their pages entirely. since search companies have been eager to sell out to AI startups, there is simply no way to block one but not the other, and so your page is lost to both.

most notable, however, was Reddit's behavior. Reddit, being a for-profit company, had no issue with turning its users' data into a quick buck by selling it. what they did care about was preventing supposed freeloaders from accessing their pages, for whatever reason. the result? to crawl Reddit for search, you now must pay them. of course, their site has its own search tool (one which only shows you their ads), so that may explain this behavior.

with all this noted, you may find that the natural language chat interface of ChatGPT is an improvement over a standard search bar. after all, allowing you to simply ask a question and get an answer — instead of having to fumble through pages upon pages of things that don't even seem very relevant — sure seems like an improvement.

but is a fuzzy memory of the Web enough to replace a proper search engine? not directly. after all, the core value of the Web is in the ability to discover page after page, to go from one site to another with ease. perhaps the future is using a large language model to query an actual search engine to find actual information, like asking a librarian to help you find a book.

in the end, there is no single nail to blame; all we can be sure of is that the coffin is sealed.

1. in the West, at least. Baidu and Yandex have their own turf in China and Russia, respectively.

2. perhaps it shouldn't have, honestly.