The future of search
There are many reasons I’m grateful to Josh Hall for inviting me to the Foresight conference, but one of the main ones is that now, I get to advertise myself as a genuine expert on the Future.
I realized this when a small party, including both Professor Hanson and me, went to one of the Stanford strip-malls for a deli lunch. “So what are you guys speaking about?” asked a lady in the line. A fiftyish Palo Alto tennis wife. “What?” I said. “What? How about some ether?” Then I looked down, and noticed that someone had stuck a “Speaker” tag on my “Foresight 2010” badge. There was only one possible answer. “The Future,” I said.
And when I think of the Future, I am of course reminded of one of the real speakers at Foresight—the incorrigible Paul Saffo, who is perhaps one of the five glibbest people on the planet. When Paul Saffo opens his mouth, you see hundred-dollar bills—rowed-up like a shark’s jaw. Paul Saffo is not a dumb guy by any means, but his benjamin flow is definitely in rather than out. I keep telling my wife I’m working on it.
At the risk of sounding like Paul Saffo, I have a subject that I think will please the general UR community: the future of search. The Future of Search! Are we ready for that? Can we go there? The Future of Search? Or is this just too presumptuous?
While apparently a prominent expert on the Future, I actually know nothing at all about Search. Frankly, the problem has never really interested me. Nothing in software is really interesting except system software design. Search (at least, search as we know it) is a heuristic algorithm—a class of solution for which both education and career have taught me nothing but contempt.
In short, as an ancient system software guy (I started CS grad school in Berkeley the same year Sergey Brin started at Stanford), my attitude toward search is much that of a cavalry officer toward the machine-gun. A Victorian cavalry officer. I can tell you exactly what I thought the first time I heard about full-text search: it made me think of library science. In fact, it still does.
So in reality, I am anything but an expert on either Future or Search—let alone the Future of Search. Rather, I confess some expertise as to the Past; and I know a thing or two about Protocols, Kernels, Languages, etc., etc. On Past Kernels I am really at my best. With that said, let’s look at the Future of Search.
There is only one real question about the Future of Search. Will it be Google, or won’t it? And if it won’t be, what will it be? In other words: who kills Google, and how? Or is the Google Age doomed to last forever?
Since I have been kicking around this town for almost twenty years, I have seen the colossi come and go. The Google of 20 years ago, for instance, was SGI. Where is SGI now? Managed into the ground. (As it so happens, Blogger videos were recently broken for over two weeks. Reading that thread, I see instantly that Google has no concept whatsoever of QA. And perhaps that works for you, Google! Or perhaps it’s worked in the past. But…)
The Google Age could end just because Google grows old and sclerotic, despite its vast pool of brains, and starts regularly screwing up like this. In this case, it will be replaced by a younger, crisper Google. This matter does not really interest me—a mere corporate transition. (Anyway, despite little crap like this, Google is still so far as I can tell an extremely effective operation.)
As a pseudo-expert on the Future of Search, I cannot tell you when the Google Age will end, or who will end it. All I can tell you is what will end it. This is probably what you wanted to know, anyway.
The Google Age will end when the application we presently know as “search” is replaced by some other application, which does the same job for the user, but (a) does it much better, and (b) does it in a way that leaves no role for Google or anything so profitable.
Ancient geek that I am (my first tech bubble was the CD-ROM bubble), I have seen this process dozens of times. It’s called commoditization. Google makes money because search is extremely difficult to implement, and just about impossible to implement well. It makes money in the only righteous way: solving a hard but necessary problem.
But once this problem becomes easy, such a company has a tough time of it—even if that company itself defined the problem and led the market in solving it. Even if that company itself makes the problem. For instance: if search becomes easy to implement, users start expecting ad-free search. Problem goes away; company goes away.
And Google—a collection of atomic individuals, really in fact among humanity’s finest, who have at great profit to themselves clustered together on Earth’s surface, in this place, under this name, eating this lunch, to solve the problem of ad-supported search—dissipates into air, like spirits, like smoke, like time itself. Where are the SGIs of yesteryear? There were so many good people at SGI.
Nonetheless, commoditization has not happened to Google. It is not about to happen. Because as anyone at Google (or any of its competitors, none of which is anywhere near killing it) can tell you—search is and will always be very, very, very hard. At least, search as we know it.
Commodity search, if there is any such thing, is clearly the Future of Search. But commodity search cannot be search as we know it. It cannot be the same technical problem that today we know as “search.” That is, it cannot be the library-science problem which Google is solving. Rather, it must be a generic utility.
Commodity or utility search must be a solution to some different problem, which fulfills roughly the same user need as Google search. Clearly, utility search can only be system software: a platform, not an algorithm. At least, so my prejudices inform me!
What is search doing, anyway? The search experience I, the user, need: I type a line of text into a box. In response, I get a list of links relevant to that text, listed in order of importance.
Of course, producing this metric—importance—is the hard problem in search. The problem of crawling and indexing the Web, while unnecessarily annoying due to certain design mistakes by Tim B-L, is not a hard problem. Okay, it is a hard problem. But it is not a really hard problem, and it was solved well before Google.
Importance is a product of two factors: relevance and reputation. Relevance is nontrivial, but not hard. Reputation is hard. At least, as the problem is presently defined.
As everyone knows, the very hard problem that Google is solving is computing global reputation (i.e., PageRank) from the graph of all HTML links on the Web. Its algorithms are now considerably more refined than the original PageRank, of course. But the problem is what it is.
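For readers who have never looked under the hood, here is a minimal sketch of the textbook version of the idea: reputation flows along links, and the computation repeats until it settles. This is the published power-iteration form of PageRank, not whatever Google runs today; the damping factor of 0.85, the iteration count, and the three-page toy graph are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank (illustrative sketch only).

    links maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with reputation spread evenly across all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank equally among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Toy graph: "a" and "b" both link to "c", and "c" links back to "a".
toy_ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

In the toy graph, "c" ends up ranked above "a", and "a" above "b", purely from who links to whom. No human being ever graded anything, which is both the charm and the weakness of the approach.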
In this problem as defined by the age of Google, just distinguishing between actual content and spam is a difficult problem. Google is not a good producer of reputation data. It is a competent producer of reputation data—at best. And given the problem that Google is solving, mere competence is almost a miracle.
The Google Age ends when the Internet migrates to some new global reputation algorithm, and users switch to it for their searches. To trigger any such switch, the new algorithm must suck less, maybe by an order of magnitude. There is only one way of beating Google this badly: change the problem.
The obvious such change is some systematic advance to some form of editorial reputation—i.e., a reputation system in which reputation is generated not by passive algorithms, but by proactive human assessment. For example, consider one of the great achievements of Russian post-medievalism: Peter the Great’s Table of Ranks. If we could hire Peter the Great to crawl the Web and assign a rank to every page, we could get rid of Google.
Feudal search is exactly this: a different way to compute global reputation. Which does not require either Google, or Peter the Great. If feudal is too strong a word for you, you can say hierarchical. Feudal search is search which uses, as its quality metric, hierarchical reputation.
We cannot hire Peter the Great to crawl the Web. We can, however, force everyone to join a community. We can ask that community what it thinks of you; and we can ask Peter the Great what he thinks of that community. This puts far less load on Peter the Great.
But wait—we still don’t have Peter the Great. We can’t actually force people to join a community. No, but we can create a general-purpose namespace of extremely consistent general quality, which will attract high traffic from the legacy Web and thus be highly searchable, even through Google.
In other words, feudal search posits a content namespace which, because ranked feudally, is a much more desirable neighborhood than the Internet. At least, if you don’t want to wallow in the slums, you don’t have to. You will not turn the digital corner and find yourself in a digital favela. Eventually, all desirable content will move out of the anarchic slums and into this new, happy gated community. And junkies will be shooting up in the old Google building.
A feudal search engine (Feudle, perhaps) separates the task of reputation assignment into two levels. Feudle assigns reputation not to pages, but to communities—a much smaller task. For pages within a community, it defers strictly to the community’s own reputation system, connecting directly to it with an actual, standard API.
Thus we have two reputation values, perhaps on the unit interval, which multiplied produce a third value on the same interval: global reputation. More generally, every search engine assigns every community a reputation transform, effectively grading its grades.
Thus, as a user, my map of global reputation gives high ratings to high-reputation pages at high-reputation communities, medium ratings to high-reputation pages at low-reputation communities, etc. Doesn’t it seem to you that this makes sense?
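The arithmetic is simple enough to sketch. What follows is a minimal illustration, assuming both reputations live on the unit interval and that the combination is plain multiplication. The `Page` type, the function names, the guild names, and the choice to score pages from unknown communities at zero are all hypothetical, invented here for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    url: str
    community: str
    local_reputation: float  # grade in [0, 1], assigned by the community itself


def global_reputation(page, community_reputation):
    """Multiply the engine's grade for the community by the community's grade for the page."""
    # Assumption: a page whose community the engine has never graded scores zero.
    return community_reputation.get(page.community, 0.0) * page.local_reputation


def rank_pages(pages, community_reputation):
    """Order pages from highest to lowest global reputation."""
    return sorted(
        pages,
        key=lambda page: global_reputation(page, community_reputation),
        reverse=True,
    )


# The engine grades communities only (hypothetical values).
community_reputation = {"guild-a": 0.9, "guild-b": 0.3}

pages = [
    Page("low-page-good-guild", "guild-a", 0.2),
    Page("high-page-weak-guild", "guild-b", 0.9),
    Page("high-page-good-guild", "guild-a", 0.9),
    Page("loner", "no-guild", 1.0),
]

ordered = [page.url for page in rank_pages(pages, community_reputation)]
```

With these made-up numbers the high page at the good guild comes first, the high page at the weak guild lands in the middle, and the self-proclaimed genius with no guild comes last. Note that the engine never looks at an individual page's quality; it only holds the small table of community grades.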
Feudal search is feudal because, rather than computing a single democratic algorithm on the global, unstructured Web, it follows the natural hierarchical structure of all human institutions. Rather than passively computing rank from the random patterns of interaction in an atomized society, provide the institutions necessary for that society to recompose itself in an organized, aristocratic hierarchy. And stand back—the result will work a lot better.
Moreover, the analogy is historically correct. Just as it is not the king’s business to involve himself in a dispute between two serfs of the same baron, it is not Feudle’s business to decide which Blogger blogs are the best blogs. It is only the king’s business if two of his barons quarrel. Likewise, Feudle must compute the value of a Blogger reputation versus a Wordpress reputation. Like an admissions department deciding that an Andover B is worth as much as a Montclair High A, it might decide that a Blogger A is worth a Wordpress B—or vice versa.
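The admissions-office move can be sketched as a lookup table of adjustments. This is one possible shape for a reputation transform, assuming letter grades and a per-community shift measured in whole grade steps; the names `GRADES`, `shift_grade`, and `engine_view`, and the particular shift values, are hypothetical. A real engine could use any mapping it liked.

```python
GRADES = ["F", "D", "C", "B", "A"]  # ordered from worst to best


def shift_grade(grade, steps):
    """Move a letter grade up or down by a number of steps, clamped to F..A."""
    index = GRADES.index(grade) + steps
    index = max(0, min(len(GRADES) - 1, index))
    return GRADES[index]


# The engine's private opinion of each school's grading (made-up values):
# an Andover grade is bumped up one step, a Montclair High grade is taken as-is.
grade_shift = {"andover": 1, "montclair": 0}


def engine_view(community, local_grade):
    """Translate a community's own grade into the engine's global scale."""
    # Assumption: communities the engine has not assessed get no adjustment.
    return shift_grade(local_grade, grade_shift.get(community, 0))


andover_b = engine_view("andover", "B")
montclair_a = engine_view("montclair", "A")
```

Here `andover_b` and `montclair_a` both come out as "A", which is exactly the admissions department's judgment. The whole of the engine's editorial opinion fits in the `grade_shift` table, one entry per community.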
To continue the metaphor, Feudle’s job is easy because, rather than computing the quality of every high-school student in America, it only needs to compute the quality of every high school in America. It still needs a quality-rating algorithm, but this algorithm rates communities rather than their members. A much smaller problem. Of course, Feudle cannot exist today, because neither Blogger nor Wordpress, like hippie high-schools, assigns its bloggers grades.
If Feudle is such a great idea, why hasn’t anyone built it? There is no way to grade the grades, because there are no grades to begin with. The trick about feudal search is that, since it’s a platform, it faces a large chicken-egg problem. This is normal in system software. If your new language has no libraries, no one will use it. If no one uses the language, why write a library for it? Thus, since there is no local feudal reputation, there can be no global feudal reputation. Since there is no global feudal reputation, there is not much use for local feudal reputation.
The Web is just not optimized for feudal search. It is optimized for Google search. For one thing, feudal search requires a feudal search-reputation protocol—which doesn’t exist. Even if the protocol existed, the information behind it is often absent. Even local reputation on the Internet is anything but a solved problem.
For instance, Blogger is not in any sense a community, and it has nothing like a communal reputation system. Or rather, its reputation system is Google PageRank. (Last time I checked, I got the impression Google hates UR for its long posts—it assumes a blog with 10K-word posts is a spamblog. Bing knows we’re the real deal, and brings up UR as the first choice if you type “unq.” I will always love Google, however, for the fact that “true history of the American Revolution” produces, as first match, the true history of the American Revolution.)
If Blogger had a reputation system, Feudle could exist—indeed, Google, which after all owns and operates Blogger, might even stop using PageRank on Blogger pages and rely on Blogger reputations. But, of course, Blogger has no reputation system and Feudle does not exist. Chicken-and-egg.
So nothing like Feudle exists or can exist. You cannot go to Paul Graham and apply to start up Feudle as a startup. The world is not even ready to begin to be ready for feudal search. What, therefore, convinces me that it is... the Future?
While I am no Paul Saffo, I actually do have one test for whether I expect something to be around in the Future. Actually, it is two tests: one for things that exist, and one for things that don’t.
For things that exist, I ask: if this didn’t exist, would anyone invent it? For things that don’t exist, I ask: if this existed, could anything kill it? Thus I conclude that newspapers as we know them will cease, sometime in the Future, to exist. And I conclude that feudal reputation systems are, somewhere out there, the Future of the Internet.
Another way to say this is that I’m convinced that, if these systems existed, they would grow stronger rather than weaker. Since they do not exist, the incentive to create them must, for now, be quite weak; that is the chicken-and-egg problem at work, not evidence that they would be useless. However, once started, they should experience a network effect: as they organize, they grow larger, stronger and more desirable, sucking in both traffic and content. Eventually, there will simply be no content worth searching which remains outside this network.
Let’s narrow in on the Internet’s feudal future by looking, again, at Blogger. Blogger is actually a microcosm of the Internet, because it is not in any sense a community. Rather, it is a general-purpose service. There is no sense in which blogs A and B are closer to each other because they both use Blogger, not Wordpress.
In the Feudal Age, Blogger as it stands today would do quite poorly as a community. It is not, of course, a community. It would have no way to assign its blogs high-quality local reputations, and therefore would not earn high global reputation. Therefore, any high-quality bloggers on it would promptly flee to other communities in which their talents would be recognized. This would further decrease the communal reputation of Blogger, and so on—basically a power-dive on flames into the Pacific.
What could Blogger do to avoid this fiery end? Up its game, of course. Specifically, before its bloggers flee to greener pastures, it would have to solve the problem of feudalizing teh Internets, within its vast network of existing blogs.
It can only do this by replicating the feudal solution internally. Socially, Blogger is not and cannot be a single community. Or rather: as a single community, it could only be a totalitarian dictatorship. For instance, if Blogger was a single community, either left-wing political bloggers would dominate right-wing political bloggers, or vice versa. Expecting to integrate wingnuts and moonbats into a single community is like expecting to integrate alligators and peccaries into a single zoo exhibit. And this is just the start.
Rather, assessment gets better as it gets narrower. Basketball blogs can only be graded, presumably by other basketball bloggers, relative to other basketball blogs. They certainly cannot be compared to UR—which is not a very good basketball blog. (Disqus makes this mistake by assigning commenters global reputation across all Disqus comments; each forum should assign its own rank, or no one can possibly take the process seriously.)
To produce a high-quality reputation signal, Blogger must self-categorize. As a single community, it is a joke; for its reputations to matter, they must be accurate; for them to be accurate, they must be local; for them to be local, Blogger must fragment itself into hundreds, if not thousands, of self-rating communities. Each of these communities must perform its own quality control, and be rewarded for achieving quality by a communal grade across the whole service.
For instance—this suggestion, while nowhere near appropriate, is perhaps more appropriate than one might think—Blogger could adopt another content characterization, one into which enormous human effort was invested: the Usenet newsgroup hierarchy. It could say: blogs in Blogger are now organized into guilds, each of which is named after a newsgroup in the main Usenet hierarchy. Pick a newsgroup, and join its guild.
In each of these guilds, however organized, bloggers must be in some way ranked anonymously by their peers. If this devolves into a poisonous and corrupt travesty, as of course any such process can, it is time for a new guild. However, guilds are expected to function as aristocracies; it is in the interest of the entire guild to obtain a good reputation, and most of all in the interest of the guild’s leaders.
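How a guild tallies its peer rankings is the guild's own business, but the simplest possible version is easy to sketch. This is a minimal illustration, assuming each member scores other members on the unit interval and a member's local reputation is the plain average of the scores received. The function name, the tuple format, and the rule that self-ratings are discarded are hypothetical choices, not a proposal for the standard technology.

```python
from collections import defaultdict


def peer_reputation(ratings):
    """Average the scores each guild member receives from peers.

    ratings is an iterable of (rater, ratee, score) tuples, score in [0, 1].
    The result maps each rated member to a local reputation; rater identities
    are not included in the output.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rater, ratee, score in ratings:
        if rater == ratee:
            # Assumption: nobody gets to grade their own work.
            continue
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
        totals[ratee] += score
        counts[ratee] += 1
    return {member: totals[member] / counts[member] for member in totals}


# A three-member guild (made-up names and scores).
guild_ratings = [
    ("ann", "bob", 0.8),
    ("cat", "bob", 0.6),
    ("bob", "ann", 1.0),
    ("ann", "ann", 1.0),  # self-rating, ignored
]
local_reputation = peer_reputation(guild_ratings)
```

A plain average is trivially gamed by a clique, which is precisely the poisonous-travesty failure mode. The code does nothing to prevent it; the remedy in the feudal scheme is social, namely a new guild.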
Therefore, in the Future, if you are blogging for the public audience and want random strangers to find and read your blog (not every blogger does, of course), you will have to find some category or community in which you feel it belongs, and submit your blog for review by your peers in that category. Or, of course, create a new community if none suits you. Standard technology for this purpose must exist.
How is the reputation of an online clothing store determined? How is the reputation of an Internet poet determined? The poets must get together and evaluate each others’ poetry. The online clothing stores must get together and evaluate each others’ products and customer service. In short, they must form guilds. An elegant medieval social structure, which functioned beautifully for many centuries. Applicable to any form of content for which anyone might be searching—right down to commercial advertising.
In general, to have a reputation in the Feudal Age, you have to be part of something larger. You must either join some community which can assign you some reputation, or organize a new community of your own. As an isolated atom, you are scum by definition. You belong in the stocks, and you’ll probably end up there. (We’ll know teh Internets are dead when everyone who still uses them is a spammer.)
This effective communal coercion is good, not bad. Because the upside is: you don’t have to be an isolated atom. You’re a human being, a communal animal—you have to join, but you can join. Since these communities have to exist, they do exist. As a result, you can start producing content and acquire an accurate reputation very quickly and easily. If you’re a poet, other poets will read your poetry. If you’re a filmmaker, other filmmakers will watch your films. If you’re a clothing designer, other clothing designers will try on your clothes. (If this costs money, you will have to pay for it.) And if all these someones are idiots, it is time to start your own guild!
And for the user (the poor schmuck behind the browser, who is, after all, the customer), the search experience is improved by roughly a bazillion. With Feudle, the reputation mechanism that orders his hits is not a heuristic algorithm, but a human process, facilitated, of course, by system software. Rank is not an algorithm, but a grade. To satisfy the customer’s needs, everyone at every level is working hard to get good grades.
The crucial fact about the Feudal Age is that, in that age, the Internet becomes deatomized. It does not get organized by Google. It is not passively organized. It actively organizes itself—which means ranking itself. The resulting ranks, since they follow the natural structure of human authority, are much more accurate than anything Google’s algorithms can produce. As a result, Google dissipates in smoke etc. And Google, perhaps, is the least of it!
As for how it has to start: feudal reputation can only start from the bottom up. That is: communities must migrate from shitty tools, which don’t support reputation, to good ones that do. Perhaps something like StackExchange is the beginning of the trend…