Technorati, tags and topics
May 23rd, 2007 by David Weinberger
(Disclosure: I’m on Technorati’s board of advisors. I saw an advance version of the changes, but otherwise had no direct influence. Also, although at some point I conceivably could make some indeterminate amount of money from Technorati, the fact that Dave Sifry is a friend influences my judgment more.)
Technorati has just done a major re-shaping of itself, which is interesting as a response to the increasing need for both pinpoint accuracy and broad context. Dave Sifry, the CEO, blogs about it here.
Technorati is driving down both roads simultaneously, which I think makes sense. On the one hand, if you want to do an old-fashioned text search through blogs, the site has improved its engine and pared down the experience. If, on the other hand, you want to see information in context (and on the Web that of course means being able to explore that context further), the site has taken several steps:
1. The default search now is for tags, not for text in blogs. Tags are expressions of what the readers think a post is about, so some types of searches should return more accurate, relevant and interesting results. Of course, we also use tags in idiosyncratic ways, so only experience will tell whether and when tag searching is more satisfying than text searching. In any case, Technorati lets you click to search through text, if that’s what you want. (You can go straight to the text search page via s.technorati.com.)
2. Technorati continues to include more sources and more types of information. In fact, the home page no longer positions Technorati as a blog search engine. “Include everything” is one of the key recommendations of Everything is Miscellaneous, so I like its continuing inclusiveness :)
3. These changes seem to move Technorati towards embracing topics as a basic unit of meaning. For example, if you search for “ron paul,” you are taken to a page that assembles blog posts, videos and photos about the controversial Republican. There are tabs for music and events as well, although in this case Technorati didn’t find any. There’s also a “WTF” post, an explanation of the topic generated and voted on by users. (It’s displaying the WTF by siegheilneocon, which only got 27 votes, instead of the one by beckychr007, which got 61 votes, seeming to prefer the most recent to the most popular, which is either a bug or I’m not understanding it.)
Topics are an important way to cluster ideas. At the moment, however, Technorati has no concept of a topic apart from a tag. The infrastructure to do more is in place, because the site already displays a list of related tags. The results pages don’t bring in the content from those tags, though. For example, if “john mccain” were a related tag, it might make sense to bring some of that tagged material into the “ron paul” topic page. That would give us a broader view of the topic. Conflating topics with tags can increase the precision of results (although not for highly ambiguous tags such as “shot”), but it can also reduce the context and thus our understanding. Granted, figuring out algorithmically what’s relevant and how it’s relevant is no small challenge. (Maybe if some topic pages were marked as especially worthwhile and stable, not all of the cleanup and construction would have to be done algorithmically.)
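To make the idea concrete, here is a rough sketch of how a topic page might pull in material from related tags. Everything in it is an assumption made for illustration: the sample posts, the function names `related_tags` and `topic_page`, and the use of simple co-occurrence counts to decide which tags are related. It says nothing about how Technorati actually works.

```python
from collections import Counter

# Hypothetical sample data: (post title, set of tags). Not Technorati's data model.
POSTS = [
    ("Paul at the debate", {"ron paul", "republicans", "debate"}),
    ("McCain on the war", {"john mccain", "republicans", "iraq"}),
    ("Paul vs. McCain", {"ron paul", "john mccain", "debate"}),
    ("Flu shot season", {"shot", "health"}),
]

def related_tags(tag, posts, top_n=2):
    """Rank the other tags by how often they appear on posts tagged `tag`."""
    counts = Counter()
    for _, tags in posts:
        if tag in tags:
            counts.update(tags - {tag})
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:top_n]]

def topic_page(tag, posts, top_n=2):
    """Split posts into those tagged `tag` and those carrying only a related tag."""
    related = set(related_tags(tag, posts, top_n))
    direct = [title for title, tags in posts if tag in tags]
    context = [title for title, tags in posts
               if tag not in tags and tags & related]
    return {"direct": direct, "context": context}
```

Calling `topic_page("ron paul", POSTS)` keeps the directly tagged posts separate from the ones that arrive only through a related tag, so a page could show the second group as context rather than mixing it into the main results. Raw co-occurrence would do nothing for an ambiguous tag like “shot”; that part remains the hard problem.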
Likewise, at some point it’d be good to start relating topics, so that the system knows that “ron paul” is (in some sense) contained by “republicans” and republicans are related to “politics.” This sort of information can eventually be gleaned folksonomically from the tags. Of course there’d be nothing wrong with using existing taxonomies and ontologies to help further refine the relationships among topics. It’s always going to be a messy, overlapping, shifting mass of connections, but, well, so are we.
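One simple way such relationships might be gleaned from tags is a co-occurrence heuristic: treat tag A as contained by tag B when nearly every post tagged A is also tagged B, but not the reverse. The sketch below assumes that heuristic. The function name `infer_containment`, the 0.8 threshold, and the sample posts are all made up for illustration.

```python
from collections import Counter
from itertools import permutations

# Hypothetical sample data: one set of tags per post.
EXAMPLE_POSTS = [
    {"ron paul", "republicans", "politics"},
    {"ron paul", "republicans", "politics"},
    {"republicans", "politics"},
    {"democrats", "politics"},
]

def infer_containment(tagged_posts, threshold=0.8):
    """Return sorted (narrow, broad) pairs.

    A pair is kept when `broad` appears on at least `threshold` of the
    posts tagged `narrow`, while `narrow` appears on fewer than
    `threshold` of the posts tagged `broad`.
    """
    tag_count = Counter()
    pair_count = Counter()
    for tags in tagged_posts:
        tag_count.update(tags)
        # Count every ordered pair of distinct tags on the same post.
        pair_count.update(permutations(sorted(tags), 2))
    pairs = []
    for (a, b), both in pair_count.items():
        share_of_a_with_b = both / tag_count[a]
        share_of_b_with_a = both / tag_count[b]
        if share_of_a_with_b >= threshold and share_of_b_with_a < threshold:
            pairs.append((a, b))
    return sorted(pairs)
```

On the sample data, `infer_containment(EXAMPLE_POSTS)` places “ron paul” under both “republicans” and “politics,” and “republicans” under “politics.” It also places “democrats” under “politics” on the strength of a single post, which shows why a real system would want a minimum number of posts per tag, and probably an existing taxonomy as a sanity check.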
This is not a criticism of what Technorati has done. In fact, I mean it as a way of expressing my excitement about where it goes from here.
Interesting analysis as always. It seems that as more and more engines expand their reach in terms of underlying datasets, the more obvious it becomes that the real hard problem, conveying the “probably, sort of related” relationships in some sort of meaningfully navigable way, still remains mostly unsolved. I think it’s nice to see Technorati and others hold off on the data overload that’d result from unioning the sets until they can figure out some practical way to show the ron paul -> john mccain vector/venn/web/whatever. So many smart academics are working on this, it’d be nice to see some more practical implementations showing up more regularly on the web as integrated components of a “search results” UI…