
Archive for November, 2008

Philosophical problems with folksonomies

Elaine Peterson, associate professor at Montana State University, has an article in D-Lib Magazine called “Beneath the Metadata: Some Philosophical Problems with Folksonomy.” It’s good to see the issues taken seriously, and many of her premises strike me as true. But, I disagree with her pragmatic conclusion that “A traditional classification scheme will consistently provide better results to information seekers.” And I think I disagree with her philosophical critique, although I am not confident that I’m understanding it as she intends.

I read the article two different ways. At first I thought it was a critique of folksonomies on the grounds that they contradict traditional philosophical premises. The next time I read it, I thought it was simply pointing out the differences. Now I’m tending toward my first reading, in part because her section on the traditional defends it against some objections while about half of the section on folksonomies is critical of them.

Her philosophical criticism seems to be rooted in what she presents as the Aristotelian approach to classification: Things are lumped with other things like them, and simultaneously distinguished from them. Most important, she says, is the idea that “A is not B,” which means that A cannot be truthfully classified also as a B. But what about digital items that “can reside in more than one place”? That is “irrelevant,” she says, “since one is talking about a classification scheme, not about the items themselves.” I have to admit I don’t understand this. What is the philosophical basis for restricting things to one category if not that that restriction reflects the metaphysical truth that A cannot also be B? So, I think she’s saying we are to reject multiple classifications because such classifications are untrue metaphysically.

This reading is supported by the section on folksonomy, where she identifies philosophical relativism as “the underlying philosophy behind folksonomies,” and pretty clearly intends this as a criticism. (I personally am no fan of philosophical relativism, although there’s a longer story there.) The problem with relativism, she writes, is that it means classification escapes from the demand that A be A and not be B. I take this as indicating that, in her section on traditional classification, she is agreeing with the 1930 textbook she cites that recommends that classifiers give “emphasis to what the author intended to describe.” If you’re arguing that, on metaphysical grounds, things should only be classified in a single category, I guess looking for the author’s intention gives you a way forward…even though categorizing only by the author’s intent is to me like insisting that readers only underline passages that the author considers significant.

And this highlights what I think is my root disagreement with Elaine’s piece (if I’m understanding it correctly). It’s fine to raise pragmatic problems with folksonomies, as she does. But Elaine is pointing at philosophical problems. And those problems require assuming that folksonomists are trying to do what Aristotelian categorizers are trying to do. But they’re not. Aristotelians (I’m using this sloppily as shorthand, so pardon my “tagging”) are trying to find the one true and right category for each thing, creating a well-ordered system free of contradictions. Folksonomies are trying to help us find stuff.

Inconsistencies in tags actually make a folksonomy useful; a folksonomy that consists of 1,000 instances of a single tag isn’t worth the folksonomizing. But these inconsistencies are a problem for Elaine because she is thinking of a folksonomic classification as a philosophical statement rather than as a mere tool. She says that “perhaps … the strongest criticism one could make of folksonomies” is that because tags can be true for one group and false for another,

a folksonomy universe allows both true and false statements to coexist. Because tags are relativized, personal, idiosyncratic views can coexist and thrive in the form of tags, in spite of their inconsistencies. Readers of texts on the Internet become individual interpreters, despite the document author’s intent.

To this many of us will say “Hallelujah!” because we disagree with Elaine’s opening claim that all classification is about answering the philosophical question, “What is it?” Indeed, she’s a hard-liner: An inconsistency to Elaine is any multiple classification, not simply one that contradicts others. Classifying a dissertation about “Moby-Dick” under “ecology” as well as under “novels: 19th Century” would introduce an insupportable inconsistency (in Elaine’s terms). She seems to assume that tags are Aristotelian judgments in which we say that A is a B. But, when I tag a photo of my wife as “ann,” “birthday,” “2008,” and “family events,” I am not saying the essence of Ann (or her photo) is any of those things. Even if I believed in essentialism (I pretty much don’t), we could make use of Aristotle’s idea of “accidental properties” (non-essential but true) to explain what I’m doing. And if I tag Oliver Stone’s “Alexander” as “Angelina Jolie” or “tripe” knowing full well that I am not staying true to the author’s intent, well, tough on Oliver. Tags are not always truth claims, and a folksonomy is not intended to mirror nature. Indeed, a folksonomy can reveal the most appalling areas of ignorance and prejudice in a populace. And, pragmatically, we may well want to address those popular errors, especially since a folksonomy can indeed reinforce them.

But, Elaine is right to point to the philosophical implications of folksonomies. An individual folksonomy may make no claim to providing the real truth about how the world is ordered, but the use of folksonomies generally carries some philosophical implications. Elaine sees relativism underneath them while I see a form of pragmatism. But folksonomies didn’t arise out of philosophy. They are a “found” ordering: Hey, we have all these tags, so why don’t we make use of them in a more systematic way? So, I think Elaine is mislocating the philosophical moment in folksonomies. Philosophy isn’t underneath them or behind them. It’s after them, in their effect. Folksonomies reinforce our move away from the essentialist view that every thing has a single category that reflects its single and real essence. We’ve been moving away from that view for a long time as a culture. The success of folksonomies as a tool reveals that we accepted the traditional Aristotelian scheme in part because it was useful. If its utility has been undercut, then we have to ask for the other reasons we should believe in an Aristotelian metaphysics.

The ball is in Aristotle’s court.

* * *

Most of Elaine’s outright criticisms of folksonomies are actually practical, not philosophic. She makes them without empirical evidence. She has not convinced me that she’s right. For example, her final paragraph says:

A traditional classification scheme based on Aristotelian categories yields search results that are more exact. Traditional cataloging can be more time consuming, and is by definition more limiting, but it does result in consistency within its scheme. Folksonomy allows for disparate opinions and the display of multicultural views; however, in the networked world of information retrieval, a display of all views can also lead to a breakdown of the system… Most information seekers want the most relevant hits when keying in a search query.

By “exact” she apparently means the results include fewer false results (where a result is false if the search term doesn’t really apply to the result, as when you search for “fish” and get back posts about dolphins). And that seems correct: A professionally constructed index should have fewer of those sorts of mistakes. But the second criterion in her concluding paragraph is relevancy, and there folksonomies may well beat a professionally constructed index. Not only might a folksonomy retrieve results more relevant to me personally or to my cultural sub-group, but it constructs a semantic system that can retrieve results that narrow and careful categorizing by experts might miss. So, I disagree with her last sentence: “A traditional classification scheme will consistently provide better results to information seekers.” Traditional classification is best for certain types of searches (ones where you want precision over recall and relevancy, and especially where there is a confined domain of contents that you have to be sure you’ve searched thoroughly) but is not as good as a folksonomy for other types of searches.

In short, neither traditional nor folksonomic classifications are best. Each is best for something.
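
For anyone who wants the precision-versus-recall distinction spelled out, here is a minimal sketch in Python. Everything in it is invented for illustration: the function name `precision_recall`, the list of "truly relevant" items, and both result sets. It is not drawn from Elaine's article or from any real catalog or tag set.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = share of retrieved items that are actually relevant
    recall    = share of relevant items that were actually retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ground truth: the items that really are about fish.
relevant = {"trout", "salmon", "cod", "tuna"}

# Hypothetical results for a search on "fish".
# The expert catalog returns nothing wrong, but misses half the fish.
catalog_results = {"trout", "salmon"}
# The folksonomy finds every fish, plus one dolphin tagged "fish".
tag_results = {"trout", "salmon", "cod", "tuna", "dolphin"}

print(precision_recall(catalog_results, relevant))  # (1.0, 0.5)
print(precision_recall(tag_results, relevant))      # (0.8, 1.0)
```

With these made-up numbers the catalog wins on precision and the tags win on recall, which is the sense in which each can be best for something.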


LibraryThing vs. Library of Congress

Vincent Sterken has posted his master’s thesis, which examines LibraryThing.com to understand the dynamics and utility of social tagging. It begins with an exceptionally clear backgrounder on tagging and taxonomies, and then moves to a fascinating exploration of LibraryThing’s folksonomy, including a comparison of how LibraryThing’s community and the Library of Congress classify books.


Control doesn’t scale

I sometimes put up a PowerPoint (well, Keynote) slide that says “Control doesn’t scale.” The assumption that large projects only succeed if they’re centrally controlled and managed turns out to have been true because we limited the scope of what we considered realistic. You can build a Britannica using a centrally controlled system, but you could not build a Wikipedia that way.

But I know that there are some important counter-examples, so I’ll frequently add, “Except at a huge cost in expense and freedom,” for we know all too well that some regimes have managed to maintain intense control over massive populations for generations.

Today there’s an interview in the Sydney Morning Herald with Isaac Mao, pioneering Chinese blogger and Berkman fellow, in which he says the Chinese authorities are unable to keep up with the increasing volume of social communications: the 108M bloggers, the millions in social networks, and the people texting and twittering away.

So, maybe control doesn’t scale after all.


Twittering reality

At search.twitter.com, the query “near:mumbai within:15mi” will bring you a remarkable stream. (Via Rick Levine.)

We are thinking of you, Mumbai.


Google SearchWiki’s surprising missteps

If you log into your Google account when searching (you can tell if you’re logged in by seeing if it puts your login name at the top of the page), Google has enhanced its results page with new features. The features are slightly useful (and largely mirror Wikia Search), but they also commit two rookie mistakes. Surprising, coming from Google.

The enhancements let you move a particular result to the top of the rankings, so that next time you search for that term, you’ll get that result first; doing so does not affect the results for anyone else (although Google isn’t ruling out that possibility). You can also demote, add or remove a result from the list the next time you do that search, or write a public comment. These are features some of us may find sometimes useful.
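
For the curious, here is a minimal sketch of how per-user edits like these could be layered over a shared ranking without touching anyone else's results. It is not Google's implementation. The function name `personalize`, its parameters, the example URLs, and the rule that demoted results sink to the bottom of the list are all assumptions made for illustration.

```python
def personalize(results, promoted=(), demoted=(), removed=(), added=()):
    """Apply one user's saved edits to a shared, ranked list of results.

    The shared list is never modified; a new list is returned.
    Assumed rules: promoted results come first, user-added results go
    after the shared ones, demoted results sink to the bottom, and
    removed results are dropped.
    """
    removed = set(removed)
    demoted_set = set(demoted)

    ordered, seen = [], set()
    # Promoted first, then the shared ranking, then the user's additions.
    for url in list(promoted) + list(results) + list(added):
        if url in removed or url in demoted_set or url in seen:
            continue
        ordered.append(url)
        seen.add(url)

    # Demoted results are kept, but pushed to the end.
    for url in demoted:
        if url not in removed and url not in seen:
            ordered.append(url)
            seen.add(url)
    return ordered


shared = ["a.example", "b.example", "c.example"]
mine = personalize(shared, promoted=["c.example"],
                   demoted=["a.example"], added=["d.example"])
print(mine)    # ['c.example', 'b.example', 'd.example', 'a.example']
print(shared)  # unchanged: ['a.example', 'b.example', 'c.example']
```

Because the edits are stored per user and applied at display time, the shared ranking stays the same for everyone else.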

So, what’s my beef? (What are my beeeves?)

First, opting us in is obnoxious enough, but not giving us a way to opt out is unsupportable. Where’s the big “No thanks” button? (If you put your “I heart hackers” t-shirt on, you can use GreaseMonkey to turn SearchWiki off.)

Second, the results page shows you the nicknames of other users who have voted the page up. So, now the whole world will see that “dweinberger” not only searched for “Angelina Jolie” but thumbs-upped the page of closeups of her tattoos? Guess who just changed his nickname to something less identifiable! This is a feature without value (the list of names isn’t clickable or complete, and it doesn’t tell you how many people voted the page up) unless you recognize someone’s nickname, in which case it has negative value.

So, here’s a new question for Jeff Jarvis: Not “What would Google do?” but “What was Google thinking?”

Internet not the child-devouring swamp many adults fear

A three-year research project, headed by Mimi Ito, involving 28 researchers and 800 subjects, and sponsored by the MacArthur Foundation, finds that the stereotypical idea of the Internet as a soul-devouring, anti-social wasteland for our kids is just plain wrong. If you suspected otherwise, now you know you were right.

The report makes a key distinction that helps explain some of the confusion we’ve been living through. From the press release:

The researchers identified two distinctive categories of teen engagement with digital media: friendship-driven and interest-driven. While friendship-driven participation centered on “hanging out” with existing friends, interest-driven participation involved accessing online information and communities that may not be present in the local peer group.

Here’s one interesting observation, from the overview:

Some youth “geek out” and dive into a topic or talent. Contrary to popular images, geeking out is highly social and engaged, although usually not driven primarily by local friendships. Youth turn instead to specialized knowledge groups of both teens and adults from around the country or world, with the goal of improving their craft and gaining reputation among expert peers. While adults participate, they are not automatically the resident experts by virtue of their age. Geeking out in many respects erases the traditional markers of status and authority.

The study’s implications for education are significant. From the overview:

Youths’ participation in this networked world suggests new ways of thinking about the role of education. What, the authors ask, would it mean to really exploit the potential of the learning opportunities available through online resources and networks? What would it mean to reach beyond traditional education and civic institutions and enlist the help of others in young people’s learning? Rather than assuming that education is primarily about preparing for jobs and careers, they question what it would mean to think of it as a process guiding youths’ participation in public life more generally.


Bertha Bassam lecture

I gave a lecture at my alma mater, the University of Toronto, a few weeks ago, at the Faculty of Information. The video is here. (Nit: The slides have the wrong font.)


Alex Osterwalder and Yves Pigneur are writing a book on innovative business models that’s due out in May. That seems to them to be too far away, so they’re thinking that maybe for $24 you could get a subscription to their book that provides:

* first & exclusive access to raw book content

* influence authors

* x installments of book chunks (in a non-linear order – as we write them)

* 50% discount off the final book (approx.)

* participate in exclusive book chunk webinars

* access to templates

* being part of the business model innovation community

Alex calls this idea a prototype and welcomes comments, as well as suggestions for what other benefits the authors might offer. (He does not require that you pay a subscription to read his blog and comment on this idea itself, however. Recursion is not always a good idea.)

I’m glad they’re floating this idea — because floating ideas rises all tides? — although I am skeptical. This doesn’t sound like a book that’s so urgent that people will pay a 50% premium ($24 + half off the printed version) for some number of out-of-sequence rough drafts. Of course, I could be wrong about that, especially since about a dozen people in the comments to Alex’s post have already said they’d sign up. But, since the authors benefit from comments from early readers, this business model also has a cost to the authors. It limits the community, but maybe it will also gel the community. We won’t know until we know.

These social projects are all in the details. In 2000-1, I wrote Small Pieces Loosely Joined completely in public, posting my current draft every night. I got some excellent commentary and during the dark days of writing that book I received encouragement that was quite important to me. But I inadvertently structured the engagement in a way that discouraged readers. The writing process was Penelope-like, so I think I would have done better to have updated the site only when I had finished a complete draft of a chapter. Readers get understandably discouraged by commenting on a draft that is undrafted the next day.

I wrote the next book, Everything Is Miscellaneous, offline for reasons I can’t articulate, except to say that I felt that the book posed a challenge to me as a craftsperson. So, I blogged about the ideas in the book and floated pieces from it in various forms, but I composed the actual text with the door closed. I’m not recommending that. I’m thrilled by the fact that writers now routinely break out of the old “private ’til it’s published” constraint. But there are many ways to do that, as well as times when you shouldn’t do it. There may even be times when you should charge $24 for the service.

All ideas are good until proven otherwise.

[berkman] Craig Newmark

Craig Newmark has dropped by the Berkman Center to chat. He begins by asking us what we want him to talk about. A voice opts for the history of CraigsList.com. [NOTE: I’m live-blogging, typing quickly, not correcting typos, getting things wrong, missing entire paragraphs, etc.]

He says that he got a better education than he needed at Case-Western. In early 1995 he wanted to give back some of what he received, so he started some mailing lists, including for events, AnonSalon (a fundraiser) and others. People suggested new categories, including apartments. He was using Pine for email, but it started breaking at 240 mailing addresses. He was going to call the list “SFEvents,” but people said they already called it “CraigsList” and that it was a brand. Craig didn’t know what a brand was, but he stuck with it.

He says he was a literal nerd in HS. He was not on the AV Squad [I was] but he was on the debating team, which led him to delusions about the effectiveness of rational discourse. He says he’s now comfortable with being a nerd.

Eventually he realized he could turn emails into HTML, an instant Web-publishing solution. Over the next few years, he refined the software. If a task took more than an hour a day, he would automate it. At the end of 1997, he hit three milestones: 1. A million page views per month (he hit a billion in 2004 and now is headed toward 13B. There are 26 people at the company). 2. Microsoft Sidewalk asked him to run banner ads. He turned them down because “I am an overpaid programmer.” 3. People volunteered to help. But it failed because he didn’t lead. So, in 1999 he turned it into a business.

He hired Jim Buckmaster “who is a full foot taller than I am.” He’s a really good manager. “I suck as a manager.” The culture there is that people make suggestions, they listen, and they decide what to act on. Also, it’s continued to try to be simple. And they decided to charge people who are already paying but for less effective ads, so they started charging people listing jobs and real estate brokers. “They asked us to charge them to cut down on certain types of spam, and on the need to post and repost.”

He’s always surprised people are willing to pay for what he does for fun. He’s generalized it to nerd values, including: once you have a comfortable living, it’s more fun to change things than to make more money. His business model: “We can do really well by treating people well and doing some good.”

He says he’s now going to half time as a customer service rep, after 14 years of fulltime. You sometimes see ugly things in customer service, he says. E.g., they saw ugly racist stuff during the campaign. “That takes something out of you.”

“I’ve only regretted giving my email address out once.” It was when he was on The View.

Over the past several years, they’ve begun to understand why CL is successful. “It has to do with the culture of trust we have.” There are bad guys but they’re a tiny percentage. “People look out for one another.” E.g., you can flag abusive ads. If enough people vote for it, it’s removed automatically. “That’s a flawed mechanism,” but it works better than not doing it. As Jon Stewart says, (Craig says) you do hear from extremists, but that’s because moderates have stuff to do. You should treat people the way you want to be treated. Corollaries: Live and let live, and give the other person a break. Nothing profound, he says, but it’s hard to follow through. “We’re trying to listen to people still.” “We decide on new cities based primarily on requests for them.” (567 cities now.) Novel ideas are rare. Most of what’s on the site is based on community feedback, although the child care section was Craig’s idea.
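
[A minimal sketch of the flag-and-remove idea, as I understood it from the talk. Craig gave no details, so the threshold of 3, the class name `FlagBoard`, and the rule that each user's flag counts only once are my assumptions, not a description of how CraigsList actually does it.]

```python
from collections import defaultdict

FLAG_THRESHOLD = 3  # hypothetical; the real number was not given


class FlagBoard:
    """Track community flags and auto-remove ads that collect enough of them."""

    def __init__(self, threshold=FLAG_THRESHOLD):
        self.threshold = threshold
        self.flags = defaultdict(set)  # ad id -> set of users who flagged it
        self.removed = set()           # ids of ads that have been taken down

    def flag(self, ad_id, user_id):
        """Record one user's flag. Return True if the ad is now removed."""
        if ad_id in self.removed:
            return True
        # A set means repeat flags from the same user count only once.
        self.flags[ad_id].add(user_id)
        if len(self.flags[ad_id]) >= self.threshold:
            self.removed.add(ad_id)
        return ad_id in self.removed


board = FlagBoard()
print(board.flag("ad-1", "alice"))  # False
print(board.flag("ad-1", "alice"))  # False (same user, still one flag)
print(board.flag("ad-1", "bob"))    # False
print(board.flag("ad-1", "carol"))  # True (third distinct user)
```

[The flaw Craig mentions is easy to see here: nothing checks whether the flags are fair, so a small group can take down an ad they simply dislike.]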

“I have no vision at all, but I know how to keep things simple, and I listen some.”

“We’re a good example of how people collaborate in mundane ways to make things happen. Not bad.” On his way to One Web Day he realized, “I’m a community organizer. I’m more of a meta-organizer.”

Nothing about CraigsList is, in his view, altruistic. It’s just people giving another person a break. “I figured I should extend this to other areas.” E.g., “I help people smarter than me figure out the future of journalism.” E.g., Jeff Jarvis and Jay Rosen.

He’s also interested in grassroots democracy. Face to face is a better way of communicating but it doesn’t scale. On the Net, we get millions of people working together to make stuff happen. “This changes the nature of our democracy,” so that grassroots democracy can address the traditional problems with representative democracy. Craig thanks Joe Trippi and Zephyr Teachout. Now we have this big grassroots infrastructure. What do we do with it? “2008 is the new 1776.”

All sorts of things are happening. “It used to be that the guys with money, power and guns got to write the history and our narratives about ourselves. With Wikipedia, everyone has a shot at doing that…It changes the whole course of human history.” We are at a “singularity,” he says. We’re living in a time like 1776. It’s happening faster because the Internet accelerates everything. “I’m trying to play a microscopic part in it.”

He’s involved with the SunlightFoundation.org. He’s working with ConsumerReports. He was involved a little bit in SF’s 311 number. “Mundane, but it’s part of everyday governance. In my fantasies, I apply that to all levels of government.” A bunch of this is in the Obama platform, he says, and we could see some of it next year.

Veterans have been treated badly by the White House, he says, so he’s on the IAVA.org board. To screen claims faster, maybe they shouldn’t care about fraud so much, since veterans and their families are suffering as they wait for their claims to be processed.

As a nerd, it’s a “crime against nature” to be involved in promotion or communication. But he does it anyway. For one thing, he likes the idea of more people getting involved in service. “I do have one message for the kids: Stay off of my lawn.” :)

“The Constitution will be restored on January 20.”

He says he focuses on people who can get things done. He lacks patience for those who can’t get things done.

Q: Are there any ways Craigslist has gone in directions you couldn’t have imagined?
A: I never tried to foresee them so it’s hard to answer. I had to have my arms twisted to create personals. They’ve done much more good than problems. Like “missed connections.” I’ve been asked to perform marriages. In a way, the whole thing has been a surprise. I have no vision. I’ve only responded to feedback. It’s all very surreal, but that’s life now.

Q: Why did CL succeed in the early days, as opposed to doing it over newsgroups?
A: Part of it was that everyone understood mail and Web browsers, while newsgroups were hard to use. And newsgroups were ad-spammed badly. We have a problem with spam, and last week we announced a suit against a company that sells ad-spam software. We aren’t litigious but we thought that was a good way to do it.

Q: Has it been a problem keeping CL simple?
A: Keeping it simple is a habit. There are times when we have to debate whether there should be a specific category, or should people have to register with a valid email on the message boards, but I don’t know how to do things except simply.

Q: What about the deal with the National Center for Missing and Exploited Children? How consensual was the deal?
A: Jim knows the details. He felt strongly about it. There was genuine abuse of our site involving minors. We’re not law enforcement professionals, so we got advice from the real experts. There is that sort of abuse and we have to help out. We just started charging for erotic services and we’ll contribute all that to philanthropies. And how do you manage anonymity? Sometimes you need it, for whistleblowers. We tend to the anonymity side. But congresspeople want to know that an email comes from a constituent rather than a mass spammed email. We’re talking about ways to balance anonymity and authentication, but we do need anonymity as a kind of check and balance against an oppressive government.

Q: Has your exposure to some of the uglier aspects led you to see a more expansive role for government?
A: I have become more balanced, but mainly because I’ve been doing customer service. The best label I can figure is “moderate Libertarian.” I’m looking for a better label. I’m increasingly interested in private-public partnerships since I’ve seen market solutions don’t always work, like for health insurance. I’m in the Net neutrality debate and see people misrepresenting it on purpose. (He adds that most lobbyists are ok, and a small number are predatory.)

Q: You’re in many cities but it still seems to be geared towards regional breakdowns. On purpose?
A: Initially we just followed our gut. CL is like a flea market. People get together to do commerce, but really just to socialize. Penelope Green talked about our site being a market in the ancient sense: chaotic and vividly human.

[me] Why doesn’t your company have meetings?

A: We have some. But we minimize them. A meeting of more than six people is already going to be dysfunctional (small group comms theory). Effective communication in a meeting is tough. This also reflects my impatience, a flaw as a human being.

Q: What will be the future of the Communications Decency Act?
A: This is the part of law that says that a site isn’t responsible for what people say on the site, so long as they take some reasonable measures. I think it will stay and possibly be improved.

Q: Have you had any negative interactions with the police?
A: Not really. Once the FBI called asking if we knew there was an ad for plutonium on our site. The result was that someone got a stern talking to from his parents. The police just want to be treated decently and not jerked around. That’s our customer service idea.

Q: Why can’t people search for subsections?
A: MySQL chews up server time doing these searches. We have some ideas for how to do this, but there are bigger things they’re working on.


Obama v. Bush: Google counts

“George Bush”: 25M hits at Google (with the quotation marks)
“George W. Bush”: 48M
“Barack Obama”: 105M
“Obama”: 248M
“Bush”: 344M

Wow, that seems screwy! The combined total for “George Bush” and “George W. Bush” (73M), after 9 years of coverage (campaign and presidency), and including two George Bushes, is only about 70% of the number of hits for “Barack Obama” (105M) before he’s taken office?

Either we’re really excited about Barack Obama or something’s gone wrong in my Google searches. Or, more likely, both.

