Everything is Miscellaneous

You've arrived at Everything is Miscellaneous's blog page that was active 2008-2012. You'll find links to some useful information about the book and its subject matter, but don't be surprised by some dead links, etc.

To order a copy, go to your local bookstore, or Amazon, etc.

For information about me, David Weinberger, click here.

To visit the page underneath this text, click here.

Thanks - David Weinberger

Tibetan taggers

September 16th, 2010

This is a couple of years old, but it’s interesting. (Thanks to Norm Jacknis for the tip.)

Tibetans living in Switzerland and non-Tibetan Swiss were asked to provide tags for an exhibit of traditional Tibetan work. The tags were then analyzed to see what cultural differences might show up. Some were fairly obvious:

Taggers disagreed in their perceptions of the esoteric deity Chakrasamvara. Tibetans tagged it frequently with “buddha”, accurately identifying its wisdom aspect; however, Swiss Germans found it böse or “angry-looking” and associated it with death. This exemplifies how tags can help uncover cultural misunderstandings: rather than anger, Chakrasamvara actually embodies the union of bliss and emptiness.

It also revealed (or suggests) some differences in how people approach tagging itself:

When Tibetans were asked which images were easiest to tag and why, their responses were contradictory. One person said artworks she knew were easy to tag because she already has something to say about them. Another found unfamiliar works easier to tag because they seemed “freer.” The rating indicates that symbolic and familiar works do elicit less diverse responses from Tibetan taggers. And although some people may find them easier to tag because their meanings are culturally pre-defined, the way in which viewers react to them is likely to be less personal and even “less free.”

1 Comment »

SpokenWord looking for curators

June 30th, 2010

SpokenWord.org aggregates podcasts, almost all of which are free, and makes it easy for users to export them to, say, iTunes. It’s a non-profit site and is all about the openness. (Disclosure: I’m on its board.) Now SpokenWord is looking for volunteers to curate podcast feeds and episodes in topics that interest them. Their curated collections will be the main feature at the SpokenWord site, because nothing knows what’s interesting to humans better than other humans do. Details here.

3 Comments »

Twitter metadata and where standards come from

June 20th, 2010

Matthew Ingram at GigaOm blogs about an upcoming Twitter feature called Twitter Annotations. Well, it’s not actually a feature. It’s the ability to attach metadata to a tweet. This is potentially great news, since it will give us a way to add context to tweets and to enable machine-processing of tweets, not to mention that URLs could be sent as metadata rather than as subtractions from the 140-character limit. This is yet another example of information scaling to the point where we have to introduce more information to manage it. How about one of those bogus “laws” people seem to like (well, I know I do): Information sufficiently scaled creates a need for more information.

Twitter is specifying the way in which Annotations will be encoded, but not what the metadata types will be. You can declare a “type” with its own set of “attributes.” What types? Whatever you (or, more exactly, developers and hackers) find useful. Matthew cites a number of folks who are basically positive but who express a variety of worries, including Google open advocate Chris Messina who warns that there could be a mare’s nest of standards, that is, values for types and attributes. Dave Winer takes Google to task for slagging off on Twitter for this. I agree with his sentiment that Goliath Google ought to be careful about their casual criticisms. Nevertheless, I think Chris is right: Specifying the syntax but not the actual types and attributes will inevitably give rise to confusion: What one person tags as “topic,” someone else will tag as “subject,” and some people might have the nerve to actually use words for types in, say, Spanish or Arabic. The nerve! [THE NEXT DAY: Here’s Chris’ original post on the topic, which is more balanced than the bit Matthew excerpts, and which basically agrees with the next paragraph:]

But, so what? I’d put my money on Ev Williams and Biz Stone any time (important note: If I had money). You couldn’t have seriously proposed an idea as ridiculous as Twitter in the first place if you didn’t deeply understand the Web. So, yes, Chris is right that there’ll be some confusion, but he’s wrong in his fear. After the confusion there will be a natural folksonomic (and capitalist) pull toward whatever terms we need the most. Twitter can always step in and suggest particular terms, or surface the relative popularity of the various types, so that if you want to make money by selling via tweets, you’ll learn to use the type “price” instead of “cost_to_user,” or whatever. Or you’ll figure out that most of the Twitter clients are looking for a type called “rating” rather than “stars” or “popularity.” There’ll be some mess. There’ll be some angry angry hash tags. But better open confusion than expecting anyone, even the Twitter Lads, to do a better job of guessing what its users need and what clever developers will invent than those users and developers themselves.
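To make the “surface the relative popularity of the various types” idea concrete, here is a rough Python sketch. It is not Twitter’s API: the tweet layout (a dict with an "annotations" list, each entry holding a "type" and its "attributes") and the function names type_popularity and preferred_name are assumptions made up purely for illustration.

```python
from collections import Counter


def type_popularity(tweets):
    """Count how many tweets use each annotation type name.

    Assumes a hypothetical layout: each tweet is a dict that may carry an
    "annotations" list of {"type": ..., "attributes": {...}} entries.
    """
    counts = Counter()
    for tweet in tweets:
        # A set, so a type used twice in one tweet is counted once.
        names = {a["type"] for a in tweet.get("annotations", [])}
        counts.update(names)
    return counts


def preferred_name(counts, candidates):
    """Among names competing for the same idea, pick the most used one."""
    return max(candidates, key=lambda name: counts[name])


tweets = [
    {"text": "Mug for sale", "annotations": [
        {"type": "price", "attributes": {"amount": "9.99"}}]},
    {"text": "Another mug", "annotations": [
        {"type": "price", "attributes": {"amount": "12.00"}}]},
    {"text": "Yet another mug", "annotations": [
        {"type": "cost_to_user", "attributes": {"amount": "8.50"}}]},
    {"text": "No metadata at all"},
]

counts = type_popularity(tweets)
winner = preferred_name(counts, ["cost_to_user", "price"])
```

A developer deciding what to call a new type could consult counts like these and follow the crowd, which is all the folksonomic pull amounts to.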

3 Comments »

Every color is miscellaneous

June 13th, 2010

I’m embarrassed to say that I just read Randall Munroe’s fabulous color survey from early May. Readers were asked to supply names for colors. It’s a rich experiment: Naming and discrimination, gender differences, hacking, tagging, spamming, hilariousness. The results also seem to support prototype theory’s idea that we agree on what the “real” (prototypical) colors are, at least within a culture: This is blue, but that one is a variant that needs a modifier in front of it (“light blue”) or for which we use a variant name (“teal”).

Randall writes the webcomic XKCD, of course, which is the Doonesbury of his generation, except while you can imagine Garry Trudeau writing a satiric HBO series, you can’t imagine him running and analyzing a color survey.

(I heard about Randall’s color survey via the Mainstream: Christopher Shea at the Boston Globe blog. Christopher also points to Stephen von Worley’s color map. BTW, that post by Christopher also has a great note about iPad censoring a graphic version of the oft-banned James Joyce’s Ulysses. Anyway, I’ve really got to do a better job keeping up with XKCD.)

2 Comments »

Democratized curation

June 6th, 2010

JP Rangaswami has an excellent post about the democratizing of curation.

He begins by quoting Eric Schmidt (found at 19:48 in this video):

“… the statistic that we have been using is between the dawn of civilisation and 2003, five exabytes of information were created. In the last two days, five exabytes of information have been created, and that rate is accelerating. And virtually all of that is what we call user-generated what-have-you. So this is a very, very big new phenomenon.”

He concludes (and I certainly agree) that we need digital curation. He says that digital curation consists of “Authenticity, Veracity, Access, Relevance, Consume-ability, and Produce-ability.” “Consume-ability” means, roughly, that you can play it on any device you want, and “produce-ability” means something like how easy it is to hack it (in the good O’Reilly sense).

JP seems to be thinking primarily of knowledge objects, since authenticity and veracity are high on his list of needs, and for that I think it’s a good list. But suppose we were to think about this not in terms of curation, which implies (against JP’s meaning, I think) a binary acceptance-rejection that builds a persistent collection, and instead view it as digital recommendations? In that case, for non-knowledge-objects, other terms will come to the fore, including amusement value, re-playability, and wiseacre-itude. In fact, people recommend things for every reason we humans may like something, not to mention the way we’re socially defined in part by what we recommend. (You are what you recommend.)

Anyway, JP is always a thought-provoking writer…

2 Comments »

The rectangular display of information

May 12th, 2010

Search engines have traditionally focused on building lists. Increasingly, they’re turning to the rectangular display of information: Boxes and tables. Boxes require extracting the relevant information and presenting it four-square in front of the user. While lists sort in a single dimension, tables show at least two dimensions. Boxes and rectangles are useful filters.

Google today announced the further boxing and tabling of data, in response (one supposes) to Bing.com. The Google Blog recommends trying searching for dog breeds, broadway shows, catherine zeta-jones date of birth, or zebra. (Look for the “something different” list in the left margin when you do the zebra search.) I especially like the summary of sources Google gives when it flat-out answers a question.

More boxes! More tables!

2 Comments »

[berkman] Luis von Ahn on free lunches, captcha, and tags

April 27th, 2010

Luis von Ahn of Carnegie Mellon University is giving a Berkman lunchtime talk. [NOTE: I’m liveblogging. I’m making mistakes, leaving stuff out, paraphrasing, getting things wrong. This is an unreliable record.]

Luis invented captchas, the random characters you have to type in to convince a web page that you are a human and not a hostile software program. (He shows randomly generated sequences that happened to spell out “wait” and “restart.”) Captchas are useful, he says, when you’re trying to prevent people from gaming a system by writing a program to enter data robotically. They’re also useful to prevent spammers from signing up for free email accounts. To get around this, spammers have started up sweat shops where humans type captchas all day long; it costs the spammers about $0.33/account. And some porn companies ask users to type in a captcha to see photos; the captchas are drawn from email account applications. Damn clever!

He shows some variants. A Russian one asks you to solve a mathematical limit. In India one asks you to solve a circuit. Luis says these aren’t all that effective because computers can solve both problems, but they’re still better than the “what is 1 + 1?” captchas he’s found on US sites.

He says that about 200M captchas are typed every day. He was proud of that until he realized it takes about 10 seconds to type them, so his invention is wasting 500,000 hours per day. So, he wondered if there was a way to use captchas to solve some humungous problem ten seconds at a time. The result: ReCAPTCHA. For books written before 1900, the type is weak and about 30% of the text cannot be recognized by OCR. So, now many captchas ask you to type in a word unrecognized when OCR’ing a book. (The system knows which words are unrecognized by running multiple OCR programs; ReCAPTCHA uses those words.) To make sure that it’s not a software program typing in random words, ReCAPTCHA shows the user two words, one of which is known to be right. The user has to type in both, but doesn’t know which is which. If the user types in the known word correctly, the system knows it’s not dealing with a robot, and that the user probably got the unknown word right.
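The two-word check Luis describes can be sketched in a few lines of Python. This is only an illustration of the idea as reported in the talk, not ReCAPTCHA’s actual code: the names Challenge, Transcriber, submit, and accepted_reading are made up here, and the min_votes agreement threshold is an assumption (the talk doesn’t say how many matching answers it takes to accept a word).

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Challenge:
    """A two-word challenge: one word with a known answer, one unread by OCR."""
    control_answer: str  # the transcription the system already trusts
    unknown_id: str      # hypothetical identifier for the unread scanned word


@dataclass
class Transcriber:
    # Human guesses collected so far, keyed by unknown word id.
    votes: dict = field(default_factory=dict)

    def submit(self, challenge, typed_control, typed_unknown):
        """Return True if the user passes as human.

        The guess for the unknown word is recorded only when the control
        word was typed correctly.
        """
        if typed_control.strip().lower() != challenge.control_answer.lower():
            return False
        guesses = self.votes.setdefault(challenge.unknown_id, Counter())
        guesses[typed_unknown.strip().lower()] += 1
        return True

    def accepted_reading(self, unknown_id, min_votes=2):
        """Return the most common guess once enough users agree, else None.

        min_votes is an assumed threshold, not a figure from the talk.
        """
        guesses = self.votes.get(unknown_id)
        if not guesses:
            return None
        word, count = guesses.most_common(1)[0]
        return word if count >= min_votes else None
```

The user never learns which word was the control, so the only reliable way to pass is to read both words honestly.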

ReCAPTCHA is a free service. Sites that use it have to feed back the entries for the unknown word. About 125,000 sites use it. They’re doing about 70M words per day, the equivalent of 2-4M books per year. If the growth continues, they’ll run out of books in 7 years, but Luis doesn’t think the growth will continue, so it might take twenty years. (There are 100M books.)

(In response to a backchannel question, Luis tells the penis captcha story.)

The ReCAPTCHA system filters out nationalities, known insult terms, and the like, to avoid unfortunate juxtapositions. It’s soon going to be released in 40 languages. Google acquired ReCAPTCHA.

Q: When will OCR be good enough to break captchas?
A: I don’t know. We’ll probably run out of books first.

Q: Business model?
A: Google Books gets help digitizing.

ReCAPTCHA “reuses wasted human processing power.” The average American spends 1.9 seconds per day typing captchas. We also spend 1.1 hours a day playing electronic games. We humans spent 9B hours playing solitaire in 2003. It took less than a day of that to build the Panama Canal. So, Luis switches topics a bit to talk about how to solve human problems by playing games.

First is tagging images with words. Image search works by looking at file names and html text, because computers can’t yet recognize objects in images very well.

Does typing two words take twice as long as typing random letters? No, it takes about the same time, he says. Luis says about 10% of the world’s population have typed in a captcha. The ESP game asks two people unknown to each other to label an image until they agree. The game taboos words that other players have already agreed on. The system passes images through until they get no new labels. They’ve gotten over 50M agreements. 5,000 players playing simultaneously could label all Google images in a month. Google has its own version; Google has an exclusive license to the patent.
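Here is a small Python sketch of the agreement rule as I understood it from the talk. It is a guess at the mechanics, not the real game’s code: the function names first_agreement and label_image are made up, and alternating the two players’ guesses turn by turn is an assumption (the real game is played in real time).

```python
from itertools import zip_longest


def first_agreement(guesses_a, guesses_b, taboo=()):
    """Return the first word both players have typed, or None.

    Guesses are processed alternately (an assumed stand-in for real-time
    play). Taboo words are ignored, so they can never become the match.
    """
    banned = {w.lower() for w in taboo}
    seen = (set(), set())  # words typed so far by player 0 and player 1
    for pair in zip_longest(guesses_a, guesses_b):
        for player, guess in enumerate(pair):
            if guess is None:  # this player has run out of guesses
                continue
            word = guess.strip().lower()
            if word in banned:
                continue
            if word in seen[1 - player]:  # partner already typed it
                return word
            seen[player].add(word)
    return None


def label_image(rounds):
    """Collect labels for one image across successive pairs of players.

    Each agreed label becomes taboo for later pairs; labeling stops at the
    first pair that fails to agree on anything new.
    """
    labels = []
    for guesses_a, guesses_b in rounds:
        agreed = first_agreement(guesses_a, guesses_b, taboo=labels)
        if agreed is None:
            break
        labels.append(agreed)
    return labels


pair_one = (["dog", "puppy", "grass"], ["animal", "grass", "puppy"])
labels = label_image([pair_one, pair_one, (["dog"], ["cat"])])
```

Because each agreed word is tabooed for the next pair, later players are pushed past the obvious label toward less common ones.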

Q: Demographics?
A: For my version, average age is 29 (with huge variance), evenly split between women and men.

Q: Compared to Flickr tags?
A: Only a small fraction of Flickr images have useful tags. The tags from Flickr tend to be significantly more exact, but also significantly noisier (e.g., a person tagging an image in a way that means something idiosyncratic).

Q: Bots?
A: Yes, we don’t want you to wait for a partner, so sometimes we’ll give you a bot that replays the moves a human had made with the same image.

Q: Google Images benefits from its version of your game. Who benefits from your version of the game?
A: No one.

For some images, guesses change over time. E.g., a Britney Spears photo five years ago got labels like britney and hot. About two years ago, the labels changed to crazy, rehab, and shaved head. Now they’re back to britney and hot. By watching a player for 15 mins, you can guess whether the player is male or female with 95-98% accuracy.

Why do people like the ESP game? Sometimes they feel an intimacy with their partners. They have to step outside of themselves to make the match. They can have a sense of achievement.

He ends by saying that about the same number of people (100,000) have worked on humanity’s big projects, e.g., pyramids, Panama Canal, putting a person on the moon. That’s in part (he says) because it is so hard to coordinate large numbers of people. Now we can get 100M people to work on something. What can we do?

8 Comments »

Shirky’s myth of complexity

April 5th, 2010

Clay Shirky has given us a surprising number of Internet myths. And by this I mean not falsehoods but the opposite: Broad, illuminating ways of making sense of what’s going on. For example, Clay’s post about the power law distribution of links in the blogosphere (based on research by Cameron Marlow) changed how we view authority, fame, and success in the Web ecosystem, and provided the structure within which Chris Anderson could point to the Long Tail. And Clay’s Ontology Is Overrated made clear that a change in how we categorize our world affects very real power relationships; that essay was highly influential, including on my own Everything Is Miscellaneous.

Clay’s new post, The Collapse of Complex Business Models, gives us a broad way of understanding why those who used to provide us with content will not be the ones who give us content in the future…and why they cannot fathom why not.

1 Comment »

Order, art, and the miscellaneous

March 31st, 2010

Giulia Ricci investigates:

the shift between order and disorder within different systems, which is the reason why I recurrently use geometrical grids, although on a more abstract level I am also interested in systems of categorisation and lists and how these can be visualised with diagrams and geometrical drawings.

For example, take a look at these. I find them fascinating as they swim close to resolution but never quite make it.

1 Comment »

How to use the Web to teach: An example

January 25th, 2010

Want to see one way to use the Web to teach? Berkman’s Jonathan Zittrain and Stanford Law’s Elizabeth Stark are teaching a course called Difficult Problems in Cyberlaw. It looks like they have students creating wiki pages for the various topics being discussed. The one on “The Future of Wikipedia” is a terrific resource for exploring the issues Wikipedia is facing.

Among the many things I like about this approach: It implicitly makes the process of learning (which we have traditionally taken as an inward process) a social, outbound process. By learning this way, we are not only enriching ourselves, but enriching our world.

My only criticism: I wish the pages had prominent pointers to a main page that explains that the pages are part of a course.

1 Comment »

Everything is Miscellaneous
About David Weinberger's book (May, 2007) and how we're pulling ourselves together now that we've blown ourselves to bits.

You can buy the book at your favorite online or real world bookstore. Or go to Isbn.nu for a choice of online stores.

See Michael Wesch's fantastic video of some of the ideas in Everything Is Miscellaneous

The Berkman-Wired
Miscellaneous Podcasts

A series of interviews with very smart people on topics in David Weinberger's book

Cory "BoingBoing, Activist, Writer" Doctorow
Markos "DailyKos" Zuniga
Arianna "HuffingtonPost" Huffington
Neil DeGrasse "Astrophysicist" Tyson
Jimmy "Wikipedia" Wales
Craig "sList" Newmark
Paul "Kayak" English
Richard "BBC World Service" Sambrook

Sponsored by the Harvard Berkman Center and Wired magazine

Tags

1 (16)

2b2k (8)

afp (1)

berkman (1)

bibframe (2)

big data (2)

blogging (1)

blogs (1)

bono (1)

books (1)

boston marathon (1)

culture (5)

dpla (1)

echo chambers (2)

eim (2)

entertainment (1)

everything is miscellaneous (1)

everythingis (1)

everythingismisc (7)

everythingIsMiscellaneous (34)

everytingismisc (1)

free culture (1)

free-making software (1)

harvard (1)

hoder (1)

humor (1)

isbn (1)

journalism (2)

law (1)

libraries (16)

liveblog (4)

lodlam (3)

metadata (7)

microformats (1)

music (1)

old days (1)

open access (3)

podcast (2)

poetry (1)

schema.org (2)

serendipity (1)

sexism (1)

sharing (1)

social media (4)

social networking (1)

stacklife (1)

stuttgart (1)

tags (1)

taxonomies (1)

taxonomy (3)

teaching (1)

tech (2)

too big to know (10)

twitter (1)

web difference (1)

wikipedia (1)

Sites to See
Sites tagged as "EverythingIsMiscellaneous" at Delicious


Pages

About

Bibliography

Errata

PlayPen

Reviews

Samples

Syndicated

Joho the Blog » everythingIsMiscellaneous
