Everything is Miscellaneous

Shelley disagrees with and dislikes “Miscellaneous”

May 7th, 2007 by

Shelley Burningbird Powers both disagrees with what I say and doesn’t much care for how I say it in Everything Is Miscellaneous. I appreciate Shelley’s care and thought. She does exactly what an author hopes reviewers will do – engage with the ideas – although of course I’d rather that she loved every comma and period in it. But, I didn’t ask the publishers to send her a copy thinking that she was likely to agree with it.

There are portions where I shake my head because she states as a disagreement with me precisely what I in fact was trying to say… which means I failed to communicate, i.e., wrote badly. For example, she says that as “any computer person would know immediately,” the way bits are arrayed on platters isn’t chaotic and messy, but rather is due to a “carefully constructed application consisting of programming algorithms and data model…” I actually do know that. But my point was: “The gap between how we access information and how the computer accesses it is at the heart of the revolution in knowledge. Because computers store information in ways that have nothing to do with how we want it presented to us, we are freed from having to organize the original information the way we eventually want to get at it.” I didn’t intend to imply that computers arrange bits chaotically or messily from the computer’s point of view, so to speak.

Shelley points out that while I use Flickr clustering as an example of order emerging bottom up from bits of meaning created for some other purpose, people now sometimes tag in order to get their photos into a cluster. Good point, but it actually only further makes my overall case that the chaos some predicted would result from free tagging in fact is not occurring (or at least not occurring to the degree feared).

Some of Shelley’s criticisms I certainly accept as accurate. I am indeed sloppy about the distinction between metadata and data. Sometimes that was slightly intentional – I sort of knew I was doing it – but I thought honoring the distinction would be confusing to less computer-literate readers than Shelley. Sometimes it wasn’t intentional; I’m just wrong. And overall my point in the book is that digitizing information erases all but the operational difference between data and metadata: Metadata now is what we know and data is what we’re looking for.

But when Shelley says she’s “flummoxed” by my referring to the spaces between words as metadata, I am flummoxed by her being flummoxed – a classic sign of a disconnect (a space between our words?). Spaces between words were introduced rather late in the literary day, and they seem to me to be as clearly metadata as are parentheses. They are data used to delimit data. Doesn’t that make them metadata? Where am I going wrong with that?

Some of the criticism I’d argue with, although the fact that it has arisen indicates that I’m at fault for not making myself clear. For example, Shelley says that I don’t provide examples of how Eleanor Rosch’s prototype theory applies to tagging systems. True, but that’s because I don’t think it applies directly. And I say so: “While prototypes are unlikely to become a dominant way of organizing Web materials, the fundamental property of prototype theory is already quite important in the digital order: The Web is full of sort-of, kind-of clustering based on multiple attributes, not based on Aristotelian definitions.” I then tie that to Joshua Schachter saying something can be “73 percent in a category.” (p. 196)

Shelley is absolutely right that I cover the Semantic Web too quickly. Initially, I planned on spending more time on it. But, I don’t think it’s at the heart of the change I think we’re going through. I may, of course, be wrong, but the section in the book tries to explain why I think that. While Shelley is correct that I am a skeptic about large-scale Semantic Web projects, I think my overall presentation is fairly balanced. But, then, I would think that. What it comes down to is the fact that Shelley and I disagree.

As for what Shelley says was the process by which I developed the book – “David had a concept, a belief, and then sought out specific knowledge and other witnesses to the faith who would provide the evidence to support such” – there is an element of truth to that as well, but less than Shelley thinks. Everything Is Miscellaneous is an argument. In one sense, it’s an argument with Aristotle, although that’s not what we put on the jacket cover. In another and more important sense, it’s an argument against the idea that there is a best way to organize ideas. My task in the book is to surface our assumption that there is a best way, to show that it actually has a history (and isn’t itself “natural”), and then to point out the ways in which digital technology doesn’t fit with that old idea. Finally, the book (in its second half, actually) tries to show how overturning that old belief affects business, science, politics, education, etc. As an argument, there is a polemical side to the book. But, I did try to be fair. And if Shelley could have seen what I believed when I first started working on this book three years ago, she would be less likely to think that I went about listening only to people I agreed with. Three years ago, other than smelling a rat in the idea that there is a single best way to order ideas, I didn’t know who I agreed with.

So, thank you, Shelley, for the thoughtful review. It’s a lot more helpful than “Loved the book. Gotta go.” [Tags: everything_is_miscellaneous shelley_powers burningbird]

29 Responses to “Shelley disagrees with and dislikes “Miscellaneous””

on 07 May 2007 at 10:54 am, #1 xian

I’m not sure spaces are metadata. Yes, of course they delimit words, but what is the data about the word that you think they contain? “Word has an ending (y/n)”?
on 07 May 2007 at 10:58 am, #2 David Weinberger

They contain the data “The word ends here.”

If they’re not metadata, then I assume they’re data. But what is their data other than information about the piece of information they delimit?
on 07 May 2007 at 11:19 am, #3 vaspers the grate

Now we are all devotees of Jacques Derrida in the post-modem world, and we settle for the style of firecrackery and jouissance as Lacan might add surreptitiously.

“Is that a word?” some lang-wanker chortles self-defeatedly.

Of course it is a word. It is a word because you spoke it into being, inscribing it with your mouth on the aerial effluvium of hibernated real world, offline space, then injected it into the venerable digital osmotics.

Metadatically speaking, er, typing, I must say, er, write, that words are confused about their own nature, as Derrida has proven quite literally and monolingually in the phallologocentric mess of Western metaphysics.

Thus, I agree angrily with David Weinberger.

Go crowdsourcing. Go swarm creativity. Go user generated content w/out hierarchy or categorical imperatives.
on 07 May 2007 at 11:45 am, #4 sean coon

interesting thought re: spaces as meta-data… what about classifying them as a linguistic synapse — bridging the end of data to the beginning of data in order to forge information?

unfortunately, that gives kerning way too much power. ;)
on 07 May 2007 at 12:59 pm, #5 Andrew Hinton

It’s hard enough to articulate new ideas that don’t have conventional language for them yet.
It’s painful to *then* express those ideas so they convince people of their import who are highly entrenched in other frames of reference.
And it’s well night unto impossible to do that if said people’s frames of reference are precisely the things being challenged by your newly articulated ideas.

Generally, when I encounter bits and pieces of a book or article that I can quibble with factually, I try to hold off and see if I can get the spirit of the book anyway. You know. Benefit of the doubt. Not doing so usually means I’m coming into the situation with my mind kind of made up already…
on 07 May 2007 at 1:29 pm, #6 vaspers the grate

Forgot to add that there is a version of the Bible that is in ALL CAPS withnospaces between letters. Will dig up the name of it. In a theology tome at home.
on 07 May 2007 at 1:32 pm, #7 vaspers the grate

The ultimate “difficult” discussion of things that have no good corresponding words would be the Gospel.

Jesus used parables of earthly things to speak of Mind and Heaven and Eternity.

So it’s not that hard really to convey new technical realities, just try to find analogies, metaphors, comparisons, correspondences as Swedenborg would say.

Take something they already understand intimately, and draw the comparison.

That’s basically the entire secret of all teaching.
on 07 May 2007 at 1:36 pm, #8 vaspers the grate

“RSS is like home delivery by a pizzeria, except the pizzerias don’t have to be local, the delivery system is over wires or waves, the RSS reader does the polling, er, calling, and the pizza is web content (blog posts, site changes, video, podcasts, etc.)” for example.

Depends on audience and purpose of communication.
on 07 May 2007 at 3:20 pm, #9 Michael R. Bernstein

“Spaces between words were introduced rather late in the literary day, and they seem to me to be as clearly metadata as are parentheses. They are data used to delimit data. Doesn’t that make them metadata? Where am I going wrong with that?”

I think (but am not sure) that what’s going on here is that you are confusing data-format features (that make implicit aspects of the data explicit) with the data itself.
on 07 May 2007 at 4:26 pm, #10 xian

It still smells different from most other metadata to me, even if it really is metadata. Are there other metadata about words that apply to a specific part of the word?

I tend to think of metadata as applying to the data as a whole – facts about the data: I suppose you could say the fact about that word is that it ends after its 5th letter… Yes, I suppose the space might mean “word is x letters long”… OK. I’m sold.
on 07 May 2007 at 5:01 pm, #11 AKMA

On the “spaces” argument (and here I must stipulate that I haven’t read the book): what’s the difference between a space and some other arbitrary word-delimiter (say, a bullet, a vertical line, a hyphen)? Isn’t a space essentially just a glyph of emptiness? In which case, I infer that David rightly points out that the space doesn’t convey semantic or syntactical data by itself (how would you mispronounce a space, or parse its function in a sentence?), but rather conveys information about the other glyphs among which it appears (“these belong together, as opposed to those”). In other words, “metadata.” Nice point, David!
on 07 May 2007 at 5:41 pm, #12 Michael R. Bernstein

AKMA, I disagree. Where you see metadata, I see data that was simply missing from the original serialization format.

In other words, sentences without spaces were just a lossy representation. Adding spaces to the format fixed that, but that does not make them data about data.
on 07 May 2007 at 5:57 pm, #13 AKMA

Sentences without spaces were a lossy data format before the correct format was known? That sounds weird to me; if at the time no one thought of spaceless sentences as less-than-full, on what basis do we determine that their formatting is lossy? And how can we make a more correct data format by adding data that the original format lacked (how do we know we’re right?)?

If it’s all in code, are the sequences that represent spaces different from the sequences that represent glyphs?
on 07 May 2007 at 8:02 pm, #14 David Weinberger

Michael, if a space is a datum, what is that datum? I believe a space is a datum about the data it delimits. But data about data is exactly my definition of metadata.

I wouldn’t put spaces forward as a good example if someone genuinely didn’t know what metadata is; spaces are not a prototype of metadata. For that I’d probably point to a label. But I don’t use spaces in the book as a primary way of explaining metadata. I use them as an example of how pervasive metadata are, and how they can be quite implicit and unnoticed.
on 07 May 2007 at 8:53 pm, #15 Jay Fienberg

I actually don’t quite get how you’re considering words to be data–in some kind of general sense. (Obviously, specific data types may include words, and specific words can act as data in certain types.)

So, that may be part of others’ confusion / arguments, as well.

There are data formats that are space delimited (e.g., in an HTML element with multiple class names, the names are space delimited within a single class attribute). The “space delimited” rule for those formats itself could be codified as data, e.g., in some kind of schema language or ontology language. Then, the rule-codified-as-data would probably get called metadata.

***

In any specific data type kind of way of looking at it, data types can be said to be defined in terms of a set of legal symbols that are used to represent data of that type. For example, a data type for “1-to-5 star reviews” might allow only the * character.

In this way of looking at data, a space character between words would generally be considered just another legal symbol–the same as any of the alphabet-letters that make up the words.

And, in certain systems, data associated with the data type definitions, e.g., the data that is the set of legal characters for a specific type, are called “metadata.”
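
The two ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `STAR_REVIEW_SYMBOLS`, `is_valid_star_review`, and `split_class_names` are hypothetical and come from no real schema language. The set of legal symbols plays the part of the type definition (the thing some systems would call metadata), and the whitespace rule for class names is the "space delimited" format rule.

```python
# Hypothetical sketch: none of these names come from a real schema language.

# The set of legal symbols for a "1-to-5 star review" data type.
STAR_REVIEW_SYMBOLS = frozenset("*")


def is_valid_star_review(value: str) -> bool:
    """Return True if value is 1 to 5 characters, all of them legal symbols."""
    return 1 <= len(value) <= 5 and set(value) <= STAR_REVIEW_SYMBOLS


def split_class_names(class_attr: str) -> list[str]:
    """Split an HTML class attribute value into names using the whitespace rule."""
    return class_attr.split()
```

For instance, `is_valid_star_review("***")` holds while `is_valid_star_review("**x")` does not, and `split_class_names("nav primary")` yields the two names. In both functions the rule lives outside the value being checked, which is the sense in which such a rule might get called metadata.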
on 07 May 2007 at 9:00 pm, #16 Michael R. Bernstein

Spoken speech has tiny pauses between words and stress changes within words that help to denote their beginning and end, though we don’t notice them in the normal course of things. Try listening to someone speaking in a monotone with no pauses to see how much these subtle changes (and the larger pauses at the end of sentences) aid comprehension.

Running the written text together loses that, making it more difficult (though not impossible) to decode the text back into words.

So, while the spaces are data, it is just redundant data, like a checksum. With them, decoding the text stream back into words is unambiguous.

No one thinks of checksums as metadata; a checksum is just a feature of a data *format*, and of no more semantic significance than that.
on 08 May 2007 at 7:13 am, #17 David Weinberger

Jay, I’m surprised you don’t count words as data. Would you count words as information?

Michael, I take the stress changes and pauses as metadata that give us important information about the data (the words).
on 08 May 2007 at 10:36 am, #18 AKMA

“Spoken speech has tiny pauses between words” — I’m surprised to hear that; I’d be interested in learning about research that demonstrated such a thing. I can hear distinct words easily in spoken English (usually), when concentrating in spoken French (usually), with great effort in spoken German (sometimes), and only occasionally in spoken Spanish. Are the pauses “there,” or are they an effect of the relative fluency with which I perceive and process the words?
on 08 May 2007 at 11:45 am, #19 Jay Fienberg

“Jay, I’m surprised you don’t count words as data. Would you count words as information?”

Yes, but that’s not a property of the words as much as it’s a property of we humans always having or creating a sense of persistent context for words.

(Or, we’re not recognizing the characters as words, and just talking about the characters as information.)

For example, someone hands you a piece of paper with three letters: “die.” That’s already a bit of a context. If you’re playing a German word game, “die” is the definite article (“the” in English). If you’re a character in a horror movie in English, it brings bad tidings of your imminent doom. . .

So, until writing all of that, I felt like there was no similarly persistent context / contextualizing in which we recognize words as data. I was just thinking there are specific contexts (of which, of course, there could be many). For example, “die,” which represents a number of points in Scrabble, is data in the context of Scrabble scoring.

But, now I see that there is at least one persistent context / contextualizing in which we recognize words as data: the “word count” context. For example, this is turning into a really long comment, which means it has a lot of words: the individual words are the data of this word count context.

To get the spaces = metadata, in this context, I’d suggest that a word count includes a list of characters that act as delimiters or non-word characters. The items in that list, including the space character, are data: it’s not the word data, but it’s a constraint on the word data, and therefore, in this context, it could rightly be called metadata.
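
A minimal sketch of that word-count context in Python, assuming a hypothetical `DELIMITERS` set and `count_words` function (neither is from any particular tool): the delimiter list is held apart from the text and only constrains how the text is split into words.

```python
# Hypothetical sketch: the delimiter list is data that constrains the word data.
DELIMITERS = frozenset(" \t\n")


def count_words(text: str, delimiters: frozenset = DELIMITERS) -> int:
    """Count maximal runs of non-delimiter characters in text."""
    count = 0
    in_word = False
    for ch in text:
        if ch in delimiters:
            in_word = False   # a delimiter ends the current word, if any
        elif not in_word:
            in_word = True    # first character of a new word
            count += 1
    return count
```

Swapping in a different delimiter list, for example `count_words("a/b/c", frozenset("/"))`, changes what counts as a word without touching the words themselves.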
on 08 May 2007 at 3:42 pm, #20 Michael R. Bernstein

AKMA, the pauses are genrally there for languages that need them to the degree that they are needed (which can be highly variable). Note that German, for example, has word structures that are more regular and agglutinative as a matter of course, and consequently more run-on words in print.

Also note that a speaker’s pauses get more distinct when they assume the listener is less fluent, and that pauses are less necessary when vocabulary is constrained (so that words are more easily distinguished as unique utterances).

This is far from a hard-and-fast rule, just a tendency, and there are other forms of redundancy like the use of ‘a’ vs. ‘an’ or even regional dialects (both pronunciation and idiom) that reduce the need for pauses in many cases, but of course printed word boundaries for a given language are less susceptible to variation (precisely in order to make them more comprehensible across time and space in the same way that more distinct pauses do).

David, on top of the foregoing, pauses and stress can of course also convey metadata (the pregnant pause, sarcasm, the stress conveying a double meaning, the change in tone indicating a digression), but note that these are generally conveyed in print in other ways, if at all.

In those cases (commas, parentheses, the ellipsis) and others, we usually *are* talking about metadata (because it adds to the literal meaning of the words), but a literal interpretation of the lowly space in-and-of-itself is just that it’s a simple delimiter and a data format detail (and, if I recall correctly, bullets were tried first).

It is interesting to note that the affordances of the textual equivalent may not match those of the spoken utterance (this is, in part, why you don’t usually notice the micro pauses consciously). For example, few people can manage to convey by tone alone more than one level of digression, whereas nested parenthetical remarks run rampant in print. This is also why we need commas to indicate deliberate, meaningful pauses, rather than relying on double or triple spaces in print (which is the ‘obvious’ solution).
on 08 May 2007 at 7:35 pm, #21 Shelley

For anything to be meaningful, there must be some value in attaching a formalized concept of ‘metadata’, same for attaching a value of ‘data’.

A word by itself has meaning–the dictionary attests to that. But words as data–depends on the context. The title of your book consists of words, and they form data. The metadata is ‘title’ or even ‘book title’. This may seem an uptight statement to make to some, or overly simplistic to others, but it’s a very tangible example of data and metadata, as well as words as ‘data’.

But red

Yellow apple rain not not worm peach. Brick fog chip!

Very real words. Each has meaning. In this context and how they’re used? Worthless. So are they data? Only to the person who really does count the cracks in the sidewalk.

Returning to the space. I know that in your book, David, you used space to represent how even absence can be metadata, and used spaces between words as an example. But spaces are not ‘nothing’, nor are they an absence of something. They are actual characters, they have their own key on the keyboard, their own width based on font, they take up real estate on a page. They don’t have a visible presence, but they are ‘something’.

Now, are they metadata? Not really. They are a syntactic element, just like the alphabet that is used to form what can end up being data or even metadata. But they don’t really describe anything, no more than the letter ‘b’ describes anything. Well, unless we’re watching Sesame Street.

Even if they were, they’re not inherently a part of a word — they’re a part of the sentence, phrase, term, what have you. A word can be terminated with any number of characters, or even nothing at all. It’s not dependent on a space for its termination. It’s the sentence, phrase, or term that is counting on its existence.

Thisisaperfectlygooduseofwordsbutnotagreatsentence.

This/is/a/perfectly/good/use/of/words/but/not/a/great/sentence.

Now, when is the absence of something ‘metadata’? “When’s the wedding?” “Oh, we haven’t set the date yet.”

But then, I’m an uptight Semantic Web nerd ;-)
on 09 May 2007 at 7:59 pm, #22 roy belmont

The space-between-words has clear lines forming its sides – it’s the lines of the letters in the words themselves that mark it. But above and below are only spaces with indeterminate edges, the lines between the lines, and again out at the margins of the page those line-data spaces become something else again. White space is cohesive, unbroken, all through the text unless a line gets made out to the edges of the page, as it almost never does. We revere the margins in this way.
It’s sort of a concrete abstraction, that.
“They are actual characters, they have their own key”
Yes. But the “Delete” key, the “COPY” and “PASTE” commands own the whole document in that sense don’t they?
It’s where that space-between-words character meets its upper and lower “other” – the space-between-lines – that seems interesting in this way, as meaning imputed to the space-between-words character disappears…where? Or when? At the parallel lines of the capitals? Top of the “I” is top of the space?
Without being facetious – it’s all there, and on out to the far edge of the page.
Like a Mercator projection, it’s not accurate, but it serves well enough to get the job done. We use the written word to map the spoken, or did until whole specialized demographics became so familiar with it they could be said to have inhabited the maps themselves.
In speech the outside margins of the info are the world around the spoken words – everything that is not-spoken – it’s cylindrical more than linear, yes?
The little gaps and elisions of spoken language take place, when they do, within a larger space they blur seamlessly into as well, though we supply the linear flow, by hearing it. Which is to say in order to have a discussion about the spaces between words we have to pretend those spaces aren’t part and parcel of the spaces-between-lines which are themselves the margin-stuff and the blank foolscap all one.
on 13 May 2007 at 11:46 am, #23 Brian H

One benefit of written spacing, or maybe the primary one, is speed of absorption/reading. It is possible to absorb printed text much faster than it can be coherently spoken, especially if you’ve been given a bit of speed-reading training, and the space-format-metadata clues help by reducing the decoding effort. But there are mysteries about how we order information and data. For example, in the standard intro/promo session for the Evelyn Wood speed reading course, one is shown a series of flip cards, riffled quickly to show that we pick up on the meaning of a series of words forming a sentence quite quickly. And then comes the kicker: the stacks of cards are shown slowly, in the order “riffled”, and the last few sentences are in garbled or random order. But were perceived and read as though properly sequenced.

The clear implication is that there is a rather larger “comprehension buffer” in the brain than we are used to considering, and that meaning and structure are derived/comprehended/imposed in a far more active and contextually integrated form than we naively assume. To continue with the E.W. example, the extension of the point is that with practice and trust in our background sorting and meaning-assignment capacities it is possible to scan pages “slashing” diagonally across chunks of paragraphs, whole paragraphs, or even pages and yet perceive and absorb what is written. And then the demonstration concludes with some “speed-reading” of poetry — which turns out to be reading aloud, with feeling. There is SO much of the weight of communication borne in poetry by the cadences, rhymes, and patterned reflections of image that it would be almost pointless to try and “grok at a glance”. ;)

So spaces HELP with perception of content. Metadata organizes semantically. Perhaps you could expand your argument to encompass capitalization conventions, many of which r txtg yth r gvg up ttlly. No space at the inn on a tiny cell phone screen for all the convenient clues that foolscap, whether material or electronic, allows.

Speaking of conventions, agreed-upon spelling and syntax have much to do with comprehension, too; having spent years editing articles written for the Web, I know that many disdain (in error, IMNSHO) to follow them. Pointing out errors and typos in, e.g., comments sections like this (“well night unto impossible” s/b “well-nigh impossible”; “pauses are genrally there” s/b “pauses are generally there”; “imminent doom. . .” s/b “imminent doom …”) gets one excoriated as a “spelling-and-grammar Nazi!”. Speed of comprehension, though, has a lot to do with such pickiness; decoding misspells and malaprops takes time and effort and is distracting.

So are formalisms and formatting metadata? Is it simply a matter of preference about whether the definition should be stretched to include them? Or is there something more important at stake? Maybe there’s a special category of metadata which relates specifically to internal consistency and comprehensibility, but is otherwise “neutral”.
on 14 May 2007 at 4:31 pm, #24 johne

Marshall McLuhan, somewhere, quotes the memoirs of an African who passed an illiterate childhood in a village society, and began school only when he was old enough to reflect on what was happening. He said, if I remember correctly, that once he realized that the ocean of meaning that flowed toward him when someone opened their mouth could be thought of as divided into individual words, each one expressing an idea, and each usable in a different context, literacy was a snap.

That would imply that the very concept of words, and hence spaces, definitely falls in the realm of metadata.
on 19 May 2007 at 5:40 am, #25 roy belmont

Brian H thanks that was really groovy. Whether there’s something more important at stake or not I think we should act like there is, because there probably is even if we can’t tell right now.
JohnE it would also imply that said African when he was hearing and comprehending speech before his literacy breakthrough was hearing and comprehending something that was not made of words. Kind of how the day isn’t really divided into 24 equal parts, and never was.