Archive for May, 2008

Scan and Release: Digitizing the Boston Public Library

I’ve lived in Boston since 1986, but have never made it into the great Boston Public Library. Until today. My streak was totally broken because the little group digitizing the BPL’s holdings invited me in to see what they’re doing. And, oy, the work they have cut out for them!

But they’re an intrepid band. And they recognize that they’re up to something important. Although some in the BPL may have thought that digitized prints and photos are just lesser-quality backups, the group knows that they’re not only bringing hidden images into the public sun, they are engaged in a social project that changes how and what we know. (What’s not to love about librarians?)

The Print Stack, where photos, prints and miscellaneous other objects are stored, only seems to be in the basement. The ceiling is low, there are no windows, and the lighting leaches vitamin D out of your body. It’s long and overflowing, reminiscent of the warehouse that ends Citizen Kane, and that is echoed in two Indiana Jones movies.

Boston Public Library storage area
Boston Public Library Print Stack

If you want to find a particular image in the roughly two million prints and images (no one knows for sure), you ask Aaron. Some bits and portions have catalogs of various sorts, but overall, it’s a disarray of metadata. For example, the Herald Traveler collection of photos has about 1.2 million pieces, arranged in 104 cabinets, each with four drawers. The folders and drawers are labeled, which helps a lot, but they’re not indexed, much less cross-indexed.

Herald Traveler collection in file drawer
Herald Traveler collection

At least those photos have captions. Aaron shows me some beautiful 19th century photographs of Indian architecture. Many years ago, the BPL went to enormous trouble to paste the photos into multiple volumes — turning the photos into a book, as Aaron points out — but didn’t bother to record the notes on the back of the photos. Aaron is now going to have to dissolve the pages to expose the notes.

Eroded negative
Aaron holds up a degraded negative.
A dirigible is barely visible on it.
Tough reclamation project.

The archive doesn’t just have pictures and prints. It’s got, well, everything, including a couple of old typewriters and a collection of matchbook covers from Boston restaurants.

matchbook covers
Boston matchbook cover collection

Of this abundance, the digital group has so far scanned about 24,000 objects. When I point out to Maura Marx, the group’s head, that, given the library’s estimate that it has maybe 23 million objects, she’s looking at a 2,000 year project, she tells me that they’re just getting started. They’re going to bulk up, maybe do some offsite digitizing, and begin to make some serious progress. When I ask Thomas Blake, who does the actual digitizing, how he decides which stuff to do, he laughs a little and says, “What I think is cool.” And, since the public has an appetite for “choochoo trains, maps and postcards,” he’s done a bunch of them. The BPL is, after all, a public institution that both serves the public and relies upon the public’s support.

stacked volumes

The Library has been posting digitized works at Flickr. Take a look at the 19th century photos of Egypt, or, yes, the postcards. And the book fetishists among you should definitely check out the “Art of the Book” collection. Predictably and hearteningly, the public (you and me, sister) has been commenting and adding to what’s known. Maura hopes to get permission to put the images into the Commons. Digitizing and posting (“scan and release,” in the group’s memorable way of putting its mission) turns patrons into historians.

The scanning is slow because it’s one guy who’s doing a careful job. The camera has a 22 megapixel chip, but they’ve been known to digitize at 88 megapixels, creating files that are half a gig in size. Tom likes saving the RAW files to avoid unnecessary data loss. You never know what’s going to be useful. For example, he had been scanning postcards at 300 dpi, but a curator pointed out that then you couldn’t see the dotscreen pattern, which might be of interest to someone. So now Tom scans them at 600 dpi. Overall, they have about 1.5 terabytes of stored images.
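For the curious, here is a back-of-the-envelope sketch of why those numbers hang together. It assumes uncompressed 16-bit RGB storage (my assumption; Tom's actual RAW format may well differ) and a hypothetical 3.5 by 5.5 inch postcard. The function names are made up for illustration.

```python
def uncompressed_bytes(pixels, channels=3, bits_per_channel=16):
    """Size of an uncompressed image, assuming 16-bit RGB by default."""
    return pixels * channels * bits_per_channel // 8


def scan_pixels(width_in, height_in, dpi):
    """Pixel count for scanning a flat object of the given size in inches."""
    return round(width_in * dpi) * round(height_in * dpi)


# An 88 megapixel capture under the assumptions above.
size = uncompressed_bytes(88_000_000)
print(f"88 megapixels: about {size / 1e9:.2f} GB")

# Doubling the resolution quadruples the pixel count.
low = scan_pixels(3.5, 5.5, 300)
high = scan_pixels(3.5, 5.5, 600)
print(f"300 dpi: {low:,} pixels; 600 dpi: {high:,} pixels ({high // low}x)")
```

Under those assumptions an 88 megapixel image comes to roughly half a gigabyte, and going from 300 to 600 dpi means four times the pixels per postcard, which is part of why careful scanning eats storage.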

The metadata is a whole ‘nother issue. Chrissy Watkins, who has been there for four days (she had been at the JFK Presidential Library), is working on it. For now, Tom gives every item an arbitrary and unique ID number, the key piece of any metadata scheme. But the BPL is facing the inevitable conundrum: Maximize the metadata but slow the process, or capture less metadata but go at a far faster clip. The group seems to be leaning toward the latter, which makes sense to me. They’ve been using what Tom calls the “Curator Core,” a reference to the Dublin Core metadata standard. Trying to capture everything that might be useful is a task beyond daunting. For example, Michael Klein points to “fore-edge paintings,” paintings done on the edges of a book that are revealed when you fan the book slightly. Does the BPL have to come up with a standard that includes whether you fan the book to the left or right? There are so many different types of objects that building a standard or an ontology that captures them all would absorb all of the team’s time. (“The special case is not as special as you’d think,” says Michael.) Instead, they need to scan scan scan, and capture some reasonable set of metadata, to which more metadata can accrete.
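To make the "unique ID first, let the rest accrete" idea concrete, here is a minimal sketch. It is not the BPL's system: the `bpl-` ID format, the `Record` class and the sample values are all hypothetical. The only borrowed piece is the list of the fifteen Dublin Core element names.

```python
from dataclasses import dataclass, field
from itertools import count

# The fifteen elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

_next_id = count(1)  # arbitrary but unique, like the IDs Tom assigns


@dataclass
class Record:
    identifier: str
    fields: dict = field(default_factory=dict)

    def add(self, element, value):
        """Attach one more value; elements can repeat and arrive later."""
        if element not in DUBLIN_CORE_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        self.fields.setdefault(element, []).append(value)


def new_record():
    """Create a record that starts with nothing but its ID (hypothetical format)."""
    return Record(identifier=f"bpl-{next(_next_id):08d}")


# Scan now, describe later: the record is valid from the first moment.
postcard = new_record()
postcard.add("type", "postcard")
postcard.add("subject", "trains")  # could be added years later by a patron
print(postcard)
```

The point of the design is that a record with only an ID is already usable, so scanning never has to wait on cataloging.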

OCA
One of the ten Open Content Alliance book scanners.

“We’re going from collect and hide to scan and release,” says Tom. And in so doing, they’re going not just from no value to some value. They are in fact radically multiplying the value of the Boston Public Library’s holdings. And as we the recipients of this gift incorporate the images, adding information to them, and contextualizing them, we are further enriching the holdings, far beyond what any small group, no matter how intrepid, could manage.

Buy It Like You Mean It

BuyItLikeYouMeanIt.org is having a launch party on Tues., June 3. BILYMI lets people review products and companies, and then publishes a score based on which of the factors matter to you as an individual: how green, how well they treat their employees, etc. According to the press release:

Starting with the chocolate industry, students and volunteers are already reviewing harvesting, mining, manufacturing, packaging, and shipping practices. Shoppers will soon be able to access a single digit product score that summarizes all the information available about a product. This score will be based on a shopper’s unique “portfolio of interests” and will be accessible through: phones, text messages, supermarket shelf labels, and web browsers. Buy It Like You Mean It plans to have over 200 reviews and 1000 ratings by August.

I like the ability to decide for yourself which of the factors matters to you. Very miscellaneous!
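I don't know how BILYMI actually computes its scores, but one plausible way to turn a "portfolio of interests" into a single digit is a weighted average. The sketch below is purely my assumption: the factor names, the 0 to 9 rating scale and the `personal_score` function are all invented for illustration.

```python
def personal_score(ratings, weights):
    """Weighted average of 0-9 factor ratings, rounded to a single digit.

    ratings: factor name -> rating from 0 to 9 (hypothetical scale)
    weights: factor name -> how much this shopper cares (0 = not at all)
    """
    total_weight = sum(weights.get(factor, 0) for factor in ratings)
    if total_weight == 0:
        raise ValueError("shopper cares about none of the rated factors")
    weighted = sum(r * weights.get(factor, 0) for factor, r in ratings.items())
    return round(weighted / total_weight)


# Same chocolate bar, two shoppers with different priorities.
bar = {"environment": 8, "labor": 4}
print(personal_score(bar, {"environment": 3, "labor": 1}))  # greener shopper
print(personal_score(bar, {"environment": 1, "labor": 3}))  # labor-minded shopper
```

The same product gets a 7 from the first shopper and a 5 from the second, which is the whole idea: the data is shared, the score is yours.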

The launch party is at 7pm, June 3, at the Taza Chocolate Factory at 561 Windsor Street in Somerville, MA.


A moment of Google silence

The China Vortex runs the search log for Google China that dramatically shows the three minutes of silence China observed on May 19th in remembrance of those who died in the earthquake. It is, eerily, like the inverse of a seismograph.


Health Commons launched

Science Commons, in its relentless drive for product line expansion (I kid because I love), has posted a white paper proposing a Health Commons. In it, the authors, Marty Tenenbaum and John Wilbanks, lay out the problems and suggest a solution.

They write:

We are no longer asking whether a gene or a molecule is critical to a particular biological process; rather, we are discovering whole networks of molecular and cellular interactions that contribute to disease. And soon, we will have such information about individuals, rather than the population as a whole. Biomedical knowledge is exploding, and yet the system to capture that knowledge and translate it into saving human lives still relies on an antiquated and risky strategy of focusing the vast resources of a few pharmaceutical companies on just a handful of disease targets.

After citing more problems with the current system, the authors propose a Health Commons:

Imagine a virtual marketplace or ecosystem where participants share data, knowledge, materials and services to accelerate research. The components might include databases on the results of chemical assays, toxicity screens, and clinical trials; libraries of drugs and chemical compounds; repositories of biological materials (tissue samples, cell lines, molecules), computational models predicting drug efficacies or side effects, and contract services for high-throughput genomics and proteomics, combinatorial drug screening, animal testing, biostatistics, and more. The resources offered through the Commons might not necessarily be free, though many could be. However, all would be available under standard pre-negotiated terms and conditions and with standardized data formats that eliminate the debilitating delays, legal wrangling and technical incompatibilities that frustrate scientific collaboration today.

The paper emphasizes the need for metadata standards: “Providing such standards, Health Commons improves and extends the public domain by integrating hundreds of public databases into a single framework…” The Commons also provides the needed “social and legal infrastructure,” and a portal that provides the right set of services.

They hope that by lowering research costs, some of the 5,000 tropical diseases currently “uneconomical to address,” for example, will become the target of pharmaceutical R&D. “Health Commons makes it cost effective for small groups of researchers to conduct industrial scale R&D on rare diseases by exploiting the economies of scale afforded by an ecosystem of shared knowledge…”

The authors see the benefits going beyond the Commons’ value to non-profits. “Every pharmaceutical company sits on a wealth of promising targets and leads that they won’t develop themselves.”

The Health Commons could be a huge step forward. But it will take some work. “To realize the full potential, existing companies need to rethink their business models to leverage the commons.” As an example, the paper points out that “Only six out of the 1800 biotechnology companies funded since 1980 have made more money than was cumulatively invested in them.” Rather than counting on striking it rich with proprietary drugs discovered via proprietary R&D platforms, perhaps companies could profit by opening up their platforms and taking a cut of any drugs discovered with them.

Finally, Health Commons will provide a way to continuously publish research, along with comments, to supplement the traditional publishing model.

Health Commons can and should be a big deal. It requires lots of pieces coming together over time, but its acknowledgment of the role of profit is encouraging, and it is in the hands of serious, committed, and wickedly smart people.

Libguides … letting librarians be librarians!

I’m about to run for an airport (this is probably the single phrase I utter the most in the course of a month, alas), so I only had time to take a quick look at Libguides, but it looks very interesting. It aims to let librarians (and others) share their wisdom and insight, while engaging the community of readers. Interesting! (Thanks to Karen Schneider for the link, via a tweet. And, congratulations to Karen on her new job!)


The long tail of baby names

Parade magazine today reports on the top ten names for baby boys and girls this year:

Boys           Girls
Jacob          Emily
Michael        Isabella
Ethan          Emma
Joshua         Ava
Daniel         Madison
Christopher    Sophia
Anthony        Olivia
William        Abigail
Matthew        Hannah
Andrew         Elizabeth

Ok, but I seem to meet more and more kids with one-off names. Isn’t the long tail of names getting longer every year?


1860 Census now open for browsing

Footnote has posted the 1860 Census with its usual array of tools and goodies, some of which require a free membership. But the basic browsing and viewing is open to all. Footnote does a nice job with this stuff, including annotation tools and other social amenities.

For those who are keeping score, there were about a dozen David Weinbergers listed in the census that year, including one whom the FBI investigated, I think, for draft dodging.

Harvard Law goes Open Access

The Harvard Law faculty has voted unanimously for an Open Access policy based on the one that the Harvard Faculty of Arts and Sciences passed a few months ago. Yay!

John Palfrey, Harvard Law’s new vice dean for library and information resources (and, of course, the soon-to-be-former exec dir of the Berkman Center) gets to implement this happy policy.
