Everything is Miscellaneous

Category: libraries

[lodlam] Richard Wallis on Schema.org

Posted in bibframe, everythingIsMiscellaneous, libraries, lodlam, metadata, podcast, schema.org on June 20th, 2013

Richard Wallis [twitter: rjw] of OCLC explains the appeal of Schema.org for libraries, and its place in the ecosystem.

[lodlam] Bibframe update

Posted in bibframe, everythingIsMiscellaneous, libraries, liveblog, lodlam on June 19th, 2013

Kevin Ford from the Library of Congress is talking about BIBFRAME, which he describes as a replacement for MARC and a rethinking of the entire ecosystem.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

(If a response isn’t labeled “Kevin,” then it wasn’t Kevin. Also, this is much compressed, incomplete, and choppy. Also, I haven’t re-read it.)

Q: From the Bibframe mailing list it seems like there isn’t agreement about what Bibframe is trying to achieve.

Kevin: Sometimes people see it narrowly.

Q: It’s not clear how Bibframe gets to where it replaces MARC.

Kevin: We’re not holding back some plan or roadmap that we’ve mapped out perfectly with milestones and target dates. We’re taking it as it comes.

Q: There’s a perception on the part of vendors and customers of vendors that this is a new data specification that vendors will have to support, and that that’s its main function, and possibly that’s pushing the knowledge representation in a direction that’s favorable to the vendors — a direction that’s too simple.

Q: Is there an agreement about the end point?

Kevin: There’s agreement that it needs to do what MARC does but better. We’re doing data representation, not predicting the systems built on top of it.

Q: What are the functional requirements that Bibframe’s trying to meet with this new model? What are your metrics? And who are you trying to satisfy?

Kevin: It’s not vendor focused. We hope systems will be built that expose the data as linked data.

Q: Bibframe lets you associate a record with a particular work, which is a huge advance.

Q: Bibframe used to talk about roundtripping from MARC to Bibframe to MARC. But Bibframe is now adding info, so I don’t see how roundtripping is possible.

Kevin: Not losslessly.

Q: Bibframe is intended for libraries, but from what I’ve seen it doesn’t seem that Bibframe is intended for use outside of libraries. There doesn’t seem to be any thought about how other ontologies might be overlaid. And that was a problem with MARC: it was too library-centric. Why not investigate mapping it into other vocabularies?

Kevin: Nothing stops you from including other namespaces. As for mapping to other vocabularies, we’re working on a 40 year time scale and can’t know that other vocabularies will be around.

Q: We need some community-building to make that happen. We need to be careful not to build an ontological silo.

Q: The naming of this data set is unfortunate: Why “bib”, which has a connotation of books, when really it should be about any kind of information-bearing object? Why not call it “InfoFrame”? Who uses “bibliographic” other than libraries? Why limit yourself?

Kevin: I cannot begin to tell you how much time was spent on what this thing should be called. It went through a couple of different names. It’s not an ideal name, but I hope that the “bib” association falls by the wayside.

Q: The library ecosystem includes articles, licenses, and many other things that weren’t part of MARC. Is Bibframe aiming at representing all of that?

Kevin: Yes, it’s in scope. Certainly data about journal articles.

Kevin: Yes, Bibframe lets you define your own fields, as in MARC.

Q: We’re going from cataloging to catalinking: from records about resources to links related to topics, etc.

A: We need services that will link resources to other resources. Bibframe doesn’t do that, but it’s more amenable to it than MARC.

Kevin: [Sorry, but I missed the beginning of this.] When it comes to subject headings, we expect you to resolve that URI. If people are doing that every single time, then it’s a candidate for being included. That lookup could be a query into your local system. I’ve assumed you’ll have to have a local copy of it.

Q: Versioning? Why did you ignore the work of the British Library?

Kevin: We didn’t ignore it at all. We need to attend to what’s achievable by the smallest institutions as well as the largest.

Q: For a small institution, is it practical to move away from MARC?

Kevin: Not for some. Some still use card catalogs. I expect some of the first systems will be an outward layer around legacy systems.

Q: We need a larger discussion about provenance and about trust on the semantic web. Libraries should be better participants in that discussion; it’s a deeply important space for us.

Q: This conversation makes me cynical about our profession’s involvement. We need to be talking with users. We need community involvement. We’re worried about the longevity of FOAF? It’ll outlast Bibframe because people actually use it. Let’s keep turning inward until we’re completely irrelevant.

Q: Yeah, the idea that there has to be one namespace seems so counter to the principles of linked data.

Q: Do we have anyone outside of the library community here?

A: I’m mainly a web developer. There’s a really big gulf. The Web will win when it comes to how libraries operate. Whether Bibframe will be a part of it remains to be seen. In the web community, everything seems exciting, but I feel so much angst in the library community.

[misc] The loneliness of the long distance ISBN

Posted in books, eim, everything is miscellaneous, everythingIsMiscellaneous, isbn, libraries on May 20th, 2013

NOTE on May 23: OCLC has posted corrected numbers. I’ve corrected them in the post below; the changes are mainly fractional. So you can ignore the note immediately below.

NOTE a couple of hours later: OCLC has discovered a problem with the analysis. So please ignore the following post until further notice. Apologies from the management.

Ever since the 1960s, publishers have used ISBN numbers as identifiers of editions of books. Since the world needs unique ways to refer to unique books, you would think that ISBN would be a splendid solution. Sometimes and in some instances it is. But there are problems, highlighted in the latest analysis run by OCLC on its database of almost 300 million records.

Number of ISBNs	Percentage of the records
0	77.71%
2	18.77%
1	1.25%
4	1.44%
3	0.21%
6	0.14%
8	0.04%
5	0.02%
10	0.02%
12	0.01%

So, 78% of the OCLC’s humungous collection of book records have no ISBN, and only 1.6% have the single ISBN that God intended.
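For anyone who wants to check the arithmetic, here is a minimal sketch that totals the rows of the table above. The percentages are copied from the table; the names `ISBN_SHARE` and `share_where` are just illustrative. It uses `Decimal` so the sums come out exact rather than picking up floating-point noise.

```python
from decimal import Decimal

# Percentages copied from the table above, keyed by the number of ISBNs per record.
ISBN_SHARE = {
    0: "77.71", 2: "18.77", 1: "1.25", 4: "1.44", 3: "0.21",
    6: "0.14", 8: "0.04", 5: "0.02", 10: "0.02", 12: "0.01",
}


def share_where(predicate):
    """Sum the percentage of records whose ISBN count satisfies predicate."""
    return sum(
        (Decimal(pct) for count, pct in ISBN_SHARE.items() if predicate(count)),
        Decimal("0"),
    )


no_isbn = share_where(lambda count: count == 0)
multiple = share_where(lambda count: count >= 2)
listed = share_where(lambda count: True)

print(f"no ISBN: {no_isbn}%")
print(f"two or more ISBNs: {multiple}%")
print(f"covered by the rows listed: {listed}%")
```

The rows shown do not add up to quite 100%, presumably because the table stops before the long tail of records with even more ISBNs.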

As Roy Tennant [twitter: royTennant] of OCLC points out (and thanks to Roy for providing these numbers), many works in this collection of records pre-date the 1960s. Even so, the books with multiple ISBNs reflect the weakness of ISBNs as unique identifiers. ISBNs are essentially SKUs to identify a product. The assigning of ISBNs is left up to publishers, and they assign a new one whenever they need to track a book as an inventory item. This does not always match how the public thinks about books. When you want to refer to, say, Moby-Dick, you probably aren’t distinguishing between one with illustrations, a large-print edition, and one with an introduction by the Deadliest Catch guys. But publishers need to make those distinctions, and that’s who ISBN is intended to serve.

This reflects the more general problem that books are complex objects, and we don’t have settled ways of sorting out all the varieties allowed within the concept of the “same book.” Same book? I doubt it!

Still, these numbers from OCLC exhibit more confusion within the ISBN number space than I’d expected.

MINUTES LATER: Folks on a mailing list are wondering if the very high percentage of records with two ISBNs is due to the introduction of 13-digit ISBNs to supplement the initial 10-digit ones.
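To see why that would double things up: a 10-digit ISBN and its 13-digit counterpart identify the same edition, but they are different strings, so a record that carries both counts as having two ISBNs. Here is a minimal sketch of the standard conversion (prefix 978, drop the old check digit, compute a new one). The function name is my own, and this says nothing about how OCLC actually stores or counts them.

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert a 10-digit ISBN to its 13-digit equivalent."""
    chars = isbn10.replace("-", "").replace(" ", "")
    if len(chars) != 10:
        raise ValueError("expected a 10-character ISBN")
    # Drop the old check character (which may be 'X') and add the 978 prefix.
    core = "978" + chars[:9]
    if not core.isdigit():
        raise ValueError("the first nine characters must be digits")
    # ISBN-13 check digit: weights alternate 1, 3, 1, 3, ... from the left.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    check = (10 - total % 10) % 10
    return core + str(check)


print(isbn10_to_isbn13("0-306-40615-2"))  # prints 9780306406157
```

If that is what is going on, you would expect records to pick up ISBNs in pairs, which fits with even counts dominating the table.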

[misc] StackLife goes live – visually browse millions of books

Posted in dpla, everythingismisc, everythingIsMiscellaneous, harvard, libraries, stacklife on April 18th, 2013

I’m very proud to announce that the Harvard Library Innovation Lab (which I co-direct) has launched what we think is a useful and appealing way to browse books at scale. This is timed to coincide with the launch today of the Digital Public Library of America. (Congrats, DPLA!!!)

StackLife (née ShelfLife) shows you a visualization of books on a scrollable shelf, which we turn sideways so you can read the spines. It always shows you books in a context, on the grounds that no book stands alone. You can shift the context instantly, so that you can (for example) see a work on a shelf with all the other books classified under any of the categories professional cataloguers have assigned to it.

We also heatmap the books according to various usage metrics (“StackScore”), so you can get a sense of the work’s community relevance.

There are lots more features, and lots more to come.

We’ve released two versions today.

StackLife DPLA mashes up the books in the Digital Public Library of America’s collection (from the Biodiversity Heritage Library) with books from The Internet Archive’s Open Library and the Hathi Trust. These are all online, accessible books, so you can just click and read them. There are 1.7M in the StackLife DPLA metacollection. (Development was funded in part by a Sprint grant from the DPLA. Thank you, DPLA!)

StackLife Harvard lets you browse the 12.3M books and other items in the Harvard Library system’s 73 libraries and off-campus repository. This is much less about reading online (unfortunately) than about researching what’s available.

Here are some links:

StackLife DPLA: http://stacklife-dpla.law.harvard.edu
StackLife Harvard: http://stacklife.law.harvard.edu
The DPLA press release: http://library.harvard.edu/stacklife-browse-read-digital
The DPLA version FAQ: http://stacklife-dpla.law.harvard.edu/#faq/

The StackLife team has worked long and hard on this. We’re pretty durn proud:

Annie Cain
Paul Deschner
Kim Dulin
Jeff Goldenson
Matthew Phillips
Caleb Troughton

[misc] I bet your ontology never thought of this one!

Posted in everythingIsMiscellaneous, libraries, metadata on December 18th, 2012

Paul Deschner and I had a fascinating conversation yesterday with Jeffrey Wallman, head of the Tibetan Buddhist Resource Center, about perhaps getting his group’s metadata to interoperate with the library metadata we’ve been gathering. The TBRC has a fantastic collection of Tibetan books. So we were talking about the schemas we use (a schema being the set of slots you create for the data you capture). For example, if you’re gathering information about books, you’d have a schema that has slots for title, author, date, publisher, etc. Depending on your needs, you might also include slots for whether there are color illustrations, whether the original cover is still on it, and whether anyone has underlined any passages. It turns out that the Tibetan concept of a book is quite a bit different than the West’s, which raises interesting questions about how to capture and express that data in ways that can be useful when mashed up.

But it was when we moved on to talking about our author schemas that Jeffrey listed one type of metadata that I would never, ever have thought to include in a schema: reincarnation. It is important for Tibetans to know that Author A is a reincarnation of Author B. And I can see why that would be a crucial bit of information.
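Here is a minimal sketch of what that might look like, assuming a schema written as Python dataclasses. Every field name here, and the `reincarnation_of` label, is made up for illustration; this is not the TBRC’s schema or ours. The fixed slots are the ones a designer thought of in advance, and the open-ended `relations` slot is one way to leave room for the ones nobody did.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Author:
    # Slots the schema designer anticipated.
    name: str
    birth_year: Optional[int] = None
    # Open-ended slot for relationships the designer did not anticipate:
    # maps a relationship label to the names of the related authors.
    relations: Dict[str, List[str]] = field(default_factory=dict)

    def relate(self, kind: str, other_name: str) -> None:
        """Record that this author stands in relationship `kind` to another author."""
        self.relations.setdefault(kind, []).append(other_name)


author_a = Author(name="Author A")
author_a.relate("reincarnation_of", "Author B")
print(author_a.relations)
```

The trade-off is the usual one: the free-form slot will hold anything, but other systems will only understand it if everyone agrees on what the labels mean.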

So, let this be a lesson: attempts to anticipate all metadata needs are destined to be surprised, sometimes delightfully.

[2b2k][everythingismisc] “Big data for books”: Harvard puts metadata for 12M library items into the public domain

Posted in 2b2k, everythingIsMiscellaneous, libraries, metadata, open access, too big to know on April 24th, 2012

(Here’s a version of the text of a submission I just made to BoingBoing through their “Submitterator.”)

Harvard University has today put into the public domain (CC0) full bibliographic information about virtually all the 12M works in its 73 libraries. This is (I believe) the largest and most comprehensive such contribution. The metadata, in the standard MARC21 format, is available for bulk download from Harvard. The University also provided the data to the Digital Public Library of America’s prototype platform for programmatic access via an API. The aim is to make rich data about this cultural heritage openly available to the Web ecosystem so that developers can innovate, and so that other sites can draw upon it.

This is part of Harvard’s new Open Metadata policy, which is VERY COOL.

Speaking for myself (see disclosure), I think this is a big deal. Library metadata has been jammed up by licenses and fear. Not only does this make accessible a very high percentage of the most consulted library items, I hope it will help break the floodgates.

(Disclosures: 1. I work in the Harvard Library and have been a very minor player in this process. The credit goes to the Harvard Library’s leaders and the Office of Scholarly Communication, who made this happen. Also: Robin Wendler. (next day:) Also, John Palfrey who initiated this entire thing. 2. I am the interim head of the DPLA prototype platform development team. So, yeah, I’m conflicted out the wazoo on this. But my wazoo and all the rest of me is very very happy today.)

Finally, note that Harvard asks that you respect community norms, including attributing the source of the metadata as appropriate. This holds as well for the data that comes from the OCLC, which is a valuable part of this collection.