Taxonomies Now!

It’s time to start thinking about taxonomies.

Last October, the SWXML team wrote a post called “To Chunk or Not To Chunk”, where we discussed tagging and infrastructure issues. The discussion that followed raised a question: what happens when you don’t know what you’ll be using chunks for? How do you tag those?

Later, in our StartwithXML One Day Forum, we included a presentation on tagging and chunking best practices, where it was pointed out that no taxonomy for chunk-level content currently exists.

We have taxonomies for book-level content. These include formalized code sets such as the Library of Congress subject codes, the BISAC codes, and the Dewey Decimal System, among others. There are also informal code sets, like the tag sets on Shelfari or LibraryThing. And online retailers such as B&N.com have proprietary taxonomies that enable effective browsing.

But nothing like this exists for sub-book-level content. It’s never been traded before. We’ve never really needed a taxonomy for it before.

Other industries that traditionally distribute “chunks” have their own taxonomies that might prove useful in building a book-chunk schema. One example is the IPTC news codes, which identify the subject of a particular news story; that’s the closest analogy I can find for small gobbets of content that require organization.

Many industries also have proprietary taxonomies for the concepts in their own fields: culinary arts, music, agriculture, engineering, the sciences, literature and criticism, education, and on and on and on. But these were not designed to identify concepts within a book.

But why do we even need taxonomies?

Let’s think about some of the other book-level systems. LC and Dewey codes exist so that librarians will know where to shelve and find books and other materials. BISAC codes exist so that, likewise, bookstore staff will know where to shelve and find books and other materials. BISAC codes have been extended into the online world so that…you got it: online bookstores know where to “shelve” (virtually) books, and customers know where to browse to find them.

As books get broken up into chunks, users can find them in one of two ways (same as you find anything else on the web): search, and browse.
Search, of course, is not as simple as slapping a keyword box onto a site – as SEO marketers can attest. There are all sorts of algorithms that are assisted by metadata, tags, classifications, categories – all of which add weight to a basic keyword search, and allow the most relevant results to appear at the top of search results. This is why a taxonomy for book chunks is critical – how else are you going to contextualize search and point users to the most relevant portion of a book?
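To make the idea of metadata “adding weight” to a keyword search concrete, here is a minimal Python sketch of one way a chunk-level taxonomy could influence ranking. It assumes each chunk carries a set of category labels; the `Chunk` type, the `score` and `rank` functions, the `category_boost` weight, and the BISAC-style category labels are all hypothetical, and real search engines use far more sophisticated relevance models.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A hypothetical book chunk: its text plus the taxonomy categories assigned to it."""
    chunk_id: str
    text: str
    categories: set[str] = field(default_factory=set)


def score(chunk: Chunk, query: str, category_boost: float = 2.0) -> float:
    """Count keyword matches in the text, then add extra weight for each
    query term that also appears in one of the chunk's category labels."""
    terms = query.lower().split()
    words = chunk.text.lower().split()
    keyword_hits = sum(words.count(t) for t in terms)
    category_hits = sum(
        1 for t in terms if any(t in c.lower() for c in chunk.categories)
    )
    return keyword_hits + category_boost * category_hits


def rank(chunks: list[Chunk], query: str) -> list[str]:
    """Return chunk IDs ordered from most to least relevant."""
    ordered = sorted(chunks, key=lambda c: score(c, query), reverse=True)
    return [c.chunk_id for c in ordered]


# Hypothetical chunks with hypothetical BISAC-style category labels.
chunks = [
    Chunk("ch9-sec1", "Bread prices and public unrest in eighteenth-century Paris",
          {"History / Europe / France"}),
    Chunk("ch3-sec2", "Knead the bread dough for ten minutes",
          {"Cooking / Baking / Bread"}),
]

print(rank(chunks, "bread baking"))
```

In this toy example both chunks mention “bread” exactly once, so keyword matching alone can’t tell them apart; the category tags are what push the baking chunk to the top of the results.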

Browsing more or less speaks for itself – but it’s pretty clear that the virtual bookshelf based on the BISAC categories is going to expand (if not explode) into smaller and more targeted pigeonholes for information. And a well-thought-out, organized taxonomy for these pigeonholes will be utterly essential.

So knowing that there are taxonomies out there, and knowing that the book industry will need one for chunk-level content, what happens now?
Essentially, the book industry will probably have to go down the list of BISAC codes, examine taxonomies from other industries, possibly look at the IPTC codes as a model, and come up with the necessary schema for selling pieces of books. Taxonomies are most effective when an entire industry buys into them and uses them – they can then be used in trade and distribution of material, and act as a de facto standard (and in some cases are actual standards).

Obviously this is a job for the Book Industry Study Group, which created the BISAC codes, maintains the ONIX standard for the US, and supervises many other book industry standards. The current Subject Codes committee, which maintains the BISAC codes, probably needs a task force to begin the work of collating and sifting through requirements for chunk-level categories.

But this needs to be done sooner rather than later. Retro-coding, in this instance, will be a hideous job, because chunk-level content will proliferate on a large scale. The sooner we get a taxonomy in place, the better prepared we will be to transact on portions of books as well as on entire volumes.
