My passion for metadata isn’t a big secret – since my days at Muze and B&N.com, I’ve witnessed firsthand how good metadata helps people find the books they are looking for, and how bad metadata prevents people from finding what they want.
Why is this relevant now?
Well, CES showed us that there is great interest in ebook readers – 23 of them debuted there, and an entire “Ebook Zone” was created. Apple is negotiating with publishers to sell content (books, magazines, newspapers) on its soon-to-appear tablet. With all these digitized books, search becomes more crucial than ever – web search is the ONLY way people are going to purchase these digital products.
Discovery/review services like NetGalley – as well as all the ecommerce sites – are heavily reliant on metadata not just for listing titles, but also for search algorithms themselves. (You’d think that would go without saying, but it doesn’t.)
Whether it’s “semantic” search or a more traditional browsing hierarchy, search technologies rest on metadata. Tags, definitions, clarifications (“when we say ‘porcelain’ we mean fine china, not toilets”) are all necessary to guide users to the information they want.
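To make the “porcelain” point concrete, here’s a toy sketch of how a clarification layer might sit between a query and a tagged catalog. The titles, tag names, and vocabulary structure are all invented for illustration – no real search engine works exactly this way:

```python
# Toy catalog: each item carries subject tags (the metadata).
CATALOG = [
    {"title": "Collecting Fine China", "tags": ["porcelain", "fine china", "antiques"]},
    {"title": "Bathroom Plumbing Basics", "tags": ["toilets", "fixtures", "plumbing"]},
]

# Clarification layer: when we say "porcelain" we mean fine china, not toilets.
VOCABULARY = {"porcelain": {"include": {"fine china"}, "exclude": {"toilets"}}}

def search(query):
    """Return titles matching the query, steered by the controlled vocabulary."""
    rule = VOCABULARY.get(query, {"include": {query}, "exclude": set()})
    results = []
    for item in CATALOG:
        tags = set(item["tags"])
        if (query in tags or tags & rule["include"]) and not (tags & rule["exclude"]):
            results.append(item["title"])
    return results

print(search("porcelain"))  # -> ['Collecting Fine China']
```

Without the vocabulary entry, a naive tag match on “porcelain” could just as easily surface the plumbing book – which is exactly the guidance work that human-curated metadata does.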
This metadata may not come in the form of the traditional ONIX feed. If a book file is marked up in XML (whether via InDesign or anything else), the title, author, BISAC and LC subject codes, price, publisher, and copyright date can all be easily derived from that book file – because those data points are defined in the file (usually in the front matter) with tags.
But just as with ONIX, what’s inside those tags has to be correct. This has a better shot at happening if the search engine is pulling from the book itself (the author name, for example, is not likely to be misspelled in the actual book).
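A rough sketch of what “pulling from the book itself” could look like, assuming a book file whose front matter is tagged in XML. The tag names and the sample record here are hypothetical – any real workflow would follow whatever schema the publisher’s tools (InDesign or otherwise) actually emit:

```python
import xml.etree.ElementTree as ET

# Hypothetical front-matter markup; real schemas will differ.
BOOK_XML = """
<book>
  <frontmatter>
    <title>An Example Title</title>
    <author>Jane Example</author>
    <subject scheme="BISAC">FIC000000</subject>
    <price currency="USD">14.99</price>
    <publisher>Example House</publisher>
    <copyright>2010</copyright>
  </frontmatter>
</book>
"""

def extract_metadata(xml_text):
    """Derive basic metadata fields directly from a tagged book file."""
    front = ET.fromstring(xml_text).find("frontmatter")
    return {
        "title": front.findtext("title"),
        "author": front.findtext("author"),
        "bisac": front.find("subject[@scheme='BISAC']").text,
        "price": front.findtext("price"),
        "publisher": front.findtext("publisher"),
        "copyright": front.findtext("copyright"),
    }

meta = extract_metadata(BOOK_XML)
print(meta["author"])  # -> Jane Example
```

The point of the sketch: once those data points are defined with tags in the file, a search engine or retailer can read them straight from the source – but garbage in the tags still means garbage out, which is why the accuracy argument below matters.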
In recently released recommendations to the publishing industry, BIC stated: "Publishers must retain responsibility, wherever possible and appropriate, for the metadata of the products they publish, in all formats, print and digital." Another company, Giant Chair, has built its entire business around hosting a metadata platform for publishers: “When equipped with the appropriate tools, publishers are naturally the most qualified and motivated source for metadata creation and enrichment.”
Which makes sense!
Except in the real world it doesn’t quite play out that way. In my career, I’ve seen lots of publisher-generated metadata. There’s a reason why NetRead, Eloquence, and other data-scrubbing services exist. There’s a reason why Ingram, Bowker, and Baker & Taylor have departments of data editors who normalize and standardize that data. There’s a reason why librarians spend countless hours re-cataloguing titles for WorldCat. There’s a reason why BISG launched its Product Data Certification Program.
And that reason is: while publishers make the books, they continue not to pay sufficient attention to the accuracy of their data. While publishers are the definitive source of who the author is, what the list price is, what the book is about…they are not recording a lot of that information accurately. Because if they were, Fran Toolan and Greg Aden would have to find new things to do. Richard Stark would suddenly find himself with weeks and weeks of free time. Thousands of library cataloguers would be out of work. Ingram, Bowker, and B&T databases would be redundant. PDCP would not be necessary.
But good metadata IS publishers’ responsibility, fundamentally. They can outsource that responsibility, but ultimately it does all come back to the publishers. As our digital landscape explodes – as web search becomes not just one way but THE way readers find what’s next on their reading lists – metadata only becomes more important. If your sales are dipping, it’s entirely possible that readers can’t find your books. Take a look at your data. The solution is probably there.