Looking for Trouble -- Why?

How did this happen?

In a perfect world with infinite resources, we would read all of the books before buying them for the library's collection. With such close attention, we would be fully confident that all of our books had the highest quality artistic and intellectual content, were appropriate for our collection, and were constructed with paper and binding that would last the ages. In the real world, we need to work more efficiently to make the best use of scarce resources. While we do buy some books based on our personal experience with them, we also depend on reviews from our colleagues, recommendations from vendors, and the reputations of authors and publishers. Mostly this works out great, but there are hiccups from time to time.

The same kind of thing happens with cataloging. In a perfect world we would scrutinize each catalog record, ensuring that each one exactly matched the book in hand, and also complied with all cataloging rules and standards. We would do a full collection inventory every day. While we do some careful original cataloging, we also need to work more efficiently to make the best use of scarce resources. We batch load or subscribe to record sets from vendors, and copy catalog from utilities like OCLC. We convert records to new standards and migrate data to new systems. Mostly this works out great, but...

Technical debt

In software development, we are said to accrue technical debt when we make decisions (for expediency or efficiency, or under other pressures) that get things done in the moment but are likely to create more work later on. Some examples in the catalog might be:

Migrating data from one system to another without checking every record to make sure it arrived safely.
Acquiring new materials "shelf-ready" without confirming every call number against classification schedules.
Allowing materials to have multiple barcodes, or not barcoding older materials until needed for circulation.

None of these are necessarily bad choices -- we would not get nearly as much done without making choices like this! But they do have repercussions. A book with no barcode will be slower to check out. Call numbers with some types of errors may result in books being mis-shelved or even lost. Migrated data may have subtle corruption. Reports of catalog data somehow always manage to showcase all of these problems! Each problem stops an otherwise smooth library process right in its tracks until the problem is addressed.

Strategies

Our more efficient processes are not perfect, but that does not mean we should stop using them. Still, our catalogs would be tidier and our processes smoother if we proactively paid off some of that technical debt with targeted cleanup.

Recognizing problems

Even if patrons are not reporting problems to cataloging as they encounter them, you may run into those problems in the course of your regular work:

Did a search not retrieve a result you expected? Why not?
If your search results display facets, are any of the facet values "unknown", "undetermined", or "undefined"?
If you can see date ranges, is anything too far into the past? Or too far into the future?

You can make a habit of tracking down why a search didn't work, and clicking on those bad facets to fix those records and identify potential problems with other records in the system.
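As a sketch of the "too far into the past or future" check, the Python below flags a suspicious Date 1 in a MARC 21 bibliographic 008 field, where Date 1 occupies character positions 07-10. It assumes you can export 008 values as plain strings; the function name is illustrative, and the EARLIEST_YEAR cutoff of 1450 is a hypothetical choice to adjust for your own collection.

```python
from datetime import date
from typing import Optional

# Hypothetical earliest plausible publication year for this collection.
EARLIEST_YEAR = 1450


def check_date1(field_008: str, latest_year: Optional[int] = None) -> Optional[str]:
    """Describe a suspicious Date 1 in a MARC 21 bibliographic 008 field, or return None.

    Date 1 occupies character positions 07-10 of the 008 field.
    """
    if latest_year is None:
        latest_year = date.today().year
    date1 = field_008[7:11]
    if len(date1) < 4:
        return "008 too short to contain Date 1"
    if not date1.isdigit():
        # 'u' marks unknown digits (e.g. '19uu'); such dates may be legitimate,
        # but they are worth reviewing.
        return f"non-numeric Date 1: {date1!r}"
    year = int(date1)
    if year < EARLIEST_YEAR:
        return f"Date 1 {year} is earlier than {EARLIEST_YEAR}"
    if year > latest_year:
        return f"Date 1 {year} is later than {latest_year}"
    return None
```

Records whose type of date (008/06) legitimately leaves Date 1 blank or partly unknown will be flagged too; the point is to review them, not to change them automatically.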

Checking extremes

Some searches may retrieve more problematic results than others. While there is a lot of variety in catalog records, they are similar to each other in many ways and tend to bunch up into groups. Outliers are not necessarily wrong, but they are more likely to have issues. You can check:

What do your shortest records look like? Are they long enough to be useful?
What do your longest records look like? Is their display meaningful?
Which book comes first in your shelflist? Is that call number valid? How about the last book?
What is the oldest book in your catalog by publication date? Is it really that old?
What is the newest book in your catalog by publication date? Is that possible?
If you run a report and sort the rows (by any field), what bubbles up to the top? What sinks to the bottom?
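Most of these questions come down to sorting a report on one measure and looking at both ends. A minimal Python sketch, assuming report rows are already exported as dictionaries; the keys field_count and date1, the record IDs, and the values are all hypothetical:

```python
import heapq
from typing import Any, Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def extremes(rows: Iterable[T], key: Callable[[T], Any], n: int = 5) -> Tuple[List[T], List[T]]:
    """Return the n lowest and the n highest rows by key."""
    rows = list(rows)  # allow any iterable to be scanned twice
    return heapq.nsmallest(n, rows, key=key), heapq.nlargest(n, rows, key=key)


# Hypothetical report rows: record ID, number of MARC fields, and Date 1.
rows = [
    {"id": "b1001", "field_count": 3, "date1": "1999"},
    {"id": "b1002", "field_count": 24, "date1": "2015"},
    {"id": "b1003", "field_count": 18, "date1": "1066"},
    {"id": "b1004", "field_count": 212, "date1": "2088"},
]

shortest, longest = extremes(rows, key=lambda r: r["field_count"], n=1)
oldest, newest = extremes(rows, key=lambda r: r["date1"], n=1)
print(shortest[0]["id"], longest[0]["id"])  # b1001 b1004
print(oldest[0]["id"], newest[0]["id"])  # b1003 b1004
```

heapq.nsmallest and heapq.nlargest avoid a full sort, which hardly matters for small reports; sorting the whole list with sorted() and reading both ends works just as well.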

Identifying patterns

One strategy for tracking down problems is generalizing a single error to a pattern. If you recognize CB83 .S as an incomplete call number, how do you know that? What makes it wrong, and how could you search for other call numbers that are wrong in the same way? Every issue described on this site started with a single book and someone asking: are there any others like that?
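Generalizing the CB83 .S example might look like this minimal Python sketch. It assumes Library of Congress call numbers stored as plain strings and encodes a single rule: a cutter introduced by a period should have digits after its letter. The pattern and function name are illustrative, and this one rule will not catch every malformed call number (for example, a second cutter written without a period).

```python
import re

# A period, optional space, then a capital letter with no digit after it,
# e.g. the ".S" in "CB83 .S". Assumes LC-style call numbers.
INCOMPLETE_CUTTER = re.compile(r"\.\s*[A-Z](?![0-9])")


def has_incomplete_cutter(call_number: str) -> bool:
    """Return True if a period-initiated cutter is missing its number."""
    return INCOMPLETE_CUTTER.search(call_number) is not None


call_numbers = ["CB83 .S", "CB83 .S5", "QA76.73.P98 S72 2019", "PS3545.I345 Z5"]
print([cn for cn in call_numbers if has_incomplete_cutter(cn)])  # ['CB83 .S']
```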

Tools like analytics, sets, shelflists, and other reports can help you seek out and correct these problems in a systematic way. Setting up repeatable searches with tools like these and checking them regularly can help prevent new instances of those problems from entering the catalog, and even identify where they are coming from.
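A repeatable set of checks can be as simple as a table of named tests run over an export on a schedule. In this Python sketch, the record layout (id, call_number, barcodes) and the three checks are hypothetical examples drawn from problems mentioned earlier, not a feature of any particular library system.

```python
from typing import Any, Callable, Dict, List

# Hypothetical record shape: {"id": ..., "call_number": ..., "barcodes": [...]}.
Record = Dict[str, Any]

# Each check is a name plus a predicate that returns True for a problem record.
CHECKS: Dict[str, Callable[[Record], bool]] = {
    "no barcode": lambda r: len(r.get("barcodes") or []) == 0,
    "multiple barcodes": lambda r: len(r.get("barcodes") or []) > 1,
    "blank call number": lambda r: not str(r.get("call_number") or "").strip(),
}


def run_checks(records: List[Record]) -> Dict[str, List[str]]:
    """Run every check and return the IDs of the records each one flags."""
    results: Dict[str, List[str]] = {name: [] for name in CHECKS}
    for record in records:
        for name, is_problem in CHECKS.items():
            if is_problem(record):
                results[name].append(str(record["id"]))
    return results


records = [
    {"id": "b1", "call_number": "CB83 .S5", "barcodes": ["31000001"]},
    {"id": "b2", "call_number": "", "barcodes": []},
    {"id": "b3", "call_number": "QA76.73.P98", "barcodes": ["31000002", "31000003"]},
]
for name, ids in run_checks(records).items():
    print(f"{name}: {len(ids)} {ids}")
# no barcode: 1 ['b2']
# multiple barcodes: 1 ['b3']
# blank call number: 1 ['b2']
```

Keeping the counts from each run makes it easy to notice when a check that had dropped to zero starts finding records again.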

Identifying the source

Once you have a particular problem cleaned out of your catalog and a systematic way of identifying it, new instances of that problem are very easy to spot. If a problem has been introduced recently, it's much easier to determine:

Do records from particular sources need more attention than they are currently getting?
Is software misconfigured somewhere? (Does somebody have their Connexion set up to export MARC-8 into a system that expects Unicode?)
Are there hardware or resource issues? (Is somebody typing in barcodes because there are too few barcode scanners?)
Do staff need additional training or documentation on a particular process?
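For the MARC-8 question in particular, MARC 21 records declare their character coding scheme in Leader position 09: "a" for UCS/Unicode and blank for MARC-8. Below is a minimal Python sketch that walks a file of MARC 21 records in transmission format, using the record length in Leader/00-04 to find each record boundary. The function name is hypothetical, and a leader can of course be wrong about the data that follows it.

```python
from typing import Iterator, Tuple


def leader_encodings(data: bytes) -> Iterator[Tuple[int, str]]:
    """Yield (record index, coding scheme) for each MARC 21 record in data.

    Leader/00-04 holds the record length (including the record terminator);
    Leader/09 is 'a' for UCS/Unicode and blank for MARC-8.
    """
    pos = 0
    index = 0
    while pos < len(data):
        if not data[pos:].strip():
            break  # ignore trailing whitespace, such as a final newline
        length = int(data[pos:pos + 5])
        if length < 24:
            raise ValueError(f"record {index}: implausible length {length}")
        code = data[pos + 9:pos + 10]
        if code == b"a":
            scheme = "Unicode"
        elif code == b" ":
            scheme = "MARC-8"
        else:
            scheme = f"unexpected Leader/09 value {code!r}"
        yield index, scheme
        pos += length
        index += 1
```

Passing it the bytes of an exported file (for example, open(path, "rb").read()) and listing the records that are not Unicode can point to the export settings or record sources that need attention.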

System configuration and other tools

Many library systems do data validation in the cataloging/metadata interface, displaying warning or error messages when there are problems in records. In many systems, this behavior is configurable, and can be made to display different warnings to reflect local standards and priorities. If the system is too sensitive and displays too many warnings, catalogers may learn to ignore them, missing problems that affect access and discovery.

Many types of errors may also be preventable with additional tools and training for cataloging staff.

Consistent Data -> Stronger Assumptions

When catalog data is more consistent, you can make stronger assumptions and be more confident of your results in searching, reporting, and other library processes. For example:

If all bibliographic records have OCLC numbers, you can run reports to estimate how widely your resources are held by other libraries.
If all bibliographic records include a valid language fixed field code, you can be more confident that a search limited to a particular language includes all of your resources in that language.
If all call numbers are well-formed, you can be more confident that a book will be correctly shelved and retrieved, and that resources will be correctly sorted by call numbers in reports.
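Assumptions like these can be measured before you rely on them. A minimal Python sketch, assuming records exported as dictionaries holding the 008 field as a string and the 035 $a values as a list. OCLC numbers are recognized by the "(OCoLC)" prefix, the language code is read from 008 positions 35-37, and the code set is a small illustrative subset of the MARC Code List for Languages rather than the full list; the function name is hypothetical.

```python
from typing import Any, Dict, List

# A small illustrative subset of MARC language codes; a real check would load
# the full MARC Code List for Languages.
VALID_LANGUAGE_CODES = {"eng", "fre", "ger", "spa", "mul", "und", "zxx"}


def summarize(records: List[Dict[str, Any]]) -> Dict[str, float]:
    """Report what share of records meet two of the consistency assumptions above."""
    total = len(records)
    with_oclc = sum(
        1 for r in records
        if any(value.startswith("(OCoLC)") for value in r.get("035", []))
    )
    valid_lang = sum(
        1 for r in records
        if r.get("008", "")[35:38] in VALID_LANGUAGE_CODES  # 008/35-37: language
    )
    return {
        "has OCLC number": with_oclc / total if total else 0.0,
        "valid language code": valid_lang / total if total else 0.0,
    }
```

A share below 1.0 on either line is a measure of how far a report or a limited search built on that assumption can be trusted.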