Document and file retrieval metadata

28 08 2009

Far too much focus is made today on providing complex retrieval fields within ECM solutions, and far too much is made of them from customers. For sure, inherited values and properties can be of great use, but when you start to look at your actual requirements, far too often retrieval fields are simply made too complex.

Points to remember

When designing your retrieval fields, metadata or indexes (whatever you wish to call them), keep in mind just what a user will want / need to do to actually locate this file / document. Here is a quick list to help you:

  1. How much information will the user have on a file?
  2. How much time do you want to allow them to enter search information
  3. How can your metadata fields actually assist in this
  4. What sort of results will be brought back and how clear will these be to the user (clear as in how can they quickly see the file they want)

Many systems recently spend a lot of time on very accurately identifying files, however, by doing this they also make it very complex at the data capture stage (scanning and indexing) and also require the user to spend longer setting up their search.

Keep it simple

When designing / identifying metadata fields for files, always try to make and keep things as simple as possible.

First things first, identify the types of files you are storing. This doesn’t mean pdf, word, tiff etc. rather it relates to their type within your business. So some examples may include personnel files, expense claim forms, insurance claim form, phone bill, customer details etc. (dependent on your business).

Once you have made this identification, we get onto the point of retention. How long will a particular file type stay “live”, then move to an “archive” then be completely deleted. When doing this you may find that you logically have some separation of files appearing. NB only create a new classification of file type if it is needed. Don’t do it as some logical separation, rather classifications should only be created to separate either groups of metadata or address such issues as migration and retention periods.

The tricky part is to now identify the metadata fields associated with your types of files. I would always suggest you try to keep these as simple as possible and try not to use more than 7 fields to identify a file. This is where often designers get carried away using inherited fields from different objects within the repository. This is all well and good and can really help in displaying search results back to users (or a heirachyy of files back to a user). However what I try to do is the following:

  1. Imagine you don’t know if there are other files out there in the system (nothing to inherit from)
  2. Identify at least one key field (policy number, customer  number, telephone number etc)
  3. Provide a list of options to the type of file it is (Date of birth certificate, driving license, claim form, phone contract, interview, recorded conversation etc)
  4. Only provide other fields that help logically identify this file from other files of the same type, or they help identify, for example, a customer entity within your business
  5. Provide as many “drop down list” options as possible. This ensures data is accurate and not reliant on spelling or interpretation
  6. Identify any metadata that may be “shared” with other file types. For example a Policy Number may be found on multiple types of files within multiple classifications of files. In addition Policy Number is unique within the business so therefore it can be used to tie together a number of files to a particular policy holder.

If you stick to these 5 principles you will find that 9 times out of 10 you will not have any call for using complex inheritance or complex storage concepts. You more than likely have also identified your classifications in full. Please note that your file types along with classification will also 9 times out of 10 provide you with enough criteria to accurately assign security information to these files.

Once you have identified how information is to be retrieved, think about what information could be automatically captured at the data capture side of things. This sometimes illustrates fields that could be used to help identify files at retrieval; it also sometimes identifies fields that really aren’t appropriate.

Showing results

Often your retrieval system will display results of searches in a format which isn’t always that great to you or your business needs. This is why there are so many “professional services” offered to customers of such systems. As a customer, linking objects together, even showing them in a “tree view” type fashion can help the end user. However, this isn’t a call for inherited properties, rather a call to logically display business related information.

Also remember different types of searches can require different ways of displaying search results. This is sometimes overlooked by designers and system providers to the detriment of the user experience.

Finally, always think past the retrieval process. Once a user has found the file they want they will need to interact with it in some way, this could be to simply view its content or to pass on to another user etc.

Conclusion

I am a firm believer in keeping things as simple as possible and often adopt that IT term the “80 – 20” rule. Far too often IT tries to deliver too much, and in doing so it over complicates areas of the system or worryingly the business. When this happens more often than not a project can be seen as a failure, when really, by delivering less the customer gets more.

When putting together metadata for the retrieval of files remember to try and keep things as simple as possible. Identify key fields and not get carried away in capturing too much retrieval data. Also, always keep your end user in mind, so that’s the end user at the scanning and index stage and end users searching for files. Sticking to these simple rules will ensure you deliver a file retrieval system that works efficiently, quickly and well for your end users and your business…