Document and file retrieval metadata

28 08 2009

Far too much focus is placed today on providing complex retrieval fields within ECM solutions, and customers make far too much of them. For sure, inherited values and properties can be of great use, but when you start to look at your actual requirements, retrieval fields are far too often simply made too complex.

Points to remember

When designing your retrieval fields, metadata or indexes (whatever you wish to call them), keep in mind just what a user will want / need to do to actually locate this file / document. Here is a quick list to help you:

  1. How much information will the user have on a file?
  2. How much time do you want to allow them to spend entering search information?
  3. How can your metadata fields actually assist in this?
  4. What sort of results will be brought back, and how clear will these be to the user (clear as in how quickly they can spot the file they want)?

Many recent systems spend a lot of effort on identifying files very accurately; however, in doing so they make the data capture stage (scanning and indexing) very complex and also require the user to spend longer setting up a search.

Keep it simple

When designing / identifying metadata fields for files, always try to make and keep things as simple as possible.

First things first, identify the types of files you are storing. This doesn’t mean PDF, Word, TIFF etc.; rather, it relates to their type within your business. Some examples may include personnel files, expense claim forms, insurance claim forms, phone bills, customer details and so on (dependent on your business).

Once you have made this identification, move on to retention: how long will a particular file type stay “live”, then sit in an “archive”, before being completely deleted? When doing this you may find that some logical separation of files appears. NB: only create a new classification of file type if it is needed. Don’t do it just for some logical separation; classifications should only be created to separate groups of metadata or to address issues such as migration and retention periods.
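To make the live / archive / delete idea concrete, here is a minimal sketch of classification-driven retention in Python. The classification names and the retention periods are purely illustrative assumptions, not recommendations from any particular retention policy.

    from dataclasses import dataclass
    from datetime import date, timedelta

    @dataclass
    class Classification:
        # A file classification and its retention rules (illustrative values only)
        name: str
        live_period: timedelta      # how long a file of this type stays "live"
        archive_period: timedelta   # how long it then sits in the archive before deletion

    # Hypothetical classifications -- the real periods come from your retention policy
    CLASSIFICATIONS = {
        "expense_claim": Classification("expense_claim", timedelta(days=365), timedelta(days=6 * 365)),
        "personnel_file": Classification("personnel_file", timedelta(days=2 * 365), timedelta(days=10 * 365)),
    }

    def lifecycle_stage(c: Classification, captured_on: date, today: date) -> str:
        # Return "live", "archive" or "delete" for a file captured on a given date
        age = today - captured_on
        if age <= c.live_period:
            return "live"
        if age <= c.live_period + c.archive_period:
            return "archive"
        return "delete"

    print(lifecycle_stage(CLASSIFICATIONS["expense_claim"], date(2005, 1, 1), date(2009, 8, 28)))  # "archive"

Only when two file types genuinely need different rules like these (or different metadata) is a new classification worth creating.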

The tricky part now is to identify the metadata fields associated with your types of files. I would always suggest you keep these as simple as possible and try not to use more than 7 fields to identify a file. This is where designers often get carried away using inherited fields from different objects within the repository. That is all well and good and can really help in displaying search results back to users (or a hierarchy of files back to a user). However, what I try to do is the following (a small sketch follows the list):

  1. Imagine you don’t know whether there are other files out there in the system (nothing to inherit from)
  2. Identify at least one key field (policy number, customer number, telephone number etc.)
  3. Provide a list of options for the type of file it is (birth certificate, driving licence, claim form, phone contract, interview, recorded conversation etc.)
  4. Only provide other fields that help logically distinguish this file from other files of the same type, or that help identify, for example, a customer entity within your business
  5. Provide as many “drop down list” options as possible. This ensures data is accurate and not reliant on spelling or interpretation
  6. Identify any metadata that may be “shared” with other file types. For example, a Policy Number may be found on multiple types of files within multiple classifications of files. In addition, a Policy Number is unique within the business, so it can be used to tie together a number of files to a particular policy holder.
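To make these principles concrete, here is a minimal sketch of what such a metadata definition might look like for a hypothetical “claim” classification. The field names, the controlled document-type list and the example values are all illustrative assumptions, not taken from any real system.

    from dataclasses import dataclass

    # Controlled vocabulary (a "drop down list") -- illustrative values only
    DOCUMENT_TYPES = {"birth_certificate", "driving_licence", "claim_form", "phone_contract"}

    @dataclass
    class ClaimFileMetadata:
        # Retrieval metadata for a hypothetical "claim" classification: one key field,
        # one controlled document-type field and a couple of supporting fields
        policy_number: str          # key field, shared across classifications
        document_type: str          # must come from DOCUMENT_TYPES, never free text
        customer_number: str = ""   # helps identify the customer entity
        received_date: str = ""     # e.g. "2009-08-28"

        def __post_init__(self):
            if self.document_type not in DOCUMENT_TYPES:
                raise ValueError(f"unknown document type: {self.document_type}")

    doc = ClaimFileMetadata(policy_number="POL-12345", document_type="claim_form",
                            customer_number="CUST-987", received_date="2009-08-28")

Notice there are only four fields: one key field, one controlled list and two supporting fields, well inside the limit of 7.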

If you stick to these six principles, you will find that 9 times out of 10 you will have no call for complex inheritance or complex storage concepts. You will more than likely also have identified your classifications in full. Please note that your file types, along with classifications, will also 9 times out of 10 provide you with enough criteria to accurately assign security information to these files.

Once you have identified how information is to be retrieved, think about what information could be automatically captured at the data capture side of things. This sometimes illustrates fields that could be used to help identify files at retrieval; it also sometimes identifies fields that really aren’t appropriate.

Showing results

Often your retrieval system will display search results in a format that isn’t always well suited to you or your business needs. This is why so many “professional services” are offered to customers of such systems. As a customer, linking objects together, even showing them in a “tree view” fashion, can help the end user. However, this isn’t a call for inherited properties; rather it is a call to display business-related information logically.

Also remember different types of searches can require different ways of displaying search results. This is sometimes overlooked by designers and system providers to the detriment of the user experience.

Finally, always think past the retrieval process. Once a user has found the file they want, they will need to interact with it in some way; this could be simply viewing its content or passing it on to another user.

Conclusion

I am a firm believer in keeping things as simple as possible and often adopt that IT term, the “80-20” rule. Far too often IT tries to deliver too much, and in doing so over-complicates areas of the system or, more worryingly, the business. When this happens, more often than not the project is seen as a failure, when really, by delivering less, the customer gets more.

When putting together metadata for the retrieval of files, remember to keep things as simple as possible. Identify key fields and do not get carried away capturing too much retrieval data. Also, always keep your end users in mind: the end user at the scanning and indexing stage and the end users searching for files. Sticking to these simple rules will ensure you deliver a file retrieval system that works efficiently and quickly for your end users and your business…





True ECM Savings…#5

20 07 2009

This is my penultimate entry in this series of posts, and in this post I will be looking at Content Security in terms of not only access, but what happens to content in the case of flooding or fire (something that is often overlooked).

Flooding and Fires…

Not the nicest of titles, but it’s something every organisation must think about, “What happens to our content if the whole building goes up in smoke, or we are flooded out?” This is a question that is more often overlooked than you may think. I have visited many “large” organisations that really haven’t taken such disasters into consideration.

When storing files (especially paper) you will be amazed how often fires crop up; a quick search online will find examples of fires destroying organisations’ documents and content, including governmental records. A great example of disasters destroying content can be found in the after-effects of Hurricane Katrina. Warehouses full of content and documents relating to criminal prosecutions were lost, leading to hundreds of criminals being released, simply because the content couldn’t be retrieved electronically.

Now think of the actual cost to your organisation if you lost all of that content and those documents. Not only may you be looking at compliance issues, but massive costs will no doubt be incurred, not to mention potential loss of business.

If all your content and documents are stored electronically, within a good ECM platform, these issues simply aren’t there. Sure, in some cases you may still want to store the physical paper, but this can be done off site at dedicated (outsourced) centres. You still have access to all that content, even if the physical paper is destroyed. You can also distribute your backups of content easily; having backups at multiple sites ensures that content is never lost.

And to think, I haven’t even mentioned theft of content…

Our content is secure without ECM?

Well, no it isn’t. Paper is the most insecure form of storing content: if someone can get physical access to the location of a file, they can read it, or worse, photocopy it and redistribute it as they choose. It really isn’t hard to open a filing cabinet, pull out a file and start reading. Content security is more than just ensuring the office is locked at night, or having locks on the HR filing cabinet.

It’s imperative that content is secured, in many cases for compliance, but in general you cannot have employees looking at information they should have no access to. Think of the issues that may arise: loss of business to competitors, stolen ideas, staff suffering identity theft; I could go on.

Though many of us have watched Hollywood films with computer hackers gaining access to lots of sensitive information, and many of us have read about online hackers gaining access to our personal details, the reality is that electronic content is far more secure than paper.

With a good ECM solution, your content is secured in a number of ways, allowing you to grant different levels of access to content depending on individuals or their roles. You can also track just which files have been looked at, and any interactions a particular user may have had with that content. No one can tamper with or replace a file, leaving you with false documentation. Such ECM solutions ensure content security is controlled by the organisation itself, and not left open to any form of abuse.
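As a rough illustration of the idea of role-based access combined with an audit trail, here is a minimal sketch in Python. The roles, classifications and permissions are entirely hypothetical; a real ECM platform implements this inside the repository itself rather than in application code.

    from datetime import datetime

    # Hypothetical role-based permissions: which roles may view or edit a classification
    PERMISSIONS = {
        "hr_file": {"view": {"hr_officer", "hr_manager"}, "edit": {"hr_manager"}},
        "claim_file": {"view": {"claims_handler", "claims_manager"}, "edit": {"claims_manager"}},
    }

    AUDIT_LOG = []  # a real ECM platform would keep this in a tamper-evident store

    def access(user: str, role: str, classification: str, action: str) -> bool:
        # Check whether the role may perform the action, and record the attempt either way
        allowed = role in PERMISSIONS.get(classification, {}).get(action, set())
        AUDIT_LOG.append({
            "when": datetime.utcnow().isoformat(),
            "user": user,
            "classification": classification,
            "action": action,
            "allowed": allowed,
        })
        return allowed

    print(access("jsmith", "claims_handler", "hr_file", "view"))  # False, but the attempt is logged

The point is that both the decision and the attempt are recorded, something a filing cabinet can never do.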

Conclusion

ECM speeds up the way in which people work by providing them with access to content when they need it. This remains true in cases of natural disaster and theft. ECM also protects your organisation’s content and access to that content, reducing many content-related risks.

This post highlights the “potential” savings of ECM in times of crisis. With this in mind, think of ECM as content insurance. You hope that none of these situations arises for your organisation, but if they do, you will be safeguarded and rewarded for your prudence, saving a lot of time and money.





Deploying ECM across the enterprise

23 06 2009

I was keeping the world up to date with my day on Twitter when I read quite an interesting article about an organisation looking to invest in ECM and deploy it in one lump across the whole of their enterprise. The article was looking at the “main” players in ECM: Oracle, IBM and EMC Documentum. It really highlighted the problems these companies had in pitching to the client, demonstrating their product and trying to show how it would work across the complete enterprise…

Having worked with all of these companies in some form in the past, I remembered just how great their platforms are, but also how heavily entrenched they are with marketing and hype. Putting together a demonstration was never a 5 minute job!

Deploying across the enterprise

This is a lovely idea, but in practice unbelievably hard to achieve (don’t listen to sales banter). I really don’t see how it can work well.

Let’s look at some of the basic challenges of deploying a single ECM solution across the complete enterprise in one go:

  1. Scale – if your enterprise sprawls across the UK, or even Europe and the world, think of the challenges of implementing the system, allowing access and dealing with distributed performance… None of these are show stoppers, but remember that a lot of logistical work will be needed
  2. Training – OK, how are you going to train hundreds, maybe thousands, of users for a go-live date?
  3. Individual requirements – different parts of your business will have different requirements / needs from their ECM platform and BPM
  4. Administration – again, logistical challenges
  5. Support – you need to have vast support services in place

None of these points will stop a project; however, each one requires a lot of thought, a lot of processes to be put in place and, more importantly, a lot of people with the drive to ensure everything runs smoothly. However, point 3 (individual requirements) is potentially a show stopper, and it is this point that the large ECM players try to address with management, configuration, integration and mapping tools. These all demonstrate well (when the sales agent gets it right) but actually require a lot of “professional services” to get them working to meet your actual requirements.

Delivering for everyone

It looks great in a demonstration: the presenter simply clicks on a wizard, answers some questions, fills in some data fields and hey presto, your system is integrated and reading in data from a third party. Wow. Likewise, the presenter clicks on a nice process map, drags some icons onto the screen, joins them up, again adds some data fields and hey presto, you have a workflow…

Now, this does look great in a demonstration. And in simple cases it will work for you. However, across the complete enterprise? Will it be flexible enough to meet everyone’s requirements? Are the simple points of integration shown so well in a demonstration going to work like that for your organisation? I am guessing a strong NO is the answer here.

These sorts of tools are great for demonstrations, even great for very simple integrations and maps; however, the price you pay for such tools far outweighs their actual benefit to your organisation, unless of course you leverage some “professional services” to ensure the system meets your business’s complexities.

The investment…

So, to achieve a massive roll-out of ECM across your enterprise, you are looking at a massive investment of both time and money, and then no doubt you will need to address individual units’ requirements… All of these factors make it harder for your solution to succeed and deliver that promised ROI.

How would I go about things?

Well, first things first. My ECM platform / purchase would not include fancy integrator and mapping modules. For me these add vast costs to the initial purchase and licensing, on top of which you have to purchase additional “professional services”. In my experience, it works out cheaper to simply pay for “professional services” to develop the integration (for example) with your other systems from scratch. Essentially, this is more often than not what happens under the term “professional configuration services” or something similar; except that you have also paid for that integration module licence in any case…

This is one of the reasons why I stopped working with the big ECM players and decided to invest time and resources into our own ECM platform, workFile (www.workFileECM.com). We have not wasted time or money on complex integration tools and modules that look great in demonstrations but fail to deliver real business benefit. Rather, we develop the integration an organisation requires specifically for them, meeting 100% of their requirements, based around our open XML Web Services API (something you should insist your ECM provider offers). The same applies to our business process maps: workFile uses Visual Studio as the development platform for designing process maps. Why? Well, a developer has so much freedom here; they can code complex business rules, algorithms and calculations, integrate with numerous other systems and make the workflow work seamlessly for the end user, basically ensuring the process map is the power behind the actual solution.
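As a rough illustration of what coding an integration against an open XML web services API looks like (as opposed to buying an integration module), here is a minimal sketch. The endpoint URL, request shape and field names are all hypothetical; this is not the actual workFile API, just the general pattern.

    import urllib.request
    import xml.etree.ElementTree as ET

    # Hypothetical endpoint and message shape -- NOT the actual workFile API,
    # just an illustration of calling an open XML web service from scratch
    ENDPOINT = "https://ecm.example.com/api/StoreDocument"

    def store_document(policy_number: str, document_type: str, content_b64: str) -> str:
        # Build a simple XML request, post it to the repository and return the raw reply
        root = ET.Element("StoreDocumentRequest")
        ET.SubElement(root, "PolicyNumber").text = policy_number
        ET.SubElement(root, "DocumentType").text = document_type
        ET.SubElement(root, "Content").text = content_b64
        body = ET.tostring(root, encoding="utf-8")

        request = urllib.request.Request(ENDPOINT, data=body,
                                         headers={"Content-Type": "text/xml"})
        with urllib.request.urlopen(request) as response:
            return response.read().decode("utf-8")

A bespoke integration like this is written once, against your systems and your data, rather than configured through a generic module you have already licensed.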

Secondly, look to deploy your ECM unit by unit, or department by department. Each department will have different requirements, and each department’s requirements need to be looked at seriously. An enterprise-wide ECM solution will only work if each department takes it on board and uses it correctly. This will only happen if the system meets their requirements and is championed by the staff.

By implementing ECM unit by unit, you ensure that requirements are not lost, you ease the load of training, administration and support, while easing new processes onto the organisation a step at a time.

 

Conclusion…

By working on a unit-by-unit basis you not only identify all the requirements needed across the enterprise, but you also ease your implementation headache and keep costs down (often removing the need for fancy enterprise integration modules). For sure, your ECM provider will try to push you towards a “big bang” implementation across the complete enterprise, and no doubt show you some wonderful tools that make it all seem so easy. But there is nothing stopping you striking that enterprise-wide deal, then addressing the implementation on a unit-by-unit basis, re-negotiating cost if need be as you go.

Remember, fancy demonstration tools may look great and promise the earth, but they almost always won’t meet 100% of your requirements. So you need to know what you are looking at in terms of “professional services”, again on a unit-by-unit basis…





Successful document capture…

14 05 2009

Well, this is something close to my heart. My first ever project after leaving university was to help write a document capture application built on top of the FileNET Panagon Capture platform. Ahh, happy days… Though I did seem to earn the name “scan man” from then on, which wasn’t so great, as I then had to be involved with every document capture project our company had…

OK, so how do you implement a successful document scanning / capture solution? Well, it’s very simple: follow these 5 guidelines and you are well on the way.

  1. Throughput is everything. Make sure people can load the scanner and let it do its thing. You don’t want to be stopping to separate documents or batches, so make sure your software can do this and purchase a scanner with a big document feeder.
  2. Ensure you maximise the quality of the images you are capturing. If this could be a problem, make sure you put good quality control and re-scan technology in place
  3. Identify as much information as possible up front with your software. The more a user has to do, the slower and more expensive the process will become
  4. Ensure the data captured or assigned to a document is accurate. Remember, your retrieval of these images depends on the accuracy of your data capture
  5. Your document capture is pointless unless you release the images into your storage repository with all the correct information. Again, make sure this is done seamlessly and accurately (a small sketch of such a release step follows this list). The longer files sit in your capture process, the longer it will take for them to turn up in, for example, a customer file…
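As promised in guideline 5, here is a minimal sketch of a release step that refuses incomplete metadata before anything reaches the repository. The field names and the Repository stand-in are illustrative assumptions, not any particular capture product’s release API.

    class Repository:
        # Stand-in for the real storage repository's release interface
        def store(self, image_path: str, metadata: dict) -> None:
            print(f"stored {image_path} with {metadata}")

    def release_batch(batch: list, repository: Repository) -> None:
        # Release indexed documents into the repository, refusing anything whose
        # metadata is incomplete so inaccurate data never reaches the retrieval system
        required = ("policy_number", "document_type")
        for doc in batch:
            missing = [f for f in required if not doc["metadata"].get(f)]
            if missing:
                raise ValueError(f"{doc['image']}: missing {missing}, return for re-indexing")
            repository.store(doc["image"], doc["metadata"])

    release_batch([{"image": "batch01/0001.tif",
                    "metadata": {"policy_number": "POL-12345", "document_type": "claim_form"}}],
                  Repository())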

 

So where to start?

Start with your document capture software; there are lots of solutions out there. When choosing your capture software, keep those 5 guidelines in mind. You want to automate as much as possible (unless we are talking about only the odd scanned document through the day). In addition, you don’t just want to watch a sales pitch on the actual scanning process, or the physical scanner being used. You want, and need, to see the process all the way through, and with a variety of documents.

It’s best if you can use forms wherever possible, but you will always have unstructured documents coming to you, such as letters. You MUST see a demonstration of how these are dealt with, then ask yourself:

“is that efficient?”

“how could that be speeded up?”

“am I happy with the way data is entered / captured?”

“now let’s find the document in the retrieval system”

I don’t want to start recommending software, as depending on your storage repository you may find you have a limited selection. What I will say is that for our workFile ECM repository we use software that I have been familiar with, and more than happy with, for some time: Kofax. I have worked on numerous projects with Kofax Ascent Capture and with Neurascript recognition modules (which are now part of Kofax). Kofax provides you with all the technology and features you could want to streamline any capture environment. And, more importantly, they allow you to write your own release processes into the repositories of your choice.

What about architecture?

Scanning can be quite intensive for your PC. A while back, all of your “steps”, if you like, were carried out on a single machine: you scanned, had the batches and documents recognised, processed and enhanced, then sent them on for an agent to index. However, this isn’t great; ideally you want to split out this intense processing work and let your scan station simply scan images.

Server-based solutions are best, freeing up staff to scan and pull documents as and when they are ready. Your images should always be ready faster than your staff can quality-assess them or carry out indexing tasks. Oh, and don’t be fooled by “thin” document capture: something has to drive the scanner, and therefore it’s not a “thin client”…
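To illustrate the split between a scan station and a processing server, here is a minimal sketch using an in-process queue. A real solution would hand images to a separate server over shared storage or a network service rather than a Python queue, so treat the names and structure as illustrative only.

    import queue
    import threading

    scan_queue: "queue.Queue[str]" = queue.Queue()

    def scan_station(pages):
        # The scan station only scans: it drops each image on the queue and moves on
        for page in pages:
            scan_queue.put(page)

    def processing_server():
        # A separate server thread/process does the heavy work (recognition,
        # enhancement, batch assembly) so the scanner never has to wait
        while True:
            page = scan_queue.get()
            if page is None:  # sentinel used to stop this sketch
                break
            print(f"recognise + enhance {page}")

    worker = threading.Thread(target=processing_server)
    worker.start()
    scan_station(["batch01/0001.tif", "batch01/0002.tif"])
    scan_queue.put(None)
    worker.join()

The scan station never waits for recognition or enhancement to finish; it just keeps feeding pages.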

What about staff?

This can be a boring task, so rotate your staff to different jobs every couple of hours. They may still get bored, but if you don’t do this they will be making lots of errors and getting really bored. Trust me: just spend a couple of hours doing one task such as scanning and your brain can go numb…

You will also need a “champion” of the capture process: someone who can keep people motivated and ensure they maximise the potential of the system. All too often the system’s capacity is not met because staff become lazy or complacent. This diminishes your return on investment, so a champion is very important.

It’s also worth noting that from time to time, you will need someone with more experience of the scanning process, again that champion, simply because you will get issues with stuck paper, batches not getting recognised, image quality problems etc. At this point, you need someone with a little more knowledge of how things work.

 

Finally

Remember, no matter how good your capture process is, your retrieval system is only as good as the quality of the images and the data associated with those images. Also, please don’t invest heavily in a great capture system and then scrimp on your retrieval system. If you do, you will see no benefit from the capture process and document imaging at all. Your first port of call is still ensuring you purchase the right retrieval / document management system. Then address the capture side of things.