Successful document capture…

14 05 2009

Well, this is something close to my heart. My first ever project after leaving university was to help write a document capture application built on top of the FileNET Panagon Capture platform. Ahh, happy days… Though I did earn the name “scan man” from then on, which wasn’t so great, as I then had to be involved in every document capture project our company took on…

OK, so how do you implement a successful document scanning / capture solution? Well, it’s very simple: follow these 5 guidelines and you are well on the way.

  1. Throughput is everything. Make sure people can load the scanner and let it do its thing. You don’t want to be stopping to separate documents or batches by hand. Make sure your software can do this separation for you, and purchase a scanner with a big document holder.
  2. Ensure you maximise the quality of the images you are capturing. If quality could be a problem, make sure you put good quality control and re-scan technology in place.
  3. Identify as much information as possible up-front with your software. The more a user has to do, the slower and more expensive the process will become.
  4. Ensure the data captured or assigned to a document is accurate. Remember, your retrieval of these images depends on the accuracy of your data capture.
  5. Your document capture is pointless unless you release the images into your storage repository with all the correct information. Again, make sure this is done seamlessly and accurately. The longer the files are in your capture process, the longer it will take for them to turn up in a customer file, for example…
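
To make guidelines 4 and 5 a little more concrete, here is a minimal sketch of a check that could run just before release. It is only an illustration: the field names, the ScannedDocument class and both functions are made up for this example and are not part of any capture product. The idea is simply that documents with missing index data are held back for an indexer rather than released into the repository.

```python
from dataclasses import dataclass, field

# Hypothetical required index fields. A real project would take these
# from the document class defined in its own repository.
REQUIRED_FIELDS = ("customer_id", "document_type", "received_date")


@dataclass
class ScannedDocument:
    """Illustrative stand-in for one captured document and its index data."""
    image_path: str
    index: dict = field(default_factory=dict)


def missing_fields(doc):
    """Return the required index fields that are absent or blank."""
    return [
        name for name in REQUIRED_FIELDS
        if not str(doc.index.get(name, "")).strip()
    ]


def split_for_release(batch):
    """Separate documents that can be released from those needing attention."""
    ready, held = [], []
    for doc in batch:
        (held if missing_fields(doc) else ready).append(doc)
    return ready, held
```

Anything that lands in the held list goes back to a person, and everything else can flow straight through, which keeps the time spent in the capture process short.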


So where to start?

Well, you start with your document capture software, and there are lots of solutions out there. Firstly, when choosing your capture software, keep those 5 guidelines in mind. You want to automate as much as possible (unless we are talking about only the odd scanned document through the day). In addition, you don’t just want to watch a sales pitch on the actual scanning process, or the physical scanner being used. You want, and need, to see the process all the way through, and with a variety of documents.

It’s best if you can use forms wherever possible, but you will always have unstructured documents coming to you, such as letters. You MUST see a demonstration of how these are dealt with, then ask yourself:

“is that efficient?”

“how could that be sped up?”

“am I happy with the way data is entered / captured?”

“now let’s find the document in the retrieval system”

I don’t want to start recommending software, as depending on your storage repository etc. you may find you have a limited selection. What I will say is that for our workFile ECM repository we use software that I have been familiar with, and more than happy with, for some time: Kofax. I have worked on numerous projects with Kofax Ascent Capture and with Neurascript recognition modules (which are now part of Kofax). Kofax provides you with all the technology and features you could want to streamline any capture environment. And, more importantly, it allows you to write your own release processes into the repositories of your choice.

What about architecture?

Scanning can be quite intensive for your PC. A while back, all of your “steps”, if you like, were carried out on a single machine: you scanned, had the batches and documents recognised, processed and enhanced, then sent them on for an agent to index. However, this isn’t great. Ideally you want to split out this intense processing work and let your scan station simply scan images.

Server-based solutions are best, freeing up staff to scan and pull documents as and when they are ready. Your images should always be ready quicker than your staff can quality check them or carry out indexing tasks. Oh, and don’t be fooled by “thin” document capture: something has to drive the scanner, and therefore it’s not “thin client”…
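
As a rough sketch of that split, assuming nothing about any particular product, the scan station can simply drop batches onto a queue and a separate worker can do the heavy lifting. Everything here (the function names, the batch dictionaries, the "ready_for_indexing" status) is hypothetical, and a thread stands in for what would really be a separate server.

```python
import queue
import threading


def scan_station(batches, work_queue):
    """Scan station: hands raw batches over and does no heavy processing."""
    for batch in batches:
        work_queue.put(batch)
    work_queue.put(None)  # sentinel meaning "no more batches"


def processing_server(work_queue, results):
    """Stand-in for the server doing recognition and image enhancement."""
    while True:
        batch = work_queue.get()
        if batch is None:
            break
        # Real recognition/enhancement would happen here.
        results.append({
            "batch": batch["name"],
            "pages": len(batch["pages"]),
            "status": "ready_for_indexing",
        })


def run_pipeline(batches):
    """Run the scan station and one processing worker side by side."""
    work_queue = queue.Queue()
    results = []
    worker = threading.Thread(
        target=processing_server, args=(work_queue, results)
    )
    worker.start()
    scan_station(batches, work_queue)
    worker.join()
    return results
```

The point of the design is that the scan station never waits on recognition. It keeps feeding paper while the processing side works through the queue at its own pace.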

What about staff?

This can be a boring task, so rotate your staff between different jobs every couple of hours. They may still get bored, but if you don’t do this, they will get really bored and make lots of errors. Trust me, just spend a couple of hours doing one task such as scanning and your brain can go numb…

You will also need a “champion” of the capture process: someone who can keep people motivated and ensure they maximise the potential of the system. All too often the system’s capacity is not met because staff become lazy or complacent. This diminishes the return on your investment, so a champion is very important.

It’s also worth noting that from time to time you will need someone with more experience of the scanning process (again, that champion), simply because you will get issues with stuck paper, batches not being recognised, image quality problems etc. At this point, you need someone with a little more knowledge of how things work.



Remember, no matter how good your capture process is, your retrieval system is only as good as the quality of the images and the data associated with those images. Also, please don’t invest heavily in a great capture system and then scrimp on your retrieval system. If you do this, you will see no benefit from the capture process or from document imaging at all. Your first port of call is still ensuring you purchase the right retrieval / document management system. Then address the capture side of things.



11 responses

21 05 2009
Andrew Smith @onedegree

worth reading about true ECM savings…this will be a series

27 05 2009
Andrew Smith @onedegree

Additional point, remember you can use a “cloud computing” solution and scan successfully to that. Though make sure your cloud solution is robust enough for the job, and that you can release securely into that solution.

Security here is the big big issue!

17 08 2009
The DocuMentor mobile edition

[…] some info on scanning documents and finding the right document capture programs. I came across a nifty blog post by scanning guru Andrew Smith about successful document capture. He offers five bits of wisdom on capturing your documents the […]

24 08 2009
Logo mats

Any advice on what type of scanner works best? I assume it needs automatic feeding of documents.

25 08 2009
Andrew Smith @onedegree

It really depends on your actual requirements. If you don’t have large volumes to scan then you can look at lower-end scanners. Some implementations really only need an automatic feeder and will only scan a couple of pages a minute.

More often than not, mid-range scanners offer the best versatility and are the best option for businesses. They provide value for money, good throughput and good image quality. I have used a number of these types of scanners, mainly the Kodak 3500 because it delivers everything you want and is really reliable. At the high end of the scale, Kodak 7500 scanners are also a good option; however, you really do need to be scanning a hell of a lot, around 7500 pages per day, to warrant such an investment.

13 09 2009
What is HIPAA

Successful document capture should also consider regulations such as Sarbanes-Oxley and HIPAA compliance.

14 09 2009
Andrew Smith @onedegree

Something to remember: if you carry out great practices for your document capture and retrieval, you will have covered such compliance requirements as Sarbanes-Oxley and HIPAA.

It’s a good point though, and I am sure something that many organisations fail to follow. Though these regulations do not apply to all countries…

11 12 2009
Document Capture

Great bullets…I scour the web looking for good tidbits. I will link to this post. Thanks.

11 12 2009
Andrew Smith @onedegree

Thanks…I do try…

11 12 2009
9 05 2012

Interesting article on document capture.
