Centralise Document Capture

11 12 2009

For quite some time I have been a strong advocate of larger organisations taking control of, and responsibility for, their own scanning processes. I have nothing against outsourced scanning organisations; it’s just that organisations are entrusting what could be their most sensitive data to a third party, and on top of that they are relying on that third party to deliver it back as good, accurate images, more often than not along with key associated data.

I now hear cries of “what’s wrong with that?” Well, a number of things actually…

  1. Just who are the people carrying out the scanning? Who has access to these files?
  2. What skills do they have in identifying key parts of a document?
  3. Compliance issues / complications
  4. Quality control
  5. Speed

Let’s look at these one at a time.

So who is actually doing the scanning and indexing tasks? In-house you have control over this: you choose who to employ. When outsourced, however, you have no idea who has access to these files. Sometimes you don’t even know what information could be found in them (if they are sent directly to an outsourced document capture organisation), let alone what sensitive information is being read, and by whom.

Let’s be honest, being a document scanner is not the most thrilling of jobs, so outsourcing companies will often employ “lower skilled staff” (please don’t take that the wrong way) and staff working on a project-by-project or very temporary basis. This brings me on to point 2…

What skills do your outsourcing company’s staff deliver? Have they any experience of scanning or indexing and, if so, do they understand your business and what content to expect or look for in the documents being scanned?

Compliance is a big thing here, and even I sometimes get a little lost with it when it comes to outsourcing. In many markets, compliance means you have to know where all your data and content is stored at any point. If you are using an outsourcing company, does this mean you need to know which machines that content is being stored on, and where those machines are? With cloud computing this is a big problem, as organisations simply don’t know exactly which server is holding which piece of their information… so does the same apply when outsourcing your document capture? Worth taking some time to think about that one…

Quality control is a big bugbear of mine. Remember the old IT line, “shi* in equals shi* out”; that’s so true of document capture. If your image quality is poor, or the accuracy of its accompanying data is, then you will find that content rather hard to locate, and your great document retrieval / ECM system will be almost pointless…

Ahhh, speed. This is often, along with cost, the big factor for organisations choosing to outsource document capture, but is it any quicker? In my experience the answer is no. I have worked on numerous projects that used outsourcing companies for their document capture, only to find it took an unexpectedly long time to get the images into the retrieval system (judging, for example, by the date the content was received or posted).

So get centralised

It’s cost effective for larger organisations to run their own centralised scanning environment. Not only will the business process of capturing this content be smoother, but the quality of your images and accompanying data will also be better. With greater investment in scanning software and the automation of data capture (OCR / ICR, forms recognition, auto-indexing etc.), organisations will find it easier than ever to reap the rewards and enjoy a quick ROI.
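To give a feel for what that automation can look like, here is a minimal sketch of auto-indexing a scanned page, assuming the open-source Tesseract engine via the pytesseract package; the “policy number” field and its pattern are invented for the example and would need to match your own documents.

```python
# Minimal auto-indexing sketch: OCR one scanned page and pull out a key field.
# Assumes Tesseract is installed and reachable by pytesseract; the
# "policy number" pattern below is purely illustrative.
import re

import pytesseract
from PIL import Image


def auto_index(image_path: str) -> dict:
    """OCR a scanned page and return whatever index fields can be found."""
    text = pytesseract.image_to_string(Image.open(image_path))

    index = {}
    # Hypothetical policy-number format: three letters followed by six digits.
    match = re.search(r"\b[A-Z]{3}\d{6}\b", text)
    if match:
        index["policy_number"] = match.group(0)
    return index


if __name__ == "__main__":
    print(auto_index("scanned_page_001.tif"))
```

The point is not the specific pattern, but that the indexing clerk only has to confirm or correct values the software has already found rather than keying everything by hand.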

There is already a trend back towards centralised scanning. A recent AIIM Industry Watch report highlights this. Have a read here: http://www.aiim.org/research/document-scanning-and-capture.aspx, then make sure you take ownership of your own document capture requirements…

For a good place to start when thinking about document capture and scanning solutions, read one of my earlier posts on document capture success…

https://andrewonedegree.wordpress.com/2009/05/14/successful-document-capture/





Document and file retrieval metadata

28 08 2009

Far too much focus is placed today on providing complex retrieval fields within ECM solutions, and far too much is made of them by customers. For sure, inherited values and properties can be of great use, but when you start to look at your actual requirements, far too often retrieval fields are simply made too complex.

Points to remember

When designing your retrieval fields, metadata or indexes (whatever you wish to call them), keep in mind just what a user will want or need to do to actually locate a file or document. Here is a quick list to help you:

  1. How much information will the user have on a file?
  2. How much time do you want to allow them to spend entering search information?
  3. How can your metadata fields actually assist in this?
  4. What sort of results will be brought back, and how clear will these be to the user (clear as in: how quickly can they spot the file they want)?

Many recent systems spend a lot of effort on identifying files very accurately; however, in doing so they make the data capture stage (scanning and indexing) very complex and also require the user to spend longer setting up their search.

Keep it simple

When designing or identifying metadata fields for files, always try to keep things as simple as possible.

First things first: identify the types of files you are storing. This doesn’t mean PDF, Word, TIFF and so on; rather, it relates to their type within your business. Examples might include personnel files, expense claim forms, insurance claim forms, phone bills and customer details (depending on your business).

Once you have made this identification, we get on to the point of retention. How long will a particular file type stay “live”, then sit in an “archive”, and then be completely deleted? When doing this you may find that some logical separation of files starts to appear. NB: only create a new classification of file type if it is needed. Don’t do it just to impose some logical separation; classifications should only be created to separate groups of metadata, or to address issues such as migration and retention periods.
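To make the live / archive / delete idea concrete, here is a small sketch in Python; the classifications and retention periods are invented examples, not recommendations.

```python
# Retention sketch: each classification carries its own "live" and "archive"
# periods. The classifications and durations below are invented examples.
from dataclasses import dataclass
from datetime import date


@dataclass
class RetentionRule:
    live_days: int     # how long the file stays "live"
    archive_days: int  # how long it then sits in the archive before deletion

    def state_on(self, captured: date, today: date) -> str:
        age = (today - captured).days
        if age <= self.live_days:
            return "live"
        if age <= self.live_days + self.archive_days:
            return "archive"
        return "delete"


RULES = {
    "expense_claim": RetentionRule(live_days=365, archive_days=6 * 365),
    "phone_bill": RetentionRule(live_days=180, archive_days=2 * 365),
}

if __name__ == "__main__":
    rule = RULES["expense_claim"]
    print(rule.state_on(captured=date(2005, 1, 1), today=date.today()))
```

If two “types” of file always end up with the same rule and the same fields, that is a hint they really belong to one classification.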

The tricky part is now to identify the metadata fields associated with your types of files. I would always suggest you keep these as simple as possible and try not to use more than seven fields to identify a file. This is where designers often get carried away using inherited fields from different objects within the repository. That is all well and good, and it can really help in displaying search results back to users (or a hierarchy of files back to a user). However, what I try to do is the following (there is a small sketch after the list):

  1. Imagine you don’t know if there are other files out there in the system (nothing to inherit from)
  2. Identify at least one key field (policy number, customer number, telephone number etc.)
  3. Provide a list of options for the type of file it is (birth certificate, driving licence, claim form, phone contract, interview, recorded conversation etc.)
  4. Only provide other fields that help to logically distinguish this file from other files of the same type, or that help identify, for example, a customer entity within your business
  5. Provide as many “drop down list” options as possible. This ensures data is accurate and not reliant on spelling or interpretation
  6. Identify any metadata that may be “shared” with other file types. For example, a policy number may be found on multiple types of file across multiple classifications. In addition, a policy number is unique within the business, so it can be used to tie together a number of files belonging to a particular policy holder.
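As a rough illustration of those principles, here is a minimal Python sketch of per-file-type field definitions with one key field, a controlled document-type list and a seven-field limit; every name in it is an invented example rather than a real product schema.

```python
# Metadata-design sketch: one key field per file type, a controlled list for
# document type, and no more than seven fields in total. Names are invented.

# Controlled list of document types, so indexers pick rather than type.
DOCUMENT_TYPES = [
    "birth certificate",
    "driving licence",
    "claim form",
    "phone contract",
    "recorded conversation",
]

# Field definitions per file type: the key field plus a handful of others.
FILE_TYPES = {
    "insurance_claim": {
        "key_field": "policy_number",  # shared across classifications
        "fields": ["policy_number", "document_type", "customer_surname",
                   "claim_date"],
    },
    "phone_bill": {
        "key_field": "customer_number",
        "fields": ["customer_number", "document_type", "billing_month"],
    },
}


def validate(file_type, metadata):
    """Return a list of problems with the supplied index data."""
    spec = FILE_TYPES[file_type]
    problems = []
    if len(spec["fields"]) > 7:
        problems.append("more than seven fields defined for this file type")
    if not metadata.get(spec["key_field"]):
        problems.append("missing key field: " + spec["key_field"])
    doc_type = metadata.get("document_type")
    if doc_type is not None and doc_type not in DOCUMENT_TYPES:
        problems.append("unknown document type: " + doc_type)
    return problems


if __name__ == "__main__":
    print(validate("insurance_claim",
                   {"policy_number": "ABC123456",
                    "document_type": "claim form"}))
```

Because the policy number is the shared, unique key, it is what ties a claim form, a recorded conversation and a letter back to the same policy holder, without any inheritance in the repository.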

If you stick to these six principles you will find that, nine times out of ten, you will have no call for complex inheritance or complex storage concepts. You will more than likely also have identified your classifications in full. Please note that your file types, along with classifications, will also, nine times out of ten, provide you with enough criteria to accurately assign security information to these files.

Once you have identified how information is to be retrieved, think about what information could be automatically captured at the data capture stage. This sometimes highlights fields that could be used to help identify files at retrieval; it also sometimes identifies fields that really aren’t appropriate.

Showing results

Often your retrieval system will display search results in a format that isn’t well suited to you or your business needs. This is why so many “professional services” are offered to customers of such systems. As a customer, linking objects together, or even showing them in a “tree view” fashion, can help the end user. However, this isn’t a call for inherited properties; rather, it is a call to display business-related information logically.

Also remember that different types of searches can require different ways of displaying search results. This is sometimes overlooked by designers and system providers, to the detriment of the user experience.

Finally, always think past the retrieval process. Once a user has found the file they want, they will need to interact with it in some way; this could be simply viewing its content, passing it on to another user, and so on.

Conclusion

I am a firm believer in keeping things as simple as possible and often adopt that IT staple, the “80 – 20” rule. Far too often IT tries to deliver too much, and in doing so it over-complicates areas of the system or, more worryingly, the business. When this happens, more often than not the project is seen as a failure, when really, by delivering less, the customer gets more.

When putting together metadata for the retrieval of files, remember to keep things as simple as possible. Identify key fields and don’t get carried away capturing too much retrieval data. Also, always keep your end users in mind: that means the end user at the scanning and indexing stage as well as the end users searching for files. Sticking to these simple rules will ensure you deliver a file retrieval system that works efficiently, quickly and well for your end users and your business…





Deploying ECM across the enterprise

23 06 2009

I was keeping the world up to date with my day on Twitter when I read quite an interesting article about an organisation looking to invest in ECM and deploy it in one lump across the whole of their enterprise. The article was looking at the “main” players in ECM: Oracle, IBM and EMC Documentum. It really highlighted the problems these companies had in pitching to the client, demonstrating their product and trying to show how it would work across the complete enterprise…

Having worked with all of these companies in some form in the past, I remembered just how great their platforms are, but also how heavily wrapped up they are in marketing and hype. Putting together a demonstration was never a five-minute job!

Deploying across the enterprise

This is a lovely idea, but in practice it is unbelievably hard to achieve (don’t listen to the sales banter). I really don’t see how it can work well.

Let’s look at some of the basic challenges of deploying a single ECM solution across the complete enterprise in one go:

  1. Scale – if your enterprise sprawls across the UK, or even Europe and the world, think of the challenges of implementing the system, allowing access and dealing with distributed performance… None of these are show stoppers, but remember that a lot of logistical work will be needed
  2. Training – OK, how are you going to train hundreds, maybe thousands, of users for a go-live date?
  3. Individual requirements – different parts of your business will have different requirements and needs from their ECM and BPM platform
  4. Administration – again, logistical challenges
  5. Support – you need to have vast support services in place

None of these points will stop a project; however, each one requires a lot of thought, a lot of processes to be put in place and, more importantly, a lot of people with the drive to ensure everything runs smoothly. Point 3 (individual requirements), though, is potentially a show stopper, and it is this point that the large ECM players try to address with management, configuration, integration and mapping tools. These all demonstrate well (when the sales agent gets it right) but actually require a lot of “professional services” to get them meeting your actual requirements.

Delivering for everyone

It looks great in a demonstration: the presenter simply clicks on a wizard, answers some questions, fills in some data fields and hey presto, your system is integrated and reading in data from a third party. Wow. Likewise, the presenter clicks on a nice process map, drags some icons onto the screen, joins them up, again adds some data fields and hey presto, you have a workflow…

Now, this does look great in a demonstration, and in simple cases it will work for you. But across the complete enterprise? Will it be flexible enough to meet everyone’s requirements? Are the simple-ish points of integration shown so well in a demonstration going to work like that for your organisation? I am guessing a strong NO is the answer here.

These sorts of tools are great for demonstrations, even great for very simple integrations and maps; however, the price you pay for them far outweighs their actual benefit to your organisation, unless of course you leverage some “professional services” to ensure the system meets your business’s complexities.

The investment…

So to achieve a massive roll-out of ECM across your enterprise, you are looking at a massive investment of both time and money, and then no doubt you will still need to address individual units’ requirements… All of these factors make it harder for your solution to succeed and deliver that promised ROI.

How would I go about things?

Well, first things first: my ECM platform purchase would not include fancy integrator and mapping modules. For me these add vast costs to the initial purchase and licensing, on top of which you have to buy additional “professional services”. In my experience, it works out cheaper to simply pay for “professional services” to develop the integration with your other systems from scratch. Essentially, this is more often than not what happens anyway, under the term “professional configuration services” or something similar; except you have also paid for that integration module licence along the way…

This is one of the reasons why I stopped working with the big ECM players and decided to invest time and resources into our own ECM platform, workFile (www.workFileECM.com). We have not wasted time or money on complex integration tools and modules that look great in demonstrations but fail to deliver real business benefit. Rather, we develop the integration an organisation requires specifically for them, meeting 100% of their requirements, based around our open XML Web Services API (something you should insist your ECM provider offers). The same applies to our business process maps: workFile uses Visual Studio as the development platform for designing process maps. Why? Well, a developer has so much freedom here: they can code complex business rules, algorithms and calculations, integrate with numerous other systems and make the workflow work seamlessly for the end user; basically, the process map becomes the power behind the actual solution.
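To show the kind of integration an open XML web-service API makes possible, here is a minimal Python sketch that checks a captured document and its index data into a repository; the endpoint URL, the XML element names and the fields are all hypothetical placeholders for illustration, not the actual workFile API.

```python
# Hypothetical integration sketch: push one captured document plus its index
# data into an ECM repository over an XML web service. The URL and the XML
# element names are invented for illustration only.
import base64
import xml.etree.ElementTree as ET

import requests


def check_in_document(endpoint, file_path, index):
    """Send one document and its metadata as an XML payload; return the HTTP status."""
    root = ET.Element("CheckInRequest")
    for name, value in index.items():
        field = ET.SubElement(root, "Field", attrib={"name": name})
        field.text = str(value)
    with open(file_path, "rb") as handle:
        content = ET.SubElement(root, "Content")
        content.text = base64.b64encode(handle.read()).decode("ascii")

    response = requests.post(
        endpoint,
        data=ET.tostring(root),
        headers={"Content-Type": "application/xml"},
        timeout=30,
    )
    return response.status_code


if __name__ == "__main__":
    status = check_in_document(
        "https://ecm.example.com/api/checkin",  # placeholder endpoint
        "scanned_page_001.tif",
        {"policy_number": "ABC123456", "document_type": "claim form"},
    )
    print(status)
```

The value of an open API is exactly this: any line-of-business system that can build an HTTP request can feed the repository, without a licensed integration module sitting in the middle.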

Secondly, look to deploy your ECM unit by unit, or department by department. Each department will have different requirements, and each department’s requirements need to be looked at seriously. An enterprise-wide ECM solution will only work if each department takes it on board and uses it correctly. That is only going to happen if the system meets their requirements and is championed by the staff.

By implementing ECM unit by unit, you ensure that requirements are not lost and you ease the load of training, administration and support, while introducing new processes to the organisation a step at a time.

 

Conclusion…

By working on a unit-by-unit basis you not only identify all the requirements needed across the enterprise, but you also ease your implementation headache and keep costs down (often removing the need for fancy enterprise integration modules). For sure, your ECM provider will try to push you towards a “big bang” implementation across the complete enterprise and, no doubt, show you some wonderful tools that make it all seem so easy. But there is nothing stopping you striking that enterprise-wide deal and then addressing the implementation on a unit-by-unit basis, re-negotiating cost as you go if need be.

Remember, fancy demonstration tools may look great and promise the earth, but they almost always won’t meet 100% of your requirements. So you need to know what you are looking at in terms of “professional services”, again on a unit-by-unit basis…





Successful document capture…

14 05 2009

Well, this is something close to my heart. My first ever project after leaving university was to help write a document capture application built on top of the FileNET Panagon Capture platform. Ahh, happy days… Though I did seem to earn the name “scan man” from then on, which wasn’t so great, as I then had to be involved with every document capture project our company took on…

OK, so how do you implement a successful document scanning / capture solution? It’s very simple: follow these five guidelines and you are well on the way (there is a small pipeline sketch after the list).

  1. Throughput is everything. Make sure people can load the scanner and let it do its thing. You don’t want to be stopping to separate documents or batches. Make sure your software can do this and purchase a scanner with a big document holder.
  2. Ensure you maximise the quality of the images you are capturing. If this could be a problem, then make sure you get good quality control and re-scan technology in place
  3. Identify as much information as possible up front with your software. The more a user has to do, the slower and more expensive the process becomes
  4. Ensure the data captured or assigned to a document is accurate. Remember, your retrieval of these images depends on the accuracy of your data capture
  5. Your document capture is pointless unless you release the images into your storage repository with all the correct information. Again, make sure this is done seamlessly and accurately. The longer files sit in your capture process, the longer it will take for them to turn up in, for example, a customer’s file…
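Here is the pipeline sketch promised above, in Python, with an invented quality score and invented stage names: batches flow from scanning through quality control and indexing to release, and anything failing quality control goes back for rescanning.

```python
# Toy capture-pipeline sketch: scan -> quality control -> index -> release.
# The quality threshold and the page/batch structures are invented examples.
from dataclasses import dataclass, field


@dataclass
class Page:
    image_path: str
    quality: float            # 0.0 (unreadable) to 1.0 (perfect)
    index: dict = field(default_factory=dict)


def quality_control(batch, threshold=0.7):
    """Return the pages that need rescanning; the rest continue downstream."""
    return [page for page in batch if page.quality < threshold]


def release(batch):
    """Hand fully indexed pages over to the retrieval repository."""
    for page in batch:
        if page.index:  # only release pages that carry index data
            print("releasing", page.image_path, "with", page.index)


if __name__ == "__main__":
    batch = [Page("p1.tif", 0.9, {"policy_number": "ABC123456"}),
             Page("p2.tif", 0.4)]
    print("rescan:", [p.image_path for p in quality_control(batch)])
    release(batch)
```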

 

So where to start?

Well, it starts with your document capture software, and there are lots of solutions out there. When choosing your capture software, keep those five guidelines in mind. You want to automate as much as possible (unless we are talking about only the odd scanned document through the day). In addition, you don’t just want to watch a sales pitch on the actual scanning process, or on the physical scanner being used. You want, and need, to see the process all the way through, and with a variety of documents.

It’s best if you can use forms wherever possible, but you will always have unstructured documents, such as letters, coming to you. You MUST see a demonstration of how these are dealt with, then ask yourself:

“is that efficient?”

“how could that be speeded up?”

“am I happy with the way data is entered / captured?”

“now let’s find the document in the retrieval system”

I don’t want to start recommending software, as depending on your storage repository you may find you have a limited selection. What I will say is that for our workFile ECM repository we use software that I have been familiar, and more than happy, with for some time: Kofax. I have worked on numerous projects with Kofax Ascent Capture and with Neurascript recognition modules (which are now part of Kofax). Kofax provides you with all the technology and features you could want to streamline any capture environment. More importantly, it allows you to write your own release processes into the repositories of your choice.

What about architecture?

Scanning can be quite intensive for your PC. A while back, all of the “steps”, if you like, were carried out on a single machine: you scanned, had the batches and documents recognised, processed and enhanced, then sent them on for an agent to index. This isn’t great, though; ideally you want to split out this intense processing work and let your scan station simply scan images.

Server-based solutions are best, freeing up staff to scan and pull documents as and when they are ready. Your images should always be ready quicker than your staff can quality-assess them or carry out indexing tasks. Oh, and don’t be fooled by “thin” document capture; something has to drive the scanner, and therefore it’s not a “thin client”…
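As a rough picture of that split, here is a small Python sketch in which the scan station only writes images into a shared “inbox” folder and a separate server process picks them up for recognition and enhancement; the folder names and the processing step are placeholders.

```python
# Architecture sketch: the scan station only drops images into a shared inbox;
# a server-side worker does the heavy processing. Folder names are invented.
import shutil
import time
from pathlib import Path

INBOX = Path("capture/inbox")            # written to by the scan stations
READY = Path("capture/ready_to_index")   # read by the indexing clients


def process(image):
    """Placeholder for server-side recognition, clean-up and enhancement."""
    print("processing", image.name)


def server_worker(poll_seconds=5):
    """Poll the inbox, process each new image, then hand it on for indexing."""
    READY.mkdir(parents=True, exist_ok=True)
    while True:
        for image in sorted(INBOX.glob("*.tif")):
            process(image)
            shutil.move(str(image), str(READY / image.name))
        time.sleep(poll_seconds)


if __name__ == "__main__":
    INBOX.mkdir(parents=True, exist_ok=True)
    server_worker()
```

The scan station stays responsive because it only ever writes files; the recognition, enhancement and release work all happens on the server side.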

What about staff?

This can be a boring task, so rotate your staff to different jobs every couple of hours. They may still get bored, but if you don’t do this they will make lots of errors and get really bored. Trust me, just spend a couple of hours doing one task such as scanning and your brain can go numb…

You will also need a “champion” of the capture process: someone who can keep people motivated and ensure they maximise the potential of the system. All too often the system’s capacity is not met because staff become lazy or complacent. This diminishes your return on investment, so a champion is very important.

It’s also worth noting that from time to time you will need someone with more experience of the scanning process, again that champion, simply because you will get issues with stuck paper, batches not being recognised, image quality problems and so on. At that point you need someone with a little more knowledge of how things work.

 

Finally

Remember, no matter how good your capture process is, your retrieval system is only as good as the quality of the images and the data associated with those images. Also, please don’t invest heavily in a great capture system and then scrimp on your retrieval system. If you do, you will see no benefit from the capture process and document imaging at all. Your first port of call is still to ensure you purchase the right retrieval / document management system; then address the capture side of things.