Document Scanning | Andrew Smith CTO

NHS needs to get efficient…It needs ECM and BPM

14 06 2010

Let’s face it, the NHS is a great example of diseconomies of scale, and a great example of its lack of administrative efficiency is the amount of paper that is generated and pushed around. In the past two years, the boards of NHS trusts created at least 22 million paper documents. If that figure itself isn’t a little worrying, then just think: we are only talking about documents generated for communications to senior managers and to each other! The South West Essex Trust alone generated 333,000 documents, that’s just mad…

The Department of Health spent close to half a billion pounds in fees to external consultants in the year 2009-2010, so why has no one in the NHS really adopted ECM on a large scale? Just looking at these paper figures alone, it is very clear that each NHS trust should be using some form of ECM solution.

So just what could ECM do to help make savings in the NHS and raise efficiency? Well for starters, it can remove the majority of the paper costs, increase the efficiency of sharing knowledge, rationalise communications through knowledge and content sharing and increase collaboration.

I don’t want this post to turn into a long list of all the benefits of ECM; I have written many other posts on these and there are so many out there. Rather, it is just to highlight the fact that the NHS should be embracing Enterprise 2.0 concepts, ECM and BPM.

I will leave you with this thought. The health watchdog, the King’s Fund, reports that while the number of staff rose 35% from 1999 to 2009 (to 1,117,000), the number of managers rose by 85%! Now please, someone find me any example in the private sector where this kind of inefficiency and top-heavy organisation is a success… I bet you can’t, because any organisation in the private sector that was run in this fashion would be long bust… The NHS needs to get efficient just like any private sector organisation: it needs BPM, ECM and a hell of a lot of dead wood removing…

Tags: BPM, ECM, Enterprise 2.0, NHS
Categories : BPM, Business Arguments, Case Management, Collaboration, Communications, document management, Document Scanning, ECM, economics, Enterprise 2.0, NHS

FileNET Panagon Capture…How to…

25 02 2010

Ahhh now the inspiration behind today’s post is that I have noticed people finding my blog looking for the good old FileNET Panagon Capture objects – such as a RepServer, RepObject and how to unlock these components….

Now it has been a little while since I was programming in Panagon Capture, but this is the environment I first cut my teeth on when leaving uni. (Panagon Capture is a document capture environment for the FileNET Image Services and Doc Management repositories.) Panagon Capture has seen me working all over the UK, Ireland and parts of Europe implementing capture solutions for FileNET implementations. Coming straight from uni, it meant getting dropped in at the deep end, but I have to say I enjoyed it – and it was how I made a name for myself at my first place of work…

Things to remember with the Capture object model

Ok well first things first, the Capture object model got slated in its early days, it was too confusing to pick up and many people struggled with it. However, I actually think it is quite elegant in places (sorry). So why did it get slated, well primarily because no matter what you are working with, you always have the same object – RepObject. So if I am working with a particular scanned page / image, I have a RepObject. If I am working with a document, it’s a RepObject, if a separator a RepObject, a batch, a RepObject …. So you can see it can get confusing…

In addition, it is also worth remembering that many of the features of Capture are ActiveX COM components (OCX controls). These are used to wrap up a bunch of functionality – typically the actual Scan process, Capture Path configuration, Document Processing options etc.

Capture out of the box

Now the Capture environment out of the box is ok, not great, ok. It can get confusing when trying to use it in a real production environment – I will explain why in a moment. The key thing to remember here is to ensure Batches are the only objects you can see floating around from the root of the Capture environment. If you have images, or documents, then you are asking for trouble. In addition, separate all your capture paths into another folder (if you choose to use these – I recommend you don’t, to be honest – well, not in the way Capture encourages you to).

Always remember, that Capture out of the box is a good tool to monitor what is going on with your software if you are using the API to create your own FileNET capture applications. It does help, if only for logic checks.

The object model

In my early days working with Capture – it was hard to logically separate out functionality and implementations of classes etc. It was even harder to then put this in a way other developers could pick up quickly and easily. Because of this I decided to “wrap” up the Capture object model so that it logically made more sense to others in the company, and in addition to logically separate out functionality and instances of particular types of RepObjects (there is a nodeType property that helps identify the type of object you are working with e.g. Batch, Document). I strongly urge people to do this; it helps no end and makes developing your own Capture applications a lot easier. If you don’t have time to do this – or the in-house skills, perhaps look at purchasing a “toolkit” that an old FileNET VAR may have written. My old toolkit is probably still in circulation, but it is written in COM. If anyone wants it, I can put you in touch with the company that owns the IPR to it (an old employer).

By wrapping up the Capture object model into your own, it makes life a lot easier, especially for things like identifying types of objects, as your own object model should have objects such as “Batch”, “Document”, “Image”, “Server” etc. These objects can then logically contain relevant information and functions. A good example is status. Unfortunately you cannot unlock batches when they are being processed (unless you are an admin user). This means you need to check a status of a batch to see if it can be unlocked. Within your own object model this is easy and needs only be written and wrapped once (you see why life can get easier with your own object model). This makes life a lot easier in a real world environment when your capture environment is a workflow in itself.
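To make the wrapping idea concrete, here is a minimal Python sketch. It is not the Capture API: `FakeRepObject`, `NodeType` and the status strings are hypothetical stand-ins invented for illustration (the real object model is COM, and only the existence of a nodeType property comes from the text above). The point is simply that each typed wrapper checks the node type once, and a rule such as "can this batch be unlocked?" is written in one place.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class NodeType(Enum):
    """Hypothetical node types; the real nodeType values are not shown here."""
    BATCH = "batch"
    DOCUMENT = "document"
    IMAGE = "image"


@dataclass
class FakeRepObject:
    """Stand-in for a generic RepObject (illustration only, not the real API)."""
    name: str
    node_type: NodeType
    status: str = "ready"  # assumed status values: "ready", "processing"
    children: list[FakeRepObject] = field(default_factory=list)


class _Wrapped:
    """Base wrapper: refuses to wrap a RepObject of the wrong type."""
    expected: NodeType

    def __init__(self, rep: FakeRepObject) -> None:
        if rep.node_type is not self.expected:
            raise TypeError(
                f"{rep.name!r} is a {rep.node_type.value}, "
                f"not a {self.expected.value}"
            )
        self.rep = rep

    @property
    def name(self) -> str:
        return self.rep.name


class Image(_Wrapped):
    expected = NodeType.IMAGE


class Document(_Wrapped):
    expected = NodeType.DOCUMENT

    def images(self) -> list[Image]:
        return [Image(c) for c in self.rep.children
                if c.node_type is NodeType.IMAGE]


class Batch(_Wrapped):
    expected = NodeType.BATCH

    def documents(self) -> list[Document]:
        return [Document(c) for c in self.rep.children
                if c.node_type is NodeType.DOCUMENT]

    def can_unlock(self, is_admin: bool = False) -> bool:
        # The status rule is written once here: a batch that is being
        # processed can only be unlocked by an admin user.
        return is_admin or self.rep.status != "processing"
```

With something like this in place, calling code asks `batch.can_unlock()` or `batch.documents()` and never has to inspect a raw RepObject to work out what it is holding.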

Separate out the capture environment

Many people here still use capture paths, I suggest you minimise their use as much as possible. These are fiddly and troublesome to say the least. First things first, scanning and document recognition, assembly etc should not be done on the same machine (though Capture suggests it should). Separate out the actual pure scan function from document processing activities – allow the scan station to only scan, nothing more. Remember scan stations are expensive and the big benefit of expensive scanners is throughput. You cannot afford to have the machine processing power being wasted on other tasks…

Document processing activities (such as splitting images into documents, batches, image enhancement etc) should all happen off of the scan station. So ensure you get a background service or application in place on a dedicated machine that does this job. This process will be critical to the success of your implementation – so test, test, test, test and carry out some more testing.
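The separation can be pictured as a simple producer and consumer. The sketch below is only an illustration under assumptions: a thread stands in for the dedicated processing machine, batch names are plain strings, and the function names are made up for this example. Nothing here talks to Capture.

```python
import queue
import threading


def scan_station(batch_names, work_queue):
    """Scan station: scans and hands batches over, nothing more."""
    for name in batch_names:
        work_queue.put(name)
    work_queue.put(None)  # sentinel: no more batches today


def processing_service(work_queue, processed):
    """Background service (a thread here, a dedicated machine in practice)."""
    while True:
        name = work_queue.get()
        if name is None:
            break
        # Placeholder for the real work: document separation,
        # image enhancement, recognition and so on.
        processed.append(f"{name}:processed")


def run_pipeline(batch_names):
    work_queue = queue.Queue()
    processed = []
    worker = threading.Thread(
        target=processing_service, args=(work_queue, processed)
    )
    worker.start()
    scan_station(batch_names, work_queue)  # scanning never waits on processing
    worker.join()
    return processed
```

The scan station only ever puts work on the queue, so its throughput is not tied to how long the processing steps take.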

Indexing is a critical part of capture. If you are slow here, you really have a negative impact on system performance. In addition, if you are sloppy and data is not correct, you will have a negative impact on the whole retrieval system and its capabilities to meet business requirements. Things to remember are that you may be working with different classes of documents. You may also need to pull in validation from external systems so Indexing applications can prove tricky. On top of this, you may well be releasing images into a workflow system – so data capture that is not going to be stored as index properties may also need to be captured….If you have your own object model, all of this becomes a hell of a lot easier….

A good tip – ensure your scanners always put only the same classification of documents in a batch. Sounds obvious but far too often this is overlooked. It is hard to change a document’s class once it has been scanned, trust me….

Extend the object model

The Capture object model does allow for attributes to be placed on objects. This means you can extend your own object model with properties and store these as attributes onto a RepObject. I have seen others decide to implement their own database to do this, however that is just a massive overhead, and why, when you have all that you need in Capture. In addition, when testing it is so easy to look at RepObject attributes in Capture itself.

For particular requirements, extending the object model is a great way of attaching data that won’t be stored in the retrieval system, but may be required for other purposes (either to help index agents, or to trigger workflow systems, integration with other LOBs).
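As a rough sketch of the idea, the Python below exposes typed properties on your own document object while the values actually live as attributes on the underlying object. `FakeRepObject`, the attribute names and the assumption that attribute values are strings are all hypothetical; check what the real attribute mechanism supports before relying on any of it.

```python
from __future__ import annotations


class FakeRepObject:
    """Hypothetical stand-in for a RepObject that carries named attributes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.attributes: dict[str, str] = {}  # assumed: values are strings


class Document:
    """Own-model document whose extra properties live on the RepObject."""

    def __init__(self, rep: FakeRepObject) -> None:
        self._rep = rep

    @property
    def workflow_trigger(self) -> str | None:
        return self._rep.attributes.get("WorkflowTrigger")

    @workflow_trigger.setter
    def workflow_trigger(self, value: str) -> None:
        self._rep.attributes["WorkflowTrigger"] = value

    @property
    def icr_confidence(self) -> float | None:
        raw = self._rep.attributes.get("IcrConfidence")
        return float(raw) if raw is not None else None

    @icr_confidence.setter
    def icr_confidence(self, value: float) -> None:
        # Stored as text so it can be read back by eye when testing.
        self._rep.attributes["IcrConfidence"] = repr(value)
```

Because the data travels with the object, any later station that wraps the same object sees the same values, with no side database to keep in step.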

Another key area to extend the object model is that of locking. Basically, when an item is being worked on it is locked by Capture. However, you need to take control of this, as again it can get messy, with batches getting left in a locked state etc. In your object model I strongly suggest you explicitly lock an object when you need to, and explicitly unlock it when you have finished with it. Also, if you have a good “status” set up, this makes life easier when checking if you can or cannot work on an object. At the Indexing stage and document processing stage, this is crucial…
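One way to make the lock and unlock explicit, and to guarantee the unlock happens even when something goes wrong, is a context manager. This is a sketch under assumptions: `FakeRepObject`, its `locked_by` field and the "processing" status are invented for illustration and are not how Capture itself records locks.

```python
from contextlib import contextmanager


class LockError(RuntimeError):
    """Raised when an object cannot be locked for work."""


class FakeRepObject:
    """Hypothetical stand-in for a lockable batch or document."""

    def __init__(self, name, status="ready"):
        self.name = name
        self.status = status    # assumed values: "ready", "processing"
        self.locked_by = None   # user currently holding the lock, if any


@contextmanager
def locked(rep, user):
    """Explicitly lock `rep` for `user`, and always unlock afterwards."""
    if rep.status == "processing":
        raise LockError(f"{rep.name} is still being processed")
    if rep.locked_by is not None:
        raise LockError(f"{rep.name} is already locked by {rep.locked_by}")
    rep.locked_by = user
    try:
        yield rep
    finally:
        # Runs on success and on error, so nothing is left locked.
        rep.locked_by = None
```

An indexing station would then do its work inside `with locked(batch, "indexer-1"):` and the batch is released the moment the block ends, whether indexing succeeded or failed.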

Success in a nutshell…

Wrap up the Capture API, extend the object model with your properties that utilise attributes, add your own functions to your logical components and explicitly take control of things such as locking. Once you have this type of API in place, splitting out scanning from document processing and from image enhancement is easy. It is also a lot easier to then implement good indexing applications (or one that can do everything) that promote quick working and integrate with validation components and other LOBs. Releasing the captured images into the actual repository can also be separated, freeing up processing on the index station or from QA (if you have this in place).

If you do all of this, your Capture environment will be very successful and flexible enough to meet all your needs. If at a later date you want to plug in third party features (such as ICR or something similar), you can. You can do this elegantly too, by storing the data from the third party component as further attributes on your object (probably a document). You can then pick these up at your indexing station or anywhere in the capture path and use them accordingly….

If you want help with Capture feel free to contact me directly. I still provide consultancy for this environment and am always happy to help…

Tags: FileNET, Panagon Capture
Categories : Document Capture, document management, Document Scanning, ECM

Centralise Document Capture

11 12 2009

For quite some time I have been a strong advocate for larger organisations taking control of, and responsibility for, their own scanning processes. I have nothing against outsourced scanning organisations; it’s just that organisations are entrusting what could be their most sensitive data to a third party, and not only that, they are relying on that third party to deliver it back as good, accurate images, more often than not along with key associated data.

I now hear cries of “what’s wrong with that?” Well a number of things actually…

Just who are the people carrying out the scanning? Who has access to these files?
What skills do they have in identifying key parts of a document?
Compliance issues / complications
Quality control
Speed

Let’s look at these one at a time.

So who is actually doing the scanning and indexing tasks? Well in-house you have control over this, basically you choose who to employ. However, when outsourced you have no idea who has access to these files, sometimes you don’t even know what information could be found in these files (if sent directly to an outsourced document capture organisation), let alone then what sensitive information is being read by who.

Let’s be honest, being a document scanner is not the most thrilling of jobs, so outsourcing companies will often employ “lower skilled staff” (please don’t take that the wrong way) and staff working on a project-by-project or very temporary basis. This brings me on to point 2…

What skills do your outsourcing company staff deliver? Have they any experience of scanning or indexing and if so, do they understand your business and what content to expect / look for in scanning documents?

Compliance is a big thing here and even I sometimes get a little lost with it in regard to outsourcing. For many markets, compliance means you have to know where all your data and content is stored at any point. Now if you are using an outsourcing company, does this mean you need to know what machines that content is being stored on? Where those machines are? With regard to cloud computing this is a big problem, as organisations simply don’t know exactly what server is holding what information of theirs… so does the same apply when outsourcing your document capture? Worth taking some time to think about that one….

Quality control is a big bugbear of mine. In IT circles remember “shi* in, equals shi* out” and that’s so true with document capture. If your image quality is poor, or the accompanying data is inaccurate, then when trying to locate that content you will find it rather hard, and your great document retrieval / ECM system will be almost pointless…

Ahhh, speed. This is often, along with cost, the big factor for organisations choosing to outsource document capture, but is it any quicker? In my experience the answer is no. I have worked on numerous projects which have used outsourcing companies for their document capture, only to find it has taken an unexpectedly long time to get the images into the retrieval system (based on the date received / postal date of content for example).

So get centralised

It’s cost effective for larger organisations to get their own centralised scanning environment. Not only will the business process of capturing this content be smoother, but also the quality of your images and accompanying data will be better. With greater investment in scanning software and the automation of data capture (OCR / ICR, Forms recognition, Auto-indexing etc) organisations will find it easier than ever before to reap the rewards and enjoy a quick ROI.

There is already a trend back towards centralised scanning. A recent AIIM Industry Watch article highlights this. Have a read here: http://www.aiim.org/research/document-scanning-and-capture.aspx, then ensure you take ownership of your own document capture requirements…

For a good place to start when thinking about document capture and scanning solutions, read one of my earlier posts on Document Capture success….

https://andrewonedegree.wordpress.com/2009/05/14/successful-document-capture/

Tags: Document Capture, scanning
Categories : Document Capture, document management, Document Scanning, ECM

Document and file retrieval metadata

28 08 2009

Far too much focus is made today on providing complex retrieval fields within ECM solutions, and far too much is made of them from customers. For sure, inherited values and properties can be of great use, but when you start to look at your actual requirements, far too often retrieval fields are simply made too complex.

Points to remember

When designing your retrieval fields, metadata or indexes (whatever you wish to call them), keep in mind just what a user will want / need to do to actually locate this file / document. Here is a quick list to help you:

How much information will the user have on a file?
How much time do you want to allow them to enter search information?
How can your metadata fields actually assist in this?
What sort of results will be brought back and how clear will these be to the user (clear as in how can they quickly see the file they want)

Many recent systems spend a lot of time on very accurately identifying files. However, by doing this they also make things very complex at the data capture stage (scanning and indexing) and require the user to spend longer setting up their search.

Keep it simple

When designing / identifying metadata fields for files, always try to make and keep things as simple as possible.

First things first, identify the types of files you are storing. This doesn’t mean pdf, word, tiff etc. rather it relates to their type within your business. So some examples may include personnel files, expense claim forms, insurance claim form, phone bill, customer details etc. (dependent on your business).

Once you have made this identification, we get onto the point of retention. How long will a particular file type stay “live”, then move to an “archive” then be completely deleted. When doing this you may find that you logically have some separation of files appearing. NB only create a new classification of file type if it is needed. Don’t do it as some logical separation, rather classifications should only be created to separate either groups of metadata or address such issues as migration and retention periods.
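The live, archive and delete progression can be written down per classification. The sketch below is illustrative only: the `RetentionPolicy` class and the periods in `POLICIES` are made-up examples, not recommendations for any real retention schedule.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionPolicy:
    """How long a classification stays live, then how long it is archived."""
    live_days: int
    archive_days: int

    def stage(self, stored_on: date, today: date) -> str:
        age = (today - stored_on).days
        if age < self.live_days:
            return "live"
        if age < self.live_days + self.archive_days:
            return "archive"
        return "delete"


# Example figures only; real periods depend on your business and regulator.
POLICIES = {
    "expense claim form": RetentionPolicy(live_days=365, archive_days=6 * 365),
    "phone bill": RetentionPolicy(live_days=90, archive_days=365),
}
```

If two file types end up with the same metadata and the same policy, that is a hint they do not need separate classifications.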

The tricky part is to now identify the metadata fields associated with your types of files. I would always suggest you try to keep these as simple as possible and try not to use more than 7 fields to identify a file. This is where designers often get carried away using inherited fields from different objects within the repository. This is all well and good and can really help in displaying search results back to users (or a hierarchy of files back to a user). However, what I try to do is the following:

Imagine you don’t know if there are other files out there in the system (nothing to inherit from)
Identify at least one key field (policy number, customer number, telephone number etc)
Provide a list of options to the type of file it is (Date of birth certificate, driving license, claim form, phone contract, interview, recorded conversation etc)
Only provide other fields that help logically identify this file from other files of the same type, or they help identify, for example, a customer entity within your business
Provide as many “drop down list” options as possible. This ensures data is accurate and not reliant on spelling or interpretation
Identify any metadata that may be “shared” with other file types. For example a Policy Number may be found on multiple types of files within multiple classifications of files. In addition Policy Number is unique within the business so therefore it can be used to tie together a number of files to a particular policy holder.

If you stick to these 6 principles you will find that 9 times out of 10 you will not have any call for using complex inheritance or complex storage concepts. You more than likely have also identified your classifications in full. Please note that your file types along with classification will also 9 times out of 10 provide you with enough criteria to accurately assign security information to these files.
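The principles above translate fairly directly into a small schema check. This Python sketch is one possible shape, with hypothetical names throughout (`FileClass`, `MAX_FIELDS`, the field names in the usage note): one key field, a fixed list of file types, a cap of 7 fields in total, and drop-down options wherever a field has them.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MAX_FIELDS = 7  # the rule of thumb from the text: no more than 7 fields


@dataclass
class FileClass:
    """A classification: one key field, a file-type list, a few extras."""
    name: str
    key_field: str
    file_types: tuple[str, ...]
    # Extra fields: a tuple of allowed options means "drop down list",
    # None means free text.
    other_fields: dict[str, tuple[str, ...] | None] = field(default_factory=dict)

    def __post_init__(self) -> None:
        total = 2 + len(self.other_fields)  # key field + file type + extras
        if total > MAX_FIELDS:
            raise ValueError(f"{self.name}: {total} fields, limit is {MAX_FIELDS}")

    def validate(self, record: dict[str, str]) -> list[str]:
        """Return a list of problems with an index record (empty if fine)."""
        errors = []
        if not record.get(self.key_field):
            errors.append(f"missing key field {self.key_field!r}")
        if record.get("file_type") not in self.file_types:
            errors.append("file_type must be one of the listed options")
        for name, options in self.other_fields.items():
            value = record.get(name)
            if value is not None and options is not None and value not in options:
                errors.append(f"{name!r} must be chosen from its drop-down list")
        return errors
```

For example, a "claims" class keyed on `policy_number`, with file types such as "claim form" and a `branch` drop-down, would reject a record that has no policy number or a misspelt branch at the indexing stage, before it can damage retrieval.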

Once you have identified how information is to be retrieved, think about what information could be automatically captured at the data capture side of things. This sometimes illustrates fields that could be used to help identify files at retrieval; it also sometimes identifies fields that really aren’t appropriate.

Showing results

Often your retrieval system will display results of searches in a format which isn’t always that great to you or your business needs. This is why there are so many “professional services” offered to customers of such systems. As a customer, linking objects together, even showing them in a “tree view” type fashion can help the end user. However, this isn’t a call for inherited properties, rather a call to logically display business related information.

Also remember different types of searches can require different ways of displaying search results. This is sometimes overlooked by designers and system providers to the detriment of the user experience.

Finally, always think past the retrieval process. Once a user has found the file they want they will need to interact with it in some way, this could be to simply view its content or to pass on to another user etc.

Conclusion

I am a firm believer in keeping things as simple as possible and often adopt that IT term the “80 – 20” rule. Far too often IT tries to deliver too much, and in doing so it over complicates areas of the system or worryingly the business. When this happens more often than not a project can be seen as a failure, when really, by delivering less the customer gets more.

When putting together metadata for the retrieval of files, remember to try and keep things as simple as possible. Identify key fields and do not get carried away capturing too much retrieval data. Also, always keep your end users in mind: that’s the end users at the scanning and indexing stage as well as the end users searching for files. Sticking to these simple rules will ensure you deliver a file retrieval system that works efficiently, quickly and well for your end users and your business…

Tags: Application Design, digital documents, Document Capture, document management, ECM, image management, Imaging
Categories : Application Architecture, Application Design, Document Capture, document management, Document Scanning, ECM

Successful document capture…

14 05 2009

Well this is something close to my heart. My first ever project after leaving university was to help write a document capture application that was built on-top of the FileNET Panagon Capture platform. Ahh happy days…Though I did seem to earn the name “scan man” from then on, which wasn’t so great, as every document capture project our company then had, I had to be involved with….

Ok so how do you implement a successful document scanning / capture solution? Well it’s very simple: follow these 5 guidelines and you are well on the way.

Throughput is everything. Make sure people can load the scanner and let it do its thing. You don’t want to be stopping to separate documents or batches. Make sure your software can do this and purchase a scanner with a big document holder.
Ensure you maximise the quality of the images you are capturing. If this could be a problem, then make sure you get in place good quality control and re-scan technology
Identify as much information as possible up-front with your software. The more a user has to do, the slower and more expensive the process will become
Ensure your data captured or assigned to a document is accurate. Remember your retrieval of these images depends on the accuracy of your data capture
Your document capture is pointless, unless you release the images into your storage repository with all the correct information. Again make sure this is done seamlessly and accurately. The longer the files are in your capture process, the longer it will take for them to turn up in a customer file for example…

So where to start?

Well this is with your document capture software, and there are lots of solutions out there. Firstly, when choosing your capture software, have those 5 guidelines in your mind. You want to automate as much as possible (unless we are talking only the odd scanned document through the day). In addition, you don’t just want to watch a sales pitch on the actual scanning process, or the physical scanner being used. You want, and need, to see the process all the way through, and with a variety of documents.

It’s best if you can use forms wherever possible, but you will always have un-structured documents coming to you, such as letters. Now you MUST see a demonstration of how these are dealt with, then ask yourself;

“is that efficient?”

“how could that be speeded up?”

“am I happy with the way data is entered / captured?”

“now let’s find the document in the retrieval system”

I don’t want to start recommending software, as depending on your storage repository etc you may find you have a limited selection. What I will say is that for our workFile ECM repository we use software that I have been familiar with, and more than happy with, for some time: Kofax. I have worked on numerous projects with Kofax Ascent Capture and with Neurascript recognition modules (which are now part of Kofax). Kofax provides you with all the technology and features you could want to streamline any capture environment. And, more importantly, they allow you to write your own release processes into the repositories of your choice.

What about architecture

Scanning can be quite intensive for your PC. A while back, all of your “steps” if you like were carried out on a single machine, so you scanned, had the batches and documents recognised, processed, enhanced then sent on for an agent to index. However, this isn’t great, ideally you want to split out this intense processing work and let your scan station simply scan images.

Server based solutions are best, freeing up staff to scan and pull documents as and when they are ready. Your images should always be ready quicker than your staff can quality assess them or carry out indexing tasks. Oh, don’t be fooled by “thin” document capture, something has to drive the scanner and therefore it’s not “thin client”…

What about staff?

This can be a boring task, so rotate your staff to different jobs, every couple of hours. They may still get bored, but if you don’t do this, they will be making lots of errors and getting really bored. Trust me, just spend a couple of hours doing one task such as scanning and your brain can go numb…

You will also need a “champion” of the capture process. Someone who can keep people motivated and ensure they maximise the potential of the system. All too often the system capacity is not met as staff become lazy or complacent. This negates your investment and diminishes your return on it, so a champion is very important.

It’s also worth noting that from time to time, you will need someone with more experience of the scanning process, again that champion, simply because you will get issues with stuck paper, batches not getting recognised, image quality problems etc. At this point, you need someone with a little more knowledge of how things work.

Finally

Remember no matter how good your capture process is, your retrieval system is only as good as the quality of the images and the data associated to those images. Also, please don’t invest heavily in a great capture system then scrimp on your retrieval system. If you do this, you will find no benefit of the capture process and document imaging at all. Your first port of call is still ensuring you purchase the right retrieval / document management system. Then address the capture side of things.

Tags: digital documents, Document Capture, document management, ECM, image management, image processing, Imaging, scanning
Categories : Document Capture, Document Scanning
