Ahhh now the inspiration behind today’s post is that I have noticed people finding my blog looking for the good old FileNET Panagon Capture objects – such as a RepServer, RepObject and how to unlock these components….
Now it has been a little while since I was programming in Panagon Capture, but this is the environment I first cut my teeth on when leaving uni. (Panagon Capture is a document capture environment for the FileNET Image Services and Doc Management repositories.) Panagon Capture has seen me working all over the UK, Ireland and parts of Europe implementing capture solutions for FileNET implementations. Coming straight from uni, it was a case of getting dropped in the deep end, but I have to say I enjoyed it – and it was how I made a name for myself at my first place of work…
Things to remember with the Capture object model
Ok, first things first: the Capture object model got slated in its early days. It was too confusing to pick up and many people struggled with it. However, I actually think it is quite elegant in places (sorry). So why did it get slated? Primarily because no matter what you are working with, you always have the same object – RepObject. If I am working with a particular scanned page / image, I have a RepObject. If I am working with a document, it’s a RepObject; a separator, a RepObject; a batch, a RepObject… So you can see how it can get confusing…
In addition, it is also worth remembering that many of the features of Capture are ActiveX COM components (OCX controls). These are used to wrap up a bunch of functionality – typically the actual Scan process, Capture Path configuration, Document Processing options etc.
Capture out of the box
Now the Capture environment out of the box is ok, not great, ok. It can get confusing when trying to use it in a real production environment – I will explain why in a moment. The key thing to remember here is to ensure Batches are the only objects you can see floating around from the root of the Capture environment. If you have images or documents there, then you are asking for trouble. In addition, separate all your capture paths into another folder (if you choose to use these – I recommend you don’t, to be honest, or at least not in the way Capture encourages you to).
Always remember that Capture out of the box is a good tool for monitoring what is going on with your software if you are using the API to create your own FileNET capture applications. It does help, if only for logic checks.
The object model
In my early days working with Capture – it was hard to logically separate out functionality and implementations of classes etc. It was even harder to then put this in a way other developers could pick up quickly and easily. Because of this I decided to “wrap” up the Capture object model so that it logically made more sense to others in the company, and in addition to logically separate out functionality and instances of particular types of RepObjects (there is a nodeType property that helps identify the type of object you are working with e.g. Batch, Document). I strongly urge people to do this; it helps no end and makes developing your own Capture applications a lot easier. If you don’t have time to do this – or the in-house skills, perhaps look at purchasing a “toolkit” that an old FileNET VAR may have written. My old toolkit is probably still in circulation, but it is written in COM. If anyone wants it, I can put you in touch with the company that owns the IPR to it (an old employer).
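To make the wrapping idea concrete, here is a minimal sketch in Python rather than COM. Everything in it is an assumption for illustration: `FakeRepObject` is a stand-in so the code runs without Capture, the `node_type` field only mirrors the nodeType property mentioned above, and the node type values are made up rather than taken from the real API.

```python
from dataclasses import dataclass, field

# Hypothetical node type values; the real Capture constants will differ.
NODE_BATCH = "batch"
NODE_DOCUMENT = "document"
NODE_IMAGE = "image"


@dataclass
class FakeRepObject:
    """Stand-in for a Capture RepObject so the sketch runs without Capture."""
    name: str
    node_type: str
    children: list = field(default_factory=list)


class Node:
    """Base wrapper: holds the underlying RepObject and exposes common bits."""

    def __init__(self, rep_object):
        self._rep = rep_object

    @property
    def name(self):
        return self._rep.name

    def children(self):
        # Every child comes back as a typed wrapper, never a raw RepObject.
        return [wrap(child) for child in self._rep.children]


class Image(Node):
    pass


class Document(Node):
    def images(self):
        return [c for c in self.children() if isinstance(c, Image)]


class Batch(Node):
    def documents(self):
        return [c for c in self.children() if isinstance(c, Document)]


_WRAPPERS = {NODE_BATCH: Batch, NODE_DOCUMENT: Document, NODE_IMAGE: Image}


def wrap(rep_object):
    """Pick the wrapper class from the node type, the one place this is decided."""
    try:
        wrapper = _WRAPPERS[rep_object.node_type]
    except KeyError:
        raise ValueError(f"Unsupported node type: {rep_object.node_type!r}")
    return wrapper(rep_object)
```

The point of the design is that the node type check lives in `wrap` and nowhere else; the rest of your application only ever sees a `Batch`, a `Document` or an `Image`.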
By wrapping up the Capture object model into your own, it makes life a lot easier, especially for things like identifying types of objects, as your own object model should have objects such as “Batch”, “Document”, “Image”, “Server” etc. These objects can then logically contain relevant information and functions. A good example is status. Unfortunately you cannot unlock batches when they are being processed (unless you are an admin user). This means you need to check a status of a batch to see if it can be unlocked. Within your own object model this is easy and needs only be written and wrapped once (you see why life can get easier with your own object model). This makes life a lot easier in a real world environment when your capture environment is a workflow in itself.
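As a rough sketch of the status example, written in Python for illustration only: the status values, the `locked` flag and the `is_admin` parameter are all hypothetical names of my own, not part of the Capture API. The only rule taken from the text above is that a batch being processed cannot be unlocked unless you are an admin user.

```python
from enum import Enum


class BatchStatus(Enum):
    # Hypothetical statuses; use whatever stages your capture workflow has.
    SCANNED = "scanned"
    PROCESSING = "processing"
    INDEXED = "indexed"


class Batch:
    """Wrapper-level batch that knows its own status and lock state."""

    def __init__(self, name, status, locked=False):
        self.name = name
        self.status = status
        self.locked = locked

    def can_unlock(self, is_admin=False):
        if not self.locked:
            return False  # nothing to unlock
        # Batches being processed can only be unlocked by an admin user.
        return is_admin or self.status is not BatchStatus.PROCESSING

    def unlock(self, is_admin=False):
        if not self.can_unlock(is_admin):
            raise RuntimeError(f"Batch {self.name!r} cannot be unlocked right now")
        self.locked = False
```

Written once in the wrapper, the check is then a single call (`batch.can_unlock()`) from the scan, processing and index applications alike.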
Separate out the capture environment
Many people still use capture paths; I suggest you minimise their use as much as possible. They are fiddly and troublesome to say the least. First things first, scanning and document recognition, assembly etc should not be done on the same machine (though Capture suggests it should). Separate out the pure scan function from document processing activities – allow the scan station to only scan, nothing more. Remember scan stations are expensive and the big benefit of expensive scanners is throughput. You cannot afford to have the machine’s processing power wasted on other tasks…
Document processing activities (such as splitting images into documents and batches, image enhancement etc) should all happen off the scan station. So ensure you get a background service or application in place on a dedicated machine that does this job. This process will be critical to the success of your implementation – so test, test, test, test and carry out some more testing.
Indexing is a critical part of capture. If you are slow here, you really have a negative impact on system performance. In addition, if you are sloppy and data is not correct, you will have a negative impact on the whole retrieval system and its capability to meet business requirements. Things to remember are that you may be working with different classes of documents. You may also need to pull in validation from external systems, so Indexing applications can prove tricky. On top of this, you may well be releasing images into a workflow system – so data that is not going to be stored as index properties may also need to be captured… If you have your own object model, all of this becomes a hell of a lot easier…
A good tip – ensure your scanners always put only a single classification of documents in each batch. Sounds obvious, but far too often this is overlooked. It is hard to change a document’s class once it has been scanned, trust me…
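If you want a safety net for this, a background check along these lines can flag mixed batches before they reach indexing. It is only a sketch in Python, and it assumes you can already get each document’s class name as a plain string (how you read that from Capture is not shown; the batch and class names below are invented).

```python
def find_mixed_batches(batches):
    """Return the batches that hold more than one document class.

    `batches` maps a batch name to the list of document class names found
    in it (a hypothetical shape chosen for this sketch).
    """
    mixed = {}
    for batch_name, doc_classes in batches.items():
        distinct = sorted(set(doc_classes))
        if len(distinct) > 1:
            mixed[batch_name] = distinct
    return mixed


if __name__ == "__main__":
    scanned = {
        "Batch-001": ["Invoice", "Invoice", "Invoice"],
        "Batch-002": ["Invoice", "Claim", "Invoice"],
    }
    for name, classes in find_mixed_batches(scanned).items():
        print(f"{name} mixes classes: {', '.join(classes)}")
```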
Extend the object model
The Capture object model does allow for attributes to be placed on objects. This means you can extend your own object model with properties and store these as attributes on a RepObject. I have seen others decide to implement their own database to do this; however, that is just a massive overhead, and why bother when you have all that you need in Capture? In addition, when testing it is so easy to look at RepObject attributes in Capture itself.
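One way to picture properties backed by attributes is the sketch below, in Python for illustration. The `attributes` dictionary on `FakeRepObject` is a stand-in I have assumed for Capture’s attribute store, and the attribute names (`CustomerRef`, `WorkflowQueue`) are invented; the real calls for reading and writing attributes are not shown here.

```python
class FakeRepObject:
    """Stand-in for a RepObject; `attributes` is an assumed name/value store."""

    def __init__(self):
        self.attributes = {}


class AttributeProperty:
    """Descriptor that keeps a wrapper property in a RepObject attribute."""

    def __init__(self, attribute_name, default=None):
        self.attribute_name = attribute_name
        self.default = default

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance._rep.attributes.get(self.attribute_name, self.default)

    def __set__(self, instance, value):
        # Stored as text so the value is easy to inspect when testing.
        instance._rep.attributes[self.attribute_name] = str(value)


class Document:
    # Hypothetical extra properties that never reach the retrieval system.
    customer_ref = AttributeProperty("CustomerRef")
    workflow_queue = AttributeProperty("WorkflowQueue", default="Unassigned")

    def __init__(self, rep_object):
        self._rep = rep_object
```

Because the values live on the object itself, there is no second database to keep in step with the batch.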
For particular requirements, extending the object model is a great way of attaching data that won’t be stored in the retrieval system, but may be required for other purposes (either to help index agents, or to trigger workflow systems, integration with other LOBs).
Another key area to extend the object model is that of locking. Basically, when an item is being worked on it is locked by Capture. However, you need to take control of this, as again it can get messy – with batches getting left in a locked state etc. In your object model I strongly suggest you explicitly lock an object when you need to, and explicitly unlock it when you are finished with it. Also, if you have a good “status” set up, this makes life easier when checking if you can or cannot work on an object. At the Indexing stage and document processing stage, this is crucial…
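A minimal sketch of explicit lock and unlock, again in Python and again with made-up names: `FakeRepObject`, its `locked_by` field and the `LockError` exception are assumptions for the example, not Capture’s own locking calls. The idea it shows is simply that the unlock always runs, even when the work in between fails.

```python
from contextlib import contextmanager


class LockError(RuntimeError):
    """Raised when an object is already locked by someone else."""


class FakeRepObject:
    """Stand-in for a RepObject with a hypothetical lock owner field."""

    def __init__(self, name):
        self.name = name
        self.locked_by = None


@contextmanager
def locked(rep_object, user):
    """Lock on entry, always unlock on exit, even if the work fails."""
    if rep_object.locked_by is not None:
        raise LockError(f"{rep_object.name} is locked by {rep_object.locked_by}")
    rep_object.locked_by = user
    try:
        yield rep_object
    finally:
        rep_object.locked_by = None


if __name__ == "__main__":
    batch = FakeRepObject("Batch-001")
    with locked(batch, "index-station-1"):
        pass  # index the batch here
```

Wrapping the work this way is one answer to batches getting left locked after an indexing application crashes halfway through.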
Success in a nutshell…
Wrap up the Capture API, extend the object model with your own properties that utilise attributes, add your own functions to your logical components and explicitly take control of things such as locking. Once you have this type of API in place, splitting out scanning from document processing and from image enhancement is easy. It is also a lot easier to then implement good indexing applications (or one that can do everything) that promote quick working and integrate with validation components and other LOBs. Releasing the captured images into the actual repository can also be separated, freeing up processing on the index station or from QA (if you have this in place).
If you do all of this, your Capture environment will be far more likely to succeed and flexible enough to grow with your needs. If at a later date you want to plug in third party features (such as ICR or something similar), you can. You can do this elegantly too, by storing the data from the third party component as further attributes on your object (probably a document). You can then pick these up at your indexing station or anywhere in the capture path and use them accordingly…
If you want help with Capture feel free to contact me directly. I still provide consultancy for this environment and am always happy to help…