FileNET Panagon Capture…How to…

25 02 2010

Ahhh now the inspiration behind today’s post is that I have noticed people finding my blog looking for the good old FileNET Panagon Capture objects – such as a RepServer, RepObject and how to unlock these components….

Now it has been a little while since I was programming in Panagon Capture, but this is the environment I first cut my teeth on when leaving uni. (Panagon Capture is a document capture environment for the FileNET Image Services and Doc Management repositories.) Panagon Capture has seen me working all over the UK, Ireland and parts of Europe implementing capture solutions for FileNET implementations. Coming straight from uni, it was a case of getting dropped in at the deep end, but I have to say I enjoyed it – and it was how I made a name for myself at my first place of work…

Things to remember with the Capture object model

OK, first things first: the Capture object model got slated in its early days. It was too confusing to pick up and many people struggled with it. However, I actually think it is quite elegant in places (sorry). So why did it get slated? Primarily because no matter what you are working with, you always have the same object – RepObject. If I am working with a particular scanned page / image, I have a RepObject. If I am working with a document, it’s a RepObject; a separator, a RepObject; a batch, a RepObject… So you can see it can get confusing…

In addition, it is also worth remembering that many of the features of Capture are ActiveX COM components (OCX controls). These are used to wrap up a bunch of functionality – typically the actual Scan process, Capture Path configuration, Document Processing options etc.

Capture out of the box

Now the Capture environment out of the box is OK; not great, just OK. It can get confusing when trying to use it in a real production environment – I will explain why in a moment. The key thing to remember here is to ensure Batches are the only objects you can see floating around at the root of the Capture environment. If you have images or documents there, then you are asking for trouble. In addition, separate all your capture paths into another folder (if you choose to use these – I recommend you don’t, to be honest – well, not in the way Capture encourages you to).

Always remember that Capture out of the box is a good tool for monitoring what is going on with your software if you are using the API to create your own FileNET capture applications. It does help, if only for logic checks.

The object model

In my early days working with Capture – it was hard to logically separate out functionality and implementations of classes etc. It was even harder to then put this in a way other developers could pick up quickly and easily. Because of this I decided to “wrap” up the Capture object model so that it logically made more sense to others in the company, and in addition to logically separate out functionality and instances of particular types of RepObjects (there is a nodeType property that helps identify the type of object you are working with e.g. Batch, Document). I strongly urge people to do this; it helps no end and makes developing your own Capture applications a lot easier. If you don’t have time to do this – or the in-house skills, perhaps look at purchasing a “toolkit” that an old FileNET VAR may have written. My old toolkit is probably still in circulation, but it is written in COM. If anyone wants it, I can put you in touch with the company that owns the IPR to it (an old employer).
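As a rough illustration of the wrapping idea, here is a minimal Python sketch. The `RepObject` class below is a stand-in made up for the example, not the real Capture COM object; only the idea of a node type property comes from the description above, and the node type values and wrapper class names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class RepObject:
    """Stand-in for a Capture RepObject (hypothetical, not the real COM API)."""
    name: str
    node_type: str  # assumed values: "batch", "document", "image"
    children: list = field(default_factory=list)


class Node:
    """Base class for the typed wrappers."""

    def __init__(self, rep: RepObject):
        self.rep = rep

    @property
    def name(self) -> str:
        return self.rep.name


class Image(Node):
    pass


class Document(Node):
    @property
    def images(self) -> list:
        return [wrap(c) for c in self.rep.children if c.node_type == "image"]


class Batch(Node):
    @property
    def documents(self) -> list:
        return [wrap(c) for c in self.rep.children if c.node_type == "document"]


_WRAPPERS = {"batch": Batch, "document": Document, "image": Image}


def wrap(rep: RepObject) -> Node:
    """Pick the wrapper class from the node type, once, in one place."""
    try:
        return _WRAPPERS[rep.node_type](rep)
    except KeyError:
        raise ValueError(f"unknown node type: {rep.node_type!r}") from None


page = RepObject("page-1", "image")
doc = RepObject("doc-1", "document", [page])
batch = wrap(RepObject("batch-1", "batch", [doc]))
```

The point of the design is that the "what kind of RepObject is this?" check lives in `wrap` only; everything else in the application works with `Batch`, `Document` and `Image`.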

Wrapping up the Capture object model into your own makes life a lot easier, especially for things like identifying types of objects, as your own object model should have objects such as “Batch”, “Document”, “Image”, “Server” etc. These objects can then logically contain relevant information and functions. A good example is status. Unfortunately you cannot unlock batches when they are being processed (unless you are an admin user). This means you need to check the status of a batch to see if it can be unlocked. Within your own object model this is easy and need only be written and wrapped once (you see why life can get easier with your own object model). This pays off in a real-world environment, where your capture environment is a workflow in itself.
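To make the status example concrete, here is a small sketch of a wrapper that answers "can this batch be unlocked?" in one place. The status names and the `Batch` class are hypothetical; a real wrapper would read the status and lock state from Capture rather than hold them in memory.

```python
from enum import Enum


class BatchStatus(Enum):
    # Hypothetical status values; a real deployment would define its own.
    SCANNED = "scanned"
    PROCESSING = "processing"
    READY_TO_INDEX = "ready_to_index"
    INDEXED = "indexed"


class Batch:
    """Wrapper that knows its own status and whether it may be unlocked."""

    def __init__(self, name: str, status: BatchStatus, locked: bool = False):
        self.name = name
        self.status = status
        self.locked = locked

    def can_unlock(self, is_admin: bool = False) -> bool:
        # Nothing to unlock if the batch is not locked.
        if not self.locked:
            return False
        # A batch being processed can only be unlocked by an admin user.
        return is_admin or self.status is not BatchStatus.PROCESSING

    def unlock(self, is_admin: bool = False) -> None:
        if not self.can_unlock(is_admin):
            raise PermissionError(f"{self.name} cannot be unlocked right now")
        self.locked = False
```

Every application built on the wrapper then calls `can_unlock` instead of repeating the status logic.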

Separate out the capture environment

Many people still use capture paths; I suggest you minimise their use as much as possible. They are fiddly and troublesome, to say the least. First things first: scanning and document recognition, assembly etc. should not be done on the same machine (though Capture suggests they should). Separate out the pure scan function from document processing activities – allow the scan station to only scan, nothing more. Remember scan stations are expensive, and the big benefit of expensive scanners is throughput. You cannot afford to have the machine’s processing power wasted on other tasks…

Document processing activities (such as splitting images into documents and batches, image enhancement etc.) should all happen off the scan station. So ensure you get a background service or application in place on a dedicated machine that does this job. This process will be critical to the success of your implementation – so test, test, test, test and carry out some more testing.
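One of the jobs such a background service might do is document assembly. A minimal sketch, assuming each scanned page arrives as a `(name, is_separator)` pair and that a separator sheet marks the start of a new document (both are assumptions for the example, not Capture behaviour):

```python
def split_into_documents(pages):
    """Group scanned pages into documents, dropping the separator sheets."""
    documents = []
    current = []
    for name, is_separator in pages:
        if is_separator:
            # A separator closes the document in progress, if there is one.
            if current:
                documents.append(current)
            current = []
        else:
            current.append(name)
    if current:
        documents.append(current)
    return documents


scanned = [("sep", True), ("p1", False), ("p2", False), ("sep", True), ("p3", False)]
result = split_into_documents(scanned)  # -> [['p1', 'p2'], ['p3']]
```

Because this runs away from the scan station, it can be tested in isolation against recorded batches, which helps with the "test, test, test" part.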

Indexing is a critical part of capture. If you are slow here, you really have a negative impact on system performance. In addition, if you are sloppy and data is not correct, you will have a negative impact on the whole retrieval system and its ability to meet business requirements. Things to remember are that you may be working with different classes of documents, and that you may need to pull in validation from external systems, so indexing applications can prove tricky. On top of this, you may well be releasing images into a workflow system – so data that is not going to be stored as index properties may also need to be captured… If you have your own object model, all of this becomes a hell of a lot easier…
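As a sketch of what per-class index validation could look like in your own object model, the example below checks entered values against a class definition. The `Invoice` class, its fields and the validators are invented for illustration; in practice the definitions would come from your FileNET document classes and any external validation systems.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class IndexField:
    name: str
    required: bool = True
    validator: Callable[[str], bool] = lambda value: True


# Hypothetical document classes and fields, for illustration only.
DOCUMENT_CLASSES: Dict[str, List[IndexField]] = {
    "Invoice": [
        IndexField("invoice_number", validator=str.isdigit),
        IndexField("supplier"),
        IndexField("notes", required=False),
    ],
}


def validate_index_values(doc_class: str, values: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the values are acceptable."""
    errors = []
    for index_field in DOCUMENT_CLASSES[doc_class]:
        value = values.get(index_field.name, "").strip()
        if not value:
            if index_field.required:
                errors.append(f"{index_field.name}: required")
        elif not index_field.validator(value):
            errors.append(f"{index_field.name}: invalid value {value!r}")
    return errors
```

An indexing screen can call this on every save, so bad data is caught before it reaches the retrieval system.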

A good tip – ensure your scanner operators only ever put documents of the same classification in a batch. Sounds obvious, but far too often this is overlooked. It is hard to change a document’s class once it has been scanned, trust me…

Extend the object model

The Capture object model does allow for attributes to be placed on objects. This means you can extend your own object model with properties and store these as attributes on a RepObject. I have seen others decide to implement their own database to do this, but that is just a massive overhead, and why bother when you have all that you need in Capture? In addition, when testing, it is so easy to look at RepObject attributes in Capture itself.

For particular requirements, extending the object model is a great way of attaching data that won’t be stored in the retrieval system, but may be required for other purposes (either to help index agents, or to trigger workflow systems, integration with other LOBs).
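Here is a minimal sketch of properties backed by attributes. The `RepObject` below is a stand-in with a simple string attribute store, and `set_attribute` / `get_attribute` are names invented for the example, not the real Capture calls; the property names are likewise hypothetical.

```python
class RepObject:
    """Stand-in for a Capture RepObject with a string attribute store (hypothetical)."""

    def __init__(self):
        self._attributes = {}

    def set_attribute(self, key, value):
        self._attributes[key] = str(value)

    def get_attribute(self, key, default=None):
        return self._attributes.get(key, default)


class Document:
    """Wrapper whose extra properties live as attributes on the RepObject."""

    def __init__(self, rep):
        self.rep = rep

    @property
    def customer_ref(self):
        return self.rep.get_attribute("customer_ref")

    @customer_ref.setter
    def customer_ref(self, value):
        self.rep.set_attribute("customer_ref", value)

    @property
    def needs_review(self):
        return self.rep.get_attribute("needs_review", "false") == "true"

    @needs_review.setter
    def needs_review(self, value):
        self.rep.set_attribute("needs_review", "true" if value else "false")


doc = Document(RepObject())
doc.customer_ref = "C-1001"
doc.needs_review = True
```

Because the data travels with the object itself, any station that later wraps the same RepObject sees the same values without a separate database.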

Another key area in which to extend the object model is locking. Basically, when an item is being worked on, it is locked by Capture. However, you need to take control of this, as again it can get messy – with batches getting left in a locked state etc. In your object model I strongly suggest you explicitly lock an object when you need to and explicitly unlock it when you have finished with it. Also, if you have a good “status” set up, this makes life easier when checking whether you can or cannot work on an object. At the indexing and document processing stages, this is crucial…
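One way to make explicit locking hard to get wrong is to pair the lock and unlock in a single construct. The sketch below uses a Python context manager; the `Batch` class and its `lock` / `unlock` methods are stand-ins, and in a real toolkit they would call through to Capture.

```python
from contextlib import contextmanager


class Batch:
    """Stand-in batch wrapper; lock/unlock would call Capture in a real toolkit."""

    def __init__(self, name):
        self.name = name
        self.locked_by = None

    def lock(self, user):
        if self.locked_by is not None:
            raise RuntimeError(f"{self.name} is already locked by {self.locked_by}")
        self.locked_by = user

    def unlock(self, user):
        if self.locked_by != user:
            raise RuntimeError(f"{self.name} is not locked by {user}")
        self.locked_by = None


@contextmanager
def locked(batch, user):
    """Lock the batch for the duration of the block and always release it."""
    batch.lock(user)
    try:
        yield batch
    finally:
        batch.unlock(user)


batch = Batch("batch-42")
with locked(batch, "indexer1"):
    pass  # index or process the batch here
```

Even if the work inside the block raises an error, the `finally` clause releases the lock, which is exactly the "batches left locked" problem described above.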

Success in a nutshell…

Wrap up the Capture API, extend the object model with your own properties that utilise attributes, add your own functions to your logical components and explicitly take control of things such as locking. Once you have this type of API in place, splitting out scanning from document processing and from image enhancement is easy. It is also a lot easier to then implement good indexing applications (or one that can do everything) that promote quick working and integrate with validation components and other LOBs. Releasing the captured images into the actual repository can also be separated out, freeing up processing on the index station or on QA (if you have this in place).

If you do all of this, your Capture environment will be very successful and flexible enough to meet all your needs. If at a later date you want to plug in third-party features (such as ICR or something similar), you can. You can do this elegantly too, by storing the data from the third-party component as further attributes on your object (probably a document). You can then pick these up at your indexing station or anywhere else in the capture path and use them accordingly…

If you want help with Capture feel free to contact me directly. I still provide consultancy for this environment and am always happy to help…

6 responses

25 02 2010
Max J. Pucher

Thanks for an elaborate description on how painful these old products are!

Yes, the object-oriented model is great, as long as you don’t have to write code to make it work. That is the concept behind the Papyrus Platform. The best of the OO world with FULL flexibility to create the OO model that you need, all without programming and still centrally change managed and deployed through the Papyrus WebRepository to our distributed peer-to-peer production servers. All with RIA-web or PC-GUI frontends defined in similar OO models.

So if you really want to do something for your content and process management users then dump this old stuff and do something modern …

25 02 2010
Andrew Smith @onedegree

I cannot see that happening. Far too idealistic… Let’s face it, organisations have lots of applications out there that use COBOL; Visual Basic 5 even looks modern in some places… And it is because it still works for the business very well, and because they invested heavily in that platform. To move away from it and spend yet more money means there really has to be a big business case for doing so… and I mean vast differences and benefits with ROIs over short periods… That’s a tough sell…

Businesses will have invested a lot of money in the FileNET platform and would need something unbelievably, vastly better to move away from it (and I can’t blame them for not moving away – it’s a good platform once you get it working).

From the end user’s point of view, there is nothing wrong with the Capture environment – it works very well for users, very efficiently and very reliably. The headache is at the development end (if you choose) or config / implementation. That’s why there were VARs who made this easy for companies using FileNET.

23 04 2010
Uma

Hi there,
I need some idea of how indexing is done in FileNET. Actually I am very new to FileNET but must have some basic idea of how the documents are stored there. Is there an internal database that is used to save all the file information, or something like that? Could you please help me?

Thanks
Uma

26 04 2010
Andrew Smith @onedegree

Hi Uma,

All content stored in FileNET is “indexed” and these indexes are built within a class. In this case, an index is a retrievable field, not an index in the same sense that a database is indexed. Think of indexes as properties… These indexes (properties) are part of a particular class / classification. Yes, this data is held inside a database for FileNET, but you cannot access it directly; you must use their API or native products. When “indexing”, what you mean is that a user is entering the data into the index fields for a particular piece of content. These index values can then be used as part of a query to locate that file at a later date… Indexing can be done in a number of ways; it all depends on your own solution and which version of FileNET you are running (P8, Image Services, using Panagon??)
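To picture "indexes as properties", here is a toy sketch in plain Python. It does not use any FileNET API; the class name, the property names and the in-memory list are all invented purely to show how index values on a class are later used in a query.

```python
# Each stored item belongs to a class and carries that class's index values.
stored_documents = [
    {"class": "Invoice", "invoice_number": "1001", "supplier": "Acme"},
    {"class": "Invoice", "invoice_number": "1002", "supplier": "Globex"},
    {"class": "Letter", "customer_ref": "C-1001"},
]


def find_documents(documents, doc_class, **criteria):
    """Return documents of the given class whose index values match the criteria."""
    return [
        d for d in documents
        if d["class"] == doc_class
        and all(d.get(key) == value for key, value in criteria.items())
    ]


matches = find_documents(stored_documents, "Invoice", supplier="Acme")
```

The real system does the same kind of lookup through its own API and database; the sketch only shows the relationship between a class, its index properties and a query.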

Hope that helps…

31 08 2011
Jorge Osejo

Hi Andrew; I have been asked to write a couple of capture applications to import TIFF and multi-page TIFF images; now they want me to write an application that will bring Word, Excel, PDF and other formats into FileNET using capture import objects; I started with a Word document; the batch was created and the Word file appears under the batch; I got some errors but I think this is doable for any other file format. Do you have any advice you can provide me with? I had to stop because I was using prod for testing this and one of the errors logged some things in the elogs… thanks a lot,

jo

1 09 2011
Andrew Smith @onedegree

Hi Jorge,

To be honest, I wouldn’t use the Capture environment for capturing electronic documents such as Word, Excel, PDF etc. I would use Panagon Desktop objects for this, as the object model is a lot simpler to deal with and, on top of that, you don’t need to be using elements from a typical capture path (such as document processing, image enhancement etc.). So, I suggest you import the Word docs, for example, using the desktop objects. Here you can set the index properties for the document either in code or via your own application and then save / commit the document to FileNET. Typically what you write here you will re-use with other desktop applications, allowing the user to import documents from their desktop / off the network.

If, though, you are set on using the Capture objects, I would suggest that you have to set the node objects correctly (so a document node is identified within a batch node etc.). Assign your document to the corresponding document node and then make sure you set its phase / step to indexing. (You will also have had to declare its class before that.) You should then be able to index the document as if it were scanned and follow the capture path as normal (committal).
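The order of those steps can be sketched as follows. This is not the Capture API: `Node`, `set_phase` and `import_file` are hypothetical names, and the example only models the sequence (batch node, document node, class, then phase).

```python
class Node:
    """Stand-in for a batch or document node (hypothetical, not the Capture API)."""

    def __init__(self, node_type, name, parent=None):
        self.node_type = node_type
        self.name = name
        self.children = []
        self.doc_class = None
        self.phase = "new"
        self.content = None
        if parent is not None:
            parent.children.append(self)

    def set_phase(self, phase):
        # The class must be declared before the document moves to indexing.
        if phase == "indexing" and self.doc_class is None:
            raise ValueError("declare the document class before moving to indexing")
        self.phase = phase


def import_file(batch, path, doc_class):
    """Create a document node under a batch and hand it to the indexing step."""
    if batch.node_type != "batch":
        raise ValueError("documents must be created under a batch node")
    doc = Node("document", path, parent=batch)
    doc.content = path          # assign the file to the document node
    doc.doc_class = doc_class   # declare the class first
    doc.set_phase("indexing")   # then move it on to indexing
    return doc


batch = Node("batch", "imports")
doc = import_file(batch, "letter.doc", "Correspondence")
```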

Hope that helps
