Unstructured repository Vs Structured Repository

24 04 2009

I am talking here about Content Management repositories or Enterprise Content Management repositories (ECM).

This is a subject close to my heart; after all if I hadn’t started designing my own ECM repository a number of years ago with our technical director, I wouldn’t be here writing this blog…(tad cliché I know). For us, it was a passion to design the repository, and we felt that we could deliver a repository that met so many requirements at comparative low costs to customers…

I have spent sometime today wondering round some posts about repositories, structures, design principles, open source etc. Most of which I found quite interesting, however, as usual, these articles are all so very very technical. Masses of space are used to argue one particular implementation /API benefits over another, and don’t really look at real world solutions. Especially when talking about repository design or structure.

I have read quite a bit about how a repository designed with no “structure” specifications is far more flexible than one that requires structured specifications. For me, this is just not looking at the whole picture. I fear like so many IT articles, that technology and new methods of doing things are hiding the requirement for good, solid design.

A solid specification

Your business applications (ECM applications in this case) need to know exactly what sort of information is being requested by the user, after all, they know what they expect / want to find. Because of this, a good ECM repository MUST allow designers / administrators the flexibility to define different types or structures for content. Now I know there is an argument not to do this, rather specify query objects to locate the content, however, I find this a less elegant and messy approach. I also feel that this detracts from the number of important services the actual repository can offer, such as retention period specifications.

Even with structures being required by the ECM repository, it can still act as a generic repository and deliver equal flexibility. The flexibility to define different structures must, however, be extended to the point where each property of that structure can contain its own definition / structures. If your repository can do this, then all the flexibility in the world is at your finger tips, in a strict and managed fashion. For me this is a solid approach to designing and specifying a system that can grow with an enterprise and deliver great performance.

Query and Create

Whenever you query an ECM repository you should think about how the actual user will make the query, and what they will expect. You don’t want to be too strict with what they can and can’t do, but you don’t want to be so “loose” that you risk performance implications. With a structured specification approach, as laid out above, your repository provides you with the options to allow the user to structure their query as much, or as little as they choose. In this way, the user is informing the system of the resulting structures they expect. It also allows developers to build highly efficient queries for repository based applications.

When users need to create content within the management repository, this is where I see there could be an argument for not specifying types / structures for content. If you have a structured type expected, the user creating the content has to state values for the expected structure. This could be a little time consuming for users and a real issue if you have vast amounts of content that needs to be constantly added to your repository. Please note though, I don’t see this as an argument for unstructured design, rather an argument for good data capture processes and high levels of automation.

If you have a “known structure” being created in the repository, you have greater control over how the repository can work. For example, if you have known structures the following can all be known and therefore provide valued services:

· Encryption type for content

· Streaming rules (if required)

· Digital Rights / Licensing of content and media

· How and where to physically store the content in the repository

· Specific migration and retention periods.

Accessibility, Applications and API

ECM repositories often require integration with other applications and services. Because of this, they really need to deliver a good API. For me, a good repository delivers a full API. The API also needs to deliver flexibility to developers to use the repository as they see fit, though without compromising security and support of repository functionality. If your API does this, then developers will be able to develop in their own style and with the freedom they require. With a good full API, developers can leverage that repository in new ways, extending the value that repository can provide to an organisation. This is key to any organisation that understands the importance of ECM.

However, your API cannot be proprietary to a particular technology or platform these days. This is where web services really help. We ourselves have spent a great deal of time ensuring our workFile Vision repository delivers a full web service API, so much so, that all our own applications only interact with the repository through that API. We know that if all our applications work in this way, there is no integration requirement that cannot be met.

Conclusion?

So what have I tried to say in all this rambling. Well, if you are thinking of investing in ECM, make sure you look at how the repository is designed, what performance can it deliver and how scalable is it. Unstructured repositories will struggle to compete with well designed repositories that allow the structured specification of various types of content. Also make sure that your future requirements are not hampered by a lack of a flexible and full API.

If on the other hand, you are a developer / designer looking to create your own content management repository, make sure you think about what you want to achieve. What expectations you have. While an “unstructured” repository store sounds attractive and highly flexible, it may well cause you issues in the longer term, especially with retention periods! Also think about interoperability. If this is more than just a design / development exercise, you are going to need to provide an API, one which can be consumed by other technologies and platforms.

Parting shot…

In the end, a good repository design should provide users and developers alike, with as much or as little structure and flexibility as required…

Actions

Information

Date : April 24, 2009
Tags: Application Design, ECM, Software Architecture
Categories : Application Architecture, Application Design, ECM

2 responses

30 04 2009: Robor (09:50:48) :

Where are you from? Is it a secret? 🙂
Robor

Reply
11 05 2009: Andrew Smith @onedegree (16:19:42) :

I am situated just outside London, UK…

Reply

Andrew Smith CTO