Itemfield
Content Management—Data Transformation
Unstructured and semi-structured data mapping and transformation are Itemfield's raison d'être, and have enabled the run-time engine at the core of its ContentMaster offering to be embedded into IBM MQSeries, Microsoft BizTalk, Informatica PowerCenter and SAP NetWeaver, among others. The focus is on delivering libraries for industry-specific integration issues involving unstructured and semi-structured data formats. The forthcoming version of ContentMaster, due in March 2006, will sport a new IDE built on the Eclipse open source framework, among other additions.
ContentMaster offers something genuinely different from the database adapters, messaging technologies and Web services tools currently being used to integrate unstructured and semi-structured data. Database connectors – a traditional method of getting data formats to talk to each other – are hardwired, inflexible and proprietary; by contrast, ContentMaster has an edge in its ability to reuse data transformations.
Through technology partnerships, ContentMaster can run inside most of IBM's middleware, including MQSeries and the WebSphere application server. Itemfield also has an integration partnership with Oracle to integrate with 11i via its BPEL (Business Process Execution Language) server. In the PowerCenter scenario, ContentMaster brings data from message queues into the data integration platform.
ContentMaster's intrinsic value is its ability to do the 'heavy lifting' on transformations of various data formats on the unstructured/semi-structured side of life – although it also performs more basic database-to-database mapping functions. It effectively parses, disassembles and reassembles data, which can then be mapped to an XML format.
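To make the parse, disassemble and reassemble idea concrete, the following is a minimal Python sketch and not ContentMaster's own API or script language. It assumes a simplified tagged message in the style of a banking payment message, where each field starts with `:TAG:`. The tag-to-name table `FIELD_NAMES`, the sample message and both function names are hypothetical and chosen only for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical tag-to-element mapping; real message standards define many more tags.
FIELD_NAMES = {
    "20": "reference",
    "32A": "value_date_currency_amount",
    "50K": "ordering_customer",
}

# A field starts with ":TAG:" at the beginning of a line.
TAG_LINE = re.compile(r"^:(\w+):(.*)$")


def parse_tagged_message(text):
    """Disassemble a tagged message into (tag, value) pairs.

    Lines that do not start with a tag are treated as continuation
    lines of the previous field.
    """
    fields = []
    for line in text.strip().splitlines():
        match = TAG_LINE.match(line)
        if match:
            fields.append([match.group(1), match.group(2)])
        elif fields:
            fields[-1][1] += "\n" + line
    return [(tag, value) for tag, value in fields]


def to_xml(fields):
    """Reassemble the parsed fields as a flat XML document."""
    root = ET.Element("message")
    for tag, value in fields:
        element = ET.SubElement(root, FIELD_NAMES.get(tag, "field"))
        element.set("tag", tag)
        element.text = value
    return ET.tostring(root, encoding="unicode")


sample = """:20:REF12345
:32A:060301USD1000,00
:50K:ACME CORP
1 MAIN STREET"""

xml_out = to_xml(parse_tagged_message(sample))
print(xml_out)
```

The sketch keeps parsing and XML assembly in separate functions, so the same parsed fields could be mapped to a different target schema without touching the parser.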

ContentMaster works with a number of unstructured data formats including PDF files and Excel spreadsheets, as well as semi-structured sources including Acord, the data format used by the insurance industry, and the Swift codes used in the banking industry. Integration issues arise not only from the formats and specifications – or lack thereof – of semi-structured and unstructured data sources, but also from the fact that each version of a particular format is slightly different.
ContentMaster 3.2 can be broken down into three components: Studio, the Engine and the libraries. Studio is the GUI-based development tool that developers use to expose semi-structured and unstructured data sources and to create parsing scripts by example. The scripts are generated automatically once a developer has imported the format – a PDF file, for example – and shown the tool an example of how the file should be mapped into an XML format. The Engine is the run-time environment that runs the scripts. It is also the component that is embedded into third-party data integration tools such as NetWeaver. The libraries are predefined, out-of-the-box transformations for specific markets, covering formats including Swift, Acord, Mismo – an electronic commerce standard for the mortgage industry – and the electronic data interchange standard EDI X12.
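The split between a script derived from an example at design time and an engine that runs it can be illustrated with a toy Python sketch. This is not how Studio or the Engine is actually implemented; it assumes a simple line-oriented "Label: value" document, and the names `learn_spec` and `run_engine`, along with the sample invoices, are hypothetical.

```python
import xml.etree.ElementTree as ET


def learn_spec(example_text, example_values):
    """Design-time step: derive a script from one worked example.

    For each target XML element, find the line holding the example value
    and record the label text that precedes it.
    """
    spec = {}
    for element, value in example_values.items():
        for line in example_text.splitlines():
            position = line.find(value)
            if position > 0:
                spec[element] = line[:position]
                break
    return spec


def run_engine(spec, text):
    """Run-time step: apply a learned script to a new document."""
    root = ET.Element("document")
    for element, label in spec.items():
        for line in text.splitlines():
            if line.startswith(label):
                ET.SubElement(root, element).text = line[len(label):].strip()
                break
    return ET.tostring(root, encoding="unicode")


# The developer shows which values in the example map to which elements.
example = "Invoice No: 1001\nCustomer: Acme Corp\nTotal: 250.00"
spec = learn_spec(example, {"invoice_number": "1001", "total": "250.00"})

# The same script is then reused on documents the tool has not seen.
new_doc = "Invoice No: 2002\nCustomer: Globex\nTotal: 99.50"
result = run_engine(spec, new_doc)
print(result)
```

Because the learned script is plain data, it can be stored, versioned and handed to any host that embeds the run-time function, which mirrors the reuse argument made for the Engine and the prepackaged libraries.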
ContentMaster will always be a tool used by technically minded people, but the aim is to make integration tasks more business-oriented by relating each task to a business process. A number of libraries geared toward the insurance, healthcare and banking sectors are being delivered. These prepackaged libraries will include a bundle for version control of Acord data formats, for example.