Informatica acquires entity and schema matching AI start-up

Informatica, the leading enterprise cloud data management firm, has acquired GreenBay Technologies, an AI start-up that extends machine learning capabilities into matching data entities and the schemas that represent them.

The new capabilities will be integrated into Informatica’s offerings in the following areas:

  • Enterprise data catalogue
  • Master data management
  • Governance
  • Data integration
  • Privacy

The two companies already had a relationship: Informatica was GreenBay Technologies’ sole investor.

GreenBay Technologies uses machine learning for tasks such as matching customers or products across data sets that contain structured or unstructured data.

This can mean comparing the exact data in a particular field or extracting the relevant data from a block of text.
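The two cases above can be sketched in a few lines of Python. This is only an illustration, not GreenBay Technologies' method: the records, the `normalise`, `field_similarity` and `extract_email` helpers, the e-mail pattern and the 0.9 threshold are all assumptions, and simple string similarity stands in for the machine learning the company uses.

```python
import re
from difflib import SequenceMatcher
from typing import Optional

# Deliberately simple e-mail pattern (an assumption, not a full validator).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def normalise(value: str) -> str:
    """Lower-case and collapse punctuation and whitespace into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", value.lower()).strip()


def field_similarity(a: str, b: str) -> float:
    """Similarity between two structured field values, from 0.0 to 1.0."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def extract_email(text: str) -> Optional[str]:
    """Pull the first e-mail address out of a block of free text, if any."""
    found = EMAIL_RE.search(text)
    return found.group(0).rstrip(".").lower() if found else None


# Hypothetical records describing the same customer in three places.
crm_record = {"name": "Jane Doe", "email": "jane.doe@example.com"}
billing_record = {"name": "JANE  DOE", "email": "jane.doe@example.com"}
support_note = "Spoke with J. Doe (Jane.Doe@example.com) about her renewal."

# Structured case: compare the values of a particular field.
structured_match = field_similarity(crm_record["name"], billing_record["name"]) > 0.9

# Unstructured case: extract the value from text first, then compare.
text_match = extract_email(support_note) == crm_record["email"]
```

In this sketch both `structured_match` and `text_match` come out true, so all three sources would be treated as the same customer.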

Schema matching works a level above individual data entries, at the level of tables or objects, and maps columns that represent the same thing.

In essence, it is about forming relationships between data sources.

GreenBay Technologies will expand Informatica’s existing repertoire, which is loosely branded as the CLAIRE engine.

CLAIRE’s existing capabilities include:

  • Data domain inference
  • Business rules translation
  • Operational anomaly detection
  • Data transformation recommendations
  • Mass data correction

Although similar solutions have been presented before, GreenBay Technologies goes a step further on scale, with an approach capable of mapping thousands upon thousands of data sets.

Moreover, it accepts a much wider diversity of data.

A crowdsourced approach translates to performance improvements.

Schema matching goes well beyond comparing column names; it evaluates clues from different sources, such as:

  • Nearby columns
  • Data values
  • Documentation
  • Historical query patterns

This approach helps with discovering more meaningful data relationships.
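As a rough illustration of combining clues, the sketch below scores every pair of columns on two of the signals listed above, column names and data values, and keeps the best-scoring pair. The tables, the function names and the 0.4/0.6 weights are hypothetical; a real matcher would learn such weights and draw on further clues such as nearby columns, documentation and query history.

```python
from difflib import SequenceMatcher


def name_score(a: str, b: str) -> float:
    """Similarity of two column names, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def value_score(values_a, values_b) -> float:
    """Jaccard overlap of the distinct values held in two columns."""
    set_a, set_b = set(values_a), set(values_b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def match_columns(table_a, table_b, name_weight=0.4, value_weight=0.6):
    """Map each column of table_a to its best-scoring column in table_b."""
    mapping = {}
    for col_a, vals_a in table_a.items():
        mapping[col_a] = max(
            table_b,
            key=lambda col_b: name_weight * name_score(col_a, col_b)
            + value_weight * value_score(vals_a, table_b[col_b]),
        )
    return mapping


# Hypothetical tables, each given as {column name: sample values}.
crm = {
    "cust_email": ["a@x.com", "b@y.com"],
    "country": ["DE", "FR"],
}
billing = {
    "contact": ["b@y.com", "a@x.com", "c@z.com"],
    "ctry_code": ["FR", "DE", "US"],
}

mapping = match_columns(crm, billing)
# mapping == {"cust_email": "contact", "country": "ctry_code"}
```

Here `cust_email` is mapped to `contact` even though the names have little in common, because the values overlap. That is the kind of relationship a name-only comparison would miss.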

The crowdsourced consensus is a crucial part of identifying the best possible results.
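The article does not describe how that consensus is computed. One common way to combine several people's answers is a majority vote with a minimum level of agreement, sketched here under that assumption; the `consensus` function, the labels and the 0.6 threshold are hypothetical.

```python
from collections import Counter


def consensus(votes, min_agreement=0.6):
    """Return the majority label, or None if agreement is too weak."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None


# Three reviewers judge whether two columns represent the same thing.
decision = consensus(["match", "match", "no_match"])

# An even split does not reach the agreement threshold.
undecided = consensus(["match", "no_match"])
```

With these inputs `decision` is `"match"` and `undecided` is `None`, so a split vote could be sent back for more review rather than accepted.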

That said, the company did mention the possibility of introducing unsupervised techniques in the future.