Skip to main content

Column Mapping

Query Explanation

  • Purpose: The query, InferNormalizeMapping, is aimed at inferring the mapping between input columns from a source dataset and predefined output columns that adhere to a specified schema. This mapping is crucial for converting the data into a standardized format.
  • Operation: It takes a set of input columns, a list of example rows (data records), and a definition of the desired output columns. The query then analyzes the example rows to determine how the input columns correspond to the defined output columns.

Variables Explanation

  • exampleRows: These are sample data entries from the dataset being processed. Each entry is a JSON object representing a row of data, with key-value pairs corresponding to column names and their values.
  • outputColumns: This is a list of the desired output columns for the normalized data. Each output column is defined with a name and guidelines describing what the column should contain.
  • inputColumns: These are the names of the columns in the input dataset that need to be mapped to the output schema.

Response Explanation

  • mappingWithConfidence: The response contains mappings from the input columns to the most appropriate output columns, along with a confidence score for each mapping. The confidence score reflects the algorithm's certainty in its mapping decision.
    • For example, the "Company" column from the input data is mapped to the "Vendor ID" output column with a high confidence of approximately 0.989. This indicates that the Protege Engine is highly certain that the "Company" column corresponds to the "Vendor ID" in the desired output format.

Practical Implications

This functionality is particularly useful in scenarios where data from different sources must be integrated into a unified database or analytics platform. It automates the otherwise labor-intensive and error-prone task of manually mapping diverse datasets to a common schema, significantly enhancing efficiency and data quality.

The query showcases the power of the Protege Engine in handling and transforming unstructured data, leveraging AI to understand and map data elements between different formats based on the content and context provided by the examples and guidelines. This process is integral to data integration, enabling businesses to consolidate and utilize information from various sources more effectively.

Example Query

Query

query InferNormalizeMapping($outputColumns: [OutputColumnDefinitionInput!]!, $inputColumns: [String!]!, $exampleRows: [JSONObject!]!) {
inferNormalizeMapping(outputColumns: $outputColumns, inputColumns: $inputColumns, exampleRows: $exampleRows) {
mappingWithConfidence
}
}

Vars

{
"exampleRows": [
{
"Company": "ACME Corp",
"Date": "2024-01-15",
"Units in Warehouse": "100",
"Item": "Widget A",
"Price": "5000"
},
{
"Company": "BETA LLC",
"Date": "2024-01-16",
"Units in Warehouse": "150",
"Item": "Widget B",
"Price": "7500"
}
],
"outputColumns": [
{
"name": "Vendor ID",
"guidelines": "The ID of the vendor that submitted the file -- sometimes vendors are referred to as partners."
},
{
"name": "Product Manufacturer SKU",
"guidelines": "Only include the SKU of the product in the Product Description. If the product is a computer system, any other SKUs (i.e. part or sub-component columns) in the data should be placed in the 'Product Description' column."
},
{
"name": "Product Description",
"guidelines": "Include data about the product or its manufacturer, including the information about or SKUs of sub-components like CPU and GPU, in this column."
},
{
"name": "Quantity on Hand",
"guidelines": "The number of units of the product that the vendor has in stock."
},
{
"name": "Quantity on Order",
"guidelines": "The number of units of the product that the vendor has on order."
},
{
"name": "Date of Inventory Update",
"guidelines": "The date on which the vendor last updated the inventory information for the product."
}
],
"inputColumns": ["Company", "Date", "Units in Warehouse", "Item", "Price"]
}

Response

{
"data": {
"inferNormalizeMapping": {
"mappingWithConfidence": {
"Company": {
"outputColumn": "Vendor ID",
"confidence": 0.9891391539648242,
"predictionId": 1468
},
"Date": {
"outputColumn": "Date of Inventory Update",
"confidence": 0.9681885078817516,
"predictionId": 1469
},
"Units in Warehouse": {
"outputColumn": "Quantity on Hand",
"confidence": 0.9976204649195086,
"predictionId": 1470
},
"Item": {
"outputColumn": "Product Manufacturer SKU",
"confidence": 0.743513277820945,
"predictionId": 1471
},
"Price": {
"outputColumn": "Price",
"confidence": 0.4449232457050688,
"predictionId": 1472
}
}
}
}
}