During job configuration, you can use our API to retrieve metadata and statistics about a job's data. Each kind of information is identified by a label.

Available labels

The following labels are available, grouped by category:

Data Properties

  • columnNames: the names of columns in the data
  • dataLength: how many rows the data contains

Statistics

  • mean: the mean value of each column
  • median: the median value of each column
  • mode: the modal value of each column
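
To make the Data Properties and Statistics labels concrete, here is a minimal local sketch in Python of what each label describes. It does not call the API. The sample data, the function name summarize, and the shape of the returned dictionary are assumptions made for illustration, and the actual response format may differ.

```python
from statistics import mean, median, mode

# Hypothetical sample data: column name -> list of values (one per row).
data = {
    "age": [31, 45, 31, 52],
    "score": [7.5, 8.0, 6.5, 8.0],
}

def summarize(columns):
    """Compute local equivalents of the labels, keyed by label name."""
    return {
        "columnNames": list(columns),
        # Assumes every column has the same number of rows.
        "dataLength": len(next(iter(columns.values()))),
        "mean": {name: mean(values) for name, values in columns.items()},
        "median": {name: median(values) for name, values in columns.items()},
        "mode": {name: mode(values) for name, values in columns.items()},
    }

stats = summarize(data)
print(stats["mean"])    # per-column means
print(stats["median"])  # per-column medians
print(stats["mode"])    # per-column modal values
```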

Missing Values

  • missingValues: each row that contains a missing value
  • missingIndices: for each column, the indices of each row with a missing value
  • missingRatios: for each column, the proportion of missing values present
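
The three Missing Values labels can be illustrated with a small local sketch in Python. It does not call the API. It assumes that a missing value is represented as None, that indices are zero-based, and that a ratio is the number of missing values divided by the number of rows; the API may use different conventions. The function name missing_summary and the sample data are hypothetical.

```python
# Hypothetical sample data; None marks a missing value (an assumption).
data = {
    "name": ["Ada", None, "Grace", "Alan"],
    "age": [36, 41, None, None],
}

def missing_summary(columns):
    """Compute local equivalents of the Missing Values labels."""
    n_rows = len(next(iter(columns.values())))
    # For each column, the row indices where the value is missing.
    missing_indices = {
        name: [i for i, value in enumerate(values) if value is None]
        for name, values in columns.items()
    }
    # Rows that contain a missing value in at least one column.
    rows = sorted({i for indices in missing_indices.values() for i in indices})
    missing_values = [
        {name: values[i] for name, values in columns.items()} for i in rows
    ]
    # For each column, the share of rows with a missing value.
    missing_ratios = {
        name: len(indices) / n_rows for name, indices in missing_indices.items()
    }
    return {
        "missingValues": missing_values,
        "missingIndices": missing_indices,
        "missingRatios": missing_ratios,
    }

missing = missing_summary(data)
print(missing["missingIndices"])
print(missing["missingRatios"])
```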

Unique Values

  • uniqueValues: for each column, the list of unique values present in the column
  • uniqueRatio: for each column, the proportion of unique values present in the column
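
As a rough illustration of the Unique Values labels, the following Python sketch computes them locally without calling the API. It assumes that the ratio is the number of distinct values divided by the number of rows and that unique values are listed in first-seen order; neither detail is specified above. The function name unique_summary and the sample data are hypothetical.

```python
# Hypothetical sample data: column name -> list of values.
data = {
    "color": ["red", "blue", "red", "green"],
    "size": ["S", "S", "S", "S"],
}

def unique_summary(columns):
    """Compute local equivalents of the Unique Values labels."""
    # dict.fromkeys removes repeats while keeping first-seen order.
    unique_values = {
        name: list(dict.fromkeys(values)) for name, values in columns.items()
    }
    # Assumed definition: distinct values divided by row count.
    unique_ratio = {
        name: len(unique_values[name]) / len(values)
        for name, values in columns.items()
    }
    return {"uniqueValues": unique_values, "uniqueRatio": unique_ratio}

unique = unique_summary(data)
print(unique["uniqueValues"])
print(unique["uniqueRatio"])
```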

Duplicate Values

  • duplicateValues: for each column, the list of values that occur more than once
  • duplicateIndices: for each column, the indices of each row with a duplicate value
  • duplicateRatios: for each column, the proportion of duplicate values present
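
The Duplicate Values labels can be sketched locally in the same way. This Python example does not call the API. It assumes that every row whose value occurs more than once counts as a duplicate (including the first occurrence), that indices are zero-based, and that the ratio is the number of such rows divided by the total row count. The API may define these differently. The function name duplicate_summary and the sample data are hypothetical.

```python
from collections import Counter

# Hypothetical sample data: column name -> list of values.
data = {
    "email": ["a@x.io", "b@x.io", "a@x.io", "c@x.io"],
    "city": ["Oslo", "Oslo", "Oslo", "Rome"],
}

def duplicate_summary(columns):
    """Compute local equivalents of the Duplicate Values labels."""
    values_out, indices_out, ratios_out = {}, {}, {}
    for name, values in columns.items():
        counts = Counter(values)
        # Values that occur more than once, in first-seen order.
        repeated = [v for v in dict.fromkeys(values) if counts[v] > 1]
        # Every row holding a repeated value (first occurrence included).
        rows = [i for i, v in enumerate(values) if counts[v] > 1]
        values_out[name] = repeated
        indices_out[name] = rows
        ratios_out[name] = len(rows) / len(values)
    return {
        "duplicateValues": values_out,
        "duplicateIndices": indices_out,
        "duplicateRatios": ratios_out,
    }

duplicates = duplicate_summary(data)
print(duplicates["duplicateValues"])
print(duplicates["duplicateIndices"])
print(duplicates["duplicateRatios"])
```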

📘

Additional parameters for Duplicate Values API

When a Duplicate Values label is requested, the metadata API accepts an additional query parameter, subset, which selects the columns over which duplicate values are computed.
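
As a sketch of how such a request URL might be assembled, the Python example below builds a query string containing the subset parameter. Only the parameter name subset and the label duplicateValues come from this page. The base URL, the path layout, the job identifier, and the comma-separated encoding of column names are all assumptions; check the API reference for the real endpoint and format.

```python
from urllib.parse import urlencode

# Placeholder host, not the real API address (an assumption).
BASE_URL = "https://api.example.com"

def duplicate_values_url(job_id, label, subset):
    """Build a metadata request URL restricted to a subset of columns.

    The path layout and the comma-separated column list are hypothetical.
    """
    query = urlencode({"subset": ",".join(subset)})
    return f"{BASE_URL}/jobs/{job_id}/metadata/{label}?{query}"

url = duplicate_values_url("job-123", "duplicateValues", ["email", "city"])
print(url)
```

Note that urlencode percent-encodes the comma, so the column list appears as email%2Ccity in the resulting URL.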