
Job Configuration API

After you start a job, you can call these optional APIs to view and update the job's configuration before running it:

Sheet Selection

The Sheet Selection API returns the currently selected sheet (in an Excel workbook) and lets you configure which sheet to select.

getSelectedSheet

To get the selected sheet and a list of available sheets:

const sheetSelectionResponse = segna.getSheetSelection(jobId);

Example Response:

{
"sheetSelection": {
"selected": "Sheet Name 1",
"confidence": 0.5,
"sheets": [
"Sheet Name 1",
"Sheet Name 2",
"Sheet Name 3"
]
}
}

updateSelectedSheet

To configure the selected sheet:

segna.updateSheetSelection(jobId, {
sheetSelection: {
selected: "Sheet Name 2"
}
});
warning

You can only select from the sheets listed under the sheets key in the getSheetSelection response.
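
One way to respect this constraint on the client is to check the chosen name against the sheets array before sending the update. The sketch below is illustrative only: buildSheetSelectionUpdate is a hypothetical helper, not part of the Segna SDK, and it assumes nothing beyond the response shape shown above.

```javascript
// Hypothetical helper (not part of the Segna SDK): builds the payload for
// updateSheetSelection and rejects names missing from the `sheets` list.
function buildSheetSelectionUpdate(sheetSelectionResponse, sheetName) {
  const { sheets } = sheetSelectionResponse.sheetSelection;
  if (!sheets.includes(sheetName)) {
    throw new Error(
      `Unknown sheet "${sheetName}". Available sheets: ${sheets.join(", ")}`
    );
  }
  return { sheetSelection: { selected: sheetName } };
}

// Sample data copied from the example response above.
const exampleSheetResponse = {
  sheetSelection: {
    selected: "Sheet Name 1",
    confidence: 0.5,
    sheets: ["Sheet Name 1", "Sheet Name 2", "Sheet Name 3"],
  },
};

const sheetUpdate = buildSheetSelectionUpdate(exampleSheetResponse, "Sheet Name 2");
```

The returned object has the same shape as the second argument of the updateSheetSelection call shown earlier.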



Fields

The Fields API returns the current mapping between the input data columns and the desired output columns (also known as fields).

getFields

const fieldsResponse = segna.getFields(jobId);

Example Response:

{
"expectedFields": ["field 1", "field 2"],
"optionalFields": ["optionalField1"],
"inputColumns": ["column 1", "column 2", "column 3"],
"columnFieldMapping": {
"dataSource 1": {
"column 1": {
"field": "field 1",
"confidence": 0.7
},
"column 2": {
"field": "field 2",
"confidence": 0.6
}
}
}
}
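
If confidence is read as a score between 0 and 1 (an assumption; the scale is not defined on this page), low-scoring mappings can be listed for manual review before calling updateFields. The helper name findLowConfidenceMappings and the 0.65 threshold below are hypothetical.

```javascript
// Hypothetical helper: lists column-to-field mappings whose confidence
// falls below a caller-chosen threshold.
function findLowConfidenceMappings(fieldsResponse, threshold) {
  const flagged = [];
  const sources = Object.entries(fieldsResponse.columnFieldMapping);
  for (const [dataSource, columns] of sources) {
    for (const [column, mapping] of Object.entries(columns)) {
      if (mapping.confidence < threshold) {
        flagged.push({ dataSource, column, field: mapping.field });
      }
    }
  }
  return flagged;
}

// Sample data trimmed from the example response above.
const exampleFieldsResponse = {
  columnFieldMapping: {
    "dataSource 1": {
      "column 1": { field: "field 1", confidence: 0.7 },
      "column 2": { field: "field 2", confidence: 0.6 },
    },
  },
};

// Only "column 2" (confidence 0.6) falls below the 0.65 threshold.
const mappingsToReview = findLowConfidenceMappings(exampleFieldsResponse, 0.65);
```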

updateFields

segna.updateFields(jobId, {
columnFieldMapping: {
"dataSource 1": {
"column 1": {
field: "field 2", // Correct the field mapping
},
"column 2": {
field: "field 1",
}
}
},
allowColumnCombining: false
});

Optional Parameters:

  • allowColumnCombining is an optional flag that can be passed when updating the field mapping. When it is enabled, you can map multiple input columns to a single field. Those columns are then combined into one value (by default, concatenated with a separator).
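
As a rough illustration, the flag can be derived from the mapping itself by checking whether any field receives more than one column. needsColumnCombining is a hypothetical helper, and checking each data source separately is an assumption about how mappings are scoped.

```javascript
// Hypothetical helper: returns true when, within any data source,
// two or more input columns are mapped to the same field.
function needsColumnCombining(columnFieldMapping) {
  for (const columns of Object.values(columnFieldMapping)) {
    const seenFields = new Set();
    for (const { field } of Object.values(columns)) {
      if (seenFields.has(field)) return true;
      seenFields.add(field);
    }
  }
  return false;
}

const combinedMapping = {
  "dataSource 1": {
    "column 1": { field: "field 1" },
    "column 3": { field: "field 1" }, // second column mapped to "field 1"
    "column 2": { field: "field 2" },
  },
};

// Payload in the shape accepted by updateFields above.
const fieldsUpdate = {
  columnFieldMapping: combinedMapping,
  allowColumnCombining: needsColumnCombining(combinedMapping),
};
```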


Column Combining Methods

info

This API is only useful if there are multiple columns mapped to a field. Please refer to the Fields API above.

The Column Combining Methods API defines the method, configuration, and order of columns to be combined into a field.

getColumnCombiningMethods

const columnCombiningMethodsResponse = segna.getColumnCombiningMethods(jobId);

Example Response:

{
"columnsBeingCombined": {
"field 1": {
"method": "stringAppend",
"methodConfig": {
"separator": "_"
},
"combiningOrder": [
"column 1", "column 3"
]
},
"field 2": {
"method": "json",
"methodConfig": {},
"combiningOrder": [
"column 2", "column 4"
]
}
}
}

updateColumnCombiningMethods

segna.updateColumnCombiningMethods(jobId, {
columnsBeingCombined: {
"field 1": {
method: "stringAppend",
methodConfig: {
separator: "_"
},
},
"field 2": {
method: "json",
combiningOrder: [
"column 2", "column 4"
],
},
}
});

The following column combining methods are allowed:

  • stringAppend
  • json

methodConfig is an optional argument that can be used, for example, to specify a separator when combining values.

combiningOrder is an optional argument which specifies the order in which columns are combined together.

stringAppend

This is the default column combining method for when two columns are mapped to the same field.

stringAppend will cast the contents of the specified columns as strings and concatenate them using the separator defined in the methodConfig. The default separator is an empty string. For example, consider the following dataframe:

col1     col2
Apples   1.99
Bananas  1.89

Mapping both col1 and col2 to field1 in the updateFields call will result in the following dataframe:

field1
Apples1.99
Bananas1.89

Specifying combiningOrder as ["col2", "col1"] and the methodConfig separator as "_" would result in:

field1
1.99_Apples
1.89_Bananas
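
The behaviour described above can be approximated locally with a few lines of JavaScript. This is a sketch of the semantics only, not the service's implementation; the function and variable names are hypothetical.

```javascript
// Sketch of stringAppend: cast each listed column to a string and join
// the values in combiningOrder, using the separator (default: empty string).
function stringAppend(rows, combiningOrder, separator = "") {
  return rows.map((row) =>
    combiningOrder.map((column) => String(row[column])).join(separator)
  );
}

const produceRows = [
  { col1: "Apples", col2: 1.99 },
  { col1: "Bananas", col2: 1.89 },
];

// Default order and separator: ["Apples1.99", "Bananas1.89"]
const appendedDefault = stringAppend(produceRows, ["col1", "col2"]);

// Reversed order with "_" separator: ["1.99_Apples", "1.89_Bananas"]
const appendedReversed = stringAppend(produceRows, ["col2", "col1"], "_");
```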

json

Columns can be combined into a JSON object. The same dataframe above with the json combining method would be:

field1
{ "col1": "Apples", "col2": 1.99 }
{ "col1": "Bananas", "col2": 1.89 }
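
A comparable local sketch for the json method is shown below. It assumes each row becomes one JSON object keyed by column name, as in the table above; combineAsJson is a hypothetical name, and the whitespace produced by JSON.stringify may differ from what the service emits.

```javascript
// Sketch of the json combining method: keep the listed columns of each
// row and serialise them as a single JSON object string.
function combineAsJson(rows, columns) {
  return rows.map((row) => {
    const picked = {};
    for (const column of columns) {
      picked[column] = row[column];
    }
    return JSON.stringify(picked);
  });
}

const jsonCombined = combineAsJson(
  [
    { col1: "Apples", col2: 1.99 },
    { col1: "Bananas", col2: 1.89 },
  ],
  ["col1", "col2"]
);
// jsonCombined[0] is '{"col1":"Apples","col2":1.99}'
```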
info

All parameters (method, methodConfig, combiningOrder) are optional and if unspecified the default values (retrieved from getColumnCombiningMethods()) will be used.



Data Types

The Data Types API defines the current data type of each field.

getDataTypes

const dataTypesResponse = segna.getDataTypes(jobId);

Example Response:

{
"fieldDataTypes": {
"field 1": {
"dataType": "rich_text",
"confidence": 0.6
},
"field 2": {
"dataType": "number",
"confidence": 0.8
}
},
"expectedFieldDataTypes": {
"field 1": "number",
"field 2": "rich_text"
}
}
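
Assuming expectedFieldDataTypes holds the type each field is expected to have (an interpretation based on the key name), the two parts of the response can be compared to find fields that may need an update. findDataTypeMismatches is a hypothetical helper.

```javascript
// Hypothetical helper: lists fields whose detected data type differs
// from the expected one.
function findDataTypeMismatches(dataTypesResponse) {
  const { fieldDataTypes, expectedFieldDataTypes } = dataTypesResponse;
  const mismatches = [];
  for (const [field, expected] of Object.entries(expectedFieldDataTypes)) {
    const entry = fieldDataTypes[field];
    const detected = entry ? entry.dataType : undefined;
    if (detected !== expected) {
      mismatches.push({ field, detected, expected });
    }
  }
  return mismatches;
}

// Sample data copied from the example response above.
const exampleDataTypesResponse = {
  fieldDataTypes: {
    "field 1": { dataType: "rich_text", confidence: 0.6 },
    "field 2": { dataType: "number", confidence: 0.8 },
  },
  expectedFieldDataTypes: {
    "field 1": "number",
    "field 2": "rich_text",
  },
};

// Both fields differ from their expected types in this example.
const dataTypeMismatches = findDataTypeMismatches(exampleDataTypesResponse);
```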

updateDataTypes

segna.updateDataTypes(jobId, {
fieldDataTypes: {
"field 1": {
dataType: DATATYPE.CATEGORY,
},
"field 2": {
dataType: DATATYPE.DATE_TIME,
}
}
});

Valid data types are:

  • DATATYPE.NUMBER or "number"
  • DATATYPE.RICH_TEXT or "rich_text"
  • DATATYPE.CATEGORY or "category"
  • DATATYPE.DATE_TIME or "date_time"
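
When working with the string forms, it can help to validate values before building the update payload. The sketch below uses only the four strings listed above; VALID_DATA_TYPES and buildDataTypesUpdate are hypothetical names, not SDK exports.

```javascript
// The four string values listed above.
const VALID_DATA_TYPES = new Set(["number", "rich_text", "category", "date_time"]);

// Hypothetical helper: turns { field: dataType } into the payload shape
// used by updateDataTypes, rejecting unknown data types.
function buildDataTypesUpdate(fieldToDataType) {
  const fieldDataTypes = {};
  for (const [field, dataType] of Object.entries(fieldToDataType)) {
    if (!VALID_DATA_TYPES.has(dataType)) {
      throw new Error(`Invalid data type "${dataType}" for field "${field}"`);
    }
    fieldDataTypes[field] = { dataType };
  }
  return { fieldDataTypes };
}

const dataTypesUpdate = buildDataTypesUpdate({
  "field 1": "category",
  "field 2": "date_time",
});
```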


Unit Conversion

The Unit Conversion API defines the desired units for each field. For each field, both from and to units need to be specified to perform the conversion.

Note: Unit Conversion is not set by default.

getUnits

const unitsResponse = segna.getUnits(jobId);

Example Response:

{
"fieldUnitConversions": {
"field 1": {
"columnName": "column 1",
"to": null,
"from": null,
"confidence": 1.0
},
"field 2": {
"columnName": "column 2",
"to": null,
"from": null,
"confidence": 1.0
}
},
"expectedFieldUnits": {
"field 1": null,
"field 2": null
}
}

updateUnits

segna.updateUnits(jobId, {
fieldUnitConversions: {
"field 1": {
to: "UTC_p6",
from: "UTC_m4"
},
"field 2": {
to: null,
from: null
}
}
});

The units API supports converting between different timezones by specifying UTC deltas. For example, a unit string of "UTC_p6" means +6 hours relative to UTC time. "UTC_m6" is -6 hours relative to UTC time.
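
To make the unit string format concrete, the sketch below parses a string such as "UTC_p6" into a signed hour offset. It assumes whole-hour offsets in the form UTC_p<hours> or UTC_m<hours>, as in the examples above; how fractional offsets are written is not covered here. Both function names are hypothetical.

```javascript
// Hypothetical helper: converts "UTC_p6" to 6 and "UTC_m6" to -6.
function parseUtcUnit(unit) {
  const match = /^UTC_([pm])(\d+)$/.exec(unit);
  if (!match) {
    throw new Error(`Unrecognised unit string: ${unit}`);
  }
  const hours = Number(match[2]);
  if (hours === 0) return 0;
  return match[1] === "p" ? hours : -hours;
}

// Difference in hours between the `from` and `to` offsets.
function offsetDifferenceHours(from, to) {
  return parseUtcUnit(to) - parseUtcUnit(from);
}

const plusSix = parseUtcUnit("UTC_p6"); // 6
const minusSix = parseUtcUnit("UTC_m6"); // -6
// From UTC-4 to UTC+6 is a difference of 10 hours.
const shiftHours = offsetDifferenceHours("UTC_m4", "UTC_p6"); // 10
```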



Impute Methods

The Impute Methods API defines the imputation method for each field. An imputation method can be specified to fill in any missing values.

Note: Impute Methods is not set by default.

getImputeMethods

const imputeMethodsResponse = segna.getImputeMethods(jobId);

Example Response:

{
"imputeMethods": {
"field 1": {
"imputeMethod": null,
"confidence": 1.0
},
"field 2": {
"imputeMethod": null,
"confidence": 1.0
}
},
"expectedImputeMethods": {
"field 1": null,
"field 2": null
}
}

updateImputeMethods

segna.updateImputeMethods(jobId, {
imputeMethods: {
"field 1": {
imputeMethod: "fill",
imputeMethodConfig: {
fillValue: "No Answer"
}
},
"field 2": {
imputeMethod: null
}
}
});

Valid impute methods are:

  • fill
  • dropMissing
  • null

fill

The fill impute method replaces missing values in the given field with the value specified in the fillValue argument of the imputeMethodConfig.

dropMissing

The dropMissing impute method drops rows where missing values occur in the given field.

null

No changes are made to the data for this field.
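
The three methods can be pictured with a small local sketch. It treats null and undefined as missing, which is an assumption (the service may also treat other values, such as empty strings, as missing), and applyImputeMethod is a hypothetical name.

```javascript
// Assumption: a value counts as missing when it is null or undefined.
function isMissingValue(value) {
  return value === null || value === undefined;
}

// Sketch of the impute methods applied to one field of an array of rows.
function applyImputeMethod(rows, field, imputeMethod, imputeMethodConfig = {}) {
  switch (imputeMethod) {
    case "fill":
      // Replace missing values with the configured fillValue.
      return rows.map((row) =>
        isMissingValue(row[field])
          ? { ...row, [field]: imputeMethodConfig.fillValue }
          : row
      );
    case "dropMissing":
      // Drop rows where the field is missing.
      return rows.filter((row) => !isMissingValue(row[field]));
    case null:
      // Leave the data unchanged.
      return rows;
    default:
      throw new Error(`Unsupported impute method: ${imputeMethod}`);
  }
}

const surveyRows = [{ answer: "Yes" }, { answer: null }];
const filledRows = applyImputeMethod(surveyRows, "answer", "fill", { fillValue: "No Answer" });
const keptRows = applyImputeMethod(surveyRows, "answer", "dropMissing");
```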



Pivot Methods

The Pivot API defines the pivot methods, as well as the pivot, pivot value and index columns.

  • pivotColumns defines the pivot column whose values will be the column headers once the data has been pivoted
  • valueColumns defines the columns used to populate the values of the pivoted data
  • indexColumns defines the columns that will make a unique row index once the data has been pivoted

aggregationMethod determines how duplicate values in the indexColumns are handled. Currently the only supported value of aggregationMethod is "first", which takes the first row where duplicate values are present.
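
As a rough picture of what a pivot with the "first" aggregation does, the sketch below pivots an array of rows locally. It is simplified to one pivot column and one value column, and pivotRows and its option names are hypothetical rather than part of the API.

```javascript
// Sketch of a pivot with "first" aggregation: rows sharing the same index
// values collapse into one output row, and the first value seen for each
// pivot header wins.
function pivotRows(rows, { pivotColumn, valueColumn, indexColumns }) {
  const byIndex = new Map();
  for (const row of rows) {
    const key = JSON.stringify(indexColumns.map((column) => row[column]));
    if (!byIndex.has(key)) {
      const base = {};
      for (const column of indexColumns) base[column] = row[column];
      byIndex.set(key, base);
    }
    const output = byIndex.get(key);
    const header = String(row[pivotColumn]);
    // "first": keep the existing value when this header was already filled.
    if (!Object.prototype.hasOwnProperty.call(output, header)) {
      output[header] = row[valueColumn];
    }
  }
  return [...byIndex.values()];
}

const measurements = [
  { id: 1, metric: "height", value: 10 },
  { id: 1, metric: "weight", value: 20 },
  { id: 1, metric: "height", value: 99 }, // duplicate, ignored by "first"
  { id: 2, metric: "height", value: 30 },
];

const pivoted = pivotRows(measurements, {
  pivotColumn: "metric",
  valueColumn: "value",
  indexColumns: ["id"],
});
// pivoted: [{ id: 1, height: 10, weight: 20 }, { id: 2, height: 30 }]
```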

Note: Pivot Methods is not set by default.

getPivot

curl -X GET https://backend.segna.io/public/client-side/v1/pivot/{jobId} \
-H 'x-api-key: YOUR_API_KEY'

Example Response:

{
"pivot": {
"pivotColumns": {"columnNames": ["col to pivot"], "confidence": 0},
"valueColumns": {"columnNames": ["pivot value column"], "confidence": 0},
"indexColumns": {"columnNames": ["col2", "col3"], "aggregationMethod": "first", "confidence": 0}
}
}

updatePivot

curl -X POST https://backend.segna.io/public/client-side/v1/pivot/{jobId} \
-H 'x-api-key: YOUR_API_KEY' \
-d '{
"pivot": {
"pivotColumns": {
"columnNames": ["col to pivot"]
},
"valueColumns": {
"columnNames": ["pivot value column"]
},
"indexColumns": {
"columnNames": ["col2", "col3"],
"aggregationMethod": "first"
}
}
}'


Row Dropping Methods

The Row Dropping Methods API can be used to specify constraints on the values of a field. By specifying fields you may drop rows for which there are:

  • Duplicate values
  • Missing values

getRowDroppingMethods

curl -X GET https://backend.segna.io/public/client-side/v1/row-dropping-methods/{jobId} \
-H 'x-api-key: YOUR_API_KEY'

Example Response:

{
"presetRowDroppingMethods": [
{
"method": "duplicateValues",
"methodConfig": {
"fields": ["field1"]
}
},
{
"method": "missingValues",
"methodConfig": {
"fields": ["field5", "field2"]
}
}
],
"additionalRowDroppingMethods": [
{
"method": "missingValues",
"methodConfig": {
"fields": ["field4", "field3"]
},
"confidence": 1
}
]
}

updateRowDroppingMethods

curl -X POST https://backend.segna.io/public/client-side/v1/row-dropping-methods/{jobId} \
-H 'x-api-key: YOUR_API_KEY' \
-d '{
"additionalRowDroppingMethods": [
{
"method": "missingValues",
"methodConfig": {
"fields": ["field7", "field8"]
}
},
{
"method": "duplicateValues",
"methodConfig": {
"fields": ["field9"]
}
}
]
}'

Valid row dropping methods are:

  • missingValues
  • duplicateValues

missingValues

This row dropping method will drop rows where missing values occur in the set of fields specified in the methodConfig. A potential use of this could be to enforce a NOT NULL database constraint.

duplicateValues

This row dropping method will drop rows where duplicate values occur in the set of fields specified in the methodConfig. A potential use of this could be to enforce a PRIMARY KEY or UNIQUE database constraint.
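
The two methods can be sketched locally as follows. Several details are assumptions: null and undefined count as missing, a row is dropped by missingValues when any listed field is missing, and duplicateValues treats the listed fields as one combined key and keeps the first occurrence. dropRows is a hypothetical name.

```javascript
// Sketch of the row dropping methods applied to an array of rows.
function dropRows(rows, { method, methodConfig }) {
  const { fields } = methodConfig;
  if (method === "missingValues") {
    // Keep rows where every listed field has a value.
    return rows.filter((row) =>
      fields.every((field) => row[field] !== null && row[field] !== undefined)
    );
  }
  if (method === "duplicateValues") {
    // Keep only the first row for each combination of the listed fields.
    const seen = new Set();
    return rows.filter((row) => {
      const key = JSON.stringify(fields.map((field) => row[field]));
      if (seen.has(key)) return false;
      seen.add(key);
      return true;
    });
  }
  throw new Error(`Unsupported row dropping method: ${method}`);
}

const contactRows = [
  { id: 1, email: "a@example.com" },
  { id: 1, email: "b@example.com" },
  { id: 2, email: null },
];

const withEmail = dropRows(contactRows, {
  method: "missingValues",
  methodConfig: { fields: ["email"] },
});
const uniqueIds = dropRows(contactRows, {
  method: "duplicateValues",
  methodConfig: { fields: ["id"] },
});
```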



Text Processing Methods

Text processing methods can be used to perform string operations on columns containing text. Two types of operations are possible: enforcing the case of text and regular-expression-based text replacement.

getTextProcessingMethods

curl -X GET https://backend.segna.io/public/client-side/v1/text-processing-methods/{jobId} \
-H 'x-api-key: YOUR_API_KEY'

Example Response:

{
"textProcessingMethods": {
"regexReplacement": {
"field1": {
"^[M|m]$": "Male",
"^[F|f]$": "Female"
},
"field2": {
"^[N.?A|n.?a|.issing]$": ""
}
},
"casing": {
"field3": "upper",
"field4": "lower",
"field5": "capitalize"
}
}
}

updateTextProcessingMethods

curl -X POST https://backend.segna.io/public/client-side/v1/text-processing-methods/{jobId} \
-H 'x-api-key: YOUR_API_KEY' \
-d '{
"textProcessingMethods": {
"regexReplacement": {
"field1": {
"^[M|m]$": "male"
}
},
"casing": {
"field3": "title",
"field4": "swap"
}
}
}'

Two methods of text processing are available:

  • regexReplacement
  • casing

regexReplacement

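For a given field, specify a regular expression to match and the text to replace it with. For example, the configuration below replaces values in field1 that consist solely of "M" or "m" with "Male". (Inside a character class, | is a literal character, so this pattern also matches a lone "|"; "^[Mm]$" is the stricter form.)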

{
"textProcessingMethods": {
"regexReplacement": {"field1": {"^[M|m]$": "Male"}}}
}
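
To see how such a rule set behaves, the patterns can be tried locally. The sketch below uses JavaScript's RegExp, so it assumes the service's regular expression dialect treats these patterns the same way; applyRegexReplacements is a hypothetical helper.

```javascript
// Hypothetical helper: applies each pattern-to-replacement rule in turn.
function applyRegexReplacements(value, replacements) {
  let result = value;
  for (const [pattern, replacement] of Object.entries(replacements)) {
    result = result.replace(new RegExp(pattern), replacement);
  }
  return result;
}

const genderRules = { "^[M|m]$": "Male", "^[F|f]$": "Female" };

const fromLowerM = applyRegexReplacements("m", genderRules); // "Male"
const fromUpperF = applyRegexReplacements("F", genderRules); // "Female"
// Longer values are untouched because ^ and $ anchor the whole value.
const untouched = applyRegexReplacements("Mum", genderRules); // "Mum"
// The | inside [...] is literal, so a lone "|" also matches.
const fromPipe = applyRegexReplacements("|", genderRules); // "Male"
```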

casing

The casing method converts the case of a field. Supported values for case are:

  • "upper": UPPER CASE
  • "lower": lower case
  • "title": Title Case (first letter of every word is capitalised)
  • "swap": inverts the case of each letter (upper case becomes lower case and vice versa)
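
The four values can be illustrated with a short local sketch. It is an approximation: applyCasing is a hypothetical name, words are assumed to be separated by whitespace, and "title" is assumed to lower-case the remaining letters of each word.

```javascript
// Sketch of the casing values applied to a single string.
function applyCasing(text, casing) {
  switch (casing) {
    case "upper":
      return text.toUpperCase();
    case "lower":
      return text.toLowerCase();
    case "title":
      // Capitalise the first character of every whitespace-separated word.
      return text
        .toLowerCase()
        .replace(/(^|\s)(\S)/g, (match, lead, first) => lead + first.toUpperCase());
    case "swap":
      // Invert the case of each character.
      return [...text]
        .map((ch) => (ch === ch.toUpperCase() ? ch.toLowerCase() : ch.toUpperCase()))
        .join("");
    default:
      throw new Error(`Unsupported casing: ${casing}`);
  }
}

const upperCased = applyCasing("hello world", "upper"); // "HELLO WORLD"
const titleCased = applyCasing("hello WORLD", "title"); // "Hello World"
const swapCased = applyCasing("Hello World", "swap"); // "hELLO wORLD"
```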


Scripts

Scripts added to a job in the start job configuration can be viewed at any time:

getScripts

const jobScripts = segna.getScripts(jobId);

Example Response:

{
"preColumnMapping": ["scriptId1", "scriptId2"],
"postClean": ["scriptId3"]
}

See the Scripts section for more information on using scripts in your job.