
Pipeline

A Pipeline defines how your data will be wrangled when running a Job.

A pipeline can specify the following (see the configuration sketch after this list):

  • An output schema. This can be:

    • Pre-defined and fixed for every Job.
    • Undefined, and added dynamically per job.
    • A mix of both: some fields are required for every job, while other columns can be added to the output data dynamically, per job.
  • Automatic mappings for the input data to the output schema. Two variations can be specified:

    • An exact mapping, for cases where you always expect certain input columns to map to a field in the output schema, and
    • A “smart” mapping, for cases where you want Segna to intelligently guess which input column maps to a field in the output schema.
  • The data type of each field in the output schema.

    • For datetimes, the timezone and datetime format of the fields can be specified.
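
As a rough illustration of how these options fit together, the sketch below builds a pipeline configuration as a plain Python dictionary. The structure and every key name (`output_schema`, `allow_dynamic_columns`, `mappings`, and so on) are assumptions for illustration only, not Segna's actual request format; consult the API reference for the real fields.

```python
# Hypothetical pipeline configuration -- key names are illustrative only,
# not Segna's documented API schema.
pipeline_config = {
    "name": "customer-orders",
    "output_schema": {
        # Fields required for every job run on this pipeline.
        "fields": [
            {"name": "order_id", "type": "string"},
            {"name": "amount", "type": "float"},
            {
                "name": "ordered_at",
                "type": "datetime",
                "datetime_format": "%Y-%m-%d %H:%M:%S",
                "timezone": "Pacific/Auckland",
            },
        ],
        # Extra columns may still be added dynamically per job.
        "allow_dynamic_columns": True,
    },
    "mappings": {
        # Exact mapping: this input column always maps to this output field.
        "exact": {"order_ref": "order_id"},
        # Smart mapping: let the service guess which input column fits these fields.
        "smart": ["amount", "ordered_at"],
    },
}
```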

Beyond this, other operations can be configured through our APIs when running a job.
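
For example, per-job options such as dynamically added columns can be supplied when the job is started. The snippet below posts a hypothetical job-run request with the `requests` library; the endpoint URL, headers, and payload fields are assumptions standing in for the real API, not Segna's documented interface.

```python
import requests

# Hypothetical job-run request -- endpoint and payload fields are
# illustrative assumptions, not the documented Segna API.
response = requests.post(
    "https://api.example.com/jobs",  # replace with the real Segna endpoint
    headers={"x-api-key": "YOUR_API_KEY"},
    json={
        "pipeline_id": "pipeline-123",
        # Columns added to the output schema for this job only.
        "extra_columns": [{"name": "discount_code", "type": "string"}],
    },
)
response.raise_for_status()
job_id = response.json()["job_id"]  # assumed response field
```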

Once you have created a pipeline, you can track the jobs that run on it, their error rates, a summary of the data for each job, and the amount of data passed through the pipeline.
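
Pipeline-level tracking could look something like the sketch below, which lists a pipeline's jobs and computes a simple error rate and row count. Again, the endpoint and the response fields (`status`, `rows_processed`) are hypothetical placeholders, not Segna's actual monitoring API.

```python
import requests

# Hypothetical monitoring call -- endpoint and response fields are assumptions.
resp = requests.get(
    "https://api.example.com/pipelines/pipeline-123/jobs",  # replace with the real endpoint
    headers={"x-api-key": "YOUR_API_KEY"},
)
resp.raise_for_status()
jobs = resp.json()["jobs"]  # assumed shape: [{"status": ..., "rows_processed": ...}, ...]

failed = sum(1 for j in jobs if j["status"] == "failed")
total_rows = sum(j["rows_processed"] for j in jobs)
print(f"error rate: {failed / max(len(jobs), 1):.1%}, rows processed: {total_rows}")
```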