A Pipeline defines how your data is wrangled when a Job runs.
You can specify the following on a pipeline:
An output schema. This can be:
- Pre-defined and fixed for every Job.
- Undefined, and added dynamically per job.
- A mix of both: some fields are required for every job, while other columns can be added to the output data dynamically, per job.
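The three schema options above can be sketched as plain data. This is a minimal illustration, not Segna's actual API: the key names (`fixed_fields`, `allow_dynamic_fields`) and the helper function are hypothetical.

```python
# Hypothetical shape of a pipeline's output schema: some fields are fixed
# for every job, and a flag controls whether jobs may add columns dynamically.
pipeline_schema = {
    # Fixed fields: present in the output of every job run on this pipeline.
    "fixed_fields": [
        {"name": "user_id", "type": "string"},
        {"name": "signup_date", "type": "datetime"},
    ],
    # Whether a job may add extra output columns at run time.
    "allow_dynamic_fields": True,
}

def output_fields(schema, dynamic_fields=()):
    """Combine the pipeline's fixed fields with any per-job dynamic fields."""
    fields = [f["name"] for f in schema["fixed_fields"]]
    if schema["allow_dynamic_fields"]:
        fields.extend(dynamic_fields)
    return fields

print(output_fields(pipeline_schema, ["referral_code"]))
```

A fully pre-defined schema is the same structure with `allow_dynamic_fields` set to `False`; a fully dynamic one has an empty `fixed_fields` list.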
Automatic mappings for the input data to the output schema. Two variations can be specified:
- An exact mapping, for cases where you always expect certain input columns to map to a field in the output schema, and
- A “smart” mapping, for cases where you want Segna to intelligently guess which input column maps to a field in the output schema.
The data types of each field in the output schema.
- For datetimes, the timezone and datetime format of the fields can be specified.
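Declaring a datetime format and timezone for a field amounts to telling the pipeline how to interpret the raw string values. A minimal sketch using Python's standard library (the `field_config` shape and `parse_field` helper are illustrative, not Segna's API):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Hypothetical per-field datetime configuration.
field_config = {
    "name": "signup_date",
    "datetime_format": "%d/%m/%Y %H:%M",  # how the raw value is written
    "timezone": "Pacific/Auckland",       # timezone the naive value is in
}

def parse_field(raw_value, config):
    """Parse a raw string using the field's declared format and timezone."""
    naive = datetime.strptime(raw_value, config["datetime_format"])
    return naive.replace(tzinfo=ZoneInfo(config["timezone"]))

parsed = parse_field("25/03/2024 14:30", field_config)
print(parsed.isoformat())
```

Without the declared format, `25/03/2024` is ambiguous (day-first vs month-first), and without the timezone the value cannot be normalised against data from other sources.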
Beyond this, other operations can be configured through our APIs when running a job.
Once you have created a pipeline, you can track the jobs that run through it, monitor error rates, view a summary of the data for each job, and see the total amount of data passed through the pipeline.
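The per-pipeline stats described above can be thought of as aggregations over job records. The record shape and function below are a hypothetical sketch, not the actual reporting API:

```python
# Illustrative job records for a single pipeline.
jobs = [
    {"job_id": "job-1", "rows": 1200, "errors": 3},
    {"job_id": "job-2", "rows": 800, "errors": 0},
    {"job_id": "job-3", "rows": 500, "errors": 10},
]

def pipeline_summary(jobs):
    """Aggregate job records into pipeline-level stats."""
    total_rows = sum(j["rows"] for j in jobs)
    total_errors = sum(j["errors"] for j in jobs)
    return {
        "jobs_run": len(jobs),
        "total_rows": total_rows,                  # data passed through
        "error_rate": total_errors / total_rows,   # errors per row
    }

summary = pipeline_summary(jobs)
print(summary["jobs_run"], summary["total_rows"])
```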