bulk-cdk-core-load/io.airbyte.cdk.load.pipeline

Package-level declarations

Types

interface BatchAccumulator<S, K : WithStream, T, U>

BatchAccumulator is used internally by the CDK to implement each io.airbyte.cdk.load.write.LoadStrategy. Connector devs should never need to implement this interface.

BatchAccumulatorResult

sealed interface BatchAccumulatorResult<S, U>

BatchEndOfStream

data class BatchEndOfStream(val stream: DestinationStream.Descriptor, val taskName: String, val part: Int, val totalInputCount: Long) : BatchUpdate

BatchStateUpdate

data class BatchStateUpdate(val stream: DestinationStream.Descriptor, val checkpointCounts: Map<CheckpointId, CheckpointValue>, val state: BatchState, val taskName: String, val part: Int, val inputCount: Long = 0) : BatchUpdate

BatchUpdate

sealed interface BatchUpdate

Used internally by the CDK to track record ranges to ack.

ByPrimaryKeyInputPartitioner

class ByPrimaryKeyInputPartitioner : InputPartitioner

ByStreamInputPartitioner

@Singleton

@Secondary

class ByStreamInputPartitioner : InputPartitioner

The default input partitioner, which partitions by the stream name. TODO: Should be round-robin?

DefaultPipelineFlushStrategy

@Singleton

class DefaultPipelineFlushStrategy(@Value(value = "${airbyte.destination.core.record-batch-size-override:null}") microBatchOverride: Long? = null, config: DestinationConfiguration) : PipelineFlushStrategy

This composes the two built-in flush strategies.
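As a rough illustration of what composing flush strategies can look like, the sketch below combines two conditions so that a flush happens as soon as either one fires. Everything in it is an assumption made for the example: the members of PipelineFlushStrategy are not listed on this page, so FlushStrategySketch, shouldFlush, byRecordCount, byDataAge, and either are hypothetical names, and the choice of record count and data age as the two conditions is a guess rather than a statement about the CDK.

```kotlin
// Hypothetical interface for illustration only; not the CDK's PipelineFlushStrategy.
fun interface FlushStrategySketch {
    fun shouldFlush(inputCount: Long, dataAgeMs: Long): Boolean
}

// Assumed example conditions: flush on record count, or on data age.
fun byRecordCount(maxRecords: Long) =
    FlushStrategySketch { count, _ -> count >= maxRecords }

fun byDataAge(maxAgeMs: Long) =
    FlushStrategySketch { _, ageMs -> ageMs >= maxAgeMs }

// Composition: flush as soon as either underlying strategy says so.
fun either(a: FlushStrategySketch, b: FlushStrategySketch) =
    FlushStrategySketch { count, ageMs ->
        a.shouldFlush(count, ageMs) || b.shouldFlush(count, ageMs)
    }
```

In this sketch, either(byRecordCount(100), byDataAge(5_000)) flushes once 100 records have accumulated or once the data is 5 seconds old, whichever comes first.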

DirectLoadAccResult

data class DirectLoadAccResult(val state: BatchState) : WithBatchState

DirectLoadPipeline

@Singleton

@Requires(bean = DirectLoaderFactory::class)

class DirectLoadPipeline(val pipelineStep: DirectLoadPipelineStep<*>) : LoadPipeline

Used internally by the CDK to implement the DirectLoader.

DirectLoadPipelineStep

@Singleton

@Requires(bean = DirectLoaderFactory::class)

class DirectLoadPipelineStep<S : DirectLoader>(val directLoaderFactory: DirectLoaderFactory<S>, val accumulator: DirectLoadRecordAccumulator<S, StreamKey>, val taskFactory: LoadPipelineStepTaskFactory, @Named(value = "numInputPartitions") numInputPartitions: Int) : LoadPipelineStep

DirectLoadRecordAccumulator

@Singleton

@Requires(bean = DirectLoaderFactory::class)

class DirectLoadRecordAccumulator<S : DirectLoader, K : WithStream>(val directLoaderFactory: DirectLoaderFactory<S>) : BatchAccumulator<S, K, DestinationRecordRaw, DirectLoadAccResult>

Used internally by the CDK to wrap the client-provided DirectLoader in a generic BatchAccumulator, so that it can be used as a pipeline step. At this stage, the loader's public interface is mapped to the internal interface, hiding internal mechanics.

FinalOutput

data class FinalOutput<S, U>(val output: U) : BatchAccumulatorResult<S, U>

InputPartitioner

interface InputPartitioner

A dev interface for expressing how incoming data is partitioned. By default, data will be partitioned by a hash of the stream name and namespace.
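A minimal sketch of partitioning by a hash of the stream name and namespace follows. It is not the CDK's implementation: the method signature of InputPartitioner is not shown on this page, so partitionFor and its parameters are hypothetical, and the specific hash combination is an assumption.

```kotlin
// Hypothetical helper; the real InputPartitioner members are not listed on this page.
fun partitionFor(namespace: String?, name: String, numPartitions: Int): Int {
    require(numPartitions > 0) { "numPartitions must be positive" }
    // Combine both parts of the stream identity into one hash (assumed scheme).
    val hash = 31 * (namespace?.hashCode() ?: 0) + name.hashCode()
    // floorMod keeps the result in [0, numPartitions) even when the hash is negative.
    return Math.floorMod(hash, numPartitions)
}
```

Because the partition depends only on the stream identity, every record of a given stream lands on the same partition, which preserves per-stream ordering but can leave partitions idle when few streams are active.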

IntermediateOutput

data class IntermediateOutput<S, U>(val nextState: S, val output: U) : BatchAccumulatorResult<S, U>

LoadPipeline

abstract class LoadPipeline(steps: List<LoadPipelineStep>)

Used internally by the pipeline to assemble a launcher for any loader's pipeline. CDK devs can use this to implement new flavors of loader interface. Connector devs should generally avoid using this.

LoadPipelineStep

interface LoadPipelineStep

NoOutput

data class NoOutput<S, U>(val nextState: S) : BatchAccumulatorResult<S, U>
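The three BatchAccumulatorResult variants on this page (NoOutput, IntermediateOutput, FinalOutput) can be read as a small state machine: carry state forward with no output, carry state forward with output, or emit output and finish. The sketch below illustrates that reading with local stand-ins that mirror the signatures listed here; the accept and describe functions, the String record type, and the batching rule are hypothetical and are not taken from the CDK.

```kotlin
// Local stand-ins mirroring the signatures on this page; not the CDK classes.
sealed interface BatchAccumulatorResult<S, U>
data class NoOutput<S, U>(val nextState: S) : BatchAccumulatorResult<S, U>
data class IntermediateOutput<S, U>(val nextState: S, val output: U) : BatchAccumulatorResult<S, U>
data class FinalOutput<S, U>(val output: U) : BatchAccumulatorResult<S, U>

// Hypothetical accumulator step: the state is the list of buffered records,
// and the output is the buffered records joined into one batch string.
fun accept(
    state: List<String>,
    record: String,
    batchSize: Int
): BatchAccumulatorResult<List<String>, String> {
    val buffered = state + record
    return if (buffered.size < batchSize) {
        NoOutput<List<String>, String>(buffered)
    } else {
        FinalOutput<List<String>, String>(buffered.joinToString(","))
    }
}

// Exhaustive handling of the sealed hierarchy; no else branch is needed.
fun describe(result: BatchAccumulatorResult<List<String>, String>): String =
    when (result) {
        is NoOutput -> "buffering ${result.nextState.size} record(s)"
        is IntermediateOutput -> "emitted ${result.output}, continuing"
        is FinalOutput -> "emitted ${result.output}, done"
    }
```

Because the interface is sealed, the compiler checks that the when expression covers every variant, so adding a new result type would surface as a compile error at each call site.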

OutputPartitioner

interface OutputPartitioner<K1 : WithStream, T, K2 : WithStream, U>

Used internally by the CDK to determine how to partition data passed between steps. Devs should not implement this interface directly; use the specialized child classes provided for each loader type instead.

PipelineFlushStrategy

interface PipelineFlushStrategy

RandomInputPartitioner

class RandomInputPartitioner : InputPartitioner

RoundRobinInputPartitioner

open class RoundRobinInputPartitioner(rotateEveryNRecords: Int) : InputPartitioner

Declare a singleton of this type to have input distributed evenly across the input partitions. (The default is ByStreamInputPartitioner.)
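To make the rotateEveryNRecords parameter concrete, here is a minimal sketch of one way such a rotation could work: stay on a partition for N records, then move to the next, wrapping around at the end. RoundRobinSketch and nextPartition are hypothetical names, the class does not extend InputPartitioner, and the real class may rotate differently. The sketch is also not thread-safe.

```kotlin
// Hypothetical stand-in, not the CDK class.
class RoundRobinSketch(private val rotateEveryNRecords: Int) {
    init {
        require(rotateEveryNRecords > 0) { "rotateEveryNRecords must be positive" }
    }

    // Number of records routed so far.
    private var seen = 0L

    fun nextPartition(numPartitions: Int): Int {
        require(numPartitions > 0) { "numPartitions must be positive" }
        // Each block of rotateEveryNRecords records maps to one partition, in order.
        val partition = ((seen / rotateEveryNRecords) % numPartitions).toInt()
        seen++
        return partition
    }
}
```

With rotateEveryNRecords = 2 and three partitions, successive records in this sketch land on partitions 0, 0, 1, 1, 2, 2, 0.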