Kubeflow Pipelines (KFP) v1 vs. v2: Summary of Differences (Breaking Changes)

Jaemun Jung
Mar 11, 2024


Kubeflow Pipelines SDK v2 was released in the second half of 2023, and as the migration guide at https://www.kubeflow.org/docs/components/pipelines/v2/migration shows, the changes between SDK v1 and v2 are substantial.

v2 no longer supports JSON as the intermediate representation; only YAML is supported. The namespaces have changed as well.

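Most of the major incompatibilities between SDK v1 and v2 are summarized on that page of the official Kubeflow documentation (https://www.kubeflow.org/docs/components/pipelines/v2/migration/). However, only some of the code changes are listed there, and a few essential methods have changed as well, so I plan to contribute those changes to the document above.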

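The official Kubeflow documentation will probably take in only the parts of the notes below that are certain, and writing and reviewing that update will inevitably take a while, so in the meantime I am posting my rough notes here as they are.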

(These are rough notes taken while updating the KFP v2 tutorial (https://github.com/kubeflow/pipelines/tree/master/samples) with a teammate.)

.set_gpu_limit deprecated

feat(sdk): deprecate .set_gpu_limit in favor of .set_accelerator_limit #8836 https://github.com/kubeflow/pipelines/pull/8836

.add_node_selector_constraint deprecated

Use .set_accelerator_type instead.

ContainerOp deprecated

https://www.kubeflow.org/docs/components/pipelines/v2/migration/#containerop-support

ContainerOp is deprecated. Use the @dsl.container_component decorator, which returns a ContainerSpec.
ContainerSpec accepts only three arguments: image, command, and args.

e.g.

image='gcr.io/flip-image',
command=['flip'],
args

The arguments ContainerOp used to handle need to be handled differently.

https://kubeflow-pipelines.readthedocs.io/en/1.8.13/source/kfp.dsl.html?highlight=file_outputs#kfp.dsl.ContainerOp

class kfp.dsl.ContainerOp(name: str, image: str, command: Union[str, List[str], None] = None, arguments: Union[str, int, float, bool, kfp.dsl._pipeline_param.PipelineParam, List[T], None] = None, init_containers: Optional[List[kfp.dsl._container_op.UserContainer]] = None, sidecars: Optional[List[kfp.dsl._container_op.Sidecar]] = None, container_kwargs: Optional[Dict[KT, VT]] = None, artifact_argument_paths: Optional[List[kfp.dsl._container_op.InputArgumentPath]] = None, file_outputs: Optional[Dict[str, str]] = None, output_artifact_paths: Optional[Dict[str, str]] = None, is_exit_handler: bool = False, pvolumes: Optional[Dict[str, kubernetes.client.models.v1_volume.V1Volume]] = None)

  1. file_outputs: add an output parameter to the function using a dsl.OutputPath(str) annotation: https://www.kubeflow.org/docs/components/pipelines/v2/components/container-components/#create-component-outputs

caching_strategy deprecated?

https://kubeflow-pipelines.readthedocs.io/en/1.8.13/source/kfp.components.structures.html?highlight=max_cache_staleness#kfp.components.structures.CachingStrategySpec

Platform-specific Features

kfp.gcp deprecated (kfp.gcp, kfp.aws, etc)

you may encounter:

ModuleNotFoundError: No module named 'kfp.gcp'

Use kfp.deprecated.gcp if it is really needed, but at least it should not be presented as best practice in the sample cases.

.apply (for v1) deprecated

e.g. v100_op.apply(gcp.use_preemptible_nodepool(hard_constraint=True))
https://googlecloudplatform.github.io/kubeflow-gke-docs/docs/pipelines/preemptible/#2-schedule-your-pipeline-to-run-on-the-preemptible-vms

[sdk] How to apply PVCs to pipeline tasks when using KFP SDK v2

#8596 https://github.com/kubeflow/pipelines/issues/8596#issuecomment-1367640471

PVC is not supported yet in SDK v2. It’s on our roadmap to be supported once platform-specific runtime support is ready

https://docs.google.com/document/d/10Cx-B18V6gR35VOmTe8_8gB67srOFF_un7NN8qXP1T4/edit#heading=h.x9snb54sjlu9

Platform specific runtime

https://www.kubeflow.org/docs/components/pipelines/v2/platform-specific-features/
Currently the only KFP SDK platform-specific plugin library is kfp-kubernetes, which is supported by the Kubeflow Pipelines open source backend and enables direct access to some Kubernetes resources and functionality.

Kubernetes related ops
ResourceOp, VolumeOp, VolumeSnapshotOp, and PipelineVolume are removed. To manipulate Kubernetes resources, use the kfp-kubernetes plugin library instead.

kfp.dsl.PipelineConf removed

Pipeline-level settings such as add_op_transformer, set_dns_config, set_image_pull_policy, set_image_pull_secrets, and set_parallelism are not available in v2. Reading the pipeline configuration at runtime with kfp.dsl.get_pipeline_conf is not available either.

kfp.dsl.graph_component removed

Recursive calls with the @graph_component decorator are not available in v2.

kfp.dsl.SubGraph removed

kfp.dsl.RUN_ID_PLACEHOLDER, kfp.dsl.EXECUTION_ID_PLACEHOLDER removed

replaced with other placeholders:

PIPELINE_JOB_NAME_PLACEHOLDER
PIPELINE_JOB_RESOURCE_NAME_PLACEHOLDER
PIPELINE_JOB_ID_PLACEHOLDER
PIPELINE_TASK_NAME_PLACEHOLDER
PIPELINE_TASK_ID_PLACEHOLDER
PIPELINE_ROOT_PLACEHOLDER
PIPELINE_JOB_CREATE_TIME_UTC_PLACEHOLDER
PIPELINE_JOB_SCHEDULE_TIME_UTC_PLACEHOLDER

`create_component_from_func`

v1-v2 stale issues

Vertex AI Pipelines compatibility issues

  • List of dictionaries with ParallelFor is not supported in Vertex AI Pipelines
      • To use a list of dictionaries as input for Vertex AI Pipelines, stringify each dictionary with json.dumps()
      • Related issue: https://github.com/kubeflow/pipelines/issues/9366
  • Compiled YAML file from a pipeline using the kfp-kubernetes plugin library fails to pass validation
      • Sample error message from Cloud Console: Cannot read properties of null (reading 'pipelineInfo')
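The json.dumps() workaround for the ParallelFor issue can be sketched in plain Python. The dictionary contents and the helper `parse_config` are made up for illustration; in a real pipeline the serialized list would be the items passed to dsl.ParallelFor, and the parsing would happen inside the component that receives each item.

```python
import json

# Hypothetical per-iteration settings that would normally go to ParallelFor.
configs = [{"lr": 0.01, "epochs": 3}, {"lr": 0.1, "epochs": 5}]

# Stringify each dictionary so the loop items are plain strings.
serialized_configs = [json.dumps(cfg) for cfg in configs]


def parse_config(serialized: str) -> dict:
    """Inside the component, turn the string back into a dictionary."""
    return json.loads(serialized)


restored = [parse_config(item) for item in serialized_configs]
```

The component that consumes each item then declares a str input instead of a dict input.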
