Kubeflow Pipelines (KFP): Summary of v1-v2 Differences (Breaking Changes)
Kubeflow Pipelines SDK v2 was released in 2023, and as you can see at https://www.kubeflow.org/docs/components/pipelines/v2/migration, the changes between SDK v1 and v2 are substantial.
In v2, JSON is no longer supported as the intermediate representation; only YAML is supported. There are namespace changes as well.
Most of the major incompatibilities between SDK v1 and v2 are already covered on the official Kubeflow documentation page (https://www.kubeflow.org/docs/components/pipelines/v2/migration/). However, only some of the code changes are listed there, and among the missing ones are changes to essential, commonly used methods, so I intend to contribute those updates to that document.
The official Kubeflow docs will probably pick up only the parts that are certain, and writing and reviewing such an update inevitably takes time, so in the meantime I'm pasting my rough notes here as-is.
(These are notes my teammate and I jotted down while updating the KFP v2 tutorials: https://github.com/kubeflow/pipelines/tree/master/samples.)
.set_gpu_limit deprecated
feat(sdk): deprecate .set_gpu_limit in favor of .set_accelerator_limit #8836 https://github.com/kubeflow/pipelines/pull/8836
.add_node_selector_constraint deprecated
use .set_accelerator_type instead
ContainerOp deprecated
https://www.kubeflow.org/docs/components/pipelines/v2/migration/#containerop-support
ContainerOp is deprecated. Use the @dsl.container_component decorator instead.
ContainerSpec accepts only three arguments (image, command, args), e.g. image='gcr.io/flip-image', command=['flip'].
The arguments ContainerOp used to handle need to be handled differently.
For reference, the v1 signature: class kfp.dsl.ContainerOp(name: str, image: str, command: Union[str, List[str], None] = None, arguments: Union[str, int, float, bool, kfp.dsl._pipeline_param.PipelineParam, List[T], None] = None, init_containers: Optional[List[kfp.dsl._container_op.UserContainer]] = None, sidecars: Optional[List[kfp.dsl._container_op.Sidecar]] = None, container_kwargs: Optional[Dict[KT, VT]] = None, artifact_argument_paths: Optional[List[kfp.dsl._container_op.InputArgumentPath]] = None, file_outputs: Optional[Dict[str, str]] = None, output_artifact_paths: Optional[Dict[str, str]] = None, is_exit_handler: bool = False, pvolumes: Optional[Dict[str, kubernetes.client.models.v1_volume.V1Volume]] = None)
- file_outputs: add an output parameter to the function using a dsl.OutputPath(str) annotation: https://www.kubeflow.org/docs/components/pipelines/v2/components/container-components/#create-component-outputs
caching_strategy deprecated?
Platform-specific Features
kfp.gcp deprecated (kfp.gcp, kfp.aws, etc)
- https://github.com/kubeflow/pipelines/blame/master/sdk/python/kfp/deprecated/gcp.py
- https://github.com/kubeflow/pipelines/pull/7291/files
you may encounter:
ModuleNotFoundError: No module named 'kfp.gcp'
Use kfp.deprecated.gcp if it's really needed, but at the least it shouldn't be presented as best practice in the sample cases.
.apply (for v1) deprecated
e.g. v100_op.apply(gcp.use_preemptible_nodepool(hard_constraint=True))
https://googlecloudplatform.github.io/kubeflow-gke-docs/docs/pipelines/preemptible/#2-schedule-your-pipeline-to-run-on-the-preemptible-vms
[sdk] How to apply PVCs to pipeline tasks when using KFP SDK v2
#8596 https://github.com/kubeflow/pipelines/issues/8596#issuecomment-1367640471
PVC is not supported yet in SDK v2. It’s on our roadmap to be supported once platform-specific runtime support is ready
Platform-specific runtime
https://www.kubeflow.org/docs/components/pipelines/v2/platform-specific-features/
Currently the only KFP SDK platform-specific plugin library is kfp-kubernetes, which is supported by the Kubeflow Pipelines open source backend and enables direct access to some Kubernetes resources and functionality.
Kubernetes related ops
ResourceOp, VolumeOp, VolumeSnapshotOp, PipelineVolume are removed. To manipulate Kubernetes resources, use the kfp-kubernetes plugin library instead.
kfp.dsl.PipelineConf removed
Pipeline-level settings such as add_op_transformer, set_dns_config, set_image_pull_policy, set_image_pull_secrets, set_parallelism, etc. are not available in v2. Getting the pipeline configuration at runtime with kfp.dsl.get_pipeline_conf is not available either.
kfp.dsl.graph_component removed
Recursive calls with the @graph_component decorator are not available in v2.
kfp.dsl.SubGraph removed
kfp.dsl.RUN_ID_PLACEHOLDER, kfp.dsl.EXECUTION_ID_PLACEHOLDER removed
replaced with other placeholders:
PIPELINE_JOB_NAME_PLACEHOLDER
PIPELINE_JOB_RESOURCE_NAME_PLACEHOLDER
PIPELINE_JOB_ID_PLACEHOLDER
PIPELINE_TASK_NAME_PLACEHOLDER
PIPELINE_TASK_ID_PLACEHOLDER
PIPELINE_ROOT_PLACEHOLDER
PIPELINE_JOB_CREATE_TIME_UTC_PLACEHOLDER
PIPELINE_JOB_SCHEDULE_TIME_UTC_PLACEHOLDER
`create_component_from_func` deprecated; use the @dsl.component decorator instead.
v1-v2 stale issues
- [sdk] Cannot pass a GCSPath output as an input to a python function component yaml spec
- convert string to GCSPath object or create one #4710
- [backend] OutputPath is giving “permission denied” — why? #7629
- PipelineParameterChannel could be not supported as input parameter
- Some methods like set_memory_limit, set_memory_request, set_cpu_limit, set_cpu_request, set_accelerator_type expect str as input. If you try to use outputs of a previous op like prev_op.outputs['output_name'], you will encounter type errors. Here is what the exception looks like: TypeError: expected string or bytes-like object, got 'PipelineParameterChannel'
Vertex AI Pipelines compatibility issues
- A list of dictionaries with ParallelFor is not supported in Vertex AI Pipelines
- To use a list of dictionaries as input in Vertex AI Pipelines, you should stringify each dictionary with json.dumps()
- Related issue: https://github.com/kubeflow/pipelines/issues/9366
- Compiled yaml file from pipeline with kfp-kubernetes plugin library failed to pass validation
- Sample error message from Cloud Console: Cannot read properties of null (reading 'pipelineInfo')
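The ParallelFor workaround mentioned above (stringifying each dictionary before passing the list to Vertex AI Pipelines) boils down to:

```python
import json

raw_items = [
    {'model': 'a', 'lr': 0.01},
    {'model': 'b', 'lr': 0.001},
]

# Stringify each dict; pass `items` (a list of str) to dsl.ParallelFor
# instead of raw_items.
items = [json.dumps(d) for d in raw_items]

# Inside the looped component, parse each item back into a dict:
parsed = [json.loads(s) for s in items]
assert parsed == raw_items
```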