Data Ops

Build Workflows and Automate Job Execution for Data Processing Operations

Data Ops is an Apache Airflow-based workflow orchestration service that creates data processing workflows and automates job scheduling. It can be used on its own in a Kubernetes Engine cluster environment on Samsung Cloud Platform or together with other application software.




Service Architecture

  • User → Request/Distribute Products → Data Ops Image & Chart Repository
    • Data Ops Image & Chart Repository
      Manager / Manager Client
      Airflow / Web Server, Scheduler, Executor
    • Data Ops Image & Chart Repository → Kubernetes Engine
      Kubernetes Engine
      Container / Container / Container
  • Data Engineer → Process Data → Data Ops Image & Chart Repository

Key Features

  • Easy installation

    - Install open source Airflow in a container environment

  • GUI-based easy management

    - Easily manage Airflow settings in a container environment
    - Distribute Airflow plug-ins
    - Monitor the status of Airflow services

  • Write/Schedule workflows

    - Easy scalability thanks to Python-based workflows
    - Tasks run automatically by the scheduler
    - Manage resources per DAG task
    - Reprocess data after processing errors or failures

  • Airflow components

    - Web server: Visualizes DAG components and status, and manages Airflow configurations
    - Scheduler: Orchestrates DAGs and tasks, and supports DAG scheduling/execution
    - Executor: Provides KubernetesExecutor, a Kubernetes-based dynamic executor
    - Metadata DB: Stores metadata on DAGs, executions, users, roles, connections, and other Airflow components
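
As a rough illustration of the scheduling and reprocessing behavior listed above, the sketch below uses only the Python standard library (`graphlib`, Python 3.9 or later) to run tasks in dependency order and retry a failed one. It is not Airflow's API and not how Data Ops is implemented; `run_workflow`, `max_retries`, and the three task names are hypothetical names chosen for this example.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies, max_retries=2):
    """Run callables in dependency order, retrying a task that fails.

    tasks: dict mapping task name -> zero-argument callable
    dependencies: dict mapping task name -> set of upstream task names
    Returns a dict mapping task name -> number of attempts it needed.
    """
    # Every task gets a node, even if it has no upstream dependencies.
    graph = {name: set(dependencies.get(name, ())) for name in tasks}
    order = list(TopologicalSorter(graph).static_order())

    attempts = {}
    for name in order:
        # One initial attempt plus up to max_retries retries.
        for attempt in range(1, max_retries + 2):
            try:
                tasks[name]()
                attempts[name] = attempt
                break
            except Exception:
                if attempt == max_retries + 1:
                    raise  # retries exhausted, surface the failure
    return attempts


# Hypothetical three-step pipeline: extract -> transform -> load.
log = []
transform_calls = {"count": 0}


def extract():
    log.append("extract")


def transform():
    # Fails on the first call only, to simulate a transient error.
    transform_calls["count"] += 1
    if transform_calls["count"] == 1:
        raise RuntimeError("transient failure")
    log.append("transform")


def load():
    log.append("load")


attempts = run_workflow(
    tasks={"extract": extract, "transform": transform, "load": load},
    dependencies={"transform": {"extract"}, "load": {"transform"}},
)
print(log)
print(attempts)
```

In Airflow itself, the same ideas are expressed by declaring tasks and their dependencies in a Python DAG file and setting a retry count per task; the scheduler then handles ordering and re-execution.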


  • Billing

    - Billed by the amount of CPU time used by the Kubernetes Engine pods of Data Ops