
Orchestration

Overview

Orchestration is the workflow orchestration layer embedded in Basement. It modernizes the layer that extracts data from the financial markets.

It is responsible for:

  • Scheduling data pipelines

  • Monitoring data pipelines and providing a UI to interact with them (task state, flow results, etc.)

  • Exposing a REST API

  • Modernizing and isolating the system and infrastructure
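
As a rough illustration of the REST API item above, the sketch below builds (without sending) two requests against a Prefect server using only the standard library. The base URL `http://localhost:4200/api`, the deployment ID, and the parameter names are placeholders, not values from this project. The `/health` and `/deployments/{id}/create_flow_run` routes are taken from the Prefect 2 REST API and should be checked against the server version actually deployed.

```python
import json
import urllib.request

# Assumption: a self-hosted Prefect server on its default port (placeholder URL).
PREFECT_API_URL = "http://localhost:4200/api"


def build_health_request() -> urllib.request.Request:
    """Build a GET request for the server health check endpoint."""
    return urllib.request.Request(url=f"{PREFECT_API_URL}/health", method="GET")


def build_create_flow_run_request(
    deployment_id: str, parameters: dict
) -> urllib.request.Request:
    """Build a POST request that asks the server to schedule a run of a deployment."""
    body = json.dumps({"parameters": parameters}).encode("utf-8")
    return urllib.request.Request(
        url=f"{PREFECT_API_URL}/deployments/{deployment_id}/create_flow_run",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing either request to `urllib.request.urlopen` would send it. Nothing is sent here, so the sketch can be read and run without a server.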



SAD - System Architecture Design

Logic View

flowchart LR
  subgraph gcp
    subgraph prefect
      ui
      api
    end
  end

  subgraph workpool
  end

  subgraph agent
    server
  end

  %% Flow
  prefect -- interactive --> workpool --> server

https://discourse.prefect.io/t/what-are-the-components-of-prefect-2-0-architecture/909

flowchart LR
  subgraph Artifact-Registry
    image
  end
  subgraph Basement
    file([prefect.yaml])
  end
  subgraph Prefect_Server
      Deployment --Assign to --> wp[Work-Pool]
      Deployment([Deployment]) --Store in--> db[(Database)]
  end
  subgraph Execution_Environment
    Worker --Create--> Flow_Run_Infra1
    Worker --Create--> Flow_Run_Infra2
    subgraph Flow_Run_Infra1
        direction TB
        subgraph tr1[Task_Runner]

        end
        fr1((Flow Run))
        fr1 --Submit Task--> tr1
    end
    subgraph Flow_Run_Infra2
      direction TB
      subgraph tr2[Task_Runner]
      end
      fr2((Flow Run))
      fr2 --Submit Task--> tr2
    end
  end

  Basement --Build and Push image--> Artifact-Registry
  Worker --Polling--> wp
  Artifact-Registry --Pull image--> Worker
  Basement --Create--> Deployment

Physical View

flowchart LR
  subgraph gcp[Google Cloud Platform]
    subgraph Artifact-Registry[Artifact Registry]
    end
    subgraph Cloudbuild
    end
    subgraph Prefect_Server[Compute Engine]
      wp[Docker Work Pool]
      db[(PostgreSQL)]
    end
  end
  subgraph Execution_Environment[On-premises Server]
      subgraph Docker
      end
  end

  Cloudbuild --Build and Push image--> Artifact-Registry
  Execution_Environment --Polling--> Prefect_Server
  Artifact-Registry --Pull image--> Execution_Environment
  Cloudbuild --Push Flow Deployment--> Prefect_Server

The mechanism of Prefect can be summarized with the following diagram:

flowchart LR
  subgraph Artifact-Registry
    image
  end
  subgraph Client
    file([prefect.yaml])
  end
  subgraph Prefect_Server
    Deployment --Assign to--> wp[Work-Pool]
    Deployment([Deployment]) --Store in--> db[(Postgres)]
  end
  subgraph Execution_Environment
    Worker --Create--> Flow_Run_Infra1
    Worker --Create--> Flow_Run_Infra2
    subgraph Flow_Run_Infra1
      direction TB
        subgraph tr1[Task_Runner]
      end

      fr1((Flow Run))
      fr1 --Submit Task--> tr1
    end
    subgraph Flow_Run_Infra2
      direction TB
      subgraph tr2[Task_Runner]
      end

      fr2((Flow Run))
      fr2 --Submit Task--> tr2
    end
  end

  Client --Build and Push image--> Artifact-Registry
  Worker --Polling--> wp
  Artifact-Registry --Pull image--> Worker
  Client --Create--> Deployment

This diagram has four components:

  1. Client: This component contains all flows, the Dockerfile, and a prefect.yaml file that defines the Deployment objects created from the flows. There are two approaches to deploying a flow to the Prefect server, which this document calls static infrastructure and dynamic infrastructure. This project uses the second because it allocates resources more efficiently. With this approach, there are two options for creating a deployment.

  2. Prefect_Server: This component stores all metadata about flows, deployments, and so on in a database (PostgreSQL or SQLite). It also contains the work pool, which acts as a bridge between Prefect_Server and Execution_Environment. Work pool types include pull work pools (available in both the community edition and Prefect Cloud) and push work pools (available only in Prefect Cloud).

  3. Execution_Environment: This component includes the worker, which polls the work pool it belongs to for new runs to execute. Because we use the community version, which relies on pull work pools, this component is required. When a flow is triggered, the worker creates infrastructure for it and executes the flow run there. The other part is the task runner, which is chosen when the flow is defined; each task in the flow is submitted to it. How tasks run depends on the task runner type.

  4. Artifact-Registry: This component stores the image. It can be replaced by another registry (Docker Hub, etc.).
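
The pull model described in the list above can be sketched with a toy, standard-library-only model. This is not Prefect's API: `WorkPool`, `Worker`, `FlowRun`, and `poll_once` are hypothetical names, and a thread pool stands in for both the per-run infrastructure (a container in this project) and the task runner. It only shows the shape of the interaction: the server queues runs, the worker polls for them, and each run submits its tasks to its own task runner.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List


@dataclass
class FlowRun:
    """A scheduled flow run: a name plus the tasks it will submit."""
    name: str
    tasks: List[Callable[[], str]]


@dataclass
class WorkPool:
    """Server-side queue of scheduled runs; it never pushes work out."""
    scheduled: Deque[FlowRun] = field(default_factory=deque)

    def submit(self, run: FlowRun) -> None:
        self.scheduled.append(run)

    def get_scheduled_runs(self) -> List[FlowRun]:
        # Hand over everything currently queued and empty the queue.
        runs = list(self.scheduled)
        self.scheduled.clear()
        return runs


class Worker:
    """Execution-side process that polls one work pool (hypothetical model)."""

    def __init__(self, pool: WorkPool, max_task_threads: int = 1) -> None:
        self.pool = pool
        self.max_task_threads = max_task_threads

    def poll_once(self) -> Dict[str, List[str]]:
        results: Dict[str, List[str]] = {}
        for run in self.pool.get_scheduled_runs():
            # One fresh executor per flow run, standing in for the
            # per-run infrastructure and its task runner.
            with ThreadPoolExecutor(max_workers=self.max_task_threads) as task_runner:
                futures = [task_runner.submit(task) for task in run.tasks]
                # Collect results in submission order.
                results[run.name] = [future.result() for future in futures]
        return results
```

With `max_task_threads=1` the tasks run one after another; a larger value lets them overlap, which mirrors how the choice of task runner changes task execution behavior.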

Deployment view

flowchart LR
  cg[Compute Engine]
  on-prem[On-premises Server]

  subgraph gb["Cloud Build"]
  end

  gb --> |Deploy Prefect Server | cg
  gb --> |Deploy Docker Worker| on-prem
  cg --> on-prem

Service Account and Permissions

| SA name | Use for | Role | Description |
|---|---|---|---|
| sa-morphling | Run Cloud Build to deploy the server, worker, and flows | roles/storage.admin [inno-cbuild-staging] | Write logs on Cloud Build |
| | | roles/iam.serviceAccountUser | Access VMs with this service account |
| | | roles/logging.logWriter | Write logs on Cloud Build |
| | | roles/compute.osLogin | Log in to a Compute Engine instance as a standard user |
| | | roles/artifactregistry.writer | Read and write repository items |
| | | roles/cloudbuild.builds.editor | Trigger a Cloud Build |
| | | roles/cloudbuild.builds.builder | Run the triggered Cloud Build job |
| sa-arc-warden | Run the Docker worker on premises | roles/artifactregistry.reader | View and get artifacts, view repository metadata |

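
The role assignments in the table above can also be kept as data and turned into `gcloud` commands, which may help when the production service accounts are created. This is only a sketch under stated assumptions: the project ID `my-project` is a placeholder, the service account emails are assumed to follow the standard `NAME@PROJECT_ID.iam.gserviceaccount.com` form, and `roles/storage.admin` is left out because the table appears to scope it to the `inno-cbuild-staging` bucket rather than to the whole project.

```python
from typing import Dict, List

# Project-level roles copied from the table; the bucket-scoped
# roles/storage.admin binding is intentionally omitted (assumption).
SERVICE_ACCOUNT_ROLES: Dict[str, List[str]] = {
    "sa-morphling": [
        "roles/iam.serviceAccountUser",
        "roles/logging.logWriter",
        "roles/compute.osLogin",
        "roles/artifactregistry.writer",
        "roles/cloudbuild.builds.editor",
        "roles/cloudbuild.builds.builder",
    ],
    "sa-arc-warden": ["roles/artifactregistry.reader"],
}


def binding_commands(project_id: str, bindings: Dict[str, List[str]]) -> List[str]:
    """Render one `gcloud projects add-iam-policy-binding` command per role."""
    commands = []
    for account, roles in bindings.items():
        member = f"serviceAccount:{account}@{project_id}.iam.gserviceaccount.com"
        for role in roles:
            commands.append(
                f"gcloud projects add-iam-policy-binding {project_id} "
                f"--member={member} --role={role}"
            )
    return commands
```

Calling `binding_commands("my-project", SERVICE_ACCOUNT_ROLES)` returns the commands as strings for review; nothing is executed.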
  • For Prefect orchestration

  • Register the development of https://prefect.dev.data.innotech.vn over PREFECT_URL=http://34.124.143.40:4200 with IAP control

  • Add production version with different service accounts

    • Cloud build: sa-morphling -> new-sa

    • Compute engine: sa-witch-doctor -> new-sa

    • Server 2: sa-arc-warden -> new-sa

  • Manage the Compute Engine instance (Prefect server) in Terraform

  • Secret: access_token is exposed in Git

  • Set up reboot and time sync for worker server.

  • Documentation: SAD, physical component

  • Transition the format handler component

    • How to build this? asia-southeast1-docker.pkg.dev/storm-spirit/inno-artifact-registry/test_prefect

    • Flow into the master global ~> supported not duplicated work

  • Token handle the private package

    • GitHub Actions for test

    • Add the GitHub Token for Deer runner

    • Add the GITHUB_TOKEN for mount in the dockerfile

  • The component of Prefect server:

  • For server

    • Switch to using secrets from Secret Manager directly

    • Use full name of declaration on engine

    • Change network namespace (not use the name of service account)

    • Authentication on GCP VM? Compute authentication workload

  • Change variable of the component go out prefect-staging

  • Change the default set of runner

  • Standardize the handler configuration

  • Centralize IP handling in one place
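
For the Secret Manager item in the list above, one possible shape is sketched below. It assumes the `google-cloud-secret-manager` client library is installed on the server and that the runtime service account is allowed to access the secret; the project and secret IDs passed in are placeholders. The helper names `secret_version_name` and `read_secret` are hypothetical.

```python
def secret_version_name(project_id: str, secret_id: str, version: str = "latest") -> str:
    """Build the Secret Manager resource name for one secret version."""
    return f"projects/{project_id}/secrets/{secret_id}/versions/{version}"


def read_secret(project_id: str, secret_id: str, version: str = "latest") -> str:
    """Fetch a secret value at runtime instead of committing it to Git."""
    # Imported lazily so this module still loads where the client
    # library is not installed (for example, in unit tests).
    from google.cloud import secretmanager

    client = secretmanager.SecretManagerServiceClient()
    response = client.access_secret_version(
        request={"name": secret_version_name(project_id, secret_id, version)}
    )
    return response.payload.data.decode("utf-8")
```

Reading the value at startup, for example `read_secret("my-project", "access_token")`, keeps the token out of the repository and out of the image.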

Appendix

Appendix A: Record of Changes

Table: Record of changes

| Version | Date | Author | Description of Change |
|---|---|---|---|
| 0.0.1 | 05/18/2024 | Bao Truong | Initial documentation |

Source Reference