Orchestration¶
Overview¶
Orchestration is the workflow orchestration layer embedded in Basement, introduced to modernize the layer that extracts data from the financial markets.
It is responsible for:
- Scheduling data pipelines
- Monitoring, with a UI for interacting with data pipelines (task state, flow results, etc.)
- Providing a built-in REST API
- Modernizing and isolating the system and infrastructure
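As a rough illustration of the REST API item, the sketch below builds and sends a health check against a self-hosted Prefect server using only the standard library. The `/health` route under the API base URL and the example address `http://127.0.0.1:4200/api` are assumptions based on a default self-hosted setup, and the helper names are hypothetical; check them against the API reference of the deployed Prefect version.

```python
import urllib.request


def build_health_request(api_url: str) -> urllib.request.Request:
    """Build a GET request for the (assumed) health route of the Prefect API."""
    return urllib.request.Request(f"{api_url.rstrip('/')}/health", method="GET")


def server_is_healthy(api_url: str, timeout: float = 5.0) -> bool:
    """Return True if the server answers the health request with HTTP 200."""
    try:
        with urllib.request.urlopen(build_health_request(api_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError are subclasses of OSError, as are socket timeouts.
        return False


if __name__ == "__main__":
    # Hypothetical local address; replace with the real API URL of the server.
    print(server_is_healthy("http://127.0.0.1:4200/api"))
```

A check like this could run before a deployment step so that the step fails early when the server is unreachable.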
SAD - System Architecture Design¶
Logic View¶
flowchart LR
subgraph gcp
subgraph prefect
ui
api
end
end
subgraph workpool
end
subgraph agent
server
end
%% Flow
prefect -- interactive --> workpool --> server
Source: https://discourse.prefect.io/t/what-are-the-components-of-prefect-2-0-architecture/909
flowchart LR
subgraph Artifact-Registry
image
end
subgraph Basement
file([prefect.yaml])
end
subgraph Prefect_Server
Deployment --Assign to --> wp[Work-Pool]
Deployment([Deployment]) --Store in--> db[(Database)]
end
subgraph Execution_Environment
Worker --Create--> Flow_Run_Infra1
Worker --Create--> Flow_Run_Infra2
subgraph Flow_Run_Infra1
direction TB
subgraph tr1[Task_Runner]
end
fr1((Flow Run))
fr1 --Submit Task--> tr1
end
subgraph Flow_Run_Infra2
direction TB
subgraph tr2[Task_Runner]
end
fr2((Flow Run))
fr2 --Submit Task--> tr2
end
end
Basement --Build and Push image--> Artifact-Registry
Worker --Polling--> wp
Artifact-Registry --Pull image--> Worker
Basement --Create--> Deployment
Physical View¶
flowchart LR
subgraph gcp[Google Cloud Platform]
subgraph Artifact-Registry[Artifact Registry]
end
subgraph Cloudbuild
end
subgraph Prefect_Server[Compute Engine]
wp[Docker Work Pool]
db[(PostgreSQL)]
end
end
subgraph Execution_Environment[On-premises Server]
subgraph Docker
end
end
Cloudbuild --Build and Push image--> Artifact-Registry
Execution_Environment --Polling--> Prefect_Server
Artifact-Registry --Pull image--> Execution_Environment
Cloudbuild --Push Flow Deployment--> Prefect_Server
The mechanism of Prefect can be summarized with the following diagram:
flowchart LR
subgraph Artifact-Registry
image
end
subgraph Client
file([prefect.yaml])
end
subgraph Prefect_Server
Deployment --Assign to--> wp[Work-Pool]
Deployment([Deployment]) --Store in--> db[(Postgres)]
end
subgraph Execution_Environment
Worker --Create--> Flow_Run_Infra1
Worker --Create--> Flow_Run_Infra2
subgraph Flow_Run_Infra1
direction TB
subgraph tr1[Task_Runner]
end
fr1((Flow Run))
fr1 --Submit Task--> tr1
end
subgraph Flow_Run_Infra2
direction TB
subgraph tr2[Task_Runner]
end
fr2((Flow Run))
fr2 --Submit Task--> tr2
end
end
Client --Build and Push image--> Artifact-Registry
Worker --Polling--> wp
Artifact-Registry --Pull image--> Worker
Client --Create--> Deployment
In this diagram, there are four components:
- Client: Contains all flows, the `Dockerfile`, and a `prefect.yaml` file that defines the `Deployment` objects created from each `flow`. There are two approaches to deploying a flow to the Prefect server, referred to here as static infrastructure and dynamic infrastructure. This project uses the second, which allocates resources more efficiently than the first. With this approach there are two options for creating a deployment; both are described in the Prefect documentation.
- Prefect_Server: Stores all metadata about `flow`, `deployment`, etc. in a database (PostgreSQL or SQLite). It also holds the `work pool`, which acts as a bridge between Prefect_Server and Execution_Environment. Work pools come in two kinds: pull work pools (available in both the community edition and Prefect Cloud) and push work pools (available only in Prefect Cloud).
- Execution_Environment: Contains the `worker`, which polls the `work pool` it belongs to for new runs to execute. Because this project uses the community edition (pull work pool), this component is required. When a flow is triggered, the `worker` creates infrastructure for it and executes the flow run there. The other part is the `task_runner`, which is chosen when the flow is defined. Each `task` in a `flow` is submitted to the `task_runner`, and how the task runs depends on the `task_runner` type.
- Artifact-Registry: Stores the image. It can be replaced by another registry (Docker Hub, etc.).
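The interaction between these components can be sketched as a toy model in plain Python. This is not the Prefect API: `Deployment`, `WorkPool` and `Worker` below are hypothetical stand-ins, the "infrastructure" is simulated by a thread pool playing the role of a task runner, and the image name is a placeholder. It only illustrates the pull model, where the worker asks the work pool for scheduled runs and then executes the flow's tasks.

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Deployment:
    """Hypothetical stand-in for a deployment: a named flow, its image, its tasks."""
    name: str
    image: str
    tasks: List[Callable[[], str]]


@dataclass
class WorkPool:
    """Server-side queue of scheduled runs; the bridge between server and worker."""
    name: str
    runs: queue.Queue = field(default_factory=queue.Queue)

    def schedule(self, deployment: Deployment) -> None:
        self.runs.put(deployment)

    def poll(self) -> Optional[Deployment]:
        """Return the next scheduled run, or None when nothing is waiting."""
        try:
            return self.runs.get_nowait()
        except queue.Empty:
            return None


class Worker:
    """Polls one work pool and executes whatever run it receives."""

    def __init__(self, pool: WorkPool) -> None:
        self.pool = pool

    def run_once(self) -> Optional[List[str]]:
        deployment = self.pool.poll()
        if deployment is None:
            return None
        # Simulated flow-run infrastructure: in the real setup this would be a
        # container started from deployment.image. The executor plays the task runner.
        with ThreadPoolExecutor(max_workers=2) as task_runner:
            futures = [task_runner.submit(task) for task in deployment.tasks]
            # Results are collected in submission order.
            return [future.result() for future in futures]


if __name__ == "__main__":
    pool = WorkPool("docker-pool")
    pool.schedule(Deployment("extract-prices", "example/image:latest",
                             [lambda: "extract", lambda: "load"]))
    worker = Worker(pool)
    print(worker.run_once())  # the scheduled run is picked up and executed
    print(worker.run_once())  # nothing left in the pool
```

Swapping the thread pool for a sequential loop would mimic a different task runner type: the flow and its tasks stay the same, only the way tasks are executed changes.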
Deployment View¶
flowchart LR
cg[Compute Engine]
on-prem[On-premises Server]
subgraph gb["Cloud Build"]
end
gb --> |Deploy Prefect Server | cg
gb --> |Deploy Docker Worker| on-prem
cg --> on-prem
Service Account and Permissions¶
SA name | Used for | Role | Description |
---|---|---|---|
sa-morphling | Run Cloud Build to deploy the server, worker and flows | roles/storage.admin [inno-cbuild-staging] | Write logs on Cloud Build |
sa-morphling | | roles/iam.serviceAccountUser | Access VMs with this service account |
sa-morphling | | roles/logging.logWriter | Write logs on Cloud Build |
sa-morphling | | roles/compute.osLogin | Log in to a Compute Engine instance as a standard user |
sa-morphling | | roles/artifactregistry.writer | Read and write repository items |
sa-morphling | | roles/cloudbuild.builds.editor | Trigger a Cloud Build |
sa-morphling | | roles/cloudbuild.builds.builder | Run the triggered Cloud Build job |
sa-arc-warden | Run the Docker worker on premises | roles/artifactregistry.reader | View and get artifacts, view repository metadata |
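To make the role assignments above easier to reproduce, here is a minimal sketch that turns them into `gcloud projects add-iam-policy-binding` commands. The project ID `example-project` and the helper name `binding_commands` are assumptions for illustration, and the service account emails assume the accounts live in that same project. `roles/storage.admin` is left out because the table qualifies it with `[inno-cbuild-staging]`, so it is presumably not a plain project-level binding.

```python
from typing import Dict, List

# Roles copied from the table above, keyed by service account name.
# roles/storage.admin is omitted on purpose (see the note in the text).
ROLES: Dict[str, List[str]] = {
    "sa-morphling": [
        "roles/iam.serviceAccountUser",
        "roles/logging.logWriter",
        "roles/compute.osLogin",
        "roles/artifactregistry.writer",
        "roles/cloudbuild.builds.editor",
        "roles/cloudbuild.builds.builder",
    ],
    "sa-arc-warden": ["roles/artifactregistry.reader"],
}


def binding_commands(project_id: str) -> List[str]:
    """Return one gcloud command per (service account, role) pair."""
    commands = []
    for account, roles in ROLES.items():
        member = f"serviceAccount:{account}@{project_id}.iam.gserviceaccount.com"
        for role in roles:
            commands.append(
                f"gcloud projects add-iam-policy-binding {project_id} "
                f"--member={member} --role={role}"
            )
    return commands


if __name__ == "__main__":
    # "example-project" is a placeholder, not the real project ID.
    for command in binding_commands("example-project"):
        print(command)
```

Keeping the mapping in one place also makes it easier to review when the production service accounts are introduced.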
- For Prefect orchestration:
    - Register the development domain https://prefect.dev.data.innotech.vn in place of PREFECT_URL=http://34.124.143.40:4200, with IAP control
    - Add a production version with different service accounts:
        - Cloud Build: sa-morphling -> new-sa
        - Compute Engine: sa-witch-doctor -> new-sa
        - Server 2: sa-arc-warden -> new-sa
    - Manage the Compute Engine instance (Prefect server) in Terraform
    - Secret: access_token is exposed in Git
    - Set up reboot and time sync for the worker server
    - Documentation: SAD, physical component
- Transition the format handler component:
    - How to build this? asia-southeast1-docker.pkg.dev/storm-spirit/inno-artifact-registry/test_prefect
    - Flow into the master global ~> supported, no duplicated work
- Token handling for the private package:
    - GitHub Actions for tests
    - Add the GitHub token for the Deer runner
    - Add the GITHUB_TOKEN to mount in the Dockerfile
- The components of the Prefect server:
    - Change to using secrets from Secret Manager directly
    - Use the full name of the declaration on the engine
    - Change the network namespace (do not use the name of the service account)
    - Authentication on the GCP VM? Compute authentication workload
- Move the component variables out of prefect-staging:
    - Change the default set of the runner
    - Standardize the handler configuration
    - Centralize the IP processing in one place
Appendix¶
Appendix A: Record of Changes¶
Table: Record of changes
Version | Date | Author | Description of Change |
---|---|---|---|
0.0.1 | 05/18/2024 | Bao Truong | Initial documentation |
Source Reference¶
- [1] Prefect
- [2] Overview of Prefect Cloud: https://discourse.prefect.io/t/what-are-the-components-of-prefect-2-0-architecture/909