SAD¶
Overview¶
The Basement project is the first layer to handle element of database layer of market dataset from providers and transfer to into the data warehouse.
Table of Contents:
- SAD
- Overview
- SAD - System Architechture Design
- Physical View
- Component
- Design Requirement for Schedule
- Securities
- Appendix
- Source Reference
SAD - System Architechture Design¶
Tech Stack¶
The list of tech stack that used in the project:
Stack | Selection |
---|---|
Programming | Python, Bash |
with [ORM: sqlalchemy ], [Test framework: pytest ], httpx | |
ORM | sqlalchemy |
Database | MySQL, BigQuery |
Storage | Cloud Storage |
Message Bus | Pubsub |
Communication | Slack, Email |
Schedule | Crontab Linux |
CICD | GitHub Actions, CloudBuild |
Code Storage | GitHub |
Local development | Docker, Docker compose |
Oschestration | Crontab Linux, Prefect |
Network | OpenVPN |
Logical View¶
Thea project consistent with this flow
flowchart LR
s[Schedule] -- trigger --> pipeline[Pipelines]
pipeline -- pull --> provider[Provider]
provider -- push --> e[Endpoint] -- interactive --> d[Database]
provider -- push --> l[Log] -- store --> g[GCS]
provider -- push --> pubsub[PubSub] -- send metadata of job --> bq[BigQuery]
flowchart LR
subgraph Architecture
direction BT
subgraph Basement
direction LR
pipeline -- pull --> provider[Provider]
provider -- push --> e[Endpoint]
provider -- push --> l[Log]
provider -- push --> pubsub[PubSub]
end
subgraph Orchestration
direction LR
scheduler
end
scheduler --> pipeline
end
e[Endpoint] -- interactive --> id1[(Database)]
l[Log] -- store --> g[GCS]
pubsub[PubSub] -- send metadata of job --> bq[BigQuery]
Physical View¶
flowchart LR
onprem_vm[Onpremies Server]
subgraph gcp[Google Cloud Platform]
gcs[Cloud Stroage]
sql[Cloud SQL]
pubsub[Cloud PubSub]
bq[BigQuery]
gcs --> bq
sql --> bq
end
onprem_vm <--> gcp
Deployment¶
Deployment related to trigger when push on specific branch
flowchart LR
gitHub[GitHub] -- on push --> trigger[GitHub Actions] -- deploy --> onprem[Onprem server]
To connect or interactive with the server, using VPN is required
flowchart LR
subgraph gcp[Google Cloud Platform]
services
end
subgraph innotech[Innotech]
direction LR
vpn_server
subgraph server_component[Servers Component]
s2[172.16.5.2]
s4[172.16.5.4]
end
end
gcp -- access --> vpn_server -- verify/reject --> server_component
Physical Server¶
The following
Type | Server |
---|---|
Onpremies Server | 172.5.5.2 and 172.5.5.4 |
Prefect Server | 34.124.143.40 on port 4200 |
Secret Management¶
Map between Google Secret Manager ↔ GitHub Action
For Github Action, required admin privilegde to change the secret
Component¶
Configuration for Cloud¶
Set up - Service Account Permision¶
[1] Publish message
- To get the permissions that you need to publish messages to a topic, ask your administrator to grant you the Pub/Sub Publisher (roles/pubsub.publisher) IAM role on topic. For more information about granting roles, see Manage access.
[1] Database
[2] Authentication for machine of runner
[3] Sudo permission/ or at least CRON executor
The service account is required to access:
[1] Cloud SQL: Read/Write/Create
[2] Cloud GCS: Upload log/output
In order to run the tests against your own bucket, make sure to set up a Service Account with all necessary permissions:
storage.objects.get storage.objects.list storage.objects.create storage.objects.update storage.objects.delete
All five permissions listed above are e.g. included in the predefined Cloud Storage IAM Role roles/storage.objectAdmin.
Set up - Account¶
Account | Action |
---|---|
runner_kunkka | Runner for execute pipeline, script |
runner_oracle | Runner for healthcheck, reboot server |
Set up - System Dependencies¶
Services | Description |
---|---|
dnf | A package manager for RPM-based Linux distributions |
epel-release | The EPEL repository was created by the EPEL group working at the Fedora Project. |
wget | Retrieving files using HTTP, HTTPS, FTP and FTPS, the most widely used Internet protocols. |
curl | Perform requests HTTPS |
vim | text editor has been the standard editor in all UNIX and Linux-based systems |
unzip | Zipping and unzipping files is not as simple of a task in Linux |
make | GNU Make tools controls the generation of executables of a program. |
openssl-devel | A toolkit for supporting cryptography. |
bzip2-devel | A high-quality data compressor. |
htop | A cross-platform interactive process viewer |
telnet | Local area network to provide a bidirectional interactive text-oriented communication |
yq | A lightweight and portable command-line YAML, JSON and XML processor |
Set up - Service Component¶
Services | Description | Note |
---|---|---|
Python3.9 | Python is an interpreted high-level general-purpose programming language. | Required >= 3.6 |
git | Source Version Control | |
docker | Docker daemon | |
gcloud-cli | Command Line interactive with Google Cloud resources | |
docker-compose | Compose multiple services from Docker file | |
yq | A lightweight and portable command-line YAML, JSON and XML processor | mikefarah/yq |
CronD | Daemon to run the scheduled jobs in the environment | |
google-chrome | Handle the browser tool in the server | Version 106.0.5249.119 |
Set up - SSH connection¶
Config the SSH connection for production
Follow GitHub Actions Deployment From zellwk
Schedule Requirement¶
The demand of this is with any schedule libraries or packages need to satified/built-in or can build
Design Requirement for Schedule¶
-
Alway trigger full circle when reboot
-
Set a log stream file
-
Has a communication system.
-
Reboot alerts.
Securities¶
For our development server, we need to achieve the stage of Secure-Utilize to make develop an environment for our developers.
Required
[1] Postgres Connection String
34.1. Database Connection Control Functions
[2] Create Access [Maintancer Only]
Task
Target
Priority
Note
Root User Protect
Credential need to be frequency update
Change Root User Password
Storage Places of Root/Credential of program system: Database,…
High
System Healths
Turn On/Off when downgrade
Monitor System Access database
Read/Write in specific schema
Whitelist Domain, IP [Database] Database Access Policy
GitHub Action configuration¶
The introduction to config the secret in GitHub Action
Firewall¶
Inbound
GitHub Server
Retrieve a list of GitHub's IP addresses from the meta API endpoint
Example component
...
[
"20.201.28.144/32",
"20.205.243.164/32",
"102.133.202.243/32",
"20.248.137.52/32",
"20.207.73.86/32",
"20.27.177.117/32",
"20.200.245.241/32",
"20.233.54.55/32"
],
"pages": [
"192.30.252.153/32",
"192.30.252.154/32",
"185.199.108.153/32",
"185.199.109.153/32",
"185.199.110.153/32",
"185.199.111.153/32",
"2606:50c0:8000::153/128",
"2606:50c0:8001::153/128",
"2606:50c0:8002::153/128",
"2606:50c0:8003::153/128"
],
"importer": [
"52.23.85.212/32",
"52.0.228.224/32",
"52.22.155.48/32"
],
"actions": [
"4.148.0.0/16",
"4.149.0.0/18",
"4.149.64.0/19",
"4.149.96.0/19",
"4.149.128.0/17",
"4.150.0.0/18",
"4.150.64.0/18",
"4.150.128.0/18",
"4.150.192.0/19",
"4.150.224.0/19",
"4.151.0.0/16",
"4.152.0.0/15",
"4.154.0.0/15",
"4.156.0.0/15",
],
...
Outbound:
scheme:host https:iboard.ssi.com.vn https:finfo-iboard.ssi.com.vn https:fundamental-ssi.fiintrade.vn https:api.vietstock.vn https:vsd.vn https:hnx.vn https:www.hsx.vn https:s.cafef.vn https:streaming-api.hsc.com.vn http:f319.com
Appendix¶
Appendix A: Record of Changes¶
Table: Record of changes
Version | Date | Author | Description of Change |
---|---|---|---|
0.0.1 | 01/01/2024 | Bao Truong | Initation documentation |
0.0.2 | 01/10/2024 | Bao Truong | Added Component when deploying new server |
0.0.3 | 01/15/2024 | Bao Truong | Added Pip packages for Python |
0.0.4 | 01/17/2024 | Bao Truong | Added crond service health check |
0.0.5 | 01/25/2024 | Bao Truong | Revamp scripts |
0.2.0 | 02/02/2024 | Bao Truong | Updated layout of SAD |
0.2.0 | 05/17/2024 | Bao Truong | Updated component from basement |
Source Reference¶
-
[1] Business Requirement for Project of Financial Spectrum.
[2] Backup options with detail implement with OpenVPN of Gastkemper at How to use OpenVPN in Github Action workflow