
Job Schedule

Overview

Schedule timeline for tracking the data freshness of the project. It currently lists the schedules related to the Pluto endpoint (aka DAPI).

The design concept of the pipeline progress:

```mermaid
flowchart LR
  j[Job - the job is executed with the target model] --> etl[ETL - the executor invokes the ETL process on the data lake] --> transfer[Transfer - data is transferred to the endpoint]
```

Total time for data to arrive from the provider (Freshness Time; a worked sketch follows the component list below):

Freshness Time = Job Model Run + CDC Transfer + ETL Wait + ETL Progress + Endpoint Transfer

[1] Job model: around 1-5 min

[2] CDC: currently around 3-5 min (maximum observed: 10 min)

[3] The ETL step takes around 5 min (based on our models) and is invoked by the scheduler at the following times, in Asia/Ho_Chi_Minh time (UTC+07):

```mermaid
flowchart LR
  1[5:00] --> 2[7:00] --> 3[9:00] --> 4[12:00] --> 5[16:00] --> 6[17:00] --> 7[18:00] --> 8[20:00] --> 9[23:00]
```

[4] The transfer takes around 20 min (due to the upsert strategy) and is triggered after the ETL model, at the following times:

```mermaid
flowchart LR
  1[5:30] --> 2[7:30] --> 3[9:30] --> 4[12:30] --> 5[16:30] --> 6[17:30] --> 7[18:30] --> 8[20:30] --> 9[23:30]
```
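As a worked example of the formula above, here is a minimal Python sketch that sums the component estimates from items [1]-[4]. The numbers are the rough upper bounds quoted above; `etl_wait` is whatever gap remains until the next scheduled ETL run and is supplied by the caller.

```python
from datetime import timedelta

# Rough upper-bound estimates quoted in items [1]-[4] above.
JOB_MODEL_RUN = timedelta(minutes=5)       # [1] job model: 1-5 min
CDC_TRANSFER = timedelta(minutes=10)       # [2] CDC: 3-5 min, max observed 10 min
ETL_PROGRESS = timedelta(minutes=5)        # [3] ETL: around 5 min per model
ENDPOINT_TRANSFER = timedelta(minutes=20)  # [4] transfer: around 20 min (upsert)

def freshness_time(etl_wait: timedelta) -> timedelta:
    """Freshness Time = Job Model Run + CDC Transfer + ETL Wait + ETL Progress + Endpoint Transfer."""
    return JOB_MODEL_RUN + CDC_TRANSFER + etl_wait + ETL_PROGRESS + ENDPOINT_TRANSFER

# Example: the job lands 15 minutes before the next scheduled ETL run.
print(freshness_time(timedelta(minutes=15)))  # -> 0:55:00
```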

For the integrator, the Best Match time aims to capture the last stage of the freshness window.

Note: The timezone is Asia/Ho_Chi_Minh (UTC+07)

Term

| Term | Description |
|---|---|
| Model | The name representing the script component to be executed |
| Provider | The name of the provider for that model |
| Job Schedule | The CRON schedule, e.g. `27 2,16 * * 1,5` (expanded in the sketch below) |
| Variant | A concrete CRON point in time within a specific timeframe (typically daily) |
| Best Match | The run time at which the model fetches the freshest data from the provider |
| Timezone | The time zone the schedule is expressed in |
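For illustration, the example Job Schedule above can be expanded into concrete Variant run times. This is a minimal sketch using the third-party croniter package (an assumption; the project does not prescribe a tool), together with the Asia/Ho_Chi_Minh timezone from the note above.

```python
# pip install croniter
from datetime import datetime
from zoneinfo import ZoneInfo

from croniter import croniter

tz = ZoneInfo("Asia/Ho_Chi_Minh")  # UTC+07, per the note above
schedule = "27 2,16 * * 1,5"       # 02:27 and 16:27 on Mondays and Fridays

it = croniter(schedule, datetime.now(tz))
for _ in range(3):
    # Each value is one "Variant": a concrete CRON point in time.
    print(it.get_next(datetime))
```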

Schedule

Matrix

This matrix serializes the schedule in the form of target information. The schedule must align with the crontab construct (a consistency sketch follows the matrix).

| Model name [Metadata] | Variant | Model (24H) | ETL (24H) | Transfer (24H) | Best Match |
|---|---|---|---|---|---|
| OHLCV [Model] | 01 | 15:30 | 16:00 | 16:30 | X |
| ... | 02 | 18:30 | 20:00 | 20:30 | |
| ... | 03 | 21:30 | 23:00 | 23:30 | |
| Adjustment [Model] | 01 | 04:45 | 05:00 | 05:30 | |
| ... | 02 | 15:45 | 16:00 | 16:30 | X |
| ... | 03 | 19:45 | 20:00 | 20:30 | |
| ... | 04 | 22:45 | 23:00 | 23:30 | |
| Event Right [Model] | 01 | 04:15 | 05:00 | 05:30 | |
| ... | 02 | 15:15 | 16:00 | 16:30 | X |
| ... | 03 | 19:15 | 20:00 | 20:30 | |
| ... | 04 | 21:15 | 23:00 | 23:30 | |
| Event Detail [Model] | 01 | 04:20 | 05:00 | 05:30 | |
| ... | 02 | 15:20 | 16:00 | 16:30 | X |
| ... | 03 | 19:20 | 20:00 | 20:30 | |
| ... | 04 | 21:20 | 23:00 | 23:30 | |
| VNINDEX [Model] | 01 | 17:25 | 18:00 | 18:30 | X |
| ... | 02 | 20:25 | 23:00 | 23:30 | |
| ... | 03 | 02:25 | 05:00 | 05:30 | |
| Index Metadata [Model] | 01 | 17:40 | 18:00 | 18:30 | X |
| ... | 02 | 20:40 | 23:00 | 23:30 | |
| ... | 03 | 02:40 | 05:00 | 05:30 | |
| Index Constituent [Model] (HOSE) | 01 | 03:45 | 05:00 | 05:30 | X |
| ... | 02 | 16:45 | 17:00 | 17:30 | |
| ... | 03 | 21:45 | 23:00 | 23:30 | |
| Index Constituent [Model] (HNX) | 01 | 03:20 | 05:00 | 05:30 | X |
| ... | 02 | 17:20 | 18:00 | 18:30 | |
| ... | 03 | 22:20 | 23:00 | 23:30 | |
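The matrix keeps a fixed 30-minute offset between the ETL and Transfer columns (covering the ~20 min transfer from item [4]). The sketch below is an illustrative consistency check over a sample of rows, not a project tool.

```python
from datetime import datetime, timedelta

# (model, variant, etl, transfer) - a sample of the matrix above.
rows = [
    ("OHLCV", "01", "16:00", "16:30"),
    ("Adjustment", "01", "05:00", "05:30"),
    ("VNINDEX", "02", "23:00", "23:30"),
    ("Index Constituent (HNX)", "03", "23:00", "23:30"),
]

for model, variant, etl, transfer in rows:
    delta = datetime.strptime(transfer, "%H:%M") - datetime.strptime(etl, "%H:%M")
    assert delta == timedelta(minutes=30), (model, variant)
print("matrix aligned: Transfer = ETL + 30 min")
```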

Updated at 2023-06-07 by @bao.truong

Model

Model: OHLCV

Description: Get the OHLCV data from the exchange (via HSC)

Rule: At least one run per trading day

Endpoint: stocks/eod, stocks/detail

Applies to both indices and common stocks (a hedged request sketch follows).
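A hedged request sketch against the endpoints listed above. Only the paths stocks/eod and stocks/detail come from this page; the host and the query parameter are placeholders.

```python
import requests

BASE_URL = "https://pluto.example.com"  # placeholder for the Pluto (DAPI) host

# Hypothetical call: fetch end-of-day OHLCV for one ticker.
resp = requests.get(
    f"{BASE_URL}/stocks/eod",
    params={"symbol": "AAA"},  # hypothetical parameter name
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```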

Model: Adjustment

Description: Adjusts the affected OHLCV prices based on event rights

Rule: None

Endpoint: stocks/eod, stocks/detail

Scan: tickers whose rights trigger an adjustment on the next trading date

Related: Model Event Right, Model Event Detail

Model: Event Right

Description: Get the list of historical events

Rule: None

Endpoint: stocks/eod, stocks/detail

Model: Event Detail

Description: Get detailed information on adjustments based on rights news

Rule: None

Endpoint: stocks/eod, stocks/detail, stocks/adjust

Model: VNINDEX

Description: Get the historical index metadata

Rule: None

Endpoint: stocks/detail - VNINDEX

Model: Index Constituent

Description: Get the historical index constituents, for both HOSE and HNX

Rule: None

Endpoint: stocks/detail - (Query: type=index)

Model: Index Metadata

Description: Capture snapshot metadata of index

Rule: Verify the true value of historical index market cap

Endpoint: stocks/detail - (Query: type=index)
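For the index models, the page notes the query type=index on stocks/detail. A minimal sketch (same placeholder host as above):

```python
import requests

resp = requests.get(
    "https://pluto.example.com/stocks/detail",  # placeholder host
    params={"type": "index"},  # query documented above
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```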

Troubleshooting

How to capture the data arrival (or freshness) window

Outside of the ETL progress timeframe, there are two ways to catch the next data batch (a helper sketch for option 1 follows the list):

(1) Wait until the next scheduled ETL run

(2) Request a manual invocation from the user who maintains the data pipeline
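For option (1), a minimal helper that finds the next scheduled ETL slot, using the fixed daily ETL timeline above (an illustrative sketch, assuming the slot list stays as documented):

```python
from datetime import datetime, time, timedelta
from zoneinfo import ZoneInfo

# Daily ETL slots from the timeline above (Asia/Ho_Chi_Minh).
ETL_SLOTS = [time(5), time(7), time(9), time(12), time(16),
             time(17), time(18), time(20), time(23)]

def next_etl_run(now: datetime) -> datetime:
    """Return the next scheduled ETL run strictly after `now`."""
    for slot in ETL_SLOTS:
        candidate = now.replace(hour=slot.hour, minute=slot.minute,
                                second=0, microsecond=0)
        if candidate > now:
            return candidate
    # Past the last slot today: roll over to the first slot tomorrow.
    first = ETL_SLOTS[0]
    return (now + timedelta(days=1)).replace(
        hour=first.hour, minute=first.minute, second=0, microsecond=0)

print(next_etl_run(datetime.now(ZoneInfo("Asia/Ho_Chi_Minh"))))
```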