Ops: Manual input¶
Overview¶
When the process to fetch data of right exercies from providers can not automaticly run or the resources has been meet problems, the internal team (Data team) can manual input and transfer to a staging area which then synced into dataset:inno:corporate-action
for trigger flow adjustment.
Provider: Vietnam Securities Depository (VSD) which is official source of right exercises.
Provider | Category | Provider Status |
---|---|---|
Vietnam Securities Depository (VSD) | Government | Official |
Targeted datasets:
Dataset | Type | Is Operation Dataset |
---|---|---|
Corporate Action | Manual Input | Yes |
This required manual process from VSD within 1 week or to 1 month in advanced.
Mostly process happenned from GitHub > Repository: inno-basement
Flow¶
Introduction¶
The process has been worked by operation officer which included 4 stags with 9 steps that have been described below:
flowchart TB
%% Component
subgraph input_from_provider[Stage 1 - Input Excel from provider]
step1[Step 1 - Manual input by copy news]
step2[Step 2 - Basic transformation]
step3[Step 3 - Detect actionable attributes]
step4[Step 4 - Manual input action case by case]
step5[Step 5 - Execute transit to buffer area]
end
subgraph execute_script[Stage - Execute related script]
step6[Step 6 - Execute Sync process from manual input]
end
subgraph trigger_adjustment[Trigger flow adjustment]
step7[Step 7 - Execute Flow adjustment proceess]
end
subgraph validate_consume[Validate consume area]
step8[Step 8 - Validate from Pluto]
step9[Step 9 - Validate from Spectrum]
end
%% Flow
input_from_provider --> execute_script --> trigger_adjustment --> validate_consume
Detail Process¶
Following is the documentation that describe the stage and step
Stage 1: Input Excel from provider¶
[1] Step 1: Manual input by copy news
Go to website of VSD - Vietnam Securities Depository
Then:
(a) Used English (EN) language
(b) Go to news of corporate action per days
Syntax: https://vsd.vn/en/lich-giao-dich?tab=LICH_THQ&date=DD/MM/YYYY
For the date is the RECORD_DATE
type
© Choosing target period time-frame
(d) then [Find] to get output results.
Sample of screenshot:
Then:
Get all records on a page and go through all pages by copy-paste.
Note:
(a) Using Basic Copy Paste Value - not using Copy-Pastes Value Only, to make sure we had URL link related.
(b) Remember the page position in navigation pages to make sure you are captured all records.
[2] Step 2: Basic Transform
Step 2a: Parse URL by VBA function
Get Url Link of each record by using Excel VBA
Added GetURL
Macros into Excel that are captured data.
Guidance:
-
Hold down the ALT + F11 keys to open the Microsoft Visual Basic for Applications window.
-
Click Insert > Module, and paste the following code in the Module Window.
Function GetURL(pWorkRng As Range) As String
'Updateby Extendoffice
GetURL = pWorkRng.Hyperlinks(1).Address
End Function
- Save the code and close the window, select a blank cell to type this formula =GetURL(A2) (A2 is the cell that the hyperlink in), and press Enter button. You can see the real hyperlink address is extracted.
Source: How to extract actual addresses from hyperlinks in Excel?
Step 2b: Rename Columns by copying following this into table names
erDiagram
EVENT {
int table_position
string record_date
string securities_code
string isin_code
string event_title
string securities_type
string exchange
string administration_place
string url
string is_affected_ohlcv
}
DETAIL {
string url
float event_index
float securities_code
float code
float amount
float rate
float issue_rate
float issue_price
float execution_date
}
EVENT ||--|{ DETAIL : contains
Step 2c: Transfer record_date
columns in YYYYMMDD
format. This required for parse
[3] Step 3: Detect actionable attributes
This required screen for term related to define yes/no
for the action contain the metadata will be adjust
[4] Step 4: Manual input action case by
Go through each detail URL then manual input
[5] Step 5: Execute transit to buffer area
Step 5a: Output CSV with UTF8 encoding with some checklist:
a) Excel (.xlsx) file.
b) Valid record with not null at some key dimensions
c) UTF8 encoding
Example sample Excel (.xlsx) can be found at Repository inno-basement
> master
> component > provider > entry > data
Step 5b: Execute script at component/provider/vsd/entry/event_entry.py
with production credentials
python component/provider/vsd/entry/event_entry.py -v --prefix vsd_event --start 20240708 --end 20240719 --revision 20240707
# 5199 | 2024-07-07 20:12:34,184 - __main__ ] - INFO - Script has been invoked with included with arguments of > prefix:vsd_event, revision:20240707, start:20240708, end:20240719, verbose:True
# 5199 | 2024-07-07 20:12:34,184 - __main__ ] - INFO -
# [+] [c7d36196-71fd-4200-833b-e0a5487505aa] The command trigger: [component/provider/vsd/entry/event_entry.py -v --prefix vsd_event --start 20240708 --end 20240719 --revision 20240707]
# Python version: 3.9.6 (tags/v3.9.6:db3ff76, Jun 28 2021, 15:26:21) [MSC v.1929 64 bit (AMD64)].
# [+] [c7d36196-71fd-4200-833b-e0a5487505aa] Job start with detail information:
# Component path: [`component/provider/vsd/entry/event_entry.py`]
# Name: ENTRY_MANUAL_VSD_EVENT
# Vendors: INNO
# Environment: production
# Start at: 2024-07-07 20:12:34
# Stage: [Queue] to [Running]
# 5233 | 2024-07-07 20:12:34,218 - __main__ ] - INFO - Read file with name: `vsd_event_20240708_20240719_20240707.xlsx`
# 5233 | 2024-07-07 20:12:34,218 - __main__ ] - INFO - The required sheets `EVENT, DETAIL` are available.
# 5233 | 2024-07-07 20:12:34,218 - __main__ ] - INFO - Successful check required column for `EVENT`: [table_position, record_date, securities_code, isin_code, event_title, securities_type, exchange, administration_place, url, is_affected_ohlcv]
# 5245 | 2024-07-07 20:12:34,230 - __main__ ] - INFO - Successful check required column for `DETAIL`: [url, event_index, securities_code, code, amount, rate, issue_rate, issue_price, execution_date]
# 5265 | 2024-07-07 20:12:34,251 - __main__ ] - INFO - Successful check schema of `EVENT`
# 5269 | 2024-07-07 20:12:34,254 - __main__ ] - INFO - Successful check schema of `DETAIL`
# 5306 | 2024-07-07 20:12:34,291 - __main__ ] - INFO - [Event] Has total of 74 records
# 5307 | 2024-07-07 20:12:34,292 - __main__ ] - INFO - [Detail] Has total of 54 records
# 6682 | 2024-07-07 20:12:35,667 - __main__ ] - INFO - [Event] Inactive total of 12 records
# 6815 | 2024-07-07 20:12:35,801 - __main__ ] - INFO - [Event] Successfully 74 inserted
# 7009 | 2024-07-07 20:12:35,995 - __main__ ] - INFO - [Detail] Successfully inserted
# 7116 | 2024-07-07 20:12:36,101 - __main__ ] - INFO -
# [+] [c7d36196-71fd-4200-833b-e0a5487505aa] Job finish with detail information:
# Finish at: 2024-07-07 20:12:36
# Total running time: 0 hours 0 minutes 1.917 seconds
# Stage: [Running] to [Success]
# Waiting up to 5 seconds.
# Sent all pending
Stage 2: Execute related script¶
Step 6: Execute sync process in the Workflow Orchesstration
Stage 3: Trigger Flow adjustment¶
Step 7: Execute flow adjustment in the Workflow Orchesstration
Stage 4: Validate consume area¶
Step 8: Validate HTTPs requests from Pluto
Step 9: Validate HTTPs requests from Spectrum