Skip to content

Ops: Manual input

Overview

When the process to fetch data of right exercies from providers can not automaticly run or the resources has been meet problems, the internal team (Data team) can manual input and transfer to a staging area which then synced into dataset:inno:corporate-action for trigger flow adjustment.

Provider: Vietnam Securities Depository (VSD) which is official source of right exercises.

Provider Category Provider Status
Vietnam Securities Depository (VSD) Government Official

Targeted datasets:

Dataset Type Is Operation Dataset
Corporate Action Manual Input Yes

This required manual process from VSD within 1 week or to 1 month in advanced.

Mostly process happenned from GitHub > Repository: inno-basement

Flow

Introduction

The process has been worked by operation officer which included 4 stags with 9 steps that have been described below:

flowchart TB

  %% Component
  subgraph input_from_provider[Stage 1 - Input Excel from provider]
    step1[Step 1 - Manual input by copy news]
    step2[Step 2 - Basic transformation]
    step3[Step 3 - Detect actionable attributes]
    step4[Step 4 - Manual input action case by case]
    step5[Step 5 - Execute transit to buffer area]
  end

  subgraph execute_script[Stage - Execute related script]
    step6[Step 6 - Execute Sync process from manual input]
  end

  subgraph trigger_adjustment[Trigger flow adjustment]
    step7[Step 7 - Execute Flow adjustment proceess]
  end

  subgraph validate_consume[Validate consume area]
    step8[Step 8 - Validate from Pluto]
    step9[Step 9 - Validate from Spectrum]
  end

  %% Flow
  input_from_provider --> execute_script --> trigger_adjustment --> validate_consume

Detail Process

Following is the documentation that describe the stage and step

Stage 1: Input Excel from provider

[1] Step 1: Manual input by copy news

Go to website of VSD - Vietnam Securities Depository

Then:

(a) Used English (EN) language

(b) Go to news of corporate action per days

Syntax: https://vsd.vn/en/lich-giao-dich?tab=LICH_THQ&date=DD/MM/YYYY

For the date is the RECORD_DATE type

© Choosing target period time-frame

(d) then [Find] to get output results.

Sample of screenshot:

Screenshot - Input dataset

Then:

Get all records on a page and go through all pages by copy-paste.

Note:

(a) Using Basic Copy Paste Value - not using Copy-Pastes Value Only, to make sure we had URL link related.

(b) Remember the page position in navigation pages to make sure you are captured all records.

[2] Step 2: Basic Transform

Step 2a: Parse URL by VBA function

Get Url Link of each record by using Excel VBA

Added GetURL Macros into Excel that are captured data.

Guidance:

  1. Hold down the ALT + F11 keys to open the Microsoft Visual Basic for Applications window.

  2. Click Insert > Module, and paste the following code in the Module Window.

Function GetURL(pWorkRng As Range) As String
'Updateby Extendoffice
  GetURL = pWorkRng.Hyperlinks(1).Address
End Function
  1. Save the code and close the window, select a blank cell to type this formula =GetURL(A2) (A2 is the cell that the hyperlink in), and press Enter button. You can see the real hyperlink address is extracted.

Source: How to extract actual addresses from hyperlinks in Excel?

Step 2b: Rename Columns by copying following this into table names

erDiagram
  EVENT {
    int table_position
    string record_date
    string securities_code
    string isin_code
    string event_title
    string securities_type
    string exchange
    string administration_place
    string url
    string is_affected_ohlcv
  }

  DETAIL {
    string url
    float event_index
    float securities_code
    float code
    float amount
    float rate
    float issue_rate
    float issue_price
    float execution_date
  }

  EVENT ||--|{ DETAIL : contains

Step 2c: Transfer record_date columns in YYYYMMDD format. This required for parse

[3] Step 3: Detect actionable attributes

This required screen for term related to define yes/no for the action contain the metadata will be adjust

[4] Step 4: Manual input action case by

Go through each detail URL then manual input

[5] Step 5: Execute transit to buffer area

Step 5a: Output CSV with UTF8 encoding with some checklist:

a) Excel (.xlsx) file.

b) Valid record with not null at some key dimensions

c) UTF8 encoding

Example sample Excel (.xlsx) can be found at Repository inno-basement > master > component > provider > entry > data

Step 5b: Execute script at component/provider/vsd/entry/event_entry.py with production credentials

python component/provider/vsd/entry/event_entry.py -v --prefix vsd_event --start 20240708 --end 20240719 --revision 20240707
#   5199 | 2024-07-07 20:12:34,184 - __main__     ] - INFO     - Script has been invoked with included with arguments of > prefix:vsd_event, revision:20240707, start:20240708, end:20240719, verbose:True
#   5199 | 2024-07-07 20:12:34,184 - __main__     ] - INFO     -

# [+] [c7d36196-71fd-4200-833b-e0a5487505aa] The command trigger: [component/provider/vsd/entry/event_entry.py -v --prefix vsd_event --start 20240708 --end 20240719 --revision 20240707]
# Python version: 3.9.6 (tags/v3.9.6:db3ff76, Jun 28 2021, 15:26:21) [MSC v.1929 64 bit (AMD64)].

# [+] [c7d36196-71fd-4200-833b-e0a5487505aa] Job start with detail information:
# Component path: [`component/provider/vsd/entry/event_entry.py`]
# Name: ENTRY_MANUAL_VSD_EVENT
# Vendors: INNO
# Environment: production
# Start at: 2024-07-07 20:12:34
# Stage: [Queue] to [Running]

#   5233 | 2024-07-07 20:12:34,218 - __main__     ] - INFO     - Read file with name: `vsd_event_20240708_20240719_20240707.xlsx`
#   5233 | 2024-07-07 20:12:34,218 - __main__     ] - INFO     - The required sheets `EVENT, DETAIL` are available.
#   5233 | 2024-07-07 20:12:34,218 - __main__     ] - INFO     - Successful check required column for `EVENT`: [table_position, record_date, securities_code, isin_code, event_title, securities_type, exchange, administration_place, url, is_affected_ohlcv]
#   5245 | 2024-07-07 20:12:34,230 - __main__     ] - INFO     - Successful check required column for `DETAIL`: [url, event_index, securities_code, code, amount, rate, issue_rate, issue_price, execution_date]
#   5265 | 2024-07-07 20:12:34,251 - __main__     ] - INFO     - Successful check schema of `EVENT`
#   5269 | 2024-07-07 20:12:34,254 - __main__     ] - INFO     - Successful check schema of `DETAIL`
#   5306 | 2024-07-07 20:12:34,291 - __main__     ] - INFO     - [Event] Has total of 74 records
#   5307 | 2024-07-07 20:12:34,292 - __main__     ] - INFO     - [Detail] Has total of 54 records
#   6682 | 2024-07-07 20:12:35,667 - __main__     ] - INFO     - [Event] Inactive total of 12 records
#   6815 | 2024-07-07 20:12:35,801 - __main__     ] - INFO     - [Event] Successfully 74 inserted
#   7009 | 2024-07-07 20:12:35,995 - __main__     ] - INFO     - [Detail] Successfully inserted
#   7116 | 2024-07-07 20:12:36,101 - __main__     ] - INFO     -

# [+] [c7d36196-71fd-4200-833b-e0a5487505aa] Job finish with detail information:
# Finish at: 2024-07-07 20:12:36
# Total running time: 0 hours 0 minutes 1.917 seconds
# Stage: [Running] to [Success]

# Waiting up to 5 seconds.
# Sent all pending

Step 6: Execute sync process in the Workflow Orchesstration

Stage 3: Trigger Flow adjustment

Step 7: Execute flow adjustment in the Workflow Orchesstration

Stage 4: Validate consume area

Step 8: Validate HTTPs requests from Pluto

Step 9: Validate HTTPs requests from Spectrum