Process for Deploying a Machine Learning Model

Overview

Stages

Agree on the metrics used to evaluate the experiment(s)
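As a starting point for that discussion, the sketch below computes a few metrics commonly used to judge a stock portfolio: annualized return, annualized volatility, Sharpe ratio and maximum drawdown. It assumes daily closing prices and 252 trading days per year; both are assumptions, as is the default risk-free rate of 0. The metric set the team finally agrees on may differ.

```python
"""Candidate portfolio evaluation metrics (illustrative sketch).

Assumes daily closing prices and 252 trading days per year; both are
assumptions, not values fixed by this plan.
"""
import math
from statistics import mean, stdev

TRADING_DAYS = 252  # assumption: trading days per year


def daily_returns(prices):
    """Simple daily returns: prices[t] / prices[t - 1] - 1."""
    return [curr / prev - 1 for prev, curr in zip(prices, prices[1:])]


def annualized_return(returns):
    """Compound the daily returns and scale the growth to one year."""
    growth = math.prod(1 + r for r in returns)
    return growth ** (TRADING_DAYS / len(returns)) - 1


def annualized_volatility(returns):
    """Sample standard deviation of daily returns, scaled by sqrt(TRADING_DAYS)."""
    return stdev(returns) * math.sqrt(TRADING_DAYS)


def sharpe_ratio(returns, risk_free_rate=0.0):
    """Annualized Sharpe ratio: mean daily excess return over its standard deviation."""
    daily_rf = risk_free_rate / TRADING_DAYS
    excess = [r - daily_rf for r in returns]
    return mean(excess) / stdev(excess) * math.sqrt(TRADING_DAYS)


def max_drawdown(prices):
    """Largest peak-to-trough decline, returned as a negative fraction."""
    peak, worst = prices[0], 0.0
    for p in prices:
        peak = max(peak, p)
        worst = min(worst, p / peak - 1)
    return worst


if __name__ == "__main__":
    prices = [100, 102, 101, 105, 103, 108]  # toy prices, not real market data
    r = daily_returns(prices)
    print(f"return={annualized_return(r):.2%} vol={annualized_volatility(r):.2%}")
    print(f"sharpe={sharpe_ratio(r):.2f} max_drawdown={max_drawdown(prices):.2%}")
```

Return and Sharpe ratio map to the "positive return" goal, while volatility and drawdown map to the "affordable risk level" goal.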

Data Collection: Identify all the data that may affect the final output

Assess the data already available across multiple tables (if any), and collect additional data from other sources

EDA: Explore the data to identify which variables affect the target that needs to be predicted

Determine how each factor affects the target, and to what degree the factors affect each other
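A minimal way to start this analysis is to rank each candidate factor by its Pearson correlation with the target, and to compute the correlation between every pair of factors. This is a sketch only: Pearson correlation captures linear relationships and nothing else, and the factor names in the usage line are hypothetical.

```python
"""Rank candidate factors by how strongly they move with the target (sketch)."""
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def rank_factors(factors, target):
    """Return (name, correlation) pairs sorted by |correlation|, strongest first."""
    scores = [(name, pearson(values, target)) for name, values in factors.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


def factor_pair_correlations(factors):
    """Correlation between every pair of factors, i.e. how they move together."""
    names = list(factors)
    return {
        (a, b): pearson(factors[a], factors[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


if __name__ == "__main__":
    # Hypothetical factors and target values, for illustration only.
    factors = {"pe_ratio": [1, 2, 3, 4], "volume_change": [4, 1, 3, 2]}
    target = [2, 4, 6, 8]
    print(rank_factors(factors, target))
    print(factor_pair_correlations(factors))
```

Strong factor-to-target correlations point to candidate model inputs. Strong factor-to-factor correlations warn that two inputs may carry the same information.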

Select Algorithm: Research existing strategies and white papers

Select an algorithm based on the hypothesis, the type of features, and the patterns in the data:
+ Classification vs. regression
+ Supervised vs. unsupervised learning
+ Univariate vs. multivariate
+ Time series
+ DNN vs. non-DNN

Deliverable: a document recording the algorithm-selection decisions

Feature Engineering: From the factors identified during EDA, select those that will be the main inputs of the model

Calculate supporting factors for each file, then clean and label the data (transform raw data or craft new features)

Remove redundant/duplicate features

Remove highly correlated features

Reduce dimensionality as required

Check for class imbalance

Check for data leakage

Deliverable: input features
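The pruning and sanity checks listed above can be sketched in plain Python, assuming features are held as a dict of name to value list. The 0.9 correlation threshold and the 0.999 leakage threshold are hypothetical and would need tuning. Flagging near-perfect correlation with the target is only a heuristic for leakage; it does not replace checking that every feature is available at prediction time.

```python
"""Feature-pruning checks for the Feature Engineering step (illustrative sketch)."""
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def drop_duplicate_features(features):
    """Keep only the first of any features whose value lists are identical."""
    seen, kept = set(), {}
    for name, values in features.items():
        key = tuple(values)
        if key not in seen:
            seen.add(key)
            kept[name] = values
    return kept


def drop_correlated_features(features, threshold=0.9):
    """Greedily keep a feature only if |corr| with every kept feature is below threshold."""
    kept = {}
    for name, values in features.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (1.0 means balanced)."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)


def suspected_leaks(features, target, threshold=0.999):
    """Features that track the target almost perfectly deserve a leakage review."""
    return [name for name, values in features.items()
            if abs(pearson(values, target)) >= threshold]
```

The greedy pass keeps whichever member of a correlated pair appears first, so feature order matters. Ordering features by their EDA importance first is one reasonable choice.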

Model Training: Choose three candidate models to train on the data

Choose the time period of data used for training

Split the data into a training set and a test set

Build a model

Determine duration and the amount of data for the initial experiment

Determine whether the model meets ROI requirements and risk requirements

Decide how the model will be adjusted over time (retraining)

Tools: Python libraries

Deliverable: code/trained model
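For price data, the train/test split should respect time order, so that the model is never evaluated on a period earlier than the one it was trained on. The sketch below shows a chronological split and a loop that fits three candidate models and compares their test-period error. The three models (mean baseline, one-feature least squares, 1-nearest-neighbour) are deliberately simple, hypothetical stand-ins for the library models (e.g. from scikit-learn) the team would actually choose.

```python
"""Chronological train/test split and a comparison of three candidate models (sketch)."""
from statistics import mean


def chronological_split(rows, train_fraction=0.8):
    """Split time-sorted rows into (train, test) without shuffling."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def fit_mean(xs, ys):
    """Baseline: always predict the training-set mean of y."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Ordinary least squares with one feature: y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x


def fit_nearest(xs, ys):
    """1-nearest-neighbour: predict the y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


# Hypothetical candidate set; swap in real models here.
CANDIDATES = {"mean": fit_mean, "linear": fit_linear, "nearest": fit_nearest}


def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def train_and_compare(rows, train_fraction=0.8):
    """Fit every candidate on the training period and report test-period MSE.

    `rows` is a time-sorted list of (feature, target) pairs; both periods
    must be non-empty.
    """
    train, test = chronological_split(rows, train_fraction)
    tx, ty = zip(*train)
    vx, vy = zip(*test)
    return {name: mse(fit(tx, ty), vx, vy) for name, fit in CANDIDATES.items()}
```

Keeping the mean baseline among the candidates gives a floor: a model that cannot beat it on the test period is unlikely to meet the ROI requirement.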

Model Evaluation: Measure accuracy on the training sample and on the test sample

Identify overfitting and underfitting

Iterate to improve the results
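Comparing training and test accuracy is a simple way to separate the two failure modes. Low accuracy even on the training data suggests underfitting. High training accuracy with much lower test accuracy suggests overfitting. The 0.10 gap and 0.60 minimum accuracy below are hypothetical thresholds, and the labels could be, for example, "up"/"down" price moves.

```python
"""Flag over- or underfitting from training and test accuracy (sketch)."""


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def diagnose(train_acc, test_acc, max_gap=0.10, min_acc=0.60):
    """Return 'underfitting', 'overfitting' or 'ok' (thresholds are assumptions)."""
    if train_acc < min_acc:
        return "underfitting"  # the model cannot even fit its own training data
    if train_acc - test_acc > max_gap:
        return "overfitting"  # good on training data, much worse on unseen data
    return "ok"
```

Typical follow-up actions are adding features or model capacity when underfitting, and regularizing, simplifying the model or adding data when overfitting.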

Model Deployment Plan: Prepare performance and scaling requirements for production

Prepare operationalization requirements for training and scoring

Prepare architecture for model training and retraining

Develop proposed timelines for training and retraining the model

Prepare a plan for rollout and the success criteria for increasing traffic.
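One way to make the rollout plan concrete is a fixed sequence of traffic shares, where the next increase happens only if the success criteria hold at the current stage. The stage percentages, the error-rate limit and the latency limit below are hypothetical placeholders that the plan would replace with agreed values.

```python
"""Staged rollout where success criteria gate each traffic increase (sketch)."""
from dataclasses import dataclass

ROLLOUT_STAGES = [5, 25, 50, 100]  # hypothetical % of traffic served by the new model


@dataclass(frozen=True)
class SuccessCriteria:
    max_error_rate: float = 0.01       # share of failed scoring requests
    max_p95_latency_ms: float = 200.0  # 95th-percentile response time


def next_traffic_share(current, error_rate, p95_latency_ms, criteria=SuccessCriteria()):
    """Return the next traffic share: advance if criteria pass, otherwise roll back to 0."""
    if error_rate > criteria.max_error_rate or p95_latency_ms > criteria.max_p95_latency_ms:
        return 0  # roll back: all traffic returns to the current production model
    if current not in ROLLOUT_STAGES:
        return ROLLOUT_STAGES[0]  # not started yet: begin with the smallest stage
    idx = ROLLOUT_STAGES.index(current)
    return ROLLOUT_STAGES[min(idx + 1, len(ROLLOUT_STAGES) - 1)]
```

Business metrics such as realized portfolio return could be added to `SuccessCriteria` in the same way as the operational limits.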

Deploy and Operationalize the Model: Convert the model into an API

Build the data architecture for training and scoring

Consume the model in business application(s)

Build automated tests

Build the feedback loop
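Converting the model into an API can be as small as a JSON-over-HTTP endpoint. The stdlib-only sketch below keeps request validation in a plain function (`handle_payload`) so that it can be covered by automated tests without starting a server. The `predict` function, the `/predict` path and the `{"features": [...]}` payload shape are hypothetical placeholders for the real trained model and contract.

```python
"""Expose a trained model as a small JSON HTTP API (sketch, stdlib only)."""
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Placeholder model: replace with the trained model's prediction call."""
    return sum(features) / len(features)


def handle_payload(body):
    """Validate a JSON request body (bytes) and return (HTTP status, response dict)."""
    try:
        payload = json.loads(body)
        features = [float(x) for x in payload["features"]]
        if not features:
            raise ValueError("empty feature list")
    except (ValueError, KeyError, TypeError) as exc:
        return 400, {"error": str(exc)}
    return 200, {"prediction": predict(features)}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, response = handle_payload(self.rfile.read(length))
        data = json.dumps(response).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8000):
    """Blocking call; start it from a deployment script, not at import time."""
    HTTPServer(("", port), PredictHandler).serve_forever()
```

After calling `serve()`, a client could call it with `curl -X POST localhost:8000/predict -d '{"features": [1, 2, 3]}'`. A production deployment would more likely use a web framework and a proper application server, but the request/response contract stays the same.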

Model Monitoring: Monitor the business processes that rely on the ML model

Data analysis and feedback loop
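For the data-analysis side of monitoring, one common check is whether live inputs still look like the training data. The sketch uses the Population Stability Index (PSI) for that. Choosing PSI is an assumption rather than a requirement of this plan, and the cut-offs quoted in the docstring are a widely used rule of thumb, not a standard.

```python
"""Population Stability Index (PSI) to detect input drift (sketch).

Rule of thumb: PSI < 0.1 stable, 0.1 to 0.25 moderate shift,
> 0.25 significant shift worth investigating or retraining.
"""
import math


def _bin_shares(values, edges):
    """Share of values in each bin; out-of-range values fall into the end bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = 0
        while i < len(counts) - 1 and v >= edges[i + 1]:
            i += 1
        counts[i] += 1
    return [c / len(values) for c in counts]


def psi(expected, actual, bins=10, eps=1e-6):
    """Compare the live distribution `actual` with the training distribution `expected`."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature
    edges = [lo + k * width for k in range(bins + 1)]
    total = 0.0
    for e, a in zip(_bin_shares(expected, edges), _bin_shares(actual, edges)):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) for empty bins
        total += (a - e) * math.log(a / e)
    return total
```

Running this per feature on each scoring batch, and feeding alerts back into the retraining schedule, is one way to close the feedback loop.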

Process

Model Requirement: Identify and define the ML use case and the problems to be solved:

The output is a portfolio (a set of tickers) that scores highly on diversification and achieves a positive return at an acceptable level of risk

Define the hypothesis (a “hypothesis” is a potential pattern we expect to see in the data):

Each stock has low correlation with the other stocks in the portfolio

The portfolio has a good chance of achieving a high return

Low risk
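These three hypotheses can be turned into checks on a candidate portfolio. The sketch assumes equal-length daily return series per ticker and an equal-weighted portfolio. Equal weighting, the 0.3 average-correlation threshold and the ticker names in the test are hypothetical choices. Volatility is reported rather than thresholded because "low risk" only makes sense relative to a benchmark.

```python
"""Check a candidate portfolio against the three hypotheses (sketch)."""
import math
from itertools import combinations
from statistics import mean, stdev


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def average_pairwise_correlation(returns_by_ticker):
    """Mean correlation over all ticker pairs (needs at least two tickers)."""
    pairs = combinations(returns_by_ticker.values(), 2)
    return mean(pearson(a, b) for a, b in pairs)


def equal_weight_returns(returns_by_ticker):
    """Daily return of a portfolio holding every ticker with equal weight."""
    return [mean(day) for day in zip(*returns_by_ticker.values())]


def check_hypotheses(returns_by_ticker, max_avg_corr=0.3):
    """Evaluate diversification, return and risk for one candidate portfolio."""
    port = equal_weight_returns(returns_by_ticker)
    return {
        "low_correlation": average_pairwise_correlation(returns_by_ticker) <= max_avg_corr,
        "positive_mean_return": mean(port) > 0,
        "daily_volatility": stdev(port),  # lower means less risk; compare to a benchmark
    }
```

Running the same checks on held-out periods, rather than on the period used to pick the stocks, is what turns them into a test of the hypotheses.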

Define experiment(s) to validate the hypothesis

EDA

Identify data source(s)

Internal database, or external sources such as Vietstock, SSI, etc.

Identify the input categories needed