Model to deploy machine learning model
Overview
Progress
Agree on metrics to evaluate experiment(s)
- Data Collection: Identify all the data that may affect the end output
Collecting data requires assessing multiple existing tables (if available) and gathering data from other sources.
- EDA: Explore the data to identify which variables affect the target to be predicted
Determine how each factor affects the target and how strongly the factors affect each other
- Select algorithm: Research existing strategies and white papers
Select an algorithm based on hypothesis, type of features, and patterns in data:
+ Classification vs. regression
+ Supervised vs. unsupervised learning
+ Univariate or multivariate
+ Time series
+ DNN vs. non-DNN
Deliverables: document decisions related to algorithms
- Feature Engineering: From the factors identified in the EDA process, select the ones that will be the main inputs of the model
Calculate supporting factors for each file, then clean and label the data (transform raw data or craft new features)
Remove redundant/duplicate features
Remove highly correlated features
Reduce dimensionality as required
Check for class imbalance
Check for data leakage
Deliverable: input feature
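The correlation and class-imbalance checks listed above can be sketched in plain Python. This is a minimal illustration, not a prescribed method: the column names, the sample values, and the 0.95 correlation cutoff are all assumptions made for the example.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.95):
    """Keep a feature only if |r| with every already-kept feature is below threshold.

    `features` maps a column name to its list of values. The 0.95 cutoff is
    an illustrative assumption.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def class_balance(labels):
    """Return each class's share of the labels, to surface class imbalance."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: count / total for cls, count in counts.items()}


# Hypothetical sample data, for illustration only.
features = {
    "close": [10.0, 11.0, 12.0, 13.0, 14.0],
    "close_x2": [20.0, 22.0, 24.0, 26.0, 28.0],  # exact multiple of "close"
    "volume": [5.0, 3.0, 6.0, 2.0, 4.0],
}
labels = ["up", "up", "up", "up", "down"]

selected = drop_correlated(features)
balance = class_balance(labels)
print(selected)
print(balance)
```

The greedy loop keeps the first feature of each correlated group, so the order of the columns decides which duplicate survives.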
- Model Training: Choose 3 models to train on the data
Choose the time period the training data will cover
Choose the training dataset and the test dataset
Build a model
Determine duration and the amount of data for the initial experiment
Determine whether the model meets ROI requirements and risk requirements
Choose how the model will be adjusted over time (retraining)
Tools: Python libraries
Deliverable: code/trained model
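Choosing the time period and the training and test datasets can be illustrated with a chronological split. The sketch below assumes each row carries a `date` field; the tickers, values, and cutoff date are hypothetical.

```python
from datetime import date


def time_split(rows, cutoff):
    """Split dated rows: rows strictly before `cutoff` train, the rest test."""
    train = [r for r in rows if r["date"] < cutoff]
    test = [r for r in rows if r["date"] >= cutoff]
    return train, test


# Hypothetical rows, for illustration only.
rows = [
    {"date": date(2023, 1, 2), "ticker": "AAA", "ret": 0.01},
    {"date": date(2023, 2, 1), "ticker": "AAA", "ret": -0.02},
    {"date": date(2023, 3, 1), "ticker": "AAA", "ret": 0.03},
    {"date": date(2023, 4, 3), "ticker": "AAA", "ret": 0.00},
]

train, test = time_split(rows, cutoff=date(2023, 3, 1))
print(len(train), len(test))
```

Splitting by date rather than at random keeps later observations out of the training set, which is one way to avoid leaking future information into the model.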
- Model Evaluation: Measure the accuracy on the training sample and on the testing sample
Identify Overfitting and Underfitting
Iterate and Improve the result
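Comparing training and testing accuracy is a common first signal of overfitting or underfitting. The following is a rough sketch: the `min_acc` and `max_gap` thresholds and the sample labels are assumptions, and real cutoffs depend on the problem.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def diagnose_fit(train_acc, test_acc, min_acc=0.7, max_gap=0.1):
    """Rough heuristic; both thresholds are illustrative assumptions."""
    if train_acc < min_acc:
        return "underfitting"  # the model does poorly even on data it has seen
    if train_acc - test_acc > max_gap:
        return "overfitting"  # good on seen data, much worse on unseen data
    return "ok"


# Hypothetical labels and predictions, for illustration only.
train_acc = accuracy([1, 0, 1, 1, 0], [1, 0, 1, 1, 0])
test_acc = accuracy([1, 0, 1, 1, 0], [1, 1, 0, 1, 0])

verdict = diagnose_fit(train_acc, test_acc)
print(train_acc, test_acc, verdict)
```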
- Model Deployment Plan: Prepare performance and scale requirements for production
Prepare operationalization requirements for training and scoring
Prepare architecture for model training and retraining
Develop proposed timelines for training and retraining the model
Prepare a plan for rollout and the success criteria for increasing traffic.
- Deploy and operationalize the model: Convert the model into an API
Build dataset training and scoring architecture
Consume the model in business application(s)
Build an automated test
Build the feedback loop
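As one possible shape for converting the model into an API, the sketch below wraps a scoring function in a small HTTP endpoint using only the Python standard library. The `predict` function is a stand-in for the trained model; its weights, the feature names, and the port are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Placeholder for the trained model: a fixed linear score (hypothetical)."""
    weights = {"momentum": 0.5, "volatility": -0.25}
    return sum(weights[name] * features.get(name, 0.0) for name in weights)


def handle_payload(body):
    """Parse a JSON request body and return (status_code, response_dict)."""
    try:
        features = json.loads(body)
        if not isinstance(features, dict):
            raise ValueError("expected a JSON object")
        return 200, {"score": predict(features)}
    except (ValueError, TypeError) as exc:
        return 400, {"error": str(exc)}


class ScoreHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the request body, score it, and reply with JSON.
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_payload(self.rfile.read(length))
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8000):
    """Start the server; call this explicitly, it blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), ScoreHandler).serve_forever()


status, payload = handle_payload('{"momentum": 2.0, "volatility": 1.0}')
print(status, payload)
```

Keeping the parsing and scoring in `handle_payload`, separate from the HTTP handler, lets an automated test exercise the scoring path without starting a server.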
- Model Monitoring: Business processes rely on the ML model
Data analysis and feedback loop
Process
- Model Requirement: Identify and define the ML use case and the problems to be solved:
The output is a portfolio (set of tickers) that scores high on diversification and delivers a positive return at an acceptable risk level
Define hypothesis ("hypothesis" = potential pattern we expect to see in the data)
Each stock has a low correlation with the other stocks in the portfolio
Chance to have a high return
Low risk
Define experiment(s) to validate the hypothesis
EDA
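One way to check the three hypotheses during EDA is to compute, for a candidate portfolio, the average pairwise correlation, the mean return, and the volatility. This is a minimal sketch: the tickers and return series are hypothetical, and equal weighting of the stocks is an assumption made for the example.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series has no defined correlation
    return cov / sqrt(var_x * var_y)


def portfolio_summary(returns):
    """Summarize a portfolio given ticker -> list of periodic returns."""
    tickers = list(returns)
    pair_corrs = [
        pearson(returns[a], returns[b]) for a, b in combinations(tickers, 2)
    ]
    n_periods = len(returns[tickers[0]])
    # Equal-weighted portfolio return per period (an assumption of this sketch).
    port = [mean(returns[t][i] for t in tickers) for i in range(n_periods)]
    return {
        "avg_pairwise_corr": mean(pair_corrs),  # lower = more diversified
        "mean_return": mean(port),              # should be positive
        "volatility": pstdev(port),             # proxy for risk
    }


# Hypothetical return series, for illustration only.
returns = {
    "AAA": [0.02, -0.01, 0.03, 0.00],
    "BBB": [-0.01, 0.02, 0.00, 0.03],
    "CCC": [0.01, 0.01, 0.01, 0.02],
}

summary = portfolio_summary(returns)
print(summary)
```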
Identify data source(s)
Database, or external sources such as Vietstock, SSI, etc.
Identify Input category needed