2.9%

Seiji Tanimoto

May 11, 2021 • 2 min read

2.9%.
Do you know what this number means?

It's the AI adoption rate among private companies in Japan, as published by Yano Research Institute (2018). Now, let's think about this low adoption rate today.

AI is not one single field. What I am going to talk about here is machine learning. There are three main types of learning in machine learning: supervised learning, unsupervised learning, and reinforcement learning. In supervised learning, a large amount of labeled data is used to train an algorithm to predict an unknown outcome. To apply this kind of machine learning in practice, the following factors are important.

Define: What are the business benefits?
Data: Do you have the right quality and quantity of data?
Develop: Can we develop the best model?
Deploy: Can the model be executed in a practical environment?
Drive: Is the entire system ready for operation?

Only when all five D's are satisfied can AI be introduced. Let's consider each item in detail.

Define (development definition)
At this stage, the most important question is: what do we want to do with machine learning? What should the model learn? How will it learn? How will we evaluate the model? How far do we want the system to go? Above all, have a clear idea of what you want to achieve.

Data (quantity)
In many legacy companies, data is often unstructured. In addition, permission to collect data is often not granted. Data collection and annotation are costly, and processing the data takes a lot of time and effort. If the data is not prepared, it cannot serve as a basis for developing algorithms.

Data (quality)
Next, let's consider the quality of the data. If the quality of the data is poor, the results are meaningless: "Garbage in, garbage out". Identifying and preparing the right data is said to be 80% of AI development.
Points to consider in data quality
Is there any noise in the data that may affect accuracy?
Is the reproducibility of the measurement ensured?
When annotating, are the labeling criteria clear?
Are there any differences between actual operation data and training data?
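
As a rough illustration of the last point, the sketch below compares one numeric feature in the training data with the same feature in operation data. The function name `mean_shift_in_std_units`, the sample values, and the choice to measure the gap in training standard deviations are all assumptions made for this example. It is a simple sanity check, not a complete drift detection method.

```python
from statistics import mean, stdev


def mean_shift_in_std_units(train_values, production_values):
    """Return how far the production mean is from the training mean,
    measured in units of the training standard deviation."""
    train_std = stdev(train_values)
    if train_std == 0:
        raise ValueError("training data has no variation")
    return abs(mean(production_values) - mean(train_values)) / train_std


# Hypothetical sensor readings: the operation data sits clearly higher.
train = [10.0, 11.0, 9.0, 10.5, 9.5]
production = [14.0, 15.0, 13.5, 14.5, 15.5]

shift = mean_shift_in_std_units(train, production)
print(f"shift = {shift:.2f} training standard deviations")
```

A large value suggests that the model is being asked to predict on data unlike what it was trained on. What counts as "large" depends on the application and has to be decided for each case.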

Develop
First, let's look at leakage.
Leakage occurs when the training data contains information about the value being predicted, information that would not be available at prediction time, because the data was handled improperly. Even if high accuracy is achieved during development, the model will not work well in the production environment.
Observe the minimum rules, such as handling time series data appropriately: for example, never train on data from after the period you are trying to predict.
If you do not understand the meaning of the data, the model will end up learning the wrong thing.
In the same way, in machine learning for stock price prediction it is very important to know what to train the model on. If you simply make predictions based on the movements of a few minutes ago, the model will only trail the stock price.
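
The two points above can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the row layout (`timestamp`, `price`), the function names `chronological_split` and `naive_last_value_mae`, and the sample prices are all hypothetical. The first function splits time series data so that every training row comes before every test row. The second computes the error of a "predict the previous price" baseline, which a stock model must beat to be doing more than following the price.

```python
def chronological_split(rows, train_fraction=0.8):
    """Split rows by time so that no training row is later than a test row."""
    ordered = sorted(rows, key=lambda row: row["timestamp"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def naive_last_value_mae(prices):
    """Mean absolute error when each price is predicted by the previous one."""
    errors = [abs(curr - prev) for prev, curr in zip(prices, prices[1:])]
    return sum(errors) / len(errors)


# Hypothetical data, deliberately out of order.
rows = [
    {"timestamp": t, "price": p}
    for t, p in [(3, 101.0), (1, 100.0), (2, 102.0), (5, 103.0), (4, 101.0)]
]

train, test = chronological_split(rows)
baseline_error = naive_last_value_mae([row["price"] for row in train])
print(len(train), len(test), baseline_error)
```

A random shuffle before splitting would let the model see prices from after the test period, which is exactly the kind of leakage described above.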

Deploy
A model with high accuracy is certainly useful. However, the most important thing is to design for the environment in which the model will actually be deployed.

Drive
Operating the system requires skills such as the following.
Building an environment on a server using Linux, Docker, etc.
Knowledge of SQL
Web applications
Web APIs
Cron (periodic execution)
Web servers (Apache)
In other words, the system needs to be automated through the web.
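
To give a feel for what a web API around a model can look like, here is a minimal sketch using only Python's standard library. It is not the setup described above: it uses Python's built-in `http.server` instead of Apache, and the `predict` function is a hypothetical placeholder that averages its inputs instead of calling a trained model. The names `PredictHandler` and `serve`, the JSON field `features`, and the port are also assumptions made for this example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical stand-in for a trained model: returns the average."""
    return sum(features) / len(features)


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body, e.g. {"features": [1.0, 2.0, 3.0]}.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        result = {"prediction": predict(payload["features"])}
        body = json.dumps(result).encode("utf-8")

        # Send the prediction back as JSON.
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the API on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Calling `serve()` starts the server. In a real system, a production web server would sit in front of this, and periodic jobs such as retraining would be scheduled with cron.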
