When building any predictive model, roughly 90% of its success comes from the choice of predictors made during data preparation, so this stage deserves special attention.
During preprocessing, it is very important to inspect and visualize the data. This helps you discard unnecessary data early and spot trends; for some tasks a simple algorithm may turn out to be enough, with no complex model needed at all.
At the same time, everyone wants to arrive at the result they envisioned at the very beginning of the study and reach innovative conclusions. Unfortunately, during preprocessing a developer often feels a subtle temptation to introduce ad-hoc hypotheses into the initial data and the final model, bending reality toward the desired result. Aware of this trap, we will approach the data preparation problem as carefully as possible. Let's start with a short description of the pipeline:
1) The first step is to collect the top-n wallets by current capitalization; here we take 30,000 wallets. In my opinion, this figure is close to optimal: we need a very large pool of wallets with many transactions to choose from, and the computation time with this parameter remains reasonable because most wallets with low predictive potential will be filtered out later anyway. At this step we must exclude all wallets belonging to exchanges: they contain far too many transactions and would clearly distort our results, since such wallets are used mainly for depositing and withdrawing funds. It is also worth filtering out wallets with too many or too few transactions, so we keep only those with 100 to 1000 transactions. That leaves about 900 wallets from which to carry out further selection.
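The pre-filter in step 1 can be sketched as follows. This is a hypothetical illustration, not the author's actual code: the record fields (`address`, `tx_count`, `is_exchange`) and the exchange-label source are assumptions.

```python
# Hypothetical sketch of the step-1 wallet pre-filter. Each wallet record is
# assumed to carry an address, a transaction count, and an exchange flag
# obtained from some external label list.
def filter_wallets(wallets, min_tx=100, max_tx=1000):
    """Drop exchange wallets and wallets outside the transaction-count band."""
    return [
        w for w in wallets
        if not w["is_exchange"] and min_tx <= w["tx_count"] <= max_tx
    ]

candidates = [
    {"address": "addr1", "tx_count": 250,  "is_exchange": False},
    {"address": "addr2", "tx_count": 5000, "is_exchange": False},  # too active
    {"address": "addr3", "tx_count": 300,  "is_exchange": True},   # exchange
    {"address": "addr4", "tx_count": 40,   "is_exchange": False},  # too quiet
]
kept = filter_wallets(candidates)
print([w["address"] for w in kept])  # ['addr1']
```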
Hint: at first I set the minimum number of transactions per wallet to 10. After seeing the results, I realized this did not solve the problem: to get adequate results the selection had to change, because there were too few transactions. As a result, the lower threshold became 100. The upper threshold was chosen on the reasoning that large wallets that know where the price is going are unlikely to make many transactions and fuss around. Besides, expanding the upper threshold to 5000 barely changes the sample but significantly increases computation time.
2) For each address, we download the full transaction history and plot the account's capitalization in BTC, ignoring the BTC/USD exchange rate: all we care about is the amount of BTC in the account. If we took capitalization in USD, Bitcoin's exchange-rate movements would already be embedded in it, and many addresses with no real predictive power would most likely end up in the model, degrading it.
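A minimal sketch of step 2, under the assumption that each transaction reduces to a (date, signed BTC amount) pair: the capitalization curve is then just the running BTC balance, deliberately ignoring the USD rate.

```python
from datetime import date
from itertools import accumulate

# Sketch: the wallet's capitalization curve as a running BTC balance.
# The (date, signed amount) representation is an assumption for illustration.
def balance_curve(transactions):
    txs = sorted(transactions)                        # chronological order
    dates = [d for d, _ in txs]
    balances = list(accumulate(amount for _, amount in txs))
    return dates, balances

txs = [
    (date(2021, 1, 1),  2.0),   # received 2 BTC
    (date(2021, 3, 1),  1.0),   # received 1 BTC
    (date(2021, 2, 1), -0.5),   # sent 0.5 BTC
]
_, balances = balance_curve(txs)
print(balances)  # [2.0, 1.5, 2.5]
```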
3) We calculate the correlation between the BTC price and each wallet's capitalization curve, taking four years of BTC prices. This is an empirical choice: few wallets started transacting long ago and do so regularly. We also need to shift the Bitcoin price forward by a certain period (one day, one week, and so on) depending on the forecasting horizon. We rank the results and keep the wallets whose correlation coefficient falls in the ranges (-1; -0.85] or [0.85; 1). These ranges can be treated as a hyperparameter of the model: once we have the final dataset, we can adjust them to change the neural network's input sample. After this step we have a set of wallets with good predictive potential. We could finish feature selection here, but let's dig a little deeper and try to find more predictors.
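Step 3 can be sketched like this. The `pearson` helper and the toy data are my own illustration; the 0.85 threshold matches the ranges above.

```python
from math import sqrt

# Pearson correlation, written out explicitly to keep the sketch dependency-free.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def select_wallets(curves, btc_price, horizon, threshold=0.85):
    # Correlate today's balance with the price `horizon` days later.
    shifted = btc_price[horizon:]
    selected = {}
    for addr, curve in curves.items():
        r = pearson(curve[: len(shifted)], shifted)
        if abs(r) >= threshold:
            selected[addr] = r
    return selected

price = [1, 2, 3, 4, 5, 6]
curves = {
    "leader": [2, 3, 4, 5, 6, 6],  # tracks tomorrow's price exactly
    "noise":  [5, 1, 4, 2, 3, 9],  # no relation to the price
}
print(select_wallets(curves, price, horizon=1))  # {'leader': 1.0}
```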
Hint: as an experiment, you can try adding random wallets to the sample using the roulette-wheel algorithm. That is, in the third step, take the absolute value of the correlation, sort the values in ascending order, filter out the weakest (for example, below 0.55), and feed the rest to the algorithm. The output will be a different sample every time. Wouldn't it be interesting to see the result with an admixture of weakly correlated wallets?
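This is my reading of the hint as code. The 0.55 cutoff comes from the text; everything else (data shapes, sample size) is illustrative. Wallets are drawn with probability proportional to |correlation|, which is the classic fitness-proportional (roulette-wheel) scheme.

```python
import random

# Experimental sketch: roulette-wheel admixture of wallets by |correlation|.
def roulette_pick(corrs, k, min_corr=0.55, seed=None):
    rng = random.Random(seed)
    # Drop the weakest wallets, then weight the rest by |correlation|.
    pool = {addr: abs(r) for addr, r in corrs.items() if abs(r) >= min_corr}
    addrs = list(pool)
    return rng.choices(addrs, weights=[pool[a] for a in addrs], k=k)

corrs = {"w1": 0.92, "w2": -0.70, "w3": 0.60, "w4": 0.30}  # w4 is too weak
picks = roulette_pick(corrs, k=3, seed=42)
print(picks)  # a sample of 3 wallets; never contains w4, varies without a seed
```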
4) We download all transactions for each wallet and build a graph of counterparties. There are quite a lot of them, so we keep only those that appear in the transaction history more than n times. For the resulting counterparty wallets we repeat steps 2 and 3: some of them may have a small capitalization but a large predictive potential. Below you can see the number of wallet nodes (white) and counterparty nodes (black) for different values of the connection threshold for adding an edge to the graph. The larger the parameter, the fewer counterparties remain:
Also, note each wallet's creation date and the number of counterparty nodes it is connected to. The bubble size reflects the wallet's capitalization:
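The counterparty filter in step 4 can be sketched as follows. The transaction history is assumed to be a flat list of counterparty addresses, one entry per transaction, and `min_links` plays the role of the threshold n from the text.

```python
from collections import Counter

# Sketch: keep only counterparties that appear in a wallet's history more
# than `min_links` times. Data shapes are illustrative assumptions.
def frequent_counterparties(history, min_links):
    counts = Counter(history)
    return {cp for cp, n in counts.items() if n > min_links}

history = ["cpA", "cpB", "cpA", "cpC", "cpA", "cpB"]
print(sorted(frequent_counterparties(history, min_links=1)))  # ['cpA', 'cpB']
```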
5) We will try to use the resulting wallets' capitalization curves, after normalizing them, as the input to the neural network. Bitcoin prices, shifted by the same number of days as in step 3, are also fed to the network's input.
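The article does not specify the normalization scheme, so here is one plausible choice as a sketch: min-max scaling of each curve to [0, 1] before assembling the network input.

```python
# Sketch of step-5 normalization (min-max scaling is my assumption; the
# author does not fix the exact scheme).
def minmax(series):
    lo, hi = min(series), max(series)
    return [(x - lo) / (hi - lo) for x in series]

wallet_curve = [10.0, 12.0, 11.0, 14.0]
shifted_price = [100.0, 120.0, 110.0, 140.0]

# Each row of the input holds one normalized series: the wallet curves
# plus the shifted BTC price.
model_input = [minmax(wallet_curve), minmax(shifted_price)]
print(model_input[0])  # [0.0, 0.5, 0.25, 1.0]
```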
Now we are ready to build a neural network. For a more complete picture, I decided to build networks with two different prediction windows: one day and one week.
Optimization and Results
So, we have wallet capitalization curves with good predictive potential, now we can build a deep LSTM neural network and try to train it.
The BTC price data has a particular problem: in the last few years the market has trended strongly, and we need to split the training and test samples so that both include as many different market phases as possible. We could resort to complex time-series transformations to solve this, but every additional manipulation complicates using the model in real life. Therefore, after a visual analysis of the price chart, the data is split 75% into the training set and 25% into the test set.
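The split described above is chronological rather than shuffled, which for a time series can be sketched in a few lines (function name and data are illustrative):

```python
# Minimal sketch of the chronological split: no shuffling, the first 75% of
# the series trains the model and the last 25% tests it.
def chrono_split(series, train_frac=0.75):
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]

prices = list(range(100))          # stand-in for the BTC price series
train, test = chrono_split(prices)
print(len(train), len(test))  # 75 25
```

Shuffling would leak future information into training, so order is preserved.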
The choice of a specific neural network architecture is up to you if you want to reproduce the results yourself. Now we just need to optimize the hyperparameters and train the network. Our hyperparameter is the correlation range for selecting wallets into the sample. To taste, you can also turn the number of epochs, the target optimization metric, and even the number of layers into hyperparameters, but I will tune just this one: it is faster, and it gives me less room to fit the market to my model.
The model is trained against the simplest metric, the root mean square error (RMSE), but we will evaluate the optimized model by a different indicator: the correlation coefficient with the shifted prices. If it exceeds 85%, the model can be used as an indicator.
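The acceptance check can be sketched as follows. The Pearson helper and the toy predictions are my own illustration; the 0.85 threshold is from the text.

```python
from math import sqrt

# Sketch of the post-training check: a model becomes an indicator only if
# its predictions correlate with the shifted prices above the threshold.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(
        sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)
    )

def usable_as_indicator(predictions, shifted_prices, threshold=0.85):
    return pearson(predictions, shifted_prices) > threshold

shifted = [100.0, 105.0, 103.0, 110.0, 108.0]
good_preds = [99.0, 106.0, 102.0, 111.0, 107.0]   # tracks the dynamics
print(usable_as_indicator(good_preds, shifted))  # True
```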
Hint: under no circumstances should we look at the model's accuracy and optimize for it; that would be a mistake, because we do not need the model to describe price behavior precisely. Optimizing for accuracy would imply we expect the model to predict exactly what price the asset will trade at after a certain period. What matters to us is the dynamics: predicting the exact price value is too difficult and too dangerous a task, with a high risk of over-optimization.
Remember our goal from the first article: not a neural network that trades for a portfolio of algorithms (that is the province of a different AI approach), but an indicator with a negative time lag.
Matching the dynamics of the shifted price series will let us build a binary indicator that can later be used to filter the direction of the algorithms' trades.
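One way to turn the model's curve into such a binary filter is to take its direction of movement. This is my interpretation; the article does not fix the exact construction.

```python
# Sketch: +1 when the predicted curve rises, -1 when it falls.
def binary_indicator(curve):
    return [1 if b > a else -1 for a, b in zip(curve, curve[1:])]

predicted = [3.0, 3.4, 3.1, 3.5]
print(binary_indicator(predicted))  # [1, -1, 1]
```

With such a signal, an intraday system would take only long entries while the indicator reads +1 and only short entries while it reads -1.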
Let's see the results!
For a 1-day forecast horizon, we obtained a value of 88%. Visually, it is clear that the model, with certain modifications, could become a good binary indicator. For a 7-day forecast horizon, we obtained a value of 94%.
Both forecasting horizons are quite good for filtering intraday trades on minute, half-hour, and hourly timeframes; that is, the constructed models can be used as a top-level filter for intraday trading. If you have an entry signal on a 5-minute or hourly chart, you can take only long or only short signals depending on the direction of the curve generated by the neural network.
If this topic interests you and you would like to discuss it with me, integrate this model's indicators into your dashboard, or test the model as an indicator in your own strategies, you can write to me directly: https://www.linkedin.com/in/sergey-frolov-0a090a179/ .