High Frequency Data Analysis: Converting High-frequency Signals to Discrete Buy/Sell Signals

DolphinDB

In high-frequency trading, we generate high-frequency signals from trade and quote tick data and analyze these signals to identify trading opportunities. This tutorial demonstrates how to convert high-frequency signals into discrete buy/sell/hold signals. Essentially, the problem is to convert an array of floating-point values into an array that contains only three integer values: +1 (buy), 0 (hold) and -1 (sell).

The conversion rules can be quite simple. For example:

  • +1 if signal > t1

  • -1 if signal < t2

  • 0 otherwise

Such conversion rules can be easily implemented in DolphinDB with the iif function:

iif(signal > t1, 1, iif(signal < t2, -1, 0))
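For readers more familiar with Python, the same element-wise rule can be sketched with NumPy. This is only an illustration: the function name simple_direction and the threshold values are hypothetical, and np.where plays the role that iif plays in the DolphinDB expression.

```python
import numpy as np

def simple_direction(signal, t1, t2):
    """Map each signal to +1 (buy), -1 (sell) or 0 (hold), independently of its neighbours."""
    signal = np.asarray(signal, dtype=float)
    # Nested np.where mirrors the nested iif call: the outer condition is checked first.
    return np.where(signal > t1, 1, np.where(signal < t2, -1, 0))

# Hypothetical thresholds and signals, chosen only for illustration.
print(simple_direction([70.0, 45.0, 10.0, 60.0], t1=60, t2=20))
```

Each output element depends only on the input element at the same position, which is why this rule vectorizes trivially.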

However, to avoid reversing the trading direction too frequently, we usually adopt a more complex set of rules. If a signal is above the t1 threshold, it is a buy signal (+1), and subsequent signals remain buy signals until one falls below the t10 threshold. Similarly, if a signal is below the t2 threshold, it is a sell signal (-1), and subsequent signals remain sell signals until one exceeds the t20 threshold. The relationship between the thresholds is as follows:

t1 > t10 > t20 > t2

With the above rules, the value of a trade signal is determined not only by the value of the current signal but also by the state of the previous signal. This is a typical example of path dependence, which is commonly considered unsuitable for, or difficult to handle with, vector operations, and therefore very slow in scripting languages, including DolphinDB.
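To make the path dependence concrete, here is a minimal loop-based reference in Python. It is a sketch under stated assumptions rather than the article's implementation: the function name direction_loop, the starting state of 0 (hold), the strict inequalities at the thresholds, and the example threshold values are all choices made for illustration.

```python
def direction_loop(signal, t1, t10, t20, t2):
    """Walk the signals one at a time, carrying the previous state forward."""
    state = 0  # assumption: start in the hold state
    out = []
    for s in signal:
        if s > t1:
            state = 1        # enter (or stay in) a buy run
        elif s < t2:
            state = -1       # enter (or stay in) a sell run
        elif state == 1 and s < t10:
            state = 0        # a buy run ends once a signal falls below t10
        elif state == -1 and s > t20:
            state = 0        # a sell run ends once a signal exceeds t20
        # otherwise the previous state is kept unchanged
        out.append(state)
    return out

# Hypothetical thresholds satisfying t1 > t10 > t20 > t2.
print(direction_loop([65, 55, 45, 55, 15, 25, 35, 55], t1=60, t10=50, t20=30, t2=20))
```

Note that the value 55 maps to +1 at the second position but to 0 at the fourth and last positions: the output depends on what came before, which is exactly what makes a per-element loop the natural but slow formulation.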

In some cases, however, a path dependence problem can be solved with vector operations. The problem above is one such example. The rest of this tutorial describes how to solve it with vector operations.

First, identify the signals whose state is determined by their own value alone:

  • if signal > t1, state=1

  • if signal < t2, state=-1

  • if t20 <= signal <= t10, state=0

The states of the signals in the ranges [t2, t20] and [t10, t1] are not determined by their own values; they depend on the states of the signals that precede them.

The DolphinDB script for implementing the above rules:

Let’s run a simple test:

The script would be like this if we use pandas:
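One possible vectorized formulation in pandas is sketched below. It is not necessarily the script used for the timings reported in this article. The idea is to track a buy channel and a sell channel separately: mark the positions where each channel is determined by the current value alone, leave the rest as NaN, and forward-fill. The function name direction_vectorized, the initial hold state, the strict inequalities and the threshold values are assumptions made for this sketch.

```python
import numpy as np
import pandas as pd

def direction_vectorized(signal, t1, t10, t20, t2):
    """Convert signals to +1/0/-1 with hysteresis, using forward fill instead of a loop."""
    s = pd.Series(signal, dtype="float64")
    # Buy channel: 1 above t1, 0 below t10, undetermined (NaN) in between.
    buy = pd.Series(np.where(s > t1, 1.0, np.where(s < t10, 0.0, np.nan)))
    # Sell channel: -1 below t2, 0 above t20, undetermined (NaN) in between.
    sell = pd.Series(np.where(s < t2, -1.0, np.where(s > t20, 0.0, np.nan)))
    # Undetermined positions inherit the most recent determined value;
    # leading gaps default to 0 (assumed initial hold state).
    buy = buy.ffill().fillna(0.0)
    sell = sell.ffill().fillna(0.0)
    # With t1 > t10 > t20 > t2 the two channels are never non-zero together,
    # so their sum is the combined state.
    return (buy + sell).astype("int8").to_numpy()

# Hypothetical thresholds satisfying t1 > t10 > t20 > t2.
print(direction_vectorized([65, 55, 45, 55, 15, 25, 35, 55], t1=60, t10=50, t20=30, t2=20))
```

Keeping the two channels separate matters when a sell run is followed directly by a signal between t10 and t1: the sell channel resets to 0 because the signal exceeds t20, while the buy channel stays at 0, so the result is hold rather than a stale sell.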

The test below generates a random array of 10 million signals between 0 and 100 to test the performance of DolphinDB and pandas. The test environment setup is as follows:

CPU: Intel(R) Core(TM) i7-7700 CPU @ 3.60 GHz

Memory: 16 GB

OS: Windows 10

The executions take 171.73 ms in DolphinDB and 3.28 seconds in pandas.

DolphinDB script:

pandas script:
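A timing harness along the following lines could be used to measure the Python side of such a comparison. It is a sketch only: the helper name time_conversion, the seed, and the stand-in converter simple_rule with its thresholds are hypothetical, and timings will differ from the figures above depending on hardware and library versions.

```python
import time
import numpy as np

def time_conversion(convert, n=10_000_000, seed=0):
    """Time `convert` on n uniform random signals in [0, 100); return (seconds, result)."""
    rng = np.random.default_rng(seed)
    signal = rng.uniform(0.0, 100.0, size=n)
    start = time.perf_counter()
    result = convert(signal)
    elapsed = time.perf_counter() - start
    return elapsed, result

# Stand-in converter: the simple two-threshold rule with hypothetical thresholds.
def simple_rule(signal):
    return np.where(signal > 60, 1, np.where(signal < 20, -1, 0))

# A small n keeps the demo quick; pass n=10_000_000 to match the test size described above.
elapsed, result = time_conversion(simple_rule, n=100_000)
print(f"{len(result)} signals converted in {elapsed * 1000:.2f} ms")
```

Any function that takes an array of signals and returns an array of states can be passed as convert, so the same harness can time a loop-based and a vectorized implementation on identical data.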

Thanks for reading! To keep up with our latest news, please follow us on Twitter and LinkedIn. You can also join our Slack to chat with the author!
