How to Calculate OHLC Bars in DolphinDB
OHLC bars can be efficiently calculated in various scenarios in DolphinDB. This tutorial will introduce how to calculate OHLC bars with historical data and real-time data.
- With historical data
We will explain how to calculate OHLC bars with batch calculation for the following scenarios:
-the starting time of the OHLC windows need to specified;
-multiple trading sessions in a day, including overnight sessions;
-overlapping OHLC windows;
-OHLC windows deliminated based on trading volume.
If there is a very large amount of data, and the results need to be written to the database, we can use the built-in Map-Reduce function in DolphinDB for parallel computing.
- With real-time data
Use APIs to receive market data in real time, and use DolphinDB’s built-in time-series engine for real-time calculation.
1. Calculation with historical data (batch calculation)
To calculate OHLC bars with historical data, you can use DolphinDB’s built-in functions bar
, dailyAlignedBar
or wj
.
1.1 Without specifying the starting time of OHLC windows
Function bar
is generally used to group data based on specified interval.
The result is:
Example 1:
Calculate 5-minute OHLC bars:
Parameter interval of function bar
supports the DURATION type. "5m" here means 5 minutes.
1.2 Need to specify the starting time of OHLC windows
Use function dailyAlignedBar
to specify the starting time of OHLC windows. This function can accommodate multiple trading sessions per day as well as overnight sessions.
Note that for function dailyAlignedBar
, the data type of the temporal column can be SECOND, TIME, NANOTIME, DATETIME, TIMESTAMP and NANOTIMESTAMP. The parameter timeOffset that specifies the starting time of each trading session must have the corresponding data type: SECOND, TIME or NANOTIME, respectively.
Example 2 (one trading session per day): Calculate 7-minute OHLC bars with the same table ‘trade’ in Example 1.
Example 3 (two trading sessions per day): China’s stock market has two trading sessions per day, from 9:30 to 11:30 in the morning and from 13:00 to 15:00 in the afternoon.
Use the following script to generate simulated data:
Calculate the 7-minute OHLC bars:
Example 4 (two trading sessions per day with an overnight session): Some futures have multiple trading sessions per day including an overnight session. In this example, the first trading session is from 8:45 AM to 13:45 PM, and the other session is an overnight session from 15:00 PM to 05:00 AM the next day.
Use the following script to generate simulated data:
Calculate the 7-minute OHLC bars:
1.3 Overlapping OHLC windows
In the examples above, the OHLC windows do not overlap. To calculate overlapping OHLC windows, we can use function wj
(window join). With the wj
function, each row in the left table corresponds to a window in the right table, and calculations can be conducted on this window.
Example 5: Two trading sessions per day with overlapping OHLC windows
Simulate Chinese stock market data and calculate a 30-minute OHLC bars every 5 minutes.
First generate OHLC windows, then use function cj
(cross join) to generate a combination of stock symbols and OHLC windows.
Then use function wj
to calculate OHLC bars with overlapping windows:
1.4 Determine windows with trading volume
The windows in all of the examples above were determined with time. You can also want to use other variables such as trading volume as the basis to determine windows.
Example 6: OHLC bars are calculated every time trading volume increases by 1,000,000.
1.5 Use MapReduce to speed up calculation
If you need to extract large-scale historical data from the database, calculate OHLC bars, and then save them into the database, you can use the built-in MapReduce function mr
for parallel loading and calculation. This method can significantly increase speed.
This example uses US stock market trading data. The raw data is stored in table ‘trades’ in database “dfs://TAQ” with a composite partition: a value partition based on trading dates and a range partition based on stock symbols.
(1) Load the metadata of the table on disk into memory:
(2) Create a template table ‘model’, then create an empty table ‘OHLC’ in database “dfs://TAQ” based on the schema of the template table to store the results:
(3) Use function mr
to calculate OHLC bars and write the results to table 'OHLC':
‘ds’ is a series of data sources generated by function
sqlDS
. Each data source represents data from a partition.Function
calcOHLC
is the map function in MapReduce. It calculates OHLC bars from each data source, writes the result to the database and returns the number of rows written to the database.“+” is the reduce function in MapReduce. It adds up the results of all map functions (the number of rows written to the database) to return the total number of rows written to the database.
2. Real-time calculation
The following figure describes the process of calculating OHLC bars in real-time in DolphinDB:
Data vendors usually provide subscription services with on APIs in Python, Java or other languages. In this example, trading data is written into a stream table through DolphinDB Python API. DolphinDB’s time-series engine conducts real-time OHLC calculations at specified frequencies.
This example uses the text file trades.csv to simulate real-time data. The following table shows its column names and one row of sample data:
The output table for calculation results contains the following 7 columns:
The following sections describe the steps of real-time calculation of OHLC bars:
2.1 Use Python to receive real-time data and write to DolphinDB stream table
- Create a stream table in DolphinDB
- Insert simulated data to the stream table
As the unit of column ‘Datetime’ is SECOND and DataFrame in pandas can only use DateTime[64] which corresponds to nanotimestamp data type in DolphinDB, we need to convert the data type of column ‘Datetime’ before inserting the data to the stream table.
2.2 Calculate OHLC bars in real time
OHLC bars can be calculated in moving windows in real time in DolphinDB. Generally there are the following two scenarios:
- Calculations are conducted after windows end
Nonoverlapping windows
Partially overlapping windows
2. Calculations are conducted continuously within the current window
2.2.1 Calculations are conducted after windows end
For nonoverlapping windows, set the same value for parameter ‘windowSize’ and parameter ‘step’ of function createTimeSeriesAggregator
. For overlapping windows, set 'windowSize'>'step'. Please note that 'windowSize' must be a multiple of 'step'.
Non-overlapping windows: calculate OHLC bars for the previous 5 minutes every 5 minutes.
Overlapping windows: calculate OHLC bars for the previous 5 minutes every 1 minute.
2.2.2 Multiple calculations within a window
If ‘updateTime’ is not specified, calculation for a window will not occur before the ending time of the window. To conduct calculations for the current window before it ends, we can specify ‘updateTime’. ‘step’ must be a multiple of ‘updateTime’.
If ‘updateTime’ is specified, multiple calculations may happen within the current window. These calculations are triggered with the following rules:
(1) Divide the current window into ‘windowSize’/’updateTime’ small windows. Each small window has a length of ‘updateTime’. When a new record arrives after a small window finishes, if there is at least one record in the current window that is not used in a calculation (excluding the new record), a calculation is triggered. Please note that this calculation does not use the new record.
(2) If max(2*updateTime, 2 seconds) after a record arrives at the aggregator, it still has not been used in a calculation, a calculation is triggered. This calculation includes all data in the current window at the time.
If ‘keyColumn’ is specified, these rules apply within each group.
The timestamp of each calculation result within the current window is the current window starting time or starting time + ‘windowSize’ (depending on parameter ‘useWindowStartTime’) instead of a timestamp inside the current window.
If ‘updateTime’ is specified, ‘outputTable’ must be a keyed table (created with function keyedTable
).
In the following example, we calculate 1-minute OHLC bars. Calculations for the current window are triggered no later than 2 seconds after a new message arrives.
First, create a stream table as the output table and use columns ‘datetime’ and ‘Symbol’ as primary keys.
In the time-series engine, parameter ‘updateTime’ is set to 1 (second); parameter ‘useWindowStartTime’ is set to true which means the first column of the output table is the starting time of the windows.
Subscribe to the stream table “Trade”:
2.3 Display OHLC bars in Python
In this example, the output table of the aggregator is also defined as a stream table. The client can subscribe to the output table through Python API and display the calculation results to the Python terminal.
The following script uses Python API to subscribe to the output table OHLC of the real-time aggregation calculation, and print the result.
You can also connect to DolphinDB database through a data visualization tool such as Grafana to query the output table and display the results in charts.
Subscribe to my newsletter
Read articles from DolphinDB directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by