Trino (TSF) Installation and Configuration
Understanding the Clear Reasons for Using Trino
Since the installation and configuration methods of Trino vary depending on how you use it, it's essential to first clearly understand the reasons for using it.
Determine if you need to fetch data from various data sources (MySQL, PostgreSQL, HDFS, Amazon S3, Cassandra, Kafka, etc.) and integrate them into a single query for analysis.
Analyze whether you need to integrate and analyze data from a Data Lake (HDFS) and Relational Databases (e.g., MySQL) in a Business Intelligence Platform.
Assess if data integration and analysis between multiple cloud environments or between on-premises and cloud are required.
Determine whether your queries can do without data locality, which Impala exploits; Trino does not move computation to the data.
- Because Trino fetches data from the underlying sources and processes it in memory, some queries may be slower than in Impala.
Assess whether you need to provide data analysis results on a real-time dashboard on a per-second basis.
Determine if data scientists need to perform rapid data exploration and prototyping.
Assess whether you need ad-hoc analysis or data validation without requiring complex batch processing.
- For complex batch processing, Spark is more advantageous.
Determine if simple data processing and aggregation tasks are needed in the data pipeline.
Assess whether sophisticated resource management and query prioritization are required.
Impala supports resource management to some extent, but it is challenging to manage as precisely as Trino.
Trino allows you to set query priorities through resource groups and allocate resources according to various users and departments.
Determine if flexibility in a multi-tenant environment is needed.
- For example, when multiple teams and departments in a large enterprise need to run data analyses simultaneously.
- When many user requests must be processed simultaneously in environments with resource constraints.
Criteria for Choosing Between Impala, Spark, and Trino
Impala
Suitable for quickly processing large-scale Hive tables or HDFS-based large data interactively.
Highly efficient in moving computation to DataNodes by utilizing data locality.
Deeply integrated into the Hadoop environment.
Spark
Suitable for large-scale batch data processing, ETL tasks, and complex data transformations.
Appropriate for handling machine learning workloads using libraries like MLlib.
Trino
Suitable when integrated queries across various data sources are needed.
Appropriate for low-latency interactive analysis and lightweight ad-hoc queries.
Suitable for multi-user environments where resource management is required.
Trino Installation with Apache Ambari
If you've decided to install Trino, it is recommended to proceed as follows:
Trino can be installed in various ways, but to operate in a multi-tenant environment, it shines when installed using Ambari and combined with Ranger.
For other installation methods, refer to the Trino documentation:
Deploying Trino
Trino in a Docker container
Trino on Kubernetes with Helm
RPM package
The Trino installation below is described assuming a Kerberized (secured) Hadoop cluster.
The Trino cluster runs on the JVM by default.
- An appropriate JDK or JRE must be set up. Java 11 or higher is recommended.
Trino Installation Step by Step
Install the Trino package in Ambari to configure the Trino cluster.
- The Trino cluster consists of a coordinator and workers.
After installing Trino, install the Ranger Trino Plugin where Trino is installed.
For Trino security, it is recommended to follow the security workflow suggested in the Trino documentation (a minimal configuration sketch follows this list):
Enable TLS/HTTPS by configuring certificates and a load balancer. (Mandatory)
Set up a shared secret. (Mandatory)
Configure an authentication provider like LDAP. (Mandatory)
Avoid using Kerberos settings unless necessary, as they are complex and challenging.
Optionally, implement authorization for users accessing URLs.
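Below is a minimal sketch of the coordinator's `config.properties` covering the mandatory items above. The port, keystore path, and secret are placeholder assumptions for illustration, not prescriptions:

```properties
# $TRINO_HOME/etc/config.properties (illustrative fragment)
http-server.https.enabled=true
http-server.https.port=8443
# Assumed keystore location; use your own certificate
http-server.https.keystore.path=/etc/trino/trino-keystore.jks
http-server.https.keystore.key=<keystore-password>
# Shared secret for internal communication (generate a long random value)
internal-communication.shared-secret=<long-random-secret>
# PASSWORD enables password-based providers such as LDAP
http-server.authentication.type=PASSWORD
```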
Create a Kerberos principal for Trino and distribute the generated keytab files to the coordinator and workers (see the sketch below).
- Reference: Kerberos authentication
Distribute the related configuration files for Trino authentication.
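For example, the principal and keytab can be created on the KDC roughly as follows. This is a sketch; the realm, keytab path, and principal name are assumptions to adapt to your environment:

```sh
# On the KDC: create a principal for Trino and export a keytab (assumed names)
kadmin.local -q "addprinc -randkey trino@EXAMPLE.COM"
kadmin.local -q "ktadd -k /etc/security/keytabs/trino.keytab trino@EXAMPLE.COM"
# Distribute trino.keytab to the coordinator and all workers,
# readable only by the account that runs Trino
```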
Create and configure the `password-authenticator.properties` file on the coordinator server.
The content of the `password-authenticator.properties` file is as follows (for authentication using LDAP):

```properties
# $TRINO_HOME/etc/password-authenticator.properties
password-authenticator.name=ldap
ldap.url=ldap://<ldap-host>:389
ldap.allow-insecure=true
ldap.user-bind-pattern=uid=${USER},ou=users,dc=company,dc=com
```
Configure the Trino connector (explained based on the Hive connector).
Create and configure the file `$TRINO_HOME/etc/catalog/hive.properties`. Refer to the Trino Hive Connector documentation.

```properties
connector.name=hive
hive.metastore.uri=thrift://<HMS-server>:9083
hive.config.resources=$HADOOP_HOME/conf/core-site.xml,$HADOOP_HOME/conf/hdfs-site.xml
hive.metastore.authentication.type=KERBEROS
hive.metastore.service.principal=hive/_HOST@<REALM>
hive.metastore.client.principal=trino@<REALM>
hive.metastore.client.keytab=<principal keytab path>
hive.hdfs.authentication.type=KERBEROS
hive.hdfs.impersonation.enabled=true
hive.hdfs.trino.principal=trino@<REALM>
hive.hdfs.trino.keytab=<principal keytab path>
```
Ensure that it works well with Apache Ranger.
Verify that the Ranger Trino Plugin is operating correctly.
Create policies in Ranger and confirm that users authenticated via LDAP exercise the appropriate permissions according to each policy (see the example query after this list).
Check that audit logs are properly recorded in Trino.
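A quick check might look like the following. This is a sketch; the catalog, schema, and table names are hypothetical:

```sql
-- Connect as an LDAP-authenticated user and query a table covered by a Ranger policy
SELECT count(*) FROM hive.sales.orders;
-- The query should succeed only if a Ranger policy grants SELECT to this user;
-- otherwise Trino should return an "Access Denied" error. Either way, the
-- attempt should appear in the Ranger audit log.
```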
Trino Configuration
The important aspects in Trino settings are resource configuration, coordinator settings, and worker settings.
Trino Resource Configuration
Trino provides a resource group feature for managing resources.
To apply resource groups, you need to set up a resource group manager.
- The resource group manager can be file-based or use a database.
Here, we explain the file-based approach. Create and configure the `$TRINO_HOME/etc/resource-groups.properties` file:

```properties
# resource-groups.properties
resource-groups.configuration-manager=file
resource-groups.config-file=$TRINO_HOME/etc/resource-groups.json
```
The file referenced by `resource-groups.config-file` is written in JSON format.
The resource group config file uses the concept of resource group properties for configuration.
- For detailed explanations, refer to the Trino Documentation.
Explanation of essential fields:

- `name`: Name of the group.
- `maxQueued`: Maximum number of queued queries. Once this limit is reached, new queries are rejected.
- `hardConcurrencyLimit`: Maximum number of running queries.
- `softMemoryLimit`: Maximum amount of distributed memory this group may use before new queries become queued. May be specified as an absolute value (e.g., `1GB`) or as a percentage (e.g., `10%`) of the cluster's memory.
Sample configuration file:

```json
{
  "rootGroups": [
    {
      "name": "AD_HOC",
      "maxQueued": 50,
      "hardConcurrencyLimit": 20,
      "softMemoryLimit": "15%"
    },
    {
      "name": "BI",
      "maxQueued": 80,
      "hardConcurrencyLimit": 40,
      "softMemoryLimit": "20%"
    }
  ]
}
```
Characteristics of resource groups:
You can enforce resource usage limits and queueing policies.
It is possible to divide usage into sub-groups.
A query belongs to exactly one resource group and consumes that group's resources.
If available resources are insufficient, new queries are queued; running queries are never failed.
- Instead, all new queries remain queued until resources free up.
A resource group can either have sub-groups or accept queries directly, but not both.
- A group that has sub-groups cannot process queries itself.
There are restrictions on resource group names:
When resource groups form a tree, a sub-group's name cannot duplicate the name of a sibling sub-group.
- For example, `a.b` and `a.b` → not allowed.
Each node in the tree must have a unique path, and sub-groups under the same parent must have different names.
- For example, `a.b` and `a.c` → valid.
Sub-groups under different parent nodes may share a name; they are identified by their unique full paths, so no conflict occurs.
- For example, `a.b.d` and `a.c.d` → valid.
After setting up resource groups, you need to configure selectors.
For detailed explanations, refer to Trino Selector rules documentation.
Selector rules use the regular expression syntax of the Java programming language.
In selector rules, `group` is mandatory.
- `group`: the group these queries will run in.
By default, all rules are combined with AND.
`clientTags` are mainly used in JDBC connection settings to apply rules. There are restrictions on `clientTags` values:
- Disallowed characters: whitespace, comma (`,`), slash (`/`), colon (`:`)
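As a sketch, selectors live in the same `resource-groups.json` alongside `rootGroups`; the user pattern and client tag below are illustrative assumptions that map onto the earlier sample groups:

```json
{
  "rootGroups": [
    { "name": "AD_HOC", "maxQueued": 50, "hardConcurrencyLimit": 20, "softMemoryLimit": "15%" },
    { "name": "BI", "maxQueued": 80, "hardConcurrencyLimit": 40, "softMemoryLimit": "20%" }
  ],
  "selectors": [
    { "user": "bi-.*", "group": "BI" },
    { "clientTags": ["adhoc"], "group": "AD_HOC" }
  ]
}
```

Queries from users matching `bi-.*` run in `BI`; queries tagged `adhoc` (for example via JDBC `clientTags`) run in `AD_HOC`.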
Trino Impersonation Configuration (Optional)
To apply impersonation when accessing HDFS from Trino, you need to set up proxy user settings.
Here's how to set it up in HDFS as an example:
In Ambari → HDFS → CONFIGS → ADVANCED → `core-site.xml` settings:

```
hadoop.proxyuser.trino.groups: *
hadoop.proxyuser.trino.hosts: *
```

Since `core-site.xml` has been modified, you need to restart the NameNode for the changes to take effect.
Trino Client Configuration
To properly apply resource groups, you also need to configure the client side.
- One convenient method is to set `clientTags` in the client connection.
Trino Connection for JDBC Driver
The Trino JDBC driver has the following requirements:
Java version 8 or higher.
All users that connect to Trino with the JDBC driver must be granted access to query tables in the `system.jdbc` schema.
Any JDBC driver that meets the above Trino JDBC requirements can be used.
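For example, a connection URL that passes `clientTags` might look like this (the hostname, catalog, and tag are placeholders):

```
jdbc:trino://trino.example.com:8443/hive/default?SSL=true&clientTags=adhoc
```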
Using `trino-cli` provided by Trino:
- Download the `trino-cli` executable JAR.
- You can connect to the Trino server as follows:

```sh
./trino --server http://trino.example.com:8080
```
Note: Authentication (e.g., Kerberos) may be required to connect.
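With LDAP password authentication enabled as above, a connection might look like the following sketch (the hostname and user are placeholders):

```sh
# Connect over HTTPS, authenticate with an LDAP password, and run a smoke test
./trino --server https://trino.example.com:8443 \
        --user alice \
        --password \
        --execute "SELECT 1"
```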
Using the Trino Python Client
Client for Trino, a distributed SQL engine for interactive and batch big data processing.
Provides a low-level client, a DBAPI 2.0 implementation, and a SQLAlchemy adapter. It supports Python >= 3.8 and PyPy.
`trino.sqlalchemy` is compatible with the latest 1.3.x, 1.4.x, and 2.0.x SQLAlchemy versions at the time a given version of the client is released.
To connect to Trino using SQLAlchemy, use a connection string (URL) following this pattern:

```
trino://<username>:<password>@<host>:<port>/<catalog>/<schema>
```

To pass additional connection attributes, use the `connect_args` argument of `create_engine`.
- Attributes can also be passed in the connection string.
Example code for connecting using the Python client:

```python
from sqlalchemy import create_engine
from trino.sqlalchemy import URL

engine = create_engine(
    URL(
        host="localhost",
        port=8080,
        catalog="system"
    ),
    connect_args={
        "session_properties": {"query_max_run_time": "1d"},
        "client_tags": ["tag1", "tag2"],
        "roles": {"catalog1": "role1"},
    }
)

# or in the connection string
engine = create_engine(
    'trino://user@localhost:8080/system?'
    'session_properties={"query_max_run_time": "1d"}'
    '&client_tags=["tag1", "tag2"]'
    '&roles={"catalog1": "role1"}'
)

# or using the URL factory method
engine = create_engine(URL(
    host="localhost",
    port=8080,
    client_tags=["tag1", "tag2"]
))
```
Trino Coordinator Configuration
The Trino coordinator server plays a central role in the cluster, so its hardware specifications must be carefully considered.
The Trino coordinator handles query scheduling, resource management, user request processing, and various critical roles, requiring appropriate hardware specifications accordingly.
Recommended hardware specs for a server to be used as the Trino coordinator:
CPU
CPU Cores: At least 8–16 cores are recommended.
- The coordinator handles query planning, analysis, and resource scheduling, so performance improves significantly in a multi-core environment.
CPU Clock Speed: Recent Intel Xeon or AMD EPYC processors with high clock speeds are suitable.
Hyper-Threading: Trino tends to generate many threads, so hyper-threading can help performance.
Memory
RAM Size: At least 256 GB is recommended.
- The coordinator uses a large memory pool to manage query planning and execution, so more memory is needed when processing large-scale queries.
Memory Speed: DDR4 2933 MHz or faster memory is advantageous; the faster, the better.
Disable Swap Usage
It is advisable to configure the OS not to use swap memory.
Performance degradation can be severe if swapping occurs.
Disk
Disk Type: NVMe or SSD (Solid State Drive) is the best choice.
- The coordinator frequently accesses log records and metadata, so disk I/O performance is critical.
RAID Configuration: RAID 1+0 configuration is recommended.
- It is optimized for both durability and read/write performance.
Disk Size: At least 1 TB is recommended.
Disk Partition Configuration
It's advisable to use separate partitions to prevent conflicts between the OS and data.
Root Partition (for OS): Configure at least 100 GB to ensure stable operation of the OS and other services.
`/var/log` Partition: Allocate sufficient disk space to `/var/log`, since Trino generates many logs. At least 200 GB is recommended.
`/data` Partition: Configure a separate partition to store Trino's metadata and cache data.
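A corresponding `/etc/fstab` layout might look like this sketch (the device names and filesystem are assumptions to adapt):

```
# /etc/fstab (illustrative; adjust devices and filesystems)
/dev/nvme0n1p2   /data      xfs   defaults,noatime   0 0
/dev/nvme0n1p3   /var/log   xfs   defaults,noatime   0 0
```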
Network
Since the coordinator exchanges data with multiple worker nodes, network performance is important.
A NIC (Network Interface Card) with a bandwidth of 10 GbE or higher is recommended.
Example of a typical Trino coordinator server spec (this article assumes this level of server spec):
CPU: Intel Xeon Gold 6248 2.5 GHz, 24 cores/48 threads
Memory: 512 GB DDR4 3200 MHz
Disk: 2 TB NVMe SSD, RAID 1+0 configuration
Network: 10 GbE network interface
OS: CentOS 7.9, Ubuntu 20.04 LTS, Red Hat Enterprise Linux 9
Linux-based OS is appropriate.
It's common to use CentOS 7, Ubuntu 18.04 or higher, or Red Hat Enterprise Linux (RHEL).
A Linux-based environment is most suitable in terms of stability and performance optimization.
Kernel Settings
Adjust the TCP buffer size to optimize network performance.
Adjust settings like `vm.swappiness=0` for I/O performance.
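For example, TCP buffer sizes and swappiness can be tuned in `/etc/sysctl.conf`. The buffer values below are illustrative starting points for a 10 GbE network, not prescriptions:

```
# /etc/sysctl.conf (example values; tune for your network)
net.core.rmem_max=134217728
net.core.wmem_max=134217728
net.ipv4.tcp_rmem=4096 87380 67108864
net.ipv4.tcp_wmem=4096 65536 67108864
vm.swappiness=0
```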
JVM Settings
- It is appropriate to use Java versions like OpenJDK 11 or 17, and you should optimize JVM options to efficiently manage memory and garbage collection (GC).
Resource Isolation
You can utilize cgroups to limit CPU and memory to prevent the Trino coordinator from consuming excessive resources.
- This can prevent resource conflicts between the coordinator and other system processes.
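If Trino runs as a systemd service, one convenient way to apply cgroup limits is a drop-in unit. The service name and limit values below are assumptions for illustration:

```
# /etc/systemd/system/trino.service.d/limits.conf (assumed service name)
[Service]
# Cap the coordinator at 8 CPUs and 80 GB of memory (illustrative values)
CPUQuota=800%
MemoryMax=80G
```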
Configure High Availability (HA)
The Trino coordinator is a Single Point of Failure (SPOF), so you need to prepare for failures.
Set up multiple coordinator servers using a load balancer like HAProxy.
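A minimal HAProxy sketch for fronting two coordinators might look like this (hostnames and ports are assumptions; the second coordinator is configured as a standby, since a Trino cluster is served by one active coordinator):

```
# haproxy.cfg (illustrative active/standby setup)
frontend trino_front
    bind *:8443
    mode tcp
    default_backend trino_coordinators

backend trino_coordinators
    mode tcp
    option tcp-check
    server coord1 coord1.example.com:8443 check
    server coord2 coord2.example.com:8443 check backup
```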
Trino Coordinator JVM Configuration in Detail (for Optimal Performance)
Typically, the coordinator uses less memory than workers.
An unnecessarily large heap size can increase GC pause times.
`-Xmx`
- Assuming a cluster of about 30 workers, an optimal value is around 60 GB.
- Since the heap size is 60 GB, G1 GC is appropriate.
- Set `-Xmx60G`.
`-XX:InitiatingHeapOccupancyPercent`
- The default value is 45%, but adjust it to 30%–40% to optimize for the coordinator's role.
- Starting GC earlier, while heap occupancy is still low, makes memory leaks easier to detect and improves memory management efficiency.
- Set `-XX:InitiatingHeapOccupancyPercent=35`.
`-XX:MaxGCPauseMillis`
- Set a target GC pause time to increase the coordinator's responsiveness.
- The default is 200 ms, but adjust to 100–200 ms for slightly faster response times.
- Since the coordinator performs latency-sensitive tasks like query planning and scheduling, shorter pause times are desirable.
- Set `-XX:MaxGCPauseMillis=150` as a realistic adjustment.
`-XX:G1HeapRegionSize`
- By default, G1 GC automatically determines the heap region size based on the heap size.
- An appropriate ratio is about 1/6400 of the heap size (60 GB / 6400 ≈ 9.6 MB).
- Set `-XX:G1HeapRegionSize=10M`.
Add `-XX:+AlwaysPreTouch`
- Pre-allocate heap memory at JVM startup to improve runtime performance.
- Activate `-XX:+AlwaysPreTouch`.
`-XX:ParallelGCThreads` and `-XX:ConcGCThreads`
- Adjust the number of GC threads according to the system's CPU resources.
- Set `-XX:ParallelGCThreads=48` to match the number of CPU threads.
- Set `-XX:ConcGCThreads=12`, about ¼ of the CPU threads.
`-XX:+UseStringDeduplication`
- Reduce memory usage by deduplicating identical strings in G1 GC.
- Activate `-XX:+UseStringDeduplication`, since similar queries are likely to be submitted repeatedly.
`-Xlog`
- Log GC activity to monitor whether GC is performing appropriately.
- Set `-Xlog:gc*,safepoint:file=/var/log/trino/trino-gc.log:time,uptime,level,tags:filecount=10,filesize=20M`.
✅ Final JVM Options for Trino Coordinator
```
-Xmx60G
-XX:+UseG1GC
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-XX:ReservedCodeCacheSize=512M
-Dpresto-temporarily-allow-java8=true
-Djdk.attach.allowAttachSelf=true
-XX:InitiatingHeapOccupancyPercent=35
-XX:MaxGCPauseMillis=150
-XX:G1HeapRegionSize=10M
-XX:ParallelGCThreads=48
-XX:ConcGCThreads=12
-XX:+AlwaysPreTouch
-XX:+UseStringDeduplication
-Xlog:gc*,safepoint:file=/var/log/trino/trino-gc.log:time,uptime,level,tags:filecount=10,filesize=20M
```
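If you manage these options by hand rather than through Ambari, they belong in `$TRINO_HOME/etc/jvm.config`, which Trino reads as one JVM option per line with no shell quoting, for example (abbreviated; use the full list above):

```
-Xmx60G
-XX:+UseG1GC
-XX:MaxGCPauseMillis=150
```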
Trino Worker Configuration
The Trino worker server plays a crucial role in processing large volumes of data and executing queries.
The performance of worker nodes is significantly affected by traffic volume, query complexity, and the size of the data being processed, so hardware specifications must be carefully configured.
Typically, servers with similar specs to the coordinator are used, but it's better if the worker has more CPU clock speed, more cores, and larger memory than the coordinator.
Recommended hardware specs for a server to be used as a Trino worker:
CPU
CPU Cores: At least 16 cores are required, and 24–32 cores are recommended if possible.
- Since parallel processing is critical for workers, multi-core CPUs are essential.
CPU Clock Speed: For high-performance queries, the latest CPUs with high clock speeds are advantageous.
- Intel Xeon Scalable series or AMD EPYC processors are suitable.
Hyper-Threading: Since workers use many threads, hyper-threading can help improve performance.
Memory
RAM Capacity: Since worker nodes primarily use memory pools for data processing, at least 256 GB of RAM is needed. For large-scale data processing tasks, 512 GB or more may be required.
Memory Speed: A memory speed of DDR4 2933 MHz or higher is recommended. The memory bandwidth of worker nodes directly impacts query performance.
Disk
Disk Type: NVMe SSDs or high-performance SATA SSDs are suitable.
- Since workers heavily rely on disk I/O, using high-speed SSDs is recommended for fast data access and processing performance.
RAID Configuration: RAID 1+0 is recommended.
Disk Size: Each worker node requires at least 1 TB of disk space. Ample free space is necessary, especially when processing large datasets.
Disk Partition
It's advisable to use separate partitions to optimize disk I/O performance on worker nodes.
Root Partition (for OS): Allocate at least 100 GB to ensure stable operation of the OS and basic services.
`/data` Partition: Allocate sufficient disk space to store data and intermediate results processed by the worker.
`/var/log` Partition: Since many log files can accumulate, allocate a separate 100–200 GB partition.
Network
Since worker nodes exchange a lot of network traffic, a minimum of 10 GbE network or higher is recommended.
Low Latency: A low-latency network setup with fast switching environments is necessary to minimize network latency.
Example of typical Trino worker server specs (this article assumes workers with this level of spec):
CPU: Intel Xeon Gold 6248R 2.4 GHz, 24 cores/48 threads
Memory: 512 GB DDR4 3200 MHz
Disk: 2 TB NVMe SSD, RAID 1+0 configuration
Network: 10 GbE network interface
OS: CentOS 7.9, Ubuntu 20.04 LTS, Red Hat Enterprise Linux 9
- Set up OS, JVM, and resource isolation at the same level as the coordinator.
Storage & Caching
Local Disk Storage: Workers process temporary data on local disks, so fast I/O performance is crucial.
- Using SSDs or NVMe disks significantly improves intermediate data storage and processing performance.
Distributed Storage Integration: When using distributed file systems like HDFS or S3, you can improve performance by appropriately utilizing local cache disks.
Trino Worker JVM Configuration in Detail (for Optimal Performance, G1 GC Version)
Trino workers process large volumes of data, and memory usage patterns can change rapidly.
Since Trino generally processes large-scale queries, overall throughput is important.
However, if the worker node's responsiveness becomes excessively slow, it can affect the performance of the entire cluster.
Workers can use G1 GC like the coordinator, but performance can be further improved by using ZGC.
`-Xmx`
- It's stable to use about 70–80% of the physical memory, leaving the rest for the OS and other processes.
- Maximum available worker memory = 512 GB × 0.8 = 409.6 GB
- Set `-Xmx410G`.
`-XX:MaxGCPauseMillis`
- Set the maximum GC pause time to improve application responsiveness.
- With large heaps, it's challenging to keep pause times below 200 ms.
- Set a realistic goal of 500 ms so G1 GC can balance responsiveness and throughput.
- Set `-XX:MaxGCPauseMillis=500`.
`-XX:InitiatingHeapOccupancyPercent`
- Adjust the heap usage level at which concurrent GC starts, to optimize GC frequency.
- In a large heap (410 GB), usage can rise rapidly, so start the GC cycle early to prevent out-of-memory situations.
- Start GC when heap usage reaches about 30% (approximately 123 GB) to enhance memory management stability.
- Set `-XX:InitiatingHeapOccupancyPercent=30`.
`-XX:G1HeapRegionSize`
- In large heaps, this may be set up to the maximum of 32 MB, but explicitly reducing it to 16 MB divides the heap into more regions, allowing more granular GC.
- Set `-XX:G1HeapRegionSize=16M`.
`-XX:ParallelGCThreads` and `-XX:ConcGCThreads`
- Allocate an appropriate number of threads for GC tasks to improve GC efficiency.
- `-XX:ParallelGCThreads`: set equal to the total number of CPU threads (48).
- `-XX:ConcGCThreads`: set to ¼ of `ParallelGCThreads`.
- Set `-XX:ParallelGCThreads=48`.
- Set `-XX:ConcGCThreads=12`.
`-XX:+AlwaysPreTouch`
- Pre-allocate heap memory at JVM startup to prevent delays due to page faults.
- While it increases JVM startup time with a large heap, it can improve runtime performance.
✅ Final JVM Options for Trino Worker (G1 GC Version)
```
-Xmx410G
-XX:+UseG1GC
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-XX:ReservedCodeCacheSize=512M
-Dpresto-temporarily-allow-java8=true
-Djdk.attach.allowAttachSelf=true
-XX:InitiatingHeapOccupancyPercent=30
-XX:MaxGCPauseMillis=500
-XX:G1HeapRegionSize=16M
-XX:ParallelGCThreads=48
-XX:ConcGCThreads=12
-XX:+AlwaysPreTouch
-Xlog:gc*,safepoint:file=/var/log/trino/trino-gc.log:time,uptime,level,tags:filecount=10,filesize=20M
```
Trino Worker JVM Configuration in Detail (for Optimal Performance, ZGC Version)
Java 11 or higher is required: ZGC was introduced in JDK 11 and became considerably more stable and mature as of JDK 17 LTS, so OpenJDK 17 is recommended.
- Trino version 356 or higher supports JDK 11 and JDK 17.
ZGC is designed to maintain extremely low GC pause times (usually below 1–2 ms), regardless of the heap memory size.
Even with heap memory of several hundred GB, it provides consistent performance.
This is because ZGC performs most GC tasks concurrently with application threads.
Remove unnecessary options from the existing G1 GC options:

```
-XX:+UseG1GC
-XX:MaxGCPauseMillis=500
-XX:G1HeapRegionSize=16M
-XX:InitiatingHeapOccupancyPercent=30
-XX:ParallelGCThreads=48
-XX:ConcGCThreads=12
```

Remove Java 8 related options:

```
-Dpresto-temporarily-allow-java8=true
```
Remove other unnecessary options:
- Since ZGC avoids Full GC by default, the following may be unnecessary: `-XX:+ExplicitGCInvokesConcurrent`
Modify existing JVM options for ZGC:
Activate ZGC: `-XX:+UseZGC`
- In JDK 17 or higher, the `-XX:+UnlockExperimentalVMOptions` option is not required.
`-Xms` and `-Xmx`
- With ZGC, set the minimum and maximum heap sizes to the same value to avoid overhead from memory reallocation.
- Set `-Xms410G` for the minimum heap.
- Set `-Xmx410G` for the maximum heap.
GC log:

```
-Xlog:gc*,safepoint:file=/var/log/trino/trino-gc.log:time,uptime,level,tags:filecount=10,filesize=20M
```
`-XX:+UseLargePages`
- Use large page memory to reduce the TLB (Translation Lookaside Buffer) miss rate and improve memory access performance.

`-XX:+AlwaysPreTouch`
- At JVM initialization, touch all heap pages to prevent page faults at runtime.

`-XX:+ParallelRefProcEnabled`
- Enable parallel processing of references to improve GC performance.
✅ Final JVM Options for Trino Worker (ZGC Version)
```
-Xms410G
-Xmx410G
-XX:+UseZGC
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-XX:ReservedCodeCacheSize=512M
-Djdk.attach.allowAttachSelf=true
-XX:+UseLargePages
-XX:+AlwaysPreTouch
-XX:+ParallelRefProcEnabled
-Xlog:gc*,safepoint:file=/var/log/trino/trino-gc.log:time,uptime,level,tags:filecount=10,filesize=20M
```
✅ Optimization at the OS Level
- Root privileges are required to modify Linux kernel parameters.
Disable Transparent Huge Pages (THP)
THP can have a negative impact on performance, so it's better to disable it:

```sh
echo never > /sys/kernel/mm/transparent_hugepage/enabled
```

To apply this automatically at boot, add the command to `/etc/rc.local` or a system configuration file.
Enable Huge Pages
Using huge pages reduces TLB miss rates, improving memory access performance.
Configuration method:

Calculate the required number of pages considering the heap memory size (410 GB). For example, using the default huge page size of 2 MB:

Required number of pages = (410 GB × 1024 MB/GB) / 2 MB = 209,920

Add the following to the `/etc/sysctl.conf` file:

```
vm.nr_hugepages=209920
```

Apply the settings by executing:

```sh
sudo sysctl -p
```

Add JVM options: `-XX:+UseLargePages -XX:LargePageSizeInBytes=2m`

If the system supports 1 GB huge pages, you can use `-XX:LargePageSizeInBytes=1g` instead.
- In this case, you need to recalculate the `vm.nr_hugepages` value.
Disable Swap Memory
Swap usage negatively affects GC performance.
Add the following to the `/etc/sysctl.conf` file:

```
vm.swappiness=0
```

Apply the settings by executing:

```sh
sudo sysctl -p
```
Adjust `ulimit` Settings

You may need to increase file descriptor and process limits.

Increase the file descriptor limit (`ulimit -n`):

```sh
ulimit -n 65536
```

Increase the process and thread limit (`ulimit -u`):

```sh
ulimit -u 4096
```
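Note that `ulimit` set in a shell is not persistent across reboots. To make the limits permanent, they can be set in `/etc/security/limits.conf`; the sketch below assumes Trino runs as a dedicated `trino` user, which is an assumption to adapt:

```
trino  soft  nofile  65536
trino  hard  nofile  65536
trino  soft  nproc   4096
trino  hard  nproc   4096
```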
Memory Lock
Set `memlock`:

When using huge pages, you must increase the memory lock limit (mandatory).

Add the following to the `/etc/security/limits.conf` file:

```
* soft memlock unlimited
* hard memlock unlimited
```
NUMA Settings
NUMA Binding

In multi-socket servers, you can bind processes to NUMA nodes to minimize memory access latency.

Execute Trino with `numactl` as follows:

```sh
numactl --interleave=all <trino command>
```
Potential Issues When Using ZGC
File Descriptor and Thread Shortages
Possible errors:
"Too many open files"
"Unable to create new native thread"
Performance degradation and system instability:
- Resource limitations may prevent Trino worker nodes from operating normally, causing performance degradation or unexpected shutdowns.
When using ZGC, you may reach the OS resource limits due to large heap memory and parallel GC tasks.
To prevent this and ensure the performance and stability of Trino worker nodes, you need to increase `ulimit` settings.
Set appropriate values considering the server's overall resources (CPU, memory, number of file descriptors, etc.); setting `ulimit` too high can exhaust system resources.
Since increasing `ulimit` values may be restricted for security or stability reasons in some environments, consult with system administrators or security teams before making changes.