Trino (TSF) Installation and Configuration

Gyuhang Shim

Understanding the Clear Reasons for Using Trino

Since the right way to install and configure Trino depends on how you intend to use it, it's essential to first understand clearly why you are adopting it.

  • Determine if you need to fetch data from various data sources (MySQL, PostgreSQL, HDFS, Amazon S3, Cassandra, Kafka, etc.) and integrate them into a single query for analysis (see the example query after this list).

    • Analyze whether you need to integrate and analyze data from a Data Lake (HDFS) and Relational Databases (e.g., MySQL) in a Business Intelligence Platform.

    • Assess if data integration and analysis between multiple cloud environments or between on-premises and cloud are required.

  • Determine whether you can do without queries where data locality matters, which is Impala's strength.

    • Since Trino relies on in-memory processing rather than data-local execution, some queries may run slower than they would in Impala.
  • Assess whether you need to provide data analysis results on a real-time dashboard on a per-second basis.

  • Determine if data scientists need to perform rapid data exploration and prototyping.

  • Assess whether you need ad-hoc analysis or data validation without requiring complex batch processing.

    • For complex batch processing, Spark is more advantageous.
  • Determine if simple data processing and aggregation tasks are needed in the data pipeline.

  • Assess whether sophisticated resource management and query prioritization are required.

    • Impala supports resource management to some extent, but it is challenging to manage as precisely as Trino.

    • Trino allows you to set query priorities through resource groups and allocate resources according to various users and departments.

  • Determine if flexibility in a multi-tenant environment is needed.

    • For example, when many teams and departments in a large enterprise need to perform data analysis simultaneously.

    • Processing multiple user requests simultaneously in environments with resource constraints.
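  • For example, a single Trino query can join tables living in different catalogs. A minimal sketch, assuming a Hive catalog named hive and a MySQL catalog named mysql with illustrative schema and table names:

      SELECT o.order_id, c.customer_name, o.amount
      FROM hive.sales.orders AS o
      JOIN mysql.crm.customers AS c
        ON o.customer_id = c.id
      WHERE o.order_date >= DATE '2024-01-01';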


Criteria for Choosing Between Impala, Spark, and Trino

Impala

  • Suitable for quickly processing large-scale Hive tables or HDFS-based large data interactively.

  • Highly efficient in moving computation to DataNodes by utilizing data locality.

  • Deeply integrated into the Hadoop environment.

Spark

  • Suitable for large-scale batch data processing, ETL tasks, and complex data transformations.

  • Appropriate for handling machine learning workloads using libraries like MLlib.

Trino

  • Suitable when integrated queries across various data sources are needed.

  • Appropriate for low-latency interactive analysis and lightweight ad-hoc queries.

  • Suitable for multi-user environments where resource management is required.


Trino Installation via Apache Ambari

If you've decided to install Trino, it is recommended to proceed as follows:

  • Trino can be installed in various ways, but to operate in a multi-tenant environment, it shines when installed using Ambari and combined with Ranger.

  • For other installation methods, refer to the Trino documentation:

    • Deploying Trino

    • Trino in a Docker container

    • Trino on Kubernetes with Helm

    • RPM package

The Trino installation below is described for a Kerberized (secured) Hadoop cluster.

  • The Trino cluster runs on the JVM.

    • An appropriate JDK or JRE must be set up. Java 11 or higher is recommended.

Trino Installation Step by Step

  1. Install the Trino package in Ambari to configure the Trino cluster.

    • The Trino cluster consists of a coordinator and workers.
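    • A minimal sketch of the per-node settings (ports, hosts, and paths are illustrative; adjust them to your environment):

        # $TRINO_HOME/etc/config.properties (coordinator)
        coordinator=true
        node-scheduler.include-coordinator=false
        http-server.http.port=8080
        discovery.uri=http://<coordinator-host>:8080

        # $TRINO_HOME/etc/config.properties (worker)
        coordinator=false
        http-server.http.port=8080
        discovery.uri=http://<coordinator-host>:8080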
  2. After installing Trino, install the Ranger Trino Plugin where Trino is installed.

  3. For Trino security, it is recommended to follow the security workflow suggested in the Trino documentation.

    • Enable TLS/HTTPS by configuring certificates and a load balancer. (Mandatory)

    • Set up a shared secret. (Mandatory)

    • Configure an authentication provider like LDAP. (Mandatory)

    • Avoid using Kerberos settings unless necessary, as they are complex and challenging.

    • Optionally, implement authorization for users accessing URLs.
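    • A minimal sketch of the corresponding coordinator settings (keystore path, port, and secret are illustrative):

        # $TRINO_HOME/etc/config.properties (security-related excerpt)
        http-server.https.enabled=true
        http-server.https.port=8443
        http-server.https.keystore.path=/etc/trino/trino.jks
        http-server.https.keystore.key=<keystore-password>
        internal-communication.shared-secret=<long-random-secret>
        http-server.authentication.type=PASSWORD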

  4. Create a Kerberos principal for Trino and distribute the generated keytab files to the coordinator and workers.

    • Reference: Kerberos authentication
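    • A sketch of creating the principal and exporting its keytab with MIT Kerberos kadmin (realm, paths, and hosts are illustrative):

        kadmin -q "addprinc -randkey trino@<REALM>"
        kadmin -q "xst -k /etc/security/keytabs/trino.service.keytab trino@<REALM>"
        # distribute the keytab to the coordinator and every worker, e.g.:
        scp /etc/security/keytabs/trino.service.keytab <worker-host>:/etc/security/keytabs/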
  5. Distribute the related configuration files for Trino authentication.

    • Create and configure the password-authenticator.properties file on the coordinator server.

      • The content of the password-authenticator.properties file is as follows (for authentication using LDAP):

          # $TRINO_HOME/etc/password-authenticator.properties
          password-authenticator.name=ldap
          ldap.url=ldap://<ldap-host>:389
          ldap.allow-insecure=true
          ldap.user-bind-pattern=uid=${USER},ou=users,dc=company,dc=com
        
  6. Configure the Trino connector (explained based on the Hive connector).

    • Create and configure the file in the format $TRINO_HOME/etc/catalog/hive.properties.

    • Refer to the Trino Hive Connector documentation.

        connector.name=hive
        hive.metastore.uri=thrift://<HMS-server>:9083
        hive.config.resources=$HADOOP_HOME/conf/core-site.xml,$HADOOP_HOME/conf/hdfs-site.xml
        hive.metastore.authentication.type=KERBEROS
        hive.metastore.service.principal=hive/_HOST@<REALM>
        hive.metastore.client.principal=trino@<REALM>
        hive.metastore.client.keytab=<principal keytab path>
        hive.hdfs.authentication.type=KERBEROS
        hive.hdfs.impersonation.enabled=true
        hive.hdfs.trino.principal=trino@<REALM>
        hive.hdfs.trino.keytab=<principal keytab path>
      
  7. Ensure that it works well with Apache Ranger.

    • Verify that the Ranger Trino Plugin is operating correctly.

    • Create policies in Ranger and confirm that users authenticated via LDAP can exercise appropriate permissions according to each policy.

    • Check that audit logs are properly recorded in Trino.
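    • A quick end-to-end check might look like this (host, user, catalog, and table names are illustrative):

        # connect as an LDAP-authenticated user and run a query covered by a Ranger policy
        ./trino --server https://trino.example.com:8443 \
                --user alice --password \
                --catalog hive --schema default
        trino> SHOW TABLES;
        trino> SELECT count(*) FROM orders;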


Trino Configuration

The important aspects in Trino settings are resource configuration, coordinator settings, and worker settings.


Trino Resource Configuration

  • To manage resources in Trino, there is a resource group feature.

  • To apply resource groups, you need to set up a resource group manager.

    • The resource group manager can be file-based or use a database.
  • To set it up based on files, create and configure the $TRINO_HOME/etc/resource-groups.properties file.

    • Here, we'll explain how to set it up using a file-based approach.

        # resource-groups.properties
        resource-groups.configuration-manager=file
        resource-groups.config-file=$TRINO_HOME/etc/resource-groups.json
      
    • The resource-groups.config-file can be written in JSON format.

  • The resource group config file uses the concept of resource group properties for configuration.

    • For detailed explanations, refer to the Trino Documentation.
  • Explanation of essential fields:

    • name

      • Name of the group.
    • maxQueued

      • Maximum number of queued queries.

      • Once this limit is reached, new queries are rejected.

    • hardConcurrencyLimit

      • Maximum number of running queries.
    • softMemoryLimit

      • Maximum amount of distributed memory this group may use before new queries become queued.

      • May be specified as an absolute value (e.g., 1GB) or as a percentage (e.g., 10%) of the cluster’s memory.

  • Sample configuration file:

      {
        "rootGroups": [
          {
            "name": "ADD_HOC",
            "maxQueued": 50,
            "hardConcurrencyLimit": 20,
            "softMemoryLimit": "15%"
          },
          {
            "name": "BI",
            "maxQueued": 80,
            "hardConcurrencyLimit": 40,
            "softMemoryLimit": "20%"
          }
        ]
      }
    
  • Characteristics of resource groups:

    • You can enforce resource usage limits and queue policies.

    • It is possible to divide usage into sub-groups.

    • A query belongs to one resource group and consumes resources from that group.

    • If there are insufficient available resources, new queries are queued, but running queries are not failed.

      • Instead, all new queries remain in a queued state.
    • A resource group can either have sub-groups or accept queries directly, but not both simultaneously.

      • It is impossible to process queries while having sub-groups.
  • There are restrictions on resource group names:

    • When resource groups are structured in a tree, sub-group names cannot duplicate names of other sibling sub-groups.

      • For example, a.b, a.b → Not allowed.
    • Each node in the tree must have a unique path, and sub-groups under the same parent must have different names.

      • For example, a.b, a.c → Valid.
    • Even if sub-groups under different parent nodes have the same name, they are identified by a unique full path, so conflicts do not occur.

      • For example, a.b.d, a.c.d → Valid.
  • After setting up resource groups, you need to configure selectors.

    • For detailed explanations, refer to Trino Selector rules documentation.

    • Selector rules use the regular expression syntax of the Java programming language.

    • In selector rules, group is mandatory.

      • group: The group these queries will run in.
    • Within a single selector rule, all specified conditions must match (they are combined with AND); rules themselves are evaluated in order, and the first matching rule is used.

    • clientTags are mainly used in JDBC connection settings to apply rules.

      • There are restrictions on clientTags.

        • Disallowed characters:

          • Whitespace, comma (,), slash (/), colon (:)
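  • A sketch of selectors that route queries to the groups defined in the sample above (user patterns and tags are illustrative); the selectors array sits next to rootGroups in resource-groups.json:

      "selectors": [
        { "user": "bi_.*", "group": "BI" },
        { "clientTags": ["adhoc"], "group": "AD_HOC" },
        { "group": "AD_HOC" }
      ]

    • The first matching rule assigns the group, and the final rule with only group acts as a catch-all for queries no earlier rule matched.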

Trino Impersonation Configuration (Optional)

  • To apply impersonation when accessing HDFS from Trino, you need to set up proxy user settings.

  • Here's how to set it up in HDFS as an example:

    • In Ambari → HDFS → CONFIGS → ADVANCED → core-site.xml settings

        hadoop.proxyuser.trino.groups: *
        hadoop.proxyuser.trino.hosts: *
      
    • Since core-site.xml has been modified, you need to restart the NameNode for the changes to take effect.


Trino Client Configuration

  • To properly apply resource groups, you also need to configure the client side.

    • One convenient method is to set clientTags in the client connector.
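    • For example, the Trino JDBC driver accepts a clientTags parameter in the connection URL (host, port, and tag are illustrative):

        jdbc:trino://trino.example.com:8443/hive/default?clientTags=adhoc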

Trino Connection for JDBC Driver

  • The Trino JDBC driver has the following requirements:

    • Java version 8 or higher.

    • All users that connect to Trino with the JDBC driver must be granted access to query tables in the system.jdbc schema.

  • Any client application that satisfies the above requirements can connect using the Trino JDBC driver.

  • Using trino-cli provided by Trino

    • Download the trino-cli executable jar.

    • You can connect to the Trino server as follows:

        ./trino --server http://trino.example.com:8080
      
    • Note: Authentication (e.g., Kerberos) may be required to connect.

    • Reference: https://trino.io/docs/current/client/cli.html

  • Using the Trino Python Client

    • Client for Trino, a distributed SQL engine for interactive and batch big data processing.

    • Provides a low-level client, a DBAPI 2.0 implementation, and a SQLAlchemy adapter. It supports Python >= 3.8 and PyPy.

    • trino.sqlalchemy is compatible with the latest 1.3.x, 1.4.x, and 2.0.x SQLAlchemy versions at the time of release of a particular version of the client.

    • To connect to Trino using SQLAlchemy, use a connection string (URL) following this pattern:

        trino://<username>:<password>@<host>:<port>/<catalog>/<schema>
      
    • In order to pass additional connection attributes, use the connect_args parameter of create_engine.

      • Attributes can also be passed in the connection string.
    • Example code for connecting using the Python client:

        from sqlalchemy import create_engine
        from trino.sqlalchemy import URL
      
        engine = create_engine(
            URL(
                host="localhost",
                port=8080,
                catalog="system"
            ),
            connect_args={
              "session_properties": {'query_max_run_time': '1d'},
              "client_tags": ["tag1", "tag2"],
              "roles": {"catalog1": "role1"},
            }
        )
      
        # or in connection string
        engine = create_engine(
            'trino://user@localhost:8080/system?'
            'session_properties={"query_max_run_time": "1d"}'
            '&client_tags=["tag1", "tag2"]'
            '&roles={"catalog1": "role1"}'
        )
      
        # or using the URL factory method
        engine = create_engine(URL(
          host="localhost",
          port=8080,
          client_tags=["tag1", "tag2"]
        ))
      
    • Reference: https://github.com/trinodb/trino-python-client


Trino Coordinator Configuration

The Trino coordinator server plays a central role in the cluster, so its hardware specifications must be carefully considered.

  • The Trino coordinator handles query scheduling, resource management, user request processing, and other critical roles, so it requires appropriately sized hardware.

  • Recommended hardware specs for a server to be used as the Trino coordinator:

    CPU

    • CPU Cores: At least 8–16 cores are recommended.

      • The coordinator handles query planning, analysis, and resource scheduling, so performance improves significantly in a multi-core environment.
    • CPU Clock Speed: High-clock-speed latest Intel Xeon or AMD EPYC series are suitable.

    • Hyper-Threading: Trino tends to generate many threads, so hyper-threading can help performance.

Memory

  • RAM Size: At least 256 GB is recommended.

    • The coordinator uses a large memory pool to manage query planning and execution, so more memory is needed when processing large-scale queries.
  • Memory Speed: Using high-speed memory of DDR4 2933 MHz or higher is advantageous—the faster, the better.

  • Disable Swap Usage

    • It is advisable to configure the OS not to use swap memory.

    • Performance degradation can be severe if swapping occurs.

Disk

  • Disk Type: NVMe or SSD (Solid State Drive) is the best choice.

    • The coordinator frequently accesses log records and metadata, so disk I/O performance is critical.
  • RAID Configuration: RAID 1+0 configuration is recommended.

    • It is optimized for both durability and read/write performance.
  • Disk Size: At least 1 TB is recommended.

Disk Partition Configuration

  • It's advisable to use separate partitions to prevent conflicts between the OS and data.

    • Root Partition (for OS): Configure at least 100 GB to ensure stable operation of the OS and other services.

    • /var/log Partition: Allocate sufficient disk space to /var/log since Trino generates many logs. At least 200 GB is recommended.

    • /data Partition: Configure a separate partition to store Trino's metadata and cache data.

Network

  • Since the coordinator exchanges data with multiple worker nodes, network performance is important.

  • A NIC (Network Interface Card) with a bandwidth of 10 GbE or higher is recommended.

  • Example of a typical Trino coordinator server spec (this article assumes this level of server spec):

    • CPU: Intel Xeon Gold 6248 2.5 GHz, 24 cores/48 threads

    • Memory: 512 GB DDR4 3200 MHz

    • Disk: 2 TB NVMe SSD, RAID 1+0 configuration

    • Network: 10 GbE network interface

    • OS: CentOS 7.9, Ubuntu 20.04 LTS, Red Hat Enterprise Linux 9

      • Linux-based OS is appropriate.

        • It's common to use CentOS 7, Ubuntu 18.04 or higher, or Red Hat Enterprise Linux (RHEL).

        • A Linux-based environment is most suitable in terms of stability and performance optimization.

Kernel Settings

  • Adjust the TCP buffer size to optimize network performance.

  • Adjust settings like vm.swappiness=0 for I/O performance.
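  • A sketch of such settings in /etc/sysctl.conf (values are illustrative starting points for a 10 GbE network):

      # larger socket buffers for high-bandwidth links
      net.core.rmem_max=16777216
      net.core.wmem_max=16777216
      net.ipv4.tcp_rmem=4096 87380 16777216
      net.ipv4.tcp_wmem=4096 65536 16777216
      # avoid swapping the JVM heap
      vm.swappiness=0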

JVM Settings

  • It is appropriate to use Java versions like OpenJDK 11 or 17, and you should optimize JVM options to efficiently manage memory and garbage collection (GC).

Resource Isolation

  • You can utilize cgroups to limit CPU and memory to prevent the Trino coordinator from consuming excessive resources.

    • This can prevent resource conflicts between the coordinator and other system processes.
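  • A minimal sketch using a systemd drop-in, assuming Trino runs as a systemd service named trino (the limits are illustrative):

      # /etc/systemd/system/trino.service.d/limits.conf
      [Service]
      CPUQuota=1600%
      MemoryMax=80G

    • Run systemctl daemon-reload and restart the service for the limits to take effect.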

Configure High Availability (HA)

  • The Trino coordinator is a Single Point of Failure (SPOF), so you need to prepare for failures.

  • Set up multiple coordinator servers using a load balancer like HAProxy.
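  • A sketch of an HAProxy configuration for an active/standby coordinator pair (hosts and ports are illustrative):

      frontend trino_front
          bind *:8080
          mode http
          default_backend trino_coordinators

      backend trino_coordinators
          mode http
          option httpchk GET /v1/info
          server coord1 coord1.example.com:8080 check
          server coord2 coord2.example.com:8080 check backup

  • Note that a Trino cluster has a single active coordinator, so the second server is configured as a standby (backup) rather than an active peer.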


Trino Coordinator JVM Configuration in Detail (for Optimal Performance)

  • Typically, the coordinator uses less memory than workers.

  • An unnecessarily large heap size can increase GC pause times.

  • Xmx

    • Assuming a cluster of about 30 workers, an optimal value is around 60 GB.

    • Since the heap memory size is 60 GB, G1 GC is appropriate.

    • Set Xmx60g.

  • XX:InitiatingHeapOccupancyPercent

    • The default value is 45%, but adjust it to 30%–40% to optimize for the coordinator's role.

    • Starting concurrent GC earlier, while heap usage is still moderate, reclaims memory sooner, makes abnormal memory growth easier to spot, and improves memory management stability.

    • Set XX:InitiatingHeapOccupancyPercent=35.

  • XX:MaxGCPauseMillis

    • Set the target GC pause time to increase the coordinator's responsiveness.

    • The default is 200 ms, but adjust to 100 ms–200 ms for slightly faster response times.

    • Since the coordinator performs latency-sensitive tasks like query planning and scheduling, shorter pause times are desirable.

    • Set XX:MaxGCPauseMillis=150 as a realistic adjustment.

  • XX:G1HeapRegionSize

    • By default, G1 GC automatically determines the heap region size based on the heap size.

    • An appropriate ratio is 1/6400 of the heap size (60 GB / 6400 ≈ 9.6 MB).

    • Set XX:G1HeapRegionSize=10M.

  • Add XX:+AlwaysPreTouch

    • Pre-allocate heap memory at JVM startup to improve runtime performance.

    • Activate XX:+AlwaysPreTouch.

  • XX:ParallelGCThreads and XX:ConcGCThreads

    • Adjust the number of GC threads according to the system's CPU cores.

    • Set XX:ParallelGCThreads=48 to match the number of CPU threads.

    • Set XX:ConcGCThreads=12 to about ¼ of the CPU threads.

  • XX:+UseStringDeduplication

    • Reduce memory usage by deduplicating identical strings in G1 GC.

    • Activate XX:+UseStringDeduplication since similar queries are likely to be input repeatedly.

  • Xlog

    • Log GC activities to monitor whether GC is performing appropriately.

    • Set:

        -Xlog:gc*,safepoint:file=/var/log/trino/trino-gc.log:time,uptime,level,tags:filecount=10,filesize=20M
      
  • ✅ Final JVM Options for Trino Coordinator

      -Xmx60G
      -XX:+UseG1GC
      -XX:+ExplicitGCInvokesConcurrent
      -XX:+HeapDumpOnOutOfMemoryError
      -XX:+ExitOnOutOfMemoryError
      -XX:ReservedCodeCacheSize=512M
      -Dpresto-temporarily-allow-java8=true
      -Djdk.attach.allowAttachSelf=true
      -XX:InitiatingHeapOccupancyPercent=35
      -XX:MaxGCPauseMillis=150
      -XX:G1HeapRegionSize=10M
      -XX:ParallelGCThreads=48
      -XX:ConcGCThreads=12
      -XX:+AlwaysPreTouch
      -XX:+UseStringDeduplication
      -Xlog:gc*,safepoint:file=/var/log/trino/trino-gc.log:time,uptime,level,tags:filecount=10,filesize=20M
    

Trino Worker Configuration

The Trino worker server plays a crucial role in processing large volumes of data and executing queries.

  • The performance of worker nodes is significantly affected by traffic volume, query complexity, and the size of the data being processed, so hardware specifications must be carefully configured.

  • Typically, servers with similar specs to the coordinator are used, but it's better if the worker has more CPU clock speed, more cores, and larger memory than the coordinator.

  • Recommended hardware specs for a server to be used as a Trino worker:

    CPU

    • CPU Cores: At least 16 cores are required, and 24–32 cores are recommended if possible.

      • Since parallel processing is critical for workers, multi-core CPUs are essential.
    • CPU Clock Speed: For high-performance queries, the latest CPUs with high clock speeds are advantageous.

      • Intel Xeon Scalable series or AMD EPYC processors are suitable.
    • Hyper-Threading: Since workers use many threads, hyper-threading can help improve performance.

Memory

  • RAM Capacity: Since worker nodes primarily use memory pools for data processing, at least 256 GB of RAM is needed. For large-scale data processing tasks, 512 GB or more may be required.

  • Memory Speed: A memory speed of DDR4 2933 MHz or higher is recommended. The memory bandwidth of worker nodes directly impacts query performance.

Disk

  • Disk Type: NVMe SSDs or high-performance SATA SSDs are suitable.

    • Since workers heavily rely on disk I/O, using high-speed SSDs is recommended for fast data access and processing performance.
  • RAID Configuration: RAID 1+0 is recommended.

  • Disk Size: Each worker node requires at least 1 TB of disk space. Ample free space is necessary, especially when processing large datasets.

Disk Partition

  • It's advisable to use separate partitions to optimize disk I/O performance on worker nodes.

    • Root Partition (for OS): Allocate at least 100 GB to ensure stable operation of the OS and basic services.

    • /data Partition: Allocate sufficient disk space to store data and intermediate results processed by the worker.

    • /var/log Partition: Since many log files can accumulate, it's good to allocate a separate space of 100–200 GB.

Network

  • Since worker nodes exchange a lot of network traffic, a minimum of 10 GbE network or higher is recommended.

  • Low Latency: A low-latency network setup with fast switching environments is necessary to minimize network latency.

  • Example of typical Trino worker server specs (this article assumes workers with this level of spec):

    • CPU: Intel Xeon Gold 6248R 2.4 GHz, 24 cores/48 threads

    • Memory: 512 GB DDR4 3200 MHz

    • Disk: 2 TB NVMe SSD, RAID 1+0 configuration

    • Network: 10 GbE network interface

    • OS: CentOS 7.9, Ubuntu 20.04 LTS, Red Hat Enterprise Linux 9

      • Set up OS, JVM, and resource isolation at the same level as the coordinator.

Storage & Caching

  • Local Disk Storage: Workers process temporary data on local disks, so fast I/O performance is crucial.

    • Using SSDs or NVMe disks significantly improves intermediate data storage and processing performance.
  • Distributed Storage Integration: When using distributed file systems like HDFS or S3, you can improve performance by appropriately utilizing local cache disks.


Trino Worker JVM Configuration in Detail (for Optimal Performance, G1 GC Version)

  • Trino workers process large volumes of data, and memory usage patterns can change rapidly.

  • Since Trino generally processes large-scale queries, overall throughput is important.

  • However, if the worker node's responsiveness becomes excessively slow, it can affect the performance of the entire cluster.

  • Workers can use G1 GC like the coordinator, but performance can be further improved by using ZGC.

  • Xmx

    • It's stable to use about 70–80% of the physical memory capacity, leaving the rest for the OS and other processes.

    • Maximum available worker memory = 512 GB × 0.8 = 409.6 GB

    • Set Xmx410g.

  • XX:MaxGCPauseMillis

    • Set the maximum GC pause time to improve application responsiveness.

    • In large heaps, it's challenging to maintain pause times below 200 ms.

    • Set a realistic goal of 500 ms for the pause time to allow G1 GC to balance between responsiveness and throughput.

    • Set XX:MaxGCPauseMillis=500.

  • XX:InitiatingHeapOccupancyPercent

    • Adjust the heap usage level at which concurrent GC starts to optimize GC frequency.

    • In large heaps (410 GB), heap usage can increase rapidly, so start the GC cycle early to prevent out-of-memory situations.

    • Start GC when heap usage reaches about 30%, approximately 123 GB, to enhance memory management stability.

    • Set XX:InitiatingHeapOccupancyPercent=30.

  • XX:G1HeapRegionSize

    • In large heaps, it may be set to the maximum size of 32 MB, but explicitly reducing it to 16 MB divides the heap into more regions, allowing for more granular GC.

    • Set XX:G1HeapRegionSize=16M.

  • XX:ParallelGCThreads and XX:ConcGCThreads

    • Allocate an appropriate number of threads for GC tasks to improve GC efficiency.

    • XX:ParallelGCThreads: Set equal to the total number of CPU threads (48).

    • XX:ConcGCThreads: Set to ¼ of ParallelGCThreads.

    • Set XX:ParallelGCThreads=48.

    • Set XX:ConcGCThreads=12.

  • XX:+AlwaysPreTouch

    • Pre-allocate heap memory at JVM startup to prevent delays due to page faults.

    • While it increases JVM startup time when using large heap memory, there can be runtime performance benefits.

  • ✅ Final JVM Options for Trino Worker (G1 GC Version)

      -Xmx410G
      -XX:+UseG1GC
      -XX:+ExplicitGCInvokesConcurrent
      -XX:+HeapDumpOnOutOfMemoryError
      -XX:+ExitOnOutOfMemoryError
      -XX:ReservedCodeCacheSize=512M
      -Dpresto-temporarily-allow-java8=true
      -Djdk.attach.allowAttachSelf=true
      -XX:InitiatingHeapOccupancyPercent=30
      -XX:MaxGCPauseMillis=500
      -XX:G1HeapRegionSize=16M
      -XX:ParallelGCThreads=48
      -XX:ConcGCThreads=12
      -Xlog:gc*,safepoint:file=/var/log/trino/trino-gc.log:time,uptime,level,tags:filecount=10,filesize=20M
    

Trino Worker JVM Configuration in Detail (for Optimal Performance, ZGC Version)

  • Java 11 or higher is required: ZGC was introduced in JDK 11 and became more stable and improved in JDK 17 LTS version. Therefore, it is recommended to use OpenJDK 17.

    • Trino version 356 or higher supports JDK 11 and JDK 17.
  • ZGC is designed to maintain extremely low GC pause times (usually below 1–2 ms), regardless of the heap memory size.

    • Even with heap memory of several hundred GB, it provides consistent performance.

    • This is because ZGC performs most GC tasks concurrently with application threads.

  • Remove unnecessary options from the existing G1 GC options:

    • Remove:

        -XX:+UseG1GC
        -XX:MaxGCPauseMillis=500
        -XX:G1HeapRegionSize=16M
        -XX:InitiatingHeapOccupancyPercent=30
        -XX:ParallelGCThreads=48
        -XX:ConcGCThreads=12
      
  • Remove Java 8 related options:

    • Remove:

        -Dpresto-temporarily-allow-java8=true
      
  • Remove other unnecessary options:

    • Since ZGC by default avoids Full GC, the following may be unnecessary:

        -XX:+ExplicitGCInvokesConcurrent
      
  • Modify existing JVM options for ZGC:

    • Activate ZGC:

        -XX:+UseZGC
      
      • In JDK 17 or higher, the XX:+UnlockExperimentalVMOptions option is not required.
    • Xms and Xmx

      • In ZGC, set the minimum and maximum heap values the same to avoid overhead from memory reallocation.

      • Set Xms410g for minimum heap.

      • Set Xmx410g for maximum heap.

    • GC log

        -Xlog:gc*,safepoint:file=/var/log/trino/trino-gc.log:time,uptime,level,tags:filecount=10,filesize=20M
      
    • XX:+UseLargePages

      • Use large page memory to reduce TLB (Translation Lookaside Buffer) miss rate and improve memory access performance.
    • XX:+AlwaysPreTouch

      • At JVM initialization, touch all pages in the heap to prevent page faults.
    • XX:+ParallelRefProcEnabled

      • Enable parallel processing of references to improve GC performance.
  • ✅ Final JVM Options for Trino Worker (ZGC Version)

      -Xms410G
      -Xmx410G
      -XX:+UseZGC
      -XX:+HeapDumpOnOutOfMemoryError
      -XX:+ExitOnOutOfMemoryError
      -XX:ReservedCodeCacheSize=512M
      -Djdk.attach.allowAttachSelf=true
      -XX:+UseLargePages
      -XX:+AlwaysPreTouch
      -XX:+ParallelRefProcEnabled
      -Xlog:gc*,safepoint:file=/var/log/trino/trino-gc.log:time,uptime,level,tags:filecount=10,filesize=20M
    
  • ✅ Optimization at the OS Level

    • You need permissions to modify Linux kernel parameters.

Disable Transparent Huge Pages (THP)

  • THP can have a negative impact on performance, so it's better to disable it.

      echo never > /sys/kernel/mm/transparent_hugepage/enabled
    
  • To apply automatically at boot, add the command to /etc/rc.local or a system configuration file.

Enable Huge Pages

  • Using huge pages reduces TLB miss rates, improving memory access performance.

  • Configuration method:

    • Calculate the required number of pages considering the heap memory size (410 GB).

      • For example, using the default huge page size of 2 MB:

          Required number of pages = (410 GB * 1024 MB/GB) / 2 MB = 209,920
        
    • Add the following to the /etc/sysctl.conf file:

        vm.nr_hugepages=209920
      
    • Apply the settings by executing:

        sudo sysctl -p
      
    • Add JVM options:

        -XX:+UseLargePages
        -XX:LargePageSizeInBytes=2m
      
      • If the system supports 1 GB huge pages, you can add:

          -XX:LargePageSizeInBytes=1g
        
        • In this case, you need to recalculate the vm.nr_hugepages value.

Disable Swap Memory

  • Swap usage negatively affects GC performance.

  • Add the following to the /etc/sysctl.conf file:

      vm.swappiness=0
    
  • Apply the settings by executing:

      sudo sysctl -p
    

Adjust ulimit Settings

  • You may need to increase file descriptor and process limits.

  • Increase file descriptor limit (ulimit -n):

      ulimit -n 65536
    
  • Increase process and thread limit (ulimit -u):

      ulimit -u 4096
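  • ulimit changes apply only to the current shell; to make them persistent, a sketch for /etc/security/limits.conf (assuming Trino runs as the trino user):

      trino soft nofile 65536
      trino hard nofile 65536
      trino soft nproc 4096
      trino hard nproc 4096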
    

Memory Lock

  • Set memlock:

    • When using huge pages, you need to increase the memory lock limit (mandatory).

    • Add the following to the /etc/security/limits.conf file (for the account that runs Trino, assumed here to be trino):

        trino soft memlock unlimited
        trino hard memlock unlimited
      

NUMA Settings

  • NUMA Binding

    • In multi-socket servers, you can bind processes to NUMA nodes to minimize memory access latency.

    • Execute Trino with numactl as follows:

        numactl --interleave=all <trino command>
      
  • Potential Issues When Using ZGC

    File Descriptor and Thread Shortages

    • Possible errors:

      • "Too many open files"

      • "Unable to create new native thread"

    • Performance degradation and system instability:

      • Resource limitations may prevent Trino worker nodes from operating normally, causing performance degradation or unexpected shutdowns.
  • When using ZGC, you may reach the OS resource limits due to large heap memory and parallel GC tasks.

    • To prevent this and ensure the performance and stability of Trino worker nodes, you need to increase ulimit settings.

    • Set appropriate values considering the server's overall resources (CPU, memory, number of file descriptors, etc.) to avoid exhausting system resources by setting ulimit too high.

    • Since increasing ulimit values may be restricted for security or stability reasons in some environments, consult with system administrators or security teams before making changes.
