Storage Modes, Configuration Options and Authentication Flavours

Nikhil Sinha

Parseable is a cloud-native log analytics engine written in Rust that uses Apache Arrow and Parquet as its underlying data formats. Its ability to interface with cloud storage services such as AWS S3 and Azure Blob Storage enables long-term, cost-effective log retention.

This guide covers the connection parameters, authentication methods and advanced settings Parseable supports for each of these cloud storage providers, so you can configure Parseable to meet your needs.

AWS S3 Configuration for Parseable

Parseable supports connecting to AWS S3 and to S3-compatible storage such as MinIO. This section outlines the mandatory parameters, the supported authentication methods, and the additional settings available to fine-tune the S3 connection.

Mandatory Environment Variables for AWS S3

To establish a connection to AWS S3, you need to set the following mandatory environment variables:

  • P_S3_URL: The endpoint for AWS S3 or compatible storage. Defaults to the region-based endpoint (e.g., s3.us-east-2.amazonaws.com for the us-east-2 region).
  • P_S3_REGION: Specifies the AWS region where the S3 bucket is located. See "Amazon Simple Storage Service endpoints and quotas" in the AWS documentation for the list of regions and their endpoints.
  • P_S3_BUCKET: Defines the S3 bucket name where Parseable will store log data.
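
As a quick pre-flight check, the three variables above can be validated before starting the server. The following is a minimal Python sketch, not part of Parseable: the helper name missing_s3_vars and the example values (including the bucket name) are hypothetical, and only the variable names come from the list above.

```python
from typing import List, Mapping

# Variable names are taken from the list above; the helper is hypothetical.
MANDATORY_S3_VARS = ("P_S3_URL", "P_S3_REGION", "P_S3_BUCKET")


def missing_s3_vars(env: Mapping[str, str]) -> List[str]:
    """Return the mandatory S3 variables that are unset or blank in env."""
    return [name for name in MANDATORY_S3_VARS if not env.get(name, "").strip()]


# Placeholder values for illustration only.
example_env = {
    "P_S3_URL": "https://s3.us-east-2.amazonaws.com",
    "P_S3_REGION": "us-east-2",
    "P_S3_BUCKET": "my-log-bucket",
}

print(missing_s3_vars(example_env))  # prints []
```

In a real check you would pass os.environ instead of example_env.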

Authentication Options for AWS S3

Parseable supports multiple authentication mechanisms for AWS S3, offering flexibility for different deployment environments:

  • Access Key and Secret Key: Set the environment variables P_S3_ACCESS_KEY and P_S3_SECRET_KEY. The AWS access key, paired with the secret key, is used to authenticate to AWS and access the S3 bucket according to the permissions granted. This method is essential when Parseable runs outside AWS EC2 or ECS, where IAM roles are unavailable.

  • IMDSv1 Fallback: For Parseable instances running on EC2, AWS credentials can be sourced from the Instance Metadata Service (IMDS), which removes the need for explicit P_S3_ACCESS_KEY and P_S3_SECRET_KEY values. To enable this, set the environment variable P_AWS_IMDSV1_FALLBACK and configure the EC2 instance to allow IMDS access. First, enable the Instance Metadata Service when creating the EC2 instance (under the Advanced details section); this is required to obtain the credentials. Second, set the metadata version to "V1 and V2 (token optional)". Refer to the AWS instance metadata service documentation for more details.

  • Metadata Endpoint: Set the optional environment variable P_AWS_METADATA_ENDPOINT to specify a custom endpoint URL for retrieving instance metadata. By default, Parseable uses the standard AWS metadata endpoint: http://169.254.169.254 over IPv4, or http://[fd00:ec2::254] over IPv6. This option is useful with a custom setup or infrastructure where metadata is served from somewhere other than AWS's default endpoint.

  • Container Credentials Relative URI: Set the optional environment variable AWS_CONTAINER_CREDENTIALS_RELATIVE_URI. When Parseable runs on Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS), this variable holds the relative path from which AWS credentials are obtained. AWS uses this mechanism to grant temporary, scoped credentials to applications running in a container (such as ECS tasks), so no AWS credentials need to be hardcoded.
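
To see at a glance which of the credential sources above a given environment provides, a small helper can inspect the relevant variables. This is a rough Python sketch, not Parseable's actual credential resolution: the function name and the returned labels are hypothetical, it does not imply any precedence between the sources, and it assumes that any non-empty value of P_AWS_IMDSV1_FALLBACK counts as "set".

```python
from typing import List, Mapping


def configured_s3_auth(env: Mapping[str, str]) -> List[str]:
    """List which credential sources described above appear to be configured.

    Hypothetical helper: it only reports what is set and does not model
    the order in which Parseable would actually try these sources.
    """
    found = []
    # Static credentials need both halves of the key pair.
    if env.get("P_S3_ACCESS_KEY") and env.get("P_S3_SECRET_KEY"):
        found.append("access-key")
    # Container credentials (ECS/EKS) are signalled by the relative URI.
    if env.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI"):
        found.append("container-credentials")
    # Assumption: any non-empty value means the fallback is enabled.
    if env.get("P_AWS_IMDSV1_FALLBACK"):
        found.append("imdsv1-fallback")
    return found


print(configured_s3_auth({"P_S3_ACCESS_KEY": "AKIA...", "P_S3_SECRET_KEY": "..."}))
# prints ['access-key']
```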

Advanced Configurations for AWS S3

Parseable provides several advanced configuration options for AWS S3, which are especially useful for development or custom storage setups:

  • Allow HTTP: Permits plain HTTP connections in addition to HTTPS. This is useful in development or testing environments where SSL is not configured, or for local testing against S3-compatible services such as MinIO that may run over HTTP.

  • Connect Timeout: Parseable sets a connect timeout of 5 seconds. If a connection from Parseable to your S3 bucket is not established within 5 seconds, the operation times out and returns an error. This keeps the server from hanging indefinitely while trying to connect to the bucket.

  • Timeout: Parseable limits any single S3 operation to 300 seconds, after which the operation times out and returns an error. This limit covers the whole request: connection establishment, data transfer and response handling.

  • Allow Invalid Certificates: Bypasses SSL/TLS certificate validation. It is useful in development environments, testing scenarios, or when connecting to servers with self-signed or otherwise untrusted certificates. Set the environment variable P_S3_TLS_SKIP_VERIFY to true to enable this setting.

  • SSE-C Encryption Key: SSE-C (server-side encryption with customer-provided keys) lets you supply your own encryption key instead of letting AWS manage the key. The key must be 256 bits long, for AES-256 encryption.

To use SSE-C, set the environment variable P_S3_SSEC_ENCRYPTION_KEY before starting the server. The value must have the format SSE-C:AES256:<base64_encryption_key>. Note that SSE-C requires HTTPS; Amazon S3 or a compatible service may reject requests made over HTTP when SSE-C is in use.

  • Send Checksum Header: Enables the SHA256 checksum algorithm for object integrity checks during upload. Set the optional environment variable P_S3_CHECKSUM to true to enable it; the default is false.

More details on object integrity are available at https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html

  • Virtual Hosted Style Access: Controlled by the optional environment variable P_S3_PATH_STYLE, which defaults to true (path-style requests). If it is set to false, virtual-hosted-style requests are used, in which case the endpoint in P_S3_URL should include the bucket name.

  • Retry Config: Parseable applies the following retry configuration to its AWS S3 connection:

    • Max Retries = 5: the maximum number of times a request is retried.
    • Retry Timeout = 120 seconds: the maximum time from the initial request after which no further retries are attempted and the server returns an error. This also bounds how long a request's credentials must remain valid.
    • Backoff Config:
      • Initial Backoff = 100 ms: the delay before the first retry is attempted.
      • Max Backoff = 15 seconds: the maximum delay between two consecutive retries.
      • Base = 2: the multiplier used to derive each next backoff duration, so the delay nominally doubles from the initial backoff (100 ms, 200 ms, 400 ms and so on) up to the max backoff.
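
Two of the settings in the list above, the SSE-C key format and the path-style versus virtual-hosted-style distinction, are easier to see with concrete values. The Python sketch below is illustrative only: the function names, bucket name and object key are hypothetical, and the URL helper merely shows where the bucket name sits in each style, assuming (as stated above) that P_S3_URL already contains the bucket name in virtual-hosted style.

```python
import base64
import secrets


def make_ssec_value() -> str:
    """Generate a random 256-bit key in the SSE-C:AES256:<base64> format."""
    key = secrets.token_bytes(32)  # 32 bytes = 256 bits, as AES-256 requires
    return "SSE-C:AES256:" + base64.b64encode(key).decode("ascii")


def object_url(endpoint: str, bucket: str, key: str, path_style: bool = True) -> str:
    """Show where the bucket name appears in each request style.

    Path style: the bucket is the first path segment after the endpoint.
    Virtual-hosted style: the endpoint is assumed to already include the
    bucket name, so only the object key is appended.
    """
    base = endpoint.rstrip("/")
    if path_style:
        return f"{base}/{bucket}/{key}"
    return f"{base}/{key}"


# Path style (P_S3_PATH_STYLE left at its default of true)
print(object_url("https://s3.us-east-2.amazonaws.com", "my-logs", "a.parquet"))
# prints https://s3.us-east-2.amazonaws.com/my-logs/a.parquet

# Virtual-hosted style (P_S3_PATH_STYLE=false, bucket name inside the endpoint)
print(object_url("https://my-logs.s3.us-east-2.amazonaws.com", "my-logs",
                 "a.parquet", path_style=False))
# prints https://my-logs.s3.us-east-2.amazonaws.com/a.parquet
```

The string returned by make_ssec_value() is what would go into P_S3_SSEC_ENCRYPTION_KEY. Keep a copy of the key somewhere safe, because with SSE-C the storage service does not retain it and objects cannot be read back without it.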

Azure Blob Storage Configuration for Parseable

Parseable also supports multiple authentication options and key configuration parameters for Azure Blob Storage.

Mandatory Environment Variables for Azure Blob Storage

To establish a connection to Azure Blob Storage, you need to set the following mandatory environment variables:

  • P_AZR_URL: The endpoint for Azure Blob Storage, accessible in the Azure portal under Storage Account > Settings > Endpoints > Primary Endpoint > Blob Service.
  • P_AZR_ACCOUNT: The Azure Storage Account name as specified in the Azure portal.
  • P_AZR_CONTAINER: The container name where log data will be stored, as set in your Azure Storage Account.

Authentication Options for Azure Blob Storage

Azure Blob Storage offers both access key-based and Azure AD-based authentication:

  • Storage Account Access Key: Set the environment variable P_AZR_ACCESS_KEY for access key-based authentication. The key is available under the Security + Networking section of your storage account in the Azure portal.

  • Client ID, Client Secret, and Tenant ID: For applications registered in Azure Active Directory, set the environment variables P_AZR_CLIENT_ID, P_AZR_CLIENT_SECRET, and P_AZR_TENANT_ID to enable client-secret authorization. The Client ID (Application ID) is generated when you create an application in Azure AD. The Client Secret, also called the application password, is a secret string the application uses to prove its identity when requesting a token; it can be added from the Manage -> Certificates & Secrets page of the registered app in Azure AD. The Tenant ID is the unique identifier of an Azure AD instance, associated with an organization's account; it can be retrieved from the Manage -> Properties -> Tenant ID section in Azure AD.
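
Client-secret authorization needs all three of its variables, so a partially filled set is an easy mistake to make. The following Python sketch reports which Azure authentication inputs are present and flags an incomplete client-secret set. The function name and the wording of the findings are hypothetical, and the sketch does not describe how Parseable itself chooses between the two methods.

```python
from typing import List, Mapping

# Names taken from the text above; the helper itself is hypothetical.
CLIENT_SECRET_VARS = ("P_AZR_CLIENT_ID", "P_AZR_CLIENT_SECRET", "P_AZR_TENANT_ID")


def azure_auth_findings(env: Mapping[str, str]) -> List[str]:
    """Report which Azure authentication inputs are present in env."""
    findings = []
    if env.get("P_AZR_ACCESS_KEY"):
        findings.append("access key set")
    present = [name for name in CLIENT_SECRET_VARS if env.get(name)]
    if len(present) == len(CLIENT_SECRET_VARS):
        findings.append("client-secret credentials complete")
    elif present:
        # Some, but not all, of the three values were provided.
        missing = [name for name in CLIENT_SECRET_VARS if name not in present]
        findings.append("client-secret credentials incomplete, missing: " + ", ".join(missing))
    return findings


print(azure_auth_findings({"P_AZR_CLIENT_ID": "app-id", "P_AZR_TENANT_ID": "tenant-id"}))
# prints ['client-secret credentials incomplete, missing: P_AZR_CLIENT_SECRET']
```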

Advanced Configuration for Azure Blob Storage

Parseable provides several advanced configuration options for Azure Blob Storage, which are especially useful for development or custom storage setups:

  • Allow HTTP: Permits plain HTTP connections in addition to HTTPS. This is useful in development or testing environments where SSL is not configured, or for local testing that may run over HTTP.

  • Connect Timeout: Parseable sets a connect timeout of 5 seconds. If a connection from Parseable to Azure Blob Storage is not established within 5 seconds, the operation times out and returns an error. This keeps the server from hanging indefinitely while trying to connect to Azure Blob Storage.

  • Timeout: Parseable limits any single Blob Storage operation to 300 seconds, after which the operation times out and returns an error. This limit covers the whole request: connection establishment, data transfer and response handling.

  • Retry Config: Parseable applies the following retry configuration to its Azure Blob Storage connection:

    • Max Retries = 5: the maximum number of times a request is retried.
    • Retry Timeout = 120 seconds: the maximum time from the initial request after which no further retries are attempted and the server returns an error. This also bounds how long a request's credentials must remain valid.
    • Backoff Config:
      • Initial Backoff = 100 ms: the delay before the first retry is attempted.
      • Max Backoff = 15 seconds: the maximum delay between two consecutive retries.
      • Base = 2: the multiplier used to derive each next backoff duration, so the delay nominally doubles from the initial backoff (100 ms, 200 ms, 400 ms and so on) up to the max backoff.
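
To get a feel for how the retry settings interact (they are the same for AWS S3 and Azure Blob Storage), the Python sketch below computes a nominal backoff schedule. It assumes plain exponential growth with no random jitter, which is a simplification; the actual delays Parseable uses may differ, and the function name is hypothetical.

```python
from typing import List

# Values from the retry configuration above, in milliseconds.
MAX_RETRIES = 5
INITIAL_BACKOFF_MS = 100
MAX_BACKOFF_MS = 15_000
BASE = 2


def nominal_backoffs(retries: int = MAX_RETRIES) -> List[int]:
    """Delay before each retry, assuming pure exponential growth (no jitter)."""
    delays = []
    delay = INITIAL_BACKOFF_MS
    for _ in range(retries):
        delays.append(min(delay, MAX_BACKOFF_MS))  # never exceed the max backoff
        delay *= BASE
    return delays


schedule = nominal_backoffs()
print(schedule)       # prints [100, 200, 400, 800, 1600]
print(sum(schedule))  # prints 3100
```

Under this simplified model the five retries add about 3.1 seconds of waiting in total, well below the 120-second retry timeout, and the 15-second cap would only come into play if more retries were allowed.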

Conclusion

This guide covers the detailed configurations and options Parseable provides for connecting to AWS S3 and Azure Blob Storage. From authentication and endpoint settings to retry and backoff configurations, Parseable allows seamless integration with cloud providers, making it an optimal choice for scalable and durable log data management.

For more details on specific environment variables, refer to the Parseable documentation on AWS S3 Configuration and Azure Blob Storage Configuration.
