Integrating Amazon Simple Storage Service (Amazon S3) with Megaladata

Megaladata

Pairing Amazon S3's industry-leading object storage with Megaladata creates a highly efficient data management solution. This article explores three effective methods to integrate these powerful platforms: mounting a file system, using the AWS Command Line Interface (CLI), and connecting via REST API. We'll compare each approach to help you choose the best fit for your specific needs and infrastructure.

Amazon S3

Amazon Simple Storage Service (Amazon S3) is an object storage service known for its industry-leading scalability, data availability, security, and performance. Millions of users across various industries choose it to store, manage, and protect any amount of data for a range of use cases, such as data lakes, cloud-native applications, and mobile apps.

The S3 API has become an industry standard, leading other major cloud providers to offer compatible solutions. Some large providers, like IBM Cloud and Oracle Cloud, offer highly compatible APIs. Others use specific features, such as the interoperability mode in Google Cloud Storage, to enable S3 compatibility. For services like Microsoft Azure Blob Storage, connectivity can be achieved using third-party gateways.

Cost-effective storage classes and robust management functions help users optimize expenses, organize data, and customize access with the flexibility needed to match the company's needs and legal requirements.

Among the many similar services, Amazon S3 stands out in popularity. Some of our customers have found it very efficient to use Megaladata and Amazon S3 together.

There are several methods to integrate Megaladata with Amazon S3. In this article, we'll highlight three of them that we find most efficient and user-friendly. You can choose the most convenient method according to your needs and infrastructure.

Mounting a file system

One way to integrate Amazon S3 with Megaladata is to mount a file system. Mounting is the process by which the operating system attaches the files and directories of a storage device to the file system of the user's local machine. The storage is attached to an empty directory (the mount point), and the user can then access its data through the system's file browser.

This lets you work with data files in Megaladata as though they were stored on your local machine.

This can be achieved by mounting a network resource, using either s3fs or the Mountpoint file client.

The benefit of this type of integration is constant access to the S3 storage, because the client keeps an active connection to the server. But this very factor is also the method's main disadvantage: continuous data synchronization requires a lot of requests, which increases the load on the user's local server.
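Once the bucket is mounted, its objects can be addressed as ordinary paths. The following is a minimal Python sketch of that idea, assuming a hypothetical mount point /mnt/s3-bucket that was mounted beforehand with s3fs or Mountpoint; the function name and the .csv filter are illustrative only.

```python
from pathlib import Path
from typing import List


def list_data_files(mount_point: str, suffix: str = ".csv") -> List[str]:
    """Return sorted relative paths of files under mount_point that end with suffix."""
    root = Path(mount_point)
    if not root.is_dir():
        raise FileNotFoundError(f"{mount_point} is not an existing directory")
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob(f"*{suffix}")
        if p.is_file()
    )


if __name__ == "__main__":
    # /mnt/s3-bucket is a hypothetical mount point; adjust it to your setup.
    try:
        for name in list_data_files("/mnt/s3-bucket"):
            print(name)
    except FileNotFoundError as err:
        print(err)
```

Walking a mounted directory like this typically translates into list requests against S3 behind the scenes, which is the source of the extra load mentioned above.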

AWS Command Line Interface

Another integration method is to use the AWS Command Line Interface, a console utility for managing Amazon Web Services (AWS) resources. It allows you to schedule file transfers from the command line.

You can use Megaladata's Program Execution node to schedule sending commands to the AWS Command Line Interface. This method is available only for Windows.

If you work on Linux, you will need an additional utility, such as shell2http, which accepts REST requests from Megaladata and runs the corresponding AWS CLI commands. You can use any other utility with similar functionality.

Example 1: To create an S3 bucket, run the following command:

aws s3 mb s3://my-new-bucket

This creates a new S3 bucket named my-new-bucket. Note that the bucket name must be globally unique across all AWS accounts, not only within your own account.

Example 2: To list the objects in a bucket, run:

aws s3 ls s3://my-bucket/

Using the AWS Command Line Interface, you can perform any necessary operation on Amazon S3. This integration method lets you set up a schedule for sending requests, which eliminates the disadvantage of the previous method (the large number of requests). However, because data is only transferred when a command runs, it might not suit cases that require constant access.
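A scheduled transfer can also be wrapped in a small script around the same CLI. Below is a minimal Python sketch, assuming the AWS CLI is installed and configured; the file name, bucket name, key, and function names are placeholders, and how such a script is invoked from Megaladata is not covered here.

```python
import shutil
import subprocess
from typing import List


def build_s3_copy_command(local_path: str, bucket: str, key: str) -> List[str]:
    """Build the argument list for `aws s3 cp <local_path> s3://<bucket>/<key>`."""
    return ["aws", "s3", "cp", local_path, f"s3://{bucket}/{key}"]


def run_command(argv: List[str]) -> int:
    """Run the command and return its exit code, or 127 if the program is not on PATH."""
    if shutil.which(argv[0]) is None:
        return 127
    return subprocess.run(argv, check=False).returncode


if __name__ == "__main__":
    # report.csv and my-bucket are placeholder names for illustration.
    cmd = build_s3_copy_command("report.csv", "my-bucket", "exports/report.csv")
    print(" ".join(cmd))
    # Call run_command(cmd) to actually perform the upload.
```

Passing the command as a list of arguments, rather than a single shell string, avoids quoting problems with file names that contain spaces.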

REST API

One more method is integration through REST API. By sending REST requests, Megaladata connects to the Amazon S3 storage. This flexible mechanism allows the user to upload and download files at any time via the Megaladata interface.

To employ this integration method, you need to perform the following:

  • Create a REST service in Megaladata and configure it to connect to Amazon S3. The connection must be authenticated with AWS access keys (Access Key ID and Secret Access Key) to keep your data secure.

  • Within a REST Request node, create the required request to Amazon S3.

Executing this request will result in two tables: one with service responses and the other with request execution results, error descriptions, exit codes, and HTTP statuses.
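For background, requests to the S3 REST API are normally authenticated by signing them with the access keys using AWS Signature Version 4. How Megaladata's REST service applies the keys is not described here, so the following is only a sketch of the signing scheme itself, written in Python with the standard library. The function names, host, and key values are illustrative assumptions, and the sketch signs only the three headers shown.

```python
import datetime
import hashlib
import hmac
from typing import Dict
from urllib.parse import quote


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def signing_key(secret_key: str, date_stamp: str, region: str, service: str = "s3") -> bytes:
    """Derive the SigV4 signing key: date -> region -> service -> 'aws4_request'."""
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")


def sign_request(method: str, host: str, path: str, query: Dict[str, str],
                 payload: bytes, access_key: str, secret_key: str,
                 region: str, now: datetime.datetime) -> Dict[str, str]:
    """Return the headers to send with the request (now must be in UTC)."""
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    payload_hash = hashlib.sha256(payload).hexdigest()

    # Canonical query string: parameters sorted by name, strictly URI-encoded.
    canonical_query = "&".join(
        f"{quote(k, safe='-_.~')}={quote(v, safe='-_.~')}"
        for k, v in sorted(query.items())
    )
    headers = {"host": host, "x-amz-content-sha256": payload_hash, "x-amz-date": amz_date}
    signed_headers = ";".join(sorted(headers))
    canonical_headers = "".join(f"{k}:{headers[k]}\n" for k in sorted(headers))
    canonical_request = "\n".join([
        method, quote(path, safe="/-_.~"), canonical_query,
        canonical_headers, signed_headers, payload_hash,
    ])

    scope = f"{date_stamp}/{region}/s3/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    key = signing_key(secret_key, date_stamp, region)
    signature = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
    return {
        "Authorization": (f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
                          f"SignedHeaders={signed_headers}, Signature={signature}"),
        "x-amz-date": amz_date,
        "x-amz-content-sha256": payload_hash,
    }


if __name__ == "__main__":
    # Placeholder credentials and bucket host; never hard-code real keys.
    hdrs = sign_request("GET", "my-bucket.s3.us-east-1.amazonaws.com", "/",
                        {"list-type": "2"}, b"", "AKIDEXAMPLE", "example-secret",
                        "us-east-1", datetime.datetime.now(datetime.timezone.utc))
    for name, value in hdrs.items():
        print(f"{name}: {value}")
```

The secret key never leaves the client: only the derived signature is sent, and it is valid only for the signed request and date.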

Three operations are mainly used: ListObjectsV2, GetObject, and PutObject.

  • ListObjectsV2 returns some or all (up to 1,000) of the objects in a bucket with each request. You can use the request parameters as selection criteria to return a subset of the objects in a bucket. A 200 OK response can contain valid or invalid XML, so parse the response body and handle malformed content rather than relying on the status code alone. You can analyze the response with Megaladata's XML Extraction component.

  • GetObject retrieves an object from Amazon S3. In this request, specify the full key name for the object.

  • PutObject adds an object to a bucket.
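To illustrate the ListObjectsV2 request parameters and the shape of its XML response, here is a minimal Python sketch using only the standard library. It is not a substitute for Megaladata's XML Extraction component; the function names are illustrative, and the sketch reads only the Key, Size, IsTruncated, and NextContinuationToken elements.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

# XML namespace used by S3 list responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def list_objects_query(prefix: str = "", max_keys: int = 1000,
                       continuation_token: Optional[str] = None) -> str:
    """Build the query string for a ListObjectsV2 request (list-type=2)."""
    params = {"list-type": "2", "max-keys": str(max_keys)}
    if prefix:
        params["prefix"] = prefix
    if continuation_token:
        params["continuation-token"] = continuation_token
    return urlencode(params)


def parse_list_objects(body: str) -> Tuple[List[Tuple[str, int]], Optional[str]]:
    """Return ([(key, size), ...], next_token); raise ValueError on malformed XML."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError as err:
        # A 200 OK status does not guarantee a well-formed body.
        raise ValueError(f"malformed ListObjectsV2 response: {err}") from err
    objects = [
        (c.findtext("s3:Key", default="", namespaces=S3_NS),
         int(c.findtext("s3:Size", default="0", namespaces=S3_NS)))
        for c in root.findall("s3:Contents", S3_NS)
    ]
    truncated = root.findtext("s3:IsTruncated", default="false", namespaces=S3_NS) == "true"
    token = root.findtext("s3:NextContinuationToken", namespaces=S3_NS) if truncated else None
    return objects, token


if __name__ == "__main__":
    print(list_objects_query(prefix="logs/", max_keys=2))
```

When the returned token is not None, the listing was truncated; passing it back as continuation_token in the next request retrieves the following page of up to 1,000 objects.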

Also, Megaladata offers a built-in job scheduler, which can be used to schedule the execution of REST requests.

Amazon S3 and Megaladata integration: Comparison table

Criteria | Mounting a file system | AWS CLI | REST API
Configuration complexity | Medium (s3fs/Mountpoint installation required) | High (requires scripts; on Linux, also shell2http) | Medium (requires API skills)
Performance | Low (high load) | Medium | High
Ease of use | High | Medium | Low
Flexibility | Limited (operating objects as files) | High | Maximum (full control)
Data accessibility | Constant (as on a local disk) | On request (scheduled) | On request (scheduled)
Operating system | Linux/Windows | Windows/Linux (workaround) | Linux/Windows
Security | Depends on mounting settings | High (IAM roles) | High (HTTPS + access keys)

When choosing an integration method between Amazon S3 and Megaladata, consider the specifics of your tasks. If you need constant file access, similar to accessing a local disk, mounting a file system is the best fit. It is easier to use but may create extra server load.

If you need to schedule data processing, we recommend using the AWS CLI. This approach gives you flexible, command-based control. However, it requires additional configuration, especially in Linux environments.

If your priority is maximum control over operations, the REST API is the best option: it combines efficient use of resources with selective data management. However, it requires some experience with the S3 API.

Each integration method has its advantages and limitations. We recommend basing your choice on the technical requirements of your task and the specifics of your infrastructure.


Written by Megaladata