S3 Presigned URL for Single and Multipart Upload
September 23, 2023 | Author: Hafiz, CTO at Enygma

Amazon Simple Storage Service (S3) has emerged as a cornerstone solution for scalable and reliable cloud storage, catering to a multitude of use cases.

A common approach is to use the AWS CLI or a backend script to upload files to the cloud. Sometimes, however, we just want to upload files from the frontend directly to an S3 bucket, without passing through the backend first. Routing through the backend would only introduce an unnecessary layer of file transfer.

Storing the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in the frontend is a really bad idea, as they would be accessible to the public. That is why S3 has a feature called presigned URLs to overcome this problem.

The way it works is:

  1. The frontend sends the file's metadata to the backend
  2. The backend uses the metadata to generate a presigned URL and returns it to the frontend
  3. The frontend receives the presigned URL and uses it to upload the file directly to S3

When the file is uploaded, S3 checks that the request matches the metadata the URL was signed with. If it does not match, the upload is rejected with a 403 Forbidden error.

For this blog, I will demonstrate using Python for the backend and JavaScript (specifically the axios library) for the frontend. I will also demonstrate how to do both single upload and multipart upload (which is important for uploading large files).

1. Create and Configure S3 Bucket

  1. Create an S3 Bucket
  2. Under the Permissions tab, update the bucket policy
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": [
                    "s3:PutObject",
                    "s3:PutObjectAcl",
                    "s3:GetObject"
                ],
                "Resource": "arn:aws:s3:::bucket-name/*"
            }
        ]
    }
    

    This will allow files to be uploaded to and downloaded from the bucket.
  3. Under the Permissions tab, update the Cross-origin resource sharing (CORS) configuration
    [
        {
            "AllowedHeaders": [
                "*"
            ],
            "AllowedMethods": [
                "GET",
                "PUT",
                "POST"
            ],
            "AllowedOrigins": [
                "*"
            ],
            "ExposeHeaders": [
                "ETag"
            ]
        }
    ]
    

    This will allow us to upload files directly to S3 via a presigned URL. The most important parts are having PUT in AllowedMethods and ETag in ExposeHeaders.
  4. Make sure your IAM User/Role has the necessary permissions on the bucket. To get started, you can attach the AmazonS3FullAccess policy to your IAM User/Role (not a recommended approach); a tighter policy sketch is shown below.
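    A minimal sketch of a scoped-down identity policy for this flow could look like the following (bucket-name is a placeholder). s3:PutObject covers both single and multipart uploads, s3:GetObject allows downloads, and s3:AbortMultipartUpload lets you clean up unfinished multipart uploads.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:AbortMultipartUpload"
                ],
                "Resource": "arn:aws:s3:::bucket-name/*"
            }
        ]
    }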

Now that the bucket and user have all the necessary permissions, we can dive into the implementation.

2. Backend Endpoint

I created a file called aws_utils.py that will handle all the operations with S3.

Generate presigned URL

import os

import boto3

# boto3 picks up credentials and region from the environment / IAM role
s3 = boto3.client('s3')
url_expired_limit = 86400 # 1 day in seconds, used for multipart part URLs

def get_s3_presigned_url(file_name, file_type, upload_id=None, part_number=None):
    try:
        resource_url = f"https://{os.getenv('S3_BUCKET')}.s3.amazonaws.com/{file_name}"
        # for single upload
        if upload_id is None or part_number is None:
            presigned_url = s3.generate_presigned_url(
                'put_object',
                Params={
                    'Bucket': os.getenv('S3_BUCKET'),
                    'Key': file_name,
                    'ContentType': file_type,
                },
                ExpiresIn=3600 # The URL will expire in 1 hour. 
            )
        # for multipart upload
        else:
            presigned_url = s3.generate_presigned_url(
                ClientMethod='upload_part',
                Params={
                    'Bucket': os.getenv('S3_BUCKET'),
                    'Key': file_name,
                    'UploadId': upload_id,
                    'PartNumber': part_number,
                },
                ExpiresIn=url_expired_limit # The URL will expire in 1 day. Large files take a long time to upload
            )
        return presigned_url, resource_url
    except Exception:
        return None, None

The code above generates the presigned URL. It's pretty direct: you just need to supply Bucket (the bucket name), Key (the file name/path in the bucket) and ContentType (the file type, e.g. image/png, video/mp4), and it will return the presigned URL.

And that is basically it for single upload!
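
As a quick illustration, a hypothetical caller (for example, the endpoint we wrap later) would use the helper roughly like this; the key and content type are just example values:

# Hypothetical usage for a single upload
presigned_url, resource_url = get_s3_presigned_url('images/example.png', 'image/png')
# presigned_url -> returned to the frontend, which PUTs the file to it
# resource_url  -> where the object will be accessible once the upload succeeds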

The method above also caters for multipart upload. In that case, you do not need to supply ContentType; instead, you supply UploadId and PartNumber. We will see how these are used in the next part of the code.

Initiate and complete multipart upload

def create_multipart_upload(file_name):
    try:
        response = s3.create_multipart_upload(
            Bucket=os.getenv('S3_BUCKET'), 
            Key=file_name
        )
        upload_id = response['UploadId']
        return upload_id
    except Exception as e:
        return None
    
def complete_multipart_upload(file_name, upload_id, parts):
    try:
        response = s3.complete_multipart_upload(
            Bucket=os.getenv('S3_BUCKET'),
            Key=file_name,
            MultipartUpload={'Parts': parts},
            UploadId=upload_id
        )
        return True
    except Exception as e:
        return None

Before diving into the code: the way multipart upload works is that we first initiate a multipart upload. This tells S3 that we want to upload the file in parts instead of as a single upload. AWS then returns an UploadId, which is used to associate each uploaded part with a single file object. This is done in the create_multipart_upload method above.

Next, we use the get_s3_presigned_url method to generate a presigned URL for each part. Notice that, other than UploadId, we also need to supply PartNumber. This keeps track of the sequence of the parts. We will look more into that in the frontend section.

After the upload is complete, we call complete_multipart_upload to tell S3 that the upload is finished. We just need to supply the Key, UploadId and MultipartUpload (which contains each part's ETag; we will see this in the frontend section). Once this call succeeds, the upload is considered complete and the object can be accessed in the bucket.

I know it sounds complicated, but it basically consists of 3 simple steps (a rough sketch of the backend calls follows the list):

  1. Initiate multipart upload
  2. Upload file by chunks
  3. Complete the multipart upload
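
To make the flow concrete, here is a rough, hypothetical sketch of how the backend helpers above fit together; the file name and ETag are example values, and in practice the parts list is collected by the frontend after it uploads each chunk:

# Hypothetical end-to-end use of the backend helpers
file_name = 'videos/big-video.mp4'

# Step 1: initiate the multipart upload and get an UploadId
upload_id = create_multipart_upload(file_name)

# Step 2: generate a presigned URL per part (ContentType is not needed here)
presigned_url, resource_url = get_s3_presigned_url(file_name, None, upload_id=upload_id, part_number=1)

# The frontend uploads each chunk to its URL and collects the ETags, e.g.:
parts = [{'ETag': '"9b2cf535f27731c974343645a3985328"', 'PartNumber': 1}]

# Step 3: complete the multipart upload once every part is in
complete_multipart_upload(file_name, upload_id, parts)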

Wrap as endpoint

You can wrap the code supplied above in your backend framework and expose it as endpoints so they can be called by our frontend.

Let's assume we have already created 3 endpoints:

  1. /get-s3-presigned-url/: To generate presigned URL
  2. /initiate-multipart-upload/: To initiate multipart upload
  3. /complete-multipart-upload/: To complete the multipart upload
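
These endpoints are framework-agnostic. As an illustration only, here is a minimal sketch of what they could look like with Flask, assuming the helpers above live in aws_utils.py; the route names simply match the ones assumed in this post:

from flask import Flask, request, jsonify

from aws_utils import get_s3_presigned_url, create_multipart_upload, complete_multipart_upload

app = Flask(__name__)

@app.route('/get-s3-presigned-url/', methods=['POST'])
def presigned_url_endpoint():
    data = request.get_json()
    presigned_url, resource_url = get_s3_presigned_url(
        data['file_name'],
        data.get('file_type'),
        upload_id=data.get('upload_id'),
        part_number=data.get('part_number')
    )
    return jsonify({'presigned_url': presigned_url, 'resource_url': resource_url})

@app.route('/initiate-multipart-upload/', methods=['POST'])
def initiate_multipart_upload_endpoint():
    data = request.get_json()
    return jsonify({'upload_id': create_multipart_upload(data['file_name'])})

@app.route('/complete-multipart-upload/', methods=['POST'])
def complete_multipart_upload_endpoint():
    data = request.get_json()
    completed = complete_multipart_upload(data['file_name'], data['upload_id'], data['parts'])
    return jsonify({'success': bool(completed)})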

3. Frontend Code

For single upload

const file = ... // your file object, e.g. from an <input type="file">
const fileName = 'example.png'
const fileType = 'image/png'

// send the file metadata (file_name and file_type) to the backend
const response = await axios.post('/get-s3-presigned-url/', {
                        file_name: fileName,
                        file_type: fileType
                    })

// use the presigned url to upload content
await axios.put(response.data.presigned_url, file, {
    headers: {
        'Content-Type': fileType
    }
})

That's basically it! Request a presigned URL and upload the file.

Now, we will take a look at multipart upload. The code is a bit long, so I will separate it into sections.

Step 1: Initiate multipart upload

// only supply the filename (file path in S3)
const initiateMultipartUploadResponse = await axios.post('/initiate-multipart-upload/', {
                                                file_name: fileName
                                            })

Step 2: Split file into chunks, request presigned url for each chunks and upload separately

let parts = []
const chunkSize = 2 * 1024 * 1024 * 1024 // 2GB in bytes
const totalChunks = Math.ceil(file.size / chunkSize) // calc total number of chunks for the file
const chunkIndexList = Array.apply(null, Array(totalChunks)).map((x, i) => { return i+1 }) // [1, 2, 3, ...]

let resourceUrl = '' // will be overridden later

// upload every chunk of the file
// wrapped in Promise.all so that we only call the complete-upload endpoint after all uploads are done
await Promise.all(chunkIndexList.map(async (partNumber) => {
    // request a presigned url for the specific chunk
    const presignedUrlResponse = await axios.post('/get-s3-presigned-url/', {
        file_name: fileName,
        upload_id: initiateMultipartUploadResponse.data.upload_id,
        part_number: partNumber
    })
    resourceUrl = presignedUrlResponse.data.resource_url

    // slice the chunk from the file
    const startByte = (partNumber-1) * chunkSize
    const endByte = Math.min(startByte + chunkSize, file.size)
    const chunk = file.slice(startByte, endByte)

    // start uploading the chunk
    const uploadedResponse = await axios.put(presignedUrlResponse.data.presigned_url, chunk,
        {
            headers: {
                'Content-Type': '' // this is important
            }
        }
    )

    // keep the ETag and part number; these are needed to complete the upload
    parts.push({ETag: uploadedResponse.headers.etag, PartNumber: partNumber})
}))

Now that all the chunks are uploaded, we proceed to the last step.

Step 3: Complete the multipart upload

// after uploading all chunks, we sort the parts by part number
parts.sort((a, b) => a.PartNumber - b.PartNumber)
// pass them to the backend to complete the upload
const completeMultipartResponse = await axios.post('/complete-multipart-upload/', {
    file_name: fileName,
    upload_id: initiateMultipartUploadResponse.data.upload_id,
    parts: parts
})

The upload is done!

Conclusion

This approach is a much better way to upload files to S3 than the alternative where files are uploaded to the backend first and the backend then uploads them to S3, which introduces an unnecessary layer in the upload operation. I hope many can benefit from this blog. I spent hours and days figuring some parts of this out (lol).