Amazon Simple Storage Service (S3) has emerged as a cornerstone solution for scalable and reliable cloud storage, catering to a multitude of use cases.
The common approach is to use the AWS CLI or some backend script to upload files to the cloud. Sometimes, however, we want to upload files from the frontend directly to an S3 bucket, without passing them through the backend first, since that would just introduce an unnecessary layer of file transfer.
Storing the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in the frontend is a really bad idea, as they would be accessible to the public. That is why S3 has a feature called presigned URLs to overcome this problem.
The way it works is: the backend, which holds the credentials, generates a presigned URL for a specific key and content type, and the frontend uses that URL to upload the file directly to S3. Upon uploading, S3 will check whether the metadata of the uploaded file matches the metadata it received when the URL was created. An unmatched file will receive a 403 Forbidden error.
For this blog, I will demonstrate using Python for the backend and JavaScript, specifically the axios library, for the frontend. I will also demonstrate how to do a single upload and a multipart upload (which is important for uploading large files).
In the S3 bucket's Permissions tab, update the bucket policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": [
                "s3:PutObject",
                "s3:PutObjectAcl",
                "s3:GetObject"
            ],
            "Resource": "arn:aws:s3:::bucket-name/*"
        }
    ]
}
In the same Permissions tab, update the Cross-origin resource sharing (CORS) configuration:
[
    {
        "AllowedHeaders": [
            "*"
        ],
        "AllowedMethods": [
            "GET",
            "PUT",
            "POST"
        ],
        "AllowedOrigins": [
            "*"
        ],
        "ExposeHeaders": [
            "ETag"
        ]
    }
]
Make sure PUT is included in AllowedMethods and ETag in ExposeHeaders: the frontend uploads each file (or part) with a PUT request, and it needs to read the ETag response header to complete a multipart upload.
Make sure your IAM User/Role has the necessary permissions on the bucket. To get started, you can attach the AmazonS3FullAccess policy to your IAM User/Role (not a recommended approach).
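If you want something tighter than AmazonS3FullAccess, a policy scoped to just this bucket along these lines should cover the operations used in this blog. Treat it as a sketch, not a vetted policy; the bucket name is a placeholder:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:AbortMultipartUpload",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": "arn:aws:s3:::bucket-name/*"
        }
    ]
}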
Now that the bucket and user have all the necessary permissions, we can dive into the implementation.
I created a file called aws_utils.py that will handle all the operations with S3.
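The snippets below assume a module-level boto3 client plus a couple of names the functions reference (the S3_BUCKET environment variable comes from the code below; the region variable and the url_expired_limit value are my assumptions). A minimal setup sketch:
import os

import boto3

# boto3 resolves credentials on its own (environment, shared config, or instance role),
# so no keys ever reach the frontend.
s3 = boto3.client('s3', region_name=os.getenv('AWS_REGION'))

# Presigned URLs for multipart parts should live longer than the default hour.
url_expired_limit = 86400  # 1 day, in seconds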
Generate presigned URL
def get_s3_presigned_url(file_name, file_type, upload_id=None, part_number=None):
    try:
        resource_url = f"https://{os.getenv('S3_BUCKET')}.s3.amazonaws.com/{file_name}"
        # for single upload
        if upload_id is None or part_number is None:
            presigned_url = s3.generate_presigned_url(
                'put_object',
                Params={
                    'Bucket': os.getenv('S3_BUCKET'),
                    'Key': file_name,
                    'ContentType': file_type,
                },
                ExpiresIn=3600  # The URL will expire in 1 hour.
            )
        # for multipart upload
        else:
            presigned_url = s3.generate_presigned_url(
                ClientMethod='upload_part',
                Params={
                    'Bucket': os.getenv('S3_BUCKET'),
                    'Key': file_name,
                    'UploadId': upload_id,
                    'PartNumber': part_number,
                },
                ExpiresIn=url_expired_limit  # The URL will expire in 1 day. Large files take a long time to upload.
            )
        return presigned_url, resource_url
    except Exception:
        return None, None
The code above generates the presigned URL. It's pretty direct: you just need to supply Bucket (bucket name), Key (file name/path in the bucket) and ContentType (file type, e.g. image/png, video/mp4), and it will return the presigned URL.
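For illustration, calling it for a single upload could look like this (the file name and type are just example values):
presigned_url, resource_url = get_s3_presigned_url('example.png', 'image/png')
# presigned_url: where the frontend will PUT the file
# resource_url: where the object will live once the upload succeeds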
And that is basically it for single upload!
The method above also caters for multipart upload. For that case, you do not need to supply ContentType, but instead UploadId and PartNumber. We will see more about this in the next part of the code.
Initiate and complete multipart upload
def create_multipart_upload(file_name):
    try:
        response = s3.create_multipart_upload(
            Bucket=os.getenv('S3_BUCKET'),
            Key=file_name
        )
        upload_id = response['UploadId']
        return upload_id
    except Exception:
        return None


def complete_multipart_upload(file_name, upload_id, parts):
    try:
        s3.complete_multipart_upload(
            Bucket=os.getenv('S3_BUCKET'),
            Key=file_name,
            MultipartUpload={'Parts': parts},
            UploadId=upload_id
        )
        return True
    except Exception:
        return None
Before diving into the code: the way multipart upload works is that we first initiate a multipart upload, which tells S3 that we want to upload the file in parts instead of as a single upload. AWS then returns an UploadId, which is used to associate each uploaded part with a single file object. This is done in the create_multipart_upload method above.
Next, we use the get_s3_presigned_url method to generate a presigned URL for each part. Notice that, other than UploadId, we also need to supply the PartNumber. This keeps track of the sequence of the parts. We will look more into that in the frontend section.
After the upload is complete, we call complete_multipart_upload to tell S3 that the upload is finished. We just need to supply the Key, UploadId and MultipartUpload (which contains the ETag of every part; we will see this in the frontend section). Upon calling this, the upload is considered complete and the file can be accessed in the bucket.
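For reference, the parts list that complete_multipart_upload expects looks something like this (the ETag values here are made up):
parts = [
    {'ETag': '"9b2cf535f27731c974343645a3985328"', 'PartNumber': 1},
    {'ETag': '"6805f2cfc46c0f04559748bb039d69ae"', 'PartNumber': 2},
]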
I know it sounds complicated, but it basically consists of 3 simple steps:
1. Initiate the multipart upload and get back an UploadId.
2. Generate a presigned URL for each part and upload the parts.
3. Complete the multipart upload by sending back the parts' ETags.
Wrap as endpoint
You can wrap the code supplied above in your backend framework and create endpoints so they can be called by our frontend.
Let's assume we have already created 3 endpoints (a minimal sketch follows the list):
/get-s3-presigned-url/: To generate a presigned URL
/initiate-multipart-upload/: To initiate a multipart upload
/complete-multipart-upload/: To complete the multipart upload
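The framework is up to you; as a rough illustration, a Flask version of these endpoints could look like the sketch below (the endpoint paths match the list above; everything else, including the module layout, is an assumption):
from flask import Flask, jsonify, request

from aws_utils import (get_s3_presigned_url, create_multipart_upload,
                       complete_multipart_upload)

app = Flask(__name__)

@app.post('/get-s3-presigned-url/')
def presigned_url_endpoint():
    body = request.get_json()
    presigned_url, resource_url = get_s3_presigned_url(
        body['file_name'],
        body.get('file_type'),
        body.get('upload_id'),
        body.get('part_number'),
    )
    return jsonify({'presigned_url': presigned_url, 'resource_url': resource_url})

@app.post('/initiate-multipart-upload/')
def initiate_multipart_upload_endpoint():
    upload_id = create_multipart_upload(request.get_json()['file_name'])
    return jsonify({'upload_id': upload_id})

@app.post('/complete-multipart-upload/')
def complete_multipart_upload_endpoint():
    body = request.get_json()
    success = complete_multipart_upload(body['file_name'], body['upload_id'], body['parts'])
    return jsonify({'success': bool(success)})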
For single upload
const file = ... // your file object, e.g. from an <input type="file">
const fileName = 'example.png'
const fileType = 'image/png'
// send file metadata (file_name and file_type) to the backend
const response = await axios.post('/get-s3-presigned-url/', {
    file_name: fileName,
    file_type: fileType
})
// use the presigned url to upload the content
await axios.put(response.data.presigned_url, file, {
    headers: {
        'Content-Type': fileType
    }
})
That's basically it! Request a presigned URL and upload the file.
Now, we will take a look at multipart upload. The code is a bit long, so I will separate it into sections.
Step 1: Initiate multipart upload
// only supply the filename (file path in S3)
const initiateMultipartUploadResponse = await axios.post('/initiate-multipart-upload/', {
    file_name: fileName
})
Step 2: Split the file into chunks, request a presigned URL for each chunk and upload them separately
let parts = []
const chunkSize = 2 * 1024 * 1024 * 1024 // 2GB in bytes
const totalChunks = Math.ceil(file.size / chunkSize) // total number of chunks for the file
const chunkIndexList = Array.from({ length: totalChunks }, (_, i) => i + 1) // [1, 2, 3, ...]
let resourceUrl = '' // will be overridden later
// upload every chunk of the file
// wrapped in Promise.all so the complete-upload endpoint is only called after every part is done
await Promise.all(chunkIndexList.map(async (partNumber) => {
    // request a presigned url for the specific chunk
    const presignedUrlResponse = await axios.post('/get-s3-presigned-url/', {
        file_name: fileName,
        upload_id: initiateMultipartUploadResponse.data.upload_id,
        part_number: partNumber
    })
    resourceUrl = presignedUrlResponse.data.resource_url
    // slice the chunk from the file
    const startByte = (partNumber - 1) * chunkSize
    const endByte = Math.min(startByte + chunkSize, file.size)
    const chunk = file.slice(startByte, endByte)
    // start uploading the chunk
    try {
        const uploadResponse = await axios.put(presignedUrlResponse.data.presigned_url, chunk, {
            headers: {
                'Content-Type': '' // this is important: the part URL was signed without a content type
            }
        })
        // keep the ETag and part number. these will be used to complete the upload
        parts.push({ ETag: uploadResponse.headers.etag, PartNumber: partNumber })
    } catch (error) {
        console.log(error)
    }
}))
Now that each chunk is uploaded, we proceed to the last step.
Step 3: Complete the multipart upload
// after all chunks are uploaded, sort the parts by part number
parts.sort((a, b) => a.PartNumber - b.PartNumber)
// pass it to the backend to complete the upload
const completeMultipartResponse = await axios.post('/complete-multipart-upload/', {
    file_name: fileName,
    upload_id: initiateMultipartUploadResponse.data.upload_id,
    parts: parts
})
The upload is done!
This is, in my opinion, the best way to upload files to S3, compared to the approach where files are uploaded to the backend and the backend then uploads them to S3, which introduces an unnecessary layer in the upload operation. I hope many can benefit from this blog. I spent hours and days figuring some parts of this out (lol).