I have no expertise in this area; I am just dumping my decisions and thoughts. If you have better suggestions, please email me!
I think this is where most of the cloud providers come in to solve this problem. I wanted something file-agnostic that also allows me to do file-specific checks.
Checklist
- Needs to allow upload from API
File Uploads
Optimization, Validation, Post Processing
The uploaded file can be anything from an image to a video to a PDF. While we can have some checks in the frontend, we really ought to have these checks in the backend. There are levels to this: Which file types do you allow? Which file types can your backend process? If you accept CSV, did they send a CSV with headers? Did they send a file that satisfies all your file type checks but is maliciously crafted? Did they send a photo straight from their phone just because you set the max upload size to 30MB?
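To make the levels above a bit more concrete, here is a rough Python sketch of backend-side checks: a size limit, a look at the leading bytes instead of trusting the declared type, and a CSV header check. The names `validate_upload`, `sniff_type`, `csv_has_headers` and the small `MAGIC_PREFIXES` table are made up for illustration, and the list of formats is nowhere near complete.

```python
import csv
import io
from typing import List, Optional, Set

MAX_UPLOAD_BYTES = 30 * 1024 * 1024  # the 30MB limit mentioned above

# Leading "magic" bytes for a few formats; illustrative, not exhaustive.
MAGIC_PREFIXES = {
    "image/png": b"\x89PNG\r\n\x1a\n",
    "image/jpeg": b"\xff\xd8\xff",
    "application/pdf": b"%PDF-",
}


def sniff_type(data: bytes) -> Optional[str]:
    """Return a MIME type based on the leading bytes, or None if unrecognised."""
    for mime, prefix in MAGIC_PREFIXES.items():
        if data.startswith(prefix):
            return mime
    return None


def csv_has_headers(data: bytes, required: Set[str]) -> bool:
    """Check that the first CSV row contains every required column name."""
    try:
        text = data.decode("utf-8-sig")  # also tolerates a UTF-8 BOM
    except UnicodeDecodeError:
        return False
    first_row = next(csv.reader(io.StringIO(text)), [])
    return required <= {cell.strip() for cell in first_row}


def validate_upload(
    data: bytes, declared_type: str, csv_columns: Optional[Set[str]] = None
) -> List[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    if len(data) > MAX_UPLOAD_BYTES:
        problems.append("file too large")
    if declared_type == "text/csv":
        # CSV has no magic bytes, so check the header row instead.
        if not csv_has_headers(data, csv_columns or set()):
            problems.append("csv is missing required headers")
    elif sniff_type(data) != declared_type:
        problems.append("content does not match declared type")
    return problems
```

Note that this only covers the "is it what it claims to be" level. A maliciously crafted file can pass all of these checks, which is why the virus check is listed separately under validation.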
Processing
- Metadata extraction
- Metadata stripping
Optimization
Validation
- Virus check
On use of S3 based storage
This is one area where we need to think of S3 as a service rather than simply storage, i.e. S3-compatible storage (e.g. R2 and others) vs. AWS S3. For example, an S3-compatible store such as R2 may have no way to trigger on an upload the way S3 events do. So once the upload is done, we need to manually make a call to the backend so it can check for the file.
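A minimal sketch of that manual check, in Python. It assumes the backend holds a boto3-style client (anything exposing `head_object(Bucket=..., Key=...)`) and that the client told us the size it intended to upload. The function name `confirm_upload` and the returned dict shape are hypothetical.

```python
from typing import Any, Optional


def confirm_upload(
    client: Any, bucket: str, key: str, expected_size: int
) -> Optional[dict]:
    """Look the object up after the client reports the upload as finished.

    `client` is assumed to expose a boto3-style `head_object` method.
    Returns basic object info, or None if the object is missing or the
    stored size differs from what the client said it would upload.
    """
    try:
        head = client.head_object(Bucket=bucket, Key=key)
    except Exception:
        # With boto3 a missing key raises botocore.exceptions.ClientError;
        # this sketch simply treats any failure as "not uploaded".
        return None
    if head.get("ContentLength") != expected_size:
        return None
    return {"key": key, "size": head["ContentLength"], "etag": head.get("ETag")}
```

The backend endpoint that the frontend calls after uploading would run something like this before queueing validation and processing. With boto3, the client would typically be created with a custom `endpoint_url` pointing at the S3-compatible provider.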
File Downloads / Serving
Domain
- Custom domain or bucket URL?