I’d like to know whether it is possible to keep multipart uploads from the Nextcloud client (Windows) from going via the primary S3 storage, e.g. can the parts be stored locally on the Nextcloud server prior to merge/assembly?
For example, is it possible to have a different storage type for specific files/directories or even a different S3 bucket for that?
Having them go to the primary S3 storage is causing some issues due to backend replication (I’m using MinIO site replication, but I see the same result with bucket replication). What happens is:
1. Individual parts are uploaded to the “primary” MinIO server.
2. Replication of the individual parts begins (a waste of traffic, but not a big deal).
3. The last part is stored on MinIO and Nextcloud assembles the parts into the target file.
4. Nextcloud deletes the individual parts and waits for MinIO to acknowledge.
5. MinIO delays acknowledgement because some parts are still “PENDING” replication, and MinIO won’t create a delete marker for them until that is resolved.
6. The process times out and gives an error to the Nextcloud (Windows) client.
Simply increasing the timeout so that replication can complete isn’t a good solution, e.g. if the secondary MinIO server is down/unreachable, replication stays pending indefinitely. I could replicate objects conditionally if they were tagged, but they’re all just urn:oid:x.
You might experiment with lowering the concurrency option to reduce the chance of triggering this in your environment. It defaults to 5 in the AWS SDK (which we use). I believe this option only exists in Nextcloud 29, however (not sure which version you’re running).
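For reference, that knob lives in the objectstore arguments in config.php. A rough sketch only; the `concurrency` key name and the surrounding values are placeholders/from memory, so double-check against the admin docs for your version:

```php
// config/config.php (sketch; assumes Nextcloud 29+ and that the
// objectstore argument is named 'concurrency'; verify for your version)
'objectstore' => [
    'class' => '\\OC\\Files\\ObjectStore\\S3',
    'arguments' => [
        'bucket' => 'nextcloud',            // your existing bucket
        'hostname' => 'minio.example.com',  // placeholder MinIO endpoint
        'port' => 9000,
        'use_ssl' => true,
        'use_path_style' => true,           // MinIO typically needs path-style
        'key' => 'ACCESS_KEY',
        'secret' => 'SECRET_KEY',
        'concurrency' => 1,                 // down from the SDK default of 5
    ],
],
```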
But I’m not sure that’s really what you want to be doing in the long run.
Are you using MinIO’s replication in asynchronous mode (the default) or synchronous?
There may be a combination of options within MinIO that would handle multipart uploads with replication better… Might be a good scenario/question to ask MinIO.
> For example, is it possible to have a different storage type for specific files/directories or even a different S3 bucket for that?
Yes, by using S3 as External Storage within Nextcloud rather than as Primary Storage, though that is a fairly different architecture.
> The last part is stored on MinIO and Nextcloud assembles the parts into the target file.
> Nextcloud deletes the individual parts and waits for MinIO to acknowledge.
Note that the assembly is done by MinIO, as is the deletion of the individual parts. We just send the CompleteMultipartUpload request and the S3 server (MinIO in this case) takes care of assembly + deletion. (Unless I’m completely spacing about things this morning.)
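To illustrate, here’s a minimal sketch of that client-side flow using the AWS SDK for PHP directly (placeholder endpoint, credentials, and object key; this is not Nextcloud’s actual code path):

```php
<?php
// Minimal multipart upload sketch (AWS SDK for PHP). Placeholder
// endpoint/credentials/bucket/key; not Nextcloud's actual code.
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$s3 = new S3Client([
    'version' => 'latest',
    'region' => 'us-east-1',
    'endpoint' => 'https://minio.example.com:9000',
    'use_path_style_endpoint' => true,
    'credentials' => ['key' => 'ACCESS_KEY', 'secret' => 'SECRET_KEY'],
]);

$bucket = 'nextcloud';
$key = 'urn:oid:12345'; // hypothetical object name in the urn:oid:x style

// 1. Start the multipart upload.
$init = $s3->createMultipartUpload(['Bucket' => $bucket, 'Key' => $key]);

// 2. Upload the parts; each UploadPart is a separate object-store write,
//    which is what the replication backend sees as pending "parts".
$parts = [];
foreach (str_split(str_repeat('x', 10 * 1024 * 1024), 5 * 1024 * 1024) as $i => $chunk) {
    $res = $s3->uploadPart([
        'Bucket' => $bucket,
        'Key' => $key,
        'UploadId' => $init['UploadId'],
        'PartNumber' => $i + 1,
        'Body' => $chunk,
    ]);
    $parts[] = ['PartNumber' => $i + 1, 'ETag' => $res['ETag']];
}

// 3. A single CompleteMultipartUpload call; the S3 server (MinIO here)
//    performs the assembly and cleans up the parts itself.
$s3->completeMultipartUpload([
    'Bucket' => $bucket,
    'Key' => $key,
    'UploadId' => $init['UploadId'],
    'MultipartUpload' => ['Parts' => $parts],
]);
```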
So the logical place to optimize multipart uploads used with server-side replication is within the S3 server implementation, since it’s aware of the underlying goals/requirements/admin intent regarding the replication. That’s another reason I’m thinking this is probably fixable on the MinIO side (hopefully just through existing config options).
> The process times out and gives an error to the Nextcloud (Windows) client.
What’s the exact error that appears? Do you know for certain the timeout is coming from MinIO/S3? It could also be on the web/app server side, which is a bit of a different situation. Don’t get me wrong: you may be correct, but I just want to confirm so we’re not troubleshooting the wrong area of the problem.
Is it “BadGateway: Error while uploading to S3 bucket”? I believe that’s what the Nextcloud server side would generate if it’s arising at the S3 level, but I’m not sure offhand how it will appear to the end-user in the Desktop client. I believe it’ll be visible in your server logs if the desktop client doesn’t show the full message.
FYI, you can essentially “disable” multipart upload streaming, albeit in a crude way, by completely removing the distributed memcache from your Nextcloud configuration. It’ll warn and fall back.
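Concretely, that means dropping (or commenting out) the distributed memcache line from config.php. A sketch, assuming Redis happens to be your configured backend:

```php
// config/config.php: removing the distributed memcache makes Nextcloud
// warn and fall back to non-multipart S3 uploads (crude workaround).
// Redis is just an example; your configured backend may differ.
// 'memcache.distributed' => '\\OC\\Memcache\\Redis',
```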
Though I still hope you can find a solution that doesn’t rely on that. Plus, even that could still time out for other reasons.