My experience deploying a HIPAA Compliant Nextcloud

Hello! I wanted to write up my experience deploying a HIPAA-compliant, maximum-security, fully cloud-based infrastructure on AWS. This has been a rather trying journey, balancing costs against security, availability, and functionality. I will say, though, that once properly configured, Nextcloud beats every solution I tested.

AWS Account Setup

Of course, the first part of ensuring the best security was securing our infrastructure from the AWS side of things, and this was not as straightforward as one might think. The most difficult aspect was ensuring that only the root account had access to manage HIPAA infrastructure, and properly managing S3 buckets (I'll get to that later), access keys, and secrets, so that the only way data could be compromised through AWS would be if someone kidnapped our system administrators. In the end, after much back and forth with AWS, our account was locked to access from specific IP addresses only, required dual MFA, and had logging enabled on all levels. Every request or action, every bit of traffic passing across HIPAA infrastructure, is logged at the AWS level.
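For anyone attempting something similar, here is a minimal sketch of the kind of account-level restrictions I mean, using boto3. The IP range, policy name, trail name, and audit bucket are hypothetical placeholders, not our real values, and your compliance requirements may differ.

```python
import json
import boto3

iam = boto3.client("iam")
cloudtrail = boto3.client("cloudtrail")

# Deny everything unless the request comes from an approved office IP range
# AND was authenticated with MFA. (Attach this policy to the IAM users/groups
# that exist in the account; the CIDR below is a documentation range.)
lockdown_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyOutsideApprovedIPs",
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {"NotIpAddress": {"aws:SourceIp": ["203.0.113.0/24"]}},
        },
        {
            "Sid": "DenyWithoutMFA",
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {"BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}},
        },
    ],
}

iam.create_policy(
    PolicyName="hipaa-admin-lockdown",
    PolicyDocument=json.dumps(lockdown_policy),
)

# Log every management API call, in every region, to a dedicated audit bucket.
cloudtrail.create_trail(
    Name="hipaa-audit-trail",
    S3BucketName="example-hipaa-audit-logs",
    IsMultiRegionTrail=True,
)
cloudtrail.start_logging(Name="hipaa-audit-trail")
```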

S3 Bucket Setup

We have 38TB of patient data, all in compressed image/PDF format. Anything other than object storage was simply not an option: providing the level of redundancy, availability, reliability, security, access logging, and management that AWS offers, but in a physical datacenter setting, would not be feasible for my small team of 3 people.

However, S3 comes with its own unique challenges. We have over 100 S3 buckets configured, each with its own AWS IAM access policy and secret. Keeping all of these access keys straight in Nextcloud was less than fun. Likewise, managing the over 300 users and dozens of groups, all of whom need access to different sets of buckets representing different levels of patient data, wasn't fun either. This is certainly an area Nextcloud could improve upon enormously.
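To give a flavor of the bookkeeping involved, here is a rough sketch of how one might script the per-bucket credentials with boto3. The bucket and user names are hypothetical, and this illustrates the pattern rather than our exact tooling.

```python
import json
import boto3

iam = boto3.client("iam")

def create_bucket_credentials(bucket_name: str) -> dict:
    """Create a dedicated IAM user whose only permission is one bucket, and
    return the access key pair that gets pasted into Nextcloud's external
    storage configuration for that bucket."""
    user_name = f"nextcloud-{bucket_name}"
    iam.create_user(UserName=user_name)

    # Inline policy scoped to exactly one bucket and its objects.
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            }
        ],
    }
    iam.put_user_policy(
        UserName=user_name,
        PolicyName=f"{bucket_name}-only",
        PolicyDocument=json.dumps(policy),
    )

    key = iam.create_access_key(UserName=user_name)["AccessKey"]
    return {"key": key["AccessKeyId"], "secret": key["SecretAccessKey"]}

# Example with placeholder bucket names.
for bucket in ["patient-data-cardiology", "patient-data-radiology"]:
    create_bucket_credentials(bucket)
```

The tedious part is that each key pair then has to be entered by hand, bucket by bucket, in Nextcloud's external storage admin page and mapped to the right groups.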

Of course, each bucket had to have detailed object-level logging enabled, encryption at rest, and Object Lock turned on so that patient data could not be deleted via the delete button in Nextcloud, which is not so simple to remove (another thing Nextcloud could improve upon).
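Roughly speaking, the per-bucket hardening looks something like the following boto3 sketch. The bucket name, region, KMS key alias, and retention period are placeholders, and note that Object Lock has to be turned on when the bucket is created.

```python
import boto3

s3 = boto3.client("s3")
cloudtrail = boto3.client("cloudtrail")

bucket = "patient-data-cardiology"  # placeholder name

# Object Lock must be enabled at bucket creation time (this also turns on
# versioning automatically).
s3.create_bucket(
    Bucket=bucket,
    CreateBucketConfiguration={"LocationConstraint": "us-east-2"},
    ObjectLockEnabledForBucket=True,
)

# Default at-rest encryption with a KMS key (key alias is a placeholder).
s3.put_bucket_encryption(
    Bucket=bucket,
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "alias/hipaa-s3",
                }
            }
        ]
    },
)

# Compliance-mode retention so objects cannot be deleted, even by admins.
s3.put_object_lock_configuration(
    Bucket=bucket,
    ObjectLockConfiguration={
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Years": 6}},
    },
)

# Object-level (data event) logging for this bucket via CloudTrail.
cloudtrail.put_event_selectors(
    TrailName="hipaa-audit-trail",  # placeholder trail name
    EventSelectors=[
        {
            "ReadWriteType": "All",
            "IncludeManagementEvents": True,
            "DataResources": [
                {"Type": "AWS::S3::Object", "Values": [f"arn:aws:s3:::{bucket}/"]}
            ],
        }
    ],
)
```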

On top of this, our S3 buckets can only be accessed from the IP addresses our Nextcloud cluster runs on. This part is crucial, because it lets us rely on log monitoring and S3-side security rather than trusting Nextcloud's software to keep our S3 access secrets from leaking.
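As a sketch, the bucket policy enforcing this looks roughly like the following (the CIDR range stands in for the addresses of the Nextcloud instances; it is not our real range).

```python
import json
import boto3

s3 = boto3.client("s3")

bucket = "patient-data-cardiology"  # placeholder name

# Deny every request that does not originate from the Nextcloud cluster's
# addresses. The CIDR below is a documentation range, not our real one.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyAccessOutsideNextcloudCluster",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{bucket}",
                f"arn:aws:s3:::{bucket}/*",
            ],
            "Condition": {"NotIpAddress": {"aws:SourceIp": ["198.51.100.0/28"]}},
        }
    ],
}

s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```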

Finally, Nextcloud uses an absolutely stupid structure for accessing S3 object metadata. Instead of caching it in a database like any reasonable person might do, it only presents two options: either never check the buckets for changes and track files locally, or pay $10,000/month to Amazon. We of course had to devise a solution following the former, and that was the hardest part of this whole scenario.

The Nextcloud Servers

Our facilities span multiple states, and we absolutely cannot afford any lack of availability for our patient data, so we have fleets of EC2 servers running across several regions behind load balancers. In total we have 20 servers across 10 availability zones. Of course, the only way this is possible is thanks to Amazon RDS, the managed database service we used instead of locally run databases for obvious reasons. A shared remote database also solved the problem of generating extreme numbers of S3 requests, since our cluster of EC2 instances has exclusive access to the buckets in which patient data is stored.
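For reference, standing up the shared database is roughly this simple with boto3. Identifiers, sizing, and credentials below are placeholders, and Nextcloud itself supports MySQL/MariaDB and PostgreSQL on RDS.

```python
import boto3

rds = boto3.client("rds", region_name="us-east-2")

# A Multi-AZ, encrypted MySQL instance shared by every Nextcloud server
# in the cluster.
rds.create_db_instance(
    DBInstanceIdentifier="nextcloud-db",    # placeholder name
    DBInstanceClass="db.r5.xlarge",         # placeholder sizing
    Engine="mysql",
    AllocatedStorage=200,
    MasterUsername="nextcloud",
    MasterUserPassword="CHANGE-ME",         # pulled from a secret in practice
    MultiAZ=True,
    StorageEncrypted=True,
    BackupRetentionPeriod=35,
    DeletionProtection=True,
)
```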

AWS made the process of setting up these servers trivial. It is incredibly easy to manage everything from load balancing, to elasticity, to backup lifecycles. Thanks to AWS we have not encountered any issues with the servers themselves, and we are able to keep hourly backups of the RDS database and of the master image used to launch EC2 instances.
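The hourly backups amount to little more than a scheduled job along these lines (a sketch with hypothetical identifiers; AWS Backup can achieve the same thing without custom code).

```python
from datetime import datetime, timezone

import boto3

rds = boto3.client("rds")
ec2 = boto3.client("ec2")

def hourly_backup() -> None:
    """Run once an hour, e.g. from cron or a scheduled Lambda."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d-%H00")

    # Manual snapshot of the shared Nextcloud database.
    rds.create_db_snapshot(
        DBSnapshotIdentifier=f"nextcloud-db-{stamp}",
        DBInstanceIdentifier="nextcloud-db",   # placeholder name
    )

    # Fresh AMI of the master instance used to launch new Nextcloud servers.
    ec2.create_image(
        InstanceId="i-0123456789abcdef0",      # placeholder instance ID
        Name=f"nextcloud-master-{stamp}",
        NoReboot=True,
    )

if __name__ == "__main__":
    hourly_backup()
```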

Security and Encryption

With our AWS account secure and S3 bucket policies set to ensure no unauthorized access to patient data, the next part of the saga was securing the actual Nextcloud instances. Of course, IP whitelisting and port blocking are applied to all of our instances. These servers can only be accessed over HTTPS from our offices' private, secured networks. Aside from communicating with the RDS database and S3, there is no way to reach these instances without physically being present at one of our offices. A strong password policy and enforced MFA, with no option to reset one's password, were good enough for us. Our office desktops require fingerprints to access and are heavily scrutinized, and combining our local network monitoring with AWS monitoring and Nextcloud logging gives us the ability to spot irregular login patterns.
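The whitelisting itself is just a security group that only allows HTTPS from the office networks, something along these lines (the CIDR ranges and VPC ID are placeholders).

```python
import boto3

ec2 = boto3.client("ec2")

# Security group attached to every Nextcloud instance: HTTPS only, and only
# from the offices' outbound address ranges (placeholders here).
sg = ec2.create_security_group(
    GroupName="nextcloud-https-offices",
    Description="HTTPS from office networks only",
    VpcId="vpc-0123456789abcdef0",  # placeholder VPC ID
)

ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[
        {
            "IpProtocol": "tcp",
            "FromPort": 443,
            "ToPort": 443,
            "IpRanges": [
                {"CidrIp": "203.0.113.0/24", "Description": "Office A"},
                {"CidrIp": "198.51.100.0/24", "Description": "Office B"},
            ],
        }
    ],
)
# No other ingress rules are added; security groups deny everything else
# by default.
```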

All of our Nextcloud servers encrypt data before sending it to S3, using the same encryption keys. The only copy of these keys is kept offline in a redundant and physically secured storage setting at one of our offices.

Final Thoughts

I am happy with how our setup has turned out. I am quite confident in the security of our instances and think Nextcloud is an excellent product that has enabled us to deploy a VERY cost-effective solution to a very important problem. However, Nextcloud has a lot of issues that still need to be addressed, and it has a long way to go before one could consider it an application that is “easy” to set up or that makes sense for a wider range of clients.


Would it be somehow possible for you to share how you found a solution to that problem?
We're fighting with the same issue (S3 as primary storage with encryption), but not with such a big infrastructure and data volume as yours.

When I first managed to get S3 working as primary storage, it was still fine. But as soon as I enabled encryption, the whole Nextcloud system slowed down extremely.

Hi @hipaa-pita, I just want to encourage you to file your findings in Nextcloud's server GitHub issues and feature requests so they might improve their software based on your experiences.
