Integrating New File Compression Into Nextcloud

Hello,

I wanted to have an open discussion to learn whether anyone has previously attempted to integrate a non-standard file compression scheme.

My team and I have a compression platform that we imagine using to compress files as they leave the desktop, keep them compressed on the server, and decompress them when they return to the desktop.

Does anyone have any idea of the challenges involved in accomplishing this?

Hey Tim,

That would be a clear No-Go for a platform like Nextcloud…

What you described makes the file unusable by the server. As such, the server cannot share it with another user. What is the server expected to do when you share that file with a public link?

And what about server-side encryption? Compression and encryption often do not work well together… If you compress before encrypting, a known-plaintext attack becomes available, along with problems with everything related to padding. If you try to compress after encrypting, you will not gain much, because ciphertext looks like random garbage, so there is no pattern to compress.

Really not a good idea at all…

Heracles,

I appreciate the candid answer! =)

This will help a lot with my research.

Would it make any difference if the compression were fully homomorphic, meaning you can read and write the compressed files directly?

Hey Tim,

The type of compression is not the problem. The fact that there is compression is.

Once you have compressed the data, you end up with a standardized format that has some logic and structure in it. That is true for any kind of compression. It is true that such formats exist for other documents too, but here, if everything is compressed the same way, that pattern is common to everything. An attacker does not need to guess which underlying pattern is inside the ciphertext.

The problem remains the other way around: once data is encrypted, it does not matter what tries to compress it. Encrypted data turns into random garbage in which there is not supposed to be any kind of pattern. So when it is time to compress that, there will not be much to compress.
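The point about encrypted data can be checked with a small experiment. This is only a sketch: `toy_keystream_encrypt` is a made-up stand-in cipher (a SHA-256 counter keystream XORed with the data) used so the example needs nothing beyond the standard library. It is not secure and is not what Nextcloud uses, but its output looks random in the same way real ciphertext does.

```python
import hashlib
import zlib


def toy_keystream_encrypt(data: bytes, key: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream (illustration only, NOT secure)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


# Highly repetitive text, the kind of input compression is good at.
plaintext = b"GET /index.php HTTP/1.1 200 OK\n" * 2000
ciphertext = toy_keystream_encrypt(plaintext, b"demo-key")

compressed_plain = zlib.compress(plaintext, 9)
compressed_cipher = zlib.compress(ciphertext, 9)

print("plaintext bytes:        ", len(plaintext))
print("compressed plaintext:   ", len(compressed_plain))
print("compressed ciphertext:  ", len(compressed_cipher))
```

The compressed plaintext should come out as a tiny fraction of the input, while the compressed ciphertext stays at roughly the original size, because the compressor finds no repeated patterns to exploit.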

As a last point, the server must be able to understand the content in order to distribute it or use it with apps. What will happen if someone wishes to add an attachment from their Nextcloud account to an e-mail using the e-mail app? Everything is done server-side. If the server cannot turn that file back into a readable form, it cannot attach it, and the same goes for every other server-side feature.

So, in the end:
–The server must be able to handle the content by itself. If there is compression, the server, and not the client, must be in charge of it.
–Compression after encryption is a No-Go because encryption’s goal is to eliminate everything that compression can benefit from.
–Compression before encryption has a lot of risks and problems.

You can read about SSL oracle attacks, padding attacks, etc. for a lot of details about how compression and encryption do not do well together. Qualys SSL Scanner (ssllabs.com) also has a lot of documentation about SSL and security, including all these attacks.
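To give a feel for why compressing before encrypting is risky, here is a minimal sketch in the spirit of the CRIME attack on TLS compression. It assumes an attacker who can inject text that gets compressed together with a secret and who can see the size of the result; the secret value and the `observed_length` helper are invented for illustration. The encryption step is left out because a length-preserving cipher would hide the bytes but not the size.

```python
import zlib

# Hypothetical secret that the system adds to every message before compressing.
SECRET = b"sessionid=7f3a9c2e"


def observed_length(attacker_input: bytes) -> int:
    """Size of the compressed message, which is all the attacker gets to see."""
    return len(zlib.compress(attacker_input + b"\n" + SECRET, 9))


# Two guesses of the same length: one matches the secret, one does not.
right_guess = observed_length(b"sessionid=7f3a9c2e")
wrong_guess = observed_length(b"sessionid=q1w5t8zk")

print("length with correct guess:", right_guess)
print("length with wrong guess:  ", wrong_guess)
```

The correct guess compresses better because the secret becomes a repeat of text already seen, so the output is shorter. An attacker can use that size difference to confirm guesses without ever decrypting anything.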

Have fun reading about that,

Such a compression algorithm may be a good fit for a file system. ZFS already has its own compression capability, but not all file systems do. It would fit better there because nothing other than the file system itself is supposed to handle the compressed data. Of course, do not try to compress everything in the file system (sockets, device nodes, etc.), but compression within the file system is a use case for such an algorithm.

This is the opposite of your scenario, where compression is done by the client but the server is expected to handle the data as well: sending the doc file to the OnlyOffice server for editing Office documents within the browser, viewing photos and videos directly from the web interface, and more.
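As a rough illustration of compression living entirely inside the storage layer, here is a small sketch. `CompressedStore` is a hypothetical class, not a Nextcloud or ZFS API, and it uses zlib purely as a placeholder for whatever algorithm is plugged in. Callers only ever see the original bytes, which is what lets server-side features keep working.

```python
import tempfile
import zlib
from pathlib import Path


class CompressedStore:
    """Hypothetical storage layer: compresses on write, decompresses on read."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def write(self, name: str, content: bytes) -> int:
        """Store content compressed and return the number of bytes on disk."""
        blob = zlib.compress(content, 6)
        (self.root / name).write_bytes(blob)
        return len(blob)

    def read(self, name: str) -> bytes:
        """Return the original, uncompressed content."""
        return zlib.decompress((self.root / name).read_bytes())


with tempfile.TemporaryDirectory() as tmp:
    store = CompressedStore(Path(tmp) / "data")
    original = b"quarterly report line\n" * 500
    on_disk = store.write("report.txt", original)
    round_trip = store.read("report.txt")
    print(f"{len(original)} bytes stored as {on_disk} bytes on disk")
```

Anything above the store (sharing, previews, the mail app) calls `read` and gets usable data back, without knowing that compression happened at all.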

Heracles,

I appreciate your responses - I’ve been looking at some different open source platforms to test the difficulty of implementing our compression on.

Hi again,

So I recommend you look at platforms that work locally only or, at most, peer-to-peer. That way, you can always compress / decompress before handing the data to whatever will use it.

In a client-server model, that means you have to be on the server side, because the server must be able to do its own job and you cannot guarantee that all clients will be able to do what is required. You can also see the server as a client of its own service, like the mail app I told you about.

The second thing is to avoid anything that is mixed with encryption. As explained, encryption and compression are not friends.

There are 3 states in which data can exist : Online, in transit and Offline. Data is Online when it is in a single system. It is in transit when moving between multiple systems (network). It is offline when not in any system (tape backups, …).

Putting these facts together basically rules out anything network- or protocol-related, because today basically all protocols and communications must be encrypted.

That leaves you with 2 states: Offline and Online. What you described as the benefit of your algorithm makes it pointless for Offline storage. Since the data itself is 100% static and at rest, there is no need for a compression algorithm that offers live modification.

So you are down to local online storage, preferably without involving encryption. You can compress before encryption. There are ways to do it properly. Just learn a lot about the subject before you try to do it. As an example of such a use case, you can look at Veracrypt.

If you would rather avoid encryption, then look at filesystems, e-mail storage such as the mbox format, or log management.

Log management is probably your best bet: logs are huge, they compress very well because they are text only, there is no encryption, they are local to the system, …

Maybe try to tie into Elastic Search or Graylog? Syslog-NG or a new home-made log manager?

Good luck finding a business case for your creation,

Thank you for this information!! It is much appreciated and provides me some clear objectives to pursue.

I love the idea of log management and tying into Elastic Search - that is big business in our world.

So far we have been focusing on quite a few sectors: Video Streaming (currently operating), Banking Transactions (future), Genomic Data, and Business Intelligence / Analytics. I’m going to add Log Management to that list.