Fastest way to remove large number of filecache entries

pfaff · September 9, 2017, 2:24pm

I’m dealing with a situation where we have removed a very large number of files from a user’s Nextcloud space and we now have to clean up the oc_filecache table to reflect this change.

There are around 400 million rows in total to be removed from the oc_filecache table. I’m running an ‘occ files:scan username’ process to do the removal. While the files:scan is running, I’m running a process to monitor the number of oc_filecache rows, and I’m running innotop in T mode to watch tasks to monitor the process and can see the size of the MySQL Undo growing consistently as oc_filecache rows decrease slowly.

I imagine that if anything happens to interrupt this files:scan process, all of the row removals will be rolled back and I’ll have to start over. If this process takes many days to complete, there are plenty of things that could potentially interrupt the process. I also don’t know if I will run into some limitation with the MySQL Undo logs such that the process is eventually unable to continue and will have to be completely rolled back.

What I’m wondering is if there is a faster way than using ‘occ files:scan’ to remove all oc_filecache rows corresponding to a single Nextcloud user, or, if not a faster way, at least a way where changes can be comitted in stages so that a complete rollback will never be a possibility.

What exactly does ‘occ files:cleanup’ do, and would it be any better for performing this task?

pfaff · September 9, 2017, 3:36pm

By the way, for those interested in knowing how we got into this situation, see my bug report here:

MichaIng · September 10, 2017, 10:17am

To clean up the oc_filecache you need to run occ files:cleanup or wait for cron.php to do it, which should do it regularly anyway. Occ files:scan does not remove obsolete filecache enties, just adds missing ones, as far as I know.

pfaff · September 10, 2017, 1:50pm

Thanks for the reply, but I believe you’re mistaken. The question is not whether files:scan removes missing entries but whether there is a better way to run the MySQL queries (SELECTs, DELETEs) that are involved in actually making this removal happen. The problem I’m experiencing is that it is taking a very long time for this removal to happen, and if the process is interrupted then the entire removal will get rolled back by MySQL because it is all done as a single MySQL transaction.

I think I found a possible solution to my problem. I did some reading of the Nextcloud code and found the following in nextcloud/lib/private/Files/Cache/Scanner.php. I’m going to experiment with $useTransactions=false.

  /**
   * @var bool $useTransactions whether to use transactions
   */
  protected $useTransactions = true;

… and …

  /**
   * Whether to wrap the scanning of a folder in a database transaction
   * On default transactions are used
   *
   * @param bool $useTransactions
   */
  public function setUseTransactions($useTransactions) {
          $this->useTransactions = $useTransactions;
  }

pfaff · September 11, 2017, 12:42am

I tried $useTransactions=false but it wasn’t clear to me whether I was seeing any difference in behaviour. As far as I could tell from my view of mysql using innotop, the mysql queries still all appeared to be happening as one transaction, with the undo log for that transaction growing as rows were removed from oc_filecache.

To be clear, my main concern is that it’s going to take several days for all ~400 million rows to be scanned and removed, and I don’t want it run for a long time and then there be some sort of failure which results in a mysql transaction being rolled back and undoing all the row deletions. I don’t know whether that will happen or not, but I also don’t want to see time wasted if that were to happen.

What I have learned is that if I kill an occ files:scan process, it does not result in a mysql transaction rollback. All rows that have been deleted by the mysql transaction so far are not undone. So what I’m doing now is running my ‘occ files:scan username’ process for one minute at a time, killing it, and restarting it. This way I don’t have to worry about a single long-running occ files:scan failing in some way that may result in a mysql transaction rollback. Each one-minute run of occ files:scan is removing about 100,000 rows from oc_filecache, so it should get through 400 million rows in about 3 days.

pfaff · September 15, 2017, 6:32pm

It turned out that my previous method wasn’t working. Rollbacks were still happening. I’d get through hundreds of runs of files:scan, deleting tens of millions of rows, and then suddenly a rollback would happen (or complete) and I’d be back to where I started.

I eventually came up with a technique that is working well. Since I knew the paths associated with the filecache rows I wanted to delete, I’m using mysql queries to delete all rows matching that path. I use a ‘select concat’ query to generate 1 million delete queries and feed those delete queries into another mysql process. Each such run takes about 30 minutes to complete. This has already removed about 125 million extraneous rows from our filecache table.

MichaIng · September 15, 2017, 7:20pm

Interesting indeed. But I am still confused that you use occ files:scan to remove obsolete entries/cleanup the table. Did you tried occ files:cleanup at least, if perhaps that works faster? https://docs.nextcloud.com/server/12/admin_manual/configuration_server/occ_command.html#file-operations

On the other hand, if the entries are created through an symlink loop (as you stated on github), then the entries are not exactly wrong, just doubled/tripled/… and thus might not be removed through occ files:cleanup. I guess you unmounted the related external drive to solve this?

In the case, that the amount of wrong entries is way higher than the right ones, I would clear oc_filecache and do rescan afterwards.

pfaff · September 15, 2017, 7:44pm

In fact I did experiment with files:cleanup after your earlier suggestion. When I ran it, it found nothing to clean up.

I have to admit, I’m not clear on what exactly file:cleanup does. I assume it goes through the filecache table and removes entries that are in a certain state (“orphaned”?). But I do know from experience that files:scan does remove filecache rows corresponding to paths that no longer exist, since I’ve watched this happen.

No, I didn’t have to remove the external storage to eliminate the symlink loop. I just had to eliminate the symlink.

The problem with completely clearing filecache and doing a rescan is that I will lose all sharing information, will I not? At least I believe this used to be true. I don’t want to lose any legitimate sharing information for paths that do still exist.

MichaIng · September 15, 2017, 8:01pm

Hah, so about scan/cleanup I and I guess many others got it in case wrong. I will try around with this by time. In that case thanks for info, this is indeed important to know, since there often appear problems with oc_filecache here in the forum.

Yes indeed.

Ah yes you are write. You will loose shares, tags, comments, previews etc. since the rescanned files will have a different fileID inside the oc_filecache table.

pfaff · September 15, 2017, 8:04pm

By the way, I’m no longer looking for a faster method that what I have working now. If it takes a week, so be it, at least it’s getting the job done and nothing is being rolled back.

Thanks for your input.

MichaIng · September 15, 2017, 9:13pm

Yeah, I am also too less experienced with database manipulation to help you with this.

github.com

nextcloud/server/blob/master/lib/private/Files/Cache/Scanner.php#L241


					}


					// post-emit only if it was a file. By that we avoid counting/treating folders as files
					if ($data['mimetype'] !== 'httpd/unix-directory') {
						$this->emit('\OC\Files\Cache\Scanner', 'postScanFile', array($file, $this->storageId));
						\OC_Hook::emit('\OC\Files\Cache\Scanner', 'post_scan_file', array('path' => $file, 'storage' => $this->storageId));
					}


				} else {
					$this->removeFromCache($file);
				}
			} catch (\Exception $e) {
				if ($lock) {
					if ($this->storage->instanceOfStorage('\OCP\Files\Storage\ILockingStorage')) {
						$this->storage->releaseLock($file, ILockingProvider::LOCK_SHARED, $this->lockingProvider);
					}
				}
				throw $e;
			}


			//release the acquired lock

This indeed looks like entries being removed as well.

github.com

nextcloud/server/blob/3d671cc536b1b472c746fd4ea8f60135f4935a44/apps/files/lib/Command/DeleteOrphanedFiles.php#L34


namespace OCA\Files\Command;


use OCP\IDBConnection;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;


/**
* Delete all file entries that have no matching entries in the storage table.
*/
class DeleteOrphanedFiles extends Command {


	const CHUNK_SIZE = 200;


	/**
	 * @var IDBConnection
	 */
	protected $connection;


	public function __construct(IDBConnection $connection) {
		$this->connection = $connection;

Needs deeper investigation or dev enlightenment to know what exactly happens here

.

tflidd · September 16, 2017, 11:15pm

@MorrisJobke @icewind can you help here?

Sergio_Basto · April 14, 2020, 4:02pm

I edited my local file var/www/nextcloud/apps/files/lib/Command/DeleteOrphanedFiles.php

I changed public const CHUNK_SIZE = 200; to 2000000 (2 million) and it is cleaning much faster