Hi there,
I'm running a fairly large Nextcloud installation with about 120,000 files in more than 50,000 folders. All data is stored in groupfolders. When moving big folders, really strange things happen. In such cases I have observed that physical files on the storage got lost while their entries in the filecache table were still there. I have also seen the paths in the filecache table get corrupted, so that a file's path no longer matches the path of its folder as referenced by the parent ID in the filecache table. And I have seen that the folder being moved did not arrive in the target location but somewhere else, possibly even outside the groupfolder filesystem in some user's private area.
Whenever there was an entry in the filecache table whose physical file was not at the referenced location in the filesystem, I had a lot of trouble: the client synchronization ran into endless loops without ever getting fully synchronized. As I was getting sick of this issue, which cost me hours and days, I developed a PHP script that analyzes the database for inconsistencies between the filecache table and the underlying filesystem. In case someone else is having similar issues, I am sharing the script here for free. Note that it mainly looks for inconsistencies in the groupfolders. If someone would like to extend it to the private user folders as well, which shouldn't be much work, I would be happy if the update could be published here too. For further details, please check the source code documentation below. O.K., here we are:
<?php
# Created by Armin Riemer (armin@elleven.de)
# Version 1.0, 25.04.2020
# This script scans a Nextcloud database for some bugs and errors that I have observed in the past, especially around
# the GROUPFOLDER plugin. There, files sometimes got lost in the file system, and client synchronization may start to run in
# endless loops if any file indexed in the filecache table is physically not available in the expected directory
# of the file system. Sometimes the parent ID and the path in the filecache table also no longer matched after
# big folders had been moved to another location in Nextcloud. This leads to the same issue with endlessly looping sync
# clients. The result of the script's analysis is written to a log file, which is stored in the root of Nextcloud's data
# directory. This script was tested with a MySQL database and PHP 7.4, and via the command line only!!!
# This script needs to be located in the directory above the Nextcloud installation, but can be adapted to any other location on your
# Nextcloud server. It is currently written to run on the command line, but it should be no problem to use it via HTTPS requests.
# Load the Nextcloud configuration and define additional variables
require_once('cloud/config/config.php'); # The path needs to be adapted to your Nextcloud installation path
$TblPfx = $CONFIG['dbtableprefix'];
$LogFile = $CONFIG['datadirectory'] . '/Grpfldr_Diag.log';
# Connect to the database
$dblink = mysqli_connect($CONFIG['dbhost'], $CONFIG['dbuser'], $CONFIG['dbpassword'], $CONFIG['dbname']);
if (mysqli_connect_errno() == 0) {
# The first section scans the database for any indexed file in the filecache table that is located in one of the groupfolders
# but cannot be found in the filesystem. There is an option to delete these dead entries directly with this script, but it is
# currently deactivated. Be careful with it, as there were some issues with the correct character set interpretation, and the
# mb_convert_encoding call does not work properly in all cases for reasons I could not pin down.
$DefectiveItems = "";
$IssueCounter = 0;
$FileCounter = 0;
$TimeStamp = date("d.m.Y - H:i:s", time());
AddToLogFile("[$TimeStamp] Scan database for missing files:\n");
$SqlQuery = 'SELECT fileid,path FROM `' . $TblPfx . 'filecache` WHERE `' . $TblPfx . 'filecache`.`mimetype` != 2 AND `' . $TblPfx . 'filecache`.`path` LIKE "%__groupfolders/%" ORDER BY fileid';
$SqlResult = mysqli_query( $dblink , $SqlQuery );
if ($SqlResult) {
while ( $FileEntry = mysqli_fetch_assoc($SqlResult)) {
$FileName = $CONFIG['datadirectory'] . '/' . mb_convert_encoding($FileEntry['path'], "UTF-8", "CP1252"); # This is the critical character set conversion!!!
$FileCounter += 1;
if (!file_exists($FileName)) {
$DefectiveItems .= $FileEntry['fileid'] . "," ;
$IssueCounter += 1;
AddToLogFile('[' . $FileEntry['fileid'] . '] ' . $FileEntry['path'] . "\n");
}
}
mysqli_free_result($SqlResult);
$Response = "$FileCounter files found in all Groupfolders.\n";
if (strlen($DefectiveItems) > 1) {
$DefectiveItems = substr($DefectiveItems , 0 , -1);
$Response .= "$IssueCounter indexed files are missing in the filesystem and need to be fixed.\n";
$SqlQuery = 'DELETE FROM `' . $TblPfx . 'filecache` WHERE `' . $TblPfx . 'filecache`.`fileid` IN (' . $DefectiveItems . ');';
#$SqlResult = mysqli_query( $dblink , $SqlQuery ); # Optional line to delete the entries of missing files from the cache - but be careful using this!!!
}
else { $Response .= "Lucky you - no missing files found in the groupfolder filesystem :-)\n"; }
}
else { $Response = "The query for missing files failed: " . mysqli_error($dblink) . "\n"; } # A failed query must not be reported as a clean result
$IssueCounter = 0;
$TimeStamp = date("d.m.Y - H:i:s", time());
AddToLogFile("[$TimeStamp] Scan database for files with mismatches in the path:\n");
# In the second section the database is scanned for any mismatch between the path of the file entry in the filecache table and
# the path of the referenced parent folder. This can happen in some circumstances when moving folders with huge content.
# String literals use single quotes here, because backticks are reserved for identifiers in MySQL.
$SqlQuery = 'SELECT F.fileid, F.path, F.name, P.path AS parentpath FROM `' . $TblPfx . 'filecache` F INNER JOIN `' . $TblPfx . 'filecache` P ON P.fileid = F.parent ';
$SqlQuery .= "WHERE (CONCAT(P.path, '/', F.name) <> F.path) AND (P.path <> '') ORDER BY F.fileid";
$SqlResult = mysqli_query( $dblink , $SqlQuery );
if ($SqlResult) {
while ( $FileEntry = mysqli_fetch_assoc($SqlResult)) {
$IssueCounter += 1;
AddToLogFile('[' . $FileEntry['fileid'] . '] ' . $FileEntry['path'] . "\n");
}
mysqli_free_result($SqlResult);
if ($IssueCounter > 0) {
$Response .= "$IssueCounter files found with a mismatch in their path entry (path does not match with their parent's path) and need to be fixed.\n";
}
else { $Response .= "Lucky you - no files found in the whole cloud with any mismatches in the path :-)\n"; }
}
else { $Response .= "The query for path mismatches failed: " . mysqli_error($dblink) . "\n"; } # A failed query must not be reported as a clean result
$TimeStamp = date("d.m.Y - H:i:s", time());
AddToLogFile ("[$TimeStamp] Database scan completed. Analysis report:\n$Response---------------------------------------------------\n");
}
else { $Response = "Could not connect to the database!!!\n"; }
echo $Response;
if ($dblink) { mysqli_close($dblink); } # Only close the connection if it was actually opened
die (0);
function AddToLogFile ( $LogString ) {
global $LogFile;
file_put_contents( $LogFile, $LogString, FILE_APPEND);
}
?>
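To make the second check easier to follow, here is a minimal sketch of the same parent/path rule in plain PHP, without any database. The function name FindPathMismatches and the sample rows are hypothetical and only for illustration. The rows just mirror the filecache columns the script uses (fileid, parent, path, name). It is not a replacement for the SQL query above, only a way to see which entries the rule would flag.

```php
<?php
# Illustrative sketch only: the function name and the sample rows are hypothetical.
# Each row mirrors the filecache columns used by the script: fileid, parent, path, name.
function FindPathMismatches(array $Rows): array {
    # Index all rows by fileid so that each parent can be looked up directly
    $ById = [];
    foreach ($Rows as $Row) {
        $ById[$Row['fileid']] = $Row;
    }
    $Mismatches = [];
    foreach ($Rows as $Row) {
        # Like the INNER JOIN, skip entries whose parent is not in the table
        if (!isset($ById[$Row['parent']])) { continue; }
        $ParentPath = $ById[$Row['parent']]['path'];
        # Like the SQL condition P.path <> '', skip children of the storage root
        if ($ParentPath === '') { continue; }
        # The rule itself: parent path + '/' + name must equal the entry's own path
        if ($ParentPath . '/' . $Row['name'] !== $Row['path']) {
            $Mismatches[] = $Row['fileid'];
        }
    }
    return $Mismatches;
}

$SampleRows = [
    ['fileid' => 1, 'parent' => -1, 'path' => '', 'name' => ''],
    ['fileid' => 2, 'parent' => 1, 'path' => '__groupfolders', 'name' => '__groupfolders'],
    ['fileid' => 3, 'parent' => 2, 'path' => '__groupfolders/1', 'name' => '1'],
    ['fileid' => 4, 'parent' => 3, 'path' => '__groupfolders/1/report.txt', 'name' => 'report.txt'],
    # This entry claims to live in folder 2, but its parent is folder 1
    ['fileid' => 5, 'parent' => 3, 'path' => '__groupfolders/2/old.txt', 'name' => 'old.txt'],
];
print_r(FindPathMismatches($SampleRows));
```

With these sample rows, only the entry with fileid 5 is reported, because its stored path points to a different folder than its parent ID does.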
Cheers,
Armin