Script to analyse errors in the filescache table

Hi there,

I’m running a quite huge Nextcloud installation with about 120.000 files in more that 50.000 folders. All data is stored on groupfolders. When moving big folders really strange things happen. I observed in such cases, that physical files on the storage got lost while the entry in the filecache table is still there. I also had seen, that the path in the files cache table gets corrupt in a way, that there is a mismatch between the file’s path and the path of the file’s folder referred by the parentid in the filecache table. I also have seen, the the folder you’re move did arrive in the target location, but somewhere else and maybe also outside the groupfolder filesystem in the private area of any user.

In case there is an entry in the filecache table, where the physical file is not in the referred location on the filesystem, I had a lot of trouble, that the synchonization of the clients ran into endless loops without getting ever fully synchronized. As I was getting sick about this issue, which cost be hours and days, I have developed a smart php script, which analysis the database for some inconsistancy between the filecache table and the undelying filesystem. Just in case someone else is having similar issues, I disclose this script here for free. Just notice, that this is mainly looking for inconsistancy in the groupfolders. If someone likes to extend this to the private user folders as well, which shouldn’t be a major work, I would be happy, if this update can be publisher here as well. For further details, please check the source code documentation below. O.K., here we are:

<?php
# Created by Armin Riemer (armin@elleven.de)
# Version 1.0, 25.04.2020

# This script scans a Nextcloud database for some observed bugs and errors, which I had observed in the past especially around
# the GROUPFOLDER plugin. There, sometimes files got lost in the file system and client synchronization may begins to run in
# endless loops, if there is any file indexed in the filecache table, which is physically not available in the expected directory
# of the file system. Sometimes it also happened, that the Parent ID and the PATH in the filecache table didn't match anymore after
# big folders had been moved to another location in the Nextcloud. However, this leads to the same issue with endless looping sync
# clients. The result of the script's analysis is written into a log file, which will be stored in the root of the Nextcloud's data
# directory. This script was tested using a MySQL database, PHP 7.4 and via the command line only!!!

# This script needs to be located in the directory above the Nextcloud installation, but can be adopted to any other location on your
# Nextcloud server. It's currently optimized to run in the command line, but should be no problem to use it via https requests.

# Load the Nexcloud configuration and define additional variables
require_once('cloud/config/config.php'); # The path needs to be adopted according to your Nextcloud installation path
$TblPfx = $CONFIG['dbtableprefix'];
$LogFile = $CONFIG['datadirectory'] . '/Grpfldr_Diag.log';

# Connect to the database
$dblink = mysqli_connect($CONFIG['dbhost'], $CONFIG['dbuser'], $CONFIG['dbpassword'], $CONFIG['dbname']);
if (mysqli_connect_errno() == 0) {
	
	# The first section scans the database, if there is any indexed file in the filecache table located in one of the groupfolders,
	# which cannot be found in the filesystem. There is an option to delete these dead entries directly with this script, but this 
	# currently deactivated. Be careful with this, as there were some issues with the correct characterset interpretation and the
	# mb_convert_encoding command does not work properly in all cases for some reasons.

	$DefectiveItems = "";
	$IssueCounter = 0;
	$FileCounter = 0;
	$TimeStamp = date("d.m.Y - H:i:s", time());
	AddToLogFile("[$TimeStamp] Scan database for missing files:\n");

	$SqlQuery = 'SELECT fileid,path FROM `' . $TblPfx . 'filecache` WHERE `' . $TblPfx . 'filecache`.`mimetype` != 2 AND `' . $TblPfx . 'filecache`.`path` LIKE "%__groupfolders/%" ORDER BY fileid';
	$SqlResult = mysqli_query( $dblink , $SqlQuery );

	if ($SqlResult) {
		while ( $FileEntry = mysqli_fetch_assoc($SqlResult)) {
			$FileName = $CONFIG['datadirectory'] . '/' . mb_convert_encoding($FileEntry['path'], "UTF-8", "CP1252"); # This here is the critical characters set conversion!!!
			$FileCounter += 1;
			if (!file_exists($FileName)) {
				$DefectiveItems .= $FileEntry['fileid'] . "," ;
				$IssueCounter += 1;
				AddToLogFile('[' . $FileEntry['fileid'] . '] ' . $FileEntry['path'] . "\n");
			}
		}
		mysqli_free_result($SqlResult);
		$Response = "$FileCounter files found in all Groupfolders.\n";

		if (strlen($DefectiveItems) > 1) {
			$DefectiveItems = substr($DefectiveItems , 0 , -1);
			$Response .= "$IssueCounter indexed files are missing in the filesystem and need to be fixed.\n";
			
			$SqlQuery = 'DELETE FROM `' . $TblPfx . 'filecache` WHERE `' . $TblPfx . 'filecache`.`fileid` IN (' . $DefectiveItems . ');';
			#$SqlResult = mysqli_query( $dblink , $SqlQuery ); # This here is the option line to delete missing files from the cache - but be careful using this!!!
		}
		else { $Response .= "Lucky you - no missing files found in the groupfolder filesystem :-)\n"; }
	}
	else { $Response .= "Lucky you - no missing files found in the groupfolder filesystem :-)\n"; }
	
	$IssueCounter = 0;
	$TimeStamp = date("d.m.Y - H:i:s", time());
	AddToLogFile("[$TimeStamp] Scan database for files with mismatches in the path:\n");
	
	# In the second section the database is scanned for any mismatch between the path of the file entry in the filefache table and
	# the path of the referenced parent folder. This may can happen in sone circumstances when moving folders with huge content.

	$SqlQuery = 'SELECT F.fileid, F.path, F.name, P.path FROM `' . $TblPfx . 'filecache` F INNER JOIN `' . $TblPfx . 'filecache`.` P ON P.fileid = F.parent ';
	$SqlQuery .= 'WHERE (CONCAT ( P.path , `/` , F.name) <> F.path) AND ( P.path <> `` ) ORDER BY fileid';
	$SqlResult = mysqli_query( $dblink , $SqlQuery );

	if ($SqlResult) {
		while ( $FileEntry = mysqli_fetch_assoc($SqlResult)) {
			$IssueCounter += 1;
			AddToLogFile('[' . $FileEntry['fileid'] . '] ' . $FileEntry['path'] . "\n");
		}
		mysqli_free_result($SqlResult);
		if ($IssueCounter > 0) {
			$Response .= "$IssueCounter files found with a mismatch in their path entry (path does not match with their parent's path) and need to be fixed.\n";
		}
		else { $Response .= "Lucky you - no files found in the whole cloud with any mismatches in the path :-)\n";	}
	}
	else { $Response .= "Lucky you - no files found in the whole cloud with any mismatches in the path :-)\n";	}

	$TimeStamp = date("d.m.Y - H:i:s", time());
	AddToLogFile ("[$TimeStamp] Database scan completed. Analysis report:\n$Response---------------------------------------------------\n");
}
else { $Response = "Could not connect to the database!!!\n"; }

echo $Response;
mysqli_close($dblink);

die (0);

function AddToLogFile ( $LogString ) {
	global $LogFile;
	file_put_contents( $LogFile, $LogString, FILE_APPEND);
}
?>

Cheers,
Armin

3 Likes

Thanks for your script. I hope you created bug reports for the developers to eventually fix the issue.

1 Like

Hi there,

I just updated the script and beside of the GROUPFOLDERS it scans now the local storage of each user as well:

<?php
# Created by Armin Riemer (armin@elleven.de)
# Version 1.1, 28.04.2020

# This script scans a Nextcloud database for some observed bugs and errors, which I had observed in the past especially around
# the GROUPFOLDER plugin. There, sometimes files or folders got lost in the file system and client synchronization may begins to
# run in endless loops, if there is any file or folder indexed in the filecache table, which is physically not available in the
# expected location of the file system. Sometimes it also happened, that the Parent ID and the PATH in the filecache table didn't
# match anymore after big folders had been moved to another location in the Nextcloud. However, this leads to the same issue with
# endless looping sync clients. The result of the script's analysis is written into a log file, which will be stored in the root
# of the Nextcloud's data directory. This script was tested using a MySQL database, PHP 7.4 and via the command line only!!!

# This script needs to be located in the directory above the Nextcloud installation, but can be adopted to any other location on your
# Nextcloud server. It's currently optimized to run in the command line, but should be no problem to use it via https requests.

# Load the Nexcloud configuration and define additional variables
require_once('cloud/config/config.php'); # The path needs to be adopted according to your Nextcloud installation path
$TblPfx = $CONFIG['dbtableprefix'];
$LogFile = $CONFIG['datadirectory'] . '/Grpfldr_Diag.log';

# Connect to the database
$dblink = mysqli_connect($CONFIG['dbhost'], $CONFIG['dbuser'], $CONFIG['dbpassword'], $CONFIG['dbname']);
if (mysqli_connect_errno() == 0) {
	
	# The first section scans the database, if there is any indexed file or folder in the filecache table located in one of the
	# groupfolders, which cannot be found in the filesystem. There is an option to delete these dead entries directly with this
	# script, but this  currently deactivated. Be careful with this, as there were some issues with the correct characterset
	# interpretation and the mb_convert_encoding command does not work properly in all cases for some reasons.

	$TimeStamp = date("d.m.Y H:i:s", time());
	AddToLogFile("[$TimeStamp] Scan database for missing files and folders in the file system:\n");

	# Scan the GROUPFOLDER storage 
	$SqlQuery = 'SELECT fileid,path,mimetype FROM `' . $TblPfx . 'filecache` WHERE `' . $TblPfx . 'filecache`.`path` LIKE "%__groupfolders/%" ORDER BY fileid';
	ScanFileSystem( $SqlQuery , "GROUPFOLDERS" , $CONFIG['datadirectory'] );

	# Pick-up the storage location for each available Nextcloud user and scan the storage of each user for missing fils/folders
	$SqlQuery = 'SELECT id,numeric_id FROM `' . $TblPfx . 'storages` WHERE `' . $TblPfx . 'storages`.`id` REGEXP "^home::.*$" AND `' . $TblPfx . 'storages`.`available` = 1';
	$SqlResult = mysqli_query( $dblink , $SqlQuery );
	if ($SqlResult) {
		while ( $Storage = mysqli_fetch_assoc($SqlResult)) {
			$SqlQuery = 'SELECT fileid,path,mimetype FROM `' . $TblPfx . 'filecache` WHERE `' . $TblPfx . 'filecache`.`storage` = ' . $Storage['numeric_id'] . ' ORDER BY fileid';
			$StorageRoot = $CONFIG['datadirectory'] . "/" . substr( $Storage['id'] , 6 );
			ScanFileSystem( $SqlQuery , $Storage['id'] , $StorageRoot);
		}
		mysqli_free_result($SqlResult);
	}

	$IssueCounter = 0;
	$TimeStamp = date("d.m.Y H:i:s", time());
	AddToLogFile("\n[$TimeStamp] Scan database for files with mismatches in the path:\n");
	
	# In the second section the database is scanned for any mismatch between the path of the file entry in the filefache table and
	# the path of the referenced parent folder. This may can happen in sone circumstances when moving folders with huge content.

	$SqlQuery = 'SELECT F.fileid, F.path, F.name, P.path FROM `' . $TblPfx . 'filecache` F INNER JOIN `' . $TblPfx . 'filecache`.` P ON P.fileid = F.parent ';
	$SqlQuery .= 'WHERE (CONCAT ( P.path , `/` , F.name) <> F.path) AND ( P.path <> `` ) ORDER BY fileid';
	$SqlResult = mysqli_query( $dblink , $SqlQuery );

	if ($SqlResult) {
		while ( $FileEntry = mysqli_fetch_assoc($SqlResult)) {
			$IssueCounter += 1;
			AddToLogFile('(FIle ID: ' . $FileEntry['fileid'] . ') ' . $FileEntry['path'] . "\n");
		}
		mysqli_free_result($SqlResult);
		if ($IssueCounter > 0) {
			$Response .= "$IssueCounter files found with a mismatch in their path entry (path does not match with their parent's path).\n";
		}
		else { $Response .= "No files found with any mismatch in the path.\n"; }
	}
	else { $Response .= "No files found with any mismatch in the path.\n"; }

	$TimeStamp = date("d.m.Y H:i:s", time());
	AddToLogFile ( "$Response\n[$TimeStamp] Database scan completed.\n---------------------------------------------------\n");
}
else { $Response = "Could not connect to the database!!!\n"; }

echo $Response;
mysqli_close($dblink);

die (0);

function ScanFileSystem( $SqlQuery , $Storage , $StorageRoot ) {
	# This function scans the database for any missing files or folders of a dedicated storage area (groupfolders or user folders). This function is called
	# once per user and once for the whole GROUPFOLDER environment. The output shows all files and folders listed in the filecache table for this dedicated
	# storage, where there cannot be found any physical file in the file system. The missing file's id is listed in the log file.
	
	global $dblink;
	$DefectiveItems = "";
	$FileIssueCounter = 0;
	$FolderIssueCounter = 0;
	$FileCounter = 0;
	$FolderCounter = 0;

	$SqlResult = mysqli_query( $dblink , $SqlQuery );
	if ($SqlResult) {
		while ( $FileEntry = mysqli_fetch_assoc($SqlResult)) {
			$FileName = $StorageRoot . '/' . mb_convert_encoding($FileEntry['path'], "UTF-8", "CP1252"); # This here is the critical characters set conversion!!!
			if ( $FileEntry['mimetype'] == 2 ) { $FolderCounter += 1; } else { $FileCounter += 1; }
			if (!file_exists($FileName)) {
				$DefectiveItems .= $FileEntry['fileid'] . "," ;
				if ( $FileEntry['mimetype'] == 2 ) { $FolderIssueCounter += 1; } else { $FileIssueCounter += 1; }
				AddToLogFile('[' . $FileEntry['fileid'] . '] ' . $FileEntry['path'] . "\n");
			}
		}
	
		mysqli_free_result($SqlResult);
		$Response = "$FileCounter files and $FolderCounter folders found in $Storage. ";

		if (strlen($DefectiveItems) > 1) {
			$DefectiveItems = substr($DefectiveItems , 0 , -1);
			$Response .= "$FileIssueCounter files and $FolderIssueCounter folders are missing in the filesystem!\n";
			
			$SqlQuery = 'DELETE FROM `' . $TblPfx . 'filecache` WHERE `' . $TblPfx . 'filecache`.`fileid` IN (' . $DefectiveItems . ');';
			#$SqlResult = mysqli_query( $dblink , $SqlQuery ); # This here is the option line to delete missing files from the cache - but be careful using this!!!
		}
		else { $Response .= "No issues found.\n"; }
	}
	else { $Response .= "No issues found.\n"; }

	AddToLogFile($Response);
	echo $Response;
}

function AddToLogFile ( $LogString ) {
	# This function just add a string to the log file
	
	global $LogFile;
	file_put_contents( $LogFile, $LogString, FILE_APPEND);
}
?>

Cheers,
Armin

Thanks a lot for this script! I have also spend hours and days on fixing strange issues in the Nextcloud DB. In my opinion the occ groupfolders:scan and occ files:scan commands should cover the case of orphaned/wrong files in the database as it does for files on the file system but not the in the database.
Is there any chance you could integrate your script into these commands? I think a starting point would be here Command/Scan.php

1 Like

Hi,

I just added some more features to the script. It’s now also scanning the filesystem and checks, if every file is already part of the filecache table. The difference to the build-in OCC function is, that you get a detailled response in the logfile without changing anything. I also added a function, which list duplicates in the filecache table. Also the reported issue with the character set is now solved. So the script is now providing a comprehensive status report of the filecache table overall for diagnostics. Here we are now:

<?php
# ---------------------------------------------------------------------------------------------------------------------------------
# Created by Armin Riemer (armin@elleven.de)
# Version 1.2, 02.05.2020
# ---------------------------------------------------------------------------------------------------------------------------------
# This script scans a Nextcloud database for some observed bugs and errors, which I had observed in the past especially around
# the GROUPFOLDER plugin. There, sometimes files or folders got lost in the file system and client synchronization may begins to
# run in endless loops, if there is any file or folder indexed in the filecache table, which is physically not available in the
# expected location of the file system. Sometimes it also happened, that the Parent ID and the PATH in the filecache table didn't
# match anymore after big folders had been moved to another location in the Nextcloud. However, this leads to the same issue with
# endless looping sync clients. On to the script scans the file system and shows all files, which are not listed in the filecache
# table. The result of the script's analysis is written into a log file, which will be stored in the root of the Nextcloud's data
# directory. This script was tested using a MySQL database, PHP 7.4 and via command line only!!!
# ---------------------------------------------------------------------------------------------------------------------------------
# This script needs to be located in the directory above the Nextcloud installation, but can be adopted to any other location on your
# Nextcloud server. It's currently optimized to run in the command line, but should be no problem to use it via https requests.
# ---------------------------------------------------------------------------------------------------------------------------------

# Load the Nexcloud configuration and define additional variables
require_once('cloud/config/config.php'); # The path needs to be adopted according to your Nextcloud installation path
$TblPfx = $CONFIG['dbtableprefix'];
$LogFile = $CONFIG['datadirectory'] . '/filecache_diag.log';
$Response = "";
$RunTime = time();
$FileCounter = 0;

# Connect to the database
$dblink = mysqli_connect($CONFIG['dbhost'], $CONFIG['dbuser'], $CONFIG['dbpassword'], $CONFIG['dbname']);
if (mysqli_connect_errno() == 0) {
	
	# The first section scans the database, if there is any indexed file or folder in the filecache table located in one of the
	# groupfolders, which cannot be found in the filesystem. There is an option to delete these dead entries directly with this
	# script, but this currently deactivated. 

	mysqli_set_charset ( $dblink , 'utf8mb4' );
	$TimeStamp = date("d.m.Y H:i:s", time());
	AddToLogFile("[$TimeStamp] Scan database for missing files and folders in the file system:\n");

	# Scan the GROUPFOLDER storage 
	$SqlQuery = 'SELECT storage FROM `' . $TblPfx . 'filecache` WHERE `' . $TblPfx . 'filecache`.`path` = "__groupfolders" LIMIT 0,1';
	$SqlResult = mysqli_query( $dblink , $SqlQuery );
	if ($SqlResult) {
		$StorageID = mysqli_fetch_assoc($SqlResult)['storage'];
		mysqli_free_result($SqlResult);

		$SqlQuery = 'SELECT fileid,path,mimetype FROM `' . $TblPfx . 'filecache` WHERE `' . $TblPfx . 'filecache`.`path` LIKE "%__groupfolders/%" ORDER BY fileid';
		ScanFileSystem( $SqlQuery , $StorageID , "GROUPFOLDERS" , "" );
	}

	# Pick-up the storage location for each available Nextcloud user and scan the storage of each user for missing fils/folders
	$SqlQuery = 'SELECT id,numeric_id FROM `' . $TblPfx . 'storages` WHERE `' . $TblPfx . 'storages`.`id` REGEXP "^home::.*$" AND `' . $TblPfx . 'storages`.`available` = 1';
	$SqlResult = mysqli_query( $dblink , $SqlQuery );
	if ($SqlResult) {
		while ( $Storage = mysqli_fetch_assoc($SqlResult)) {
			$SqlQuery = 'SELECT fileid,path,mimetype FROM `' . $TblPfx . 'filecache` WHERE `' . $TblPfx . 'filecache`.`storage` = ' . $Storage['numeric_id'] . ' ORDER BY fileid';
			ScanFileSystem( $SqlQuery , $Storage['numeric_id'] , $Storage['id'] , "/" . substr( $Storage['id'] , 6 ));
		}
		mysqli_free_result($SqlResult);
	}

	# In the second section the database is scanned for any duplicate entries in the filefache table

	$IssueCounter = 0;
	$Response = "";
	$TimeStamp = date("d.m.Y H:i:s", time());
	AddToLogFile("\n[$TimeStamp] Scan database for duplicates in the filecache table:\n");
	
	$SqlQuery = 'SELECT storage,path,COUNT(path) FROM ' . $TblPfx . 'filecache GROUP BY storage,path HAVING COUNT(path) > 1';
	$SqlResult = mysqli_query( $dblink , $SqlQuery );

	if ($SqlResult) {
		while ( $FileEntry = mysqli_fetch_assoc($SqlResult)) {
			$IssueCounter += $FileEntry['COUNT(path)'];
			$SqlQuery = 'SELECT fileid FROM ' . $TblPfx . 'filecache WHERE storage = ' . $FileEntry['storage'] . ' AND path = "' . $FileEntry['path'] . '"';
			$SqlResult2 = mysqli_query( $dblink , $SqlQuery );
			if ($SqlResult2) {
				AddToLogFile($FileEntry['COUNT(path)'] .' duplicate entries found. File-IDs:');
				while ( $DuplicateEntry = mysqli_fetch_assoc($SqlResult2)) { AddToLogFile( " (" . $DuplicateEntry['fileid'] . ")"); }
				mysqli_free_result($SqlResult2);
				AddToLogFile("\n");
			}
		}
		mysqli_free_result($SqlResult);
		if ($IssueCounter > 0) { $Response .= "$IssueCounter duplicates found in the filecache table (same path and storage with different File-IDs).\n"; }
		else { $Response .= "No duplicate entries found in the filecache table.\n"; }
	}
	else { $Response .= "No duplicates found in the filecache table.\n"; }
	echo $Response;

	# In the third section the database is scanned for any mismatch between the path of the file entry in the filefache table and
	# the path of the referenced parent folder. This may can happen in sone circumstances when moving folders with huge content.

	$IssueCounter = 0;
	$Response = "";
	$TimeStamp = date("d.m.Y H:i:s", time());
	AddToLogFile("\n[$TimeStamp] Scan database for files with mismatches in the path:\n");
	
	$SqlQuery = 'SELECT F.fileid, F.path, F.name, P.path FROM `' . $TblPfx . 'filecache` F INNER JOIN `' . $TblPfx . 'filecache`.` P ON P.fileid = F.parent ';
	$SqlQuery .= 'WHERE (CONCAT ( P.path , `/` , F.name) <> F.path) AND ( P.path <> `` ) ORDER BY fileid';
	$SqlResult = mysqli_query( $dblink , $SqlQuery );

	if ($SqlResult) {
		while ( $FileEntry = mysqli_fetch_assoc($SqlResult)) {
			$IssueCounter += 1;
			AddToLogFile('(File ID: ' . $FileEntry['fileid'] . ') ' . $FileEntry['path'] . "\n");
		}
		mysqli_free_result($SqlResult);
		if ($IssueCounter > 0) { $Response .= "$IssueCounter files found with a mismatch in their path entry (path does not match with their parent's path).\n"; }
		else { $Response .= "No files found with any mismatch in the path.\n"; }
	}
	else { $Response .= "No files found with any mismatch in the path.\n"; }

	$TimeStamp = date("d.m.Y H:i:s", time());
	AddToLogFile ( "$Response\n[$TimeStamp] Database scan completed.\n---------------------------------------------------\n");
}
else { $Response = "Could not connect to the database!!!\n"; }

echo $Response;
mysqli_close($dblink);

die(0);

# -----------------------------------------------------------------------------------------------------------------------------------------------------
# This function scans the database for any missing files or folders of a dedicated storage area (groupfolders or user folders). This function is called
# once per user and once for the whole GROUPFOLDER environment. The output shows all files and folders listed in the filecache table for this dedicated
# storage, where there cannot be found any physical file in the file system. The missing file's id is listed in the log file.
	
function ScanFileSystem( $SqlQuery , $StorageID , $Storage , $StorageFolder ) {
	global $dblink;
	global $CONFIG;
	global $Response;
	global $FileCounter;
	
	$StorageRoot = $CONFIG['datadirectory'] . $StorageFolder;
	$DefectiveItems = "";
	$FileIssueCounter = 0;
	$FolderIssueCounter = 0;
	$FileCounter = 0;
	$FolderCounter = 0;
	$ScanCounter = 0;	

	$SqlResult = mysqli_query( $dblink , $SqlQuery );
	if ($SqlResult) {
		while ( $FileEntry = mysqli_fetch_assoc($SqlResult)) {
			$FileName = $StorageRoot . '/' . $FileEntry['path'];
			if ( $FileEntry['mimetype'] == 2 ) { $FolderCounter += 1; } else { $FileCounter += 1; }
			if (!file_exists($FileName)) {
				$DefectiveItems .= $FileEntry['fileid'] . "," ;
				if ( $FileEntry['mimetype'] == 2 ) { $FolderIssueCounter += 1; } else { $FileIssueCounter += 1; }
				AddToLogFile('[' . $FileEntry['fileid'] . '] ' . $FileEntry['path'] . "\n");
			}
		}
		mysqli_free_result($SqlResult);
		$Response = "$FileCounter files and $FolderCounter folders found in $Storage. ";
	}

	echo $Response . "Scanning file system...";
	$ScanDir = $StorageRoot;
	if ($Storage == "GROUPFOLDERS") { $ScanDir .= "/__groupfolders"; }
	$FileCounter = RecursiveFileScan( $ScanDir , $StorageFolder , $StorageID );

	if (strlen($DefectiveItems) > 1) {
		$DefectiveItems = substr($DefectiveItems , 0 , -1);
		$Response .= "$FileIssueCounter files and $FolderIssueCounter folders are missing in the filesystem. ";
		
		$SqlQuery = 'DELETE FROM `' . $TblPfx . 'filecache` WHERE `' . $TblPfx . 'filecache`.`fileid` IN (' . $DefectiveItems . ');';
		#$SqlResult = mysqli_query( $dblink , $SqlQuery );	###### These here are the optional lines to delete missing
		#mysqli_free_result($SqlResult);					###### files from the cache - but be careful using this!!!
	}
	if ($FileCounter > 0 ) { $Response .= "$FileCounter files are missing in the database."; }
	elseif ( strlen($DefectiveItems) <= 1) { $Response .= "No issues found.               "; }
	$Response .= "\n";
	AddToLogFile($Response);
	echo "\r" . $Response;
}

# -----------------------------------------------------------------------------------------------------------------------------------------------------
# This function scans a a physical directory recursively and counts all files, which aren't listed in the filecache table

function RecursiveFileScan( $ScanDir , $StorageFolder , $StorageID ) {
	global $RunTime;
	global $Response;
	global $CONFIG;
	global $dblink;
	global $TblPfx;
	global $ScanCounter;
	global $FileCounter;

	$DbQueryDone = false;
	$DirEntries = array();
	$Return = 0 ;
    $Tree = glob( rtrim( $ScanDir , '/') . '/*' );
	
	if (( time() - $RunTime ) >= 1 ) { $RunTime = time(); echo "\r" . $Response . "Scanning file system... (" . round( $ScanCounter / $FileCounter * 100 ) . "%)" ; }

    if ( is_array( $Tree )) {
        foreach( $Tree as $File ) {
			if ( is_dir( $File )) { $Return += RecursiveFileScan( $File , $StorageFolder , $StorageID ); }
            elseif ( is_file( $File )) {
				if (!$DbQueryDone) {
					if ( preg_match ( "/^" . preg_quote( $CONFIG['datadirectory'] . $StorageFolder , '/' ) . "\/(.*)$/" , $ScanDir . "/" , $Path ) == 1) {
						$SqlQuery = 'SELECT `name`  FROM `' . $TblPfx . 'filecache` WHERE `' . $TblPfx . 'filecache`.`path` LIKE "' . $Path[1] . '%" AND `' . $TblPfx . 'filecache`.`storage` = ' . $StorageID;
						$SqlResult = mysqli_query( $dblink , $SqlQuery );
						if ( $SqlResult ) {
							$DirEntries = mysqli_fetch_all( $SqlResult );
							mysqli_free_result( $SqlResult );
							$DbQueryDone = true;
						}
					if (!$DbQueryDone) { echo "\nError performing SQL query in $ScanDir\n"; exit(0);}
					}
				}
				$ScanCounter++;
				if ( in_array( basename( $File ) , array_column( $DirEntries , 0 ) , true ) === false ) {
					AddToLogFile( "Not indexed: " . $StorageFolder . "/" . $Path[1] . basename( $File ) . "\n");
					$Return++;
				}
			}
        }
    }
	return $Return;
}

# -----------------------------------------------------------------------------------------------------------------------------------------------------
# This function just add a string to the log file

function AddToLogFile ( $LogString ) {
	global $LogFile;
	file_put_contents( $LogFile, $LogString, FILE_APPEND);
}
?>

Cheers,
Armin

1 Like

Hi,

I just updated the script and it’s now able to distinguish between web request and execution via the console. The output is adjusted accordingly. Here’s the code:

<?php
# ---------------------------------------------------------------------------------------------------------------------------------
# Created by Armin Riemer (armin@elleven.de)
# Version 1.3, 06.05.2020
# ---------------------------------------------------------------------------------------------------------------------------------
# This script scans a Nextcloud database for some observed bugs and errors, which I had observed in the past especially around
# the GROUPFOLDER plugin. There, sometimes files or folders got lost in the file system and client synchronization may begins to
# run in endless loops, if there is any file or folder indexed in the filecache table, which is physically not available in the
# expected location of the file system. Sometimes it also happened, that the Parent ID and the PATH in the filecache table didn't
# match anymore after big folders had been moved to another location in the Nextcloud. However, this leads to the same issue with
# endless looping sync clients. On to the script scans the file system and shows all files, which are not listed in the filecache
# table. The result of the script's analysis is written into a log file, which will be stored in the root of the Nextcloud's data
# directory. This script was tested using a MySQL database, PHP 7.4 and either via Linux command line and via web request.
# ---------------------------------------------------------------------------------------------------------------------------------
# This script needs to be located in the directory above the Nextcloud installation, but can be adopted to any other location on your
# Nextcloud server. It's currently optimized to run in the command line, but should be no problem to use it via https requests.
# ---------------------------------------------------------------------------------------------------------------------------------

# Load the Nexcloud configuration and define additional variables
require_once('cloud/config/config.php'); # The path needs to be adopted according to your Nextcloud installation path
$TblPfx = $CONFIG['dbtableprefix'];
$LogFile = $CONFIG['datadirectory'] . '/filecache_diag.log';
$Response = "";
$RunTime = time();
$FileCounter = 0;
$DbNeedsAttention = FALSE;
$isWebRequest = (isset($_SERVER['SERVER_PROTOCOL']));

if ( $isWebRequest ) { SendHtmlHeader(); }

# Connect to the database
$dblink = mysqli_connect($CONFIG['dbhost'], $CONFIG['dbuser'], $CONFIG['dbpassword'], $CONFIG['dbname']);
if (mysqli_connect_errno() == 0) {
	
	# The first section scans the database, if there is any indexed file or folder in the filecache table located in one of the
	# groupfolders, which cannot be found in the filesystem. There is an option to delete these dead entries directly with this
	# script, but this currently deactivated. 

	mysqli_set_charset ( $dblink , 'utf8mb4' );
	$TimeStamp = date("d.m.Y H:i:s", time());
	AddToLogFile("[$TimeStamp] Scan database for missing files and folders in the file system:\n");

	# Scan the GROUPFOLDER storage 
	$SqlQuery = 'SELECT storage FROM `' . $TblPfx . 'filecache` WHERE `' . $TblPfx . 'filecache`.`path` = "__groupfolders" LIMIT 0,1';
	$SqlResult = mysqli_query( $dblink , $SqlQuery );
	if ($SqlResult) {
		$StorageID = mysqli_fetch_assoc($SqlResult)['storage'];
		mysqli_free_result($SqlResult);

		$SqlQuery = 'SELECT fileid,path,mimetype FROM `' . $TblPfx . 'filecache` WHERE `' . $TblPfx . 'filecache`.`path` LIKE "%__groupfolders/%" ORDER BY fileid';
		ScanFileSystem( $SqlQuery , $StorageID , "GROUPFOLDERS" , "" );
	}

	# Pick-up the storage location for each available Nextcloud user and scan the storage of each user for missing fils/folders
	$SqlQuery = 'SELECT id,numeric_id FROM `' . $TblPfx . 'storages` WHERE `' . $TblPfx . 'storages`.`id` REGEXP "^home::.*$" AND `' . $TblPfx . 'storages`.`available` = 1';
	$SqlResult = mysqli_query( $dblink , $SqlQuery );
	if ($SqlResult) {
		while ( $Storage = mysqli_fetch_assoc($SqlResult)) {
			$SqlQuery = 'SELECT fileid,path,mimetype FROM `' . $TblPfx . 'filecache` WHERE `' . $TblPfx . 'filecache`.`storage` = ' . $Storage['numeric_id'] . ' ORDER BY fileid';
			ScanFileSystem( $SqlQuery , $Storage['numeric_id'] , $Storage['id'] , "/" . substr( $Storage['id'] , 6 ));
		}
		mysqli_free_result($SqlResult);
	}

	# In the second section the database is scanned for any duplicate entries in the filecache table

	$IssueCounter = 0;
	$Response = "";
	$TimeStamp = date("d.m.Y H:i:s", time());
	AddToLogFile("\n[$TimeStamp] Scan database for duplicates in the filecache table:\n");
	
	$SqlQuery = 'SELECT storage,path,COUNT(path) FROM ' . $TblPfx . 'filecache GROUP BY storage,path HAVING COUNT(path) > 1';
	$SqlResult = mysqli_query( $dblink , $SqlQuery );

	if ($SqlResult) {
		while ( $FileEntry = mysqli_fetch_assoc($SqlResult)) {
			$IssueCounter += $FileEntry['COUNT(path)'];
			$SqlQuery = 'SELECT fileid FROM ' . $TblPfx . 'filecache WHERE storage = ' . $FileEntry['storage'] . ' AND path = "' . $FileEntry['path'] . '"';
			$SqlResult2 = mysqli_query( $dblink , $SqlQuery );
			if ($SqlResult2) {
				AddToLogFile($FileEntry['COUNT(path)'] .' duplicate entries found. File-IDs:');
				while ( $DuplicateEntry = mysqli_fetch_assoc($SqlResult2)) { AddToLogFile( " (" . $DuplicateEntry['fileid'] . ")"); }
				mysqli_free_result($SqlResult2);
				AddToLogFile("\n");
			}
		}
		mysqli_free_result($SqlResult);
		if ($IssueCounter > 0) {
			$DbNeedsAttention = TRUE;
			$Response .= "$IssueCounter duplicates found in the filecache table (same path and storage with different File-IDs).\n"; 
		}
		else { $Response .= "No duplicate entries found in the filecache table.\n"; }
	}
	else { $Response .= "No duplicates found in the filecache table.\n"; }
	SendFeedback ( $Response );

	# In the third section the database is scanned for any mismatch between the path of the file entry in the filefache table and
	# the path of the referenced parent folder. This may can happen in sone circumstances when moving folders with huge content.

	$IssueCounter = 0;
	$Response = "";
	$TimeStamp = date("d.m.Y H:i:s", time());
	AddToLogFile("\n[$TimeStamp] Scan database for files with mismatches in the path:\n");
	
	$SqlQuery = 'SELECT F.fileid, F.path, F.name, P.path FROM `' . $TblPfx . 'filecache` F INNER JOIN `' . $TblPfx . 'filecache`.` P ON P.fileid = F.parent ';
	$SqlQuery .= 'WHERE (CONCAT ( P.path , `/` , F.name) <> F.path) AND ( P.path <> `` ) ORDER BY fileid';
	$SqlResult = mysqli_query( $dblink , $SqlQuery );

	if ($SqlResult) {
		while ( $FileEntry = mysqli_fetch_assoc($SqlResult)) {
			$IssueCounter += 1;
			AddToLogFile('(File ID: ' . $FileEntry['fileid'] . ') ' . $FileEntry['path'] . "\n");
		}
		mysqli_free_result($SqlResult);
		if ($IssueCounter > 0) {
			$Response .= "$IssueCounter files found with a mismatch in their path entry (path does not match with their parent's path).\n";
			$DbNeedsAttention = TRUE;
			}
		else { $Response .= "No files found with any mismatch in the path.\n"; }
	}
	else { $Response .= "No files found with any mismatch in the path.\n"; }

	$TimeStamp = date("d.m.Y H:i:s", time());
	AddToLogFile ( "$Response\n[$TimeStamp] Database scan completed.\n---------------------------------------------------\n");
}
else {
	$Response = "Could not connect to the database!!!\n";
	$DbNeedsAttention = TRUE;
	}

SendFeedback ( $Response );

if ($isWebRequest) {
	Print "</DIV>\n<DIV style='font-weight: bold;";
	if ($DbNeedsAttention) { Print " color: red;'>YOUR SYSTEM NEEDS SOME ATTENTION!!!"; }
	else { Print "'>Your system is in good condition."; }
	SendHtmlFooter();
}
elseif ($DbNeedsAttention) {echo "\033[41mYour system needs some attention!\033[0m\n";}

mysqli_close( $dblink );

die(0);

# -----------------------------------------------------------------------------------------------------------------------------------------------------
# This function scans the database for any missing files or folders of a dedicated storage area (groupfolders or user folders). This function is called
# once per user and once for the whole GROUPFOLDER environment. The output shows all files and folders listed in the filecache table for this dedicated
# storage, where there cannot be found any physical file in the file system. The missing file's id is listed in the log file.
	
function ScanFileSystem( $SqlQuery , $StorageID , $Storage , $StorageFolder ) {
	global $dblink;
	global $CONFIG;
	global $Response;
	global $FileCounter;
	global $isWebRequest;
	global $DbNeedsAttention;
	
	$StorageRoot = $CONFIG['datadirectory'] . $StorageFolder;
	$DefectiveItems = "";
	$FileIssueCounter = 0;
	$FolderIssueCounter = 0;
	$FileCounter = 0;
	$FolderCounter = 0;
	$ScanCounter = 0;	

	$SqlResult = mysqli_query( $dblink , $SqlQuery );
	if ($SqlResult) {
		while ( $FileEntry = mysqli_fetch_assoc($SqlResult)) {
			$FileName = $StorageRoot . '/' . $FileEntry['path'];
			if ( $FileEntry['mimetype'] == 2 ) { $FolderCounter += 1; } else { $FileCounter += 1; }
			if (!file_exists($FileName)) {
				$DefectiveItems .= $FileEntry['fileid'] . "," ;
				if ( $FileEntry['mimetype'] == 2 ) { $FolderIssueCounter += 1; } else { $FileIssueCounter += 1; }
				AddToLogFile('[' . $FileEntry['fileid'] . '] ' . $FileEntry['path'] . "\n");
			}
		}
		mysqli_free_result($SqlResult);
		$Response = "$FileCounter files and $FolderCounter folders found in $Storage. ";
	}

	if (!$isWebRequest) { echo $Response . "Scanning file system..."; }
	$ScanDir = $StorageRoot;
	if ($Storage == "GROUPFOLDERS") { $ScanDir .= "/__groupfolders"; }
$FileCounter = RecursiveFileScan( $ScanDir , $StorageFolder , $StorageID );

	if (strlen($DefectiveItems) > 1) {
		$DefectiveItems = substr($DefectiveItems , 0 , -1);
		$Response .= "$FileIssueCounter files and $FolderIssueCounter folders are missing in the filesystem. ";
		
		$SqlQuery = 'DELETE FROM `' . $TblPfx . 'filecache` WHERE `' . $TblPfx . 'filecache`.`fileid` IN (' . $DefectiveItems . ');';
		#$SqlResult = mysqli_query( $dblink , $SqlQuery );	###### These here are the optional lines to delete missing
		#mysqli_free_result($SqlResult);					###### files from the cache - but be careful using this!!!
	}
	if ($FileCounter > 0 ) {
		$DbNeedsAttention = TRUE;
		$Response .= "$FileCounter files are missing in the database.";
	}
	elseif ( strlen($DefectiveItems) <= 1) { $Response .= "No issues found.               "; }
	$Response .= "\n";
	AddToLogFile($Response);
	if (!$isWebRequest) { echo "\r"; }
	SendFeedback ( $Response );
}

# -----------------------------------------------------------------------------------------------------------------------------------------------------
# This function scans a a physical directory recursively and counts all files, which aren't listed in the filecache table

function RecursiveFileScan( $ScanDir , $StorageFolder , $StorageID ) {
	global $RunTime;
	global $Response;
	global $CONFIG;
	global $dblink;
	global $TblPfx;
	global $ScanCounter;
	global $FileCounter;
	global $isWebRequest;

	$DbQueryDone = false;
	$DirEntries = array();
	$Return = 0 ;
    $Tree = glob( rtrim( $ScanDir , '/') . '/*' );
	
	if ( $isWebRequest AND (( time() - $RunTime ) >= 1 )) { $RunTime = time(); echo "\r" . $Response . "Scanning file system... (" . round( $ScanCounter / $FileCounter * 100 ) . "%)" ; }

    if ( is_array( $Tree )) {
        foreach( $Tree as $File ) {
			if ( is_dir( $File )) { $Return += RecursiveFileScan( $File , $StorageFolder , $StorageID ); }
            elseif ( is_file( $File )) {
				if (!$DbQueryDone) {
					if ( preg_match ( "/^" . preg_quote( $CONFIG['datadirectory'] . $StorageFolder , '/' ) . "\/(.*)$/" , $ScanDir . "/" , $Path ) == 1) {
						$SqlQuery = 'SELECT `name`  FROM `' . $TblPfx . 'filecache` WHERE `' . $TblPfx . 'filecache`.`path` LIKE "' . $Path[1] . '%" AND `' . $TblPfx . 'filecache`.`storage` = ' . $StorageID;
						$SqlResult = mysqli_query( $dblink , $SqlQuery );
						if ( $SqlResult ) {
							$DirEntries = mysqli_fetch_all( $SqlResult );
							mysqli_free_result( $SqlResult );
							$DbQueryDone = true;
						}
					if (!$DbQueryDone) { SendFeedback ( "\nError performing SQL query in $ScanDir\n" ); exit(0);}
					}
				}
				$ScanCounter++;
				if ( in_array( basename( $File ) , array_column( $DirEntries , 0 ) , true ) === false ) {
					AddToLogFile( "Not indexed: " . $StorageFolder . "/" . $Path[1] . basename( $File ) . "\n");
					$Return++;
				}
			}
        }
    }
	return $Return;
}

# -----------------------------------------------------------------------------------------------------------------------------------------------------
# This function just add a string to the log file

function AddToLogFile ( $LogString ) {
	global $LogFile;
	file_put_contents( $LogFile, $LogString, FILE_APPEND);
}

# -----------------------------------------------------------------------------------------------------------------------------------------------------
# This function feeds an output to the console or to the web client

function SendFeedback ( $Output ) {
	global $isWebRequest;
	global $WebClient;
	
	if ($isWebRequest) {
		print preg_replace ( '/\n/' , "<BR>\n" , $Output);
	}
	else { echo $Output; }
}

# -----------------------------------------------------------------------------------------------------------------------------------------------------
# This function feeds the HTML header to the web client

function SendHtmlHeader(){
	print "<HTML>\n<HEAD>\n<TITLE>Nextcloud</TITLE>\n";
	print "\n<BODY BGCOLOR=#E8E8E8 TEXT=#000000 LINK=#0000FF VLINK=#FF0000>\n";
	print "<h1>Nextcloud instance health status:</h1>\n";
	print "<DIV style='line-height: 1.3;padding: 10px; background-color: #ccc; margin: 5px 50px 15px 15px;'>\n";
}

# -----------------------------------------------------------------------------------------------------------------------------------------------------
# This function feeds the HTML footer to the web client

function SendHtmlFooter(){
	print "</DIV></BODY>\n</HTML>\n";
}
?>

Cheers,
Armin

2 Likes

BTW - I’m now triggering this script once a day by a WEB cron, which sends me a mail in case something is wrong during the analysis. So I’m now much more confidence, that the system is up and running as it’s used by a small company with about a dozen of employees.

If someone else is using this script, I would be happy for some feedback, e.g. if it’s useless for you or somehow valuable.

Cheers,
Armin