Tutorial: How to migrate mass data to a new NextCloud server

This easy-to-follow tutorial shows a best practice for migrating mass data, say from an existing NAS (network drive), to a new NextCloud server effectively and efficiently: you avoid a very lengthy upload over the network, save power and resources, and get client synchronization up to speed quickly.

Audience
This tutorial is intended for everyone (beginner to advanced user)

Use Case
You have an existing drive with over 1TB of data that you wish to migrate to your new NextCloud install. You may also have an existing NextCloud install and wish to add new mass data to your server without resorting to the NextCloud client sync app or the browser upload feature.

Prerequisites
This tutorial assumes that you have the following:

  • A NextCloud server properly installed and configured, with enough storage to host the new data you are about to migrate. Preferably you already have a high-capacity drive set up as a local mount under “External storage” in the NextCloud server settings. Note that on NextCloud 11 you need to enable the External Storage app from the Apps page first (see the occ sketch after this list).
  • An external NAS or drive that stores the data you wish to migrate to NextCloud. Ideally the data within is already sorted and organised in a folder structure that you are happy with.
  • If you intend to store NTFS files (for Windows clients), you have already completed the Temporary NTFS tutorial. It will be replaced with a proper tutorial :wink:. Skip this if you will use a Linux-native filesystem.
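
If you prefer the command line over the Apps page, the External Storage app can usually be enabled with occ as well. A minimal sketch, assuming a default installation under /var/www/nextcloud and the www-data web-server user (adjust both to your setup):

cd /var/www/nextcloud
sudo -u www-data php occ app:enable files_external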

Let’s start
Physically connect the external drive (generally via USB) containing the data to be migrated to the host that has the NextCloud server installed on it, then power up the system.

SSH into your server as a superuser.

Let us find the device name that Linux assigned to the external drive so that we can mount it. To do this, run the following from a terminal:

lsblk

You should be presented with a list of devices and their mount points, similar to this:

NAME                          MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda                             8:0    0  1.8T  0 disk
├─sda1                          8:1    0  250G  0 part  /media/crypto
└─sda2                          8:2    0  1.6T  0 part  /media/win
sdb                             8:16   0  1.8T  0 disk
└─sdb1                          8:17   0  1.8T  0 part
sdc                             8:32   1 29.8G  0 disk
├─sdc1                          8:33   1  487M  0 part  /boot
├─sdc2                          8:34   1    1K  0 part
└─sdc5                          8:37   1 29.3G  0 part
  ├─hostname--vg-root   252:0    0 21.3G  0 lvm   /
  └─hostname--vg-swap_1 252:1    0    8G  0 lvm
    └─cryptswap1              252:2    0    8G  0 crypt [SWAP]
sr0                            11:0    1 1024M  0 rom

In the output above, sda is set as my local external storage for NextCloud, with two partitions named win (sda2) and crypto (sda1). Crypto is used for server-side encryption and, for the purpose of this tutorial, I will omit it.

Notice that the external drive has been recognised as sdb, with one partition containing the data, sdb1. Remember to replace sdb1 with the device name that lsblk has given you.
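
If you are unsure which partition actually holds your data, or what filesystem it uses, the following may help (sdb1 is the example device from this tutorial; use your own):

lsblk -f                  # lists filesystem type and label for each partition
sudo blkid /dev/sdb1      # prints the UUID and filesystem type of one partition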

Let us create a directory to mount the external drive to. From a terminal, type:

sudo mkdir /media/extdrive

You may change “extdrive” to any name you wish, as long as it is not already mounted and in use.

Now let us mount the drive to the folder we just created

sudo mount /dev/sdb1 /media/extdrive
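
A note on NTFS drives (see the prerequisites): the plain mount above may fail or come up read-only for an NTFS partition. A sketch of the usual workaround, assuming a Debian/Ubuntu based server:

sudo apt install ntfs-3g
sudo mount -t ntfs-3g /dev/sdb1 /media/extdrive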

Let us cross-check to make sure the mount was correct by running again

lsblk

This should output

NAME                          MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda                             8:0    0  1.8T  0 disk
├─sda1                          8:1    0  250G  0 part  /media/crypto
└─sda2                          8:2    0  1.6T  0 part  /media/win
sdb                             8:16   0  1.8T  0 disk
└─sdb1                          8:17   0  1.8T  0 part /media/extdrive
sdc                             8:32   1 29.8G  0 disk
├─sdc1                          8:33   1  487M  0 part  /boot
├─sdc2                          8:34   1    1K  0 part
└─sdc5                          8:37   1 29.3G  0 part
  ├─hostname--vg-root   252:0    0 21.3G  0 lvm   /
  └─hostname--vg-swap_1 252:1    0    8G  0 lvm
    └─cryptswap1              252:2    0    8G  0 crypt [SWAP]
sr0                            11:0    1 1024M  0 rom

Notice that /dev/sdb1 is now mounted at /media/extdrive.

Ok, so far so good; let’s move on to the data migration. For the purposes of this tutorial, we will keep /media/win as the local external storage configured in our NextCloud install, and we will use the very useful command-line tool rsync to merge the data from our external drive into the NextCloud data folder.

We will change to root for this to bypass any permission restrictions. From the terminal, type

sudo -i

Notice the prompt changed to

root@hostname:~#

Now we will type the following command to merge the data. Replace “foldercontainingdata” with the name of the folder used to store all the data on the external drive.

rsync -a /media/extdrive/foldercontainingdata/ /media/win

The data on your external drive is now being copied and/or merged onto the NextCloud server’s local external storage.
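
Note that the trailing slash after foldercontainingdata/ tells rsync to copy the contents of that folder rather than the folder itself. If you would like to preview what will be transferred before committing to a long copy, rsync can do a dry run first; a sketch using the same example paths as above:

rsync -a --dry-run --stats /media/extdrive/foldercontainingdata/ /media/win    # preview only, nothing is copied
rsync -aP /media/extdrive/foldercontainingdata/ /media/win                     # real copy, with per-file progress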

This will take some time, depending on your server’s processors, RAM and the amount of data you are transferring. It took me about 10 minutes to transfer 1.6TB of data on a dual Xeon processor system. If you are using an SoC like a Pi, it will take longer.

When the data sync is finished, you will be presented with the root prompt once again

root@hostname:~#

I have the habit of cross-checking all processes, and would like to ascertain that the sync actually happened. To do this I simply cd to the NextCloud data folder and run ls. As we are root, I simply do

cd /media/win
ls

This will present you with the folder structure and files of the data that has been synced.
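
If you want a stricter check than eyeballing the listing, rsync itself can compare source and destination. A sketch using a dry run with the same example paths as above; only items that would still need copying are listed:

rsync -a --dry-run --itemize-changes /media/extdrive/foldercontainingdata/ /media/win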

Now it’s time to force NextCloud to scan for the new files. To do this, from the terminal type the following:

cd /var/www/nextcloud
sudo -u www-data php console.php files:scan --all

This will start the scan needed to populate the NextCloud database, and will output something like this

Scanning files for 2 users
Starting scan for user 1 out of 2 (username)
+---------+--------+--------------+
| Folders | Files  | Elapsed time |
+---------+--------+--------------+
| 28585   | 107801 | 02:11:30     |
+---------+--------+--------------+

This process will be repeated for each user.
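
If you only added data for a single user, the scan can also be limited to that user (or even a single folder) instead of --all. A sketch, where username and Photos are placeholders for your own user and folder; console.php accepts the same arguments as occ:

cd /var/www/nextcloud
sudo -u www-data php console.php files:scan username
sudo -u www-data php console.php files:scan --path="username/files/Photos"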

Lastly, we safely unmount our external drive. sdb1 is the device used in this tutorial; make sure to use yours instead.

sudo umount /dev/sdb1
sudo reboot

That’s it. The total round-trip time for transferring this data, including the NextCloud scan, was a couple of hours, compared to over a month it would have taken with the conventional desktop app or browser.

If you liked or found this tutorial helpful, give it a like (“heart”). Post any relevant questions below; I will do my best to answer them all.

Thank you for this very clear tutorial.

Is it possible to omit external storage and put new files/folders directly into data/[username]/files in the Nextcloud data directory?

Regards.
Fred.

Hello,

I’ve answered the question by doing an upload on a test server.

It works without external storage.
In fact, I used rsync to transfer the folders to the users’ files directories.
The only thing is that the console.php script does not work (Nextcloud 12?).
After searching the web unsuccessfully, I tried the occ script, which takes the same parameters :slight_smile:
sudo -u nginx php occ files:scan --all

Regards,
Fred

@fredk sorry for my late reply, I was unavailable. It would work without external storage just the same. The most important thing is that you run the scan --all command so that you populate the NextCloud database.
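
For anyone following along, a rough sketch of that direct approach, assuming a default install under /var/www/nextcloud, the www-data web-server user and a placeholder user called username (the chown is needed because rsync run as root keeps the original ownership):

rsync -a /media/extdrive/foldercontainingdata/ /var/www/nextcloud/data/username/files/
chown -R www-data:www-data /var/www/nextcloud/data/username/files
cd /var/www/nextcloud
sudo -u www-data php occ files:scan username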

Hi fab:

Thank you for this clear guide. FYI, your

sudo -u www-data php console.php files:scan –all

has a typo in it I think, it should be

sudo -u www-data php console.php files:scan --all

The only difference is the --all option. You probably wrote that up in a word processor that “autocorrected” the two dashes into a long dash.

@Tom_Forge thanks for pointing it out :wink: Correction made

Hi fab:

If you have the time and inclination, could you maybe explain in a bit more detail what happens when you run the:

sudo -u www-data php console.php files:scan --all

command (herein “UPDATE COMMAND”)?

I have been reorganizing and reorganizing nearly 2GB of files in the Nextcloud directory over and over and then I’ll run the UPDATE COMMAND. I am worried that I may be creating excessive database entries by doing this.

Another thing, loosely related, that I think is an issue: I first rsync’d my NTFS external drive into an /oldfiles directory that I created as root. I then moved the files into my Nextcloud data folders, into what I hoped were the appropriate NC user or NC group folders. It turns out that sometimes I get this wrong, which necessitates moving the files a 3rd or 4th time with a root-privileged Nemo session. That, I think, necessitates running the UPDATE COMMAND again. Anyway, the ownership and permissions of the files I am manipulating are mostly:

-rwxrwxrwx 1 root root

I am thinking that I probably should do a chown and chmod on the whole /nextcloud/filespath. But I don’t know what you would recommend. Please don’t worry about me messing things up. I still have the external drive with all my files so I can do this all over again if I need to.

So I tried running chown -R www-data:www-data /nextcloud/filespath/__groupfiles/4/test/ (it was root:root ownership before), and when I then went back into Nextcloud with an admin login, the files did not display, even though my admin account is also a member of that group. So I am a bit confused at this point as to what exactly is going on here.

@Tom_Forge I am not sure I grasped your issue correctly. I would reorder the data (ideally for NextCloud on SoC and embedded devices) on the external drive using a normal PC, then attach the drive to your NextCloud server and perform the scan --all so that console.php (or occ if nginx is used instead of Apache) updates the database of files on the external storage.

This is just a typo, but it should be dev/sdb1, as seen above.

@lebernd no, as per the tutorial steps you would have mounted /dev/sdb1 as /media/extdrive

Yeah, that’s what I’m pointing to :wink:

Anyway, you should correct this - I’m quoting you… and write what you meant to write… you wrote ‘sda1’ once where you meant ‘sdb1’.

Thanks fab:

My apologies for being unclear. I am a long time pc enthusiast and skin deep administrator so I do fumble about quite a bit. Also, I don’t want to impose on you as I realize my questions really are not the focus of your howto. But I thought I would reach out for some mentoring in case you have the inclination to have a discussion on topics that relate to your howto.

So, in my situation there’s just no figuring out the final file structure prior to implementation. This server is for a family of four and each of us have our hands in a bunch of different private and group interests and enterprises. To complicate things I am having to sort through about 7 different PCs and multiple backups over the years to figure out which files – located all over the place because of different file storage and sharing preferences of each person – are the version(s) to keep. So it’s a bloody self-created family and business mess to be sure. It would take a long time (weeks) to figure out the final destination and final version of everything if I could sit down and do it day after day. Unfortunately, I have to hit this project as time permits. So, what I have done is just take chunks at a time, i.e. thousands of photos, and got them onto the server. But they, like everything else, need to go through a process of sorting, while at the same time we’re trying to get our day to day workflow going as well. Anyway, I am describing this just so folks understand the scope and limitations of what I am doing. I am not expecting others to solve this organizational mess.

What I am hoping for is to understand enough about how Nextcloud/Mariadb/Apache works to go about this work.

So if you have time to help let’s just say that my /nextcloud/filespath will end up with files in there with different owners and different permissions. I think that as root I should be able to run:

chown -R www-data:www-data /nextcloud/filespath

and that should set the file ownership to what is best for Nextcloud to operate as expected. Do you think that is right?

Next, I am interested in people’s opinions on what the file permissions should be? I am thinking:

-rwxrwx--- www-data www-data (for files)

and

drwxrwx--- www-data www-data (for directories)

However, someone suggested that Apache umask can be used as well.

And finally, back to my first question: with all of this reorganizing of files and running of the UPDATE COMMAND, is my MariaDB going to end up with tons of entries that I should be looking to purge, or not? I will be able to dig into this myself a little later on; I was just hoping you might be able to point me in the right direction. For example, I might just be able to purge a changelog of some sort to clean up the SQL data.

And thanks for your Howto it works great!

@Tom_Forge I had the same situation as you, both at work (20 users with data going back to 1993) and at home (a family of 4). To solve the organizational mess, I had no other option but to take the plunge: I created a directory structure common to all users, migrated the data into these folders on the user devices, and copied it all to an external high-capacity drive. To expedite this I used a comparison tool that I generally use for coding, found here; they have a free trial.

After organizing the data back at work, I connected the external drive (containing all the data from the user PCs) to the rack servers via USB and used rsync (as per this tutorial) to copy the file structure and data onto the DC storage drives, then ran scan --all to populate the database (MySQL in our case).

On a much smaller scale, I did the same on the user PCs back at home.

I will research and get back to you on your other questions, as I am not quite sure whether such file permissions would constitute vulnerabilities.
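
In the meantime, regarding the database entries: occ also ships a cleanup command that removes orphaned file cache rows, which may be worth a look. A minimal sketch, assuming the install path and web user from the tutorial:

cd /var/www/nextcloud
sudo -u www-data php occ files:cleanup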

Hi fab:

Heh heh, the way you went about it sounds so much easier. But you have to be sure of how you want things organized, and I am not. And I can’t have everybody stop computing while I finally get this all structured out and merged. I may just be in a situation where there is a lot more collaboration on files and projects. Photos and music are technically easy but time-consuming to de-duplicate and to sort into a years-and-months subfolder structure, etc.

I have been using Meld and Pyrenamer but I’ll definitely take a look at the scooter software, thanks for the tip.

Well, I did some testing and chown -R www-data:www-data /nextcloud/filespath/ seems to fix most permission issues. I went ahead and ran the UPDATE COMMAND and a restart afterwards, and that may have helped as well.

Once I get all this data sorted and working I’ll dig into my MariaDB questions. It shouldn’t be that hard to look at the tables and figure out whether I am creating a big SQL data mess by running the UPDATE COMMAND every time I reorganize a batch of files. But in the worst-case scenario my files will be organized exactly how I want; then I’ll back that up to a new external drive, and I can reinstall MariaDB and/or Nextcloud if I have to. I am betting that if there is an SQL data mess I’ll be able to clean it up manually with a bit of research. Thanks for all your input, much appreciated!

@Tom_Forge I didn’t want to discourage you… At work I sifted through 90TB of data going back to 1993. I had to face the not-so-happy user faces too whilst doing so. You may wish to first create the file structure, then invite the users to see if they like it, and then apply it. I have the full version of the Scooter software and it helps a lot with transferring data, but try to use a high-end PC beefed up with RAM, so you can keep working your way through the data transfer without waiting for the cut-and-paste to complete.

I am happy that you solved the ownership and read/write permissions, as this is very much dependent on how you installed NextCloud.
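
For reference, a commonly suggested baseline (an assumption on my part, not an official requirement; adjust the path to your own data directory) is to keep everything owned by the web-server user, with directories at 750 and files at 640:

sudo chown -R www-data:www-data /nextcloud/filespath
sudo find /nextcloud/filespath -type d -exec chmod 750 {} \;
sudo find /nextcloud/filespath -type f -exec chmod 640 {} \;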

Currently I am working on another NextCloud project at home, whereby I will be trying to make better use of my router running LEDE (a fork of OpenWRT) to act as a NextCloud server. If I succeed I will have accomplished my goal of making better use of my embedded system (Linksys WRT1900ACS v2) and removing a power-hungry rack server and a NAS from home, running everything I need on just 0.03 kW :wink:

@lebernd, correction noted and done, thanks for pointing it out :wink:

Hello fab, how are you?

I have some questions:

  • About this: “… data that you wish to migrate to NextCloud. Ideally the data within is already sorted and organised in a folder structure that you are happy with.” Does the organization of the data have to be like data/[username]/? I can’t understand how the scan associates a file with a user. Say I have 4 users with 10 files each; what would be the best way to organize them for the sync so that they show up correctly in NextCloud?

  • So, as a prerequisite, do the users have to exist in the database?

Regards,

Alexis.

@alexis.rosano Hi, the scope of this tutorial is to migrate existing mass data into a new NextCloud install. The data you have to migrate should be readily available on an external drive, organized in a folder structure friendly to the user who is going to use it. The scan does not associate the data with a user; it populates the database in use with NextCloud (MariaDB or MySQL) with the file names of the data - a kind of index. You, as an admin, would associate the user with a folder from the NextCloud web interface.
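
If you prefer the command line, the configured external storage mounts can also be listed with occ (assuming the External Storage app is enabled and the usual install path):

cd /var/www/nextcloud
sudo -u www-data php occ files_external:list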

Thanks for your answer.

Regards!

Great post. I have needed to do this with over 16GB of data a few times, and I never knew about the console.php script to force a scan and update the database, so I always spent a few days using the sync app for each user. In the ownCloud days (and I think in the early NextCloud versions) you used to be able to just copy the data into the webroot data folder and it would just work, but this option was removed a few years ago for reasons I never understood.

Just an FYI: rather than running ls to check the progress of rsync, you can use rsync -avP to view rsync’s progress in real time. You can also speed up the transfer by adding compression during the transfer with the -z option.

rsync is a very powerful tool with tons of useful options great for tasks like this.

-a
archive mode
-v
increase verbosity
-P
show progress during transfer (must use uppercase P)

more on rsync here
https://linux.die.net/man/1/rsync
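
Putting those options together with the paths used earlier in the tutorial, the copy step could be run as something like:

rsync -avP /media/extdrive/foldercontainingdata/ /media/win    # add -z only if copying over a network link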

thanks again
tim