:date: 2016-10-07 09:10

.. index:: lizardfs, tech, centos, linux, filesystem

.. _`2016-lizardfs`:

LizardFS
========

`LizardFS <https://lizardfs.com>`_ is a distributed file system that originates from `MooseFS <https://moosefs.com>`_ and spans multiple servers/systems, automatically replicating and managing a mountable file-system.

For this test and proof of concept I've chosen a limited setup with four hosts:

* Host1: LizardFS master (Master + Chunk-server)
* Host2: Chunk-server
* Host3: Chunk-server
* Host4: LizardFS Metalogger (Metalogger + Chunk-server)

.. image:: /_images/images/gallery/2016/2016-placeholder/20161007-lizardfs-testsystem.png
    :scale: 50
    :alt: Testsystem setup
    :align: center

I have deployed the chunk-server role on all hosts, even though this is against best practices. I needed something to play with, so a higher number of chunk-servers was preferable to raw throughput.

Configuration
-------------

Setup
"""""

The environment has been set up with Vagrant/VirtualBox on CentOS 7.1. The shared network between the nodes is 172.16.34.0/24.

Vagrant
'''''''

The machine definitions look as follows:

.. code::

    config.vm.define "lizard01" do |ld1|
      ld1.vm.box = "centos71"
      ld1.vm.box_check_update = false
      ld1.vm.network "private_network", ip: "172.16.34.10"
      ld1.vm.provision :shell, path: "lizard/bootstrap.lizard01.example.com.sh"
      ld1.vm.provider "virtualbox" do |vb|
        vb.memory = "1024"
      end
    end

    config.vm.define "lizard02" do |ld2|
      ld2.vm.box = "centos71"
      ld2.vm.box_check_update = false
      ld2.vm.network "private_network", ip: "172.16.34.11"
      ld2.vm.provision :shell, path: "lizard/bootstrap.lizard02.example.com.sh"
      ld2.vm.provider "virtualbox" do |vb|
        vb.memory = "528"
      end
    end

    config.vm.define "lizard03" do |ld3|
      ld3.vm.box = "centos71"
      ld3.vm.box_check_update = false
      ld3.vm.network "private_network", ip: "172.16.34.12"
      ld3.vm.provision :shell, path: "lizard/bootstrap.lizard03.example.com.sh"
      ld3.vm.provider "virtualbox" do |vb|
        vb.memory = "528"
      end
    end

    config.vm.define "lizard04" do |ld4|
      ld4.vm.box = "centos71"
      ld4.vm.box_check_update = false
      ld4.vm.network "private_network", ip: "172.16.34.13"
      ld4.vm.provision :shell, path: "lizard/bootstrap.lizard04.example.com.sh"
      ld4.vm.provider "virtualbox" do |vb|
        vb.memory = "1024"
      end
    end

Node
''''

Each node gets its host name set during the bootstrap process:

.. code::

    #!/bin/bash

    # Host specific stuff
    hostnamectl set-hostname lizard01

    # Include package installation
    source /vagrant/lizard/install.yum.sh

    # Install lizardfs master
    source /vagrant/lizard/install.lizardfs-master.sh

    # Install Webinterface
    source /vagrant/lizard/install.cgiserv.sh

    # Install Chunk-server
    source /vagrant/lizard/install.chunk.sh

The rest of the installation is sourced from other files during the bootstrap process to keep the management easier. The example above shows the bootstrap file for the master server (lizard01) only; for the other nodes, packages have been added or removed accordingly. The bootstrap files are the LizardFS Quick Start Guide translated into bash scripts and extended by OS-specific settings such as package management, service configuration and firewall settings.
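With the Vagrantfile and the bootstrap scripts in place, the whole environment comes up with the usual Vagrant workflow. A short usage example, using the machine names defined above:

.. code:: bash

    # Bring up all four nodes and run their bootstrap scripts
    $ vagrant up lizard01 lizard02 lizard03 lizard04

    # Log into the master node for a closer look
    $ vagrant ssh lizard01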
Master
''''''

The master gets the most packages installed:

* Master
* Chunk-server
* CGI Web-interface

The installation of the lizardfs-master server:

.. code::

    # Install the lizard master
    #lizardfs-client lizardfs-adm
    yum install -y lizardfs-master lizardfs-adm

    # Create empty metadata file
    cp /var/lib/mfs/metadata.mfs.empty /var/lib/mfs/metadata.mfs

    # Create example configuration
    cp /etc/mfs/mfsexports.cfg.dist /etc/mfs/mfsexports.cfg
    cp /etc/mfs/mfsmaster.cfg.dist /etc/mfs/mfsmaster.cfg

    # Allow local hosts to access the test machine
    echo '172.16.34.0/24 / rw,alldirs,maproot=0' >> /etc/mfs/mfsexports.cfg

    # Enable lizard master
    echo 'LIZARDFSMASTER_ENABLE=true' >> /etc/default/lizardfs-master

    # Start the service
    service lizardfs-master restart
    chkconfig lizardfs-master on

    # Configure firewall
    firewall-cmd --zone public --add-port 9419/tcp --permanent
    firewall-cmd --zone public --add-port 9420/tcp --permanent
    firewall-cmd --zone public --add-port 9421/tcp --permanent
    firewall-cmd --zone public --add-port 9422/tcp --permanent
    firewall-cmd --reload

Without the complete firewall configuration on all nodes I experienced problems copying files to their destination. I suspect that at least one chunk-server did not have the correct firewall settings applied, which made the copy process fail at a certain point. After reviewing and fixing the firewall settings, the time required to copy the files, the overall performance and the error messages in `/var/log/messages` on the LizardFS-master all improved.

CGI
'''

The CGI web-interface on the lizardfs-master is configured as follows:

.. code::

    # Install
    yum install -y lizardfs-cgiserv
    echo 'LIZARDFSCGISERV_ENABLE=true' > /etc/default/lizardfs-cgiserver

    # Start service
    service lizardfs-cgiserv start
    chkconfig lizardfs-cgiserv on

    # Firewall
    firewall-cmd --zone public --add-port 9425/tcp --permanent
    firewall-cmd --zone public --add-port 9425/tcp

The web-interface is then available at http://172.16.34.10:9425.

Chunk-server
''''''''''''

Each chunk-server is a file node for the master and shares the following setup:

.. code::

    yum install -y lizardfs-chunkserver
    cp /etc/mfs/mfschunkserver.cfg.dist /etc/mfs/mfschunkserver.cfg
    cp /etc/mfs/mfshdd.cfg.dist /etc/mfs/mfshdd.cfg

    # Set up the master in the hosts file
    echo '172.16.34.10 mfsmaster mfsmaster.example.com lizard01 lizard01.example.com' >> /etc/hosts
    echo '172.16.34.11 lizard02 lizard02.example.com' >> /etc/hosts
    echo '172.16.34.12 lizard03 lizard03.example.com' >> /etc/hosts
    echo '172.16.34.13 lizard04 lizard04.example.com' >> /etc/hosts

    # Configure mountpoint
    mkdir /data
    echo '/data' >> /etc/mfs/mfshdd.cfg
    chown -R mfs:mfs /data

    # Start chunkserver
    service lizardfs-chunkserver start
    chkconfig lizardfs-chunkserver on

    # Firewall config
    firewall-cmd --zone public --add-port 9422/tcp --permanent
    firewall-cmd --zone public --add-port 9420/tcp --permanent
    firewall-cmd --reload

Metalogger
''''''''''

The Metalogger is configured on the fourth node only, alongside a chunk-server.

.. code::

    # Install package
    yum install -y lizardfs-metalogger

    # Set configuration
    cp /etc/mfs/mfsmetalogger.cfg.dist /etc/mfs/mfsmetalogger.cfg

    # Add host entry
    echo '172.16.34.10 mfsmaster mfsmaster.example.com lizard01 lizard01.example.com' >> /etc/hosts

    # Enable logger as a service
    echo 'LIZARDFSMETALOGGER_ENABLE=true' > /etc/default/lizardfs-metalogger

    # Service management
    service lizardfs-metalogger start
    chkconfig lizardfs-metalogger on
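The client side is not part of the bootstrap scripts shown above. As a minimal sketch (assuming the `lizardfs-client` package and the `mfsmaster` hosts entry from the chunk-server setup), mounting the export on one of the nodes could look like this:

.. code:: bash

    # Install the client tools (assumed to provide mfsmount and the lizardfs command)
    $ yum install -y lizardfs-client

    # Mount the export from the master into /mnt/lizardfs
    $ mkdir -p /mnt/lizardfs
    $ mfsmount /mnt/lizardfs -H mfsmaster

The mount point `/mnt/lizardfs` is the one used in the goal and trash examples further down.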
Firewall
""""""""

I am missing an overview from LizardFS of which component needs access to where and what. So far I have configured the following, which seems to work:

.. code::

    master:
        9419/tcp incoming   metalogger/shadow master
        9420/tcp incoming   chunkserver
        9421/tcp incoming   client/mount
        9424/tcp incoming   tapeserver

    chunkserver:
        9422/tcp incoming   client/mount

So the communication looks like this:

.. code::

    extern               -> lizardfs-cgi         : 9425/tcp
    extern               -> lizardfs-master      : 9421/tcp
    lizardfs-chunkserver -> lizardfs-master      : 9420/tcp
    lizardfs-metalogger  -> lizardfs-master      : 9419/tcp
    lizardfs-master      -> lizardfs-chunkserver : 9422/tcp

This results in the following open ports on the nodes:

.. code::

    node1: 94[19-22]/tcp
    node2: 9422/tcp
    node3: 9422/tcp
    node4: 9422/tcp

Labels
""""""

A chunk-server node can be given a label, e.g. based on its location or anything else. Labels can be re-used within the goal configuration to assign files or directories to a certain location. For testing purposes, each node got a label according to its name:

.. code::

    lizard01 -> node1
    lizard02 -> node2
    lizard03 -> node3
    lizard04 -> node4

Those labels are re-used in the goal configuration. Changing a label requires the lizardfs-chunkserver service to be reloaded.

Goals
"""""

Goals are like policies. They define where to put the data and how many copies to keep. Their configuration is a bit hidden, but understandable.

* There are 40 possible goals (policies), all defined on the lizardfs-master in `/etc/mfs/mfsgoals.cfg`. Fewer goals than the maximum can be defined.
* The structure of a definition is: `id name : label`

  * *id* is a numeric value in the range 1 to 40.
  * *name* is a custom string, like 'ssd', 'default' or anything else.
  * *label* is a string used as a label on at least one of the chunk-servers. Using the underscore '_' as label means any available chunk-server.

**Example**:

.. code::

    # Goal '3', a copy on three random chunk-servers
    3 3 : _ _ _

The default goal is '1', which keeps only a single copy of the file on one chunk-server. This shows up as *endangered files* in the CGI web-interface and can be resolved by changing the goal configuration or assigning a different goal to a file/directory.

For testing I used two goals:

.. code::

    2 special : node4 node3
    3 default : node1 node2 node3

* Goal *special*: some "special" files which shall be placed on *node4* and *node3* only.
* Goal *default*: files shall be evenly distributed between *node1*, *node2* and *node3*.

The assignment of goals happens from a client which has already mounted the lizardfs export into its file-system. Then it's basically:

.. code:: bash

    $ lizardfs setgoal <goal name> /full/path/to/directory/or/files

**Example**: Mounted in `/mnt/lizardfs`, the folder *special-files* and all files it contains shall get the goal named *special*:

.. code:: bash

    $ lizardfs setgoal -r special /mnt/lizardfs/special-files

Disks
"""""

The configuration file `/etc/mfs/mfshdd.cfg` defines which folders are used for storing data on the local machine. Changes require the `lizardfs-chunkserver` service to be reloaded/restarted.

Concepts
--------

Trash
"""""

Removed files are moved into a trash location, where they can be retrieved until a timeout period has passed. The trash location is not visible from the default mount point, but must be mounted as meta-data:

.. code:: bash

    $ mfsmount /mnt/lizardfs-meta -o mfsmeta

The sub-folder *trash* contains the deleted files. To restore them, they only need to be moved into the sub-directory `/trash/undel`.
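A short usage sketch, assuming the meta mount from above (the entry name is a placeholder for whatever `ls` shows for the deleted file):

.. code:: bash

    # List the deleted files currently kept in the trash
    $ ls /mnt/lizardfs-meta/trash

    # Restore one of the listed entries by moving it into the undel sub-directory
    $ mv '/mnt/lizardfs-meta/trash/<entry>' /mnt/lizardfs-meta/trash/undel/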
Files that had the default goal *1* assigned (a single chunk-server) and therefore had the status *endangered* were still shown as *endangered* after they had been deleted. Only after the trash timeout and the automatic clean-up did those files disappear from the CGI web-interface. This can be confusing; manually removing those files from the trash makes the status clearer. During this testing I set the timeout period down to 30 seconds, so I did not have to deal with it.

Issues
------

Documentation
"""""""""""""

The documentation of LizardFS seems to be a bit thin. The quick start guide on www.lizardfs.com works nicely, but the PDF document with the more technical details lacks a bit of depth IMHO. I would be happy to contribute if required.

Apart from that, a playground for trying out the commands and seeing how LizardFS works should be considered, in order to get used to the behavior and to test configuration changes.

Time-stamps
"""""""""""

In this test environment the time-stamps in `/var/log/messages` referred to a different timezone than the one the servers were set to. All nodes had been set up with the wrong timezone and were adjusted afterwards, but the log entries were still written two hours in the past. This was fixed after all machines had been restarted.

Switching manually to shadow master
-----------------------------------

Running a shadow master beside the production master is quite simple. The switch-over is trivial as well, but holds a few traps (a minimal configuration sketch follows after the list):

* As described in the test setup, the host "mfsmaster"/"mfsmaster.example.com" is placed in */etc/hosts*. Taking that node out requires updating this information on all remaining hosts as well. A central DNS entry would make the management easier, but would also bring a DNS timeout window into play. The same applies to all clients that mount the export from the lizardfs-master.
* The configuration on the shadow master must be identical to the production master. A different goal configuration will result in a re-balancing process that can lower the redundancy of the chunks. Since the configurations are not synced automatically, third-party tools (e.g. configuration management) can be put to good use here.
* Switching from the master to the shadow worked without a problem. Switching back, however, required a restart of the chunk-server and metalogger services on the connected nodes. All of them.
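The shadow master itself was not part of the scripted setup above. The following is only a minimal sketch of what it could look like; it assumes an additional node (the name lizard05 is made up) and the `PERSONALITY` / `MASTER_HOST` options of the master configuration, so verify the option names against the mfsmaster.cfg shipped with your version:

.. code:: bash

    # On an additional node (e.g. lizard05) install the same master package
    yum install -y lizardfs-master lizardfs-adm
    cp /etc/mfs/mfsexports.cfg.dist /etc/mfs/mfsexports.cfg
    cp /etc/mfs/mfsmaster.cfg.dist /etc/mfs/mfsmaster.cfg

    # Keep mfsexports.cfg and mfsgoals.cfg identical to the production master,
    # then run this instance as a shadow pointing to the current master
    echo 'PERSONALITY = shadow' >> /etc/mfs/mfsmaster.cfg
    echo 'MASTER_HOST = mfsmaster' >> /etc/mfs/mfsmaster.cfg

    service lizardfs-master start
    chkconfig lizardfs-master on

    # Manual switch-over: promote the shadow by changing its personality to master
    # and re-point the mfsmaster entry in /etc/hosts on all nodes, as described above
    #sed -i 's/^PERSONALITY = shadow/PERSONALITY = master/' /etc/mfs/mfsmaster.cfg
    #service lizardfs-master restart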