Libvirtd Unable to Connect when Using RBD Storage Pools

I recently ran across a problem where listing virtual machines through virsh and virt-manager took roughly 45 minutes. It turns out the cause was this patch in libvirt for using RBD fast-diff. In my case the ‘default’ storage pool is actually a link to my RBD storage pool, and that patch checks whether the fast-diff feature is enabled but does not check the image flags to see whether the object map and fast-diff data are marked invalid.

Good News Everyone!

A recent patch solves this. Unfortunately, some distributions have not caught up with it yet (looking at you, Ubuntu Bionic). Hopefully it will make its way down to the various distributions that package libvirtd and the problem will be sorted.
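Until a fixed package lands, one possible workaround is to clear the invalid flags by rebuilding the object map of the affected images, so fast-diff has valid data to work with. This is a minimal sketch, assuming that `rbd info` prints a `flags:` line containing `object map invalid` or `fast diff invalid` when the map needs rebuilding; the helper names and the pool name are hypothetical. Rebuilding reads the whole image, so expect it to take a while on large images.

```shell
#!/bin/sh
# Read `rbd info` output on stdin; succeed if the flags line marks the
# object map or the fast-diff data as invalid.
# Assumption: the line looks like "flags: object map invalid, fast diff invalid".
has_invalid_flags() {
    grep -E '^[[:space:]]*flags:.*(object map invalid|fast diff invalid)' >/dev/null
}

# Check every image in a pool ($1, hypothetical) and rebuild the object map
# of any image with invalid flags. With DRY_RUN set, only report them.
rebuild_invalid_object_maps() {
    pool=$1
    for image in $(rbd ls "$pool"); do
        if rbd info "$pool/$image" | has_invalid_flags; then
            echo "object map invalid: $pool/$image"
            if [ -z "${DRY_RUN:-}" ]; then
                rbd object-map rebuild "$pool/$image"
            fi
        fi
    done
}
```

Running `DRY_RUN=1 rebuild_invalid_object_maps mypool` first lists the affected images without touching them.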

Creating Ceph Bluestore OSDs with Spinning Drives and SSDs for DB/WAL

As a consultant, I work with Ceph through a downstream version of the product, so once in a while I like to catch up on new features and functions that have not yet reached the downstream, supported version. That process has led me to set up my homelab (again) using Ceph Nautilus as the base for storage.

Using ceph-volume

Ceph comes with a deployment and inspection tool called ceph-volume. Much like the older ceph-deploy tool, ceph-volume lets you inspect, prepare, and activate object storage daemons (OSDs). The advantages of ceph-volume include support for LVM and dm-cache, and it no longer relies on or interacts with udev rules.

For my use case I have installed a single Fusion IOMemory card in each of my nodes in order to deploy OSDs with faster storage for the DB and WAL devices. It’s a very good idea to read the BlueStore configuration reference, since BlueStore is the default for new OSD deployments. Take careful note of the recommendations for using a DB and WAL device.

If there is only a small amount of fast storage available (e.g., less than a gigabyte), we recommend using it as a WAL device. If there is more, provisioning a DB device makes more sense. The BlueStore journal will always be placed on the fastest device available, so using a DB device will provide the same benefit that the WAL device would while also allowing additional metadata to be stored there (if it will fit).

Bluestore Configuration Reference

In my case, since I have the Fusion IOMemory card available, I want to create enough partitions to support 11 OSDs and make them as large as possible for the DB device (which also places the WAL on the same partition). My fast media has 931 GB of usable storage; split evenly across all eleven OSDs, that gives partitions of about 84 GB. I like round numbers, so I made the partitions 80 GB, and the deployment command looks something like this.

root@ganymede:~# ceph-volume lvm prepare --bluestore --dmcrypt --data /dev/sdd --block.db /dev/fioa5

Be sure to replace the --data argument with your storage device, and point the --block.db argument at the partition on the fast storage you want to use for that OSD. After that, I run the activation command for all OSDs on the node.

root@ganymede:~# ceph-volume lvm activate --all

Assuming everything has gone as expected, the OSDs will start up and join the cluster, and you’ll get all the speedy goodness of an SSD for the write-ahead log and RocksDB.
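The partition sizing and the per-drive prepare step can be scripted. This is a minimal sketch, assuming a hypothetical pairing of data drives to DB partitions passed as `data:db` arguments. With `DRY_RUN` set it only prints the commands. The sizing helper reproduces the arithmetic above: 931 GB over 11 OSDs is about 84 GB, rounded down to the nearest 10 GB.

```shell
#!/bin/sh
# Split the fast device evenly across the OSDs, then round down to a
# multiple of 10 GB (931 / 11 = 84, which rounds down to 80).
db_partition_size() {
    total_gb=$1
    osd_count=$2
    echo $(( total_gb / osd_count / 10 * 10 ))
}

# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Prepare one OSD per "data:db" pair, then activate everything on the node.
prepare_osds() {
    for pair in "$@"; do
        data=${pair%%:*}   # text before the first colon
        db=${pair#*:}      # text after the first colon
        run ceph-volume lvm prepare --bluestore --dmcrypt --data "$data" --block.db "$db"
    done
    run ceph-volume lvm activate --all
}
```

For example, `DRY_RUN=1 prepare_osds /dev/sdd:/dev/fioa5 /dev/sde:/dev/fioa6` prints the two prepare commands and the activate command so the pairing can be checked before anything runs (the second pair is hypothetical).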

Moving Drives From an Old Ceph Cluster to a New Ceph Cluster

Among the core functions of my homelab is a storage environment based on Ceph. For months I’ve been looking for, buying, and preparing new hardware and a server rack for an update to my lab. For the last week, I’ve been moving data from the old nodes to the new nodes. Today enough data had moved to completely shut down one old node and transfer its hard drives into the new machines. These are my notes on cleaning the drive partitions, preparing the flash device partitions, and adding the OSDs to the new cluster.

Wipe The Drives

I shut down the old node and pulled the hardware without removing any data from the old drives, just in case I needed to restore something to the old cluster. Luckily that was not the case, so I moved on to wiping each drive with the following command.

root@titan:~# wipefs -a /dev/sdc
/dev/sdc: 8 bytes were erased at offset 0x00000218 (LVM2_member): 4c 56 4d 32 20 30 30 31
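Wiping several drives is easier with a loop. This is a minimal sketch; the device names are placeholders and the helper name is hypothetical. Setting `DRY_RUN` runs `wipefs --no-act` first, which reports the signatures that would be erased without writing anything.

```shell
#!/bin/sh
# Erase filesystem/LVM/RAID signatures from each drive given as an argument.
# With DRY_RUN set, wipefs only reports what it would erase.
wipe_drives() {
    for dev in "$@"; do
        if [ -n "${DRY_RUN:-}" ]; then
            wipefs --all --no-act "$dev"
        else
            wipefs --all "$dev"
        fi
    done
}
```

Run `DRY_RUN=1 wipe_drives /dev/sdc /dev/sdd`, double-check the device list, then run it again without `DRY_RUN`.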

Check For LVM Related Data

Some of my old drives were already using LVM and BlueStore. If you try to prepare an old drive that still has PV (physical volume) or LV (logical volume) data on it, the ceph-volume prepare command will fail with something similar to this:

root@europa:~# ceph-volume lvm prepare --bluestore --dmcrypt --data /dev/sdc --block.wal /dev/fioa3 --block.db /dev/fioa4
...
 stderr: Physical volume '/dev/sdc' is already in volume group 'ceph-eebc4ef5-712b-4924-b70c-1df6269fc9a4'
  Unable to add physical volume '/dev/sdc' to volume group 'ceph-eebc4ef5-712b-4924-b70c-1df6269fc9a4'
  /dev/sdc: physical volume not initialized.
...
-->  RuntimeError: command returned non-zero exit status: 5

Remove LVM Related Data

When you need to remove LVM data from a drive, the easiest approach is pvdisplay (to find the VG name) followed by vgremove. Make sure you are looking at the correct device; I shortened the output below.

root@europa:~# pvdisplay
...
  PV Name               /dev/sdc
  VG Name               ceph-eebc4ef5-712b-4924-b70c-1df6269fc9a4
  PV Size               <7.28 TiB / not usable <1.34 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              1907721
  Free PE               0
  Allocated PE          1907721
  PV UUID               LIe071-C7gV-q1tq-iAAb-3V3p-ZA3i-3VEJZX

Then remove the volume group with vgremove, confirming that you want to remove the volume group and the logical volume it contains.

root@europa:~# vgremove ceph-eebc4ef5-712b-4924-b70c-1df6269fc9a4
Do you really want to remove volume group "ceph-eebc4ef5-712b-4924-b70c-1df6269fc9a4" containing 1 logical volumes? [y/n]: y
Do you really want to remove and DISCARD active logical volume ceph-eebc4ef5-712b-4924-b70c-1df6269fc9a4/osd-block-404a4208-0d30-4b9a-a7a1-87a1898e924b? [y/n]: y
  Logical volume "osd-block-404a4208-0d30-4b9a-a7a1-87a1898e924b" successfully removed
  Volume group "ceph-eebc4ef5-712b-4924-b70c-1df6269fc9a4" successfully removed
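Looking up the volume group by device avoids copying the name out of the full pvdisplay listing. This is a minimal sketch using `pvs` to print only the VG name for one PV; the helper names are hypothetical, and `vgremove` still asks for confirmation. `ceph-volume lvm zap --destroy /dev/sdc` is an alternative that removes the Ceph VG and LV for you.

```shell
#!/bin/sh
# Print the volume group that owns a physical volume (empty if none).
vg_for_pv() {
    pvs --noheadings -o vg_name "$1" | tr -d '[:space:]'
}

# Remove the volume group (and the OSD logical volume inside it) found on
# the given drive. vgremove still prompts before each removal.
remove_vg_on_drive() {
    vg=$(vg_for_pv "$1")
    if [ -z "$vg" ]; then
        echo "no volume group found on $1" >&2
        return 1
    fi
    vgremove "$vg"
}
```

Usage: `remove_vg_on_drive /dev/sdc`, answering `y` at each prompt as in the session above.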

Prepare the WAL and DB Devices

I was lucky enough to get my hands on some cheap Fusion IOMemory devices. These are EOL (End of Life), so using them in a production cluster is not recommended. That warning aside, these drives are awesome and are sized just about right for my cluster. I used gdisk to prepare two new partitions: 1 GB for the WAL and 80 GB for the DB (metadata), roughly 10% of the flash device.

root@ganymede:~# gdisk /dev/fioa
GPT fdisk (gdisk) version 1.0.3

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.

Command (? for help): n
Partition number (3-128, default 3):
First sector (6-244140619, default = 20971776) or {+-}size{KMGTP}:
Last sector (20971776-244140619, default = 244140619) or {+-}size{KMGTP}: +1G
Current type is 'Linux filesystem'
Hex code or GUID (L to show codes, Enter = 8300):
Changed type of partition to 'Linux filesystem'

Command (? for help): n
Partition number (4-128, default 4):
First sector (6-244140619, default = 21233920) or {+-}size{KMGTP}:
Last sector (21233920-244140619, default = 244140619) or {+-}size{KMGTP}: +80G
Current type is 'Linux filesystem'
Hex code or GUID (L to show codes, Enter = 8300):
Changed type of partition to 'Linux filesystem'

Command (? for help): x

Expert command (? for help): c
Partition number (1-4): 3
Enter the partition's new unique GUID ('R' to randomize): R
New GUID is 302BDE02-F625-4B33-80F5-5EE0254AADB9

Expert command (? for help): c
Partition number (1-4): 4
Enter the partition's new unique GUID ('R' to randomize): R
New GUID is 2F4EF305-A7BA-42E0-B690-3D3CDCF28B29

Expert command (? for help): w

Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
PARTITIONS!!

Do you want to proceed? (Y/N): y
OK; writing new GUID partition table (GPT) to /dev/fioa.
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot or after you
run partprobe(8) or kpartx(8)
The operation has completed successfully.
root@ganymede:~# partprobe /dev/fioa

A quick note about the above: I dropped into expert mode (x) and set a random GUID (c, then R) on each of the new partitions. Be sure to run partprobe after you finish adding the new partitions and setting their GUIDs.
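The same two partitions can also be created non-interactively with sgdisk, from the same gptfdisk package. This is a minimal sketch, assuming the next free partition numbers are 3 and 4 as in the session above (check with `sgdisk -p /dev/fioa` first). `-n 3:0:+1G` starts at the first free sector, `-t 3:8300` sets the Linux filesystem type, and `-u 3:R` assigns a random unique GUID, mirroring the expert-mode `c`/`R` steps. The helper name is hypothetical.

```shell
#!/bin/sh
# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create a 1 GB WAL partition and an 80 GB DB partition, give each a random
# unique GUID, then ask the kernel to re-read the partition table.
make_wal_db_partitions() {
    dev=$1        # e.g. /dev/fioa
    wal_num=$2    # next free partition number (3 in the session above)
    db_num=$3     # the one after it (4 above)
    run sgdisk \
        -n "$wal_num:0:+1G" -t "$wal_num:8300" -u "$wal_num:R" \
        -n "$db_num:0:+80G" -t "$db_num:8300" -u "$db_num:R" \
        "$dev"
    run partprobe "$dev"
}
```

`DRY_RUN=1 make_wal_db_partitions /dev/fioa 3 4` prints the exact sgdisk and partprobe commands before anything is written.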

Prepare and Activate the OSD

At this point all you should have to do is prepare the OSD.

root@ganymede:~# ceph-volume lvm prepare --bluestore --dmcrypt --data /dev/sdc --block.wal /dev/fioa3 --block.db /dev/fioa4
...
--> ceph-volume lvm prepare successful for: /dev/sdc

Then activate the OSD.

root@ganymede:~# ceph-volume lvm activate --all
...
--> ceph-volume lvm activate successful for osd ID: 4
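To confirm the new OSD actually placed its DB and WAL on the flash partitions, `ceph-volume lvm list` shows the devices for each OSD, and the OSD metadata reports whether BlueFS has dedicated devices. This is a minimal sketch, assuming `ceph osd metadata` includes `bluefs_dedicated_db` and `bluefs_dedicated_wal` keys set to `"1"` when separate devices are in use (verify against your release); the helper names are hypothetical.

```shell
#!/bin/sh
# Succeed if the JSON on stdin has the given key set to "1".
has_dedicated() {
    grep -E "\"$1\": *\"1\"" >/dev/null
}

# Report whether the OSD with ID $1 has a dedicated DB and WAL device.
check_osd_devices() {
    meta=$(ceph osd metadata "$1" --format json)
    for key in bluefs_dedicated_db bluefs_dedicated_wal; do
        if printf '%s\n' "$meta" | has_dedicated "$key"; then
            echo "osd.$1 $key: yes"
        else
            echo "osd.$1 $key: no"
        fi
    done
}
```

For the OSD above, `check_osd_devices 4` should report `yes` for both keys if the WAL and DB partitions were picked up.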

Mounting CephFS From Multiple Clusters to a Single Machine using FUSE

For my new homelab cluster I’ve built a fresh Ceph filesystem to store certain chunks of my data, and I needed to mount both the old and the new filesystem on one of my nodes. Normally I use ceph-fuse through /etc/fstab, so I simply added an entry for each cluster:

root@storage:~# grep fuse /etc/fstab
none	/mnt/storage/ceph	fuse.ceph	ceph.id=admin,ceph.conf=/etc/ceph/ceph.conf,_netdev,defaults  0 0
none	/mnt/storage/ceph-old	fuse.ceph	ceph.id=admin,ceph.conf=/etc/ceph-old/ceph.conf,_netdev,defaults  0 0

The /etc/ceph-old/ directory is a copy of my config files from the older cluster. In /etc/ceph-old/ceph.conf I added the following, since the keyring for that cluster is not in the default path.

[client.admin]
keyring = /etc/ceph-old/ceph.client.admin.keyring

Any time the old cluster’s ceph.conf is used, the old keyring is used with it, and the cluster mounts just fine, as df shows:

Filesystem     Type            Size  Used Avail Use% Mounted on
ceph-fuse      fuse.ceph-fuse  100T   91T  9.4T  91% /mnt/storage/ceph-old
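Since each entry differs only in its mount point and ceph.conf path, a small helper can generate them. This is a minimal sketch; the helper names are hypothetical, and the fstab path is a parameter so it can be tried on a scratch file before touching /etc/fstab. After adding a line, `mount /mnt/storage/ceph-old` mounts just that entry for testing.

```shell
#!/bin/sh
# Build an /etc/fstab entry for a ceph-fuse mount of one cluster.
# $1 = client id, $2 = that cluster's ceph.conf, $3 = mount point
ceph_fuse_fstab_line() {
    printf 'none\t%s\tfuse.ceph\tceph.id=%s,ceph.conf=%s,_netdev,defaults  0 0\n' \
        "$3" "$1" "$2"
}

# Append the entry to an fstab file ($1) unless an identical line exists.
add_fstab_entry() {
    line=$(ceph_fuse_fstab_line "$2" "$3" "$4")
    grep -qxF "$line" "$1" || printf '%s\n' "$line" >> "$1"
}
```

For example, `add_fstab_entry /etc/fstab admin /etc/ceph-old/ceph.conf /mnt/storage/ceph-old` produces the second fstab line shown earlier.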

FreeIPA Certificates Page Displays CertificateOperationError

A fresh install of FreeIPA from the Ubuntu Bionic package displays an error on the ‘Certificates’ page which reads:

IPA Error 4301: CertificateOperationError
Certificate operation cannot be completed: Unable to communicate with CMS (Start tag expected, '<' not found, line 1, column 1)

After some research, it seems the problem has already been resolved upstream and in Ubuntu Cosmic; however, the backport has not yet reached Ubuntu Bionic. I was able to safely apply this commit to the dogtag.py file in /usr/lib/python2.7/dist-packages/ipapython, then restarted FreeIPA, and all was well.

root@ipa:~# ipactl restart
Stopping pki-tomcatd Service
Restarting Directory Service
Restarting krb5kdc Service
Restarting kadmin Service
Restarting named Service
Restarting httpd Service
Restarting ipa-custodia Service
Restarting pki-tomcatd Service
Restarting ipa-otpd Service
Restarting ipa-dnskeysyncd Service
ipa: INFO: The ipactl command was successful

Ubuntu Bionic (actually cloud-init) Reverting Hostname on Reboot

If you’ve changed the hostname on an Ubuntu Bionic install, restarted the node, and then found that the hostname reverted, you may be wondering why. The problem actually stems from the cloud-init scripts and the ‘preserve_hostname’ option.

root@ipa:~# grep -H -n preserve /etc/cloud/cloud.cfg
/etc/cloud/cloud.cfg:15:preserve_hostname: false

Change the value to true in /etc/cloud/cloud.cfg, and the next time you change the hostname and reboot, it will be left intact.
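The edit can be made with sed instead of an editor. This is a minimal sketch, assuming GNU sed (as on Ubuntu) and a hypothetical helper name; it only rewrites a `preserve_hostname: false` line at the start of a line and then prints the result with its line number so you can confirm it.

```shell
#!/bin/sh
# Set cloud-init's preserve_hostname to true in the given config file
# (default /etc/cloud/cloud.cfg) and show the resulting line.
set_preserve_hostname() {
    cfg=${1:-/etc/cloud/cloud.cfg}
    sed -i 's/^preserve_hostname:[[:space:]]*false/preserve_hostname: true/' "$cfg"
    grep -n '^preserve_hostname' "$cfg"
}
```

After running `set_preserve_hostname`, a hostname set with `hostnamectl set-hostname newname` should survive the next reboot.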

FreeIPA WebUI Login Fails with “Login failed due to an unknown reason.”

I’ve been setting up a fresh install of my homelab and trying to get FreeIPA working on Ubuntu Bionic. If you see the “Login failed due to an unknown reason.” error while trying to log in through the web UI, try granting execute permission for all users on the “/var/lib/krb5kdc/” directory.

root@ipa:~# chmod a+x /var/lib/krb5kdc

Try to log in after that; if your problem was the same as mine, you’ll find it working now.
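To check whether a directory is missing that bit before changing it, look at the "others" digit of its mode. This is a minimal sketch, assuming GNU stat (as on Ubuntu); the helper name is hypothetical.

```shell
#!/bin/sh
# Succeed if "others" have execute (search) permission on a directory,
# which is the bit `chmod a+x` adds for users outside the owner and group.
others_can_search() {
    mode=$(stat -c '%a' "$1")   # octal mode, e.g. 700 or 1777
    last=${mode#"${mode%?}"}    # last octal digit = permissions for others
    [ $(( last & 1 )) -eq 1 ]
}
```

Usage: `others_can_search /var/lib/krb5kdc || chmod a+x /var/lib/krb5kdc`.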

Libvirtd: Using RBD for a CDROM Device

Mostly a note to self, but others may find this snippet useful.
I use the following to install from a disk image stored in RBD. Make sure to fill in your own client username, secret UUID, and monitor addresses.

    <disk type='network' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <auth username='libvirt'>
        <secret type='ceph' uuid='a487c228-159b-4197-8be9-8e0e0d2b8bd4'/>
      </auth>
      <source protocol='rbd' name='rados-pool/diskimage.iso'>
        <host name='monitor-1.address' port='6789'/>
        <host name='monitor-2.address' port='6789'/>
        <host name='monitor-3.address' port='6789'/>
      </source>
      <target dev='hda' bus='ide'/>
      <readonly/>
      <alias name='ide0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
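The `<auth>` element refers to a libvirt secret that must already exist on the host and hold the Ceph client's key. This is a minimal sketch, assuming the client is `client.libvirt` (matching `username='libvirt'`) and using the common Ceph secret XML layout; the helper names are hypothetical, and the UUID passed in must match the one in the disk XML.

```shell
#!/bin/sh
# Print libvirt secret XML for a Ceph client key.
# $1 = secret UUID (same as in the disk's <secret> element)
# $2 = Ceph client name, e.g. client.libvirt
ceph_secret_xml() {
    cat <<EOF
<secret ephemeral='no' private='no'>
  <uuid>$1</uuid>
  <usage type='ceph'>
    <name>$2 secret</name>
  </usage>
</secret>
EOF
}

# Define the secret in libvirt, then load the client's key into it.
define_ceph_secret() {
    xml=$(mktemp)
    ceph_secret_xml "$1" "$2" > "$xml"
    virsh secret-define --file "$xml"
    virsh secret-set-value --secret "$1" --base64 "$(ceph auth get-key "$2")"
    rm -f "$xml"
}
```

To get the ISO into the pool in the first place, `rbd import diskimage.iso rados-pool/diskimage.iso` copies a local file into an RBD image with the name used in the `<source>` element.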

Unable to Access WordPress Dashboard after Upgrades

I use DreamHost’s DreamPress product for this website, and as part of that product some caching plugins are installed. Normally this is perfectly fine, but after an upgrade the caching was preventing me from accessing the dashboard. This post explains how I got past it.

Step 1:
Make sure you are logged in and can see the admin bar at the top of your site. From that bar, purge the page and database caches.

Step 2:
From the DreamPress dashboard, find your SSH credentials and use them to log in over SSH.

Step 3:
Once you are logged in over SSH, use ‘cd yourdomain.tld’ to change into your website directory and run ‘wp cache flush’ to clear out any remaining cached data.

That’s it, go ahead and try to access your dashboard again.
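Steps 2 and 3 can be combined into a single command run from your own machine. This is a minimal sketch; the login and directory are placeholders for the values from the DreamPress panel, and the helper name is hypothetical.

```shell
#!/bin/sh
# SSH in, change into the site directory, and flush the WordPress cache.
# $1 = SSH login (user@host), $2 = site directory, e.g. yourdomain.tld
flush_wp_cache() {
    ssh "$1" "cd '$2' && wp cache flush"
}
```

Usage: `flush_wp_cache user@host yourdomain.tld`.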

Use GitLab Personal Access Token as Password

I’m finally writing some more code, so I’ve started to make use of my GitLab instance. One of the first things I turned on was 2FA, but that also meant checking out over https:// could no longer authenticate with my password. The solution is to visit your profile page, click the “Access Tokens” tab, and generate a token that can be used as the password.
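When git prompts for credentials, enter your GitLab username and paste the token as the password. This is a minimal sketch, assuming a hypothetical helper name and placeholder URL; `git clone --config` stores git's in-memory credential cache helper in the new clone, so the token is also remembered for an hour of later pulls and pushes instead of being typed each time.

```shell
#!/bin/sh
# Clone over HTTPS and cache the entered credentials in memory for an hour.
# At the password prompt, paste the personal access token.
clone_with_token_prompt() {
    git clone --config credential.helper='cache --timeout=3600' "$1"
}
```

Usage: `clone_with_token_prompt https://gitlab.example.com/group/project.git`.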