DRBD8


From Docunext Technology Wiki


Summary

DRBD = Distributed Replicated Block Device

  • Distributed = network based (in this context)
  • Replicated = more than one copy of the data, so the system can continue to function if other copies fail
  • Block device = storage media from an operating system perspective

DRBD 8 can be set up in a dual-primary configuration, which is pretty sweet, but that requires a cluster filesystem like GFS or OCFS2. I tried them both and was able to get OCFS2 working, even on a QEMU virtual machine!

Now I'm going to try again with real machines, though they will be minimal. I'm using D201GLY machines with 512MB of RAM and a 30GB IDE laptop hard drive. So far, in my experience, the hard drive has been pretty slow.

hdparm  -T /dev/hda

/dev/hda:
 Timing cached reads:   1074 MB in  2.00 seconds = 536.98 MB/sec

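Well, I guess that's not too bad, although -T only times cached reads (memory and cache throughput, not the disk itself); hdparm -t is the flag that times reads from the device.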

Installing DRBD8 on Debian Etch

apt-get install module-assistant
module-assistant

You'll see something like this:

┌────────────┤ module-assistant, interactive mode ├─────────────┐
│ Welcome to the dialog frontend of module-assistant. This      │
│ user                                                          │
│ interface provides access to the few commands of this         │
│ program.                                                      │
│                                                               │
│  If you wish to learn more, choose the OVERVIEW option.       │
│                                                               │
│  You should better run UPDATE once before you proceed. If     │
│ you wish                                                      │
│ to look for existing module packages for your needs or wish   │
│                                                               │
│     OVERVIEW Show all possible command line commands          │
│     UPDATE   Update the cached package information            │
│     PREPARE  Configure the system to compile modules          │
│     SELECT   Select the module/source packages to work on     │
│     EXIT     Exit the program                                 │
│                                                               │
│                                                               │
│                <Ok>                    <Cancel>               │
└───────────────────────────────────────────────────────────────┘

Then choose UPDATE, followed by PREPARE, which will install the packages you need. Confirm when apt-get asks whether you want to do so.

To use drbd8 on etch stable, you have to use backports. Fortunately, module-assistant can handle the fact that all backports are disabled by default, so just follow the directions at backports.org to get going:

http://www.backports.org/dokuwiki/doku.php?id=instructions

Then you can go back to module-assistant and choose SELECT, then drbd8-module. Then get, build, and install (it will prompt you to ask whether you want to go ahead and install once it's done building)! Once it's done, you can exit and then run "modprobe drbd".

You also have to configure drbd with /etc/drbd.conf

http://www.drbd.org/fileadmin/drbd/doc/8.0.2/en/drbd.conf.html
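For orientation, a dual-primary resource might look roughly like the sketch below. The host names (alpha, bravo), the backing partition /dev/hda3, the addresses and the port are all placeholders made up for illustration, and the file is written to a scratch path rather than /etc/drbd.conf so nothing live gets overwritten. Treat it as a starting point and check every option against the drbd.conf page linked above for your version.

```shell
#!/bin/sh
# Write an example dual-primary drbd.conf to a scratch file for review.
# Host names, disk, addresses and port are placeholders, not real values.
conf="${DRBD_CONF_SKETCH:-./drbd.conf.sketch}"

cat > "$conf" <<'EOF'
resource r0 {
  protocol C;                 # synchronous replication
  net {
    allow-two-primaries;      # needed for a dual-primary setup
  }
  on alpha {
    device    /dev/drbd0;
    disk      /dev/hda3;
    address   10.0.0.1:7788;
    meta-disk internal;
  }
  on bravo {
    device    /dev/drbd0;
    disk      /dev/hda3;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}
EOF

echo "wrote $conf"
```

The names after "on" are expected to match each machine's hostname, and the same file should be used on both machines.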

After that, run:

drbdadm create-md r0

Then start DRBD:

/etc/init.d/drbd start

Then promote the resource on both machines:

drbdadm primary all

Use cat /proc/drbd to see what's up.
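If you want to check the state from a script rather than by eye, something like the helper below could pull out the interesting fields. It is only a sketch: the function name drbd_status is made up, and it assumes the 8.0-style status line with cs: (connection state), st: (roles) and ds: (disk states) fields, so adjust the prefixes if your version prints something different.

```shell
#!/bin/sh
# Print "connection-state roles disk-states" for one DRBD minor number.
# Reads /proc/drbd by default; pass a second argument to parse a saved copy.
# Assumes status lines shaped like: " 0: cs:... st:.../... ds:.../... C r---"
drbd_status() {
    minor="$1"
    src="${2:-/proc/drbd}"
    awk -v m="$minor:" '
        $1 == m {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^cs:/) cs = substr($i, 4)
                if ($i ~ /^st:/) st = substr($i, 4)
                if ($i ~ /^ds:/) ds = substr($i, 4)
            }
            print cs, st, ds
        }' "$src"
}
```

Calling drbd_status 0 on a node prints one line for /dev/drbd0, which is easy to test in an if or a case statement.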

Installing OCFS2

This is easy:

apt-get install ocfs2-tools

Then you need to create /etc/ocfs2/cluster.conf, here's a good reference:

http://xenamo.sourceforge.net/ar01s03.html#id2477161
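Because the file is fussy, generating it can be less error-prone than typing it. The sketch below prints a cluster.conf for any number of name/address pairs. The function name cluster_conf, the node names, the addresses and port 7777 are illustrative assumptions; the layout (a cluster: stanza plus one node: stanza per machine, tab-indented, underscore keys) follows the format as I understand it from the reference above.

```shell
#!/bin/sh
# Emit an OCFS2 cluster.conf on stdout.
# Usage: cluster_conf CLUSTER NAME1 ADDR1 [NAME2 ADDR2 ...]
# Parameter lines are indented with a single tab via printf, and every key
# uses underscores (node_count, ip_port), never hyphens.
cluster_conf() {
    cluster="$1"; shift
    printf 'cluster:\n\tnode_count = %s\n\tname = %s\n' "$(( $# / 2 ))" "$cluster"
    number=0
    while [ "$#" -ge 2 ]; do
        printf '\nnode:\n'
        printf '\tip_port = 7777\n'
        printf '\tip_address = %s\n' "$2"
        printf '\tnumber = %s\n' "$number"
        printf '\tname = %s\n' "$1"
        printf '\tcluster = %s\n' "$cluster"
        number=$((number + 1))
        shift 2
    done
}

# Placeholder nodes; redirect to a file and review it before installing it.
cluster_conf ocfs2 alpha 10.0.0.1 bravo 10.0.0.2
```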

Then you can start the cluster, the init script has more options than most, try:

/etc/init.d/o2cb

and you'll see what I mean.

I use

/etc/init.d/o2cb load

then:

/etc/init.d/o2cb start

I ran into two issues this time:

o2cb_ctl: Unable to load cluster configuration file. Check the file syntax; it's picky about indentation, but in my case I had a - instead of a _.

o2cb_ctl: Cluster "ocfs2" does not exist. I hadn't passed my cluster name to the init script: /etc/init.d/o2cb start "myclustername"
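A quick pre-flight check would have caught the first of those. The sketch below is a rough heuristic, not the parser o2cb_ctl actually uses: the function name check_cluster_conf is made up, and it assumes that parameter lines must start with a tab and that keys never contain a hyphen.

```shell
#!/bin/sh
# Rough lint for an OCFS2 cluster.conf.
# Flags keys containing "-" and parameter lines not indented with a tab.
# Exits non-zero if anything was flagged.
check_cluster_conf() {
    awk '
        /^[ \t]*$/        { next }    # skip blank lines
        /^[a-z]+:[ \t]*$/ { next }    # stanza header such as "node:"
        !/^\t/            { printf "line %d: not tab-indented\n", NR; bad = 1 }
        {
            key = $1
            sub(/=.*/, "", key)       # drop "=value" if there were no spaces
            if (key ~ /-/) { printf "line %d: hyphen in key \"%s\"\n", NR, key; bad = 1 }
        }
        END               { exit bad + 0 }
    ' "$1"
}
```

Run it as check_cluster_conf /etc/ocfs2/cluster.conf before starting the cluster; no output means nothing obvious was found.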

Use mkfs.ocfs2 to create the filesystem you plan to mount, then mount it!
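Roughly, those two steps could look like the sketch below. The device /dev/drbd0 matches the rest of this page, but the label, the mount point and the helper names are placeholders, and by default the commands are only printed (set DRY_RUN=0 to really run them).

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0.
run() {
    printf '%s\n' "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Create the filesystem. Do this on ONE node only.
# -N sets the number of node slots, -L sets a volume label.
make_fs() {
    dev="${1:-/dev/drbd0}"
    run mkfs.ocfs2 -N 2 -L shared "$dev"
}

# Mount it. Do this on every node that should see the filesystem.
mount_fs() {
    dev="${1:-/dev/drbd0}"
    mnt="${2:-/mnt/shared}"
    run mount -t ocfs2 "$dev" "$mnt"
}
```

The cluster has to be up (o2cb started) and the DRBD device has to be primary on a node before mounting there.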

Mounting initially failed with:

ocfs2_hb_ctl: I/O error on channel while starting heartbeat
mount.ocfs2: Error when attempting to run /sbin/ocfs2_hb_ctl: "Operation not permitted"

http://smue.org/us/archives/265-ocfs2-1.2.5-+-drbd-8.0.3-3.html

Unlike the author of the above link, I didn't have to patch anything, but I did have to add unstable to my sources list because ocfs-1.2.5 wasn't in the backports. Works now! :-)

Comments

I like the idea, but I need to figure out the configuration options for two primaries better. I can't seem to get it to be as automatic in its recovery as I'd hoped, even when specifying what it should do after a split brain. If I restart one of the machines, it seems to reconnect OK, but if I disconnect the network, both machines are primary, with the other side as unknown. Then I cannot get them to reconnect no matter what I do. :-(

With two primaries, you have three possibilities after a split brain:

  • 0 primaries
  • 1 primary
  • 2 primaries

DRBD has a recovery policy setting for each case (after-sb-0pri, after-sb-1pri and after-sb-2pri). For the 0-primaries case the choices are too mutually exclusive in my opinion, but the one that looks good to me is discard-zero-changes:

In case one node did not write anything since the split brain became evident, sync from the node that wrote something to the node that did not write anything. In case none wrote anything this policy uses a random decision to perform a "resync" of 0 blocks. In case both have written something this policy disconnects the nodes.

It would be terrific if you could use this, but then default to something else instead of disconnecting.
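To make the quoted policy concrete, here is a toy model of its decision table. It only mirrors the wording of the description above and is not how DRBD implements it; the function name and the "blocks written" arguments are invented for the illustration.

```shell
#!/bin/sh
# Toy model of the discard-zero-changes policy as described above.
# Arguments: blocks written by node A and by node B since the split brain.
discard_zero_changes() {
    a_wrote="$1"
    b_wrote="$2"
    if [ "$a_wrote" -eq 0 ] && [ "$b_wrote" -eq 0 ]; then
        echo "resync 0 blocks (direction picked at random)"
    elif [ "$b_wrote" -eq 0 ]; then
        echo "sync from A to B"
    elif [ "$a_wrote" -eq 0 ]; then
        echo "sync from B to A"
    else
        echo "disconnect"
    fi
}
```

The last branch is the one that bites: as soon as both sides have written anything, the policy gives up and the nodes stay disconnected until someone steps in.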

After wrestling with drbd8 for a while, I decided to update to 8.0.4-1. Maybe that will help. This is still happening though:

drbd0: Writing meta data super block now.
drbd0: conn( StandAlone -> Unconnected ) 
drbd0: receiver (re)started
drbd0: conn( Unconnected -> WFConnection ) 
eth1: no IPv6 routers present

The receiver is restarted, but then it disconnects. Grr. I was also having the dickens of a time getting the device to be primary automatically. That had to be set with drbdsetup. To set up auto primary (or default primary) on a drbd device, do the following:

drbdsetup /dev/drbd0 primary --set-defaults

Actually no that didn't work at all. :-(

I ended up taking a ha.d script that is included in the drbd package and calling it from the drbd init script, followed by mount -a, and added the corresponding entry to /etc/fstab.

I think this might have something to do with it:

always-asbp

And in general, my gut is telling me that the recovery of a displaced system is best handled by custom scripts. Also, one thing that surprised me about drbd is that it matters which node you issue the connect command on.
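As a starting point for such a script, manual split-brain recovery in DRBD 8 usually comes down to choosing one node whose changes get thrown away and reconnecting from there. The sketch below wraps that in two made-up helper functions and only prints the commands unless DRY_RUN=0 is set. The resource name r0 is the one used earlier on this page; which node plays the victim is your call.

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0.
run() {
    printf '%s\n' "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Run on the node whose changes you are willing to discard.
recover_victim() {
    res="$1"
    run drbdadm secondary "$res"
    run drbdadm -- --discard-my-data connect "$res"
}

# Run on the node whose data survives (only needed if it is StandAlone too).
recover_survivor() {
    res="$1"
    run drbdadm connect "$res"
}
```

On the victim, anything mounted from the device has to be unmounted first, since a device that is still in use cannot be demoted to secondary. This asymmetry is probably also why it matters which node the connect is issued on.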
