
Xsan, ACLs and the AD/OD Magic Triangle: A Success Story

posted Dec 19, 2008, 5:41 AM by Philip Rinehart   [ updated Dec 19, 2008, 5:41 AM by Greg Neagle ]

Recently at Swarthmore College we set up a new Xsan and workstations to support a small, four-station video editing lab. A major challenge was to maintain as much control as possible and keep the machines as clean as possible. We bound the machines to Active Directory using Directory Services, so that users could log in with their network credentials (which we could also use for access control on the server space), and we also bound them to Open Directory to manage machine-specific settings and policies, such as access and software updates. We've been very pleased with the success of this "Magic Triangle" system (see below). This article details some of the critical pieces of combining it with the Xsan implementation, along with some things to watch out for.

 

We spent a lot of time planning and carefully making sure we had things set up correctly, with the right equipment for each element of the system, before proceeding with the next piece. I've listed some of the sources used during the planning phase in the References section at the end of this article. Make sure you do your homework and fully consider all aspects of your deployment before you start purchasing, and you'll have more success than if you attempt to do things in an ad hoc way.

 

Basics: The Network

For those who are more used to some Linux server clustering setups, Apple's Xserve RAID/Xsan architecture is different. I had to explain this difference a number of times to our systems folks, who initially were convinced we could use a zoned set of spare ports on existing Fibre-Channel switches in our infrastructure. Unfortunately, although the total number of ports available would have been enough, the architecture would not have worked. This was because the Apple storage arrays are not connected through a server "head unit", as they are in some Linux systems, but rather through the same Fibre-Channel fabric switch as both the servers and the clients. Everything in the Xsan system must be on the same switch or linked set of switches, including the Xserve RAID boxes (see Figure 1). There are actually three separate networks involved in an Xsan deployment, only two of which are absolutely necessary: the Fibre-Channel network for data transmission and the private ethernet network for metadata. The third network is the connection to the broader internet. Note that the Fibre-Channel switch and Xserve RAID arrays are not connected to the private ethernet network, but are connected to the broader internet. This allows us to manage each of these devices remotely. The Fibre-Channel fabric switch comes with remote management software, and Apple provides a RAID Admin tool for managing the Xserve RAID arrays.



Figure 1: Xsan Network Connections

Our setup includes the following equipment:

  • Two Xserve G5 cluster nodes, two Xserve RAID arrays – fully populated
  • Four Mac Pro workstations with Apple 30” monitors
  • Four JVC Mini-DV/VHS dual VCR decks
  • Lots of headphone sets, so that multiple users could share a single workstation for team content development without disturbing other nearby users.
  • LogicKey Final Cut Pro keyboards
  • Tripp Lite Isobar surge suppressors
  • Qlogic Fibre-Channel Fabric Switch (16 port model 5202 for the dual power supply).
  • We made use of existing network switches, after making sure we had at least two Cat6 ethernet lines to each workstation (one for general network access, and one for the private Xsan network).
  • All workstations and servers have the 2 Gb Apple Fibre-Channel card installed with two fiber pairs to each workstation.
  • We had to run about 50 feet of fiber optic cable between the video editing space and our data center, which was a separate contract, professionally installed and with fiber jacks at each workstation location as well as coherently terminated in a patch panel in the data center. We actually ran some extras, in case of a need for expansion or cable failure. (Due to changes in server room structure, we ended up with the patch panel in the wrong rack and needed to patch the fiber another 30 feet to the rack holding the Apple equipment and Fibre-channel switch, however that turned out to be no problem, due to good installation quality and labeling.)
  • We were meticulous about labeling every machine, cable and jack, and we relied on those labels as we completed the deployment and made adjustments. We even used color-coded network cables to keep things clear.
  • Apple Remote Desktop 3 is critical to this installation, because the cluster nodes don't have video cards (so as not to slow down the Fibre-Channel card on the bus). If you will be using one of the new Xserve Intel servers as your Meta-data Controller, then this restriction no longer applies, since the new machines come with on-board video and the PCI bus to which the Fibre-Channel card is attached is on a separate riser card. ARD is also tremendously helpful in managing the lab machines, pushing software updates, etc.
  • Two existing large multi-server Uninterruptible Power Supplies. Each piece of equipment in the server room that has two power supplies is connected to both UPSs. The Xserve cluster nodes have only a single power supply each, so they are connected to different UPS lines and each serves as failover for the other.

System Architecture

One of the Xserve cluster nodes serves as the primary Meta-data Controller (MDC) and the other serves as the backup. Although it's not ideal (according to Apple), we use the backup MDC as the Open Directory (OD) Master, and the primary MDC as the OD backup. Since these machines only have a single power supply, this allows us some redundancy in case of power failure to one of our primary Uninterruptible Power Supplies (UPS) for the racks. Every other piece of the server equipment has dual power supplies, allowing us to connect them to both of our big UPS systems. (ASIDE: While you might think, why worry, that's what a UPS is for, we've had one or the other fail because of configuration errors and wiring errors. And because we're in an old building, someone recently tripped a switch that shut off server room power at about 4am on a Sunday morning. It was a switch we didn't even know about! That included power to the DHCP and mail servers, so the notification system went offline with it, a situation that has now been corrected, of course.)

Each computer in the system is on both our public ethernet network and a private ethernet network (a separate zone (VLAN) on a main switch) for the Xsan system. The client workstations are attached via two pairs of fiber-optic cables (Fibre-Channel) to the Qlogic Fibre-Channel fabric switch. In the server room, the Xserve RAID boxes have a single copper Fibre-Channel connection for each controller (two per Xserve RAID) to the Qlogic switch, and the MDC units have single copper Fibre-Channel connections to the Qlogic switch. (We can use copper, since the distance between these components is less than 3 meters.) We do have to keep reminding users to leave the ethernet connections at the workstations alone, since they tend to try to connect them to their personal laptops when we're not looking, while they're working on the video editing equipment.
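As a quick sanity check on the switch size, the connection counts above can be added up. This is only a sketch: it assumes that each fiber pair or copper link occupies exactly one port on the 16-port switch, and the variable names are mine, for illustration only.

```shell
#!/bin/sh
# Port budget for the Fibre-Channel fabric switch, using the counts in this article.
# Assumption: every fiber pair or copper link uses exactly one switch port.
workstations=4
links_per_workstation=2     # two fiber pairs per workstation
raid_arrays=2
controllers_per_array=2     # one copper link per RAID controller
mdcs=2
links_per_mdc=1             # one copper link per MDC
switch_ports=16             # 16-port Qlogic switch

workstation_ports=$(( workstations * links_per_workstation ))
raid_ports=$(( raid_arrays * controllers_per_array ))
mdc_ports=$(( mdcs * links_per_mdc ))
ports_used=$(( workstation_ports + raid_ports + mdc_ports ))
ports_free=$(( switch_ports - ports_used ))

echo "Fibre-Channel ports used: $ports_used of $switch_ports ($ports_free free)"
```

Under those assumptions the four workstations, four RAID controllers and two MDCs take 14 of the 16 ports, which leaves little room for growth on a single switch.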

Installation

The setup of the Xsan was quite smooth and orderly, primarily because we took our time and did everything in a methodical way, by the book(s) if you will (see below). First we set up the networking and Fibre-Channel switch, then we brought up the Xserve RAID boxes, then we set up the servers and installed the Xsan licenses. (This is also the power-up and boot order you will need to follow when bringing the system back up after downtime.)

It should be noted that the Xserve RAID arrays are quite heavy. Initially we had intended them for the upper half of the server rack in which all of this equipment lives, however for stability and temperature control we decided to locate them nearer the bottom of the rack.

Another issue is that it's recommended that you not use the same controller for both the metadata store and the file store within the Xserve RAID structure. This limits the space you have available, since you only need two out of seven drives in that bay for the metadata RAID 1 set. We also allocate a hot spare for that group and have pulled the remainder of the drives from that bay to serve as replacements in case of drive failure in another bay. Our other bays are set up as RAID 5 sets with a hot spare for each bay. This means that despite having 500 GB hard drives in two fully populated Xserve RAID arrays, we get only about 7 TB of storage space for this Xsan. If we were truly optimizing for throughput, we'd also have an even number of storage controllers, aside from the metadata storage, which would entail another Xserve RAID array, half full, to add the fourth video data controller to the set.
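The "about 7 TB" figure follows from the drive layout just described. Here is the arithmetic as a small sketch. It assumes seven drives per controller bay, one hot spare per video bay, and one drive's worth of RAID 5 parity per bay, and it ignores file system overhead and the difference between decimal and binary units.

```shell
#!/bin/sh
# Rough usable capacity of the video storage, from the layout described above.
# Assumptions: 7 drives per bay, RAID 5 costs one drive of parity per bay,
# and no allowance is made for file system overhead.
drive_gb=500
drives_per_bay=7
total_bays=4            # 2 Xserve RAID arrays x 2 controllers each
metadata_bays=1         # this bay holds only the RAID 1 metadata set and its spare
hot_spares_per_bay=1
parity_drives_per_bay=1

video_bays=$(( total_bays - metadata_bays ))
data_drives_per_bay=$(( drives_per_bay - hot_spares_per_bay - parity_drives_per_bay ))
usable_gb=$(( video_bays * data_drives_per_bay * drive_gb ))

echo "Video bays: $video_bays, data drives per bay: $data_drives_per_bay"
echo "Usable video storage before file system overhead: $usable_gb GB"
```

That comes to 7500 GB in decimal units, so roughly 7 TB once formatting overhead is taken into account, out of 14 TB of raw drive capacity.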

Finally, when all that was in place, we set up and connected the workstations. Each individual machine takes an Xsan license code and must be connected to the SAN. Before moving any further we did some test captures of video content in both standard definition and high definition directly to the Xsan storage space with no problems whatsoever, even when we loaded the system with multiple simultaneous captures. (I would note here that Apple recommends capturing to the local hard drives and then copying content to the Xsan at the end of a session, and that is what we advise our users to do. This also allows us to have the write caches enabled on the Xserve RAID boxes, with battery backup in case of power issues. To clarify: Apple recommends having the write caches enabled for the metadata store, but not for the video storage. Having write caching enabled can slow the data transfer, and is therefore not ideal if you intend to capture directly to the storage array, although we noticed no problems whatsoever in this regard.)

Once that was all in place we then bound the workstations to the OD master with the LDAP Plugin and made sure the Directory Services lookup order was AD first, then LDAP. In OD, we set up a group for access to the workstations and added each workstation to that group. We also set up an access group for Users and added the AD groups that were relevant to that OD group. In AD then, we managed the group memberships, to allow Help Desk staff with the right privileges to add or remove access based on the rules of the facility. The only interaction users have with the OD is when their login fails for lack of access.
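The lookup order can also be checked from a script rather than by eye. The sketch below is an illustration, not something we used in the deployment: check_search_order is a hypothetical helper that reads a search path listing on standard input and succeeds only if an Active Directory node appears before the first LDAPv3 node. The sample listing and the 10.0.0.2 address are made up, and the exact output format of dscl may differ between Mac OS X versions.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if "/Active Directory" appears before "/LDAPv3"
# in the search path text read from standard input.
check_search_order() {
    awk '
        { text = text " " $0 }          # join all input lines into one string
        END {
            ad = index(text, "/Active Directory")
            od = index(text, "/LDAPv3")
            if (ad > 0 && od > 0 && ad < od) exit 0
            exit 1
        }'
}

# Made-up sample listing, used so the sketch runs on any machine.
sample='CSPSearchPath:
 /Local/Default
 /Active Directory/All Domains
 /LDAPv3/10.0.0.2'

if printf '%s\n' "$sample" | check_search_order; then
    echo "AD is searched before LDAPv3"
else
    echo "Search order needs attention"
fi

# On a bound client the input would come from the search node instead, e.g.:
#   dscl /Search -read / CSPSearchPath | check_search_order
```

The helper only looks at the relative position of the two node names, so it does not care whether the listing is on one line or several.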

Managing The System

There is a process for new users of the equipment that includes adding them to the AD group with access first, then setting up their space on the Xsan and changing access permissions to give them full rights to their own folder.

Adding the users to the list can happen outside the Mac environment, since we have the AD integration. I manually set the access to the shared space on the SAN by logging into the OD master with the directory administrator account in Workgroup Manager. I then also set up quotas for each user and for user groups, as needed, using the Xsan Admin application, which we can also use to monitor quota balances.

We set up group space as well, when that's appropriate for a particular project, with group members added to the access list in Workgroup Manager. The permissions work very well to restrict or allow access to users based on their roles in the projects. This permits an outstanding level of user flexibility between individual workstations, which was one of our primary goals for the project. We used to be forced to assign individuals to particular workstations, and this has freed us from that bottleneck, as well as adding significant storage resources for video work.
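We set these permissions in Workgroup Manager, but the same kind of access control entries can be expressed with the chmod +a syntax in Mac OS X. The sketch below only prints the commands rather than running them. The function name, the user names jdoe and supervisor1, the folder path, and the particular lists of rights are all assumptions for illustration, not a record of what our folders actually carry.

```shell
#!/bin/sh
# Print (do not run) chmod +a commands for a user folder on the SAN.
# All names, paths and rights lists here are hypothetical examples.
print_acl_commands() {
    owner=$1
    reviewer=$2
    folder=$3
    inherit="file_inherit,directory_inherit"
    # Full rights for the folder's owner.
    full="read,write,append,execute,delete,list,search,add_file,add_subdirectory,delete_child,readattr,writeattr,readextattr,writeextattr,readsecurity"
    # Review-only rights for a supervisor: can browse and read, cannot change.
    review="read,execute,list,search,readattr,readextattr,readsecurity"

    printf 'chmod +a "%s allow %s,%s" "%s"\n' "$owner" "$full" "$inherit" "$folder"
    printf 'chmod +a "%s allow %s,%s" "%s"\n' "$reviewer" "$review" "$inherit" "$folder"
}

print_acl_commands jdoe supervisor1 "/Volumes/XsanVideo/Users/jdoe"
```

Printing the commands first makes it easy to review the entries before applying anything to a live volume.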

Issues Along the Way

One of the goals of the AD/OD magic triangle was to control access to the high end workstations to only those who have permission to use them. We had some issues with individuals who had physical access, but didn't have permission to use the machines, who were not respecting the rules of the space by using these workstations at off hours. Initially, we observed that clients would allow anyone to log in no matter what groups they were in so long as they had valid AD credentials. Obviously, the MCX (Managed Client Settings in Open Directory) were not taking, but the AD bind was working. We would get various errors in the Client system logs and on the Server logs for OD that suggested the client was either not really bound to the OD or was having trouble connecting.

We were able to solve these issues with some investigation and the help of several of the mailing lists for Mac systems administrators. Most notably, the Apple Client Management list, the Mac OS X Server list and the MacEnterprise list were very helpful.

The critical steps to our success included the following:

  1. Make sure the systems, software and Xsan version were all up to date.
     
  2. Remove all previous directory bind information by deleting (rm -r) the following on the client systems:
    	/Library/Managed Preferences	
    	/Library/Preferences/DirectoryService	
    	/Library/Preferences/DirectoryService/*.plist
  3. Remove all cached MCX information on the client systems by issuing the following commands:
    	
    rm /Library/Caches/com.apple.LaunchServices*.csstore
    /System/Library/CoreServices/mcxd.app/Contents/Resources/MCXCacher -d
  4. Based on a note in the Client Management list discussion at Apple (Dan Ball 9/9-9/29/2006, "Re: MCX not taking on first boot..." reporting a suggestion by Scott Barber of Apple), set up a startup delay to allow the network to load successfully before attempting to connect to the directory servers:
    	
    touch /etc/rc.local
    chown root:wheel /etc/rc.local
    chmod 644 /etc/rc.local
    Edit the new file called rc.local in /etc. Then within that file put:
    	
    #!/bin/bash
    #
    /bin/sleep 10
    exit 0

    Possible Alternative:

    		defaults write /Library/Preferences/com.apple.loginwindow StartupDelay -int 

     (Note: Apple has indicated that the rc.local mechanism may no longer function under Mac OS 10.5 “Leopard”, but some have noted that the loginwindow StartupDelay mechanism doesn't work for them either. Others have suggested re-ordering AD and OD in the Directory Services settings; however, we have users log in with their AD credentials and therefore wanted to make sure that service was checked first, per the AFP548 article.) 

  5. Bind clients to AD with a stable account that has privileges to bind machines. Make sure you specify the groups you need to have admin access to your workstations in the "Allow Management By" field.
     
  6. Deselect other binding methods in Directory Access.app, except for AD, LDAPv3 and SMB, to avoid confusion.
  7. Bind to your OD controller with your Directory Admin credentials, making sure that you have selected SSL. (Note: quite a few folks argue that one should not use an authenticated bind to Open Directory, and have reported trouble with the authenticated bind. Our experience bears this out, unless the startup delay is sufficient to allow all of the network services to fully come up and stabilize. This may therefore be a reasonable way to shorten the restart times for your users.)

    1. Uncheck “Add DHCP-supplied LDAP servers to automatic search policies”. Click "New…" and then enter the server IP address. In my case, since the backup MDC was also serving as the OD Master, I’ve selected the private network address for the Backup MDC rather than use the public DNS name or IP address.

    2. Make sure to check the "Encrypt using SSL" box and deselect the "Use for Authentication" and "Use for Contacts" boxes.
       
  8. Restart clients twice (Note: full reboots now can take quite a long time).
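The cleanup in steps 2 and 3 is destructive, so it can help to preview it before running it on a lab machine. The sketch below wraps the same paths and the same MCXCacher call in a dry-run guard. The DRY_RUN variable and the run and cleanup_directory_state helpers are my own additions for illustration; they are not part of any Apple tool.

```shell
#!/bin/sh
# Dry-run wrapper around the cleanup commands from steps 2 and 3.
# By default nothing is deleted; set DRY_RUN=0 (and run as root) to execute.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

cleanup_directory_state() {
    # Step 2: remove previous directory bind information.
    # The quotes matter: the first path contains a space.
    run rm -r "/Library/Managed Preferences"
    run rm -r "/Library/Preferences/DirectoryService"

    # Step 3: remove cached MCX information.
    for f in /Library/Caches/com.apple.LaunchServices*.csstore; do
        if [ -e "$f" ]; then
            run rm "$f"
        fi
    done
    run /System/Library/CoreServices/mcxd.app/Contents/Resources/MCXCacher -d
}

cleanup_directory_state
```

Run as shown, the script only lists what it would do. The rebind and the two restarts in steps 5 through 8 still have to follow the real cleanup.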

Once we went through these steps, and the machines were back up after two restarts, they were functioning perfectly. Changes in the AD group membership were reflected nearly instantaneously in login attempts at the workstation. All users have been able to access their data on the Xsan without any problems, and supervisors with the correct permissions are able to review (but not change) the files of those for whom they are responsible. We've occasionally had trouble with individual users not successfully launching Final Cut Pro, but those issues have been immediately resolved by deleting the FCP preferences folders from the user's local Preferences folder.

J. Douglas Willen (3/23/2007)

 


References

See this link for more information about the MCXCacher application from Nigel Kersten.

The basics of the “Magic Triangle” can be found in their clearest form in a paper on the AFP548.com web site: http://www.afp548.com/filemgmt_data/files/AD-OD-2.1.pdf (Thanks guys!) There are quite a few articles at the AFP548.com site regarding Active Directory and Open Directory, including other ways to restrict logins to specific individuals with AD if that’s your only need. I strongly recommend reviewing the content of the AD/OD paper before attempting to replicate this installation.

I’ve also had very good luck with Apple’s Xsan documentation (the Xsan 1.4 Administrator’s Guide and the Xsan Tuning Guide) and with two Peachpit Press books from the Apple Pro Training Series: Xsan Quick-Reference Guide, 2nd Edition (2006, Green and Geller) and Optimizing Your Final Cut Pro System (2006, Cullen, Geller, Roberts and Wilt; see “Lesson 15”, pp 619-706). You should start, of course, with the Xsan Quick-Reference Guide for the basics of Xsan architecture. I'd also recommend getting an Apple Developer Connection membership and, if possible, getting access to the Xsan presentation from the 2006 Apple Worldwide Developers Conference.

Last Updated ( Tuesday, 05 June 2007 )