Computing Resources

To name the cluster, a competition was held among the DCL group members of that time. Adriana Noo, former DCL member and ECE Network Administrator, won first place with the name "Virgo", after its namesake, the Virgo supercluster of galaxies.

During the 2003-2004 academic year, a 40-CPU Beowulf [Beo] computer cluster was designed and constructed. The cluster forms the nucleus of the DCL and consists of a front-end node and 20 compute nodes, each with two CPUs, interconnected by three 24-port gigabit Ethernet switches. The front-end node schedules work for the compute nodes and communicates with the outside world, while the compute nodes perform the desired computations. The nodes and switches form a private network: the compute nodes communicate only with each other and are reachable from outside only through the front-end node. Thus, the architecture is distributed-memory (message passing over the network) rather than shared-memory parallel.
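
As an illustration of this distributed-memory model, the following is a minimal MPI sketch. It is a hypothetical example rather than part of the cluster's installed software, and it assumes an MPI implementation (such as MPICH or Open MPI) is available on the nodes. Rank 0 plays the coordinating role, while the remaining ranks, launched on the compute nodes, do the work and return results over the private network:

    /* Minimal sketch of the distributed-memory model described above.
     * Hypothetical example: assumes an MPI implementation (e.g., MPICH
     * or Open MPI) is installed on the compute nodes. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, len;
        char name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank   */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of ranks */
        MPI_Get_processor_name(name, &len);     /* node name, e.g. c0-2  */

        double partial = (double)rank;          /* stand-in for real work */
        double total = 0.0;

        /* Partial results travel over the private gigabit Ethernet
         * network; only the front-end node talks to the outside world. */
        MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("%d ranks reporting from %s; sum of ranks = %g\n",
                   size, name, total);
        MPI_Finalize();
        return 0;
    }

Such a program would typically be compiled with mpicc and launched from the front-end node with mpirun, which starts one or more ranks on each compute node.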


In 2008 the cluster was upgraded and expanded: from 20 to 23 nodes, from 32-bit Pentium 4 CPUs to 64-bit quad-core Xeon CPUs, and from 4 GB of SDRAM to 8 GB of DDR2 memory.

In 2012, 12 nodes were upgraded with Tesla C2070 GPUs and additional RAM.


The current cluster configuration consists of the following:

Front End Node

The front-end node architecture:

  • Two 2.66 GHz Intel Quad Core XEON E5430 CPUs
  • Tyan S5396 motherboard with two Gigabit Ethernet interfaces
  • 8 GB memory
  • Quadro FX 1700 Video Adapter
  • 250 GB disk
  • Two 1 TB disks configured as a RAID 1 mirror
  • DVD drive
  • 3U rack mount case and Antec TP 850 W power supply

Standard Nodes

9 rack-mounted computers (c0-2, c0-12, c0-13, c0-14, c0-15, c0-16, c0-17, c0-18, c0-19) each with:

  • Two 2.66 GHz Intel Quad Core XEON E5430 CPUs
  • Tyan S5396 motherboard with two Gigabit Ethernet interfaces
  • 8 GB memory
  • Nvidia GeForce 7300 SE Video Adapter
  • 250 GB disk
  • DVD drive
  • 3U rack mount case and Antec EPS 550 W power supply

High Memory Nodes

2 rack-mounted computers (c0-21, c0-22) each with:

  • Two 2.66 GHz Intel Quad Core XEON E5430 CPUs
  • Tyan S5397 motherboard with integrated video and two Gigabit Ethernet interfaces
  • 64 GB memory
  • 250 GB disk
  • DVD drive
  • 3U rack mount case and Antec EPS 550 W power supply

GPGPU Nodes

12 rack-mounted computers (c0-0, c0-1, c0-3, c0-4, c0-5, c0-6, c0-7, c0-8, c0-9, c0-10, c0-11, c0-20) each with:

  • Two 2.66 GHz Intel Quad Core XEON E5430 CPUs
  • Tyan S5396 motherboard with two Gigabit Ethernet interfaces
  • 16 to 32 GB memory
  • Tesla C2070 GPU
  • 250 GB disk
  • DVD drive
  • 4U rack mount case and Antec TPQ 1200 W power supply




The internal cluster network is controlled by three 24-port gigabit Ethernet switches.

The system software consists of CentOS (RHEL) with ROCKS cluster management software (see Design Decisions below).

Additional peripherals and equipment include:



Design Decisions

The processor chosen for the system was the 64-bit, 2.66 GHz Intel Xeon with hyper-threading, a 1333 MHz front-side bus, and 12 MB of cache. This CPU has a well-documented performance record in clusters and an excellent price/performance ratio; each cost approximately $500. Hyper-threading allows each CPU core to execute two threads in parallel, and the 1333 MHz front-side bus provides fast memory access. Faster CPUs were available, but at a significantly higher cost.
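
To make the per-node parallelism concrete, the sketch below is a hypothetical illustration (not code installed on the cluster) that uses OpenMP, assuming a compiler with OpenMP support such as gcc -fopenmp, to spread a loop over the hardware threads of one node's two quad-core CPUs. It complements the MPI sketch above, which distributes work across nodes:

    /* Hypothetical sketch of thread-level parallelism within one node.
     * Assumes an OpenMP-capable compiler (e.g., gcc -fopenmp). */
    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        const int n = 1 << 20;
        double sum = 0.0;

        /* With two quad-core CPUs per node, at least eight hardware
         * threads are available to the OpenMP thread team. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < n; i++)
            sum += 1.0 / (double)(i + 1);   /* partial harmonic sum */

        printf("max threads: %d, sum = %f\n", omp_get_max_threads(), sum);
        return 0;
    }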

The cluster employs CentOS (RHEL) [Red] as the operating system, with ROCKS [Roc] software to manage and monitor the system. ROCKS is open-source software distributed by the National Partnership for Advanced Computational Infrastructure (NPACI). It provides the mechanism to create a well-defined, secure, patched software installation for all nodes. Fast installation and reinstallation of this software becomes the "basic management tool" when problems arise, rather than in-place troubleshooting and ad hoc modifications. This greatly reduces failures due to inconsistent node configurations (all compute nodes run the same software) or due to insecure, unpatched versions. Monitoring is provided through Ganglia, which takes snapshots of node operation (CPU and memory usage, I/O, node up/down status, etc.) approximately every five minutes, or at an interval of the user's choosing. ROCKS has proven very useful for management of the cluster.
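
As an illustration of how the Ganglia data might be inspected programmatically, the small C program below is a hypothetical sketch, not part of the cluster's tooling. It connects to a gmond daemon and copies the XML snapshot of cluster state that gmond serves, assuming Ganglia's default TCP port of 8649; the host name is a placeholder and would depend on the actual configuration:

    /* Hypothetical sketch: fetch Ganglia's XML state dump from gmond.
     * Assumes gmond serves its XML on the default TCP port (8649); the
     * default host below is a placeholder. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <netdb.h>
    #include <sys/socket.h>

    int main(int argc, char **argv)
    {
        const char *host = (argc > 1) ? argv[1] : "localhost";
        const char *port = (argc > 2) ? argv[2] : "8649";

        struct addrinfo hints, *res;
        memset(&hints, 0, sizeof hints);
        hints.ai_family = AF_UNSPEC;
        hints.ai_socktype = SOCK_STREAM;
        if (getaddrinfo(host, port, &hints, &res) != 0) {
            fprintf(stderr, "cannot resolve %s:%s\n", host, port);
            return 1;
        }

        int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) < 0) {
            perror("connect");
            return 1;
        }
        freeaddrinfo(res);

        /* gmond writes the XML description of the cluster and then
         * closes the connection; copy it to standard output. */
        char buf[4096];
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0)
            fwrite(buf, 1, (size_t)n, stdout);

        close(fd);
        return 0;
    }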

Particular attention was paid to heat generation, in response to considerable discussion of the issue on cluster web sites. Many manufacturers market blade servers in 1U and 2U configurations, but these require nonstandard part configurations and may, for example, not support expansion cards. In addition, CPUs at higher clock speeds may not be usable because of the constricted space available for removing heat. For these reasons, Antec 3U enclosures were used for the basic nodes, and Chenbro 4U enclosures were chosen for the nodes containing the Tesla GPUs to accommodate a much larger power supply and larger fans. Additionally, the lab that houses the cluster is cooled by a dedicated 36,000 BTU/hr air conditioner.

References

1. [Beo] http://www.beowulf.org
2. [Red] http://www.redhat.com
3. [Roc] http://rocks.npaci.edu/Rocks/
4. [Top] http://www.top500.org