Cluster Usage Guidelines

From Vision Wiki

Guidelines for General Cluster Etiquette

  1. Communicate -- our system is driven by a distributed peer-to-peer human queuing system. Communicate your needs to others and be receptive to other people's needs. Our group is small enough that open communication is an extremely effective strategy for resource allocation. Whenever necessary, e.g. during deadlines, make your needs known ahead of time. Greg has added a comments section at the bottom of the cluster usage page.
  2. Be flexible -- flexible jobs make it easier to accommodate other users. Make your jobs interruptible and re-launchable. Whenever possible, set jobs up so that they can run either concentrated on a few machines or spread across many. 'Nice' the processes of longer-term jobs. Minimize file IO and memory use.
  3. Job types -- long-term, low-priority jobs should be launched spread across many machines, keeping the load per machine relatively light. Such processes should be 'niced' so that when other users need to run short-term jobs, the long-term job moves into the background. On occasion cluster load is light; in this case, running a relatively short but intense burst of jobs on the cluster is an efficient use of resources. Examples include 10 minutes of use of nearly all the processors, or overnight use of 80% of the processors. This assumes that no one else needs the cluster. The key to running such processes is making them instantaneously interruptible and being receptive to requests for freeing processors. Finally, on occasion users may need 1-2 machines for exclusive use (e.g. if shared memory parallelization is necessary). Again, in all these cases open communication and flexibility are key.
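
The advice above on 'niced', interruptible, re-launchable jobs can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not a required setup for this cluster: the function names (`lower_priority`, `install_stop_flag`, `run_job`), the JSON checkpoint format, and the summing workload are all hypothetical stand-ins for a real job. The idea is to lower the process priority once at startup, save progress periodically, and stop cleanly when asked so that a relaunch resumes where the last run left off.

```python
import json
import os
import signal


def lower_priority(increment=19):
    """Raise this process's niceness so short-term jobs get the CPU first."""
    if hasattr(os, "nice"):  # os.nice exists only on Unix-like systems
        os.nice(increment)


def install_stop_flag():
    """Return a callable that reports True once SIGTERM or SIGINT arrives."""
    flag = {"stop": False}

    def handler(signum, frame):
        flag["stop"] = True

    signal.signal(signal.SIGTERM, handler)
    signal.signal(signal.SIGINT, handler)
    return lambda: flag["stop"]


def load_checkpoint(path):
    """Load saved progress, or start fresh if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"next_item": 0, "total": 0}


def save_checkpoint(path, state):
    """Write to a temporary file, then rename, so a kill never leaves a torn file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run_job(items, checkpoint_path, should_stop, checkpoint_every=100):
    """Process items (hypothetical workload: summing them), resuming from the checkpoint."""
    state = load_checkpoint(checkpoint_path)
    for i in range(state["next_item"], len(items)):
        if should_stop():  # someone asked for the processor back
            break
        state["total"] += items[i]
        state["next_item"] = i + 1
        # Checkpoint rarely, in keeping with the advice to minimize file IO.
        if state["next_item"] % checkpoint_every == 0:
            save_checkpoint(checkpoint_path, state)
    save_checkpoint(checkpoint_path, state)
    return state
```

A long-term job would typically call `lower_priority()` and `should_stop = install_stop_flag()` once at startup, then `run_job(work_items, "job.json", should_stop)`. Killing the process with SIGTERM and launching it again later, on the same or a different machine that can see the checkpoint file, continues from the saved position.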