for the geeks who may be reading:
Does anybody know anything about grid computing? Our lab has been limping along with pmake for years, but it's really not scaling well. We're looking into moving to a different grid computing system, one that's easier to work with -- ideally with a CPU-scavenging architecture.
Our current favorite (well, my current favorite) seems to be Condor, with the Sun Grid Engine as the runner-up. PBS (no, not PBS) has also been suggested, but I'm not at all convinced about its support -- it seems to have gone closed-source.
My question for any of you: have you used any of these? Was it difficult? Would you recommend it? We have an unwieldy cluster of some 200 nodes (with ~300 CPUs among them) and a small number of master fileservers that share a common NFS space. It would be neat to get some of the features of the Condor supersystem, but that's not really critical. What is critical is that we move our lab to a system that is supported by somebody outside our lab: we're a speech lab, not a parallel computing lab. We don't have the time or expertise to build clever parallel computing architectures. We'd love to leave that to the experts -- and to be able to file bugs that other people will earn their degrees fixing.
Any advice? Quite honestly, I'm not really expecting any responses, but who knows who's paying attention out there? Who's doing high-throughput parallel computing on many nodes?
evan?
xaosenkosmos?
(evan, don't say "Google File System" unless they want to share it with us!)