Jobs running abnormally slow on Comet

6/25/17 6:40 PM
Recently, some jobs on Comet seem to be running a factor of 4 slower than usual (that is, they run slowly from the start and throughout the entire calculation). I have cancelled such jobs and started them over, and then they run at the expected speed. This has only occurred since I started running on 32 nodes (768 cores). What could cause the exact same code to behave like this?

I am going to start keeping track of which nodes are being used. Could the problem be a single faulty, slow node? My code runs at the pace of the slowest node.
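One way to track this might look like the sketch below. It is only an illustration under assumptions: each node runs the same small, purely local loop and prints its hostname with the elapsed time, and the per-node times are then gathered from the job output by hand. The function names, the loop size, the factor-of-2 threshold, and the hostnames in the usage note are all hypothetical, not anything specific to Comet.

```python
import socket
import statistics
import time


def time_fixed_work(n=200_000):
    """Time a fixed, purely local loop so every node does identical work."""
    start = time.perf_counter()
    total = 0
    for i in range(n):
        total += i * i
    return time.perf_counter() - start


def flag_slow_nodes(timings, factor=2.0):
    """Return hostnames whose time exceeds `factor` times the median time."""
    median = statistics.median(timings.values())
    return sorted(host for host, t in timings.items() if t > factor * median)


if __name__ == "__main__":
    # Run once per node at job start; collect the printed lines afterwards.
    print(socket.gethostname(), f"{time_fixed_work():.4f}")
```

With the collected lines turned into a dictionary of made-up values, `flag_slow_nodes({"node-a": 1.0, "node-b": 1.1, "node-c": 4.2, "node-d": 0.9})` would single out `node-c`. Comparing against the median rather than the mean keeps one very slow node from shifting the baseline it is measured against.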

Any advice on how I might diagnose this?

For the record, this code has run successfully on Stampede (TACC) and Pleiades (NASA) without this problem.