

Bridges: Processing large number of large files

Hey guys,

Do you have suggestions on the best way to submit batch jobs on Bridges for a project with a large number of large files?

I have a project where I need to process around 100,000 .csv.gz files, averaging 500 MB each. I'm currently processing them with grep | sed | awk, and each file takes roughly half an hour on one RM-shared core.
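For reference, the per-file processing looks roughly like the sketch below. The actual grep/sed/awk patterns here are placeholders (my real filters are more involved), but the shape of the pipeline is the same: decompress with zcat, filter, normalize, aggregate.

```shell
#!/bin/sh
# Sketch of the per-file pipeline. The filter patterns are placeholders,
# not the real ones: drop comment lines, normalize ; to , then sum column 2.
process_one() {
    zcat "$1" \
        | grep -v '^#' \
        | sed 's/;/,/g' \
        | awk -F, '{ sum += $2 } END { print sum }'
}
```

Each invocation is independent of the others, which is why the work splits naturally into one job per file.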

My thinking is to submit 100,000 RM-shared batch jobs, each processing one file. But my concern is that my later jobs will get lower priority because I'll already have submitted/completed so many jobs.
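Instead of 100,000 separate submissions, I was wondering whether something like a Slurm job array would work here; a minimal sketch of what I have in mind is below. The partition name `RM-shared`, the `filelist.txt` name, and the `%50` throttle are my assumptions, not something I found in the PSC docs.

```shell
#!/bin/bash
#SBATCH -p RM-shared          # assumed partition name
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH -t 01:00:00
#SBATCH --array=1-1000%50     # 1000 tasks, at most 50 running at once (hypothetical throttle)

# filelist.txt (hypothetical): one .csv.gz path per line.
# Each array task picks the line matching its task ID.
file=$(sed -n "${SLURM_ARRAY_TASK_ID}p" filelist.txt)
zcat "$file" | grep -v '^#' | awk -F, '{ sum += $2 } END { print sum }' > "${file%.csv.gz}.out"
```

The `sed -n "Np"` trick just prints line N of the file list, so one submission covers the whole batch.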

I can't find such a mechanism in the PSC documentation, but I'd think it's a common need on supercomputers. Can anyone help me verify that? Any suggestions are welcome.