

Choosing resource for large memory jobs?

Hi

We are in the process of submitting a startup allocation request, and I would like to know which compute resource would be most suitable for our usage.

My work will involve:

  • Protein sequence property calculation: First, I'll calculate various sequence-based properties in R for ~75k proteins/mRNAs. Then I'll calculate the pairwise correlation between all possible pairs of the ~75k proteins/mRNAs, and finally store the results for use in developing prediction models. This step requires a large amount of RAM because I need to store the entire ~75k x ~75k correlation matrix.
  • mRNA quantification using RNA-Seq: Process ~100 RNA-Seq datasets to quantify mRNA expression, calculate the pairwise correlation of all mRNAs for every RNA-Seq dataset, and store the results for use in developing prediction models. RNA-Seq processing can be parallelized fairly easily and won't be very memory intensive. However, calculating and saving the correlation matrix requires a large amount of RAM.
  • Combine features: Each correlation score serves as a feature for my machine learning model, so I join these scores into a single file where each row is a protein/mRNA pair and the columns contain the correlation scores from the sequence properties and the RNA-Seq datasets.
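
To make the last step concrete, here is a minimal sketch of the join I have in mind, written in plain Python for illustration (the actual work would be in R). The function name `pair_feature_rows`, the identifiers, and the tiny matrices are all hypothetical; the only assumption is that every correlation matrix uses the same protein/mRNA ordering.

```python
from itertools import combinations


def pair_feature_rows(ids, matrices):
    """Yield one row per unordered pair: (id_a, id_b, score_1, score_2, ...).

    ids      -- list of protein/mRNA identifiers (hypothetical names)
    matrices -- dict mapping a feature name to a square correlation matrix
                (list of lists) whose rows/columns follow the order of `ids`
    """
    names = list(matrices)
    # Correlation matrices are symmetric, so only pairs with i < j are needed.
    for i, j in combinations(range(len(ids)), 2):
        yield (ids[i], ids[j], *(matrices[name][i][j] for name in names))


# Toy data: three proteins, one sequence-property matrix, one RNA-Seq matrix.
ids = ["P1", "P2", "P3"]
matrices = {
    "seq_props": [[1.0, 0.2, -0.5], [0.2, 1.0, 0.7], [-0.5, 0.7, 1.0]],
    "rnaseq_01": [[1.0, 0.9, 0.1], [0.9, 1.0, 0.3], [0.1, 0.3, 1.0]],
}

rows = list(pair_feature_rows(ids, matrices))
for row in rows:
    print(row)
```

With ~75k proteins/mRNAs this table would have roughly 2.8 billion rows (75,000 x 74,999 / 2), so in practice it would have to be written out in chunks rather than built in memory.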


Because the project involves steps that require multiple nodes (RNA-Seq processing) and steps that require large RAM (correlation calculation), which resources would be best? I can then ask my PI to include them in our proposal.
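
For a rough sense of the RAM involved, here is a back-of-the-envelope estimate. It assumes a dense matrix of 8-byte doubles (R's default numeric type) and exactly 75,000 proteins/mRNAs; the function name is just for illustration, and the real footprint will be higher once temporary copies are counted.

```python
def correlation_matrix_gib(n_items, bytes_per_value=8, symmetric_only=False):
    """Approximate size in GiB of an n_items x n_items correlation matrix.

    symmetric_only=True counts only the pairs with i < j, since the matrix
    is symmetric and the diagonal is always 1.
    """
    if symmetric_only:
        n_values = n_items * (n_items - 1) // 2
    else:
        n_values = n_items * n_items
    return n_values * bytes_per_value / 2**30


n = 75_000  # assumed exact count; the real number is "~75k"

full = correlation_matrix_gib(n)                        # about 41.9 GiB
upper = correlation_matrix_gib(n, symmetric_only=True)  # about 21.0 GiB

print(f"full dense matrix:   {full:.1f} GiB")
print(f"upper triangle only: {upper:.1f} GiB")
```

So a single full matrix needs on the order of 42 GiB before any working copies, and that is per matrix: each of the ~100 RNA-Seq datasets would produce one of its own.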

Thanks