Choosing a resource for large-memory jobs?

We are in the process of submitting a startup allocation proposal, and I would like to know which compute resource would be most suitable for our work.

My work will involve:

  • Protein sequence property calculation: First, I'll calculate various sequence-based properties in R for ~75k proteins/mRNAs. Then, I'll calculate the pairwise correlation between all possible pairs of these ~75k proteins/mRNAs and, finally, store the results for use in developing prediction models. This step requires a large amount of RAM because I need to hold the entire ~75k × ~75k correlation matrix in memory (about 45 GB as a dense matrix of 8-byte doubles).
  • mRNA quantification using RNA-Seq: Process ~100 RNA-Seq datasets to quantify mRNA expression, then calculate the pairwise correlation of all mRNAs within each RNA-Seq dataset and store the results for use in developing prediction models. The RNA-Seq processing can be parallelized fairly easily and is not very memory-intensive. However, calculating and saving each correlation matrix requires a large amount of RAM.
  • Combine features: Each correlation score serves as a feature for my machine learning model, so I join these scores into a single file in which each row is a protein/mRNA pair and the columns contain the correlation scores from the sequence properties and from each RNA-Seq dataset.
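To put rough numbers on the memory and storage needs of the three steps, here is a small sizing sketch. It assumes (hypothetically; adjust to the real project) ~75,000 proteins/mRNAs, ~100 RNA-Seq datasets each correlating the same ~75k mRNAs, and a combined table with one sequence-based score plus one score per dataset, stored as uncompressed 4-byte floats. The function names are illustrative only.

```python
# Rough sizing for the correlation and feature-combination steps.
# Hypothetical assumptions: ~75k genes, ~100 datasets, no compression/sparsity.

N_GENES = 75_000      # ~75k proteins/mRNAs
N_DATASETS = 100      # ~100 RNA-Seq datasets
GB = 10**9            # decimal gigabytes


def dense_matrix_bytes(n, bytes_per_value=8):
    """Bytes for a full n x n matrix (8 bytes per value = double precision)."""
    return n * n * bytes_per_value


def unique_pairs(n):
    """Number of unordered pairs i < j (one triangle, diagonal excluded)."""
    return n * (n - 1) // 2


def pair_table_bytes(n, n_score_columns, bytes_per_value=4):
    """Bytes for the score columns of a table with one row per unique pair.
    Pair identifier columns are not included."""
    return unique_pairs(n) * n_score_columns * bytes_per_value


if __name__ == "__main__":
    one_matrix = dense_matrix_bytes(N_GENES)
    print(f"one dense matrix (float64): {one_matrix / GB:.1f} GB")
    print(f"all {N_DATASETS} RNA-Seq matrices (float64): "
          f"{N_DATASETS * one_matrix / GB:.0f} GB")
    print(f"unique pairs: {unique_pairs(N_GENES):,}")
    n_cols = 1 + N_DATASETS  # one sequence score + one score per dataset
    table = pair_table_bytes(N_GENES, n_cols)
    print(f"combined pair table, {n_cols} float32 columns: {table / GB:.0f} GB")
```

Under these assumptions, keeping all ~100 RNA-Seq matrices dense would take about 4.5 TB, and the combined pair table (about 2.8 billion rows) about 1.1 TB even in float32. So disk quota may matter as much as per-node RAM when choosing a resource; storing only one triangle of each matrix roughly halves the matrix figures.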
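One way the RAM requirement of the correlation steps could be bounded is to compute the matrix a block of rows at a time and write each block to disk, instead of building the whole matrix in memory. The sketch below shows this in Python/NumPy rather than R; it is a swapped-in illustration, not the workflow described above. It assumes rows of `x` are proteins/mRNAs and columns are the values being correlated over (sequence properties, or samples within one RNA-Seq dataset). `standardize_rows` and `blockwise_correlation` are hypothetical names.

```python
import numpy as np


def standardize_rows(x):
    """Center each row and scale it to unit L2 norm, so that the dot product
    of two standardized rows equals their Pearson correlation."""
    x = np.asarray(x, dtype=np.float64)
    centered = x - x.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    # Constant rows have zero norm; leaving them as zero vectors makes their
    # correlations 0 instead of NaN (an arbitrary choice for this sketch).
    norms[norms == 0] = 1.0
    return centered / norms


def blockwise_correlation(x, out_path=None, block_rows=1000, dtype=np.float32):
    """Pearson correlation between all rows of x, block_rows rows at a time.

    If out_path is given, the n x n result is written to a .npy file through
    a memory map, so the full matrix does not have to fit in RAM at once.
    Otherwise the result is allocated and returned in memory.
    """
    z = standardize_rows(x)
    n = z.shape[0]
    if out_path is None:
        out = np.empty((n, n), dtype=dtype)
    else:
        out = np.lib.format.open_memmap(
            out_path, mode="w+", dtype=dtype, shape=(n, n)
        )
    for start in range(0, n, block_rows):
        stop = min(start + block_rows, n)
        # Correlations of rows start..stop-1 with every row.
        out[start:stop] = z[start:stop] @ z.T
    if out_path is not None:
        out.flush()
    return out
```

With `block_rows=1000` and 75k rows, each temporary block is 1000 × 75,000 doubles (about 0.6 GB), and writing float32 halves the on-disk size compared with doubles. The trade-off is more disk I/O, so fast scratch storage becomes more important than a very large-memory node.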

Because the project involves steps that need multiple nodes (RNA-Seq processing) and steps that need a large amount of RAM (the correlation calculations), which resources would be best? I can then ask my PI to include them in our proposal.