I just found this bug in EOM after generating two distribution pools of the same protein in monomer (P1) and dimer (P2) forms. The I(0) values in the P1 and P2 pools differ considerably, by a factor of ~14. There is also significant variation in the I(0) values within each pool, and it scales with Rg or Dmax rather than with the total MW, as would be expected. I have calculated the average I(0) for each pool below.
"eom, ATSAS 3.0.4 (r13469)"
awk '{if($1~"Curve"){getline;II+=$1;ISI+=($1)**2;NN+=1}}END{A=II/NN;printf "P1 AvgI(0)=%.3e +/- %e (N=%d)\n",A,sqrt(ISI-2*A*II+NN*(A**2))/NN,NN}' junP1Ca05e2.eom
awk '{if($1~"Curve"){getline;II+=$1;ISI+=($1)**2;NN+=1}}END{A=II/NN;printf "P2 AvgI(0)=%.3e +/- %e (N=%d)\n",A,sqrt(ISI-2*A*II+NN*(A**2))/NN,NN}' junP2Ca05e2.eom
So for these two pools, P2_I(0) / P1_I(0) = 14, which is not the expected result based on the MW ratio P2/P1 = 2.
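For anyone who wants to check the ratio in one step, here is a minimal sketch that reads both pool files in a single awk call. It makes the same assumption as my one-liners above: I(0) is the first field on the line following each line whose first field matches "Curve". The function name pool_i0_ratio is only an illustrative name, not part of ATSAS.

```shell
# pool_i0_ratio FILE1 FILE2
# Prints the mean I(0) of each file and the ratio FILE2/FILE1.
# Assumption: I(0) is the first field on the line after each "Curve" line.
pool_i0_ratio() {
  awk '
    $1 ~ /Curve/ {
      # Read the line after the "Curve" header and accumulate its first field.
      if ((getline) > 0) { sum[FILENAME] += $1; n[FILENAME]++ }
    }
    END {
      a = ARGV[1]; b = ARGV[2]
      if (n[a] == 0 || n[b] == 0) {
        print "no curves found in one of the files" > "/dev/stderr"
        exit 1
      }
      m1 = sum[a] / n[a]; m2 = sum[b] / n[b]
      printf "P1 AvgI(0)=%.3e (N=%d)\n", m1, n[a]
      printf "P2 AvgI(0)=%.3e (N=%d)\n", m2, n[b]
      printf "P2/P1=%.3f\n", m2 / m1
    }
  ' "$1" "$2"
}
```

With my files this would be called as `pool_i0_ratio junP1Ca05e2.eom junP2Ca05e2.eom`.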
This must have a significant effect on which P1 or P2 models EOM selects and on their fractional contributions to the selected ensemble, in this case overweighting the apparent fraction of P1 in the distribution.
Do you recommend weighting the fractions of the P1 and P2 distributions by their relative I(0) values to get an accurate estimate of their actual occurrence in solution?
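To make the question concrete, this is a sketch of one possible reading of such a reweighting. It is not an ATSAS procedure, and whether it is valid is exactly what I am asking. The assumption is that each P2 curve is too intense by a factor k = (observed I(0) ratio) / (expected I(0) ratio), so the apparent P2 fraction is multiplied by k and both fractions are renormalised. The function name and the input fractions in the usage line are hypothetical placeholders.

```shell
# reweight_fractions F1 F2 OBSERVED_RATIO EXPECTED_RATIO
# F1, F2: apparent fractions of P1 and P2 models in the selected ensemble.
# Assumption: P2 curves are over-scaled by k = OBSERVED_RATIO / EXPECTED_RATIO,
# so each selected P2 model stands in for k models' worth of signal.
reweight_fractions() {
  awk -v f1="$1" -v f2="$2" -v obs="$3" -v want="$4" 'BEGIN {
    k  = obs / want        # over-scaling factor of the P2 curves
    w1 = f1                # P1 weight is left unchanged
    w2 = f2 * k            # P2 weight is scaled up by k
    t  = w1 + w2
    printf "P1=%.3f P2=%.3f\n", w1 / t, w2 / t
  }'
}
```

For example, `reweight_fractions 0.5 0.5 14 2` uses the ratios 14 and 2 from above with placeholder fractions of 0.5 each.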
Best regards,
Mark
PS. Now that I have your attention, I will repeat my previous request for a 2-D distribution (Rg vs. Dmax) output from Gajoe by showing my recent efforts to produce such plots. For the selected models, the 2-D distribution is limited to the PDBs listed in the Gajoe log file, which covers the best ensemble only, whereas the 2-D Rg-Dmax pool distribution is easy to extract from the EOM Size_list.txt file.
cat Size_listP1Ca05e2.txt |awk '{Ro=50.11-0.85;Rm=95.87;Do=177.48-2.76;Dm=326.70;NN=55; dR=(Rm-Ro)/NN;dD=(Dm-Do)/NN;printf "%d %d %8.5f\n",int(($2-Ro)/dR+0.5),int(($3-Do)/dD+0.5),100}' | awk '{NN=55;X=$1;Y=$2;RgDm[X][Y]+=$3;}END{for(i=0;i<=NN;i++){for(j=0;j<=NN;j++){printf "%8.5f ",RgDm[i][j];}printf "\n"}}' |tee RgDmax_P1Ca05e2.PDB-Pool.mat
grep "e2.pdb " GA04[1-9]/curve_1/logFile_*_1.log GA050/curve_1/logFile_*_1.log |sed 's/~/ /g' |awk '{Ro=50.11-0.85;Rm=95.87;Do=177.48-2.76;Dm=326.70;NN=55; dR=(Rm-Ro)/NN;dD=(Dm-Do)/NN;printf "%d %d %8.5f\n",int(($4-Ro)/dR+0.5),int(($5-Do)/dD+0.5),$6}' |awk '{NN=55;X=$1;Y=$2;RgDm[X][Y]+=$3*100;}END{for(i=0;i<=NN;i++){for(j=0;j<=NN;j++){printf "%8.5f ",RgDm[i][j];}printf "\n"}}' |tee RgDmax_041-050.PDB.mat
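Note that the RgDm[X][Y] syntax in the one-liners above is a gawk extension (arrays of arrays), so they will not run under mawk or BSD awk. Below is a sketch of the same binning step written with the portable RgDm[X,Y] subscript form. The function name bin2d and its argument order are my own choices for illustration; the rounding mirrors the one-liners, int((v - min)/step + 0.5).

```shell
# bin2d NBINS XMIN XMAX YMIN YMAX
# Reads "x y weight" lines on stdin and prints an (NBINS+1) x (NBINS+1) matrix
# of summed weights, one row per x bin.
bin2d() {
  awk -v nn="$1" -v xo="$2" -v xm="$3" -v yo="$4" -v ym="$5" '
    BEGIN { dx = (xm - xo) / nn; dy = (ym - yo) / nn }
    {
      i = int(($1 - xo) / dx + 0.5)    # x bin index, same rounding as above
      j = int(($2 - yo) / dy + 0.5)    # y bin index
      if (i < 0 || i > nn || j < 0 || j > nn) next   # ignore bins off the grid
      h[i, j] += $3                    # portable multi-dimensional subscript
    }
    END {
      for (i = 0; i <= nn; i++) {
        for (j = 0; j <= nn; j++) printf "%8.5f ", h[i, j]
        printf "\n"
      }
    }'
}
```

The pool matrix would then come from `awk '{print $2, $3, 100}' Size_listP1Ca05e2.txt | bin2d 55 49.26 95.87 174.72 326.70`, using the same limits as in the commands above (49.26 = 50.11-0.85 and 174.72 = 177.48-2.76).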
