GAJOE no PDB output in file / CaCa_dist file doesn't look right

nmeri
Member
Posts: 6
Joined: 2018.04.06 01:10

GAJOE no PDB output in file / CaCa_dist file doesn't look right

#1 Post by nmeri » 2018.07.14 02:06

Hi,
I am running GAJOE on a pool of 10k pdb files I generated with TraDES. The program seems to run smoothly and I get no errors; however, the output looks strange.

First, here is the log file I get:


Experimental data file name ............................ : 280_h_A4_1.mccd.dat
 -------------------------------------------
 -- Pool(s)
 Pool of PDBs, directory ................................ : TraDES_pool/
 Intensities file name .................................. : junXXX.eom
 -------------------------------------------
 -- Standard Mode
 Maximum number of curves per ensemble (min. 1, max. 50)  : 20
 Minimum number of curves per ensemble (min. 1) ......... : 5
 Constant subtraction allowed? .......................... : Y
 Curve repetition in the ensemble allowed? .............. : Y
 Number of times you want the genetic algorithm repeated (min. 1): 100
 -------------------------------------------
 -- Number of theoretical curves :        10000
 -------------------------------------------
 Start:  Fri Jul 13 13:53:30 2018
 End:    Fri Jul 13 15:27:42 2018
 -------------------------------------------
 -- Chi^2 :  1.721
 -------------------------------------------
 -- Rflex (random) / Rsigma: ~ 82.93% (~ 83.17%) / 1.55
 -------------------------------------------
 -- Constant subtracted :  0.787
 -------------------------------------------
 -- Files created:
 Fit to the experimental data (in 1/angstrom) ........... : profiles_001_1.fit
 Radius of gyration distribution (in angstrom) .......... : Rg_distr_001_1.txt
 Max Dimensions distribution (in angstrom) .............. : Size_distr_001_1.txt
 Ca(N)-Ca(C) distances distribution (in angstrom) ....... : CaCa_distr_001_1.txt
 Volume distribution (in angstrom) ...................... : Volume_distr_001_1.txt
 -------------------------------------------
 -- PDB models in the folder "GA001/curve_1/pdb":
  #  Filename           Rg       Dmax     Fraction

  1) TraDES_pool/   46.94     155.91    ~0.061.00  17
  2) TraDES_pool/  37.12     113.86    ~0.061.00  17
  3) TraDES_pool/   61.85     181.54    ~0.061.00  17
  4) TraDES_pool/  78.44     244.20    ~0.061.00  17
  5) TraDES_pool/   57.76     186.44    ~0.061.00  17
  6) TraDES_pool/  37.31     120.70    ~0.061.00  17
  7) TraDES_pool/   35.52     116.47    ~0.061.00  17
  8) TraDES_pool/   50.95     177.73    ~0.061.00  17
  9) TraDES_pool/   46.47     140.48    ~0.061.00  17
 10) TraDES_pool/   56.18     182.62    ~0.061.00  17
 11) TraDES_pool/   45.02     139.65    ~0.061.00  17
 12) TraDES_pool/  51.12     159.91    ~0.061.00  17
 13) TraDES_pool/   71.72     224.73    ~0.061.00  17
 14) TraDES_pool/  49.90     161.59    ~0.061.00  17
 15) TraDES_pool/   75.05     266.38    ~0.061.00  17
 16) TraDES_pool/   48.99     170.59    ~0.061.00  17
 17) TraDES_pool/   50.12     164.03    ~0.061.00  17

 Final ensemble :           52.97     170.99
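
As a quick sanity check on the table above, the "Final ensemble" line is numerically consistent with a plain average over the 17 listed models (the "~0.06" in the Fraction column would then be roughly 1/17 each). The short Python sketch below only re-does that arithmetic on the values copied from the log; it is an observation about these numbers, not a statement about how GAJOE computes the line internally.

```python
# (Rg, Dmax) pairs copied from the 17 rows of the log table above.
models = [
    (46.94, 155.91), (37.12, 113.86), (61.85, 181.54), (78.44, 244.20),
    (57.76, 186.44), (37.31, 120.70), (35.52, 116.47), (50.95, 177.73),
    (46.47, 140.48), (56.18, 182.62), (45.02, 139.65), (51.12, 159.91),
    (71.72, 224.73), (49.90, 161.59), (75.05, 266.38), (48.99, 170.59),
    (50.12, 164.03),
]


def mean(values):
    """Unweighted arithmetic mean (assumes every model has equal weight)."""
    values = list(values)
    return sum(values) / len(values)


rg_avg = mean(rg for rg, _ in models)
dmax_avg = mean(dmax for _, dmax in models)

# Compare with the "Final ensemble" line of the log: 52.97 and 170.99.
print(f"Rg = {rg_avg:.2f}  Dmax = {dmax_avg:.2f}")
```

So the Rg and Dmax columns are intact; only the Filename and Fraction columns of the table look damaged.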

1 - It looks like I should have gotten 17 pdbs in the pdb directory, but there are none. I wonder if I am missing something extremely trivial, but I have run this several times and I get no structures.
2 - I get a good fit (judged from profiles.fit), reasonable Rg_distr etc. One thing that does not look ok (at least when I compared to my previous runs with RANCH output) is CaCa_dist. Here it is:


 Ense  Average  Ca(N)-Ca(C) distance =   0.00
 Ense  Average  Ca(N)-Ca(C) distance =   0.00
 Pool  Average  Ca(N)-Ca(C) distance =   0.00
 Pool Histogram Ca(N)-Ca(C) distance =   0.00
    Distance    Pool freq.    Sel. freq.
       -0.00  0.0000000000  0.0000000000
       -0.00  0.2000000000  0.2000000000
        0.00  0.2000000000  0.2000000000
        0.00  0.2000000000  0.2000000000
        0.00  0.2000000000  0.2000000000
        0.00  0.0000000000  0.0000000000
        0.00  0.0000000000  0.0000000000
        0.00  0.0000000000  0.0000000000
        0.00  0.0000000000  0.0000000000
        0.00  0.0000000000  0.0000000000
        0.00  0.0000000000  0.0000000000
        0.00  0.0000000000  0.0000000000
        0.00  0.0000000000  0.0000000000
        0.00  0.0000000000  0.0000000000
        0.00  0.0000000000  0.0000000000
        0.00  0.0000000000  0.0000000000
        0.00  0.0000000000  0.0000000000
        0.00  0.0000000000  0.0000000000
        0.00  0.0000000000  0.0000000000
        0.00  0.0000000000  0.0000000000
        0.00  0.0000000000  0.0000000000
        0.00  0.0000000000  0.0000000000
        0.00  0.0000000000  0.0000000000
        0.00  0.0000000000  0.0000000000
        0.00  0.0000000000  0.0000000000
        0.00  0.0000000000  0.0000000000
        0.00  0.0000000000  0.0000000000
        0.00  0.0000000000  0.0000000000
        0.01  0.0000000000  0.0000000000
        0.01  0.0000000000  0.0000000000
        0.01  0.0000000000  0.0000000000
        0.01  0.0000000000  0.0000000000
        0.01  0.0000000000  0.0000000000
        0.01  0.0000000000  0.0000000000
        0.01  0.0000000000  0.0000000000
        0.01  0.0000000000  0.0000000000
        0.01  0.0000000000  0.0000000000
        0.01  0.0000000000  0.0000000000
        0.01  0.0000000000  0.0000000000
        0.01  0.0000000000  0.0000000000
        0.01  0.0000000000  0.0000000000
        0.01  0.0000000000  0.0000000000
        0.01  0.0000000000  0.0000000000
        0.01  0.0000000000  0.0000000000
        0.01  0.0000000000  0.0000000000
        0.01  0.0000000000  0.0000000000
        0.01  0.0000000000  0.0000000000
        0.01  0.0000000000  0.0000000000
        0.01  0.0000000000  0.0000000000
        0.01  0.0000000000  0.0000000000
        0.01  0.0000000000  0.0000000000
        0.01  0.0000000000  0.0000000000
        0.01  0.0000000000  0.0000000000
        0.01  0.0000000000  0.0000000000

This really puzzles me. In my opinion, the GA does a good job, considering that there are no errors and that the Rg values and fit profile look very nice, but the program seems to fail at some point. I'd be glad to provide additional details on this run if needed. Thank you so much!

AL
Administrator
Posts: 658
Joined: 2007.08.03 18:55
Location: EMBL Hamburg, Germany

limitations of a custom EOM pool

#2 Post by AL » 2018.07.31 15:36

nmeri wrote:
2018.07.14 02:06
I am running GAJOE on a pool of 10k pdb files I generated with TraDES.
...
1 - It looks like I should have gotten 17 pdbs in the pdb directory, but there are none.
I don't think GAJOE can reproduce the ensemble models if the models were not generated by RANCH.
nmeri wrote:
2018.07.14 02:06
2 - I get a good fit (judged from profiles.fit), reasonable Rg_distr etc. One thing that does not look ok (at least when I compared to my previous runs with RANCH output) is CaCa_dist.
Same problem: the pool was not generated by RANCH - so EOM cannot compute the CaCa_dist values (whatever they are).

nmeri
Member
Posts: 6
Joined: 2018.04.06 01:10

Re: limitations of a custom EOM pool

#3 Post by nmeri » 2018.10.16 19:54

Hi AL, thanks for the reply,
However, I have one more question:
AL wrote:
I don't think GAJOE can reproduce the ensemble models if the models were not generated by RANCH.
Correct me if I am wrong: GAJOE takes the RANCH output and simply runs the GA, minimizing the Chisq between the theoretical scattering curve and the generated intensities. How are the intensity profiles for my own pool of pdbs created if the pool is not generated by RANCH?
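
For readers unfamiliar with the criterion mentioned above, here is a minimal Python sketch of a chi-square score for one candidate ensemble. It assumes the commonly used form: the theoretical curves of the ensemble are averaged point by point, scaled to the experimental data by least squares, and compared using the experimental errors. The function names and the toy numbers are made up for illustration, the constant subtraction seen in the log is ignored, and the exact expression GAJOE uses is not stated in this thread.

```python
from typing import List, Sequence


def ensemble_average(curves: Sequence[Sequence[float]]) -> List[float]:
    """Point-by-point mean of the theoretical curves in one ensemble."""
    n = len(curves)
    return [sum(column) / n for column in zip(*curves)]


def reduced_chi2(i_exp: Sequence[float],
                 sigma: Sequence[float],
                 i_calc: Sequence[float]) -> float:
    """Chi-square per point after fitting a single scale factor."""
    # Least-squares scale factor between calculated and experimental curves.
    num = sum(e * c / s ** 2 for e, c, s in zip(i_exp, i_calc, sigma))
    den = sum((c / s) ** 2 for c, s in zip(i_calc, sigma))
    mu = num / den
    resid = sum(((mu * c - e) / s) ** 2
                for e, c, s in zip(i_exp, i_calc, sigma))
    return resid / (len(i_exp) - 1)


# Toy data (hypothetical): two theoretical curves and one "experiment".
curves = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
i_exp = [4.0, 4.0, 4.0]
sigma = [1.0, 1.0, 1.0]

chi2 = reduced_chi2(i_exp, sigma, ensemble_average(curves))
print(chi2)
```

A genetic algorithm would evaluate a score of this kind for many candidate ensembles and keep the ones with the lowest value.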

Also in the log file, the pdb models are listed but their identifiers are missing. In my opinion, GAJOE knows the identifiers (e.g., ./ownpool/identifier) but does not report them, since the Rg, Dmax, etc. of each pdb file in the ensemble are given in this list. Is there a quick fix that I can apply to get those identifiers? I can go back to Size_listXXX.txt and pull the representative pdbs out, but I am looking for a more automated way to do this.

Thank you
Irem

AL
Administrator
Posts: 658
Joined: 2007.08.03 18:55
Location: EMBL Hamburg, Germany

Re: limitations of a custom EOM pool

#4 Post by AL » 2018.11.14 18:54

nmeri wrote:
2018.10.16 19:54
How are the intensity profiles for my own pool of pdbs created if the pool is not generated by RANCH?
If you used your own pool of models then RANCH was not involved. The intensities are calculated by CRYSOL if you run GAJOE with the --pool option.
nmeri wrote:
2018.10.16 19:54
Also in the log file, the pdb models are listed but their identifiers are missing.
You lost me here. At the end of the log file you get something like:


  #  Filename                 Rg       Dmax     Fraction        

  1) 01332web.pdb           35.98     112.95    ~0.82
  2) 06592web.pdb           39.77     122.36    ~0.18

 Final ensemble :           36.67     114.66
You see the *.pdb file names, don't you? What do you mean by "identifiers"?

nmeri
Member
Posts: 6
Joined: 2018.04.06 01:10

Re: GAJOE no PDB output in file / CaCa_dist file doesn't look right

#5 Post by nmeri » 2018.11.15 01:50

Hi AL,
I appreciate the reply
AL wrote:
You see the *.pdb file names, don't you? What do you mean by "identifiers"?
If you look at the example log file in my original post, you will see that the program picks .pdb files but does not disclose which ones they are, only their Rg and Dmax. In my case, every pdb file has an identifier (i.e., model00001 to model10000); however, all you see is the folder name:

  #  Filename           Rg       Dmax     Fraction

  1) TraDES_pool/   46.94     155.91    ~0.061.00  17
  2) TraDES_pool/   37.12     113.86    ~0.061.00  17
  3) TraDES_pool/   61.85     181.54    ~0.061.00  17

So I eventually had to go back to sizelist.txt and handpick the models according to Rg (it was a pain, not to mention ambiguous). Maybe it is a bug to be fixed...
Cheers,
Irem
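
The manual lookup described in this post can be automated along the following lines. This is only a sketch under stated assumptions: the pool is taken to be already loaded as (file name, Rg, Dmax) records, because the layout of the size list file is not shown in this thread, and the function name, file names, and tolerance are hypothetical. Matching on Rg and Dmax together reduces, but does not eliminate, the ambiguity mentioned above, so every candidate is returned.

```python
from typing import Dict, List, Tuple

Model = Tuple[str, float, float]  # (file name, Rg, Dmax)
Size = Tuple[float, float]        # (Rg, Dmax) as printed in the GAJOE log


def match_by_size(pool: List[Model],
                  selected: List[Size],
                  tol: float = 0.006) -> Dict[Size, List[str]]:
    """Return, for each selected (Rg, Dmax), all pool files within tol.

    tol is slightly above half of the last printed digit (0.01), which
    assumes both sources round to two decimals.
    """
    hits: Dict[Size, List[str]] = {}
    for rg, dmax in selected:
        hits[(rg, dmax)] = [
            name for name, p_rg, p_dmax in pool
            if abs(p_rg - rg) <= tol and abs(p_dmax - dmax) <= tol
        ]
    return hits


# Hypothetical pool records; the file names are made up for illustration.
pool = [
    ("model00017.pdb", 46.94, 155.91),
    ("model00482.pdb", 37.12, 113.86),
    ("model03051.pdb", 37.12, 113.86),  # deliberate duplicate size
    ("model09999.pdb", 61.00, 180.00),
]
# First two rows of the ensemble table from the log in post #1.
selected = [(46.94, 155.91), (37.12, 113.86)]

hits = match_by_size(pool, selected)
for size, names in hits.items():
    flag = "" if len(names) == 1 else "  <- ambiguous or missing"
    print(size, names, flag)
```

Entries flagged as ambiguous would still need a manual check, for example by comparing the theoretical curves of the candidates.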
