Check fit with alternate EOM ensemble

jpruneda
Member
Posts: 4
Joined: 2010.10.03 05:30
Location: University of Washington

#1 Post by jpruneda » 2010.10.03 05:43

Hello,
I have generated EOM ensembles for two similar proteins and found interesting differences between them. Now I'm trying to double-check that these differences are real. My first attempt was simply to re-run GAJOE on the same pool of structures to see whether I get consistent ensembles, and so far I do. As another test, I would like to apply one protein's EOM ensemble to the OTHER protein's scattering curve to see how good (or poor) the fit is. So I guess what I'm asking is:
How do I check the fit of a fixed ensemble against a scattering curve?
Thanks!

Hayds
Active member
Posts: 100
Joined: 2008.05.21 19:01
Location: EMBL, Hamburg

#2 Post by Hayds » 2010.10.04 10:06

Hi jpruneda.

This is quite straightforward: you need to rename the size list file in the top level of your EOM directory and rerun GAJOE with the new data file.

eg. If your new data file name is DATA.dat

(1) rename the size list file to Size_listDATA.txt (eg. after the previous run it was named something like Size_listOLD.txt, with the data file named OLD.dat).
(2) run GAJOE with the new DATA.dat file

A new directory (eg. GA002) should be created, and all the usual analysis files are placed there at the end of the run.

Hope it helps :)

Haydyn

#3 Post by jpruneda » 2010.10.05 19:17

I think we are talking about different things. I would like to check the fit of a final 20-member ensemble to a scattering curve that is different than the one for which the ensemble was originally created.
Does this make sense?
Thanks!
Jonathan

#4 Post by Hayds » 2010.10.08 10:20

Hi Jonathan,

You should be able to run CRYSOL across the ensemble and then create an average scattering curve from the calculated intensity files. You should then be able to fit this averaged curve to the data.
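The averaging and fitting step could be sketched roughly as below in Python. This is only an illustration under stated assumptions, not what GAJOE or OLIGOMER do internally: the function names are made up for the example, the calculated curves are assumed to already sit on the same s grid as the experimental data, and the fit uses a single least-squares scale factor with no constant term.

```python
def average_curves(curves):
    """Point-wise mean of intensity curves sampled on the same s grid."""
    n_points = len(curves[0])
    if any(len(c) != n_points for c in curves):
        raise ValueError("all curves must share the same s grid")
    return [sum(column) / len(curves) for column in zip(*curves)]


def scaled_chi2(i_exp, sigma, i_calc):
    """Least-squares scale factor and reduced chi^2 of i_calc against the data.

    Assumption: only a multiplicative scale is fitted (no constant offset).
    """
    num = sum(e * c / s ** 2 for e, c, s in zip(i_exp, i_calc, sigma))
    den = sum(c * c / s ** 2 for c, s in zip(i_calc, sigma))
    scale = num / den
    chi2 = sum(((e - scale * c) / s) ** 2
               for e, c, s in zip(i_exp, i_calc, sigma))
    return scale, chi2 / (len(i_exp) - 1)
```

A structure that was selected more than once can simply be listed more than once in `curves`, so that it carries the corresponding weight in the average.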

Note that if you ran GAJOE in default mode then repetitions are allowed (ie. the same structure can be selected multiple times if this is required to fit the data), so keep this in mind if the final selected ensemble does not contain exactly 20 different structures (or the number you chose in the dialog). I will have to get back to you as to where this information can be found, as I can't find it in my GAJOE output so far. I can find the overall selection frequency, but I don't think this is what we want.

You can also run OLIGOMER on the ensemble of selected structures and take note of the volume fractions. This could be a good solution.

Best regards,

Haydyn

#5 Post by Hayds » 2010.10.08 14:01

Hi Jonathan,

So, running OLIGOMER on the EOM-selected ensemble should, in principle, replicate the GAJOE fit, and you can fit multiple SAXS data sets to the ensemble this way. OLIGOMER also weights the models by volume fraction, which should account for any "replicates" that were used in EOM.

To discover if any models were selected multiple times by GAJOE you need to run GAJOE and answer YES to the question asking if extra analysis files should be created. Then do the following after the genetic algorithm is finished:

check the header of the profile.fit file:

eg.

CYCLE: 5 Chi: 1.498 GENER.: 1000 ENSEMBLES: 50 CURVES: 20 MAX_MUT.: 10 CROSS: 20

This tells you that cycle 5 contains the selected ensemble of structures.

Then you check the selected_ensem.txt file for cycle 5:

eg.

CYCLE: 5
ENSEMBLE 1 2 3 4 5 6 7 8 9 10

728 728 728 728 728 728 728 728 728 728
2041 2041 2436 2041 2436 2436 2041 2436 2041 2041
2436 2436 2650 2436 2650 2650 2436 2650 2436 2436
2650 2650 3332 2650 3332 3332 2650 3332 2650 2650
3332 3332 3401 3332 3401 3401 3332 3401 3332 3332
3401 3401 3662 3401 3662 3513 3401 3513 3401 3401
3662 3515 4995 3513 4995 3662 3513 3515 3662 3662
4995 3662 5989 3515 5989 5989 3662 3662 4995 4995
5989 4995 6000 3662 6000 6000 5989 5989 5989 5989
6000 5989 6000 5989 6000 6000 6000 6000 6000 6000
6000 6000 6554 6000 6554 6554 6000 6000 6000 6000
6554 6000 6573 6000 6573 6573 6554 6573 6554 6554
6573 6573 6759 6573 6759 6759 6573 6759 6573 6573
6867 6867 6867 6867 6867 6867 6867 6867 6819 6819
7016 7016 7016 7939 7939 7939 7939 7939 7016 7939
7939 7939 7939 8155 8155 8155 8155 8155 7939 8155
8681 8681 8681 8681 8681 8681 8681 8681 8681 8681
8681 8681 8681 8681 8681 8681 8681 8681 8681 8681
8681 8681 8681 8681 8681 8681 8681 8681 8681 8681
8681 8681 8681 8681 8681 8681 8681 8681 8681 8681

The first column lists the ID numbers of the PDB files selected for ensemble 1.

Thus in this example, ensemble 1 has 20 entries but only 16 unique structures: one structure (6000) is selected twice and another (8681) four times, yielding an ensemble of 16 PDB files.
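Counting the repeats by eye is error-prone, so here is a small Python sketch that does it. It assumes the table layout shown in the excerpt above (a "CYCLE:" line, an "ENSEMBLE" header line, then rows of integer IDs with one column per ensemble); the function name is just for illustration.

```python
from collections import Counter


def ensemble_members(table_text, ensemble):
    """Return the model IDs in column `ensemble` (1-based) of the ID table.

    Lines that are not purely integer IDs (headers, blanks) are skipped.
    """
    members = []
    for line in table_text.splitlines():
        tokens = line.split()
        if tokens and all(tok.isdigit() for tok in tokens):
            members.append(int(tokens[ensemble - 1]))
    return members


# Ensemble 1 from the example above, written as a one-column table.
ensemble_1 = """728
2041
2436
2650
3332
3401
3662
4995
5989
6000
6000
6554
6573
6867
7016
7939
8681
8681
8681
8681"""

counts = Counter(ensemble_members(ensemble_1, 1))
repeats = {model: n for model, n in counts.items() if n > 1}
print(len(counts), repeats)  # prints: 16 {6000: 2, 8681: 4}
```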

Best regards,

#6 Post by jpruneda » 2010.10.08 18:14

Thanks!
I actually tried manually averaging the form factor files to produce one theoretical curve that I can run through oligomer (I don't know how to use a previously generated curve in crysol), but I think I must have messed up the formatting because it didn't want to read in my new file. Do you know anything about the formatting that is required?

I think I've also tried running the 20 structures through oligomer directly, but to my recollection it does not treat all 20 equally (which makes me wonder what GAJOE is doing differently). Do you know of a way to force oligomer to use all 20 structures?
Thanks again for your help!
Jonathan

#7 Post by Hayds » 2010.10.08 19:33

Hi Jonathan,

For OLIGOMER I would extract the 2nd column (Theoretical scattering intensity) from each CRYSOL .int file and paste them into one file (also including the 1st column from one of the CRYSOL .int files, this will be column 1, the s-values, for the combined file).

This should give you an input form-factor file with 21 columns, with columns 2-21 the form-factors of the 20 EOM selected models.
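The cut-and-paste described above could be scripted along these lines. This is a sketch only: it assumes each CRYSOL .int file has one header line followed by whitespace-separated columns (s first, theoretical intensity second) and that all files share the same s grid. The function names are made up, and any header line or exact number format that OLIGOMER expects in the form-factor file is NOT handled here, so please check the OLIGOMER manual for that.

```python
def read_two_columns(path, skip_header=1):
    """Read s and intensity (columns 1 and 2) from a whitespace-separated file."""
    s_values, intensities = [], []
    with open(path) as handle:
        for line_no, line in enumerate(handle):
            if line_no < skip_header:
                continue
            parts = line.split()
            if len(parts) < 2:
                continue
            s_values.append(float(parts[0]))
            intensities.append(float(parts[1]))
    return s_values, intensities


def build_form_factor_rows(int_paths):
    """Return rows of [s, I_model1, I_model2, ...], one row per s value."""
    s_ref, columns = None, []
    for path in int_paths:
        s_values, intensities = read_two_columns(path)
        if s_ref is None:
            s_ref = s_values
        elif len(s_values) != len(s_ref):
            raise ValueError(f"{path} has a different number of points")
        columns.append(intensities)
    return [[s] + [col[k] for col in columns] for k, s in enumerate(s_ref)]


def write_rows(rows, out_path):
    """Write the rows as plain space-separated columns (no header)."""
    with open(out_path, "w") as handle:
        for row in rows:
            handle.write(" ".join(f"{value:.6E}" for value in row) + "\n")
```

With 20 .int files this gives 21 values per row, matching the layout described above. If a model was selected more than once by GAJOE, it is probably simplest to include it once and let OLIGOMER assign the volume fraction.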

OLIGOMER can then be run using this form-factor file and an experimental data file, eg:

oligomer /ff ff.dat /dat data.dat /un 2 /cst
(/un 2: units in nm, /cst: use of a constant to account for differences between sample and buffer).

If required, OLIGOMER will weight each "model" by a volume fraction. I'm not sure if you can force it to keep all volume fractions equal. If, for example, GAJOE gave you 15 unique structures, I would expect OLIGOMER to increase the volume fraction of the model(s) that are repeatedly selected in EOM, and hopefully give you the same answer.

I hope this helps,

Haydyn
