Criteria for selecting models from Crysol output

asharma
Guest
Posts: 1
Joined: 2018.08.14 04:48

#1 Post by asharma » 2018.08.14 16:15

Dear All,
I am a beginner to SAXS and hence to CRYSOL as well. I am working on a two-domain protein. Two domains are connected with a small linker. Structure of one of the domains is solved from one organism and that of other is known from the other organism. Through modelling, I generated a full-length model for one case. I carried out an MD simulation on this model and have an ensemble of structures. I also have SAXS data for this full-length protein. Now, I want to see if I can filter out the PDB files (conformations) which are in agreement with the SAXS data. I ran CRYSOL and now have theoretical SAXS profiles for more than a thousand PDB files obtained from the MD simulation.

Now, I want to know what should be my criteria to select these models (conformations). If I understood correctly, Chi^2 could be one, but in that case what range of Chi^2 value is preferred?
Also, is there any good documentation (publication, tutorial or web literature) for such an exercise?

Any help would be greatly appreciated.

Many thanks,
Ashu

franke
Administrator
Posts: 407
Joined: 2007.08.10 11:09

Re: Criteria for selecting models from Crysol output

#2 Post by franke » 2018.08.15 13:06

asharma wrote:
2018.08.14 16:15
I want to know what should be my criteria to select these models (conformations). If I understood correctly, Chi^2 could be one...
The prerequisite for this is correct error estimates and error propagation for the experimental SAXS data. Unfortunately, those are still hard to come by, as they depend on the instrument, the detector, and also the software used for radial averaging. Here is a quick "post-mortem" test of whether you have accurate errors, assuming that you didn't take a single shot of your data but collected multiple frames, checked them for radiation damage, and then averaged them:
  • find 10 consecutive frames without radiation damage (may be water, buffer or sample, doesn't matter)
  • use the cormap test to verify that all frames are similar up to noise (datcmp --test=cormap [...])
  • calculate the reduced chi-square values for all pairs of frames (datcmp --test=chi-square [...])
  • compare the chi-square values obtained with the critical values for your data (see below); as a rule of thumb for contemporary data sets with >1000 points you may use [0.9; 1.1] as an approximation
  • if your chi-square values have a wider spread or are located entirely elsewhere on the number line, your errors are likely wrong.
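To make the pairwise comparison above concrete, here is a minimal Python sketch of a reduced chi-square between frames. It is not datcmp's implementation: it assumes all frames share the same q grid, that no scaling between frames is needed, and that the errors of two frames are independent so their variances add. The function names and the synthetic data are made up for illustration.

```python
import itertools
import numpy as np

def reduced_chi2(i1, s1, i2, s2):
    """Reduced chi-square between two frames measured on the same q grid."""
    i1, s1, i2, s2 = map(np.asarray, (i1, s1, i2, s2))
    # Assumption: independent errors, so the difference has variance s1^2 + s2^2
    resid2 = (i1 - i2) ** 2 / (s1 ** 2 + s2 ** 2)
    return resid2.sum() / (i1.size - 1)

def pairwise_chi2(frames):
    """frames: list of (intensity, sigma) tuples; returns {(a, b): chi2}."""
    return {
        (a, b): reduced_chi2(*frames[a], *frames[b])
        for a, b in itertools.combinations(range(len(frames)), 2)
    }

# Synthetic stand-in for 10 consecutive frames with correctly estimated errors
rng = np.random.default_rng(0)
m = 1000
sigma = np.full(m, 0.05)
truth = np.exp(-np.linspace(0.0, 3.0, m))
frames = [(truth + rng.normal(0.0, sigma), sigma) for _ in range(10)]

values = np.array(list(pairwise_chi2(frames).values()))
print(f"{values.size} pairs, min={values.min():.3f}, max={values.max():.3f}")
```

With correct errors the 45 values should cluster around 1. If the reported errors were too small by a factor of two, every value would come out four times larger, which is the kind of shift the last bullet describes.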
Some points to keep in mind if your errors are correct:
  • the required range of chi-square values is the same as above, i.e. roughly [0.9; 1.1]
  • any model that results in a chi-square value outside this range does not fit; no, squinting at it with half-closed eyes is not a better judge than this
  • a chi-square value of 1.03 does not fit better than one of 1.04 - neither is significantly different within the precision of the data collected and thus they have to be considered equivalent; collect more/better data to discriminate
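For an MD ensemble, the points above amount to a simple accept/reject filter. The sketch below assumes you have already collected one reduced chi-square value per model into a dictionary (how you extract these from your fitting output is up to you); the file names and values are hypothetical. The acceptance range uses the same scipy.stats.chi2 recipe given in this post for the critical values.

```python
from scipy.stats import chi2

def critical_range(m, alpha=0.01):
    """Two-sided acceptance range for the reduced chi-square with m data points."""
    lo, hi = chi2.ppf([alpha / 2, 1 - alpha / 2], m) / (m - 1)
    return float(lo), float(hi)

def select_models(chi2_by_model, m, alpha=0.01):
    """Return the names of all models whose chi-square lies inside the range."""
    lo, hi = critical_range(m, alpha)
    return sorted(name for name, value in chi2_by_model.items() if lo <= value <= hi)

# Hypothetical chi-square values for four MD snapshots
fits = {
    "frame_0001.pdb": 1.03,
    "frame_0002.pdb": 1.04,
    "frame_0003.pdb": 1.62,
    "frame_0004.pdb": 0.71,
}
accepted = select_models(fits, m=1000)
print(accepted)
```

Note that the filter deliberately returns a set of equivalent models rather than a ranking: 1.03 and 1.04 are both accepted and neither is preferred.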
In case your errors are incorrect:
  • if the software at the instrument was pyFAI, reprocess your data with radaver (ATSAS, available on request) or BioXTAS RAW; re-do the above
  • you may use the cormap test instead of the chi-square test to categorize your models; use alpha=0.01 and adjust your p-values for multiple testing
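The adjustment for multiple testing can be sketched as follows. The post does not say which correction to use; Bonferroni is shown here only as one common and conservative choice, and the model names and p-values are invented. A small p-value means the model curve differs from the data, so models are kept when the adjusted p-value is not below alpha.

```python
import numpy as np

def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    p = np.asarray(p_values, dtype=float)
    return np.minimum(p * p.size, 1.0)

# Hypothetical raw p-values from comparing three models against the data
p_raw = {"model_a": 0.4, "model_b": 0.004, "model_c": 0.00001}
names = list(p_raw)
p_adj = bonferroni([p_raw[n] for n in names])

alpha = 0.01
compatible = [n for n, p in zip(names, p_adj) if p >= alpha]
print(dict(zip(names, p_adj)), compatible)
```

Here model_b would be rejected on its raw p-value (0.004 < 0.01) but survives after adjustment (0.012), which illustrates why the adjustment matters when a thousand models are tested at once.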
Finally, the correct procedure to determine the critical values of the chi-square distribution:
  • determine the number of data points in your files (text editor, count number of data points)
  • calculate in python (replace 'm=1000' with the number of data points determined before):

Code:

>>> from scipy.stats import chi2
>>> m = 1000
>>> chi2.ppf([0.005, 0.995], m) / (m - 1)
[ 0.88945298, 1.12006813 ]
Note 1: datcmp and radaver are part of the ATSAS software package
Note 2: A more detailed write-up of this with all background and references is in preparation
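If counting the data points in a text editor is tedious, a short script can do it. This sketch assumes a plain-text data file with whitespace-separated numeric columns (q, intensity, error) and treats every line whose first three fields parse as numbers as a data point; header and footer lines are skipped. The function name and the example content are hypothetical.

```python
import io

def count_data_points(lines, min_columns=3):
    """Count lines whose first `min_columns` fields all parse as floats."""
    n = 0
    for line in lines:
        tokens = line.split()
        if len(tokens) < min_columns:
            continue  # blank or too-short line
        try:
            [float(t) for t in tokens[:min_columns]]
        except ValueError:
            continue  # header, footer or comment line
        n += 1
    return n

# In-memory stand-in for a data file; use open("your_file.dat") for a real one
example = io.StringIO(
    "Sample description: hypothetical header\n"
    "0.010 1.000e+00 5.0e-03\n"
    "0.020 9.800e-01 5.0e-03\n"
    "0.030 9.500e-01 6.0e-03\n"
    "\n"
    "footer: not a data row\n"
)
m = count_data_points(example)
print(m)
```

The resulting m is the value to plug into the chi2.ppf call above.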

HTH.

AL
Administrator
Posts: 658
Joined: 2007.08.03 18:55
Location: EMBL Hamburg, Germany

Re: Criteria for selecting models from Crysol output

#3 Post by AL » 2018.08.15 13:56

asharma wrote:
2018.08.14 16:15
I am working on a two-domain protein. Two domains are connected with a small linker. Structure of one of the domains is solved from one organism and that of other is known from the other organism.
You may try CORAL to fit your experimental data with a single model or EOM to fit your data with an ensemble of models. Both programs have a web interface.
