Dear All,

I am a beginner with SAXS and hence with CRYSOL as well. I am working on a two-domain protein whose domains are connected by a short linker. The structure of one domain has been solved for one organism, and that of the other domain is known from a different organism. Through modelling, I generated a full-length model for one case. I carried out an MD simulation on this model and now have an ensemble of structures. I also have SAXS data for the full-length protein. I want to see if I can select the PDB files (conformations) that agree with the SAXS data. I ran CRYSOL and now have theoretical SAXS profiles for more than a thousand PDB files obtained from the MD simulation.

Now, I want to know what should be my criteria to select these models (conformations). If I understood correctly, Chi^2 could be one, but in that case what range of Chi^2 value is preferred?

Also, is there any good documentation (publication, tutorial or web literature) for such an exercise?

Any help would be greatly appreciated.

Many thanks,

Ashu

## Criteria for selecting models from Crysol output

### Re: Criteria for selecting models from Crysol output

The prerequisite for this is correct error estimates and error propagation for the experimental SAXS data. Unfortunately, those are still hard to come by, as they depend on the instrument, the detector and also the software used for radial averaging. Here is a quick "post-mortem" test of whether your errors are accurate, assuming that you did not take a single shot of your data but collected multiple frames, checked them for radiation damage and then averaged them:

- find 10 consecutive frames without radiation damage (may be water, buffer or sample, doesn't matter)
- use the cormap test to verify that all frames are similar up to noise (datcmp --test=cormap [...])
- calculate the reduced chi-square values for all pairs of frames (datcmp --test=chi-square [...])
- compare the chi-square values obtained with the critical values for your data (see below); as a rule of thumb for contemporary data sets with >1000 points you may use [0.9; 1.1] as an approximation
- if your chi-square values have a wider spread or are located entirely elsewhere on the number line, your errors are likely wrong.
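
As a rough illustration of the chi-square step above, the pairwise comparison can be sketched in Python. This is not how datcmp is implemented; it assumes that all frames share the same s-grid, that the errors of two frames add in quadrature, and that the sum is divided by m - 1, as in the critical-value calculation in this post. The function names and the synthetic frames are made up for the example.

```python
import itertools
import numpy as np

def reduced_chi2(i1, s1, i2, s2):
    """Reduced chi-square between two frames measured on the same s-grid.

    Assumed definition: sum((I1 - I2)^2 / (sigma1^2 + sigma2^2)) / (m - 1).
    """
    i1, s1, i2, s2 = (np.asarray(x, dtype=float) for x in (i1, s1, i2, s2))
    m = i1.size
    return float(np.sum((i1 - i2) ** 2 / (s1 ** 2 + s2 ** 2)) / (m - 1))

def pairwise_reduced_chi2(frames):
    """frames: list of (intensity, sigma) pairs; returns {(a, b): chi2}."""
    return {
        (a, b): reduced_chi2(*frames[a], *frames[b])
        for a, b in itertools.combinations(range(len(frames)), 2)
    }

# Synthetic demo: 10 frames of the same curve with correctly stated errors.
rng = np.random.default_rng(0)
m = 1000
truth = np.exp(-np.linspace(0.0, 3.0, m))
sigma = np.full(m, 0.01)
frames = [(truth + rng.normal(0.0, sigma), sigma) for _ in range(10)]
values = pairwise_reduced_chi2(frames)
```

With correctly stated errors the values scatter closely around 1. If the stated errors were too small by a factor of two, every value would be four times larger, which is the kind of shift the last step of the checklist is meant to catch.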

- the range of chi-square values you asked about is the same as above, i.e. approximately [0.9; 1.1]
- any model that results in a chi-square value outside this range does not fit; no, squinting at it with half-closed eyes is not a better judge than this
- a chi-square value of 1.03 does not fit better than one of 1.04 - neither is significantly different within the precision of the data collected and thus they have to be considered equivalent; collect more/better data to discriminate
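
The selection rule above can be sketched as a simple filter. The model names and chi-square values below are invented for illustration, and the default range is only the rule-of-thumb approximation [0.9; 1.1]; pass the exact critical values for your number of data points if you have computed them.

```python
def select_models(chi2_by_model, lower=0.9, upper=1.1):
    """Return the names of models whose reduced chi-square lies in [lower, upper].

    All accepted models are treated as equivalent; they are NOT ranked by chi-square.
    """
    return sorted(
        name for name, value in chi2_by_model.items()
        if lower <= value <= upper
    )

# Hypothetical chi-square values for four MD snapshots.
example = {
    "snapshot_0001.pdb": 1.03,
    "snapshot_0002.pdb": 1.04,
    "snapshot_0003.pdb": 2.70,
    "snapshot_0004.pdb": 0.95,
}
accepted = select_models(example)
```

The result is deliberately an unordered set of acceptable models rather than a ranking, because values such as 1.03 and 1.04 cannot be distinguished within the precision of the data.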

- if the software at the instrument was pyFAI, reprocess your data with radaver (ATSAS, available on request) or BioXTAS RAW; re-do the above
- you may use the cormap test instead of chi-square test to categorize your models, use alpha=0.01 and adjust your p-values for multiple testing
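
The post does not say which multiple-testing adjustment to use. As one possible choice, the sketch below applies the Holm step-down correction to a list of p-values; the p-values are made up, and how you extract them from the cormap output is left open here.

```python
import numpy as np

def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Smallest p-value is multiplied by n, the next by n - 1, and so on.
    scaled = p[order] * (n - np.arange(n))
    # Enforce monotonicity and cap at 1.
    adjusted_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    adjusted = np.empty(n)
    adjusted[order] = adjusted_sorted
    return adjusted

# Hypothetical p-values for four models compared against the data.
p_raw = [0.0001, 0.04, 0.03, 0.5]
p_adj = holm_adjust(p_raw)

# A model is rejected (does not fit) if its adjusted p-value is below alpha.
alpha = 0.01
keep = p_adj >= alpha
```

With more than a thousand models the adjustment matters: without it, about 1% of perfectly good models would be rejected by chance alone at alpha=0.01.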

- determine the number of data points in your files (text editor, count number of data points)
- calculate the critical values in Python (replace 'm = 1000' with the number of data points determined before):

```
>>> from scipy.stats import chi2
>>> m = 1000
>>> chi2.ppf([0.005, 0.995], m) / (m - 1)
[ 0.88945298, 1.12006813 ]
```
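
If you would rather not count data points by hand, the two steps can be combined in a small script. This is a sketch under assumptions: it treats every line whose first three whitespace-separated fields are numbers (s, I, error) as a data point and skips everything else, which may not match every file format. The helper names are hypothetical, and the critical-value formula is copied from the snippet above.

```python
import io
from scipy.stats import chi2

def count_data_points(lines, min_columns=3):
    """Count lines whose first `min_columns` fields all parse as floats."""
    count = 0
    for line in lines:
        fields = line.split()
        if len(fields) < min_columns:
            continue  # blank line or short header/footer line
        try:
            for field in fields[:min_columns]:
                float(field)
        except ValueError:
            continue  # non-numeric header/footer line
        count += 1
    return count

def critical_values(m, alpha=0.01):
    """Two-sided critical values of the reduced chi-square, as in the post."""
    lo, hi = chi2.ppf([alpha / 2, 1 - alpha / 2], m) / (m - 1)
    return float(lo), float(hi)

# Tiny in-memory stand-in for a data file; use open("your_file.dat") instead.
demo = io.StringIO(
    "Sample description line\n"
    "0.010 1.00e+02 1.0e+00\n"
    "0.020 9.80e+01 1.0e+00\n"
    "\n"
)
m_demo = count_data_points(demo)
lo, hi = critical_values(1000)
```

Here `alpha=0.01` reproduces the 0.005 and 0.995 quantiles used above.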

Note 2: A more detailed write-up of this with all background and references is in preparation

HTH.

### Re: Criteria for selecting models from Crysol output

You may try CORAL to fit your experimental data with a single model or EOM to fit your data with an ensemble of models. Both programs have a web interface.