### Bug in EOM Rsigma calculation?

Posted: **2020.05.18 21:32**

Hi folks,

I'm using EOM from ATSAS 3.0.1, and I think the Rsigma is being calculated incorrectly. Either that or I don't understand how the Rsigma calculation is being done (quite possible).

I've attached the results of an EOM run on some good quality data (.dat file also attached). Most of the results look reasonable, but the reported Rsigma value in the .log file is 4.57! This is huge. In the EOM 2.0 paper, Figure 3 shows an Rsigma of 2.91 for a hugely artificial bimodal distribution. I'm struggling to see how the Rsigma of my result can be larger than that of the bimodal distribution shown in that Figure 3. Below is a plot of the EOM distributions.

Here's why I think this might be a bug. First, I reviewed the 2015 paper, which defines Rsigma as std_selected/std_pool for the distributions; the supplement defines the standard deviation in the usual manner. Second, I calculated the standard deviation of the pool and ensemble distributions using the standard approach for calculating a standard deviation from a histogram. That is, I weighted the value for each bin (for example, each Rg value) by the frequency of the bin, and then calculated the standard deviation of the resulting data. I then calculated the ratio of the standard deviations for the selected ensemble and the pool to get a distribution-specific Rsigma value. So, for example, some Python (pseudo-)code for the standard deviation step, where f is the frequency of an Rg bin and rg_distribution holds the Rg values of the bins, would be:

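```
import numpy

# rg_distribution: Rg value of each bin; rg_frequencies: frequency of each bin
# (rg_frequencies is an assumed name for wherever f comes from)
rg_w_list = []
for rg, f in zip(rg_distribution, rg_frequencies):
    rg_weighted = rg * f  # bin value multiplied by its frequency
    rg_w_list.append(rg_weighted)
rg_mean = numpy.mean(rg_w_list)
rg_std = numpy.std(rg_w_list)
```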
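One thing worth checking in the snippet above: taking `numpy.std` of the products `rg * f` is not the same as a frequency-weighted standard deviation, where the frequencies act as weights on the squared deviations from the weighted mean. Below is a minimal sketch of that conventional form for comparison. It assumes one (value, frequency) pair per bin; the function name and the toy numbers are made up for illustration, and nothing here is claimed to match what EOM does internally.

```python
import numpy as np

def histogram_mean_std(values, freqs):
    """Frequency-weighted mean and (population) standard deviation of a histogram."""
    values = np.asarray(values, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    mean = np.average(values, weights=freqs)
    var = np.average((values - mean) ** 2, weights=freqs)
    return mean, np.sqrt(var)

# Toy histogram (hypothetical numbers, not the attached data)
rg_bins = [20.0, 25.0, 30.0]
rg_freqs = [1.0, 2.0, 1.0]

weighted_mean, weighted_std = histogram_mean_std(rg_bins, rg_freqs)

# The "products" approach from the snippet above, for comparison
product_std = np.std(np.array(rg_bins) * np.array(rg_freqs))

print("weighted mean/std:", weighted_mean, weighted_std)
print("std of rg * f:    ", product_std)
```

On this toy histogram the two approaches give clearly different spreads, so it may be worth running both on the real distribution files before comparing against the value in the .log file.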

I did this for each individual distribution, and then calculated a final Rsigma as the average of the Rsigma of all four distributions (Rg, Dmax, Volume, and C alpha).
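The final step described above (one ratio per distribution, then a plain average) can be sketched as follows. The standard deviation pairs are hypothetical placeholders, and averaging the four ratios is the procedure assumed in this post, not something confirmed by the EOM documentation.

```python
from statistics import mean

# Hypothetical (std_selected, std_pool) pairs, one per distribution.
# These numbers are made up for illustration, not taken from the attached run.
stds = {
    "Rg": (6.0, 5.0),
    "Dmax": (18.0, 20.0),
    "Volume": (9.0, 10.0),
    "C alpha": (4.0, 5.0),
}

# Distribution-specific Rsigma = std of selected ensemble / std of pool
r_sigma = {name: sel / pool for name, (sel, pool) in stds.items()}

# Unweighted average over the four distributions
average_r_sigma = mean(r_sigma.values())

for name, value in r_sigma.items():
    print(f"{name} R_sigma: {value:.2f}")
print(f"Average R_sigma: {average_r_sigma:.2f}")
```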

For the EOM result I've attached, I calculated Rsigma values of:

Rg R_sigma: 1.31

Dmax R_sigma: 0.99

Volume R_sigma: 0.99

C alpha R_sigma: 0.71

Average R_sigma: 1.0

I've attached the python script I used for the calculations so you can test it yourself, and see if I made a mistake somewhere.

I'm hoping you can let me know whether:

1) I'm completely misunderstanding how to calculate Rsigma from the distributions.

2) I've made a mistake in the actual calculation.

or 3) There's actually a bug in the EOM reported Rsigma value.

Finally, here are a few caveats:

1) I know that calculating standard deviations from histograms should use the center of each bin. I'm not sure if the results reported in the distribution files are edges or centers, so I didn't adjust for this. This could change the results slightly.

2) I know that calculating standard deviations from histograms might not be precisely the same as calculating them from the underlying data, since you are assuming everything in the bin is at the bin midpoint. However, given the size of the bins involved I don't imagine this will make a lot of difference.
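Both caveats can be checked on synthetic data. If the bins are equal width (an assumption about the distribution files), using edges instead of centers shifts every value by the same half bin width. That moves the mean but leaves the standard deviation unchanged. The sketch below also compares the histogram-based standard deviation with the one from the raw samples. The sample parameters and bin count are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "Rg" values; the mean and spread are arbitrary, not from a real pool
samples = rng.normal(loc=30.0, scale=5.0, size=10_000)

counts, edges = np.histogram(samples, bins=50)  # equal-width bins
left_edges = edges[:-1]
centers = 0.5 * (edges[:-1] + edges[1:])

def weighted_std(values, weights):
    """Frequency-weighted population standard deviation."""
    mean = np.average(values, weights=weights)
    return np.sqrt(np.average((values - mean) ** 2, weights=weights))

std_raw = samples.std()
std_centers = weighted_std(centers, counts)
std_edges = weighted_std(left_edges, counts)

print("raw samples:     ", std_raw)
print("bin centers:     ", std_centers)
print("bin (left) edges:", std_edges)
```

Under these assumptions the edge-based and center-based values agree to floating-point precision, and both sit close to the raw-sample value, so neither caveat looks able to account for a difference as large as 4.57 versus roughly 1.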

All the best.

- Jesse