Hello,

I am having trouble understanding how SHANUM works. As an example, I ran SHANUM on the Importin alpha/beta complex (SASBDB id: SASDAC5) with Dmax = 11 nm. The results for this example can be found in the Konarev and Svergun paper (http://journals.iucr.org/m/issues/2015/ ... index.html).

If I run SHANUM ($ shanum -d SASDAC5.dat 19) I obtain:

Datafile=SASDAC5.dat

Dmax= 19.000000000000000

Smax= 6.2370099999999997

Nsh= 37.720736921316117

Nopt= 8

Sopt= 1.3227758541430708
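The values in this output appear consistent with the standard Shannon relations Nsh = Dmax * Smax / pi and Sopt = Nopt * pi / Dmax. That SHANUM computes them in exactly this way is an assumption, but the numbers above match. A minimal sketch (the function names are hypothetical, chosen only for illustration):

```python
import math

def shannon_channels(dmax, smax):
    """Number of Shannon channels covered by the data: Dmax * Smax / pi."""
    return dmax * smax / math.pi

def s_for_channels(n_channels, dmax):
    """Upper s value reached by n_channels Shannon channels: n * pi / Dmax."""
    return n_channels * math.pi / dmax

# Values taken from the SHANUM output above (Dmax in nm, Smax in nm^-1).
nsh = shannon_channels(19.0, 6.23701)
sopt = s_for_channels(8, 19.0)
print(f"Nsh  = {nsh:.4f}")   # Nsh  = 37.7207
print(f"Sopt = {sopt:.4f}")  # Sopt = 1.3228
```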

My questions:

i) Why is the Nopt value suggested by SHANUM 8, even though the minimum of the f(M) function is at M = 11? (I have the same problem when I work with other noisy data.)

ii) Is the SASDAC5 data the same data used in the Konarev & Svergun paper? How can I reproduce the f(M) function in Fig. 4a (Konarev & Svergun, 2015)?

iii) The DPR_vs_Nshannon.dat file has three columns. What do they mean? (dp/dr, (dp/dr)², alpha*Omega(pM) ...?)

Thanks!

A.

## SHANUM interpretation

### Re: SHANUM interpretation

Hi Albert,

Thank you for your inquiry; the answers are below.

>>i) why the Nopt value suggested by shanum is 8 in spite of the minimum value in the f(M) function is at M = 11? (I have the same problems when I work with other noisy data)

Checking the minimum of f(M) is the right approach; in the majority of cases it gives a good estimate of the useful angular range.

However, in some cases (especially for noisy data) f(M) has a wide, flat minimum, and the angular range over which the fit quality of the Shannon approximation improves significantly actually corresponds to lower M values.

That is why, in the current implementation of SHANUM, if f(M) has a wide plateau, Nopt is estimated in a more 'conservative' way, by selecting the M value after which no significant improvement of Chi^2 (or of the correlation map value) occurs.
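The idea of a 'conservative' choice can be illustrated with a small sketch. This is not SHANUM's actual algorithm: the function name, the relative threshold `rel_tol`, and the Chi^2 values are all invented for illustration. It simply picks the smallest M after which adding another channel never improves Chi^2 by more than a given fraction.

```python
def conservative_nopt(chi2_by_m, rel_tol=0.05):
    """Smallest M after which no step M -> M+1 improves Chi^2 by more
    than rel_tol (relative to the Chi^2 at the start of the step).

    chi2_by_m: dict mapping number of channels M to Chi^2.
    rel_tol:   hypothetical threshold, not a SHANUM parameter.
    """
    ms = sorted(chi2_by_m)
    for i, m in enumerate(ms):
        steps = zip(ms[i:], ms[i + 1:])
        if all(chi2_by_m[a] - chi2_by_m[b] <= rel_tol * chi2_by_m[a]
               for a, b in steps):
            return m
    return ms[-1]

# Invented Chi^2 values with a wide, flat minimum (illustration only).
example = {4: 9.0, 5: 5.0, 6: 2.5, 7: 1.4, 8: 1.05,
           9: 1.03, 10: 1.02, 11: 1.01, 12: 1.02}

print(min(example, key=example.get))  # 11: position of the strict minimum
print(conservative_nopt(example))     # 8: last M giving a significant gain
```

In this made-up case the strict minimum sits at M = 11, but the curve is essentially flat from M = 8 onwards, so the conservative estimate is 8.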

It is also good practice to compare the results for the data with and without error estimates. For this particular case, the correlation test yields M=7 (which corresponds to an angular range up to 1.3 nm^-1, bearing in mind that the data already become noisy at 1.0 nm^-1).

>>ii) is the SASDAC5 data the same data used in the Konarev & Svergun paper? How can I reproduce the f(M) function in Fig 4a (Konarev & Svergun, 2015)?

It can be reproduced using the automated search for Dmax (which yields 17.2 nm); in this case f(M) has its minimum at M=8.

>>iii) the DPR_vs_Nshannon.dat file has three columns, what does they mean? (dp/dr, (dp/dr)², alpha*Omega(pM) ...?)

The first column is the number of Shannon channels used for the data approximation.

The second column is Omega(pM) (an integral measure of the first derivative of p(r)).

The third column is the corresponding integral measure of the second derivative of p(r), calculated in a similar way to Omega(pM).

(This last column does not influence any of SHANUM's estimates; it is stored for information only.)
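For plotting these columns, a minimal reading sketch follows. It assumes the file is plain whitespace-separated text with one row per M; the function name and the sample values are hypothetical, and any line that does not parse as three numbers is skipped.

```python
def read_dpr_vs_nshannon(text):
    """Parse three-column text into (M, first-derivative measure,
    second-derivative measure) tuples, skipping non-numeric lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        try:
            m = int(float(parts[0]))
            omega1 = float(parts[1])
            omega2 = float(parts[2])
        except ValueError:
            continue  # header or comment line
        rows.append((m, omega1, omega2))
    return rows

# Invented sample content (illustration only, not real SHANUM output).
sample = """\
# M  col2  col3
2  0.50  1.25
3  0.75  2.00
4  1.00  3.50
"""
rows = read_dpr_vs_nshannon(sample)
print(rows)

# With a real file one could instead do:
#   with open("DPR_vs_Nshannon.dat") as fh:
#       rows = read_dpr_vs_nshannon(fh.read())
```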

### Re: SHANUM interpretation

Thanks for your reply!

Please let me raise another question.

I have obtained the following Shannon results on several occasions:

According to the Chi²(M), Omega(pM) and f(M) functions, the optimal number of Shannon channels is around 12. However, if I plot the fits for each M over the experimental data (without errors), I see that the best fit is obtained with 5 Shannon channels (which corresponds to Smax = 2.4). This is consistent with the fact that I cannot obtain good p(r) functions with GNOM beyond S = 2.9.

How can I explain it?

Thanks!

### Re: SHANUM interpretation

It is difficult to see the low-angle part of the data clearly, but one can still distinguish that Fit_Shannon_7.dat does not fit this part, and Fit_Shannon5 and Fit_Shannon6 very probably also deviate systematically from the data in this region.

Besides, Chi_Rfac_vs_Nshannon.dat indicates that the best fit quality corresponds to M=10-12.

It looks as if the buffer was undersubtracted (or there is some sample/buffer mismatch). One can try to force Porod asymptotics at higher angles by subtracting a constant from the data; this may improve the p(r) fit at higher angles.
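One common way to estimate such a constant, sketched here under the assumption that the high-angle data follow I(s) ≈ K/s^4 + c, is a straight-line fit of s^4·I(s) against s^4: the slope is the constant c and the intercept is K. The function name, the cut-off `s_min`, and the synthetic data are hypothetical; which part of the curve counts as the Porod region is a judgement call for the user.

```python
def porod_constant(s, intensity, s_min):
    """Least-squares line through (s^4, s^4 * I) for s >= s_min.

    Returns (c, K): the slope c is the constant to subtract from the
    data and the intercept K is the Porod coefficient.
    """
    pts = [(q ** 4, q ** 4 * i) for q, i in zip(s, intensity) if q >= s_min]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic curve with K = 3.0 and a constant offset of 0.02.
s = [2.0 + 0.1 * k for k in range(30)]
intensity = [3.0 / q ** 4 + 0.02 for q in s]

c, k_porod = porod_constant(s, intensity, s_min=3.0)
corrected = [i - c for i in intensity]  # data with the constant removed
print(f"c = {c:.4f}, K = {k_porod:.3f}")
```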

### Re: SHANUM interpretation

Hello konarev,

This discussion is very useful for me, thanks a lot!

Is a good fit acceptable when the pair distribution function oscillates slightly?

Dmax= 11.000000000000000

Smax= 6.0127709999999999

Nsh= 21.053168979251168

Nopt= 21

Thanks,

A.

### Re: SHANUM interpretation

It looks reasonable, taking into account that the Omega(M) function has similar values for M between 10 and 23.