Dear All,

I am wondering whether my data format is OK or whether datcmp is broken. I have tried several data formats and three recent versions of datcmp. Any hint will be appreciated:

my files and output:

a.dat)

# REMARK Columns: q, I(q), error

1.185900E-02 2.14433E03 2.12911E02

1.246700E-02 2.04531E03 1.31805E02

1.307500E-02 2.01542E03 4.51282E02

1.368300E-02 2.04956E03 2.85521E02

1.429100E-02 2.06111E03 3.05166E02

1.489900E-02 2.05000E03 2.23006E02

1.550700E-02 2.06500E03 1.74990E02

1.611500E-02 2.02543E03 1.53453E02

1.672400E-02 2.02297E03 1.46936E02

... more lines follow

b.dat)

# REMARK Columns: q, I(q), error

1.185900E-02 1.95367E03 2.06718E02

1.246700E-02 1.90382E03 1.33606E02

1.307500E-02 1.84460E03 4.45725E02

1.368300E-02 1.84456E03 2.98739E02

1.429100E-02 1.82568E03 3.05721E02

1.489900E-02 1.80671E03 2.34762E02

1.550700E-02 1.79001E03 1.77641E02

1.611500E-02 1.75975E03 1.51766E02

1.672400E-02 1.74624E03 1.49467E02

....

(version 2.5.1)

Hypothesis: all data sets are similar

Alternative: at least one data set is different

Univariate Type III Repeated-Measures ANOVA

eps num Df den Df F Pr(>F) adj Pr(>F)

Assuming Sphericity 1 513 162.0400 0.000000

Greenhouse-Geyser Correction 1.0000 1 512 0.000000

Huynh-Feldt Correction 1.0000 1 512 0.000000

(version 2.5.2)

Hypothesis: all data sets are similar

Alternative: at least one data set is different

Univariate Type III Repeated-Measures ANOVA

eps num Df den Df F Pr(>F) adj Pr(>F)

Assuming Sphericity 1 513 162.5289 0.000000

Greenhouse-Geyser Correction 1.0000 1 512 0.000000

Huynh-Feldt Correction 1.0000 1 512 0.000000

(version 2.6)

% datcmp a.dat b.dat

Hypothesis: all data sets are similar

Alternative: at least one data set is different

Pair-wise Correlation Map test with correction for Familywise Error Rate (Bonferroni)

C Pr(>C) adj Pr(>C)

1 vs. 2 255.000000 0.000000 0.000000*

1 vs. 3 235.000000 0.000000 0.000000*

1 vs. 4 282.000000 0.000000 0.000000*

2 vs. 3 255.000000 0.000000 0.000000*

2 vs. 4 11.000000 0.116297 0.697785

3 vs. 4 282.000000 0.000000 0.000000*

1* 0.000000 + 1.000000 * a.dat

2 0.000000 + 1.000000 * a.dat

3 0.000000 + 1.000000 * b.dat

4 0.000000 + 1.000000 * b.dat

## DATCMP im Maverick

### Re: DATCMP im Maverick

What exactly is your question?

This means that only frames 2 and 4 may be considered similar; all others are different from each other. Given the large value of C, I'd say that there are scaling issues.

I'd give a link to the relevant publication here, but it's still in review
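A quick way to check the scaling remark by hand is sketched below. This is only an illustration under stated assumptions, not what datcmp does internally: `load_curve`, `scale_factor` and `longest_same_sign_run` are hypothetical helper names, the least-squares scale is one common choice among several, and only the nine rows of a.dat and b.dat quoted above are used.

```python
# Illustrative sketch only: these helpers are hypothetical and are NOT datcmp's algorithm.
A_DAT = """\
# REMARK Columns: q, I(q), error
1.185900E-02 2.14433E03 2.12911E02
1.246700E-02 2.04531E03 1.31805E02
1.307500E-02 2.01542E03 4.51282E02
1.368300E-02 2.04956E03 2.85521E02
1.429100E-02 2.06111E03 3.05166E02
1.489900E-02 2.05000E03 2.23006E02
1.550700E-02 2.06500E03 1.74990E02
1.611500E-02 2.02543E03 1.53453E02
1.672400E-02 2.02297E03 1.46936E02
"""

B_DAT = """\
# REMARK Columns: q, I(q), error
1.185900E-02 1.95367E03 2.06718E02
1.246700E-02 1.90382E03 1.33606E02
1.307500E-02 1.84460E03 4.45725E02
1.368300E-02 1.84456E03 2.98739E02
1.429100E-02 1.82568E03 3.05721E02
1.489900E-02 1.80671E03 2.34762E02
1.550700E-02 1.79001E03 1.77641E02
1.611500E-02 1.75975E03 1.51766E02
1.672400E-02 1.74624E03 1.49467E02
"""


def load_curve(text):
    """Parse 'q I(q) error' rows, skipping blank lines and '#' comments."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        q, intensity, err = (float(x) for x in line.split())
        rows.append((q, intensity, err))
    return rows


def scale_factor(a, b):
    """Least-squares k minimising sum(((Ia - k * Ib) / err_a) ** 2)."""
    num = sum(ia * ib / ea ** 2 for (_, ia, ea), (_, ib, _) in zip(a, b))
    den = sum(ib * ib / ea ** 2 for (_, _, ea), (_, ib, _) in zip(a, b))
    return num / den


def longest_same_sign_run(a, b):
    """Longest stretch of points where Ia - Ib keeps the same nonzero sign."""
    best = run = prev = 0
    for (_, ia, _), (_, ib, _) in zip(a, b):
        d = ia - ib
        sign = (d > 0) - (d < 0)
        if sign != 0 and sign == prev:
            run += 1
        else:
            run = 1 if sign != 0 else 0
        prev = sign
        best = max(best, run)
    return best


a = load_curve(A_DAT)
b = load_curve(B_DAT)
k = scale_factor(a, b)
print(f"scale factor a/b: {k:.3f}")
print("longest same-sign run:", longest_same_sign_run(a, b), "of", len(a))
```

For the rows shown, a.dat lies above b.dat at every point, so the differences never change sign. A scale factor clearly different from 1 together with such a long same-sign stretch is the kind of pattern one would expect from a scaling mismatch rather than from random noise.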

If you want to use the previous test, add "--test=anova"; however, this is not recommended.

Release Notes wrote:

Changes in Programs

====================

* datcmp: pair-wise correlation map set as default test

```
C Pr(>C) adj Pr(>C)
1 vs. 2 255.000000 0.000000 0.000000*
1 vs. 3 235.000000 0.000000 0.000000*
1 vs. 4 282.000000 0.000000 0.000000*
2 vs. 3 255.000000 0.000000 0.000000*
2 vs. 4 11.000000 0.116297 0.697785
3 vs. 4 282.000000 0.000000 0.000000*
```

### Re: DATCMP im Maverick

Thanks for your comments.

Sorry I wasn't clear.

I am using datcmp v2.1 (r4556). My Mac OS version is 10.9.4.

I tested using a) the same dataset twice, b) fairly similar scaled datasets, and c) completely different datasets (two unrelated proteins).

The only changes in the output are the filenames and the C values.

The C values are large even for identical datasets (the same file twice).

What is going on? Is datcmp broken in my OS version?

Mario

### Re: DATCMP im Maverick

mab wrote: What is going on? Is datcmp broken in my OS version?

No, it's perfectly fine.

The point is: if two datasets are exactly identical, then the probability that this can occur *by chance* is, in all generality, exactly 0.0.

And this is what you see.

Going into the details here would be a bit too much, but I invite you to look up the paper once it's (finally) published. Then it should become clear. I hope.

### Re: DATCMP im Maverick

Finally, the details.