Bootstrapping and non-informative sites
It is good practice to restrict phylogenetic analyses to positions for which one is sure that the sites are indeed homologous. However, to obtain a bootstrap value better than 90% for a branch, one needs only three sites that change along this branch. These bootstrap values are not lowered by adding non-informative sites to the alignment. The following example of four sequences has 5 positions that support the central branch in one orientation and 2 positions each that support the two alternatives:
spec1 ACGTG AC CG
spec2 ACGTG GG AA
spec3 TGCAC GG CG
spec4 TGCAC AC AA
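The site counts quoted for this example can be checked with a short script. This is only a sketch: the names `TOPOLOGIES`, `supported_topology` and `count_support` are made up for illustration, and a site is counted as supporting a grouping when it shows one state in one pair of sequences and a different state in the other pair (the usual parsimony criterion for four sequences).

```python
# The three possible groupings of four sequences, given as 0-based row indices.
TOPOLOGIES = {
    "spec1+spec2 | spec3+spec4": ((0, 1), (2, 3)),
    "spec1+spec3 | spec2+spec4": ((0, 2), (1, 3)),
    "spec1+spec4 | spec2+spec3": ((0, 3), (1, 2)),
}

# The example alignment from the text, with the spaces removed.
ALIGNMENT = {
    "spec1": "ACGTGACCG",
    "spec2": "ACGTGGGAA",
    "spec3": "TGCACGGCG",
    "spec4": "TGCACACAA",
}


def supported_topology(column):
    """Return the grouping a column supports, or None if it is non-informative."""
    for name, ((a, b), (c, d)) in TOPOLOGIES.items():
        if column[a] == column[b] and column[c] == column[d] and column[a] != column[c]:
            return name
    return None


def count_support(alignment):
    """Count, for each grouping, the number of sites that support it."""
    rows = list(alignment.values())
    counts = {name: 0 for name in TOPOLOGIES}
    for column in zip(*rows):
        name = supported_topology(column)
        if name is not None:
            counts[name] += 1
    return counts


if __name__ == "__main__":
    for name, n in count_support(ALIGNMENT).items():
        print(f"{name}: {n} sites")
```

Running it should report 5 sites for the spec1 with spec2 grouping and 2 sites for each of the two alternatives.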
Regardless of how many non-informative sites are added, the grouping of spec1 with spec2 (and spec3 with spec4) is found in about 80% of the bootstrapped samples.
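A simplified calculation suggests why alignment length matters so little, and where the rule of thumb of three sites comes from. The sketch below assumes an idealized case that is not taken from the text: a branch supported by k sites with no conflicting sites at all, so that the branch is recovered whenever at least one of those k sites ends up in the bootstrap sample. The function name `prob_at_least_one` is hypothetical.

```python
import math


def prob_at_least_one(k, n_sites):
    """Chance that a bootstrap sample of n_sites columns, drawn with
    replacement from n_sites columns, contains at least one of k given columns."""
    return 1.0 - (1.0 - k / n_sites) ** n_sites


# Three supporting sites in alignments of increasing length.
for n_sites in (10, 100, 1000, 10000):
    print(n_sites, round(prob_at_least_one(3, n_sites), 4))

# Limit for very long alignments: 1 - exp(-3)
print("limit", round(1.0 - math.exp(-3), 4))
```

Under these assumptions the probability stays above 1 - exp(-3), roughly 0.95, for any alignment length, which is consistent with three changing sites being enough for a bootstrap value better than 90%.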
Example of a sequence file with 50 added non-informative sites:
spec1 AAGCAGCTGT AAGCAGCTGT AAGCAGCTGT AAGCAGCTGT AAGCAGCTGT ACGTG ACCG
spec2 AACCGCCTGT AACCGCCTGT AACCGCCTGT AACCGCCTGT AACCGCCTGT ACGTG GGAA
spec3 AACGACCTCT AACGACCTCT AACGACCTCT AACGACCTCT AACGACCTCT TGCAC GGCG
spec4 ATCCACCAGA ATCCACCAGA ATCCACCAGA ATCCACCAGA ATCCACCAGA TGCAC ACAA
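The effect of padding can also be explored with a small simulation. This is a sketch under stated assumptions, not necessarily the procedure behind the reported values: each bootstrap sample is scored by simply counting the sites that support each of the three groupings, the grouping with the most sites wins, and ties are split evenly. The names `bootstrap_support`, `classify`, `PADDING` and `PAIRINGS` are illustrative only.

```python
import random

# Informative part of the example alignment (5 + 2 + 2 sites).
INFORMATIVE = {
    "spec1": "ACGTGACCG",
    "spec2": "ACGTGGGAA",
    "spec3": "TGCACGGCG",
    "spec4": "TGCACACAA",
}
# The 10-column non-informative block that is repeated in the example file.
PADDING = {
    "spec1": "AAGCAGCTGT",
    "spec2": "AACCGCCTGT",
    "spec3": "AACGACCTCT",
    "spec4": "ATCCACCAGA",
}
# Groupings: 0 = spec1 with spec2, 1 = spec1 with spec3, 2 = spec1 with spec4.
PAIRINGS = (((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2)))


def classify(column):
    """Index of the grouping a column supports, or None if non-informative."""
    for i, ((a, b), (c, d)) in enumerate(PAIRINGS):
        if column[a] == column[b] and column[c] == column[d] and column[a] != column[c]:
            return i
    return None


def bootstrap_support(n_padding_blocks, n_boot=1000, seed=0):
    """Percent support for the three groupings after resampling columns."""
    rng = random.Random(seed)
    rows = [PADDING[s] * n_padding_blocks + INFORMATIVE[s] for s in INFORMATIVE]
    labels = [classify(col) for col in zip(*rows)]
    support = [0.0, 0.0, 0.0]
    for _ in range(n_boot):
        # Draw as many columns as the alignment has, with replacement.
        sample = rng.choices(labels, k=len(labels))
        counts = [sample.count(i) for i in range(3)]
        best = max(counts)
        winners = [i for i, c in enumerate(counts) if c == best]
        for i in winners:
            support[i] += 1.0 / len(winners)  # assumption: ties are split evenly
    return [100.0 * s / n_boot for s in support]


if __name__ == "__main__":
    for blocks in (0, 5, 20, 200):
        values = bootstrap_support(blocks)
        print(f"+{10 * blocks} sites:", [round(v, 1) for v in values])
```

Each padding block adds 10 non-informative sites, so `bootstrap_support(5)` corresponds to the example file above. The exact percentages depend on the random seed and on how ties are handled.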
Table of bootstrap support (%):
Spec1 with 2: 5 informative sites support this branch. Spec1 with 3 or 1 with 4: 2 informative sites each. Means and standard deviations are over 3 replicates with 100 bootstrapped samples each; for the alternative groupings the values for both alternatives are pooled (6 values).

| Non-informative sites added | Grouping | Replicate values | Mean | Standard deviation |
|---|---|---|---|---|
| +0 | Spec1 with 2 | 82.5, 77.83, 84.17 | 81.50 | 3.29 |
| +0 | Spec1 with 3 or 1 with 4 | 12.17, 3.67, 11, 6.5, 11.83, 10.33 | 9.25 | 3.41 |
| +50 | Spec1 with 2 | 77.00, 77.00, 78.83 | 77.61 | 1.06 |
| +50 | Spec1 with 3 or 1 with 4 | 13.00, 10.00, 14.00, 9.00, 11.33, 9.83 | 11.19 | 1.96 |
| +200 | Spec1 with 2 | 78.83, 78.33, 76.50 | 77.89 | 1.23 |
| +200 | Spec1 with 3 or 1 with 4 | 14.33, 6.83, 12.83, 8.83, 14.50, 9.00 | 11.05 | 3.25 |
| +2000 | Spec1 with 2 | 79.67, 83.33, 81.33 | 81.44 | 1.83 |
| +2000 | Spec1 with 3 or 1 with 4 | 10.67, 9.67, 10.33, 6.33, 10.83, 7.83 | 9.28 | 1.81 |