
Friday, 11 February 2011

Maths in Paleontology (I): Data

''In every special doctrine of nature only so much science proper can be found as there is mathematics in it.'' - Immanuel Kant, Metaphysical Foundations of Natural Science (1786)

As a warning, the maths professor who had the thankless task of teaching us first-semester scientists-to-be the very basics of his field chose Kant's statement to open his first lecture on "higher" maths. However, when I started my studies in geology and paleontology, there was another saying among old-school geology teachers: "A bad mathematician makes a good geologist."

Many fellow students were more willing to believe the latter words than the inconvenient alternative. (I always considered this belief outdated, and I got the feeling that geology as a science might have been shaped not only by the talents of its protagonists but also by their limitations in terms of exactness and rigor.)

Luckily, you were not necessarily considered a bad geologist if you were interested in maths, and the notion that modern geoscience involves maths and exact methods (e.g. quantitative data analysis, databases, multivariate statistics and geostatistics, geoinformatics and geographic information systems, 3D and 4D modelling, remote sensing) was clearly on the rise. Perhaps from a biologist's point of view this story would be different but, to tell you the truth, some of the biology-based paleontologists I got to know do not live much on the exact side either.

Apart from microscopy seminars and field and lab practicals, which teach you ways of acquiring data, some classes in statistics and data analysis during the first semesters give you an idea of how data are structured, how to sample, and how to work with data in order to find new knowledge, e.g. a relationship between two phenomena not previously considered to be related.

At the very beginning you will learn that different types of data are used in paleontology and that you have to bring your data into shape for any kind of mathematical analysis tool, i.e. arrange them as a data table such as the following:

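| Specimen | Class | State of XYZ | No. of UVW | Size L [mm] | Size M [cm²] |
|---|---|---|---|---|---|
| Aa | A | a | 2 | 12.1 | 234 |
| Bb | B | | 3 | 13.3 | 87 |
| ... | ... | ... | ... | ... | ... |
| Xx | X | ... | ... | ... | |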

Normally, rows of the table represent samples (or groups of samples, or taxa), whereas columns represent various features or measures. Such a feature may be membership in a certain class or category, or the presence, absence, or particular state of a character. Measured values may have a discrete distribution (e.g. natural numbers such as the number of teeth, segments, or body chambers) or a continuous distribution (e.g. length, area, angle, or temperature measurements).

Many kinds of data relevant to paleontologists can be arranged as tables, such as morphological and microstructural data, stable isotope and other geochemical data, geographical, sedimentological, and stratigraphic data, as well as taphonomic and paleoecological data. Some of these data have a special structure and can be assigned to one of the following types:


Compositional data...

... add up to 100%. Chemical compositions of fossils or faunal compositions are compositional data:

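| Community | Trilobites [%] | Brachiopods [%] | Echinoderms [%] | Poriferans [%] | Nautiloids [%] |
|---|---|---|---|---|---|
| A | 23 | 42 | 17 | 5 | 13 |
| B | 10 | 15 | 5 | 50 | 20 |
| ... | ... | ... | ... | ... | ... |
| X | ... | ... | ... | ... | ... |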

These data require careful consideration and a special kind of maths because the variables are necessarily correlated: since the parts must sum to 100%, an increase in one group forces a decrease in the others. A real dependence, e.g. between brachiopod and echinoderm abundances, can therefore be obscured by variation in another group.
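The post does not say which "special kind of maths" is meant. One widely used option is the log-ratio approach to compositional data, and the sketch below assumes that choice. It applies a centred log-ratio (clr) transform to the two communities from the table above; the function name `clr` and the dictionary layout are my own and only serve the illustration.

```python
import math

# Faunal compositions from the table above, in percent (each row sums to 100).
communities = {
    "A": {"Trilobites": 23, "Brachiopods": 42, "Echinoderms": 17,
          "Poriferans": 5, "Nautiloids": 13},
    "B": {"Trilobites": 10, "Brachiopods": 15, "Echinoderms": 5,
          "Poriferans": 50, "Nautiloids": 20},
}


def clr(composition):
    """Centred log-ratio: log of each part minus the mean log of all parts."""
    parts = list(composition.values())
    if any(p <= 0 for p in parts):
        # Logarithms are undefined for zero or negative parts.
        raise ValueError("clr needs strictly positive parts")
    mean_log = sum(math.log(p) for p in parts) / len(parts)
    return {name: math.log(p) - mean_log for name, p in composition.items()}


# Assumption: clr is only one of several possible log-ratio transforms.
clr_values = {label: clr(comp) for label, comp in communities.items()}
for label, values in clr_values.items():
    print(label, {name: round(v, 3) for name, v in values.items()})
```

The transformed values of each community sum to zero and are no longer squeezed between 0 and 100, which is why ordinary statistics are usually applied to them rather than to the raw percentages. Groups that are absent (0%) need separate treatment before such a transform can be used.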


Spatially or temporally correlated data

‘Spatial correlation’ means that values for data points close to each other are more similar than values for more distant data points, e.g. the faunal composition of an ecosystem in Arizona is more like that of a Nevada community than that of a Massachusetts community.

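| Locality | Easting (X) | Northing (Y) | Facies | Archosaurs [%] | Rhynchosaurs [%] |
|---|---|---|---|---|---|
| A | 5687 | 0487 | lacustrine | 23 | 45 |
| B | 6485 | 0808 | fluviatile | 34 | 38 |
| C | 6800 | 1490 | fluviatile | 40 | 37 |
| ... | ... | ... | ... | ... | ... |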

Geostatistics is the usual toolbox for dealing with spatially correlated data. Spatial correlation can also occur on much smaller scales, e.g. the shape and size of two skull bones in contact with each other can show a stronger dependence than the shape and size of bones that are farther apart.
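A first geostatistical look at such a table is often a variogram cloud: for every pair of localities, the distance is plotted against half the squared difference of the measured value. The sketch below does this for the archosaur percentages of localities A to C. It assumes that the coordinates in the table are four-digit eastings and northings in the same (unstated) unit; the function name is made up for this example.

```python
import math
from itertools import combinations

# (easting, northing, archosaur share in %) for the localities in the table.
# Assumption: coordinates are read as 5687/0487 etc.; their unit is not given.
localities = {
    "A": (5687, 487, 23),
    "B": (6485, 808, 34),
    "C": (6800, 1490, 40),
}


def variogram_cloud(points):
    """Return (pair, distance, semivariance) per pair, sorted by distance."""
    cloud = []
    for (name_i, (x_i, y_i, z_i)), (name_j, (x_j, y_j, z_j)) in combinations(
        points.items(), 2
    ):
        distance = math.hypot(x_i - x_j, y_i - y_j)
        semivariance = 0.5 * (z_i - z_j) ** 2
        cloud.append((name_i + "-" + name_j, distance, semivariance))
    return sorted(cloud, key=lambda entry: entry[1])


cloud = variogram_cloud(localities)
for pair, distance, semivariance in cloud:
    print(pair, round(distance, 1), semivariance)
```

With only three localities this is merely a toy, but it shows the pattern that spatial correlation predicts: the closest pair differs least and the most distant pair differs most. A real study would bin many pairs by distance and fit a variogram model.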

In paleontology temporal correlation is quite common, especially if your study considers different stratigraphic ages or sedimentological field data:

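| Population | Horizon | Ar/Ar age [Ma] | Facies | δ18O [‰] | Average size [mm] |
|---|---|---|---|---|---|
| A | 1 | 210 ± 1 | deltaic | -2.0 | 5.2 |
| B | 2a | N/A | distal shelf | 1.4 | 6.4 |
| C | 2c | 207 ± 2 | ? | 2.1 | 6.8 |
| D | 4 | 200 ± 1 | deltaic | -2.2 | 6.0 |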

As in stock market analysis, methods of time series analysis can be applied to interpret temporally correlated data (i.e. time series). Such data may be relevant for your study as they often indicate evolutionary trends (biological evolution in the strict sense, but also the evolution of paleoenvironments) or cyclic processes with a certain periodicity, and they can form the basis for relating contemporaneous processes in the geological past (e.g. stratigraphic correlation of separate sedimentary successions).


Orientation data

For elongated fossils such as conical shells or long bones, the orientation of the fossil's long axis relative to the geographical coordinate system can be measured using a compass (with inclinometer). The orientation of bedding planes can be documented in a similar way. Such measurements are often used to deduce the former transport direction of an ancient sediment transport and deposition system (such as a river, delta, or alluvial fan). A data table with orientation data may look like this:

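| Specimen No. | Description | Length [cm] | Horizon | Azimuth | Dip |
|---|---|---|---|---|---|
| 1 | long bone | 21 | 1 | N 20° E | |
| 2 | rib | 12 | 1 | N 10° W | |
| 3 | calamite stem | 80 | 2a | N 15° E | |
| ... | ... | ... | ... | ... | ... |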

“Azimuth” refers to the angle relative to north. Orientation data are distributed on a hemisphere. Mean values (e.g. the average orientation of long bones) and other distribution parameters cannot be derived by directly averaging the orientation angles; vector arithmetic has to be applied instead.
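Why plain averaging fails is easiest to see with the three azimuths from the table, written as compass bearings (N 20° E = 20°, N 10° W = 350°, N 15° E = 15°). The sketch below is a simplification: it ignores dip and treats the azimuths as directions on a circle, not as axes on a hemisphere. The function name `mean_azimuth` is mine.

```python
import math

# Azimuths from the table as bearings clockwise from north, in degrees.
azimuths_deg = [20.0, 350.0, 15.0]


def mean_azimuth(azimuths):
    """Vector mean of bearings (degrees) and the mean resultant length."""
    # Each bearing becomes a unit vector with an east and a north component.
    east = sum(math.sin(math.radians(a)) for a in azimuths)
    north = sum(math.cos(math.radians(a)) for a in azimuths)
    mean = math.degrees(math.atan2(east, north)) % 360.0
    # Resultant length near 1 means tightly clustered, near 0 means scattered.
    resultant = math.hypot(east, north) / len(azimuths)
    return mean, resultant


naive_mean = sum(azimuths_deg) / len(azimuths_deg)
vector_mean, resultant_length = mean_azimuth(azimuths_deg)
print(round(naive_mean, 1), round(vector_mean, 1), round(resultant_length, 2))
```

The arithmetic mean of the three angles is about 128°, i.e. roughly south-east, although all three fossils point close to north. The vector mean is about 8°, which matches what you would read off a rose diagram. If the long axes have no distinguishable "head" end, the angles are axial data and would need additional treatment (for example doubling the angles before averaging).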


Cladistic data

Phylogeny on the basis of morphology conventionally involves cladistic methods, especially in vertebrate paleontology, which deals with a particularly character-rich group that is deemed suitable for cladistic approaches using software specialized for the calculation of phylogenetic trees (e.g. PAUP, WinClada).

In cladistic datasets rows represent taxa, mostly species or genera of the group of interest, and columns represent characters (ordered by number), i.e. features of the skeleton which vary among the included taxa:

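| Taxon | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| A-saurus | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ? | 0 |
| B-raptor | ? | 0 | 1 | ? | 0 | 0 | 1 | 1 | 0 | 0 |
| C-onyx | 1 | 1 | 0 | ? | - | 1 | - | 1 | 0 | 1 |
| D-ops | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
| E-mimus | 0 | - | 2 | 1 | 1 | 2 | 0 | 0 | 1 | 1 |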

One of the main issues in cladistics is the definition of characters and the correct (unbiased) coding of morphological information. You can include qualitative differences ("bone X contacts bone Y but not bone Z" = character state “0”; "bone X contacts bones Y and Z" = character state “1”) and quantitative differences ("length of metatarsal 3 larger than or as large as length of metatarsal 4" = character state "0"; "mt3 is shorter than mt4" = "1"). Sometimes mixed character states like "0 or 1 [but not 2]" occur in a taxon and are coded accordingly.
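To give an idea of how such a matrix is handled as data, here is a small sketch that compares two coded rows while skipping unknown entries. It is a plain mismatch count, not a parsimony analysis as carried out by PAUP or WinClada. Treating "-" as "inapplicable" is my assumption about the example matrix, and the function name is hypothetical.

```python
# Character matrix from the table above: one string of ten states per taxon.
matrix = {
    "A-saurus": "00000010?0",
    "B-raptor": "?01?001100",
    "C-onyx": "110?-1-101",
    "D-ops": "1121111011",
    "E-mimus": "0-21120011",
}

# Assumption: "?" codes a missing state and "-" an inapplicable character.
UNKNOWN = {"?", "-"}


def character_distance(row_a, row_b):
    """Return (share of differing characters, number of comparable ones)."""
    comparable = [
        (a, b) for a, b in zip(row_a, row_b)
        if a not in UNKNOWN and b not in UNKNOWN
    ]
    if not comparable:
        return None, 0
    differing = sum(1 for a, b in comparable if a != b)
    return differing / len(comparable), len(comparable)


for other in ("B-raptor", "D-ops"):
    share, n = character_distance(matrix["A-saurus"], matrix[other])
    print("A-saurus vs", other, round(share, 2), "on", n, "characters")
```

The second number matters as much as the first: the more question marks a taxon has, the fewer characters a comparison actually rests on.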


Missing data...

...occur all the time in paleontology, either because specimens are not complete enough, because their geological age cannot be exactly determined, because specimens are too rare or valuable to be subjected to a destructive analysis method, or because they are for some reason no longer accessible. "N/A" ("not available" or "not applicable"), empty entries, or question marks often symbolize missing data.


Some introductory literature:

Borradaile, G. J. 2003. Statistics of Earth Science Data. Springer, Berlin, 280 pages. ISBN 3540436030

Swan, A. R. H. and M. Sandilands. 1995. Introduction to geological data analysis. Blackwell, Oxford, 446 pages. ISBN 0632032243

Tuesday, 26 May 2009

Lineage concept vs cladistics in continental biostratigraphy

The white hair of my chief Ph.D. supervisor is to some degree explained by his lifelong efforts to get a grip on Carboniferous to Permian continental biostratigraphy, trying out different groups such as cockroaches, conchostracans, freshwater sharks, and amphibians.

One of the underlying concepts, which I suppose I will always find hard to believe, is the idea of searching for and finding so-called lineages, i.e. series of species occurring successively in the stratigraphic record which differ stepwise in anatomy because each species has descended from the next-oldest one.

Of course every species has an ancestor and many have descendants, but how can I identify them from the fossil record? Is there not the typical problem of epistemic vagueness of the ancestor in any kind of phylogeny (e.g. discussed by Wolf-Ernst Reif in some of his many theoretical papers on cladistics in paleontology)?

Searching for lineages leads to a fallacy?

The idea that within a continental sedimentary succession a species occurring lower in the section than a related species should be regarded as the ancestor of the latter, unless disproven, has always reminded me of a type of logical fallacy called post hoc ergo propter hoc: "B occurred later than A, therefore A must be the reason for B." In terms of imposing the lineage concept: "Species B occurred subsequent to species A, therefore A must be the ancestor of B."

If a multiply and irregularly branched bush is a good analogy for how evolution works, I daresay that a biostratigrapher who picks up the isolated fragments of branches (i.e. fossils) and glues them together into a few long continuous branches ends up with a bad model of the bush.

The problems occur after I have established a biostratigraphic zonation concept on the basis of what I think is a lineage: Someone working on the same material puts the species of my 'lineage' into a cladistic analysis and finds that there is almost no concordance between the appearance date of a species and its likely phylogenetic position.

If I agree that similarities and dissimilarities in morphology, histology, behavior, etc. should form the basis of a classification, and I consider the data basis of the phylogenetic analysis sufficient, I will have to admit that my scheme has been proven wrong. If instead I suppose that the data are not sufficient, and I myself cannot add more, then I will have to concede that my scheme is at least no more valid than the alternative.

Proving microevolution depends on the sufficiency of "population" samples?

I'm not saying that it is impossible to find arguments in favor of an ancestor-descendant relationship. Imagine I have a large enough sample of specimens of the supposedly related species A, B, and C from three successive horizons. For A, B, and C the empirical distributions of morphological parameters can be compared:
If the mean and variance for A are not significantly different from the mean and variance of B, and
if the mean and variance for B are not significantly different from the mean and variance of C,
but there is a significant difference in the means or variances of A and C,
I could infer that microevolution took place from A to C...
...but do we have such samples, let's say for tetrapods?
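The three comparisons above can be sketched in a few lines. The example below uses a permutation test on the difference in means only; variances would need a separate test, and the choice of a permutation test is mine, not something the argument prescribes. All measurements are invented for illustration, and the function names and the 5% threshold are assumptions.

```python
import random
from statistics import mean


def permutation_p_value(sample_a, sample_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)  # fixed seed so that repeated runs agree
    observed = abs(mean(sample_a) - mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for rounding
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_permutations + 1)


def supports_gradual_shift(a, b, c, alpha=0.05):
    """True if A vs B and B vs C are not significant but A vs C is."""
    return (
        permutation_p_value(a, b) >= alpha
        and permutation_p_value(b, c) >= alpha
        and permutation_p_value(a, c) < alpha
    )


# Hypothetical size measurements [mm] for three successive horizons.
species_a = [5.0, 5.2, 4.8, 5.1, 4.9, 5.3, 5.0, 4.7]
species_b = [5.3, 5.5, 5.0, 5.4, 5.2, 5.6, 5.1, 5.3]
species_c = [5.6, 5.8, 5.4, 5.7, 5.5, 5.9, 5.6, 5.4]
print(permutation_p_value(species_a, species_b))
print(permutation_p_value(species_b, species_c))
print(permutation_p_value(species_a, species_c))
print(supports_gradual_shift(species_a, species_b, species_c))
```

Keep in mind that "not significantly different" only means that the samples cannot be told apart at the given sample size. With a handful of specimens per horizon almost any pair of species would pass the first two conditions, which is exactly why the sufficiency of the samples is the crux.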

An example: Amphibian Biostratigraphy

These problems have been discussed for the amphibian biostratigraphy of the European Permocarboniferous as developed by Werneburg and Schneider and applied to various amphibian occurrences; see for example:

R. Werneburg & J.W. Schneider, 2006, Amphibian biostratigraphy of the European Permo-Carboniferous. In: S.G. Lucas, G. Cassinis and J.W. Schneider, Editors, Non-Marine Permian Biostratigraphy and Biochronology: Geological Society of London, Special Publications 265, pp. 201–215.

R. Werneburg, A. Ronchi, and J.W. Schneider, 2007, The Early Permian Branchiosaurids (Amphibia) of Sardinia (Italy): Systematic Palaeontology, Palaeoecology, Biostratigraphy and Palaeobiogeographic Problems. Palaeogeography, Palaeoclimatology, Palaeoecology, Volume 252, Issues 3-4, 3 September 2007, Pages 383-404

The zonation scheme and proposed lineages have been criticized by Steyer (2004) as a stratophenetic rather than a truly phylogenetic approach, given the criteria by which the authors relate different species:

J. S. Steyer, 2004, Phylogenetic or stratophenetic systematics? - Comment of R. Werneburg: The branchiosaurid amphibians from the Lower Permian of Buxières-les-Mines, Bourbon l’Archambault Basin (Allier, France) and their biostratigraphic significance. Bull. Soc. géol. France, 2004, 175 (4), 423-425

Another particular problem is that amphibians are known to be frequently subject to heterochrony, i.e. evolutionary shifts in ontogeny. Neoteny in particular is a common phenomenon and can obscure characteristic features.

A recent analysis by Schoch & Milner (2008) of branchiosaurids, a group of small neotenic dissorophoid temnospondyls often considered for biostratigraphy, takes a cladistic approach and proposes a scenario of where in the tree (at which nodes) neoteny and changes in life style occurred:

R.R. Schoch & A.R. Milner, 2008, The intrarelationships and evolutionary history of the temnospondyl family Branchiosauridae. Journal of Systematic Palaeontology, 6: 409-431

While the position of Branchiosaurus as the outgroup of the major clades (Melanerpeton clade, Apateon clade) corresponds to the order of occurrences in the stratigraphic record, certain long ghost lineages occur. In particular, the interpretation of Apateon gracilis/Melanerpeton gracile shows a mismatch between the cladistic approach of Schoch & Milner and the scheme of Werneburg & Schneider. This divergence is also a consequence of conflicting interpretations of the gracil(e/is) material; nevertheless, it demonstrates the potential for stratigraphic misinterpretation:

If I believe that a species forms the end member of a lineage because it is the youngest in certain sedimentary sequences I may underestimate the species' stratigraphic range - unlike the cladistic analysis which (if well-founded) would imply a deep divergence suggesting that some of the earlier record of the species is missing.
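The reasoning behind a ghost lineage fits into a few lines. Sister taxa split from their common ancestor at the same time, so the one that appears later in the record must have an unrecorded history reaching back at least to the first appearance of its older sister. The sketch below computes that minimum gap; the ages are invented for illustration and do not refer to any of the species discussed here, and the function name is hypothetical.

```python
def ghost_lineage_myr(first_appearance_ma, sister_first_appearance_ma):
    """Minimum unrecorded history (Myr) implied by an older sister taxon.

    Ages are given in Ma, so a larger number means an older first appearance.
    """
    return max(0.0, sister_first_appearance_ma - first_appearance_ma)


# Hypothetical first appearances in Ma, for illustration only.
species_first_appearance = 295.0
sister_first_appearance = 300.0
gap = ghost_lineage_myr(species_first_appearance, sister_first_appearance)
print(gap)
```

A zonation that reads the species' first appearance as its true origin would, in this made-up case, underestimate its range by that gap.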

Decoupling (continental) biostratigraphic zonation from the lineage concept

Assuming that evolution works in a bush-like rather than a lineage-like way, I don't see why we can't keep a biostratigraphic zonation even in the case of sparse continental records. I can still associate a series of morphologically defined taxa with a certain stratigraphic range and spatial distribution, until the concept has been shown to be inadequate (or inadequate outside a more narrowly defined spatiotemporal window).

Whether it is a lineage-like relationship of species or another factor (related to geography, climate, ecology, or something else) that makes biostratigraphy work is a question which might be solved only in some cases.