Memories and delusions on phylogenetic comparative methods in econometrics

Today, while browsing the old files on my computer, I found a text and some figures in which I tried to explain why comparative methods might be of interest to people studying macroeconomics. I just applied the same reasoning we usually use in biology: countries are not independent entities, since historically they have been related to different degrees (some have arisen from others). Therefore, the relationships among macroeconomic variables that we usually see in graphs covering a large number of countries might only reflect a spurious correlation produced by the historical relationships among those countries. I know, this kind of biology-economy (or evolution-history) metaphor is flawed and, as one of my former bosses once put it: ‘the danger of metaphors is that the better they are the less they look like a metaphor’.

That’s probably why it ended up among the old files on my computer, where this kind of quick idea belongs, and I’m happy to say so. However, because this blog is a bit of a jumble too, I thought I’d share the graphs here:

Here, ‘Felsenstein’s best case scenario’
Here, ‘Felsenstein’s worst case scenario’
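
The argument can be illustrated with a toy simulation. The sketch below is not the code behind the original figures: it assumes that the ‘best case’ means all countries are independent (a star-like history) and that the ‘worst case’ means two deeply separated lineages, each giving rise to several closely related countries. The two variables never influence each other, and every name and parameter value (`deep_sd`, `shallow_sd`, 10 countries per lineage) is a hypothetical choice.

```python
import random
import statistics


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_r(n_per_lineage, deep_sd, shallow_sd, rng):
    """Correlation between two variables that evolve independently
    along a history made of two lineages (assumed toy model)."""
    xs, ys = [], []
    for _ in range(2):
        # Change accumulated on the long branch leading to each lineage.
        ax, ay = rng.gauss(0, deep_sd), rng.gauss(0, deep_sd)
        for _ in range(n_per_lineage):
            # Small independent change in each descendant country.
            xs.append(ax + rng.gauss(0, shallow_sd))
            ys.append(ay + rng.gauss(0, shallow_sd))
    return pearson(xs, ys)


def mean_abs_r(deep_sd, shallow_sd, n_sims=500, n_per_lineage=10, seed=1):
    """Average absolute correlation over many simulated histories."""
    rng = random.Random(seed)
    rs = [abs(simulate_r(n_per_lineage, deep_sd, shallow_sd, rng))
          for _ in range(n_sims)]
    return statistics.fmean(rs)


best = mean_abs_r(deep_sd=0.0, shallow_sd=1.0)   # independent countries
worst = mean_abs_r(deep_sd=5.0, shallow_sd=1.0)  # two deep lineages
print(f"mean |r|, independent countries: {best:.2f}")
print(f"mean |r|, two deep lineages:     {worst:.2f}")
```

Under these assumptions the second number comes out much larger than the first, even though no relationship between the two variables was ever simulated: with two deep lineages there are effectively only two independent data points.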

I am writing a multivariate statistics book

I have been repeating this non-stop over the last few months. I tell myself that this way I will feel the pressure to finally finish a small project that has been parked for three or four years, if only out of embarrassment at the thought that someone might one day ask me where it is.

Well, it is not really a BOOK book. It is more of a short handbook. Well, more of a handbook for myself, although I understand it may be useful to others. I told myself I could write a short reference handbook covering all the ins and outs of the multivariate statistics techniques I use most (the ones used in morphometrics, that is). Here is the table of contents:

  • Introduction
  • Vectors and matrices: properties and operations
  • Descriptive statistics: mean and covariance matrix, spaces, distances and correlations
  • Data reorganization: principal component analysis, multidimensional scaling, partial least squares, discriminant analysis, clustering
  • Normality: MANOVA, multiple and multivariate linear regression
  • Bootstrap and permutations

If anything is missing, let me know.

My plan, for each concept, is to:

  • Give a short introduction on what it is and what it is usually used for
  • Show the analytical derivation (by hand, that is, like at school) together with a numerical example for two variables
  • Visualize it for two variables
  • Discuss what happens when the number of variables increases, and in particular when the number of variables exceeds the number of subjects
  • Attach an R script that estimates it using only the most basic functions possible

My first idea was to upload it here once it is finished so that anyone who wants it can download it, for the sake of open access, the common good, and other hippie concepts I find appealing. Then I thought I would rather publish it with a publisher (if I can find one), because I would like the manuscript to go through an editing process. Anyway, when I am close to finishing (in a few weeks, I hope) I will see. In any case, I will send it to anyone who asks me personally.

There will be enough for everyone, so don’t start looting the bookshops just yet.


Summer reads (papers)

Over the last few weeks of holidays I’ve started to store some papers I’d like to read. Now all my devices and apps are full of references, open and in no particular order, which is causing me a bit of stress (obsessive-compulsive disorder alert). I thought it might be useful to write a post with all the references so that they’re out of my daily activities and I can find them all quickly in a few days. Plus, they may be interesting for other people. Here they are:

Inherent forms and the evolution of evolution (S. A. Newman):

John Bonner presented a provocative conjecture that the means by which organisms evolve has itself evolved. The elements of his postulated nonuniformitarianism in the essay under discussion—the emergence of sex, the enhanced selection pressures on larger multicellular forms—center on a presumed close mapping of genotypic to phenotypic change. A different view emerges from delving into earlier work of Bonner’s in which he proposed the concept of “neutral phenotypes” and “neutral morphologies” allied to D’Arcy Thompson’s analysis of physical determinants of form and studied the conditional elicitation of intrinsic organizational properties of cell aggregates in social amoebae. By comparing the shared and disparate mechanistic bases of morphogenesis and developmental outcomes in the embryos of metazoans (animals), closely related nonmetazoan holozoans, more distantly related dictyostelids, and very distantly related volvocine algae, I conclude, in agreement with Bonner’s earlier proposals, that understanding the evolution of multicellular evolution requires knowledge of the inherent forms of diversifying lineages, and that the relevant causative factors extend beyond genes and adaptation to the physics of materials.

Making and breaking symmetry in development, growth and disease (D. T. Grimes):

Consistent asymmetries between the left and right sides of animal bodies are common. For example, the internal organs of vertebrates are left-right (L-R) asymmetric in a stereotyped fashion. Other structures, such as the skeleton and muscles, are largely symmetric. This Review considers how symmetries and asymmetries form alongside each other within the embryo, and how they are then maintained during growth. I describe how asymmetric signals are generated in the embryo. Using the limbs and somites as major examples, I then address mechanisms for protecting symmetrically forming tissues from asymmetrically acting signals. These examples reveal that symmetry should not be considered as an inherent background state, but instead must be actively maintained throughout multiple phases of embryonic patterning and organismal growth.

Genomics of developmental plasticity in animals (E. Lafuente & P. Beldade):

Developmental plasticity refers to the property by which the same genotype produces distinct phenotypes depending on the environmental conditions under which development takes place. By allowing organisms to produce phenotypes adjusted to the conditions that adults will experience, developmental plasticity can provide the means to cope with environmental heterogeneity. Developmental plasticity can be adaptive and its evolution can be shaped by natural selection. It has also been suggested that developmental plasticity can facilitate adaptation and promote diversification. Here, we summarize current knowledge on the evolution of plasticity and on the impact of plasticity on adaptive evolution, and we identify recent advances and important open questions about the genomics of developmental plasticity in animals. We give special attention to studies using transcriptomics to identify genes whose expression changes across developmental environments and studies using genetic mapping to identify loci that contribute to variation in plasticity and can fuel its evolution.

The evolution of phenotypic correlations and ‘developmental memory’ (R. A. Watson, G. P. Wagner, M. Pavlicev, D. M. Weinreich, R. Mills):

Development introduces structured correlations among traits that may constrain or bias the distribution of phenotypes produced. Moreover, when suitable heritable variation exists, natural selection may alter such constraints and correlations, affecting the phenotypic variation available to subsequent selection. However, exactly how the distribution of phenotypes produced by complex developmental systems can be shaped by past selective environments is poorly understood. Here we investigate the evolution of a network of recurrent nonlinear ontogenetic interactions, such as a gene regulation network, in various selective scenarios. We find that evolved networks of this type can exhibit several phenomena that are familiar in cognitive learning systems. These include formation of a distributed associative memory that can “store” and “recall” multiple phenotypes that have been selected in the past, recreate complete adult phenotypic patterns accurately from partial or corrupted embryonic phenotypes, and “generalize” (by exploiting evolved developmental modules) to produce new combinations of phenotypic features. We show that these surprising behaviors follow from an equivalence between the action of natural selection on phenotypic correlations and associative learning, well‐understood in the context of neural networks. This helps to explain how development facilitates the evolution of high‐fitness phenotypes and how this ability changes over evolutionary time.

Evolutionary significance of phenotypic accommodation in novel environments: an empirical test of the Baldwin effect (A. V. Badyaev):

When faced with changing environments, organisms rapidly mount physiological and behavioural responses, accommodating new environmental inputs in their functioning. The ubiquity of this process contrasts with our ignorance of its evolutionary significance: whereas within-generation accommodation of novel external inputs has clear fitness consequences, current evolutionary theory cannot easily link functional importance and inheritance of novel accommodations. One hundred and twelve years ago, J. M. Baldwin, H. F. Osborn and C. L. Morgan proposed a process (later termed the Baldwin effect) by which non-heritable developmental accommodation of novel inputs, which makes an organism fit in its current environment, can become internalized in a lineage and affect the course of evolution. The defining features of this process are initial overproduction of random (with respect to fitness) developmental variation, followed by within-generation accommodation of a subset of this variation by developmental or functional systems (‘organic selection’), ensuring the organism’s fit and survival. Subsequent natural selection sorts among resultant developmental variants, which, if recurrent and consistently favoured, can be inherited when existing genetic variance includes developmental components of individual modifications or when the ability to accommodate novel inputs is itself heritable. Here, I show that this process is consistent with the origin of novel adaptations during colonization of North America by the house finch. The induction of developmental variation by novel environments of this species’s expanding range was followed by homeostatic channelling, phenotypic accommodation and directional cross-generational transfer of a subset of induced developmental outcomes favoured by natural selection. These results emphasize three principal points. First, contemporary novel adaptations result mostly from reorganization of existing structures that shape newly expressed variation, giving natural selection an appearance of a creative force. Second, evolutionary innovations and maintenance of adaptations are different processes. Third, both the Baldwin and parental effects are probably a transient state in an evolutionary cycle connecting initial phenotypic retention of adaptive changes and their eventual genetic determination and, thus, the origin of adaptation and evolutionary change.

Bonus tracks:

Why the reward structure of science makes reproducibility problems inevitable (R. Heesen):

Recent philosophical work has praised the reward structure of science, while recent empirical work has shown that many scientific results may not be reproducible. I argue that the reward structure of science incentivizes scientists to focus on speed and impact at the expense of the reproducibility of their work, thus contributing to the so-called reproducibility crisis. I use a rational choice model to identify a set of sufficient conditions for this problem to arise, and I argue that these conditions plausibly apply to a wide range of research situations. Currently proposed solutions will not fully address this problem. Philosophical commentators should temper their optimism about the reward structure of science.

Null hypothesis significance testing: A review of an old and continuing controversy (R. S. Nickerson):

Null hypothesis significance testing (NHST) is arguably the most widely used approach to hypothesis evaluation among behavioral and social scientists. It is also very controversial. A major concern expressed by critics is that such testing is misunderstood by many of those who use it. Several other objections to its use have also been raised. In this article the author reviews and comments on the claimed misunderstandings as well as on other criticisms of the approach, and he notes arguments that have been advanced in support of NHST. Alternatives and supplements to NHST are considered, as are several related recommendations regarding the interpretation of experimental data. The concluding opinion is that NHST is easily misunderstood and misused but that when applied with good judgment it can be an effective aid to the interpretation of experimental data.


P-values and the procés

Twitter has handed me a magnificent excuse to explain a phenomenon that is fashionable in my line of work, and I do not intend to let it slip away. Actually, this phenomenon goes beyond my work: it is a statistical concept, and lately it is on everyone’s lips in the sciences and social sciences (mostly for the worse). I am talking about p-values. I will take the chance to explain them in a technical version and in a ‘procés trial’ version.

Technical version:

Imagine you want to know whether the new blood-pressure-lowering drug you have developed works or not. What do we do? We form two groups of people: one group takes the drug and the other does not. Then we measure the blood pressure of the people in both groups and check whether the group means differ. If the group that received the drug has lower blood pressure than the control group, perhaps our drug is working.

Only ‘perhaps’? Only perhaps, because if the difference between the two groups is tiny, it may be due not to the drug but to a thousand other things that could be affecting the two groups differently by chance. So we do a small calculation: if we assume that blood pressure varies randomly among the individuals in the population, what is the probability of obtaining a difference between two randomly formed groups of people as large as the one we observed, or larger? That probability is a p-value. The lower it is, the more strongly it suggests that the difference between our control group and the group that took the drug is not due to chance. So what is the debate about? Look, I will explain it using the procés.
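
One way to make that small calculation concrete is a permutation test, sketched below. This is only one possible reading of the paragraph above (a classical t-test would be another). The blood pressure readings are made up for illustration, and the function name `permutation_p_value` and its parameters are hypothetical.

```python
import random
import statistics


def permutation_p_value(control, treated, n_perm=10_000, seed=0):
    """One-sided permutation p-value for 'treated has a lower mean than control'.

    Repeatedly reshuffles the group labels and counts how often a random
    split gives a difference at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = statistics.fmean(control) - statistics.fmean(treated)
    pooled = list(control) + list(treated)
    n_control = len(control)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # assign people to the two groups at random
        diff = (statistics.fmean(pooled[:n_control])
                - statistics.fmean(pooled[n_control:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Made-up systolic blood pressure readings (mmHg), for illustration only.
control = [142, 138, 150, 145, 139, 148, 152, 141]
treated = [135, 129, 140, 132, 138, 127, 136, 131]

observed, p = permutation_p_value(control, treated)
print(f"observed difference in means: {observed:.3f} mmHg")
print(f"permutation p-value: {p:.4f}")
```

The p-value here only answers the question posed above: how often random grouping alone produces a difference this large. It says nothing about why the difference exists.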

Procés version:

I came across this article on Twitter, written by a gentleman with many followers. It is about the procés trial, it is titled ‘Cuando una moneda siempre cae del mismo lado’ (‘When a coin always lands on the same side’), and the text includes that little paragraph you can see. It resembles what I have just explained, doesn’t it? When a p-value is very low, when the probability that something happened by chance is very low, there must be some mechanism of interest behind it that is affecting it.

Well, the p-value has come in for a lot of criticism lately, and I will use this paragraph to illustrate why:

  1. What does ‘systematically’ mean? Tossing the coin 2 times? 4? 50? Besides, a low probability is not a zero probability: you can toss a coin 6 times and get tails all 6 times. It is unlikely, but not impossible. The fact that something is improbable guarantees nothing: improbable things happen to all of us all the time.
  2. Would you be surprised if it were a die instead of a coin? Would you be surprised to roll a die 6 times and never get a three? I would not be, much. Why do we compare this situation to a coin and not to a die? Perhaps we should think harder about the conditions we use to describe ‘chance’? Because otherwise we are manipulating.
  3. ‘The probability that someone is steering the coin’? Coins are not perfect: they are deformed, their weight is not evenly distributed… In the long run, a coin will systematically land on one side because of its own imperfections; I do not need anyone steering the coin. In fact, how does one ‘steer’ a coin? Statistics can give you a probability, but it does not interpret it: when this gentleman finds a coin that always lands on the same side, he thinks someone is rigging the tosses, and I would say it is simply a deformed coin.
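
The numbers behind these three points are easy to check. The short sketch below computes them exactly, assuming independent tosses and rolls. The 0.9 bias of the deformed coin is a made-up value chosen only for illustration.

```python
from fractions import Fraction

# Point 1: a fair coin landing tails 6 times out of 6.
p_six_tails_fair = Fraction(1, 2) ** 6       # 1/64

# Point 2: a fair die rolled 6 times without a single three.
p_no_three = Fraction(5, 6) ** 6             # 15625/46656

# Point 3: a deformed coin that lands tails 90% of the time
# (hypothetical bias), landing tails 6 times out of 6.
p_six_tails_biased = Fraction(9, 10) ** 6    # 531441/1000000

print(f"fair coin, 6 tails in 6 tosses: {float(p_six_tails_fair):.4f}")
print(f"fair die, no three in 6 rolls:  {float(p_no_three):.4f}")
print(f"biased coin, 6 tails in 6:      {float(p_six_tails_biased):.4f}")
```

The same observation (six identical outcomes) is rare under one description of chance, unremarkable under another, and more likely than not for a deformed coin that nobody is steering.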

So p-values are tools that, if used well, show when something departs from normality. Whether that is relevant (or not) is another matter, and it has to be properly explained. We cannot use a probability as a guarantee that things happen the way we say they do.

In fact, this journalist’s unfortunate metaphor would only show that the course of the trial was most likely not the result of chance. Well, that is what we all expected, to be honest.