A short review by Jonathan Lambert dissects the issue of diversity in genomic studies. In this type of study, called a genome-wide association study (GWAS), scientists have to select a cohort of individuals in order to statistically link a disease to genetic traits. It turns out that most of these cohorts are made up of white Europeans.
The over-representation of one population category in statistics impairs, in the long term, both scientific studies and their technological applications. If we consider, for instance, sex as a binary category for statistical classification purposes, most of our environment has been adapted to the average male. For more information, the book “Invisible Women” is a very nice resource on the subject. In the pharmaceutical industry, over-the-counter medication is dosed for the average male weight and metabolism, which means women slightly overdose each time they take an Advil. Switching to the transport business, car seats do not allow women to keep their feet on the gas pedal and, at the same time, their knees at a comfortable spot somewhere under the steering wheel, again because male and female body proportions differ.
These distortions are sometimes easily spotted in the male vs female case because of the pronounced sexual dimorphism of the human species. On average, male individuals are heavier, taller and stronger than female ones. If we consider other categories such as ethnicity, the biases between categories become more complex. The COVID-19 outbreak exposed both ethnic and social inequalities – often linked – regarding susceptibility to infection.
Experimental cell biology is now integrating genomic tools to analyze highly complex patterns of gene expression across standard cell lines. These cells were initially derived from a single individual and have been cultured in labs for decades. Research teams from different labs use the same cell lines to increase the reproducibility of their results, because they are aware that each of these experimental models has its own genetic background and its own particular response to a given experimental protocol.
If we now take a step back from the experiment and look at the origin of these models – which are used so much that they have lost some of their meaning – we might notice that these cell lines are not diverse. The oldest immortal human cell line, derived in 1951, is called HeLa, after Henrietta Lacks. Henrietta Lacks was a young African American mother treated for cervical cancer at the Johns Hopkins Hospital. Her cells were derived without consent from either her or her family. Since the right to anonymity was regulated, cancer cell lines have been sourced with some categorical information but without the complete identity of the donor patient. Some of these characteristics were documented during the generation of several breast cancer lines (51 in total) used to analyze different tumor specificities. These lines originated from women of various ages but with a limited representation of ethnicity: 25 are white, 8 are black, one is Hispanic, and one is East Indian.
The same problem is encountered in the embryonic stem cell field. Embryonic stem cells are derived from blastocysts generated during fertility treatment provided to couples in IVF clinics [Guidelines For Human Embryonic Stem Cell Research, The National Academies Press, 2005]. A study performed on a sample of 224 individuals in Massachusetts – who had cryopreserved the resulting embryos but had not yet made a decision regarding a possible donation – found that 88% of the participants were Caucasian.
Cell lines that come from the same kind of patients are only the tiny, visible tip of the iceberg. Since 1993, documenting the gender, race and ethnicity of participants in NIH-funded biomedical research has been mandatory. Significant efforts have been dedicated to balancing gender representation, but ethnicity/race representation is still lagging behind.
From these considerations, we should probably build upon the following three assumptions:
– Science is incremental and is built up over several decades, even centuries. It has integrated many biases, including diversity-related biases.
– For this reason, history, sociology and, more generally, the humanities are necessary to cast a critical eye on the way biological data has been sourced and analyzed.
– To increase our knowledge of the context surrounding the data, interdisciplinarity has to develop not only from the quantitative sciences towards the non-quantitative, but also in the reverse direction.
I know there is no going back to the holistic scientist. And in the 21st century, when a cell biologist is expected to master imaging techniques, molecular biology, genomic data analysis pipelines and bioengineering, how realistic is it to ask each one of us to dedicate a little bit of brain time to science ethics, history and philosophy? Maybe the answer lies in stopping the rush towards academic performance and starting to become, not complete, but balanced scientists. To end, I would quote these words attributed to Leonardo da Vinci:
“Whatever you do in life, if you want to be creative and intelligent, and develop your brain, you must do everything with the awareness that everything, in some way, connects to everything else.”