Sixteen years ago, geneticists began to notice that Microsoft’s spreadsheet editor Excel was automatically renaming genes, most often replacing their names with dates. Since then, the scientific community has been asking the Redmond-based company to fix Excel’s behavior, but because nothing has been done, the HUGO Gene Nomenclature Committee (HGNC), part of the Human Genome Organisation (HUGO), has decided to change the names of the genes that the software insists on “fixing.”
The problem affects the names of certain genes, such as MARCH1 (short for Membrane Associated Ring-CH-Type Finger 1), which Excel converts to 1-Mar (March 1st). After entering their data, geneticists therefore have to review the worksheet row by row, column by column.
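To give a sense of which names are at risk, here is a minimal Python sketch that flags gene symbols made of a month-like prefix followed by a day number, the pattern MARCH1 follows. The prefix list and the function name looks_like_date are assumptions made for illustration; they do not reproduce Excel's actual parsing rules.

```python
import re

# Assumption: month-like prefixes a spreadsheet may read as the start of a date.
# Illustrative only; this is not a description of Excel's real parser.
MONTH_PREFIXES = (
    "JAN", "FEB", "MAR", "MARCH", "APR", "MAY", "JUN",
    "JUL", "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC",
)

# A month-like prefix, an optional hyphen, then one or two digits.
DATE_LIKE = re.compile(
    r"^(%s)-?(\d{1,2})$" % "|".join(MONTH_PREFIXES), re.IGNORECASE
)

def looks_like_date(symbol: str) -> bool:
    """Return True if a gene symbol resembles 'month + day' (hypothetical check)."""
    match = DATE_LIKE.match(symbol.strip())
    # Only day numbers from 1 to 31 are treated as plausible dates.
    return match is not None and 1 <= int(match.group(2)) <= 31

if __name__ == "__main__":
    for name in ["MARCH1", "SEPT2", "TP53", "BRCA1"]:
        print(name, looks_like_date(name))
```

A check like this could be run on a gene list before it is pasted into a spreadsheet, so that at-risk cells can be formatted as text first.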
The problem also occurs in Apache OpenOffice Calc, but not in Google Sheets. Microsoft did not respond to a request for comment from The Verge.
In 2004, a group of scientists reported in a study that “date conversions [by Excel] affect at least 30 gene names; conversions of floating-point types affect at least 2,000 if RIKEN identifiers are included. These conversions are irreversible; the original names of the genes cannot be recovered.”
Twelve years later, scientists at the Baker IDI Heart and Diabetes Institute pointed to the problem again: “A programmatic scan of leading genomics journals reveals that approximately one-fifth of papers with supplementary Excel gene lists contain erroneous gene name conversions.”
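The study's own scanning code is not reproduced here, but the general idea of a programmatic scan can be sketched in a few lines of Python: walk down a gene column and report any cell that looks like a spreadsheet-formatted date or a number in scientific notation rather than a gene name. The two patterns and the function name find_suspect_entries are hypothetical, and the sample values are made up for illustration.

```python
import re

# Hypothetical patterns for cells that look like spreadsheet output
# instead of gene names: a day-month date or scientific notation.
CONVERTED_DATE = re.compile(
    r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$",
    re.IGNORECASE,
)
CONVERTED_FLOAT = re.compile(r"^\d+(\.\d+)?E\+?\d+$", re.IGNORECASE)

def find_suspect_entries(gene_column):
    """Return (row number, value) pairs for cells that resemble converted names."""
    suspects = []
    for row, value in enumerate(gene_column, start=1):
        text = str(value).strip()
        if CONVERTED_DATE.match(text) or CONVERTED_FLOAT.match(text):
            suspects.append((row, text))
    return suspects

if __name__ == "__main__":
    # Made-up column: two ordinary symbols and two converted-looking cells.
    column = ["TP53", "1-Mar", "2.31E+13", "BRCA1"]
    print(find_suspect_entries(column))
```

A scan like this can only flag suspicious cells; as the 2004 study notes, the original names cannot be recovered from the converted values themselves.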
Left: errors per publication (the black bar indicates the average). Right: the number of errors identified per year. Source: Genome Biology/Reproduction
The problem is not limited to the work of geneticists: research published in the Journal of Organizational and End User Computing in 2009 indicated that 90% of the spreadsheets in use at companies contained errors caused by changes Excel made to the data entered.