White Paper

10 reasons why Excel is a bad tool for clinical data capture and handling

... especially in the context of GCP and GDPR

For many good reasons, Excel has become the world's most popular tool for working with business data. A lot of people know the software quite well or at least well enough to deal with their daily challenges in the context of data collection, preparation and interpretation. Due to this familiarity with the software it’s very obvious that Excel is also their first choice for almost every task related to numbers and/or tables. The only problem is: Excel was not developed for some of those applications and thus does not only do a bad job in that context but even bears some threats to your data.


In the following, we summarize the 10 top reasons why Excel is a bad choice when it comes to capture, analysis and interpretation of clinical (trial) data.



  1. No prevention of unintentional data deletions or changes

We all know the problem of either deleting the content of the wrong cell or unintentionally overwriting an existing value or formula. Most of the time you realize your mistake and use the “undo''-function. However, sometimes you either realize the error too late (already after saving the document) or in the worst case not at all. Sometimes it is even more tricky: for example, copy-pasting cells, columns or lines from another spreadsheet within Excel, or even worse, from another source like a table in Word or from a webpage. In this case you really have to take great care to insert the values as intended and not shift cells, columns or lines, unintentionally. That way you have the chance to not only ruin single values but render the whole table useless.

  1. No control on wrong auto conversion of dates

When pasting content into an Excel spreadsheet, the program will try to interpret your data and adjust its type, if necessary. Sometimes auto-conversion goes wrong: E.g. a value like “10-9” (e.g. a patient ID), Excel interpret it as “September 10th” and change the value accordingly. You have no chance to recover the initial value: Even if you set the format of the cell from “date” back to “standard” it will not change the number back to the original value but to “44084”, which has nothing to do with the original value. Beyond, this is not the only source of automatic conversion of dates. Depending on the country you live in you either use dots or commas as a decimal separator. In this context, a number will end up as a string when the wrong separator is used and Excel will not warn you about the inconsistency.

  1. No user permission management

A clinical study and the related data handling and interpretation is typically not the work of a single person – and even for a very small study, usually at least your boss wants to have access to the data. No matter how good you are at Excel and how conscientiously you treat your data, you never know how good and conscientious someone else is. The only way to protect data in Excel is to lock cells, spreadsheets or documents with a code. This is not only impractical for your daily work but also prevents teamwork. Often it’s necessary that other people can edit/insert additional data. In Excel, you cannot grant different permission rights to different persons or groups but always need to either fully trust in their diligence or implement cumbersome workarounds like separate spreadsheets locked with different codes and merge them later.

  1. No audit trail

In Excel you have no way of tracking who changed what when. The only thing you see is the time and the date when the excel file was edited last. This is fine as long as nothing goes wrong; but if an error occurs, it’s almost impossible to identify the timepoint or the source. Thus, suddenly you have to question the validity of the whole table which creates huge overhead. 

  1. No parallel data entry

Especially in larger studies, many people have to add or edit data to a central “database” – in the worst case the one and only Excel file. In this situation, only one person at a time can work on the table. All others either have to wait for their colleague to close the document or, even worse, will create their own copy somewhere and someone will get the almost impossible task to gather all data from different sources and copies. Impossible because, quite frequently, they don’t even know that a copy exists on someone else's computer.


  1. No data type security

In order to properly analyse your data, it has to be clean. First and foremost, that means, no incorrect or missing data but also consistent data with respect to format or data type. If you collect for example a laboratory value of glucose, the value can be reported in different units. The values can be anything from mg/dL, mg% or mg/L to recommended (but not caught on everywhere) SI-units like mmol/L or µmol/L. In Excel, these values can be listed in the same column without any discrimination or data type security.


  1. No data validation

Usually you expect values to be in a certain range. It’s very unlikely that one participant is 187 years old or smokes 51 packs per day. These mistakes usually occur due to a typing error, due to a misunderstanding of the question or due to mix up of requested units (see point 6). In Excel, you have a very limited chance to identify these errors, and even if you do, you have to initiate a query to ask for the correct value. Using a professional EDC (electronic data capture) tool, you can set up rules and ranges for expected values. If an invalid value is typed in, the software directly notes the participant or even prevents continuing to the next question.


  1. No clear tables

When working with Excel, it happens quite fast that the table structure gets unclear. As long as you only have a single data point for every variable and patient, things are quite manageable. However, once you have to represent multiple-choice questions, comments, or results for the same questions at different time points or dependencies between questions, it becomes increasingly harder to correctly represent these data in a spreadsheet. The natural reaction to this in Excel is to start using the “merge cells” function to create top headings, grouping columns based on colours, creating additional tabs etc. While this may help a little, confusion and resulting mistakes are inevitable – especially when more people work on the same file and have a slightly different understanding of how to group things.

 

  1. No possibility to globally review used formulas 

As your table grows and gets more complex, Excel formulas seem like a great way to add value to your data tables. Simple functions like SUM, AVERAGE or COUNT or more complex formulas like VLOOKUP or CHISQ.INV  or even cell references all have in common that you can’t review them without entering the respective cell. In case a formula is altered/damaged anywhere in your spreadsheet, often, the only chance to detect the problem is to visit that particular cell. So to be sure you’d have to visit each and every cell that contains a formula or reference. If you are lucky, the change caused an obvious failure you can spot – but even then you will have to dig through the sheet to find the cause. The same is true if a formula is overwritten by a value: this won’t be recognized easily and finding these errors in a large table is close to impossible.


  1. No automatic backup

Your valuable data deserves safe storage. Most people are aware of the importance of backups. During daily work however, backups are often forgotten or done infrequently. Every day without a backup implies the risk of data loss and Excel has no real concept to mitigate this. Even worse – a spreadsheet may get slightly damaged at some point in time but still not show any obvious problems. The file corruption may become obvious only much later and you may need to go back to older archived copies for the file to get a clean status. That is – if you have archived copies.


Summary

Excel is a great tool for many different applications in business and for simple data analysis tasks. However, handling clinical data requires specialized software that adequately addresses security, availability and integrity of your data. EDC software is specially designed to fulfill those needs and a good EDC system will support your clinical workflow, provide a user permission management system and offer many additional features to make data collection and handling easier and safer.


Our aim at MaganaMed is to provide you with a software that frees your mind from pitfalls and hurdles in data capture and handling. You can concentrate on running your trial and Magana Trial Manager will care for your data.


23.12.20
Download pdf