The short answer
Data cleansing is the process of identifying and fixing errors, inconsistencies and gaps in your business data. Bad data costs UK businesses an estimated 20% of their annual revenue in wasted time, poor decisions and failed automation. Most businesses can fix the majority of their data quality problems without expensive software — but it requires a structured approach.
In 15 years of working with UK businesses on data and reporting systems, the single most common reason a project takes longer than expected — or fails entirely — is bad data. Not bad software. Not wrong tools. Bad data.
The business has spent months planning an automated reporting system. The tool is ready. The process is designed. Then someone opens the source data and finds dates in four different formats, customer names entered six different ways, blank rows scattered throughout, and figures that simply do not add up. The project stalls. The team loses confidence. The automation never gets built.
What is data cleansing?
Data cleansing — also called data cleaning or data scrubbing — is the process of identifying and correcting problems in a dataset so it can be used reliably. This includes fixing errors, resolving inconsistencies, removing duplicates, filling gaps, and standardising formats so that the data behaves predictably when used in reports, automation, or analysis.
It is not glamorous work. But it is foundational. Every automated report, every Power BI dashboard, every data pipeline is only as reliable as the data feeding into it. Garbage in, garbage out — as the saying goes — is not a cliché. It is a description of what actually happens when businesses try to build on top of unclean data.
How much does bad data actually cost?
The numbers are sobering. Research from The Software Bureau estimates that dirty data costs the UK economy £900 billion annually — representing around 20% of revenue for affected organisations. Experian's research found that 30% of UK businesses suspect their customer data is inaccurate. And Gartner predicts that through 2026, organisations will abandon 60% of AI and automation projects due to data that is not ready to support them.
The cost of bad data in numbers
For most SMEs, the cost of bad data is not a dramatic figure on a finance report — it is a hidden, ongoing drain. Staff spending time correcting errors. Reports that have to be rebuilt because the source data changed format. Decisions made on figures that were wrong to begin with. It accumulates quietly, and most businesses do not realise how much it is costing until they try to automate something and the whole thing breaks.
What causes bad data in the first place?
Bad data rarely arrives all at once. It accumulates over time through a combination of human error, system limitations, and the natural entropy of business operations. Here are the most common causes I see when working with UK businesses:
Manual data entry
The most common source of data problems. When people type data into spreadsheets or systems manually, they make mistakes. Names get spelled differently. Dates get entered in different formats. Figures get transposed. One person writes "United Kingdom", another writes "UK", another writes "England". None of these are wrong in isolation — but they are incompatible when you try to aggregate the data.
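At cleansing time, the usual fix for this is a small mapping table that collapses known variants onto one canonical value. A minimal Python sketch, using the country example above (the alias table and values are illustrative assumptions):

```python
# Hypothetical free-text entries that all refer to the same country.
raw_countries = ["United Kingdom", "UK", "England", "uk "]

# Alias table mapping known variants onto one canonical spelling.
COUNTRY_ALIASES = {
    "united kingdom": "United Kingdom",
    "uk": "United Kingdom",
    "england": "United Kingdom",
}

def normalise_country(value: str) -> str:
    """Map a free-text country entry onto its canonical form."""
    key = value.strip().lower()
    return COUNTRY_ALIASES.get(key, value.strip())

cleaned = [normalise_country(v) for v in raw_countries]
```

After this pass, all four variants aggregate as a single value. The alias table grows as the audit surfaces new variants, which is why auditing comes first.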
Multiple systems not talking to each other
Most businesses use several systems — an accounting package, a CRM, an operations spreadsheet, an HR system. When these do not integrate, data gets copied between them manually. Every manual copy is an opportunity for inconsistency. Over time, the same customer, product, or employee can exist in three different systems with three slightly different records.
No data standards or conventions
Without agreed standards for how data should be entered — date formats, naming conventions, required fields — different people enter data differently. This is not carelessness; it is the natural result of not having a clear standard. The fix is not to discipline people; it is to enforce standards at the point of entry through validation rules and structured input forms.
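A standard only works if it is checked at the point of entry. As a sketch of what "enforce standards at the point of entry" can mean in code, here is a hypothetical entry validator that rejects a record before it ever reaches the dataset (the field rules are assumptions for illustration):

```python
import re

def validate_entry(record: dict) -> list:
    """Return a list of problems with a new record; empty means accept."""
    problems = []
    # Required field: customer name must be present and non-blank.
    if not record.get("customer", "").strip():
        problems.append("customer name is required")
    # Date convention: ISO 8601 (YYYY-MM-DD) only.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", record.get("joined", "")):
        problems.append("joined date must be YYYY-MM-DD")
    return problems

ok = validate_entry({"customer": "Acme Ltd", "joined": "2024-03-01"})
bad = validate_entry({"customer": "  ", "joined": "01/03/2024"})
```

In practice the same rules live in Excel data validation, a form tool, or a database constraint; the point is that the rule runs when the data is entered, not months later.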
Legacy systems and old data
Businesses that have been operating for years often carry data that was entered under old conventions, imported from a previous system, or simply never cleaned up. This historical data sits in spreadsheets and databases, gradually becoming less reliable as the business evolves around it.
Lack of data ownership
When nobody is specifically responsible for data quality, it degrades over time. Everyone uses the data but nobody maintains it. Fields get left blank because they are not required. Outdated records are never removed. Duplicates are never merged. Over months and years, the dataset becomes progressively less reliable.
What does data cleansing actually involve?
A proper data cleansing process works through a structured set of steps. The exact process depends on the data and what it will be used for, but the core stages are consistent:
Audit and profile the data
Before fixing anything, understand what you have. How many records? How many blanks? How many duplicates? What formats are being used? A data audit reveals the scale of the problem and prioritises where to focus.
Remove duplicates
Duplicate records are one of the most common data quality problems. The same customer entered twice with slightly different spellings. The same product with two different codes. Duplicates inflate counts, skew analysis, and cause double-reporting. They need to be identified and merged or removed.
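Exact duplicates are easy to drop; the harder case is the same record with slightly different spelling. One common approach, sketched here in pandas with hypothetical data, is to build a normalised matching key and deduplicate on that:

```python
import pandas as pd

# Hypothetical customer list: the same company appears twice with
# different casing and punctuation.
df = pd.DataFrame({"customer": ["Acme Ltd", "ACME LTD.", "Beta plc"]})

# Matching key: lowercase, with punctuation and spaces stripped out.
df["match_key"] = (
    df["customer"].str.lower().str.replace(r"[^\w]", "", regex=True)
)

# Keep the first occurrence of each key, then drop the helper column.
deduped = df.drop_duplicates(subset="match_key").drop(columns="match_key")
```

Which occurrence to keep (first, most recent, most complete) is a business decision, not a technical one, and is worth agreeing before the merge.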
Standardise formats
Dates, phone numbers, postcodes, currency values — all of these need consistent formatting to be usable. This stage converts everything to a standard format so the data behaves predictably. In Excel, Power Query is extremely effective at this step.
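Dates are the classic case: the same column holds several formats, and nothing downstream works until they agree. A stdlib Python sketch of the usual approach, listing the formats you expect (the format list is an assumption for this example) and converting everything to ISO 8601:

```python
from datetime import datetime

# Formats observed in the audit; extend this list as new ones turn up.
KNOWN_FORMATS = ["%d/%m/%Y", "%d-%m-%Y", "%Y-%m-%d", "%d %b %Y"]

def to_iso_date(value: str) -> str:
    """Convert a date string in any known format to YYYY-MM-DD."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date format: {value!r}")

dates = ["01/04/2022", "2022-04-01", "1 Apr 2022", "01-04-2022"]
standardised = [to_iso_date(d) for d in dates]
```

Raising an error on unrecognised formats is deliberate: a date that matches no known format should be flagged for review, not silently guessed.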
Fix errors and inconsistencies
Typos, wrong values, impossible dates, negative quantities where only positives are valid. This stage identifies and corrects values that are factually wrong or logically impossible. Some can be fixed automatically; others require human review.
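The split between "fix automatically" and "send for human review" usually comes down to explicit validity rules. A minimal sketch with hypothetical order records (the rules and field names are assumptions):

```python
# Illustrative order data: one impossible quantity, one suspicious price.
orders = [
    {"order_id": 1, "quantity": 5,  "unit_price": 19.99},
    {"order_id": 2, "quantity": -3, "unit_price": 19.99},  # impossible
    {"order_id": 3, "quantity": 2,  "unit_price": 0.0},    # suspicious
]

def is_valid(order: dict) -> bool:
    """Quantities must be positive; prices in a plausible range."""
    return order["quantity"] > 0 and 0 < order["unit_price"] < 10_000

valid = [o for o in orders if is_valid(o)]
review = [o for o in orders if not is_valid(o)]  # route to a human
```

Keeping the rejected records rather than deleting them matters: the review queue is often where systematic entry problems first become visible.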
Handle missing data
Blank fields are a universal problem. Some blanks can be filled from other sources. Some can be inferred from context. Some need to be flagged as genuinely unknown. The right approach depends on what the field is and how critical it is.
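Those three strategies look different in code. A pandas sketch with hypothetical columns (the fill rules are assumptions about what each blank means):

```python
import pandas as pd

# Illustrative dataset with blanks in two columns.
df = pd.DataFrame({
    "region":   ["North", None, "South", None],
    "discount": [0.10, None, 0.05, None],
})

# Blank discount means "no discount" here, so a default is safe.
df["discount"] = df["discount"].fillna(0.0)

# Blank region is genuinely unknown, so flag it rather than guess.
df["region"] = df["region"].fillna("Unknown")
```

The decision of what each blank means has to come from the business, not the tool; filling a genuinely unknown value with a default quietly corrupts the data.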
Validate and document
Once cleaned, the data should be validated against expected ranges and business rules. And the cleaning process itself should be documented so it can be repeated — because data quality is not a one-time fix, it is an ongoing process.
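One simple, repeatable way to express those business rules is as named checks that every row must pass. A sketch with hypothetical invoice data (the rules themselves are illustrative assumptions):

```python
# Illustrative cleaned data to validate.
rows = [
    {"invoice": "INV-001", "amount": 250.0, "vat_rate": 0.20},
    {"invoice": "INV-002", "amount": 99.0,  "vat_rate": 0.20},
]

# Each rule is a readable name plus a check function, so the list
# doubles as documentation of what "valid" means for this dataset.
RULES = {
    "amount is positive":         lambda r: r["amount"] > 0,
    "vat rate is a UK rate":      lambda r: r["vat_rate"] in (0.0, 0.05, 0.20),
    "invoice ref is well-formed": lambda r: r["invoice"].startswith("INV-"),
}

failures = [
    (row["invoice"], name)
    for row in rows
    for name, check in RULES.items()
    if not check(row)
]
# An empty failures list means every row passed every rule.
```

Because the rules are data rather than scattered code, the same list can be re-run every time the dataset is refreshed, which is what makes validation an ongoing process rather than a one-off.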
What tools are used for data cleansing?
For most UK SMEs, the right tools for data cleansing are already available — they just need to be used correctly:
Power Query (Excel & Power BI)
The most powerful built-in tool for data cleansing in Microsoft's ecosystem. It can standardise formats, remove duplicates, fill blanks, split columns, and apply complex transformations — all without writing a single formula. The steps are recorded and can be reapplied automatically every time the data is refreshed.
Best for: regular automated cleansing of Excel and Power BI data sources
Python (Pandas)
For larger datasets or more complex cleansing logic, Python with the Pandas library is extremely powerful. It can process millions of rows, apply sophisticated matching algorithms to identify duplicates, and integrate with multiple data sources simultaneously.
Best for: large datasets, complex logic, or automated pipelines
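As a taste of the matching logic mentioned above, here is a small sketch using only Python's standard-library difflib to flag near-duplicate names; a production pipeline would more likely use pandas with a dedicated matching library, and the names and threshold here are assumptions:

```python
import difflib

# Hypothetical customer names with two near-duplicate pairs.
names = ["Acme Limited", "Acme Ltd", "Beta Plumbing", "Beta Plumbing Ltd"]

def near_duplicates(values, threshold=0.75):
    """Return pairs of values whose similarity exceeds the threshold."""
    pairs = []
    for i, a in enumerate(values):
        for b in values[i + 1:]:
            ratio = difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if ratio >= threshold:
                pairs.append((a, b))
    return pairs

matches = near_duplicates(names)
```

Pairwise comparison is fine at this scale but grows quadratically; for hundreds of thousands of records, blocking strategies or specialised matching tools become necessary.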
Excel (formulas and VBA)
For smaller datasets, Excel's built-in functions — TRIM, CLEAN, PROPER, IFERROR — combined with VBA macros can handle many common cleansing tasks. Less powerful than Power Query but widely understood and often already in use.
Best for: smaller datasets where the team already works in Excel
SQL
For data stored in databases, SQL is the natural cleansing tool. UPDATE and DELETE queries can fix systematic errors across millions of records in seconds. Particularly useful when the bad data originates in a database rather than a spreadsheet.
Best for: database-stored data, high-volume cleansing
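To make the UPDATE pattern concrete, here is a self-contained sketch using Python's built-in sqlite3 with an in-memory database; the table, columns, and values are illustrative assumptions, and the same UPDATE statement would work against any SQL database:

```python
import sqlite3

# In-memory SQLite database standing in for a real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "UK"), (2, "United Kingdom"), (3, "England"), (4, "France")],
)

# One UPDATE collapses the inconsistent variants onto a canonical value,
# however many rows are affected.
conn.execute(
    "UPDATE customers SET country = 'United Kingdom' "
    "WHERE country IN ('UK', 'England')"
)

rows = conn.execute(
    "SELECT country, COUNT(*) FROM customers "
    "GROUP BY country ORDER BY country"
).fetchall()
```

On a real database, the usual discipline applies: run the matching SELECT first to see what will change, and wrap the UPDATE in a transaction so it can be rolled back.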
Data cleansing and automation — why one depends on the other
One of the most important things to understand about data cleansing is that it is not separate from automation — it is a prerequisite for it.
When businesses come to us wanting to automate their Excel reports or build a Power BI dashboard, the first thing we do is assess the quality of the underlying data. In around half of all projects, data cleansing is required before any automation can be built reliably.
This is not a problem — it is just part of the process. And in most cases, the cleansing work itself delivers immediate value. Once the data is clean and structured, the automation is faster to build, more reliable in operation, and far less likely to produce outputs that people do not trust.
The businesses that struggle are the ones that skip the cleansing step and try to build automation on top of messy data. The system runs, but the outputs are wrong. Trust in the data collapses. The automation gets abandoned. The team goes back to doing it manually.
Related: If your data is clean and you are ready to start automating, read our guide to 5 signs manual reporting is costing your business to understand where automation will have the most impact.
Frequently asked questions
What is the difference between data cleansing and data transformation?
Data cleansing fixes problems in the data — errors, duplicates, inconsistencies, blanks. Data transformation changes the structure or format of the data to make it usable in a different context — for example, converting rows into columns, combining two datasets, or calculating new fields. In practice, most data projects involve both.
How long does data cleansing take?
It depends entirely on the volume and condition of the data. A single spreadsheet with a few thousand rows can often be cleaned in a day or two. A legacy database with millions of records across multiple tables may take weeks. The key is to audit the data first to understand the scale of the problem before committing to a timeline.
Can data cleansing be automated?
Yes — and it should be wherever possible. Tools like Power Query and Python can apply cleansing rules automatically every time new data is loaded. This means the data is always clean at the point it enters your reports or dashboards, without anyone having to manually check and fix it each time.
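In practice this means packaging the cleansing rules into one function that runs every time data is loaded, so no one has to remember to clean it. A minimal pandas sketch, with hypothetical column names and rules:

```python
import pandas as pd

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Repeatable cleansing step applied on every load."""
    out = df.copy()
    out["customer"] = out["customer"].str.strip().str.title()  # standardise
    out = out.drop_duplicates()                                # dedupe
    out = out.dropna(subset=["customer"])                      # drop blanks
    return out

# Same messy input, same clean output, every time.
raw = pd.DataFrame({"customer": ["  acme ltd", "Acme Ltd", None]})
loaded = clean(raw)
```

Power Query works the same way: the recorded steps are the `clean` function, reapplied automatically on every refresh.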
What is unstructured data and how is it different from bad data?
Unstructured data is data that does not have a predefined format or organisation — for example, emails, PDFs, free-text notes, or scanned documents. Bad data is data that should be structured but contains errors or inconsistencies. Unstructured data needs to be extracted and structured before it can be cleaned and used. Both are common problems in UK businesses, and both can be addressed with the right tools and approach.
Do I need specialist software for data cleansing?
For most UK SMEs, no. Power Query within Excel and Power BI handles the majority of common data cleansing tasks without any additional software. For larger or more complex datasets, Python is free and extremely powerful. Specialist data quality tools exist but are usually only justified for enterprise-scale data operations.
How do I prevent bad data from building up again after cleansing?
The only permanent fix for bad data is fixing it at the source — implementing validation rules at the point of entry, standardising how data is captured, and ensuring systems are integrated so data does not have to be copied manually. A one-time clean without process changes will gradually accumulate problems again. The cleansing and the process improvement need to happen together.
Further reading
Related articles
5 Signs Manual Reporting Is Costing Your Business Money
How to Automate Excel Reports (Without Knowing How to Code)
How Much Does Excel Automation Cost in the UK?
Power BI vs Excel: Which Should Your Business Use in 2026?
Not sure how clean your data actually is?
Book a free 30-minute call and we will look at your current data setup, identify the quality issues that are slowing you down, and tell you exactly what needs to change before automation can work reliably.
Book a free scoping call →