Episode 33: Data processing tools
- Embedded IT

- Apr 7, 2025
Updated: Jan 16
When organisations collect and store large volumes of data, the next challenge is making that data useful. That is where data processing tools come in. This guide breaks down the main categories of tools used to move, transform, and process data, and highlights what procurement teams need to consider when selecting them.
This builds on our explanation of what data is and how it’s used across organisations.
Understanding what data processing tools do
Once data has been collected and stored, it needs to be converted into a usable format. Data processing tools help teams extract, transform, analyse, or move data between different systems. While the technology behind them can be complex, the core principles are straightforward: get the right tool for the job, and make sure it aligns with the organisation’s needs and technical environment.
Working with ETL tools
A common category in this space is ETL, which stands for extract, transform, and load. These tools pull data from one source, clean or reformat it, and then load it into another location.
For example, an ETL tool might extract data from Excel, change date formats, convert text to uppercase, or run more advanced transformations before loading the updated data into another application or storage system.
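To make the three stages concrete, here is a minimal sketch in Python using only the standard library. It is an illustration rather than how any particular ETL product works: the sample data, the column names, and the `extract`, `transform`, and `load` function names are all assumptions, and CSV text stands in for both the spreadsheet source and the destination system.

```python
import csv
import io
from datetime import datetime

# Hypothetical source data standing in for a spreadsheet export.
SOURCE_CSV = """name,signup_date
alice smith,07/04/2025
bob jones,16/01/2025
"""

def extract(text):
    """Read rows from CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Convert names to uppercase and dates from DD/MM/YYYY to YYYY-MM-DD."""
    cleaned = []
    for row in rows:
        parsed = datetime.strptime(row["signup_date"], "%d/%m/%Y")
        cleaned.append({
            "name": row["name"].upper(),
            "signup_date": parsed.strftime("%Y-%m-%d"),
        })
    return cleaned

def load(rows):
    """Write transformed rows to CSV text, standing in for a destination system."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["name", "signup_date"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

result = load(transform(extract(SOURCE_CSV)))
print(result)
```

A commercial ETL tool wraps the same three steps in connectors, scheduling, and monitoring, which is where most of the differences in complexity and price come from.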
Not every ETL tool can connect to every data source, so procurement teams need to check compatibility carefully. If the business uses platforms such as Salesforce or SAP, the chosen ETL tool must be able to interface with them. The same applies to the output format. The tool must load the transformed data into a destination the organisation can use.
Common ETL products include Informatica PowerCenter, Talend, and IBM DataStage. Each varies in complexity, maturity, and price, so IT teams should lead on technical evaluation while procurement focuses on commercial considerations.
Using batch processing for large data sets
Batch processing is another way to handle data, particularly when working with large volumes. Rather than processing information in real time, batch tools run tasks on grouped data sets. This can be scheduled to happen overnight or during quiet periods, making better use of computing resources.
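The idea of working on grouped data sets can be sketched in a few lines of Python. This is a simplified illustration, not a description of how Hadoop or Spark work internally: the `batches` and `process_batch` names, the batch size, and the sample order values are all hypothetical.

```python
from itertools import islice

def batches(records, batch_size):
    """Yield successive lists of up to batch_size records."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

def process_batch(batch):
    """Hypothetical batch job: total the order values in one group."""
    return sum(batch)

# Hypothetical order values collected during the day.
order_values = [10, 20, 30, 40, 50, 60, 70]

# Process the accumulated records in groups of three.
batch_totals = [process_batch(b) for b in batches(order_values, 3)]
print(batch_totals)
```

In practice a scheduler would trigger a job like this overnight, once the day's records have accumulated.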
Tools often used for batch processing include Apache Hadoop and Apache Spark. Apache Flink also supports batch jobs, although it is best known for stream processing. Some organisations also choose to build bespoke batch-processing scripts, giving them more control over how their data is manipulated.
From a procurement perspective, the key question is whether to buy or build. IT teams will usually have strong views on this, so it should be a joint decision based on capability, cost, and long-term needs.
Exploring stream processing for real-time data
Stream processing focuses on handling data in real time, one transaction at a time. Instead of working through large batches, these tools process each piece of data as it arrives.
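By contrast with a batch job, a stream processor acts on each record the moment it arrives. The sketch below illustrates that pattern in plain Python with a generator. The `transaction_stream` and `process_stream` names, the sample amounts, and the flagging threshold are assumptions made for illustration, and a real deployment would read from a message broker rather than a fixed list.

```python
def transaction_stream():
    """Hypothetical source that yields transactions one at a time."""
    for amount in [25.0, 900.0, 40.0, 1500.0]:
        yield {"amount": amount}

def process_stream(events, threshold):
    """Handle each event as it arrives: update a running total and flag large amounts."""
    running_total = 0.0
    for event in events:
        running_total += event["amount"]
        yield {
            "amount": event["amount"],
            "running_total": running_total,
            "flagged": event["amount"] > threshold,
        }

# Each result is available immediately, without waiting for the rest of the data.
results = list(process_stream(transaction_stream(), threshold=1000.0))
for result in results:
    print(result)
```

The key difference from batch processing is that a decision, such as flagging a large transaction, can be made before any later data exists.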
This is useful for high-intensity applications, real-time websites, or systems where decisions need to happen instantly. Commercial and cloud-hosted stream processing services often use usage-based pricing, such as a per-transaction model, so procurement teams must understand expected and peak usage volumes to avoid unexpected costs.
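A rough calculation shows why volumes matter. The sketch below assumes a simple pricing model that charges a flat rate per million transactions. The `monthly_cost` function, the rate of 0.50 per million, and the daily volumes are hypothetical figures chosen for illustration, not any vendor's actual pricing.

```python
def monthly_cost(transactions_per_day, price_per_million, days=30):
    """Estimate a monthly bill under a flat per-transaction pricing model."""
    total_transactions = transactions_per_day * days
    return total_transactions / 1_000_000 * price_per_million

# Hypothetical figures: typical daily volume versus a sustained spike in demand.
typical = monthly_cost(2_000_000, price_per_million=0.50)
spike = monthly_cost(10_000_000, price_per_million=0.50)

print(f"Typical month: {typical:.2f}")
print(f"Spike month:   {spike:.2f}")
```

Under these assumptions a fivefold rise in volume produces a fivefold rise in cost, which is why usage forecasts belong in the contract discussion.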
Products in this category include Apache Kafka (with its Kafka Streams library), Apache Storm, Apache Flink, and Google Cloud Dataflow. Stream processors can scale quickly to meet spikes in demand, but they also introduce commercial and security considerations that need careful review, especially when cloud services are involved.
Bringing it all together in procurement
Data processing is a vast area, and choosing the right approach depends heavily on the organisation’s technical needs. IT teams should lead on defining requirements and selecting suitable tools, while procurement ensures contracts are fair, pricing models are understood, and data remains secure.
Both sides need to work closely together to avoid overspending or accidentally allowing data to move to unintended locations or geographies.
If your organisation is looking to strengthen its approach to selecting and procuring data processing tools, get in touch.




