Episode 31: Data collection tools
- Embedded IT

- Mar 25, 2025
- 3 min read
- Updated: Jan 16
Data collection tools play a crucial role in how organisations gather and use information. This piece explores the main types of external data collection methods and highlights the commercial and procurement considerations behind them. From web scraping to APIs and IoT sensors, it looks at where risks lie, what to watch for and why forecasting usage matters.
This builds on our explanation of what data is and how it’s used across organisations.
Web scraping and its commercial risks
Web scraping is a simple idea in theory: it involves extracting information directly from a webpage. It can be useful when gathering details about companies, services or products, and there are many free or low-cost tools available, such as Scrapy, Octoparse and import.io.
However, web scraping carries clear risks from a technology procurement perspective. Many websites have copyright restrictions or terms and conditions that prohibit copying their data. Organisations also need to consider the reliability of the data source, as well as the potential reputational impact of scraping low-quality or inappropriate sites. Some websites have tools designed to detect heavy scraping activity, which may result in a company being blocked or flagged.
Before licensing any web scraping tool, it is important to understand what data will be collected, where it will come from and whether those sources present a commercial or legal risk.
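As a rough illustration of the core idea (not a production scraper, which would need to fetch pages, respect robots.txt and rate limits), here is a minimal sketch using Python's standard library. The HTML snippet and the `product-name` class are made-up examples:

```python
from html.parser import HTMLParser

class ProductParser(HTMLParser):
    """Pull the text of <span class="product-name"> elements from a page."""

    def __init__(self):
        super().__init__()
        self.in_product = False
        self.products = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "span" and ("class", "product-name") in attrs:
            self.in_product = True

    def handle_data(self, data):
        if self.in_product:
            self.products.append(data.strip())
            self.in_product = False

# Made-up page content standing in for a fetched webpage
page = """
<ul>
  <li><span class="product-name">Widget A</span></li>
  <li><span class="product-name">Widget B</span></li>
</ul>
"""

parser = ProductParser()
parser.feed(page)
print(parser.products)  # ['Widget A', 'Widget B']
```

Dedicated tools such as Scrapy wrap this same extraction step in crawling, scheduling and politeness controls.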
APIs as structured data collection tools
APIs, or Application Programming Interfaces, are one of the most common and reliable ways for systems to exchange data. An organisation can send a request for specific information, and the API returns the data in a structured format. A well-known example is the free Companies House API, which provides director and company information.
While many APIs are free, some commercial services charge per API call. Costs can rise extremely quickly without proper volume forecasting. A single call may cost only pennies, but repeating that call thousands or hundreds of thousands of times can create significant spend.
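The arithmetic behind that warning is simple. A sketch with hypothetical figures (0.5p per call, a large daily refresh) shows how pennies become thousands of pounds:

```python
# All figures are illustrative, not a real provider's pricing
cost_per_call = 0.005   # GBP (half a penny per API call)
daily_calls = 100_000   # e.g. refreshing a large dataset once a day
monthly_cost = cost_per_call * daily_calls * 30

print(f"£{monthly_cost:,.2f} per month")  # £15,000.00 per month
```

Running this forecast before signing, rather than after the first invoice, is the point of volume forecasting.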
API management tools, such as Postman, Apigee and Swagger, help organisations monitor usage, manage throughput and ensure that calls do not exceed agreed limits. Procurement teams need to understand expected volumes, charging models and any blocking thresholds before committing.
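Beneath the dashboards, the blocking behaviour these tools provide amounts to a call budget. A minimal client-side sketch (the class and limits are hypothetical, not a real tool's API):

```python
class CallBudget:
    """Guard against exceeding an agreed daily API call limit."""

    def __init__(self, max_calls_per_day):
        self.max_calls = max_calls_per_day
        self.used = 0

    def allow(self):
        """Return True if another call is within budget, else False."""
        if self.used >= self.max_calls:
            return False
        self.used += 1
        return True

# Tiny limit purely for illustration
budget = CallBudget(max_calls_per_day=3)
results = [budget.allow() for _ in range(5)]
print(results)  # [True, True, True, False, False]
```

In practice the same check sits server-side in the management platform, returning an HTTP 429 once the agreed threshold is reached.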
IoT sensors and real-world data collection
IoT devices and sensors collect real-world data, from temperature readings to business-process monitoring. Some organisations run their own sensors internally, while others purchase data from external providers. As with APIs, costs can escalate quickly if usage is high or continuous.
Procurement teams must ensure the data is relevant, legally usable and priced appropriately for the required volume. Clear controls and forecasting help prevent overspend or reliance on data that is not commercially viable.
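The forecasting itself need not be complicated. A back-of-the-envelope sketch, with every figure a made-up assumption (fleet size, reading frequency and the per-million-readings rate), shows how sensor volumes translate into data and spend:

```python
# Illustrative assumptions, not real pricing
sensors = 200
readings_per_hour = 60             # one reading per minute per sensor
bytes_per_reading = 128
price_per_million_readings = 1.50  # GBP, hypothetical provider rate

readings_per_month = sensors * readings_per_hour * 24 * 30
data_mb = readings_per_month * bytes_per_reading / 1_000_000
cost = readings_per_month / 1_000_000 * price_per_million_readings

print(f"{readings_per_month:,} readings ≈ {data_mb:,.0f} MB, £{cost:,.2f}/month")
```

Doubling the reading frequency or the fleet size doubles the bill, which is exactly the kind of sensitivity a procurement team should test before contracting.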
Why data collection demands careful procurement
Data collection is becoming increasingly common, and as organisations generate more valuable data, opportunities also emerge to sell that data to others through APIs. However, selling data introduces its own commercial and contractual risks around how the data will be used and what it will cost.
Careful procurement helps ensure data collection tools remain legal, reliable and financially sustainable.
For organisations looking to procure data collection tools or manage data usage more effectively, get in touch.