Data Processing: An Overview

The term “data processing” is often used without being well-defined. Because it is so broad, it is difficult to discuss without first agreeing on a definition. According to Wikipedia, data processing is “the collection and manipulation of items of data to produce a meaningful result.”

That’s a broad definition indeed. To understand data processing, why you might want to use it, and what to look for in a data processing service, we need to break it down.

Again, according to Wikipedia, data processing may involve any of the following processes:

1. Validation

2. Sorting

3. Summarization

4. Aggregation

5. Analysis

6. Reporting

Validation ensures that the data is clean and correct. If the data going in is bad, then the results coming out will be bad.

Sorting arranges the data into a meaningful order or into groups, which can make later steps easier.
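As a minimal sketch of these first two steps, consider a hypothetical list of sales records (the field names and values here are illustrative, not from the original article):

```python
# Hypothetical sales records; assume "amount" must be a positive number.
records = [
    {"region": "East", "amount": 125.0},
    {"region": "West", "amount": -10.0},  # invalid: negative amount
    {"region": "East", "amount": 80.0},
    {"region": "West", "amount": 200.0},
]

# Validation: keep only clean, correct records.
valid = [r for r in records if r["amount"] > 0]

# Sorting: group the records by region, making later steps easier.
valid.sort(key=lambda r: r["region"])

for r in valid:
    print(r["region"], r["amount"])
```

Dropping the bad record up front means every later step can trust its input, which is the point of validating before anything else.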

Summarization reduces the data to key points or values.

Aggregation combines multiple pieces of data into a single result, such as a sum or an average.

Analysis produces and presents interpretations of the data.

Reporting presents the detailed data, the summaries, or the analysis results.
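Continuing the sketch above, the remaining steps might look like this on the same hypothetical records, assuming they have already been validated and sorted:

```python
from collections import defaultdict

# Hypothetical records, already validated and sorted by region.
records = [
    {"region": "East", "amount": 125.0},
    {"region": "East", "amount": 80.0},
    {"region": "West", "amount": 200.0},
]

# Aggregation: combine individual records into per-region totals.
totals = defaultdict(float)
for r in records:
    totals[r["region"]] += r["amount"]

# Summarization: reduce the data to a key value.
grand_total = sum(totals.values())

# Reporting: present the summarized results.
for region, total in sorted(totals.items()):
    print(f"{region}: {total:.2f}")
print(f"Total: {grand_total:.2f}")
```

Real pipelines replace these few lines with databases, batch jobs, or Big Data frameworks, but the stages are the same.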

While data processing can be done manually, most of it today is done electronically. In recent years, both the amount of data that can be processed and the ways it can be processed have increased dramatically.

“Big Data” is the current industry catchphrase for many of these systems. It refers to newer kinds of data stores that allow the storage, retrieval, and analysis of huge amounts of data. Google is commonly credited with building the first true Big Data systems.

Why did Google develop these systems? At a simplified level, Google has to crawl the entire Internet and find as many webpages as possible. Then it has to determine how many other webpages link to each of those pages, and which keywords or phrases are most relevant to each page. Finally, it has to make all of that data searchable very quickly.

Originally, Google’s webpage-finding programs would send the data they found back to Google’s servers and then Google would periodically process it. Over the years, they’ve improved their Big Data systems to process the incoming data faster and faster. Now, their systems can update the search results almost as fast as their web crawlers can send the data back.

What does this mean for you and your business? It means that there are very few businesses that have too much data to process and analyze. With today’s technologies, companies can now process huge amounts of data and analyze it in ways that weren’t feasible even a few years ago. Big Data systems currently provide arguably the most efficient, effective way to sort, aggregate, summarize, and process data.
