To produce information, data is often processed and then combined with the appropriate context

Last Updated November 26, 2018

Table of Contents

What is raw data?
How raw data works
How to process raw data
Value of raw data
Which of the following best explains the difference between data and information?
Which information system records, processes and reports on transactions to provide financial and nonfinancial information for decision making and control?
What is the correct order of effects in the value chain?
What would be the most appropriate designation for a professional who serves as an IT auditor?

Data, information, and knowledge are often used interchangeably. However, these terms represent different stages of value creation from data to decision-making.

Data, in their simplest form, are the raw alphanumeric values obtained through different acquisition methods.

Information is created when data are processed, organized, or structured to provide context and meaning. Information is essentially processed data.

Knowledge is what we know. Knowledge is unique to each individual and is the accumulation of past experience and insight that shapes the lens through which we interpret, and assign meaning to, information. For knowledge to result in action, an individual must have the authority and capacity to make and implement a decision. Both knowledge and authority are needed to produce actionable information that can lead to impact.

The flow and characteristics of these terms are illustrated in Figure 1 and Table 1. Table 2 provides examples of data, information, and knowledge for water data.

Figure 1: The flow from data to information to knowledge.

The flow from data to information and knowledge is not uni-directional. The knowledge gained may reveal redundancies or gaps in the data collected. As a result, an actionable insight may be to change the data collected, or how those data are converted into information, to better meet user needs.

Table 1: Characteristics of data, information, and knowledge (adapted from de Vries 2018).

Table 2: Examples of transforming water data to information to knowledge that leads to action.

What is raw data?

Raw data (sometimes called source data, atomic data or primary data) is data that has not been processed for use. A distinction is sometimes made between data and information to the effect that information is the end product of data processing. Raw data that has undergone processing is sometimes referred to as cooked data.

Although raw data has the potential to become "information," it requires selective extraction, organization and sometimes analysis and formatting for presentation. Once processed, raw data often ends up in a database, which makes the data accessible for further processing and analysis in a number of different ways.

How raw data works

Tremendous amounts of raw data surround us and are produced every day. The human brain is incredibly good at taking in raw data, processing it and using it to make decisions.

For example, imagine you are trying to cross a busy road. The eyes capture raw data as flashes of light and dark. Then the brain takes these flashes and resolves them into objects such as street signs and cars. The working memory can tell you if that car is sitting still, getting bigger as it comes toward you, or getting smaller as it drives away. Meanwhile, the ears take in raw data in the form of vibrations in the air, which the brain translates into sounds that can be interpreted as the wind, voices or a car engine. Finally, all this processed data that came in through the eyes, ears and memory helps you make the informed decision to cross the street or not.

Computers, however, cannot intuitively process raw data the way a human mind can, and raw data is generally not useful on its own. Extra processing is required to turn it into useful information. Additionally, the final data from one system may be used as raw data in another.

For example, imagine a simple home thermostat. Its raw data source is a temperature probe, usually read as an analog voltage level. The system takes this voltage level as raw data and turns it into a temperature reading. It can then compare this processed data against a predetermined desired temperature to decide when to turn a heater or air conditioner on or off.

Furthermore, the system may feed this temperature reading and the current time into another climate control system as that system's raw data. Then the data is stored and analyzed over time to produce a predictive modeling algorithm to help make better heating and cooling decisions.
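The thermostat example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the linear probe calibration (0.5 V at 0 °C, 10 mV per °C), the function names and the record layout are all hypothetical and do not describe any particular device.

```python
from datetime import datetime, timezone

# Assumed probe calibration (hypothetical): 0.5 V at 0 °C, 10 mV per °C.
OFFSET_VOLTS = 0.5
VOLTS_PER_DEGREE = 0.01


def voltage_to_celsius(volts):
    """Turn the raw voltage level into a temperature reading."""
    return (volts - OFFSET_VOLTS) / VOLTS_PER_DEGREE


def heater_should_run(temp_c, target_c):
    """Use the processed reading to decide whether to heat."""
    return temp_c < target_c


def make_record(volts, when):
    """Pair the reading with a timestamp.

    This processed output could serve as raw data for a second
    climate control system.
    """
    return {
        "time": when.isoformat(),
        "temp_c": round(voltage_to_celsius(volts), 1),
    }


if __name__ == "__main__":
    raw_volts = 0.70  # raw data from the probe
    temp = voltage_to_celsius(raw_volts)
    print("heater on:", heater_should_run(temp, target_c=21.0))
    print(make_record(raw_volts, datetime.now(timezone.utc)))
```

The same value plays two roles here: `temp_c` is processed data for the thermostat, but the record returned by `make_record` is raw data from the point of view of whatever system consumes it next.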

Usually, organizations must process raw data into information before it becomes useful in a repository. One notable exception is the data lake, which is a storage repository that can hold massive volumes of raw data in its native format.

How to process raw data

Many sources can produce raw data, and how it is processed and stored depends on its source and intended use. Examples of raw data include financial transactions from a point-of-sale (POS) terminal, computer logs or even participant eye-tracking data in a research project. Applications and devices can save raw data in various formats, but one of the most common formats for interchanging raw data between systems is the comma-separated values (CSV) file.
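As a minimal sketch of reading raw data from a CSV file, the following uses Python's standard `csv` module. The sample rows and the column names (`timestamp`, `terminal_id`, `amount`) are invented for illustration and stand in for a POS export.

```python
import csv
import io

# Hypothetical raw POS export; the columns are assumptions for this sketch.
RAW_CSV = """timestamp,terminal_id,amount
2021-05-01T09:15:00,T1,4.50
2021-05-01T09:17:30,T2,12.00
2021-05-01T09:21:10,T1,7.25
"""


def read_transactions(text):
    """Parse CSV text into a list of dicts with a numeric amount."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "timestamp": row["timestamp"],
            "terminal_id": row["terminal_id"],
            "amount": float(row["amount"]),  # CSV fields arrive as strings
        }
        for row in reader
    ]


if __name__ == "__main__":
    for transaction in read_transactions(RAW_CSV):
        print(transaction)
```

Every field in a CSV file is text, so even this small step (converting `amount` to a number) is already a form of processing.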

In many instances, users must clean raw data before it can be used. Cleaning raw data may require parsing the data for easier ingestion into a computer, removing outliers or spurious results and, occasionally, reformatting or translating the data -- a process sometimes called massaging or crunching the data.
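The cleaning steps described above (parsing, then removing spurious results) might look roughly like this. The outlier rule used here, distance from the median measured in median absolute deviations, is one common choice picked for this sketch; the threshold and the sample readings are assumptions.

```python
import statistics


def parse_readings(raw_values):
    """Parse raw strings into floats, skipping entries that cannot be parsed."""
    parsed = []
    for value in raw_values:
        try:
            parsed.append(float(value.strip()))
        except ValueError:
            continue  # e.g. "n/a" or an empty field
    return parsed


def remove_outliers(values, max_deviations=5.0):
    """Drop values that lie far from the median.

    Distance is measured in median absolute deviations (MAD). If the MAD
    is zero, nothing is removed.
    """
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return list(values)
    return [v for v in values if abs(v - median) <= max_deviations * mad]


if __name__ == "__main__":
    raw = ["21.5", " 21.7 ", "n/a", "21.6", "999.0", "21.4"]
    cleaned = remove_outliers(parse_readings(raw))
    print(cleaned)
```

Whether a value such as `999.0` is a sensor glitch or a real event depends on the data source, so any automatic rule like this should be checked against what the data is supposed to represent.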

There are many ways to process raw data, ranging from simple to complex. A spreadsheet such as Microsoft Excel or Google Sheets allows users to format, organize and graph data to reveal simple trends and help summarize data. More complicated systems such as business intelligence (BI) programs may use raw data for financial trending or forecasting purposes. Advanced systems may use raw data for alerting purposes or with machine learning to build models of the data and its behavior.

Value of raw data

The primary value of data emerges after it has been processed and interpreted. There is generally not much value in holding onto raw data without a way to use it, but as the cost of storage decreases, organizations are finding more and more value in collecting raw data for additional processing -- if not right away, then later.

Raw data may contain personally identifiable information (PII), which may make an organization liable for storing or transmitting it. Therefore, the organization may use data anonymization to remove PII from the raw data, or apply data controls and implement data retention policies, to limit the risk of data leaks.
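A very simple form of PII removal is to drop the identifying fields from each record before it is stored. The sketch below assumes records are plain dictionaries; the field names in `PII_FIELDS` are hypothetical, and a real dataset would need its own review of which fields identify a person.

```python
# Hypothetical set of field names treated as PII in this sketch.
PII_FIELDS = frozenset({"name", "email", "phone"})


def strip_pii(record, pii_fields=PII_FIELDS):
    """Return a copy of the record without the listed PII fields."""
    return {key: value for key, value in record.items() if key not in pii_fields}


if __name__ == "__main__":
    raw_record = {
        "name": "Jane Doe",
        "email": "jane@example.com",
        "item": "coffee",
        "amount": 4.50,
    }
    print(strip_pii(raw_record))
```

Removing direct identifiers is only a first step. Combinations of the remaining fields can sometimes still identify a person, so this alone should not be assumed to make data anonymous.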

Organizations can feed raw data into a database or a data warehouse (one of several kinds of data repositories), which can collect raw data from many sources for automatic or manual correlation and processing. An analyst can then query the data using BI tools to produce useful information.
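To illustrate the idea of loading raw rows into a database and then querying them for information, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The `sales` table, its columns and the sample rows are invented for this example; a production data warehouse and BI tool would be far larger, but the pattern is similar.

```python
import sqlite3

# Hypothetical raw sales rows: (day, region, amount).
RAW_SALES = [
    ("2021-05-01", "north", 120.0),
    ("2021-05-01", "south", 80.0),
    ("2021-05-02", "north", 95.5),
    ("2021-05-02", "south", 110.0),
]


def total_sales_by_region(rows):
    """Load raw rows into SQLite and return total sales per region."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (day TEXT, region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT region, SUM(amount) FROM sales "
            "GROUP BY region ORDER BY region"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for region, total in total_sales_by_region(RAW_SALES):
        print(region, total)
```

The individual rows are raw data; the per-region totals returned by the query are the kind of summarized information an analyst would work with.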

Many large businesses today recognize the value of raw data. Consumer data is a hot commodity that they can buy and sell to build profiles of users or target a specific audience, for example. Businesses can also store operational and logging data for use in performance metrics and to streamline business practices, while they can use access logs and the like to identify computer breaches and track what data may have been accessed by hackers.

Also see data lake, big data, big data analytics and data governance.

This was last updated in May 2021

Which of the following best explains the difference between data and information?

Data are raw facts or figures; information is the interpretation of that data.

Which information system records, processes and reports on transactions to provide financial and nonfinancial information for decision making and control?

An accounting information system (AIS) is defined as being an information system that records, processes and reports on transactions to provide financial and nonfinancial information for decision making and control.

What is the correct order of effects in the value chain?

The correct order of effects in the value chain is: Inbound Logistics → Operations → Outbound Logistics. A supply chain refers to the flow of materials, information, payments and services.

What would be the most appropriate designation for a professional who serves as an IT auditor?

Certified Information Systems Auditor (CISA). The CISA designation is a globally recognized certification for IS audit control, assurance and security professionals.