GeeKay: Big Unstructured Data vs Structured Relational Data

With emerging technologies, data is growing at an astounding rate. Big Data refers to datasets of very large volume, complexity and variety. The term is closely associated with unstructured data, which is commonly estimated to make up about 90% of big data.

So what is unstructured data? How different is it from structured data?

UNSTRUCTURED DATA

Unstructured data is any information that does not have a predefined data model or is not organized in a predefined manner. This type of data does not fit neatly into relational databases and is difficult to search with conventional query tools. Unstructured data mainly includes text and multimedia content. A few examples are e-mails, Word documents, videos, photos, audio files, presentations, web pages and other kinds of business documents.
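To see why such content is harder to search than a database column, here is a minimal sketch in Python. The e-mail text, the field names and the patterns are all made up for illustration. Because free text has no fields, any "order number" or "date" has to be recovered by pattern matching, and the patterns only work for the wording they anticipate.

```python
import re

# Hypothetical e-mail body: free text with no predefined fields.
email_body = (
    "Hi team, order 4821 arrived damaged on 2016-02-20. "
    "Please call me back. Thanks, Priya"
)

def extract_fields(text):
    """Pull an order number and an ISO date out of free text, if present."""
    order = re.search(r"\border\s+(\d+)", text, re.IGNORECASE)
    date = re.search(r"\b(\d{4}-\d{2}-\d{2})\b", text)
    return {
        "order_id": int(order.group(1)) if order else None,
        "date": date.group(1) if date else None,
    }

fields = extract_fields(email_body)
print(fields)
```

If the writer had typed "my purchase no. 4821" instead, the first pattern would find nothing, which is the practical difficulty with unstructured content.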

Organizations are looking towards unstructured data to extract important information which can be useful in strategic decision-making. They are adopting various technological solutions such as Hadoop, Business Intelligence software, data mining tools and data integration software to gain more accurate insights and achieve competitive advantage.

STRUCTURED DATA

In contrast, structured data is data that can be easily organized. It resides in fixed fields within a record or file. Data that can be stored in relational databases or spreadsheets is generally structured. Storing structured data depends on a data model that defines the business data along with its data types (numeric, alphanumeric, Boolean, etc.), its constraints (primary key, referential integrity, check, not null, etc.) and its metadata. Examples of structured data include call detail records (time of call, caller and receiver information) and point-of-sale data (credit card details, product information and location of sale).

Although structured data makes up only about 10% of the total data available, it plays a critical role in data analytics and serves as the backbone of critical business insights. Its advantage is that it can be easily entered, stored, queried and analyzed. We can use Structured Query Language (SQL) to manage structured data. SQL lets us insert, update and delete records, and query the data to fetch the desired results.
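As a rough illustration, the sketch below uses Python's built-in sqlite3 module to define a small data model for call detail records and run the basic SQL operations on it. The table names, column names and sample rows are assumptions made up for this example. The schema shows data types plus primary key, referential integrity, check and not null constraints.

```python
import sqlite3

# In-memory database; the schema and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce referential integrity

conn.execute("""
    CREATE TABLE subscriber (
        phone TEXT PRIMARY KEY,
        name  TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE call_record (
        id         INTEGER PRIMARY KEY,
        caller     TEXT NOT NULL REFERENCES subscriber(phone),
        receiver   TEXT NOT NULL REFERENCES subscriber(phone),
        started_at TEXT NOT NULL,
        duration_s INTEGER NOT NULL CHECK (duration_s >= 0)
    )""")

# INSERT
conn.executemany(
    "INSERT INTO subscriber VALUES (?, ?)",
    [("555-0100", "Asha"), ("555-0101", "Ben")],
)
conn.executemany(
    "INSERT INTO call_record (caller, receiver, started_at, duration_s) "
    "VALUES (?, ?, ?, ?)",
    [
        ("555-0100", "555-0101", "2016-02-28 09:00:00", 120),
        ("555-0101", "555-0100", "2016-02-28 10:30:00", 45),
        ("555-0100", "555-0101", "2016-02-28 11:00:00", 300),
    ],
)

# UPDATE and DELETE
conn.execute("UPDATE call_record SET duration_s = 60 WHERE id = 2")
conn.execute("DELETE FROM call_record WHERE id = 3")

# SELECT: total call time per caller
total_by_caller = conn.execute(
    "SELECT caller, SUM(duration_s) FROM call_record "
    "GROUP BY caller ORDER BY caller"
).fetchall()
print(total_by_caller)

# The CHECK constraint rejects a negative duration.
try:
    conn.execute(
        "INSERT INTO call_record (caller, receiver, started_at, duration_s) "
        "VALUES ('555-0100', '555-0101', '2016-02-28 12:00:00', -5)"
    )
    negative_rejected = False
except sqlite3.IntegrityError:
    negative_rejected = True
```

Because every value sits in a typed, named column, the query can ask for exactly what it needs without any text parsing.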

DIFFERENT DATA TYPES AVAILABLE TO ORGANIZATIONS

1. Integrated Operational Data
This is a volatile collection of data that supports an organization’s daily business activities. The data store contains dynamic data which is constantly updated by business operations. An integration tool connects the scattered operational data stores so that they work as one large, efficient system.

2. Spatial Data
This is information with several dimensions in which location is central. It includes both geospatial and structo-spatial data, is commonly used in geographical databases, and comes in formats such as map coordinates, images taken from space and remote sensing data.

3. Redundant Data
This form of data refers to data duplication, wherein the same data is stored at multiple data sites. Redundant data can enter organizational systems unnoticed and lead to undesirable results such as reduced performance, inaccurate results and compromised data integrity. It can also put data quality at risk if the copies are not updated concurrently.
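A minimal sketch of how redundant records might be spotted, assuming each record carries a shared key. The sites, the customer_id key and the e-mail values are hypothetical. It reports keys stored at more than one site, and separately those whose copies disagree because one site was updated and the other was not.

```python
from collections import defaultdict

# Hypothetical records held at two data sites.
records = [
    {"site": "crm",     "customer_id": 1, "email": "a@example.com"},
    {"site": "billing", "customer_id": 1, "email": "a@example.com"},
    {"site": "crm",     "customer_id": 2, "email": "b@example.com"},
    {"site": "billing", "customer_id": 2, "email": "b.old@example.com"},
    {"site": "crm",     "customer_id": 3, "email": "c@example.com"},
]

def find_redundant(rows, key="customer_id", field="email"):
    """Return (keys stored more than once, keys whose copies disagree)."""
    by_key = defaultdict(list)
    for row in rows:
        by_key[row[key]].append(row)
    duplicated = sorted(k for k, copies in by_key.items() if len(copies) > 1)
    conflicting = sorted(
        k for k, copies in by_key.items()
        if len({c[field] for c in copies}) > 1
    )
    return duplicated, conflicting

duplicated, conflicting = find_redundant(records)
print(duplicated, conflicting)
```

Customer 1 is merely duplicated, while customer 2 is duplicated and inconsistent, which is the case that threatens data quality.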

4. Legacy Data
This refers to information which is stored in an old or obsolete format, making it difficult to access or process. Legacy data commonly comes from sources such as XML documents, flat files and older relational databases. This form of data adds to the disparity in data warehouse systems.

5. Historical Data
Historical data is digital information that outlines a company’s past activities and trends. This data is often archived, and may be held in non-volatile, secondary storage. Historical data can be useful to perform predictive analysis.

ROLE OF DATA WAREHOUSING

A data warehouse is a central repository of data aggregated from various data sources. It is used to store large amounts of data, such as analytics, historical or customer data, and to build reports that help managers make informed decisions. We can also analyze data over different time periods, further improving business decision-making.
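The idea can be sketched in a few lines of Python. The two sources and their figures are invented for illustration, and a real warehouse would do this at far larger scale with dedicated tools. Rows from both sources are combined in one place and summarized by month so that periods can be compared.

```python
from collections import defaultdict

# Hypothetical (date, amount) rows from two separate source systems.
web_orders = [("2016-01-15", 100.0), ("2016-02-03", 50.0)]
store_sales = [("2016-01-20", 25.0), ("2016-02-10", 75.0), ("2016-02-27", 10.0)]

def monthly_totals(*sources):
    """Combine rows from all sources and total the amounts per YYYY-MM."""
    totals = defaultdict(float)
    for source in sources:
        for date, amount in source:
            totals[date[:7]] += amount  # first 7 characters are YYYY-MM
    return dict(sorted(totals.items()))

report = monthly_totals(web_orders, store_sales)
print(report)
```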

LIMITATIONS OF DATA WAREHOUSING IN TERMS OF ANALYZING DATA

High maintenance

Data warehouse systems require high maintenance. Any reorganization of the business processes or the source systems may affect the data warehouse and, consequently, result in high maintenance costs.

Required data not captured

The required data may not be completely captured by the source systems in some cases. This data may be important for data warehousing purposes. For example, the registration data of property may not be used in source systems but it may be useful for data analysis.

Data flexibility

Data warehouses hold static data, which makes them inflexible for analysis. Moreover, since the data is imported and filtered through a schema, it is already old by the time it is actually used. Data warehouses are also subject to ad hoc queries, which makes it difficult to tune processing and query speed.

Underestimation of resources of data loading

In certain scenarios, we may underestimate the time required to extract, clean and load the data into the warehouse. This step may take a significant proportion of the total development time. However, we can reduce the time and effort spent on it by using suitable tools.
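The three steps can be sketched as below, using only Python's standard library. The CSV content, the sales table and the cleaning rules are assumptions chosen for illustration. Even in this tiny case, half of the rows need to be rejected and the rest need trimming, which hints at where the time goes.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system, with messy values.
RAW_CSV = """store,product,amount
Pune, Pen ,12.50
Pune,Notebook,
Delhi,Pen,abc
 Delhi ,Notebook,40
"""

def extract(text):
    """Extract: read the raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Clean: trim whitespace and drop rows whose amount is not a number."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            continue  # missing or non-numeric amount
        cleaned.append((row["store"].strip(), row["product"].strip(), amount))
    return cleaned

def load(conn, rows):
    """Load: insert the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales ("
        "store TEXT NOT NULL, product TEXT NOT NULL, amount REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)

warehouse = sqlite3.connect(":memory:")
loaded = load(warehouse, clean(extract(RAW_CSV)))
stored = warehouse.execute(
    "SELECT store, product, amount FROM sales ORDER BY store"
).fetchall()
print(loaded, stored)
```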

FUTURE OF DATA WAREHOUSING

The data warehouse market has begun to evolve with the advent of big data. In the past, data warehouses were designed to handle only structured data stored in ERP systems. The new torrent of data from social media, sensors, satellites, cameras, etc. has made it essential for data warehouses to develop advanced statistical capabilities for performance analytics and forecasting. There have already been profound changes in data warehousing, but what is its future and how will it affect businesses? These questions are addressed below.

Data warehouses will keep adjusting their role in response to technologies such as Hadoop. As data warehouses continue to evolve across development, test and production environments, it will become critical to reduce errors and speed up the migration of database schemas.

The data warehouse of the future is a fluid system which will bring resources online as organizational needs evolve. A modern front-end will allow you to choose the data sets you want to query and will bring them together while hiding all the complexities.

The evolution in data and in cloud services will make it a necessity to have a cloud-based solution for data warehousing and analytics. Cloud-based solutions will be critical to helping organizations expand access to data and analytics as well as increase their agility with data. These solutions will offer performance on demand and support a wide range of analytics without overhead costs by taking advantage of the flexibility and cost model of the cloud.

Sunday, 28 February 2016
