In this section, we will concentrate on the difference between structured, semi-structured and unstructured data.
Structured data is data stored in SQL format, in tables with rows and columns. It has relational keys and maps easily into pre-designed fields. Structured data is well suited to use at larger scale.
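As a minimal sketch of what "rows, columns and a relational key" means in practice, the following uses Python's built-in sqlite3 module with an in-memory database. The table name, column names and sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical table and column names, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "id INTEGER PRIMARY KEY, "   # the relational key
    "name TEXT NOT NULL, "
    "city TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO customer (id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ben", "Leeds")],
)

# Every row has the same pre-designed fields, so a query can rely on them.
rows = conn.execute(
    "SELECT id, name, city FROM customer ORDER BY id"
).fetchall()
conn.close()
```

Because the fields are fixed in advance, a row that lacks a name or a city is rejected at insert time rather than discovered later during analysis.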
Structured data is commonly estimated to represent only 5-10 percent of all data.
Semi-structured data is data that does not reside in a relational database but has some organizational properties that make it easier to analyse. With some processing, it can be stored in a relational database. Examples of semi-structured data are CSV files and XML and JSON documents. NoSQL databases are also considered semi-structured.
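To make the contrast with a fixed table concrete, here is a small sketch that parses a JSON document with Python's standard json module. The records and their field names are hypothetical; the point is only that each record carries its own field names and that the records need not share the same fields.

```python
import json

# Hypothetical records: each one names its own fields, and the fields differ.
raw = """
[
  {"name": "Asha", "email": "asha@example.com"},
  {"name": "Ben", "phones": ["555-0100", "555-0101"], "address": {"city": "Leeds"}}
]
"""
records = json.loads(raw)

# There is no fixed schema, so collect the field names each record actually uses.
fields_per_record = [sorted(record.keys()) for record in records]
```

Storing such records in a relational database would first require deciding how to handle the optional and nested fields, which is the extra processing step mentioned above.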
Unstructured data represents 80 percent of data. It frequently includes text and multimedia content. Typical examples of unstructured data include audio files, presentations and web pages. Examples of machine-generated unstructured data are satellite images, scientific data, photos and video, and radar and sonar data.
The pyramid above shows the relative amount of each type of data and the ratio in which it is distributed.
Semi-structured data sits between structured and unstructured data. In these study notes, we will focus on semi-structured data, which is useful for agile methodology and data science research.
Semi-structured data does not have a formal data model, but it has a clear, self-describing pattern and structure that can be discovered by analysing it.
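One way to picture "structure discovered by analysis" is to walk a parsed document and record the path and type of every value. The sketch below is one possible approach, not a standard algorithm; the describe helper and the sample document are hypothetical.

```python
import json


def describe(value, path="$"):
    """Return 'path: type' strings for every node, in document order."""
    lines = [f"{path}: {type(value).__name__}"]
    if isinstance(value, dict):
        for key, child in value.items():
            lines.extend(describe(child, f"{path}.{key}"))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            lines.extend(describe(child, f"{path}[{index}]"))
    return lines


# Hypothetical document; no schema is supplied alongside it.
document = json.loads('{"title": "Notes", "tags": ["data", "json"], "pages": 12}')
structure = describe(document)
```

Nothing outside the document was needed to recover its layout: the keys and nesting describe themselves, which is what distinguishes this kind of data from a raw audio file or image.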