Clustered File Organization in DBMS

Storing and accessing data efficiently is a fundamental concern in database management systems (DBMS). Clustered file organization is one technique a DBMS can use to speed up these operations, especially when the data is spread across several tables that are likely to be accessed together. This article defines the key terms, explains the concept with examples, and describes the types of clustered file organization.

Key Terminologies

File Organization: The technique used to store data in a file. The way records are arranged in the file affects how efficiently data can be accessed and retrieved.

Clustered Index: An index that determines the physical order in which the rows of a table are stored. The key of the clustered index, called the cluster key, is used to order the table.

Cluster Key: A field shared by related tables, on which their related records are grouped together. It usually involves a primary key in one table and the matching foreign key in another.

Indexing: A technique that builds an auxiliary data structure, called an index, so that data in database tables can be located faster.

Primary Key: A key that uniquely identifies each record in a table.
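The indexing and primary key terms above can be illustrated with a small sketch. It is not how a real DBMS implements an index; it assumes a hypothetical three-row in-memory table and uses a plain dictionary as the index, which only works because the primary key is unique. The function names `full_scan` and `build_index` are made up for this illustration.

```python
# Toy table stored in no particular order (a hypothetical example).
rows = [
    {"customer_id": 3, "customer_name": "Sanjay Gupta"},
    {"customer_id": 1, "customer_name": "Ramesh Sharma"},
    {"customer_id": 2, "customer_name": "Priya Patel"},
]


def full_scan(rows, customer_id):
    """Without an index: examine rows one by one until a match is found."""
    examined = 0
    for row in rows:
        examined += 1
        if row["customer_id"] == customer_id:
            return row, examined
    return None, examined


def build_index(rows, column):
    """Map each value of a unique column (a primary key) to its row position."""
    return {row[column]: position for position, row in enumerate(rows)}


index = build_index(rows, "customer_id")

row_via_scan, examined = full_scan(rows, 2)  # walks the rows in order
row_via_index = rows[index[2]]               # jumps straight to the row
```

Both lookups return the same row, but the scan has to examine rows until it reaches the match, while the index goes directly to the stored position.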

Clustered File Organization

A clustered file organization stores the records of two or more related tables in a single file known as a cluster. Related records from these tables share the same data block, and the attributes that relate the tables (the cluster key) are stored only once. This makes it cheaper to search for and retrieve related records that would otherwise live in different files, because they are now stored together in the same cluster. Let us understand the concept better with the following examples.

Examples

Example 1: Let’s consider a database for an online store with two tables: “Customers” and “Orders”.

Customers table:

customer_id	customer_name	address
1	Ramesh Sharma	Delhi
2	Priya Patel	Lucknow
3	Sanjay Gupta	Patna

Orders table:

order_id	customer_id	order_date	amount
101	1	2024-05-01	Rs. 1500
102	2	2024-05-03	Rs. 10000
103	1	2024-05-04	Rs. 2500
104	3	2024-05-06	Rs. 500

When a customer requests their purchase history, the system must quickly join these two tables to obtain the necessary data. Clustered file organization can greatly speed up this operation.

By clustering the tables on “customer_id” (the cluster key), the records would be grouped as follows (a logical view; within the cluster, the shared key value is stored only once):

customer_id	customer_name	address	order_id	order_date	amount
1	Ramesh Sharma	Delhi	101	2024-05-01	Rs. 1500
1	Ramesh Sharma	Delhi	103	2024-05-04	Rs. 2500
2	Priya Patel	Lucknow	102	2024-05-03	Rs. 10000
3	Sanjay Gupta	Patna	104	2024-05-06	Rs. 500

Because of this organization, when a query asks for Ramesh Sharma’s order history, the database can retrieve all the relevant records from a single, contiguous block of storage, which reduces the number of I/O operations required.
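The grouping in Example 1 can be sketched in a few lines of Python. This is only an in-memory illustration under simple assumptions, not a storage engine: the cluster is modelled as a dictionary keyed by `customer_id`, the amounts are stored as plain integers (rupees), and the helper name `build_cluster` is hypothetical.

```python
customers = [
    (1, "Ramesh Sharma", "Delhi"),
    (2, "Priya Patel", "Lucknow"),
    (3, "Sanjay Gupta", "Patna"),
]
orders = [  # (order_id, customer_id, order_date, amount in Rs.)
    (101, 1, "2024-05-01", 1500),
    (102, 2, "2024-05-03", 10000),
    (103, 1, "2024-05-04", 2500),
    (104, 3, "2024-05-06", 500),
]


def build_cluster(customers, orders):
    """Group both tables by customer_id so each key value is stored once."""
    cluster = {}
    for customer_id, name, address in customers:
        cluster[customer_id] = {
            "customer_name": name,
            "address": address,
            "orders": [],
        }
    for order_id, customer_id, order_date, amount in orders:
        cluster[customer_id]["orders"].append(
            {"order_id": order_id, "order_date": order_date, "amount": amount}
        )
    return cluster


cluster = build_cluster(customers, orders)

# Ramesh Sharma's full order history comes from one lookup on the cluster key.
history = cluster[1]["orders"]
```

Here the customer details and the cluster key appear once per customer, and the matching orders sit right next to them, which is the property the clustered layout relies on.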

Example 2: Let’s consider a database for a library management system with two tables: “Books” and “Book_issue”.

Books table:

book_id	title	author	publication_year
1	“Midnight’s Children”	Salman Rushdie	1981
2	“The God of Small Things”	Arundhati Roy	1997
3	“A Suitable Boy”	Vikram Seth	1993

Book_issue table:

issue_id	book_id	issue_date	return_date
201	1	2024-01-10	2024-01-24
202	2	2024-01-15	2024-01-30
203	1	2024-02-01	2024-02-15
204	3	2024-02-05	2024-02-20

By clustering the tables on “book_id” (the cluster key), the records would be grouped as follows:

book_id	title	author	publication_year	issue_id	issue_date	return_date
1	“Midnight’s Children”	Salman Rushdie	1981	201	2024-01-10	2024-01-24
1	“Midnight’s Children”	Salman Rushdie	1981	203	2024-02-01	2024-02-15
2	“The God of Small Things”	Arundhati Roy	1997	202	2024-01-15	2024-01-30
3	“A Suitable Boy”	Vikram Seth	1993	204	2024-02-05	2024-02-20

Because of this organization, when the librarian queries the issue history of a particular book, the database can retrieve all the relevant records from a single, contiguous block of storage, which reduces the number of I/O operations required.
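The I/O saving can be made concrete with a toy block-counting model built on the library data. Everything about the model is an assumption made for illustration: a block holds a fixed number of records (`BLOCK_SIZE`), only blocks containing a matching record are counted as read (as if an index pointed to them), and the function names are hypothetical.

```python
BLOCK_SIZE = 4  # assumed records per block; real blocks are sized in bytes

books = [  # (book_id, title)
    (1, "Midnight’s Children"),
    (2, "The God of Small Things"),
    (3, "A Suitable Boy"),
]
issues = [(201, 1), (202, 2), (203, 1), (204, 3)]  # (issue_id, book_id)


def to_blocks(records, block_size):
    """Split a list of records into fixed-size blocks."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def blocks_touched(blocks, book_id, key_position):
    """Count blocks holding at least one record for the given book_id."""
    return sum(
        1 for block in blocks
        if any(record[key_position] == book_id for record in block)
    )


def separate_files_reads(book_id, block_size=BLOCK_SIZE):
    """Books and Book_issue live in separate files; both must be read."""
    book_blocks = to_blocks(books, block_size)
    issue_blocks = to_blocks(issues, block_size)
    return (blocks_touched(book_blocks, book_id, 0)
            + blocks_touched(issue_blocks, book_id, 1))


def clustered_reads(book_id, block_size=BLOCK_SIZE):
    """Each book row is followed directly by its issue rows in one file."""
    clustered = []
    for bid, title in sorted(books):
        clustered.append((bid, "book", title))
        clustered.extend(
            (bid, "issue", issue_id)
            for issue_id, issued_bid in issues if issued_bid == bid
        )
    return blocks_touched(to_blocks(clustered, block_size), book_id, 0)
```

For `book_id` 1, this model reads two blocks when the tables are stored separately and one block when they are clustered. The gain depends on the block size and on how a group happens to align with block boundaries, so the numbers should be read as an illustration rather than a measurement.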

Types of Clustered File Organization

1. Indexed Clusters

In an indexed cluster, records are stored in the order of the clustering key. In addition to this physical ordering, an index is built on the clustering key and is used to search for and retrieve data. This kind of clustering is beneficial for range queries and equality searches on the clustering key.
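A minimal sketch of the idea, assuming the cluster fits in memory: keys are kept in sorted order and a binary search (Python's standard `bisect` module) stands in for the index. The class name `IndexedCluster` and its methods are hypothetical, and a real DBMS would typically use a disk-based structure such as a B-tree instead.

```python
import bisect


class IndexedCluster:
    """Records kept in clustering-key order, searched by binary search."""

    def __init__(self, items):
        ordered = sorted(items, key=lambda pair: pair[0])
        self.keys = [key for key, _ in ordered]
        self.values = [value for _, value in ordered]

    def get(self, key):
        """Equality search on the clustering key."""
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None

    def range_query(self, low, high):
        """Range search: matching records form one contiguous slice."""
        start = bisect.bisect_left(self.keys, low)
        end = bisect.bisect_right(self.keys, high)
        return list(zip(self.keys[start:end], self.values[start:end]))


cluster = IndexedCluster([
    (3, "Sanjay Gupta"),
    (1, "Ramesh Sharma"),
    (2, "Priya Patel"),
])

one_customer = cluster.get(2)
customers_2_to_3 = cluster.range_query(2, 3)
```

Because the keys are sorted, a range query only needs to find the two ends of the range and then read everything in between, which is why ordered storage suits range queries.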

2. Hash Clusters

In a hash cluster, the location of every record is determined by applying a hash function to the clustering key. Records whose keys hash to the same value are stored together, so the hash value identifies their physical location. The main advantage of hash clustering is that an individual record can be retrieved very quickly by its clustering key. However, since records are not stored in key order, range queries are slower.
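The trade-off can be sketched as follows, under stated assumptions: clustering keys are integers, the hash function is simply the key modulo the number of buckets, and each bucket is a Python list standing in for a block. The class name `HashCluster` and the choice of four buckets are made up for this illustration.

```python
class HashCluster:
    """Records placed in buckets chosen by hashing the clustering key."""

    def __init__(self, num_buckets=4):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket_for(self, key):
        # Assumed hash function for integer keys: key modulo bucket count.
        return self.buckets[key % len(self.buckets)]

    def insert(self, key, record):
        self._bucket_for(key).append((key, record))

    def get(self, key):
        """Equality search: only the one bucket for this key is inspected."""
        return [record for k, record in self._bucket_for(key) if k == key]

    def range_query(self, low, high):
        """Range search: hashing keeps no order, so every bucket is scanned."""
        matches = []
        buckets_scanned = 0
        for bucket in self.buckets:
            buckets_scanned += 1
            matches.extend((k, r) for k, r in bucket if low <= k <= high)
        return sorted(matches), buckets_scanned


cluster = HashCluster()
for customer_id, order_id in [(1, 101), (2, 102), (1, 103), (3, 104)]:
    cluster.insert(customer_id, order_id)

orders_for_1 = cluster.get(1)
in_range, scanned = cluster.range_query(2, 3)
```

An equality lookup touches a single bucket, whereas the range query has to visit all four buckets even though only two records match, which mirrors the weakness of hash clusters for range queries.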

Conclusion

Clustered file organization is a technique a DBMS can use to make data retrieval more efficient, especially for join operations. The idea is that if related records are stored near each other on disk, a query that needs them together can be answered with fewer read operations than if they were stored separately. This strategy is particularly useful when related data, such as the customers and orders of the online store example, is regularly accessed together.

Frequently Asked Questions on Clustered File Organization – FAQs

In what situations is clustered file organization appropriate in a database?

ans. Clustered file organization is effective for databases in which queries often join related tables or work with large quantities of interconnected data. It is particularly well suited to read-heavy workloads, where the cost of maintaining the clustering during inserts and updates is less of a concern.

Is clustered file organization applicable to all types of databases?

ans. Clustered file organization can be implemented in different relational database management systems (RDBMS), but its practical application and efficiency depend on the specific DBMS. Some systems, for example, provide their own settings and options for clustering data.

What are some good practices for creating a clustered file organization?

ans. Good practices include choosing the cluster key based on how frequently the fields are queried together, managing free space to avoid fragmentation, periodically reorganizing and defragmenting the clustered data, and using clustering only for applications where reads dominate and query performance is critical.

Referred: https://www.geeksforgeeks.org
