Relational Database: Your Ultimate Guide to Representing Data
Today, small, medium and large businesses face wide-ranging challenges in their attempt to succeed. For modern businesses to remain competitive, reduce churn rates, increase customer acquisitions, reduce overhead, scale, and increase their bottom line, they must utilize every tool at their disposal when designing the architecture of their business and crafting robust strategic plans. One of the most critical aspects of a company's component-based infrastructure is their Information Technology (IT) infrastructure, which - when leveraged correctly - can streamline operations, ease workflows, reduce project timeframes, increase operational efficiency, reduce overhead, save millions of dollars, and make "business as usual" much more effective in the short-term and long-term, both tactically and strategically. A company's IT infrastructure is usually composed of six or seven main categorical components or systems:
- Hardware Systems
- Software Systems
- Enterprise Systems
- Network Systems
- Web Systems
- Security Systems
- Database Systems
The evolution of database systems has revolutionized business operations. Database systems, which are typically digital data centers, warehouses, and servers, are technological storage centers that businesses use to store, parse, query, edit, and transmit their critical business information. Databases act as crucial backbones that form how a company’s data is stored and utilized. No business can operate without data, making the structure and logical orientation of business database systems vastly significant to enterprises around the world.
A recent and widely used database system is the Relational Database, which was first defined by Edgar Codd of IBM in June 1970. A truly relational database applies all of Codd's 12 rules to the structure and operational system, while many relational databases of today can be classified as pseudo-relational databases that follow two critical principles:
- Visually representing the data as relations (that is, tables, using rows and columns).
- Utilizes relational operators to query and manipulate the tabular data.
Defining A Relational Database
Codd’s 12 principles/rules - as noted by IT World - dictate what truly is a relational database, in defining how the database structure and operations are to perform:
Rule 0: The Foundation Rule:
Databases should be relational and should use relational theory (models).
Rule 1: The Information Rule:
Data in a relational database should be presented in a tabular format.
Rule 2: The Guaranteed Access Rule:
Data in a relational database should be easily accessible based on being easily identified.
Rule 3: Systematic Treatment Of Null Values:
A field in the database should be allowed to remain empty.
Rule 4: Dynamic Online Catalog Based On The Relational Model:
A relational database should allow easy access to its structure using the same operations that are used to access the data itself.
Rule 5: The Comprehensive Data Sublanguage Rule:
The database can support multiple languages but must support at least one language for core operations and functionalities, including data definition, data manipulation, data integrity, and data transaction control.
Rule 6: The View Updating Rule:
All views of the data (being different logical combinations) should support the same full scope of data manipulation as direct table access.
Rule 7: Possible For High-level Insert, Update, And Delete:
In order to be truly relational, data retrieval can include data sets from multiple rows/columns or tables, with all data manipulation operands (insert, update, delete) being applied to all in the set.
Rule 8: Physical Data Independence:
Manipulation of the database hardware does not change logical access to the data, while the user’s actions on the logical database system and its data are isolated from the physical storage/retrieval methods.
Rule 9: Logical Data Independence:
The method of viewing the data in the relational database should not change or be dependent on changes in the logical structure of the database.
Rule 10: Integrity Independence:
The core database query language should ensure that database integrity is maintained even with a full range of user inputs.
Rule 11: Distribution Independence:
Database distribution should not be visible or evident to the user, who should be unaware of its distribution status.
Rule 12: The Non-subversion Rule:
The database structure should not be modifiable except via the multiple row (core) database language.
All but replacing the hierarchical and network database models, the evolution of relational databases based on Codd’s 12 rules gave app developers, data administrators, data officers, security admins, and other executives and specialists unparalleled access to a type of database that represents relations between data points in an easy to understand format that is feasibly queried and utilized for streamlined data operations.
As one of the significant changes that have altered and optimized business datacenters and databases, relational databases form the backbone of database management software systems, known as Relational Database Management Systems (RDBMS). RDBMSs are based on Relational Theory and utilize Relations (Tables) representing an entity, and rows/records (representing objects or concepts associated with the entity) and columns/attributes (representing values associated with those objects of the overarching entity). Such systems also use a query language (typically Structured Query Language or SQL) to query and manipulate the data in the database.
IT specialists and executives now can have a visual representation of related data points while allowing them to separately manage logical and physical database storage systems without having to utilize or recall the arbitrary structure of non-relational database systems (which do not represent data in a format that allows for data comparisons).
RDBMSs, based on relational databases and relational theory, are the norm in today's global IT ecosystem and are often integrated with advanced cloud-based technologies, such as Cloud computing platforms, Artificial Intelligence, and advanced web applications, giving businesses all over the world a uniform standard for storing and querying their data in an easily digestible and relational format.
Advantages of Relational Models
In the past, companies leveraged several database system types to store, parse, retrieve, and analyze data. To grasp the advantages of using relational models, it helps to understand the prior, non-relational database models that were in use for decades. Significant Non-relational models include the Hierarchical database model and the Network database model:
Hierarchical Database Model
This older database model stores data via a tree-like structure (beginning at the root node), with child records and a singular parent record. Such a database model allows for the storage of data as records, which connect to other pertinent data via links but does little to offer any relational system of data visualization that could be used to compare data points.
The network database model was a more recent model that existed as a middle-ground database model in between the hierarchical model and the relational model. As a database model, it represented data objects and their relationships via a robust schema. The schema graph-structure allowed for the establishment of relationships between data points and objects but failed to represent data relationships on a high-level with a declarative, non-navigational interface like the relational model (which replaced the network model).
Now relational models offer a robust and comprehensive manner for recording, storing and comparing complex data sets between relations ("tables") - and even within relations - using simple concepts, and a visual, tabular representative form that presents consistent, multifaceted data storage capabilities to administrators, along with secure transactions that are guaranteed to be valid. Relational databases give companies an unprecedented view into their most critical data by offering a top-level view of crucial data points and how they relate to other data points, while also providing a simple system for querying and manipulating this data. Such systems can significantly streamline data operations and increase the bottom line by increasing efficiency and productivity. A robust Relational Database Management System (RDBMS) can streamline this operation.
Consistency Of Data
One of the most crucial aspects of Relational Systems is their flexibility and consistency with how data is stored and utilized, allowing for concurrent and safe access and manipulation across multiple platforms. RDBMSs keep data consistent across all systems that access them by ensuring that transactions follow predetermined rules, parameters, and constraints while allowing internal functionalities to continually meet changing business requirements without affecting the structure of the database as a whole. Due to features that are hard-baked into all relational database systems, data also has several “fail-safe” measures that ensure their integrity and validity. Any “illegal” transactions - ones that do not follow the defined rules - are halted so that only valid changes to the database are allowed.
Multifaceted Commitment Capability
Relational databases utilize a commitment function to carry out transactions by ensuring that all aspects of a transaction are available and carried out in full before the database permanently changes and completes the transaction in the database. This commitment mechanism ensures that errors in a transaction do not result in duplicated data records, incorrect data records (due to a half-complete transaction), and transactions that are illegal.
Allows Stored Procedures
One of the essential features that relational databases utilize is stored procedures, which act as an intermediary layer between applications and databases for security purposes.
Essentially, a stored procedure is a set of instructions that operate as a subroutine that can be called upon by an application to contact and query a relational database management system, thus returning any number of database results. Complex executions of several SQL statements can be stored in a single procedure, allowing multiple applications to call the procedure. As a security feature, stored procedures ensure that applications do not directly access databases, which, by extension, allows stored procedures to be used for data-validation and access-control mechanisms.
Allows Database Locking And Concurrency
Relational databases allow for multi-user access, which is an essential function within any enterprise that requires several data operations per day or even an hour. To ensure that data is consistent, and to ensure the integrity and validity of concurrently accessed data, data locking is used. Database locking (or simply data locking) are RDBMS functions put in place to protect database data from the potential of being destroyed or negatively altered when accessed and interacted with by multiple transactions at the same time. As noted by Oracle, “locks are mechanisms that prevent destructive interaction between transactions accessing the same resource.” Database locking is a transaction control that is a critical part of any robust RDBMS.
The Relational Model
The foundational backbone of relational databases is the Relational Model, which is based on relational theory. Relational databases use a declarative, visual, tabular interface that allows for easy, robust comparisons between data points, with the ability to establish relationships between large sets of data. This database model contrasts the former database models which were mainly low-level navigational interfaces and were not as flexible or effective at increasing productivity, or in establishing relationships between data points in a high-level manner.
The relational model is a comprehensive, organized database management system based on the theories set forth by Edgar F. Codd of IBM in 1969. In 1970, Codd used the term "relational database" and elaborated on what it meant to be "relational" in his research paper "A Relational Model of Data for Large Shared Data Banks."
While IBM developed System R (a project culminating in a prototype RDBMS), Oracle was released as the first commercially available RDBMS in 1979 by Relational Software (now Oracle Corporation).
These Relational Databases were developed in a way that allowed database management systems to evolve from their previous state to relational systems. Due to being based on Relational Models, RDBMSs manage data via two distinct attributes that are consistent with first-order predicate logic: structure and language.
Additionally, relational models, in association with databases, use declarative methods for determining the data in the database, and how it is queried. The overlying RDBMS then is responsible for the data structures associated with the data storage and retrieval procedures when queries are answered. Today, most relational databases use Structured Query Language (SQL) for data definitions and querying the database, though there have been other query language proposals.
RDBMSs were a game-changer that, coupled with new and expansive hardware, allowed for unparalleled efficiency and productivity for admins who needed to compare complex data points via a relational, declarative, high-level method, while also enabling app developers to create complex applications without needing to learn or memorize arbitrary database structures. Thus, the advent of the relational database allowed for a standard database system that uniformly replaced the hierarchical and network models.
How Databases Are Structured
Databases that utilize the relational model (that is, RDBMSs) can be defined as a tabular system of representing data in a database by visually organizing information as a collection of rows (Tuples) and columns/attributes (even when the database does not strictly adhere to Codd's 12 rules).
There are several critical terms associated with relational databases that relate to the core RDBMS language (SQL) and RDBMSs as a whole:
- Row: A tuple that acts as a record or data set
- Column: A set of values or attributes that apply to the tuples/records
- Table: A relation, or a set of tuples (records) that have a shared, associated set of attributes
Another critical aspect of relational databases is the use of unique keys for each row/record that can be linked to data points in other relations (for comparison) via the use of a unique column-based element called a foreign key.
Delving further into the specific components and structures of RDBMSs, there are two major types of relations which form the foundation of how all data is stored and accessed:
- Base relations (tables): Relations that store data for access are base relations, the implementation of which is called a table.
- Derived relations (views/queries): Relations that are used to carry out operations, such as queries (to "grab" data from base relations), are derived relations.
Various Aspects of Relational Database
The components, structures, and definitions of RDBMSs above help to define what is and isn’t a RDBMS and what the foundational relational model/theory is, yet the mechanisms and functions of RDBMSs help to reveal why such database management systems are the current standard and norm. All factors associated with the tools and functionalities of relational databases can be broken down into three categories:
The language can be defined as the core computer language that is used for data definitions and querying a relational database, while the functions of said language may completely agree with Codd’s 12 rules or not.
How data is managed logically and physically is another vital aspect of relational databases that sets them apart from previous iterations of database management systems within enterprises.
Lastly, relational databases must comply with ACID (Atomicity, Consistency, Isolation, Durability) properties for transactions.
Coupled together - along with the use of SQL - these specific mechanisms give relational databases unparalleled flexibility, along with minimized redundancy and robust (but highly feasible) backup and recovery features.
SQL - Structured Query Language
Structured Query Language (SQL) is a domain-specific language that is used to query and manage the data in relational database management systems (RDBMSs). Primarily, SQL is used for structured data, which is associated with relations and the relationships among the column elements and the entities defined by a table’s rows/tuples.
SQL is a highly beneficial language to use alongside robust RDBMSs, primarily because it allows a database administrator to leverage the full power of the database management system due to SQL’s advantages, which include incredible flexibility, minimized redundancy, and unparalleled ease of backup and recovery, along with high speed, high portability, multiple data views, and more.
SQL is a highly portable database querying language that gives data administrators the ability to use the language on multiple platforms and systems - including laptops, PC workstations, serves, and mobile devices - all with high performance and speed.
Additionally, due to SQL’s simple syntax, it is feasible for administrators to use the language in simple ways to produce complex and powerful query and data manipulations for the best results, while the use of SQL is critical in that it gives data officers a myriad of complex functions that complement it’s truly flexible and scalable nature.
Redundancy is the requirement for multiple data “backups” in databases (saved in different locations) to be put in place. SinceRDBMSs and SQL carry out transactions by ensuring data integrity and guaranteeing validity, there is less need for redundant data backups to be put in place, saving company resources and streamlining operations.
Data redundancy can create inconsistent data storage records due to a user having to modify all redundant records when edits need to be made to any data record. Minimizing redundancy helps to manage data more efficiently while saving on overhead.
Ease of Backup and Recovery
These SQL/RDBMS transaction procedures ensure that most SQL-based RDBMSs have robust backup and recovery processes, making it easy for data administrators to implement their disaster and recovery strategies to protect their critical data.
A critical factor that makes relational databases so fundamentally advantageous is that strict data integrity procedures keep the integrity of crucial data throughout all database procedures and operations. This data integrity functionality is partly due to how RDBMSs handle transactions, which hard-bakes data integrity procedures into every procedure (via ACID protocols).
Database queries and operations function as transactions. A transaction is a single operation done within a database management system and includes any change done within the scope of the database. Within relational databases, transactions are deemed independent of other transactions and are also isolated from different programs attempting to access the database concurrently, such that no mixture of procedures or operations occur, which could result in possible errors. Transactions within RDBMSs also follow the ACID protocol to maintain data integrity and validity.
ACID (Atomicity, Consistency, Isolation, Durability) properties apply to relational database transactions to ensure data integrity, and to guarantee the validity of all operations:
- Atomic: An ”atomic” transaction is either entirely completed or has made no progress at all. Atomicity is important since transactions are either represented as completed 100 percent or not at all, which helps to guarantee that only completed transactions are reflected (as a single transaction) in databases, ensuring the validity of all data sets and database transactions.
- Consistent: Transactions must be consistent, which, as a property, denotes conforming with all existing constraints, rules, parameters, and set precedents in the database. This property prevents illegal transactions from being completed or recorded in the system.
- Isolated: Transactions in RDBMSs are isolated from one another so that one transaction does not affect another transaction, thus minimizing errors and redundancies.
- Durable: Durability is the transaction property of RDBMSs that ensures persistent survival of data transactions, even in the event of a database crash, so that the database and all included data sets are valid at all times.
Ultimately, the ACID property of RDBMSs ensures that all data transactions are valid, and that data transactions persist, even in the event of power failures, database crashes, and database errors.
There are many Relational Database Management Systems (RDBMSs) to choose from that allow businesses to leverage the relational model to easily compare data points and query large data sets with security and consistency hard-baked into the system. There are the typical enterprise-level RDBMS software suites plus cloud-based RDBMS “Database as a Service” systems - such as Amazon Relational Database Service (RDS), Google Cloud SQL, SQL Azure (Microsoft) - and even Object-Relational Databases (ORD) which use Object-Oriented approaches to store and query data.
Oracle Corporation is one of the earliest pioneers of RDBMSs. Today, their commercial RDBMS software, Oracle database, is one of the most popular choices for medium and large enterprises that need an enterprise-level, robust RDBMS for everyday database operations. As an enterprise-level RDBMS, Oracle DB also utilizes the Document Store, Graph DBMS, and RDF Store secondary database models, while also boasting support of a vast array of programming languages. Oracle DB also supports not only structured data, but also unstructured data, along with XML, JSON, Spatial, and Graph data.
Microsoft SQL Server
Microsoft SQL Server is an enterprise-level, proprietary RDBMS from Microsoft that integrates with other Microsoft services, while acting primarily as a system for storing and retrieving data as requested by other applications. SQL Server is a popular option amongst SMEs in that it allows for concurrent users, scalability, and smooth, seamless integration with Microsoft servers and IT components, along with working alongside with cloud-based (Azure) SQL applications. Additionally, SQL server works via a Tabular Data Stream (TDS) which allows for access to the server over an Intranet or Internet connection, via different protocols, from the OSI (application layer) model to the TCP/IP model.
MySQL is a powerful, open-source part of the LAMP stack that offers RDBMS functions for structured data, with the ability to be installed on a variety of platforms, including Linux, UNIX, and Windows. As a multi-platform RDBMS, MySQL is typically used in conjunction with web systems, based on the client-server model. As a backend, server-side SQL database, MySQL allows users to query the database and receive a client-side return via the MySQL client. Enterprises must obtain a license to use MySQL, while developers can use the software for free under the GNU-General Public License.
PostgreSQL is a powerful, Open Source RDBMS that boasts compliance with SQL and the ability to provide future scalability and growth. As a RDBMS, PostgreSQL allows for the handling of large operations and workflows with many concurrent users, while also allowing for several valuable functions, such as triggers, foreign keys, automatically updatable views, and more.
PostgreSQL also employs the Document Store secondary database model and offers multiple Database as a Service offerings, such as the Azure Database for PostgreSQL, and ScaleGrid.
IBM Db2 is an IBM RDBMS commonly used in IBM host environments and is a commercially available, enterprise-level RDBMS that supports both relational models and object-relational models along with non-relational structures such as JSON and XML.
For companies to make the most pertinent decision on which RDBMS to leverage for their company, they must take into account several factors, including the critical factor of licensing. A license gives a company the ability to use an application, and while some applications can be used for free via the GNU-General Public License (Open Source), other RDBMSs require a paid-for license.
Of the RDBMSs above, two require a license and two are Open Source:
- Oracle DB: A commercial license is required to use Oracle DB. The license costs are $47,000 per unit for the Enterprise edition, $17,500 per unit for the Standard edition, and $5,800 per unit for the Standard Edition One.
- Microsoft SQL Server: Microsoft's RDBMS offers two major licensing protocols, in being a proprietary RDBMS - a license based on users/devices (a license for each user, or a license for each device), or a license based on physical cores, with a 4 minimum core requirement, but with unlimited users.
- MySQL: MySQL is Open Source (free) for developers, while enterprises must obtain a license.
- PostgreSQL: PostgreSQL is Open Source.
- IBM Db2: IBM Db2 is commercially available via a license that can be purchased per processor or by an authorized user.
How To Choose The Ideal Database
When deliberating what relational database management system to use for your company, there are many requirements that a business must consider. Like with any IT infrastructure component, there is no “one size fits all” enterprise solution, as companies must tailor their needs to whichever RDBMS fits their most critical requirements. But some standard conditions apply to any company, of any size, such as:
- Data accuracy requirements
- Scalability requirements
- Concurrency requirements
- Performance and reliability requirements
The above requirements must take into account how well the different features of particular RDBMSs mesh and integrate with the overarching, existing, internal IT infrastructures already in place, and must also determine how each RDBMS can be implemented within the scope of the company infrastructure. Implementation considerations include:
- How can the data infrastructure of the organization integrate with the RDBMS that operates with the most robust data accuracy functions?
- When scaling the organization, how will the RDBMS structures need to be altered, if at all?
- When needing to implement advanced technologies in the future, such as cloud systems, Artificial Intelligence, and Machine Learning analytics, how can those technologies be integrated with the RDBMS?
- How do we measure RDBMS performance benchmarks? Do we focus on speed or efficiency?
In the end, the most robust RDBMS is the one that fits your enterprise's particular needs.
Data Accuracy Requirements
One of the most critical requirements for a company’s RDBMS is the organization’s data accuracy requirements. Companies need to ask themselves, “are our organization’s needs stringent? Are operations dependent on the accuracy of the information collected?” Answering those questions, virtually all of the RDBMSs spoken about above (Oracle, MySQL, PostgreSQL, Db2) are excellent choices for companies who need a robust data accuracy system with their database management system, and thus the cost of each RDBMS may be the only significant factor.
Scalability requires a forward-thinking, easily extendable RDBMS. When deciding on which RDBMS to use, companies must ask themselves, “What is the size of the data that needs to be managed? Will our database be able to accommodate the information growth?”
To this end, PostgreSQL may be the best choice for scalability, as it was designed with scalability and extendability in mind.
The concurrency requirement is one that virtually all modern RDBMSs fulfill. When companies ask themselves “will we need multiple users to access the data simultaneously?”, they can be assured that all of the above RDBMSs do allow for concurrent access of data and mechanisms to ensure data integrity and validity.
Performance And Reliability Dependence
When companies ask “are our operations highly dependent on query response performance?”, it must take into account the query response times associated with RDBMSs, and whether the speed is more important than efficiency and the reliable, powerful processing of data. After defining what “performance” entails for the company in question, the determination of which RDBMS to use comes down to the most robust, enterprise-level RDBMS with the most exceptional performance and speed benchmarks, in which Oracle DB is the most likely winner.
Selecting A Relational Database For Your Organization
Relational databases are the modern norm for database management. Relational Database Management Systems offer a robust and advantageous tabular system that visualizes easily-comparable data points and data sets via rows and columns. This tabular, declarative RDBMS enhances data management operations and workflows via a set of robust mechanisms and functionalities that increase data consistency, data integrity, data validity, data security, data flexibility, and the utilization of a powerful language (SQL) for flexible and powerful data manipulations/queries.
To fully leverage RDBMSs, it is essential to note and recognize the differences between software applications, how they operate, what they offer, and what type of licenses are required for each. While some may benefit from a Cloud-based Database as a Service system, others may prefer an Open Source RDBMS, and others, an enterprise-level commercial RDBMS.
Whichever RDBMS your company chooses, such a system has the capability of revolutionizing how you carry out your critical data operations, which can significantly save on overhead and can streamline processes, decrease project timelines, and increase the bottom line of your enterprise.