SQL (Structured Query Language)

SQL (Structured Query Language)

SQL (Structured Query Language)
SQL (Structured Query Language)

SQL (Structured Query Language) is a standardized programming language used for managing relational databases and performing various operations on the data in them. Initially created in the 1970s, SQL is regularly used by database administrators, as well as by developers writing data integration scripts and data analysts looking to set up and run analytical queries.

The uses of SQL include modifying database table and index structures; adding, updating and deleting rows of data; and retrieving subsets of information from within a database for transaction processing and analytics applications. Queries and other SQL operations take the form of commands written as statements — commonly used SQL statements include select, add, insert, update, delete, create, alter and truncate.

SQL became the de facto standard programming language for relational databases after they emerged in the late 1970s and early 1980s. Also known as SQL databases, relational systems comprise a set of tables containing data in rows and columns. Each column in a table corresponds to a category of data — for example, customer name or address — while each row contains a data value for the intersecting column.

SQL standard and proprietary extensions

An official SQL standard was adopted by the American National Standards Institute (ANSI) in 1986 and then by the International Organization for Standardization, known as ISO, in 1987. More than a half-dozen joint updates to the standard have been released by the two standards development bodies since then; as of this writing, the most recent version is SQL:2011, approved that year.

Both proprietary and open source relational database management systems built around SQL are available for use by organizations. They include Microsoft SQL ServerOracle DatabaseIBM DB2SAP HANA, SAP Adaptive Server, MySQL (now owned by Oracle) and PostgreSQL. However, many of these database products support SQL with proprietary extensions to the standard language for procedural programming and other functions. For example, Microsoft offers a set of extensions called Transact-SQL (T-SQL), while Oracle’s extended version of the standard is PL/SQL. As a result, the different variants of SQL offered by vendors aren’t fully compatible with one another.

SQL commands and syntax

SQL commands are divided into several different types, among them data manipulation language (DML) and data definition language (DDL) statements, transaction controls and security measures. The DML vocabulary is used to retrieve and manipulate data, while DDL statements are for defining and modifying database structures. The transaction controls help manage transaction processing, ensuring that transactions are either completed or rolled back if errors or problems occur. The security statements are used to control database access as well as to create user roles and permissions.

SQL syntax is the coding format used in writing statements. Figure 1 shows an example of a DDL statement written in Microsoft’s T-SQL to modify a database table in SQL Server 2016:
Figure 1. An example of T-SQL code in SQL Server 2016. This is the code for the ALTER TABLE WITH (ONLINE = ON | OFF) option.

SQL-on-Hadoop tools

SQL-on-Hadoop query engines are a newer offshoot of SQL that enable organizations with big data architectures built around Hadoop systems to take advantage of it instead of having to use more complex and less familiar languages — in particular, the MapReduce programming environment for developing batch processing applications.

More than a dozen SQL-on-Hadoop tools have become available through Hadoop distribution providers and other vendors; many of them are open source software or commercial versions of such technologies. In addition, the Apache Spark processing engine, which is often used in conjunction with Hadoop, includes a Spark SQL module that similarly supports SQL-based programming.

In general, SQL-on-Hadoop is still an emerging technology, and most of the available tools don’t support all of the functionality offered in relational implementations of SQL. But they’re becoming a regular component of Hadoop deployments as companies look to get developers and data analysts with SQL skills involved in programming big data applications.