SQL (Structured Query Language)
SQL (Structured Query Language) is a standardized programming language used for managing relational databases and performing various operations on the data in them. Initially created in the 1970s, SQL is regularly used by database administrators, as well as by developers writing data integration scripts and data analysts looking to set up and run analytical queries.
The uses of SQL include modifying database table and index structures; adding, updating and deleting rows of data; and retrieving subsets of information from within a database for transaction processing and analytics applications. Queries and other SQL operations take the form of commands written as statements. Commonly used SQL statements include SELECT, INSERT, UPDATE, DELETE, CREATE, ALTER and TRUNCATE.
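As a rough illustration of these statements, the following sketch runs several of them against an in-memory SQLite database through Python's standard sqlite3 module. The customers table, its columns and the sample values are hypothetical, chosen only for the example. SQLite does not implement TRUNCATE, so that statement is left out here.

```python
import sqlite3

# In-memory SQLite database; table, columns and values are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE defines a table; ALTER changes its structure.
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("ALTER TABLE customers ADD COLUMN city TEXT")

# INSERT adds rows of data.
cur.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ada", "London"), ("Grace", "New York"), ("Edgar", "San Jose")],
)

# UPDATE changes existing rows; DELETE removes them.
cur.execute("UPDATE customers SET city = ? WHERE name = ?", ("Arlington", "Grace"))
cur.execute("DELETE FROM customers WHERE name = ?", ("Edgar",))

# SELECT retrieves a subset of the stored data.
remaining = cur.execute(
    "SELECT name, city FROM customers ORDER BY name"
).fetchall()
print(remaining)

conn.commit()
conn.close()
```

Other database systems accept broadly similar statements, although details such as data types and parameter placeholders differ from product to product.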
SQL became the de facto standard programming language for relational databases after they emerged in the late 1970s and early 1980s. Also known as SQL databases, relational systems comprise a set of tables containing data in rows and columns. Each column in a table corresponds to a category of data, such as customer name or address, while each row holds one record, with a data value for each column.
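The row-and-column layout can be seen in a small sketch that uses Python's standard sqlite3 module with an in-memory database. The table name, column names and sample values are assumptions made up for the example.

```python
import sqlite3

# In-memory SQLite database; names and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO customers (name, address) VALUES (?, ?)",
    [("Ada", "1 Example Street"), ("Grace", "2 Sample Avenue")],
)

cur = conn.execute("SELECT * FROM customers ORDER BY name")

# Each column is a category of data.
columns = [description[0] for description in cur.description]

# Each row is one record, with one value per column.
rows = cur.fetchall()

print(columns)
for row in rows:
    print(dict(zip(columns, row)))

conn.close()
```

Every row comes back as a tuple whose positions line up with the column list, which is why the two can be zipped together into a per-record mapping.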
SQL standard and proprietary extensions
An official SQL standard was adopted by the American National Standards Institute (ANSI) in 1986 and then by the International Organization for Standardization, known as ISO, in 1987. More than a half-dozen joint updates to the standard have been released by the two standards development bodies since then, among them SQL:2011 and, more recently, SQL:2016 and SQL:2023.
SQL commands and syntax
SQL commands are divided into several different types, among them data manipulation language (DML) and data definition language (DDL) statements, transaction controls and security measures. The DML vocabulary is used to retrieve and manipulate data, while DDL statements are for defining and modifying database structures. The transaction controls help manage transaction processing, ensuring that transactions are either completed or rolled back if errors or problems occur. The security statements are used to control database access as well as to create user roles and permissions.
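The all-or-nothing behavior of transaction controls can be sketched with Python's standard sqlite3 module. The accounts table, the transfer helper and the balances are hypothetical, and the example relies on sqlite3's connection context manager, which commits when the block succeeds and rolls back when it raises. Security statements such as GRANT are not shown because SQLite does not implement them.

```python
import sqlite3

# In-memory SQLite database; table, names and balances are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; return True if committed."""
    try:
        # The context manager commits on success and rolls back on error.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the earlier credit was undone too.
        return False


ok = transfer(conn, "alice", "bob", 30)       # completes
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The second transfer credits bob before the debit fails, yet the rollback leaves both balances as they were after the first transfer, so the database never shows a half-finished transaction.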
SQL-on-Hadoop tools
SQL-on-Hadoop query engines are a newer offshoot of SQL that let organizations with big data architectures built around Hadoop systems query their data with SQL instead of having to use more complex and less familiar languages, in particular the MapReduce programming environment for developing batch processing applications.
More than a dozen SQL-on-Hadoop tools have become available through Hadoop distribution providers and other vendors; many of them are open source software or commercial versions of such technologies. In addition, the Apache Spark processing engine, which is often used in conjunction with Hadoop, includes a Spark SQL module that similarly supports SQL-based programming.
In general, SQL-on-Hadoop is still an emerging technology, and most of the available tools don’t support all of the functionality offered in relational implementations of SQL. But they’re becoming a regular component of Hadoop deployments as companies look to get developers and data analysts with SQL skills involved in programming big data applications.