Skip to content

Overview of important SQL commands

  • Data

In this article, you will find the most important SQL commands for basic work with relational databases. The Structured Query Language (SQL) is the most commonly used language when working with relational databases. The language can be used for much more than simple queries, despite its name. It can also be used to perform all operations necessary to create and maintain databases.

What are the advantages of Structured Query Language?

The structured query language provides many functions to read, modify or delete data. In addition, it is preferred by many analysts over other languages for the following reasons:

  • It is semantically very easy to read and understand. The commands can be understood to a large extent even by beginners.
  • The language can be used directly within the database environment. For basic work with information, the data does not have to be transferred from the database to another tool first. Simple calculations and queries are possible directly in the database.
  • Compared to other spreadsheet tools, such as Excel, data analysis with Structured Query Language can be easily replicated and copied because everyone has access to the same data in the database. Thus, the same query always leads to the same result.

SQL provides an alternative to the kind of summaries and calculations that most perform in Excel spreadsheets, such as totals, averaging, or finding the maximum within a column. These calculations can also be performed over multiple data sets simultaneously.

What sample data do we use?

In order to be able to test the most common SQL commands live, we use a Credit Record dataset from Kaggle, which lists credit data for various, anonymous people such as their income, education, or occupation.

We load this table as Pandas DataFrame into our notebook and can then test common SQL commands on it:

How to query data?

The complete table can be queried using “SELECT”. Since no specific column is to be queried, we simply use the star “*” so that all available columns are output. With the help of “LIMIT 10”, we make sure that only the top 10 rows are returned by the SQL command, otherwise, it becomes too confusing:

Since there is a large number of columns, we limit ourselves to the “NAME_INCOME_TYPE” and “CNT_CHILDREN” columns. We can query these by specifying them explicitly:

As we can see, some income types appear twice in these lines. This can happen if multiple credit suspects can have the same income type. 

To get only the unique entries for the income type, we use the additional parameter “DISTINCT”:

How to filter data?

The data can be filtered using the “WHERE” parameter. Depending on the data type, there are different queries for this:

  • Numerical values can be compared using the greater than or less than sign, e.g. “AMT_INCOME_TOTAL < 427500”. These can be supplemented with an equal sign, e.g. “AMT_INCOME_TOTAL < 427500”.
  • For texts or strings, the comparisons “=” or “<>” are used to check whether the texts match (“=”) or differ (“<>”).

For all credit applicants with an income of more than $427,500, we get the following SQL command:

To be able to use multiple filters in one SQL command, we can connect them either with an “AND” or an “OR”. This way we get all applicants with a high income and that are female:

How can the results be sorted?

Each output can be sorted according to a column using “ORDER BY”. Both numbers and strings can be sorted, which are then sorted alphabetically:

By default, the outputs are always sorted in ascending order. To change this, you must additionally specify “DESC” for descending:

How can records be counted?

Using the SQL command “count” you can count the values in columns or in the whole database:

In this case, we have 438,557 records in the database. To count the values within a column, we use the name of the column instead of the star “*”. Additionally, we can use the parameter “DISTINCT” to count only the unique values in the column:

Thus, there are 866 different income levels in the data set.

This is what you should take with you

  • Structured Query Language is the most widely used language when working with relational databases.
  • With the help of SQL commands, extensive data queries can be designed and individualized.

Thanks to Deepnote for sponsoring this article! Deepnote offers me the possibility to embed Python code easily and quickly on this website and also to host the related notebooks in the cloud.

Data Quality / Datenqualität

What is Data Quality?

Ensuring Data Quality: Importance, Challenges, and Best Practices. Learn how to maintain high-quality data to drive better business decisions.

Data Imputation / Imputation

What is Data Imputation?

Impute missing values with data imputation techniques. Optimize data quality and learn more about the techniques and importance.

Outlier Detection / Ausreißererkennung

What is Outlier Detection?

Discover hidden anomalies in your data with advanced outlier detection techniques. Improve decision-making and uncover valuable insights.

Bivariate Analysis / Bivariate Analyse

What is the Bivariate Analysis?

Unlock insights with bivariate analysis. Explore types, scatterplots, correlation, and regression. Enhance your data analysis skills.

RESTful API

What is a RESTful API?

Learn all about RESTful APIs and how they can make your web development projects more efficient and scalable.

Time Series Data / Zeitreihendaten

What is Time Series Data?

Unlock insights from time series data with analysis and forecasting techniques. Discover trends and patterns for informed decision-making.

  • The Olympia dataset used can be found here.
Das Logo zeigt einen weißen Hintergrund den Namen "Data Basecamp" mit blauer Schrift. Im rechten unteren Eck wird eine Bergsilhouette in Blau gezeigt.

Don't miss new articles!

We do not send spam! Read everything in our Privacy Policy.

Niklas Lang

I have been working as a machine learning engineer and software developer since 2020 and am passionate about the world of data, algorithms and software development. In addition to my work in the field, I teach at several German universities, including the IU International University of Applied Sciences and the Baden-Württemberg Cooperative State University, in the fields of data science, mathematics and business analytics.

My goal is to present complex topics such as statistics and machine learning in a way that makes them not only understandable, but also exciting and tangible. I combine practical experience from industry with sound theoretical foundations to prepare my students in the best possible way for the challenges of the data world.

Cookie Consent with Real Cookie Banner