Common Spark SQL Syntax- FineDataLink Help Document

Last update: February 11, 2025

Overview

This document introduces common Spark SQL syntax.

Common Operator

Operator	Meaning	Detail
=	Equal operator	A=B. Returns TRUE if the A value is equal to the B value. Returns FALSE otherwise. In FineDataLink, you can use Data Filtering instead.
<>	Not equal operator	A <> B. Returns NULL if the A value is NULL or the B value is NULL. Returns TRUE if the A value is unequal to the B value. Returns FALSE otherwise. In FineDataLink, you can use Data Filtering instead.
<	Less than operator	A < B. Returns NULL if the A value is NULL or the B value is NULL. Returns TRUE if the A value is less than the B value. Returns FALSE otherwise. In FineDataLink, you can use Data Filtering instead.
<=	Less than or equal operator	A <= B. Returns NULL if the A value is NULL or the B value is NULL. Returns TRUE if the A value is less than or equal to the B value. Returns FALSE otherwise. In FineDataLink, you can use Data Filtering instead.
>=	Greater than or equal operator	A >= B. Returns NULL if the A value is NULL or the B value is NULL. Returns TRUE if the A value is greater than or equal to the B value. Returns FALSE otherwise. In FineDataLink, you can use Data Filtering instead.
IS NULL	NULL value test	A IS NULL. Returns TRUE if the A value is NULL. Returns FALSE otherwise. In FineDataLink, you can use Data Filtering instead.
IS NOT NULL	NOT NULL value test	A IS NOT NULL. Returns FALSE if the A value is NULL. Returns TRUE otherwise. In FineDataLink, you can use Data Filtering instead.
LIKE	Fuzzy matching	Syntax one: A LIKE B Syntax two: LIKE (A, B) Expression type: string Return value type: boolean or NULL Returns NULL if either the A value or the B value is NULL. Returns TRUE if the A value matches the regular syntax of the expression B. Returns FALSE otherwise. With LIKE you can only use the percent sign (%) and the underscore (_) as the wildcard character in the expression B, where % matches any number of characters, even zero characters, and _ matches exactly one character. (You can use multiple underscores, with each representing a character.) For example, select Football like %ba from Test and select like (Football, _otba%) from Test. FineDataLink makes fuzzy querying and filtering easier by visualizing the process, improving efficiency. For details, seeFuzzy Field Querying/Filtering.
AND	Logical AND	A AND B. Returns TRUE if both the A value and the B value are TRUE. Returns FALSE otherwise.
OR	Logical OR	A OR B. Returns TRUE if either the A value or the B value is TRUE. Returns FALSE otherwise.

AS - Column Adding/Field Renaming

Example:

Add a Type column with the Test value as the column value. You can use the following SQL statement. select Test AS Type from Input Source
Add a Mark column with the column value as 1 for data with a math score greater than 90 and as 0 otherwise. You can use the following SQL statement. select Math, if(Math >90,1,0) AS Mark from Input Source

For a simpler solution, see Adding a Column Using the IF Function.

Enclose the Chinese column name with single quotation marks (’‘). For example, select Test as Field.

GROUP BY - Data Deduplication

For details, see Data Deduplication.

Sorting

ORDER BY: global sorting

SORT BY, DISTRIBUTE BY, and CLUSTER BY: partial sorting

For details, see Data Sorting.

LIMIT - Constraining the Number of Returned Rows

Example:

select * from DB Table Input limit 3

BETWEEN ... AND - Constraining the Number of Returned Rows

Description: Returns records within a given range. It is applied to numeric, string, and date columns.

Example:

select * FROM DB Table Input where Product ID between 2 and 3

JOIN and UNION

LEFT JOIN
LEFT OUTER JOIN
LEFT SEMI JOIN
INNER JOIN or JOIN
RIGHT JOIN
RIGHT OUTER JOIN
FULL JOIN
FULL OUTER JOIN
UNION
UNION ALL

For details, see JOIN and UNION.

HAVING - Conditional Filtering After Grouping

Filters records based on the condition in the HAVING clause after the GROUP BY clause groups data.

Example:

select Product ID FROM DB Table Input group by Product ID having Product ID > 400

PIVOT - Row-to-Column Conversion

You are advised to use the Row-to-Column operator.

Stack - Column-to-Row Conversion

You are advised to use the Column-to-Row operator.

to_json and struct Functions

The to_json and struct functions are supported in 4.1.6.4 and later versions.

Difference Between the to_json Function and the JSON Generation operator

The JSON Generation operator can convert a two-dimensional table into a JSON object or convert it into a two-dimensional table with multiple rows and a single column (where the value on each row is a JSON object). However, it cannot create a two-dimensional table containing a combination of plain text, JSON objects, and JSON arrays. The to_json function can be used more flexibly.

The JSON Generation operator can be used with the API Output operator to generate JSON data and then output it to the API. The to_json function can be used to construct JSON fields for MongoDB Output.

to_json Function

The to_json function can convert data to a JSON string.

SQL Statement in Spark SQL	Result
SELECT to_json(named_struct('a', 1, 'b', 2));	{"a":1,"b":2}
SELECT to_json(map('a', named_struct('b', 1)));	{"a":{"b":1}}
SELECT to_json(array(map('a', 1)));	[{"a":1}]
To convert text to a JSON array: SELECT to_json(split('a,b,c', ',')) AS jsonArray	["a", "b", "c"]

struct Function

The struct function is used to concatenate multiple columns or fields into a structured data object.

Example one: Concatenate multiple fields of a two-dimensional table into a JSON object.

Example two: Concatenate multiple fields of a two-dimensional table into a JSON object array through struct, to_json, collect_list, concat_ws, concat, and group by.For details, see MongoDB Output.

Helpful
Not helpful
Only read

中文（简体）

English

Common Spark SQL Syntax

Overview

Common Operator

AS - Column Adding/Field Renaming

GROUP BY - Data Deduplication

Sorting

LIMIT - Constraining the Number of Returned Rows

BETWEEN ... AND - Constraining the Number of Returned Rows

JOIN and UNION

HAVING - Conditional Filtering After Grouping

PIVOT - Row-to-Column Conversion

Stack - Column-to-Row Conversion

to_json and struct Functions

附件列表