Spark SQL String Function- FineDataLink Help Document

Last update: February 13, 2025

Overview

This document introduces the syntax of the string functions in Spark SQL.

String Character Count

You are advised to use LEN in New Calculation Column of FineDatalink.

CHAR_LENGTH(String): Returns the number of characters in the string.
CHARACTER_LENGTH(String): Returns the number of characters in the string.
LENGTH(String): Returns the number of characters in the string.

For example, to obtain the number of characters of Contract Type, you can use the statement shown in the following figure.

select Contract Type ,CHAR_LENGTH(Contract Type) FROM DB Table Input

String Concatenation

Description

You are advised to use CONCATENATE in New Calculation Column of FineDataLink.

You can also use the syntax in Spark SQL.

You can also use the CONCAT(s1,s2...sn) clause in Spark SQL to concatenate multiple strings into one string.

For example,

SELECT CONCAT ('Fine',' Data','Link').

Notes

Problem Description:

The 13-digit timestamp you obtain using the New Calculation Column operator will be displayed in scientific notation in the CONCAT function in the Spark SQL operator. For example, the statement md5(concat(1700794824509), 'fixed_key') will changes to md5(concat(1700794824509E12), 'fixed_key') during actual execution, changing the MD5 encryption result.

Solution:

Before concatenation, use the Field Setting operator to convert the timestamp field obtained through the New Calculation Column operator to the long type.

String Reversal

REVERSE(String): Returns the string with the order of the characters reversed.

For example, to reverse the Contract Type string, you can use the statement shown in the following figure.

select Contract Type ,reverse(Contract Type) FROM DB Table Input1

String Character Extraction

You are advised to use RIGHT, MID, and LEFT in New Calculation Column of FineDataLink.

You can also use the syntax in Spark SQL.

Name	Usage in the Spark SQL Operator
LEFT	LEFT(Expression, Length) Example: SELECT LEFT('FineDataLink',4) For example, to extract the leftmost two characters from the Contract Type string, you can use the following statement, as shown in the following figure. `select Contract Type ,LEFT(Contract Type,2) FROM DB Table Input2`
RIGHT	RIGHT(Expression, Length) Example: SELECT RIGHT('FineDataLink',8) For example, to extract the rightmost two characters of the Contract Type string, you can use the following statement, as shown in the following figure. `select Contract Type,RIGHT(Contract Type, 2 ) FROM DB Table Input3`
Substring	SUBSTR(Expression, Position, Length) Example: SELECT SUBSTR('FineDataLink',5,4) For example, to extract two characters starting from the third character in the Contract Type string, you can use the following statement, as shown in the following figure. `select Contract Type,SUBSTR(Contract Type,3,2) FROM DB Table Input4`

String Character Case Conversion

You are advised to use UPPER and LOWER in New Calculation Column of FineDataLink.

You can also use the syntax in Spark SQL.

Name

Usage in the Spark SQL Operator

Lowercase to uppercase

UPPER(Expression)

Example: SELECT UPPER('FineDataLink')

Uppercase to lowercase

LOWER(Expression)

Example: SELECT LOWER('FineDataLink')

Repeating a String

You are advised to use REPEAT in New Calculation Column of FineDataLink.

You can also use the syntax in Spark SQL.

REPEAT (String, Number): Repeats a string the specified number of times.

Example: select repeat('Small',2)

Padding a String

LPAD(String1, Length, String2): Returns the String1 value left-padded with the String2 value to a length of Length characters.

Example:

select Contract Payment Type,Contract Type,lpad(Contract Type,35,Contract Payment Type) FROM DB Table Input5

Left-pad the Contract Type value with the Contract Payment Type value to a length of 35 characters. The returned result set is shown in the following figure.

RPAD(String1, Length, String2): Returns the String1 value right-padded with the String2 value to a length of Length characters.

Example:

select Contract Payment Type,Contract Type,rpad(Contract Type,35,Contract Payment Type) FROM DB Table Input6

Right-pad the Contract Type value with the Contract Payment Type value to a length of 10 characters. The returned result set is shown in the following figure.

Finding a String Within a String List

FIND_IN_SET(String, String list): Returns a number representing the position of the first occurrence of the string in the string list, where the string list is a string composed of substrings separated by comma (,). Returns 0 if the string is not in the string list.

Example:

select find_in_set('a small b','cd, ef, a small b, de')

You are advised to use FIND in New Calculation Column of FineDatalink.

String Position

POSITION(String1 IN String2): Returns a number representing the position of the first occurrence of the String1 value in the String value. Returns 0 if the String1 value is not in the String value.

Example:

select POSITION('b' IN 'abcb')

You are advised to use FIND in New Calculation Column of FineDatalink.

Substring Replacement

REGEXP_REPLACE(String1, String2, String3): Replaces occurrences in the String1 value that match the regular expression specified by String2 with the String3 value and returns the resulting string.

Example:

select regexp_replace("Purchase", "Pu|ch", "tt")

The substrings Pu and ch in the string Purchase will be replaced with tt, as shown in the following figure.

Helpful
Not helpful
Only read

中文（简体）

English

Spark SQL String Function

Overview

String Character Count

String Concatenation

Description

Notes

String Reversal

String Character Extraction

String Character Case Conversion

Repeating a String

Padding a String

Finding a String Within a String List

String Position

Substring Replacement

附件列表