GSQL: Exploring SELECT statements through video recommendation

Overview

SELECT statements can often be a source of confusion when working with vertices in TigerGraph. This article is geared towards intermediate users of GSQL that want a fuller understanding of the possible use cases and capabilities of a SELECT statement. We will be working with a video recommendation schema that can select/rank personalized content for users on services like YouTube or TikTok.

What is a SELECT statement?

The SELECT statement allows you to select a controlled section of vertices within your graph database that meet certain requirements. Note that in Syntax v1, your output vertex set cannot be more than a single hop (edge traversal) away from your input vertex set.

If there are no hops, then the output set will be a direct subset of the input.

A visual representation

This diagram from the Official TigerGraph Documentation provides a good structural overview of the data flow within a 1-hop SELECT statement.

Source: docs.tigergraph.com

Graph schema

In this article, we will be working with a schema that is supposed to replicate interactions that can occur within video sharing services like YouTube or TikTok. Here are the types of vertices we will work with:

CREATE VERTEX user (
PRIMARY_ID username STRING,
username STRING,
communityID INT,
country STRING
)
CREATE VERTEX video (
PRIMARY_ID videoID INT,
country STRING
)
CREATE VERTEX tag (
PRIMARY_ID content STRING,
content STRING
)
CREATE VERTEX audio (
PRIMARY_ID content INT,
content INT
)

One may wonder, why do some vertex attributes like “username” or “content” appear as duplicates? Well, the reason is that a PRIMARY_ID exists to provide a unique identifier for the vertex that can be quickly accessed through memory (not where attributes are stored). Thus, if we want to treat a PRIMARY_ID like an attribute and reference its value in a query, we will also need to store it separately on the local disk.

Here are some of the interactions (edges) that we will observe…

A user can follow another user

CREATE DIRECTED EDGE following (FROM user, TO user)

A user can watch a video

CREATE UNDIRECTED EDGE watched (
FROM user,
TO video,
percentWatched INT,
commented BOOL,
liked BOOL)

A video can contain identifiable audio (which exists as an integer ID)

CREATE UNDIRECTED EDGE vAudio (FROM video, TO audio)

A video can point to a tag that describes its content (i.e. finance, pets, sports)

CREATE UNDIRECTED EDGE vTag (FROM video, TO tag)

SOURCE or TARGET

In the very first part of the SELECT statement, you can select which side of the hop you want to select your output set from (s or t). If there is no hop, you are forced to select from the source.

Attribute-filtering SELECT statement (no hop)

This is the simplest functionality of a SELECT statement, wherein we filter out the input vertex set by certain attribute criteria that must be met. This can be done through the WHERE clause (or the HAVING clause later in the control flow).

Compile a set of users with a certain username

STRING targetUsername;start = {user.*};
targetUsers = SELECT s
FROM start:s
WHERE s.username == targetUsername;
// Names are unique, so targetUsers should only contain one vertex

Edge-filtering SELECT statement (single hop)

Similarly, we can filter our output set by certain edge types (and their attributes) that originate from an input vertex set.

Compile a set of all videos watched by the input set of users

/* 
In this case, we continue using the singleton set targetUsers.
Thus, we compile all videos watched by this single user.
*/
watchedVideos = SELECT t
FROM targetUsers:s -(watched:e)-> video:t;

Essentially, watchedVideos will contain all the videos that pass each step of criteria from the FROM clause. This means that the output contains videos that have been watched and originate from the targetUsers set.

A single video t will only be included if it is part of a valid s,e,t triplet.

Attribute-filtering SELECT statement (single hop)

We can also filter the target side of the hop by characteristics of a vertex attribute, just like we previously did for the source set with no hop.

Compile a set of videos (from the USA) that were watched by the input user set

USAVideos = SELECT t
FROM targetUsers:s -(watched:e)-> video:t
WHERE t.country == "USA";

Accumulating within a SELECT statement

You can also increment accumulators at every passing vertex selection using either ACCUM or POST-ACCUM. This incrementation can be conditional.

Count how many videos have been commented on by the input user set

SumAccum<INT> @@numberOfCommentedVideos;commentedVideos = SELECT t
FROM targetUsers:s -(watched:e)-> video:t
ACCUM IF watched.commented
THEN @@numberOfCommentedVideos += 1
END;

Combining SELECT statements to mimic multi-hop

In GSQL Syntax v2, a single SELECT statement can perform multiple hops. The backbone of such a concept simply boils down to multiple consecutive single hops in Syntax v1.

Compile a set of tags referenced by videos (that have been liked by a user set)

// First, compile videos that have been liked by a user set
likedVideos = SELECT t
FROM targetUsers:s -(watched:e)-> video:t
WHERE watched.commented;
// Then, feed the output set likedVideos into the next hop/statement
likedTags = SELECT t
FROM likedVideos:s -(vTag:e)-> tag:t;

We have essentially mimicked the following Syntax v2 statement:

likedTags = 
SELECT t
FROM targetUsers:s -(watched:e)- video:f -(vTag:v)- tag:t
WHERE watched.commented;

Note that we had to change video:t to video:f (and vTag:e to vTag:v), as the letter identifiers of edges/vertices must be unique.

Here is another example. This one is slightly more complicated because both hops are made over the same edge type (there and back).

Compile a set of users who liked videos in common with the source/target group of users.

likedVideos = SELECT t
FROM targetUsers:s -(watched:e)-> video:t
WHERE watched.liked;
relatedUsers = SELECT t
FROM likedVideos:s -(watched:e)-> user:t
WHERE watched.liked;

We can logically add a third hop here, as well:

Compile a set of videos watched by users who liked videos in common with the source/target group of users.

likedVideos = SELECT t
FROM targetUsers:s -(watched:e)-> video:t
WHERE watched.liked;
relatedUsers = SELECT t
FROM likedVideos:s -(watched:e)-> user:t
WHERE watched.liked;
relatedVideos = SELECT t
FROM similarUsers:s -(watched:e)-> video:t;
Each arrow represents a hop

Developer Advocate Intern at TigerGraph