Avec le langage SQL standard de BigQuery, il est possible d’utiliser la clause WITH.
Ceci présente de nombreux avantages :
- Lecture du code SQL facilitée
- Process de Debugging plus facile et rapide (moins de code à tester/retravailler)
- Allegement du code SQL (pour les sous-requetes utilisées à de multiples reprises)
La clause WITH permet simplement de donner un nom à une sous-requête pour y faire reference (comme les alias pour les noms de champs ou de tables).
Ainsi, si une sous-requête est utilisée plusieurs fois et qu’on souhaite la modifier, le changement dans la clause WITH appliquera les modifications partout.
Par exemple, si on prend le code SQL ci-dessous, il peut être ecrit avec une clause WITH.
#standardSQL
SELECT DISTINCT date, visitor_id, session_id, ClientId
FROM (
SELECT
date,
fullVisitorId AS visitor_id,
CONCAT(date,'-',fullVisitorId,'-',CAST(visitId AS STRING)) AS session_id,
(SELECT value FROM UNNEST(hits.customDimensions) WHERE index=2 GROUP BY value) AS ClientId
FROM `mon-compte-big-query.xxx.ga_sessions_*`, UNNEST(hits) as hits
WHERE _TABLE_SUFFIX BETWEEN '20190101' AND '20191231'
)
WHERE session_id IN (
SELECT
CONCAT(date,'-',fullVisitorId,'-',CAST(visitId AS STRING)) AS session_id
FROM `mon-compte-big-query.xxx.ga_sessions_*`, UNNEST(hits) as hits
WHERE _TABLE_SUFFIX BETWEEN '20190101' AND '20191231'
AND hits.type = 'PAGE'
AND hits.page.pagePath = '/Resiliation/Infos/'
)
AND ClientId != ''
ORDER BY visitor_id
Code SQL réécrit avec la clause WITH :
#standardSQL
WITH
raw_data as (
SELECT
date,
fullVisitorId AS visitor_id,
CONCAT(date,'-',fullVisitorId,'-',CAST(visitId AS STRING)) AS session_id,
(SELECT value FROM UNNEST(hits.customDimensions) WHERE index=2 GROUP BY value) AS ClientId
FROM `mon-compte-big-query.xxx.ga_sessions_*`, UNNEST(hits) as hits
WHERE _TABLE_SUFFIX BETWEEN '20190101' AND '20191231'
),
session_id_resiliation as (
SELECT
CONCAT(date,'-',fullVisitorId,'-',CAST(visitId AS STRING)) AS session_id
FROM `mon-compte-big-query.xxx.ga_sessions_*`, UNNEST(hits) as hits
WHERE _TABLE_SUFFIX BETWEEN '20190101' AND '20191231'
AND hits.type = 'PAGE'
AND hits.page.pagePath = '/Resiliation/Infos/'
)
SELECT DISTINCT date, visitor_id, session_id, ClientId
FROM raw_data
WHERE session_id IN session_id_resiliation
AND ClientId != ''
ORDER BY visitor_id
Pour plus d’information :
– Interet d’utiliser WITH : https://stackoverflow.com/questions/12552288/sql-with-clause-example