Avec le langage SQL standard de BigQuery, il est possible d'utiliser la clause WITH.
Ceci présente de nombreux avantages :
- Lecture du code SQL facilitée
- Process de Debugging plus facile et rapide (moins de code à tester/retravailler)
- Allegement du code SQL (pour les sous-requetes utilisées à de multiples reprises)
La clause WITH permet simplement de donner un nom à une sous-requête pour y faire reference (comme les alias pour les noms de champs ou de tables).
Ainsi, si une sous-requête est utilisée plusieurs fois et qu'on souhaite la modifier, le changement dans la clause WITH appliquera les modifications partout.
Par exemple, si on prend le code SQL ci-dessous, il peut être ecrit avec une clause WITH.
#standardSQL SELECT DISTINCT date, visitor_id, session_id, ClientId FROM ( SELECT date, fullVisitorId AS visitor_id, CONCAT(date,'-',fullVisitorId,'-',CAST(visitId AS STRING)) AS session_id, (SELECT value FROM UNNEST(hits.customDimensions) WHERE index=2 GROUP BY value) AS ClientId FROM `mon-compte-big-query.xxx.ga_sessions_*`, UNNEST(hits) as hits WHERE _TABLE_SUFFIX BETWEEN '20190101' AND '20191231' ) WHERE session_id IN ( SELECT CONCAT(date,'-',fullVisitorId,'-',CAST(visitId AS STRING)) AS session_id FROM `mon-compte-big-query.xxx.ga_sessions_*`, UNNEST(hits) as hits WHERE _TABLE_SUFFIX BETWEEN '20190101' AND '20191231' AND hits.type = 'PAGE' AND hits.page.pagePath = '/Resiliation/Infos/' ) AND ClientId != '' ORDER BY visitor_id
Code SQL réécrit avec la clause WITH :
#standardSQL WITH raw_data as ( SELECT date, fullVisitorId AS visitor_id, CONCAT(date,'-',fullVisitorId,'-',CAST(visitId AS STRING)) AS session_id, (SELECT value FROM UNNEST(hits.customDimensions) WHERE index=2 GROUP BY value) AS ClientId FROM `mon-compte-big-query.xxx.ga_sessions_*`, UNNEST(hits) as hits WHERE _TABLE_SUFFIX BETWEEN '20190101' AND '20191231' ), session_id_resiliation as ( SELECT CONCAT(date,'-',fullVisitorId,'-',CAST(visitId AS STRING)) AS session_id FROM `mon-compte-big-query.xxx.ga_sessions_*`, UNNEST(hits) as hits WHERE _TABLE_SUFFIX BETWEEN '20190101' AND '20191231' AND hits.type = 'PAGE' AND hits.page.pagePath = '/Resiliation/Infos/' ) SELECT DISTINCT date, visitor_id, session_id, ClientId FROM raw_data WHERE session_id IN session_id_resiliation AND ClientId != '' ORDER BY visitor_id
Pour plus d'information :
- Interet d'utiliser WITH : https://stackoverflow.com/questions/12552288/sql-with-clause-example