Thanks for your answer!
Here is the edited query:
SELECT 'SYSA',
t1.lieu_stkph_cd,
Sum (t1.mt_pnu_cpta_dev_rep),
Sum (t2.mt_util_dev_rep)
FROM (SELECT a.id_auto, a.dt_art, c.lieu_stkph_cd,
a.mt_pnu_cpta_dev_rep
FROM prod_v_ec_dossier_a_sysa c
INNER JOIN db_ftg_srs_prod_v.v_autorisation_a a
ON a.id_doss = c.dosscta_no
AND a.cd_prd_cpta = c.prct_no
AND a.cd_entite_cpta = c.entite_cd
WHERE c.pma_cd = '') AS t1
LEFT JOIN (SELECT a.id_auto, a.dt_art, c.lieu_stkph_cd,
b.mt_util_dev_rep
FROM prod_v_ec_dossier_a_sysa c
INNER JOIN db_ftg_srs_prod_v.v_autorisation_a a
ON a.id_doss = c.dosscta_no
AND a.cd_prd_cpta = c.prct_no
AND a.cd_entite_cpta = c.entite_cd
INNER JOIN db_ftg_srs_prod_v.v_utilisation_a b
ON a.dt_art = b.dt_art
AND a.id_auto = b.id_auto
WHERE c.pma_cd = '') AS t2
ON T1.id_auto = t2.id_auto
and T1.dt_art = T2.dt_art and t1.lieu_stkph_cd = t2.lieu_stkph_cd
GROUP BY 1,
2
This is the outcome of this query:
LIEU_STKPH_CD   PNU Amount   UTILIZATION AMOUNT
1               200 €        250 €
It’s not accurate; let me explain:
db_ftg_srs_prod_v.v_autorisation_a is linked to db_ftg_srs_prod_v.v_utilisation_a with
— ID_AUTO
— DT_ART
but I can have one ID_AUTO for X utilizations, so this query multiplies the PNU amount by X utilizations, which is not correct.
Authorization table

ID_AUTO   PNU amount
1         100 €

Utilization table

ID_AUTO   ID_UTILIZATION   UTILIZATION AMOUNT
1         1                100 €
1         2                150 €
So I have to keep those values separate.
Expected outcome

LIEU_STKPH_CD   PNU Amount   UTILIZATION AMOUNT
1               100 €        250 €
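The fan-out can be reproduced in miniature. Below is a sketch using SQLite with simplified, invented table and column names (not the real views), showing both the multiplication and one possible fix, aggregating the utilization side before joining:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE authorization_t (id_auto INTEGER, pnu_amount REAL);
CREATE TABLE utilization_t  (id_auto INTEGER, id_utilization INTEGER, util_amount REAL);
INSERT INTO authorization_t VALUES (1, 100);
INSERT INTO utilization_t  VALUES (1, 1, 100), (1, 2, 150);
""")

# Joining 1 authorization row to 2 utilization rows duplicates pnu_amount,
# so SUM over the joined rows double-counts it:
naive = con.execute("""
SELECT SUM(a.pnu_amount), SUM(u.util_amount)
FROM authorization_t a
JOIN utilization_t u ON u.id_auto = a.id_auto
""").fetchone()
print(naive)   # (200.0, 250.0) -- PNU amount doubled

# Aggregating the utilization side first (one row per id_auto) avoids the fan-out:
fixed = con.execute("""
SELECT SUM(a.pnu_amount), SUM(u.util_amount)
FROM authorization_t a
JOIN (SELECT id_auto, SUM(util_amount) AS util_amount
      FROM utilization_t
      GROUP BY id_auto) u
  ON u.id_auto = a.id_auto
""").fetchone()
print(fixed)   # (100.0, 250.0) -- matches the expected outcome
```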
Do you have any idea?
Thanks in advance
Christophe
There is a snippet of production code that does some row checking. It is migrated code that came into Teradata, and no one has bothered to change it to be TD-savvy, shall I say.
This code now throws
2646 : No more spool space...
but that is not really a spool shortage; it is caused by data skew, as would be evident to any Teradata master.
The code logic is plainly poor, but they are running it in Prod, and a code change is NOT an option now because this is production. I can rewrite it using a simple NOT EXISTS and the query will run fine.
EXPLAIN SELECT ((COALESCE(FF.SKEW_COL,-99999))) AS Cnt1,
COUNT(*) AS Cnt
FROM DB.10_BILLON_FACT FF
WHERE FF.SKEW_COL IN(
SELECT F.SKEW_COL
FROM DB.10_BILLON_FACT F
EXCEPT
SELECT D.DIM_COL
FROM DB.Smaller_DIM D
)
It's failing because it wants to redistribute on SKEW_COL. Whatever I do, this will not change: SKEW_COL is 99% skewed.
Here's the EXPLAIN; it fails on step 4.1:
This query is optimized using type 2 profile insert-sel, profileid
10001.
1) First, we lock a distinct DB."pseudo table" for read on a
RowHash to prevent global deadlock for DB.F.
2) Next, we lock a distinct DB."pseudo table" for read on a
RowHash to prevent global deadlock for DB.D.
3) We lock DB.F for read, and we lock DB.D for read.
4) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from DB.F by way of an
all-rows scan with no residual conditions into Spool 6
(all_amps), which is redistributed by the hash code of (
DB.F.SKEW_COL) to all AMPs. Then we
do a SORT to order Spool 6 by row hash and the sort key in
spool field1 eliminating duplicate rows. The size of Spool 6
is estimated with low confidence to be 989,301 rows (
28,689,729 bytes). The estimated time for this step is 1
minute and 36 seconds.
2) We do an all-AMPs RETRIEVE step from DB.D by way of an
all-rows scan with no residual conditions into Spool 7
(all_amps), which is built locally on the AMPs. Then we do a
SORT to order Spool 7 by the hash code of (
DB.D.DIM_COL). The size of Spool 7 is
estimated with low confidence to be 6,118,545 rows (
177,437,805 bytes). The estimated time for this step is 0.11
seconds.
5) We do an all-AMPs JOIN step from Spool 6 (Last Use) by way of an
all-rows scan, which is joined to Spool 7 (Last Use) by way of an
all-rows scan. Spool 6 and Spool 7 are joined using an exclusion
merge join, with a join condition of ("Field_1 = Field_1"). The
result goes into Spool 1 (all_amps), which is built locally on the
AMPs. The size of Spool 1 is estimated with low confidence to be
494,651 rows (14,344,879 bytes). The estimated time for this step
is 3.00 seconds.
6) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by
way of an all-rows scan into Spool 5 (all_amps), which is
redistributed by the hash code of (
DB.F.SKEW_COL) to all AMPs. Then we
do a SORT to order Spool 5 by row hash. The size of Spool 5
is estimated with low confidence to be 494,651 rows (
12,366,275 bytes). The estimated time for this step is 0.13
seconds.
2) We do an all-AMPs RETRIEVE step from DB.FF by way of an
all-rows scan with no residual conditions into Spool 8
(all_amps) fanned out into 24 hash join partitions, which is
built locally on the AMPs. The size of Spool 8 is estimated
with high confidence to be 2,603,284,805 rows (
54,668,980,905 bytes). The estimated time for this step is
24.40 seconds.
7) We do an all-AMPs RETRIEVE step from Spool 5 (Last Use) by way of
an all-rows scan into Spool 9 (all_amps) fanned out into 24 hash
join partitions, which is duplicated on all AMPs. The size of
Spool 9 is estimated with low confidence to be 249,304,104 rows (
5,235,386,184 bytes). The estimated time for this step is 1.55
seconds.
8) We do an all-AMPs JOIN step from Spool 8 (Last Use) by way of an
all-rows scan, which is joined to Spool 9 (Last Use) by way of an
all-rows scan. Spool 8 and Spool 9 are joined using a inclusion
hash join of 24 partitions, with a join condition of (
"SKEW_COL = SKEW_COL"). The
result goes into Spool 4 (all_amps), which is built locally on the
AMPs. The size of Spool 4 is estimated with index join confidence
to be 1,630,304,007 rows (37,496,992,161 bytes). The estimated
time for this step is 11.92 seconds.
9) We do an all-AMPs SUM step to aggregate from Spool 4 (Last Use) by
way of an all-rows scan , grouping by field1 (
DB.FF.SKEW_COL). Aggregate Intermediate
Results are computed globally, then placed in Spool 11. The size
of Spool 11 is estimated with low confidence to be 494,651 rows (
14,344,879 bytes). The estimated time for this step is 35.00
seconds.
10) We do an all-AMPs RETRIEVE step from Spool 11 (Last Use) by way of
an all-rows scan into Spool 2 (group_amps), which is built locally
on the AMPs. The size of Spool 2 is estimated with low confidence
to be 494,651 rows (16,323,483 bytes). The estimated time for
this step is 0.01 seconds.
11) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 2 are sent back to the user as the result of
statement 1. The total estimated time is 2 minutes and 52 seconds.
There are some 900K unique values of the skewed column (interestingly, there are 6 million unique values for DIM_COL, which is why I think it is veering towards the fact-table column; but still, it knows from the low unique-value count in the bigger table that it is badly skewed).
My question is: knowing that SKEW_COL is 99% skewed by a constant value like -9999, why does the optimizer still redistribute by this skewed column instead of using the alternative PRPD approach? A similar (but not identical) situation happened in the past, and it went away when we upgraded to a faster box (more AMPs).
Does anything come to mind that would make it change plans? I tried most of the diagnostics, with no result. I created a SI (on a similar VT, but it still skews). Skewing is inevitable here; I am aware the data could be changed artificially to minimize it, but none of that is possible after the fact, and we are in Prod now. Even after the optimizer knows the column is skewed, why redistribute on it when other options are available?
It is not a NULL value that is causing the skew; it is a constant flag value (probably a NULL stand-in such as -9999), as I mentioned in the post. If you rewrite the query as I updated it, it works fine. I preferred NOT EXISTS because it does not need NULL checking, unlike NOT IN (though from my data-dictionary knowledge I know both columns are declared NOT NULL). I have updated the post with an alternative version that also works (although, as I explained, I settled on the NOT EXISTS version):
Select count(*) , f.SKEW_COL
from (
select ff.SKEW_COL
from DB.10_BILLON_FACT ff
where ff.SKEW_COL not in (
select d.DIM_COL
from DB.Smaller_DIM d )) as f
Group by f.SKEW_COL
Can I not get the optimizer's query-rewrite feature to think through the query and rewrite it with the logic above? That version will NOT redistribute; it will just SORT by the skewed column.
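The equivalence of the original `IN (... EXCEPT ...)` predicate and the NOT EXISTS rewrite (given that both columns are NOT NULL) can be checked in miniature. This is a sketch using SQLite and toy data with invented table names, not the real 10-billion-row tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE fact (skew_col INTEGER NOT NULL);
CREATE TABLE dim  (dim_col  INTEGER NOT NULL);
-- -9999 plays the role of the constant flag value causing the skew
INSERT INTO fact VALUES (-9999), (-9999), (-9999), (1), (2), (3);
INSERT INTO dim  VALUES (1), (2);
""")

# Original shape: fact rows whose key is in (fact keys EXCEPT dim keys)
original = con.execute("""
SELECT f.skew_col, COUNT(*)
FROM fact f
WHERE f.skew_col IN (SELECT skew_col FROM fact
                     EXCEPT
                     SELECT dim_col FROM dim)
GROUP BY f.skew_col
ORDER BY f.skew_col
""").fetchall()

# NOT EXISTS rewrite: anti-join against the dimension, no self-join of fact
rewritten = con.execute("""
SELECT f.skew_col, COUNT(*)
FROM fact f
WHERE NOT EXISTS (SELECT 1 FROM dim d WHERE d.dim_col = f.skew_col)
GROUP BY f.skew_col
ORDER BY f.skew_col
""").fetchall()

print(original)   # [(-9999, 3), (3, 1)]
print(rewritten)  # same rows
```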
3 answers
Best answer
There are several ways to get rid of the multiplied values, e.g. aggregating before joining:
SELECT 'SYSA',
t1.lieu_stkph_cd,
t1.mt_pnu_cpta_dev_rep,
t2.mt_util_dev_rep
FROM (SELECT a.id_auto, a.dt_art, c.lieu_stkph_cd,
a.mt_pnu_cpta_dev_rep
FROM prod_v_ec_dossier_a_sysa c
INNER JOIN db_ftg_srs_prod_v.v_autorisation_a a
ON a.id_doss = c.dosscta_no
AND a.cd_prd_cpta = c.prct_no
AND a.cd_entite_cpta = c.entite_cd
WHERE c.pma_cd = '') AS t1
LEFT JOIN (SELECT a.id_auto, a.dt_art, c.lieu_stkph_cd,
Sum(b.mt_util_dev_rep) AS mt_util_dev_rep
FROM prod_v_ec_dossier_a_sysa c
INNER JOIN db_ftg_srs_prod_v.v_autorisation_a a
ON a.id_doss = c.dosscta_no
AND a.cd_prd_cpta = c.prct_no
AND a.cd_entite_cpta = c.entite_cd
INNER JOIN db_ftg_srs_prod_v.v_utilisation_a b
ON a.dt_art = b.dt_art
AND a.id_auto = b.id_auto
WHERE c.pma_cd = ''
GROUP BY 1,
2 ) AS t2
ON T1.id_auto = t2.id_auto
AND T1.dt_art = T2.dt_art AND t1.lieu_stkph_cd = t2.lieu_stkph_cd
But it seems you don't need to join two derived tables at all; this should return the same result:
SELECT 'SYSA',
t1.lieu_stkph_cd,
-- this value is multiplied by the number of rows
-- so simply divide by that number to revert the multiplication
Sum (a.mt_pnu_cpta_dev_rep) / Count(*),
Sum (b.mt_util_dev_rep)
FROM prod_v_ec_dossier_a_sysa c
JOIN db_ftg_srs_prod_v.v_autorisation_a a
ON a.id_doss = c.dosscta_no
AND a.cd_prd_cpta = c.prct_no
AND a.cd_entite_cpta = c.entite_cd
JOIN db_ftg_srs_prod_v.v_utilisation_a b
ON a.dt_art = b.dt_art
AND a.id_auto = b.id_auto
WHERE c.pma_cd = ''
GROUP BY 1,
2
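The `Sum(...) / Count(*)` trick above can be demonstrated in miniature. Below is a sketch with SQLite and invented toy tables: the authorization amount is repeated once per joined utilization row, so dividing its sum by the group's row count undoes the multiplication (this relies on the duplicated amount being constant within each group):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE auth (id_auto INTEGER, pnu REAL);
CREATE TABLE util (id_auto INTEGER, amt REAL);
INSERT INTO auth VALUES (1, 100);
INSERT INTO util VALUES (1, 100), (1, 150);
""")

# One auth row joined to two util rows: SUM(pnu) = 200, COUNT(*) = 2,
# so SUM(pnu) / COUNT(*) restores the original 100.
row = con.execute("""
SELECT SUM(a.pnu) / COUNT(*), SUM(u.amt)
FROM auth a
JOIN util u ON u.id_auto = a.id_auto
GROUP BY a.id_auto
""").fetchone()
print(row)  # (100.0, 250.0)
```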
0
dnoeth
Nov 2, 2017 at 16:51
Your first query is fine because you are doing
SELECT ( select ... ) as field1,
( select ... ) as field2,
But in your second one you are doing a cross join:
SELECT *
FROM ( select ... ) as query1,
( select ... ) as query2
That creates a result with query1 × query2 rows.
You want:
SELECT query.*
FROM ( SELECT ( select ... ) as field1,
( select ... ) as field2
.....
) as query
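The row multiplication from a comma-style cross join can be seen with a toy example (a SQLite sketch with invented tables):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE q1 (x INTEGER);
CREATE TABLE q2 (y INTEGER);
INSERT INTO q1 VALUES (1), (2), (3);
INSERT INTO q2 VALUES (10), (20);
""")

# FROM q1, q2 with no join condition pairs every q1 row with every q2 row
n = con.execute("SELECT COUNT(*) FROM q1, q2").fetchone()[0]
print(n)  # 6 = 3 rows x 2 rows
```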
0
Juan Carlos Oropeza
Oct 31, 2017 at 15:45
Making big assumptions about how your data relates: you should join using an actual JOIN clause with ON to explain how those two subqueries should be combined. Something like:
SELECT 'SYSA',
t1.lieu_stkph_cd,
Sum (t1.mt_pnu_cpta_dev_rep),
Sum (t2.mt_util_dev_rep)
FROM (SELECT c.lieu_stkph_cd,
a.mt_pnu_cpta_dev_rep
FROM prod_v_ec_dossier_a_sysa c
INNER JOIN db_ftg_srs_prod_v.v_autorisation_a a
ON a.id_doss = c.dosscta_no
AND a.cd_prd_cpta = c.prct_no
AND a.cd_entite_cpta = c.entite_cd
WHERE c.pma_cd = '') AS t1
INNER JOIN (SELECT c.lieu_stkph_cd,
b.mt_util_dev_rep
FROM prod_v_ec_dossier_a_sysa c
INNER JOIN db_ftg_srs_prod_v.v_autorisation_a a
ON a.id_doss = c.dosscta_no
AND a.cd_prd_cpta = c.prct_no
AND a.cd_entite_cpta = c.entite_cd
INNER JOIN db_ftg_srs_prod_v.v_utilisation_a b
ON a.dt_art = b.dt_art
AND a.id_auto = b.id_auto
WHERE c.pma_cd = '') AS t2
ON T1.lieu_stkph_cd = t2.lieu_stkph_cd
GROUP BY 1,
2
1
JNevill
Oct 31, 2017 at 15:45