

news 2026/4/25 3:27:40

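Optimization strategies

Recommended: use GROUP BY instead of DISTINCT for deduplication

References:
Differences between GROUP BY and DISTINCT in Hive, with a performance comparison (hive中groupby和distinct区别以及性能比较) - cnblogs
Data skew caused by count(distinct) (数据倾斜之count(distinct)) - cnblogs

Key conclusions:
Both forms do partial aggregation in the map phase. In the reduce phase, however, DISTINCT runs on a single reducer, while GROUP BY can spread the aggregation across multiple reducers in parallel, so GROUP BY is faster.
DISTINCT generates only one reducer task: every id is sent to that one reducer to be deduplicated and then aggregated, which very easily causes data skew. This single-reducer bottleneck is most pronounced for a global COUNT(DISTINCT ...), the case covered by the second reference.
DISTINCT is memory-intensive and can cause OOM errors, although its deduplication itself is efficient.
GROUP BY distributes the data across multiple reducers, which is why it is faster. Its sorting step costs more time, but as long as that time cost is acceptable, its space complexity is lower.

Example: on a Hive table with 5,563,985,064 records, querying two of its columns took the following times.

-- Elapsed: 00:11:17
select col1, col2 from db_name_xxx.table_name_xxx where ds=20230224 group by col1, col2;

-- Elapsed: 00:25:07
select distinct col1, col2 from db_name_xxx.table_name_xxx where ds=20230224;

[Other optimization strategies: to be added]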
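The reducer argument above can be made concrete with a toy Python model of the shuffle. This is a simplified sketch under stated assumptions, not Hive's implementation: the mapper and reducer counts, the CRC32 partitioner, and the helper names (`map_phase`, `partition`, `shuffle`, `run`) are all hypothetical. Setting `num_reducers=1` stands in for the single-reducer DISTINCT path, and `num_reducers=8` stands in for a GROUP BY spread over several reducers; each mapper pre-deduplicates its own chunk to mimic the map-side aggregation that both forms perform.

```python
import zlib
from collections import defaultdict


def map_phase(rows, num_mappers):
    """Split the input into mapper chunks; each mapper pre-deduplicates its
    own chunk (a stand-in for map-side partial aggregation)."""
    if not rows:
        return []
    size = -(-len(rows) // num_mappers)  # ceiling division
    return [set(rows[i:i + size]) for i in range(0, len(rows), size)]


def partition(key, num_reducers):
    """Hash-partition a key to a reducer. CRC32 is an illustrative choice
    (hypothetical), not Hive's actual hash function."""
    return zlib.crc32(repr(key).encode("utf-8")) % num_reducers


def shuffle(mapper_outputs, num_reducers):
    """Send every key emitted by every mapper to its target reducer."""
    inbox = defaultdict(list)
    for keys in mapper_outputs:
        for key in keys:
            inbox[partition(key, num_reducers)].append(key)
    return inbox


def run(rows, num_mappers, num_reducers):
    """Return (distinct rows, distinct count, largest single reducer input)."""
    inbox = shuffle(map_phase(rows, num_mappers), num_reducers)
    # Each reducer deduplicates only the keys routed to it.
    per_reducer = [set(keys) for keys in inbox.values()]
    # Equal keys always hash to the same reducer, so the per-reducer sets are
    # disjoint: their union is the global result and their sizes simply add up.
    distinct_rows = set().union(*per_reducer)
    distinct_count = sum(len(s) for s in per_reducer)
    max_load = max((len(keys) for keys in inbox.values()), default=0)
    return distinct_rows, distinct_count, max_load


if __name__ == "__main__":
    # Hypothetical (col1, col2) pairs: 10,000 rows, 350 distinct combinations.
    rows = [(i % 50, i % 7) for i in range(10_000)]
    single = run(rows, num_mappers=4, num_reducers=1)    # every key -> one reducer
    parallel = run(rows, num_mappers=4, num_reducers=8)  # keys spread by hash
    print(single[1], parallel[1])  # 350 350
    print(single[2], parallel[2])  # 1400, then the largest of the 8 reducer inputs
```

Because identical keys always land on the same reducer, each reducer can deduplicate independently and the per-reducer counts simply add up. That property is what lets the GROUP BY form parallelize. The model only compares how much input the busiest reducer receives; it says nothing about absolute run times on a real cluster.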
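Swapping DISTINCT for GROUP BY only makes sense if both forms return the same rows. The sketch below checks that with Python's built-in `sqlite3` module as a stand-in engine, so it demonstrates equivalence, not Hive performance; the table `t`, its columns, and the sample data are hypothetical. It also covers the COUNT(DISTINCT ...) case from the second reference, rewritten as a GROUP BY subquery.

```python
import sqlite3


def compare(rows):
    """Run the DISTINCT and GROUP BY forms on the same hypothetical table `t`
    and return (distinct_rows, grouped_rows, count_distinct, count_grouped)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (col1 INTEGER, col2 TEXT)")
        conn.executemany("INSERT INTO t (col1, col2) VALUES (?, ?)", rows)

        # Deduplicating two columns: SELECT DISTINCT vs. GROUP BY.
        distinct_rows = set(conn.execute("SELECT DISTINCT col1, col2 FROM t"))
        grouped_rows = set(
            conn.execute("SELECT col1, col2 FROM t GROUP BY col1, col2")
        )

        # Counting distinct values: COUNT(DISTINCT ...) vs. a GROUP BY subquery.
        # COUNT(DISTINCT col1) skips NULLs, but GROUP BY keeps a NULL group,
        # so the subquery filters NULLs out to keep the two counts equal.
        (count_distinct,) = conn.execute(
            "SELECT COUNT(DISTINCT col1) FROM t"
        ).fetchone()
        (count_grouped,) = conn.execute(
            "SELECT COUNT(*) FROM "
            "(SELECT col1 FROM t WHERE col1 IS NOT NULL GROUP BY col1) AS g"
        ).fetchone()
        return distinct_rows, grouped_rows, count_distinct, count_grouped
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [(1, "a"), (1, "a"), (2, "b"), (None, "c"), (None, "c"), (3, "a")]
    d, g, cd, cg = compare(sample)
    print(d == g)  # True
    print(cd, cg)  # 3 3
```

The `IS NOT NULL` filter matters: without it, the NULL group would make the subquery count one higher than COUNT(DISTINCT col1). In Hive the same pattern would read roughly `select count(*) from (select col1 from db_name_xxx.table_name_xxx where ds=20230224 and col1 is not null group by col1) t;`, offered here as an untested sketch.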

