Impala SQL Common Errors: Troubleshooting and Resolution Notes



Background

I am currently responsible for a data product that executes user-submitted Impala SQL via Hive-JDBC to provide datasets, data monitoring, data push, and related features. The Hive-JDBC version is 2.1.1-cdh6.2.1:

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>2.1.1-cdh6.2.1</version>
</dependency>
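
For reference, submitting SQL through this driver looks roughly like the sketch below. Impala exposes a HiveServer2-compatible endpoint (port 21050 by default), so a plain jdbc:hive2:// URL works; the host name here is a placeholder, and depending on the cluster's security setup extra URL parameters (authentication, etc.) may be needed.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ImpalaQueryDemo {
    public static void main(String[] args) throws Exception {
        // Older hive-jdbc versions may need the driver class loaded explicitly.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = "jdbc:hive2://impalad-host:21050/default";
        try (Connection con = DriverManager.getConnection(url);
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("select 1")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        }
    }
}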

Errors

ParseException: Syntax error in line 34: sum(enable) Encountered: ENABLE. Expected: ALL…

SQL that ran without error on the previous CDH version started failing after the cluster was upgraded to CDH 7.1.7. The full error message: ParseException: Syntax error in line 34:\nsum(enable) as 辅助\n ^\nEncountered: ENABLE\nExpected: ALL, CASE, CAST, DATE, DEFAULT, DISTINCT, EXISTS, FALSE, IF, INTERVAL, LEFT, NOT, NULL, REPLACE, RIGHT, TRUNCATE, TRUE, IDENTIFIER\n\nCAUSED BY: Exception: Syntax error
In other words, enable is now treated as a reserved keyword, so the bare column name triggers a syntax error.

Solution: wrap the identifier in backticks, i.e.:

sum(`enable`)

Note: adding set ENABLE_EXPR_REWRITES = 0; as the first line of the SQL does not solve this problem.

For more on set ENABLE_EXPR_REWRITES = 0;, see the separate post on set ENABLE_EXPR_REWRITES = 0 and the "GROUP BY expression must not contain aggregate functions" issue.

default.url_decode() unknown for database default. Currently this db has 4 functions.

The SQL contains the url_decode() function and fails to execute. This is not a hive-jdbc problem; the same statement also fails when run from the visual query UI.

TransmitData() to ip:27000 failed: Network error: Client connection negotiation failed: client connection to ip:27000: connect: Connection refused (error 111)

A TCP communication problem between nodes; retry manually. See the reference: TransmitData 27000 Network-error.

Failed to close HDFS file. The DiskSpace quota of ... is exceeded

The full error message:

Failed to close HDFS file: hdfs://ppdhdpha/user/hive/warehouse/lkzc.db/jh_100701_cdd_noapply_cutdown_user/_impala_insert_staging//.0.parq\\nError(255): Unknown error 255\\nRoot cause: RemoteException: The DiskSpace quota of /user/hive/warehouse/lkzc.db is exceeded: quota = 65970697666560 B = 60 TB but diskspace consumed = 65971183008777 B = 60.00 TB

The lkzc business line's database has used up its disk-space quota, so no more data can be written; note the _impala_insert_staging path in the error message above, which shows the failure happened while writing insert output.

Solution: drop unused large tables to free up space.
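
Before dropping anything, the directory's quota and current usage can be checked with, for example, hdfs dfs -count -q /user/hive/warehouse/lkzc.db, which prints the space quota and the space actually consumed.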

The library hdfs:/udf/hive.udf-1.0.2.jar last modified time does not match the expected last modified time. Run ‘refresh functions <db name>’.

A SQL statement submitted to the big-data platform via hive-jdbc failed with:

java.lang.Exception: java.sql.SQLException: The library hdfs:///udf/hive.udf-1.0.2.jar last modified time 1670929674 does not match the expected last modified time 1653377949. Run 'refresh functions <db name>'.

A rather baffling error that occurs only occasionally; re-running the statement succeeds. The mismatched timestamps suggest the UDF jar on HDFS was replaced after its metadata was cached, and the message itself points at the remedy: run refresh functions <db name>.

Failed to close HDFS file. Unable to close file because the last blockBP does not have enough number of replicas.

The error message:

java.sql.SQLException: Failed to close HDFS file: hdfs:/user/hive/warehouse/lkzc.db/jh_061201_ffhywkt_chuda/_impala_insert_staging/
Error(255): Unknown error 255
Root cause: IOException: Unable to close file because the last blockBP does not have enough number of replicas.

This is usually a transient HDFS condition: when the file is closed, the last block has not yet reached the required replica count. Retrying the statement normally succeeds.

Failed due to unreachable impalad(s): hadoopcbd008156.ppdgdsl.com:2200

The SQL was routed to an Impala node that happened to be having network trouble, so it was reported as unreachable.
There is no particularly good fix for this at the moment:

  1. Retry manually: resubmitting the SQL will most likely (though not certainly) be scheduled onto a different node;
  2. Add automatic retry logic in the application, triggered only for this class of error, and capped at a small number of attempts (1-2). The platform handles tens of thousands of user-submitted SQL statements, so automatic retries have to be weighed against the extra load they put on the Impala cluster; a minimal sketch follows this list.
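
A minimal sketch of option 2, assuming (this is illustrative, not the product's actual code) that transient errors are recognized by matching the exception message:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class RetryingExecutor {

    private static final int MAX_ATTEMPTS = 2; // original attempt + one retry

    // Returns true when the message looks like a transient cluster problem
    // (the unreachable-impalad and TransmitData errors described above).
    private static boolean isTransient(SQLException e) {
        String msg = String.valueOf(e.getMessage());
        return msg.contains("unreachable impalad")
                || msg.contains("TransmitData()");
    }

    public static ResultSet executeQueryWithRetry(Connection con, String sql) throws SQLException {
        SQLException last = null;
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                // The caller closes the statement via rs.getStatement() / the result set.
                Statement st = con.createStatement();
                return st.executeQuery(sql);
            } catch (SQLException e) {
                last = e;
                if (!isTransient(e) || attempt == MAX_ATTEMPTS) {
                    throw e; // not retryable, or retries exhausted
                }
            }
        }
        throw last; // unreachable in practice
    }
}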

org.apache.hive.service.cli.HiveSQLException: Invalid or unknown query handle

SQLException: Cancelled

The full error message:

java.sql.SQLException: Cancelled
	at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:381)
	at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:260)
	at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:473)
	at com.alibaba.druid.pool.DruidPooledStatement.executeQuery(DruidPooledStatement.java:308)

A SQL statement submitted via hive-jdbc fails with nothing more than "Cancelled"; there is no further error log. Cause: the big-data platform's nodes were short on resources and simply discarded the task.
This is distinct from the "Cancelled from Impala's debug web interface" error below.
Solution: retry and optimize the SQL; there does not appear to be a better option.

Cancelled in ScannerContext

Cancelled from Impala’s debug web interface

The full error message:

java.sql.SQLException: Cancelled from Impala's debug web interface
	at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:381)
	at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:356)
	at com.alibaba.druid.pool.DruidPooledResultSet.next(DruidPooledResultSet.java:68)

Solution: retry and optimize the SQL; there does not appear to be a better option.

Invalid query handle: xxxx.yyyy

The error message:

ERROR c.a.druid.pool.DruidPooledStatement - clearResultSet error
org.apache.hive.service.cli.HiveSQLException: Invalid query handle: d84d9133d8a6ce9c:9a77cd100000000
	at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:266)
	at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:252)
	at org.apache.hive.jdbc.HiveStatement.closeStatementIfNeeded(HiveStatement.java:210)
	at org.apache.hive.jdbc.HiveStatement.closeClientOperation(HiveStatement.java:221)
	at org.apache.hive.jdbc.HiveQueryResultSet.close(HiveQueryResultSet.java:308)
	at com.alibaba.druid.pool.DruidPooledResultSet.close(DruidPooledResultSet.java:86)
	at com.alibaba.druid.pool.DruidPooledStatement.clearResultSet(DruidPooledStatement.java:206)
	at com.alibaba.druid.pool.DruidPooledStatement.close(DruidPooledStatement.java:514)

The code where the error is raised:

finally {
    if (ps != null) {
        // HiveSQLException: Invalid query handle is thrown here --
        // the statement's query has already been killed on the server.
        ps.close();
    }
    if (con != null) {
        con.close();
    }
}

It happens at Statement.close().

The query was killed for running too long. Platform resources are limited, so user queries cannot be allowed to run indefinitely; the kill rules weigh three factors: the amount of data scanned, the execution time, and the query's degree of completion.
To keep the scanned data volume down, it is strongly recommended not to use select *; list the specific columns needed instead, and add a limit clause such as limit 10000 to cap the number of rows returned.
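
Since the close-time exception only obscures the real failure (the query has already been killed on the server), one option is to close the JDBC resources quietly. A sketch that mirrors the fragment above (not the product's actual code):

finally {
    if (ps != null) {
        try {
            ps.close();
        } catch (SQLException ignore) {
            // The query handle is already gone server-side; nothing useful to do here.
        }
    }
    if (con != null) {
        try {
            con.close();
        } catch (SQLException ignore) {
            // Ignore close failures so the original execution error stays visible.
        }
    }
}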

SQLException: Sender timed out waiting for receiver fragment instance: , dest node: 66

The full error message:

java.sql.SQLException: Sender timed out waiting for receiver fragment instance: 394c696029ddcce6:a51b7cab000007cc, dest node: 66
	at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:381)
	at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:260)
	at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:473)
	at com.alibaba.druid.pool.DruidPooledStatement.executeQuery(DruidPooledStatement.java:308)

Sometimes a retry succeeds, sometimes it fails. Logging into the Hadoop cluster and checking the query execution log showed that this SQL was doing a full scan of a table with more than 12,000 partitions, reading 20.1 TB of data.

Solution: retry and optimize the SQL; there does not appear to be a better option.

Failed to open HDFS file. No such file or directory. Root cause: RemoteException: File does not exist:

The task failed with the following error:

java.sql.SQLException: Disk I/O error: Failed to open HDFS file hdfs://ppdhdpha/user/hive/warehouse/test.db/chengzhangquanyi_huolizhiguoqi_chuda/2c43254ab60211d3-cf0e47d200000235_298950249_data.0.
Error(2): No such file or directory
Root cause: RemoteException: File does not exist: /user/hive/warehouse/test.db/chengzhangquanyi_huolizhiguoqi_chuda/2c43254ab60211d3-cf0e47d200000235_298950249_data.0.
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:85)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:75)
        at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:152)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:735)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:415)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)

Cause: the data file the query references no longer exists (the table's files were removed or rewritten, so Impala's cached metadata is stale). Things to try: refresh <table> or invalidate metadata <table>.
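
When stale metadata is the root cause, refreshing the affected table through the same connection before re-running the query is usually enough. A sketch (con is an open Hive-JDBC connection; the table name is the one from the error above):

try (Statement st = con.createStatement()) {
    // Reload Impala's cached file and block metadata for this table.
    st.execute("refresh test.chengzhangquanyi_huolizhiguoqi_chuda");
    // If that is not enough, "invalidate metadata <table>" is the heavier-weight option.
}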

Error(s) deleting partition directories. First error (of 37) was: Hdfs op. Input/output error

java.sql.SQLException: Error(s) deleting partition directories. First error (of 37) was: Hdfs op (DELETE hdfs://ppdhdpha/user/hive/warehouse/cszc.db/zmj_sop_m0_base_snp/1d460f3a4d87ea14-a4c4521100000091_870332460_data.0.) failed, error was: hdfs://ppdhdpha/user/hive/warehouse/cszc.db/zmj_sop_m0_base_snp/1d460f3a4d87ea14-a4c4521100000091_870332460_data.0.
Error(5): Input/output error

Solution: retry and optimize the SQL; there does not appear to be a better option.

java.sql.SQLException: Admission for query exceeded timeout 180000ms in pool root.report. Queued reason: queue is not empty (size 5); queued queries are executed first.

The full error message:

ERROR c.alibaba.druid.pool.DruidDataSource - create connection SQLException, url: jdbc:hive2://111.222.333.66:21050/default, errorCode 0, state HY000
java.sql.SQLException: Admission for query exceeded timeout 180000ms in pool root.report. Queued reason: queue is not empty (size 4); queued queries are executed first.
	at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:381)
	at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:260)
	at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:473)
	at com.alibaba.druid.pool.DruidAbstractDataSource.validateConnection(DruidAbstractDataSource.java:1434)
	at com.alibaba.druid.pool.DruidAbstractDataSource.createPhysicalConnection(DruidAbstractDataSource.java:1733)
	at com.alibaba.druid.pool.DruidDataSource$CreateConnectionThread.run(DruidDataSource.java:2838)

SQL submitted to the big-data platform has to queue for admission; queries that have waited longer than 3 minutes are discarded.

GetConnectionTimeoutException & Rejected query from pool root.report: queue full, limit=200, num_queued=200.

The full error message:

java.lang.Exception: java.lang.Exception: com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 120000, active 1, maxActive 150, creating 0, createErrorCount 2
        at com.xy.cloudiview.common.util.JdbcUtil.getConnection(JdbcUtil.java:136)
        at com.xy.cloudiview.datasetsubscript.business.service.impl.TableWarnServiceImpl.getSqlCount(TableWarnServiceImpl.java:244)
        at com.xy.cloudiview.datasetsubscript.business.service.impl.TableWarnServiceImpl.executeTableWarnJob(TableWarnServiceImpl.java:66)
        at com.xy.cloudiview.datasetsubscript.business.xxljob.IviewTableWarnJobHandler.execute(IviewTableWarnJobHandler.java:45)
        at com.ppdai.job.core.thread.JobThread.run(JobThread.java:142)
Caused by: java.lang.Exception: com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 120000, active 1, maxActive 150, creating 0, createErrorCount 2
        at com.xy.cloudiview.common.util.JdbcUtil.getConnection(JdbcUtil.java:122)
        ... 4 common frames omitted
Caused by: com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 120000, active 1, maxActive 150, creating 0, createErrorCount 2
        at com.alibaba.druid.pool.DruidDataSource.getConnectionInternal(DruidDataSource.java:1773)
        at com.alibaba.druid.pool.DruidDataSource.getConnectionDirect(DruidDataSource.java:1427)
        at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1407)
        at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1397)
        at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:100)
        at com.xy.cloudiview.common.util.JdbcUtil.getConnection(JdbcUtil.java:118)
        ... 4 common frames omitted
Caused by: java.sql.SQLException: Rejected query from pool root.report: queue full, limit=200, num_queued=200.
        at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:381)
        at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:260)
        at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:473)
        at com.alibaba.druid.pool.DruidAbstractDataSource.validateConnection(DruidAbstractDataSource.java:1434)
        at com.alibaba.druid.pool.DruidAbstractDataSource.createPhysicalConnection(DruidAbstractDataSource.java:1733)
        at com.alibaba.druid.pool.DruidDataSource$CreateConnectionThread.run(DruidDataSource.java:2838)

The big-data platform caps SQL submissions: at most 150 running and 200 queued, with a queue timeout of 5 minutes. SQL submitted beyond these limits either gets rejected because the queue is full, or times out while waiting for a connection and fails.
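
For context, the numbers in the GetConnectionTimeoutException come from the Druid pool configuration; the data source is configured roughly along these lines (a sketch, not the product's actual settings):

import com.alibaba.druid.pool.DruidDataSource;

public class ImpalaPoolConfig {
    public static DruidDataSource build() {
        DruidDataSource ds = new DruidDataSource();
        ds.setUrl("jdbc:hive2://111.222.333.66:21050/default");
        ds.setMaxActive(150);   // matches "maxActive 150" in the log
        ds.setMaxWait(120000);  // matches "wait millis 120000": 2 minutes to obtain a pooled connection
        // The validation query runs when a physical connection is created, which is
        // why the admission-control rejection surfaces inside validateConnection().
        ds.setValidationQuery("select 1");
        return ds;
    }
}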

ImpalaRuntimeException: Error making ‘updateTableColumnStatistics’ RPC to Hive Metastore: CAUSED BY: MetaException: Could not create “increment”/“table” value-generation container SEQUENCE_TABLE since autoCreate flags do not allow it.

ImpalaRuntimeException: Error making ‘dropTable’ RPC to Hive Metastore: \nCAUSED BY: MetaException: One or more instances could not be deleted

Failed to close HDFS file: hdfs://ppdhdpha/user/hive/warehouse/xkyy.db/cdjk_apibg_willa/_impala_insert_staging/.0.parq\nError(255): Unknown error 255\nRoot cause: RemoteException: The DiskSpace quota of /user/hive/warehouse/xkyy.db is exceeded: quota = 43980465111040 B = 40 TB but diskspace consumed = 43981119305382 B = 40.00 TB

File ‘hdfs://ppdcdpha/user/hive/warehouse/usrt.db/willa_liubiao_daily’ has an invalid Parquet version number: 336\n\n. Please check that it is a valid Parquet file. This error can also occur due to stale metadata. If you believe this is a valid Parquet file, try running refresh usrt.willa_liubiao_daily

Table does not exist

There are two variants of this error message:

  1. Table does not exist:
  2. Could not resolve table reference

What is the difference between the two?

Decompressor: invalid compressed length. Data is likely corrupt.

The query depends on the table rhino_man_checked_hitrule, and the table's storage file is corrupt; the table needs to be regenerated.

Expected to read a compressed text file of size 5553487 bytes. But only read 5401212 bytes. This may indicate data file corruption. (file: hdfs://hdpha/user/hive/warehouse/rpt.db/rhino_man_checked_hitrule/000000_0.snappy).

Same diagnosis as above: the query depends on rhino_man_checked_hitrule, whose storage file is corrupt, and the table needs to be regenerated.

References

Sender timed out waiting for receiver fragment instance: , dest node: 66
Invalid query handle (of limited reference value, in my view)
