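In Hive, the function most commonly used to parse JSON strings is get_json_object. It is less convenient when the JSON is stored as an array, as in the field named marked_json below: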
[{"id":559,"labelType":3,"labelCode":"BUMANYI","labelName":"不满意","isGrouped":1},{"id":560,"labelType":3,"labelCode":"YTSBCW","labelName":"意图识别错误","isGrouped":1}]
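In day-to-day work, requirements often ask for several of these ids at once. In that situation you can't help wishing that each JSON object were an element of a real array.
That wish also points to the solution: if the value is not an array yet, can we split it into one?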
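So we strip the leading '[{' and the trailing '}]', then split the remainder on '},{' to form an array. The braces are escaped because split treats its second argument as a regular expression: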
split(substring(marked_json,3,length(marked_json)-4),'\\},\\{')
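To see exactly what this expression produces, here is a small Python sketch that emulates the two Hive built-ins involved. The helpers hive_substring and hive_split are hypothetical stand-ins. They assume Hive's 1-based substring(str, pos, len) and that split interprets its pattern as a regular expression, which Python's re matches for this simple pattern.

```python
import re


def hive_substring(s, pos, length):
    """Hypothetical emulation of Hive's substring(str, pos, len); pos is 1-based."""
    start = pos - 1
    return s[start:start + length]


def hive_split(s, pattern):
    """Hypothetical emulation of Hive's split(str, regex)."""
    return re.split(pattern, s)


marked_json = (
    '[{"id":559,"labelType":3,"labelCode":"BUMANYI","labelName":"不满意","isGrouped":1},'
    '{"id":560,"labelType":3,"labelCode":"YTSBCW","labelName":"意图识别错误","isGrouped":1}]'
)

# substring(marked_json, 3, length(marked_json) - 4): skip '[{' and drop the final '}]'.
inner = hive_substring(marked_json, 3, len(marked_json) - 4)

# The HiveQL literal '\\},\\{' reaches the regex engine as \},\{ (a literal "},{").
parts = hive_split(inner, r"\},\{")

for part in parts:
    print(part)
```

Starting at position 3 with length - 4 characters removes exactly two characters from each end. The result is a two-element array of brace-less object bodies.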
Next, use the lateral view explode syntax to expand that array into one row per element:
lateral view explode(split(substring(marked_json,3,length(marked_json)-4),'\\},\\{')) view_table_name as mj
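The row-multiplying effect of lateral view explode can be modelled in a few lines of Python. This is only a sketch of the semantics. The rows of hive_table and the row_id column are made-up illustrative data, and lateral_view_explode is a hypothetical helper, not a Hive API. Each source row yields one output row per array element, and each output row keeps the source row's columns alongside the new column mj.

```python
import re

# Illustrative rows of hive_table (made-up data); row_id is a hypothetical column.
hive_table = [
    {"row_id": 1, "marked_json": '[{"id":559,"labelCode":"BUMANYI"},{"id":560,"labelCode":"YTSBCW"}]'},
    {"row_id": 2, "marked_json": '[{"id":561,"labelCode":"OTHER"}]'},
]


def split_marked_json(s):
    # Same as split(substring(s, 3, length(s) - 4), '\\},\\{') in HiveQL.
    return re.split(r"\},\{", s[2:len(s) - 2])


def lateral_view_explode(rows):
    # One output row per array element, carrying the source row's columns plus mj.
    for row in rows:
        for mj in split_marked_json(row["marked_json"]):
            yield {"row_id": row["row_id"], "mj": mj}


exploded = list(lateral_view_explode(hive_table))
for r in exploded:
    print(r["row_id"], r["mj"])
```

The first source row becomes two output rows, and the second becomes one.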
"id":559,"labelType":3,"labelCode":"BUMANYI","labelName":"不满意","isGrouped":1
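Each value of mj is now a bare string like the one above. Before you can extract anything from it, you have to turn it back into JSON with concat('{', mj, '}').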
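A quick way to confirm that the re-wrapped string is valid JSON again is to parse it. The Python sketch below uses json.loads as a stand-in for the parsing that get_json_object performs in Hive. The value of mj is the sample row shown above.

```python
import json

# One exploded value of mj, taken from the sample above.
mj = '"id":559,"labelType":3,"labelCode":"BUMANYI","labelName":"不满意","isGrouped":1'

# concat('{', mj, '}') puts back the braces that the substring/split step removed.
restored = "{" + mj + "}"

# json.loads raises ValueError on malformed input, so reaching the print means
# the re-wrapped string is a complete JSON object again.
obj = json.loads(restored)
print(obj["id"], obj["labelName"])
```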
Now we can extract the fields we are interested in, for example id:
select get_json_object(concat('{', mj, '}'), '$.id') as id from hive_table lateral view explode(split(substring(marked_json,3,length(marked_json)-4),'\\},\\{')) vv as mj
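To trace the whole query end to end, here is a Python sketch that chains the three steps: substring, split/explode, and concat plus get_json_object. The function get_json_object_simple is a hypothetical, heavily simplified stand-in for Hive's get_json_object. It handles only top-level '$.key' paths and returns None where Hive would return NULL.

```python
import json
import re


def get_json_object_simple(json_str, path):
    # Hypothetical stand-in for Hive's get_json_object: supports only '$.key',
    # returns the value as a string, or None (Hive's NULL) if the key is absent
    # or the input is not a JSON object.
    key = path[len("$."):]
    try:
        value = json.loads(json_str).get(key)
    except (ValueError, AttributeError):
        return None
    return None if value is None else str(value)


def extract_ids(marked_json):
    # substring(marked_json, 3, length(marked_json) - 4)
    inner = marked_json[2:len(marked_json) - 2]
    # explode(split(..., '\\},\\{')) as mj, then get_json_object(concat('{', mj, '}'), '$.id')
    return [get_json_object_simple("{" + mj + "}", "$.id") for mj in re.split(r"\},\{", inner)]


marked_json = (
    '[{"id":559,"labelType":3,"labelCode":"BUMANYI","labelName":"不满意","isGrouped":1},'
    '{"id":560,"labelType":3,"labelCode":"YTSBCW","labelName":"意图识别错误","isGrouped":1}]'
)
print(extract_ids(marked_json))
```

The split is purely textual. It therefore assumes that no string value or nested array of objects contains the sequence '},{'. If one does, that element is cut in two, and the fragments no longer parse as JSON.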
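Alternatively, if you only need a fixed number of elements, you can wrap the whole array in a JSON object and address each element by index with get_json_object:
select get_json_object(concat('{"jl":',marked_json,'}'),'$.jl[0].id'),get_json_object(concat('{"jl":',marked_json,'}'),'$.jl[1].id') from hive_table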
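The index-based query avoids the lateral view, but it returns one column per index. You therefore need to know, or bound, the array length in advance. The Python sketch below models that behaviour. get_id_at is a hypothetical helper, and the sketch assumes that get_json_object returns NULL (None here) when an index lies past the end of the array.

```python
import json


def get_id_at(marked_json, index):
    # concat('{"jl":', marked_json, '}') makes the array the value of key "jl",
    # so a path like '$.jl[0].id' can address a single element.
    wrapped = '{"jl":' + marked_json + '}'
    items = json.loads(wrapped)["jl"]
    # A path that does not exist is modelled as None (Hive's NULL).
    if index >= len(items) or "id" not in items[index]:
        return None
    return str(items[index]["id"])


marked_json = (
    '[{"id":559,"labelType":3,"labelCode":"BUMANYI","labelName":"不满意","isGrouped":1},'
    '{"id":560,"labelType":3,"labelCode":"YTSBCW","labelName":"意图识别错误","isGrouped":1}]'
)
# One value per index, like '$.jl[0].id', '$.jl[1].id' and a non-existent '$.jl[2].id'.
print(get_id_at(marked_json, 0), get_id_at(marked_json, 1), get_id_at(marked_json, 2))
```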