Complex Data Types in Hive and How to Use Its Functions

This article covers Hive's complex data types and the usage of several Hive functions. Hive currently supports the following complex data types, each created with a constructor function:
map
(key1, value1, key2, value2, …) Creates a map with the given key/value pairs
struct
(val1, val2, val3, …) Creates a struct with the given field values. Struct field names will be col1, col2, …
named_struct
(name1, val1, name2, val2, …) Creates a struct with the given field names and values. (as of Hive 0.8.0)
array
(val1, val2, …) Creates an array with the given elements
create_union
(tag, val1, val2, …) Creates a union type with the value that is being pointed to by the tag parameter

The common functions behave much as they do in standard SQL, so they are skipped here; what follows is mostly specific to HQL. Hive functions fall roughly into these groups: Built-in, Misc., UDF, UDTF, and UDAF. We pick a few that standard SQL lacks but that come up often in statistical analysis with Hive SQL.

First are the built-in functions that operate on collections. The test data used to create the tables can be generated with the script at the following link: http://my.oschina.net/leejun2005/blog/76631

Test data:
first {"store":{"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],"bicycle":{"price":19.951,"color":"red1"}},"email":"amy@only_for_json_udf_test.net","owner":"amy1"} third
first {"store":{"fruit":[{"weight":9,"type":"apple"},{"weight":91,"type":"pear"}],"bicycle":{"price":19.952,"color":"red2"}},"email":"amy@only_for_json_udf_test.net","owner":"amy2"} third
first {"store":{"fruit":[{"weight":10,"type":"apple"},{"weight":911,"type":"pear"}],"bicycle":{"price":19.953,"color":"red3"}},"email":"amy@only_for_json_udf_test.net","owner":"amy3"} third

Pay particular attention to UDTFs here. The official documentation says:
json_tuple
A new json_tuple() UDTF is introduced in hive 0.7. It takes a set of names (keys) and a JSON string, and returns a tuple of values using one function. This is much more efficient than calling GET_JSON_OBJECT to retrieve more than one key from a single JSON string. In any case where a single JSON string would be parsed more than once, your query will be more efficient if you parse it once, which is what JSON_TUPLE is for. As JSON_TUPLE is a UDTF, you will need to use the LATERAL VIEW syntax in order to achieve the same goal.
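
As a rough illustration of the point made in the documentation, the two queries below fetch the same two top-level keys from the JSON test rows, first with repeated get_json_object calls and then with a single json_tuple call. The table json_demo and its columns (id, json_str) are assumptions made for this sketch; the article does not show the table definition.

```sql
-- Hypothetical table for illustration only:
--   json_demo(id STRING, json_str STRING), one JSON document per row.

-- Variant 1: every get_json_object call parses the JSON string again.
SELECT t.id,
       get_json_object(t.json_str, '$.owner')               AS owner,
       get_json_object(t.json_str, '$.email')               AS email,
       get_json_object(t.json_str, '$.store.bicycle.price') AS bicycle_price
FROM json_demo t;

-- Variant 2: json_tuple parses the string once and returns one column per key.
-- Because json_tuple is a UDTF, it is attached through LATERAL VIEW.
SELECT t.id, j.owner, j.email
FROM json_demo t
LATERAL VIEW json_tuple(t.json_str, 'owner', 'email') j AS owner, email;
```

json_tuple takes key names rather than JSON paths, so a nested value such as the bicycle price is fetched here with get_json_object.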

UDTFs (User-Defined Table-Generating Functions) address the need to turn one input row into multiple output rows (one-to-many mapping). LATERAL VIEW makes it convenient to combine the rows a UDTF produces with the other columns of the source row. This is needed because a UDTF used directly in SELECT is restricted to being the only expression there: not only are multiple UDTFs disallowed, even a single UDTF next to any other column is rejected, and Hive reports that only a single expression is supported with a UDTF. For example:
hive> select my_test("abcef:aa") as qq, 'abcd' from sunwg01;
FAILED: Error in semantic analysis: Only a single expression in the SELECT clause is supported with UDTF's

LATERAL VIEW meets this need. Its syntax is:
lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)*
fromClause: FROM baseTable (lateralView)*
hive> create table sunwg ( a array<int>, b array<string> )
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> COLLECTION ITEMS TERMINATED BY ',';
OK
Time taken: 1.145 seconds
hive> load data local inpath '/home/hjl/sunwg/sunwg.txt' overwrite into table sunwg;
Copying data from file:/home/hjl/sunwg/sunwg.txt
Loading data to table sunwg
OK
Time taken: 0.162 seconds
hive> select * from sunwg;
OK
[10,11] ["tom","mary"]
[20,21] ["kate","tim"]
Time taken: 0.069 seconds
hive> SELECT a, name
> FROM sunwg LATERAL VIEW explode(b) r1 AS name;
OK
[10,11] tom
[10,11] mary
[20,21] kate
[20,21] tim
Time taken: 8.497 seconds

hive> SELECT id, name
> FROM sunwg LATERAL VIEW explode(a) r1 AS id
> LATERAL VIEW explode(b) r2 AS name;
OK
10 tom
10 mary
11 tom
11 mary
20 kate
20 tim
21 kate
21 tim
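Time taken: 9.687 seconds

Test data:
url1 http://facebook.com/path2/p.php?k1=v1&k2=v2#Ref1
url2 https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-getjsonobject
url3 https://www.google.com.hk/#hl=zh-CN&newwindow=1&safe=strict&q=hive+translate+example&oq=hive+translate+example&gs_l=serp.3…10174.11861.6.12051.8.8.0.0.0.0.132.883.0j7.7.0…0.0…1c.1j4.8.serp.0B9C1T_n0Hs&bav=on.2,or.&bvm=bv.44770516,d.aGc&fp=e13e41a6b9dab3f6&biw=1241&bih=589

Result:
url1 facebook.com /path2/p.php k1=v1&k2=v2 v1
url2 cwiki.apache.org /confluence/display/Hive/LanguageManual+UDF NULL NULL
url3 www.google.com.hk / NULL NULL

explode is one of Hive's built-in table-generating functions (UDTFs). It solves the 1-to-N problem: it splits one input row into multiple rows, for example one row per element of an array, and outputs them as a virtual table. As the principle and syntax above show, it comes with these caveats:
A UDTF cannot be mixed with non-UDTF columns in the SELECT list.
UDTFs cannot be nested.
GROUP BY / CLUSTER BY / DISTRIBUTE BY / SORT BY are not supported.
A UDTF that appears in SELECT must be given a column alias, otherwise an error is raised.

LATERAL VIEW is the companion construct Hive provides for UDTFs; it removes the restriction that a UDTF cannot be accompanied by additional SELECT columns. Suppose we split a column of a Hive table and want the result in 1-to-N form, that is, one row turned into multiple rows. Hive does not allow any other SELECT expression next to the UDTF. For example, say the IDs of the users who logged in to a game are stored in a single field user_ids, and we want a UDTF to output multiple rows for each input row. Selecting game_id alongside the UDTF fails with a semantic analysis error, because a UDTF does not support other SELECT expressions beside it. This is where LATERAL VIEW comes in.

LATERAL VIEW exists to be used together with UDTFs such as explode. It places the rows generated by the UDTF in a virtual table, and this virtual table (1 to N) is then joined with the input row, here each game_id, so that the SELECT fields outside the UDTF can be attached. The source table and the exploded virtual table are joined directly, row by row, as 1 join N. This is also why LATERAL VIEW udtf(expression) must be followed by a table alias and column aliases.

Lateral View Syntax
lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)*
fromClause: FROM baseTable (lateralView)*

As the grammar shows, LATERAL VIEW appears in two positions: before the UDTF, and after FROM baseTable. For example, given this table:
pageid adid_list
front_page [1, 2, 3]
contact_page [3, 4, 5]

exploding adid_list through a lateral view yields:
pageid adid
front_page 1
front_page 2
front_page 3
contact_page 3
contact_page 4
contact_page 5

A FROM clause can have multiple LATERAL VIEW clauses. Subsequent LATERAL VIEWS can reference columns from any of the tables appearing to the left of the LATERAL VIEW. Given the data:
Array<int> col1 Array<string> col2
[1, 2] ["a", "b", "c"]
[3, 4] ["d", "e", "f"]

Goal: explode the first and the second column at the same time, similar to a Cartesian product. This can be written with two consecutive LATERAL VIEW clauses, one per column.

One more case: what if the array passed to the UDTF is empty? Hive 0.12 adds support for the OUTER keyword. By default, if the UDTF produces no rows, the source row is dropped from the output. With OUTER, the behavior resembles a LEFT OUTER JOIN: the selected columns are still output, and the UDTF's columns are NULL.

To summarize: LATERAL VIEW usually appears together with a UDTF and works around the rule that a UDTF cannot share the SELECT list with other fields. Multiple LATERAL VIEW clauses produce something like a Cartesian product. The OUTER keyword turns the empty UDTF results that would otherwise be dropped into NULL, which prevents data loss.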
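
The URL result columns shown earlier (host, path, query string, and the value of k1) match what the built-in parse_url_tuple UDTF returns for the parts HOST, PATH, QUERY, and QUERY:k1. The query used for the article is not shown, so the following is only a sketch; the table url_demo and its columns are hypothetical.

```sql
-- Hypothetical table holding the URL test rows:
--   url_demo(id STRING, url STRING)

-- parse_url_tuple extracts several URL parts in one pass.
-- It is a UDTF, so it is attached to the source row with LATERAL VIEW.
SELECT t.id, p.url_host, p.url_path, p.url_query, p.query_k1
FROM url_demo t
LATERAL VIEW parse_url_tuple(t.url, 'HOST', 'PATH', 'QUERY', 'QUERY:k1') p
  AS url_host, url_path, url_query, query_k1;
```

Parts that are absent from a URL come back as NULL, which is consistent with the url2 and url3 rows in the result.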
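
The explode and LATERAL VIEW cases described above can be sketched as queries. The table names pageAds, game_login, and baseTable, and the comma delimiter inside user_ids, are assumptions for illustration; the article gives only the column names and sample rows.

```sql
-- Assumed tables (hypothetical names and layouts):
--   pageAds(pageid STRING, adid_list ARRAY<INT>)
--   game_login(game_id STRING, user_ids STRING)   -- user ids joined by ','
--   baseTable(col1 ARRAY<INT>, col2 ARRAY<STRING>)

-- A UDTF on its own in SELECT: allowed, with a column alias.
SELECT explode(adid_list) AS adid FROM pageAds;

-- A UDTF next to another column must go through LATERAL VIEW.
-- Each pageid is joined with the rows exploded from its own adid_list.
SELECT pageid, adid
FROM pageAds
LATERAL VIEW explode(adid_list) adTable AS adid;

-- Splitting a delimited string first, then exploding the resulting array.
SELECT game_id, user_id
FROM game_login
LATERAL VIEW explode(split(user_ids, ',')) u AS user_id;

-- Two lateral views in sequence: every element of col1 is paired with
-- every element of col2 from the same row (a per-row Cartesian product).
SELECT myCol1, myCol2
FROM baseTable
LATERAL VIEW explode(col1) myTable1 AS myCol1
LATERAL VIEW explode(col2) myTable2 AS myCol2;

-- OUTER (Hive 0.12 and later): rows whose adid_list is empty are kept,
-- with adid set to NULL, instead of being dropped.
SELECT pageid, adid
FROM pageAds
LATERAL VIEW OUTER explode(adid_list) adTable AS adid;
```

The table alias after the UDTF call (adTable, u, myTable1, myTable2) names the virtual table, and the names after AS are its columns, which is what the SELECT list refers to.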
