# Use Kyuubi JDBC to Access LakeSoul's Table

<!--
SPDX-FileCopyrightText: 2023 LakeSoul Contributors

SPDX-License-Identifier: Apache-2.0
-->

:::tip
Available since version 2.4.
:::

LakeSoul provides Flink and Spark connectors, so we can query LakeSoul tables with Spark SQL or Flink SQL through Kyuubi.

## Requirements

| Components | Version      |
|------------|--------------|
| Kyuubi     | 1.8          |
| Spark      | 3.3          |
| Flink      | 1.20         |
| LakeSoul   | VAR::VERSION |
| Java       | 1.8          |

The operating environment is Linux with Spark, Flink, and Kyuubi already installed. It is recommended to run the Kyuubi engines on Hadoop Yarn, but you can also start a local Spark/Flink cluster. See [Deploy Kyuubi engines on Yarn](https://kyuubi.readthedocs.io/en/v1.7.3/deployment/engine_on_yarn.html).

## Flink SQL Access to LakeSoul's Table

### 1. Dependencies

Download the LakeSoul Flink jar from: https://github.com/lakesoul-io/LakeSoul/releases/download/vVAR::VERSION/lakesoul-flink-1.20-VAR::VERSION.jar

Then put the jar file under `$FLINK_HOME/lib`.

### 2. Configurations

Set the PostgreSQL (PG) parameters for LakeSoul according to this document:
[Setup Metadata Database Connection for Flink](02-setup-spark.md#setup-metadata-database-connection-for-flink)

After this, you can start a Flink session cluster or application as usual.

### 3. LakeSoul Operations

Use Kyuubi beeline to connect to the Flink SQL engine:

```bash
$KYUUBI_HOME/bin/beeline -u 'jdbc:hive2://localhost:10009/default;user=admin;?kyuubi.engine.type=FLINK_SQL'
```

Access LakeSoul with Flink SQL:

```SQL
create catalog lakesoul with('type'='lakesoul');
use catalog lakesoul;
use `default`;

create table if not exists test_lakesoul_table_v1 (
    `id` INT,
    name STRING,
    score INT,
    `date` STRING,
    region STRING,
    PRIMARY KEY (`id`, `name`) NOT ENFORCED
) PARTITIONED BY (`region`, `date`) WITH (
    'connector'='lakeSoul',
    'use_cdc'='true',
    'format'='lakesoul',
    'path'='hdfs:///lakesoul-test-bucket/default/test_lakesoul_table_v1/',
    'hashBucketNum'='4'
);

insert into `lakesoul`.`default`.test_lakesoul_table_v1 values (1, 'AAA', 100, '2023-05-11', 'China');
insert into `lakesoul`.`default`.test_lakesoul_table_v1 values (2, 'BBB', 100, '2023-05-11', 'China');
insert into `lakesoul`.`default`.test_lakesoul_table_v1 values (3, 'AAA', 98, '2023-05-10', 'China');

select * from `lakesoul`.`default`.test_lakesoul_table_v1 limit 1;

drop table `lakesoul`.`default`.test_lakesoul_table_v1;
```

You can replace the location scheme `hdfs://` with `file://`.

For more details about Flink SQL with LakeSoul, refer to [Flink Lakesoul Connector](./06-flink-lakesoul-connector.md).
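The connection string used by beeline also works from any JDBC client. Below is a minimal Java sketch of querying the table through Kyuubi's JDBC endpoint with the Flink SQL engine; it assumes `test_lakesoul_table_v1` has been created as above (and not yet dropped), that a HiveServer2-compatible JDBC driver (for example `org.apache.hive:hive-jdbc`, or Kyuubi's Hive JDBC driver) is on the classpath, and that the host, port, and user from the beeline example apply.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class KyuubiFlinkLakeSoulExample {
    public static void main(String[] args) throws Exception {
        // Kyuubi exposes the HiveServer2 Thrift protocol, so a Hive JDBC driver on the
        // classpath is enough. Host, port, and user are placeholders from the beeline example.
        String url = "jdbc:hive2://localhost:10009/default;user=admin;"
                + "?kyuubi.engine.type=FLINK_SQL";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {
            // Register the LakeSoul catalog for this session (mirrors the SQL above).
            stmt.execute("create catalog lakesoul with('type'='lakesoul')");
            stmt.execute("use catalog lakesoul");
            stmt.execute("use `default`");

            try (ResultSet rs = stmt.executeQuery(
                    "select * from test_lakesoul_table_v1 limit 1")) {
                while (rs.next()) {
                    System.out.printf("id=%d, name=%s, score=%d, date=%s, region=%s%n",
                            rs.getInt("id"), rs.getString("name"), rs.getInt("score"),
                            rs.getString("date"), rs.getString("region"));
                }
            }
        }
    }
}
```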
## Spark SQL Access to LakeSoul's Table

### 1. Dependencies

Download the LakeSoul Spark jar from: https://github.com/lakesoul-io/LakeSoul/releases/download/vVAR::VERSION/lakesoul-spark-3.3-VAR::VERSION.jar

Then put the jar file under `$SPARK_HOME/jars`.

### 2. Configurations

1. Set the PostgreSQL (PG) parameters for LakeSoul according to this document:
   [Setup Metadata Database Connection for Spark](02-setup-spark.md#pass-lakesoul_home-environment-variable-to-your-spark-job)
2. Add the following Spark conf to `$SPARK_CONF_DIR/spark-defaults.conf`:

```
spark.sql.extensions=com.dmetasoul.lakesoul.sql.LakeSoulSparkSessionExtension
spark.sql.catalog.lakesoul=org.apache.spark.sql.lakesoul.catalog.LakeSoulCatalog
spark.sql.defaultCatalog=lakesoul
spark.sql.caseSensitive=false
spark.sql.legacy.parquet.nanosAsLong=false
```

### 3. LakeSoul Operations

Use Kyuubi beeline to connect to the Spark SQL engine:

```bash
$KYUUBI_HOME/bin/beeline -u 'jdbc:hive2://localhost:10009/default;user=admin;?kyuubi.engine.type=SPARK_SQL'
```

Access LakeSoul with Spark SQL:

```SQL
use default;

create table if not exists test_lakesoul_table_v2 (id INT, name STRING, score INT, date STRING, region STRING)
USING lakesoul
PARTITIONED BY (region, date)
LOCATION 'hdfs:///lakesoul-test-bucket/default/test_lakesoul_table_v2/';

insert into test_lakesoul_table_v2 values (1, 'AAA', 100, '2023-05-11', 'China');
insert into test_lakesoul_table_v2 values (2, 'BBB', 100, '2023-05-11', 'China');
insert into test_lakesoul_table_v2 values (3, 'AAA', 98, '2023-05-10', 'China');

select * from test_lakesoul_table_v2 limit 1;

drop table test_lakesoul_table_v2;
```

You can replace the location scheme `hdfs://` with `file://`.

For more details about Spark SQL with LakeSoul, refer to [Operate LakeSoulTable by Spark SQL](./03-spark-api-docs.md#7-operate-lakesoultable-by-spark-sql).
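Likewise, the Spark SQL engine can be reached from a plain JDBC client instead of beeline. The sketch below assumes `test_lakesoul_table_v2` exists (created as above and not yet dropped), that a HiveServer2-compatible JDBC driver is on the classpath, and the same placeholder host, port, and user; the extra row it inserts is purely illustrative.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class KyuubiSparkLakeSoulExample {
    public static void main(String[] args) throws Exception {
        // Same HiveServer2-compatible endpoint, but asking Kyuubi for a Spark SQL engine.
        String url = "jdbc:hive2://localhost:10009/default;user=admin;"
                + "?kyuubi.engine.type=SPARK_SQL";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {
            // LakeSoul is the default catalog per spark-defaults.conf, so bare table names work.
            stmt.executeUpdate(
                    "insert into test_lakesoul_table_v2 values (4, 'DDD', 90, '2023-05-12', 'China')");

            try (ResultSet rs = stmt.executeQuery(
                    "select id, name, score, date, region from test_lakesoul_table_v2 where score >= 95")) {
                while (rs.next()) {
                    System.out.printf("%d\t%s\t%d\t%s\t%s%n",
                            rs.getInt("id"), rs.getString("name"), rs.getInt("score"),
                            rs.getString("date"), rs.getString("region"));
                }
            }
        }
    }
}
```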