# Use Kyuubi JDBC to Access LakeSoul's Table

<!--
SPDX-FileCopyrightText: 2023 LakeSoul Contributors

SPDX-License-Identifier: Apache-2.0
-->

:::tip
Available since version 2.4.
:::

LakeSoul provides Flink and Spark connectors, so we can query LakeSoul tables with Spark SQL or Flink SQL through Kyuubi.

## Requirements

| Components | Version      |
|------------|--------------|
| Kyuubi     | 1.8          |
| Spark      | 3.3          |
| Flink      | 1.20         |
| LakeSoul   | VAR::VERSION |
| Java       | 1.8          |

The operating environment is Linux with Spark, Flink, and Kyuubi already installed. It is recommended to run the Kyuubi engines on Hadoop Yarn, but you can also start a local Spark/Flink cluster. See [Deploy Kyuubi engines on Yarn](https://kyuubi.readthedocs.io/en/v1.7.3/deployment/engine_on_yarn.html).

## Flink SQL Access to LakeSoul's Table

### 1. Dependencies

Download the LakeSoul Flink jar from: https://github.com/lakesoul-io/LakeSoul/releases/download/vVAR::VERSION/lakesoul-flink-1.20-VAR::VERSION.jar

Then put the jar file under `$FLINK_HOME/lib`.

### 2. Configurations

Set the PostgreSQL (PG) parameters for LakeSoul according to this document:
[Setup Metadata Database Connection for Flink](02-setup-spark.md#setup-metadata-database-connection-for-flink)

After this, you can start a Flink session cluster or application as usual.

### 3. LakeSoul Operations

Use Kyuubi beeline to connect to the Flink SQL engine:

```bash
$KYUUBI_HOME/bin/beeline -u 'jdbc:hive2://localhost:10009/default;user=admin;?kyuubi.engine.type=FLINK_SQL'
```

Access LakeSoul with Flink SQL:

```SQL
create catalog lakesoul with('type'='lakesoul');
use catalog lakesoul;
use `default`;

create table if not exists test_lakesoul_table_v1 (
    `id` INT,
    name STRING,
    score INT,
    `date` STRING,
    region STRING,
    PRIMARY KEY (`id`, `name`) NOT ENFORCED
) PARTITIONED BY (`region`, `date`) WITH (
    'connector'='lakeSoul',
    'use_cdc'='true',
    'format'='lakesoul',
    'path'='hdfs:///lakesoul-test-bucket/default/test_lakesoul_table_v1/',
    'hashBucketNum'='4'
);

insert into `lakesoul`.`default`.test_lakesoul_table_v1 values (1, 'AAA', 100, '2023-05-11', 'China');
insert into `lakesoul`.`default`.test_lakesoul_table_v1 values (2, 'BBB', 100, '2023-05-11', 'China');
insert into `lakesoul`.`default`.test_lakesoul_table_v1 values (3, 'AAA', 98, '2023-05-10', 'China');

select * from `lakesoul`.`default`.test_lakesoul_table_v1 limit 1;

drop table `lakesoul`.`default`.test_lakesoul_table_v1;
```

You can replace the location scheme `hdfs://` with `file://`.

For more details about Flink SQL with LakeSoul, refer to [Flink Lakesoul Connector](./06-flink-lakesoul-connector.md).
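The connection string used by beeline also works from any JDBC client. Below is a minimal Java sketch of querying the table through Kyuubi's JDBC endpoint with the Flink SQL engine; it assumes `test_lakesoul_table_v1` has been created as above (and not yet dropped), that a HiveServer2-compatible JDBC driver (for example `org.apache.hive:hive-jdbc`, or Kyuubi's Hive JDBC driver) is on the classpath, and that the host, port, and user from the beeline example apply.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class KyuubiFlinkLakeSoulExample {
    public static void main(String[] args) throws Exception {
        // Kyuubi exposes the HiveServer2 Thrift protocol, so a Hive JDBC driver on the
        // classpath is enough. Host, port, and user are placeholders from the beeline example.
        String url = "jdbc:hive2://localhost:10009/default;user=admin;"
                + "?kyuubi.engine.type=FLINK_SQL";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {
            // Register the LakeSoul catalog for this session (mirrors the SQL above).
            stmt.execute("create catalog lakesoul with('type'='lakesoul')");
            stmt.execute("use catalog lakesoul");
            stmt.execute("use `default`");

            try (ResultSet rs = stmt.executeQuery(
                    "select * from test_lakesoul_table_v1 limit 1")) {
                while (rs.next()) {
                    System.out.printf("id=%d, name=%s, score=%d, date=%s, region=%s%n",
                            rs.getInt("id"), rs.getString("name"), rs.getInt("score"),
                            rs.getString("date"), rs.getString("region"));
                }
            }
        }
    }
}
```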
## Spark SQL Access to LakeSoul's Table

### 1. Dependencies

Download the LakeSoul Spark jar from: https://github.com/lakesoul-io/LakeSoul/releases/download/vVAR::VERSION/lakesoul-spark-3.3-VAR::VERSION.jar

Then put the jar file under `$SPARK_HOME/jars`.

### 2. Configurations

1. Set the PostgreSQL (PG) parameters for LakeSoul according to this document:
   [Setup Metadata Database Connection for Spark](02-setup-spark.md#pass-lakesoul_home-environment-variable-to-your-spark-job)
2. Add the following Spark conf to `$SPARK_CONF_DIR/spark-defaults.conf`:

```
spark.sql.extensions=com.dmetasoul.lakesoul.sql.LakeSoulSparkSessionExtension
spark.sql.catalog.lakesoul=org.apache.spark.sql.lakesoul.catalog.LakeSoulCatalog
spark.sql.defaultCatalog=lakesoul
spark.sql.caseSensitive=false
spark.sql.legacy.parquet.nanosAsLong=false
```

### 3. LakeSoul Operations

Use Kyuubi beeline to connect to the Spark SQL engine:

```bash
$KYUUBI_HOME/bin/beeline -u 'jdbc:hive2://localhost:10009/default;user=admin;?kyuubi.engine.type=SPARK_SQL'
```

Access LakeSoul with Spark SQL:

```SQL
use default;

create table if not exists test_lakesoul_table_v2 (id INT, name STRING, score INT, date STRING, region STRING)
USING lakesoul
PARTITIONED BY (region, date)
LOCATION 'hdfs:///lakesoul-test-bucket/default/test_lakesoul_table_v2/';

insert into test_lakesoul_table_v2 values (1, 'AAA', 100, '2023-05-11', 'China');
insert into test_lakesoul_table_v2 values (2, 'BBB', 100, '2023-05-11', 'China');
insert into test_lakesoul_table_v2 values (3, 'AAA', 98, '2023-05-10', 'China');

select * from test_lakesoul_table_v2 limit 1;

drop table test_lakesoul_table_v2;
```

You can replace the location scheme `hdfs://` with `file://`.

For more details about Spark SQL with LakeSoul, refer to [Operate LakeSoulTable by Spark SQL](./03-spark-api-docs.md#7-operate-lakesoultable-by-spark-sql).
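Likewise, the Spark SQL engine can be reached from a plain JDBC client instead of beeline. The sketch below assumes `test_lakesoul_table_v2` exists (created as above and not yet dropped), that a HiveServer2-compatible JDBC driver is on the classpath, and the same placeholder host, port, and user; the extra row it inserts is purely illustrative.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class KyuubiSparkLakeSoulExample {
    public static void main(String[] args) throws Exception {
        // Same HiveServer2-compatible endpoint, but asking Kyuubi for a Spark SQL engine.
        String url = "jdbc:hive2://localhost:10009/default;user=admin;"
                + "?kyuubi.engine.type=SPARK_SQL";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {
            // LakeSoul is the default catalog per spark-defaults.conf, so bare table names work.
            stmt.executeUpdate(
                    "insert into test_lakesoul_table_v2 values (4, 'DDD', 90, '2023-05-12', 'China')");

            try (ResultSet rs = stmt.executeQuery(
                    "select id, name, score, date, region from test_lakesoul_table_v2 where score >= 95")) {
                while (rs.next()) {
                    System.out.printf("%d\t%s\t%d\t%s\t%s%n",
                            rs.getInt("id"), rs.getString("name"), rs.getInt("score"),
                            rs.getString("date"), rs.getString("region"));
                }
            }
        }
    }
}
```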