# Snapshot Related API Usage Tutorial

<!--
SPDX-FileCopyrightText: 2023 LakeSoul Contributors

SPDX-License-Identifier: Apache-2.0
-->

LakeSoul uses snapshots to record each updated file set and generates a new version number in the metadata. As long as a historical snapshot version has not been cleaned up, it can still be read, rolled back to, or cleaned up through the LakeSoul API. Since snapshot versions are an internal mechanism, LakeSoul provides timestamp-based snapshot management APIs for convenience.

For snapshot reads in Flink SQL, please refer to [Flink Connector](../03-Usage%20Docs/06-flink-lakesoul-connector.md).

## Snapshot Read
In some cases, you may need to query the snapshot data of a table partition as of a previous point in time, also known as Time Travel. LakeSoul reads a snapshot at a point in time as follows:
```scala
import com.dmetasoul.lakesoul.tables.LakeSoulTable
import org.apache.spark.sql._

val spark = SparkSession.builder.master("local")
  .config("spark.sql.extensions", "com.dmetasoul.lakesoul.sql.LakeSoulSparkSessionExtension")
  .getOrCreate()

val tablePath = "s3a://bucket-name/table/path/is/also/table/name"
// Read the 'date=2022-01-02' partition from the latest snapshot whose timestamp is
// less than or equal to '2022-01-01 15:15:15'
val lakeSoulTable = LakeSoulTable.forPathSnapshot(tablePath, "date=2022-01-02", "2022-01-01 15:15:15")
```
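
The handle returned by `forPathSnapshot` can then be queried like any other table. A minimal sketch, assuming the snapshot handle exposes the same `toDF` method as `forPath` handles do (an assumption for illustration):

```scala
// Hypothetical illustration: open a partition snapshot and inspect it as a DataFrame.
val snapshotTable = LakeSoulTable.forPathSnapshot(
  "s3a://bucket-name/table/path/is/also/table/name",
  "date=2022-01-02",
  "2022-01-01 15:15:15")

// Query the snapshot just like a normal table; writes committed to the table
// after the snapshot timestamp are not visible here.
snapshotTable.toDF.show()
```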
26  
## Snapshot Rollback
Sometimes, due to errors in newly written data, it is necessary to roll back to a historical snapshot version. To roll back to a snapshot at a point in time with LakeSoul:
```scala
import com.dmetasoul.lakesoul.tables.LakeSoulTable
import org.apache.spark.sql._

val spark = SparkSession.builder.master("local")
  .config("spark.sql.extensions", "com.dmetasoul.lakesoul.sql.LakeSoulSparkSessionExtension")
  .getOrCreate()

val tablePath = "s3a://bucket-name/table/path/is/also/table/name"
val lakeSoulTable = LakeSoulTable.forPath(tablePath)

// Roll back the metadata and storage data of the 'date=2022-01-02' partition to the
// latest snapshot whose timestamp is less than or equal to '2022-01-01 15:15:15'
lakeSoulTable.rollbackPartition("date='2022-01-02'", "2022-01-01 15:15:15")

// The same rollback via SQL, addressing the table by path or by name
spark.sql("call LakeSoulTable.rollback(partitionvalue=>map('date','2022-01-02'),toTime=>'2022-01-01 15:15:15',tablePath=>'" + tablePath + "')")
spark.sql("call LakeSoulTable.rollback(partitionvalue=>map('date','2022-01-02'),toTime=>'2022-01-01 15:15:15',tzoneId=>'Asia/Shanghai',tableName=>'lakesoul')")
```
The rollback operation itself creates a new snapshot version; other snapshot versions and their data are not deleted.
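
Because a rollback only appends a new snapshot, the current state of the partition can be read back immediately to confirm the result. A hedged sketch, reusing the `lakeSoulTable` handle from the example above:

```scala
// After the rollback, a normal read of the partition returns the data as of
// the rolled-back snapshot; earlier snapshots remain available until cleaned up.
val current = lakeSoulTable.toDF.filter("date = '2022-01-02'")
current.show()
```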
47  
## Snapshot Cleanup
If historical snapshots are no longer needed, for example after a compaction has been executed, you can call the cleanup method to remove snapshot data earlier than a given point in time:
```scala
import com.dmetasoul.lakesoul.tables.LakeSoulTable
import org.apache.spark.sql._

val spark = SparkSession.builder.master("local")
  .config("spark.sql.extensions", "com.dmetasoul.lakesoul.sql.LakeSoulSparkSessionExtension")
  .getOrCreate()

val tablePath = "s3a://bucket-name/table/path/is/also/table/name"
val lakeSoulTable = LakeSoulTable.forPath(tablePath)

// Clean up the metadata and storage data of the 'date=2022-01-02' partition
// that are earlier than '2022-01-01 15:15:15'
lakeSoulTable.cleanupPartitionData("date='2022-01-02'", "2022-01-01 15:15:15")
```
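
Since cleanup permanently removes older snapshot data, it is usually paired with a prior compaction so that the latest data is consolidated before history is dropped. A sketch, assuming the partition-level `compaction` API described in LakeSoul's Spark usage docs:

```scala
// Compact the partition first so that the current data no longer depends on
// the incremental files belonging to old snapshots...
lakeSoulTable.compaction("date='2022-01-02'")
// ...then drop snapshot data older than the given timestamp.
lakeSoulTable.cleanupPartitionData("date='2022-01-02'", "2022-01-01 15:15:15")
```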