---
title: MongoDB Integration
titleShort: MongoDB
description: "Sync onchain data to your MongoDB database using Apibara."
priority: 699
updatedAt: 2023-08-29 15:00
---

# MongoDB integration

The MongoDB integration mirrors onchain data to a MongoDB collection of your
choice. Data is inserted automatically as it's produced by the chain, and it's
invalidated in case of chain reorganizations.

 - Populate a collection with **data from one or more networks or smart
   contracts**.
 - Build powerful analytics with MongoDB aggregation pipelines.
 - Change how collections are queried without re-indexing.

### Installation

```bash
apibara plugins install sink-mongo
```

### Configuration

 - `connectionString: string`: the MongoDB connection URL of your database.
 - `database: string`: the target database name.
 - `collectionName: string`: the target collection name.
 - `collectionNames: string[]`: a list of target collection names. See the
   "Multiple Collections" section for more information.
 - `entityMode: boolean`: enable entity mode. See the "Entity storage" section
   for more information.

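As a sketch, an indexer script might set these options as follows. Everything outside `sinkOptions` (stream URL, starting block, network, filter) is illustrative and depends on your setup; `sinkType: "mongo"` selects this integration.

```typescript
// Sketch of an indexer configuration using the MongoDB sink.
// Values outside `sinkOptions` are placeholders for your own setup.
export const config = {
  streamUrl: "https://mainnet.starknet.a5a.ch",
  startingBlock: 1_000,
  network: "starknet",
  filter: { header: { weak: true } },
  sinkType: "mongo",
  sinkOptions: {
    connectionString: "mongodb://localhost:27017",
    database: "example",
    collectionName: "transfers",
  },
};
```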
### Collection schema

The transformation step is required to return an array of objects. Each object
is converted to BSON and then written to the collection. The MongoDB
integration adds a `_cursor` field to every document so that data can be
invalidated in case of chain reorganizations.

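For illustration, a default-mode transform simply returns the documents to insert, one per item. This is a sketch: the `Block` shape and its fields below are hypothetical, not the integration's exact types.

```typescript
// Sketch of a default-mode transform: return one document per event.
// The `Block` type here is illustrative, not the integration's actual type.
type Block = {
  header: { blockNumber: number };
  events: { from: string; to: string; amount: string }[];
};

export default function transform(block: Block) {
  // Each returned object becomes one BSON document in the collection;
  // the integration adds the `_cursor` field automatically.
  return block.events.map((event) => ({
    blockNumber: block.header.blockNumber,
    from: event.from,
    to: event.to,
    amount: event.amount,
  }));
}
```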
### Querying data

When querying data, always add the following property to your MongoDB filter
to ensure you get the latest value:

```ts
{
    "_cursor.to": null,
}
```

The "Storage & Data Invalidation" section at the end of this document explains
why you need to add this condition to your filter.

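As an illustration, a small helper (hypothetical, not part of the integration) can merge this condition into any filter before querying:

```typescript
// Hypothetical helper: merge the `_cursor.to: null` condition into a filter so
// queries only match documents that are valid at the most recent block.
type Filter = Record<string, unknown>;

function latestOnly(filter: Filter): Filter {
  return { ...filter, "_cursor.to": null };
}

// Example: find the current owner of token "1".
const filter = latestOnly({ contract: "0xCONTRACT", tokenId: "1" });
// filter is { contract: "0xCONTRACT", tokenId: "1", "_cursor.to": null }
```

Pass the resulting filter to `find` or `findOne` as usual.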
### Entity storage

The MongoDB integration works with two types of data:

 - Immutable logs (default): the values returned by the indexer represent
   something that doesn't change over time, in other words a list of items.
   For example, a list of token transfers.
 - Mutable entities: the values returned by the indexer represent the state of
   an entity at a given block. For example, token balances change block by
   block as users transfer the token.

You can index entities by setting the `entityMode` option to `true`. When you
set this option, the indexer expects the transform function to return a list of
update operations. An update operation is a JavaScript object with an `entity`
property used to select which entities should be updated, and an `update`
property containing either an [update
document](https://www.mongodb.com/docs/manual/reference/method/db.collection.updateMany/#std-label-updateMany-behavior-update-expressions)
or an [aggregation
pipeline](https://www.mongodb.com/docs/manual/reference/method/db.collection.updateMany/#std-label-updateMany-behavior-aggregation-pipeline).

**Example**: our indexer tracks token ownership for an ERC-721 smart contract,
together with the number of transactions for each token.
We enable entity storage by setting the `entityMode` option to `true`.
The transform function returns the entities that need updating, together with
the operation that updates their state.

```ts
export default function transform(block: Block) {
  // Example to show the shape of the data returned by transform.
  return [
    {
      entity: { contract, tokenId: "1" },
      update: { "$set": { owner: "0xA" }, "$inc": { "txCount": 1 } }
    },
    {
      entity: { contract, tokenId: "2" },
      update: { "$set": { owner: "0xB" }, "$inc": { "txCount": 3 } }
    },
  ];
}
```

The integration iterates through the returned operations and updates the
existing values (if any) using the following MongoDB pseudo-query:

```ts
for (const doc of returnValue) {
  db.collection.updateMany(doc.entity, doc.update, { upsert: true });
}
```

In reality the query is more complex; refer to the next section to learn how
the MongoDB integration stores its data.

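To build intuition, here is a minimal in-memory sketch (not the integration's actual code) of how an update operation with `$set` and `$inc` changes an entity's state, assuming upsert semantics:

```typescript
// In-memory sketch of applying `$set`/`$inc` update operations to an entity.
// This only illustrates the semantics; the integration runs real MongoDB queries.
type Entity = Record<string, unknown>;
type Update = { $set?: Record<string, unknown>; $inc?: Record<string, number> };

function applyUpdate(entity: Entity | undefined, update: Update): Entity {
  // Upsert: start from the existing entity, or an empty document.
  const result: Entity = { ...(entity ?? {}) };
  for (const [key, value] of Object.entries(update.$set ?? {})) {
    result[key] = value;
  }
  for (const [key, amount] of Object.entries(update.$inc ?? {})) {
    result[key] = ((result[key] as number) ?? 0) + amount;
  }
  return result;
}

// The first update inserts the entity, the second one modifies it.
let token = applyUpdate(undefined, { $set: { owner: "0xA" }, $inc: { txCount: 1 } });
token = applyUpdate(token, { $set: { owner: "0xB" }, $inc: { txCount: 3 } });
// token is { owner: "0xB", txCount: 4 }
```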
### Multiple Collections

You can write to multiple collections at the same time using the
`collectionNames` option. In this case, the transform function must specify
which collection to write each piece of data to:

 - In standard mode, wrap the returned data in a `data` key and add a
   `collection` key, so each returned value looks like
   `{ data: any, collection: string }`.
 - In entity mode, add a `collection` key to the returned object, so it looks
   like `{ entity: any, collection: string, update: any }`.

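As a sketch of the standard-mode shape, a transform could route different event kinds to different collections. The routing rule and event shapes below are hypothetical; the corresponding option would be `collectionNames: ["transfers", "approvals"]`.

```typescript
// Hypothetical standard-mode routing: one collection per event kind.
type SinkRecord = { data: Record<string, unknown>; collection: string };

function routeEvents(events: { kind: string; [k: string]: unknown }[]): SinkRecord[] {
  return events.map(({ kind, ...data }) => ({
    // Each returned value carries the target collection alongside the data.
    collection: kind === "transfer" ? "transfers" : "approvals",
    data,
  }));
}

const records = routeEvents([
  { kind: "transfer", from: "0xA", to: "0xB" },
  { kind: "approval", owner: "0xA", spender: "0xC" },
]);
// records[0].collection is "transfers", records[1].collection is "approvals"
```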
### Storage & Data Invalidation

Storing blockchain data poses an additional challenge: we must be able to roll
back the database state in case of chain reorganizations.
This integration adds a `_cursor` field to all documents to track the block
range for which a piece of data is valid.

```ts
type Cursor = {
    /** Block (inclusive) at which this piece of data was created. */
    from: number,
    /** Block (exclusive) at which this piece of data became invalid. */
    to: number | null,
};
```

It follows that a document is valid at the most recent block if its
`_cursor.to` field is `null`.

**Example**: we're indexing an ERC-721 token with the following transfers:

 - block 1000: transfer from 0x0 to 0xA
 - block 1010: transfer from 0xA to 0xB
 - block 1020: transfer from 0xB to 0xC

If we put the token ownership on a timeline, it looks like the following diagram.

```txt
1000                    1010                  1020
--+-----------------------+---------------------+---- - - - - - - -
  [ { owner: "0xA" }      )
                          [ { owner: "0xB" }    )
                                                [ { owner: "0xC" }
```

This translates to the following documents in the MongoDB collection.

After the first transfer:

```json
[
    { "owner": "0xA", "_cursor": { "from": 1000, "to": null } }
]
```

After the second transfer:

```json
[
    { "owner": "0xA", "_cursor": { "from": 1000, "to": 1010 } },
    { "owner": "0xB", "_cursor": { "from": 1010, "to": null } }
]
```

And after the third transfer:

```json
[
    { "owner": "0xA", "_cursor": { "from": 1000, "to": 1010 } },
    { "owner": "0xB", "_cursor": { "from": 1010, "to": 1020 } },
    { "owner": "0xC", "_cursor": { "from": 1020, "to": null } }
]
```
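
Putting it together, here is a minimal in-memory sketch (not the integration's actual queries) of how a reorganization back to block `n` is handled: documents created at or after `n` are removed, and documents invalidated at or after `n` become valid again.

```typescript
// In-memory sketch of reorg invalidation. On a reorg back to block `n`:
// drop documents created at or after `n`, and restore (set `to` back to null)
// documents that were invalidated at or after `n`.
type Doc = { owner: string; _cursor: { from: number; to: number | null } };

function invalidate(docs: Doc[], n: number): Doc[] {
  return docs
    .filter((doc) => doc._cursor.from < n)
    .map((doc) =>
      doc._cursor.to !== null && doc._cursor.to >= n
        ? { ...doc, _cursor: { ...doc._cursor, to: null } }
        : doc,
    );
}

// Using the collection state after the third transfer above:
const docs: Doc[] = [
  { owner: "0xA", _cursor: { from: 1000, to: 1010 } },
  { owner: "0xB", _cursor: { from: 1010, to: 1020 } },
  { owner: "0xC", _cursor: { from: 1020, to: null } },
];

// Reorg back to block 1015: 0xC disappears and 0xB becomes current again.
const afterReorg = invalidate(docs, 1015);
// afterReorg is [
//   { owner: "0xA", _cursor: { from: 1000, to: 1010 } },
//   { owner: "0xB", _cursor: { from: 1010, to: null } },
// ]
```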