# Tasks

![task](../../images/task.png#only-light)
![task](../../images/task-dark.png#only-dark)

Workflows execute tasks. Tasks are callable objects with a number of parameters to control the processing of data at a given step. While similar to pipelines, tasks encapsulate processing and don't perform significant transformations on their own. Tasks perform logic to prepare content for the underlying action(s).

A simple task is shown below.

```python
from txtai.workflow import Task
Task(lambda x: [y * 2 for y in x])
```

The task above passes its input elements to the lambda function, which doubles each element.
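To make the batch semantics concrete, here is a minimal plain-Python sketch of a task-like wrapper. `SimpleTask` and `double` are hypothetical names used only for illustration; this is not txtai's `Task` implementation. It shows that the action is called once with the whole batch rather than once per element.

```python
from typing import Callable, List


class SimpleTask:
    """Hypothetical, simplified task: wraps a callable action (not txtai's Task)."""

    def __init__(self, action: Callable[[List], List]):
        self.action = action

    def __call__(self, elements: List) -> List:
        # The action receives the whole batch at once, not one element at a time
        return self.action(elements)


calls = []


def double(batch: List[int]) -> List[int]:
    calls.append(len(batch))  # record the size of each batch the action receives
    return [y * 2 for y in batch]


task = SimpleTask(double)
result = task([1, 2, 3])
print(result)  # [2, 4, 6]
print(calls)   # [3]: a single call covering the whole batch
```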

Tasks work well with pipelines, since pipelines are callable objects. The example below summarizes each input element.

```python
from txtai.pipeline import Summary
from txtai.workflow import Task
summary = Summary()
Task(summary)
```

Tasks can operate independently but work best with workflows, as workflows add large-scale stream processing.

```python
from txtai.pipeline import Summary
from txtai.workflow import Task, Workflow

# Run a task directly on a list of inputs
summary = Summary()
task = Task(summary)
task(["Very long text here"])

# Run the same task within a workflow; the workflow returns a generator
workflow = Workflow([task])
list(workflow(["Very long text here"]))
```
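The stream-processing idea can be modeled with a generator that pulls fixed-size batches from an iterable and passes each batch through every task in order. `run_workflow` and its `batch_size` argument are hypothetical and only sketch the concept under that assumption; txtai's `Workflow` has its own batching behavior, which this code does not reproduce.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def run_workflow(tasks: List[Callable[[list], list]], stream: Iterable, batch_size: int = 2) -> Iterator:
    """Hypothetical sketch: pull batches from a stream and pass each batch through all tasks."""
    iterator = iter(stream)
    while True:
        # Read at most batch_size elements without materializing the whole stream
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        # The output of each task becomes the input of the next task
        for task in tasks:
            batch = task(batch)
        # Yield results lazily, one element at a time
        yield from batch


def double(x: list) -> list:
    return [y * 2 for y in x]


def increment(x: list) -> list:
    return [y + 1 for y in x]


print(list(run_workflow([double, increment], range(5))))  # [1, 3, 5, 7, 9]
```

Because batches are read lazily, memory use depends on the batch size rather than on the length of the stream.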
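
Tasks can also be created with configuration as part of a workflow. The configuration below creates a `summary` pipeline and a workflow named `summarize` with a single task that runs it.

```yaml
summary:

workflow:
  summarize:
    tasks:
      - action: summary
```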

::: txtai.workflow.Task.__init__

## Multi-action task concurrency

The default processing mode is to run actions sequentially. Concurrency is already built in at a number of levels: GPU models maximize GPU utilization, and concurrency is used even in CPU mode. There are still use cases for task action concurrency, however, such as when the system has multiple GPUs, the task runs external sequential code, or the task performs a large number of I/O operations.

In addition to sequential processing, multi-action tasks can run either multithreaded or with multiple processes. The advantages of each approach are discussed below.

- *multithreading* - no overhead of creating separate processes or pickling data. However, Python can only execute one thread at a time due to the GIL, so this approach won't help with CPU-bound actions. This method works well with I/O-bound actions and GPU actions.

- *multiprocessing* - separate subprocesses are created and data is exchanged via pickling. This method can fully utilize all CPU cores since each process runs independently. This method works well with CPU-bound actions.

More information on multiprocessing can be found in the [Python documentation](https://docs.python.org/3/library/multiprocessing.html).
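As a rough, standard-library illustration of thread-based concurrency for a multi-action task, the sketch below runs several actions over the same batch with `concurrent.futures.ThreadPoolExecutor` and returns their outputs in action order. `run_actions`, `slow_double` and `slow_negate` are hypothetical names, and this is not txtai code. Within txtai, the mode is selected through the `Task` `concurrency` argument (`"thread"` or `"process"`), as listed in the `Task.__init__` reference. Swapping in `ProcessPoolExecutor` gives the process-based variant, which requires picklable, module-level functions.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_actions(actions: List[Callable[[list], list]], batch: list) -> List[list]:
    """Run every action on the same batch in its own thread; return outputs in action order."""
    with ThreadPoolExecutor(max_workers=max(1, len(actions))) as executor:
        # submit() schedules each action; the futures list keeps action order
        futures = [executor.submit(action, batch) for action in actions]
        # result() waits for each action, so output order matches action order
        return [future.result() for future in futures]


def slow_double(x: list) -> list:
    time.sleep(0.2)  # simulate I/O-bound work; sleeping releases the GIL
    return [y * 2 for y in x]


def slow_negate(x: list) -> list:
    time.sleep(0.2)
    return [-y for y in x]


start = time.perf_counter()
outputs = run_actions([slow_double, slow_negate], [1, 2, 3])
elapsed = time.perf_counter() - start
print(outputs)  # [[2, 4, 6], [-1, -2, -3]]
# The two sleeps overlap, so elapsed is typically near 0.2s rather than 0.4s
```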

## Multi-action task merges

Multi-action tasks generate parallel outputs for the input data. The task outputs can be merged together in several ways.
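To make the merge modes concrete, the functions below approximate them in plain Python over the outputs of two actions. They are hypothetical sketches written for illustration, not txtai's implementations. The `". "` separator in `concat` and the exact output types are assumptions, and the reference documentation below is authoritative.

```python
from typing import Any, List


def hstack(outputs: List[list]) -> List[tuple]:
    # Column-wise: one tuple per input row, holding each action's output for that row
    return list(zip(*outputs))


def vstack(outputs: List[list]) -> List[Any]:
    # Row-wise, one-to-many: each action's output for a row becomes its own output row
    return [value for row in zip(*outputs) for value in row]


def concat(outputs: List[list], separator: str = ". ") -> List[str]:
    # Column-wise, joined into one string per input row (separator is an assumption)
    return [separator.join(str(value) for value in row) for row in zip(*outputs)]


# Outputs of two actions (doubling and negating) applied to the input [1, 2]
outputs = [[2, 4], [-1, -2]]
print(hstack(outputs))  # [(2, -1), (4, -2)]
print(vstack(outputs))  # [2, -1, 4, -2]
print(concat(outputs))  # ['2. -1', '4. -2']
```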

### ::: txtai.workflow.Task.hstack
### ::: txtai.workflow.Task.vstack
### ::: txtai.workflow.Task.concat

## Extract task output columns

With column-wise merging, each output row is a tuple containing the output value of each task action. This output can be fed as input to a downstream task, and that task can have separate actions work with each element.

A simple example:

```python
from txtai.workflow import Task, Workflow

workflow = Workflow([Task(lambda x: [y * 3 for y in x], unpack=False, column=0)])
list(workflow([(2, 8)]))
```

For the example input tuple of (2, 8), the workflow selects only the first element (2) and runs the task against that element.
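The column selection step can be modeled in plain Python. `select_column`, `run_on_column` and `triple` are hypothetical helpers, not part of txtai; they only mirror the behavior described above, where one position is pulled out of each tuple before the action runs.

```python
from typing import Any, Callable, List, Sequence


def select_column(rows: List[Sequence[Any]], column: int) -> List[Any]:
    # Keep only the element at position `column` from each input row
    return [row[column] for row in rows]


def run_on_column(action: Callable[[list], list], rows: List[Sequence[Any]], column: int) -> list:
    # Extract the column first, then pass the selected values to the action as one batch
    return action(select_column(rows, column))


def triple(x: list) -> list:
    return [y * 3 for y in x]


print(run_on_column(triple, [(2, 8)], 0))  # [6]
```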
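
```python
from txtai.workflow import Task, Workflow

# Action 0 reads column 0 and action 1 reads column 1
task = Task([lambda x: [y * 3 for y in x], lambda x: [y - 1 for y in x]],
            unpack=False, column={0: 0, 1: 1})
workflow = Workflow([task])
list(workflow([(2, 8)]))
```

The example above applies a separate action to each input column: the first action triples the first column (2) and the second action subtracts 1 from the second column (8). This simple construct can help build extremely powerful workflow graphs!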
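A plain-Python model of the per-action column mapping is sketched below. `run_columns`, `triple` and `decrement` are hypothetical names. The sketch assumes the mapping keys are action indexes and the values are input column indexes, as in `column={0: 0, 1: 1}`, and it combines outputs column-wise into one tuple per input row, in the style of `hstack`.

```python
from typing import Any, Callable, Dict, List, Sequence


def run_columns(actions: List[Callable[[list], list]],
                rows: List[Sequence[Any]],
                column: Dict[int, int]) -> List[tuple]:
    outputs = []
    for index, action in enumerate(actions):
        # Each action only sees the input column mapped to its index
        values = [row[column[index]] for row in rows]
        outputs.append(action(values))
    # Combine the per-action outputs into one tuple per input row
    return list(zip(*outputs))


def triple(x: list) -> list:
    return [y * 3 for y in x]


def decrement(x: list) -> list:
    return [y - 1 for y in x]


print(run_columns([triple, decrement], [(2, 8)], {0: 0, 1: 1}))  # [(6, 7)]
```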