The task management system is responsible for deciding when tasks should be scheduled to run. It considers creation time, job dependencies, and capacity when choosing tasks to execute.
System Components
The task management system consists of three separate components:
- Dependency Manager
- Task Manager
- Workflow Manager
Scheduling Considerations
When choosing a task to run, the system considers:
- Creation time: Earlier tasks are prioritized
- Job dependencies: Dependent tasks wait for prerequisites
- Capacity: Available resources on execution nodes
Dependency Manager
Purpose
Responsible for looking at each pending task and determining whether it should create a dependency for that task.
Example: Update on Launch
If scm_update_on_launch is enabled for a project, a project update will be created as a dependency when a job using that project is launched.
Dependency Chain
Dependencies can have their own dependencies. For example, an inventory source update can itself depend on a project update.
Dependency Manager Steps
- Get pending tasks (parent tasks) that have dependencies_processed = False
- Cache related objects as optimization:
  - Related projects
  - Related inventory sources
- Create dependencies when needed:
  - Project or inventory update not already created
  - Last update failed
  - Last update outside cache timeout window
  - Additional logic for inventory updates
- Link dependencies to parent task:
  - Use dependent_jobs field
  - Allows canceling parent if dependency fails
- Mark dependencies processed:
  - Update parent tasks with dependencies_processed = True
- Check nested dependencies:
  - Inventory source updates can have project update dependencies
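The steps above can be sketched roughly as follows. This is a simplified illustration, not the AWX implementation: the `Task` class, the `update_needed` callback, and the way updates are reused within one pass are all assumptions made for the example. Only the field names dependencies_processed and dependent_jobs come from the text.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Task:
    # Hypothetical stand-in for a pending job; not the real AWX model.
    name: str
    project: Optional[str] = None
    update_on_launch: bool = False
    dependencies_processed: bool = False
    dependent_jobs: List["Task"] = field(default_factory=list)


def process_dependencies(pending: List[Task],
                         update_needed: Callable[[str], bool]) -> List[Task]:
    """Create and link dependencies for tasks that are not yet processed."""
    created = {}  # project name -> update task created during this pass
    for task in pending:
        if task.dependencies_processed:
            continue
        if task.project and task.update_on_launch and update_needed(task.project):
            # Only create an update if one was not already created.
            dep = created.get(task.project)
            if dep is None:
                dep = Task(name=f"update:{task.project}",
                           dependencies_processed=True)
                created[task.project] = dep
            # Link the dependency to the parent task.
            task.dependent_jobs.append(dep)
        # Mark the parent so the Task Manager will consider it.
        task.dependencies_processed = True
    return list(created.values())
```

Two jobs that use the same project share a single update in this sketch, which mirrors the "not already created" condition in the list.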
Update on Launch Logic
Projects and inventory sources marked as “update on launch” trigger updates when related job templates are launched. Rules:
- Update triggered when related job template is launched
- Update not triggered if:
  - Recent update exists
  - Last update finished successfully
  - Finished time within configured cache window
- Failed updates always trigger new update
- “update on launch” jobs have launch_type of dependent
- If dependent job fails, related jobs also fail
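The cache-window rule can be expressed as a small predicate. The function below is a minimal sketch under stated assumptions: the name `needs_update`, its parameters, and the status strings passed in are illustrative and are not taken from the AWX code base.

```python
from datetime import datetime, timedelta
from typing import Optional


def needs_update(last_status: Optional[str],
                 last_finished: Optional[datetime],
                 cache_timeout_seconds: int,
                 now: datetime) -> bool:
    """Return True if a new update should be created on launch."""
    if last_status is None or last_finished is None:
        return True  # no previous update exists
    if last_status != "successful":
        return True  # failed updates always trigger a new update
    # A successful update is reused only while it is inside the cache window.
    return now - last_finished > timedelta(seconds=cache_timeout_seconds)
```

For example, with a 60-second window, a successful update that finished 30 seconds ago is reused, while one that finished 90 seconds ago triggers a new update.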
Task Manager
Purpose
Responsible for looking at each pending task and determining whether Task Manager can start that task.
Task Manager Steps
- Get tasks that have dependencies_processed = True:
  - Pending tasks
  - Waiting tasks
  - Running tasks
- Process running tasks first:
  - Build dependency graph
  - Account for currently consumed capacity
  - Track capacity in-memory:
    - TaskManagerInstances: Instance capacity tracking
    - TaskManagerInstanceGroups: Group capacity tracking
- For each pending task:
  - Check if total tasks started this cycle > start_task_limit
  - Check if task has timed out
  - Check if task is blocked (by dependencies or concurrency rules)
  - Check if preferred instances have enough capacity
- Start the task:
  - Change status to waiting
  - Submit task to dispatcher (via pg_notify)
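A single cycle over the pending tasks might look like the sketch below. It is a simplification under assumptions: tasks are plain dictionaries, capacity is one number rather than per-instance tracking, the timeout check and the pg_notify submission are omitted, and the loop stops as soon as the number of started tasks reaches `start_task_limit`.

```python
def run_cycle(pending, remaining_capacity, start_task_limit, is_blocked):
    """Start as many pending tasks as limits, blocking, and capacity allow.

    pending: list of dicts with "name", "created", "impact", "status" keys
    (a hypothetical shape chosen for this example).
    """
    started = []
    # Earlier tasks are considered first.
    for task in sorted(pending, key=lambda t: t["created"]):
        if len(started) >= start_task_limit:
            break  # per-cycle limit reached
        if is_blocked(task):
            continue  # blocked by dependencies or concurrency rules
        if task["impact"] > remaining_capacity:
            continue  # not enough capacity for this task
        remaining_capacity -= task["impact"]
        task["status"] = "waiting"  # the real system then notifies the dispatcher
        started.append(task["name"])
    return started, remaining_capacity
```

A task that does not fit is skipped rather than stopping the loop, so a smaller task created later can still start in the same cycle.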
Blocking Logic
Hard blocking: Database-backed via dependent_jobs field
- Job A will not run if any of its dependent_jobs are still running
- Represented in database
Soft blocking: Determined via Dependency Graph
- No database representation
- Example: Job A and Job B based on same template with allow_simultaneous disabled
- Job B blocked if Job A is running
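The two kinds of blocking can be contrasted in a short sketch. The `DependencyGraph` class here is a heavily simplified, hypothetical stand-in that only tracks running job templates, and treating pending and waiting dependencies as still blocking is an assumption of this example.

```python
ACTIVE = {"pending", "waiting", "running"}  # assumption: any unfinished status blocks


def is_hard_blocked(job, status_by_name):
    """Hard blocking: a job waits while any of its dependent_jobs is unfinished."""
    return any(status_by_name[dep] in ACTIVE for dep in job["dependent_jobs"])


class DependencyGraph:
    """In-memory record of what is currently running (no database state)."""

    def __init__(self):
        self.running_templates = set()

    def mark_running(self, job):
        self.running_templates.add(job["template"])

    def is_soft_blocked(self, job):
        # Blocked when the same template is already running and
        # simultaneous runs are not allowed.
        return (not job["allow_simultaneous"]
                and job["template"] in self.running_templates)
```

Because the graph is rebuilt from the running tasks on each cycle, soft blocks disappear on their own once the running job finishes.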
Task Manager Rules
- Groups of blocked tasks run in chronological order
- Tasks run when capacity available (one job always allowed per instance group)
- Only one Project Update per Project at a time
- Only one Inventory Update per Inventory Source at a time
- Only one Job per Job Template at a time (unless allow_simultaneous is enabled)
- Only one System Job at a time
Node Affinity Decider
The Task Manager decides which exact node a job will run on. Decision process:
- Construct set of groups where job can run
  - Consider user-configured group execution policy
  - Consider user-configured capacity
- Traverse groups to find suitable node
  - First choice: Node with largest remaining capacity that can fit the job
  - Fallback: Largest idle node, even if job exceeds capacity
  - This allows instances to exceed capacity limits when necessary
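The node selection rule for a single group can be sketched as below. The dictionary shape and the function name `pick_node` are assumptions for illustration; group construction and execution policy are left out.

```python
def pick_node(nodes, impact):
    """Choose a node name for a job with the given impact, or None.

    nodes: list of dicts with "name", "capacity", "consumed" keys
    (a hypothetical shape chosen for this example).
    """
    def remaining(node):
        return node["capacity"] - node["consumed"]

    # First choice: the node with the largest remaining capacity that fits the job.
    fits = [n for n in nodes if remaining(n) >= impact]
    if fits:
        return max(fits, key=remaining)["name"]

    # Fallback: the largest idle node, even if the job exceeds its capacity.
    idle = [n for n in nodes if n["consumed"] == 0]
    if idle:
        return max(idle, key=lambda n: n["capacity"])["name"]

    return None  # nothing suitable in this group
```

The fallback is what lets an oversized job run at all: without it, a job whose impact exceeds every node's total capacity would stay pending forever.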
Workflow Manager
Purpose
Responsible for looking at each workflow job and determining if the next node can run.
Workflow Manager Steps
- Get all running workflow jobs
- Build workflow DAG for each workflow job:
  - Directed Acyclic Graph of workflow nodes
  - Represents workflow structure
- For each workflow job:
  - Check if timed out
  - Check if next node can start based on:
    - Previous node status
    - Success/failure/always logic
    - Convergence rules
- Create and start new tasks:
  - Create task for next workflow node
  - Signal start
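The "can the next node start" check can be illustrated with a small DAG walk. This sketch makes several assumptions that are not stated in the text: edges are `(parent, child, condition)` tuples, a node waits until all of its parents have finished, and it then runs if at least one incoming condition is satisfied. Marking unreachable nodes as skipped is omitted.

```python
FINISHED = {"successful", "failed"}


def edge_satisfied(parent_status, condition):
    """Evaluate one success/failure/always link."""
    if condition == "always":
        return parent_status in FINISHED
    if condition == "success":
        return parent_status == "successful"
    if condition == "failure":
        return parent_status == "failed"
    raise ValueError(f"unknown condition: {condition}")


def ready_nodes(edges, statuses):
    """Return nodes that have not started yet and are allowed to start.

    statuses maps node name -> status string, or None if not started.
    """
    incoming = {}
    for parent, child, condition in edges:
        incoming.setdefault(child, []).append((parent, condition))

    ready = []
    for node, status in statuses.items():
        if status is not None:
            continue  # already started or finished
        parents = incoming.get(node, [])
        if not parents:
            ready.append(node)  # root node
            continue
        if not all(statuses[p] in FINISHED for p, _ in parents):
            continue  # convergence: wait for every parent to finish
        if any(edge_satisfied(statuses[p], c) for p, c in parents):
            ready.append(node)
    return ready
```

Each node returned here corresponds to a task the Workflow Manager would create and signal to start.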
Workflow Execution
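Workflows execute based on node relationships: a node can start once its previous nodes have finished and the success/failure/always condition linking them is satisfied.
System Architecture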
Entry Point: schedule()
Each manager has a single entry point: schedule().
Locking mechanism:
- Attempts to acquire single, global lock in database
- If lock cannot be acquired, method returns
- Lock indicates another instance is currently running
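The "try the lock, return if busy" pattern is sketched below. A `threading.Lock` is used purely as a stand-in for the database lock described above, and the `schedule` wrapper is a hypothetical helper, so this shows the control flow rather than the real locking mechanism.

```python
import threading

# Stand-in for the single, global database lock (assumption for this sketch).
_global_lock = threading.Lock()


def schedule(run):
    """Run the manager body only if the global lock can be taken right now."""
    if not _global_lock.acquire(blocking=False):
        # Another manager is currently running; give up instead of waiting.
        return False
    try:
        run()
        return True
    finally:
        _global_lock.release()
```

Returning immediately keeps callers from piling up behind a slow run; the periodic schedule ensures the skipped work is picked up later.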
Atomic Transactions
Each manager runs inside an atomic DB transaction:
- If dispatcher task is killed, no partial updates
- All-or-nothing execution
- Consistency guaranteed
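The all-or-nothing behavior can be demonstrated with Python's built-in `sqlite3` module. This is only an analogy: the table layout is invented for the example, an in-memory SQLite database replaces the real database, and a raised exception stands in for the process being killed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (name TEXT, status TEXT)")
conn.execute("INSERT INTO task VALUES ('job-a', 'pending')")
conn.commit()


def run_manager(conn, fail):
    """Update a task inside one transaction; roll back if the run is interrupted."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute("UPDATE task SET status = 'waiting' WHERE name = 'job-a'")
            if fail:
                raise RuntimeError("simulated interruption mid-run")
    except RuntimeError:
        pass  # the partial update has already been rolled back


def current_status(conn):
    return conn.execute("SELECT status FROM task").fetchone()[0]
```

After `run_manager(conn, fail=True)` the task is still pending; after a clean run it is waiting.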
Hybrid Scheduler: Periodic + Event
Managers run in two ways:
- Periodically: Background task (every 30 seconds by default)
- Event-triggered: On job creation or completion
Workflow Manager doesn’t run directly on a schedule; it piggy-backs off Task Manager. If Task Manager sees running workflow jobs, it schedules Workflow Manager.
Benefits:
- Reduces latency: Jobs start faster with event-triggered execution
- Fail-safe: Periodic execution catches missed events
- Resilience: System progresses even if events are missed
Bulk Reschedule
Utility classes prevent scheduling too many managers. ScheduleTaskManager.schedule() ensures only one Task Manager is scheduled after all tasks are processed, not one per task.
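The idea of collapsing many schedule requests into one run can be sketched with a small helper. `ScheduleOnce` and its `flush` method are hypothetical names invented for this example; they illustrate the coalescing idea and are not the interface of the utility classes mentioned above.

```python
class ScheduleOnce:
    """Remember that a run was requested and trigger it at most once."""

    def __init__(self, run_manager):
        self._run_manager = run_manager
        self._requested = False

    def schedule(self):
        # Cheap to call once per processed task; only sets a flag.
        self._requested = True

    def flush(self):
        # Called after all tasks are processed.
        if self._requested:
            self._requested = False
            self._run_manager()
```

Calling `schedule()` five times followed by one `flush()` results in a single manager run.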
Timing Out
Because of the global lock, only one manager can run at a time. Timeout protection:
- Parent dispatcher process will SIGKILL stuck managers
- Timeout after a few minutes
- Allows new manager to take over
- Manager runs in transaction, so SIGKILL rolls back changes
- Next run re-processes same tasks
- Risk: Manager never progresses (times out every cycle)
- Solution: Manager checks time and bails out early if near timeout
- Commits partial progress before timeout
- Next cycle continues from where previous left off
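The early bail-out can be sketched as a loop that checks a deadline before each task. The function name, the `budget_seconds` parameter, and the injectable `clock` are assumptions for illustration; the real timeout values are not given in the text.

```python
import time


def process_with_deadline(tasks, handle, budget_seconds, clock=time.monotonic):
    """Process tasks until done or until the time budget is used up.

    Returns (processed, remaining) so the next cycle can continue
    with the remaining tasks.
    """
    deadline = clock() + budget_seconds
    processed = []
    for task in tasks:
        if clock() >= deadline:
            break  # stop early so the work done so far can be committed
        handle(task)
        processed.append(task)
    return processed, tasks[len(processed):]
```

Choosing a budget comfortably below the kill timeout leaves time to commit, so each cycle makes some progress instead of being rolled back.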
Job Lifecycle Detail
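Status Transitions
A job normally moves from pending to waiting to running, and then ends as successful, failed, error, or canceled.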
Status Meanings
| Status | State |
|---|---|
| pending | Job launched, but (1) not yet seen by the scheduler, (2) blocked by another task, or (3) not enough capacity |
| waiting | Job submitted to dispatcher via pg_notify |
| running | Job is running on an AWX node |
| successful | Job finished with return code 0 |
| failed | Job finished with return code ≠ 0 |
| error | System failure |
| canceled | Manually canceled by user |
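The statuses in the table can be arranged into a small state machine. The transition map below is an assumption inferred from the status descriptions; it covers only the normal path and leaves out, for example, cancellation or errors before a job starts running.

```python
TERMINAL = {"successful", "failed", "error", "canceled"}

# Assumed normal-path transitions; not an exhaustive list.
TRANSITIONS = {
    "pending": {"waiting"},
    "waiting": {"running"},
    "running": TERMINAL,
}


def advance(current, new):
    """Return the new status if the transition is allowed, else raise."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new
```

Terminal statuses have no outgoing transitions, so any attempt to move a finished job raises an error.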
Capacity Calculation
Instance Capacity
Each instance has:
- Total capacity: Configured or calculated from resources
- Consumed capacity: Sum of running job impacts
- Remaining capacity: Total - Consumed
Job Impact
Jobs consume capacity based on:
- Forks: Higher forks = higher impact
- Job type: Some jobs have fixed impact (e.g., system jobs = 5)
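The capacity arithmetic can be written out directly. The fixed impact of 5 for system jobs comes from the text; the `forks + 1` formula for other jobs is an assumption used only to show that impact grows with forks, and the function names are illustrative.

```python
def job_impact(job_type, forks):
    """Estimate how much capacity a job consumes."""
    if job_type == "system":
        return 5  # fixed impact, as stated above
    return forks + 1  # assumption: impact grows linearly with forks


def remaining_capacity(total, running_impacts):
    """Remaining capacity = total capacity minus the sum of running job impacts."""
    return total - sum(running_impacts)
```

For instance, an instance with total capacity 20 that is running a 5-fork job and a system job has `20 - (6 + 5) = 9` capacity left under these assumptions.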
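Special Capacity Rule
One job is always allowed to run per instance group, so an idle node can accept a job even if the job exceeds its capacity.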
Managers Are Short-Lived
Manager instances are ephemeral:
- Created: New instance on each run
- Load data: Pull relevant data from database
- Process: Execute scheduling logic
- Cleanup: Instance destroyed
Benefits:
- No stale state
- Fresh data every cycle
- No memory leaks from long-running processes
Debugging the Task Manager
Checking Task Status
Forcing Task Manager Run
Checking Capacity
Common Issues
Jobs stuck in pending:
- Check if dependencies are satisfied
- Check capacity on instance groups
- Check for blocking jobs (concurrent jobs disabled)
- Verify task manager is running
- Check dispatcher is running: awx-manage dispatcherctl status
- Check for errors in logs: /var/log/tower/
- Verify database connectivity
Performance Tuning
start_task_limit
Limits tasks started per Task Manager cycle.
Task Manager Period
How often Task Manager runs (every 30 seconds by default).
Database Indexes
Ensure indexes exist on:
- status field
- dependencies_processed field
- created timestamp