Airflow - Xcom In
Now go build DAGs that actually share information – cleanly and reliably.
XComs are for coordination, not data transfer.

Final Takeaway

XComs are Airflow’s glue. They turn a set of isolated tasks into a coherent pipeline. Use them for small control signals, IDs, and results. Keep them light. And when you’re tempted to pass a big blob of data – stop, and ask yourself: should this be in object storage instead?
Mastering XComs in Apache Airflow: Cross‑Task Communication Without the Pain

This post is written for data engineers who want to move beyond basic tasks and build real DAGs.

One of the first surprises when learning Airflow is that tasks run in isolation from each other. You can’t just set task_2.data = task_1.data. So how do you pass a value from one task to another? XComs.
push >> pull

Pattern 1: Passing an ID from a query to a processing task

    from airflow.decorators import task

    @task
    def get_latest_record_id() -> int:
        # Imagine a SQL query here
        return 42

    @task
    def process_record(record_id: int):
        print(f"Processing record {record_id}")
Here, each mapped task gets its own XCom value, and aggregate receives a list of all results.

❌ Passing large data

    # BAD – will bloat metadata DB
    @task
    def bad_task():
        return large_dataframe.to_dict()  # can be MB/GB

✅ Better: Store data in S3/GCS and pass the path as an XCom.

❌ Pulling from a task that hasn’t run

    @task
    def step_one():
        return 1

    @task
    def step_two(x):
        # If step_one failed or was skipped, this will raise an error
        return x + 1
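The "store the data, pass the path" advice can be sketched as follows. This is a minimal illustration that uses a local temporary directory as a stand-in for S3/GCS so that it runs anywhere. The function names `extract_to_storage` and `load_from_storage` are hypothetical; in a DAG each would carry the `@task` decorator, so that only the returned path string travels through XCom.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for a bucket; in practice this would be an S3/GCS location.
STORAGE_DIR = Path(tempfile.mkdtemp(prefix="fake_bucket_"))


def extract_to_storage(rows: list) -> str:
    """Write the (potentially large) payload to storage and return only its path."""
    target = STORAGE_DIR / "extract.json"
    target.write_text(json.dumps(rows))
    return str(target)  # this small string is what would become the XCom


def load_from_storage(path: str) -> int:
    """Downstream step: read the payload back using the path it was given."""
    rows = json.loads(Path(path).read_text())
    return len(rows)


# Local try-out: the payload goes to storage, only the path is handed over.
path = extract_to_storage([{"id": i} for i in range(1000)])
row_count = load_from_storage(path)
print(row_count)
```

The payload can grow to any size without touching the metadata DB, because the XCom itself stays a short string.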
    push = PythonOperator(task_id='push_task', python_callable=push_function)
    pull = PythonOperator(task_id='pull_task', python_callable=pull_function)
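The two operators reference `push_function` and `pull_function` without showing them. Here is a minimal sketch of what they might look like, assuming Airflow 2 hands the task instance to any callable that declares a `ti` argument. The key name `record_id` and the `FakeTaskInstance` stand-in are illustrative and not from the original; the stand-in exists only so the functions can be tried without a scheduler.

```python
class FakeTaskInstance:
    """Hypothetical local stand-in for Airflow's TaskInstance (try-out only)."""

    def __init__(self, task_id, store):
        self.task_id = task_id
        self._store = store  # shared dict playing the role of the XCom table

    def xcom_push(self, key, value):
        self._store[(self.task_id, key)] = value

    def xcom_pull(self, task_ids, key="return_value"):
        return self._store.get((task_ids, key))


def push_function(ti):
    # Explicitly push a small value under a named key.
    ti.xcom_push(key="record_id", value=42)


def pull_function(ti):
    # Pull the value by naming the upstream task and the key.
    record_id = ti.xcom_pull(task_ids="push_task", key="record_id")
    print(f"Processing record {record_id}")
    return record_id


# Local try-out with the stand-in; inside Airflow the real `ti` is injected.
xcom_table = {}
push_function(FakeTaskInstance("push_task", xcom_table))
pull_function(FakeTaskInstance("pull_task", xcom_table))
```

Inside a real DAG the stand-in is unnecessary: the operators call these functions with the actual task instance, and the value lands in the metadata DB.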
    @task
    def process(user_data: dict) -> str:
        return f"Processed user {user_data['name']}"
✅ or ensure upstream dependencies with >>.

❌ Using XComs for many small values across many tasks

Each XCom is a DB row. 10 000 tasks × 5 XComs = 50 000 rows – fine. But 100 000 tasks × 10 XComs = 1 million rows – slow.

Advanced: XCom Backends

Airflow 2.0+ lets you store XComs outside the metadata DB. Useful if you need slightly larger values or lower DB load.
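The idea behind a custom XCom backend can be illustrated without Airflow: small values are stored inline, and anything above a size threshold is written elsewhere, with only a reference kept in the row. The sketch below is plain Python, not Airflow's backend interface (a real backend subclasses `BaseXCom`). The class name, the 1 KB threshold, and the `file://` reference prefix are all assumptions made for illustration.

```python
import json
import tempfile
import uuid
from pathlib import Path


class OffloadingStore:
    """Hypothetical sketch: keep small values inline, offload large ones to files."""

    PREFIX = "file://"

    def __init__(self, threshold_bytes: int = 1024):
        self.threshold_bytes = threshold_bytes
        # Temporary directory standing in for external storage.
        self.directory = Path(tempfile.mkdtemp(prefix="xcom_offload_"))

    def serialize(self, value) -> str:
        payload = json.dumps(value)
        if len(payload.encode("utf-8")) <= self.threshold_bytes:
            return payload  # small enough to live in the "DB row"
        target = self.directory / f"{uuid.uuid4().hex}.json"
        target.write_text(payload)
        return self.PREFIX + str(target)  # only the reference is stored

    def deserialize(self, stored: str):
        if stored.startswith(self.PREFIX):
            stored = Path(stored[len(self.PREFIX):]).read_text()
        return json.loads(stored)


store = OffloadingStore()
small = store.serialize({"id": 42})          # stays inline
large = store.serialize(list(range(5000)))   # offloaded, reference returned
```

Tasks on either side would not need to change: they still return and receive ordinary Python values, and the store decides where the bytes go.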
