PICurv 0.1.0
A Parallel Particle-In-Cell Solver for Curvilinear LES
Run Lifecycle Guide

This page explains how a PICurv run moves from a new solve to restart, post-processing reuse, and cluster job generation. It is the operational view of the run directory lifecycle.

1. What A Run Lifecycle Means

For PICurv, a "run" is not just a solver launch. It is the full set of generated artifacts under runs/<run_id>/, including:

  • normalized runtime config artifacts under config/,
  • solver and post logs under logs/,
  • solver outputs and restart files,
  • optional scheduler scripts and submission metadata under scheduler/.

Key rule:

  • every picurv run --solve ... creates a fresh run directory,
  • picurv does not mutate an old run directory in place when you start a new solve,
  • restart workflows read from an existing run but still create a new run directory for the restarted continuation.

run_id is generated automatically as <case_basename>_<timestamp>.
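The naming convention above can be sketched as follows. This is a hypothetical helper, not PICurv's actual code: the function name `make_run_id` is invented, and the timestamp format is inferred from the example run directory names used later on this page (e.g. `flat_channel_20260303-120000`).

```python
from datetime import datetime
from pathlib import Path

def make_run_id(case_path: str, now: datetime) -> str:
    """Build <case_basename>_<timestamp> (sketch; format inferred from
    the example run directory names on this page)."""
    base = Path(case_path).stem  # case file name without extension
    return f"{base}_{now.strftime('%Y%m%d-%H%M%S')}"

print(make_run_id("my_case/flat_channel.yml", datetime(2026, 3, 3, 12, 0, 0)))
# flat_channel_20260303-120000
```

Because the timestamp is part of the identifier, repeated solves of the same case never collide, which is what makes the "fresh run directory per solve" rule workable.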

2. Start A New Run

Typical local solve + post:

./bin/picurv run --solve --post-process -n 4 \
  --case my_case/case.yml \
  --solver my_case/solver.yml \
  --monitor my_case/monitor.yml \
  --post my_case/post.yml

Typical cluster solve + post:

./bin/picurv run --solve --post-process \
  --case my_case/case.yml \
  --solver my_case/solver.yml \
  --monitor my_case/monitor.yml \
  --post my_case/post.yml \
  --cluster my_case/cluster.yml

Recommended preflight:

  1. picurv validate ...
  2. picurv run ... --dry-run
  3. if using Slurm, picurv run ... --cluster ... --no-submit

This sequence verifies contract correctness before consuming runtime or queue time.

3. Read The Run Directory Correctly

A typical run directory contains:

  • runs/<run_id>/config/: generated .control, BC files, copied YAML inputs, and post.run
  • runs/<run_id>/logs/: solver/postprocessor runtime logs and metrics written by PICurv itself
  • runs/<run_id>/results/: solver outputs when monitor paths use the default layout
  • runs/<run_id>/scheduler/: generated Slurm scripts, submission.json, and cluster stdout/stderr in cluster mode
  • runs/<run_id>/manifest.json: top-level run metadata

Practical interpretation:

  • if validation succeeds but runtime is wrong, inspect config/ first,
  • if scheduler behavior is wrong, inspect scheduler/solver.sbatch or scheduler/post.sbatch,
  • scheduler/submission.json is the source of truth for delayed submit and run-directory-based cancel,
  • if restart/post-only behavior is wrong, confirm the previous run directory contents before changing YAML again.
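A quick structural audit of a run directory can narrow down which of the cases above applies before any YAML is touched. The helper below is a sketch (the name `audit_run_dir` is hypothetical); it only checks presence of the documented artifacts, and `results/` and `scheduler/` may legitimately be absent for custom monitor layouts or local (non-cluster) runs.

```python
from pathlib import Path

def audit_run_dir(run_dir: str) -> dict:
    """Report which of the documented run-directory artifacts exist
    (hypothetical helper; presence-only, no content validation)."""
    root = Path(run_dir)
    report = {d: (root / d).is_dir() for d in ("config", "logs", "results", "scheduler")}
    report["manifest.json"] = (root / "manifest.json").is_file()
    return report
```

A run with `config/` but no `logs/` usually means generation succeeded and launch never happened, which points at the launcher or scheduler layer rather than the YAML contract.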

4. Local, Login-Node, and Batch Launch Resolution

PICurv now separates case physics from site execution policy.

Local multi-rank precedence:

  1. PICURV_MPI_LAUNCHER
  2. MPI_LAUNCHER
  3. nearest .picurv-execution.yml
  4. nearest legacy .picurv-local.yml
  5. built-in mpiexec

Cluster batch precedence:

  1. cluster.yml -> execution
  2. nearest .picurv-execution.yml -> cluster_execution
  3. nearest .picurv-execution.yml -> default_execution
  4. built-in srun

This gives three clean cases:

  • workstation users usually need no extra file (picurv init seeds each new case with an inert-default .picurv-execution.yml),
  • cluster login-node users can edit .picurv-execution.yml when needed,
  • batch users can reuse that same file unless cluster.yml needs a batch-specific override.
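The local multi-rank precedence can be sketched as a resolution function. This is an illustration, not PICurv's implementation: the function name and return convention are invented, and "nearest" is interpreted here as walking upward from the case directory (an assumption).

```python
import os
from pathlib import Path

def resolve_local_launcher(case_dir, env=None):
    """Sketch of the documented local multi-rank precedence:
    env vars, then nearest .picurv-execution.yml, then nearest
    legacy .picurv-local.yml, then the built-in mpiexec."""
    env = os.environ if env is None else env
    for var in ("PICURV_MPI_LAUNCHER", "MPI_LAUNCHER"):
        if env.get(var):
            return env[var]
    start = Path(case_dir).resolve()
    # any .picurv-execution.yml on the path wins over any legacy file
    for name in (".picurv-execution.yml", ".picurv-local.yml"):
        for d in (start, *start.parents):
            if (d / name).is_file():
                return f"<launcher from {d / name}>"
    return "mpiexec"  # built-in default
```

The same shape applies to the cluster batch precedence, with cluster.yml's execution block at the top and srun as the built-in fallback.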

5. Restart From An Existing Run

Restart uses the normal solve workflow. There is no separate restart command.

Example:

run_control:
  start_step: 500
  total_steps: 1000
  restart_from_run_dir: "../runs/flat_channel_20260303-120000"
operation_mode:
  eulerian_field_source: "load"
models:
  physics:
    particles:
      restart_mode: "load"   # or "init"

Operational meaning:

  • previous run ended at step 500,
  • new run loads state from that old run,
  • first new advanced step is 501,
  • new run finishes at 1500,
  • restarted continuation is written into a new runs/<new_run_id>/ directory.
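The step bookkeeping in the bullets above reduces to simple arithmetic, sketched here as a hypothetical helper so the off-by-one rule is explicit:

```python
def restart_step_range(start_step, total_steps):
    """Return (first_new_step, final_step) for a restarted run,
    per the bullets above: the run resumes *from* start_step, so
    the first newly advanced step is start_step + 1 (sketch)."""
    return start_step + 1, start_step + total_steps

print(restart_step_range(500, 1000))  # (501, 1500)
```

This is why start_step must equal the saved step rather than the next desired step: the "+1" is applied by the solver, not by you.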

Before launching:

  • verify restart source files exist for the requested step,
  • use start_step equal to the saved step, not the next desired step,
  • verify monitor.yml directory names match the source run layout.

6. Postprocess An Existing Run

When solver outputs already exist, reuse the run directory directly:

./bin/picurv run --post-process \
  --run-dir runs/flat_channel_20260303-120000 \
  --post my_case/alt_analysis.yml

Use this when:

  • you want a different visualization/statistics recipe,
  • solver data are already on disk,
  • you do not want to rerun the solver.

PICurv will auto-identify the required case/monitor/control artifacts from runs/<run_id>/config/.
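Before a post-only rerun, it is worth confirming that config/ actually contains the artifacts the auto-identification will read. The helper below is a sketch (the name `list_config_artifacts` is hypothetical, and the real identification logic is internal to picurv); it simply enumerates what is on disk.

```python
from pathlib import Path

def list_config_artifacts(run_dir):
    """List files under runs/<run_id>/config/ that a post-only rerun
    could pick up (sketch; presence check only)."""
    cfg = Path(run_dir) / "config"
    return sorted(p.name for p in cfg.iterdir() if p.is_file())
```

If the copied case/monitor YAML or the generated .control files are missing here, the post-only rerun has nothing to auto-identify from, and the fix is upstream of post.yml.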

7. Batch Job Generation And Reuse

In cluster mode, picurv writes scheduler artifacts into the new run directory:

  • scheduler/solver.sbatch
  • scheduler/solver_<jobid>.out / scheduler/solver_<jobid>.err
  • scheduler/post.sbatch
  • scheduler/post_<jobid>.out / scheduler/post_<jobid>.err
  • scheduler/submission.json

Recommended operational pattern:

  1. --dry-run to confirm launch commands and artifact paths
  2. --no-submit to inspect generated batch scripts
  3. picurv submit --run-dir runs/<run_id> only after the scripts look correct
  4. picurv cancel --run-dir runs/<run_id> when you need to stop a submitted stage without separate job-id bookkeeping

This is especially useful when changing:

  • MPI launcher tokens,
  • resource counts,
  • queue/account settings,
  • restart or post-only job behavior.

Operational examples:

./bin/picurv submit --run-dir runs/<run_id>
./bin/picurv cancel --run-dir runs/<run_id> --stage solve

Generated Slurm solver jobs also export runtime walltime metadata into solver.sbatch, so the solver can estimate the cost of completed steps and request a graceful final write before the remaining walltime gets too tight. If the cluster profile also requests an early signal, PICurv traps SIGUSR1, SIGTERM, and SIGINT, then uses the same safe-checkpoint final-write path. Use signal: "USR1@300" for srun, or signal: "B:USR1@300" plus exec mpirun ... for direct mpirun batch launches.
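The signal strings above follow sbatch's --signal syntax: an optional "B:" prefix (deliver to the batch shell rather than the job steps), a signal name, and a warning window in seconds after "@". A parsing sketch, with a hypothetical function name:

```python
import signal

def parse_signal_spec(spec):
    """Parse an sbatch-style --signal value such as 'USR1@300' or
    'B:USR1@300' (illustrative helper, not PICurv's parser)."""
    batch_only = spec.startswith("B:")  # 'B:' targets the batch shell
    name, _, seconds = (spec[2:] if batch_only else spec).partition("@")
    return {
        "batch_only": batch_only,
        "signum": getattr(signal, f"SIG{name}"),
        "warn_seconds": int(seconds) if seconds else 0,
    }
```

With "B:USR1@300", Slurm signals the batch shell 300 seconds before the walltime limit, which is why the mpirun variant needs `exec mpirun ...`: without exec, the shell (not the solver ranks) would receive the signal and the trap chain would be needed to forward it.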

8. Safe Rules Of Thumb

  • Treat runs/<run_id>/config/ as the ground truth for what the binaries actually consumed.
  • Do not hand-edit generated scheduler scripts unless debugging a one-off issue; prefer fixing YAML or .picurv-execution.yml.
  • Use a fresh restarted run instead of overwriting the previous run directory.
  • Use post-only reruns when analysis changes but solver data do not.
  • Keep site launcher policy in .picurv-execution.yml; keep scheduler policy in cluster.yml.
  • Keep the shutdown warning window longer than the slowest expected timestep if you rely on the fallback signal path.
  • If the runtime walltime guard is too eager or too late for a workload, tune execution.walltime_guard in the cluster profile rather than editing generated scripts.

9. Related Pages

CFD Reader Guidance and Practical Use

This page describes the run lifecycle within the PICurv workflow. For CFD users, the most reliable reading strategy is to map the page content to a concrete run decision: what is configured, which runtime stage it influences, and which diagnostics should confirm expected behavior.

Treat this page as both a conceptual reference and a runbook. If you are debugging, pair the method/procedure described here with monitor output, generated runtime artifacts under runs/<run_id>/config, and the associated solver/post logs so numerical intent and implementation behavior stay aligned.

What To Extract Before Changing A Case

  • Identify which YAML role or runtime stage this page governs.
  • List the primary control knobs (paths, stage flags, scheduler settings, restart source, or launcher policy).
  • Record expected success indicators (artifact presence, step continuity, job script contents, or stable reused outputs).
  • Record failure signals that require rollback or parameter isolation.

Practical CFD Troubleshooting Pattern

  1. Reproduce the workflow on a small case or short time window.
  2. Confirm generated config/ and scheduler/ artifacts before blaming the solver.
  3. Change one lifecycle variable at a time: launcher, resources, restart source, or post recipe.
  4. If behavior remains inconsistent, compare against a known-good prior run directory and re-check the generated artifacts.