Pipeline parallelism is widely used to train large language models (LLMs). However, the growing heterogeneity of model architectures exacerbates pipeline bubbles and thereby reduces training efficiency. Prior approaches typically optimize only one dimension of a pipeline (partitioning, placement, or scheduling), leaving substantial bubbles. Co-optimizing all three dimensions is a natural way to reduce bubbles further, but it introduces three challenges: complex performance modeling, a combinatorial search space, and irregular execution orders. We propose OctoPipe, a novel pipeline parallelism system that co-optimizes partitioning, placement, and scheduling. First, we build a graph-based pipeline simulator that provides accurate performance estimates to guide co-optimization. Second, we develop an iterative, bubble-aware tuner that efficiently explores the large search space. Third, we implement a unified pipeline executor that dynamically orchestrates computation and communication, supporting irregular execution orders without deadlocks while maximizing the overlap between communication and computation. Experiments show that OctoPipe achieves 1.22× to 2.14× higher throughput than Megatron-LM across a range of heterogeneous LLM architectures and scales.
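A toy sketch can make the idea of a graph-based pipeline simulator more concrete. The code below is not OctoPipe's simulator. It assumes the following: each stage runs a fixed, per-stage order of forward ("F") and backward ("B") operations on microbatches; communication latency is zero; and per-stage durations are hypothetical inputs. Each operation starts once its cross-stage dependency has finished and the previous operation on its own stage is done. The iteration time (makespan) and bubble fraction then follow from the resulting finish times. The names `simulate` and `one_f_one_b` are illustrative only. The simulator raises an error when an execution order cannot make progress, which is the deadlock condition an executor for irregular orders must avoid.

```python
from typing import Dict, List, Optional, Tuple

Op = Tuple[str, int]  # ("F" or "B", microbatch index)


def one_f_one_b(num_stages: int, num_mb: int) -> List[List[Op]]:
    """Per-stage execution orders for a standard 1F1B schedule."""
    orders = []
    for s in range(num_stages):
        warmup = min(num_stages - s - 1, num_mb)
        ops: List[Op] = [("F", i) for i in range(warmup)]
        f, b = warmup, 0
        while f < num_mb:  # steady state: one forward, then one backward
            ops += [("F", f), ("B", b)]
            f, b = f + 1, b + 1
        ops += [("B", i) for i in range(b, num_mb)]  # cooldown backwards
        orders.append(ops)
    return orders


def simulate(orders: List[List[Op]], fwd: List[float], bwd: List[float]):
    """Return (makespan, bubble_fraction) for the given per-stage orders.

    Toy model (assumption): compute-only costs, zero communication latency,
    and each stage executes its ops strictly in the given order.
    """
    p = len(orders)
    end: Dict[Tuple[int, str, int], float] = {}  # (stage, kind, mb) -> finish
    pos = [0] * p        # next op index per stage
    clock = [0.0] * p    # time at which each stage becomes free
    remaining = sum(len(o) for o in orders)
    while remaining:
        progressed = False
        for s in range(p):
            if pos[s] == len(orders[s]):
                continue
            kind, mb = orders[s][pos[s]]
            # Graph edges: F on stage s needs F on stage s-1; B on stage s
            # needs B on stage s+1 (on the last stage, its own F).
            dep: Optional[Tuple[int, str, int]]
            if kind == "F":
                dep = (s - 1, "F", mb) if s > 0 else None
            else:
                dep = (s + 1, "B", mb) if s < p - 1 else (s, "F", mb)
            if dep is not None and dep not in end:
                continue  # dependency not finished yet; try other stages
            ready = end[dep] if dep is not None else 0.0
            start = max(ready, clock[s])
            clock[s] = start + (fwd[s] if kind == "F" else bwd[s])
            end[(s, kind, mb)] = clock[s]
            pos[s] += 1
            remaining -= 1
            progressed = True
        if not progressed:
            raise RuntimeError("deadlock: execution order violates dependencies")
    makespan = max(clock)
    busy = sum(
        fwd[s] * sum(k == "F" for k, _ in orders[s])
        + bwd[s] * sum(k == "B" for k, _ in orders[s])
        for s in range(p)
    )
    return makespan, 1.0 - busy / (p * makespan)
```

Consider two stages and two microbatches, with forward and backward costs of 1 and 2 on both stages. `simulate(one_f_one_b(2, 2), [1, 1], [2, 2])` gives a makespan of 9 and a bubble fraction of 1/3. Doubling both costs on stage 1 alone (`[1, 2]`, `[2, 4]`) raises the makespan to 15 and the bubble fraction to 0.4, a small illustration of how imbalanced stages enlarge bubbles. A realistic simulator would also need to model communication, memory, and device placement, all of which this sketch omits.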