Introducing pulp_deb 3.0—the Future is Structured!
Today, we want to introduce the pulp_deb 3.0 release! In this blog article, we explain why we are doing this, how we came to do it, and how we have solved one or two existing problems. Enjoy reading!
Why is ATIX Releasing pulp_deb 3.0?
Our customers know the ATIX Engineering Team as the authors of regular orcharhino releases. orcharhino is an enterprise product based on the Foreman/Katello open-source project. Katello in turn makes use of the Pulp project as part of its back end. Pulp is a platform for managing and hosting repositories. pulp_deb is a Pulp plug-in for adding APT repository (Debian and Ubuntu) support to Pulp. Ever since ATIX introduced APT support for Katello, we have been strongly involved with pulp_deb development. Starting with the rewrite for Pulp 3, we have become the plug-in’s principal maintainers. After years of incremental changes, we are now ready for a major version bump!
How We Got Here
To understand what makes this a major release, we need to explain what has led to this point.
As you may or may not know, Pulp 3 is a complete rewrite of Pulp 2. To the best of my knowledge, they do not share a single line of code. pulp_deb for Pulp 3 started with a clean slate, and we were able to incorporate many hard-learned lessons from the Pulp 2 days right from the start.
Since we had to achieve feature parity for the features that Katello’s APT support relies on in a reasonable time frame, we also ended up reimporting some of our design choices from Pulp 2 that we might otherwise have left out. And, as these things go, we also ran into entirely new design challenges specific to Pulp 3.
The main challenge with pulp_deb in comparison to other Pulp plug-ins such as pulp_rpm has always been that the APT repository format subdivides the metadata in every APT repository into one or more distributions (aka releases), which are further subdivided into one or more components, which can contain package indices for one or more architectures. This means that in addition to knowing what repository some .deb package is in (this is a pulpcore functionality that Pulp plug-ins don’t need to reinvent), the pulp_deb plug-in must also record what “distribution-component combinations” each package belongs to. For Pulp 3, this is solved by saving this information in what we call “structure content,” which is stored alongside the packages themselves within pulp_deb’s APT repositories.
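To make the nesting of distributions, components, and architectures concrete, here is a minimal Python sketch. It assumes the conventional APT layout in which each binary package index lives at dists/<distribution>/<component>/binary-<architecture>/Packages. The Distribution class and the package_index_paths function are hypothetical names used for illustration only; they are not part of pulp_deb.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Distribution:
    """Toy model of one APT distribution (release); not a pulp_deb class."""

    name: str
    components: tuple
    architectures: tuple


def package_index_paths(dist):
    """Return the relative path of every binary package index in a distribution."""
    return [
        f"dists/{dist.name}/{component}/binary-{arch}/Packages"
        for component, arch in product(dist.components, dist.architectures)
    ]


# Example values chosen for illustration.
bookworm = Distribution("bookworm", ("main", "contrib"), ("amd64", "arm64"))
for path in package_index_paths(bookworm):
    print(path)
```

Two components and two architectures already yield four separate package indices for a single distribution, which is why knowing the repository alone is not enough to place a package.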
Structured Upload API
Many of the changes with the pulp_deb 3.0 release are improvements in handling structure content. For example, the package upload/creation API endpoint now allows users to specify what distribution and component the uploaded package should be added to. Doing so will automatically create all relevant structure content, without the need for users to have an in-depth understanding of pulp_deb’s plumbing.
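As a rough sketch of what such a structured upload request might carry, the following Python helper assembles the form fields that would accompany the package file. The field names "repository", "distribution", and "component", as well as the endpoint path mentioned in the comment, are assumptions made for illustration; please check the pulp_deb REST API documentation for the authoritative names.

```python
def build_upload_fields(repository_href, distribution, component):
    """Assemble the form fields for a structured package upload.

    The field names are assumptions for illustration, not a verified API contract.
    """
    for label, value in (("distribution", distribution), ("component", component)):
        if not value:
            raise ValueError(f"{label} must be a non-empty string")
    return {
        "repository": repository_href,
        "distribution": distribution,
        "component": component,
    }


# Hypothetical repository href; a real one would contain the repository's UUID.
fields = build_upload_fields("/pulp/api/v3/repositories/deb/apt/<uuid>/", "bookworm", "main")
print(fields)

# These fields would then be sent together with the .deb file as a multipart
# POST request to the package content endpoint (assumed here to be
# /pulp/api/v3/content/deb/packages/).
```

The point of the design is that the caller only names the target distribution and component; the creation of the matching structure content is left to the plug-in.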
The “Colliding Structure Problem” Is Finally Fixed
Most importantly, we have fixed a fundamental design flaw with structure content that was present from the very beginnings of pulp_deb for Pulp 3. This issue made it possible for different instances of structure content to represent the exact same APT repository structure. Under certain conditions, for example when content was moved between repositories, such colliding structure content could end up in a single repository version. Such repository versions would fail to publish.
While these cases are so rare that most users have probably never heard of them, they can be very problematic when encountered. In addition, what we have dubbed the “colliding structure problem” has increasingly become a blocker for new features we want to add, including all the quality-of-life improvements for using structured content.
With pulp_deb 3.0, the problem is fixed, and all existing duplicate structure content will be merged by a database migration during upgrade. Since this is a database migration that alters data, we highly recommend a backup before the upgrade, and the median user should expect an extra 30 minutes during the upgrade for the migration to run. The actual time can vary significantly, since it depends on how much duplicate structure content there is, and what system resources are available.
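To illustrate what merging duplicate structure content means conceptually, here is a toy Python sketch. It is not the actual database migration: the record layout (an ID, a distribution, a component, and a set of package file names) and the function name merge_colliding_structure are hypothetical. The sketch only shows the idea that several records describing the same distribution-component combination are folded into one.

```python
def merge_colliding_structure(records):
    """Merge records that describe the same (distribution, component) pair.

    Each record is a (record_id, distribution, component, packages) tuple.
    The first record seen for a pair survives; later duplicates are folded into it.
    """
    merged = {}
    for record_id, distribution, component, packages in records:
        key = (distribution, component)
        if key not in merged:
            merged[key] = {"id": record_id, "packages": set(packages)}
        else:
            # A colliding record: keep the surviving ID, absorb its packages.
            merged[key]["packages"].update(packages)
    return merged


# Two records collide on ("bookworm", "main"); the third is unique.
records = [
    ("a1", "bookworm", "main", {"foo_1.0_amd64.deb"}),
    ("b2", "bookworm", "main", {"bar_2.0_amd64.deb"}),
    ("c3", "bookworm", "contrib", {"baz_0.1_amd64.deb"}),
]
result = merge_colliding_structure(records)
print(len(result))
print(sorted(result[("bookworm", "main")]["packages"]))
```

After the merge, each distribution-component combination is represented exactly once, so a repository version can no longer contain two competing descriptions of the same structure.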
Publications Are Structured by Default
In the past, when using pulp_deb’s APT publisher, users had to explicitly enable the “simple” publication mode, the “structured” publication mode, or both. The “simple” mode is essentially a workaround to avoid having to deal with structured content. With pulp_deb ≥ 3.0, the “structured” mode will be enabled by default, and the “simple” mode will be disabled by default. Going forward, the “simple” mode will be deprecated, but we expect it to be around for quite some time to come since we will not drop it until all API workflows guarantee the creation of consistent structured content for every repository version created.
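The new defaults can be summarized in a small Python sketch. The function resolve_publication_modes is a hypothetical helper, not pulp_deb code, and the rule that at least one mode must remain enabled is an assumption of this sketch rather than something stated above.

```python
def resolve_publication_modes(simple=None, structured=None):
    """Fill in unspecified modes with the defaults described for pulp_deb >= 3.0."""
    modes = {
        "simple": False if simple is None else simple,
        "structured": True if structured is None else structured,
    }
    # Assumption of this sketch: a publication with no mode enabled is rejected.
    if not any(modes.values()):
        raise ValueError("at least one publication mode must be enabled")
    return modes


print(resolve_publication_modes())             # nothing specified: structured only
print(resolve_publication_modes(simple=True))  # opt in to simple mode as well
```

Users who still depend on the “simple” mode would therefore need to request it explicitly from now on.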
I am also pleased to report that the 3.0 release contains several recent community submissions from the Microsoft Azure Core Linux team, who are now hosting Microsoft’s APT repositories using Pulp. You can see this GitHub repository for more information on their project. Some highlights from their pulp_deb contributions:
Pulp CLI: we released a new version of pulp-cli-deb that includes the new content subcommands for uploading content, listing content, filtering content, and more.
Documentation: we completely reworked and expanded the first three chapters of the plug-in documentation with a focus on easy-to-use structured workflow examples with Pulp CLI.
Test suite: we completely reworked the pulp_deb test suite to use the pytest framework and lose the dependency on pulp-smash. This will make it easy to add the test coverage needed to move new features from the status “experimental” to “fully supported.” It also puts us at the cutting edge of the wider pytest conversion effort within the Pulp community.
Road map: the new version of the documentation includes a “road map and experimental” section to give some indication where the plug-in might be headed next (no actual promises).
Katello: please read my Foreman community RFC on how we plan to transition Katello to an all-structured APT content world.
The pulp_deb 3.0 release marks the most significant point in the pulp_deb plug-in’s development since 2.6.1 became the first GA release for Pulp 3 almost three years ago! It represents the culmination of many releases before it. A fundamental design flaw has finally been fixed. The documentation has been rewritten with user-friendly examples based on Pulp CLI. The Pulp CLI commands for pulp_deb are starting to cover more of the core workflows. The test suite is now in a much better place. Design workarounds from the Pulp 2 days are being left behind, and a new, entirely structured course has been plotted going forward. On these more solid foundations, several new features are in the works or being stabilized. A huge thanks goes out to everyone in the Pulp community as well as everyone here at ATIX who has helped make this possible!
Latest posts by Quirin Pamp (see all)
- Introducing pulp_deb 3.0—the Future is Structured! - 30. August 2023