
Managing Large Debian Repositories with Pulp

Pulp is a free, open-source platform for software repository management. You can fetch, upload, and distribute content from various sources. Repository versioning makes sure that nothing is lost as you can always roll back to previous versions. The pulp_deb plugin adds APT repository support.

Pulp has supported Debian content for a while, and ATIX expanded that support a few years ago for use with Katello. It works great for small to medium-sized repositories. For large repositories, however, performance is not ideal.

Challenge

Around 2019, ATIX consultants wanted to synchronize all of Debian Stretch and Ubuntu Xenial for a demo. Unfortunately, they found that the sync generally ran for about five hours, only to fail with a “Cannot allocate memory” error. What was going on?


To answer this question, they needed to take a closer look at the pulp_deb implementation. The code is organized into several steps. The implementation relies heavily on the python-debpkgr dependency, which in turn relies on deb822 from the python-debian library. python-debpkgr is mainly designed to take a pile of Debian packages and organize them into an APT repository. The structure of a Debian repository looks like this:


/dists/stretch/Release
/dists/stretch/main/binary-amd64/Packages
/dists/stretch/contrib/binary-amd64/Packages
/dists/stretch/non-free/binary-amd64/Packages
/pool/
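
The layout above can be expressed as a pair of small path helpers. This is only a minimal sketch: the function names are hypothetical and are not part of pulp_deb or debpkgr.

```python
import posixpath


def release_path(release):
    """Path of the Release file for one release (for example "stretch")."""
    return posixpath.join("/", "dists", release, "Release")


def packages_index_path(release, component, architecture):
    """Path of the Packages index for one release/component/architecture."""
    return posixpath.join(
        "/", "dists", release, component, f"binary-{architecture}", "Packages"
    )


# Example values taken from the listing above.
stretch_release = release_path("stretch")
non_free_index = packages_index_path("stretch", "non-free", "amd64")
```

Every combination of release, component, and architecture gets its own Packages index, while the package files themselves live under /pool/.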

During a sync, the “MetadataStep” is provided with a list of releases, components, and packages (with metadata) from MongoDB. It then applies the following logic: for every combination of architecture, component, and release, a list of packages is generated. These lists contain the paths to the actual .deb package files on disk. Finally, each list is passed to a debpkgr call as an argument.
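
The grouping logic described here might look roughly like the following. This is a simplified sketch, not the actual MetadataStep code: the record layout, the field names, and the function name are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical records standing in for the package units read from the database.
units = [
    {"release": "stretch", "component": "main", "architecture": "amd64",
     "path": "/storage/foo_1.0_amd64.deb"},
    {"release": "stretch", "component": "main", "architecture": "i386",
     "path": "/storage/foo_1.0_i386.deb"},
    {"release": "stretch", "component": "contrib", "architecture": "amd64",
     "path": "/storage/bar_2.0_amd64.deb"},
    {"release": "stretch", "component": "main", "architecture": "amd64",
     "path": "/storage/baz_0.3_amd64.deb"},
]


def build_package_lists(units):
    """Group .deb file paths by (release, component, architecture)."""
    lists = defaultdict(list)
    for unit in units:
        key = (unit["release"], unit["component"], unit["architecture"])
        lists[key].append(unit["path"])
    return dict(lists)


package_lists = build_package_lists(units)
```

Each value in `package_lists` corresponds to one list of file paths that would then be handed to a single debpkgr call.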

As noted, debpkgr is designed to take a pile of Debian packages and turn them into a repository, so it does just that: each .deb file is read from disk to extract the metadata debpkgr needs. Because the package lists for different architectures overlap, many of these .deb files are parsed multiple times.
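
To see how overlapping lists multiply the work, the following sketch counts how often each file would be opened if every list were processed independently. The lists are made-up examples. They assume the overlap comes from architecture-independent packages (Architecture: all) that appear in the list of every concrete architecture; the source does not spell out the exact cause.

```python
from collections import Counter

# Hypothetical package lists, one per (release, component, architecture).
package_lists = {
    ("stretch", "main", "amd64"): ["pool/foo_1.0_amd64.deb", "pool/docs_1.0_all.deb"],
    ("stretch", "main", "i386"): ["pool/foo_1.0_i386.deb", "pool/docs_1.0_all.deb"],
    ("stretch", "main", "armhf"): ["pool/foo_1.0_armhf.deb", "pool/docs_1.0_all.deb"],
}


def count_parses(package_lists):
    """Count how many times each file is read when lists are handled separately."""
    counts = Counter()
    for paths in package_lists.values():
        counts.update(paths)
    return counts


parse_counts = count_parses(package_lists)
total_parses = sum(parse_counts.values())
unique_files = len(parse_counts)
redundant_parses = total_parses - unique_files
```

In this toy example, four distinct files lead to six parse operations. With tens of thousands of packages and several architectures, the redundant reads add up quickly.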

The solution

Our experts’ first thought was: maybe there’s a quick-and-dirty fix? However, they also considered a complete redesign of the way debpkgr works. Another alternative might be dropping debpkgr (from the MetadataStep) and implementing everything themselves.

They settled on the last option. The basic idea was to use only information from MongoDB to create the repository structure. The old implementation already had to parse the metadata from MongoDB in order to generate the lists that were then passed to debpkgr, and this part remained essentially unchanged. What was new: our experts had to create the desired directory structure themselves, build the symlinks to the actual .deb files themselves, and write the Packages and Release files themselves. As one always does, they happened upon a few stumbling blocks:

  • debpkgr generates md5sum, sha1, and sha256 checksums for the metadata, but the existing database model only stored sha256 hashes.
  • Actually using the metadata from the database revealed a bug.
  • User-defined metadata fields were not stored in the existing database model.
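
The approach of publishing purely from stored metadata can be sketched as follows. This is an illustration under stated assumptions, not the code from the pull requests: the record layout, the function names, and the flat pool layout are hypothetical, only a SHA256 checksum is written because that is all the hypothetical record stores, and the Release file is left out.

```python
import os
import posixpath
import tempfile


def format_stanza(fields):
    """Render one package as a "Key: value" paragraph for a Packages file."""
    return "".join(f"{key}: {value}\n" for key, value in fields.items())


def publish(repo_root, release, component, architecture, records):
    """Write a Packages index and pool symlinks using only stored metadata."""
    index_dir = os.path.join(
        repo_root, "dists", release, component, f"binary-{architecture}"
    )
    os.makedirs(index_dir, exist_ok=True)

    stanzas = []
    for record in records:
        # Simplified pool layout (assumption); real repositories nest deeper.
        filename = posixpath.join(
            "pool", component, os.path.basename(record["storage_path"])
        )
        link = os.path.join(repo_root, *filename.split("/"))
        os.makedirs(os.path.dirname(link), exist_ok=True)
        if not os.path.lexists(link):
            # The .deb file is only linked, never opened or parsed.
            os.symlink(record["storage_path"], link)

        fields = dict(record["control"])  # may carry user-defined fields
        fields["Filename"] = filename
        fields["Size"] = record["size"]
        fields["SHA256"] = record["sha256"]
        stanzas.append(format_stanza(fields))

    index_path = os.path.join(index_dir, "Packages")
    with open(index_path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(stanzas))  # blank line between paragraphs
    return index_path


# Small demonstration in a temporary directory with a placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    stored = os.path.join(tmp, "storage", "foo_1.0_amd64.deb")
    os.makedirs(os.path.dirname(stored))
    with open(stored, "wb") as handle:
        handle.write(b"placeholder, not a real package")

    record = {
        "storage_path": stored,
        "size": os.path.getsize(stored),
        "sha256": "0" * 64,  # placeholder checksum, not a real hash
        "control": {"Package": "foo", "Version": "1.0", "Architecture": "amd64"},
    }
    repo_root = os.path.join(tmp, "repo")
    index_path = publish(repo_root, "stretch", "main", "amd64", [record])
    with open(index_path, encoding="utf-8") as handle:
        index_text = handle.read()
    link_created = os.path.islink(
        os.path.join(repo_root, "pool", "main", "foo_1.0_amd64.deb")
    )
```

The point of the design is that the cost per package no longer depends on the size of the .deb file, because only the database record and a symlink are involved.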

Our consultants came up with the following results:

  • Two major pull requests:
    • Ensure the db is used consistently by quba42 · Pull Request #61 · pulp/pulp_deb
    • MetadataStep performance by quba42 · Pull Request #57 · pulp/pulp_deb
  • An end to our memory problems
  • Syncs for medium-sized repositories (1,500 packages) that are more than twice as fast
  • Syncing Ubuntu Xenial (main, restricted, universe, multiverse) for amd64 (53,837 packages) in 3h36m on the test system

What did everyone learn? It is important to know your tools! Furthermore, you have to take your time to plan the architecture and gain the required domain knowledge.

ATIX-Crew

The ATIX crew consists of people who work in different areas: consulting, development/engineering, support, sales, and marketing.
