
Fresh deployment = Cannot open config file /etc/pihole/pihole.toml in exclusive mode (r): Bad file descriptor #1715

virtualex-itv opened this issue Feb 19, 2025 · 22 comments


virtualex-itv commented Feb 19, 2025

This is a: Bug

Details

Spinning up a new instance of Pi-hole v6 in Docker today, using the newer env vars. However, when I start the container I see the following error in the logs:

Cannot open config file /etc/pihole/pihole.toml in exclusive mode (r): Bad file descriptor

As a result, none of the settings I am trying to configure are applied.

Note: This seems to happen only when using a persistent volume on an external device (NFS in my case).
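One quick way to confirm what filesystem actually backs the bind-mounted directory (a sketch assuming GNU coreutils' stat on the host; the directory variable below is a stand-in for your real volume path):

```shell
# Print the filesystem type behind the host directory that is
# bind-mounted into /etc/pihole. An output of "nfs" means flock()
# semantics differ from a local filesystem.
VOLDIR="${VOLDIR:-.}"   # stand-in; point this at your volume directory
stat -f -c %T "$VOLDIR"
```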

Related Issues

  • I have searched this repository/Pi-hole forums for existing issues and pull requests that look similar

How to reproduce the issue

  1. Environment data
  • Operating System: Ubuntu
  • Hardware: VMware VMs & RasPi 4B
  • Kernel architecture: x86 & ARM 64-bit
  • Docker install info and version:
    • Software source: official docker-ce
    • Supplementary software: Portainer
  • Hardware architecture: x86 & ARMv7
  2. docker-compose.yml contents, docker run shell command, or paste a screenshot of any UI-based configuration of containers here
services:
  pihole:
    image: pihole/pihole:latest
    container_name: pihole
    restart: unless-stopped
    security_opt:
      - no-new-privileges:false
    hostname: pi.hole
    environment:
      # recommended
      - TZ=${TZ}
      - FTLCONF_webserver_api_password=${SECRET}
      # optional
      - FTLCONF_dns_upstreams=1.1.1.1;9.9.9.9
      - FTLCONF_dns_dnssec=true
      - FTLCONF_dns_bogusPriv=true
      - FTLCONF_dns_domainNeeded=true
      - FTLCONF_webserver_interface_theme=auto
      # advanced
      - FTLCONF_dns_specialDomains_iCloudPrivateRelay=false
    volumes:
      - ${DOCKER_DIR}/appdata/pihole/pihole:/etc/pihole
      - ${DOCKER_DIR}/appdata/pihole/dnsmasq.d:/etc/dnsmasq.d
    networks:
      proxy:
    ports:
      - 53:53/tcp
      - 53:53/udp
      - 1080:80/tcp
      - 1443:443/tcp
    dns:
      - 127.0.0.1
      - 1.1.1.1

networks:
  proxy:
    external: true

These common fixes didn't work for my issue

  • I have tried removing/destroying my container, and re-creating a new container
  • I have tried fresh volume data by backing up and moving/removing the old volume data
  • I have tried running the stock docker run example(s) in the readme (removing any customizations I added)
  • I have tried a newer or older version of Docker Pi-hole (depending what version the issue started in for me)
  • I have tried running without my volume data mounts to eliminate volumes as the cause

If the above debugging / fixes revealed any new information note it here.
Add any other debugging steps you've taken or theories on root cause that may help.

@paimonsoror

+1 on this for me as well. Happens in Kubernetes only when using a PV; without a PV it operates normally.

@Kakuhiry

+1, also going through this. Had to downgrade to v5 from scratch to be able to use it normally with a PV. Hope we can get it fixed soon.

@EthanBannister

+1 docker with NFS volumes

@pushpinderbal

+1 on kubernetes using a Longhorn RWX persistent volume.

I wonder if it's because flock() on NFS shares requires the file to be opened in write mode:
https://github.com/pi-hole/FTL/blob/eaa7dbb4cf4316413558c9d4b4e3f2a44f8e8203/src/config/toml_helper.c#L72

https://man7.org/linux/man-pages/man2/flock.2.html

Up to Linux 2.6.11, flock() does not lock files over NFS (i.e.,
the scope of locks was limited to the local system). Instead, one
could use fcntl(2) byte-range locking, which does work over NFS,
given a sufficiently recent version of Linux and a server which
supports locking.
Since Linux 2.6.12, NFS clients support flock() locks by emulating
them as fcntl(2) byte-range locks on the entire file. This means
that fcntl(2) and flock() locks do interact with one another over
NFS. It also means that in order to place an exclusive lock, the
file must be opened for writing.
Since Linux 2.6.37, the kernel supports a compatibility mode that
allows flock() locks (and also fcntl(2) byte region locks) to be
treated as local; see the discussion of the local_lock option in
nfs(5).
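That theory is easy to probe from a shell with util-linux's flock(1), which (in common versions) opens the target read-only. On an NFS mount, where flock() is emulated as a byte-range lock, an exclusive lock on a read-only descriptor is expected to fail, mirroring FTL's error. A sketch; the path passed at the end is only an example:

```shell
# Probe whether an exclusive flock() can be taken on a file in a given
# directory. flock(1) opens the file read-only, so on NFS (where the
# lock is emulated as an fcntl() byte-range lock) the exclusive lock
# is expected to fail, just like FTL's lock on pihole.toml.
lock_probe() {
  dir="${1:-.}"
  f="$dir/.locktest.$$"
  touch "$f" || { echo "cannot create test file in $dir"; return 1; }
  if flock -x -n "$f" -c true; then
    echo "exclusive lock OK on $dir"
  else
    echo "exclusive lock FAILED on $dir"
  fi
  rm -f "$f"
}

lock_probe .   # replace . with your mounted volume, e.g. /etc/pihole
```

If the probe fails on the share, the local_lock mount option from nfs(5) quoted above may be one workaround, at the cost of the locks no longer being visible to other NFS clients.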

@yubiuser
Member

We have tried to bring up a fix for this; it's PR pi-hole/FTL#2218.

It would be great if you could test this. To do so, you need to build the Pi-hole image yourself, including this particular FTL branch. The steps are pretty simple; the build produces an image tagged pihole:local.

  1. Clone this repo
  2. Run `./build.sh -f tweak/file_locking`
  3. Spin up the container
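Spelled out as commands (the repo URL matches the repository this issue lives in; the docker run line is only an illustration of pointing at the local tag, with placeholder env values):

```shell
# Build a local Pi-hole image that includes the tweak/file_locking FTL
# branch; build.sh tags the result as pihole:local.
git clone https://github.com/pi-hole/docker-pi-hole.git
cd docker-pi-hole
./build.sh -f tweak/file_locking

# Then run the locally built tag in place of pihole/pihole:latest:
docker run -d --name pihole-test \
  -e TZ=UTC \
  -e FTLCONF_webserver_api_password=changeme \
  pihole:local
```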

More details on building the image locally: https://github.com/pi-hole/docker-pi-hole?tab=readme-ov-file#building-the-image-locally

Thanks for your help

@pushpinderbal

Thank you @yubiuser. I'm seeing better results now running inside Kubernetes. The container log still shows the error message; however, it no longer crashes as it used to when restoring a backup, configuration modification works fine, and DNS resolution is OK. Is the error message expected?

ERROR: Cannot get exclusive lock for /etc/pihole/pihole.toml: Bad file descriptor

@paimonsoror

paimonsoror commented Feb 21, 2025

Same result as @pushpinderbal ! But we are up and running and came straight off of an upgrade from v5. I'm going to do a fresh build next.

In case anyone wants to test as well: https://hub.docker.com/repository/docker/paimonsoror/pihole/general

edit: Fresh install to PV w/ a post install restore from teleporter backup success as well!

@DL6ER
Member

DL6ER commented Feb 21, 2025

Yes, the error message is still part of the deal, because your particular setup prevents FTL from getting an exclusive lock on the file: this is not supported by the filesystem you are using. I still think the message is meaningful, since if others were writing to this file while FTL is writing too, it may end up damaged/invalid. Debugging such a case is easier when we know that the mechanism trying to prevent exactly this is not applicable here.

@yubiuser
Member

Shouldn't the error then be turned into a warning?

@evulhotdog

Confirmed that this resolved my issue.

@EmilyNerdGirl

EmilyNerdGirl commented Feb 22, 2025

I just tried image 2025.02.04 after upgrading, and it didn't resolve the problem for me in k8s with NFS for storage. The pod doesn't come up and keeps restarting.

  [i] Starting FTL configuration
  [i] Password already set in config file
  [i] Starting crond for scheduled scripts. Randomizing times for gravity and update checker

  [i] Ensuring logrotate script exists in /etc/pihole

  [i] Gravity migration checks
  [i] Existing gravity database found - schema will be upgraded if necessary


  [i] pihole-FTL pre-start checks
  [i] Setting capabilities on pihole-FTL where possible
  [i] Applying the following caps to pihole-FTL:
        * CAP_CHOWN
        * CAP_NET_BIND_SERVICE
        * CAP_NET_RAW

  [i] Starting pihole-FTL (no-daemon) as pihole

  [i] Version info:
      Core version is v6.0.3 (Latest: v6.0.3)
      Web version is v6.0.1 (Latest: v6.0.1)
      FTL version is v6.0.2 (Latest: v6.0.2)

2025-02-21 23:59:14.264 UTC [56M] INFO: ########## FTL started on pihole-5bdbb9b846-xtr64! ##########
2025-02-21 23:59:14.264 UTC [56M] INFO: FTL branch: master
2025-02-21 23:59:14.264 UTC [56M] INFO: FTL version: v6.0.2
2025-02-21 23:59:14.264 UTC [56M] INFO: FTL commit: ac500d5f
2025-02-21 23:59:14.264 UTC [56M] INFO: FTL date: 2025-02-21 21:48:20 +0000
2025-02-21 23:59:14.264 UTC [56M] INFO: FTL user: pihole
2025-02-21 23:59:14.265 UTC [56M] INFO: Compiled for linux/amd64 (compiled on CI) using cc (Alpine 14.2.0) 14.2.0
2025-02-21 23:59:14.266 UTC [56M] WARNING: Cannot get exclusive lock for /etc/pihole/pihole.toml: Bad file descriptor
2025-02-21 23:59:14.271 UTC [56M] WARNING: Cannot get exclusive lock for /etc/pihole/pihole.toml: Bad file descriptor
2025-02-21 23:59:14.276 UTC [56M] INFO: Wrote config file:
2025-02-21 23:59:14.276 UTC [56M] INFO:  - 152 total entries
2025-02-21 23:59:14.276 UTC [56M] INFO:  - 149 entries are default
2025-02-21 23:59:14.276 UTC [56M] INFO:  - 3 entries are modified
2025-02-21 23:59:14.276 UTC [56M] INFO:  - 0 entries are forced through environment
2025-02-21 23:59:14.298 UTC [56M] INFO: Parsed config file /etc/pihole/pihole.toml successfully
2025-02-21 23:59:14.298 UTC [56M] INFO: PID file does not exist or not readable
2025-02-21 23:59:14.298 UTC [56M] INFO: No other running FTL process found.
2025-02-21 23:59:14.298 UTC [56M] WARNING: Insufficient permissions to set process priority to -10 (CAP_SYS_NICE required), process priority remains at 0
2025-02-21 23:59:14.305 UTC [56M] INFO: PID of FTL process: 56
2025-02-21 23:59:14.307 UTC [56M] INFO: listening on 0.0.0.0 port 53
2025-02-21 23:59:14.307 UTC [56M] INFO: listening on :: port 53
2025-02-21 23:59:14.310 UTC [56M] INFO: PID of FTL process: 56
2025-02-21 23:59:14.381 UTC [56M] INFO: Database version is 21
2025-02-21 23:59:14.386 UTC [56M] INFO: Database successfully initialized
2025-02-21 23:59:23.702 UTC [127M] WARNING: Cannot get exclusive lock for /etc/pihole/pihole.toml: Bad file descriptor
2025-02-21 23:59:23.727 UTC [131M] WARNING: Cannot get exclusive lock for /etc/pihole/pihole.toml: Bad file descriptor
2025-02-21 23:59:23.742 UTC [135M] WARNING: Cannot get exclusive lock for /etc/pihole/pihole.toml: Bad file descriptor

@evulhotdog

evulhotdog commented Feb 22, 2025

I just tried image 2025.02.04 and didn't resolve the problem for me in k8s with nfs for storage.

@EmilyNerdGirl I still get the message, but it is a warning now, and the service itself works as intended; tbh it likely did before, I just was focusing on the ERROR message.

What isn't working for you?

@EmilyNerdGirl

I just tried image 2025.02.04 and didn't resolve the problem for me in k8s with nfs for storage.

@EmilyNerdGirl I still get the message, but it is a warning now, and the service itself works as intended; tbh it likely did before, I just was focusing on the ERROR message.

What isn't working for you?

The pod is still failing to come up, and eventually fails and is restarted. If I roll back to 2024.7.0 it works fine. Is there an upgrade step I may be missing causing it to fail out?

@evulhotdog

evulhotdog commented Feb 22, 2025

I just tried image 2025.02.04 and didn't resolve the problem for me in k8s with nfs for storage.

@EmilyNerdGirl I still get the message, but it is a warning now, and the service itself works as intended; tbh it likely did before, I just was focusing on the ERROR message.
What isn't working for you?

The pod is still failing to come up, and eventually fails and is restarted. If I roll back to 2024.7.0 it works fine. Is there an upgrade step I may be missing causing it to fail out?

Can you post your full deployment config and any other relevant manifests?

edit:

Mine for context...

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pihole
  namespace: default
  labels:
    app: pihole
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pihole
  template:
    metadata:
      labels:
        app: pihole
    spec:
      containers:
      - name: pihole
        image: ghcr.io/pi-hole/pihole:2025.02.4
        imagePullPolicy: Always
        securityContext:
          capabilities:
            add: ["SYS_NICE"]

        livenessProbe:
          httpGet:
            scheme: HTTP
            path: /admin/
            port: 80
          initialDelaySeconds: 30
          periodSeconds: 30
          timeoutSeconds: 5
          failureThreshold: 3

        resources:
          requests:
            memory: 100Mi
            cpu: "10m"

        env:
        - name: TZ
          value: "America/New_York"
        - name: FTLCONF_webserver_api_password
          valueFrom:
           secretKeyRef:
             name: pihole
             key: password

        # Sets upstream DNS servers
        - name: FTLCONF_dns_upstreams
          value: 10.0.10.1

        # Disable rate limiter
        - name: FTLCONF_dns_rateLimit_count
          value: "0"
        - name: FTLCONF_dns_rateLimit_interval
          value: "0"

        # Allows more than just the local subnet to use the resolver.
        - name: FTLCONF_dns_listeningMode
          value: "all"

        ports:
        - containerPort: 80

        volumeMounts:
        - name: etc-pihole
          mountPath: /etc/pihole
        - name: etc-dnsmasq-d
          mountPath: /etc/dnsmasq.d
        - name: dshm
          mountPath: /dev/shm

      volumes:
        - name: etc-pihole
          persistentVolumeClaim:
            claimName: pvc-pihole-etc-pihole
        - name: etc-dnsmasq-d
          persistentVolumeClaim:
            claimName: pvc-pihole-etc-dnsmasq-d
        # Pi-hole runs into a shared-memory (/dev/shm) limit on the default image.
        # https://www.reddit.com/r/pihole/comments/zgm48b/pihole_reporting_ram_shortage_when_there_is/
        # Fix: https://stackoverflow.com/questions/43373463/how-to-increase-shm-size-of-a-kubernetes-container-shm-size-equivalent-of-doc
        - name: dshm
          emptyDir:
            medium: Memory
            sizeLimit: 500Mi

@EmilyNerdGirl

Thank you for the help!

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "32"
    meta.helm.sh/release-name: pihole
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2024-08-19T20:32:09Z"
  generation: 32
  labels:
    app: pihole
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: pihole
    chart: pihole-2.27.0
    heritage: Helm
    release: pihole
  name: pihole
  namespace: default
  resourceVersion: "728944080"
  uid: 9ae0db1c-c5fd-4c9d-98df-151b44f5db6f
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: pihole
      release: pihole
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      annotations:
        checksum.config.adlists: 01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546
        checksum.config.blacklist: 01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546
        checksum.config.dnsmasqConfig: e515b926ce9520dd95e2f7bf9660e9b478d29cf4febb2d7bd877af307a71ff5
        checksum.config.regex: 01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546
        checksum.config.staticDhcpConfig: 01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546
        checksum.config.whitelist: 01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546
        kubectl.kubernetes.io/restartedAt: "2025-02-18T14:14:29-05:00"
      creationTimestamp: null
      labels:
        app: pihole
        app.kubernetes.io/name: pihole
        release: pihole
    spec:
      containers:
      - env:
        - name: PIHOLE_HOSTNAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        - name: PIHOLE_PORT
          value: "80"
        - name: PIHOLE_PASSWORD
          valueFrom:
            secretKeyRef:
              key: password
              name: pihole-password
        image: ekofr/pihole-exporter:latest
        imagePullPolicy: IfNotPresent
        name: exporter
        ports:
        - containerPort: 9617
          name: prometheus
          protocol: TCP
        resources:
          limits:
            memory: 128Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - env:
        - name: WEB_PORT
          value: "80"
        - name: VIRTUAL_HOST
          value: pi.hole
        - name: WEBPASSWORD
          valueFrom:
            secretKeyRef:
              key: password
              name: pihole-password
        - name: PIHOLE_DNS_
          value: 192.168.0.1
        - name: RATE_LIMIT
          value: 0/0
        image: pihole/pihole:2025.02.4
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 10
          httpGet:
            path: /admin/index.php
            port: http
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: pihole
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        - containerPort: 53
          name: dns
          protocol: TCP
        - containerPort: 53
          name: dns-udp
          protocol: UDP
        - containerPort: 443
          name: https
          protocol: TCP
        - containerPort: 67
          name: client-udp
          protocol: UDP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /admin/index.php
            port: http
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          limits:
            cpu: "6"
            memory: 4Gi
          requests:
            cpu: "2"
            memory: 4Gi
        securityContext:
          privileged: false
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/pihole
          name: config
        - mountPath: /etc/dnsmasq.d/02-custom.conf
          name: custom-dnsmasq
          subPath: 02-custom.conf
        - mountPath: /etc/addn-hosts
          name: custom-dnsmasq
          subPath: addn-hosts
      dnsConfig:
        nameservers:
        - 127.0.0.1
        - 1.1.1.1
      dnsPolicy: None
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: config
        persistentVolumeClaim:
          claimName: pihole
      - configMap:
          defaultMode: 420
          name: pihole-custom-dnsmasq
        name: custom-dnsmasq

@Maitlandk

I am also seeing the same issue while using the pihole helm chart in k3s and using NFS based storage via local-path-provisioner

@evulhotdog

evulhotdog commented Feb 22, 2025

I think you should read the release notes...

Edit: The helm chart may also not be updated for v6, either. 🤷‍♀

@EmilyNerdGirl

I think you should read the release notes...

Edit: The helm chart may also not be updated for v6, either. 🤷‍♀

I was just about to post this... Looks like the env variables in the old helm chart are incorrect. Going to convert over to a regular deployment and see what happens 🤞

@Maitlandk

Yeah, I think the issue is the variable renaming in the helm chart, which isn't related to this. I'm not set up to test at the moment though.

@EmilyNerdGirl

Thank you @evulhotdog ! Rewriting it using a deployment and I am up and running again :)

@paimonsoror

paimonsoror commented Feb 22, 2025

FWIW I have a PR for updates to MoJo's helm charts... Just waiting for confirmation of this fix before I move it out of draft.

MoJo2600/pihole-kubernetes#343

@vincentDcmps

Still seeing the issue with image 2025.02.4 with Docker and an NFS volume.
