fix: reload on resetting to defaults #159

Merged: 25 commits into linux-system-roles:main, Jul 21, 2023

Conversation

BrennanPaciorek
Collaborator

@BrennanPaciorek BrennanPaciorek commented Jul 18, 2023

Enhancement:
Make resetting to defaults reload instead of restart firewalld

Reason:
Reloading firewalld should be enough to complete the configuration reset, while restarting adds downtime that can be used to open a connection that persists after firewalld has finished restarting; such a connection can be used to bypass firewall rules, since firewalld will not block traffic from active connections.

Result:
Minimal downtime when using `previous: replaced`.

Addresses an issue brought up in #140, where due to the restart on resetting to defaults, the feature may not be suitable for production environments.

@codecov

codecov bot commented Jul 18, 2023

Codecov Report

Patch and project coverage are unchanged.

Comparison: base (e8e8769) 53.62% vs. head (29a1126) 53.62%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #159   +/-   ##
=======================================
  Coverage   53.62%   53.62%           
=======================================
  Files           2        2           
  Lines         800      800           
=======================================
  Hits          429      429           
  Misses        371      371           


@richm
Contributor

richm commented Jul 18, 2023

how can we test this?

@BrennanPaciorek
Collaborator Author

BrennanPaciorek commented Jul 18, 2023

how can we test this?

If you mean verifying that this will not introduce bugs into the feature, we could use firewall-cmd with ansible.builtin.command. This would involve:

  • Using the current implementation of `previous: replaced` to reset the firewall configuration
  • Getting some representation of the active firewall configuration (something like a hash of `nft list ruleset`)
  • Restarting the firewalld service instead of reloading it
  • Getting the representation of the active firewall configuration again
  • Comparing the two stored values

If `nft list ruleset` output is less predictable than I think, testing it may be more difficult than this.
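The steps above can be sketched as a small script. Here `ruleset_dump` is a hypothetical stand-in (so the sketch runs unprivileged); a real test would call `nft list ruleset` and actually perform the reset and restart, which need root and a running firewalld:

```shell
#!/bin/sh
# Sketch of the capture-and-compare flow described above.
# `ruleset_dump` is a hypothetical stand-in for `nft list ruleset`;
# the reset/restart step (commented out) needs root and a live firewalld.
ruleset_dump() {
    printf 'table inet firewalld {\n  chain filter_INPUT {\n  }\n}\n'
}

before=$(ruleset_dump | sha256sum | cut -d' ' -f1)
# ...apply the role with `previous: replaced`, then restart firewalld...
after=$(ruleset_dump | sha256sum | cut -d' ' -f1)

if [ "$before" = "$after" ]; then
    echo "rulesets match"
else
    echo "rulesets differ"
fi
```

With a fixed stand-in the hashes trivially match; the interesting question is whether they still match across a real reset followed by a restart.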

If you mean testing minimal downtime: `systemctl reload firewalld` uses firewalld's reload implementation to reload the configuration from files, while `systemctl restart firewalld` turns the firewall off and back on, leaving the firewall off for a few seconds (usually a bad idea in a production environment).

@BrennanPaciorek
Collaborator Author

  • If nft list ruleset output is less predictable than I think, testing it may be more difficult than this.

It appears that restart and reload produce functionally identical configurations, but the rules tend to be reordered slightly within the same chains, so I'll need to find a more accurate way to compare two configurations.
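One way to tolerate that reordering (my sketch, not necessarily what the PR ended up doing) is to sort the dump lines before hashing. Sorting the whole file ignores chain boundaries, so it can produce false matches, but it demonstrates the idea of normalizing before comparing:

```shell
#!/bin/sh
# Hypothetical order-insensitive comparison of two ruleset dumps.
# Sorting the whole file ignores chain structure, so this is only a
# crude approximation of "functionally identical".
rulesets_match() {
    a=$(sort "$1" | sha256sum | cut -d' ' -f1)
    b=$(sort "$2" | sha256sum | cut -d' ' -f1)
    [ "$a" = "$b" ]
}

# Demonstration with stand-in dumps that differ only in rule order:
printf 'rule A\nrule B\n' > /tmp/dump1
printf 'rule B\nrule A\n' > /tmp/dump2
rulesets_match /tmp/dump1 /tmp/dump2 && echo "match"
```

A more accurate comparison would sort within each chain rather than across the whole dump.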

@richm
Contributor

richm commented Jul 18, 2023

how can we test this?

If you mean verifying that this will not introduce bugs into the feature, we could use firewall-cmd with ansible.builtin.command. This would involve:

* Using the current implementation of `previous: replaced` to reset the firewall configuration

* Getting some representation of the active firewall configuration (something like a hash of `nft list ruleset`)

* Restarting the firewalld service instead of reloading it

* Getting the representation of the active firewall configuration again

* Comparing the two stored values
  If `nft list ruleset` output is less predictable than I think, testing it may be more difficult than this.

If you mean testing minimal downtime: `systemctl reload firewalld` uses firewalld's reload implementation to reload the configuration from files, while `systemctl restart firewalld` turns the firewall off and back on, leaving the firewall off for a few seconds (usually a bad idea in a production environment).

I mean the latter - check that the fix ensures minimal downtime - is there a way we can automate this - if not, how do we tell QE what to test?

@BrennanPaciorek
Collaborator Author

BrennanPaciorek commented Jul 18, 2023

is there a way we can automate this

Testing service downtime (or uptime) wouldn't work, because with reload, the service technically does not stop.

An idea would be to change a zone to DROP (or set an ICMP block) and place an interface in that zone, then send a set number of packets (or, more precisely, send them at a set interval) that would be dropped under this configuration. If replies come back, it can be taken to mean the firewall is down; if the packets are lost, it can be taken to mean the firewall is up and dropping them. Whichever run loses the most packets can be understood as having had the least downtime.

Alternatively, one could bring up a service such as a simple HTTP server, block it with the firewall, and run a similar test.
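For the ping-based variant, the metric would come from ping's summary line; a small helper like this (hypothetical, not part of the PR) could extract the packet-loss percentage from a saved ping log:

```shell
#!/bin/sh
# Hypothetical helper: pull the packet-loss percentage out of ping output
# saved to a file, e.g. "500 packets transmitted, 495 received, 1% packet loss".
packet_loss() {
    grep -o '[0-9.]*% packet loss' "$1" | cut -d'%' -f1
}

# Demonstration with a canned summary line instead of a live ping:
printf '500 packets transmitted, 495 received, 1%% packet loss, time 5003ms\n' > /tmp/ping.log
packet_loss /tmp/ping.log   # prints 1
```

Real ping output varies slightly between implementations (some print `0.0% packet loss`), so a real test would need to tolerate those formats as well.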

Does this seem like a valid test?

@richm
Contributor

richm commented Jul 18, 2023

is there a way we can automate this

Testing service downtime (or uptime) wouldn't work, because with reload, the service technically does not stop.

An idea would be to change a zone to DROP (or set an ICMP block) and place an interface in that zone, then send a set number of packets (or, more precisely, send them at a set interval) that would be dropped under this configuration. If replies come back, it can be taken to mean the firewall is down; if the packets are lost, it can be taken to mean the firewall is up and dropping them. Whichever run loses the most packets can be understood as having had the least downtime.

Alternatively, one could bring up a service such as a simple HTTP server, block it with the firewall, and run a similar test.

Does this seem like a valid test?

They both seem like valid tests. By "simple http server", maybe `nc -l -p 80 -o file`? Then we could look at file and see if packets were dropped?

@BrennanPaciorek
Collaborator Author

BrennanPaciorek commented Jul 19, 2023

They both seem like valid tests. By "simple http server", maybe `nc -l -p 80 -o file`? Then we could look at file and see if packets were dropped?

nc would be better for mimicking production environments, while ping is probably a more convenient tool for getting the necessary metrics (packet loss %, estimated downtime).

One issue so far in implementing this in a tox-friendly manner is that firewalld accepts outbound traffic on the loopback interface before evaluating policies (where outbound traffic is blocked). Disabling this rule takes some time (~100 ms), which may result in more apparent downtime than in a production environment (or even a less restricted test environment), where this limitation of needing to use the loopback interface for tests does not exist.

The other issue with these tests is that using the `previous: replaced` feature would undo the firewall rules needed to test the effectiveness of the modified feature.
Would comparing the downtime of `systemctl reload firewalld.service` and `systemctl restart firewalld.service` suffice for testing this? If not, we can use a modified reset script to re-add any required rules before restarting or reloading firewalld.

@richm
Contributor

richm commented Jul 19, 2023

They both seem like valid tests - by "simple http server", maybe nc -l -p 80 -o file? then we could look at file and see if packets were dropped?

nc would be better for mimicking production environments, while ping is probably a more convenient tool for getting the necessary metrics (packet loss %, estimated downtime).

One issue so far in implementing this in a tox-friendly manner is that firewalld accepts outbound traffic on the loopback interface before evaluating policies (where outbound traffic is blocked). Disabling this rule takes some time (~100 ms), which may result in more apparent downtime than in a production environment (or even a less restricted test environment), where this limitation of needing to use the loopback interface for tests does not exist.

The other issue with these tests is that using the `previous: replaced` feature would undo the firewall rules needed to test the effectiveness of the modified feature. Would comparing the downtime of `systemctl reload firewalld.service` and `systemctl restart firewalld.service` suffice for testing this?

Yes, I think that would be fine.

If not, we can use a modified reset script to re-add any required rules before restarting or reloading firewalld.

@BrennanPaciorek
Collaborator Author

BrennanPaciorek commented Jul 20, 2023

Okay, I've implemented the described test in a way that is compatible with the integration tests. I can move this to another file, but the test takes some time and probably shouldn't be run on every PR, since it's not testing any code prone to change, just highlighting the security difference made by the one-line change.

podman network create --subnet 172.16.1.0/24 --gateway 172.16.1.1 --interface-name podmanbr0 podmanbr0
trap "podman network rm podmanbr0" EXIT
imageid=$(podman build -q /tmp)
podman run -d --privileged --net podmanbr0 --ip 172.16.1.2 --name test-firewalld --rm "$imageid" /usr/lib/systemd/systemd || exit 1
Collaborator Author

@BrennanPaciorek BrennanPaciorek Jul 20, 2023


The `--rm` option was already provided; should I move it?

@richm
Contributor

richm commented Jul 20, 2023

[citest]

Comment on lines 35 to 45
ping -c 500 -i 0.01 172.16.1.2 1>/tmp/ping1 2>/dev/null &
pid="$!"
trap "rm -f /tmp/ping1" EXIT
podman exec test-firewalld systemctl restart firewalld.service
wait "$pid"

ping -c 500 -i 0.01 172.16.1.2 1>/tmp/ping2 2>/dev/null &
pid="$!"
trap "rm -f /tmp/ping2" EXIT
podman exec test-firewalld systemctl reload firewalld.service
wait "$pid"
Collaborator Author


The order of these sometimes introduces a failure case where a connection is established by `systemctl restart firewalld.service`. I swapped the two tests locally, but am waiting for the CI tests to run.

@BrennanPaciorek
Collaborator Author

Locally, I can't seem to reproduce the same error that is showing on these CI tests. Maybe it has to do with the order in which the tests are being run?

@richm
Contributor

richm commented Jul 20, 2023

Locally, I can't seem to reproduce the same error that is showing on these CI tests. Maybe it has to do with the order in which the tests are being run?

Yes, that must be what it is

test_ping.sh could carry an open connection if restart is done first; swapping the two fixes this, because there should be no downtime with systemctl reload firewalld
__firewall_service is no longer necessary for files/get_files_checksums.sh, so it was removed
@richm
Contributor

richm commented Jul 21, 2023

[citest]

@BrennanPaciorek
Collaborator Author

Modified the test one more time, as some platforms running the test incorrectly reported packet loss because the timeout I had set was too short.

@richm
Contributor

richm commented Jul 21, 2023

[citest]

@richm richm merged commit cb392b6 into linux-system-roles:main Jul 21, 2023