xvsec: use mutex lock instead of spin lock for Linux v6.x #217

Open: cyyself wants to merge 1 commit into master

@cyyself cyyself commented Jun 16, 2023

xvsec takes about 50 seconds to replace a 30 MB PR (partial reconfiguration) block bitstream on a VCU118. On my host, running Linux v6.3 with the xvsec patch from #142, the machine usually hangs while xvsec is replacing the bitstream. IMHO, holding a spin lock for that long harms the host system and is what makes my host stop responding.
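
To put the 50 seconds in perspective, here is a rough estimate of how many configuration-space writes the transfer takes and how long each one lasts on average. It assumes that 30 MB means 30,000,000 bytes and that each write carries one 32-bit word; neither is stated above, and the helper names are made up for illustration.

```c
#include <stdint.h>

/* Number of writes needed to push the whole bitstream, assuming every
 * write carries word_bytes bytes (assumption: 4, i.e. one 32-bit word). */
static uint64_t writes_needed(uint64_t bitstream_bytes, uint32_t word_bytes)
{
    /* Round up so a trailing partial word still counts as one write. */
    return (bitstream_bytes + word_bytes - 1) / word_bytes;
}

/* Average duration of one write in nanoseconds (integer division). */
static uint64_t ns_per_write(uint64_t total_seconds, uint64_t writes)
{
    return total_seconds * 1000000000ULL / writes;
}
```

Under those assumptions, `writes_needed(30000000, 4)` gives 7,500,000 writes and `ns_per_write(50, 7500000)` gives 6666 ns, so roughly 6.7 µs per write. The individual writes are short; it is the total time spent without releasing the lock that adds up.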

When I pass the FPGA's PCIe device through to a QEMU guest and program the bitstream with xvsec inside the guest, the system keeps working, but dmesg reports an RCU stall and a soft lockup:

[  823.510494] Xilinx VSEC Library v2020.2.1
[  823.510516] xvsec_initialize : dev_ctx address : 00000000da44524c
[  823.510576] The major number is 243, bus no : 8, dev no : 0
[  823.510699] xvsec_cdev_create : xvsec0800
[  823.510700] xvsec_mcap_module_init: mcap_ctx address : 000000003a70b8e5
[  823.510720] xvsec_mcap_get_revision: Version details vsec_id:1, rev_id: 1
[  823.510721] The major number is 242, bus no : 8, dev no : 0
[  823.510840] xvsec_cdev_create : xvsec0800_mcap
[  823.510842] xvsec_initialize() success for device 1
[  823.510842] xvsec_drv_init : Success
[  826.899315] xvsec_gen_open success
[  826.899341] xvsec_mcap_open: mcap_ctx address : 000000003a70b8e5
[  826.899350] xvsec_mcap_get_revision: vsec_id:1, rev_id: 1
[  826.899602] Ctrl Data : 0x364, 0x10101
[  826.899607] Bit File Name : /home/cyy/example_pblock_partition_partial.bit
[  826.899610] Before fopen
[  826.899611] file name : 000000001b4eff79
[  826.899621] After fopen
[  826.899622] After getsize
[  826.899700] found sync pattern : 146
[  847.899847] rcu: INFO: rcu_preempt self-detected stall on CPU
[  847.899860] rcu:     20-....: (5250 ticks this GP) idle=d954/1/0x4000000000000000 softirq=4578/5647 fqs=2484
[  847.899869] rcu:     (t=5250 jiffies g=41237 q=19342 ncpus=24)
[  847.899877] CPU: 20 PID: 3460 Comm: xvsecctl Tainted: G            E      6.3.0-1-amd64 #1  Debian 6.3.7-1
[  847.899878] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 2/2/2022
[  847.899879] RIP: 0010:pci_mmcfg_write+0xb3/0xd0
[  847.899884] Code: 5d 41 5e 41 5f c3 cc cc cc cc 48 8d 14 28 44 89 e0 66 89 02 eb de 48 8d 14 28 44 89 e0 88 02 eb d3 48 8d 14 28 44 89 e0 89 02 <eb> c8 e8 a6 7c 61 ff b8 ea ff ff ff eb c3 b8 ea ff ff ff c3 cc cc
[  847.899885] RSP: 0018:ffffa0ca80bcbc08 EFLAGS: 00000246
[  847.899887] RAX: 0000000000000000 RBX: 0000000000800000 RCX: 0000000000000368
[  847.899916] RDX: ffffa0ca90800368 RSI: 0000000000000008 RDI: 0000000000000000
[  847.899917] RBP: 0000000000000368 R08: 0000000000000004 R09: 0000000000000000
[  847.899917] R10: 0000000000000008 R11: ffffffff8734c750 R12: 0000000000000000
[  847.899917] R13: 0000000000000000 R14: 0000000000000004 R15: 0000000000000000
[  847.899918] FS:  00007f97fc179740(0000) GS:ffff8c88fbd00000(0000) knlGS:0000000000000000
[  847.899919] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  847.899919] CR2: 00007f97fc30b1b0 CR3: 000000011342e000 CR4: 0000000000750ee0
[  847.899922] PKRU: 55555554
[  847.899922] Call Trace:
[  847.899924]  <IRQ>
[  847.899927]  ? rcu_dump_cpu_stacks+0xc4/0x100
[  847.899930]  ? rcu_sched_clock_irq+0x555/0x1210
[  847.899933]  ? sched_slice+0x87/0x140
[  847.899936]  ? __cgroup_account_cputime_field+0x61/0x90
[  847.899938]  ? update_process_times+0x7b/0xb0
[  847.899939]  ? tick_sched_handle+0x22/0x60
[  847.899941]  ? tick_sched_timer+0x73/0x90
[  847.899942]  ? __pfx_tick_sched_timer+0x10/0x10
[  847.899943]  ? __hrtimer_run_queues+0x10f/0x2b0
[  847.899944]  ? hrtimer_interrupt+0x102/0x240
[  847.899945]  ? __sysvec_apic_timer_interrupt+0x80/0x170
[  847.899947]  ? sysvec_apic_timer_interrupt+0x9d/0xd0
[  847.899948]  </IRQ>
[  847.899949]  <TASK>
[  847.899949]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
[  847.899951]  ? __pfx_pci_mmcfg_write+0x10/0x10
[  847.899953]  ? pci_mmcfg_write+0xb3/0xd0
[  847.899954]  ? pci_mmcfg_write+0x4e/0xd0
[  847.899958]  xvsec_mcap_program+0x90c/0x9a0 [xvsec]
[  847.899963]  ? _printk+0x64/0x80
[  847.899966]  xvsec_mcap_program_bitstream+0x1a9/0x2b0 [xvsec]
[  847.899970]  xvsec_ioc_prog_bitstream+0xbc/0xf0 [xvsec]
[  847.899974]  xvsec_mcap_ioctl+0x35/0x50 [xvsec]
[  847.899977]  __x64_sys_ioctl+0x91/0xd0
[  847.899988]  do_syscall_64+0x5c/0xc0
[  847.899990]  ? syscall_exit_to_user_mode+0x1b/0x40
[  847.899991]  ? do_syscall_64+0x6b/0xc0
[  847.899992]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[  847.899994] RIP: 0033:0x7f97fc279afb
[  847.899996] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[  847.899997] RSP: 002b:00007ffec39f2f40 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  847.899998] RAX: ffffffffffffffda RBX: 0000000000000038 RCX: 00007f97fc279afb
[  847.899998] RDX: 00007ffec39f2fa0 RSI: 00000000c0086d06 RDI: 0000000000000004
[  847.899999] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  847.899999] R10: 4deb7e4047e070fa R11: 0000000000000246 R12: 000055ae10e7e760
[  847.899999] R13: 0000000000000000 R14: 000055ae1098d1c0 R15: 0000000000000000
[  847.900000]  </TASK>
[  872.320000] watchdog: BUG: soft lockup - CPU#20 stuck for 45s! [xvsecctl:3460]
[  872.320014] Modules linked in: xvsec(E) binfmt_misc intel_rapl_msr intel_rapl_common intel_pmc_core kvm_intel snd_hda_codec_generic ledtrig_audio snd_hda_intel kvm snd_intel_dspcfg snd_intel_sdw_acpi irqbypass snd_hda_codec ghash_clmulni_intel snd_hda_core sha512_ssse3 snd_hwdep sha512_generic nls_ascii aesni_intel snd_pcm crypto_simd cryptd virtio_gpu nls_cp437 snd_timer iTCO_wdt vfat intel_pmc_bxt virtio_dma_buf rapl drm_shmem_helper iTCO_vendor_support snd fat drm_kms_helper virtio_rng pcspkr watchdog soundcore rng_core virtio_console virtio_balloon joydev button sg serio_raw evdev drm fuse dm_mod loop efi_pstore configfs efivarfs qemu_fw_cfg ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic hid_generic usbhid hid sr_mod cdrom ahci libahci virtio_net xhci_pci xhci_hcd net_failover virtio_blk virtio_scsi failover libata usbcore scsi_mod virtio_pci crct10dif_pclmul i2c_i801 crct10dif_common psmouse virtio_pci_legacy_dev crc32_pclmul virtio_pci_modern_dev crc32c_intel virtio lpc_ich i2c_smbus
[  872.320051]  scsi_common usb_common virtio_ring [last unloaded: xvsec(E)]
[  872.320054] CPU: 20 PID: 3460 Comm: xvsecctl Tainted: G            E      6.3.0-1-amd64 #1  Debian 6.3.7-1
[  872.320055] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 2/2/2022
[  872.320056] RIP: 0010:pci_mmcfg_write+0xb3/0xd0
[  872.320061] Code: 5d 41 5e 41 5f c3 cc cc cc cc 48 8d 14 28 44 89 e0 66 89 02 eb de 48 8d 14 28 44 89 e0 88 02 eb d3 48 8d 14 28 44 89 e0 89 02 <eb> c8 e8 a6 7c 61 ff b8 ea ff ff ff eb c3 b8 ea ff ff ff c3 cc cc
[  872.320062] RSP: 0018:ffffa0ca80bcbc08 EFLAGS: 00000246
[  872.320063] RAX: 0000000040000000 RBX: 0000000000800000 RCX: 0000000000000368
[  872.320064] RDX: ffffa0ca90800368 RSI: 0000000000000008 RDI: 0000000000000000
[  872.320065] RBP: 0000000000000368 R08: 0000000000000004 R09: 0000000040000000
[  872.320065] R10: 0000000000000008 R11: ffffffff8734c750 R12: 0000000040000000
[  872.320066] R13: 0000000000000000 R14: 0000000000000004 R15: 0000000000000000
[  872.320066] FS:  00007f97fc179740(0000) GS:ffff8c88fbd00000(0000) knlGS:0000000000000000
[  872.320067] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  872.320068] CR2: 00007f97fc30b1b0 CR3: 000000011342e000 CR4: 0000000000750ee0
[  872.320070] PKRU: 55555554
[  872.320070] Call Trace:
[  872.320072]  <IRQ>
[  872.320074]  ? watchdog_timer_fn+0x220/0x290
[  872.320076]  ? __pfx_watchdog_timer_fn+0x10/0x10
[  872.320077]  ? __hrtimer_run_queues+0x10f/0x2b0
[  872.320079]  ? hrtimer_interrupt+0x102/0x240
[  872.320080]  ? __sysvec_apic_timer_interrupt+0x80/0x170
[  872.320082]  ? sysvec_apic_timer_interrupt+0x9d/0xd0
[  872.320083]  </IRQ>
[  872.320083]  <TASK>
[  872.320084]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
[  872.320086]  ? __pfx_pci_mmcfg_write+0x10/0x10
[  872.320088]  ? pci_mmcfg_write+0xb3/0xd0
[  872.320090]  ? pci_mmcfg_write+0x4e/0xd0
[  872.320092]  xvsec_mcap_program+0x90c/0x9a0 [xvsec]
[  872.320096]  ? _printk+0x64/0x80
[  872.320098]  xvsec_mcap_program_bitstream+0x1a9/0x2b0 [xvsec]
[  872.320102]  xvsec_ioc_prog_bitstream+0xbc/0xf0 [xvsec]
[  872.320106]  xvsec_mcap_ioctl+0x35/0x50 [xvsec]
[  872.320108]  __x64_sys_ioctl+0x91/0xd0
[  872.320112]  do_syscall_64+0x5c/0xc0
[  872.320113]  ? syscall_exit_to_user_mode+0x1b/0x40
[  872.320114]  ? do_syscall_64+0x6b/0xc0
[  872.320115]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[  872.320116] RIP: 0033:0x7f97fc279afb
[  872.320118] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[  872.320119] RSP: 002b:00007ffec39f2f40 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  872.320120] RAX: ffffffffffffffda RBX: 0000000000000038 RCX: 00007f97fc279afb
[  872.320120] RDX: 00007ffec39f2fa0 RSI: 00000000c0086d06 RDI: 0000000000000004
[  872.320121] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  872.320121] R10: 4deb7e4047e070fa R11: 0000000000000246 R12: 000055ae10e7e760
[  872.320122] R13: 0000000000000000 R14: 000055ae1098d1c0 R15: 0000000000000000
[  872.320123]  </TASK>
[  881.676662] xvsec_gen_close success

After replacing spin_lock with mutex_lock, everything works well on Linux kernel v6.3. Using my branch together with the other patches mentioned in #210, I can run FireSim on VCU118 on Linux v6.3 with working XDMA and XVSEC.

@cyyself cyyself changed the title from "xvsec: use mutex lock instead of spin lock" to "xvsec: use mutex lock instead of spin lock for Linux v6.x" Jun 16, 2023
@@ -221,7 +221,7 @@ static int xvsec_gen_open(struct inode *inode, struct file *filep)
 	dev_ctx = container_of(inode->i_cdev,
 			struct context, generic_cdev.cdev);

-	spin_lock(&dev_ctx->lock);
+	mutex_lock(&dev_ctx->lock);

Perhaps mutex_lock_interruptible() is more appropriate.
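
A minimal sketch of what that would change in a handler: unlike `mutex_lock()`, `mutex_lock_interruptible()` returns an `int` (0 on success, `-EINTR` if a signal interrupts the wait), so the caller has to check it and bail out. The code below is a user-space model, not driver code: every `demo_*` name and the `open_count` field are hypothetical stand-ins so that the control flow can be compiled and run outside the kernel.

```c
#include <errno.h>

#ifndef ERESTARTSYS
#define ERESTARTSYS 512 /* kernel-internal code; user-space errno.h lacks it */
#endif

/* Hypothetical stand-in for the kernel's struct mutex. */
struct demo_mutex {
    int locked;
    int signal_pending; /* simulates a signal arriving during the wait */
};

/* Models only the return contract of mutex_lock_interruptible():
 * 0 when the lock was taken, -EINTR when a signal cut the wait short. */
static int demo_mutex_lock_interruptible(struct demo_mutex *l)
{
    if (l->signal_pending)
        return -EINTR;
    l->locked = 1;
    return 0;
}

static void demo_mutex_unlock(struct demo_mutex *l)
{
    l->locked = 0;
}

/* Hypothetical device context; open_count is illustrative only. */
struct demo_ctx {
    struct demo_mutex lock;
    int open_count;
};

/* Shape of an open() handler that takes the lock interruptibly. */
static int demo_open(struct demo_ctx *ctx)
{
    if (demo_mutex_lock_interruptible(&ctx->lock))
        return -ERESTARTSYS; /* nothing was locked, so nothing to undo */

    ctx->open_count++;
    demo_mutex_unlock(&ctx->lock);
    return 0;
}
```

The point of the variant is that a process waiting for the lock can still be interrupted by a signal such as Ctrl-C instead of sleeping uninterruptibly until the holder finishes. The cost is that every call site needs an error path.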
