
Hey!


Is there any support planned for Ubuntu 24.04? I don’t have a machine available with the older Ubuntu version. It says here that 22.04 is natively supported, but not that 24.04 is completely unsupported.

 

Thanks 

➜  voyager-sdk git:(release/v1.2.5) ./install.sh --all --media --user florianzarubanat]gmail.com --token <token>

Install/update prerequisite packages required by the installer itself (y/n): y
Install/update prerequisite packages required by the installer itself (y/n): y
ERROR: cfg/config-ubuntu-2404-amd64.yaml: File not found
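
For anyone hitting the same error: the message suggests the installer looks up one config file per release, so checking `cfg/` shows which releases a given checkout ships. A minimal sketch, assuming the `cfg/config-<distro>-<version>-<arch>.yaml` naming inferred from the error message alone; `config_path` and `check_release` are hypothetical helpers, not part of the SDK:

```shell
#!/bin/sh
# Hypothetical helper: builds the config file name the installer appears to
# look for. The naming scheme is an assumption taken from the error message.
config_path() {
    distro="$1"                                  # e.g. ubuntu
    version="$(printf '%s' "$2" | tr -d '.')"    # 24.04 -> 2404
    arch="$3"                                    # e.g. amd64
    printf 'cfg/config-%s-%s-%s.yaml\n' "$distro" "$version" "$arch"
}

# Report whether the current directory ships a config for a given release.
check_release() {
    path="$(config_path "$@")"
    if [ -f "$path" ]; then
        echo "found: $path"
    else
        echo "missing: $path"
    fi
}
```

Run from the repository root, `check_release ubuntu 24.04 amd64` would report whether that file exists, and `ls cfg/` lists whatever configs are actually present.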

 

Hi there, welcome to the community!

We’ve actually been discussing that with ​@quant.geek, regarding 24.04 in a Docker container:

Should have more on this subject soon, but just to get you up to date in the meantime 👍


Might be an update on this sooner than expected! 😃

 


Awesome, thanks. I got the example benchmark running. Two observations:

 

  1. I think the Docker guide isn’t entirely correct. I also needed to pass the device into the Docker container to make it available. The naming of the card is a bit unfortunate: it contains colons, and `--device` uses colons to separate the host path, container path, and permissions, so I first needed to create a symbolic link without colons.
  2. I observed (in dmesg) that there is some sort of fault in the PCIe driver. If you are interested I can post it here. Since everything seems to work I didn’t follow up on it though.

thanks


Great to hear it’s working, @zarubaf! Any feedback or findings you came across (like changes we might want to make to the Docker documentation, the use of colons, and what you found with the driver) would be really helpful. Please go for it!


Sure happy to help with any input :-) 

[198626.875237] axl 0000:b3:00.0: Unregister directory 0000:b3:00.0
[198626.875314] axl 0000:b3:00.0: Unregistered triton-0:b3:0 (0 0)
[198626.875317] axl 0000:b3:00.0: Release dma mem triton-0:b3:0
[198626.962570] ------------[ cut here ]------------
[198626.962576] axl 0000:b3:00.0: disabling already-disabled device
[198626.962602] WARNING: CPU: 1 PID: 453781 at drivers/pci/pci.c:2254 pci_disable_device+0xac/0xc0
[198626.962615] Modules linked in: metis(OE) xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables libcrc32c tls rpcsec_gss_krb5 nfsv4 nfs lockd grace netfs snd_seq_dummy snd_hrtimer overlay qrtr binfmt_misc zfs(PO) spl(O) nvidia_uvm(PO) nvidia_drm(PO) nvidia_modeset(PO) intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common dell_pc platform_profile nvidia(PO) skx_edac skx_edac_common nfit x86_pkg_temp_thermal intel_powerclamp snd_soc_avs snd_hda_codec_realtek snd_soc_hda_codec snd_hda_codec_generic snd_hda_ext_core snd_hda_scodec_component snd_soc_core snd_hda_codec_hdmi snd_compress ac97_bus snd_pcm_dmaengine snd_hda_intel snd_intel_dspcfg coretemp snd_intel_sdw_acpi snd_hda_codec snd_hda_core kvm_intel snd_hwdep snd_pcm snd_seq_midi dell_wmi snd_seq_midi_event kvm snd_rawmidi snd_seq dell_smm_hwmon drm_ttm_helper rapl dell_smbios snd_seq_device nls_iso8859_1 snd_timer ttm dcdbas i2c_i801
[198626.962735] sparse_keymap dell_wmi_descriptor intel_cstate wmi_bmof intel_wmi_thunderbolt snd video i2c_mux mei_me soundcore ftdi_sio i2c_smbus ioatdma usbserial mei dca acpi_tad joydev input_leds mac_hid serio_raw msr parport_pc auth_rpcgss ppdev lp parport efi_pstore sunrpc nfnetlink dmi_sysfs ip_tables x_tables autofs4 8021q garp mrp stp llc dm_crypt hid_generic usbhid hid uas usb_storage nvme nvme_core nvme_auth crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 vmd ahci xhci_pci e1000e libahci xhci_pci_renesas wmi aesni_intel crypto_simd cryptd
[198626.962830] CPU: 1 UID: 0 PID: 453781 Comm: tee Tainted: P W OE 6.11.0-21-generic #21~24.04.1-Ubuntu
[198626.962840] Tainted: [P]=PROPRIETARY_MODULE, [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[198626.962842] Hardware name: Dell Inc. Precision 5820 Tower/06JWJY, BIOS 2.8.0 01/15/2021
[198626.962845] RIP: 0010:pci_disable_device+0xac/0xc0
[198626.962851] Code: 4d 85 e4 75 07 4c 8b a3 c8 00 00 00 48 8d bb c8 00 00 00 e8 36 75 22 00 4c 89 e2 48 c7 c7 50 5e 4f a8 48 89 c6 e8 e4 e0 75 ff <0f> 0b e9 66 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90
[198626.962856] RSP: 0018:ffffb827662b7a60 EFLAGS: 00010246
[198626.962862] RAX: 0000000000000000 RBX: ffff98240fbff000 RCX: 0000000000000000
[198626.962865] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[198626.962868] RBP: ffffb827662b7a70 R08: 0000000000000000 R09: 0000000000000000
[198626.962871] R10: 0000000000000000 R11: 0000000000000000 R12: ffff982401e2d240
[198626.962875] R13: ffffb827662b7ae0 R14: ffff98240fbff0c8 R15: ffff98240fbff380
[198626.962878] FS: 000076b474a44740(0000) GS:ffff98335fc80000(0000) knlGS:0000000000000000
[198626.962882] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[198626.962886] CR2: 000070dabb7fbf40 CR3: 0000000369468003 CR4: 00000000003706f0
[198626.962890] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[198626.962893] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[198626.962896] Call Trace:
[198626.962899] <TASK>
[198626.962904] ? show_regs+0x6c/0x80
[198626.962911] ? __warn+0x88/0x140
[198626.962916] ? pci_disable_device+0xac/0xc0
[198626.962921] ? report_bug+0x182/0x1b0
[198626.962931] ? handle_bug+0x6e/0xb0
[198626.962936] ? exc_invalid_op+0x18/0x80
[198626.962941] ? asm_exc_invalid_op+0x1b/0x20
[198626.962952] ? pci_disable_device+0xac/0xc0
[198626.962956] pcim_disable_device+0x2e/0x50
[198626.962965] devm_action_release+0x12/0x30
[198626.962974] release_nodes+0x42/0xd0
[198626.962978] devres_release_all+0x97/0xe0
[198626.962987] device_unbind_cleanup+0x12/0x80
[198626.962994] device_release_driver_internal+0x230/0x270
[198626.963003] device_release_driver+0x12/0x20
[198626.963010] pci_stop_bus_device+0x92/0xc0
[198626.963019] pci_stop_bus_device+0x30/0xc0
[198626.963026] pci_stop_and_remove_bus_device_locked+0x1a/0x40
[198626.963033] remove_store+0x8f/0xa0
[198626.963039] dev_attr_store+0x14/0x40
[198626.963044] sysfs_kf_write+0x3b/0x60
[198626.963051] kernfs_fop_write_iter+0x14c/0x1e0
[198626.963057] vfs_write+0x2a1/0x490
[198626.963065] ksys_write+0x73/0x100
[198626.963070] __x64_sys_write+0x19/0x30
[198626.963075] x64_sys_call+0x7e/0x25f0
[198626.963082] do_syscall_64+0x7e/0x170
[198626.963089] ? __handle_mm_fault+0x62f/0x770
[198626.963099] ? __count_memcg_events+0x86/0x160
[198626.963106] ? count_memcg_events.constprop.0+0x2a/0x50
[198626.963114] ? handle_mm_fault+0x1df/0x2d0
[198626.963122] ? do_user_addr_fault+0x5d5/0x870
[198626.963128] ? irqentry_exit_to_user_mode+0x43/0x250
[198626.963136] ? irqentry_exit+0x43/0x50
[198626.963143] ? clear_bhb_loop+0x15/0x70
[198626.963150] ? clear_bhb_loop+0x15/0x70
[198626.963156] ? clear_bhb_loop+0x15/0x70
[198626.963162] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[198626.963168] RIP: 0033:0x76b474b5b887
[198626.963173] Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[198626.963177] RSP: 002b:00007ffc3bb1b968 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[198626.963182] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 000076b474b5b887
[198626.963185] RDX: 0000000000000002 RSI: 00007ffc3bb1baa0 RDI: 0000000000000003
[198626.963188] RBP: 00007ffc3bb1baa0 R08: 0000000000000002 R09: 0000000000000001
[198626.963191] R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000002
[198626.963194] R13: 00005e63a92a02c0 R14: 000076b474c5da00 R15: 0000000000000002
[198626.963200] </TASK>
[198626.963202] ---[ end trace 0000000000000000 ]---
[198626.963927] pci_bus 0000:b3: busn_res: [bus b3] is released
[198629.034767] pci 0000:02:00.0: PCI bridge to [bus 03]
[198629.034971] pci 0000:b2:00.0: [8086:2030] type 01 class 0x060400 PCIe Root Port
[198629.035003] pci 0000:b2:00.0: PCI bridge to [bus b3]
[198629.035008] pci 0000:b2:00.0: bridge window [mem 0xd9000000-0xdbffffff]
[198629.035082] pci 0000:b2:00.0: PME# supported from D0 D3hot D3cold
[198629.035380] pci 0000:b2:00.0: Adding to iommu group 0
[198629.035667] pci 0000:b3:00.0: [1f9d:1100] type 00 class 0x120000 PCIe Endpoint
[198629.035719] pci 0000:b3:00.0: BAR 0 [mem 0xd9010000-0xd9010fff 64bit]
[198629.035722] pci 0000:b3:00.0: BAR 2 [mem 0xda000000-0xdbffffff]
[198629.035726] pci 0000:b3:00.0: ROM [mem 0xfa000000-0xfa00ffff pref]
[198629.035780] pci 0000:b3:00.0: supports D1
[198629.035781] pci 0000:b3:00.0: PME# supported from D0 D1 D3hot
[198629.036040] pci 0000:b3:00.0: Adding to iommu group 2
[198629.036159] pci 0000:b2:00.0: PCI bridge to [bus b3]
[198629.036199] pci 0000:b2:00.0: bridge window [mem 0xd9000000-0xdbffffff]: assigned
[198629.036204] pci 0000:b3:00.0: BAR 2 [mem 0xda000000-0xdbffffff]: assigned
[198629.036208] pci 0000:b3:00.0: ROM [mem 0xd9000000-0xd900ffff pref]: assigned
[198629.036210] pci 0000:b3:00.0: BAR 0 [mem 0xd9010000-0xd9010fff 64bit]: assigned
[198629.036222] pci 0000:b2:00.0: PCI bridge to [bus b3]
[198629.036229] pci 0000:b2:00.0: bridge window [mem 0xd9000000-0xdbffffff]
[198629.038289] axl 0000:b3:00.0: MSI registered 32 (32)
[198629.038301] axl 0000:b3:00.0: irq vec number 106
[198629.039704] axl 0000:b3:00.0: Data Link Layer Link Active Reporting capability
[198629.039790] axl 0000:b3:00.0: Register directory 0000:b3:00.0
[198629.039878] pcieport 10000:00:02.0: bridge window [io 0x1000-0x0fff] to [bus 01] add_size 1000
[198629.039883] pcieport 10000:00:03.0: bridge window [io 0x1000-0x0fff] to [bus 02] add_size 1000
[198629.039888] pcieport 10000:00:02.0: bridge window [io size 0x1000]: can't assign; no space
[198629.039890] pcieport 10000:00:02.0: bridge window [io size 0x1000]: failed to assign
[198629.039892] pcieport 10000:00:03.0: bridge window [io size 0x1000]: can't assign; no space
[198629.039893] pcieport 10000:00:03.0: bridge window [io size 0x1000]: failed to assign
[198629.039895] pcieport 10000:00:03.0: bridge window [io size 0x1000]: can't assign; no space
[198629.039896] pcieport 10000:00:03.0: bridge window [io size 0x1000]: failed to assign
[198629.039897] pcieport 10000:00:02.0: bridge window [io size 0x1000]: can't assign; no space
[198629.039898] pcieport 10000:00:02.0: bridge window [io size 0x1000]: failed to assign

This is the output from the kernel log. I think I triggered it by running `insmod` again while the driver was already loaded, but I’m not sure.
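
To rule out a double load as the trigger, one option is to check the module list before calling `insmod`. A rough sketch, assuming the module is named `metis` as in the "Modules linked in" line of the log; `module_loaded` and `load_once` are made-up helper names, and the `.ko` path in the usage line is a placeholder. Note that the trace itself shows a write to the PCI `remove` attribute (`remove_store`, `Comm: tee`), so a second `insmod` is only one possible cause.

```shell
#!/bin/sh
# Succeeds if the named module appears in the module list.
# The optional second argument lets the check run against a saved copy of
# /proc/modules; by default the live list is read.
module_loaded() {
    name="$1"
    list="${2:-/proc/modules}"
    # Every line of /proc/modules starts with the module name and a space.
    grep -q "^${name} " "$list"
}

# Load the module only when it is not already present.
# $1 is the module name, $2 the path to the .ko file.
load_once() {
    if module_loaded "$1"; then
        echo "$1 already loaded, skipping insmod"
    else
        sudo insmod "$2"
    fi
}
```

Usage would look like `load_once metis /path/to/metis.ko`.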

I ended up running the container as follows:

$ sudo ln -s /dev/metis-0:b3:0 /dev/metis
$ docker run -it --privileged -v /tmp:/tmp --device /dev/metis --network=host --name=voyager-sdk-1.2.5 ubuntu:22.04
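
The same workaround can be scripted so it does not depend on the exact bus address. This is only a sketch: `colon_free_alias` and `device_flags` are hypothetical helpers, the `/dev/metis-*:*` pattern is an assumption based on the one device name above, and replacing `:` with `_` is an arbitrary choice of alias. The alias is needed because `docker run --device` treats colons as separators between host path, container path, and permissions.

```shell
#!/bin/sh
# Hypothetical helper: derive a colon-free alias for a device node by
# replacing every ':' in its path with '_'.
colon_free_alias() {
    printf '%s\n' "$1" | tr ':' '_'
}

# For every device node whose name contains a colon, create a colon-free
# symlink next to it and print a matching --device flag for `docker run`.
# Writing into /dev needs root.
device_flags() {
    for dev in /dev/metis-*:*; do
        [ -e "$dev" ] || continue      # the pattern matched nothing
        alias_path="$(colon_free_alias "$dev")"
        ln -sf "$dev" "$alias_path"
        printf '%s %s ' --device "$alias_path"
    done
}
```

For the card above, `colon_free_alias /dev/metis-0:b3:0` gives `/dev/metis-0_b3_0`; the output of `device_flags` (run as root) can then be pasted into the `docker run` command line.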

Thanks again!

