A fork/continuation of the original since the author has been away for a while. Supports kernels up to 6.15 with lots of other changes.

  • InvertedParallax@lemm.ee
    1 day ago

    Firstly, it’s not a real hub, it’s an emulated hub, and you can do that while emulating everything as USB 2.0.

    Secondly, you can have multiple HID interface endpoints on a single device (see the descriptor sketch below).

    Thirdly, you wouldn’t be polling; these would be HID interrupt URBs, and you can storm them at one per microframe if you want. They just show up in the EHCI buffers.

    Finally, no human is overflowing the HID interface like this, not even eight of them.
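
    The multiple-interfaces point is easy to see in the descriptors. Here’s a minimal sketch of what the layout could look like for a composite device with two HID interfaces, each with its own interrupt-IN endpoint, using the types from the kernel’s <linux/usb/ch9.h>. All values are illustrative, and the mandatory HID class/report descriptors are omitted:

    ```c
    /* Sketch: two HID interfaces on one composite device, each with its
     * own interrupt-IN endpoint.  Types from <linux/usb/ch9.h>; values
     * are illustrative, and the HID class descriptors that would sit
     * between interface and endpoint are omitted. */
    #include <linux/usb/ch9.h>

    static const struct usb_interface_descriptor hid_intf[2] = {
        {
            .bLength          = USB_DT_INTERFACE_SIZE,
            .bDescriptorType  = USB_DT_INTERFACE,
            .bInterfaceNumber = 0,
            .bNumEndpoints    = 1,
            .bInterfaceClass  = USB_CLASS_HID,        /* 0x03 */
        },
        {
            .bLength          = USB_DT_INTERFACE_SIZE,
            .bDescriptorType  = USB_DT_INTERFACE,
            .bInterfaceNumber = 1,
            .bNumEndpoints    = 1,
            .bInterfaceClass  = USB_CLASS_HID,
        },
    };

    static const struct usb_endpoint_descriptor hid_ep[2] = {
        {
            .bLength          = USB_DT_ENDPOINT_SIZE,
            .bDescriptorType  = USB_DT_ENDPOINT,
            .bEndpointAddress = USB_DIR_IN | 1,       /* EP1 IN */
            .bmAttributes     = USB_ENDPOINT_XFER_INT,
            .wMaxPacketSize   = cpu_to_le16(64),
            .bInterval        = 4,  /* 2^(4-1) = 8 microframes = 1 ms at high speed */
        },
        {
            .bLength          = USB_DT_ENDPOINT_SIZE,
            .bDescriptorType  = USB_DT_ENDPOINT,
            .bEndpointAddress = USB_DIR_IN | 2,       /* EP2 IN */
            .bmAttributes     = USB_ENDPOINT_XFER_INT,
            .wMaxPacketSize   = cpu_to_le16(64),
            .bInterval        = 4,
        },
    };
    ```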

    • kkj@lemmy.dbzer0.com
      1 day ago
      1. You’d presumably do it as 2.0, but I used 1.1 for the numbers just to demonstrate that you can definitely fit 8 controllers into a packet even if you go for unreasonable levels of backwards compatibility.
      2. Without a hub, I’m not aware of a way to exceed 8 axes per HID device (7 in Windows, for some reason). Each Xbox controller has six, so even two controllers can’t be one device.
      3. As far as I can find, most USB 2 implementations can take up to 1000 packets per second per root hub, regardless of packet size. I was already assuming one controller poll per packet for the hub version, and that’s 125 Hz per controller with all eight (see the arithmetic sketch after this list).
      4. You aren’t actually pressing buttons at 125 Hz, no. However, if your input is barely too late for one 125 Hz poll, you can pick up enough delay to be noticeable in fast-paced games. Most controllers and mice use 1 kHz for this reason, with some even supporting up to 8 kHz if your USB implementation supports it (which apparently is pretty common with xHCI, but Microsoft didn’t want to rely on that for obvious reasons).
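
      To put numbers on points 3 and 4, here’s a quick back-of-the-envelope in C. The constants are the standard USB frame/microframe timings; nothing here is measured:

      ```c
      /* Frame-budget arithmetic for 8 controllers sharing one bus. */
      #include <stdio.h>

      int main(void)
      {
          const int controllers        = 8;
          const int fs_frames_per_sec  = 1000;  /* full speed: 1 ms frames        */
          const int hs_uframes_per_sec = 8000;  /* high speed: 125 us microframes */

          /* One interrupt poll per (micro)frame, shared across controllers. */
          printf("full speed: %d Hz per controller\n",
                 fs_frames_per_sec / controllers);   /* 125 Hz  */
          printf("high speed: %d Hz per controller\n",
                 hs_uframes_per_sec / controllers);  /* 1000 Hz */

          /* Worst-case extra latency when you just miss a poll window. */
          printf("miss at 125 Hz: %.1f ms\n", 1000.0 / 125);   /* 8.0 ms */
          printf("miss at 1 kHz:  %.1f ms\n", 1000.0 / 1000);  /* 1.0 ms */
          return 0;
      }
      ```
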
      • InvertedParallax@lemm.ee
        1 day ago

        if your input is barely too late for one 125Hz poll,

        So those polls are generally isochronous to the USB bus transaction state, not driven by the CPU’s polling frequency. What happens is (sketched in code after this list):

        1. A USB interrupt URB comes in to the HCI controller; the URB descriptor is written to the descriptor chain.

        2. The controller appends to the descriptor chain; once the chain length exceeds the watermark (WAT) or a timeout fires, it raises an interrupt and processing of incoming URBs starts.

        3. In the interrupt handler, follow the chain, push URBs onto the USB stack queue, and trigger the handler tasklet.

        4. The stack processes the URB and routes it to the proper class driver.

        5. The class driver checks whether the URB has an open file handle (or an open ref from drivers like HID/input).

        6. If so, poll() or another input read() returns the value.
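
        On the Linux side, steps 4–6 from a class driver’s point of view look roughly like this. It’s a hedged sketch, not a real driver: my_dev, my_complete, and my_start are made up, allocation and error handling are mostly skipped, and the real usbhid driver does far more. The kernel APIs themselves (usb_fill_int_urb(), usb_submit_urb(), the input_report_*() calls) are real:

        ```c
        /* Sketch of steps 4-6: an interrupt-IN completion handler that
         * pushes HID data into the input subsystem and resubmits the URB.
         * my_dev, my_complete and my_start are made up; the kernel APIs
         * are real. */
        #include <linux/usb.h>
        #include <linux/input.h>

        struct my_dev {
            struct urb       *irq_urb;
            struct input_dev *input;
            u8               *buf;  /* DMA-able report buffer, allocated elsewhere */
        };

        static void my_complete(struct urb *urb)
        {
            struct my_dev *dev = urb->context;

            if (urb->status == 0) {
                /* Steps 5/6: decode the report and hand it to the input
                 * core, where a blocked read()/poll() on
                 * /dev/input/eventN wakes up. */
                input_report_abs(dev->input, ABS_X, dev->buf[0]);
                input_report_key(dev->input, BTN_SOUTH, dev->buf[1] & 1);
                input_sync(dev->input);
            }

            /* Interrupt URBs are one-shot: resubmit so the host controller
             * keeps polling the endpoint at its bInterval on its own. */
            usb_submit_urb(urb, GFP_ATOMIC);
        }

        static int my_start(struct my_dev *dev, struct usb_device *udev,
                            int ep, int len)
        {
            dev->irq_urb = usb_alloc_urb(0, GFP_KERNEL);
            if (!dev->irq_urb)
                return -ENOMEM;

            usb_fill_int_urb(dev->irq_urb, udev, usb_rcvintpipe(udev, ep),
                             dev->buf, len, my_complete, dev,
                             /* interval */ 4);
            return usb_submit_urb(dev->irq_urb, GFP_KERNEL);
        }
        ```

        Note the resubmit in the completion handler: the host controller does the periodic scheduling at the endpoint’s bInterval, so the driver never spins on a timer of its own.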

        Now it’s possible there are multi-input poll reads in games, and I’m describing Linux here, of course.

        For MSFT it’s URB -> IRP -> WDM filter driver stack -> kernel32/DirectInput or the Win32 input stack (WNDPROCs after routing).

        In any of these cases, I’m struggling to see how interrupts would come in faster with the same code on PC.

        See, the same code probably runs on both MSFT and normal hardware, so it’s going to have the same structure, unless you actually believe a dev team is optimizing input latency that hard. That’s often the lowest priority; they’ll optimize video lag first, because it’s more noticeable. The engines themselves use DirectInput, which is routed through to libinput in WINE, and the same goes for all devices.

        Btw, DirectInput has a device-based interface, so it couldn’t poll like this anyway: each controller has its own input queue, and games round-robin across them, plucking events out of each input stream as they become available.
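
        For what that looks like in practice, here’s a rough C sketch of the buffered-read loop using DirectInput’s documented IDirectInputDevice8 calls; devs[]/ndevs and the usual SetDataFormat/SetProperty/Acquire setup are assumed to have happened elsewhere:

        ```c
        /* Round-robin over per-device DirectInput event queues.  Setup
         * (SetDataFormat, DIPROP_BUFFERSIZE, Acquire) is assumed done. */
        #define DIRECTINPUT_VERSION 0x0800
        #include <dinput.h>

        void drain_controllers(LPDIRECTINPUTDEVICE8 devs[], int ndevs)
        {
            for (int i = 0; i < ndevs; i++) {      /* one queue per device */
                DIDEVICEOBJECTDATA ev[16];
                DWORD n = 16;

                IDirectInputDevice8_Poll(devs[i]); /* no-op for event-driven devices */
                if (SUCCEEDED(IDirectInputDevice8_GetDeviceData(
                        devs[i], sizeof(DIDEVICEOBJECTDATA), ev, &n, 0))) {
                    for (DWORD j = 0; j < n; j++) {
                        /* ev[j].dwOfs identifies the axis/button and
                         * ev[j].dwData its value; queues never merge
                         * across devices. */
                    }
                }
            }
        }
        ```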

        In any case, you’re not getting the latency improvement, both because the software path is so different and because nobody could actually perceive it.

        I’m not trying to be extra autistic for no reason; I’ve just had to make these decisions before, and this is how we have to think.