I think there are a few more options, but first, you need to acutely aware of the issue of OS thread safety. The SystemC kernel is not generally thread-safe unless you use async_request_update(), and a queue. I've done this several times.
That said you have several options:
Place software inside a SystemC SC_THREAD and provide convenience methods to initiate TLM transport calls. This is the simplest SystemC approach, but does not allow for modeling the actual CPU planned to be used and hence timing is not very accurate. This has the fastest performance from the SystemC point of view.
Place the software on a development board that has the target CPU and some type of communications to the machine where you will run SystemC. I generally use TCP/IP sockets. Replace the driver with a special socket call passing packets to to a remote machine and receiving back the response. On the SystemC side, create an OS thread to receive the socket and inject the message into SystemC via the async_request_update call and an unbounded STL queue or FIFO of some type. A TLM 1.0 FIFO might do. A receiving thread can then pass the data into the SystemC simulation and return the result. This is somewhat more overhead, but allows for a more accurate target CPU.
Obtain an ISS (Instruction Set Simulator) for the target processor and interface it to SystemC. This varies in complexity. You might look at existing models or talk with vendors such as Cadence or Mentor. Or perhaps your CPU vendor (e.g. Arm has some very nice models). As @Eyck suggested QEMU, DBT-RISE-RISCV. Also Gem5 (Google it).