XNU - Mach

12:00 PM Feb 10, 2017

Darwin is the open source portion of the operating systems that Apple ships in their products. This includes the kernel, a variety of libraries, command line tools, and device drivers. The core architecture of the operating system relies on the kernel, which handles input/output from all software requests and translates them into machine level instructions that the underlying CPU architecture supports. A kernel’s responsibility is to provide the most fundamental levels of abstraction so that applications and other software can be written in an environment where this is not of a concern of developers.

The kernel for iOS/macOS is called XNU, which is a recursive acronym for XNU is not Unix. It was originally developed by NextStep, and it shipped as a hybrid kernel. Its original components derive from the Mach microkernel, developed at Carnegie Mellon University, and the 4.3BSD subsystems. NextStep’s original vision for XNU contained an Objective C API for device drivers, and it was called Driver Kit. Later, when Apple acquired Next, it was replaced with IOKit, a framework that allows device drivers to be written in embedded C++.

The Mach microkernel is the most fundamental layer of XNU. Being a microkernel, it is a super thin, minimal, and lightweight level of abstraction where components can communicated with via messages. Mach is intended to be a surface level interface in which an operating system itself can be implemented on top of. XNU happens to be one of the few modern Unix based operating systems that was implemented on top of Mach.

The unique thing about Mach is that it contains an interface by which objects are communicated with through message passing. Mach objects can not directly be interfaced with or invoked on one another. The object that is the source of a message sends a message to a destination object, and the destination object queues the message until it can process and handle it. The content of the message is up to the sender to decide and what the destination object does in response is to its discretion.

The fundamental unit within Mach is a message, and it can be passed to an endpoint, also known as a port. Let’s take a look at how messages actually look with Mach. In <mach/message.h>, we find

typedef struct {
    mach_msg_header_t header;
    mach_msg_body_t body; 
} mach_msg_base_t;

This is the basic structure of a mach message, it contains a header and body.

Typically most implementations of mach messages create a message structure with these two fields, and a couple more fields to describe more metadata of the message. Usually that would entail the actual data sent from the message or other defined structures. Those are well documented online and will not be mentioned for length purposes.

The mach_msg_header_t contains the metadata about the message.

The mach_msg_body_t contains one field, which is the description count, and this represents the number of descriptors the come after these two fields in the actual message.

typedef	struct {
    mach_msg_bits_t	msgh_bits;
    mach_msg_size_t	msgh_size;
    mach_port_t		msgh_remote_port;
    mach_port_t		msgh_local_port;
    mach_port_name_t	msgh_voucher_port;
    mach_msg_id_t		msgh_id;
} mach_msg_header_t;

mach_msg() is a function within Mach’s API’s that perform the actual sending/receiving of a message. The function actually has an implementation in user mode.

mach_msg_return_t mach_msg(mach_msg_header_t *msg,
                           mach_msg_option_t option,
                           mach_msg_size_t send_size,
                           mach_msg_size_t rcv_size,
                           mach_port_name_t rcv_name,
                           mach_msg_timeout_t timeout,
                           mach_port_name_t notify);

These messages are sent between ports, the type mach_port_t. They are just 32 bit identifiers. Each port may receive messages from an infinite number of senders but only has one receiver, which means the messages have to be queued until they can be processed.

Mach objects are accessed through their ports. When you want a handle to some object, you are essentially aiming to seek a handle to the corresponding object’s port. Port access is maintained by a set of port rights, which are defined in <mach/port.h>.

#define MACH_PORT_RIGHT_SEND		((mach_port_right_t) 0)
#define MACH_PORT_RIGHT_RECEIVE		((mach_port_right_t) 1)
#define MACH_PORT_RIGHT_SEND_ONCE	((mach_port_right_t) 2)
#define MACH_PORT_RIGHT_PORT_SET	((mach_port_right_t) 3)
#define MACH_PORT_RIGHT_DEAD_NAME	((mach_port_right_t) 4)
#define MACH_PORT_RIGHT_LABELH	        ((mach_port_right_t) 5)
#define MACH_PORT_RIGHT_NUMBER		((mach_port_right_t) 6)

So now you should have a basic understanding of message passing in Mach. Let’s see how message passing is used in action.

static int32_t setup_receive_port(mach_port_t *receive_port){
    kern_return_t       err;
    mach_port_t         port = MACH_PORT_NULL;

    err = mach_port_allocate(mach_task_self(), MACH_PORT_RIGHT_RECEIVE, &port);

    if(err != KERN_SUCCESS){
        return -1;
    }

    err = mach_port_insert_right(mach_task_self(),
                                 port,
                                 port,
                                 MACH_MSG_TYPE_MAKE_SEND);
    if(err != KERN_SUCCESS){
        return -1;
    }

    *recv_port = port;
    return 0;
}

This function allocates a new mach_port_t that handles the receiving of the message. It sets the port rights of the current task to allow sending messages to this port.

static int32_t send_port(mach_port_t remote_port, 
                         mach_port_t port){
    kern_return_t err;

    struct
    {
        mach_msg_header_t          header;
        mach_msg_body_t            body;
        mach_msg_port_descriptor_t task_port;
    } msg;

    msg.header.msgh_remote_port = remote_port;
    msg.header.msgh_local_port = MACH_PORT_NULL;
    msg.header.msgh_bits = MACH_MSGH_BITS(MACH_MSG_TYPE_COPY_SEND, 0)
                                          | MACH_MSGH_BITS_COMPLEX;
    msg.header.msgh_size = sizeof msg;

    msg.body.msgh_descriptor_count = 1;
    msg.task_port.name = port;
    msg.task_port.disposition = MACH_MSG_TYPE_COPY_SEND;
    msg.task_port.type = MACH_MSG_PORT_DESCRIPTOR;

    err = mach_msg_send(&msg.header);
    if(err != KERN_SUCCESS){
        return -1;
    }

    return 0;
}

This function sends the message to the remote port with a custom msg structure defined in the function with our required header and body fields included.

static int32_t receive_port(mach_port_t receive_port, 
                            mach_port_t *port){
    kern_return_t err;

    struct{
        mach_msg_header_t          header;
        mach_msg_body_t            body;
        mach_msg_port_descriptor_t task_port;
        mach_msg_trailer_t         trailer;
    } msg;

    err = mach_msg(&msg.header, 
                   MACH_RCV_MSG, 
                   0,
                   sizeof msg, 
                   recv_port, 
                   MACH_MSG_TIMEOUT_NONE, 
                   MACH_PORT_NULL);

    if(err != KERN_SUCCESS){
        return -1;
    }

    *port = msg.task_port.name;
    return 0;
}

This function uses mach_msg() to retrieve a relevant port from a message sent from another port.

Basically this example, though the code for forking a process is not implemented (not shown for simplicity), passes the parent task port to the child process forked from the parent. Notice how we have to set up a receive port for passing the task port from the parent using the function mach_port_allocate() inside setup_receive_port.

The function setup_receive_port() uses the mach_port_allocate() function to create a port with the receive port rights and inserts rights into the current task to send messages to the newly allocated port.

The function send_port() creates a custom message structure with the required body and header structures we defined earlier and then sets the metadata for the message, which happens to be a port descriptor that contains the port name of the parent’s task. At last it sends the message to the remote port, which in this case is the child’s receive port.

In the function receive_port(), mach_msg() processes the next message the receive port has in its message queue and gets the task port of the parent from the message’s metadata.

As one can see in the given examples, using ports to process messages is an important concept for IPC within Mach.

mach_port_t happens to be one of the basic primitives within the Mach IPC namespace, but they actually are just a handle (pointer, in this case) to the real IPC object, also known as a ipc_port. How do I know? Let’s take a look at the headers within XNU.

A definition within XNU for ipc_ports is shown below from <osmfk/ipc/ipc_port.h>.

struct ipc_port {

    /*
    * Initial sub-structure in common with ipc_pset and rpc_port
    * First element is an ipc_object
    */
    struct ipc_object ip_object;

    union {
        struct ipc_space *receiver;
        struct ipc_port *destination;
        ipc_port_timestamp_t timestamp;
    } data;

    ipc_kobject_t ip_kobject;
    mach_port_mscount_t ip_mscount;
    mach_port_rights_t ip_srights;
    mach_port_rights_t ip_sorights;

    struct ipc_port *ip_nsrequest;
    struct ipc_port *ip_pdrequest;
    struct ipc_port_request *ip_dnrequests;

    unsigned int ip_pset_count;
    struct ipc_mqueue ip_messages;
    struct ipc_kmsg *ip_premsg;

#if	NORMA_VM
    /*
    *	These fields are needed for the use of XMM.
    *	Few ports need this information; it should
    *	be kept in XMM instead (TBD).  XXX
    */
    long		ip_norma_xmm_object_refs;
    struct ipc_port	*ip_norma_xmm_object;
#endif

#if	MACH_ASSERT
#define	IP_NSPARES		10
#define	IP_CALLSTACK_MAX	10
    queue_chain_t	ip_port_links;	/* all allocated ports */
    thread_t	ip_thread;	/* who made me?  thread context */
    unsigned long	ip_timetrack;	/* give an idea of "when" created */
    natural_t	ip_callstack[IP_CALLSTACK_MAX]; /* stack trace */
    unsigned long	ip_spares[IP_NSPARES]; /* for debugging */
#endif	/* MACH_ASSERT */
    int		alias;
};

Notice how the ipc_port object has a message queue denoted by the ipc_mqueue field, the mach_port_rights_t, the ipc_space and much more.

In the mach portion of the kernel in <mach/port.h>, we realize that the mach_port_t field is typedef to a ipc_port_t, which is a pointer to this ipc_port structure.

#ifndef	MACH_KERNEL_PRIVATE
/*
*	For kernel code that resides outside of Mach proper, we opaque the
*	port structure definition.
*/
struct ipc_port ;

#endif	/* MACH_KERNEL_PRIVATE */

typedef struct ipc_port	        *ipc_port_t;

#define IPC_PORT_NULL		((ipc_port_t) 0UL)
#define IPC_PORT_DEAD		((ipc_port_t)~0UL)
#define IPC_PORT_VALID(port) \
((port) != IPC_PORT_NULL && (port) != IPC_PORT_DEAD)

typedef ipc_port_t 		mach_port_t;

This is important to know because we now understand what a mach_port_t actually represents instead of seeing it as an opaque type. However, since the osfmk portion of the kernel is not exposed to user mode, no program can access these ipc based structures. Therefore, the mach_port_t type suffices for developers since all the Mach API functions that act on these underlying data structures are available to us. This is why the code sample shown above with message passing works.

Since mach_ports are one of the primitives that become the basis for IPC in XNU, it happens to be in some cases an attack surface against mitigiations within the kernel.

You may have noticed earlier that services/tasks can be looked up by their respective ports. launchd, also known as pid 1, as of the latest versions of OS X/iOS, is now responsible for what is called the bootstrap port. launchd is the first process that loads after the kernel_task, so it contains an interface by which services can lookup other services via the bootstrap port.

When the machine boots, launchd is executed in what is called the bootstrap context. Only a few system services are able to boot in this context. The context a task is qualified under determines which services can be looked by their respective ports. For example, when a user logs into their machine, the bootstrap task creates the login context, which is a subset of the startup context. All of a user’s processes lay under the login context. This creates a layer of security for services that potentiate the ability to perform untrusted, arbitrary behavior.

Since all port lookups are handled by the bootstrap port, every process that underlies launchd inherits the bootstrap port. An example is shown to demonstrate how a service can use the bootstrap port to lookup another service and retrieve it’s respective mach_port_t.

mach_port_t connect_to_service(const char* service_name){
    mach_port_t bs_port, service_port;
    kern_return_t err;

    task_get_bootstrap_port(mach_task_self(), &bs_port);
    err = bootstrap_look_up(bs_port, service_name, &service_port);

    if(err != KERN_SUCCESS){
        return MACH_PORT_NULL;
    }

    return service_port;
}

Written on February 10, 2017