Friday, February 10, 2012

Learning to hate AllocateAdapterChannel

Copyright © 2012, Steven E. Houchin. All rights reserved.

I have a scatter-gather device I'm supporting in a Windows PnP driver. Normally, a driver developer would use the GetScatterGatherList API to map multiple IRP buffers to a device's DMA capabilities. However, my device has peculiar buffer alignment requirements that I can't count on GetScatterGatherList to handle. But, no worries. The AllocateAdapterChannel API is available.

Wait! Not so fast. In order for my device to perform at peak speed, multiple Read and Write IRPs are simultaneously active and mapped to the DMA. What this means is, I can't wait for the occurrence of an interrupt and an IRP completion before mapping the next IRP to the DMA; I map dozens or hundreds of them to the DMA in advance.

GetScatterGatherList handles all this just fine as long as each IRP has its own separate DMA_ADAPTER object. In theory, AllocateAdapterChannel should be able to do the same, but it can't. There is this sneaky little note in the WDK documentation that throws a fly into the oatmeal:

Only one DMA request can be queued for a device object at any one time. Therefore, the driver should not call AllocateAdapterChannel again for another DMA operation on the same device object until the AdapterControl routine has completed execution.

The key phrase there is "device object." DEVICE_OBJECT is a parameter to AllocateAdapterChannel. So, even if I have a separate DMA_ADAPTER object for each mapped IRP, it still uses just the one DEVICE_OBJECT. The note above blandly states "until the AdapterControl routine has completed execution." Exactly how do I know when it completes execution? If the driver is executing its assembly "ret" instruction, it is technically still in the AdapterControl callback, so anything I do inside it to notify another thread to proceed is too early.

I believe the issue here is that DEVICE_OBJECT is placed on a wait list by the kernel when the DMA is not immediately available. Thus, we can't have that object placed twice on a list. Maybe the answer is to create a separate, fake DEVICE_OBJECT for each IRP, just like I do with DMA_ADAPTER.
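To make that suspicion concrete, here is a small user-mode C sketch of what goes wrong when an object that embeds its own list link is queued twice. It only illustrates my guess about the wait list; I have not confirmed that the kernel works this way. The names fake_device, wait_link, and the list helpers are all invented for the example and are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, loosely modeled on an intrusive
   kernel-style list. User-mode illustration only, not kernel code. */
typedef struct list_entry {
    struct list_entry *next;
    struct list_entry *prev;
} list_entry;

/* Hypothetical stand-in for an object that embeds its own wait-list link. */
typedef struct fake_device {
    list_entry wait_link;
    int id;
} fake_device;

static void list_init(list_entry *head) {
    head->next = head;
    head->prev = head;
}

/* Append entry at the tail; assumes entry is not already on any list. */
static void list_insert_tail(list_entry *head, list_entry *entry) {
    list_entry *last = head->prev;
    entry->next = head;
    entry->prev = last;
    last->next = entry;
    head->prev = entry;
}

/* Count entries walking forward, giving up after limit steps. */
static int list_count_forward(const list_entry *head, int limit) {
    int n = 0;
    const list_entry *p;
    for (p = head->next; p != head && n < limit; p = p->next)
        n++;
    return n;
}

/* Queue device A, then B, then A again while it is still queued.
   Returns how many entries a forward walk of the list can still reach. */
int demo_double_insert(void) {
    list_entry head;
    fake_device a = { { NULL, NULL }, 1 };
    fake_device b = { { NULL, NULL }, 2 };

    list_init(&head);
    list_insert_tail(&head, &a.wait_link);
    list_insert_tail(&head, &b.wait_link);
    assert(list_count_forward(&head, 16) == 2);

    /* Second request for the same object: its single link is overwritten. */
    list_insert_tail(&head, &a.wait_link);
    return list_count_forward(&head, 16);
}
```

In this sketch, the second insert of A overwrites A's only pair of links, so a forward walk from the head reaches A and then returns straight to the head. B is silently dropped from the queue, which is the kind of corruption that would explain a one-request-per-object rule.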

Charging ahead, I created a pool of DEVICE_OBJECTs, copied from the original, and used them round-robin for each simultaneous AllocateAdapterChannel call, blocking any further calls when none were available.
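The bookkeeping for a pool like that can be sketched in plain C. This is a user-mode illustration of the round-robin idea only, not my driver code: slot_pool, POOL_SIZE, and the three functions are hypothetical names, each slot index stands in for one copied DEVICE_OBJECT, and a real driver would have to guard the pool with a spin lock because it is touched at raised IRQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 4   /* hypothetical; size it to the number of in-flight IRPs */

typedef struct slot_pool {
    bool   in_use[POOL_SIZE];  /* true while a slot is handed out */
    size_t next;               /* where the next round-robin search starts */
} slot_pool;

static void pool_init(slot_pool *p) {
    for (size_t i = 0; i < POOL_SIZE; i++)
        p->in_use[i] = false;
    p->next = 0;
}

/* Return the index of a free slot, searching round-robin from p->next,
   or -1 when every slot is busy (the caller must hold off for now). */
static int pool_acquire(slot_pool *p) {
    for (size_t n = 0; n < POOL_SIZE; n++) {
        size_t i = (p->next + n) % POOL_SIZE;
        if (!p->in_use[i]) {
            p->in_use[i] = true;
            p->next = (i + 1) % POOL_SIZE;
            return (int)i;
        }
    }
    return -1;
}

/* Hand a slot back. Deciding WHEN this is safe to call is the open
   question; the sketch does not answer it. */
static void pool_release(slot_pool *p, int slot) {
    assert(slot >= 0 && slot < POOL_SIZE);
    assert(p->in_use[slot]);
    p->in_use[slot] = false;
}
```

The acquire side is the easy half. The sketch deliberately leaves open the point at which pool_release may be called, since releasing a slot before the callback has truly finished would hand a still-busy object to the next request.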

No luck. The failure manifests itself as AdapterControl being called back twice in a row for the same IRP, i.e., two calls to AdapterControl for an IRP's single call to AllocateAdapterChannel. Now, maybe I am still doing something wrong managing my fake DEVICE_OBJECTs. For example, when do I really know that a DEVICE_OBJECT is available for reuse? It's the fly and oatmeal problem again with the AdapterControl callback: when has it really "completed execution"?

For now, I don't have a workable solution to this. I'll let you know if I figure it out.