Overall there are a few coding style nits (e.g. some structures missing a space after } and before the name, line length, a bunch of little things; keep in mind that lindent sucks and you'll have to do things by hand). Some more substantial comments and questions are below. Also, is "buffer object" or "bo" redundant? Maybe just "buffer" would be sufficient? Anyway, most of the comments below are cosmetic, but there are some real questions and issues in there too. Thanks for doing this work, it'll be good to finally have a memory manager in the DRM layer...

Also, this patch is fairly huge. It would be great if it could be split up somehow to ease review. But if it's split, each piece has to be buildable to keep git bisect easy, so it'll be tough to do. Maybe the locking stuff could be a standalone patch, and the buffer stuff could be split into internal management routines (which might be buildable but harmless), the kernel internal APIs, and kernel external APIs pieces?

Other general worries:

o RLIMIT_MEMLOCK - this patch allows a process to have memory pinned
  without it being accounted against RLIMIT_MEMLOCK (more on that below,
  at drm_ttm_alloc_page())

o interface documentation? I know you have that paper, but I think there
  needs to be both userland API documentation and kernel internal docs so
  DRM driver writers can add support for this stuff (this could take the
  form of kdoc comments on non-static functions as a minimum)

o I think getting available DMA32 memory is trickier than using
  si_meminfo

Thanks,
Jesse

> +#define DRM_FENCE_FLAG_EMIT                0x00000001
> +#define DRM_FENCE_FLAG_SHAREABLE           0x00000002
> +#define DRM_FENCE_FLAG_WAIT_LAZY           0x00000004
> +#define DRM_FENCE_FLAG_WAIT_IGNORE_SIGNALS 0x00000008
> +
> +/* Reserved for driver use */
> +#define DRM_FENCE_MASK_DRIVER              0xFF000000
> +
> +#define DRM_FENCE_TYPE_EXE                 0x00000001
> +
> +typedef struct drm_fence_arg {
> +	unsigned handle;
> +	int class;
> +	unsigned type;
> +	unsigned flags;

Which of these fields refers to the flags above? If 'flags' can be any of them (including the ones below), maybe it should be called 'mask' like in the drm_bo_arg_request_t?

> +	unsigned signaled;
> +	unsigned expand_pad[4]; /*Future expansion */

Might make this "unsigned long" padding instead, so we can fit a pointer or 4 in there later if needed, rather than limiting x86_64 to two pointers. Also, should this structure be packed if it's part of the userspace ABI?

> +	enum {
> +		drm_fence_create,
> +		drm_fence_destroy,
> +		drm_fence_reference,
> +		drm_fence_unreference,
> +		drm_fence_signaled,
> +		drm_fence_flush,
> +		drm_fence_wait,
> +		drm_fence_emit,
> +		drm_fence_buffers
> +	} op;
> +} drm_fence_arg_t;

I guess no one will want the values in this enum outside of this structure?

> +/* Buffer permissions, referring to how the GPU uses the buffers.
> + * these translate to fence types used for the buffers.
> + * Typically a texture buffer is read, A destination buffer is write and
> + * a command (batch-) buffer is exe. Can be or-ed together.
> + */
> +
> +#define DRM_BO_FLAG_READ  0x00000001
> +#define DRM_BO_FLAG_WRITE 0x00000002
> +#define DRM_BO_FLAG_EXE   0x00000004

Defining these as shifts would make the comment redundant.
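I.e. something like this (just a sketch, same values as above):

#define DRM_BO_FLAG_READ	(1 << 0)
#define DRM_BO_FLAG_WRITE	(1 << 1)
#define DRM_BO_FLAG_EXE		(1 << 2)

The "can be or-ed together" property is then obvious from the form.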
> +/*
> + * Status flags. Can be read to determine the actual state of a buffer.
> + * Can also be set in the buffer mask before validation.
> + */
> +
> +/*
> + * Mask: Never evict this buffer. Not even with force. This type of buffer is only
> + * available to root and must be manually removed before buffer manager shutdown

80 columns.

> + * or lock.
> + * Flags: Acknowledge
> + */
> +#define DRM_BO_FLAG_NO_EVICT   0x00000010

Calling it 'DRM_BO_FLAG_LOCKED' or PINNED would avoid the negation, and LOCKED would be more consistent with the VM's page flags. And again, using shifts would make things clearer.

> +/*
> + * Mask: Require that the buffer is placed in mappable memory when validated.
> + * If not set the buffer may or may not be in mappable memory when validated.
> + * Flags: If set, the buffer is in mappable memory.
> + */
> +#define DRM_BO_FLAG_MAPPABLE   0x00000020
> +
> +/* Mask: The buffer should be shareable with other processes.
> + * Flags: The buffer is shareable with other processes.
> + */
> +#define DRM_BO_FLAG_SHAREABLE  0x00000040
> +
> +/* Mask: If set, place the buffer in cache-coherent memory if available.
> + * If clear, never place the buffer in cache coherent memory if validated.
> + * Flags: The buffer is currently in cache-coherent memory.
> + */
> +#define DRM_BO_FLAG_CACHED     0x00000080

DRM_BO_FLAG_CACHEABLE may be better given the description and prior flags? That would assume we always want cacheable buffers to be in cacheable memory, but that's generally true, right?

> +/* Mask: Make sure that every time this buffer is validated,
> + * it ends up on the same location provided that the memory mask is the same.
> + * The buffer will also not be evicted when claiming space for
> + * other buffers. Basically a pinned buffer but it may be thrown out as
> + * part of buffer manager shutdown or locking.
> + * Flags: Acknowledge.
> + */
> +#define DRM_BO_FLAG_NO_MOVE    0x00000100

Maybe FIXED would better describe this then, since it can be evicted, but should always have the same offset. Again, we should avoid NO* since it forces us into if (!..NO..) double negation, which is ugly.

> +/* Mask: Make sure the buffer is in cached memory when mapped for reading.
> + * Flags: Acknowledge.
> + */
> +#define DRM_BO_FLAG_READ_CACHED    0x00080000

Maybe FORCE_READ_CACHED to be consistent with the flags below?

> +
> +/* Mask: Force DRM_BO_FLAG_CACHED flag strictly also if it is set.
> + * Flags: Acknowledge.
> + */
> +#define DRM_BO_FLAG_FORCE_CACHING  0x00002000
> +
> +/*
> + * Mask: Force DRM_BO_FLAG_MAPPABLE flag strictly also if it is clear.
> + * Flags: Acknowledge.
> + */
> +#define DRM_BO_FLAG_FORCE_MAPPABLE 0x00004000
> +
> +/*
> + * Memory type flags that can be or'ed together in the mask, but only
> + * one appears in flags.
> + */
> +
> +/* System memory */
> +#define DRM_BO_FLAG_MEM_LOCAL  0x01000000
> +/* Translation table memory */
> +#define DRM_BO_FLAG_MEM_TT     0x02000000
> +/* Vram memory */
> +#define DRM_BO_FLAG_MEM_VRAM   0x04000000
> +/* Up to the driver to define. */
> +#define DRM_BO_FLAG_MEM_PRIV0  0x08000000
> +#define DRM_BO_FLAG_MEM_PRIV1  0x10000000
> +#define DRM_BO_FLAG_MEM_PRIV2  0x20000000
> +#define DRM_BO_FLAG_MEM_PRIV3  0x40000000
> +#define DRM_BO_FLAG_MEM_PRIV4  0x80000000

Not sure if it's worth it (haven't made it through the whole patch yet), but defining some assertions to catch incompatible masks would be self-documenting and may be useful in some of the code paths (using BUG_ON or similar).

> +/* Memory flag mask */
> +#define DRM_BO_MASK_MEM     0xFF000000
> +#define DRM_BO_MASK_MEMTYPE 0xFF0000A0

These should be defined in terms of the above flags I think, just to make things more self-documenting.
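For instance (sketch, using the names above; this expands to the same 0xFF000000/0xFF0000A0 values):

#define DRM_BO_MASK_MEM		(DRM_BO_FLAG_MEM_LOCAL | DRM_BO_FLAG_MEM_TT | \
				 DRM_BO_FLAG_MEM_VRAM | DRM_BO_FLAG_MEM_PRIV0 | \
				 DRM_BO_FLAG_MEM_PRIV1 | DRM_BO_FLAG_MEM_PRIV2 | \
				 DRM_BO_FLAG_MEM_PRIV3 | DRM_BO_FLAG_MEM_PRIV4)
#define DRM_BO_MASK_MEMTYPE	(DRM_BO_MASK_MEM | DRM_BO_FLAG_CACHED | \
				 DRM_BO_FLAG_MAPPABLE)

And the incompatible-mask assertion could then be as cheap as something like this, wherever the flags have been resolved to a single placement:

	/* exactly one memory type may be set in the resolved flags */
	BUG_ON(hweight32(flags & DRM_BO_MASK_MEM) != 1);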
> +/* Don't block on validate and map */
> +#define DRM_BO_HINT_DONT_BLOCK 0x00000002
> +/* Don't place this buffer on the unfenced list.*/
> +#define DRM_BO_HINT_DONT_FENCE 0x00000004

Again, it would be good to state these positively instead, but I don't have any suggestions here.

> +#define DRM_BO_HINT_WAIT_LAZY          0x00000008
> +#define DRM_BO_HINT_ALLOW_UNFENCED_MAP 0x00000010

What do these two mean? (The second seems obvious, but what about the first?)

> +typedef enum {
> +	drm_bo_type_dc,
> +	drm_bo_type_user,
> +	drm_bo_type_fake
> +}drm_bo_type_t;

What do these mean? 'dc' seems a little terse.

> +typedef struct drm_bo_arg_request {
> +	unsigned handle; /* User space handle */
> +	unsigned mask;
> +	unsigned hint;
> +	drm_u64_t size;
> +	drm_bo_type_t type;
> +	unsigned arg_handle;
> +	drm_u64_t buffer_start;
> +	unsigned page_alignment;
> +	unsigned expand_pad[4]; /*Future expansion */
> +	enum {
> +		drm_bo_create,
> +		drm_bo_validate,
> +		drm_bo_map,
> +		drm_bo_unmap,
> +		drm_bo_fence,
> +		drm_bo_destroy,
> +		drm_bo_reference,
> +		drm_bo_unreference,
> +		drm_bo_info,
> +		drm_bo_wait_idle,
> +		drm_bo_ref_fence
> +	} op;
> +} drm_bo_arg_request_t;

Likewise with this enum, hopefully it really can be private. Though I guess it can be pulled out later if it's ever needed.

> +#define DRM_BO_MEM_LOCAL 0
> +#define DRM_BO_MEM_TT    1
> +#define DRM_BO_MEM_VRAM  2
> +#define DRM_BO_MEM_PRIV0 3
> +#define DRM_BO_MEM_PRIV1 4
> +#define DRM_BO_MEM_PRIV2 5
> +#define DRM_BO_MEM_PRIV3 6
> +#define DRM_BO_MEM_PRIV4 7

These PRIV types are reserved for the driver? More fundamentally, it seems like there are several types: memory that the CPU can get at, memory that the gfx chip can get at, and all combinations thereof. The 'TT' type seems to represent memory that both chips can access?

> +
> +#define DRM_BO_MEM_TYPES 8 /* For now. */
> +
> +typedef union drm_mm_init_arg{
> +	struct {
> +		enum {
> +			mm_init,
> +			mm_takedown,

init/cleanup is the more common pair.

> #if defined(CONFIG_AGP) || defined(CONFIG_AGP_MODULE)

Dave, it would be cool if we could get rid of this sort of conditional inclusion. :)

> +#define DRM_MAP_HASH_ORDER 12
> +#define DRM_OBJECT_HASH_ORDER 12
> +#define DRM_FILE_PAGE_OFFSET_START ((0xFFFFFFFFUL >> PAGE_SHIFT) + 1)
> +#define DRM_FILE_PAGE_OFFSET_SIZE ((0xFFFFFFFFUL >> PAGE_SHIFT) * 16)
> +#define DRM_MM_INIT_MAX_PAGES 256

What's the reason for these size limitations? Should they be determined by the underlying driver instead?

> @@ -282,7 +293,6 @@ typedef struct drm_magic_entry {
> 	drm_hash_item_t hash_item;
> 	struct list_head head;
> 	struct drm_file *priv;
> -	struct drm_magic_entry *next;
> } drm_magic_entry_t;

Ooh, magic. :)

> +/*
> + * This should be small enough to allow the use of kmalloc for hash tables
> + * instead of vmalloc.
> + */
> +
> +#define DRM_FILE_HASH_ORDER 8
> +typedef enum{
> +	_DRM_REF_USE=0,
> +	_DRM_REF_TYPE1,
> +	_DRM_NO_REF_TYPES
> +} drm_ref_t;

Even if it's small, you may have a hard time getting the memory if the system has been up for any length of time, due to fragmentation. I'm not sure if that'll be an issue in practice, but if so, you may need to think about using a different data structure. OTOH, if this stuff is allocated at early boot it should be fine.
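If the "small enough for kmalloc" assumption is meant to hold forever, it might be worth encoding it in the source, e.g. (sketch, assuming the hash buckets are hlist_heads as in drm_hashtab):

	/* DRM_FILE_HASH_ORDER 8 -> 256 buckets; make sure the table stays a
	 * sub-page (order-0) kmalloc, which fragmentation can't break. */
	BUILD_BUG_ON((sizeof(struct hlist_head) << DRM_FILE_HASH_ORDER) >
		     PAGE_SIZE);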
> +
> +	/*
> +	 * The user object hash table is global and resides in the
> +	 * drm_device structure. We protect the lists and hash tables with the
> +	 * device struct_mutex. A bit coarse-grained but probably the best
> +	 * option.
> +	 */
> +
> +	struct list_head refd_objects;
> +	struct list_head user_objects;

So it's really per-device instance, right? E.g. one set per /dev/dri/cardX?

> @@ -414,6 +449,10 @@ typedef struct drm_lock_data {
> 	struct file *filp;		/**< File descr of lock holder (0=kernel) */
> 	wait_queue_head_t lock_queue;	/**< Queue of blocked processes */
> 	unsigned long lock_time;	/**< Time of last lock in jiffies */
> +	spinlock_t spinlock;
> +	uint32_t kernel_waiters;
> +	uint32_t user_waiters;

Are these just counts? Or is there a specific reason they're uint32_t here? (And aren't POSIX types like that frowned upon in the kernel?)

> +typedef struct drm_mm_node {
> +	struct list_head fl_entry;
> +	struct list_head ml_entry;
> +	int free;
> +	unsigned long start;
> +	unsigned long size;
> +	struct drm_mm *mm;
> +	void *private;
> +} drm_mm_node_t;

Is that a driver private?

> 	void (*irq_uninstall) (struct drm_device * dev);
> 	void (*reclaim_buffers) (struct drm_device * dev, struct file * filp);
> 	void (*reclaim_buffers_locked) (struct drm_device *dev,
> -					struct file *filp);
> +					struct file * filp);
> +	void (*reclaim_buffers_idlelocked) (struct drm_device *dev,
> +					    struct file * filp);
> 	unsigned long (*get_map_ofs) (drm_map_t * map);
> 	unsigned long (*get_reg_ofs) (struct drm_device * dev);
> 	void (*set_version) (struct drm_device * dev, drm_set_version_t * sv);
>
> +	struct drm_fence_driver *fence_driver;
> +	struct drm_bo_driver *bo_driver;

So you're adding a sub-table of function pointers rather than just adding them to the main structure?

> +extern int drm_i_have_hw_lock(struct file *filp);

Isn't there a way to see which process has the lock and then compare against current? I don't usually see 'I' as part of an API. :)

> +static inline drm_mm_t *drm_get_mm(drm_mm_node_t *block)
> +{
> +	return block->mm;
> +}

drm_get_mm(block) is longer than block->mm, so you must have other plans here? :)

> +static int drm_agp_populate(drm_ttm_backend_t *backend, unsigned long num_pages,
> +			    struct page **pages) {
> +
> +	drm_agp_ttm_priv *agp_priv = (drm_agp_ttm_priv *) backend->private;
> +	struct page **cur_page, **last_page = pages + num_pages;
> +	DRM_AGP_MEM *mem;
> +
> +	if (drm_alloc_memctl(num_pages * sizeof(void *)))
> +		return -1;

Can you use the standard -Exx error codes in DRM?

> +drm_ttm_backend_t *drm_agp_init_ttm(struct drm_device *dev,
> +				    drm_ttm_backend_t *backend)
> +{
> +
> +	drm_ttm_backend_t *agp_be;
> +	drm_agp_ttm_priv *agp_priv;
> +	struct agp_kern_info *info;
> +
> +	if (!dev->agp) {
> +		DRM_ERROR("AGP is not initialized.\n");
> +		return NULL;
> +	}
> +	info = &dev->agp->agp_info;
> +
> +	if (info->version.major != AGP_REQUIRED_MAJOR ||
> +	    info->version.minor < AGP_REQUIRED_MINOR) {
> +		DRM_ERROR("Wrong agpgart version %d.%d\n"
> +			  "\tYou need at least version %d.%d.\n",
> +			  info->version.major,
> +			  info->version.minor,
> +			  AGP_REQUIRED_MAJOR,
> +			  AGP_REQUIRED_MINOR);
> +		return NULL;
> +	}
> +
> +	agp_be = (backend != NULL) ? backend:
> +		drm_ctl_calloc(1, sizeof(*agp_be), DRM_MEM_MAPPINGS);

Why would agp_be be free here, but the TTM not yet initialized?

> +static void drm_bo_destroy_locked(drm_buffer_object_t * bo);
> +static int drm_bo_setup_vm_locked(drm_buffer_object_t * bo);
> +static void drm_bo_takedown_vm_locked(drm_buffer_object_t * bo);
> +static void drm_bo_unmap_virtual(drm_buffer_object_t * bo);
> +
> +static inline uint32_t drm_bo_type_flags(unsigned type)
> +{
> +	return (1 << (24 + type));
> +}

24? Isn't 42 the right answer? :)
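More seriously, if the magic 24 has to stay, deriving it from the flag definitions would make it self-documenting, e.g. (sketch; DRM_BO_MEM_SHIFT is a name I just made up):

#define DRM_BO_MEM_SHIFT	24
#define DRM_BO_FLAG_MEM_LOCAL	(1 << DRM_BO_MEM_SHIFT)

static inline uint32_t drm_bo_type_flags(unsigned type)
{
	/* type 0 -> MEM_LOCAL, 1 -> MEM_TT, 2 -> MEM_VRAM, ... */
	return DRM_BO_FLAG_MEM_LOCAL << type;
}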
> +static int drm_bo_handle_move_mem(drm_buffer_object_t * bo,
> +				  drm_bo_mem_reg_t * mem,
> +				  int evict, int no_wait)

You can use 'bool' these days if you want.

> +static int drm_bo_expire_fence(drm_buffer_object_t * bo, int allow_errors)
> +{
> +	drm_device_t *dev = bo->dev;
> +	drm_buffer_manager_t *bm = &dev->bm;
> +
> +	if (bo->fence) {

You could save yourself some indenting on these functions by putting an "if (!bo->fence) return 0;" at the top.

> +static void drm_bo_delayed_workqueue(struct work_struct *work)
> +{
> +	drm_buffer_manager_t *bm =
> +	    container_of(work, drm_buffer_manager_t, wq.work);
> +	drm_device_t *dev = container_of(bm, drm_device_t, bm);
> +
> +	DRM_DEBUG("Delayed delete Worker\n");
> +
> +	mutex_lock(&dev->struct_mutex);
> +	if (!bm->initialized) {
> +		mutex_unlock(&dev->struct_mutex);
> +		return;
> +	}
> +	drm_bo_delayed_delete(dev, 0);
> +	if (bm->initialized && !list_empty(&bm->ddestroy)) {
> +		schedule_delayed_work(&bm->wq,
> +				      ((DRM_HZ / 100) < 1) ? 1 : DRM_HZ / 100);
> +	}
> +	mutex_unlock(&dev->struct_mutex);

Won't this hold &dev->struct_mutex until the delayed work is complete? Seems like that could be a long time...

> +	evict_mem = bo->mem;
> +	evict_mem.mask = dev->driver->bo_driver->evict_mask(bo);
> +	ret = drm_bo_mem_space(bo, &evict_mem, no_wait);
> +
> +	if (ret) {
> +		if (ret != -EAGAIN)

This sort of stuff could be "if (ret && ret != -EAGAIN)" to save some indenting.

> +static int drm_bo_mem_force_space(drm_device_t * dev,
> +				  drm_bo_mem_reg_t * mem,
> +				  uint32_t mem_type, int no_wait)
> +{
> +	drm_mm_node_t *node;
> +	drm_buffer_manager_t *bm = &dev->bm;
> +	drm_buffer_object_t *entry;
> +	drm_mem_type_manager_t *man = &bm->man[mem_type];
> +	struct list_head *lru;
> +	unsigned long num_pages = mem->num_pages;
> +	int ret;
> +
> +	mutex_lock(&dev->struct_mutex);
> +	do {
> +		node = drm_mm_search_free(&man->manager, num_pages,
> +					  mem->page_alignment, 1);
> +		if (node)
> +			break;
> +
> +		lru = &man->lru;
> +		if (lru->next == lru)
> +			break;
> +
> +		entry = list_entry(lru->next, drm_buffer_object_t, lru);
> +		atomic_inc(&entry->usage);
> +		mutex_unlock(&dev->struct_mutex);
> +		mutex_lock(&entry->mutex);
> +		BUG_ON(entry->mem.flags & (DRM_BO_FLAG_NO_MOVE | DRM_BO_FLAG_NO_EVICT));
> +
> +		ret = drm_bo_evict(entry, mem_type, no_wait);
> +		mutex_unlock(&entry->mutex);
> +		drm_bo_usage_deref_unlocked(entry);
> +		if (ret)
> +			return ret;
> +		mutex_lock(&dev->struct_mutex);
> +	} while (1);

So this could spin awhile... I guess you don't want to return an allocation failure to the caller? Oh, I guess that'll happen if you go through all the LRU lists? Makes sense... I guess that traversal could be put in its own function, e.g. drm_mm_get_node(), to make this function smaller (though it would only be worth it if the same construct was used elsewhere); rough sketch after the next hunk.

> +static int drm_bo_check_unfenced(drm_buffer_object_t * bo)
> +{
> +	int ret;
> +
> +	mutex_lock(&bo->mutex);
> +	ret = (bo->priv_flags & _DRM_BO_FLAG_UNFENCED);
> +	mutex_unlock(&bo->mutex);
> +	return ret;
> +}

Not sure how the lock is protecting you here? The caller is unprotected, so if the value can change it might still change after the return? Or is it just making sure the bo doesn't disappear while we read it (i.e. that it's ok to return a stale value, but not deref bo w/o holding the lock... but the refcount should do that, right)?
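Going back to drm_bo_mem_force_space() for a second, the traversal I meant would look roughly like this (sketch; name invented, error returns hand-waved through ERR_PTR, and it returns with struct_mutex held on the non-error paths, as in your version):

static drm_mm_node_t *drm_bo_mm_get_node(drm_device_t * dev,
					 drm_mem_type_manager_t * man,
					 drm_bo_mem_reg_t * mem,
					 uint32_t mem_type, int no_wait)
{
	drm_buffer_object_t *entry;
	drm_mm_node_t *node;
	int ret;

	mutex_lock(&dev->struct_mutex);
	do {
		node = drm_mm_search_free(&man->manager, mem->num_pages,
					  mem->page_alignment, 1);
		if (node || list_empty(&man->lru))
			break;

		/* evict the least recently used buffer and retry the search */
		entry = list_entry(man->lru.next, drm_buffer_object_t, lru);
		atomic_inc(&entry->usage);
		mutex_unlock(&dev->struct_mutex);
		mutex_lock(&entry->mutex);
		BUG_ON(entry->mem.flags &
		       (DRM_BO_FLAG_NO_MOVE | DRM_BO_FLAG_NO_EVICT));
		ret = drm_bo_evict(entry, mem_type, no_wait);
		mutex_unlock(&entry->mutex);
		drm_bo_usage_deref_unlocked(entry);
		if (ret)
			return ERR_PTR(ret);	/* struct_mutex dropped */
		mutex_lock(&dev->struct_mutex);
	} while (1);

	return node;	/* may be NULL if all LRU lists were exhausted */
}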
> +/*
> + * Wait until a buffer, scheduled to be fenced moves off the unfenced list.
> + * Until then, we cannot really do anything with it except delete it.
> + * The unfenced list is a PITA, and the operations
> + * 1) validating
> + * 2) submitting commands
> + * 3) fencing
> + * Should really be an atomic operation.
> + * We now "solve" this problem by keeping
> + * the buffer "unfenced" after validating, but before fencing.
> + */
> +
> +static int drm_bo_wait_unfenced(drm_buffer_object_t * bo, int no_wait,
> +				int eagain_if_wait)
> +{
> +	int ret = (bo->priv_flags & _DRM_BO_FLAG_UNFENCED);
> +	unsigned long _end = jiffies + 3 * DRM_HZ;
> +
> +	if (ret && no_wait)
> +		return -EBUSY;
> +	else if (!ret)
> +		return 0;
> +
> +	do {
> +		mutex_unlock(&bo->mutex);
> +		DRM_WAIT_ON(ret, bo->event_queue, 3 * DRM_HZ,
> +			    !drm_bo_check_unfenced(bo));
> +		mutex_lock(&bo->mutex);
> +		if (ret == -EINTR)
> +			return -EAGAIN;
> +		if (ret) {
> +			DRM_ERROR
> +			    ("Error waiting for buffer to become fenced\n");
> +			return ret;
> +		}
> +		ret = (bo->priv_flags & _DRM_BO_FLAG_UNFENCED);
> +	} while (ret && !time_after_eq(jiffies, _end));
> +	if (ret) {
> +		DRM_ERROR("Timeout waiting for buffer to become fenced\n");
> +		return ret;
> +	}
> +	if (eagain_if_wait)
> +		return -EAGAIN;
> +
> +	return 0;
> +}

Wow, so the caller might wait up to 3 seconds for the buffer to become ready? Actually, maybe longer than 3 seconds in a NO_HZ kernel? This should only occur if another process shares the bo and it's busy with it for a long time, right?

> +static int drm_copy_io_page(void *dst, void *src, unsigned long page)
> +{
> +	uint32_t *dstP =
> +	    (uint32_t *) ((unsigned long)dst + (page << PAGE_SHIFT));
> +	uint32_t *srcP =
> +	    (uint32_t *) ((unsigned long)src + (page << PAGE_SHIFT));
> +
> +	int i;
> +	for (i = 0; i < PAGE_SIZE / sizeof(uint32_t); ++i)
> +		iowrite32(ioread32(srcP++), dstP++);
> +	return 0;
> +}

dst and src should be __iomem also (void __iomem *dst, void __iomem *src).

> +static int drm_copy_io_ttm_page(drm_ttm_t * ttm, void *src, unsigned long page)

Same here, src should be __iomem.

> +static int drm_copy_ttm_io_page(drm_ttm_t * ttm, void *dst, unsigned long page)

Same here, but for dst.

> diff --git a/drivers/char/drm/drm_bufs.c b/drivers/char/drm/drm_bufs.c
> index c113458..0e5b1e4 100644
> --- a/drivers/char/drm/drm_bufs.c
> +++ b/drivers/char/drm/drm_bufs.c
> @@ -427,6 +427,8 @@ int drm_rmmap_locked(drm_device_t *dev,
> 		dmah.size = map->size;
> 		__drm_pci_free(dev, &dmah);
> 		break;
> +	case _DRM_TTM:
> +		BUG_ON(1);

Just BUG() is probably sufficient.

> static int __init drm_core_init(void)
> {
> -	int ret = -ENOMEM;
> +	int ret;
> +	struct sysinfo si;
> +	unsigned long avail_memctl_mem;
> +	unsigned long max_memctl_mem;
> +
> +	si_meminfo(&si);
> +
> +	/*
> +	 * AGP only allows low / DMA32 memory ATM.
> +	 */
> +
> +	avail_memctl_mem = si.totalram - si.totalhigh;

I don't think this will work well on machines with discontiguous memory or without high memory. You may need to poke into the zone structures if there's not an easy way to get a count of the number of pages in ZONE_DMA32. If you have to, it might be good to write a helper routine for the DRM or AGP layers to get memory availability given certain constraints.

> +		spin_unlock(&dev->lock.spinlock);
> +		if (locked)
> +			break;
> +		schedule();
> +	} while (!time_after_eq(jiffies, _end));
> +
> +	if (!locked) {
> +		DRM_ERROR("reclaim_buffers_locked() deadlock. Please rework this\n"
> +			  "\tdriver to use reclaim_buffers_idlelocked() instead.\n"
> +			  "\tI will go on reclaiming the buffers anyway.\n");

Yuck, is there any better way of doing this? E.g. if a driver doesn't have an idlelocked() routine, just refuse to run?
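I.e. something like this at init time (sketch; where exactly the check lives is up to you, maybe wherever the buffer manager is initialized):

	/* refuse drivers that could hit the reclaim_buffers_locked deadlock */
	if (dev->driver->reclaim_buffers_locked &&
	    !dev->driver->reclaim_buffers_idlelocked) {
		DRM_ERROR("driver has reclaim_buffers_locked but no "
			  "reclaim_buffers_idlelocked; refusing to load.\n");
		return -EINVAL;
	}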
> @@ -184,12 +186,8 @@ int drm_unlock(struct inode *inode, stru
> 	if (dev->driver->kernel_context_switch_unlock)
> 		dev->driver->kernel_context_switch_unlock(dev);
> 	else {
> -		drm_lock_transfer(dev, &dev->lock.hw_lock->lock,
> -				  DRM_KERNEL_CONTEXT);
> -
> -		if (drm_lock_free(dev, &dev->lock.hw_lock->lock,
> -				  DRM_KERNEL_CONTEXT)) {
> -			DRM_ERROR("\n");
> +		if (drm_lock_free(&dev->lock,lock.context)) {
> +			/* FIXME: Should really bail out here. */
> 		}
> 	}

This would happen if the caller called drm_unlock but the lock was already free? Should we BUG() here? I wonder if there's some way to get lockdep type checking for the DRM lock...

> diff --git a/drivers/char/drm/drm_memory.c b/drivers/char/drm/drm_memory.c
> index 92a8670..e8c3e50 100644
> --- a/drivers/char/drm/drm_memory.c
> +++ b/drivers/char/drm/drm_memory.c
> @@ -36,6 +36,75 @@
> #include <linux/highmem.h>
> #include "drmP.h"
>
> +static struct {
> +	spinlock_t lock;
> +	drm_u64_t cur_used;
> +	drm_u64_t low_threshold;
> +	drm_u64_t high_threshold;
> +} drm_memctl = {
> +	.lock = SPIN_LOCK_UNLOCKED
> +};
> +
> +static inline size_t drm_size_align(size_t size) {
> +
> +	register size_t tmpSize = 4;

You can probably drop the 'register' keyword here, I think it's mostly ignored anyway.

> +#include "drmP.h"
> +
> +int drm_add_user_object(drm_file_t * priv, drm_user_object_t * item,
> +			int shareable)
> +{
> +	drm_device_t *dev = priv->head->dev;
> +	int ret;
> +
> +	atomic_set(&item->refcount, 1);
> +	item->shareable = shareable;
> +	item->owner = priv;
> +
> +	ret = drm_ht_just_insert_please(&dev->object_hash, &item->hash,
> +					(unsigned long)item, 32, 0, 0);

drm_object_hash_try_insert? (ht keeps making me think "hyperthreading" :)

> +
> +/***************************************************
> + * User space objects. (drm_object.c)
> + */
> +
> +#define drm_user_object_entry(_ptr, _type, _member) container_of(_ptr, _type, _member)
> +
> +typedef enum {
> +	drm_fence_type,
> +	drm_buffer_type,
> +	drm_ttm_type
> +	/*
> +	 * Add other user space object types here.
> +	 */
> +} drm_object_type_t;

Having 'type' in the enum types is a bit redundant.

> +/*
> + * A ref object is a structure which is used to
> + * keep track of references to user objects and to keep track of these
> + * references so that they can be destroyed for example when the user space
> + * process exits. Designed to be accessible using a pointer to the _user_ object.
> + */
> +
> +typedef struct drm_ref_object {
> +	drm_hash_item_t hash;
> +	struct list_head list;
> +	atomic_t refcount;
> +	drm_ref_t unref_action;
> +} drm_ref_object_t;

I wonder if you could use krefs (Documentation/kref.txt)? Rough sketch below, after the next hunk.

> +/*
> + * Use kmalloc if possible. Otherwise fall back to vmalloc.
> + */
> +
> +static void ttm_alloc_pages(drm_ttm_t * ttm)
> +{
> +	unsigned long size = ttm->num_pages * sizeof(*ttm->pages);
> +	ttm->pages = NULL;
> +
> +	if (drm_alloc_memctl(size))
> +		return;
> +
> +	if (size <= PAGE_SIZE) {
> +		ttm->pages = drm_calloc(1, size, DRM_MEM_TTM);
> +	}
> +	if (!ttm->pages) {
> +		ttm->pages = vmalloc_user(size);
> +		if (ttm->pages)
> +			ttm->page_flags |= DRM_TTM_PAGE_VMALLOC;
> +	}
> +	if (!ttm->pages) {
> +		drm_free_memctl(size);
> +	}
> +}

I think you've already discussed this with Arjan, but it would be good to be able to get rid of the kmalloc/vmalloc split. Vmalloc space is somewhat scarce on some arches (like i386) so it's best not to use it.
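Back to the kref question above — what I had in mind is roughly this (sketch; I haven't checked whether your lifetime rules actually fit kref's model):

#include <linux/kref.h>

typedef struct drm_ref_object {
	drm_hash_item_t hash;
	struct list_head list;
	struct kref kref;		/* replaces the open-coded atomic_t */
	drm_ref_t unref_action;
} drm_ref_object_t;

static void drm_ref_object_release(struct kref *kref)
{
	drm_ref_object_t *ref = container_of(kref, drm_ref_object_t, kref);

	/* unhash, unlink from the list, run unref_action, free */
}

Then creation does kref_init(&ref->kref), the ref/unref paths collapse to kref_get(&ref->kref) and kref_put(&ref->kref, drm_ref_object_release), and you get the zero-refcount sanity checking for free.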
> +static struct page *drm_ttm_alloc_page(void)
> +{
> +	struct page *page;
> +
> +	if (drm_alloc_memctl(PAGE_SIZE)) {
> +		return NULL;
> +	}
> +	page = alloc_page(GFP_KERNEL | __GFP_ZERO | GFP_DMA32);
> +	if (!page) {
> +		drm_free_memctl(PAGE_SIZE);
> +		return NULL;
> +	}
> +	SetPageLocked(page);
> +	return page;
> +}

I think this is where you'll want to account against the user's memlocked limit, before you lock the page (or even try to allocate it).

> +/**
> + * buffer object vm functions.
> + */
> +
> +/**
> + * \c Pagefault method for buffer objects.
> + *
> + * \param vma Virtual memory area.
> + * \param address File offset.
> + * \return Error or refault. The pfn is manually inserted.
> + *
> + * It's important that pfns are inserted while holding the bo->mutex lock.
> + * otherwise we might race with unmap_mapping_range() which is always
> + * called with the bo->mutex lock held.
> + *
> + * We're modifying the page attribute bits of the vma->vm_page_prot field,
> + * without holding the mmap_sem in write mode. Only in read mode.
> + * These bits are not used by the mm subsystem code, and we consider them
> + * protected by the bo->mutex lock.
> + */

So the DRM uses something other than kdoc for comments?

> +
> +static unsigned long drm_bo_vm_nopfn(struct vm_area_struct *vma,
> +				     unsigned long address)
> +{
> +	drm_buffer_object_t *bo = (drm_buffer_object_t *) vma->vm_private_data;
> +	unsigned long page_offset;
> +	struct page *page = NULL;
> +	drm_ttm_t *ttm;
> +	drm_device_t *dev;
> +	unsigned long pfn;
> +	int err;
> +	unsigned long bus_base;
> +	unsigned long bus_offset;
> +	unsigned long bus_size;
> +	int ret = NOPFN_REFAULT;
> +
> +	if (address > vma->vm_end)
> +		return NOPFN_SIGBUS;
> +
> +	err = mutex_lock_interruptible(&bo->mutex);
> +	if (err)
> +		return NOPFN_REFAULT;
> +
> +	err = drm_bo_wait(bo, 0, 0, 0);
> +	if (err) {
> +		ret = (err != -EAGAIN) ? NOPFN_SIGBUS : NOPFN_REFAULT;
> +		goto out_unlock;
> +	}
> +
> +	/*
> +	 * If buffer happens to be in a non-mappable location,
> +	 * move it to a mappable.
> +	 */
> +
> +	if (!(bo->mem.flags & DRM_BO_FLAG_MAPPABLE)) {
> +		uint32_t new_mask = bo->mem.mask |
> +			DRM_BO_FLAG_MAPPABLE |
> +			DRM_BO_FLAG_FORCE_MAPPABLE;
> +		err = drm_bo_move_buffer(bo, new_mask, 0, 0);
> +		if (err) {
> +			ret = (err != -EAGAIN) ? NOPFN_SIGBUS : NOPFN_REFAULT;
> +			goto out_unlock;
> +		}
> +	}
> +
> +	dev = bo->dev;
> +	err = drm_bo_pci_offset(dev, &bo->mem, &bus_base, &bus_offset,
> +				&bus_size);
> +
> +	if (err) {
> +		ret = NOPFN_SIGBUS;
> +		goto out_unlock;
> +	}
> +
> +	page_offset = (address - vma->vm_start) >> PAGE_SHIFT;
> +
> +	if (bus_size) {
> +		drm_mem_type_manager_t *man = &dev->bm.man[bo->mem.mem_type];
> +
> +		pfn = ((bus_base + bus_offset) >> PAGE_SHIFT) + page_offset;
> +		vma->vm_page_prot = drm_io_prot(man->drm_bus_maptype, vma);
> +	} else {
> +		ttm = bo->ttm;
> +
> +		drm_ttm_fixup_caching(ttm);
> +		page = drm_ttm_get_page(ttm, page_offset);
> +		if (!page) {
> +			ret = NOPFN_OOM;
> +			goto out_unlock;
> +		}
> +		pfn = page_to_pfn(page);
> +		vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
> +	}
> +
> +	err = vm_insert_pfn(vma, address, pfn);
> +	if (err) {
> +		ret = (err != -EAGAIN) ? NOPFN_OOM : NOPFN_REFAULT;
> +		goto out_unlock;
> +	}
> +out_unlock:
> +	mutex_unlock(&bo->mutex);
> +	return ret;
> +}

I *think* this is ok, but I remember some requirements that VM_IO be set on these bus-related pages, but the arch code should take care of that for you I think.
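One more thought on the memlocked accounting above — I was picturing something like this ahead of the alloc_page() call (sketch, untested; helper name invented, and it follows the way mlock() accounts things):

static int drm_ttm_charge_locked_pages(unsigned long num_pages)
{
	struct mm_struct *mm = current->mm;
	unsigned long lock_limit =
	    current->signal->rlim[RLIMIT_MEMLOCK].rlim_cur >> PAGE_SHIFT;
	int ret = 0;

	down_write(&mm->mmap_sem);
	if (mm->locked_vm + num_pages > lock_limit && !capable(CAP_IPC_LOCK))
		ret = -ENOMEM;
	else
		mm->locked_vm += num_pages;
	up_write(&mm->mmap_sem);
	return ret;
}

with a matching uncharge when the pages are unlocked and freed.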