From mark.kettenis@xs4all.nl Tue Dec 30 22:49:59 2014
Date: Tue, 30 Dec 2014 13:49:52 +0100 (CET)
Message-Id: <201412301249.sBUCnqBm011193@glazunov.sibelius.xs4all.nl>
From: Mark Kettenis
To: david@gwynne.id.au
CC: tech@openbsd.org
In-reply-to: (message from David Gwynne on Tue, 30 Dec 2014 21:53:46 +1000)
Subject: Re: km_alloc/km_free optimization
References: <201412281956.sBSJuXkX014608@glazunov.sibelius.xs4all.nl> <201412282155.sBSLtdHW022041@glazunov.sibelius.xs4all.nl>

> From: David Gwynne
> Date: Tue, 30 Dec 2014 21:53:46 +1000
>
> > On 29 Dec 2014, at 7:55 am, Mark Kettenis wrote:
> >
> >> Date: Sun, 28 Dec 2014 20:56:33 +0100 (CET)
> >> From: Mark Kettenis
> >>
> >> P.S. I would not be surprised if this would "fix" the problems landisk
> >> has with the "* 8" pool diff. Currently updating my landisk to
> >> verify.
> >
> > And it seems this does fix the landisk problem. I strongly suspect
> > there is some cache nastyness going on.
>
> i just got this with your diff + my pool garbage collector diff:
>
> uvm_fault(0xffffffff819042c0, 0x0, 0, 2) -> e
> kernel: page fault trap, code=0
> Stopped at km_free+0x72: movq $0,0(%rax)
> ddb{4}> tr
> km_free() at km_free+0x72
> pool_large_free() at pool_large_free+0x7e
> pool_allocator_free() at pool_allocator_free+0x30
> pool_p_free() at pool_p_free+0xd3
> pool_gc_pages() at pool_gc_pages+0xc9
> taskq_thread() at taskq_thread+0x3e
> end trace frame: 0x0, count: -6
>
> ive been running the gc in production for a week so far and havent
> hit this, so i think its new with your diff. however, i havent
> looked into it yet so i could be wrong.

Hmm, I guess you have a pool that has dma constraints and uses
in-page headers?  That would actually prevent the direct map from
being used, but km_free would not realize this, and try to unmap the
direct mapping.

Arguably that bug is already present in the current code.  We have
two options:

1. Make km_free(9) respect the kv_align attribute and skip the direct
   mapping code path.  Should work, since the pool allocator code is
   careful enough to set the alignment for the km_free() calls.  Would
   make things slightly fragile since people might forget though.  But
   perhaps it is good to have things crash in that case.

2. Align the physical pages such that mapping them directly produces
   the desired alignment.  Increases the risk of allocations failing.
   Probably not an issue for smallish alignment values.  This may be
   our only option if we don't find the root cause of the landisk
   issue.  Obviously this has the benefit of using the direct map in
   more cases than we do now.

The diff below implements #2.

Index: uvm_km.c
===================================================================
RCS file: /home/cvs/src/sys/uvm/uvm_km.c,v
retrieving revision 1.123
diff -u -p -r1.123 uvm_km.c
--- uvm_km.c	17 Dec 2014 06:58:11 -0000	1.123
+++ uvm_km.c	30 Dec 2014 12:18:54 -0000
@@ -809,11 +809,9 @@ km_alloc(size_t sz, const struct kmem_va
 	struct pglist pgl;
 	int mapflags = 0;
 	vm_prot_t prot;
+	paddr_t pla_align;
 	int pla_flags;
 	int pla_maxseg;
-#ifdef __HAVE_PMAP_DIRECT
-	paddr_t pa;
-#endif
 	vaddr_t va, sva;
 
 	KASSERT(sz == round_page(sz));
@@ -828,46 +826,33 @@ km_alloc(size_t sz, const struct kmem_va
 	if (kp->kp_zero)
 		pla_flags |= UVM_PLA_ZERO;
 
+	pla_align = kp->kp_align;
+#ifdef __HAVE_PMAP_DIRECT
+	if (pla_align < kv->kv_align)
+		pla_align = kv->kv_align;
+#endif
 	pla_maxseg = kp->kp_maxseg;
 	if (pla_maxseg == 0)
 		pla_maxseg = sz / PAGE_SIZE;
 
 	if (uvm_pglistalloc(sz, kp->kp_constraint->ucr_low,
-	    kp->kp_constraint->ucr_high, kp->kp_align, kp->kp_boundary,
+	    kp->kp_constraint->ucr_high, pla_align, kp->kp_boundary,
 	    &pgl, pla_maxseg, pla_flags)) {
 		return (NULL);
 	}
 
 #ifdef __HAVE_PMAP_DIRECT
-	if (kv->kv_align)
-		goto alloc_va;
-#if 1
 	/*
-	 * For now, only do DIRECT mappings for single page
-	 * allocations, until we figure out a good way to deal
-	 * with contig allocations in km_free.
+	 * Only use direct mappings for single page or single segment
+	 * allocations.
 	 */
-	if (!kv->kv_singlepage)
-		goto alloc_va;
-#endif
-	/*
-	 * Dubious optimization. If we got a contig segment, just map it
-	 * through the direct map.
-	 */
-	TAILQ_FOREACH(pg, &pgl, pageq) {
-		if (pg != TAILQ_FIRST(&pgl) &&
-		    VM_PAGE_TO_PHYS(pg) != pa + PAGE_SIZE)
-			break;
-		pa = VM_PAGE_TO_PHYS(pg);
-	}
-	if (pg == NULL) {
+	if (kv->kv_singlepage || kp->kp_maxseg == 1) {
 		TAILQ_FOREACH(pg, &pgl, pageq) {
-			vaddr_t v;
-			v = pmap_map_direct(pg);
+			va = pmap_map_direct(pg);
 			if (pg == TAILQ_FIRST(&pgl))
-				va = v;
+				sva = va;
 		}
-		return ((void *)va);
+		return ((void *)sva);
 	}
 #endif
 alloc_va:
@@ -947,28 +932,35 @@ km_free(void *v, size_t sz, const struct
 	struct vm_page *pg;
 	struct pglist pgl;
 
-	sva = va = (vaddr_t)v;
-	eva = va + sz;
+	sva = (vaddr_t)v;
+	eva = sva + sz;
 
-	if (kp->kp_nomem) {
+	if (kp->kp_nomem)
 		goto free_va;
-	}
 
-	if (kv->kv_singlepage) {
 #ifdef __HAVE_PMAP_DIRECT
-		pg = pmap_unmap_direct(va);
-		uvm_pagefree(pg);
+	if (kv->kv_singlepage || kp->kp_maxseg == 1) {
+		TAILQ_INIT(&pgl);
+		for (va = sva; va < eva; va += PAGE_SIZE) {
+			pg = pmap_unmap_direct(va);
+			TAILQ_INSERT_TAIL(&pgl, pg, pageq);
+		}
+		uvm_pglistfree(&pgl);
+		return;
+	}
 #else
+	if (kv->kv_singlepage) {
 		struct uvm_km_free_page *fp = v;
+
 		mtx_enter(&uvm_km_pages.mtx);
 		fp->next = uvm_km_pages.freelist;
 		uvm_km_pages.freelist = fp;
 		if (uvm_km_pages.freelistlen++ > 16)
 			wakeup(&uvm_km_pages.km_proc);
 		mtx_leave(&uvm_km_pages.mtx);
-#endif
 		return;
 	}
+#endif
 
 	if (kp->kp_pageable) {
 		pmap_remove(pmap_kernel(), sva, eva);
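
For what it's worth, the reason #2 can work is simple arithmetic: on
an architecture where the direct map is a plain linear translation,
whatever alignment is enforced on the physical allocation carries
over to the direct-mapped virtual address.  The standalone userland
sketch below only illustrates that arithmetic; DIRECT_BASE and
fake_pglistalloc() are made-up stand-ins for the machine-dependent
direct map base and uvm_pglistalloc(), not real kernel interfaces.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE	4096UL
/* assumed: a linear direct map whose base is aligned well beyond kv_align */
#define DIRECT_BASE	0xffffff8000000000UL

/* stand-in physical allocator: round the "found" PA up to align */
static uint64_t
fake_pglistalloc(uint64_t found_pa, uint64_t align)
{
	return ((found_pa + align - 1) & ~(align - 1));
}

int
main(void)
{
	uint64_t kp_align, kv_align, pla_align, pa, va;

	kp_align = PAGE_SIZE;		/* physical alignment the pool asked for */
	kv_align = 4 * PAGE_SIZE;	/* virtual alignment the mapping needs */

	/* the km_alloc() change above: raise the physical alignment */
	pla_align = kp_align;
	if (pla_align < kv_align)
		pla_align = kv_align;

	pa = fake_pglistalloc(0x12345678UL, pla_align);
	va = DIRECT_BASE + pa;		/* "pmap_map_direct" as a linear offset */

	/* va inherits pa's alignment, so the kv_align requirement is met */
	assert((va & (kv_align - 1)) == 0);
	printf("pa 0x%llx -> va 0x%llx (kv_align 0x%llx ok)\n",
	    (unsigned long long)pa, (unsigned long long)va,
	    (unsigned long long)kv_align);
	return (0);
}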