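1 Goroutine stacks

1.1 What the goroutine stack is for

A goroutine's stack records its execution path, that is, the chain of function calls currently in progress. Variables declared inside a function are also stored on the stack, along with the function's arguments and return values.

1.2 Where the goroutine stack lives

Goroutine stacks are allocated from Go's heap memory, and Go's heap memory in turn lives in the virtual memory provided by the operating system.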

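1.3 Dealing with insufficient stack space

1.3.1 Escape to the heap: handling variables that are too large

A variable escapes to the heap when it cannot safely live on the stack: either it is still needed after the stack frame that created it has been released, or it is too large for the stack. The compiler makes this decision at build time through escape analysis; it is not a run-time reaction to the stack filling up.

Pointer escape

Trigger: the function returns a pointer to a local object.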
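
A minimal sketch of pointer escape follows. The type point and the functions newPoint and sum are hypothetical names chosen for illustration. Building the program with go build -gcflags=-m asks the compiler to print its escape-analysis decisions; the exact wording of those diagnostics depends on the compiler version and on inlining.

```go
package main

import "fmt"

type point struct{ x, y int }

// newPoint returns the address of a local variable. The variable must
// outlive this frame, so the compiler places it on the heap.
func newPoint(x, y int) *point {
	p := point{x, y}
	return &p
}

// sum only uses its local inside the frame, so p can stay on the stack.
func sum(x, y int) int {
	p := point{x, y}
	return p.x + p.y
}

func main() {
	fmt.Println(newPoint(1, 2).y, sum(1, 2))
}
```

The program behaves the same wherever the variables live; escape only changes where the memory comes from and who reclaims it.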

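Empty-interface escape

Trigger: if a function's parameter type is interface{}, the argument passed to it is likely to escape, for example with fmt.Println(). This is because functions that accept an empty interface usually rely on reflection.

Large-variable escape

Cause: a variable that is too large would not fit in the available stack space (a goroutine stack starts at only 2-4 KB).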
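
The last two triggers can be sketched in the same way. The functions describe and bigSum are hypothetical, and the 16 MiB array size is simply assumed to be above the compiler's limit for stack-allocated variables in current toolchains. Building with go build -gcflags=-m should show what the compiler decided in each case.

```go
package main

import "fmt"

// describe accepts an empty interface. Passing a value here converts it
// to interface{}, and fmt inspects it by reflection, so the argument is
// likely to be reported as escaping.
func describe(v interface{}) string {
	return fmt.Sprintf("%T=%v", v, v)
}

// bigSum declares a 16 MiB array. It is assumed to be too large for the
// stack, in which case the compiler moves it to the heap.
func bigSum() int {
	var buf [16 << 20]byte
	buf[0] = 1
	buf[len(buf)-1] = 2
	return int(buf[0]) + int(buf[len(buf)-1])
}

func main() {
	fmt.Println(describe(42), bigSum())
}
```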

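1.3.2 Stack growth: handling too many stack frames

Go's stack space is also allocated from the heap. At the start of a function, a check inserted by the compiler determines whether the stack needs to grow; if it does, morestack() (implemented in assembly) is called.
How it grows: before Go 1.3 the runtime used segmented stacks; since then it has used contiguous stacks.

Segmented stacks

When a goroutine runs out of stack space, a new segment is allocated and linked after the existing one. This saves memory, but execution can keep jumping back and forth between the two segments when a call sits right at the boundary.

Contiguous stacks

The contents of the old stack are copied into a new stack twice the size, so the stack stays contiguous. (When less than 1/4 of the stack is in use, it is shrunk to half its size.)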
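
Stack growth is invisible to the program, which the following sketch tries to illustrate. The functions depth and deepCall are hypothetical; the recursion depth and the per-frame padding are chosen only so that the total stack demand is far above a goroutine's small initial stack, forcing the runtime to grow it several times.

```go
package main

import "fmt"

// depth recurses n levels. Each frame carries a small buffer so the
// goroutine needs many megabytes of stack in total.
func depth(n int) int {
	var pad [128]byte
	pad[0] = byte(n)
	if n == 0 {
		return int(pad[0])
	}
	return depth(n-1) + 1
}

// deepCall runs depth on a fresh goroutine, which starts with a small
// stack, and returns the result.
func deepCall(n int) int {
	done := make(chan int)
	go func() { done <- depth(n) }()
	return <-done
}

func main() {
	fmt.Println(deepCall(100000)) // prints 100000
}
```

The call completes normally because the runtime keeps copying the stack to larger contiguous blocks as the recursion deepens.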

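2 Heap memory layout

As noted above, Go's stack memory is itself taken from heap memory.

2.1 Operating system virtual memory

This is the virtual address space that the operating system gives to each application. The operating system does not let a process read or write physical memory directly; the mapping between virtual and physical memory is managed by the operating system.

2.2 heapArena

Go requests heap memory in units called heapArena. On 64-bit Linux each heapArena is 64 MB, and with a 48-bit address space there can be at most 2^22 of them (2^48 / 2^26). All the heapArenas together make up mheap, Go's heap.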
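
The arena count depends on the platform. As a rough check, the sketch below assumes 64-bit Linux with 48-bit heap addresses and 64 MB arenas; under those assumptions the limit works out to 2^22 arenas. The constants are named after the runtime's heapAddrBits and logHeapArenaBytes but are redefined here purely for the arithmetic.

```go
package main

import "fmt"

const (
	heapAddrBits      = 48 // assumed width of heap addresses
	logHeapArenaBytes = 26 // 2^26 bytes = 64 MB per arena
)

// arenaBytes is the size of one arena in bytes.
func arenaBytes() uint64 { return 1 << logHeapArenaBytes }

// maxArenas is how many arenas fit in the assumed address space.
func maxArenas() uint64 { return 1 << (heapAddrBits - logHeapArenaBytes) }

func main() {
	fmt.Println(arenaBytes()>>20, "MB per arena")
	fmt.Println(maxArenas(), "arenas at most")
}
```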

type heapArena struct {            // describes one 64 MB memory unit
    bitmap [heapArenaBitmapBytes]byte
    spans [pagesPerArena]*mspan
    pageInUse [pagesPerArena / 8]uint8
    pageMarks [pagesPerArena / 8]uint8
    pageSpecials [pagesPerArena / 8]uint8
    checkmarks *checkmarksMap
    zeroedBase uintptr
}

type mheap struct {                // describes the whole heap
    lock  mutex
    pages pageAlloc 

    sweepgen uint32 
    allspans []*mspan // all spans out there
    pagesInUse         atomic.Uint64
    pagesSwept         atomic.Uint64
    pagesSweptBasis    atomic.Uint64
    sweepHeapLiveBasis uint64
    sweepPagesPerByte  float64 
    scavengeGoal uint64
    reclaimIndex atomic.Uint64
    reclaimCredit atomic.Uintptr
    arenas [1 << arenaL1Bits]*[1 << arenaL2Bits]*heapArena    // records every heapArena
    heapArenaAlloc linearAlloc
    arenaHints *arenaHint
    arena linearAlloc
    allArenas []arenaIdx
    sweepArenas []arenaIdx
    markArenas []arenaIdx
    curArena struct {
        base, end uintptr
    }
    _ uint32 // ensure 64-bit alignment of central
    central [numSpanClasses]struct {    // numSpanClasses = 68<<1 = 136
        mcentral mcentral                // the central index for one span class
        pad      [cpu.CacheLinePadSize - unsafe.Sizeof(mcentral{})%cpu.CacheLinePadSize]byte
    }
    spanalloc             fixalloc // allocator for span*
    cachealloc            fixalloc // allocator for mcache*
    specialfinalizeralloc fixalloc // allocator for specialfinalizer*
    specialprofilealloc   fixalloc // allocator for specialprofile*
    specialReachableAlloc fixalloc // allocator for specialReachable
    speciallock           mutex    // lock for special record allocators.
    arenaHintAlloc        fixalloc // allocator for arenaHints
    unused *specialfinalizer // never set, just here to force the specialfinalizer type into DWARF
}

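2.2.1 How a heapArena is used

Linear allocation

Objects are placed one after another in a straight line. Space freed by reclaimed objects is ignored until the whole 64 MB has been used; only then is the arena scanned again for usable gaps.

Free-list allocation

Freed blocks are recorded as memory is used, and the addresses of the gaps are kept in a linked-list-like structure. When a new object needs space, the list is walked first so that the gaps are filled.

What Go actually does: size-class allocation

Size-class allocation is a general idea rather than a specific data structure.
The memory is divided into small blocks of different sizes, graded from small to large, and every object is placed in the smallest block that can hold it.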
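
The idea of putting each object in the smallest block that can hold it can be sketched as follows. The table classSizes is only an illustrative subset of slot sizes, and slotFor is a hypothetical helper, not a runtime function; the real runtime uses precomputed lookup tables instead of a loop.

```go
package main

import "fmt"

// classSizes is an illustrative subset of slot sizes in bytes, ascending.
var classSizes = []int{8, 16, 24, 32, 48, 64, 80, 96, 112, 128}

// slotFor returns the smallest slot size that can hold size bytes,
// or -1 if size does not fit any class in this small table.
func slotFor(size int) int {
	for _, s := range classSizes {
		if size <= s {
			return s
		}
	}
	return -1
}

func main() {
	for _, n := range []int{1, 17, 100, 129} {
		fmt.Println(n, "->", slotFor(n))
	}
}
```

A 17-byte object lands in a 24-byte slot, wasting 7 bytes; that bounded waste is the price paid for never having to search for a gap of exactly the right size.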

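2.2.2 mspan: the memory management unit

An mspan represents N slots of identical size. Go has 68 classes of mspan (0-67). The mspans in a heapArena are carved out on demand; the arena is not pre-cut into spans as soon as it is obtained.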

//go:notinheap
type mspan struct {
    next *mspan     // next span in list, or nil if none
    prev *mspan     // previous span in list, or nil if none
    list *mSpanList // For debugging. TODO: Remove.

    startAddr uintptr // address of first byte of span aka s.base()
    npages    uintptr // number of pages in span

    manualFreeList gclinkptr // list of free objects in mSpanManual spans

    freeindex uintptr

    nelems uintptr // number of object in the span.

    allocCache uint64

    allocBits  *gcBits
    gcmarkBits *gcBits

    sweepgen    uint32
    divMul      uint32        // for divide by elemsize
    allocCount  uint16        // number of allocated objects
    spanclass   spanClass     // size class and noscan (uint8)
    state       mSpanStateBox // mSpanInUse etc; accessed atomically (get/set methods)
    needzero    uint8         // needs to be zeroed before allocation
    elemsize    uintptr       // computed from sizeclass or from npages
    limit       uintptr       // end of data in span
    speciallock mutex         // guards specials list
    specials    *special      // linked list of special records sorted by offset.
}

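2.2.3 mcentral: the central index

Because mspans of a given class are scattered through the heap with no fixed pattern, a central index is kept to guide allocation. There are 136 mcentral structures in total: 68 for spans that the GC must scan and 68 for spans that it does not need to scan. In essence, each one acts as the head of a list of spans with free space.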

type mcentral struct {
    spanclass spanClass    // the span class this mcentral serves
    partial [2]spanSet // list of spans with a free object
    full    [2]spanSet // list of spans with no free objects
}
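
The figure of 136 comes from combining the 68 size classes with one scan/noscan bit. The sketch below re-implements that encoding outside the runtime; the names mirror the runtime's makeSpanClass, sizeclass and noscan, but the code here is a standalone illustration rather than the runtime's own source.

```go
package main

import "fmt"

const (
	numSizeClasses = 68
	numSpanClasses = numSizeClasses << 1 // 136
)

// spanClass packs a size class and a noscan flag into one small integer.
type spanClass uint8

// makeSpanClass stores the size class in the upper bits and the
// noscan flag in the lowest bit.
func makeSpanClass(sizeclass uint8, noscan bool) spanClass {
	sc := spanClass(sizeclass << 1)
	if noscan {
		sc |= 1
	}
	return sc
}

func (sc spanClass) sizeclass() uint8 { return uint8(sc >> 1) }
func (sc spanClass) noscan() bool     { return sc&1 != 0 }

func main() {
	sc := makeSpanClass(2, true)
	fmt.Println(sc, sc.sizeclass(), sc.noscan(), numSpanClasses)
}
```

The resulting value is used directly as an array index, which is why both mheap.central and mcache.alloc are sized by numSpanClasses.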

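2.2.4 mcache: the per-P cache

Because mcentral is shared and can be both read and written, highly concurrent programs would suffer from severe lock contention on it. To avoid this, each P (the scheduler's logical processor, bound to a running thread) is given its own local cache, an mcache. Each mcache holds 136 mspans, one per span class (68 that need GC scanning and 68 that do not).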

type mcache struct {
    
    nextSample uintptr // trigger heap sample after allocating this many bytes
    scanAlloc  uintptr // bytes of scannable heap allocated

    tiny       uintptr
    tinyoffset uintptr
    tinyAllocs uintptr

    alloc [numSpanClasses]*mspan // spans to allocate from, indexed by spanClass

    stackcache [_NumStackOrders]stackfreelist

    flushGen uint32
}

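2.3 Structure diagram

3 Memory allocation

3.1 Object size categories

Objects are divided into three categories:
1 Tiny objects: (0, 16 B), containing no pointers
2 Small objects: [16 B, 32 KB]
3 Large objects: (32 KB, +∞)
Tiny and small objects are allocated from the ordinary mspans (classes 1-67). Large objects need a span made to measure and are allocated in class 0 mspans.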
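
The three categories can be expressed as a small decision function. This is a sketch: category is a hypothetical helper, and the two thresholds are taken from the ranges listed above.

```go
package main

import "fmt"

const (
	maxTinySize  = 16       // tiny objects are smaller than 16 bytes
	maxSmallSize = 32 << 10 // small objects are at most 32 KB
)

// category reports which allocation path an object of the given size
// would take. Objects containing pointers are never treated as tiny.
func category(size int, hasPointers bool) string {
	switch {
	case !hasPointers && size < maxTinySize:
		return "tiny"
	case size <= maxSmallSize:
		return "small"
	default:
		return "large"
	}
}

func main() {
	fmt.Println(category(8, false), category(8, true), category(40000, false))
}
```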

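3.2 Tiny object allocation

The allocator takes the local class 2 mspan (16-byte slots) from the mcache and packs several tiny objects together into one 16-byte slot.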

func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
    // some sanity checks
    delayedZeroing := false
    if size <= maxSmallSize {
        if noscan && size < maxTinySize {        // tiny object path
            off := c.tinyoffset
            if size&7 == 0 {
                off = alignUp(off, 8)
            } else if goarch.PtrSize == 4 && size == 12 {
                off = alignUp(off, 8)
            } else if size&3 == 0 {
                off = alignUp(off, 4)
            } else if size&1 == 0 {
                off = alignUp(off, 2)
            }
            if off+size <= maxTinySize && c.tiny != 0 {
                // The object fits into existing tiny block.
                x = unsafe.Pointer(c.tiny + off)
                c.tinyoffset = off + size
                c.tinyAllocs++
                mp.mallocing = 0
                releasem(mp)
                return x
            }
            span = c.alloc[tinySpanClass]            // take the class 2 span (16-byte slots)
            v := nextFreeFast(span)
            if v == 0 {
                v, span, shouldhelpgc = c.nextFree(tinySpanClass)    // slow path: find the address v
            }
            x = unsafe.Pointer(v)
            (*[2]uint64)(x)[0] = 0
            (*[2]uint64)(x)[1] = 0
            if !raceenabled && (size < c.tinyoffset || c.tiny == 0) {
                // Note: disabled when race detector is on, see comment near end of this function.
                c.tiny = uintptr(x)
                c.tinyoffset = size
            }
            size = maxTinySize
        } else {
            // ... small object path
        }
    } else {
        // large object path
    }
    return x
}

3.3 Small object allocation

func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
    // some sanity checks
    delayedZeroing := false
    if size <= maxSmallSize {
        if noscan && size < maxTinySize {
            // tiny object path
        } else {
            var sizeclass uint8            // look up the span class needed for this size in a table
            if size <= smallSizeMax-8 {
                sizeclass = size_to_class8[divRoundUp(size, smallSizeDiv)]
            } else {
                sizeclass = size_to_class128[divRoundUp(size-smallSizeMax, largeSizeDiv)]
            }
            size = uintptr(class_to_size[sizeclass])
            spc := makeSpanClass(sizeclass, noscan)
            span = c.alloc[spc]        // c is the local mcache obtained earlier; pick the span of the right class
            v := nextFreeFast(span)
            if v == 0 {
                v, span, shouldhelpgc = c.nextFree(spc)        // span is full: refill the mcache
            }
            x = unsafe.Pointer(v)
            if needzero && span.needzero != 0 {
                memclrNoHeapPointers(unsafe.Pointer(v), size)
            }
        }
    } else {
        // large object path
    }
    return x
}
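
The fast path in nextFreeFast relies on a bitmap in which each set bit marks a free slot. The sketch below is a simplified model of that idea, not the runtime's code: the span type here is hypothetical, it returns slot indexes instead of addresses, and it omits the cache-refill checks the real function performs.

```go
package main

import (
	"fmt"
	"math/bits"
)

type span struct {
	allocCache uint64 // bit i set means slot freeindex+i is free
	freeindex  int    // slot index that bit 0 of allocCache refers to
	nelems     int    // number of slots in the span
}

// nextFreeFast returns the index of the next free slot, or -1 when the
// cached bitmap has no usable free slot left.
func (s *span) nextFreeFast() int {
	bit := bits.TrailingZeros64(s.allocCache)
	if bit == 64 {
		return -1 // no free bit in the cache
	}
	idx := s.freeindex + bit
	if idx >= s.nelems {
		return -1 // free bit lies beyond the end of the span
	}
	s.allocCache >>= uint(bit + 1)
	s.freeindex = idx + 1
	return idx
}

func main() {
	// Slots 0 and 2 are already in use; the other six are free.
	s := &span{allocCache: ^uint64(0b101), nelems: 8}
	for i := 0; i < 7; i++ {
		fmt.Print(s.nextFreeFast(), " ")
	}
	fmt.Println()
}
```

A single trailing-zero count finds the next free slot, which is why the common case of a small allocation needs no loop and no lock.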

3.4 Large object allocation

func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
    // some sanity checks
    if size <= maxSmallSize {
        // tiny and small object path
    } else {
        shouldhelpgc = true
        span = c.allocLarge(size, noscan)
        span.freeindex = 1
        span.allocCount = 1
        size = span.elemsize
        x = unsafe.Pointer(span.base())
        if needzero && span.needzero != 0 {
            if noscan {
                delayedZeroing = true
            } else {
                memclrNoHeapPointers(x, size)
                
            }
        }
    }

    var scanSize uintptr
    if !noscan {
        heapBitsSetType(uintptr(x), size, dataSize, typ)
        if dataSize > typ.size {
            if typ.ptrdata != 0 {
                scanSize = dataSize - typ.size + typ.ptrdata
            }
        } else {
            scanSize = typ.ptrdata
        }
        c.scanAlloc += scanSize
    }

    publicationBarrier()
    if gcphase != _GCoff {
        gcmarknewobject(span, uintptr(x), size, scanSize)
    }
    if raceenabled {
        racemalloc(x, size)
    }
    if msanenabled {
        msanmalloc(x, size)
    }
    if asanenabled {
        rzBeg := unsafe.Add(x, userSize)
        asanpoison(rzBeg, size-userSize)
        asanunpoison(x, userSize)
    }
    if rate := MemProfileRate; rate > 0 {
        // Note cache c only valid while m acquired; see #47302
        if rate != 1 && size < c.nextSample {
            c.nextSample -= size
        } else {
            profilealloc(mp, x, size)
        }
    }
    mp.mallocing = 0
    releasem(mp)
    if delayedZeroing {
        if !noscan {
            throw("delayed zeroing on data that may contain pointers")
        }
        memclrNoHeapPointersChunked(size, x) // This is a possible preemption point: see #47302
    }
    if debug.malloc {
        if debug.allocfreetrace != 0 {
            tracealloc(x, size, typ)
        }

        if inittrace.active && inittrace.id == getg().goid {
            inittrace.bytes += uint64(size)
        }
    }
    if assistG != nil {
        assistG.gcAssistBytes -= int64(size - dataSize)
    }
    if shouldhelpgc {
        if t := (gcTrigger{kind: gcTriggerHeap}); t.test() {
            gcStart(t)
        }
    }
    if raceenabled && noscan && dataSize < maxTinySize {
        x = add(x, size-dataSize)
    }
    return x
}

This path calls allocLarge().

allocLarge() carves a new mspan directly out of a heapArena.

func (c *mcache) allocLarge(size uintptr, noscan bool) *mspan {
    if size+_PageSize < size {
        throw("out of memory")
    }
    npages := size >> _PageShift
    if size&_PageMask != 0 {
        npages++
    }

    deductSweepCredit(npages*_PageSize, npages)

    spc := makeSpanClass(0, noscan)        // build a class 0 span class
    s := mheap_.alloc(npages, spc)
    if s == nil {
        throw("out of memory")
    }
    stats := memstats.heapStats.acquire()
    atomic.Xadduintptr(&stats.largeAlloc, npages*pageSize)
    atomic.Xadduintptr(&stats.largeAllocCount, 1)
    memstats.heapStats.release()

    // Update heapLive.
    gcController.update(int64(s.npages*pageSize), 0)

    mheap_.central[spc].mcentral.fullSwept(mheap_.sweepgen).push(s)
    s.limit = s.base() + size
    heapBitsForAddr(s.base()).initSpan(s)
    return s
}
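
The first few lines of allocLarge round the requested size up to a whole number of pages. A standalone sketch of that arithmetic follows; pagesFor is a hypothetical helper, and the 8 KB page size is an assumption about the runtime's page size.

```go
package main

import "fmt"

const (
	pageShift = 13             // assumed: 2^13 = 8 KB runtime pages
	pageSize  = 1 << pageShift // bytes per page
	pageMask  = pageSize - 1
)

// pagesFor returns how many whole pages are needed to hold size bytes.
func pagesFor(size uintptr) uintptr {
	npages := size >> pageShift
	if size&pageMask != 0 {
		npages++ // a partial page still occupies a full page
	}
	return npages
}

func main() {
	fmt.Println(pagesFor(32769), pagesFor(10*pageSize))
}
```

An object just over 32 KB therefore takes five pages (40 KB), so large objects can waste up to almost one page each.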

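3.5 Refilling the mcache

An mcache holds only one mspan for each span class. When that mspan is full, the mcache goes to the mcentral of the same class to obtain a replacement mspan with free slots.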
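
The benefit of this arrangement is that the shared lock is taken only on refill, not on every allocation. The toy model below tries to show that; the types span, central and cache, the function run, and the four-slot span size are all hypothetical and far simpler than the runtime's structures.

```go
package main

import (
	"fmt"
	"sync"
)

const slotsPerSpan = 4 // deliberately tiny so refills happen often

type span struct{ used int }

func (s *span) full() bool { return s.used == slotsPerSpan }

// central is shared by every cache, so it needs a lock.
type central struct {
	mu     sync.Mutex
	handed int // number of spans handed out so far
}

func (c *central) cacheSpan() *span {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.handed++
	return &span{}
}

// cache belongs to a single worker and is used without locking.
type cache struct {
	alloc *span
	src   *central
}

func (c *cache) allocate() {
	if c.alloc == nil || c.alloc.full() {
		c.alloc = c.src.cacheSpan() // slow path: refill under the lock
	}
	c.alloc.used++ // fast path: no lock needed
}

// run performs perWorker allocations on each of workers goroutines and
// returns how many times the shared central was visited.
func run(workers, perWorker int) int {
	src := &central{}
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c := &cache{src: src}
			for j := 0; j < perWorker; j++ {
				c.allocate()
			}
		}()
	}
	wg.Wait()
	return src.handed
}

func main() {
	fmt.Println(run(4, 10)) // 40 allocations, 12 lock acquisitions
}
```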

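3.6 Growing mcentral

When an mcentral has no spans left to hand out, it asks the heap for a new one; if the heap has no free pages either, new memory is requested from the operating system and a heapArena is added.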

Go stacks and heap