标签搜索

golang Map

BeforeIeave
2022-07-27 / 1 评论 / 210 阅读 / 正在检测是否收录...

1 HashMap的基本方法

1.1 开放寻址法(插入值为例)

首先,hashmap的底层是一个数组,每一个数组元素存储一个KV键值对。
将KV录入hash函数生成一串数据,再将此数据对数组的长度进行取模运算,得到应当存入的数组的位置(下标),倘若目标地址已经被占用了(hash碰撞),则向后依次遍历直到找到空闲的位置再存值。
读取时方法相似。

1.2 拉链法(插入值为例)

更改底层数组的存储方式,将底层的数组(hash槽)更改为储存KV键值对链表(hash桶)的指针(有些时候可以不用指针)。需要插入的时候,将链表下延即可。
读取时,一样找到相关的位置,遍历下方的链表寻值。

2 GO语言HashMap源码阅读

在runtime包的map.go下。

2.1 HashMap底层数据结构体

//map的头部指针
type hmap struct{
    count         int                //此表存的键值对的数量.
    flags        uint8
    B            uint8            //桶个数的关于2的对数 有8个桶->B的值为3
    noverflow    uint16
    hash0        uint32            //生成hash值的种子.用于hash算法
    
    buckets        unsafe.Pointer    //使用拉链发储存KV键值对.值是2^B.指向bmap数组
    oldbuckets    unsafe.Pointer
    nevacuate    uintptr
    
    extra        *mapextra
}

//桶的数据结构
type bmap struct{
    //tophash 存的是key对应的hash值的头一个字节(8位)
    tophash [bucketCnt]uint8    //bucketCnt 通过位运算得出值为8
    //其后跟有bucketCnt个key和相应数量的value
    
    //这里不写相应的key和value的存储代码是为了适应任意值类型的KV.
    //不用空接口的原因:会多一层结构体,更加麻烦
    
    //再之后还跟有overflow的指针(是否溢出8个键值对,是则指向新的桶)
}

注意:bmap中的tophash、keys、elems三个字段是分别组成三个数组的,只是具有下标对应的关系。
Map底层.png

2.2 Map的初始化

2.2.1 使用make进行初始化

观察编译过程使用终端命令

go build -gcflags -S ./文件夹名/函数名.go

发现最后调用的是tuntime.makmap的方法。
makemap的源码:(位置:Go SDK 1.18/src/runtime/map.go)

func makemap(t *maptype, hint int, h *hmap) *hmap {
    mem, overflow := math.MulUintptr(uintptr(hint), t.bucket.size)
    if overflow || mem > maxAlloc {
        hint = 0
    }

    if h == nil {
        h = new(hmap)                //首先创建了一个hmap
    }
    h.hash0 = fastrand()

    B := uint8(0)
    for overLoadFactor(hint, B) {    //根据传入的提示算出桶的数量
        B++
    }
    h.B = B

    if h.B != 0 {
        var nextOverflow *bmap
        h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)    //创建桶数组
        //同时创建出一些溢出桶备用
        if nextOverflow != nil {
            h.extra = new(mapextra)                //溢出桶的底层是mapextra
            h.extra.nextOverflow = nextOverflow    //nextOverflow指向所有的溢出桶
        }
    }

    return h
}

map底层2.png

2.2.2 使用字面量进行初始化

hash := map[string]int{
    "1": 2,
    "3": 4,
    "5": 6,
}

先使用make创建空间,当元素小于25个的时候,底层直接赋值,若大于25个元素则底层将转化为循环赋值。(分别生成k数组和v数组,再使用for循环)

2.3 Map的查询

2.3.1 寻值的过程

首先将需要查询的key值和hash0(哈希种子)录入hasher函数生成hasher二进制字符串,根据B的值取最后B位数字,表示的十进制数则为对应的桶号

Map的查询过程.png

此时tophash的值是此字符串的前八位。再根据tophash的值,在数组中遍历,再次验证key的正确性,倘若不是(hash碰撞),则再次向下搜寻相同tophash的值的位置,倘若还是没有,则判断overflow是否指向了溢出桶,以此往复。

2.3.2 插入(更改)值的过程

和寻找值的过程类似。

2.4 Map的扩容机制

当溢出桶的数量太多了,会导致性能严重下降,此时需要通过扩容优化。

2.4.1 扩容节点的判断

func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
    
//...
    
    if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
        hashGrow(t, h)
        goto again // Growing the table invalidates everything, so try again
    }
    
//...
    
}

即:如果map不再扩容当中,或者判断装载因子(最多平均一个hash槽6.5个key)超出范围,或者有太多的溢出桶,则实行扩容:hashGrow(t, h)

2.4.2 扩容类型

等量扩容

可以通俗地理解为数据不多,但是溢出桶过多(hash碰撞,或者原本的数据大量被删除)

翻倍扩容

可以通俗地理解为数据过多。

2.4.3 扩容操作

1.前置操作

1.创建一组新的桶
2.将头部指针oldbuckets指向原有的桶数组
3.buckets指针指向新的桶数组
4.将map标记为扩容状态

func hashGrow(t *maptype, h *hmap) {
    //计算B的新的溢出值
    bigger := uint8(1)
    if !overLoadFactor(h.count+1, h.B) {
        bigger = 0
        h.flags |= sameSizeGrow
    }
    //指针的替换更新
    oldbuckets := h.buckets
    //创建新的桶数组
    newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)
    
    //状态的更新
    flags := h.flags &^ (iterator | oldIterator)
    if h.flags&iterator != 0 {
        flags |= oldIterator
    }
    // commit the grow (atomic wrt gc)
    h.B += bigger
    //这里B的值变了,原本桶的数据会被分配到新的几个桶中
    //(原本的B增加后会变成新的数字,则将原本的桶内数据分配到新的几个桶中)
    h.flags = flags
    h.oldbuckets = oldbuckets
    h.buckets = newbuckets
    h.nevacuate = 0
    h.noverflow = 0

    //更新extra  新桶的溢出桶的信息更新
    if h.extra != nil && h.extra.overflow != nil {
        // Promote current overflow buckets to the old generation.
        if h.extra.oldoverflow != nil {
            throw("oldoverflow is not nil")
        }
        h.extra.oldoverflow = h.extra.overflow
        h.extra.overflow = nil
    }
    if nextOverflow != nil {
        if h.extra == nil {
            h.extra = new(mapextra)
        }
        h.extra.nextOverflow = nextOverflow
    }

    // the actual copying of the hash table data is done incrementally
    // by growWork() and evacuate().
}

2.将所有数据从旧桶驱逐到新桶中

Go采用渐进式驱逐:需要操作某一个旧桶的时候将数据迁移到新桶中
注意:读取的时候不进行驱逐只判断需要读取新桶还是旧桶。

func evacuate(t *maptype, h *hmap, oldbucket uintptr) {
    b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
    newbit := h.noldbuckets()
    if !evacuated(b) {
        // TODO: reuse overflow buckets instead of using new ones, if there
        // is no iterator using the old buckets.  (If !oldIterator.)

        // xy contains the x and y (low and high) evacuation destinations.
        var xy [2]evacDst
        x := &xy[0]
        x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.bucketsize)))
        x.k = add(unsafe.Pointer(x.b), dataOffset)
        x.e = add(x.k, bucketCnt*uintptr(t.keysize))

        if !h.sameSizeGrow() {
            // Only calculate y pointers if we're growing bigger.
            // Otherwise GC can see bad pointers.
            y := &xy[1]
            y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.bucketsize)))
            y.k = add(unsafe.Pointer(y.b), dataOffset)
            y.e = add(y.k, bucketCnt*uintptr(t.keysize))
        }

        for ; b != nil; b = b.overflow(t) {
            k := add(unsafe.Pointer(b), dataOffset)
            e := add(k, bucketCnt*uintptr(t.keysize))
            for i := 0; i < bucketCnt; i, k, e = i+1, add(k, uintptr(t.keysize)), add(e, uintptr(t.elemsize)) {
                top := b.tophash[i]
                if isEmpty(top) {
                    b.tophash[i] = evacuatedEmpty
                    continue
                }
                if top < minTopHash {
                    throw("bad map state")
                }
                k2 := k
                if t.indirectkey() {
                    k2 = *((*unsafe.Pointer)(k2))
                }
                var useY uint8
                if !h.sameSizeGrow() {
                    // Compute hash to make our evacuation decision (whether we need
                    // to send this key/elem to bucket x or bucket y).
                    hash := t.hasher(k2, uintptr(h.hash0))
                    if h.flags&iterator != 0 && !t.reflexivekey() && !t.key.equal(k2, k2) {
                        // If key != key (NaNs), then the hash could be (and probably
                        // will be) entirely different from the old hash. Moreover,
                        // it isn't reproducible. Reproducibility is required in the
                        // presence of iterators, as our evacuation decision must
                        // match whatever decision the iterator made.
                        // Fortunately, we have the freedom to send these keys either
                        // way. Also, tophash is meaningless for these kinds of keys.
                        // We let the low bit of tophash drive the evacuation decision.
                        // We recompute a new random tophash for the next level so
                        // these keys will get evenly distributed across all buckets
                        // after multiple grows.
                        useY = top & 1
                        top = tophash(hash)
                    } else {
                        if hash&newbit != 0 {
                            useY = 1
                        }
                    }
                }

                if evacuatedX+1 != evacuatedY || evacuatedX^1 != evacuatedY {
                    throw("bad evacuatedN")
                }

                b.tophash[i] = evacuatedX + useY // evacuatedX + 1 == evacuatedY
                dst := &xy[useY]                 // evacuation destination

                if dst.i == bucketCnt {
                    dst.b = h.newoverflow(t, dst.b)
                    dst.i = 0
                    dst.k = add(unsafe.Pointer(dst.b), dataOffset)
                    dst.e = add(dst.k, bucketCnt*uintptr(t.keysize))
                }
                dst.b.tophash[dst.i&(bucketCnt-1)] = top // mask dst.i as an optimization, to avoid a bounds check
                if t.indirectkey() {
                    *(*unsafe.Pointer)(dst.k) = k2 // copy pointer
                } else {
                    typedmemmove(t.key, dst.k, k) // copy elem
                }
                if t.indirectelem() {
                    *(*unsafe.Pointer)(dst.e) = *(*unsafe.Pointer)(e)
                } else {
                    typedmemmove(t.elem, dst.e, e)
                }
                dst.i++
                // These updates might push these pointers past the end of the
                // key or elem arrays.  That's ok, as we have the overflow pointer
                // at the end of the bucket to protect against pointing past the
                // end of the bucket.
                dst.k = add(dst.k, uintptr(t.keysize))
                dst.e = add(dst.e, uintptr(t.elemsize))
            }
        }
        // Unlink the overflow buckets & clear key/elem to help GC.
        if h.flags&oldIterator == 0 && t.bucket.ptrdata != 0 {
            b := add(h.oldbuckets, oldbucket*uintptr(t.bucketsize))
            // Preserve b.tophash because the evacuation
            // state is maintained there.
            ptr := add(b, dataOffset)
            n := uintptr(t.bucketsize) - dataOffset
            memclrHasPointers(ptr, n)
        }
    }

    if oldbucket == h.nevacuate {
        advanceEvacuationMark(h, t, newbit)
    }
}

3.回收oldbuckets

在数据被驱逐完成后,将回收oldbuckets的内存。

2.4.4 注意

扩容可能并不是增加桶的数量而是整理桶内的数据。

3 Map的并发问题

在使用map的时候,经常处于高并发的环境中,若使用mutex包增加锁,则效率大打折扣,所以推荐使用sync.Map。

3.1 sync.Map源码阅读

type Map struct {
    mu Mutex

    // read contains the portion of the map's contents that are safe for
    // concurrent access (with or without mu held).
    //
    // The read field itself is always safe to load, but must only be stored with
    // mu held.
    //
    // Entries stored in read may be updated concurrently without mu, but updating
    // a previously-expunged entry requires that the entry be copied to the dirty
    // map and unexpunged with mu held.
    read atomic.Value // readOnly

    // dirty contains the portion of the map's contents that require mu to be
    // held. To ensure that the dirty map can be promoted to the read map quickly,
    // it also includes all of the non-expunged entries in the read map.
    //
    // Expunged entries are not stored in the dirty map. An expunged entry in the
    // clean map must be unexpunged and added to the dirty map before a new value
    // can be stored to it.
    //
    // If the dirty map is nil, the next write to the map will initialize it by
    // making a shallow copy of the clean map, omitting stale entries.
    dirty map[any]*entry

    // misses counts the number of loads since the read map was last updated that
    // needed to lock mu to determine whether the key was present.
    //
    // Once enough misses have occurred to cover the cost of copying the dirty
    // map, the dirty map will be promoted to the read map (in the unamended
    // state) and the next store to the map will make a new dirty copy.
    misses int
}

// readOnly is an immutable struct stored atomically in the Map.read field.
type readOnly struct {
    m       map[any]*entry
    amended bool // true if the dirty map contains some key not in m.
}
...
type entry struct {
    // p points to the interface{} value stored for the entry.
    //
    // If p == nil, the entry has been deleted, and either m.dirty == nil or
    // m.dirty[key] is e.
    //
    // If p == expunged, the entry has been deleted, m.dirty != nil, and the entry
    // is missing from m.dirty.
    //
    // Otherwise, the entry is valid and recorded in m.read.m[key] and, if m.dirty
    // != nil, in m.dirty[key].
    //
    // An entry can be deleted by atomic replacement with nil: when m.dirty is
    // next created, it will atomically replace nil with expunged and leave
    // m.dirty[key] unset.
    //
    // An entry's associated value can be updated by atomic replacement, provided
    // p != expunged. If p == expunged, an entry's associated value can be updated
    // only after first setting m.dirty[key] = e so that lookups using the dirty
    // map find the entry.
    p unsafe.Pointer // *interface{}
}

发现其包含四个字段(使用了两个map),简记为:

type Map struct{
    mu        //原子锁
    read    //key(interface{})和value(unsafe.Pointer)都可以是任意值的map
    dirty    //和read差不多也是万能指向的map
    misses    //未命中
}

syncMap.png

3.2 运行逻辑解释

首先 read结构体用于高并发的读写,dirty结构体用于储存追加的字段。
读写默认走read结构体。如果amended=true则取dirty结构体中的信息,此时misses+=1。
追加先进入read结构体,没有找到则返回到mu锁上dirty,再追加,然后read结构体中amended改为true,表示read结构体内的信息不完整。
当misses的值等于dirty的长度时,将dirty的内容给read,下方dirty变为nil,ammended和miss初始为false和0,若后续仍要追加信息,则再次重建dirty结构体。

3.3 更难的删除

3.3.1 正常删除

在read结构体中可以找到,则直接找到相关的key的值,再将指向value的Pointer置为空。

3.3.2 追加后删除

指dirty内的信息还没有给予read的时候。(上下信息不一致的时间段)
首先上锁后再dirty中找到相关的key,然后将指向value的Pointer置为空,在提升(diety变为nil)的时候,将这个key对应的Pointer变为expunged(删除)。则在重建dirty的时候不再重建Pointer为expunged的键值对,同时再次删除或者查询的时候若Pointer的值为此标识,认为其不存在。

3.4 总结

sync.Map使用了两个map,实质上是分离了扩容问题
将不会引发扩容的操作(查改)归于read(map)
将可能引发扩容的操作(插入)归于dirty(map)

0

评论

博主关闭了所有页面的评论