1 HashMap Fundamentals
1.1 Open Addressing (insertion example)
At the bottom, a hashmap is an array in which each element stores one key/value (KV) pair.
To insert, feed the key into the hash function to produce a hash value, then take that value modulo the array length to get the target slot (index). If the target slot is already occupied (a hash collision), probe forward through the array until a free slot is found, and store the pair there.
Lookup works the same way.
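The insert/lookup procedure above can be sketched in a few dozen lines. This is a toy linear-probing table, not Go's real map: the slot type, the FNV-style hash, and the fixed table size are all illustrative assumptions.

```go
package main

import "fmt"

// slot holds one key/value pair; used marks occupancy.
type slot struct {
	key  string
	val  int
	used bool
}

type openMap struct{ slots []slot }

// fnv32 is a toy FNV-1a hash, standing in for the runtime's hasher.
func fnv32(s string) uint32 {
	h := uint32(2166136261)
	for i := 0; i < len(s); i++ {
		h = (h ^ uint32(s[i])) * 16777619
	}
	return h
}

// set hashes the key, takes it modulo the array length, and probes
// forward past occupied slots (collisions) until it finds a free one
// or the key itself.
func (m *openMap) set(key string, val int) {
	start := int(fnv32(key)) % len(m.slots)
	for j := 0; j < len(m.slots); j++ {
		p := (start + j) % len(m.slots)
		if !m.slots[p].used || m.slots[p].key == key {
			m.slots[p] = slot{key, val, true}
			return
		}
	}
	panic("table full") // a real map would grow instead
}

// get probes the same sequence; an unused slot means the key is absent.
func (m *openMap) get(key string) (int, bool) {
	start := int(fnv32(key)) % len(m.slots)
	for j := 0; j < len(m.slots); j++ {
		p := (start + j) % len(m.slots)
		if !m.slots[p].used {
			return 0, false
		}
		if m.slots[p].key == key {
			return m.slots[p].val, true
		}
	}
	return 0, false
}

func main() {
	m := &openMap{slots: make([]slot, 8)}
	m.set("a", 1)
	m.set("b", 2)
	v, ok := m.get("a")
	fmt.Println(v, ok) // 1 true
}
```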
1.2 Chaining (insertion example)
Chaining changes what the underlying array stores: each array element (a hash slot) becomes a pointer to a linked list of KV pairs (a hash bucket; in some layouts the pointer can be avoided). To insert, simply extend the list downward.
For lookup, locate the slot the same way and walk the list below it to find the value.
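A minimal chaining sketch, under the same assumptions as before (toy hash, string keys, no growth):

```go
package main

import "fmt"

// node is one entry in a bucket's linked list.
type node struct {
	key  string
	val  int
	next *node
}

// chainMap: each slot points to the head of a chain (the "hash bucket").
type chainMap struct{ slots []*node }

// toyHash is an FNV-1a-style stand-in for the runtime's hasher.
func toyHash(s string) uint32 {
	h := uint32(2166136261)
	for i := 0; i < len(s); i++ {
		h = (h ^ uint32(s[i])) * 16777619
	}
	return h
}

// set walks the chain for an existing key, otherwise prepends a node
// (extending the list is how collisions are absorbed).
func (m *chainMap) set(key string, val int) {
	i := int(toyHash(key)) % len(m.slots)
	for n := m.slots[i]; n != nil; n = n.next {
		if n.key == key {
			n.val = val
			return
		}
	}
	m.slots[i] = &node{key, val, m.slots[i]}
}

// get finds the slot the same way and walks the list below it.
func (m *chainMap) get(key string) (int, bool) {
	i := int(toyHash(key)) % len(m.slots)
	for n := m.slots[i]; n != nil; n = n.next {
		if n.key == key {
			return n.val, true
		}
	}
	return 0, false
}

func main() {
	m := &chainMap{slots: make([]*node, 4)}
	m.set("x", 10)
	m.set("y", 20)
	v, ok := m.get("y")
	fmt.Println(v, ok) // 20 true
}
```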
2 Reading the Go HashMap Source
2.1 The Underlying HashMap Structs
// hmap is the map header
type hmap struct {
	count      int            // number of key/value pairs stored in this map
	flags      uint8
	B          uint8          // log2 of the bucket count: 8 buckets -> B == 3
	noverflow  uint16
	hash0      uint32         // hash seed, fed into the hash function
	buckets    unsafe.Pointer // chaining storage for the KV pairs; there are 2^B buckets; points to an array of bmap
	oldbuckets unsafe.Pointer
	nevacuate  uintptr
	extra      *mapextra
}
// bmap is the bucket data structure
type bmap struct {
	// tophash stores the first byte (8 bits) of each key's hash value
	tophash [bucketCnt]uint8 // bucketCnt is derived via bit operations; its value is 8
	// The bucket is followed in memory by bucketCnt keys and the same number of values.
	// The key/value storage is not declared here so the layout can adapt to arbitrary KV types.
	// Empty interfaces are not used because they would add an extra layer of indirection.
	// After the values comes an overflow pointer (if more than 8 KV pairs land here, it points to a new bucket).
}
Note: the tophash, keys, and elems fields of a bmap form three separate arrays; entries correspond to each other only by index.
2.2 Map Initialization
2.2.1 Initialization with make
To observe the compilation, run:
go build -gcflags -S ./<dir>/<file>.go
The assembly output shows that the final call is to runtime.makemap.
Source of makemap (location: Go SDK 1.18/src/runtime/map.go):
func makemap(t *maptype, hint int, h *hmap) *hmap {
	mem, overflow := math.MulUintptr(uintptr(hint), t.bucket.size)
	if overflow || mem > maxAlloc {
		hint = 0
	}
	if h == nil {
		h = new(hmap) // first, allocate an hmap
	}
	h.hash0 = fastrand()
	B := uint8(0)
	for overLoadFactor(hint, B) { // compute the bucket count from the size hint
		B++
	}
	h.B = B
	if h.B != 0 {
		var nextOverflow *bmap
		h.buckets, nextOverflow = makeBucketArray(t, h.B, nil) // create the bucket array
		// some overflow buckets are pre-allocated here as spares
		if nextOverflow != nil {
			h.extra = new(mapextra)             // the overflow buckets hang off mapextra
			h.extra.nextOverflow = nextOverflow // nextOverflow points at the spare overflow buckets
		}
	}
	return h
}
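The B-selection loop can be watched in isolation. The sketch below mirrors the runtime's overLoadFactor check (load factor 6.5, eight slots per bucket); the exact integer-division order differs slightly from runtime/map.go, but the result is the same, and it shows how a hint of 10 lands on B = 1, i.e. two buckets:

```go
package main

import "fmt"

const bucketCnt = 8 // slots per bucket

// overLoadFactor mirrors the runtime check: true when count items
// would exceed an average of 6.5 per bucket across 2^B buckets.
func overLoadFactor(count int, B uint8) bool {
	return count > bucketCnt && uintptr(count) > 13*(uintptr(1)<<B)/2
}

func main() {
	hint := 10
	B := uint8(0)
	for overLoadFactor(hint, B) { // same loop as makemap
		B++
	}
	fmt.Println(B) // 1: make(map[string]int, 10) starts with 2^1 = 2 buckets
}
```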
2.2.2 Initialization with a Literal
hash := map[string]int{
"1": 2,
"3": 4,
"5": 6,
}
make is called first to allocate space. With a small literal (up to about 25 elements) the compiler assigns the entries directly; beyond that it switches to loop assignment: it generates a key array and a value array, then fills the map with a for loop.
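The loop form can be sketched as follows. The short arrays here are illustrative; they show the shape a large literal is lowered into, not the compiler's exact output:

```go
package main

import "fmt"

func main() {
	// For a large literal, the compiler effectively emits two
	// parallel arrays and a fill loop, roughly like this:
	keys := [...]string{"1", "3", "5"}
	vals := [...]int{2, 4, 6}

	hash := make(map[string]int, len(keys))
	for i := range keys {
		hash[keys[i]] = vals[i]
	}
	fmt.Println(hash["3"]) // 4
}
```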
2.3 Map Lookup
2.3.1 The Lookup Path
First, the key being looked up and hash0 (the hash seed) are fed into the hasher function, which produces a binary hash value. The low B bits of that value, read as a number, give the bucket number.
The tophash is the high 8 bits of the same hash value. The bucket's tophash array is scanned for a matching byte, and on a match the full key is compared to confirm it. If the key does not match (a hash collision), the scan continues looking for the same tophash value further along; if the bucket is exhausted, the overflow pointer is checked, and if it points to an overflow bucket the search repeats there.
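The bit-slicing step can be shown directly. The helper below is hypothetical, but the two extractions mirror what the runtime does (the runtime additionally shifts small tophash values up by minTopHash to reserve markers; omitted here):

```go
package main

import "fmt"

// bucketAndTop splits one 64-bit hash: the low B bits pick the bucket,
// the high 8 bits become the tophash byte.
func bucketAndTop(hash uint64, B uint8) (bucket uint64, top uint8) {
	bucket = hash & (1<<B - 1) // low B bits: bucket number in [0, 2^B)
	top = uint8(hash >> 56)    // high 8 bits: stored in the tophash array
	return
}

func main() {
	h := uint64(0xA1B2C3D4E5F60789)
	bucket, top := bucketAndTop(h, 5) // B = 5 → 32 buckets
	fmt.Printf("bucket=%d tophash=0x%X\n", bucket, top)
}
```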
2.3.2 Inserting (Updating) a Value
2.4 Map Growth
When there are too many overflow buckets, performance degrades badly; at that point the map must grow to recover.
2.4.1 When Growth Is Triggered
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
	//...
	if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
		hashGrow(t, h)
		goto again // Growing the table invalidates everything, so try again
	}
	//...
}
In other words: if the map is not already growing, and either the load factor is exceeded (on average at most 6.5 keys per hash slot) or there are too many overflow buckets, growth is started via hashGrow(t, h).
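The overflow-bucket trigger is worth seeing on its own. The sketch below mirrors the runtime's tooManyOverflowBuckets: roughly, grow once there are about as many overflow buckets as regular buckets, with the threshold capped at 2^15 so very large maps can still trigger it:

```go
package main

import "fmt"

// tooManyOverflowBuckets mirrors runtime/map.go: the threshold is one
// overflow bucket per regular bucket (2^B), capped at 2^15.
func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
	if B > 15 {
		B = 15
	}
	return noverflow >= uint16(1)<<(B&15)
}

func main() {
	fmt.Println(tooManyOverflowBuckets(7, 3)) // false: 7 < 2^3
	fmt.Println(tooManyOverflowBuckets(8, 3)) // true:  8 >= 2^3
}
```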
2.4.2 Growth Types
Same-size growth
Intuitively: the data is not large, but there are too many overflow buckets (heavy hash collisions, or most of the original data has been deleted).
Doubling growth
Triggered when the load factor is exceeded: the bucket count doubles (B increases by 1).
2.4.3 Growth Steps
1. Preparation
1. Create a new set of buckets
2. Point the header's oldbuckets at the existing bucket array
3. Point buckets at the new bucket array
4. Mark the map as growing
func hashGrow(t *maptype, h *hmap) {
	// compute how much B grows by
	bigger := uint8(1)
	if !overLoadFactor(h.count+1, h.B) {
		bigger = 0
		h.flags |= sameSizeGrow
	}
	// swap the pointers
	oldbuckets := h.buckets
	// create the new bucket array
	newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)
	// update the state flags
	flags := h.flags &^ (iterator | oldIterator)
	if h.flags&iterator != 0 {
		flags |= oldIterator
	}
	// commit the grow (atomic wrt gc)
	h.B += bigger
	// B has changed here, so the data in each old bucket will be
	// redistributed across several new buckets
	h.flags = flags
	h.oldbuckets = oldbuckets
	h.buckets = newbuckets
	h.nevacuate = 0
	h.noverflow = 0
	// update extra: move the overflow-bucket bookkeeping to the new generation
	if h.extra != nil && h.extra.overflow != nil {
		// Promote current overflow buckets to the old generation.
		if h.extra.oldoverflow != nil {
			throw("oldoverflow is not nil")
		}
		h.extra.oldoverflow = h.extra.overflow
		h.extra.overflow = nil
	}
	if nextOverflow != nil {
		if h.extra == nil {
			h.extra = new(mapextra)
		}
		h.extra.nextOverflow = nextOverflow
	}
	// the actual copying of the hash table data is done incrementally
	// by growWork() and evacuate().
}
2. Evacuate all data from the old buckets into the new ones
Go evacuates incrementally: whenever an old bucket is written to, its data is migrated into the new buckets.
Note: reads never evacuate; they only decide whether to read from the new buckets or the old ones.
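Which new bucket an old entry lands in can be sketched in a few lines. During a doubling grow, old bucket i splits into new buckets i ("x") and i + oldCount ("y"), decided by the hash bit that just became significant; the helper name here is hypothetical, mimicking the useY decision inside evacuate:

```go
package main

import "fmt"

// evacDest mimics evacuate's useY decision: when the newly significant
// hash bit (newbit == old bucket count) is set, the key moves to the
// upper half of the doubled bucket array.
func evacDest(hash, newbit uint64) uint64 {
	oldbucket := hash & (newbit - 1) // bucket index before the grow
	if hash&newbit != 0 {
		return oldbucket + newbit // "y": upper half of the new array
	}
	return oldbucket // "x": same index as before
}

func main() {
	newbit := uint64(8) // old table had 8 buckets (B was 3)
	fmt.Println(evacDest(0b0101, newbit)) // 5: bit 3 clear, stays put
	fmt.Println(evacDest(0b1101, newbit)) // 13: bit 3 set, moves to 5+8
}
```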
func evacuate(t *maptype, h *hmap, oldbucket uintptr) {
	b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
	newbit := h.noldbuckets()
	if !evacuated(b) {
		// TODO: reuse overflow buckets instead of using new ones, if there
		// is no iterator using the old buckets. (If !oldIterator.)

		// xy contains the x and y (low and high) evacuation destinations.
		var xy [2]evacDst
		x := &xy[0]
		x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.bucketsize)))
		x.k = add(unsafe.Pointer(x.b), dataOffset)
		x.e = add(x.k, bucketCnt*uintptr(t.keysize))

		if !h.sameSizeGrow() {
			// Only calculate y pointers if we're growing bigger.
			// Otherwise GC can see bad pointers.
			y := &xy[1]
			y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.bucketsize)))
			y.k = add(unsafe.Pointer(y.b), dataOffset)
			y.e = add(y.k, bucketCnt*uintptr(t.keysize))
		}

		for ; b != nil; b = b.overflow(t) {
			k := add(unsafe.Pointer(b), dataOffset)
			e := add(k, bucketCnt*uintptr(t.keysize))
			for i := 0; i < bucketCnt; i, k, e = i+1, add(k, uintptr(t.keysize)), add(e, uintptr(t.elemsize)) {
				top := b.tophash[i]
				if isEmpty(top) {
					b.tophash[i] = evacuatedEmpty
					continue
				}
				if top < minTopHash {
					throw("bad map state")
				}
				k2 := k
				if t.indirectkey() {
					k2 = *((*unsafe.Pointer)(k2))
				}
				var useY uint8
				if !h.sameSizeGrow() {
					// Compute hash to make our evacuation decision (whether we need
					// to send this key/elem to bucket x or bucket y).
					hash := t.hasher(k2, uintptr(h.hash0))
					if h.flags&iterator != 0 && !t.reflexivekey() && !t.key.equal(k2, k2) {
						// If key != key (NaNs), then the hash could be (and probably
						// will be) entirely different from the old hash. Moreover,
						// it isn't reproducible. Reproducibility is required in the
						// presence of iterators, as our evacuation decision must
						// match whatever decision the iterator made.
						// Fortunately, we have the freedom to send these keys either
						// way. Also, tophash is meaningless for these kinds of keys.
						// We let the low bit of tophash drive the evacuation decision.
						// We recompute a new random tophash for the next level so
						// these keys will get evenly distributed across all buckets
						// after multiple grows.
						useY = top & 1
						top = tophash(hash)
					} else {
						if hash&newbit != 0 {
							useY = 1
						}
					}
				}

				if evacuatedX+1 != evacuatedY || evacuatedX^1 != evacuatedY {
					throw("bad evacuatedN")
				}

				b.tophash[i] = evacuatedX + useY // evacuatedX + 1 == evacuatedY
				dst := &xy[useY]                 // evacuation destination

				if dst.i == bucketCnt {
					dst.b = h.newoverflow(t, dst.b)
					dst.i = 0
					dst.k = add(unsafe.Pointer(dst.b), dataOffset)
					dst.e = add(dst.k, bucketCnt*uintptr(t.keysize))
				}
				dst.b.tophash[dst.i&(bucketCnt-1)] = top // mask dst.i as an optimization, to avoid a bounds check
				if t.indirectkey() {
					*(*unsafe.Pointer)(dst.k) = k2 // copy pointer
				} else {
					typedmemmove(t.key, dst.k, k) // copy elem
				}
				if t.indirectelem() {
					*(*unsafe.Pointer)(dst.e) = *(*unsafe.Pointer)(e)
				} else {
					typedmemmove(t.elem, dst.e, e)
				}
				dst.i++
				// These updates might push these pointers past the end of the
				// key or elem arrays. That's ok, as we have the overflow pointer
				// at the end of the bucket to protect against pointing past the
				// end of the bucket.
				dst.k = add(dst.k, uintptr(t.keysize))
				dst.e = add(dst.e, uintptr(t.elemsize))
			}
		}
		// Unlink the overflow buckets & clear key/elem to help GC.
		if h.flags&oldIterator == 0 && t.bucket.ptrdata != 0 {
			b := add(h.oldbuckets, oldbucket*uintptr(t.bucketsize))
			// Preserve b.tophash because the evacuation
			// state is maintained there.
			ptr := add(b, dataOffset)
			n := uintptr(t.bucketsize) - dataOffset
			memclrHasPointers(ptr, n)
		}
	}

	if oldbucket == h.nevacuate {
		advanceEvacuationMark(h, t, newbit)
	}
}
3. Reclaim oldbuckets
2.4.4 Notes
3 Maps and Concurrency
Maps are often used under high concurrency. Guarding a plain map with a sync.Mutex costs a lot of throughput, so sync.Map is the recommended alternative.
3.1 Reading the sync.Map Source
type Map struct {
	mu Mutex

	// read contains the portion of the map's contents that are safe for
	// concurrent access (with or without mu held).
	//
	// The read field itself is always safe to load, but must only be stored with
	// mu held.
	//
	// Entries stored in read may be updated concurrently without mu, but updating
	// a previously-expunged entry requires that the entry be copied to the dirty
	// map and unexpunged with mu held.
	read atomic.Value // readOnly

	// dirty contains the portion of the map's contents that require mu to be
	// held. To ensure that the dirty map can be promoted to the read map quickly,
	// it also includes all of the non-expunged entries in the read map.
	//
	// Expunged entries are not stored in the dirty map. An expunged entry in the
	// clean map must be unexpunged and added to the dirty map before a new value
	// can be stored to it.
	//
	// If the dirty map is nil, the next write to the map will initialize it by
	// making a shallow copy of the clean map, omitting stale entries.
	dirty map[any]*entry

	// misses counts the number of loads since the read map was last updated that
	// needed to lock mu to determine whether the key was present.
	//
	// Once enough misses have occurred to cover the cost of copying the dirty
	// map, the dirty map will be promoted to the read map (in the unamended
	// state) and the next store to the map will make a new dirty copy.
	misses int
}

// readOnly is an immutable struct stored atomically in the Map.read field.
type readOnly struct {
	m       map[any]*entry
	amended bool // true if the dirty map contains some key not in m.
}

...

type entry struct {
	// p points to the interface{} value stored for the entry.
	//
	// If p == nil, the entry has been deleted, and either m.dirty == nil or
	// m.dirty[key] is e.
	//
	// If p == expunged, the entry has been deleted, m.dirty != nil, and the entry
	// is missing from m.dirty.
	//
	// Otherwise, the entry is valid and recorded in m.read.m[key] and, if m.dirty
	// != nil, in m.dirty[key].
	//
	// An entry can be deleted by atomic replacement with nil: when m.dirty is
	// next created, it will atomically replace nil with expunged and leave
	// m.dirty[key] unset.
	//
	// An entry's associated value can be updated by atomic replacement, provided
	// p != expunged. If p == expunged, an entry's associated value can be updated
	// only after first setting m.dirty[key] = e so that lookups using the dirty
	// map find the entry.
	p unsafe.Pointer // *interface{}
}
The struct has four fields (and uses two maps under the hood); in shorthand:
type Map struct{
	mu     // mutex guarding dirty
	read   // atomic.Value holding a readOnly: a map whose keys (interface{}) and values (unsafe.Pointer) can point to anything
	dirty  // a similar any-to-any map, accessed only with mu held
	misses // count of lookups that had to fall through past read
}
3.2 How It Works
The read structure serves highly concurrent reads and updates; the dirty structure stores newly appended entries.
Reads and updates go through read by default. If amended == true, the lookup falls through to dirty, and misses is incremented.
An insert first tries read; if the key is not found there, the code takes mu and inserts into dirty, and amended in read is set to true, marking read's view as incomplete.
When misses reaches the length of dirty, the contents of dirty are promoted to read, dirty below becomes nil, and amended and misses are reset to false and 0. If more entries are appended later, dirty is rebuilt.
3.3 The Harder Case: Deletion
3.3.1 Normal Deletion
If the key is found in the read structure, its entry is located directly and the Pointer to its value is set to nil.
3.3.2 Deleting a Freshly Appended Key
This covers the window when dirty's contents have not yet been promoted to read (the two views are inconsistent).
First mu is taken and the key is found in dirty, and the Pointer to its value is set to nil. At promotion time (when dirty becomes nil), the Pointer for such a key is changed to expunged (deleted). When dirty is later rebuilt, entries whose Pointer is expunged are not copied over, and any subsequent delete or lookup that sees this marker treats the key as absent.
3.4 Summary
sync.Map uses two maps, which in essence isolates the growth problem:
operations that cannot trigger growth (lookup, update) go to the read map;
operations that may trigger growth (insertion) go to the dirty map.