LCOV - code coverage report
Current view: top level - src/include/lib - radixtree.h (source / functions) Coverage Total Hit
Test: PostgreSQL 19devel Lines: 90.1 % 765 689
Test Date: 2026-03-03 14:15:12 Functions: 96.5 % 144 139
Legend: Lines:     hit not hit

            Line data    Source code
       1              : /*-------------------------------------------------------------------------
       2              :  *
       3              :  * radixtree.h
       4              :  *      Template for adaptive radix tree.
       5              :  *
       6              :  * A template to generate an "adaptive radix tree", specialized for value
       7              :  * types and for local/shared memory.
       8              :  *
       9              :  * The concept originates from the paper "The Adaptive Radix Tree: ARTful
      10              :  * Indexing for Main-Memory Databases" by Viktor Leis, Alfons Kemper,
      11              :  * and Thomas Neumann, 2013.
      12              :  *
      13              :  * Radix trees have some advantages over hash tables:
      14              :  * - The keys are logically ordered, allowing efficient sorted iteration
      15              :  *   and range queries
      16              :  * - Operations using keys that are lexicographically close together
      17              :  *   will have favorable memory locality
      18              :  * - Memory use grows gradually rather than by doubling
      19              :  * - The key does not need to be stored with the value, since the key
      20              :  *   is implicitly contained in the path to the value
      21              :  *
      22              :  * Some disadvantages are:
      23              :  * - Point queries (along with insertion and deletion) are slower than
      24              :  *   a linear probing hash table as in simplehash.h
      25              :  * - Memory usage varies by key distribution, so is difficult to predict
      26              :  *
      27              :  * A classic radix tree consists of nodes, each containing an array of
      28              :  * pointers to child nodes.  The size of the array is determined by the
      29              :  * "span" of the tree, which is the number of bits of the key used to
      30              :  * index into the array.  For example, with a span of 6, a "chunk"
      31              :  * of 6 bits is extracted from the key at each node traversal, and
      32              :  * the arrays thus have a "fanout" of 2^6 or 64 entries.  A large span
      33              :  * allows a shorter tree, but requires larger arrays that may be mostly
      34              :  * wasted space.
      35              :  *
      36              :  * The key idea of the adaptive radix tree is to choose different
      37              :  * data structures based on the number of child nodes. A node will
      38              :  * start out small when it is first populated, and when it is full,
      39              :  * it is replaced by the next larger size. Conversely, when a node
      40              :  * becomes mostly empty, it is replaced by the next smaller node. The
      41              :  * bulk of the code complexity in this module stems from this dynamic
      42              :  * switching. One mitigating factor is using a span of 8, since bytes
      43              :  * are directly addressable.
      44              :  *
      45              :  * The ART paper mentions three ways to implement leaves:
      46              :  *
      47              :  * "- Single-value leaves: The values are stored using an addi-
      48              :  *  tional leaf node type which stores one value.
      49              :  *  - Multi-value leaves: The values are stored in one of four
      50              :  *  different leaf node types, which mirror the structure of
      51              :  *  inner nodes, but contain values instead of pointers.
      52              :  *  - Combined pointer/value slots: If values fit into point-
      53              :  *  ers, no separate node types are necessary. Instead, each
      54              :  *  pointer storage location in an inner node can either
      55              :  *  store a pointer or a value."
      56              :  *
      57              :  * We use a form of "combined pointer/value slots", as recommended. Values
      58              :  * of size (if fixed at compile time) equal or smaller than the platform's
      59              :  * pointer type are stored in the child slots of the last level node,
      60              :  * while larger values are the same as "single-value" leaves above. This
      61              :  * offers flexibility and efficiency. Variable-length types are currently
      62              :  * treated as single-value leaves for simplicity, but future work may
      63              :  * allow those to be stored in the child pointer arrays, when they're
      64              :  * small enough.
      65              :  *
      66              :  * There are two other techniques described in the paper that are not
      67              :  * implemented here:
      68              :  * - path compression "...removes all inner nodes that have only a single child."
      69              :  * - lazy path expansion "...inner nodes are only created if they are required
      70              :  *   to distinguish at least two leaf nodes."
      71              :  *
      72              :  * We do have a form of "poor man's path compression", however, enabled by
      73              :  * only supporting unsigned integer keys (for now assumed to be 64-bit):
      74              :  * A tree doesn't contain paths where the highest bytes of all keys are
      75              :  * zero. That way, the tree's height adapts to the distribution of keys.
      76              :  *
      77              :  * To handle concurrency, we use a single reader-writer lock for the
      78              :  * radix tree. If concurrent write operations are possible, the tree
      79              :  * must be exclusively locked during write operations such as RT_SET()
      80              :  * and RT_DELETE(), and share locked during read operations such as
      81              :  * RT_FIND() and RT_BEGIN_ITERATE().
      82              :  *
      83              :  * TODO: The current locking mechanism is not optimized for high
      84              :  * concurrency with mixed read-write workloads. In the future it might
      85              :  * be worthwhile to replace it with the Optimistic Lock Coupling or
      86              :  * ROWEX mentioned in the paper "The ART of Practical Synchronization"
      87              :  * by the same authors as the ART paper, 2016.
      88              :  *
      89              :  * To generate a radix tree and associated functions for a use case
      90              :  * several macros have to be #define'ed before this file is included.
      91              :  * Including the file #undef's all those, so a new radix tree can be
      92              :  * generated afterwards.
      93              :  *
      94              :  * The relevant parameters are:
      95              :  * - RT_PREFIX - prefix for all symbol names generated. A prefix of "foo"
      96              :  *   will result in radix tree type "foo_radix_tree" and functions like
      97              :  *   "foo_create"/"foo_free" and so forth.
      98              :  * - RT_DECLARE - if defined function prototypes and type declarations are
      99              :  *   generated
     100              :  * - RT_DEFINE - if defined function definitions are generated
     101              :  * - RT_SCOPE - in which scope (e.g. extern, static inline) do function
     102              :  *   declarations reside
     103              :  * - RT_VALUE_TYPE - the type of the value.
     104              :  * - RT_VARLEN_VALUE_SIZE() - for variable length values, an expression
     105              :  *   involving a pointer to the value type, to calculate size.
     106              :  *     NOTE: implies that the value is in fact variable-length,
     107              :  *     so do not set for fixed-length values.
     108              :  * - RT_RUNTIME_EMBEDDABLE_VALUE - for variable length values, allows
     109              :  *   storing the value in a child pointer slot, rather than as a single-
     110              :  *   value leaf, if small enough. This requires that the value, when
     111              :  *   read as a child pointer, can be tagged in the lowest bit.
     112              :  *
     113              :  * Optional parameters:
     114              :  * - RT_SHMEM - if defined, the radix tree is created in the DSA area
     115              :  *   so that multiple processes can access it simultaneously.
     116              :  * - RT_DEBUG - if defined add stats tracking and debugging functions
     117              :  *
     118              :  * Interface
     119              :  * ---------
     120              :  *
     121              :  * RT_CREATE        - Create a new, empty radix tree
     122              :  * RT_FREE          - Free the radix tree
     123              :  * RT_FIND          - Lookup the value for a given key
     124              :  * RT_SET           - Set a key-value pair
     125              :  * RT_BEGIN_ITERATE - Begin iterating through all key-value pairs
     126              :  * RT_ITERATE_NEXT  - Return next key-value pair, if any
     127              :  * RT_END_ITERATE   - End iteration
     128              :  * RT_MEMORY_USAGE  - Get the memory as measured by space in memory context blocks
     129              :  *
     130              :  * Interface for Shared Memory
     131              :  * ---------
     132              :  *
     133              :  * RT_ATTACH        - Attach to the radix tree
     134              :  * RT_DETACH        - Detach from the radix tree
     135              :  * RT_LOCK_EXCLUSIVE - Lock the radix tree in exclusive mode
     136              :  * RT_LOCK_SHARE    - Lock the radix tree in share mode
     137              :  * RT_UNLOCK        - Unlock the radix tree
     138              :  * RT_GET_HANDLE    - Return the handle of the radix tree
     139              :  *
     140              :  * Optional Interface
     141              :  * ---------
     142              :  *
     143              :  * RT_DELETE        - Delete a key-value pair. Declared/defined if RT_USE_DELETE is defined
     144              :  *
     145              :  *
     146              :  * Copyright (c) 2024-2026, PostgreSQL Global Development Group
     147              :  *
     148              :  * IDENTIFICATION
     149              :  *    src/include/lib/radixtree.h
     150              :  *
     151              :  *-------------------------------------------------------------------------
     152              :  */
     153              : 
     154              : #include "nodes/bitmapset.h"
     155              : #include "port/pg_bitutils.h"
     156              : #include "port/simd.h"
     157              : #include "utils/dsa.h"
     158              : #include "utils/memutils.h"
     159              : #ifdef RT_SHMEM
     160              : #include "miscadmin.h"
     161              : #include "storage/lwlock.h"
     162              : #endif
     163              : 
     164              : /* helpers */
     165              : #define RT_MAKE_PREFIX(a) CppConcat(a,_)
     166              : #define RT_MAKE_NAME(name) RT_MAKE_NAME_(RT_MAKE_PREFIX(RT_PREFIX),name)
     167              : #define RT_MAKE_NAME_(a,b) CppConcat(a,b)
     168              : /*
     169              :  * stringify a macro constant, from https://gcc.gnu.org/onlinedocs/cpp/Stringizing.html
     170              :  */
     171              : #define RT_STR(s) RT_STR_(s)
     172              : #define RT_STR_(s) #s
     173              : 
     174              : /* function declarations */
     175              : #define RT_CREATE RT_MAKE_NAME(create)
     176              : #define RT_FREE RT_MAKE_NAME(free)
     177              : #define RT_FIND RT_MAKE_NAME(find)
     178              : #ifdef RT_SHMEM
     179              : #define RT_ATTACH RT_MAKE_NAME(attach)
     180              : #define RT_DETACH RT_MAKE_NAME(detach)
     181              : #define RT_GET_HANDLE RT_MAKE_NAME(get_handle)
     182              : #define RT_LOCK_EXCLUSIVE RT_MAKE_NAME(lock_exclusive)
     183              : #define RT_LOCK_SHARE RT_MAKE_NAME(lock_share)
     184              : #define RT_UNLOCK RT_MAKE_NAME(unlock)
     185              : #endif
     186              : #define RT_SET RT_MAKE_NAME(set)
     187              : #define RT_BEGIN_ITERATE RT_MAKE_NAME(begin_iterate)
     188              : #define RT_ITERATE_NEXT RT_MAKE_NAME(iterate_next)
     189              : #define RT_END_ITERATE RT_MAKE_NAME(end_iterate)
     190              : #ifdef RT_USE_DELETE
     191              : #define RT_DELETE RT_MAKE_NAME(delete)
     192              : #endif
     193              : #define RT_MEMORY_USAGE RT_MAKE_NAME(memory_usage)
     194              : #define RT_DUMP_NODE RT_MAKE_NAME(dump_node)
     195              : #define RT_STATS RT_MAKE_NAME(stats)
     196              : 
     197              : /* internal helper functions (no externally visible prototypes) */
     198              : #define RT_CHILDPTR_IS_VALUE RT_MAKE_NAME(childptr_is_value)
     199              : #define RT_VALUE_IS_EMBEDDABLE RT_MAKE_NAME(value_is_embeddable)
     200              : #define RT_GET_SLOT_RECURSIVE RT_MAKE_NAME(get_slot_recursive)
     201              : #define RT_DELETE_RECURSIVE RT_MAKE_NAME(delete_recursive)
     202              : #define RT_ALLOC_NODE RT_MAKE_NAME(alloc_node)
     203              : #define RT_ALLOC_LEAF RT_MAKE_NAME(alloc_leaf)
     204              : #define RT_FREE_NODE RT_MAKE_NAME(free_node)
     205              : #define RT_FREE_LEAF RT_MAKE_NAME(free_leaf)
     206              : #define RT_FREE_RECURSE RT_MAKE_NAME(free_recurse)
     207              : #define RT_EXTEND_UP RT_MAKE_NAME(extend_up)
     208              : #define RT_EXTEND_DOWN RT_MAKE_NAME(extend_down)
     209              : #define RT_COPY_COMMON RT_MAKE_NAME(copy_common)
     210              : #define RT_PTR_SET_LOCAL RT_MAKE_NAME(ptr_set_local)
     211              : #define RT_NODE_16_SEARCH_EQ RT_MAKE_NAME(node_16_search_eq)
     212              : #define RT_NODE_4_GET_INSERTPOS RT_MAKE_NAME(node_4_get_insertpos)
     213              : #define RT_NODE_16_GET_INSERTPOS RT_MAKE_NAME(node_16_get_insertpos)
     214              : #define RT_SHIFT_ARRAYS_FOR_INSERT RT_MAKE_NAME(shift_arrays_for_insert)
     215              : #define RT_SHIFT_ARRAYS_AND_DELETE RT_MAKE_NAME(shift_arrays_and_delete)
     216              : #define RT_COPY_ARRAYS_FOR_INSERT RT_MAKE_NAME(copy_arrays_for_insert)
     217              : #define RT_COPY_ARRAYS_AND_DELETE RT_MAKE_NAME(copy_arrays_and_delete)
     218              : #define RT_NODE_48_IS_CHUNK_USED RT_MAKE_NAME(node_48_is_chunk_used)
     219              : #define RT_NODE_48_GET_CHILD RT_MAKE_NAME(node_48_get_child)
     220              : #define RT_NODE_256_IS_CHUNK_USED RT_MAKE_NAME(node_256_is_chunk_used)
     221              : #define RT_NODE_256_GET_CHILD RT_MAKE_NAME(node_256_get_child)
     222              : #define RT_KEY_GET_SHIFT RT_MAKE_NAME(key_get_shift)
     223              : #define RT_SHIFT_GET_MAX_VAL RT_MAKE_NAME(shift_get_max_val)
     224              : #define RT_NODE_SEARCH RT_MAKE_NAME(node_search)
     225              : #define RT_NODE_DELETE RT_MAKE_NAME(node_delete)
     226              : #define RT_NODE_INSERT RT_MAKE_NAME(node_insert)
     227              : #define RT_ADD_CHILD_4 RT_MAKE_NAME(add_child_4)
     228              : #define RT_ADD_CHILD_16 RT_MAKE_NAME(add_child_16)
     229              : #define RT_ADD_CHILD_48 RT_MAKE_NAME(add_child_48)
     230              : #define RT_ADD_CHILD_256 RT_MAKE_NAME(add_child_256)
     231              : #define RT_GROW_NODE_4 RT_MAKE_NAME(grow_node_4)
     232              : #define RT_GROW_NODE_16 RT_MAKE_NAME(grow_node_16)
     233              : #define RT_GROW_NODE_48 RT_MAKE_NAME(grow_node_48)
     234              : #define RT_REMOVE_CHILD_4 RT_MAKE_NAME(remove_child_4)
     235              : #define RT_REMOVE_CHILD_16 RT_MAKE_NAME(remove_child_16)
     236              : #define RT_REMOVE_CHILD_48 RT_MAKE_NAME(remove_child_48)
     237              : #define RT_REMOVE_CHILD_256 RT_MAKE_NAME(remove_child_256)
     238              : #define RT_SHRINK_NODE_16 RT_MAKE_NAME(shrink_child_16)
     239              : #define RT_SHRINK_NODE_48 RT_MAKE_NAME(shrink_child_48)
     240              : #define RT_SHRINK_NODE_256 RT_MAKE_NAME(shrink_child_256)
     241              : #define RT_NODE_ITERATE_NEXT RT_MAKE_NAME(node_iterate_next)
     242              : #define RT_VERIFY_NODE RT_MAKE_NAME(verify_node)
     243              : 
     244              : /* type declarations */
     245              : #define RT_RADIX_TREE RT_MAKE_NAME(radix_tree)
     246              : #define RT_RADIX_TREE_CONTROL RT_MAKE_NAME(radix_tree_control)
     247              : #define RT_ITER RT_MAKE_NAME(iter)
     248              : #ifdef RT_SHMEM
     249              : #define RT_HANDLE RT_MAKE_NAME(handle)
     250              : #endif
     251              : #define RT_NODE RT_MAKE_NAME(node)
     252              : #define RT_CHILD_PTR RT_MAKE_NAME(child_ptr)
     253              : #define RT_NODE_ITER RT_MAKE_NAME(node_iter)
     254              : #define RT_NODE_4 RT_MAKE_NAME(node_4)
     255              : #define RT_NODE_16 RT_MAKE_NAME(node_16)
     256              : #define RT_NODE_48 RT_MAKE_NAME(node_48)
     257              : #define RT_NODE_256 RT_MAKE_NAME(node_256)
     258              : #define RT_SIZE_CLASS RT_MAKE_NAME(size_class)
     259              : #define RT_SIZE_CLASS_ELEM RT_MAKE_NAME(size_class_elem)
     260              : #define RT_SIZE_CLASS_INFO RT_MAKE_NAME(size_class_info)
     261              : #define RT_CLASS_4 RT_MAKE_NAME(class_4)
     262              : #define RT_CLASS_16_LO RT_MAKE_NAME(class_32_min)
     263              : #define RT_CLASS_16_HI RT_MAKE_NAME(class_32_max)
     264              : #define RT_CLASS_48 RT_MAKE_NAME(class_48)
     265              : #define RT_CLASS_256 RT_MAKE_NAME(class_256)
     266              : 
     267              : /* generate forward declarations necessary to use the radix tree */
     268              : #ifdef RT_DECLARE
     269              : 
     270              : typedef struct RT_RADIX_TREE RT_RADIX_TREE;
     271              : typedef struct RT_ITER RT_ITER;
     272              : 
     273              : #ifdef RT_SHMEM
     274              : typedef dsa_pointer RT_HANDLE;
     275              : #endif
     276              : 
     277              : #ifdef RT_SHMEM
     278              : RT_SCOPE    RT_RADIX_TREE *RT_CREATE(dsa_area *dsa, int tranche_id);
     279              : RT_SCOPE    RT_RADIX_TREE *RT_ATTACH(dsa_area *dsa, dsa_pointer dp);
     280              : RT_SCOPE void RT_DETACH(RT_RADIX_TREE * tree);
     281              : RT_SCOPE    RT_HANDLE RT_GET_HANDLE(RT_RADIX_TREE * tree);
     282              : RT_SCOPE void RT_LOCK_EXCLUSIVE(RT_RADIX_TREE * tree);
     283              : RT_SCOPE void RT_LOCK_SHARE(RT_RADIX_TREE * tree);
     284              : RT_SCOPE void RT_UNLOCK(RT_RADIX_TREE * tree);
     285              : #else
     286              : RT_SCOPE    RT_RADIX_TREE *RT_CREATE(MemoryContext ctx);
     287              : #endif
     288              : RT_SCOPE void RT_FREE(RT_RADIX_TREE * tree);
     289              : 
     290              : RT_SCOPE    RT_VALUE_TYPE *RT_FIND(RT_RADIX_TREE * tree, uint64 key);
     291              : RT_SCOPE bool RT_SET(RT_RADIX_TREE * tree, uint64 key, RT_VALUE_TYPE * value_p);
     292              : 
     293              : #ifdef RT_USE_DELETE
     294              : RT_SCOPE bool RT_DELETE(RT_RADIX_TREE * tree, uint64 key);
     295              : #endif
     296              : 
     297              : RT_SCOPE    RT_ITER *RT_BEGIN_ITERATE(RT_RADIX_TREE * tree);
     298              : RT_SCOPE    RT_VALUE_TYPE *RT_ITERATE_NEXT(RT_ITER * iter, uint64 *key_p);
     299              : RT_SCOPE void RT_END_ITERATE(RT_ITER * iter);
     300              : 
     301              : RT_SCOPE uint64 RT_MEMORY_USAGE(RT_RADIX_TREE * tree);
     302              : 
     303              : #ifdef RT_DEBUG
     304              : RT_SCOPE void RT_STATS(RT_RADIX_TREE * tree);
     305              : #endif
     306              : 
     307              : #endif                          /* RT_DECLARE */
     308              : 
     309              : 
     310              : /* generate implementation of the radix tree */
     311              : #ifdef RT_DEFINE
     312              : 
     313              : /* The number of bits encoded in one tree level */
     314              : #define RT_SPAN BITS_PER_BYTE
     315              : 
     316              : /*
     317              :  * The number of possible partial keys, and thus the maximum number of
     318              :  * child pointers, for a node.
     319              :  */
     320              : #define RT_NODE_MAX_SLOTS (1 << RT_SPAN)
     321              : 
     322              : /* Mask for extracting a chunk from a key */
     323              : #define RT_CHUNK_MASK ((1 << RT_SPAN) - 1)
     324              : 
     325              : /* Maximum shift needed to extract a chunk from a key */
     326              : #define RT_MAX_SHIFT    RT_KEY_GET_SHIFT(UINT64_MAX)
     327              : 
     328              : /* Maximum level a tree can reach for a key */
     329              : #define RT_MAX_LEVEL    ((sizeof(uint64) * BITS_PER_BYTE) / RT_SPAN)
     330              : 
     331              : /* Get a chunk from the key */
     332              : #define RT_GET_KEY_CHUNK(key, shift) ((uint8) (((key) >> (shift)) & RT_CHUNK_MASK))
     333              : 
     334              : /* For accessing bitmaps */
     335              : #define RT_BM_IDX(x)    ((x) / BITS_PER_BITMAPWORD)
     336              : #define RT_BM_BIT(x)    ((x) % BITS_PER_BITMAPWORD)
     337              : 
     338              : /*
     339              :  * Node kinds
     340              :  *
     341              :  * The different node kinds are what make the tree "adaptive".
     342              :  *
     343              :  * Each node kind is associated with a different datatype and different
     344              :  * search/set/delete/iterate algorithms adapted for its size. The largest
     345              :  * kind, node256 is basically the same as a traditional radix tree,
     346              :  * and would be most wasteful of memory when sparsely populated. The
     347              :  * smaller nodes expend some additional CPU time to enable a smaller
     348              :  * memory footprint.
     349              :  *
     350              :  * NOTE: There are 4 node kinds, and this should never be increased,
     351              :  * for several reasons:
     352              :  * 1. With 5 or more kinds, gcc tends to use a jump table for switch
     353              :  *    statements.
     354              :  * 2. The 4 kinds can be represented with 2 bits, so we have the option
     355              :  *    in the future to tag the node pointer with the kind, even on
     356              :  *    platforms with 32-bit pointers. That would touch fewer cache lines
     357              :  *    during traversal and allow faster recovery from branch mispredicts.
     358              :  * 3. We can have multiple size classes per node kind.
     359              :  */
     360              : #define RT_NODE_KIND_4          0x00
     361              : #define RT_NODE_KIND_16         0x01
     362              : #define RT_NODE_KIND_48         0x02
     363              : #define RT_NODE_KIND_256        0x03
     364              : #define RT_NODE_KIND_COUNT      4
     365              : 
     366              : /*
     367              :  * Calculate the slab block size so that we can allocate at least 32 chunks
     368              :  * from the block.
     369              :  */
     370              : #define RT_SLAB_BLOCK_SIZE(size)    \
     371              :     Max(SLAB_DEFAULT_BLOCK_SIZE, pg_nextpower2_32(size * 32))
     372              : 
     373              : /* Common header for all nodes */
     374              : typedef struct RT_NODE
     375              : {
     376              :     /* Node kind, one per search/set algorithm */
     377              :     uint8       kind;
     378              : 
     379              :     /*
     380              :      * Max capacity for the current size class. Storing this in the node
     381              :      * enables multiple size classes per node kind. uint8 is sufficient for
     382              :      * all node kinds, because we only use this number to test if the node
     383              :      * needs to grow. Since node256 never needs to grow, we let this overflow
     384              :      * to zero.
     385              :      */
     386              :     uint8       fanout;
     387              : 
     388              :     /*
     389              :      * Number of children. uint8 is sufficient for all node kinds, because
     390              :      * nodes shrink when this number gets lower than some threshold. Since
     391              :      * node256 cannot possibly have zero children, we let the counter overflow
     392              :      * and we interpret zero as "256" for this node kind.
     393              :      */
     394              :     uint8       count;
     395              : }           RT_NODE;
     396              : 
     397              : 
     398              : /* pointer returned by allocation */
     399              : #ifdef RT_SHMEM
     400              : #define RT_PTR_ALLOC dsa_pointer
     401              : #define RT_INVALID_PTR_ALLOC InvalidDsaPointer
     402              : #define RT_PTR_ALLOC_IS_VALID(ptr) DsaPointerIsValid(ptr)
     403              : #else
     404              : #define RT_PTR_ALLOC RT_NODE *
     405              : #define RT_INVALID_PTR_ALLOC NULL
     406              : #define RT_PTR_ALLOC_IS_VALID(ptr) ((ptr) != NULL)
     407              : #endif
     408              : 
     409              : /*
     410              :  * A convenience type used when we need to work with a DSA pointer as well
     411              :  * as its local pointer. For local memory, both members are the same, so
     412              :  * we use a union.
     413              :  */
     414              : #ifdef RT_SHMEM
     415              : typedef struct RT_CHILD_PTR
     416              : #else
     417              : typedef union RT_CHILD_PTR
     418              : #endif
     419              : {
     420              :     RT_PTR_ALLOC alloc;
     421              :     RT_NODE    *local;
     422              : }           RT_CHILD_PTR;
     423              : 
     424              : 
     425              : /*
     426              :  * Helper macros and functions for value storage.
     427              :  * We either embed values in the child slots of the last level
     428              :  * node or store pointers to values to the child slots,
     429              :  * depending on the value size.
     430              :  */
     431              : 
     432              : #ifdef RT_VARLEN_VALUE_SIZE
     433              : #define RT_GET_VALUE_SIZE(v) RT_VARLEN_VALUE_SIZE(v)
     434              : #else
     435              : #define RT_GET_VALUE_SIZE(v) sizeof(RT_VALUE_TYPE)
     436              : #endif
     437              : 
     438              : /*
     439              :  * Return true if the value can be stored in the child array
     440              :  * of the lowest-level node, false otherwise.
     441              :  */
     442              : static inline bool
     443       123001 : RT_VALUE_IS_EMBEDDABLE(RT_VALUE_TYPE * value_p)
     444              : {
     445              : #ifdef RT_VARLEN_VALUE_SIZE
     446              : 
     447              : #ifdef RT_RUNTIME_EMBEDDABLE_VALUE
     448        16467 :     return RT_GET_VALUE_SIZE(value_p) <= sizeof(RT_PTR_ALLOC);
     449              : #else
     450              :     return false;
     451              : #endif
     452              : 
     453              : #else
     454       106534 :     return RT_GET_VALUE_SIZE(value_p) <= sizeof(RT_PTR_ALLOC);
     455              : #endif
     456              : }
     457              : 
     458              : /*
     459              :  * Return true if the child pointer contains the value, false
     460              :  * if the child pointer is a leaf pointer.
     461              :  */
     462              : static inline bool
     463      4680375 : RT_CHILDPTR_IS_VALUE(RT_PTR_ALLOC child)
     464              : {
     465              : #ifdef RT_VARLEN_VALUE_SIZE
     466              : 
     467              : #ifdef RT_RUNTIME_EMBEDDABLE_VALUE
     468              :     /* check for pointer tag */
     469              : #ifdef RT_SHMEM
     470       407858 :     return child & 1;
     471              : #else
     472      3968429 :     return ((uintptr_t) child) & 1;
     473              : #endif
     474              : 
     475              : #else
     476              :     return false;
     477              : #endif
     478              : 
     479              : #else
     480       304088 :     return sizeof(RT_VALUE_TYPE) <= sizeof(RT_PTR_ALLOC);
     481              : #endif
     482              : }
     483              : 
     484              : /*
     485              :  * Symbols for maximum possible fanout are declared first as they are
     486              :  * required to declare each node kind. The declarations of other fanout
     487              :  * values are followed as they need the struct sizes of each node kind.
     488              :  */
     489              : 
     490              : /* max possible key chunks without struct padding */
     491              : #define RT_FANOUT_4_MAX (8 - sizeof(RT_NODE))
     492              : 
     493              : /* equal to two 128-bit SIMD registers, regardless of availability */
     494              : #define RT_FANOUT_16_MAX    32
     495              : 
     496              : /*
     497              :  * This also determines the number of bits necessary for the isset array,
     498              :  * so we need to be mindful of the size of bitmapword.  Since bitmapword
     499              :  * can be 64 bits, the only values that make sense here are 64 and 128.
     500              :  * The ART paper uses at most 64 for this node kind, and one advantage
     501              :  * for us is that "isset" is a single bitmapword on most platforms,
     502              :  * rather than an array, allowing the compiler to get rid of loops.
     503              :  */
     504              : #define RT_FANOUT_48_MAX 64
     505              : 
     506              : #define RT_FANOUT_256   RT_NODE_MAX_SLOTS
     507              : 
     508              : /*
     509              :  * Node structs, one for each "kind"
     510              :  */
     511              : 
     512              : /*
     513              :  * node4 and node16 use one array for key chunks and another
     514              :  * array of the same length for children. The keys and children
     515              :  * are stored at corresponding positions, sorted by chunk.
     516              :  */
     517              : 
     518              : typedef struct RT_NODE_4
     519              : {
     520              :     RT_NODE     base;
     521              : 
     522              :     uint8       chunks[RT_FANOUT_4_MAX];
     523              : 
     524              :     /* number of children depends on size class */
     525              :     RT_PTR_ALLOC children[FLEXIBLE_ARRAY_MEMBER];
     526              : }           RT_NODE_4;
     527              : 
     528              : typedef struct RT_NODE_16
     529              : {
     530              :     RT_NODE     base;
     531              : 
     532              :     uint8       chunks[RT_FANOUT_16_MAX];
     533              : 
     534              :     /* number of children depends on size class */
     535              :     RT_PTR_ALLOC children[FLEXIBLE_ARRAY_MEMBER];
     536              : }           RT_NODE_16;
     537              : 
     538              : /*
     539              :  * node48 uses a 256-element array indexed by key chunks. This array
     540              :  * stores indexes into a second array containing the children.
     541              :  */
     542              : typedef struct RT_NODE_48
     543              : {
     544              :     RT_NODE     base;
     545              : 
     546              :     /* bitmap to track which slots are in use */
     547              :     bitmapword  isset[RT_BM_IDX(RT_FANOUT_48_MAX)];
     548              : 
     549              :     /*
     550              :      * Lookup table for indexes into the children[] array. We make this the
     551              :      * last fixed-size member so that it's convenient to memset separately
     552              :      * from the previous members.
     553              :      */
     554              :     uint8       slot_idxs[RT_NODE_MAX_SLOTS];
     555              : 
     556              : /* Invalid index */
     557              : #define RT_INVALID_SLOT_IDX 0xFF
     558              : 
     559              :     /* number of children depends on size class */
     560              :     RT_PTR_ALLOC children[FLEXIBLE_ARRAY_MEMBER];
     561              : }           RT_NODE_48;
     562              : 
     563              : /*
     564              :  * node256 is the largest node type. This node has an array of
     565              :  * children directly indexed by chunk.  Unlike other node kinds,
     566              :  * its array size is by definition fixed.
     567              :  */
     568              : typedef struct RT_NODE_256
     569              : {
     570              :     RT_NODE     base;
     571              : 
     572              :     /* bitmap to track which slots are in use */
     573              :     bitmapword  isset[RT_BM_IDX(RT_FANOUT_256)];
     574              : 
     575              :     /* slots for 256 children */
     576              :     RT_PTR_ALLOC children[RT_FANOUT_256];
     577              : }           RT_NODE_256;
     578              : 
     579              : #if defined(RT_SHMEM)
     580              : /*
     581              :  * Make sure the all nodes (except for node256) fit neatly into a DSA
     582              :  * size class.  We assume the RT_FANOUT_4 is in the range where DSA size
     583              :  * classes increment by 8 (as of PG17 up to 64 bytes), so we just hard
     584              :  * code that one.
     585              :  */
     586              : 
     587              : #if SIZEOF_DSA_POINTER < 8
     588              : #define RT_FANOUT_16_LO ((96 - offsetof(RT_NODE_16, children)) / sizeof(RT_PTR_ALLOC))
     589              : #define RT_FANOUT_16_HI Min(RT_FANOUT_16_MAX, (160 - offsetof(RT_NODE_16, children)) / sizeof(RT_PTR_ALLOC))
     590              : #define RT_FANOUT_48    Min(RT_FANOUT_48_MAX, (512 - offsetof(RT_NODE_48, children)) / sizeof(RT_PTR_ALLOC))
     591              : #else
     592              : #define RT_FANOUT_16_LO ((160 - offsetof(RT_NODE_16, children)) / sizeof(RT_PTR_ALLOC))
     593              : #define RT_FANOUT_16_HI Min(RT_FANOUT_16_MAX, (320 - offsetof(RT_NODE_16, children)) / sizeof(RT_PTR_ALLOC))
     594              : #define RT_FANOUT_48    Min(RT_FANOUT_48_MAX, (768 - offsetof(RT_NODE_48, children)) / sizeof(RT_PTR_ALLOC))
     595              : #endif                          /* SIZEOF_DSA_POINTER < 8 */
     596              : 
     597              : #else                           /* ! RT_SHMEM */
     598              : 
     599              : /* doesn't really matter, but may as well use the namesake */
     600              : #define RT_FANOUT_16_LO 16
     601              : /* use maximum possible */
     602              : #define RT_FANOUT_16_HI RT_FANOUT_16_MAX
     603              : #define RT_FANOUT_48    RT_FANOUT_48_MAX
     604              : 
     605              : #endif                          /* RT_SHMEM */
     606              : 
     607              : /*
     608              :  * To save memory in trees with sparse keys, it would make sense to have two
     609              :  * size classes for the smallest kind (perhaps a high class of 5 and a low class
     610              :  * of 2), but it would be more effective to utilize lazy expansion and
     611              :  * path compression.
     612              :  */
     613              : #define RT_FANOUT_4     4
     614              : 
     615              : StaticAssertDecl(RT_FANOUT_4 <= RT_FANOUT_4_MAX, "watch struct padding");
     616              : StaticAssertDecl(RT_FANOUT_16_LO < RT_FANOUT_16_HI, "LO subclass bigger than HI");
     617              : StaticAssertDecl(RT_FANOUT_48 <= RT_FANOUT_48_MAX, "more slots than isset bits");
     618              : 
     619              : /*
     620              :  * Node size classes
     621              :  *
     622              :  * Nodes of different kinds necessarily belong to different size classes.
     623              :  * One innovation in our implementation compared to the ART paper is
     624              :  * decoupling the notion of size class from kind.
     625              :  *
     626              :  * The size classes within a given node kind have the same underlying
     627              :  * type, but a variable number of children/values. This is possible
     628              :  * because each type (except node256) contains metadata that work the
     629              :  * same way regardless of how many child slots there are. The nodes
     630              :  * can introspect their allocated capacity at runtime.
     631              :  */
     632              : typedef enum RT_SIZE_CLASS
     633              : {
     634              :     RT_CLASS_4 = 0,
     635              :     RT_CLASS_16_LO,
     636              :     RT_CLASS_16_HI,
     637              :     RT_CLASS_48,
     638              :     RT_CLASS_256
     639              : }           RT_SIZE_CLASS;
     640              : 
     641              : /* Information for each size class */
     642              : typedef struct RT_SIZE_CLASS_ELEM
     643              : {
     644              :     const char *name;
     645              :     int         fanout;
     646              :     size_t      allocsize;
     647              : }           RT_SIZE_CLASS_ELEM;
     648              : 
     649              : 
     650              : static const RT_SIZE_CLASS_ELEM RT_SIZE_CLASS_INFO[] = {
     651              :     [RT_CLASS_4] = {
     652              :         .name = RT_STR(RT_PREFIX) "_radix_tree node4",
     653              :         .fanout = RT_FANOUT_4,
     654              :         .allocsize = sizeof(RT_NODE_4) + RT_FANOUT_4 * sizeof(RT_PTR_ALLOC),
     655              :     },
     656              :     [RT_CLASS_16_LO] = {
     657              :         .name = RT_STR(RT_PREFIX) "_radix_tree node16_lo",
     658              :         .fanout = RT_FANOUT_16_LO,
     659              :         .allocsize = sizeof(RT_NODE_16) + RT_FANOUT_16_LO * sizeof(RT_PTR_ALLOC),
     660              :     },
     661              :     [RT_CLASS_16_HI] = {
     662              :         .name = RT_STR(RT_PREFIX) "_radix_tree node16_hi",
     663              :         .fanout = RT_FANOUT_16_HI,
     664              :         .allocsize = sizeof(RT_NODE_16) + RT_FANOUT_16_HI * sizeof(RT_PTR_ALLOC),
     665              :     },
     666              :     [RT_CLASS_48] = {
     667              :         .name = RT_STR(RT_PREFIX) "_radix_tree node48",
     668              :         .fanout = RT_FANOUT_48,
     669              :         .allocsize = sizeof(RT_NODE_48) + RT_FANOUT_48 * sizeof(RT_PTR_ALLOC),
     670              :     },
     671              :     [RT_CLASS_256] = {
     672              :         .name = RT_STR(RT_PREFIX) "_radix_tree node256",
     673              :         .fanout = RT_FANOUT_256,
     674              :         .allocsize = sizeof(RT_NODE_256),
     675              :     },
     676              : };
     677              : 
     678              : #define RT_NUM_SIZE_CLASSES lengthof(RT_SIZE_CLASS_INFO)
     679              : 
     680              : #ifdef RT_SHMEM
     681              : /* A magic value used to identify our radix tree */
     682              : #define RT_RADIX_TREE_MAGIC 0x54A48167
     683              : #endif
     684              : 
     685              : /* Contains the actual tree, plus ancillary info */
     686              : typedef struct RT_RADIX_TREE_CONTROL
     687              : {
     688              : #ifdef RT_SHMEM
     689              :     RT_HANDLE   handle;
     690              :     uint32      magic;
     691              :     LWLock      lock;
     692              : #endif
     693              : 
     694              :     RT_PTR_ALLOC root;
     695              :     uint64      max_val;
     696              :     int64       num_keys;
     697              :     int         start_shift;
     698              : 
     699              :     /* statistics */
     700              : #ifdef RT_DEBUG
     701              :     int64       num_nodes[RT_NUM_SIZE_CLASSES];
     702              :     int64       num_leaves;
     703              : #endif
     704              : }           RT_RADIX_TREE_CONTROL;
     705              : 
     706              : /* Entry point for allocating and accessing the tree */
     707              : struct RT_RADIX_TREE
     708              : {
     709              :     /* pointing to either local memory or DSA */
     710              :     RT_RADIX_TREE_CONTROL *ctl;
     711              : 
     712              : #ifdef RT_SHMEM
     713              :     dsa_area   *dsa;
     714              : #else
     715              :     MemoryContextData *node_slabs[RT_NUM_SIZE_CLASSES];
     716              : 
     717              :     /* leaf_context is used only for single-value leaves */
     718              :     MemoryContextData *leaf_context;
     719              : #endif
     720              : };
     721              : 
     722              : /*
     723              :  * Iteration support.
     724              :  *
     725              :  * Iterating over the radix tree produces each key/value pair in ascending
     726              :  * order of the key.
     727              :  */
     728              : 
     729              : /* state for iterating over a single node */
     730              : typedef struct RT_NODE_ITER
     731              : {
     732              :     RT_CHILD_PTR node;
     733              : 
     734              :     /*
     735              :      * The next index of the chunk array in RT_NODE_KIND_4 and RT_NODE_KIND_16
     736              :      * nodes, or the next chunk in RT_NODE_KIND_48 and RT_NODE_KIND_256 nodes.
     737              :      * 0 for the initial value.
     738              :      */
     739              :     int         idx;
     740              : }           RT_NODE_ITER;
     741              : 
     742              : /* state for iterating over the whole radix tree */
     743              : struct RT_ITER
     744              : {
     745              :     RT_RADIX_TREE *tree;
     746              : 
     747              :     /*
     748              :      * A stack to track iteration for each level. Level 0 is the lowest (or
     749              :      * leaf) level
     750              :      */
     751              :     RT_NODE_ITER node_iters[RT_MAX_LEVEL];
     752              :     int         top_level;
     753              :     int         cur_level;
     754              : 
     755              :     /* The key constructed during iteration */
     756              :     uint64      key;
     757              : };
     758              : 
     759              : 
     760              : /* verification (available only in assert-enabled builds) */
     761              : static void RT_VERIFY_NODE(RT_NODE * node);
     762              : 
     763              : static inline void
     764     18748129 : RT_PTR_SET_LOCAL(RT_RADIX_TREE * tree, RT_CHILD_PTR * node)
     765              : {
     766              : #ifdef RT_SHMEM
     767      1112166 :     node->local = dsa_get_address(tree->dsa, node->alloc);
     768              : #endif
     769     18748129 : }
     770              : 
     771              : /* Convenience functions for node48 and node256 */
     772              : 
     773              : /* Return true if there is an entry for "chunk" */
     774              : static inline bool
     775       518246 : RT_NODE_48_IS_CHUNK_USED(RT_NODE_48 * node, uint8 chunk)
     776              : {
     777       518246 :     return node->slot_idxs[chunk] != RT_INVALID_SLOT_IDX;
     778              : }
     779              : 
     780              : static inline RT_PTR_ALLOC *
     781       415864 : RT_NODE_48_GET_CHILD(RT_NODE_48 * node, uint8 chunk)
     782              : {
     783       415864 :     return &node->children[node->slot_idxs[chunk]];
     784              : }
     785              : 
     786              : /* Return true if there is an entry for "chunk" */
     787              : static inline bool
     788      4245355 : RT_NODE_256_IS_CHUNK_USED(RT_NODE_256 * node, uint8 chunk)
     789              : {
     790      4245355 :     int         idx = RT_BM_IDX(chunk);
     791      4245355 :     int         bitnum = RT_BM_BIT(chunk);
     792              : 
     793      4245355 :     return (node->isset[idx] & ((bitmapword) 1 << bitnum)) != 0;
     794              : }
     795              : 
     796              : static inline RT_PTR_ALLOC *
     797      4124104 : RT_NODE_256_GET_CHILD(RT_NODE_256 * node, uint8 chunk)
     798              : {
     799              :     Assert(RT_NODE_256_IS_CHUNK_USED(node, chunk));
     800      4124104 :     return &node->children[chunk];
     801              : }
     802              : 
     803              : /*
     804              :  * Return the smallest shift that will allowing storing the given key.
     805              :  */
     806              : static inline int
     807       107228 : RT_KEY_GET_SHIFT(uint64 key)
     808              : {
     809       107228 :     if (key == 0)
     810            0 :         return 0;
     811              :     else
     812       107228 :         return (pg_leftmost_one_pos64(key) / RT_SPAN) * RT_SPAN;
     813              : }
     814              : 
     815              : /*
     816              :  * Return the max value that can be stored in the tree with the given shift.
     817              :  */
     818              : static uint64
     819       107187 : RT_SHIFT_GET_MAX_VAL(int shift)
     820              : {
     821       107187 :     if (shift == RT_MAX_SHIFT)
     822           10 :         return UINT64_MAX;
     823              :     else
     824       107177 :         return (UINT64CONST(1) << (shift + RT_SPAN)) - 1;
     825              : }
     826              : 
     827              : /*
     828              :  * Allocate a new node with the given node kind and size class.
     829              :  */
     830              : static inline RT_CHILD_PTR
     831       133864 : RT_ALLOC_NODE(RT_RADIX_TREE * tree, const uint8 kind, const RT_SIZE_CLASS size_class)
     832              : {
     833              :     RT_CHILD_PTR allocnode;
     834              :     RT_NODE    *node;
     835              :     size_t      allocsize;
     836              : 
     837       133864 :     allocsize = RT_SIZE_CLASS_INFO[size_class].allocsize;
     838              : 
     839              : #ifdef RT_SHMEM
     840           62 :     allocnode.alloc = dsa_allocate(tree->dsa, allocsize);
     841              : #else
     842       133802 :     allocnode.alloc = (RT_PTR_ALLOC) MemoryContextAlloc(tree->node_slabs[size_class],
     843              :                                                         allocsize);
     844              : #endif
     845              : 
     846       133864 :     RT_PTR_SET_LOCAL(tree, &allocnode);
     847       133864 :     node = allocnode.local;
     848              : 
     849              :     /* initialize contents */
     850              : 
     851       133864 :     switch (kind)
     852              :     {
     853       124982 :         case RT_NODE_KIND_4:
     854       124982 :             memset(node, 0, offsetof(RT_NODE_4, children));
     855       124982 :             break;
     856         6674 :         case RT_NODE_KIND_16:
     857         6674 :             memset(node, 0, offsetof(RT_NODE_16, children));
     858         6674 :             break;
     859         2125 :         case RT_NODE_KIND_48:
     860              :             {
     861         2125 :                 RT_NODE_48 *n48 = (RT_NODE_48 *) node;
     862              : 
     863         2125 :                 memset(n48, 0, offsetof(RT_NODE_48, slot_idxs));
     864         2125 :                 memset(n48->slot_idxs, RT_INVALID_SLOT_IDX, sizeof(n48->slot_idxs));
     865         2125 :                 break;
     866              :             }
     867           83 :         case RT_NODE_KIND_256:
     868           83 :             memset(node, 0, offsetof(RT_NODE_256, children));
     869           83 :             break;
     870            0 :         default:
     871            0 :             pg_unreachable();
     872              :     }
     873              : 
     874       133864 :     node->kind = kind;
     875              : 
     876              :     /*
     877              :      * For node256, this will actually overflow to zero, but that's okay
     878              :      * because that node doesn't need to introspect this value.
     879              :      */
     880       133864 :     node->fanout = RT_SIZE_CLASS_INFO[size_class].fanout;
     881              : 
     882              : #ifdef RT_DEBUG
     883              :     /* update the statistics */
     884        26070 :     tree->ctl->num_nodes[size_class]++;
     885              : #endif
     886              : 
     887       133864 :     return allocnode;
     888              : }
     889              : 
     890              : /*
     891              :  * Allocate a new leaf.
     892              :  */
     893              : static RT_CHILD_PTR
     894        14171 : RT_ALLOC_LEAF(RT_RADIX_TREE * tree, size_t allocsize)
     895              : {
     896              :     RT_CHILD_PTR leaf;
     897              : 
     898              : #ifdef RT_SHMEM
     899          682 :     leaf.alloc = dsa_allocate(tree->dsa, allocsize);
     900          682 :     RT_PTR_SET_LOCAL(tree, &leaf);
     901              : #else
     902        13489 :     leaf.alloc = (RT_PTR_ALLOC) MemoryContextAlloc(tree->leaf_context, allocsize);
     903              : #endif
     904              : 
     905              : #ifdef RT_DEBUG
     906            0 :     tree->ctl->num_leaves++;
     907              : #endif
     908              : 
     909        14171 :     return leaf;
     910              : }
     911              : 
     912              : /*
     913              :  * Copy relevant members of the node header.
     914              :  * This is a separate function in case other fields are added.
     915              :  */
     916              : static inline void
     917        10963 : RT_COPY_COMMON(RT_CHILD_PTR newnode, RT_CHILD_PTR oldnode)
     918              : {
     919        10963 :     (newnode.local)->count = (oldnode.local)->count;
     920        10963 : }
     921              : 
     922              : /* Free the given node */
     923              : static void
     924        26683 : RT_FREE_NODE(RT_RADIX_TREE * tree, RT_CHILD_PTR node)
     925              : {
     926              : #ifdef RT_DEBUG
     927              :     int         i;
     928              : 
     929              :     /* update the statistics */
     930              : 
     931        40495 :     for (i = 0; i < RT_NUM_SIZE_CLASSES; i++)
     932              :     {
     933        40479 :         if ((node.local)->fanout == RT_SIZE_CLASS_INFO[i].fanout)
     934        26022 :             break;
     935              :     }
     936              : 
     937              :     /*
     938              :      * The fanout of node256 will appear to be zero within the node header
     939              :      * because of overflow, so we need an extra check here.
     940              :      */
     941        26038 :     if (i == RT_NUM_SIZE_CLASSES)
     942           16 :         i = RT_CLASS_256;
     943              : 
     944        26038 :     tree->ctl->num_nodes[i]--;
     945              :     Assert(tree->ctl->num_nodes[i] >= 0);
     946              : #endif
     947              : 
     948              : #ifdef RT_SHMEM
     949           28 :     dsa_free(tree->dsa, node.alloc);
     950              : #else
     951        26655 :     pfree(node.alloc);
     952              : #endif
     953        26683 : }
     954              : 
     955              : static inline void
     956            3 : RT_FREE_LEAF(RT_RADIX_TREE * tree, RT_PTR_ALLOC leaf)
     957              : {
     958              :     Assert(leaf != tree->ctl->root);
     959              : 
     960              : #ifdef RT_DEBUG
     961              :     /* update the statistics */
     962            0 :     tree->ctl->num_leaves--;
     963              :     Assert(tree->ctl->num_leaves >= 0);
     964              : #endif
     965              : 
     966              : #ifdef RT_SHMEM
     967            0 :     dsa_free(tree->dsa, leaf);
     968              : #else
     969            3 :     pfree(leaf);
     970              : #endif
     971            3 : }
     972              : 
     973              : /***************** SEARCH *****************/
     974              : 
     975              : /*
     976              :  * Return the address of the child corresponding to "chunk",
     977              :  * or NULL if there is no such element.
     978              :  */
     979              : static inline RT_PTR_ALLOC *
     980      3415504 : RT_NODE_16_SEARCH_EQ(RT_NODE_16 * node, uint8 chunk)
     981              : {
     982      3415504 :     int         count = node->base.count;
     983              : #ifndef USE_NO_SIMD
     984              :     Vector8     spread_chunk;
     985              :     Vector8     haystack1;
     986              :     Vector8     haystack2;
     987              :     Vector8     cmp1;
     988              :     Vector8     cmp2;
     989              :     uint32      bitfield;
     990      3415504 :     RT_PTR_ALLOC *slot_simd = NULL;
     991              : #endif
     992              : 
     993              : #if defined(USE_NO_SIMD) || defined(USE_ASSERT_CHECKING)
     994              :     RT_PTR_ALLOC *slot = NULL;
     995              : 
     996              :     for (int i = 0; i < count; i++)
     997              :     {
     998              :         if (node->chunks[i] == chunk)
     999              :         {
    1000              :             slot = &node->children[i];
    1001              :             break;
    1002              :         }
    1003              :     }
    1004              : #endif
    1005              : 
    1006              : #ifndef USE_NO_SIMD
    1007              :     /* replicate the search key */
    1008      3415504 :     spread_chunk = vector8_broadcast(chunk);
    1009              : 
    1010              :     /* compare to all 32 keys stored in the node */
    1011      3415504 :     vector8_load(&haystack1, &node->chunks[0]);
    1012      3415504 :     vector8_load(&haystack2, &node->chunks[sizeof(Vector8)]);
    1013      3415504 :     cmp1 = vector8_eq(spread_chunk, haystack1);
    1014      3415504 :     cmp2 = vector8_eq(spread_chunk, haystack2);
    1015              : 
    1016              :     /* convert comparison to a bitfield */
    1017      3415504 :     bitfield = vector8_highbit_mask(cmp1) | (vector8_highbit_mask(cmp2) << sizeof(Vector8));
    1018              : 
    1019              :     /* mask off invalid entries */
    1020      3415504 :     bitfield &= ((UINT64CONST(1) << count) - 1);
    1021              : 
    1022              :     /* convert bitfield to index by counting trailing zeros */
    1023      3415504 :     if (bitfield)
    1024      2872328 :         slot_simd = &node->children[pg_rightmost_one_pos32(bitfield)];
    1025              : 
    1026              :     Assert(slot_simd == slot);
    1027      3415504 :     return slot_simd;
    1028              : #else
    1029              :     return slot;
    1030              : #endif
    1031              : }
    1032              : 
    1033              : /*
    1034              :  * Search for the child pointer corresponding to "key" in the given node.
    1035              :  *
    1036              :  * Return child if the key is found, otherwise return NULL.
    1037              :  */
    1038              : static inline RT_PTR_ALLOC *
    1039     14400937 : RT_NODE_SEARCH(RT_NODE * node, uint8 chunk)
    1040              : {
    1041              :     /* Make sure we already converted to local pointer */
    1042              :     Assert(node != NULL);
    1043              : 
    1044     14400937 :     switch (node->kind)
    1045              :     {
    1046      6323992 :         case RT_NODE_KIND_4:
    1047              :             {
    1048      6323992 :                 RT_NODE_4  *n4 = (RT_NODE_4 *) node;
    1049              : 
    1050      7931193 :                 for (int i = 0; i < n4->base.count; i++)
    1051              :                 {
    1052      7287335 :                     if (n4->chunks[i] == chunk)
    1053      5680134 :                         return &n4->children[i];
    1054              :                 }
    1055       643858 :                 return NULL;
    1056              :             }
    1057      3415504 :         case RT_NODE_KIND_16:
    1058      3415504 :             return RT_NODE_16_SEARCH_EQ((RT_NODE_16 *) node, chunk);
    1059       439638 :         case RT_NODE_KIND_48:
    1060              :             {
    1061       439638 :                 RT_NODE_48 *n48 = (RT_NODE_48 *) node;
    1062       439638 :                 int         slotpos = n48->slot_idxs[chunk];
    1063              : 
    1064       439638 :                 if (slotpos == RT_INVALID_SLOT_IDX)
    1065       114620 :                     return NULL;
    1066              : 
    1067       325018 :                 return RT_NODE_48_GET_CHILD(n48, chunk);
    1068              :             }
    1069      4221803 :         case RT_NODE_KIND_256:
    1070              :             {
    1071      4221803 :                 RT_NODE_256 *n256 = (RT_NODE_256 *) node;
    1072              : 
    1073      4221803 :                 if (!RT_NODE_256_IS_CHUNK_USED(n256, chunk))
    1074       122685 :                     return NULL;
    1075              : 
    1076      4099118 :                 return RT_NODE_256_GET_CHILD(n256, chunk);
    1077              :             }
    1078            0 :         default:
    1079            0 :             pg_unreachable();
    1080              :     }
    1081              : }
    1082              : 
    1083              : /*
    1084              :  * Search the given key in the radix tree. Return the pointer to the value if found,
    1085              :  * otherwise return NULL.
    1086              :  *
    1087              :  * Since the function returns a pointer (to support variable-length values),
    1088              :  * the caller is responsible for locking until it's finished with the value.
    1089              :  */
    1090              : RT_SCOPE    RT_VALUE_TYPE *
    1091      5770754 : RT_FIND(RT_RADIX_TREE * tree, uint64 key)
    1092              : {
    1093              :     RT_CHILD_PTR node;
    1094      5770754 :     RT_PTR_ALLOC *slot = NULL;
    1095              :     int         shift;
    1096              : 
    1097              : #ifdef RT_SHMEM
    1098              :     Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
    1099              : #endif
    1100              : 
    1101      5770754 :     if (key > tree->ctl->max_val)
    1102         1433 :         return NULL;
    1103              : 
    1104              :     Assert(RT_PTR_ALLOC_IS_VALID(tree->ctl->root));
    1105      5769321 :     node.alloc = tree->ctl->root;
    1106      5769321 :     shift = tree->ctl->start_shift;
    1107              : 
    1108              :     /* Descend the tree */
    1109     18019833 :     while (shift >= 0)
    1110              :     {
    1111     13554049 :         RT_PTR_SET_LOCAL(tree, &node);
    1112     13554049 :         slot = RT_NODE_SEARCH(node.local, RT_GET_KEY_CHUNK(key, shift));
    1113     13554049 :         if (slot == NULL)
    1114      1303537 :             return NULL;
    1115              : 
    1116     12250512 :         node.alloc = *slot;
    1117     12250512 :         shift -= RT_SPAN;
    1118              :     }
    1119              : 
    1120      4465784 :     if (RT_CHILDPTR_IS_VALUE(*slot))
    1121       275718 :         return (RT_VALUE_TYPE *) slot;
    1122              :     else
    1123              :     {
    1124      4190066 :         RT_PTR_SET_LOCAL(tree, &node);
    1125      4190066 :         return (RT_VALUE_TYPE *) node.local;
    1126              :     }
    1127              : }
    1128              : 
    1129              : /***************** INSERTION *****************/
    1130              : 
    1131              : #define RT_NODE_MUST_GROW(node) \
    1132              :     ((node)->count == (node)->fanout)
    1133              : 
    1134              : /*
    1135              :  * Return index of the chunk and slot arrays for inserting into the node,
    1136              :  * such that the arrays remain ordered.
    1137              :  */
    1138              : static inline int
    1139        10514 : RT_NODE_4_GET_INSERTPOS(RT_NODE_4 * node, uint8 chunk, int count)
    1140              : {
    1141              :     int         idx;
    1142              : 
    1143        24330 :     for (idx = 0; idx < count; idx++)
    1144              :     {
    1145        19523 :         if (node->chunks[idx] >= chunk)
    1146         5707 :             break;
    1147              :     }
    1148              : 
    1149        10514 :     return idx;
    1150              : }
    1151              : 
    1152              : /*
    1153              :  * Return index of the chunk and slot arrays for inserting into the node,
    1154              :  * such that the arrays remain ordered.
    1155              :  */
    1156              : static inline int
    1157        60935 : RT_NODE_16_GET_INSERTPOS(RT_NODE_16 * node, uint8 chunk)
    1158              : {
    1159        60935 :     int         count = node->base.count;
    1160              : #if defined(USE_NO_SIMD) || defined(USE_ASSERT_CHECKING)
    1161              :     int         index;
    1162              : #endif
    1163              : 
    1164              : #ifndef USE_NO_SIMD
    1165              :     Vector8     spread_chunk;
    1166              :     Vector8     haystack1;
    1167              :     Vector8     haystack2;
    1168              :     Vector8     cmp1;
    1169              :     Vector8     cmp2;
    1170              :     Vector8     min1;
    1171              :     Vector8     min2;
    1172              :     uint32      bitfield;
    1173              :     int         index_simd;
    1174              : #endif
    1175              : 
    1176              :     /*
    1177              :      * First compare the last element. There are two reasons to branch here:
    1178              :      *
    1179              :      * 1) A realistic pattern is inserting ordered keys. In that case,
    1180              :      * non-SIMD platforms must do a linear search to the last chunk to find
    1181              :      * the insert position. This will get slower as the node fills up.
    1182              :      *
    1183              :      * 2) On SIMD platforms, we must branch anyway to make sure we don't bit
    1184              :      * scan an empty bitfield. Doing the branch here eliminates some work that
    1185              :      * we might otherwise throw away.
    1186              :      */
    1187              :     Assert(count > 0);
    1188        60935 :     if (node->chunks[count - 1] < chunk)
    1189         8673 :         return count;
    1190              : 
    1191              : #if defined(USE_NO_SIMD) || defined(USE_ASSERT_CHECKING)
    1192              : 
    1193              :     for (index = 0; index < count; index++)
    1194              :     {
    1195              :         if (node->chunks[index] > chunk)
    1196              :             break;
    1197              :     }
    1198              : #endif
    1199              : 
    1200              : #ifndef USE_NO_SIMD
    1201              : 
    1202              :     /*
    1203              :      * This is a bit more complicated than RT_NODE_16_SEARCH_EQ(), because no
    1204              :      * unsigned uint8 comparison instruction exists, at least for SSE2. So we
    1205              :      * need to play some trickery using vector8_min() to effectively get >=.
    1206              :      * There'll never be any equal elements in current uses, but that's what
    1207              :      * we get here...
    1208              :      */
    1209        52262 :     spread_chunk = vector8_broadcast(chunk);
    1210        52262 :     vector8_load(&haystack1, &node->chunks[0]);
    1211        52262 :     vector8_load(&haystack2, &node->chunks[sizeof(Vector8)]);
    1212        52262 :     min1 = vector8_min(spread_chunk, haystack1);
    1213        52262 :     min2 = vector8_min(spread_chunk, haystack2);
    1214        52262 :     cmp1 = vector8_eq(spread_chunk, min1);
    1215        52262 :     cmp2 = vector8_eq(spread_chunk, min2);
    1216        52262 :     bitfield = vector8_highbit_mask(cmp1) | (vector8_highbit_mask(cmp2) << sizeof(Vector8));
    1217              : 
    1218              :     Assert((bitfield & ((UINT64CONST(1) << count) - 1)) != 0);
    1219        52262 :     index_simd = pg_rightmost_one_pos32(bitfield);
    1220              : 
    1221              :     Assert(index_simd == index);
    1222        52262 :     return index_simd;
    1223              : #else
    1224              :     return index;
    1225              : #endif
    1226              : }
    1227              : 
    1228              : /* Shift the elements right at "insertpos" by one */
    1229              : static inline void
    1230        66800 : RT_SHIFT_ARRAYS_FOR_INSERT(uint8 *chunks, RT_PTR_ALLOC * children, int count, int insertpos)
    1231              : {
    1232              :     /*
    1233              :      * This is basically a memmove, but written in a simple loop for speed on
    1234              :      * small inputs.
    1235              :      */
    1236       561280 :     for (int i = count - 1; i >= insertpos; i--)
    1237              :     {
    1238              :         /* workaround for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101481 */
    1239              : #ifdef __GNUC__
    1240       494480 :         __asm__("");
    1241              : #endif
    1242       494480 :         chunks[i + 1] = chunks[i];
    1243       494480 :         children[i + 1] = children[i];
    1244              :     }
    1245        66800 : }
    1246              : 
    1247              : /*
    1248              :  * Copy both chunk and slot arrays into the right
    1249              :  * place. The caller is responsible for inserting the new element.
    1250              :  */
    1251              : static inline void
    1252         4649 : RT_COPY_ARRAYS_FOR_INSERT(uint8 *dst_chunks, RT_PTR_ALLOC * dst_children,
    1253              :                           uint8 *src_chunks, RT_PTR_ALLOC * src_children,
    1254              :                           int count, int insertpos)
    1255              : {
    1256        50321 :     for (int i = 0; i < count; i++)
    1257              :     {
    1258        45672 :         int         sourceidx = i;
    1259              : 
    1260              :         /* use a branch-free computation to skip the index of the new element */
    1261        45672 :         int         destidx = i + (i >= insertpos);
    1262              : 
    1263        45672 :         dst_chunks[destidx] = src_chunks[sourceidx];
    1264        45672 :         dst_children[destidx] = src_children[sourceidx];
    1265              :     }
    1266         4649 : }
    1267              : 
    1268              : static inline RT_PTR_ALLOC *
    1269        11485 : RT_ADD_CHILD_256(RT_RADIX_TREE * tree, RT_CHILD_PTR node, uint8 chunk)
    1270              : {
    1271        11485 :     RT_NODE_256 *n256 = (RT_NODE_256 *) node.local;
    1272        11485 :     int         idx = RT_BM_IDX(chunk);
    1273        11485 :     int         bitnum = RT_BM_BIT(chunk);
    1274              : 
    1275              :     /* Mark the slot used for "chunk" */
    1276        11485 :     n256->isset[idx] |= ((bitmapword) 1 << bitnum);
    1277              : 
    1278        11485 :     n256->base.count++;
    1279        11485 :     RT_VERIFY_NODE((RT_NODE *) n256);
    1280              : 
    1281        11485 :     return RT_NODE_256_GET_CHILD(n256, chunk);
    1282              : }
    1283              : 
    1284              : static pg_noinline RT_PTR_ALLOC *
    1285           83 : RT_GROW_NODE_48(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node,
    1286              :                 uint8 chunk)
    1287              : {
    1288           83 :     RT_NODE_48 *n48 = (RT_NODE_48 *) node.local;
    1289              :     RT_CHILD_PTR newnode;
    1290              :     RT_NODE_256 *new256;
    1291           83 :     int         i = 0;
    1292              : 
    1293              :     /* initialize new node */
    1294           83 :     newnode = RT_ALLOC_NODE(tree, RT_NODE_KIND_256, RT_CLASS_256);
    1295           83 :     new256 = (RT_NODE_256 *) newnode.local;
    1296              : 
    1297              :     /* copy over the entries */
    1298           83 :     RT_COPY_COMMON(newnode, node);
    1299          415 :     for (int word_num = 0; word_num < RT_BM_IDX(RT_NODE_MAX_SLOTS); word_num++)
    1300              :     {
    1301          332 :         bitmapword  bitmap = 0;
    1302              : 
    1303              :         /*
    1304              :          * Bit manipulation is a surprisingly large portion of the overhead in
    1305              :          * the naive implementation. Doing stores word-at-a-time removes a lot
    1306              :          * of that overhead.
    1307              :          */
    1308        21580 :         for (int bit = 0; bit < BITS_PER_BITMAPWORD; bit++)
    1309              :         {
    1310        21248 :             uint8       offset = n48->slot_idxs[i];
    1311              : 
    1312        21248 :             if (offset != RT_INVALID_SLOT_IDX)
    1313              :             {
    1314         5304 :                 bitmap |= ((bitmapword) 1 << bit);
    1315         5304 :                 new256->children[i] = n48->children[offset];
    1316              :             }
    1317              : 
    1318        21248 :             i++;
    1319              :         }
    1320              : 
    1321          332 :         new256->isset[word_num] = bitmap;
    1322              :     }
    1323              : 
    1324              :     /* free old node and update reference in parent */
    1325           83 :     *parent_slot = newnode.alloc;
    1326           83 :     RT_FREE_NODE(tree, node);
    1327              : 
    1328           83 :     return RT_ADD_CHILD_256(tree, newnode, chunk);
    1329              : }
    1330              : 
    1331              : static inline RT_PTR_ALLOC *
    1332        26778 : RT_ADD_CHILD_48(RT_RADIX_TREE * tree, RT_CHILD_PTR node, uint8 chunk)
    1333              : {
    1334        26778 :     RT_NODE_48 *n48 = (RT_NODE_48 *) node.local;
    1335              :     int         insertpos;
    1336        26778 :     int         idx = 0;
    1337              :     bitmapword  w,
    1338              :                 inverse;
    1339              : 
    1340              :     /* get the first word with at least one bit not set */
    1341        26778 :     for (int i = 0; i < RT_BM_IDX(RT_FANOUT_48_MAX); i++)
    1342              :     {
    1343        26778 :         w = n48->isset[i];
    1344        26778 :         if (w < ~((bitmapword) 0))
    1345              :         {
    1346        26778 :             idx = i;
    1347        26778 :             break;
    1348              :         }
    1349              :     }
    1350              : 
    1351              :     /* To get the first unset bit in w, get the first set bit in ~w */
    1352        26778 :     inverse = ~w;
    1353        26778 :     insertpos = idx * BITS_PER_BITMAPWORD;
    1354        26778 :     insertpos += bmw_rightmost_one_pos(inverse);
    1355              :     Assert(insertpos < n48->base.fanout);
    1356              : 
    1357              :     /* mark the slot used by setting the rightmost zero bit */
    1358        26778 :     n48->isset[idx] |= w + 1;
    1359              : 
    1360              :     /* insert new chunk into place */
    1361        26778 :     n48->slot_idxs[chunk] = insertpos;
    1362              : 
    1363        26778 :     n48->base.count++;
    1364        26778 :     RT_VERIFY_NODE((RT_NODE *) n48);
    1365              : 
    1366        26778 :     return &n48->children[insertpos];
    1367              : }
    1368              : 
    1369              : static pg_noinline RT_PTR_ALLOC *
    1370         4366 : RT_GROW_NODE_16(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node,
    1371              :                 uint8 chunk)
    1372              : {
    1373         4366 :     RT_NODE_16 *n16 = (RT_NODE_16 *) node.local;
    1374              :     int         insertpos;
    1375              : 
    1376         4366 :     if (n16->base.fanout < RT_FANOUT_16_HI)
    1377              :     {
    1378              :         RT_CHILD_PTR newnode;
    1379              :         RT_NODE_16 *new16;
    1380              : 
    1381              :         Assert(n16->base.fanout == RT_FANOUT_16_LO);
    1382              : 
    1383              :         /* initialize new node */
    1384         2257 :         newnode = RT_ALLOC_NODE(tree, RT_NODE_KIND_16, RT_CLASS_16_HI);
    1385         2257 :         new16 = (RT_NODE_16 *) newnode.local;
    1386              : 
    1387              :         /* copy over existing entries */
    1388         2257 :         RT_COPY_COMMON(newnode, node);
    1389              :         Assert(n16->base.count == RT_FANOUT_16_LO);
    1390         2257 :         insertpos = RT_NODE_16_GET_INSERTPOS(n16, chunk);
    1391         2257 :         RT_COPY_ARRAYS_FOR_INSERT(new16->chunks, new16->children,
    1392         2257 :                                   n16->chunks, n16->children,
    1393              :                                   RT_FANOUT_16_LO, insertpos);
    1394              : 
    1395              :         /* insert new chunk into place */
    1396         2257 :         new16->chunks[insertpos] = chunk;
    1397              : 
    1398         2257 :         new16->base.count++;
    1399         2257 :         RT_VERIFY_NODE((RT_NODE *) new16);
    1400              : 
    1401              :         /* free old node and update references */
    1402         2257 :         RT_FREE_NODE(tree, node);
    1403         2257 :         *parent_slot = newnode.alloc;
    1404              : 
    1405         2257 :         return &new16->children[insertpos];
    1406              :     }
    1407              :     else
    1408              :     {
    1409              :         RT_CHILD_PTR newnode;
    1410              :         RT_NODE_48 *new48;
    1411              :         int         idx,
    1412              :                     bit;
    1413              : 
    1414              :         Assert(n16->base.fanout == RT_FANOUT_16_HI);
    1415              : 
    1416              :         /* initialize new node */
    1417         2109 :         newnode = RT_ALLOC_NODE(tree, RT_NODE_KIND_48, RT_CLASS_48);
    1418         2109 :         new48 = (RT_NODE_48 *) newnode.local;
    1419              : 
    1420              :         /* copy over the entries */
    1421         2109 :         RT_COPY_COMMON(newnode, node);
    1422        69597 :         for (int i = 0; i < RT_FANOUT_16_HI; i++)
    1423        67488 :             new48->slot_idxs[n16->chunks[i]] = i;
    1424         2109 :         memcpy(&new48->children[0], &n16->children[0], RT_FANOUT_16_HI * sizeof(new48->children[0]));
    1425              : 
    1426              :         /*
    1427              :          * Since we just copied a dense array, we can fill "isset" using a
    1428              :          * single store, provided the length of that array is at most the
    1429              :          * number of bits in a bitmapword.
    1430              :          */
    1431              :         Assert(RT_FANOUT_16_HI <= BITS_PER_BITMAPWORD);
    1432         2109 :         new48->isset[0] = (bitmapword) (((uint64) 1 << RT_FANOUT_16_HI) - 1);
    1433              : 
    1434              :         /* put the new child at the end of the copied entries */
    1435         2109 :         insertpos = RT_FANOUT_16_HI;
    1436         2109 :         idx = RT_BM_IDX(insertpos);
    1437         2109 :         bit = RT_BM_BIT(insertpos);
    1438              : 
    1439              :         /* mark the slot used */
    1440         2109 :         new48->isset[idx] |= ((bitmapword) 1 << bit);
    1441              : 
    1442              :         /* insert new chunk into place */
    1443         2109 :         new48->slot_idxs[chunk] = insertpos;
    1444              : 
    1445         2109 :         new48->base.count++;
    1446         2109 :         RT_VERIFY_NODE((RT_NODE *) new48);
    1447              : 
    1448              :         /* free old node and update reference in parent */
    1449         2109 :         *parent_slot = newnode.alloc;
    1450         2109 :         RT_FREE_NODE(tree, node);
    1451              : 
    1452         2109 :         return &new48->children[insertpos];
    1453              :     }
    1454              : }
    1455              : 
    1456              : static inline RT_PTR_ALLOC *
    1457        58678 : RT_ADD_CHILD_16(RT_RADIX_TREE * tree, RT_CHILD_PTR node, uint8 chunk)
    1458              : {
    1459        58678 :     RT_NODE_16 *n16 = (RT_NODE_16 *) node.local;
    1460        58678 :     int         insertpos = RT_NODE_16_GET_INSERTPOS(n16, chunk);
    1461              : 
    1462              :     /* shift chunks and children */
    1463        58678 :     RT_SHIFT_ARRAYS_FOR_INSERT(n16->chunks, n16->children,
    1464        58678 :                                n16->base.count, insertpos);
    1465              : 
    1466              :     /* insert new chunk into place */
    1467        58678 :     n16->chunks[insertpos] = chunk;
    1468              : 
    1469        58678 :     n16->base.count++;
    1470        58678 :     RT_VERIFY_NODE((RT_NODE *) n16);
    1471              : 
    1472        58678 :     return &n16->children[insertpos];
    1473              : }
    1474              : 
    1475              : static pg_noinline RT_PTR_ALLOC *
    1476         2392 : RT_GROW_NODE_4(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node,
    1477              :                uint8 chunk)
    1478              : {
    1479         2392 :     RT_NODE_4  *n4 = (RT_NODE_4 *) (node.local);
    1480              :     RT_CHILD_PTR newnode;
    1481              :     RT_NODE_16 *new16;
    1482              :     int         insertpos;
    1483              : 
    1484              :     /* initialize new node */
    1485         2392 :     newnode = RT_ALLOC_NODE(tree, RT_NODE_KIND_16, RT_CLASS_16_LO);
    1486         2392 :     new16 = (RT_NODE_16 *) newnode.local;
    1487              : 
    1488              :     /* copy over existing entries */
    1489         2392 :     RT_COPY_COMMON(newnode, node);
    1490              :     Assert(n4->base.count == RT_FANOUT_4);
    1491         2392 :     insertpos = RT_NODE_4_GET_INSERTPOS(n4, chunk, RT_FANOUT_4);
    1492         2392 :     RT_COPY_ARRAYS_FOR_INSERT(new16->chunks, new16->children,
    1493         2392 :                               n4->chunks, n4->children,
    1494              :                               RT_FANOUT_4, insertpos);
    1495              : 
    1496              :     /* insert new chunk into place */
    1497         2392 :     new16->chunks[insertpos] = chunk;
    1498              : 
    1499         2392 :     new16->base.count++;
    1500         2392 :     RT_VERIFY_NODE((RT_NODE *) new16);
    1501              : 
    1502              :     /* free old node and update reference in parent */
    1503         2392 :     *parent_slot = newnode.alloc;
    1504         2392 :     RT_FREE_NODE(tree, node);
    1505              : 
    1506         2392 :     return &new16->children[insertpos];
    1507              : }
    1508              : 
    1509              : static inline RT_PTR_ALLOC *
    1510         8122 : RT_ADD_CHILD_4(RT_RADIX_TREE * tree, RT_CHILD_PTR node, uint8 chunk)
    1511              : {
    1512         8122 :     RT_NODE_4  *n4 = (RT_NODE_4 *) (node.local);
    1513         8122 :     int         count = n4->base.count;
    1514         8122 :     int         insertpos = RT_NODE_4_GET_INSERTPOS(n4, chunk, count);
    1515              : 
    1516              :     /* shift chunks and children */
    1517         8122 :     RT_SHIFT_ARRAYS_FOR_INSERT(n4->chunks, n4->children,
    1518              :                                count, insertpos);
    1519              : 
    1520              :     /* insert new chunk into place */
    1521         8122 :     n4->chunks[insertpos] = chunk;
    1522              : 
    1523         8122 :     n4->base.count++;
    1524         8122 :     RT_VERIFY_NODE((RT_NODE *) n4);
    1525              : 
    1526         8122 :     return &n4->children[insertpos];
    1527              : }
    1528              : 
    1529              : /*
    1530              :  * Reserve slot in "node"'s child array. The caller will populate it
    1531              :  * with the actual child pointer.
    1532              :  *
    1533              :  * "parent_slot" is the address of the parent's child pointer to "node".
    1534              :  * If the node we're inserting into needs to grow, we update the parent's
    1535              :  * child pointer with the pointer to the new larger node.
    1536              :  */
    1537              : static inline RT_PTR_ALLOC *
    1538       111821 : RT_NODE_INSERT(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node,
    1539              :                uint8 chunk)
    1540              : {
    1541       111821 :     RT_NODE    *n = node.local;
    1542              : 
    1543       111821 :     switch (n->kind)
    1544              :     {
    1545        10514 :         case RT_NODE_KIND_4:
    1546              :             {
    1547        10514 :                 if (unlikely(RT_NODE_MUST_GROW(n)))
    1548         2392 :                     return RT_GROW_NODE_4(tree, parent_slot, node, chunk);
    1549              : 
    1550         8122 :                 return RT_ADD_CHILD_4(tree, node, chunk);
    1551              :             }
    1552        63044 :         case RT_NODE_KIND_16:
    1553              :             {
    1554        63044 :                 if (unlikely(RT_NODE_MUST_GROW(n)))
    1555         4366 :                     return RT_GROW_NODE_16(tree, parent_slot, node, chunk);
    1556              : 
    1557        58678 :                 return RT_ADD_CHILD_16(tree, node, chunk);
    1558              :             }
    1559        26861 :         case RT_NODE_KIND_48:
    1560              :             {
    1561        26861 :                 if (unlikely(RT_NODE_MUST_GROW(n)))
    1562           83 :                     return RT_GROW_NODE_48(tree, parent_slot, node, chunk);
    1563              : 
    1564        26778 :                 return RT_ADD_CHILD_48(tree, node, chunk);
    1565              :             }
    1566        11402 :         case RT_NODE_KIND_256:
    1567        11402 :             return RT_ADD_CHILD_256(tree, node, chunk);
    1568            0 :         default:
    1569            0 :             pg_unreachable();
    1570              :     }
    1571              : }
    1572              : 
    1573              : /*
    1574              :  * The radix tree doesn't have sufficient height. Put new node(s) on top,
    1575              :  * and move the old node below it.
    1576              :  */
    1577              : static pg_noinline void
    1578           27 : RT_EXTEND_UP(RT_RADIX_TREE * tree, uint64 key)
    1579              : {
    1580           27 :     int         target_shift = RT_KEY_GET_SHIFT(key);
    1581           27 :     int         shift = tree->ctl->start_shift;
    1582              : 
    1583              :     Assert(shift < target_shift);
    1584              : 
    1585              :     /* Grow tree upwards until start shift can accommodate the key */
    1586           84 :     while (shift < target_shift)
    1587              :     {
    1588              :         RT_CHILD_PTR node;
    1589              :         RT_NODE_4  *n4;
    1590              : 
    1591           57 :         node = RT_ALLOC_NODE(tree, RT_NODE_KIND_4, RT_CLASS_4);
    1592           57 :         n4 = (RT_NODE_4 *) node.local;
    1593           57 :         n4->base.count = 1;
    1594           57 :         n4->chunks[0] = 0;
    1595           57 :         n4->children[0] = tree->ctl->root;
    1596              : 
    1597              :         /* Update the root */
    1598           57 :         tree->ctl->root = node.alloc;
    1599              : 
    1600           57 :         shift += RT_SPAN;
    1601              :     }
    1602              : 
    1603           27 :     tree->ctl->max_val = RT_SHIFT_GET_MAX_VAL(target_shift);
    1604           27 :     tree->ctl->start_shift = target_shift;
    1605           27 : }
    1606              : 
    1607              : /*
    1608              :  * Insert a chain of nodes until we reach the lowest level,
    1609              :  * and return the address of a slot to be filled further up
    1610              :  * the call stack.
    1611              :  */
    1612              : static pg_noinline RT_PTR_ALLOC *
    1613         4985 : RT_EXTEND_DOWN(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, uint64 key, int shift)
    1614              : {
    1615              :     RT_CHILD_PTR node,
    1616              :                 child;
    1617              :     RT_NODE_4  *n4;
    1618              : 
    1619              :     /*
    1620              :      * The child pointer of the first node in the chain goes in the
    1621              :      * caller-provided slot.
    1622              :      */
    1623         4985 :     child = RT_ALLOC_NODE(tree, RT_NODE_KIND_4, RT_CLASS_4);
    1624         4985 :     *parent_slot = child.alloc;
    1625              : 
    1626         4985 :     node = child;
    1627         4985 :     shift -= RT_SPAN;
    1628              : 
    1629        15729 :     while (shift > 0)
    1630              :     {
    1631        10744 :         child = RT_ALLOC_NODE(tree, RT_NODE_KIND_4, RT_CLASS_4);
    1632              : 
    1633              :         /* We open-code the insertion ourselves, for speed. */
    1634        10744 :         n4 = (RT_NODE_4 *) node.local;
    1635        10744 :         n4->base.count = 1;
    1636        10744 :         n4->chunks[0] = RT_GET_KEY_CHUNK(key, shift);
    1637        10744 :         n4->children[0] = child.alloc;
    1638              : 
    1639        10744 :         node = child;
    1640        10744 :         shift -= RT_SPAN;
    1641              :     }
    1642              :     Assert(shift == 0);
    1643              : 
    1644              :     /* Reserve slot for the value. */
    1645         4985 :     n4 = (RT_NODE_4 *) node.local;
    1646         4985 :     n4->chunks[0] = RT_GET_KEY_CHUNK(key, 0);
    1647         4985 :     n4->base.count = 1;
    1648              : 
    1649         4985 :     return &n4->children[0];
    1650              : }
    1651              : 
    1652              : /*
    1653              :  * Workhorse for RT_SET
    1654              :  *
    1655              :  * "parent_slot" is the address of the child pointer we just followed,
    1656              :  * in the parent's array of children, needed if inserting into the
    1657              :  * current node causes it to grow.
    1658              :  */
    1659              : static RT_PTR_ALLOC *
    1660       431765 : RT_GET_SLOT_RECURSIVE(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, uint64 key, int shift, bool *found)
    1661              : {
    1662              :     RT_PTR_ALLOC *slot;
    1663              :     RT_CHILD_PTR node;
    1664       431765 :     uint8       chunk = RT_GET_KEY_CHUNK(key, shift);
    1665              : 
    1666       431765 :     node.alloc = *parent_slot;
    1667       431765 :     RT_PTR_SET_LOCAL(tree, &node);
    1668       431765 :     slot = RT_NODE_SEARCH(node.local, chunk);
    1669              : 
    1670       431765 :     if (slot == NULL)
    1671              :     {
    1672       111821 :         *found = false;
    1673              : 
    1674              :         /* reserve slot for the caller to populate */
    1675              : 
    1676       111821 :         slot = RT_NODE_INSERT(tree, parent_slot, node, chunk);
    1677              : 
    1678       111821 :         if (shift == 0)
    1679       106850 :             return slot;
    1680              :         else
    1681         4971 :             return RT_EXTEND_DOWN(tree, slot, key, shift);
    1682              :     }
    1683              :     else
    1684              :     {
    1685       319944 :         if (shift == 0)
    1686              :         {
    1687        11166 :             *found = true;
    1688        11166 :             return slot;
    1689              :         }
    1690              :         else
    1691       308778 :             return RT_GET_SLOT_RECURSIVE(tree, slot, key, shift - RT_SPAN, found);
    1692              :     }
    1693              : }
    1694              : 
    1695              : /*
    1696              :  * Set key to value that value_p points to. If the entry already exists, we
    1697              :  * update its value and return true. Returns false if entry doesn't yet exist.
    1698              :  *
    1699              :  * Taking a lock in exclusive mode is the caller's responsibility.
    1700              :  */
    1701              : RT_SCOPE bool
    1702       123001 : RT_SET(RT_RADIX_TREE * tree, uint64 key, RT_VALUE_TYPE * value_p)
    1703              : {
    1704              :     bool        found;
    1705              :     RT_PTR_ALLOC *slot;
    1706       123001 :     size_t      value_sz = RT_GET_VALUE_SIZE(value_p);
    1707              : 
    1708              : #ifdef RT_SHMEM
    1709              :     Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
    1710              : #endif
    1711              : 
    1712              :     Assert(RT_PTR_ALLOC_IS_VALID(tree->ctl->root));
    1713              : 
    1714              :     /* Extend the tree if necessary */
    1715       123001 :     if (unlikely(key > tree->ctl->max_val))
    1716              :     {
    1717           41 :         if (tree->ctl->num_keys == 0)
    1718              :         {
    1719              :             RT_CHILD_PTR node;
    1720              :             RT_NODE_4  *n4;
    1721           14 :             int         start_shift = RT_KEY_GET_SHIFT(key);
    1722              : 
    1723              :             /*
    1724              :              * With an empty root node, we don't extend the tree upwards,
    1725              :              * since that would result in orphan empty nodes. Instead we open
    1726              :              * code inserting into the root node and extend downward from
    1727              :              * there.
    1728              :              */
    1729           14 :             node.alloc = tree->ctl->root;
    1730           14 :             RT_PTR_SET_LOCAL(tree, &node);
    1731           14 :             n4 = (RT_NODE_4 *) node.local;
    1732           14 :             n4->base.count = 1;
    1733           14 :             n4->chunks[0] = RT_GET_KEY_CHUNK(key, start_shift);
    1734              : 
    1735           14 :             slot = RT_EXTEND_DOWN(tree, &n4->children[0], key, start_shift);
    1736           14 :             found = false;
    1737           14 :             tree->ctl->start_shift = start_shift;
    1738           14 :             tree->ctl->max_val = RT_SHIFT_GET_MAX_VAL(start_shift);
    1739           14 :             goto have_slot;
    1740              :         }
    1741              :         else
    1742           27 :             RT_EXTEND_UP(tree, key);
    1743              :     }
    1744              : 
    1745       122987 :     slot = RT_GET_SLOT_RECURSIVE(tree, &tree->ctl->root,
    1746       122987 :                                  key, tree->ctl->start_shift, &found);
    1747              : 
    1748       123001 : have_slot:
    1749              :     Assert(slot != NULL);
    1750              : 
    1751       123001 :     if (RT_VALUE_IS_EMBEDDABLE(value_p))
    1752              :     {
    1753              :         /* free the existing leaf */
    1754       108830 :         if (found && !RT_CHILDPTR_IS_VALUE(*slot))
    1755            1 :             RT_FREE_LEAF(tree, *slot);
    1756              : 
    1757              :         /* store value directly in child pointer slot */
    1758       108830 :         memcpy(slot, value_p, value_sz);
    1759              : 
    1760              : #ifdef RT_RUNTIME_EMBEDDABLE_VALUE
    1761              :         /* tag child pointer */
    1762              : #ifdef RT_SHMEM
    1763            2 :         *slot |= 1;
    1764              : #else
    1765         2294 :         *((uintptr_t *) slot) |= 1;
    1766              : #endif
    1767              : #endif
    1768              :     }
    1769              :     else
    1770              :     {
    1771              :         RT_CHILD_PTR leaf;
    1772              : 
    1773        14171 :         if (found && !RT_CHILDPTR_IS_VALUE(*slot))
    1774              :         {
    1775              :             Assert(RT_PTR_ALLOC_IS_VALID(*slot));
    1776            2 :             leaf.alloc = *slot;
    1777            2 :             RT_PTR_SET_LOCAL(tree, &leaf);
    1778              : 
    1779            2 :             if (RT_GET_VALUE_SIZE((RT_VALUE_TYPE *) leaf.local) != value_sz)
    1780              :             {
    1781              :                 /*
    1782              :                  * different sizes, so first free the existing leaf before
    1783              :                  * allocating a new one
    1784              :                  */
    1785            2 :                 RT_FREE_LEAF(tree, *slot);
    1786            2 :                 leaf = RT_ALLOC_LEAF(tree, value_sz);
    1787            2 :                 *slot = leaf.alloc;
    1788              :             }
    1789              :         }
    1790              :         else
    1791              :         {
    1792              :             /* allocate new leaf and store it in the child array */
    1793        14169 :             leaf = RT_ALLOC_LEAF(tree, value_sz);
    1794        14169 :             *slot = leaf.alloc;
    1795              :         }
    1796              : 
    1797        14171 :         memcpy(leaf.local, value_p, value_sz);
    1798              :     }
    1799              : 
    1800              :     /* Update the statistics */
    1801       123001 :     if (!found)
    1802       111835 :         tree->ctl->num_keys++;
    1803              : 
    1804       123001 :     return found;
    1805              : }
    1806              : 
    1807              : /***************** SETUP / TEARDOWN *****************/
    1808              : 
    1809              : /*
    1810              :  * Create the radix tree root in the caller's memory context and return it.
    1811              :  *
    1812              :  * The tree's nodes and leaves are allocated in "ctx" and its children for
    1813              :  * local memory, or in "dsa" for shared memory.
    1814              :  */
    1815              : RT_SCOPE    RT_RADIX_TREE *
    1816              : #ifdef RT_SHMEM
    1817           29 : RT_CREATE(dsa_area *dsa, int tranche_id)
    1818              : #else
    1819       107086 : RT_CREATE(MemoryContext ctx)
    1820              : #endif
    1821              : {
    1822              :     RT_RADIX_TREE *tree;
    1823              :     RT_CHILD_PTR rootnode;
    1824              : #ifdef RT_SHMEM
    1825              :     dsa_pointer dp;
    1826              : #endif
    1827              : 
    1828       107115 :     tree = palloc0_object(RT_RADIX_TREE);
    1829              : 
    1830              : #ifdef RT_SHMEM
    1831           29 :     tree->dsa = dsa;
    1832           29 :     dp = dsa_allocate0(dsa, sizeof(RT_RADIX_TREE_CONTROL));
    1833           29 :     tree->ctl = (RT_RADIX_TREE_CONTROL *) dsa_get_address(dsa, dp);
    1834           29 :     tree->ctl->handle = dp;
    1835           29 :     tree->ctl->magic = RT_RADIX_TREE_MAGIC;
    1836           29 :     LWLockInitialize(&tree->ctl->lock, tranche_id);
    1837              : #else
    1838       107086 :     tree->ctl = palloc0_object(RT_RADIX_TREE_CONTROL);
    1839              : 
    1840              :     /* Create a slab context for each size class */
    1841       642516 :     for (int i = 0; i < RT_NUM_SIZE_CLASSES; i++)
    1842              :     {
    1843       535430 :         RT_SIZE_CLASS_ELEM size_class = RT_SIZE_CLASS_INFO[i];
    1844       535430 :         size_t      inner_blocksize = RT_SLAB_BLOCK_SIZE(size_class.allocsize);
    1845              : 
    1846       535430 :         tree->node_slabs[i] = SlabContextCreate(ctx,
    1847              :                                                 size_class.name,
    1848              :                                                 inner_blocksize,
    1849              :                                                 size_class.allocsize);
    1850              :     }
    1851              : 
    1852       107086 :     tree->leaf_context = ctx;
    1853              : #endif                          /* RT_SHMEM */
    1854              : 
    1855              :     /* add root node now so that RT_SET can assume it exists */
    1856       107115 :     rootnode = RT_ALLOC_NODE(tree, RT_NODE_KIND_4, RT_CLASS_4);
    1857       107115 :     tree->ctl->root = rootnode.alloc;
    1858       107115 :     tree->ctl->start_shift = 0;
    1859       107115 :     tree->ctl->max_val = RT_SHIFT_GET_MAX_VAL(0);
    1860              : 
    1861       107115 :     return tree;
    1862              : }
    1863              : 
    1864              : #ifdef RT_SHMEM
    1865              : RT_SCOPE    RT_RADIX_TREE *
    1866           28 : RT_ATTACH(dsa_area *dsa, RT_HANDLE handle)
    1867              : {
    1868              :     RT_RADIX_TREE *tree;
    1869              :     dsa_pointer control;
    1870              : 
    1871           28 :     tree = palloc0_object(RT_RADIX_TREE);
    1872              : 
    1873              :     /* Find the control object in shared memory */
    1874           28 :     control = handle;
    1875              : 
    1876           28 :     tree->dsa = dsa;
    1877           28 :     tree->ctl = (RT_RADIX_TREE_CONTROL *) dsa_get_address(dsa, control);
    1878              :     Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
    1879              : 
    1880           28 :     return tree;
    1881              : }
    1882              : 
    1883              : RT_SCOPE void
    1884           28 : RT_DETACH(RT_RADIX_TREE * tree)
    1885              : {
    1886              :     Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
    1887           28 :     pfree(tree);
    1888           28 : }
    1889              : 
    1890              : RT_SCOPE    RT_HANDLE
    1891           28 : RT_GET_HANDLE(RT_RADIX_TREE * tree)
    1892              : {
    1893              :     Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
    1894           28 :     return tree->ctl->handle;
    1895              : }
    1896              : 
    1897              : RT_SCOPE void
    1898          103 : RT_LOCK_EXCLUSIVE(RT_RADIX_TREE * tree)
    1899              : {
    1900              :     Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
    1901          103 :     LWLockAcquire(&tree->ctl->lock, LW_EXCLUSIVE);
    1902          103 : }
    1903              : 
    1904              : RT_SCOPE void
    1905       210842 : RT_LOCK_SHARE(RT_RADIX_TREE * tree)
    1906              : {
    1907              :     Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
    1908       210842 :     LWLockAcquire(&tree->ctl->lock, LW_SHARED);
    1909       210842 : }
    1910              : 
    1911              : RT_SCOPE void
    1912       210945 : RT_UNLOCK(RT_RADIX_TREE * tree)
    1913              : {
    1914              :     Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
    1915       210945 :     LWLockRelease(&tree->ctl->lock);
    1916       210945 : }
    1917              : 
    1918              : /*
    1919              :  * Recursively free all nodes allocated in the DSA area.
    1920              :  */
    1921              : static void
    1922           34 : RT_FREE_RECURSE(RT_RADIX_TREE * tree, RT_PTR_ALLOC ptr, int shift)
    1923              : {
    1924              :     RT_CHILD_PTR node;
    1925              : 
    1926           34 :     check_stack_depth();
    1927              : 
    1928           34 :     node.alloc = ptr;
    1929           34 :     RT_PTR_SET_LOCAL(tree, &node);
    1930              : 
    1931           34 :     switch (node.local->kind)
    1932              :     {
    1933           25 :         case RT_NODE_KIND_4:
    1934              :             {
    1935           25 :                 RT_NODE_4  *n4 = (RT_NODE_4 *) node.local;
    1936              : 
    1937           37 :                 for (int i = 0; i < n4->base.count; i++)
    1938              :                 {
    1939           12 :                     RT_PTR_ALLOC child = n4->children[i];
    1940              : 
    1941           12 :                     if (shift > 0)
    1942            5 :                         RT_FREE_RECURSE(tree, child, shift - RT_SPAN);
    1943            7 :                     else if (!RT_CHILDPTR_IS_VALUE(child))
    1944            5 :                         dsa_free(tree->dsa, child);
    1945              :                 }
    1946              : 
    1947           25 :                 break;
    1948              :             }
    1949            2 :         case RT_NODE_KIND_16:
    1950              :             {
    1951            2 :                 RT_NODE_16 *n16 = (RT_NODE_16 *) node.local;
    1952              : 
    1953           37 :                 for (int i = 0; i < n16->base.count; i++)
    1954              :                 {
    1955           35 :                     RT_PTR_ALLOC child = n16->children[i];
    1956              : 
    1957           35 :                     if (shift > 0)
    1958            0 :                         RT_FREE_RECURSE(tree, child, shift - RT_SPAN);
    1959           35 :                     else if (!RT_CHILDPTR_IS_VALUE(child))
    1960           35 :                         dsa_free(tree->dsa, child);
    1961              :                 }
    1962              : 
    1963            2 :                 break;
    1964              :             }
    1965            3 :         case RT_NODE_KIND_48:
    1966              :             {
    1967            3 :                 RT_NODE_48 *n48 = (RT_NODE_48 *) node.local;
    1968              : 
    1969          771 :                 for (int i = 0; i < RT_NODE_MAX_SLOTS; i++)
    1970              :                 {
    1971              :                     RT_PTR_ALLOC child;
    1972              : 
    1973          768 :                     if (!RT_NODE_48_IS_CHUNK_USED(n48, i))
    1974          633 :                         continue;
    1975              : 
    1976          135 :                     child = *RT_NODE_48_GET_CHILD(n48, i);
    1977              : 
    1978          135 :                     if (shift > 0)
    1979            0 :                         RT_FREE_RECURSE(tree, child, shift - RT_SPAN);
    1980          135 :                     else if (!RT_CHILDPTR_IS_VALUE(child))
    1981          135 :                         dsa_free(tree->dsa, child);
    1982              :                 }
    1983              : 
    1984            3 :                 break;
    1985              :             }
    1986            4 :         case RT_NODE_KIND_256:
    1987              :             {
    1988            4 :                 RT_NODE_256 *n256 = (RT_NODE_256 *) node.local;
    1989              : 
    1990         1028 :                 for (int i = 0; i < RT_NODE_MAX_SLOTS; i++)
    1991              :                 {
    1992              :                     RT_PTR_ALLOC child;
    1993              : 
    1994         1024 :                     if (!RT_NODE_256_IS_CHUNK_USED(n256, i))
    1995          517 :                         continue;
    1996              : 
    1997          507 :                     child = *RT_NODE_256_GET_CHILD(n256, i);
    1998              : 
    1999          507 :                     if (shift > 0)
    2000            0 :                         RT_FREE_RECURSE(tree, child, shift - RT_SPAN);
    2001          507 :                     else if (!RT_CHILDPTR_IS_VALUE(child))
    2002          507 :                         dsa_free(tree->dsa, child);
    2003              :                 }
    2004              : 
    2005            4 :                 break;
    2006              :             }
    2007              :     }
    2008              : 
    2009              :     /* Free the inner node */
    2010           34 :     dsa_free(tree->dsa, ptr);
    2011           34 : }
    2012              : #endif
    2013              : 
    2014              : /*
    2015              :  * Free the radix tree, including all nodes and leaves.
    2016              :  */
    2017              : RT_SCOPE void
    2018          699 : RT_FREE(RT_RADIX_TREE * tree)
    2019              : {
    2020              : #ifdef RT_SHMEM
    2021              :     Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
    2022              : 
    2023              :     /* Free all memory used for radix tree nodes */
    2024              :     Assert(RT_PTR_ALLOC_IS_VALID(tree->ctl->root));
    2025           29 :     RT_FREE_RECURSE(tree, tree->ctl->root, tree->ctl->start_shift);
    2026              : 
    2027              :     /*
    2028              :      * Vandalize the control block to help catch programming error where other
    2029              :      * backends access the memory formerly occupied by this radix tree.
    2030              :      */
    2031           29 :     tree->ctl->magic = 0;
    2032           29 :     dsa_free(tree->dsa, tree->ctl->handle);
    2033              : #else
    2034              :     /*
    2035              :      * Free all space allocated within the leaf context and delete all child
    2036              :      * contexts such as those used for nodes.
    2037              :      */
    2038          670 :     MemoryContextReset(tree->leaf_context);
    2039              : 
    2040          670 :     pfree(tree->ctl);
    2041              : #endif
    2042          699 :     pfree(tree);
    2043          699 : }
    2044              : 
    2045              : /***************** ITERATION *****************/
    2046              : 
    2047              : /*
    2048              :  * Create and return an iterator for the given radix tree
    2049              :  * in the caller's memory context.
    2050              :  *
    2051              :  * Taking a lock in shared mode during the iteration is the caller's
    2052              :  * responsibility.
    2053              :  */
    2054              : RT_SCOPE    RT_ITER *
    2055          676 : RT_BEGIN_ITERATE(RT_RADIX_TREE * tree)
    2056              : {
    2057              :     RT_ITER    *iter;
    2058              :     RT_CHILD_PTR root;
    2059              : 
    2060          676 :     iter = palloc0_object(RT_ITER);
    2061          676 :     iter->tree = tree;
    2062              : 
    2063              :     Assert(RT_PTR_ALLOC_IS_VALID(tree->ctl->root));
    2064          676 :     root.alloc = iter->tree->ctl->root;
    2065          676 :     RT_PTR_SET_LOCAL(tree, &root);
    2066              : 
    2067          676 :     iter->top_level = iter->tree->ctl->start_shift / RT_SPAN;
    2068              : 
    2069              :     /* Set the root to start */
    2070          676 :     iter->cur_level = iter->top_level;
    2071          676 :     iter->node_iters[iter->cur_level].node = root;
    2072          676 :     iter->node_iters[iter->cur_level].idx = 0;
    2073              : 
    2074          676 :     return iter;
    2075              : }
    2076              : 
    2077              : /*
    2078              :  * Scan the inner node and return the next child pointer if one exists, otherwise
    2079              :  * return NULL.
    2080              :  */
    2081              : static inline RT_PTR_ALLOC *
    2082       127861 : RT_NODE_ITERATE_NEXT(RT_ITER * iter, int level)
    2083              : {
    2084       127861 :     uint8       key_chunk = 0;
    2085              :     RT_NODE_ITER *node_iter;
    2086              :     RT_CHILD_PTR node;
    2087       127861 :     RT_PTR_ALLOC *slot = NULL;
    2088              : 
    2089              : #ifdef RT_SHMEM
    2090              :     Assert(iter->tree->ctl->magic == RT_RADIX_TREE_MAGIC);
    2091              : #endif
    2092              : 
    2093       127861 :     node_iter = &(iter->node_iters[level]);
    2094       127861 :     node = node_iter->node;
    2095              : 
    2096              :     Assert(node.local != NULL);
    2097              : 
    2098       127861 :     switch ((node.local)->kind)
    2099              :     {
    2100        16700 :         case RT_NODE_KIND_4:
    2101              :             {
    2102        16700 :                 RT_NODE_4  *n4 = (RT_NODE_4 *) (node.local);
    2103              : 
    2104        16700 :                 if (node_iter->idx >= n4->base.count)
    2105         8178 :                     return NULL;
    2106              : 
    2107         8522 :                 slot = &n4->children[node_iter->idx];
    2108         8522 :                 key_chunk = n4->chunks[node_iter->idx];
    2109         8522 :                 node_iter->idx++;
    2110         8522 :                 break;
    2111              :             }
    2112         5372 :         case RT_NODE_KIND_16:
    2113              :             {
    2114         5372 :                 RT_NODE_16 *n16 = (RT_NODE_16 *) (node.local);
    2115              : 
    2116         5372 :                 if (node_iter->idx >= n16->base.count)
    2117          267 :                     return NULL;
    2118              : 
    2119         5105 :                 slot = &n16->children[node_iter->idx];
    2120         5105 :                 key_chunk = n16->chunks[node_iter->idx];
    2121         5105 :                 node_iter->idx++;
    2122         5105 :                 break;
    2123              :             }
    2124        92730 :         case RT_NODE_KIND_48:
    2125              :             {
    2126        92730 :                 RT_NODE_48 *n48 = (RT_NODE_48 *) (node.local);
    2127              :                 int         chunk;
    2128              : 
    2129       519497 :                 for (chunk = node_iter->idx; chunk < RT_NODE_MAX_SLOTS; chunk++)
    2130              :                 {
    2131       517478 :                     if (RT_NODE_48_IS_CHUNK_USED(n48, chunk))
    2132        90711 :                         break;
    2133              :                 }
    2134              : 
    2135        92730 :                 if (chunk >= RT_NODE_MAX_SLOTS)
    2136         2019 :                     return NULL;
    2137              : 
    2138        90711 :                 slot = RT_NODE_48_GET_CHILD(n48, chunk);
    2139              : 
    2140        90711 :                 key_chunk = chunk;
    2141        90711 :                 node_iter->idx = chunk + 1;
    2142        90711 :                 break;
    2143              :             }
    2144        13059 :         case RT_NODE_KIND_256:
    2145              :             {
    2146        13059 :                 RT_NODE_256 *n256 = (RT_NODE_256 *) (node.local);
    2147              :                 int         chunk;
    2148              : 
    2149        18497 :                 for (chunk = node_iter->idx; chunk < RT_NODE_MAX_SLOTS; chunk++)
    2150              :                 {
    2151        18432 :                     if (RT_NODE_256_IS_CHUNK_USED(n256, chunk))
    2152        12994 :                         break;
    2153              :                 }
    2154              : 
    2155        13059 :                 if (chunk >= RT_NODE_MAX_SLOTS)
    2156           65 :                     return NULL;
    2157              : 
    2158        12994 :                 slot = RT_NODE_256_GET_CHILD(n256, chunk);
    2159              : 
    2160        12994 :                 key_chunk = chunk;
    2161        12994 :                 node_iter->idx = chunk + 1;
    2162        12994 :                 break;
    2163              :             }
    2164              :     }
    2165              : 
    2166              :     /* Update the key */
    2167       117332 :     iter->key &= ~(((uint64) RT_CHUNK_MASK) << (level * RT_SPAN));
    2168       117332 :     iter->key |= (((uint64) key_chunk) << (level * RT_SPAN));
    2169              : 
    2170       117332 :     return slot;
    2171              : }
    2172              : 
    2173              : /*
    2174              :  * Return pointer to value and set key_p as long as there is a key.  Otherwise
    2175              :  * return NULL.
    2176              :  */
    2177              : RT_SCOPE    RT_VALUE_TYPE *
    2178       108010 : RT_ITERATE_NEXT(RT_ITER * iter, uint64 *key_p)
    2179              : {
    2180       108010 :     RT_PTR_ALLOC *slot = NULL;
    2181              : 
    2182       128506 :     while (iter->cur_level <= iter->top_level)
    2183              :     {
    2184              :         RT_CHILD_PTR node;
    2185              : 
    2186       127861 :         slot = RT_NODE_ITERATE_NEXT(iter, iter->cur_level);
    2187              : 
    2188       127861 :         if (iter->cur_level == 0 && slot != NULL)
    2189              :         {
    2190              :             /* Found a value at the leaf node */
    2191       107365 :             *key_p = iter->key;
    2192       107365 :             node.alloc = *slot;
    2193              : 
    2194       107365 :             if (RT_CHILDPTR_IS_VALUE(*slot))
    2195       107365 :                 return (RT_VALUE_TYPE *) slot;
    2196              :             else
    2197              :             {
    2198        11887 :                 RT_PTR_SET_LOCAL(iter->tree, &node);
    2199        11887 :                 return (RT_VALUE_TYPE *) node.local;
    2200              :             }
    2201              :         }
    2202              : 
    2203        20496 :         if (slot != NULL)
    2204              :         {
    2205              :             /* Found the child slot, move down the tree */
    2206         9967 :             node.alloc = *slot;
    2207         9967 :             RT_PTR_SET_LOCAL(iter->tree, &node);
    2208              : 
    2209         9967 :             iter->cur_level--;
    2210         9967 :             iter->node_iters[iter->cur_level].node = node;
    2211         9967 :             iter->node_iters[iter->cur_level].idx = 0;
    2212              :         }
    2213              :         else
    2214              :         {
    2215              :             /* Not found the child slot, move up the tree */
    2216        10529 :             iter->cur_level++;
    2217              :         }
    2218              :     }
    2219              : 
    2220              :     /* We've visited all nodes, so the iteration finished */
    2221          645 :     return NULL;
    2222              : }
    2223              : 
    2224              : /*
    2225              :  * Terminate the iteration. The caller is responsible for releasing any locks.
    2226              :  */
    2227              : RT_SCOPE void
    2228          676 : RT_END_ITERATE(RT_ITER * iter)
    2229              : {
    2230          676 :     pfree(iter);
    2231          676 : }
    2232              : 
    2233              : /***************** DELETION *****************/
    2234              : 
    2235              : #ifdef RT_USE_DELETE
    2236              : 
    2237              : /* Delete the element at "deletepos" */
    2238              : static inline void
    2239        22015 : RT_SHIFT_ARRAYS_AND_DELETE(uint8 *chunks, RT_PTR_ALLOC * children, int count, int deletepos)
    2240              : {
    2241              :     /*
    2242              :      * This is basically a memmove, but written in a simple loop for speed on
    2243              :      * small inputs.
    2244              :      */
    2245        99803 :     for (int i = deletepos; i < count - 1; i++)
    2246              :     {
    2247              :         /* workaround for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101481 */
    2248              : #ifdef __GNUC__
    2249        77788 :         __asm__("");
    2250              : #endif
    2251        77788 :         chunks[i] = chunks[i + 1];
    2252        77788 :         children[i] = children[i + 1];
    2253              :     }
    2254        22015 : }
    2255              : 
    2256              : /*
    2257              :  * Copy both chunk and slot arrays into the right
    2258              :  * place. The element at "deletepos" is deleted by skipping it.
    2259              :  */
    2260              : static inline void
    2261         2081 : RT_COPY_ARRAYS_AND_DELETE(uint8 *dst_chunks, RT_PTR_ALLOC * dst_children,
    2262              :                           uint8 *src_chunks, RT_PTR_ALLOC * src_children,
    2263              :                           int count, int deletepos)
    2264              : {
    2265         8324 :     for (int i = 0; i < count - 1; i++)
    2266              :     {
    2267              :         /*
    2268              :          * use a branch-free computation to skip the index of the deleted
    2269              :          * element
    2270              :          */
    2271         6243 :         int         sourceidx = i + (i >= deletepos);
    2272         6243 :         int         destidx = i;
    2273              : 
    2274         6243 :         dst_chunks[destidx] = src_chunks[sourceidx];
    2275         6243 :         dst_children[destidx] = src_children[sourceidx];
    2276              :     }
    2277         2081 : }
    2278              : 
    2279              : /*
    2280              :  * Note: While all node-growing functions are called to perform an insertion
    2281              :  * when no more space is available, shrinking is not a hard-and-fast requirement.
    2282              :  * When shrinking nodes, we generally wait until the count is about 3/4 of
    2283              :  * the next lower node's fanout. This prevents ping-ponging between different
    2284              :  * node sizes.
    2285              :  *
    2286              :  * Some shrinking functions delete first and then shrink, either because we
    2287              :  * must or because it's fast and simple that way. Sometimes it's faster to
    2288              :  * delete while shrinking.
    2289              :  */
    2290              : 
    2291              : /*
    2292              :  * Move contents of a node256 to a node48. Any deletion should have happened
    2293              :  * in the caller.
    2294              :  */
    2295              : static void pg_noinline
    2296           16 : RT_SHRINK_NODE_256(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node, uint8 chunk)
    2297              : {
    2298           16 :     RT_NODE_256 *n256 = (RT_NODE_256 *) node.local;
    2299              :     RT_CHILD_PTR newnode;
    2300              :     RT_NODE_48 *new48;
    2301           16 :     int         slot_idx = 0;
    2302              : 
    2303              :     /* initialize new node */
    2304           16 :     newnode = RT_ALLOC_NODE(tree, RT_NODE_KIND_48, RT_CLASS_48);
    2305           16 :     new48 = (RT_NODE_48 *) newnode.local;
    2306              : 
    2307              :     /* copy over the entries */
    2308           16 :     RT_COPY_COMMON(newnode, node);
    2309         4112 :     for (int i = 0; i < RT_NODE_MAX_SLOTS; i++)
    2310              :     {
    2311         4096 :         if (RT_NODE_256_IS_CHUNK_USED(n256, i))
    2312              :         {
    2313          768 :             new48->slot_idxs[i] = slot_idx;
    2314          768 :             new48->children[slot_idx] = n256->children[i];
    2315          768 :             slot_idx++;
    2316              :         }
    2317              :     }
    2318              : 
    2319              :     /*
    2320              :      * Since we just copied a dense array, we can fill "isset" using a single
    2321              :      * store, provided the length of that array is at most the number of bits
    2322              :      * in a bitmapword.
    2323              :      */
    2324              :     Assert(n256->base.count <= BITS_PER_BITMAPWORD);
    2325           16 :     new48->isset[0] = (bitmapword) (((uint64) 1 << n256->base.count) - 1);
    2326              : 
    2327              :     /* free old node and update reference in parent */
    2328           16 :     *parent_slot = newnode.alloc;
    2329           16 :     RT_FREE_NODE(tree, node);
    2330           16 : }
    2331              : 
    2332              : static inline void
    2333         4486 : RT_REMOVE_CHILD_256(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node, uint8 chunk)
    2334              : {
    2335              :     int         shrink_threshold;
    2336         4486 :     RT_NODE_256 *n256 = (RT_NODE_256 *) node.local;
    2337         4486 :     int         idx = RT_BM_IDX(chunk);
    2338         4486 :     int         bitnum = RT_BM_BIT(chunk);
    2339              : 
    2340              :     /* Mark the slot free for "chunk" */
    2341         4486 :     n256->isset[idx] &= ~((bitmapword) 1 << bitnum);
    2342              : 
    2343         4486 :     n256->base.count--;
    2344              : 
    2345              :     /*
    2346              :      * A full node256 will have a count of zero because of overflow, so we
    2347              :      * delete first before checking the shrink threshold.
    2348              :      */
    2349              :     Assert(n256->base.count > 0);
    2350              : 
    2351              :     /* This simplifies RT_SHRINK_NODE_256() */
    2352         4486 :     shrink_threshold = BITS_PER_BITMAPWORD;
    2353         4486 :     shrink_threshold = Min(RT_FANOUT_48 / 4 * 3, shrink_threshold);
    2354              : 
    2355         4486 :     if (n256->base.count <= shrink_threshold)
    2356           16 :         RT_SHRINK_NODE_256(tree, parent_slot, node, chunk);
    2357         4486 : }
    2358              : 
    2359              : /*
    2360              :  * Move contents of a node48 to a node16. Any deletion should have happened
    2361              :  * in the caller.
    2362              :  */
    2363              : static void pg_noinline
    2364         2025 : RT_SHRINK_NODE_48(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node, uint8 chunk)
    2365              : {
    2366         2025 :     RT_NODE_48 *n48 = (RT_NODE_48 *) (node.local);
    2367              :     RT_CHILD_PTR newnode;
    2368              :     RT_NODE_16 *new16;
    2369         2025 :     int         destidx = 0;
    2370              : 
    2371              :     /*
    2372              :      * Initialize new node. For now we skip the larger node16 size class for
    2373              :      * simplicity.
    2374              :      */
    2375         2025 :     newnode = RT_ALLOC_NODE(tree, RT_NODE_KIND_16, RT_CLASS_16_LO);
    2376         2025 :     new16 = (RT_NODE_16 *) newnode.local;
    2377              : 
    2378              :     /* copy over all existing entries */
    2379         2025 :     RT_COPY_COMMON(newnode, node);
    2380       520425 :     for (int chunk = 0; chunk < RT_NODE_MAX_SLOTS; chunk++)
    2381              :     {
    2382       518400 :         if (n48->slot_idxs[chunk] != RT_INVALID_SLOT_IDX)
    2383              :         {
    2384        24300 :             new16->chunks[destidx] = chunk;
    2385        24300 :             new16->children[destidx] = n48->children[n48->slot_idxs[chunk]];
    2386        24300 :             destidx++;
    2387              :         }
    2388              :     }
    2389              : 
    2390              :     Assert(destidx < new16->base.fanout);
    2391              : 
    2392         2025 :     RT_VERIFY_NODE((RT_NODE *) new16);
    2393              : 
    2394              :     /* free old node and update reference in parent */
    2395         2025 :     *parent_slot = newnode.alloc;
    2396         2025 :     RT_FREE_NODE(tree, node);
    2397         2025 : }
    2398              : 
    2399              : static inline void
    2400        66763 : RT_REMOVE_CHILD_48(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node, uint8 chunk)
    2401              : {
    2402        66763 :     RT_NODE_48 *n48 = (RT_NODE_48 *) node.local;
    2403        66763 :     int         deletepos = n48->slot_idxs[chunk];
    2404              : 
    2405              :     /* For now we skip the larger node16 size class for simplicity */
    2406        66763 :     int         shrink_threshold = RT_FANOUT_16_LO / 4 * 3;
    2407              :     int         idx;
    2408              :     int         bitnum;
    2409              : 
    2410              :     Assert(deletepos != RT_INVALID_SLOT_IDX);
    2411              : 
    2412        66763 :     idx = RT_BM_IDX(deletepos);
    2413        66763 :     bitnum = RT_BM_BIT(deletepos);
    2414        66763 :     n48->isset[idx] &= ~((bitmapword) 1 << bitnum);
    2415        66763 :     n48->slot_idxs[chunk] = RT_INVALID_SLOT_IDX;
    2416              : 
    2417        66763 :     n48->base.count--;
    2418              : 
    2419              :     /*
    2420              :      * To keep shrinking simple, do it after deleting, which is fast for
    2421              :      * node48 anyway.
    2422              :      */
    2423        66763 :     if (n48->base.count <= shrink_threshold)
    2424         2025 :         RT_SHRINK_NODE_48(tree, parent_slot, node, chunk);
    2425        66763 : }
    2426              : 
    2427              : /*
    2428              :  * Move contents of a node16 to a node4, and delete the one at "deletepos".
    2429              :  * By deleting as we move, we can avoid memmove operations in the new
    2430              :  * node.
    2431              :  */
    2432              : static void pg_noinline
    2433         2081 : RT_SHRINK_NODE_16(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node, uint8 deletepos)
    2434              : {
    2435         2081 :     RT_NODE_16 *n16 = (RT_NODE_16 *) (node.local);
    2436              :     RT_CHILD_PTR newnode;
    2437              :     RT_NODE_4  *new4;
    2438              : 
    2439              :     /* initialize new node */
    2440         2081 :     newnode = RT_ALLOC_NODE(tree, RT_NODE_KIND_4, RT_CLASS_4);
    2441         2081 :     new4 = (RT_NODE_4 *) newnode.local;
    2442              : 
    2443              :     /* copy over existing entries, except for the one at "deletepos" */
    2444         2081 :     RT_COPY_COMMON(newnode, node);
    2445         2081 :     RT_COPY_ARRAYS_AND_DELETE(new4->chunks, new4->children,
    2446         2081 :                               n16->chunks, n16->children,
    2447         2081 :                               n16->base.count, deletepos);
    2448              : 
    2449         2081 :     new4->base.count--;
    2450         2081 :     RT_VERIFY_NODE((RT_NODE *) new4);
    2451              : 
    2452              :     /* free old node and update reference in parent */
    2453         2081 :     *parent_slot = newnode.alloc;
    2454         2081 :     RT_FREE_NODE(tree, node);
    2455         2081 : }
    2456              : 
    2457              : static inline void
    2458        19916 : RT_REMOVE_CHILD_16(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node, uint8 chunk, RT_PTR_ALLOC * slot)
    2459              : {
    2460        19916 :     RT_NODE_16 *n16 = (RT_NODE_16 *) node.local;
    2461        19916 :     int         deletepos = slot - n16->children;
    2462              : 
    2463              :     /*
    2464              :      * When shrinking to node4, 4 is hard-coded. After shrinking, the new node
    2465              :      * will end up with 3 elements and 3 is the largest count where linear
    2466              :      * search is faster than SIMD, at least on x86-64.
    2467              :      */
    2468        19916 :     if (n16->base.count <= 4)
    2469              :     {
    2470         2081 :         RT_SHRINK_NODE_16(tree, parent_slot, node, deletepos);
    2471         2081 :         return;
    2472              :     }
    2473              : 
    2474              :     Assert(deletepos >= 0);
    2475              :     Assert(n16->chunks[deletepos] == chunk);
    2476              : 
    2477        17835 :     RT_SHIFT_ARRAYS_AND_DELETE(n16->chunks, n16->children,
    2478        17835 :                                n16->base.count, deletepos);
    2479        17835 :     n16->base.count--;
    2480              : }
    2481              : 
    2482              : static inline void
    2483        19931 : RT_REMOVE_CHILD_4(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node, uint8 chunk, RT_PTR_ALLOC * slot)
    2484              : {
    2485        19931 :     RT_NODE_4  *n4 = (RT_NODE_4 *) node.local;
    2486              : 
    2487        19931 :     if (n4->base.count == 1)
    2488              :     {
    2489              :         Assert(n4->chunks[0] == chunk);
    2490              : 
    2491              :         /*
    2492              :          * If we're deleting the last entry from the root child node don't
    2493              :          * free it, but mark both the tree and the root child node empty. That
    2494              :          * way, RT_SET can assume it exists.
    2495              :          */
    2496        15751 :         if (parent_slot == &tree->ctl->root)
    2497              :         {
    2498           31 :             n4->base.count = 0;
    2499           31 :             tree->ctl->start_shift = 0;
    2500           31 :             tree->ctl->max_val = RT_SHIFT_GET_MAX_VAL(0);
    2501              :         }
    2502              :         else
    2503              :         {
    2504              :             /*
    2505              :              * Deleting last entry, so just free the entire node.
    2506              :              * RT_DELETE_RECURSIVE has already freed the value and lower-level
    2507              :              * children.
    2508              :              */
    2509        15720 :             RT_FREE_NODE(tree, node);
    2510              : 
    2511              :             /*
    2512              :              * Also null out the parent's slot -- this tells the next higher
    2513              :              * level to delete its child pointer
    2514              :              */
    2515        15720 :             *parent_slot = RT_INVALID_PTR_ALLOC;
    2516              :         }
    2517              :     }
    2518              :     else
    2519              :     {
    2520         4180 :         int         deletepos = slot - n4->children;
    2521              : 
    2522              :         Assert(deletepos >= 0);
    2523              :         Assert(n4->chunks[deletepos] == chunk);
    2524              : 
    2525         4180 :         RT_SHIFT_ARRAYS_AND_DELETE(n4->chunks, n4->children,
    2526         4180 :                                    n4->base.count, deletepos);
    2527              : 
    2528         4180 :         n4->base.count--;
    2529              :     }
    2530        19931 : }
    2531              : 
    2532              : /*
    2533              :  * Delete the child pointer corresponding to "key" in the given node.
    2534              :  */
    2535              : static inline void
    2536       111096 : RT_NODE_DELETE(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR node, uint8 chunk, RT_PTR_ALLOC * slot)
    2537              : {
    2538       111096 :     switch ((node.local)->kind)
    2539              :     {
    2540        19931 :         case RT_NODE_KIND_4:
    2541              :             {
    2542        19931 :                 RT_REMOVE_CHILD_4(tree, parent_slot, node, chunk, slot);
    2543        19931 :                 return;
    2544              :             }
    2545        19916 :         case RT_NODE_KIND_16:
    2546              :             {
    2547        19916 :                 RT_REMOVE_CHILD_16(tree, parent_slot, node, chunk, slot);
    2548        19916 :                 return;
    2549              :             }
    2550        66763 :         case RT_NODE_KIND_48:
    2551              :             {
    2552        66763 :                 RT_REMOVE_CHILD_48(tree, parent_slot, node, chunk);
    2553        66763 :                 return;
    2554              :             }
    2555         4486 :         case RT_NODE_KIND_256:
    2556              :             {
    2557         4486 :                 RT_REMOVE_CHILD_256(tree, parent_slot, node, chunk);
    2558         4486 :                 return;
    2559              :             }
    2560            0 :         default:
    2561            0 :             pg_unreachable();
    2562              :     }
    2563              : }
    2564              : 
    2565              : /* workhorse for RT_DELETE */
    2566              : static bool
    2567       415123 : RT_DELETE_RECURSIVE(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, uint64 key, int shift)
    2568              : {
    2569              :     RT_PTR_ALLOC *slot;
    2570              :     RT_CHILD_PTR node;
    2571       415123 :     uint8       chunk = RT_GET_KEY_CHUNK(key, shift);
    2572              : 
    2573       415123 :     node.alloc = *parent_slot;
    2574       415123 :     RT_PTR_SET_LOCAL(tree, &node);
    2575       415123 :     slot = RT_NODE_SEARCH(node.local, chunk);
    2576              : 
    2577       415123 :     if (slot == NULL)
    2578         8981 :         return false;
    2579              : 
    2580       406142 :     if (shift == 0)
    2581              :     {
    2582        95376 :         if (!RT_CHILDPTR_IS_VALUE(*slot))
    2583            0 :             RT_FREE_LEAF(tree, *slot);
    2584              : 
    2585        95376 :         RT_NODE_DELETE(tree, parent_slot, node, chunk, slot);
    2586        95376 :         return true;
    2587              :     }
    2588              :     else
    2589              :     {
    2590              :         bool        deleted;
    2591              : 
    2592       310766 :         deleted = RT_DELETE_RECURSIVE(tree, slot, key, shift - RT_SPAN);
    2593              : 
    2594              :         /* Child node was freed, so delete its slot now */
    2595       310766 :         if (*slot == RT_INVALID_PTR_ALLOC)
    2596              :         {
    2597              :             Assert(deleted);
    2598        15720 :             RT_NODE_DELETE(tree, parent_slot, node, chunk, slot);
    2599              :         }
    2600              : 
    2601       310766 :         return deleted;
    2602              :     }
    2603              : }
    2604              : 
    2605              : /*
    2606              :  * Delete the given key from the radix tree. If the key is found delete it
    2607              :  * and return true, otherwise do nothing and return false.
    2608              :  *
    2609              :  * Taking a lock in exclusive mode is the caller's responsibility.
    2610              :  */
    2611              : RT_SCOPE bool
    2612       104357 : RT_DELETE(RT_RADIX_TREE * tree, uint64 key)
    2613              : {
    2614              :     bool        deleted;
    2615              : 
    2616              : #ifdef RT_SHMEM
    2617              :     Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
    2618              : #endif
    2619              : 
    2620       104357 :     if (key > tree->ctl->max_val)
    2621            0 :         return false;
    2622              : 
    2623              :     Assert(RT_PTR_ALLOC_IS_VALID(tree->ctl->root));
    2624       104357 :     deleted = RT_DELETE_RECURSIVE(tree, &tree->ctl->root,
    2625       104357 :                                   key, tree->ctl->start_shift);
    2626              : 
    2627              :     /* Found the key to delete. Update the statistics */
    2628       104357 :     if (deleted)
    2629              :     {
    2630        95376 :         tree->ctl->num_keys--;
    2631              :         Assert(tree->ctl->num_keys >= 0);
    2632              :     }
    2633              : 
    2634       104357 :     return deleted;
    2635              : }
    2636              : 
    2637              : #endif                          /* RT_USE_DELETE */
    2638              : 
    2639              : /***************** UTILITY FUNCTIONS *****************/
    2640              : 
    2641              : /*
    2642              :  * Return the statistics of the amount of memory used by the radix tree.
    2643              :  *
    2644              :  * Since dsa_get_total_size() does appropriate locking, the caller doesn't
    2645              :  * need to take a lock.
    2646              :  */
    2647              : RT_SCOPE uint64
    2648       147139 : RT_MEMORY_USAGE(RT_RADIX_TREE * tree)
    2649              : {
    2650       147139 :     size_t      total = 0;
    2651              : 
    2652              : #ifdef RT_SHMEM
    2653              :     Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
    2654         1210 :     total = dsa_get_total_size(tree->dsa);
    2655              : #else
    2656       145929 :     total = MemoryContextMemAllocated(tree->leaf_context, true);
    2657              : #endif
    2658              : 
    2659       147139 :     return total;
    2660              : }
    2661              : 
    2662              : /*
    2663              :  * Perform some sanity checks on the given node.
    2664              :  */
    2665              : static void
    2666       115927 : RT_VERIFY_NODE(RT_NODE * node)
    2667              : {
    2668              : #ifdef USE_ASSERT_CHECKING
    2669              : 
    2670              :     switch (node->kind)
    2671              :     {
    2672              :         case RT_NODE_KIND_4:
    2673              :             {
    2674              :                 RT_NODE_4  *n4 = (RT_NODE_4 *) node;
    2675              : 
    2676              :                 /* RT_DUMP_NODE(node); */
    2677              : 
    2678              :                 for (int i = 1; i < n4->base.count; i++)
    2679              :                     Assert(n4->chunks[i - 1] < n4->chunks[i]);
    2680              : 
    2681              :                 break;
    2682              :             }
    2683              :         case RT_NODE_KIND_16:
    2684              :             {
    2685              :                 RT_NODE_16 *n16 = (RT_NODE_16 *) node;
    2686              : 
    2687              :                 /* RT_DUMP_NODE(node); */
    2688              : 
    2689              :                 for (int i = 1; i < n16->base.count; i++)
    2690              :                     Assert(n16->chunks[i - 1] < n16->chunks[i]);
    2691              : 
    2692              :                 break;
    2693              :             }
    2694              :         case RT_NODE_KIND_48:
    2695              :             {
    2696              :                 RT_NODE_48 *n48 = (RT_NODE_48 *) node;
    2697              :                 int         cnt = 0;
    2698              : 
    2699              :                 /* RT_DUMP_NODE(node); */
    2700              : 
    2701              :                 for (int i = 0; i < RT_NODE_MAX_SLOTS; i++)
    2702              :                 {
    2703              :                     uint8       slot = n48->slot_idxs[i];
    2704              :                     int         idx = RT_BM_IDX(slot);
    2705              :                     int         bitnum = RT_BM_BIT(slot);
    2706              : 
    2707              :                     if (!RT_NODE_48_IS_CHUNK_USED(n48, i))
    2708              :                         continue;
    2709              : 
    2710              :                     /* Check if the corresponding slot is used */
    2711              :                     Assert(slot < node->fanout);
    2712              :                     Assert((n48->isset[idx] & ((bitmapword) 1 << bitnum)) != 0);
    2713              : 
    2714              :                     cnt++;
    2715              :                 }
    2716              : 
    2717              :                 Assert(n48->base.count == cnt);
    2718              : 
    2719              :                 break;
    2720              :             }
    2721              :         case RT_NODE_KIND_256:
    2722              :             {
    2723              :                 RT_NODE_256 *n256 = (RT_NODE_256 *) node;
    2724              :                 int         cnt;
    2725              : 
    2726              :                 /* RT_DUMP_NODE(node); */
    2727              : 
    2728              :                 cnt = pg_popcount((const char *) n256->isset,
    2729              :                                   RT_NODE_MAX_SLOTS / BITS_PER_BYTE);
    2730              : 
    2731              :                 /*
    2732              :                  * Check if the number of used chunk matches, accounting for
    2733              :                  * overflow
    2734              :                  */
    2735              :                 if (cnt == RT_FANOUT_256)
    2736              :                     Assert(n256->base.count == 0);
    2737              :                 else
    2738              :                     Assert(n256->base.count == cnt);
    2739              : 
    2740              :                 break;
    2741              :             }
    2742              :     }
    2743              : #endif
    2744       115927 : }
    2745              : 
    2746              : /***************** DEBUG FUNCTIONS *****************/
    2747              : 
    2748              : #ifdef RT_DEBUG
    2749              : 
    2750              : /*
    2751              :  * Print out tree stats, some of which are only collected in debugging builds.
    2752              :  */
    2753              : RT_SCOPE void
    2754           61 : RT_STATS(RT_RADIX_TREE * tree)
    2755              : {
    2756           61 :     fprintf(stderr, "max_val = " UINT64_FORMAT "\n", tree->ctl->max_val);
    2757           61 :     fprintf(stderr, "num_keys = %" PRId64 "\n", tree->ctl->num_keys);
    2758              : 
    2759              : #ifdef RT_SHMEM
    2760              :     fprintf(stderr, "handle = " DSA_POINTER_FORMAT "\n", tree->ctl->handle);
    2761              : #endif
    2762              : 
    2763           61 :     fprintf(stderr, "height = %d", tree->ctl->start_shift / RT_SPAN);
    2764              : 
    2765          366 :     for (int i = 0; i < RT_NUM_SIZE_CLASSES; i++)
    2766              :     {
    2767          305 :         RT_SIZE_CLASS_ELEM size_class = RT_SIZE_CLASS_INFO[i];
    2768              : 
    2769          305 :         fprintf(stderr, ", n%d = %" PRId64, size_class.fanout, tree->ctl->num_nodes[i]);
    2770              :     }
    2771              : 
    2772           61 :     fprintf(stderr, ", leaves = %" PRId64, tree->ctl->num_leaves);
    2773              : 
    2774           61 :     fprintf(stderr, "\n");
    2775           61 : }
    2776              : 
    2777              : /*
    2778              :  * Print out debugging information about the given node.
    2779              :  */
    2780              : static void
    2781              : pg_attribute_unused()
    2782            0 : RT_DUMP_NODE(RT_NODE * node)
    2783              : {
    2784              : #ifdef RT_SHMEM
    2785              : #define RT_CHILD_PTR_FORMAT DSA_POINTER_FORMAT
    2786              : #else
    2787              : #define RT_CHILD_PTR_FORMAT "%p"
    2788              : #endif
    2789              : 
    2790            0 :     fprintf(stderr, "kind %d, fanout %d, count %u\n",
    2791            0 :             (node->kind == RT_NODE_KIND_4) ? 4 :
    2792            0 :             (node->kind == RT_NODE_KIND_16) ? 16 :
    2793            0 :             (node->kind == RT_NODE_KIND_48) ? 48 : 256,
    2794            0 :             node->fanout == 0 ? 256 : node->fanout,
    2795            0 :             node->count == 0 ? 256 : node->count);
    2796              : 
    2797            0 :     switch (node->kind)
    2798              :     {
    2799            0 :         case RT_NODE_KIND_4:
    2800              :             {
    2801            0 :                 RT_NODE_4  *n4 = (RT_NODE_4 *) node;
    2802              : 
    2803            0 :                 fprintf(stderr, "chunks and slots:\n");
    2804            0 :                 for (int i = 0; i < n4->base.count; i++)
    2805              :                 {
    2806            0 :                     fprintf(stderr, "  [%d] chunk %x slot " RT_CHILD_PTR_FORMAT "\n",
    2807            0 :                             i, n4->chunks[i], n4->children[i]);
    2808              :                 }
    2809              : 
    2810            0 :                 break;
    2811              :             }
    2812            0 :         case RT_NODE_KIND_16:
    2813              :             {
    2814            0 :                 RT_NODE_16 *n16 = (RT_NODE_16 *) node;
    2815              : 
    2816            0 :                 fprintf(stderr, "chunks and slots:\n");
    2817            0 :                 for (int i = 0; i < n16->base.count; i++)
    2818              :                 {
    2819            0 :                     fprintf(stderr, "  [%d] chunk %x slot " RT_CHILD_PTR_FORMAT "\n",
    2820            0 :                             i, n16->chunks[i], n16->children[i]);
    2821              :                 }
    2822            0 :                 break;
    2823              :             }
    2824            0 :         case RT_NODE_KIND_48:
    2825              :             {
    2826            0 :                 RT_NODE_48 *n48 = (RT_NODE_48 *) node;
    2827            0 :                 char       *sep = "";
    2828              : 
    2829            0 :                 fprintf(stderr, "slot_idxs: \n");
    2830            0 :                 for (int chunk = 0; chunk < RT_NODE_MAX_SLOTS; chunk++)
    2831              :                 {
    2832            0 :                     if (!RT_NODE_48_IS_CHUNK_USED(n48, chunk))
    2833            0 :                         continue;
    2834              : 
    2835            0 :                     fprintf(stderr, "  idx[%d] = %d\n",
    2836            0 :                             chunk, n48->slot_idxs[chunk]);
    2837              :                 }
    2838              : 
    2839            0 :                 fprintf(stderr, "isset-bitmap: ");
    2840            0 :                 for (int i = 0; i < (RT_FANOUT_48_MAX / BITS_PER_BYTE); i++)
    2841              :                 {
    2842            0 :                     fprintf(stderr, "%s%x", sep, ((uint8 *) n48->isset)[i]);
    2843            0 :                     sep = " ";
    2844              :                 }
    2845            0 :                 fprintf(stderr, "\n");
    2846              : 
    2847            0 :                 fprintf(stderr, "chunks and slots:\n");
    2848            0 :                 for (int chunk = 0; chunk < RT_NODE_MAX_SLOTS; chunk++)
    2849              :                 {
    2850            0 :                     if (!RT_NODE_48_IS_CHUNK_USED(n48, chunk))
    2851            0 :                         continue;
    2852              : 
    2853            0 :                     fprintf(stderr, "  chunk %x slot " RT_CHILD_PTR_FORMAT "\n",
    2854              :                             chunk,
    2855            0 :                             *RT_NODE_48_GET_CHILD(n48, chunk));
    2856              :                 }
    2857            0 :                 break;
    2858              :             }
    2859            0 :         case RT_NODE_KIND_256:
    2860              :             {
    2861            0 :                 RT_NODE_256 *n256 = (RT_NODE_256 *) node;
    2862            0 :                 char       *sep = "";
    2863              : 
    2864            0 :                 fprintf(stderr, "isset-bitmap: ");
    2865            0 :                 for (int i = 0; i < (RT_FANOUT_256 / BITS_PER_BYTE); i++)
    2866              :                 {
    2867            0 :                     fprintf(stderr, "%s%x", sep, ((uint8 *) n256->isset)[i]);
    2868            0 :                     sep = " ";
    2869              :                 }
    2870            0 :                 fprintf(stderr, "\n");
    2871              : 
    2872            0 :                 fprintf(stderr, "chunks and slots:\n");
    2873            0 :                 for (int chunk = 0; chunk < RT_NODE_MAX_SLOTS; chunk++)
    2874              :                 {
    2875            0 :                     if (!RT_NODE_256_IS_CHUNK_USED(n256, chunk))
    2876            0 :                         continue;
    2877              : 
    2878            0 :                     fprintf(stderr, "  chunk %x slot " RT_CHILD_PTR_FORMAT "\n",
    2879              :                             chunk,
    2880            0 :                             *RT_NODE_256_GET_CHILD(n256, chunk));
    2881              :                 }
    2882            0 :                 break;
    2883              :             }
    2884              :     }
    2885            0 : }
    2886              : #endif                          /* RT_DEBUG */
    2887              : 
    2888              : #endif                          /* RT_DEFINE */
    2889              : 
    2890              : 
    2891              : /* undefine external parameters, so next radix tree can be defined */
    2892              : #undef RT_PREFIX
    2893              : #undef RT_SCOPE
    2894              : #undef RT_DECLARE
    2895              : #undef RT_DEFINE
    2896              : #undef RT_VALUE_TYPE
    2897              : #undef RT_VARLEN_VALUE_SIZE
    2898              : #undef RT_RUNTIME_EMBEDDABLE_VALUE
    2899              : #undef RT_SHMEM
    2900              : #undef RT_USE_DELETE
    2901              : #undef RT_DEBUG
    2902              : 
    2903              : /* locally declared macros */
    2904              : #undef RT_MAKE_PREFIX
    2905              : #undef RT_MAKE_NAME
    2906              : #undef RT_MAKE_NAME_
    2907              : #undef RT_STR
    2908              : #undef RT_STR_
    2909              : #undef RT_SPAN
    2910              : #undef RT_NODE_MAX_SLOTS
    2911              : #undef RT_CHUNK_MASK
    2912              : #undef RT_MAX_SHIFT
    2913              : #undef RT_MAX_LEVEL
    2914              : #undef RT_GET_KEY_CHUNK
    2915              : #undef RT_BM_IDX
    2916              : #undef RT_BM_BIT
    2917              : #undef RT_NODE_MUST_GROW
    2918              : #undef RT_NODE_KIND_COUNT
    2919              : #undef RT_NUM_SIZE_CLASSES
    2920              : #undef RT_INVALID_SLOT_IDX
    2921              : #undef RT_SLAB_BLOCK_SIZE
    2922              : #undef RT_RADIX_TREE_MAGIC
    2923              : #undef RT_CHILD_PTR_FORMAT
    2924              : 
    2925              : /* type declarations */
    2926              : #undef RT_RADIX_TREE
    2927              : #undef RT_RADIX_TREE_CONTROL
    2928              : #undef RT_CHILD_PTR
    2929              : #undef RT_PTR_ALLOC
    2930              : #undef RT_INVALID_PTR_ALLOC
    2931              : #undef RT_HANDLE
    2932              : #undef RT_ITER
    2933              : #undef RT_NODE
    2934              : #undef RT_NODE_ITER
    2935              : #undef RT_NODE_KIND_4
    2936              : #undef RT_NODE_KIND_16
    2937              : #undef RT_NODE_KIND_48
    2938              : #undef RT_NODE_KIND_256
    2939              : #undef RT_NODE_4
    2940              : #undef RT_NODE_16
    2941              : #undef RT_NODE_48
    2942              : #undef RT_NODE_256
    2943              : #undef RT_SIZE_CLASS
    2944              : #undef RT_SIZE_CLASS_ELEM
    2945              : #undef RT_SIZE_CLASS_INFO
    2946              : #undef RT_CLASS_4
    2947              : #undef RT_CLASS_16_LO
    2948              : #undef RT_CLASS_16_HI
    2949              : #undef RT_CLASS_48
    2950              : #undef RT_CLASS_256
    2951              : #undef RT_FANOUT_4
    2952              : #undef RT_FANOUT_4_MAX
    2953              : #undef RT_FANOUT_16_LO
    2954              : #undef RT_FANOUT_16_HI
    2955              : #undef RT_FANOUT_16_MAX
    2956              : #undef RT_FANOUT_48
    2957              : #undef RT_FANOUT_48_MAX
    2958              : #undef RT_FANOUT_256
    2959              : 
    2960              : /* function declarations */
    2961              : #undef RT_CREATE
    2962              : #undef RT_FREE
    2963              : #undef RT_ATTACH
    2964              : #undef RT_DETACH
    2965              : #undef RT_LOCK_EXCLUSIVE
    2966              : #undef RT_LOCK_SHARE
    2967              : #undef RT_UNLOCK
    2968              : #undef RT_GET_HANDLE
    2969              : #undef RT_FIND
    2970              : #undef RT_SET
    2971              : #undef RT_BEGIN_ITERATE
    2972              : #undef RT_ITERATE_NEXT
    2973              : #undef RT_END_ITERATE
    2974              : #undef RT_DELETE
    2975              : #undef RT_MEMORY_USAGE
    2976              : #undef RT_DUMP_NODE
    2977              : #undef RT_STATS
    2978              : 
    2979              : /* internal helper functions */
    2980              : #undef RT_GET_VALUE_SIZE
    2981              : #undef RT_VALUE_IS_EMBEDDABLE
    2982              : #undef RT_CHILDPTR_IS_VALUE
    2983              : #undef RT_GET_SLOT_RECURSIVE
    2984              : #undef RT_DELETE_RECURSIVE
    2985              : #undef RT_ALLOC_NODE
    2986              : #undef RT_ALLOC_LEAF
    2987              : #undef RT_FREE_NODE
    2988              : #undef RT_FREE_LEAF
    2989              : #undef RT_FREE_RECURSE
    2990              : #undef RT_EXTEND_UP
    2991              : #undef RT_EXTEND_DOWN
    2992              : #undef RT_COPY_COMMON
    2993              : #undef RT_PTR_SET_LOCAL
    2994              : #undef RT_PTR_ALLOC_IS_VALID
    2995              : #undef RT_NODE_16_SEARCH_EQ
    2996              : #undef RT_NODE_4_GET_INSERTPOS
    2997              : #undef RT_NODE_16_GET_INSERTPOS
    2998              : #undef RT_SHIFT_ARRAYS_FOR_INSERT
    2999              : #undef RT_SHIFT_ARRAYS_AND_DELETE
    3000              : #undef RT_COPY_ARRAYS_FOR_INSERT
    3001              : #undef RT_COPY_ARRAYS_AND_DELETE
    3002              : #undef RT_NODE_48_IS_CHUNK_USED
    3003              : #undef RT_NODE_48_GET_CHILD
    3004              : #undef RT_NODE_256_IS_CHUNK_USED
    3005              : #undef RT_NODE_256_GET_CHILD
    3006              : #undef RT_KEY_GET_SHIFT
    3007              : #undef RT_SHIFT_GET_MAX_VAL
    3008              : #undef RT_NODE_SEARCH
    3009              : #undef RT_ADD_CHILD_4
    3010              : #undef RT_ADD_CHILD_16
    3011              : #undef RT_ADD_CHILD_48
    3012              : #undef RT_ADD_CHILD_256
    3013              : #undef RT_GROW_NODE_4
    3014              : #undef RT_GROW_NODE_16
    3015              : #undef RT_GROW_NODE_48
    3016              : #undef RT_REMOVE_CHILD_4
    3017              : #undef RT_REMOVE_CHILD_16
    3018              : #undef RT_REMOVE_CHILD_48
    3019              : #undef RT_REMOVE_CHILD_256
    3020              : #undef RT_SHRINK_NODE_16
    3021              : #undef RT_SHRINK_NODE_48
    3022              : #undef RT_SHRINK_NODE_256
    3023              : #undef RT_NODE_DELETE
    3024              : #undef RT_NODE_INSERT
    3025              : #undef RT_NODE_ITERATE_NEXT
    3026              : #undef RT_VERIFY_NODE
        

Generated by: LCOV version 2.0-1