LCOV - code coverage report
Current view: top level - src/backend/storage/lmgr - predicate.c (source / functions) Coverage Total Hit
Test: PostgreSQL 19devel Lines: 73.0 % 1284 937
Test Date: 2026-04-28 04:16:22 Functions: 87.5 % 72 63

            Line data    Source code
       1              : /*-------------------------------------------------------------------------
       2              :  *
       3              :  * predicate.c
       4              :  *    POSTGRES predicate locking
       5              :  *    to support full serializable transaction isolation
       6              :  *
       7              :  *
       8              :  * The approach taken is to implement Serializable Snapshot Isolation (SSI)
       9              :  * as initially described in this paper:
      10              :  *
      11              :  *  Michael J. Cahill, Uwe Röhm, and Alan D. Fekete. 2008.
      12              :  *  Serializable isolation for snapshot databases.
      13              :  *  In SIGMOD '08: Proceedings of the 2008 ACM SIGMOD
      14              :  *  international conference on Management of data,
      15              :  *  pages 729-738, New York, NY, USA. ACM.
      16              :  *  http://doi.acm.org/10.1145/1376616.1376690
      17              :  *
      18              :  * and further elaborated in Cahill's doctoral thesis:
      19              :  *
      20              :  *  Michael James Cahill. 2009.
      21              :  *  Serializable Isolation for Snapshot Databases.
      22              :  *  Sydney Digital Theses.
      23              :  *  University of Sydney, School of Information Technologies.
      24              :  *  http://hdl.handle.net/2123/5353
      25              :  *
      26              :  *
      27              :  * Predicate locks for Serializable Snapshot Isolation (SSI) are SIREAD
      28              :  * locks, which are so different from normal locks that a distinct set of
      29              :  * structures is required to handle them.  They are needed to detect
      30              :  * rw-conflicts when the read happens before the write.  (When the write
      31              :  * occurs first, the reading transaction can check for a conflict by
      32              :  * examining the MVCC data.)
      33              :  *
      34              :  * (1)  Besides tuples actually read, they must cover ranges of tuples
      35              :  *      which would have been read based on the predicate.  This will
      36              :  *      require modelling the predicates through locks against database
      37              :  *      objects such as pages, index ranges, or entire tables.
      38              :  *
      39              :  * (2)  They must be kept in RAM for quick access.  Because of this, it
      40              :  *      isn't possible to always maintain tuple-level granularity -- when
      41              :  *      the space allocated to store these approaches exhaustion, a
      42              :  *      request for a lock may need to scan for situations where a single
      43              :  *      transaction holds many fine-grained locks which can be coalesced
      44              :  *      into a single coarser-grained lock.
      45              :  *
      46              :  * (3)  They never block anything.  In that regard they are more like
      47              :  *      flags than locks, although they refer to database objects and
      48              :  *      are used to identify rw-conflicts with normal write locks.
      49              :  *
      50              :  * (4)  While they are associated with a transaction, they must survive
      51              :  *      a successful COMMIT of that transaction, and remain until all
      52              :  *      overlapping transactions complete.  This even means that they
      53              :  *      must survive termination of the transaction's process.  If a
      54              :  *      top level transaction is rolled back, however, it is immediately
      55              :  *      flagged so that it can be ignored, and its SIREAD locks can be
      56              :  *      released any time after that.
      57              :  *
      58              :  * (5)  The only transactions which create SIREAD locks or check for
      59              :  *      conflicts with them are serializable transactions.
      60              :  *
      61              :  * (6)  When a write lock for a top level transaction is found to cover
      62              :  *      an existing SIREAD lock for the same transaction, the SIREAD lock
      63              :  *      can be deleted.
      64              :  *
      65              :  * (7)  A write from a serializable transaction must ensure that an xact
      66              :  *      record exists for the transaction, with the same lifespan (until
      67              :  *      all concurrent transactions complete or the transaction is rolled
      68              :  *      back) so that rw-dependencies to that transaction can be
      69              :  *      detected.
      70              :  *
      71              :  * We use an optimization for read-only transactions. Under certain
      72              :  * circumstances, a read-only transaction's snapshot can be shown to
      73              :  * never have conflicts with other transactions.  This is referred to
      74              :  * as a "safe" snapshot (one known not to be safe is "unsafe").
      75              :  * However, it can't be determined whether a snapshot is safe until
      76              :  * all concurrent read/write transactions complete.
      77              :  *
      78              :  * Once a read-only transaction is known to have a safe snapshot, it
      79              :  * can release its predicate locks and exempt itself from further
      80              :  * predicate lock tracking. READ ONLY DEFERRABLE transactions run only
      81              :  * on safe snapshots, waiting as necessary for one to be available.
      82              :  *
      83              :  *
      84              :  * Lightweight locks to manage access to the predicate locking shared
      85              :  * memory objects must be taken in this order, and should be released in
      86              :  * reverse order:
      87              :  *
      88              :  *  SerializableFinishedListLock
      89              :  *      - Protects the list of transactions which have completed but which
      90              :  *          may yet matter because they overlap still-active transactions.
      91              :  *
      92              :  *  SerializablePredicateListLock
      93              :  *      - Protects the linked list of locks held by a transaction.  Note
      94              :  *          that the locks themselves are also covered by the partition
      95              :  *          locks of their respective lock targets; this lock only affects
      96              :  *          the linked list connecting the locks related to a transaction.
      97              :  *      - All transactions share this single lock (with no partitioning).
      98              :  *      - There is never a need for a process other than the one running
      99              :  *          an active transaction to walk the list of locks held by that
     100              :  *          transaction, except parallel query workers sharing the leader's
     101              :  *          transaction.  In the parallel case, an extra per-sxact lock is
     102              :  *          taken; see below.
     103              :  *      - It is relatively infrequent that another process needs to
     104              :  *          modify the list for a transaction, but it does happen for such
     105              :  *          things as index page splits for pages with predicate locks and
     106              :  *          freeing of predicate locked pages by a vacuum process.  When
     107              :  *          removing a lock in such cases, the lock itself contains the
     108              :  *          pointers needed to remove it from the list.  When adding a
     109              :  *          lock in such cases, the lock can be added using the anchor in
     110              :  *          the transaction structure.  Neither requires walking the list.
     111              :  *      - Cleaning up the list for a terminated transaction is sometimes
     112              :  *          not done on a retail basis, in which case no lock is required.
     113              :  *      - Due to the above, a process accessing its active transaction's
     114              :  *          list always uses a shared lock, regardless of whether it is
     115              :  *          walking or maintaining the list.  This improves concurrency
     116              :  *          for the common access patterns.
     117              :  *      - A process which needs to alter the list of a transaction other
     118              :  *          than its own active transaction must acquire an exclusive
     119              :  *          lock.
     120              :  *
     121              :  *  SERIALIZABLEXACT's member 'perXactPredicateListLock'
     122              :  *      - Protects the linked list of predicate locks held by a transaction.
     123              :  *          Only needed for parallel mode, where multiple backends share the
     124              :  *          same SERIALIZABLEXACT object.  Not needed if
     125              :  *          SerializablePredicateListLock is held exclusively.
     126              :  *
     127              :  *  PredicateLockHashPartitionLock(hashcode)
     128              :  *      - The same lock protects a target, all locks on that target, and
     129              :  *          the linked list of locks on the target.
     130              :  *      - When more than one is needed, acquire in ascending address order.
     131              :  *      - When all are needed (rare), acquire in ascending index order with
     132              :  *          PredicateLockHashPartitionLockByIndex(index).
     133              :  *
     134              :  *  SerializableXactHashLock
     135              :  *      - Protects both PredXact and SerializableXidHash.
     136              :  *
     137              :  *  SerialControlLock
     138              :  *      - Protects SerialControlData members
     139              :  *
     140              :  *  SLRU per-bank locks
     141              :  *      - Protects SerialSlruCtl
     142              :  *
     143              :  * Portions Copyright (c) 1996-2026, PostgreSQL Global Development Group
     144              :  * Portions Copyright (c) 1994, Regents of the University of California
     145              :  *
     146              :  *
     147              :  * IDENTIFICATION
     148              :  *    src/backend/storage/lmgr/predicate.c
     149              :  *
     150              :  *-------------------------------------------------------------------------
     151              :  */
     152              : /*
     153              :  * INTERFACE ROUTINES
     154              :  *
     155              :  * predicate lock reporting
     156              :  *      GetPredicateLockStatusData(void)
     157              :  *      PageIsPredicateLocked(Relation relation, BlockNumber blkno)
     158              :  *
     159              :  * predicate lock maintenance
     160              :  *      GetSerializableTransactionSnapshot(Snapshot snapshot)
     161              :  *      SetSerializableTransactionSnapshot(Snapshot snapshot,
     162              :  *                                         VirtualTransactionId *sourcevxid)
     163              :  *      RegisterPredicateLockingXid(void)
     164              :  *      PredicateLockRelation(Relation relation, Snapshot snapshot)
     165              :  *      PredicateLockPage(Relation relation, BlockNumber blkno,
     166              :  *                      Snapshot snapshot)
     167              :  *      PredicateLockTID(Relation relation, const ItemPointerData *tid, Snapshot snapshot,
     168              :  *                       TransactionId tuple_xid)
     169              :  *      PredicateLockPageSplit(Relation relation, BlockNumber oldblkno,
     170              :  *                             BlockNumber newblkno)
     171              :  *      PredicateLockPageCombine(Relation relation, BlockNumber oldblkno,
     172              :  *                               BlockNumber newblkno)
     173              :  *      TransferPredicateLocksToHeapRelation(Relation relation)
     174              :  *      ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
     175              :  *
     176              :  * conflict detection (may also trigger rollback)
     177              :  *      CheckForSerializableConflictOut(Relation relation, TransactionId xid,
     178              :  *                                      Snapshot snapshot)
     179              :  *      CheckForSerializableConflictIn(Relation relation, const ItemPointerData *tid,
     180              :  *                                     BlockNumber blkno)
     181              :  *      CheckTableForSerializableConflictIn(Relation relation)
     182              :  *
     183              :  * final rollback checking
     184              :  *      PreCommit_CheckForSerializationFailure(void)
     185              :  *
     186              :  * two-phase commit support
     187              :  *      AtPrepare_PredicateLocks(void)
     188              :  *      PostPrepare_PredicateLocks(TransactionId xid)
     189              :  *      PredicateLockTwoPhaseFinish(FullTransactionId fxid, bool isCommit)
     190              :  *      predicatelock_twophase_recover(FullTransactionId fxid, uint16 info,
     191              :  *                                     void *recdata, uint32 len)
     192              :  */
     193              : 
     194              : #include "postgres.h"
     195              : 
     196              : #include "access/parallel.h"
     197              : #include "access/slru.h"
     198              : #include "access/transam.h"
     199              : #include "access/twophase.h"
     200              : #include "access/twophase_rmgr.h"
     201              : #include "access/xact.h"
     202              : #include "access/xlog.h"
     203              : #include "miscadmin.h"
     204              : #include "pgstat.h"
     205              : #include "port/pg_lfind.h"
     206              : #include "storage/predicate.h"
     207              : #include "storage/predicate_internals.h"
     208              : #include "storage/proc.h"
     209              : #include "storage/procarray.h"
     210              : #include "storage/shmem.h"
     211              : #include "storage/subsystems.h"
     212              : #include "utils/guc_hooks.h"
     213              : #include "utils/rel.h"
     214              : #include "utils/snapmgr.h"
     215              : #include "utils/wait_event.h"
     216              : 
     217              : /* Uncomment the next line to test the graceful degradation code. */
     218              : /* #define TEST_SUMMARIZE_SERIAL */
     219              : 
     220              : /*
     221              :  * Test the most selective fields first, for performance.
     222              :  *
     223              :  * a is covered by b if all of the following hold:
     224              :  *  1) a.database = b.database
     225              :  *  2) a.relation = b.relation
     226              :  *  3) b.offset is invalid (b is page-granularity or higher)
     227              :  *  4) either of the following:
     228              :  *      4a) a.offset is valid (a is tuple-granularity) and a.page = b.page
     229              :  *   or 4b) a.offset is invalid and b.page is invalid (a is
     230              :  *          page-granularity and b is relation-granularity)
     231              :  */
     232              : #define TargetTagIsCoveredBy(covered_target, covering_target)           \
     233              :     ((GET_PREDICATELOCKTARGETTAG_RELATION(covered_target) == /* (2) */  \
     234              :       GET_PREDICATELOCKTARGETTAG_RELATION(covering_target))             \
     235              :      && (GET_PREDICATELOCKTARGETTAG_OFFSET(covering_target) ==          \
     236              :          InvalidOffsetNumber)                                /* (3) */  \
     237              :      && (((GET_PREDICATELOCKTARGETTAG_OFFSET(covered_target) !=         \
     238              :            InvalidOffsetNumber)                              /* (4a) */ \
     239              :           && (GET_PREDICATELOCKTARGETTAG_PAGE(covering_target) ==       \
     240              :               GET_PREDICATELOCKTARGETTAG_PAGE(covered_target)))         \
     241              :          || ((GET_PREDICATELOCKTARGETTAG_PAGE(covering_target) ==       \
     242              :               InvalidBlockNumber)                            /* (4b) */ \
     243              :              && (GET_PREDICATELOCKTARGETTAG_PAGE(covered_target)        \
     244              :                  != InvalidBlockNumber)))                               \
     245              :      && (GET_PREDICATELOCKTARGETTAG_DB(covered_target) ==    /* (1) */  \
     246              :          GET_PREDICATELOCKTARGETTAG_DB(covering_target)))
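
The covering rule above can be sketched as an ordinary function, which is easier to step through than the macro. This is an illustration only: `DemoTargetTag`, `DEMO_INVALID_PAGE`, `DEMO_INVALID_OFFSET`, and `demo_tag_is_covered_by` are hypothetical stand-ins, not the real tag layout or accessor macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for InvalidBlockNumber and InvalidOffsetNumber. */
#define DEMO_INVALID_PAGE   UINT32_MAX
#define DEMO_INVALID_OFFSET 0

/* Hypothetical, simplified tag; not the real PREDICATELOCKTARGETTAG. */
typedef struct DemoTargetTag
{
    uint32_t db;
    uint32_t relation;
    uint32_t page;      /* DEMO_INVALID_PAGE for a relation-level tag */
    uint16_t offset;    /* DEMO_INVALID_OFFSET for page- or relation-level */
} DemoTargetTag;

/* Returns true if tag "a" is covered by the coarser tag "b". */
static inline bool
demo_tag_is_covered_by(const DemoTargetTag *a, const DemoTargetTag *b)
{
    assert(a != NULL && b != NULL);

    return a->relation == b->relation               /* (2) */
        && b->offset == DEMO_INVALID_OFFSET         /* (3) */
        && ((a->offset != DEMO_INVALID_OFFSET       /* (4a) */
             && b->page == a->page)
            || (b->page == DEMO_INVALID_PAGE        /* (4b) */
                && a->page != DEMO_INVALID_PAGE))
        && a->db == b->db;                          /* (1) */
}
```

In this sketch a tag never covers itself: a page tag is not covered by the same page tag, and a relation tag is not covered by another relation tag.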
     247              : 
     248              : /*
     249              :  * The predicate locking target and lock shared hash tables are partitioned to
     250              :  * reduce contention.  To determine which partition a given target belongs to,
     251              :  * compute the tag's hash code with PredicateLockTargetTagHashCode(), then
     252              :  * apply one of these macros.
     253              :  * NB: NUM_PREDICATELOCK_PARTITIONS must be a power of 2!
     254              :  */
     255              : #define PredicateLockHashPartition(hashcode) \
     256              :     ((hashcode) % NUM_PREDICATELOCK_PARTITIONS)
     257              : #define PredicateLockHashPartitionLock(hashcode) \
     258              :     (&MainLWLockArray[PREDICATELOCK_MANAGER_LWLOCK_OFFSET + \
     259              :         PredicateLockHashPartition(hashcode)].lock)
     260              : #define PredicateLockHashPartitionLockByIndex(i) \
     261              :     (&MainLWLockArray[PREDICATELOCK_MANAGER_LWLOCK_OFFSET + (i)].lock)
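
As a rough illustration of why the partition count must be a power of 2, the sketch below computes a partition number both with the modulo used above and with a bit mask; the two agree only for power-of-2 counts. The names and the value 4 for the log2 of the partition count are assumptions made for this example, since the real constants are defined outside this file.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define DEMO_LOG2_NUM_PARTITIONS 4
#define DEMO_NUM_PARTITIONS      (1u << DEMO_LOG2_NUM_PARTITIONS)

/* Partition number via modulo, mirroring the macro's arithmetic. */
static inline uint32_t
demo_partition_of(uint32_t hashcode)
{
    /* The modulo equals the mask below only for a power-of-2 count. */
    assert((DEMO_NUM_PARTITIONS & (DEMO_NUM_PARTITIONS - 1)) == 0);
    return hashcode % DEMO_NUM_PARTITIONS;
}

/* Same partition number taken directly from the low-order hash bits. */
static inline uint32_t
demo_partition_of_masked(uint32_t hashcode)
{
    return hashcode & (DEMO_NUM_PARTITIONS - 1);
}
```

Because the partition is just the low-order bits of the hash code, other code can alter the higher bits without moving an entry to a different partition.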
     262              : 
     263              : #define NPREDICATELOCKTARGETENTS() \
     264              :     mul_size(max_predicate_locks_per_xact, add_size(MaxBackends, max_prepared_xacts))
     265              : 
     266              : #define SxactIsOnFinishedList(sxact) (!dlist_node_is_detached(&(sxact)->finishedLink))
     267              : 
     268              : /*
     269              :  * Note that a sxact is marked "prepared" once it has passed
     270              :  * PreCommit_CheckForSerializationFailure, even if it isn't using
     271              :  * 2PC. This is the point at which it can no longer be aborted.
     272              :  *
     273              :  * The PREPARED flag remains set after commit, so SxactIsCommitted
     274              :  * implies SxactIsPrepared.
     275              :  */
     276              : #define SxactIsCommitted(sxact) (((sxact)->flags & SXACT_FLAG_COMMITTED) != 0)
     277              : #define SxactIsPrepared(sxact) (((sxact)->flags & SXACT_FLAG_PREPARED) != 0)
     278              : #define SxactIsRolledBack(sxact) (((sxact)->flags & SXACT_FLAG_ROLLED_BACK) != 0)
     279              : #define SxactIsDoomed(sxact) (((sxact)->flags & SXACT_FLAG_DOOMED) != 0)
     280              : #define SxactIsReadOnly(sxact) (((sxact)->flags & SXACT_FLAG_READ_ONLY) != 0)
     281              : #define SxactHasSummaryConflictIn(sxact) (((sxact)->flags & SXACT_FLAG_SUMMARY_CONFLICT_IN) != 0)
     282              : #define SxactHasSummaryConflictOut(sxact) (((sxact)->flags & SXACT_FLAG_SUMMARY_CONFLICT_OUT) != 0)
     283              : /*
     284              :  * The following macro actually means that the specified transaction has a
     285              :  * conflict out *to a transaction which committed ahead of it*.  It's hard
     286              :  * to get that into a name of a reasonable length.
     287              :  */
     288              : #define SxactHasConflictOut(sxact) (((sxact)->flags & SXACT_FLAG_CONFLICT_OUT) != 0)
     289              : #define SxactIsDeferrableWaiting(sxact) (((sxact)->flags & SXACT_FLAG_DEFERRABLE_WAITING) != 0)
     290              : #define SxactIsROSafe(sxact) (((sxact)->flags & SXACT_FLAG_RO_SAFE) != 0)
     291              : #define SxactIsROUnsafe(sxact) (((sxact)->flags & SXACT_FLAG_RO_UNSAFE) != 0)
     292              : #define SxactIsPartiallyReleased(sxact) (((sxact)->flags & SXACT_FLAG_PARTIALLY_RELEASED) != 0)
     293              : 
     294              : /*
     295              :  * Compute the hash code associated with a PREDICATELOCKTARGETTAG.
     296              :  *
     297              :  * To avoid unnecessary recomputations of the hash code, we try to do this
     298              :  * just once per function, and then pass it around as needed.  Aside from
     299              :  * passing the hashcode to hash_search_with_hash_value(), we can extract
     300              :  * the lock partition number from the hashcode.
     301              :  */
     302              : #define PredicateLockTargetTagHashCode(predicatelocktargettag) \
     303              :     get_hash_value(PredicateLockTargetHash, predicatelocktargettag)
     304              : 
     305              : /*
     306              :  * Given a predicate lock tag, and the hash for its target,
     307              :  * compute the lock hash.
     308              :  *
     309              :  * To make the hash code also depend on the transaction, we xor the sxact
     310              :  * struct's address into the hash code, left-shifted so that the
     311              :  * partition-number bits don't change.  Since this is only a hash, we
     312              :  * don't care if we lose high-order bits of the address; use an
     313              :  * intermediate variable to suppress cast-pointer-to-int warnings.
     314              :  */
     315              : #define PredicateLockHashCodeFromTargetHashCode(predicatelocktag, targethash) \
     316              :     ((targethash) ^ ((uint32) PointerGetDatum((predicatelocktag)->myXact)) \
     317              :      << LOG2_NUM_PREDICATELOCK_PARTITIONS)
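
The effect of the left shift can be checked with a small stand-alone sketch: mixing in the shifted address changes the hash code but leaves the low-order partition bits untouched, so a lock always lands in the same partition as its target. The names and the assumed log2 partition count of 4 are hypothetical and chosen for this example only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define DEMO_LOG2_NUM_PARTITIONS 4
#define DEMO_NUM_PARTITIONS      (1u << DEMO_LOG2_NUM_PARTITIONS)

/* Mix a transaction object's address into a target hash code. */
static inline uint32_t
demo_lock_hash(uint32_t targethash, const void *xact)
{
    /* Truncating the address is fine here; this is only a hash. */
    uint32_t addr_bits = (uint32_t) (uintptr_t) xact;

    /* The shift zeroes the low bits, so the xor cannot change them. */
    return targethash ^ (addr_bits << DEMO_LOG2_NUM_PARTITIONS);
}

/* Partition number taken from the low-order bits of a hash code. */
static inline uint32_t
demo_partition_of(uint32_t hashcode)
{
    return hashcode % DEMO_NUM_PARTITIONS;
}
```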
     318              : 
     319              : 
     320              : /*
     321              :  * The SLRU buffer area through which we access the old xids.
     322              :  */
     323              : static bool SerialPagePrecedesLogically(int64 page1, int64 page2);
     324              : static int  serial_errdetail_for_io_error(const void *opaque_data);
     325              : 
     326              : static SlruDesc SerialSlruDesc;
     327              : 
     328              : #define SerialSlruCtl           (&SerialSlruDesc)
     329              : 
     330              : #define SERIAL_PAGESIZE         BLCKSZ
     331              : #define SERIAL_ENTRYSIZE            sizeof(SerCommitSeqNo)
     332              : #define SERIAL_ENTRIESPERPAGE   (SERIAL_PAGESIZE / SERIAL_ENTRYSIZE)
     333              : 
     334              : /*
     335              :  * Set maximum pages based on the number needed to track all transactions.
     336              :  */
     337              : #define SERIAL_MAX_PAGE         (MaxTransactionId / SERIAL_ENTRIESPERPAGE)
     338              : 
     339              : #define SerialNextPage(page) (((page) >= SERIAL_MAX_PAGE) ? 0 : (page) + 1)
     340              : 
     341              : #define SerialValue(slotno, xid) (*((SerCommitSeqNo *) \
     342              :     (SerialSlruCtl->shared->page_buffer[slotno] + \
     343              :     ((((uint32) (xid)) % SERIAL_ENTRIESPERPAGE) * SERIAL_ENTRYSIZE))))
     344              : 
     345              : #define SerialPage(xid) (((uint32) (xid)) / SERIAL_ENTRIESPERPAGE)
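
The page and offset arithmetic in these macros can be sketched in isolation. The example assumes 8 kB pages, 8-byte entries, and a 32-bit maximum transaction id; those sizes, and every `demo_`/`DEMO_` name, are assumptions for illustration rather than values taken from this file.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes for illustration: 8 kB pages and 8-byte entries. */
#define DEMO_PAGESIZE        8192u
#define DEMO_ENTRYSIZE       ((uint32_t) sizeof(uint64_t))
#define DEMO_ENTRIESPERPAGE  (DEMO_PAGESIZE / DEMO_ENTRYSIZE)
/* Assumes the largest transaction id is UINT32_MAX. */
#define DEMO_MAX_PAGE        (UINT32_MAX / DEMO_ENTRIESPERPAGE)

/* Page that holds the entry for a given transaction id. */
static inline int64_t
demo_serial_page(uint32_t xid)
{
    return xid / DEMO_ENTRIESPERPAGE;
}

/* Byte offset of that entry within its page. */
static inline uint32_t
demo_serial_byte_offset(uint32_t xid)
{
    return (xid % DEMO_ENTRIESPERPAGE) * DEMO_ENTRYSIZE;
}

/* Next page number, wrapping to 0 after the last page. */
static inline int64_t
demo_serial_next_page(int64_t page)
{
    assert(page >= 0);
    return (page >= DEMO_MAX_PAGE) ? 0 : page + 1;
}
```

Under these assumptions each page holds 1024 entries, so transaction ids 0 through 1023 share page 0 and id 1024 starts page 1.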
     346              : 
     347              : typedef struct SerialControlData
     348              : {
     349              :     int64       headPage;       /* newest initialized page */
     350              :     TransactionId headXid;      /* newest valid Xid in the SLRU */
     351              :     TransactionId tailXid;      /* oldest xmin we might be interested in */
     352              : }           SerialControlData;
     353              : 
     354              : typedef struct SerialControlData *SerialControl;
     355              : 
     356              : static SerialControl serialControl;
     357              : 
     358              : /*
     359              :  * When the oldest committed transaction on the "finished" list is moved to
     360              :  * SLRU, its predicate locks will be moved to this "dummy" transaction,
     361              :  * collapsing duplicate targets.  When a duplicate is found, the later
     362              :  * commitSeqNo is used.
     363              :  */
     364              : static SERIALIZABLEXACT *OldCommittedSxact;
     365              : 
     366              : 
     367              : /*
     368              :  * These configuration variables are used to set the predicate lock table size
     369              :  * and to control promotion of predicate locks to coarser granularity in an
     370              :  * attempt to degrade performance gracefully (mostly as false positive
     371              :  * serialization failures) in the face of memory pressure.
     372              :  */
     373              : int         max_predicate_locks_per_xact;   /* in guc_tables.c */
     374              : int         max_predicate_locks_per_relation;   /* in guc_tables.c */
     375              : int         max_predicate_locks_per_page;   /* in guc_tables.c */
     376              : 
     377              : /*
     378              :  * This provides a list of objects in order to track transactions
     379              :  * participating in predicate locking.  Entries in the list are fixed size,
     380              :  * and reside in shared memory.  The memory address of an entry must remain
     381              :  * fixed during its lifetime.  The list will be protected from concurrent
     382              :  * update externally; no provision is made in this code to manage that.  The
     383              :  * number of entries in the list and the size allowed for each entry are
     384              :  * fixed upon creation.
     385              :  */
     386              : static PredXactList PredXact;
     387              : 
     388              : static void PredicateLockShmemRequest(void *arg);
     389              : static void PredicateLockShmemInit(void *arg);
     390              : static void PredicateLockShmemAttach(void *arg);
     391              : 
     392              : const ShmemCallbacks PredicateLockShmemCallbacks = {
     393              :     .request_fn = PredicateLockShmemRequest,
     394              :     .init_fn = PredicateLockShmemInit,
     395              :     .attach_fn = PredicateLockShmemAttach,
     396              : };
     397              : 
     398              : 
     399              : /*
     400              :  * This provides a pool of RWConflict data elements to use in conflict lists
     401              :  * between transactions.
     402              :  */
     403              : static RWConflictPoolHeader RWConflictPool;
     404              : 
     405              : /*
     406              :  * The predicate locking hash tables are in shared memory.
     407              :  * Each backend keeps pointers to them.
     408              :  */
     409              : static HTAB *SerializableXidHash;
     410              : static HTAB *PredicateLockTargetHash;
     411              : static HTAB *PredicateLockHash;
     412              : static dlist_head *FinishedSerializableTransactions;
     413              : 
     414              : /*
     415              :  * Tag for a dummy entry in PredicateLockTargetHash. By temporarily removing
     416              :  * this entry, you can ensure that there's enough scratch space available for
     417              :  * inserting one entry in the hash table. This is an otherwise-invalid tag.
     418              :  */
     419              : static const PREDICATELOCKTARGETTAG ScratchTargetTag = {0, 0, 0, 0};
     420              : static uint32 ScratchTargetTagHash;
     421              : static LWLock *ScratchPartitionLock;
     422              : 
     423              : /*
     424              :  * The local hash table used to determine when to combine multiple fine-
     425              :  * grained locks into a single coarser-grained lock.
     426              :  */
     427              : static HTAB *LocalPredicateLockHash = NULL;
     428              : 
     429              : /*
     430              :  * Keep a pointer to the currently-running serializable transaction (if any)
     431              :  * for quick reference. Also, remember if we have written anything that could
     432              :  * cause a rw-conflict.
     433              :  */
     434              : static SERIALIZABLEXACT *MySerializableXact = InvalidSerializableXact;
     435              : static bool MyXactDidWrite = false;
     436              : 
     437              : /*
     438              :  * The SXACT_FLAG_RO_UNSAFE optimization might lead us to release
     439              :  * MySerializableXact early.  If that happens in a parallel query, the leader
     440              :  * needs to defer the destruction of the SERIALIZABLEXACT until end of
     441              :  * transaction, because the workers still have a reference to it.  In that
     442              :  * case, the leader stores it here.
     443              :  */
     444              : static SERIALIZABLEXACT *SavedSerializableXact = InvalidSerializableXact;
     445              : 
     446              : static int64 max_serializable_xacts;
     447              : 
     448              : /* local functions */
     449              : 
     450              : static SERIALIZABLEXACT *CreatePredXact(void);
     451              : static void ReleasePredXact(SERIALIZABLEXACT *sxact);
     452              : 
     453              : static bool RWConflictExists(const SERIALIZABLEXACT *reader, const SERIALIZABLEXACT *writer);
     454              : static void SetRWConflict(SERIALIZABLEXACT *reader, SERIALIZABLEXACT *writer);
     455              : static void SetPossibleUnsafeConflict(SERIALIZABLEXACT *roXact, SERIALIZABLEXACT *activeXact);
     456              : static void ReleaseRWConflict(RWConflict conflict);
     457              : static void FlagSxactUnsafe(SERIALIZABLEXACT *sxact);
     458              : 
     459              : static void SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo);
     460              : static SerCommitSeqNo SerialGetMinConflictCommitSeqNo(TransactionId xid);
     461              : static void SerialSetActiveSerXmin(TransactionId xid);
     462              : 
     463              : static uint32 predicatelock_hash(const void *key, Size keysize);
     464              : 
     465              : static void SummarizeOldestCommittedSxact(void);
     466              : static Snapshot GetSafeSnapshot(Snapshot origSnapshot);
     467              : static Snapshot GetSerializableTransactionSnapshotInt(Snapshot snapshot,
     468              :                                                       VirtualTransactionId *sourcevxid,
     469              :                                                       int sourcepid);
     470              : static bool PredicateLockExists(const PREDICATELOCKTARGETTAG *targettag);
     471              : static bool GetParentPredicateLockTag(const PREDICATELOCKTARGETTAG *tag,
     472              :                                       PREDICATELOCKTARGETTAG *parent);
     473              : static bool CoarserLockCovers(const PREDICATELOCKTARGETTAG *newtargettag);
     474              : static void RemoveScratchTarget(bool lockheld);
     475              : static void RestoreScratchTarget(bool lockheld);
     476              : static void RemoveTargetIfNoLongerUsed(PREDICATELOCKTARGET *target,
     477              :                                        uint32 targettaghash);
     478              : static void DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag);
     479              : static int  MaxPredicateChildLocks(const PREDICATELOCKTARGETTAG *tag);
     480              : static bool CheckAndPromotePredicateLockRequest(const PREDICATELOCKTARGETTAG *reqtag);
     481              : static void DecrementParentLocks(const PREDICATELOCKTARGETTAG *targettag);
     482              : static void CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
     483              :                                 uint32 targettaghash,
     484              :                                 SERIALIZABLEXACT *sxact);
     485              : static void DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash);
     486              : static bool TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
     487              :                                               PREDICATELOCKTARGETTAG newtargettag,
     488              :                                               bool removeOld);
     489              : static void PredicateLockAcquire(const PREDICATELOCKTARGETTAG *targettag);
     490              : static void DropAllPredicateLocksFromTable(Relation relation,
     491              :                                            bool transfer);
     492              : static void SetNewSxactGlobalXmin(void);
     493              : static void ClearOldPredicateLocks(void);
     494              : static void ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
     495              :                                        bool summarize);
     496              : static bool XidIsConcurrent(TransactionId xid);
     497              : static void CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag);
     498              : static void FlagRWConflict(SERIALIZABLEXACT *reader, SERIALIZABLEXACT *writer);
     499              : static void OnConflict_CheckForSerializationFailure(const SERIALIZABLEXACT *reader,
     500              :                                                     SERIALIZABLEXACT *writer);
     501              : static void CreateLocalPredicateLockHash(void);
     502              : static void ReleasePredicateLocksLocal(void);
     503              : 
     504              : 
     505              : /*------------------------------------------------------------------------*/
     506              : 
     507              : /*
     508              :  * Does this relation participate in predicate locking? Temporary and system
     509              :  * relations are exempt.
     510              :  */
     511              : static inline bool
     512       143521 : PredicateLockingNeededForRelation(Relation relation)
     513              : {
     514       183362 :     return !(relation->rd_id < FirstUnpinnedObjectId ||
     515        39841 :              RelationUsesLocalBuffers(relation));
     516              : }
     517              : 
     518              : /*
     519              :  * When a public interface method is called for a read, this is the test to
     520              :  * see if we should do a quick return.
     521              :  *
     522              :  * Note: this function has side-effects! If this transaction has been flagged
     523              :  * as RO-safe since the last call, we release all predicate locks and reset
     524              :  * MySerializableXact. That makes subsequent calls return quickly.
     525              :  *
     526              :  * This is marked as 'inline' to eliminate the function call overhead in the
     527              :  * common case that serialization is not needed.
     528              :  */
     529              : static inline bool
     530     83238796 : SerializationNeededForRead(Relation relation, Snapshot snapshot)
     531              : {
     532              :     /* Nothing to do if this is not a serializable transaction */
     533     83238796 :     if (MySerializableXact == InvalidSerializableXact)
     534     83101430 :         return false;
     535              : 
     536              :     /*
     537              :      * Don't acquire locks or conflict when scanning with a special snapshot.
     538              :      * This excludes things like CLUSTER and REINDEX. They use the wholesale
     539              :      * functions TransferPredicateLocksToHeapRelation() and
     540              :      * CheckTableForSerializableConflictIn() to participate in serialization,
     541              :      * but the scans involved don't need serialization.
     542              :      */
     543       137366 :     if (!IsMVCCSnapshot(snapshot))
     544         1855 :         return false;
     545              : 
     546              :     /*
     547              :      * Check if we have just become "RO-safe". If we have, immediately release
     548              :      * all locks as they're not needed anymore. This also resets
     549              :      * MySerializableXact, so that subsequent calls to this function can exit
     550              :      * quickly.
     551              :      *
     552              :      * A transaction is flagged as RO_SAFE if all concurrent R/W transactions
     553              :      * commit without having conflicts out to an earlier snapshot, thus
     554              :      * ensuring that no conflicts are possible for this transaction.
     555              :      */
     556       135511 :     if (SxactIsROSafe(MySerializableXact))
     557              :     {
     558           33 :         ReleasePredicateLocks(false, true);
     559           33 :         return false;
     560              :     }
     561              : 
     562              :     /* Check if the relation doesn't participate in predicate locking */
     563       135478 :     if (!PredicateLockingNeededForRelation(relation))
     564       100375 :         return false;
     565              : 
     566        35103 :     return true;                /* no excuse to skip predicate locking */
     567              : }
     568              : 
     569              : /*
     570              :  * Like SerializationNeededForRead(), but called on writes.
     571              :  * The logic is the same, but there is no snapshot and we can't be RO-safe.
     572              :  */
     573              : static inline bool
     574     24569247 : SerializationNeededForWrite(Relation relation)
     575              : {
     576              :     /* Nothing to do if this is not a serializable transaction */
     577     24569247 :     if (MySerializableXact == InvalidSerializableXact)
     578     24561285 :         return false;
     579              : 
     580              :     /* Check if the relation doesn't participate in predicate locking */
     581         7962 :     if (!PredicateLockingNeededForRelation(relation))
     582         3436 :         return false;
     583              : 
     584         4526 :     return true;                /* no excuse to skip predicate locking */
     585              : }
     586              : 
     587              : 
     588              : /*------------------------------------------------------------------------*/
     589              : 
     590              : /*
     591              :  * These functions are a simple implementation of a list for this specific
     592              :  * type of struct.  If there is ever a generalized shared memory list, we
     593              :  * should probably switch to that.
     594              :  */
     595              : static SERIALIZABLEXACT *
     596         2927 : CreatePredXact(void)
     597              : {
     598              :     SERIALIZABLEXACT *sxact;
     599              : 
     600         2927 :     if (dlist_is_empty(&PredXact->availableList))
     601            0 :         return NULL;
     602              : 
     603         2927 :     sxact = dlist_container(SERIALIZABLEXACT, xactLink,
     604              :                             dlist_pop_head_node(&PredXact->availableList));
     605         2927 :     dlist_push_tail(&PredXact->activeList, &sxact->xactLink);
     606         2927 :     return sxact;
     607              : }
     608              : 
     609              : static void
     610         1692 : ReleasePredXact(SERIALIZABLEXACT *sxact)
     611              : {
     612              :     Assert(ShmemAddrIsValid(sxact));
     613              : 
     614         1692 :     dlist_delete(&sxact->xactLink);
     615         1692 :     dlist_push_tail(&PredXact->availableList, &sxact->xactLink);
     616         1692 : }
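
The available/active list handling in CreatePredXact() and ReleasePredXact() can be illustrated outside PostgreSQL with a small fixed-size pool. This is only a sketch under stated assumptions: the hand-rolled circular list stands in for PostgreSQL's dlist, and every name here (SketchNode, SketchXact, SketchPool, SKETCH_POOL_SIZE, sketch_pool_*) is hypothetical and does not appear in predicate.c. It also ignores shared memory and locking entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_POOL_SIZE 4      /* hypothetical capacity, fixed at startup */

/* Intrusive, circular, doubly linked list node (stand-in for dlist_node). */
typedef struct SketchNode
{
    struct SketchNode *prev;
    struct SketchNode *next;
} SketchNode;

typedef struct SketchXact
{
    SketchNode  link;           /* first member, so a node pointer is also an element pointer */
    int         id;             /* illustrative payload */
} SketchXact;

typedef struct SketchPool
{
    SketchNode  available;      /* list head: unused elements */
    SketchNode  active;         /* list head: elements handed out */
    SketchXact  elements[SKETCH_POOL_SIZE];
} SketchPool;

static void
sketch_list_init(SketchNode *head)
{
    head->prev = head->next = head;
}

static bool
sketch_list_is_empty(const SketchNode *head)
{
    return head->next == head;
}

static void
sketch_list_push_tail(SketchNode *head, SketchNode *node)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

static void
sketch_list_delete(SketchNode *node)
{
    node->prev->next = node->next;
    node->next->prev = node->prev;
}

/* Put every element on the available list. */
static void
sketch_pool_init(SketchPool *pool)
{
    sketch_list_init(&pool->available);
    sketch_list_init(&pool->active);
    for (size_t i = 0; i < SKETCH_POOL_SIZE; i++)
        sketch_list_push_tail(&pool->available, &pool->elements[i].link);
}

/* Move the head of the available list to the active list; NULL if exhausted. */
static SketchXact *
sketch_pool_acquire(SketchPool *pool, int id)
{
    SketchNode *node;
    SketchXact *xact;

    if (sketch_list_is_empty(&pool->available))
        return NULL;

    node = pool->available.next;
    sketch_list_delete(node);
    sketch_list_push_tail(&pool->active, node);

    xact = (SketchXact *) node;
    xact->id = id;
    return xact;
}

/* Unlink from whichever list holds the element and return it to the pool. */
static void
sketch_pool_release(SketchPool *pool, SketchXact *xact)
{
    assert(xact != NULL);
    sketch_list_delete(&xact->link);
    sketch_list_push_tail(&pool->available, &xact->link);
}
```

Because nothing is allocated after initialization, exhaustion is an ordinary condition the caller has to handle, which is why the acquire function in this sketch reports it with NULL rather than growing the pool.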
     617              : 
     618              : /*------------------------------------------------------------------------*/
     619              : 
     620              : /*
     621              :  * These functions manage primitive access to the RWConflict pool and lists.
     622              :  */
     623              : static bool
     624         1890 : RWConflictExists(const SERIALIZABLEXACT *reader, const SERIALIZABLEXACT *writer)
     625              : {
     626              :     dlist_iter  iter;
     627              : 
     628              :     Assert(reader != writer);
     629              : 
     630              :     /* Check the ends of the purported conflict first. */
     631         1890 :     if (SxactIsDoomed(reader)
     632         1890 :         || SxactIsDoomed(writer)
     633         1890 :         || dlist_is_empty(&reader->outConflicts)
     634          569 :         || dlist_is_empty(&writer->inConflicts))
     635         1361 :         return false;
     636              : 
     637              :     /*
     638              :      * A conflict is possible; walk the list to find out.
     639              :      *
     640              :      * The unconstify is needed as we have no const version of
     641              :      * dlist_foreach().
     642              :      */
     643          545 :     dlist_foreach(iter, &unconstify(SERIALIZABLEXACT *, reader)->outConflicts)
     644              :     {
     645          529 :         RWConflict  conflict =
     646              :             dlist_container(RWConflictData, outLink, iter.cur);
     647              : 
     648          529 :         if (conflict->sxactIn == writer)
     649          513 :             return true;
     650              :     }
     651              : 
     652              :     /* No conflict found. */
     653           16 :     return false;
     654              : }
     655              : 
     656              : static void
     657          792 : SetRWConflict(SERIALIZABLEXACT *reader, SERIALIZABLEXACT *writer)
     658              : {
     659              :     RWConflict  conflict;
     660              : 
     661              :     Assert(reader != writer);
     662              :     Assert(!RWConflictExists(reader, writer));
     663              : 
     664          792 :     if (dlist_is_empty(&RWConflictPool->availableList))
     665            0 :         ereport(ERROR,
     666              :                 (errcode(ERRCODE_OUT_OF_MEMORY),
     667              :                  errmsg("not enough elements in RWConflictPool to record a read/write conflict"),
     668              :                  errhint("You might need to run fewer transactions at a time or increase \"max_connections\".")));
     669              : 
     670          792 :     conflict = dlist_head_element(RWConflictData, outLink, &RWConflictPool->availableList);
     671          792 :     dlist_delete(&conflict->outLink);
     672              : 
     673          792 :     conflict->sxactOut = reader;
     674          792 :     conflict->sxactIn = writer;
     675          792 :     dlist_push_tail(&reader->outConflicts, &conflict->outLink);
     676          792 :     dlist_push_tail(&writer->inConflicts, &conflict->inLink);
     677          792 : }
     678              : 
     679              : static void
     680          134 : SetPossibleUnsafeConflict(SERIALIZABLEXACT *roXact,
     681              :                           SERIALIZABLEXACT *activeXact)
     682              : {
     683              :     RWConflict  conflict;
     684              : 
     685              :     Assert(roXact != activeXact);
     686              :     Assert(SxactIsReadOnly(roXact));
     687              :     Assert(!SxactIsReadOnly(activeXact));
     688              : 
     689          134 :     if (dlist_is_empty(&RWConflictPool->availableList))
     690            0 :         ereport(ERROR,
     691              :                 (errcode(ERRCODE_OUT_OF_MEMORY),
     692              :                  errmsg("not enough elements in RWConflictPool to record a potential read/write conflict"),
     693              :                  errhint("You might need to run fewer transactions at a time or increase \"max_connections\".")));
     694              : 
     695          134 :     conflict = dlist_head_element(RWConflictData, outLink, &RWConflictPool->availableList);
     696          134 :     dlist_delete(&conflict->outLink);
     697              : 
     698          134 :     conflict->sxactOut = activeXact;
     699          134 :     conflict->sxactIn = roXact;
     700          134 :     dlist_push_tail(&activeXact->possibleUnsafeConflicts, &conflict->outLink);
     701          134 :     dlist_push_tail(&roXact->possibleUnsafeConflicts, &conflict->inLink);
     702          134 : }
     703              : 
     704              : static void
     705          926 : ReleaseRWConflict(RWConflict conflict)
     706              : {
     707          926 :     dlist_delete(&conflict->inLink);
     708          926 :     dlist_delete(&conflict->outLink);
     709          926 :     dlist_push_tail(&RWConflictPool->availableList, &conflict->outLink);
     710          926 : }
     711              : 
     712              : static void
     713            3 : FlagSxactUnsafe(SERIALIZABLEXACT *sxact)
     714              : {
     715              :     dlist_mutable_iter iter;
     716              : 
     717              :     Assert(SxactIsReadOnly(sxact));
     718              :     Assert(!SxactIsROSafe(sxact));
     719              : 
     720            3 :     sxact->flags |= SXACT_FLAG_RO_UNSAFE;
     721              : 
     722              :     /*
     723              :      * We know this isn't a safe snapshot, so we can stop looking for other
     724              :      * potential conflicts.
     725              :      */
     726            6 :     dlist_foreach_modify(iter, &sxact->possibleUnsafeConflicts)
     727              :     {
     728            3 :         RWConflict  conflict =
     729            3 :             dlist_container(RWConflictData, inLink, iter.cur);
     730              : 
     731              :         Assert(!SxactIsReadOnly(conflict->sxactOut));
     732              :         Assert(sxact == conflict->sxactIn);
     733              : 
     734            3 :         ReleaseRWConflict(conflict);
     735              :     }
     736            3 : }
     737              : 
     738              : /*------------------------------------------------------------------------*/
     739              : 
     740              : /*
     741              :  * Decide whether a Serial page number is "older" for truncation purposes.
     742              :  * Analogous to CLOGPagePrecedes().
     743              :  */
     744              : static bool
     745            0 : SerialPagePrecedesLogically(int64 page1, int64 page2)
     746              : {
     747              :     TransactionId xid1;
     748              :     TransactionId xid2;
     749              : 
     750            0 :     xid1 = ((TransactionId) page1) * SERIAL_ENTRIESPERPAGE;
     751            0 :     xid1 += FirstNormalTransactionId + 1;
     752            0 :     xid2 = ((TransactionId) page2) * SERIAL_ENTRIESPERPAGE;
     753            0 :     xid2 += FirstNormalTransactionId + 1;
     754              : 
     755            0 :     return (TransactionIdPrecedes(xid1, xid2) &&
     756            0 :             TransactionIdPrecedes(xid1, xid2 + SERIAL_ENTRIESPERPAGE - 1));
     757              : }
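
The wraparound-aware comparison that SerialPagePrecedesLogically() relies on can be tried in isolation with the sketch below. It rests on several assumptions: SKETCH_ENTRIES_PER_PAGE is an illustrative value (the real SERIAL_ENTRIESPERPAGE is derived from the block size), sketch_xid_precedes() is a simplified stand-in for TransactionIdPrecedes() that skips the special handling of non-normal XIDs, and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration only; not taken from predicate.c. */
#define SKETCH_ENTRIES_PER_PAGE 1024u
/* First XID usable by ordinary transactions; lower values are reserved. */
#define SKETCH_FIRST_NORMAL_XID 3u

/*
 * Circular "a is older than b" on 32-bit XIDs: true when b is fewer than
 * 2^31 steps ahead of a.  Relies on the usual two's-complement conversion
 * of the unsigned difference to a signed value.
 */
static bool
sketch_xid_precedes(uint32_t a, uint32_t b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Same shape as the function above: map each page to an XID on that page,
 * then require that it precede both the matching XID and the last XID of
 * the other page.
 */
static bool
sketch_page_precedes(int64_t page1, int64_t page2)
{
    uint32_t    xid1;
    uint32_t    xid2;

    assert(page1 >= 0 && page2 >= 0);

    xid1 = (uint32_t) page1 * SKETCH_ENTRIES_PER_PAGE;
    xid1 += SKETCH_FIRST_NORMAL_XID + 1;
    xid2 = (uint32_t) page2 * SKETCH_ENTRIES_PER_PAGE;
    xid2 += SKETCH_FIRST_NORMAL_XID + 1;

    return sketch_xid_precedes(xid1, xid2) &&
        sketch_xid_precedes(xid1, xid2 + SKETCH_ENTRIES_PER_PAGE - 1);
}
```

With this definition an XID just below 2^32 counts as older than a small XID such as 10, because the counter is treated as a circle rather than a line.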
     758              : 
     759              : static int
     760            0 : serial_errdetail_for_io_error(const void *opaque_data)
     761              : {
     762            0 :     TransactionId xid = *(const TransactionId *) opaque_data;
     763              : 
     764            0 :     return errdetail("Could not access serializable CSN of transaction %u.", xid);
     765              : }
     766              : 
     767              : #ifdef USE_ASSERT_CHECKING
     768              : static void
     769              : SerialPagePrecedesLogicallyUnitTests(void)
     770              : {
     771              :     int         per_page = SERIAL_ENTRIESPERPAGE,
     772              :                 offset = per_page / 2;
     773              :     int64       newestPage,
     774              :                 oldestPage,
     775              :                 headPage,
     776              :                 targetPage;
     777              :     TransactionId newestXact,
     778              :                 oldestXact;
     779              : 
     780              :     /* GetNewTransactionId() has assigned the last XID it can safely use. */
     781              :     newestPage = 2 * SLRU_PAGES_PER_SEGMENT - 1;    /* nothing special */
     782              :     newestXact = newestPage * per_page + offset;
     783              :     Assert(newestXact / per_page == newestPage);
     784              :     oldestXact = newestXact + 1;
     785              :     oldestXact -= 1U << 31;
     786              :     oldestPage = oldestXact / per_page;
     787              : 
     788              :     /*
     789              :      * In this scenario, the SLRU headPage pertains to the last ~1000 XIDs
     790              :      * assigned.  oldestXact finishes, ~2B XIDs having elapsed since it
     791              :      * started.  Further transactions cause us to summarize oldestXact to
     792              :      * tailPage.  Function must return false so SerialAdd() doesn't zero
     793              :      * tailPage (which may contain entries for other old, recently-finished
     794              :      * XIDs) and half the SLRU.  Reaching this requires burning ~2B XIDs in
     795              :      * single-user mode, a negligible possibility.
     796              :      */
     797              :     headPage = newestPage;
     798              :     targetPage = oldestPage;
     799              :     Assert(!SerialPagePrecedesLogically(headPage, targetPage));
     800              : 
     801              :     /*
     802              :      * In this scenario, the SLRU headPage pertains to oldestXact.  We're
     803              :      * summarizing an XID near newestXact.  (Assume few other XIDs used
     804              :      * SERIALIZABLE, hence the minimal headPage advancement.  Assume
     805              :      * oldestXact was long-running and only recently reached the SLRU.)
     806              :      * Function must return true to make SerialAdd() create targetPage.
     807              :      *
     808              :      * Today's implementation mishandles this case, but it doesn't matter
     809              :      * enough to fix.  Verify that the defect affects just one page by
     810              :      * asserting correct treatment of its prior page.  Reaching this case
     811              :      * requires burning ~2B XIDs in single-user mode, a negligible
     812              :      * possibility.  Moreover, if it does happen, the consequence would be
     813              :      * mild, namely a new transaction failing in SimpleLruReadPage().
     814              :      */
     815              :     headPage = oldestPage;
     816              :     targetPage = newestPage;
     817              :     Assert(SerialPagePrecedesLogically(headPage, targetPage - 1));
     818              : #if 0
     819              :     Assert(SerialPagePrecedesLogically(headPage, targetPage));
     820              : #endif
     821              : }
     822              : #endif
     823              : 
     824              : /*
     825              :  * GUC check_hook for serializable_buffers
     826              :  */
     827              : bool
     828         1279 : check_serial_buffers(int *newval, void **extra, GucSource source)
     829              : {
     830         1279 :     return check_slru_buffers("serializable_buffers", newval);
     831              : }
     832              : 
     833              : /*
     834              :  * Record a committed read-write serializable xid and the minimum
     835              :  * commitSeqNo of any transactions to which this xid had a rw-conflict out.
     836              :  * An invalid commitSeqNo means that there were no conflicts out from xid.
     837              :  */
     838              : static void
     839            0 : SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
     840              : {
     841              :     TransactionId tailXid;
     842              :     int64       targetPage;
     843              :     int         slotno;
     844              :     int64       firstZeroPage;
     845              :     bool        isNewPage;
     846              :     LWLock     *lock;
     847              : 
     848              :     Assert(TransactionIdIsValid(xid));
     849              : 
     850            0 :     targetPage = SerialPage(xid);
     851            0 :     lock = SimpleLruGetBankLock(SerialSlruCtl, targetPage);
     852              : 
     853              :     /*
     854              :      * In this routine, we must hold both SerialControlLock and the SLRU bank
     855              :      * lock simultaneously while making the SLRU data catch up with the new
     856              :      * state that we determine.
     857              :      */
     858            0 :     LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
     859              : 
     860              :     /*
     861              :      * If 'xid' is older than the global xmin (== tailXid), there's no need to
     862              :      * store it, after all. This can happen if the oldest transaction holding
     863              :      * back the global xmin just finished, making 'xid' uninteresting, but
     864              :      * ClearOldPredicateLocks() has not yet run.
     865              :      */
     866            0 :     tailXid = serialControl->tailXid;
     867            0 :     if (!TransactionIdIsValid(tailXid) || TransactionIdPrecedes(xid, tailXid))
     868              :     {
     869            0 :         LWLockRelease(SerialControlLock);
     870            0 :         return;
     871              :     }
     872              : 
     873              :     /*
     874              :      * If the SLRU is currently unused, zero out the whole active region from
     875              :      * tailXid to headXid before taking it into use. Otherwise zero out only
     876              :      * any new pages that enter the tailXid-headXid range as we advance
     877              :      * headXid.
     878              :      */
     879            0 :     if (serialControl->headPage < 0)
     880              :     {
     881            0 :         firstZeroPage = SerialPage(tailXid);
     882            0 :         isNewPage = true;
     883              :     }
     884              :     else
     885              :     {
     886            0 :         firstZeroPage = SerialNextPage(serialControl->headPage);
     887            0 :         isNewPage = SerialPagePrecedesLogically(serialControl->headPage,
     888              :                                                 targetPage);
     889              :     }
     890              : 
     891            0 :     if (!TransactionIdIsValid(serialControl->headXid)
     892            0 :         || TransactionIdFollows(xid, serialControl->headXid))
     893            0 :         serialControl->headXid = xid;
     894            0 :     if (isNewPage)
     895            0 :         serialControl->headPage = targetPage;
     896              : 
     897            0 :     if (isNewPage)
     898              :     {
     899              :         /* Initialize intervening pages; might involve trading locks */
     900              :         for (;;)
     901              :         {
     902            0 :             lock = SimpleLruGetBankLock(SerialSlruCtl, firstZeroPage);
     903            0 :             LWLockAcquire(lock, LW_EXCLUSIVE);
     904            0 :             slotno = SimpleLruZeroPage(SerialSlruCtl, firstZeroPage);
     905            0 :             if (firstZeroPage == targetPage)
     906            0 :                 break;
     907            0 :             firstZeroPage = SerialNextPage(firstZeroPage);
     908            0 :             LWLockRelease(lock);
     909              :         }
     910              :     }
     911              :     else
     912              :     {
     913            0 :         LWLockAcquire(lock, LW_EXCLUSIVE);
     914            0 :         slotno = SimpleLruReadPage(SerialSlruCtl, targetPage, true, &xid);
     915              :     }
     916              : 
     917            0 :     SerialValue(slotno, xid) = minConflictCommitSeqNo;
     918            0 :     SerialSlruCtl->shared->page_dirty[slotno] = true;
     919              : 
     920            0 :     LWLockRelease(lock);
     921            0 :     LWLockRelease(SerialControlLock);
     922              : }
     923              : 
     924              : /*
     925              :  * Get the minimum commitSeqNo for any conflict out for the given xid.  For
     926              :  * a transaction which exists but has no conflict out, InvalidSerCommitSeqNo
     927              :  * will be returned.
     928              :  */
     929              : static SerCommitSeqNo
     930           21 : SerialGetMinConflictCommitSeqNo(TransactionId xid)
     931              : {
     932              :     TransactionId headXid;
     933              :     TransactionId tailXid;
     934              :     SerCommitSeqNo val;
     935              :     int         slotno;
     936              : 
     937              :     Assert(TransactionIdIsValid(xid));
     938              : 
     939           21 :     LWLockAcquire(SerialControlLock, LW_SHARED);
     940           21 :     headXid = serialControl->headXid;
     941           21 :     tailXid = serialControl->tailXid;
     942           21 :     LWLockRelease(SerialControlLock);
     943              : 
     944           21 :     if (!TransactionIdIsValid(headXid))
     945           21 :         return 0;
     946              : 
     947              :     Assert(TransactionIdIsValid(tailXid));
     948              : 
     949            0 :     if (TransactionIdPrecedes(xid, tailXid)
     950            0 :         || TransactionIdFollows(xid, headXid))
     951            0 :         return 0;
     952              : 
     953              :     /*
     954              :      * The following function must be called without holding SLRU bank lock,
     955              :      * but will return with that lock held, which must then be released.
     956              :      */
     957            0 :     slotno = SimpleLruReadPage_ReadOnly(SerialSlruCtl,
     958            0 :                                         SerialPage(xid), &xid);
     959            0 :     val = SerialValue(slotno, xid);
     960            0 :     LWLockRelease(SimpleLruGetBankLock(SerialSlruCtl, SerialPage(xid)));
     961            0 :     return val;
     962              : }
     963              : 
     964              : /*
     965              :  * Call this whenever there is a new xmin for active serializable
     966              :  * transactions.  We don't need to keep information on transactions which
     967              :  * precede that.  InvalidTransactionId means none active, so everything in
     968              :  * the SLRU can be discarded.
     969              :  */
     970              : static void
     971         1767 : SerialSetActiveSerXmin(TransactionId xid)
     972              : {
     973         1767 :     LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
     974              : 
     975              :     /*
     976              :      * When no sxacts are active, nothing overlaps; set the xid values to
     977              :      * invalid to show that there are no valid entries.  Don't clear headPage,
     978              :      * though.  A new xmin might still land on that page, and we don't want to
     979              :      * repeatedly zero out the same page.
     980              :      */
     981         1767 :     if (!TransactionIdIsValid(xid))
     982              :     {
     983          874 :         serialControl->tailXid = InvalidTransactionId;
     984          874 :         serialControl->headXid = InvalidTransactionId;
     985          874 :         LWLockRelease(SerialControlLock);
     986          874 :         return;
     987              :     }
     988              : 
     989              :     /*
     990              :      * When we're recovering prepared transactions, the global xmin might move
     991              :      * backwards depending on the order they're recovered. Normally that's not
     992              :      * OK, but during recovery no serializable transactions will commit, so
     993              :      * the SLRU is empty and we can get away with it.
     994              :      */
     995          893 :     if (RecoveryInProgress())
     996              :     {
     997              :         Assert(serialControl->headPage < 0);
     998            0 :         if (!TransactionIdIsValid(serialControl->tailXid)
     999            0 :             || TransactionIdPrecedes(xid, serialControl->tailXid))
    1000              :         {
    1001            0 :             serialControl->tailXid = xid;
    1002              :         }
    1003            0 :         LWLockRelease(SerialControlLock);
    1004            0 :         return;
    1005              :     }
    1006              : 
    1007              :     Assert(!TransactionIdIsValid(serialControl->tailXid)
    1008              :            || TransactionIdFollows(xid, serialControl->tailXid));
    1009              : 
    1010          893 :     serialControl->tailXid = xid;
    1011              : 
    1012          893 :     LWLockRelease(SerialControlLock);
    1013              : }
    1014              : 
    1015              : /*
    1016              :  * Perform a checkpoint --- either during shutdown, or on-the-fly
    1017              :  *
    1018              :  * We don't have any data that needs to survive a restart, but this is a
    1019              :  * convenient place to truncate the SLRU.
    1020              :  */
    1021              : void
    1022         1938 : CheckPointPredicate(void)
    1023              : {
    1024              :     int64       truncateCutoffPage;
    1025              : 
    1026         1938 :     LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
    1027              : 
    1028              :     /* Exit quickly if the SLRU is currently not in use. */
    1029         1938 :     if (serialControl->headPage < 0)
    1030              :     {
    1031         1938 :         LWLockRelease(SerialControlLock);
    1032         1938 :         return;
    1033              :     }
    1034              : 
    1035            0 :     if (TransactionIdIsValid(serialControl->tailXid))
    1036              :     {
    1037              :         int64       tailPage;
    1038              : 
    1039            0 :         tailPage = SerialPage(serialControl->tailXid);
    1040              : 
    1041              :         /*
    1042              :          * It is possible for the tailXid to be ahead of the headXid.  This
    1043              :          * occurs if we checkpoint while there are in-progress serializable
    1044              :          * transaction(s) advancing the tail but we have yet to summarize the
    1045              :          * transactions.  In this case, we cut off up to the headPage and the
    1046              :          * next summary will advance the headXid.
    1047              :          */
    1048            0 :         if (SerialPagePrecedesLogically(tailPage, serialControl->headPage))
    1049              :         {
    1050              :             /* We can truncate the SLRU up to the page containing tailXid */
    1051            0 :             truncateCutoffPage = tailPage;
    1052              :         }
    1053              :         else
    1054            0 :             truncateCutoffPage = serialControl->headPage;
    1055              :     }
    1056              :     else
    1057              :     {
    1058              :         /*----------
    1059              :          * The SLRU is no longer needed. Truncate to head before we set head
    1060              :          * invalid.
    1061              :          *
    1062              :          * XXX: It's possible that the SLRU is not needed again until XID
    1063              :          * wrap-around has happened, so that the segment containing headPage
    1064              :          * that we leave behind will appear to be new again. In that case it
    1065              :          * won't be removed until XID horizon advances enough to make it
    1066              :          * current again.
    1067              :          *
    1068              :          * XXX: This should happen in vac_truncate_clog(), not in checkpoints.
    1069              :          * Consider this scenario, starting from a system with no in-progress
    1070              :          * transactions and VACUUM FREEZE having maximized oldestXact:
    1071              :          * - Start a SERIALIZABLE transaction.
    1072              :          * - Start, finish, and summarize a SERIALIZABLE transaction, creating
    1073              :          *   one SLRU page.
    1074              :          * - Consume XIDs to reach xidStopLimit.
    1075              :          * - Finish all transactions.  Due to the long-running SERIALIZABLE
    1076              :          *   transaction, earlier checkpoints did not touch headPage.  The
    1077              :          *   next checkpoint will change it, but that checkpoint happens after
    1078              :          *   the end of the scenario.
    1079              :          * - VACUUM to advance XID limits.
    1080              :          * - Consume ~2M XIDs, crossing the former xidWrapLimit.
    1081              :          * - Start, finish, and summarize a SERIALIZABLE transaction.
    1082              :          *   SerialAdd() declines to create the targetPage, because headPage
    1083              :          *   is not regarded as in the past relative to that targetPage.  The
    1084              :          *   transaction instigating the summarize fails in
    1085              :          *   SimpleLruReadPage().
    1086              :          */
    1087            0 :         truncateCutoffPage = serialControl->headPage;
    1088            0 :         serialControl->headPage = -1;
    1089              :     }
    1090              : 
    1091            0 :     LWLockRelease(SerialControlLock);
    1092              : 
    1093              :     /*
    1094              :      * Truncate away pages that are no longer required.  Note that no
    1095              :      * additional locking is required, because this is only called as part of
    1096              :      * a checkpoint, and the validity limits have already been determined.
    1097              :      */
    1098            0 :     SimpleLruTruncate(SerialSlruCtl, truncateCutoffPage);
    1099              : 
    1100              :     /*
    1101              :      * Write dirty SLRU pages to disk
    1102              :      *
    1103              :      * This is not actually necessary from a correctness point of view. We do
    1104              :      * it merely as a debugging aid.
    1105              :      *
    1106              :      * We're doing this after the truncation to avoid writing pages right
    1107              :      * before deleting the file in which they sit, which would be completely
    1108              :      * pointless.
    1109              :      */
    1110            0 :     SimpleLruWriteAll(SerialSlruCtl, true);
    1111              : }
    1112              : 
    1113              : /*------------------------------------------------------------------------*/
    1114              : 
    1115              : /*
    1116              :  * PredicateLockShmemRequest -- Register the predicate locking data structures.
    1117              :  */
    1118              : static void
    1119         1238 : PredicateLockShmemRequest(void *arg)
    1120              : {
    1121              :     int64       max_predicate_lock_targets;
    1122              :     int64       max_predicate_locks;
    1123              :     int64       max_rw_conflicts;
    1124              : 
    1125              :     /*
    1126              :      * Register hash table for PREDICATELOCKTARGET structs.  This stores
    1127              :      * per-predicate-lock-target information.
    1128              :      */
    1129         1238 :     max_predicate_lock_targets = NPREDICATELOCKTARGETENTS();
    1130              : 
    1131         1238 :     ShmemRequestHash(.name = "PREDICATELOCKTARGET hash",
    1132              :                      .nelems = max_predicate_lock_targets,
    1133              :                      .ptr = &PredicateLockTargetHash,
    1134              :                      .hash_info.keysize = sizeof(PREDICATELOCKTARGETTAG),
    1135              :                      .hash_info.entrysize = sizeof(PREDICATELOCKTARGET),
    1136              :                      .hash_info.num_partitions = NUM_PREDICATELOCK_PARTITIONS,
    1137              :                      .hash_flags = HASH_ELEM | HASH_BLOBS | HASH_PARTITION | HASH_FIXED_SIZE,
    1138              :         );
    1139              : 
    1140              :     /*
    1141              :      * Register hash table for PREDICATELOCK structs.  This stores per
    1142              :      * xact-lock-of-a-target information.
    1143              :      *
    1144              :      * Assume an average of 2 xacts per target.
    1145              :      */
    1146         1238 :     max_predicate_locks = max_predicate_lock_targets * 2;
    1147              : 
    1148         1238 :     ShmemRequestHash(.name = "PREDICATELOCK hash",
    1149              :                      .nelems = max_predicate_locks,
    1150              :                      .ptr = &PredicateLockHash,
    1151              :                      .hash_info.keysize = sizeof(PREDICATELOCKTAG),
    1152              :                      .hash_info.entrysize = sizeof(PREDICATELOCK),
    1153              :                      .hash_info.hash = predicatelock_hash,
    1154              :                      .hash_info.num_partitions = NUM_PREDICATELOCK_PARTITIONS,
    1155              :                      .hash_flags = HASH_ELEM | HASH_FUNCTION | HASH_PARTITION | HASH_FIXED_SIZE,
    1156              :         );
    1157              : 
    1158              :     /*
    1159              :      * Compute size for serializable transaction hashtable.
    1160              :      *
    1161              :      * Assume an average of 10 predicate locking transactions per backend.
    1162              :      * This allows aggressive cleanup while detail is present before data must
    1163              :      * be summarized for storage in SLRU and the "dummy" transaction.
    1164              :      */
    1165         1238 :     max_serializable_xacts = (MaxBackends + max_prepared_xacts) * 10;
    1166              : 
    1167              :     /*
    1168              :      * Register a list to hold information on transactions participating in
    1169              :      * predicate locking.
    1170              :      */
    1171         1238 :     ShmemRequestStruct(.name = "PredXactList",
    1172              :                        .size = add_size(PredXactListDataSize,
    1173              :                                         (mul_size((Size) max_serializable_xacts,
    1174              :                                                   sizeof(SERIALIZABLEXACT)))),
    1175              :                        .ptr = (void **) &PredXact,
    1176              :         );
    1177              : 
    1178              :     /*
    1179              :      * Register hash table for SERIALIZABLEXID structs.  This stores per-xid
    1180              :      * information for serializable transactions which have accessed data.
    1181              :      */
    1182         1238 :     ShmemRequestHash(.name = "SERIALIZABLEXID hash",
    1183              :                      .nelems = max_serializable_xacts,
    1184              :                      .ptr = &SerializableXidHash,
    1185              :                      .hash_info.keysize = sizeof(SERIALIZABLEXIDTAG),
    1186              :                      .hash_info.entrysize = sizeof(SERIALIZABLEXID),
    1187              :                      .hash_flags = HASH_ELEM | HASH_BLOBS | HASH_FIXED_SIZE,
    1188              :         );
    1189              : 
    1190              :     /*
    1191              :      * Allocate space for tracking rw-conflicts in lists attached to the
    1192              :      * transactions.
    1193              :      *
    1194              :      * Assume an average of 5 conflicts per transaction.  Calculations suggest
    1195              :      * that this will prevent resource exhaustion in even the most pessimal
    1196              :      * loads up to max_connections = 200 with all 200 connections pounding the
    1197              :      * database with serializable transactions.  Beyond that, there may be
    1198              :      * occasional transactions canceled when trying to flag conflicts. That's
    1199              :      * probably OK.
    1200              :      */
    1201         1238 :     max_rw_conflicts = max_serializable_xacts * 5;
    1202              : 
    1203         1238 :     ShmemRequestStruct(.name = "RWConflictPool",
    1204              :                        .size = RWConflictPoolHeaderDataSize + mul_size((Size) max_rw_conflicts,
    1205              :                                                                        RWConflictDataSize),
    1206              :                        .ptr = (void **) &RWConflictPool,
    1207              :         );
    1208              : 
    1209         1238 :     ShmemRequestStruct(.name = "FinishedSerializableTransactions",
    1210              :                        .size = sizeof(dlist_head),
    1211              :                        .ptr = (void **) &FinishedSerializableTransactions,
    1212              :         );
    1213              : 
    1214              :     /*
    1215              :      * Initialize the SLRU storage for old committed serializable
    1216              :      * transactions.
    1217              :      */
    1218         1238 :     SimpleLruRequest(.desc = &SerialSlruDesc,
    1219              :                      .name = "serializable",
    1220              :                      .Dir = "pg_serial",
    1221              :                      .long_segment_names = false,
    1222              : 
    1223              :                      .nslots = serializable_buffers,
    1224              : 
    1225              :                      .sync_handler = SYNC_HANDLER_NONE,
    1226              :                      .PagePrecedes = SerialPagePrecedesLogically,
    1227              :                      .errdetail_for_io_error = serial_errdetail_for_io_error,
    1228              : 
    1229              :                      .buffer_tranche_id = LWTRANCHE_SERIAL_BUFFER,
    1230              :                      .bank_tranche_id = LWTRANCHE_SERIAL_SLRU,
    1231              :         );
    1232              : #ifdef USE_ASSERT_CHECKING
    1233              :     SerialPagePrecedesLogicallyUnitTests();
    1234              : #endif
    1235              : 
    1236         1238 :     ShmemRequestStruct(.name = "SerialControlData",
    1237              :                        .size = sizeof(SerialControlData),
    1238              :                        .ptr = (void **) &serialControl,
    1239              :         );
    1240         1238 : }
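/*
 * Illustrative sketch, not part of predicate.c: the sizing arithmetic used
 * in PredicateLockShmemRequest above, restated as standalone C.  The names
 * SketchSsiSizes, sketch_ssi_sizes and sketch_max_predicate_locks are
 * hypothetical, and the inputs stand in for MaxBackends and
 * max_prepared_xacts.  Only the multipliers (10, 5, 2) come from the
 * listing; NPREDICATELOCKTARGETENTS() is not reproduced here.
 */
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two derived limits. */
typedef struct SketchSsiSizes
{
    int64_t     max_serializable_xacts;
    int64_t     max_rw_conflicts;
} SketchSsiSizes;

/*
 * Assume an average of 10 predicate locking transactions per backend (or
 * prepared transaction), and an average of 5 rw-conflicts per transaction.
 */
static SketchSsiSizes
sketch_ssi_sizes(int max_backends, int max_prepared_xacts)
{
    SketchSsiSizes sizes;

    assert(max_backends >= 0 && max_prepared_xacts >= 0);

    /* Widen before adding so the intermediate sum cannot overflow int. */
    sizes.max_serializable_xacts =
        ((int64_t) max_backends + max_prepared_xacts) * 10;
    sizes.max_rw_conflicts = sizes.max_serializable_xacts * 5;
    return sizes;
}

/* Assume an average of 2 xacts per lock target. */
static int64_t
sketch_max_predicate_locks(int64_t max_predicate_lock_targets)
{
    return max_predicate_lock_targets * 2;
}
```
/*
 * Under these assumptions, 100 backends with no prepared transactions give
 * 1000 SERIALIZABLEXACT slots and 5000 rw-conflict entries.
 */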
    1241              : 
    1242              : static void
    1243         1235 : PredicateLockShmemInit(void *arg)
    1244              : {
    1245              :     int         max_rw_conflicts;
    1246              :     bool        found;
    1247              : 
    1248              :     /*
    1249              :      * Reserve a dummy entry in the hash table; we use it to make sure there's
    1250              :      * always one entry available when we need to split or combine a page,
    1251              :      * because running out of space there could mean aborting a
    1252              :      * non-serializable transaction.
    1253              :      */
    1254         1235 :     (void) hash_search(PredicateLockTargetHash, &ScratchTargetTag,
    1255              :                        HASH_ENTER, &found);
    1256              :     Assert(!found);
    1257              : 
    1258         1235 :     dlist_init(&PredXact->availableList);
    1259         1235 :     dlist_init(&PredXact->activeList);
    1260         1235 :     PredXact->SxactGlobalXmin = InvalidTransactionId;
    1261         1235 :     PredXact->SxactGlobalXminCount = 0;
    1262         1235 :     PredXact->WritableSxactCount = 0;
    1263         1235 :     PredXact->LastSxactCommitSeqNo = FirstNormalSerCommitSeqNo - 1;
    1264         1235 :     PredXact->CanPartialClearThrough = 0;
    1265         1235 :     PredXact->HavePartialClearedThrough = 0;
    1266         1235 :     PredXact->element
    1267         1235 :         = (SERIALIZABLEXACT *) ((char *) PredXact + PredXactListDataSize);
    1268              :     /* Add all elements to available list, clean. */
    1269      1168055 :     for (int i = 0; i < max_serializable_xacts; i++)
    1270              :     {
    1271      1166820 :         LWLockInitialize(&PredXact->element[i].perXactPredicateListLock,
    1272              :                          LWTRANCHE_PER_XACT_PREDICATE_LIST);
    1273      1166820 :         dlist_push_tail(&PredXact->availableList, &PredXact->element[i].xactLink);
    1274              :     }
    1275         1235 :     PredXact->OldCommittedSxact = CreatePredXact();
    1276         1235 :     SetInvalidVirtualTransactionId(PredXact->OldCommittedSxact->vxid);
    1277         1235 :     PredXact->OldCommittedSxact->prepareSeqNo = 0;
    1278         1235 :     PredXact->OldCommittedSxact->commitSeqNo = 0;
    1279         1235 :     PredXact->OldCommittedSxact->SeqNo.lastCommitBeforeSnapshot = 0;
    1280         1235 :     dlist_init(&PredXact->OldCommittedSxact->outConflicts);
    1281         1235 :     dlist_init(&PredXact->OldCommittedSxact->inConflicts);
    1282         1235 :     dlist_init(&PredXact->OldCommittedSxact->predicateLocks);
    1283         1235 :     dlist_node_init(&PredXact->OldCommittedSxact->finishedLink);
    1284         1235 :     dlist_init(&PredXact->OldCommittedSxact->possibleUnsafeConflicts);
    1285         1235 :     PredXact->OldCommittedSxact->topXid = InvalidTransactionId;
    1286         1235 :     PredXact->OldCommittedSxact->finishedBefore = InvalidTransactionId;
    1287         1235 :     PredXact->OldCommittedSxact->xmin = InvalidTransactionId;
    1288         1235 :     PredXact->OldCommittedSxact->flags = SXACT_FLAG_COMMITTED;
    1289         1235 :     PredXact->OldCommittedSxact->pid = 0;
    1290         1235 :     PredXact->OldCommittedSxact->pgprocno = INVALID_PROC_NUMBER;
    1291              : 
    1292              :     /* Initialize the rw-conflict pool */
    1293         1235 :     dlist_init(&RWConflictPool->availableList);
    1294         1235 :     RWConflictPool->element = (RWConflict) ((char *) RWConflictPool +
    1295              :                                             RWConflictPoolHeaderDataSize);
    1296              : 
    1297         1235 :     max_rw_conflicts = max_serializable_xacts * 5;
    1298              : 
    1299              :     /* Add all elements to available list, clean. */
    1300      5835335 :     for (int i = 0; i < max_rw_conflicts; i++)
    1301              :     {
    1302      5834100 :         dlist_push_tail(&RWConflictPool->availableList,
    1303      5834100 :                         &RWConflictPool->element[i].outLink);
    1304              :     }
    1305              : 
    1306              :     /* Initialize the list of finished serializable transactions */
    1307         1235 :     dlist_init(FinishedSerializableTransactions);
    1308              : 
    1309              :     /* Initialize SerialControl to reflect empty SLRU. */
    1310         1235 :     LWLockAcquire(SerialControlLock, LW_EXCLUSIVE);
    1311         1235 :     serialControl->headPage = -1;
    1312         1235 :     serialControl->headXid = InvalidTransactionId;
    1313         1235 :     serialControl->tailXid = InvalidTransactionId;
    1314         1235 :     LWLockRelease(SerialControlLock);
    1315              : 
    1316              :     SlruPagePrecedesUnitTests(SerialSlruCtl, SERIAL_ENTRIESPERPAGE);
    1317              : 
    1318              :     /* This never changes, so let's keep a local copy. */
    1319         1235 :     OldCommittedSxact = PredXact->OldCommittedSxact;
    1320              : 
    1321              :     /* Pre-calculate the hash and partition lock of the scratch entry */
    1322         1235 :     ScratchTargetTagHash = PredicateLockTargetTagHashCode(&ScratchTargetTag);
    1323         1235 :     ScratchPartitionLock = PredicateLockHashPartitionLock(ScratchTargetTagHash);
    1324         1235 : }
    1325              : 
    1326              : static void
    1327            0 : PredicateLockShmemAttach(void *arg)
    1328              : {
    1329              :     /* This never changes, so let's keep a local copy. */
    1330            0 :     OldCommittedSxact = PredXact->OldCommittedSxact;
    1331              : 
    1332              :     /* Pre-calculate the hash and partition lock of the scratch entry */
    1333            0 :     ScratchTargetTagHash = PredicateLockTargetTagHashCode(&ScratchTargetTag);
    1334            0 :     ScratchPartitionLock = PredicateLockHashPartitionLock(ScratchTargetTagHash);
    1335            0 : }
    1336              : 
    1337              : /*
    1338              :  * Compute the hash code associated with a PREDICATELOCKTAG.
    1339              :  *
    1340              :  * Because we want to use just one set of partition locks for both the
    1341              :  * PREDICATELOCKTARGET and PREDICATELOCK hash tables, we have to make sure
    1342              :  * that PREDICATELOCKs fall into the same partition number as their
    1343              :  * associated PREDICATELOCKTARGETs.  dynahash.c expects the partition number
    1344              :  * to be the low-order bits of the hash code, and therefore a
    1345              :  * PREDICATELOCKTAG's hash code must have the same low-order bits as the
    1346              :  * associated PREDICATELOCKTARGETTAG's hash code.  We achieve this with this
    1347              :  * specialized hash function.
    1348              :  */
    1349              : static uint32
    1350            0 : predicatelock_hash(const void *key, Size keysize)
    1351              : {
    1352            0 :     const PREDICATELOCKTAG *predicatelocktag = (const PREDICATELOCKTAG *) key;
    1353              :     uint32      targethash;
    1354              : 
    1355              :     Assert(keysize == sizeof(PREDICATELOCKTAG));
    1356              : 
    1357              :     /* Look into the associated target object, and compute its hash code */
    1358            0 :     targethash = PredicateLockTargetTagHashCode(&predicatelocktag->myTarget->tag);
    1359              : 
    1360            0 :     return PredicateLockHashCodeFromTargetHashCode(predicatelocktag, targethash);
    1361              : }
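/*
 * Illustrative sketch, not part of predicate.c: one way to make a lock's
 * hash code share its low-order (partition) bits with its target's hash
 * code, as the comment on predicatelock_hash requires.  All sketch_* names
 * are hypothetical, and the real PredicateLockHashCodeFromTargetHashCode
 * macro may differ in detail.  The partition count of 16 is an assumption
 * (the per-call hit counts in GetPredicateLockStatusData, 6624 / 414,
 * suggest it for this build).
 */
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants for illustration only. */
#define SKETCH_LOG2_PARTITIONS 4
#define SKETCH_NUM_PARTITIONS  (1u << SKETCH_LOG2_PARTITIONS)

/* The partition number is the low-order bits of the hash code. */
static uint32_t
sketch_partition(uint32_t hashcode)
{
    return hashcode & (SKETCH_NUM_PARTITIONS - 1);
}

/*
 * Derive a lock's hash code from its target's hash code.  The
 * per-transaction bits are shifted left past the partition bits before the
 * XOR, so the low-order bits of the result always equal those of
 * targethash.
 */
static uint32_t
sketch_lock_hash(uint32_t targethash, uint32_t xact_bits)
{
    uint32_t    lockhash;

    lockhash = targethash ^ (xact_bits << SKETCH_LOG2_PARTITIONS);

    assert(sketch_partition(lockhash) == sketch_partition(targethash));
    return lockhash;
}
```
/*
 * Two locks on the same target held by different transactions therefore
 * get different hash codes but land in the same partition, so one
 * partition lock covers both hash tables.
 */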
    1362              : 
    1363              : 
    1364              : /*
    1365              :  * GetPredicateLockStatusData
    1366              :  *      Return a table containing the internal state of the predicate
    1367              :  *      lock manager for use in pg_lock_status.
    1368              :  *
    1369              :  * Like GetLockStatusData, this function tries to hold the partition LWLocks
    1370              :  * for as short a time as possible by returning two arrays that simply
    1371              :  * contain the PREDICATELOCKTARGETTAG and SERIALIZABLEXACT for each lock
    1372              :  * table entry. Multiple copies of the same PREDICATELOCKTARGETTAG and
    1373              :  * SERIALIZABLEXACT will likely appear.
    1374              :  */
    1375              : PredicateLockData *
    1376          414 : GetPredicateLockStatusData(void)
    1377              : {
    1378              :     PredicateLockData *data;
    1379              :     int         i;
    1380              :     int         els,
    1381              :                 el;
    1382              :     HASH_SEQ_STATUS seqstat;
    1383              :     PREDICATELOCK *predlock;
    1384              : 
    1385          414 :     data = palloc_object(PredicateLockData);
    1386              : 
    1387              :     /*
    1388              :      * To ensure consistency, hold all partition locks at once, acquiring them
    1389              :      * in ascending order, then SerializableXactHashLock.
    1390              :      */
    1391         7038 :     for (i = 0; i < NUM_PREDICATELOCK_PARTITIONS; i++)
    1392         6624 :         LWLockAcquire(PredicateLockHashPartitionLockByIndex(i), LW_SHARED);
    1393          414 :     LWLockAcquire(SerializableXactHashLock, LW_SHARED);
    1394              : 
    1395              :     /* Get number of locks and allocate appropriately-sized arrays. */
    1396          414 :     els = hash_get_num_entries(PredicateLockHash);
    1397          414 :     data->nelements = els;
    1398          414 :     data->locktags = palloc_array(PREDICATELOCKTARGETTAG, els);
    1399          414 :     data->xacts = palloc_array(SERIALIZABLEXACT, els);
    1400              : 
    1401              : 
    1402              :     /* Scan through PredicateLockHash and copy contents */
    1403          414 :     hash_seq_init(&seqstat, PredicateLockHash);
    1404              : 
    1405          414 :     el = 0;
    1406              : 
    1407          418 :     while ((predlock = (PREDICATELOCK *) hash_seq_search(&seqstat)))
    1408              :     {
    1409            4 :         data->locktags[el] = predlock->tag.myTarget->tag;
    1410            4 :         data->xacts[el] = *predlock->tag.myXact;
    1411            4 :         el++;
    1412              :     }
    1413              : 
    1414              :     Assert(el == els);
    1415              : 
    1416              :     /* Release locks in reverse order */
    1417          414 :     LWLockRelease(SerializableXactHashLock);
    1418         7038 :     for (i = NUM_PREDICATELOCK_PARTITIONS - 1; i >= 0; i--)
    1419         6624 :         LWLockRelease(PredicateLockHashPartitionLockByIndex(i));
    1420              : 
    1421          414 :     return data;
    1422              : }
    1423              : 
    1424              : /*
    1425              :  * Free up shared memory structures by pushing the oldest sxact (the one at
    1426              :  * the front of the FinishedSerializableTransactions list) into summary form.
    1427              :  * Each call will free exactly one SERIALIZABLEXACT structure and may also
    1428              :  * free one or more of these structures: SERIALIZABLEXID, PREDICATELOCK,
    1429              :  * PREDICATELOCKTARGET, RWConflictData.
    1430              :  */
    1431              : static void
    1432            0 : SummarizeOldestCommittedSxact(void)
    1433              : {
    1434              :     SERIALIZABLEXACT *sxact;
    1435              : 
    1436            0 :     LWLockAcquire(SerializableFinishedListLock, LW_EXCLUSIVE);
    1437              : 
    1438              :     /*
    1439              :      * This function is only called if there are no sxact slots available.
    1440              :      * Some of them must belong to old, already-finished transactions, so
    1441              :      * there should be something in FinishedSerializableTransactions list that
    1442              :      * we can summarize. However, there's a race condition: while we were not
    1443              :      * holding any locks, a transaction might have ended and cleaned up all
    1444              :      * the finished sxact entries already, freeing up their sxact slots. In
    1445              :      * that case, we have nothing to do here. The caller will find one of the
    1446              :      * slots released by the other backend when it retries.
    1447              :      */
    1448            0 :     if (dlist_is_empty(FinishedSerializableTransactions))
    1449              :     {
    1450            0 :         LWLockRelease(SerializableFinishedListLock);
    1451            0 :         return;
    1452              :     }
    1453              : 
    1454              :     /*
    1455              :      * Grab the first sxact off the finished list -- this will be the earliest
    1456              :      * commit.  Remove it from the list.
    1457              :      */
    1458            0 :     sxact = dlist_head_element(SERIALIZABLEXACT, finishedLink,
    1459              :                                FinishedSerializableTransactions);
    1460            0 :     dlist_delete_thoroughly(&sxact->finishedLink);
    1461              : 
    1462              :     /* Add to SLRU summary information. */
    1463            0 :     if (TransactionIdIsValid(sxact->topXid) && !SxactIsReadOnly(sxact))
    1464            0 :         SerialAdd(sxact->topXid, SxactHasConflictOut(sxact)
    1465              :                   ? sxact->SeqNo.earliestOutConflictCommit : InvalidSerCommitSeqNo);
    1466              : 
    1467              :     /* Summarize and release the detail. */
    1468            0 :     ReleaseOneSerializableXact(sxact, false, true);
    1469              : 
    1470            0 :     LWLockRelease(SerializableFinishedListLock);
    1471              : }
    1472              : 
    1473              : /*
    1474              :  * GetSafeSnapshot
    1475              :  *      Obtain and register a snapshot for a READ ONLY DEFERRABLE
    1476              :  *      transaction. Ensures that the snapshot is "safe", i.e. a
    1477              :  *      read-only transaction running on it can execute serializably
    1478              :  *      without further checks. This requires waiting for concurrent
    1479              :  *      transactions to complete, and retrying with a new snapshot if
    1480              :  *      one of them could possibly create a conflict.
    1481              :  *
    1482              :  *      As with GetSerializableTransactionSnapshot (which this is a subroutine
    1483              :  *      for), the passed-in Snapshot pointer should reference a static data
    1484              :  *      area that can safely be passed to GetSnapshotData.
    1485              :  */
    1486              : static Snapshot
    1487            7 : GetSafeSnapshot(Snapshot origSnapshot)
    1488              : {
    1489              :     Snapshot    snapshot;
    1490              : 
    1491              :     Assert(XactReadOnly && XactDeferrable);
    1492              : 
    1493              :     while (true)
    1494              :     {
    1495              :         /*
    1496              :          * GetSerializableTransactionSnapshotInt is going to call
    1497              :          * GetSnapshotData, so we need to provide it the static snapshot area
    1498              :          * our caller passed to us.  The pointer returned is actually the same
    1499              :          * one passed to it, but we avoid assuming that here.
    1500              :          */
    1501            8 :         snapshot = GetSerializableTransactionSnapshotInt(origSnapshot,
    1502              :                                                          NULL, InvalidPid);
    1503              : 
    1504            8 :         if (MySerializableXact == InvalidSerializableXact)
    1505            5 :             return snapshot;    /* no concurrent r/w xacts; it's safe */
    1506              : 
    1507            3 :         LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
    1508              : 
    1509              :         /*
    1510              :          * Wait for concurrent transactions to finish. Stop early if one of
    1511              :          * them marked us as conflicted.
    1512              :          */
    1513            3 :         MySerializableXact->flags |= SXACT_FLAG_DEFERRABLE_WAITING;
    1514            6 :         while (!(dlist_is_empty(&MySerializableXact->possibleUnsafeConflicts) ||
    1515            3 :                  SxactIsROUnsafe(MySerializableXact)))
    1516              :         {
    1517            3 :             LWLockRelease(SerializableXactHashLock);
    1518            3 :             ProcWaitForSignal(WAIT_EVENT_SAFE_SNAPSHOT);
    1519            3 :             LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
    1520              :         }
    1521            3 :         MySerializableXact->flags &= ~SXACT_FLAG_DEFERRABLE_WAITING;
    1522              : 
    1523            3 :         if (!SxactIsROUnsafe(MySerializableXact))
    1524              :         {
    1525            2 :             LWLockRelease(SerializableXactHashLock);
    1526            2 :             break;              /* success */
    1527              :         }
    1528              : 
    1529            1 :         LWLockRelease(SerializableXactHashLock);
    1530              : 
    1531              :         /* else, need to retry... */
    1532            1 :         ereport(DEBUG2,
    1533              :                 (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
    1534              :                  errmsg_internal("deferrable snapshot was unsafe; trying a new one")));
    1535            1 :         ReleasePredicateLocks(false, false);
    1536              :     }
    1537              : 
    1538              :     /*
    1539              :      * Now we have a safe snapshot, so we don't need to do any further checks.
    1540              :      */
    1541              :     Assert(SxactIsROSafe(MySerializableXact));
    1542            2 :     ReleasePredicateLocks(false, true);
    1543              : 
    1544            2 :     return snapshot;
    1545              : }
    1546              : 
    1547              : /*
    1548              :  * GetSafeSnapshotBlockingPids
    1549              :  *      If the specified process is currently blocked in GetSafeSnapshot,
    1550              :  *      write the process IDs of all processes that it is blocked by
    1551              :  *      into the caller-supplied buffer output[].  The list is truncated at
    1552              :  *      output_size, and the number of PIDs written into the buffer is
    1553              :  *      returned.  Returns zero if the given PID is not currently blocked
    1554              :  *      in GetSafeSnapshot.
    1555              :  */
    1556              : int
    1557          409 : GetSafeSnapshotBlockingPids(int blocked_pid, int *output, int output_size)
    1558              : {
    1559          409 :     int         num_written = 0;
    1560              :     dlist_iter  iter;
    1561          409 :     SERIALIZABLEXACT *blocking_sxact = NULL;
    1562              : 
    1563          409 :     LWLockAcquire(SerializableXactHashLock, LW_SHARED);
    1564              : 
    1565              :     /* Find blocked_pid's SERIALIZABLEXACT by linear search. */
    1566          958 :     dlist_foreach(iter, &PredXact->activeList)
    1567              :     {
    1568          628 :         SERIALIZABLEXACT *sxact =
    1569          628 :             dlist_container(SERIALIZABLEXACT, xactLink, iter.cur);
    1570              : 
    1571          628 :         if (sxact->pid == blocked_pid)
    1572              :         {
    1573           79 :             blocking_sxact = sxact;
    1574           79 :             break;
    1575              :         }
    1576              :     }
    1577              : 
    1578              :     /* Did we find it, and is it currently waiting in GetSafeSnapshot? */
    1579          409 :     if (blocking_sxact != NULL && SxactIsDeferrableWaiting(blocking_sxact))
    1580              :     {
    1581              :         /* Traverse the list of possible unsafe conflicts collecting PIDs. */
    1582            2 :         dlist_foreach(iter, &blocking_sxact->possibleUnsafeConflicts)
    1583              :         {
    1584            2 :             RWConflict  possibleUnsafeConflict =
    1585            2 :                 dlist_container(RWConflictData, inLink, iter.cur);
    1586              : 
    1587            2 :             output[num_written++] = possibleUnsafeConflict->sxactOut->pid;
    1588              : 
    1589            2 :             if (num_written >= output_size)
    1590            2 :                 break;
    1591              :         }
    1592              :     }
    1593              : 
    1594          409 :     LWLockRelease(SerializableXactHashLock);
    1595              : 
    1596          409 :     return num_written;
    1597              : }
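
The buffer-filling loop above can be sketched in isolation. This is a minimal illustration, not the PostgreSQL code: `collect_pids` and its array-based input are hypothetical stand-ins for the traversal of possibleUnsafeConflicts. One deliberate difference: the sketch checks the bound before writing, whereas the listing writes first and then tests `num_written >= output_size`, which appears to assume output_size is at least 1.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: copy at most output_size PIDs from src[] into
 * output[] and return how many were written.  The bound is tested before
 * each store, so output_size == 0 writes nothing.
 */
static int
collect_pids(const int *src, size_t nsrc, int *output, int output_size)
{
    int         num_written = 0;

    for (size_t i = 0; i < nsrc && num_written < output_size; i++)
        output[num_written++] = src[i];

    return num_written;
}
```

With three source PIDs and a two-element buffer, the call returns 2 and the third PID is dropped, matching the truncation behavior described in the header comment.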
    1598              : 
    1599              : /*
    1600              :  * Acquire a snapshot that can be used for the current transaction.
    1601              :  *
    1602              :  * Make sure we have a SERIALIZABLEXACT reference in MySerializableXact.
    1603              :  * It should be current for this process and be contained in PredXact.
    1604              :  *
    1605              :  * The passed-in Snapshot pointer should reference a static data area that
    1606              :  * can safely be passed to GetSnapshotData.  The return value is actually
    1607              :  * always this same pointer; no new snapshot data structure is allocated
    1608              :  * within this function.
    1609              :  */
    1610              : Snapshot
    1611         1691 : GetSerializableTransactionSnapshot(Snapshot snapshot)
    1612              : {
    1613              :     Assert(IsolationIsSerializable());
    1614              : 
    1615              :     /*
    1616              :      * Can't use serializable mode while recovery is still active, as it is,
    1617              :      * for example, on a hot standby.  We could get here despite the check in
    1618              :      * check_transaction_isolation() if default_transaction_isolation is set
    1619              :      * to serializable, so phrase the hint accordingly.
    1620              :      */
    1621         1691 :     if (RecoveryInProgress())
    1622            0 :         ereport(ERROR,
    1623              :                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
    1624              :                  errmsg("cannot use serializable mode in a hot standby"),
    1625              :                  errdetail("\"default_transaction_isolation\" is set to \"serializable\"."),
    1626              :                  errhint("You can use \"SET default_transaction_isolation = 'repeatable read'\" to change the default.")));
    1627              : 
    1628              :     /*
    1629              :      * A special optimization is available for SERIALIZABLE READ ONLY
    1630              :      * DEFERRABLE transactions -- we can wait for a suitable snapshot and
    1631              :      * thereby avoid all SSI overhead once it's running.
    1632              :      */
    1633         1691 :     if (XactReadOnly && XactDeferrable)
    1634            7 :         return GetSafeSnapshot(snapshot);
    1635              : 
    1636         1684 :     return GetSerializableTransactionSnapshotInt(snapshot,
    1637              :                                                  NULL, InvalidPid);
    1638              : }
    1639              : 
    1640              : /*
    1641              :  * Import a snapshot to be used for the current transaction.
    1642              :  *
    1643              :  * This is nearly the same as GetSerializableTransactionSnapshot, except that
    1644              :  * we don't take a new snapshot, but rather use the data we're handed.
    1645              :  *
    1646              :  * The caller must have verified that the snapshot came from a serializable
    1647              :  * transaction; and if we're read-write, the source transaction must not be
    1648              :  * read-only.
    1649              :  */
    1650              : void
    1651           13 : SetSerializableTransactionSnapshot(Snapshot snapshot,
    1652              :                                    VirtualTransactionId *sourcevxid,
    1653              :                                    int sourcepid)
    1654              : {
    1655              :     Assert(IsolationIsSerializable());
    1656              : 
    1657              :     /*
    1658              :      * If this is called by parallel.c in a parallel worker, we don't want to
    1659              :      * create a SERIALIZABLEXACT just yet because the leader's
    1660              :      * SERIALIZABLEXACT will be installed with AttachSerializableXact().  We
    1661              :      * also don't want to reject SERIALIZABLE READ ONLY DEFERRABLE in this
    1662              :      * case, because the leader has already determined that the snapshot it
    1663              :      * has passed us is safe.  So there is nothing for us to do.
    1664              :      */
    1665           13 :     if (IsParallelWorker())
    1666           13 :         return;
    1667              : 
    1668              :     /*
    1669              :      * We do not allow SERIALIZABLE READ ONLY DEFERRABLE transactions to
    1670              :      * import snapshots, since there's no way to wait for a safe snapshot when
    1671              :      * we must use the snapshot we were given.  (XXX instead of throwing an error,
    1672              :      * we could just ignore the XactDeferrable flag?)
    1673              :      */
    1674            0 :     if (XactReadOnly && XactDeferrable)
    1675            0 :         ereport(ERROR,
    1676              :                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
    1677              :                  errmsg("a snapshot-importing transaction must not be READ ONLY DEFERRABLE")));
    1678              : 
    1679            0 :     (void) GetSerializableTransactionSnapshotInt(snapshot, sourcevxid,
    1680              :                                                  sourcepid);
    1681              : }
    1682              : 
    1683              : /*
    1684              :  * Guts of GetSerializableTransactionSnapshot
    1685              :  *
    1686              :  * If sourcevxid is valid, this is actually an import operation and we should
    1687              :  * skip calling GetSnapshotData, because the snapshot contents are already
    1688              :  * loaded up.  HOWEVER: to avoid race conditions, we must check that the
    1689              :  * source xact is still running after we acquire SerializableXactHashLock.
    1690              :  * We do that by calling ProcArrayInstallImportedXmin.
    1691              :  */
    1692              : static Snapshot
    1693         1692 : GetSerializableTransactionSnapshotInt(Snapshot snapshot,
    1694              :                                       VirtualTransactionId *sourcevxid,
    1695              :                                       int sourcepid)
    1696              : {
    1697              :     PGPROC     *proc;
    1698              :     VirtualTransactionId vxid;
    1699              :     SERIALIZABLEXACT *sxact,
    1700              :                *othersxact;
    1701              : 
    1702              :     /* We do this only for serializable transactions, and only once each. */
    1703              :     Assert(MySerializableXact == InvalidSerializableXact);
    1704              : 
    1705              :     Assert(!RecoveryInProgress());
    1706              : 
    1707              :     /*
    1708              :      * Since all parts of a serializable transaction must use the same
    1709              :      * snapshot, it is too late to establish one after a parallel operation
    1710              :      * has begun.
    1711              :      */
    1712         1692 :     if (IsInParallelMode())
    1713            0 :         elog(ERROR, "cannot establish serializable snapshot during a parallel operation");
    1714              : 
    1715         1692 :     proc = MyProc;
    1716              :     Assert(proc != NULL);
    1717         1692 :     GET_VXID_FROM_PGPROC(vxid, *proc);
    1718              : 
    1719              :     /*
    1720              :      * First we get the sxact structure, which may involve looping and access
    1721              :      * to the "finished" list to free a structure for use.
    1722              :      *
    1723              :      * We must hold SerializableXactHashLock when taking/checking the snapshot
    1724              :      * to avoid race conditions, for much the same reasons that
    1725              :      * GetSnapshotData takes the ProcArrayLock.  Since we might have to
    1726              :      * release SerializableXactHashLock to call SummarizeOldestCommittedSxact,
    1727              :      * this means we have to create the sxact first, which is a bit annoying
    1728              :      * (in particular, an elog(ERROR) in procarray.c would cause us to leak
    1729              :      * the sxact).  Consider refactoring to avoid this.
    1730              :      */
    1731              : #ifdef TEST_SUMMARIZE_SERIAL
    1732              :     SummarizeOldestCommittedSxact();
    1733              : #endif
    1734         1692 :     LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
    1735              :     do
    1736              :     {
    1737         1692 :         sxact = CreatePredXact();
    1738              :         /* If null, push out committed sxact to SLRU summary & retry. */
    1739         1692 :         if (!sxact)
    1740              :         {
    1741            0 :             LWLockRelease(SerializableXactHashLock);
    1742            0 :             SummarizeOldestCommittedSxact();
    1743            0 :             LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
    1744              :         }
    1745         1692 :     } while (!sxact);
    1746              : 
    1747              :     /* Get the snapshot, or check that it's safe to use */
    1748         1692 :     if (!sourcevxid)
    1749         1692 :         snapshot = GetSnapshotData(snapshot);
    1750            0 :     else if (!ProcArrayInstallImportedXmin(snapshot->xmin, sourcevxid))
    1751              :     {
    1752            0 :         ReleasePredXact(sxact);
    1753            0 :         LWLockRelease(SerializableXactHashLock);
    1754            0 :         ereport(ERROR,
    1755              :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
    1756              :                  errmsg("could not import the requested snapshot"),
    1757              :                  errdetail("The source process with PID %d is not running anymore.",
    1758              :                            sourcepid)));
    1759              :     }
    1760              : 
    1761              :     /*
    1762              :      * If there are no serializable transactions which are not read-only, we
    1763              :      * can "opt out" of predicate locking and conflict checking for a
    1764              :      * read-only transaction.
    1765              :      *
    1766              :      * The reason this is safe is that a read-only transaction can only become
    1767              :      * part of a dangerous structure if it overlaps a writable transaction
    1768              :      * which in turn overlaps a writable transaction which committed before
    1769              :      * the read-only transaction started.  A new writable transaction can
    1770              :      * overlap this one, but it can't meet the other condition of overlapping
    1771              :      * a transaction which committed before this one started.
    1772              :      */
    1773         1692 :     if (XactReadOnly && PredXact->WritableSxactCount == 0)
    1774              :     {
    1775          115 :         ReleasePredXact(sxact);
    1776          115 :         LWLockRelease(SerializableXactHashLock);
    1777          115 :         return snapshot;
    1778              :     }
    1779              : 
    1780              :     /* Initialize the structure. */
    1781         1577 :     sxact->vxid = vxid;
    1782         1577 :     sxact->SeqNo.lastCommitBeforeSnapshot = PredXact->LastSxactCommitSeqNo;
    1783         1577 :     sxact->prepareSeqNo = InvalidSerCommitSeqNo;
    1784         1577 :     sxact->commitSeqNo = InvalidSerCommitSeqNo;
    1785         1577 :     dlist_init(&(sxact->outConflicts));
    1786         1577 :     dlist_init(&(sxact->inConflicts));
    1787         1577 :     dlist_init(&(sxact->possibleUnsafeConflicts));
    1788         1577 :     sxact->topXid = GetTopTransactionIdIfAny();
    1789         1577 :     sxact->finishedBefore = InvalidTransactionId;
    1790         1577 :     sxact->xmin = snapshot->xmin;
    1791         1577 :     sxact->pid = MyProcPid;
    1792         1577 :     sxact->pgprocno = MyProcNumber;
    1793         1577 :     dlist_init(&sxact->predicateLocks);
    1794         1577 :     dlist_node_init(&sxact->finishedLink);
    1795         1577 :     sxact->flags = 0;
    1796         1577 :     if (XactReadOnly)
    1797              :     {
    1798              :         dlist_iter  iter;
    1799              : 
    1800          108 :         sxact->flags |= SXACT_FLAG_READ_ONLY;
    1801              : 
    1802              :         /*
    1803              :          * Register all concurrent r/w transactions as possible conflicts; if
    1804              :          * all of them commit without any outgoing conflicts to earlier
    1805              :          * transactions then this snapshot can be deemed safe (and we can run
    1806              :          * without tracking predicate locks).
    1807              :          */
    1808          472 :         dlist_foreach(iter, &PredXact->activeList)
    1809              :         {
    1810          364 :             othersxact = dlist_container(SERIALIZABLEXACT, xactLink, iter.cur);
    1811              : 
    1812          364 :             if (!SxactIsCommitted(othersxact)
    1813          243 :                 && !SxactIsDoomed(othersxact)
    1814          243 :                 && !SxactIsReadOnly(othersxact))
    1815              :             {
    1816          134 :                 SetPossibleUnsafeConflict(sxact, othersxact);
    1817              :             }
    1818              :         }
    1819              : 
    1820              :         /*
    1821              :          * If we didn't find any possibly unsafe conflicts because every
    1822              :          * uncommitted writable transaction turned out to be doomed, then we
    1823              :          * can "opt out" immediately.  See comments above the earlier check
    1824              :          * for PredXact->WritableSxactCount == 0.
    1825              :          */
    1826          108 :         if (dlist_is_empty(&sxact->possibleUnsafeConflicts))
    1827              :         {
    1828            0 :             ReleasePredXact(sxact);
    1829            0 :             LWLockRelease(SerializableXactHashLock);
    1830            0 :             return snapshot;
    1831              :         }
    1832              :     }
    1833              :     else
    1834              :     {
    1835         1469 :         ++(PredXact->WritableSxactCount);
    1836              :         Assert(PredXact->WritableSxactCount <=
    1837              :                (MaxBackends + max_prepared_xacts));
    1838              :     }
    1839              : 
    1840              :     /* Maintain serializable global xmin info. */
    1841         1577 :     if (!TransactionIdIsValid(PredXact->SxactGlobalXmin))
    1842              :     {
    1843              :         Assert(PredXact->SxactGlobalXminCount == 0);
    1844          874 :         PredXact->SxactGlobalXmin = snapshot->xmin;
    1845          874 :         PredXact->SxactGlobalXminCount = 1;
    1846          874 :         SerialSetActiveSerXmin(snapshot->xmin);
    1847              :     }
    1848          703 :     else if (TransactionIdEquals(snapshot->xmin, PredXact->SxactGlobalXmin))
    1849              :     {
    1850              :         Assert(PredXact->SxactGlobalXminCount > 0);
    1851          666 :         PredXact->SxactGlobalXminCount++;
    1852              :     }
    1853              :     else
    1854              :     {
    1855              :         Assert(TransactionIdFollows(snapshot->xmin, PredXact->SxactGlobalXmin));
    1856              :     }
    1857              : 
    1858         1577 :     MySerializableXact = sxact;
    1859         1577 :     MyXactDidWrite = false;     /* haven't written anything yet */
    1860              : 
    1861         1577 :     LWLockRelease(SerializableXactHashLock);
    1862              : 
    1863         1577 :     CreateLocalPredicateLockHash();
    1864              : 
    1865         1577 :     return snapshot;
    1866              : }
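
The "Maintain serializable global xmin info" step near the end of the function is a small reference-counting scheme, sketched here on its own. Everything in the block is hypothetical (`XminTracker`, `xmin_register`), and it compares transaction IDs with plain integer operators, ignoring the wraparound-aware comparison that TransactionIdFollows performs. It treats 0 as the invalid ID.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the SxactGlobalXmin fields; 0 means "invalid". */
typedef struct
{
    uint32_t    global_xmin;
    int         count;          /* number of sxacts sharing global_xmin */
} XminTracker;

/*
 * Record a new snapshot's xmin.  The first registration sets the global
 * xmin; an equal xmin bumps the count; a later xmin leaves both untouched.
 * An older xmin would violate the invariant, so it is asserted against.
 */
static void
xmin_register(XminTracker *t, uint32_t xmin)
{
    if (t->global_xmin == 0)
    {
        assert(t->count == 0);
        t->global_xmin = xmin;
        t->count = 1;
    }
    else if (xmin == t->global_xmin)
    {
        assert(t->count > 0);
        t->count++;
    }
    else
        assert(xmin > t->global_xmin);  /* no wraparound handling here */
}
```

Registering 100, 100, 105 in that order leaves the tracked xmin at 100 with a count of 2. This lines up with the listing's hit counts, where the third branch contains only an assertion.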
    1867              : 
    1868              : static void
    1869         1590 : CreateLocalPredicateLockHash(void)
    1870              : {
    1871              :     HASHCTL     hash_ctl;
    1872              : 
    1873              :     /* Initialize the backend-local hash table of parent locks */
    1874              :     Assert(LocalPredicateLockHash == NULL);
    1875         1590 :     hash_ctl.keysize = sizeof(PREDICATELOCKTARGETTAG);
    1876         1590 :     hash_ctl.entrysize = sizeof(LOCALPREDICATELOCK);
    1877         1590 :     LocalPredicateLockHash = hash_create("Local predicate lock",
    1878              :                                          max_predicate_locks_per_xact,
    1879              :                                          &hash_ctl,
    1880              :                                          HASH_ELEM | HASH_BLOBS);
    1881         1590 : }
    1882              : 
    1883              : /*
    1884              :  * Register the top level XID in SerializableXidHash.
    1885              :  * Also store it for easy reference in MySerializableXact.
    1886              :  */
    1887              : void
    1888       167581 : RegisterPredicateLockingXid(TransactionId xid)
    1889              : {
    1890              :     SERIALIZABLEXIDTAG sxidtag;
    1891              :     SERIALIZABLEXID *sxid;
    1892              :     bool        found;
    1893              : 
    1894              :     /*
    1895              :      * If we're not tracking predicate lock data for this transaction, we
    1896              :      * should ignore the request and return quickly.
    1897              :      */
    1898       167581 :     if (MySerializableXact == InvalidSerializableXact)
    1899       166279 :         return;
    1900              : 
    1901              :     /* We should have a valid XID and be at the top level. */
    1902              :     Assert(TransactionIdIsValid(xid));
    1903              : 
    1904         1302 :     LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
    1905              : 
    1906              :     /* This should only be done once per transaction. */
    1907              :     Assert(MySerializableXact->topXid == InvalidTransactionId);
    1908              : 
    1909         1302 :     MySerializableXact->topXid = xid;
    1910              : 
    1911         1302 :     sxidtag.xid = xid;
    1912         1302 :     sxid = (SERIALIZABLEXID *) hash_search(SerializableXidHash,
    1913              :                                            &sxidtag,
    1914              :                                            HASH_ENTER, &found);
    1915              :     Assert(!found);
    1916              : 
    1917              :     /* Initialize the structure. */
    1918         1302 :     sxid->myXact = MySerializableXact;
    1919         1302 :     LWLockRelease(SerializableXactHashLock);
    1920              : }
    1921              : 
    1922              : 
    1923              : /*
    1924              :  * Check whether there are any predicate locks held by any transaction
    1925              :  * for the page at the given block number.
    1926              :  *
    1927              :  * Note that the transaction may be completed but not yet subject to
    1928              :  * cleanup due to overlapping serializable transactions.  This must
    1929              :  * return valid information regardless of transaction isolation level.
    1930              :  *
    1931              :  * Also note that this doesn't check for a conflicting relation lock,
    1932              :  * just a lock specifically on the given page.
    1933              :  *
    1934              :  * One use is to support proper behavior during GiST index vacuum.
    1935              :  */
    1936              : bool
    1937            0 : PageIsPredicateLocked(Relation relation, BlockNumber blkno)
    1938              : {
    1939              :     PREDICATELOCKTARGETTAG targettag;
    1940              :     uint32      targettaghash;
    1941              :     LWLock     *partitionLock;
    1942              :     PREDICATELOCKTARGET *target;
    1943              : 
    1944            0 :     SET_PREDICATELOCKTARGETTAG_PAGE(targettag,
    1945              :                                     relation->rd_locator.dbOid,
    1946              :                                     relation->rd_id,
    1947              :                                     blkno);
    1948              : 
    1949            0 :     targettaghash = PredicateLockTargetTagHashCode(&targettag);
    1950            0 :     partitionLock = PredicateLockHashPartitionLock(targettaghash);
    1951            0 :     LWLockAcquire(partitionLock, LW_SHARED);
    1952              :     target = (PREDICATELOCKTARGET *)
    1953            0 :         hash_search_with_hash_value(PredicateLockTargetHash,
    1954              :                                     &targettag, targettaghash,
    1955              :                                     HASH_FIND, NULL);
    1956            0 :     LWLockRelease(partitionLock);
    1957              : 
    1958            0 :     return (target != NULL);
    1959              : }
    1960              : 
    1961              : 
    1962              : /*
    1963              :  * Check whether a particular lock is held by this transaction.
    1964              :  *
    1965              :  * Important note: this function may return false even if the lock is
    1966              :  * being held, because it uses the local lock table which is not
    1967              :  * updated if another transaction modifies our lock list (e.g. to
    1968              :  * split an index page). It can also return true when a coarser
    1969              :  * granularity lock that covers this target is being held. Be careful
    1970              :  * to only use this function in circumstances where such errors are
    1971              :  * acceptable!
    1972              :  */
    1973              : static bool
    1974        41896 : PredicateLockExists(const PREDICATELOCKTARGETTAG *targettag)
    1975              : {
    1976              :     LOCALPREDICATELOCK *lock;
    1977              : 
    1978              :     /* check local hash table */
    1979        41896 :     lock = (LOCALPREDICATELOCK *) hash_search(LocalPredicateLockHash,
    1980              :                                               targettag,
    1981              :                                               HASH_FIND, NULL);
    1982              : 
    1983        41896 :     if (!lock)
    1984        12752 :         return false;
    1985              : 
    1986              :     /*
    1987              :      * Found entry in the table, but still need to check whether it's actually
    1988              :      * held -- it could just be a parent of some held lock.
    1989              :      */
    1990        29144 :     return lock->held;
    1991              : }
    1992              : 
    1993              : /*
    1994              :  * Return the parent lock tag in the lock hierarchy: the next coarser
    1995              :  * lock that covers the provided tag.
    1996              :  *
    1997              :  * Returns true and sets *parent to the parent tag if one exists,
    1998              :  * returns false if none exists.
    1999              :  */
    2000              : static bool
    2001        25390 : GetParentPredicateLockTag(const PREDICATELOCKTARGETTAG *tag,
    2002              :                           PREDICATELOCKTARGETTAG *parent)
    2003              : {
    2004        25390 :     switch (GET_PREDICATELOCKTARGETTAG_TYPE(*tag))
    2005              :     {
    2006         8967 :         case PREDLOCKTAG_RELATION:
    2007              :             /* relation locks have no parent lock */
    2008         8967 :             return false;
    2009              : 
    2010         7553 :         case PREDLOCKTAG_PAGE:
    2011              :             /* parent lock is relation lock */
    2012         7553 :             SET_PREDICATELOCKTARGETTAG_RELATION(*parent,
    2013              :                                                 GET_PREDICATELOCKTARGETTAG_DB(*tag),
    2014              :                                                 GET_PREDICATELOCKTARGETTAG_RELATION(*tag));
    2015              : 
    2016         7553 :             return true;
    2017              : 
    2018         8870 :         case PREDLOCKTAG_TUPLE:
    2019              :             /* parent lock is page lock */
    2020         8870 :             SET_PREDICATELOCKTARGETTAG_PAGE(*parent,
    2021              :                                             GET_PREDICATELOCKTARGETTAG_DB(*tag),
    2022              :                                             GET_PREDICATELOCKTARGETTAG_RELATION(*tag),
    2023              :                                             GET_PREDICATELOCKTARGETTAG_PAGE(*tag));
    2024         8870 :             return true;
    2025              :     }
    2026              : 
    2027              :     /* not reachable */
    2028              :     Assert(false);
    2029            0 :     return false;
    2030              : }
    2031              : 
    2032              : /*
    2033              :  * Check whether the lock we are considering is already covered by a
    2034              :  * coarser lock for our transaction.
    2035              :  *
    2036              :  * Like PredicateLockExists, this function might return a false
    2037              :  * negative, but it will never return a false positive.
    2038              :  */
    2039              : static bool
    2040         8608 : CoarserLockCovers(const PREDICATELOCKTARGETTAG *newtargettag)
    2041              : {
    2042              :     PREDICATELOCKTARGETTAG targettag,
    2043              :                 parenttag;
    2044              : 
    2045         8608 :     targettag = *newtargettag;
    2046              : 
    2047              :     /* check parents iteratively until no more */
    2048        13416 :     while (GetParentPredicateLockTag(&targettag, &parenttag))
    2049              :     {
    2050         9469 :         targettag = parenttag;
    2051         9469 :         if (PredicateLockExists(&targettag))
    2052         4661 :             return true;
    2053              :     }
    2054              : 
    2055              :     /* no more parents to check; lock is not covered */
    2056         3947 :     return false;
    2057              : }
    2058              : 
    2059              : /*
    2060              :  * Remove the dummy entry from the predicate lock target hash, to free up some
    2061              :  * scratch space. The caller must be holding SerializablePredicateListLock,
    2062              :  * and must restore the entry with RestoreScratchTarget() before releasing the
    2063              :  * lock.
    2064              :  *
    2065              :  * If lockheld is true, the caller is already holding the partition lock
    2066              :  * of the partition containing the scratch entry.
    2067              :  */
    2068              : static void
    2069           45 : RemoveScratchTarget(bool lockheld)
    2070              : {
    2071              :     bool        found;
    2072              : 
    2073              :     Assert(LWLockHeldByMe(SerializablePredicateListLock));
    2074              : 
    2075           45 :     if (!lockheld)
    2076            0 :         LWLockAcquire(ScratchPartitionLock, LW_EXCLUSIVE);
    2077           45 :     hash_search_with_hash_value(PredicateLockTargetHash,
    2078              :                                 &ScratchTargetTag,
    2079              :                                 ScratchTargetTagHash,
    2080              :                                 HASH_REMOVE, &found);
    2081              :     Assert(found);
    2082           45 :     if (!lockheld)
    2083            0 :         LWLockRelease(ScratchPartitionLock);
    2084           45 : }
    2085              : 
    2086              : /*
    2087              :  * Re-insert the dummy entry in the predicate lock target hash.
    2088              :  */
    2089              : static void
    2090           45 : RestoreScratchTarget(bool lockheld)
    2091              : {
    2092              :     bool        found;
    2093              : 
    2094              :     Assert(LWLockHeldByMe(SerializablePredicateListLock));
    2095              : 
    2096           45 :     if (!lockheld)
    2097            0 :         LWLockAcquire(ScratchPartitionLock, LW_EXCLUSIVE);
    2098           45 :     hash_search_with_hash_value(PredicateLockTargetHash,
    2099              :                                 &ScratchTargetTag,
    2100              :                                 ScratchTargetTagHash,
    2101              :                                 HASH_ENTER, &found);
    2102              :     Assert(!found);
    2103           45 :     if (!lockheld)
    2104            0 :         LWLockRelease(ScratchPartitionLock);
    2105           45 : }
    2106              : 
    2107              : /*
    2108              :  * Check whether the list of related predicate locks is empty for a
    2109              :  * predicate lock target, and remove the target if it is.
    2110              :  */
    2111              : static void
    2112         3941 : RemoveTargetIfNoLongerUsed(PREDICATELOCKTARGET *target, uint32 targettaghash)
    2113              : {
    2114              :     PREDICATELOCKTARGET *rmtarget PG_USED_FOR_ASSERTS_ONLY;
    2115              : 
    2116              :     Assert(LWLockHeldByMe(SerializablePredicateListLock));
    2117              : 
    2118              :     /* Can't remove it until no locks at this target. */
    2119         3941 :     if (!dlist_is_empty(&target->predicateLocks))
    2120          973 :         return;
    2121              : 
    2122              :     /* Actually remove the target. */
    2123         2968 :     rmtarget = hash_search_with_hash_value(PredicateLockTargetHash,
    2124         2968 :                                            &target->tag,
    2125              :                                            targettaghash,
    2126              :                                            HASH_REMOVE, NULL);
    2127              :     Assert(rmtarget == target);
    2128              : }
    2129              : 
    2130              : /*
    2131              :  * Delete child target locks owned by this process.
    2132              :  * This implementation assumes that the usage of each target tag field
    2133              :  * is uniform.  No need to make this hard if we don't have to.
    2134              :  *
    2135              :  * We acquire an LWLock in the case of parallel mode, because worker
    2136              :  * backends have access to the leader's SERIALIZABLEXACT.  Otherwise,
    2137              :  * we aren't acquiring LWLocks for the predicate lock or lock
    2138              :  * target structures associated with this transaction unless we're going
    2139              :  * to modify them, because no other process is permitted to modify our
    2140              :  * locks.
    2141              :  */
    2142              : static void
    2143         2379 : DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
    2144              : {
    2145              :     SERIALIZABLEXACT *sxact;
    2146              :     PREDICATELOCK *predlock;
    2147              :     dlist_mutable_iter iter;
    2148              : 
    2149         2379 :     LWLockAcquire(SerializablePredicateListLock, LW_SHARED);
    2150         2379 :     sxact = MySerializableXact;
    2151         2379 :     if (IsInParallelMode())
    2152           11 :         LWLockAcquire(&sxact->perXactPredicateListLock, LW_EXCLUSIVE);
    2153              : 
    2154         7527 :     dlist_foreach_modify(iter, &sxact->predicateLocks)
    2155              :     {
    2156              :         PREDICATELOCKTAG oldlocktag;
    2157              :         PREDICATELOCKTARGET *oldtarget;
    2158              :         PREDICATELOCKTARGETTAG oldtargettag;
    2159              : 
    2160         5148 :         predlock = dlist_container(PREDICATELOCK, xactLink, iter.cur);
    2161              : 
    2162         5148 :         oldlocktag = predlock->tag;
    2163              :         Assert(oldlocktag.myXact == sxact);
    2164         5148 :         oldtarget = oldlocktag.myTarget;
    2165         5148 :         oldtargettag = oldtarget->tag;
    2166              : 
    2167         5148 :         if (TargetTagIsCoveredBy(oldtargettag, *newtargettag))
    2168              :         {
    2169              :             uint32      oldtargettaghash;
    2170              :             LWLock     *partitionLock;
    2171              :             PREDICATELOCK *rmpredlock PG_USED_FOR_ASSERTS_ONLY;
    2172              : 
    2173          679 :             oldtargettaghash = PredicateLockTargetTagHashCode(&oldtargettag);
    2174          679 :             partitionLock = PredicateLockHashPartitionLock(oldtargettaghash);
    2175              : 
    2176          679 :             LWLockAcquire(partitionLock, LW_EXCLUSIVE);
    2177              : 
    2178          679 :             dlist_delete(&predlock->xactLink);
    2179          679 :             dlist_delete(&predlock->targetLink);
    2180          679 :             rmpredlock = hash_search_with_hash_value
    2181              :                 (PredicateLockHash,
    2182              :                  &oldlocktag,
    2183          679 :                  PredicateLockHashCodeFromTargetHashCode(&oldlocktag,
    2184              :                                                          oldtargettaghash),
    2185              :                  HASH_REMOVE, NULL);
    2186              :             Assert(rmpredlock == predlock);
    2187              : 
    2188          679 :             RemoveTargetIfNoLongerUsed(oldtarget, oldtargettaghash);
    2189              : 
    2190          679 :             LWLockRelease(partitionLock);
    2191              : 
    2192          679 :             DecrementParentLocks(&oldtargettag);
    2193              :         }
    2194              :     }
    2195         2379 :     if (IsInParallelMode())
    2196           11 :         LWLockRelease(&sxact->perXactPredicateListLock);
    2197         2379 :     LWLockRelease(SerializablePredicateListLock);
    2198         2379 : }
    2199              : 
    2200              : /*
    2201              :  * Returns the promotion limit for a given predicate lock target.  This is the
    2202              :  * max number of descendant locks allowed before promoting to the specified
    2203              :  * tag. Note that the limit includes non-direct descendants (e.g., both tuples
    2204              :  * and pages for a relation lock).
    2205              :  *
    2206              :  * Currently the default limit is 2 for a page lock, and
    2207              :  * (max_pred_locks_per_transaction / 2) - 1 for a relation lock, to match
    2208              :  * the behavior of earlier releases when upgrading.
    2209              :  *
    2210              :  * TODO SSI: We should probably add additional GUCs to allow a maximum ratio
    2211              :  * of page and tuple locks based on the pages in a relation, and the maximum
    2212              :  * ratio of tuple locks to tuples in a page.  This would provide more
    2213              :  * generally "balanced" allocation of locks to where they are most useful,
    2214              :  * while still allowing the absolute numbers to prevent one relation from
    2215              :  * tying up all predicate lock resources.
    2216              :  */
    2217              : static int
    2218         4808 : MaxPredicateChildLocks(const PREDICATELOCKTARGETTAG *tag)
    2219              : {
    2220         4808 :     switch (GET_PREDICATELOCKTARGETTAG_TYPE(*tag))
    2221              :     {
    2222         3240 :         case PREDLOCKTAG_RELATION:
    2223         3240 :             return max_predicate_locks_per_relation < 0
    2224              :                 ? (max_predicate_locks_per_xact
    2225         3240 :                    / (-max_predicate_locks_per_relation)) - 1
    2226         3240 :                 : max_predicate_locks_per_relation;
    2227              : 
    2228         1568 :         case PREDLOCKTAG_PAGE:
    2229         1568 :             return max_predicate_locks_per_page;
    2230              : 
    2231            0 :         case PREDLOCKTAG_TUPLE:
    2232              : 
    2233              :             /*
    2234              :              * not reachable: nothing is finer-granularity than a tuple, so we
    2235              :              * should never try to promote to it.
    2236              :              */
    2237              :             Assert(false);
    2238            0 :             return 0;
    2239              :     }
    2240              : 
    2241              :     /* not reachable */
    2242              :     Assert(false);
    2243            0 :     return 0;
    2244              : }
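
The relation-level arithmetic in MaxPredicateChildLocks() can be checked in isolation. The following is a minimal sketch, not code from predicate.c: the function name and its parameters are hypothetical stand-ins for the two settings read by the PREDLOCKTAG_RELATION branch.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the PREDLOCKTAG_RELATION branch above.
 * A negative per-relation setting is read as a divisor of the
 * per-transaction setting; a non-negative one is an absolute limit.
 */
int
relation_promotion_limit(int locks_per_xact, int locks_per_relation)
{
    assert(locks_per_xact >= 0);

    if (locks_per_relation < 0)
        return (locks_per_xact / (-locks_per_relation)) - 1;
    return locks_per_relation;
}
```

With the illustrative values 64 and -2, the function returns 64 / 2 - 1 = 31. Passing a non-negative second argument returns that argument unchanged.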
    2245              : 
    2246              : /*
    2247              :  * For all ancestors of a newly-acquired predicate lock, increment
    2248              :  * their child count in the parent hash table. If any of them have
    2249              :  * more descendants than their promotion threshold, acquire the
    2250              :  * coarsest such lock.
    2251              :  *
    2252              :  * Returns true if a parent lock was acquired and false otherwise.
    2253              :  */
    2254              : static bool
    2255         3947 : CheckAndPromotePredicateLockRequest(const PREDICATELOCKTARGETTAG *reqtag)
    2256              : {
    2257              :     PREDICATELOCKTARGETTAG targettag,
    2258              :                 nexttag,
    2259              :                 promotiontag;
    2260              :     LOCALPREDICATELOCK *parentlock;
    2261              :     bool        found,
    2262              :                 promote;
    2263              : 
    2264         3947 :     promote = false;
    2265              : 
    2266         3947 :     targettag = *reqtag;
    2267              : 
    2268              :     /* check parents iteratively */
    2269        12702 :     while (GetParentPredicateLockTag(&targettag, &nexttag))
    2270              :     {
    2271         4808 :         targettag = nexttag;
    2272         4808 :         parentlock = (LOCALPREDICATELOCK *) hash_search(LocalPredicateLockHash,
    2273              :                                                         &targettag,
    2274              :                                                         HASH_ENTER,
    2275              :                                                         &found);
    2276         4808 :         if (!found)
    2277              :         {
    2278         3387 :             parentlock->held = false;
    2279         3387 :             parentlock->childLocks = 1;
    2280              :         }
    2281              :         else
    2282         1421 :             parentlock->childLocks++;
    2283              : 
    2284         4808 :         if (parentlock->childLocks >
    2285         4808 :             MaxPredicateChildLocks(&targettag))
    2286              :         {
    2287              :             /*
    2288              :              * We should promote to this parent lock. Continue to check its
    2289              :              * ancestors, however, both to get their child counts right and to
    2290              :              * check whether we should just go ahead and promote to one of
    2291              :              * them.
    2292              :              */
    2293          173 :             promotiontag = targettag;
    2294          173 :             promote = true;
    2295              :         }
    2296              :     }
    2297              : 
    2298         3947 :     if (promote)
    2299              :     {
    2300              :         /* acquire coarsest ancestor eligible for promotion */
    2301          173 :         PredicateLockAcquire(&promotiontag);
    2302          173 :         return true;
    2303              :     }
    2304              :     else
    2305         3774 :         return false;
    2306              : }
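
The ancestor walk in CheckAndPromotePredicateLockRequest() can be sketched with plain counters. This is a simplified model under stated assumptions: it tracks one page and one relation only, replaces the local hash table with a two-element array, and uses hypothetical names throughout. It shows why the loop keeps going after the first threshold is crossed: every ancestor's count must be updated, and the last level that exceeds its limit is the coarsest one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parent levels of a tuple lock, finest first. */
enum { LEVEL_PAGE = 0, LEVEL_RELATION = 1, NUM_PARENT_LEVELS = 2 };

/*
 * Record one new tuple lock: bump the child count of every ancestor and
 * return the coarsest level whose count now exceeds its limit, or -1 if
 * no promotion is warranted.
 */
int
record_tuple_lock(int child_counts[NUM_PARENT_LEVELS],
                  const int limits[NUM_PARENT_LEVELS])
{
    int promote_to = -1;

    assert(child_counts != NULL && limits != NULL);

    for (int level = LEVEL_PAGE; level < NUM_PARENT_LEVELS; level++)
    {
        child_counts[level]++;
        if (child_counts[level] > limits[level])
            promote_to = level;     /* later (coarser) levels overwrite */
    }
    return promote_to;
}
```

In the real function the counts live in LocalPredicateLockHash and are keyed by target tag, so each page has its own counter. The single-page array here is only meant to make the control flow visible.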
    2307              : 
    2308              : /*
    2309              :  * When releasing a lock, decrement the child count on all ancestor
    2310              :  * locks.
    2311              :  *
    2312              :  * This is called only when releasing a lock via
    2313              :  * DeleteChildTargetLocks (i.e. when a lock becomes redundant because
    2314              :  * we've acquired its parent, possibly due to promotion) or when a new
    2315              :  * MVCC write lock makes the predicate lock unnecessary. There's no
    2316              :  * point in calling it when locks are released at transaction end, as
    2317              :  * this information is no longer needed.
    2318              :  */
    2319              : static void
    2320         1073 : DecrementParentLocks(const PREDICATELOCKTARGETTAG *targettag)
    2321              : {
    2322              :     PREDICATELOCKTARGETTAG parenttag,
    2323              :                 nexttag;
    2324              : 
    2325         1073 :     parenttag = *targettag;
    2326              : 
    2327         3219 :     while (GetParentPredicateLockTag(&parenttag, &nexttag))
    2328              :     {
    2329              :         uint32      targettaghash;
    2330              :         LOCALPREDICATELOCK *parentlock,
    2331              :                    *rmlock PG_USED_FOR_ASSERTS_ONLY;
    2332              : 
    2333         2146 :         parenttag = nexttag;
    2334         2146 :         targettaghash = PredicateLockTargetTagHashCode(&parenttag);
    2335              :         parentlock = (LOCALPREDICATELOCK *)
    2336         2146 :             hash_search_with_hash_value(LocalPredicateLockHash,
    2337              :                                         &parenttag, targettaghash,
    2338              :                                         HASH_FIND, NULL);
    2339              : 
    2340              :         /*
    2341              :          * There's a small chance the parent lock doesn't exist in the lock
    2342              :          * table. This can happen if we prematurely removed it because an
    2343              :          * index split caused the child refcount to be off.
    2344              :          */
    2345         2146 :         if (parentlock == NULL)
    2346            0 :             continue;
    2347              : 
    2348         2146 :         parentlock->childLocks--;
    2349              : 
    2350              :         /*
    2351              :          * Under similar circumstances the parent lock's refcount might already
    2352              :          * be zero, so the decrement above leaves it negative. This only happens
    2353              :          * if we're holding that lock (otherwise we would have removed the entry).
    2354              :          */
    2355         2146 :         if (parentlock->childLocks < 0)
    2356              :         {
    2357              :             Assert(parentlock->held);
    2358            0 :             parentlock->childLocks = 0;
    2359              :         }
    2360              : 
    2361         2146 :         if ((parentlock->childLocks == 0) && (!parentlock->held))
    2362              :         {
    2363              :             rmlock = (LOCALPREDICATELOCK *)
    2364          776 :                 hash_search_with_hash_value(LocalPredicateLockHash,
    2365              :                                             &parenttag, targettaghash,
    2366              :                                             HASH_REMOVE, NULL);
    2367              :             Assert(rmlock == parentlock);
    2368              :         }
    2369              :     }
    2370         1073 : }
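
The per-ancestor bookkeeping in DecrementParentLocks() reduces to three steps: decrement, clamp at zero, and drop the entry once it has no children and is not itself held. A minimal sketch follows, assuming a hypothetical LocalLockEntry struct in place of LOCALPREDICATELOCK and a boolean return value in place of the HASH_REMOVE call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a local lock table entry. */
typedef struct
{
    bool held;          /* is this lock itself held? */
    int  child_locks;   /* number of descendant locks recorded */
} LocalLockEntry;

/*
 * Account for one released descendant.  Returns true if the entry is
 * now unused and the caller should remove it from the table.
 */
bool
release_child(LocalLockEntry *entry)
{
    assert(entry != NULL);

    entry->child_locks--;

    /* A count that was already zero is clamped instead of going negative. */
    if (entry->child_locks < 0)
        entry->child_locks = 0;

    return entry->child_locks == 0 && !entry->held;
}
```

An entry with `held` set is never reported as removable, which mirrors the condition in the function above.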
    2371              : 
    2372              : /*
    2373              :  * Indicate that a predicate lock on the given target is held by the
    2374              :  * specified transaction. Has no effect if the lock is already held.
    2375              :  *
    2376              :  * This updates the lock table and the sxact's lock list, and creates
    2377              :  * the lock target if necessary, but does *not* do anything related to
    2378              :  * granularity promotion or the local lock table. See
    2379              :  * PredicateLockAcquire for that.
    2380              :  */
    2381              : static void
    2382         3947 : CreatePredicateLock(const PREDICATELOCKTARGETTAG *targettag,
    2383              :                     uint32 targettaghash,
    2384              :                     SERIALIZABLEXACT *sxact)
    2385              : {
    2386              :     PREDICATELOCKTARGET *target;
    2387              :     PREDICATELOCKTAG locktag;
    2388              :     PREDICATELOCK *lock;
    2389              :     LWLock     *partitionLock;
    2390              :     bool        found;
    2391              : 
    2392         3947 :     partitionLock = PredicateLockHashPartitionLock(targettaghash);
    2393              : 
    2394         3947 :     LWLockAcquire(SerializablePredicateListLock, LW_SHARED);
    2395         3947 :     if (IsInParallelMode())
    2396           16 :         LWLockAcquire(&sxact->perXactPredicateListLock, LW_EXCLUSIVE);
    2397         3947 :     LWLockAcquire(partitionLock, LW_EXCLUSIVE);
    2398              : 
    2399              :     /* Make sure that the target is represented. */
    2400              :     target = (PREDICATELOCKTARGET *)
    2401         3947 :         hash_search_with_hash_value(PredicateLockTargetHash,
    2402              :                                     targettag, targettaghash,
    2403              :                                     HASH_ENTER_NULL, &found);
    2404         3947 :     if (!target)
    2405            0 :         ereport(ERROR,
    2406              :                 (errcode(ERRCODE_OUT_OF_MEMORY),
    2407              :                  errmsg("out of shared memory"),
    2408              :                  errhint("You might need to increase \"%s\".", "max_pred_locks_per_transaction")));
    2409         3947 :     if (!found)
    2410         2968 :         dlist_init(&target->predicateLocks);
    2411              : 
    2412              :     /* We've got the sxact and target, make sure they're joined. */
    2413         3947 :     locktag.myTarget = target;
    2414         3947 :     locktag.myXact = sxact;
    2415              :     lock = (PREDICATELOCK *)
    2416         3947 :         hash_search_with_hash_value(PredicateLockHash, &locktag,
    2417         3947 :                                     PredicateLockHashCodeFromTargetHashCode(&locktag, targettaghash),
    2418              :                                     HASH_ENTER_NULL, &found);
    2419         3947 :     if (!lock)
    2420            0 :         ereport(ERROR,
    2421              :                 (errcode(ERRCODE_OUT_OF_MEMORY),
    2422              :                  errmsg("out of shared memory"),
    2423              :                  errhint("You might need to increase \"%s\".", "max_pred_locks_per_transaction")));
    2424              : 
    2425         3947 :     if (!found)
    2426              :     {
    2427         3941 :         dlist_push_tail(&target->predicateLocks, &lock->targetLink);
    2428         3941 :         dlist_push_tail(&sxact->predicateLocks, &lock->xactLink);
    2429         3941 :         lock->commitSeqNo = InvalidSerCommitSeqNo;
    2430              :     }
    2431              : 
    2432         3947 :     LWLockRelease(partitionLock);
    2433         3947 :     if (IsInParallelMode())
    2434           16 :         LWLockRelease(&sxact->perXactPredicateListLock);
    2435         3947 :     LWLockRelease(SerializablePredicateListLock);
    2436         3947 : }
    2437              : 
    2438              : /*
    2439              :  * Acquire a predicate lock on the specified target for the current
    2440              :  * connection if not already held. This updates the local lock table
    2441              :  * and uses it to implement granularity promotion. It will consolidate
    2442              :  * multiple locks into a coarser lock if warranted, and will release
    2443              :  * any finer-grained locks covered by the new one.
    2444              :  */
    2445              : static void
    2446        26131 : PredicateLockAcquire(const PREDICATELOCKTARGETTAG *targettag)
    2447              : {
    2448              :     uint32      targettaghash;
    2449              :     bool        found;
    2450              :     LOCALPREDICATELOCK *locallock;
    2451              : 
    2452              :     /* Do we have the lock already, or a covering lock? */
    2453        26131 :     if (PredicateLockExists(targettag))
    2454        22184 :         return;
    2455              : 
    2456         8608 :     if (CoarserLockCovers(targettag))
    2457         4661 :         return;
    2458              : 
    2459              :     /* The same hash and LWLock apply to the lock target and the local lock. */
    2460         3947 :     targettaghash = PredicateLockTargetTagHashCode(targettag);
    2461              : 
    2462              :     /* Acquire lock in local table */
    2463              :     locallock = (LOCALPREDICATELOCK *)
    2464         3947 :         hash_search_with_hash_value(LocalPredicateLockHash,
    2465              :                                     targettag, targettaghash,
    2466              :                                     HASH_ENTER, &found);
    2467         3947 :     locallock->held = true;
    2468         3947 :     if (!found)
    2469         3614 :         locallock->childLocks = 0;
    2470              : 
    2471              :     /* Actually create the lock */
    2472         3947 :     CreatePredicateLock(targettag, targettaghash, MySerializableXact);
    2473              : 
    2474              :     /*
    2475              :      * Lock has been acquired. Check whether it should be promoted to a
    2476              :      * coarser granularity, or whether there are finer-granularity locks to
    2477              :      * clean up.
    2478              :      */
    2479         3947 :     if (CheckAndPromotePredicateLockRequest(targettag))
    2480              :     {
    2481              :         /*
    2482              :          * Lock request was promoted to a coarser-granularity lock, and that
    2483              :          * lock was acquired. It will delete this lock and any of its
    2484              :          * children, so we're done.
    2485              :          */
    2486              :     }
    2487              :     else
    2488              :     {
    2489              :         /* Clean up any finer-granularity locks */
    2490         3774 :         if (GET_PREDICATELOCKTARGETTAG_TYPE(*targettag) != PREDLOCKTAG_TUPLE)
    2491         2379 :             DeleteChildTargetLocks(targettag);
    2492              :     }
    2493              : }
    2494              : 
    2495              : 
    2496              : /*
    2497              :  *      PredicateLockRelation
    2498              :  *
    2499              :  * Gets a predicate lock at the relation level.
    2500              :  * Skip if not in full serializable transaction isolation level.
    2501              :  * Skip if this is a temporary table.
    2502              :  * Clear any finer-grained predicate locks this session has on the relation.
    2503              :  */
    2504              : void
    2505       472979 : PredicateLockRelation(Relation relation, Snapshot snapshot)
    2506              : {
    2507              :     PREDICATELOCKTARGETTAG tag;
    2508              : 
    2509       472979 :     if (!SerializationNeededForRead(relation, snapshot))
    2510       472253 :         return;
    2511              : 
    2512          726 :     SET_PREDICATELOCKTARGETTAG_RELATION(tag,
    2513              :                                         relation->rd_locator.dbOid,
    2514              :                                         relation->rd_id);
    2515          726 :     PredicateLockAcquire(&tag);
    2516              : }
    2517              : 
    2518              : /*
    2519              :  *      PredicateLockPage
    2520              :  *
    2521              :  * Gets a predicate lock at the page level.
    2522              :  * Skip if not in full serializable transaction isolation level.
    2523              :  * Skip if this is a temporary table.
    2524              :  * Skip if a coarser predicate lock already covers this page.
    2525              :  * Clear any finer-grained predicate locks this session has on the page.
    2526              :  */
    2527              : void
    2528     13314540 : PredicateLockPage(Relation relation, BlockNumber blkno, Snapshot snapshot)
    2529              : {
    2530              :     PREDICATELOCKTARGETTAG tag;
    2531              : 
    2532     13314540 :     if (!SerializationNeededForRead(relation, snapshot))
    2533     13295604 :         return;
    2534              : 
    2535        18936 :     SET_PREDICATELOCKTARGETTAG_PAGE(tag,
    2536              :                                     relation->rd_locator.dbOid,
    2537              :                                     relation->rd_id,
    2538              :                                     blkno);
    2539        18936 :     PredicateLockAcquire(&tag);
    2540              : }
    2541              : 
    2542              : /*
    2543              :  *      PredicateLockTID
    2544              :  *
    2545              :  * Gets a predicate lock at the tuple level.
    2546              :  * Skip if not in full serializable transaction isolation level.
    2547              :  * Skip if this is a temporary table.
    2548              :  */
    2549              : void
    2550     23569303 : PredicateLockTID(Relation relation, const ItemPointerData *tid, Snapshot snapshot,
    2551              :                  TransactionId tuple_xid)
    2552              : {
    2553              :     PREDICATELOCKTARGETTAG tag;
    2554              : 
    2555     23569303 :     if (!SerializationNeededForRead(relation, snapshot))
    2556     23563007 :         return;
    2557              : 
    2558              :     /*
    2559              :      * Return if this xact wrote it.
    2560              :      */
    2561         6298 :     if (relation->rd_index == NULL)
    2562              :     {
    2563              :         /* If we wrote it, we already have a write lock. */
    2564         6298 :         if (TransactionIdIsCurrentTransactionId(tuple_xid))
    2565            2 :             return;
    2566              :     }
    2567              : 
    2568              :     /*
    2569              :      * Do quick-but-not-definitive test for a relation lock first.  This will
    2570              :      * never cause a return when the relation is *not* locked, but will
    2571              :      * occasionally let the check continue when there really *is* a relation
    2572              :      * level lock.
    2573              :      */
    2574         6296 :     SET_PREDICATELOCKTARGETTAG_RELATION(tag,
    2575              :                                         relation->rd_locator.dbOid,
    2576              :                                         relation->rd_id);
    2577         6296 :     if (PredicateLockExists(&tag))
    2578            0 :         return;
    2579              : 
    2580         6296 :     SET_PREDICATELOCKTARGETTAG_TUPLE(tag,
    2581              :                                      relation->rd_locator.dbOid,
    2582              :                                      relation->rd_id,
    2583              :                                      ItemPointerGetBlockNumber(tid),
    2584              :                                      ItemPointerGetOffsetNumber(tid));
    2585         6296 :     PredicateLockAcquire(&tag);
    2586              : }
    2587              : 
    2588              : 
    2589              : /*
    2590              :  *      DeleteLockTarget
    2591              :  *
    2592              :  * Remove a predicate lock target along with any locks held for it.
    2593              :  *
    2594              :  * Caller must hold SerializablePredicateListLock and the
    2595              :  * appropriate hash partition lock for the target.
    2596              :  */
    2597              : static void
    2598            0 : DeleteLockTarget(PREDICATELOCKTARGET *target, uint32 targettaghash)
    2599              : {
    2600              :     dlist_mutable_iter iter;
    2601              : 
    2602              :     Assert(LWLockHeldByMeInMode(SerializablePredicateListLock,
    2603              :                                 LW_EXCLUSIVE));
    2604              :     Assert(LWLockHeldByMe(PredicateLockHashPartitionLock(targettaghash)));
    2605              : 
    2606            0 :     LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
    2607              : 
    2608            0 :     dlist_foreach_modify(iter, &target->predicateLocks)
    2609              :     {
    2610            0 :         PREDICATELOCK *predlock =
    2611            0 :             dlist_container(PREDICATELOCK, targetLink, iter.cur);
    2612              :         bool        found;
    2613              : 
    2614            0 :         dlist_delete(&(predlock->xactLink));
    2615            0 :         dlist_delete(&(predlock->targetLink));
    2616              : 
    2617            0 :         hash_search_with_hash_value
    2618              :             (PredicateLockHash,
    2619            0 :              &predlock->tag,
    2620            0 :              PredicateLockHashCodeFromTargetHashCode(&predlock->tag,
    2621              :                                                      targettaghash),
    2622              :              HASH_REMOVE, &found);
    2623              :         Assert(found);
    2624              :     }
    2625            0 :     LWLockRelease(SerializableXactHashLock);
    2626              : 
    2627              :     /* Remove the target itself, if possible. */
    2628            0 :     RemoveTargetIfNoLongerUsed(target, targettaghash);
    2629            0 : }
    2630              : 
    2631              : 
    2632              : /*
    2633              :  *      TransferPredicateLocksToNewTarget
    2634              :  *
    2635              :  * Move or copy all the predicate locks for a lock target, for use by
    2636              :  * index page splits/combines and other things that create or replace
    2637              :  * lock targets. If 'removeOld' is true, the old locks and the target
    2638              :  * will be removed.
    2639              :  *
    2640              :  * Returns true on success, or false if we ran out of shared memory to
    2641              :  * allocate the new target or locks. Guaranteed to always succeed if
    2642              :  * removeOld is set (by using the scratch entry in PredicateLockTargetHash
    2643              :  * for scratch space).
    2644              :  *
    2645              :  * Warning: the "removeOld" option should be used only with care,
    2646              :  * because this function does not (indeed, cannot) update other
    2647              :  * backends' LocalPredicateLockHash. If we are only adding new
    2648              :  * entries, this is not a problem: the local lock table is used only
    2649              :  * as a hint, so missing entries for locks that are held are
    2650              :  * OK. Having entries for locks that are no longer held, as can happen
    2651              :  * when using "removeOld", is not in general OK. We can only use it
    2652              :  * safely when replacing a lock with a coarser-granularity lock that
    2653              :  * covers it, or if we are absolutely certain that no one will need to
    2654              :  * refer to that lock in the future.
    2655              :  *
    2656              :  * Caller must hold SerializablePredicateListLock exclusively.
    2657              :  */
    2658              : static bool
    2659            3 : TransferPredicateLocksToNewTarget(PREDICATELOCKTARGETTAG oldtargettag,
    2660              :                                   PREDICATELOCKTARGETTAG newtargettag,
    2661              :                                   bool removeOld)
    2662              : {
    2663              :     uint32      oldtargettaghash;
    2664              :     LWLock     *oldpartitionLock;
    2665              :     PREDICATELOCKTARGET *oldtarget;
    2666              :     uint32      newtargettaghash;
    2667              :     LWLock     *newpartitionLock;
    2668              :     bool        found;
    2669            3 :     bool        outOfShmem = false;
    2670              : 
    2671              :     Assert(LWLockHeldByMeInMode(SerializablePredicateListLock,
    2672              :                                 LW_EXCLUSIVE));
    2673              : 
    2674            3 :     oldtargettaghash = PredicateLockTargetTagHashCode(&oldtargettag);
    2675            3 :     newtargettaghash = PredicateLockTargetTagHashCode(&newtargettag);
    2676            3 :     oldpartitionLock = PredicateLockHashPartitionLock(oldtargettaghash);
    2677            3 :     newpartitionLock = PredicateLockHashPartitionLock(newtargettaghash);
    2678              : 
    2679            3 :     if (removeOld)
    2680              :     {
    2681              :         /*
    2682              :          * Remove the dummy entry to give us scratch space, so we know we'll
    2683              :          * be able to create the new lock target.
    2684              :          */
    2685            0 :         RemoveScratchTarget(false);
    2686              :     }
    2687              : 
    2688              :     /*
    2689              :      * We must get the partition locks in ascending sequence to avoid
    2690              :      * deadlocks. If old and new partitions are the same, we must request the
    2691              :      * lock only once.
    2692              :      */
    2693            3 :     if (oldpartitionLock < newpartitionLock)
    2694              :     {
    2695            3 :         LWLockAcquire(oldpartitionLock,
    2696            3 :                       (removeOld ? LW_EXCLUSIVE : LW_SHARED));
    2697            3 :         LWLockAcquire(newpartitionLock, LW_EXCLUSIVE);
    2698              :     }
    2699            0 :     else if (oldpartitionLock > newpartitionLock)
    2700              :     {
    2701            0 :         LWLockAcquire(newpartitionLock, LW_EXCLUSIVE);
    2702            0 :         LWLockAcquire(oldpartitionLock,
    2703            0 :                       (removeOld ? LW_EXCLUSIVE : LW_SHARED));
    2704              :     }
    2705              :     else
    2706            0 :         LWLockAcquire(newpartitionLock, LW_EXCLUSIVE);
    2707              : 
    2708              :     /*
    2709              :      * Look for the old target.  If not found, that's OK; no predicate locks
    2710              :      * are affected, so we can just clean up and return. If it does exist,
    2711              :      * walk its list of predicate locks and move or copy them to the new
    2712              :      * target.
    2713              :      */
    2714            3 :     oldtarget = hash_search_with_hash_value(PredicateLockTargetHash,
    2715              :                                             &oldtargettag,
    2716              :                                             oldtargettaghash,
    2717              :                                             HASH_FIND, NULL);
    2718              : 
    2719            3 :     if (oldtarget)
    2720              :     {
    2721              :         PREDICATELOCKTARGET *newtarget;
    2722              :         PREDICATELOCKTAG newpredlocktag;
    2723              :         dlist_mutable_iter iter;
    2724              : 
    2725            0 :         newtarget = hash_search_with_hash_value(PredicateLockTargetHash,
    2726              :                                                 &newtargettag,
    2727              :                                                 newtargettaghash,
    2728              :                                                 HASH_ENTER_NULL, &found);
    2729              : 
    2730            0 :         if (!newtarget)
    2731              :         {
    2732              :             /* Failed to allocate due to insufficient shmem */
    2733            0 :             outOfShmem = true;
    2734            0 :             goto exit;
    2735              :         }
    2736              : 
    2737              :         /* If we created a new entry, initialize it */
    2738            0 :         if (!found)
    2739            0 :             dlist_init(&newtarget->predicateLocks);
    2740              : 
    2741            0 :         newpredlocktag.myTarget = newtarget;
    2742              : 
    2743              :         /*
    2744              :          * Loop through all the locks on the old target, replacing them with
    2745              :          * locks on the new target.
    2746              :          */
    2747            0 :         LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
    2748              : 
    2749            0 :         dlist_foreach_modify(iter, &oldtarget->predicateLocks)
    2750              :         {
    2751            0 :             PREDICATELOCK *oldpredlock =
    2752            0 :                 dlist_container(PREDICATELOCK, targetLink, iter.cur);
    2753              :             PREDICATELOCK *newpredlock;
    2754            0 :             SerCommitSeqNo oldCommitSeqNo = oldpredlock->commitSeqNo;
    2755              : 
    2756            0 :             newpredlocktag.myXact = oldpredlock->tag.myXact;
    2757              : 
    2758            0 :             if (removeOld)
    2759              :             {
    2760            0 :                 dlist_delete(&(oldpredlock->xactLink));
    2761            0 :                 dlist_delete(&(oldpredlock->targetLink));
    2762              : 
    2763            0 :                 hash_search_with_hash_value
    2764              :                     (PredicateLockHash,
    2765            0 :                      &oldpredlock->tag,
    2766            0 :                      PredicateLockHashCodeFromTargetHashCode(&oldpredlock->tag,
    2767              :                                                              oldtargettaghash),
    2768              :                      HASH_REMOVE, &found);
    2769              :                 Assert(found);
    2770              :             }
    2771              : 
    2772              :             newpredlock = (PREDICATELOCK *)
    2773            0 :                 hash_search_with_hash_value(PredicateLockHash,
    2774              :                                             &newpredlocktag,
    2775            0 :                                             PredicateLockHashCodeFromTargetHashCode(&newpredlocktag,
    2776              :                                                                                     newtargettaghash),
    2777              :                                             HASH_ENTER_NULL,
    2778              :                                             &found);
    2779            0 :             if (!newpredlock)
    2780              :             {
    2781              :                 /* Out of shared memory. Undo what we've done so far. */
    2782            0 :                 LWLockRelease(SerializableXactHashLock);
    2783            0 :                 DeleteLockTarget(newtarget, newtargettaghash);
    2784            0 :                 outOfShmem = true;
    2785            0 :                 goto exit;
    2786              :             }
    2787            0 :             if (!found)
    2788              :             {
    2789            0 :                 dlist_push_tail(&(newtarget->predicateLocks),
    2790              :                                 &(newpredlock->targetLink));
    2791            0 :                 dlist_push_tail(&(newpredlocktag.myXact->predicateLocks),
    2792              :                                 &(newpredlock->xactLink));
    2793            0 :                 newpredlock->commitSeqNo = oldCommitSeqNo;
    2794              :             }
    2795              :             else
    2796              :             {
    2797            0 :                 if (newpredlock->commitSeqNo < oldCommitSeqNo)
    2798            0 :                     newpredlock->commitSeqNo = oldCommitSeqNo;
    2799              :             }
    2800              : 
    2801              :             Assert(newpredlock->commitSeqNo != 0);
    2802              :             Assert((newpredlock->commitSeqNo == InvalidSerCommitSeqNo)
    2803              :                    || (newpredlock->tag.myXact == OldCommittedSxact));
    2804              :         }
    2805            0 :         LWLockRelease(SerializableXactHashLock);
    2806              : 
    2807            0 :         if (removeOld)
    2808              :         {
    2809              :             Assert(dlist_is_empty(&oldtarget->predicateLocks));
    2810            0 :             RemoveTargetIfNoLongerUsed(oldtarget, oldtargettaghash);
    2811              :         }
    2812              :     }
    2813              : 
    2814              : 
    2815            3 : exit:
    2816              :     /* Release partition locks in reverse order of acquisition. */
    2817            3 :     if (oldpartitionLock < newpartitionLock)
    2818              :     {
    2819            3 :         LWLockRelease(newpartitionLock);
    2820            3 :         LWLockRelease(oldpartitionLock);
    2821              :     }
    2822            0 :     else if (oldpartitionLock > newpartitionLock)
    2823              :     {
    2824            0 :         LWLockRelease(oldpartitionLock);
    2825            0 :         LWLockRelease(newpartitionLock);
    2826              :     }
    2827              :     else
    2828            0 :         LWLockRelease(newpartitionLock);
    2829              : 
    2830            3 :     if (removeOld)
    2831              :     {
    2832              :         /* We shouldn't run out of memory if we're moving locks */
    2833              :         Assert(!outOfShmem);
    2834              : 
    2835              :         /* Put the scratch entry back */
    2836            0 :         RestoreScratchTarget(false);
    2837              :     }
    2838              : 
    2839            3 :     return !outOfShmem;
    2840              : }
    2841              : 
    2842              : /*
    2843              :  * Drop all predicate locks of any granularity from the specified relation,
    2844              :  * which can be a heap relation or an index relation.  If 'transfer' is true,
    2845              :  * acquire a relation lock on the heap for any transactions with any lock(s)
    2846              :  * on the specified relation.
    2847              :  *
    2848              :  * This requires grabbing a lot of LWLocks and scanning the entire lock
    2849              :  * target table for matches.  That makes this more expensive than most
    2850              :  * predicate lock management functions, but it will only be called for DDL
    2851              :  * type commands that are expensive anyway, and there are fast returns when
    2852              :  * no serializable transactions are active or the relation is temporary.
    2853              :  *
    2854              :  * We don't use the TransferPredicateLocksToNewTarget function because it
    2855              :  * acquires its own locks on the partitions of the two targets involved,
    2856              :  * and we'll already be holding all partition locks.
    2857              :  *
    2858              :  * We can't throw an error from here, because the call could be from a
    2859              :  * transaction which is not serializable.
    2860              :  *
    2861              :  * NOTE: This is currently only called with transfer set to true, but that may
    2862              :  * change.  If we decide to clean up the locks from a table on commit of a
    2863              :  * transaction which executed DROP TABLE, the false condition will be useful.
    2864              :  */
    2865              : static void
    2866        22664 : DropAllPredicateLocksFromTable(Relation relation, bool transfer)
    2867              : {
    2868              :     HASH_SEQ_STATUS seqstat;
    2869              :     PREDICATELOCKTARGET *oldtarget;
    2870              :     PREDICATELOCKTARGET *heaptarget;
    2871              :     Oid         dbId;
    2872              :     Oid         relId;
    2873              :     Oid         heapId;
    2874              :     int         i;
    2875              :     bool        isIndex;
    2876              :     bool        found;
    2877              :     uint32      heaptargettaghash;
    2878              : 
    2879              :     /*
    2880              :      * Bail out quickly if there are no serializable transactions running.
    2881              :      * It's safe to check this without taking locks because the caller is
    2882              :      * holding an ACCESS EXCLUSIVE lock on the relation.  No new locks which
    2883              :      * would matter here can be acquired while that is held.
    2884              :      */
    2885        22664 :     if (!TransactionIdIsValid(PredXact->SxactGlobalXmin))
    2886        22619 :         return;
    2887              : 
    2888           65 :     if (!PredicateLockingNeededForRelation(relation))
    2889           20 :         return;
    2890              : 
    2891           45 :     dbId = relation->rd_locator.dbOid;
    2892           45 :     relId = relation->rd_id;
    2893           45 :     if (relation->rd_index == NULL)
    2894              :     {
    2895            5 :         isIndex = false;
    2896            5 :         heapId = relId;
    2897              :     }
    2898              :     else
    2899              :     {
    2900           40 :         isIndex = true;
    2901           40 :         heapId = relation->rd_index->indrelid;
    2902              :     }
    2903              :     Assert(heapId != InvalidOid);
    2904              :     Assert(transfer || !isIndex);   /* index OID only makes sense with
    2905              :                                      * transfer */
    2906              : 
    2907              :     /* Retrieve first time needed, then keep. */
    2908           45 :     heaptargettaghash = 0;
    2909           45 :     heaptarget = NULL;
    2910              : 
    2911              :     /* Acquire locks on all lock partitions */
    2912           45 :     LWLockAcquire(SerializablePredicateListLock, LW_EXCLUSIVE);
    2913          765 :     for (i = 0; i < NUM_PREDICATELOCK_PARTITIONS; i++)
    2914          720 :         LWLockAcquire(PredicateLockHashPartitionLockByIndex(i), LW_EXCLUSIVE);
    2915           45 :     LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
    2916              : 
    2917              :     /*
    2918              :      * Remove the dummy entry to give us scratch space, so we know we'll be
    2919              :      * able to create the new lock target.
    2920              :      */
    2921           45 :     if (transfer)
    2922           45 :         RemoveScratchTarget(true);
    2923              : 
    2924              :     /* Scan through target map */
    2925           45 :     hash_seq_init(&seqstat, PredicateLockTargetHash);
    2926              : 
    2927           90 :     while ((oldtarget = (PREDICATELOCKTARGET *) hash_seq_search(&seqstat)))
    2928              :     {
    2929              :         dlist_mutable_iter iter;
    2930              : 
    2931              :         /*
    2932              :          * Check whether this is a target which needs attention.
    2933              :          */
    2934           45 :         if (GET_PREDICATELOCKTARGETTAG_RELATION(oldtarget->tag) != relId)
    2935           45 :             continue;           /* wrong relation id */
    2936            0 :         if (GET_PREDICATELOCKTARGETTAG_DB(oldtarget->tag) != dbId)
    2937            0 :             continue;           /* wrong database id */
    2938            0 :         if (transfer && !isIndex
    2939            0 :             && GET_PREDICATELOCKTARGETTAG_TYPE(oldtarget->tag) == PREDLOCKTAG_RELATION)
    2940            0 :             continue;           /* already the right lock */
    2941              : 
    2942              :         /*
    2943              :          * If we made it here, we have work to do.  We make sure the heap
    2944              :          * relation lock exists, then we walk the list of predicate locks for
    2945              :          * the old target we found, moving all locks to the heap relation lock
    2946              :          * -- unless they already hold that.
    2947              :          */
    2948              : 
    2949              :         /*
    2950              :          * First make sure we have the heap relation target.  We only need to
    2951              :          * do this once.
    2952              :          */
    2953            0 :         if (transfer && heaptarget == NULL)
    2954              :         {
    2955              :             PREDICATELOCKTARGETTAG heaptargettag;
    2956              : 
    2957            0 :             SET_PREDICATELOCKTARGETTAG_RELATION(heaptargettag, dbId, heapId);
    2958            0 :             heaptargettaghash = PredicateLockTargetTagHashCode(&heaptargettag);
    2959            0 :             heaptarget = hash_search_with_hash_value(PredicateLockTargetHash,
    2960              :                                                      &heaptargettag,
    2961              :                                                      heaptargettaghash,
    2962              :                                                      HASH_ENTER, &found);
    2963            0 :             if (!found)
    2964            0 :                 dlist_init(&heaptarget->predicateLocks);
    2965              :         }
    2966              : 
    2967              :         /*
    2968              :          * Loop through all the locks on the old target, replacing them with
    2969              :          * locks on the new target.
    2970              :          */
    2971            0 :         dlist_foreach_modify(iter, &oldtarget->predicateLocks)
    2972              :         {
    2973            0 :             PREDICATELOCK *oldpredlock =
    2974            0 :                 dlist_container(PREDICATELOCK, targetLink, iter.cur);
    2975              :             PREDICATELOCK *newpredlock;
    2976              :             SerCommitSeqNo oldCommitSeqNo;
    2977              :             SERIALIZABLEXACT *oldXact;
    2978              : 
    2979              :             /*
    2980              :              * Remove the old lock first. This avoids the chance of running
    2981              :              * out of lock structure entries for the hash table.
    2982              :              */
    2983            0 :             oldCommitSeqNo = oldpredlock->commitSeqNo;
    2984            0 :             oldXact = oldpredlock->tag.myXact;
    2985              : 
    2986            0 :             dlist_delete(&(oldpredlock->xactLink));
    2987              : 
    2988              :             /*
    2989              :              * No need for retail delete from oldtarget list, we're removing
    2990              :              * the whole target anyway.
    2991              :              */
    2992            0 :             hash_search(PredicateLockHash,
    2993            0 :                         &oldpredlock->tag,
    2994              :                         HASH_REMOVE, &found);
    2995              :             Assert(found);
    2996              : 
    2997            0 :             if (transfer)
    2998              :             {
    2999              :                 PREDICATELOCKTAG newpredlocktag;
    3000              : 
    3001            0 :                 newpredlocktag.myTarget = heaptarget;
    3002            0 :                 newpredlocktag.myXact = oldXact;
    3003              :                 newpredlock = (PREDICATELOCK *)
    3004            0 :                     hash_search_with_hash_value(PredicateLockHash,
    3005              :                                                 &newpredlocktag,
    3006            0 :                                                 PredicateLockHashCodeFromTargetHashCode(&newpredlocktag,
    3007              :                                                                                         heaptargettaghash),
    3008              :                                                 HASH_ENTER,
    3009              :                                                 &found);
    3010            0 :                 if (!found)
    3011              :                 {
    3012            0 :                     dlist_push_tail(&(heaptarget->predicateLocks),
    3013              :                                     &(newpredlock->targetLink));
    3014            0 :                     dlist_push_tail(&(newpredlocktag.myXact->predicateLocks),
    3015              :                                     &(newpredlock->xactLink));
    3016            0 :                     newpredlock->commitSeqNo = oldCommitSeqNo;
    3017              :                 }
    3018              :                 else
    3019              :                 {
    3020            0 :                     if (newpredlock->commitSeqNo < oldCommitSeqNo)
    3021            0 :                         newpredlock->commitSeqNo = oldCommitSeqNo;
    3022              :                 }
    3023              : 
    3024              :                 Assert(newpredlock->commitSeqNo != 0);
    3025              :                 Assert((newpredlock->commitSeqNo == InvalidSerCommitSeqNo)
    3026              :                        || (newpredlock->tag.myXact == OldCommittedSxact));
    3027              :             }
    3028              :         }
    3029              : 
    3030            0 :         hash_search(PredicateLockTargetHash, &oldtarget->tag, HASH_REMOVE,
    3031              :                     &found);
    3032              :         Assert(found);
    3033              :     }
    3034              : 
    3035              :     /* Put the scratch entry back */
    3036           45 :     if (transfer)
    3037           45 :         RestoreScratchTarget(true);
    3038              : 
    3039              :     /* Release locks in reverse order */
    3040           45 :     LWLockRelease(SerializableXactHashLock);
    3041          765 :     for (i = NUM_PREDICATELOCK_PARTITIONS - 1; i >= 0; i--)
    3042          720 :         LWLockRelease(PredicateLockHashPartitionLockByIndex(i));
    3043           45 :     LWLockRelease(SerializablePredicateListLock);
    3044              : }
    3045              : 
    3046              : /*
    3047              :  * TransferPredicateLocksToHeapRelation
    3048              :  *      For all transactions, transfer all predicate locks for the given
    3049              :  *      relation to a single relation lock on the heap.
    3050              :  */
    3051              : void
    3052        22664 : TransferPredicateLocksToHeapRelation(Relation relation)
    3053              : {
    3054        22664 :     DropAllPredicateLocksFromTable(relation, true);
    3055        22664 : }
    3056              : 
    3057              : 
    3058              : /*
    3059              :  *      PredicateLockPageSplit
    3060              :  *
    3061              :  * Copies any predicate locks for the old page to the new page.
    3062              :  * Skip if this is a temporary table or toast table.
    3063              :  *
    3064              :  * NOTE: A page split (or overflow) affects all serializable transactions,
    3065              :  * even if it occurs in the context of another transaction isolation level.
    3066              :  *
    3067              :  * NOTE: This currently leaves the local copy of the locks without
    3068              :  * information on the new lock which is in shared memory.  This could cause
    3069              :  * problems if enough page splits occur on locked pages without the processes
    3070              :  * which hold the locks getting in and noticing.
    3071              :  */
    3072              : void
    3073        37976 : PredicateLockPageSplit(Relation relation, BlockNumber oldblkno,
    3074              :                        BlockNumber newblkno)
    3075              : {
    3076              :     PREDICATELOCKTARGETTAG oldtargettag;
    3077              :     PREDICATELOCKTARGETTAG newtargettag;
    3078              :     bool        success;
    3079              : 
    3080              :     /*
    3081              :      * Bail out quickly if there are no serializable transactions running.
    3082              :      *
    3083              :      * It's safe to do this check without taking any additional locks. Even if
    3084              :      * a serializable transaction starts concurrently, we know it can't take
    3085              :      * any SIREAD locks on the page being split because the caller is holding
    3086              :      * the associated buffer page lock. Memory reordering isn't an issue; the
    3087              :      * memory barrier in the LWLock acquisition guarantees that this read
    3088              :      * occurs while the buffer page lock is held.
    3089              :      */
    3090        37976 :     if (!TransactionIdIsValid(PredXact->SxactGlobalXmin))
    3091        37973 :         return;
    3092              : 
    3093           16 :     if (!PredicateLockingNeededForRelation(relation))
    3094           13 :         return;
    3095              : 
    3096              :     Assert(oldblkno != newblkno);
    3097              :     Assert(BlockNumberIsValid(oldblkno));
    3098              :     Assert(BlockNumberIsValid(newblkno));
    3099              : 
    3100            3 :     SET_PREDICATELOCKTARGETTAG_PAGE(oldtargettag,
    3101              :                                     relation->rd_locator.dbOid,
    3102              :                                     relation->rd_id,
    3103              :                                     oldblkno);
    3104            3 :     SET_PREDICATELOCKTARGETTAG_PAGE(newtargettag,
    3105              :                                     relation->rd_locator.dbOid,
    3106              :                                     relation->rd_id,
    3107              :                                     newblkno);
    3108              : 
    3109            3 :     LWLockAcquire(SerializablePredicateListLock, LW_EXCLUSIVE);
    3110              : 
    3111              :     /*
    3112              :      * Try copying the locks over to the new page's tag, creating it if
    3113              :      * necessary.
    3114              :      */
    3115            3 :     success = TransferPredicateLocksToNewTarget(oldtargettag,
    3116              :                                                 newtargettag,
    3117              :                                                 false);
    3118              : 
    3119            3 :     if (!success)
    3120              :     {
    3121              :         /*
    3122              :          * No more predicate lock entries are available. Failure isn't an
    3123              :          * option here, so promote the page lock to a relation lock.
    3124              :          */
    3125              : 
    3126              :         /* Get the parent relation lock's lock tag */
    3127            0 :         success = GetParentPredicateLockTag(&oldtargettag,
    3128              :                                             &newtargettag);
    3129              :         Assert(success);
    3130              : 
    3131              :         /*
    3132              :          * Move the locks to the parent. This shouldn't fail.
    3133              :          *
    3134              :          * Note that here we are removing locks held by other backends,
    3135              :          * leading to a possible inconsistency in their local lock hash table.
    3136              :          * This is OK because we're replacing it with a lock that covers the
    3137              :          * old one.
    3138              :          */
    3139            0 :         success = TransferPredicateLocksToNewTarget(oldtargettag,
    3140              :                                                     newtargettag,
    3141              :                                                     true);
    3142              :         Assert(success);
    3143              :     }
    3144              : 
    3145            3 :     LWLockRelease(SerializablePredicateListLock);
    3146              : }
    3147              : 
    3148              : /*
    3149              :  *      PredicateLockPageCombine
    3150              :  *
    3151              :  * Combines predicate locks for two existing pages.
    3152              :  * Skip if this is a temporary table or toast table.
    3153              :  *
    3154              :  * NOTE: A page combine affects all serializable transactions, even if it
    3155              :  * occurs in the context of another transaction isolation level.
    3156              :  */
    3157              : void
    3158         3461 : PredicateLockPageCombine(Relation relation, BlockNumber oldblkno,
    3159              :                          BlockNumber newblkno)
    3160              : {
    3161              :     /*
    3162              :      * Page combines differ from page splits in that we ought to be able to
    3163              :      * remove the locks on the old page after transferring them to the new
    3164              :      * page, instead of duplicating them. However, because we can't edit other
    3165              :      * backends' local lock tables, removing the old lock would leave them
    3166              :      * with an entry in their LocalPredicateLockHash for a lock they're not
    3167              :      * holding, which isn't acceptable. So we wind up having to do the same
    3168              :      * work as a page split, acquiring a lock on the new page and keeping the
    3169              :      * old page locked too. That can lead to some false positives, but should
    3170              :      * be rare in practice.
    3171              :      */
    3172         3461 :     PredicateLockPageSplit(relation, oldblkno, newblkno);
    3173         3461 : }
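
The split/combine path above first tries to add a lock on the new page and, if no lock entries are left, replaces the page lock with a coarser relation lock. The following is a minimal sketch of that fallback only. It assumes a toy fixed-size table, and every name in it (`DemoLockTable`, `demo_page_split`, `DEMO_WHOLE_RELATION`, and so on) is hypothetical; the real code works on shared hash tables with per-transaction lock lists.

```c
#include <stddef.h>

#define DEMO_MAX_TARGETS 4
#define DEMO_WHOLE_RELATION (-1)    /* hypothetical marker: covers every page */

typedef struct DemoTarget
{
    int         rel;
    int         page;
} DemoTarget;

typedef struct DemoLockTable
{
    DemoTarget  targets[DEMO_MAX_TARGETS];
    size_t      used;
} DemoLockTable;

/* Return the index of the entry for (rel, page), or -1 if there is none. */
static int
demo_find(const DemoLockTable *t, int rel, int page)
{
    for (size_t i = 0; i < t->used; i++)
    {
        if (t->targets[i].rel == rel && t->targets[i].page == page)
            return (int) i;
    }
    return -1;
}

/* Is (rel, page) covered by a page entry or a whole-relation entry? */
static int
demo_is_covered(const DemoLockTable *t, int rel, int page)
{
    return demo_find(t, rel, page) >= 0
        || demo_find(t, rel, DEMO_WHOLE_RELATION) >= 0;
}

/* Make the lock on oldpage also cover newpage, promoting if the table is full. */
static void
demo_page_split(DemoLockTable *t, int rel, int oldpage, int newpage)
{
    int         idx = demo_find(t, rel, oldpage);

    if (idx < 0 || demo_is_covered(t, rel, newpage))
        return;                 /* nothing to copy, or already covered */

    if (t->used < DEMO_MAX_TARGETS)
    {
        /* Normal case: keep the old entry and add one for the new page. */
        t->targets[t->used].rel = rel;
        t->targets[t->used].page = newpage;
        t->used++;
        return;
    }

    /* No free entry: replace the page entry with one covering the relation. */
    t->targets[idx].page = DEMO_WHOLE_RELATION;
}
```

The promoted entry covers strictly more than the page entry it replaces, which is why the fallback in this sketch cannot lose coverage; the cost is more false positives.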
    3174              : 
    3175              : /*
    3176              :  * Walk the list of in-progress serializable transactions and find the new
    3177              :  * xmin.
    3178              :  */
    3179              : static void
    3180          893 : SetNewSxactGlobalXmin(void)
    3181              : {
    3182              :     dlist_iter  iter;
    3183              : 
    3184              :     Assert(LWLockHeldByMe(SerializableXactHashLock));
    3185              : 
    3186          893 :     PredXact->SxactGlobalXmin = InvalidTransactionId;
    3187          893 :     PredXact->SxactGlobalXminCount = 0;
    3188              : 
    3189         3378 :     dlist_foreach(iter, &PredXact->activeList)
    3190              :     {
    3191         2485 :         SERIALIZABLEXACT *sxact =
    3192         2485 :             dlist_container(SERIALIZABLEXACT, xactLink, iter.cur);
    3193              : 
    3194         2485 :         if (!SxactIsRolledBack(sxact)
    3195         2179 :             && !SxactIsCommitted(sxact)
    3196           19 :             && sxact != OldCommittedSxact)
    3197              :         {
    3198              :             Assert(sxact->xmin != InvalidTransactionId);
    3199           19 :             if (!TransactionIdIsValid(PredXact->SxactGlobalXmin)
    3200            0 :                 || TransactionIdPrecedes(sxact->xmin,
    3201            0 :                                          PredXact->SxactGlobalXmin))
    3202              :             {
    3203           19 :                 PredXact->SxactGlobalXmin = sxact->xmin;
    3204           19 :                 PredXact->SxactGlobalXminCount = 1;
    3205              :             }
    3206            0 :             else if (TransactionIdEquals(sxact->xmin,
    3207              :                                          PredXact->SxactGlobalXmin))
    3208            0 :                 PredXact->SxactGlobalXminCount++;
    3209              :         }
    3210              :     }
    3211              : 
    3212          893 :     SerialSetActiveSerXmin(PredXact->SxactGlobalXmin);
    3213          893 : }
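
The loop in SetNewSxactGlobalXmin can be sketched in isolation as a "minimum plus multiplicity" scan over the active transactions. This is a simplified illustration, not the PostgreSQL implementation: `DemoXid`, `demo_xid_precedes`, and `demo_find_global_xmin` are hypothetical names, the `finished` array stands in for the rolled-back/committed checks, and only the value 0 is treated as invalid. The comparison uses modulo-2^32 arithmetic, the usual way to order 32-bit transaction IDs that wrap around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t DemoXid;           /* hypothetical stand-in for TransactionId */
#define DEMO_INVALID_XID ((DemoXid) 0)

/* Wraparound-aware "a is older than b" for two valid xids. */
static int
demo_xid_precedes(DemoXid a, DemoXid b)
{
    return (int32_t) (a - b) < 0;
}

typedef struct DemoGlobalXmin
{
    DemoXid     xmin;           /* oldest xmin among unfinished entries */
    int         count;          /* number of unfinished entries sharing it */
} DemoGlobalXmin;

/* Scan all entries, skipping finished ones, and track the oldest xmin. */
static DemoGlobalXmin
demo_find_global_xmin(const DemoXid *xmins, const int *finished, size_t n)
{
    DemoGlobalXmin result = {DEMO_INVALID_XID, 0};

    for (size_t i = 0; i < n; i++)
    {
        if (finished[i])
            continue;

        assert(xmins[i] != DEMO_INVALID_XID);
        if (result.xmin == DEMO_INVALID_XID
            || demo_xid_precedes(xmins[i], result.xmin))
        {
            /* Found a new oldest value: restart the count. */
            result.xmin = xmins[i];
            result.count = 1;
        }
        else if (xmins[i] == result.xmin)
            result.count++;
    }
    return result;
}
```

Keeping the count alongside the minimum lets a caller decrement it as transactions finish and rescan only when it reaches zero, which mirrors how SxactGlobalXminCount is used in ReleasePredicateLocks.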
    3214              : 
    3215              : /*
    3216              :  *      ReleasePredicateLocks
    3217              :  *
    3218              :  * Releases predicate locks based on completion of the current transaction,
    3219              :  * whether committed or rolled back.  It can also be called for a read-only
    3220              :  * transaction when it becomes impossible for the transaction to become
    3221              :  * part of a dangerous structure.
    3222              :  *
    3223              :  * We do nothing unless this is a serializable transaction.
    3224              :  *
    3225              :  * This method must ensure that shared memory hash tables are cleaned
    3226              :  * up in some relatively timely fashion.
    3227              :  *
    3228              :  * If this transaction is committing and is holding any predicate locks,
    3229              :  * it must be added to a list of completed serializable transactions still
    3230              :  * holding locks.
    3231              :  *
    3232              :  * If isReadOnlySafe is true, then predicate locks are being released before
    3233              :  * the end of the transaction because MySerializableXact has been determined
    3234              :  * to be RO_SAFE.  In non-parallel mode we can release it completely, but
    3235              :  * in parallel mode we partially release the SERIALIZABLEXACT and keep it
    3236              :  * around until the end of the transaction, allowing each backend to clear its
    3237              :  * MySerializableXact variable and benefit from the optimization in its own
    3238              :  * time.
    3239              :  */
    3240              : void
    3241       629209 : ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
    3242              : {
    3243       629209 :     bool        partiallyReleasing = false;
    3244              :     bool        needToClear;
    3245              :     SERIALIZABLEXACT *roXact;
    3246              :     dlist_mutable_iter iter;
    3247              : 
    3248              :     /*
    3249              :      * We can't trust XactReadOnly here, because a transaction which started
    3250              :      * as READ WRITE can show as READ ONLY later, e.g., within
    3251              :      * subtransactions.  We want to flag a transaction as READ ONLY if it
    3252              :      * commits without writing so that de facto READ ONLY transactions get the
    3253              :      * benefit of some RO optimizations, so we will use this local variable to
    3254              :      * get some cleanup logic right which is based on whether the transaction
    3255              :      * was declared READ ONLY at the top level.
    3256              :      */
    3257              :     bool        topLevelIsDeclaredReadOnly;
    3258              : 
    3259              :     /* We can't be both committing and releasing early due to RO_SAFE. */
    3260              :     Assert(!(isCommit && isReadOnlySafe));
    3261              : 
    3262              :     /* Are we at the end of a transaction, that is, a commit or abort? */
    3263       629209 :     if (!isReadOnlySafe)
    3264              :     {
    3265              :         /*
    3266              :          * Parallel workers mustn't release predicate locks at the end of
    3267              :          * their transaction.  The leader will do that at the end of its
    3268              :          * transaction.
    3269              :          */
    3270       629174 :         if (IsParallelWorker())
    3271              :         {
    3272         6030 :             ReleasePredicateLocksLocal();
    3273       627630 :             return;
    3274              :         }
    3275              : 
    3276              :         /*
    3277              :          * By the time the leader in a parallel query reaches end of
    3278              :          * transaction, it has waited for all workers to exit.
    3279              :          */
    3280              :         Assert(!ParallelContextActive());
    3281              : 
    3282              :         /*
    3283              :          * If the leader in a parallel query earlier stashed a partially
    3284              :          * released SERIALIZABLEXACT for final clean-up at end of transaction
    3285              :          * (because workers might still have been accessing it), then it's
    3286              :          * time to restore it.
    3287              :          */
    3288       623144 :         if (SavedSerializableXact != InvalidSerializableXact)
    3289              :         {
    3290              :             Assert(MySerializableXact == InvalidSerializableXact);
    3291            1 :             MySerializableXact = SavedSerializableXact;
    3292            1 :             SavedSerializableXact = InvalidSerializableXact;
    3293              :             Assert(SxactIsPartiallyReleased(MySerializableXact));
    3294              :         }
    3295              :     }
    3296              : 
    3297       623179 :     if (MySerializableXact == InvalidSerializableXact)
    3298              :     {
    3299              :         Assert(LocalPredicateLockHash == NULL);
    3300       621597 :         return;
    3301              :     }
    3302              : 
    3303         1582 :     LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
    3304              : 
    3305              :     /*
    3306              :      * If the transaction is committing, but it has been partially released
    3307              :      * already, then treat this as a roll back.  It was marked as rolled back.
    3308              :      */
    3309         1582 :     if (isCommit && SxactIsPartiallyReleased(MySerializableXact))
    3310            2 :         isCommit = false;
    3311              : 
    3312              :     /*
    3313              :      * If we're called in the middle of a transaction because we discovered
    3314              :      * that the SXACT_FLAG_RO_SAFE flag was set, then we'll partially release
    3315              :      * it (that is, release the predicate locks and conflicts, but not the
    3316              :      * SERIALIZABLEXACT itself) if we're the first backend to have noticed.
    3317              :      */
    3318         1582 :     if (isReadOnlySafe && IsInParallelMode())
    3319              :     {
    3320              :         /*
    3321              :          * The leader needs to stash a pointer to it, so that it can
    3322              :          * completely release it at end-of-transaction.
    3323              :          */
    3324            5 :         if (!IsParallelWorker())
    3325            1 :             SavedSerializableXact = MySerializableXact;
    3326              : 
    3327              :         /*
    3328              :          * The first backend to reach this condition will partially release
    3329              :          * the SERIALIZABLEXACT.  All others will just clear their
    3330              :          * backend-local state so that they stop doing SSI checks for the rest
    3331              :          * of the transaction.
    3332              :          */
    3333            5 :         if (SxactIsPartiallyReleased(MySerializableXact))
    3334              :         {
    3335            3 :             LWLockRelease(SerializableXactHashLock);
    3336            3 :             ReleasePredicateLocksLocal();
    3337            3 :             return;
    3338              :         }
    3339              :         else
    3340              :         {
    3341            2 :             MySerializableXact->flags |= SXACT_FLAG_PARTIALLY_RELEASED;
    3342            2 :             partiallyReleasing = true;
    3343              :             /* ... and proceed to perform the partial release below. */
    3344              :         }
    3345              :     }
    3346              :     Assert(!isCommit || SxactIsPrepared(MySerializableXact));
    3347              :     Assert(!isCommit || !SxactIsDoomed(MySerializableXact));
    3348              :     Assert(!SxactIsCommitted(MySerializableXact));
    3349              :     Assert(SxactIsPartiallyReleased(MySerializableXact)
    3350              :            || !SxactIsRolledBack(MySerializableXact));
    3351              : 
    3352              :     /* may not be serializable during COMMIT/ROLLBACK PREPARED */
    3353              :     Assert(MySerializableXact->pid == 0 || IsolationIsSerializable());
    3354              : 
    3355              :     /* We'd better not already be on the cleanup list. */
    3356              :     Assert(!SxactIsOnFinishedList(MySerializableXact));
    3357              : 
    3358         1579 :     topLevelIsDeclaredReadOnly = SxactIsReadOnly(MySerializableXact);
    3359              : 
    3360              :     /*
    3361              :      * We don't hold XidGenLock here, assuming that TransactionId is
    3362              :      * atomic!
    3363              :      *
    3364              :      * If this value is changing, we don't care that much whether we get the
    3365              :      * old or new value -- it is just used to determine how far
    3366              :      * SxactGlobalXmin must advance before this transaction can be fully
    3367              :      * cleaned up.  The worst that could happen is we wait for one more
    3368              :      * transaction to complete before freeing some RAM; correctness of visible
    3369              :      * behavior is not affected.
    3370              :      */
    3371         1579 :     MySerializableXact->finishedBefore = XidFromFullTransactionId(TransamVariables->nextXid);
    3372              : 
    3373              :     /*
    3374              :      * If it's not a commit it's either a rollback or a read-only transaction
    3375              :      * flagged SXACT_FLAG_RO_SAFE, and we can clear our locks immediately.
    3376              :      */
    3377         1579 :     if (isCommit)
    3378              :     {
    3379         1248 :         MySerializableXact->flags |= SXACT_FLAG_COMMITTED;
    3380         1248 :         MySerializableXact->commitSeqNo = ++(PredXact->LastSxactCommitSeqNo);
    3381              :         /* Recognize implicit read-only transaction (commit without write). */
    3382         1248 :         if (!MyXactDidWrite)
    3383          239 :             MySerializableXact->flags |= SXACT_FLAG_READ_ONLY;
    3384              :     }
    3385              :     else
    3386              :     {
    3387              :         /*
    3388              :          * The DOOMED flag indicates that we intend to roll back this
    3389              :          * transaction and so it should not cause serialization failures for
    3390              :          * other transactions that conflict with it. Note that this flag might
    3391              :          * already be set, if another backend marked this transaction for
    3392              :          * abort.
    3393              :          *
    3394              :          * The ROLLED_BACK flag further indicates that ReleasePredicateLocks
    3395              :          * has been called, and so the SerializableXact is eligible for
    3396              :          * cleanup. This means it should not be considered when calculating
    3397              :          * SxactGlobalXmin.
    3398              :          */
    3399          331 :         MySerializableXact->flags |= SXACT_FLAG_DOOMED;
    3400          331 :         MySerializableXact->flags |= SXACT_FLAG_ROLLED_BACK;
    3401              : 
    3402              :         /*
    3403              :          * If the transaction was previously prepared, but is now failing due
    3404              :          * to a ROLLBACK PREPARED or (hopefully very rare) error after the
    3405              :          * prepare, clear the prepared flag.  This simplifies conflict
    3406              :          * checking.
    3407              :          */
    3408          331 :         MySerializableXact->flags &= ~SXACT_FLAG_PREPARED;
    3409              :     }
    3410              : 
    3411         1579 :     if (!topLevelIsDeclaredReadOnly)
    3412              :     {
    3413              :         Assert(PredXact->WritableSxactCount > 0);
    3414         1469 :         if (--(PredXact->WritableSxactCount) == 0)
    3415              :         {
    3416              :             /*
    3417              :              * Release predicate locks and incoming rw-conflicts for all
    3418              :              * committed transactions.  There are no longer any transactions
    3419              :              * which might conflict with the locks and no chance for new
    3420              :              * transactions to overlap.  Similarly, existing incoming
    3421              :              * conflicts can't cause pivots, and any incoming conflicts which
    3422              :              * could have completed a dangerous structure would already have
    3423              :              * caused a rollback, so any remaining ones must be benign.
    3424              :              */
    3425          884 :             PredXact->CanPartialClearThrough = PredXact->LastSxactCommitSeqNo;
    3426              :         }
    3427              :     }
    3428              :     else
    3429              :     {
    3430              :         /*
    3431              :          * Read-only transactions: clear the list of transactions that might
    3432              :          * make us unsafe. Note that we use 'inLink' for the iteration as
    3433              :          * opposed to 'outLink' for the r/w xacts.
    3434              :          */
    3435          152 :         dlist_foreach_modify(iter, &MySerializableXact->possibleUnsafeConflicts)
    3436              :         {
    3437           42 :             RWConflict  possibleUnsafeConflict =
    3438           42 :                 dlist_container(RWConflictData, inLink, iter.cur);
    3439              : 
    3440              :             Assert(!SxactIsReadOnly(possibleUnsafeConflict->sxactOut));
    3441              :             Assert(MySerializableXact == possibleUnsafeConflict->sxactIn);
    3442              : 
    3443           42 :             ReleaseRWConflict(possibleUnsafeConflict);
    3444              :         }
    3445              :     }
    3446              : 
    3447              :     /* Check for conflict out to old committed transactions. */
    3448         1579 :     if (isCommit
    3449         1248 :         && !SxactIsReadOnly(MySerializableXact)
    3450         1009 :         && SxactHasSummaryConflictOut(MySerializableXact))
    3451              :     {
    3452              :         /*
    3453              :          * We don't know which old committed transaction we conflicted with,
    3454              :          * so be conservative and use FirstNormalSerCommitSeqNo here.
    3455              :          */
    3456            0 :         MySerializableXact->SeqNo.earliestOutConflictCommit =
    3457              :             FirstNormalSerCommitSeqNo;
    3458            0 :         MySerializableXact->flags |= SXACT_FLAG_CONFLICT_OUT;
    3459              :     }
    3460              : 
    3461              :     /*
    3462              :      * Release all outConflicts to committed transactions.  If we're rolling
    3463              :      * back clear them all.  Set SXACT_FLAG_CONFLICT_OUT if any point to
    3464              :      * previously committed transactions.
    3465              :      */
    3466         2266 :     dlist_foreach_modify(iter, &MySerializableXact->outConflicts)
    3467              :     {
    3468          687 :         RWConflict  conflict =
    3469              :             dlist_container(RWConflictData, outLink, iter.cur);
    3470              : 
    3471          687 :         if (isCommit
    3472          455 :             && !SxactIsReadOnly(MySerializableXact)
    3473          347 :             && SxactIsCommitted(conflict->sxactIn))
    3474              :         {
    3475           96 :             if ((MySerializableXact->flags & SXACT_FLAG_CONFLICT_OUT) == 0
    3476            0 :                 || conflict->sxactIn->prepareSeqNo < MySerializableXact->SeqNo.earliestOutConflictCommit)
    3477           96 :                 MySerializableXact->SeqNo.earliestOutConflictCommit = conflict->sxactIn->prepareSeqNo;
    3478           96 :             MySerializableXact->flags |= SXACT_FLAG_CONFLICT_OUT;
    3479              :         }
    3480              : 
    3481          687 :         if (!isCommit
    3482          455 :             || SxactIsCommitted(conflict->sxactIn)
    3483          337 :             || (conflict->sxactIn->SeqNo.lastCommitBeforeSnapshot >= PredXact->LastSxactCommitSeqNo))
    3484          350 :             ReleaseRWConflict(conflict);
    3485              :     }
    3486              : 
    3487              :     /*
    3488              :      * Release all inConflicts from committed and read-only transactions. If
    3489              :      * we're rolling back, clear them all.
    3490              :      */
    3491         2360 :     dlist_foreach_modify(iter, &MySerializableXact->inConflicts)
    3492              :     {
    3493          781 :         RWConflict  conflict =
    3494          781 :             dlist_container(RWConflictData, inLink, iter.cur);
    3495              : 
    3496          781 :         if (!isCommit
    3497          604 :             || SxactIsCommitted(conflict->sxactOut)
    3498          419 :             || SxactIsReadOnly(conflict->sxactOut))
    3499          442 :             ReleaseRWConflict(conflict);
    3500              :     }
    3501              : 
    3502         1579 :     if (!topLevelIsDeclaredReadOnly)
    3503              :     {
    3504              :         /*
    3505              :          * Remove ourselves from the list of possible conflicts for concurrent
    3506              :          * READ ONLY transactions, flagging them as unsafe if we have a
    3507              :          * conflict out. If any are waiting DEFERRABLE transactions, wake them
    3508              :          * up if they are known safe or known unsafe.
    3509              :          */
    3510         1561 :         dlist_foreach_modify(iter, &MySerializableXact->possibleUnsafeConflicts)
    3511              :         {
    3512           92 :             RWConflict  possibleUnsafeConflict =
    3513              :                 dlist_container(RWConflictData, outLink, iter.cur);
    3514              : 
    3515           92 :             roXact = possibleUnsafeConflict->sxactIn;
    3516              :             Assert(MySerializableXact == possibleUnsafeConflict->sxactOut);
    3517              :             Assert(SxactIsReadOnly(roXact));
    3518              : 
    3519              :             /* Mark conflicted if necessary. */
    3520           92 :             if (isCommit
    3521           89 :                 && MyXactDidWrite
    3522           84 :                 && SxactHasConflictOut(MySerializableXact)
    3523           13 :                 && (MySerializableXact->SeqNo.earliestOutConflictCommit
    3524           13 :                     <= roXact->SeqNo.lastCommitBeforeSnapshot))
    3525              :             {
    3526              :                 /*
    3527              :                  * This releases possibleUnsafeConflict (as well as all other
    3528              :                  * possible conflicts for roXact)
    3529              :                  */
    3530            3 :                 FlagSxactUnsafe(roXact);
    3531              :             }
    3532              :             else
    3533              :             {
    3534           89 :                 ReleaseRWConflict(possibleUnsafeConflict);
    3535              : 
    3536              :                 /*
    3537              :                  * If we were the last possible conflict, flag it safe. The
    3538              :                  * transaction can now safely release its predicate locks (but
    3539              :                  * that transaction's backend has to do that itself).
    3540              :                  */
    3541           89 :                 if (dlist_is_empty(&roXact->possibleUnsafeConflicts))
    3542           67 :                     roXact->flags |= SXACT_FLAG_RO_SAFE;
    3543              :             }
    3544              : 
    3545              :             /*
    3546              :              * Wake up the process for a waiting DEFERRABLE transaction if we
    3547              :              * now know it's either safe or conflicted.
    3548              :              */
    3549           92 :             if (SxactIsDeferrableWaiting(roXact) &&
    3550            3 :                 (SxactIsROUnsafe(roXact) || SxactIsROSafe(roXact)))
    3551            3 :                 ProcSendSignal(roXact->pgprocno);
    3552              :         }
    3553              :     }
    3554              : 
    3555              :     /*
    3556              :      * Check whether it's time to clean up old transactions. This can only be
    3557              :      * done when the last serializable transaction with the oldest xmin among
    3558              :      * serializable transactions completes.  We then find the "new oldest"
    3559              :      * xmin and purge any transactions which finished before this transaction
    3560              :      * was launched.
    3561              :      *
    3562              :      * For parallel queries in read-only transactions, it might run twice. We
    3563              :      * only release the reference on the first call.
    3564              :      */
    3565         1579 :     needToClear = false;
    3566         1579 :     if ((partiallyReleasing ||
    3567         1577 :          !SxactIsPartiallyReleased(MySerializableXact)) &&
    3568         1577 :         TransactionIdEquals(MySerializableXact->xmin,
    3569              :                             PredXact->SxactGlobalXmin))
    3570              :     {
    3571              :         Assert(PredXact->SxactGlobalXminCount > 0);
    3572         1559 :         if (--(PredXact->SxactGlobalXminCount) == 0)
    3573              :         {
    3574          893 :             SetNewSxactGlobalXmin();
    3575          893 :             needToClear = true;
    3576              :         }
    3577              :     }
    3578              : 
    3579         1579 :     LWLockRelease(SerializableXactHashLock);
    3580              : 
    3581         1579 :     LWLockAcquire(SerializableFinishedListLock, LW_EXCLUSIVE);
    3582              : 
    3583              :     /* Add this to the list of transactions to check for later cleanup. */
    3584         1579 :     if (isCommit)
    3585         1248 :         dlist_push_tail(FinishedSerializableTransactions,
    3586         1248 :                         &MySerializableXact->finishedLink);
    3587              : 
    3588              :     /*
    3589              :      * If we're releasing a RO_SAFE transaction in parallel mode, we'll only
    3590              :      * partially release it.  That's necessary because other backends may have
    3591              :      * a reference to it.  The leader will release the SERIALIZABLEXACT itself
    3592              :      * at the end of the transaction after workers have stopped running.
    3593              :      */
    3594         1579 :     if (!isCommit)
    3595          331 :         ReleaseOneSerializableXact(MySerializableXact,
    3596          331 :                                    isReadOnlySafe && IsInParallelMode(),
    3597          331 :                                    false);
    3598              : 
    3599         1579 :     LWLockRelease(SerializableFinishedListLock);
    3600              : 
    3601         1579 :     if (needToClear)
    3602          893 :         ClearOldPredicateLocks();
    3603              : 
    3604         1579 :     ReleasePredicateLocksLocal();
    3605              : }
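
The commit/rollback branch in ReleasePredicateLocks amounts to a small state transition on the transaction's flags word. The sketch below isolates that transition. The `DEMO_FLAG_*` bit values and the function name `demo_finish_flags` are hypothetical and chosen only for illustration; they are not the values or names defined by PostgreSQL.

```c
#include <stdint.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define DEMO_FLAG_COMMITTED   0x01u
#define DEMO_FLAG_PREPARED    0x02u
#define DEMO_FLAG_ROLLED_BACK 0x04u
#define DEMO_FLAG_DOOMED      0x08u
#define DEMO_FLAG_READ_ONLY   0x10u

/* Return the flags a transaction carries after it finishes. */
static uint32_t
demo_finish_flags(uint32_t flags, int is_commit, int did_write)
{
    if (is_commit)
    {
        flags |= DEMO_FLAG_COMMITTED;
        /* A commit without any write is treated as read-only after the fact. */
        if (!did_write)
            flags |= DEMO_FLAG_READ_ONLY;
    }
    else
    {
        /* Rollback (or early release): doomed, eligible for cleanup. */
        flags |= DEMO_FLAG_DOOMED | DEMO_FLAG_ROLLED_BACK;
        /* A prepared transaction that fails is no longer prepared. */
        flags &= ~DEMO_FLAG_PREPARED;
    }
    return flags;
}
```

For example, under these assumed bit values a prepared transaction that commits without writing ends with COMMITTED, PREPARED, and READ_ONLY set, while the same transaction rolling back ends with only DOOMED and ROLLED_BACK set.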
    3606              : 
    3607              : static void
    3608         7612 : ReleasePredicateLocksLocal(void)
    3609              : {
    3610         7612 :     MySerializableXact = InvalidSerializableXact;
    3611         7612 :     MyXactDidWrite = false;
    3612              : 
    3613              :     /* Delete per-transaction lock table */
    3614         7612 :     if (LocalPredicateLockHash != NULL)
    3615              :     {
    3616         1578 :         hash_destroy(LocalPredicateLockHash);
    3617         1578 :         LocalPredicateLockHash = NULL;
    3618              :     }
    3619         7612 : }
    3620              : 
    3621              : /*
    3622              :  * Clear old predicate locks, belonging to committed transactions that are no
    3623              :  * longer interesting to any in-progress transaction.
    3624              :  */
    3625              : static void
    3626          893 : ClearOldPredicateLocks(void)
    3627              : {
    3628              :     dlist_mutable_iter iter;
    3629              : 
    3630              :     /*
    3631              :      * Loop through finished transactions. They are in commit order, so we can
    3632              :      * stop as soon as we find one that's still interesting.
    3633              :      */
    3634          893 :     LWLockAcquire(SerializableFinishedListLock, LW_EXCLUSIVE);
    3635          893 :     LWLockAcquire(SerializableXactHashLock, LW_SHARED);
    3636         2150 :     dlist_foreach_modify(iter, FinishedSerializableTransactions)
    3637              :     {
    3638         1266 :         SERIALIZABLEXACT *finishedSxact =
    3639         1266 :             dlist_container(SERIALIZABLEXACT, finishedLink, iter.cur);
    3640              : 
    3641         1266 :         if (!TransactionIdIsValid(PredXact->SxactGlobalXmin)
    3642           29 :             || TransactionIdPrecedesOrEquals(finishedSxact->finishedBefore,
    3643           29 :                                              PredXact->SxactGlobalXmin))
    3644              :         {
    3645              :             /*
    3646              :              * This transaction committed before any in-progress transaction
    3647              :              * took its snapshot. It's no longer interesting.
    3648              :              */
    3649         1248 :             LWLockRelease(SerializableXactHashLock);
    3650         1248 :             dlist_delete_thoroughly(&finishedSxact->finishedLink);
    3651         1248 :             ReleaseOneSerializableXact(finishedSxact, false, false);
    3652         1248 :             LWLockAcquire(SerializableXactHashLock, LW_SHARED);
    3653              :         }
    3654           18 :         else if (finishedSxact->commitSeqNo > PredXact->HavePartialClearedThrough
    3655           18 :                  && finishedSxact->commitSeqNo <= PredXact->CanPartialClearThrough)
    3656              :         {
    3657              :             /*
    3658              :              * Any active transactions that took their snapshot before this
    3659              :              * transaction committed are read-only, so we can clear part of
    3660              :              * its state.
    3661              :              */
    3662            9 :             LWLockRelease(SerializableXactHashLock);
    3663              : 
    3664            9 :             if (SxactIsReadOnly(finishedSxact))
    3665              :             {
    3666              :                 /* A read-only transaction can be removed entirely */
    3667            0 :                 dlist_delete_thoroughly(&(finishedSxact->finishedLink));
    3668            0 :                 ReleaseOneSerializableXact(finishedSxact, false, false);
    3669              :             }
    3670              :             else
    3671              :             {
    3672              :                 /*
    3673              :                  * A read-write transaction can only be partially cleared. We
    3674              :                  * need to keep the SERIALIZABLEXACT but can release the
    3675              :                  * SIREAD locks and conflicts in.
    3676              :                  */
    3677            9 :                 ReleaseOneSerializableXact(finishedSxact, true, false);
    3678              :             }
    3679              : 
    3680            9 :             PredXact->HavePartialClearedThrough = finishedSxact->commitSeqNo;
    3681            9 :             LWLockAcquire(SerializableXactHashLock, LW_SHARED);
    3682              :         }
    3683              :         else
    3684              :         {
    3685              :             /* Still interesting. */
    3686              :             break;
    3687              :         }
    3688              :     }
    3689          893 :     LWLockRelease(SerializableXactHashLock);
    3690              : 
    3691              :     /*
    3692              :      * Loop through predicate locks on the dummy transaction for summarized data.
    3693              :      */
    3694          893 :     LWLockAcquire(SerializablePredicateListLock, LW_SHARED);
    3695          893 :     dlist_foreach_modify(iter, &OldCommittedSxact->predicateLocks)
    3696              :     {
    3697            0 :         PREDICATELOCK *predlock =
    3698            0 :             dlist_container(PREDICATELOCK, xactLink, iter.cur);
    3699              :         bool        canDoPartialCleanup;
    3700              : 
    3701            0 :         LWLockAcquire(SerializableXactHashLock, LW_SHARED);
    3702              :         Assert(predlock->commitSeqNo != 0);
    3703              :         Assert(predlock->commitSeqNo != InvalidSerCommitSeqNo);
    3704            0 :         canDoPartialCleanup = (predlock->commitSeqNo <= PredXact->CanPartialClearThrough);
    3705            0 :         LWLockRelease(SerializableXactHashLock);
    3706              : 
    3707              :         /*
    3708              :          * If this lock originally belonged to an old enough transaction, we
    3709              :          * can release it.
    3710              :          */
    3711            0 :         if (canDoPartialCleanup)
    3712              :         {
    3713              :             PREDICATELOCKTAG tag;
    3714              :             PREDICATELOCKTARGET *target;
    3715              :             PREDICATELOCKTARGETTAG targettag;
    3716              :             uint32      targettaghash;
    3717              :             LWLock     *partitionLock;
    3718              : 
    3719            0 :             tag = predlock->tag;
    3720            0 :             target = tag.myTarget;
    3721            0 :             targettag = target->tag;
    3722            0 :             targettaghash = PredicateLockTargetTagHashCode(&targettag);
    3723            0 :             partitionLock = PredicateLockHashPartitionLock(targettaghash);
    3724              : 
    3725            0 :             LWLockAcquire(partitionLock, LW_EXCLUSIVE);
    3726              : 
    3727            0 :             dlist_delete(&(predlock->targetLink));
    3728            0 :             dlist_delete(&(predlock->xactLink));
    3729              : 
    3730            0 :             hash_search_with_hash_value(PredicateLockHash, &tag,
    3731            0 :                                         PredicateLockHashCodeFromTargetHashCode(&tag,
    3732              :                                                                                 targettaghash),
    3733              :                                         HASH_REMOVE, NULL);
    3734            0 :             RemoveTargetIfNoLongerUsed(target, targettaghash);
    3735              : 
    3736            0 :             LWLockRelease(partitionLock);
    3737              :         }
    3738              :     }
    3739              : 
    3740          893 :     LWLockRelease(SerializablePredicateListLock);
    3741          893 :     LWLockRelease(SerializableFinishedListLock);
    3742          893 : }
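/*
 * Illustrative sketch only, not part of predicate.c.  The first loop in
 * ClearOldPredicateLocks() makes a three-way decision for each finished
 * transaction, and because the list is in commit order it can stop at the
 * first entry that is still interesting.  The standalone code below restates
 * that decision with plain integers.  Every name in it (DemoGlobalState,
 * DemoCleanupAction, demo_classify_finished, ...) is hypothetical, xid 0 is
 * assumed to mean "invalid", and the xid comparison is a simplification that
 * ignores PostgreSQL's special transaction ids.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names exist in predicate.c. */
typedef enum
{
    DEMO_RELEASE_ALL,           /* drop the whole entry */
    DEMO_RELEASE_PARTIAL,       /* drop SIREAD locks and inConflicts only */
    DEMO_STOP                   /* still interesting: stop scanning */
} DemoCleanupAction;

typedef struct
{
    uint32_t    global_xmin;    /* 0 is assumed to mean "no active sxact" */
    uint64_t    have_partial_cleared_through;
    uint64_t    can_partial_clear_through;
} DemoGlobalState;

/* Wraparound-aware "a <= b" for normal xids (simplified). */
bool
demo_xid_precedes_or_equals(uint32_t a, uint32_t b)
{
    return (int32_t) (a - b) <= 0;
}

/* Decide what to do with one finished transaction. */
DemoCleanupAction
demo_classify_finished(const DemoGlobalState *g, uint32_t finished_before,
                       uint64_t commit_seq_no, bool read_only)
{
    assert(g != NULL);

    /* Committed before every in-progress snapshot: nothing can need it. */
    if (g->global_xmin == 0 ||
        demo_xid_precedes_or_equals(finished_before, g->global_xmin))
        return DEMO_RELEASE_ALL;

    /* Inside the partial-clear window: read-only entries go entirely. */
    if (commit_seq_no > g->have_partial_cleared_through &&
        commit_seq_no <= g->can_partial_clear_through)
        return read_only ? DEMO_RELEASE_ALL : DEMO_RELEASE_PARTIAL;

    return DEMO_STOP;
}
```

/*
 * A caller would walk its commit-ordered list, act on each result, and break
 * out of the loop on DEMO_STOP.  The sketch leaves out the locking and the
 * update of HavePartialClearedThrough that the real function performs.
 */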
    3743              : 
    3744              : /*
    3745              :  * This is the normal way to delete anything from any of the predicate
    3746              :  * locking hash tables.  Given a transaction which we know can be deleted:
    3747              :  * delete all predicate locks held by that transaction and any predicate
    3748              :  * lock targets which are now unreferenced by a lock; delete all conflicts
    3749              :  * for the transaction; delete all xid values for the transaction; then
    3750              :  * delete the transaction.
    3751              :  *
    3752              :  * When the partial flag is set, we can release all predicate locks and
    3753              :  * in-conflict information -- we've established that there are no longer
    3754              :  * any overlapping read-write transactions for which this transaction could
    3755              :  * matter -- but keep the transaction entry itself and any outConflicts.
    3756              :  *
    3757              :  * When the summarize flag is set, we've run short of room for sxact data
    3758              :  * and must summarize to the SLRU.  Predicate locks are transferred to a
    3759              :  * dummy "old" transaction, with duplicate locks on a single target
    3760              :  * collapsing to a single lock with the "latest" commitSeqNo from among
    3761              :  * the conflicting locks.
    3762              :  */
    3763              : static void
    3764         1588 : ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
    3765              :                            bool summarize)
    3766              : {
    3767              :     SERIALIZABLEXIDTAG sxidtag;
    3768              :     dlist_mutable_iter iter;
    3769              : 
    3770              :     Assert(sxact != NULL);
    3771              :     Assert(SxactIsRolledBack(sxact) || SxactIsCommitted(sxact));
    3772              :     Assert(partial || !SxactIsOnFinishedList(sxact));
    3773              :     Assert(LWLockHeldByMe(SerializableFinishedListLock));
    3774              : 
    3775              :     /*
    3776              :      * First release all the predicate locks held by this xact (or transfer
    3777              :      * them to OldCommittedSxact if summarize is true)
    3778              :      */
    3779         1588 :     LWLockAcquire(SerializablePredicateListLock, LW_SHARED);
    3780         1588 :     if (IsInParallelMode())
    3781            3 :         LWLockAcquire(&sxact->perXactPredicateListLock, LW_EXCLUSIVE);
    3782         4456 :     dlist_foreach_modify(iter, &sxact->predicateLocks)
    3783              :     {
    3784         2868 :         PREDICATELOCK *predlock =
    3785         2868 :             dlist_container(PREDICATELOCK, xactLink, iter.cur);
    3786              :         PREDICATELOCKTAG tag;
    3787              :         PREDICATELOCKTARGET *target;
    3788              :         PREDICATELOCKTARGETTAG targettag;
    3789              :         uint32      targettaghash;
    3790              :         LWLock     *partitionLock;
    3791              : 
    3792         2868 :         tag = predlock->tag;
    3793         2868 :         target = tag.myTarget;
    3794         2868 :         targettag = target->tag;
    3795         2868 :         targettaghash = PredicateLockTargetTagHashCode(&targettag);
    3796         2868 :         partitionLock = PredicateLockHashPartitionLock(targettaghash);
    3797              : 
    3798         2868 :         LWLockAcquire(partitionLock, LW_EXCLUSIVE);
    3799              : 
    3800         2868 :         dlist_delete(&predlock->targetLink);
    3801              : 
    3802         2868 :         hash_search_with_hash_value(PredicateLockHash, &tag,
    3803         2868 :                                     PredicateLockHashCodeFromTargetHashCode(&tag,
    3804              :                                                                             targettaghash),
    3805              :                                     HASH_REMOVE, NULL);
    3806         2868 :         if (summarize)
    3807              :         {
    3808              :             bool        found;
    3809              : 
    3810              :             /* Fold into dummy transaction list. */
    3811            0 :             tag.myXact = OldCommittedSxact;
    3812            0 :             predlock = hash_search_with_hash_value(PredicateLockHash, &tag,
    3813            0 :                                                    PredicateLockHashCodeFromTargetHashCode(&tag,
    3814              :                                                                                            targettaghash),
    3815              :                                                    HASH_ENTER_NULL, &found);
    3816            0 :             if (!predlock)
    3817            0 :                 ereport(ERROR,
    3818              :                         (errcode(ERRCODE_OUT_OF_MEMORY),
    3819              :                          errmsg("out of shared memory"),
    3820              :                          errhint("You might need to increase \"%s\".", "max_pred_locks_per_transaction")));
    3821            0 :             if (found)
    3822              :             {
    3823              :                 Assert(predlock->commitSeqNo != 0);
    3824              :                 Assert(predlock->commitSeqNo != InvalidSerCommitSeqNo);
    3825            0 :                 if (predlock->commitSeqNo < sxact->commitSeqNo)
    3826            0 :                     predlock->commitSeqNo = sxact->commitSeqNo;
    3827              :             }
    3828              :             else
    3829              :             {
    3830            0 :                 dlist_push_tail(&target->predicateLocks,
    3831              :                                 &predlock->targetLink);
    3832            0 :                 dlist_push_tail(&OldCommittedSxact->predicateLocks,
    3833              :                                 &predlock->xactLink);
    3834            0 :                 predlock->commitSeqNo = sxact->commitSeqNo;
    3835              :             }
    3836              :         }
    3837              :         else
    3838         2868 :             RemoveTargetIfNoLongerUsed(target, targettaghash);
    3839              : 
    3840         2868 :         LWLockRelease(partitionLock);
    3841              :     }
    3842              : 
    3843              :     /*
    3844              :      * Rather than retail removal, just re-init the head after we've run
    3845              :      * through the list.
    3846              :      */
    3847         1588 :     dlist_init(&sxact->predicateLocks);
    3848              : 
    3849         1588 :     if (IsInParallelMode())
    3850            3 :         LWLockRelease(&sxact->perXactPredicateListLock);
    3851         1588 :     LWLockRelease(SerializablePredicateListLock);
    3852              : 
    3853         1588 :     sxidtag.xid = sxact->topXid;
    3854         1588 :     LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
    3855              : 
    3856              :     /* Release all outConflicts (unless 'partial' is true) */
    3857         1588 :     if (!partial)
    3858              :     {
    3859         1577 :         dlist_foreach_modify(iter, &sxact->outConflicts)
    3860              :         {
    3861            0 :             RWConflict  conflict =
    3862              :                 dlist_container(RWConflictData, outLink, iter.cur);
    3863              : 
    3864            0 :             if (summarize)
    3865            0 :                 conflict->sxactIn->flags |= SXACT_FLAG_SUMMARY_CONFLICT_IN;
    3866            0 :             ReleaseRWConflict(conflict);
    3867              :         }
    3868              :     }
    3869              : 
    3870              :     /* Release all inConflicts. */
    3871         1588 :     dlist_foreach_modify(iter, &sxact->inConflicts)
    3872              :     {
    3873            0 :         RWConflict  conflict =
    3874            0 :             dlist_container(RWConflictData, inLink, iter.cur);
    3875              : 
    3876            0 :         if (summarize)
    3877            0 :             conflict->sxactOut->flags |= SXACT_FLAG_SUMMARY_CONFLICT_OUT;
    3878            0 :         ReleaseRWConflict(conflict);
    3879              :     }
    3880              : 
    3881              :     /* Finally, get rid of the xid and the record of the transaction itself. */
    3882         1588 :     if (!partial)
    3883              :     {
    3884         1577 :         if (sxidtag.xid != InvalidTransactionId)
    3885         1302 :             hash_search(SerializableXidHash, &sxidtag, HASH_REMOVE, NULL);
    3886         1577 :         ReleasePredXact(sxact);
    3887              :     }
    3888              : 
    3889         1588 :     LWLockRelease(SerializableXactHashLock);
    3890         1588 : }
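/*
 * Illustrative sketch only, not part of predicate.c.  When the summarize
 * flag is set, ReleaseOneSerializableXact() folds each lock into the dummy
 * "old committed" transaction, and duplicate locks on one target collapse
 * into a single lock that keeps the latest commitSeqNo.  The standalone code
 * below shows that collapse on a small fixed-size array.  DemoSummary,
 * DemoSummaryLock, DEMO_MAX_SUMMARY and demo_fold_lock are hypothetical
 * names, and the linear search stands in for the shared hash table lookup.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_SUMMARY 8      /* hypothetical capacity for the sketch */

typedef struct
{
    uint32_t    target_id;      /* stands in for a PREDICATELOCKTARGET */
    uint64_t    commit_seq_no;  /* latest commit among the folded locks */
} DemoSummaryLock;

typedef struct
{
    DemoSummaryLock locks[DEMO_MAX_SUMMARY];
    size_t      nlocks;
} DemoSummary;

/*
 * Fold one lock into the summary.  Returns 0 on success and -1 when the
 * table is full (the real code reports "out of shared memory" instead).
 */
int
demo_fold_lock(DemoSummary *summary, uint32_t target_id,
               uint64_t commit_seq_no)
{
    assert(summary != NULL);
    assert(commit_seq_no != 0);

    for (size_t i = 0; i < summary->nlocks; i++)
    {
        if (summary->locks[i].target_id == target_id)
        {
            /* Existing entry: keep whichever commit is later. */
            if (summary->locks[i].commit_seq_no < commit_seq_no)
                summary->locks[i].commit_seq_no = commit_seq_no;
            return 0;
        }
    }

    if (summary->nlocks == DEMO_MAX_SUMMARY)
        return -1;

    /* New entry: record the target with this transaction's commit. */
    summary->locks[summary->nlocks].target_id = target_id;
    summary->locks[summary->nlocks].commit_seq_no = commit_seq_no;
    summary->nlocks++;
    return 0;
}
```

/*
 * Keeping only the latest commitSeqNo per target is conservative: the
 * summarized lock is released only once the newest of the folded
 * transactions is old enough to be cleared.
 */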
    3891              : 
    3892              : /*
    3893              :  * Tests whether the given top level transaction is concurrent with
    3894              :  * (overlaps) our current transaction.
    3895              :  *
    3896              :  * We need to identify the top level transaction for SSI, anyway, so pass
    3897              :  * that to this function to save the overhead of checking the snapshot's
    3898              :  * subxip array.
    3899              :  */
    3900              : static bool
    3901          536 : XidIsConcurrent(TransactionId xid)
    3902              : {
    3903              :     Snapshot    snap;
    3904              : 
    3905              :     Assert(TransactionIdIsValid(xid));
    3906              :     Assert(!TransactionIdEquals(xid, GetTopTransactionIdIfAny()));
    3907              : 
    3908          536 :     snap = GetTransactionSnapshot();
    3909              : 
    3910          536 :     if (TransactionIdPrecedes(xid, snap->xmin))
    3911            0 :         return false;
    3912              : 
    3913          536 :     if (TransactionIdFollowsOrEquals(xid, snap->xmax))
    3914          524 :         return true;
    3915              : 
    3916           12 :     return pg_lfind32(xid, snap->xip, snap->xcnt);
    3917              : }
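/*
 * Illustrative sketch only, not part of predicate.c.  XidIsConcurrent()
 * answers "was this top-level xid still running, or not yet started, when we
 * took our snapshot?" with three checks: below xmin means finished, at or
 * above xmax means not yet started, and anything in between is concurrent
 * only if it appears in the in-progress array.  The standalone code below
 * mirrors those checks.  DemoSnapshot and the demo_* functions are
 * hypothetical, and the comparison assumes normal xids only.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced snapshot: just the fields the test needs. */
typedef struct
{
    uint32_t        xmin;       /* every xid before this had finished */
    uint32_t        xmax;       /* this xid and later had not started */
    const uint32_t *xip;        /* xids in progress within [xmin, xmax) */
    size_t          xcnt;       /* number of entries in xip */
} DemoSnapshot;

/* Wraparound-aware "a < b" for normal xids (simplified). */
bool
demo_xid_precedes(uint32_t a, uint32_t b)
{
    return (int32_t) (a - b) < 0;
}

/* True if xid overlaps the transaction that owns the snapshot. */
bool
demo_xid_is_concurrent(uint32_t xid, const DemoSnapshot *snap)
{
    assert(snap != NULL);

    /* Finished before the snapshot was taken: not concurrent. */
    if (demo_xid_precedes(xid, snap->xmin))
        return false;

    /* Started after the snapshot was taken: concurrent. */
    if (!demo_xid_precedes(xid, snap->xmax))
        return true;

    /* In between: concurrent only if it was still in progress. */
    for (size_t i = 0; i < snap->xcnt; i++)
    {
        if (snap->xip[i] == xid)
            return true;
    }
    return false;
}
```

/*
 * The real function uses pg_lfind32() for the final search and, as its
 * header comment notes, does not need to consult the snapshot's subxip
 * array because the caller passes a top-level xid.
 */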
    3918              : 
    3919              : bool
    3920     45881407 : CheckForSerializableConflictOutNeeded(Relation relation, Snapshot snapshot)
    3921              : {
    3922     45881407 :     if (!SerializationNeededForRead(relation, snapshot))
    3923     45872831 :         return false;
    3924              : 
    3925              :     /* Check if someone else has already decided that we need to die */
    3926         8576 :     if (SxactIsDoomed(MySerializableXact))
    3927              :     {
    3928            0 :         ereport(ERROR,
    3929              :                 (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
    3930              :                  errmsg("could not serialize access due to read/write dependencies among transactions"),
    3931              :                  errdetail_internal("Reason code: Canceled on identification as a pivot, during conflict out checking."),
    3932              :                  errhint("The transaction might succeed if retried.")));
    3933              :     }
    3934              : 
    3935         8576 :     return true;
    3936              : }
    3937              : 
    3938              : /*
    3939              :  * CheckForSerializableConflictOut
    3940              :  *      A table AM is reading a tuple that has been modified.  If it determines
    3941              :  *      that the tuple version it is reading is not visible to us, it should
    3942              :  *      pass in the top level xid of the transaction that created it.
    3943              :  *      Otherwise, if it determines that it is visible to us but it has been
    3944              :  *      deleted or there is a newer version available due to an update, it
    3945              :  *      should pass in the top level xid of the modifying transaction.
    3946              :  *
    3947              :  * This function will check for overlap with our own transaction.  If the given
    3948              :  * xid is also serializable and the transactions overlap (i.e., they cannot see
    3949              :  * each other's writes), then we have a conflict out.
    3950              :  */
    3951              : void
    3952          567 : CheckForSerializableConflictOut(Relation relation, TransactionId xid, Snapshot snapshot)
    3953              : {
    3954              :     SERIALIZABLEXIDTAG sxidtag;
    3955              :     SERIALIZABLEXID *sxid;
    3956              :     SERIALIZABLEXACT *sxact;
    3957              : 
    3958          567 :     if (!SerializationNeededForRead(relation, snapshot))
    3959          200 :         return;
    3960              : 
    3961              :     /* Check if someone else has already decided that we need to die */
    3962          567 :     if (SxactIsDoomed(MySerializableXact))
    3963              :     {
    3964            0 :         ereport(ERROR,
    3965              :                 (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
    3966              :                  errmsg("could not serialize access due to read/write dependencies among transactions"),
    3967              :                  errdetail_internal("Reason code: Canceled on identification as a pivot, during conflict out checking."),
    3968              :                  errhint("The transaction might succeed if retried.")));
    3969              :     }
    3970              :     Assert(TransactionIdIsValid(xid));
    3971              : 
    3972          567 :     if (TransactionIdEquals(xid, GetTopTransactionIdIfAny()))
    3973            0 :         return;
    3974              : 
    3975              :     /*
    3976              :      * Find sxact or summarized info for the top level xid.
    3977              :      */
    3978          567 :     sxidtag.xid = xid;
    3979          567 :     LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
    3980              :     sxid = (SERIALIZABLEXID *)
    3981          567 :         hash_search(SerializableXidHash, &sxidtag, HASH_FIND, NULL);
    3982          567 :     if (!sxid)
    3983              :     {
    3984              :         /*
    3985              :          * Transaction not found in "normal" SSI structures.  Check whether it
    3986              :          * got pushed out to SLRU storage for "old committed" transactions.
    3987              :          */
    3988              :         SerCommitSeqNo conflictCommitSeqNo;
    3989              : 
    3990           21 :         conflictCommitSeqNo = SerialGetMinConflictCommitSeqNo(xid);
    3991           21 :         if (conflictCommitSeqNo != 0)
    3992              :         {
    3993            0 :             if (conflictCommitSeqNo != InvalidSerCommitSeqNo
    3994            0 :                 && (!SxactIsReadOnly(MySerializableXact)
    3995            0 :                     || conflictCommitSeqNo
    3996            0 :                     <= MySerializableXact->SeqNo.lastCommitBeforeSnapshot))
    3997            0 :                 ereport(ERROR,
    3998              :                         (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
    3999              :                          errmsg("could not serialize access due to read/write dependencies among transactions"),
    4000              :                          errdetail_internal("Reason code: Canceled on conflict out to old pivot %u.", xid),
    4001              :                          errhint("The transaction might succeed if retried.")));
    4002              : 
    4003            0 :             if (SxactHasSummaryConflictIn(MySerializableXact)
    4004            0 :                 || !dlist_is_empty(&MySerializableXact->inConflicts))
    4005            0 :                 ereport(ERROR,
    4006              :                         (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
    4007              :                          errmsg("could not serialize access due to read/write dependencies among transactions"),
    4008              :                          errdetail_internal("Reason code: Canceled on identification as a pivot, with conflict out to old committed transaction %u.", xid),
    4009              :                          errhint("The transaction might succeed if retried.")));
    4010              : 
    4011            0 :             MySerializableXact->flags |= SXACT_FLAG_SUMMARY_CONFLICT_OUT;
    4012              :         }
    4013              : 
    4014              :         /* It's not serializable or otherwise not important. */
    4015           21 :         LWLockRelease(SerializableXactHashLock);
    4016           21 :         return;
    4017              :     }
    4018          546 :     sxact = sxid->myXact;
    4019              :     Assert(TransactionIdEquals(sxact->topXid, xid));
    4020          546 :     if (sxact == MySerializableXact || SxactIsDoomed(sxact))
    4021              :     {
    4022              :         /* Can't conflict with ourselves or a transaction that will roll back. */
    4023            4 :         LWLockRelease(SerializableXactHashLock);
    4024            4 :         return;
    4025              :     }
    4026              : 
    4027              :     /*
    4028              :      * We have a conflict out to a transaction which has a conflict out to a
    4029              :      * summarized transaction.  That summarized transaction must have
    4030              :      * committed first, and we can't tell when it committed in relation to our
    4031              :      * snapshot acquisition, so something needs to be canceled.
    4032              :      */
    4033          542 :     if (SxactHasSummaryConflictOut(sxact))
    4034              :     {
    4035            0 :         if (!SxactIsPrepared(sxact))
    4036              :         {
    4037            0 :             sxact->flags |= SXACT_FLAG_DOOMED;
    4038            0 :             LWLockRelease(SerializableXactHashLock);
    4039            0 :             return;
    4040              :         }
    4041              :         else
    4042              :         {
    4043            0 :             LWLockRelease(SerializableXactHashLock);
    4044            0 :             ereport(ERROR,
    4045              :                     (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
    4046              :                      errmsg("could not serialize access due to read/write dependencies among transactions"),
    4047              :                      errdetail_internal("Reason code: Canceled on conflict out to old pivot."),
    4048              :                      errhint("The transaction might succeed if retried.")));
    4049              :         }
    4050              :     }
    4051              : 
    4052              :     /*
    4053              :      * If this is a read-only transaction and the writing transaction has
    4054              :      * committed, and it doesn't have a rw-conflict to a transaction which
    4055              :      * committed before it, no conflict.
    4056              :      */
    4057          542 :     if (SxactIsReadOnly(MySerializableXact)
    4058          119 :         && SxactIsCommitted(sxact)
    4059            8 :         && !SxactHasSummaryConflictOut(sxact)
    4060            8 :         && (!SxactHasConflictOut(sxact)
    4061            2 :             || MySerializableXact->SeqNo.lastCommitBeforeSnapshot < sxact->SeqNo.earliestOutConflictCommit))
    4062              :     {
    4063              :         /* Read-only transaction will appear to run first.  No conflict. */
    4064            6 :         LWLockRelease(SerializableXactHashLock);
    4065            6 :         return;
    4066              :     }
    4067              : 
    4068          536 :     if (!XidIsConcurrent(xid))
    4069              :     {
    4070              :         /* This write was already in our snapshot; no conflict. */
    4071            0 :         LWLockRelease(SerializableXactHashLock);
    4072            0 :         return;
    4073              :     }
    4074              : 
    4075          536 :     if (RWConflictExists(MySerializableXact, sxact))
    4076              :     {
    4077              :         /* We don't want duplicate conflict records in the list. */
    4078          169 :         LWLockRelease(SerializableXactHashLock);
    4079          169 :         return;
    4080              :     }
    4081              : 
    4082              :     /*
    4083              :      * Flag the conflict.  But first, if this conflict creates a dangerous
    4084              :      * structure, ereport an error.
    4085              :      */
    4086          367 :     FlagRWConflict(MySerializableXact, sxact);
    4087          354 :     LWLockRelease(SerializableXactHashLock);
    4088              : }
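/*
 * Illustrative sketch only, not part of predicate.c.  One of the early exits
 * in CheckForSerializableConflictOut() lets a read-only transaction skip
 * recording a conflict when the writer has committed and has no conflict out
 * to a transaction that committed before the reader's snapshot.  The
 * standalone predicate below restates that condition.  DemoWriter and
 * demo_read_only_sees_no_conflict are hypothetical names, and commit
 * sequence numbers are modelled as plain 64-bit integers.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced view of the writing transaction's state. */
typedef struct
{
    bool        committed;
    bool        has_conflict_out;
    bool        has_summary_conflict_out;
    uint64_t    earliest_out_conflict_commit;
} DemoWriter;

/*
 * True when the reader can be treated as running before the writer, so no
 * rw-conflict needs to be recorded.
 */
bool
demo_read_only_sees_no_conflict(bool reader_is_read_only,
                                uint64_t last_commit_before_snapshot,
                                const DemoWriter *writer)
{
    assert(writer != NULL);

    return reader_is_read_only
        && writer->committed
        && !writer->has_summary_conflict_out
        && (!writer->has_conflict_out
            || last_commit_before_snapshot
               < writer->earliest_out_conflict_commit);
}
```

/*
 * When the predicate is false, the real function goes on to the overlap
 * test and the duplicate check before it flags the conflict.
 */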
    4089              : 
    4090              : /*
    4091              :  * Check a particular target for rw-dependency conflict in. A subroutine of
    4092              :  * CheckForSerializableConflictIn().
    4093              :  */
    4094              : static void
    4095         7616 : CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
    4096              : {
    4097              :     uint32      targettaghash;
    4098              :     LWLock     *partitionLock;
    4099              :     PREDICATELOCKTARGET *target;
    4100         7616 :     PREDICATELOCK *mypredlock = NULL;
    4101              :     PREDICATELOCKTAG mypredlocktag;
    4102              :     dlist_mutable_iter iter;
    4103              : 
    4104              :     Assert(MySerializableXact != InvalidSerializableXact);
    4105              : 
    4106              :     /*
    4107              :      * The same hash and LW lock apply to the lock target and the lock itself.
    4108              :      */
    4109         7616 :     targettaghash = PredicateLockTargetTagHashCode(targettag);
    4110         7616 :     partitionLock = PredicateLockHashPartitionLock(targettaghash);
    4111         7616 :     LWLockAcquire(partitionLock, LW_SHARED);
    4112              :     target = (PREDICATELOCKTARGET *)
    4113         7616 :         hash_search_with_hash_value(PredicateLockTargetHash,
    4114              :                                     targettag, targettaghash,
    4115              :                                     HASH_FIND, NULL);
    4116         7616 :     if (!target)
    4117              :     {
    4118              :         /* Nothing has this target locked; we're done here. */
    4119         5711 :         LWLockRelease(partitionLock);
    4120         5711 :         return;
    4121              :     }
    4122              : 
    4123              :     /*
    4124              :      * Each lock for an overlapping transaction represents a conflict: a
    4125              :      * rw-dependency in to this transaction.
    4126              :      */
    4127         1905 :     LWLockAcquire(SerializableXactHashLock, LW_SHARED);
    4128              : 
    4129         4290 :     dlist_foreach_modify(iter, &target->predicateLocks)
    4130              :     {
    4131         2452 :         PREDICATELOCK *predlock =
    4132         2452 :             dlist_container(PREDICATELOCK, targetLink, iter.cur);
    4133         2452 :         SERIALIZABLEXACT *sxact = predlock->tag.myXact;
    4134              : 
    4135         2452 :         if (sxact == MySerializableXact)
    4136              :         {
    4137              :             /*
    4138              :              * If we're getting a write lock on a tuple, we don't need a
    4139              :              * predicate (SIREAD) lock on the same tuple. We can safely remove
    4140              :              * our SIREAD lock, but we'll defer doing so until after the loop
    4141              :              * because that requires upgrading to an exclusive partition lock.
    4142              :              *
    4143              :              * We can't use this optimization within a subtransaction because
    4144              :              * the subtransaction could roll back, and we would be left
    4145              :              * without any lock at the top level.
    4146              :              */
    4147         1594 :             if (!IsSubTransaction()
    4148         1594 :                 && GET_PREDICATELOCKTARGETTAG_OFFSET(*targettag))
    4149              :             {
    4150          401 :                 mypredlock = predlock;
    4151          401 :                 mypredlocktag = predlock->tag;
    4152              :             }
    4153              :         }
    4154          858 :         else if (!SxactIsDoomed(sxact)
    4155          858 :                  && (!SxactIsCommitted(sxact)
    4156           88 :                      || TransactionIdPrecedes(GetTransactionSnapshot()->xmin,
    4157              :                                               sxact->finishedBefore))
    4158          849 :                  && !RWConflictExists(sxact, MySerializableXact))
    4159              :         {
    4160          505 :             LWLockRelease(SerializableXactHashLock);
    4161          505 :             LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
    4162              : 
    4163              :             /*
    4164              :              * Re-check after getting exclusive lock because the other
    4165              :              * transaction may have flagged a conflict.
    4166              :              */
    4167          505 :             if (!SxactIsDoomed(sxact)
    4168          505 :                 && (!SxactIsCommitted(sxact)
    4169           77 :                     || TransactionIdPrecedes(GetTransactionSnapshot()->xmin,
    4170              :                                              sxact->finishedBefore))
    4171          505 :                 && !RWConflictExists(sxact, MySerializableXact))
    4172              :             {
    4173          505 :                 FlagRWConflict(sxact, MySerializableXact);
    4174              :             }
    4175              : 
    4176          438 :             LWLockRelease(SerializableXactHashLock);
    4177          438 :             LWLockAcquire(SerializableXactHashLock, LW_SHARED);
    4178              :         }
    4179              :     }
    4180         1838 :     LWLockRelease(SerializableXactHashLock);
    4181         1838 :     LWLockRelease(partitionLock);
    4182              : 
    4183              :     /*
    4184              :      * If we found one of our own SIREAD locks to remove, remove it now.
    4185              :      *
    4186              :      * At this point our transaction already has a RowExclusiveLock on the
    4187              :      * relation, so we are OK to drop the predicate lock on the tuple, if
    4188              :      * found, without fearing that another write against the tuple will occur
    4189              :      * before the MVCC information makes it to the buffer.
    4190              :      */
    4191         1838 :     if (mypredlock != NULL)
    4192              :     {
    4193              :         uint32      predlockhashcode;
    4194              :         PREDICATELOCK *rmpredlock;
    4195              : 
    4196          394 :         LWLockAcquire(SerializablePredicateListLock, LW_SHARED);
    4197          394 :         if (IsInParallelMode())
    4198            0 :             LWLockAcquire(&MySerializableXact->perXactPredicateListLock, LW_EXCLUSIVE);
    4199          394 :         LWLockAcquire(partitionLock, LW_EXCLUSIVE);
    4200          394 :         LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
    4201              : 
    4202              :         /*
    4203              :          * Remove the predicate lock from shared memory, if it wasn't removed
    4204              :          * while the locks were released.  One way that could happen is from
    4205              :          * autovacuum cleaning up an index.
    4206              :          */
    4207          394 :         predlockhashcode = PredicateLockHashCodeFromTargetHashCode
    4208              :             (&mypredlocktag, targettaghash);
    4209              :         rmpredlock = (PREDICATELOCK *)
    4210          394 :             hash_search_with_hash_value(PredicateLockHash,
    4211              :                                         &mypredlocktag,
    4212              :                                         predlockhashcode,
    4213              :                                         HASH_FIND, NULL);
    4214          394 :         if (rmpredlock != NULL)
    4215              :         {
    4216              :             Assert(rmpredlock == mypredlock);
    4217              : 
    4218          394 :             dlist_delete(&(mypredlock->targetLink));
    4219          394 :             dlist_delete(&(mypredlock->xactLink));
    4220              : 
    4221              :             rmpredlock = (PREDICATELOCK *)
    4222          394 :                 hash_search_with_hash_value(PredicateLockHash,
    4223              :                                             &mypredlocktag,
    4224              :                                             predlockhashcode,
    4225              :                                             HASH_REMOVE, NULL);
    4226              :             Assert(rmpredlock == mypredlock);
    4227              : 
    4228          394 :             RemoveTargetIfNoLongerUsed(target, targettaghash);
    4229              :         }
    4230              : 
    4231          394 :         LWLockRelease(SerializableXactHashLock);
    4232          394 :         LWLockRelease(partitionLock);
    4233          394 :         if (IsInParallelMode())
    4234            0 :             LWLockRelease(&MySerializableXact->perXactPredicateListLock);
    4235          394 :         LWLockRelease(SerializablePredicateListLock);
    4236              : 
    4237          394 :         if (rmpredlock != NULL)
    4238              :         {
    4239              :             /*
    4240              :              * Remove entry in local lock table if it exists. It's OK if it
    4241              :              * doesn't exist; that means the lock was transferred to a new
    4242              :              * target by a different backend.
    4243              :              */
    4244          394 :             hash_search_with_hash_value(LocalPredicateLockHash,
    4245              :                                         targettag, targettaghash,
    4246              :                                         HASH_REMOVE, NULL);
    4247              : 
    4248          394 :             DecrementParentLocks(targettag);
    4249              :         }
    4250              :     }
    4251              : }
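
The shared-to-exclusive lock switch in the function above cannot happen atomically, which is why the listing repeats its conditions after reacquiring the lock. The following is a minimal single-threaded sketch of that "release, reacquire, re-check" pattern. Everything here (`ToyState`, `toy_release`, `toy_acquire`, `flag_conflict_once`, `other_backend_flags`) is a hypothetical stand-in and not PostgreSQL's LWLock API; the `on_gap` callback merely simulates another backend running while no lock is held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { LOCK_NONE, LOCK_SHARED, LOCK_EXCLUSIVE } ToyLockMode;

/* Hypothetical shared state guarded by a toy reader/writer lock. */
typedef struct ToyState
{
    ToyLockMode mode;               /* mode currently held by "us" */
    bool        conflict_flagged;   /* has the conflict been recorded? */
    int         flag_count;         /* how many times it was recorded */
    void      (*on_gap)(struct ToyState *); /* runs while the lock is free */
} ToyState;

static void
toy_release(ToyState *s)
{
    assert(s->mode != LOCK_NONE);
    s->mode = LOCK_NONE;
    if (s->on_gap != NULL)
        s->on_gap(s);               /* simulate a concurrent backend */
}

static void
toy_acquire(ToyState *s, ToyLockMode mode)
{
    assert(s->mode == LOCK_NONE);
    s->mode = mode;
}

/* Simulated concurrent backend: records the conflict if nobody has yet. */
static void
other_backend_flags(ToyState *s)
{
    if (!s->conflict_flagged)
    {
        s->conflict_flagged = true;
        s->flag_count++;
    }
}

/* Caller holds the lock in shared mode on entry and on return. */
static void
flag_conflict_once(ToyState *s)
{
    assert(s->mode == LOCK_SHARED);
    if (s->conflict_flagged)
        return;

    /* No atomic upgrade exists: drop shared, then take exclusive. */
    toy_release(s);
    toy_acquire(s, LOCK_EXCLUSIVE);

    /* The state may have changed in the gap, so test it again. */
    if (!s->conflict_flagged)
    {
        s->conflict_flagged = true;
        s->flag_count++;
    }

    toy_release(s);
    toy_acquire(s, LOCK_SHARED);
}
```

Without the second test, a conflict recorded by the simulated backend during the gap would be recorded twice.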
    4252              : 
    4253              : /*
    4254              :  * CheckForSerializableConflictIn
    4255              :  *      We are writing the given tuple.  If that indicates a rw-conflict
    4256              :  *      in from another serializable transaction, take appropriate action.
    4257              :  *
    4258              :  * Skip checking for any granularity for which a parameter is missing.
    4259              :  *
    4260              :  * A tuple update or delete is in conflict if we have a predicate lock
    4261              :  * against the relation or page in which the tuple exists, or against the
    4262              :  * tuple itself.
    4263              :  */
    4264              : void
    4265     24569114 : CheckForSerializableConflictIn(Relation relation, const ItemPointerData *tid, BlockNumber blkno)
    4266              : {
    4267              :     PREDICATELOCKTARGETTAG targettag;
    4268              : 
    4269     24569114 :     if (!SerializationNeededForWrite(relation))
    4270     24564608 :         return;
    4271              : 
    4272              :     /* Check if someone else has already decided that we need to die */
    4273         4506 :     if (SxactIsDoomed(MySerializableXact))
    4274            1 :         ereport(ERROR,
    4275              :                 (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
    4276              :                  errmsg("could not serialize access due to read/write dependencies among transactions"),
    4277              :                  errdetail_internal("Reason code: Canceled on identification as a pivot, during conflict in checking."),
    4278              :                  errhint("The transaction might succeed if retried.")));
    4279              : 
    4280              :     /*
    4281              :      * We're doing a write which might cause rw-conflicts now or later.
    4282              :      * Memorize that fact.
    4283              :      */
    4284         4505 :     MyXactDidWrite = true;
    4285              : 
    4286              :     /*
    4287              :      * It is important that we check for locks from the finest granularity to
    4288              :      * the coarsest granularity, so that granularity promotion doesn't cause
    4289              :      * us to miss a lock.  The new (coarser) lock will be acquired before the
    4290              :      * old (finer) locks are released.
    4291              :      *
    4292              :      * It is not possible to take and hold a lock across the checks for all
    4293              :      * granularities because each target could be in a separate partition.
    4294              :      */
    4295         4505 :     if (tid != NULL)
    4296              :     {
    4297          657 :         SET_PREDICATELOCKTARGETTAG_TUPLE(targettag,
    4298              :                                          relation->rd_locator.dbOid,
    4299              :                                          relation->rd_id,
    4300              :                                          ItemPointerGetBlockNumber(tid),
    4301              :                                          ItemPointerGetOffsetNumber(tid));
    4302          657 :         CheckTargetForConflictsIn(&targettag);
    4303              :     }
    4304              : 
    4305         4482 :     if (blkno != InvalidBlockNumber)
    4306              :     {
    4307         2507 :         SET_PREDICATELOCKTARGETTAG_PAGE(targettag,
    4308              :                                         relation->rd_locator.dbOid,
    4309              :                                         relation->rd_id,
    4310              :                                         blkno);
    4311         2507 :         CheckTargetForConflictsIn(&targettag);
    4312              :     }
    4313              : 
    4314         4452 :     SET_PREDICATELOCKTARGETTAG_RELATION(targettag,
    4315              :                                         relation->rd_locator.dbOid,
    4316              :                                         relation->rd_id);
    4317         4452 :     CheckTargetForConflictsIn(&targettag);
    4318              : }
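
The finest-to-coarsest ordering described in the comment above can be illustrated in isolation. This sketch only computes which granularities would be checked, and in what order, for a given call; it is an illustration under assumed names (`ToyTid`, `ToyGranularity`, `TOY_INVALID_BLOCK`, `toy_check_order`), not the real tag macros or lock lookups.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for InvalidBlockNumber. */
#define TOY_INVALID_BLOCK UINT32_MAX

typedef enum { TOY_TUPLE, TOY_PAGE, TOY_RELATION } ToyGranularity;

/* Hypothetical stand-in for an item pointer (block + offset). */
typedef struct ToyTid
{
    uint32_t    block;
    uint16_t    offset;
} ToyTid;

/*
 * Fill "out" with the granularities to check, finest first, skipping any
 * level whose parameter is missing.  Returns the number of entries written.
 * The relation level is always checked, and always last.
 */
static size_t
toy_check_order(const ToyTid *tid, uint32_t blkno, ToyGranularity out[3])
{
    size_t      n = 0;

    assert(out != NULL);

    if (tid != NULL)
        out[n++] = TOY_TUPLE;
    if (blkno != TOY_INVALID_BLOCK)
        out[n++] = TOY_PAGE;
    out[n++] = TOY_RELATION;

    return n;
}
```

Checking in this order matters because, per the comment, promotion acquires the coarser lock before releasing the finer ones, so a scan that moves from fine to coarse sees at least one of them.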
    4319              : 
    4320              : /*
    4321              :  * CheckTableForSerializableConflictIn
    4322              :  *      The entire table is going through a DDL-style logical mass delete
    4323              :  *      like TRUNCATE or DROP TABLE.  If that causes a rw-conflict in from
    4324              :  *      another serializable transaction, take appropriate action.
    4325              :  *
    4326              :  * While these operations do not operate entirely within the bounds of
    4327              :  * snapshot isolation, they can occur inside a serializable transaction, and
    4328              :  * will logically occur after any reads which saw rows which were destroyed
    4329              :  * by these operations, so we do what we can to serialize properly under
    4330              :  * SSI.
    4331              :  *
    4332              :  * The relation passed in must be a heap relation. Any predicate lock of any
    4333              :  * granularity on the heap will cause a rw-conflict in to this transaction.
    4334              :  * Predicate locks on indexes do not matter because they only exist to guard
    4335              :  * against conflicting inserts into the index, and this is a mass *delete*.
    4336              :  * When a table is truncated or dropped, the index will also be truncated
    4337              :  * or dropped, and we'll deal with locks on the index when that happens.
    4338              :  *
    4339              :  * Dropping or truncating a table also needs to drop any existing predicate
    4340              :  * locks on heap tuples or pages, because they're about to go away. This
    4341              :  * should be done before altering the predicate locks because the transaction
    4342              :  * could be rolled back because of a conflict, in which case the lock changes
    4343              :  * are not needed. (At the moment, we don't actually bother to drop the
    4344              :  * existing locks on a dropped or truncated table. That might lead to
    4345              :  * some false positives, but it doesn't seem worth the trouble.)
    4346              :  */
    4347              : void
    4348        35333 : CheckTableForSerializableConflictIn(Relation relation)
    4349              : {
    4350              :     HASH_SEQ_STATUS seqstat;
    4351              :     PREDICATELOCKTARGET *target;
    4352              :     Oid         dbId;
    4353              :     Oid         heapId;
    4354              :     int         i;
    4355              : 
    4356              :     /*
    4357              :      * Bail out quickly if there are no serializable transactions running.
    4358              :      * It's safe to check this without taking locks because the caller is
    4359              :      * holding an ACCESS EXCLUSIVE lock on the relation.  No new locks which
    4360              :      * would matter here can be acquired while that is held.
    4361              :      */
    4362        35333 :     if (!TransactionIdIsValid(PredXact->SxactGlobalXmin))
    4363        35313 :         return;
    4364              : 
    4365          133 :     if (!SerializationNeededForWrite(relation))
    4366          113 :         return;
    4367              : 
    4368              :     /*
    4369              :      * We're doing a write which might cause rw-conflicts now or later.
    4370              :      * Memorize that fact.
    4371              :      */
    4372           20 :     MyXactDidWrite = true;
    4373              : 
    4374              :     Assert(relation->rd_index == NULL); /* not an index relation */
    4375              : 
    4376           20 :     dbId = relation->rd_locator.dbOid;
    4377           20 :     heapId = relation->rd_id;
    4378              : 
    4379           20 :     LWLockAcquire(SerializablePredicateListLock, LW_EXCLUSIVE);
    4380          340 :     for (i = 0; i < NUM_PREDICATELOCK_PARTITIONS; i++)
    4381          320 :         LWLockAcquire(PredicateLockHashPartitionLockByIndex(i), LW_SHARED);
    4382           20 :     LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
    4383              : 
    4384              :     /* Scan through target list */
    4385           20 :     hash_seq_init(&seqstat, PredicateLockTargetHash);
    4386              : 
    4387           70 :     while ((target = (PREDICATELOCKTARGET *) hash_seq_search(&seqstat)))
    4388              :     {
    4389              :         dlist_mutable_iter iter;
    4390              : 
    4391              :         /*
    4392              :          * Check whether this is a target which needs attention.
    4393              :          */
    4394           50 :         if (GET_PREDICATELOCKTARGETTAG_RELATION(target->tag) != heapId)
    4395           41 :             continue;           /* wrong relation id */
    4396            9 :         if (GET_PREDICATELOCKTARGETTAG_DB(target->tag) != dbId)
    4397            0 :             continue;           /* wrong database id */
    4398              : 
    4399              :         /*
    4400              :          * Loop through locks for this target and flag conflicts.
    4401              :          */
    4402           18 :         dlist_foreach_modify(iter, &target->predicateLocks)
    4403              :         {
    4404            9 :             PREDICATELOCK *predlock =
    4405            9 :                 dlist_container(PREDICATELOCK, targetLink, iter.cur);
    4406              : 
    4407            9 :             if (predlock->tag.myXact != MySerializableXact
    4408            0 :                 && !RWConflictExists(predlock->tag.myXact, MySerializableXact))
    4409              :             {
    4410            0 :                 FlagRWConflict(predlock->tag.myXact, MySerializableXact);
    4411              :             }
    4412              :         }
    4413              :     }
    4414              : 
    4415              :     /* Release locks in reverse order */
    4416           20 :     LWLockRelease(SerializableXactHashLock);
    4417          340 :     for (i = NUM_PREDICATELOCK_PARTITIONS - 1; i >= 0; i--)
    4418          320 :         LWLockRelease(PredicateLockHashPartitionLockByIndex(i));
    4419           20 :     LWLockRelease(SerializablePredicateListLock);
    4420              : }
    4421              : 
    4422              : 
    4423              : /*
    4424              :  * Flag a rw-dependency between two serializable transactions.
    4425              :  *
    4426              :  * The caller is responsible for ensuring that we have a LW lock on
    4427              :  * the transaction hash table.
    4428              :  */
    4429              : static void
    4430          872 : FlagRWConflict(SERIALIZABLEXACT *reader, SERIALIZABLEXACT *writer)
    4431              : {
    4432              :     Assert(reader != writer);
    4433              : 
    4434              :     /* First, see if this conflict causes failure. */
    4435          872 :     OnConflict_CheckForSerializationFailure(reader, writer);
    4436              : 
    4437              :     /* Actually do the conflict flagging. */
    4438          792 :     if (reader == OldCommittedSxact)
    4439            0 :         writer->flags |= SXACT_FLAG_SUMMARY_CONFLICT_IN;
    4440          792 :     else if (writer == OldCommittedSxact)
    4441            0 :         reader->flags |= SXACT_FLAG_SUMMARY_CONFLICT_OUT;
    4442              :     else
    4443          792 :         SetRWConflict(reader, writer);
    4444          792 : }
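
The three-way branch in FlagRWConflict can be read as follows: a conflict involving the summarized "old committed" transaction is recorded as a flag bit on the other party, and anything else becomes an explicit edge. Below is a rough sketch of that bookkeeping under assumed names (`ToySxact`, `TOY_SUMMARY_CONFLICT_IN`, `TOY_SUMMARY_CONFLICT_OUT`, `toy_record_rw_conflict`); the edge counters are a simplification standing in for the real conflict lists, and the serialization-failure check that precedes the flagging is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits mirroring the summary-conflict flags. */
#define TOY_SUMMARY_CONFLICT_IN   (1u << 0)
#define TOY_SUMMARY_CONFLICT_OUT  (1u << 1)

typedef struct ToySxact
{
    uint32_t    flags;          /* summary conflict bits */
    int         explicit_in;    /* count of explicit rw-edges pointing in */
    int         explicit_out;   /* count of explicit rw-edges pointing out */
} ToySxact;

/*
 * Record reader --rw--> writer.  "summary" is the placeholder transaction
 * that stands for all summarized, already-committed transactions.
 */
static void
toy_record_rw_conflict(ToySxact *reader, ToySxact *writer, ToySxact *summary)
{
    assert(reader != writer);

    if (reader == summary)
        writer->flags |= TOY_SUMMARY_CONFLICT_IN;
    else if (writer == summary)
        reader->flags |= TOY_SUMMARY_CONFLICT_OUT;
    else
    {
        /* Both sides are tracked individually: keep an explicit edge. */
        reader->explicit_out++;
        writer->explicit_in++;
    }
}
```

Note that the placeholder itself is never modified; only the live transaction on the other end of the edge carries the information.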
    4445              : 
    4446              : /*----------------------------------------------------------------------------
    4447              :  * We are about to add a RW-edge to the dependency graph - check that we don't
    4448              :  * introduce a dangerous structure by doing so, and abort one of the
    4449              :  * transactions if so.
    4450              :  *
    4451              :  * A serialization failure can only occur if there is a dangerous structure
    4452              :  * in the dependency graph:
    4453              :  *
    4454              :  *      Tin ------> Tpivot ------> Tout
    4455              :  *            rw             rw
    4456              :  *
    4457              :  * Furthermore, Tout must commit first.
    4458              :  *
    4459              :  * One more optimization is that if Tin is declared READ ONLY (or commits
    4460              :  * without writing), we can only have a problem if Tout committed before Tin
    4461              :  * acquired its snapshot.
    4462              :  *----------------------------------------------------------------------------
    4463              :  */
    4464              : static void
    4465          872 : OnConflict_CheckForSerializationFailure(const SERIALIZABLEXACT *reader,
    4466              :                                         SERIALIZABLEXACT *writer)
    4467              : {
    4468              :     bool        failure;
    4469              : 
    4470              :     Assert(LWLockHeldByMe(SerializableXactHashLock));
    4471              : 
    4472          872 :     failure = false;
    4473              : 
    4474              :     /*------------------------------------------------------------------------
    4475              :      * Check for already-committed writer with rw-conflict out flagged
    4476              :      * (conflict-flag on W means that T2 committed before W):
    4477              :      *
    4478              :      *      R ------> W ------> T2
    4479              :      *          rw        rw
    4480              :      *
    4481              :      * That is a dangerous structure, so we must abort. (Since the writer
    4482              :      * has already committed, we must be the reader.)
    4483              :      *------------------------------------------------------------------------
    4484              :      */
    4485          872 :     if (SxactIsCommitted(writer)
    4486           18 :         && (SxactHasConflictOut(writer) || SxactHasSummaryConflictOut(writer)))
    4487            2 :         failure = true;
    4488              : 
    4489              :     /*------------------------------------------------------------------------
    4490              :      * Check whether the writer has become a pivot with an out-conflict
    4491              :      * committed transaction (T2), and T2 committed first:
    4492              :      *
    4493              :      *      R ------> W ------> T2
    4494              :      *          rw        rw
    4495              :      *
    4496              :      * Because T2 must've committed first, there is no anomaly if:
    4497              :      * - the reader committed before T2
    4498              :      * - the writer committed before T2
    4499              :      * - the reader is a READ ONLY transaction and the reader was concurrent
    4500              :      *   with T2 (= reader acquired its snapshot before T2 committed)
    4501              :      *
    4502              :      * We also handle the case that T2 is prepared but not yet committed
    4503              :      * here. In that case T2 has already checked for conflicts, so if it
    4504              :      * commits first, making the above conflict real, it's too late for it
    4505              :      * to abort.
    4506              :      *------------------------------------------------------------------------
    4507              :      */
    4508          872 :     if (!failure && SxactHasSummaryConflictOut(writer))
    4509            0 :         failure = true;
    4510          872 :     else if (!failure)
    4511              :     {
    4512              :         dlist_iter  iter;
    4513              : 
    4514         1087 :         dlist_foreach(iter, &writer->outConflicts)
    4515              :         {
    4516          292 :             RWConflict  conflict =
    4517              :                 dlist_container(RWConflictData, outLink, iter.cur);
    4518          292 :             SERIALIZABLEXACT *t2 = conflict->sxactIn;
    4519              : 
    4520          292 :             if (SxactIsPrepared(t2)
    4521           81 :                 && (!SxactIsCommitted(reader)
    4522           65 :                     || t2->prepareSeqNo <= reader->commitSeqNo)
    4523           81 :                 && (!SxactIsCommitted(writer)
    4524            0 :                     || t2->prepareSeqNo <= writer->commitSeqNo)
    4525           81 :                 && (!SxactIsReadOnly(reader)
    4526           12 :                     || t2->prepareSeqNo <= reader->SeqNo.lastCommitBeforeSnapshot))
    4527              :             {
    4528           75 :                 failure = true;
    4529           75 :                 break;
    4530              :             }
    4531              :         }
    4532              :     }
    4533              : 
    4534              :     /*------------------------------------------------------------------------
    4535              :      * Check whether the reader has become a pivot with a writer
    4536              :      * that's committed (or prepared):
    4537              :      *
    4538              :      *      T0 ------> R ------> W
    4539              :      *           rw        rw
    4540              :      *
    4541              :      * Because W must've committed first for an anomaly to occur, there is no
    4542              :      * anomaly if:
    4543              :      * - T0 committed before the writer
    4544              :      * - T0 is READ ONLY, and overlaps the writer
    4545              :      *------------------------------------------------------------------------
    4546              :      */
    4547          872 :     if (!failure && SxactIsPrepared(writer) && !SxactIsReadOnly(reader))
    4548              :     {
    4549           18 :         if (SxactHasSummaryConflictIn(reader))
    4550              :         {
    4551            0 :             failure = true;
    4552              :         }
    4553              :         else
    4554              :         {
    4555              :             dlist_iter  iter;
    4556              : 
    4557              :             /*
    4558              :              * The unconstify is needed as we have no const version of
    4559              :              * dlist_foreach().
    4560              :              */
    4561           18 :             dlist_foreach(iter, &unconstify(SERIALIZABLEXACT *, reader)->inConflicts)
    4562              :             {
    4563           11 :                 const RWConflict conflict =
    4564           11 :                     dlist_container(RWConflictData, inLink, iter.cur);
    4565           11 :                 const SERIALIZABLEXACT *t0 = conflict->sxactOut;
    4566              : 
    4567           11 :                 if (!SxactIsDoomed(t0)
    4568           11 :                     && (!SxactIsCommitted(t0)
    4569           11 :                         || t0->commitSeqNo >= writer->prepareSeqNo)
    4570           11 :                     && (!SxactIsReadOnly(t0)
    4571            0 :                         || t0->SeqNo.lastCommitBeforeSnapshot >= writer->prepareSeqNo))
    4572              :                 {
    4573           11 :                     failure = true;
    4574           11 :                     break;
    4575              :                 }
    4576              :             }
    4577              :         }
    4578              :     }
    4579              : 
    4580          872 :     if (failure)
    4581              :     {
    4582              :         /*
    4583              :          * We have to kill a transaction to avoid a possible anomaly from
    4584              :          * occurring. If the writer is us, we can just ereport() to cause a
    4585              :          * transaction abort. Otherwise we flag the writer for termination,
    4586              :          * causing it to abort when it tries to commit. However, if the writer
    4587              :          * has already been prepared, we can't abort it anymore, so we have
    4588              :          * to kill the reader instead.
    4589              :          */
    4590           88 :         if (MySerializableXact == writer)
    4591              :         {
    4592           67 :             LWLockRelease(SerializableXactHashLock);
    4593           67 :             ereport(ERROR,
    4594              :                     (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
    4595              :                      errmsg("could not serialize access due to read/write dependencies among transactions"),
    4596              :                      errdetail_internal("Reason code: Canceled on identification as a pivot, during write."),
    4597              :                      errhint("The transaction might succeed if retried.")));
    4598              :         }
    4599           21 :         else if (SxactIsPrepared(writer))
    4600              :         {
    4601           13 :             LWLockRelease(SerializableXactHashLock);
    4602              : 
    4603              :             /* if we're not the writer, we have to be the reader */
    4604              :             Assert(MySerializableXact == reader);
    4605           13 :             ereport(ERROR,
    4606              :                     (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
    4607              :                      errmsg("could not serialize access due to read/write dependencies among transactions"),
    4608              :                      errdetail_internal("Reason code: Canceled on conflict out to pivot %u, during read.", writer->topXid),
    4609              :                      errhint("The transaction might succeed if retried.")));
    4610              :         }
    4611            8 :         writer->flags |= SXACT_FLAG_DOOMED;
    4612              :     }
    4613          792 : }
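
The second check in the function above (writer as pivot, with a prepared or committed T2 on its out side) reduces to a predicate over sequence numbers. The sketch below restates that condition as a pure function so the three "no anomaly" exemptions can be tried out by hand. The struct and function names (`ToyXact`, `toy_writer_is_dangerous_pivot`) are assumptions made for illustration; only the shape of the comparison is taken from the listing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, flattened view of the fields the check consults. */
typedef struct ToyXact
{
    bool        prepared;       /* has reached (at least) prepare */
    bool        committed;
    bool        read_only;
    uint64_t    prepare_seq;    /* meaningful only if prepared */
    uint64_t    commit_seq;     /* meaningful only if committed */
    uint64_t    last_commit_before_snapshot;
} ToyXact;

/*
 * For the structure  reader --rw--> writer --rw--> t2,  report whether the
 * new reader->writer edge is dangerous.  T2 must have prepared, and none of
 * these exemptions may apply:
 *   - the reader committed before T2 prepared
 *   - the writer committed before T2 prepared
 *   - the reader is read-only and took its snapshot before T2 prepared
 */
static bool
toy_writer_is_dangerous_pivot(const ToyXact *reader,
                              const ToyXact *writer,
                              const ToyXact *t2)
{
    assert(reader != writer);

    return t2->prepared
        && (!reader->committed
            || t2->prepare_seq <= reader->commit_seq)
        && (!writer->committed
            || t2->prepare_seq <= writer->commit_seq)
        && (!reader->read_only
            || t2->prepare_seq <= reader->last_commit_before_snapshot);
}
```

For example, with T2 prepared at sequence number 5, a read-only reader whose snapshot saw commits only up to 4 is exempt, while one whose snapshot saw commits up to 6 is not.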
    4614              : 
    4615              : /*
    4616              :  * PreCommit_CheckForSerializationFailure
    4617              :  *      Check for dangerous structures in a serializable transaction
    4618              :  *      at commit.
    4619              :  *
    4620              :  * We're checking for a dangerous structure as each conflict is recorded.
    4621              :  * The only way we could have a problem at commit is if this is the "out"
    4622              :  * side of a pivot, and neither the "in" side nor the pivot has yet
    4623              :  * committed.
    4624              :  *
    4625              :  * If a dangerous structure is found, the pivot (the near conflict) is
    4626              :  * marked for death, because rolling back another transaction might mean
    4627              :  * that we fail without ever making progress.  This transaction is
    4628              :  * committing writes, so letting it commit ensures progress.  If we
    4629              :  * canceled the far conflict, it might immediately fail again on retry.
    4630              :  */
    4631              : void
    4632       592094 : PreCommit_CheckForSerializationFailure(void)
    4633              : {
    4634              :     dlist_iter  near_iter;
    4635              : 
    4636       592094 :     if (MySerializableXact == InvalidSerializableXact)
    4637       590682 :         return;
    4638              : 
    4639              :     Assert(IsolationIsSerializable());
    4640              : 
    4641         1412 :     LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
    4642              : 
    4643              :     /*
    4644              :      * Check if someone else has already decided that we need to die.  Since
    4645              :      * we set our own DOOMED flag when partially releasing, ignore in that
    4646              :      * case.
    4647              :      */
    4648         1412 :     if (SxactIsDoomed(MySerializableXact) &&
    4649          156 :         !SxactIsPartiallyReleased(MySerializableXact))
    4650              :     {
    4651          155 :         LWLockRelease(SerializableXactHashLock);
    4652          155 :         ereport(ERROR,
    4653              :                 (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
    4654              :                  errmsg("could not serialize access due to read/write dependencies among transactions"),
    4655              :                  errdetail_internal("Reason code: Canceled on identification as a pivot, during commit attempt."),
    4656              :                  errhint("The transaction might succeed if retried.")));
    4657              :     }
    4658              : 
    4659         1863 :     dlist_foreach(near_iter, &MySerializableXact->inConflicts)
    4660              :     {
    4661          606 :         RWConflict  nearConflict =
    4662          606 :             dlist_container(RWConflictData, inLink, near_iter.cur);
    4663              : 
    4664          606 :         if (!SxactIsCommitted(nearConflict->sxactOut)
    4665          421 :             && !SxactIsDoomed(nearConflict->sxactOut))
    4666              :         {
    4667              :             dlist_iter  far_iter;
    4668              : 
    4669          451 :             dlist_foreach(far_iter, &nearConflict->sxactOut->inConflicts)
    4670              :             {
    4671          182 :                 RWConflict  farConflict =
    4672          182 :                     dlist_container(RWConflictData, inLink, far_iter.cur);
    4673              : 
    4674          182 :                 if (farConflict->sxactOut == MySerializableXact
    4675           42 :                     || (!SxactIsCommitted(farConflict->sxactOut)
    4676           24 :                         && !SxactIsReadOnly(farConflict->sxactOut)
    4677           12 :                         && !SxactIsDoomed(farConflict->sxactOut)))
    4678              :                 {
    4679              :                     /*
    4680              :                      * Normally, we kill the pivot transaction to make sure we
    4681              :                      * make progress if the failing transaction is retried.
    4682              :                      * However, we can't kill it if it's already prepared, so
    4683              :                      * in that case we fail our own transaction instead.
    4684              :                      */
    4685          152 :                     if (SxactIsPrepared(nearConflict->sxactOut))
    4686              :                     {
    4687            0 :                         LWLockRelease(SerializableXactHashLock);
    4688            0 :                         ereport(ERROR,
    4689              :                                 (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
    4690              :                                  errmsg("could not serialize access due to read/write dependencies among transactions"),
    4691              :                                  errdetail_internal("Reason code: Canceled on commit attempt with conflict in from prepared pivot."),
    4692              :                                  errhint("The transaction might succeed if retried.")));
    4693              :                     }
    4694          152 :                     nearConflict->sxactOut->flags |= SXACT_FLAG_DOOMED;
    4695          152 :                     break;
    4696              :                 }
    4697              :             }
    4698              :         }
    4699              :     }
    4700              : 
    4701         1257 :     MySerializableXact->prepareSeqNo = ++(PredXact->LastSxactCommitSeqNo);
    4702         1257 :     MySerializableXact->flags |= SXACT_FLAG_PREPARED;
    4703              : 
    4704         1257 :     LWLockRelease(SerializableXactHashLock);
    4705              : }
    4706              : 
    4707              : /*------------------------------------------------------------------------*/
    4708              : 
    4709              : /*
    4710              :  * Two-phase commit support
    4711              :  */
    4712              : 
    4713              : /*
    4714              :  * AtPrepare_PredicateLocks
    4715              :  *      Do the preparatory work for a PREPARE: make 2PC state file
    4716              :  *      records for all predicate locks currently held.
    4717              :  */
    4718              : void
    4719          320 : AtPrepare_PredicateLocks(void)
    4720              : {
    4721              :     SERIALIZABLEXACT *sxact;
    4722              :     TwoPhasePredicateRecord record;
    4723              :     TwoPhasePredicateXactRecord *xactRecord;
    4724              :     TwoPhasePredicateLockRecord *lockRecord;
    4725              :     dlist_iter  iter;
    4726              : 
    4727          320 :     sxact = MySerializableXact;
    4728          320 :     xactRecord = &(record.data.xactRecord);
    4729          320 :     lockRecord = &(record.data.lockRecord);
    4730              : 
    4731          320 :     if (MySerializableXact == InvalidSerializableXact)
    4732          308 :         return;
    4733              : 
    4734              :     /* Generate an xact record for our SERIALIZABLEXACT */
    4735           12 :     record.type = TWOPHASEPREDICATERECORD_XACT;
    4736           12 :     xactRecord->xmin = MySerializableXact->xmin;
    4737           12 :     xactRecord->flags = MySerializableXact->flags;
    4738              : 
    4739              :     /*
    4740              :      * Note that we don't include the list of conflicts in or out in the
    4741              :      * statefile, because new conflicts can be added even after the
    4742              :      * transaction prepares. We'll just make a conservative assumption during
    4743              :      * recovery instead.
    4744              :      */
    4745              : 
    4746           12 :     RegisterTwoPhaseRecord(TWOPHASE_RM_PREDICATELOCK_ID, 0,
    4747              :                            &record, sizeof(record));
    4748              : 
    4749              :     /*
    4750              :      * Generate a lock record for each lock.
    4751              :      *
    4752              :      * To do this, we need to walk the predicate lock list in our sxact rather
    4753              :      * than using the local predicate lock table because the latter is not
    4754              :      * guaranteed to be accurate.
    4755              :      */
    4756           12 :     LWLockAcquire(SerializablePredicateListLock, LW_SHARED);
    4757              : 
    4758              :     /*
    4759              :      * No need to take sxact->perXactPredicateListLock in parallel mode
    4760              :      * because there cannot be any parallel workers running while we are
    4761              :      * preparing a transaction.
    4762              :      */
    4763              :     Assert(!IsParallelWorker() && !ParallelContextActive());
    4764              : 
    4765           22 :     dlist_foreach(iter, &sxact->predicateLocks)
    4766              :     {
    4767           10 :         PREDICATELOCK *predlock =
    4768           10 :             dlist_container(PREDICATELOCK, xactLink, iter.cur);
    4769              : 
    4770           10 :         record.type = TWOPHASEPREDICATERECORD_LOCK;
    4771           10 :         lockRecord->target = predlock->tag.myTarget->tag;
    4772              : 
    4773           10 :         RegisterTwoPhaseRecord(TWOPHASE_RM_PREDICATELOCK_ID, 0,
    4774              :                                &record, sizeof(record));
    4775              :     }
    4776              : 
    4777           12 :     LWLockRelease(SerializablePredicateListLock);
    4778              : }
    4779              : 
    4780              : /*
    4781              :  * PostPrepare_PredicateLocks
    4782              :  *      Clean up after successful PREPARE. Unlike the non-predicate
    4783              :  *      lock manager, we do not need to transfer locks to a dummy
    4784              :  *      PGPROC because our SERIALIZABLEXACT will stay around
    4785              :  *      anyway. We only need to clean up our local state.
    4786              :  */
    4787              : void
    4788          320 : PostPrepare_PredicateLocks(FullTransactionId fxid)
    4789              : {
    4790          320 :     if (MySerializableXact == InvalidSerializableXact)
    4791          308 :         return;
    4792              : 
    4793              :     Assert(SxactIsPrepared(MySerializableXact));
    4794              : 
    4795           12 :     MySerializableXact->pid = 0;
    4796           12 :     MySerializableXact->pgprocno = INVALID_PROC_NUMBER;
    4797              : 
    4798           12 :     hash_destroy(LocalPredicateLockHash);
    4799           12 :     LocalPredicateLockHash = NULL;
    4800              : 
    4801           12 :     MySerializableXact = InvalidSerializableXact;
    4802           12 :     MyXactDidWrite = false;
    4803              : }
    4804              : 
    4805              : /*
    4806              :  * PredicateLockTwoPhaseFinish
    4807              :  *      Release a prepared transaction's predicate locks once it
    4808              :  *      commits or aborts.
    4809              :  */
    4810              : void
    4811          327 : PredicateLockTwoPhaseFinish(FullTransactionId fxid, bool isCommit)
    4812              : {
    4813              :     SERIALIZABLEXID *sxid;
    4814              :     SERIALIZABLEXIDTAG sxidtag;
    4815              : 
    4816          327 :     sxidtag.xid = XidFromFullTransactionId(fxid);
    4817              : 
    4818          327 :     LWLockAcquire(SerializableXactHashLock, LW_SHARED);
    4819              :     sxid = (SERIALIZABLEXID *)
    4820          327 :         hash_search(SerializableXidHash, &sxidtag, HASH_FIND, NULL);
    4821          327 :     LWLockRelease(SerializableXactHashLock);
    4822              : 
    4823              :     /* xid will not be found if it wasn't a serializable transaction */
    4824          327 :     if (sxid == NULL)
    4825          315 :         return;
    4826              : 
    4827              :     /* Release its locks */
    4828           12 :     MySerializableXact = sxid->myXact;
    4829           12 :     MyXactDidWrite = true;      /* conservatively assume that we wrote
    4830              :                                  * something */
    4831           12 :     ReleasePredicateLocks(isCommit, false);
    4832              : }
    4833              : 
    4834              : /*
    4835              :  * Re-acquire a predicate lock belonging to a transaction that was prepared.
    4836              :  */
    4837              : void
    4838            0 : predicatelock_twophase_recover(FullTransactionId fxid, uint16 info,
    4839              :                                void *recdata, uint32 len)
    4840              : {
    4841              :     TwoPhasePredicateRecord *record;
    4842            0 :     TransactionId xid = XidFromFullTransactionId(fxid);
    4843              : 
    4844              :     Assert(len == sizeof(TwoPhasePredicateRecord));
    4845              : 
    4846            0 :     record = (TwoPhasePredicateRecord *) recdata;
    4847              : 
    4848              :     Assert((record->type == TWOPHASEPREDICATERECORD_XACT) ||
    4849              :            (record->type == TWOPHASEPREDICATERECORD_LOCK));
    4850              : 
    4851            0 :     if (record->type == TWOPHASEPREDICATERECORD_XACT)
    4852              :     {
    4853              :         /* Per-transaction record. Set up a SERIALIZABLEXACT. */
    4854              :         TwoPhasePredicateXactRecord *xactRecord;
    4855              :         SERIALIZABLEXACT *sxact;
    4856              :         SERIALIZABLEXID *sxid;
    4857              :         SERIALIZABLEXIDTAG sxidtag;
    4858              :         bool        found;
    4859              : 
    4860            0 :         xactRecord = (TwoPhasePredicateXactRecord *) &record->data.xactRecord;
    4861              : 
    4862            0 :         LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
    4863            0 :         sxact = CreatePredXact();
    4864            0 :         if (!sxact)
    4865            0 :             ereport(ERROR,
    4866              :                     (errcode(ERRCODE_OUT_OF_MEMORY),
    4867              :                      errmsg("out of shared memory")));
    4868              : 
    4869              :         /* vxid for a prepared xact is INVALID_PROC_NUMBER/xid; no pid */
    4870            0 :         sxact->vxid.procNumber = INVALID_PROC_NUMBER;
    4871            0 :         sxact->vxid.localTransactionId = (LocalTransactionId) xid;
    4872            0 :         sxact->pid = 0;
    4873            0 :         sxact->pgprocno = INVALID_PROC_NUMBER;
    4874              : 
    4875              :         /* a prepared xact hasn't committed yet */
    4876            0 :         sxact->prepareSeqNo = RecoverySerCommitSeqNo;
    4877            0 :         sxact->commitSeqNo = InvalidSerCommitSeqNo;
    4878            0 :         sxact->finishedBefore = InvalidTransactionId;
    4879              : 
    4880            0 :         sxact->SeqNo.lastCommitBeforeSnapshot = RecoverySerCommitSeqNo;
    4881              : 
    4882              :         /*
    4883              :          * Don't need to track this; no transactions running at the time the
    4884              :          * recovered xact started are still active, except possibly other
    4885              :          * prepared xacts and we don't care whether those are RO_SAFE or not.
    4886              :          */
    4887            0 :         dlist_init(&(sxact->possibleUnsafeConflicts));
    4888              : 
    4889            0 :         dlist_init(&(sxact->predicateLocks));
    4890            0 :         dlist_node_init(&sxact->finishedLink);
    4891              : 
    4892            0 :         sxact->topXid = xid;
    4893            0 :         sxact->xmin = xactRecord->xmin;
    4894            0 :         sxact->flags = xactRecord->flags;
    4895              :         Assert(SxactIsPrepared(sxact));
    4896            0 :         if (!SxactIsReadOnly(sxact))
    4897              :         {
    4898            0 :             ++(PredXact->WritableSxactCount);
    4899              :             Assert(PredXact->WritableSxactCount <=
    4900              :                    (MaxBackends + max_prepared_xacts));
    4901              :         }
    4902              : 
    4903              :         /*
    4904              :          * We don't know whether the transaction had any conflicts or not, so
    4905              :          * we'll conservatively assume that it had both a conflict in and a
    4906              :          * conflict out, and represent that with the summary conflict flags.
    4907              :          */
    4908            0 :         dlist_init(&(sxact->outConflicts));
    4909            0 :         dlist_init(&(sxact->inConflicts));
    4910            0 :         sxact->flags |= SXACT_FLAG_SUMMARY_CONFLICT_IN;
    4911            0 :         sxact->flags |= SXACT_FLAG_SUMMARY_CONFLICT_OUT;
    4912              : 
    4913              :         /* Register the transaction's xid */
    4914            0 :         sxidtag.xid = xid;
    4915            0 :         sxid = (SERIALIZABLEXID *) hash_search(SerializableXidHash,
    4916              :                                                &sxidtag,
    4917              :                                                HASH_ENTER, &found);
    4918              :         Assert(sxid != NULL);
    4919              :         Assert(!found);
    4920            0 :         sxid->myXact = sxact;
    4921              : 
    4922              :         /*
    4923              :          * Update global xmin. Note that this is a special case compared to
    4924              :          * registering a normal transaction, because the global xmin might go
    4925              :          * backwards. That's OK, because until recovery is over we're not
    4926              :          * going to complete any transactions or create any non-prepared
    4927              :          * transactions, so there's no danger of throwing away needed data.
    4928              :          */
    4929            0 :         if ((!TransactionIdIsValid(PredXact->SxactGlobalXmin)) ||
    4930            0 :             (TransactionIdFollows(PredXact->SxactGlobalXmin, sxact->xmin)))
    4931              :         {
    4932            0 :             PredXact->SxactGlobalXmin = sxact->xmin;
    4933            0 :             PredXact->SxactGlobalXminCount = 1;
    4934            0 :             SerialSetActiveSerXmin(sxact->xmin);
    4935              :         }
    4936            0 :         else if (TransactionIdEquals(sxact->xmin, PredXact->SxactGlobalXmin))
    4937              :         {
    4938              :             Assert(PredXact->SxactGlobalXminCount > 0);
    4939            0 :             PredXact->SxactGlobalXminCount++;
    4940              :         }
    4941              : 
    4942            0 :         LWLockRelease(SerializableXactHashLock);
    4943              :     }
    4944            0 :     else if (record->type == TWOPHASEPREDICATERECORD_LOCK)
    4945              :     {
    4946              :         /* Lock record. Recreate the PREDICATELOCK */
    4947              :         TwoPhasePredicateLockRecord *lockRecord;
    4948              :         SERIALIZABLEXID *sxid;
    4949              :         SERIALIZABLEXACT *sxact;
    4950              :         SERIALIZABLEXIDTAG sxidtag;
    4951              :         uint32      targettaghash;
    4952              : 
    4953            0 :         lockRecord = (TwoPhasePredicateLockRecord *) &record->data.lockRecord;
    4954            0 :         targettaghash = PredicateLockTargetTagHashCode(&lockRecord->target);
    4955              : 
    4956            0 :         LWLockAcquire(SerializableXactHashLock, LW_SHARED);
    4957            0 :         sxidtag.xid = xid;
    4958              :         sxid = (SERIALIZABLEXID *)
    4959            0 :             hash_search(SerializableXidHash, &sxidtag, HASH_FIND, NULL);
    4960            0 :         LWLockRelease(SerializableXactHashLock);
    4961              : 
    4962              :         Assert(sxid != NULL);
    4963            0 :         sxact = sxid->myXact;
    4964              :         Assert(sxact != InvalidSerializableXact);
    4965              : 
    4966            0 :         CreatePredicateLock(&lockRecord->target, targettaghash, sxact);
    4967              :     }
    4968            0 : }
    4969              : 
    4970              : /*
    4971              :  * Prepare to share the current SERIALIZABLEXACT with parallel workers.
    4972              :  * Return a handle object that can be used by AttachSerializableXact() in a
    4973              :  * parallel worker.
    4974              :  */
    4975              : SerializableXactHandle
    4976          680 : ShareSerializableXact(void)
    4977              : {
    4978          680 :     return MySerializableXact;
    4979              : }
    4980              : 
    4981              : /*
    4982              :  * Allow parallel workers to import the leader's SERIALIZABLEXACT.
    4983              :  */
    4984              : void
    4985         2010 : AttachSerializableXact(SerializableXactHandle handle)
    4986              : {
    4987              : 
    4988              :     Assert(MySerializableXact == InvalidSerializableXact);
    4989              : 
    4990         2010 :     MySerializableXact = (SERIALIZABLEXACT *) handle;
    4991         2010 :     if (MySerializableXact != InvalidSerializableXact)
    4992           13 :         CreateLocalPredicateLockHash();
    4993         2010 : }
        
