LCOV - code coverage report
Current view: top level - src/backend/utils/time - snapmgr.c (source / functions)
Test: PostgreSQL 19devel
Test Date: 2026-03-12 11:14:52

            Coverage    Total    Hit
Lines:        89.3 %      533    476
Functions:   100.0 %       49     49

            Line data    Source code
       1              : /*-------------------------------------------------------------------------
       2              :  *
       3              :  * snapmgr.c
       4              :  *      PostgreSQL snapshot manager
       5              :  *
       6              :  * The following functions return an MVCC snapshot that can be used in tuple
       7              :  * visibility checks:
       8              :  *
       9              :  * - GetTransactionSnapshot
      10              :  * - GetLatestSnapshot
      11              :  * - GetCatalogSnapshot
      12              :  * - GetNonHistoricCatalogSnapshot
      13              :  *
      14              :  * Each of these functions returns a reference to a statically allocated
      15              :  * snapshot.  The statically allocated snapshot is subject to change on any
      16              :  * snapshot-related function call, and should not be used directly.  Instead,
      17              :  * call PushActiveSnapshot() or RegisterSnapshot() to create a longer-lived
      18              :  * copy and use that.
      19              :  *
      20              :  * We keep track of snapshots in two ways: those "registered" by resowner.c,
      21              :  * and the "active snapshot" stack.  All snapshots in either of them live in
      22              :  * persistent memory.  When a snapshot is no longer in any of these lists
      23              :  * (tracked by separate refcounts on each snapshot), its memory can be freed.
      24              :  *
      25              :  * In addition to the above-mentioned MVCC snapshots, there are some special
      26              :  * snapshots like SnapshotSelf, SnapshotAny, and "dirty" snapshots.  They can
      27              :  * only be used in limited contexts and cannot be registered or pushed to the
      28              :  * active stack.
      29              :  *
      30              :  * ActiveSnapshot stack
      31              :  * --------------------
      32              :  *
      33              :  * Most visibility checks use the current "active snapshot" returned by
      34              :  * GetActiveSnapshot().  When running normal queries, the active snapshot is
      35              :  * set when query execution begins based on the transaction isolation level.
      36              :  *
      37              :  * The active snapshot is tracked in a stack so that the currently active one
      38              :  * is at the top of the stack.  It mirrors the process call stack: whenever we
      39              :  * recurse or switch context to fetch rows from a different portal for
      40              :  * example, the appropriate snapshot is pushed to become the active snapshot,
      41              :  * and popped on return.  Once upon a time, ActiveSnapshot was just a global
      42              :  * variable that was saved and restored similar to CurrentMemoryContext, but
      43              :  * nowadays it's managed as a separate data structure so that we can keep
      44              :  * track of which snapshots are in use and reset MyProc->xmin when there is no
      45              :  * active snapshot.
      46              :  *
      47              :  * However, there are a couple of exceptions where the active snapshot stack
      48              :  * does not strictly mirror the call stack:
      49              :  *
      50              :  * - VACUUM and a few other utility commands manage their own transactions,
      51              :  *   which take their own snapshots.  They are called with an active snapshot
      52              :  *   set, like most utility commands, but they pop the active snapshot that
      53              :  *   was pushed by the caller.  PortalRunUtility knows about the possibility
      54              :  *   that the snapshot it pushed is no longer active on return.
      55              :  *
      56              :  * - When COMMIT or ROLLBACK is executed within a procedure or DO-block, the
      57              :  *   active snapshot stack is destroyed, and re-established later when
      58              :  *   subsequent statements in the procedure are executed.  There are many
      59              :  *   limitations on when in-procedure COMMIT/ROLLBACK is allowed; one such
      60              :  *   limitation is that all the snapshots on the active snapshot stack are
      61              :  *   known to portals that are being executed, which makes it safe to reset
      62              :  *   the stack.  See EnsurePortalSnapshotExists().
      63              :  *
      64              :  * Registered snapshots
      65              :  * --------------------
      66              :  *
      67              :  * In addition to snapshots pushed to the active snapshot stack, a snapshot
      68              :  * can be registered with a resource owner.
      69              :  *
      70              :  * The FirstXactSnapshot, if any, is treated a bit specially: we increment its
      71              :  * regd_count and list it in RegisteredSnapshots, but this reference is not
      72              :  * tracked by a resource owner. We used to use the TopTransactionResourceOwner
      73              :  * to track this snapshot reference, but that introduces logical circularity
      74              :  * and thus makes it impossible to clean up in a sane fashion.  It's better to
      75              :  * handle this reference as an internally-tracked registration, so that this
      76              :  * module is entirely lower-level than ResourceOwners.
      77              :  *
      78              :  * Likewise, any snapshots that have been exported by pg_export_snapshot
      79              :  * have regd_count = 1 and are listed in RegisteredSnapshots, but are not
      80              :  * tracked by any resource owner.
      81              :  *
      82              :  * Likewise, the CatalogSnapshot is listed in RegisteredSnapshots when it
      83              :  * is valid, but is not tracked by any resource owner.
      84              :  *
      85              :  * The same is true for historic snapshots used during logical decoding;
      86              :  * their lifetime is managed separately (as they live longer than one xact.c
      87              :  * transaction).
      88              :  *
      89              :  * These arrangements let us reset MyProc->xmin when there are no snapshots
      90              :  * referenced by this transaction, and advance it when the one with oldest
      91              :  * Xmin is no longer referenced.  For simplicity, however, only registered
      92              :  * snapshots, not active snapshots, participate in tracking which one is
      93              :  * oldest; we don't try to change MyProc->xmin except when the
      94              :  * active-snapshot stack is empty.
      95              :  *
      96              :  *
      97              :  * Portions Copyright (c) 1996-2026, PostgreSQL Global Development Group
      98              :  * Portions Copyright (c) 1994, Regents of the University of California
      99              :  *
     100              :  * IDENTIFICATION
     101              :  *    src/backend/utils/time/snapmgr.c
     102              :  *
     103              :  *-------------------------------------------------------------------------
     104              :  */
     105              : #include "postgres.h"
     106              : 
     107              : #include <sys/stat.h>
     108              : #include <unistd.h>
     109              : 
     110              : #include "access/subtrans.h"
     111              : #include "access/transam.h"
     112              : #include "access/xact.h"
     113              : #include "datatype/timestamp.h"
     114              : #include "lib/pairingheap.h"
     115              : #include "miscadmin.h"
     116              : #include "port/pg_lfind.h"
     117              : #include "storage/fd.h"
     118              : #include "storage/predicate.h"
     119              : #include "storage/proc.h"
     120              : #include "storage/procarray.h"
     121              : #include "utils/builtins.h"
     122              : #include "utils/injection_point.h"
     123              : #include "utils/memutils.h"
     124              : #include "utils/resowner.h"
     125              : #include "utils/snapmgr.h"
     126              : #include "utils/syscache.h"
     127              : 
     128              : 
     129              : /*
     130              :  * CurrentSnapshot points to the only snapshot taken in transaction-snapshot
     131              :  * mode, and to the latest one taken in a read-committed transaction.
     132              :  * SecondarySnapshot is a snapshot that's always up-to-date as of the current
     133              :  * instant, even in transaction-snapshot mode.  It should only be used for
     134              :  * special-purpose code (say, RI checking.)  CatalogSnapshot points to an
     135              :  * MVCC snapshot intended to be used for catalog scans; we must invalidate it
     136              :  * whenever a system catalog change occurs.
     137              :  *
     138              :  * These SnapshotData structs are static to simplify memory allocation
     139              :  * (see the hack in GetSnapshotData to avoid repeated malloc/free).
     140              :  */
     141              : static SnapshotData CurrentSnapshotData = {SNAPSHOT_MVCC};
     142              : static SnapshotData SecondarySnapshotData = {SNAPSHOT_MVCC};
     143              : static SnapshotData CatalogSnapshotData = {SNAPSHOT_MVCC};
     144              : SnapshotData SnapshotSelfData = {SNAPSHOT_SELF};
     145              : SnapshotData SnapshotAnyData = {SNAPSHOT_ANY};
     146              : SnapshotData SnapshotToastData = {SNAPSHOT_TOAST};
     147              : 
     148              : /* Pointers to valid snapshots */
     149              : static Snapshot CurrentSnapshot = NULL;
     150              : static Snapshot SecondarySnapshot = NULL;
     151              : static Snapshot CatalogSnapshot = NULL;
     152              : static Snapshot HistoricSnapshot = NULL;
     153              : 
     154              : /*
     155              :  * These are updated by GetSnapshotData.  We initialize them this way
     156              :  * for the convenience of TransactionIdIsInProgress: even in bootstrap
     157              :  * mode, we don't want it to say that BootstrapTransactionId is in progress.
     158              :  */
     159              : TransactionId TransactionXmin = FirstNormalTransactionId;
     160              : TransactionId RecentXmin = FirstNormalTransactionId;
     161              : 
     162              : /* (table, ctid) => (cmin, cmax) mapping during timetravel */
     163              : static HTAB *tuplecid_data = NULL;
     164              : 
     165              : /*
     166              :  * Elements of the active snapshot stack.
     167              :  *
     168              :  * Each element here accounts for exactly one active_count on SnapshotData.
     169              :  *
     170              :  * NB: the code assumes that elements in this list are in non-increasing
     171              :  * order of as_level; also, the list must be NULL-terminated.
     172              :  */
     173              : typedef struct ActiveSnapshotElt
     174              : {
     175              :     Snapshot    as_snap;
     176              :     int         as_level;
     177              :     struct ActiveSnapshotElt *as_next;
     178              : } ActiveSnapshotElt;
     179              : 
     180              : /* Top of the stack of active snapshots */
     181              : static ActiveSnapshotElt *ActiveSnapshot = NULL;
     182              : 
     183              : /*
     184              :  * Currently registered Snapshots.  Ordered in a heap by xmin, so that we can
     185              :  * quickly find the one with lowest xmin, to advance our MyProc->xmin.
     186              :  */
     187              : static int  xmin_cmp(const pairingheap_node *a, const pairingheap_node *b,
     188              :                      void *arg);
     189              : 
     190              : static pairingheap RegisteredSnapshots = {&xmin_cmp, NULL, NULL};
     191              : 
     192              : /* has the first GetTransactionSnapshot call in this transaction happened? */
     193              : bool        FirstSnapshotSet = false;
     194              : 
     195              : /*
     196              :  * Remember the serializable transaction snapshot, if any.  We cannot trust
     197              :  * FirstSnapshotSet in combination with IsolationUsesXactSnapshot(), because
     198              :  * GUC may be reset before us, changing the value of IsolationUsesXactSnapshot.
     199              :  */
     200              : static Snapshot FirstXactSnapshot = NULL;
     201              : 
     202              : /* Define pathname of exported-snapshot files */
     203              : #define SNAPSHOT_EXPORT_DIR "pg_snapshots"
     204              : 
     205              : /* Structure holding info about exported snapshot. */
     206              : typedef struct ExportedSnapshot
     207              : {
     208              :     char       *snapfile;
     209              :     Snapshot    snapshot;
     210              : } ExportedSnapshot;
     211              : 
     212              : /* Current xact's exported snapshots (a list of ExportedSnapshot structs) */
     213              : static List *exportedSnapshots = NIL;
     214              : 
     215              : /* Prototypes for local functions */
     216              : static Snapshot CopySnapshot(Snapshot snapshot);
     217              : static void UnregisterSnapshotNoOwner(Snapshot snapshot);
     218              : static void FreeSnapshot(Snapshot snapshot);
     219              : static void SnapshotResetXmin(void);
     220              : 
     221              : /* ResourceOwner callbacks to track snapshot references */
     222              : static void ResOwnerReleaseSnapshot(Datum res);
     223              : 
     224              : static const ResourceOwnerDesc snapshot_resowner_desc =
     225              : {
     226              :     .name = "snapshot reference",
     227              :     .release_phase = RESOURCE_RELEASE_AFTER_LOCKS,
     228              :     .release_priority = RELEASE_PRIO_SNAPSHOT_REFS,
     229              :     .ReleaseResource = ResOwnerReleaseSnapshot,
     230              :     .DebugPrint = NULL          /* the default message is fine */
     231              : };
     232              : 
     233              : /* Convenience wrappers over ResourceOwnerRemember/Forget */
     234              : static inline void
     235      8555679 : ResourceOwnerRememberSnapshot(ResourceOwner owner, Snapshot snap)
     236              : {
     237      8555679 :     ResourceOwnerRemember(owner, PointerGetDatum(snap), &snapshot_resowner_desc);
     238      8555679 : }
     239              : static inline void
     240      8525619 : ResourceOwnerForgetSnapshot(ResourceOwner owner, Snapshot snap)
     241              : {
     242      8525619 :     ResourceOwnerForget(owner, PointerGetDatum(snap), &snapshot_resowner_desc);
     243      8525619 : }
     244              : 
     245              : /*
     246              :  * Snapshot fields to be serialized.
     247              :  *
     248              :  * Only these fields need to be sent to the cooperating backend; the
     249              :  * remaining ones can (and must) be set by the receiver upon restore.
     250              :  */
     251              : typedef struct SerializedSnapshotData
     252              : {
     253              :     TransactionId xmin;
     254              :     TransactionId xmax;
     255              :     uint32      xcnt;
     256              :     int32       subxcnt;
     257              :     bool        suboverflowed;
     258              :     bool        takenDuringRecovery;
     259              :     CommandId   curcid;
     260              : } SerializedSnapshotData;
     261              : 
     262              : /*
     263              :  * GetTransactionSnapshot
     264              :  *      Get the appropriate snapshot for a new query in a transaction.
     265              :  *
     266              :  * Note that the return value points at static storage that will be modified
     267              :  * by future calls and by CommandCounterIncrement().  Callers must call
     268              :  * RegisterSnapshot or PushActiveSnapshot on the returned snap before doing
     269              :  * any other non-trivial work that could invalidate it.
     270              :  */
     271              : Snapshot
     272       991438 : GetTransactionSnapshot(void)
     273              : {
     274              :     /*
     275              :      * Return historic snapshot if doing logical decoding.
     276              :      *
     277              :      * Historic snapshots are only usable for catalog access, not for
     278              :      * general-purpose queries.  The caller is responsible for ensuring that
     279              :      * the snapshot is used correctly! (PostgreSQL code never calls this
     280              :      * during logical decoding, but extensions can do it.)
     281              :      */
     282       991438 :     if (HistoricSnapshotActive())
     283              :     {
     284              :         /*
     285              :          * We'll never need a non-historic transaction snapshot in this
     286              :          * (sub-)transaction, so there's no need to be careful to set one up
     287              :          * for later calls to GetTransactionSnapshot().
     288              :          */
     289              :         Assert(!FirstSnapshotSet);
     290            0 :         return HistoricSnapshot;
     291              :     }
     292              : 
     293              :     /* First call in transaction? */
     294       991438 :     if (!FirstSnapshotSet)
     295              :     {
     296              :         /*
     297              :          * Don't allow catalog snapshot to be older than xact snapshot.  Must
     298              :          * do this first to allow the empty-heap Assert to succeed.
     299              :          */
     300       398029 :         InvalidateCatalogSnapshot();
     301              : 
     302              :         Assert(pairingheap_is_empty(&RegisteredSnapshots));
     303              :         Assert(FirstXactSnapshot == NULL);
     304              : 
     305       398029 :         if (IsInParallelMode())
     306            0 :             elog(ERROR,
     307              :                  "cannot take query snapshot during a parallel operation");
     308              : 
     309              :         /*
     310              :          * In transaction-snapshot mode, the first snapshot must live until
     311              :          * end of xact regardless of what the caller does with it, so we must
     312              :          * make a copy of it rather than returning CurrentSnapshotData
     313              :          * directly.  Furthermore, if we're running in serializable mode,
     314              :          * predicate.c needs to wrap the snapshot fetch in its own processing.
     315              :          */
     316       398029 :         if (IsolationUsesXactSnapshot())
     317              :         {
     318              :             /* First, create the snapshot in CurrentSnapshotData */
     319         2868 :             if (IsolationIsSerializable())
     320         1666 :                 CurrentSnapshot = GetSerializableTransactionSnapshot(&CurrentSnapshotData);
     321              :             else
     322         1202 :                 CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
     323              :             /* Make a saved copy */
     324         2868 :             CurrentSnapshot = CopySnapshot(CurrentSnapshot);
     325         2868 :             FirstXactSnapshot = CurrentSnapshot;
     326              :             /* Mark it as "registered" in FirstXactSnapshot */
     327         2868 :             FirstXactSnapshot->regd_count++;
     328         2868 :             pairingheap_add(&RegisteredSnapshots, &FirstXactSnapshot->ph_node);
     329              :         }
     330              :         else
     331       395161 :             CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
     332              : 
     333       398029 :         FirstSnapshotSet = true;
     334       398029 :         return CurrentSnapshot;
     335              :     }
     336              : 
     337       593409 :     if (IsolationUsesXactSnapshot())
     338        79101 :         return CurrentSnapshot;
     339              : 
     340              :     /* Don't allow catalog snapshot to be older than xact snapshot. */
     341       514308 :     InvalidateCatalogSnapshot();
     342              : 
     343       514308 :     CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
     344              : 
     345       514308 :     return CurrentSnapshot;
     346              : }
     347              : 
     348              : /*
     349              :  * GetLatestSnapshot
     350              :  *      Get a snapshot that is up-to-date as of the current instant,
     351              :  *      even if we are executing in transaction-snapshot mode.
     352              :  */
     353              : Snapshot
     354        76874 : GetLatestSnapshot(void)
     355              : {
     356              :     /*
     357              :      * We might be able to relax this, but nothing that could otherwise work
     358              :      * needs it.
     359              :      */
     360        76874 :     if (IsInParallelMode())
     361            0 :         elog(ERROR,
     362              :              "cannot update SecondarySnapshot during a parallel operation");
     363              : 
     364              :     /*
     365              :      * So far there are no cases requiring support for GetLatestSnapshot()
     366              :      * during logical decoding, but it wouldn't be hard to add if required.
     367              :      */
     368              :     Assert(!HistoricSnapshotActive());
     369              : 
     370              :     /* If first call in transaction, go ahead and set the xact snapshot */
     371        76874 :     if (!FirstSnapshotSet)
     372           46 :         return GetTransactionSnapshot();
     373              : 
     374        76828 :     SecondarySnapshot = GetSnapshotData(&SecondarySnapshotData);
     375              : 
     376        76828 :     return SecondarySnapshot;
     377              : }
     378              : 
     379              : /*
     380              :  * GetCatalogSnapshot
     381              :  *      Get a snapshot that is sufficiently up-to-date for scan of the
     382              :  *      system catalog with the specified OID.
     383              :  */
     384              : Snapshot
     385      7886206 : GetCatalogSnapshot(Oid relid)
     386              : {
     387              :     /*
     388              :      * Return historic snapshot while we're doing logical decoding, so we can
     389              :      * see the appropriate state of the catalog.
     390              :      *
     391              :      * This is the primary reason for needing to reset the system caches after
     392              :      * finishing decoding.
     393              :      */
     394      7886206 :     if (HistoricSnapshotActive())
     395        17249 :         return HistoricSnapshot;
     396              : 
     397      7868957 :     return GetNonHistoricCatalogSnapshot(relid);
     398              : }
     399              : 
     400              : /*
     401              :  * GetNonHistoricCatalogSnapshot
     402              :  *      Get a snapshot that is sufficiently up-to-date for scan of the system
     403              :  *      catalog with the specified OID, even while historic snapshots are set
     404              :  *      up.
     405              :  */
     406              : Snapshot
     407      7870725 : GetNonHistoricCatalogSnapshot(Oid relid)
     408              : {
     409              :     /*
     410              :      * If the caller is trying to scan a relation that has no syscache, no
     411              :      * catcache invalidations will be sent when it is updated.  For a few key
     412              :      * relations, snapshot invalidations are sent instead.  If we're trying to
     413              :      * scan a relation for which neither catcache nor snapshot invalidations
     414              :      * are sent, we must refresh the snapshot every time.
     415              :      */
     416      7870725 :     if (CatalogSnapshot &&
     417      6885426 :         !RelationInvalidatesSnapshotsOnly(relid) &&
     418      6043993 :         !RelationHasSysCache(relid))
     419       264993 :         InvalidateCatalogSnapshot();
     420              : 
     421      7870725 :     if (CatalogSnapshot == NULL)
     422              :     {
     423              :         /* Get new snapshot. */
     424      1250292 :         CatalogSnapshot = GetSnapshotData(&CatalogSnapshotData);
     425              : 
     426              :         /*
     427              :          * Make sure the catalog snapshot will be accounted for in decisions
     428              :          * about advancing PGPROC->xmin.  We could apply RegisterSnapshot, but
     429              :          * that would result in making a physical copy, which is overkill; and
     430              :          * it would also create a dependency on some resource owner, which we
     431              :          * do not want for reasons explained at the head of this file. Instead
     432              :          * just shove the CatalogSnapshot into the pairing heap manually. This
     433              :          * has to be reversed in InvalidateCatalogSnapshot, of course.
     434              :          *
     435              :          * NB: it had better be impossible for this to throw error, since the
     436              :          * CatalogSnapshot pointer is already valid.
     437              :          */
     438      1250292 :         pairingheap_add(&RegisteredSnapshots, &CatalogSnapshot->ph_node);
     439              :     }
     440              : 
     441      7870725 :     return CatalogSnapshot;
     442              : }
     443              : 
     444              : /*
     445              :  * InvalidateCatalogSnapshot
     446              :  *      Mark the current catalog snapshot, if any, as invalid
     447              :  *
     448              :  * We could change this API to allow the caller to provide more fine-grained
     449              :  * invalidation details, so that a change to relation A wouldn't prevent us
     450              :  * from using our cached snapshot to scan relation B, but so far there's no
     451              :  * evidence that the CPU cycles spent tracking such fine details would be
     452              :  * well-spent.
     453              :  */
     454              : void
     455     15615854 : InvalidateCatalogSnapshot(void)
     456              : {
     457     15615854 :     if (CatalogSnapshot)
     458              :     {
     459      1250292 :         pairingheap_remove(&RegisteredSnapshots, &CatalogSnapshot->ph_node);
     460      1250292 :         CatalogSnapshot = NULL;
     461      1250292 :         SnapshotResetXmin();
     462      1250292 :         INJECTION_POINT("invalidate-catalog-snapshot-end", NULL);
     463              :     }
     464     15615854 : }
     465              : 
     466              : /*
     467              :  * InvalidateCatalogSnapshotConditionally
     468              :  *      Drop catalog snapshot if it's the only one we have
     469              :  *
     470              :  * This is called when we are about to wait for client input, so we don't
     471              :  * want to continue holding the catalog snapshot if it might mean that the
     472              :  * global xmin horizon can't advance.  However, if there are other snapshots
     473              :  * still active or registered, the catalog snapshot isn't likely to be the
     474              :  * oldest one, so we might as well keep it.
     475              :  */
     476              : void
     477       423860 : InvalidateCatalogSnapshotConditionally(void)
     478              : {
     479       423860 :     if (CatalogSnapshot &&
     480        58896 :         ActiveSnapshot == NULL &&
     481        58045 :         pairingheap_is_singular(&RegisteredSnapshots))
     482         9587 :         InvalidateCatalogSnapshot();
     483       423860 : }
     484              : 
     485              : /*
     486              :  * SnapshotSetCommandId
     487              :  *      Propagate CommandCounterIncrement into the static snapshots, if set
     488              :  */
     489              : void
     490       597382 : SnapshotSetCommandId(CommandId curcid)
     491              : {
     492       597382 :     if (!FirstSnapshotSet)
     493        10295 :         return;
     494              : 
     495       587087 :     if (CurrentSnapshot)
     496       587087 :         CurrentSnapshot->curcid = curcid;
     497       587087 :     if (SecondarySnapshot)
     498        81451 :         SecondarySnapshot->curcid = curcid;
     499              :     /* Should we do the same with CatalogSnapshot? */
     500              : }
     501              : 
     502              : /*
     503              :  * SetTransactionSnapshot
     504              :  *      Set the transaction's snapshot from an imported MVCC snapshot.
     505              :  *
     506              :  * Note that this is very closely tied to GetTransactionSnapshot --- it
     507              :  * must take care of all the same considerations as the first-snapshot case
     508              :  * in GetTransactionSnapshot.
     509              :  */
     510              : static void
     511         1713 : SetTransactionSnapshot(Snapshot sourcesnap, VirtualTransactionId *sourcevxid,
     512              :                        int sourcepid, PGPROC *sourceproc)
     513              : {
     514              :     /* Caller should have checked this already */
     515              :     Assert(!FirstSnapshotSet);
     516              : 
     517              :     /* Better do this to ensure following Assert succeeds. */
     518         1713 :     InvalidateCatalogSnapshot();
     519              : 
     520              :     Assert(pairingheap_is_empty(&RegisteredSnapshots));
     521              :     Assert(FirstXactSnapshot == NULL);
     522              :     Assert(!HistoricSnapshotActive());
     523              : 
     524              :     /*
     525              :      * Even though we are not going to use the snapshot it computes, we must
     526              :      * call GetSnapshotData, for two reasons: (1) to be sure that
     527              :      * CurrentSnapshotData's XID arrays have been allocated, and (2) to update
     528              :      * the state for GlobalVis*.
     529              :      */
     530         1713 :     CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
     531              : 
     532              :     /*
     533              :      * Now copy appropriate fields from the source snapshot.
     534              :      */
     535         1713 :     CurrentSnapshot->xmin = sourcesnap->xmin;
     536         1713 :     CurrentSnapshot->xmax = sourcesnap->xmax;
     537         1713 :     CurrentSnapshot->xcnt = sourcesnap->xcnt;
     538              :     Assert(sourcesnap->xcnt <= GetMaxSnapshotXidCount());
     539         1713 :     if (sourcesnap->xcnt > 0)
     540          341 :         memcpy(CurrentSnapshot->xip, sourcesnap->xip,
     541          341 :                sourcesnap->xcnt * sizeof(TransactionId));
     542         1713 :     CurrentSnapshot->subxcnt = sourcesnap->subxcnt;
     543              :     Assert(sourcesnap->subxcnt <= GetMaxSnapshotSubxidCount());
     544         1713 :     if (sourcesnap->subxcnt > 0)
     545            4 :         memcpy(CurrentSnapshot->subxip, sourcesnap->subxip,
     546            4 :                sourcesnap->subxcnt * sizeof(TransactionId));
     547         1713 :     CurrentSnapshot->suboverflowed = sourcesnap->suboverflowed;
     548         1713 :     CurrentSnapshot->takenDuringRecovery = sourcesnap->takenDuringRecovery;
     549              :     /* NB: curcid should NOT be copied, it's a local matter */
     550              : 
     551         1713 :     CurrentSnapshot->snapXactCompletionCount = 0;
     552              : 
     553              :     /*
     554              :      * Now we have to fix what GetSnapshotData did with MyProc->xmin and
     555              :      * TransactionXmin.  There is a race condition: to make sure we are not
     556              :      * causing the global xmin to go backwards, we have to test that the
     557              :      * source transaction is still running, and that has to be done
     558              :      * atomically. So let procarray.c do it.
     559              :      *
     560              :      * Note: in serializable mode, predicate.c will do this a second time. It
     561              :      * doesn't seem worth contorting the logic here to avoid two calls,
     562              :      * especially since it's not clear that predicate.c *must* do this.
     563              :      */
     564         1713 :     if (sourceproc != NULL)
     565              :     {
     566         1697 :         if (!ProcArrayInstallRestoredXmin(CurrentSnapshot->xmin, sourceproc))
     567            0 :             ereport(ERROR,
     568              :                     (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
     569              :                      errmsg("could not import the requested snapshot"),
     570              :                      errdetail("The source transaction is not running anymore.")));
     571              :     }
     572           16 :     else if (!ProcArrayInstallImportedXmin(CurrentSnapshot->xmin, sourcevxid))
     573            0 :         ereport(ERROR,
     574              :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
     575              :                  errmsg("could not import the requested snapshot"),
     576              :                  errdetail("The source process with PID %d is not running anymore.",
     577              :                            sourcepid)));
     578              : 
     579              :     /*
     580              :      * In transaction-snapshot mode, the first snapshot must live until end of
     581              :      * xact, so we must make a copy of it.  Furthermore, if we're running in
     582              :      * serializable mode, predicate.c needs to do its own processing.
     583              :      */
     584         1713 :     if (IsolationUsesXactSnapshot())
     585              :     {
     586          244 :         if (IsolationIsSerializable())
     587           13 :             SetSerializableTransactionSnapshot(CurrentSnapshot, sourcevxid,
     588              :                                                sourcepid);
     589              :         /* Make a saved copy */
     590          244 :         CurrentSnapshot = CopySnapshot(CurrentSnapshot);
     591          244 :         FirstXactSnapshot = CurrentSnapshot;
     592              :         /* Mark it as "registered" in FirstXactSnapshot */
     593          244 :         FirstXactSnapshot->regd_count++;
     594          244 :         pairingheap_add(&RegisteredSnapshots, &FirstXactSnapshot->ph_node);
     595              :     }
     596              : 
     597         1713 :     FirstSnapshotSet = true;
     598         1713 : }
     599              : 
     600              : /*
     601              :  * CopySnapshot
     602              :  *      Copy the given snapshot.
     603              :  *
     604              :  * The copy is palloc'd in TopTransactionContext and has initial refcounts set
     605              :  * to 0.  The returned snapshot has the copied flag set.
     606              :  */
     607              : static Snapshot
     608      8959190 : CopySnapshot(Snapshot snapshot)
     609              : {
     610              :     Snapshot    newsnap;
     611              :     Size        subxipoff;
     612              :     Size        size;
     613              : 
     614              :     Assert(snapshot != InvalidSnapshot);
     615              : 
     616              :     /* We allocate any XID arrays needed in the same palloc block. */
     617      8959190 :     size = subxipoff = sizeof(SnapshotData) +
     618      8959190 :         snapshot->xcnt * sizeof(TransactionId);
     619      8959190 :     if (snapshot->subxcnt > 0)
     620        77819 :         size += snapshot->subxcnt * sizeof(TransactionId);
     621              : 
     622      8959190 :     newsnap = (Snapshot) MemoryContextAlloc(TopTransactionContext, size);
     623      8959190 :     memcpy(newsnap, snapshot, sizeof(SnapshotData));
     624              : 
     625      8959190 :     newsnap->regd_count = 0;
     626      8959190 :     newsnap->active_count = 0;
     627      8959190 :     newsnap->copied = true;
     628      8959190 :     newsnap->snapXactCompletionCount = 0;
     629              : 
     630              :     /* set up XID array */
     631      8959190 :     if (snapshot->xcnt > 0)
     632              :     {
     633      2372698 :         newsnap->xip = (TransactionId *) (newsnap + 1);
     634      2372698 :         memcpy(newsnap->xip, snapshot->xip,
     635      2372698 :                snapshot->xcnt * sizeof(TransactionId));
     636              :     }
     637              :     else
     638      6586492 :         newsnap->xip = NULL;
     639              : 
     640              :     /*
     641              :      * Set up subXID array. Don't bother to copy it if it had overflowed,
     642              :      * though, because it's not used anywhere in that case. Except if it's a
     643              :      * snapshot taken during recovery; all the top-level XIDs are in subxip as
     644              :      * well in that case, so we mustn't lose them.
     645              :      */
     646      8959190 :     if (snapshot->subxcnt > 0 &&
     647        77819 :         (!snapshot->suboverflowed || snapshot->takenDuringRecovery))
     648              :     {
     649        77819 :         newsnap->subxip = (TransactionId *) ((char *) newsnap + subxipoff);
     650        77819 :         memcpy(newsnap->subxip, snapshot->subxip,
     651        77819 :                snapshot->subxcnt * sizeof(TransactionId));
     652              :     }
     653              :     else
     654      8881371 :         newsnap->subxip = NULL;
     655              : 
     656      8959190 :     return newsnap;
     657              : }
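
/*
 * Illustrative sketch (not part of snapmgr.c, and not covered by this
 * report): the single-allocation layout that CopySnapshot uses above, with
 * the struct header followed by the XID array and then the subXID array.
 * DemoXid, DemoSnapshot and demo_copy_snapshot are hypothetical, reduced
 * stand-ins; malloc replaces MemoryContextAlloc so the sketch builds outside
 * the PostgreSQL tree.
 */

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t DemoXid;           /* hypothetical stand-in for TransactionId */

typedef struct DemoSnapshot         /* hypothetical, much-reduced SnapshotData */
{
    DemoXid    *xip;                /* in-progress top-level XIDs */
    uint32_t    xcnt;
    DemoXid    *subxip;             /* in-progress subtransaction XIDs */
    int32_t     subxcnt;
    int         suboverflowed;
    int         takenDuringRecovery;
    int         copied;
    uint32_t    active_count;
    uint32_t    regd_count;
} DemoSnapshot;

/* Copy src into one heap block: [header][xip array][subxip array]. */
static DemoSnapshot *
demo_copy_snapshot(const DemoSnapshot *src)
{
    /* The subXID array, if kept, starts right after the XID array. */
    size_t      subxipoff = sizeof(DemoSnapshot) + src->xcnt * sizeof(DemoXid);
    size_t      size = subxipoff;
    DemoSnapshot *dst;

    if (src->subxcnt > 0)
        size += (size_t) src->subxcnt * sizeof(DemoXid);

    dst = malloc(size);
    if (dst == NULL)
        return NULL;
    memcpy(dst, src, sizeof(DemoSnapshot));

    /* A fresh copy starts with no references and is marked as copied. */
    dst->regd_count = 0;
    dst->active_count = 0;
    dst->copied = 1;

    if (src->xcnt > 0)
    {
        dst->xip = (DemoXid *) (dst + 1);
        memcpy(dst->xip, src->xip, src->xcnt * sizeof(DemoXid));
    }
    else
        dst->xip = NULL;

    /* Same rule as above: skip an overflowed array unless taken in recovery. */
    if (src->subxcnt > 0 &&
        (!src->suboverflowed || src->takenDuringRecovery))
    {
        dst->subxip = (DemoXid *) ((char *) dst + subxipoff);
        memcpy(dst->subxip, src->subxip,
               (size_t) src->subxcnt * sizeof(DemoXid));
    }
    else
        dst->subxip = NULL;

    return dst;
}
```

/*
 * Because both arrays live inside the same block as the header, a single
 * free() (pfree in the real code) releases the whole copy, which is why
 * FreeSnapshot needs only one call.
 */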
     658              : 
     659              : /*
     660              :  * FreeSnapshot
     661              :  *      Free the memory associated with a snapshot.
     662              :  */
     663              : static void
     664      8934077 : FreeSnapshot(Snapshot snapshot)
     665              : {
     666              :     Assert(snapshot->regd_count == 0);
     667              :     Assert(snapshot->active_count == 0);
     668              :     Assert(snapshot->copied);
     669              : 
     670      8934077 :     pfree(snapshot);
     671      8934077 : }
     672              : 
     673              : /*
     674              :  * PushActiveSnapshot
     675              :  *      Set the given snapshot as the current active snapshot
     676              :  *
     677              :  * If the passed snapshot is a statically-allocated one, or it is possibly
     678              :  * subject to a future command counter update, create a new long-lived copy
     679              :  * with active refcount=1.  Otherwise, only increment the refcount.
     680              :  */
     681              : void
     682      1084206 : PushActiveSnapshot(Snapshot snapshot)
     683              : {
     684      1084206 :     PushActiveSnapshotWithLevel(snapshot, GetCurrentTransactionNestLevel());
     685      1084206 : }
     686              : 
     687              : /*
     688              :  * PushActiveSnapshotWithLevel
     689              :  *      Set the given snapshot as the current active snapshot
     690              :  *
     691              :  * Same as PushActiveSnapshot except that caller can specify the
     692              :  * transaction nesting level that "owns" the snapshot.  This level
     693              :  * must not be deeper than the current top of the snapshot stack.
     694              :  */
     695              : void
     696      1232607 : PushActiveSnapshotWithLevel(Snapshot snapshot, int snap_level)
     697              : {
     698              :     ActiveSnapshotElt *newactive;
     699              : 
     700              :     Assert(snapshot != InvalidSnapshot);
     701              :     Assert(ActiveSnapshot == NULL || snap_level >= ActiveSnapshot->as_level);
     702              : 
     703      1232607 :     newactive = MemoryContextAlloc(TopTransactionContext, sizeof(ActiveSnapshotElt));
     704              : 
     705              :     /*
     706              :      * Checking SecondarySnapshot is probably useless here, but it seems
     707              :      * better to be sure.
     708              :      */
     709      1232607 :     if (snapshot == CurrentSnapshot || snapshot == SecondarySnapshot ||
     710       249862 :         !snapshot->copied)
     711       982745 :         newactive->as_snap = CopySnapshot(snapshot);
     712              :     else
     713       249862 :         newactive->as_snap = snapshot;
     714              : 
     715      1232607 :     newactive->as_next = ActiveSnapshot;
     716      1232607 :     newactive->as_level = snap_level;
     717              : 
     718      1232607 :     newactive->as_snap->active_count++;
     719              : 
     720      1232607 :     ActiveSnapshot = newactive;
     721      1232607 : }
     722              : 
     723              : /*
     724              :  * PushCopiedSnapshot
     725              :  *      As above, except forcibly copy the presented snapshot.
     726              :  *
     727              :  * This should be used when the ActiveSnapshot has to be modifiable, for
     728              :  * example if the caller intends to call UpdateActiveSnapshotCommandId.
     729              :  * The new snapshot will be released when popped from the stack.
     730              :  */
     731              : void
     732        60822 : PushCopiedSnapshot(Snapshot snapshot)
     733              : {
     734        60822 :     PushActiveSnapshot(CopySnapshot(snapshot));
     735        60822 : }
     736              : 
     737              : /*
     738              :  * UpdateActiveSnapshotCommandId
     739              :  *
     740              :  * Update the current CID of the active snapshot.  This can only be applied
     741              :  * to a snapshot that is not referenced elsewhere.
     742              :  */
     743              : void
     744        62624 : UpdateActiveSnapshotCommandId(void)
     745              : {
     746              :     CommandId   save_curcid,
     747              :                 curcid;
     748              : 
     749              :     Assert(ActiveSnapshot != NULL);
     750              :     Assert(ActiveSnapshot->as_snap->active_count == 1);
     751              :     Assert(ActiveSnapshot->as_snap->regd_count == 0);
     752              : 
     753              :     /*
     754              :      * Don't allow modification of the active snapshot during parallel
     755              :      * operation.  We share the snapshot to worker backends at the beginning
     756              :      * of parallel operation, so any change to the snapshot can lead to
     757              :      * inconsistencies.  We have other defenses against
     758              :      * CommandCounterIncrement, but there are a few places that call this
     759              :      * directly, so we put an additional guard here.
     760              :      */
     761        62624 :     save_curcid = ActiveSnapshot->as_snap->curcid;
     762        62624 :     curcid = GetCurrentCommandId(false);
     763        62624 :     if (IsInParallelMode() && save_curcid != curcid)
     764            0 :         elog(ERROR, "cannot modify commandid in active snapshot during a parallel operation");
     765        62624 :     ActiveSnapshot->as_snap->curcid = curcid;
     766        62624 : }
     767              : 
     768              : /*
     769              :  * PopActiveSnapshot
     770              :  *
     771              :  * Remove the topmost snapshot from the active snapshot stack, decrementing the
     772              :  * reference count, and free it if this was the last reference.
     773              :  */
     774              : void
     775      1204148 : PopActiveSnapshot(void)
     776              : {
     777              :     ActiveSnapshotElt *newstack;
     778              : 
     779      1204148 :     newstack = ActiveSnapshot->as_next;
     780              : 
     781              :     Assert(ActiveSnapshot->as_snap->active_count > 0);
     782              : 
     783      1204148 :     ActiveSnapshot->as_snap->active_count--;
     784              : 
     785      1204148 :     if (ActiveSnapshot->as_snap->active_count == 0 &&
     786      1185503 :         ActiveSnapshot->as_snap->regd_count == 0)
     787       869914 :         FreeSnapshot(ActiveSnapshot->as_snap);
     788              : 
     789      1204148 :     pfree(ActiveSnapshot);
     790      1204148 :     ActiveSnapshot = newstack;
     791              : 
     792      1204148 :     SnapshotResetXmin();
     793      1204148 : }
     794              : 
     795              : /*
     796              :  * GetActiveSnapshot
     797              :  *      Return the topmost snapshot in the Active stack.
     798              :  */
     799              : Snapshot
     800       563099 : GetActiveSnapshot(void)
     801              : {
     802              :     Assert(ActiveSnapshot != NULL);
     803              : 
     804       563099 :     return ActiveSnapshot->as_snap;
     805              : }
     806              : 
     807              : /*
     808              :  * ActiveSnapshotSet
     809              :  *      Return whether there is at least one snapshot in the Active stack
     810              :  */
     811              : bool
     812       566515 : ActiveSnapshotSet(void)
     813              : {
     814       566515 :     return ActiveSnapshot != NULL;
     815              : }
     816              : 
     817              : /*
     818              :  * RegisterSnapshot
     819              :  *      Register a snapshot as being in use by the current resource owner
     820              :  *
     821              :  * If InvalidSnapshot is passed, it is not registered.
     822              :  */
     823              : Snapshot
     824      9181035 : RegisterSnapshot(Snapshot snapshot)
     825              : {
     826      9181035 :     if (snapshot == InvalidSnapshot)
     827       625470 :         return InvalidSnapshot;
     828              : 
     829      8555565 :     return RegisterSnapshotOnOwner(snapshot, CurrentResourceOwner);
     830              : }
     831              : 
     832              : /*
     833              :  * RegisterSnapshotOnOwner
     834              :  *      As above, but use the specified resource owner
     835              :  */
     836              : Snapshot
     837      8555679 : RegisterSnapshotOnOwner(Snapshot snapshot, ResourceOwner owner)
     838              : {
     839              :     Snapshot    snap;
     840              : 
     841      8555679 :     if (snapshot == InvalidSnapshot)
     842            0 :         return InvalidSnapshot;
     843              : 
     844              :     /* Static snapshot?  Create a persistent copy */
     845      8555679 :     snap = snapshot->copied ? snapshot : CopySnapshot(snapshot);
     846              : 
     847              :     /* and tell resowner.c about it */
     848      8555679 :     ResourceOwnerEnlarge(owner);
     849      8555679 :     snap->regd_count++;
     850      8555679 :     ResourceOwnerRememberSnapshot(owner, snap);
     851              : 
     852      8555679 :     if (snap->regd_count == 1)
     853      8222344 :         pairingheap_add(&RegisteredSnapshots, &snap->ph_node);
     854              : 
     855      8555679 :     return snap;
     856              : }
     857              : 
     858              : /*
     859              :  * UnregisterSnapshot
     860              :  *
     861              :  * Decrement the reference count of a snapshot, remove the corresponding
     862              :  * reference from CurrentResourceOwner, and free the snapshot if no more
     863              :  * references remain.
     864              :  */
     865              : void
     866      9099658 : UnregisterSnapshot(Snapshot snapshot)
     867              : {
     868      9099658 :     if (snapshot == NULL)
     869       596355 :         return;
     870              : 
     871      8503303 :     UnregisterSnapshotFromOwner(snapshot, CurrentResourceOwner);
     872              : }
     873              : 
     874              : /*
     875              :  * UnregisterSnapshotFromOwner
     876              :  *      As above, but use the specified resource owner
     877              :  */
     878              : void
     879      8525619 : UnregisterSnapshotFromOwner(Snapshot snapshot, ResourceOwner owner)
     880              : {
     881      8525619 :     if (snapshot == NULL)
     882            0 :         return;
     883              : 
     884      8525619 :     ResourceOwnerForgetSnapshot(owner, snapshot);
     885      8525619 :     UnregisterSnapshotNoOwner(snapshot);
     886              : }
     887              : 
     888              : static void
     889      8555679 : UnregisterSnapshotNoOwner(Snapshot snapshot)
     890              : {
     891              :     Assert(snapshot->regd_count > 0);
     892              :     Assert(!pairingheap_is_empty(&RegisteredSnapshots));
     893              : 
     894      8555679 :     snapshot->regd_count--;
     895      8555679 :     if (snapshot->regd_count == 0)
     896      8222344 :         pairingheap_remove(&RegisteredSnapshots, &snapshot->ph_node);
     897              : 
     898      8555679 :     if (snapshot->regd_count == 0 && snapshot->active_count == 0)
     899              :     {
     900      8061297 :         FreeSnapshot(snapshot);
     901      8061297 :         SnapshotResetXmin();
     902              :     }
     903      8555679 : }
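
/*
 * Illustrative sketch (not part of snapmgr.c, and not covered by this
 * report): the two-refcount rule shared by PopActiveSnapshot and
 * UnregisterSnapshotNoOwner above.  A snapshot is freed only once both its
 * active count and its registered count have dropped to zero.  All demo_*
 * names are hypothetical, and the resource-owner and pairing-heap
 * bookkeeping is deliberately left out.
 */

```c
#include <assert.h>
#include <stdlib.h>

typedef struct DemoSnap             /* hypothetical: only the two refcounts */
{
    unsigned    active_count;       /* references from the active stack */
    unsigned    regd_count;         /* references from resource owners */
} DemoSnap;

static int  demo_frees = 0;         /* counts frees so the effect is observable */

static DemoSnap *
demo_new(void)
{
    return calloc(1, sizeof(DemoSnap));
}

/* Free the snapshot only when neither list references it any more. */
static void
demo_release_if_unreferenced(DemoSnap *s)
{
    if (s->active_count == 0 && s->regd_count == 0)
    {
        free(s);
        demo_frees++;
    }
}

static void
demo_push(DemoSnap *s)
{
    s->active_count++;
}

static void
demo_pop(DemoSnap *s)
{
    assert(s->active_count > 0);
    s->active_count--;
    demo_release_if_unreferenced(s);
}

static void
demo_register(DemoSnap *s)
{
    s->regd_count++;
}

static void
demo_unregister(DemoSnap *s)
{
    assert(s->regd_count > 0);
    s->regd_count--;
    demo_release_if_unreferenced(s);
}
```

/*
 * Popping a snapshot that is still registered leaves it allocated; the later
 * unregister call is the one that frees it.
 */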
     904              : 
     905              : /*
     906              :  * Comparison function for RegisteredSnapshots heap.  Snapshots are ordered
     907              :  * by xmin, so that the snapshot with smallest xmin is at the top.
     908              :  */
     909              : static int
     910      8219336 : xmin_cmp(const pairingheap_node *a, const pairingheap_node *b, void *arg)
     911              : {
     912      8219336 :     const SnapshotData *asnap = pairingheap_const_container(SnapshotData, ph_node, a);
     913      8219336 :     const SnapshotData *bsnap = pairingheap_const_container(SnapshotData, ph_node, b);
     914              : 
     915      8219336 :     if (TransactionIdPrecedes(asnap->xmin, bsnap->xmin))
     916        61009 :         return 1;
     917      8158327 :     else if (TransactionIdFollows(asnap->xmin, bsnap->xmin))
     918        11315 :         return -1;
     919              :     else
     920      8147012 :         return 0;
     921              : }
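
/*
 * Illustrative sketch (not part of snapmgr.c, and not covered by this
 * report): why xmin_cmp returns 1 when a's xmin precedes b's.  Assuming, as
 * the comment above implies, that the heap keeps the element that compares
 * greatest at the top, inverting the result puts the smallest xmin first.
 * The demo_* names are hypothetical.  The circular comparison shown here
 * covers normal XIDs only; the real TransactionIdPrecedes also special-cases
 * the permanent XIDs.
 */

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t DemoXid;           /* hypothetical stand-in for TransactionId */

/*
 * Circular (modulo 2^32) ordering: a precedes b when the signed 32-bit
 * difference is negative.  The conversion to int32_t relies on the usual
 * two's-complement behaviour.
 */
static int
demo_xid_precedes(DemoXid a, DemoXid b)
{
    return (int32_t) (a - b) < 0;
}

static int
demo_xid_follows(DemoXid a, DemoXid b)
{
    return (int32_t) (a - b) > 0;
}

/* Inverted comparator: the older (smaller) xmin compares as "greater". */
static int
demo_xmin_cmp(DemoXid a, DemoXid b)
{
    if (demo_xid_precedes(a, b))
        return 1;
    else if (demo_xid_follows(a, b))
        return -1;
    else
        return 0;
}
```

/*
 * With this ordering an xmin just below the 32-bit wraparound point still
 * sorts ahead of a small post-wraparound xmin.
 */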
     922              : 
     923              : /*
     924              :  * SnapshotResetXmin
     925              :  *
     926              :  * If there are no more snapshots, we can reset our PGPROC->xmin to
     927              :  * InvalidTransactionId. Note we can do this without locking because we assume
     928              :  * that storing an Xid is atomic.
     929              :  *
     930              :  * Even if there are some remaining snapshots, we may be able to advance our
     931              :  * PGPROC->xmin to some degree.  This typically happens when a portal is
     932              :  * dropped.  For efficiency, we only consider recomputing PGPROC->xmin when
     933              :  * the active snapshot stack is empty; this allows us not to need to track
     934              :  * which active snapshot is oldest.
     935              :  */
     936              : static void
     937     10547472 : SnapshotResetXmin(void)
     938              : {
     939              :     Snapshot    minSnapshot;
     940              : 
     941     10547472 :     if (ActiveSnapshot != NULL)
     942      7586658 :         return;
     943              : 
     944      2960814 :     if (pairingheap_is_empty(&RegisteredSnapshots))
     945              :     {
     946       948746 :         MyProc->xmin = TransactionXmin = InvalidTransactionId;
     947       948746 :         return;
     948              :     }
     949              : 
     950      2012068 :     minSnapshot = pairingheap_container(SnapshotData, ph_node,
     951              :                                         pairingheap_first(&RegisteredSnapshots));
     952              : 
     953      2012068 :     if (TransactionIdPrecedes(MyProc->xmin, minSnapshot->xmin))
     954         4089 :         MyProc->xmin = TransactionXmin = minSnapshot->xmin;
     955              : }
     956              : 
     957              : /*
     958              :  * AtSubCommit_Snapshot
     959              :  */
     960              : void
     961         5381 : AtSubCommit_Snapshot(int level)
     962              : {
     963              :     ActiveSnapshotElt *active;
     964              : 
     965              :     /*
     966              :      * Relabel the active snapshots set in this subtransaction as though they
     967              :      * are owned by the parent subxact.
     968              :      */
     969         5381 :     for (active = ActiveSnapshot; active != NULL; active = active->as_next)
     970              :     {
     971         4565 :         if (active->as_level < level)
     972         4565 :             break;
     973            0 :         active->as_level = level - 1;
     974              :     }
     975         5381 : }
     976              : 
     977              : /*
     978              :  * AtSubAbort_Snapshot
     979              :  *      Clean up snapshots after a subtransaction abort
     980              :  */
     981              : void
     982         4722 : AtSubAbort_Snapshot(int level)
     983              : {
     984              :     /* Forget the active snapshots set by this subtransaction */
     985         7588 :     while (ActiveSnapshot && ActiveSnapshot->as_level >= level)
     986              :     {
     987              :         ActiveSnapshotElt *next;
     988              : 
     989         2866 :         next = ActiveSnapshot->as_next;
     990              : 
     991              :         /*
     992              :          * Decrement the snapshot's active count.  If it's still registered or
     993              :          * marked as active by an outer subtransaction, we can't free it yet.
     994              :          */
     995              :         Assert(ActiveSnapshot->as_snap->active_count >= 1);
     996         2866 :         ActiveSnapshot->as_snap->active_count -= 1;
     997              : 
     998         2866 :         if (ActiveSnapshot->as_snap->active_count == 0 &&
     999         2866 :             ActiveSnapshot->as_snap->regd_count == 0)
    1000         2866 :             FreeSnapshot(ActiveSnapshot->as_snap);
    1001              : 
    1002              :         /* and free the stack element */
    1003         2866 :         pfree(ActiveSnapshot);
    1004              : 
    1005         2866 :         ActiveSnapshot = next;
    1006              :     }
    1007              : 
    1008         4722 :     SnapshotResetXmin();
    1009         4722 : }
    1010              : 
    1011              : /*
    1012              :  * AtEOXact_Snapshot
    1013              :  *      Snapshot manager's cleanup function for end of transaction
    1014              :  */
    1015              : void
    1016       564875 : AtEOXact_Snapshot(bool isCommit, bool resetXmin)
    1017              : {
    1018              :     /*
    1019              :      * In transaction-snapshot mode we must release our privately-managed
    1020              :      * reference to the transaction snapshot.  We must remove it from
    1021              :      * RegisteredSnapshots to keep the check below happy.  But we don't bother
    1022              :      * to do FreeSnapshot, for two reasons: the memory will go away with
    1023              :      * TopTransactionContext anyway, and if someone has left the snapshot
    1024              :      * stacked as active, we don't want the code below to be chasing through a
    1025              :      * dangling pointer.
    1026              :      */
    1027       564875 :     if (FirstXactSnapshot != NULL)
    1028              :     {
    1029              :         Assert(FirstXactSnapshot->regd_count > 0);
    1030              :         Assert(!pairingheap_is_empty(&RegisteredSnapshots));
    1031         3112 :         pairingheap_remove(&RegisteredSnapshots, &FirstXactSnapshot->ph_node);
    1032              :     }
    1033       564875 :     FirstXactSnapshot = NULL;
    1034              : 
    1035              :     /*
    1036              :      * If we exported any snapshots, clean them up.
    1037              :      */
    1038       564875 :     if (exportedSnapshots != NIL)
    1039              :     {
    1040              :         ListCell   *lc;
    1041              : 
    1042              :         /*
    1043              :          * Get rid of the files.  Unlink failure is only a WARNING because (1)
    1044              :          * it's too late to abort the transaction, and (2) leaving a leaked
    1045              :          * file around has little real consequence anyway.
    1046              :          *
    1047              :          * We also need to remove the snapshots from RegisteredSnapshots to
    1048              :          * prevent a warning below.
    1049              :          *
    1050              :          * As with the FirstXactSnapshot, we don't need to free resources of
    1051              :          * the snapshot itself as it will go away with the memory context.
    1052              :          */
    1053           18 :         foreach(lc, exportedSnapshots)
    1054              :         {
    1055            9 :             ExportedSnapshot *esnap = (ExportedSnapshot *) lfirst(lc);
    1056              : 
    1057            9 :             if (unlink(esnap->snapfile))
    1058            0 :                 elog(WARNING, "could not unlink file \"%s\": %m",
    1059              :                      esnap->snapfile);
    1060              : 
    1061            9 :             pairingheap_remove(&RegisteredSnapshots,
    1062            9 :                                &esnap->snapshot->ph_node);
    1063              :         }
    1064              : 
    1065            9 :         exportedSnapshots = NIL;
    1066              :     }
    1067              : 
    1068              :     /* Drop catalog snapshot if any */
    1069       564875 :     InvalidateCatalogSnapshot();
    1070              : 
    1071              :     /* On commit, complain about leftover snapshots */
    1072       564875 :     if (isCommit)
    1073              :     {
    1074              :         ActiveSnapshotElt *active;
    1075              : 
    1076       538181 :         if (!pairingheap_is_empty(&RegisteredSnapshots))
    1077            0 :             elog(WARNING, "registered snapshots seem to remain after cleanup");
    1078              : 
    1079              :         /* complain about unpopped active snapshots */
    1080       538181 :         for (active = ActiveSnapshot; active != NULL; active = active->as_next)
    1081            0 :             elog(WARNING, "snapshot %p still active", active);
    1082              :     }
    1083              : 
    1084              :     /*
    1085              :      * And reset our state.  We don't need to free the memory explicitly --
    1086              :      * it'll go away with TopTransactionContext.
    1087              :      */
    1088       564875 :     ActiveSnapshot = NULL;
    1089       564875 :     pairingheap_reset(&RegisteredSnapshots);
    1090              : 
    1091       564875 :     CurrentSnapshot = NULL;
    1092       564875 :     SecondarySnapshot = NULL;
    1093              : 
    1094       564875 :     FirstSnapshotSet = false;
    1095              : 
    1096              :     /*
    1097              :      * During normal commit processing, we call ProcArrayEndTransaction() to
    1098              :      * reset MyProc->xmin. That call happens prior to the call to
    1099              :      * AtEOXact_Snapshot(), so we need not touch xmin here at all.
    1100              :      */
    1101       564875 :     if (resetXmin)
    1102        27013 :         SnapshotResetXmin();
    1103              : 
    1104              :     Assert(resetXmin || MyProc->xmin == 0);
    1105       564875 : }
    1106              : 
    1107              : 
    1108              : /*
    1109              :  * ExportSnapshot
    1110              :  *      Export the snapshot to a file so that other backends can import it.
    1111              :  *      Returns the token (the file name) that can be used to import this
    1112              :  *      snapshot.
    1113              :  */
    1114              : char *
    1115            9 : ExportSnapshot(Snapshot snapshot)
    1116              : {
    1117              :     TransactionId topXid;
    1118              :     TransactionId *children;
    1119              :     ExportedSnapshot *esnap;
    1120              :     int         nchildren;
    1121              :     int         addTopXid;
    1122              :     StringInfoData buf;
    1123              :     FILE       *f;
    1124              :     int         i;
    1125              :     MemoryContext oldcxt;
    1126              :     char        path[MAXPGPATH];
    1127              :     char        pathtmp[MAXPGPATH];
    1128              : 
    1129              :     /*
    1130              :      * It's tempting to call RequireTransactionBlock here, since it's not very
    1131              :      * useful to export a snapshot that will disappear immediately afterwards.
    1132              :      * However, we haven't got enough information to do that, since we don't
    1133              :      * know if we're at top level or not.  For example, we could be inside a
    1134              :      * plpgsql function that is going to fire off other transactions via
    1135              :      * dblink.  Rather than disallow perfectly legitimate usages, don't make a
    1136              :      * check.
    1137              :      *
    1138              :      * Also note that we don't make any restriction on the transaction's
    1139              :      * isolation level; however, importers must check the level if they are
    1140              :      * serializable.
    1141              :      */
    1142              : 
    1143              :     /*
    1144              :      * Get our transaction ID if there is one, to include in the snapshot.
    1145              :      */
    1146            9 :     topXid = GetTopTransactionIdIfAny();
    1147              : 
    1148              :     /*
    1149              :      * We cannot export a snapshot from a subtransaction because there's no
    1150              :      * easy way for importers to verify that the same subtransaction is still
    1151              :      * running.
    1152              :      */
    1153            9 :     if (IsSubTransaction())
    1154            0 :         ereport(ERROR,
    1155              :                 (errcode(ERRCODE_ACTIVE_SQL_TRANSACTION),
    1156              :                  errmsg("cannot export a snapshot from a subtransaction")));
    1157              : 
    1158              :     /*
    1159              :      * We do, however, allow previously committed subtransactions to exist.
    1160              :      * Importers of the snapshot must see them as still running, so get their
    1161              :      * XIDs to add them to the snapshot.
    1162              :      */
    1163            9 :     nchildren = xactGetCommittedChildren(&children);
    1164              : 
    1165              :     /*
    1166              :      * Generate file path for the snapshot.  We start numbering of snapshots
    1167              :      * inside the transaction from 1.
    1168              :      */
    1169            9 :     snprintf(path, sizeof(path), SNAPSHOT_EXPORT_DIR "/%08X-%08X-%d",
    1170            9 :              MyProc->vxid.procNumber, MyProc->vxid.lxid,
    1171            9 :              list_length(exportedSnapshots) + 1);
    1172              : 
    1173              :     /*
    1174              :      * Copy the snapshot into TopTransactionContext, add it to the
    1175              :      * exportedSnapshots list, and mark it pseudo-registered.  We do this to
    1176              :      * ensure that the snapshot's xmin is honored for the rest of the
    1177              :      * transaction.
    1178              :      */
    1179            9 :     snapshot = CopySnapshot(snapshot);
    1180              : 
    1181            9 :     oldcxt = MemoryContextSwitchTo(TopTransactionContext);
    1182            9 :     esnap = palloc_object(ExportedSnapshot);
    1183            9 :     esnap->snapfile = pstrdup(path);
    1184            9 :     esnap->snapshot = snapshot;
    1185            9 :     exportedSnapshots = lappend(exportedSnapshots, esnap);
    1186            9 :     MemoryContextSwitchTo(oldcxt);
    1187              : 
    1188            9 :     snapshot->regd_count++;
    1189            9 :     pairingheap_add(&RegisteredSnapshots, &snapshot->ph_node);
    1190              : 
    1191              :     /*
    1192              :      * Fill buf with a text serialization of the snapshot, plus identification
    1193              :      * data about this transaction.  The format expected by ImportSnapshot is
    1194              :      * pretty rigid: each line must be fieldname:value.
    1195              :      */
    1196            9 :     initStringInfo(&buf);
    1197              : 
    1198            9 :     appendStringInfo(&buf, "vxid:%d/%u\n", MyProc->vxid.procNumber, MyProc->vxid.lxid);
    1199            9 :     appendStringInfo(&buf, "pid:%d\n", MyProcPid);
    1200            9 :     appendStringInfo(&buf, "dbid:%u\n", MyDatabaseId);
    1201            9 :     appendStringInfo(&buf, "iso:%d\n", XactIsoLevel);
    1202            9 :     appendStringInfo(&buf, "ro:%d\n", XactReadOnly);
    1203              : 
    1204            9 :     appendStringInfo(&buf, "xmin:%u\n", snapshot->xmin);
    1205            9 :     appendStringInfo(&buf, "xmax:%u\n", snapshot->xmax);
    1206              : 
    1207              :     /*
    1208              :      * We must include our own top transaction ID in the top-xid data, since
    1209              :      * by definition we will still be running when the importing transaction
    1210              :      * adopts the snapshot, but GetSnapshotData never includes our own XID in
    1211              :      * the snapshot.  (There must, therefore, be enough room to add it.)
    1212              :      *
    1213              :      * However, it could be that our topXid is after the xmax, in which case
    1214              :      * we shouldn't include it because xip[] members are expected to be before
    1215              :      * xmax.  (We need not make the same check for subxip[] members, see
    1216              :      * snapshot.h.)
    1217              :      */
    1218            9 :     addTopXid = (TransactionIdIsValid(topXid) &&
    1219            9 :                  TransactionIdPrecedes(topXid, snapshot->xmax)) ? 1 : 0;
    1220            9 :     appendStringInfo(&buf, "xcnt:%d\n", snapshot->xcnt + addTopXid);
    1221            9 :     for (i = 0; i < snapshot->xcnt; i++)
    1222            0 :         appendStringInfo(&buf, "xip:%u\n", snapshot->xip[i]);
    1223            9 :     if (addTopXid)
    1224            0 :         appendStringInfo(&buf, "xip:%u\n", topXid);
    1225              : 
    1226              :     /*
    1227              :      * Similarly, we add our subcommitted child XIDs to the subxid data. Here,
    1228              :      * we have to cope with possible overflow.
    1229              :      */
    1230           18 :     if (snapshot->suboverflowed ||
    1231            9 :         snapshot->subxcnt + nchildren > GetMaxSnapshotSubxidCount())
    1232            0 :         appendStringInfoString(&buf, "sof:1\n");
    1233              :     else
    1234              :     {
    1235            9 :         appendStringInfoString(&buf, "sof:0\n");
    1236            9 :         appendStringInfo(&buf, "sxcnt:%d\n", snapshot->subxcnt + nchildren);
    1237            9 :         for (i = 0; i < snapshot->subxcnt; i++)
    1238            0 :             appendStringInfo(&buf, "sxp:%u\n", snapshot->subxip[i]);
    1239            9 :         for (i = 0; i < nchildren; i++)
    1240            0 :             appendStringInfo(&buf, "sxp:%u\n", children[i]);
    1241              :     }
    1242            9 :     appendStringInfo(&buf, "rec:%u\n", snapshot->takenDuringRecovery);
    1243              : 
    1244              :     /*
    1245              :      * Now write the text representation into a file.  We first write to a
    1246              :      * ".tmp" filename, and rename to final filename if no error.  This
    1247              :      * ensures that no other backend can read an incomplete file
    1248              :      * (ImportSnapshot won't allow it because of its valid-characters check).
    1249              :      */
    1250            9 :     snprintf(pathtmp, sizeof(pathtmp), "%s.tmp", path);
    1251            9 :     if (!(f = AllocateFile(pathtmp, PG_BINARY_W)))
    1252            0 :         ereport(ERROR,
    1253              :                 (errcode_for_file_access(),
    1254              :                  errmsg("could not create file \"%s\": %m", pathtmp)));
    1255              : 
    1256            9 :     if (fwrite(buf.data, buf.len, 1, f) != 1)
    1257            0 :         ereport(ERROR,
    1258              :                 (errcode_for_file_access(),
    1259              :                  errmsg("could not write to file \"%s\": %m", pathtmp)));
    1260              : 
    1261              :     /* no fsync() since file need not survive a system crash */
    1262              : 
    1263            9 :     if (FreeFile(f))
    1264            0 :         ereport(ERROR,
    1265              :                 (errcode_for_file_access(),
    1266              :                  errmsg("could not write to file \"%s\": %m", pathtmp)));
    1267              : 
    1268              :     /*
    1269              :      * Now that we have written everything into a .tmp file, rename the file
    1270              :      * to remove the .tmp suffix.
    1271              :      */
    1272            9 :     if (rename(pathtmp, path) < 0)
    1273            0 :         ereport(ERROR,
    1274              :                 (errcode_for_file_access(),
    1275              :                  errmsg("could not rename file \"%s\" to \"%s\": %m",
    1276              :                         pathtmp, path)));
    1277              : 
    1278              :     /*
    1279              :      * The basename of the file is what we return from pg_export_snapshot().
    1280              :      * It's already in path in a textual format and we know that the path
    1281              :      * starts with SNAPSHOT_EXPORT_DIR.  Skip over the prefix and the slash
    1282              :      * and pstrdup it so as not to return the address of a local variable.
    1283              :      */
    1284            9 :     return pstrdup(path + strlen(SNAPSHOT_EXPORT_DIR) + 1);
    1285              : }
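The serialization step in ExportSnapshot can be illustrated with a minimal standalone sketch. It assumes a hypothetical cut-down `MiniSnapshot` struct and plain `snprintf` in place of PostgreSQL's StringInfo, and it emits only the xmin/xmax/xcnt/xip lines. The helper `xid_precedes` is a simplified stand-in for TransactionIdPrecedes (modulo-2^32 comparison, special XIDs ignored), and `top_xid != 0` stands in for TransactionIdIsValid. None of these names exist in snapmgr.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, cut-down snapshot; not PostgreSQL's SnapshotData. */
typedef struct MiniSnapshot
{
    uint32_t    xmin;
    uint32_t    xmax;
    int         xcnt;
    const uint32_t *xip;
} MiniSnapshot;

/* Simplified stand-in for TransactionIdPrecedes (normal XIDs only). */
static bool
xid_precedes(uint32_t a, uint32_t b)
{
    return (int32_t) (a - b) < 0;
}

/* Append one "name:value\n" line; false if the buffer is too small. */
static bool
append_field(char *buf, size_t buflen, size_t *used,
             const char *name, uint32_t value)
{
    int         n = snprintf(buf + *used, buflen - *used, "%s:%u\n",
                             name, (unsigned int) value);

    if (n < 0 || (size_t) n >= buflen - *used)
        return false;
    *used += (size_t) n;
    return true;
}

/*
 * Serialize the snapshot as fieldname:value lines.  The exporter's own
 * top-level XID is added to the xip list only if it is valid and precedes
 * xmax.  Returns the number of bytes written, or -1 if buf is too small.
 */
static int
serialize_mini_snapshot(const MiniSnapshot *snap, uint32_t top_xid,
                        char *buf, size_t buflen)
{
    size_t      used = 0;
    int         add_top;
    int         i;

    assert(snap != NULL && buf != NULL);
    add_top = (top_xid != 0 && xid_precedes(top_xid, snap->xmax)) ? 1 : 0;

    if (!append_field(buf, buflen, &used, "xmin", snap->xmin) ||
        !append_field(buf, buflen, &used, "xmax", snap->xmax) ||
        !append_field(buf, buflen, &used, "xcnt",
                      (uint32_t) (snap->xcnt + add_top)))
        return -1;
    for (i = 0; i < snap->xcnt; i++)
    {
        if (!append_field(buf, buflen, &used, "xip", snap->xip[i]))
            return -1;
    }
    if (add_top && !append_field(buf, buflen, &used, "xip", top_xid))
        return -1;
    return (int) used;
}
```

With xmax = 110, a top-level XID of 107 is appended as an extra xip line and counted in xcnt, while a top-level XID of 120 is left out because it does not precede xmax.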
    1286              : 
    1287              : /*
    1288              :  * pg_export_snapshot
    1289              :  *      SQL-callable wrapper for ExportSnapshot.
    1290              :  */
    1291              : Datum
    1292            8 : pg_export_snapshot(PG_FUNCTION_ARGS)
    1293              : {
    1294              :     char       *snapshotName;
    1295              : 
    1296            8 :     snapshotName = ExportSnapshot(GetActiveSnapshot());
    1297            8 :     PG_RETURN_TEXT_P(cstring_to_text(snapshotName));
    1298              : }
    1299              : 
    1300              : 
    1301              : /*
    1302              :  * Parsing subroutines for ImportSnapshot: parse a line with the given
    1303              :  * prefix followed by a value, and advance *s to the next line.  The
    1304              :  * filename is provided for use in error messages.
    1305              :  */
    1306              : static int
    1307          112 : parseIntFromText(const char *prefix, char **s, const char *filename)
    1308              : {
    1309          112 :     char       *ptr = *s;
    1310          112 :     int         prefixlen = strlen(prefix);
    1311              :     int         val;
    1312              : 
    1313          112 :     if (strncmp(ptr, prefix, prefixlen) != 0)
    1314            0 :         ereport(ERROR,
    1315              :                 (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
    1316              :                  errmsg("invalid snapshot data in file \"%s\"", filename)));
    1317          112 :     ptr += prefixlen;
    1318          112 :     if (sscanf(ptr, "%d", &val) != 1)
    1319            0 :         ereport(ERROR,
    1320              :                 (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
    1321              :                  errmsg("invalid snapshot data in file \"%s\"", filename)));
    1322          112 :     ptr = strchr(ptr, '\n');
    1323          112 :     if (!ptr)
    1324            0 :         ereport(ERROR,
    1325              :                 (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
    1326              :                  errmsg("invalid snapshot data in file \"%s\"", filename)));
    1327          112 :     *s = ptr + 1;
    1328          112 :     return val;
    1329              : }
    1330              : 
    1331              : static TransactionId
    1332           48 : parseXidFromText(const char *prefix, char **s, const char *filename)
    1333              : {
    1334           48 :     char       *ptr = *s;
    1335           48 :     int         prefixlen = strlen(prefix);
    1336              :     TransactionId val;
    1337              : 
    1338           48 :     if (strncmp(ptr, prefix, prefixlen) != 0)
    1339            0 :         ereport(ERROR,
    1340              :                 (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
    1341              :                  errmsg("invalid snapshot data in file \"%s\"", filename)));
    1342           48 :     ptr += prefixlen;
    1343           48 :     if (sscanf(ptr, "%u", &val) != 1)
    1344            0 :         ereport(ERROR,
    1345              :                 (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
    1346              :                  errmsg("invalid snapshot data in file \"%s\"", filename)));
    1347           48 :     ptr = strchr(ptr, '\n');
    1348           48 :     if (!ptr)
    1349            0 :         ereport(ERROR,
    1350              :                 (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
    1351              :                  errmsg("invalid snapshot data in file \"%s\"", filename)));
    1352           48 :     *s = ptr + 1;
    1353           48 :     return val;
    1354              : }
    1355              : 
    1356              : static void
    1357           16 : parseVxidFromText(const char *prefix, char **s, const char *filename,
    1358              :                   VirtualTransactionId *vxid)
    1359              : {
    1360           16 :     char       *ptr = *s;
    1361           16 :     int         prefixlen = strlen(prefix);
    1362              : 
    1363           16 :     if (strncmp(ptr, prefix, prefixlen) != 0)
    1364            0 :         ereport(ERROR,
    1365              :                 (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
    1366              :                  errmsg("invalid snapshot data in file \"%s\"", filename)));
    1367           16 :     ptr += prefixlen;
    1368           16 :     if (sscanf(ptr, "%d/%u", &vxid->procNumber, &vxid->localTransactionId) != 2)
    1369            0 :         ereport(ERROR,
    1370              :                 (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
    1371              :                  errmsg("invalid snapshot data in file \"%s\"", filename)));
    1372           16 :     ptr = strchr(ptr, '\n');
    1373           16 :     if (!ptr)
    1374            0 :         ereport(ERROR,
    1375              :                 (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
    1376              :                  errmsg("invalid snapshot data in file \"%s\"", filename)));
    1377           16 :     *s = ptr + 1;
    1378           16 : }
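The three parsing subroutines above share one pattern: match the prefix, scan the value, then skip to the character after the next newline. A minimal standalone sketch of that pattern follows. The name `parse_uint_field` is hypothetical, and the sketch returns false where the real routines raise an ERROR through ereport.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one "prefix<unsigned>\n" line starting at *s.  On success, store
 * the value in *out, advance *s to the start of the next line, and return
 * true.  On any mismatch, return false and leave *s and *out untouched.
 */
static bool
parse_uint_field(const char *prefix, const char **s, unsigned int *out)
{
    const char *ptr = *s;
    size_t      prefixlen = strlen(prefix);
    unsigned int val;

    assert(prefix != NULL && ptr != NULL && out != NULL);

    /* the line must start with the expected field name */
    if (strncmp(ptr, prefix, prefixlen) != 0)
        return false;
    ptr += prefixlen;

    /* the value must be readable as an unsigned integer */
    if (sscanf(ptr, "%u", &val) != 1)
        return false;

    /* every field must be terminated by a newline */
    ptr = strchr(ptr, '\n');
    if (ptr == NULL)
        return false;

    *s = ptr + 1;
    *out = val;
    return true;
}
```

Because the cursor only advances on success, a caller can parse fields strictly in the order they were written and stop at the first line that does not match.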
    1379              : 
    1380              : /*
    1381              :  * ImportSnapshot
    1382              :  *      Import a previously exported snapshot.  The argument should be a
    1383              :  *      filename in SNAPSHOT_EXPORT_DIR.  Load the snapshot from that file.
    1384              :  *      This is called by "SET TRANSACTION SNAPSHOT 'foo'".
    1385              :  */
    1386              : void
    1387           22 : ImportSnapshot(const char *idstr)
    1388              : {
    1389              :     char        path[MAXPGPATH];
    1390              :     FILE       *f;
    1391              :     struct stat stat_buf;
    1392              :     char       *filebuf;
    1393              :     int         xcnt;
    1394              :     int         i;
    1395              :     VirtualTransactionId src_vxid;
    1396              :     int         src_pid;
    1397              :     Oid         src_dbid;
    1398              :     int         src_isolevel;
    1399              :     bool        src_readonly;
    1400              :     SnapshotData snapshot;
    1401              : 
    1402              :     /*
    1403              :      * Must be at top level of a fresh transaction.  Note in particular that
    1404              :      * we check we haven't acquired an XID --- if we have, it's conceivable
    1405              :      * that the snapshot would show it as not running, making for very screwy
    1406              :      * behavior.
    1407              :      */
    1408           44 :     if (FirstSnapshotSet ||
    1409           44 :         GetTopTransactionIdIfAny() != InvalidTransactionId ||
    1410           22 :         IsSubTransaction())
    1411            0 :         ereport(ERROR,
    1412              :                 (errcode(ERRCODE_ACTIVE_SQL_TRANSACTION),
    1413              :                  errmsg("SET TRANSACTION SNAPSHOT must be called before any query")));
    1414              : 
    1415              :     /*
    1416              :      * If we are in read committed mode, the next query would execute with
    1417              :      * a new snapshot, making this function call useless.
    1418              :      */
    1419           22 :     if (!IsolationUsesXactSnapshot())
    1420            0 :         ereport(ERROR,
    1421              :                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
    1422              :                  errmsg("a snapshot-importing transaction must have isolation level SERIALIZABLE or REPEATABLE READ")));
    1423              : 
    1424              :     /*
    1425              :      * Verify the identifier: only 0-9, A-F and hyphens are allowed.  We do
    1426              :      * this mainly to prevent reading arbitrary files.
    1427              :      */
    1428           22 :     if (strspn(idstr, "0123456789ABCDEF-") != strlen(idstr))
    1429            3 :         ereport(ERROR,
    1430              :                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
    1431              :                  errmsg("invalid snapshot identifier: \"%s\"", idstr)));
    1432              : 
    1433              :     /* OK, read the file */
    1434           19 :     snprintf(path, MAXPGPATH, SNAPSHOT_EXPORT_DIR "/%s", idstr);
    1435              : 
    1436           19 :     f = AllocateFile(path, PG_BINARY_R);
    1437           19 :     if (!f)
    1438              :     {
    1439              :         /*
    1440              :          * If the file is missing although the identifier is well-formed,
    1441              :          * report a "does not exist" error rather than a system error.
    1442              :          */
    1443            3 :         if (errno == ENOENT)
    1444            3 :             ereport(ERROR,
    1445              :                     (errcode(ERRCODE_UNDEFINED_OBJECT),
    1446              :                      errmsg("snapshot \"%s\" does not exist", idstr)));
    1447              :         else
    1448            0 :             ereport(ERROR,
    1449              :                     (errcode_for_file_access(),
    1450              :                      errmsg("could not open file \"%s\" for reading: %m",
    1451              :                             path)));
    1452              :     }
    1453              : 
    1454              :     /* get the size of the file so that we know how much memory we need */
    1455           16 :     if (fstat(fileno(f), &stat_buf))
    1456            0 :         elog(ERROR, "could not stat file \"%s\": %m", path);
    1457              : 
    1458              :     /* and read the file into a palloc'd string */
    1459           16 :     filebuf = (char *) palloc(stat_buf.st_size + 1);
    1460           16 :     if (fread(filebuf, stat_buf.st_size, 1, f) != 1)
    1461            0 :         elog(ERROR, "could not read file \"%s\": %m", path);
    1462              : 
    1463           16 :     filebuf[stat_buf.st_size] = '\0';
    1464              : 
    1465           16 :     FreeFile(f);
    1466              : 
    1467              :     /*
    1468              :      * Construct a snapshot struct by parsing the file content.
    1469              :      */
    1470           16 :     memset(&snapshot, 0, sizeof(snapshot));
    1471              : 
    1472           16 :     parseVxidFromText("vxid:", &filebuf, path, &src_vxid);
    1473           16 :     src_pid = parseIntFromText("pid:", &filebuf, path);
    1474              :     /* we abuse parseXidFromText a bit here ... */
    1475           16 :     src_dbid = parseXidFromText("dbid:", &filebuf, path);
    1476           16 :     src_isolevel = parseIntFromText("iso:", &filebuf, path);
    1477           16 :     src_readonly = parseIntFromText("ro:", &filebuf, path);
    1478              : 
    1479           16 :     snapshot.snapshot_type = SNAPSHOT_MVCC;
    1480              : 
    1481           16 :     snapshot.xmin = parseXidFromText("xmin:", &filebuf, path);
    1482           16 :     snapshot.xmax = parseXidFromText("xmax:", &filebuf, path);
    1483              : 
    1484           16 :     snapshot.xcnt = xcnt = parseIntFromText("xcnt:", &filebuf, path);
    1485              : 
    1486              :     /* sanity-check the xid count before palloc */
    1487           16 :     if (xcnt < 0 || xcnt > GetMaxSnapshotXidCount())
    1488            0 :         ereport(ERROR,
    1489              :                 (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
    1490              :                  errmsg("invalid snapshot data in file \"%s\"", path)));
    1491              : 
    1492           16 :     snapshot.xip = (TransactionId *) palloc(xcnt * sizeof(TransactionId));
    1493           16 :     for (i = 0; i < xcnt; i++)
    1494            0 :         snapshot.xip[i] = parseXidFromText("xip:", &filebuf, path);
    1495              : 
    1496           16 :     snapshot.suboverflowed = parseIntFromText("sof:", &filebuf, path);
    1497              : 
    1498           16 :     if (!snapshot.suboverflowed)
    1499              :     {
    1500           16 :         snapshot.subxcnt = xcnt = parseIntFromText("sxcnt:", &filebuf, path);
    1501              : 
    1502              :         /* sanity-check the xid count before palloc */
    1503           16 :         if (xcnt < 0 || xcnt > GetMaxSnapshotSubxidCount())
    1504            0 :             ereport(ERROR,
    1505              :                     (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
    1506              :                      errmsg("invalid snapshot data in file \"%s\"", path)));
    1507              : 
    1508           16 :         snapshot.subxip = (TransactionId *) palloc(xcnt * sizeof(TransactionId));
    1509           16 :         for (i = 0; i < xcnt; i++)
    1510            0 :             snapshot.subxip[i] = parseXidFromText("sxp:", &filebuf, path);
    1511              :     }
    1512              :     else
    1513              :     {
    1514            0 :         snapshot.subxcnt = 0;
    1515            0 :         snapshot.subxip = NULL;
    1516              :     }
    1517              : 
    1518           16 :     snapshot.takenDuringRecovery = parseIntFromText("rec:", &filebuf, path);
    1519              : 
    1520              :     /*
    1521              :      * Do some additional sanity checking, just to protect ourselves.  We
    1522              :      * don't trouble to check the array elements, just the most critical
    1523              :      * fields.
    1524              :      */
    1525           16 :     if (!VirtualTransactionIdIsValid(src_vxid) ||
    1526           16 :         !OidIsValid(src_dbid) ||
    1527           16 :         !TransactionIdIsNormal(snapshot.xmin) ||
    1528           16 :         !TransactionIdIsNormal(snapshot.xmax))
    1529            0 :         ereport(ERROR,
    1530              :                 (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
    1531              :                  errmsg("invalid snapshot data in file \"%s\"", path)));
    1532              : 
    1533              :     /*
    1534              :      * If we're serializable, the source transaction must be too, otherwise
    1535              :      * predicate.c has problems (SxactGlobalXmin could go backwards).  Also, a
    1536              :      * non-read-only transaction can't adopt a snapshot from a read-only
    1537              :      * transaction, as predicate.c handles the cases very differently.
    1538              :      */
    1539           16 :     if (IsolationIsSerializable())
    1540              :     {
    1541            0 :         if (src_isolevel != XACT_SERIALIZABLE)
    1542            0 :             ereport(ERROR,
    1543              :                     (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
    1544              :                      errmsg("a serializable transaction cannot import a snapshot from a non-serializable transaction")));
    1545            0 :         if (src_readonly && !XactReadOnly)
    1546            0 :             ereport(ERROR,
    1547              :                     (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
    1548              :                      errmsg("a non-read-only serializable transaction cannot import a snapshot from a read-only transaction")));
    1549              :     }
    1550              : 
    1551              :     /*
    1552              :      * We cannot import a snapshot that was taken in a different database,
    1553              :      * because vacuum calculates OldestXmin on a per-database basis; so the
    1554              :      * source transaction's xmin doesn't protect us from data loss.  This
    1555              :      * restriction could be removed if the source transaction were to mark its
    1556              :      * xmin as being globally applicable.  But that would require some
    1557              :      * additional syntax, since that has to be known when the snapshot is
    1558              :      * initially taken.  (See pgsql-hackers discussion of 2011-10-21.)
    1559              :      */
    1560           16 :     if (src_dbid != MyDatabaseId)
    1561            0 :         ereport(ERROR,
    1562              :                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
    1563              :                  errmsg("cannot import a snapshot from a different database")));
    1564              : 
    1565              :     /* OK, install the snapshot */
    1566           16 :     SetTransactionSnapshot(&snapshot, &src_vxid, src_pid, NULL);
    1567           16 : }
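The identifier check in ImportSnapshot pairs with the file name format used by ExportSnapshot. The following minimal sketch shows both halves in isolation. The function names `format_snapshot_id` and `snapshot_id_chars_ok` are hypothetical, and the sketch takes unsigned arguments for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build an identifier in the "%08X-%08X-%d" layout used for the file name. */
static int
format_snapshot_id(char *buf, size_t buflen,
                   unsigned int proc_number, unsigned int lxid, int seq)
{
    assert(buf != NULL);
    return snprintf(buf, buflen, "%08X-%08X-%d", proc_number, lxid, seq);
}

/* Accept only the characters 0-9, A-F and '-', as the import check does. */
static bool
snapshot_id_chars_ok(const char *idstr)
{
    return strspn(idstr, "0123456789ABCDEF-") == strlen(idstr);
}
```

Under this check, a name containing a slash or a dot is rejected, so neither a relative path nor a half-written ".tmp" file can be named by an importer.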
    1568              : 
    1569              : /*
    1570              :  * XactHasExportedSnapshots
    1571              :  *      Test whether current transaction has exported any snapshots.
    1572              :  */
    1573              : bool
    1574          333 : XactHasExportedSnapshots(void)
    1575              : {
    1576          333 :     return (exportedSnapshots != NIL);
    1577              : }
    1578              : 
    1579              : /*
    1580              :  * DeleteAllExportedSnapshotFiles
    1581              :  *      Clean up any files that have been left behind by a crashed backend
    1582              :  *      that had exported snapshots before it died.
    1583              :  *
    1584              :  * This should be called during database startup or crash recovery.
    1585              :  */
    1586              : void
    1587          224 : DeleteAllExportedSnapshotFiles(void)
    1588              : {
    1589              :     char        buf[MAXPGPATH + sizeof(SNAPSHOT_EXPORT_DIR)];
    1590              :     DIR        *s_dir;
    1591              :     struct dirent *s_de;
    1592              : 
    1593              :     /*
    1594              :      * Problems in reading the directory, or unlinking files, are reported at
    1595              :      * LOG level.  Since we're running in the startup process, ERROR level
    1596              :      * would prevent database start, and it's not important enough for that.
    1597              :      */
    1598          224 :     s_dir = AllocateDir(SNAPSHOT_EXPORT_DIR);
    1599              : 
    1600          672 :     while ((s_de = ReadDirExtended(s_dir, SNAPSHOT_EXPORT_DIR, LOG)) != NULL)
    1601              :     {
    1602          448 :         if (strcmp(s_de->d_name, ".") == 0 ||
    1603          224 :             strcmp(s_de->d_name, "..") == 0)
    1604          448 :             continue;
    1605              : 
    1606            0 :         snprintf(buf, sizeof(buf), SNAPSHOT_EXPORT_DIR "/%s", s_de->d_name);
    1607              : 
    1608            0 :         if (unlink(buf) != 0)
    1609            0 :             ereport(LOG,
    1610              :                     (errcode_for_file_access(),
    1611              :                      errmsg("could not remove file \"%s\": %m", buf)));
    1612              :     }
    1613              : 
    1614          224 :     FreeDir(s_dir);
    1615          224 : }
    1616              : 
    1617              : /*
    1618              :  * ThereAreNoPriorRegisteredSnapshots
    1619              :  *      Is the registered snapshot count less than or equal to one?
    1620              :  *
    1621              :  * Don't use this to settle important decisions.  While zero registrations and
    1622              :  * no ActiveSnapshot would confirm a certain idleness, the system makes no
    1623              :  * guarantees about the significance of one registered snapshot.
    1624              :  */
    1625              : bool
    1626           30 : ThereAreNoPriorRegisteredSnapshots(void)
    1627              : {
    1628           30 :     if (pairingheap_is_empty(&RegisteredSnapshots) ||
    1629            0 :         pairingheap_is_singular(&RegisteredSnapshots))
    1630           30 :         return true;
    1631              : 
    1632            0 :     return false;
    1633              : }
    1634              : 
    1635              : /*
    1636              :  * HaveRegisteredOrActiveSnapshot
    1637              :  *      Is there any registered or active snapshot?
    1638              :  *
    1639              :  * NB: Unless pushed or active, the cached catalog snapshot will not cause
    1640              :  * this function to return true. That allows this function to be used in
    1641              :  * checks enforcing a longer-lived snapshot.
    1642              :  */
    1643              : bool
    1644        28818 : HaveRegisteredOrActiveSnapshot(void)
    1645              : {
    1646        28818 :     if (ActiveSnapshot != NULL)
    1647        28546 :         return true;
    1648              : 
    1649              :     /*
    1650              :      * The catalog snapshot is in RegisteredSnapshots when valid, but can be
    1651              :      * removed at any time due to invalidation processing. If a snapshot was
    1652              :      * registered explicitly, RegisteredSnapshots must hold more than one entry.
    1653              :      */
    1654          272 :     if (CatalogSnapshot != NULL &&
    1655           16 :         pairingheap_is_singular(&RegisteredSnapshots))
    1656            0 :         return false;
    1657              : 
    1658          272 :     return !pairingheap_is_empty(&RegisteredSnapshots);
    1659              : }
    1660              : 
    1661              : 
    1662              : /*
    1663              :  * Set up a snapshot that replaces normal catalog snapshots, so that catalog
    1664              :  * access behaves just like it did at a certain point in the past.
    1665              :  *
    1666              :  * Needed for logical decoding.
    1667              :  */
    1668              : void
    1669         5463 : SetupHistoricSnapshot(Snapshot historic_snapshot, HTAB *tuplecids)
    1670              : {
    1671              :     Assert(historic_snapshot != NULL);
    1672              : 
    1673              :     /* set up the time-travel snapshot */
    1674         5463 :     HistoricSnapshot = historic_snapshot;
    1675              : 
    1676              :     /* set up the (cmin, cmax) lookup hash */
    1677         5463 :     tuplecid_data = tuplecids;
    1678         5463 : }
    1679              : 
    1680              : 
    1681              : /*
    1682              :  * Make catalog snapshots behave normally again.
    1683              :  */
    1684              : void
    1685         5457 : TeardownHistoricSnapshot(bool is_error)
    1686              : {
    1687         5457 :     HistoricSnapshot = NULL;
    1688         5457 :     tuplecid_data = NULL;
    1689         5457 : }
    1690              : 
    1691              : bool
    1692     11292982 : HistoricSnapshotActive(void)
    1693              : {
    1694     11292982 :     return HistoricSnapshot != NULL;
    1695              : }
    1696              : 
    1697              : HTAB *
    1698          724 : HistoricSnapshotGetTupleCids(void)
    1699              : {
    1700              :     Assert(HistoricSnapshotActive());
    1701          724 :     return tuplecid_data;
    1702              : }
    1703              : 
    1704              : /*
    1705              :  * EstimateSnapshotSpace
    1706              :  *      Returns the size needed to store the given snapshot.
    1707              :  *
    1708              :  * We are exporting only required fields from the Snapshot, stored in
    1709              :  * SerializedSnapshotData.
    1710              :  */
    1711              : Size
    1712         1326 : EstimateSnapshotSpace(Snapshot snapshot)
    1713              : {
    1714              :     Size        size;
    1715              : 
    1716              :     Assert(snapshot != InvalidSnapshot);
    1717              :     Assert(snapshot->snapshot_type == SNAPSHOT_MVCC);
    1718              : 
    1719              :     /* We allocate any XID arrays needed in the same palloc block. */
    1720         1326 :     size = add_size(sizeof(SerializedSnapshotData),
    1721         1326 :                     mul_size(snapshot->xcnt, sizeof(TransactionId)));
    1722         1326 :     if (snapshot->subxcnt > 0 &&
    1723            2 :         (!snapshot->suboverflowed || snapshot->takenDuringRecovery))
    1724            2 :         size = add_size(size,
    1725            2 :                         mul_size(snapshot->subxcnt, sizeof(TransactionId)));
    1726              : 
    1727         1326 :     return size;
    1728              : }
    1729              : 
    1730              : /*
    1731              :  * SerializeSnapshot
    1732              :  *      Dumps the serialized snapshot (extracted from given snapshot) onto the
    1733              :  *      memory location at start_address.
    1734              :  */
    1735              : void
    1736         1159 : SerializeSnapshot(Snapshot snapshot, char *start_address)
    1737              : {
    1738              :     SerializedSnapshotData serialized_snapshot;
    1739              : 
    1740              :     Assert(snapshot->subxcnt >= 0);
    1741              : 
    1742              :     /* Copy all required fields */
    1743         1159 :     serialized_snapshot.xmin = snapshot->xmin;
    1744         1159 :     serialized_snapshot.xmax = snapshot->xmax;
    1745         1159 :     serialized_snapshot.xcnt = snapshot->xcnt;
    1746         1159 :     serialized_snapshot.subxcnt = snapshot->subxcnt;
    1747         1159 :     serialized_snapshot.suboverflowed = snapshot->suboverflowed;
    1748         1159 :     serialized_snapshot.takenDuringRecovery = snapshot->takenDuringRecovery;
    1749         1159 :     serialized_snapshot.curcid = snapshot->curcid;
    1750              : 
    1751              :     /*
    1752              :      * Ignore the SubXID array if it has overflowed, unless the snapshot was
    1753              :      * taken during recovery - in that case, top-level XIDs are in subxip as
    1754              :      * well, and we mustn't lose them.
    1755              :      */
    1756         1159 :     if (serialized_snapshot.suboverflowed && !snapshot->takenDuringRecovery)
    1757            0 :         serialized_snapshot.subxcnt = 0;
    1758              : 
    1759              :     /* Copy struct to possibly-unaligned buffer */
    1760         1159 :     memcpy(start_address,
    1761              :            &serialized_snapshot, sizeof(SerializedSnapshotData));
    1762              : 
    1763              :     /* Copy XID array */
    1764         1159 :     if (snapshot->xcnt > 0)
    1765          428 :         memcpy((TransactionId *) (start_address +
    1766              :                                   sizeof(SerializedSnapshotData)),
    1767          428 :                snapshot->xip, snapshot->xcnt * sizeof(TransactionId));
    1768              : 
    1769              :     /*
    1770              :      * Copy SubXID array. Don't bother to copy it if it had overflowed,
    1771              :      * though, because it's not used anywhere in that case. Except if it's a
    1772              :      * snapshot taken during recovery; all the top-level XIDs are in subxip as
    1773              :      * well in that case, so we mustn't lose them.
    1774              :      */
    1775         1159 :     if (serialized_snapshot.subxcnt > 0)
    1776              :     {
    1777            2 :         Size        subxipoff = sizeof(SerializedSnapshotData) +
    1778            2 :             snapshot->xcnt * sizeof(TransactionId);
    1779              : 
    1780            2 :         memcpy((TransactionId *) (start_address + subxipoff),
    1781            2 :                snapshot->subxip, snapshot->subxcnt * sizeof(TransactionId));
    1782              :     }
    1783         1159 : }
    1784              : 
    1785              : /*
    1786              :  * RestoreSnapshot
    1787              :  *      Restore a serialized snapshot from the specified address.
    1788              :  *
    1789              :  * The copy is palloc'd in TopTransactionContext and has initial refcounts set
    1790              :  * to 0.  The returned snapshot has the copied flag set.
    1791              :  */
    1792              : Snapshot
    1793         3620 : RestoreSnapshot(char *start_address)
    1794              : {
    1795              :     SerializedSnapshotData serialized_snapshot;
    1796              :     Size        size;
    1797              :     Snapshot    snapshot;
    1798              :     TransactionId *serialized_xids;
    1799              : 
    1800         3620 :     memcpy(&serialized_snapshot, start_address,
    1801              :            sizeof(SerializedSnapshotData));
    1802         3620 :     serialized_xids = (TransactionId *)
    1803              :         (start_address + sizeof(SerializedSnapshotData));
    1804              : 
    1805              :     /* We allocate any XID arrays needed in the same palloc block. */
    1806         3620 :     size = sizeof(SnapshotData)
    1807         3620 :         + serialized_snapshot.xcnt * sizeof(TransactionId)
    1808         3620 :         + serialized_snapshot.subxcnt * sizeof(TransactionId);
    1809              : 
    1810              :     /* Copy all required fields */
    1811         3620 :     snapshot = (Snapshot) MemoryContextAlloc(TopTransactionContext, size);
    1812         3620 :     snapshot->snapshot_type = SNAPSHOT_MVCC;
    1813         3620 :     snapshot->xmin = serialized_snapshot.xmin;
    1814         3620 :     snapshot->xmax = serialized_snapshot.xmax;
    1815         3620 :     snapshot->xip = NULL;
    1816         3620 :     snapshot->xcnt = serialized_snapshot.xcnt;
    1817         3620 :     snapshot->subxip = NULL;
    1818         3620 :     snapshot->subxcnt = serialized_snapshot.subxcnt;
    1819         3620 :     snapshot->suboverflowed = serialized_snapshot.suboverflowed;
    1820         3620 :     snapshot->takenDuringRecovery = serialized_snapshot.takenDuringRecovery;
    1821         3620 :     snapshot->curcid = serialized_snapshot.curcid;
    1822         3620 :     snapshot->snapXactCompletionCount = 0;
    1823              : 
    1824              :     /* Copy XIDs, if present. */
    1825         3620 :     if (serialized_snapshot.xcnt > 0)
    1826              :     {
    1827         1023 :         snapshot->xip = (TransactionId *) (snapshot + 1);
    1828         1023 :         memcpy(snapshot->xip, serialized_xids,
    1829         1023 :                serialized_snapshot.xcnt * sizeof(TransactionId));
    1830              :     }
    1831              : 
    1832              :     /* Copy SubXIDs, if present. */
    1833         3620 :     if (serialized_snapshot.subxcnt > 0)
    1834              :     {
    1835            9 :         snapshot->subxip = ((TransactionId *) (snapshot + 1)) +
    1836            9 :             serialized_snapshot.xcnt;
    1837            9 :         memcpy(snapshot->subxip, serialized_xids + serialized_snapshot.xcnt,
    1838            9 :                serialized_snapshot.subxcnt * sizeof(TransactionId));
    1839              :     }
    1840              : 
    1841              :     /* Set the copied flag so that the caller will set refcounts correctly. */
    1842         3620 :     snapshot->regd_count = 0;
    1843         3620 :     snapshot->active_count = 0;
    1844         3620 :     snapshot->copied = true;
    1845              : 
    1846         3620 :     return snapshot;
    1847              : }
    1848              : 
    1849              : /*
    1850              :  * Install a restored snapshot as the transaction snapshot.
    1851              :  */
    1852              : void
    1853         1697 : RestoreTransactionSnapshot(Snapshot snapshot, PGPROC *source_pgproc)
    1854              : {
    1855         1697 :     SetTransactionSnapshot(snapshot, NULL, InvalidPid, source_pgproc);
    1856         1697 : }
    1857              : 
    1858              : /*
    1859              :  * XidInMVCCSnapshot
    1860              :  *      Is the given XID still-in-progress according to the snapshot?
    1861              :  *
    1862              :  * Note: GetSnapshotData never stores either top xid or subxids of our own
    1863              :  * backend into a snapshot, so these xids will not be reported as "running"
    1864              :  * by this function.  This is OK for current uses, because we always check
    1865              :  * TransactionIdIsCurrentTransactionId first, except when it's known the
    1866              :  * XID could not be ours anyway.
    1867              :  */
    1868              : bool
    1869     77411888 : XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
    1870              : {
    1871              :     /*
    1872              :      * Make a quick range check to eliminate most XIDs without looking at the
    1873              :      * xip arrays.  Note that this is OK even if we convert a subxact XID to
    1874              :      * its parent below, because a subxact with XID < xmin has surely also got
    1875              :      * a parent with XID < xmin, while one with XID >= xmax must belong to a
    1876              :      * parent that was not yet committed at the time of this snapshot.
    1877              :      */
    1878              : 
    1879              :     /* Any xid < xmin is not in-progress */
    1880     77411888 :     if (TransactionIdPrecedes(xid, snapshot->xmin))
    1881     73433803 :         return false;
    1882              :     /* Any xid >= xmax is in-progress */
    1883      3978085 :     if (TransactionIdFollowsOrEquals(xid, snapshot->xmax))
    1884        19719 :         return true;
    1885              : 
    1886              :     /*
    1887              :      * Snapshot information is stored slightly differently in snapshots taken
    1888              :      * during recovery.
    1889              :      */
    1890      3958366 :     if (!snapshot->takenDuringRecovery)
    1891              :     {
    1892              :         /*
    1893              :          * If the snapshot contains full subxact data, the fastest way to
    1894              :          * check things is just to compare the given XID against both subxact
    1895              :          * XIDs and top-level XIDs.  If the snapshot overflowed, we have to
    1896              :          * use pg_subtrans to convert a subxact XID to its parent XID, but
    1897              :          * then we need only look at top-level XIDs not subxacts.
    1898              :          */
    1899      3958286 :         if (!snapshot->suboverflowed)
    1900              :         {
    1901              :             /* we have full data, so search subxip */
    1902      3957936 :             if (pg_lfind32(xid, snapshot->subxip, snapshot->subxcnt))
    1903          232 :                 return true;
    1904              : 
    1905              :             /* not there, fall through to search xip[] */
    1906              :         }
    1907              :         else
    1908              :         {
    1909              :             /*
    1910              :              * Snapshot overflowed, so convert xid to top-level.  This is safe
    1911              :              * because we eliminated too-old XIDs above.
    1912              :              */
    1913          350 :             xid = SubTransGetTopmostTransaction(xid);
    1914              : 
    1915              :             /*
    1916              :              * If xid was indeed a subxact, we might now have an xid < xmin,
    1917              :              * so recheck to avoid an array scan.  No point in rechecking
    1918              :              * xmax.
    1919              :              */
    1920          350 :             if (TransactionIdPrecedes(xid, snapshot->xmin))
    1921            0 :                 return false;
    1922              :         }
    1923              : 
    1924      3958054 :         if (pg_lfind32(xid, snapshot->xip, snapshot->xcnt))
    1925        19970 :             return true;
    1926              :     }
    1927              :     else
    1928              :     {
    1929              :         /*
    1930              :          * In recovery we store all xids in the subxip array because it is by
    1931              :          * far the bigger array, and we mostly don't know which xids are
    1932              :          * top-level and which are subxacts. The xip array is empty.
    1933              :          *
    1934              :          * We start by searching subtrans, if we overflowed.
    1935              :          */
    1936           80 :         if (snapshot->suboverflowed)
    1937              :         {
    1938              :             /*
    1939              :              * Snapshot overflowed, so convert xid to top-level.  This is safe
    1940              :              * because we eliminated too-old XIDs above.
    1941              :              */
    1942            4 :             xid = SubTransGetTopmostTransaction(xid);
    1943              : 
    1944              :             /*
    1945              :              * If xid was indeed a subxact, we might now have an xid < xmin,
    1946              :              * so recheck to avoid an array scan.  No point in rechecking
    1947              :              * xmax.
    1948              :              */
    1949            4 :             if (TransactionIdPrecedes(xid, snapshot->xmin))
    1950            0 :                 return false;
    1951              :         }
    1952              : 
    1953              :         /*
    1954              :          * We now have either a top-level xid higher than xmin or an
    1955              :          * indeterminate xid. We don't know whether it's top level or subxact
    1956              :          * but it doesn't matter. If it's present, the xid is still in progress.
    1957              :          */
    1958           80 :         if (pg_lfind32(xid, snapshot->subxip, snapshot->subxcnt))
    1959            6 :             return true;
    1960              :     }
    1961              : 
    1962      3938158 :     return false;
    1963              : }
    1964              : 
    1965              : /* ResourceOwner callbacks */
    1966              : 
    1967              : static void
    1968        30060 : ResOwnerReleaseSnapshot(Datum res)
    1969              : {
    1970        30060 :     UnregisterSnapshotNoOwner((Snapshot) DatumGetPointer(res));
    1971        30060 : }
        

Generated by: LCOV version 2.0-1