LCOV - 603a3335f2b60b2c798da5c627b3bb288b92a7bd vs e395fbd32a07557de4ac98088928c1749d4845d8

LCOV - differential code coverage report

Current view:	top level - src/backend/access/transam - xlog.c (source / functions)		Coverage	Total	Hit	UNC	LBC	UBC	GIC	GNC	CBC	DCB
Current:	603a3335f2b60b2c798da5c627b3bb288b92a7bd vs e395fbd32a07557de4ac98088928c1749d4845d8	Lines:	89.5 %	2753	2464		1	288	1	2	2461	3
Current Date:	2026-07-25 17:13:00 -0400	Functions:	97.8 %	139	136			3		3	133
Baseline:	lcov-20260726-baseline	Branches:	64.6 %	1773	1146	1	4	622		1	1145
Baseline Date:	2026-07-25 19:16:42 +0200	Line coverage date bins:
Legend:	Lines: hit not hit Branches: + taken - not taken # not executed	(7,30] days:	100.0 %	2	2					2
		(30,360] days:	92.8 %	335	311			24			311
		(360..) days:	89.0 %	2416	2151		1	264	1		2150
		Function coverage date bins:
		(30,360] days:	95.7 %	23	22			1			22
		(360..) days:	98.3 %	116	114			2		3	111
		Branch coverage date bins:
		(7,30] days:	50.0 %	2	1	1				1
		(30,360] days:	73.9 %	180	133			47			133
		(360..) days:	63.6 %	1591	1012		4	575			1012

 Age         Owner                    Branch data    TLA  Line data    Source code

                                  1                 :                : /*-------------------------------------------------------------------------
                                  2                 :                :  *
                                  3                 :                :  * xlog.c
                                  4                 :                :  *      PostgreSQL write-ahead log manager
                                  5                 :                :  *
                                  6                 :                :  * The Write-Ahead Log (WAL) functionality is split into several source
                                  7                 :                :  * files, in addition to this one:
                                  8                 :                :  *
                                  9                 :                :  * xloginsert.c - Functions for constructing WAL records
                                 10                 :                :  * xlogrecovery.c - WAL recovery and standby code
                                 11                 :                :  * xlogreader.c - Facility for reading WAL files and parsing WAL records
                                 12                 :                :  * xlogutils.c - Helper functions for WAL redo routines
                                 13                 :                :  *
                                 14                 :                :  * This file contains functions for coordinating database startup and
                                 15                 :                :  * checkpointing, and managing the write-ahead log buffers when the
                                 16                 :                :  * system is running.
                                 17                 :                :  *
                                 18                 :                :  * StartupXLOG() is the main entry point of the startup process.  It
                                 19                 :                :  * coordinates database startup, performing WAL recovery, and the
                                 20                 :                :  * transition from WAL recovery into normal operations.
                                 21                 :                :  *
                                 22                 :                :  * XLogInsertRecord() inserts a WAL record into the WAL buffers.  Most
                                 23                 :                :  * callers should not call this directly, but use the functions in
                                 24                 :                :  * xloginsert.c to construct the WAL record.  XLogFlush() can be used
                                 25                 :                :  * to force the WAL to disk.
                                 26                 :                :  *
                                 27                 :                :  * In addition to those, there are many other functions for interrogating
                                 28                 :                :  * the current system state, and for starting/stopping backups.
                                 29                 :                :  *
                                 30                 :                :  *
                                 31                 :                :  * Portions Copyright (c) 1996-2026, PostgreSQL Global Development Group
                                 32                 :                :  * Portions Copyright (c) 1994, Regents of the University of California
                                 33                 :                :  *
                                 34                 :                :  * src/backend/access/transam/xlog.c
                                 35                 :                :  *
                                 36                 :                :  *-------------------------------------------------------------------------
                                 37                 :                :  */
                                 38                 :                : 
                                 39                 :                : #include "postgres.h"
                                 40                 :                : 
                                 41                 :                : #include <ctype.h>
                                 42                 :                : #include <math.h>
                                 43                 :                : #include <time.h>
                                 44                 :                : #include <fcntl.h>
                                 45                 :                : #include <sys/stat.h>
                                 46                 :                : #include <sys/time.h>
                                 47                 :                : #include <unistd.h>
                                 48                 :                : 
                                 49                 :                : #include "access/clog.h"
                                 50                 :                : #include "access/commit_ts.h"
                                 51                 :                : #include "access/heaptoast.h"
                                 52                 :                : #include "access/multixact.h"
                                 53                 :                : #include "access/rewriteheap.h"
                                 54                 :                : #include "access/subtrans.h"
                                 55                 :                : #include "access/timeline.h"
                                 56                 :                : #include "access/transam.h"
                                 57                 :                : #include "access/twophase.h"
                                 58                 :                : #include "access/xact.h"
                                 59                 :                : #include "access/xlog_internal.h"
                                 60                 :                : #include "access/xlogarchive.h"
                                 61                 :                : #include "access/xloginsert.h"
                                 62                 :                : #include "access/xlogreader.h"
                                 63                 :                : #include "access/xlogrecovery.h"
                                 64                 :                : #include "access/xlogutils.h"
                                 65                 :                : #include "access/xlogwait.h"
                                 66                 :                : #include "backup/basebackup.h"
                                 67                 :                : #include "catalog/catversion.h"
                                 68                 :                : #include "catalog/pg_control.h"
                                 69                 :                : #include "catalog/pg_database.h"
                                 70                 :                : #include "common/controldata_utils.h"
                                 71                 :                : #include "common/file_utils.h"
                                 72                 :                : #include "executor/instrument.h"
                                 73                 :                : #include "miscadmin.h"
                                 74                 :                : #include "pg_trace.h"
                                 75                 :                : #include "pgstat.h"
                                 76                 :                : #include "port/atomics.h"
                                 77                 :                : #include "postmaster/bgwriter.h"
                                 78                 :                : #include "postmaster/datachecksum_state.h"
                                 79                 :                : #include "postmaster/startup.h"
                                 80                 :                : #include "postmaster/walsummarizer.h"
                                 81                 :                : #include "postmaster/walwriter.h"
                                 82                 :                : #include "replication/origin.h"
                                 83                 :                : #include "replication/slot.h"
                                 84                 :                : #include "replication/slotsync.h"
                                 85                 :                : #include "replication/snapbuild.h"
                                 86                 :                : #include "replication/walreceiver.h"
                                 87                 :                : #include "replication/walsender.h"
                                 88                 :                : #include "storage/bufmgr.h"
                                 89                 :                : #include "storage/fd.h"
                                 90                 :                : #include "storage/ipc.h"
                                 91                 :                : #include "storage/large_object.h"
                                 92                 :                : #include "storage/latch.h"
                                 93                 :                : #include "storage/predicate.h"
                                 94                 :                : #include "storage/proc.h"
                                 95                 :                : #include "storage/procarray.h"
                                 96                 :                : #include "storage/procsignal.h"
                                 97                 :                : #include "storage/reinit.h"
                                 98                 :                : #include "storage/spin.h"
                                 99                 :                : #include "storage/subsystems.h"
                                100                 :                : #include "storage/sync.h"
                                101                 :                : #include "utils/guc_hooks.h"
                                102                 :                : #include "utils/guc_tables.h"
                                103                 :                : #include "utils/injection_point.h"
                                104                 :                : #include "utils/pgstat_internal.h"
                                105                 :                : #include "utils/ps_status.h"
                                106                 :                : #include "utils/relmapper.h"
                                107                 :                : #include "utils/snapmgr.h"
                                108                 :                : #include "utils/timeout.h"
                                109                 :                : #include "utils/timestamp.h"
                                110                 :                : #include "utils/varlena.h"
                                111                 :                : #include "utils/wait_event.h"
                                112                 :                : 
                                113                 :                : #ifdef WAL_DEBUG
                                114                 :                : #include "utils/memutils.h"
                                115                 :                : #endif
                                116                 :                : 
                                117                 :                : /* timeline ID to be used when bootstrapping */
                                118                 :                : #define BootstrapTimeLineID     1
                                119                 :                : 
                                120                 :                : /* User-settable parameters */
                                121                 :                : int         max_wal_size_mb = 1024; /* 1 GB */
                                122                 :                : int         min_wal_size_mb = 80;   /* 80 MB */
                                123                 :                : int         wal_keep_size_mb = 0;
                                124                 :                : int         XLOGbuffers = -1;
                                125                 :                : int         XLogArchiveTimeout = 0;
                                126                 :                : int         XLogArchiveMode = ARCHIVE_MODE_OFF;
                                127                 :                : char       *XLogArchiveCommand = NULL;
                                128                 :                : bool        EnableHotStandby = false;
                                129                 :                : bool        fullPageWrites = true;
                                130                 :                : bool        wal_log_hints = false;
                                131                 :                : int         wal_compression = WAL_COMPRESSION_NONE;
                                132                 :                : char       *wal_consistency_checking_string = NULL;
                                133                 :                : bool       *wal_consistency_checking = NULL;
                                134                 :                : bool        wal_init_zero = true;
                                135                 :                : bool        wal_recycle = true;
                                136                 :                : bool        log_checkpoints = true;
                                137                 :                : int         wal_sync_method = DEFAULT_WAL_SYNC_METHOD;
                                138                 :                : int         wal_level = WAL_LEVEL_REPLICA;
                                139                 :                : int         CommitDelay = 0;    /* precommit delay in microseconds */
                                140                 :                : int         CommitSiblings = 5; /* # concurrent xacts needed to sleep */
                                141                 :                : int         wal_retrieve_retry_interval = 5000;
                                142                 :                : int         max_slot_wal_keep_size_mb = -1;
                                143                 :                : int         wal_decode_buffer_size = 512 * 1024;
                                144                 :                : bool        track_wal_io_timing = false;
                                145                 :                : 
                                146                 :                : #ifdef WAL_DEBUG
                                147                 :                : bool        XLOG_DEBUG = false;
                                148                 :                : #endif
                                149                 :                : 
                                150                 :                : int         wal_segment_size = DEFAULT_XLOG_SEG_SIZE;
                                151                 :                : 
                                152                 :                : /*
                                153                 :                :  * Number of WAL insertion locks to use. A higher value allows more insertions
                                154                 :                :  * to happen concurrently, but adds some CPU overhead to flushing the WAL,
                                155                 :                :  * which needs to iterate all the locks.
                                156                 :                :  */
                                157                 :                : #define NUM_XLOGINSERT_LOCKS  8
                                158                 :                : 
                                159                 :                : /*
                                160                 :                :  * Max distance from last checkpoint, before triggering a new xlog-based
                                161                 :                :  * checkpoint.
                                162                 :                :  */
                                163                 :                : int         CheckPointSegments;
                                164                 :                : 
                                165                 :                : /* Estimated distance between checkpoints, in bytes */
                                166                 :                : static double CheckPointDistanceEstimate = 0;
                                167                 :                : static double PrevCheckPointDistance = 0;
                                168                 :                : 
                                169                 :                : /*
                                170                 :                :  * Track whether there were any deferred checks for custom resource managers
                                171                 :                :  * specified in wal_consistency_checking.
                                172                 :                :  */
                                173                 :                : static bool check_wal_consistency_checking_deferred = false;
                                174                 :                : 
                                175                 :                : /*
                                176                 :                :  * GUC support
                                177                 :                :  */
                                178                 :                : const struct config_enum_entry wal_sync_method_options[] = {
                                179                 :                :     {"fsync", WAL_SYNC_METHOD_FSYNC, false},
                                180                 :                : #ifdef HAVE_FSYNC_WRITETHROUGH
                                181                 :                :     {"fsync_writethrough", WAL_SYNC_METHOD_FSYNC_WRITETHROUGH, false},
                                182                 :                : #endif
                                183                 :                :     {"fdatasync", WAL_SYNC_METHOD_FDATASYNC, false},
                                184                 :                : #ifdef O_SYNC
                                185                 :                :     {"open_sync", WAL_SYNC_METHOD_OPEN, false},
                                186                 :                : #endif
                                187                 :                : #ifdef O_DSYNC
                                188                 :                :     {"open_datasync", WAL_SYNC_METHOD_OPEN_DSYNC, false},
                                189                 :                : #endif
                                190                 :                :     {NULL, 0, false}
                                191                 :                : };
                                192                 :                : 
                                193                 :                : 
                                194                 :                : /*
                                195                 :                :  * Although only "on", "off", and "always" are documented,
                                196                 :                :  * we accept all the likely variants of "on" and "off".
                                197                 :                :  */
                                198                 :                : const struct config_enum_entry archive_mode_options[] = {
                                199                 :                :     {"always", ARCHIVE_MODE_ALWAYS, false},
                                200                 :                :     {"on", ARCHIVE_MODE_ON, false},
                                201                 :                :     {"off", ARCHIVE_MODE_OFF, false},
                                202                 :                :     {"true", ARCHIVE_MODE_ON, true},
                                203                 :                :     {"false", ARCHIVE_MODE_OFF, true},
                                204                 :                :     {"yes", ARCHIVE_MODE_ON, true},
                                205                 :                :     {"no", ARCHIVE_MODE_OFF, true},
                                206                 :                :     {"1", ARCHIVE_MODE_ON, true},
                                207                 :                :     {"0", ARCHIVE_MODE_OFF, true},
                                208                 :                :     {NULL, 0, false}
                                209                 :                : };
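The alias table above can be illustrated with a small lookup sketch. Everything named `demo_*` or `DEMO_*` is hypothetical and local to this example: the struct only mirrors the three fields visible in the table (name, value, hidden flag), and POSIX `strcasecmp` is assumed for case-insensitive matching, which may differ from what the server's GUC machinery actually does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>            /* strcasecmp (POSIX); an assumption of this sketch */

/* Hypothetical stand-ins for the ARCHIVE_MODE_* values. */
enum { DEMO_ARCHIVE_OFF = 0, DEMO_ARCHIVE_ON, DEMO_ARCHIVE_ALWAYS };

/* Local stand-in with the same three fields the table above uses. */
struct demo_enum_entry
{
    const char *name;           /* spelling accepted in the config file */
    int         val;            /* value it maps to */
    bool        hidden;         /* true = accepted but not advertised */
};

static const struct demo_enum_entry demo_archive_mode_options[] = {
    {"always", DEMO_ARCHIVE_ALWAYS, false},
    {"on", DEMO_ARCHIVE_ON, false},
    {"off", DEMO_ARCHIVE_OFF, false},
    {"true", DEMO_ARCHIVE_ON, true},
    {"false", DEMO_ARCHIVE_OFF, true},
    {NULL, 0, false}            /* terminator, as in the original table */
};

/*
 * Walk the NULL-terminated table and return true with *result set when
 * 'value' matches an entry; hidden entries are accepted like any other.
 */
static bool
demo_lookup_enum(const struct demo_enum_entry *options,
                 const char *value, int *result)
{
    for (const struct demo_enum_entry *e = options; e->name != NULL; e++)
    {
        if (strcasecmp(e->name, value) == 0)
        {
            *result = e->val;
            return true;
        }
    }
    return false;
}
```

In this sketch the `hidden` flag does not affect matching; it would only matter to code that lists the documented spellings.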
                                210                 :                : 
                                211                 :                : /*
                                212                 :                :  * Statistics for current checkpoint are collected in this global struct.
                                213                 :                :  * Because only the checkpointer or a stand-alone backend can perform
                                214                 :                :  * checkpoints, this will be unused in normal backends.
                                215                 :                :  */
                                216                 :                : CheckpointStatsData CheckpointStats;
                                217                 :                : 
                                218                 :                : /*
                                219                 :                :  * During recovery, lastFullPageWrites keeps track of full_page_writes that
                                220                 :                :  * the replayed WAL records indicate. It's initialized with full_page_writes
                                221                 :                :  * that the recovery starting checkpoint record indicates, and then updated
                                222                 :                :  * each time XLOG_FPW_CHANGE record is replayed.
                                223                 :                :  */
                                224                 :                : static bool lastFullPageWrites;
                                225                 :                : 
                                226                 :                : /*
                                227                 :                :  * Local copy of the state tracked by SharedRecoveryState in shared memory.
                                228                 :                :  * It is false if SharedRecoveryState is RECOVERY_STATE_DONE.  True actually
                                229                 :                :  * means "not known, need to check the shared state".
                                230                 :                :  */
                                231                 :                : static bool LocalRecoveryInProgress = true;
                                232                 :                : 
                                233                 :                : /*
                                234                 :                :  * Local state for XLogInsertAllowed():
                                235                 :                :  *      1: unconditionally allowed to insert XLOG
                                236                 :                :  *      0: unconditionally not allowed to insert XLOG
                                237                 :                :  *      -1: must check RecoveryInProgress(); disallow until it is false
                                238                 :                :  * Most processes start with -1 and transition to 1 after seeing that recovery
                                239                 :                :  * is not in progress.  But we can also force the value for special cases.
                                240                 :                :  * The coding in XLogInsertAllowed() depends on the first two of these states
                                241                 :                :  * being numerically the same as bool true and false.
                                242                 :                :  */
                                243                 :                : static int  LocalXLogInsertAllowed = -1;
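The three-state scheme described in the comment above can be sketched as follows. This is a minimal illustration, not the file's actual function: all `demo_*` names are hypothetical, and `demo_recovery_in_progress()` is a stand-in for the real recovery check.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the backend-local state and the recovery check. */
static int  demo_insert_allowed = -1;   /* 1 = yes, 0 = no, -1 = must check */
static bool demo_in_recovery = true;

static bool
demo_recovery_in_progress(void)
{
    return demo_in_recovery;
}

static bool
demo_xlog_insert_allowed(void)
{
    /*
     * States 1 and 0 are final answers.  Because they are numerically the
     * same as true and false, the value can be returned directly.
     */
    if (demo_insert_allowed >= 0)
        return (bool) demo_insert_allowed;

    /* State -1: keep disallowing inserts while recovery is in progress. */
    if (demo_recovery_in_progress())
        return false;

    /* Recovery has ended: remember that so later calls skip the check. */
    demo_insert_allowed = 1;
    return true;
}
```

Once the cached value reaches 1 it is never rechecked, which is why a forced 0 or 1 in this sketch overrides the recovery state entirely.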
                                244                 :                : 
                                245                 :                : /*
                                246                 :                :  * ProcLastRecPtr points to the start of the last XLOG record inserted by the
                                247                 :                :  * current backend.  It is updated for all inserts.  XactLastRecEnd points to
                                248                 :                :  * end+1 of the last record, and is reset when we end a top-level transaction,
                                249                 :                :  * or start a new one; so it can be used to tell if the current transaction has
                                250                 :                :  * created any XLOG records.
                                251                 :                :  *
                                252                 :                :  * While in parallel mode, this may not be fully up to date.  When committing,
                                253                 :                :  * a transaction can assume this covers all xlog records written either by the
                                254                 :                :  * user backend or by any parallel worker which was present at any point during
                                255                 :                :  * the transaction.  But when aborting, or when still in parallel mode, other
                                256                 :                :  * parallel backends may have written WAL records at later LSNs than the value
                                257                 :                :  * stored here.  The parallel leader advances its own copy, when necessary,
                                258                 :                :  * in WaitForParallelWorkersToFinish.
                                259                 :                :  */
                                260                 :                : XLogRecPtr  ProcLastRecPtr = InvalidXLogRecPtr;
                                261                 :                : XLogRecPtr  XactLastRecEnd = InvalidXLogRecPtr;
                                262                 :                : XLogRecPtr  XactLastCommitEnd = InvalidXLogRecPtr;
                                263                 :                : 
                                264                 :                : /*
                                265                 :                :  * RedoRecPtr is this backend's local copy of the REDO record pointer
                                266                 :                :  * (which is almost but not quite the same as a pointer to the most recent
                                267                 :                :  * CHECKPOINT record).  We update this from the shared-memory copy,
                                268                 :                :  * XLogCtl->Insert.RedoRecPtr, whenever we can safely do so (ie, when we
                                269                 :                :  * hold an insertion lock).  See XLogInsertRecord for details.  We are also
                                270                 :                :  * allowed to update from XLogCtl->RedoRecPtr if we hold the info_lck;
                                271                 :                :  * see GetRedoRecPtr.
                                272                 :                :  *
                                273                 :                :  * NB: Code that uses this variable must be prepared not only for the
                                274                 :                :  * possibility that it may be arbitrarily out of date, but also for the
                                275                 :                :  * possibility that it might be set to InvalidXLogRecPtr. We used to
                                276                 :                :  * initialize it as a side effect of the first call to RecoveryInProgress(),
                                277                 :                :  * which meant that most code that might use it could assume that it had a
                                278                 :                :  * real if perhaps stale value. That's no longer the case.
                                279                 :                :  */
                                280                 :                : static XLogRecPtr RedoRecPtr;
                                281                 :                : 
                                282                 :                : /*
                                283                 :                :  * doPageWrites is this backend's local copy of (fullPageWrites ||
                                284                 :                :  * runningBackups > 0).  It is used together with RedoRecPtr to decide whether
                                285                 :                :  * a full-page image of a page needs to be taken.
                                286                 :                :  *
                                287                 :                :  * NB: Initially this is false, and there's no guarantee that it will be
                                288                 :                :  * initialized to any other value before it is first used. Any code that
                                289                 :                :  * makes use of it must recheck the value after obtaining a WALInsertLock,
                                290                 :                :  * and respond appropriately if it turns out that the previous value wasn't
                                291                 :                :  * accurate.
                                292                 :                :  */
                                293                 :                : static bool doPageWrites;
                                294                 :                : 
                                295                 :                : /*----------
                                296                 :                :  * Shared-memory data structures for XLOG control
                                297                 :                :  *
                                298                 :                :  * LogwrtRqst indicates a byte position that we need to write and/or fsync
                                299                 :                :  * the log up to (all records before that point must be written or fsynced).
                                300                 :                :  * The positions already written/fsynced are maintained in logWriteResult
                                301                 :                :  * and logFlushResult using atomic access.
                                302                 :                :  * In addition to the shared variables, each backend has a private copy of
                                303                 :                :  * both in LogwrtResult, which is updated when convenient.
                                304                 :                :  *
                                305                 :                :  * The request bookkeeping is simpler: there is a shared XLogCtl->LogwrtRqst
                                306                 :                :  * (protected by info_lck), but we don't need to cache any copies of it.
                                307                 :                :  *
                                308                 :                :  * info_lck is only held long enough to read/update the protected variables,
                                309                 :                :  * so it's a plain spinlock.  The other locks are held longer (potentially
                                310                 :                :  * over I/O operations), so we use LWLocks for them.  These locks are:
                                311                 :                :  *
                                312                 :                :  * WALBufMappingLock: must be held to replace a page in the WAL buffer cache.
                                313                 :                :  * It is only held while initializing and changing the mapping.  If the
                                314                 :                :  * contents of the buffer being replaced haven't been written yet, the mapping
                                315                 :                :  * lock is released while the write is done, and reacquired afterwards.
                                316                 :                :  *
                                317                 :                :  * WALWriteLock: must be held to write WAL buffers to disk (XLogWrite or
                                318                 :                :  * XLogFlush).
                                319                 :                :  *
                                320                 :                :  * ControlFileLock: must be held to read/update control file or create
                                321                 :                :  * new log file.
                                322                 :                :  *
                                323                 :                :  *----------
                                324                 :                :  */
                                325                 :                : 
                                326                 :                : typedef struct XLogwrtRqst
                                327                 :                : {
                                328                 :                :     XLogRecPtr  Write;          /* last byte + 1 to write out */
                                329                 :                :     XLogRecPtr  Flush;          /* last byte + 1 to flush */
                                330                 :                : } XLogwrtRqst;
                                331                 :                : 
                                332                 :                : typedef struct XLogwrtResult
                                333                 :                : {
                                334                 :                :     XLogRecPtr  Write;          /* last byte + 1 written out */
                                335                 :                :     XLogRecPtr  Flush;          /* last byte + 1 flushed */
                                336                 :                : } XLogwrtResult;
                                337                 :                : 
                                338                 :                : /*
                                339                 :                :  * Inserting to WAL is protected by a small fixed number of WAL insertion
                                340                 :                :  * locks. To insert to the WAL, you must hold one of the locks - it doesn't
                                341                 :                :  * matter which one. To lock out other concurrent insertions, you must hold
                                342                 :                :  * all of them. Each WAL insertion lock consists of a lightweight lock, plus an
                                343                 :                :  * indicator of how far the insertion has progressed (insertingAt).
                                344                 :                :  *
                                345                 :                :  * The insertingAt values are read when a process wants to flush WAL from
                                346                 :                :  * the in-memory buffers to disk, to check that all the insertions to the
                                347                 :                :  * region the process is about to write out have finished. You could simply
                                348                 :                :  * wait for all currently in-progress insertions to finish, but the
                                349                 :                :  * insertingAt indicator allows you to ignore insertions to later in the WAL,
                                350                 :                :  * so that you only wait for the insertions that are modifying the buffers
                                351                 :                :  * you're about to write out.
                                352                 :                :  *
                                353                 :                :  * This isn't just an optimization. If all the WAL buffers are dirty, an
                                354                 :                :  * inserter that's holding a WAL insert lock might need to evict an old WAL
                                355                 :                :  * buffer, which requires flushing the WAL. If it's possible for an inserter
                                356                 :                :  * to block on another inserter unnecessarily, deadlock can arise when two
                                357                 :                :  * inserters holding a WAL insert lock wait for each other to finish their
                                358                 :                :  * insertion.
                                359                 :                :  *
                                360                 :                :  * Small WAL records that don't cross a page boundary never update the value;
                                361                 :                :  * the WAL record is just copied to the page and the lock is released. But
                                362                 :                :  * to avoid the deadlock-scenario explained above, the indicator is always
                                363                 :                :  * updated before sleeping while holding an insertion lock.
                                364                 :                :  *
                                365                 :                :  * lastImportantAt contains the LSN of the last important WAL record inserted
                                366                 :                :  * using a given lock. This value is used to detect if there has been
                                367                 :                :  * important WAL activity since the last time some action, like a checkpoint,
                                368                 :                :  * was performed, so that the action can be skipped if there was none. The LSN is
                                369                 :                :  * updated for all insertions, unless the XLOG_MARK_UNIMPORTANT flag was
                                370                 :                :  * set. lastImportantAt is never cleared, only overwritten by the LSN of newer
                                371                 :                :  * records.  Tracking the WAL activity directly in WALInsertLock has the
                                372                 :                :  * advantage of not needing any additional locks to update the value.
                                373                 :                :  */
                                374                 :                : typedef struct
                                375                 :                : {
                                376                 :                :     LWLock      lock;
                                377                 :                :     pg_atomic_uint64 insertingAt;
                                378                 :                :     XLogRecPtr  lastImportantAt;
                                379                 :                : } WALInsertLock;
                                380                 :                : 
                                381                 :                : /*
                                382                 :                :  * All the WAL insertion locks are allocated as an array in shared memory. We
                                383                 :                :  * force the array stride to be a power of 2, which saves a few cycles in
                                384                 :                :  * indexing, but more importantly also ensures that individual slots don't
                                385                 :                :  * cross cache line boundaries. (Of course, we have to also ensure that the
                                386                 :                :  * array start address is suitably aligned.)
                                387                 :                :  */
                                388                 :                : typedef union WALInsertLockPadded
                                389                 :                : {
                                390                 :                :     WALInsertLock l;
                                391                 :                :     char        pad[PG_CACHE_LINE_SIZE];
                                392                 :                : } WALInsertLockPadded;
                                393                 :                : 
                                394                 :                : /*
                                395                 :                :  * Session status of running backup, used for sanity checks in SQL-callable
                                396                 :                :  * functions to start and stop backups.
                                397                 :                :  */
                                398                 :                : static SessionBackupState sessionBackupState = SESSION_BACKUP_NONE;
                                399                 :                : 
                                400                 :                : /*
                                401                 :                :  * Shared state data for WAL insertion.
                                402                 :                :  */
                                403                 :                : typedef struct XLogCtlInsert
                                404                 :                : {
                                405                 :                :     slock_t     insertpos_lck;  /* protects CurrBytePos and PrevBytePos */
                                406                 :                : 
                                407                 :                :     /*
                                408                 :                :      * CurrBytePos is the end of reserved WAL. The next record will be
                                409                 :                :      * inserted at that position. PrevBytePos is the start position of the
                                410                 :                :      * previously inserted (or rather, reserved) record - it is copied to the
                                411                 :                :      * prev-link of the next record. These are stored as "usable byte
                                412                 :                :      * positions" rather than XLogRecPtrs (see XLogBytePosToRecPtr()).
                                413                 :                :      */
                                414                 :                :     uint64      CurrBytePos;
                                415                 :                :     uint64      PrevBytePos;
                                416                 :                : 
                                417                 :                :     /*
                                418                 :                :      * Make sure the above heavily-contended spinlock and byte positions are
                                419                 :                :      * on their own cache line. In particular, the RedoRecPtr and full page
                                420                 :                :      * write variables below should be on a different cache line. They are
                                421                 :                :      * read on every WAL insertion, but updated rarely, and we don't want
                                422                 :                :      * those reads to steal the cache line containing Curr/PrevBytePos.
                                423                 :                :      */
                                424                 :                :     char        pad[PG_CACHE_LINE_SIZE];
                                425                 :                : 
                                426                 :                :     /*
                                427                 :                :      * fullPageWrites is the authoritative value used by all backends to
                                428                 :                :      * determine whether to write full-page images to WAL. This shared value,
                                429                 :                :      * instead of the process-local fullPageWrites, is required because, when
                                430                 :                :      * full_page_writes is changed by SIGHUP, we must WAL-log it before it
                                431                 :                :      * actually affects WAL-logging by backends.  The checkpointer sets it at
                                432                 :                :      * startup or after SIGHUP.
                                433                 :                :      *
                                434                 :                :      * To read these fields, you must hold an insertion lock. To modify them,
                                435                 :                :      * you must hold ALL the locks.
                                436                 :                :      */
                                437                 :                :     XLogRecPtr  RedoRecPtr;     /* current redo point for insertions */
                                438                 :                :     bool        fullPageWrites;
                                439                 :                : 
                                440                 :                :     /*
                                441                 :                :      * runningBackups is a counter indicating the number of backups currently
                                442                 :                :      * in progress. lastBackupStart is the latest checkpoint redo location
                                443                 :                :      * used as a starting point for an online backup.
                                444                 :                :      */
                                445                 :                :     int         runningBackups;
                                446                 :                :     XLogRecPtr  lastBackupStart;
                                447                 :                : 
                                448                 :                :     /*
                                449                 :                :      * WAL insertion locks.
                                450                 :                :      */
                                451                 :                :     WALInsertLockPadded *WALInsertLocks;
                                452                 :                : } XLogCtlInsert;
                                453                 :                : 
                                454                 :                : /*
                                455                 :                :  * Total shared-memory state for XLOG.
                                456                 :                :  */
                                457                 :                : typedef struct XLogCtlData
                                458                 :                : {
                                459                 :                :     XLogCtlInsert Insert;
                                460                 :                : 
                                461                 :                :     /* Protected by info_lck: */
                                462                 :                :     XLogwrtRqst LogwrtRqst;
                                463                 :                :     XLogRecPtr  RedoRecPtr;     /* a recent copy of Insert->RedoRecPtr */
                                464                 :                :     XLogRecPtr  asyncXactLSN;   /* LSN of newest async commit/abort */
                                465                 :                :     XLogRecPtr  replicationSlotMinLSN;  /* oldest LSN needed by any slot */
                                466                 :                : 
                                467                 :                :     XLogSegNo   lastRemovedSegNo;   /* latest removed/recycled XLOG segment */
                                468                 :                : 
                                469                 :                :     /* Fake LSN counter, for unlogged relations. */
                                470                 :                :     pg_atomic_uint64 unloggedLSN;
                                471                 :                : 
                                472                 :                :     /* Time and LSN of last xlog segment switch. Protected by WALWriteLock. */
                                473                 :                :     pg_time_t   lastSegSwitchTime;
                                474                 :                :     XLogRecPtr  lastSegSwitchLSN;
                                475                 :                : 
                                476                 :                :     /* These are accessed using atomics -- info_lck not needed */
                                477                 :                :     pg_atomic_uint64 logInsertResult;   /* last byte + 1 inserted to buffers */
                                478                 :                :     pg_atomic_uint64 logWriteResult;    /* last byte + 1 written out */
                                479                 :                :     pg_atomic_uint64 logFlushResult;    /* last byte + 1 flushed */
                                480                 :                : 
                                481                 :                :     /*
                                482                 :                :      * Latest initialized page in the cache (last byte position + 1).
                                483                 :                :      *
                                484                 :                :      * To change the identity of a buffer (and InitializedUpTo), you need to
                                485                 :                :      * hold WALBufMappingLock.  To change the identity of a buffer that's
                                486                 :                :      * still dirty, the old page needs to be written out first, and for that
                                487                 :                :      * you need WALWriteLock, and you need to ensure that there are no
                                488                 :                :      * in-progress insertions to the page by calling
                                489                 :                :      * WaitXLogInsertionsToFinish().
                                490                 :                :      */
                                491                 :                :     XLogRecPtr  InitializedUpTo;
                                492                 :                : 
                                493                 :                :     /*
                                494                 :                :      * These values do not change after startup, although the pointed-to pages
                                495                 :                :      * and xlblocks values certainly do.  xlblocks values are protected by
                                496                 :                :      * WALBufMappingLock.
                                497                 :                :      */
                                498                 :                :     char       *pages;          /* buffers for unwritten XLOG pages */
                                499                 :                :     pg_atomic_uint64 *xlblocks; /* 1st byte ptr-s + XLOG_BLCKSZ */
                                500                 :                :     int         XLogCacheBlck;  /* highest allocated xlog buffer index */
                                501                 :                : 
                                502                 :                :     /*
                                503                 :                :      * InsertTimeLineID is the timeline into which new WAL is being inserted
                                504                 :                :      * and flushed. It is zero during recovery, and does not change once set.
                                505                 :                :      *
                                506                 :                :      * If we created a new timeline when the system was started up,
                                507                 :                :      * PrevTimeLineID is the old timeline's ID that we forked off from.
                                508                 :                :      * Otherwise it's equal to InsertTimeLineID.
                                509                 :                :      *
                                510                 :                :      * We set these fields while holding info_lck. Most code that reads these
                                511                 :                :      * values knows that recovery is no longer in progress and so can safely
                                512                 :                :      * read the value without a lock, but code that could be run either during
                                513                 :                :      * or after recovery can take info_lck while reading these values.
                                514                 :                :      */
                                515                 :                :     TimeLineID  InsertTimeLineID;
                                516                 :                :     TimeLineID  PrevTimeLineID;
                                517                 :                : 
                                518                 :                :     /*
                                519                 :                :      * SharedRecoveryState indicates if we're still in crash or archive
                                520                 :                :      * recovery.  Protected by info_lck.
                                521                 :                :      */
                                522                 :                :     RecoveryState SharedRecoveryState;
                                523                 :                : 
                                524                 :                :     /*
                                525                 :                :      * InstallXLogFileSegmentActive indicates whether the checkpointer should
                                526                 :                :      * arrange for future segments by recycling and/or PreallocXlogFiles().
                                527                 :                :      * Protected by ControlFileLock.  Only the startup process changes it.  If
                                528                 :                :      * true, anyone can use InstallXLogFileSegment().  If false, the startup
                                529                 :                :      * process owns the exclusive right to install segments, by reading from
                                530                 :                :      * the archive and possibly replacing existing files.
                                531                 :                :      */
                                532                 :                :     bool        InstallXLogFileSegmentActive;
                                533                 :                : 
                                534                 :                :     /*
                                535                 :                :      * WalWriterSleeping indicates whether the WAL writer is currently in
                                536                 :                :      * low-power mode (and hence should be nudged if an async commit occurs).
                                537                 :                :      * Protected by info_lck.
                                538                 :                :      */
                                539                 :                :     bool        WalWriterSleeping;
                                540                 :                : 
                                541                 :                :     /*
                                542                 :                :      * During recovery, we keep a copy of the latest checkpoint record here.
                                543                 :                :      * lastCheckPointRecPtr points to start of checkpoint record and
                                544                 :                :      * lastCheckPointEndPtr points to end+1 of checkpoint record.  Used by the
                                545                 :                :      * checkpointer when it wants to create a restartpoint.
                                546                 :                :      *
                                547                 :                :      * Protected by info_lck.
                                548                 :                :      */
                                549                 :                :     XLogRecPtr  lastCheckPointRecPtr;
                                550                 :                :     XLogRecPtr  lastCheckPointEndPtr;
                                551                 :                :     CheckPoint  lastCheckPoint;
                                552                 :                : 
                                553                 :                :     /*
                                554                 :                :      * lastFpwDisableRecPtr points to the start of the last replayed
                                555                 :                :      * XLOG_FPW_CHANGE record that indicates that full_page_writes is disabled.
                                556                 :                :      */
                                557                 :                :     XLogRecPtr  lastFpwDisableRecPtr;
                                558                 :                : 
                                559                 :                :     /* last data_checksum_version we've seen */
                                560                 :                :     uint32      data_checksum_version;
                                561                 :                : 
                                562                 :                :     slock_t     info_lck;       /* locks shared variables shown above */
                                563                 :                : } XLogCtlData;
                                564                 :                : 
                                565                 :                : /*
                                566                 :                :  * Classification of XLogInsertRecord operations.
                                567                 :                :  */
                                568                 :                : typedef enum
                                569                 :                : {
                                570                 :                :     WALINSERT_NORMAL,
                                571                 :                :     WALINSERT_SPECIAL_SWITCH,
                                572                 :                :     WALINSERT_SPECIAL_CHECKPOINT
                                573                 :                : } WalInsertClass;
                                574                 :                : 
                                575                 :                : static XLogCtlData *XLogCtl = NULL;
                                576                 :                : 
                                577                 :                : /* a private copy of XLogCtl->Insert.WALInsertLocks, for convenience */
                                578                 :                : static WALInsertLockPadded *WALInsertLocks = NULL;
                                579                 :                : 
                                580                 :                : /*
                                581                 :                :  * We maintain an image of pg_control in shared memory.
                                582                 :                :  */
                                583                 :                : static ControlFileData *LocalControlFile = NULL;
                                584                 :                : static ControlFileData *ControlFile = NULL;
                                585                 :                : 
                                586                 :                : static void XLOGShmemRequest(void *arg);
                                587                 :                : static void XLOGShmemInit(void *arg);
                                588                 :                : static void XLOGShmemAttach(void *arg);
                                589                 :                : 
                                590                 :                : const ShmemCallbacks XLOGShmemCallbacks = {
                                591                 :                :     .request_fn = XLOGShmemRequest,
                                592                 :                :     .init_fn = XLOGShmemInit,
                                593                 :                :     .attach_fn = XLOGShmemAttach,
                                594                 :                : };
                                595                 :                : 
                                596                 :                : /*
                                597                 :                :  * Calculate the amount of space left on the page after 'endptr'. Beware
                                598                 :                :  * multiple evaluation!
                                599                 :                :  */
                                600                 :                : #define INSERT_FREESPACE(endptr)    \
                                601                 :                :     (((endptr) % XLOG_BLCKSZ == 0) ? 0 : (XLOG_BLCKSZ - (endptr) % XLOG_BLCKSZ))
                                602                 :                : 
                                603                 :                : /* Macro to advance to next buffer index. */
                                604                 :                : #define NextBufIdx(idx)     \
                                605                 :                :         (((idx) == XLogCtl->XLogCacheBlck) ? 0 : ((idx) + 1))
                                606                 :                : 
                                607                 :                : /*
                                608                 :                :  * XLogRecPtrToBufIdx returns the index of the WAL buffer that holds, or
                                609                 :                :  * would hold if it were in cache, the page containing 'recptr'.
                                610                 :                :  */
                                611                 :                : #define XLogRecPtrToBufIdx(recptr)  \
                                612                 :                :     (((recptr) / XLOG_BLCKSZ) % (XLogCtl->XLogCacheBlck + 1))
                                613                 :                : 
                                614                 :                : /*
                                615                 :                :  * These are the number of bytes in a WAL page usable for WAL data.
                                616                 :                :  */
                                617                 :                : #define UsableBytesInPage (XLOG_BLCKSZ - SizeOfXLogShortPHD)
                                618                 :                : 
                                619                 :                : /*
                                620                 :                :  * Convert values of GUCs measured in megabytes to equiv. segment count.
                                621                 :                :  * Rounds down.
                                622                 :                :  */
                                623                 :                : #define ConvertToXSegs(x, segsize)  XLogMBVarToSegs((x), (segsize))
                                624                 :                : 
                                625                 :                : /* The number of bytes in a WAL segment usable for WAL data. */
                                626                 :                : static int  UsableBytesInSegment;
                                627                 :                : 
                                628                 :                : /*
                                629                 :                :  * Private, possibly out-of-date copy of shared LogwrtResult.
                                630                 :                :  * See discussion above.
                                631                 :                :  */
                                632                 :                : static XLogwrtResult LogwrtResult = {0, 0};
                                633                 :                : 
                                634                 :                : /*
                                635                 :                :  * Update local copy of shared XLogCtl->log{Write,Flush}Result
                                636                 :                :  *
                                637                 :                :  * It's critical that Flush always trails Write, so the order of the reads is
                                638                 :                :  * important, as is the barrier.  See also XLogWrite.
                                639                 :                :  */
                                640                 :                : #define RefreshXLogWriteResult(_target) \
                                641                 :                :     do { \
                                642                 :                :         _target.Flush = pg_atomic_read_u64(&XLogCtl->logFlushResult); \
                                643                 :                :         pg_read_barrier(); \
                                644                 :                :         _target.Write = pg_atomic_read_u64(&XLogCtl->logWriteResult); \
                                645                 :                :     } while (0)
                                646                 :                : 
                                647                 :                : /*
                                648                 :                :  * openLogFile is -1 or a kernel FD for an open log file segment.
                                649                 :                :  * openLogSegNo identifies the segment, and openLogTLI the corresponding TLI.
                                650                 :                :  * These variables are only used to write the XLOG, and so will normally refer
                                651                 :                :  * to the active segment.
                                652                 :                :  *
                                653                 :                :  * Note: call Reserve/ReleaseExternalFD to track consumption of this FD.
                                654                 :                :  */
                                655                 :                : static int  openLogFile = -1;
                                656                 :                : static XLogSegNo openLogSegNo = 0;
                                657                 :                : static TimeLineID openLogTLI = 0;
                                658                 :                : 
                                659                 :                : /*
                                660                 :                :  * Local copies of equivalent fields in the control file.  When running
                                661                 :                :  * crash recovery, LocalMinRecoveryPoint is set to InvalidXLogRecPtr as we
                                662                 :                :  * expect to replay all the WAL available, and updateMinRecoveryPoint is
                                663                 :                :  * switched to false to prevent any updates while replaying records.
                                664                 :                :  * Those values are kept consistent as long as crash recovery runs.
                                665                 :                :  */
                                666                 :                : static XLogRecPtr LocalMinRecoveryPoint;
                                667                 :                : static TimeLineID LocalMinRecoveryPointTLI;
                                668                 :                : static bool updateMinRecoveryPoint = true;
                                669                 :                : 
                                670                 :                : /*
                                671                 :                :  * Local state for ControlFile data_checksum_version.  After initialization
                                672                 :                :  * this is only updated when absorbing a procsignal barrier during interrupt
                                673                 :                :  * processing.  The reason for keeping a copy in backend-private memory is to
                                674                 :                :  * avoid locking for interrogating the data checksum state.  Possible values
                                675                 :                :  * are the data checksum versions defined in storage/checksum.h.
                                676                 :                :  */
                                677                 :                : static ChecksumStateType LocalDataChecksumState = 0;
                                678                 :                : 
                                679                 :                : /*
                                680                 :                :  * Variable backing the GUC, keep it in sync with LocalDataChecksumState.
                                681                 :                :  * See SetLocalDataChecksumState().
                                682                 :                :  */
                                683                 :                : int         data_checksums = 0;
                                684                 :                : 
                                685                 :                : /* For WALInsertLockAcquire/Release functions */
                                686                 :                : static int  MyLockNo = 0;
                                687                 :                : static bool holdingAllLocks = false;
                                688                 :                : 
                                689                 :                : #ifdef WAL_DEBUG
                                690                 :                : static MemoryContext walDebugCxt = NULL;
                                691                 :                : #endif
                                692                 :                : 
                                693                 :                : static void CleanupAfterArchiveRecovery(TimeLineID EndOfLogTLI,
                                694                 :                :                                         XLogRecPtr EndOfLog,
                                695                 :                :                                         TimeLineID newTLI);
                                696                 :                : static void CheckRequiredParameterValues(void);
                                697                 :                : static void XLogReportParameters(void);
                                698                 :                : static int  LocalSetXLogInsertAllowed(void);
                                699                 :                : static void CreateEndOfRecoveryRecord(void);
                                700                 :                : static XLogRecPtr CreateOverwriteContrecordRecord(XLogRecPtr aborted_lsn,
                                701                 :                :                                                   XLogRecPtr pagePtr,
                                702                 :                :                                                   TimeLineID newTLI);
                                703                 :                : static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
                                704                 :                : static void KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo);
                                705                 :                : 
                                706                 :                : static void AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli,
                                707                 :                :                                   bool opportunistic);
                                708                 :                : static void XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible);
                                709                 :                : static bool InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
                                710                 :                :                                    bool find_free, XLogSegNo max_segno,
                                711                 :                :                                    TimeLineID tli);
                                712                 :                : static void XLogFileClose(void);
                                713                 :                : static void PreallocXlogFiles(XLogRecPtr endptr, TimeLineID tli);
                                714                 :                : static void RemoveTempXlogFiles(void);
                                715                 :                : static void RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr lastredoptr,
                                716                 :                :                                XLogRecPtr endptr, TimeLineID insertTLI);
                                717                 :                : static void RemoveXlogFile(const struct dirent *segment_de,
                                718                 :                :                            XLogSegNo recycleSegNo, XLogSegNo *endlogSegNo,
                                719                 :                :                            TimeLineID insertTLI);
                                720                 :                : static void UpdateLastRemovedPtr(char *filename);
                                721                 :                : static void ValidateXLOGDirectoryStructure(void);
                                722                 :                : static void CleanupBackupHistory(void);
                                723                 :                : static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force);
                                724                 :                : static bool PerformRecoveryXLogAction(void);
                                725                 :                : static void InitControlFile(uint64 sysidentifier, uint32 data_checksum_version);
                                726                 :                : static void WriteControlFile(void);
                                727                 :                : static void ReadControlFile(void);
                                728                 :                : static void UpdateControlFile(void);
                                729                 :                : static char *str_time(pg_time_t tnow, char *buf, size_t bufsize);
                                730                 :                : 
                                731                 :                : static int  get_sync_bit(int method);
                                732                 :                : 
                                733                 :                : static void CopyXLogRecordToWAL(int write_len, bool isLogSwitch,
                                734                 :                :                                 XLogRecData *rdata,
                                735                 :                :                                 XLogRecPtr StartPos, XLogRecPtr EndPos,
                                736                 :                :                                 TimeLineID tli);
                                737                 :                : static void ReserveXLogInsertLocation(int size, XLogRecPtr *StartPos,
                                738                 :                :                                       XLogRecPtr *EndPos, XLogRecPtr *PrevPtr);
                                739                 :                : static bool ReserveXLogSwitch(XLogRecPtr *StartPos, XLogRecPtr *EndPos,
                                740                 :                :                               XLogRecPtr *PrevPtr);
                                741                 :                : static XLogRecPtr WaitXLogInsertionsToFinish(XLogRecPtr upto);
                                742                 :                : static char *GetXLogBuffer(XLogRecPtr ptr, TimeLineID tli);
                                743                 :                : static XLogRecPtr XLogBytePosToRecPtr(uint64 bytepos);
                                744                 :                : static XLogRecPtr XLogBytePosToEndRecPtr(uint64 bytepos);
                                745                 :                : static uint64 XLogRecPtrToBytePos(XLogRecPtr ptr);
                                746                 :                : 
                                747                 :                : static void WALInsertLockAcquire(void);
                                748                 :                : static void WALInsertLockAcquireExclusive(void);
                                749                 :                : static void WALInsertLockRelease(void);
                                750                 :                : static void WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt);
                                751                 :                : 
                                752                 :                : static void XLogChecksums(uint32 new_type);
                                753                 :                : 
                                754                 :                : /*
                                755                 :                :  * Insert an XLOG record represented by an already-constructed chain of data
                                756                 :                :  * chunks.  This is a low-level routine; to construct the WAL record header
                                757                 :                :  * and data, use the higher-level routines in xloginsert.c.
                                758                 :                :  *
                                759                 :                :  * If 'fpw_lsn' is valid, it is the oldest LSN among the pages that this
                                760                 :                :  * WAL record applies to, that were not included in the record as full page
                                761                 :                :  * images.  If fpw_lsn <= RedoRecPtr, the function does not perform the
                                762                 :                :  * insertion and returns InvalidXLogRecPtr.  The caller can then recalculate
                                763                 :                :  * which pages need a full-page image, and retry.  If fpw_lsn is invalid, the
                                764                 :                :  * record is always inserted.
                                765                 :                :  *
                                766                 :                :  * 'flags' gives more in-depth control over the record being inserted. See
                                767                 :                :  * XLogSetRecordFlags() for details.
                                768                 :                :  *
                                769                 :                :  * 'topxid_included' tells whether the top-transaction id is logged along with
                                770                 :                :  * the current subtransaction. See XLogRecordAssemble().
                                771                 :                :  *
                                772                 :                :  * The first XLogRecData in the chain must be for the record header, and its
                                773                 :                :  * data must be MAXALIGNed.  XLogInsertRecord fills in the xl_prev and
                                774                 :                :  * xl_crc fields in the header, the rest of the header must already be filled
                                775                 :                :  * by the caller.
                                776                 :                :  *
                                777                 :                :  * Returns XLOG pointer to end of record (beginning of next record).
                                778                 :                :  * This can be used as LSN for data pages affected by the logged action.
                                779                 :                :  * (LSN is the XLOG point up to which the XLOG must be flushed to disk
                                780                 :                :  * before the data page can be written out.  This implements the basic
                                781                 :                :  * WAL rule "write the log before the data".)
                                782                 :                :  */
                                783                 :                : XLogRecPtr
 3503 andres@anarazel.de        784                 :CBC    24612579 : XLogInsertRecord(XLogRecData *rdata,
                                785                 :                :                  XLogRecPtr fpw_lsn,
                                786                 :                :                  uint8 flags,
                                787                 :                :                  int num_fpi,
                                788                 :                :                  uint64 fpi_bytes,
                                789                 :                :                  bool topxid_included)
                                790                 :                : {
 9257 bruce@momjian.us          791                 :       24612579 :     XLogCtlInsert *Insert = &XLogCtl->Insert;
                                792                 :                :     pg_crc32c   rdata_crc;
                                793                 :                :     bool        inserted;
 4280 heikki.linnakangas@i      794                 :       24612579 :     XLogRecord *rechdr = (XLogRecord *) rdata->data;
 3551 tgl@sss.pgh.pa.us         795                 :       24612579 :     uint8       info = rechdr->xl_info & ~XLR_INFO_MASK;
 1011 rhaas@postgresql.org      796                 :       24612579 :     WalInsertClass class = WALINSERT_NORMAL;
                                797                 :                :     XLogRecPtr  StartPos;
                                798                 :                :     XLogRecPtr  EndPos;
 2873 akapila@postgresql.o      799                 :       24612579 :     bool        prevDoPageWrites = doPageWrites;
                                800                 :                :     TimeLineID  insertTLI;
                                801                 :                : 
                                802                 :                :     /* Does this record type require special handling? */
 1011 rhaas@postgresql.org      803         [ +  + ]:       24612579 :     if (unlikely(rechdr->xl_rmid == RM_XLOG_ID))
                                804                 :                :     {
                                805         [ +  + ]:         330884 :         if (info == XLOG_SWITCH)
                                806                 :            837 :             class = WALINSERT_SPECIAL_SWITCH;
                                807         [ +  + ]:         330047 :         else if (info == XLOG_CHECKPOINT_REDO)
                                808                 :           1000 :             class = WALINSERT_SPECIAL_CHECKPOINT;
                                809                 :                :     }
                                810                 :                : 
                                811                 :                :     /* we assume that all of the record header is in the first chunk */
 4266 heikki.linnakangas@i      812         [ -  + ]:       24612579 :     Assert(rdata->len >= SizeOfXLogRecord);
                                813                 :                : 
                                814                 :                :     /* cross-check on whether we should be here or not */
 6239 tgl@sss.pgh.pa.us         815         [ -  + ]:       24612579 :     if (!XLogInsertAllowed())
 6239 tgl@sss.pgh.pa.us         816         [ #  # ]:UBC           0 :         elog(ERROR, "cannot make new WAL entries during recovery");
                                817                 :                : 
                                818                 :                :     /*
                                819                 :                :      * Given that we're not in recovery, InsertTimeLineID is set and can't
                                820                 :                :      * change, so we can read it without a lock.
                                821                 :                :      */
 1719 rhaas@postgresql.org      822                 :CBC    24612579 :     insertTLI = XLogCtl->InsertTimeLineID;
                                823                 :                : 
                                824                 :                :     /*----------
                                825                 :                :      *
                                826                 :                :      * We have now done all the preparatory work we can without holding a
                                827                 :                :      * lock or modifying shared state. From here on, inserting the new WAL
                                828                 :                :      * record to the shared WAL buffer cache is a two-step process:
                                829                 :                :      *
                                830                 :                :      * 1. Reserve the right amount of space from the WAL. The current head of
                                831                 :                :      *    reserved space is kept in Insert->CurrBytePos, and is protected by
                                832                 :                :      *    insertpos_lck.
                                833                 :                :      *
                                834                 :                :      * 2. Copy the record to the reserved WAL space. This involves finding the
                                835                 :                :      *    correct WAL buffer containing the reserved space, and copying the
                                836                 :                :      *    record in place. This can be done concurrently in multiple processes.
                                837                 :                :      *
                                838                 :                :      * To keep track of which insertions are still in-progress, each concurrent
                                839                 :                :      * inserter acquires an insertion lock. In addition to just indicating that
                                840                 :                :      * an insertion is in progress, the lock tells others how far the inserter
                                841                 :                :      * has progressed. There is a small fixed number of insertion locks,
                                842                 :                :      * determined by NUM_XLOGINSERT_LOCKS. When an inserter crosses a page
                                843                 :                :      * boundary, it updates the value stored in the lock to how far it has
                                844                 :                :      * inserted, to allow the previous buffer to be flushed.
                                845                 :                :      *
                                846                 :                :      * Holding onto an insertion lock also protects RedoRecPtr and
                                847                 :                :      * fullPageWrites from changing until the insertion is finished.
                                848                 :                :      *
                                849                 :                :      * Step 2 can usually be done completely in parallel. If the required WAL
                                850                 :                :      * page is not initialized yet, you have to grab WALBufMappingLock to
                                851                 :                :      * initialize it, but the WAL writer tries to do that ahead of insertions
                                852                 :                :      * to keep that from happening in the critical path.
                                853                 :                :      *
                                854                 :                :      *----------
                                855                 :                :      */
 5310 heikki.linnakangas@i      856                 :       24612579 :     START_CRIT_SECTION();
                                857                 :                : 
 1011 rhaas@postgresql.org      858         [ +  + ]:       24612579 :     if (likely(class == WALINSERT_NORMAL))
                                859                 :                :     {
 1020                           860                 :       24610742 :         WALInsertLockAcquire();
                                861                 :                : 
                                862                 :                :         /*
                                863                 :                :          * Check to see if my copy of RedoRecPtr is out of date. If so, we may
                                864                 :                :          * have to go back and have the caller recompute everything. This can
                                865                 :                :          * only happen just after a checkpoint, so it's better to be slow in
                                866                 :                :          * this case and fast otherwise.
                                867                 :                :          *
                                868                 :                :          * Also check to see if fullPageWrites was just turned on or there's a
                                869                 :                :          * running backup (which forces full-page writes); if we weren't
                                870                 :                :          * already doing full-page writes then go back and recompute.
                                871                 :                :          *
                                872                 :                :          * If we aren't doing full-page writes then RedoRecPtr doesn't
                                873                 :                :          * actually affect the contents of the XLOG record, so we'll update
                                874                 :                :          * our local copy but not force a recomputation.  (If doPageWrites was
                                875                 :                :          * just turned off, we could recompute the record without full pages,
                                876                 :                :          * but we choose not to bother.)
                                877                 :                :          */
                                878         [ +  + ]:       24610742 :         if (RedoRecPtr != Insert->RedoRecPtr)
                                879                 :                :         {
                                880         [ -  + ]:           7717 :             Assert(RedoRecPtr < Insert->RedoRecPtr);
                                881                 :           7717 :             RedoRecPtr = Insert->RedoRecPtr;
                                882                 :                :         }
                                883   [ +  +  +  + ]:       24610742 :         doPageWrites = (Insert->fullPageWrites || Insert->runningBackups > 0);
                                884                 :                : 
                                885         [ +  + ]:       24610742 :         if (doPageWrites &&
                                886   [ +  +  +  + ]:       22304060 :             (!prevDoPageWrites ||
  262 alvherre@kurilemu.de      887         [ +  + ]:       21288034 :              (XLogRecPtrIsValid(fpw_lsn) && fpw_lsn <= RedoRecPtr)))
                                888                 :                :         {
                                889                 :                :             /*
                                890                 :                :              * Oops, some buffer now needs to be backed up that the caller
                                891                 :                :              * didn't back up.  Start over.
                                892                 :                :              */
 1020 rhaas@postgresql.org      893                 :           8420 :             WALInsertLockRelease();
                                894         [ -  + ]:           8420 :             END_CRIT_SECTION();
                                895                 :           8420 :             return InvalidXLogRecPtr;
                                896                 :                :         }
                                897                 :                : 
                                898                 :                :         /*
                                899                 :                :          * Reserve space for the record in the WAL. This also sets the xl_prev
                                900                 :                :          * pointer.
                                901                 :                :          */
 4280 heikki.linnakangas@i      902                 :       24602322 :         ReserveXLogInsertLocation(rechdr->xl_tot_len, &StartPos, &EndPos,
                                903                 :                :                                   &rechdr->xl_prev);
                                904                 :                : 
                                905                 :                :         /* Normal records are always inserted. */
 4766                           906                 :       24602322 :         inserted = true;
                                907                 :                :     }
 1011 rhaas@postgresql.org      908         [ +  + ]:           1837 :     else if (class == WALINSERT_SPECIAL_SWITCH)
                                909                 :                :     {
                                910                 :                :         /*
                                911                 :                :          * In order to insert an XLOG_SWITCH record, we need to hold all of
                                912                 :                :          * the WAL insertion locks, not just one, so that no one else can
                                913                 :                :          * begin inserting a record until we've figured out how much space
                                914                 :                :          * remains in the current WAL segment and claimed all of it.
                                915                 :                :          *
                                916                 :                :          * Nonetheless, this case is simpler than the normal cases handled
                                917                 :                :          * above, which must check for changes in doPageWrites and RedoRecPtr.
                                918                 :                :          * Those checks are only needed for records that can contain buffer
                                919                 :                :          * references, and an XLOG_SWITCH record never does.
                                920                 :                :          */
  262 alvherre@kurilemu.de      921         [ -  + ]:            837 :         Assert(!XLogRecPtrIsValid(fpw_lsn));
 1020 rhaas@postgresql.org      922                 :            837 :         WALInsertLockAcquireExclusive();
                                923                 :            837 :         inserted = ReserveXLogSwitch(&StartPos, &EndPos, &rechdr->xl_prev);
                                924                 :                :     }
                                925                 :                :     else
                                926                 :                :     {
 1011                           927         [ -  + ]:           1000 :         Assert(class == WALINSERT_SPECIAL_CHECKPOINT);
                                928                 :                : 
                                929                 :                :         /*
                                930                 :                :          * We need to update both the local and shared copies of RedoRecPtr,
                                931                 :                :          * which means that we need to hold all the WAL insertion locks.
                                932                 :                :          * However, there can't be any buffer references, so as above, we need
                                933                 :                :          * not check RedoRecPtr before inserting the record; we just need to
                                934                 :                :          * update it afterwards.
                                935                 :                :          */
  262 alvherre@kurilemu.de      936         [ -  + ]:           1000 :         Assert(!XLogRecPtrIsValid(fpw_lsn));
 1011 rhaas@postgresql.org      937                 :           1000 :         WALInsertLockAcquireExclusive();
                                938                 :           1000 :         ReserveXLogInsertLocation(rechdr->xl_tot_len, &StartPos, &EndPos,
                                939                 :                :                                   &rechdr->xl_prev);
                                940                 :           1000 :         RedoRecPtr = Insert->RedoRecPtr = StartPos;
                                941                 :           1000 :         inserted = true;
                                942                 :                :     }
                                943                 :                : 
 4766 heikki.linnakangas@i      944         [ +  + ]:       24604159 :     if (inserted)
                                945                 :                :     {
                                946                 :                :         /*
                                947                 :                :          * Now that xl_prev has been filled in, calculate CRC of the record
                                948                 :                :          * header.
                                949                 :                :          */
 4266                           950                 :       24604094 :         rdata_crc = rechdr->xl_crc;
                                951                 :       24604094 :         COMP_CRC32C(rdata_crc, rechdr, offsetof(XLogRecord, xl_crc));
 4282                           952                 :       24604094 :         FIN_CRC32C(rdata_crc);
 4766                           953                 :       24604094 :         rechdr->xl_crc = rdata_crc;
                                954                 :                : 
                                955                 :                :         /*
                                956                 :                :          * All the record data, including the header, is now ready to be
                                957                 :                :          * inserted. Copy the record in the space reserved.
                                958                 :                :          */
 1011 rhaas@postgresql.org      959                 :       24604094 :         CopyXLogRecordToWAL(rechdr->xl_tot_len,
                                960                 :                :                             class == WALINSERT_SPECIAL_SWITCH, rdata,
                                961                 :                :                             StartPos, EndPos, insertTLI);
                                962                 :                : 
                                963                 :                :         /*
                                964                 :                :          * Unless record is flagged as not important, update LSN of last
                                965                 :                :          * important record in the current slot. When holding all locks, just
                                966                 :                :          * update the first one.
                                967                 :                :          */
 3503 andres@anarazel.de        968         [ +  + ]:       24604094 :         if ((flags & XLOG_MARK_UNIMPORTANT) == 0)
                                969                 :                :         {
 3357 bruce@momjian.us          970         [ +  + ]:       24443075 :             int         lockno = holdingAllLocks ? 0 : MyLockNo;
                                971                 :                : 
 3503 andres@anarazel.de        972                 :       24443075 :             WALInsertLocks[lockno].l.lastImportantAt = StartPos;
                                973                 :                :         }
                                974                 :                :     }
                                975                 :                :     else
                                976                 :                :     {
                                977                 :                :         /*
                                978                 :                :          * This was an xlog-switch record, but the current insert location was
                                979                 :                :          * already exactly at the beginning of a segment, so there was no need
                                980                 :                :          * to do anything.
                                981                 :                :          */
                                982                 :                :     }
                                983                 :                : 
                                984                 :                :     /*
                                985                 :                :      * Done! Let others know that we're finished.
                                986                 :                :      */
 4510 heikki.linnakangas@i      987                 :       24604159 :     WALInsertLockRelease();
                                988                 :                : 
 4766                           989         [ -  + ]:       24604159 :     END_CRIT_SECTION();
                                990                 :                : 
 1727 akapila@postgresql.o      991                 :       24604159 :     MarkCurrentTransactionIdLoggedIfAny();
                                992                 :                : 
                                993                 :                :     /*
                                994                 :                :      * Mark the top transaction ID as logged (if needed) so that we do not try
                                995                 :                :      * to log it again with the next WAL record in the current subtransaction.
                                996                 :                :      */
                                997         [ +  + ]:       24604159 :     if (topxid_included)
                                998                 :            223 :         MarkSubxactTopXidLogged();
                                999                 :                : 
                               1000                 :                :     /*
                               1001                 :                :      * Update shared LogwrtRqst.Write, if we crossed a page boundary.
                               1002                 :                :      */
 4766 heikki.linnakangas@i     1003         [ +  + ]:       24604159 :     if (StartPos / XLOG_BLCKSZ != EndPos / XLOG_BLCKSZ)
                               1004                 :                :     {
 4325 andres@anarazel.de       1005                 :        1909063 :         SpinLockAcquire(&XLogCtl->info_lck);
                               1006                 :                :         /* advance global request to include new block(s) */
                               1007         [ +  + ]:        1909063 :         if (XLogCtl->LogwrtRqst.Write < EndPos)
                               1008                 :        1835640 :             XLogCtl->LogwrtRqst.Write = EndPos;
                               1009                 :        1909063 :         SpinLockRelease(&XLogCtl->info_lck);
  842 alvherre@alvh.no-ip.     1010                 :        1909063 :         RefreshXLogWriteResult(LogwrtResult);
                               1011                 :                :     }
                               1012                 :                : 
                               1013                 :                :     /*
                               1014                 :                :      * If this was an XLOG_SWITCH record, flush the record and the empty
                               1015                 :                :      * padding space that fills the rest of the segment, and perform
                               1016                 :                :      * end-of-segment actions (eg, notifying archiver).
                               1017                 :                :      */
 1011 rhaas@postgresql.org     1018         [ +  + ]:       24604159 :     if (class == WALINSERT_SPECIAL_SWITCH)
                               1019                 :                :     {
                               1020                 :                :         TRACE_POSTGRESQL_WAL_SWITCH();
 4766 heikki.linnakangas@i     1021                 :            837 :         XLogFlush(EndPos);
                               1022                 :                : 
                               1023                 :                :         /*
                               1024                 :                :          * Even though we reserved the rest of the segment for us, which is
                               1025                 :                :          * reflected in EndPos, we return a pointer to just the end of the
                               1026                 :                :          * xlog-switch record.
                               1027                 :                :          */
                               1028         [ +  + ]:            837 :         if (inserted)
                               1029                 :                :         {
                               1030                 :            772 :             EndPos = StartPos + SizeOfXLogRecord;
                               1031         [ +  + ]:            772 :             if (StartPos / XLOG_BLCKSZ != EndPos / XLOG_BLCKSZ)
                               1032                 :                :             {
 3232 andres@anarazel.de       1033                 :              1 :                 uint64      offset = XLogSegmentOffset(EndPos, wal_segment_size);
                               1034                 :                : 
                               1035         [ -  + ]:              1 :                 if (offset == EndPos % XLOG_BLCKSZ)
 4766 heikki.linnakangas@i     1036                 :UBC           0 :                     EndPos += SizeOfXLogLongPHD;
                               1037                 :                :                 else
 4766 heikki.linnakangas@i     1038                 :CBC           1 :                     EndPos += SizeOfXLogShortPHD;
                               1039                 :                :             }
                               1040                 :                :         }
                               1041                 :                :     }
                               1042                 :                : 
                               1043                 :                : #ifdef WAL_DEBUG
                               1044                 :                :     if (XLOG_DEBUG)
                               1045                 :                :     {
                               1046                 :                :         static XLogReaderState *debug_reader = NULL;
                               1047                 :                :         XLogRecord *record;
                               1048                 :                :         DecodedXLogRecord *decoded;
                               1049                 :                :         StringInfoData buf;
                               1050                 :                :         StringInfoData recordBuf;
                               1051                 :                :         char       *errormsg = NULL;
                               1052                 :                :         MemoryContext oldCxt;
                               1053                 :                : 
                               1054                 :                :         oldCxt = MemoryContextSwitchTo(walDebugCxt);
                               1055                 :                : 
                               1056                 :                :         initStringInfo(&buf);
                               1057                 :                :         appendStringInfo(&buf, "INSERT @ %X/%08X: ", LSN_FORMAT_ARGS(EndPos));
                               1058                 :                : 
                               1059                 :                :         /*
                               1060                 :                :          * We have to piece together the WAL record data from the XLogRecData
                               1061                 :                :          * entries, so that we can pass it to the rm_desc function as one
                               1062                 :                :          * contiguous chunk.
                               1063                 :                :          */
                               1064                 :                :         initStringInfo(&recordBuf);
                               1065                 :                :         for (; rdata != NULL; rdata = rdata->next)
                               1066                 :                :             appendBinaryStringInfo(&recordBuf, rdata->data, rdata->len);
                               1067                 :                : 
                               1068                 :                :         /* We also need temporary space to decode the record. */
                               1069                 :                :         record = (XLogRecord *) recordBuf.data;
                               1070                 :                :         decoded = (DecodedXLogRecord *)
                               1071                 :                :             palloc(DecodeXLogRecordRequiredSpace(record->xl_tot_len));
                               1072                 :                : 
                               1073                 :                :         if (!debug_reader)
                               1074                 :                :             debug_reader = XLogReaderAllocate(wal_segment_size, NULL,
                               1075                 :                :                                               XL_ROUTINE(.page_read = NULL,
                               1076                 :                :                                                          .segment_open = NULL,
                               1077                 :                :                                                          .segment_close = NULL),
                               1078                 :                :                                               NULL);
                               1079                 :                :         if (!debug_reader)
                               1080                 :                :         {
                               1081                 :                :             appendStringInfoString(&buf, "error decoding record: out of memory while allocating a WAL reading processor");
                               1082                 :                :         }
                               1083                 :                :         else if (!DecodeXLogRecord(debug_reader,
                               1084                 :                :                                    decoded,
                               1085                 :                :                                    record,
                               1086                 :                :                                    EndPos,
                               1087                 :                :                                    &errormsg))
                               1088                 :                :         {
                               1089                 :                :             appendStringInfo(&buf, "error decoding record: %s",
                               1090                 :                :                              errormsg ? errormsg : "no error message");
                               1091                 :                :         }
                               1092                 :                :         else
                               1093                 :                :         {
                               1094                 :                :             appendStringInfoString(&buf, " - ");
                               1095                 :                : 
                               1096                 :                :             debug_reader->record = decoded;
                               1097                 :                :             xlog_outdesc(&buf, debug_reader);
                               1098                 :                :             debug_reader->record = NULL;
                               1099                 :                :         }
                               1100                 :                :         elog(LOG, "%s", buf.data);
                               1101                 :                : 
                               1102                 :                :         pfree(decoded);
                               1103                 :                :         pfree(buf.data);
                               1104                 :                :         pfree(recordBuf.data);
                               1105                 :                :         MemoryContextSwitchTo(oldCxt);
                               1106                 :                :     }
                               1107                 :                : #endif
                               1108                 :                : 
                               1109                 :                :     /*
                               1110                 :                :      * Update our global variables
                               1111                 :                :      */
                               1112                 :       24604159 :     ProcLastRecPtr = StartPos;
                               1113                 :       24604159 :     XactLastRecEnd = EndPos;
                               1114                 :                : 
                               1115                 :                :     /* Report WAL traffic to the instrumentation. */
 2304 akapila@postgresql.o     1116         [ +  + ]:       24604159 :     if (inserted)
                               1117                 :                :     {
                               1118                 :       24604094 :         pgWalUsage.wal_bytes += rechdr->xl_tot_len;
                               1119                 :       24604094 :         pgWalUsage.wal_records++;
 2273                          1120                 :       24604094 :         pgWalUsage.wal_fpi += num_fpi;
  270 michael@paquier.xyz      1121                 :       24604094 :         pgWalUsage.wal_fpi_bytes += fpi_bytes;
                               1122                 :                : 
                               1123                 :                :         /* Required for the flush of pending stats WAL data */
  363                          1124                 :       24604094 :         pgstat_report_fixed = true;
                               1125                 :                :     }
                               1126                 :                : 
 4766 heikki.linnakangas@i     1127                 :       24604159 :     return EndPos;
                               1128                 :                : }
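
The page-boundary test near the end of XLogInsertRecord above can be illustrated in isolation. The following is only a minimal sketch, not the PostgreSQL implementation: SKETCH_XLOG_BLCKSZ, sketch_crosses_page and sketch_advance_request are hypothetical names, the 8192-byte page size is an assumed value, and the real code updates the shared request pointer under a spinlock, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 8 kB WAL pages; the real XLOG_BLCKSZ is a build-time setting. */
#define SKETCH_XLOG_BLCKSZ 8192

/*
 * Hypothetical helper: true when the start and end positions of a record
 * fall on different WAL pages, mirroring the
 * "StartPos / XLOG_BLCKSZ != EndPos / XLOG_BLCKSZ" test in the listing.
 */
static bool
sketch_crosses_page(uint64_t start_pos, uint64_t end_pos)
{
    return start_pos / SKETCH_XLOG_BLCKSZ != end_pos / SKETCH_XLOG_BLCKSZ;
}

/*
 * Hypothetical helper: the write request only ever moves forward, so a
 * smaller end position leaves the current request untouched.  Locking is
 * deliberately left out of this sketch.
 */
static uint64_t
sketch_advance_request(uint64_t current_request, uint64_t end_pos)
{
    return (current_request < end_pos) ? end_pos : current_request;
}
```

A record that ends exactly at the start of the next page counts as crossing, because the integer division of its end position yields the next page number.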
                               1129                 :                : 
                               1130                 :                : /*
                               1131                 :                :  * Reserves the right amount of space for a record of given size from the WAL.
                               1132                 :                :  * *StartPos is set to the beginning of the reserved section, *EndPos to
                               1133                 :                :  * its end+1. *PrevPtr is set to the beginning of the previous record; it is
                               1134                 :                :  * used to set the xl_prev of this record.
                               1135                 :                :  *
                               1136                 :                :  * This is the performance critical part of XLogInsert that must be serialized
                               1137                 :                :  * across backends. The rest can happen mostly in parallel. Try to keep this
                               1138                 :                :  * section as short as possible; insertpos_lck can be heavily contended on a
                               1139                 :                :  * busy system.
                               1140                 :                :  *
                               1141                 :                :  * NB: The space calculation here must match the code in CopyXLogRecordToWAL,
                               1142                 :                :  * where we actually copy the record to the reserved space.
                               1143                 :                :  *
                               1144                 :                :  * NB: Testing shows that XLogInsertRecord runs faster if this code is inlined;
                               1145                 :                :  * however, because there are two call sites, the compiler is reluctant to
                               1146                 :                :  * inline. We use pg_always_inline here to try to convince it.
                               1147                 :                :  */
                               1148                 :                : static pg_always_inline void
                               1149                 :       24603322 : ReserveXLogInsertLocation(int size, XLogRecPtr *StartPos, XLogRecPtr *EndPos,
                               1150                 :                :                           XLogRecPtr *PrevPtr)
                               1151                 :                : {
 4325 andres@anarazel.de       1152                 :       24603322 :     XLogCtlInsert *Insert = &XLogCtl->Insert;
                               1153                 :                :     uint64      startbytepos;
                               1154                 :                :     uint64      endbytepos;
                               1155                 :                :     uint64      prevbytepos;
                               1156                 :                : 
 4766 heikki.linnakangas@i     1157                 :       24603322 :     size = MAXALIGN(size);
                               1158                 :                : 
                               1159                 :                :     /* All (non xlog-switch) records should contain data. */
                               1160         [ -  + ]:       24603322 :     Assert(size > SizeOfXLogRecord);
                               1161                 :                : 
                               1162                 :                :     /*
                               1163                 :                :      * The duration the spinlock needs to be held is minimized by minimizing
                               1164                 :                :      * the calculations that have to be done while holding the lock. The
                               1165                 :                :      * current tip of reserved WAL is kept in CurrBytePos, as a byte position
                               1166                 :                :      * that only counts "usable" bytes in WAL, that is, it excludes all WAL
                               1167                 :                :      * page headers. The mapping between "usable" byte positions and physical
                               1168                 :                :      * positions (XLogRecPtrs) can be done outside the locked region, and
                               1169                 :                :      * because the usable byte position doesn't include any headers, reserving
                               1170                 :                :      * X bytes from WAL is almost as simple as "CurrBytePos += X".
                               1171                 :                :      */
                               1172                 :       24603322 :     SpinLockAcquire(&Insert->insertpos_lck);
                               1173                 :                : 
                               1174                 :       24603322 :     startbytepos = Insert->CurrBytePos;
                               1175                 :       24603322 :     endbytepos = startbytepos + size;
                               1176                 :       24603322 :     prevbytepos = Insert->PrevBytePos;
                               1177                 :       24603322 :     Insert->CurrBytePos = endbytepos;
                               1178                 :       24603322 :     Insert->PrevBytePos = startbytepos;
                               1179                 :                : 
                               1180                 :       24603322 :     SpinLockRelease(&Insert->insertpos_lck);
                               1181                 :                : 
                               1182                 :       24603322 :     *StartPos = XLogBytePosToRecPtr(startbytepos);
                               1183                 :       24603322 :     *EndPos = XLogBytePosToEndRecPtr(endbytepos);
                               1184                 :       24603322 :     *PrevPtr = XLogBytePosToRecPtr(prevbytepos);
                               1185                 :                : 
                               1186                 :                :     /*
                               1187                 :                :      * Check that the conversions between "usable byte positions" and
                               1188                 :                :      * XLogRecPtrs work consistently in both directions.
                               1189                 :                :      */
                               1190         [ -  + ]:       24603322 :     Assert(XLogRecPtrToBytePos(*StartPos) == startbytepos);
                               1191         [ -  + ]:       24603322 :     Assert(XLogRecPtrToBytePos(*EndPos) == endbytepos);
                               1192         [ -  + ]:       24603322 :     Assert(XLogRecPtrToBytePos(*PrevPtr) == prevbytepos);
                               1193                 :       24603322 : }
                               1194                 :                : 
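The comment inside ReserveXLogInsertLocation() describes reserving space by bumping a header-free "usable byte" counter while holding a lock. A minimal standalone sketch of that idea follows. It is not the PostgreSQL implementation: SketchInsert, sketch_reserve and SKETCH_ALIGNOF are hypothetical names, a C11 atomic_flag stands in for the spinlock, and the conversion to physical positions is left out.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Assumption: 8-byte alignment stands in for MAXALIGN. */
#define SKETCH_ALIGNOF 8

/* Hypothetical stand-in for the insert-position fields guarded by a spinlock. */
typedef struct SketchInsert
{
    atomic_flag lock;           /* plays the role of insertpos_lck */
    uint64_t    curr_byte_pos;  /* tip of reserved space, headers excluded */
    uint64_t    prev_byte_pos;  /* start of the previously reserved record */
} SketchInsert;

/*
 * Reserve "size" bytes (rounded up to the alignment). Only loads, one add
 * and two stores happen while the lock is held.
 */
static void
sketch_reserve(SketchInsert *ins, uint64_t size,
               uint64_t *start, uint64_t *end, uint64_t *prev)
{
    size = (size + SKETCH_ALIGNOF - 1) & ~(uint64_t) (SKETCH_ALIGNOF - 1);

    /* spin until the flag was previously clear */
    while (atomic_flag_test_and_set_explicit(&ins->lock, memory_order_acquire))
        ;

    *start = ins->curr_byte_pos;
    *end = *start + size;
    *prev = ins->prev_byte_pos;
    ins->curr_byte_pos = *end;
    ins->prev_byte_pos = *start;

    atomic_flag_clear_explicit(&ins->lock, memory_order_release);
}
```

Starting from a zeroed SketchInsert, reserving 30 bytes yields the range [0, 32). A second reservation then starts at 32, and a third reports 32 as its previous-record position.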
                               1195                 :                : /*
                               1196                 :                :  * Like ReserveXLogInsertLocation(), but for an xlog-switch record.
                               1197                 :                :  *
                               1198                 :                :  * A log-switch record is handled slightly differently. The rest of the
                               1199                 :                :  * segment will be reserved for this insertion, as indicated by the returned
                               1200                 :                :  * *EndPos value. However, if we are already at the beginning of the current
                               1201                 :                :  * segment, *StartPos and *EndPos are set to the current location without
                               1202                 :                :  * reserving any space, and the function returns false.
                               1203                 :                :  */
                               1204                 :                : static bool
                               1205                 :            837 : ReserveXLogSwitch(XLogRecPtr *StartPos, XLogRecPtr *EndPos, XLogRecPtr *PrevPtr)
                               1206                 :                : {
 4325 andres@anarazel.de       1207                 :            837 :     XLogCtlInsert *Insert = &XLogCtl->Insert;
                               1208                 :                :     uint64      startbytepos;
                               1209                 :                :     uint64      endbytepos;
                               1210                 :                :     uint64      prevbytepos;
 4266 heikki.linnakangas@i     1211                 :            837 :     uint32      size = MAXALIGN(SizeOfXLogRecord);
                               1212                 :                :     XLogRecPtr  ptr;
                               1213                 :                :     uint32      segleft;
                               1214                 :                : 
                               1215                 :                :     /*
                               1216                 :                :      * These calculations are a bit heavy-weight to be done while holding a
                               1217                 :                :      * spinlock, but since we're holding all the WAL insertion locks, there
                               1218                 :                :      * are no other inserters competing for it. GetXLogInsertRecPtr() does
                               1219                 :                :      * compete for it, but that's not called very frequently.
                               1220                 :                :      */
 4766                          1221                 :            837 :     SpinLockAcquire(&Insert->insertpos_lck);
                               1222                 :                : 
                               1223                 :            837 :     startbytepos = Insert->CurrBytePos;
                               1224                 :                : 
                               1225                 :            837 :     ptr = XLogBytePosToEndRecPtr(startbytepos);
 3232 andres@anarazel.de       1226         [ +  + ]:            837 :     if (XLogSegmentOffset(ptr, wal_segment_size) == 0)
                               1227                 :                :     {
 4766 heikki.linnakangas@i     1228                 :             65 :         SpinLockRelease(&Insert->insertpos_lck);
                               1229                 :             65 :         *EndPos = *StartPos = ptr;
                               1230                 :             65 :         return false;
                               1231                 :                :     }
                               1232                 :                : 
                               1233                 :            772 :     endbytepos = startbytepos + size;
                               1234                 :            772 :     prevbytepos = Insert->PrevBytePos;
                               1235                 :                : 
                               1236                 :            772 :     *StartPos = XLogBytePosToRecPtr(startbytepos);
                               1237                 :            772 :     *EndPos = XLogBytePosToEndRecPtr(endbytepos);
                               1238                 :                : 
 3232 andres@anarazel.de       1239                 :            772 :     segleft = wal_segment_size - XLogSegmentOffset(*EndPos, wal_segment_size);
                               1240         [ +  - ]:            772 :     if (segleft != wal_segment_size)
                               1241                 :                :     {
                               1242                 :                :         /* consume the rest of the segment */
 4766 heikki.linnakangas@i     1243                 :            772 :         *EndPos += segleft;
                               1244                 :            772 :         endbytepos = XLogRecPtrToBytePos(*EndPos);
                               1245                 :                :     }
                               1246                 :            772 :     Insert->CurrBytePos = endbytepos;
                               1247                 :            772 :     Insert->PrevBytePos = startbytepos;
                               1248                 :                : 
                               1249                 :            772 :     SpinLockRelease(&Insert->insertpos_lck);
                               1250                 :                : 
                               1251                 :            772 :     *PrevPtr = XLogBytePosToRecPtr(prevbytepos);
                               1252                 :                : 
 3232 andres@anarazel.de       1253         [ -  + ]:            772 :     Assert(XLogSegmentOffset(*EndPos, wal_segment_size) == 0);
 4766 heikki.linnakangas@i     1254         [ -  + ]:            772 :     Assert(XLogRecPtrToBytePos(*EndPos) == endbytepos);
                               1255         [ -  + ]:            772 :     Assert(XLogRecPtrToBytePos(*StartPos) == startbytepos);
                               1256         [ -  + ]:            772 :     Assert(XLogRecPtrToBytePos(*PrevPtr) == prevbytepos);
                               1257                 :                : 
                               1258                 :            772 :     return true;
                               1259                 :                : }
                               1260                 :                : 
                               1261                 :                : /*
                               1262                 :                :  * Subroutine of XLogInsertRecord.  Copies a WAL record to an already-reserved
                               1263                 :                :  * area in the WAL.
                               1264                 :                :  */
                               1265                 :                : static void
                               1266                 :       24604094 : CopyXLogRecordToWAL(int write_len, bool isLogSwitch, XLogRecData *rdata,
                               1267                 :                :                     XLogRecPtr StartPos, XLogRecPtr EndPos, TimeLineID tli)
                               1268                 :                : {
                               1269                 :                :     char       *currpos;
                               1270                 :                :     int         freespace;
                               1271                 :                :     int         written;
                               1272                 :                :     XLogRecPtr  CurrPos;
                               1273                 :                :     XLogPageHeader pagehdr;
                               1274                 :                : 
                               1275                 :                :     /*
                               1276                 :                :      * Get a pointer to the right place in the right WAL buffer to start
                               1277                 :                :      * inserting to.
                               1278                 :                :      */
                               1279                 :       24604094 :     CurrPos = StartPos;
 1724 rhaas@postgresql.org     1280                 :       24604094 :     currpos = GetXLogBuffer(CurrPos, tli);
 4766 heikki.linnakangas@i     1281         [ +  - ]:       24604094 :     freespace = INSERT_FREESPACE(CurrPos);
                               1282                 :                : 
                               1283                 :                :     /*
                               1284                 :                :      * there should be enough space for at least the first field (xl_tot_len)
                               1285                 :                :      * on this page.
                               1286                 :                :      */
                               1287         [ -  + ]:       24604094 :     Assert(freespace >= sizeof(uint32));
                               1288                 :                : 
                               1289                 :                :     /* Copy record data */
                               1290                 :       24604094 :     written = 0;
                               1291         [ +  + ]:      113082521 :     while (rdata != NULL)
                               1292                 :                :     {
  691 peter@eisentraut.org     1293                 :       88478427 :         const char *rdata_data = rdata->data;
 4766 heikki.linnakangas@i     1294                 :       88478427 :         int         rdata_len = rdata->len;
                               1295                 :                : 
                               1296         [ +  + ]:       90502415 :         while (rdata_len > freespace)
                               1297                 :                :         {
                               1298                 :                :             /*
                               1299                 :                :              * Write what fits on this page, and continue on the next page.
                               1300                 :                :              */
                               1301   [ +  +  -  + ]:        2023988 :             Assert(CurrPos % XLOG_BLCKSZ >= SizeOfXLogShortPHD || freespace == 0);
                               1302                 :        2023988 :             memcpy(currpos, rdata_data, freespace);
                               1303                 :        2023988 :             rdata_data += freespace;
                               1304                 :        2023988 :             rdata_len -= freespace;
                               1305                 :        2023988 :             written += freespace;
                               1306                 :        2023988 :             CurrPos += freespace;
                               1307                 :                : 
                               1308                 :                :             /*
                               1309                 :                :              * Get pointer to beginning of next page, and set the xlp_rem_len
                               1310                 :                :              * in the page header. Set XLP_FIRST_IS_CONTRECORD.
                               1311                 :                :              *
                               1312                 :                :              * It's safe to set the contrecord flag and xlp_rem_len without a
                               1313                 :                :              * lock on the page. All the other flags were already set when the
                               1314                 :                :              * page was initialized, in AdvanceXLInsertBuffer, and we're the
                               1315                 :                :              * only backend that needs to set the contrecord flag.
                               1316                 :                :              */
 1724 rhaas@postgresql.org     1317                 :        2023988 :             currpos = GetXLogBuffer(CurrPos, tli);
 4766 heikki.linnakangas@i     1318                 :        2023988 :             pagehdr = (XLogPageHeader) currpos;
                               1319                 :        2023988 :             pagehdr->xlp_rem_len = write_len - written;
                               1320                 :        2023988 :             pagehdr->xlp_info |= XLP_FIRST_IS_CONTRECORD;
                               1321                 :                : 
                               1322                 :                :             /* skip over the page header */
 3232 andres@anarazel.de       1323         [ +  + ]:        2023988 :             if (XLogSegmentOffset(CurrPos, wal_segment_size) == 0)
                               1324                 :                :             {
 4766 heikki.linnakangas@i     1325                 :           1292 :                 CurrPos += SizeOfXLogLongPHD;
                               1326                 :           1292 :                 currpos += SizeOfXLogLongPHD;
                               1327                 :                :             }
                               1328                 :                :             else
                               1329                 :                :             {
                               1330                 :        2022696 :                 CurrPos += SizeOfXLogShortPHD;
                               1331                 :        2022696 :                 currpos += SizeOfXLogShortPHD;
                               1332                 :                :             }
                               1333         [ +  - ]:        2023988 :             freespace = INSERT_FREESPACE(CurrPos);
                               1334                 :                :         }
                               1335                 :                : 
                               1336   [ -  +  -  - ]:       88478427 :         Assert(CurrPos % XLOG_BLCKSZ >= SizeOfXLogShortPHD || rdata_len == 0);
                               1337                 :       88478427 :         memcpy(currpos, rdata_data, rdata_len);
                               1338                 :       88478427 :         currpos += rdata_len;
                               1339                 :       88478427 :         CurrPos += rdata_len;
                               1340                 :       88478427 :         freespace -= rdata_len;
                               1341                 :       88478427 :         written += rdata_len;
                               1342                 :                : 
                               1343                 :       88478427 :         rdata = rdata->next;
                               1344                 :                :     }
                               1345         [ -  + ]:       24604094 :     Assert(written == write_len);
                               1346                 :                : 
                               1347                 :                :     /*
                               1348                 :                :      * If this was an xlog-switch, it's not enough to write the switch record,
                               1349                 :                :      * we also have to consume all the remaining space in the WAL segment.  We
                               1350                 :                :      * have already reserved that space, but we need to actually fill it.
                               1351                 :                :      */
 3232 andres@anarazel.de       1352   [ +  +  +  - ]:       24604094 :     if (isLogSwitch && XLogSegmentOffset(CurrPos, wal_segment_size) != 0)
                               1353                 :                :     {
                               1354                 :                :         /* An xlog-switch record doesn't contain any data besides the header */
 4766 heikki.linnakangas@i     1355         [ -  + ]:            772 :         Assert(write_len == SizeOfXLogRecord);
                               1356                 :                : 
                               1357                 :                :         /* Assert that we did reserve the right amount of space */
 3232 andres@anarazel.de       1358         [ -  + ]:            772 :         Assert(XLogSegmentOffset(EndPos, wal_segment_size) == 0);
                               1359                 :                : 
                               1360                 :                :         /* Use up all the remaining space on the current page */
 4766 heikki.linnakangas@i     1361                 :            772 :         CurrPos += freespace;
                               1362                 :                : 
                               1363                 :                :         /*
                               1364                 :                :          * Cause all remaining pages in the segment to be flushed, leaving the
                               1365                 :                :          * XLog position where it should be, at the start of the next segment.
                               1366                 :                :          * We do this one page at a time, to make sure we don't deadlock
                               1367                 :                :          * against ourselves if wal_buffers < wal_segment_size.
                               1368                 :                :          */
                               1369         [ +  + ]:         782410 :         while (CurrPos < EndPos)
                               1370                 :                :         {
                               1371                 :                :             /*
                               1372                 :                :              * The minimal action to flush the page would be to call
                               1373                 :                :              * WALInsertLockUpdateInsertingAt(CurrPos) followed by
                               1374                 :                :              * AdvanceXLInsertBuffer(...).  The page would be left initialized
                               1375                 :                :              * mostly to zeros, except for the page header (always the short
                               1376                 :                :              * variant, as this is never a segment's first page).
                               1377                 :                :              *
                               1378                 :                :              * The large vistas of zeros are good for compressibility, but the
                               1379                 :                :              * headers interrupting them every XLOG_BLCKSZ (with values that
                               1380                 :                :              * differ from page to page) are not.  The effect varies with
                               1381                 :                :              * compression tool, but bzip2 for instance compresses about an
                               1382                 :                :              * order of magnitude worse if those headers are left in place.
                               1383                 :                :              *
                               1384                 :                :              * Rather than complicating AdvanceXLInsertBuffer itself (which is
                               1385                 :                :              * called in heavily-loaded circumstances as well as this lightly-
                               1386                 :                :              * loaded one) with variant behavior, we just use GetXLogBuffer
                               1387                 :                :              * (which itself calls the two methods we need) to get the pointer
                               1388                 :                :              * and zero most of the page.  Then we just zero the page header.
                               1389                 :                :              */
 1724 rhaas@postgresql.org     1390                 :         781638 :             currpos = GetXLogBuffer(CurrPos, tli);
 3040 tgl@sss.pgh.pa.us        1391   [ +  -  +  -  :        3126552 :             MemSet(currpos, 0, SizeOfXLogShortPHD);
                                     +  -  +  -  +  
                                                 + ]
                               1392                 :                : 
 4766 heikki.linnakangas@i     1393                 :         781638 :             CurrPos += XLOG_BLCKSZ;
                               1394                 :                :         }
                               1395                 :                :     }
                               1396                 :                :     else
                               1397                 :                :     {
                               1398                 :                :         /* Align the end position, so that the next record starts aligned */
 4266                          1399                 :       24603322 :         CurrPos = MAXALIGN64(CurrPos);
                               1400                 :                :     }
                               1401                 :                : 
 4766                          1402         [ -  + ]:       24604094 :     if (CurrPos != EndPos)
  844 dgustafsson@postgres     1403         [ #  # ]:UBC           0 :         ereport(PANIC,
                               1404                 :                :                 errcode(ERRCODE_DATA_CORRUPTED),
                               1405                 :                :                 errmsg_internal("space reserved for WAL record does not match what was written"));
 4766 heikki.linnakangas@i     1406                 :CBC    24604094 : }
                               1407                 :                : 
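The copy loop in CopyXLogRecordToWAL() writes each chunk of a chain into the current page and skips a page header whenever the data spills over. The sketch below models only that control flow on a flat buffer, under simplifying assumptions: SK_PAGE, SK_HDR, SkChunk, sk_freespace and sk_copy are hypothetical names, every page has the same header size, and the header fields (xlp_rem_len, the contrecord flag) are not maintained.

```c
#include <stddef.h>
#include <string.h>

#define SK_PAGE 64              /* hypothetical page size */
#define SK_HDR   8              /* hypothetical fixed page header size */

/* Hypothetical chain of data chunks, loosely modeled on XLogRecData. */
typedef struct SkChunk
{
    const char *data;
    size_t      len;
    const struct SkChunk *next;
} SkChunk;

/* Bytes left on the page containing pos; 0 exactly at a page boundary. */
static size_t
sk_freespace(size_t pos)
{
    return (pos % SK_PAGE == 0) ? 0 : SK_PAGE - pos % SK_PAGE;
}

/*
 * Copy the chain into buf starting at pos, leaving the first SK_HDR bytes
 * of each page untouched. The caller must pass a pos that is not inside a
 * header and a buffer large enough for the data. Returns the end position.
 */
static size_t
sk_copy(char *buf, size_t pos, const SkChunk *chunk)
{
    size_t      freespace = sk_freespace(pos);

    for (; chunk != NULL; chunk = chunk->next)
    {
        const char *src = chunk->data;
        size_t      len = chunk->len;

        while (len > freespace)
        {
            /* write what fits on this page, continue on the next one */
            memcpy(buf + pos, src, freespace);
            src += freespace;
            len -= freespace;
            pos += freespace;

            pos += SK_HDR;      /* skip over the next page's header */
            freespace = sk_freespace(pos);
        }

        memcpy(buf + pos, src, len);
        pos += len;
        freespace -= len;
    }
    return pos;
}
```

With 64-byte pages and 8-byte headers, copying a 10-byte chunk followed by a 60-byte chunk from position 8 fills bytes 8..63, leaves the header at 64..71 alone, and finishes at position 86.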
                               1408                 :                : /*
                               1409                 :                :  * Acquire a WAL insertion lock, for inserting to WAL.
                               1410                 :                :  */
                               1411                 :                : static void
 4510                          1412                 :       24611753 : WALInsertLockAcquire(void)
                               1413                 :                : {
                               1414                 :                :     bool        immed;
                               1415                 :                : 
                               1416                 :                :     /*
                               1417                 :                :      * It doesn't matter which of the WAL insertion locks we acquire, so try
                               1418                 :                :      * the one we used last time.  If the system isn't particularly busy, it's
                               1419                 :                :      * a good bet that it's still available, and it's good to have some
                               1420                 :                :      * affinity to a particular lock so that you don't unnecessarily bounce
                               1421                 :                :      * cache lines between processes when there's no contention.
                               1422                 :                :      *
                               1423                 :                :      * If this is the first time through in this backend, pick a lock
                               1424                 :                :      * (semi-)randomly.  This allows the locks to be used evenly if you have a
                               1425                 :                :      * lot of very short connections.
                               1426                 :                :      */
                               1427                 :                :     static int  lockToTry = -1;
                               1428                 :                : 
                               1429         [ +  + ]:       24611753 :     if (lockToTry == -1)
  885                          1430                 :           8855 :         lockToTry = MyProcNumber % NUM_XLOGINSERT_LOCKS;
 4510                          1431                 :       24611753 :     MyLockNo = lockToTry;
                               1432                 :                : 
                               1433                 :                :     /*
                               1434                 :                :      * The insertingAt value is initially set to 0, as we don't know our
                               1435                 :                :      * insert location yet.
                               1436                 :                :      */
 4013 andres@anarazel.de       1437                 :       24611753 :     immed = LWLockAcquire(&WALInsertLocks[MyLockNo].l.lock, LW_EXCLUSIVE);
 4510 heikki.linnakangas@i     1438         [ +  + ]:       24611753 :     if (!immed)
                               1439                 :                :     {
                               1440                 :                :         /*
                               1441                 :                :          * If we couldn't get the lock immediately, try another lock next
                               1442                 :                :          * time.  On a system with more insertion locks than concurrent
                               1443                 :                :          * inserters, this causes all the inserters to eventually migrate to a
                               1444                 :                :          * lock that no-one else is using.  On a system with more inserters
                               1445                 :                :          * than locks, it still helps to distribute the inserters evenly
                               1446                 :                :          * across the locks.
                               1447                 :                :          */
 4316                          1448                 :          22529 :         lockToTry = (lockToTry + 1) % NUM_XLOGINSERT_LOCKS;
                               1449                 :                :     }
 4766                          1450                 :       24611753 : }
                               1451                 :                : 
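The lock-selection policy described in WALInsertLockAcquire() (stick with the last lock used, move to the next one after having to wait) can be isolated from the locking itself. The following sketch does that. SketchLockChooser, sketch_choose_lock and sketch_note_result are hypothetical names, and the lock count of 8 is an assumption for illustration.

```c
#include <stdbool.h>

#define SKETCH_NUM_LOCKS 8      /* assumed number of insertion locks */

/* Per-process state; lock_to_try == -1 means "no lock chosen yet". */
typedef struct SketchLockChooser
{
    int         lock_to_try;
} SketchLockChooser;

/* Return the lock to try; the first call derives it from the process number. */
static int
sketch_choose_lock(SketchLockChooser *c, int proc_number)
{
    if (c->lock_to_try == -1)
        c->lock_to_try = proc_number % SKETCH_NUM_LOCKS;
    return c->lock_to_try;
}

/*
 * Record the outcome of the acquisition. If the caller had to wait, it
 * tries the next lock next time, otherwise it keeps its affinity.
 */
static void
sketch_note_result(SketchLockChooser *c, bool acquired_immediately)
{
    if (!acquired_immediately)
        c->lock_to_try = (c->lock_to_try + 1) % SKETCH_NUM_LOCKS;
}
```

A process numbered 11 starts on lock 3, stays there while acquisitions succeed immediately, and moves to lock 4 after the first acquisition that had to wait.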
                               1452                 :                : /*
                               1453                 :                :  * Acquire all WAL insertion locks, to prevent other backends from inserting
                               1454                 :                :  * to WAL.
                               1455                 :                :  */
                               1456                 :                : static void
 4510                          1457                 :           4833 : WALInsertLockAcquireExclusive(void)
                               1458                 :                : {
                               1459                 :                :     int         i;
                               1460                 :                : 
                               1461                 :                :     /*
                               1462                 :                :      * When holding all the locks, all but the last lock's insertingAt
                               1463                 :                :      * indicator is set to 0xFFFFFFFFFFFFFFFF, which is higher than any real
                               1464                 :                :      * XLogRecPtr value, to make sure that no-one blocks waiting on those.
                               1465                 :                :      */
 4316                          1466         [ +  + ]:          38664 :     for (i = 0; i < NUM_XLOGINSERT_LOCKS - 1; i++)
                               1467                 :                :     {
 4013 andres@anarazel.de       1468                 :          33831 :         LWLockAcquire(&WALInsertLocks[i].l.lock, LW_EXCLUSIVE);
                               1469                 :          33831 :         LWLockUpdateVar(&WALInsertLocks[i].l.lock,
                               1470                 :          33831 :                         &WALInsertLocks[i].l.insertingAt,
                               1471                 :                :                         PG_UINT64_MAX);
                               1472                 :                :     }
                               1473                 :                :     /* Variable value reset to 0 at release */
                               1474                 :           4833 :     LWLockAcquire(&WALInsertLocks[i].l.lock, LW_EXCLUSIVE);
                               1475                 :                : 
 4510 heikki.linnakangas@i     1476                 :           4833 :     holdingAllLocks = true;
 4766                          1477                 :           4833 : }
                               1478                 :                : 
                               1479                 :                : /*
                               1480                 :                :  * Release our insertion lock (or locks, if we're holding them all).
                               1481                 :                :  *
                               1482                 :                :  * NB: Reset all variables to 0, so they cause LWLockWaitForVar to block the
                               1483                 :                :  * next time the lock is acquired.
                               1484                 :                :  */
                               1485                 :                : static void
 4510                          1486                 :       24616586 : WALInsertLockRelease(void)
                               1487                 :                : {
                               1488         [ +  + ]:       24616586 :     if (holdingAllLocks)
                               1489                 :                :     {
                               1490                 :                :         int         i;
                               1491                 :                : 
 4316                          1492         [ +  + ]:          43497 :         for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
 4013 andres@anarazel.de       1493                 :          38664 :             LWLockReleaseClearVar(&WALInsertLocks[i].l.lock,
                               1494                 :          38664 :                                   &WALInsertLocks[i].l.insertingAt,
                               1495                 :                :                                   0);
                               1496                 :                : 
 4510 heikki.linnakangas@i     1497                 :           4833 :         holdingAllLocks = false;
                               1498                 :                :     }
                               1499                 :                :     else
                               1500                 :                :     {
 4013 andres@anarazel.de       1501                 :       24611753 :         LWLockReleaseClearVar(&WALInsertLocks[MyLockNo].l.lock,
                               1502                 :       24611753 :                               &WALInsertLocks[MyLockNo].l.insertingAt,
                               1503                 :                :                               0);
                               1504                 :                :     }
 4766 heikki.linnakangas@i     1505                 :       24616586 : }
                               1506                 :                : 
                               1507                 :                : /*
                               1508                 :                :  * Update our insertingAt value, to let others know that we've finished
                               1509                 :                :  * inserting up to that point.
                               1510                 :                :  */
                               1511                 :                : static void
 4510                          1512                 :        2677437 : WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt)
                               1513                 :                : {
                               1514         [ +  + ]:        2677437 :     if (holdingAllLocks)
                               1515                 :                :     {
                               1516                 :                :         /*
                               1517                 :                :          * We use the last lock to mark our actual position, see comments in
                               1518                 :                :          * WALInsertLockAcquireExclusive.
                               1519                 :                :          */
 4316                          1520                 :         778999 :         LWLockUpdateVar(&WALInsertLocks[NUM_XLOGINSERT_LOCKS - 1].l.lock,
 3322 tgl@sss.pgh.pa.us        1521                 :         778999 :                         &WALInsertLocks[NUM_XLOGINSERT_LOCKS - 1].l.insertingAt,
                               1522                 :                :                         insertingAt);
                               1523                 :                :     }
                               1524                 :                :     else
 4510 heikki.linnakangas@i     1525                 :        1898438 :         LWLockUpdateVar(&WALInsertLocks[MyLockNo].l.lock,
                               1526                 :        1898438 :                         &WALInsertLocks[MyLockNo].l.insertingAt,
                               1527                 :                :                         insertingAt);
 4766                          1528                 :        2677437 : }
                               1529                 :                : 
                               1530                 :                : /*
                               1531                 :                :  * Wait for any WAL insertions < upto to finish.
                               1532                 :                :  *
                               1533                 :                :  * Returns the location of the oldest insertion that is still in-progress.
                               1534                 :                :  * Any WAL prior to that point has been fully copied into WAL buffers, and
                               1535                 :                :  * can be flushed out to disk. Because this waits for any insertions older
                               1536                 :                :  * than 'upto' to finish, the return value is always >= 'upto'.
                               1537                 :                :  *
                               1538                 :                :  * Note: When you are about to write out WAL, you must call this function
                               1539                 :                :  * *before* acquiring WALWriteLock, to avoid deadlocks. This function might
                               1540                 :                :  * need to wait for an insertion to finish (or at least advance to the next
                               1541                 :                :  * uninitialized page), and the inserter might need to evict an old WAL buffer
                               1542                 :                :  * to make room for a new one, which in turn requires WALWriteLock.
                               1543                 :                :  */
                               1544                 :                : static XLogRecPtr
                               1545                 :        2545093 : WaitXLogInsertionsToFinish(XLogRecPtr upto)
                               1546                 :                : {
                               1547                 :                :     uint64      bytepos;
                               1548                 :                :     XLogRecPtr  inserted;
                               1549                 :                :     XLogRecPtr  reservedUpto;
                               1550                 :                :     XLogRecPtr  finishedUpto;
 4325 andres@anarazel.de       1551                 :        2545093 :     XLogCtlInsert *Insert = &XLogCtl->Insert;
                               1552                 :                :     int         i;
                               1553                 :                : 
 4766 heikki.linnakangas@i     1554         [ -  + ]:        2545093 :     if (MyProc == NULL)
 4766 heikki.linnakangas@i     1555         [ #  # ]:UBC           0 :         elog(PANIC, "cannot wait without a PGPROC structure");
                               1556                 :                : 
                               1557                 :                :     /*
                               1558                 :                :      * Check if there's any work to do.  Use a barrier to ensure we get the
                               1559                 :                :      * freshest value.
                               1560                 :                :      */
  840 alvherre@alvh.no-ip.     1561                 :CBC     2545093 :     inserted = pg_atomic_read_membarrier_u64(&XLogCtl->logInsertResult);
                               1562         [ +  + ]:        2545093 :     if (upto <= inserted)
                               1563                 :        1994226 :         return inserted;
                               1564                 :                : 
                               1565                 :                :     /* Read the current insert position */
 4766 heikki.linnakangas@i     1566                 :         550867 :     SpinLockAcquire(&Insert->insertpos_lck);
                               1567                 :         550867 :     bytepos = Insert->CurrBytePos;
                               1568                 :         550867 :     SpinLockRelease(&Insert->insertpos_lck);
                               1569                 :         550867 :     reservedUpto = XLogBytePosToEndRecPtr(bytepos);
                               1570                 :                : 
                               1571                 :                :     /*
                               1572                 :                :      * No-one should request to flush a piece of WAL that hasn't even been
                               1573                 :                :      * reserved yet. However, it can happen if there is a block with a bogus
                               1574                 :                :      * LSN on disk, for example. XLogFlush checks for that situation and
                               1575                 :                :      * complains, but only after the flush. Here we just assume that to mean
                               1576                 :                :      * that all WAL that has been reserved needs to be finished. In this
                               1577                 :                :      * corner-case, the return value can be smaller than 'upto' argument.
                               1578                 :                :      */
                               1579         [ -  + ]:         550867 :     if (upto > reservedUpto)
                               1580                 :                :     {
 2060 peter@eisentraut.org     1581         [ #  # ]:UBC           0 :         ereport(LOG,
                               1582                 :                :                 errmsg("request to flush past end of generated WAL; request %X/%08X, current position %X/%08X",
                               1583                 :                :                        LSN_FORMAT_ARGS(upto), LSN_FORMAT_ARGS(reservedUpto)));
 4766 heikki.linnakangas@i     1584                 :              0 :         upto = reservedUpto;
                               1585                 :                :     }
                               1586                 :                : 
                               1587                 :                :     /*
                               1588                 :                :      * Loop through all the locks, sleeping on any in-progress insert older
                               1589                 :                :      * than 'upto'.
                               1590                 :                :      *
                               1591                 :                :      * finishedUpto is our return value, indicating the point up to which all
                               1592                 :                :      * the WAL insertions have been finished. Initialize it to the head of
                               1593                 :                :      * reserved WAL, and as we iterate through the insertion locks, back it
                               1594                 :                :      * out for any insertion that's still in progress.
                               1595                 :                :      */
 4766 heikki.linnakangas@i     1596                 :CBC      550867 :     finishedUpto = reservedUpto;
 4316                          1597         [ +  + ]:        4957803 :     for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
                               1598                 :                :     {
 4464 bruce@momjian.us         1599                 :        4406936 :         XLogRecPtr  insertingat = InvalidXLogRecPtr;
                               1600                 :                : 
                               1601                 :                :         do
                               1602                 :                :         {
                               1603                 :                :             /*
                               1604                 :                :              * See if this insertion is in progress.  LWLockWaitForVar will
                               1605                 :                :              * wait for the lock to be released, or for the 'value' to be set
                               1606                 :                :              * by a LWLockUpdateVar call.  When a lock is initially acquired,
                               1607                 :                :              * its value is 0 (InvalidXLogRecPtr), which means that we don't
                               1608                 :                :              * know where it's inserting yet.  We will have to wait for it. If
                               1609                 :                :              * it's a small insertion, the record will most likely fit on the
                               1610                 :                :              * same page and the inserter will release the lock without ever
                               1611                 :                :              * calling LWLockUpdateVar.  But if it has to sleep, it will
                               1612                 :                :              * advertise the insertion point with LWLockUpdateVar before
                               1613                 :                :              * sleeping.
                               1614                 :                :              *
                               1615                 :                :              * In this loop we are only waiting for insertions that started
                               1616                 :                :              * before WaitXLogInsertionsToFinish was called.  The lack of
                               1617                 :                :              * memory barriers in the loop means that we might see locks as
                               1618                 :                :              * "unused" that have since become used.  This is fine because
                               1619                 :                :              * they can only be used for later insertions that we would not
                               1620                 :                :              * want to wait on anyway.  Not taking a lock to acquire the
                               1621                 :                :              * current insertingAt value means that we might see older
                               1622                 :                :              * insertingAt values.  This is also fine, because if we read a
                               1623                 :                :              * value too old, we will add ourselves to the wait queue, which
                               1624                 :                :              * contains atomic operations.
                               1625                 :                :              */
 4510 heikki.linnakangas@i     1626         [ +  + ]:        4551304 :             if (LWLockWaitForVar(&WALInsertLocks[i].l.lock,
                               1627                 :        4551304 :                                  &WALInsertLocks[i].l.insertingAt,
                               1628                 :                :                                  insertingat, &insertingat))
                               1629                 :                :             {
                               1630                 :                :                 /* the lock was free, so no insertion in progress */
                               1631                 :        3101075 :                 insertingat = InvalidXLogRecPtr;
                               1632                 :        3101075 :                 break;
                               1633                 :                :             }
                               1634                 :                : 
                               1635                 :                :             /*
                               1636                 :                :              * This insertion is still in progress. Have to wait, unless the
                               1637                 :                :              * inserter has proceeded past 'upto'.
                               1638                 :                :              */
                               1639         [ +  + ]:        1450229 :         } while (insertingat < upto);
                               1640                 :                : 
  262 alvherre@kurilemu.de     1641   [ +  +  +  + ]:        4406936 :         if (XLogRecPtrIsValid(insertingat) && insertingat < finishedUpto)
 4510 heikki.linnakangas@i     1642                 :         467598 :             finishedUpto = insertingat;
                               1643                 :                :     }
                               1644                 :                : 
                               1645                 :                :     /*
                               1646                 :                :      * Advance the limit we know to have been inserted and return the freshest
                               1647                 :                :      * value we know of, which might be beyond what we requested if somebody
                               1648                 :                :      * is concurrently doing this with an 'upto' pointer ahead of us.
                               1649                 :                :      */
  840 alvherre@alvh.no-ip.     1650                 :         550867 :     finishedUpto = pg_atomic_monotonic_advance_u64(&XLogCtl->logInsertResult,
                               1651                 :                :                                                    finishedUpto);
                               1652                 :                : 
 4766 heikki.linnakangas@i     1653                 :         550867 :     return finishedUpto;
                               1654                 :                : }
                               1655                 :                : 
                               1656                 :                : /*
                               1657                 :                :  * Get a pointer to the right location in the WAL buffer containing the
                               1658                 :                :  * given XLogRecPtr.
                               1659                 :                :  *
                               1660                 :                :  * If the page is not initialized yet, it is initialized. That might require
                               1661                 :                :  * evicting an old dirty buffer from the buffer cache, which means I/O.
                               1662                 :                :  *
                               1663                 :                :  * The caller must ensure that the page containing the requested location
                               1664                 :                :  * isn't evicted yet, and won't be evicted. The way to ensure that is to
                               1665                 :                :  * hold onto a WAL insertion lock with the insertingAt position set to
                               1666                 :                :  * something <= ptr. GetXLogBuffer() will update insertingAt if it needs
                               1667                 :                :  * to evict an old page from the buffer. (This means that once you call
                               1668                 :                :  * GetXLogBuffer() with a given 'ptr', you must not access anything before
                               1669                 :                :  * that point anymore, and must not call GetXLogBuffer() with an older 'ptr'
                               1670                 :                :  * later, because older buffers might be recycled already)
                               1671                 :                :  */
                               1672                 :                : static char *
 1724 rhaas@postgresql.org     1673                 :       27409731 : GetXLogBuffer(XLogRecPtr ptr, TimeLineID tli)
                               1674                 :                : {
                               1675                 :                :     int         idx;
                               1676                 :                :     XLogRecPtr  endptr;
                               1677                 :                :     static uint64 cachedPage = 0;
                               1678                 :                :     static char *cachedPos = NULL;
                               1679                 :                :     XLogRecPtr  expectedEndPtr;
                               1680                 :                : 
                               1681                 :                :     /*
                               1682                 :                :      * Fast path for the common case that we need to access the same page
                               1683                 :                :      * as last time.
                               1684                 :                :      */
 4766 heikki.linnakangas@i     1685         [ +  + ]:       27409731 :     if (ptr / XLOG_BLCKSZ == cachedPage)
                               1686                 :                :     {
                               1687         [ -  + ]:       24009941 :         Assert(((XLogPageHeader) cachedPos)->xlp_magic == XLOG_PAGE_MAGIC);
                               1688         [ -  + ]:       24009941 :         Assert(((XLogPageHeader) cachedPos)->xlp_pageaddr == ptr - (ptr % XLOG_BLCKSZ));
                               1689                 :       24009941 :         return cachedPos + ptr % XLOG_BLCKSZ;
                               1690                 :                :     }
                               1691                 :                : 
                               1692                 :                :     /*
                               1693                 :                :      * The XLog buffer cache is organized so that a page is always loaded to a
                               1694                 :                :      * particular buffer.  That way we can easily calculate the buffer a given
                               1695                 :                :      * page must be loaded into, from the XLogRecPtr alone.
                               1696                 :                :      */
                               1697                 :        3399790 :     idx = XLogRecPtrToBufIdx(ptr);
                               1698                 :                : 
                               1699                 :                :     /*
                               1700                 :                :      * See what page is loaded in the buffer at the moment. It could be the
                               1701                 :                :      * page we're looking for, or something older. It can't be anything newer
                               1702                 :                :      * - that would imply the page we're looking for has already been written
                               1703                 :                :      * out to disk and evicted, and the caller is responsible for making sure
                               1704                 :                :      * that doesn't happen.
                               1705                 :                :      *
                               1706                 :                :      * We don't hold a lock while we read the value. If someone is just about
                               1707                 :                :      * to initialize or has just initialized the page, it's possible that we
                               1708                 :                :      * get InvalidXLogRecPtr. That's ok, we'll grab the mapping lock (in
                               1709                 :                :      * AdvanceXLInsertBuffer) and retry if we see anything other than the page
                               1710                 :                :      * we're looking for.
                               1711                 :                :      */
                               1712                 :        3399790 :     expectedEndPtr = ptr;
                               1713                 :        3399790 :     expectedEndPtr += XLOG_BLCKSZ - ptr % XLOG_BLCKSZ;
                               1714                 :                : 
  950 jdavis@postgresql.or     1715                 :        3399790 :     endptr = pg_atomic_read_u64(&XLogCtl->xlblocks[idx]);
 4766 heikki.linnakangas@i     1716         [ +  + ]:        3399790 :     if (expectedEndPtr != endptr)
                               1717                 :                :     {
                               1718                 :                :         XLogRecPtr  initializedUpto;
                               1719                 :                : 
                               1720                 :                :         /*
                               1721                 :                :          * Before calling AdvanceXLInsertBuffer(), which can block, let others
                               1722                 :                :          * know how far we're finished with inserting the record.
                               1723                 :                :          *
                               1724                 :                :          * NB: If 'ptr' points to just after the page header, advertise a
                               1725                 :                :          * position at the beginning of the page rather than 'ptr' itself. If
                               1726                 :                :          * there are no other insertions running, someone might try to flush
                               1727                 :                :          * up to our advertised location. If we advertised a position after
                               1728                 :                :          * the page header, someone might try to flush the page header, even
                               1729                 :                :          * though the page might not actually be initialized yet. As the first
                               1730                 :                :          * inserter on the page, we are effectively responsible for making
                               1731                 :                :          * sure that it's initialized, before we let insertingAt move past
                               1732                 :                :          * the page header.
                               1733                 :                :          */
 4011                          1734         [ +  + ]:        2677437 :         if (ptr % XLOG_BLCKSZ == SizeOfXLogShortPHD &&
 3232 andres@anarazel.de       1735         [ +  - ]:          10196 :             XLogSegmentOffset(ptr, wal_segment_size) > XLOG_BLCKSZ)
 4011 heikki.linnakangas@i     1736                 :          10196 :             initializedUpto = ptr - SizeOfXLogShortPHD;
                               1737         [ +  + ]:        2667241 :         else if (ptr % XLOG_BLCKSZ == SizeOfXLogLongPHD &&
 3232 andres@anarazel.de       1738         [ +  + ]:            920 :                  XLogSegmentOffset(ptr, wal_segment_size) < XLOG_BLCKSZ)
 4011 heikki.linnakangas@i     1739                 :            637 :             initializedUpto = ptr - SizeOfXLogLongPHD;
                               1740                 :                :         else
                               1741                 :        2666604 :             initializedUpto = ptr;
                               1742                 :                : 
                               1743                 :        2677437 :         WALInsertLockUpdateInsertingAt(initializedUpto);
                               1744                 :                : 
 1724 rhaas@postgresql.org     1745                 :        2677437 :         AdvanceXLInsertBuffer(ptr, tli, false);
  950 jdavis@postgresql.or     1746                 :        2677437 :         endptr = pg_atomic_read_u64(&XLogCtl->xlblocks[idx]);
                               1747                 :                : 
 4766 heikki.linnakangas@i     1748         [ -  + ]:        2677437 :         if (expectedEndPtr != endptr)
  384 alvherre@kurilemu.de     1749         [ #  # ]:UBC           0 :             elog(PANIC, "could not find WAL buffer for %X/%08X",
                               1750                 :                :                  LSN_FORMAT_ARGS(ptr));
                               1751                 :                :     }
                               1752                 :                :     else
                               1753                 :                :     {
                               1754                 :                :         /*
                               1755                 :                :          * Make sure the initialization of the page is visible to us, and
                               1756                 :                :          * won't arrive later to overwrite the WAL data we write on the page.
                               1757                 :                :          */
 4766 heikki.linnakangas@i     1758                 :CBC      722353 :         pg_memory_barrier();
                               1759                 :                :     }
                               1760                 :                : 
                               1761                 :                :     /*
                               1762                 :                :      * Found the buffer holding this page. Return a pointer to the right
                               1763                 :                :      * offset within the page.
                               1764                 :                :      */
                               1765                 :        3399790 :     cachedPage = ptr / XLOG_BLCKSZ;
                               1766                 :        3399790 :     cachedPos = XLogCtl->pages + idx * (Size) XLOG_BLCKSZ;
                               1767                 :                : 
                               1768         [ -  + ]:        3399790 :     Assert(((XLogPageHeader) cachedPos)->xlp_magic == XLOG_PAGE_MAGIC);
                               1769         [ -  + ]:        3399790 :     Assert(((XLogPageHeader) cachedPos)->xlp_pageaddr == ptr - (ptr % XLOG_BLCKSZ));
                               1770                 :                : 
                               1771                 :        3399790 :     return cachedPos + ptr % XLOG_BLCKSZ;
                               1772                 :                : }
                               1773                 :                : 
                               1774                 :                : /*
                               1775                 :                :  * Read WAL data directly from WAL buffers, if available. Returns the number
                               1776                 :                :  * of bytes read successfully.
                               1777                 :                :  *
                               1778                 :                :  * Fewer than 'count' bytes may be read if some of the requested WAL data has
                               1779                 :                :  * already been evicted.
                               1780                 :                :  *
                               1781                 :                :  * No locks are taken.
                               1782                 :                :  *
                               1783                 :                :  * Caller should ensure that it reads no further than LogwrtResult.Write
                               1784                 :                :  * (which should have been updated by the caller when determining how far to
                               1785                 :                :  * read). The 'tli' argument is only used as a convenient safety check so that
                               1786                 :                :  * callers do not read from WAL buffers on a historical timeline.
                               1787                 :                :  */
                               1788                 :                : Size
  895 jdavis@postgresql.or     1789                 :         112124 : WALReadFromBuffers(char *dstbuf, XLogRecPtr startptr, Size count,
                               1790                 :                :                    TimeLineID tli)
                               1791                 :                : {
                               1792                 :         112124 :     char       *pdst = dstbuf;
                               1793                 :         112124 :     XLogRecPtr  recptr = startptr;
                               1794                 :                :     XLogRecPtr  inserted;
  891                          1795                 :         112124 :     Size        nbytes = count;
                               1796                 :                : 
  895                          1797   [ +  +  +  + ]:         112124 :     if (RecoveryInProgress() || tli != GetWALInsertionTimeLine())
                               1798                 :           1100 :         return 0;
                               1799                 :                : 
  262 alvherre@kurilemu.de     1800         [ -  + ]:         111024 :     Assert(XLogRecPtrIsValid(startptr));
                               1801                 :                : 
                               1802                 :                :     /*
                               1803                 :                :      * Caller should ensure that the requested data has been inserted into WAL
                               1804                 :                :      * buffers before we try to read it.
                               1805                 :                :      */
  840 alvherre@alvh.no-ip.     1806                 :         111024 :     inserted = pg_atomic_read_u64(&XLogCtl->logInsertResult);
                               1807         [ -  + ]:         111024 :     if (startptr + count > inserted)
  840 alvherre@alvh.no-ip.     1808         [ #  # ]:UBC           0 :         ereport(ERROR,
                               1809                 :                :                 errmsg("cannot read past end of generated WAL: requested %X/%08X, current position %X/%08X",
                               1810                 :                :                        LSN_FORMAT_ARGS(startptr + count),
                               1811                 :                :                        LSN_FORMAT_ARGS(inserted)));
                               1812                 :                : 
                               1813                 :                :     /*
                               1814                 :                :      * Loop through the buffers without a lock. For each buffer, atomically
                               1815                 :                :      * read and verify the end pointer, then copy the data out, and finally
                               1816                 :                :      * re-read and re-verify the end pointer.
                               1817                 :                :      *
                               1818                 :                :      * Once a page is evicted, it never returns to the WAL buffers, so if the
                               1819                 :                :      * end pointer matches the expected end pointer before and after we copy
                               1820                 :                :      * the data, then the right page must have been present during the data
                               1821                 :                :      * copy. Read barriers are necessary to ensure that the data copy actually
                               1822                 :                :      * happens between the two verification steps.
                               1823                 :                :      *
                               1824                 :                :      * If either verification fails, we simply terminate the loop and return
                               1825                 :                :      * with the data that had been already copied out successfully.
                               1826                 :                :      */
  895 jdavis@postgresql.or     1827         [ +  + ]:CBC      146087 :     while (nbytes > 0)
                               1828                 :                :     {
                               1829                 :         136202 :         uint32      offset = recptr % XLOG_BLCKSZ;
                               1830                 :         136202 :         int         idx = XLogRecPtrToBufIdx(recptr);
                               1831                 :                :         XLogRecPtr  expectedEndPtr;
                               1832                 :                :         XLogRecPtr  endptr;
                               1833                 :                :         const char *page;
                               1834                 :                :         const char *psrc;
                               1835                 :                :         Size        npagebytes;
                               1836                 :                : 
                               1837                 :                :         /*
                               1838                 :                :          * Calculate the end pointer we expect in the xlblocks array if the
                               1839                 :                :          * correct page is present.
                               1840                 :                :          */
                               1841                 :         136202 :         expectedEndPtr = recptr + (XLOG_BLCKSZ - offset);
                               1842                 :                : 
                               1843                 :                :         /*
                               1844                 :                :          * First verification step: check that the correct page is present in
                               1845                 :                :          * the WAL buffers.
                               1846                 :                :          */
                               1847                 :         136202 :         endptr = pg_atomic_read_u64(&XLogCtl->xlblocks[idx]);
                               1848         [ +  + ]:         136202 :         if (expectedEndPtr != endptr)
                               1849                 :         101139 :             break;
                               1850                 :                : 
                               1851                 :                :         /*
                               1852                 :                :          * The correct page is present (or was at the time the endptr was
                               1853                 :                :          * read; must re-verify later). Calculate pointer to source data and
                               1854                 :                :          * determine how much data to read from this page.
                               1855                 :                :          */
                               1856                 :          35063 :         page = XLogCtl->pages + idx * (Size) XLOG_BLCKSZ;
                               1857                 :          35063 :         psrc = page + offset;
                               1858                 :          35063 :         npagebytes = Min(nbytes, XLOG_BLCKSZ - offset);
                               1859                 :                : 
                               1860                 :                :         /*
                               1861                 :                :          * Ensure that the data copy and the first verification step are not
                               1862                 :                :          * reordered.
                               1863                 :                :          */
                               1864                 :          35063 :         pg_read_barrier();
                               1865                 :                : 
                               1866                 :                :         /* data copy */
                               1867                 :          35063 :         memcpy(pdst, psrc, npagebytes);
                               1868                 :                : 
                               1869                 :                :         /*
                               1870                 :                :          * Ensure that the data copy and the second verification step are not
                               1871                 :                :          * reordered.
                               1872                 :                :          */
                               1873                 :          35063 :         pg_read_barrier();
                               1874                 :                : 
                               1875                 :                :         /*
                               1876                 :                :          * Second verification step: check that the page we read from wasn't
                               1877                 :                :          * evicted while we were copying the data.
                               1878                 :                :          */
                               1879                 :          35063 :         endptr = pg_atomic_read_u64(&XLogCtl->xlblocks[idx]);
                               1880         [ -  + ]:          35063 :         if (expectedEndPtr != endptr)
  895 jdavis@postgresql.or     1881                 :LBC         (2) :             break;
                               1882                 :                : 
  895 jdavis@postgresql.or     1883                 :CBC       35063 :         pdst += npagebytes;
                               1884                 :          35063 :         recptr += npagebytes;
                               1885                 :          35063 :         nbytes -= npagebytes;
                               1886                 :                :     }
                               1887                 :                : 
                               1888         [ -  + ]:         111024 :     Assert(pdst - dstbuf <= count);
                               1889                 :                : 
                               1890                 :         111024 :     return pdst - dstbuf;
                               1891                 :                : }
                               1892                 :                : 
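
The lock-free read protocol that WALReadFromBuffers describes above (verify the slot's end pointer, copy the data, then verify again) can be illustrated with a small standalone sketch. This is not PostgreSQL code: C11 atomics stand in for pg_atomic_read_u64 and pg_read_barrier, and PAGE_SIZE, NUM_SLOTS, slot_end, slot_data, install_page and read_from_slots are hypothetical names chosen for illustration. The sketch is only meant to be exercised single-threaded; it shows the control flow, not a proof of the memory-ordering argument.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 64u           /* hypothetical; stands in for XLOG_BLCKSZ */
#define NUM_SLOTS 4u            /* hypothetical number of buffer slots */

/* End position of the page held in each slot; 0 means "no valid page". */
static _Atomic uint64_t slot_end[NUM_SLOTS];
static char slot_data[NUM_SLOTS][PAGE_SIZE];

/* Writer side: replace whatever the slot holds with the page at page_start. */
static void
install_page(uint64_t page_start, const char *src)
{
    unsigned    idx = (unsigned) ((page_start / PAGE_SIZE) % NUM_SLOTS);

    atomic_store(&slot_end[idx], 0);    /* invalidate before overwriting */
    memcpy(slot_data[idx], src, PAGE_SIZE);
    atomic_store(&slot_end[idx], page_start + PAGE_SIZE);
}

/* Reader side: copy up to count bytes starting at start, without locks. */
static size_t
read_from_slots(char *dst, uint64_t start, size_t count)
{
    char       *pdst = dst;
    uint64_t    pos = start;
    size_t      nbytes = count;

    while (nbytes > 0)
    {
        uint32_t    offset = (uint32_t) (pos % PAGE_SIZE);
        unsigned    idx = (unsigned) ((pos / PAGE_SIZE) % NUM_SLOTS);
        uint64_t    expected_end = pos + (PAGE_SIZE - offset);
        size_t      n = nbytes < PAGE_SIZE - offset ? nbytes : PAGE_SIZE - offset;

        /* First verification: is the expected page in the slot? */
        if (atomic_load_explicit(&slot_end[idx], memory_order_relaxed) != expected_end)
            break;

        /* Keep the copy from being reordered before the first check. */
        atomic_thread_fence(memory_order_acquire);
        memcpy(pdst, slot_data[idx] + offset, n);
        /* Keep the copy from being reordered after the second check. */
        atomic_thread_fence(memory_order_acquire);

        /* Second verification: was the page replaced during the copy? */
        if (atomic_load_explicit(&slot_end[idx], memory_order_relaxed) != expected_end)
            break;

        /* Only now do the copied bytes count as successfully read. */
        pdst += n;
        pos += n;
        nbytes -= n;
    }

    assert((size_t) (pdst - dst) <= count);
    return (size_t) (pdst - dst);
}
```

As in the original, a failed check does not raise an error: the function returns however many bytes were validated, and the caller is expected to fetch the remainder some other way.
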
                               1893                 :                : /*
                               1894                 :                :  * Converts a "usable byte position" to XLogRecPtr. A usable byte position
                               1895                 :                :  * is the position starting from the beginning of WAL, excluding all WAL
                               1896                 :                :  * page headers.
                               1897                 :                :  */
                               1898                 :                : static XLogRecPtr
 4766 heikki.linnakangas@i     1899                 :       49211043 : XLogBytePosToRecPtr(uint64 bytepos)
                               1900                 :                : {
                               1901                 :                :     uint64      fullsegs;
                               1902                 :                :     uint64      fullpages;
                               1903                 :                :     uint64      bytesleft;
                               1904                 :                :     uint32      seg_offset;
                               1905                 :                :     XLogRecPtr  result;
                               1906                 :                : 
                               1907                 :       49211043 :     fullsegs = bytepos / UsableBytesInSegment;
                               1908                 :       49211043 :     bytesleft = bytepos % UsableBytesInSegment;
                               1909                 :                : 
                               1910         [ +  + ]:       49211043 :     if (bytesleft < XLOG_BLCKSZ - SizeOfXLogLongPHD)
                               1911                 :                :     {
                               1912                 :                :         /* fits on first page of segment */
                               1913                 :          70764 :         seg_offset = bytesleft + SizeOfXLogLongPHD;
                               1914                 :                :     }
                               1915                 :                :     else
                               1916                 :                :     {
                               1917                 :                :         /* account for the first page on segment with long header */
                               1918                 :       49140279 :         seg_offset = XLOG_BLCKSZ;
                               1919                 :       49140279 :         bytesleft -= XLOG_BLCKSZ - SizeOfXLogLongPHD;
                               1920                 :                : 
                               1921                 :       49140279 :         fullpages = bytesleft / UsableBytesInPage;
                               1922                 :       49140279 :         bytesleft = bytesleft % UsableBytesInPage;
                               1923                 :                : 
                               1924                 :       49140279 :         seg_offset += fullpages * XLOG_BLCKSZ + bytesleft + SizeOfXLogShortPHD;
                               1925                 :                :     }
                               1926                 :                : 
 2939 alvherre@alvh.no-ip.     1927                 :       49211043 :     XLogSegNoOffsetToRecPtr(fullsegs, seg_offset, wal_segment_size, result);
                               1928                 :                : 
 4766 heikki.linnakangas@i     1929                 :       49211043 :     return result;
                               1930                 :                : }
                               1931                 :                : 
                               1932                 :                : /*
                               1933                 :                :  * Like XLogBytePosToRecPtr, but if the position is at a page boundary,
                               1934                 :                :  * returns a pointer to the beginning of the page (i.e. before the page header),
                               1935                 :                :  * not to where the first xlog record on that page would go to. This is used
                               1936                 :                :  * when converting a pointer to the end of a record.
                               1937                 :                :  */
                               1938                 :                : static XLogRecPtr
                               1939                 :       25157361 : XLogBytePosToEndRecPtr(uint64 bytepos)
                               1940                 :                : {
                               1941                 :                :     uint64      fullsegs;
                               1942                 :                :     uint64      fullpages;
                               1943                 :                :     uint64      bytesleft;
                               1944                 :                :     uint32      seg_offset;
                               1945                 :                :     XLogRecPtr  result;
                               1946                 :                : 
                               1947                 :       25157361 :     fullsegs = bytepos / UsableBytesInSegment;
                               1948                 :       25157361 :     bytesleft = bytepos % UsableBytesInSegment;
                               1949                 :                : 
                               1950         [ +  + ]:       25157361 :     if (bytesleft < XLOG_BLCKSZ - SizeOfXLogLongPHD)
                               1951                 :                :     {
                               1952                 :                :         /* fits on first page of segment */
                               1953         [ +  + ]:         112304 :         if (bytesleft == 0)
                               1954                 :          75454 :             seg_offset = 0;
                               1955                 :                :         else
                               1956                 :          36850 :             seg_offset = bytesleft + SizeOfXLogLongPHD;
                               1957                 :                :     }
                               1958                 :                :     else
                               1959                 :                :     {
                               1960                 :                :         /* account for the first page on segment with long header */
                               1961                 :       25045057 :         seg_offset = XLOG_BLCKSZ;
                               1962                 :       25045057 :         bytesleft -= XLOG_BLCKSZ - SizeOfXLogLongPHD;
                               1963                 :                : 
                               1964                 :       25045057 :         fullpages = bytesleft / UsableBytesInPage;
                               1965                 :       25045057 :         bytesleft = bytesleft % UsableBytesInPage;
                               1966                 :                : 
                               1967         [ +  + ]:       25045057 :         if (bytesleft == 0)
                               1968                 :          24724 :             seg_offset += fullpages * XLOG_BLCKSZ + bytesleft;
                               1969                 :                :         else
                               1970                 :       25020333 :             seg_offset += fullpages * XLOG_BLCKSZ + bytesleft + SizeOfXLogShortPHD;
                               1971                 :                :     }
                               1972                 :                : 
 2939 alvherre@alvh.no-ip.     1973                 :       25157361 :     XLogSegNoOffsetToRecPtr(fullsegs, seg_offset, wal_segment_size, result);
                               1974                 :                : 
 4766 heikki.linnakangas@i     1975                 :       25157361 :     return result;
                               1976                 :                : }
                               1977                 :                : 
                               1978                 :                : /*
                               1979                 :                :  * Convert an XLogRecPtr to a "usable byte position".
                               1980                 :                :  */
                               1981                 :                : static uint64
                               1982                 :       73815036 : XLogRecPtrToBytePos(XLogRecPtr ptr)
                               1983                 :                : {
                               1984                 :                :     uint64      fullsegs;
                               1985                 :                :     uint32      fullpages;
                               1986                 :                :     uint32      offset;
                               1987                 :                :     uint64      result;
                               1988                 :                : 
 3232 andres@anarazel.de       1989                 :       73815036 :     XLByteToSeg(ptr, fullsegs, wal_segment_size);
                               1990                 :                : 
                               1991                 :       73815036 :     fullpages = (XLogSegmentOffset(ptr, wal_segment_size)) / XLOG_BLCKSZ;
 4766 heikki.linnakangas@i     1992                 :       73815036 :     offset = ptr % XLOG_BLCKSZ;
                               1993                 :                : 
                               1994         [ +  + ]:       73815036 :     if (fullpages == 0)
                               1995                 :                :     {
                               1996                 :         106993 :         result = fullsegs * UsableBytesInSegment;
                               1997         [ +  + ]:         106993 :         if (offset > 0)
                               1998                 :                :         {
                               1999         [ -  + ]:         105395 :             Assert(offset >= SizeOfXLogLongPHD);
                               2000                 :         105395 :             result += offset - SizeOfXLogLongPHD;
                               2001                 :                :         }
                               2002                 :                :     }
                               2003                 :                :     else
                               2004                 :                :     {
                               2005                 :       73708043 :         result = fullsegs * UsableBytesInSegment +
 4464 bruce@momjian.us         2006                 :       73708043 :             (XLOG_BLCKSZ - SizeOfXLogLongPHD) + /* account for first page */
 3322 tgl@sss.pgh.pa.us        2007                 :       73708043 :             (fullpages - 1) * UsableBytesInPage;    /* full pages */
 4766 heikki.linnakangas@i     2008         [ +  + ]:       73708043 :         if (offset > 0)
                               2009                 :                :         {
                               2010         [ -  + ]:       73683657 :             Assert(offset >= SizeOfXLogShortPHD);
                               2011                 :       73683657 :             result += offset - SizeOfXLogShortPHD;
                               2012                 :                :         }
                               2013                 :                :     }
                               2014                 :                : 
                               2015                 :       73815036 :     return result;
                               2016                 :                : }
                               2017                 :                : 
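
The three conversion functions above can be checked against each other with a small standalone sketch. The constants are assumptions made for illustration (8 kB WAL pages, 16 MB segments, a 24-byte short page header and a 40-byte long page header); real builds take them from XLOG_BLCKSZ, wal_segment_size, SizeOfXLogShortPHD and SizeOfXLogLongPHD. The names bytepos_to_lsn and lsn_to_bytepos are hypothetical, and an XLogRecPtr is modelled as a plain uint64_t byte offset.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define BLCKSZ      8192u
#define SEG_SIZE    (16u * 1024u * 1024u)
#define SHORT_PHD   24u
#define LONG_PHD    40u

#define USABLE_PAGE (BLCKSZ - SHORT_PHD)
/* Every page loses a short header; the first page loses a long one instead. */
#define USABLE_SEG  ((uint64_t) (SEG_SIZE / BLCKSZ) * USABLE_PAGE - (LONG_PHD - SHORT_PHD))

/* Usable byte position -> byte offset in WAL (mirrors XLogBytePosToRecPtr). */
static uint64_t
bytepos_to_lsn(uint64_t bytepos)
{
    uint64_t    fullsegs = bytepos / USABLE_SEG;
    uint64_t    bytesleft = bytepos % USABLE_SEG;
    uint64_t    seg_offset;

    if (bytesleft < BLCKSZ - LONG_PHD)
    {
        /* fits on the first page of the segment */
        seg_offset = bytesleft + LONG_PHD;
    }
    else
    {
        /* skip the first page, then whole pages, then the short header */
        bytesleft -= BLCKSZ - LONG_PHD;
        seg_offset = BLCKSZ
            + (bytesleft / USABLE_PAGE) * BLCKSZ
            + bytesleft % USABLE_PAGE + SHORT_PHD;
    }
    return fullsegs * SEG_SIZE + seg_offset;
}

/* Byte offset in WAL -> usable byte position (mirrors XLogRecPtrToBytePos). */
static uint64_t
lsn_to_bytepos(uint64_t lsn)
{
    uint64_t    fullsegs = lsn / SEG_SIZE;
    uint32_t    fullpages = (uint32_t) ((lsn % SEG_SIZE) / BLCKSZ);
    uint32_t    offset = (uint32_t) (lsn % BLCKSZ);
    uint64_t    result = fullsegs * USABLE_SEG;

    if (fullpages == 0)
    {
        if (offset > 0)
        {
            assert(offset >= LONG_PHD);
            result += offset - LONG_PHD;
        }
    }
    else
    {
        result += (BLCKSZ - LONG_PHD) + (uint64_t) (fullpages - 1) * USABLE_PAGE;
        if (offset > 0)
        {
            assert(offset >= SHORT_PHD);
            result += offset - SHORT_PHD;
        }
    }
    return result;
}
```

Under these assumptions, byte position 0 maps to offset 40 (just past the long header), and byte position 8152 maps to offset 8216 (just past the short header of the second page). An offset that sits exactly on a page boundary, such as 8192, maps to the same byte position as the first usable byte of that page, which is why a separate end-pointer variant is needed in the forward direction.
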
                               2018                 :                : /*
                               2019                 :                :  * Initialize XLOG buffers, writing out old buffers if they still contain
                               2020                 :                :  * unwritten data, up to the page containing 'upto'. Or if 'opportunistic' is
                               2021                 :                :  * true, initialize as many pages as we can without having to write out
                               2022                 :                :  * unwritten data. Any new pages are initialized to zeros, with page headers
                               2023                 :                :  * initialized properly.
                               2024                 :                :  */
                               2025                 :                : static void
 1724 rhaas@postgresql.org     2026                 :        2684771 : AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic)
                               2027                 :                : {
                               2028                 :                :     int         nextidx;
                               2029                 :                :     XLogRecPtr  OldPageRqstPtr;
                               2030                 :                :     XLogwrtRqst WriteRqst;
 4766 heikki.linnakangas@i     2031                 :        2684771 :     XLogRecPtr  NewPageEndPtr = InvalidXLogRecPtr;
                               2032                 :                :     XLogRecPtr  NewPageBeginPtr;
                               2033                 :                :     XLogPageHeader NewPage;
 1405 tgl@sss.pgh.pa.us        2034                 :        2684771 :     int         npages pg_attribute_unused() = 0;
                               2035                 :                : 
  338 akorotkov@postgresql     2036                 :        2684771 :     LWLockAcquire(WALBufMappingLock, LW_EXCLUSIVE);
                               2037                 :                : 
                               2038                 :                :     /*
                               2039                 :                :      * Now that we have the lock, check if someone initialized the page
                               2040                 :                :      * already.
                               2041                 :                :      */
                               2042   [ +  +  +  + ]:        7910034 :     while (upto >= XLogCtl->InitializedUpTo || opportunistic)
                               2043                 :                :     {
                               2044                 :        5232597 :         nextidx = XLogRecPtrToBufIdx(XLogCtl->InitializedUpTo);
                               2045                 :                : 
                               2046                 :                :         /*
                               2047                 :                :          * Get ending-offset of the buffer page we need to replace (this may
                               2048                 :                :          * be zero if the buffer hasn't been used yet).  Fall through if it's
                               2049                 :                :          * already written out.
                               2050                 :                :          */
                               2051                 :        5232597 :         OldPageRqstPtr = pg_atomic_read_u64(&XLogCtl->xlblocks[nextidx]);
                               2052         [ +  + ]:        5232597 :         if (LogwrtResult.Write < OldPageRqstPtr)
                               2053                 :                :         {
                               2054                 :                :             /*
                               2055                 :                :              * Nope, got work to do. If we just want to pre-initialize as much
                               2056                 :                :              * as we can without flushing, give up now.
                               2057                 :                :              */
                               2058         [ +  + ]:        2400793 :             if (opportunistic)
                               2059                 :           7334 :                 break;
                               2060                 :                : 
                               2061                 :                :             /* Advance shared memory write request position */
 4325 andres@anarazel.de       2062                 :        2393459 :             SpinLockAcquire(&XLogCtl->info_lck);
                               2063         [ +  + ]:        2393459 :             if (XLogCtl->LogwrtRqst.Write < OldPageRqstPtr)
                               2064                 :         731567 :                 XLogCtl->LogwrtRqst.Write = OldPageRqstPtr;
                               2065                 :        2393459 :             SpinLockRelease(&XLogCtl->info_lck);
                               2066                 :                : 
                               2067                 :                :             /*
                               2068                 :                :              * Acquire an up-to-date LogwrtResult value and see if we still
                               2069                 :                :              * need to write it or if someone else already did.
                               2070                 :                :              */
  842 alvherre@alvh.no-ip.     2071                 :        2393459 :             RefreshXLogWriteResult(LogwrtResult);
 4766 heikki.linnakangas@i     2072         [ +  + ]:        2393459 :             if (LogwrtResult.Write < OldPageRqstPtr)
                               2073                 :                :             {
                               2074                 :                :                 /*
                               2075                 :                :                  * Must acquire write lock. Release WALBufMappingLock first,
                               2076                 :                :                  * to make sure that all insertions that we need to wait for
                               2077                 :                :                  * can finish (up to this same position). Otherwise we risk
                               2078                 :                :                  * deadlock.
                               2079                 :                :                  */
  338 akorotkov@postgresql     2080                 :        2367889 :                 LWLockRelease(WALBufMappingLock);
                               2081                 :                : 
 4766 heikki.linnakangas@i     2082                 :        2367889 :                 WaitXLogInsertionsToFinish(OldPageRqstPtr);
                               2083                 :                : 
                               2084                 :        2367889 :                 LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
                               2085                 :                : 
  844 alvherre@alvh.no-ip.     2086                 :        2367889 :                 RefreshXLogWriteResult(LogwrtResult);
 4766 heikki.linnakangas@i     2087         [ +  + ]:        2367889 :                 if (LogwrtResult.Write >= OldPageRqstPtr)
                               2088                 :                :                 {
                               2089                 :                :                     /* OK, someone wrote it already */
                               2090                 :         151618 :                     LWLockRelease(WALWriteLock);
                               2091                 :                :                 }
                               2092                 :                :                 else
                               2093                 :                :                 {
                               2094                 :                :                     /* Have to write it ourselves */
                               2095                 :                :                     TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_START();
                               2096                 :        2216271 :                     WriteRqst.Write = OldPageRqstPtr;
  178 alvherre@kurilemu.de     2097                 :        2216271 :                     WriteRqst.Flush = InvalidXLogRecPtr;
 1724 rhaas@postgresql.org     2098                 :        2216271 :                     XLogWrite(WriteRqst, tli, false);
 4766 heikki.linnakangas@i     2099                 :        2216271 :                     LWLockRelease(WALWriteLock);
  524 michael@paquier.xyz      2100                 :        2216271 :                     pgWalUsage.wal_buffers_full++;
                               2101                 :                :                     TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_DONE();
                               2102                 :                : 
                               2103                 :                :                     /*
                               2104                 :                :                      * Required so that pending WAL stats data gets flushed, now
                               2105                 :                :                      * that pgWalUsage has been updated.
                               2106                 :                :                      */
  363                          2107                 :        2216271 :                     pgstat_report_fixed = true;
                               2108                 :                :                 }
                               2109                 :                :                 /* Re-acquire WALBufMappingLock and retry */
  338 akorotkov@postgresql     2110                 :        2367889 :                 LWLockAcquire(WALBufMappingLock, LW_EXCLUSIVE);
                               2111                 :        2367889 :                 continue;
                               2112                 :                :             }
                               2113                 :                :         }
                               2114                 :                : 
                               2115                 :                :         /*
                               2116                 :                :          * Now the next buffer slot is free and we can set it up to be the
                               2117                 :                :          * next output page.
                               2118                 :                :          */
                               2119                 :        2857374 :         NewPageBeginPtr = XLogCtl->InitializedUpTo;
 4766 heikki.linnakangas@i     2120                 :        2857374 :         NewPageEndPtr = NewPageBeginPtr + XLOG_BLCKSZ;
                               2121                 :                : 
  338 akorotkov@postgresql     2122         [ -  + ]:        2857374 :         Assert(XLogRecPtrToBufIdx(NewPageBeginPtr) == nextidx);
                               2123                 :                : 
 4766 heikki.linnakangas@i     2124                 :        2857374 :         NewPage = (XLogPageHeader) (XLogCtl->pages + nextidx * (Size) XLOG_BLCKSZ);
                               2125                 :                : 
                               2126                 :                :         /*
                               2127                 :                :          * Mark the xlblock with InvalidXLogRecPtr and issue a write barrier
                               2128                 :                :          * before initializing. Otherwise, the old page may be partially
                               2129                 :                :          * zeroed but look valid.
                               2130                 :                :          */
  950 jdavis@postgresql.or     2131                 :        2857374 :         pg_atomic_write_u64(&XLogCtl->xlblocks[nextidx], InvalidXLogRecPtr);
                               2132                 :        2857374 :         pg_write_barrier();
                               2133                 :                : 
                               2134                 :                :         /*
                               2135                 :                :          * Be sure to re-zero the buffer so that bytes beyond what we've
                               2136                 :                :          * written will look like zeroes and not valid XLOG records...
                               2137                 :                :          */
  529 peter@eisentraut.org     2138   [ +  -  +  -  :        2857374 :         MemSet(NewPage, 0, XLOG_BLCKSZ);
                                     +  -  -  +  -  
                                                 - ]
                               2139                 :                : 
                               2140                 :                :         /*
                               2141                 :                :          * Fill the new page's header
                               2142                 :                :          */
 4082 bruce@momjian.us         2143                 :        2857374 :         NewPage->xlp_magic = XLOG_PAGE_MAGIC;
                               2144                 :                : 
                               2145                 :                :         /* NewPage->xlp_info = 0; */ /* done by memset */
 1724 rhaas@postgresql.org     2146                 :        2857374 :         NewPage->xlp_tli = tli;
 4082 bruce@momjian.us         2147                 :        2857374 :         NewPage->xlp_pageaddr = NewPageBeginPtr;
                               2148                 :                : 
                               2149                 :                :         /* NewPage->xlp_rem_len = 0; */  /* done by memset */
                               2150                 :                : 
                               2151                 :                :         /*
                               2152                 :                :          * If first page of an XLOG segment file, make it a long header.
                               2153                 :                :          */
 3232 andres@anarazel.de       2154         [ +  + ]:        2857374 :         if ((XLogSegmentOffset(NewPage->xlp_pageaddr, wal_segment_size)) == 0)
                               2155                 :                :         {
 4766 heikki.linnakangas@i     2156                 :           1970 :             XLogLongPageHeader NewLongPage = (XLogLongPageHeader) NewPage;
                               2157                 :                : 
                               2158                 :           1970 :             NewLongPage->xlp_sysid = ControlFile->system_identifier;
 3232 andres@anarazel.de       2159                 :           1970 :             NewLongPage->xlp_seg_size = wal_segment_size;
 4766 heikki.linnakangas@i     2160                 :           1970 :             NewLongPage->xlp_xlog_blcksz = XLOG_BLCKSZ;
 4082 bruce@momjian.us         2161                 :           1970 :             NewPage->xlp_info |= XLP_LONG_HEADER;
                               2162                 :                :         }
                               2163                 :                : 
                               2164                 :                :         /*
                               2165                 :                :          * Make sure the initialization of the page becomes visible to others
                               2166                 :                :          * before the xlblocks update. GetXLogBuffer() reads xlblocks without
                               2167                 :                :          * holding a lock.
                               2168                 :                :          */
 4766 heikki.linnakangas@i     2169                 :        2857374 :         pg_write_barrier();
                               2170                 :                : 
  950 jdavis@postgresql.or     2171                 :        2857374 :         pg_atomic_write_u64(&XLogCtl->xlblocks[nextidx], NewPageEndPtr);
  338 akorotkov@postgresql     2172                 :        2857374 :         XLogCtl->InitializedUpTo = NewPageEndPtr;
                               2173                 :                : 
 4766 heikki.linnakangas@i     2174                 :        2857374 :         npages++;
                               2175                 :                :     }
  338 akorotkov@postgresql     2176                 :        2684771 :     LWLockRelease(WALBufMappingLock);
                               2177                 :                : 
                               2178                 :                : #ifdef WAL_DEBUG
                               2179                 :                :     if (XLOG_DEBUG && npages > 0)
                               2180                 :                :     {
                               2181                 :                :         elog(DEBUG1, "initialized %d pages, up to %X/%08X",
                               2182                 :                :              npages, LSN_FORMAT_ARGS(NewPageEndPtr));
                               2183                 :                :     }
                               2184                 :                : #endif
 9799 vadim4o@yahoo.com        2185                 :        2684771 : }
                               2186                 :                : 
                               2187                 :                : /*
                               2188                 :                :  * Calculate CheckPointSegments based on max_wal_size_mb and
                               2189                 :                :  * checkpoint_completion_target.
                               2190                 :                :  */
                               2191                 :                : static void
 4171 heikki.linnakangas@i     2192                 :           9639 : CalculateCheckpointSegments(void)
                               2193                 :                : {
                               2194                 :                :     double      target;
                               2195                 :                : 
                               2196                 :                :     /*-------
                               2197                 :                :      * Calculate the distance at which to trigger a checkpoint, to avoid
                               2198                 :                :      * exceeding max_wal_size_mb. This is based on two assumptions:
                               2199                 :                :      *
                               2200                 :                :      * a) we keep WAL for only one checkpoint cycle (prior to PG11 we kept
                               2201                 :                :      *    WAL for two checkpoint cycles to allow us to recover from the
                               2202                 :                :      *    secondary checkpoint if the first checkpoint failed, though we
                               2203                 :                :      *    only did this on the primary anyway, not on standby. Keeping just
                               2204                 :                :      *    one checkpoint simplifies processing and reduces disk space in
                               2205                 :                :      *    many smaller databases.)
                               2206                 :                :      * b) during checkpoint, we consume checkpoint_completion_target *
                               2207                 :                :      *    number of segments consumed between checkpoints.
                               2208                 :                :      *-------
                               2209                 :                :      */
 3232 andres@anarazel.de       2210                 :           9639 :     target = (double) ConvertToXSegs(max_wal_size_mb, wal_segment_size) /
 3183 simon@2ndQuadrant.co     2211                 :           9639 :         (1.0 + CheckPointCompletionTarget);
                               2212                 :                : 
                               2213                 :                :     /* round down */
 4171 heikki.linnakangas@i     2214                 :           9639 :     CheckPointSegments = (int) target;
                               2215                 :                : 
                               2216         [ +  + ]:           9639 :     if (CheckPointSegments < 1)
                               2217                 :             10 :         CheckPointSegments = 1;
                               2218                 :           9639 : }
                               2219                 :                : 
                               2220                 :                : void
                               2221                 :           7246 : assign_max_wal_size(int newval, void *extra)
                               2222                 :                : {
 3400 simon@2ndQuadrant.co     2223                 :           7246 :     max_wal_size_mb = newval;
 4171 heikki.linnakangas@i     2224                 :           7246 :     CalculateCheckpointSegments();
                               2225                 :           7246 : }
                               2226                 :                : 
                               2227                 :                : void
                               2228                 :           1266 : assign_checkpoint_completion_target(double newval, void *extra)
                               2229                 :                : {
                               2230                 :           1266 :     CheckPointCompletionTarget = newval;
                               2231                 :           1266 :     CalculateCheckpointSegments();
                               2232                 :           1266 : }
                               2233                 :                : 
                               2234                 :                : bool
 1063 peter@eisentraut.org     2235                 :           2450 : check_wal_segment_size(int *newval, void **extra, GucSource source)
                               2236                 :                : {
                               2237   [ +  -  +  -  :           2450 :     if (!IsValidWalSegSize(*newval))
                                        +  -  -  + ]
                               2238                 :                :     {
 1063 peter@eisentraut.org     2239                 :UBC           0 :         GUC_check_errdetail("The WAL segment size must be a power of two between 1 MB and 1 GB.");
                               2240                 :              0 :         return false;
                               2241                 :                :     }
                               2242                 :                : 
 1063 peter@eisentraut.org     2243                 :CBC        2450 :     return true;
                               2244                 :                : }
                               2245                 :                : 
                               2246                 :                : /*
                               2247                 :                :  * At a checkpoint, how many WAL segments to recycle as preallocated future
                               2248                 :                :  * XLOG segments? Returns the highest segment that should be preallocated.
                               2249                 :                :  */
                               2250                 :                : static XLogSegNo
 2412 michael@paquier.xyz      2251                 :           1930 : XLOGfileslop(XLogRecPtr lastredoptr)
                               2252                 :                : {
                               2253                 :                :     XLogSegNo   minSegNo;
                               2254                 :                :     XLogSegNo   maxSegNo;
                               2255                 :                :     double      distance;
                               2256                 :                :     XLogSegNo   recycleSegNo;
                               2257                 :                : 
                               2258                 :                :     /*
                               2259                 :                :      * Calculate the segment numbers that min_wal_size_mb and max_wal_size_mb
                               2260                 :                :      * correspond to. Always recycle enough segments to meet the minimum, and
                               2261                 :                :      * remove enough segments to stay below the maximum.
                               2262                 :                :      */
                               2263                 :           1930 :     minSegNo = lastredoptr / wal_segment_size +
 3232 andres@anarazel.de       2264                 :           1930 :         ConvertToXSegs(min_wal_size_mb, wal_segment_size) - 1;
 2412 michael@paquier.xyz      2265                 :           1930 :     maxSegNo = lastredoptr / wal_segment_size +
 3232 andres@anarazel.de       2266                 :           1930 :         ConvertToXSegs(max_wal_size_mb, wal_segment_size) - 1;
                               2267                 :                : 
                               2268                 :                :     /*
                               2269                 :                :      * Between those limits, recycle enough segments to get us through to the
                               2270                 :                :      * estimated end of next checkpoint.
                               2271                 :                :      *
                               2272                 :                :      * To estimate where the next checkpoint will finish, assume that the
                               2273                 :                :      * system runs steadily consuming CheckPointDistanceEstimate bytes between
                               2274                 :                :      * every checkpoint.
                               2275                 :                :      */
 3183 simon@2ndQuadrant.co     2276                 :           1930 :     distance = (1.0 + CheckPointCompletionTarget) * CheckPointDistanceEstimate;
                               2277                 :                :     /* add 10% for good measure. */
 4171 heikki.linnakangas@i     2278                 :           1930 :     distance *= 1.10;
                               2279                 :                : 
 2412 michael@paquier.xyz      2280                 :           1930 :     recycleSegNo = (XLogSegNo) ceil(((double) lastredoptr + distance) /
                               2281                 :                :                                     wal_segment_size);
                               2282                 :                : 
 4171 heikki.linnakangas@i     2283         [ +  + ]:           1930 :     if (recycleSegNo < minSegNo)
                               2284                 :           1337 :         recycleSegNo = minSegNo;
                               2285         [ +  + ]:           1930 :     if (recycleSegNo > maxSegNo)
                               2286                 :            436 :         recycleSegNo = maxSegNo;
                               2287                 :                : 
                               2288                 :           1930 :     return recycleSegNo;
                               2289                 :                : }
                               2290                 :                : 
                               2291                 :                : /*
                               2292                 :                :  * Check whether we've consumed enough xlog space that a checkpoint is needed.
                               2293                 :                :  *
                               2294                 :                :  * new_segno indicates a log file that has just been filled up (or read
                               2295                 :                :  * during recovery). We measure the distance from RedoRecPtr to new_segno
                               2296                 :                :  * and see if that exceeds CheckPointSegments.
                               2297                 :                :  *
                               2298                 :                :  * Note: it is the caller's responsibility that RedoRecPtr is up-to-date.
                               2299                 :                :  */
                               2300                 :                : bool
 5145                          2301                 :           4982 : XLogCheckpointNeeded(XLogSegNo new_segno)
                               2302                 :                : {
                               2303                 :                :     XLogSegNo   old_segno;
                               2304                 :                : 
 3232 andres@anarazel.de       2305                 :           4982 :     XLByteToSeg(RedoRecPtr, old_segno, wal_segment_size);
                               2306                 :                : 
 5145 heikki.linnakangas@i     2307         [ +  + ]:           4982 :     if (new_segno >= old_segno + (uint64) (CheckPointSegments - 1))
 6862 tgl@sss.pgh.pa.us        2308                 :           3018 :         return true;
                               2309                 :           1964 :     return false;
                               2310                 :                : }
                               2311                 :                : 
                               2312                 :                : /*
                               2313                 :                :  * Write and/or fsync the log at least as far as WriteRqst indicates.
                               2314                 :                :  *
                               2315                 :                :  * If flexible == true, we don't have to write as far as WriteRqst, but
                               2316                 :                :  * may stop at any convenient boundary (such as a cache or logfile boundary).
                               2317                 :                :  * This option allows us to avoid uselessly issuing multiple writes when a
                               2318                 :                :  * single one would do.
                               2319                 :                :  *
                               2320                 :                :  * Must be called with WALWriteLock held. WaitXLogInsertionsToFinish(WriteRqst)
                               2321                 :                :  * must be called before grabbing the lock, to make sure the data is ready to
                               2322                 :                :  * write.
                               2323                 :                :  */
                               2324                 :                : static void
 1724 rhaas@postgresql.org     2325                 :        2387488 : XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
                               2326                 :                : {
                               2327                 :                :     bool        ispartialpage;
                               2328                 :                :     bool        last_iteration;
                               2329                 :                :     bool        finishing_seg;
                               2330                 :                :     int         curridx;
                               2331                 :                :     int         npages;
                               2332                 :                :     int         startidx;
                               2333                 :                :     uint32      startoffset;
                               2334                 :                : 
                               2335                 :                :     /* We should always be inside a critical section here */
 7772 tgl@sss.pgh.pa.us        2336         [ -  + ]:        2387488 :     Assert(CritSectionCount > 0);
                               2337                 :                : 
                               2338                 :                :     /*
                               2339                 :                :      * Update local LogwrtResult (caller probably did this already, but...)
                               2340                 :                :      */
  844 alvherre@alvh.no-ip.     2341                 :        2387488 :     RefreshXLogWriteResult(LogwrtResult);
                               2342                 :                : 
                               2343                 :                :     /*
                               2344                 :                :      * Since successive pages in the xlog cache are consecutively allocated,
                               2345                 :                :      * we can usually gather multiple pages together and issue just one
                               2346                 :                :      * write() call.  npages is the number of pages we have determined can be
                               2347                 :                :      * written together; startidx is the cache block index of the first one,
                               2348                 :                :      * and startoffset is the file offset at which it should go. The latter
                               2349                 :                :      * two variables are only valid when npages > 0, but we must initialize
                               2350                 :                :      * all of them to keep the compiler quiet.
                               2351                 :                :      */
 7643 tgl@sss.pgh.pa.us        2352                 :        2387488 :     npages = 0;
                               2353                 :        2387488 :     startidx = 0;
                               2354                 :        2387488 :     startoffset = 0;
                               2355                 :                : 
                               2356                 :                :     /*
                               2357                 :                :      * Within the loop, curridx is the cache block index of the page to
                               2358                 :                :      * consider writing.  Begin at the buffer containing the next unwritten
                               2359                 :                :      * page, or last partially written page.
                               2360                 :                :      */
 4757 heikki.linnakangas@i     2361                 :        2387488 :     curridx = XLogRecPtrToBufIdx(LogwrtResult.Write);
                               2362                 :                : 
 4958 alvherre@alvh.no-ip.     2363         [ +  + ]:        5194562 :     while (LogwrtResult.Write < WriteRqst.Write)
                               2364                 :                :     {
                               2365                 :                :         /*
                               2366                 :                :          * Make sure we're not ahead of the insert process.  This could happen
                               2367                 :                :          * if we're passed a bogus WriteRqst.Write that is past the end of the
                               2368                 :                :          * last page that's been initialized by AdvanceXLInsertBuffer.
                               2369                 :                :          */
  950 jdavis@postgresql.or     2370                 :        2972428 :         XLogRecPtr  EndPtr = pg_atomic_read_u64(&XLogCtl->xlblocks[curridx]);
                               2371                 :                : 
 4766 heikki.linnakangas@i     2372         [ -  + ]:        2972428 :         if (LogwrtResult.Write >= EndPtr)
  384 alvherre@kurilemu.de     2373         [ #  # ]:UBC           0 :             elog(PANIC, "xlog write request %X/%08X is past end of log %X/%08X",
                               2374                 :                :                  LSN_FORMAT_ARGS(LogwrtResult.Write),
                               2375                 :                :                  LSN_FORMAT_ARGS(EndPtr));
                               2376                 :                : 
                               2377                 :                :         /* Advance LogwrtResult.Write to end of current buffer page */
 4766 heikki.linnakangas@i     2378                 :CBC     2972428 :         LogwrtResult.Write = EndPtr;
 4958 alvherre@alvh.no-ip.     2379                 :        2972428 :         ispartialpage = WriteRqst.Write < LogwrtResult.Write;
                               2380                 :                : 
 3232 andres@anarazel.de       2381         [ +  + ]:        2972428 :         if (!XLByteInPrevSeg(LogwrtResult.Write, openLogSegNo,
                               2382                 :                :                              wal_segment_size))
                               2383                 :                :         {
                               2384                 :                :             /*
                               2385                 :                :              * Switch to new logfile segment.  We cannot have any pending
                               2386                 :                :              * pages here (since we dump what we have at segment end).
                               2387                 :                :              */
 7643 tgl@sss.pgh.pa.us        2388         [ -  + ]:          15103 :             Assert(npages == 0);
 9266                          2389         [ +  + ]:          15103 :             if (openLogFile >= 0)
 7346 bruce@momjian.us         2390                 :           7003 :                 XLogFileClose();
 3232 andres@anarazel.de       2391                 :          15103 :             XLByteToPrevSeg(LogwrtResult.Write, openLogSegNo,
                               2392                 :                :                             wal_segment_size);
 1724 rhaas@postgresql.org     2393                 :          15103 :             openLogTLI = tli;
                               2394                 :                : 
                               2395                 :                :             /* create/use new log file */
                               2396                 :          15103 :             openLogFile = XLogFileInit(openLogSegNo, tli);
 2344 tgl@sss.pgh.pa.us        2397                 :          15103 :             ReserveExternalFD();
                               2398                 :                :         }
                               2399                 :                : 
                               2400                 :                :         /* Make sure we have the current logfile open */
 9266                          2401         [ -  + ]:        2972428 :         if (openLogFile < 0)
                               2402                 :                :         {
 3232 andres@anarazel.de       2403                 :UBC           0 :             XLByteToPrevSeg(LogwrtResult.Write, openLogSegNo,
                               2404                 :                :                             wal_segment_size);
 1724 rhaas@postgresql.org     2405                 :              0 :             openLogTLI = tli;
                               2406                 :              0 :             openLogFile = XLogFileOpen(openLogSegNo, tli);
 2344 tgl@sss.pgh.pa.us        2407                 :              0 :             ReserveExternalFD();
                               2408                 :                :         }
                               2409                 :                : 
                               2410                 :                :         /* Add current page to the set of pending pages-to-dump */
 7643 tgl@sss.pgh.pa.us        2411         [ +  + ]:CBC     2972428 :         if (npages == 0)
                               2412                 :                :         {
                               2413                 :                :             /* first of group */
                               2414                 :        2405394 :             startidx = curridx;
 3232 andres@anarazel.de       2415                 :        2405394 :             startoffset = XLogSegmentOffset(LogwrtResult.Write - XLOG_BLCKSZ,
                               2416                 :                :                                             wal_segment_size);
                               2417                 :                :         }
 7643 tgl@sss.pgh.pa.us        2418                 :        2972428 :         npages++;
                               2419                 :                : 
                               2420                 :                :         /*
                               2421                 :                :          * Dump the set if this will be the last loop iteration, or if we are
                               2422                 :                :          * at the last page of the cache area (since the next page won't be
                               2423                 :                :          * contiguous in memory), or if we are at the end of the logfile
                               2424                 :                :          * segment.
                               2425                 :                :          */
 4958 alvherre@alvh.no-ip.     2426                 :        2972428 :         last_iteration = WriteRqst.Write <= LogwrtResult.Write;
                               2427                 :                : 
 7643 tgl@sss.pgh.pa.us        2428         [ +  + ]:        5785518 :         finishing_seg = !ispartialpage &&
 3232 andres@anarazel.de       2429         [ +  + ]:        2813090 :             (startoffset + npages * XLOG_BLCKSZ) >= wal_segment_size;
                               2430                 :                : 
 7294 tgl@sss.pgh.pa.us        2431         [ +  + ]:        2972428 :         if (last_iteration ||
 7643                          2432   [ +  +  -  + ]:         587774 :             curridx == XLogCtl->XLogCacheBlck ||
                               2433                 :                :             finishing_seg)
                               2434                 :                :         {
                               2435                 :                :             char       *from;
                               2436                 :                :             Size        nbytes;
                               2437                 :                :             Size        nleft;
                               2438                 :                :             ssize_t     written;
                               2439                 :                :             instr_time  start;
                               2440                 :                : 
                               2441                 :                :             /* OK to write the page(s) */
 7419                          2442                 :        2405394 :             from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
                               2443                 :        2405394 :             nbytes = npages * (Size) XLOG_BLCKSZ;
 4773 heikki.linnakangas@i     2444                 :        2405394 :             nleft = nbytes;
                               2445                 :                :             do
                               2446                 :                :             {
                               2447                 :        2405394 :                 errno = 0;
                               2448                 :                : 
                               2449                 :                :                 /*
                               2450                 :                :                  * Measure I/O timing to write WAL data, for pg_stat_io.
                               2451                 :                :                  */
  515 michael@paquier.xyz      2452                 :        2405394 :                 start = pgstat_prepare_io_time(track_wal_io_timing);
                               2453                 :                : 
 3417 rhaas@postgresql.org     2454                 :        2405394 :                 pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
 1396 tmunro@postgresql.or     2455                 :        2405394 :                 written = pg_pwrite(openLogFile, from, nleft, startoffset);
 3417 rhaas@postgresql.org     2456                 :        2405394 :                 pgstat_report_wait_end();
                               2457                 :                : 
 4773 heikki.linnakangas@i     2458         [ -  + ]:        2405394 :                 if (written <= 0)
                               2459                 :                :                 {
                               2460                 :                :                     char        xlogfname[MAXFNAMELEN];
                               2461                 :                :                     int         save_errno;
                               2462                 :                : 
 4773 heikki.linnakangas@i     2463         [ #  # ]:UBC           0 :                     if (errno == EINTR)
                               2464                 :              0 :                         continue;
                               2465                 :                : 
 2427 michael@paquier.xyz      2466                 :              0 :                     save_errno = errno;
 1724 rhaas@postgresql.org     2467                 :              0 :                     XLogFileName(xlogfname, tli, openLogSegNo,
                               2468                 :                :                                  wal_segment_size);
 2427 michael@paquier.xyz      2469                 :              0 :                     errno = save_errno;
 4773 heikki.linnakangas@i     2470         [ #  # ]:              0 :                     ereport(PANIC,
                               2471                 :                :                             (errcode_for_file_access(),
                               2472                 :                :                              errmsg("could not write to log file \"%s\" at offset %u, length %zu: %m",
                               2473                 :                :                                     xlogfname, startoffset, nleft)));
                               2474                 :                :                 }
                               2475                 :                : 
   39 michael@paquier.xyz      2476                 :CBC     2405394 :                 pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL,
                               2477                 :                :                                         IOOP_WRITE, start, 1, written);
 4773 heikki.linnakangas@i     2478                 :        2405394 :                 nleft -= written;
                               2479                 :        2405394 :                 from += written;
 2818 tmunro@postgresql.or     2480                 :        2405394 :                 startoffset += written;
 4773 heikki.linnakangas@i     2481         [ -  + ]:        2405394 :             } while (nleft > 0);
                               2482                 :                : 
 7643 tgl@sss.pgh.pa.us        2483                 :        2405394 :             npages = 0;
                               2484                 :                : 
                               2485                 :                :             /*
                               2486                 :                :              * If we just wrote the whole last page of a logfile segment,
                               2487                 :                :              * fsync the segment immediately.  This avoids having to go back
                               2488                 :                :              * and re-open prior segments when an fsync request comes along
                               2489                 :                :              * later. Doing it here ensures that one and only one backend will
                               2490                 :                :              * perform this fsync.
                               2491                 :                :              *
                               2492                 :                :              * This is also the right place to notify the Archiver that the
                               2493                 :                :              * segment is ready to copy to archival storage, and to update the
                               2494                 :                :              * timer for archive_timeout, and to signal for a checkpoint if
                               2495                 :                :              * too many logfile segments have been used since the last
                               2496                 :                :              * checkpoint.
                               2497                 :                :              */
 4766 heikki.linnakangas@i     2498         [ +  + ]:        2405394 :             if (finishing_seg)
                               2499                 :                :             {
 1724 rhaas@postgresql.org     2500                 :           2081 :                 issue_xlog_fsync(openLogFile, openLogSegNo, tli);
                               2501                 :                : 
                               2502                 :                :                 /* signal that we need to wake up walsenders later */
 5137                          2503                 :           2081 :                 WalSndWakeupRequest();
                               2504                 :                : 
 3322 tgl@sss.pgh.pa.us        2505                 :           2081 :                 LogwrtResult.Flush = LogwrtResult.Write;    /* end of page */
                               2506                 :                : 
 7643                          2507   [ +  +  -  +  :           2081 :                 if (XLogArchivingActive())
                                              +  + ]
 1724 rhaas@postgresql.org     2508                 :            417 :                     XLogArchiveNotifySeg(openLogSegNo, tli);
                               2509                 :                : 
 4757 heikki.linnakangas@i     2510                 :           2081 :                 XLogCtl->lastSegSwitchTime = (pg_time_t) time(NULL);
 3503 andres@anarazel.de       2511                 :           2081 :                 XLogCtl->lastSegSwitchLSN = LogwrtResult.Flush;
                               2512                 :                : 
                               2513                 :                :                 /*
                               2514                 :                :                  * Request a checkpoint if we've consumed too much xlog since
                               2515                 :                :                  * the last one.  For speed, we first check using the local
                               2516                 :                :                  * copy of RedoRecPtr, which might be out of date; if it looks
                               2517                 :                :                  * like a checkpoint is needed, forcibly update RedoRecPtr and
                               2518                 :                :                  * recheck.
                               2519                 :                :                  */
 5145 heikki.linnakangas@i     2520   [ +  +  +  + ]:           2081 :                 if (IsUnderPostmaster && XLogCheckpointNeeded(openLogSegNo))
                               2521                 :                :                 {
 6862 tgl@sss.pgh.pa.us        2522                 :            282 :                     (void) GetRedoRecPtr();
 5145 heikki.linnakangas@i     2523         [ +  + ]:            282 :                     if (XLogCheckpointNeeded(openLogSegNo))
 6966 tgl@sss.pgh.pa.us        2524                 :            222 :                         RequestCheckpoint(CHECKPOINT_CAUSE_XLOG);
                               2525                 :                :                 }
                               2526                 :                :             }
                               2527                 :                :         }
                               2528                 :                : 
 9266                          2529         [ +  + ]:        2972428 :         if (ispartialpage)
                               2530                 :                :         {
                               2531                 :                :             /* Only asked to write a partial page */
                               2532                 :         159338 :             LogwrtResult.Write = WriteRqst.Write;
                               2533                 :         159338 :             break;
                               2534                 :                :         }
 7643                          2535         [ +  + ]:        2813090 :         curridx = NextBufIdx(curridx);
                               2536                 :                : 
                               2537                 :                :         /* If flexible, break out of loop as soon as we wrote something */
                               2538   [ +  +  +  + ]:        2813090 :         if (flexible && npages == 0)
                               2539                 :           6016 :             break;
                               2540                 :                :     }
                               2541                 :                : 
                               2542         [ -  + ]:        2387488 :     Assert(npages == 0);
                               2543                 :                : 
                               2544                 :                :     /*
                               2545                 :                :      * If asked to flush, do so
                               2546                 :                :      */
 4958 alvherre@alvh.no-ip.     2547         [ +  + ]:        2387488 :     if (LogwrtResult.Flush < WriteRqst.Flush &&
                               2548         [ +  + ]:         170409 :         LogwrtResult.Flush < LogwrtResult.Write)
                               2549                 :                :     {
                               2550                 :                :         /*
                               2551                 :                :          * Could get here without iterating above loop, in which case we might
                               2552                 :                :          * have no open file or the wrong one.  However, we do not need to
                               2553                 :                :          * fsync more than one file.
                               2554                 :                :          */
 1017 nathan@postgresql.or     2555         [ +  - ]:         170288 :         if (wal_sync_method != WAL_SYNC_METHOD_OPEN &&
                               2556         [ +  - ]:         170288 :             wal_sync_method != WAL_SYNC_METHOD_OPEN_DSYNC)
                               2557                 :                :         {
 9263 tgl@sss.pgh.pa.us        2558         [ +  + ]:         170288 :             if (openLogFile >= 0 &&
 3232 andres@anarazel.de       2559         [ +  + ]:         170265 :                 !XLByteInPrevSeg(LogwrtResult.Write, openLogSegNo,
                               2560                 :                :                                  wal_segment_size))
 7346 bruce@momjian.us         2561                 :            165 :                 XLogFileClose();
 9263 tgl@sss.pgh.pa.us        2562         [ +  + ]:         170288 :             if (openLogFile < 0)
                               2563                 :                :             {
 3232 andres@anarazel.de       2564                 :            188 :                 XLByteToPrevSeg(LogwrtResult.Write, openLogSegNo,
                               2565                 :                :                                 wal_segment_size);
 1724 rhaas@postgresql.org     2566                 :            188 :                 openLogTLI = tli;
                               2567                 :            188 :                 openLogFile = XLogFileOpen(openLogSegNo, tli);
 2344 tgl@sss.pgh.pa.us        2568                 :            188 :                 ReserveExternalFD();
                               2569                 :                :             }
                               2570                 :                : 
 1724 rhaas@postgresql.org     2571                 :         170288 :             issue_xlog_fsync(openLogFile, openLogSegNo, tli);
                               2572                 :                :         }
                               2573                 :                : 
                               2574                 :                :         /* signal that we need to wake up walsenders later */
 5137                          2575                 :         170288 :         WalSndWakeupRequest();
                               2576                 :                : 
 9266 tgl@sss.pgh.pa.us        2577                 :         170288 :         LogwrtResult.Flush = LogwrtResult.Write;
                               2578                 :                :     }
                               2579                 :                : 
                               2580                 :                :     /*
                               2581                 :                :      * Update shared-memory status
                               2582                 :                :      *
                               2583                 :                :      * We make sure that the shared 'request' values do not fall behind the
                               2584                 :                :      * 'result' values.  This is not absolutely essential, but it saves some
                               2585                 :                :      * code in a couple of places.
                               2586                 :                :      */
  842 alvherre@alvh.no-ip.     2587                 :        2387488 :     SpinLockAcquire(&XLogCtl->info_lck);
                               2588         [ +  + ]:        2387488 :     if (XLogCtl->LogwrtRqst.Write < LogwrtResult.Write)
                               2589                 :         146954 :         XLogCtl->LogwrtRqst.Write = LogwrtResult.Write;
                               2590         [ +  + ]:        2387488 :     if (XLogCtl->LogwrtRqst.Flush < LogwrtResult.Flush)
                               2591                 :         171970 :         XLogCtl->LogwrtRqst.Flush = LogwrtResult.Flush;
                               2592                 :        2387488 :     SpinLockRelease(&XLogCtl->info_lck);
                               2593                 :                : 
                               2594                 :                :     /*
                               2595                 :                :      * We write Write first, then a barrier, then Flush.  When reading, the
                               2596                 :                :      * opposite must be done (with a matching barrier in between), so that we
                               2597                 :                :      * always see a Flush value that trails behind the Write value seen.
                               2598                 :                :      */
                               2599                 :        2387488 :     pg_atomic_write_u64(&XLogCtl->logWriteResult, LogwrtResult.Write);
                               2600                 :        2387488 :     pg_write_barrier();
                               2601                 :        2387488 :     pg_atomic_write_u64(&XLogCtl->logFlushResult, LogwrtResult.Flush);
                               2602                 :                : 
                               2603                 :                : #ifdef USE_ASSERT_CHECKING
                               2604                 :                :     {
                               2605                 :                :         XLogRecPtr  Flush;
                               2606                 :                :         XLogRecPtr  Write;
                               2607                 :                :         XLogRecPtr  Insert;
                               2608                 :                : 
                               2609                 :        2387488 :         Flush = pg_atomic_read_u64(&XLogCtl->logFlushResult);
                               2610                 :        2387488 :         pg_read_barrier();
                               2611                 :        2387488 :         Write = pg_atomic_read_u64(&XLogCtl->logWriteResult);
  840                          2612                 :        2387488 :         pg_read_barrier();
                               2613                 :        2387488 :         Insert = pg_atomic_read_u64(&XLogCtl->logInsertResult);
                               2614                 :                : 
                               2615                 :                :         /* WAL written to disk is always ahead of WAL flushed */
  842                          2616         [ -  + ]:        2387488 :         Assert(Write >= Flush);
                               2617                 :                : 
                               2618                 :                :         /* WAL inserted to buffers is always ahead of WAL written */
  840                          2619         [ -  + ]:        2387488 :         Assert(Insert >= Write);
                               2620                 :                :     }
                               2621                 :                : #endif
 9266 tgl@sss.pgh.pa.us        2622                 :        2387488 : }
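The inner do/while loop of the function above (source lines 2445-2481) retries pg_pwrite() until the whole group of pending pages is on its way to disk, advancing the buffer pointer and file offset after each short write. The following is a minimal standalone sketch of that pattern using plain POSIX pwrite(). The helper name write_fully and its error convention are hypothetical and not part of xlog.c. The real code calls ereport(PANIC) on failure and records I/O timing, both of which are omitted here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of xlog.c): write nbytes from buf to fd,
 * starting at the given file offset, retrying after short writes and EINTR.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int
write_fully(int fd, const char *buf, size_t nbytes, off_t offset)
{
    size_t      nleft = nbytes;

    assert(buf != NULL || nbytes == 0);

    while (nleft > 0)
    {
        ssize_t     written;

        errno = 0;
        written = pwrite(fd, buf, nleft, offset);
        if (written <= 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted before any data moved; retry */
            if (errno == 0)
                errno = ENOSPC; /* assumption: treat a 0-byte write as no space */
            return -1;
        }

        /* Advance past the bytes the kernel accepted. */
        nleft -= (size_t) written;
        buf += written;
        offset += written;
    }
    return 0;
}
```

A caller would pass the start of the contiguous page group, its length in bytes, and the segment offset of the first page, much as the listing passes from, nleft and startoffset.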
                               2623                 :                : 
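The end of the function above publishes the Write position, issues a write barrier, and then publishes the Flush position, while the assertion block reads them in the opposite order with a read barrier in between. The sketch below mirrors that ordering with C11 atomics. It assumes that a release fence and an acquire fence are reasonable stand-ins for pg_write_barrier() and pg_read_barrier(), that only one writer publishes at a time, and that both positions only ever move forward. The type and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the two shared progress counters. */
typedef struct SketchProgress
{
    _Atomic uint64_t write_pos;
    _Atomic uint64_t flush_pos;
} SketchProgress;

/* Writer side: store Write first, then a barrier, then Flush. */
static void
publish_progress(SketchProgress *p, uint64_t write_pos, uint64_t flush_pos)
{
    atomic_store_explicit(&p->write_pos, write_pos, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&p->flush_pos, flush_pos, memory_order_relaxed);
}

/*
 * Reader side: load Flush first, then a barrier, then Write.  If the Flush
 * value read was published by publish_progress(), the Write value read
 * afterwards is at least as new, so the reader never sees Flush ahead of
 * Write.
 */
static void
read_progress(SketchProgress *p, uint64_t *write_pos, uint64_t *flush_pos)
{
    *flush_pos = atomic_load_explicit(&p->flush_pos, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    *write_pos = atomic_load_explicit(&p->write_pos, memory_order_relaxed);
}
```

Reading in the same order as the writer stores would allow a reader to pair an old Write value with a new Flush value, which is exactly what the Assert(Write >= Flush) check in the listing guards against.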
                               2624                 :                : /*
                               2625                 :                :  * Record the LSN for an asynchronous transaction commit/abort
                               2626                 :                :  * and nudge the WALWriter if there is work for it to do.
                               2627                 :                :  * (This should not be called for synchronous commits.)
                               2628                 :                :  */
                               2629                 :                : void
 5841 simon@2ndQuadrant.co     2630                 :          43273 : XLogSetAsyncXactLSN(XLogRecPtr asyncXactLSN)
                               2631                 :                : {
 5369                          2632                 :          43273 :     XLogRecPtr  WriteRqstPtr = asyncXactLSN;
                               2633                 :                :     bool        sleeping;
  972 heikki.linnakangas@i     2634                 :          43273 :     bool        wakeup = false;
                               2635                 :                :     XLogRecPtr  prevAsyncXactLSN;
                               2636                 :                : 
 4325 andres@anarazel.de       2637                 :          43273 :     SpinLockAcquire(&XLogCtl->info_lck);
                               2638                 :          43273 :     sleeping = XLogCtl->WalWriterSleeping;
  972 heikki.linnakangas@i     2639                 :          43273 :     prevAsyncXactLSN = XLogCtl->asyncXactLSN;
 4325 andres@anarazel.de       2640         [ +  + ]:          43273 :     if (XLogCtl->asyncXactLSN < asyncXactLSN)
                               2641                 :          42634 :         XLogCtl->asyncXactLSN = asyncXactLSN;
                               2642                 :          43273 :     SpinLockRelease(&XLogCtl->info_lck);
                               2643                 :                : 
                               2644                 :                :     /*
                               2645                 :                :      * If somebody else already called this function with a more aggressive
                               2646                 :                :      * LSN, they will have done what we needed (and perhaps more).
                               2647                 :                :      */
  972 heikki.linnakangas@i     2648         [ +  + ]:          43273 :     if (asyncXactLSN <= prevAsyncXactLSN)
                               2649                 :            639 :         return;
                               2650                 :                : 
                               2651                 :                :     /*
                               2652                 :                :      * If the WALWriter is sleeping, kick it to make it come out of low-power
                               2653                 :                :      * mode, so that this async commit will reach disk within the expected
                               2654                 :                :      * amount of time.  Otherwise, determine whether it has enough WAL
                               2655                 :                :      * available to flush, the same way that XLogBackgroundFlush() does.
                               2656                 :                :      */
                               2657         [ +  + ]:          42634 :     if (sleeping)
                               2658                 :             36 :         wakeup = true;
                               2659                 :                :     else
                               2660                 :                :     {
                               2661                 :                :         int         flushblocks;
                               2662                 :                : 
  842 alvherre@alvh.no-ip.     2663                 :          42598 :         RefreshXLogWriteResult(LogwrtResult);
                               2664                 :                : 
  972 heikki.linnakangas@i     2665                 :          42598 :         flushblocks =
                               2666                 :          42598 :             WriteRqstPtr / XLOG_BLCKSZ - LogwrtResult.Flush / XLOG_BLCKSZ;
                               2667                 :                : 
                               2668   [ +  -  +  + ]:          42598 :         if (WalWriterFlushAfter == 0 || flushblocks >= WalWriterFlushAfter)
                               2669                 :           4622 :             wakeup = true;
                               2670                 :                :     }
                               2671                 :                : 
  632                          2672         [ +  + ]:          42634 :     if (wakeup)
                               2673                 :                :     {
   18 heikki.linnakangas@i     2674                 :GNC        4658 :         ProcNumber  walwriterProc = pg_atomic_read_u32(&ProcGlobal->walwriterProc);
                               2675                 :                : 
  632 heikki.linnakangas@i     2676         [ +  + ]:CBC        4658 :         if (walwriterProc != INVALID_PROC_NUMBER)
                               2677                 :            500 :             SetLatch(&GetPGProcByNumber(walwriterProc)->procLatch);
                               2678                 :                :     }
                               2679                 :                : }
                               2680                 :                : 
                               2681                 :                : /*
                               2682                 :                :  * Record the LSN up to which we can remove WAL because it's not required by
                               2683                 :                :  * any replication slot.
                               2684                 :                :  */
                               2685                 :                : void
 4559 rhaas@postgresql.org     2686                 :          57568 : XLogSetReplicationSlotMinimumLSN(XLogRecPtr lsn)
                               2687                 :                : {
 4325 andres@anarazel.de       2688                 :          57568 :     SpinLockAcquire(&XLogCtl->info_lck);
                               2689                 :          57568 :     XLogCtl->replicationSlotMinLSN = lsn;
                               2690                 :          57568 :     SpinLockRelease(&XLogCtl->info_lck);
 4559 rhaas@postgresql.org     2691                 :          57568 : }
                               2692                 :                : 
                               2693                 :                : 
                               2694                 :                : /*
                               2695                 :                :  * Return the oldest LSN we must retain to satisfy the needs of some
                               2696                 :                :  * replication slot.
                               2697                 :                :  */
                               2698                 :                : XLogRecPtr
                               2699                 :           2548 : XLogGetReplicationSlotMinimumLSN(void)
                               2700                 :                : {
                               2701                 :                :     XLogRecPtr  retval;
                               2702                 :                : 
 4325 andres@anarazel.de       2703                 :           2548 :     SpinLockAcquire(&XLogCtl->info_lck);
                               2704                 :           2548 :     retval = XLogCtl->replicationSlotMinLSN;
                               2705                 :           2548 :     SpinLockRelease(&XLogCtl->info_lck);
                               2706                 :                : 
 4559 rhaas@postgresql.org     2707                 :           2548 :     return retval;
                               2708                 :                : }
                               2709                 :                : 
                               2710                 :                : /*
                               2711                 :                :  * Advance minRecoveryPoint in control file.
                               2712                 :                :  *
                               2713                 :                :  * If we crash during recovery, we must reach this point again before the
                               2714                 :                :  * database is consistent.
                               2715                 :                :  *
                               2716                 :                :  * If 'force' is true, the 'lsn' argument is ignored. Otherwise, minRecoveryPoint
                               2717                 :                :  * is only updated if it's not already greater than or equal to 'lsn'.
                               2718                 :                :  */
                               2719                 :                : static void
 6367 heikki.linnakangas@i     2720                 :         132582 : UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
                               2721                 :                : {
                               2722                 :                :     /* Quick check using our local copy of the variable */
 1621                          2723   [ +  +  +  +  :         132582 :     if (!updateMinRecoveryPoint || (!force && lsn <= LocalMinRecoveryPoint))
                                              +  + ]
 6367                          2724                 :         124034 :         return;
                               2725                 :                : 
                               2726                 :                :     /*
                               2727                 :                :      * An invalid minRecoveryPoint means that we need to recover all the WAL,
                               2728                 :                :      * i.e., we're doing crash recovery.  We never modify the control file's
                               2729                 :                :      * value in that case, so we can short-circuit future checks here too. The
                               2730                 :                :      * local values of minRecoveryPoint and minRecoveryPointTLI should not be
                               2731                 :                :      * updated until crash recovery finishes.  We only do this for the startup
                               2732                 :                :      * process as it should not update its own reference of minRecoveryPoint
                               2733                 :                :      * until it has finished crash recovery to make sure that all WAL
                               2734                 :                :      * available is replayed in this case.  This also spares the startup
                               2735                 :                :      * process from taking extra locks on the control file.
                               2736                 :                :      */
  262 alvherre@kurilemu.de     2737   [ +  +  +  + ]:           8548 :     if (!XLogRecPtrIsValid(LocalMinRecoveryPoint) && InRecovery)
                               2738                 :                :     {
 2943 michael@paquier.xyz      2739                 :             30 :         updateMinRecoveryPoint = false;
                               2740                 :             30 :         return;
                               2741                 :                :     }
                               2742                 :                : 
 6367 heikki.linnakangas@i     2743                 :           8518 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               2744                 :                : 
                               2745                 :                :     /* update local copy */
 1621                          2746                 :           8518 :     LocalMinRecoveryPoint = ControlFile->minRecoveryPoint;
                               2747                 :           8518 :     LocalMinRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
                               2748                 :                : 
  262 alvherre@kurilemu.de     2749         [ +  + ]:           8518 :     if (!XLogRecPtrIsValid(LocalMinRecoveryPoint))
 2886 michael@paquier.xyz      2750                 :              3 :         updateMinRecoveryPoint = false;
 1621 heikki.linnakangas@i     2751   [ +  +  +  + ]:           8515 :     else if (force || LocalMinRecoveryPoint < lsn)
                               2752                 :                :     {
                               2753                 :                :         XLogRecPtr  newMinRecoveryPoint;
                               2754                 :                :         TimeLineID  newMinRecoveryPointTLI;
                               2755                 :                : 
                               2756                 :                :         /*
                               2757                 :                :          * To avoid having to update the control file too often, we update it
                               2758                 :                :          * all the way to the last record being replayed, even though 'lsn'
                               2759                 :                :          * would suffice for correctness.  This also allows the 'force' case
                               2760                 :                :          * to not need a valid 'lsn' value.
                               2761                 :                :          *
                               2762                 :                :          * Another important reason for doing it this way is that the passed
                               2763                 :                :          * 'lsn' value could be bogus, i.e., past the end of available WAL, if
                               2764                 :                :          * the caller got it from a corrupted heap page.  Accepting such a
                               2765                 :                :          * value as the min recovery point would prevent us from coming up at
                               2766                 :                :          * all.  Instead, we just log a warning and continue with recovery.
                               2767                 :                :          * (See also the comments about corrupt LSNs in XLogFlush.)
                               2768                 :                :          */
                               2769                 :           6490 :         newMinRecoveryPoint = GetCurrentReplayRecPtr(&newMinRecoveryPointTLI);
 4958 alvherre@alvh.no-ip.     2770   [ +  +  -  + ]:           6490 :         if (!force && newMinRecoveryPoint < lsn)
 6239 tgl@sss.pgh.pa.us        2771         [ #  # ]:UBC           0 :             elog(WARNING,
                               2772                 :                :                  "xlog min recovery request %X/%08X is past current point %X/%08X",
                               2773                 :                :                  LSN_FORMAT_ARGS(lsn), LSN_FORMAT_ARGS(newMinRecoveryPoint));
                               2774                 :                : 
                               2775                 :                :         /* update control file */
 4958 alvherre@alvh.no-ip.     2776         [ +  + ]:CBC        6490 :         if (ControlFile->minRecoveryPoint < newMinRecoveryPoint)
                               2777                 :                :         {
 6367 heikki.linnakangas@i     2778                 :           6128 :             ControlFile->minRecoveryPoint = newMinRecoveryPoint;
 4982                          2779                 :           6128 :             ControlFile->minRecoveryPointTLI = newMinRecoveryPointTLI;
 6367                          2780                 :           6128 :             UpdateControlFile();
 1621                          2781                 :           6128 :             LocalMinRecoveryPoint = newMinRecoveryPoint;
                               2782                 :           6128 :             LocalMinRecoveryPointTLI = newMinRecoveryPointTLI;
                               2783                 :                : 
 6367                          2784         [ +  + ]:           6128 :             ereport(DEBUG2,
                               2785                 :                :                     errmsg_internal("updated min recovery point to %X/%08X on timeline %u",
                               2786                 :                :                                     LSN_FORMAT_ARGS(newMinRecoveryPoint),
                               2787                 :                :                                     newMinRecoveryPointTLI));
                               2788                 :                :         }
                               2789                 :                :     }
                               2790                 :           8518 :     LWLockRelease(ControlFileLock);
                               2791                 :                : }
                               2792                 :                : 
                               2793                 :                : /*
                               2794                 :                :  * Ensure that all XLOG data through the given position is flushed to disk.
                               2795                 :                :  *
                               2796                 :                :  * NOTE: this differs from XLogWrite mainly in that the WALWriteLock is not
                               2797                 :                :  * already held, and we try to avoid acquiring it if possible.
                               2798                 :                :  */
                               2799                 :                : void
 9266 tgl@sss.pgh.pa.us        2800                 :         897139 : XLogFlush(XLogRecPtr record)
                               2801                 :                : {
                               2802                 :                :     XLogRecPtr  WriteRqstPtr;
                               2803                 :                :     XLogwrtRqst WriteRqst;
 1719 rhaas@postgresql.org     2804                 :         897139 :     TimeLineID  insertTLI = XLogCtl->InsertTimeLineID;
                               2805                 :                : 
                               2806                 :                :     /*
                               2807                 :                :      * During REDO, we are reading, not writing, WAL.  Therefore, instead of
                               2808                 :                :      * trying to flush the WAL, we should update minRecoveryPoint.  We
                               2809                 :                :      * test XLogInsertAllowed(), not InRecovery, because we need checkpointer
                               2810                 :                :      * to act this way too, and because when it tries to write the
                               2811                 :                :      * end-of-recovery checkpoint, it should indeed flush.
                               2812                 :                :      */
 6239 tgl@sss.pgh.pa.us        2813         [ +  + ]:         897139 :     if (!XLogInsertAllowed())
                               2814                 :                :     {
 6367 heikki.linnakangas@i     2815                 :         132069 :         UpdateMinRecoveryPoint(record, false);
 9266 tgl@sss.pgh.pa.us        2816                 :         715025 :         return;
                               2817                 :                :     }
                               2818                 :                : 
                               2819                 :                :     /* Quick exit if already known flushed */
 4958 alvherre@alvh.no-ip.     2820         [ +  + ]:         765070 :     if (record <= LogwrtResult.Flush)
 9266 tgl@sss.pgh.pa.us        2821                 :         582956 :         return;
                               2822                 :                : 
                               2823                 :                : #ifdef WAL_DEBUG
                               2824                 :                :     if (XLOG_DEBUG)
                               2825                 :                :         elog(LOG, "xlog flush request %X/%08X; write %X/%08X; flush %X/%08X",
                               2826                 :                :              LSN_FORMAT_ARGS(record),
                               2827                 :                :              LSN_FORMAT_ARGS(LogwrtResult.Write),
                               2828                 :                :              LSN_FORMAT_ARGS(LogwrtResult.Flush));
                               2829                 :                : #endif
                               2830                 :                : 
                               2831                 :         182114 :     START_CRIT_SECTION();
                               2832                 :                : 
                               2833                 :                :     /*
                               2834                 :                :      * Since fsync is usually a horribly expensive operation, we try to
                               2835                 :                :      * piggyback as much data as we can on each fsync: if we see any more data
                               2836                 :                :      * entered into the xlog buffer, we'll write and fsync that too, so that
                               2837                 :                :      * the final value of LogwrtResult.Flush is as large as possible. This
                               2838                 :                :      * gives us some chance of avoiding another fsync immediately after.
                               2839                 :                :      */
                               2840                 :                : 
                               2841                 :                :     /* initialize to given target; may increase below */
                               2842                 :         182114 :     WriteRqstPtr = record;
                               2843                 :                : 
                               2844                 :                :     /*
                               2845                 :                :      * Now wait until we get the write lock, or someone else does the flush
                               2846                 :                :      * for us.
                               2847                 :                :      */
                               2848                 :                :     for (;;)
 8976                          2849                 :           3092 :     {
                               2850                 :                :         XLogRecPtr  insertpos;
                               2851                 :                : 
                               2852                 :                :         /* done already? */
  842 alvherre@alvh.no-ip.     2853                 :         185206 :         RefreshXLogWriteResult(LogwrtResult);
 4958                          2854         [ +  + ]:         185206 :         if (record <= LogwrtResult.Flush)
 5291 heikki.linnakangas@i     2855                 :          15336 :             break;
                               2856                 :                : 
                               2857                 :                :         /*
                               2858                 :                :          * Before actually performing the write, wait for all in-flight
                               2859                 :                :          * insertions to the pages we're about to write to finish.
                               2860                 :                :          */
  842 alvherre@alvh.no-ip.     2861                 :         169870 :         SpinLockAcquire(&XLogCtl->info_lck);
                               2862         [ +  + ]:         169870 :         if (WriteRqstPtr < XLogCtl->LogwrtRqst.Write)
                               2863                 :          13529 :             WriteRqstPtr = XLogCtl->LogwrtRqst.Write;
                               2864                 :         169870 :         SpinLockRelease(&XLogCtl->info_lck);
 4766 heikki.linnakangas@i     2865                 :         169870 :         insertpos = WaitXLogInsertionsToFinish(WriteRqstPtr);
                               2866                 :                : 
                               2867                 :                :         /*
                               2868                 :                :          * Try to get the write lock. If we can't get it immediately, wait
                               2869                 :                :          * until it's released, and recheck if we still need to do the flush
                               2870                 :                :          * or if the backend that held the lock did it for us already. This
                               2871                 :                :          * helps to maintain a good rate of group committing when the system
                               2872                 :                :          * is bottlenecked by the speed of fsyncing.
                               2873                 :                :          */
 5282                          2874         [ +  + ]:         169870 :         if (!LWLockAcquireOrWait(WALWriteLock, LW_EXCLUSIVE))
                               2875                 :                :         {
                               2876                 :                :             /*
                               2877                 :                :              * The lock is now free, but we didn't acquire it yet. Before we
                               2878                 :                :              * do, loop back to check if someone else flushed the record for
                               2879                 :                :              * us already.
                               2880                 :                :              */
 5291                          2881                 :           3092 :             continue;
                               2882                 :                :         }
                               2883                 :                : 
                               2884                 :                :         /* Got the lock; recheck whether request is satisfied */
  844 alvherre@alvh.no-ip.     2885                 :         166778 :         RefreshXLogWriteResult(LogwrtResult);
 4958                          2886         [ +  + ]:         166778 :         if (record <= LogwrtResult.Flush)
                               2887                 :                :         {
 5137 rhaas@postgresql.org     2888                 :           2779 :             LWLockRelease(WALWriteLock);
                               2889                 :           2779 :             break;
                               2890                 :                :         }
                               2891                 :                : 
                               2892                 :                :         /*
                               2893                 :                :          * Sleep before flush! By adding a delay here, we may give further
                               2894                 :                :          * backends the opportunity to join the backlog of group commit
                               2895                 :                :          * followers; this can significantly improve transaction throughput,
                               2896                 :                :          * at the risk of increasing transaction latency.
                               2897                 :                :          *
                               2898                 :                :          * We do not sleep if enableFsync is not turned on, nor if there are
                               2899                 :                :          * fewer than CommitSiblings other backends with active transactions.
                               2900                 :                :          */
                               2901   [ -  +  -  -  :         163999 :         if (CommitDelay > 0 && enableFsync &&
                                              -  - ]
 5137 rhaas@postgresql.org     2902                 :UBC           0 :             MinimumActiveBackends(CommitSiblings))
                               2903                 :                :         {
  229 heikki.linnakangas@i     2904                 :              0 :             pgstat_report_wait_start(WAIT_EVENT_COMMIT_DELAY);
 5137 rhaas@postgresql.org     2905                 :              0 :             pg_usleep(CommitDelay);
  229 heikki.linnakangas@i     2906                 :              0 :             pgstat_report_wait_end();
                               2907                 :                : 
                               2908                 :                :             /*
                               2909                 :                :              * Re-check how far we can now flush the WAL. It's generally not
                               2910                 :                :              * safe to call WaitXLogInsertionsToFinish while holding
                               2911                 :                :              * WALWriteLock, because an in-progress insertion might need to
                               2912                 :                :              * also grab WALWriteLock to make progress. But we know that all
                               2913                 :                :              * the insertions up to insertpos have already finished, because
                               2914                 :                :              * that's what the earlier WaitXLogInsertionsToFinish() returned.
                               2915                 :                :              * We're only calling it again to allow insertpos to be moved
                               2916                 :                :              * further forward, not to actually wait for anyone.
                               2917                 :                :              */
 4766                          2918                 :              0 :             insertpos = WaitXLogInsertionsToFinish(insertpos);
                               2919                 :                :         }
                               2920                 :                : 
                               2921                 :                :         /* try to write/flush later additions to XLOG as well */
 4766 heikki.linnakangas@i     2922                 :CBC      163999 :         WriteRqst.Write = insertpos;
                               2923                 :         163999 :         WriteRqst.Flush = insertpos;
                               2924                 :                : 
 1724 rhaas@postgresql.org     2925                 :         163999 :         XLogWrite(WriteRqst, insertTLI, false);
                               2926                 :                : 
 9066 tgl@sss.pgh.pa.us        2927                 :         163999 :         LWLockRelease(WALWriteLock);
                               2928                 :                :         /* done */
 5291 heikki.linnakangas@i     2929                 :         163999 :         break;
                               2930                 :                :     }
                               2931                 :                : 
 9266 tgl@sss.pgh.pa.us        2932         [ -  + ]:         182114 :     END_CRIT_SECTION();
                               2933                 :                : 
                               2934                 :                :     /* wake up walsenders now that we've released heavily contended locks */
 1205 andres@anarazel.de       2935                 :         182114 :     WalSndWakeupProcessRequests(true, !RecoveryInProgress());
                               2936                 :                : 
                               2937                 :                :     /*
                               2938                 :                :      * Wake up processes waiting for primary flush LSN to reach current flush
                               2939                 :                :      * position.
                               2940                 :                :      */
   84 akorotkov@postgresql     2941                 :         182114 :     WaitLSNWakeup(WAIT_LSN_TYPE_PRIMARY_FLUSH, LogwrtResult.Flush);
                               2942                 :                : 
                               2943                 :                :     /*
                               2944                 :                :      * If we still haven't flushed to the request point then we have a
                               2945                 :                :      * problem; most likely, the requested flush point is past end of XLOG.
                               2946                 :                :      * This has been seen to occur when a disk page has a corrupted LSN.
                               2947                 :                :      *
                               2948                 :                :      * Formerly we treated this as a PANIC condition, but that hurts the
                               2949                 :                :      * system's robustness rather than helping it: we do not want to take down
                               2950                 :                :      * the whole system due to corruption on one data page.  In particular, if
                               2951                 :                :      * the bad page is encountered again during recovery then we would be
                               2952                 :                :      * unable to restart the database at all!  (This scenario actually
                               2953                 :                :      * happened in the field several times with 7.1 releases.)  As of 8.4, bad
                               2954                 :                :      * LSNs encountered during recovery are UpdateMinRecoveryPoint's problem;
                               2955                 :                :      * the only time we can reach here during recovery is while flushing the
                               2956                 :                :      * end-of-recovery checkpoint record, and we don't expect that to have a
                               2957                 :                :      * bad LSN.
                               2958                 :                :      *
                               2959                 :                :      * Note that for calls from xact.c, the ERROR will be promoted to PANIC
                               2960                 :                :      * since xact.c calls this routine inside a critical section.  However,
                               2961                 :                :      * calls from bufmgr.c are not within critical sections and so we will not
                               2962                 :                :      * force a restart for a bad LSN on a data page.
                               2963                 :                :      */
 4958 alvherre@alvh.no-ip.     2964         [ -  + ]:         182114 :     if (LogwrtResult.Flush < record)
 6239 tgl@sss.pgh.pa.us        2965         [ #  # ]:UBC           0 :         elog(ERROR,
                               2966                 :                :              "xlog flush request %X/%08X is not satisfied --- flushed only to %X/%08X",
                               2967                 :                :              LSN_FORMAT_ARGS(record),
                               2968                 :                :              LSN_FORMAT_ARGS(LogwrtResult.Flush));
                               2969                 :                : 
                               2970                 :                :     /*
                               2971                 :                :      * Cross-check XLogNeedsFlush().  Some of the checks of XLogFlush() and
                               2972                 :                :      * XLogNeedsFlush() are duplicated, and this assertion ensures that these
                               2973                 :                :      * remain consistent.
                               2974                 :                :      */
  310 michael@paquier.xyz      2975         [ -  + ]:CBC      182114 :     Assert(!XLogNeedsFlush(record));
                               2976                 :                : }
                               2977                 :                : 
                               2978                 :                : /*
                               2979                 :                :  * Write & flush xlog, but without specifying exactly where to.
                               2980                 :                :  *
                               2981                 :                :  * We normally write only completed blocks; but if there is nothing to do on
                               2982                 :                :  * that basis, we check for unwritten async commits in the current incomplete
                               2983                 :                :  * block, and write through the latest one of those.  Thus, if async commits
                               2984                 :                :  * are not being used, we will write complete blocks only.
                               2985                 :                :  *
                               2986                 :                :  * If, based on the above, there's anything to write we do so immediately. But
                               2987                 :                :  * to avoid calling fsync, fdatasync et al. at a rate that'd impact
                               2988                 :                :  * concurrent IO, we only flush WAL every wal_writer_delay ms, or if there's
                               2989                 :                :  * more than wal_writer_flush_after unflushed blocks.
                               2990                 :                :  *
                               2991                 :                :  * We can guarantee that async commits reach disk after at most three
                               2992                 :                :  * wal_writer_delay cycles. (When flushing complete blocks, we allow XLogWrite
                               2993                 :                :  * to write "flexibly", meaning it can stop at the end of the buffer ring;
                               2994                 :                :  * this makes a difference only with very high load or long wal_writer_delay,
                               2995                 :                :  * but imposes one extra cycle for the worst case for async commits.)
                               2996                 :                :  *
                               2997                 :                :  * This routine is invoked periodically by the background walwriter process.
                               2998                 :                :  *
                               2999                 :                :  * Returns true if there was any work to do, even if we skipped flushing due
                               3000                 :                :  * to wal_writer_delay/wal_writer_flush_after.
                               3001                 :                :  */
                               3002                 :                : bool
 6942 tgl@sss.pgh.pa.us        3003                 :          19675 : XLogBackgroundFlush(void)
                               3004                 :                : {
                               3005                 :                :     XLogwrtRqst WriteRqst;
                               3006                 :          19675 :     bool        flexible = true;
                               3007                 :                :     static TimestampTz lastflush;
                               3008                 :                :     TimestampTz now;
                               3009                 :                :     int         flushblocks;
                               3010                 :                :     TimeLineID  insertTLI;
                               3011                 :                : 
                               3012                 :                :     /* XLOG doesn't need flushing during recovery */
 6367 heikki.linnakangas@i     3013         [ -  + ]:          19675 :     if (RecoveryInProgress())
 5192 tgl@sss.pgh.pa.us        3014                 :UBC           0 :         return false;
                               3015                 :                : 
                               3016                 :                :     /*
                               3017                 :                :      * Since we're not in recovery, InsertTimeLineID is set and can't change,
                               3018                 :                :      * so we can read it without a lock.
                               3019                 :                :      */
 1719 rhaas@postgresql.org     3020                 :CBC       19675 :     insertTLI = XLogCtl->InsertTimeLineID;
                               3021                 :                : 
                               3022                 :                :     /* read updated LogwrtRqst */
 4325 andres@anarazel.de       3023                 :          19675 :     SpinLockAcquire(&XLogCtl->info_lck);
 3814                          3024                 :          19675 :     WriteRqst = XLogCtl->LogwrtRqst;
 4325                          3025                 :          19675 :     SpinLockRelease(&XLogCtl->info_lck);
                               3026                 :                : 
                               3027                 :                :     /* back off to last completed page boundary */
 3814                          3028                 :          19675 :     WriteRqst.Write -= WriteRqst.Write % XLOG_BLCKSZ;
                               3029                 :                : 
                               3030                 :                :     /* if we have already flushed that far, consider async commit records */
  842 alvherre@alvh.no-ip.     3031                 :          19675 :     RefreshXLogWriteResult(LogwrtResult);
 3814 andres@anarazel.de       3032         [ +  + ]:          19675 :     if (WriteRqst.Write <= LogwrtResult.Flush)
                               3033                 :                :     {
 4325                          3034                 :          13364 :         SpinLockAcquire(&XLogCtl->info_lck);
 3814                          3035                 :          13364 :         WriteRqst.Write = XLogCtl->asyncXactLSN;
 4325                          3036                 :          13364 :         SpinLockRelease(&XLogCtl->info_lck);
 6942 tgl@sss.pgh.pa.us        3037                 :          13364 :         flexible = false;       /* ensure it all gets written */
                               3038                 :                :     }
                               3039                 :                : 
                               3040                 :                :     /*
                               3041                 :                :      * If already known flushed, we're done. Just need to check if we are
                               3042                 :                :      * holding an open file handle to a logfile that's no longer in use,
                               3043                 :                :      * preventing the file from being deleted.
                               3044                 :                :      */
 3814 andres@anarazel.de       3045         [ +  + ]:          19675 :     if (WriteRqst.Write <= LogwrtResult.Flush)
                               3046                 :                :     {
 5864 bruce@momjian.us         3047         [ +  + ]:          12341 :         if (openLogFile >= 0)
                               3048                 :                :         {
 3232 andres@anarazel.de       3049         [ +  + ]:           9216 :             if (!XLByteInPrevSeg(LogwrtResult.Write, openLogSegNo,
                               3050                 :                :                                  wal_segment_size))
                               3051                 :                :             {
 5891 magnus@hagander.net      3052                 :            256 :                 XLogFileClose();
                               3053                 :                :             }
                               3054                 :                :         }
 5192 tgl@sss.pgh.pa.us        3055                 :          12341 :         return false;
                               3056                 :                :     }
                               3057                 :                : 
                               3058                 :                :     /*
                               3059                 :                :      * Determine how far to flush WAL, based on the wal_writer_delay and
                               3060                 :                :      * wal_writer_flush_after GUCs.
                               3061                 :                :      *
                               3062                 :                :      * Note that XLogSetAsyncXactLSN() performs a similar calculation based on
                               3063                 :                :      * wal_writer_flush_after, to decide when to wake us up.  Make sure the
                               3064                 :                :      * logic is the same in both places if you change this.
                               3065                 :                :      */
 3814 andres@anarazel.de       3066                 :           7334 :     now = GetCurrentTimestamp();
  972 heikki.linnakangas@i     3067                 :           7334 :     flushblocks =
 3814 andres@anarazel.de       3068                 :           7334 :         WriteRqst.Write / XLOG_BLCKSZ - LogwrtResult.Flush / XLOG_BLCKSZ;
                               3069                 :                : 
                               3070   [ +  -  +  + ]:           7334 :     if (WalWriterFlushAfter == 0 || lastflush == 0)
                               3071                 :                :     {
                               3072                 :                :         /* first call, or block based limits disabled */
                               3073                 :            332 :         WriteRqst.Flush = WriteRqst.Write;
                               3074                 :            332 :         lastflush = now;
                               3075                 :                :     }
                               3076         [ +  + ]:           7002 :     else if (TimestampDifferenceExceeds(lastflush, now, WalWriterDelay))
                               3077                 :                :     {
                               3078                 :                :         /*
                               3079                 :                :          * Flush the writes at least every WalWriterDelay ms. This is
                               3080                 :                :          * important to bound the amount of time it takes for an asynchronous
                               3081                 :                :          * commit to hit disk.
                               3082                 :                :          */
                               3083                 :           6783 :         WriteRqst.Flush = WriteRqst.Write;
                               3084                 :           6783 :         lastflush = now;
                               3085                 :                :     }
  972 heikki.linnakangas@i     3086         [ +  + ]:            219 :     else if (flushblocks >= WalWriterFlushAfter)
                               3087                 :                :     {
                               3088                 :                :         /* exceeded wal_writer_flush_after blocks, flush */
 3814 andres@anarazel.de       3089                 :            197 :         WriteRqst.Flush = WriteRqst.Write;
                               3090                 :            197 :         lastflush = now;
                               3091                 :                :     }
                               3092                 :                :     else
                               3093                 :                :     {
                               3094                 :                :         /* no flushing, this time round */
  178 alvherre@kurilemu.de     3095                 :             22 :         WriteRqst.Flush = InvalidXLogRecPtr;
                               3096                 :                :     }
                               3097                 :                : 
                               3098                 :                : #ifdef WAL_DEBUG
                               3099                 :                :     if (XLOG_DEBUG)
                               3100                 :                :         elog(LOG, "xlog bg flush request write %X/%08X; flush: %X/%08X, current is write %X/%08X; flush %X/%08X",
                               3101                 :                :              LSN_FORMAT_ARGS(WriteRqst.Write),
                               3102                 :                :              LSN_FORMAT_ARGS(WriteRqst.Flush),
                               3103                 :                :              LSN_FORMAT_ARGS(LogwrtResult.Write),
                               3104                 :                :              LSN_FORMAT_ARGS(LogwrtResult.Flush));
                               3105                 :                : #endif
                               3106                 :                : 
 6942 tgl@sss.pgh.pa.us        3107                 :           7334 :     START_CRIT_SECTION();
                               3108                 :                : 
                               3109                 :                :     /* now wait for any in-progress insertions to finish and get write lock */
 3814 andres@anarazel.de       3110                 :           7334 :     WaitXLogInsertionsToFinish(WriteRqst.Write);
 6942 tgl@sss.pgh.pa.us        3111                 :           7334 :     LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
  844 alvherre@alvh.no-ip.     3112                 :           7334 :     RefreshXLogWriteResult(LogwrtResult);
 3814 andres@anarazel.de       3113         [ +  + ]:           7334 :     if (WriteRqst.Write > LogwrtResult.Write ||
                               3114         [ +  + ]:            301 :         WriteRqst.Flush > LogwrtResult.Flush)
                               3115                 :                :     {
 1724 rhaas@postgresql.org     3116                 :           7218 :         XLogWrite(WriteRqst, insertTLI, flexible);
                               3117                 :                :     }
 6942 tgl@sss.pgh.pa.us        3118                 :           7334 :     LWLockRelease(WALWriteLock);
                               3119                 :                : 
                               3120         [ -  + ]:           7334 :     END_CRIT_SECTION();
                               3121                 :                : 
                               3122                 :                :     /* wake up walsenders now that we've released heavily contended locks */
 1205 andres@anarazel.de       3123                 :           7334 :     WalSndWakeupProcessRequests(true, !RecoveryInProgress());
                               3124                 :                : 
                               3125                 :                :     /*
                               3126                 :                :      * Wake up processes waiting for primary flush LSN to reach current flush
                               3127                 :                :      * position.
                               3128                 :                :      */
   84 akorotkov@postgresql     3129                 :           7334 :     WaitLSNWakeup(WAIT_LSN_TYPE_PRIMARY_FLUSH, LogwrtResult.Flush);
                               3130                 :                : 
                               3131                 :                :     /*
                               3132                 :                :      * Great, done. To take some work off the critical path, try to initialize
                               3133                 :                :      * as many of the no-longer-needed WAL buffers for future use as we can.
                               3134                 :                :      */
 1724 rhaas@postgresql.org     3135                 :           7334 :     AdvanceXLInsertBuffer(InvalidXLogRecPtr, insertTLI, true);
                               3136                 :                : 
                               3137                 :                :     /*
                               3138                 :                :      * If we determined that we need to write data, but somebody else
                               3139                 :                :      * wrote/flushed already, it should be considered as being active, to
                               3140                 :                :      * avoid hibernating too early.
                               3141                 :                :      */
 3814 andres@anarazel.de       3142                 :           7334 :     return true;
                               3143                 :                : }
                               3144                 :                : 
                               3145                 :                : /*
                               3146                 :                :  * Test whether XLOG data has been flushed up to (at least) the given
                               3147                 :                :  * position, or whether the minimum recovery point has been updated past
                               3148                 :                :  * the given position.
                               3149                 :                :  *
                               3150                 :                :  * Returns true if a flush is still needed, or if the minimum recovery point
                               3151                 :                :  * must be updated.
                               3152                 :                :  *
                               3153                 :                :  * It is possible that someone else is already in the process of flushing
                               3154                 :                :  * that far, or has updated the minimum recovery point up to the given
                               3155                 :                :  * position.
                               3156                 :                :  */
                               3157                 :                : bool
 6997 tgl@sss.pgh.pa.us        3158                 :       16764090 : XLogNeedsFlush(XLogRecPtr record)
                               3159                 :                : {
                               3160                 :                :     /*
                               3161                 :                :      * During recovery, we don't flush WAL but update minRecoveryPoint
                               3162                 :                :      * instead. So "needs flush" is taken to mean whether minRecoveryPoint
                               3163                 :                :      * would need to be updated.
                               3164                 :                :      *
                               3165                 :                :      * Using XLogInsertAllowed() rather than RecoveryInProgress() matters for
                               3166                 :                :      * the case of an end-of-recovery checkpoint, where WAL data is flushed.
                               3167                 :                :      * This check should be consistent with the one in XLogFlush().
                               3168                 :                :      */
  310 michael@paquier.xyz      3169         [ +  + ]:       16764090 :     if (!XLogInsertAllowed())
                               3170                 :                :     {
                               3171                 :                :         /* Quick exit if already known to be updated or cannot be updated */
  299                          3172   [ +  -  +  + ]:         560252 :         if (!updateMinRecoveryPoint || record <= LocalMinRecoveryPoint)
                               3173                 :         544331 :             return false;
                               3174                 :                : 
                               3175                 :                :         /*
                               3176                 :                :          * An invalid minRecoveryPoint means that we need to recover all the
                               3177                 :                :          * WAL, i.e., we're doing crash recovery.  We never modify the control
                               3178                 :                :          * file's value in that case, so we can short-circuit future checks
                               3179                 :                :          * here too.  This triggers a quick exit path for the startup process,
                               3180                 :                :          * which cannot update its local copy of minRecoveryPoint as long as
                               3181                 :                :          * it has not replayed all WAL available when doing crash recovery.
                               3182                 :                :          */
  262 alvherre@kurilemu.de     3183   [ +  +  -  + ]:          15921 :         if (!XLogRecPtrIsValid(LocalMinRecoveryPoint) && InRecovery)
                               3184                 :                :         {
 2943 michael@paquier.xyz      3185                 :UBC           0 :             updateMinRecoveryPoint = false;
 6063 simon@2ndQuadrant.co     3186                 :              0 :             return false;
                               3187                 :                :         }
                               3188                 :                : 
                               3189                 :                :         /*
                               3190                 :                :          * Update local copy of minRecoveryPoint. But if the lock is busy,
                               3191                 :                :          * just return a conservative guess.
                               3192                 :                :          */
 6063 simon@2ndQuadrant.co     3193         [ -  + ]:CBC       15921 :         if (!LWLockConditionalAcquire(ControlFileLock, LW_SHARED))
 6063 simon@2ndQuadrant.co     3194                 :UBC           0 :             return true;
 1621 heikki.linnakangas@i     3195                 :CBC       15921 :         LocalMinRecoveryPoint = ControlFile->minRecoveryPoint;
                               3196                 :          15921 :         LocalMinRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 6063 simon@2ndQuadrant.co     3197                 :          15921 :         LWLockRelease(ControlFileLock);
                               3198                 :                : 
                               3199                 :                :         /*
                               3200                 :                :          * Check minRecoveryPoint for any other process than the startup
                               3201                 :                :          * process doing crash recovery, which should not update the control
                               3202                 :                :          * file value if crash recovery is still running.
                               3203                 :                :          */
  262 alvherre@kurilemu.de     3204         [ -  + ]:          15921 :         if (!XLogRecPtrIsValid(LocalMinRecoveryPoint))
 2886 michael@paquier.xyz      3205                 :UBC           0 :             updateMinRecoveryPoint = false;
                               3206                 :                : 
                               3207                 :                :         /* check again */
 1621 heikki.linnakangas@i     3208   [ +  +  -  + ]:CBC       15921 :         if (record <= LocalMinRecoveryPoint || !updateMinRecoveryPoint)
 2886 michael@paquier.xyz      3209                 :            112 :             return false;
                               3210                 :                :         else
                               3211                 :          15809 :             return true;
                               3212                 :                :     }
                               3213                 :                : 
                               3214                 :                :     /* Quick exit if already known flushed */
 4958 alvherre@alvh.no-ip.     3215         [ +  + ]:       16203838 :     if (record <= LogwrtResult.Flush)
 6997 tgl@sss.pgh.pa.us        3216                 :       16000471 :         return false;
                               3217                 :                : 
                               3218                 :                :     /* read LogwrtResult and update local state */
  844 alvherre@alvh.no-ip.     3219                 :         203367 :     RefreshXLogWriteResult(LogwrtResult);
                               3220                 :                : 
                               3221                 :                :     /* check again */
 4958                          3222         [ +  + ]:         203367 :     if (record <= LogwrtResult.Flush)
 6997 tgl@sss.pgh.pa.us        3223                 :           3394 :         return false;
                               3224                 :                : 
                               3225                 :         199973 :     return true;
                               3226                 :                : }
                               3227                 :                : 
                               3228                 :                : /*
                               3229                 :                :  * Try to make a given XLOG file segment exist.
                               3230                 :                :  *
                               3231                 :                :  * logsegno: identify segment.
                               3232                 :                :  *
                               3233                 :                :  * *added: on return, true if this call raised the number of extant segments.
                               3234                 :                :  *
                               3235                 :                :  * path: on return, this char[MAXPGPATH] has the path to the logsegno file.
                               3236                 :                :  *
                               3237                 :                :  * Returns -1 or FD of opened file.  A -1 here is not an error; a caller
                               3238                 :                :  * wanting an open segment should attempt to open "path", which usually will
                               3239                 :                :  * succeed.  (This is weird, but it's efficient for the callers.)
                               3240                 :                :  */
                               3241                 :                : static int
 1724 rhaas@postgresql.org     3242                 :          16310 : XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
                               3243                 :                :                      bool *added, char *path)
                               3244                 :                : {
                               3245                 :                :     char        tmppath[MAXPGPATH];
                               3246                 :                :     XLogSegNo   installed_segno;
                               3247                 :                :     XLogSegNo   max_segno;
                               3248                 :                :     int         fd;
                               3249                 :                :     int         save_errno;
 1205 tmunro@postgresql.or     3250                 :          16310 :     int         open_flags = O_RDWR | O_CREAT | O_EXCL | PG_BINARY;
                               3251                 :                :     instr_time  io_start;
                               3252                 :                : 
 1724 rhaas@postgresql.org     3253         [ -  + ]:          16310 :     Assert(logtli != 0);
                               3254                 :                : 
                               3255                 :          16310 :     XLogFilePath(path, logtli, logsegno, wal_segment_size);
                               3256                 :                : 
                               3257                 :                :     /*
                               3258                 :                :      * Try to use existent file (checkpoint maker may have created it already)
                               3259                 :                :      */
 1854 noah@leadboat.com        3260                 :          16310 :     *added = false;
 1241 tmunro@postgresql.or     3261                 :          16310 :     fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC |
 1017 nathan@postgresql.or     3262                 :          16310 :                        get_sync_bit(wal_sync_method));
 1854 noah@leadboat.com        3263         [ +  + ]:          16310 :     if (fd < 0)
                               3264                 :                :     {
                               3265         [ -  + ]:           1533 :         if (errno != ENOENT)
 1854 noah@leadboat.com        3266         [ #  # ]:UBC           0 :             ereport(ERROR,
                               3267                 :                :                     (errcode_for_file_access(),
                               3268                 :                :                      errmsg("could not open file \"%s\": %m", path)));
                               3269                 :                :     }
                               3270                 :                :     else
 1854 noah@leadboat.com        3271                 :CBC       14777 :         return fd;
                               3272                 :                : 
                               3273                 :                :     /*
                               3274                 :                :      * Initialize an empty (all zeroes) segment.  NOTE: it is possible that
                               3275                 :                :      * another process is doing the same thing.  If so, we will end up
                               3276                 :                :      * pre-creating an extra log segment.  That seems OK, and better than
                               3277                 :                :      * holding the lock throughout this lengthy process.
                               3278                 :                :      */
 6966 tgl@sss.pgh.pa.us        3279         [ +  + ]:           1533 :     elog(DEBUG2, "creating and filling new WAL file");
                               3280                 :                : 
 7692                          3281                 :           1533 :     snprintf(tmppath, MAXPGPATH, XLOGDIR "/xlogtemp.%d", (int) getpid());
                               3282                 :                : 
 9262                          3283                 :           1533 :     unlink(tmppath);
                               3284                 :                : 
 1205 tmunro@postgresql.or     3285         [ -  + ]:           1533 :     if (io_direct_flags & IO_DIRECT_WAL_INIT)
 1205 tmunro@postgresql.or     3286                 :UBC           0 :         open_flags |= PG_O_DIRECT;
                               3287                 :                : 
                               3288                 :                :     /* do not use get_sync_bit() here --- want to fsync only at end of fill */
 1205 tmunro@postgresql.or     3289                 :CBC        1533 :     fd = BasicOpenFile(tmppath, open_flags);
 9799 vadim4o@yahoo.com        3290         [ -  + ]:           1533 :     if (fd < 0)
 7772 tgl@sss.pgh.pa.us        3291         [ #  # ]:UBC           0 :         ereport(ERROR,
                               3292                 :                :                 (errcode_for_file_access(),
                               3293                 :                :                  errmsg("could not create file \"%s\": %m", tmppath)));
                               3294                 :                : 
                               3295                 :                :     /* Measure I/O timing when initializing segment */
  515 michael@paquier.xyz      3296                 :CBC        1533 :     io_start = pgstat_prepare_io_time(track_wal_io_timing);
                               3297                 :                : 
 2672 tmunro@postgresql.or     3298                 :           1533 :     pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_WRITE);
                               3299                 :           1533 :     save_errno = 0;
                               3300         [ +  - ]:           1533 :     if (wal_init_zero)
                               3301                 :                :     {
                               3302                 :                :         ssize_t     rc;
                               3303                 :                : 
                               3304                 :                :         /*
                               3305                 :                :          * Zero-fill the file.  With this setting, we do this the hard way to
                               3306                 :                :          * ensure that all the file space has really been allocated.  On
                               3307                 :                :          * platforms that allow "holes" in files, just seeking to the end
                               3308                 :                :          * doesn't allocate intermediate space.  This way, we know that we
                               3309                 :                :          * have all the space and (after the fsync below) that all the
                               3310                 :                :          * indirect blocks are down on disk.  Therefore, fdatasync(2) or
                               3311                 :                :          * O_DSYNC will be sufficient to sync future writes to the log file.
                               3312                 :                :          */
 1238 michael@paquier.xyz      3313                 :           1533 :         rc = pg_pwrite_zeros(fd, wal_segment_size, 0);
                               3314                 :                : 
 1356                          3315         [ -  + ]:           1533 :         if (rc < 0)
 1356 michael@paquier.xyz      3316                 :UBC           0 :             save_errno = errno;
                               3317                 :                :     }
                               3318                 :                :     else
                               3319                 :                :     {
                               3320                 :                :         /*
                               3321                 :                :          * Otherwise, seeking to the end and writing a solitary byte is
                               3322                 :                :          * enough.
                               3323                 :                :          */
 4708 jdavis@postgresql.or     3324                 :              0 :         errno = 0;
 1356 michael@paquier.xyz      3325         [ #  # ]:              0 :         if (pg_pwrite(fd, "\0", 1, wal_segment_size - 1) != 1)
                               3326                 :                :         {
                               3327                 :                :             /* if write didn't set errno, assume no disk space */
 2672 tmunro@postgresql.or     3328         [ #  # ]:              0 :             save_errno = errno ? errno : ENOSPC;
                               3329                 :                :         }
                               3330                 :                :     }
 2672 tmunro@postgresql.or     3331                 :CBC        1533 :     pgstat_report_wait_end();
                               3332                 :                : 
                               3333         [ -  + ]:           1533 :     if (save_errno)
                               3334                 :                :     {
                               3335                 :                :         /*
                               3336                 :                :          * If we fail to make the file, delete it to release disk space
                               3337                 :                :          */
 2672 tmunro@postgresql.or     3338                 :UBC           0 :         unlink(tmppath);
                               3339                 :                : 
                               3340                 :              0 :         close(fd);
                               3341                 :                : 
                               3342                 :              0 :         errno = save_errno;
                               3343                 :                : 
                               3344         [ #  # ]:              0 :         ereport(ERROR,
                               3345                 :                :                 (errcode_for_file_access(),
                               3346                 :                :                  errmsg("could not write to file \"%s\": %m", tmppath)));
                               3347                 :                :     }
                               3348                 :                : 
                               3349                 :                :     /*
                               3350                 :                :      * A full segment's worth of data is written when using wal_init_zero. One
                               3351                 :                :      * byte is written when not using it.
                               3352                 :                :      */
   39 michael@paquier.xyz      3353                 :CBC        1533 :     pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_INIT, IOOP_WRITE,
                               3354                 :                :                             io_start, 1,
                               3355         [ +  - ]:           1533 :                             wal_init_zero ? wal_segment_size : 1);
                               3356                 :                : 
                               3357                 :                :     /* Measure I/O timing when flushing segment */
  515                          3358                 :           1533 :     io_start = pgstat_prepare_io_time(track_wal_io_timing);
                               3359                 :                : 
 3417 rhaas@postgresql.org     3360                 :           1533 :     pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_SYNC);
 9361 tgl@sss.pgh.pa.us        3361         [ -  + ]:           1533 :     if (pg_fsync(fd) != 0)
                               3362                 :                :     {
 1430 drowley@postgresql.o     3363                 :UBC           0 :         save_errno = errno;
 4989 heikki.linnakangas@i     3364                 :              0 :         close(fd);
 2953 michael@paquier.xyz      3365                 :              0 :         errno = save_errno;
 7772 tgl@sss.pgh.pa.us        3366         [ #  # ]:              0 :         ereport(ERROR,
                               3367                 :                :                 (errcode_for_file_access(),
                               3368                 :                :                  errmsg("could not fsync file \"%s\": %m", tmppath)));
                               3369                 :                :     }
 3417 rhaas@postgresql.org     3370                 :CBC        1533 :     pgstat_report_wait_end();
                               3371                 :                : 
  537 michael@paquier.xyz      3372                 :           1533 :     pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_INIT,
                               3373                 :                :                             IOOP_FSYNC, io_start, 1, 0);
                               3374                 :                : 
 2577 peter@eisentraut.org     3375         [ -  + ]:           1533 :     if (close(fd) != 0)
 7772 tgl@sss.pgh.pa.us        3376         [ #  # ]:UBC           0 :         ereport(ERROR,
                               3377                 :                :                 (errcode_for_file_access(),
                               3378                 :                :                  errmsg("could not close file \"%s\": %m", tmppath)));
                               3379                 :                : 
                               3380                 :                :     /*
                               3381                 :                :      * Now move the segment into place with its final name.  Cope with
                               3382                 :                :      * possibility that someone else has created the file while we were
                               3383                 :                :      * filling ours: if so, use ours to pre-create a future log segment.
                               3384                 :                :      */
 5145 heikki.linnakangas@i     3385                 :CBC        1533 :     installed_segno = logsegno;
                               3386                 :                : 
                               3387                 :                :     /*
                               3388                 :                :      * XXX: What should we use as max_segno? We used to use XLOGfileslop when
                               3389                 :                :      * that was a constant, but that was always a bit dubious: normally, at a
                               3390                 :                :      * checkpoint, XLOGfileslop was the offset from the checkpoint record, but
                               3391                 :                :      * here, it was the offset from the insert location. We can't do the
                               3392                 :                :      * normal XLOGfileslop calculation here because we don't have access to
                               3393                 :                :      * the prior checkpoint's redo location. So somewhat arbitrarily, just use
                               3394                 :                :      * CheckPointSegments.
                               3395                 :                :      */
 4171                          3396                 :           1533 :     max_segno = logsegno + CheckPointSegments;
 1724 rhaas@postgresql.org     3397         [ +  - ]:           1533 :     if (InstallXLogFileSegment(&installed_segno, tmppath, true, max_segno,
                               3398                 :                :                                logtli))
                               3399                 :                :     {
 1854 noah@leadboat.com        3400                 :           1533 :         *added = true;
                               3401         [ +  + ]:           1533 :         elog(DEBUG2, "done creating and filling new WAL file");
                               3402                 :                :     }
                               3403                 :                :     else
                               3404                 :                :     {
                               3405                 :                :         /*
                               3406                 :                :          * No need for any more future segments, or InstallXLogFileSegment()
                               3407                 :                :          * failed to rename the file into place. If the rename failed, a
                               3408                 :                :          * caller opening the file may fail.
                               3409                 :                :          */
 9138 tgl@sss.pgh.pa.us        3410                 :UBC           0 :         unlink(tmppath);
 1854 noah@leadboat.com        3411         [ #  # ]:              0 :         elog(DEBUG2, "abandoned new WAL file");
                               3412                 :                :     }
                               3413                 :                : 
 1854 noah@leadboat.com        3414                 :CBC        1533 :     return -1;
                               3415                 :                : }
                               3416                 :                : 
                               3417                 :                : /*
                               3418                 :                :  * Create a new XLOG file segment, or open a pre-existing one.
                               3419                 :                :  *
                               3420                 :                :  * logsegno: identify segment to be created/opened.
                               3421                 :                :  *
                               3422                 :                :  * Returns FD of opened file.
                               3423                 :                :  *
                               3424                 :                :  * Note: errors here are ERROR not PANIC because we might or might not be
                               3425                 :                :  * inside a critical section (e.g., during checkpoint there is no reason to
                               3426                 :                :  * take down the system on failure).  They will promote to PANIC if we are
                               3427                 :                :  * in a critical section.
                               3428                 :                :  */
                               3429                 :                : int
 1724 rhaas@postgresql.org     3430                 :          16079 : XLogFileInit(XLogSegNo logsegno, TimeLineID logtli)
                               3431                 :                : {
                               3432                 :                :     bool        ignore_added;
                               3433                 :                :     char        path[MAXPGPATH];
                               3434                 :                :     int         fd;
                               3435                 :                : 
                               3436         [ -  + ]:          16079 :     Assert(logtli != 0);
                               3437                 :                : 
                               3438                 :          16079 :     fd = XLogFileInitInternal(logsegno, logtli, &ignore_added, path);
 1854 noah@leadboat.com        3439         [ +  + ]:          16079 :     if (fd >= 0)
                               3440                 :          14612 :         return fd;
                               3441                 :                : 
                               3442                 :                :     /* Now open original target segment (might not be file I just made) */
 1241 tmunro@postgresql.or     3443                 :           1467 :     fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC |
 1017 nathan@postgresql.or     3444                 :           1467 :                        get_sync_bit(wal_sync_method));
 9138 tgl@sss.pgh.pa.us        3445         [ -  + ]:           1467 :     if (fd < 0)
 7772 tgl@sss.pgh.pa.us        3446         [ #  # ]:UBC           0 :         ereport(ERROR,
                               3447                 :                :                 (errcode_for_file_access(),
                               3448                 :                :                  errmsg("could not open file \"%s\": %m", path)));
 7501 neilc@samurai.com        3449                 :CBC        1467 :     return fd;
                               3450                 :                : }
                               3451                 :                : 
                               3452                 :                : /*
                               3453                 :                :  * Create a new XLOG file segment by copying a pre-existing one.
                               3454                 :                :  *
                               3455                 :                :  * destsegno: identify segment to be created.
                               3456                 :                :  *
                               3457                 :                :  * srcTLI, srcsegno: identify segment to be copied (could be from
                               3458                 :                :  *      a different timeline)
                               3459                 :                :  *
                               3460                 :                :  * upto: how much of the source file to copy (the rest is filled with
                               3461                 :                :  *      zeros)
                               3462                 :                :  *
                               3463                 :                :  * Currently this is only used during recovery, and so there are no locking
                               3464                 :                :  * considerations.  But we should be just as careful as XLogFileInit to avoid
                               3465                 :                :  * emplacing a bogus file.
                               3466                 :                :  */
                               3467                 :                : static void
 1724 rhaas@postgresql.org     3468                 :             48 : XLogFileCopy(TimeLineID destTLI, XLogSegNo destsegno,
                               3469                 :                :              TimeLineID srcTLI, XLogSegNo srcsegno,
                               3470                 :                :              int upto)
                               3471                 :                : {
                               3472                 :                :     char        path[MAXPGPATH];
                               3473                 :                :     char        tmppath[MAXPGPATH];
                               3474                 :                :     PGAlignedXLogBlock buffer;
                               3475                 :                :     int         srcfd;
                               3476                 :                :     int         fd;
                               3477                 :                :     int         nbytes;
                               3478                 :                : 
                               3479                 :                :     /*
                               3480                 :                :      * Open the source file
                               3481                 :                :      */
 3232 andres@anarazel.de       3482                 :             48 :     XLogFilePath(path, srcTLI, srcsegno, wal_segment_size);
 3228 peter_e@gmx.net          3483                 :             48 :     srcfd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
 8040 tgl@sss.pgh.pa.us        3484         [ -  + ]:             48 :     if (srcfd < 0)
 7772 tgl@sss.pgh.pa.us        3485         [ #  # ]:UBC           0 :         ereport(ERROR,
                               3486                 :                :                 (errcode_for_file_access(),
                               3487                 :                :                  errmsg("could not open file \"%s\": %m", path)));
                               3488                 :                : 
                               3489                 :                :     /*
                               3490                 :                :      * Copy into a temp file name.
                               3491                 :                :      */
 7692 tgl@sss.pgh.pa.us        3492                 :CBC          48 :     snprintf(tmppath, MAXPGPATH, XLOGDIR "/xlogtemp.%d", (int) getpid());
                               3493                 :                : 
 8040                          3494                 :             48 :     unlink(tmppath);
                               3495                 :                : 
                               3496                 :                :     /* do not use get_sync_bit() here --- want to fsync only at end of fill */
 3228 peter_e@gmx.net          3497                 :             48 :     fd = OpenTransientFile(tmppath, O_RDWR | O_CREAT | O_EXCL | PG_BINARY);
 8040 tgl@sss.pgh.pa.us        3498         [ -  + ]:             48 :     if (fd < 0)
 7772 tgl@sss.pgh.pa.us        3499         [ #  # ]:UBC           0 :         ereport(ERROR,
                               3500                 :                :                 (errcode_for_file_access(),
                               3501                 :                :                  errmsg("could not create file \"%s\": %m", tmppath)));
                               3502                 :                : 
                               3503                 :                :     /*
                               3504                 :                :      * Do the data copying.
                               3505                 :                :      */
 3232 andres@anarazel.de       3506         [ +  + ]:CBC       98352 :     for (nbytes = 0; nbytes < wal_segment_size; nbytes += sizeof(buffer))
                               3507                 :                :     {
                               3508                 :                :         ssize_t     nread;
                               3509                 :                : 
 4238 heikki.linnakangas@i     3510                 :          98304 :         nread = upto - nbytes;
                               3511                 :                : 
                               3512                 :                :         /*
                               3513                 :                :          * The part that is not read from the source file is filled with
                               3514                 :                :          * zeros.
                               3515                 :                :          */
                               3516         [ +  + ]:          98304 :         if (nread < sizeof(buffer))
 2885 tgl@sss.pgh.pa.us        3517                 :             48 :             memset(buffer.data, 0, sizeof(buffer));
                               3518                 :                : 
 4238 heikki.linnakangas@i     3519         [ +  + ]:          98304 :         if (nread > 0)
                               3520                 :                :         {
                               3521                 :                :             ssize_t     r;
                               3522                 :                : 
                               3523         [ +  + ]:           4917 :             if (nread > sizeof(buffer))
                               3524                 :           4869 :                 nread = sizeof(buffer);
 3417 rhaas@postgresql.org     3525                 :           4917 :             pgstat_report_wait_start(WAIT_EVENT_WAL_COPY_READ);
 2885 tgl@sss.pgh.pa.us        3526                 :           4917 :             r = read(srcfd, buffer.data, nread);
 2930 michael@paquier.xyz      3527         [ -  + ]:           4917 :             if (r != nread)
                               3528                 :                :             {
 2930 michael@paquier.xyz      3529         [ #  # ]:UBC           0 :                 if (r < 0)
 4238 heikki.linnakangas@i     3530         [ #  # ]:              0 :                     ereport(ERROR,
                               3531                 :                :                             (errcode_for_file_access(),
                               3532                 :                :                              errmsg("could not read file \"%s\": %m",
                               3533                 :                :                                     path)));
                               3534                 :                :                 else
                               3535         [ #  # ]:              0 :                     ereport(ERROR,
                               3536                 :                :                             (errcode(ERRCODE_DATA_CORRUPTED),
                               3537                 :                :                              errmsg("could not read file \"%s\": read %zd of %zu",
                               3538                 :                :                                     path, r, nread)));
                               3539                 :                :             }
 3417 rhaas@postgresql.org     3540                 :CBC        4917 :             pgstat_report_wait_end();
                               3541                 :                :         }
 8040 tgl@sss.pgh.pa.us        3542                 :          98304 :         errno = 0;
 3417 rhaas@postgresql.org     3543                 :          98304 :         pgstat_report_wait_start(WAIT_EVENT_WAL_COPY_WRITE);
   11 peter@eisentraut.org     3544         [ -  + ]:GNC       98304 :         if (write(fd, buffer.data, sizeof(buffer)) != sizeof(buffer))
                               3545                 :                :         {
 8040 tgl@sss.pgh.pa.us        3546                 :UBC           0 :             int         save_errno = errno;
                               3547                 :                : 
                               3548                 :                :             /*
                               3549                 :                :              * If we fail to make the file, delete it to release disk space
                               3550                 :                :              */
                               3551                 :              0 :             unlink(tmppath);
                               3552                 :                :             /* if write didn't set errno, assume problem is no disk space */
                               3553         [ #  # ]:              0 :             errno = save_errno ? save_errno : ENOSPC;
                               3554                 :                : 
 7772                          3555         [ #  # ]:              0 :             ereport(ERROR,
                               3556                 :                :                     (errcode_for_file_access(),
                               3557                 :                :                      errmsg("could not write to file \"%s\": %m", tmppath)));
                               3558                 :                :         }
 3417 rhaas@postgresql.org     3559                 :CBC       98304 :         pgstat_report_wait_end();
                               3560                 :                :     }
                               3561                 :                : 
                               3562                 :             48 :     pgstat_report_wait_start(WAIT_EVENT_WAL_COPY_SYNC);
 8040 tgl@sss.pgh.pa.us        3563         [ -  + ]:             48 :     if (pg_fsync(fd) != 0)
 2806 tmunro@postgresql.or     3564         [ #  # ]:UBC           0 :         ereport(data_sync_elevel(ERROR),
                               3565                 :                :                 (errcode_for_file_access(),
                               3566                 :                :                  errmsg("could not fsync file \"%s\": %m", tmppath)));
 3417 rhaas@postgresql.org     3567                 :CBC          48 :     pgstat_report_wait_end();
                               3568                 :                : 
 2577 peter@eisentraut.org     3569         [ -  + ]:             48 :     if (CloseTransientFile(fd) != 0)
 7772 tgl@sss.pgh.pa.us        3570         [ #  # ]:UBC           0 :         ereport(ERROR,
                               3571                 :                :                 (errcode_for_file_access(),
                               3572                 :                :                  errmsg("could not close file \"%s\": %m", tmppath)));
                               3573                 :                : 
 2577 peter@eisentraut.org     3574         [ -  + ]:CBC          48 :     if (CloseTransientFile(srcfd) != 0)
 2696 michael@paquier.xyz      3575         [ #  # ]:UBC           0 :         ereport(ERROR,
                               3576                 :                :                 (errcode_for_file_access(),
                               3577                 :                :                  errmsg("could not close file \"%s\": %m", path)));
                               3578                 :                : 
                               3579                 :                :     /*
                               3580                 :                :      * Now move the segment into place with its final name.
                               3581                 :                :      */
 1724 rhaas@postgresql.org     3582         [ -  + ]:CBC          48 :     if (!InstallXLogFileSegment(&destsegno, tmppath, false, 0, destTLI))
 4043 fujii@postgresql.org     3583         [ #  # ]:UBC           0 :         elog(ERROR, "InstallXLogFileSegment should not have failed");
 8040 tgl@sss.pgh.pa.us        3584                 :CBC          48 : }
                               3585                 :                : 
                               3586                 :                : /*
                               3587                 :                :  * Install a new XLOG segment file as a current or future log segment.
                               3588                 :                :  *
                               3589                 :                :  * This is used both to install a newly-created segment (which has a temp
                               3590                 :                :  * filename while it's being created) and to recycle an old segment.
                               3591                 :                :  *
                               3592                 :                :  * *segno: identify segment to install as (or first possible target).
                               3593                 :                :  * When find_free is true, this is modified on return to indicate the
                               3594                 :                :  * actual installation location or last segment searched.
                               3595                 :                :  *
                               3596                 :                :  * tmppath: initial name of file to install.  It will be renamed into place.
                               3597                 :                :  *
                               3598                 :                :  * find_free: if true, install the new segment at the first empty segno
                               3599                 :                :  * number at or after the passed number.  If false, install the new segment
                               3600                 :                :  * exactly where specified, deleting any existing segment file there.
                               3601                 :                :  *
                               3602                 :                :  * max_segno: maximum segment number to install the new file as.  Fail if no
                               3603                 :                :  * free slot is found between *segno and max_segno. (Ignored when find_free
                               3604                 :                :  * is false.)
                               3605                 :                :  *
                               3606                 :                :  * tli: The timeline on which the new segment should be installed.
                               3607                 :                :  *
                               3608                 :                :  * Returns true if the file was installed successfully.  false indicates that
                               3609                 :                :  * the max_segno limit was exceeded, the startup process has disabled this
                               3610                 :                :  * function for now, or an error occurred while renaming the file into place.
                               3611                 :                :  */
                               3612                 :                : static bool
 5145 heikki.linnakangas@i     3613                 :           3400 : InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
                               3614                 :                :                        bool find_free, XLogSegNo max_segno, TimeLineID tli)
                               3615                 :                : {
                               3616                 :                :     char        path[MAXPGPATH];
                               3617                 :                :     struct stat stat_buf;
                               3618                 :                : 
 1724 rhaas@postgresql.org     3619         [ -  + ]:           3400 :     Assert(tli != 0);
                               3620                 :                : 
                               3621                 :           3400 :     XLogFilePath(path, tli, *segno, wal_segment_size);
                               3622                 :                : 
 1854 noah@leadboat.com        3623                 :           3400 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               3624         [ -  + ]:           3400 :     if (!XLogCtl->InstallXLogFileSegmentActive)
                               3625                 :                :     {
 1854 noah@leadboat.com        3626                 :UBC           0 :         LWLockRelease(ControlFileLock);
                               3627                 :              0 :         return false;
                               3628                 :                :     }
                               3629                 :                : 
 9138 tgl@sss.pgh.pa.us        3630         [ +  + ]:CBC        3400 :     if (!find_free)
                               3631                 :                :     {
                               3632                 :                :         /* Force installation: get rid of any pre-existing segment file */
 3408 teodor@sigaev.ru         3633                 :             48 :         durable_unlink(path, DEBUG1);
                               3634                 :                :     }
                               3635                 :                :     else
                               3636                 :                :     {
                               3637                 :                :         /* Find a free slot to put it in */
 8583 tgl@sss.pgh.pa.us        3638         [ +  + ]:           4520 :         while (stat(path, &stat_buf) == 0)
                               3639                 :                :         {
 4171 heikki.linnakangas@i     3640         [ +  + ]:           1430 :             if ((*segno) >= max_segno)
                               3641                 :                :             {
                               3642                 :                :                 /* Failed to find a free slot within specified range */
 1854 noah@leadboat.com        3643                 :            262 :                 LWLockRelease(ControlFileLock);
 9138 tgl@sss.pgh.pa.us        3644                 :            262 :                 return false;
                               3645                 :                :             }
 5145 heikki.linnakangas@i     3646                 :           1168 :             (*segno)++;
 1724 rhaas@postgresql.org     3647                 :           1168 :             XLogFilePath(path, tli, *segno, wal_segment_size);
                               3648                 :                :         }
                               3649                 :                :     }
                               3650                 :                : 
 1482 michael@paquier.xyz      3651   [ +  -  -  + ]:           3138 :     Assert(access(path, F_OK) != 0 && errno == ENOENT);
                               3652         [ -  + ]:           3138 :     if (durable_rename(tmppath, path, LOG) != 0)
                               3653                 :                :     {
 1854 noah@leadboat.com        3654                 :UBC           0 :         LWLockRelease(ControlFileLock);
                               3655                 :                :         /* durable_rename already emitted log message */
 6160 heikki.linnakangas@i     3656                 :              0 :         return false;
                               3657                 :                :     }
                               3658                 :                : 
 1854 noah@leadboat.com        3659                 :CBC        3138 :     LWLockRelease(ControlFileLock);
                               3660                 :                : 
 9138 tgl@sss.pgh.pa.us        3661                 :           3138 :     return true;
                               3662                 :                : }
                               3663                 :                : 
                               3664                 :                : /*
                               3665                 :                :  * Open a pre-existing logfile segment for writing.
                               3666                 :                :  */
                               3667                 :                : int
 1724 rhaas@postgresql.org     3668                 :            188 : XLogFileOpen(XLogSegNo segno, TimeLineID tli)
                               3669                 :                : {
                               3670                 :                :     char        path[MAXPGPATH];
                               3671                 :                :     int         fd;
                               3672                 :                : 
                               3673                 :            188 :     XLogFilePath(path, tli, segno, wal_segment_size);
                               3674                 :                : 
 1241 tmunro@postgresql.or     3675                 :            188 :     fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC |
 1017 nathan@postgresql.or     3676                 :            188 :                        get_sync_bit(wal_sync_method));
 9799 vadim4o@yahoo.com        3677         [ -  + ]:            188 :     if (fd < 0)
 8406 tgl@sss.pgh.pa.us        3678         [ #  # ]:UBC           0 :         ereport(PANIC,
                               3679                 :                :                 (errcode_for_file_access(),
                               3680                 :                :                  errmsg("could not open file \"%s\": %m", path)));
                               3681                 :                : 
 8040 tgl@sss.pgh.pa.us        3682                 :CBC         188 :     return fd;
                               3683                 :                : }
                               3684                 :                : 
                               3685                 :                : /*
                               3686                 :                :  * Close the current logfile segment for writing.
                               3687                 :                :  */
                               3688                 :                : static void
 7346 bruce@momjian.us         3689                 :           7424 : XLogFileClose(void)
                               3690                 :                : {
                               3691         [ -  + ]:           7424 :     Assert(openLogFile >= 0);
                               3692                 :                : 
                               3693                 :                :     /*
                               3694                 :                :      * WAL segment files will not be re-read in normal operation, so we advise
                               3695                 :                :      * the OS to release any cached pages.  But do not do so if WAL archiving
                               3696                 :                :      * or streaming is active, because the archiver and walsender processes could
                               3697                 :                :      * use the cache to read the WAL segment.
                               3698                 :                :      */
                               3699                 :                : #if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
 1205 tmunro@postgresql.or     3700   [ +  +  +  - ]:           7424 :     if (!XLogIsNeeded() && (io_direct_flags & IO_DIRECT_WAL) == 0)
 6405 tgl@sss.pgh.pa.us        3701                 :            126 :         (void) posix_fadvise(openLogFile, 0, 0, POSIX_FADV_DONTNEED);
                               3702                 :                : #endif
                               3703                 :                : 
 2577 peter@eisentraut.org     3704         [ -  + ]:           7424 :     if (close(openLogFile) != 0)
                               3705                 :                :     {
                               3706                 :                :         char        xlogfname[MAXFNAMELEN];
 2427 michael@paquier.xyz      3707                 :UBC           0 :         int         save_errno = errno;
                               3708                 :                : 
 1724 rhaas@postgresql.org     3709                 :              0 :         XLogFileName(xlogfname, openLogTLI, openLogSegNo, wal_segment_size);
 2427 michael@paquier.xyz      3710                 :              0 :         errno = save_errno;
 7346 bruce@momjian.us         3711         [ #  # ]:              0 :         ereport(PANIC,
                               3712                 :                :                 (errcode_for_file_access(),
                               3713                 :                :                  errmsg("could not close file \"%s\": %m", xlogfname)));
                               3714                 :                :     }
                               3715                 :                : 
 7346 bruce@momjian.us         3716                 :CBC        7424 :     openLogFile = -1;
 2344 tgl@sss.pgh.pa.us        3717                 :           7424 :     ReleaseExternalFD();
 7346 bruce@momjian.us         3718                 :           7424 : }
                               3719                 :                : 
                               3720                 :                : /*
                               3721                 :                :  * Preallocate log files beyond the specified log endpoint.
                               3722                 :                :  *
                               3723                 :                :  * XXX this is currently extremely conservative, since it forces only one
                               3724                 :                :  * future log segment to exist, and even that only if we are 75% done with
                               3725                 :                :  * the current one.  This is only appropriate for very low-WAL-volume systems.
                               3726                 :                :  * High-volume systems will be OK once they've built up a sufficient set of
                               3727                 :                :  * recycled log segments, but the startup transient is likely to include
                               3728                 :                :  * a lot of segment creations by foreground processes, which is not so good.
                               3729                 :                :  *
                               3730                 :                :  * XLogFileInitInternal() can ereport(ERROR).  All known causes indicate big
                               3731                 :                :  * trouble; for example, a full filesystem is one cause.  The checkpoint WAL
                               3732                 :                :  * and/or ControlFile updates already completed.  If a RequestCheckpoint()
                               3733                 :                :  * initiated the present checkpoint and an ERROR ends this function, the
                               3734                 :                :  * command that called RequestCheckpoint() fails.  That's not ideal, but it's
                               3735                 :                :  * not worth contorting more functions to use caller-specified elevel values.
                               3736                 :                :  * (With or without RequestCheckpoint(), an ERROR forestalls some inessential
                               3737                 :                :  * reporting and resource reclamation.)
                               3738                 :                :  */
                               3739                 :                : static void
 1724 rhaas@postgresql.org     3740                 :           2208 : PreallocXlogFiles(XLogRecPtr endptr, TimeLineID tli)
                               3741                 :                : {
                               3742                 :                :     XLogSegNo   _logSegNo;
                               3743                 :                :     int         lf;
                               3744                 :                :     bool        added;
                               3745                 :                :     char        path[MAXPGPATH];
                               3746                 :                :     uint64      offset;
                               3747                 :                : 
 1854 noah@leadboat.com        3748         [ +  + ]:           2208 :     if (!XLogCtl->InstallXLogFileSegmentActive)
                               3749                 :             10 :         return;                 /* unlocked check says no */
                               3750                 :                : 
 3232 andres@anarazel.de       3751                 :           2198 :     XLByteToPrevSeg(endptr, _logSegNo, wal_segment_size);
                               3752                 :           2198 :     offset = XLogSegmentOffset(endptr - 1, wal_segment_size);
                               3753         [ +  + ]:           2198 :     if (offset >= (uint32) (0.75 * wal_segment_size))
                               3754                 :                :     {
 5145 heikki.linnakangas@i     3755                 :            231 :         _logSegNo++;
 1724 rhaas@postgresql.org     3756                 :            231 :         lf = XLogFileInitInternal(_logSegNo, tli, &added, path);
 1854 noah@leadboat.com        3757         [ +  + ]:            231 :         if (lf >= 0)
                               3758                 :            165 :             close(lf);
                               3759         [ +  + ]:            231 :         if (added)
 6966 tgl@sss.pgh.pa.us        3760                 :             66 :             CheckpointStats.ckpt_segs_added++;
                               3761                 :                :     }
                               3762                 :                : }
                               3763                 :                : 
                               3764                 :                : /*
                               3765                 :                :  * Throws an error if the given log segment has already been removed or
                               3766                 :                :  * recycled. The caller should only pass a segment that it knows to have
                               3767                 :                :  * existed while the server has been running, as this function always
                               3768                 :                :  * succeeds if no WAL segments have been removed since startup.
                               3769                 :                :  * 'tli' is only used in the error message.
                               3770                 :                :  *
                               3771                 :                :  * Note: this function guarantees to keep errno unchanged on return.
                               3772                 :                :  * This supports callers that use this to possibly deliver a better
                               3773                 :                :  * error message about a missing file, while still being able to throw
                               3774                 :                :  * a normal file-access error afterwards, if this does return.
                               3775                 :                :  */
                               3776                 :                : void
 4952 heikki.linnakangas@i     3777                 :         131497 : CheckXLogRemoved(XLogSegNo segno, TimeLineID tli)
                               3778                 :                : {
 3156 tgl@sss.pgh.pa.us        3779                 :         131497 :     int         save_errno = errno;
                               3780                 :                :     XLogSegNo   lastRemovedSegNo;
                               3781                 :                : 
 4325 andres@anarazel.de       3782                 :         131497 :     SpinLockAcquire(&XLogCtl->info_lck);
                               3783                 :         131497 :     lastRemovedSegNo = XLogCtl->lastRemovedSegNo;
                               3784                 :         131497 :     SpinLockRelease(&XLogCtl->info_lck);
                               3785                 :                : 
 4952 heikki.linnakangas@i     3786         [ -  + ]:         131497 :     if (segno <= lastRemovedSegNo)
                               3787                 :                :     {
                               3788                 :                :         char        filename[MAXFNAMELEN];
                               3789                 :                : 
 3232 andres@anarazel.de       3790                 :UBC           0 :         XLogFileName(filename, tli, segno, wal_segment_size);
 3156 tgl@sss.pgh.pa.us        3791                 :              0 :         errno = save_errno;
 4952 heikki.linnakangas@i     3792         [ #  # ]:              0 :         ereport(ERROR,
                               3793                 :                :                 (errcode_for_file_access(),
                               3794                 :                :                  errmsg("requested WAL segment %s has already been removed",
                               3795                 :                :                         filename)));
                               3796                 :                :     }
 3156 tgl@sss.pgh.pa.us        3797                 :CBC      131497 :     errno = save_errno;
 5949 heikki.linnakangas@i     3798                 :         131497 : }
                               3799                 :                : 
                               3800                 :                : /*
                               3801                 :                :  * Return the last WAL segment removed, or 0 if no segment has been removed
                               3802                 :                :  * since startup.
                               3803                 :                :  *
                               3804                 :                :  * NB: the result can be out of date arbitrarily fast; the caller has to deal
                               3805                 :                :  * with that.
                               3806                 :                :  */
                               3807                 :                : XLogSegNo
 4528 rhaas@postgresql.org     3808                 :           1297 : XLogGetLastRemovedSegno(void)
                               3809                 :                : {
                               3810                 :                :     XLogSegNo   lastRemovedSegNo;
                               3811                 :                : 
 4325 andres@anarazel.de       3812                 :           1297 :     SpinLockAcquire(&XLogCtl->info_lck);
                               3813                 :           1297 :     lastRemovedSegNo = XLogCtl->lastRemovedSegNo;
                               3814                 :           1297 :     SpinLockRelease(&XLogCtl->info_lck);
                               3815                 :                : 
 4528 rhaas@postgresql.org     3816                 :           1297 :     return lastRemovedSegNo;
                               3817                 :                : }
                               3818                 :                : 
                               3819                 :                : /*
                               3820                 :                :  * Return the oldest WAL segment on the given TLI that still exists in
                               3821                 :                :  * XLOGDIR, or 0 if none.
                               3822                 :                :  */
                               3823                 :                : XLogSegNo
  949                          3824                 :             12 : XLogGetOldestSegno(TimeLineID tli)
                               3825                 :                : {
                               3826                 :                :     DIR        *xldir;
                               3827                 :                :     struct dirent *xlde;
                               3828                 :             12 :     XLogSegNo   oldest_segno = 0;
                               3829                 :                : 
                               3830                 :             12 :     xldir = AllocateDir(XLOGDIR);
                               3831         [ +  + ]:             86 :     while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
                               3832                 :                :     {
                               3833                 :                :         TimeLineID  file_tli;
                               3834                 :                :         XLogSegNo   file_segno;
                               3835                 :                : 
                               3836                 :                :         /* Ignore files that are not XLOG segments. */
                               3837         [ +  + ]:             74 :         if (!IsXLogFileName(xlde->d_name))
                               3838                 :             49 :             continue;
                               3839                 :                : 
                               3840                 :                :         /* Parse filename to get TLI and segno. */
                               3841                 :             25 :         XLogFromFileName(xlde->d_name, &file_tli, &file_segno,
                               3842                 :                :                          wal_segment_size);
                               3843                 :                : 
                               3844                 :                :         /* Ignore anything that's not from the TLI of interest. */
                               3845         [ -  + ]:             25 :         if (tli != file_tli)
  949 rhaas@postgresql.org     3846                 :UBC           0 :             continue;
                               3847                 :                : 
                               3848                 :                :         /* If it's the oldest so far, update oldest_segno. */
  949 rhaas@postgresql.org     3849   [ +  +  +  + ]:CBC          25 :         if (oldest_segno == 0 || file_segno < oldest_segno)
                               3850                 :             14 :             oldest_segno = file_segno;
                               3851                 :                :     }
                               3852                 :                : 
                               3853                 :             12 :     FreeDir(xldir);
                               3854                 :             12 :     return oldest_segno;
                               3855                 :                : }
                               3856                 :                : 
                               3857                 :                : /*
                               3858                 :                :  * Update the last removed segno pointer in shared memory, to reflect that the
                               3859                 :                :  * given XLOG file has been removed.
                               3860                 :                :  */
                               3861                 :                : static void
 5949 heikki.linnakangas@i     3862                 :           2666 : UpdateLastRemovedPtr(char *filename)
                               3863                 :                : {
                               3864                 :                :     uint32      tli;
                               3865                 :                :     XLogSegNo   segno;
                               3866                 :                : 
 3232 andres@anarazel.de       3867                 :           2666 :     XLogFromFileName(filename, &tli, &segno, wal_segment_size);
                               3868                 :                : 
 4325                          3869                 :           2666 :     SpinLockAcquire(&XLogCtl->info_lck);
                               3870         [ +  + ]:           2666 :     if (segno > XLogCtl->lastRemovedSegNo)
                               3871                 :           1201 :         XLogCtl->lastRemovedSegNo = segno;
                               3872                 :           2666 :     SpinLockRelease(&XLogCtl->info_lck);
 5949 heikki.linnakangas@i     3873                 :           2666 : }
                               3874                 :                : 
                               3875                 :                : /*
                               3876                 :                :  * Remove all temporary log files in pg_wal
                               3877                 :                :  *
                               3878                 :                :  * This is called at the beginning of recovery after a previous crash,
                               3879                 :                :  * at a point where no other processes write fresh WAL data.
                               3880                 :                :  */
                               3881                 :                : static void
 2935 michael@paquier.xyz      3882                 :            200 : RemoveTempXlogFiles(void)
                               3883                 :                : {
                               3884                 :                :     DIR        *xldir;
                               3885                 :                :     struct dirent *xlde;
                               3886                 :                : 
                               3887         [ +  + ]:            200 :     elog(DEBUG2, "removing all temporary WAL segments");
                               3888                 :                : 
                               3889                 :            200 :     xldir = AllocateDir(XLOGDIR);
                               3890         [ +  + ]:           1345 :     while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
                               3891                 :                :     {
                               3892                 :                :         char        path[MAXPGPATH];
                               3893                 :                : 
                               3894         [ +  - ]:           1145 :         if (strncmp(xlde->d_name, "xlogtemp.", 9) != 0)
                               3895                 :           1145 :             continue;
                               3896                 :                : 
 2935 michael@paquier.xyz      3897                 :UBC           0 :         snprintf(path, MAXPGPATH, XLOGDIR "/%s", xlde->d_name);
                               3898                 :              0 :         unlink(path);
                               3899         [ #  # ]:              0 :         elog(DEBUG2, "removed temporary WAL segment \"%s\"", path);
                               3900                 :                :     }
 2935 michael@paquier.xyz      3901                 :CBC         200 :     FreeDir(xldir);
                               3902                 :            200 : }
                               3903                 :                : 
                               3904                 :                : /*
                               3905                 :                :  * Recycle or remove all log files older than or equal to the passed segno.
                               3906                 :                :  *
                               3907                 :                :  * endptr is current (or recent) end of xlog, and lastredoptr is the
                               3908                 :                :  * redo pointer of the last checkpoint. These are used to determine
                               3909                 :                :  * whether we want to recycle rather than delete no-longer-wanted log files.
                               3910                 :                :  *
                               3911                 :                :  * insertTLI is the current timeline for XLOG insertion. Any recycled
                               3912                 :                :  * segments should be reused for this timeline.
                               3913                 :                :  */
                               3914                 :                : static void
 1724 rhaas@postgresql.org     3915                 :           1930 : RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr lastredoptr, XLogRecPtr endptr,
                               3916                 :                :                    TimeLineID insertTLI)
                               3917                 :                : {
                               3918                 :                :     DIR        *xldir;
                               3919                 :                :     struct dirent *xlde;
                               3920                 :                :     char        lastoff[MAXFNAMELEN];
                               3921                 :                :     XLogSegNo   endlogSegNo;
                               3922                 :                :     XLogSegNo   recycleSegNo;
                               3923                 :                : 
                               3924                 :                :     /* Initialize info about where to try to recycle to */
 2018 michael@paquier.xyz      3925                 :           1930 :     XLByteToSeg(endptr, endlogSegNo, wal_segment_size);
                               3926                 :           1930 :     recycleSegNo = XLOGfileslop(lastredoptr);
                               3927                 :                : 
                               3928                 :                :     /*
                               3929                 :                :      * Construct a filename of the last segment to be kept. The timeline ID
                               3930                 :                :      * doesn't matter, we ignore that in the comparison. (During recovery,
                               3931                 :                :      * InsertTimeLineID isn't set, so we can't use that.)
                               3932                 :                :      */
 3232 andres@anarazel.de       3933                 :           1930 :     XLogFileName(lastoff, 0, segno, wal_segment_size);
                               3934                 :                : 
 5809 simon@2ndQuadrant.co     3935         [ +  + ]:           1930 :     elog(DEBUG2, "attempting to remove WAL segments older than log file %s",
                               3936                 :                :          lastoff);
                               3937                 :                : 
 3156 tgl@sss.pgh.pa.us        3938                 :           1930 :     xldir = AllocateDir(XLOGDIR);
                               3939                 :                : 
 7692                          3940         [ +  + ]:          52545 :     while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
                               3941                 :                :     {
                               3942                 :                :         /* Ignore files that are not XLOG segments */
 4097 heikki.linnakangas@i     3943         [ +  + ]:          50615 :         if (!IsXLogFileName(xlde->d_name) &&
                               3944         [ +  + ]:           8153 :             !IsPartialXLogFileName(xlde->d_name))
 4122                          3945                 :           8151 :             continue;
                               3946                 :                : 
                               3947                 :                :         /*
                               3948                 :                :          * We ignore the timeline part of the XLOG segment identifiers in
                               3949                 :                :          * deciding whether a segment is still needed.  This ensures that we
                               3950                 :                :          * won't prematurely remove a segment from a parent timeline. We could
                               3951                 :                :          * probably be a little more proactive about removing segments of
                               3952                 :                :          * non-parent timelines, but that would be a whole lot more
                               3953                 :                :          * complicated.
                               3954                 :                :          *
                               3955                 :                :          * We use the alphanumeric sorting property of the filenames to decide
                               3956                 :                :          * which ones are earlier than the lastoff segment.
                               3957                 :                :          */
                               3958         [ +  + ]:          42464 :         if (strcmp(xlde->d_name + 8, lastoff + 8) <= 0)
                               3959                 :                :         {
 4909                          3960         [ +  + ]:          34927 :             if (XLogArchiveCheckDone(xlde->d_name))
                               3961                 :                :             {
                               3962                 :                :                 /* Update the last removed location in shared memory first */
 5949                          3963                 :           2666 :                 UpdateLastRemovedPtr(xlde->d_name);
                               3964                 :                : 
 1423 michael@paquier.xyz      3965                 :           2666 :                 RemoveXlogFile(xlde, recycleSegNo, &endlogSegNo, insertTLI);
                               3966                 :                :             }
                               3967                 :                :         }
                               3968                 :                :     }
                               3969                 :                : 
 4122 heikki.linnakangas@i     3970                 :           1930 :     FreeDir(xldir);
                               3971                 :           1930 : }
                               3972                 :                : 
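The timeline-agnostic comparison used by RemoveOldXlogFiles can be sketched in isolation. This is a minimal illustration, not the PostgreSQL implementation: `make_wal_name` and `segment_at_or_before` are hypothetical helpers, and the sketch assumes 16 MB segments (256 segments per "log" number). It relies only on the fact that a WAL file name is 24 hex digits whose first 8 encode the timeline ID.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define WAL_NAME_LEN    24      /* 8 hex TLI + 8 hex "log" + 8 hex "seg" */
#define SEGS_PER_XLOGID 256     /* assumption: 16 MB WAL segments */

/* Hypothetical helper: format a WAL segment file name into buf. */
static void
make_wal_name(char *buf, size_t buflen, uint32_t tli, uint64_t segno)
{
    assert(buflen > WAL_NAME_LEN);
    snprintf(buf, buflen, "%08X%08X%08X",
             (unsigned int) tli,
             (unsigned int) (segno / SEGS_PER_XLOGID),
             (unsigned int) (segno % SEGS_PER_XLOGID));
}

/*
 * Hypothetical helper: true if "name" is at or before the cutoff segment
 * "lastoff". Skipping the first 8 characters ignores the timeline ID, so
 * only the segment position takes part in the comparison.
 */
static int
segment_at_or_before(const char *name, const char *lastoff)
{
    assert(strlen(name) >= WAL_NAME_LEN && strlen(lastoff) >= WAL_NAME_LEN);
    return strcmp(name + 8, lastoff + 8) <= 0;
}
```

A segment that passes this test is only a candidate: in the function above it must also pass XLogArchiveCheckDone before it is recycled or removed.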
                               3973                 :                : /*
                               3974                 :                :  * Recycle or remove WAL files that are not part of the given timeline's
                               3975                 :                :  * history.
                               3976                 :                :  *
                               3977                 :                :  * This is called during recovery, whenever we switch to follow a new
                               3978                 :                :  * timeline, and at the end of recovery when we create a new timeline. We
                               3979                 :                :  * wouldn't otherwise care about extra WAL files lying in pg_wal, but they
                               3980                 :                :  * might be leftover pre-allocated or recycled WAL segments on the old timeline
                               3981                 :                :  * that we haven't used yet, and contain garbage. If we just leave them in
                               3982                 :                :  * pg_wal, they will eventually be archived, and we can't let that happen.
                               3983                 :                :  * Files that belong to our timeline history are valid, because we have
                               3984                 :                :  * successfully replayed them, but from others we can't be sure.
                               3985                 :                :  *
                               3986                 :                :  * 'switchpoint' is the current point in WAL where we switch to new timeline,
                               3987                 :                :  * and 'newTLI' is the new timeline we switch to.
                               3988                 :                :  */
                               3989                 :                : void
                               3990                 :             73 : RemoveNonParentXlogFiles(XLogRecPtr switchpoint, TimeLineID newTLI)
                               3991                 :                : {
                               3992                 :                :     DIR        *xldir;
                               3993                 :                :     struct dirent *xlde;
                               3994                 :                :     char        switchseg[MAXFNAMELEN];
                               3995                 :                :     XLogSegNo   endLogSegNo;
                               3996                 :                :     XLogSegNo   switchLogSegNo;
                               3997                 :                :     XLogSegNo   recycleSegNo;
                               3998                 :                : 
                               3999                 :                :     /*
                               4000                 :                :      * Initialize info about where to begin the work.  This will recycle,
                               4001                 :                :      * somewhat arbitrarily, 10 future segments.
                               4002                 :                :      */
 2018 michael@paquier.xyz      4003                 :             73 :     XLByteToPrevSeg(switchpoint, switchLogSegNo, wal_segment_size);
                               4004                 :             73 :     XLByteToSeg(switchpoint, endLogSegNo, wal_segment_size);
                               4005                 :             73 :     recycleSegNo = endLogSegNo + 10;
                               4006                 :                : 
                               4007                 :                :     /*
                               4008                 :                :      * Construct a filename of the last segment to be kept.
                               4009                 :                :      */
                               4010                 :             73 :     XLogFileName(switchseg, newTLI, switchLogSegNo, wal_segment_size);
                               4011                 :                : 
 4122 heikki.linnakangas@i     4012         [ +  + ]:             73 :     elog(DEBUG2, "attempting to remove WAL segments newer than log file %s",
                               4013                 :                :          switchseg);
                               4014                 :                : 
 3156 tgl@sss.pgh.pa.us        4015                 :             73 :     xldir = AllocateDir(XLOGDIR);
                               4016                 :                : 
 4122 heikki.linnakangas@i     4017         [ +  + ]:            692 :     while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
                               4018                 :                :     {
                               4019                 :                :         /* Ignore files that are not XLOG segments */
 4097                          4020         [ +  + ]:            619 :         if (!IsXLogFileName(xlde->d_name))
 4122                          4021                 :            384 :             continue;
                               4022                 :                : 
                               4023                 :                :         /*
                               4024                 :                :          * Remove files that are on a timeline older than the new one we're
                               4025                 :                :          * switching to, but with a segment number >= the first segment on the
                               4026                 :                :          * new timeline.
                               4027                 :                :          */
                               4028         [ +  + ]:            235 :         if (strncmp(xlde->d_name, switchseg, 8) < 0 &&
                               4029         [ +  + ]:            154 :             strcmp(xlde->d_name + 8, switchseg + 8) > 0)
                               4030                 :                :         {
                               4031                 :                :             /*
                               4032                 :                :              * If the file has already been marked as .ready, however, don't
                               4033                 :                :              * remove it yet. It should be OK to remove it - files that are
                               4034                 :                :              * not part of our timeline history are not required for recovery
                               4035                 :                :              * - but it seems safer to let them be archived and removed later.
                               4036                 :                :              */
                               4037         [ +  - ]:             17 :             if (!XLogArchiveIsReady(xlde->d_name))
 1423 michael@paquier.xyz      4038                 :             17 :                 RemoveXlogFile(xlde, recycleSegNo, &endLogSegNo, newTLI);
                               4039                 :                :         }
                               4040                 :                :     }
                               4041                 :                : 
 4122 heikki.linnakangas@i     4042                 :             73 :     FreeDir(xldir);
                               4043                 :             73 : }
                               4044                 :                : 
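The selection rule in RemoveNonParentXlogFiles combines two string comparisons on the file name. The sketch below restates that predicate on its own; `is_non_parent_leftover` is a hypothetical name, and the inputs are assumed to be well-formed 24-character WAL segment names.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: true if "name" lies on an older timeline than
 * "switchseg" (first 8 hex digits compare lower) and at a later segment
 * position (remaining digits compare higher).
 */
static int
is_non_parent_leftover(const char *name, const char *switchseg)
{
    assert(strlen(name) >= 24 && strlen(switchseg) >= 24);
    return strncmp(name, switchseg, 8) < 0 &&
           strcmp(name + 8, switchseg + 8) > 0;
}
```

With a switch segment of `000000020000000000000004`, the name `000000010000000000000005` is selected, while `000000010000000000000004` and `000000020000000000000006` are not.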
                               4045                 :                : /*
                               4046                 :                :  * Recycle or remove a log file that's no longer needed.
                               4047                 :                :  *
                               4048                 :                :  * segment_de is the dirent structure of the segment to recycle or remove.
                               4049                 :                :  * recycleSegNo is the segment number to recycle up to.  endlogSegNo is
                               4050                 :                :  * the segment number of the current (or recent) end of WAL.
                               4051                 :                :  *
                               4052                 :                :  * endlogSegNo gets incremented if the segment is recycled, so that it is not
                               4053                 :                :  * checked again by future calls of this function.
                               4054                 :                :  *
                               4055                 :                :  * insertTLI is the current timeline for XLOG insertion. Any recycled segments
                               4056                 :                :  * should be used for this timeline.
                               4057                 :                :  */
                               4058                 :                : static void
 1423 michael@paquier.xyz      4059                 :           2683 : RemoveXlogFile(const struct dirent *segment_de,
                               4060                 :                :                XLogSegNo recycleSegNo, XLogSegNo *endlogSegNo,
                               4061                 :                :                TimeLineID insertTLI)
                               4062                 :                : {
                               4063                 :                :     char        path[MAXPGPATH];
                               4064                 :                : #ifdef WIN32
                               4065                 :                :     char        newpath[MAXPGPATH];
                               4066                 :                : #endif
                               4067                 :           2683 :     const char *segname = segment_de->d_name;
                               4068                 :                : 
 4122 heikki.linnakangas@i     4069                 :           2683 :     snprintf(path, MAXPGPATH, XLOGDIR "/%s", segname);
                               4070                 :                : 
                               4071                 :                :     /*
                               4072                 :                :      * Before deleting the file, see if it can be recycled as a future log
                               4073                 :                :      * segment. Only recycle normal files, because we don't want to recycle
                               4074                 :                :      * symbolic links pointing to a separate archive directory.
                               4075                 :                :      */
 2672 tmunro@postgresql.or     4076         [ +  - ]:           2683 :     if (wal_recycle &&
 2018 michael@paquier.xyz      4077         [ +  + ]:           2683 :         *endlogSegNo <= recycleSegNo &&
 1854 noah@leadboat.com        4078   [ +  +  +  - ]:           3965 :         XLogCtl->InstallXLogFileSegmentActive && /* callee rechecks this */
 1423 michael@paquier.xyz      4079         [ +  + ]:           3638 :         get_dirent_type(path, segment_de, false, DEBUG2) == PGFILETYPE_REG &&
 2018                          4080                 :           1819 :         InstallXLogFileSegment(endlogSegNo, path,
                               4081                 :                :                                true, recycleSegNo, insertTLI))
                               4082                 :                :     {
 4122 heikki.linnakangas@i     4083         [ +  + ]:           1557 :         ereport(DEBUG2,
                               4084                 :                :                 (errmsg_internal("recycled write-ahead log file \"%s\"",
                               4085                 :                :                                  segname)));
                               4086                 :           1557 :         CheckpointStats.ckpt_segs_recycled++;
                               4087                 :                :         /* Needn't recheck that slot on future iterations */
 2018 michael@paquier.xyz      4088                 :           1557 :         (*endlogSegNo)++;
                               4089                 :                :     }
                               4090                 :                :     else
                               4091                 :                :     {
                               4092                 :                :         /* No need for any more future segments, or recycling failed ... */
                               4093                 :                :         int         rc;
                               4094                 :                : 
 4122 heikki.linnakangas@i     4095         [ +  + ]:           1126 :         ereport(DEBUG2,
                               4096                 :                :                 (errmsg_internal("removing write-ahead log file \"%s\"",
                               4097                 :                :                                  segname)));
                               4098                 :                : 
                               4099                 :                : #ifdef WIN32
                               4100                 :                : 
                               4101                 :                :         /*
                               4102                 :                :          * On Windows, if another process (e.g. another backend) holds the file
                               4103                 :                :          * open in FILE_SHARE_DELETE mode, unlink will succeed, but the file
                               4104                 :                :          * will still show up in directory listing until the last handle is
                               4105                 :                :          * closed. To avoid confusing the lingering deleted file for a live
                               4106                 :                :          * WAL file that needs to be archived, rename it before deleting it.
                               4107                 :                :          *
                               4108                 :                :          * If another process holds the file open without FILE_SHARE_DELETE
                               4109                 :                :          * flag, rename will fail. We'll try again at the next checkpoint.
                               4110                 :                :          */
                               4111                 :                :         snprintf(newpath, MAXPGPATH, "%s.deleted", path);
                               4112                 :                :         if (rename(path, newpath) != 0)
                               4113                 :                :         {
                               4114                 :                :             ereport(LOG,
                               4115                 :                :                     (errcode_for_file_access(),
                               4116                 :                :                      errmsg("could not rename file \"%s\": %m",
                               4117                 :                :                             path)));
                               4118                 :                :             return;
                               4119                 :                :         }
                               4120                 :                :         rc = durable_unlink(newpath, LOG);
                               4121                 :                : #else
 3408 teodor@sigaev.ru         4122                 :           1126 :         rc = durable_unlink(path, LOG);
                               4123                 :                : #endif
 4122 heikki.linnakangas@i     4124         [ -  + ]:           1126 :         if (rc != 0)
                               4125                 :                :         {
                               4126                 :                :             /* Message already logged by durable_unlink() */
 4122 heikki.linnakangas@i     4127                 :UBC           0 :             return;
                               4128                 :                :         }
 4122 heikki.linnakangas@i     4129                 :CBC        1126 :         CheckpointStats.ckpt_segs_removed++;
                               4130                 :                :     }
                               4131                 :                : 
                               4132                 :           2683 :     XLogArchiveCleanup(segname);
                               4133                 :                : }
                               4134                 :                : 
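The effect of incrementing endlogSegNo on each successful recycle can be modelled with a small counter loop. This is a deliberately simplified sketch: `CleanupPlan` and `plan_cleanup` are hypothetical, and the model assumes that recycling is enabled, every file is a regular file, and every install succeeds in the next slot, none of which is guaranteed in RemoveXlogFile itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: how many files end up recycled vs. removed. */
typedef struct
{
    int recycled;
    int removed;
} CleanupPlan;

/*
 * Hypothetical model: walk nfiles obsolete segments. While the next free
 * slot (endseg) is still within the recycle horizon (recycle_to), a file
 * is recycled and the slot advances; afterwards files are removed.
 */
static CleanupPlan
plan_cleanup(int nfiles, uint64_t endseg, uint64_t recycle_to)
{
    CleanupPlan plan = {0, 0};

    assert(nfiles >= 0);
    for (int i = 0; i < nfiles; i++)
    {
        if (endseg <= recycle_to)
        {
            plan.recycled++;
            endseg++;           /* that slot is now taken */
        }
        else
            plan.removed++;
    }
    return plan;
}
```

For five obsolete files with endseg 10 and recycle_to 12, the model recycles three files (slots 10, 11 and 12) and removes the remaining two.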
                               4135                 :                : /*
                               4136                 :                :  * Verify whether pg_wal, pg_wal/archive_status, and pg_wal/summaries exist.
                               4137                 :                :  * If the latter two do not exist, recreate them.
                               4138                 :                :  *
                               4139                 :                :  * It is not the goal of this function to verify the contents of these
                               4140                 :                :  * directories, but to help in cases where someone has performed a cluster
                               4141                 :                :  * copy for PITR purposes but omitted pg_wal from the copy.
                               4142                 :                :  *
                               4143                 :                :  * We could also recreate pg_wal if it doesn't exist, but a deliberate
                               4144                 :                :  * policy decision was made not to.  It is fairly common for pg_wal to be
                               4145                 :                :  * a symlink, and if that was the DBA's intent then automatically making a
                               4146                 :                :  * plain directory would result in degraded performance with no notice.
                               4147                 :                :  */
                               4148                 :                : static void
 6468 tgl@sss.pgh.pa.us        4149                 :           1065 : ValidateXLOGDirectoryStructure(void)
                               4150                 :                : {
                               4151                 :                :     char        path[MAXPGPATH];
                               4152                 :                :     struct stat stat_buf;
                               4153                 :                : 
                               4154                 :                :     /* Check for pg_wal; if it doesn't exist, error out */
                               4155         [ +  - ]:           1065 :     if (stat(XLOGDIR, &stat_buf) != 0 ||
                               4156         [ -  + ]:           1065 :         !S_ISDIR(stat_buf.st_mode))
 6254 bruce@momjian.us         4157         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4158                 :                :                 (errcode_for_file_access(),
                               4159                 :                :                  errmsg("required WAL directory \"%s\" does not exist",
                               4160                 :                :                         XLOGDIR)));
                               4161                 :                : 
                               4162                 :                :     /* Check for archive_status */
 6468 tgl@sss.pgh.pa.us        4163                 :CBC        1065 :     snprintf(path, MAXPGPATH, XLOGDIR "/archive_status");
                               4164         [ +  + ]:           1065 :     if (stat(path, &stat_buf) == 0)
                               4165                 :                :     {
                               4166                 :                :         /* Check for weird cases where it exists but isn't a directory */
                               4167         [ -  + ]:           1063 :         if (!S_ISDIR(stat_buf.st_mode))
 6254 bruce@momjian.us         4168         [ #  # ]:UBC           0 :             ereport(FATAL,
                               4169                 :                :                     (errcode_for_file_access(),
                               4170                 :                :                      errmsg("required WAL directory \"%s\" does not exist",
                               4171                 :                :                             path)));
                               4172                 :                :     }
                               4173                 :                :     else
                               4174                 :                :     {
 6468 tgl@sss.pgh.pa.us        4175         [ +  - ]:CBC           2 :         ereport(LOG,
                               4176                 :                :                 (errmsg("creating missing WAL directory \"%s\"", path)));
 3032 sfrost@snowman.net       4177         [ -  + ]:              2 :         if (MakePGDirectory(path) < 0)
 6254 bruce@momjian.us         4178         [ #  # ]:UBC           0 :             ereport(FATAL,
                               4179                 :                :                     (errcode_for_file_access(),
                               4180                 :                :                      errmsg("could not create missing directory \"%s\": %m",
                               4181                 :                :                             path)));
                               4182                 :                :     }
                               4183                 :                : 
                               4184                 :                :     /* Check for summaries */
  949 rhaas@postgresql.org     4185                 :CBC        1065 :     snprintf(path, MAXPGPATH, XLOGDIR "/summaries");
                               4186         [ +  + ]:           1065 :     if (stat(path, &stat_buf) == 0)
                               4187                 :                :     {
                               4188                 :                :         /* Check for weird cases where it exists but isn't a directory */
                               4189         [ -  + ]:           1063 :         if (!S_ISDIR(stat_buf.st_mode))
  949 rhaas@postgresql.org     4190         [ #  # ]:UBC           0 :             ereport(FATAL,
                               4191                 :                :                     (errmsg("required WAL directory \"%s\" does not exist",
                               4192                 :                :                             path)));
                               4193                 :                :     }
                               4194                 :                :     else
                               4195                 :                :     {
  949 rhaas@postgresql.org     4196         [ +  - ]:CBC           2 :         ereport(LOG,
                               4197                 :                :                 (errmsg("creating missing WAL directory \"%s\"", path)));
                               4198         [ -  + ]:              2 :         if (MakePGDirectory(path) < 0)
  949 rhaas@postgresql.org     4199         [ #  # ]:UBC           0 :             ereport(FATAL,
                               4200                 :                :                     (errmsg("could not create missing directory \"%s\": %m",
                               4201                 :                :                             path)));
                               4202                 :                :     }
 6468 tgl@sss.pgh.pa.us        4203                 :CBC        1065 : }
                               4204                 :                : 
                               4205                 :                : /*
                               4206                 :                :  * Remove previous backup history files.  This also retries creation of
                               4207                 :                :  * .ready files for any backup history files for which XLogArchiveNotify
                               4208                 :                :  * failed earlier.
                               4209                 :                :  */
                               4210                 :                : static void
 7339                          4211                 :            170 : CleanupBackupHistory(void)
                               4212                 :                : {
                               4213                 :                :     DIR        *xldir;
                               4214                 :                :     struct dirent *xlde;
                               4215                 :                :     char        path[MAXPGPATH + sizeof(XLOGDIR)];
                               4216                 :                : 
 7692                          4217                 :            170 :     xldir = AllocateDir(XLOGDIR);
                               4218                 :                : 
                               4219         [ +  + ]:           1739 :     while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
                               4220                 :                :     {
 4097 heikki.linnakangas@i     4221         [ +  + ]:           1399 :         if (IsBackupHistoryFileName(xlde->d_name))
                               4222                 :                :         {
 6530 tgl@sss.pgh.pa.us        4223         [ +  + ]:            180 :             if (XLogArchiveCheckDone(xlde->d_name))
                               4224                 :                :             {
 3362 peter_e@gmx.net          4225         [ +  + ]:            143 :                 elog(DEBUG2, "removing WAL backup history file \"%s\"",
                               4226                 :                :                      xlde->d_name);
 3393                          4227                 :            143 :                 snprintf(path, sizeof(path), XLOGDIR "/%s", xlde->d_name);
 7711 bruce@momjian.us         4228                 :            143 :                 unlink(path);
                               4229                 :            143 :                 XLogArchiveCleanup(xlde->d_name);
                               4230                 :                :             }
                               4231                 :                :         }
                               4232                 :                :     }
                               4233                 :                : 
                               4234                 :            170 :     FreeDir(xldir);
                               4235                 :            170 : }
                               4236                 :                : 
                               4237                 :                : /*
                               4238                 :                :  * I/O routines for pg_control
                               4239                 :                :  *
                               4240                 :                :  * *ControlFile is a buffer in shared memory that holds an image of the
                               4241                 :                :  * contents of pg_control.  WriteControlFile() initializes pg_control
                               4242                 :                :  * given a preloaded buffer, ReadControlFile() loads the buffer from
                               4243                 :                :  * the pg_control file (during postmaster or standalone-backend startup),
                               4244                 :                :  * and UpdateControlFile() rewrites pg_control after we modify xlog state.
                               4245                 :                :  * InitControlFile() fills the buffer with initial values.
                               4246                 :                :  *
                               4247                 :                :  * For simplicity, WriteControlFile() initializes the fields of pg_control
                               4248                 :                :  * that are related to checking backend/database compatibility, and
                               4249                 :                :  * ReadControlFile() verifies they are correct.  We could split out the
                               4250                 :                :  * I/O and compatibility-check functions, but there seems no need currently.
                               4251                 :                :  */
                               4252                 :                : 
                               4253                 :                : static void
  733 peter@eisentraut.org     4254                 :             56 : InitControlFile(uint64 sysidentifier, uint32 data_checksum_version)
                               4255                 :                : {
                               4256                 :                :     char        mock_auth_nonce[MOCK_AUTH_NONCE_LEN];
                               4257                 :                : 
                               4258                 :                :     /*
                               4259                 :                :      * Generate a random nonce. This is used for authentication requests that
                               4260                 :                :      * will fail because the user does not exist. The nonce is used to create
                               4261                 :                :      * a genuine-looking password challenge for the non-existent user, in lieu
                               4262                 :                :      * of an actual stored password.
                               4263                 :                :      */
 1621 heikki.linnakangas@i     4264         [ -  + ]:             56 :     if (!pg_strong_random(mock_auth_nonce, MOCK_AUTH_NONCE_LEN))
 1621 heikki.linnakangas@i     4265         [ #  # ]:UBC           0 :         ereport(PANIC,
                               4266                 :                :                 (errcode(ERRCODE_INTERNAL_ERROR),
                               4267                 :                :                  errmsg("could not generate secret authorization token")));
                               4268                 :                : 
 1621 heikki.linnakangas@i     4269                 :CBC          56 :     memset(ControlFile, 0, sizeof(ControlFileData));
                               4270                 :                :     /* Initialize pg_control status fields */
                               4271                 :             56 :     ControlFile->system_identifier = sysidentifier;
                               4272                 :             56 :     memcpy(ControlFile->mock_authentication_nonce, mock_auth_nonce, MOCK_AUTH_NONCE_LEN);
                               4273                 :             56 :     ControlFile->state = DB_SHUTDOWNED;
                               4274                 :             56 :     ControlFile->unloggedLSN = FirstNormalUnloggedLSN;
                               4275                 :                : 
                               4276                 :                :     /* Set important parameter values for use when replaying WAL */
 2351 peter@eisentraut.org     4277                 :             56 :     ControlFile->MaxConnections = MaxConnections;
                               4278                 :             56 :     ControlFile->max_worker_processes = max_worker_processes;
                               4279                 :             56 :     ControlFile->max_wal_senders = max_wal_senders;
                               4280                 :             56 :     ControlFile->max_prepared_xacts = max_prepared_xacts;
                               4281                 :             56 :     ControlFile->max_locks_per_xact = max_locks_per_xact;
                               4282                 :             56 :     ControlFile->wal_level = wal_level;
                               4283                 :             56 :     ControlFile->wal_log_hints = wal_log_hints;
                               4284                 :             56 :     ControlFile->track_commit_timestamp = track_commit_timestamp;
  733                          4285                 :             56 :     ControlFile->data_checksum_version = data_checksum_version;
                               4286                 :                : 
                               4287                 :                :     /*
                               4288                 :                :      * Set the data_checksum_version value into XLogCtl, which is where all
                               4289                 :                :      * processes get the current value from.
                               4290                 :                :      */
  114 dgustafsson@postgres     4291                 :             56 :     XLogCtl->data_checksum_version = data_checksum_version;
 2351 peter@eisentraut.org     4292                 :             56 : }
                               4293                 :                : 
                               4294                 :                : static void
 9374 tgl@sss.pgh.pa.us        4295                 :             56 : WriteControlFile(void)
                               4296                 :                : {
                               4297                 :                :     int         fd;
                               4298                 :                :     char        buffer[PG_CONTROL_FILE_SIZE];   /* need not be aligned */
                               4299                 :                : 
                               4300                 :                :     /*
                               4301                 :                :      * Initialize version and compatibility-check fields
                               4302                 :                :      */
 9266                          4303                 :             56 :     ControlFile->pg_control_version = PG_CONTROL_VERSION;
                               4304                 :             56 :     ControlFile->catalog_version_no = CATALOG_VERSION_NO;
                               4305                 :                : 
 7601                          4306                 :             56 :     ControlFile->maxAlign = MAXIMUM_ALIGNOF;
                               4307                 :             56 :     ControlFile->floatFormat = FLOATFORMAT_VALUE;
                               4308                 :                : 
 9374                          4309                 :             56 :     ControlFile->blcksz = BLCKSZ;
                               4310                 :             56 :     ControlFile->relseg_size = RELSEG_SIZE;
  258 heikki.linnakangas@i     4311                 :             56 :     ControlFile->slru_pages_per_segment = SLRU_PAGES_PER_SEGMENT;
 7419 tgl@sss.pgh.pa.us        4312                 :             56 :     ControlFile->xlog_blcksz = XLOG_BLCKSZ;
 3232 andres@anarazel.de       4313                 :             56 :     ControlFile->xlog_seg_size = wal_segment_size;
                               4314                 :                : 
 8862 lockhart@fourpalms.o     4315                 :             56 :     ControlFile->nameDataLen = NAMEDATALEN;
 7789 tgl@sss.pgh.pa.us        4316                 :             56 :     ControlFile->indexMaxKeys = INDEX_MAX_KEYS;
                               4317                 :                : 
 7054                          4318                 :             56 :     ControlFile->toast_max_chunk_size = TOAST_MAX_CHUNK_SIZE;
 4434                          4319                 :             56 :     ControlFile->loblksize = LOBLKSIZE;
                               4320                 :                : 
  347                          4321                 :             56 :     ControlFile->float8ByVal = true; /* vestigial */
                               4322                 :                : 
                               4323                 :                :     /*
                               4324                 :                :      * Initialize the default 'char' signedness.
                               4325                 :                :      *
                               4326                 :                :      * The signedness of the char type is implementation-defined. For instance
                               4327                 :                :      * on x86 architecture CPUs, the char data type is typically treated as
                               4328                 :                :      * signed by default, whereas on aarch architecture CPUs, it is typically
                               4329                 :                :      * treated as unsigned by default. In v17 or earlier, we accidentally let
                               4330                 :                :      * C implementation signedness affect persistent data. This led to
                               4331                 :                :      * inconsistent results when comparing char data across different
                               4332                 :                :      * platforms.
                               4333                 :                :      *
                               4334                 :                :      * This flag can be used as a hint to ensure consistent behavior for
                               4335                 :                :      * pre-v18 data files that store data sorted by the 'char' type on disk,
                               4336                 :                :      * especially in cross-platform replication scenarios.
                               4337                 :                :      *
                               4338                 :                :      * Newly created database clusters unconditionally set the default char
                               4339                 :                :      * signedness to true. pg_upgrade changes this flag for clusters that were
                               4340                 :                :      * initialized on signedness=false platforms. As a result,
                               4341                 :                :      * signedness=false setting will become rare over time. If we had known
                               4342                 :                :      * about this problem during the last development cycle that forced initdb
                               4343                 :                :      * (v8.3), we would have made all clusters signed or all clusters
                               4344                 :                :      * unsigned. Making pg_upgrade the only source of signedness=false will
                               4345                 :                :      * cause the population of database clusters to converge toward that
                               4346                 :                :      * retrospective ideal.
                               4347                 :                :      */
  520 msawada@postgresql.o     4348                 :             56 :     ControlFile->default_char_signedness = true;
                               4349                 :                : 
                               4350                 :                :     /* Contents are protected with a CRC */
 4282 heikki.linnakangas@i     4351                 :             56 :     INIT_CRC32C(ControlFile->crc);
                               4352                 :             56 :     COMP_CRC32C(ControlFile->crc,
                               4353                 :                :                 ControlFile,
                               4354                 :                :                 offsetof(ControlFileData, crc));
                               4355                 :             56 :     FIN_CRC32C(ControlFile->crc);
                               4356                 :                : 
                               4357                 :                :     /*
                               4358                 :                :      * We write out PG_CONTROL_FILE_SIZE bytes into pg_control, zero-padding
                               4359                 :                :      * the excess over sizeof(ControlFileData).  This reduces the odds of
                               4360                 :                :      * premature-EOF errors when reading pg_control.  We'll still fail when we
                               4361                 :                :      * check the contents of the file, but hopefully with a more specific
                               4362                 :                :      * error than "couldn't read pg_control".
                               4363                 :                :      */
 3294 tgl@sss.pgh.pa.us        4364                 :             56 :     memset(buffer, 0, PG_CONTROL_FILE_SIZE);
 9374                          4365                 :             56 :     memcpy(buffer, ControlFile, sizeof(ControlFileData));
                               4366                 :                : 
 7692                          4367                 :             56 :     fd = BasicOpenFile(XLOG_CONTROL_FILE,
                               4368                 :                :                        O_RDWR | O_CREAT | O_EXCL | PG_BINARY);
 9374                          4369         [ -  + ]:             56 :     if (fd < 0)
 8406 tgl@sss.pgh.pa.us        4370         [ #  # ]:UBC           0 :         ereport(PANIC,
                               4371                 :                :                 (errcode_for_file_access(),
                               4372                 :                :                  errmsg("could not create file \"%s\": %m",
                               4373                 :                :                         XLOG_CONTROL_FILE)));
                               4374                 :                : 
 9181 tgl@sss.pgh.pa.us        4375                 :CBC          56 :     errno = 0;
 3417 rhaas@postgresql.org     4376                 :             56 :     pgstat_report_wait_start(WAIT_EVENT_CONTROL_FILE_WRITE);
 3294 tgl@sss.pgh.pa.us        4377         [ -  + ]:             56 :     if (write(fd, buffer, PG_CONTROL_FILE_SIZE) != PG_CONTROL_FILE_SIZE)
                               4378                 :                :     {
                               4379                 :                :         /* if write didn't set errno, assume problem is no disk space */
 9181 tgl@sss.pgh.pa.us        4380         [ #  # ]:UBC           0 :         if (errno == 0)
                               4381                 :              0 :             errno = ENOSPC;
 8406                          4382         [ #  # ]:              0 :         ereport(PANIC,
                               4383                 :                :                 (errcode_for_file_access(),
                               4384                 :                :                  errmsg("could not write to file \"%s\": %m",
                               4385                 :                :                         XLOG_CONTROL_FILE)));
                               4386                 :                :     }
 3417 rhaas@postgresql.org     4387                 :CBC          56 :     pgstat_report_wait_end();
                               4388                 :                : 
                               4389                 :             56 :     pgstat_report_wait_start(WAIT_EVENT_CONTROL_FILE_SYNC);
 9361 tgl@sss.pgh.pa.us        4390         [ -  + ]:             56 :     if (pg_fsync(fd) != 0)
 8406 tgl@sss.pgh.pa.us        4391         [ #  # ]:UBC           0 :         ereport(PANIC,
                               4392                 :                :                 (errcode_for_file_access(),
                               4393                 :                :                  errmsg("could not fsync file \"%s\": %m",
                               4394                 :                :                         XLOG_CONTROL_FILE)));
 3417 rhaas@postgresql.org     4395                 :CBC          56 :     pgstat_report_wait_end();
                               4396                 :                : 
 2577 peter@eisentraut.org     4397         [ -  + ]:             56 :     if (close(fd) != 0)
 8217 tgl@sss.pgh.pa.us        4398         [ #  # ]:UBC           0 :         ereport(PANIC,
                               4399                 :                :                 (errcode_for_file_access(),
                               4400                 :                :                  errmsg("could not close file \"%s\": %m",
                               4401                 :                :                         XLOG_CONTROL_FILE)));
 9374 tgl@sss.pgh.pa.us        4402                 :CBC          56 : }
                               4403                 :                : 
                               4404                 :                : static void
                               4405                 :           1127 : ReadControlFile(void)
                               4406                 :                : {
                               4407                 :                :     pg_crc32c   crc;
                               4408                 :                :     int         fd;
                               4409                 :                :     char        wal_segsz_str[20];
                               4410                 :                :     ssize_t     r;
                               4411                 :                : 
                               4412                 :                :     /*
                               4413                 :                :      * Read data...
                               4414                 :                :      */
 7692                          4415                 :           1127 :     fd = BasicOpenFile(XLOG_CONTROL_FILE,
                               4416                 :                :                        O_RDWR | PG_BINARY);
 9374                          4417         [ -  + ]:           1127 :     if (fd < 0)
 8406 tgl@sss.pgh.pa.us        4418         [ #  # ]:UBC           0 :         ereport(PANIC,
                               4419                 :                :                 (errcode_for_file_access(),
                               4420                 :                :                  errmsg("could not open file \"%s\": %m",
                               4421                 :                :                         XLOG_CONTROL_FILE)));
                               4422                 :                : 
 3417 rhaas@postgresql.org     4423                 :CBC        1127 :     pgstat_report_wait_start(WAIT_EVENT_CONTROL_FILE_READ);
 2991 magnus@hagander.net      4424                 :           1127 :     r = read(fd, ControlFile, sizeof(ControlFileData));
                               4425         [ -  + ]:           1127 :     if (r != sizeof(ControlFileData))
                               4426                 :                :     {
 2991 magnus@hagander.net      4427         [ #  # ]:UBC           0 :         if (r < 0)
                               4428         [ #  # ]:              0 :             ereport(PANIC,
                               4429                 :                :                     (errcode_for_file_access(),
                               4430                 :                :                      errmsg("could not read file \"%s\": %m",
                               4431                 :                :                             XLOG_CONTROL_FILE)));
                               4432                 :                :         else
                               4433         [ #  # ]:              0 :             ereport(PANIC,
                               4434                 :                :                     (errcode(ERRCODE_DATA_CORRUPTED),
                               4435                 :                :                      errmsg("could not read file \"%s\": read %zd of %zu",
                               4436                 :                :                             XLOG_CONTROL_FILE, r, sizeof(ControlFileData))));
                               4437                 :                :     }
 3417 rhaas@postgresql.org     4438                 :CBC        1127 :     pgstat_report_wait_end();
                               4439                 :                : 
 9374 tgl@sss.pgh.pa.us        4440                 :           1127 :     close(fd);
                               4441                 :                : 
                               4442                 :                :     /*
                               4443                 :                :      * Check for expected pg_control format version.  If this is wrong, the
                               4444                 :                :      * CRC check will likely fail because we'll be checking the wrong number
                               4445                 :                :      * of bytes.  Complaining about wrong version will probably be more
                               4446                 :                :      * enlightening than complaining about wrong CRC.
                               4447                 :                :      */
                               4448                 :                : 
 6761 peter_e@gmx.net          4449   [ -  +  -  -  :           1127 :     if (ControlFile->pg_control_version != PG_CONTROL_VERSION && ControlFile->pg_control_version % 65536 == 0 && ControlFile->pg_control_version / 65536 != 0)
                                              -  - ]
 6761 peter_e@gmx.net          4450         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4451                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4452                 :                :                  errmsg("database files are incompatible with server"),
                               4453                 :                :                  errdetail("The database cluster was initialized with PG_CONTROL_VERSION %d (0x%08x),"
                               4454                 :                :                            " but the server was compiled with PG_CONTROL_VERSION %d (0x%08x).",
                               4455                 :                :                            ControlFile->pg_control_version, ControlFile->pg_control_version,
                               4456                 :                :                            PG_CONTROL_VERSION, PG_CONTROL_VERSION),
                               4457                 :                :                  errhint("This could be a problem of mismatched byte ordering.  It looks like you need to initdb.")));
                               4458                 :                : 
 9266 tgl@sss.pgh.pa.us        4459         [ -  + ]:CBC        1127 :     if (ControlFile->pg_control_version != PG_CONTROL_VERSION)
 8406 tgl@sss.pgh.pa.us        4460         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4461                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4462                 :                :                  errmsg("database files are incompatible with server"),
                               4463                 :                :                  errdetail("The database cluster was initialized with PG_CONTROL_VERSION %d,"
                               4464                 :                :                            " but the server was compiled with PG_CONTROL_VERSION %d.",
                               4465                 :                :                            ControlFile->pg_control_version, PG_CONTROL_VERSION),
                               4466                 :                :                  errhint("It looks like you need to initdb.")));
                               4467                 :                : 
                               4468                 :                :     /* Now check the CRC. */
 4282 heikki.linnakangas@i     4469                 :CBC        1127 :     INIT_CRC32C(crc);
                               4470                 :           1127 :     COMP_CRC32C(crc,
                               4471                 :                :                 ControlFile,
                               4472                 :                :                 offsetof(ControlFileData, crc));
                               4473                 :           1127 :     FIN_CRC32C(crc);
                               4474                 :                : 
                               4475         [ -  + ]:           1127 :     if (!EQ_CRC32C(crc, ControlFile->crc))
 8406 tgl@sss.pgh.pa.us        4476         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4477                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4478                 :                :                  errmsg("incorrect checksum in control file")));
                               4479                 :                : 
                               4480                 :                :     /*
                               4481                 :                :      * Do compatibility checking immediately.  If the database isn't
                               4482                 :                :      * compatible with the backend executable, we want to abort before we can
                               4483                 :                :      * possibly do any damage.
                               4484                 :                :      */
 9266 tgl@sss.pgh.pa.us        4485         [ -  + ]:CBC        1127 :     if (ControlFile->catalog_version_no != CATALOG_VERSION_NO)
 8406 tgl@sss.pgh.pa.us        4486         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4487                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4488                 :                :                  errmsg("database files are incompatible with server"),
                               4489                 :                :         /* translator: %s is a variable name and %d is its value */
                               4490                 :                :                  errdetail("The database cluster was initialized with %s %d,"
                               4491                 :                :                            " but the server was compiled with %s %d.",
                               4492                 :                :                            "CATALOG_VERSION_NO", ControlFile->catalog_version_no,
                               4493                 :                :                            "CATALOG_VERSION_NO", CATALOG_VERSION_NO),
                               4494                 :                :                  errhint("It looks like you need to initdb.")));
 7601 tgl@sss.pgh.pa.us        4495         [ -  + ]:CBC        1127 :     if (ControlFile->maxAlign != MAXIMUM_ALIGNOF)
 7601 tgl@sss.pgh.pa.us        4496         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4497                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4498                 :                :                  errmsg("database files are incompatible with server"),
                               4499                 :                :         /* translator: %s is a variable name and %d is its value */
                               4500                 :                :                  errdetail("The database cluster was initialized with %s %d,"
                               4501                 :                :                            " but the server was compiled with %s %d.",
                               4502                 :                :                            "MAXALIGN", ControlFile->maxAlign,
                               4503                 :                :                            "MAXALIGN", MAXIMUM_ALIGNOF),
                               4504                 :                :                  errhint("It looks like you need to initdb.")));
 7601 tgl@sss.pgh.pa.us        4505         [ -  + ]:CBC        1127 :     if (ControlFile->floatFormat != FLOATFORMAT_VALUE)
 7601 tgl@sss.pgh.pa.us        4506         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4507                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4508                 :                :                  errmsg("database files are incompatible with server"),
                               4509                 :                :                  errdetail("The database cluster appears to use a different floating-point number format than the server executable."),
                               4510                 :                :                  errhint("It looks like you need to initdb.")));
 9374 tgl@sss.pgh.pa.us        4511         [ -  + ]:CBC        1127 :     if (ControlFile->blcksz != BLCKSZ)
 8406 tgl@sss.pgh.pa.us        4512         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4513                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4514                 :                :                  errmsg("database files are incompatible with server"),
                               4515                 :                :         /* translator: %s is a variable name and %d is its value */
                               4516                 :                :                  errdetail("The database cluster was initialized with %s %d,"
                               4517                 :                :                            " but the server was compiled with %s %d.",
                               4518                 :                :                            "BLCKSZ", ControlFile->blcksz,
                               4519                 :                :                            "BLCKSZ", BLCKSZ),
                               4520                 :                :                  errhint("It looks like you need to recompile or initdb.")));
 9374 tgl@sss.pgh.pa.us        4521         [ -  + ]:CBC        1127 :     if (ControlFile->relseg_size != RELSEG_SIZE)
 8406 tgl@sss.pgh.pa.us        4522         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4523                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4524                 :                :                  errmsg("database files are incompatible with server"),
                               4525                 :                :         /* translator: %s is a variable name and %d is its value */
                               4526                 :                :                  errdetail("The database cluster was initialized with %s %d,"
                               4527                 :                :                            " but the server was compiled with %s %d.",
                               4528                 :                :                            "RELSEG_SIZE", ControlFile->relseg_size,
                               4529                 :                :                            "RELSEG_SIZE", RELSEG_SIZE),
                               4530                 :                :                  errhint("It looks like you need to recompile or initdb.")));
  258 heikki.linnakangas@i     4531         [ -  + ]:CBC        1127 :     if (ControlFile->slru_pages_per_segment != SLRU_PAGES_PER_SEGMENT)
  258 heikki.linnakangas@i     4532         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4533                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4534                 :                :                  errmsg("database files are incompatible with server"),
                               4535                 :                :         /* translator: %s is a variable name and %d is its value */
                               4536                 :                :                  errdetail("The database cluster was initialized with %s %d,"
                               4537                 :                :                            " but the server was compiled with %s %d.",
                               4538                 :                :                            "SLRU_PAGES_PER_SEGMENT", ControlFile->slru_pages_per_segment,
                               4539                 :                :                            "SLRU_PAGES_PER_SEGMENT", SLRU_PAGES_PER_SEGMENT),
                               4540                 :                :                  errhint("It looks like you need to recompile or initdb.")));
 7419 tgl@sss.pgh.pa.us        4541         [ -  + ]:CBC        1127 :     if (ControlFile->xlog_blcksz != XLOG_BLCKSZ)
 7419 tgl@sss.pgh.pa.us        4542         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4543                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4544                 :                :                  errmsg("database files are incompatible with server"),
                               4545                 :                :         /* translator: %s is a variable name and %d is its value */
                               4546                 :                :                  errdetail("The database cluster was initialized with %s %d,"
                               4547                 :                :                            " but the server was compiled with %s %d.",
                               4548                 :                :                            "XLOG_BLCKSZ", ControlFile->xlog_blcksz,
                               4549                 :                :                            "XLOG_BLCKSZ", XLOG_BLCKSZ),
                               4550                 :                :                  errhint("It looks like you need to recompile or initdb.")));
 8862 lockhart@fourpalms.o     4551         [ -  + ]:CBC        1127 :     if (ControlFile->nameDataLen != NAMEDATALEN)
 8406 tgl@sss.pgh.pa.us        4552         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4553                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4554                 :                :                  errmsg("database files are incompatible with server"),
                               4555                 :                :         /* translator: %s is a variable name and %d is its value */
                               4556                 :                :                  errdetail("The database cluster was initialized with %s %d,"
                               4557                 :                :                            " but the server was compiled with %s %d.",
                               4558                 :                :                            "NAMEDATALEN", ControlFile->nameDataLen,
                               4559                 :                :                            "NAMEDATALEN", NAMEDATALEN),
                               4560                 :                :                  errhint("It looks like you need to recompile or initdb.")));
 7789 tgl@sss.pgh.pa.us        4561         [ -  + ]:CBC        1127 :     if (ControlFile->indexMaxKeys != INDEX_MAX_KEYS)
 8406 tgl@sss.pgh.pa.us        4562         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4563                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4564                 :                :                  errmsg("database files are incompatible with server"),
                               4565                 :                :         /* translator: %s is a variable name and %d is its value */
                               4566                 :                :                  errdetail("The database cluster was initialized with %s %d,"
                               4567                 :                :                            " but the server was compiled with %s %d.",
                               4568                 :                :                            "INDEX_MAX_KEYS", ControlFile->indexMaxKeys,
                               4569                 :                :                            "INDEX_MAX_KEYS", INDEX_MAX_KEYS),
                               4570                 :                :                  errhint("It looks like you need to recompile or initdb.")));
 7054 tgl@sss.pgh.pa.us        4571         [ -  + ]:CBC        1127 :     if (ControlFile->toast_max_chunk_size != TOAST_MAX_CHUNK_SIZE)
 7054 tgl@sss.pgh.pa.us        4572         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4573                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4574                 :                :                  errmsg("database files are incompatible with server"),
                               4575                 :                :         /* translator: %s is a variable name and %d is its value */
                               4576                 :                :                  errdetail("The database cluster was initialized with %s %d,"
                               4577                 :                :                            " but the server was compiled with %s %d.",
                               4578                 :                :                            "TOAST_MAX_CHUNK_SIZE", ControlFile->toast_max_chunk_size,
                               4579                 :                :                            "TOAST_MAX_CHUNK_SIZE", (int) TOAST_MAX_CHUNK_SIZE),
                               4580                 :                :                  errhint("It looks like you need to recompile or initdb.")));
 4434 tgl@sss.pgh.pa.us        4581         [ -  + ]:CBC        1127 :     if (ControlFile->loblksize != LOBLKSIZE)
 4434 tgl@sss.pgh.pa.us        4582         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4583                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4584                 :                :                  errmsg("database files are incompatible with server"),
                               4585                 :                :         /* translator: %s is a variable name and %d is its value */
                               4586                 :                :                  errdetail("The database cluster was initialized with %s %d,"
                               4587                 :                :                            " but the server was compiled with %s %d.",
                               4588                 :                :                            "LOBLKSIZE", ControlFile->loblksize,
                               4589                 :                :                            "LOBLKSIZE", (int) LOBLKSIZE),
                               4590                 :                :                  errhint("It looks like you need to recompile or initdb.")));
                               4591                 :                : 
  347 tgl@sss.pgh.pa.us        4592         [ -  + ]:CBC        1127 :     Assert(ControlFile->float8ByVal);    /* vestigial, not worth an error msg */
                               4593                 :                : 
 3232 andres@anarazel.de       4594                 :           1127 :     wal_segment_size = ControlFile->xlog_seg_size;
                               4595                 :                : 
                               4596   [ +  -  +  -  :           1127 :     if (!IsValidWalSegSize(wal_segment_size))
                                        +  -  -  + ]
 3232 andres@anarazel.de       4597         [ #  # ]:UBC           0 :         ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                               4598                 :                :                         errmsg_plural("invalid WAL segment size in control file (%d byte)",
                               4599                 :                :                                       "invalid WAL segment size in control file (%d bytes)",
                               4600                 :                :                                       wal_segment_size,
                               4601                 :                :                                       wal_segment_size),
                               4602                 :                :                         errdetail("The WAL segment size must be a power of two between 1 MB and 1 GB.")));
                               4603                 :                : 
 3232 andres@anarazel.de       4604                 :CBC        1127 :     snprintf(wal_segsz_str, sizeof(wal_segsz_str), "%d", wal_segment_size);
                               4605                 :           1127 :     SetConfigOption("wal_segment_size", wal_segsz_str, PGC_INTERNAL,
                               4606                 :                :                     PGC_S_DYNAMIC_DEFAULT);
                               4607                 :                : 
                               4608                 :                :     /* check and update variables dependent on wal_segment_size */
                               4609         [ -  + ]:           1127 :     if (ConvertToXSegs(min_wal_size_mb, wal_segment_size) < 2)
 3232 andres@anarazel.de       4610         [ #  # ]:UBC           0 :         ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                               4611                 :                :         /* translator: both %s are GUC names */
                               4612                 :                :                         errmsg("\"%s\" must be at least twice \"%s\"",
                               4613                 :                :                                "min_wal_size", "wal_segment_size")));
                               4614                 :                : 
 3232 andres@anarazel.de       4615         [ -  + ]:CBC        1127 :     if (ConvertToXSegs(max_wal_size_mb, wal_segment_size) < 2)
 3232 andres@anarazel.de       4616         [ #  # ]:UBC           0 :         ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                               4617                 :                :         /* translator: both %s are GUC names */
                               4618                 :                :                         errmsg("\"%s\" must be at least twice \"%s\"",
                               4619                 :                :                                "max_wal_size", "wal_segment_size")));
                               4620                 :                : 
 3232 andres@anarazel.de       4621                 :CBC        1127 :     UsableBytesInSegment =
                               4622                 :           1127 :         (wal_segment_size / XLOG_BLCKSZ * UsableBytesInPage) -
                               4623                 :                :         (SizeOfXLogLongPHD - SizeOfXLogShortPHD);
                               4624                 :                : 
                               4625                 :           1127 :     CalculateCheckpointSegments();
 9374 tgl@sss.pgh.pa.us        4626                 :           1127 : }
                               4627                 :                : 
                               4628                 :                : /*
                               4629                 :                :  * Utility wrapper to update the control file.  Note that the control
                               4630                 :                :  * file gets flushed.
                               4631                 :                :  */
                               4632                 :                : static void
                               4633                 :          10803 : UpdateControlFile(void)
                               4634                 :                : {
 2673 peter@eisentraut.org     4635                 :          10803 :     update_controlfile(DataDir, ControlFile, true);
 9790 vadim4o@yahoo.com        4636                 :          10803 : }
                               4637                 :                : 
                               4638                 :                : /*
                               4639                 :                :  * Returns the unique system identifier from the control file.
                               4640                 :                :  */
                               4641                 :                : uint64
 6036 heikki.linnakangas@i     4642                 :           1600 : GetSystemIdentifier(void)
                               4643                 :                : {
                               4644         [ -  + ]:           1600 :     Assert(ControlFile != NULL);
                               4645                 :           1600 :     return ControlFile->system_identifier;
                               4646                 :                : }
                               4647                 :                : 
                               4648                 :                : /*
                               4649                 :                :  * Returns the random nonce from the control file.
                               4650                 :                :  */
                               4651                 :                : char *
 3428                          4652                 :              2 : GetMockAuthenticationNonce(void)
                               4653                 :                : {
                               4654         [ -  + ]:              2 :     Assert(ControlFile != NULL);
                               4655                 :              2 :     return ControlFile->mock_authentication_nonce;
                               4656                 :                : }
                               4657                 :                : 
                               4658                 :                : /*
                               4659                 :                :  * DataChecksumsNeedWrite
                               4660                 :                :  *      Returns whether data checksums must be written or not
                               4661                 :                :  *
                               4662                 :                :  * Returns true if data checksums are enabled, or are in the process of being
                               4663                 :                :  * enabled or disabled.  During the "inprogress-on" and "inprogress-off" states
                               4664                 :                :  * checksums must be written even though they are not verified (see
                               4665                 :                :  * datachecksum_state.c for a longer discussion).
                               4666                 :                :  *
                               4667                 :                :  * This function is intended for callsites which are about to write a data page
                               4668                 :                :  * to storage, and need to know whether to re-calculate the checksum for the
                               4669                 :                :  * page header.  It must be called as close to the write operation as possible
                               4670                 :                :  * to keep the critical section short.
                               4671                 :                :  */
                               4672                 :                : bool
  114 dgustafsson@postgres     4673                 :         878349 : DataChecksumsNeedWrite(void)
                               4674                 :                : {
                               4675                 :         984114 :     return (LocalDataChecksumState == PG_DATA_CHECKSUM_VERSION ||
                               4676   [ +  +  +  + ]:         942309 :             LocalDataChecksumState == PG_DATA_CHECKSUM_INPROGRESS_ON ||
                               4677         [ +  + ]:          63960 :             LocalDataChecksumState == PG_DATA_CHECKSUM_INPROGRESS_OFF);
                               4678                 :                : }
                               4679                 :                : 
                               4680                 :                : 
                               4681                 :                : bool
   87                          4682                 :              7 : DataChecksumsOff(void)
                               4683                 :                : {
                               4684                 :                :     bool        ret;
                               4685                 :                : 
                               4686                 :              7 :     SpinLockAcquire(&XLogCtl->info_lck);
                               4687                 :              7 :     ret = (XLogCtl->data_checksum_version == PG_DATA_CHECKSUM_OFF);
                               4688                 :              7 :     SpinLockRelease(&XLogCtl->info_lck);
                               4689                 :                : 
                               4690                 :              7 :     return ret;
                               4691                 :                : }
                               4692                 :                : 
                               4693                 :                : bool
                               4694                 :             11 : DataChecksumsOn(void)
                               4695                 :                : {
                               4696                 :                :     bool        ret;
                               4697                 :                : 
                               4698                 :             11 :     SpinLockAcquire(&XLogCtl->info_lck);
                               4699                 :             11 :     ret = (XLogCtl->data_checksum_version == PG_DATA_CHECKSUM_VERSION);
                               4700                 :             11 :     SpinLockRelease(&XLogCtl->info_lck);
                               4701                 :                : 
                               4702                 :             11 :     return ret;
                               4703                 :                : }
                               4704                 :                : 
                               4705                 :                : bool
  114                          4706                 :            164 : DataChecksumsInProgressOn(void)
                               4707                 :                : {
                               4708                 :                :     bool        ret;
                               4709                 :                : 
   87                          4710                 :            164 :     SpinLockAcquire(&XLogCtl->info_lck);
                               4711                 :            164 :     ret = (XLogCtl->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_ON);
                               4712                 :            164 :     SpinLockRelease(&XLogCtl->info_lck);
                               4713                 :                : 
                               4714                 :            164 :     return ret;
                               4715                 :                : }
                               4716                 :                : 
                               4717                 :                : /*
                               4718                 :                :  * DataChecksumsNeedVerify
                               4719                 :                :  *      Returns whether data checksums must be verified or not
                               4720                 :                :  *
                               4721                 :                :  * Data checksums are only verified if they are fully enabled in the cluster.
                               4722                 :                :  * During the "inprogress-on" and "inprogress-off" states they are only
                               4723                 :                :  * updated, not verified (see datachecksum_state.c for a longer discussion).
                               4724                 :                :  *
                               4725                 :                :  * This function is intended for callsites which have read data and are about
                               4726                 :                :  * to perform checksum validation based on its result.  It must be called as
                               4727                 :                :  * close to the validation call as possible to keep the critical section
                               4728                 :                :  * short, which protects against time-of-check/time-of-use races around data
                               4729                 :                :  * checksum validation.
                               4730                 :                :  */
                               4731                 :                : bool
  114                          4732                 :        2656015 : DataChecksumsNeedVerify(void)
                               4733                 :                : {
                               4734                 :        2656015 :     return (LocalDataChecksumState == PG_DATA_CHECKSUM_VERSION);
                               4735                 :                : }
                               4736                 :                : 
                               4737                 :                : /*
                               4738                 :                :  * SetDataChecksumsOnInProgress
                               4739                 :                :  *      Sets the data checksum state to "inprogress-on" to enable checksums
                               4740                 :                :  *
                               4741                 :                :  * To start the process of enabling data checksums in a running cluster the
                               4742                 :                :  * data_checksum_version state must be changed to "inprogress-on". See
                               4743                 :                :  * SetDataChecksumsOn below for a description of how this state change works.
                               4744                 :                :  * This function blocks until all backends in the cluster have acknowledged the
                               4745                 :                :  * state transition.
                               4746                 :                :  */
                               4747                 :                : void
                               4748                 :              9 : SetDataChecksumsOnInProgress(void)
                               4749                 :                : {
                               4750                 :                :     uint64      barrier;
                               4751                 :                : 
                               4752                 :                :     /*
                               4753                 :                :      * The state transition is performed in a critical section with
                               4754                 :                :      * checkpoints held off to provide crash safety.
                               4755                 :                :      */
                               4756                 :              9 :     START_CRIT_SECTION();
                               4757                 :              9 :     MyProc->delayChkptFlags |= DELAY_CHKPT_START;
                               4758                 :                : 
                               4759                 :              9 :     XLogChecksums(PG_DATA_CHECKSUM_INPROGRESS_ON);
                               4760                 :                : 
                               4761                 :              9 :     SpinLockAcquire(&XLogCtl->info_lck);
                               4762                 :              9 :     XLogCtl->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_ON;
                               4763                 :              9 :     SpinLockRelease(&XLogCtl->info_lck);
                               4764                 :                : 
                               4765                 :              9 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               4766                 :              9 :     ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_ON;
                               4767                 :              9 :     UpdateControlFile();
                               4768                 :              9 :     LWLockRelease(ControlFileLock);
                               4769                 :                : 
   87                          4770                 :              9 :     barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON);
                               4771                 :                : 
                               4772                 :              9 :     MyProc->delayChkptFlags &= ~DELAY_CHKPT_START;
                               4773         [ -  + ]:              9 :     END_CRIT_SECTION();
                               4774                 :                : 
  114                          4775                 :              9 :     WaitForProcSignalBarrier(barrier);
                               4776                 :              9 : }
                               4777                 :                : 
                               4778                 :                : /*
                               4779                 :                :  * SetDataChecksumsOn
                               4780                 :                :  *      Set data checksums state to 'on' cluster-wide
                               4781                 :                :  *
                               4782                 :                :  * Enabling data checksums is performed using two barriers, the first one to
                               4783                 :                :  * set the state to "inprogress-on" (done by SetDataChecksumsOnInProgress())
                               4784                 :                :  * and the second one to set the state to "on" (done here). Below is a short
                               4785                 :                :  * description of the processing; a more detailed write-up can be found in
                               4786                 :                :  * datachecksum_state.c.
                               4787                 :                :  *
                               4788                 :                :  * To start the process of enabling data checksums in a running cluster the
                               4789                 :                :  * data_checksum_version state must be changed to "inprogress-on".  This state
                               4790                 :                :  * requires data checksums to be written but not verified. This ensures that
                               4791                 :                :  * all data pages can be checksummed without the risk of false negatives in
                               4792                 :                :  * validation during the process.  When all existing pages are guaranteed to
                               4793                 :                :  * have checksums, and all new pages will be initialized with checksums, the
                               4794                 :                :  * state can be changed to "on". Once the state is "on" checksums will be both
                               4795                 :                :  * written and verified.
                               4796                 :                :  *
                               4797                 :                :  * This function blocks until all backends in the cluster have acknowledged the
                               4798                 :                :  * state transition.
                               4799                 :                :  */
                               4800                 :                : void
                               4801                 :              7 : SetDataChecksumsOn(void)
                               4802                 :                : {
                               4803                 :                :     uint64      barrier;
                               4804                 :                : 
                               4805                 :              7 :     SpinLockAcquire(&XLogCtl->info_lck);
                               4806                 :                : 
                               4807                 :                :     /*
                               4808                 :                :      * The only allowed state transition to "on" is from "inprogress-on" since
                               4809                 :                :      * that state ensures that all pages will have data checksums written. Any
                               4810                 :                :      * other attempted state transition is likely due to a programmer error.
                               4811                 :                :      */
                               4812         [ -  + ]:              7 :     if (XLogCtl->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_ON)
                               4813                 :                :     {
  114 dgustafsson@postgres     4814                 :UBC           0 :         SpinLockRelease(&XLogCtl->info_lck);
                               4815         [ #  # ]:              0 :         elog(WARNING,
                               4816                 :                :              "cannot set data checksums to \"on\", current state is not \"inprogress-on\", disabling");
                               4817                 :              0 :         SetDataChecksumsOff();
                               4818                 :              0 :         return;
                               4819                 :                :     }
                               4820                 :                : 
  114 dgustafsson@postgres     4821                 :CBC           7 :     SpinLockRelease(&XLogCtl->info_lck);
                               4822                 :                : 
                               4823                 :              7 :     INJECTION_POINT("datachecksums-enable-checksums-delay", NULL);
                               4824                 :              7 :     START_CRIT_SECTION();
                               4825                 :              7 :     MyProc->delayChkptFlags |= DELAY_CHKPT_START;
                               4826                 :                : 
                               4827                 :              7 :     XLogChecksums(PG_DATA_CHECKSUM_VERSION);
                               4828                 :                : 
                               4829                 :              7 :     SpinLockAcquire(&XLogCtl->info_lck);
                               4830                 :              7 :     XLogCtl->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
                               4831                 :              7 :     SpinLockRelease(&XLogCtl->info_lck);
                               4832                 :                : 
                               4833                 :                :     /*
                               4834                 :                :      * Update the controlfile before waiting since if we have an immediate
                               4835                 :                :      * shutdown while waiting we want to come back up with checksums enabled.
                               4836                 :                :      */
                               4837                 :              7 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               4838                 :              7 :     ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
                               4839                 :              7 :     UpdateControlFile();
                               4840                 :              7 :     LWLockRelease(ControlFileLock);
                               4841                 :                : 
   87                          4842                 :              7 :     barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON);
                               4843                 :                : 
                               4844                 :              7 :     MyProc->delayChkptFlags &= ~DELAY_CHKPT_START;
                               4845         [ -  + ]:              7 :     END_CRIT_SECTION();
                               4846                 :                : 
                               4847                 :              7 :     RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_FAST);
  114                          4848                 :              7 :     WaitForProcSignalBarrier(barrier);
                               4849                 :                : }
                               4850                 :                : 
                               4851                 :                : /*
                               4852                 :                :  * SetDataChecksumsOff
                               4853                 :                :  *      Disables data checksums cluster-wide
                               4854                 :                :  *
                               4855                 :                :  * Disabling data checksums must be performed with two sets of barriers, each
                               4856                 :                :  * carrying a different state. The state is first set to "inprogress-off"
                               4857                 :                :  * during which checksums are still written but not verified. This ensures that
                               4858                 :                :  * backends which have yet to observe the state change from "on" won't get
                               4859                 :                :  * validation errors on concurrently modified pages. Once all backends have
                               4860                 :                :  * changed to "inprogress-off", the barrier for moving to "off" can be emitted.
                               4861                 :                :  * This function blocks until all backends in the cluster have acknowledged the
                               4862                 :                :  * state transition.
                               4863                 :                :  */
                               4864                 :                : void
                               4865                 :              7 : SetDataChecksumsOff(void)
                               4866                 :                : {
                               4867                 :                :     uint64      barrier;
                               4868                 :                : 
                               4869                 :              7 :     SpinLockAcquire(&XLogCtl->info_lck);
                               4870                 :                : 
                               4871                 :                :     /* If data checksums are already disabled there is nothing to do */
  111                          4872         [ -  + ]:              7 :     if (XLogCtl->data_checksum_version == PG_DATA_CHECKSUM_OFF)
                               4873                 :                :     {
  114 dgustafsson@postgres     4874                 :UBC           0 :         SpinLockRelease(&XLogCtl->info_lck);
                               4875                 :              0 :         return;
                               4876                 :                :     }
                               4877                 :                : 
                               4878                 :                :     /*
                               4879                 :                :      * If data checksums are currently enabled, or in the process of being
                               4880                 :                :      * enabled, we first transition to the "inprogress-off" state during which
                               4881                 :                :      * backends continue to write checksums without verifying them. When all
                               4882                 :                :      * backends are in "inprogress-off" the next transition to "off" can be
                               4883                 :                :      * performed, after which all data checksum processing is disabled.
                               4884                 :                :      */
   87 dgustafsson@postgres     4885         [ +  + ]:CBC           7 :     if (XLogCtl->data_checksum_version == PG_DATA_CHECKSUM_VERSION ||
                               4886         [ +  - ]:              2 :         XLogCtl->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_ON)
                               4887                 :                :     {
  114                          4888                 :              7 :         SpinLockRelease(&XLogCtl->info_lck);
                               4889                 :                : 
                               4890                 :              7 :         START_CRIT_SECTION();
                               4891                 :              7 :         MyProc->delayChkptFlags |= DELAY_CHKPT_START;
                               4892                 :                : 
                               4893                 :              7 :         XLogChecksums(PG_DATA_CHECKSUM_INPROGRESS_OFF);
                               4894                 :                : 
                               4895                 :              7 :         SpinLockAcquire(&XLogCtl->info_lck);
                               4896                 :              7 :         XLogCtl->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_OFF;
                               4897                 :              7 :         SpinLockRelease(&XLogCtl->info_lck);
                               4898                 :                : 
   87                          4899                 :              7 :         LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               4900                 :              7 :         ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_OFF;
                               4901                 :              7 :         UpdateControlFile();
                               4902                 :              7 :         LWLockRelease(ControlFileLock);
                               4903                 :                : 
  114                          4904                 :              7 :         barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF);
                               4905                 :                : 
                               4906                 :              7 :         MyProc->delayChkptFlags &= ~DELAY_CHKPT_START;
                               4907         [ -  + ]:              7 :         END_CRIT_SECTION();
                               4908                 :                : 
                               4909                 :              7 :         RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_FAST);
                               4910                 :              7 :         WaitForProcSignalBarrier(barrier);
                               4911                 :                : 
                               4912                 :                :         /*
                               4913                 :                :          * At this point we know that no backends are verifying data checksums
                               4914                 :                :          * during reading. Next, we can safely move to state "off" to also
                               4915                 :                :          * stop writing checksums.
                               4916                 :                :          */
                               4917                 :                :     }
                               4918                 :                :     else
                               4919                 :                :     {
                               4920                 :                :         /*
                               4921                 :                :          * Ending up here implies that the checksums state is "inprogress-off"
                               4922                 :                :          * and we can transition directly to "off" from there.
                               4923                 :                :          */
  114 dgustafsson@postgres     4924                 :UBC           0 :         SpinLockRelease(&XLogCtl->info_lck);
                               4925                 :                :     }
                               4926                 :                : 
  114 dgustafsson@postgres     4927                 :CBC           7 :     START_CRIT_SECTION();
                               4928                 :                :     /* Ensure that we don't incur a checkpoint while disabling checksums */
                               4929                 :              7 :     MyProc->delayChkptFlags |= DELAY_CHKPT_START;
                               4930                 :                : 
                               4931                 :              7 :     XLogChecksums(PG_DATA_CHECKSUM_OFF);
                               4932                 :                : 
                               4933                 :              7 :     SpinLockAcquire(&XLogCtl->info_lck);
  111                          4934                 :              7 :     XLogCtl->data_checksum_version = PG_DATA_CHECKSUM_OFF;
  114                          4935                 :              7 :     SpinLockRelease(&XLogCtl->info_lck);
                               4936                 :                : 
                               4937                 :              7 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               4938                 :              7 :     ControlFile->data_checksum_version = PG_DATA_CHECKSUM_OFF;
                               4939                 :              7 :     UpdateControlFile();
                               4940                 :              7 :     LWLockRelease(ControlFileLock);
                               4941                 :                : 
   87                          4942                 :              7 :     barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF);
                               4943                 :                : 
                               4944                 :              7 :     MyProc->delayChkptFlags &= ~DELAY_CHKPT_START;
                               4945         [ -  + ]:              7 :     END_CRIT_SECTION();
                               4946                 :                : 
                               4947                 :              7 :     RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_FAST);
  114                          4948                 :              7 :     WaitForProcSignalBarrier(barrier);
                               4949                 :                : }
                               4950                 :                : 
                               4951                 :                : /*
                               4952                 :                :  * InitLocalDataChecksumState
                               4953                 :                :  *
                               4954                 :                :  * Set up backend local caches of controldata variables which may change at
                               4955                 :                :  * any point during runtime and thus require special cased locking. So far
                               4956                 :                :  * this only applies to data_checksum_version, but it's intended to be general
                               4957                 :                :  * purpose enough to handle future cases.
                               4958                 :                :  */
                               4959                 :                : void
                               4960                 :          22701 : InitLocalDataChecksumState(void)
                               4961                 :                : {
   87                          4962         [ -  + ]:          22701 :     Assert(InterruptHoldoffCount > 0);
  114                          4963                 :          22701 :     SpinLockAcquire(&XLogCtl->info_lck);
                               4964                 :          22701 :     SetLocalDataChecksumState(XLogCtl->data_checksum_version);
                               4965                 :          22701 :     SpinLockRelease(&XLogCtl->info_lck);
                               4966                 :          22701 : }
                               4967                 :                : 
                               4968                 :                : void
                               4969                 :          25986 : SetLocalDataChecksumState(uint32 data_checksum_version)
                               4970                 :                : {
                               4971                 :          25986 :     LocalDataChecksumState = data_checksum_version;
                               4972                 :                : 
                               4973                 :          25986 :     data_checksums = data_checksum_version;
                               4974                 :          25986 : }
                               4975                 :                : 
                               4976                 :                : /* guc hook */
                               4977                 :                : const char *
                               4978                 :           1911 : show_data_checksums(void)
                               4979                 :                : {
                               4980                 :           1911 :     return get_checksum_state_string(LocalDataChecksumState);
                               4981                 :                : }
                               4982                 :                : 
                               4983                 :                : /*
                               4984                 :                :  * Return true if the cluster was initialized on a platform where the
                               4985                 :                :  * default signedness of char is "signed". This function exists for code
                               4986                 :                :  * that deals with pre-v18 data files that store data sorted by the 'char'
                               4987                 :                :  * type on disk (e.g., GIN and GiST indexes). See the comments in
                               4988                 :                :  * WriteControlFile() for details.
                               4989                 :                :  */
                               4990                 :                : bool
  520 msawada@postgresql.o     4991                 :          89903 : GetDefaultCharSignedness(void)
                               4992                 :                : {
                               4993                 :          89903 :     return ControlFile->default_char_signedness;
                               4994                 :                : }
                               4995                 :                : 
                               4996                 :                : /*
                               4997                 :                :  * Returns a fake LSN for unlogged relations.
                               4998                 :                :  *
                               4999                 :                :  * Each call generates an LSN that is greater than any previous value
                               5000                 :                :  * returned. The current counter value is saved and restored across clean
                               5001                 :                :  * shutdowns, but like unlogged relations, does not survive a crash. This can
                               5002                 :                :  * be used in lieu of real LSN values returned by XLogInsert, if you need an
                               5003                 :                :  * LSN-like increasing sequence of numbers without writing any WAL.
                               5004                 :                :  */
                               5005                 :                : XLogRecPtr
 4913 heikki.linnakangas@i     5006                 :         202669 : GetFakeLSNForUnloggedRel(void)
                               5007                 :                : {
  878 nathan@postgresql.or     5008                 :         202669 :     return pg_atomic_fetch_add_u64(&XLogCtl->unloggedLSN, 1);
                               5009                 :                : }
                               5010                 :                : 
                               5011                 :                : /*
                               5012                 :                :  * Auto-tune the number of XLOG buffers.
                               5013                 :                :  *
                               5014                 :                :  * The preferred setting for wal_buffers is about 3% of shared_buffers, with
                               5015                 :                :  * a maximum of one XLOG segment (there is little reason to think that more
                               5016                 :                :  * is helpful, at least so long as we force an fsync when switching log files)
                               5017                 :                :  * and a minimum of 8 blocks (which was the default value prior to PostgreSQL
                               5018                 :                :  * 9.1, when auto-tuning was added).
                               5019                 :                :  *
                               5020                 :                :  * This should not be called until NBuffers has received its final value.
                               5021                 :                :  */
                               5022                 :                : static int
 5589 tgl@sss.pgh.pa.us        5023                 :           1221 : XLOGChooseNumBuffers(void)
                               5024                 :                : {
                               5025                 :                :     int         xbuffers;
                               5026                 :                : 
                               5027                 :           1221 :     xbuffers = NBuffers / 32;
 3232 andres@anarazel.de       5028         [ +  + ]:           1221 :     if (xbuffers > (wal_segment_size / XLOG_BLCKSZ))
                               5029                 :             24 :         xbuffers = (wal_segment_size / XLOG_BLCKSZ);
 5589 tgl@sss.pgh.pa.us        5030         [ +  + ]:           1221 :     if (xbuffers < 8)
                               5031                 :            482 :         xbuffers = 8;
                               5032                 :           1221 :     return xbuffers;
                               5033                 :                : }
                               5034                 :                : 
                               5035                 :                : /*
                               5036                 :                :  * GUC check_hook for wal_buffers
                               5037                 :                :  */
                               5038                 :                : bool
                               5039                 :           2487 : check_wal_buffers(int *newval, void **extra, GucSource source)
                               5040                 :                : {
                               5041                 :                :     /*
                               5042                 :                :      * -1 indicates a request for auto-tune.
                               5043                 :                :      */
                               5044         [ +  + ]:           2487 :     if (*newval == -1)
                               5045                 :                :     {
                               5046                 :                :         /*
                               5047                 :                :          * If we haven't yet changed the boot_val default of -1, just let it
                               5048                 :                :          * be.  We'll fix it when XLOGShmemRequest is called.
                               5049                 :                :          */
                               5050         [ +  - ]:           1266 :         if (XLOGbuffers == -1)
                               5051                 :           1266 :             return true;
                               5052                 :                : 
                               5053                 :                :         /* Otherwise, substitute the auto-tune value */
 5589 tgl@sss.pgh.pa.us        5054                 :UBC           0 :         *newval = XLOGChooseNumBuffers();
                               5055                 :                :     }
                               5056                 :                : 
                               5057                 :                :     /*
                               5058                 :                :      * We clamp manually-set values to at least 4 blocks.  Prior to PostgreSQL
                               5059                 :                :      * 9.1, a minimum of 4 was enforced by guc.c, but since that is no longer
                               5060                 :                :      * the case, we just silently treat such values as a request for the
                               5061                 :                :      * minimum.  (We could throw an error instead, but that doesn't seem very
                               5062                 :                :      * helpful.)
                               5063                 :                :      */
 5589 tgl@sss.pgh.pa.us        5064         [ -  + ]:CBC        1221 :     if (*newval < 4)
 5589 tgl@sss.pgh.pa.us        5065                 :UBC           0 :         *newval = 4;
                               5066                 :                : 
 5589 tgl@sss.pgh.pa.us        5067                 :CBC        1221 :     return true;
                               5068                 :                : }
                               5069                 :                : 
                               5070                 :                : /*
                               5071                 :                :  * GUC check_hook for wal_consistency_checking
                               5072                 :                :  */
                               5073                 :                : bool
 1412                          5074                 :           2252 : check_wal_consistency_checking(char **newval, void **extra, GucSource source)
                               5075                 :                : {
                               5076                 :                :     char       *rawstring;
                               5077                 :                :     List       *elemlist;
                               5078                 :                :     ListCell   *l;
                               5079                 :                :     bool        newwalconsistency[RM_MAX_ID + 1];
                               5080                 :                : 
                               5081                 :                :     /* Initialize the array */
                               5082   [ +  -  +  -  :          74316 :     MemSet(newwalconsistency, 0, (RM_MAX_ID + 1) * sizeof(bool));
                                     +  -  +  -  +  
                                                 + ]
                               5083                 :                : 
                               5084                 :                :     /* Need a modifiable copy of string */
                               5085                 :           2252 :     rawstring = pstrdup(*newval);
                               5086                 :                : 
                               5087                 :                :     /* Parse string into list of identifiers */
                               5088         [ -  + ]:           2252 :     if (!SplitIdentifierString(rawstring, ',', &elemlist))
                               5089                 :                :     {
                               5090                 :                :         /* syntax error in list */
 1412 tgl@sss.pgh.pa.us        5091                 :UBC           0 :         GUC_check_errdetail("List syntax is invalid.");
                               5092                 :              0 :         pfree(rawstring);
                               5093                 :              0 :         list_free(elemlist);
                               5094                 :              0 :         return false;
                               5095                 :                :     }
                               5096                 :                : 
 1412 tgl@sss.pgh.pa.us        5097   [ +  +  +  +  :CBC        2748 :     foreach(l, elemlist)
                                              +  + ]
                               5098                 :                :     {
                               5099                 :            496 :         char       *tok = (char *) lfirst(l);
                               5100                 :                :         int         rmid;
                               5101                 :                : 
                               5102                 :                :         /* Check for 'all'. */
                               5103         [ +  + ]:            496 :         if (pg_strcasecmp(tok, "all") == 0)
                               5104                 :                :         {
                               5105         [ +  + ]:         126958 :             for (rmid = 0; rmid <= RM_MAX_ID; rmid++)
                               5106   [ +  +  +  + ]:         126464 :                 if (RmgrIdExists(rmid) && GetRmgr(rmid).rm_mask != NULL)
                               5107                 :           4940 :                     newwalconsistency[rmid] = true;
                               5108                 :                :         }
                               5109                 :                :         else
                               5110                 :                :         {
                               5111                 :                :             /* Check if the token matches any known resource manager. */
                               5112                 :              2 :             bool        found = false;
                               5113                 :                : 
                               5114         [ +  - ]:             36 :             for (rmid = 0; rmid <= RM_MAX_ID; rmid++)
                               5115                 :                :             {
                               5116   [ +  -  +  +  :             54 :                 if (RmgrIdExists(rmid) && GetRmgr(rmid).rm_mask != NULL &&
                                              +  + ]
                               5117                 :             18 :                     pg_strcasecmp(tok, GetRmgr(rmid).rm_name) == 0)
                               5118                 :                :                 {
                               5119                 :              2 :                     newwalconsistency[rmid] = true;
                               5120                 :              2 :                     found = true;
                               5121                 :              2 :                     break;
                               5122                 :                :                 }
                               5123                 :                :             }
                               5124         [ -  + ]:              2 :             if (!found)
                               5125                 :                :             {
                               5126                 :                :                 /*
                               5127                 :                :                  * During startup, it might be a not-yet-loaded custom
                               5128                 :                :                  * resource manager.  Defer checking until
                               5129                 :                :                  * InitializeWalConsistencyChecking().
                               5130                 :                :                  */
 1412 tgl@sss.pgh.pa.us        5131         [ #  # ]:UBC           0 :                 if (!process_shared_preload_libraries_done)
                               5132                 :                :                 {
                               5133                 :              0 :                     check_wal_consistency_checking_deferred = true;
                               5134                 :                :                 }
                               5135                 :                :                 else
                               5136                 :                :                 {
                               5137                 :              0 :                     GUC_check_errdetail("Unrecognized key word: \"%s\".", tok);
                               5138                 :              0 :                     pfree(rawstring);
                               5139                 :              0 :                     list_free(elemlist);
                               5140                 :              0 :                     return false;
                               5141                 :                :                 }
                               5142                 :                :             }
                               5143                 :                :         }
                               5144                 :                :     }
                               5145                 :                : 
 1412 tgl@sss.pgh.pa.us        5146                 :CBC        2252 :     pfree(rawstring);
                               5147                 :           2252 :     list_free(elemlist);
                               5148                 :                : 
                               5149                 :                :     /* assign new value */
  486 dgustafsson@postgres     5150                 :           2252 :     *extra = guc_malloc(LOG, (RM_MAX_ID + 1) * sizeof(bool));
                               5151         [ -  + ]:           2252 :     if (!*extra)
  486 dgustafsson@postgres     5152                 :UBC           0 :         return false;
 1412 tgl@sss.pgh.pa.us        5153                 :CBC        2252 :     memcpy(*extra, newwalconsistency, (RM_MAX_ID + 1) * sizeof(bool));
                               5154                 :           2252 :     return true;
                               5155                 :                : }
                               5156                 :                : 
                               5157                 :                : /*
                               5158                 :                :  * GUC assign_hook for wal_consistency_checking
                               5159                 :                :  */
                               5160                 :                : void
                               5161                 :           2251 : assign_wal_consistency_checking(const char *newval, void *extra)
                               5162                 :                : {
                               5163                 :                :     /*
                               5164                 :                :      * If some checks were deferred, it's possible that the checks will fail
                               5165                 :                :      * later during InitializeWalConsistencyChecking(). But in that case, the
                               5166                 :                :      * postmaster will exit anyway, so it's safe to proceed with the
                               5167                 :                :      * assignment.
                               5168                 :                :      *
                               5169                 :                :      * Any built-in resource managers specified are assigned immediately,
                               5170                 :                :      * which affects WAL created before shared_preload_libraries are
                               5171                 :                :      * processed. Any custom resource managers specified won't be assigned
                               5172                 :                :      * until after shared_preload_libraries are processed, but that's OK
                               5173                 :                :      * because WAL for a custom resource manager can't be written before the
                               5174                 :                :      * module is loaded anyway.
                               5175                 :                :      */
                               5176                 :           2251 :     wal_consistency_checking = extra;
                               5177                 :           2251 : }
                               5178                 :                : 
                               5179                 :                : /*
                               5180                 :                :  * InitializeWalConsistencyChecking: run after loading custom resource managers
                               5181                 :                :  *
                               5182                 :                :  * If any unknown resource managers were specified in the
                               5183                 :                :  * wal_consistency_checking GUC, processing was deferred.  Now that
                               5184                 :                :  * shared_preload_libraries have been loaded, process wal_consistency_checking
                               5185                 :                :  * again.
                               5186                 :                :  */
                               5187                 :                : void
                               5188                 :           1053 : InitializeWalConsistencyChecking(void)
                               5189                 :                : {
                               5190         [ -  + ]:           1053 :     Assert(process_shared_preload_libraries_done);
                               5191                 :                : 
                               5192         [ -  + ]:           1053 :     if (check_wal_consistency_checking_deferred)
                               5193                 :                :     {
                               5194                 :                :         struct config_generic *guc;
                               5195                 :                : 
 1412 tgl@sss.pgh.pa.us        5196                 :UBC           0 :         guc = find_option("wal_consistency_checking", false, false, ERROR);
                               5197                 :                : 
                               5198                 :              0 :         check_wal_consistency_checking_deferred = false;
                               5199                 :                : 
                               5200                 :              0 :         set_config_option_ext("wal_consistency_checking",
                               5201                 :                :                               wal_consistency_checking_string,
                               5202                 :                :                               guc->scontext, guc->source, guc->srole,
                               5203                 :                :                               GUC_ACTION_SET, true, ERROR, false);
                               5204                 :                : 
                               5205                 :                :         /* checking should not be deferred again */
                               5206         [ #  # ]:              0 :         Assert(!check_wal_consistency_checking_deferred);
                               5207                 :                :     }
 1412 tgl@sss.pgh.pa.us        5208                 :CBC        1053 : }
                               5209                 :                : 
                               5210                 :                : /*
                               5211                 :                :  * GUC show_hook for archive_command
                               5212                 :                :  */
                               5213                 :                : const char *
                               5214                 :           1907 : show_archive_command(void)
                               5215                 :                : {
                               5216   [ +  +  -  +  :           1907 :     if (XLogArchivingActive())
                                              +  + ]
                               5217                 :            105 :         return XLogArchiveCommand;
                               5218                 :                :     else
                               5219                 :           1802 :         return "(disabled)";
                               5220                 :                : }
                               5221                 :                : 
                               5222                 :                : /*
                               5223                 :                :  * GUC show_hook for in_hot_standby
                               5224                 :                :  */
                               5225                 :                : const char *
                               5226                 :          17252 : show_in_hot_standby(void)
                               5227                 :                : {
                               5228                 :                :     /*
                               5229                 :                :      * We display the actual state based on shared memory, so that this GUC
                               5230                 :                :      * reports up-to-date state if examined intra-query.  The underlying
                               5231                 :                :      * variable (in_hot_standby_guc) changes only when we transmit a new value
                               5232                 :                :      * to the client.
                               5233                 :                :      */
                               5234         [ +  + ]:          17252 :     return RecoveryInProgress() ? "on" : "off";
                               5235                 :                : }
                               5236                 :                : 
                               5237                 :                : /*
                               5238                 :                :  * GUC show_hook for effective_wal_level
                               5239                 :                :  */
                               5240                 :                : const char *
  215 msawada@postgresql.o     5241                 :           1946 : show_effective_wal_level(void)
                               5242                 :                : {
                               5243         [ +  + ]:           1946 :     if (wal_level == WAL_LEVEL_MINIMAL)
                               5244                 :            233 :         return "minimal";
                               5245                 :                : 
                               5246                 :                :     /*
                               5247                 :                :      * During recovery, effective_wal_level reflects the primary's
                               5248                 :                :      * configuration rather than the local wal_level value.
                               5249                 :                :      */
                               5250         [ +  + ]:           1713 :     if (RecoveryInProgress())
                               5251         [ +  + ]:             31 :         return IsXLogLogicalInfoEnabled() ? "logical" : "replica";
                               5252                 :                : 
                               5253   [ +  +  +  + ]:           1682 :     return XLogLogicalInfoActive() ? "logical" : "replica";
                               5254                 :                : }
                               5255                 :                : 
                               5256                 :                : /*
                               5257                 :                :  * Read the control file, set respective GUCs.
                               5258                 :                :  *
                               5259                 :                :  * This is to be called during startup, including a crash recovery cycle,
                               5260                 :                :  * unless in bootstrap mode, where no control file yet exists.  As there's no
                               5261                 :                :  * usable shared memory yet (its sizing can depend on the contents of the
                               5262                 :                :  * control file!), first store the contents in local memory. XLOGShmemInit()
                               5263                 :                :  * will then copy it to shared memory later.
                               5264                 :                :  *
                               5265                 :                :  * reset just controls whether previous contents are to be expected (in the
                               5266                 :                :  * reset case, there's a dangling pointer into old shared memory), or not.
                               5267                 :                :  */
                               5268                 :                : void
 3234 andres@anarazel.de       5269                 :           1071 : LocalProcessControlFile(bool reset)
                               5270                 :                : {
                               5271   [ +  +  -  + ]:           1071 :     Assert(reset || ControlFile == NULL);
  111 heikki.linnakangas@i     5272                 :           1071 :     LocalControlFile = palloc_object(ControlFileData);
                               5273                 :           1071 :     ControlFile = LocalControlFile;
 3238 andres@anarazel.de       5274                 :           1071 :     ReadControlFile();
  114 dgustafsson@postgres     5275                 :           1071 :     SetLocalDataChecksumState(ControlFile->data_checksum_version);
 3238 andres@anarazel.de       5276                 :           1071 : }
                               5277                 :                : 
                               5278                 :                : /*
                               5279                 :                :  * Get the wal_level from the control file. For a standby, this value should be
                               5280                 :                :  * considered as its active wal_level, because it may be different from what
                               5281                 :                :  * was originally configured on the standby.
                               5282                 :                :  */
                               5283                 :                : WalLevel
 1205 andres@anarazel.de       5284                 :UBC           0 : GetActiveWalLevelOnStandby(void)
                               5285                 :                : {
                               5286                 :              0 :     return ControlFile->wal_level;
                               5287                 :                : }
                               5288                 :                : 
                               5289                 :                : /*
                               5290                 :                :  * Register shared memory for XLOG.
                               5291                 :                :  */
                               5292                 :                : static void
  111 heikki.linnakangas@i     5293                 :CBC        1226 : XLOGShmemRequest(void *arg)
                               5294                 :                : {
                               5295                 :                :     Size        size;
                               5296                 :                : 
                               5297                 :                :     /*
                               5298                 :                :      * If the value of wal_buffers is -1, use the preferred auto-tune value.
                               5299                 :                :      * This isn't an amazingly clean place to do this, but we must wait till
                               5300                 :                :      * NBuffers has received its final value, and must do it before using the
                               5301                 :                :      * value of XLOGbuffers to do anything important.
                               5302                 :                :      *
                               5303                 :                :      * We prefer to report this value's source as PGC_S_DYNAMIC_DEFAULT.
                               5304                 :                :      * However, if the DBA explicitly set wal_buffers = -1 in the config file,
                               5305                 :                :      * then PGC_S_DYNAMIC_DEFAULT will fail to override that and we must force
                               5306                 :                :      * the matter with PGC_S_OVERRIDE.
                               5307                 :                :      */
 5589 tgl@sss.pgh.pa.us        5308         [ +  + ]:           1226 :     if (XLOGbuffers == -1)
                               5309                 :                :     {
                               5310                 :                :         char        buf[32];
                               5311                 :                : 
                               5312                 :           1221 :         snprintf(buf, sizeof(buf), "%d", XLOGChooseNumBuffers());
 1509                          5313                 :           1221 :         SetConfigOption("wal_buffers", buf, PGC_POSTMASTER,
                               5314                 :                :                         PGC_S_DYNAMIC_DEFAULT);
                               5315         [ -  + ]:           1221 :         if (XLOGbuffers == -1)  /* failed to apply it? */
 1509 tgl@sss.pgh.pa.us        5316                 :UBC           0 :             SetConfigOption("wal_buffers", buf, PGC_POSTMASTER,
                               5317                 :                :                             PGC_S_OVERRIDE);
                               5318                 :                :     }
 5664 tgl@sss.pgh.pa.us        5319         [ -  + ]:CBC        1226 :     Assert(XLOGbuffers > 0);
                               5320                 :                : 
                               5321                 :                :     /* XLogCtl */
 7645                          5322                 :           1226 :     size = sizeof(XLogCtlData);
                               5323                 :                : 
                               5324                 :                :     /* WAL insertion locks, plus alignment */
 4316 heikki.linnakangas@i     5325                 :           1226 :     size = add_size(size, mul_size(sizeof(WALInsertLockPadded), NUM_XLOGINSERT_LOCKS + 1));
                               5326                 :                :     /* xlblocks array */
  950 jdavis@postgresql.or     5327                 :           1226 :     size = add_size(size, mul_size(sizeof(pg_atomic_uint64), XLOGbuffers));
                               5328                 :                :     /* extra alignment padding for XLOG I/O buffers */
 1205 tmunro@postgresql.or     5329                 :           1226 :     size = add_size(size, Max(XLOG_BLCKSZ, PG_IO_ALIGN_SIZE));
                               5330                 :                :     /* and the buffers themselves */
 7419 tgl@sss.pgh.pa.us        5331                 :           1226 :     size = add_size(size, mul_size(XLOG_BLCKSZ, XLOGbuffers));
                               5332                 :                : 
  111 heikki.linnakangas@i     5333                 :           1226 :     ShmemRequestStruct(.name = "XLOG Ctl",
                               5334                 :                :                        .size = size,
                               5335                 :                :                        .ptr = (void **) &XLogCtl,
                               5336                 :                :         );
                               5337                 :           1226 :     ShmemRequestStruct(.name = "Control File",
                               5338                 :                :                        .size = sizeof(ControlFileData),
                               5339                 :                :                        .ptr = (void **) &ControlFile,
                               5340                 :                :         );
 9790 vadim4o@yahoo.com        5341                 :           1226 : }
                               5342                 :                : 
                               5343                 :                : /*
                               5344                 :                :  * XLOGShmemInit - initialize the XLogCtl shared memory area.
                               5345                 :                :  */
                               5346                 :                : static void
  111 heikki.linnakangas@i     5347                 :           1223 : XLOGShmemInit(void *arg)
                               5348                 :                : {
                               5349                 :                :     char       *allocptr;
                               5350                 :                :     int         i;
                               5351                 :                : 
                               5352                 :                : #ifdef WAL_DEBUG
                               5353                 :                : 
                               5354                 :                :     /*
                               5355                 :                :      * Create a memory context for WAL debugging that's exempt from the normal
                               5356                 :                :      * "no pallocs in critical section" rule. Yes, that can lead to a PANIC if
                               5357                 :                :      * an allocation fails, but wal_debug is not for production use anyway.
                               5358                 :                :      */
                               5359                 :                :     if (walDebugCxt == NULL)
                               5360                 :                :     {
                               5361                 :                :         walDebugCxt = AllocSetContextCreate(TopMemoryContext,
                               5362                 :                :                                             "WAL Debug",
                               5363                 :                :                                             ALLOCSET_DEFAULT_SIZES);
                               5364                 :                :         MemoryContextAllowInCriticalSection(walDebugCxt, true);
                               5365                 :                :     }
                               5366                 :                : #endif
                               5367                 :                : 
 9266 tgl@sss.pgh.pa.us        5368                 :           1223 :     memset(XLogCtl, 0, sizeof(XLogCtlData));
                               5369                 :                : 
                               5370                 :                :     /*
                               5371                 :                :      * The control file has already been read locally, unless in bootstrap
                               5372                 :                :      * mode.  Move its contents into shared memory.
                               5373                 :                :      */
  111 heikki.linnakangas@i     5374         [ +  + ]:           1223 :     if (LocalControlFile)
                               5375                 :                :     {
                               5376                 :           1055 :         memcpy(ControlFile, LocalControlFile, sizeof(ControlFileData));
                               5377                 :           1055 :         pfree(LocalControlFile);
                               5378                 :           1055 :         LocalControlFile = NULL;
                               5379                 :                :     }
                               5380                 :                : 
                               5381                 :                :     /*
                               5382                 :                :      * Since XLogCtlData contains XLogRecPtr fields, its sizeof should be a
                               5383                 :                :      * multiple of the alignment for same, so no extra alignment padding is
                               5384                 :                :      * needed here.
                               5385                 :                :      */
 4766                          5386                 :           1223 :     allocptr = ((char *) XLogCtl) + sizeof(XLogCtlData);
  950 jdavis@postgresql.or     5387                 :           1223 :     XLogCtl->xlblocks = (pg_atomic_uint64 *) allocptr;
                               5388                 :           1223 :     allocptr += sizeof(pg_atomic_uint64) * XLOGbuffers;
                               5389                 :                : 
                               5390         [ +  + ]:         346807 :     for (i = 0; i < XLOGbuffers; i++)
                               5391                 :                :     {
                               5392                 :         345584 :         pg_atomic_init_u64(&XLogCtl->xlblocks[i], InvalidXLogRecPtr);
                               5393                 :                :     }
                               5394                 :                : 
                               5395                 :                :     /* WAL insertion locks. Ensure they're aligned to the full padded size */
 4510 heikki.linnakangas@i     5396                 :           1223 :     allocptr += sizeof(WALInsertLockPadded) -
 3322 tgl@sss.pgh.pa.us        5397                 :           1223 :         ((uintptr_t) allocptr) % sizeof(WALInsertLockPadded);
 4510 heikki.linnakangas@i     5398                 :           1223 :     WALInsertLocks = XLogCtl->Insert.WALInsertLocks =
                               5399                 :                :         (WALInsertLockPadded *) allocptr;
 4316                          5400                 :           1223 :     allocptr += sizeof(WALInsertLockPadded) * NUM_XLOGINSERT_LOCKS;
                               5401                 :                : 
                               5402         [ +  + ]:          11007 :     for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
                               5403                 :                :     {
 3876 rhaas@postgresql.org     5404                 :           9784 :         LWLockInitialize(&WALInsertLocks[i].l.lock, LWTRANCHE_WAL_INSERT);
 1097 michael@paquier.xyz      5405                 :           9784 :         pg_atomic_init_u64(&WALInsertLocks[i].l.insertingAt, InvalidXLogRecPtr);
 3503 andres@anarazel.de       5406                 :           9784 :         WALInsertLocks[i].l.lastImportantAt = InvalidXLogRecPtr;
                               5407                 :                :     }
                               5408                 :                : 
                               5409                 :                :     /*
                               5410                 :                :      * Align the start of the page buffers to a full xlog block size boundary.
                               5411                 :                :      * This simplifies some calculations in XLOG insertion. It is also
                               5412                 :                :      * required for O_DIRECT.
                               5413                 :                :      */
 4766 heikki.linnakangas@i     5414                 :           1223 :     allocptr = (char *) TYPEALIGN(XLOG_BLCKSZ, allocptr);
 7645 tgl@sss.pgh.pa.us        5415                 :           1223 :     XLogCtl->pages = allocptr;
 7419                          5416                 :           1223 :     memset(XLogCtl->pages, 0, (Size) XLOG_BLCKSZ * XLOGbuffers);
                               5417                 :                : 
                               5418                 :                :     /*
                               5419                 :                :      * Do basic initialization of XLogCtl shared data. (StartupXLOG will fill
                               5420                 :                :      * in additional info.)
                               5421                 :                :      */
 9266                          5422                 :           1223 :     XLogCtl->XLogCacheBlck = XLOGbuffers - 1;
 2284 michael@paquier.xyz      5423                 :           1223 :     XLogCtl->SharedRecoveryState = RECOVERY_STATE_CRASH;
 1854 noah@leadboat.com        5424                 :           1223 :     XLogCtl->InstallXLogFileSegmentActive = false;
 5192 tgl@sss.pgh.pa.us        5425                 :           1223 :     XLogCtl->WalWriterSleeping = false;
                               5426                 :                : 
                               5427                 :                :     /* Use the checksum info from control file */
  114 dgustafsson@postgres     5428                 :           1223 :     XLogCtl->data_checksum_version = ControlFile->data_checksum_version;
                               5429                 :           1223 :     SetLocalDataChecksumState(XLogCtl->data_checksum_version);
                               5430                 :                : 
 4766 heikki.linnakangas@i     5431                 :           1223 :     SpinLockInit(&XLogCtl->Insert.insertpos_lck);
 9066 tgl@sss.pgh.pa.us        5432                 :           1223 :     SpinLockInit(&XLogCtl->info_lck);
  840 alvherre@alvh.no-ip.     5433                 :           1223 :     pg_atomic_init_u64(&XLogCtl->logInsertResult, InvalidXLogRecPtr);
  842                          5434                 :           1223 :     pg_atomic_init_u64(&XLogCtl->logWriteResult, InvalidXLogRecPtr);
                               5435                 :           1223 :     pg_atomic_init_u64(&XLogCtl->logFlushResult, InvalidXLogRecPtr);
  878 nathan@postgresql.or     5436                 :           1223 :     pg_atomic_init_u64(&XLogCtl->unloggedLSN, InvalidXLogRecPtr);
 9790 vadim4o@yahoo.com        5437                 :           1223 : }
                               5438                 :                : 
                               5439                 :                : /*
                               5440                 :                :  * XLOGShmemAttach - re-establish WALInsertLocks pointer after attaching.
                               5441                 :                :  */
                               5442                 :                : static void
  111 heikki.linnakangas@i     5443                 :UBC           0 : XLOGShmemAttach(void *arg)
                               5444                 :                : {
                               5445                 :              0 :     WALInsertLocks = XLogCtl->Insert.WALInsertLocks;
                               5446                 :              0 : }
                               5447                 :                : 
                               5448                 :                : /*
                               5449                 :                :  * This func must be called ONCE on system install.  It creates pg_control
                               5450                 :                :  * and the initial XLOG segment.
                               5451                 :                :  */
                               5452                 :                : void
  733 peter@eisentraut.org     5453                 :CBC          56 : BootStrapXLOG(uint32 data_checksum_version)
                               5454                 :                : {
                               5455                 :                :     CheckPoint  checkPoint;
                               5456                 :                :     PGAlignedXLogBlock buffer;
                               5457                 :                :     XLogPageHeader page;
                               5458                 :                :     XLogLongPageHeader longpage;
                               5459                 :                :     XLogRecord *record;
                               5460                 :                :     char       *recptr;
                               5461                 :                :     uint64      sysidentifier;
                               5462                 :                :     struct timeval tv;
                               5463                 :                :     pg_crc32c   crc;
                               5464                 :                : 
                               5465                 :                :     /* allow ordinary WAL segment creation, like StartupXLOG() would */
 1439 michael@paquier.xyz      5466                 :             56 :     SetInstallXLogFileSegmentActive();
                               5467                 :                : 
                               5468                 :                :     /*
                               5469                 :                :      * Select a hopefully-unique system identifier code for this installation.
                               5470                 :                :      * We use the result of gettimeofday(), including the fractional seconds
                               5471                 :                :      * field, as being about as unique as we can easily get.  (Think not to
                               5472                 :                :      * use random(), since it hasn't been seeded and there's no portable way
                               5473                 :                :      * to seed it other than the system clock value...)  The upper half of the
                               5474                 :                :      * uint64 value is just the tv_sec part, while the lower half contains the
                               5475                 :                :      * tv_usec part (which must fit in 20 bits), plus 12 bits from our current
                               5476                 :                :      * PID for a little extra uniqueness.  A person knowing this encoding can
                               5477                 :                :      * determine the initialization time of the installation, which could
                               5478                 :                :      * perhaps be useful sometimes.
                               5479                 :                :      */
 8201 tgl@sss.pgh.pa.us        5480                 :             56 :     gettimeofday(&tv, NULL);
                               5481                 :             56 :     sysidentifier = ((uint64) tv.tv_sec) << 32;
 4474                          5482                 :             56 :     sysidentifier |= ((uint64) tv.tv_usec) << 12;
                               5483                 :             56 :     sysidentifier |= getpid() & 0xFFF;
                               5484                 :                : 
  230 peter@eisentraut.org     5485                 :             56 :     memset(&buffer, 0, sizeof buffer);
                               5486                 :             56 :     page = (XLogPageHeader) &buffer;
                               5487                 :                : 
                               5488                 :                :     /*
                               5489                 :                :      * Set up information for the initial checkpoint record
                               5490                 :                :      *
                               5491                 :                :      * The initial checkpoint record is written to the beginning of the WAL
                               5492                 :                :      * segment with logid=0 logseg=1. The very first WAL segment, 0/0, is not
                               5493                 :                :      * used, so that we can use 0/0 to mean "before any valid WAL segment".
                               5494                 :                :      */
 3232 andres@anarazel.de       5495                 :             56 :     checkPoint.redo = wal_segment_size + SizeOfXLogLongPHD;
 1724 rhaas@postgresql.org     5496                 :             56 :     checkPoint.ThisTimeLineID = BootstrapTimeLineID;
                               5497                 :             56 :     checkPoint.PrevTimeLineID = BootstrapTimeLineID;
 5296 simon@2ndQuadrant.co     5498                 :             56 :     checkPoint.fullPageWrites = fullPageWrites;
  215 msawada@postgresql.o     5499                 :             56 :     checkPoint.logicalDecodingEnabled = (wal_level == WAL_LEVEL_LOGICAL);
  734 rhaas@postgresql.org     5500                 :             56 :     checkPoint.wal_level = wal_level;
                               5501                 :                :     checkPoint.nextXid =
 2677 tmunro@postgresql.or     5502                 :             56 :         FullTransactionIdFromEpochAndXid(0, FirstNormalTransactionId);
 1837 tgl@sss.pgh.pa.us        5503                 :             56 :     checkPoint.nextOid = FirstGenbkiObjectId;
 7759                          5504                 :             56 :     checkPoint.nextMulti = FirstMultiXactId;
  229 heikki.linnakangas@i     5505                 :             56 :     checkPoint.nextMultiOffset = 1;
 6173 tgl@sss.pgh.pa.us        5506                 :             56 :     checkPoint.oldestXid = FirstNormalTransactionId;
 1557                          5507                 :             56 :     checkPoint.oldestXidDB = Template1DbOid;
 4932 alvherre@alvh.no-ip.     5508                 :             56 :     checkPoint.oldestMulti = FirstMultiXactId;
 1557 tgl@sss.pgh.pa.us        5509                 :             56 :     checkPoint.oldestMultiDB = Template1DbOid;
 3863 mail@joeconway.com       5510                 :             56 :     checkPoint.oldestCommitTsXid = InvalidTransactionId;
                               5511                 :             56 :     checkPoint.newestCommitTsXid = InvalidTransactionId;
 6734 tgl@sss.pgh.pa.us        5512                 :             56 :     checkPoint.time = (pg_time_t) time(NULL);
 6063 simon@2ndQuadrant.co     5513                 :             56 :     checkPoint.oldestActiveXid = InvalidTransactionId;
  114 dgustafsson@postgres     5514                 :             56 :     checkPoint.dataChecksumState = data_checksum_version;
                               5515                 :                : 
  961 heikki.linnakangas@i     5516                 :             56 :     TransamVariables->nextXid = checkPoint.nextXid;
                               5517                 :             56 :     TransamVariables->nextOid = checkPoint.nextOid;
                               5518                 :             56 :     TransamVariables->oidCount = 0;
 7718 tgl@sss.pgh.pa.us        5519                 :             56 :     MultiXactSetNextMXact(checkPoint.nextMulti, checkPoint.nextMultiOffset);
 3412 rhaas@postgresql.org     5520                 :             56 :     AdvanceOldestClogXid(checkPoint.oldestXid);
 6003 tgl@sss.pgh.pa.us        5521                 :             56 :     SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB);
  229 heikki.linnakangas@i     5522                 :             56 :     SetMultiXactIdLimit(checkPoint.oldestMulti, checkPoint.oldestMultiDB);
 4253 alvherre@alvh.no-ip.     5523                 :             56 :     SetCommitTsLimit(InvalidTransactionId, InvalidTransactionId);
                               5524                 :                : 
                               5525                 :                :     /* Set up the XLOG page header */
 9790 vadim4o@yahoo.com        5526                 :             56 :     page->xlp_magic = XLOG_PAGE_MAGIC;
 8040 tgl@sss.pgh.pa.us        5527                 :             56 :     page->xlp_info = XLP_LONG_HEADER;
 1724 rhaas@postgresql.org     5528                 :             56 :     page->xlp_tli = BootstrapTimeLineID;
 3232 andres@anarazel.de       5529                 :             56 :     page->xlp_pageaddr = wal_segment_size;
 8040 tgl@sss.pgh.pa.us        5530                 :             56 :     longpage = (XLogLongPageHeader) page;
                               5531                 :             56 :     longpage->xlp_sysid = sysidentifier;
 3232 andres@anarazel.de       5532                 :             56 :     longpage->xlp_seg_size = wal_segment_size;
 7417 tgl@sss.pgh.pa.us        5533                 :             56 :     longpage->xlp_xlog_blcksz = XLOG_BLCKSZ;
                               5534                 :                : 
                               5535                 :                :     /* Insert the initial checkpoint record */
 4266 heikki.linnakangas@i     5536                 :             56 :     recptr = ((char *) page + SizeOfXLogLongPHD);
                               5537                 :             56 :     record = (XLogRecord *) recptr;
  178 alvherre@kurilemu.de     5538                 :             56 :     record->xl_prev = InvalidXLogRecPtr;
 9790 vadim4o@yahoo.com        5539                 :             56 :     record->xl_xid = InvalidTransactionId;
 4266 heikki.linnakangas@i     5540                 :             56 :     record->xl_tot_len = SizeOfXLogRecord + SizeOfXLogRecordDataHeaderShort + sizeof(checkPoint);
 9266 tgl@sss.pgh.pa.us        5541                 :             56 :     record->xl_info = XLOG_CHECKPOINT_SHUTDOWN;
 9790 vadim4o@yahoo.com        5542                 :             56 :     record->xl_rmid = RM_XLOG_ID;
 4266 heikki.linnakangas@i     5543                 :             56 :     recptr += SizeOfXLogRecord;
                               5544                 :                :     /* fill the XLogRecordDataHeaderShort struct */
 3407 tgl@sss.pgh.pa.us        5545                 :             56 :     *(recptr++) = (char) XLR_BLOCK_ID_DATA_SHORT;
 4266 heikki.linnakangas@i     5546                 :             56 :     *(recptr++) = sizeof(checkPoint);
                               5547                 :             56 :     memcpy(recptr, &checkPoint, sizeof(checkPoint));
                               5548                 :             56 :     recptr += sizeof(checkPoint);
                               5549         [ -  + ]:             56 :     Assert(recptr - (char *) record == record->xl_tot_len);
                               5550                 :                : 
 4282                          5551                 :             56 :     INIT_CRC32C(crc);
 4266                          5552                 :             56 :     COMP_CRC32C(crc, ((char *) record) + SizeOfXLogRecord, record->xl_tot_len - SizeOfXLogRecord);
 4282                          5553                 :             56 :     COMP_CRC32C(crc, (char *) record, offsetof(XLogRecord, xl_crc));
                               5554                 :             56 :     FIN_CRC32C(crc);
 9341 vadim4o@yahoo.com        5555                 :             56 :     record->xl_crc = crc;
                               5556                 :                : 
                               5557                 :                :     /* Create first XLOG segment file */
 1724 rhaas@postgresql.org     5558                 :             56 :     openLogTLI = BootstrapTimeLineID;
                               5559                 :             56 :     openLogFile = XLogFileInit(1, BootstrapTimeLineID);
                               5560                 :                : 
                               5561                 :                :     /*
                               5562                 :                :      * We needn't bother with Reserve/ReleaseExternalFD here, since we'll
                               5563                 :                :      * close the file again in a moment.
                               5564                 :                :      */
                               5565                 :                : 
                               5566                 :                :     /* Write the first page with the initial record */
 9181 tgl@sss.pgh.pa.us        5567                 :             56 :     errno = 0;
 3417 rhaas@postgresql.org     5568                 :             56 :     pgstat_report_wait_start(WAIT_EVENT_WAL_BOOTSTRAP_WRITE);
  230 peter@eisentraut.org     5569         [ -  + ]:             56 :     if (write(openLogFile, &buffer, XLOG_BLCKSZ) != XLOG_BLCKSZ)
                               5570                 :                :     {
                               5571                 :                :         /* if write didn't set errno, assume problem is no disk space */
 9181 tgl@sss.pgh.pa.us        5572         [ #  # ]:UBC           0 :         if (errno == 0)
                               5573                 :              0 :             errno = ENOSPC;
 8406                          5574         [ #  # ]:              0 :         ereport(PANIC,
                               5575                 :                :                 (errcode_for_file_access(),
                               5576                 :                :                  errmsg("could not write bootstrap write-ahead log file: %m")));
                               5577                 :                :     }
 3417 rhaas@postgresql.org     5578                 :CBC          56 :     pgstat_report_wait_end();
                               5579                 :                : 
                               5580                 :             56 :     pgstat_report_wait_start(WAIT_EVENT_WAL_BOOTSTRAP_SYNC);
 9266 tgl@sss.pgh.pa.us        5581         [ -  + ]:             56 :     if (pg_fsync(openLogFile) != 0)
 8406 tgl@sss.pgh.pa.us        5582         [ #  # ]:UBC           0 :         ereport(PANIC,
                               5583                 :                :                 (errcode_for_file_access(),
                               5584                 :                :                  errmsg("could not fsync bootstrap write-ahead log file: %m")));
 3417 rhaas@postgresql.org     5585                 :CBC          56 :     pgstat_report_wait_end();
                               5586                 :                : 
 2577 peter@eisentraut.org     5587         [ -  + ]:             56 :     if (close(openLogFile) != 0)
 8217 tgl@sss.pgh.pa.us        5588         [ #  # ]:UBC           0 :         ereport(PANIC,
                               5589                 :                :                 (errcode_for_file_access(),
                               5590                 :                :                  errmsg("could not close bootstrap write-ahead log file: %m")));
                               5591                 :                : 
 9266 tgl@sss.pgh.pa.us        5592                 :CBC          56 :     openLogFile = -1;
                               5593                 :                : 
                               5594                 :                :     /* Now create pg_control */
  733 peter@eisentraut.org     5595                 :             56 :     InitControlFile(sysidentifier, data_checksum_version);
 9266 tgl@sss.pgh.pa.us        5596                 :             56 :     ControlFile->time = checkPoint.time;
 9790 vadim4o@yahoo.com        5597                 :             56 :     ControlFile->checkPoint = checkPoint.redo;
 9266 tgl@sss.pgh.pa.us        5598                 :             56 :     ControlFile->checkPointCopy = checkPoint;
                               5599                 :                : 
                               5600                 :                :     /* some additional ControlFile fields are set in WriteControlFile() */
 9374                          5601                 :             56 :     WriteControlFile();
                               5602                 :                : 
                               5603                 :                :     /* Bootstrap the commit log, too */
 9101                          5604                 :             56 :     BootStrapCLOG();
 4253 alvherre@alvh.no-ip.     5605                 :             56 :     BootStrapCommitTs();
 8060 tgl@sss.pgh.pa.us        5606                 :             56 :     BootStrapSUBTRANS();
 7759                          5607                 :             56 :     BootStrapMultiXact();
                               5608                 :                : 
                               5609                 :                :     /*
                               5610                 :                :      * Force control file to be read - in contrast to normal processing we'd
                               5611                 :                :      * otherwise never run the checks and GUC related initializations therein.
                               5612                 :                :      */
 3238 andres@anarazel.de       5613                 :             56 :     ReadControlFile();
 9790 vadim4o@yahoo.com        5614                 :             56 : }
                               5615                 :                : 
                               5616                 :                : static char *
  358 tgl@sss.pgh.pa.us        5617                 :            943 : str_time(pg_time_t tnow, char *buf, size_t bufsize)
                               5618                 :                : {
                               5619                 :            943 :     pg_strftime(buf, bufsize,
                               5620                 :                :                 "%Y-%m-%d %H:%M:%S %Z",
 6931                          5621                 :            943 :                 pg_localtime(&tnow, log_timezone));
                               5622                 :                : 
 9378 peter_e@gmx.net          5623                 :            943 :     return buf;
                               5624                 :                : }
                               5625                 :                : 
                               5626                 :                : /*
                               5627                 :                :  * Initialize the first WAL segment on new timeline.
                               5628                 :                :  */
                               5629                 :                : static void
 1621 heikki.linnakangas@i     5630                 :             60 : XLogInitNewTimeline(TimeLineID endTLI, XLogRecPtr endOfLog, TimeLineID newTLI)
                               5631                 :                : {
                               5632                 :                :     char        xlogfname[MAXFNAMELEN];
                               5633                 :                :     XLogSegNo   endLogSegNo;
                               5634                 :                :     XLogSegNo   startLogSegNo;
                               5635                 :                : 
                               5636                 :                :     /* we always switch to a new timeline after archive recovery */
 1724 rhaas@postgresql.org     5637         [ -  + ]:             60 :     Assert(endTLI != newTLI);
                               5638                 :                : 
                               5639                 :                :     /*
                               5640                 :                :      * Update min recovery point one last time.
                               5641                 :                :      */
 6240 heikki.linnakangas@i     5642                 :             60 :     UpdateMinRecoveryPoint(InvalidXLogRecPtr, true);
                               5643                 :                : 
                               5644                 :                :     /*
                               5645                 :                :      * Calculate the last segment on the old timeline, and the first segment
                               5646                 :                :      * on the new timeline. If the switch happens in the middle of a segment,
                               5647                 :                :      * they are the same, but if the switch happens exactly at a segment
                               5648                 :                :      * boundary, startLogSegNo will be endLogSegNo + 1.
                               5649                 :                :      */
 3232 andres@anarazel.de       5650                 :             60 :     XLByteToPrevSeg(endOfLog, endLogSegNo, wal_segment_size);
                               5651                 :             60 :     XLByteToSeg(endOfLog, startLogSegNo, wal_segment_size);
                               5652                 :                : 
                               5653                 :                :     /*
                               5654                 :                :      * Initialize the starting WAL segment for the new timeline. If the switch
                               5655                 :                :      * happens in the middle of a segment, copy data from the last WAL segment
                               5656                 :                :      * of the old timeline up to the switch point, to the starting WAL segment
                               5657                 :                :      * on the new timeline.
                               5658                 :                :      */
 4238 heikki.linnakangas@i     5659         [ +  + ]:             60 :     if (endLogSegNo == startLogSegNo)
                               5660                 :                :     {
                               5661                 :                :         /*
                               5662                 :                :          * Make a copy of the file on the new timeline.
                               5663                 :                :          *
                               5664                 :                :          * Writing WAL isn't allowed yet, so there are no locking
                               5665                 :                :          * considerations. But we should be just as tense as XLogFileInit to
                               5666                 :                :          * avoid emplacing a bogus file.
                               5667                 :                :          */
 1724 rhaas@postgresql.org     5668                 :             48 :         XLogFileCopy(newTLI, endLogSegNo, endTLI, endLogSegNo,
 3232 andres@anarazel.de       5669                 :             48 :                      XLogSegmentOffset(endOfLog, wal_segment_size));
                               5670                 :                :     }
                               5671                 :                :     else
                               5672                 :                :     {
                               5673                 :                :         /*
                               5674                 :                :          * The switch happened at a segment boundary, so just create the next
                               5675                 :                :          * segment on the new timeline.
                               5676                 :                :          */
                               5677                 :                :         int         fd;
                               5678                 :                : 
 1724 rhaas@postgresql.org     5679                 :             12 :         fd = XLogFileInit(startLogSegNo, newTLI);
                               5680                 :                : 
 2577 peter@eisentraut.org     5681         [ -  + ]:             12 :         if (close(fd) != 0)
                               5682                 :                :         {
 2427 michael@paquier.xyz      5683                 :UBC           0 :             int         save_errno = errno;
                               5684                 :                : 
 1724 rhaas@postgresql.org     5685                 :              0 :             XLogFileName(xlogfname, newTLI, startLogSegNo, wal_segment_size);
 2427 michael@paquier.xyz      5686                 :              0 :             errno = save_errno;
 4235 heikki.linnakangas@i     5687         [ #  # ]:              0 :             ereport(ERROR,
                               5688                 :                :                     (errcode_for_file_access(),
                               5689                 :                :                      errmsg("could not close file \"%s\": %m", xlogfname)));
                               5690                 :                :         }
                               5691                 :                :     }
                               5692                 :                : 
                               5693                 :                :     /*
                               5694                 :                :      * Make sure that no .ready or .done flags are posted
                               5695                 :                :      * for the new segment.
                               5696                 :                :      */
 1724 rhaas@postgresql.org     5697                 :CBC          60 :     XLogFileName(xlogfname, newTLI, startLogSegNo, wal_segment_size);
 4294 fujii@postgresql.org     5698                 :             60 :     XLogArchiveCleanup(xlogfname);
 8042 tgl@sss.pgh.pa.us        5699                 :             60 : }
                               5700                 :                : 
                               5701                 :                : /*
                               5702                 :                :  * Perform cleanup actions at the conclusion of archive recovery.
                               5703                 :                :  */
                               5704                 :                : static void
 1724 rhaas@postgresql.org     5705                 :             60 : CleanupAfterArchiveRecovery(TimeLineID EndOfLogTLI, XLogRecPtr EndOfLog,
                               5706                 :                :                             TimeLineID newTLI)
                               5707                 :                : {
                               5708                 :                :     /*
                               5709                 :                :      * Execute the recovery_end_command, if any.
                               5710                 :                :      */
 1747                          5711   [ +  -  +  + ]:             60 :     if (recoveryEndCommand && strcmp(recoveryEndCommand, "") != 0)
 1266 michael@paquier.xyz      5712                 :              2 :         ExecuteRecoveryCommand(recoveryEndCommand,
                               5713                 :                :                                "recovery_end_command",
                               5714                 :                :                                true,
                               5715                 :                :                                WAIT_EVENT_RECOVERY_END_COMMAND);
                               5716                 :                : 
                               5717                 :                :     /*
                               5718                 :                :      * We switched to a new timeline. Clean up segments on the old timeline.
                               5719                 :                :      *
                               5720                 :                :      * If there are any higher-numbered segments on the old timeline, remove
                               5721                 :                :      * them. They might contain valid WAL, but they might also be
                               5722                 :                :      * pre-allocated files containing garbage. In any case, they are not part
                               5723                 :                :      * of the new timeline's history so we don't need them.
                               5724                 :                :      */
 1724 rhaas@postgresql.org     5725                 :             60 :     RemoveNonParentXlogFiles(EndOfLog, newTLI);
                               5726                 :                : 
                               5727                 :                :     /*
                               5728                 :                :      * If the switch happened in the middle of a segment, what to do with the
                               5729                 :                :      * last, partial segment on the old timeline? If we don't archive it, and
                               5730                 :                :      * the server that created the WAL never archives it either (e.g. because
                               5731                 :                :      * it was hit by a meteor), it will never make it to the archive. That's
                               5732                 :                :      * OK from our point of view, because the new segment that we created with
                               5733                 :                :      * the new TLI contains all the WAL from the old timeline up to the switch
                               5734                 :                :      * point. But if you later try to do PITR to the "missing" WAL on the old
                               5735                 :                :      * timeline, recovery won't find it in the archive. It's physically
                               5736                 :                :      * present in the new file with new TLI, but recovery won't look there
                               5737                 :                :      * when it's recovering to the older timeline. On the other hand, if we
                               5738                 :                :      * archive the partial segment, and the original server on that timeline
                               5739                 :                :      * is still running and archives the completed version of the same segment
                               5740                 :                :      * later, its archiving attempt will fail. (We used to do that in 9.4 and
                               5741                 :                :      * below, and it caused such problems.)
                               5742                 :                :      *
                               5743                 :                :      * As a compromise, we rename the last segment with the .partial suffix,
                               5744                 :                :      * and archive it. Archive recovery will never try to read .partial
                               5745                 :                :      * segments, so they will normally go unused. But in the odd PITR case,
                               5746                 :                :      * the administrator can copy them manually to the pg_wal directory
                               5747                 :                :      * (removing the suffix). They can be useful in debugging, too.
                               5748                 :                :      *
                               5749                 :                :      * If a .done or .ready file already exists for the old timeline, however,
                               5750                 :                :      * we had already determined that the segment is complete, so we can let
                               5751                 :                :      * it be archived normally. (In particular, if it was restored from the
                               5752                 :                :      * archive to begin with, it's expected to have a .done file).
                               5753                 :                :      */
 1747                          5754   [ +  +  +  + ]:            108 :     if (XLogSegmentOffset(EndOfLog, wal_segment_size) != 0 &&
                               5755   [ +  +  -  + ]:             48 :         XLogArchivingActive())
                               5756                 :                :     {
                               5757                 :                :         char        origfname[MAXFNAMELEN];
                               5758                 :                :         XLogSegNo   endLogSegNo;
                               5759                 :                : 
                               5760                 :             10 :         XLByteToPrevSeg(EndOfLog, endLogSegNo, wal_segment_size);
                               5761                 :             10 :         XLogFileName(origfname, EndOfLogTLI, endLogSegNo, wal_segment_size);
                               5762                 :                : 
                               5763         [ +  + ]:             10 :         if (!XLogArchiveIsReadyOrDone(origfname))
                               5764                 :                :         {
                               5765                 :                :             char        origpath[MAXPGPATH];
                               5766                 :                :             char        partialfname[MAXFNAMELEN];
                               5767                 :                :             char        partialpath[MAXPGPATH];
                               5768                 :                : 
                               5769                 :                :             /*
                               5770                 :                :              * If we're summarizing WAL, we can't rename the partial file
                               5771                 :                :              * until the summarizer finishes with it, else it will fail.
                               5772                 :                :              */
  730                          5773         [ +  + ]:              6 :             if (summarize_wal)
                               5774                 :              1 :                 WaitForWalSummarization(EndOfLog);
                               5775                 :                : 
 1747                          5776                 :              6 :             XLogFilePath(origpath, EndOfLogTLI, endLogSegNo, wal_segment_size);
                               5777                 :              6 :             snprintf(partialfname, MAXFNAMELEN, "%s.partial", origfname);
                               5778                 :              6 :             snprintf(partialpath, MAXPGPATH, "%s.partial", origpath);
                               5779                 :                : 
                               5780                 :                :             /*
                               5781                 :                :              * Make sure there's no .done or .ready file for the .partial
                               5782                 :                :              * file.
                               5783                 :                :              */
                               5784                 :              6 :             XLogArchiveCleanup(partialfname);
                               5785                 :                : 
                               5786                 :              6 :             durable_rename(origpath, partialpath, ERROR);
                               5787                 :              6 :             XLogArchiveNotify(partialfname);
                               5788                 :                :         }
                               5789                 :                :     }
                               5790                 :             60 : }
                               5791                 :                : 
                               5792                 :                : /*
                               5793                 :                :  * Check to see if required parameters are set high enough on this server
                               5794                 :                :  * for various aspects of recovery operation.
                               5795                 :                :  *
                               5796                 :                :  * Note that all the parameters which this function tests need to be
                               5797                 :                :  * listed in the Administrator's Overview section of high-availability.sgml.
                               5798                 :                :  * If you change them, don't forget to update the list.
                               5799                 :                :  */
                               5800                 :                : static void
 1621 heikki.linnakangas@i     5801                 :            273 : CheckRequiredParameterValues(void)
                               5802                 :                : {
                               5803                 :                :     /*
                               5804                 :                :      * For archive recovery, the WAL must be generated with at least 'replica'
                               5805                 :                :      * wal_level.
                               5806                 :                :      */
                               5807   [ +  +  +  + ]:            273 :     if (ArchiveRecoveryRequested && ControlFile->wal_level == WAL_LEVEL_MINIMAL)
                               5808                 :                :     {
                               5809         [ +  - ]:              2 :         ereport(FATAL,
                               5810                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               5811                 :                :                  errmsg("WAL was generated with \"wal_level=minimal\", cannot continue recovering"),
                               5812                 :                :                  errdetail("This happens if you temporarily set \"wal_level=minimal\" on the server."),
                               5813                 :                :                  errhint("Use a backup taken after setting \"wal_level\" to higher than \"minimal\".")));
                               5814                 :                :     }
                               5815                 :                : 
                               5816                 :                :     /*
                               5817                 :                :      * For Hot Standby, the WAL must be generated with 'replica' mode, and we
                               5818                 :                :      * must have at least as many backend slots as the primary.
                               5819                 :                :      */
 4526                          5820   [ +  +  +  + ]:            271 :     if (ArchiveRecoveryRequested && EnableHotStandby)
                               5821                 :                :     {
                               5822                 :                :         /* We ignore autovacuum_worker_slots when we make this test. */
 5933                          5823                 :            146 :         RecoveryRequiresIntParameter("max_connections",
                               5824                 :                :                                      MaxConnections,
 5932 tgl@sss.pgh.pa.us        5825                 :            146 :                                      ControlFile->MaxConnections);
 4770 rhaas@postgresql.org     5826                 :            146 :         RecoveryRequiresIntParameter("max_worker_processes",
                               5827                 :                :                                      max_worker_processes,
                               5828                 :            146 :                                      ControlFile->max_worker_processes);
 2721 michael@paquier.xyz      5829                 :            146 :         RecoveryRequiresIntParameter("max_wal_senders",
                               5830                 :                :                                      max_wal_senders,
                               5831                 :            146 :                                      ControlFile->max_wal_senders);
 5072 tgl@sss.pgh.pa.us        5832                 :            146 :         RecoveryRequiresIntParameter("max_prepared_transactions",
                               5833                 :                :                                      max_prepared_xacts,
 5932                          5834                 :            146 :                                      ControlFile->max_prepared_xacts);
 5072                          5835                 :            146 :         RecoveryRequiresIntParameter("max_locks_per_transaction",
                               5836                 :                :                                      max_locks_per_xact,
 5932                          5837                 :            146 :                                      ControlFile->max_locks_per_xact);
                               5838                 :                :     }
 6063 simon@2ndQuadrant.co     5839                 :            271 : }
                               5840                 :                : 
                               5841                 :                : /*
                               5842                 :                :  * This must be called ONCE during postmaster or standalone-backend startup
                               5843                 :                :  */
                               5844                 :                : void
 9266 tgl@sss.pgh.pa.us        5845                 :           1065 : StartupXLOG(void)
                               5846                 :                : {
                               5847                 :                :     XLogCtlInsert *Insert;
                               5848                 :                :     CheckPoint  checkPoint;
                               5849                 :                :     bool        wasShutdown;
                               5850                 :                :     bool        didCrash;
                               5851                 :                :     bool        haveTblspcMap;
                               5852                 :                :     bool        haveBackupLabel;
                               5853                 :                :     XLogRecPtr  EndOfLog;
                               5854                 :                :     TimeLineID  EndOfLogTLI;
                               5855                 :                :     TimeLineID  newTLI;
                               5856                 :                :     bool        performedWalRecovery;
                               5857                 :                :     EndOfWalRecoveryInfo *endOfRecoveryInfo;
                               5858                 :                :     XLogRecPtr  abortedRecPtr;
                               5859                 :                :     XLogRecPtr  missingContrecPtr;
                               5860                 :                :     TransactionId oldestActiveXID;
 2188 fujii@postgresql.org     5861                 :           1065 :     bool        promoted = false;
                               5862                 :                :     char        timebuf[128];
                               5863                 :                : 
                               5864                 :                :     /*
                               5865                 :                :      * We should have an aux process resource owner to use, and we should not
                               5866                 :                :      * be in a transaction that's installed some other resowner.
                               5867                 :                :      */
 2930 tgl@sss.pgh.pa.us        5868         [ -  + ]:           1065 :     Assert(AuxProcessResourceOwner != NULL);
                               5869   [ +  -  -  + ]:           1065 :     Assert(CurrentResourceOwner == NULL ||
                               5870                 :                :            CurrentResourceOwner == AuxProcessResourceOwner);
                               5871                 :           1065 :     CurrentResourceOwner = AuxProcessResourceOwner;
                               5872                 :                : 
                               5873                 :                :     /*
                               5874                 :                :      * Check that the control file contents look valid.
                               5875                 :                :      */
 2452 peter@eisentraut.org     5876         [ -  + ]:           1065 :     if (!XRecOffIsValid(ControlFile->checkPoint))
 8406 tgl@sss.pgh.pa.us        5877         [ #  # ]:UBC           0 :         ereport(FATAL,
                               5878                 :                :                 (errcode(ERRCODE_DATA_CORRUPTED),
                               5879                 :                :                  errmsg("control file contains invalid checkpoint location")));
                               5880                 :                : 
 2452 peter@eisentraut.org     5881   [ +  +  -  -  :CBC        1065 :     switch (ControlFile->state)
                                           +  +  - ]
                               5882                 :                :     {
                               5883                 :            831 :         case DB_SHUTDOWNED:
                               5884                 :                : 
                               5885                 :                :             /*
                               5886                 :                :              * This is the expected case, so don't be chatty in standalone
                               5887                 :                :              * mode
                               5888                 :                :              */
                               5889   [ +  +  +  + ]:            831 :             ereport(IsPostmasterEnvironment ? LOG : NOTICE,
                               5890                 :                :                     (errmsg("database system was shut down at %s",
                               5891                 :                :                             str_time(ControlFile->time,
                               5892                 :                :                                      timebuf, sizeof(timebuf)))));
                               5893                 :            831 :             break;
                               5894                 :                : 
                               5895                 :             34 :         case DB_SHUTDOWNED_IN_RECOVERY:
                               5896         [ +  - ]:             34 :             ereport(LOG,
                               5897                 :                :                     (errmsg("database system was shut down in recovery at %s",
                               5898                 :                :                             str_time(ControlFile->time,
                               5899                 :                :                                      timebuf, sizeof(timebuf)))));
                               5900                 :             34 :             break;
                               5901                 :                : 
 2452 peter@eisentraut.org     5902                 :UBC           0 :         case DB_SHUTDOWNING:
                               5903         [ #  # ]:              0 :             ereport(LOG,
                               5904                 :                :                     (errmsg("database system shutdown was interrupted; last known up at %s",
                               5905                 :                :                             str_time(ControlFile->time,
                               5906                 :                :                                      timebuf, sizeof(timebuf)))));
                               5907                 :              0 :             break;
                               5908                 :                : 
                               5909                 :              0 :         case DB_IN_CRASH_RECOVERY:
                               5910         [ #  # ]:              0 :             ereport(LOG,
                               5911                 :                :                     (errmsg("database system was interrupted while in recovery at %s",
                               5912                 :                :                             str_time(ControlFile->time,
                               5913                 :                :                                      timebuf, sizeof(timebuf))),
                               5914                 :                :                      errhint("This probably means that some data is corrupted and"
                               5915                 :                :                              " you will have to use the last backup for recovery.")));
                               5916                 :              0 :             break;
                               5917                 :                : 
 2452 peter@eisentraut.org     5918                 :CBC           9 :         case DB_IN_ARCHIVE_RECOVERY:
                               5919         [ +  - ]:              9 :             ereport(LOG,
                               5920                 :                :                     (errmsg("database system was interrupted while in recovery at log time %s",
                               5921                 :                :                             str_time(ControlFile->checkPointCopy.time,
                               5922                 :                :                                      timebuf, sizeof(timebuf))),
                               5923                 :                :                      errhint("If this has occurred more than once some data might be corrupted"
                               5924                 :                :                              " and you might need to choose an earlier recovery target.")));
                               5925                 :              9 :             break;
                               5926                 :                : 
                               5927                 :            191 :         case DB_IN_PRODUCTION:
                               5928         [ +  - ]:            191 :             ereport(LOG,
                               5929                 :                :                     (errmsg("database system was interrupted; last known up at %s",
                               5930                 :                :                             str_time(ControlFile->time,
                               5931                 :                :                                      timebuf, sizeof(timebuf)))));
                               5932                 :            191 :             break;
                               5933                 :                : 
 2452 peter@eisentraut.org     5934                 :UBC           0 :         default:
                               5935         [ #  # ]:              0 :             ereport(FATAL,
                               5936                 :                :                     (errcode(ERRCODE_DATA_CORRUPTED),
                               5937                 :                :                      errmsg("control file contains invalid database cluster state")));
                               5938                 :                :     }
                               5939                 :                : 
                               5940                 :                :     /* This is just to allow attaching to startup process with a debugger */
                               5941                 :                : #ifdef XLOG_REPLAY_DELAY
                               5942                 :                :     if (ControlFile->state != DB_SHUTDOWNED)
                               5943                 :                :         pg_usleep(60000000L);
                               5944                 :                : #endif
                               5945                 :                : 
                               5946                 :                :     /*
                               5947                 :                :      * Verify that pg_wal, pg_wal/archive_status, and pg_wal/summaries exist.
                               5948                 :                :      * In cases where someone has performed a copy for PITR, these directories
                               5949                 :                :      * may have been excluded and need to be re-created.
                               5950                 :                :      */
 6468 tgl@sss.pgh.pa.us        5951                 :CBC        1065 :     ValidateXLOGDirectoryStructure();
                               5952                 :                : 
                               5953                 :                :     /* Set up timeout handler needed to report startup progress. */
 1735 rhaas@postgresql.org     5954         [ +  + ]:           1065 :     if (!IsBootstrapProcessingMode())
                               5955                 :           1009 :         RegisterTimeout(STARTUP_PROGRESS_TIMEOUT,
                               5956                 :                :                         startup_progress_timeout_handler);
                               5957                 :                : 
                               5958                 :                :     /*----------
                               5959                 :                :      * If we previously crashed, perform a couple of actions:
                               5960                 :                :      *
                               5961                 :                :      * - The pg_wal directory may still include some temporary WAL segments
                               5962                 :                :      *   used when creating a new segment, so clean them up to avoid
                               5963                 :                :      *   bloating this directory.  This is done first because there is no
                               5964                 :                :      *   point in syncing this temporary data.
                               5965                 :                :      *
                               5966                 :                :      * - There might be data which we had written, intending to fsync it, but
                               5967                 :                :      *   which we had not actually fsync'd yet.  Therefore, a power failure in
                               5968                 :                :      *   the near future might cause earlier unflushed writes to be lost, even
                               5969                 :                :      *   though more recent data written to disk from here on would be
                               5970                 :                :      *   persisted.  To avoid that, fsync the entire data directory.
                               5971                 :                :      */
 1621 heikki.linnakangas@i     5972         [ +  + ]:           1065 :     if (ControlFile->state != DB_SHUTDOWNED &&
                               5973         [ +  + ]:            234 :         ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
                               5974                 :                :     {
                               5975                 :            200 :         RemoveTempXlogFiles();
                               5976                 :            200 :         SyncDataDirectory();
 1572 andres@anarazel.de       5977                 :            200 :         didCrash = true;
                               5978                 :                :     }
                               5979                 :                :     else
                               5980                 :            865 :         didCrash = false;
                               5981                 :                : 
                               5982                 :                :     /*
                               5983                 :                :      * Prepare for WAL recovery if needed.
                               5984                 :                :      *
                               5985                 :                :      * InitWalRecovery analyzes the control file and the backup label file, if
                               5986                 :                :      * any.  It updates the in-memory ControlFile buffer according to the
                               5987                 :                :      * starting checkpoint, and sets InRecovery and ArchiveRecoveryRequested.
                               5988                 :                :      * It also applies the tablespace map file, if any.
                               5989                 :                :      */
 1621 heikki.linnakangas@i     5990                 :           1065 :     InitWalRecovery(ControlFile, &wasShutdown,
                               5991                 :                :                     &haveBackupLabel, &haveTblspcMap);
                               5992                 :           1059 :     checkPoint = ControlFile->checkPointCopy;
                               5993                 :                : 
                               5994                 :                :     /* initialize shared memory variables from the checkpoint record */
  961                          5995                 :           1059 :     TransamVariables->nextXid = checkPoint.nextXid;
                               5996                 :           1059 :     TransamVariables->nextOid = checkPoint.nextOid;
                               5997                 :           1059 :     TransamVariables->oidCount = 0;
 7718 tgl@sss.pgh.pa.us        5998                 :           1059 :     MultiXactSetNextMXact(checkPoint.nextMulti, checkPoint.nextMultiOffset);
 3412 rhaas@postgresql.org     5999                 :           1059 :     AdvanceOldestClogXid(checkPoint.oldestXid);
 6003 tgl@sss.pgh.pa.us        6000                 :           1059 :     SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB);
  229 heikki.linnakangas@i     6001                 :           1059 :     SetMultiXactIdLimit(checkPoint.oldestMulti, checkPoint.oldestMultiDB);
 3863 mail@joeconway.com       6002                 :           1059 :     SetCommitTsLimit(checkPoint.oldestCommitTsXid,
                               6003                 :                :                      checkPoint.newestCommitTsXid);
                               6004                 :                : 
                               6005                 :                :     /*
                               6006                 :                :      * Clear out any old relcache cache files.  This is *necessary* if we do
                               6007                 :                :      * any WAL replay, since that would probably result in the cache files
                               6008                 :                :      * being out of sync with database reality.  In theory we could leave them
                               6009                 :                :      * in place if the database had been cleanly shut down, but it seems
                               6010                 :                :      * safest to just remove them always and let them be rebuilt during the
                               6011                 :                :      * first backend startup.  These files need to be removed from all
                               6012                 :                :      * directories including pg_tblspc; however, the symlinks are created only
                               6013                 :                :      * after reading the tablespace_map file in case of archive recovery from
                               6014                 :                :      * backup, so we need to clear old relcache files here after creating the
                               6015                 :                :      * symlinks.
                               6016                 :                :      */
 1621 heikki.linnakangas@i     6017                 :           1059 :     RelationCacheInitFileRemove();
                               6018                 :                : 
                               6019                 :                :     /*
                               6020                 :                :      * Initialize replication slots, before there's a chance to remove
                               6021                 :                :      * required resources.
                               6022                 :                :      */
 4427 andres@anarazel.de       6023                 :           1059 :     StartupReplicationSlots();
                               6024                 :                : 
                               6025                 :                :     /*
                               6026                 :                :      * Startup the logical decoding status with the last status stored in the
                               6027                 :                :      * checkpoint record.
                               6028                 :                :      */
  215 msawada@postgresql.o     6029                 :           1057 :     StartupLogicalDecodingStatus(checkPoint.logicalDecodingEnabled);
                               6030                 :                : 
                               6031                 :                :     /*
                               6032                 :                :      * Startup logical state, needs to be set up now so we have proper data
                               6033                 :                :      * during crash recovery.
                               6034                 :                :      */
 4528 rhaas@postgresql.org     6035                 :           1057 :     StartupReorderBuffer();
                               6036                 :                : 
                               6037                 :                :     /*
                               6038                 :                :      * Startup CLOG. This must be done after TransamVariables->nextXid has
                               6039                 :                :      * been initialized and before we accept connections or begin WAL replay.
                               6040                 :                :      */
 2006                          6041                 :           1057 :     StartupCLOG();
                               6042                 :                : 
                               6043                 :                :     /*
                               6044                 :                :      * Startup MultiXact. We need to do this early to be able to replay
                               6045                 :                :      * truncations.
                               6046                 :                :      */
 4622 alvherre@alvh.no-ip.     6047                 :           1057 :     StartupMultiXact();
                               6048                 :                : 
                               6049                 :                :     /*
                               6050                 :                :      * Ditto for commit timestamps.  Activate the facility if the setting is
                               6051                 :                :      * enabled in the control file, as there should be no tracking of commit
                               6052                 :                :      * timestamps done when the setting was disabled.  This facility can be
                               6053                 :                :      * started or stopped when replaying a XLOG_PARAMETER_CHANGE record.
                               6054                 :                :      */
 2860 michael@paquier.xyz      6055         [ +  + ]:           1057 :     if (ControlFile->track_commit_timestamp)
 3880 alvherre@alvh.no-ip.     6056                 :             14 :         StartupCommitTs();
                               6057                 :                : 
                               6058                 :                :     /*
                               6059                 :                :      * Recover knowledge about replay progress of known replication partners.
                               6060                 :                :      */
 4106 andres@anarazel.de       6061                 :           1057 :     StartupReplicationOrigin();
                               6062                 :                : 
                               6063                 :                :     /*
                               6064                 :                :      * Initialize unlogged LSN. On a clean shutdown, it's restored from the
                               6065                 :                :      * control file. On recovery, all unlogged relations are blown away, so
                               6066                 :                :      * the unlogged LSN counter can be reset too.
                               6067                 :                :      */
 4913 heikki.linnakangas@i     6068         [ +  + ]:           1057 :     if (ControlFile->state == DB_SHUTDOWNED)
  878 nathan@postgresql.or     6069                 :            822 :         pg_atomic_write_membarrier_u64(&XLogCtl->unloggedLSN,
                               6070                 :            822 :                                        ControlFile->unloggedLSN);
                               6071                 :                :     else
                               6072                 :            235 :         pg_atomic_write_membarrier_u64(&XLogCtl->unloggedLSN,
                               6073                 :                :                                        FirstNormalUnloggedLSN);
                               6074                 :                : 
                               6075                 :                :     /*
                               6076                 :                :      * Copy any missing timeline history files between 'now' and the recovery
                               6077                 :                :      * target timeline from archive to pg_wal. While we don't need those files
                               6078                 :                :      * ourselves - the history file of the recovery target timeline covers all
                               6079                 :                :      * the previous timelines in the history too - a cascading standby server
                               6080                 :                :      * might be interested in them. Or, if you archive the WAL from this
                               6081                 :                :      * server to a different archive than the primary, it'd be good for all
                               6082                 :                :      * the history files to get archived there after failover, so that you can
                               6083                 :                :      * use one of the old timelines as a PITR target. Timeline history files
                               6084                 :                :      * are small, so it's better to copy them unnecessarily than not copy them
                               6085                 :                :      * and regret it later.
                               6086                 :                :      */
 1621 heikki.linnakangas@i     6087                 :           1057 :     restoreTimeLineHistoryFiles(checkPoint.ThisTimeLineID, recoveryTargetTLI);
                               6088                 :                : 
                               6089                 :                :     /*
                               6090                 :                :      * Before running in recovery, scan pg_twophase and fill in its status to
                               6091                 :                :      * be able to work on entries generated by redo.  Doing a scan before
                               6092                 :                :      * taking any recovery action has the merit of discarding any 2PC files
                               6093                 :                :      * that are newer than the first record to replay, which avoids conflicts
                               6094                 :                :      * at replay.  It also avoids any subsequent scans of the on-disk two-phase
                               6095                 :                :      * data during recovery.
                               6096                 :                :      */
 3400 simon@2ndQuadrant.co     6097                 :           1057 :     restoreTwoPhaseData();
                               6098                 :                : 
                               6099                 :                :     /*
                               6100                 :                :      * When starting with crash recovery, reset pgstat data - it might not be
                               6101                 :                :      * valid. Otherwise restore pgstat data. It's safe to do this here,
                               6102                 :                :      * because postmaster will not yet have started any other processes.
                               6103                 :                :      *
                               6104                 :                :      * NB: Restoring replication slot stats relies on slot state to have
                               6105                 :                :      * already been restored from disk.
                               6106                 :                :      *
                               6107                 :                :      * TODO: With a bit of extra work we could just start with a pgstat file
                               6108                 :                :      * associated with the checkpoint redo location we're starting from.
                               6109                 :                :      */
 1572 andres@anarazel.de       6110         [ +  + ]:           1057 :     if (didCrash)
                               6111                 :            194 :         pgstat_discard_stats();
                               6112                 :                :     else
  496 michael@paquier.xyz      6113                 :            863 :         pgstat_restore_stats();
                               6114                 :                : 
 5296 simon@2ndQuadrant.co     6115                 :           1057 :     lastFullPageWrites = checkPoint.fullPageWrites;
                               6116                 :                : 
 4766 heikki.linnakangas@i     6117                 :           1057 :     RedoRecPtr = XLogCtl->RedoRecPtr = XLogCtl->Insert.RedoRecPtr = checkPoint.redo;
 4280                          6118                 :           1057 :     doPageWrites = lastFullPageWrites;
                               6119                 :                : 
                               6120                 :                :     /* REDO */
 8026 tgl@sss.pgh.pa.us        6121         [ +  + ]:           1057 :     if (InRecovery)
                               6122                 :                :     {
                               6123                 :                :         /* Initialize state for RecoveryInProgress() */
 1621 heikki.linnakangas@i     6124                 :            235 :         SpinLockAcquire(&XLogCtl->info_lck);
                               6125         [ +  + ]:            235 :         if (InArchiveRecovery)
                               6126                 :            133 :             XLogCtl->SharedRecoveryState = RECOVERY_STATE_ARCHIVE;
                               6127                 :                :         else
                               6128                 :            102 :             XLogCtl->SharedRecoveryState = RECOVERY_STATE_CRASH;
                               6129                 :            235 :         SpinLockRelease(&XLogCtl->info_lck);
                               6130                 :                : 
                               6131                 :                :         /*
                               6132                 :                :          * Update pg_control to show that we are recovering and to show the
                               6133                 :                :          * selected checkpoint as the place we are starting from. We also mark
                               6134                 :                :          * pg_control with any minimum recovery stop point obtained from a
                               6135                 :                :          * backup history file.
                               6136                 :                :          *
                               6137                 :                :          * No need to hold ControlFileLock yet, we aren't up far enough.
                               6138                 :                :          */
                               6139                 :            235 :         UpdateControlFile();
                               6140                 :                : 
                               6141                 :                :         /*
                               6142                 :                :          * If there was a backup label file, it's done its job and the info
                               6143                 :                :          * has now been propagated into pg_control.  We must get rid of the
                               6144                 :                :          * label file so that if we crash during recovery, we'll pick up at
                               6145                 :                :          * the latest recovery restartpoint instead of going all the way back
                               6146                 :                :          * to the backup start point.  It seems prudent though to just rename
                               6147                 :                :          * the file out of the way rather than delete it completely.
                               6148                 :                :          */
                               6149         [ +  + ]:            235 :         if (haveBackupLabel)
                               6150                 :                :         {
                               6151                 :             90 :             unlink(BACKUP_LABEL_OLD);
                               6152                 :             90 :             durable_rename(BACKUP_LABEL_FILE, BACKUP_LABEL_OLD, FATAL);
                               6153                 :                :         }
                               6154                 :                : 
                               6155                 :                :         /*
                               6156                 :                :          * If there was a tablespace_map file, it's done its job and the
                               6157                 :                :          * symlinks have been created.  We must get rid of the map file so
                               6158                 :                :          * that if we crash during recovery, we don't create symlinks again.
                               6159                 :                :          * It seems prudent though to just rename the file out of the way
                               6160                 :                :          * rather than delete it completely.
                               6161                 :                :          */
                               6162         [ +  + ]:            235 :         if (haveTblspcMap)
                               6163                 :                :         {
                               6164                 :              2 :             unlink(TABLESPACE_MAP_OLD);
                               6165                 :              2 :             durable_rename(TABLESPACE_MAP, TABLESPACE_MAP_OLD, FATAL);
                               6166                 :                :         }
                               6167                 :                : 
                               6168                 :                :         /*
                               6169                 :                :          * Initialize our local copy of minRecoveryPoint.  When doing crash
                               6170                 :                :          * recovery we want to replay up to the end of WAL.  In particular, in
                               6171                 :                :          * the case of a promoted standby, the minRecoveryPoint value in the
                               6172                 :                :          * control file is only updated after the first checkpoint.  However,
                               6173                 :                :          * if the instance crashes before the first post-recovery checkpoint
                               6174                 :                :          * is completed, then recovery will use a stale location, causing the
                               6175                 :                :          * startup process to think that there are still invalid page
                               6176                 :                :          * references when checking for data consistency.
                               6177                 :                :          */
 2943 michael@paquier.xyz      6178         [ +  + ]:            235 :         if (InArchiveRecovery)
                               6179                 :                :         {
 1621 heikki.linnakangas@i     6180                 :            133 :             LocalMinRecoveryPoint = ControlFile->minRecoveryPoint;
                               6181                 :            133 :             LocalMinRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
                               6182                 :                :         }
                               6183                 :                :         else
                               6184                 :                :         {
                               6185                 :            102 :             LocalMinRecoveryPoint = InvalidXLogRecPtr;
                               6186                 :            102 :             LocalMinRecoveryPointTLI = 0;
                               6187                 :                :         }
                               6188                 :                : 
                               6189                 :                :         /* Check that the GUCs used to generate the WAL allow recovery */
 5933                          6190                 :            235 :         CheckRequiredParameterValues();
                               6191                 :                : 
                               6192                 :                :         /*
                               6193                 :                :          * We're in recovery, so unlogged relations may be trashed and must be
                               6194                 :                :          * reset.  This should be done BEFORE allowing Hot Standby
                               6195                 :                :          * connections, so that read-only backends don't try to read whatever
                               6196                 :                :          * garbage is left over from before.
                               6197                 :                :          */
 5688 rhaas@postgresql.org     6198                 :            235 :         ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP);
                               6199                 :                : 
                               6200                 :                :         /*
                               6201                 :                :          * Likewise, delete any saved transaction snapshot files that got left
                               6202                 :                :          * behind by crashed backends.
                               6203                 :                :          */
 5391 tgl@sss.pgh.pa.us        6204                 :            235 :         DeleteAllExportedSnapshotFiles();
                               6205                 :                : 
                               6206                 :                :         /*
                               6207                 :                :          * Initialize for Hot Standby, if enabled. We won't let backends in
                               6208                 :                :          * yet, not until we've reached the min recovery point specified in
                               6209                 :                :          * the control file and we've established a recovery snapshot from a
                               6210                 :                :          * running-xacts WAL record.
                               6211                 :                :          */
 4902 heikki.linnakangas@i     6212   [ +  +  +  + ]:            235 :         if (ArchiveRecoveryRequested && EnableHotStandby)
                               6213                 :                :         {
                               6214                 :                :             TransactionId *xids;
                               6215                 :                :             int         nxids;
                               6216                 :                : 
 6008                          6217         [ +  + ]:            124 :             ereport(DEBUG1,
                               6218                 :                :                     (errmsg_internal("initializing for hot standby")));
                               6219                 :                : 
 6063 simon@2ndQuadrant.co     6220                 :            124 :             InitRecoveryTransactionEnvironment();
                               6221                 :                : 
                               6222         [ +  + ]:            124 :             if (wasShutdown)
                               6223                 :             26 :                 oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
                               6224                 :                :             else
                               6225                 :             98 :                 oldestActiveXID = checkPoint.oldestActiveXid;
                               6226         [ -  + ]:            124 :             Assert(TransactionIdIsValid(oldestActiveXID));
                               6227                 :                : 
                               6228                 :                :             /* Tell procarray about the range of xids it has to deal with */
  961 heikki.linnakangas@i     6229                 :            124 :             ProcArrayInitRecovery(XidFromFullTransactionId(TransamVariables->nextXid));
                               6230                 :                : 
                               6231                 :                :             /*
                               6232                 :                :              * Startup subtrans only.  CLOG, MultiXact and commit timestamp
                               6233                 :                :              * have already been started up and other SLRUs are not maintained
                               6234                 :                :              * during recovery and need not be started yet.
                               6235                 :                :              */
 6063 simon@2ndQuadrant.co     6236                 :            124 :             StartupSUBTRANS(oldestActiveXID);
                               6237                 :                : 
                               6238                 :                :             /*
                               6239                 :                :              * If we're beginning at a shutdown checkpoint, we know that
                               6240                 :                :              * nothing was running on the primary at this point. So fake up an
                               6241                 :                :              * empty running-xacts record and use that here and now. Recover
                               6242                 :                :              * additional standby state for prepared transactions.
                               6243                 :                :              */
 5948 heikki.linnakangas@i     6244         [ +  + ]:            124 :             if (wasShutdown)
                               6245                 :                :             {
                               6246                 :                :                 RunningTransactionsData running;
                               6247                 :                :                 TransactionId latestCompletedXid;
                               6248                 :                : 
                               6249                 :                :                 /* Update pg_subtrans entries for any prepared transactions */
  759                          6250                 :             26 :                 StandbyRecoverPreparedTransactions();
                               6251                 :                : 
                               6252                 :                :                 /*
                               6253                 :                :                  * Construct a RunningTransactions snapshot representing a
                               6254                 :                :                  * shut down server, with only prepared transactions still
                               6255                 :                :                  * alive. We're never overflowed at this point because all
                               6256                 :                :                  * subxids are listed with their parent prepared transactions.
                               6257                 :                :                  */
 5948                          6258                 :             26 :                 running.xcnt = nxids;
 4984 simon@2ndQuadrant.co     6259                 :             26 :                 running.subxcnt = 0;
  759 heikki.linnakangas@i     6260                 :             26 :                 running.subxid_status = SUBXIDS_IN_SUBTRANS;
 2175 andres@anarazel.de       6261                 :             26 :                 running.nextXid = XidFromFullTransactionId(checkPoint.nextXid);
 5948 heikki.linnakangas@i     6262                 :             26 :                 running.oldestRunningXid = oldestActiveXID;
 2175 andres@anarazel.de       6263                 :             26 :                 latestCompletedXid = XidFromFullTransactionId(checkPoint.nextXid);
 5918 simon@2ndQuadrant.co     6264         [ -  + ]:             26 :                 TransactionIdRetreat(latestCompletedXid);
 5917                          6265         [ -  + ]:             26 :                 Assert(TransactionIdIsNormal(latestCompletedXid));
 5918                          6266                 :             26 :                 running.latestCompletedXid = latestCompletedXid;
 5948 heikki.linnakangas@i     6267                 :             26 :                 running.xids = xids;
                               6268                 :                : 
                               6269                 :             26 :                 ProcArrayApplyRecoveryInfo(&running);
                               6270                 :                :             }
                               6271                 :                :         }
                               6272                 :                : 
                               6273                 :                :         /*
                               6274                 :                :          * We're all set for replaying the WAL now. Do it.
                               6275                 :                :          */
 1621                          6276                 :            235 :         PerformWalRecovery();
                               6277                 :            169 :         performedWalRecovery = true;
                               6278                 :                :     }
                               6279                 :                :     else
 1617                          6280                 :            822 :         performedWalRecovery = false;
                               6281                 :                : 
                               6282                 :                :     /*
                               6283                 :                :      * Finish WAL recovery.
                               6284                 :                :      */
 1621                          6285                 :            991 :     endOfRecoveryInfo = FinishWalRecovery();
                               6286                 :            991 :     EndOfLog = endOfRecoveryInfo->endOfLog;
                               6287                 :            991 :     EndOfLogTLI = endOfRecoveryInfo->endOfLogTLI;
                               6288                 :            991 :     abortedRecPtr = endOfRecoveryInfo->abortedRecPtr;
                               6289                 :            991 :     missingContrecPtr = endOfRecoveryInfo->missingContrecPtr;
                               6290                 :                : 
                               6291                 :                :     /*
                               6292                 :                :      * Reset the ps status display so that no recovery-related information
                               6293                 :                :      * shows up.
                               6294                 :                :      */
 1403 michael@paquier.xyz      6295                 :            991 :     set_ps_display("");
                               6296                 :                : 
                               6297                 :                :     /*
                               6298                 :                :      * When recovering from a backup (we are in recovery, and archive recovery
                               6299                 :                :      * was requested), complain if we did not roll forward far enough to reach
                               6300                 :                :      * the point where the database is consistent.  For regular online
                               6301                 :                :      * backup-from-primary, that means reaching the end-of-backup WAL record
                               6302                 :                :      * (at which point we reset backupStartPoint to be Invalid), for
                               6303                 :                :      * backup-from-replica (which can't inject records into the WAL stream),
                               6304                 :                :      * that point is when we reach the minRecoveryPoint in pg_control (which
                               6305                 :                :      * we purposefully copy last when backing up from a replica).  For
                               6306                 :                :      * pg_rewind (which creates a backup_label with a method of "pg_rewind")
                               6307                 :                :      * or snapshot-style backups (which don't), backupEndRequired will be set
                               6308                 :                :      * to false.
                               6309                 :                :      *
                               6310                 :                :      * Note: it is indeed okay to look at the local variable
                               6311                 :                :      * LocalMinRecoveryPoint here, even though ControlFile->minRecoveryPoint
                               6312                 :                :      * might be further ahead --- ControlFile->minRecoveryPoint cannot have
                               6313                 :                :      * been advanced beyond the WAL we processed.
                               6314                 :                :      */
 5597 heikki.linnakangas@i     6315         [ +  + ]:            991 :     if (InRecovery &&
 1621                          6316         [ +  - ]:            169 :         (EndOfLog < LocalMinRecoveryPoint ||
  262 alvherre@kurilemu.de     6317         [ -  + ]:            169 :          XLogRecPtrIsValid(ControlFile->backupStartPoint)))
                               6318                 :                :     {
                               6319                 :                :         /*
                               6320                 :                :          * Ran off end of WAL before reaching end-of-backup WAL record, or
                               6321                 :                :          * minRecoveryPoint. That's a bad sign, indicating that you tried to
                               6322                 :                :          * recover from an online backup but never called pg_backup_stop(), or
                               6323                 :                :          * you didn't archive all the WAL needed.
                               6324                 :                :          */
 4902 heikki.linnakangas@i     6325   [ #  #  #  # ]:UBC           0 :         if (ArchiveRecoveryRequested || ControlFile->backupEndRequired)
                               6326                 :                :         {
  262 alvherre@kurilemu.de     6327   [ #  #  #  # ]:              0 :             if (XLogRecPtrIsValid(ControlFile->backupStartPoint) || ControlFile->backupEndRequired)
 5464 heikki.linnakangas@i     6328         [ #  # ]:              0 :                 ereport(FATAL,
                               6329                 :                :                         (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               6330                 :                :                          errmsg("WAL ends before end of online backup"),
                               6331                 :                :                          errhint("All WAL generated while online backup was taken must be available at recovery.")));
                               6332                 :                :             else
 5583                          6333         [ #  # ]:              0 :                 ereport(FATAL,
                               6334                 :                :                         (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               6335                 :                :                          errmsg("WAL ends before consistent recovery point")));
                               6336                 :                :         }
                               6337                 :                :     }
                               6338                 :                : 
                               6339                 :                :     /*
                               6340                 :                :      * Reset unlogged relations to the contents of their INIT fork. This is
                               6341                 :                :      * done AFTER recovery is complete so as to include any unlogged relations
                               6342                 :                :      * created during recovery, but BEFORE recovery is marked as having
                               6343                 :                :      * completed successfully. Otherwise we'd not retry if any of the post
                               6344                 :                :      * end-of-recovery steps fail.
                               6345                 :                :      */
 1621 heikki.linnakangas@i     6346         [ +  + ]:CBC         991 :     if (InRecovery)
                               6347                 :            169 :         ResetUnloggedRelations(UNLOGGED_RELATION_INIT);
                               6348                 :                : 
                               6349                 :                :     /*
                               6350                 :                :      * Pre-scan prepared transactions to find out the range of XIDs present.
                               6351                 :                :      * This information is not needed quite yet, but the scan is done here so
                               6352                 :                :      * that potential problems are detected before any on-disk change is made.
                               6353                 :                :      */
 2939 michael@paquier.xyz      6354                 :            991 :     oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
                               6355                 :                : 
                               6356                 :                :     /*
                               6357                 :                :      * Allow ordinary WAL segment creation before possibly switching to a new
                               6358                 :                :      * timeline, which creates a new segment, and after the last ReadRecord().
                               6359                 :                :      */
 1439                          6360                 :            991 :     SetInstallXLogFileSegmentActive();
                               6361                 :                : 
                               6362                 :                :     /*
                               6363                 :                :      * Consider whether we need to assign a new timeline ID.
                               6364                 :                :      *
                               6365                 :                :      * If we did archive recovery, we always assign a new ID.  This handles a
                               6366                 :                :      * couple of issues.  If we stopped short of the end of WAL during
                               6367                 :                :      * recovery, then we are clearly generating a new timeline and must assign
                               6368                 :                :      * it a unique new ID.  Even if we ran to the end, modifying the current
                               6369                 :                :      * last segment is problematic because it may result in trying to
                               6370                 :                :      * overwrite an already-archived copy of that segment, and we encourage
                               6371                 :                :      * DBAs to make their archive_commands reject that.  We can dodge the
                               6372                 :                :      * problem by making the new active segment have a new timeline ID.
                               6373                 :                :      *
                               6374                 :                :      * In a normal crash recovery, we can just extend the timeline we were in.
                               6375                 :                :      */
 1621 heikki.linnakangas@i     6376                 :            991 :     newTLI = endOfRecoveryInfo->lastRecTLI;
 4902                          6377         [ +  + ]:            991 :     if (ArchiveRecoveryRequested)
                               6378                 :                :     {
 1719 rhaas@postgresql.org     6379                 :             60 :         newTLI = findNewestTimeLine(recoveryTargetTLI) + 1;
 8040 tgl@sss.pgh.pa.us        6380         [ +  - ]:             60 :         ereport(LOG,
                               6381                 :                :                 (errmsg("selected new timeline ID: %u", newTLI)));
                               6382                 :                : 
                               6383                 :                :         /*
                               6384                 :                :          * Make a writable copy of the last WAL segment.  (Note that we also
                               6385                 :                :          * have a copy of the last block of the old WAL in
                               6386                 :                :          * endOfRecoveryInfo->lastPage; we will use that below.)
                               6387                 :                :          */
 1621 heikki.linnakangas@i     6388                 :             60 :         XLogInitNewTimeline(EndOfLogTLI, EndOfLog, newTLI);
                               6389                 :                : 
                               6390                 :                :         /*
                               6391                 :                :          * Remove the signal files, so that we don't accidentally re-enter
                               6392                 :                :          * archive recovery mode in a subsequent crash.
                               6393                 :                :          */
                               6394         [ +  + ]:             60 :         if (endOfRecoveryInfo->standby_signal_file_found)
                               6395                 :             57 :             durable_unlink(STANDBY_SIGNAL_FILE, FATAL);
                               6396                 :                : 
                               6397         [ +  + ]:             60 :         if (endOfRecoveryInfo->recovery_signal_file_found)
                               6398                 :              4 :             durable_unlink(RECOVERY_SIGNAL_FILE, FATAL);
                               6399                 :                : 
                               6400                 :                :         /*
                               6401                 :                :          * Write the timeline history file, and have it archived. After this
                               6402                 :                :          * point (or rather, as soon as the file is archived), the timeline
                               6403                 :                :          * will appear as "taken" in the WAL archive and to any standby
                               6404                 :                :          * servers.  If we crash before actually switching to the new
                               6405                 :                :          * timeline, standby servers will nevertheless think that we switched
                               6406                 :                :          * to the new timeline, and will try to connect to the new timeline.
                               6407                 :                :          * To minimize the window for that, try to do as little as possible
                               6408                 :                :          * between here and writing the end-of-recovery record.
                               6409                 :                :          */
 1719 rhaas@postgresql.org     6410                 :             60 :         writeTimeLineHistory(newTLI, recoveryTargetTLI,
 1621 heikki.linnakangas@i     6411                 :GIC          60 :                              EndOfLog, endOfRecoveryInfo->recoveryStopReason);
                               6412                 :                : 
 1621 heikki.linnakangas@i     6413         [ +  - ]:CBC          60 :         ereport(LOG,
                               6414                 :                :                 (errmsg("archive recovery complete")));
                               6415                 :                :     }
                               6416                 :                : 
                               6417                 :                :     /* Save the selected TimeLineID in shared memory, too */
  730 rhaas@postgresql.org     6418                 :            991 :     SpinLockAcquire(&XLogCtl->info_lck);
 1719                          6419                 :            991 :     XLogCtl->InsertTimeLineID = newTLI;
 1621 heikki.linnakangas@i     6420                 :            991 :     XLogCtl->PrevTimeLineID = endOfRecoveryInfo->lastRecTLI;
  730 rhaas@postgresql.org     6421                 :            991 :     SpinLockRelease(&XLogCtl->info_lck);
                               6422                 :                : 
                               6423                 :                :     /*
                               6424                 :                :      * Actually, if WAL ended in an incomplete record, skip the parts that
                               6425                 :                :      * made it through and start writing after the portion that persisted.
                               6426                 :                :      * (It's critical to first write an OVERWRITE_CONTRECORD message, which
                               6427                 :                :      * we'll do as soon as we're open for writing new WAL.)
                               6428                 :                :      */
  262 alvherre@kurilemu.de     6429         [ +  + ]:            991 :     if (XLogRecPtrIsValid(missingContrecPtr))
                               6430                 :                :     {
                               6431                 :                :         /*
                               6432                 :                :          * We should only have a missingContrecPtr if we're not switching to a
                               6433                 :                :          * new timeline. When a timeline switch occurs, WAL is copied from the
                               6434                 :                :          * old timeline to the new only up to the end of the last complete
                               6435                 :                :          * record, so there can't be an incomplete WAL record that we need to
                               6436                 :                :          * disregard.
                               6437                 :                :          */
 1427 rhaas@postgresql.org     6438         [ -  + ]:             11 :         Assert(newTLI == endOfRecoveryInfo->lastRecTLI);
  262 alvherre@kurilemu.de     6439         [ -  + ]:             11 :         Assert(XLogRecPtrIsValid(abortedRecPtr));
 1761 alvherre@alvh.no-ip.     6440                 :             11 :         EndOfLog = missingContrecPtr;
                               6441                 :                :     }
                               6442                 :                : 
                               6443                 :                :     /*
                               6444                 :                :      * Prepare to write WAL starting at EndOfLog location, and init xlog
                               6445                 :                :      * buffer cache using the block containing the last record from the
                               6446                 :                :      * previous incarnation.
                               6447                 :                :      */
 9402 vadim4o@yahoo.com        6448                 :            991 :     Insert = &XLogCtl->Insert;
 1621 heikki.linnakangas@i     6449                 :            991 :     Insert->PrevBytePos = XLogRecPtrToBytePos(endOfRecoveryInfo->lastRec);
 4757                          6450                 :            991 :     Insert->CurrBytePos = XLogRecPtrToBytePos(EndOfLog);
                               6451                 :                : 
                               6452                 :                :     /*
                               6453                 :                :      * Tricky point here: lastPage contains the *last* block that the LastRec
                               6454                 :                :      * record spans, not the one it starts in.  The last block is indeed the
                               6455                 :                :      * one we want to use.
                               6456                 :                :      */
                               6457         [ +  + ]:            991 :     if (EndOfLog % XLOG_BLCKSZ != 0)
                               6458                 :                :     {
                               6459                 :                :         char       *page;
                               6460                 :                :         int         len;
                               6461                 :                :         int         firstIdx;
                               6462                 :                : 
                               6463                 :            956 :         firstIdx = XLogRecPtrToBufIdx(EndOfLog);
 1621                          6464                 :            956 :         len = EndOfLog - endOfRecoveryInfo->lastPageBeginPtr;
                               6465         [ -  + ]:            956 :         Assert(len < XLOG_BLCKSZ);
                               6466                 :                : 
                               6467                 :                :         /* Copy the valid part of the last block, and zero the rest */
 4757                          6468                 :            956 :         page = &XLogCtl->pages[firstIdx * XLOG_BLCKSZ];
 1621                          6469                 :            956 :         memcpy(page, endOfRecoveryInfo->lastPage, len);
 4757                          6470                 :            956 :         memset(page + len, 0, XLOG_BLCKSZ - len);
                               6471                 :                : 
  950 jdavis@postgresql.or     6472                 :            956 :         pg_atomic_write_u64(&XLogCtl->xlblocks[firstIdx], endOfRecoveryInfo->lastPageBeginPtr + XLOG_BLCKSZ);
  338 akorotkov@postgresql     6473                 :            956 :         XLogCtl->InitializedUpTo = endOfRecoveryInfo->lastPageBeginPtr + XLOG_BLCKSZ;
                               6474                 :                :     }
                               6475                 :                :     else
                               6476                 :                :     {
                               6477                 :                :         /*
                               6478                 :                :          * There is no partial block to copy. Just set InitializedUpTo, and
                               6479                 :                :          * let the first attempt to insert a log record initialize the next
                               6480                 :                :          * buffer.
                               6481                 :                :          */
                               6482                 :             35 :         XLogCtl->InitializedUpTo = EndOfLog;
                               6483                 :                :     }
                               6484                 :                : 
                               6485                 :                :     /*
                               6486                 :                :      * Update local and shared status.  This is OK to do without any locks
                               6487                 :                :      * because no other process can be reading or writing WAL yet.
                               6488                 :                :      */
 4757 heikki.linnakangas@i     6489                 :            991 :     LogwrtResult.Write = LogwrtResult.Flush = EndOfLog;
  840 alvherre@alvh.no-ip.     6490                 :            991 :     pg_atomic_write_u64(&XLogCtl->logInsertResult, EndOfLog);
  842                          6491                 :            991 :     pg_atomic_write_u64(&XLogCtl->logWriteResult, EndOfLog);
                               6492                 :            991 :     pg_atomic_write_u64(&XLogCtl->logFlushResult, EndOfLog);
 4757 heikki.linnakangas@i     6493                 :            991 :     XLogCtl->LogwrtRqst.Write = EndOfLog;
                               6494                 :            991 :     XLogCtl->LogwrtRqst.Flush = EndOfLog;
                               6495                 :                : 
                               6496                 :                :     /*
                               6497                 :                :      * Preallocate additional log files, if wanted.
                               6498                 :                :      */
 1719 rhaas@postgresql.org     6499                 :            991 :     PreallocXlogFiles(EndOfLog, newTLI);
                               6500                 :                : 
                               6501                 :                :     /*
                               6502                 :                :      * Okay, we're officially UP.
                               6503                 :                :      */
 9402 vadim4o@yahoo.com        6504                 :            991 :     InRecovery = false;
                               6505                 :                : 
                               6506                 :                :     /* start the archive_timeout timer and LSN running */
 4757 heikki.linnakangas@i     6507                 :            991 :     XLogCtl->lastSegSwitchTime = (pg_time_t) time(NULL);
 3503 andres@anarazel.de       6508                 :            991 :     XLogCtl->lastSegSwitchLSN = EndOfLog;
                               6509                 :                : 
                               6510                 :                :     /* also initialize latestCompletedXid, to nextXid - 1 */
 5284 tgl@sss.pgh.pa.us        6511                 :            991 :     LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
  961 heikki.linnakangas@i     6512                 :            991 :     TransamVariables->latestCompletedXid = TransamVariables->nextXid;
                               6513                 :            991 :     FullTransactionIdRetreat(&TransamVariables->latestCompletedXid);
 5284 tgl@sss.pgh.pa.us        6514                 :            991 :     LWLockRelease(ProcArrayLock);
                               6515                 :                : 
                               6516                 :                :     /*
                               6517                 :                :      * Start up subtrans, if not already done for hot standby.  (commit
                               6518                 :                :      * timestamps are started below, if necessary.)
                               6519                 :                :      */
 6063 simon@2ndQuadrant.co     6520         [ +  + ]:            991 :     if (standbyState == STANDBY_DISABLED)
                               6521                 :            931 :         StartupSUBTRANS(oldestActiveXID);
                               6522                 :                : 
                               6523                 :                :     /*
                               6524                 :                :      * Perform end of recovery actions for any SLRUs that need it.
                               6525                 :                :      */
 5380                          6526                 :            991 :     TrimCLOG();
 4622 alvherre@alvh.no-ip.     6527                 :            991 :     TrimMultiXact();
                               6528                 :                : 
                               6529                 :                :     /*
                               6530                 :                :      * Reload shared-memory state for prepared transactions.  This needs to
                               6531                 :                :      * happen before renaming the last partial segment of the old timeline as
                               6532                 :                :      * it may be possible that we have to recover some transactions from it.
                               6533                 :                :      */
 7709 tgl@sss.pgh.pa.us        6534                 :            991 :     RecoverPreparedTransactions();
                               6535                 :                : 
                               6536                 :                :     /* Shut down xlogreader */
 1621 heikki.linnakangas@i     6537                 :            991 :     ShutdownWalRecovery();
                               6538                 :                : 
                               6539                 :                :     /* Enable WAL writes for this backend only. */
 1746 rhaas@postgresql.org     6540                 :            991 :     LocalSetXLogInsertAllowed();
                               6541                 :                : 
                               6542                 :                :     /* If necessary, write overwrite-contrecord before doing anything else */
  262 alvherre@kurilemu.de     6543         [ +  + ]:            991 :     if (XLogRecPtrIsValid(abortedRecPtr))
                               6544                 :                :     {
                               6545         [ -  + ]:             11 :         Assert(XLogRecPtrIsValid(missingContrecPtr));
 1621 heikki.linnakangas@i     6546                 :             11 :         CreateOverwriteContrecordRecord(abortedRecPtr, missingContrecPtr, newTLI);
                               6547                 :                :     }
                               6548                 :                : 
                               6549                 :                :     /*
                               6550                 :                :      * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE
                               6551                 :                :      * record before any resource manager writes cleanup WAL records or the
                               6552                 :                :      * checkpoint record is written.
                               6553                 :                :      */
 1746 rhaas@postgresql.org     6554                 :            991 :     Insert->fullPageWrites = lastFullPageWrites;
                               6555                 :            991 :     UpdateFullPageWrites();
                               6556                 :                : 
                               6557                 :                :     /*
                               6558                 :                :      * Emit checkpoint or end-of-recovery record in XLOG, if required.
                               6559                 :                :      */
 1621 heikki.linnakangas@i     6560         [ +  + ]:            991 :     if (performedWalRecovery)
 1746 rhaas@postgresql.org     6561                 :            169 :         promoted = PerformRecoveryXLogAction();
                               6562                 :                : 
                               6563                 :                :     /*
                               6564                 :                :      * If any of the critical GUCs have changed, log them before we allow
                               6565                 :                :      * backends to write WAL.
                               6566                 :                :      */
 5933 heikki.linnakangas@i     6567                 :            991 :     XLogReportParameters();
                               6568                 :                : 
                               6569                 :                :     /* If this is archive recovery, perform post-recovery cleanup actions. */
 1735 rhaas@postgresql.org     6570         [ +  + ]:            991 :     if (ArchiveRecoveryRequested)
 1719                          6571                 :             60 :         CleanupAfterArchiveRecovery(EndOfLogTLI, EndOfLog, newTLI);
                               6572                 :                : 
   45 michael@paquier.xyz      6573                 :            991 :     INJECTION_POINT("promotion-after-wal-segment-cleanup", NULL);
                               6574                 :                : 
                               6575                 :                :     /*
                               6576                 :                :      * Local WAL inserts enabled, so it's time to finish initialization of
                               6577                 :                :      * commit timestamp.
                               6578                 :                :      */
 4253 alvherre@alvh.no-ip.     6579                 :            991 :     CompleteCommitTsInitialization();
                               6580                 :                : 
                               6581                 :                :     /*
                               6582                 :                :      * Update logical decoding status in shared memory and write an
                               6583                 :                :      * XLOG_LOGICAL_DECODING_STATUS_CHANGE, if necessary.
                               6584                 :                :      */
  215 msawada@postgresql.o     6585                 :            991 :     UpdateLogicalDecodingStatusEndOfRecovery();
                               6586                 :                : 
                               6587                 :                :     /* Clean up EndOfWalRecoveryInfo data to appease Valgrind leak checking */
  358 tgl@sss.pgh.pa.us        6588         [ +  + ]:            991 :     if (endOfRecoveryInfo->lastPage)
                               6589                 :            967 :         pfree(endOfRecoveryInfo->lastPage);
                               6590                 :            991 :     pfree(endOfRecoveryInfo->recoveryStopReason);
                               6591                 :            991 :     pfree(endOfRecoveryInfo);
                               6592                 :                : 
                               6593                 :                :     /*
                               6594                 :                :      * If we reach this point with checksums in the state inprogress-on, it
                               6595                 :                :      * means that data checksums were in the process of being enabled when the
                               6596                 :                :      * cluster shut down. Since processing didn't finish, the operation will
                               6597                 :                :      * have to be restarted from scratch since there is no capability to
                               6598                 :                :      * continue where it was when the cluster shut down. Thus, revert the
                               6599                 :                :      * state back to off, and inform the user with a warning message. Being
                               6600                 :                :      * able to restart processing is a TODO, but it wouldn't be possible to
                               6601                 :                :      * restart here since we cannot launch a dynamic background worker
                               6602                 :                :      * directly from here (it has to be from a regular backend).
                               6603                 :                :      */
  114 dgustafsson@postgres     6604         [ +  + ]:            991 :     if (XLogCtl->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_ON)
                               6605                 :                :     {
                               6606                 :              1 :         XLogChecksums(PG_DATA_CHECKSUM_OFF);
                               6607                 :                : 
                               6608                 :              1 :         SpinLockAcquire(&XLogCtl->info_lck);
  111                          6609                 :              1 :         XLogCtl->data_checksum_version = PG_DATA_CHECKSUM_OFF;
  114                          6610                 :              1 :         SetLocalDataChecksumState(XLogCtl->data_checksum_version);
                               6611                 :              1 :         SpinLockRelease(&XLogCtl->info_lck);
                               6612                 :                : 
   58                          6613                 :              1 :         EmitAndWaitDataChecksumsBarrier(PG_DATA_CHECKSUM_OFF);
  114                          6614         [ +  - ]:              1 :         ereport(WARNING,
                               6615                 :                :                 errmsg("enabling data checksums was interrupted"),
                               6616                 :                :                 errhint("Data checksum processing must be manually restarted for checksums to be enabled."));
                               6617                 :                :     }
                               6618                 :                : 
                               6619                 :                :     /*
                               6620                 :                :      * If data checksums were being disabled when the cluster was shut down,
                               6621                 :                :      * we know that we have a state where all backends have stopped validating
                               6622                 :                :      * checksums and we can move to off instead of prompting the user to
                               6623                 :                :      * perform any action.
                               6624                 :                :      */
   58                          6625         [ -  + ]:            990 :     else if (XLogCtl->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_OFF)
                               6626                 :                :     {
  114 dgustafsson@postgres     6627                 :UBC           0 :         XLogChecksums(PG_DATA_CHECKSUM_OFF);
                               6628                 :                : 
                               6629                 :              0 :         SpinLockAcquire(&XLogCtl->info_lck);
  111                          6630                 :              0 :         XLogCtl->data_checksum_version = PG_DATA_CHECKSUM_OFF;
  114                          6631                 :              0 :         SetLocalDataChecksumState(XLogCtl->data_checksum_version);
                               6632                 :              0 :         SpinLockRelease(&XLogCtl->info_lck);
                               6633                 :                : 
   58                          6634                 :              0 :         EmitAndWaitDataChecksumsBarrier(PG_DATA_CHECKSUM_OFF);
                               6635                 :                :     }
                               6636                 :                : 
                               6637                 :                :     /*
                               6638                 :                :      * All done with end-of-recovery actions.
                               6639                 :                :      *
                               6640                 :                :      * Now allow backends to write WAL and update the control file status in
                               6641                 :                :      * consequence.  SharedRecoveryState, which controls if backends can write
                               6642                 :                :      * WAL, is updated while holding ControlFileLock to prevent other backends
                               6643                 :                :      * from looking at an inconsistent state of the control file in shared memory.
                               6644                 :                :      * There is still a small window during which backends can write WAL and
                               6645                 :                :      * the control file is still referring to a system not in DB_IN_PRODUCTION
                               6646                 :                :      * state while looking at the on-disk control file.
                               6647                 :                :      *
                               6648                 :                :      * Also, we use info_lck to update SharedRecoveryState to ensure that
                               6649                 :                :      * there are no race conditions concerning visibility of other recent
                               6650                 :                :      * updates to shared memory.
                               6651                 :                :      */
 3652 peter_e@gmx.net          6652                 :CBC         991 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               6653                 :            991 :     ControlFile->state = DB_IN_PRODUCTION;
                               6654                 :                : 
 4325 andres@anarazel.de       6655                 :            991 :     SpinLockAcquire(&XLogCtl->info_lck);
   87 dgustafsson@postgres     6656                 :            991 :     ControlFile->data_checksum_version = XLogCtl->data_checksum_version;
 2284 michael@paquier.xyz      6657                 :            991 :     XLogCtl->SharedRecoveryState = RECOVERY_STATE_DONE;
 4325 andres@anarazel.de       6658                 :            991 :     SpinLockRelease(&XLogCtl->info_lck);
                               6659                 :                : 
 3652 peter_e@gmx.net          6660                 :            991 :     UpdateControlFile();
                               6661                 :            991 :     LWLockRelease(ControlFileLock);
                               6662                 :                : 
                               6663                 :                :     /*
                               6664                 :                :      * Wake up the checkpointer process as there might be a request to disable
                               6665                 :                :      * logical decoding by a concurrent slot drop.
                               6666                 :                :      */
  215 msawada@postgresql.o     6667                 :            991 :     WakeupCheckpointer();
                               6668                 :                : 
                               6669                 :                :     /*
                               6670                 :                :      * Wake up all waiters.  They need to report an error that recovery was
                               6671                 :                :      * ended before reaching the target LSN.
                               6672                 :                :      */
  202 akorotkov@postgresql     6673                 :            991 :     WaitLSNWakeup(WAIT_LSN_TYPE_STANDBY_REPLAY, InvalidXLogRecPtr);
                               6674                 :            991 :     WaitLSNWakeup(WAIT_LSN_TYPE_STANDBY_WRITE, InvalidXLogRecPtr);
                               6675                 :            991 :     WaitLSNWakeup(WAIT_LSN_TYPE_STANDBY_FLUSH, InvalidXLogRecPtr);
                               6676                 :                : 
                               6677                 :                :     /*
                               6678                 :                :      * Shut down the recovery environment.  This must occur after
                               6679                 :                :      * RecoverPreparedTransactions() (see notes in lock_twophase_recover())
                               6680                 :                :      * and after switching SharedRecoveryState to RECOVERY_STATE_DONE so that
                               6681                 :                :      * any session building a snapshot will not rely on KnownAssignedXids as
                               6682                 :                :      * RecoveryInProgress() would return false at this stage.  This is
                               6683                 :                :      * particularly critical for prepared 2PC transactions, which would still
                               6684                 :                :      * need to be included in snapshots once recovery has ended.
                               6685                 :                :      */
 1756 michael@paquier.xyz      6686         [ +  + ]:            991 :     if (standbyState != STANDBY_DISABLED)
                               6687                 :             60 :         ShutdownRecoveryTransactionEnvironment();
                               6688                 :                : 
                               6689                 :                :     /*
                               6690                 :                :      * If there were cascading standby servers connected to us, nudge any wal
                               6691                 :                :      * sender processes to notice that we've been promoted.
                               6692                 :                :      */
 1205 andres@anarazel.de       6693                 :            991 :     WalSndWakeup(true, true);
                               6694                 :                : 
                               6695                 :                :     /*
                               6696                 :                :      * If this was a promotion, request an (online) checkpoint now. This isn't
                               6697                 :                :      * required for consistency, but the last restartpoint might be far back,
                               6698                 :                :      * and in case of a crash, recovering from it might take longer than is
                               6699                 :                :      * appropriate now that we're not in standby mode anymore.
                               6700                 :                :      */
 2188 fujii@postgresql.org     6701         [ +  + ]:            991 :     if (promoted)
 4814 simon@2ndQuadrant.co     6702                 :             53 :         RequestCheckpoint(CHECKPOINT_FORCE);
 6367 heikki.linnakangas@i     6703                 :            991 : }
                               6704                 :                : 
                               6705                 :                : /*
                               6706                 :                :  * Callback from PerformWalRecovery(), called when we switch from crash
                               6707                 :                :  * recovery to archive recovery mode.  Updates the control file accordingly.
                               6708                 :                :  */
                               6709                 :                : void
 1621                          6710                 :              1 : SwitchIntoArchiveRecovery(XLogRecPtr EndRecPtr, TimeLineID replayTLI)
                               6711                 :                : {
                               6712                 :                :     /* initialize minRecoveryPoint to this record */
                               6713                 :              1 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               6714                 :              1 :     ControlFile->state = DB_IN_ARCHIVE_RECOVERY;
                               6715         [ +  - ]:              1 :     if (ControlFile->minRecoveryPoint < EndRecPtr)
                               6716                 :                :     {
                               6717                 :              1 :         ControlFile->minRecoveryPoint = EndRecPtr;
                               6718                 :              1 :         ControlFile->minRecoveryPointTLI = replayTLI;
                               6719                 :                :     }
                               6720                 :                :     /* update local copy */
                               6721                 :              1 :     LocalMinRecoveryPoint = ControlFile->minRecoveryPoint;
                               6722                 :              1 :     LocalMinRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
                               6723                 :                : 
                               6724                 :                :     /*
                               6725                 :                :      * The startup process can update its local copy of minRecoveryPoint from
                               6726                 :                :      * this point.
                               6727                 :                :      */
                               6728                 :              1 :     updateMinRecoveryPoint = true;
                               6729                 :                : 
                               6730                 :              1 :     UpdateControlFile();
                               6731                 :                : 
                               6732                 :                :     /*
                               6733                 :                :      * We update SharedRecoveryState while holding the lock on ControlFileLock
                               6734                 :                :      * so both states are consistent in shared memory.
                               6735                 :                :      */
                               6736                 :              1 :     SpinLockAcquire(&XLogCtl->info_lck);
                               6737                 :              1 :     XLogCtl->SharedRecoveryState = RECOVERY_STATE_ARCHIVE;
                               6738                 :              1 :     SpinLockRelease(&XLogCtl->info_lck);
                               6739                 :                : 
                               6740                 :              1 :     LWLockRelease(ControlFileLock);
                               6741                 :              1 : }
                               6742                 :                : 
                               6743                 :                : /*
                               6744                 :                :  * Callback from PerformWalRecovery(), called when we reach the end of backup.
                               6745                 :                :  * Updates the control file accordingly.
                               6746                 :                :  */
                               6747                 :                : void
                               6748                 :             90 : ReachedEndOfBackup(XLogRecPtr EndRecPtr, TimeLineID tli)
                               6749                 :                : {
                               6750                 :                :     /*
                               6751                 :                :      * We have reached the end of base backup, as indicated by pg_control. The
                               6752                 :                :      * data on disk is now consistent (unless minRecoveryPoint is further
                               6753                 :                :      * ahead, which can happen if we crashed during previous recovery).  Reset
                               6754                 :                :      * backupStartPoint and backupEndPoint, and update minRecoveryPoint to
                               6755                 :                :      * make sure we don't allow starting up at an earlier point even if
                               6756                 :                :      * recovery is stopped and restarted soon after this.
                               6757                 :                :      */
                               6758                 :             90 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               6759                 :                : 
                               6760         [ +  + ]:             90 :     if (ControlFile->minRecoveryPoint < EndRecPtr)
                               6761                 :                :     {
                               6762                 :             83 :         ControlFile->minRecoveryPoint = EndRecPtr;
                               6763                 :             83 :         ControlFile->minRecoveryPointTLI = tli;
                               6764                 :                :     }
                               6765                 :                : 
                               6766                 :             90 :     ControlFile->backupStartPoint = InvalidXLogRecPtr;
                               6767                 :             90 :     ControlFile->backupEndPoint = InvalidXLogRecPtr;
                               6768                 :             90 :     ControlFile->backupEndRequired = false;
                               6769                 :             90 :     UpdateControlFile();
                               6770                 :                : 
                               6771                 :             90 :     LWLockRelease(ControlFileLock);
 5948                          6772                 :             90 : }
                               6773                 :                : 
                               6774                 :                : /*
                               6775                 :                :  * Perform whatever XLOG actions are necessary at end of REDO.
                               6776                 :                :  *
                               6777                 :                :  * The goal here is to make sure that we'll be able to recover properly if
                               6778                 :                :  * we crash again. If we choose to write a checkpoint, we'll write a shutdown
                               6779                 :                :  * checkpoint rather than an on-line one. This is not particularly critical,
                               6780                 :                :  * but since we may be assigning a new TLI, using a shutdown checkpoint allows
                               6781                 :                :  * us to have the rule that TLI only changes in shutdown checkpoints, which
                               6782                 :                :  * allows some extra error checking in xlog_redo.
                               6783                 :                :  */
                               6784                 :                : static bool
 1747 rhaas@postgresql.org     6785                 :            169 : PerformRecoveryXLogAction(void)
                               6786                 :                : {
                               6787                 :            169 :     bool        promoted = false;
                               6788                 :                : 
                               6789                 :                :     /*
                               6790                 :                :      * Perform a checkpoint to update all our recovery activity to disk.
                               6791                 :                :      *
                               6792                 :                :      * Note that we write a shutdown checkpoint rather than an on-line one.
                               6793                 :                :      * This is not particularly critical, but since we may be assigning a new
                               6794                 :                :      * TLI, using a shutdown checkpoint allows us to have the rule that TLI
                               6795                 :                :      * only changes in shutdown checkpoints, which allows some extra error
                               6796                 :                :      * checking in xlog_redo.
                               6797                 :                :      *
                               6798                 :                :      * In promotion, only create a lightweight end-of-recovery record instead
                               6799                 :                :      * of a full checkpoint. A checkpoint is requested later, after we're
                               6800                 :                :      * fully out of recovery mode and already accepting queries.
                               6801                 :                :      */
                               6802   [ +  +  +  -  :            229 :     if (ArchiveRecoveryRequested && IsUnderPostmaster &&
                                              +  + ]
 1621 heikki.linnakangas@i     6803                 :             60 :         PromoteIsTriggered())
                               6804                 :                :     {
 1747 rhaas@postgresql.org     6805                 :             53 :         promoted = true;
                               6806                 :                : 
                               6807                 :                :         /*
                               6808                 :                :          * Insert a special WAL record to mark the end of recovery, since we
                               6809                 :                :          * aren't doing a checkpoint. That means that the checkpointer process
                               6810                 :                :          * may well be in the middle of a time-smoothed restartpoint and
                               6811                 :                :          * could continue to be for minutes after this.  That sounds strange,
                               6812                 :                :          * but the effect is roughly the same and it would be stranger to try
                               6813                 :                :          * to come out of the restartpoint and then checkpoint. We request a
                               6814                 :                :          * checkpoint later anyway, just for safety.
                               6815                 :                :          */
                               6816                 :             53 :         CreateEndOfRecoveryRecord();
                               6817                 :                :     }
                               6818                 :                :     else
                               6819                 :                :     {
                               6820                 :            116 :         RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
                               6821                 :                :                           CHECKPOINT_FAST |
                               6822                 :                :                           CHECKPOINT_WAIT);
                               6823                 :                :     }
                               6824                 :                : 
                               6825                 :            169 :     return promoted;
                               6826                 :                : }
                               6827                 :                : 
                               6828                 :                : /*
                               6829                 :                :  * Is the system still in recovery?
                               6830                 :                :  *
                               6831                 :                :  * Unlike testing InRecovery, this works in any process that's connected to
                               6832                 :                :  * shared memory.
                               6833                 :                :  */
                               6834                 :                : bool
 6367 heikki.linnakangas@i     6835                 :       69364410 : RecoveryInProgress(void)
                               6836                 :                : {
                               6837                 :                :     /*
                               6838                 :                :      * We check shared state each time only until we leave recovery mode. We
                               6839                 :                :      * can't re-enter recovery, so there's no need to keep checking after the
                               6840                 :                :      * shared variable has once been seen false.
                               6841                 :                :      */
                               6842         [ +  + ]:       69364410 :     if (!LocalRecoveryInProgress)
                               6843                 :       67473415 :         return false;
                               6844                 :                :     else
                               6845                 :                :     {
                               6846                 :                :         /*
                               6847                 :                :          * Use a volatile pointer to make sure we make a fresh read of the
                               6848                 :                :          * shared variable.
                               6849                 :                :          */
                               6850                 :        1890995 :         volatile XLogCtlData *xlogctl = XLogCtl;
                               6851                 :                : 
 2284 michael@paquier.xyz      6852                 :        1890995 :         LocalRecoveryInProgress = (xlogctl->SharedRecoveryState != RECOVERY_STATE_DONE);
                               6853                 :                : 
                               6854                 :                :         /*
                               6855                 :                :          * Note: We don't need a memory barrier when we're still in recovery.
                               6856                 :                :          * We might exit recovery immediately after return, so the caller
                               6857                 :                :          * can't rely on 'true' meaning that we're still in recovery anyway.
                               6858                 :                :          */
                               6859                 :                : 
 6367 heikki.linnakangas@i     6860                 :        1890995 :         return LocalRecoveryInProgress;
                               6861                 :                :     }
                               6862                 :                : }
                               6863                 :                : 
                               6864                 :                : /*
                               6865                 :                :  * Returns current recovery state from shared memory.
                               6866                 :                :  *
                               6867                 :                :  * This returned state is kept consistent with the contents of the control
                               6868                 :                :  * file.  See details about the possible values of RecoveryState in xlog.h.
                               6869                 :                :  */
                               6870                 :                : RecoveryState
 2284 michael@paquier.xyz      6871                 :          32806 : GetRecoveryState(void)
                               6872                 :                : {
                               6873                 :                :     RecoveryState retval;
                               6874                 :                : 
                               6875                 :          32806 :     SpinLockAcquire(&XLogCtl->info_lck);
                               6876                 :          32806 :     retval = XLogCtl->SharedRecoveryState;
                               6877                 :          32806 :     SpinLockRelease(&XLogCtl->info_lck);
                               6878                 :                : 
                               6879                 :          32806 :     return retval;
                               6880                 :                : }
                               6881                 :                : 
                               6882                 :                : /*
                               6883                 :                :  * Is this process allowed to insert new WAL records?
                               6884                 :                :  *
                               6885                 :                :  * Ordinarily this is essentially equivalent to !RecoveryInProgress().
                               6886                 :                :  * But we also have provisions for forcing the result "true" or "false"
                               6887                 :                :  * within specific processes regardless of the global state.
                               6888                 :                :  */
                               6889                 :                : bool
 6239 tgl@sss.pgh.pa.us        6890                 :       67588327 : XLogInsertAllowed(void)
                               6891                 :                : {
                               6892                 :                :     /*
                               6893                 :                :      * If value is "unconditionally true" or "unconditionally false", just
                               6894                 :                :      * return it.  This provides the normal fast path once recovery is known
                               6895                 :                :      * done.
                               6896                 :                :      */
                               6897         [ +  + ]:       67588327 :     if (LocalXLogInsertAllowed >= 0)
                               6898                 :       66885755 :         return (bool) LocalXLogInsertAllowed;
                               6899                 :                : 
                               6900                 :                :     /*
                               6901                 :                :      * Else, must check to see if we're still in recovery.
                               6902                 :                :      */
                               6903         [ +  + ]:         702572 :     if (RecoveryInProgress())
                               6904                 :         692321 :         return false;
                               6905                 :                : 
                               6906                 :                :     /*
                               6907                 :                :      * On exit from recovery, reset to "unconditionally true", since there is
                               6908                 :                :      * no need to keep checking.
                               6909                 :                :      */
                               6910                 :          10251 :     LocalXLogInsertAllowed = 1;
                               6911                 :          10251 :     return true;
                               6912                 :                : }
                               6913                 :                : 
                               6914                 :                : /*
                               6915                 :                :  * Make XLogInsertAllowed() return true in the current process only.
                               6916                 :                :  *
                               6917                 :                :  * Note: it is allowed to switch LocalXLogInsertAllowed back to -1 later,
                               6918                 :                :  * and even call LocalSetXLogInsertAllowed() again after that.
                               6919                 :                :  *
                               6920                 :                :  * Returns the previous value of LocalXLogInsertAllowed.
                               6921                 :                :  */
                               6922                 :                : static int
                               6923                 :           1022 : LocalSetXLogInsertAllowed(void)
                               6924                 :                : {
 1621 heikki.linnakangas@i     6925                 :           1022 :     int         oldXLogAllowed = LocalXLogInsertAllowed;
                               6926                 :                : 
 6239 tgl@sss.pgh.pa.us        6927                 :           1022 :     LocalXLogInsertAllowed = 1;
                               6928                 :                : 
 1735 rhaas@postgresql.org     6929                 :           1022 :     return oldXLogAllowed;
                               6930                 :                : }
                               6931                 :                : 
                               6932                 :                : /*
                               6933                 :                :  * Return the current Redo pointer from shared memory.
                               6934                 :                :  *
                               6935                 :                :  * As a side-effect, the local RedoRecPtr copy is updated.
                               6936                 :                :  */
                               6937                 :                : XLogRecPtr
 9341 vadim4o@yahoo.com        6938                 :         290347 : GetRedoRecPtr(void)
                               6939                 :                : {
                               6940                 :                :     XLogRecPtr  ptr;
                               6941                 :                : 
                               6942                 :                :     /*
                               6943                 :                :      * The possibly not up-to-date copy in XLogCtl is enough. Even if we
                               6944                 :                :      * grabbed a WAL insertion lock to read the authoritative value in
                               6945                 :                :      * Insert->RedoRecPtr, someone might update it just after we've released
                               6946                 :                :      * the lock.
                               6947                 :                :      */
 4325 andres@anarazel.de       6948                 :         290347 :     SpinLockAcquire(&XLogCtl->info_lck);
                               6949                 :         290347 :     ptr = XLogCtl->RedoRecPtr;
                               6950                 :         290347 :     SpinLockRelease(&XLogCtl->info_lck);
                               6951                 :                : 
 4766 heikki.linnakangas@i     6952         [ +  + ]:         290347 :     if (RedoRecPtr < ptr)
                               6953                 :           1752 :         RedoRecPtr = ptr;
                               6954                 :                : 
 8899 tgl@sss.pgh.pa.us        6955                 :         290347 :     return RedoRecPtr;
                               6956                 :                : }
                               6957                 :                : 
                               6958                 :                : /*
                               6959                 :                :  * Return information needed to decide whether a modified block needs a
                               6960                 :                :  * full-page image to be included in the WAL record.
                               6961                 :                :  *
                               6962                 :                :  * The returned values are cached copies from backend-private memory, and
                               6963                 :                :  * possibly out-of-date or, indeed, uninitialized, in which case they will
                               6964                 :                :  * be InvalidXLogRecPtr and false, respectively.  XLogInsertRecord will
                               6965                 :                :  * re-check them against up-to-date values, while holding the WAL insert lock.
                               6966                 :                :  */
                               6967                 :                : void
 4280 heikki.linnakangas@i     6968                 :       24830238 : GetFullPageWriteInfo(XLogRecPtr *RedoRecPtr_p, bool *doPageWrites_p)
                               6969                 :                : {
                               6970                 :       24830238 :     *RedoRecPtr_p = RedoRecPtr;
                               6971                 :       24830238 :     *doPageWrites_p = doPageWrites;
                               6972                 :       24830238 : }
                               6973                 :                : 
                               6974                 :                : /*
                               6975                 :                :  * GetInsertRecPtr -- Returns the current insert position.
                               6976                 :                :  *
                               6977                 :                :  * NOTE: The value *actually* returned is the position of the last full
                               6978                 :                :  * xlog page. It lags behind the real insert position by at most 1 page.
                               6979                 :                :  * For that, we don't need to scan through WAL insertion locks, and an
                               6980                 :                :  * approximation is enough for the current usage of this function.
                               6981                 :                :  */
                               6982                 :                : XLogRecPtr
 6968 tgl@sss.pgh.pa.us        6983                 :          11267 : GetInsertRecPtr(void)
                               6984                 :                : {
                               6985                 :                :     XLogRecPtr  recptr;
                               6986                 :                : 
 4325 andres@anarazel.de       6987                 :          11267 :     SpinLockAcquire(&XLogCtl->info_lck);
                               6988                 :          11267 :     recptr = XLogCtl->LogwrtRqst.Write;
                               6989                 :          11267 :     SpinLockRelease(&XLogCtl->info_lck);
                               6990                 :                : 
 6968 tgl@sss.pgh.pa.us        6991                 :          11267 :     return recptr;
                               6992                 :                : }
                               6993                 :                : 
                               6994                 :                : /*
                               6995                 :                :  * GetFlushRecPtr -- Returns the current flush position, i.e., the last WAL
                               6996                 :                :  * position known to be fsync'd to disk. This should only be used on a
                               6997                 :                :  * system that is known not to be in recovery.
                               6998                 :                :  */
                               6999                 :                : XLogRecPtr
 1724 rhaas@postgresql.org     7000                 :         230174 : GetFlushRecPtr(TimeLineID *insertTLI)
                               7001                 :                : {
 1719                          7002         [ -  + ]:         230174 :     Assert(XLogCtl->SharedRecoveryState == RECOVERY_STATE_DONE);
                               7003                 :                : 
  844 alvherre@alvh.no-ip.     7004                 :         230174 :     RefreshXLogWriteResult(LogwrtResult);
                               7005                 :                : 
                               7006                 :                :     /*
                               7007                 :                :      * If we're writing and flushing WAL, the time line can't be changing, so
                               7008                 :                :      * no lock is required.
                               7009                 :                :      */
 1724 rhaas@postgresql.org     7010         [ +  + ]:         230174 :     if (insertTLI)
 1719                          7011                 :          25362 :         *insertTLI = XLogCtl->InsertTimeLineID;
                               7012                 :                : 
 3848 simon@2ndQuadrant.co     7013                 :         230174 :     return LogwrtResult.Flush;
                               7014                 :                : }
                               7015                 :                : 
                               7016                 :                : /*
                               7017                 :                :  * GetWALInsertionTimeLine -- Returns the current timeline of a system that
                               7018                 :                :  * is not in recovery.
                               7019                 :                :  */
                               7020                 :                : TimeLineID
 1724 rhaas@postgresql.org     7021                 :         122439 : GetWALInsertionTimeLine(void)
                               7022                 :                : {
                               7023         [ -  + ]:         122439 :     Assert(XLogCtl->SharedRecoveryState == RECOVERY_STATE_DONE);
                               7024                 :                : 
                               7025                 :                :     /* Since the value can't be changing, no lock is required. */
 1719                          7026                 :         122439 :     return XLogCtl->InsertTimeLineID;
                               7027                 :                : }
                               7028                 :                : 
                               7029                 :                : /*
                               7030                 :                :  * GetWALInsertionTimeLineIfSet -- If the system is not in recovery, returns
                               7031                 :                :  * the WAL insertion timeline; else, returns 0. Wherever possible, use
                               7032                 :                :  * GetWALInsertionTimeLine() instead, since it's cheaper. Note that this
                               7033                 :                :  * function decides recovery has ended as soon as the insert TLI is set, which
                               7034                 :                :  * happens before we set XLogCtl->SharedRecoveryState to RECOVERY_STATE_DONE.
                               7035                 :                :  */
                               7036                 :                : TimeLineID
  730                          7037                 :           1402 : GetWALInsertionTimeLineIfSet(void)
                               7038                 :                : {
                               7039                 :                :     TimeLineID  insertTLI;
                               7040                 :                : 
                               7041                 :           1402 :     SpinLockAcquire(&XLogCtl->info_lck);
                               7042                 :           1402 :     insertTLI = XLogCtl->InsertTimeLineID;
                               7043                 :           1402 :     SpinLockRelease(&XLogCtl->info_lck);
                               7044                 :                : 
                               7045                 :           1402 :     return insertTLI;
                               7046                 :                : }
                               7047                 :                : 
                               7048                 :                : /*
                               7049                 :                :  * GetLastImportantRecPtr -- Returns the LSN of the last important record
                               7050                 :                :  * inserted. All records not explicitly marked as unimportant are considered
                               7051                 :                :  * important.
                               7052                 :                :  *
                               7053                 :                :  * The LSN is determined by computing the maximum of
                               7054                 :                :  * WALInsertLocks[i].lastImportantAt.
                               7055                 :                :  */
                               7056                 :                : XLogRecPtr
 3503 andres@anarazel.de       7057                 :           1905 : GetLastImportantRecPtr(void)
                               7058                 :                : {
                               7059                 :           1905 :     XLogRecPtr  res = InvalidXLogRecPtr;
                               7060                 :                :     int         i;
                               7061                 :                : 
                               7062         [ +  + ]:          17145 :     for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
                               7063                 :                :     {
                               7064                 :                :         XLogRecPtr  last_important;
                               7065                 :                : 
                               7066                 :                :         /*
                               7067                 :                :          * Need to take a lock to prevent torn reads of the LSN, which are
                               7068                 :                :          * possible on some of the supported platforms. WAL insert locks only
                               7069                 :                :          * support exclusive mode, so we have to use that.
                               7070                 :                :          */
                               7071                 :          15240 :         LWLockAcquire(&WALInsertLocks[i].l.lock, LW_EXCLUSIVE);
                               7072                 :          15240 :         last_important = WALInsertLocks[i].l.lastImportantAt;
                               7073                 :          15240 :         LWLockRelease(&WALInsertLocks[i].l.lock);
                               7074                 :                : 
                               7075         [ +  + ]:          15240 :         if (res < last_important)
                               7076                 :           3433 :             res = last_important;
                               7077                 :                :     }
                               7078                 :                : 
                               7079                 :           1905 :     return res;
                               7080                 :                : }
                               7081                 :                : 
                               7082                 :                : /*
                               7083                 :                :  * Get the time and LSN of the last xlog segment switch
                               7084                 :                :  */
                               7085                 :                : pg_time_t
 3503 andres@anarazel.de       7086                 :UBC           0 : GetLastSegSwitchData(XLogRecPtr *lastSwitchLSN)
                               7087                 :                : {
                               7088                 :                :     pg_time_t   result;
                               7089                 :                : 
                               7090                 :                :     /* Need WALWriteLock, but shared lock is sufficient */
 7283 tgl@sss.pgh.pa.us        7091                 :              0 :     LWLockAcquire(WALWriteLock, LW_SHARED);
 4757 heikki.linnakangas@i     7092                 :              0 :     result = XLogCtl->lastSegSwitchTime;
 3503 andres@anarazel.de       7093                 :              0 :     *lastSwitchLSN = XLogCtl->lastSegSwitchLSN;
 7283 tgl@sss.pgh.pa.us        7094                 :              0 :     LWLockRelease(WALWriteLock);
                               7095                 :                : 
                               7096                 :              0 :     return result;
                               7097                 :                : }
                               7098                 :                : 
                               7099                 :                : /*
                               7100                 :                :  * This must be called ONCE during postmaster or standalone-backend shutdown
                               7101                 :                :  */
                               7102                 :                : void
 8262 peter_e@gmx.net          7103                 :CBC         745 : ShutdownXLOG(int code, Datum arg)
                               7104                 :                : {
                               7105                 :                :     /*
                               7106                 :                :      * We should have an aux process resource owner to use, and we should not
                               7107                 :                :      * be in a transaction that's installed some other resowner.
                               7108                 :                :      */
 2930 tgl@sss.pgh.pa.us        7109         [ -  + ]:            745 :     Assert(AuxProcessResourceOwner != NULL);
                               7110   [ +  +  -  + ]:            745 :     Assert(CurrentResourceOwner == NULL ||
                               7111                 :                :            CurrentResourceOwner == AuxProcessResourceOwner);
                               7112                 :            745 :     CurrentResourceOwner = AuxProcessResourceOwner;
                               7113                 :                : 
                               7114                 :                :     /* Don't be chatty in standalone mode */
 4791                          7115   [ +  +  +  + ]:            745 :     ereport(IsPostmasterEnvironment ? LOG : NOTICE,
                               7116                 :                :             (errmsg("shutting down")));
                               7117                 :                : 
                               7118                 :                :     /*
                               7119                 :                :      * Signal walsenders to move to stopping state.
                               7120                 :                :      */
 3338 andres@anarazel.de       7121                 :            745 :     WalSndInitStopping();
                               7122                 :                : 
                               7123                 :                :     /*
                               7124                 :                :      * Wait for WAL senders to be in stopping state.  This prevents commands
                               7125                 :                :      * from writing new WAL.
                               7126                 :                :      */
                               7127                 :            745 :     WalSndWaitStopping();
                               7128                 :                : 
 6367 heikki.linnakangas@i     7129         [ +  + ]:            745 :     if (RecoveryInProgress())
  380 nathan@postgresql.or     7130                 :             63 :         CreateRestartPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_FAST);
                               7131                 :                :     else
                               7132                 :                :     {
                               7133                 :                :         /*
                               7134                 :                :          * If archiving is enabled, rotate the last XLOG file so that all the
                               7135                 :                :          * remaining records are archived (postmaster wakes up the archiver
                               7136                 :                :          * process one more time at the end of shutdown). The checkpoint
                               7137                 :                :          * record will go to the next XLOG file and won't be archived (yet).
                               7138                 :                :          */
 1634 rhaas@postgresql.org     7139   [ +  +  -  +  :            682 :         if (XLogArchivingActive())
                                              +  + ]
 3503 andres@anarazel.de       7140                 :             18 :             RequestXLogSwitch(false);
                               7141                 :                : 
  380 nathan@postgresql.or     7142                 :            682 :         CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_FAST);
                               7143                 :                :     }
 9790 vadim4o@yahoo.com        7144                 :            745 : }
                               7145                 :                : 
                               7146                 :                : /*
                               7147                 :                :  * Format checkpoint request flags as a space-separated string for
                               7148                 :                :  * log messages.
                               7149                 :                :  */
                               7150                 :                : static const char *
  157 fujii@postgresql.org     7151                 :           3204 : CheckpointFlagsString(int flags)
                               7152                 :                : {
                               7153                 :                :     static char buf[128];
                               7154                 :                : 
                               7155                 :          25632 :     snprintf(buf, sizeof(buf), "%s%s%s%s%s%s%s%s",
                               7156         [ +  + ]:           3204 :              (flags & CHECKPOINT_IS_SHUTDOWN) ? " shutdown" : "",
                               7157         [ +  + ]:           3204 :              (flags & CHECKPOINT_END_OF_RECOVERY) ? " end-of-recovery" : "",
                               7158         [ +  + ]:           3204 :              (flags & CHECKPOINT_FAST) ? " fast" : "",
                               7159         [ +  + ]:           3204 :              (flags & CHECKPOINT_FORCE) ? " force" : "",
                               7160         [ +  + ]:           3204 :              (flags & CHECKPOINT_WAIT) ? " wait" : "",
                               7161         [ +  + ]:           3204 :              (flags & CHECKPOINT_CAUSE_XLOG) ? " wal" : "",
                               7162         [ +  + ]:           3204 :              (flags & CHECKPOINT_CAUSE_TIME) ? " time" : "",
                               7163         [ +  + ]:           3204 :              (flags & CHECKPOINT_FLUSH_UNLOGGED) ? " flush-unlogged" : "");
                               7164                 :                : 
                               7165                 :           3204 :     return buf;
                               7166                 :                : }
                               7167                 :                : 
                               7168                 :                : /*
                               7169                 :                :  * Log start of a checkpoint.
                               7170                 :                :  */
                               7171                 :                : static void
 6367 heikki.linnakangas@i     7172                 :           1602 : LogCheckpointStart(int flags, bool restartpoint)
                               7173                 :                : {
 2060 peter@eisentraut.org     7174         [ +  + ]:           1602 :     if (restartpoint)
                               7175         [ +  - ]:            217 :         ereport(LOG,
                               7176                 :                :         /* translator: the placeholder shows checkpoint options */
                               7177                 :                :                 (errmsg("restartpoint starting:%s",
                               7178                 :                :                         CheckpointFlagsString(flags))));
                               7179                 :                :     else
                               7180         [ +  - ]:           1385 :         ereport(LOG,
                               7181                 :                :         /* translator: the placeholder shows checkpoint options */
                               7182                 :                :                 (errmsg("checkpoint starting:%s",
                               7183                 :                :                         CheckpointFlagsString(flags))));
 6966 tgl@sss.pgh.pa.us        7184                 :           1602 : }
                               7185                 :                : 
                               7186                 :                : /*
                               7187                 :                :  * Log end of a checkpoint.
                               7188                 :                :  */
                               7189                 :                : static void
  157 fujii@postgresql.org     7190                 :           1930 : LogCheckpointEnd(bool restartpoint, int flags)
                               7191                 :                : {
                               7192                 :                :     long        write_msecs,
                               7193                 :                :                 sync_msecs,
                               7194                 :                :                 total_msecs,
                               7195                 :                :                 longest_msecs,
                               7196                 :                :                 average_msecs;
                               7197                 :                :     uint64      average_sync_time;
                               7198                 :                : 
 6966 tgl@sss.pgh.pa.us        7199                 :           1930 :     CheckpointStats.ckpt_end_t = GetCurrentTimestamp();
                               7200                 :                : 
 2084                          7201                 :           1930 :     write_msecs = TimestampDifferenceMilliseconds(CheckpointStats.ckpt_write_t,
                               7202                 :                :                                                   CheckpointStats.ckpt_sync_t);
                               7203                 :                : 
                               7204                 :           1930 :     sync_msecs = TimestampDifferenceMilliseconds(CheckpointStats.ckpt_sync_t,
                               7205                 :                :                                                  CheckpointStats.ckpt_sync_end_t);
                               7206                 :                : 
                               7207                 :                :     /* Accumulate checkpoint timing summary data, in milliseconds. */
 1000 michael@paquier.xyz      7208                 :           1930 :     PendingCheckpointerStats.write_time += write_msecs;
                               7209                 :           1930 :     PendingCheckpointerStats.sync_time += sync_msecs;
                               7210                 :                : 
                               7211                 :                :     /*
                               7212                 :                :      * All of the published timing statistics are accounted for.  Only
                               7213                 :                :      * continue if a log message is to be written.
                               7214                 :                :      */
 5225 rhaas@postgresql.org     7215         [ +  + ]:           1930 :     if (!log_checkpoints)
                               7216                 :            328 :         return;
                               7217                 :                : 
 2084 tgl@sss.pgh.pa.us        7218                 :           1602 :     total_msecs = TimestampDifferenceMilliseconds(CheckpointStats.ckpt_start_t,
                               7219                 :                :                                                   CheckpointStats.ckpt_end_t);
                               7220                 :                : 
                               7221                 :                :     /*
                               7222                 :                :      * Timing values returned from CheckpointStats are in microseconds.
                               7223                 :                :      * Convert to milliseconds for consistent printing.
                               7224                 :                :      */
                               7225                 :           1602 :     longest_msecs = (long) ((CheckpointStats.ckpt_longest_sync + 999) / 1000);
                               7226                 :                : 
 5703 rhaas@postgresql.org     7227                 :           1602 :     average_sync_time = 0;
 5586 bruce@momjian.us         7228         [ -  + ]:           1602 :     if (CheckpointStats.ckpt_sync_rels > 0)
 5703 rhaas@postgresql.org     7229                 :UBC           0 :         average_sync_time = CheckpointStats.ckpt_agg_sync_time /
                               7230                 :              0 :             CheckpointStats.ckpt_sync_rels;
 2084 tgl@sss.pgh.pa.us        7231                 :CBC        1602 :     average_msecs = (long) ((average_sync_time + 999) / 1000);
                               7232                 :                : 
                               7233                 :                :     /*
                               7234                 :                :      * ControlFileLock is not required to see ControlFile->checkPoint and
                               7235                 :                :      * ->checkPointCopy here as we are the only updater of those variables at
                               7236                 :                :      * this moment.
                               7237                 :                :      */
 2060 peter@eisentraut.org     7238         [ +  + ]:           1602 :     if (restartpoint)
                               7239         [ +  - ]:            217 :         ereport(LOG,
                               7240                 :                :                 (errmsg("restartpoint complete:%s: wrote %d buffers (%.1f%%), "
                               7241                 :                :                         "wrote %d SLRU buffers; %d WAL file(s) added, "
                               7242                 :                :                         "%d removed, %d recycled; write=%ld.%03d s, "
                               7243                 :                :                         "sync=%ld.%03d s, total=%ld.%03d s; sync files=%d, "
                               7244                 :                :                         "longest=%ld.%03d s, average=%ld.%03d s; distance=%d kB, "
                               7245                 :                :                         "estimate=%d kB; lsn=%X/%08X, redo lsn=%X/%08X",
                               7246                 :                :                         CheckpointFlagsString(flags),
                               7247                 :                :                         CheckpointStats.ckpt_bufs_written,
                               7248                 :                :                         (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
                               7249                 :                :                         CheckpointStats.ckpt_slru_written,
                               7250                 :                :                         CheckpointStats.ckpt_segs_added,
                               7251                 :                :                         CheckpointStats.ckpt_segs_removed,
                               7252                 :                :                         CheckpointStats.ckpt_segs_recycled,
                               7253                 :                :                         write_msecs / 1000, (int) (write_msecs % 1000),
                               7254                 :                :                         sync_msecs / 1000, (int) (sync_msecs % 1000),
                               7255                 :                :                         total_msecs / 1000, (int) (total_msecs % 1000),
                               7256                 :                :                         CheckpointStats.ckpt_sync_rels,
                               7257                 :                :                         longest_msecs / 1000, (int) (longest_msecs % 1000),
                               7258                 :                :                         average_msecs / 1000, (int) (average_msecs % 1000),
                               7259                 :                :                         (int) (PrevCheckPointDistance / 1024.0),
                               7260                 :                :                         (int) (CheckPointDistanceEstimate / 1024.0),
                               7261                 :                :                         LSN_FORMAT_ARGS(ControlFile->checkPoint),
                               7262                 :                :                         LSN_FORMAT_ARGS(ControlFile->checkPointCopy.redo))));
                               7263                 :                :     else
                               7264         [ +  - ]:           1385 :         ereport(LOG,
                               7265                 :                :                 (errmsg("checkpoint complete:%s: wrote %d buffers (%.1f%%), "
                               7266                 :                :                         "wrote %d SLRU buffers; %d WAL file(s) added, "
                               7267                 :                :                         "%d removed, %d recycled; write=%ld.%03d s, "
                               7268                 :                :                         "sync=%ld.%03d s, total=%ld.%03d s; sync files=%d, "
                               7269                 :                :                         "longest=%ld.%03d s, average=%ld.%03d s; distance=%d kB, "
                               7270                 :                :                         "estimate=%d kB; lsn=%X/%08X, redo lsn=%X/%08X",
                               7271                 :                :                         CheckpointFlagsString(flags),
                               7272                 :                :                         CheckpointStats.ckpt_bufs_written,
                               7273                 :                :                         (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
                               7274                 :                :                         CheckpointStats.ckpt_slru_written,
                               7275                 :                :                         CheckpointStats.ckpt_segs_added,
                               7276                 :                :                         CheckpointStats.ckpt_segs_removed,
                               7277                 :                :                         CheckpointStats.ckpt_segs_recycled,
                               7278                 :                :                         write_msecs / 1000, (int) (write_msecs % 1000),
                               7279                 :                :                         sync_msecs / 1000, (int) (sync_msecs % 1000),
                               7280                 :                :                         total_msecs / 1000, (int) (total_msecs % 1000),
                               7281                 :                :                         CheckpointStats.ckpt_sync_rels,
                               7282                 :                :                         longest_msecs / 1000, (int) (longest_msecs % 1000),
                               7283                 :                :                         average_msecs / 1000, (int) (average_msecs % 1000),
                               7284                 :                :                         (int) (PrevCheckPointDistance / 1024.0),
                               7285                 :                :                         (int) (CheckPointDistanceEstimate / 1024.0),
                               7286                 :                :                         LSN_FORMAT_ARGS(ControlFile->checkPoint),
                               7287                 :                :                         LSN_FORMAT_ARGS(ControlFile->checkPointCopy.redo))));
                               7288                 :                : }
                               7289                 :                : 
                               7290                 :                : /*
                               7291                 :                :  * Update the estimate of distance between checkpoints.
                               7292                 :                :  *
                               7293                 :                :  * The estimate is used to calculate the number of WAL segments to keep
                               7294                 :                :  * preallocated, see XLOGfileslop().
                               7295                 :                :  */
                               7296                 :                : static void
 4171 heikki.linnakangas@i     7297                 :           1930 : UpdateCheckPointDistanceEstimate(uint64 nbytes)
                               7298                 :                : {
                               7299                 :                :     /*
                               7300                 :                :      * To estimate the number of segments consumed between checkpoints, keep a
                               7301                 :                :      * moving average of the amount of WAL generated in previous checkpoint
                               7302                 :                :      * cycles. However, if the load is bursty, with quiet periods and busy
                               7303                 :                :      * periods, we want to cater for the peak load. So instead of a plain
                               7304                 :                :      * moving average, let the average decline slowly if the previous cycle
                               7305                 :                :      * used less WAL than estimated, but bump it up immediately if it used
                               7306                 :                :      * more.
                               7307                 :                :      *
                               7308                 :                :      * When checkpoints are triggered by max_wal_size, this should converge to
                               7309                 :                :      * CheckPointSegments * wal_segment_size.
                               7310                 :                :      *
                               7311                 :                :      * Note: This doesn't pay any attention to what caused the checkpoint.
                               7312                 :                :      * Checkpoints triggered manually with CHECKPOINT command, or by e.g.
                               7313                 :                :      * starting a base backup, are counted the same as those created
                               7314                 :                :      * automatically. The slow decline will largely mask them out if they are
                               7315                 :                :      * not frequent. If they are frequent, it seems reasonable to count them
                               7316                 :                :      * like any others; if you issue a manual checkpoint every 5 minutes and
                               7317                 :                :      * never let a timed checkpoint happen, it makes sense to base the
                               7318                 :                :      * preallocation on that 5 minute interval rather than whatever
                               7319                 :                :      * checkpoint_timeout is set to.
                               7320                 :                :      */
                               7321                 :           1930 :     PrevCheckPointDistance = nbytes;
                               7322         [ +  + ]:           1930 :     if (CheckPointDistanceEstimate < nbytes)
                               7323                 :            845 :         CheckPointDistanceEstimate = nbytes;
                               7324                 :                :     else
                               7325                 :           1085 :         CheckPointDistanceEstimate =
                               7326                 :           1085 :             (0.90 * CheckPointDistanceEstimate + 0.10 * (double) nbytes);
 6966 tgl@sss.pgh.pa.us        7327                 :           1930 : }
                               7328                 :                : 
                               7329                 :                : /*
                               7330                 :                :  * Update the ps display for a process running a checkpoint.  Note that
                               7331                 :                :  * this routine should not do any allocations, so that it can be called
                               7332                 :                :  * from a critical section.
                               7333                 :                :  */
                               7334                 :                : static void
 2050 michael@paquier.xyz      7335                 :           3860 : update_checkpoint_display(int flags, bool restartpoint, bool reset)
                               7336                 :                : {
                               7337                 :                :     /*
                               7338                 :                :      * The status is reported only for end-of-recovery and shutdown
                               7339                 :                :      * checkpoints or shutdown restartpoints.  Updating the ps display is
                               7340                 :                :      * useful in those situations as it may not be possible to rely on
                               7341                 :                :      * pg_stat_activity to see the status of the checkpointer or the startup
                               7342                 :                :      * process.
                               7343                 :                :      */
                               7344         [ +  + ]:           3860 :     if ((flags & (CHECKPOINT_END_OF_RECOVERY | CHECKPOINT_IS_SHUTDOWN)) == 0)
                               7345                 :           2388 :         return;
                               7346                 :                : 
                               7347         [ +  + ]:           1472 :     if (reset)
                               7348                 :            736 :         set_ps_display("");
                               7349                 :                :     else
                               7350                 :                :     {
                               7351                 :                :         char        activitymsg[128];
                               7352                 :                : 
                               7353         [ +  + ]:           2208 :         snprintf(activitymsg, sizeof(activitymsg), "performing %s%s%s",
                               7354         [ +  + ]:            736 :                  (flags & CHECKPOINT_END_OF_RECOVERY) ? "end-of-recovery " : "",
                               7355         [ +  + ]:            736 :                  (flags & CHECKPOINT_IS_SHUTDOWN) ? "shutdown " : "",
                               7356                 :                :                  restartpoint ? "restartpoint" : "checkpoint");
                               7357                 :            736 :         set_ps_display(activitymsg);
                               7358                 :                :     }
                               7359                 :                : }
                               7360                 :                : 
                               7361                 :                : 
                               7362                 :                : /*
                               7363                 :                :  * Perform a checkpoint --- either during shutdown, or on-the-fly
                               7364                 :                :  *
                               7365                 :                :  * flags is a bitwise OR of the following:
                               7366                 :                :  *  CHECKPOINT_IS_SHUTDOWN: checkpoint is for database shutdown.
                               7367                 :                :  *  CHECKPOINT_END_OF_RECOVERY: checkpoint is for end of WAL recovery.
                               7368                 :                :  *  CHECKPOINT_FAST: finish the checkpoint ASAP, ignoring
                               7369                 :                :  *      checkpoint_completion_target parameter.
                               7370                 :                :  *  CHECKPOINT_FORCE: force a checkpoint even if no XLOG activity has occurred
                               7371                 :                :  *      since the last one (implied by CHECKPOINT_IS_SHUTDOWN or
                               7372                 :                :  *      CHECKPOINT_END_OF_RECOVERY).
                               7373                 :                :  *  CHECKPOINT_FLUSH_UNLOGGED: also flush buffers of unlogged tables.
                               7374                 :                :  *
                               7375                 :                :  * Note: flags contains other bits, of interest here only for logging purposes.
                               7376                 :                :  * In particular note that this routine is synchronous and does not pay
                               7377                 :                :  * attention to CHECKPOINT_WAIT.
                               7378                 :                :  *
                               7379                 :                :  * If !shutdown then we are writing an online checkpoint. An XLOG_CHECKPOINT_REDO
                               7380                 :                :  * record is inserted into WAL at the logical location of the checkpoint, before
                               7381                 :                :  * flushing anything to disk. After the checkpoint eventually completes, it is
                               7382                 :                :  * from this point that WAL replay will begin in the case of a recovery
                               7383                 :                :  * from this checkpoint. Once everything is written to disk, an
                               7384                 :                :  * XLOG_CHECKPOINT_ONLINE record is written to complete the checkpoint, and
                               7385                 :                :  * points back to the earlier XLOG_CHECKPOINT_REDO record. This mechanism allows
                               7386                 :                :  * other write-ahead log records to be written while the checkpoint is in
                               7387                 :                :  * progress, but we must be very careful about order of operations. This function
                               7388                 :                :  * may take many minutes to execute on a busy system.
                               7389                 :                :  *
                               7390                 :                :  * On the other hand, when shutdown is true, concurrent insertion into the
                               7391                 :                :  * write-ahead log is impossible, so there is no need for two separate records.
                               7392                 :                :  * In this case, we only insert an XLOG_CHECKPOINT_SHUTDOWN record, and it's
                               7393                 :                :  * both the record marking the completion of the checkpoint and the location
                               7394                 :                :  * from which WAL replay would begin if needed.
                               7395                 :                :  *
                               7396                 :                :  * Returns true if a new checkpoint was performed, or false if it was skipped
                               7397                 :                :  * because the system was idle.
                               7398                 :                :  */
                               7399                 :                : bool
 6968 tgl@sss.pgh.pa.us        7400                 :           1714 : CreateCheckPoint(int flags)
                               7401                 :                : {
                               7402                 :                :     bool        shutdown;
                               7403                 :                :     CheckPoint  checkPoint;
                               7404                 :                :     XLogRecPtr  recptr;
                               7405                 :                :     XLogSegNo   _logSegNo;
 9601 bruce@momjian.us         7406                 :           1714 :     XLogCtlInsert *Insert = &XLogCtl->Insert;
                               7407                 :                :     uint32      freespace;
                               7408                 :                :     XLogRecPtr  PriorRedoPtr;
                               7409                 :                :     XLogRecPtr  last_important_lsn;
                               7410                 :                :     VirtualTransactionId *vxids;
                               7411                 :                :     int         nvxids;
 1735 rhaas@postgresql.org     7412                 :           1714 :     int         oldXLogAllowed = 0;
                               7413                 :                : 
                               7414                 :                :     /*
                               7415                 :                :      * An end-of-recovery checkpoint is really a shutdown checkpoint, just
                               7416                 :                :      * issued at a different time.
                               7417                 :                :      */
 6239 tgl@sss.pgh.pa.us        7418         [ +  + ]:           1714 :     if (flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY))
 6240 heikki.linnakangas@i     7419                 :            713 :         shutdown = true;
                               7420                 :                :     else
                               7421                 :           1001 :         shutdown = false;
                               7422                 :                : 
                               7423                 :                :     /* sanity check */
 6239 tgl@sss.pgh.pa.us        7424   [ +  +  -  + ]:           1714 :     if (RecoveryInProgress() && (flags & CHECKPOINT_END_OF_RECOVERY) == 0)
 6239 tgl@sss.pgh.pa.us        7425         [ #  # ]:UBC           0 :         elog(ERROR, "can't create a checkpoint during recovery");
                               7426                 :                : 
                               7427                 :                :     /*
                               7428                 :                :      * Prepare to accumulate statistics.
                               7429                 :                :      *
                               7430                 :                :      * Note: because it is possible for log_checkpoints to change while a
                               7431                 :                :      * checkpoint proceeds, we always accumulate stats, even if
                               7432                 :                :      * log_checkpoints is currently off.
                               7433                 :                :      */
 6966 tgl@sss.pgh.pa.us        7434   [ +  -  +  -  :CBC       18854 :     MemSet(&CheckpointStats, 0, sizeof(CheckpointStats));
                                     +  -  +  -  +  
                                                 + ]
                               7435                 :           1714 :     CheckpointStats.ckpt_start_t = GetCurrentTimestamp();
                               7436                 :                : 
                               7437                 :                :     /*
                               7438                 :                :      * Let smgr prepare for checkpoint; this has to happen outside the
                               7439                 :                :      * critical section and before we determine the REDO pointer.  Note that
                               7440                 :                :      * smgr must not do anything that'd have to be undone if we decide no
                               7441                 :                :      * checkpoint is needed.
                               7442                 :                :      */
 1593 tmunro@postgresql.or     7443                 :           1714 :     SyncPreCheckpoint();
                               7444                 :                : 
                               7445                 :                :     /* Run these points outside the critical section. */
  222 michael@paquier.xyz      7446                 :           1714 :     INJECTION_POINT("create-checkpoint-initial", NULL);
                               7447                 :           1714 :     INJECTION_POINT_LOAD("create-checkpoint-run");
                               7448                 :                : 
                               7449                 :                :     /*
                               7450                 :                :      * Use a critical section to force system panic if we have trouble.
                               7451                 :                :      */
 9066 tgl@sss.pgh.pa.us        7452                 :           1714 :     START_CRIT_SECTION();
                               7453                 :                : 
 9799 vadim4o@yahoo.com        7454         [ +  + ]:           1714 :     if (shutdown)
                               7455                 :                :     {
 6367 heikki.linnakangas@i     7456                 :            713 :         LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 9799 vadim4o@yahoo.com        7457                 :            713 :         ControlFile->state = DB_SHUTDOWNING;
                               7458                 :            713 :         UpdateControlFile();
 6367 heikki.linnakangas@i     7459                 :            713 :         LWLockRelease(ControlFileLock);
                               7460                 :                :     }
                               7461                 :                : 
                               7462                 :                :     /* Begin filling in the checkpoint WAL record */
 8466 tgl@sss.pgh.pa.us        7463   [ +  -  +  -  :          22282 :     MemSet(&checkPoint, 0, sizeof(checkPoint));
                                     +  -  +  -  +  
                                                 + ]
 6734                          7464                 :           1714 :     checkPoint.time = (pg_time_t) time(NULL);
                               7465                 :                : 
                               7466                 :                :     /*
                               7467                 :                :      * For Hot Standby, derive the oldestActiveXid before we fix the redo
                               7468                 :                :      * pointer. This allows us to begin accumulating changes to assemble our
                               7469                 :                :      * starting snapshot of locks and transactions.
                               7470                 :                :      */
 5380 simon@2ndQuadrant.co     7471   [ +  +  +  + ]:           1714 :     if (!shutdown && XLogStandbyInfoActive())
  368 akapila@postgresql.o     7472                 :            960 :         checkPoint.oldestActiveXid = GetOldestActiveTransactionId(false, true);
                               7473                 :                :     else
 5380 simon@2ndQuadrant.co     7474                 :            754 :         checkPoint.oldestActiveXid = InvalidTransactionId;
                               7475                 :                : 
                               7476                 :                :     /*
                               7477                 :                :      * Get location of last important record before acquiring insert locks (as
                               7478                 :                :      * GetLastImportantRecPtr() also locks WAL locks).
                               7479                 :                :      */
 3503 andres@anarazel.de       7480                 :           1714 :     last_important_lsn = GetLastImportantRecPtr();
                               7481                 :                : 
                               7482                 :                :     /*
                               7483                 :                :      * If this isn't a shutdown or forced checkpoint, and if there has been no
                               7484                 :                :      * WAL activity requiring a checkpoint, skip it.  The idea here is to
                               7485                 :                :      * avoid inserting duplicate checkpoints when the system is idle.
                               7486                 :                :      */
 6240 heikki.linnakangas@i     7487         [ +  + ]:           1714 :     if ((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY |
                               7488                 :                :                   CHECKPOINT_FORCE)) == 0)
                               7489                 :                :     {
 3503 andres@anarazel.de       7490         [ +  + ]:            212 :         if (last_important_lsn == ControlFile->checkPoint)
                               7491                 :                :         {
 9266 tgl@sss.pgh.pa.us        7492         [ -  + ]:              1 :             END_CRIT_SECTION();
 3503 andres@anarazel.de       7493         [ -  + ]:              1 :             ereport(DEBUG1,
                               7494                 :                :                     (errmsg_internal("checkpoint skipped because system is idle")));
  664 fujii@postgresql.org     7495                 :              1 :             return false;
                               7496                 :                :         }
                               7497                 :                :     }
                               7498                 :                : 
                               7499                 :                :     /*
                               7500                 :                :      * An end-of-recovery checkpoint is created before anyone is allowed to
                               7501                 :                :      * write WAL. To allow us to write the checkpoint record, temporarily
                               7502                 :                :      * enable XLogInsertAllowed.
                               7503                 :                :      */
 6177 heikki.linnakangas@i     7504         [ +  + ]:           1713 :     if (flags & CHECKPOINT_END_OF_RECOVERY)
 1735 rhaas@postgresql.org     7505                 :             31 :         oldXLogAllowed = LocalSetXLogInsertAllowed();
                               7506                 :                : 
 1719                          7507                 :           1713 :     checkPoint.ThisTimeLineID = XLogCtl->InsertTimeLineID;
 4913 heikki.linnakangas@i     7508         [ +  + ]:           1713 :     if (flags & CHECKPOINT_END_OF_RECOVERY)
                               7509                 :             31 :         checkPoint.PrevTimeLineID = XLogCtl->PrevTimeLineID;
                               7510                 :                :     else
 1724 rhaas@postgresql.org     7511                 :           1682 :         checkPoint.PrevTimeLineID = checkPoint.ThisTimeLineID;
                               7512                 :                : 
                               7513                 :                :     /*
                               7514                 :                :      * We must block concurrent insertions while examining insert state.
                               7515                 :                :      */
 1011                          7516                 :           1713 :     WALInsertLockAcquireExclusive();
                               7517                 :                : 
                               7518                 :           1713 :     checkPoint.fullPageWrites = Insert->fullPageWrites;
  738                          7519                 :           1713 :     checkPoint.wal_level = wal_level;
                               7520                 :                : 
                               7521                 :                :     /*
                               7522                 :                :      * Get the current data_checksum_version value from XLogCtl, valid at the
                               7523                 :                :      * time of the checkpoint.
                               7524                 :                :      */
   87 dgustafsson@postgres     7525                 :           1713 :     SpinLockAcquire(&XLogCtl->info_lck);
  114                          7526                 :           1713 :     checkPoint.dataChecksumState = XLogCtl->data_checksum_version;
   87                          7527                 :           1713 :     SpinLockRelease(&XLogCtl->info_lck);
                               7528                 :                : 
 1011 rhaas@postgresql.org     7529         [ +  + ]:           1713 :     if (shutdown)
                               7530                 :                :     {
                               7531                 :            713 :         XLogRecPtr  curInsert = XLogBytePosToRecPtr(Insert->CurrBytePos);
                               7532                 :                : 
                               7533                 :                :         /*
                               7534                 :                :          * Compute new REDO record ptr = location of next XLOG record.
                               7535                 :                :          *
                               7536                 :                :          * Since this is a shutdown checkpoint, there can't be any concurrent
                               7537                 :                :          * WAL insertion.
                               7538                 :                :          */
                               7539         [ +  - ]:            713 :         freespace = INSERT_FREESPACE(curInsert);
                               7540         [ -  + ]:            713 :         if (freespace == 0)
                               7541                 :                :         {
 1011 rhaas@postgresql.org     7542         [ #  # ]:UBC           0 :             if (XLogSegmentOffset(curInsert, wal_segment_size) == 0)
                               7543                 :              0 :                 curInsert += SizeOfXLogLongPHD;
                               7544                 :                :             else
                               7545                 :              0 :                 curInsert += SizeOfXLogShortPHD;
                               7546                 :                :         }
 1011 rhaas@postgresql.org     7547                 :CBC         713 :         checkPoint.redo = curInsert;
                               7548                 :                : 
                               7549                 :                :         /*
                               7550                 :                :          * Here we update the shared RedoRecPtr for future XLogInsert calls;
                               7551                 :                :          * this must be done while holding all the insertion locks.
                               7552                 :                :          *
                               7553                 :                :          * Note: if we fail to complete the checkpoint, RedoRecPtr will be
                               7554                 :                :          * left pointing past where it really needs to point.  This is okay;
                               7555                 :                :          * the only consequence is that XLogInsert might back up whole buffers
                               7556                 :                :          * that it didn't really need to.  We can't postpone advancing
                               7557                 :                :          * RedoRecPtr because XLogInserts that happen while we are dumping
                               7558                 :                :          * buffers must assume that their buffer changes are not included in
                               7559                 :                :          * the checkpoint.
                               7560                 :                :          */
                               7561                 :            713 :         RedoRecPtr = XLogCtl->Insert.RedoRecPtr = checkPoint.redo;
                               7562                 :                :     }
                               7563                 :                : 
                               7564                 :                :     /*
                               7565                 :                :      * Now we can release the WAL insertion locks, allowing other xacts to
                               7566                 :                :      * proceed while we are flushing disk buffers.
                               7567                 :                :      */
 4510 heikki.linnakangas@i     7568                 :           1713 :     WALInsertLockRelease();
                               7569                 :                : 
                               7570                 :                :     /*
                               7571                 :                :      * If this is an online checkpoint, we have not yet determined the redo
                               7572                 :                :      * point. We do so now by inserting the special XLOG_CHECKPOINT_REDO
                               7573                 :                :      * record; the LSN at which it starts becomes the new redo pointer. We
                               7574                 :                :      * don't do this for a shutdown checkpoint, because in that case no WAL
                               7575                 :                :      * can be written between the redo point and the insertion of the
                               7576                 :                :      * checkpoint record itself, so the checkpoint record itself serves to
                               7577                 :                :      * mark the redo point.
                               7578                 :                :      */
 1011 rhaas@postgresql.org     7579         [ +  + ]:           1713 :     if (!shutdown)
                               7580                 :                :     {
                               7581                 :                :         xl_checkpoint_redo redo_rec;
                               7582                 :                : 
  117 dgustafsson@postgres     7583                 :           1000 :         WALInsertLockAcquire();
                               7584                 :           1000 :         redo_rec.wal_level = wal_level;
  114                          7585                 :           1000 :         SpinLockAcquire(&XLogCtl->info_lck);
                               7586                 :           1000 :         redo_rec.data_checksum_version = XLogCtl->data_checksum_version;
                               7587                 :           1000 :         SpinLockRelease(&XLogCtl->info_lck);
  117                          7588                 :           1000 :         WALInsertLockRelease();
                               7589                 :                : 
                               7590                 :                :         /* Include WAL level in record for WAL summarizer's benefit. */
 1011 rhaas@postgresql.org     7591                 :           1000 :         XLogBeginInsert();
  117 dgustafsson@postgres     7592                 :           1000 :         XLogRegisterData(&redo_rec, sizeof(xl_checkpoint_redo));
 1011 rhaas@postgresql.org     7593                 :           1000 :         (void) XLogInsert(RM_XLOG_ID, XLOG_CHECKPOINT_REDO);
                               7594                 :                : 
                               7595                 :                :         /*
                               7596                 :                :          * XLogInsertRecord will have updated XLogCtl->Insert.RedoRecPtr in
                               7597                 :                :          * shared memory and RedoRecPtr in backend-local memory, but we need
                               7598                 :                :          * to copy that into the record that will be inserted when the
                               7599                 :                :          * checkpoint is complete.
                               7600                 :                :          */
                               7601                 :           1000 :         checkPoint.redo = RedoRecPtr;
                               7602                 :                :     }
                               7603                 :                : 
                               7604                 :                :     /* Update the info_lck-protected copy of RedoRecPtr as well */
 4325 andres@anarazel.de       7605                 :           1713 :     SpinLockAcquire(&XLogCtl->info_lck);
                               7606                 :           1713 :     XLogCtl->RedoRecPtr = checkPoint.redo;
                               7607                 :           1713 :     SpinLockRelease(&XLogCtl->info_lck);
                               7608                 :                : 
                               7609                 :                :     /*
                               7610                 :                :      * If enabled, log checkpoint start.  We postpone this until now so as not
                               7611                 :                :      * to log anything if we decided to skip the checkpoint.
                               7612                 :                :      */
 6966 tgl@sss.pgh.pa.us        7613         [ +  + ]:           1713 :     if (log_checkpoints)
 6367 heikki.linnakangas@i     7614                 :           1385 :         LogCheckpointStart(flags, false);
                               7615                 :                : 
  222 michael@paquier.xyz      7616                 :           1713 :     INJECTION_POINT_CACHED("create-checkpoint-run", NULL);
                               7617                 :                : 
                               7618                 :                :     /* Update the process title */
 2050                          7619                 :           1713 :     update_checkpoint_display(flags, false, false);
                               7620                 :                : 
                               7621                 :                :     TRACE_POSTGRESQL_CHECKPOINT_START(flags);
                               7622                 :                : 
                               7623                 :                :     /*
                               7624                 :                :      * Get the other info we need for the checkpoint record.
                               7625                 :                :      *
                               7626                 :                :      * We don't need to save oldestClogXid in the checkpoint; it only matters
                               7627                 :                :      * for the short period in which clog is being truncated, and if we crash
                               7628                 :                :      * during that we'll redo the clog truncation and fix up oldestClogXid
                               7629                 :                :      * there.
                               7630                 :                :      */
 4496 heikki.linnakangas@i     7631                 :           1713 :     LWLockAcquire(XidGenLock, LW_SHARED);
  961                          7632                 :           1713 :     checkPoint.nextXid = TransamVariables->nextXid;
                               7633                 :           1713 :     checkPoint.oldestXid = TransamVariables->oldestXid;
                               7634                 :           1713 :     checkPoint.oldestXidDB = TransamVariables->oldestXidDB;
 4496                          7635                 :           1713 :     LWLockRelease(XidGenLock);
                               7636                 :                : 
 4253 alvherre@alvh.no-ip.     7637                 :           1713 :     LWLockAcquire(CommitTsLock, LW_SHARED);
  961 heikki.linnakangas@i     7638                 :           1713 :     checkPoint.oldestCommitTsXid = TransamVariables->oldestCommitTsXid;
                               7639                 :           1713 :     checkPoint.newestCommitTsXid = TransamVariables->newestCommitTsXid;
 4253 alvherre@alvh.no-ip.     7640                 :           1713 :     LWLockRelease(CommitTsLock);
                               7641                 :                : 
 4496 heikki.linnakangas@i     7642                 :           1713 :     LWLockAcquire(OidGenLock, LW_SHARED);
  961                          7643                 :           1713 :     checkPoint.nextOid = TransamVariables->nextOid;
 4496                          7644         [ +  + ]:           1713 :     if (!shutdown)
  961                          7645                 :           1000 :         checkPoint.nextOid += TransamVariables->oidCount;
 4496                          7646                 :           1713 :     LWLockRelease(OidGenLock);
                               7647                 :                : 
  215 msawada@postgresql.o     7648                 :           1713 :     checkPoint.logicalDecodingEnabled = IsLogicalDecodingEnabled();
                               7649                 :                : 
 4496 heikki.linnakangas@i     7650                 :           1713 :     MultiXactGetCheckptMulti(shutdown,
                               7651                 :                :                              &checkPoint.nextMulti,
                               7652                 :                :                              &checkPoint.nextMultiOffset,
                               7653                 :                :                              &checkPoint.oldestMulti,
                               7654                 :                :                              &checkPoint.oldestMultiDB);
                               7655                 :                : 
                               7656                 :                :     /*
                               7657                 :                :      * Having constructed the checkpoint record, ensure all shmem disk buffers
                               7658                 :                :      * and commit-log buffers are flushed to disk.
                               7659                 :                :      *
                               7660                 :                :      * This I/O could fail for various reasons.  If so, we will fail to
                               7661                 :                :      * complete the checkpoint, but there is no reason to force a system
                               7662                 :                :      * panic. Accordingly, exit critical section while doing it.
                               7663                 :                :      */
                               7664         [ -  + ]:           1713 :     END_CRIT_SECTION();
                               7665                 :                : 
                               7666                 :                :     /*
                               7667                 :                :      * In some cases there are groups of actions that must all occur on one
                               7668                 :                :      * side or the other of a checkpoint record. Before flushing the
                               7669                 :                :      * checkpoint record we must explicitly wait for any backend currently
                               7670                 :                :      * performing those groups of actions.
                               7671                 :                :      *
                               7672                 :                :      * One example is end of transaction, so we must wait for any transactions
                               7673                 :                :      * that are currently in commit critical sections.  If an xact inserted
                               7674                 :                :      * its commit record into XLOG just before the REDO point, then a crash
                               7675                 :                :      * restart from the REDO point would not replay that record, which means
                               7676                 :                :      * that our flushing had better include the xact's update of pg_xact.  So
                               7677                 :                :      * we wait until it has left its commit critical section before proceeding.
                               7678                 :                :      * See notes in RecordTransactionCommit().
                               7679                 :                :      *
                               7680                 :                :      * Because we've already released the insertion locks, this test is a bit
                               7681                 :                :      * fuzzy: it is possible that we will wait for xacts we didn't really need
                               7682                 :                :      * to wait for.  But the delay should be short and it seems better to make
                               7683                 :                :      * checkpoint take a bit longer than to hold off insertions longer than
                               7684                 :                :      * necessary. (In fact, the whole reason we have this issue is that xact.c
                               7685                 :                :      * does commit record XLOG insertion and clog update as two separate steps
                               7686                 :                :      * protected by different locks, but again that seems best on grounds of
                               7687                 :                :      * minimizing lock contention.)
                               7688                 :                :      *
                               7689                 :                :      * A transaction that has not yet set delayChkptFlags when we look cannot
                               7690                 :                :      * be at risk, since it has not inserted its commit record yet; and one
                               7691                 :                :      * that's already cleared it is not at risk either, since it's done fixing
                               7692                 :                :      * clog and we will correctly flush the update below.  So we cannot miss
                               7693                 :                :      * any xacts we need to wait for.
                               7694                 :                :      */
 1585 rhaas@postgresql.org     7695                 :           1713 :     vxids = GetVirtualXIDsDelayingChkpt(&nvxids, DELAY_CHKPT_START);
 4983 simon@2ndQuadrant.co     7696         [ +  + ]:           1713 :     if (nvxids > 0)
                               7697                 :                :     {
                               7698                 :                :         do
                               7699                 :                :         {
                               7700                 :                :             /*
                               7701                 :                :              * Keep absorbing fsync requests while we wait. There could even
                               7702                 :                :              * be a deadlock if we don't, if the process that prevents the
                               7703                 :                :              * checkpoint is trying to add a request to the queue.
                               7704                 :                :              */
  765 heikki.linnakangas@i     7705                 :             25 :             AbsorbSyncRequests();
                               7706                 :                : 
 1017 tmunro@postgresql.or     7707                 :             25 :             pgstat_report_wait_start(WAIT_EVENT_CHECKPOINT_DELAY_START);
 6828 bruce@momjian.us         7708                 :             25 :             pg_usleep(10000L);  /* wait for 10 msec */
 1017 tmunro@postgresql.or     7709                 :             25 :             pgstat_report_wait_end();
 1585 rhaas@postgresql.org     7710         [ +  + ]:             25 :         } while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids,
                               7711                 :                :                                               DELAY_CHKPT_START));
                               7712                 :                :     }
 4983 simon@2ndQuadrant.co     7713                 :           1713 :     pfree(vxids);
                               7714                 :                : 
 6968 tgl@sss.pgh.pa.us        7715                 :           1713 :     CheckPointGuts(checkPoint.redo, flags);
                               7716                 :                : 
 1585 rhaas@postgresql.org     7717                 :           1713 :     vxids = GetVirtualXIDsDelayingChkpt(&nvxids, DELAY_CHKPT_COMPLETE);
                               7718         [ -  + ]:           1713 :     if (nvxids > 0)
                               7719                 :                :     {
                               7720                 :                :         do
                               7721                 :                :         {
  765 heikki.linnakangas@i     7722                 :UBC           0 :             AbsorbSyncRequests();
                               7723                 :                : 
 1017 tmunro@postgresql.or     7724                 :              0 :             pgstat_report_wait_start(WAIT_EVENT_CHECKPOINT_DELAY_COMPLETE);
 1585 rhaas@postgresql.org     7725                 :              0 :             pg_usleep(10000L);  /* wait for 10 msec */
 1017 tmunro@postgresql.or     7726                 :              0 :             pgstat_report_wait_end();
 1585 rhaas@postgresql.org     7727         [ #  # ]:              0 :         } while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids,
                               7728                 :                :                                               DELAY_CHKPT_COMPLETE));
                               7729                 :                :     }
 1585 rhaas@postgresql.org     7730                 :CBC        1713 :     pfree(vxids);
                               7731                 :                : 
                               7732                 :                :     /*
                               7733                 :                :      * Take a snapshot of running transactions and write this to WAL. This
                               7734                 :                :      * allows us to reconstruct the state of running transactions during
                               7735                 :                :      * archive recovery, if required. Skip this if that info is disabled.
                               7736                 :                :      *
                               7737                 :                :      * If we are shutting down, or Startup process is completing crash
                               7738                 :                :      * recovery we don't need to write running xact data.
                               7739                 :                :      */
 6063 simon@2ndQuadrant.co     7740   [ +  +  +  + ]:           1713 :     if (!shutdown && XLogStandbyInfoActive())
   64 alvherre@kurilemu.de     7741                 :            959 :         LogStandbySnapshot();
                               7742                 :                : 
 8478 tgl@sss.pgh.pa.us        7743                 :           1713 :     START_CRIT_SECTION();
                               7744                 :                : 
                               7745                 :                :     /*
                               7746                 :                :      * Now insert the checkpoint record into XLOG.
                               7747                 :                :      */
 4266 heikki.linnakangas@i     7748                 :           1713 :     XLogBeginInsert();
  530 peter@eisentraut.org     7749                 :           1713 :     XLogRegisterData(&checkPoint, sizeof(checkPoint));
 9266 tgl@sss.pgh.pa.us        7750         [ +  + ]:           1713 :     recptr = XLogInsert(RM_XLOG_ID,
                               7751                 :                :                         shutdown ? XLOG_CHECKPOINT_SHUTDOWN :
                               7752                 :                :                         XLOG_CHECKPOINT_ONLINE);
                               7753                 :                : 
                               7754                 :           1713 :     XLogFlush(recptr);
                               7755                 :                : 
                               7756                 :                :     /*
                               7757                 :                :      * We mustn't write any new WAL after a shutdown checkpoint, or it will be
                               7758                 :                :      * overwritten at next startup.  No-one should even try, this just allows
                               7759                 :                :      * sanity-checking.  In the case of an end-of-recovery checkpoint, we want
                               7760                 :                :      * to just temporarily disable writing until the system has exited
                               7761                 :                :      * recovery.
                               7762                 :                :      */
 6239                          7763         [ +  + ]:           1713 :     if (shutdown)
                               7764                 :                :     {
                               7765         [ +  + ]:            713 :         if (flags & CHECKPOINT_END_OF_RECOVERY)
 1735 rhaas@postgresql.org     7766                 :             31 :             LocalXLogInsertAllowed = oldXLogAllowed;
                               7767                 :                :         else
 5994 bruce@momjian.us         7768                 :            682 :             LocalXLogInsertAllowed = 0; /* never again write WAL */
                               7769                 :                :     }
                               7770                 :                : 
                               7771                 :                :     /*
                               7772                 :                :      * We now have ProcLastRecPtr = start of actual checkpoint record, recptr
                               7773                 :                :      * = end of actual checkpoint record.
                               7774                 :                :      */
 4958 alvherre@alvh.no-ip.     7775   [ +  +  -  + ]:           1713 :     if (shutdown && checkPoint.redo != ProcLastRecPtr)
 8406 tgl@sss.pgh.pa.us        7776         [ #  # ]:UBC           0 :         ereport(PANIC,
                               7777                 :                :                 (errmsg("concurrent write-ahead log activity while database system is shutting down")));
                               7778                 :                : 
                               7779                 :                :     /*
                               7780                 :                :      * Remember the prior checkpoint's redo ptr for
                               7781                 :                :      * UpdateCheckPointDistanceEstimate()
                               7782                 :                :      */
 4171 heikki.linnakangas@i     7783                 :CBC        1713 :     PriorRedoPtr = ControlFile->checkPointCopy.redo;
                               7784                 :                : 
                               7785                 :                :     /*
                               7786                 :                :      * Update the control file.
                               7787                 :                :      */
 9066 tgl@sss.pgh.pa.us        7788                 :           1713 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 9799 vadim4o@yahoo.com        7789         [ +  + ]:           1713 :     if (shutdown)
                               7790                 :            713 :         ControlFile->state = DB_SHUTDOWNED;
 9266 tgl@sss.pgh.pa.us        7791                 :           1713 :     ControlFile->checkPoint = ProcLastRecPtr;
                               7792                 :           1713 :     ControlFile->checkPointCopy = checkPoint;
                               7793                 :                :     /* crash recovery should always recover to the end of WAL */
 4959 alvherre@alvh.no-ip.     7794                 :           1713 :     ControlFile->minRecoveryPoint = InvalidXLogRecPtr;
 4982 heikki.linnakangas@i     7795                 :           1713 :     ControlFile->minRecoveryPointTLI = 0;
                               7796                 :                : 
                               7797                 :                :     /*
                               7798                 :                :      * Persist unloggedLSN value. It's reset on crash recovery, so this goes
                               7799                 :                :      * unused on non-shutdown checkpoints, but seems useful to store it always
                               7800                 :                :      * for debugging purposes.
                               7801                 :                :      */
  878 nathan@postgresql.or     7802                 :           1713 :     ControlFile->unloggedLSN = pg_atomic_read_membarrier_u64(&XLogCtl->unloggedLSN);
                               7803                 :                : 
 9799 vadim4o@yahoo.com        7804                 :           1713 :     UpdateControlFile();
 9066 tgl@sss.pgh.pa.us        7805                 :           1713 :     LWLockRelease(ControlFileLock);
                               7806                 :                : 
                               7807                 :                :     /*
                               7808                 :                :      * We are now done with critical updates; no need for system panic if we
                               7809                 :                :      * have trouble while fooling with old log segments.
                               7810                 :                :      */
 8478                          7811         [ -  + ]:           1713 :     END_CRIT_SECTION();
                               7812                 :                : 
                               7813                 :                :     /*
                               7814                 :                :      * WAL summaries end when the next XLOG_CHECKPOINT_REDO or
                               7815                 :                :      * XLOG_CHECKPOINT_SHUTDOWN record is reached. This is the first point
                               7816                 :                :      * where (a) we're not inside of a critical section and (b) we can be
                               7817                 :                :      * certain that the relevant record has been flushed to disk, which must
                               7818                 :                :      * happen before it can be summarized.
                               7819                 :                :      *
                               7820                 :                :      * If this is a shutdown checkpoint, then this happens reasonably
                               7821                 :                :      * promptly: we've only just inserted and flushed the
                               7822                 :                :      * XLOG_CHECKPOINT_SHUTDOWN record. If this is not a shutdown checkpoint,
                               7823                 :                :      * then this might not be very prompt at all: the XLOG_CHECKPOINT_REDO
                               7824                 :                :      * record was written before we began flushing data to disk, and that
                               7825                 :                :      * could be many minutes ago at this point. However, we don't XLogFlush()
                               7826                 :                :      * after inserting that record, so we're not guaranteed that it's on disk
                               7827                 :                :      * until after the above call that flushes the XLOG_CHECKPOINT_ONLINE
                               7828                 :                :      * record.
                               7829                 :                :      */
  632 heikki.linnakangas@i     7830                 :           1713 :     WakeupWalSummarizer();
                               7831                 :                : 
                               7832                 :                :     /*
                               7833                 :                :      * Let smgr do post-checkpoint cleanup (eg, deleting old files).
                               7834                 :                :      */
 2670 tmunro@postgresql.or     7835                 :           1713 :     SyncPostCheckpoint();
                               7836                 :                : 
                               7837                 :                :     /*
                               7838                 :                :      * Update the average distance between checkpoints if the prior checkpoint
                               7839                 :                :      * exists.
                               7840                 :                :      */
  262 alvherre@kurilemu.de     7841         [ +  - ]:           1713 :     if (XLogRecPtrIsValid(PriorRedoPtr))
 4171 heikki.linnakangas@i     7842                 :           1713 :         UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);
                               7843                 :                : 
  407 akorotkov@postgresql     7844                 :           1713 :     INJECTION_POINT("checkpoint-before-old-wal-removal", NULL);
                               7845                 :                : 
                               7846                 :                :     /*
                               7847                 :                :      * Delete old log files, those no longer needed for last checkpoint to
                               7848                 :                :      * prevent the disk holding the xlog from growing full.
                               7849                 :                :      */
 2924 michael@paquier.xyz      7850                 :           1713 :     XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
                               7851                 :           1713 :     KeepLogSeg(recptr, &_logSegNo);
  522 akapila@postgresql.o     7852         [ +  + ]:           1713 :     if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT,
                               7853                 :                :                                            _logSegNo, InvalidOid,
                               7854                 :                :                                            InvalidTransactionId))
                               7855                 :                :     {
                               7856                 :                :         /*
                               7857                 :                :          * Some slots have been invalidated; recalculate the old-segment
                               7858                 :                :          * horizon, starting again from RedoRecPtr.
                               7859                 :                :          */
 1836 alvherre@alvh.no-ip.     7860                 :              4 :         XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
                               7861                 :              4 :         KeepLogSeg(recptr, &_logSegNo);
                               7862                 :                :     }
 2924 michael@paquier.xyz      7863                 :           1713 :     _logSegNo--;
 1724 rhaas@postgresql.org     7864                 :           1713 :     RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr,
                               7865                 :                :                        checkPoint.ThisTimeLineID);
                               7866                 :                : 
                               7867                 :                :     /*
                               7868                 :                :      * Make more log segments if needed.  (Do this after recycling old log
                               7869                 :                :      * segments, since that may supply some of the needed files.)
                               7870                 :                :      */
 9266 tgl@sss.pgh.pa.us        7871         [ +  + ]:           1713 :     if (!shutdown)
 1724 rhaas@postgresql.org     7872                 :           1000 :         PreallocXlogFiles(recptr, checkPoint.ThisTimeLineID);
                               7873                 :                : 
                               7874                 :                :     /*
                               7875                 :                :      * Truncate pg_subtrans if possible.  We can throw away all data before
                               7876                 :                :      * the oldest XMIN of any running transaction.  No future transaction will
                               7877                 :                :      * attempt to reference any pg_subtrans entry older than that (see Asserts
                               7878                 :                :      * in subtrans.c).  During recovery, though, we mustn't do this because
                               7879                 :                :      * StartupSUBTRANS hasn't been called yet.
                               7880                 :                :      */
 6239 tgl@sss.pgh.pa.us        7881         [ +  + ]:           1713 :     if (!RecoveryInProgress())
 2174 andres@anarazel.de       7882                 :           1682 :         TruncateSUBTRANS(GetOldestTransactionIdConsideredRunning());
                               7883                 :                : 
                               7884                 :                :     /* Real work is done; log and update stats. */
  157 fujii@postgresql.org     7885                 :           1713 :     LogCheckpointEnd(false, flags);
                               7886                 :                : 
                               7887                 :                :     /* Reset the process title */
 2050 michael@paquier.xyz      7888                 :           1713 :     update_checkpoint_display(flags, false, true);
                               7889                 :                : 
                               7890                 :                :     TRACE_POSTGRESQL_CHECKPOINT_DONE(CheckpointStats.ckpt_bufs_written,
                               7891                 :                :                                      NBuffers,
                               7892                 :                :                                      CheckpointStats.ckpt_segs_added,
                               7893                 :                :                                      CheckpointStats.ckpt_segs_removed,
                               7894                 :                :                                      CheckpointStats.ckpt_segs_recycled);
                               7895                 :                : 
  664 fujii@postgresql.org     7896                 :           1713 :     return true;
                               7897                 :                : }
                               7898                 :                : 
                               7899                 :                : /*
                               7900                 :                :  * Mark the end of recovery in WAL, though without running a full checkpoint.
                               7901                 :                :  * We can expect that a restartpoint is likely to be in progress as we
                               7902                 :                :  * do this, though we are unwilling to wait for it to complete.
                               7903                 :                :  *
                               7904                 :                :  * CreateRestartPoint() allows for the case where recovery may end before
                               7905                 :                :  * the restartpoint completes so there is no concern of concurrent behaviour.
                               7906                 :                :  */
                               7907                 :                : static void
 4926 simon@2ndQuadrant.co     7908                 :             53 : CreateEndOfRecoveryRecord(void)
                               7909                 :                : {
                               7910                 :                :     xl_end_of_recovery xlrec;
                               7911                 :                :     XLogRecPtr  recptr;
                               7912                 :                : 
                               7913                 :                :     /* sanity check */
                               7914         [ -  + ]:             53 :     if (!RecoveryInProgress())
 4926 simon@2ndQuadrant.co     7915         [ #  # ]:UBC           0 :         elog(ERROR, "can only be used to end recovery");
                               7916                 :                : 
 4237 heikki.linnakangas@i     7917                 :CBC          53 :     xlrec.end_time = GetCurrentTimestamp();
  738 rhaas@postgresql.org     7918                 :             53 :     xlrec.wal_level = wal_level;
                               7919                 :                : 
 4510 heikki.linnakangas@i     7920                 :             53 :     WALInsertLockAcquireExclusive();
 1719 rhaas@postgresql.org     7921                 :             53 :     xlrec.ThisTimeLineID = XLogCtl->InsertTimeLineID;
 4913 heikki.linnakangas@i     7922                 :             53 :     xlrec.PrevTimeLineID = XLogCtl->PrevTimeLineID;
 4510                          7923                 :             53 :     WALInsertLockRelease();
                               7924                 :                : 
 4926 simon@2ndQuadrant.co     7925                 :             53 :     START_CRIT_SECTION();
                               7926                 :                : 
 4266 heikki.linnakangas@i     7927                 :             53 :     XLogBeginInsert();
  530 peter@eisentraut.org     7928                 :             53 :     XLogRegisterData(&xlrec, sizeof(xl_end_of_recovery));
 4266 heikki.linnakangas@i     7929                 :             53 :     recptr = XLogInsert(RM_XLOG_ID, XLOG_END_OF_RECOVERY);
                               7930                 :                : 
 4924 simon@2ndQuadrant.co     7931                 :             53 :     XLogFlush(recptr);
                               7932                 :                : 
                               7933                 :                :     /*
                               7934                 :                :      * Update the control file so that crash recovery can follow the timeline
                               7935                 :                :      * changes to this point.
                               7936                 :                :      */
                               7937                 :             53 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               7938                 :             53 :     ControlFile->minRecoveryPoint = recptr;
 1724 rhaas@postgresql.org     7939                 :             53 :     ControlFile->minRecoveryPointTLI = xlrec.ThisTimeLineID;
                               7940                 :                : 
                               7941                 :                :     /* start with the latest checksum version (as of the end of recovery) */
  114 dgustafsson@postgres     7942                 :             53 :     SpinLockAcquire(&XLogCtl->info_lck);
                               7943                 :             53 :     ControlFile->data_checksum_version = XLogCtl->data_checksum_version;
                               7944                 :             53 :     SpinLockRelease(&XLogCtl->info_lck);
                               7945                 :                : 
 4924 simon@2ndQuadrant.co     7946                 :             53 :     UpdateControlFile();
                               7947                 :             53 :     LWLockRelease(ControlFileLock);
                               7948                 :                : 
 4926                          7949         [ -  + ]:             53 :     END_CRIT_SECTION();
                               7950                 :             53 : }
                               7951                 :                : 
                               7952                 :                : /*
                               7953                 :                :  * Write an OVERWRITE_CONTRECORD message.
                               7954                 :                :  *
                               7955                 :                :  * When on WAL replay we expect a continuation record at the start of a page
                               7956                 :                :  * that is not there, recovery ends and WAL writing resumes at that point.
                               7957                 :                :  * But it's wrong to resume writing new WAL back at the start of the record
                               7958                 :                :  * that was broken, because downstream consumers of that WAL (physical
                               7959                 :                :  * replicas) are not prepared to "rewind".  So the first action after
                               7960                 :                :  * finishing replay of all valid WAL must be to write a record of this type
                               7961                 :                :  * at the point where the contrecord was missing; to support xlogreader
                               7962                 :                :  * detecting the special case, XLP_FIRST_IS_OVERWRITE_CONTRECORD is also added
                               7963                 :                :  * to the page header where the record occurs.  xlogreader has an ad-hoc
                               7964                 :                :  * mechanism to report metadata about the broken record, which is what we
                               7965                 :                :  * use here.
                               7966                 :                :  *
                               7967                 :                :  * At replay time, XLP_FIRST_IS_OVERWRITE_CONTRECORD instructs xlogreader to
                               7968                 :                :  * skip the record it was reading, and pass back the LSN of the skipped
                               7969                 :                :  * record, so that its caller can verify (on "replay" of that record) that the
                               7970                 :                :  * XLOG_OVERWRITE_CONTRECORD matches what was effectively overwritten.
                               7971                 :                :  *
                               7972                 :                :  * 'aborted_lsn' is the beginning position of the record that was incomplete.
                               7973                 :                :  * It is included in the WAL record.  'pagePtr' and 'newTLI' point to the
                               7974                 :                :  * beginning of the XLOG page where the record is to be inserted.  They must
                               7975                 :                :  * match the current WAL insert position, they're passed here just so that we
                               7976                 :                :  * can verify that.
                               7977                 :                :  */
                               7978                 :                : static XLogRecPtr
 1621 heikki.linnakangas@i     7979                 :             11 : CreateOverwriteContrecordRecord(XLogRecPtr aborted_lsn, XLogRecPtr pagePtr,
                               7980                 :                :                                 TimeLineID newTLI)
                               7981                 :                : {
                               7982                 :                :     xl_overwrite_contrecord xlrec;
                               7983                 :                :     XLogRecPtr  recptr;
                               7984                 :                :     XLogPageHeader pagehdr;
                               7985                 :                :     XLogRecPtr  startPos;
                               7986                 :                : 
                               7987                 :                :     /* sanity checks */
 1761 alvherre@alvh.no-ip.     7988         [ -  + ]:             11 :     if (!RecoveryInProgress())
 1761 alvherre@alvh.no-ip.     7989         [ #  # ]:UBC           0 :         elog(ERROR, "can only be used at end of recovery");
 1621 heikki.linnakangas@i     7990         [ -  + ]:CBC          11 :     if (pagePtr % XLOG_BLCKSZ != 0)
  384 alvherre@kurilemu.de     7991         [ #  # ]:UBC           0 :         elog(ERROR, "invalid position for missing continuation record %X/%08X",
                               7992                 :                :              LSN_FORMAT_ARGS(pagePtr));
                               7993                 :                : 
                               7994                 :                :     /* The current WAL insert position should be right after the page header */
 1621 heikki.linnakangas@i     7995                 :CBC          11 :     startPos = pagePtr;
                               7996         [ +  + ]:             11 :     if (XLogSegmentOffset(startPos, wal_segment_size) == 0)
                               7997                 :              1 :         startPos += SizeOfXLogLongPHD;
                               7998                 :                :     else
                               7999                 :             10 :         startPos += SizeOfXLogShortPHD;
                               8000                 :             11 :     recptr = GetXLogInsertRecPtr();
                               8001         [ -  + ]:             11 :     if (recptr != startPos)
  384 alvherre@kurilemu.de     8002         [ #  # ]:UBC           0 :         elog(ERROR, "invalid WAL insert position %X/%08X for OVERWRITE_CONTRECORD",
                               8003                 :                :              LSN_FORMAT_ARGS(recptr));
                               8004                 :                : 
 1761 alvherre@alvh.no-ip.     8005                 :CBC          11 :     START_CRIT_SECTION();
                               8006                 :                : 
                               8007                 :                :     /*
                               8008                 :                :      * Initialize the XLOG page header (by GetXLogBuffer), and set the
                               8009                 :                :      * XLP_FIRST_IS_OVERWRITE_CONTRECORD flag.
                               8010                 :                :      *
                               8011                 :                :      * No other backend is allowed to write WAL yet, so acquiring the WAL
                               8012                 :                :      * insertion lock is just pro forma.
                               8013                 :                :      */
 1621 heikki.linnakangas@i     8014                 :             11 :     WALInsertLockAcquire();
                               8015                 :             11 :     pagehdr = (XLogPageHeader) GetXLogBuffer(pagePtr, newTLI);
                               8016                 :             11 :     pagehdr->xlp_info |= XLP_FIRST_IS_OVERWRITE_CONTRECORD;
                               8017                 :             11 :     WALInsertLockRelease();
                               8018                 :                : 
                               8019                 :                :     /*
                               8020                 :                :      * Insert the XLOG_OVERWRITE_CONTRECORD record as the first record on the
                               8021                 :                :      * page.  We know it becomes the first record, because no other backend is
                               8022                 :                :      * allowed to write WAL yet.
                               8023                 :                :      */
 1761 alvherre@alvh.no-ip.     8024                 :             11 :     XLogBeginInsert();
 1621 heikki.linnakangas@i     8025                 :             11 :     xlrec.overwritten_lsn = aborted_lsn;
                               8026                 :             11 :     xlrec.overwrite_time = GetCurrentTimestamp();
  530 peter@eisentraut.org     8027                 :             11 :     XLogRegisterData(&xlrec, sizeof(xl_overwrite_contrecord));
 1761 alvherre@alvh.no-ip.     8028                 :             11 :     recptr = XLogInsert(RM_XLOG_ID, XLOG_OVERWRITE_CONTRECORD);
                               8029                 :                : 
                               8030                 :                :     /* check that the record was inserted at the right place */
 1621 heikki.linnakangas@i     8031         [ -  + ]:             11 :     if (ProcLastRecPtr != startPos)
  384 alvherre@kurilemu.de     8032         [ #  # ]:UBC           0 :         elog(ERROR, "OVERWRITE_CONTRECORD was inserted to unexpected position %X/%08X",
                               8033                 :                :              LSN_FORMAT_ARGS(ProcLastRecPtr));
                               8034                 :                : 
 1761 alvherre@alvh.no-ip.     8035                 :CBC          11 :     XLogFlush(recptr);
                               8036                 :                : 
                               8037         [ -  + ]:             11 :     END_CRIT_SECTION();
                               8038                 :                : 
                               8039                 :             11 :     return recptr;
                               8040                 :                : }
                               8041                 :                : 
                               8042                 :                : /*
                               8043                 :                :  * Flush all data in shared memory to disk, and fsync
                               8044                 :                :  *
                               8045                 :                :  * This is the common code shared between regular checkpoints and
                               8046                 :                :  * recovery restartpoints.
                               8047                 :                :  */
                               8048                 :                : static void
 6968 tgl@sss.pgh.pa.us        8049                 :           1930 : CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
                               8050                 :                : {
 6013                          8051                 :           1930 :     CheckPointRelationMap();
 1046 akapila@postgresql.o     8052                 :           1930 :     CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 4528 rhaas@postgresql.org     8053                 :           1930 :     CheckPointSnapBuild();
                               8054                 :           1930 :     CheckPointLogicalRewriteHeap();
 4106 andres@anarazel.de       8055                 :           1930 :     CheckPointReplicationOrigin();
                               8056                 :                : 
                               8057                 :                :     /* Write out all dirty data in SLRUs and the main buffer pool */
                               8058                 :                :     TRACE_POSTGRESQL_BUFFER_CHECKPOINT_START(flags);
 2130 tmunro@postgresql.or     8059                 :           1930 :     CheckpointStats.ckpt_write_t = GetCurrentTimestamp();
                               8060                 :           1930 :     CheckPointCLOG();
                               8061                 :           1930 :     CheckPointCommitTs();
                               8062                 :           1930 :     CheckPointSUBTRANS();
                               8063                 :           1930 :     CheckPointMultiXact();
                               8064                 :           1930 :     CheckPointPredicate();
                               8065                 :           1930 :     CheckPointBuffers(flags);
                               8066                 :                : 
                               8067                 :                :     /* Perform all queued up fsyncs */
                               8068                 :                :     TRACE_POSTGRESQL_BUFFER_CHECKPOINT_SYNC_START();
                               8069                 :           1930 :     CheckpointStats.ckpt_sync_t = GetCurrentTimestamp();
                               8070                 :           1930 :     ProcessSyncRequests();
                               8071                 :           1930 :     CheckpointStats.ckpt_sync_end_t = GetCurrentTimestamp();
                               8072                 :                :     TRACE_POSTGRESQL_BUFFER_CHECKPOINT_DONE();
                               8073                 :                : 
                               8074                 :                :     /* We deliberately delay 2PC checkpointing as long as possible */
 7293 tgl@sss.pgh.pa.us        8075                 :           1930 :     CheckPointTwoPhase(checkPointRedo);
                               8076                 :           1930 : }
                               8077                 :                : 
                               8078                 :                : /*
                               8079                 :                :  * Save a checkpoint for recovery restart if appropriate
                               8080                 :                :  *
                               8081                 :                :  * This function is called each time a checkpoint record is read from XLOG.
                               8082                 :                :  * It must determine whether the checkpoint represents a safe restartpoint or
                               8083                 :                :  * not.  If so, the checkpoint record is stashed in shared memory so that
                               8084                 :                :  * CreateRestartPoint can consult it.  (Note that the latter function is
                               8085                 :                :  * executed by the checkpointer, while this one will be executed by the
                               8086                 :                :  * startup process.)
                               8087                 :                :  */
                               8088                 :                : static void
 1705 rhaas@postgresql.org     8089                 :            759 : RecoveryRestartPoint(const CheckPoint *checkPoint, XLogReaderState *record)
                               8090                 :                : {
                               8091                 :                :     /*
                               8092                 :                :      * Refrain from creating a restartpoint if we have seen any
                               8093                 :                :      * references to non-existent pages. Restarting recovery from the
                               8094                 :                :      * restartpoint would not see the references, so we would lose the
                               8095                 :                :      * cross-check that the pages belonged to a relation that was dropped
                               8096                 :                :      * later.
                               8097                 :                :      */
 5350 heikki.linnakangas@i     8098         [ -  + ]:            759 :     if (XLogHaveInvalidPages())
                               8099                 :                :     {
  958 michael@paquier.xyz      8100         [ #  # ]:UBC           0 :         elog(DEBUG2,
                               8101                 :                :              "could not record restart point at %X/%08X because there are unresolved references to invalid pages",
                               8102                 :                :              LSN_FORMAT_ARGS(checkPoint->redo));
 5350 heikki.linnakangas@i     8103                 :              0 :         return;
                               8104                 :                :     }
                               8105                 :                : 
                               8106                 :                :     /*
                               8107                 :                :      * Copy the checkpoint record to shared memory, so that checkpointer can
                               8108                 :                :      * work out the next time it wants to perform a restartpoint.
                               8109                 :                :      */
 4325 andres@anarazel.de       8110                 :CBC         759 :     SpinLockAcquire(&XLogCtl->info_lck);
 1705 rhaas@postgresql.org     8111                 :            759 :     XLogCtl->lastCheckPointRecPtr = record->ReadRecPtr;
                               8112                 :            759 :     XLogCtl->lastCheckPointEndPtr = record->EndRecPtr;
 4325 andres@anarazel.de       8113                 :            759 :     XLogCtl->lastCheckPoint = *checkPoint;
                               8114                 :            759 :     SpinLockRelease(&XLogCtl->info_lck);
                               8115                 :                : }
                               8116                 :                : 
                               8117                 :                : /*
                               8118                 :                :  * Establish a restartpoint if possible.
                               8119                 :                :  *
                               8120                 :                :  * This is similar to CreateCheckPoint, but is used during WAL recovery
                               8121                 :                :  * to establish a point from which recovery can roll forward without
                               8122                 :                :  * replaying the entire recovery log.
                               8123                 :                :  *
                               8124                 :                :  * Returns true if a new restartpoint was established. We can only establish
                               8125                 :                :  * a restartpoint if we have replayed a safe checkpoint record since the last
                               8126                 :                :  * restartpoint.
                               8127                 :                :  */
                               8128                 :                : bool
 6367 heikki.linnakangas@i     8129                 :            670 : CreateRestartPoint(int flags)
                               8130                 :                : {
                               8131                 :                :     XLogRecPtr  lastCheckPointRecPtr;
                               8132                 :                :     XLogRecPtr  lastCheckPointEndPtr;
                               8133                 :                :     CheckPoint  lastCheckPoint;
                               8134                 :                :     XLogRecPtr  PriorRedoPtr;
                               8135                 :                :     XLogRecPtr  receivePtr;
                               8136                 :                :     XLogRecPtr  replayPtr;
                               8137                 :                :     TimeLineID  replayTLI;
                               8138                 :                :     XLogRecPtr  endptr;
                               8139                 :                :     XLogSegNo   _logSegNo;
                               8140                 :                :     TimestampTz xtime;
                               8141                 :                : 
                               8142                 :                :     /* Concurrent checkpoint/restartpoint cannot happen */
 1539 michael@paquier.xyz      8143   [ +  -  -  + ]:            670 :     Assert(!IsUnderPostmaster || MyBackendType == B_CHECKPOINTER);
                               8144                 :                : 
                               8145                 :                :     /* Get a local copy of the last safe checkpoint record. */
 4325 andres@anarazel.de       8146                 :            670 :     SpinLockAcquire(&XLogCtl->info_lck);
                               8147                 :            670 :     lastCheckPointRecPtr = XLogCtl->lastCheckPointRecPtr;
 3559 rhaas@postgresql.org     8148                 :            670 :     lastCheckPointEndPtr = XLogCtl->lastCheckPointEndPtr;
 4325 andres@anarazel.de       8149                 :            670 :     lastCheckPoint = XLogCtl->lastCheckPoint;
                               8150                 :            670 :     SpinLockRelease(&XLogCtl->info_lck);
                               8151                 :                : 
                               8152                 :                :     /*
                               8153                 :                :      * Check that we're still in recovery mode. It's ok if we exit recovery
                               8154                 :                :      * mode after this check; the restart point is valid anyway.
                               8155                 :                :      */
 6367 heikki.linnakangas@i     8156         [ -  + ]:            670 :     if (!RecoveryInProgress())
                               8157                 :                :     {
 6367 heikki.linnakangas@i     8158         [ #  # ]:UBC           0 :         ereport(DEBUG2,
                               8159                 :                :                 (errmsg_internal("skipping restartpoint, recovery has already ended")));
                               8160                 :              0 :         return false;
                               8161                 :                :     }
                               8162                 :                : 
                               8163                 :                :     /*
                               8164                 :                :      * If the last checkpoint record we've replayed is already our last
                               8165                 :                :      * restartpoint, we can't perform a new restart point. We still update
                               8166                 :                :      * minRecoveryPoint in that case, so that if this is a shutdown restart
                               8167                 :                :      * point, we won't start up earlier than before. That's not strictly
                               8168                 :                :      * necessary, but when hot standby is enabled, it would be rather weird if
                               8169                 :                :      * the database opened up for read-only connections at a point-in-time
                               8170                 :                :      * before the last shutdown. Such time travel is still possible in case of
                               8171                 :                :      * immediate shutdown, though.
                               8172                 :                :      *
                               8173                 :                :      * We don't explicitly advance minRecoveryPoint when we do create a
                               8174                 :                :      * restartpoint. It's assumed that flushing the buffers will do that as a
                               8175                 :                :      * side-effect.
                               8176                 :                :      */
  262 alvherre@kurilemu.de     8177         [ +  + ]:CBC         670 :     if (!XLogRecPtrIsValid(lastCheckPointRecPtr) ||
 4958 alvherre@alvh.no-ip.     8178         [ +  + ]:            338 :         lastCheckPoint.redo <= ControlFile->checkPointCopy.redo)
                               8179                 :                :     {
 6367 heikki.linnakangas@i     8180         [ -  + ]:            453 :         ereport(DEBUG2,
                               8181                 :                :                 errmsg_internal("skipping restartpoint, already performed at %X/%08X",
                               8182                 :                :                                 LSN_FORMAT_ARGS(lastCheckPoint.redo)));
                               8183                 :                : 
                               8184                 :            453 :         UpdateMinRecoveryPoint(InvalidXLogRecPtr, true);
 5897 rhaas@postgresql.org     8185         [ +  + ]:            453 :         if (flags & CHECKPOINT_IS_SHUTDOWN)
                               8186                 :                :         {
                               8187                 :             40 :             LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               8188                 :             40 :             ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
                               8189                 :             40 :             UpdateControlFile();
                               8190                 :             40 :             LWLockRelease(ControlFileLock);
                               8191                 :                :         }
 6367 heikki.linnakangas@i     8192                 :            453 :         return false;
                               8193                 :                :     }
                               8194                 :                : 
                               8195                 :                :     /*
                               8196                 :                :      * Update the shared RedoRecPtr so that the startup process can calculate
                               8197                 :                :      * the number of segments replayed since the last restartpoint, and
                               8198                 :                :      * request a restartpoint if it exceeds CheckPointSegments.
                               8199                 :                :      *
                               8200                 :                :      * Like in CreateCheckPoint(), hold off insertions to update it, although
                               8201                 :                :      * during recovery this is just pro forma, because no WAL insertions are
                               8202                 :                :      * happening.
                               8203                 :                :      */
 4510                          8204                 :            217 :     WALInsertLockAcquireExclusive();
 4171                          8205                 :            217 :     RedoRecPtr = XLogCtl->Insert.RedoRecPtr = lastCheckPoint.redo;
 4510                          8206                 :            217 :     WALInsertLockRelease();
                               8207                 :                : 
                               8208                 :                :     /* Also update the info_lck-protected copy */
 4325 andres@anarazel.de       8209                 :            217 :     SpinLockAcquire(&XLogCtl->info_lck);
                               8210                 :            217 :     XLogCtl->RedoRecPtr = lastCheckPoint.redo;
                               8211                 :            217 :     SpinLockRelease(&XLogCtl->info_lck);
                               8212                 :                : 
                               8213                 :                :     /*
                               8214                 :                :      * Prepare to accumulate statistics.
                               8215                 :                :      *
                               8216                 :                :      * Note: because it is possible for log_checkpoints to change while a
                               8217                 :                :      * checkpoint proceeds, we always accumulate stats, even if
                               8218                 :                :      * log_checkpoints is currently off.
                               8219                 :                :      */
 5653 rhaas@postgresql.org     8220   [ +  -  +  -  :           2387 :     MemSet(&CheckpointStats, 0, sizeof(CheckpointStats));
                                     +  -  +  -  +  
                                                 + ]
                               8221                 :            217 :     CheckpointStats.ckpt_start_t = GetCurrentTimestamp();
                               8222                 :                : 
                               8223         [ +  - ]:            217 :     if (log_checkpoints)
 6367 heikki.linnakangas@i     8224                 :            217 :         LogCheckpointStart(flags, true);
                               8225                 :                : 
                               8226                 :                :     /* Update the process title */
 2050 michael@paquier.xyz      8227                 :            217 :     update_checkpoint_display(flags, true, false);
                               8228                 :                : 
 6367 heikki.linnakangas@i     8229                 :            217 :     CheckPointGuts(lastCheckPoint.redo, flags);
                               8230                 :                : 
                               8231                 :                :     /*
                               8232                 :                :      * This location needs to be after CheckPointGuts() to ensure that some
                               8233                 :                :      * work has already happened during this checkpoint.
                               8234                 :                :      */
  442 michael@paquier.xyz      8235                 :            217 :     INJECTION_POINT("create-restart-point", NULL);
                               8236                 :                : 
                               8237                 :                :     /*
                               8238                 :                :      * Remember the prior checkpoint's redo ptr for
                               8239                 :                :      * UpdateCheckPointDistanceEstimate()
                               8240                 :                :      */
 4171 heikki.linnakangas@i     8241                 :            217 :     PriorRedoPtr = ControlFile->checkPointCopy.redo;
                               8242                 :                : 
                               8243                 :                :     /*
                               8244                 :                :      * Update pg_control, using current time.  Check that it still shows an
                               8245                 :                :      * older checkpoint, else do nothing; this is a quick hack to make sure
                               8246                 :                :      * nothing really bad happens if somehow we get here after the
                               8247                 :                :      * end-of-recovery checkpoint.
                               8248                 :                :      */
 6367                          8249                 :            217 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 1539 michael@paquier.xyz      8250         [ +  - ]:            217 :     if (ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
                               8251                 :                :     {
                               8252                 :                :         /*
                               8253                 :                :          * Update the checkpoint information.  We do this even if the cluster
                               8254                 :                :          * does not show DB_IN_ARCHIVE_RECOVERY to match with the set of WAL
                               8255                 :                :          * segments recycled below.
                               8256                 :                :          */
 6239 tgl@sss.pgh.pa.us        8257                 :            217 :         ControlFile->checkPoint = lastCheckPointRecPtr;
                               8258                 :            217 :         ControlFile->checkPointCopy = lastCheckPoint;
                               8259                 :                : 
                               8260                 :                :         /*
                               8261                 :                :          * Ensure minRecoveryPoint is past the checkpoint record and update it
                               8262                 :                :          * if the control file still shows DB_IN_ARCHIVE_RECOVERY.  Normally,
                               8263                 :                :          * this will have happened already while writing out dirty buffers,
                               8264                 :                :          * but not necessarily - e.g. because no buffers were dirtied.  We do
                               8265                 :                :          * this because a backup performed in recovery uses minRecoveryPoint
                               8266                 :                :          * to determine which WAL files must be included in the backup, and
                               8267                 :                :          * the file (or files) containing the checkpoint record must be
                               8268                 :                :          * included, at a minimum.  Note that for an ordinary restart of
                               8269                 :                :          * recovery there's no value in having the minimum recovery point any
                               8270                 :                :          * earlier than this anyway, because redo will begin just after the
                               8271                 :                :          * checkpoint record.
                               8272                 :                :          */
 1539 michael@paquier.xyz      8273         [ +  + ]:            217 :         if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY)
                               8274                 :                :         {
                               8275         [ +  + ]:            216 :             if (ControlFile->minRecoveryPoint < lastCheckPointEndPtr)
                               8276                 :                :             {
                               8277                 :             19 :                 ControlFile->minRecoveryPoint = lastCheckPointEndPtr;
                               8278                 :             19 :                 ControlFile->minRecoveryPointTLI = lastCheckPoint.ThisTimeLineID;
                               8279                 :                : 
                               8280                 :                :                 /* update local copy */
                               8281                 :             19 :                 LocalMinRecoveryPoint = ControlFile->minRecoveryPoint;
                               8282                 :             19 :                 LocalMinRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
                               8283                 :                :             }
                               8284         [ +  + ]:            216 :             if (flags & CHECKPOINT_IS_SHUTDOWN)
                               8285                 :             23 :                 ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
                               8286                 :                :         }
                               8287                 :                : 
                               8288                 :                :         /* we shall start with the latest checksum version */
  114 dgustafsson@postgres     8289                 :            217 :         ControlFile->data_checksum_version = lastCheckPoint.dataChecksumState;
                               8290                 :                : 
 6239 tgl@sss.pgh.pa.us        8291                 :            217 :         UpdateControlFile();
                               8292                 :                :     }
 6367 heikki.linnakangas@i     8293                 :            217 :     LWLockRelease(ControlFileLock);
                               8294                 :                : 
                               8295                 :                :     /*
                               8296                 :                :      * Update the average distance between checkpoints/restartpoints if the
                               8297                 :                :      * prior checkpoint exists.
                               8298                 :                :      */
  262 alvherre@kurilemu.de     8299         [ +  - ]:            217 :     if (XLogRecPtrIsValid(PriorRedoPtr))
 4171 heikki.linnakangas@i     8300                 :            217 :         UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);
                               8301                 :                : 
                               8302                 :                :     /*
                               8303                 :                :      * Delete old log files that are no longer needed for the last
                               8304                 :                :      * restartpoint, to prevent the disk holding the xlog from filling up.
                               8305                 :                :      */
 2924 michael@paquier.xyz      8306                 :            217 :     XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
                               8307                 :                : 
                               8308                 :                :     /*
                               8309                 :                :      * Retreat _logSegNo using the current end of xlog replayed or received,
                               8310                 :                :      * whichever is later.
                               8311                 :                :      */
 2300 tmunro@postgresql.or     8312                 :            217 :     receivePtr = GetWalRcvFlushRecPtr(NULL, NULL);
 2924 michael@paquier.xyz      8313                 :            217 :     replayPtr = GetXLogReplayRecPtr(&replayTLI);
                               8314                 :            217 :     endptr = (receivePtr < replayPtr) ? replayPtr : receivePtr;
                               8315                 :            217 :     KeepLogSeg(endptr, &_logSegNo);
                               8316                 :                : 
  180 akapila@postgresql.o     8317                 :            217 :     INJECTION_POINT("restartpoint-before-slot-invalidation", NULL);
                               8318                 :                : 
  522                          8319         [ +  + ]:            217 :     if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT,
                               8320                 :                :                                            _logSegNo, InvalidOid,
                               8321                 :                :                                            InvalidTransactionId))
                               8322                 :                :     {
                               8323                 :                :         /*
                               8324                 :                :          * Some slots have been invalidated; recalculate the old-segment
                               8325                 :                :          * horizon, starting again from RedoRecPtr.
                               8326                 :                :          */
 1836 alvherre@alvh.no-ip.     8327                 :              1 :         XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
                               8328                 :              1 :         KeepLogSeg(endptr, &_logSegNo);
                               8329                 :                :     }
 2924 michael@paquier.xyz      8330                 :            217 :     _logSegNo--;
                               8331                 :                : 
                               8332                 :                :     /*
                               8333                 :                :      * Try to recycle segments on a useful timeline. If we've been promoted
                               8334                 :                :      * since the beginning of this restartpoint, use the new timeline chosen
                               8335                 :                :      * at end of recovery.  If we're still in recovery, use the timeline we're
                               8336                 :                :      * currently replaying.
                               8337                 :                :      *
                               8338                 :                :      * There is no guarantee that the WAL segments will be useful on the
                               8339                 :                :      * current timeline; if recovery proceeds to a new timeline right after
                               8340                 :                :      * this, the pre-allocated WAL segments on this timeline will not be used,
                               8341                 :                :      * and will be wasted until recycled on the next restartpoint. We'll live
                               8342                 :                :      * with that.
                               8343                 :                :      */
 1724 rhaas@postgresql.org     8344         [ +  + ]:            217 :     if (!RecoveryInProgress())
 1719                          8345                 :              1 :         replayTLI = XLogCtl->InsertTimeLineID;
                               8346                 :                : 
 1724                          8347                 :            217 :     RemoveOldXlogFiles(_logSegNo, RedoRecPtr, endptr, replayTLI);
                               8348                 :                : 
                               8349                 :                :     /*
                               8350                 :                :      * Make more log segments if needed.  (Do this after recycling old log
                               8351                 :                :      * segments, since that may supply some of the needed files.)
                               8352                 :                :      */
                               8353                 :            217 :     PreallocXlogFiles(endptr, replayTLI);
                               8354                 :                : 
                               8355                 :                :     /*
                               8356                 :                :      * Truncate pg_subtrans if possible.  We can throw away all data before
                               8357                 :                :      * the oldest XMIN of any running transaction.  No future transaction will
                               8358                 :                :      * attempt to reference any pg_subtrans entry older than that (see Asserts
                               8359                 :                :      * in subtrans.c).  When hot standby is disabled, though, we mustn't do
                               8360                 :                :      * this because StartupSUBTRANS hasn't been called yet.
                               8361                 :                :      */
 5809 simon@2ndQuadrant.co     8362         [ +  - ]:            217 :     if (EnableHotStandby)
 2174 andres@anarazel.de       8363                 :            217 :         TruncateSUBTRANS(GetOldestTransactionIdConsideredRunning());
                               8364                 :                : 
                               8365                 :                :     /* Real work is done; log and update stats. */
  157 fujii@postgresql.org     8366                 :            217 :     LogCheckpointEnd(true, flags);
                               8367                 :                : 
                               8368                 :                :     /* Reset the process title */
 2050 michael@paquier.xyz      8369                 :            217 :     update_checkpoint_display(flags, true, true);
                               8370                 :                : 
 5867 tgl@sss.pgh.pa.us        8371                 :            217 :     xtime = GetLatestXTime();
 6367 heikki.linnakangas@i     8372   [ +  -  +  -  :            217 :     ereport((log_checkpoints ? LOG : DEBUG2),
                                              +  + ]
                               8373                 :                :             errmsg("recovery restart point at %X/%08X",
                               8374                 :                :                    LSN_FORMAT_ARGS(lastCheckPoint.redo)),
                               8375                 :                :             xtime ? errdetail("Last completed transaction was at log time %s.",
                               8376                 :                :                               timestamptz_to_str(xtime)) : 0);
                               8377                 :                : 
                               8378                 :                :     /*
                               8379                 :                :      * Finally, execute archive_cleanup_command, if any.
                               8380                 :                :      */
 2800 peter_e@gmx.net          8381   [ +  -  -  + ]:            217 :     if (archiveCleanupCommand && strcmp(archiveCleanupCommand, "") != 0)
 1266 michael@paquier.xyz      8382                 :UBC           0 :         ExecuteRecoveryCommand(archiveCleanupCommand,
                               8383                 :                :                                "archive_cleanup_command",
                               8384                 :                :                                false,
                               8385                 :                :                                WAIT_EVENT_ARCHIVE_CLEANUP_COMMAND);
                               8386                 :                : 
 6367 heikki.linnakangas@i     8387                 :CBC         217 :     return true;
                               8388                 :                : }
                               8389                 :                : 
                               8390                 :                : /*
                               8391                 :                :  * Report availability of WAL for the given target LSN
                               8392                 :                :  *      (typically a slot's restart_lsn)
                               8393                 :                :  *
                               8394                 :                :  * Returns one of the following enum values:
                               8395                 :                :  *
                               8396                 :                :  * * WALAVAIL_RESERVED means targetLSN is available and it is in the range of
                               8397                 :                :  *   max_wal_size.
                               8398                 :                :  *
                               8399                 :                :  * * WALAVAIL_EXTENDED means it is still available by preserving extra
                               8400                 :                :  *   segments beyond max_wal_size. If max_slot_wal_keep_size is smaller
                               8401                 :                :  *   than max_wal_size, this state is not returned.
                               8402                 :                :  *
                               8403                 :                :  * * WALAVAIL_UNRESERVED means it is no longer retained and the next checkpoint
                               8404                 :                :  *   will remove the segments holding it. The walsender using this slot may
                               8405                 :                :  *   still catch up and return it to one of the states above.
                               8406                 :                :  *
                               8407                 :                :  * * WALAVAIL_REMOVED means it has been removed. A replication stream on
                               8408                 :                :  *   a slot with this LSN cannot continue.  (Any associated walsender
                               8409                 :                :  *   processes should have been terminated already.)
                               8410                 :                :  *
                               8411                 :                :  * * WALAVAIL_INVALID_LSN means the slot hasn't been set to reserve WAL.
                               8412                 :                :  */
                               8413                 :                : WALAvailability
 2301 alvherre@alvh.no-ip.     8414                 :            635 : GetWALAvailability(XLogRecPtr targetLSN)
                               8415                 :                : {
                               8416                 :                :     XLogRecPtr  currpos;        /* current write LSN */
                               8417                 :                :     XLogSegNo   currSeg;        /* segid of currpos */
                               8418                 :                :     XLogSegNo   targetSeg;      /* segid of targetLSN */
                               8419                 :                :     XLogSegNo   oldestSeg;      /* actual oldest segid */
                               8420                 :                :     XLogSegNo   oldestSegMaxWalSize;    /* oldest segid kept by max_wal_size */
                               8421                 :                :     XLogSegNo   oldestSlotSeg;  /* oldest segid kept by slot */
                               8422                 :                :     uint64      keepSegs;
                               8423                 :                : 
                               8424                 :                :     /*
                               8425                 :                :      * The slot does not reserve WAL: it was either deactivated or has never
                               8426                 :                :      * been active.
  262 alvherre@kurilemu.de     8427         [ +  + ]:            635 :     if (!XLogRecPtrIsValid(targetLSN))
 2301 alvherre@alvh.no-ip.     8428                 :             30 :         return WALAVAIL_INVALID_LSN;
                               8429                 :                : 
                               8430                 :                :     /*
                               8431                 :                :      * Calculate the oldest segment currently reserved by all slots,
                               8432                 :                :      * considering wal_keep_size and max_slot_wal_keep_size.  Initialize
                               8433                 :                :      * oldestSlotSeg to the current segment.
                               8434                 :                :      */
 2204                          8435                 :            605 :     currpos = GetXLogWriteRecPtr();
                               8436                 :            605 :     XLByteToSeg(currpos, oldestSlotSeg, wal_segment_size);
 2301                          8437                 :            605 :     KeepLogSeg(currpos, &oldestSlotSeg);
                               8438                 :                : 
                               8439                 :                :     /*
                               8440                 :                :      * Find the oldest extant segment file. We get 1 until a checkpoint removes
                               8441                 :                :      * the first WAL segment file since startup, which can make the status
                               8442                 :                :      * wrong under certain abnormal conditions, but that does no actual harm.
                               8443                 :                :      */
                               8444                 :            605 :     oldestSeg = XLogGetLastRemovedSegno() + 1;
                               8445                 :                : 
                               8446                 :                :     /* calculate oldest segment by max_wal_size */
                               8447                 :            605 :     XLByteToSeg(currpos, currSeg, wal_segment_size);
 2223                          8448                 :            605 :     keepSegs = ConvertToXSegs(max_wal_size_mb, wal_segment_size) + 1;
                               8449                 :                : 
 2301                          8450         [ +  + ]:            605 :     if (currSeg > keepSegs)
                               8451                 :             13 :         oldestSegMaxWalSize = currSeg - keepSegs;
                               8452                 :                :     else
                               8453                 :            592 :         oldestSegMaxWalSize = 1;
                               8454                 :                : 
                               8455                 :                :     /* the segment we care about */
 2204                          8456                 :            605 :     XLByteToSeg(targetLSN, targetSeg, wal_segment_size);
                               8457                 :                : 
                               8458                 :                :     /*
                               8459                 :                :      * No point in returning reserved or extended status values if the
                               8460                 :                :      * targetSeg is known to be lost.
                               8461                 :                :      */
 2223                          8462         [ +  + ]:            605 :     if (targetSeg >= oldestSlotSeg)
                               8463                 :                :     {
                               8464                 :                :         /* show "reserved" when targetSeg is within max_wal_size */
                               8465         [ +  + ]:            604 :         if (targetSeg >= oldestSegMaxWalSize)
 2301                          8466                 :            602 :             return WALAVAIL_RESERVED;
                               8467                 :                : 
                               8468                 :                :         /* being retained by slots exceeding max_wal_size */
 2223                          8469                 :              2 :         return WALAVAIL_EXTENDED;
                               8470                 :                :     }
                               8471                 :                : 
                               8472                 :                :     /* WAL segments are no longer retained but haven't been removed yet */
                               8473         [ +  - ]:              1 :     if (targetSeg >= oldestSeg)
                               8474                 :              1 :         return WALAVAIL_UNRESERVED;
                               8475                 :                : 
                               8476                 :                :     /* Definitely lost */
 2301 alvherre@alvh.no-ip.     8477                 :UBC           0 :     return WALAVAIL_REMOVED;
                               8478                 :                : }
                               8479                 :                : 
                               8480                 :                : 
                               8481                 :                : /*
                               8482                 :                :  * Retreat *logSegNo to the last segment that we need to retain because of
                               8483                 :                :  * either wal_keep_size or replication slots.
                               8484                 :                :  *
                               8485                 :                :  * This is calculated by subtracting wal_keep_size from the given xlog
                               8486                 :                :  * location, recptr, and by making sure that the result is below the
                               8487                 :                :  * requirement of replication slots.  For the latter criterion we do consider
                               8488                 :                :  * the effects of max_slot_wal_keep_size: reserve at most that much space back
                               8489                 :                :  * from recptr.
                               8490                 :                :  *
                               8491                 :                :  * Note about replication slots: if this function calculates a value
                               8492                 :                :  * that's further ahead than what slots need reserved, then affected
                               8493                 :                :  * slots need to be invalidated and this function invoked again.
                               8494                 :                :  * XXX it might be a good idea to rewrite this function so that
                               8495                 :                :  * invalidation is optionally done here, instead.
                               8496                 :                :  */
                               8497                 :                : static void
 5145 heikki.linnakangas@i     8498                 :CBC        2540 : KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
                               8499                 :                : {
                               8500                 :                :     XLogSegNo   currSegNo;
                               8501                 :                :     XLogSegNo   segno;
                               8502                 :                :     XLogRecPtr  keep;
                               8503                 :                : 
 2301 alvherre@alvh.no-ip.     8504                 :           2540 :     XLByteToSeg(recptr, currSegNo, wal_segment_size);
                               8505                 :           2540 :     segno = currSegNo;
                               8506                 :                : 
                               8507                 :                :     /* Calculate how many segments are kept by slots. */
                               8508                 :           2540 :     keep = XLogGetReplicationSlotMinimumLSN();
  262 alvherre@kurilemu.de     8509   [ +  +  +  + ]:           2540 :     if (XLogRecPtrIsValid(keep) && keep < recptr)
                               8510                 :                :     {
 2301 alvherre@alvh.no-ip.     8511                 :            768 :         XLByteToSeg(keep, segno, wal_segment_size);
                               8512                 :                : 
                               8513                 :                :         /*
                               8514                 :                :          * Account for max_slot_wal_keep_size to avoid keeping more than
                               8515                 :                :          * configured.  However, don't do that during a binary upgrade: if
                               8516                 :                :          * slots were to be invalidated because of this, it would not be
                               8517                 :                :          * possible to preserve logical ones during the upgrade.
                               8518                 :                :          */
  380 akapila@postgresql.o     8519   [ +  +  +  - ]:            768 :         if (max_slot_wal_keep_size_mb >= 0 && !IsBinaryUpgrade)
                               8520                 :                :         {
                               8521                 :                :             uint64      slot_keep_segs;
                               8522                 :                : 
 2301 alvherre@alvh.no-ip.     8523                 :             26 :             slot_keep_segs =
                               8524                 :             26 :                 ConvertToXSegs(max_slot_wal_keep_size_mb, wal_segment_size);
                               8525                 :                : 
                               8526         [ +  + ]:             26 :             if (currSegNo - segno > slot_keep_segs)
                               8527                 :              7 :                 segno = currSegNo - slot_keep_segs;
                               8528                 :                :         }
                               8529                 :                :     }
                               8530                 :                : 
                               8531                 :                :     /*
                               8532                 :                :      * If WAL summarization is in use, don't remove WAL that has yet to be
                               8533                 :                :      * summarized.
                               8534                 :                :      */
  761 rhaas@postgresql.org     8535                 :           2540 :     keep = GetOldestUnsummarizedLSN(NULL, NULL);
  262 alvherre@kurilemu.de     8536         [ +  + ]:           2540 :     if (XLogRecPtrIsValid(keep))
                               8537                 :                :     {
                               8538                 :                :         XLogSegNo   unsummarized_segno;
                               8539                 :                : 
  949 rhaas@postgresql.org     8540                 :             12 :         XLByteToSeg(keep, unsummarized_segno, wal_segment_size);
                               8541         [ +  + ]:             12 :         if (unsummarized_segno < segno)
                               8542                 :              9 :             segno = unsummarized_segno;
                               8543                 :                :     }
                               8544                 :                : 
                               8545                 :                :     /* but, keep at least wal_keep_size if that's set */
 2197 fujii@postgresql.org     8546         [ +  + ]:           2540 :     if (wal_keep_size_mb > 0)
                               8547                 :                :     {
                               8548                 :                :         uint64      keep_segs;
                               8549                 :                : 
                               8550                 :             74 :         keep_segs = ConvertToXSegs(wal_keep_size_mb, wal_segment_size);
                               8551         [ +  - ]:             74 :         if (currSegNo - segno < keep_segs)
                               8552                 :                :         {
                               8553                 :                :             /* avoid underflow, don't go below 1 */
                               8554         [ +  + ]:             74 :             if (currSegNo <= keep_segs)
                               8555                 :             70 :                 segno = 1;
                               8556                 :                :             else
                               8557                 :              4 :                 segno = currSegNo - keep_segs;
                               8558                 :                :         }
                               8559                 :                :     }
                               8560                 :                : 
                               8561                 :                :     /* don't delete WAL segments newer than the calculated segment */
 2204 alvherre@alvh.no-ip.     8562         [ +  + ]:           2540 :     if (segno < *logSegNo)
 5145 heikki.linnakangas@i     8563                 :            359 :         *logSegNo = segno;
 5486 simon@2ndQuadrant.co     8564                 :           2540 : }
                               8565                 :                : 
                               8566                 :                : /*
                               8567                 :                :  * Write a NEXTOID log record
                               8568                 :                :  */
                               8569                 :                : void
 9396 vadim4o@yahoo.com        8570                 :            688 : XLogPutNextOid(Oid nextOid)
                               8571                 :                : {
 4266 heikki.linnakangas@i     8572                 :            688 :     XLogBeginInsert();
  530 peter@eisentraut.org     8573                 :            688 :     XLogRegisterData(&nextOid, sizeof(Oid));
 4266 heikki.linnakangas@i     8574                 :            688 :     (void) XLogInsert(RM_XLOG_ID, XLOG_NEXTOID);
                               8575                 :                : 
                               8576                 :                :     /*
                               8577                 :                :      * We need not flush the NEXTOID record immediately, because any of the
                               8578                 :                :      * just-allocated OIDs could only reach disk as part of a tuple insert or
                               8579                 :                :      * update that would have its own XLOG record that must follow the NEXTOID
                               8580                 :                :      * record.  Therefore, the standard buffer LSN interlock applied to those
                               8581                 :                :      * records will ensure no such OID reaches disk before the NEXTOID record
                               8582                 :                :      * does.
                               8583                 :                :      *
                               8584                 :                :      * Note, however, that the above statement only covers state "within" the
                               8585                 :                :      * database.  When we use a generated OID as a file or directory name, we
                               8586                 :                :      * are in a sense violating the basic WAL rule, because that filesystem
                               8587                 :                :      * change may reach disk before the NEXTOID WAL record does.  The impact
                               8588                 :                :      * of this is that if a database crash occurs immediately afterward, we
                               8589                 :                :      * might after restart re-generate the same OID and find that it conflicts
                               8590                 :                :      * with the leftover file or directory.  But since for safety's sake we
                               8591                 :                :      * always loop until finding a nonconflicting filename, this poses no real
                               8592                 :                :      * problem in practice. See pgsql-hackers discussion 27-Sep-2006.
                               8593                 :                :      */
 7759 tgl@sss.pgh.pa.us        8594                 :            688 : }
                               8595                 :                : 
                               8596                 :                : /*
                               8597                 :                :  * Write an XLOG SWITCH record.
                               8598                 :                :  *
                               8599                 :                :  * Here we just blindly issue an XLogInsert request for the record.
                               8600                 :                :  * All the magic happens inside XLogInsert.
                               8601                 :                :  *
                               8602                 :                :  * The return value is either the end+1 address of the switch record,
                               8603                 :                :  * or the end+1 address of the prior segment if we did not need to
                               8604                 :                :  * write a switch record because we are already at segment start.
                               8605                 :                :  */
                               8606                 :                : XLogRecPtr
 3503 andres@anarazel.de       8607                 :            837 : RequestXLogSwitch(bool mark_unimportant)
                               8608                 :                : {
                               8609                 :                :     XLogRecPtr  RecPtr;
                               8610                 :                : 
                               8611                 :                :     /* XLOG SWITCH has no data */
 4266 heikki.linnakangas@i     8612                 :            837 :     XLogBeginInsert();
                               8613                 :                : 
 3503 andres@anarazel.de       8614         [ -  + ]:            837 :     if (mark_unimportant)
 3503 andres@anarazel.de       8615                 :UBC           0 :         XLogSetRecordFlags(XLOG_MARK_UNIMPORTANT);
 4266 heikki.linnakangas@i     8616                 :CBC         837 :     RecPtr = XLogInsert(RM_XLOG_ID, XLOG_SWITCH);
                               8617                 :                : 
 7294 tgl@sss.pgh.pa.us        8618                 :            837 :     return RecPtr;
                               8619                 :                : }
                               8620                 :                : 
                               8621                 :                : /*
                               8622                 :                :  * Write a RESTORE POINT record
                               8623                 :                :  */
                               8624                 :                : XLogRecPtr
 5647 simon@2ndQuadrant.co     8625                 :              3 : XLogRestorePoint(const char *rpName)
                               8626                 :                : {
                               8627                 :                :     XLogRecPtr  RecPtr;
                               8628                 :                :     xl_restore_point xlrec;
                               8629                 :                : 
                               8630                 :              3 :     xlrec.rp_time = GetCurrentTimestamp();
 4542 tgl@sss.pgh.pa.us        8631                 :              3 :     strlcpy(xlrec.rp_name, rpName, MAXFNAMELEN);
                               8632                 :                : 
 4266 heikki.linnakangas@i     8633                 :              3 :     XLogBeginInsert();
  530 peter@eisentraut.org     8634                 :              3 :     XLogRegisterData(&xlrec, sizeof(xl_restore_point));
                               8635                 :                : 
 4266 heikki.linnakangas@i     8636                 :              3 :     RecPtr = XLogInsert(RM_XLOG_ID, XLOG_RESTORE_POINT);
                               8637                 :                : 
 5631 rhaas@postgresql.org     8638         [ +  - ]:              3 :     ereport(LOG,
                               8639                 :                :             errmsg("restore point \"%s\" created at %X/%08X",
                               8640                 :                :                    rpName, LSN_FORMAT_ARGS(RecPtr)));
                               8641                 :                : 
 5647 simon@2ndQuadrant.co     8642                 :              3 :     return RecPtr;
                               8643                 :                : }
                               8644                 :                : 
                               8645                 :                : /*
                               8646                 :                :  * Write an empty XLOG record to assign a distinct LSN.
                               8647                 :                :  *
                               8648                 :                :  * This is used by some index AMs when building indexes on permanent relations
                               8649                 :                :  * with wal_level=minimal.  In that scenario, WAL-logging will start after
                               8650                 :                :  * commit, but the index AM needs distinct LSNs to detect concurrent page
                               8651                 :                :  * modifications.  When the current WAL insert position hasn't advanced since
                               8652                 :                :  * the last call, we emit a dummy record to ensure we get a new, distinct LSN.
                               8653                 :                :  */
                               8654                 :                : XLogRecPtr
  135 pg@bowt.ie               8655                 :            439 : XLogAssignLSN(void)
                               8656                 :                : {
                               8657                 :            439 :     int         dummy = 0;
                               8658                 :                : 
                               8659                 :                :     /*
                               8660                 :                :      * Records other than XLOG_SWITCH must have content.  We use an integer 0
                               8661                 :                :      * to satisfy this restriction.
                               8662                 :                :      */
                               8663                 :            439 :     XLogBeginInsert();
                               8664                 :            439 :     XLogSetRecordFlags(XLOG_MARK_UNIMPORTANT);
                               8665                 :            439 :     XLogRegisterData(&dummy, sizeof(dummy));
                               8666                 :            439 :     return XLogInsert(RM_XLOG_ID, XLOG_ASSIGN_LSN);
                               8667                 :                : }
                               8668                 :                : 
                               8669                 :                : /*
                               8670                 :                :  * Check if any of the GUC parameters that are critical for hot standby
                               8671                 :                :  * have changed, and update the value in pg_control file if necessary.
                               8672                 :                :  */
                               8673                 :                : static void
 5933 heikki.linnakangas@i     8674                 :            991 : XLogReportParameters(void)
                               8675                 :                : {
                               8676         [ +  + ]:            991 :     if (wal_level != ControlFile->wal_level ||
 4589 rhaas@postgresql.org     8677         [ +  + ]:            730 :         wal_log_hints != ControlFile->wal_log_hints ||
 5933 heikki.linnakangas@i     8678         [ +  + ]:            635 :         MaxConnections != ControlFile->MaxConnections ||
 4770 rhaas@postgresql.org     8679         [ +  + ]:            634 :         max_worker_processes != ControlFile->max_worker_processes ||
 2721 michael@paquier.xyz      8680         [ +  + ]:            631 :         max_wal_senders != ControlFile->max_wal_senders ||
 5933 heikki.linnakangas@i     8681         [ +  + ]:            603 :         max_prepared_xacts != ControlFile->max_prepared_xacts ||
 4253 alvherre@alvh.no-ip.     8682         [ +  - ]:            499 :         max_locks_per_xact != ControlFile->max_locks_per_xact ||
                               8683         [ +  + ]:            499 :         track_commit_timestamp != ControlFile->track_commit_timestamp)
                               8684                 :                :     {
                               8685                 :                :         /*
                               8686                 :                :          * The change in number of backend slots doesn't need to be WAL-logged
                               8687                 :                :          * if archiving is not enabled, as you can't start archive recovery
                               8688                 :                :          * with wal_level=minimal anyway. We don't really care about the
                               8689                 :                :          * values in pg_control either if wal_level=minimal, but it seems
                               8690                 :                :          * better to keep them up-to-date to avoid confusion.
                               8691                 :                :          */
 5933 heikki.linnakangas@i     8692   [ +  +  +  + ]:            504 :         if (wal_level != ControlFile->wal_level || XLogIsNeeded())
                               8693                 :                :         {
                               8694                 :                :             xl_parameter_change xlrec;
                               8695                 :                :             XLogRecPtr  recptr;
                               8696                 :                : 
                               8697                 :            477 :             xlrec.MaxConnections = MaxConnections;
 4770 rhaas@postgresql.org     8698                 :            477 :             xlrec.max_worker_processes = max_worker_processes;
 2721 michael@paquier.xyz      8699                 :            477 :             xlrec.max_wal_senders = max_wal_senders;
 5933 heikki.linnakangas@i     8700                 :            477 :             xlrec.max_prepared_xacts = max_prepared_xacts;
                               8701                 :            477 :             xlrec.max_locks_per_xact = max_locks_per_xact;
                               8702                 :            477 :             xlrec.wal_level = wal_level;
 4589 rhaas@postgresql.org     8703                 :            477 :             xlrec.wal_log_hints = wal_log_hints;
 4253 alvherre@alvh.no-ip.     8704                 :            477 :             xlrec.track_commit_timestamp = track_commit_timestamp;
                               8705                 :                : 
 4266 heikki.linnakangas@i     8706                 :            477 :             XLogBeginInsert();
  530 peter@eisentraut.org     8707                 :            477 :             XLogRegisterData(&xlrec, sizeof(xlrec));
                               8708                 :                : 
 4266 heikki.linnakangas@i     8709                 :            477 :             recptr = XLogInsert(RM_XLOG_ID, XLOG_PARAMETER_CHANGE);
 4505 fujii@postgresql.org     8710                 :            477 :             XLogFlush(recptr);
                               8711                 :                :         }
                               8712                 :                : 
 2239 tmunro@postgresql.or     8713                 :            504 :         LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               8714                 :                : 
 5933 heikki.linnakangas@i     8715                 :            504 :         ControlFile->MaxConnections = MaxConnections;
 4770 rhaas@postgresql.org     8716                 :            504 :         ControlFile->max_worker_processes = max_worker_processes;
 2721 michael@paquier.xyz      8717                 :            504 :         ControlFile->max_wal_senders = max_wal_senders;
 5933 heikki.linnakangas@i     8718                 :            504 :         ControlFile->max_prepared_xacts = max_prepared_xacts;
                               8719                 :            504 :         ControlFile->max_locks_per_xact = max_locks_per_xact;
                               8720                 :            504 :         ControlFile->wal_level = wal_level;
 4589 rhaas@postgresql.org     8721                 :            504 :         ControlFile->wal_log_hints = wal_log_hints;
 4253 alvherre@alvh.no-ip.     8722                 :            504 :         ControlFile->track_commit_timestamp = track_commit_timestamp;
 5933 heikki.linnakangas@i     8723                 :            504 :         UpdateControlFile();
                               8724                 :                : 
 2239 tmunro@postgresql.or     8725                 :            504 :         LWLockRelease(ControlFileLock);
                               8726                 :                :     }
 6031 heikki.linnakangas@i     8727                 :            991 : }
                               8728                 :                : 
                               8729                 :                : /*
                               8730                 :                :  * Log the new state of checksums
                               8731                 :                :  */
                               8732                 :                : static void
  114 dgustafsson@postgres     8733                 :             31 : XLogChecksums(uint32 new_type)
                               8734                 :                : {
                               8735                 :                :     xl_checksum_state xlrec;
                               8736                 :                :     XLogRecPtr  recptr;
                               8737                 :                : 
                               8738                 :             31 :     xlrec.new_checksum_state = new_type;
                               8739                 :                : 
                               8740                 :             31 :     XLogBeginInsert();
                               8741                 :             31 :     XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
                               8742                 :                : 
                               8743                 :             31 :     recptr = XLogInsert(RM_XLOG2_ID, XLOG2_CHECKSUMS);
                               8744                 :             31 :     XLogFlush(recptr);
                               8745                 :             31 : }
                               8746                 :                : 
                               8747                 :                : /*
                               8748                 :                :  * Update full_page_writes in shared memory, and write an
                               8749                 :                :  * XLOG_FPW_CHANGE record if necessary.
                               8750                 :                :  *
                               8751                 :                :  * Note: this function assumes there is no other process running
                               8752                 :                :  * concurrently that could update it.
                               8753                 :                :  */
                               8754                 :                : void
 5296 simon@2ndQuadrant.co     8755                 :           1688 : UpdateFullPageWrites(void)
                               8756                 :                : {
                               8757                 :           1688 :     XLogCtlInsert *Insert = &XLogCtl->Insert;
                               8758                 :                :     bool        recoveryInProgress;
                               8759                 :                : 
                               8760                 :                :     /*
                               8761                 :                :      * Do nothing if full_page_writes has not been changed.
                               8762                 :                :      *
                               8763                 :                :      * It's safe to check the shared full_page_writes without the lock,
                               8764                 :                :      * because we assume that there is no concurrently running process which
                               8765                 :                :      * can update it.
                               8766                 :                :      */
                               8767         [ +  + ]:           1688 :     if (fullPageWrites == Insert->fullPageWrites)
                               8768                 :           1230 :         return;
                               8769                 :                : 
                               8770                 :                :     /*
                               8771                 :                :      * Perform this outside critical section so that the WAL insert
                               8772                 :                :      * initialization done by RecoveryInProgress() doesn't trigger an
                               8773                 :                :      * assertion failure.
                               8774                 :                :      */
 2858 akapila@postgresql.o     8775                 :            458 :     recoveryInProgress = RecoveryInProgress();
                               8776                 :                : 
 5255 heikki.linnakangas@i     8777                 :            458 :     START_CRIT_SECTION();
                               8778                 :                : 
                               8779                 :                :     /*
                               8780                 :                :      * It's always safe to take full page images, even when not strictly
                               8781                 :                :      * required, but not the other way round. So if we're setting
                               8782                 :                :      * full_page_writes to true, first set it true and then write the WAL
                               8783                 :                :      * record. If we're setting it to false, first write the WAL record and
                               8784                 :                :      * then set the global flag.
                               8785                 :                :      */
                               8786         [ +  + ]:            458 :     if (fullPageWrites)
                               8787                 :                :     {
 4510                          8788                 :            445 :         WALInsertLockAcquireExclusive();
 5255                          8789                 :            445 :         Insert->fullPageWrites = true;
 4510                          8790                 :            445 :         WALInsertLockRelease();
                               8791                 :                :     }
                               8792                 :                : 
                               8793                 :                :     /*
                               8794                 :                :      * Write an XLOG_FPW_CHANGE record. This allows us to keep track of
                               8795                 :                :      * full_page_writes during archive recovery, if required.
                               8796                 :                :      */
 2858 akapila@postgresql.o     8797   [ +  +  -  + ]:            458 :     if (XLogStandbyInfoActive() && !recoveryInProgress)
                               8798                 :                :     {
 4266 heikki.linnakangas@i     8799                 :UBC           0 :         XLogBeginInsert();
  530 peter@eisentraut.org     8800                 :              0 :         XLogRegisterData(&fullPageWrites, sizeof(bool));
                               8801                 :                : 
 4266 heikki.linnakangas@i     8802                 :              0 :         XLogInsert(RM_XLOG_ID, XLOG_FPW_CHANGE);
                               8803                 :                :     }
                               8804                 :                : 
 5255 heikki.linnakangas@i     8805         [ +  + ]:CBC         458 :     if (!fullPageWrites)
                               8806                 :                :     {
 4510                          8807                 :             13 :         WALInsertLockAcquireExclusive();
 5255                          8808                 :             13 :         Insert->fullPageWrites = false;
 4510                          8809                 :             13 :         WALInsertLockRelease();
                               8810                 :                :     }
 5255                          8811         [ -  + ]:            458 :     END_CRIT_SECTION();
                               8812                 :                : }
                               8813                 :                : 
                               8814                 :                : /*
                               8815                 :                :  * XLOG resource manager's routines
                               8816                 :                :  *
                               8817                 :                :  * Definitions of info values are in include/catalog/pg_control.h, though
                               8818                 :                :  * not all record types are related to control file updates.
                               8819                 :                :  *
                               8820                 :                :  * NOTE: Some XLOG record types that are directly related to WAL recovery
                               8821                 :                :  * are handled in xlogrecovery_redo().
                               8822                 :                :  */
                               8823                 :                : void
 4266                          8824                 :         119422 : xlog_redo(XLogReaderState *record)
                               8825                 :                : {
                               8826                 :         119422 :     uint8       info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
                               8827                 :         119422 :     XLogRecPtr  lsn = record->EndRecPtr;
                               8828                 :                : 
                               8829                 :                :     /*
                               8830                 :                :      * In XLOG rmgr, backup blocks are only used by XLOG_FPI and
                               8831                 :                :      * XLOG_FPI_FOR_HINT records.
                               8832                 :                :      */
 4262                          8833   [ +  +  +  +  :         119422 :     Assert(info == XLOG_FPI || info == XLOG_FPI_FOR_HINT ||
                                              -  + ]
                               8834                 :                :            !XLogRecHasAnyBlockRefs(record));
                               8835                 :                : 
 9261 tgl@sss.pgh.pa.us        8836         [ +  + ]:         119422 :     if (info == XLOG_NEXTOID)
                               8837                 :                :     {
                               8838                 :                :         Oid         nextOid;
                               8839                 :                : 
                               8840                 :                :         /*
                               8841                 :                :          * We used to try to take the maximum of TransamVariables->nextOid and
                               8842                 :                :          * the recorded nextOid, but that fails if the OID counter wraps
                               8843                 :                :          * around.  Since no OID allocation should be happening during replay
                               8844                 :                :          * anyway, better to just believe the record exactly.  We still take
                               8845                 :                :          * OidGenLock while setting the variable, just in case.
                               8846                 :                :          */
 9396 vadim4o@yahoo.com        8847                 :            100 :         memcpy(&nextOid, XLogRecGetData(record), sizeof(Oid));
 5284 tgl@sss.pgh.pa.us        8848                 :            100 :         LWLockAcquire(OidGenLock, LW_EXCLUSIVE);
  961 heikki.linnakangas@i     8849                 :            100 :         TransamVariables->nextOid = nextOid;
                               8850                 :            100 :         TransamVariables->oidCount = 0;
 5284 tgl@sss.pgh.pa.us        8851                 :            100 :         LWLockRelease(OidGenLock);
                               8852                 :                :     }
 9266                          8853         [ +  + ]:         119322 :     else if (info == XLOG_CHECKPOINT_SHUTDOWN)
                               8854                 :                :     {
                               8855                 :                :         CheckPoint  checkPoint;
                               8856                 :                :         TimeLineID  replayTLI;
                               8857                 :                : 
                               8858                 :             43 :         memcpy(&checkPoint, XLogRecGetData(record), sizeof(CheckPoint));
                               8859                 :                :         /* In a SHUTDOWN checkpoint, believe the counters exactly */
 5284                          8860                 :             43 :         LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
  961 heikki.linnakangas@i     8861                 :             43 :         TransamVariables->nextXid = checkPoint.nextXid;
 5284 tgl@sss.pgh.pa.us        8862                 :             43 :         LWLockRelease(XidGenLock);
                               8863                 :             43 :         LWLockAcquire(OidGenLock, LW_EXCLUSIVE);
  961 heikki.linnakangas@i     8864                 :             43 :         TransamVariables->nextOid = checkPoint.nextOid;
                               8865                 :             43 :         TransamVariables->oidCount = 0;
 5284 tgl@sss.pgh.pa.us        8866                 :             43 :         LWLockRelease(OidGenLock);
 7718                          8867                 :             43 :         MultiXactSetNextMXact(checkPoint.nextMulti,
                               8868                 :                :                               checkPoint.nextMultiOffset);
                               8869                 :                : 
 3956 andres@anarazel.de       8870                 :             43 :         MultiXactAdvanceOldest(checkPoint.oldestMulti,
                               8871                 :                :                                checkPoint.oldestMultiDB);
                               8872                 :                : 
                               8873                 :                :         /*
                               8874                 :                :          * No need to set oldestClogXid here as well; it'll be set when we
                               8875                 :                :          * redo an xl_clog_truncate if it changed since initialization.
                               8876                 :                :          */
 6003 tgl@sss.pgh.pa.us        8877                 :             43 :         SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB);
                               8878                 :                : 
                               8879                 :                :         /*
                               8880                 :                :          * If we see a shutdown checkpoint while waiting for an end-of-backup
                               8881                 :                :          * record, the backup was canceled and the end-of-backup record will
                               8882                 :                :          * never arrive.
                               8883                 :                :          */
 4902 heikki.linnakangas@i     8884         [ +  - ]:             43 :         if (ArchiveRecoveryRequested &&
  262 alvherre@kurilemu.de     8885         [ -  + ]:             43 :             XLogRecPtrIsValid(ControlFile->backupStartPoint) &&
  262 alvherre@kurilemu.de     8886         [ #  # ]:UBC           0 :             !XLogRecPtrIsValid(ControlFile->backupEndPoint))
 5284 tgl@sss.pgh.pa.us        8887         [ #  # ]:              0 :             ereport(PANIC,
                               8888                 :                :                     (errmsg("online backup was canceled, recovery cannot continue")));
                               8889                 :                : 
                               8890                 :                :         /*
                               8891                 :                :          * If we see a shutdown checkpoint, we know that nothing was running
                               8892                 :                :          * on the primary at this point. So fake-up an empty running-xacts
                               8893                 :                :          * record and use that here and now. Recover additional standby state
                               8894                 :                :          * for prepared transactions.
                               8895                 :                :          */
 6063 simon@2ndQuadrant.co     8896         [ +  + ]:CBC          43 :         if (standbyState >= STANDBY_INITIALIZED)
                               8897                 :                :         {
                               8898                 :                :             TransactionId *xids;
                               8899                 :                :             int         nxids;
                               8900                 :                :             TransactionId oldestActiveXID;
                               8901                 :                :             TransactionId latestCompletedXid;
                               8902                 :                :             RunningTransactionsData running;
                               8903                 :                : 
 5948 heikki.linnakangas@i     8904                 :             41 :             oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
                               8905                 :                : 
                               8906                 :                :             /* Update pg_subtrans entries for any prepared transactions */
  759                          8907                 :             41 :             StandbyRecoverPreparedTransactions();
                               8908                 :                : 
                               8909                 :                :             /*
                               8910                 :                :              * Construct a RunningTransactions snapshot representing a shut
                               8911                 :                :              * down server, with only prepared transactions still alive. We're
                               8912                 :                :              * never overflowed at this point because all subxids are listed
                               8913                 :                :              * with their parent prepared transactions.
                               8914                 :                :              */
 5948                          8915                 :             41 :             running.xcnt = nxids;
 4984 simon@2ndQuadrant.co     8916                 :             41 :             running.subxcnt = 0;
  759 heikki.linnakangas@i     8917                 :             41 :             running.subxid_status = SUBXIDS_IN_SUBTRANS;
 2175 andres@anarazel.de       8918                 :             41 :             running.nextXid = XidFromFullTransactionId(checkPoint.nextXid);
 5948 heikki.linnakangas@i     8919                 :             41 :             running.oldestRunningXid = oldestActiveXID;
 2175 andres@anarazel.de       8920                 :             41 :             latestCompletedXid = XidFromFullTransactionId(checkPoint.nextXid);
 5918 simon@2ndQuadrant.co     8921         [ -  + ]:             41 :             TransactionIdRetreat(latestCompletedXid);
 5917                          8922         [ -  + ]:             41 :             Assert(TransactionIdIsNormal(latestCompletedXid));
 5918                          8923                 :             41 :             running.latestCompletedXid = latestCompletedXid;
 5948 heikki.linnakangas@i     8924                 :             41 :             running.xids = xids;
                               8925                 :                : 
                               8926                 :             41 :             ProcArrayApplyRecoveryInfo(&running);
                               8927                 :                :         }
                               8928                 :                : 
                               8929                 :                :         /* ControlFile->checkPointCopy always tracks the latest ckpt XID */
 2239 tmunro@postgresql.or     8930                 :             43 :         LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 2175 andres@anarazel.de       8931                 :             43 :         ControlFile->checkPointCopy.nextXid = checkPoint.nextXid;
  114 dgustafsson@postgres     8932                 :             43 :         ControlFile->data_checksum_version = checkPoint.dataChecksumState;
                               8933                 :                : 
   87                          8934                 :             43 :         UpdateControlFile();
 2239 tmunro@postgresql.or     8935                 :             43 :         LWLockRelease(ControlFileLock);
                               8936                 :                : 
                               8937                 :                :         /*
                               8938                 :                :          * We should've already switched to the new TLI before replaying this
                               8939                 :                :          * record.
                               8940                 :                :          */
 1621 heikki.linnakangas@i     8941                 :             43 :         (void) GetCurrentReplayRecPtr(&replayTLI);
 1724 rhaas@postgresql.org     8942         [ -  + ]:             43 :         if (checkPoint.ThisTimeLineID != replayTLI)
 4979 heikki.linnakangas@i     8943         [ #  # ]:UBC           0 :             ereport(PANIC,
                               8944                 :                :                     (errmsg("unexpected timeline ID %u (should be %u) in shutdown checkpoint record",
                               8945                 :                :                             checkPoint.ThisTimeLineID, replayTLI)));
                               8946                 :                : 
 1705 rhaas@postgresql.org     8947                 :CBC          43 :         RecoveryRestartPoint(&checkPoint, record);
                               8948                 :                : 
                               8949                 :                :         /*
                               8950                 :                :          * After replaying a checkpoint record, free all smgr objects.
                               8951                 :                :          * Otherwise we would never do so for dropped relations, as the
                               8952                 :                :          * startup does not process shared invalidation messages or call
                               8953                 :                :          * AtEOXact_SMgr().
                               8954                 :                :          */
  319 michael@paquier.xyz      8955                 :             43 :         smgrdestroyall();
                               8956                 :                :     }
 9266 tgl@sss.pgh.pa.us        8957         [ +  + ]:         119279 :     else if (info == XLOG_CHECKPOINT_ONLINE)
                               8958                 :                :     {
                               8959                 :                :         CheckPoint  checkPoint;
                               8960                 :                :         TimeLineID  replayTLI;
                               8961                 :                : 
                               8962                 :            716 :         memcpy(&checkPoint, XLogRecGetData(record), sizeof(CheckPoint));
                               8963                 :                :         /* In an ONLINE checkpoint, treat the XID counter as a minimum */
 5284                          8964                 :            716 :         LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
  961 heikki.linnakangas@i     8965         [ -  + ]:            716 :         if (FullTransactionIdPrecedes(TransamVariables->nextXid,
                               8966                 :                :                                       checkPoint.nextXid))
  961 heikki.linnakangas@i     8967                 :UBC           0 :             TransamVariables->nextXid = checkPoint.nextXid;
 5284 tgl@sss.pgh.pa.us        8968                 :CBC         716 :         LWLockRelease(XidGenLock);
                               8969                 :                : 
                               8970                 :                :         /*
                               8971                 :                :          * We ignore the nextOid counter in an ONLINE checkpoint, preferring
                               8972                 :                :          * to track OID assignment through XLOG_NEXTOID records.  The nextOid
                               8973                 :                :          * counter is from the start of the checkpoint and might well be stale
                               8974                 :                :          * compared to later XLOG_NEXTOID records.  We could try to take the
                               8975                 :                :          * maximum of the nextOid counter and our latest value, but since
                               8976                 :                :          * there's no particular guarantee about the speed with which the OID
                               8977                 :                :          * counter wraps around, that's a risky thing to do.  In any case,
                               8978                 :                :          * users of the nextOid counter are required to avoid assignment of
                               8979                 :                :          * duplicates, so that a somewhat out-of-date value should be safe.
                               8980                 :                :          */
                               8981                 :                : 
                               8982                 :                :         /* Handle multixact */
 7718                          8983                 :            716 :         MultiXactAdvanceNextMXact(checkPoint.nextMulti,
                               8984                 :                :                                   checkPoint.nextMultiOffset);
                               8985                 :                : 
                               8986                 :                :         /*
                               8987                 :                :          * NB: This may perform multixact truncation when replaying WAL
                               8988                 :                :          * generated by an older primary.
                               8989                 :                :          */
 3956 andres@anarazel.de       8990                 :            716 :         MultiXactAdvanceOldest(checkPoint.oldestMulti,
                               8991                 :                :                                checkPoint.oldestMultiDB);
  961 heikki.linnakangas@i     8992         [ -  + ]:            716 :         if (TransactionIdPrecedes(TransamVariables->oldestXid,
                               8993                 :                :                                   checkPoint.oldestXid))
 6003 tgl@sss.pgh.pa.us        8994                 :UBC           0 :             SetTransactionIdLimit(checkPoint.oldestXid,
                               8995                 :                :                                   checkPoint.oldestXidDB);
                               8996                 :                :         /* ControlFile->checkPointCopy always tracks the latest ckpt XID */
 2239 tmunro@postgresql.or     8997                 :CBC         716 :         LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 2175 andres@anarazel.de       8998                 :            716 :         ControlFile->checkPointCopy.nextXid = checkPoint.nextXid;
 2239 tmunro@postgresql.or     8999                 :            716 :         LWLockRelease(ControlFileLock);
                               9000                 :                : 
                               9001                 :                :         /* TLI should not change in an on-line checkpoint */
 1621 heikki.linnakangas@i     9002                 :            716 :         (void) GetCurrentReplayRecPtr(&replayTLI);
 1724 rhaas@postgresql.org     9003         [ -  + ]:            716 :         if (checkPoint.ThisTimeLineID != replayTLI)
 8201 tgl@sss.pgh.pa.us        9004         [ #  # ]:UBC           0 :             ereport(PANIC,
                               9005                 :                :                     (errmsg("unexpected timeline ID %u (should be %u) in online checkpoint record",
                               9006                 :                :                             checkPoint.ThisTimeLineID, replayTLI)));
                               9007                 :                : 
 1705 rhaas@postgresql.org     9008                 :CBC         716 :         RecoveryRestartPoint(&checkPoint, record);
                               9009                 :                : 
                               9010                 :                :         /*
                               9011                 :                :          * After replaying a checkpoint record, free all smgr objects.
                               9012                 :                :          * Otherwise we would never do so for dropped relations, as the
                               9013                 :                :          * startup does not process shared invalidation messages or call
                               9014                 :                :          * AtEOXact_SMgr().
                               9015                 :                :          */
  319 michael@paquier.xyz      9016                 :            716 :         smgrdestroyall();
                               9017                 :                :     }
 1761 alvherre@alvh.no-ip.     9018         [ +  + ]:         118563 :     else if (info == XLOG_OVERWRITE_CONTRECORD)
                               9019                 :                :     {
                               9020                 :                :         /* nothing to do here, handled in xlogrecovery_redo() */
                               9021                 :                :     }
 4926 simon@2ndQuadrant.co     9022         [ +  + ]:         118562 :     else if (info == XLOG_END_OF_RECOVERY)
                               9023                 :                :     {
                               9024                 :                :         xl_end_of_recovery xlrec;
                               9025                 :                :         TimeLineID  replayTLI;
                               9026                 :                : 
                               9027                 :             12 :         memcpy(&xlrec, XLogRecGetData(record), sizeof(xl_end_of_recovery));
                               9028                 :                : 
                               9029                 :                :         /*
                               9030                 :                :          * For Hot Standby, we could treat this like a Shutdown Checkpoint,
                               9031                 :                :          * but this case is rarer and harder to test, so the benefit doesn't
                               9032                 :                :          * outweigh the potential extra cost of maintenance.
                               9033                 :                :          */
                               9034                 :                : 
                               9035                 :                :         /*
                               9036                 :                :          * We should've already switched to the new TLI before replaying this
                               9037                 :                :          * record.
                               9038                 :                :          */
 1621 heikki.linnakangas@i     9039                 :             12 :         (void) GetCurrentReplayRecPtr(&replayTLI);
 1724 rhaas@postgresql.org     9040         [ -  + ]:             12 :         if (xlrec.ThisTimeLineID != replayTLI)
 4926 simon@2ndQuadrant.co     9041         [ #  # ]:UBC           0 :             ereport(PANIC,
                               9042                 :                :                     (errmsg("unexpected timeline ID %u (should be %u) in end-of-recovery record",
                               9043                 :                :                             xlrec.ThisTimeLineID, replayTLI)));
                               9044                 :                :     }
 7007 tgl@sss.pgh.pa.us        9045         [ +  - ]:CBC      118550 :     else if (info == XLOG_NOOP)
                               9046                 :                :     {
                               9047                 :                :         /* nothing to do here */
                               9048                 :                :     }
 7294                          9049         [ +  + ]:         118550 :     else if (info == XLOG_SWITCH)
                               9050                 :                :     {
                               9051                 :                :         /* nothing to do here */
                               9052                 :                :     }
 5647 simon@2ndQuadrant.co     9053         [ +  + ]:         118081 :     else if (info == XLOG_RESTORE_POINT)
                               9054                 :                :     {
                               9055                 :                :         /* nothing to do here, handled in xlogrecovery.c */
                               9056                 :                :     }
  135 pg@bowt.ie               9057         [ +  + ]:         118076 :     else if (info == XLOG_ASSIGN_LSN)
                               9058                 :                :     {
                               9059                 :                :         /* nothing to do here, see XLogGetFakeLSN() */
                               9060                 :                :     }
 4262 heikki.linnakangas@i     9061   [ +  +  +  + ]:          55749 :     else if (info == XLOG_FPI || info == XLOG_FPI_FOR_HINT)
                               9062                 :                :     {
                               9063                 :                :         /*
                               9064                 :                :          * XLOG_FPI records contain nothing else but one or more block
                               9065                 :                :          * references. Every block reference must include a full-page image
                               9066                 :                :          * even if full_page_writes was disabled when the record was generated
                               9067                 :                :          * - otherwise there would be no point in this record.
                               9068                 :                :          *
                               9069                 :                :          * XLOG_FPI_FOR_HINT records are generated when a page needs to be
                               9070                 :                :          * WAL-logged because of a hint bit update. They are only generated
                               9071                 :                :          * when checksums and/or wal_log_hints are enabled. They may include
                               9072                 :                :          * no full-page images if full_page_writes was disabled when they were
                               9073                 :                :          * generated. In this case there is nothing to do here.
                               9074                 :                :          *
                               9075                 :                :          * No recovery conflicts are generated by these generic records - if a
                               9076                 :                :          * resource manager needs to generate conflicts, it has to define a
                               9077                 :                :          * separate WAL record type and redo routine.
                               9078                 :                :          */
 1591 tmunro@postgresql.or     9079         [ +  + ]:         114633 :         for (uint8 block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
                               9080                 :                :         {
                               9081                 :                :             Buffer      buffer;
                               9082                 :                : 
 1831 fujii@postgresql.org     9083         [ +  + ]:          59763 :             if (!XLogRecHasBlockImage(record, block_id))
                               9084                 :                :             {
                               9085         [ -  + ]:             66 :                 if (info == XLOG_FPI)
 1831 fujii@postgresql.org     9086         [ #  # ]:UBC           0 :                     elog(ERROR, "XLOG_FPI record did not contain a full-page image");
 1831 fujii@postgresql.org     9087                 :CBC          66 :                 continue;
                               9088                 :                :             }
                               9089                 :                : 
 2671 heikki.linnakangas@i     9090         [ -  + ]:          59697 :             if (XLogReadBufferForRedo(record, block_id, &buffer) != BLK_RESTORED)
 2671 heikki.linnakangas@i     9091         [ #  # ]:UBC           0 :                 elog(ERROR, "unexpected XLogReadBufferForRedo result when restoring backup block");
 2671 heikki.linnakangas@i     9092                 :CBC       59697 :             UnlockReleaseBuffer(buffer);
                               9093                 :                :         }
                               9094                 :                :     }
 6047                          9095         [ +  + ]:            879 :     else if (info == XLOG_BACKUP_END)
                               9096                 :                :     {
                               9097                 :                :         /* nothing to do here, handled in xlogrecovery_redo() */
                               9098                 :                :     }
 5933                          9099         [ +  + ]:            774 :     else if (info == XLOG_PARAMETER_CHANGE)
                               9100                 :                :     {
                               9101                 :                :         xl_parameter_change xlrec;
                               9102                 :                : 
                               9103                 :                :         /* Update our copy of the parameters in pg_control */
                               9104                 :             38 :         memcpy(&xlrec, XLogRecGetData(record), sizeof(xl_parameter_change));
                               9105                 :                : 
 5928                          9106                 :             38 :         LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 5933                          9107                 :             38 :         ControlFile->MaxConnections = xlrec.MaxConnections;
 4770 rhaas@postgresql.org     9108                 :             38 :         ControlFile->max_worker_processes = xlrec.max_worker_processes;
 2721 michael@paquier.xyz      9109                 :             38 :         ControlFile->max_wal_senders = xlrec.max_wal_senders;
 5933 heikki.linnakangas@i     9110                 :             38 :         ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
                               9111                 :             38 :         ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
                               9112                 :             38 :         ControlFile->wal_level = xlrec.wal_level;
 4210                          9113                 :             38 :         ControlFile->wal_log_hints = xlrec.wal_log_hints;
                               9114                 :                : 
                               9115                 :                :         /*
                               9116                 :                :          * Update minRecoveryPoint to ensure that if recovery is aborted, we
                               9117                 :                :          * recover back up to this point before allowing hot standby again.
                               9118                 :                :          * This is important if the max_* settings are decreased, to ensure
                               9119                 :                :          * you don't run queries against the WAL preceding the change. The
                               9120                 :                :          * local copies cannot be updated as long as crash recovery is
                               9121                 :                :          * happening and we expect all the WAL to be replayed.
                               9122                 :                :          */
 2943 michael@paquier.xyz      9123         [ +  + ]:             38 :         if (InArchiveRecovery)
                               9124                 :                :         {
 1621 heikki.linnakangas@i     9125                 :             23 :             LocalMinRecoveryPoint = ControlFile->minRecoveryPoint;
                               9126                 :             23 :             LocalMinRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
                               9127                 :                :         }
  262 alvherre@kurilemu.de     9128   [ +  +  +  + ]:             38 :         if (XLogRecPtrIsValid(LocalMinRecoveryPoint) && LocalMinRecoveryPoint < lsn)
                               9129                 :                :         {
                               9130                 :                :             TimeLineID  replayTLI;
                               9131                 :                : 
 1621 heikki.linnakangas@i     9132                 :             12 :             (void) GetCurrentReplayRecPtr(&replayTLI);
 5928                          9133                 :             12 :             ControlFile->minRecoveryPoint = lsn;
 1724 rhaas@postgresql.org     9134                 :             12 :             ControlFile->minRecoveryPointTLI = replayTLI;
                               9135                 :                :         }
                               9136                 :                : 
 3951 alvherre@alvh.no-ip.     9137                 :             38 :         CommitTsParameterChange(xlrec.track_commit_timestamp,
                               9138                 :             38 :                                 ControlFile->track_commit_timestamp);
                               9139                 :             38 :         ControlFile->track_commit_timestamp = xlrec.track_commit_timestamp;
                               9140                 :                : 
 5933 heikki.linnakangas@i     9141                 :             38 :         UpdateControlFile();
 5928                          9142                 :             38 :         LWLockRelease(ControlFileLock);
                               9143                 :                : 
                               9144                 :                :         /* Check to see if any parameter change gives a problem on recovery */
 5933                          9145                 :             38 :         CheckRequiredParameterValues();
                               9146                 :                :     }
 5296 simon@2ndQuadrant.co     9147         [ -  + ]:            736 :     else if (info == XLOG_FPW_CHANGE)
                               9148                 :                :     {
                               9149                 :                :         bool        fpw;
                               9150                 :                : 
 5296 simon@2ndQuadrant.co     9151                 :UBC           0 :         memcpy(&fpw, XLogRecGetData(record), sizeof(bool));
                               9152                 :                : 
                               9153                 :                :         /*
                               9154                 :                :          * Update the LSN of the last replayed XLOG_FPW_CHANGE record so that
                               9155                 :                :          * do_pg_backup_start() and do_pg_backup_stop() can check whether
                               9156                 :                :          * full_page_writes has been disabled during online backup.
                               9157                 :                :          */
                               9158         [ #  # ]:              0 :         if (!fpw)
                               9159                 :                :         {
 4325 andres@anarazel.de       9160                 :              0 :             SpinLockAcquire(&XLogCtl->info_lck);
 1705 rhaas@postgresql.org     9161         [ #  # ]:              0 :             if (XLogCtl->lastFpwDisableRecPtr < record->ReadRecPtr)
                               9162                 :              0 :                 XLogCtl->lastFpwDisableRecPtr = record->ReadRecPtr;
 4325 andres@anarazel.de       9163                 :              0 :             SpinLockRelease(&XLogCtl->info_lck);
                               9164                 :                :         }
                               9165                 :                : 
                               9166                 :                :         /* Keep track of full_page_writes */
 5296 simon@2ndQuadrant.co     9167                 :              0 :         lastFullPageWrites = fpw;
                               9168                 :                :     }
 1011 rhaas@postgresql.org     9169         [ +  + ]:CBC         736 :     else if (info == XLOG_CHECKPOINT_REDO)
                               9170                 :                :     {
                               9171                 :                :         xl_checkpoint_redo redo_rec;
  114 dgustafsson@postgres     9172                 :            718 :         bool        new_state = false;
                               9173                 :                : 
                               9174                 :            718 :         memcpy(&redo_rec, XLogRecGetData(record), sizeof(xl_checkpoint_redo));
                               9175                 :                : 
                               9176                 :            718 :         SpinLockAcquire(&XLogCtl->info_lck);
                               9177                 :            718 :         XLogCtl->data_checksum_version = redo_rec.data_checksum_version;
   87                          9178                 :            718 :         SetLocalDataChecksumState(redo_rec.data_checksum_version);
  114                          9179         [ -  + ]:            718 :         if (redo_rec.data_checksum_version != ControlFile->data_checksum_version)
  114 dgustafsson@postgres     9180                 :UBC           0 :             new_state = true;
  114 dgustafsson@postgres     9181                 :CBC         718 :         SpinLockRelease(&XLogCtl->info_lck);
                               9182                 :                : 
                               9183         [ -  + ]:            718 :         if (new_state)
  114 dgustafsson@postgres     9184                 :UBC           0 :             EmitAndWaitDataChecksumsBarrier(redo_rec.data_checksum_version);
                               9185                 :                :     }
  215 msawada@postgresql.o     9186         [ +  - ]:CBC          18 :     else if (info == XLOG_LOGICAL_DECODING_STATUS_CHANGE)
                               9187                 :                :     {
                               9188                 :                :         bool        status;
                               9189                 :                : 
                               9190                 :             18 :         memcpy(&status, XLogRecGetData(record), sizeof(bool));
                               9191                 :                : 
                               9192                 :                :         /*
                               9193                 :                :          * We need to toggle the logical decoding status and update the
                               9194                 :                :          * XLogLogicalInfo cache of processes synchronously because
                               9195                 :                :          * XLogLogicalInfoActive() is used even during read-only queries
                               9196                 :                :          * (e.g., via RelationIsAccessibleInLogicalDecoding()). In the
                               9197                 :                :          * 'disable' case, it is safe to invalidate existing slots after
                               9198                 :                :          * disabling logical decoding because logical decoding cannot process
                               9199                 :                :          * subsequent WAL records, which may not contain logical information.
                               9200                 :                :          */
                               9201         [ +  + ]:             18 :         if (status)
                               9202                 :              9 :             EnableLogicalDecoding();
                               9203                 :                :         else
                               9204                 :              9 :             DisableLogicalDecoding();
                               9205                 :                : 
                               9206         [ +  + ]:             18 :         elog(DEBUG1, "update logical decoding status to %d during recovery",
                               9207                 :                :              status);
                               9208                 :                : 
                               9209   [ +  -  +  + ]:             18 :         if (InRecovery && InHotStandby)
                               9210                 :                :         {
                               9211         [ +  + ]:             16 :             if (!status)
                               9212                 :                :             {
                               9213                 :                :                 /*
                               9214                 :                :                  * Invalidate logical slots if we are in hot standby and the
                               9215                 :                :                  * primary disabled logical decoding.
                               9216                 :                :                  */
                               9217                 :              9 :                 InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_LEVEL,
                               9218                 :                :                                                    0, InvalidOid,
                               9219                 :                :                                                    InvalidTransactionId);
                               9220                 :                :             }
                               9221         [ -  + ]:              7 :             else if (sync_replication_slots)
                               9222                 :                :             {
                               9223                 :                :                 /*
                               9224                 :                :                  * Signal the postmaster to launch the slotsync worker.
                               9225                 :                :                  *
                               9226                 :                :                  * XXX: For simplicity, we keep the slotsync worker running
                               9227                 :                :                  * even after logical decoding is disabled. A future
                               9228                 :                :                  * improvement can consider starting and stopping the worker
                               9229                 :                :                  * based on logical decoding status change.
                               9230                 :                :                  */
  215 msawada@postgresql.o     9231                 :UBC           0 :                 kill(PostmasterPid, SIGUSR1);
                               9232                 :                :             }
                               9233                 :                :         }
                               9234                 :                :     }
 9409 vadim4o@yahoo.com        9235                 :CBC      119420 : }
                               9236                 :                : 
                               9237                 :                : void
  114 dgustafsson@postgres     9238                 :              7 : xlog2_redo(XLogReaderState *record)
                               9239                 :                : {
                               9240                 :              7 :     uint8       info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
                               9241                 :                : 
                               9242         [ +  - ]:              7 :     if (info == XLOG2_CHECKSUMS)
                               9243                 :                :     {
                               9244                 :                :         xl_checksum_state state;
                               9245                 :                : 
                               9246                 :              7 :         memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
                               9247                 :                : 
                               9248                 :              7 :         SpinLockAcquire(&XLogCtl->info_lck);
                               9249                 :              7 :         XLogCtl->data_checksum_version = state.new_checksum_state;
                               9250                 :              7 :         SpinLockRelease(&XLogCtl->info_lck);
                               9251                 :                : 
   87                          9252                 :              7 :         LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               9253                 :              7 :         ControlFile->data_checksum_version = state.new_checksum_state;
                               9254                 :              7 :         UpdateControlFile();
                               9255                 :              7 :         LWLockRelease(ControlFileLock);
                               9256                 :                : 
                               9257                 :                :         /*
                               9258                 :                :          * Block on a procsignalbarrier to await all processes having seen the
                               9259                 :                :          * change to checksum status. Once the barrier has been passed we can
                               9260                 :                :          * initiate the corresponding processing.
                               9261                 :                :          */
  114                          9262                 :              7 :         EmitAndWaitDataChecksumsBarrier(state.new_checksum_state);
                               9263                 :                :     }
                               9264                 :              7 : }
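The redo routine above copies the record payload into a local struct with memcpy() before reading any field. A minimal sketch of that pattern follows. The names SketchChecksumState and sketch_decode_state are hypothetical and are not part of xlog.c. That the copy protects against a payload that is not aligned for the struct type is an assumption about the motivation, not something the source states.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a small fixed-size WAL record payload. */
typedef struct
{
    uint32_t    new_state;
} SketchChecksumState;

/*
 * Decode the payload by copying it into a properly aligned local variable.
 * 'payload' may point at any byte offset, so it is never cast and
 * dereferenced directly.
 */
static uint32_t
sketch_decode_state(const char *payload)
{
    SketchChecksumState state;

    memcpy(&state, payload, sizeof(state));
    return state.new_state;
}
```

The copy costs a few bytes of stack and keeps the reader independent of how the record buffer happens to be laid out.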
                               9265                 :                : 
                               9266                 :                : /*
                               9267                 :                :  * Return the extra open flags used for opening a file, depending on the
                               9268                 :                :  * value of the GUCs wal_sync_method, fsync and debug_io_direct.
                               9269                 :                :  */
                               9270                 :                : static int
 6647 magnus@hagander.net      9271                 :          17965 : get_sync_bit(int method)
                               9272                 :                : {
 5994 bruce@momjian.us         9273                 :          17965 :     int         o_direct_flag = 0;
                               9274                 :                : 
                               9275                 :                :     /*
                               9276                 :                :      * Use O_DIRECT if requested, except in walreceiver process.  The WAL
                               9277                 :                :      * written by walreceiver is normally read by the startup process soon
                               9278                 :                :      * after it's written.  Also, walreceiver performs unaligned writes, which
                               9279                 :                :      * don't work with O_DIRECT, so it is required for correctness too.
                               9280                 :                :      */
 1205 tmunro@postgresql.or     9281   [ +  +  +  - ]:          17965 :     if ((io_direct_flags & IO_DIRECT_WAL) && !AmWalReceiverProcess())
 6001 heikki.linnakangas@i     9282                 :              8 :         o_direct_flag = PG_O_DIRECT;
                               9283                 :                : 
                               9284                 :                :     /* If fsync is disabled, never open in sync mode */
 1205 tmunro@postgresql.or     9285         [ +  - ]:          17965 :     if (!enableFsync)
                               9286                 :          17965 :         return o_direct_flag;
                               9287                 :                : 
 6647 magnus@hagander.net      9288   [ #  #  #  # ]:UBC           0 :     switch (method)
                               9289                 :                :     {
                               9290                 :                :             /*
                               9291                 :                :              * enum values for all sync options are defined even if they are
                               9292                 :                :              * not supported on the current platform.  But if not, they are
                               9293                 :                :              * not included in the enum option array, and therefore will never
                               9294                 :                :              * be seen here.
                               9295                 :                :              */
 1017 nathan@postgresql.or     9296                 :              0 :         case WAL_SYNC_METHOD_FSYNC:
                               9297                 :                :         case WAL_SYNC_METHOD_FSYNC_WRITETHROUGH:
                               9298                 :                :         case WAL_SYNC_METHOD_FDATASYNC:
 1205 tmunro@postgresql.or     9299                 :              0 :             return o_direct_flag;
                               9300                 :                : #ifdef O_SYNC
 1017 nathan@postgresql.or     9301                 :              0 :         case WAL_SYNC_METHOD_OPEN:
 1465 tmunro@postgresql.or     9302                 :              0 :             return O_SYNC | o_direct_flag;
                               9303                 :                : #endif
                               9304                 :                : #ifdef O_DSYNC
 1017 nathan@postgresql.or     9305                 :              0 :         case WAL_SYNC_METHOD_OPEN_DSYNC:
 1465 tmunro@postgresql.or     9306                 :              0 :             return O_DSYNC | o_direct_flag;
                               9307                 :                : #endif
 6649 magnus@hagander.net      9308                 :              0 :         default:
                               9309                 :                :             /* can't happen (unless we are out of sync with option array) */
  800 peter@eisentraut.org     9310         [ #  # ]:              0 :             elog(ERROR, "unrecognized \"wal_sync_method\": %d", method);
                               9311                 :                :             return 0;           /* silence warning */
                               9312                 :                :     }
                               9313                 :                : }
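The mapping performed by get_sync_bit() can be sketched outside the server as follows. This is an illustration under stated assumptions, not the server's code: the SketchSyncMethod enum and sketch_sync_bit() are hypothetical, 'fsync_enabled' stands in for enableFsync, 'direct_flag' stands in for the O_DIRECT decision made earlier in the function, and an unrecognized method returns -1 instead of raising an error.

```c
#include <fcntl.h>

/* Hypothetical stand-ins for the WAL_SYNC_METHOD_* values. */
typedef enum
{
    SKETCH_SYNC_FSYNC,
    SKETCH_SYNC_FDATASYNC,
    SKETCH_SYNC_OPEN,
    SKETCH_SYNC_OPEN_DSYNC
} SketchSyncMethod;

/*
 * Return the extra open(2) flags for a sync method, or -1 if the method is
 * not recognized (or not supported on this platform).
 */
static int
sketch_sync_bit(SketchSyncMethod method, int fsync_enabled, int direct_flag)
{
    /* With fsync disabled, never open the file in a synchronous mode. */
    if (!fsync_enabled)
        return direct_flag;

    switch (method)
    {
        case SKETCH_SYNC_FSYNC:
        case SKETCH_SYNC_FDATASYNC:
            /* Syncing happens through a later explicit call, not a flag. */
            return direct_flag;
#ifdef O_SYNC
        case SKETCH_SYNC_OPEN:
            return O_SYNC | direct_flag;
#endif
#ifdef O_DSYNC
        case SKETCH_SYNC_OPEN_DSYNC:
            return O_DSYNC | direct_flag;
#endif
        default:
            return -1;
    }
}
```

Because only the open-based methods contribute a flag, switching between two methods that map to the same flags does not require reopening the file, which is the comparison assign_wal_sync_method() makes before calling XLogFileClose().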
                               9314                 :                : 
                               9315                 :                : /*
                               9316                 :                :  * GUC support
                               9317                 :                :  */
                               9318                 :                : void
 1017 nathan@postgresql.or     9319                 :CBC        1266 : assign_wal_sync_method(int new_wal_sync_method, void *extra)
                               9320                 :                : {
                               9321         [ -  + ]:           1266 :     if (wal_sync_method != new_wal_sync_method)
                               9322                 :                :     {
                               9323                 :                :         /*
                               9324                 :                :          * To ensure that no blocks escape unsynced, force an fsync on the
                               9325                 :                :          * currently open log segment (if any).  Also, if the open flag is
                               9326                 :                :          * changing, close the log file so it will be reopened (with new flag
                               9327                 :                :          * bit) at next use.
                               9328                 :                :          */
 9263 tgl@sss.pgh.pa.us        9329         [ #  # ]:UBC           0 :         if (openLogFile >= 0)
                               9330                 :                :         {
 3417 rhaas@postgresql.org     9331                 :              0 :             pgstat_report_wait_start(WAIT_EVENT_WAL_SYNC_METHOD_ASSIGN);
 9263 tgl@sss.pgh.pa.us        9332         [ #  # ]:              0 :             if (pg_fsync(openLogFile) != 0)
                               9333                 :                :             {
                               9334                 :                :                 char        xlogfname[MAXFNAMELEN];
                               9335                 :                :                 int         save_errno;
                               9336                 :                : 
 2427 michael@paquier.xyz      9337                 :              0 :                 save_errno = errno;
 1724 rhaas@postgresql.org     9338                 :              0 :                 XLogFileName(xlogfname, openLogTLI, openLogSegNo,
                               9339                 :                :                              wal_segment_size);
 2427 michael@paquier.xyz      9340                 :              0 :                 errno = save_errno;
 8406 tgl@sss.pgh.pa.us        9341         [ #  # ]:              0 :                 ereport(PANIC,
                               9342                 :                :                         (errcode_for_file_access(),
                               9343                 :                :                          errmsg("could not fsync file \"%s\": %m", xlogfname)));
                               9344                 :                :             }
                               9345                 :                : 
 3417 rhaas@postgresql.org     9346                 :              0 :             pgstat_report_wait_end();
 1017 nathan@postgresql.or     9347         [ #  # ]:              0 :             if (get_sync_bit(wal_sync_method) != get_sync_bit(new_wal_sync_method))
 7346 bruce@momjian.us         9348                 :              0 :                 XLogFileClose();
                               9349                 :                :         }
                               9350                 :                :     }
 9263 tgl@sss.pgh.pa.us        9351                 :CBC        1266 : }
                               9352                 :                : 
                               9353                 :                : 
                               9354                 :                : /*
                               9355                 :                :  * Issue appropriate kind of fsync (if any) for an XLOG output file.
                               9356                 :                :  *
                               9357                 :                :  * 'fd' is a file descriptor for the XLOG file to be fsync'd.
                               9358                 :                :  * 'segno' and 'tli' are used only for error reporting.
                               9359                 :                :  */
                               9360                 :                : void
 1724 rhaas@postgresql.org     9361                 :         228636 : issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
                               9362                 :                : {
 2427 michael@paquier.xyz      9363                 :         228636 :     char       *msg = NULL;
                               9364                 :                :     instr_time  start;
                               9365                 :                : 
 1724 rhaas@postgresql.org     9366         [ -  + ]:         228636 :     Assert(tli != 0);
                               9367                 :                : 
                               9368                 :                :     /*
                               9369                 :                :      * Quick exit if fsync is disabled or write() has already synced the WAL
                               9370                 :                :      * file.
                               9371                 :                :      */
 1965 fujii@postgresql.org     9372         [ -  + ]:         228636 :     if (!enableFsync ||
 1017 nathan@postgresql.or     9373         [ #  # ]:UBC           0 :         wal_sync_method == WAL_SYNC_METHOD_OPEN ||
                               9374         [ #  # ]:              0 :         wal_sync_method == WAL_SYNC_METHOD_OPEN_DSYNC)
 1965 fujii@postgresql.org     9375                 :CBC      228636 :         return;
                               9376                 :                : 
                               9377                 :                :     /*
                               9378                 :                :      * Measure I/O timing to sync the WAL file for pg_stat_io.
                               9379                 :                :      */
  515 michael@paquier.xyz      9380                 :UBC           0 :     start = pgstat_prepare_io_time(track_wal_io_timing);
                               9381                 :                : 
 2946                          9382                 :              0 :     pgstat_report_wait_start(WAIT_EVENT_WAL_SYNC);
 1017 nathan@postgresql.or     9383   [ #  #  #  # ]:              0 :     switch (wal_sync_method)
                               9384                 :                :     {
                               9385                 :              0 :         case WAL_SYNC_METHOD_FSYNC:
 6036 heikki.linnakangas@i     9386         [ #  # ]:              0 :             if (pg_fsync_no_writethrough(fd) != 0)
 2427 michael@paquier.xyz      9387                 :              0 :                 msg = _("could not fsync file \"%s\": %m");
 9263 tgl@sss.pgh.pa.us        9388                 :              0 :             break;
                               9389                 :                : #ifdef HAVE_FSYNC_WRITETHROUGH
                               9390                 :                :         case WAL_SYNC_METHOD_FSYNC_WRITETHROUGH:
                               9391                 :                :             if (pg_fsync_writethrough(fd) != 0)
                               9392                 :                :                 msg = _("could not fsync write-through file \"%s\": %m");
                               9393                 :                :             break;
                               9394                 :                : #endif
 1017 nathan@postgresql.or     9395                 :              0 :         case WAL_SYNC_METHOD_FDATASYNC:
 6036 heikki.linnakangas@i     9396         [ #  # ]:              0 :             if (pg_fdatasync(fd) != 0)
 2427 michael@paquier.xyz      9397                 :              0 :                 msg = _("could not fdatasync file \"%s\": %m");
 9263 tgl@sss.pgh.pa.us        9398                 :              0 :             break;
 1017 nathan@postgresql.or     9399                 :              0 :         case WAL_SYNC_METHOD_OPEN:
                               9400                 :                :         case WAL_SYNC_METHOD_OPEN_DSYNC:
                               9401                 :                :             /* not reachable */
 1965 fujii@postgresql.org     9402                 :              0 :             Assert(false);
                               9403                 :                :             break;
 9263 tgl@sss.pgh.pa.us        9404                 :              0 :         default:
  844 dgustafsson@postgres     9405         [ #  # ]:              0 :             ereport(PANIC,
                               9406                 :                :                     errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                               9407                 :                :                     errmsg_internal("unrecognized \"wal_sync_method\": %d", wal_sync_method));
                               9408                 :                :             break;
                               9409                 :                :     }
                               9410                 :                : 
                               9411                 :                :     /* PANIC if failed to fsync */
 2427 michael@paquier.xyz      9412         [ #  # ]:              0 :     if (msg)
                               9413                 :                :     {
                               9414                 :                :         char        xlogfname[MAXFNAMELEN];
                               9415                 :              0 :         int         save_errno = errno;
                               9416                 :                : 
 1724 rhaas@postgresql.org     9417                 :              0 :         XLogFileName(xlogfname, tli, segno, wal_segment_size);
 2427 michael@paquier.xyz      9418                 :              0 :         errno = save_errno;
                               9419         [ #  # ]:              0 :         ereport(PANIC,
                               9420                 :                :                 (errcode_for_file_access(),
                               9421                 :                :                  errmsg(msg, xlogfname)));
                               9422                 :                :     }
                               9423                 :                : 
                               9424                 :              0 :     pgstat_report_wait_end();
                               9425                 :                : 
  537                          9426                 :              0 :     pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL, IOOP_FSYNC,
                               9427                 :                :                             start, 1, 0);
                               9428                 :                : }
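Both fsync failure paths above save errno before building the file name and restore it before reporting, so that "%m" reflects the original failure. A minimal sketch of that pattern follows. The helpers sketch_format_name() and sketch_report_failure() and the name format are hypothetical, and the helper deliberately clobbers errno to simulate a call that might do so.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper that formats a file name.  It resets errno on purpose
 * to simulate a call that may overwrite it.
 */
static void
sketch_format_name(char *buf, size_t len, unsigned tli, unsigned long segno)
{
    snprintf(buf, len, "%08X%016lX", tli, segno);
    errno = 0;
}

/*
 * Build an error message for a failed sync.  errno is saved before calling
 * the helper and restored afterwards, so the message describes the original
 * failure.  Returns the preserved errno value.
 */
static int
sketch_report_failure(char *out, size_t len, unsigned tli, unsigned long segno)
{
    char        name[64];
    int         save_errno = errno;

    sketch_format_name(name, sizeof(name), tli, segno);
    errno = save_errno;

    snprintf(out, len, "could not fsync file \"%s\": %s",
             name, strerror(save_errno));
    return save_errno;
}
```

Without the save and restore, the message would describe whatever the formatting call left in errno rather than the failed sync.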
                               9429                 :                : 
                               9430                 :                : /*
                               9431                 :                :  * do_pg_backup_start is the workhorse of the user-visible pg_backup_start()
                               9432                 :                :  * function. It creates the necessary starting checkpoint and constructs the
                               9433                 :                :  * backup state and tablespace map.
                               9434                 :                :  *
                               9435                 :                :  * Input parameters are "state" (the backup state), "fast" (if true, we do
                               9436                 :                :  * the checkpoint in fast mode), and "tablespaces" (if non-NULL, indicates a
                               9437                 :                :  * list of tablespaceinfo structs describing the cluster's tablespaces).
                               9438                 :                :  *
                               9439                 :                :  * The tablespace map contents are appended to the passed-in parameter
                               9440                 :                :  * tblspcmapfile, and the caller is responsible for including it in the backup
                               9441                 :                :  * archive as 'tablespace_map'. The tablespace_map file is required mainly for
                               9442                 :                :  * tar format on Windows, as native Windows utilities are not able to create
                               9443                 :                :  * symlinks while extracting files from tar. However, for consistency and
                               9444                 :                :  * platform-independence, we do it the same way everywhere.
                               9445                 :                :  *
                               9446                 :                :  * It fills in "state" with the information required for the backup, such
                               9447                 :                :  * as the minimum WAL location that must be present to restore from this
                               9448                 :                :  * backup (startpoint) and the corresponding timeline ID (starttli).
                               9449                 :                :  *
                               9450                 :                :  * Every successfully started backup must be stopped by calling
                               9451                 :                :  * do_pg_backup_stop() or do_pg_abort_backup(). There can be many
                               9452                 :                :  * backups active at the same time.
                               9453                 :                :  *
                               9454                 :                :  * It is the responsibility of the caller of this function to verify the
                               9455                 :                :  * permissions of the calling user!
                               9456                 :                :  */
                               9457                 :                : void
 1399 michael@paquier.xyz      9458                 :CBC         186 : do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces,
                               9459                 :                :                    BackupState *state, StringInfo tblspcmapfile)
                               9460                 :                : {
                               9461                 :                :     bool        backup_started_in_recovery;
                               9462                 :                : 
                               9463         [ -  + ]:            186 :     Assert(state != NULL);
 5296 simon@2ndQuadrant.co     9464                 :            186 :     backup_started_in_recovery = RecoveryInProgress();
                               9465                 :                : 
                               9466                 :                :     /*
                               9467                 :                :      * During recovery, we don't need to check the WAL level: if it were not
                               9468                 :                :      * sufficient, it would be impossible to get here during recovery.
                               9469                 :                :      */
                               9470   [ +  +  -  + ]:            186 :     if (!backup_started_in_recovery && !XLogIsNeeded())
 6878 tgl@sss.pgh.pa.us        9471         [ #  # ]:UBC           0 :         ereport(ERROR,
                               9472                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               9473                 :                :                  errmsg("WAL level not sufficient for making an online backup"),
                               9474                 :                :                  errhint("\"wal_level\" must be set to \"replica\" or \"logical\" at server start.")));
                               9475                 :                : 
 5655 heikki.linnakangas@i     9476         [ +  + ]:CBC         186 :     if (strlen(backupidstr) > MAXPGPATH)
                               9477         [ +  - ]:              1 :         ereport(ERROR,
                               9478                 :                :                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                               9479                 :                :                  errmsg("backup label too long (max %d bytes)",
                               9480                 :                :                         MAXPGPATH)));
                               9481                 :                : 
  754 dgustafsson@postgres     9482                 :            185 :     strlcpy(state->name, backupidstr, sizeof(state->name));
                               9483                 :                : 
                               9484                 :                :     /*
                               9485                 :                :      * Mark backup active in shared memory.  We must do full-page WAL writes
                               9486                 :                :      * during an on-line backup even if not doing so at other times, because
                               9487                 :                :      * it's quite possible for the backup dump to obtain a "torn" (partially
                               9488                 :                :      * written) copy of a database page if it reads the page concurrently with
                               9489                 :                :      * our write to the same page.  This can be fixed as long as the first
                               9490                 :                :      * write to the page in the WAL sequence is a full-page write. Hence, we
                               9491                 :                :      * increment runningBackups then force a CHECKPOINT, to ensure there are
                               9492                 :                :      * no dirty pages in shared memory that might get dumped while the backup
                               9493                 :                :      * is in progress without having a corresponding WAL record.  (Once the
                               9494                 :                :      * backup is complete, we need not force full-page writes anymore, since
                               9495                 :                :      * we expect that any pages not modified during the backup interval must
                               9496                 :                :      * have been correctly captured by the backup.)
                               9497                 :                :      *
                               9498                 :                :      * Note that forcing full-page writes has no effect during an online
                               9499                 :                :      * backup from the standby.
                               9500                 :                :      *
                               9501                 :                :      * We must hold all the insertion locks to change the value of
                               9502                 :                :      * runningBackups, to ensure adequate interlocking against
                               9503                 :                :      * XLogInsertRecord().
                               9504                 :                :      */
 4510 heikki.linnakangas@i     9505                 :            185 :     WALInsertLockAcquireExclusive();
 1572 sfrost@snowman.net       9506                 :            185 :     XLogCtl->Insert.runningBackups++;
 4510 heikki.linnakangas@i     9507                 :            185 :     WALInsertLockRelease();
                               9508                 :                : 
                               9509                 :                :     /*
                               9510                 :                :      * Ensure we decrement runningBackups if we fail below. NB -- for this to
                               9511                 :                :      * work correctly, it is critical that sessionBackupState is only updated
                               9512                 :                :      * after this block is over.
                               9513                 :                :      */
  355 peter@eisentraut.org     9514         [ +  - ]:            185 :     PG_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(true));
                               9515                 :                :     {
 5586 bruce@momjian.us         9516                 :            185 :         bool        gotUniqueStartpoint = false;
                               9517                 :                :         DIR        *tblspcdir;
                               9518                 :                :         struct dirent *de;
                               9519                 :                :         tablespaceinfo *ti;
                               9520                 :                :         int         datadirpathlen;
                               9521                 :                : 
                               9522                 :                :         /*
                               9523                 :                :          * Force an XLOG file switch before the checkpoint, to ensure that the
                               9524                 :                :          * WAL segment the checkpoint is written to doesn't contain pages with
                               9525                 :                :          * old timeline IDs.  That would otherwise happen if you called
                               9526                 :                :          * pg_backup_start() right after restoring from a PITR archive: the
                               9527                 :                :          * first WAL segment containing the startup checkpoint has pages in
                               9528                 :                :          * the beginning with the old timeline ID.  That can cause trouble at
                               9529                 :                :          * recovery: we won't have a history file covering the old timeline if
                               9530                 :                :          * the pg_wal directory was not included in the base backup and the WAL
                               9531                 :                :          * archive was cleared too before starting the backup.
                               9532                 :                :          *
                               9533                 :                :          * During recovery, we skip forcing XLOG file switch, which means that
                               9534                 :                :          * the backup taken during recovery is not available for the special
                               9535                 :                :          * recovery case described above.
                               9536                 :                :          */
 5296 simon@2ndQuadrant.co     9537         [ +  + ]:            185 :         if (!backup_started_in_recovery)
 3503 andres@anarazel.de       9538                 :            176 :             RequestXLogSwitch(false);
                               9539                 :                : 
                               9540                 :                :         do
                               9541                 :                :         {
                               9542                 :                :             bool        checkpointfpw;
                               9543                 :                : 
                               9544                 :                :             /*
                               9545                 :                :              * Force a CHECKPOINT.  Aside from being necessary to prevent torn
                               9546                 :                :              * page problems, this guarantees that two successive backup runs
                               9547                 :                :              * will have different checkpoint positions and hence different
                               9548                 :                :              * history file names, even if nothing happened in between.
                               9549                 :                :              *
                               9550                 :                :              * During recovery, establish a restartpoint if possible. We use
                               9551                 :                :              * the last restartpoint as the backup starting checkpoint. This
                               9552                 :                :              * means that two successive backup runs can have the same checkpoint
                               9553                 :                :              * positions.
                               9554                 :                :              *
                               9555                 :                :              * Since the fact that we are executing do_pg_backup_start()
                               9556                 :                :              * during recovery means that checkpointer is running, we can use
                               9557                 :                :              * RequestCheckpoint() to establish a restartpoint.
                               9558                 :                :              *
                               9559                 :                :              * We use CHECKPOINT_FAST only if requested by user (via passing
                               9560                 :                :              * fast = true).  Otherwise this can take a while.
                               9561                 :                :              */
 5606 heikki.linnakangas@i     9562         [ +  + ]:            185 :             RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT |
                               9563                 :                :                               (fast ? CHECKPOINT_FAST : 0));
                               9564                 :                : 
                               9565                 :                :             /*
                               9566                 :                :              * Now we need to fetch the checkpoint record location, and also
                               9567                 :                :              * its REDO pointer.  The oldest point in WAL that would be needed
                               9568                 :                :              * to restore starting from the checkpoint is precisely the REDO
                               9569                 :                :              * pointer.
                               9570                 :                :              */
                               9571                 :            185 :             LWLockAcquire(ControlFileLock, LW_SHARED);
 1399 michael@paquier.xyz      9572                 :            185 :             state->checkpointloc = ControlFile->checkPoint;
                               9573                 :            185 :             state->startpoint = ControlFile->checkPointCopy.redo;
                               9574                 :            185 :             state->starttli = ControlFile->checkPointCopy.ThisTimeLineID;
 5296 simon@2ndQuadrant.co     9575                 :            185 :             checkpointfpw = ControlFile->checkPointCopy.fullPageWrites;
 5606 heikki.linnakangas@i     9576                 :            185 :             LWLockRelease(ControlFileLock);
                               9577                 :                : 
 5296 simon@2ndQuadrant.co     9578         [ +  + ]:            185 :             if (backup_started_in_recovery)
                               9579                 :                :             {
                               9580                 :                :                 XLogRecPtr  recptr;
                               9581                 :                : 
                               9582                 :                :                 /*
                               9583                 :                :                  * Check to see if all WAL replayed during online backup
                               9584                 :                :                  * (i.e., since last restartpoint used as backup starting
                               9585                 :                :                  * checkpoint) contains full-page writes.
                               9586                 :                :                  */
 4325 andres@anarazel.de       9587                 :              9 :                 SpinLockAcquire(&XLogCtl->info_lck);
                               9588                 :              9 :                 recptr = XLogCtl->lastFpwDisableRecPtr;
                               9589                 :              9 :                 SpinLockRelease(&XLogCtl->info_lck);
                               9590                 :                : 
 1399 michael@paquier.xyz      9591   [ +  -  -  + ]:              9 :                 if (!checkpointfpw || state->startpoint <= recptr)
 5296 simon@2ndQuadrant.co     9592         [ #  # ]:UBC           0 :                     ereport(ERROR,
                               9593                 :                :                             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               9594                 :                :                              errmsg("WAL generated with \"full_page_writes=off\" was replayed "
                               9595                 :                :                                     "since last restartpoint"),
                               9596                 :                :                              errhint("This means that the backup being taken on the standby "
                               9597                 :                :                                      "is corrupt and should not be used. "
                               9598                 :                :                                      "Enable \"full_page_writes\" and run CHECKPOINT on the primary, "
                               9599                 :                :                                      "and then try an online backup again.")));
                               9600                 :                : 
                               9601                 :                :                 /*
                               9602                 :                :                  * During recovery, since we don't use the end-of-backup WAL
                               9603                 :                :                  * record and don't write the backup history file, the
                               9604                 :                :                  * starting WAL location doesn't need to be unique. This means
                               9605                 :                :                  * that two base backups started at the same time might use
                               9606                 :                :                  * the same checkpoint as starting locations.
                               9607                 :                :                  */
 5296 simon@2ndQuadrant.co     9608                 :CBC           9 :                 gotUniqueStartpoint = true;
                               9609                 :                :             }
                               9610                 :                : 
                               9611                 :                :             /*
                               9612                 :                :              * If two base backups are started at the same time (in WAL sender
                               9613                 :                :              * processes), we need to make sure that they use different
                               9614                 :                :              * checkpoints as starting locations, because we use the starting
                               9615                 :                :              * WAL location as a unique identifier for the base backup in the
                               9616                 :                :              * end-of-backup WAL record and when we write the backup history
                               9617                 :                :              * file. Perhaps it would be better to generate a separate unique ID
                               9618                 :                :              * for each backup instead of forcing another checkpoint, but
                               9619                 :                :              * taking a checkpoint right after another is not that expensive
                               9620                 :                :              * either because only a few buffers have been dirtied yet.
                               9621                 :                :              */
 4510 heikki.linnakangas@i     9622                 :            185 :             WALInsertLockAcquireExclusive();
 1399 michael@paquier.xyz      9623         [ +  - ]:            185 :             if (XLogCtl->Insert.lastBackupStart < state->startpoint)
                               9624                 :                :             {
                               9625                 :            185 :                 XLogCtl->Insert.lastBackupStart = state->startpoint;
 5606 heikki.linnakangas@i     9626                 :            185 :                 gotUniqueStartpoint = true;
                               9627                 :                :             }
 4510                          9628                 :            185 :             WALInsertLockRelease();
 5586 bruce@momjian.us         9629         [ -  + ]:            185 :         } while (!gotUniqueStartpoint);
                               9630                 :                : 
                               9631                 :                :         /*
                               9632                 :                :          * Construct tablespace_map file.
                               9633                 :                :          */
 4093 andrew@dunslane.net      9634                 :            185 :         datadirpathlen = strlen(DataDir);
                               9635                 :                : 
                               9636                 :                :         /* Collect information about all tablespaces */
  691 michael@paquier.xyz      9637                 :            185 :         tblspcdir = AllocateDir(PG_TBLSPC_DIR);
                               9638         [ +  + ]:            592 :         while ((de = ReadDir(tblspcdir, PG_TBLSPC_DIR)) != NULL)
                               9639                 :                :         {
                               9640                 :                :             char        fullpath[MAXPGPATH + sizeof(PG_TBLSPC_DIR)];
                               9641                 :                :             char        linkpath[MAXPGPATH];
 4093 andrew@dunslane.net      9642                 :            407 :             char       *relpath = NULL;
                               9643                 :                :             char       *s;
                               9644                 :                :             PGFileType  de_type;
                               9645                 :                :             char       *badp;
                               9646                 :                :             Oid         tsoid;
                               9647                 :                : 
                               9648                 :                :             /*
                               9649                 :                :              * Try to parse the directory name as an unsigned integer.
                               9650                 :                :              *
                               9651                 :                :              * Tablespace directories should be positive integers that can be
                               9652                 :                :              * represented in 32 bits, with no leading zeroes or trailing
                               9653                 :                :              * garbage. If we come across a name that doesn't meet those
                               9654                 :                :              * criteria, skip it.
                               9655                 :                :              */
 1007 rhaas@postgresql.org     9656   [ +  +  -  + ]:            407 :             if (de->d_name[0] < '1' || de->d_name[1] > '9')
                               9657                 :            370 :                 continue;
                               9658                 :             37 :             errno = 0;
                               9659                 :             37 :             tsoid = strtoul(de->d_name, &badp, 10);
                               9660   [ +  -  +  -  :             37 :             if (*badp != '\0' || errno == EINVAL || errno == ERANGE)
                                              -  + ]
 4093 andrew@dunslane.net      9661                 :UBC           0 :                 continue;
                               9662                 :                : 
  691 michael@paquier.xyz      9663                 :CBC          37 :             snprintf(fullpath, sizeof(fullpath), "%s/%s", PG_TBLSPC_DIR, de->d_name);
                               9664                 :                : 
 1195 rhaas@postgresql.org     9665                 :             37 :             de_type = get_dirent_type(fullpath, de, false, ERROR);
                               9666                 :                : 
                               9667         [ +  + ]:             37 :             if (de_type == PGFILETYPE_LNK)
                               9668                 :                :             {
                               9669                 :                :                 StringInfoData escapedpath;
                               9670                 :                :                 ssize_t     rllen;
                               9671                 :                : 
                               9672                 :             23 :                 rllen = readlink(fullpath, linkpath, sizeof(linkpath));
                               9673         [ -  + ]:             23 :                 if (rllen < 0)
                               9674                 :                :                 {
 1195 rhaas@postgresql.org     9675         [ #  # ]:UBC           0 :                     ereport(WARNING,
                               9676                 :                :                             (errmsg("could not read symbolic link \"%s\": %m",
                               9677                 :                :                                     fullpath)));
                               9678                 :              0 :                     continue;
                               9679                 :                :                 }
 1195 rhaas@postgresql.org     9680         [ -  + ]:CBC          23 :                 else if (rllen >= sizeof(linkpath))
                               9681                 :                :                 {
 1195 rhaas@postgresql.org     9682         [ #  # ]:UBC           0 :                     ereport(WARNING,
                               9683                 :                :                             (errmsg("symbolic link \"%s\" target is too long",
                               9684                 :                :                                     fullpath)));
                               9685                 :              0 :                     continue;
                               9686                 :                :                 }
 1195 rhaas@postgresql.org     9687                 :CBC          23 :                 linkpath[rllen] = '\0';
                               9688                 :                : 
                               9689                 :                :                 /*
                               9690                 :                :                  * Relpath holds the relative path of the tablespace directory
                               9691                 :                :                  * when it's located within PGDATA, or NULL if it's located
                               9692                 :                :                  * elsewhere.
                               9693                 :                :                  */
                               9694         [ +  + ]:             23 :                 if (rllen > datadirpathlen &&
                               9695         [ -  + ]:              1 :                     strncmp(linkpath, DataDir, datadirpathlen) == 0 &&
 1164 tgl@sss.pgh.pa.us        9696         [ #  # ]:UBC           0 :                     IS_DIR_SEP(linkpath[datadirpathlen]))
 1195 rhaas@postgresql.org     9697                 :              0 :                     relpath = pstrdup(linkpath + datadirpathlen + 1);
                               9698                 :                : 
                               9699                 :                :                 /*
                               9700                 :                :                  * Add a backslash-escaped version of the link path to the
                               9701                 :                :                  * tablespace map file.
                               9702                 :                :                  */
 1195 rhaas@postgresql.org     9703                 :CBC          23 :                 initStringInfo(&escapedpath);
                               9704         [ +  + ]:            562 :                 for (s = linkpath; *s; s++)
                               9705                 :                :                 {
                               9706   [ +  -  +  -  :            539 :                     if (*s == '\n' || *s == '\r' || *s == '\\')
                                              -  + ]
 1195 rhaas@postgresql.org     9707                 :UBC           0 :                         appendStringInfoChar(&escapedpath, '\\');
 1195 rhaas@postgresql.org     9708                 :CBC         539 :                     appendStringInfoChar(&escapedpath, *s);
                               9709                 :                :                 }
                               9710                 :             23 :                 appendStringInfo(tblspcmapfile, "%s %s\n",
                               9711                 :             23 :                                  de->d_name, escapedpath.data);
                               9712                 :             23 :                 pfree(escapedpath.data);
                               9713                 :                :             }
                               9714         [ +  - ]:             14 :             else if (de_type == PGFILETYPE_DIR)
                               9715                 :                :             {
                               9716                 :                :                 /*
                               9717                 :                :                  * It's possible to use allow_in_place_tablespaces to create
                               9718                 :                :                  * directories directly under pg_tblspc, for testing purposes
                               9719                 :                :                  * only.
                               9720                 :                :                  *
                               9721                 :                :                  * In this case, we store a relative path rather than an
                               9722                 :                :                  * absolute path into the tablespaceinfo.
                               9723                 :                :                  */
  691 michael@paquier.xyz      9724                 :             14 :                 snprintf(linkpath, sizeof(linkpath), "%s/%s",
                               9725                 :             14 :                          PG_TBLSPC_DIR, de->d_name);
 1195 rhaas@postgresql.org     9726                 :             14 :                 relpath = pstrdup(linkpath);
                               9727                 :                :             }
                               9728                 :                :             else
                               9729                 :                :             {
                               9730                 :                :                 /* Skip any other file type that appears here. */
 1195 rhaas@postgresql.org     9731                 :UBC           0 :                 continue;
                               9732                 :                :             }
                               9733                 :                : 
  228 michael@paquier.xyz      9734                 :CBC          37 :             ti = palloc_object(tablespaceinfo);
 1007 rhaas@postgresql.org     9735                 :             37 :             ti->oid = tsoid;
 1957 tgl@sss.pgh.pa.us        9736                 :             37 :             ti->path = pstrdup(linkpath);
 1195 rhaas@postgresql.org     9737                 :             37 :             ti->rpath = relpath;
 2230                          9738                 :             37 :             ti->size = -1;
                               9739                 :                : 
 4082 bruce@momjian.us         9740         [ +  - ]:             37 :             if (tablespaces)
                               9741                 :             37 :                 *tablespaces = lappend(*tablespaces, ti);
                               9742                 :                :         }
 3156 tgl@sss.pgh.pa.us        9743                 :            185 :         FreeDir(tblspcdir);
                               9744                 :                : 
 1399 michael@paquier.xyz      9745                 :            185 :         state->starttime = (pg_time_t) time(NULL);
                               9746                 :                :     }
  355 peter@eisentraut.org     9747         [ -  + ]:            185 :     PG_END_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(true));
                               9748                 :                : 
 1399 michael@paquier.xyz      9749                 :            185 :     state->started_in_recovery = backup_started_in_recovery;
                               9750                 :                : 
                               9751                 :                :     /*
                               9752                 :                :      * Mark that the start phase has correctly finished for the backup.
                               9753                 :                :      */
 1572 sfrost@snowman.net       9754                 :            185 :     sessionBackupState = SESSION_BACKUP_RUNNING;
 8027 tgl@sss.pgh.pa.us        9755                 :            185 : }
                               9756                 :                : 
                               9757                 :                : /*
                               9758                 :                :  * Utility routine to fetch the session-level status of a running backup.
                               9759                 :                :  */
                               9760                 :                : SessionBackupState
 3411 teodor@sigaev.ru         9761                 :            207 : get_backup_status(void)
                               9762                 :                : {
                               9763                 :            207 :     return sessionBackupState;
                               9764                 :                : }
                               9765                 :                : 
                               9766                 :                : /*
                               9767                 :                :  * do_pg_backup_stop
                               9768                 :                :  *
                               9769                 :                :  * Utility function called at the end of an online backup.  It creates a
                               9770                 :                :  * history file (if required), resets sessionBackupState and so on.  It can
                               9771                 :                :  * optionally wait for WAL segments to be archived.
                               9772                 :                :  *
                               9773                 :                :  * "state" is filled with the information necessary to restore from this
                               9774                 :                :  * backup with its stop LSN (stoppoint), its timeline ID (stoptli), etc.
                               9775                 :                :  *
                               9776                 :                :  * It is the responsibility of the caller of this function to verify the
                               9777                 :                :  * permissions of the calling user!
                               9778                 :                :  */
                               9779                 :                : void
 1399 michael@paquier.xyz      9780                 :            179 : do_pg_backup_stop(BackupState *state, bool waitforarchive)
                               9781                 :                : {
                               9782                 :            179 :     bool        backup_stopped_in_recovery = false;
                               9783                 :                :     char        histfilepath[MAXPGPATH];
                               9784                 :                :     char        lastxlogfilename[MAXFNAMELEN];
                               9785                 :                :     char        histfilename[MAXFNAMELEN];
                               9786                 :                :     XLogSegNo   _logSegNo;
                               9787                 :                :     FILE       *fp;
                               9788                 :                :     int         seconds_before_warning;
 6686 bruce@momjian.us         9789                 :            179 :     int         waits = 0;
 5943 simon@2ndQuadrant.co     9790                 :            179 :     bool        reported_waiting = false;
                               9791                 :                : 
 1399 michael@paquier.xyz      9792         [ -  + ]:            179 :     Assert(state != NULL);
                               9793                 :                : 
                               9794                 :            179 :     backup_stopped_in_recovery = RecoveryInProgress();
                               9795                 :                : 
                               9796                 :                :     /*
                               9797                 :                :      * During recovery, we don't need to check WAL level, because if WAL
                               9798                 :                :      * level is not sufficient, it's impossible to get here during recovery.
                               9799                 :                :      */
                               9800   [ +  +  -  + ]:            179 :     if (!backup_stopped_in_recovery && !XLogIsNeeded())
 6530 tgl@sss.pgh.pa.us        9801         [ #  # ]:UBC           0 :         ereport(ERROR,
                               9802                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               9803                 :                :                  errmsg("WAL level not sufficient for making an online backup"),
                               9804                 :                :                  errhint("\"wal_level\" must be set to \"replica\" or \"logical\" at server start.")));
                               9805                 :                : 
                               9806                 :                :     /*
                               9807                 :                :      * OK to update backup counter and session-level lock.
                               9808                 :                :      *
                               9809                 :                :      * Note that CHECK_FOR_INTERRUPTS() must not occur while updating them,
                               9810                 :                :      * otherwise they can be updated inconsistently, which might cause
                               9811                 :                :      * do_pg_abort_backup() to fail.
                               9812                 :                :      */
 3477 fujii@postgresql.org     9813                 :CBC         179 :     WALInsertLockAcquireExclusive();
                               9814                 :                : 
                               9815                 :                :     /*
                               9816                 :                :      * It is expected that each do_pg_backup_start() call is matched by
                               9817                 :                :      * exactly one do_pg_backup_stop() call.
                               9818                 :                :      */
 1572 sfrost@snowman.net       9819         [ -  + ]:            179 :     Assert(XLogCtl->Insert.runningBackups > 0);
                               9820                 :            179 :     XLogCtl->Insert.runningBackups--;
                               9821                 :                : 
                               9822                 :                :     /*
                               9823                 :                :      * Clean up session-level lock.
                               9824                 :                :      *
                               9825                 :                :      * You might think that WALInsertLockRelease() can be called before
                               9826                 :                :      * cleaning up session-level lock because session-level lock doesn't need
                               9827                 :                :      * to be protected with WAL insertion lock. But since
                               9828                 :                :      * CHECK_FOR_INTERRUPTS() can occur in it, session-level lock must be
                               9829                 :                :      * cleaned up before it.
                               9830                 :                :      */
 3411 teodor@sigaev.ru         9831                 :            179 :     sessionBackupState = SESSION_BACKUP_NONE;
                               9832                 :                : 
 3141 fujii@postgresql.org     9833                 :            179 :     WALInsertLockRelease();
                               9834                 :                : 
                               9835                 :                :     /*
                               9836                 :                :      * If we are taking an online backup from the standby, we confirm that the
                               9837                 :                :      * standby has not been promoted during the backup.
                               9838                 :                :      */
 1399 michael@paquier.xyz      9839   [ +  +  -  + ]:            179 :     if (state->started_in_recovery && !backup_stopped_in_recovery)
 5296 simon@2ndQuadrant.co     9840         [ #  # ]:UBC           0 :         ereport(ERROR,
                               9841                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               9842                 :                :                  errmsg("the standby was promoted during online backup"),
                               9843                 :                :                  errhint("This means that the backup being taken is corrupt "
                               9844                 :                :                          "and should not be used. "
                               9845                 :                :                          "Try taking another online backup.")));
                               9846                 :                : 
                               9847                 :                :     /*
                               9848                 :                :      * During recovery, we don't write an end-of-backup record. We assume that
                               9849                 :                :      * pg_control was backed up last and its minimum recovery point can be
                               9850                 :                :      * available as the backup end location. Since we don't have an
                               9851                 :                :      * end-of-backup record, we use the pg_control value to check whether
                               9852                 :                :      * we've reached the end of backup when starting recovery from this
                               9853                 :                :      * backup. We have no way of checking if pg_control wasn't backed up last
                               9854                 :                :      * however.
                               9855                 :                :      *
                               9856                 :                :      * We don't force a switch to new WAL file but it is still possible to
                               9857                 :                :      * wait for all the required files to be archived if waitforarchive is
                               9858                 :                :      * true. This is okay if we use the backup to start a standby and fetch
                               9859                 :                :      * the missing WAL using streaming replication. But in the case of an
                               9860                 :                :      * archive recovery, a user should set waitforarchive to true and wait for
                               9861                 :                :      * them to be archived to ensure that all the required files are
                               9862                 :                :      * available.
                               9863                 :                :      *
                               9864                 :                :      * We return the current minimum recovery point as the backup end
                               9865                 :                :      * location. Note that it can be greater than the exact backup end
                               9866                 :                :      * location if the minimum recovery point is updated after the backup of
                               9867                 :                :      * pg_control. This is harmless for current uses.
                               9868                 :                :      *
                               9869                 :                :      * XXX currently a backup history file is for informational and debug
                               9870                 :                :      * purposes only. It's not essential for an online backup. Furthermore,
                               9871                 :                :      * even if it's created, it will not be archived during recovery because
                               9872                 :                :      * an archiver is not invoked. So it doesn't seem worthwhile to write a
                               9873                 :                :      * backup history file during recovery.
                               9874                 :                :      */
 1399 michael@paquier.xyz      9875         [ +  + ]:CBC         179 :     if (backup_stopped_in_recovery)
                               9876                 :                :     {
                               9877                 :                :         XLogRecPtr  recptr;
                               9878                 :                : 
                               9879                 :                :         /*
                               9880                 :                :          * Check to see if all WAL replayed during online backup contain
                               9881                 :                :          * full-page writes.
                               9882                 :                :          */
 4325 andres@anarazel.de       9883                 :              9 :         SpinLockAcquire(&XLogCtl->info_lck);
                               9884                 :              9 :         recptr = XLogCtl->lastFpwDisableRecPtr;
                               9885                 :              9 :         SpinLockRelease(&XLogCtl->info_lck);
                               9886                 :                : 
 1399 michael@paquier.xyz      9887         [ -  + ]:              9 :         if (state->startpoint <= recptr)
 5296 simon@2ndQuadrant.co     9888         [ #  # ]:UBC           0 :             ereport(ERROR,
                               9889                 :                :                     (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               9890                 :                :                      errmsg("WAL generated with \"full_page_writes=off\" was replayed "
                               9891                 :                :                             "during online backup"),
                               9892                 :                :                      errhint("This means that the backup being taken on the standby "
                               9893                 :                :                              "is corrupt and should not be used. "
                               9894                 :                :                              "Enable \"full_page_writes\" and run CHECKPOINT on the primary, "
                               9895                 :                :                              "and then try an online backup again.")));
                               9896                 :                : 
                               9897                 :                : 
 5296 simon@2ndQuadrant.co     9898                 :CBC           9 :         LWLockAcquire(ControlFileLock, LW_SHARED);
 1399 michael@paquier.xyz      9899                 :              9 :         state->stoppoint = ControlFile->minRecoveryPoint;
                               9900                 :              9 :         state->stoptli = ControlFile->minRecoveryPointTLI;
 5296 simon@2ndQuadrant.co     9901                 :              9 :         LWLockRelease(ControlFileLock);
                               9902                 :                :     }
                               9903                 :                :     else
                               9904                 :                :     {
                               9905                 :                :         char       *history_file;
                               9906                 :                : 
                               9907                 :                :         /*
                               9908                 :                :          * Write the backup-end xlog record
                               9909                 :                :          */
 3277 rhaas@postgresql.org     9910                 :            170 :         XLogBeginInsert();
  530 peter@eisentraut.org     9911                 :            170 :         XLogRegisterData(&state->startpoint,
                               9912                 :                :                          sizeof(state->startpoint));
 1399 michael@paquier.xyz      9913                 :            170 :         state->stoppoint = XLogInsert(RM_XLOG_ID, XLOG_BACKUP_END);
                               9914                 :                : 
                               9915                 :                :         /*
                               9916                 :                :          * Given that we're not in recovery, InsertTimeLineID is set and can't
                               9917                 :                :          * change, so we can read it without a lock.
                               9918                 :                :          */
                               9919                 :            170 :         state->stoptli = XLogCtl->InsertTimeLineID;
                               9920                 :                : 
                               9921                 :                :         /*
                               9922                 :                :          * Force a switch to a new xlog segment file, so that the backup is
                               9923                 :                :          * valid as soon as the archiver moves out the current segment file.
                               9924                 :                :          */
 3277 rhaas@postgresql.org     9925                 :            170 :         RequestXLogSwitch(false);
                               9926                 :                : 
 1399 michael@paquier.xyz      9927                 :            170 :         state->stoptime = (pg_time_t) time(NULL);
                               9928                 :                : 
                               9929                 :                :         /*
                               9930                 :                :          * Write the backup history file
                               9931                 :                :          */
                               9932                 :            170 :         XLByteToSeg(state->startpoint, _logSegNo, wal_segment_size);
                               9933                 :            170 :         BackupHistoryFilePath(histfilepath, state->stoptli, _logSegNo,
                               9934                 :                :                               state->startpoint, wal_segment_size);
 3277 rhaas@postgresql.org     9935                 :            170 :         fp = AllocateFile(histfilepath, "w");
                               9936         [ -  + ]:            170 :         if (!fp)
 3277 rhaas@postgresql.org     9937         [ #  # ]:UBC           0 :             ereport(ERROR,
                               9938                 :                :                     (errcode_for_file_access(),
                               9939                 :                :                      errmsg("could not create file \"%s\": %m",
                               9940                 :                :                             histfilepath)));
                               9941                 :                : 
                               9942                 :                :         /* Build and save the contents of the backup history file */
 1399 michael@paquier.xyz      9943                 :CBC         170 :         history_file = build_backup_content(state, true);
 1398                          9944                 :            170 :         fprintf(fp, "%s", history_file);
 1399                          9945                 :            170 :         pfree(history_file);
                               9946                 :                : 
 3277 rhaas@postgresql.org     9947   [ +  -  +  -  :            170 :         if (fflush(fp) || ferror(fp) || FreeFile(fp))
                                              -  + ]
 3277 rhaas@postgresql.org     9948         [ #  # ]:UBC           0 :             ereport(ERROR,
                               9949                 :                :                     (errcode_for_file_access(),
                               9950                 :                :                      errmsg("could not write file \"%s\": %m",
                               9951                 :                :                             histfilepath)));
                               9952                 :                : 
                               9953                 :                :         /*
                               9954                 :                :          * Clean out any no-longer-needed history files.  As a side effect,
                               9955                 :                :          * this will post a .ready file for the newly created history file,
                               9956                 :                :          * notifying the archiver that the history file may be archived
                               9957                 :                :          * immediately.
                               9958                 :                :          */
 3277 rhaas@postgresql.org     9959                 :CBC         170 :         CleanupBackupHistory();
                               9960                 :                :     }
                               9961                 :                : 
                               9962                 :                :     /*
                               9963                 :                :      * If archiving is enabled, wait for all the required WAL files to be
                               9964                 :                :      * archived before returning. If archiving isn't enabled, the required WAL
                               9965                 :                :      * needs to be transported via streaming replication (hopefully with
                               9966                 :                :      * wal_keep_size set high enough), or some more exotic mechanism like
                               9967                 :                :      * polling and copying files from pg_wal with a script. We have no knowledge
                               9968                 :                :      * of those mechanisms, so it's up to the user to ensure that they get all
                               9969                 :                :      * the required WAL.
                               9970                 :                :      *
                               9971                 :                :      * We wait until both the last WAL file filled during backup and the
                               9972                 :                :      * history file have been archived, and assume that the alphabetic sorting
                               9973                 :                :      * property of the WAL files ensures any earlier WAL files are safely
                               9974                 :                :      * archived as well.
                               9975                 :                :      *
                               9976                 :                :      * We wait forever, since archive_command is supposed to work and we
                               9977                 :                :      * assume the admin wanted their backup to work completely. If you don't
                               9978                 :                :      * wish to wait, then either waitforarchive should be passed in as false,
                               9979                 :                :      * or you can set statement_timeout.  Also, some notices are issued to
                               9980                 :                :      * clue in anyone who might be doing this interactively.
                               9981                 :                :      */
                               9982                 :                : 
                               9983         [ +  + ]:            179 :     if (waitforarchive &&
 1399 michael@paquier.xyz      9984   [ +  +  +  +  :             11 :         ((!backup_stopped_in_recovery && XLogArchivingActive()) ||
                                     -  +  +  +  +  
                                                 + ]
                               9985   [ +  -  -  +  :              1 :          (backup_stopped_in_recovery && XLogArchivingAlways())))
                                              -  + ]
                               9986                 :                :     {
                               9987                 :              5 :         XLByteToPrevSeg(state->stoppoint, _logSegNo, wal_segment_size);
                               9988                 :              5 :         XLogFileName(lastxlogfilename, state->stoptli, _logSegNo,
                               9989                 :                :                      wal_segment_size);
                               9990                 :                : 
                               9991                 :              5 :         XLByteToSeg(state->startpoint, _logSegNo, wal_segment_size);
                               9992                 :              5 :         BackupHistoryFileName(histfilename, state->stoptli, _logSegNo,
                               9993                 :                :                               state->startpoint, wal_segment_size);
                               9994                 :                : 
 5864 bruce@momjian.us         9995                 :              5 :         seconds_before_warning = 60;
                               9996                 :              5 :         waits = 0;
                               9997                 :                : 
                               9998   [ +  +  -  + ]:             15 :         while (XLogArchiveIsBusy(lastxlogfilename) ||
                               9999                 :              5 :                XLogArchiveIsBusy(histfilename))
                              10000                 :                :         {
                              10001         [ -  + ]:              5 :             CHECK_FOR_INTERRUPTS();
                              10002                 :                : 
                              10003   [ +  -  -  + ]:              5 :             if (!reported_waiting && waits > 5)
                              10004                 :                :             {
 5864 bruce@momjian.us        10005         [ #  # ]:UBC           0 :                 ereport(NOTICE,
                              10006                 :                :                         (errmsg("base backup done, waiting for required WAL segments to be archived")));
                              10007                 :              0 :                 reported_waiting = true;
                              10008                 :                :             }
                              10009                 :                : 
 1846 michael@paquier.xyz     10010                 :CBC           5 :             (void) WaitLatch(MyLatch,
                              10011                 :                :                              WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                              10012                 :                :                              1000L,
                              10013                 :                :                              WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE);
                              10014                 :              5 :             ResetLatch(MyLatch);
                              10015                 :                : 
 5864 bruce@momjian.us        10016         [ -  + ]:              5 :             if (++waits >= seconds_before_warning)
                              10017                 :                :             {
 5864 bruce@momjian.us        10018                 :UBC           0 :                 seconds_before_warning *= 2;    /* This wraps in >10 years... */
                              10019         [ #  # ]:              0 :                 ereport(WARNING,
                              10020                 :                :                         (errmsg("still waiting for all required WAL segments to be archived (%d seconds elapsed)",
                              10021                 :                :                                 waits),
                              10022                 :                :                          errhint("Check that your \"archive_command\" is executing properly.  "
                              10023                 :                :                                  "You can safely cancel this backup, "
                              10024                 :                :                                  "but the database backup will not be usable without all the WAL segments.")));
                              10025                 :                :             }
                              10026                 :                :         }
                              10027                 :                : 
 5864 bruce@momjian.us        10028         [ +  + ]:CBC           5 :         ereport(NOTICE,
                              10029                 :                :                 (errmsg("all required WAL segments have been archived")));
                              10030                 :                :     }
 5646 magnus@hagander.net     10031         [ +  + ]:            174 :     else if (waitforarchive)
 5932 tgl@sss.pgh.pa.us       10032         [ +  - ]:              6 :         ereport(NOTICE,
                              10033                 :                :                 (errmsg("WAL archiving is not enabled; you must ensure that all required WAL segments are copied through other means to complete the backup")));
 5677 magnus@hagander.net     10034                 :            179 : }
                              10035                 :                : 
                              10036                 :                : 
                              10037                 :                : /*
                              10038                 :                :  * do_pg_abort_backup: abort a running backup
                              10039                 :                :  *
                              10040                 :                :  * This does just the most basic steps of do_pg_backup_stop(), by taking the
                              10041                 :                :  * system out of backup mode, thus making it a lot more safe to call from
                              10042                 :                :  * an error handler.
                              10043                 :                :  *
                              10044                 :                :  * 'arg' is true when this is called during backup setup; in that case
                              10045                 :                :  * sessionBackupState has not been modified yet, but runningBackups has
                              10046                 :                :  * already been incremented.  When it's false, then it's invoked as a
                              10047                 :                :  * before_shmem_exit handler, and therefore we must not change state
                              10048                 :                :  * unless sessionBackupState indicates that a backup is actually running.
                              10049                 :                :  *
                              10050                 :                :  * NB: This gets used as a PG_ENSURE_ERROR_CLEANUP callback and
                              10051                 :                :  * before_shmem_exit handler, hence the odd-looking signature.
                              10052                 :                :  */
                              10053                 :                : void
 2411 rhaas@postgresql.org    10054                 :              9 : do_pg_abort_backup(int code, Datum arg)
                              10055                 :                : {
 1376 alvherre@alvh.no-ip.    10056                 :              9 :     bool        during_backup_start = DatumGetBool(arg);
                              10057                 :                : 
                              10058                 :                :     /* If called during backup start, there shouldn't be one already running */
 1371                         10059   [ -  +  -  - ]:              9 :     Assert(!during_backup_start || sessionBackupState == SESSION_BACKUP_NONE);
                              10060                 :                : 
 1376                         10061   [ +  -  +  + ]:              9 :     if (during_backup_start || sessionBackupState != SESSION_BACKUP_NONE)
                              10062                 :                :     {
                              10063                 :              6 :         WALInsertLockAcquireExclusive();
                              10064         [ -  + ]:              6 :         Assert(XLogCtl->Insert.runningBackups > 0);
                              10065                 :              6 :         XLogCtl->Insert.runningBackups--;
                              10066                 :                : 
                              10067                 :              6 :         sessionBackupState = SESSION_BACKUP_NONE;
                              10068                 :              6 :         WALInsertLockRelease();
                              10069                 :                : 
                              10070         [ +  - ]:              6 :         if (!during_backup_start)
                              10071         [ +  - ]:              6 :             ereport(WARNING,
                              10072                 :                :                     errmsg("aborting backup due to backend exiting before pg_backup_stop was called"));
                              10073                 :                :     }
 2411 rhaas@postgresql.org    10074                 :              9 : }
                              10075                 :                : 
                              10076                 :                : /*
                              10077                 :                :  * Register a handler that will warn about unterminated backups at end of
                              10078                 :                :  * session, unless this has already been done.
                              10079                 :                :  */
                              10080                 :                : void
                              10081                 :              5 : register_persistent_abort_backup_handler(void)
                              10082                 :                : {
                              10083                 :                :     static bool already_done = false;
                              10084                 :                : 
                              10085         [ +  + ]:              5 :     if (already_done)
                              10086                 :              1 :         return;
  355 peter@eisentraut.org    10087                 :              4 :     before_shmem_exit(do_pg_abort_backup, BoolGetDatum(false));
 2411 rhaas@postgresql.org    10088                 :              4 :     already_done = true;
                              10089                 :                : }
                              10090                 :                : 
                              10091                 :                : /*
                              10092                 :                :  * Get latest WAL insert pointer
                              10093                 :                :  */
                              10094                 :                : XLogRecPtr
 5310 heikki.linnakangas@i    10095                 :           2142 : GetXLogInsertRecPtr(void)
                              10096                 :                : {
 4325 andres@anarazel.de      10097                 :           2142 :     XLogCtlInsert *Insert = &XLogCtl->Insert;
                              10098                 :                :     uint64      current_bytepos;
                              10099                 :                : 
 4766 heikki.linnakangas@i    10100                 :           2142 :     SpinLockAcquire(&Insert->insertpos_lck);
                              10101                 :           2142 :     current_bytepos = Insert->CurrBytePos;
                              10102                 :           2142 :     SpinLockRelease(&Insert->insertpos_lck);
                              10103                 :                : 
                              10104                 :           2142 :     return XLogBytePosToRecPtr(current_bytepos);
                              10105                 :                : }
                              10106                 :                : 
                              10107                 :                : /*
                              10108                 :                :  * Get latest WAL record end pointer
                              10109                 :                :  */
                              10110                 :                : XLogRecPtr
  135 tomas.vondra@postgre    10111                 :           1563 : GetXLogInsertEndRecPtr(void)
                              10112                 :                : {
                              10113                 :           1563 :     XLogCtlInsert *Insert = &XLogCtl->Insert;
                              10114                 :                :     uint64      current_bytepos;
                              10115                 :                : 
                              10116                 :           1563 :     SpinLockAcquire(&Insert->insertpos_lck);
                              10117                 :           1563 :     current_bytepos = Insert->CurrBytePos;
                              10118                 :           1563 :     SpinLockRelease(&Insert->insertpos_lck);
                              10119                 :                : 
                              10120                 :           1563 :     return XLogBytePosToEndRecPtr(current_bytepos);
                              10121                 :                : }
                              10122                 :                : 
                              10123                 :                : /*
                              10124                 :                :  * Get latest WAL write pointer
                              10125                 :                :  */
                              10126                 :                : XLogRecPtr
 1621 heikki.linnakangas@i    10127                 :           6833 : GetXLogWriteRecPtr(void)
                              10128                 :                : {
  844 alvherre@alvh.no-ip.    10129                 :           6833 :     RefreshXLogWriteResult(LogwrtResult);
                              10130                 :                : 
 1621 heikki.linnakangas@i    10131                 :           6833 :     return LogwrtResult.Write;
                              10132                 :                : }
                              10133                 :                : 
                              10134                 :                : /*
                              10135                 :                :  * Returns the redo pointer of the last checkpoint or restartpoint. This is
                              10136                 :                :  * the oldest point in WAL that we still need, if we have to restart recovery.
                              10137                 :                :  */
                              10138                 :                : void
                              10139                 :            409 : GetOldestRestartPoint(XLogRecPtr *oldrecptr, TimeLineID *oldtli)
                              10140                 :                : {
                              10141                 :            409 :     LWLockAcquire(ControlFileLock, LW_SHARED);
                              10142                 :            409 :     *oldrecptr = ControlFile->checkPointCopy.redo;
                              10143                 :            409 :     *oldtli = ControlFile->checkPointCopy.ThisTimeLineID;
                              10144                 :            409 :     LWLockRelease(ControlFileLock);
 7429 tgl@sss.pgh.pa.us       10145                 :            409 : }
                              10146                 :                : 
                              10147                 :                : /* Thin wrapper around ShutdownWalRcv(). */
                              10148                 :                : void
 1854 noah@leadboat.com       10149                 :           1059 : XLogShutdownWalRcv(void)
                              10150                 :                : {
  263 michael@paquier.xyz     10151   [ +  +  -  + ]:           1059 :     Assert(AmStartupProcess() || !IsUnderPostmaster);
                              10152                 :                : 
 1854 noah@leadboat.com       10153                 :           1059 :     ShutdownWalRcv();
  264 michael@paquier.xyz     10154                 :           1059 :     ResetInstallXLogFileSegmentActive();
 1854 noah@leadboat.com       10155                 :           1059 : }
                              10156                 :                : 
                              10157                 :                : /* Enable WAL file recycling and preallocation. */
                              10158                 :                : void
 1621 heikki.linnakangas@i    10159                 :           1264 : SetInstallXLogFileSegmentActive(void)
                              10160                 :                : {
                              10161                 :           1264 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                              10162                 :           1264 :     XLogCtl->InstallXLogFileSegmentActive = true;
                              10163                 :           1264 :     LWLockRelease(ControlFileLock);
 3973 fujii@postgresql.org    10164                 :           1264 : }
                              10165                 :                : 
                              10166                 :                : /* Disable WAL file recycling and preallocation. */
                              10167                 :                : void
  264 michael@paquier.xyz     10168                 :           1235 : ResetInstallXLogFileSegmentActive(void)
                              10169                 :                : {
                              10170                 :           1235 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                              10171                 :           1235 :     XLogCtl->InstallXLogFileSegmentActive = false;
                              10172                 :           1235 :     LWLockRelease(ControlFileLock);
                              10173                 :           1235 : }
                              10174                 :                : 
                              10175                 :                : bool
 1621 heikki.linnakangas@i    10176                 :            365 : IsInstallXLogFileSegmentActive(void)
                              10177                 :                : {
                              10178                 :                :     bool        result;
                              10179                 :                : 
                              10180                 :            365 :     LWLockAcquire(ControlFileLock, LW_SHARED);
                              10181                 :            365 :     result = XLogCtl->InstallXLogFileSegmentActive;
                              10182                 :            365 :     LWLockRelease(ControlFileLock);
                              10183                 :                : 
                              10184                 :            365 :     return result;
                              10185                 :                : }
                              10186                 :                : 
                              10187                 :                : /*
                              10188                 :                :  * Update the WalWriterSleeping flag.
                              10189                 :                :  */
                              10190                 :                : void
 5192 tgl@sss.pgh.pa.us       10191                 :            600 : SetWalWriterSleeping(bool sleeping)
                              10192                 :                : {
 4325 andres@anarazel.de      10193                 :            600 :     SpinLockAcquire(&XLogCtl->info_lck);
                              10194                 :            600 :     XLogCtl->WalWriterSleeping = sleeping;
                              10195                 :            600 :     SpinLockRelease(&XLogCtl->info_lck);
 5192 tgl@sss.pgh.pa.us       10196                 :            600 : }

Generated by: LCOV version 2.0-1