Considerations for replication.


Kevin Toppenberg

Feb 12, 2026, 3:40:06 PM
to Everything MUMPS
I have gotten through the replication tutorial in the Acculturation guide.

For convenience, I added a script, "originating_start", like this. Each server has to have its own custom version, pointing to the other two servers.

ydbuser@paris:~/jnlex$ cat ./originating_start
#!/bin/bash
source ~/jnlex/jnlex_env
host="192.168.3.57"
echo "=================================================="
echo "Starting paris as a primary, originating server"
echo "Sending replication to santiago on $host:4000"
echo "Sending replication to melbourn on $host:6000"
echo "=================================================="
$ydb_dist/mupip replicate -source -start -instsecondary=santiago -secondary=$host:4000 -buffsize=1048576 -log=/home/ydbuser/jnlex/santiago_`date +%Y%m%d:%H:%M:%S`.log
$ydb_dist/mupip replicate -source -start -instsecondary=melbourne -secondary=$host:6000 -buffsize=1048576 -log=/home/ydbuser/jnlex/melbourne_`date +%Y%m%d:%H:%M:%S`.log
echo "=================================================="
ydbuser@paris:~/jnlex$ 


I also added a script for the rollback -fetchresync step, like this:

#!/bin/bash
source ~/jnlex/jnlex_env
filename="/home/ydbuser/jnlex/Unreplic_Trans_Report_`date +%Y%m%d%H%M%S`.txt"
echo "================================================"
echo "Rolling back any transactions needed to prepare"
echo "  for becoming a replicating (secondary) server"
echo "  Any rolled-back entries will be stored in:"
echo "  $filename"
$ydb_dist/mupip journal -rollback -backward -fetchresync=3000 -losttrans="$filename" "*"
echo "================================================"

Now that I am done, it seems to me that when we consider all the various states in the system of servers, each server ends up with a state machine something like this:

[attached image: replicating_state_machine.png]


And of course all the other servers in the replication system would have their own states, which should ideally stay complementary to each other. I would like to make a supervising program that keeps contact with the various machines and coordinates them all moving to their appropriate states. YottaDB doesn't do this automatically, presumably because it could get messy fast. But I am planning just 2 servers, so it should be doable.
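As a rough illustration of what such a supervisor might look like, here is a minimal sketch in bash. Everything here is hypothetical: probe_role is a stub standing in for whatever actually queries a host (e.g., an ssh call to a status script on that machine); only the coordination logic is shown.

```shell
#!/usr/bin/env bash
# Hypothetical supervisor sketch -- NOT a working implementation.
# probe_role is a stub; in real use it would query host "$1" remotely
# (for example via ssh) and report that host's replication role.

probe_role() {
  # Stub: replace with a real remote query of host "$1"
  echo "UNKNOWN"
}

check_cluster() {
  # In a healthy 2-server setup, exactly one host should be the
  # originating (PRIMARY) instance and the other the replicating
  # (SECONDARY) instance.
  local r1 r2
  r1="$(probe_role "$1")"
  r2="$(probe_role "$2")"
  if { [ "$r1" = "PRIMARY" ] && [ "$r2" = "SECONDARY" ]; } ||
     { [ "$r1" = "SECONDARY" ] && [ "$r2" = "PRIMARY" ]; }; then
    echo "OK"
  else
    echo "NEEDS_ATTENTION: $1=$r1 $2=$r2"
  fi
}
```

A cron job or systemd timer could run check_cluster periodically and alert someone (or trigger a failover script) whenever the roles are not complementary.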

And that gets me to my questions:
-- How best for a script to determine the current state of a given server? I could run mupip replicate -source -checkhealth and mupip replicate -receive -checkhealth and parse out the text replies, but that seems fragile. Alternatively, I could create a .txt file on each server recording its last known state; after a crash, a startup script could act on and update that file.
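If you go the state-file route, it's worth writing the file atomically (write a temp file, then rename) so a crash mid-write can't leave a half-written state behind. A minimal sketch, with a hypothetical file path and free-form state names:

```shell
#!/usr/bin/env bash
# Sketch of the state-file idea (hypothetical path and state names).
# Writing via a temp file plus rename means a crash mid-write can never
# leave a torn state file behind.

STATE_FILE="${STATE_FILE:-$HOME/jnlex/replication_state.txt}"

set_state() {
  local tmp
  tmp="$(mktemp "${STATE_FILE}.XXXXXX")"
  printf '%s\n' "$1" > "$tmp"
  mv -f "$tmp" "$STATE_FILE"   # rename is atomic on the same filesystem
}

get_state() {
  if [[ -r "$STATE_FILE" ]]; then
    head -n 1 "$STATE_FILE"
  else
    echo "UNKNOWN"             # never written, or unreadable
  fi
}
```

Because mv within one filesystem is an atomic rename, a reader always sees either the old state or the new one, never a partial write.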

Any other thoughts on coordinating and managing the servers?

Thanks in advance,
KT

Kevin Toppenberg

Feb 12, 2026, 6:11:52 PM
to Everything MUMPS
After working with ChatGPT, I have this script, which ingests mupip output and produces JSON.

This probes a server with: mupip replicate -receive -checkhealth

#!/usr/bin/env bash
#
# probe_replicating_health  (Probe B: receiver/replicating health)
#
# JSON CONTRACT (API v1)
#
# {
#   "receiving_configured": boolean,
#   "receiving_active": boolean,
#   "origin_peer": "<primary instance name>" | null,
#   "receiver": {
#       "pid": integer,
#       "alive": boolean,
#       "mode": "ACTIVE" | "PASSIVE"    # omitted when mupip reports no mode
#   } | null,
#   "update_process": {
#       "pid": integer,
#       "alive": boolean
#   } | null,
#   "reason": "<YDB error code>"            # present only if not configured
# }
#
# receiving_active:
#   true  -> receiver is alive and (mode==ACTIVE OR mode==UNKNOWN)
#   false -> receiver not alive, or error, or not configured
#
# Exit Codes:
#   0 -> successful probe execution (regardless of receiving state)
#   1 -> unexpected output / parse anomaly
#   2 -> mupip not found / environment failure
#

set -euo pipefail

# Load YottaDB environment if available
ENV_FILE="${ENV_FILE:-$HOME/jnlex/jnlex_env}"
if [[ -r "$ENV_FILE" ]]; then
  # shellcheck disable=SC1090
  source "$ENV_FILE"
fi

# Resolve mupip path
MUPIP=""
if [[ -n "${ydb_dist:-}" && -x "${ydb_dist}/mupip" ]]; then
  MUPIP="${ydb_dist}/mupip"
elif command -v mupip >/dev/null 2>&1; then
  MUPIP="$(command -v mupip)"
fi

if [[ -z "$MUPIP" ]]; then
  echo '{"receiving_configured":false,"receiving_active":false,"origin_peer":null,"receiver":null,"update_process":null,"reason":"MUPIP_NOT_FOUND"}'
  exit 2
fi

out="$("$MUPIP" replicate -receive -checkhealth 2>&1 || true)"

# Any %YDB-E-... => not configured/available for probe purposes
errcode="$(echo "$out" | sed -n 's/^%YDB-E-\([A-Z0-9]\+\),.*/\1/p' | head -n 1)"
if [[ -n "$errcode" ]]; then
  echo "{\"receiving_configured\":false,\"receiving_active\":false,\"origin_peer\":null,\"receiver\":null,\"update_process\":null,\"reason\":\"$errcode\"}"
  exit 0
fi

# Parse receiver PID, optional mode, optional origin peer, and update-process PID
parsed="$(
  echo "$out" | awk '
    BEGIN {
      rpid=""; rmode="UNKNOWN"; ralive=0;
      upid=""; ualive=0;
      origin="";
    }

    /primary instance name \[/ {
      t=$0
      sub(/^.*primary instance name \[/, "", t)
      sub(/\].*$/, "", t)
      origin=t
      next
    }

    /^PID[[:space:]]+[0-9]+[[:space:]]+Receiver server is alive/ {
      t=$0
      sub(/^PID[[:space:]]+/, "", t)
      sub(/[[:space:]].*$/, "", t)
      rpid=t
      ralive=1

      # If mode appears as "in ACTIVE mode", capture it; else leave UNKNOWN
      if ($0 ~ / in [A-Z]+ mode/) {
        t=$0
        sub(/^.* in /, "", t)
        sub(/ mode.*$/, "", t)
        rmode=t
      }
      next
    }

    /^PID[[:space:]]+[0-9]+[[:space:]]+Update process is alive/ {
      t=$0
      sub(/^PID[[:space:]]+/, "", t)
      sub(/[[:space:]].*$/, "", t)
      upid=t
      ualive=1
      next
    }

    END {
      # Emit KEY=VALUE lines for bash to consume
      print "ORIGIN=" origin
      print "RPID=" rpid
      print "RMODE=" rmode
      print "RALIVE=" ralive
      print "UPID=" upid
      print "UALIVE=" ualive
    }
  '
)"

origin_peer="$(echo "$parsed" | sed -n 's/^ORIGIN=//p' | head -n 1)"
rpid="$(echo "$parsed" | sed -n 's/^RPID=//p' | head -n 1)"
rmode="$(echo "$parsed" | sed -n 's/^RMODE=//p' | head -n 1)"
ralive="$(echo "$parsed" | sed -n 's/^RALIVE=//p' | head -n 1)"
upid="$(echo "$parsed" | sed -n 's/^UPID=//p' | head -n 1)"
ualive="$(echo "$parsed" | sed -n 's/^UALIVE=//p' | head -n 1)"

# Normalize origin_peer to null if blank
if [[ -z "$origin_peer" ]]; then
  origin_json="null"
else
  origin_esc="$(printf '%s' "$origin_peer" | sed 's/"/\\"/g')"
  origin_json="\"$origin_esc\""
fi

# Build receiver JSON
if [[ -n "$rpid" && "$ralive" == "1" ]]; then

  # Only include mode if YDB actually reported one
  if [[ "$rmode" == "UNKNOWN" || -z "$rmode" ]]; then
    receiver_json="{\"pid\":$rpid,\"alive\":true}"
    receiving_active="true"
  else
    receiver_json="{\"pid\":$rpid,\"alive\":true,\"mode\":\"$rmode\"}"

    if [[ "$rmode" == "PASSIVE" ]]; then
      receiving_active="false"
    else
      receiving_active="true"
    fi
  fi

else
  receiver_json="null"
  receiving_active="false"
fi

# Build update process JSON (optional)
if [[ -n "$upid" && "$ualive" == "1" ]]; then
  update_json="{\"pid\":$upid,\"alive\":true}"
else
  update_json="null"
fi

# If we parsed nothing at all, treat as anomaly
if [[ "$receiver_json" == "null" && "$update_json" == "null" ]]; then
  sample="$(echo "$out" | head -n 10 | tr -d '\r' | sed 's/"/\\"/g')"
  echo "{\"receiving_configured\":true,\"receiving_active\":false,\"origin_peer\":$origin_json,\"receiver\":null,\"update_process\":null,\"note\":\"no receiver/update lines parsed\",\"sample\":\"$sample\"}"
  exit 1
fi

echo "{\"receiving_configured\":true,\"receiving_active\":$receiving_active,\"origin_peer\":$origin_json,\"receiver\":$receiver_json,\"update_process\":$update_json}"
exit 0

And this file probes a server with: mupip replicate -source -checkhealth

#!/usr/bin/env bash
#
# probe_originating_health  (Probe A: forwarding/source health)
#
# JSON CONTRACT (API v1)
#
# {
#   "forwarding_configured": boolean,
#   "forwarding_active": boolean,
#   "destination_peers": {
#       "<peer_name>": {
#           "pid": integer,
#           "mode": "ACTIVE"
#       },
#       ...
#   },
#   "reason": "<YDB error code>"            # present only if not configured
#   "note": "<informational string>"        # optional informational message
# }
#
# FIELD DEFINITIONS:
#
# forwarding_configured:
#   true  -> replication source infrastructure exists on this node
#   false -> not configured / no journal pool / replication disabled
#
# forwarding_active:
#   true  -> at least one non-dummy destination peer is ACTIVE
#   false -> no ACTIVE non-dummy peers
#
# destination_peers:
#   Map of downstream peers this node is actively forwarding to.
#   Only peers in ACTIVE mode are included.
#   The special peer name "dummy" is excluded.
#
# reason:
#   Present if YottaDB returned %YDB-E-XXXX.
#   Contains the error code (e.g., NOJNLPOOL).
#
# Exit Codes:
#   0 -> successful probe execution (regardless of forwarding state)
#   1 -> unexpected output / parse anomaly
#   2 -> mupip not found / environment failure
#

set -euo pipefail

# Load YottaDB environment if available
ENV_FILE="${ENV_FILE:-$HOME/jnlex/jnlex_env}"
if [[ -r "$ENV_FILE" ]]; then
  # shellcheck disable=SC1090
  source "$ENV_FILE"
fi

# Resolve mupip path
MUPIP=""
if [[ -n "${ydb_dist:-}" && -x "${ydb_dist}/mupip" ]]; then
  MUPIP="${ydb_dist}/mupip"
elif command -v mupip >/dev/null 2>&1; then
  MUPIP="$(command -v mupip)"
fi

if [[ -z "$MUPIP" ]]; then
  echo "{\"forwarding_configured\":false,\"forwarding_active\":false,\"destination_peers\":{},\"reason\":\"MUPIP_NOT_FOUND\"}"
  exit 2
fi

out="$("$MUPIP" replicate -source -checkhealth 2>&1 || true)"

# Detect YottaDB error condition
errcode="$(echo "$out" | sed -n 's/^%YDB-E-\([A-Z0-9]\+\),.*/\1/p' | head -n 1)"
if [[ -n "$errcode" ]]; then
  echo "{\"forwarding_configured\":false,\"forwarding_active\":false,\"destination_peers\":{},\"reason\":\"$errcode\"}"
  exit 0
fi

# Parse ACTIVE non-dummy peers only
json_peers="$(
  echo "$out" | awk '
    BEGIN { first=1; name=""; pid=""; }

    /secondary instance name \[/ {
      # Extract pid
      t=$0
      sub(/^.*source server pid \[/, "", t)
      sub(/\].*$/, "", t)
      pid=t

      # Extract peer name
      t=$0
      sub(/^.*secondary instance name \[/, "", t)
      sub(/\].*$/, "", t)
      name=t
      next
    }

    /^PID[[:space:]]+[0-9]+[[:space:]]+Source server is alive in ACTIVE mode/ {

      # Exclude dummy downstream placeholder
      if (name == "dummy") next

      if (name != "" && pid != "") {
        printf "%s\"%s\":{\"pid\":%s,\"mode\":\"ACTIVE\"}",
               (first ? "" : ","), name, pid
        first=0
      }
      next
    }
  '
)"

if [[ -n "$json_peers" ]]; then
  echo "{\"forwarding_configured\":true,\"forwarding_active\":true,\"destination_peers\":{$json_peers}}"
  exit 0
fi

# No ACTIVE peers, but no error either -> configured but idle/passive/dummy-only
echo "{\"forwarding_configured\":true,\"forwarding_active\":false,\"destination_peers\":{}}"
exit 0


And finally, here is a script that uses the two probes above to determine the current state of a server.


#!/usr/bin/env bash
#
# probe_node_health  (Probe C: compose probes + derive role)
#
# Requires:
#   ./probe_originating_health
#   ./probe_replicating_health
#
# Output JSON:
# {
#   "role": "PRIMARY" | "SECONDARY" | "FORWARDER" | "IDLE" | "ERROR",
#   "forwarding": { ...output of probe_originating_health... },
#   "receiving":  { ...output of probe_replicating_health... }
# }
#
# Role derivation uses booleans:
#   forwarding.forwarding_active
#   receiving.receiving_active
#

set -euo pipefail

DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROBE_FWD="${DIR}/probe_originating_health"
PROBE_RCV="${DIR}/probe_replicating_health"

# Ensure dependencies exist
if [[ ! -x "$PROBE_FWD" ]]; then
  echo "{\"role\":\"ERROR\",\"error\":\"missing probe_originating_health\",\"path\":\"$PROBE_FWD\"}"
  exit 2
fi
if [[ ! -x "$PROBE_RCV" ]]; then
  echo "{\"role\":\"ERROR\",\"error\":\"missing probe_replicating_health\",\"path\":\"$PROBE_RCV\"}"
  exit 2
fi

# Run probes (never let a nonzero exit kill us; capture output for debugging)
fwd_out="$("$PROBE_FWD" 2>&1 || true)"
rcv_out="$("$PROBE_RCV" 2>&1 || true)"

# Validate JSON using jq if available (optional but helpful)
have_jq=0
if command -v jq >/dev/null 2>&1; then
  have_jq=1
fi

if [[ "$have_jq" -eq 1 ]]; then
  if ! echo "$fwd_out" | jq -e . >/dev/null 2>&1; then
    fwd_out="$(printf '%s' "$fwd_out" | head -n 5 | sed 's/"/\\"/g')"
    echo "{\"role\":\"ERROR\",\"error\":\"probe_originating_health returned non-JSON\",\"forwarding_raw\":\"$fwd_out\"}"
    exit 1
  fi
  if ! echo "$rcv_out" | jq -e . >/dev/null 2>&1; then
    rcv_out="$(printf '%s' "$rcv_out" | head -n 5 | sed 's/"/\\"/g')"
    echo "{\"role\":\"ERROR\",\"error\":\"probe_replicating_health returned non-JSON\",\"receiving_raw\":\"$rcv_out\"}"
    exit 1
  fi
fi

# Derive role (prefer jq when available; otherwise simple string tests)
role="ERROR"

if [[ "$have_jq" -eq 1 ]]; then
  fwd_active="$(echo "$fwd_out" | jq -r '.forwarding_active // false')"
  rcv_active="$(echo "$rcv_out" | jq -r '.receiving_active // false')"

  if [[ "$fwd_active" == "true" && "$rcv_active" == "false" ]]; then
    role="PRIMARY"
  elif [[ "$fwd_active" == "false" && "$rcv_active" == "true" ]]; then
    role="SECONDARY"
  elif [[ "$fwd_active" == "true" && "$rcv_active" == "true" ]]; then
    role="FORWARDER"
  elif [[ "$fwd_active" == "false" && "$rcv_active" == "false" ]]; then
    role="IDLE"
  else
    role="ERROR"
  fi

  echo "{\"role\":\"$role\",\"forwarding\":$fwd_out,\"receiving\":$rcv_out}"
  exit 0
fi

# Fallback (no jq): conservative string-based role derivation
# This assumes the probes always include these exact key/value strings.
fwd_active=0
rcv_active=0
echo "$fwd_out" | grep -q '"forwarding_active":true' && fwd_active=1
echo "$rcv_out" | grep -q '"receiving_active":true' && rcv_active=1

if [[ "$fwd_active" -eq 1 && "$rcv_active" -eq 0 ]]; then
  role="PRIMARY"
elif [[ "$fwd_active" -eq 0 && "$rcv_active" -eq 1 ]]; then
  role="SECONDARY"
elif [[ "$fwd_active" -eq 1 && "$rcv_active" -eq 1 ]]; then
  role="FORWARDER"
elif [[ "$fwd_active" -eq 0 && "$rcv_active" -eq 0 ]]; then
  role="IDLE"
else
  role="ERROR"
fi

echo "{\"role\":\"$role\",\"forwarding\":$fwd_out,\"receiving\":$rcv_out}"
exit 0

The output of these scripts can be passed to the Linux utility "jq" for JSON parsing and querying.
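For example, assuming jq is installed, a supervisor script could pull individual fields out of the probe output like this (the JSON literal below is canned sample data standing in for live probe output):

```shell
# Canned sample of probe_node_health output (not live mupip data)
sample='{"role":"SECONDARY","forwarding":{"forwarding_active":false},"receiving":{"receiving_active":true}}'

# -r prints the raw string without surrounding quotes
printf '%s' "$sample" | jq -r '.role'        # prints: SECONDARY

# -e sets the exit status from the value, so it works in shell conditionals
if printf '%s' "$sample" | jq -e '.receiving.receiving_active' >/dev/null; then
  echo "node is actively receiving"
fi
```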

Sample output is documented at the top of each script.  

If there is a better, more direct way to get the state of the current server, please let me know. 

Kevin

Stefano Lalli

Feb 12, 2026, 7:18:25 PM
to Kevin Toppenberg, Everything MUMPS

Kevin

If you want to get replication information programmatically, take a look at the following routine in the YDBGUI project:

YDBGUI/routines/_ydbguiReplication.m

 

The first function, getInstanceFile(), parses the instance file information and populates an array in JDOM format (the MUMPS representation of a JSON document), so you can convert it to JSON straight away by passing it as a parameter to the JSON routines… (you can use the standard VistA VPRJSONxxx routines or copy them from the YDBWEBSERVER).

 

The second function, getHealth(), collects real-time information using the $$^%PEEKBYNAME() function. This also returns a JDOM array.

This is probably what you want too... At the moment I cannot find where all these fields are described in the YDB docs, but they are there, somewhere...

Maybe Bhaskar or Sam can point you to the right doc page...

But looking at the code and the variable names, you should be able to find most of the stuff...

 

The other function is getBacklog() to get real-time backlog information (also using the PeekByName function...)

 

Finally (and this is probably the best option for you), just call the fullInfo() function and you will get a big array with all the info (and, yes, you can convert this to JSON right away as well...). It will also fetch all the log files and merge them, so you have a full overview...

Sorry I don’t remember all the details; it has been a while...

Hope this helps...

Stef


PS: these functions quit (return) a “pointer”, so you need to call them using the syntax:

set *myVal=$$fullInfo()

and get the array back.


PPS: This code calls a few external functions like $$runShell^%ydbguiUtils; you will need to copy that code as well, but I don't think there are many calls to external routines... You should be done in a short time...





Stefano Lalli

Feb 12, 2026, 9:01:52 PM
to Kevin Toppenberg, Everything MUMPS
Kevin, to make it much easier....

If you have the YDBGUI installed on your system, open a YDB session and verify that $zroutines is properly configured...

Type:
w $zrou

you should find these shared library names in the list:
_ydbgui.so
_ydbmwebserver.so

Then simply type:
set *ret=$$fullInfo^%ydbguiReplication()

then 
zwr ret

Hope it works for you
stef


Kevin Toppenberg

Feb 13, 2026, 6:17:10 PM
to Everything MUMPS
Stefano,

This was very helpful. Thank you very much. I didn't have YDBGUI installed, but I found instructions for adding it to my existing YDB installation (a test environment) here: https://gitlab.com/YottaDB/UI/YDBGUI.

I have never used variable aliases ("pointers") before, so I did some reading on that.  Mind blown.

I'll be digesting this for a bit.

Thanks again,
Kevin