System Troubleshooting
Verifying CMAPI Cluster Status
The mcsStatus command is an administrative alias for an HTTP curl request. Firewalls and security programs can block or interfere with this communication channel.
Status Verification Commands
Execute via administrative alias:
mcsStatus
Execute via explicit network request:
curl -s https://127.0.0.1:8640/cmapi/0.4.0/cluster/status --header 'Content-Type:application/json' --header 'x-api-key:xxxxxxxxxxxxxxxx' -k | jq .
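Because the alias pipes its output through jq, a failed connection and an empty response can look the same on the terminal. The following is a minimal sketch, not part of ColumnStore, that separates those cases by checking the curl exit code before looking at the body. The helper names classify_status and check_cmapi are hypothetical, and API_KEY is assumed to be set by the caller.

```shell
#!/usr/bin/env bash
# Sketch only: classify_status and check_cmapi are hypothetical helpers.

# Classify a status request from its curl exit code ($1) and response body ($2).
classify_status() {
    local rc="$1" body="$2"
    if [ "$rc" -ne 0 ]; then
        echo "connection-failed"   # curl could not complete the request
    elif [ -z "$body" ]; then
        echo "empty-response"      # request completed but returned no data
    else
        echo "ok"
    fi
}

# Query the local CMAPI endpoint without jq and report the outcome.
# Assumes API_KEY holds the x-api-key value for this node.
check_cmapi() {
    local body rc
    body=$(curl -s -k https://127.0.0.1:8640/cmapi/0.4.0/cluster/status \
        --header 'Content-Type:application/json' \
        --header "x-api-key:${API_KEY}")
    rc=$?
    classify_status "$rc" "$body"
}
```

Running check_cmapi first, and only then piping the raw response into jq, helps show whether a blank result comes from the network path or from the JSON processor.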
Operational Troubleshooting Checklist
If the status check command fails, encounters a connection break, or returns empty results, execute the following diagnostic verification steps in sequence:
Check for security programs interfering: Try running the request with elevated permissions (sudo curl), as local system configurations can cut the connection. Use process monitoring utilities like htop, top, or ps to look for interfering security services.
Examples to audit: falcond, falcon-sensor, AppArmor, or SELinux.
Remediation: Turn these applications off, restart the cluster, and attempt the status check again.
Verify node port connectivity: The remote nodes must have port 8640 open for communication via a TCP connection.
Run the socket inspection utility to verify which processes are actively bound to system ports:
ss -ntlp
Validate uniform authorization keys (x-api-key): The authentication header string must match across all nodes in the deployment.
Inspect the following configuration file on each node to verify key synchronization:
File Location: /etc/columnstore/cmapi_server.conf
Target Section: [Authentication]
Variable Line: x-api-key=xxxxxxx
Confirm JSON processor installation: The API outputs a JSON data stream that is piped directly into jq. If jq is not installed on the server, the stream has nowhere to go and no status output appears in the terminal.
Install the package via the package manager:
yum install jq
Check IP address targeting: Manually run the curl query against the local loopback address (127.0.0.1) and compare the output structure against a query run against the known master instance.
If the results do not match, /etc/columnstore/Columnstore.xml is not identical across all cluster nodes.
Remediation: Manually copy the same Columnstore.xml to all nodes, restart the cmapi service, and try again.
Reload shell aliases: If the shell fails to recognize the ColumnStore command aliases, reload their definitions:
source /etc/profile.d/columnstoreAlias.sh # Alternative file target to source: /usr/share/columnstore/columnstoreAlias
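Two of the checks above come down to comparing a value across nodes: the x-api-key and the contents of Columnstore.xml. The following is a minimal sketch of helpers that could be run on each node so the results can be compared by eye. The function names read_api_key and file_fingerprint are hypothetical, and the parsing assumes the single-line x-api-key=... format shown above, optionally with spaces or quotes around the value.

```shell
#!/usr/bin/env bash
# Sketch only: read_api_key and file_fingerprint are hypothetical helpers.

# Print the x-api-key value from a cmapi_server.conf-style file.
# Assumes one "x-api-key=..." line; surrounding quotes are stripped.
read_api_key() {
    local conf="${1:-/etc/columnstore/cmapi_server.conf}"
    sed -n 's/^[[:space:]]*x-api-key[[:space:]]*=[[:space:]]*//p' "$conf" \
        | tr -d "'\"" \
        | head -n 1
}

# Print a short checksum of a file so nodes can be compared
# without copying the file contents around.
file_fingerprint() {
    cksum "$1" | awk '{print $1}'
}
```

For example, running file_fingerprint /etc/columnstore/Columnstore.xml on every node and comparing the numbers is one way to spot a node whose configuration has drifted. Matching checksums strongly suggest identical files, while a mismatch confirms a difference.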
Parsing CMAPI JSON Telemetry Output
When executed successfully, CMAPI generates a process status payload tracking metadata across the cluster topology:
Key Output Interpretation Rules
Node Headcount (num_nodes): Verify that the expected number of nodes are actively part of the cluster. If one or more nodes fail and drop out, deeper troubleshooting is required, and the contents of Columnstore.xml will likely be reshuffled because the node is considered inactive.
Service Presence Tracking: CMAPI evaluates the process list on each node against specific binary names, but it cannot tell whether processes are orphaned or not working as expected.
Master Node Requirement: The server acting as master (dbrm_mode="master") must have exactly 7 running processes. If fewer appear, some may be failing to start; inspect the system logs, or check /var/log/mariadb/columnstore/trace/ and /var/log/mariadb/columnstore/corefiles/ for stack traces or core dumps left by segmentation faults.
Replica Node Requirement: Replica nodes (dbrm_mode="slave") must have exactly 4 running processes. Check the logs if a service fails to initialize.
Cluster Modes (cluster_mode): Find the node where dbrm_mode=master and check its cluster_mode flag. Replicas always remain in readonly mode. A healthy master should read readwrite. If the master node is in readonly, a severe issue has occurred, forcing the system into safe mode until manual intervention corrects the root cause. This state typically has one of the following causes:
Stuck rollbacks originating from failed transactions or bulk cpimport tasks.
Block Resolution Manager (BRM) files on disk cannot be updated or written to, or are missing or corrupted.
Certain services cannot start, such as storagemanager being unable to reach external S3 storage buckets.
Master Resolution Dependencies: CMAPI determines which node is the master dynamically by using the internal CEJ user account to inspect the database process list for active replication threads. If the CEJ user's credentials or grants are broken, CMAPI stops working entirely.
DBroot Mappings: The reported dbroots values should match the module_id, confirming that no failover has occurred. The active master node traditionally owns DBroot 1. If a role change occurs, DBroot ownership is reshuffled across the surviving nodes.
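The process-count rules above can be sketched as a small lookup so that a script can flag a node with too few services. This is an illustration only: expected_process_count, check_process_count, and master_cluster_mode are hypothetical helper names. The jq filter further assumes that each node appears in the payload as a top-level JSON object keyed by its host name, which should be confirmed against real output before relying on it.

```shell
#!/usr/bin/env bash
# Sketch only: all three functions are hypothetical helpers.

# Map a dbrm_mode value to the process count the rules above expect.
expected_process_count() {
    case "$1" in
        master) echo 7 ;;
        slave)  echo 4 ;;
        *)      echo "unknown dbrm_mode: $1" >&2; return 1 ;;
    esac
}

# Compare an observed process count ($2) with the expectation for a mode ($1).
check_process_count() {
    local mode="$1" observed="$2" expected
    expected=$(expected_process_count "$mode") || return 2
    if [ "$observed" -eq "$expected" ]; then
        echo "ok"
    else
        echo "mismatch: expected $expected, found $observed"
        return 1
    fi
}

# Read a status payload on stdin and print "<node> <cluster_mode>" for the master.
# ASSUMPTION: nodes are top-level objects keyed by host name.
master_cluster_mode() {
    jq -r 'to_entries[]
           | select(.value | type == "object")
           | select(.value.dbrm_mode == "master")
           | "\(.key) \(.value.cluster_mode)"'
}
```

For instance, check_process_count master 5 prints a mismatch line and returns a non-zero status, which a monitoring script could use as its trigger to inspect the logs.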
ColumnStore Engine Log Breakdown
Parsing the debug.log Text String Layout
The primary diagnostic log file is located at /var/log/mariadb/columnstore/debug.log. Output entries are written according to a standardized layout format:
Tracking Operational Timings via Internal Transaction IDs
Identify the transaction scope: Locate the numeric Internal Transaction ID that appears between two pipe characters (|). For example: |22|.
Correlate matching execution markers: Match the Start SQL statement and End SQL statement rows that share the same transaction ID to find the rough timing of a query. For instance, a LOAD DATA INFILE operation spanning these markers reveals its approximate execution window.
Trace long-running queries: For extended or heavy operations, these bounding lines will not appear back-to-back because other cluster nodes interleave their own log output. Search explicitly on the unique transaction ID (e.g., |22|) to compile a continuous timeline for that single query.
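The correlation steps above can be sketched as two small filters. This is a minimal illustration, assuming only what the text states: the transaction ID appears between pipe characters, and the bounding rows contain the phrases Start SQL statement and End SQL statement. The function names txn_timeline and txn_bounds are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: txn_timeline and txn_bounds are hypothetical helpers.

# Print every log line that carries the given internal transaction ID.
# -F treats the pipes as literal characters, so ID 2 does not match |22|.
txn_timeline() {
    local txn_id="$1" log="${2:-/var/log/mariadb/columnstore/debug.log}"
    grep -F "|${txn_id}|" "$log"
}

# Print only the start and end markers for that transaction.
txn_bounds() {
    txn_timeline "$@" | grep -E 'Start SQL statement|End SQL statement'
}
```

Calling txn_bounds 22 would then show the two marker rows for transaction 22, whose timestamps bracket the query, while txn_timeline 22 gives the full interleaving-free view of that transaction.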
Administrative Log Investigation Patterns
Execute the following grep search strings to extract critical milestones from system logs:
Find executed SQL query strings:
Track the lifecycle of bulk cpimport tasks:
Locate cluster startup timestamps:
Verify successful loadbrm executions with the extent map:
Audit if DMLProc handles rollbacks correctly during initialization:
Confirm successful system shutdowns via save_brm checkpoints:
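As a starting point for these searches, the following is a minimal sketch of a generic search helper. It assumes that the relevant log lines contain the literal component names used in the labels above; the exact message text may differ between versions, so the search terms are assumptions to adjust against a real log. The function name search_log is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: search_log is a hypothetical helper.

# Case-insensitive, fixed-string search of a ColumnStore log file.
search_log() {
    local term="$1" log="${2:-/var/log/mariadb/columnstore/debug.log}"
    grep -i -F -- "$term" "$log"
}

# Example calls (search terms are assumptions taken from the labels above):
#   search_log "Start SQL statement"
#   search_log cpimport
#   search_log loadbrm
#   search_log DMLProc
#   search_log save_brm
```

Using a fixed-string match avoids surprises from characters such as pipes or brackets in a search term being read as regular expression syntax.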
Common Startup Errors and Remediation Strategies
1. Object Mapping Failures (IDB-2006: does not exist in Columnstore)
Root Cause: The MySQL table definition file exists at /var/lib/mysql/{database}/{tablename}.frm, but the internal object mappings linking its columns to the underlying ColumnStore objects are broken or missing entirely.
Remediation Action (Single Affected Table): If the data in the affected table is not critical, recreate the table definition so that new object IDs are assigned in the ColumnStore maps:
Remediation Action (System-Wide Mismatch): If many tables throw this same error when queried, the wrong extent map file has likely been loaded into memory. Restore a valid extent map backup together with its matching transaction journal. If no usable backup can be obtained, run mcsRebuildEM to reconstruct the extent map, which may succeed if the raw data files are still intact inside each dbroot directory. If those data files are also missing, no further recovery path exists.
2. Hanging Ingestion Rollbacks on Cluster Boot
Root Cause: These log sequences appear normally during system startup as DMLProc reviews outstanding transactions to finalize changes or roll back bulk updates that were canceled or failed before the restart.
Mechanics: These records are kept in the active BRM_saves_journal, with rollback fragments potentially present in the corresponding .vss and .vbbm files.
Workaround: If cluster initialization hangs indefinitely while processing failed rollbacks, restoring an older backup copy of the transaction journal can bypass the loop. Note that this permanently loses any other transaction writes recorded in that journal.
3. Verification of System Catalog States
Root Cause: This is a standard informational message during cluster startup confirming that ColumnStore detected a valid existing installation and will load the data normally without resetting or overwriting the deployment configuration.
4. Table Lock File Permission Blockages
Root Cause: The database cannot access or modify the table lock file because of an operating system file permission conflict on that file.
Remediation Action: Switch your shell to the database service user (mysql) and manually create a clean lock file to clear the access conflict: