System Troubleshooting MariaDB ColumnStore
- MariaDB ColumnStore alias commands
- MariaDB ColumnStore Support Report tool
- MariaDB ColumnStore logging
- Enable/Disable Core Files
- MariaDB ColumnStore database files
- MariaDB ColumnStore utilities
- Multi-node install problems and how to diagnose
- Add Module install problems and how to diagnose
- postConfigure install problems and how to diagnose
- startSystem problems and how to diagnose
- System in DBRM Read-Only Mode
- Non-Root System, PrimProc Process fails to startup
MariaDB ColumnStore alias commands
During the installation, these alias commands are defined and placed in the .bashrc file of the install user. This example is from a non-root install:
alias mcsmysql='/home/mariadb-user/mariadb/columnstore/mysql/bin/mysql --defaults-file=/home/mariadb-user/mariadb/columnstore/mysql/my.cnf -u root' alias ma=/home/mariadb-user/mariadb/columnstore/bin/mcsadmin alias mcsadmin=/home/mariadb-user/mariadb/columnstore/bin/mcsadmin alias cpimport=/home/mariadb-user/mariadb/columnstore/bin/cpimport alias home='cd /home/mariadb-user/mariadb/columnstore' alias log='cd /var/log/mariadb/columnstore/' alias dbrm='cd /home/mariadb-user/mariadb/columnstore/data1/systemFiles/dbrm' alias module='cat /home/mariadb-user/mariadb/columnstore/local/module' mcsmysql - Access the MariaDB ColumnStore MySQL Console ma and mcsadmin - Access the MariaDB ColumnStore Admin Console cpimport - short-cut to run the Bulk Load Process, cpimport home - cd to MariaDB ColumnStore home directory log - cd to MariaDB ColumnStore log directory dbrm - cd to MariaDB ColumnStore DBRM file directory module - outputs the MariaDB ColumnStore local mode name, like 'pm1'
MariaDB ColumnStore Support Report tool
This tool can be executed by users, called "columnstoreSupport", that will generated a report that contains the log files and other system data that is used by MariaDB personnel to help diagnose system related issues and errors within the MariaDB ColumnStore Product.
Here is how to run it:
On a single server:
On a multi-node combo server, run on pm1
/usr/local/mariadb/columnstore/bin/columnstoreSupport -a -p 'user-password'
On a multi-node separate server, run on um1
/usr/local/mariadb/columnstore/bin/columnstoreSupport -a -p 'user-password'
NOTE: If ssh-keys are setup, enter the work 'ssh' for user-password. If no ssh-keys, then enter the Unix User password that is used to ssh login to the system.
Here is an example of it getting run the the report that is generated:
/usr/local/mariadb/columnstore/bin/columnstoreSupport -a Get software report data for pm1 Get config report data for pm1 Get log report data for pm1 Get log config data for pm1 Get hardware report data for pm1 Get resource report data for pm1 Get dbms report data for pm1 Columnstore Support Script Successfully completed, files located in columnstoreSupportReport.tar.gz
columnstoreSupportReport.tar.gz is what you would provide to MariaDB personnel or attach to a JIRA.
This is what the report consist of:
1. Compressed log files from each module : pm1_logReport.tar.gz a. This is the directory /var/log/mariadb/columnstore from pm1 which will contain:
- system logs for ColumnStore, debug, info, err, warning, and critical
- the alarm logs, alarm.log and activealarmLog
- UI command log, uiCommands.log, which are commands entered into mcsadmin
2. Config report from each module: pm1_configReport.txt. NOTE: on a single server system, pm1 report will contain more configuration data. On a multi-node seperate system um1 report will contain more data
- Server processes - ps command info and top
- System network information
- System configuration information including storage
- System status information at the time the report was run
- System configuration file, Columnstore.xml
3. Hardware report for each module: pm1_hardwareReport.txt
- OS version
- CPU information
- Memory information
- Storage mount information
- Ip address information, ifconfig
4. Resource report for each module: pm1_resourceReport.txt
- Shared memory
- Disk usage
- DBRM files
- Active table locks
- BRM extent map
5. Software report for each module: pm1_softwareReport.txt
- MariaDB ColumnStore software version
6. DBMR report, from the front-end modules: um1_dbmsReport.txt
- MariaDB version
- System catalog and tables
- MariaDB Columnstore usernames
- MariaDB ColumnStore variables
- MariaDB ColumnStore configuration file, my.cnf
- List of active queries at the time the report was run
7. MariaDB ColumnStore MySQL log file: um1_mysqllogReport.tar.gz
MariaDB ColumnStore logging
MariaDB ColumnStore utilizes the install system logging tool, whether its syslog, rsyslog, or ng-syslog. The logs are located in /var/log/mariadb/columnstore. There are these 5 logs:
- debug.log We also utilize the log rotate tool and by default we are configured to keep 7 days of log files. They are stored in /var/log/mariadb/columnstore/archive.
Also in the /var/log/mariadb/columnstore directory, there are a few other logs that are kept:
- activeAlarms – List of active alarms currently set on the system
- alarm.log – list of all the alarms and associated clear alarms
- uiCommands.log – list of the mcsadmin commands entered
The MariaDB ColumnStore process corefiles would be stored in /var/log/mariadb/columnstore/corefiles, that is if core file dumping is enabled on the system.
MariaDB ColumnStore log files and what goes in them
- Crit, err, and warning used to log problems by a MariaDB ColumnStore Process.
- Info will have logs showing high level actions that are going on in the system. During a stop/startsystem, it will show the high level commands of the process/modules being stopped and started. The bulk-load (cpimport) tool logs is high level actions there also.
- Debug will have the lower level actions from the MariaDB ColumnStore Processes, which will include queries. So all the mysql query commands will be logged here.
MariaDB ColumnStore-MySQL logs are stored here:
NOTE: Other log files will be written to the log directory as well as the /tmp directory from MariaDB ColumnStore process during certain operations. So you will see a few other logs show up in the log directory besides these.
Process STDOUT/STDERR logging
MariaDB ColumnStore Processes have built in STDOUT/STDERR logging that can be enabled. This could be used for additional debugging of issues. This is enabled on a Process by Process Level. Here is an example of how to enable and disable. In this example, locate the Process to enable, like DDLProc. Then enter the Process name and Module-type on the 'setprocessconfig' command.
NOTE: run from PM1
mcsadmin> getprocessconfig getprocessconfig Tue Sep 26 19:21:14 2017 Process Configuration Process #1 Configuration information ProcessName = ProcessMonitor ModuleType = ChildExtOAMModule ProcessLocation = /home/mariadb-user/mariadb/columnstore/bin/ProcMon BootLaunch = 0 LaunchID = 1 RunType = LOADSHARE LogFile = off Process #2 Configuration information ProcessName = ProcessManager ModuleType = ParentOAMModule ProcessLocation = /home/mariadb-user/mariadb/columnstore/bin/ProcMgr BootLaunch = 1 LaunchID = 2 RunType = ACTIVE_STANDBY LogFile = off Process #3 Configuration information ProcessName = DBRMControllerNode ModuleType = ParentOAMModule ProcessLocation = /home/mariadb-user/mariadb/columnstore/bin/controllernode ProcessArg1 = /home/mariadb-user/mariadb/columnstore/bin/controllernode ProcessArg2 = fg BootLaunch = 2 LaunchID = 4 DepModuleName1 = @ DepProcessName1 = ProcessManager RunType = SIMPLEX LogFile = off Process #4 Configuration information ProcessName = ServerMonitor ModuleType = ChildOAMModule ProcessLocation = /home/mariadb-user/mariadb/columnstore/bin/ServerMonitor ProcessArg1 = /home/mariadb-user/mariadb/columnstore/bin/ServerMonitor BootLaunch = 2 LaunchID = 6 RunType = LOADSHARE LogFile = off Process #5 Configuration information ProcessName = DBRMWorkerNode ModuleType = ChildExtOAMModule ProcessLocation = /home/mariadb-user/mariadb/columnstore/bin/workernode ProcessArg1 = /home/mariadb-user/mariadb/columnstore/bin/workernode ProcessArg2 = DBRM_Worker ProcessArg3 = fg BootLaunch = 2 LaunchID = 7 RunType = LOADSHARE LogFile = off Process #6 Configuration information ProcessName = DecomSvr ModuleType = pm ProcessLocation = //home/mariadb-user/mariadb/columnstore/bin/DecomSvr BootLaunch = 2 LaunchID = 15 RunType = LOADSHARE LogFile = off Process #7 Configuration information ProcessName = PrimProc ModuleType = pm ProcessLocation = /home/mariadb-user/mariadb/columnstore/bin/PrimProc BootLaunch = 2 LaunchID = 20 RunType = LOADSHARE LogFile = off Process #8 Configuration information ProcessName = ExeMgr ModuleType = pm ProcessLocation = /home/mariadb-user/mariadb/columnstore/bin/ExeMgr BootLaunch = 2 LaunchID = 30 DepModuleName1 = pm* DepProcessName1 = PrimProc RunType = LOADSHARE LogFile = off Process #9 Configuration information ProcessName = WriteEngineServer ModuleType = pm ProcessLocation = /home/mariadb-user/mariadb/columnstore/bin/WriteEngineServer BootLaunch = 2 LaunchID = 40 RunType = LOADSHARE LogFile = off Process #10 Configuration information ProcessName = DDLProc ModuleType = pm ProcessLocation = /home/mariadb-user/mariadb/columnstore/bin/DDLProc BootLaunch = 2 LaunchID = 50 DepModuleName1 = pm* DepProcessName1 = WriteEngineServer DepModuleName2 = * DepProcessName2 = DBRMWorkerNode DepModuleName3 = * DepProcessName3 = ExeMgr RunType = SIMPLEX LogFile = off Process #11 Configuration information ProcessName = DMLProc ModuleType = pm ProcessLocation = /home/mariadb-user/mariadb/columnstore/bin/DMLProc BootLaunch = 2 LaunchID = 51 DepModuleName1 = pm* DepProcessName1 = WriteEngineServer DepModuleName2 = * DepProcessName2 = DBRMWorkerNode DepModuleName3 = @ DepProcessName3 = DDLProc RunType = SIMPLEX LogFile = off Process #12 Configuration information ProcessName = mysqld ModuleType = pm ProcessLocation = /home/mariadb-user/mariadb/columnstore/mysql/libexec/mysqld BootLaunch = 0 LaunchID = 100 RunType = LOADSHARE LogFile = off mcsadmin> setprocessconfig DDLProc pm LogFile on setprocessconfig Tue Sep 26 19:22:00 2017 Successfully set LogFile = on mcsadmin> shutdownsystem y shutdownsystem Tue Sep 26 19:23:59 2017 This command stops the processing of applications on all Modules within the MariaDB ColumnStore System Checking for active transactions Stopping System... Successful stop of System Shutting Down System... Successful shutdown of System mcsadmin> startsystem startsystem Tue Sep 26 19:24:36 2017 startSystem command, 'columnstore' service is down, sending command to start the 'columnstore' service on all modules System being started, please wait.................. Successful start of System mcsadmin>
In the log directory, there will be the following 2 files. So all the STDOUT/STDERR from the process will be logged here
pwd /var/log/mariadb/columnstore ll DDLProc.* -rw-r--r-- 1 mariadb-user mariadb-user 0 Sep 26 19:25 DDLProc.err -rw-r--r-- 1 mariadb-user mariadb-user 34 Sep 26 19:25 DDLProc.out
To Disable, just set the LogFile setting back to off and do the shutdown/startsystem
setprocessconfig DDLProc pm LogFile off
Enable/Disable Core Files
Since core files are very large (1gb) and can take up a lot of disk space, Core File Generating for MariaDB ColumnStore platform processes are disabled.
The location of the MariaDB ColumnStore Process corefiles get placed here:
- Enable from pm1
# ma shutdownsystem # /usr/local/mariadb/columnstore/bin/setConfig Installation CoreFileFlag y # ma startsystem
- Disable from pm1
# ma shutdownsystem # /usr/local/mariadb/columnstore/bin/setConfig Installation CoreFileFlag n # ma startsystem
MariaDB ColumnStore database files
The MariaDB ColumnStore has 3 sets of database files. These files are also always backed up and restored together as part of the backup and restore process.
- MariaDB ColumnStore-MySQL schemes - /usr/local/mariadb/columnstore/mysql/db/*
- MariaDB ColumnStore Database - /usr/local/mariadb/columnstore/dataX/000.dir
- MariaDB ColumnStore DBRM files - /usr/local/mariadb/columnstore/data1/systemFiles/dbrm/*
MariaDB ColumnStore Database: The X represents the DBroot ID #. DBroot is the file directory containing the meta-data. We generally assign 1 DBroot per Performance Module. This DBroot can be pointing to local disk storage area or mounted to external disk. So for a single-server setup, we would have ‘data1’ as the dbroot. If we had a system will 2 Performance modules, then we would have ‘data1’ on PM1 and ‘data2’ on PM2.
MariaDB ColumnStore DBRM files: This is where the Extent Map and Versioning files are located. These make up the DBRM files. There are 3 copies of the files that are keep, one is the current active set and the other 2 are the backup. The current active copy file name is located in this file:
The Extent Map is loaded into shared memory on each of the nodes during the start-system process time. The version in shared memory on the PM1 node is the main copy. Changes are applied to that version in memory. Then changes are made to the disk version that only exist on PM1 disk storage and a copy of those changes are sent out to the other nodes and their memory copies are updated.
NOTE: The following utility can be used to dump the internal memory copy of the Extent Map
/usr/local/mariadb/columnstore/bin/editem : there are a few options with this command, -i dumps a raw copy and -d dumps a formatted copy
MariaDB ColumnStore utilities
Here are a few of the common utilities that are used to view and troubleshoot issues. All of these commands are located in /usr/local/mariadb/columnstore/bin/
These are utilizes to be used to view or set system variables. They should be run on the Active OAM Parent Module, which generally is the Performance Module #1.
- editem – used to view the Extent Map in internal memory, discussed in previous section
- configxml.sh – used to get and set parameters from the system config file, ColumnStore.xml
- ./configxml.sh getconfig ExeMgr1 Port
- ./configxml.sh setconfig ExeMgr1 Port 8601
- dbrmctl – used to display are view or change the DBRM status
- Example, display the current the dbrm status
- ./dbrmctl status
- Example, the dbrm might be in a readonly state, this would unlock
- ./dbrmctl resume
- Example, display the current the dbrm status
These are utilizes to be used to view or clear Database Table Locks. They should be run on the Active OAM Parent Module, which generally is the Performance Module #1.
- viewtablelock – will display what tables are locked. There might be times when a DML command might fail and leave a table in a locked state. You can run this command to find which tables are locked, and then use the ‘cleartablelock’ command
- cleartablelock – use to clear a table lock. As explained above, could be used to clear a lock on a table that was left set on a failed command
These are utilities that would be run on all nodes in the system
- clearShm – used to clear the shared internal memory, used at times after a system-shutdown command is do just to make sure the memory is cleared
Multi-node install problems and how to diagnose
Once you install the packages on the initial server, pm1, run post-install and postConfigure.
If its failing in the remote server install section, go via the install logs in /tmp to see why the failure occurred. It could be related to these issues:
1. user password or ssh-keys not setup, failing to log in
2. Dependent package isn't installed on a remote server
3. Incompatible OS's between nodes, all have to be the same
If you get to the point where it says Starting system processes, but it seems to hang or not return. Here are some things to check:
1. Check the locale setting on all servers and make sure they are all the same
2. on pm1, create the alias if you haven't already
- . /usr/local/mariadb/columnstore/bin/columnstoreAlias then run following command and check the process status:
- mcsadmin getsysteminfo check if ProcMon is ACTIVE on all configured servers, if not, check the log files on the asscouiated server to see what error ProcMon is reporting. Also make sure the ProcMgr is ACTIVE on pm1.
logs are located in:
generally when ProcMon/ProcMgr isn't active, its because one of these issues:
1. if external storage, an pm /etc/fstab isnt setup
2. message issue between the servers that is causing ProcMon's and ProcMgr to fail to communicate. Make sure all server firewalls are disable along with SElinux.
Add Module install problems and how to diagnose
There are a number of reasons why an addModule command might fail, missing dependent packages, password or ssh key is not setup, etc. Here are some things to investigate when this 'mcsadmin' command does fail.
1. Check the log files on the local node where the command is running. Generally an log in the error, warning, or critical log will be reported when a failure occurs.
2. Depending on how far the addModule command got, another log will be generated in /tmp. Look for a log file with binary_installer, user/performance_installer. This will echo back the installer script the commands it runs, so it would flag an issue in there.
3. Also make sure you can log into the new server/instance from the local one. Not being able to log in would cause a failure.
4. Also depending on how far the command got, it might have added an entry in the system configuration file for the new module. So you will need to check that and if you were adding pm2 and it failed, then it shows up in the system configuration via the 'getsystemn'. You would then need to remove that module before trying the addModule command.
5. If your get the error return from the 'addModule' command if File Open error, that means it could locate the MariaDB ColumnStore rpm/deb/binary in the $HOME directory, i.e. /root for root user install. The logic takes the packages from here and pushes them to the new server.
postConfigure install problems and how to diagnose
The installation script, postConfigure, is run at install and upgrade times. The first part of the script takes information from the user and setup the system configuration, which is updating the Columnstore.xml and the ProcessConfig.xml configuration files.
The second part of the script performs a remote install of all of the other servers in the system, which is for a multi-node install configuration. The installation of the remote nodes are done simultaneously and the remote install logs are plaed in /tmp on 'pm1', i.e. "pm1_installer.log". The actualy log file name will be different based on if you are doing an rpm, debian, or binary install. So if postConfigure reports that a failure occurred during the remote server install phase, you can look at these logs in /tmp. The main reasons why this might fail:
1. ssh access to the remote node from pm1 failed, password or ssh setup issues
2. A missing dependency package on the remote node
The third part of postConfigure is the starting up of the system, which consist of starting up of the 'columnstore' service script on each node. If a failure happens during this time frame, do the following to help determine the failure:
1. first, you might need to run this script to get the 'mcsadmin' alias command defined
# . ./usr/local/mariadb/columnstore/bin/aliasColumnstore
2. get the system status's
# mcsadmin getsystemi
3. Here is some things that can point to you why the system didn't come up a. Make sure that all ProcMon processes are active on all nodes. If they aren't, then hear are some of the reasons why they might not be: i. Firewall is enabled on the 'pm1' node are the installing node, check that ii. ProcMon might have run into another issues at startup, like failing to mount to an external disk. So check the log files on the remote server where ProcMon failed to go ACTIVE b. If it reports a module status of FAILED, then check the log files from that module.
4. Also check the log files from the local 'pm1' module.
startSystem problems and how to diagnose
So this is assuming that the system has made it successfully though a postConfigure install or upgrade. At some point, you might need to do a stop or shutdownsystem for some maintenance or some other reason. And then do the startsystem. If any failures occur with the startSystem command, you can check the following:
1. get the system status's
# mcsadmin getsystemi
2. Here is some things that can point to you why the system didn't come up a. Make sure that all ProcMon processes are active on all nodes. If they aren't, then hear are some of the reasons why they might not be: i. Firewall is enabled on the 'pm1' node are the installing node, check that ii. ProcMon might have run into another issues at startup, like failing to mount to an external disk. So check the log files on the remote server where ProcMon failed to go ACTIVE b. If any of the DBRM Processes, Controller or Worker nodes are in a FAILED state, the most likely reason for this the there is an issue with the DBRM files. These fails are loaded from disk into shared-memory. This is load fails, it will mark the DBRM Process as FAILED. If this is the case, then please contact MariaDB Customer Support. The DBRM fails contain the Extent Map and other Metadata related to the MariaDB Columnstore Database Files. c. If it reports a module status of FAILED, then check the log files from that module.
3. Also check the log files from the local 'pm1' module.
System in DBRM Read-Only Mode
During some failover scenrios or network issues, the system can get into a state where the DBRM system files gets placed into Read-Only Mode. It will be shown by the follow alarm. This alarm along with all critical alarms will be displayed when user logs into the Columnstore Admin Console 'mcsadmin'.
AlarmID = 31 Brief Description = DBRM_READ_ONLY Alarm Severity = CRITICAL Time Issued = Wed Sep 13 14:32:37 2017 Reporting Module = pm1 Reporting Process = DBRMControllerNode Reported Device = System
If the system ever gets into DBRM Read-Only Mode, its best resolved by do a restart system form the 'pm1' module:
# mcsadmin restartsystem y
DBRM Read-Only Mode means that changes can't be made to the MariaDB ColumnStore Database while it's in this state. Queries can still be processed.
Non-Root System, PrimProc Process fails to startup
For non-root systems, the user file settings is required to be set as shown in the Preparing Guide. So if you have a Non-Root install where it fails to start and the 'mcsadmin getsystemi' shows that the PrimProc Process is in a failed state. Double check the user file settings on each node.