System Troubleshooting MariaDB ColumnStore


MariaDB ColumnStore alias commands

During the installation, these alias commands are defined and placed in the .bashrc file of the install user. In 1.1 and later releases, the aliases reside in /etc/profile.d/columnstoreAlias.sh.

This example is from a non-root install:

alias mcsmysql='/home/mariadb-user/mariadb/columnstore/mysql/bin/mysql --defaults-file=/home/mariadb-user/mariadb/columnstore/mysql/my.cnf -u root'
alias ma=/home/mariadb-user/mariadb/columnstore/bin/mcsadmin
alias mcsadmin=/home/mariadb-user/mariadb/columnstore/bin/mcsadmin
alias cpimport=/home/mariadb-user/mariadb/columnstore/bin/cpimport
alias home='cd /home/mariadb-user/mariadb/columnstore'
alias log='cd /var/log/mariadb/columnstore/'
alias dbrm='cd /home/mariadb-user/mariadb/columnstore/data1/systemFiles/dbrm'
alias module='cat /home/mariadb-user/mariadb/columnstore/local/module'

mcsmysql - Access the MariaDB ColumnStore MySQL Console
ma and mcsadmin - Access the MariaDB ColumnStore Admin Console
cpimport - short-cut to run the Bulk Load Process, cpimport
home - cd to MariaDB ColumnStore home directory
log - cd to MariaDB ColumnStore log directory
dbrm - cd to MariaDB ColumnStore DBRM file directory
module - outputs the MariaDB ColumnStore local mode name, like 'pm1'
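To pick up these aliases in a new shell, you can source the alias file and try one or two of them. The sketch below assumes a 1.1+ root install where the file lives in /etc/profile.d; for a non-root install, source the file referenced from the install user's .bashrc instead.

# load the aliases into the current shell (1.1 and later, root install)
source /etc/profile.d/columnstoreAlias.sh

# print the local module name, e.g. 'pm1'
module

# change to the ColumnStore log directory
log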

MariaDB ColumnStore Support Report tool

This tool, called "columnstoreSupport", can be executed by users and generates a report that contains the log files and other system data that MariaDB personnel use to help diagnose system-related issues and errors within the MariaDB ColumnStore product.

Here is how to run it:

On a single server:

/usr/local/mariadb/columnstore/bin/columnstoreSupport -a

On a multi-node combo server, run on pm1:

/usr/local/mariadb/columnstore/bin/columnstoreSupport -a -p 'user-password'

On a multi-node separate server, run on pm1:

/usr/local/mariadb/columnstore/bin/columnstoreSupport -a -p 'user-password'

NOTE: If ssh-keys are set up, enter the word 'ssh' for user-password. If ssh-keys are not set up, enter the Unix user password that is used to ssh login to the system.

Here is an example of running it and the report that is generated:

/usr/local/mariadb/columnstore/bin/columnstoreSupport -a
Get software report data for pm1
Get config report data for pm1
Get log report data for pm1
Get log config data for pm1
Get hardware report data for pm1
Get resource report data for pm1
Get dbms report data for pm1

Columnstore Support Script Successfully completed, files located in columnstoreSupportReport.tar.gz

columnstoreSupportReport.tar.gz is what you would provide to MariaDB personnel or attach to a JIRA.
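If you want to inspect the report yourself before sending it, you can unpack the tarball; the per-module file names follow the naming convention described in the list below.

# unpack the support report into the current directory
tar -xzf columnstoreSupportReport.tar.gz

# list the per-module reports, e.g. pm1_logReport.tar.gz, pm1_configReport.txt, ...
ls -l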

This is what the report consists of:

1. Compressed log files from each module: pm1_logReport.tar.gz. This is the /var/log/mariadb/columnstore directory from pm1, which will contain:

  • system logs for ColumnStore, debug, info, err, warning, and critical
  • the alarm logs, alarm.log and activealarmLog
  • UI command log, uiCommands.log, which are commands entered into mcsadmin

2. Config report from each module: pm1_configReport.txt. NOTE: on a single-server system, the pm1 report will contain more configuration data. On a multi-node separate system, the um1 report will contain more data.

  • /etc/fstab
  • Server processes - ps command info and top
  • System network information
  • System configuration information including storage
  • System status information at the time the report was run
  • System configuration file, Columnstore.xml

3. Hardware report for each module: pm1_hardwareReport.txt

  • OS version
  • CPU information
  • Memory information
  • Storage mount information
  • IP address information, ifconfig

4. Resource report for each module: pm1_resourceReport.txt

  • Shared memory
  • Disk usage
  • DBRM files
  • Active table locks
  • BRM extent map

5. Software report for each module: pm1_softwareReport.txt

  • MariaDB ColumnStore software version

6. DBMS report, from the front-end modules: um1_dbmsReport.txt

  • MariaDB version
  • System catalog and tables
  • MariaDB Columnstore usernames
  • MariaDB ColumnStore variables
  • MariaDB ColumnStore configuration file, my.cnf
  • List of active queries at the time the report was run

7. MariaDB ColumnStore MySQL log file: um1_mysqllogReport.tar.gz

MariaDB ColumnStore logging

MariaDB ColumnStore utilizes the installed system logging tool, whether it's syslog, rsyslog, or syslog-ng. The logs are located in /var/log/mariadb/columnstore. There are these 5 logs:

  • crit.log
  • err.log
  • warning.log
  • info.log
  • debug.log

We also utilize the logrotate tool, and by default we are configured to keep 7 days of log files. They are stored in /var/log/mariadb/columnstore/archive.

The MariaDB ColumnStore logrotate file is located in

/etc/logrotate.d/columnstore
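The shipped file can differ between releases, but a typical logrotate stanza for a 7-day daily rotation into the archive directory looks roughly like the following sketch (illustrative only, not the exact contents of /etc/logrotate.d/columnstore):

/var/log/mariadb/columnstore/*.log {
    daily
    rotate 7
    missingok
    compress
    olddir /var/log/mariadb/columnstore/archive
}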

Also in the /var/log/mariadb/columnstore directory, there are a few other logs that are kept:

  • activeAlarms – List of active alarms currently set on the system
  • alarm.log – list of all the alarms and associated clear alarms
  • mcsadmin.log – list of the mcsadmin commands entered

The MariaDB ColumnStore process corefiles are stored in /var/log/mariadb/columnstore/corefiles, provided core file dumping is enabled on the system.

MariaDB ColumnStore log files and what goes in them

  • Crit, err, and warning are used to log problems reported by a MariaDB ColumnStore process.
  • Info will have logs showing high-level actions that are going on in the system. During a stopsystem/startsystem, it will show the high-level commands of the processes/modules being stopped and started. The bulk-load (cpimport) tool also logs its high-level actions there.
  • Debug will have the lower-level actions from the MariaDB ColumnStore processes, which include queries. So all the mysql query commands will be logged here (see the example below).
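When chasing a problem, a quick pass over the crit/err/warning logs is usually the fastest starting point; for example (the search string is only an illustration):

cd /var/log/mariadb/columnstore

# look at the most recent problems reported by ColumnStore processes
tail -n 50 crit.log err.log warning.log

# search all five logs for a specific keyword, e.g. a table name
grep -i "mytable" *.log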

MariaDB ColumnStore-MySQL logs are stored here:

/usr/local/mariadb/columnstore/mysql/db/'server-name'.err

NOTE: Other log files will be written to the log directory, as well as to the /tmp directory, by MariaDB ColumnStore processes during certain operations. So you will see a few other logs show up in the log directory besides these.

MariaDB ColumnStore log files and how to setup

The MariaDB ColumnStore log files setup is done as part of the post-install/postConfigure installation process. If for some reason the MariaDB ColumnStore log files aren't being generated, then there might have been some install/setup error that occurred.

If the logs aren't being generated after an install was done, run the following command to try to set them up.

Root install:

/usr/local/mariadb/columnstore/bin/syslogSetup.sh install

Non-root install:

$HOME/mariadb/columnstore/bin/syslogSetup.sh install

To test the logs, run the following and check the directory

# mcsadmin getlogconfig

# ls -ltr /var/log/mariadb/columnstore

Check whether any of these logs are now showing up:

# ls -ltr
total 172
-rwxr-xr-x 1 syslog adm 398 Oct 12 18:27 warning.log
-rwxr-xr-x 1 syslog adm 398 Oct 12 18:27 err.log
-rwxrwxrwx 1 syslog adm 8100 Oct 12 18:27 info.log
-rwxrwxrwx 1 syslog adm 139975 Oct 12 18:27 debug.log

If so, then logging is now working...

Process STDOUT/STDERR logging

MariaDB ColumnStore processes have built-in STDOUT/STDERR logging that can be enabled. This can be used for additional debugging of issues, and it is enabled on a process-by-process level. Here is an example of how to enable and disable it. In this example, locate the process to enable, such as DDLProc, then enter the process name and module type on the 'setprocessconfig' command.

NOTE: run from PM1

mcsadmin> getprocessconfig
getprocessconfig   Tue Sep 26 19:21:14 2017

Process Configuration

Process #1 Configuration information
ProcessName = ProcessMonitor
ModuleType = ChildExtOAMModule
ProcessLocation = /home/mariadb-user/mariadb/columnstore/bin/ProcMon
BootLaunch = 0
LaunchID = 1
RunType = LOADSHARE
LogFile = off

Process #2 Configuration information
ProcessName = ProcessManager
ModuleType = ParentOAMModule
ProcessLocation = /home/mariadb-user/mariadb/columnstore/bin/ProcMgr
BootLaunch = 1
LaunchID = 2
RunType = ACTIVE_STANDBY
LogFile = off

Process #3 Configuration information
ProcessName = DBRMControllerNode
ModuleType = ParentOAMModule
ProcessLocation = /home/mariadb-user/mariadb/columnstore/bin/controllernode
ProcessArg1 = /home/mariadb-user/mariadb/columnstore/bin/controllernode
ProcessArg2 = fg
BootLaunch = 2
LaunchID = 4
DepModuleName1 = @
DepProcessName1 = ProcessManager
RunType = SIMPLEX
LogFile = off

Process #4 Configuration information
ProcessName = ServerMonitor
ModuleType = ChildOAMModule
ProcessLocation = /home/mariadb-user/mariadb/columnstore/bin/ServerMonitor
ProcessArg1 = /home/mariadb-user/mariadb/columnstore/bin/ServerMonitor
BootLaunch = 2
LaunchID = 6
RunType = LOADSHARE
LogFile = off

Process #5 Configuration information
ProcessName = DBRMWorkerNode
ModuleType = ChildExtOAMModule
ProcessLocation = /home/mariadb-user/mariadb/columnstore/bin/workernode
ProcessArg1 = /home/mariadb-user/mariadb/columnstore/bin/workernode
ProcessArg2 = DBRM_Worker
ProcessArg3 = fg
BootLaunch = 2
LaunchID = 7
RunType = LOADSHARE
LogFile = off

Process #6 Configuration information
ProcessName = DecomSvr
ModuleType = pm
ProcessLocation = //home/mariadb-user/mariadb/columnstore/bin/DecomSvr
BootLaunch = 2
LaunchID = 15
RunType = LOADSHARE
LogFile = off

Process #7 Configuration information
ProcessName = PrimProc
ModuleType = pm
ProcessLocation = /home/mariadb-user/mariadb/columnstore/bin/PrimProc
BootLaunch = 2
LaunchID = 20
RunType = LOADSHARE
LogFile = off

Process #8 Configuration information
ProcessName = ExeMgr
ModuleType = pm
ProcessLocation = /home/mariadb-user/mariadb/columnstore/bin/ExeMgr
BootLaunch = 2
LaunchID = 30
DepModuleName1 = pm*
DepProcessName1 = PrimProc
RunType = LOADSHARE
LogFile = off

Process #9 Configuration information
ProcessName = WriteEngineServer
ModuleType = pm
ProcessLocation = /home/mariadb-user/mariadb/columnstore/bin/WriteEngineServer
BootLaunch = 2
LaunchID = 40
RunType = LOADSHARE
LogFile = off

Process #10 Configuration information
ProcessName = DDLProc
ModuleType = pm
ProcessLocation = /home/mariadb-user/mariadb/columnstore/bin/DDLProc
BootLaunch = 2
LaunchID = 50
DepModuleName1 = pm*
DepProcessName1 = WriteEngineServer
DepModuleName2 = *
DepProcessName2 = DBRMWorkerNode
DepModuleName3 = *
DepProcessName3 = ExeMgr
RunType = SIMPLEX
LogFile = off

Process #11 Configuration information
ProcessName = DMLProc
ModuleType = pm
ProcessLocation = /home/mariadb-user/mariadb/columnstore/bin/DMLProc
BootLaunch = 2
LaunchID = 51
DepModuleName1 = pm*
DepProcessName1 = WriteEngineServer
DepModuleName2 = *
DepProcessName2 = DBRMWorkerNode
DepModuleName3 = @
DepProcessName3 = DDLProc
RunType = SIMPLEX
LogFile = off

Process #12 Configuration information
ProcessName = mysqld
ModuleType = pm
ProcessLocation = /home/mariadb-user/mariadb/columnstore/mysql/libexec/mysqld
BootLaunch = 0
LaunchID = 100
RunType = LOADSHARE
LogFile = off

mcsadmin> setprocessconfig DDLProc pm LogFile on
setprocessconfig   Tue Sep 26 19:22:00 2017

   Successfully set LogFile = on

mcsadmin> shutdownsystem y
shutdownsystem   Tue Sep 26 19:23:59 2017

This command stops the processing of applications on all Modules within the MariaDB ColumnStore System

   Checking for active transactions

   Stopping System...
   Successful stop of System 

   Shutting Down System...
   Successful shutdown of System 

mcsadmin> startsystem
startsystem   Tue Sep 26 19:24:36 2017
startSystem command, 'columnstore' service is down, sending command to
start the 'columnstore' service on all modules


   System being started, please wait..................
   Successful start of System 

mcsadmin> 

In the log directory, there will be the following 2 files; all the STDOUT/STDERR from the process will be logged here:

pwd
/var/log/mariadb/columnstore

ll DDLProc.*
-rw-r--r-- 1 mariadb-user mariadb-user  0 Sep 26 19:25 DDLProc.err
-rw-r--r-- 1 mariadb-user mariadb-user 34 Sep 26 19:25 DDLProc.out
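To watch the process output live while reproducing a problem, you can tail both files (DDLProc is just the example process enabled above):

# follow the STDOUT and STDERR logs for the enabled process
tail -f /var/log/mariadb/columnstore/DDLProc.out /var/log/mariadb/columnstore/DDLProc.err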

To disable, set the LogFile setting back to off and do the shutdownsystem/startsystem again:

setprocessconfig DDLProc pm LogFile off

Crash trace files

MariaDB ColumnStore 1.0.13 / 1.1.3 onwards includes a special crash handler which will log details of a crash from the main UM and PM daemons. These can be found in:

/var/log/mariadb/columnstore/trace

The filenames will be in the form `<processName>.<processID>.log`. These are similar to the crash traces that can be found in the MariaDB server log files if MariaDB server crashes.
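To find the most recent crash trace, list the directory by modification time and view the newest file; the file name shown is only an example of the <processName>.<processID>.log pattern.

# newest trace files appear at the bottom of the listing
ls -ltr /var/log/mariadb/columnstore/trace

# view an individual trace, e.g. for a hypothetical PrimProc crash
less /var/log/mariadb/columnstore/trace/PrimProc.12345.log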

Enable/Disable Core Files

Since core files are very large (1 GB or more) and can take up a lot of disk space, core file generation for MariaDB ColumnStore platform processes is disabled by default.

When enabled, MariaDB ColumnStore process corefiles are placed here:

/var/log/mariadb/columnstore/corefiles/

  • Enable from pm1
# ma shutdownsystem
# /usr/local/mariadb/columnstore/bin/setConfig Installation CoreFileFlag y
# ma startsystem
  • Disable from pm1
# ma shutdownsystem
# /usr/local/mariadb/columnstore/bin/setConfig Installation CoreFileFlag n
# ma startsystem
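To confirm the current setting, and that the operating system itself allows core dumps, something like the following can be run from pm1. This is a sketch: getConfig is assumed to be the read counterpart of setConfig in the same bin directory, and the ulimit check applies to the user that runs the ColumnStore processes.

# show the current ColumnStore core file setting (y or n)
/usr/local/mariadb/columnstore/bin/getConfig Installation CoreFileFlag

# check the OS core file size limit for the current user ('unlimited' allows full cores)
ulimit -c

# any cores produced by ColumnStore processes should land here
ls -l /var/log/mariadb/columnstore/corefiles/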

MariaDB ColumnStore database files

MariaDB ColumnStore has 3 sets of database files. These files are always backed up and restored together as part of the backup and restore process.

  • MariaDB ColumnStore-MySQL schemas - /usr/local/mariadb/columnstore/mysql/db/*
  • MariaDB ColumnStore Database - /usr/local/mariadb/columnstore/dataX/000.dir
  • MariaDB ColumnStore DBRM files - /usr/local/mariadb/columnstore/data1/systemFiles/dbrm/*

MariaDB ColumnStore Database: The X represents the DBRoot ID #. A DBRoot is the file directory containing the database files. We generally assign 1 DBRoot per Performance Module. This DBRoot can point to a local disk storage area or be mounted to external disk. So for a single-server setup, we would have 'data1' as the DBRoot. If we had a system with 2 Performance Modules, then we would have 'data1' on PM1 and 'data2' on PM2.

MariaDB ColumnStore DBRM files: This is where the Extent Map and Versioning files are located; these make up the DBRM files. There are 3 copies of the files that are kept: one is the current active set and the other 2 are backups. The file name of the current active copy is recorded in this file:

/usr/local/mariadb/columnstore/data1/systemFiles/dbrm/BRM_saves_current

The Extent Map is loaded into shared memory on each of the nodes during system startup. The version in shared memory on the PM1 node is the main copy; changes are applied to that version in memory. Then changes are made to the disk version, which only exists on PM1 disk storage, and a copy of those changes is sent out to the other nodes so that their in-memory copies are updated.

NOTE: The following utility can be used to dump the internal memory copy of the Extent Map: /usr/local/mariadb/columnstore/bin/editem. There are a few options with this command: -i dumps a raw copy and -d dumps a formatted copy.
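For example, to see which DBRM copy is currently active and to take a quick look at the in-memory Extent Map (run from PM1; -d is the formatted dump mentioned in the NOTE above):

# show the file name of the current active DBRM copy
cat /usr/local/mariadb/columnstore/data1/systemFiles/dbrm/BRM_saves_current

# dump a formatted copy of the Extent Map from shared memory
/usr/local/mariadb/columnstore/bin/editem -d | head -50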

MariaDB ColumnStore utilities

Here are a few of the common utilities that are used to view and troubleshoot issues. All of these commands are located in /usr/local/mariadb/columnstore/bin/

These utilities are used to view or set system variables. They should be run on the Active OAM Parent Module, which is generally Performance Module #1.

  • editem – used to view the Extent Map in internal memory, discussed in the previous section
  • configxml.sh – used to get and set parameters from the system config file, ColumnStore.xml
    • ./configxml.sh getconfig ExeMgr1 Port
    • ./configxml.sh setconfig ExeMgr1 Port 8601
  • dbrmctl – used to view or change the DBRM status
    • Example: display the current DBRM status
      • ./dbrmctl status
    • Example: if the DBRM is in a read-only state, this will unlock it
      • ./dbrmctl resume

These utilities are used to view or clear database table locks. They should be run on the Active OAM Parent Module, which is generally Performance Module #1.

  • viewtablelock – will display which tables are locked. There might be times when a DML command fails and leaves a table in a locked state. You can run this command to find which tables are locked, and then use the 'cleartablelock' command
  • cleartablelock – used to clear a table lock. As explained above, it can be used to clear a lock on a table that was left set by a failed command (see the example below)
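A minimal sketch of that workflow, run from the Active OAM Parent Module; the lock ID passed to cleartablelock is whatever viewtablelock reports, and 1 below is only a placeholder:

cd /usr/local/mariadb/columnstore/bin

# list any table locks currently held, including their lock IDs
./viewtablelock

# clear a specific lock using the lock ID reported above (1 is a placeholder)
./cleartablelock 1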

These are utilities that would be run on all nodes in the system

  • clearShm – used to clear the shared internal memory; used at times after a system shutdown command is done just to make sure the memory is cleared

Tables locks and clearing

A table lock might be left set due to some failure while processing a DML/DDL command. Normally this lock can be cleared with the utility mentioned above, cleartablelock. But in cases where it doesn't clear the lock, it can also be cleared by restarting the Active DMLProc on the system. This causes DMLProc to perform the rollback processing that will clear any table locks.

1. Run this command to find the Active DMLProc:

# mcsadmin getsystemi

2. Run the following command to restart the DMLProc:

# mcsadmin restartProcess DMLProc xxx (xxx is probably um1 or pm1, based on the system)

When the status of DMLProc goes to ACTIVE from BUSY_INIT (meaning it is performing rollbacks), check to see if the lock still exists.
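You can watch for that transition with getprocessstatus, which lists the state of every ColumnStore process; repeat the command until the DMLProc line shows ACTIVE.

# check the per-process states; look for the DMLProc line
mcsadmin getprocessstatus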

Multi-node install problems and how to diagnose

Once you install the packages on the initial server, pm1, run post-install and postConfigure.

If it fails in the remote server install section, check the install logs in /tmp to see why the failure occurred. It could be related to these issues:

1. User password or ssh-keys not set up, so the login fails

2. Dependent package isn't installed on a remote server

3. Incompatible OSes between nodes; all have to be the same

If you get to the point where it says "Starting system processes" but it seems to hang or not return, here are some things to check:

1. Check the locale setting on all servers and make sure they are all the same

2. On pm1, create the aliases if you haven't already, then check the process status:

  1. . /usr/local/mariadb/columnstore/bin/columnstoreAlias
  2. Run mcsadmin getsysteminfo and check if ProcMon is ACTIVE on all configured servers. If not, check the log files on the associated server to see what error ProcMon is reporting. Also make sure ProcMgr is ACTIVE on pm1.

The logs are located in:

/var/log/mariadb/columnstore

Generally, when ProcMon/ProcMgr isn't active, it's because of one of these issues:

1. If using external storage, a PM's /etc/fstab isn't set up

2. A messaging issue between the servers that is causing the ProcMons and ProcMgr to fail to communicate. Make sure all server firewalls are disabled, along with SELinux.
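A few quick commands to verify the locale, firewall, and SELinux conditions above on each server (the systemctl/firewalld names assume a systemd-based distribution; use the equivalents for your OS):

# locale must match across all servers
locale

# firewall should be disabled (or the ColumnStore ports opened)
systemctl status firewalld

# SELinux should report Disabled or Permissive
getenforce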

Add Module install problems and how to diagnose

There are a number of reasons why an addModule command might fail: missing dependent packages, password or ssh key not set up, etc. Here are some things to investigate when this 'mcsadmin' command fails.

1. Check the log files on the local node where the command is running. Generally an entry in the error, warning, or critical log will be reported when a failure occurs.

2. Depending on how far the addModule command got, another log will be generated in /tmp. Look for a log file with binary_installer or user/performance_installer in its name. The installer script echoes back the commands it runs, so it will flag an issue in there (see the sketch after this list).

3. Also make sure you can log into the new server/instance from the local one. Not being able to log in would cause a failure.

4. Also, depending on how far the command got, it might have added an entry in the system configuration file for the new module. Check that: if you were adding pm2 and it failed, but it still shows up in the system configuration via 'getsystemn', you would need to remove that module before trying the addModule command again.

5. If you get a File Open error returned from the 'addModule' command, that means it could not locate the MariaDB ColumnStore rpm/deb/binary package in the $HOME directory, i.e. /root for a root user install. The logic takes the packages from there and pushes them to the new server.
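A sketch of the checks from items 2 and 5 above; the exact log and package file names vary by install type and release, so the globs and names below are only illustrative:

# look for the remote installer logs written by addModule / postConfigure
ls -ltr /tmp/*installer*

# review the end of the log for the failing command (file name is an example)
tail -n 40 /tmp/binary_installer.log

# confirm the ColumnStore package is present in $HOME (root install shown; pattern is illustrative)
ls -l /root/mariadb-columnstore*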

postConfigure install problems and how to diagnose

The installation script, postConfigure, is run at install and upgrade times. The first part of the script takes information from the user and sets up the system configuration, which means updating the Columnstore.xml and ProcessConfig.xml configuration files.

The second part of the script performs a remote install on all of the other servers in the system, for a multi-node install configuration. The installations of the remote nodes are done simultaneously and the remote install logs are placed in /tmp on 'pm1', e.g. "pm1_installer.log". The actual log file name will differ based on whether you are doing an rpm, debian, or binary install. So if postConfigure reports that a failure occurred during the remote server install phase, you can look at these logs in /tmp. The main reasons why this might fail:

1. ssh access to the remote node from pm1 failed, password or ssh setup issues

2. A missing dependency package on the remote node

The third part of postConfigure is the starting up of the system, which consists of starting the 'columnstore' service script on each node. If a failure happens during this phase, do the following to help determine the cause:

1. First, you might need to run this script to get the 'mcsadmin' alias command defined:

# . /usr/local/mariadb/columnstore/bin/columnstoreAlias

2. Get the system status:

# mcsadmin getsystemi

3. Here are some things that can point to why the system didn't come up:

  a. Make sure that all ProcMon processes are ACTIVE on all nodes. If they aren't, here are some of the reasons why they might not be:
    i. A firewall is enabled on the 'pm1' node or the installing node; check that.
    ii. ProcMon might have run into another issue at startup, like failing to mount an external disk. Check the log files on the remote server where ProcMon failed to go ACTIVE.
  b. If it reports a module status of FAILED, then check the log files from that module.

4. Also check the log files from the local 'pm1' module.

startSystem problems and how to diagnose

This assumes that the system has made it successfully through a postConfigure install or upgrade. At some point, you might need to do a stopsystem or shutdownsystem for maintenance or some other reason, and then do a startsystem. If any failures occur with the startSystem command, check the following:

1. Get the system status:

# mcsadmin getsystemi

2. Here are some things that can point to why the system didn't come up:

  a. Make sure that all ProcMon processes are ACTIVE on all nodes. If they aren't, here are some of the reasons why they might not be:
    i. A firewall is enabled on the 'pm1' node or the installing node; check that.
    ii. ProcMon might have run into another issue at startup, like failing to mount an external disk. Check the log files on the remote server where ProcMon failed to go ACTIVE.
  b. If any of the DBRM processes, Controller or Worker nodes, are in a FAILED state, the most likely reason is that there is an issue with the DBRM files. These files are loaded from disk into shared memory; if this load fails, the DBRM process is marked as FAILED. If this is the case, then please contact MariaDB Customer Support. The DBRM files contain the Extent Map and other metadata related to the MariaDB ColumnStore database files.
  c. If it reports a module status of FAILED, then check the log files from that module.

3. Also check the log files from the local 'pm1' module.

System in DBRM Read-Only Mode

The system can go into DBRM Read-Only Mode due to these conditions: a failure while doing a DDL/DML command, a network problem between servers where the DBRM changes couldn't get distributed to the other servers from Performance Module 1, and some failover scenarios. It will be shown by the following alarm. This alarm, along with all critical alarms, will be displayed when the user logs into the ColumnStore Admin Console 'mcsadmin'.

AlarmID = 31
Brief Description = DBRM_READ_ONLY
Alarm Severity = CRITICAL
Time Issued = Wed Sep 13 14:32:37 2017
Reporting Module = pm1
Reporting Process = DBRMControllerNode
Reported Device = System

If the system ever gets into DBRM Read-Only Mode, it's best resolved by doing a restart of the system from the 'pm1' module:

# mcsadmin restartsystem y

DBRM Read-Only Mode means that changes can't be made to the MariaDB ColumnStore Database while it's in this state. Queries can still be processed.
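Before or after the restart, the DBRM state can also be checked with the dbrmctl utility described earlier (run from pm1); the restartsystem command above remains the recommended fix.

# show the current DBRM state (read-only or read-write)
/usr/local/mariadb/columnstore/bin/dbrmctl status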

Non-Root System, PrimProc Process fails to start up

For non-root systems, the user file limits are required to be set as shown in the Preparing guide. So if you have a non-root install that fails to start and 'mcsadmin getsystemi' shows that the PrimProc process is in a FAILED state, double-check the user file limit settings on each node.

https://mariadb.com/kb/en/library/preparing-for-columnstore-installation/#set-the-user-file-limits-by-root-user
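A quick way to check the limits in effect for the non-root install user on each node is shown below; the 65536 value and the mariadb-user name are only examples, use the values given in the Preparing guide for your release.

# run as the install user on each PM/UM node
ulimit -n        # open files limit
ulimit -u        # max user processes

# limits are typically raised via /etc/security/limits.conf entries such as:
#   mariadb-user  hard  nofile  65536
#   mariadb-user  soft  nofile  65536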

Create table error - Error occurred when calling system catalog

If you are having a problem creating a table after a new install is performed and you get the error "Error occurred when calling system catalog", chances are the System Catalog didn't get created by postConfigure. The call to create it happens at the very end of postConfigure, so it's possible that postConfigure didn't successfully complete or there was an error when trying to create it.

Run the following command from PM1 to create the System Catalog:

/usr/local/mariadb/columnstore/bin/dbbuilder 7

NOTE: This example is assuming it's a root user install
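After dbbuilder completes, a quick sanity check is to create and drop a small ColumnStore table from the MariaDB console, using the mcsmysql alias from the first section (or the full mysql client path for your install). The database and table names below are just examples.

# from pm1, create and drop a test ColumnStore table (names are placeholders)
mcsmysql -e "CREATE DATABASE IF NOT EXISTS cs_check; CREATE TABLE cs_check.t1 (id INT) ENGINE=columnstore; DROP TABLE cs_check.t1; DROP DATABASE cs_check;"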
