Comments - Issues with MariaDB ColumnStore 1.0.4
It seems that the system can now be started more easily than before. However, the system cannot be stopped successfully; both restartSystem and stopSystem fail.
The error messages are as follows:
ProcessMonitor[29575]: 24.095830 |0|0|0| E 18 CAL0000: EXCEPTION ERROR on setProcessStatus: Caught unknown exception!
controllernode[29910]: 37.135918 |0|0|0| E 29 CAL0000: DBRM: error: SessionManager::getSystemState() failed (network)
ProcessManager[29910]: 36.016391 |0|0|0| E 17 CAL0000: line: 1211 STOPSYSTEM: Failed, timeout waiting for module to stop
By the way, can you help investigate the issue of AlarmConfig.xml mentioned in my first reply?
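The three error lines quoted above share a common shape, so they can be split into fields mechanically when comparing logs from several servers. Here is a minimal Python sketch, not part of ColumnStore itself. The field names (level, subsystem, code) are my own guesses at what each column means, and the pattern is inferred only from these three lines, so other log lines may not match.

```python
import re
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    process: str    # e.g. ProcessMonitor
    pid: int        # number inside the square brackets
    level: str      # single letter, "E" in all three samples (assumed: severity)
    subsystem: int  # number after the level (assumed meaning)
    code: str       # e.g. CAL0000
    message: str    # everything after the code


# Pattern inferred from the quoted lines only; treat it as an assumption.
LINE_RE = re.compile(
    r"^(?P<process>\w+)\[(?P<pid>\d+)\]:\s+"
    r"\d+\.\d+\s+\|\S*\|\s+"
    r"(?P<level>[A-Z])\s+(?P<subsystem>\d+)\s+"
    r"(?P<code>[A-Z]+\d+):\s+(?P<message>.*)$"
)


def parse_line(line: str) -> Optional[LogEntry]:
    """Return the parsed fields, or None if the line has a different shape."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return LogEntry(
        m["process"], int(m["pid"]), m["level"],
        int(m["subsystem"]), m["code"], m["message"],
    )


SAMPLE = [
    "ProcessMonitor[29575]: 24.095830 |0|0|0| E 18 CAL0000: EXCEPTION ERROR on setProcessStatus: Caught unknown exception!",
    "controllernode[29910]: 37.135918 |0|0|0| E 29 CAL0000: DBRM: error: SessionManager::getSystemState() failed (network)",
    "ProcessManager[29910]: 36.016391 |0|0|0| E 17 CAL0000: line: 1211 STOPSYSTEM: Failed, timeout waiting for module to stop",
]

for raw in SAMPLE:
    entry = parse_line(raw)
    if entry is not None:
        print(entry.process, entry.pid, entry.message)
```

Grouping the parsed entries by process name makes it easier to see which daemon reported the failure first on each server.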
We are not aware of any issues that could cause AlarmConfig.xml to be emptied. The localedef issue causes some odd behavior because the install is only partial, so we wanted to rule that out first (we will update the documentation, and the postCfg script is being updated to log better information when this step has not been done). Regarding the stop problem, it would be helpful to check whether there is anything in the logs on the other servers. Also, the localedef command needs to be performed on all servers, not just the one where the install was run.
For starters, you shouldn't be using '/usr/local/mariadb/columnstore/bin/columnstore restart' to restart the system. That only restarts the columnstore service on the local node. As documented in the KB guides, you will want to use the mcsadmin console for these commands, and they are best run from pm1, which is where the install took place.
mcsadmin shutdownsystem y: stops all processes on all nodes.
mcsadmin startsystem: starts all processes. If ssh keys are not set up, you need to provide the user password as the third argument when this is run after a shutdown; the password is not required when running after a stopsystem.
mcsadmin stopsystem: stops all the DB processes on all nodes, leaving Proc-Mgt running.
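If you want to script a full stop and start from pm1, something along these lines might work. It is only a sketch: it assumes mcsadmin is on the PATH and that ssh keys are already set up (so no password is passed to startsystem), and the helper names are made up for illustration. By default it only prints the commands and runs nothing.

```python
import shlex
import subprocess
from typing import List


def mcsadmin_argv(command: str, *args: str) -> List[str]:
    """Build the argument list for one mcsadmin invocation."""
    return ["mcsadmin", command, *args]


def full_restart_plan() -> List[List[str]]:
    """Shut down every process on every node, then start everything again.

    Assumes ssh keys are set up, so startsystem is given no password.
    """
    return [
        mcsadmin_argv("shutdownsystem", "y"),
        mcsadmin_argv("startsystem"),
    ]


def run_plan(plan: List[List[str]], dry_run: bool = True) -> List[str]:
    """Return the command lines; execute them only when dry_run is False."""
    shown = []
    for argv in plan:
        shown.append(shlex.join(argv))
        if not dry_run:
            # check=True stops the sequence if a command exits non-zero.
            subprocess.run(argv, check=True)
    return shown


# Dry run: prints the two commands without executing them.
for line in run_plan(full_restart_plan()):
    print(line)
```

Passing dry_run=False would actually run the commands, so only do that on pm1 and only once the printed sequence looks right.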
Here is some additional information to help diagnose the issue. Once you install the packages on the initial server, pm1, run post-install and postConfigure. If you get to the point where it says Starting system processes but it seems to hang or never return, here are some things to check:
on pm1, create the alias if you haven't already
logs are located in:
/var/log/mariadb/columnstore
Generally, when ProcMon/ProcMgr isn't active, it's because of one of these issues:
1. If you are using external storage, /etc/fstab isn't set up on a pm.
2. A messaging issue between the servers is causing the ProcMons and ProcMgr to fail to communicate. Make sure all server firewalls are disabled, along with SELinux.
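For the second point, one quick way to see whether a firewall is in the way is to try a plain TCP connection from pm1 to each of the other servers. The sketch below is only an illustration: the host names and the port number you pass in are placeholders to be taken from your own configuration, not values from the ColumnStore documentation.

```python
import socket
from typing import Iterable, List


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def unreachable(hosts: Iterable[str], port: int, timeout: float = 3.0) -> List[str]:
    """Return the hosts that did not accept a connection on the given port."""
    return [host for host in hosts if not can_connect(host, port, timeout)]
```

For example, unreachable(["pm2", "pm3"], port) would list the servers that could not be reached, where pm2, pm3 and port are hypothetical. A successful connection only shows that something is listening and that no firewall blocked it; it does not prove that ProcMon itself is healthy.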
Thanks for your feedback.
I reinstalled MariaDB ColumnStore 1.0.4 from scratch after resetting the locale on all worker nodes. The installation now finishes successfully and the system status is Active, so it does not seem to be a firewall issue. In addition, I used local disks for the installation.
I have copied the ssh key from PM1 to the other nodes. I also need to copy the ssh key from UM1 to UM2 so that UM1 can configure the replication between UM1 and UM2.
My configuration is 2UM3PM. I can reproduce the zero-size AlarmConfig.xml by restarting the processes on a particular PM node with the command "/usr/local/mariadb/columnstore/bin/columnstore restart". You are right, I should use mcsadmin to manage the whole system, but I have to try other workarounds when I fail to stop the system. I have checked the firewall settings and stopped the iptables service, and I still encounter the problem when stopping the system. I observed that some processes cannot be stopped while the status of the PM node is failed. So please help investigate the stop issue. Thank you very much.
I've created a Jira bug, https://jira.mariadb.org/browse/MCOL-396, to track the AlarmConfig issue with a direct pm server restart.
A similar bug, MCOL-404, was submitted. I used the non-root installation guide and encountered the same issue. Thanks.
Yes, this was a miss in the 1.0.4 release. This will be fixed in our next RC release 1.0.5. Thanks for testing!
Ok. Thank you very much.