DMLProc fails due to bulkrollback after aborted cpimport

792 views
Skip to first unread message

Nico Aymet

unread,
Jan 30, 2018, 9:01:26 AM1/30/18
to MariaDB ColumnStore
Hi to all,
I have an issue, some bulkimport via cpimport have been aborted due to network leak leaving some table locked.
When I try to restart the DMLProc on pm1, the DMLProc starts the "rollbackAll" routine trying to make a "bulkrollback", but I receive "IDB-2006: 'select schema,tablename,columnname from syscolumn where objectid=7196 --colName/EC7196' does not exist in Columnstore. and tablelock is not released."
After that the DBRM enter in read-only status and the DMLProc enter in FAILED status.

I tried to restart the DMLProc with the following comands:
/usr/local/mariadb/columnstore/bin/save_brm
/usr/local/mariadb/columnstore/bin/dbrmctl reload
/usr/local/mariadb/columnstore/bin/DMLProc

I already tried to unlock table with:
/usr/local/mariadb/columnstore/bin/cleartablelock 1
but I receive:
System is not ready.  Verify that InfiniDB is up and ready before running this program

Some one has idea how to solve it?
Maybe a solution can be also remove completely the locked table from calpontsys.systable, recreate it and reload the data, but how to do it if DMLProc fails?

-------------------------------------

Here the output of /usr/local/mariadb/columnstore/bin/viewtablelock
 There are 3 table locks

st: 0 TupleBPS::sendPrimitiveMessages() caught an exception: IDB-2034: At least one DBRoot required for that query is offline. 
st: 0 TupleBPS::sendPrimitiveMessages() caught an exception: IDB-2034: At least one DBRoot required for that query is offline. 
st: 0 TupleBPS::sendPrimitiveMessages() caught an exception: IDB-2034: At least one DBRoot required for that query is offline. 
  Table  LockID  Process  PID    Session   Txn  CreationTime              State    DBRoots  
         1       DMLProc  31728  BulkLoad  n/a  Tue Jan 30 03:17:14 2018  LOADING  1,3      
         2       DMLProc  31728  BulkLoad  n/a  Tue Jan 30 03:21:27 2018  LOADING  1,3      
         3       DMLProc  31728  BulkLoad  n/a  Tue Jan 30 03:21:27 2018  LOADING  1,3

Here the debug log after restart DMLProc:
Jan 30 13:40:27 pm1-server ProcessMonitor[1606]: 27.752781 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DMLProc State = BUSY_INIT
Jan 30 13:40:27 pm1-server ProcessMonitor[1606]: 27.752844 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DMLProc State = BUSY_INIT PID = 30276
Jan 30 13:40:27 pm1-server ProcessMonitor[1606]: 27.752871 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set System State = BUSY_INIT
Jan 30 13:40:27 pm1-server DMLProc[30276]: 27.754986 |0|0|0| I 20 CAL0002: DMLProc starts rollbackAll.
Jan 30 13:40:27 pm1-server ProcessMonitor[1606]: 27.755943 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DMLProc State = ROLLBACK_INIT
Jan 30 13:40:27 pm1-server ProcessMonitor[1606]: 27.755995 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DMLProc State = ROLLBACK_INIT PID = 30276
Jan 30 13:40:28 pm1-server oamcpp[30276]: 28.780085 |0|0|0| E 08 CAL0000: OamCache::checkReload shows state for pm1 as FAILED
Jan 30 13:40:28 pm1-server DMLProc[30276]: 28.782740 |0|0|0| I 20 CAL0002: DMLProc will rollback 3 tables.
Jan 30 13:40:28 pm1-server joblist[31340]: 28.790924 |0|0|0| W 05 CAL0000: forcing reload of columnstore.xml for dbRootConnectionMap
Jan 30 13:40:29 pm1-server oamcpp[31340]: 29.793600 |0|0|0| E 08 CAL0000: OamCache::checkReload shows state for pm1 as FAILED
Jan 30 13:40:29 pm1-server joblist[31340]: 29.795667 |0|0|0| W 05 CAL0000: dbroot still not in dbRootConnectionMap
Jan 30 13:40:29 pm1-server joblist[31340]: 29.795826 |2147483648|0|0| C 05 CAL0000: st: 0 TupleBPS::sendPrimitiveMessages() caught an exception: IDB-2034: At least one DBRoot required for that query is offline.
Jan 30 13:40:29 pm1-server DMLProc[30276]: 29.796407 |0|0|0| W 20 CAL0002: DMLProc started bulkrollback on table OID 7196 and table lock id 1 failed:IDB-2034: At least one DBRoot required for that query is offline.  and tablelock is not released.
Jan 30 13:40:29 pm1-server writeengine[31340]: 29.807400 |0|0|0| I 19 CAL0084: ClearTableLock: Starting bulk rollback for table test-db.table1 (OID-5519); lock-2; initiated by DMLProc.
Jan 30 13:40:29 pm1-server writeengine[31340]: 29.807589 |0|0|0| I 19 CAL0072: ClearTableLock: Rollback extents   for table test-db.table1 (OID-5519).
Jan 30 13:40:29 pm1-server writeengine[31340]: 29.813401 |0|0|0| I 19 CAL0073: ClearTableLock: Rollback extents   for table test-db.table1 (OID-5519), column 5520. DBRoot-1 (column extent)
Jan 30 13:40:29 pm1-server writeengine[31340]: 29.813459 |0|0|0| I 19 CAL0074: ClearTableLock: Rollback extentMap for table test-db.table1 (OID-5519), column 5520. Restoring HWM column extent: dbRoot-1; part#-0; seg#-1; hwm-3542
Jan 30 13:40:29 pm1-server writeengine[31340]: 29.832501 |0|0|0| I 19 CAL0075: ClearTableLock: Rollback dbfile    for table test-db.table1 (OID-5519), column 5520. Reinit HWM compressed column extent in db file: dbRoot-1; part#-0; seg#-1; rawOffset(bytes)-29024256; rawFreeBlks-4649
Jan 30 13:40:29 pm1-server writeengine[31340]: 29.851422 |0|0|0| I 19 CAL0075: ClearTableLock: Rollback dbfile    for table test-db.table1 (OID-5519), column 5520. HWM compressed column file: dbRoot-1; part#-0; seg#-1; restoredChunk-1351680 bytes; truncated to 67117056 bytes
Jan 30 13:40:30 pm1-server writeengine[31340]: 30.381591 |0|0|0| I 19 CAL0073: ClearTableLock: Rollback extents   for table test-db.table1 (OID-5519), column 5521. DBRoot-1 (column extent)
Jan 30 13:40:30 pm1-server writeengine[31340]: 30.381670 |0|0|0| I 19 CAL0074: ClearTableLock: Rollback extentMap for table test-db.table1 (OID-5519), column 5521. Restoring HWM column extent: dbRoot-1; part#-0; seg#-1; hwm-1771
Jan 30 13:40:30 pm1-server writeengine[31340]: 30.794350 |0|0|0| I 19 CAL0075: ClearTableLock: Rollback dbfile    for table test-db.table1 (OID-5519), column 5521. Reinit HWM compressed column extent in db file: dbRoot-1; part#-0; seg#-1; rawOffset(bytes)-14516224; rawFreeBlks-2324
Jan 30 13:40:30 pm1-server DMLProc[30276]: 30.813012 |0|0|0| W 20 CAL0002: DMLProc started bulkrollback on table OID 5519 and table lock id 2 failed:IDB-2034: At least one DBRoot required for that query is offline.  and tablelock is not released.
Jan 30 13:40:31 pm1-server writeengine[31340]: 31.036836 |0|0|0| I 19 CAL0075: ClearTableLock: Rollback dbfile    for table test-db.table1 (OID-5519), column 5521. HWM compressed column file: dbRoot-1; part#-0; seg#-1; restoredChunk-212992 bytes; truncated to 33562624 bytes
Jan 30 13:40:32 pm1-server DMLProc[30276]: 32.167926 |0|0|0| W 20 CAL0002: DMLProc started bulkrollback on table OID 7002 and table lock id 3 failed:IDB-2034: At least one DBRoot required for that query is offline.  and tablelock is not released.
Jan 30 13:40:32 pm1-server ProcessMonitor[1606]: 32.170867 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DMLProc State = ROLLBACK_INIT
Jan 30 13:40:32 pm1-server ProcessMonitor[1606]: 32.170931 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DMLProc State = ROLLBACK_INIT PID = 30276
Jan 30 13:40:32 pm1-server DMLProc[30276]: 32.171213 |0|0|0| I 20 CAL0002: DMLProc will rollback 4 transactions.
Jan 30 13:40:32 pm1-server DMLProc[30276]: 32.173712 |0|0|0| I 20 CAL0002: DMLProc will roll back transaction 506
Jan 30 13:40:32 pm1-server DMLProc[30276]: 32.176677 |0|0|0| C 20 CAL0002:  problem with rollback transaction 506and DBRM is setting to readonly and table lock is not released: lookupLocal from extent map error encountered while looking up lbid:verID 560750963589121:506 and error code is -1 with message UNKNOWN (-1) (WriteEngine returns error 1)
Jan 30 13:40:32 pm1-server DMLProc[30276]: 32.250087 |0|0|0| I 20 CAL0002: DMLProc will roll back transaction 130560
Jan 30 13:40:32 pm1-server DMLProc[30276]: 32.251615 |0|0|0| C 20 CAL0002:  problem with rollback transaction 130560and DBRM is setting to readonly and table lock is not released: Can't read DBRM isReadWrite [ DBRM is in READ-ONLY mode]
Jan 30 13:40:32 pm1-server DMLProc[30276]: 32.253761 |0|0|0| I 20 CAL0002: DMLProc will roll back transaction 130815
Jan 30 13:40:32 pm1-server DMLProc[30276]: 32.254972 |0|0|0| C 20 CAL0002:  problem with rollback transaction 130815and DBRM is setting to readonly and table lock is not released: Can't read DBRM isReadWrite [ DBRM is in READ-ONLY mode]
Jan 30 13:40:32 pm1-server controllernode[30276]: 32.255861 |0|0|0| C 29 CAL0000: ExtentMap::lookupLocal(): invalid lbid requested: -144112997642404352
Jan 30 13:40:32 pm1-server writeengine[31340]: 32.719351 |0|0|0| I 19 CAL0073: ClearTableLock: Rollback extents   for table test-db.table1 (OID-5519), column 5522. DBRoot-1 (column extent)
Jan 30 13:40:32 pm1-server writeengine[31340]: 32.719447 |0|0|0| I 19 CAL0074: ClearTableLock: Rollback extentMap for table test-db.table1 (OID-5519), column 5522. Restoring HWM column extent: dbRoot-1; part#-0; seg#-1; hwm-3542
Jan 30 13:40:32 pm1-server writeengine[31340]: 32.720821 |0|0|0| E 19 CAL0085: ClearTableLock: Ending bulk rollback for table test-db.table1 (OID-5519); lock-2; initiated by DMLProc. (rollback failed; Error rolling back column extents from extent map for 5522; dbRoot-1; partition-0; segment-1; hwm-3542; BRM error performing bulk rollback of column extents. [BRM error status: DBRM is in READ-ONLY mode]).
Jan 30 13:40:32 pm1-server writeengine[31340]: 32.720866 |0|0|0| I 19 CAL0085: ClearTableLock: Ending bulk rollback for table test-db.table1 (OID-5519); lock-2; initiated by DMLProc. (rollback failed; Error rolling back column extents from extent map for 5522; dbRoot-1; partition-0; segment-1; hwm-3542; BRM error performing bulk rollback of column extents. [BRM error status: DBRM is in READ-ONLY mode]).
Jan 30 13:40:32 pm1-server ProcessMonitor[1606]: 32.722897 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DMLProc State = FAILED
Jan 30 13:40:32 pm1-server DMLProc[30276]: 32.723200 |0|0|0| C 20 CAL0002: DMLProc failed to start due to : ExtentMap::lookupLocal(): invalid lbid requested: -144112997642404352
Jan 30 13:40:32 pm1-server ProcessMonitor[1606]: 32.723230 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DMLProc State = FAILED PID = 0
Jan 30 13:40:32 pm1-server ProcessMonitor[1606]: 32.723309 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Module pm1 State = FAILED
Jan 30 13:40:32 pm1-server joblist[31340]: 32.723391 |0|0|0| W 05 CAL0000: forcing reload of columnstore.xml for dbRootConnectionMap
Jan 30 13:40:33 pm1-server oamcpp[31340]: 33.725851 |0|0|0| E 08 CAL0000: OamCache::checkReload shows state for pm1 as FAILED
Jan 30 13:40:33 pm1-server joblist[31340]: 33.728500 |0|0|0| W 05 CAL0000: dbroot still not in dbRootConnectionMap
Jan 30 13:40:33 pm1-server joblist[31340]: 33.728724 |2147483648|0|0| C 05 CAL0000: st: 0 TupleBPS::sendPrimitiveMessages() caught an exception: IDB-2034: At least one DBRoot required for that query is offline.
Jan 30 13:40:39 pm1-server ProcessMonitor[1606]: 39.459680 |0|0|0| D 18 CAL0000: State out-of-sync, update on DMLProc/7/32254
Jan 30 13:40:39 pm1-server ProcessMonitor[1606]: 39.459776 |0|0|0| D 18 CAL0000: StatusUpdate of Process DMLProc State = 7 PID = 32254
Jan 30 13:40:39 pm1-server ProcessMonitor[1606]: 39.460215 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DMLProc State = FAILED
Jan 30 13:40:39 pm1-server ProcessMonitor[1606]: 39.460274 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DMLProc State = FAILED PID = 32254

Here the output of getSoftwareInfo:
Package: mariadb-columnstore-platform
Status: install ok installed
Priority: optional
Section: devel
Installed-Size: 147697
Maintainer: MariaDB Corporation Ab
Architecture: amd64
Version: 1.1.2
Provides: mariadb-columnstore-platform
Depends: expect, libboost-all-dev, mariadb-columnstore-libs, libsnappy1v5
Description: MariaDB Columnstore: a very fast and robust SQL database server

Here the output of getProcessStatus:
mcsadmin> getProcessStatus
getprocessstatus   Tue Jan 30 13:15:24 2018

MariaDB ColumnStore Process statuses

Process             Module    Status            Last Status Change        Process ID
------------------  ------    ---------------   ------------------------  ----------
ProcessMonitor      pm1       ACTIVE            Tue Jan 30 10:50:17 2018        1606
ProcessManager      pm1       ACTIVE            Tue Jan 30 10:50:23 2018        2622
DBRMControllerNode  pm1       ACTIVE            Tue Jan 30 11:50:39 2018       29782
ServerMonitor       pm1       ACTIVE            Tue Jan 30 11:50:41 2018       29805
DBRMWorkerNode      pm1       ACTIVE            Tue Jan 30 11:50:41 2018       29849
DecomSvr            pm1       ACTIVE            Tue Jan 30 11:50:45 2018       30042
PrimProc            pm1       ACTIVE            Tue Jan 30 11:50:47 2018       30129
ExeMgr              pm1       ACTIVE            Tue Jan 30 11:50:57 2018       31184
WriteEngineServer   pm1       ACTIVE            Tue Jan 30 11:51:01 2018       31340
DDLProc             pm1       ACTIVE            Tue Jan 30 11:51:05 2018       31562
DMLProc             pm1       FAILED            Tue Jan 30 11:51:36 2018       32254
mysqld              pm1       ACTIVE            Tue Jan 30 11:50:53 2018       29474

ProcessMonitor      pm2       ACTIVE            Tue Jan 30 10:50:35 2018        1601
ProcessManager      pm2       HOT_STANDBY       Tue Jan 30 11:51:15 2018       17095
DBRMControllerNode  pm2       COLD_STANDBY      Tue Jan 30 11:50:54 2018
ServerMonitor       pm2       ACTIVE            Tue Jan 30 11:50:39 2018       16292
DBRMWorkerNode      pm2       ACTIVE            Tue Jan 30 11:50:44 2018       16324
DecomSvr            pm2       ACTIVE            Tue Jan 30 11:50:47 2018       16365
PrimProc            pm2       ACTIVE            Tue Jan 30 11:50:50 2018       16376
ExeMgr              pm2       ACTIVE            Tue Jan 30 11:50:58 2018       17002
WriteEngineServer   pm2       ACTIVE            Tue Jan 30 11:51:02 2018       17028
DDLProc             pm2       COLD_STANDBY      Tue Jan 30 11:51:02 2018
DMLProc             pm2       COLD_STANDBY      Tue Jan 30 11:51:02 2018
mysqld              pm2       ACTIVE            Tue Jan 30 11:50:55 2018       16137

ProcessMonitor      pm3       ACTIVE            Tue Jan 30 10:50:36 2018        1643
ProcessManager      pm3       COLD_STANDBY      Tue Jan 30 11:50:42 2018
DBRMControllerNode  pm3       COLD_STANDBY      Tue Jan 30 11:50:42 2018
ServerMonitor       pm3       ACTIVE            Tue Jan 30 11:50:45 2018       14505
DBRMWorkerNode      pm3       ACTIVE            Tue Jan 30 11:50:47 2018       14559
DecomSvr            pm3       ACTIVE            Tue Jan 30 11:50:51 2018       14574
PrimProc            pm3       ACTIVE            Tue Jan 30 11:50:53 2018       14585
ExeMgr              pm3       ACTIVE            Tue Jan 30 11:50:57 2018       14654
WriteEngineServer   pm3       ACTIVE            Tue Jan 30 11:51:01 2018       14678
DDLProc             pm3       COLD_STANDBY      Tue Jan 30 11:51:03 2018
DMLProc             pm3       COLD_STANDBY      Tue Jan 30 11:51:03 2018
mysqld              pm3       ACTIVE            Tue Jan 30 11:50:45 2018       14370


David Hill

unread,
Jan 30, 2018, 9:55:42 AM1/30/18
to Nico Aymet, MariaDB ColumnStore
For starters, that is not the way to restart a process and its not recommended to restart just a single process.
It is recommended to perform a system restart via mcsadmin.

Try running this from pm1 and see if the rollback takes places and the locks are clearer.

# mcsadmin restartSystem y

--
You received this message because you are subscribed to the Google Groups "MariaDB ColumnStore" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mariadb-columnstore+unsub...@googlegroups.com.
To post to this group, send email to mariadb-columnstore@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mariadb-columnstore/61e299ad-facb-43b2-b6a5-a7366a156c34%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
------------------
David Hill
MariaDB ColumnStore Development
MariaDB Corporation
http://www.mariadb.com

Nico

unread,
Jan 30, 2018, 10:05:52 AM1/30/18
to MariaDB ColumnStore
Thanks for your reply David,
obviously I've already tried in the way you suggest, with the same result:
mcsadmin> restartSystem y
restartsystem   Tue Jan 30 14:58:04 2018
The following tables are locked:
st: 0 TupleBPS::sendPrimitiveMessages() caught an exception: IDB-2034: At least one DBRoot required for that query is offline. 
st: 0 TupleBPS::sendPrimitiveMessages() caught an exception: IDB-2034: At least one DBRoot required for that query is offline. 
st: 0 TupleBPS::sendPrimitiveMessages() caught an exception: IDB-2034: At least one DBRoot required for that query is offline. 
LockID Name         Process PID   Session  CreationTime         State    DBRoots
st: 0 TupleBPS::sendPrimitiveMessages() caught an exception: IDB-2034: At least one DBRoot required for that query is offline. 
1      .            DMLProc 30276 BulkLoad 2018-01-30 03:17:14  Abandoned1      ,3
st: 0 TupleBPS::sendPrimitiveMessages() caught an exception: IDB-2034: At least one DBRoot required for that query is offline. 
2      .            DMLProc 30276 BulkLoad 2018-01-30 03:21:27  Abandoned1      ,3
st: 0 TupleBPS::sendPrimitiveMessages() caught an exception: IDB-2034: At least one DBRoot required for that query is offline. 
3      .            DMLProc 30276 BulkLoad 2018-01-30 03:21:27  Abandoned1      ,3
Your options are:
    Cancel    -- Cancel the restart request
    Wait      -- Wait for write operations to end and then restart
    Force     -- Force a restart
What would you like to do: [Cancel]: Force

   System being restarted now ...................

   System Not Ready, DMLProc is checking/processing rollback of abandoned transactions. Processing could take some time, please wait....
**** restartSystem Failed : check log files

Here the debug log:
Jan 30 14:59:34 prod-cs-1112 ProcessMonitor[1606]: 34.416846 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: DMLProc\*
Jan 30 14:59:34 prod-cs-1112 ProcessMonitor[1606]: 34.416926 |0|0|0| D 18 CAL0000: StatusUpdate of Process DMLProc State = 2 PID = 0
Jan 30 14:59:34 prod-cs-1112 ProcessMonitor[1606]: 34.417317 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DMLProc State = MAN_INIT
Jan 30 14:59:34 prod-cs-1112 ProcessMonitor[1606]: 34.417376 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DMLProc State = MAN_INIT PID = 0
Jan 30 14:59:35 prod-cs-1112 ProcessMonitor[1606]: 35.427100 |0|0|0| D 18 CAL0000: StatusUpdate of Process DMLProc State = 21 PID = 17742
Jan 30 14:59:35 prod-cs-1112 ProcessMonitor[1606]: 35.427667 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DMLProc State = PID_UPDATE
Jan 30 14:59:35 prod-cs-1112 ProcessMonitor[1606]: 35.427723 |0|0|0| D 18 CAL0000: DMLProc PID is 17742
Jan 30 14:59:35 prod-cs-1112 ProcessMonitor[1606]: 35.427771 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DMLProc State = PID_UPDATE PID = 17742
Jan 30 14:59:35 prod-cs-1112 ProcessMonitor[1606]: 35.427934 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 13 on device DMLProc
Jan 30 14:59:35 prod-cs-1112 ProcessMonitor[1606]: 35.428004 |0|0|0| I 18 CAL0000: STARTALL: ACK back to ProcMgr, return status = 0
Jan 30 14:59:35 prod-cs-1112 ProcessManager[2622]: 35.428141 |0|0|0| D 17 CAL0000: Set Module pm1 State = 4
Jan 30 14:59:35 prod-cs-1112 ProcessMonitor[1606]: 35.428615 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Module pm1 State = ACTIVE
Jan 30 14:59:35 prod-cs-1112 ProcessManager[2622]: 35.428640 |0|0|0| D 17 CAL0000: pm1 module is started by request.
Jan 30 14:59:35 prod-cs-1112 ProcessMonitor[1606]: 35.429492 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device DMLProc
Jan 30 14:59:35 prod-cs-1112 ProcessMonitor[1606]: 35.431575 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 21 on device DMLProc
Jan 30 14:59:35 prod-cs-1112 ProcessManager[2622]: 35.432310 |0|0|0| D 17 CAL0000: ACK received from 'pm1' Process-Monitor, return status = 0
Jan 30 14:59:35 prod-cs-1112 ProcessManager[2622]: 35.432388 |0|0|0| D 17 CAL0000: startModuleThread Exit on pm1
Jan 30 14:59:36 prod-cs-1112 ProcessMonitor[1606]: 36.623784 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DMLProc State = BUSY_INIT
Jan 30 14:59:36 prod-cs-1112 ProcessMonitor[1606]: 36.623860 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DMLProc State = BUSY_INIT PID = 17742
Jan 30 14:59:36 prod-cs-1112 ProcessMonitor[1606]: 36.623916 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set System State = BUSY_INIT
Jan 30 14:59:36 prod-cs-1112 DMLProc[17742]: 36.626559 |0|0|0| I 20 CAL0002: DMLProc starts rollbackAll.
Jan 30 14:59:36 prod-cs-1112 ProcessMonitor[1606]: 36.665137 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DMLProc State = ROLLBACK_INIT
Jan 30 14:59:36 prod-cs-1112 ProcessMonitor[1606]: 36.665211 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DMLProc State = ROLLBACK_INIT PID = 17742
Jan 30 14:59:36 prod-cs-1112 DMLProc[17742]: 36.690092 |0|0|0| I 20 CAL0002: DMLProc will rollback 3 tables.
Jan 30 14:59:37 prod-cs-1112 DMLProc[17742]: 37.401586 |0|0|0| W 20 CAL0002: DMLProc started bulkrollback on table OID 7196 and table lock id 1 failed:IDB-2006: 'select schema,tablename,columnname from syscolumn where objectid=7196 --colName/EC7196' does not exist in Columnstore. and tablelock is not released.
Jan 30 14:59:37 prod-cs-1112 DMLProc[17742]: 37.517354 |0|0|0| W 20 CAL0002: DMLProc started bulkrollback on table OID 5519 and table lock id 2 failed:IDB-2006: 'select schema,tablename,columnname from syscolumn where objectid=7196 --colName/EC7196' does not exist in Columnstore. and tablelock is not released.
Jan 30 14:59:37 prod-cs-1112 writeengine[17396]: 37.577210 |0|0|0| I 19 CAL0084: ClearTableLock: Starting bulk rollback for table te-sbb.loco__zone__db__484017 (OID-5519); lock-2; initiated by DMLProc.
Jan 30 14:59:37 prod-cs-1112 writeengine[17396]: 37.577619 |0|0|0| I 19 CAL0072: ClearTableLock: Rollback extents   for table te-sbb.loco__zone__db__484017 (OID-5519).
Jan 30 14:59:37 prod-cs-1112 writeengine[17396]: 37.753594 |0|0|0| I 19 CAL0073: ClearTableLock: Rollback extents   for table te-sbb.loco__zone__db__484017 (OID-5519), column 5520. DBRoot-1 (column extent)
Jan 30 14:59:37 prod-cs-1112 writeengine[17396]: 37.753684 |0|0|0| I 19 CAL0074: ClearTableLock: Rollback extentMap for table te-sbb.loco__zone__db__484017 (OID-5519), column 5520. Restoring HWM column extent: dbRoot-1; part#-0; seg#-1; hwm-3542
Jan 30 14:59:38 prod-cs-1112 DMLProc[17742]: 38.096296 |0|0|0| W 20 CAL0002: DMLProc started bulkrollback on table OID 7002 and table lock id 3 failed:IDB-2006: 'select schema,tablename,columnname from syscolumn where objectid=7002 --colName/EC7002' does not exist in Columnstore. and tablelock is not released.
Jan 30 14:59:38 prod-cs-1112 ProcessMonitor[1606]: 38.099132 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DMLProc State = ROLLBACK_INIT
Jan 30 14:59:38 prod-cs-1112 ProcessMonitor[1606]: 38.099187 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DMLProc State = ROLLBACK_INIT PID = 17742
Jan 30 14:59:38 prod-cs-1112 DMLProc[17742]: 38.099432 |0|0|0| I 20 CAL0002: DMLProc will rollback 4 transactions.
Jan 30 14:59:38 prod-cs-1112 DMLProc[17742]: 38.115208 |0|0|0| I 20 CAL0002: DMLProc will roll back transaction 506
Jan 30 14:59:38 prod-cs-1112 DMLProc[17742]: 38.119703 |0|0|0| C 20 CAL0002:  problem with rollback transaction 506and DBRM is setting to readonly and table lock is not released: lookupLocal from extent map error encountered while looking up lbid:verID 560750963589121:506 and error code is -1 with message UNKNOWN (-1) (WriteEngine returns error 1)
Jan 30 14:59:38 prod-cs-1112 DMLProc[17742]: 38.177268 |0|0|0| I 20 CAL0002: DMLProc will roll back transaction 130560
Jan 30 14:59:38 prod-cs-1112 DMLProc[17742]: 38.178739 |0|0|0| C 20 CAL0002:  problem with rollback transaction 130560and DBRM is setting to readonly and table lock is not released: Can't read DBRM isReadWrite [ DBRM is in READ-ONLY mode]
Jan 30 14:59:38 prod-cs-1112 DMLProc[17742]: 38.180928 |0|0|0| I 20 CAL0002: DMLProc will roll back transaction 130815
Jan 30 14:59:38 prod-cs-1112 DMLProc[17742]: 38.181903 |0|0|0| C 20 CAL0002:  problem with rollback transaction 130815and DBRM is setting to readonly and table lock is not released: Can't read DBRM isReadWrite [ DBRM is in READ-ONLY mode]
Jan 30 14:59:38 prod-cs-1112 controllernode[17742]: 38.182627 |0|0|0| C 29 CAL0000: ExtentMap::lookupLocal(): invalid lbid requested: -144112997642404352
Jan 30 14:59:38 prod-cs-1112 writeengine[17396]: 38.622030 |0|0|0| I 19 CAL0075: ClearTableLock: Rollback dbfile    for table te-sbb.loco__zone__db__484017 (OID-5519), column 5520. Reinit HWM compressed column extent in db file: dbRoot-1; part#-0; seg#-1; rawOffset(bytes)-29024256; rawFreeBlks-4649
Jan 30 14:59:39 prod-cs-1112 writeengine[17396]: 39.180731 |0|0|0| I 19 CAL0075: ClearTableLock: Rollback dbfile    for table te-sbb.loco__zone__db__484017 (OID-5519), column 5520. HWM compressed column file: dbRoot-1; part#-0; seg#-1; restoredChunk-1351680 bytes; truncated to 67117056 bytes
Jan 30 14:59:39 prod-cs-1112 ProcessManager[2622]: 39.227082 |0|0|0| D 17 CAL0000: getStandbyModule called
Jan 30 14:59:39 prod-cs-1112 ProcessManager[2622]: 39.227160 |0|0|0| D 17 CAL0000: GLUSTER_WHOHAS called:1 2
Jan 30 14:59:39 prod-cs-1112 ProcessManager[2622]: 39.228021 |0|0|0| D 17 CAL0000: setStandbyModule called
Jan 30 14:59:39 prod-cs-1112 ProcessManager[2622]: 39.231793 |0|0|0| D 17 CAL0000: distributeConfigFile called for system file = Columnstore.xml
Jan 30 14:59:39 prod-cs-1112 ProcessManager[2622]: 39.237299 |0|0|0| D 17 CAL0000: sendMsgProcMon: Process module pm2
Jan 30 14:59:39 prod-cs-1112 ProcessManager[2622]: 39.237919 |0|0|0| D 17 CAL0000: pm2 distributeConfigFile success.
Jan 30 14:59:39 prod-cs-1112 ProcessManager[2622]: 39.241455 |0|0|0| D 17 CAL0000: sendMsgProcMon: Process module pm3
Jan 30 14:59:39 prod-cs-1112 ProcessManager[2622]: 39.242164 |0|0|0| D 17 CAL0000: pm3 distributeConfigFile success.
Jan 30 14:59:39 prod-cs-1112 ProcessManager[2622]: 39.242227 |0|0|0| D 17 CAL0000: Columnstore.xml Standby OAM updated to : pm2:10.11.1.3
Jan 30 14:59:39 prod-cs-1112 ProcessManager[2622]: 39.242250 |0|0|0| D 17 CAL0000: Send Message for new Hot-Standby ProcessManager to module = pm2
Jan 30 14:59:39 prod-cs-1112 ProcessManager[2622]: 39.245468 |0|0|0| D 17 CAL0000: sendMsgProcMon: Process module pm2
Jan 30 14:59:40 prod-cs-1112 ProcessMonitor[1606]: 40.241625 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm2/ProcessManager State = HOT_STANDBY
Jan 30 14:59:40 prod-cs-1112 ProcessMonitor[1606]: 40.241698 |0|0|0| D 18 CAL0000: statusControl: Set Process pm2/ProcessManager State = HOT_STANDBY PID = 0
Jan 30 14:59:40 prod-cs-1112 ProcessMonitor[1606]: 40.253547 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm2/ProcessManager State = HOT_STANDBY
Jan 30 14:59:40 prod-cs-1112 ProcessMonitor[1606]: 40.253619 |0|0|0| D 18 CAL0000: statusControl: Set Process pm2/ProcessManager State = HOT_STANDBY PID = 0
Jan 30 14:59:41 prod-cs-1112 ProcessMonitor[1606]: 41.268420 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm2/ProcessManager State = PID_UPDATE
Jan 30 14:59:41 prod-cs-1112 ProcessMonitor[1606]: 41.268527 |0|0|0| D 18 CAL0000: statusControl: Set Process pm2/ProcessManager State = PID_UPDATE PID = 23083
Jan 30 14:59:41 prod-cs-1112 ProcessManager[2622]: 41.269278 |0|0|0| D 17 CAL0000: pm2/ProcessManager process is started by request.
Jan 30 14:59:41 prod-cs-1112 ProcessManager[2622]: 41.269345 |0|0|0| D 17 CAL0000: Hot-Standby ProcessManager ACK received from Process-Monitor, return status = 0
Jan 30 14:59:41 prod-cs-1112 ProcessManager[2622]: 41.269386 |0|0|0| D 17 CAL0000: distributeConfigFile called for system file = Columnstore.xml
Jan 30 14:59:41 prod-cs-1112 ProcessManager[2622]: 41.273352 |0|0|0| D 17 CAL0000: sendMsgProcMon: Process module pm2
Jan 30 14:59:42 prod-cs-1112 writeengine[17396]: 42.063392 |0|0|0| I 19 CAL0073: ClearTableLock: Rollback extents   for table te-sbb.loco__zone__db__484017 (OID-5519), column 5521. DBRoot-1 (column extent)
Jan 30 14:59:42 prod-cs-1112 writeengine[17396]: 42.063496 |0|0|0| I 19 CAL0074: ClearTableLock: Rollback extentMap for table te-sbb.loco__zone__db__484017 (OID-5519), column 5521. Restoring HWM column extent: dbRoot-1; part#-0; seg#-1; hwm-1771
Jan 30 14:59:42 prod-cs-1112 writeengine[17396]: 42.084919 |0|0|0| E 19 CAL0085: ClearTableLock: Ending bulk rollback for table te-sbb.loco__zone__db__484017 (OID-5519); lock-2; initiated by DMLProc. (rollback failed; Error rolling back column extents from extent map for 5521; dbRoot-1; partition-0; segment-1; hwm-1771; BRM error performing bulk rollback of column extents. [BRM error status: DBRM is in READ-ONLY mode]).
Jan 30 14:59:42 prod-cs-1112 writeengine[17396]: 42.085006 |0|0|0| I 19 CAL0085: ClearTableLock: Ending bulk rollback for table te-sbb.loco__zone__db__484017 (OID-5519); lock-2; initiated by DMLProc. (rollback failed; Error rolling back column extents from extent map for 5521; dbRoot-1; partition-0; segment-1; hwm-1771; BRM error performing bulk rollback of column extents. [BRM error status: DBRM is in READ-ONLY mode]).
Jan 30 14:59:42 prod-cs-1112 ProcessMonitor[1606]: 42.088388 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DMLProc State = FAILED
Jan 30 14:59:42 prod-cs-1112 ProcessMonitor[1606]: 42.088471 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DMLProc State = FAILED PID = 0
Jan 30 14:59:42 prod-cs-1112 ProcessMonitor[1606]: 42.089946 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Module pm1 State = FAILED
Jan 30 14:59:42 prod-cs-1112 DMLProc[17742]: 42.089989 |0|0|0| C 20 CAL0002: DMLProc failed to start due to : ExtentMap::lookupLocal(): invalid lbid requested: -144112997642404352
Jan 30 14:59:42 prod-cs-1112 ProcessManager[2622]: 42.269639 |0|0|0| D 17 CAL0000: pm2 distributeConfigFile success.
Jan 30 14:59:42 prod-cs-1112 ProcessManager[2622]: 42.273155 |0|0|0| D 17 CAL0000: sendMsgProcMon: Process module pm3
Jan 30 14:59:42 prod-cs-1112 ProcessManager[2622]: 42.273795 |0|0|0| D 17 CAL0000: pm3 distributeConfigFile success.
Jan 30 14:59:42 prod-cs-1112 ProcessManager[2622]: 42.273883 |0|0|0| D 17 CAL0000: setQuerySystemState = 1
Jan 30 14:59:42 prod-cs-1112 ProcessManager[2622]: 42.274438 |0|0|0| D 17 CAL0000: setQuerySystemState successful
Jan 30 14:59:42 prod-cs-1112 ProcessManager[2622]: 42.274918 |0|0|0| D 17 CAL0000: Set System State = FAILED
Jan 30 14:59:42 prod-cs-1112 ProcessMonitor[1606]: 42.275228 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set System State = FAILED
Jan 30 14:59:42 prod-cs-1112 ProcessManager[2622]: 42.277195 |0|0|0| D 17 CAL0000: startSystemThread Exit
Jan 30 14:59:42 prod-cs-1112 ProcessMonitor[1606]: 42.354997 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm2/ProcessManager State = HOT_STANDBY
Jan 30 14:59:42 prod-cs-1112 ProcessMonitor[1606]: 42.355073 |0|0|0| D 18 CAL0000: statusControl: Set Process pm2/ProcessManager State = HOT_STANDBY PID = 23083
Jan 30 14:59:45 prod-cs-1112 ProcessMonitor[1606]: 45.636092 |0|0|0| C 18 CAL0000: *****Calpont Process Restarting: DMLProc, old PID = 17742
Jan 30 14:59:45 prod-cs-1112 ProcessMonitor[1606]: 45.636239 |0|0|0| D 18 CAL0000: STOPPING Process: DMLProc
Jan 30 14:59:45 prod-cs-1112 ProcessMonitor[1606]: 45.636308 |0|0|0| D 18 CAL0000: StatusUpdate of Process DMLProc State = 1 PID = 0
Jan 30 14:59:45 prod-cs-1112 ProcessMonitor[1606]: 45.636381 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device DMLProc
Jan 30 14:59:45 prod-cs-1112 ProcessMonitor[1606]: 45.638512 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DMLProc State = AUTO_OFFLINE
Jan 30 14:59:45 prod-cs-1112 ProcessMonitor[1606]: 45.638555 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DMLProc State = AUTO_OFFLINE PID = 0
Jan 30 14:59:45 prod-cs-1112 ProcessMonitor[1606]: 45.640757 |0|0|0| D 18 CAL0000: Send SET Alarm ID 13 on device DMLProc
Jan 30 14:59:45 prod-cs-1112 ProcessMonitor[1606]: 45.641985 |0|0|0| D 18 CAL0000: Send SET Alarm ID 13 on device DMLProc
Jan 30 14:59:45 prod-cs-1112 ProcessMonitor[1606]: 45.643574 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: DMLProc\*
Jan 30 14:59:45 prod-cs-1112 ProcessMonitor[1606]: 45.643632 |0|0|0| D 18 CAL0000: STARTING Process: DMLProc
Jan 30 14:59:45 prod-cs-1112 ProcessMonitor[1606]: 45.643654 |0|0|0| D 18 CAL0000: Process location: /usr/local/mariadb/columnstore/bin/DMLProc
Jan 30 14:59:45 prod-cs-1112 ProcessMonitor[1606]: 45.644001 |0|0|0| D 18 CAL0000: Dependent process of WriteEngineServer/pm1 is 4
Jan 30 14:59:45 prod-cs-1112 ProcessMonitor[1606]: 45.644688 |0|0|0| D 18 CAL0000: Dependent process of WriteEngineServer/pm2 is 4
Jan 30 14:59:45 prod-cs-1112 ProcessMonitor[1606]: 45.645689 |0|0|0| D 18 CAL0000: Dependent process of WriteEngineServer/pm3 is 4
Jan 30 14:59:45 prod-cs-1112 ProcessMonitor[1606]: 45.646547 |0|0|0| D 18 CAL0000: Dependent process of DBRMWorkerNode/pm1 is 4
Jan 30 14:59:45 prod-cs-1112 ProcessMonitor[1606]: 45.647229 |0|0|0| D 18 CAL0000: Dependent process of DBRMWorkerNode/pm2 is 4
Jan 30 14:59:45 prod-cs-1112 ProcessMonitor[1606]: 45.647818 |0|0|0| D 18 CAL0000: Dependent process of DBRMWorkerNode/pm3 is 4
Jan 30 14:59:45 prod-cs-1112 ProcessMonitor[1606]: 45.648826 |0|0|0| D 18 CAL0000: Dependent process of DDLProc/pm1 is 4
Jan 30 14:59:45 prod-cs-1112 ProcessMonitor[1606]: 45.650353 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 27 on device DBRM
Jan 30 14:59:45 prod-cs-1112 ProcessMonitor[1606]: 45.656426 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: DMLProc\*
Jan 30 14:59:45 prod-cs-1112 ProcessMonitor[1606]: 45.656488 |0|0|0| D 18 CAL0000: StatusUpdate of Process DMLProc State = 3 PID = 0
Jan 30 14:59:45 prod-cs-1112 ProcessMonitor[1606]: 45.656884 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DMLProc State = AUTO_INIT
Jan 30 14:59:45 prod-cs-1112 ProcessMonitor[1606]: 45.656942 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DMLProc State = AUTO_INIT PID = 0
Jan 30 14:59:46 prod-cs-1112 ProcessMonitor[1606]: 46.666081 |0|0|0| D 18 CAL0000: StatusUpdate of Process DMLProc State = 21 PID = 18184
Jan 30 14:59:46 prod-cs-1112 ProcessMonitor[1606]: 46.666622 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DMLProc State = PID_UPDATE
Jan 30 14:59:46 prod-cs-1112 ProcessMonitor[1606]: 46.666680 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DMLProc State = PID_UPDATE PID = 18184
Jan 30 14:59:46 prod-cs-1112 ProcessMonitor[1606]: 46.666730 |0|0|0| D 18 CAL0000: DMLProc PID is 18184
Jan 30 14:59:46 prod-cs-1112 ProcessMonitor[1606]: 46.666881 |0|0|0| D 18 CAL0000: Inform Process Mgr that process was restarted: DMLProc
Jan 30 14:59:46 prod-cs-1112 ProcessMonitor[1606]: 46.667110 |0|0|0| I 18 CAL0000: Calpont Process DMLProc restarted successfully!!
Jan 30 14:59:46 prod-cs-1112 ProcessMonitor[1606]: 46.667301 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 13 on device DMLProc
Jan 30 14:59:46 prod-cs-1112 ProcessManager[2622]: 46.667783 |0|0|0| I 17 CAL0000: MSG RECEIVED: Process Restarted on pm1/DMLProc
Jan 30 14:59:46 prod-cs-1112 ProcessManager[2622]: 46.668176 |0|0|0| D 17 CAL0000: setQuerySystemState = 0
Jan 30 14:59:46 prod-cs-1112 ProcessManager[2622]: 46.668804 |0|0|0| D 17 CAL0000: setQuerySystemState successful
Jan 30 14:59:46 prod-cs-1112 ProcessMonitor[1606]: 46.669035 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 21 on device DMLProc
Jan 30 14:59:46 prod-cs-1112 ProcessManager[2622]: 46.669151 |0|0|0| D 17 CAL0000: Set System State = BUSY_INIT
Jan 30 14:59:46 prod-cs-1112 ProcessMonitor[1606]: 46.669469 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set System State = BUSY_INIT
Jan 30 14:59:46 prod-cs-1112 ProcessMonitor[1606]: 46.670952 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device DMLProc
Jan 30 14:59:46 prod-cs-1112 ProcessManager[2622]: 46.671714 |0|0|0| D 17 CAL0000: reinitProcessType: ReInit all cpimport
Jan 30 14:59:46 prod-cs-1112 ProcessManager[2622]: 46.672832 |0|0|0| D 17 CAL0000: sendMsgProcMon: Process module pm1
Jan 30 14:59:46 prod-cs-1112 ProcessMonitor[1606]: 46.673092 |0|0|0| I 18 CAL0000: MSG RECEIVED: Re-Init process request on: cpimport
Jan 30 14:59:46 prod-cs-1112 ProcessManager[2622]: 46.675004 |0|0|0| D 17 CAL0000: cpimport process is reinited by request.
Jan 30 14:59:46 prod-cs-1112 ProcessManager[2622]: 46.675059 |0|0|0| D 17 CAL0000: reinitProcessType: ACK received from Process-Monitor, return status = 0
Jan 30 14:59:46 prod-cs-1112 ProcessMonitor[1606]: 46.682559 |0|0|0| I 18 CAL0000: PROCREINITPROCESS: completed, no ack to ProcMgr
Jan 30 14:59:46 prod-cs-1112 ProcessManager[2622]: 46.682723 |0|0|0| D 17 CAL0000: sendMsgProcMon: Process module pm2
Jan 30 14:59:46 prod-cs-1112 ProcessManager[2622]: 46.683207 |0|0|0| D 17 CAL0000: cpimport process is reinited by request.
Jan 30 14:59:46 prod-cs-1112 ProcessManager[2622]: 46.683288 |0|0|0| D 17 CAL0000: reinitProcessType: ACK received from Process-Monitor, return status = 0
Jan 30 14:59:46 prod-cs-1112 ProcessManager[2622]: 46.686747 |0|0|0| D 17 CAL0000: sendMsgProcMon: Process module pm3
Jan 30 14:59:46 prod-cs-1112 ProcessManager[2622]: 46.687442 |0|0|0| D 17 CAL0000: cpimport process is reinited by request.
Jan 30 14:59:46 prod-cs-1112 ProcessManager[2622]: 46.687493 |0|0|0| D 17 CAL0000: reinitProcessType: ACK received from Process-Monitor, return status = 0
Jan 30 14:59:47 prod-cs-1112 ProcessMonitor[1606]: 47.733814 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DMLProc State = BUSY_INIT
Jan 30 14:59:47 prod-cs-1112 ProcessMonitor[1606]: 47.733878 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DMLProc State = BUSY_INIT PID = 18184
Jan 30 14:59:47 prod-cs-1112 ProcessMonitor[1606]: 47.733906 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set System State = BUSY_INIT
Jan 30 14:59:47 prod-cs-1112 DMLProc[18184]: 47.735459 |0|0|0| I 20 CAL0002: DMLProc starts rollbackAll.
Jan 30 14:59:47 prod-cs-1112 ProcessMonitor[1606]: 47.736533 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DMLProc State = FAILED
Jan 30 14:59:47 prod-cs-1112 ProcessMonitor[1606]: 47.736585 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DMLProc State = FAILED PID = 0
Jan 30 14:59:47 prod-cs-1112 DMLProc[18184]: 47.736779 |0|0|0| C 20 CAL0002: DMLProc failed to start due to : Rollback will be deferred due to DBRM is in read only state.
Jan 30 14:59:47 prod-cs-1112 ProcessMonitor[1606]: 47.736766 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Module pm1 State = FAILED
Jan 30 14:59:47 prod-cs-1112 ProcessMonitor[1606]: 47.740086 |0|0|0| C 18 CAL0000: *****Calpont Process Restarting: DMLProc, old PID = 18184
Jan 30 14:59:47 prod-cs-1112 ProcessMonitor[1606]: 47.740163 |0|0|0| D 18 CAL0000: STOPPING Process: DMLProc
Jan 30 14:59:47 prod-cs-1112 ProcessMonitor[1606]: 47.740207 |0|0|0| D 18 CAL0000: StatusUpdate of Process DMLProc State = 1 PID = 0
Jan 30 14:59:47 prod-cs-1112 ProcessMonitor[1606]: 47.740293 |0|0|0| D 18 CAL0000: Send SET Alarm ID 13 on device DMLProc
Jan 30 14:59:47 prod-cs-1112 ProcessMonitor[1606]: 47.740714 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DMLProc State = AUTO_OFFLINE
Jan 30 14:59:47 prod-cs-1112 ProcessMonitor[1606]: 47.742735 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DMLProc State = AUTO_OFFLINE PID = 0
Jan 30 14:59:47 prod-cs-1112 ProcessMonitor[1606]: 47.743563 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device DMLProc
Jan 30 14:59:47 prod-cs-1112 ProcessMonitor[1606]: 47.744524 |0|0|0| D 18 CAL0000: Send SET Alarm ID 13 on device DMLProc
Jan 30 14:59:47 prod-cs-1112 ProcessMonitor[1606]: 47.748446 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: DMLProc\*
Jan 30 14:59:47 prod-cs-1112 ProcessMonitor[1606]: 47.748528 |0|0|0| D 18 CAL0000: STARTING Process: DMLProc
Jan 30 14:59:47 prod-cs-1112 ProcessMonitor[1606]: 47.748567 |0|0|0| D 18 CAL0000: Process location: /usr/local/mariadb/columnstore/bin/DMLProc
Jan 30 14:59:47 prod-cs-1112 ProcessMonitor[1606]: 47.748959 |0|0|0| D 18 CAL0000: Dependent process of WriteEngineServer/pm1 is 4
Jan 30 14:59:47 prod-cs-1112 ProcessMonitor[1606]: 47.749628 |0|0|0| D 18 CAL0000: Dependent process of WriteEngineServer/pm2 is 4
Jan 30 14:59:47 prod-cs-1112 ProcessMonitor[1606]: 47.750328 |0|0|0| D 18 CAL0000: Dependent process of WriteEngineServer/pm3 is 4
Jan 30 14:59:47 prod-cs-1112 ProcessMonitor[1606]: 47.751195 |0|0|0| D 18 CAL0000: Dependent process of DBRMWorkerNode/pm1 is 4
Jan 30 14:59:47 prod-cs-1112 ProcessMonitor[1606]: 47.751831 |0|0|0| D 18 CAL0000: Dependent process of DBRMWorkerNode/pm2 is 4
Jan 30 14:59:47 prod-cs-1112 ProcessMonitor[1606]: 47.752568 |0|0|0| D 18 CAL0000: Dependent process of DBRMWorkerNode/pm3 is 4
Jan 30 14:59:47 prod-cs-1112 ProcessMonitor[1606]: 47.753366 |0|0|0| D 18 CAL0000: Dependent process of DDLProc/pm1 is 4
Jan 30 14:59:47 prod-cs-1112 ProcessMonitor[1606]: 47.753464 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 27 on device DBRM
Jan 30 14:59:47 prod-cs-1112 ProcessMonitor[1606]: 47.760629 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: DMLProc\*
Jan 30 14:59:47 prod-cs-1112 ProcessMonitor[1606]: 47.760686 |0|0|0| D 18 CAL0000: StatusUpdate of Process DMLProc State = 3 PID = 0
Jan 30 14:59:47 prod-cs-1112 ProcessMonitor[1606]: 47.760974 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DMLProc State = AUTO_INIT
Jan 30 14:59:47 prod-cs-1112 ProcessMonitor[1606]: 47.761021 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DMLProc State = AUTO_INIT PID = 0
Jan 30 14:59:48 prod-cs-1112 ProcessMonitor[1606]: 48.771071 |0|0|0| D 18 CAL0000: StatusUpdate of Process DMLProc State = 21 PID = 18297
Jan 30 14:59:48 prod-cs-1112 ProcessMonitor[1606]: 48.771683 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DMLProc State = PID_UPDATE
Jan 30 14:59:48 prod-cs-1112 ProcessMonitor[1606]: 48.771743 |0|0|0| D 18 CAL0000: DMLProc PID is 18297
Jan 30 14:59:48 prod-cs-1112 ProcessMonitor[1606]: 48.771759 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DMLProc State = PID_UPDATE PID = 18297
Jan 30 14:59:48 prod-cs-1112 ProcessMonitor[1606]: 48.771968 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 21 on device DMLProc
Jan 30 14:59:48 prod-cs-1112 ProcessMonitor[1606]: 48.772038 |0|0|0| D 18 CAL0000: Inform Process Mgr that process was restarted: DMLProc
Jan 30 14:59:48 prod-cs-1112 ProcessMonitor[1606]: 48.772366 |0|0|0| I 18 CAL0000: Calpont Process DMLProc restarted successfully!!
Jan 30 14:59:48 prod-cs-1112 ProcessMonitor[1606]: 48.772986 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device DMLProc
Jan 30 14:59:48 prod-cs-1112 ProcessManager[2622]: 48.773020 |0|0|0| I 17 CAL0000: MSG RECEIVED: Process Restarted on pm1/DMLProc
Jan 30 14:59:48 prod-cs-1112 ProcessManager[2622]: 48.773453 |0|0|0| D 17 CAL0000: setQuerySystemState = 0
Jan 30 14:59:48 prod-cs-1112 ProcessManager[2622]: 48.774306 |0|0|0| D 17 CAL0000: setQuerySystemState successful
Jan 30 14:59:48 prod-cs-1112 ProcessMonitor[1606]: 48.774464 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 13 on device DMLProc
Jan 30 14:59:48 prod-cs-1112 ProcessManager[2622]: 48.774564 |0|0|0| D 17 CAL0000: Set System State = BUSY_INIT
Jan 30 14:59:48 prod-cs-1112 ProcessMonitor[1606]: 48.774929 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set System State = BUSY_INIT
Jan 30 14:59:48 prod-cs-1112 ProcessManager[2622]: 48.776860 |0|0|0| D 17 CAL0000: reinitProcessType: ReInit all cpimport
Jan 30 14:59:48 prod-cs-1112 ProcessManager[2622]: 48.777920 |0|0|0| D 17 CAL0000: sendMsgProcMon: Process module pm1
Jan 30 14:59:48 prod-cs-1112 ProcessMonitor[1606]: 48.778144 |0|0|0| I 18 CAL0000: MSG RECEIVED: Re-Init process request on: cpimport
Jan 30 14:59:48 prod-cs-1112 ProcessManager[2622]: 48.780086 |0|0|0| D 17 CAL0000: cpimport process is reinited by request.
Jan 30 14:59:48 prod-cs-1112 ProcessManager[2622]: 48.780122 |0|0|0| D 17 CAL0000: reinitProcessType: ACK received from Process-Monitor, return status = 0
Jan 30 14:59:48 prod-cs-1112 ProcessManager[2622]: 48.785585 |0|0|0| D 17 CAL0000: sendMsgProcMon: Process module pm2
Jan 30 14:59:48 prod-cs-1112 ProcessManager[2622]: 48.786076 |0|0|0| D 17 CAL0000: cpimport process is reinited by request.
Jan 30 14:59:48 prod-cs-1112 ProcessManager[2622]: 48.786131 |0|0|0| D 17 CAL0000: reinitProcessType: ACK received from Process-Monitor, return status = 0
Jan 30 14:59:48 prod-cs-1112 ProcessMonitor[1606]: 48.787563 |0|0|0| I 18 CAL0000: PROCREINITPROCESS: completed, no ack to ProcMgr
Jan 30 14:59:48 prod-cs-1112 ProcessManager[2622]: 48.789707 |0|0|0| D 17 CAL0000: sendMsgProcMon: Process module pm3
Jan 30 14:59:48 prod-cs-1112 ProcessManager[2622]: 48.790521 |0|0|0| D 17 CAL0000: cpimport process is reinited by request.
Jan 30 14:59:48 prod-cs-1112 ProcessManager[2622]: 48.790572 |0|0|0| D 17 CAL0000: reinitProcessType: ACK received from Process-Monitor, return status = 0
Jan 30 14:59:49 prod-cs-1112 ProcessMonitor[1606]: 49.838993 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DMLProc State = BUSY_INIT
Jan 30 14:59:49 prod-cs-1112 ProcessMonitor[1606]: 49.839060 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DMLProc State = BUSY_INIT PID = 18297
Jan 30 14:59:49 prod-cs-1112 ProcessMonitor[1606]: 49.839089 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set System State = BUSY_INIT
Jan 30 14:59:49 prod-cs-1112 DMLProc[18297]: 49.840588 |0|0|0| I 20 CAL0002: DMLProc starts rollbackAll.
Jan 30 14:59:49 prod-cs-1112 ProcessMonitor[1606]: 49.841536 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DMLProc State = FAILED
Jan 30 14:59:49 prod-cs-1112 ProcessMonitor[1606]: 49.841583 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DMLProc State = FAILED PID = 0
Jan 30 14:59:49 prod-cs-1112 ProcessMonitor[1606]: 49.841758 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Module pm1 State = FAILED
Jan 30 14:59:49 prod-cs-1112 DMLProc[18297]: 49.841865 |0|0|0| C 20 CAL0002: DMLProc failed to start due to : Rollback will be deferred due to DBRM is in read only state.
Jan 30 14:59:49 prod-cs-1112 ProcessMonitor[1606]: 49.849373 |0|0|0| W 18 CAL0000: DMLProc failed initialization
Jan 30 14:59:49 prod-cs-1112 ProcessMonitor[1606]: 49.849522 |0|0|0| D 18 CAL0000: Send SET Alarm ID 25 on device DMLProc
Jan 30 14:59:49 prod-cs-1112 ProcessMonitor[1606]: 49.849843 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Module pm1 State = FAILED
Jan 30 15:00:05 prod-cs-1112 ProcessMonitor[1606]: 05.854339 |0|0|0| D 18 CAL0000: State out-of-sync, update on DMLProc/7/18297
Jan 30 15:00:05 prod-cs-1112 ProcessMonitor[1606]: 05.854462 |0|0|0| D 18 CAL0000: StatusUpdate of Process DMLProc State = 7 PID = 18297
Jan 30 15:00:05 prod-cs-1112 ProcessMonitor[1606]: 05.854979 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DMLProc State = FAILED
Jan 30 15:00:05 prod-cs-1112 ProcessMonitor[1606]: 05.855065 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DMLProc State = FAILED PID = 18297

David Hill

unread,
Jan 30, 2018, 10:15:58 AM1/30/18
to Nico, MariaDB ColumnStore
Thanks... I see this issue

DMLProc started bulkrollback on table OID 5519 and table lock id 2 failed:IDB-2006: 'select schema,tablename,columnname from syscolumn where objectid=7196 --colName/EC7196' does not exist in Columnstore. and tablelock is not released.

Lets check of the DBRM files are there and what they look like on pm1

# dbrm   // this is an alias to get you to the dbrm directory
# ls -ltr

Can you please run and post the output.

Thanks

--
You received this message because you are subscribed to the Google Groups "MariaDB ColumnStore" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mariadb-columnstore+unsub...@googlegroups.com.
To post to this group, send email to mariadb-columnstore@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Nico

unread,
Jan 30, 2018, 10:21:43 AM1/30/18
to MariaDB ColumnStore
Here:
root@prod-cs-1112:/usr/local/mariadb/columnstore/data1/systemFiles/dbrm# ls -ltr
total 16556
-rw-rw-rw- 1 root root 2099208 ene 30 00:34 oidbitmap
-rw-rw-rw- 1 root root  818732 ene 30 13:40 BRM_savesB_em
-rw-rw-rw- 1 root root 1379196 ene 30 13:40 BRM_savesB_vbbm
-rw-rw-rw- 1 root root 2751512 ene 30 13:40 BRM_savesB_vss
-rw-rw-rw- 1 root root  818732 ene 30 14:58 BRM_saves_em
-rw-rw-rw- 1 root root 1379196 ene 30 14:58 BRM_saves_vbbm
-rw-rw-rw- 1 root root 2751512 ene 30 14:58 BRM_saves_vss
-rw-r--r-- 1 root root     169 ene 30 14:59 tablelocks
-rw-rw-rw- 1 root root  818732 ene 30 14:59 BRM_savesA_em
-rw-rw-rw- 1 root root 1379196 ene 30 14:59 BRM_savesA_vbbm
-rw-rw-r-- 1 root root      65 ene 30 14:59 BRM_saves_current
-rw-rw-rw- 1 root root 2751512 ene 30 14:59 BRM_savesA_vss
-rw-rw-rw- 1 root root      22 ene 30 14:59 BRM_saves_journal
-rw-rw-r-- 1 root root      12 ene 30 15:09 SMTxnID


David Hill

unread,
Jan 30, 2018, 11:14:36 AM1/30/18
to Nico, MariaDB ColumnStore
Thanks for info. DBRM data looks intact...

If possible, please provide a copy of the ColumnStore Support Report. 
We can look into this further.



--
You received this message because you are subscribed to the Google Groups "MariaDB ColumnStore" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mariadb-columnstore+unsub...@googlegroups.com.
To post to this group, send email to mariadb-columnstore@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Nico

unread,
Jan 30, 2018, 11:26:24 AM1/30/18
to MariaDB ColumnStore
Sent privately
Thanks
Nico

Andrew Hutchings

unread,
Jan 30, 2018, 12:14:39 PM1/30/18
to mariadb-c...@googlegroups.com
Hi Nico,

If your ColumnStore installation is running fine now these locks can be
removed using a hidden cleartablelock option, '-l'. For example:

cleartablelock -l 1

This ignores any rollback (since the table doesn't appear to exist) and
removes the lock anyway.

The downside is if the table really does exist and there was data to
rollback, it cannot be rolled back any more. This is why we don't really
publish this option.

Hope this helps.

Kind Regards
Andrew
> <https://mariadb.com/kb/en/library/system-troubleshooting-mariadb-columnstore/#mariadb-columnstore-support-report-tool>
>
> --
> You received this message because you are subscribed to the Google
> Groups "MariaDB ColumnStore" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to mariadb-columns...@googlegroups.com
> <mailto:mariadb-columns...@googlegroups.com>.
> To post to this group, send email to
> mariadb-c...@googlegroups.com
> <mailto:mariadb-c...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/mariadb-columnstore/9033dc0e-6625-46ac-a7a0-21a854504cd9%40googlegroups.com
> <https://groups.google.com/d/msgid/mariadb-columnstore/9033dc0e-6625-46ac-a7a0-21a854504cd9%40googlegroups.com?utm_medium=email&utm_source=footer>.
> For more options, visit https://groups.google.com/d/optout.

--
Andrew Hutchings (LinuxJedi)
Lead Engineer, MariaDB ColumnStore

Nico

unread,
Jan 30, 2018, 12:34:34 PM1/30/18
to MariaDB ColumnStore
Hi Andrew,
thanks for your suggestion.
I tried the -l option, but since the DMLProc on pm1 is FAILED seems I can't use the command, neither with this option.

Here what I tried:
root@prod-cs-1112:/usr/local/mariadb/columnstore/bin/cleartablelock -l 1
BRM error: DBRM is in READ-ONLY mode
Table lock 1 is not cleared.
root@prod-cs-1112:/usr/local/mariadb/columnstore/bin/dbrmctl reload
OK.
root@prod-cs-1112:/usr/local/mariadb/columnstore/bin/cleartablelock -l 1

System is not ready.  Verify that InfiniDB is up and ready before running this program

There is another way to clear this locks and force DMLProc to start?
Don't really metter about the data into that table...

Thanks again.
Rgds
Nico

Nico

unread,
Jan 31, 2018, 8:39:11 AM1/31/18
to MariaDB ColumnStore
I cannot figure out how to solve it.
Today I've installed a new clean cluster to reimport the data, but happens the same issue after network outage.

I will try to move the import logica directly into pm1 to avoid problems about importing data from external servers.

For more checking by your side I'm importing date from other machine doing:
mysql --host=db_server -u db_user -pdb_pass db_name --local-infile -e "LOAD DATA LOCAL INFILE '{filePathCSV}' INTO TABLE {db_table} FIELDS TERMINATED BY '|'
and the issue happens when I lost the connection between this machine and the pm1.
Then the DMLProc fails and the involved table remain locked in the state I reported.

Thanks for any support.
Nico

David Thompson

unread,
Feb 1, 2018, 11:02:54 PM2/1/18
to Nico, MariaDB ColumnStore

Hi Nico,

Unfortunately if you have another issue causing the network connection to be dropped, columnstore is protecting you by leaving the table lock in place since you have effectively a cpimport (Since you are doing LDI with autocommit on) that is started but was terminated abruptly. Are you not able to clear and rollback the new lock?

 

Also do you have any idea why the network connection is getting dropped?

 

David

--

You received this message because you are subscribed to the Google Groups "MariaDB ColumnStore" group.

To unsubscribe from this group and stop receiving emails from it, send an email to mariadb-columns...@googlegroups.com.
To post to this group, send email to mariadb-c...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mariadb-columnstore/ceb2109e-8ad9-4cf7-b13d-596d42b55c31%40googlegroups.com.

Nico

unread,
Feb 6, 2018, 9:08:10 AM2/6/18
to MariaDB ColumnStore
Hi David,
sorry for the late answer, the new lock I notice on the new cluster was cleared and the rollback was completed using the systemRestart command.
I had to relaunch the systemRestart 3-4 times, because the rollback takes a lot and it goes in timeout.
So finally this cluster was restored and now it's fully operative.

Regard to the first cluster, I tried to check better the logs, but all the ways I followed brings me to the result that rollback fails with "DMLProc started bulkrollback on table OID 5519 and table lock id 2 failed:IDB-2006: 'select schema,tablename,columnname from syscolumn where objectid=7196 --colName/EC7196' does not exist in Columnstore. and tablelock is not released." Here the table locks cannot be released, at the end I destroyed that cluster, cause not recoverable.

About the network connections drops, this happens because I'm using preemptible virtual machine (VM) instances on Google Cloud Platform, so some of them have been destroyed during cpimport.
I change this idea to a single stable instance that execute all the cpimports, moving elaborated data from the preemptive ones before import them.

So finally after all considerations above, I understand that columnstore is protecting me by leaving the table lock in place since I have effectively a cpimport that was terminated abruptly, but the strange thing is that when it happens the DMLProc crashes, leaving the entire system locked.
I think it's better that the columnstore automatically start a rollback and after releasing the table lock in the case a cpimport fails (and this is already detected in some way by the system and indicated with status "Abandoned1" in table locks), in the way to continue to serve other new queries without calling systemRestart by hand.


Nico
Reply all
Reply to author
Forward
0 new messages