Crawling Windows shares datafari 2.1.3

Troy Tate

Feb 8, 2016, 8:21:25 AM
to Datafari
I downloaded jcifs-1.3.18.jar from http://jcifs.samba.org/src/ to DATAFARI_SOURCE_DIR\mcf\mcf_home\connector-lib-proprietary and renamed it to jcifs.jar. I then edited the Datafari/mcf/mcf_home/connectors.xml file and uncommented the <!--repositoryconnector name="Windows shares" class="org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector"/--> line, but Windows shares does not show in the list of connector types. I have stopped and restarted Datafari a couple of times, with no luck.

Thanks for the help. 

cedric...@francelabs.com

Feb 8, 2016, 8:28:26 AM
to Datafari
Hi Troy,

1. Why did you rename the jar? (It should not have any impact, but the readme procedure does not ask for it.)
2. Can you copy and paste the content of your connectors.xml file?

Regards,

Cedric

Troy Tate

Feb 8, 2016, 8:59:40 AM
to Datafari
Renamed just for testing purposes. I renamed it back to the original name.
----
<?xml version="1.0" encoding="UTF-8" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at


 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<!-- The connectors registry file permits registration of domains and connectors upon the
      startup of the jetty-based ManifoldCF example.  In a real installation, this registration
      step would be done ideally just once, but in the example the connectors
      are all reregistered on every startup.
-->
<connectors>
    <!-- Add any authorization domains here -->
    <!-- authorizationdomain domain="AD" name="ActiveDirectory"/-->
    <!-- authorizationdomain domain="SHP" name="SharePoint"/-->
    <!-- authorizationdomain domain="FB" name="FaceBook"/-->

    <!-- Add your output connectors here -->
  <outputconnector name="Solr" class="org.apache.manifoldcf.agents.output.solr.SolrConnector"/>
  <outputconnector name="OpenSearchServer" class="org.apache.manifoldcf.agents.output.opensearchserver.OpenSearchServerConnector"/>
  <outputconnector name="Null" class="org.apache.manifoldcf.agents.output.nullconnector.NullConnector"/>
  <outputconnector name="HDFS" class="org.apache.manifoldcf.agents.output.hdfs.HDFSOutputConnector"/>
  <outputconnector name="MetaCarta GTS" class="org.apache.manifoldcf.agents.output.gts.GTSConnector"/>
  <outputconnector name="File system" class="org.apache.manifoldcf.agents.output.filesystem.FileOutputConnector"/>
  <outputconnector name="ElasticSearch" class="org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchConnector"/>
 <outputconnector name="AmazonCloudSearch" class="org.apache.manifoldcf.agents.output.amazoncloudsearch.AmazonCloudSearchConnector"/>

    <!-- Add your transformation connectors here -->
  <transformationconnector name="Tika content extractor" class="org.apache.manifoldcf.agents.transformation.tika.TikaExtractor"/>
  <transformationconnector name="Null" class="org.apache.manifoldcf.agents.transformation.nullconnector.NullConnector"/>
  <transformationconnector name="Metadata adjuster" class="org.apache.manifoldcf.agents.transformation.forcedmetadata.ForcedMetadataConnector"/>
  <transformationconnector name="Allowed documents" class="org.apache.manifoldcf.agents.transformation.documentfilter.DocumentFilter"/>

    <!-- Add your mapping connectors here -->
  <mappingconnector name="Regular expression mapper" class="org.apache.manifoldcf.authorities.mappers.regexp.RegexpMapper"/>

    <!-- Add your authority connectors here -->
  <authorityconnector name="SharePoint/Native" class="org.apache.manifoldcf.authorities.authorities.sharepoint.SharePointAuthority"/>
  <authorityconnector name="SharePoint/ActiveDirectory" class="org.apache.manifoldcf.authorities.authorities.sharepoint.SharePointADAuthority"/>
  <authorityconnector name="Null" class="org.apache.manifoldcf.authorities.authorities.nullauthority.NullAuthority"/>
  <authorityconnector name="Meridio" class="org.apache.manifoldcf.authorities.authorities.meridio.MeridioAuthority"/>
  <!--authorityconnector name="LiveLink" class="org.apache.manifoldcf.authorities.authorities.livelink.LivelinkAuthority"/-->
  <authorityconnector name="LDAP" class="org.apache.manifoldcf.authorities.authorities.ldap.LDAPAuthority"/>
  <authorityconnector name="Jira" class="org.apache.manifoldcf.authorities.authorities.jira.JiraAuthorityConnector"/>
  <authorityconnector name="JDBC" class="org.apache.manifoldcf.authorities.authorities.jdbc.JDBCAuthority"/>
  <authorityconnector name="Generic" class="org.apache.manifoldcf.authorities.authorities.generic.GenericAuthority"/>
  <authorityconnector name="Documentum" class="org.apache.manifoldcf.authorities.authorities.DCTM.AuthorityConnector"/>
  <authorityconnector name="CMIS" class="org.apache.manifoldcf.authorities.authorities.cmis.CmisAuthorityConnector"/>
  <authorityconnector name="Alfresco Webscript" class="org.apache.manifoldcf.authorities.authorities.alfrescowebscript.AlfrescoAuthorityConnector"/>
  <authorityconnector name="Active Directory" class="org.apache.manifoldcf.authorities.authorities.activedirectory.ActiveDirectoryAuthority"/>

    <!-- Add your repository connectors here -->
  <repositoryconnector name="Wiki" class="org.apache.manifoldcf.crawler.connectors.wiki.WikiConnector"/>
  <repositoryconnector name="Web" class="org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector"/>
  <repositoryconnector name="SharePoint" class="org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository"/>
  <repositoryconnector name="RSS" class="org.apache.manifoldcf.crawler.connectors.rss.RSSConnector"/>
  <repositoryconnector name="Meridio" class="org.apache.manifoldcf.crawler.connectors.meridio.MeridioConnector"/>
  <!--repositoryconnector name="LiveLink" class="org.apache.manifoldcf.crawler.connectors.livelink.LivelinkConnector"/-->
  <repositoryconnector name="Jira" class="org.apache.manifoldcf.crawler.connectors.jira.JiraRepositoryConnector"/>
  <repositoryconnector name="JDBC" class="org.apache.manifoldcf.crawler.connectors.jdbc.JDBCConnector"/>
  <repositoryconnector name="Windows shares" class="org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector"/-->
  <repositoryconnector name="HDFS" class="org.apache.manifoldcf.crawler.connectors.hdfs.HDFSRepositoryConnector"/>
  <repositoryconnector name="GridFS" class="org.apache.manifoldcf.crawler.connectors.gridfs.GridFSRepositoryConnector"/>
  <repositoryconnector name="GoogleDrive" class="org.apache.manifoldcf.crawler.connectors.googledrive.GoogleDriveRepositoryConnector"/>
  <repositoryconnector name="Generic" class="org.apache.manifoldcf.crawler.connectors.generic.GenericConnector"/>
  <repositoryconnector name="File system" class="org.apache.manifoldcf.crawler.connectors.filesystem.FileConnector"/>
  <repositoryconnector name="FileNet" class="org.apache.manifoldcf.crawler.connectors.filenet.FilenetConnector"/>
  <repositoryconnector name="EMail" class="org.apache.manifoldcf.crawler.connectors.email.EmailConnector"/>
  <repositoryconnector name="DropBox" class="org.apache.manifoldcf.crawler.connectors.dropbox.DropboxRepositoryConnector"/>
  <repositoryconnector name="Documentum" class="org.apache.manifoldcf.crawler.connectors.DCTM.DCTM"/>
  <repositoryconnector name="CMIS" class="org.apache.manifoldcf.crawler.connectors.cmis.CmisRepositoryConnector"/>
  <!--repositoryconnector name="Alfresco" class="org.apache.manifoldcf.crawler.connectors.alfresco.AlfrescoRepositoryConnector"/-->
  <repositoryconnector name="Alfresco Webscript" class="org.apache.manifoldcf.crawler.connectors.alfrescowebscript.AlfrescoConnector"/>

</connectors>

cedric...@francelabs.com

Feb 8, 2016, 9:04:15 AM
to Datafari
OK, it seems you did not fully uncomment the line, which is why it does not work. The line still ends with the comment terminator "/-->" instead of "/>". Please replace this line:

<repositoryconnector name="Windows shares" class="org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector"/-->


With this line:


<repositoryconnector name="Windows shares" class="org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector"/>
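
A half-uncommented line like the one above can be caught before restarting by running the file through any standard XML parser. The following is only a minimal sketch using Python's standard library; the function name and the two sample strings are made up for illustration, and the thread does not say how ManifoldCF itself reacts to such a file.

```python
import xml.etree.ElementTree as ET

# Illustrative samples only: a line that still ends in "/-->" and the corrected one.
BROKEN = (
    b'<connectors>\n'
    b'  <repositoryconnector name="Windows shares" '
    b'class="org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector"/-->\n'
    b'</connectors>\n'
)
FIXED = (
    b'<connectors>\n'
    b'  <!--repositoryconnector name="LiveLink" '
    b'class="org.apache.manifoldcf.crawler.connectors.livelink.LivelinkConnector"/-->\n'
    b'  <repositoryconnector name="Windows shares" '
    b'class="org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector"/>\n'
    b'</connectors>\n'
)


def active_repository_connectors(xml_bytes):
    """Return the names of repository connectors that are not commented out.

    Raises xml.etree.ElementTree.ParseError if the content is not well-formed,
    which is what happens when a tag ends in "/-->" instead of "/>".
    """
    # The default parser drops XML comments, so commented-out entries are skipped.
    root = ET.fromstring(xml_bytes)
    return [element.get("name") for element in root.iter("repositoryconnector")]


def describe(xml_bytes):
    """Return a short human-readable verdict for the given file content."""
    try:
        names = active_repository_connectors(xml_bytes)
    except ET.ParseError as error:
        return f"not well-formed: {error}"
    return "active repository connectors: " + ", ".join(names)
```

To check a real file, read connectors.xml in binary mode and pass its content to `describe`. A clean parse only shows that the file is well-formed XML; it does not prove that the connector is registered.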


Regards,


Cedric

Troy Tate

Feb 8, 2016, 9:27:50 AM
to Datafari
It's early on a Monday. Thanks for catching that, but making that change had no effect. Windows shares still does not show in the list of connectors.
[Attachment: Connectors.png]

cedric...@francelabs.com

Feb 8, 2016, 9:30:45 AM
to Datafari
You also need to stop and restart Datafari after making such a change.

Regards,

Cedric

Troy Tate

Feb 8, 2016, 9:37:51 AM
to Datafari
I have done that and rebooted the system also - no change in the available connector types.

Troy Tate

Feb 8, 2016, 9:42:32 AM
to Datafari
I even tried commenting out the Alfresco Webscript connector as shown below but it does not remove it from the list. So somehow the page is not refreshing even after a server/service restart. 

   <!-- Add your repository connectors here -->
  <repositoryconnector name="Wiki" class="org.apache.manifoldcf.crawler.connectors.wiki.WikiConnector"/>
  <repositoryconnector name="Web" class="org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector"/>
  <repositoryconnector name="SharePoint" class="org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository"/>
  <repositoryconnector name="RSS" class="org.apache.manifoldcf.crawler.connectors.rss.RSSConnector"/>
  <repositoryconnector name="Meridio" class="org.apache.manifoldcf.crawler.connectors.meridio.MeridioConnector"/>
  <!--repositoryconnector name="LiveLink" class="org.apache.manifoldcf.crawler.connectors.livelink.LivelinkConnector"/-->
  <repositoryconnector name="Jira" class="org.apache.manifoldcf.crawler.connectors.jira.JiraRepositoryConnector"/>
  <repositoryconnector name="JDBC" class="org.apache.manifoldcf.crawler.connectors.jdbc.JDBCConnector"/>
  <repositoryconnector name="Windows shares" class="org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector"/>
  <repositoryconnector name="HDFS" class="org.apache.manifoldcf.crawler.connectors.hdfs.HDFSRepositoryConnector"/>
  <repositoryconnector name="GridFS" class="org.apache.manifoldcf.crawler.connectors.gridfs.GridFSRepositoryConnector"/>
  <repositoryconnector name="GoogleDrive" class="org.apache.manifoldcf.crawler.connectors.googledrive.GoogleDriveRepositoryConnector"/>
  <repositoryconnector name="Generic" class="org.apache.manifoldcf.crawler.connectors.generic.GenericConnector"/>
  <repositoryconnector name="File system" class="org.apache.manifoldcf.crawler.connectors.filesystem.FileConnector"/>
  <repositoryconnector name="FileNet" class="org.apache.manifoldcf.crawler.connectors.filenet.FilenetConnector"/>
  <repositoryconnector name="EMail" class="org.apache.manifoldcf.crawler.connectors.email.EmailConnector"/>
  <repositoryconnector name="DropBox" class="org.apache.manifoldcf.crawler.connectors.dropbox.DropboxRepositoryConnector"/>
  <repositoryconnector name="Documentum" class="org.apache.manifoldcf.crawler.connectors.DCTM.DCTM"/>
  <repositoryconnector name="CMIS" class="org.apache.manifoldcf.crawler.connectors.cmis.CmisRepositoryConnector"/>
  <!--repositoryconnector name="Alfresco" class="org.apache.manifoldcf.crawler.connectors.alfresco.AlfrescoRepositoryConnector"/-->
  <!--repositoryconnector name="Alfresco Webscript" class="org.apache.manifoldcf.crawler.connectors.alfrescowebscript.AlfrescoConnector"/-->



Aurélien MAZOYER

Feb 8, 2016, 12:22:52 PM
to Troy Tate, Datafari

Hi,

Please run the initialize-db.sh script in mcf_home so that MCF takes your change into account.
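
For anyone who wants to script this step, here is a rough sketch of running the script from inside its own directory. The function name and the `mcf_home` argument are hypothetical, and it assumes the script takes no arguments and can be run with bash. The thread does not say whether Datafari should be stopped or running when you do this, so check the documentation first.

```python
import subprocess
from pathlib import Path


def run_initialize_db(mcf_home):
    """Run initialize-db.sh with mcf_home as the working directory.

    Returns the exit code of the script. The mcf_home path is supplied by the
    caller; nothing about its location is assumed here.
    """
    script = Path(mcf_home) / "initialize-db.sh"
    if not script.is_file():
        raise FileNotFoundError(f"{script} not found; check the mcf_home path")
    # Assumption: no arguments are needed and bash is available on the host.
    result = subprocess.run(["bash", script.name], cwd=script.parent)
    return result.returncode
```

A non-zero return value means the script reported a failure, in which case the connector list is unlikely to have been refreshed.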

Regards,

Aurélien


Troy Tate

Feb 8, 2016, 5:09:59 PM
to Datafari, lab...@gmail.com
Actually, I just removed the opt/datafari folder and reinstalled. It is working now that I made the appropriate changes before starting the service.

Thanks.

Olivier Tavard

Feb 8, 2016, 6:21:56 PM
to Troy Tate, Datafari
Hi Troy,

I'm glad that you managed to get the new connector working.

The procedure you followed at the beginning was correct, but I assume that you tried to add the connector to an existing Datafari installation. You can still add it, but there is an additional step: you need to tell ManifoldCF to initialize all the connectors again.
Our documentation did not distinguish between the two cases, so I created a page on our wiki that details this for other people:

Best,

Olivier

Troy Tate

Feb 9, 2016, 7:24:49 AM
to Datafari, lab...@gmail.com
Thank you!