[Dspace-tech] forbidden oai import


Ady Wahyudi Paundu

Aug 26, 2015, 10:58:50 AM
to dspace-tech
hi all, another question,

i ran [dspace]/bin/dspace oai import -c -o
It ran successfully up to 3600 items, then it showed a Forbidden error:

...
3300 items imported so far...
3400 items imported so far...
3500 items imported so far...
3600 items imported so far...
org.apache.solr.common.SolrException: Forbidden

Forbidden

request: http://repository.unhas.ac.id/solr/oai/update?wt=javabin&version=2
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:64)
at org.dspace.xoai.app.XOAI.index(XOAI.java:229)
at org.dspace.xoai.app.XOAI.indexAll(XOAI.java:210)
at org.dspace.xoai.app.XOAI.index(XOAI.java:128)
at org.dspace.xoai.app.XOAI.main(XOAI.java:439)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:183)
...

what went wrong?

additional info:
In both [dspace]/config/modules/discovery.cfg and
[dspace]/config/modules/solr-statistics.cfg, I set the server attribute
to http://localhost/solr/....

Another thing: why did it fail only somewhere around 3600 items?

best regards,
adywp



Lighton Phiri

Aug 26, 2015, 10:58:55 AM
to Ady Wahyudi Paundu, dspace-tech
Hello,

What happens if you re-run the import without optimising your index,
that is, without the '-o' flag? Perhaps it has something to do with
this [1]?

[dspace]/bin/dspace oai import -c

[1] https://issues.apache.org/jira/browse/SOLR-2832

Lighton Phiri
http://lightonphiri.org
> _______________________________________________
> DSpace-tech mailing list
> DSpac...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dspace-tech
> List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette

João Melo

Aug 26, 2015, 10:58:57 AM
to Ady Wahyudi Paundu, dspace-tech
Hi Ady,

This exception is thrown whenever the OAI indexer tries to access a Solr server that replies with an HTTP 403 Forbidden error.

The request: http://repository.unhas.ac.id/solr/oai/update?wt=javabin&version=2

If you try to access it, it does, in fact, return a 403 error.

It shouldn't work at all (ever), so it's weird to see it fail only after a while.
I can't figure out what is happening; it could be an infrastructure problem (firewall, URL redirects, URL rewrites...).
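
One quick way to see which status the Solr URL from the stack trace returns is a small Python sketch like the one below. It is only an illustration: the helper name http_status is hypothetical (it is not part of DSpace or Solr), and it assumes the machine running it can reach the Solr host. urlopen follows redirects, so the code reported belongs to the final response.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code the server sends for a GET on url.

    Hypothetical helper for illustration only.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the code is still what we want.
        return err.code
```

Calling http_status with the request URL shown in the error, and then with the same path on localhost, may show whether only the public hostname is being refused.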






--
Thanks, João Melo
DSpace Department
Lyncode

ad...@unhas.ac.id

Aug 26, 2015, 10:58:59 AM
to Lighton Phiri, dspace-tech

Hi Lighton,

I kept getting the same result, the Forbidden error, even without the
'o' flag.
However, every time I try, it stops at a different item count: one time
at 700, another time at 100...

I also looked at the issue you mentioned, but I don't know which
protocol's connection timeout I should extend, since I use both HTTP
and AJP.
I will try changing both of them and tell you how it goes.

best regards,
adywp

ad...@unhas.ac.id

Aug 26, 2015, 10:59:00 AM
to João Melo, dspace-tech

Yes Sir,
I have also found several similar cases on the internet; however, all of
them happened at the start and were caused by a misconfigured server
attribute for either discovery or Solr.
I don't know about the infrastructure problem, but all of the DSpace
resources to index are on localhost (including the database).

thank you.

best regards,
adywp

On Thu, 30 May 2013 18:11:10 +0100, João Melo wrote:
> Hi Ady,
>
> this exception is thrown whenever the OAI indexer is trying to access
> some SOLR server which replies with a HTTP 403 Forbidden error. 
>
> The request:
> http://repository.unhas.ac.id/solr/oai/update?wt=javabin&version=2
>
> If you try to access it, it will, in fact, return a 403 error.
>
> It shouldn't work at all (never), it's weird watching it happening
> after a while. 
>  I'm not able to figure it out what is happening, it could be an
> infrastructure problem (firewall, url redirects, url rewrites...)
>
> On 30 May 2013 13:07, Ady Wahyudi Paundu wrote:
>
>> hi all, another question,
>>
>> i ran [dspace]/bin/dspace oai import -c -o
>> it went successfully up until 3600 items, then it show forbidden
>> error:
>>
>> ...
>> 3300 items imported so far...
>> 3400 items imported so far...
>> 3500 items imported so far...
>> 3600 items imported so far...
>> org.apache.solr.common.SolrException: Forbidden
>>
>> Forbidden
>>
>> request: http://repository.unhas.ac.id/solr/oai/update?wt=javabin&version=2
>> ...

esa

Aug 26, 2015, 10:59:03 AM
to dspac...@lists.sourceforge.net
hello mr. Ady,

Is Solr configured to use localhost? As you know, Solr should be set to
localhost or 127.0.0.1.
If you use Solr as the storage for OAI, you must set solr.url to
localhost so that the import can run.
Like this:
solr.url=http://localhost:8080/solr/oai

I also think you must set cache.enabled to 'false', because on my OAI server
the data wasn't updated until I set cache.enabled to 'false'.
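
As a rough way to check the solr.url advice above, here is a minimal Python sketch that parses properties-style text and reports whether solr.url points at a loopback host. The helper names read_properties and solr_url_is_local are made up for illustration and are not part of DSpace. The parser only handles plain "key = value" lines.

```python
from urllib.parse import urlparse

# Hosts treated as "local" for this check (an assumption for the sketch).
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def read_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def solr_url_is_local(props, key="solr.url"):
    """Return True when the configured Solr URL points at a loopback host."""
    host = urlparse(props.get(key, "")).hostname
    return host in LOOPBACK_HOSTS
```

To use it, read the text of the OAI module configuration file, pass it to read_properties, and pass the result to solr_url_is_local. A False result would suggest the indexer is being sent to a public hostname, as in the request URL shown in the error.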

i hope this will help you.

Oh yeah, are you Indonesian? I'm Indonesian too, and I'm now working on DSpace
for Widyatama University.



--
View this message in context: http://dspace.2283337.n4.nabble.com/forbidden-oai-import-tp4664487p4664509.html
Sent from the DSpace - Tech mailing list archive at Nabble.com.

ady

Aug 26, 2015, 10:59:11 AM
to dspac...@lists.sourceforge.net
Hi Esa,
Thank you for the hint, but I already set the oai.url to localhost, yet the
problem remains.
I haven't tried the cache thing though... I'll give it a shot.

warm regards from makassar,
adywp



--
View this message in context: http://dspace.2283337.n4.nabble.com/forbidden-oai-import-tp4664487p4664517.html

esa

Aug 26, 2015, 10:59:28 AM
to dspac...@lists.sourceforge.net
Hello Mr. Ady. Warm regards too from Bandung.

You say you have already set the oai.url to localhost, but what I mean is that Solr should be set to localhost, not the oai.url. The oai.url must be set to the DSpace base URL that is configured in dspace.cfg.

Here is my OAI configuration; I hope this helps you.

#---------------------------------------------------------------#
#--------------------XOAI CONFIGURATIONS------------------------#
#---------------------------------------------------------------#
# These configs are used by the XOAI                            #
#---------------------------------------------------------------#

# Storage: solr | database
storage=solr

# Base solr index; keep it on localhost
solr.url = http://localhost:8080/solr/oai
# OAI persistent identifier prefix.
# Format - oai:PREFIX:HANDLE
identifier.prefix = repository.widyatama.ac.id
# Base url for bitstreams
bitstream.baseUrl = http://repository.widyatama.ac.id:8080/xmlui

# Base Configuration Directory
config.dir = /opt/dspace/config/crosswalks/oai

# Cache enabled?
cache.enabled = false

# Base Cache Directory
cache.dir = /opt/dspace/var/oai

# OAI base URL, added by Esa
oai.dspace.url = http://repository.widyatama.ac.id:8080/xmlui

#---------------------------------------------------------------#
#--------------OAI HARVESTING CONFIGURATIONS--------------------#
#---------------------------------------------------------------#
# These configs are only used by the OAI-ORE related functions  #
#---------------------------------------------------------------#

### Harvester settings

# Crosswalk settings; the {name} value must correspond to a declared ingestion crosswalk
# harvester.oai.metadataformats.{name} = {namespace},{optional display name}
# The display name is only used in the xmlui; for the jspui there are entries in
# Messages.properties in the form jsp.tools.edit-collection.form.label21.select.{name}
harvester.oai.metadataformats.dc = http://www.openarchives.org/OAI/2.0/oai_dc/, Simple Dublin Core
harvester.oai.metadataformats.qdc = http://purl.org/dc/terms/, Qualified Dublin Core
harvester.oai.metadataformats.dim = http://www.dspace.org/xmlns/dspace/dim, DSpace Intermediate Metadata

# This field works in much the same way as harvester.oai.metadataformats.PluginName
# The {name} must correspond to a declared ingestion crosswalk, while the
# {namespace} must be supported by the target OAI-PMH provider when harvesting content.
# harvester.oai.oreSerializationFormat.{name} = {namespace}
harvester.oai.oreSerializationFormat.ore = http://www.w3.org/2005/Atom

# Determines whether the harvester scheduling process should be started
# automatically when the DSpace webapp is deployed.
# default: false
harvester.autoStart=false

# Amount of time subtracted from the from argument of the PMH request to account
# for the time taken to negotiate a connection. Measured in seconds. Default value is 120.
#harvester.timePadding = 120

# How frequently the harvest scheduler checks the remote provider for updates,
# measured in minutes. The default value is 12 hours (or 720 minutes)
#harvester.harvestFrequency = 720

# The heartbeat is the frequency at which the harvest scheduler queries the local
# database to determine if any collections are due for a harvest cycle (based on
# the harvestFrequency) value. The scheduler is optimized to then sleep until the
# next collection is actually ready to be harvested. The minHeartbeat and
# maxHeartbeat are the lower and upper bounds on this timeframe. Measured in seconds.
# Default minHeartbeat is 30.  Default maxHeartbeat is 3600.
#harvester.minHeartbeat = 30
#harvester.maxHeartbeat = 3600

# How many harvest process threads the scheduler can spool up at once. Default value is 3.
#harvester.maxThreads = 3

# How much time passes before a harvest thread is terminated. The termination process
# waits for the current item to complete ingest and saves progress made up to that point.
# Measured in hours. Default value is 24.
#harvester.threadTimeout = 24

# When harvesting an item that contains an unknown schema or field within a schema what
# should the harvester do? Either add a new registry item for the field or schema, ignore
# the specific field or schema (importing everything else about the item), or fail with
# an error. The default value if undefined is: fail.
# Possible values: 'fail', 'add', or 'ignore'
harvester.unknownField  = add
harvester.unknownSchema = fail

# The webapp responsible for minting the URIs for ORE Resource Maps.
# If using oai, the dspace.oai.uri config value must be set.
# The URIs generated for ORE ReMs follow the following convention for both cases.
# format: [baseURI]/metadata/handle/[theHandle]/ore.xml
# Default value is oai
ore.authoritative.source = xmlui

# A harvest process will attempt to scan the metadata of the incoming items
# (dc.identifier.uri field, to be exact) to see if it looks like a handle.
# If so, it matches the pattern against the values of this parameter.
# If there is a match the new item is assigned the handle from the metadata value
# instead of minting a new one. Default value: hdl.handle.net
#harvester.acceptedHandleServer = hdl.handle.net, handle.myu.edu
harvester.acceptedHandleServer = hdl.handle.net, handle.myu.edu

# Pattern to reject as an invalid handle prefix (known test string, for example)
# when attempting to find the handle of harvested items. If there is a match with
# this config parameter, a new handle will be minted instead. Default value: 123456789.
#harvester.rejectedHandlePrefix = 123456789, myTestHandle


2013/5/31 ady [via DSpace] <[hidden email]>:
> Hi Esa,
> thank you for the hint, but i already set the oai.url to localhost yet the problem remain.
> i haven't try the cache thing though... i'll give it a shot
>
> warm regards from makassar,
> adywp





--
Thanks,


Esa Fauzi


