Download Patent Applications

TaWo

Oct 4, 2012, 3:13:18 PM
to gsutil-...@googlegroups.com

Hi, can somebody help me?

I can download patent applications whose numbers start with 121* using the following command:

python c:\gsutil\gsutil cp gs://uspto-pair/applications/121*  file://c:\patents

Some of the folders include a file named, for example, “12102391-2008-12-12-00004-SRNT.pdf”.

I would like to download only the files ending in *SRNT.pdf for the applications that start with 121*.

Google Cloud Storage Team

Oct 4, 2012, 4:44:18 PM
to gsutil-...@googlegroups.com
Hi,

Have you tried this:
python c:\gsutil\gsutil cp gs://uspto-pair/applications/121*SRNT.pdf  file://c:\patents
Marc
Google Cloud Storage Team

tanne...@gmail.com

Oct 5, 2012, 6:48:15 AM
to gsutil-...@googlegroups.com, gs-...@google.com
Hi,
Yes, I have. It doesn't work either:
"Command Exception: No Uris matched"
What could be the problem?
The data is publicly available to everyone.

Google Storage Team

Oct 8, 2012, 3:25:27 PM
to tanne...@gmail.com, gsutil-...@googlegroups.com
Are you sure the objects you're looking for exist? I tried listing all the objects in that bucket starting with 121. None of the results contain the string "pdf".

$ gsutil ls gs://uspto-pair/applications/121** | grep pdf
$

Marc

tanne...@gmail.com

Oct 9, 2012, 2:16:32 AM
to gsutil-...@googlegroups.com, tanne...@gmail.com, gs-...@google.com
 

Hi! Yes, I am sure!

With the following command you can download the zip file for the patent application 12102391:

python c:\gsutil\gsutil cp gs://uspto-pair/applications/12102391.zip file://c:\patents

Inside it, a subdirectory contains several files whose names end with SRNT.pdf.

So application 12102391, at least, does include such files!

Could it be a problem that it is a zip file?

 

Google Storage Team

Oct 9, 2012, 10:50:03 AM
to tanne...@gmail.com, gsutil-...@googlegroups.com
I think I see what's going on here. When you use meta-characters like *, those are used to match object names, not object contents. Your objects are zip files, which contain the files ending in SRNT.pdf, so the files you're searching for are one level of indirection away from your selection mechanism.

This is very similar to Unix shell meta-characters: imagine your objects were local zip files. Running the shell command 'ls 121*SRNT.pdf' would not find files embedded in those zip archives; it only matches file names in your directory (none of which end in .pdf).
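
To make the distinction concrete, here is a small sketch that uses Python's standard `fnmatch` module as a stand-in for gsutil's wildcard matching (the two are not guaranteed to behave identically in every case). The object name and the archive member name both come from this thread:

```python
from fnmatch import fnmatchcase

# Object name from the thread: wildcards are tested against this name only.
object_name = "applications/12102391.zip"
# File name inside that archive, as reported in the thread.
member_name = "12102391-2008-12-12-00004-SRNT.pdf"

print(fnmatchcase(object_name, "applications/121*SRNT.pdf"))  # False: the object name ends in .zip
print(fnmatchcase(object_name, "applications/121*"))          # True: the pattern matches the object name
print(fnmatchcase(member_name, "*SRNT.pdf"))                  # True, but only once you look inside the zip
```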

I'm not aware of an easy/efficient way to find the files of interest, short of writing a script to iterate over your objects with a download, unzip, ls (looking for the desired pattern), clean up and repeat cycle.
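
A minimal Python sketch of that download, unzip, filter, clean-up cycle follows. It assumes gsutil can be launched as a command (either `gsutil` on the PATH, or `python c:\gsutil\gsutil` as in the commands above) and that the wanted files sit somewhere inside each application's zip. The functions `extract_matching` and `fetch_matching` are hypothetical helpers, not part of gsutil. Rather than unzipping everything, it reads the member names with the standard `zipfile` module and extracts only those that match:

```python
import fnmatch
import os
import shutil
import subprocess
import tempfile
import zipfile


def extract_matching(zip_path, pattern, dest_dir):
    """Extract only the archive members whose base name matches `pattern`.

    Returns the list of file paths written to `dest_dir`.
    """
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            base = os.path.basename(info.filename)
            # Skip directory entries and members whose names don't match.
            if info.is_dir() or not fnmatch.fnmatchcase(base, pattern):
                continue
            # Write the member flat into dest_dir, dropping its subdirectory.
            target = os.path.join(dest_dir, base)
            with archive.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written


def fetch_matching(object_prefix, pattern, dest_dir, gsutil=("gsutil",)):
    """Download each zip object under `object_prefix` one at a time, keep only
    the members matching `pattern`, then discard the downloaded zip."""
    os.makedirs(dest_dir, exist_ok=True)
    # List the zip objects, e.g. gs://uspto-pair/applications/121*.zip
    listing = subprocess.run(
        [*gsutil, "ls", object_prefix + "*.zip"],
        check=True, capture_output=True, text=True,
    ).stdout.split()
    found = []
    for uri in listing:
        with tempfile.TemporaryDirectory() as tmp:
            local_zip = os.path.join(tmp, "object.zip")
            subprocess.run([*gsutil, "cp", uri, local_zip], check=True)
            found.extend(extract_matching(local_zip, pattern, dest_dir))
        # Leaving the with-block deletes the temporary zip (the clean-up step).
    return found
```

For the case in this thread, the call might look like `fetch_matching("gs://uspto-pair/applications/121", "*SRNT.pdf", r"c:\patents", gsutil=("python", r"c:\gsutil\gsutil"))`. Writing matches flat into the destination folder also keeps archive paths from escaping it; since the reported file names begin with the application number, collisions between archives seem unlikely, though that is an assumption. Every zip is still downloaded, so this saves disk space rather than transfer time.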

Marc
