accessing BAM files from FTP

224 views
Skip to first unread message

Andrea L

unread,
Jul 14, 2010, 9:26:26 AM7/14/10
to Pysam User group
I feel like I may be overlooking something, but is it possible access
BAM files from remote FTP (such as the 1000 genomes data) using pysam?
I know that Samtools does this, but I haven't been able to find the
right python call.

Thanks in advance,
Andrea

krobison

unread,
Jul 16, 2010, 1:35:03 PM7/16/10
to Pysam User group
I've succeeded in enabling this with two changes are required to pysam
to enable this functionality; however when used with HTTP a warning
which simply is annoying with command line SAMTools appears to cause
an exception which requires catching; for FTP it appears to work.

First, in the section of code of Samfile.open concerned with reading
the test for file existence needs to be modified with two additional
tests (the first two); otherwise URLs are seen as bad file names

if strncmp(filename,”http:”,5)!=0 and
strncmp(filename,”ftp:”,4)!=0 and strncmp( filename, "-", 1) != 0 and
not os.path.exists( filename )

Second, samtools needs to be compiled with the _USE_KNETFILE macro;
the current SAMTools Makefile also invokes FILE_OFFSET_BITS so I set
that too, though that may not be essential

pysam = Extension(
"pysam/csamtools", # name of extension
[ "pysam/csamtools.pyx" ] +\
[ "pysam/%s" % x for x in (
"pysam_util.c", )] +\
glob.glob( os.path.join( "samtools", "*.c" ) ),
library_dirs=[],
include_dirs=[ "samtools", ],
libraries=[ "z", ],
language="c",
define_macros = [('FILE_OFFSET_BITS','64'),
('_USE_KNETFILE','')],
)


For HTTP, at least some operations raise

File "/usr/local/lib/python2.6/dist-packages/pysam/__init__.py",
line 54, in __call__
if stderr: raise SamtoolsError( "\n".join( stderr ) )
pysam.SamtoolsError: '[knet_seek] SEEK_END is not supported for HTTP.
Offset is unchanged.\n\n[samopen] inconsistent number of target
sequences.\n'


(Andrea & I are co-workers; I've placed this here for the benefit of
the project)

Keith R.

krobison

unread,
Jul 16, 2010, 9:59:10 PM7/16/10
to Pysam User group
Turns out I used a very incomplete test case; two more needed changes
have been identified but also I have a puzzle unsolved.

First, a similar file existence check is needed for the BAM index file
within csamtools.pyx

if strncmp(filename,"http:",5)!=0 and
strncmp(filename,"ftp:",4)!=0 and not os.path.exists(filename +
".bai"):

(the first two tests are the change)

Second, when samtools downloads the index it outputs a message to
stderr, which causes pysam to throw an exception.
in __init__.py
indexLoadString="[bam_index_load] attempting to download the
remote index file."
if stderr and
strncmp(stderr,indexLoadString,len(indexLoadString))!=0: raise
SamtoolsError( "\n".join( stderr ) )

replaces
if stderr: raise SamtoolsError( "\n".join( stderr ) )


(apologies if this is really ugly Python code; I'm not a regular
coder)


The remaining mystery is why pysam.view works now but trying to
construct an object fails; they both seem to use the same call.
## runs
import sys
import pysam
import httplib

url="ftp://ftp.sanger.ac.uk/pub/rd/humanSequences/CV.bam"

print url
pysam.view("-H",url)
pysam.view(url, "11:2148860-2148960")

## errors out
import sys
import pysam
import httplib

url="ftp://ftp.sanger.ac.uk/pub/rd/humanSequences/CV.bam"
samfile = pysam.Samfile(url) # crashes out here complaining it can't
open the file

Andreas

unread,
Jul 18, 2010, 3:38:05 PM7/18/10
to Pysam User group
Hi,

Thanks! Added your suggestions, it seems to work now (revision
dev-0.3).

Note, that you need to open bamfiles in "binary" mode :
samfile = pysam.Samfile(url, "rb")

Partial success though: opening files works, but fetching stalls. Work
in progress.

Thanks!
Andreas

Jay Hesselberth

unread,
Feb 3, 2012, 4:27:37 PM2/3/12
to pysam-us...@googlegroups.com
Has there been any progress on this? Doesn't seem to be supported in 0.6 or in the repository. Is there something specific preventing this from being implemented?
Reply all
Reply to author
Forward
0 new messages