Not that I am aware of, but I found bio.cc:/BiO/Store/DB/PDB, which was
mirrored last year (maybe a one-off)?
So, please go ahead as long as the space is enough - 846 GB remaining.
Maybe you can overwrite the old one.
Cheers,
Sung
We keep the parts we don't regularly use zipped, so 0.8 TB should be enough ;-)
Seriously, I think the whole archive comes in well under 100 GB.
We are buying hundreds of terabyte disks.
If you need more, just shout.
Cheers
Jong
Jong Bhak,
\(^o^)/
Tel: 031-888-9311, Fax: 031-888-9314, Mobile: 010 9944 6754
> Let's see if it works! :-D
Seems the process crashed 6000 structures in:
./update.sh
Wed Oct 27 15:33:43 KST 2010
RSYNCING!
rsync: failed to set times on "/BiO/Store/DB/PDB/./.": Operation not
permitted (1)
rsync: writefd_unbuffered failed to write 4092 bytes to socket
[generator]: Connection reset by peer (104)
rsync error: error in rsync protocol data stream (code 12) at
io.c(1525) [generator=3.0.6]
rsync error: received SIGUSR1 (code 19) at main.c(1285) [receiver=3.0.6]
NEW AND/OR UPDATED stats:
PDB; all 0, divided 0.
mmCIF; all 0, divided 0.
XML; all 0, divided 0.
XML-noatom; all 0, divided 0.
XML-extatom; all 0, divided 0.
Structure factors; all 0, divided 0.
BioUnit; all 0, divided 6085.
DELETED stats:
PDB; all 0, divided 0.
mmCIF; all 0, divided 0.
XML; all 0, divided 0.
XML-noatom; all 0, divided 0.
XML-extatom; all 0, divided 0.
Structure factors; all 0, divided 0.
BioUnit; all 0, divided 0.
Wed Oct 27 16:44:30 KST 2010
UNZIPING!
doing /BiO/Store/DB/PDB/data/structures/all/pdb
cant open directory /BiO/Store/DB/PDB/data/structures/all/pdb : No
such file or directory
Wed Oct 27 16:44:30 KST 2010
DONE!
Did you try editing /etc/aliases?
e.g. pdb biocc-serve...@googlegroups.com
You need to issue 'newaliases' after editing the file.
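For reference, a minimal sketch of the alias entry and the rebuild step
(the group address is elided here just as above, so treat it as a
placeholder):

    # /etc/aliases -- forward local mail for 'pdb' to the Google Group
    pdb: biocc-serve...@googlegroups.com

    # rebuild the aliases database so the MTA picks up the change
    newaliases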
I cannot access bio.cc at the moment - maybe it's down?
Cheers,
Sung
I'm loading the structures into the relational database now, and it's
taking a long time... We are up to 50k structures loaded after about
2.5 days...
It could be 'painful' to run this every week as we do currently. I'll
have to look at alternative PDB to RDB loaders to see how easy it is
to build a 'PDBLite' relational database.
I guess this is why bio.cc is running slowly at the moment, because mysql
writes to NFS (and home dirs are also on NFS). Not sure how to get
round that easily... Do any other mysql servers use the same NFS-mounted
data directory? (That could be a problem.)
Thanks for changing /etc/aliases (sorry, I always forget that is the
place to edit). Looks like the first cron job got in successfully
(yay!) before the change was made:
From: ro...@jaesu.bio.cc (Cron Daemon)
To: p...@jaesu.bio.cc
Subject: Cron <pdb@jaesu> /BiO/Store/DB/PDB/update.sh
Content-Type: text/plain; charset=UTF-8
Auto-Submitted: auto-generated
X-Cron-Env: <SHELL=/bin/sh>
X-Cron-Env: <HOME=/home/pdb>
X-Cron-Env: <PATH=/usr/bin:/bin>
X-Cron-Env: <LOGNAME=pdb>
X-Cron-Env: <USER=pdb>
Status: R
Wed Dec 8 11:45:01 KST 2010
RSYNCING!
NEW AND/OR UPDATED stats:
PDB; all 181, divided 325.
mmCIF; all 181, divided 329.
XML; all 181, divided 329.
XML-noatom; all 181, divided 329.
XML-extatom; all 181, divided 329.
Structure factors; all 172, divided 172.
BioUnit; all 318, divided 529.
DELETED stats:
PDB; all 2, divided 2.
mmCIF; all 2, divided 2.
XML; all 2, divided 2.
XML-noatom; all 2, divided 2.
XML-extatom; all 2, divided 2.
Structure factors; all 2, divided 2.
BioUnit; all 5, divided 5.
Wed Dec 8 12:42:58 KST 2010
UNZIPING!
doing /BiO/Store/DB/PDB/data/structures/all/pdb
While unzipping I found...
181 new PDB files
144 updated PDB files
2 obsolete PDB files
doing /BiO/Store/DB/PDB/data/structures/all/mmCIF
While unzipping I found...
181 new PDB files
148 updated PDB files
2 obsolete PDB files
Wed Dec 8 12:47:51 KST 2010
DONE!
BTW, feel free to play with the new PDB relational database (pdbase).
We have some usage notes that I'll put somewhere visible soon.
Cheers,
Dan.
Our admins pointed out another potential bottleneck for the PDBase
loading. Apparently NFS transmits a directory index every time a file
is accessed. For the 'unzipped' directory with 60,000 files this may
put significant load on the network and slow down the process. Maybe
we could work around this by using 'hash directories', just like for
the gzipped files. But I'd have to look into how OpenMMS handles file
loading. I think it currently takes a list file as input and I'm not
sure whether paths are allowed in this list (then it would be easy to
change). Otherwise, maybe running multiple parallel instances of
OpenMMS would help (see the sketch below). This assumes that we have
enough memory and MySQL is not the bottleneck.
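Just to illustrate the parallel idea, a rough sketch (the list file name
and the 'openmms-load' command are placeholders for however we actually
invoke the loader; it assumes the loader takes a list file as its
argument, as described above):

    # split the entry list into 4 roughly equal chunks (keeps whole lines)
    lines=$(wc -l < entry_list.txt)
    split -l $(( (lines + 3) / 4 )) entry_list.txt chunk_

    # run one loader instance per chunk in the background, then wait for all
    for f in chunk_*; do
        openmms-load "$f" &
    done
    wait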
Thanks for all the work!
Henning
Interesting. I remember hearing about problems with big directories
under NFS / GPFS / etc. However, I thought if you had the full path,
it didn't require the directory index lookup.
BTW, it's almost 70 thousand structures now! We should get ready for
the 100,000th structure ;-)
> we could work around this by using 'hash directories' just like for
> the gzipped files. But I'd have to look into how OpenMMS handles file
> loading. I think it currently takes a list file as input and I'm not
> sure whether paths are allowed in this list (then it would be easy to
It should be easy enough to hack the loader, but I figure if we are
going to do anything, it should be to implement incremental loading.
Improving load efficiency may let us load 70k structures per week, but
since we only really need to load about 500 (140 times fewer) we
should focus on that (i.e. what are the chances of us improving
loading efficiency 140-fold?).
I remember that there are options for incremental loading, but that
the loader wasn't behaving as advertised?
> change). Otherwise, maybe running multiple parallel instances of
> OpenMMS would help. This assumes that we have enough memory and MySQL
> is not the bottleneck.
Yeah, I thought about running parallel instances too. I think this is
the best short term solution, but I don't mind looking at the loader
for a) incremental loading, or b) PDBLite schema.
> Thanks for all the work!
As you know, all I did was install your modified loader!
Cheers,
Dan.
I think there are two points to consider for incremental loading:
1. to be sure that the DB is up-to-date no matter how often the update
was skipped
2. to avoid inconsistencies which would occur if the process is killed
while loading an entry
The first point can easily be addressed by doing a diff of the files
in 'unzipped' and in the DB.
Just feeding the diff list to OpenMMS would do it without even having
to change anything in the pipeline.
The second point is more difficult because OpenMMS is not really
'transaction-safe'. But then, we could just hope for the best, and if
we really notice any inconsistencies we can always do a full update
again.
For both points it may help that table 'mms_entry' has a column
'load_status' which, I think, should only be '1' if loading was
successful.
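A rough sketch of the 'unzipped' vs. DB diff using load_status, assuming
the loaded IDs can be pulled straight out of mms_entry (the 'pdb_id'
column name and the path of the unzipped directory are guesses here,
and the mysql call would still need credentials):

    # IDs already loaded successfully (load_status = 1)
    mysql -N -e "SELECT pdb_id FROM mms_entry WHERE load_status = 1" pdbase \
        | sort > loaded.txt

    # IDs present on disk (file names like pdb1abc.ent -> 1abc)
    ls /BiO/Store/DB/PDB/unzipped \
        | sed 's/^pdb//; s/\.ent$//' | sort > on_disk.txt

    # entries on disk but not yet in the DB -> feed this list to the loader
    comm -23 on_disk.txt loaded.txt > to_load.txt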
I believe we mainly didn't implement incremental loading before
because we had enough resources available
to not even bother.
Henning
It looks like it's taking too long?
>
> It could be 'painful' to run this every week as we do currently. I'll
> have to look at alternative PDB to RDB loaders to see how easy it is
> to build a 'PDBLite' relational database.
>
> I guess this is why bio.cc is running slow currently, because mysql
> writes to NFS (and home dirs are also on NFS) . Not sure how to get
> round that easily... Do any other mysql servers use the same NFS
> mounted data directory? (That could be a problem).
>
Are you sure? Is mysql writing to an NFS directory?
I see this from /etc/my.cnf
datadir=/BiO/Serve/LocalmySQL
And this from /etc/fstab:
/dev/md0 /BiO ext3 defaults 0 0
I don't think /BiO is NFS-mounted.
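Just as a quick sanity check, something like this would confirm it
(purely a sketch):

    # show the filesystem type backing the mysql datadir
    df -T /BiO/Serve/LocalmySQL

    # and what the kernel actually has mounted at /BiO
    mount | grep ' /BiO '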
>
> Thanks for changing /etc/aliases (sorry, I always forget that is the
> place to edit). Looks like the first cron job got in successfully
> (yay!) before the change was made:
Let's see whether we get a cron mail for this job the next time it runs.
>
>
> BTW, feel free to play with the new PDB relational database (pdbase).
> We have some usage notes that I'll put somewhere visible soon.
>
Thanks and cheers.
Just edited - let's see how it goes.
I could see a new email in /var/spool/mail/pdb.
But it doesn't seem to relay properly to biocc-serve...@googlegroups.com
Maybe I need to check the biocc Google Group settings - perhaps only
members can send emails?
Tested on my local account (su...@jaesu.bio.cc), which relayed
successfully to my gmail account.
Interesting! I'll add it to the list of members.
Just added to the member list.
Could you set up a dummy cron job to see whether p...@jaesu.bio.cc can
relay to biocc-server-interface?
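For example, something as simple as this in pdb's crontab should do as
the dummy job (the schedule below is just illustrative) - cron mails the
command's output to the pdb user, and /etc/aliases should then forward
it to the group:

    # crontab -e as the pdb user
    38 20 * * * echo "biocc-server-interface relay test"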
Just to make things clear, my test above was done locally using mutt -
that's why the sender address shows jaesu.bio.cc.
But I've just added 'jaesu' as a CNAME of bio.cc.
As for GoDaddy, it's only used for email forwarding - now set up for
su...@bio.cc and j...@bio.cc.
Would you like to add p...@bio.cc (same as p...@jaesu.bio.cc)?
I thought p...@bio.cc would be for sending only (to biocc-server-interface).
I tried via Mutt to begin with (as pdb)
Just tried a cron at 20:33 (server time)...
Just tried a cron at 20:38 (server time)...
> Just to make things clear, my test above was done locally using mutt -
> that's why the sender address shows jaesu.bio.cc.
> But I've just added 'jaesu' as a CNAME of bio.cc.
>
> As for GoDaddy, it's only used for email forwarding - now set up for
> su...@bio.cc and j...@bio.cc.
> Would you like to add p...@bio.cc (same as p...@jaesu.bio.cc)?
> I thought p...@bio.cc would be for sending only (to biocc-server-interface).
:-/ ... don't know sorry! If you think it's a good idea :-)
Dan.