COAL Updates and Questions

Alexa Huerta

Sep 27, 2018, 8:15:26 PM

to Coal-capstone, Lewis John Mcgibbney

Hi Dr. McGibbney,

Alex and I found a slightly smaller image file to run through the pycoal library commands; however, it is still taking a long time and has not been able to complete the generation of the mineral classification file. Would you be able to send us the archived files that have been generated previously through pycoal so that we can use those to begin working on the staging process?

Additionally, we had some questions about the staging process:

1. Our understanding is we are staging GIS products on Google Drive - which account do we use for staging? Is it connected to the COAL Google Group?

2. Is there a limited storage space on Google Drive? If so, how should we handle that?

3. Which files are we staging? Everything that is generated through the pycoal process or only certain ones?

4. To clarify, we should be automating the process of staging the GIS products into Google Drive?

Denim and Kristen also have the following questions for you regarding File Manager and extracting meta-data:

1. What needs to be done for the metadata extractors by this next week?

2. Could you provide the repo on where to write the metadata extractors?

Finally, we are working on creating a high-level design document as our third deliverable for our class. We will send it to you Monday, and ask that you submit your approval to our professor then.

Thank you for your help.

Best,

Alexa Huerta

Alexa Huerta

University of Southern California

B.S. Computer Science, December 2018

alex...@usc.edu

Lewis John Mcgibbney

Sep 28, 2018, 5:01:44 PM

to Alexa Huerta, Coal-capstone

Hi Team,

Back from my business travel, thank you for writing with questions.

Responses below

On Thu, Sep 27, 2018 at 5:15 PM Alexa Huerta <alex...@usc.edu> wrote:

Hi Dr. McGibbney,

Alex and I found a slightly smaller image file to run through the pycoal library commands; however, it is still taking a long time and has not been able to complete the generation of the mineral classification file.

How long were you running this for?

What OS do you have? Which Python version? How much RAM do you have available?

Would you be able to send us the archived files that have been generated previously through pycoal so that we can use those to begin working on the staging process?

Yes, please see

https://drive.google.com/open?id=1YVhdLxvrZE3eC97OEXathLMJRgWt8haO

With QGIS you can view the RGB, mineral classified, mining classified and environmentally correlated images. Please confirm you can do this.

Thank you

Additionally, we had some questions about the staging process:

1. Our understanding is we are staging GIS products on Google Drive - which account do we use for staging?

This is not strictly true, no. I wanted you to stage some products merely so that the File Management team could grab them quickly and begin working with them in the File Manager. As we've seen in the File Manager demos so far, data is staged into a directory which resolves to $coal-sds-deploy/data/staging/

For the time being however, you can use the following Drive account

https://drive.google.com/open?id=1dRJKNIQycw9rFA9Y1d2FxyG3o-1LjlBb

Is it connected to the COAL Google Group?

Yes, the one above is.

2. Is there a limited storage space on Google Drive? If so, how should we handle that?

Once we have an automated mechanism established for acquiring imagery, we will set that up on a local machine to validate it. Once validated, we will configure an AWS EC2 instance, which will enable us to crank through the data retrieval issue with no problems. We can stage products, process them (whilst additionally recording an FTP URL to the source product) and then delete the source products so we save space.
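As a rough illustration of the stage, process, record, delete cycle described above, here is a minimal Python sketch. It is not part of pycoal or COAL-SDS: the function name stage_process_cleanup, the process callback and the ".source.json" sidecar file are all hypothetical, and the staging directory is whatever path you pass in. It assumes the source product is reachable through a URL that urllib can open (file://, ftp://, http:// or https://) and that the last path segment of the URL is a usable file name.

```python
import json
import shutil
import urllib.request
from pathlib import Path


def stage_process_cleanup(source_url, staging_dir, process):
    """Stage one product, process it, record its source URL, delete the source copy.

    `process` is a hypothetical callback: it receives the staged file path and
    returns an iterable of paths to the derived products it wrote.
    """
    staging_dir = Path(staging_dir)
    staging_dir.mkdir(parents=True, exist_ok=True)

    # Assumption: the last path segment of the URL is a usable file name.
    name = source_url.rstrip("/").rsplit("/", 1)[-1]
    staged = staging_dir / name

    # Copy the remote product into the staging directory.
    with urllib.request.urlopen(source_url) as response, open(staged, "wb") as out:
        shutil.copyfileobj(response, out)

    # Run the processing step on the staged copy.
    outputs = [Path(p) for p in process(staged)]

    # Record where the source came from so it can be re-acquired later.
    record = {"source_url": source_url, "outputs": [p.name for p in outputs]}
    record_path = staging_dir / (name + ".source.json")
    record_path.write_text(json.dumps(record, indent=2))

    # Delete the staged source product to save space; derived products stay.
    staged.unlink()
    return record_path
```

The callback keeps the sketch independent of any particular processing step, so the same cycle could wrap a pycoal run or anything else that writes derived files next to the staged product.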

3. Which files are we staging? Everything that is generated through the pycoal process or only certain ones?

Absolutely everything. The scenario is that a scientist can inspect absolutely everything generated by Pycoal.

4. To clarify, we should be automating the process of staging the GIS products into Google Drive?

Into a staging drive. Forget about Google. Just consider the staging location as some remote resource, somewhere. We use common protocols for acquiring data.

If you need clarification on any of the above let me know.

Denim and Kristen also have the following questions for you regarding File Manager and extracting meta-data:

1. What needs to be done for the metadata extractors by this next week?

The conversation trail can be seen at https://github.com/capstone-coal/coal-sds/issues/6. Any further questions on actual issues should be recorded in the GitHub issue tracker.

2. Could you provide the repo on where to write the metadata extractors?

The metadata extraction logic largely already exists... it just needs to be configured in COAL-SDS. The OODT documentation on extractors exists at https://cwiki.apache.org/confluence/display/OODT/Metadata+Extractors

We will be using the Apache Tika Command Line Metadata Extractor

https://github.com/apache/oodt/blob/master/metadata/src/main/java/org/apache/oodt/cas/metadata/extractors/TikaCmdLineMetExtractor.java

Finally, we are working on creating a high-level design document as our third deliverable for our class. We will send it to you Monday, and ask that you submit your approval to our professor then.

Sounds great, thanks folks.

Lewis
