1. How can I view how many blocks a file has been broken into in a Hadoop file system?
2. What is the difference between Scale out and Scale up?
3. What is batch processing in Hadoop?
4. What are OLAP and OLTP?
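On question 1: `hdfs fsck <path> -files -blocks` lists every block of a file (older releases spell it `hadoop fsck`). The count it reports follows from how HDFS splits a file into fixed-size blocks, where only the last block may be smaller. A minimal sketch of that arithmetic, assuming a 128 MB block size (the `dfs.blocksize` default in Hadoop 2.x; Hadoop 1.x defaulted to 64 MB); `BlockCount` and `blocksFor` are hypothetical names:

```java
final class BlockCount {
    // Assumed block size: 128 MB, the Hadoop 2.x default for dfs.blocksize
    static final long DEFAULT_BLOCK_SIZE = 128L * 1024 * 1024;

    // ceil(fileSize / blockSize): every block is full except possibly the last one
    static long blocksFor(long fileSize, long blockSize) {
        if (blockSize <= 0 || fileSize < 0) {
            throw new IllegalArgumentException("blockSize must be positive, fileSize non-negative");
        }
        return fileSize / blockSize + (fileSize % blockSize == 0 ? 0 : 1);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        // 300 MB -> 128 MB + 128 MB + 44 MB = 3 blocks
        System.out.println(blocksFor(300 * mb, DEFAULT_BLOCK_SIZE)); // 3
    }
}
```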
One of the most commonly misunderstood pairs of terms is OLTP and OLAP. What are they? OLTP means Online Transaction Processing; OLAP means Online Analytical Processing. As the names suggest, OLTP deals with processing data from transactional systems. For example, an application that loads a hotel's reservation data is an OLTP system. An OLTP system is designed mainly with the performance of the end application in mind. It comprises the application, the database, and the reporting system that works directly on this database. The database in an OLTP system is designed to improve the application's efficiency and thereby reduce its processing time. In a hotel reservation application, for example, the database would be designed mainly for fast inserts of customer data and for fast retrieval of room availability information.
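To make the OLTP access pattern concrete, here is a minimal in-memory sketch (not a real database): every operation touches a single row, and the data is keyed so that the two operations the hotel application needs, inserting a booking and checking a room's availability, are fast. `ReservationStore`, `RoomNight` and the method names are hypothetical.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;

// Hypothetical OLTP-style store: short transactions that each touch one row
final class ReservationStore {
    // Composite key (room, night), playing the role of a primary-key index
    record RoomNight(int roomId, LocalDate night) {}

    private final Map<RoomNight, String> bookings = new HashMap<>();

    // Fast insert: book one room for one night only if it is still free
    boolean book(int roomId, LocalDate night, String guest) {
        return bookings.putIfAbsent(new RoomNight(roomId, night), guest) == null;
    }

    // Fast point lookup used by the front end before offering a room
    boolean isAvailable(int roomId, LocalDate night) {
        return !bookings.containsKey(new RoomNight(roomId, night));
    }

    public static void main(String[] args) {
        ReservationStore store = new ReservationStore();
        LocalDate night = LocalDate.of(2024, 5, 1);
        System.out.println(store.book(101, night, "Alice")); // true
        System.out.println(store.book(101, night, "Bob"));   // false: already booked
        System.out.println(store.isAvailable(102, night));   // true
    }
}
```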
Such a database is part of an OLTP system. Whenever a reporting tool works directly on such an application database, it forms part of the OLTP system; in general, "OLTP system" refers to the type of database a reporting tool works on. During the 1980s, many applications were developed to meet the needs of growing organizations. All of these applications required a database to process, load and extract data, and each database was designed with the performance of the end application in mind. As the organizations grew, they recognized the importance of analyzing the data they had collected. Analysis of that data produced a large number of findings that helped organizations make important business decisions, so the need for full reporting solutions grew. This is the period when more and more reporting tools came onto the market. But these reporting tools performed very poorly, because they extracted data from systems/databases that had been designed mainly for the performance of the application.
There were two main reasons why the data warehouse (DW) came into being:
1. Front-end application performance decreased as more and more data was collected, so older data needed to be isolated.
2. Reporting became equally important in all organizations, and existing reporting systems performed poorly because they had to work on the live application databases.
OLAP systems were mainly developed using data in a warehouse. Since older data had to be isolated anyway, it made sense to store it in a format that would ease the reporting bottlenecks: the application data was isolated and restructured so that this repository would become the prime source for business decisions. OLAP systems were built on this isolated data, which allowed faster, easier and more efficient reporting. An OLAP system need not always report from a DW; the criterion is that it reports from a system/database that does not handle ongoing transactions. For example, some organizations create an ODS (Operational Data Store), a replica of the transactional data, and use it for reporting. In general, though, OLAP is synonymous with a data warehouse.
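By contrast, an OLAP query scans many rows and aggregates them, and it runs against an isolated copy of the data so the transactional application is not slowed down. A minimal sketch, assuming the copy is a simple in-memory list standing in for a warehouse or ODS extract; `OccupancyReport`, `Booking` and `revenueByMonth` are hypothetical names:

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical OLAP-style report: reads a copy of the data, never the live OLTP tables
final class OccupancyReport {
    record Booking(int roomId, LocalDate night, double rate) {}

    // Scan every historical row and aggregate revenue per month (sorted by month)
    static Map<YearMonth, Double> revenueByMonth(List<Booking> snapshot) {
        return snapshot.stream().collect(Collectors.groupingBy(
                b -> YearMonth.from(b.night()),
                TreeMap::new,
                Collectors.summingDouble(Booking::rate)));
    }

    public static void main(String[] args) {
        // Stand-in for a copy loaded from the OLTP system, e.g. by a nightly job
        List<Booking> snapshot = List.of(
                new Booking(101, LocalDate.of(2024, 4, 30), 90.0),
                new Booking(101, LocalDate.of(2024, 5, 1), 100.0),
                new Booking(102, LocalDate.of(2024, 5, 1), 120.0));
        System.out.println(revenueByMonth(snapshot)); // {2024-04=90.0, 2024-05=220.0}
    }
}
```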
A SequenceFile is a flat file consisting of binary key/value pairs. It is used extensively in MapReduce as an input/output format.
Note: internally, the temporary outputs of maps are stored using SequenceFile.
SequenceFile provides Writer, Reader and Sorter classes for writing, reading and sorting, respectively.
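There are three different SequenceFile formats:
1. Uncompressed key/value records.
2. Record-compressed key/value records: only the values are compressed.
3. Block-compressed key/value records: keys and values are collected separately in blocks and compressed; the block size is configurable.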
The recommended way is to use the SequenceFile.createWriter methods to construct the 'preferred' writer implementation.
The SequenceFile.Reader acts as a bridge and can read any of the above SequenceFile formats.
For example, you can read a SequenceFile as follows:
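import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

Configuration config = new Configuration();
Path path = new Path(PATH_TO_YOUR_FILE);
// The reader detects the format (uncompressed, record- or block-compressed) from the file header
try (SequenceFile.Reader reader = new SequenceFile.Reader(config, SequenceFile.Reader.file(path))) {
    Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), config);
    Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), config);
    while (reader.next(key, value)) {
        // perform some operation on key and value
    }
}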
Ref - http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/SequenceFile.html
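The recommended `SequenceFile.createWriter` route can be sketched end to end: write a few records, then read them back. This assumes Hadoop 2.x or later (for the option-style `createWriter` and `Reader` constructors) with `hadoop-common` on the classpath, and relies on the local file system being the default FS; `SeqFileWriteDemo`, `write` and `count` are hypothetical names. The `CompressionType` argument picks the format: `NONE`, `RECORD` or `BLOCK`.

```java
import java.io.IOException;
import java.nio.file.Files;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

final class SeqFileWriteDemo {
    // Writes n records (i -> "value-i"):
    // NONE = uncompressed, RECORD = values compressed, BLOCK = keys and values compressed in blocks
    static void write(Configuration conf, Path path, int n, SequenceFile.CompressionType type)
            throws IOException {
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(path),
                SequenceFile.Writer.keyClass(IntWritable.class),
                SequenceFile.Writer.valueClass(Text.class),
                SequenceFile.Writer.compression(type))) {
            IntWritable key = new IntWritable();
            Text value = new Text();
            for (int i = 0; i < n; i++) {
                key.set(i);
                value.set("value-" + i);
                writer.append(key, value);
            }
        }
    }

    // Counts records; the same Reader handles all three formats
    static int count(Configuration conf, Path path) throws IOException {
        try (SequenceFile.Reader reader =
                new SequenceFile.Reader(conf, SequenceFile.Reader.file(path))) {
            IntWritable key = new IntWritable();
            Text value = new Text();
            int records = 0;
            while (reader.next(key, value)) {
                records++;
            }
            return records;
        }
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration(); // default FS is the local file system
        Path path = new Path(Files.createTempDirectory("seq").toString(), "demo.seq");
        write(conf, path, 3, SequenceFile.CompressionType.BLOCK);
        System.out.println(count(conf, path)); // 3
    }
}
```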
--
You received this message because you are subscribed to the Google Groups "HadoopOnlineTraining9" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hadooponlinetrai...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
The maximum size of the thread pool. When the pending request queue overflows, new threads are created until the pool reaches this size. After that, the server starts dropping connections.
Default: 1000
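The behaviour described above (grow the pool only after the pending queue overflows, then drop) mirrors how Java's `java.util.concurrent.ThreadPoolExecutor` treats a bounded work queue. As an analogy only, not the server's actual code, a minimal sketch; `BoundedPoolDemo` and `countDropped` are hypothetical names, and the small sizes stand in for the default of 1000:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedPoolDemo {
    // Submits `requests` tasks that block until released; returns how many were rejected
    static int countDropped(int coreThreads, int maxThreads, int queueCapacity, int requests)
            throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                coreThreads, maxThreads, 60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),  // bounded pending-request queue
                new ThreadPoolExecutor.AbortPolicy());    // reject once queue and pool are full
        CountDownLatch release = new CountDownLatch(1);  // keeps every accepted task busy
        int dropped = 0;
        for (int i = 0; i < requests; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                dropped++; // analogous to the server dropping a connection
            }
        }
        release.countDown();
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return dropped;
    }

    public static void main(String[] args) throws InterruptedException {
        // 1 core thread + 1 queued + 1 extra thread (max 2) are accepted; requests 4 and 5 are dropped
        System.out.println(countDropped(1, 2, 1, 5)); // 2
    }
}
```

The order matters: the executor fills the core threads first, then the queue, and only then adds threads up to the maximum, which is why a queue that never overflows never grows the pool beyond its core size.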
mapred-site.xml, core-site.xml, hdfs-site.xml) still free from custom configuration.
Sorry Pavan, it is not indexing... it is ranking.
Pavan, two questions:
Thanks & Regards
Pavan Reddy .G