Re: How to process a stream of lines from a file one-by-one?


Jimb Esser

Apr 25, 2015, 8:21:57 PM
to nod...@googlegroups.com
I'm unfamiliar with the modules you're using, but at first glance it looks like you're piping the read stream to two different things instead of chaining the pipes. I would think that:
rs.pipe(es.split())
rs.pipe(es.map(...
perhaps should be:
rs.pipe(es.split())
  .pipe(es.map(...
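
To illustrate the difference, here is a minimal sketch that uses only Node's core stream module in place of event-stream, so it makes no claims about how es.split() or es.map() behave. The names splitChunk, createLineSplitter, and collectLines are made up for this example. The point is that .pipe() returns the destination stream, so the next .pipe() or .on() has to be attached to that return value; calling rs.pipe() a second time hands the second consumer the raw, unsplit chunks, which would explain seeing the whole file treated as one "line".

```javascript
'use strict';
const { Readable, Transform } = require('stream');
const { StringDecoder } = require('string_decoder');

// Hypothetical helper: split `carry + chunk` on newlines and return the
// complete lines plus the unfinished tail that must wait for more data.
function splitChunk(carry, chunk) {
  const parts = (carry + chunk).split(/\r?\n/);
  const rest = parts.pop(); // the last piece may be an incomplete line
  return { lines: parts, rest: rest };
}

// Hypothetical stand-in for a "split" stream: re-emits its input one line at a time.
function createLineSplitter() {
  const decoder = new StringDecoder('utf8'); // keeps multi-byte characters intact
  let carry = '';
  return new Transform({
    readableObjectMode: true,
    transform(chunk, _encoding, done) {
      const result = splitChunk(carry, decoder.write(chunk));
      carry = result.rest;
      result.lines.forEach((line) => this.push(line));
      done();
    },
    flush(done) {
      carry += decoder.end();
      if (carry.length > 0) this.push(carry); // final line without a trailing newline
      done();
    },
  });
}

// Chained pipeline: source -> splitter -> consumer.
// The 'data' handler is attached to what .pipe() returns, not to `source`.
function collectLines(source) {
  return new Promise((resolve, reject) => {
    const lines = [];
    source.on('error', reject);
    source
      .pipe(createLineSplitter())
      .on('data', (line) => lines.push(line))
      .on('end', () => resolve(lines))
      .on('error', reject);
  });
}

// Usage: a line broken across two chunks is still delivered whole.
collectLines(Readable.from(['first\nsec', 'ond\nthird']))
  .then((lines) => console.log(lines)); // logs the lines: first, second, third
```

With a real file, the source would be fs.createReadStream(...) as in your code.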

I've used "line-stream" [1] for this in the past myself, which is an alternative you could look into.
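
Another option that needs no extra dependency is Node's built-in readline module. The sketch below is not line-stream's API (I won't guess at it here); forEachLine and the file name in the usage comment are hypothetical, and it assumes the per-line work is synchronous.

```javascript
'use strict';
const fs = require('fs');
const readline = require('readline');

// Hypothetical helper: call onLine(line, lineNumber) for each line of `input`,
// in order, then call onDone(totalLines) once the input is exhausted.
function forEachLine(input, onLine, onDone) {
  let lineNr = 0;
  const rl = readline.createInterface({ input: input, crlfDelay: Infinity });
  rl.on('line', (line) => {
    lineNr += 1;
    onLine(line, lineNr);
  });
  rl.on('close', () => onDone(lineNr));
}

// Usage (hypothetical file name):
// forEachLine(
//   fs.createReadStream('urls.txt'),
//   (line, n) => console.log(n + ': ' + line),
//   (total) => console.log('Read ' + total + ' lines.')
// );
```

If the per-line work is asynchronous, the results have to be used inside its callback rather than right after the call returns.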


Hope this helps!
  Jimb Esser

On Friday, April 24, 2015 at 10:23:18 AM UTC-7, Marco Ippolito wrote:
Hi all,
I'm trying to read the lines of a text file in "streaming mode" and then process them one by one, in order, for example (it could be something else) to detect the language.

This is my code:

var path = require('path');
var fs = require('fs');
var util = require('util');
var stream = require('stream');
var es = require('event-stream');
var cld = require('cld')

var normalized_path = path.normalize(process.argv[2])
var unified_unique_urls_file_path = path.join(process.cwd(), normalized_path)
var unified_unique_urls_file_name = normalized_path + "_unified_unique.txt"
var unified_unique_urls_file = unified_unique_urls_file_path + "/" + unified_unique_urls_file_name

var lineNr = 1
var rs = fs.createReadStream(unified_unique_urls_file)

rs.pipe(es.split()) // split stream to break on newlines
rs.pipe(es.map(function(line) {
    (function() {
         callback(line)
    })();
})
.on('error', function() {
    console.log('Error while reading file.');
    })
.on('end', function() {
    console.log('Read entire file.');
    })
);

function callback(line) {
    var lineS = line.toString()
    var results_object = {};
    cld.detect(lineS, function(err, result) {
        //console.log(result);
        if (result != "undefined")
            results_object[lineS] = result["languages"][0]["code"]
        else
            results_object[lineS] = "undefined"
    });
    for (var key in results_object) {
        if (results_object.hasOwnProperty(key)) {
            //alert(key + " -> " + results_object[key]);
            console.log(key + " -> " + results_object[key]);
        }
    }
}

The problem is that the processing in the callback seems to run only for the last line of the file:
time ./languageDetection.js "baruffaldi_spa"
http://aziende.corriere.it/2599852/baruffaldi-spa
.....
http://www.youinweb.it/profiles_it/20097/baruffaldi-(spa)_320924.htm
http://www.baruffaldi.it
 -> it

Any idea how to solve it?

Marco
