You received this message because you are subscribed to the Google Groups "BigDL User Group" group.
classifier = NNEstimator(lrModel, MultiLabelSoftMarginCriterion(), transformer, SeqToTensor([15])) \
    .setLearningRate(0.003) \
    .setBatchSize(736) \
    .setMaxEpoch(2) \
    .setFeaturesCol("image")
creating: createMultiLabelSoftMarginCriterion
creating: createSeqToTensor
creating: createFeatureLabelPreprocessing
creating: createNNEstimator
pipeline = Pipeline(stages=[classifier])
nnModel = pipeline.fit(trainingDF)
creating: createToTuple
creating: createChainedPreprocessing
nnModel.transform(trainingDF).show(10)
+----------------+--------------------+--------------+--------------------+--------------------+
|     Image Index|               image|Finding Labels|               label|          prediction|
+----------------+--------------------+--------------+--------------------+--------------------+
|00000005_001.png|[hdfs://pNameNode...|    No Finding|[0.0, 0.0, 0.0, 0...|[0.122650035, 0.0...|
|00000005_004.png|[hdfs://pNameNode...|    No Finding|[0.0, 0.0, 0.0, 0...|[0.012328666, 0.0...|
|00000008_001.png|[hdfs://pNameNode...|    No Finding|[0.0, 0.0, 0.0, 0...|[0.0055147237, 0....|
|00000011_007.png|[hdfs://pNameNode...|  Infiltration|[0.0, 0.0, 1.0, 0...|[0.003118499, 0.0...|
|00000015_000.png|[hdfs://pNameNode...|    No Finding|[0.0, 0.0, 0.0, 0...|[0.01412236, 0.00...|
|00000022_000.png|[hdfs://pNameNode...|    No Finding|[0.0, 0.0, 0.0, 0...|[0.02084559, 0.01...|
|00000022_001.png|[hdfs://pNameNode...|      Fibrosis|[0.0, 0.0, 0.0, 0...|[0.032384016, 0.0...|
|00000039_001.png|[hdfs://pNameNode...|    No Finding|[0.0, 0.0, 0.0, 0...|[0.0017941386, 0....|
|00000040_001.png|[hdfs://pNameNode...|     Emphysema|[0.0, 0.0, 0.0, 0...|[0.23027827, 0.00...|
|00000042_005.png|[hdfs://pNameNode...|    No Finding|[0.0, 0.0, 0.0, 0...|[0.005233792, 8.9...|
+----------------+--------------------+--------------+--------------------+--------------------+
only showing top 10 rows
predictionDF = nnModel.transform(validationDF).cache()
predictionDF.select("Image Index","label","prediction").sort("label", ascending=False).show(10)
+----------------+--------------------+--------------------+
|     Image Index|               label|          prediction|
+----------------+--------------------+--------------------+
|00011379_040.png|[1.0, 1.0, 1.0, 1...|[0.013405059, 0.0...|
|00025262_000.png|[1.0, 1.0, 1.0, 0...|[0.0056996667, 0....|
|00014871_012.png|[1.0, 1.0, 1.0, 0...|[0.0013419994, 0....|
|00020826_011.png|[1.0, 1.0, 1.0, 0...|[0.0135408975, 0....|
|00020826_012.png|[1.0, 1.0, 1.0, 0...|[0.003606446, 0.0...|
|00017055_001.png|[1.0, 1.0, 1.0, 0...|[0.0052660527, 0....|
|00016805_014.png|[1.0, 1.0, 1.0, 0...|[0.005817896, 0.0...|
|00009286_003.png|[1.0, 1.0, 1.0, 0...|[0.019827748, 0.0...|
|00019395_016.png|[1.0, 1.0, 1.0, 0...|[0.0021730824, 0....|
|00020860_001.png|[1.0, 1.0, 1.0, 0...|[0.03898971, 0.00...|
+----------------+--------------------+--------------------+
only showing top 10 rows
predictionDF.select("Image Index","label","prediction").show(10)
+----------------+--------------------+--------------------+
|     Image Index|               label|          prediction|
+----------------+--------------------+--------------------+
|00000005_003.png|[0.0, 0.0, 0.0, 0...|[0.006877544, 0.0...|
|00000010_000.png|[0.0, 0.0, 1.0, 0...|[0.0013657764, 0....|
|00000011_006.png|[1.0, 0.0, 0.0, 0...|[0.046066444, 0.0...|
|00000031_000.png|[0.0, 0.0, 0.0, 0...|[0.0026110203, 0....|
|00000034_001.png|[0.0, 0.0, 0.0, 0...|[0.05101261, 0.04...|
|00000046_000.png|[0.0, 0.0, 0.0, 0...|[0.005264407, 0.0...|
|00000047_001.png|[0.0, 0.0, 0.0, 0...|[0.0036213757, 0....|
|00000047_003.png|[1.0, 0.0, 0.0, 0...|[0.042358775, 0.0...|
|00000050_002.png|[0.0, 0.0, 0.0, 0...|[0.118123546, 0.0...|
|00000054_006.png|[0.0, 0.0, 1.0, 0...|[0.0027013535, 0....|
+----------------+--------------------+--------------------+
only showing top 10 rows
correct = predictionDF.filter("label=prediction").count()
And the error I am having is mentioned in my first post.
Thanks for posting your code! We've been looking into this.
The suggested solution is to convert the array&lt;double&gt; column into an array&lt;float&gt; column using cast():
from pyspark.sql.types import *
predictionDF = predictionDF.withColumn("label", predictionDF["label"].cast(ArrayType(FloatType())))
Then we can compare and filter out those rows with the same value for the "label" column and the "prediction" column.
correct = predictionDF.filter("label=prediction").count()
Converting array&lt;float&gt; to array&lt;double&gt; is not recommended here. When a float is widened to a double, the result is the double closest to the float's actual binary value, not the decimal you originally wrote: for example, the float 0.1f widens to 0.10000000149011612, which is not equal to the double 0.1. That can make values that should be equal compare as unequal, and you would then have to compare with a tolerance threshold instead, which is more cumbersome.
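A quick pure-Python illustration of that widening behaviour (a minimal sketch; `struct` is only used here to emulate a single-precision float, it is not part of the Spark code above):

```python
import struct

# Round-trip 0.1 through a 32-bit float: struct.pack("f", ...) stores the
# nearest single-precision value, and unpack widens it back to a Python
# double, just as a float -> double cast would.
widened = struct.unpack("f", struct.pack("f", 0.1))[0]

print(widened)         # 0.10000000149011612
print(widened == 0.1)  # False: the widened float is not the double 0.1
```

This is why casting the double "label" column down to array&lt;float&gt; (rather than casting the float predictions up) keeps the equality comparison exact.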