Failed to import data


Ragunath V

Apr 1, 2021, 4:35:11 AM
to cloud-nl-discuss
Hi,

I am new to NL entity extraction. I am following the given example to generate JSONL and trying to import it from Cloud Storage using the AutoML UI, but I get the error below. Please correct me if I am doing anything wrong.

Failed to import data

The attempted action failed, please try again.

Tracking number: c6581337541606915


I have included the JSONL content below:

{
  "annotations": [
    {
      "text_extraction": {
        "text_segment": {
          "end_offset": 67,
          "start_offset": 62
        }
      },
      "display_name": "Modifier"
    },
    {
      "text_extraction": {
        "text_segment": {
          "end_offset": 158,
          "start_offset": 141
        }
      },
      "display_name": "SpecificDisease"
    },
    {
      "text_extraction": {
        "text_segment": {
          "end_offset": 330,
          "start_offset": 290
        }
      },
      "display_name": "SpecificDisease"
    },
    {
      "text_extraction": {
        "text_segment": {
          "end_offset": 337,
          "start_offset": 332
        }
      },
      "display_name": "SpecificDisease"
    },
    {
      "text_extraction": {
        "text_segment": {
          "end_offset": 627,
          "start_offset": 610
        }
      },
      "display_name": "Modifier"
    },
    {
      "text_extraction": {
        "text_segment": {
          "end_offset": 754,
          "start_offset": 749
        }
      },
      "display_name": "Modifier"
    },
    {
      "text_extraction": {
        "text_segment": {
          "end_offset": 875,
          "start_offset": 865
        }
      },
      "display_name": "Modifier"
    },
    {
      "text_extraction": {
        "text_segment": {
          "end_offset": 968,
          "start_offset": 951
        }
      },
      "display_name": "Modifier"
    },
    {
      "text_extraction": {
        "text_segment": {
          "end_offset": 1553,
          "start_offset": 1548
        }
      },
      "display_name": "Modifier"
    },
    {
      "text_extraction": {
        "text_segment": {
          "end_offset": 1652,
          "start_offset": 1606
        }
      },
      "display_name": "CompositeMention"
    },
    {
      "text_extraction": {
        "text_segment": {
          "end_offset": 1833,
          "start_offset": 1826
        }
      },
      "display_name": "DiseaseClass"
    },
    {
      "text_extraction": {
        "text_segment": {
          "end_offset": 1860,
          "start_offset": 1843
        }
      },
      "display_name": "SpecificDisease"
    },
    {
      "text_extraction": {
        "text_segment": {
          "end_offset": 1930,
          "start_offset": 1913
        }
      },
      "display_name": "SpecificDisease"
    },
    {
      "text_extraction": {
        "text_segment": {
          "end_offset": 2129,
          "start_offset": 2111
        }
      },
      "display_name": "SpecificDisease"
    },
    {
      "text_extraction": {
        "text_segment": {
          "end_offset": 2188,
          "start_offset": 2160
        }
      },
      "display_name": "SpecificDisease"
    },
    {
      "text_extraction": {
        "text_segment": {
          "end_offset": 2260,
          "start_offset": 2243
        }
      },
      "display_name": "Modifier"
    },
    {
      "text_extraction": {
        "text_segment": {
          "end_offset": 2356,
          "start_offset": 2339
        }
      },
      "display_name": "Modifier"
    }
  ],
  "text_snippet": {
    "content": "10051005\tA common MSH2 mutation in English and North American HNPCC families:
      origin, phenotypic expression, and sex specific differences in colorectal cancer .\tThe
      frequency , origin , and phenotypic expression of a germline MSH2 gene mutation previously
      identified in seven kindreds with hereditary non-polyposis cancer syndrome (HNPCC) was
      investigated . The mutation ( A-- > T at nt943 + 3 ) disrupts the 3 splice site of exon 5
      leading to the deletion of this exon from MSH2 mRNA and represents the only frequent MSH2
      mutation so far reported . Although this mutation was initially detected in four of 33
      colorectal cancer families analysed from eastern England , more extensive analysis has
      reduced the frequency to four of 52 ( 8 % ) English HNPCC kindreds analysed . In contrast ,
      the MSH2 mutation was identified in 10 of 20 ( 50 % ) separately identified colorectal
      families from Newfoundland . To investigate the origin of this mutation in colorectal cancer
      families from England ( n = 4 ) , Newfoundland ( n = 10 ) , and the United States ( n = 3 ) ,
      haplotype analysis using microsatellite markers linked to MSH2 was performed . Within the
      English and US families there was little evidence for a recent common origin of the MSH2
      splice site mutation in most families . In contrast , a common haplotype was identified
      at the two flanking markers ( CA5 and D2S288 ) in eight of the Newfoundland families .
      These findings suggested a founder effect within Newfoundland similar to that reported by
      others for two MLH1 mutations in Finnish HNPCC families . We calculated age related risks
      of all , colorectal , endometrial , and ovarian cancers in nt943 + 3 A-- > T MSH2 mutation
      carriers ( n = 76 ) for all patients and for men and women separately . For both sexes combined ,
      the penetrances at age 60 years for all cancers  and for colorectal cancer were 0 . 86 and 0 . 57 ,
      respectively . The risk of colorectal cancer was significantly higher ( p < 0.01 ) in males
      than females ( 0 . 63 v 0 . 30 and 0 . 84 v 0 . 44 at ages 50 and 60 years , respectively ) .
      For females there was a high risk of endometrial cancer ( 0 . 5 at age 60 years ) and premenopausal
      ovarian cancer ( 0 . 2 at 50 years ) . These intersex differences in colorectal cancer risks
      have implications for screening programmes and for attempts to identify colorectal cancer
      susceptibility modifiers .\n "
  }
}

tielve

Apr 7, 2021, 11:57:11 AM
to cloud-nl-discuss
Hi,

What you need to import is a CSV file that lists the paths of all the JSONL files, not a JSONL file itself.
Example of the CSV file content:
```
gs://My_Bucket/sample1.jsonl
gs://My_Bucket/sample2.jsonl
```

Documentation references:
You import training data into AutoML Natural Language using a CSV file that lists the documents and optionally includes their category labels or sentiment values. AutoML Natural Language creates a dataset from the listed documents.

Once you have collected all of your training documents, create a CSV file that lists them all.
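For illustration, here is a minimal Python sketch of producing such a CSV. It assumes the one-URI-per-row layout from the example above. The helper name `build_import_csv` and the output file name `import.csv` are made up for this sketch and are not part of AutoML. You would still need to upload the resulting file to your bucket and select it in the import dialog.

```python
import csv
import io


def build_import_csv(jsonl_uris):
    """Return CSV text with one Cloud Storage JSONL path per row.

    Hypothetical helper for illustration only; the row layout follows
    the example CSV shown earlier in this reply.
    """
    for uri in jsonl_uris:
        # Basic sanity check so a local path or a wrong extension is caught early.
        if not uri.startswith("gs://") or not uri.endswith(".jsonl"):
            raise ValueError(f"expected a gs:// path to a .jsonl file, got: {uri!r}")

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for uri in jsonl_uris:
        writer.writerow([uri])
    return buffer.getvalue()


if __name__ == "__main__":
    uris = ["gs://My_Bucket/sample1.jsonl", "gs://My_Bucket/sample2.jsonl"]
    # Write the listing locally; "import.csv" is an arbitrary name.
    with open("import.csv", "w", newline="") as handle:
        handle.write(build_import_csv(uris))
```

Also note that JSONL means one complete JSON object per line, so each document in a `.jsonl` file should be serialized on a single line rather than pretty-printed.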

Regards,
Kevin