Source Code (2011) Imdb

Siiri

Aug 3, 2024, 4:38:26 PM

to nighclasunstal

After playing around with @Hawk's BASE64 discovery above, I found that everything after the BASE64 code is display info. If you remove everything between the last @ and .jpg it will load the image in the highest res it has.
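The trimming step described above can be sketched in a few lines of Python. This is only an illustration of the string manipulation: the function name `full_res_url` and the example URL are made up, and whether the trimmed URL still resolves depends on the image host.

```python
def full_res_url(url: str) -> str:
    """Drop the display-info segment between the last '@' and '.jpg'.

    Hypothetical helper: keeps everything up to and including the last '@',
    then appends the '.jpg' extension. URLs that do not match the pattern
    are returned unchanged.
    """
    at = url.rfind("@")
    ext = url.rfind(".jpg")
    if at == -1 or ext == -1 or ext < at:
        return url
    return url[: at + 1] + url[ext:]


# Made-up URL in the shape described in the post.
thumb = "https://example.com/images/M/MV5BMTk@._V1_UX182_CR0,0,182,268_AL_.jpg"
print(full_res_url(thumb))  # https://example.com/images/M/MV5BMTk@.jpg
```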

Those poster images don't appear to have any correlation to the title page, so you'll have to retrieve the title page first, and then retrieve the img element for the page. The good news is that the img tag is wrapped in an a tag with name="poster". You didn't say what kind of tools you are using, but this is basically a screen scraping operation.
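For the second step, here is a minimal sketch using only Python's standard library. It works on HTML you already have as a string and does no fetching. The class and function names are made up, and the markup in the example is only a guess at the structure described above (an img inside an a tag with name="poster").

```python
from html.parser import HTMLParser


class PosterFinder(HTMLParser):
    """Collects the src of the first <img> nested in <a name="poster">."""

    def __init__(self):
        super().__init__()
        self._in_poster_link = False
        self.poster_src = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("name") == "poster":
            self._in_poster_link = True
        elif tag == "img" and self._in_poster_link and self.poster_src is None:
            self.poster_src = attrs.get("src")

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_poster_link = False


def find_poster_src(html: str):
    """Return the poster image src, or None if no poster link is present."""
    finder = PosterFinder()
    finder.feed(html)
    return finder.poster_src


# Assumed markup, for illustration only.
page = '<a name="poster" href="/x"><img src="poster.jpg"></a><img src="ad.jpg">'
print(find_poster_src(page))  # poster.jpg
```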

Be aware, though, that the terms of service explicitly forbid screen scraping. You can download the IMDB database as a set of text files, but as I understand it, the IMDB movie ID is nowhere to be found in these text files.

I download the image using wget for many movies in a directory using this bash script. The mp4 files have names that the IMDB likes, and that's why the first search result is nearly guaranteed to be correct. Names like "Love Exposure (2008).mp4".

Here is my program to generate a human-readable HTML summary page for movie companies found on an IMDB page. Change the initial URL to your liking and it generates an HTML file where you can see the title, summary, score and thumbnail.

Objectives
a) To describe the linkage of the federal Immigration, Refugees and Citizenship Canada Permanent Resident (IRCC-PR) database with the Manitoba healthcare registry and b) compare data linkage methods and rates between four Canadian provinces accounting for interprovincial mobility of immigrants.

Conclusions
Despite variations in methodology, provincial linkage rates were relatively high. The use of a national immigration dataset for linkage to provincial repositories allows a more comprehensive linkage than that of province-specific subsets. Observed linkage rates can be biased downwards by interprovincial migration, and methods that use external data sources can contribute to assessing potential selection bias and misclassification.

The Canadian publicly-funded health care system guarantees universal access to basic health care services to all legal residents. The delivery of health care services is the responsibility of the provincial governments, which maintain their own registries. Individuals who register with the provincial or territorial health insurance plans are issued a provincial unique health card number. Because of the near universal coverage, provincial health insurance registries are deemed to be the most comprehensive population rosters. Unlike health care, immigration is managed at the federal level in Canada. Immigration, Refugees and Citizenship Canada (IRCC) is the federal immigration agency that maintains nationwide databases of all applications for immigration to Canada, including those of individuals who were granted permanent residence. In the last decade, four Canadian provincial data repositories [Population Data BC (PopData BC) in British Columbia, Manitoba Centre for Health Policy (MCHP) in Manitoba, the NB Institute for Research, Data and Training (NB-IRDT) in New Brunswick and ICES (formerly the Institute for Clinical and Evaluative Sciences) in Ontario] entered into individual ongoing data sharing agreements with the federal immigration agency to link the national IRCC Permanent Resident database (IRCC-PR database) to provincial health care registries for health services research. This article describes the methods used to link the national IRCC-PR database with provincial health care rosters, and the challenges involved.

Although the national IRCC-PR database has been linked at the national level to tax data, hospital data and a community survey [1], the linkage to provincial health insurance records makes it possible to conduct intersectoral research involving data from health, education, social assistance, justice, and other services that are administered by provincial agencies. Provincial research data repositories use an encrypted version of the health card number to relate de-identified information of multiple linked administrative databases for research in compliance with strict privacy, ethical and legal protocols. The linkage process consists in attaching a unique personal identifier to each individual in the incoming database (e.g. immigration data) to make it linkable to all other existing databases (e.g., hospitalisations, social services).

The objectives of this study were to A) describe the linkage of the IRCC-PR database with the Manitoba healthcare registry and B) compare linkage methods and rates between the IRCC-PR database and four provincial healthcare registries in Canada before and after accounting for interprovincial mobility of immigrants. Although parts A and B are relatively independent, for readers not familiar with Canadian linkages part A may serve as a background for part B.

The linkage was conducted by an MCHP analyst (RW) at the MHSAL offices in Winnipeg, Manitoba, using blocking schemes to substantially reduce the number of comparisons made during the linkage process. The IRCC-PR database was divided into two groups: records with Manitoba as the intended destination (3.6%) and records with another Canadian province as the intended destination (96.4%). The IRCC-PR database and the Registry were partitioned into subsets defined by the blocking variables. For each surname in the IRCC-PR database, records in that file and the Registry sharing that surname were grouped into a distinct subset, and sequential searches for matches were conducted within each subset. Both groups were processed using the same blocking schemes for deterministic, probabilistic, and manual matching (Figure 1). Because records of those not intending to settle in Manitoba were more numerous than records of those who intended to settle in Manitoba, the threshold for distinguishing between links and non-links was increased in the probabilistic passes for non-Manitoba records to reduce the likelihood of false positives.
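The effect of blocking on the number of comparisons can be illustrated with a short Python sketch. It is not the procedure used at MCHP. The field name `surname`, the helper names and the toy records are assumptions made for illustration.

```python
from collections import defaultdict


def block_by(records, key):
    """Group records into blocks that share the same value of `key`."""
    blocks = defaultdict(list)
    for record in records:
        blocks[record[key]].append(record)
    return blocks


def candidate_pairs(incoming, registry, key="surname"):
    """Yield only the pairs that fall within the same block.

    Without blocking, every incoming record would be compared with every
    registry record (len(incoming) * len(registry) comparisons).
    """
    registry_blocks = block_by(registry, key)
    for record in incoming:
        for candidate in registry_blocks.get(record[key], []):
            yield record, candidate


# Toy data with hypothetical field names.
incoming = [{"id": "A1", "surname": "SINGH"}, {"id": "A2", "surname": "CHEN"}]
registry = [
    {"id": "R1", "surname": "SINGH"},
    {"id": "R2", "surname": "SINGH"},
    {"id": "R3", "surname": "LOPEZ"},
]
pairs = list(candidate_pairs(incoming, registry))
print(len(pairs), "comparisons instead of", len(incoming) * len(registry))
```

Here only the two SINGH candidates are compared with record A1, and CHEN has no block in the registry, so two comparisons are made instead of six.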

Deterministic linkage involved an exact match on personal identifiers (e.g., last name, DOB), while probabilistic linkage involved comparing records on additional attributes (e.g., sex, region of residence) using LinkPro, a record linkage package [2]. To reduce the impact of misspelled names, phonetic matches on surnames were conducted in LinkPro using Soundex coding, a name encoding algorithm. Matching with Soundex code on surnames was only done after exact matches on other variables. The likelihood of valid links was formalised by creating a linkage score, which added up the agreement or disagreement of all the linkage keys, weighted by their ability to discriminate between valid and invalid links. The linkage weight score generated by LinkPro was the basis for deciding which links to accept, reject or review. Links with weights between the set thresholds were reviewed manually by MCHP data analysts with extensive data linkage experience. After the linkage process was completed, all personal identifiers were removed, and each record was assigned a scrambled unique personal identifier.
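The two ideas in this paragraph, phonetic encoding and a weighted agreement score, can be sketched in Python as follows. The Soundex function follows the standard American Soundex rules. The scoring uses Fellegi-Sunter-style log-likelihood weights, which is an assumption: the text does not describe how LinkPro computes its weights. The m and u probabilities, field names and thresholds are invented for illustration.

```python
import math

# Standard Soundex digit groups; vowels, Y, H and W carry no digit.
_GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
_CODES = {ch: digit for digit, letters in _GROUPS.items() for ch in letters}


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code of a name."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    out = [letters[0]]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        prev = code  # a vowel resets prev, so a repeated code is kept
    return ("".join(out) + "000")[:4]


def field_weight(agree: bool, m: float, u: float) -> float:
    """Log2 weight for one field: m = P(agree | match), u = P(agree | non-match)."""
    return math.log2(m / u) if agree else math.log2((1 - m) / (1 - u))


def linkage_score(a: dict, b: dict, params: dict) -> float:
    """Sum the agreement or disagreement weights over all linkage keys."""
    return sum(field_weight(a[f] == b[f], m, u) for f, (m, u) in params.items())


def classify(score: float, lower: float, upper: float) -> str:
    """Accept, reject, or send to manual review based on two thresholds."""
    if score >= upper:
        return "link"
    if score <= lower:
        return "non-link"
    return "review"


# Hypothetical parameters: DOB discriminates far better than sex.
params = {"dob": (0.95, 0.01), "sex": (0.90, 0.50)}
rec_a = {"dob": "1980-01-01", "sex": "F"}
rec_b = {"dob": "1980-01-01", "sex": "F"}
print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(classify(linkage_score(rec_a, rec_b, params), lower=0.0, upper=5.0))  # link
```

A field such as date of birth, which rarely agrees by chance, contributes a large positive weight on agreement, whereas agreement on sex adds little. This is what "weighted by their ability to discriminate" means in practice.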

Data quality assessments were conducted using the MCHP data evaluation framework to ensure that linked records corresponded to unique individuals, to identify likely false positives, and to confirm that the data were accurate and plausible [3]. Duplicate landing records were removed by retaining the earliest landing record per individual. Also removed were records with erroneous health insurance coverage (zero days or before birth), records with coverage that ended before the landing date, and records with coverage that started after the end of our observation period (31st March 2019) (Figure 1). The observed linkage rate was calculated separately for the Manitoba and non-Manitoba records as the percentage of immigrants in the IRCC-PR database that linked to the same individuals in the Registry.
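The exclusion rules and the linkage rate can be expressed as a short Python sketch. It reflects one reading of the rules listed above and is not the MCHP implementation. All field names (`person_id`, `landing_date`, `coverage_start`, and so on) are hypothetical.

```python
from datetime import date

OBSERVATION_END = date(2019, 3, 31)  # end of the observation period in the text


def earliest_landing(records):
    """Keep one record per person: the one with the earliest landing date."""
    best = {}
    for rec in records:
        pid = rec["person_id"]
        if pid not in best or rec["landing_date"] < best[pid]["landing_date"]:
            best[pid] = rec
    return list(best.values())


def has_plausible_coverage(rec):
    """Apply the coverage exclusions (assumed interpretation of each rule)."""
    return (
        rec["coverage_end"] > rec["coverage_start"]       # not zero days
        and rec["coverage_start"] >= rec["birth_date"]    # not before birth
        and rec["coverage_end"] >= rec["landing_date"]    # not ended before landing
        and rec["coverage_start"] <= OBSERVATION_END      # starts within observation
    )


def linkage_rate(n_linked, n_total):
    """Observed linkage rate as a percentage of records in the incoming file."""
    return 100.0 * n_linked / n_total


# Toy records: person 1 has a duplicate landing record.
records = [
    {"person_id": 1, "landing_date": date(2005, 6, 1), "birth_date": date(1980, 1, 1),
     "coverage_start": date(2005, 7, 1), "coverage_end": date(2010, 1, 1)},
    {"person_id": 1, "landing_date": date(2003, 2, 1), "birth_date": date(1980, 1, 1),
     "coverage_start": date(2003, 3, 1), "coverage_end": date(2010, 1, 1)},
    {"person_id": 2, "landing_date": date(2012, 4, 1), "birth_date": date(1990, 5, 5),
     "coverage_start": date(2008, 1, 1), "coverage_end": date(2011, 1, 1)},
]
kept = [r for r in earliest_landing(records) if has_plausible_coverage(r)]
print([r["person_id"] for r in kept])  # [1]
```

Person 2 is dropped in this toy example because the coverage ended before the landing date.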

To assess differential selection bias resulting from the linkage, linkage rates were stratified according to various immigrant characteristics and standardised differences calculated between linked and unlinked individuals. This approach was used in the linkage of the IRCC in Ontario [4]. To assess whether immigrants to Manitoba were different from those settling in other parts of Canada, standardised differences were calculated between those linked to a Manitoba resident and those unlinked individuals who intended to settle in the rest of Canada.
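For a binary characteristic, a widely used definition of the standardised difference divides the difference in proportions by the square root of the average of the two group variances. The text does not state which formula was applied, so the Python sketch below assumes this definition, and the proportions in the example are invented.

```python
import math


def standardised_difference(p_linked: float, p_unlinked: float) -> float:
    """Standardised difference (in %) between two proportions.

    Assumed formula: 100 * (p1 - p2) / sqrt((p1*(1-p1) + p2*(1-p2)) / 2).
    """
    pooled_var = (p_linked * (1 - p_linked) + p_unlinked * (1 - p_unlinked)) / 2
    diff = p_linked - p_unlinked
    if pooled_var == 0:
        # Both proportions are exactly 0 or 1: no variance to scale by.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return 100 * diff / math.sqrt(pooled_var)


# Invented example: 60% of linked vs 40% of unlinked records share a trait.
d = standardised_difference(0.60, 0.40)
print(round(d, 1))  # 40.8
```

Unlike a significance test, this measure does not grow with sample size, which makes it convenient for comparing linked and unlinked groups in very large administrative files.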

Table 2 shows that the sociodemographic characteristics of linked (229,025 + 34,686 = 263,711) and unlinked individuals with intended destination Manitoba were generally similar, suggesting little or no bias associated with the linkage process for most characteristics. Large standardised differences (above 20%), indicating differences between unlinked and linked immigrants that are unlikely to be due to chance, were found among immigrants who intended to settle in Manitoba and were more common between 2005 and 2009. Other potential differences (standardised differences between 15% and 20%) included lower proportions of linked immigrants among those who landed in the 1990s, among refugees, and those born in Africa.

Linkage methods were similar across provinces; however, the IRCC-PR file received by provinces varied in terms of coverage and coverage years. Other differences include software used, methods for data quality assessment, and linkage completion date (Table 2).

Despite receiving the national dataset from IRCC, the first linkage between the IRCC-PR Database and the New Brunswick Medicare Registry was restricted to immigrants who indicated NB as their intended destination and those who did not have a specified destination. This decision was made to minimise false positives in the context of resource constraints. Because the Medicare Registry is the most complete, accurate, and up-to-date list of all NB residents across all government departments in the province, deterministic matching was the primary method used. It was followed by some manual matching of records that were categorised as having NB as the intended destination but could not be matched after various deterministic matching passes were performed with the Power Query linkage software. To differentiate individuals who were unmatched due to matching errors from those who may have moved out of province, an unmatched individual was counted as never having arrived in the province if no other family members in the same immigrant household were found to be present. A study found that about 89% of unmatched primary applicants lived in a household where no family members could be matched [6]. Individuals who were linked from the IRCC-PR database to Medicare Registry records were assigned their corresponding unique scrambled identifier, which enabled the file to be linked with other data files held at NB-IRDT. Duplicate records were removed by retaining the record with the earliest landing date.