Research ready Lung Data

Overview

This data dictionary describes the contents of the research ready lung cancer data set. This data set is made up of all Queensland non-small cell lung cancer (NSCLC) diagnoses between 2000 and 2019. The fields have been arranged in four categories as follows:

  • Demographic,
  • Clinical,
  • Treatment, and
  • Administrative

A dataset sample containing 10 example records is also provided. For detailed reference tables containing ICD codes for clinical data items such as Primary Site, Morphology and Procedure codes, please see Reference Sets.

Population-wide data on stage is available for the first time, with the most recent years 2017-2019 containing TNM stage information. Multi-modal treatment data (Surgery, Radiation therapy, Intravenous Systemic Therapy) is available for patients who underwent treatment for their diagnosis of lung cancer, along with details of known MDT presentations.

All of the data within this collection comes from the Queensland Oncology Repository (QOR), a cancer patient database developed and maintained by the Queensland Cancer Control Analysis Team (QCCAT; Queensland Health) to support Queensland’s cancer control, safety, and quality assurance initiatives. QOR consolidates cancer patient information for the state and contains data on diagnoses and deaths, surgery, chemotherapy, and radiotherapy. For more information, visit our website.

Click here to request access

Demographic

Demographic fields in this dataset include the age, sex and Indigenous status of the person diagnosed with cancer, as well as geographic and socio-economic data based on the person’s place of residence at diagnosis.

Field title Field name Definition Field values
Age at diagnosis AgeAtDiagnosis The age of the person in (completed) years at a specific point in time when first diagnosed  
Age group at diagnosis (code) AgeGroupFiveYearsKey The five-year age group the person belonged to at a specific point in time when first diagnosed  
Age group at diagnosis (description) AgeGroupFiveYears The five-year age group the person belonged to at a specific point in time when first diagnosed  
Date of birth BirthDate The date on which an individual was born  
Date of death DeathDate The date of death of the person  
Hospital and Health Service (code) HHSOfResidenceKey Queensland Hospital and health service geographic region at diagnosis  
Hospital and Health Service (description) HHSOfResidence Queensland Hospital and health service geographic region at diagnosis Cairns and Hinterland
Central Queensland
Central West
Darling Downs
Gold Coast
Mackay
Metro North
Metro South
North West
South West
Sunshine Coast
Torres and Cape
Townsville
West Moreton
Wide Bay
Indigenous status (code) IndigenousStatusID A measure of whether a person identifies as being of Aboriginal or Torres Strait Islander origin  
Indigenous status (description) IndigenousStatus A measure of whether a person identifies as being of Aboriginal or Torres Strait Islander origin Indigenous
non-Indigenous
Not Stated/Unknown
Remoteness area Remoteness The remoteness of residence at time of diagnosis. Major City
Inner Regional
Outer Regional
Remote & Very Remote
Sex (code) SexID The biological distinction between male and female  
Sex (description) Sex The biological distinction between male and female  
Socioeconomic status DecileID

Socio-Economic Indexes for Areas (SEIFA), a census-based measure of social and economic well-being developed by the Australian Bureau of Statistics (ABS) and aggregated at the level of Statistical Area 2 (SA2). (See summary table)

The Index of Relative Socio-economic Advantage and Disadvantage (IRSAD) is used.
1 (Low)
.
.
.
10 (High)
Socioeconomic status DecileName

Socio-Economic Indexes for Areas (SEIFA), a census-based measure of social and economic well-being developed by the Australian Bureau of Statistics (ABS) and aggregated at the level of Statistical Area 2 (SA2). (See summary table)

The Index of Relative Socio-economic Advantage and Disadvantage (IRSAD) is used.
1
2
3
4
5
6
7
8
9
10
Socioeconomic status (group) SocioeconomicStatus Socio-Economic Indexes for Areas (SEIFA), a census-based measure of social and economic well-being developed by the Australian Bureau of Statistics (ABS) and aggregated at the level of Statistical Area 2 (SA2). (See summary table) Affluent
Disadvantaged
Middle
Unknown

Statistical Area 2

(SA2 Residence at Diagnosis)

ASGS_SA2Code A designated region describing location and contact details that represents a medium-sized area built from a number of Statistical Area 1, as represented by a code. The aim is to represent a community that interacts together socially and economically See ABS website for details

Statistical Area 2

(SA2 Residence at Diagnosis)

ASGS_SA2Description A designated region describing location and contact details that represents a medium-sized area built from a number of Statistical Area 1. The aim is to represent a community that interacts together socially and economically See ABS website for details

Clinical

The clinical fields contained in this dataset describe the cancer diagnosed. They include the date of diagnosis and the clinical features of the tumour, such as size, morphology, nodal status and the presence of metastases. For the first time, clinical stage is included in this data. Full staging information is available only for patients diagnosed in 2017-2019, although Stage IV patients have been identified from 2000 onwards. This data is comprehensive and has been obtained from clinical audits conducted by clinicians and coders, which led to the development of automated algorithms that are applied to prospective data to derive stage routinely for new diagnoses.

Field title Field name Definition Field values
Cancer-related death DeathEventCauseSpecific Was the person’s death caused by their cancer? 0 - No
1 - Yes
Date of diagnosis DiagnosisDate The date a disease or condition is diagnosed  
Differentiation (code) DifferentiationKey The histological grade of the cancer tissue in a person with cancer 1
2
3
4
98
99
Differentiation (description) Differentiation The histological grade of the cancer tissue in a person with cancer

Well Differentiated / Low Grade / Grade 1

Moderately Differentiated / Intermediate Grade / Grade 2

Not Applicable

Not Stated/Unknown

Poorly Differentiated / High Grade / Grade 3

Undifferentiated / Anaplastic / Grade 4

Morphology of cancer (code) MorphologyCode The histological classification of the cancer tissue (histopathological type) in a person with cancer, and a description of the course of development that a tumour is likely to take: benign or malignant (behaviour), as represented by a code. See Appendix A
Morphology of cancer (description) Morphology The histological classification of the cancer tissue (histopathological type) in a person with cancer, and a description of the course of development that a tumour is likely to take: benign or malignant (behaviour). See Appendix A
Morphology of cancer (group) MorphologyGroup High level grouping of the morphologies Adenocarcinomas
Other Specific Carcinomas
Squamous Carcinomas
Unspecified Carcinomas (NOS)
Most valid basis of diagnosis of cancer (code) DiagnosisBasisKey The most reliable basis of a cancer diagnosis 0
1
2
3
4
5
6
7
8
9
10
11
12
9999
Most valid basis of diagnosis of cancer (description) DiagnosisBasis The most reliable basis of a cancer diagnosis

Clinical Investigations
Clinical Only
Cytology or Haematology
Exploratory Surgery
Histology (unknown if Primary or Metastasis)
Histology of Metastasis
Histology of Primary Tumour
Not Stated/Unknown
Specific Tumour Markers (Biochemical or Immunological Testing)

Number of comorbidities ComorbidityCount A grouping of clinical conditions that has the potential to significantly affect a cancer patient’s prognosis. Numeric value
Number of comorbidities (grouped) ComorbidityCountGroup A grouping of clinical conditions that has the potential to significantly affect a cancer patient’s prognosis. 0
1
2+
Performance status (code) PerformanceStatusCode Performance status recorded in QOOL at time of MDT (code) 0
1
2
3
4
99
Performance status (description) PerformanceStatus Performance status recorded in QOOL at time of MDT (description) Fully active
Ambulatory - capable of light work
Bed < 50% - self caring - not working
Bed > 50% - partially self caring
Confined to bed or chair
Unknown
Primary site of cancer (description) PrimarySite The site of origin of the tumour, as opposed to the secondary or metastatic sites. Bronchus or lung
Lower lobe, bronchus or lung
Main bronchus
Middle lobe, bronchus or lung
Overlapping lesion of bronchus and lung
Trachea
Upper lobe, bronchus or lung
Primary site of cancer (group) PrimarySiteGroup High level grouping of the sites in which the tumour originated in a person with cancer NSCLC
Primary site of cancer (ICD-10-AM code) PrimarySiteCode The site of origin of the tumour, as opposed to the secondary or metastatic sites, as represented by an ICD-10-AM code. C33
C340
C341
C342
C343
C348
C349
Underlying cause of death CauseOfDeath The cause of death of the person as represented by an ICD-10-AM code.  

Treatment

Treatment data items fall into four sub-categories as follows: Multidisciplinary Team meetings (MDT), Surgical procedures, Radiation therapy (RT), and Intravenous systemic therapy (IVST). For detailed information regarding clinical data items, please see Reference Sets.

MULTIDISCIPLINARY TEAM MEETINGS (MDT)

MDT data is present from 2000 onwards. Initial MDT data is limited to data from a project run at The Prince Charles Hospital; data sourced from QOOL is available from 2009. MDTs also provide many of the data items required for staging.

Field title Field name Definition Field values
Had MDT review HadMDTReview Record of MDT presentation as recorded by QOOL No
Yes

SURGICAL DATA

Data items related to surgery and the admission during which the procedure was performed are outlined below. Surgical treatment for lung cancer is complex, and procedures fall into three broad groupings. For details of the procedure codes that fall under these categories, please see Appendix A.

Field title Field name Definition Field values
ASA score (procedure) ProcASAScore A score used to assess and communicate a patient’s pre-anesthesia medical co-morbidities. The classification system alone does not predict the perioperative risks, but used with other factors (e.g. type of surgery, frailty, level of deconditioning), it can be helpful in predicting perioperative risks. 1
2
3
4
5
9
Date of admission AdmissionDate Date admitted to a facility for procedure  
Date of discharge DischargeDate Date discharged from a facility after procedure  
Date of procedure ProcedureDate The date on which a clinical intervention was performed during an inpatient episode of care  
Death in hospital DeathInpatient Death in hospital following surgery No
Yes
Elective status of admission ProcElectiveStatus Denotes if admission was elective or emergency Elective admission
Emergency admission
Not assigned
Facility capability score (procedure) ProcedureFacilityCSCF High level grouping of hospital capability for service delivery 3
4
5
6
Facility peer group (procedure) ProcFacPeerGrp High level grouping of AIHW peer group that does not distinguish between public and private  
Length of stay ProcedureLOS The number of days a patient was in hospital during the admission for their procedure  
Patient received treatment for their cancer HadTreatment Did the patient receive treatment for their cancer No
Yes
Patient travelled for surgery TravelledForSurgery Was the surgery performed in a facility within the same HHS that the person lives in No
Yes
Patient underwent surgery IsSurgery Did the patient have surgery for their cancer No
Yes
Procedure code ProcedureCode A clinical intervention represented by an ICD-10-AM 11th Edition code 3843800
3843801
3843802
3844000
3844001
3844100
3844101
9016900
Procedure group name ProcedureGroupName A group of clinical interventions Lobectomy
Partial Resection
Pneumonectomy
Procedure name ProcedureName A description of the clinical intervention (ICD-10-AM 11th Edition) Endoscopic wedge resection of lung
Lobectomy of lung
Pneumonectomy
Radical lobectomy
Radical pneumonectomy
Radical wedge resection of lung
Segmental resection of lung
Wedge resection of lung

 

RADIATION THERAPY (RT)

Field title Field name Definition Field values
Date of radiotherapy (end) FirstRTEndDate Date radiation therapy was completed  
Date of radiotherapy (start) FirstRTStartDate Date radiation therapy was first received  
Death within 30 days of radiotherapy RT30DaysToDeath Did the patient receive radiotherapy in the 30 days prior to death No
Yes
Facility type (radiotherapy) FirstRTFacilityType Facility type (public/private) where radiation therapy was delivered  
Patient received adjuvant radiotherapy HadPostProcRT   No
Yes
Patient received RT HadRT Did the patient receive radiotherapy as treatment for their cancer No
Yes
Patient received RT before surgery HadPreProcRT   No
Yes
Treatment intent (final RT) LastRTIntent The intent of the course of radiation therapy (curative/palliative) Curative
Palliative
Treatment intent (radiotherapy) FirstRTIntent The intent of the course of radiation therapy (curative/palliative) Curative
Palliative

IV SYSTEMIC THERAPY

Field title Field name Definition Field values
Date of IVST (end) FirstCTEndDate Date IV systemic therapy completed  
Date of IVST (start) FirstCTStartDate Date IV systemic therapy began  
Death within 30 days of IVST CT30DaysToDeath Did the person receive IV systemic therapy in the 30 days prior to death No
Yes
Patient received IVST HadCT Did the person receive IV systemic therapy for their cancer No
Yes

Administrative

Field title Field name Definition Field values
Censor date SurvivalCensorDate Patients followed up until this date  
Patient identifier UniqueID Unique identifier for each person in the dataset  
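
The date fields DiagnosisDate, DeathDate and SurvivalCensorDate can be combined into a follow-up interval for survival analysis. The sketch below is one possible approach, not the data custodian's method. It assumes that dates have already been parsed into `datetime.date` values, that a missing DeathDate is represented as `None`, and that a death recorded after the censor date should be treated as censored. The function name `follow_up` is hypothetical.

```python
from datetime import date
from typing import Optional, Tuple


def follow_up(diagnosis: date, death: Optional[date], censor: date) -> Tuple[int, bool]:
    """Return (days of follow-up, death observed) for one record.

    Assumption: a death recorded after the censor date is treated as censored.
    """
    if death is not None and death <= censor:
        # Death observed within the follow-up window.
        return (death - diagnosis).days, True
    # No death recorded, or death falls after the censor date.
    return (censor - diagnosis).days, False


if __name__ == "__main__":
    print(follow_up(date(2019, 1, 1), date(2019, 1, 11), date(2019, 12, 31)))
    print(follow_up(date(2019, 1, 1), None, date(2019, 1, 31)))
```

The boolean can be used as the event indicator and the day count as the duration in a standard survival model.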

Sample data

This sample contains 10 sample records intended to show the type of data available and the format in which the data is presented. This will allow researchers to prepare load scripts and analysis programs in advance of downloading the full data set.

Click here to download the sample data in Excel.

 

Reference sets

AGE GROUPS

The five-year age group the person belonged to at a specific point in time when first diagnosed

Age group key Age group
1 0-4
2 5-9
3 10-14
4 15-19
5 20-24
6 25-29
7 30-34
8 35-39
9 40-44
10 45-49
11 50-54
12 55-59
13 60-64
14 65-69
15 70-74
16 75-79
17 80-84
18 85+
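
The age group table follows a regular pattern, so the key can be derived from AgeAtDiagnosis when checking a load script. The following is a minimal sketch under that assumption; the function names `age_group_key` and `age_group_label` are hypothetical and are not part of the dataset.

```python
def age_group_key(age_at_diagnosis: int) -> int:
    """Map completed years of age to the five-year age group key (1-18)."""
    if age_at_diagnosis < 0:
        raise ValueError("age must be non-negative")
    # Keys 1-17 cover 0-4 through 80-84; key 18 collects everyone aged 85 or over.
    return min(age_at_diagnosis // 5 + 1, 18)


def age_group_label(key: int) -> str:
    """Return the age group label listed in the table for a given key."""
    if not 1 <= key <= 18:
        raise ValueError("key must be between 1 and 18")
    if key == 18:
        return "85+"
    lower = (key - 1) * 5
    return f"{lower}-{lower + 4}"


if __name__ == "__main__":
    key = age_group_key(67)
    print(key, age_group_label(key))
```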

SEX

The biological distinction between male and female.

Reference code Short description Long description
1 MALE MALE
2 FEMALE FEMALE
3 OTHER OTHER
9 NOT STATED/INADEQUATELY DESCRIBED NOT STATED/INADEQUATELY DESCRIBED

INDIGENOUS STATUS

A measure of whether a person identifies as being of Aboriginal or Torres Strait Islander origin.

Reference code Short description Long description
1 Aboriginal but not Torres Strait Islander origin Aboriginal but not Torres Strait Islander origin
2 Torres Strait Islander but not Aboriginal origin Torres Strait Islander but not Aboriginal origin
3 Both Aboriginal and Torres Strait Islander origin Both Aboriginal and Torres Strait Islander origin
4 Neither Aboriginal nor Torres Strait Is. Origin Neither Aboriginal nor Torres Strait Islander origin
9 Not Stated / Unknown Not Stated / Unknown
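
The IndigenousStatus description field uses three values (Indigenous, non-Indigenous, Not Stated/Unknown), while the reference codes above distinguish five. The sketch below shows one plausible way the codes could collapse into those three values. The mapping is an assumption inferred from the descriptions and is not documented in this data dictionary, so it should be checked against the sample data before use. The names `INDIGENOUS_GROUP` and `indigenous_group` are hypothetical.

```python
# Assumed collapse of reference codes into the three description values.
# This mapping is inferred from the code descriptions, not stated by the source.
INDIGENOUS_GROUP = {
    1: "Indigenous",          # Aboriginal but not Torres Strait Islander origin
    2: "Indigenous",          # Torres Strait Islander but not Aboriginal origin
    3: "Indigenous",          # Both Aboriginal and Torres Strait Islander origin
    4: "non-Indigenous",      # Neither Aboriginal nor Torres Strait Islander origin
    9: "Not Stated/Unknown",
}


def indigenous_group(code: int) -> str:
    """Return the assumed grouped value; unlisted codes fall back to unknown."""
    return INDIGENOUS_GROUP.get(code, "Not Stated/Unknown")


if __name__ == "__main__":
    for code in (1, 4, 9):
        print(code, indigenous_group(code))
```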

PRIMARY SITE

The site of origin of the tumour, as opposed to the secondary or metastatic sites, as represented by an ICD-10-AM code.

Primary site code Primary site punctuated Short description Long description Group
C33 C33 Trachea Malignant neoplasm of trachea Lung
C340 C34.0 Main bronchus Malignant neoplasm of main bronchus Lung
C341 C34.1 Upper lobe Malignant neoplasm of upper lobe, bronchus or lung Lung
C342 C34.2 Middle lobe Malignant neoplasm of middle lobe, bronchus or lung Lung
C343 C34.3 Lower lobe Malignant neoplasm of lower lobe, bronchus or lung Lung
C348 C34.8 Overlapping lesion of lung Overlapping malignant lesion of bronchus and lung Lung
C349 C34.9 Lung Malignant neoplasm of bronchus or lung, unspecified Lung
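
The table lists each primary site code in both unpunctuated (C340) and punctuated (C34.0) form. When linking to other sources that use the punctuated form, a small helper along the following lines may be useful. It is a sketch that assumes every code consists of a three-character category followed by an optional subdivision, as in the rows above. The name `punctuate_site_code` is hypothetical.

```python
def punctuate_site_code(code: str) -> str:
    """Convert an unpunctuated code such as 'C340' to 'C34.0'."""
    code = code.strip().upper()
    # Three-character codes (e.g. 'C33') have no subdivision and stay as they are.
    if len(code) <= 3:
        return code
    return f"{code[:3]}.{code[3:]}"


if __name__ == "__main__":
    for raw in ("C33", "C340", "C349"):
        print(raw, punctuate_site_code(raw))
```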

MORPHOLOGY

The histological classification of the cancer tissue (histopathological type) in a person with cancer, and a description of the course of development that a tumour is likely to take: benign or malignant (behaviour).

Morphology code Short description Group
81403 Adenocarcinoma Adenocarcinomas
82003 Adenoid cystic carcinoma Adenocarcinomas
82013 Cribriform carcinoma Adenocarcinomas
82113 Tubular adenocarcinoma Adenocarcinomas
82503 Bronchiolo-alveolar adenocarcinoma Adenocarcinomas
82513 Alveolar adenocarcinoma Adenocarcinomas
82523 Bronchiolo-alveolar carcinoma, non-mucinous Adenocarcinomas
82533 Bronchiolo-alveolar carcinoma, mucinous Adenocarcinomas
82543 Bronchiolo-alveolar carcinoma, mixed mucinous and non-mucinous Adenocarcinomas
82553 Adenocarcinoma with mixed subtypes Adenocarcinomas
82603 Papillary adenocarcinoma Adenocarcinomas
82633 Adenocarcinoma in tubulovillous adenoma Adenocarcinomas
83103 Clear cell adenocarcinoma Adenocarcinomas
83233 Mixed cell adenocarcinoma Adenocarcinomas
84303 Mucoepidermoid carcinoma Adenocarcinomas
84803 Mucinous adenocarcinoma Adenocarcinomas
84813 Mucin-producing adenocarcinoma Adenocarcinomas
84903 Signet ring cell carcinoma Adenocarcinomas
85503 Acinar cell carcinoma Adenocarcinomas
85723 Adenocarcinoma with spindle cell metaplasia Adenocarcinomas
85743 Adenocarcinoma with neuroendocrine differentiation Adenocarcinomas
85763 Hepatoid adenocarcinoma Adenocarcinomas
80303 Giant cell and spindle cell carcinoma Other Specific Carcinomas
80313 Giant cell carcinoma Other Specific Carcinomas
80323 Spindle cell carcinoma Other Specific Carcinomas
80333 Pseudosarcomatous carcinoma Other Specific Carcinomas
80463 Non-small cell carcinoma Other Specific Carcinomas
82303 Solid carcinoma Other Specific Carcinomas
82443 Mixed adenoneuroendocrine carcinoma Other Specific Carcinomas
82453 Adenocarcinoid tumour Other Specific Carcinomas
82463 Neuroendocrine carcinoma Other Specific Carcinomas
85603 Adenosquamous carcinoma Other Specific Carcinomas
80523 Papillary squamous cell carcinoma Squamous Carcinomas
80703 Squamous cell carcinoma Squamous Carcinomas
80713 Squamous cell carcinoma, keratinising Squamous Carcinomas
80723 Squamous cell carcinoma, large cell, nonkeratinising Squamous Carcinomas
80733 Squamous cell carcinoma, small cell, nonkeratinising Squamous Carcinomas
80743 Squamous cell carcinoma, spindle cell Squamous Carcinomas
80753 Squamous cell carcinoma, adenoid Squamous Carcinomas
80763 Squamous cell carcinoma, microinvasive Squamous Carcinomas
80833 Basaloid squamous cell carcinoma Squamous Carcinomas
80843 Squamous cell carcinoma, clear cell type Squamous Carcinomas
81233 Basaloid carcinoma Squamous Carcinomas
80103 Carcinoma Unspecified Carcinomas (NOS)
80123 Large cell carcinoma Unspecified Carcinomas (NOS)
80133 Large cell neuroendocrine carcinoma Unspecified Carcinomas (NOS)
80143 Large cell carcinoma with rhabdoid phenotype Unspecified Carcinomas (NOS)
80203 Carcinoma, undifferentiated Unspecified Carcinomas (NOS)
80213 Carcinoma, anaplastic Unspecified Carcinomas (NOS)
80223 Pleomorphic carcinoma Unspecified Carcinomas (NOS)
80503 Papillary carcinoma Unspecified Carcinomas (NOS)
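
The five-digit morphology codes above appear to follow the ICD-O layout of a four-digit histology code followed by a one-digit behaviour code, so that 81403 corresponds to 8140/3 (behaviour 3 denotes a malignant primary tumour). Assuming that layout holds for every record, the two parts can be separated as sketched below. The function names `split_morphology` and `format_icdo` are hypothetical.

```python
from typing import Tuple


def split_morphology(code: str) -> Tuple[str, str]:
    """Split a five-digit morphology code into (histology, behaviour)."""
    code = str(code).strip()
    if len(code) != 5 or not code.isdigit():
        raise ValueError(f"expected a five-digit morphology code, got {code!r}")
    # Assumed layout: first four digits are histology, last digit is behaviour.
    return code[:4], code[4]


def format_icdo(code: str) -> str:
    """Render the code in the conventional 'histology/behaviour' form."""
    histology, behaviour = split_morphology(code)
    return f"{histology}/{behaviour}"


if __name__ == "__main__":
    print(split_morphology("81403"))
    print(format_icdo("80703"))
```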

PROCEDURE CODES

The following surgical procedures are contained within this dataset.

Procedure code Procedure name Group
3843801 Lobectomy of lung Lobectomy of lung
3844100 Radical lobectomy Lobectomy of lung
3843800 Segmental wedge resection of lung Partial Resection
3844000 Wedge resection of lung Partial Resection
3844001 Radical wedge resection of lung Partial Resection
9016900 Endoscopic wedge resection of lung Partial Resection
3843802 Pneumonectomy Pneumonectomy
3844101 Radical pneumonectomy Pneumonectomy
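
For load scripts that need to check ProcedureGroupName against ProcedureCode, the table above can be encoded as a lookup. The sketch below copies the codes and group labels exactly as they appear in this reference table. Treating codes as strings and returning `None` for unlisted codes are assumptions, and the names `PROCEDURE_GROUPS` and `procedure_group` are hypothetical.

```python
from typing import Optional

# Procedure code -> group, transcribed from the reference table above.
PROCEDURE_GROUPS = {
    "3843801": "Lobectomy of lung",
    "3844100": "Lobectomy of lung",
    "3843800": "Partial Resection",
    "3844000": "Partial Resection",
    "3844001": "Partial Resection",
    "9016900": "Partial Resection",
    "3843802": "Pneumonectomy",
    "3844101": "Pneumonectomy",
}


def procedure_group(code: str) -> Optional[str]:
    """Return the group for a procedure code, or None if it is not listed."""
    return PROCEDURE_GROUPS.get(str(code).strip())


if __name__ == "__main__":
    for code in ("3843801", "9016900", "3844101"):
        print(code, procedure_group(code))
```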

DIAGNOSIS BASIS

The most reliable basis of a cancer diagnosis.

Diagnosis basis code Short description Long description Group
5 Cytology or Haematology Cytology: Examination of cells from a primary or secondary site, including fluids aspirated by endoscopy or needle; also includes the microscopic examination of peripheral blood and bone marrow aspirates Histological
6 Histology of Metastasis Histology of metastasis: Histological examination of tissue from a metastasis, including autopsy specimens Histological
7 Histology of Primary Tumour Histology of a primary tumour: Histological examination of tissue from primary tumour, however obtained, including all cutting techniques and bone marrow biopsies; also includes autopsy specimens of primary tumour Histological
8 Histology (unknown if Primary or Metastasis) Histology: either unknown whether of primary or metastatic site, or not otherwise specified Histological
0 Death certificate only Death certificate only: Information provided is from a death certificate Other
1 Clinical Only Clinical: Diagnosis made before death, but without any of the following (codes 2-7) Other
2 Clinical Investigations Clinical investigation: All diagnostic techniques, including x-ray, endoscopy, imaging, ultrasound, exploratory surgery (e.g. laparotomy), and autopsy, without a tissue diagnosis Other
4 Specific Tumour Markers (Biochemical or Immunological Testing) Specific tumour markers: Including biochemical and/or immunological markers that are specific for a tumour site Other
9 Not Stated/Unknown Unknown Other
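
The grouping in the table above can be used to flag records whose diagnosis falls into the Histological group. The following sketch encodes only the codes listed in this reference table. Codes that appear in the DiagnosisBasisKey field values but not in this table are returned as `None` rather than guessed. The names `DIAGNOSIS_BASIS` and `basis_group` are hypothetical.

```python
from typing import Optional

# Diagnosis basis code -> (short description, group), from the table above.
DIAGNOSIS_BASIS = {
    5: ("Cytology or Haematology", "Histological"),
    6: ("Histology of Metastasis", "Histological"),
    7: ("Histology of Primary Tumour", "Histological"),
    8: ("Histology (unknown if Primary or Metastasis)", "Histological"),
    0: ("Death certificate only", "Other"),
    1: ("Clinical Only", "Other"),
    2: ("Clinical Investigations", "Other"),
    4: ("Specific Tumour Markers (Biochemical or Immunological Testing)", "Other"),
    9: ("Not Stated/Unknown", "Other"),
}


def basis_group(code: int) -> Optional[str]:
    """Return 'Histological' or 'Other', or None for codes not in the table."""
    entry = DIAGNOSIS_BASIS.get(code)
    return entry[1] if entry else None


if __name__ == "__main__":
    for code in (7, 2, 3):
        print(code, basis_group(code))
```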

SOCIOECONOMIC STATUS

Socioeconomic status is based on the Socio-Economic Indexes for Areas (SEIFA), a census-based measure of social and economic well-being developed by the Australian Bureau of Statistics (ABS) and aggregated at the level of Statistical Local Areas (SLA).

The ABS use SEIFA scores to rank regions into ten groups (deciles) numbered one to ten, with one being the most disadvantaged and ten being the most affluent group.

This ranking is useful at the national level, but the number of people in each decile often becomes too small for meaningful comparisons when applied to a subset of the population.

For this reason, this document further aggregates SEIFA deciles into 3 socioeconomic groups.

Socioeconomic status decile Group Percentage of population
1-2 Disadvantaged 20%
3-8 Middle 60%
9-10 Affluent 20%
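
The aggregation of deciles into three groups can be written as a short function, which may help when checking SocioeconomicStatus against DecileID. This is a sketch rather than the custodian's implementation. It assumes a missing decile is passed as `None` and maps to the "Unknown" value listed for SocioeconomicStatus. The name `ses_group` is hypothetical.

```python
from typing import Optional


def ses_group(decile: Optional[int]) -> str:
    """Collapse a SEIFA decile (1-10) into the three groups in the table above."""
    if decile is None:
        # Assumption: records without a decile carry the 'Unknown' group.
        return "Unknown"
    if not 1 <= decile <= 10:
        raise ValueError("decile must be between 1 and 10")
    if decile <= 2:
        return "Disadvantaged"
    if decile <= 8:
        return "Middle"
    return "Affluent"


if __name__ == "__main__":
    for d in (1, 5, 10, None):
        print(d, ses_group(d))
```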

Frequently Asked Questions (FAQ)

What data is included in the Research Ready Dataset (RRD)?

This dataset contains records for all persons diagnosed with Non-Small Cell Lung Cancer (NSCLC) in Queensland between 2000 and 2019. Demographic data such as sex, age group, and location/remoteness of residence are included.

Is the data anonymised?

Yes. No names or addresses will be included in the data extracts. Unique identifiers will be attached to each person included in the file.

Does this dataset contain information on treatments the individual received?

Yes. Data on the following modes of treatment are included in the dataset: surgery (major resections), radiation therapy (RT), and intravenous systemic therapy (IVST).

Are there gaps in the data?

Treatment data is available for the duration of the 20-year period. Complete surgical data is available for the full timespan. Radiation therapy data sources were enhanced in 2007, and data from 2007-2019 has greater coverage. Similarly, chemotherapy sources such as iPharmacy and CHARM were progressively added, leading to full coverage from 2009. Chemotherapy data is limited to intravenous systemic therapy (IVST) only.

Staging data capture is limited to Stage IV only for 2000 to 2016 and has been inferred by the presence of clinical records indicating metastases. From 2017 onwards, complete 8th edition staging data has been inferred using TNM staging approaches via clinical data reviewed by specialist coders.
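
When restricting an analysis cohort, the coverage periods described above can be expressed as simple year checks. The sketch below keys everything on the year of diagnosis, which is a simplification: the text describes when data sources were enhanced, not a strict per-record guarantee. The function name `data_coverage` and the dictionary keys are hypothetical.

```python
from typing import Dict, Union


def data_coverage(diagnosis_year: int) -> Dict[str, Union[str, bool]]:
    """Summarise the documented data coverage for a given year of diagnosis."""
    if not 2000 <= diagnosis_year <= 2019:
        raise ValueError("the dataset covers diagnoses from 2000 to 2019")
    return {
        # Full TNM staging from 2017; Stage IV only before that.
        "staging": "full TNM" if diagnosis_year >= 2017 else "Stage IV only",
        # Radiation therapy sources were enhanced in 2007.
        "rt_enhanced_sources": diagnosis_year >= 2007,
        # IV systemic therapy sources reached full coverage from 2009.
        "ivst_full_coverage": diagnosis_year >= 2009,
    }


if __name__ == "__main__":
    for year in (2005, 2008, 2018):
        print(year, data_coverage(year))
```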

How to apply

Details on how to apply for access to the full dataset can be found here

How do I cite data provided?

Use of the data in publications requires the acknowledgement and citation as outlined in the CAQ publication guidelines here

Contact details

For more information or to contact us directly, click here