Data Issues Paper

Data Issues Paper – March 2022

Overview

The Data Issues Paper provides a summary of data-related issues that have been identified in the Ten to Men data. It has been designed to assist users of the data as they undertake research and analysis, and should be read in conjunction with the Data User Guide.

This paper provides information to data users on:

  • observed inconsistencies and issues that they should be aware of when analysing and interpreting the Ten to Men data
  • recommendations and guidance in the management of identified data quality issues in the Ten to Men data.

The Data Issues Paper has been divided into three sections:

  • a history of the Ten to Men datasets
  • changes to the structure of the Ten to Men datasets
  • hierarchical listing of identified data quality issues within each research domain.

Further sections will be added as any data-related issues emerge.

Data Issue Paper Updates

Date | Version | Update | Suggested citation
September 2019 | 1.0 | Initial version | Howell, L., Bandara, D., Mohal, J., Andalon, M., Silbert, M., Garrard, B., Swami, N., & Daraganova, G. (2019). Ten to Men: The Australian Longitudinal Study on Male Health - Data Issues Paper, Version 1.0, September 2019. Melbourne: Australian Institute of Family Studies.
September 2021 | 2.0 | Updated for Wave 3 | Howell, L., Silbert, M., & Bandara, D. (2021). Ten to Men: The Australian Longitudinal Study on Male Health - Data Issues Paper, Version 2.0, September 2021. Melbourne: Australian Institute of Family Studies.
March 2022 | 2.1 | Addition of Section 3.9 Current Occupation | Howell, L., Silbert, M., & Bandara, D. (2022). Ten to Men: The Australian Longitudinal Study on Male Health – Data Issues Paper, Version 2.1, March 2022. Melbourne: Australian Institute of Family Studies.

1. Ten to Men data

Periodically a new release of the Ten to Men datasets will be generated as additional information becomes available after each data collection wave. The releases will be numbered in sequential order and a new Digital Object Identifier (DOI) will be minted.

Release 1.0 was issued by the University of Melbourne and contained data from Wave 1 only.

Release 2.0 was also issued by the University of Melbourne. It contained data from both Wave 1 and Wave 2, as well as a respondent dataset.

Release 2.1 was issued by the Australian Institute of Family Studies (AIFS) and comprised updated Wave 1 and Wave 2 datasets. Relevant data from the respondent dataset were included in these datasets, and the respondent dataset is no longer available separately.

Release 3.0 was issued by AIFS and contained data from Wave 1, Wave 2 and Wave 3.

A history of the dataset releases and suggested citations can be found in Appendix A.

2. Ten to Men datasets

This section documents the structural changes that have been applied to the Ten to Men datasets. These structural changes will enhance the usability of the datasets, especially as additional waves are included in the future. They include the merging of datasets, resolving data inconsistencies, addressing quality issues and augmenting data resources with additional information.

Most of the major structural changes were implemented in Release 2.1 and carry through to later releases of the Ten to Men datasets; the renaming of weight variables was implemented in Release 3.0. Table 1 provides a summary of these changes; further details can be found in the corresponding sections.

Table 1: Summary of changes to the dataset structure in Release 2.1
Change to structure | Implementation | See section for further details
Addition of a data sharing framework | Release 2.1 | 2.1
Respondent data added to Wave 1 and Wave 2 datasets, removing the need for a separate dataset | Release 2.1 | 2.2
Renaming of variables to indicate wave, thus aligning with the standard naming convention for variables | Release 2.1 | 2.3
Renaming of variables to indicate research domain | Release 2.1 | 2.4
Addition of a new research domain for linked data | Release 2.1 | 2.5
Renaming of census linked variables to include reference year | Release 2.1 | 2.6
Renaming of weights to include reference to population or sample weights | Release 3.0 | 2.7

2.1 Addition of a data sharing framework

A data sharing framework that differentiates users by access level was proposed for the Ten to Men datasets. Its aim is to increase the utility of the information while minimising disclosure risks, in line with data sharing principles. As a result, two levels of dataset were generated for each wave: the General Release and the Restricted Release.

A lower level of confidentialisation was applied to the Restricted Release dataset, with all initial information preserved. Only names, addresses and other contact details are excluded from the dataset. Access to the Restricted Release dataset may only be granted when data users are able to demonstrate a genuine need for the additional data and when they also meet the necessary additional security requirements.

The General Release dataset has undergone further data confidentialisation. In addition to the information removed for the Restricted Release dataset, further confidentialisation for the General Release dataset includes suppressing variables, aggregating response categories and recoding outlying values to a less extreme value. Users can consult the Ten to Men Data Dictionary for more information on the confidentialised variables.

Because access requirements for the General Release dataset are less rigorous than those for the Restricted Release dataset, the Ten to Men datasets are now more accessible to users.

For further information about the Ten to Men datasets, including data access procedures, users can refer to sections 3 and 7 of the Ten to Men Data User Guide.

2.2 Availability of respondent dataset

Release 2.0 comprised three Ten to Men datasets - Respondent, Wave 1 and Wave 2. The Respondent dataset contained key indicator data, such as the unique study identifier, age, household identifier and geographical information. The dataset within each wave contained the responses to the corresponding questionnaires.

In Release 2.1, relevant information from the respondent dataset has been included in the Ten to Men Wave 1 and Wave 2 datasets. This has removed the necessity of maintaining a separate respondent dataset, and thus only two datasets were released at each level - Wave 1 and Wave 2.

2.3 Renaming of variables to indicate wave

The standard naming convention of the Ten to Men variables specifies that the first character of the variable should indicate the wave or be a 'z' if the variable is constant across waves.

In Releases 1.0 and 2.0, some variables in the respondent dataset did not follow this standard naming convention. The first character of the variable name was 'z', but the variables were not constant across waves. In these cases, the variable label specified whether the variable related to Wave 1 or Wave 2.

It is important that variables conform to the Ten to Men standard naming convention to maintain consistency and uniformity across the data. Therefore, in Release 2.1, variables were renamed to follow the standard naming convention; that is, the first character of the variable was changed to indicate the wave if the variable was not constant across waves.

Further details of all variables that were renamed are shown in Appendix B.

2.4 Renaming of variables to indicate research domain

The standard naming convention of the variables in the Ten to Men dataset specifies that the second and third characters of the variable should indicate the research domain. The research domain of all variables is also listed in the Data Dictionary.

In Release 2.0, one variable was identified where the second and third characters of the variable did not correspond to a research domain. The detail of this variable is shown below in Table 2.

As it is important to maintain consistency across the data products, this variable has been renamed in Release 2.1 to reflect the correct research domain.

Table 2: List of Wave 2 variables that do not follow the standard naming convention
Release 2.0 variable name | Research domain according to the standard naming convention (2nd and 3rd characters) | Research domain listed in the Data Dictionary
bhxsex120a | hx (hx is not a research domain) | Behaviours - sexual behaviour (bx)
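
The naming convention described in sections 2.3 and 2.4 can be illustrated with a short Python sketch. It is not part of the Ten to Men tooling; the function name `split_variable_name` and the returned keys are hypothetical, and the wave letters are taken from the convention described in this paper ('a', 'b', 'c' for Waves 1 to 3, 'z' for variables constant across waves).

```python
# Wave prefixes as described in sections 2.3 and 2.7 of this paper.
WAVE_PREFIXES = {
    "a": "Wave 1",
    "b": "Wave 2",
    "c": "Wave 3",
    "z": "Constant across waves",
}


def split_variable_name(name: str) -> dict:
    """Split a variable name into wave, research-domain code and remainder.

    Hypothetical helper: it only applies the positional rules stated in
    the paper and does not check the domain against the Data Dictionary.
    """
    lowered = name.strip().lower()
    if len(lowered) < 4:
        raise ValueError(f"variable name too short: {name!r}")
    prefix = lowered[0]
    if prefix not in WAVE_PREFIXES:
        raise ValueError(f"unknown wave prefix {prefix!r} in {name!r}")
    return {
        "wave": WAVE_PREFIXES[prefix],
        "domain": lowered[1:3],  # 2nd and 3rd characters
        "rest": lowered[3:],
    }
```

Applied to the Release 2.0 name 'bhxsex120a', the sketch reports Wave 2 and the domain code 'hx', which is the mismatch that led to the renaming in Release 2.1.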

2.5 Additional research domain for linked data

In Releases 1.0 and 2.0, the research domain of Data Collection (DC) comprised key indicator variables and linked data. This included variables such as the Unique Study ID, Participation Indicators, Household Indicators, Statistical Area codes (SA1, SA2) and numerous Socio-Economic Indexes for Areas (SEIFA).

In Release 2.1, these variables were separated into two research domains to provide transparency about the data source. The key indicator variables remained in the research domain of Data Collection, while an additional research domain was created for Linked Data (LD).

As the standard naming convention of the variables specifies that the second and third characters of the variable name should indicate the research domain, this has resulted in the renaming of some variables to conform to this standard. That is, the second and third characters of the variable name were changed from 'DC' to 'LD'.

Further details of all variables that were renamed are shown in Appendix B.

2.6 Renaming of census-based data

In Release 2.0, the respondent dataset contained linked data from the Australian Bureau of Statistics (ABS) 2011 Census. These variables did not contain any information to indicate the census year.

As new census data becomes available, it is important to include a census year reference in the variable name. In Release 2.1, the eighth and ninth characters of the variable name were changed to represent a year indicator. For example, the variable 'aldieod00i' has been renamed to 'aldieod11i', to indicate that it is based on the 2011 Census data. Additional census data was also available for Wave 2, so this dataset also contains linked data from the ABS 2016 Census.

Further details of all variables that were renamed are shown in Appendix B.
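
As a minimal sketch of the renaming rule above, the following Python function overwrites the eighth and ninth characters of a variable name with a two-digit census year. The function name `set_census_year` is hypothetical; Appendix B remains the authoritative list of renamed variables.

```python
def set_census_year(name: str, census_year: int) -> str:
    """Return the name with characters 8 and 9 set to the 2-digit year.

    Hypothetical helper illustrating the positional rule only.
    """
    if len(name) < 9:
        raise ValueError(f"variable name too short: {name!r}")
    year_indicator = f"{census_year % 100:02d}"
    # Positions 8 and 9 (1-indexed) are indices 7 and 8 (0-indexed).
    return name[:7] + year_indicator + name[9:]
```

With the example from the text, `set_census_year("aldieod00i", 2011)` gives 'aldieod11i'.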

2.7 Renaming of weight variables

Prior to Release 3.0, only population weights were included in the dataset. Sample weights were added in Wave 3, and a variable naming framework adopted for the weight variables.

Wave 1 and Wave 2 weighting variables were then renamed to comply with this framework, and to clearly indicate whether the variable refers to a population or sample weight.

Table 3 indicates the naming convention for weights that has been applied from Release 3.0.

Table 3: Naming convention for weights
Character position in Variable Name | Description | Variable abbreviation
1 | Wave | A, B or C
2,3 | Research Domain | DC
4 | Initial or Raked | I or R
5 | Longitudinal or Cross-Sectional | L or C
6 | Population or Sample | P or S
7,8,9 | For Wave 1 | WTA
7,8,9 | For Wave 2 | WTB
7,8,9 | For Wave 3 | WTC
7,8,9 | Between Waves 1 and 2 | WAB
7,8,9 | Between Waves 1 and 3 | WAC
7,8,9 | Between Waves 2 and 3 | WBC
10 | Derived | D

Further details of the weighting variables that were renamed are shown in Appendix B.
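
The positional convention in Table 3 can be read mechanically. The Python sketch below decodes a weight variable name character by character; the function name `decode_weight_name` is hypothetical, and the example name 'adcrcpwtad' is constructed from Table 3 for illustration only and has not been checked against Appendix B.

```python
WAVES = {"a": "Wave 1", "b": "Wave 2", "c": "Wave 3"}
ADJUSTMENT = {"i": "Initial", "r": "Raked"}
DESIGN = {"l": "Longitudinal", "c": "Cross-sectional"}
TARGET = {"p": "Population", "s": "Sample"}
SCOPE = {
    "wta": "Wave 1",
    "wtb": "Wave 2",
    "wtc": "Wave 3",
    "wab": "Between Waves 1 and 2",
    "wac": "Between Waves 1 and 3",
    "wbc": "Between Waves 2 and 3",
}


def decode_weight_name(name: str) -> dict:
    """Decode a 10-character weight variable name following Table 3."""
    n = name.strip().lower()
    # Characters 2-3 must be the Data Collection domain, character 10 'd'.
    if len(n) != 10 or n[1:3] != "dc" or n[9] != "d":
        raise ValueError(f"not a weight variable name: {name!r}")
    try:
        return {
            "wave": WAVES[n[0]],
            "adjustment": ADJUSTMENT[n[3]],
            "design": DESIGN[n[4]],
            "target": TARGET[n[5]],
            "scope": SCOPE[n[6:9]],
        }
    except KeyError as exc:
        raise ValueError(f"unrecognised code {exc} in {name!r}") from exc
```

For the constructed name 'adcrcpwtad', the sketch returns a Wave 1, raked, cross-sectional, population weight for Wave 1.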

3. Data quality issues

Data quality is measured by factors such as accuracy, validity, consistency and completeness; it is the responsibility of the data user to assess the data quality of the Ten to Men variables before any analysis is undertaken.

Most variables in the Ten to Men datasets have some proportion of missing data, which has been coded using the Ten to Men standard missing value code frame (see the Ten to Men Data User Guide for more information). The proportion and reasons for missing data should be considered before drawing any conclusions from the data.

This section contains a hierarchical listing of data quality issues that have been identified across the waves of Ten to Men. It includes information around the consistency of some variables across waves and the accuracy of various data. Further sections will be added as any additional data quality issues emerge.

Table 4 provides a summary of the identified data quality issues and the wave/s that are affected; further information can be found in the corresponding sections.

Table 4: Summary of data quality issues by wave
Research Domain Data Quality Issue Wave 1 Wave 2 Wave 3 See Section for further details
- Additional Wave 1 participants     3.11
- Data from Parent Questionnaire   3.3
- Outliers 3.1
- Pilot data for Wave 2     3.12
Various Derived variables   3.15
Behaviours - alcohol Age of first drink of alcohol   3.6
Behaviours - tobacco Age first smoked cigarette   3.7
Behaviours - weight Height, Weight and Body Mass Index 3.4
Behaviours - weight Height     3.17
Data collection indicator Update to weights 3.13
Health status - health status Obstructive Sleep Apnoea     3.14
Health status - health status Short form 12 (SF-12) Health Survey   3.16
Social determinants - life events Other Natural Disasters     3.18
Social determinants - socioeconomic status Age of Respondents 3.2
Social determinants - socioeconomic status Country of Birth     3.8
Social determinants - socioeconomic status Current Occupation 3.9
Social determinants - socioeconomic status Language spoken at home     3.10
Social determinants - socioeconomic status Level of Education completed   3.5

3.1 Outliers

All releases of the Ten to Men datasets contain the raw data, with variables that have not been cleaned for outliers. Data users are advised to take care when using and interpreting the Ten to Men data, as the presence of outliers may necessitate excluding values or categorising the extreme ends.

The exception to this is the categorising of the extreme ends for some variables as part of the confidentialisation process for the General Release datasets. The variables where this top/bottom coding has been applied are indicated in the Ten to Men Data Dictionary.

3.2 Age of Respondents

Cohort inconsistencies

The scope of Ten to Men was males aged 10-55 years (at Wave 1), with three cohorts:

  • males aged 10-14 years completing a Boys questionnaire
  • males aged 15-17 years completing a Young Men questionnaire
  • males aged 18 years and over completing an Adult questionnaire.

However, there were a small number of men invited to participate whose age was outside the scope, or who completed the incorrect questionnaire for their age. The inconsistency arises with less than 0.5% of the population and is likely to have occurred due to the difference in time between sending out the hard copy questionnaires and the respondents completing the questionnaires. The survey data for these respondents have been retained in the Ten to Men datasets.

The inconsistencies are present in both Wave 1 and Wave 2 datasets in all Releases of the datasets.
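
Data users who wish to flag these cases could compare each respondent's age at Wave 1 with the questionnaire completed. The Python sketch below is one way to do so; the function names and the questionnaire labels used as strings are hypothetical, and the age bands are those listed above.

```python
from typing import Optional


def expected_questionnaire(age_at_wave1: int) -> Optional[str]:
    """Return the questionnaire expected for an age, or None if out of scope."""
    if 10 <= age_at_wave1 <= 14:
        return "Boys"
    if 15 <= age_at_wave1 <= 17:
        return "Young Men"
    if 18 <= age_at_wave1 <= 55:
        return "Adult"
    return None  # outside the 10-55 year scope at Wave 1


def is_cohort_consistent(age_at_wave1: int, completed: str) -> bool:
    """True when the completed questionnaire matches the age band."""
    return expected_questionnaire(age_at_wave1) == completed
```

For example, a 15 year old who completed the Boys questionnaire would be flagged as inconsistent, as would any respondent aged under 10 or over 55.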

Calculation of age in Wave 3

In Wave 3, the age of the respondent was not asked in the questionnaire and was calculated for inclusion in the dataset. As part of the respondent validation process for Wave 3, the date of birth was asked. Therefore, there are two sources of the date of birth - the master contact file and Wave 3 survey data. A process was undertaken to compare the date of birth from the two sources, and it was the same for 97% of respondents.

Further investigation of the 3% where the date of birth differed showed that many only supplied the birth year for Wave 3 data. An assumption has been made that the birth date on the contact file is correct and this has been used to calculate the age of the respondent in Wave 3 (the Wave 3 survey date was also used in the calculations).
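
The exact calculation used for the Wave 3 age is not described here. As an illustration only, the Python sketch below computes age in completed years from a date of birth and a survey date, which is one common approach; the function name `age_in_completed_years` is hypothetical.

```python
from datetime import date


def age_in_completed_years(date_of_birth: date, survey_date: date) -> int:
    """Age in whole years on the survey date (assumed method)."""
    if survey_date < date_of_birth:
        raise ValueError("survey date precedes date of birth")
    years = survey_date.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in the survey year.
    if (survey_date.month, survey_date.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years
```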

There are five observations where no date of birth has been supplied (in either Wave 3 or the master contact file). In these cases, the age at Wave 1 and Wave 2, as well as the survey completion dates have been used to impute an age for Wave 3. The five unique study identifiers (zdcid0001d) where this occurred are 5003136, 7006305, 7007404, 8010082 and 9015997.

3.3 Parent questionnaire data

For Wave 1 and Wave 2 of Ten to Men, the parents of the males aged 10-14 years also completed a questionnaire. The parent was not assigned an ID and therefore it cannot be determined if the same parent filled in the questionnaire for both Wave 1 and Wave 2. This is important as some questions were subject to the parent's perception. For example, 'In the past 4 weeks, how often does your child feel happy?'

As a result, data users are advised to take extreme care if comparing responses from the Parent questionnaire across Wave 1 and Wave 2.

3.4 Anthropometric measurements

The Ten to Men questionnaires contain questions about anthropometric measurements. Some of the responses are implausible (e.g. a height of 1 cm).

All releases of the Ten to Men datasets contain the raw data, which has not been cleaned for outliers. The exception to this is the categorising of the extreme ends for some variables as part of the confidentialisation process for the General Release datasets.

Data users are advised to clean the anthropometric measurements and to make their own decisions about how to handle them, as they may contain erroneous values that will affect derived values and interpretations.
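
As an illustration of this kind of cleaning, the Python sketch below calculates Body Mass Index only when height and weight fall within plausibility bounds. The bounds shown are placeholder assumptions, not Ten to Men recommendations, and the function name `bmi_if_plausible` is hypothetical; users should choose limits suited to the cohort being analysed.

```python
from typing import Optional, Tuple


def bmi_if_plausible(
    height_cm: float,
    weight_kg: float,
    height_range: Tuple[float, float] = (100.0, 230.0),  # assumed bounds
    weight_range: Tuple[float, float] = (20.0, 300.0),   # assumed bounds
) -> Optional[float]:
    """Return BMI (kg/m^2), or None when an input is outside its bounds."""
    if not (height_range[0] <= height_cm <= height_range[1]):
        return None
    if not (weight_range[0] <= weight_kg <= weight_range[1]):
        return None
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)
```

A recorded height of 1 cm is rejected under these bounds, so no BMI is derived from it.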

3.5 Level of Education completed

In all waves and questionnaires of Ten to Men, there were question/s about the completed level of education. However, each questionnaire had different response categories for Wave 1 and Wave 2. Extreme care needs to be taken when using this education data, especially if comparing values across questionnaires.

Note that if creating groups, the Australian Standard Classification of Education (ASCED) could be used. In this case, Primary education should also include Year 7 for South Australia only. More information on the ASCED and how it is structured can be found on the ABS website.

3.6 Age when first drank alcohol

A data issue with the following question has been identified:

  • How old were you when you first drank more than just a sip or a taste of alcohol?

The question was included on three questionnaires (Boys, Young Men and Adults), and therefore a common variable was created to hold the responses for each wave. For example, the variable 'abaalcagem' contains the responses from all questionnaires for Wave 1.

The data issue arose as a format was applied to the responses to this question on the Boys questionnaire. No format, other than the missing value formats, was applied to the responses to this question on the Young Men and Adults questionnaires. When the data from the Boys questionnaire was merged with the data from the Young Men and Adults questionnaires, no format other than the missing value formats was applied.

As a result, the data from the Boys questionnaire for this question was incorrectly reduced by four years.

This data issue is present in Releases 1.0 and 2.0 of the Ten to Men datasets, but the raw data has been amended in Release 2.1.

Further details

The format applied to the responses to this question on the Boys questionnaire is shown in Table 5. The corresponding question in the Young Men and Adults questionnaires only had the missing value formats applied (codes -8 to -1). For example, if the respondent replied 10 years of age, the data entered was either 6 (Boys) or 10 (Young Men or Adults).

Table 5: Format applied to the Boys cohort
Code Format
-8 No questionnaire or interview completed
-7 Unable to determine value
-6 Value implausible
-5 Invalid multiple response
-4 Refused or not answered
-3 Don't know
-2 Not applicable
-1 Not asked
1 5 years old
2 6 years old
3 7 years old
4 8 years old
5 9 years old
6 10 years old
7 11 years old
8 12 years old
9 13 years old
10 14 years old

When the data from the Boys questionnaire was merged with the data from the Young Men and Adults questionnaires, no format other than the missing value formats was applied. The format for the Boys questionnaire was not applied and the formatted age value was replaced with the code. As a result, the age of the first drink of alcohol for the Boys data was reduced by four years (with the maximum age possible being 10).
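
For users still working with Releases 1.0 or 2.0, the offset can be reversed by mapping the Boys codes in Table 5 back to ages. The Python sketch below is an illustration under that assumption; the function name `boys_code_to_age` is hypothetical, and it must only be applied to records from the Boys questionnaire (the data in Release 2.1 have already been amended).

```python
BOYS_AGE_OFFSET = 4  # code 1 = 5 years old, ..., code 10 = 14 years old


def boys_code_to_age(code: int) -> int:
    """Convert a Boys questionnaire code (Table 5) to an age in years.

    Missing value codes (-8 to -1) are returned unchanged.
    """
    if -8 <= code <= -1:
        return code
    if 1 <= code <= 10:
        return code + BOYS_AGE_OFFSET
    raise ValueError(f"code outside the Boys format: {code}")
```

For example, a stored value of 6 corresponds to a reported age of 10 years.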

The data (excluding the missing values) from Release 2.0 of the Ten to Men datasets is shown in Table 6. Responses from both the Boys and Young Men questionnaires are shown for comparison. Each cell in the table is colour coded:

  • black, representing implausible values given the age of the respondent at the time of the survey (e.g. a 10 year old cannot respond that they started drinking at 12 years)
  • grey, representing recorded responses
  • green, representing no recorded responses.

The issue with the data from the Boys questionnaire is clear from the number and distribution of years where there was no recorded response (green cells). This is especially evident when compared to the data from the Young Men questionnaire.

Table 6: Data released in Waves 1 and 2 (Release 2.0)

3.7 Age when first smoked cigarettes

A data issue with the following question has been identified:

  • How old were you when you smoked your first cigarette?

The question was included on three questionnaires (Boys, Young Men and Adults), and therefore a common variable was created to hold the responses for each wave. For example, the variable 'abtcigagem' contains the responses from the Boys, Young Men and Adult questionnaires for Wave 1.

The data issue arose as a format was applied to the responses to this question on the Boys questionnaire. No format, other than the missing value formats, was applied to the responses to this question on the Young Men and Adults questionnaires. When the data from the Boys questionnaire were merged with the data from the Young Men and Adults questionnaires, no format other than the missing value formats was applied.

As a result, the data from the Boys questionnaire for this question were incorrectly reduced by four years.

This data issue is present in Releases 1.0 and 2.0 of the Ten to Men datasets, but the raw data has been amended in Release 2.1.

As it is the same data issue as described above, see section 3.6 for further details.

3.8 Country of birth

In Wave 1 of Ten to Men, each questionnaire contained three questions about participant’s country of birth and their parents’ country of birth. There were various options for the response, including ‘Other’, where the respondent could specify any other country using the free text field.

The data were recorded in the three variables:

  • Participant's country of birth (asecobownm)
  • Mother's country of birth (asemocob1m)
  • Father's country of birth (asefacob1m).

These data were then re-coded using the Standard Australian Classification of Countries (SACC) and an additional nine variables at the 1-digit, 2-digit and 4-digit levels were created. These variables contain more detail than the categories provided on the questionnaire, as the 'Other' category has been expanded to include countries specified in the free text field. They are:

  • Participant's country of birth (asecobow1md, asecobow2md, asecobow4md)
  • Mother's country of birth (asemocob1md, asemocob2md, asemocob4md)
  • Father's country of birth (asefacob1md, asefacob2md, asefacob4md).

Although this classification is a three-level hierarchical structure, this has not been strictly applied to the data. Small values at the 2-digit and 4-digit levels have been confidentialised by replacing them with 99 or 9999 instead of using the supplementary codes (not further defined (nfd)). Therefore, care should be taken when using the variables at the 2-digit and 4-digit levels, as they will give higher 'Other' results than expected. Further details are shown in Table 7.

For data users, it is recommended that the variables at the 2-digit and 4-digit levels are used in conjunction with the 1-digit level variable. The confidentialised variables at the 2-digit and 4-digit levels can then be replaced with the corresponding nfd code.

Table 7: List of Country of Birth Codes
Country of Birth (1-digit code) | Country of Birth (2-digit code) | Suggested Replacement Country of Birth (2-digit code) | Wave 1 Frequency
1 | 99 | 10 Oceania and Antarctica nfd | 46
2 | 99 | 20 North-West Europe nfd | 17
3 | 99 | 30 Southern and Eastern Europe nfd | 26
4 | 99 | 40 North Africa and Middle East nfd | 45
5 | 99 | 50 South-East Asia nfd | 0
6 | 99 | 60 North-East Asia nfd | 29
7 | 99 | 70 Southern and Central Asia nfd | 28
8 | 99 | 80 Americas nfd | 10
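
The replacement suggested in Table 7 follows a simple pattern: the nfd code at the 2-digit level is the 1-digit code multiplied by 10. A minimal Python sketch of that recoding is shown below; the function name `replace_confidentialised` and its `confidentialised` parameter are hypothetical.

```python
from typing import Tuple


def replace_confidentialised(
    one_digit: int,
    two_digit: int,
    confidentialised: Tuple[int, ...] = (99,),
) -> int:
    """Return the 2-digit code, swapping a confidentialised value for the
    nfd code implied by the 1-digit level (1 -> 10, ..., 8 -> 80)."""
    if two_digit in confidentialised and 1 <= one_digit <= 8:
        return one_digit * 10
    # Leave genuine 2-digit codes and missing 1-digit values untouched.
    return two_digit
```

The same pattern appears in the other classification-based variables in this paper; where a 2-digit value has been coded as -7 for the same reason, passing `confidentialised=(99, -7)` treats both values alike.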

3.9 Current Occupation

In Wave 1 and Wave 2 of Ten to Men, the Adult questionnaire contained a question about the participant’s current occupation. It was a free text field, requesting both the job title and the main duties/tasks.

This data was then coded using the Australian and New Zealand Standard Classification of Occupations (ANZSCO). Three variables for the participant’s current occupation were created for each wave. These are at the 1-digit, 2-digit and 4-digit levels:

  • 1-digit level (aseempoc1ad, bseempoc1ad)
  • 2-digit level (aseempoc2ad, bseempoc2ad)
  • 4-digit level (aseempoc4ad, bseempoc4ad)

Although the classification is a three-level hierarchical structure, this has not been strictly applied to the data. Small values at the 2-digit and 4-digit levels have been confidentialised by replacing with 99 or 9999 instead of using the supplementary codes (not further defined (nfd)). Some values at the 2-digit level have been coded as -7 (Unable to determine value) because the 4-digit level has been confidentialised to 9999.

Therefore, care should be taken when using the variables at the 2-digit and 4-digit levels, as they will give higher ‘Other’ results than expected. Further details are shown below in Table 8.

For data users, it is recommended that the variables at the 2-digit and 4-digit levels are used in conjunction with the 1-digit level variable. The confidentialised variables at the 2-digit and 4-digit levels can then be replaced with the corresponding nfd code.

Table 8: List of Participant’s Current Occupation Codes
Current Occupation (1-digit code) | Current Occupation (2-digit code) | Suggested Replacement Current Occupation (2-digit code) | Wave 1 Frequency | Wave 2 Frequency
1 | -7 | 10 Managers nfd | 166 | 154
2 | -7 | 20 Professionals nfd | 65 | 55
3 | -7 | 30 Technicians and Trades Workers nfd | 70 | 97
5 | -7 | 50 Clerical and Administrative Workers nfd | 14 | 12
5 | 99 | 50 Clerical and Administrative Workers nfd | 41 | 32
6 | -7 | 60 Sales Workers nfd | 44 | 38
6 | 99 | 60 Sales Workers nfd | 49 | 38
7 | -7 | 70 Machinery Operators and Drivers nfd | 75 | 43
8 | -7 | 80 Labourers nfd | 41 | 40
8 | 99 | 80 Labourers nfd | 0 | 42

The Parent’s questionnaire asked the same question about the parent’s current occupation. The variables for this are:

  • 1-digit level (aseempoc1pd, bseempoc1pd)
  • 2-digit level (aseempoc2pd, bseempoc2pd)
  • 4-digit level (aseempoc4pd, bseempoc4pd)

This data has the same issue and recommendations as the participant’s current occupation.

Table 9: List of Parent’s Current Occupation Codes
Current Occupation (1-digit code) | Current Occupation (2-digit code) | Suggested Replacement Current Occupation (2-digit code) | Wave 1 Frequency | Wave 2 Frequency
1 | -7 | 10 Managers nfd | 10 | 10
1 | 99 | 10 Managers nfd | 103 | 62
2 | -7 | 20 Professionals nfd | 5 | 2
2 | 99 | 20 Professionals nfd | 65 | 66
3 | -7 | 30 Technicians and Trades Workers nfd | 1 | 0
3 | 99 | 30 Technicians and Trades Workers nfd | 54 | 0
4 | 99 | 40 Community and Personal Service Workers nfd | 46 | 62
5 | -7 | 50 Clerical and Administrative Workers nfd | 2 | 4
5 | 99 | 50 Clerical and Administrative Workers nfd | 119 | 68
8 | -7 | 80 Labourers nfd | 3 | 0
8 | 99 | 80 Labourers nfd | 58 | 0
9 | -7 | 99 Other | 9 | 3

3.10 Language spoken at home

In Wave 1 of Ten to Men, each questionnaire contained a question about the language spoken at home. However, the response categories varied across the questionnaires.

Adult questionnaire

The Adult questionnaire had seven options for the response to the question about language. One option was 'Other', where the respondent could specify any other language using the free text field. These options are shown in Table 10.

Table 10: List of Language Codes for Adult cohort
Code Language
1201 English
2201 Greek
2401 Italian
4202 Arabic
6302 Vietnamese
7104 Mandarin
9999 Other

This data was then re-coded using the Australian Standard Classification of Languages (ASCL) and three variables at the 1-digit, 2-digit and 4-digit levels were created (aselangh1ad, aselangh2ad, aselangh4ad). These variables contain more detail than the categories on the questionnaire, as the 'Other' category has been expanded to include languages specified in the free text field.

Although detailed information on the language can be obtained, the small values at these levels have resulted in the variables being confidentialised (some values have been replaced by 99 or 9999). Care should be taken when using the variables at the 2-digit and 4-digit levels, as they will give higher 'Other' results than expected. Further details are shown in Table 11.

We recommend that the variables at the 2-digit and 4-digit levels be used in conjunction with the 'aselangh1ad' variable. The confidentialised variables at the 2-digit and 4-digit levels can then be replaced with the corresponding nfd code.

Table 11: List of Language Codes for Adult cohort
Language (1-digit level) aselangh1ad | Language (2-digit level) aselangh2ad | Suggested Replacement Language (2-digit level) | Wave 1 Frequency
1 | 99 | 10 Northern European Languages, nfd | 30
2 | 99 | 20 Southern European Languages, nfd | 72
3 | 99 | 30 Eastern European Languages, nfd | 56
4 | 99 | 40 Southwest and Central Asian Languages, nfd | 57
5 | 99 | 50 Southern Asian Languages, nfd | 2
6 | 99 | 60 Southeast Asian Languages, nfd | 30
7 | 99 | 70 Eastern Asian Languages, nfd | 20

Boys and Young Men questionnaires

The Boys and Young Men questionnaires only had three options for the response to this question about language, as shown in Table 12 and recorded as the variable 'aselangh1u'.

Table 12: List of Language Codes for Boys and Young Men cohorts
Code Language
1 English
2 Another language
3 English and another language about equally

The respondent could specify the other language using the free text field and this was re-coded using the ASCL. Three variables at the 1-digit, 2-digit and 4-digit levels were created (aselangh1ud, aselangh2ud, aselangh4ud). However, the small values at these levels have resulted in the variables being totally confidentialised (all values have been replaced by 9, 99 or 9999).

Therefore, no information about the other languages spoken at home is available in the Ten to Men datasets for the Boys and Young Men.

3.11 Additional Wave 1 participants

During Wave 2 of Ten to Men, 33 additional participants were identified for Wave 1. They were not included in the original Wave 1 dataset (Release 1.0) as their eligibility and consent status had not been determined at that stage, but this issue was resolved during Wave 2.

In Release 1.0, the sample size for Wave 1 was 15,988. This comprised the three cohorts:

  • 1,087 males aged 10-14 years completing a Boys questionnaire
  • 1,017 males aged 15-17 years completing a Young Men questionnaire
  • 13,884 males aged 18 years and over completing an Adult questionnaire.

In Releases 2.0 and 2.1, the 33 additional participants have been subsequently included in Wave 1, taking the reconciled sample size for Wave 1 to 16,021. The reconciled cohort sizes are:

  • 1,099 males aged 10-14 years completing a Boys questionnaire
  • 1,026 males aged 15-17 years completing a Young Men questionnaire
  • 13,896 males aged 18 years and over completing an Adult questionnaire.
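
The reconciliation can be checked with simple arithmetic. The Python sketch below uses only the cohort counts quoted above; the variable names are illustrative.

```python
# Cohort counts quoted in this section.
release_1_0 = {"Boys": 1087, "Young Men": 1017, "Adult": 13884}
reconciled = {"Boys": 1099, "Young Men": 1026, "Adult": 13896}

# Participants added to each cohort during reconciliation.
added = {cohort: reconciled[cohort] - release_1_0[cohort] for cohort in release_1_0}

total_release_1_0 = sum(release_1_0.values())
total_reconciled = sum(reconciled.values())
total_added = sum(added.values())
```

The totals agree with the text: 15,988 in Release 1.0, 16,021 after reconciliation, and 33 additional participants (12 Boys, 9 Young Men and 12 Adults).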

3.12 Pilot data for Wave 2

Of the reconciled Wave 1 sample, there were 314 respondents who were interviewed in the Ten to Men pilot for Wave 2. These respondents did not complete a questionnaire during the course of the main data collection period for Wave 2.

In Releases 1.0 and 2.0, the pilot data have been included in Wave 2 datasets. The sample size was 12,250 males.

In Release 2.1, the data for these 314 respondents have been removed from the Wave 2 dataset. This has reduced the sample size for Wave 2 to 11,936 males. From this Release onwards, these 314 respondents will remain part of the pilot and not be included in the main sample.

3.13 Update to weights

Wave 2 weights were not available in Release 2.0 of the Ten to Men datasets. Wave 2 weights were calculated and were included for the first time in Release 2.1.

Upon review of the Ten to Men data, it was decided to update the Wave 1 weights. This was necessary to ensure that the weights for Wave 2 were developed using the same approach and references as those used in the calculation of the Wave 1 weights.

Release 2.1 of the Ten to Men datasets contains the updated weights for Wave 1, and the new sample weights for Wave 2.

In Release 3.0, population and sample weights have been included for all waves.

3.14 Obstructive sleep apnoea

For Wave 2 of Ten to Men, there were four questions asked in the Adult questionnaire relating to obstructive sleep apnoea as part of the STOP-Bang questionnaire screening tool. Further information about this screening tool can be found on the STOP-Bang website.

Four objective measures are also required as part of the STOP-Bang questionnaire screening tool: BMI, age, neck circumference and gender. The responses to these eight elements are scored, with the result indicating low, medium or high risk of obstructive sleep apnoea.

The resulting score was recorded in the Ten to Men Wave 2 dataset as the derived variable:

  • Risk of OSA (STOP-Bang) (bhsosarisad).

In Release 2.0, this variable had values of 0 or 1, rather than the 0–8 scale or a Low/Medium/High format. In Release 2.1, the intention was to recalculate the derived variable. However, only seven of the eight elements of the STOP-Bang screening tool were available, as information about neck circumference was not available. As a result, this derived variable (bhsosarisad) has been removed from the datasets in Release 2.1.
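
To illustrate why the score could not be recalculated, the sketch below treats the score as a simple count of eight yes/no elements and returns no result when any element is unknown. This is an assumption made for illustration only: the function name, the use of None for an unknown element, and the simple count are hypothetical, and this is not the official STOP-Bang scoring procedure or the derivation used in the Ten to Men datasets.

```python
from typing import Optional, Sequence


def stop_bang_score(items: Sequence[Optional[bool]]) -> Optional[int]:
    """Count the positive elements among eight yes/no items.

    Returns None when any element is unknown (None), because a score
    on the 0-8 scale cannot be determined from fewer than eight elements.
    """
    if len(items) != 8:
        raise ValueError("expected eight elements")
    if any(item is None for item in items):
        return None
    return sum(1 for item in items if item)


# Hypothetical respondent: seven elements known, neck circumference unknown.
seven_known = [True, False, True, False, False, True, None, True]
print(stop_bang_score(seven_known))
```

With one element unknown, the function returns None rather than a partial count, which mirrors the decision to remove the variable instead of publishing a score based on seven elements.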

3.15 Derived variables

The Ten to Men dataset contains numerous derived variables, including scale and summary scores, which are calculated to enrich the data for analysis. The calculation of these derived variables requires input from multiple raw variables, and one or more of these input values may be missing. Missing values are given negative numeric values according to the Ten to Men standard missing value code frame. More information about this code frame can be found in the Ten to Men Data User Guide.

In Releases 1.0 and 2.0, any negative data values were replaced with zero in the calculation of the derived variables. This could lead to misinterpretation of the data, depending on the derivation of each variable. For example, the mean of individual components may be underestimated when zero is assigned to a missing value. A couple of scores were also calculated incorrectly. For example, the elements of the General Wellbeing Scale were not reverse-scored before the mean was calculated.
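
The effect of zero substitution on a mean can be seen in a small worked example. The item scores below are hypothetical and are not taken from the Ten to Men data; -4 stands in for one of the negative missing value codes.

```python
# Hypothetical item scores for one respondent; -4 is a missing value code.
items = [4, 5, -4, 3]

# Behaviour described for Releases 1.0 and 2.0: negative codes become zero.
zero_filled = [value if value >= 0 else 0 for value in items]
mean_zero_filled = sum(zero_filled) / len(zero_filled)

# For comparison: the mean of the valid (non-negative) values only.
valid = [value for value in items if value >= 0]
mean_valid = sum(valid) / len(valid)

print(mean_zero_filled, mean_valid)
```

Here the zero-filled mean is 3.0, while the mean of the three valid items is 4.0, so the zero substitution lowers the score by a full point.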

Therefore, data users working with Release 1.0 or 2.0 are advised to re-check and review the interpretation of the derived variables, as their values may be underestimated or overestimated.

In Release 2.1, derived variables were re-calculated, and a set of guidelines was developed for the treatment of missing input variables. These are:

  • If all of the input values were missing with the same code, the derived variable was assigned that missing value code as per the code frame. For example, if all input variables were -4, the derived variable was assigned -4.
  • If the input variables had any combination of missing values and some valid data values, the derived variable was assigned the missing value code of -7 (Unable to determine value).
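
The two guidelines above can be sketched as a small function. This is an illustrative sketch rather than the code used to produce Release 2.1: the function name is hypothetical, a simple mean stands in for whatever derivation a given variable uses, and the case where all inputs are missing with different codes is not covered by the guidelines, so assigning -7 in that case is an assumption.

```python
UNABLE_TO_DETERMINE = -7  # "Unable to determine value" in the code frame


def derive_mean(inputs):
    """Return the mean of the inputs, or a missing value code.

    Negative values are treated as missing value codes.
    """
    if not inputs:
        raise ValueError("at least one input value is required")
    missing = [value for value in inputs if value < 0]
    if not missing:
        # All inputs are valid: calculate the derived value.
        return sum(inputs) / len(inputs)
    if len(missing) == len(inputs) and len(set(missing)) == 1:
        # All inputs are missing with the same code: carry that code over.
        return missing[0]
    # A mix of missing and valid values gives -7. Assumption: inputs that are
    # all missing but with different codes are also given -7.
    return UNABLE_TO_DETERMINE


print(derive_mean([-4, -4, -4]))
print(derive_mean([3, -4, 5]))
print(derive_mean([2, 4]))
```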

3.16 Short Form 12 (SF-12) Health Survey

The Wave 1 and Wave 2 adult questionnaires included the SF-12 Health Survey: a licensed scale measuring respondents' health status. An SF-12 scale score was derived and included in the dataset for Releases 1.0 and 2.0. However, due to issues relating to SF-12 license approvals, the raw data items and derived scale score have been removed from Release 2.1. These items have been redacted in the annotated questionnaires and have been deleted from the data dictionary.

3.17 Height

In Wave 3, height was asked only if the respondent was under 23 years of age. Therefore, 88% of respondents were not asked this question and the height variable was coded as -2 (Not applicable).

The decision was made to impute the height at Wave 3 for all respondents where the question was not asked.

There are two sources of height: Wave 1 and Wave 2. Data from both waves were used, as some respondents in Wave 3 may not have participated in Wave 2.

An assumption has been made that the largest height value is the most accurate, and this has been used to populate the height variable in Wave 3 for those respondents aged 23 years or above.
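
The imputation rule described above can be sketched as follows. The function and parameter names are hypothetical, and two details are assumptions not stated in this paper: negative values and None are treated as missing, and None is returned when neither wave has a valid height.

```python
def impute_wave3_height(wave1_height, wave2_height):
    """Return the largest valid height from Wave 1 and Wave 2.

    Assumption: negative values (missing value codes) and None are
    treated as missing. Returns None if no valid height is available.
    """
    valid = [
        height
        for height in (wave1_height, wave2_height)
        if height is not None and height > 0
    ]
    return max(valid) if valid else None


print(impute_wave3_height(178.0, 180.5))  # both waves available
print(impute_wave3_height(176.0, None))   # did not participate in Wave 2
```

Taking the maximum means that a respondent who took part in only one of the two earlier waves still receives an imputed value.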

3.18 Other Natural Disasters

In Wave 3, the following two questions asked whether the respondent, or a close friend or family member, had been affected by a natural disaster:

  • Have you been affected by any of the following natural disasters in the past year?
  • Has a close friend or family member been affected by any of the following natural disasters in the past year?

One of the options was Other, where a free text field then allowed more details about the type of natural disaster. Analysis of the free text field indicated that many respondents had specified coronavirus, covid, pandemic or something similar.

This is correct, as the Federal Government considers the COVID-19 pandemic a natural disaster. In Wave 3, a separate module contained questions related to the COVID-19 pandemic.

However, this was an unexpected response to these questions. Although the COVID-19 pandemic has affected everyone, only some respondents reported it here.

The decision was made to include an additional two variables in the Wave 3 dataset. These two variables (cslndothc, cslndfmoc) reflect the categorised responses from the free text field, as shown in Table 13.

Data users are advised to make their own decisions about whether to include or exclude the COVID-19 pandemic as a natural disaster.

Table 13: Responses for the Other Natural Disasters
Question Value Description Wave 3 Frequency
Have you been affected by any of the following natural disasters in the past year? 0 Coronavirus 280
  1 Other 29
  -2 Not applicable 7,610
Has a close friend or family member been affected by any of the following natural disasters in the past year? 0 Coronavirus 166
  1 Other 10
  -2 Not applicable 7,743
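
As a sketch of how a data user might apply their own decision about COVID-19, the example below uses the category values and the Wave 3 frequencies for the first question in Table 13. The function name and the include_covid parameter are hypothetical and are not part of the dataset.

```python
CORONAVIRUS, OTHER, NOT_APPLICABLE = 0, 1, -2


def counts_as_other_disaster(category: int, include_covid: bool) -> bool:
    """Decide whether a categorised free text response counts as a disaster."""
    if category == OTHER:
        return True
    if category == CORONAVIRUS:
        return include_covid
    return False  # Not applicable: no free text response was given.


# Wave 3 frequencies for the first question, taken from Table 13.
frequencies = {CORONAVIRUS: 280, OTHER: 29, NOT_APPLICABLE: 7610}


def count_other_disasters(include_covid: bool) -> int:
    return sum(
        n
        for category, n in frequencies.items()
        if counts_as_other_disaster(category, include_covid)
    )


print(count_other_disasters(include_covid=True))
print(count_other_disasters(include_covid=False))
```

Including COVID-19 gives 309 respondents with an Other natural disaster for this question, while excluding it gives 29, so the choice has a large effect on any estimate based on the Other option.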
Appendix A

Date Release Dataset Suggested citation and DOI
July 2016 Release 1.0 Wave 1 Pirkis, J., English, D., & Currier, D. (2016). The Australian Longitudinal Study on Male Health (Ten to Men), 2013. [computer file]. Canberra: Australian Data Archive, The Australian National University.
  • DOI: 10.4225/87/587ebdbc851b1
August 2017 Release 2.0 Respondent, Wave 1, Wave 2 Pirkis, J., English, D., & Currier, D. (2017). The Australian Longitudinal Study on Male Health (Ten to Men), 2013. [computer file]. Canberra: Australian Data Archive, The Australian National University.
  • Respondent DOI: 10.4225/87/N8C9NP
  • Wave 1 DOI: 10.4225/87/Z4PEZN
  • Wave 2 DOI: 10.4225/87/2KHTSV
September 2019 Release 2.1 Wave 1, Wave 2 Bandara, D; Howell, L; Daraganova, G, 2019, "Ten to Men: The Australian Longitudinal Study on Male Health, Release 2.1 (Waves 1-2)", doi:10.26193/V2IVIG, ADA Dataverse.
September 2021 Release 3.0 Wave 1, Wave 2, Wave 3 Bandara, D; Howell, L; Silbert, M; Daraganova, G, 2021, "Ten to Men: The Australian Longitudinal Study on Male Health, Release 3 (Waves 1-3)", doi:10.26193/JDE1TD, ADA Dataverse.
Appendix B

The tables below list all variables in the original dataset that have been renamed.

a. Details of variables in the respondent dataset that have been renamed
Label Old Variable Name New Variable Name (Release 2.1, Wave 1) New Variable Name (Release 2.1, Wave 2)
SA1 code confidentialised (2011 Census based) zdcsa1codmd aldsa1c11md bldsa1c11md
SA1 code confidentialised (2016 Census based) n/a n/a bldsa1c16md
SA2 code confidentialised (2011 Census based) zdcsa2codmd aldsa2c11md bldsa2c11md
SA2 code confidentialised (2016 Census based) n/a n/a bldsa2c16md
SA Modified Monash Model Classification zdcmmmcsam adcmmmcsam bdcmmmcsam
ASGS Region (2011 Census Based) zdcremotem aldremt11m bldremt11m
ASGS Region (2016 Census Based) n/a n/a bldremt16m
State (2011 Census Based) zshstate0id aldstat11id bldstat11id
State (2016 Census Based) n/a aldstat16id bldstat16id
Number of Household Participants zdchmparted adchmparted n/a
Sampling Weights (2011 Census Based) zdcwgt001md adcwgts11md n/a
Sampling Weights (2016 Census Based) n/a n/a n/a
SEIFA Index of Relative Socio-Economic Disadvantage - Rank (2011 Census Based) zdcirsdr0i aldirdr11i bldirdr11i
SEIFA Index of Relative Socio-Economic Disadvantage - Rank (2016 Census Based) n/a n/a bldirdr16i
SEIFA Index of Relative Socio-Economic Disadvantage - Percent (2011 Census Based) zdcirsdp0i aldirdp11i bldirdp11i
SEIFA Index of Relative Socio-Economic Disadvantage - Percent (2016 Census Based) n/a n/a bldirdp16i
SEIFA Index of Relative Socio-Economic Disadvantage - Decile (2011 Census Based) zdcirsdd0i aldirdd11i bldirdd11i
SEIFA Index of Relative Socio-Economic Disadvantage - Decile (2016 Census Based) n/a n/a bldirdd16i
SEIFA Index of Relative Socio-Economic Advantage and Disadvantage - Rank (2011 Census Based) zdcirsadri aldiadr11i bldiadr11i
SEIFA Index of Relative Socio-Economic Advantage and Disadvantage - Rank (2016 Census Based) n/a n/a bldiadr16i
SEIFA Index of Relative Socio-Economic Advantage and Disadvantage - Percent (2011 Census Based) zdcirsadpi aldiadp11i bldiadp11i
SEIFA Index of Relative Socio-Economic Advantage and Disadvantage - Percent (2016 Census Based) n/a n/a bldiadp16i
SEIFA Index of Relative Socio-Economic Advantage and Disadvantage - Decile (2011 Census Based) zdcirsaddi aldiadd11i bldiadd11i
SEIFA Index of Relative Socio-Economic Advantage and Disadvantage - Decile (2016 Census Based) n/a n/a bldiadd16i
SEIFA Index of Economic Resources - Rank (2011 Census Based) zdcierr00i aldierr11i bldierr11i
SEIFA Index of Economic Resources - Rank (2016 Census Based) n/a n/a bldierr16i
SEIFA Index of Economic Resources - Percent (2011 Census Based) zdcierp00i aldierp11i bldierp11i
SEIFA Index of Economic Resources - Percent (2016 Census Based) n/a n/a bldierp16i
SEIFA Index of Economic Resources - Decile (2011 Census Based) zdcierr00i aldierd11i bldierd11i
SEIFA Index of Economic Resources - Decile (2016 Census Based) n/a n/a bldierd16i
SEIFA Index of Education and Occupation - Rank (2011 Census Based) zdcieor00i aldieor11i bldieor11i
SEIFA Index of Education and Occupation - Rank (2016 Census Based) n/a n/a bldieor16i
SEIFA Index of Education and Occupation - Percent (2011 Census Based) zdcieop00i aldieop11i bldieop11i
SEIFA Index of Education and Occupation - Percent (2016 Census Based) n/a n/a bldieop16i
SEIFA Index of Education and Occupation - Decile (2011 Census Based) zdcieod00i aldieod11i bldieod11i
SEIFA Index of Education and Occupation - Decile (2016 Census Based) n/a n/a bldieod16i
Sex in the past 12 months bhsex120a n/a bbxsex120a
b. Details of variables that have been renamed between releases
Label Variable Name (Release 2.1) Variable Name (Release 3.0)
Initial cross-sectional population weight for Wave 1 adcicswgtmd adcicpwtad
Raked cross-sectional population weight for Wave 1 adcrcswgtmd adcrcpwtad
Initial cross-sectional population weight for Wave 2 bdcicswgtmd bdcicpwtbd
Raked cross-sectional population weight for Wave 2 bdcrcswgtmd bdcrcpwtbd
Initial longitudinal population weight between Wave 1 and Wave 2 bdcilgwgtmd bdcilpwabd
Raked longitudinal population weight between Wave 1 and Wave 2 bdcrlgwgtmd bdcrlpwabd
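
For users updating analysis code written against Release 2.1, the weight renames in table b can be expressed as a simple lookup. This is a minimal sketch: the variable names are taken from the table above, while the function name and the example column list (including participant_id) are hypothetical.

```python
# Weight variable names: Release 2.1 name -> Release 3.0 name (from table b).
RELEASE_2_1_TO_3_0 = {
    "adcicswgtmd": "adcicpwtad",
    "adcrcswgtmd": "adcrcpwtad",
    "bdcicswgtmd": "bdcicpwtbd",
    "bdcrcswgtmd": "bdcrcpwtbd",
    "bdcilgwgtmd": "bdcilpwabd",
    "bdcrlgwgtmd": "bdcrlpwabd",
}


def to_release_3_names(columns):
    """Map Release 2.1 weight names to Release 3.0; leave other names as-is."""
    return [RELEASE_2_1_TO_3_0.get(name, name) for name in columns]


# Hypothetical column list from an analysis script written for Release 2.1.
old_columns = ["participant_id", "adcrcswgtmd", "bdcrlgwgtmd"]
print(to_release_3_names(old_columns))
```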
Glossary

Glossary of terms
Term Description
ABS Australian Bureau of Statistics
ANZSCO Australian and New Zealand Standard Classification of Occupations
ASCED Australian Standard Classification of Education
ASCL Australian Standard Classification of Languages
AIFS Australian Institute of Family Studies
ASGS Australian Statistical Geography Standard
BMI Body Mass Index
DC Data Collection
DOI Digital Object Identifier
General Release This dataset includes data from which the more sensitive information has been removed. Confidentialisation has also been considered for all variables and applied if required.
LD Linked Data
NFD Not further defined
Respondent dataset A dataset containing key indicator data, such as the unique study identifier, age, household identifier and geographical information
Restricted Release This dataset includes information at a more detailed level than the General Release datasets. Items include language, occupation, and country of birth at the 4-digit levels.
SA1 Statistical Area 1
SA2 Statistical Area 2
SACC Standard Australian Classification of Countries
SEIFA Socio-Economic Indexes for Areas
SRC Social Research Centre
TTM Ten to Men Study
UoM University of Melbourne
Update An update occurs when significant changes are made to an existing release. For example, the update to Release 2.0 resulted in it being reissued as Release 2.1.
Wave dataset A dataset containing the responses to the corresponding questionnaire of a given wave

Acknowledgements

Ten to Men: The Australian Longitudinal Study on Male Health was commissioned by the Commonwealth Department of Health. The study was initially conducted by the University of Melbourne, which released datasets, including data documentation, for Wave 1 and Wave 2. Roy Morgan Research undertook the data collection and initial data processing for these two waves.

After a competitive tender process in 2017, the Australian Institute of Family Studies (AIFS) was awarded the tender to conduct Wave 3. Since then, the Wave 1 and Wave 2 datasets, including data documentation, have been updated by AIFS.

In 2020, the study team re-evaluated and revised the survey content and methodology to enable contactless interviewing for Wave 3. New items designed to collect information on the impacts of COVID-19 and the recent effects of natural disasters were also incorporated into the revised survey. The online survey went live at the end of July 2020, with data collection concluding in February 2021. The Social Research Centre (SRC), in collaboration with Ipsos, was contracted to undertake the fieldwork component for Wave 3 of the study.

Publication details

Data Issues Paper
Published by the Australian Institute of Family Studies, March 2022
Suggested citation:

Howell, L., Silbert, M., & Bandara, D. (2022). Ten to Men: The Australian Longitudinal Study on Male Health – Data Issues Paper, Version 2.1, March 2022. Melbourne: Australian Institute of Family Studies.
