Rethinking the Components of Regulation of Cognition through the Structural Validity of the Meta-Text Test

The field of metacognition research points to limitations in how the construct has traditionally been measured and to a near absence of performance-based tests. The Meta-Text is a performance-based test recently created to assess components of cognition regulation: planning, monitoring, and judgment. This study presents the first evidence on the structural validity of the Meta-Text by analyzing its dimensionality and reliability in a sample of 655 Honduran university students. Different models were tested via item-level confirmatory factor analysis. The results indicated that the specific factors of planning and monitoring do not hold empirically. The bifactor model containing the general cognition regulation factor and the judgment-specific factor was evaluated as the best model (CFI = .992; NFI = .963; TLI = .991; RMSEA = .021). The reliability of the factors in this model proved acceptable (Ω = .701 and .699). The judgment items loaded well only on the judgment factor, suggesting that the judgment construct may actually be a component of the metacognitive knowledge dimension with little role in cognition regulation. The results provide initial evidence on the structural validity of the Meta-Text and yield information previously unidentified by the field, with conceptual implications for theorizing about metacognitive components.

The literature points out that metacognition comprises two major components or domains: metacognitive knowledge and cognition regulation (Craig et al., 2020). Both domains interact with each other. The former refers to what people know about their own functioning, which implies knowledge about how they could engage more efficiently with a specific task. In turn, cognition regulation refers to the ability to control and monitor the cognitive strategies used in task performance (Azevedo, 2020; Muijs & Bokhove, 2020; Norman et al., 2019).
There is an almost complete predominance of measuring metacognitive components through self-report instruments and think-aloud protocols (Craig et al., 2020; Gascoine et al., 2017; Ohtani & Hisasaka, 2018). However, the literature points out important limitations of these measures, especially regarding the measurement of the cognition regulation domain. Cognition regulation involves "online" or "in-the-moment" processes as the cognitive task is performed; self-report instruments, however, make it difficult to assess the domain accurately, since they are usually applied offline, i.e., before or after task performance (Akturk & Sahin, 2011; Craig et al., 2020). Moreover, there is considerable evidence of the biases involved in self-report instruments, such as social desirability, acquiescence, and respondents' possible lack of knowledge about their own cognitive processes (Abernethy, 2015; Wetzel et al., 2016). In turn, think-aloud protocols allow an online and more accurate gauging of the cognition regulation processes involved in the execution of cognitive tasks (Hu & Gao, 2017). However, this type of measurement requires judges to evaluate the protocols, bringing significant risks of confirmatory bias into the measurement process (Wolcott & Lobczowski, 2021). Furthermore, the use of think-aloud protocols requires an intensive individual assessment process, resulting in costly studies with small samples (e.g., Van der Stel & Veenman, 2008; Veenman & Van Cleef, 2018).
Another way to address the problem of measuring metacognition is the construction and validation of performance-based tests. This type of test allows the construct to be assessed at the moment the task is performed, with scores obtained directly from the respondent's performance (Gomes, Araujo & Castillo-Diaz, 2021). Furthermore, performance-based tests do not require judges to produce the scores. This substantially reduces the confirmatory bias present in think-aloud protocols and allows for less costly studies that can be done on large samples. Ohtani and Hisasaka's (2018) meta-analysis shows that performance-based measures are far superior to self-report measures. The correlations of self-report metacognitive measures with academic performance (r = .18; 95% confidence interval = .13-.22) are much lower than the correlations of this outcome with performance-based measures from think-aloud tasks (r = .41; 95% confidence interval = .31-.52). However, the think-aloud method is not supported by tests and is therefore often applied only to small samples, relying heavily on judges for the production of scores and the validation of constructs. Therefore, the development of performance-based tests would allow the real predictive power of metacognitive components in relation to educational outcomes to be investigated in much larger samples (e.g., Castillo-Diaz & Gomes, 2022). The use of better measures, without significant noise or bias, has important practical applications, since better evidence arising from these measures makes it possible to design more pertinent diagnoses and educational intervention strategies (Donker et al., 2014; Jansen et al., 2019).
The dominance of self-report instruments and the absence of performance-based testing are not exclusive to the field of metacognition. The field of learning approaches suffers from the same handicap. Nevertheless, an initial effort in the development of performance-based tests can be observed in that field, opening new possibilities for measurement and evidence building (e.g., Gomes et al., 2020; Gomes & Nascimento, 2021). In the case of metacognition, to the best of our knowledge, few tests aim to measure the construct through respondent performance (Desoete et al., 2001; Golino & Gomes, 2011; Neuenhaus et al., 2011). When analyzing which of these tests present validity studies based on their factor structure, we found that only the Metacognitive Monitoring Test (MMT, also called the Reading Monitoring Test or Read Monitoring Test) presents such evidence (Castillo-Diaz & Gomes, 2022; Golino & Gomes, 2011). Moreover, the MMT is the only performance-based test developed to be applied to higher education students. However, the MMT does show some limitations: (1) it measures a single metacognitive component; (2) its reliability is borderline (alpha and omega values between .60 and .70); and (3), above all, generating the item scores is labor-intensive, since the respondents' justifications must be read and evaluated to produce the scores.
The Meta-Performance battery is a performance-based assessment instrument recently created to assess three specific metacognitive abilities from the cognition regulation domain: planning, monitoring, and judgment. Planning is conceived in the battery as a very specific metacognitive ability, involving only the individual's ability to properly identify specific sequences of steps that allow the resolution of a task (Oliveira & Nascimento, 2014). Monitoring is also defined as a very specific metacognitive ability, namely the individual's ability to detect errors at the moment they are performing a given task (e.g., Golino & Gomes, 2011; Pires & Gomes, 2018). Judgment is defined by the test authors within the concurrent judgment paradigm: it is the person's evaluation of their own performance after solving each test item (Schraw, 2009). The metacognitive abilities involved in the battery are relevant predictors in education, insofar as meta-analyses indicate that, compared to metacognitive knowledge, cognition regulation is the domain of greatest importance in predicting students' academic performance (Dent & Koenka, 2016; Ohtani & Hisasaka, 2018).
The battery is composed of two tests. The first assesses metacognition in reading comprehension tasks (Meta-Text), while the second makes this assessment in tasks of solving arithmetic expressions (Meta-Number). Previous research indicates favorable evidence for the content validity of the battery. However, the tests of the battery have not yet undergone analysis of their structural validity.

Tested Models
In this paper, five models of the dimensionality of the Meta-Text test are tested (Figure 1). Model A (unidimensional) establishes that only the cognition regulation domain explains the variance of all test items, rejecting the presence of any of the specific metacognitive abilities of planning, judgment, and monitoring. Model B (uncorrelated factors) establishes the presence of the three specific metacognitive abilities and assumes that they are independent; it rules out the presence of the cognition regulation domain, even though metacognitive theory assumes these abilities are related precisely because they are part of that domain. Model C (correlated factors) assumes that the three specific metacognitive abilities correlate. However, this model does not explicitly establish the presence of the cognition regulation domain as an explanation for the correlations between the metacognitive abilities. Model D (hierarchical), also called second-order, defines that the cognition regulation domain explains the correlations between the specific metacognitive abilities. Finally, Model E (bifactor) assumes that both the cognition regulation domain and the specific metacognitive abilities directly explain people's performance on the items.

One of the advantages of performance-based tests is that they allow for optimal testing of the empirical plausibility of constructs. Self-report tests also allow such testing, but performance-based items allow constructs to be tested empirically through performance, which makes the evidence more robust than evidence supported by respondents' self-report via reading the wording of items that represent certain behaviors. The very construction of self-report items permits the respondent to perceive an association between certain items and respond accordingly.
Performance-based items, on the other hand, do not depend on this perception of the respondent, so this type of bias is not relevant for them. The think-aloud method, although based on performance, relies on judges' scores, so its construct testing is quite fragile, despite having some validity. This type of analysis has a strong potential for bias, because judges, despite all methodological precautions to avoid positive bias, may generate scores that produce associations among task components based on the judges' prior expectations. In this paper, besides testing the structural validity of the Meta-Text test, we are also testing the validity of the metacognitive abilities of planning, monitoring, judgment, and regulation of cognition. The different models analyzed in this study allow us to refute or corroborate these components of metacognition. As highlighted, the almost exclusive dominance of self-report tests and think-aloud procedures has produced weak evidence on the empirical plausibility of these constructs; our study generates evidence based on a more solid methodology that almost no studies in the field have employed.

Sample
The participants of this study were selected through convenience sampling and comprised 655 higher education students from the largest public university in Honduras, Central America. The sample is characterized by a predominance of females (N = 441; 67.3%), young adults (M = 20.14 years; SD = 3.06), and students enrolled at the central campus (N = 489; 74.7%). The sample has 274 (41.8%) students from the economic sciences, 159 (24.3%) from the social sciences, humanities, and arts, 110 (16.8%) from the exact sciences and engineering, and 112 (17.1%) from the biological and health sciences.

Instruments
Meta-Text Test. This is a performance-based test that is part of the Meta-Performance Battery, recently created with the goal of assessing specific abilities in the domain of cognition regulation. Details of the battery and items are found in Castillo-Diaz and Gomes (2021). The Meta-Performance battery is a production of the Laboratory for Research on Cognitive Architecture (Laboratório de Investigação da Arquitetura Cognitiva [LAICO]), whose mission is to connect psychometrics with educational psychology and to build a broad set of tests on predictors of educational outcomes. The Metacognitive Monitoring Test (MMT), cited in this article, was also developed by LAICO and shows good evidence of internal and external validity from several studies (e.g., Golino & Gomes, 2011). The MMT is a well-established metacognitive test in Brazil that has influenced the creation of the Meta-Performance Battery (Castillo-Diaz & Gomes, 2021) and a methodology for measuring metacognition in school and academic assessments (Gomes, 2021; Pires & Gomes, 2017, 2018). The Meta-Text is structured as a set of 18 questions that were carefully designed to cover a thematic diversity of texts while not requiring relevant prior knowledge from the respondent about these topics. Each question is composed of three elements: (a) a statement describing a hypothetical author's goal for writing a text; (b) five possible sentences for composing the text, given the presented goal; and (c) a text written by the hypothetical author using some of the available sentences. The respondent's task is to answer three specific commands. Command A measures planning ability and therefore asks the respondent to create a plan regarding which sentences would be appropriate for the author to adequately achieve their objective. Command B measures judgment ability and therefore asks respondents to evaluate their own planning, indicating whether they think they answered Command A correctly or incorrectly. Command C measures monitoring ability and asks the respondent to analyze the text written by the hypothetical author, identifying whether there are errors in it, either through the presence of sentences that do not contribute to the author's objective or the omission of sentences that should have been written.
Each test command represents one item, making up 18 items for each of the abilities measured: planning (Command A), judgment (Command B), and monitoring (Command C), for a total of 54 items. Each item of Commands A and C is scored one point (1) if the answer is correct and zero points (0) if it is wrong. In the case of judgment (Command B), if respondents think they got Command A of a given question right, their judgment score is one (1); otherwise, it is zero (0). This judgment score does not indicate the accuracy of the respondents' judgment; we created an accuracy calculation that is presented in the data analysis section. The test is designed to be completed in 60 minutes, which is the maximum duration.

Procedures
Undergraduate students with active enrollment in different areas of knowledge at the National Autonomous University of Honduras (UNAH) were contacted by e-mail and invited to participate in the research. The invitation was sent to the e-mail addresses registered in the university's enrollment system database. Data collection was conducted online via Google Forms during the first quarter of 2021. Students were invited to participate on a voluntary basis, receiving prior information about the purpose, the procedures, and their right to withdraw from the research at any time. Their participation was conditional on the acceptance of a Free and Informed Consent Form (FICF), and only data from students who agreed to the FICF were considered. Students filled out a form that included sociodemographic information and the Meta-Text Test. According to the analyses from a pilot application of the test, students were expected to take between 40 and 60 minutes to complete it. However, there was no time restriction for taking the test on the online platform. Approval for this study was obtained from the Institutional Review Board of the University's Dean of Student Affairs.

Data Analysis
The first stage of the analyses involved computing descriptive statistics on the difficulty of the Meta-Text Test items. In the case of the judgment items, accuracy was calculated as follows: if the person got the planning item right and judged that they got it right, or got the planning item wrong and judged that they got it wrong, then they got the judgment item right in terms of accuracy, because they judged their performance correctly. In turn, if the person missed the planning item but judged that they got it right, or got the planning item right but judged that they got it wrong, then they got the judgment item wrong in terms of accuracy. When a person gets the item right, their accuracy score is 1; otherwise, their accuracy score is 0.
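This scoring rule amounts to checking whether the judgment matches the actual planning outcome. A minimal sketch in Python (the study's analyses were run in R; this snippet is only illustrative and the function name is ours):

```python
# Accuracy scoring for a judgment item: the score is 1 when the
# respondent's judgment (Command B) matches their actual result on the
# paired planning item (Command A), and 0 otherwise.
def judgment_accuracy(planning_correct: bool, judged_correct: bool) -> int:
    return 1 if planning_correct == judged_correct else 0

# The four cases described in the text:
print(judgment_accuracy(True, True))    # right and judged right -> 1
print(judgment_accuracy(False, False))  # wrong and judged wrong -> 1
print(judgment_accuracy(False, True))   # wrong but judged right -> 0
print(judgment_accuracy(True, False))   # right but judged wrong -> 0
```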
The second step involved structural validity analysis. Different models, representing different factor structures, were tested using item-level confirmatory factor analysis. All models analyzed included covariances between pairs of planning and monitoring items linked to the same question. Models with the following specifications were analyzed. Model A (unidimensional) states that the latent variable "regulation of cognition" explains the variance of the 54 test items. Model B (uncorrelated factors) states that three latent variables, "planning", "monitoring", and "judgment", explain a set of 18 items each; in this model the latent variables are orthogonal, that is, the correlations between them are fixed at zero. Model C (correlated factors) is a variant of Model B with the same latent variables, but it allows them to correlate. Model D (hierarchical) is composed of the same latent variables as Models B and C, but incorporates a second-level general factor that explains the covariance between the first-level latent variables. Model E (bifactor) has the same variables as Model D, but defines that the general factor of cognition regulation directly explains the 54 test items; in the bifactor model all latent variables are orthogonal, constraining their correlations to zero (Reise, 2012). Figure 1 presents the structure of the tested models. For parsimony, the covariances between pairs of planning and monitoring items linked to the same question are not drawn.
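To make the bifactor specification concrete, the sketch below assembles a lavaan-style model string for Model E. It is only illustrative: the analyses reported here were run in R with lavaan, and the item labels (P1-P18, M1-M18, J1-J18) and factor names are our own placeholders, not the test's actual variable names.

```python
# Build a lavaan-style specification for the bifactor Model E:
# a general cognition regulation factor (CR) loading on all 54 items,
# three specific factors (PL, MO, JU) loading on 18 items each,
# all factors orthogonal, plus residual covariances between planning
# and monitoring items tied to the same question (local dependence).
planning   = [f"P{i}" for i in range(1, 19)]
monitoring = [f"M{i}" for i in range(1, 19)]
judgment   = [f"J{i}" for i in range(1, 19)]

lines = [
    "CR =~ " + " + ".join(planning + monitoring + judgment),
    "PL =~ " + " + ".join(planning),
    "MO =~ " + " + ".join(monitoring),
    "JU =~ " + " + ".join(judgment),
    # orthogonality constraints of the bifactor structure
    "CR ~~ 0*PL", "CR ~~ 0*MO", "CR ~~ 0*JU",
    "PL ~~ 0*MO", "PL ~~ 0*JU", "MO ~~ 0*JU",
]
# local dependence between same-question planning and monitoring items
lines += [f"P{i} ~~ M{i}" for i in range(1, 19)]
model_spec = "\n".join(lines)
print(model_spec.count("=~"))  # 4 latent factors defined
```

The other models follow by dropping factors or constraints (e.g., Model A keeps only the CR line; Model C frees the correlations among PL, MO, and JU and drops CR).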
Since the test item scores are dichotomous (correct/incorrect), the Weighted Least Square Mean and Variance Adjusted (WLSMV) estimator was used. According to the literature on confirmatory factor analysis, WLSMV is the best alternative for modelling dichotomous data, since it is a robust estimator that does not demand normally distributed variables (e.g., Brown, 2015; DiStefano et al., 2019). Model fit was verified using the comparative fit index (CFI), normed fit index (NFI), Tucker-Lewis index (TLI), and root mean square error of approximation (RMSEA). CFI ≥ .90, NFI ≥ .90, TLI ≥ .90, and RMSEA < .10 indicate non-rejection of a model, while CFI ≥ .95, NFI ≥ .95, TLI ≥ .95, and RMSEA < .06 indicate good model fit (Putnick & Bornstein, 2016; Schumacker & Lomax, 2018).
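These cutoffs can be summarized as a simple decision rule; the sketch below is only illustrative and the function name is ours:

```python
# Classify model fit according to the cutoffs adopted in this study:
# CFI, NFI, TLI all >= .95 and RMSEA < .06 -> good fit;
# all >= .90 and RMSEA < .10 -> acceptable (not rejected);
# otherwise the model is rejected.
def classify_fit(cfi: float, nfi: float, tli: float, rmsea: float) -> str:
    if min(cfi, nfi, tli) >= 0.95 and rmsea < 0.06:
        return "good"
    if min(cfi, nfi, tli) >= 0.90 and rmsea < 0.10:
        return "acceptable"
    return "rejected"

# The fit later reported for Model E.2 falls in the "good" range:
print(classify_fit(0.992, 0.963, 0.991, 0.021))  # good
```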
For each non-rejected model, the standardized factor loadings, factor correlations, and score reliability were calculated. McDonald's omega (Flora, 2020) was used to estimate reliability. Although there is no consensus in the literature on minimum acceptable omega values, in this study we adopted the criteria of Reise et al. (2013), which establish a minimum value of .50 and a preferred value of .75. We used only McDonald's omega (Ω) as a reliability indicator, and not Cronbach's alpha, given the widely discussed disadvantages of the latter (e.g., Flora, 2020; McNeish, 2018).
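For reference, omega can be computed from standardized loadings under the common assumption of uncorrelated residuals. A minimal sketch, with made-up loadings rather than the Meta-Text's actual values:

```python
# McDonald's omega from standardized factor loadings, assuming
# uncorrelated residuals:
#   omega = (sum of loadings)^2 /
#           ((sum of loadings)^2 + sum of residual variances),
# where each residual variance is 1 - loading^2 for standardized items.
def mcdonald_omega(loadings):
    total = sum(loadings)
    residual = sum(1 - l ** 2 for l in loadings)
    return total ** 2 / (total ** 2 + residual)

# Illustrative example with 18 hypothetical loadings of .50:
print(round(mcdonald_omega([0.5] * 18), 3))  # 0.857
```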
The bifactor model tests the variance of each specific latent variable in the presence of the general factor, given that all latent variables are orthogonalized. Therefore, if any latent variable showed zero variance, the bifactor model was run again without that latent variable, and so on, until only latent variables with positive variance remained in the model. To define the best-fitting model, not only the highest CFI, NFI, and TLI values and the lowest RMSEA were considered, but also the factor loadings, variances, and acceptable reliability of the model's factors. All analyses were performed in R version 3.6.2, using the packages semTools, version 0.5-4 (Jorgensen et al., 2021) and lavaan, version 0.6-7 (Rosseel et al., 2020).

Table 1 shows the percentage of correct answers for the items. The distribution across ranges is relatively similar for the planning, monitoring, and judgment items. Most items for these abilities fall in the medium (41% to 60% correct) and difficult (21% to 40% correct) categories, but there are items in all ranges except the 0-20% range for the judgment ability.

Item Confirmatory Factor Analysis and Reliability
The fit indices of the tested models are presented in Table 2. Of the models analyzed, only the three-factor uncorrelated Model B showed fit indices below acceptable values (CFI, NFI & TLI < .90; RMSEA > .08) and was therefore rejected. Model D (hierarchical) also showed a good fit to the data (CFI, NFI & TLI > .95; RMSEA < .05). Standardized factor loadings ranged from .303 to .879 (M = .628; SD = .133) for the first-level latent variables. The second-level general factor had loadings of 1.266 (p = .020) on planning, .720 (p < .001) on monitoring, and .219 (p = .022) on judgment. The specific planning factor showed a negative, and therefore effectively zero, variance (s² = -0.173). The reliability indices were the same as in Model C for each specific latent variable.
Model E (bifactor) was the model with the best fit (CFI = .996; NFI = .969; TLI = .996; RMSEA = .014). However, this model presented problems linked to two specific factors: the loadings on the specific planning and monitoring factors were very low, and the variance of these factors was zero in the presence of the general cognition regulation factor. The standardized loadings on the specific latent variables ranged from -.480 to .287 (M = -.037; SD = .246) for planning, from -.265 to .572 (M = .247; SD = .181) for monitoring, and from .579 to .760 (M = .670; SD = .048) for judgment. The loadings on cognition regulation ranged from -.193 to .849 (M = .432; SD = .250). With regard to reliability, the results indicated some problems. The general cognition regulation factor and the judgment factor showed acceptable values (Ω = .703 and .693, respectively). However, reliability was well below the minimum expected for both planning (Ω = .015) and monitoring (Ω = .049).
Following the results of Model E, two variants of this model were analyzed, removing the specific planning and monitoring factors. In the first variant, Model E.1, only planning was removed; that is, a general cognition regulation factor and two specific factors of monitoring and judgment were retained. In the second variant, Model E.2, both the planning and monitoring factors were removed, so the model included only the general factor and the single judgment factor (see Figure 2).

Figure 2. Variants of the Bifactor Model
The fit indices for Models E.1 and E.2 are presented in Table 2. The results indicate good fit for both models (CFI, NFI & TLI > .95; RMSEA < .05). In Model E.1, the standardized factor loadings of monitoring ranged from -.568 to .475 (M = .187; SD = .248), and those of judgment ranged from .579 to .760 (M = .670; SD = .048). For the general factor, cognition regulation, the loadings ranged from -.193 to .849 (M = .432; SD = .250). Factor reliability was Ω = .704 for cognition regulation, Ω = .051 for monitoring, and Ω = .696 for judgment. The results of this model indicate that the specific monitoring factor continued to show factor loadings and reliability far below acceptable values.
Finally, the results of Model E.2 show factor loadings between -.093 and .810 (M = .547; SD = .244) for the general cognition regulation factor and between .579 and .760 (M = .670; SD = .048) for the judgment factor (see Table 3). The judgment items loaded well only on the judgment factor (λ > .30), showing lower loadings on the cognition regulation factor (λ < .30). Reliability proved acceptable, with values of Ω = .701 and Ω = .699 for regulation of cognition and judgment, respectively.

Discussion
The purpose of this study was to assess the structural validity of the Meta-Text by analyzing the dimensionality and reliability of the test. Seven models were tested (five initial models and two additional models). The models were analyzed according to their fit indices, factor loadings, reliability, and inter-factor correlations. According to the analyses performed, the bifactor Model E.2, containing a general cognition regulation factor and a judgment-specific factor, was evaluated as the model with the best characteristics, considering its fit indices (CFI = .992; NFI = .963; TLI = .991; RMSEA = .021) and acceptable factor reliability (Ω = .701 and .699).
Although Model A showed an acceptable fit and Models C, D, and E showed good fits, these models highlighted some important aspects worth discussing. Model A adopts the unidimensionality assumption of metacognition, consistent with Immekus and Imbrie's (2008) postulate. Despite its acceptable fit, the latent variable of the unidimensional model loads well only on the planning and monitoring items (λ > .30), while its loadings on the vast majority of the judgment items are very low (λ < .30 for J2, J3, J5, J6, J7, J11, J12, J14, J17, and J18). This result shows that the latent variable of the unidimensional model does not explain the variance of the judgment items, implying that at least one additional latent variable would be needed in the model. Therefore, the cognition regulation latent variable alone does not explain the variance of the Meta-Text items, indicating that the cognition regulation domain probably involves both this broad ability and some more specific ability.
Models C and D, in turn, presented problems linked to the correlations between the factors and to the loadings of the general factor on the specific factors, respectively. In Model C, the specific planning and monitoring factors showed a very high correlation (r = .912), indicating that the two factors share 83.174% of their variance and strongly suggesting that both latent variables may in fact be a single construct rather than distinct constructs. In Model D, the loadings of the general factor on the planning and monitoring factors were high (λ = 1.266 and .720); a factor loading of 1.266 indicates problems in the model and most likely that the planning factor does not hold. The results of the bifactor Models E and E.1 also reinforce the evidence for refuting the planning factor and indicate the need to refute the monitoring factor as well.
The evidence from Models C, D, E, and E.1 can be analyzed in light of two hypotheses. The first is that the planning and monitoring constructs may simply be regulation of cognition and not differ from this general domain. The meta-analysis by Craig et al. (2020) presents evidence of a considerable link between planning and monitoring, with a correlation of .63 (95% confidence interval = .46-.81). However, the data in this meta-analysis primarily involve self-report measures. It is plausible that when these abilities are measured by performance-based instruments, as in the case of the Meta-Text, the actual relationship between them emerges, indicating that they are not two distinct specific processes (Rose et al., 2015).
The second hypothesis concerns the difficulty of developing tasks and items that separate the planning and monitoring factors. Evaluating specific metacognitive abilities raises important challenges, mainly because of the difficulty of separating them from the performance of the task itself and from other cognitive and metacognitive components involved in it (e.g., Li et al., 2015; Rose et al., 2015). Moreover, in the Meta-Text, the pairs of planning and monitoring items are linked to the same question, so the items have a local dependence that was accounted for in all models analyzed. In this sense, it is possible that these characteristics of the test influenced the results. One way to verify this possibility empirically would be to develop tests in which planning and monitoring are not measured through questions that link the measures of both constructs.
In light of the issues presented by Models A, C, D, E, and E.1, Model E.2 is a bifactor model that eliminates the specific planning and monitoring factors while keeping the judgment factor. In Model E.2, both the general factor and the specific factor showed acceptable reliability. The evidence from Model E.2 has conceptual implications for metacognition theory. Since the judgment items loaded well only on the judgment factor and show a very weak association with the general cognition regulation factor (see Table 3), the judgment construct may actually be a component of the metacognitive knowledge dimension. To date, metacognitive theory has preferentially assumed that judgment is a component of cognition regulation. Our evidence opens up the possibility that this assumption is mistaken.
A plausible explanation for these novel findings is that the Meta-Text is performance-based, yielding new evidence through a methodology more suitable for measuring metacognition. Our results seem promising, as initial evidence from cognitive neuroscience is consistent with them. Although these processes share similar regions of the prefrontal cortex, there are indications that judgment may rely on different functioning mechanisms than the more general metacognitive abilities linked to the cognition regulation domain (e.g., Fleur et al., 2021; Morales et al., 2018). However, the neurocognitive architecture underlying metacognition still needs to be explored further in future research.
The implementation of bifactor models in the measurement of metacognition is relatively recent. Some studies analyzing this type of model have found good fit indices; however, the analyses have been performed exclusively on self-report instruments (Fergus & Bardeen, 2019; Ning, 2019; Zhao et al., 2019). Furthermore, the bifactor frameworks tested have included, as specific factors, metacognitive beliefs (Fergus & Bardeen, 2019), broad domains of metacognition, i.e., knowledge of cognition and regulation of cognition (Ning, 2019), and domain-dependent factors, i.e., reading and mathematics (Zhao et al., 2019). To the best of our knowledge, our study is the first to test bifactor models in the metacognitive field using items based on respondent performance; this type of bifactor model had not yet been tested in the field.

Conclusion
This study presents the first evidence on the structural validity of the Meta-Text. The results support a bifactor structure of the test that includes a general cognition regulation factor and a judgment-specific factor. This evidence offers three important implications for the field of metacognition.
The first implication concerns the use of performance-based tests to measure metacognition. Such tests make it possible to deal with the respondent biases of self-report instruments and the confirmatory biases of think-aloud protocols.
The second implication relates to the theorization of metacognitive domains and abilities. Traditionally, planning, monitoring, and judgment have been linked to the regulation of cognition domain. However, the results provide evidence that planning and monitoring are simply regulation of cognition, rather than distinct specific processes, and that judgment is possibly a component of metacognitive knowledge rather than of regulation of cognition.
The third implication concerns the importance of implementing bifactor models in psychometric studies of metacognition measures. With these models, one can verify whether specific factors remain valid in the presence of a general factor, separating the variance attributable to each factor. To the best of our knowledge, bifactor models of metacognition are only beginning to be addressed in recent research, so a broader understanding and application of these models in the metacognitive field is still needed. The present study analyzes and distinguishes different factor structures, incorporating a bifactor structure that brings promising evidence from performance-based test data.

Recommendations
As a research agenda, it is important to conduct studies to test the structural validity of the Meta-Text in other higher education samples with different sociodemographic and cultural characteristics. Furthermore, in order to test whether the factor structure of the test is replicable in different population groups, it is necessary to perform invariance analyses of the test scores according to gender, course, educational level, nationality, or other variables of sociodemographic and educational interest.
Considering the complexity in measuring metacognition and taking into account that there is still no consensus in the literature about the best instruments to measure it, the use of multi-method research designs is essential. Conducting studies involving different metacognitive measures (e.g., Think Aloud protocols and performance-based tests) represents an important field of study. These types of studies are relevant for obtaining a more comprehensive and accurate picture of students' metacognitive abilities (Gascoine et al., 2017).
Finally, in order to add new evidence on the validity of the Meta-Text and to make it available for use in psychopedagogical diagnosis and intervention processes, it is important to develop future investigations testing the external validity of the test, especially its link with educational outcomes. Furthermore, the other test of the Meta-Performance Battery (the Meta-Number) should also be evaluated in terms of its validity, and both tests could be evaluated together.

Limitations
Despite the contributions of this study to the field of metacognition, some limitations need to be pointed out. The sample was selected non-probabilistically, by convenience, so the results cannot be generalized. Furthermore, the Meta-Text assesses metacognition solely in the domain of reading comprehension, so the findings of this study may be domain dependent. Finally, although Google Forms is one of the most widely used platforms for online data collection across fields of knowledge (Mondal et al., 2019), more robust software specialized in measuring cognitive constructs should be used, allowing the capture of additional information such as the respondent's interactions with the test or response times for each item and for the test as a whole.