6. Limitations and suggestions for additional research
A research synthesis of this scope, with numerous operational and procedural decisions, will have aspects that will be questioned by other researchers and will leave many important questions and issues untouched. We, too, felt constrained in not being able to discuss the results and methodological factors in greater detail. Thus, we limit our discussion of limitations and suggested additional research to four issues.
First, the dominance of the WJ-R or WJ III studies (94 % of analyses were based on a WJ battery) suggests the results and implications are best characterized as a referendum on WJ CHC COG-ACH relations. Generalization of the review results to other CHC-based intelligence batteries (DAS-II; KABC-II; SB-V) or assessment approaches (cross-battery assessments) should be approached with caution. Similarly classified broad and narrow CHC measures from other batteries, particularly the narrow CHC test classifications (which are primarily based on logical expert consensus methods), cannot be assumed to display the same patterns of CHC COG-ACH relations reported here. For example, although empirically classified (as per a CHC-designed WISC-III + WJ III cross-battery CFA study; Phelps, McGrew, Knopik & Ford, 2005) as narrow Gsm measures of working memory (Gsm-MW), the reported MW factor loadings for the WJ III Numbers Reversed (.65), WJ III Auditory Working Memory (.59), and WISC-III Digit Span (.70) tests suggest they are not interchangeable MW measures. More importantly, tests or composites from other intelligence batteries that are classified the same (either empirically or logically) as a WJ III measure may not display the same strength of relation with achievement domains. For example, in the Phelps et al. (2005) dataset, the WJ III Visual Matching (Gs-P) test correlated .42 with WJ III Letter-Word Identification and .40 with WJ III Passage Comprehension, while the similarly Gs-P classified tests (see Flanagan et al., 2006) of WJ III Cross Out (.35; .27) and WISC-III Symbol Search (.32; .27) correlated at lower levels. Empirical support for CHC COG-ACH interpretations beyond the WJ batteries is limited to non-existent.
Second, as previously discussed, our operational criterion for the significance consistency classifications was admittedly post hoc and arbitrary. Given the lack of a prior systematic CHC COG-ACH research synthesis, we erred on the side of leniency as we viewed the current review as exploratory and suggestive in nature. This was our intent: to identify possible significant COG-ACH relations warranting further study and discussion.
Third, a number of interesting CHC COG-ACH relations were classified as tentative/speculative. Additional research with the same or similar measures of these tentatively identified abilities is needed. There have been 20 years of CHC COG-ACH research, but most of it has consisted of analyses of the WJ-R and WJ III battery measures and norm data. Additional research is needed with other measures and in different norm samples.
Fourth, space did not allow for analyses by methodological factors. Most important was the possibility of different conclusions when comparing manifest variable (MV) versus latent variable (LV) research (see Table 1) and, more importantly, what the MV/LV→ACH differential findings suggest for future research and current assessment practice. Inspection of the complete set of on-line summary coding tables reveals an obvious trend for MV studies to report more significant COG-ACH relations than LV studies. For example, across all achievement domains, CHC IVs, and ages, MV analyses were significant approximately 1.5 times more frequently than LV analyses (MV = 40.1 %; LV = 26.2 %). Disentangling the possible MV/LV-by-ACH domain-by-age group interactions requires separate analyses and a separate manuscript. We provide the on-line summary tables in hopes others will explore these methodological nuances and their implications for practice. Although we did not undertake such detailed exploration, we believe that the MV > LV COG-ACH significance finding is most likely due to the absence (MV) or presence (LV) of a general intelligence (g) factor in the research designs. This topic deserves greater deliberation, analysis, discussion and debate than we can offer here.
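The "approximately 1.5 times" figure can be checked directly from the two percentages reported above. The following is a minimal sketch of that arithmetic only; the variable names are illustrative and are not taken from the summary coding tables.

```python
# Percentages of analyses reported as significant, as stated in the text.
mv_pct_significant = 40.1  # manifest variable (MV) analyses
lv_pct_significant = 26.2  # latent variable (LV) analyses

# Relative frequency: how many times more often MV analyses were significant.
ratio = mv_pct_significant / lv_pct_significant

# Absolute gap between the two rates, in percentage points.
difference = mv_pct_significant - lv_pct_significant

print(f"MV/LV ratio: {ratio:.2f}")
print(f"MV - LV difference: {difference:.1f} percentage points")
```

The ratio rounds to roughly 1.5, consistent with the statement in the text. It describes only the aggregate rates and says nothing about how the difference varies by achievement domain or age group.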
The combined limitations make one over-arching conclusion clear. The extant CHC COG-ACH literature of the past 20 years has been restricted to a mosaic of methodological approaches that have been applied primarily to samples that frequently were not independent (i.e., WJ-R and WJ III standardization sample subjects). We believe that the salient COG-ACH relations reported are the most robust: they managed to “bubble to the surface” despite the methodological twists and turns across research studies.