hback - A Haskell N-Back Memory Game

hback-0.0.tar.gz
Update: Released hback-0.0.2
Download Latest: hback-latest.tar.gz

Based on a recent research paper that claims fluid intelligence could be improved by training working memory, I wrote up a dual n-back test memory game in Haskell and gtk2hs. This is an alpha release: all comments are most welcome.

Gameplay

The n-back memory game tests whether the player can remember if the nth previous turn matches the current one. The dual n-back test measures how well the player can remember both audio and visual stimuli simultaneously.

Given a difficulty level n, the player is expected to remember on each turn whether the nth previous sound or graphic (or both) matches the current one, and toggle the appropriate button(s). The graphic flashes for 500ms and the player has another 2.5 seconds to answer.
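As a minimal sketch of the matching rule just described: the function names below are hypothetical and are not taken from hback.hs. Each turn is compared with the turn n steps earlier, separately for the graphic and the sound.

```haskell
-- Hypothetical sketch; none of these names come from hback.hs.

-- | For each turn, report whether the stimulus equals the one shown
-- n turns earlier. The first n turns have nothing to compare against,
-- so they are never matches. A non-positive n is treated as "no matches".
nBackMatches :: Eq a => Int -> [a] -> [Bool]
nBackMatches n xs
  | n <= 0    = map (const False) xs
  | otherwise = replicate (min n (length xs)) False
                  ++ zipWith (==) xs (drop n xs)

-- | Dual n-back: one (visual, audio) verdict per turn.
dualMatches :: (Eq v, Eq a) => Int -> [(v, a)] -> [(Bool, Bool)]
dualMatches n turns =
  zip (nBackMatches n (map fst turns)) (nBackMatches n (map snd turns))

main :: IO ()
main = do
  -- 2-back over a single stream of letters.
  print (nBackMatches 2 "ABABCB")
  -- 1-back over (grid position, letter) pairs.
  print (dualMatches 1 [(1 :: Int, 'k'), (1, 'p'), (3, 'p')])
```

A player's button presses on a given turn would then be compared against the pair of verdicts for that turn.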

hback gameplay 1
hback gameplay 2

The training game consists of b block iterations, where each block will present (20+n) visual and audio stimuli. After each iteration, the difficulty of the n-back test may increase or decrease, depending on the performance of the player: the goal is to constantly keep the player at peak concentration.
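The block structure can be sketched roughly as follows. This is only an illustration: the post does not state the exact cut-offs for raising or lowering the level, so they are left as parameters, and the numbers used in main, the names, and the assumption that level 1 is the floor are all made up for the example.

```haskell
-- Hypothetical sketch; names and thresholds are illustrative, not from hback.hs.

-- | Number of stimuli in one block at level n, as described above.
blockLength :: Int -> Int
blockLength n = 20 + n

-- | Assumed adjustment rule: few mistakes raise the level, many lower it.
-- The cut-offs are parameters because the post does not state them.
data Rule = Rule
  { raiseIfBelow :: Int  -- raise the level when mistakes < this
  , lowerIfAbove :: Int  -- lower the level when mistakes > this
  }

-- | Level for the next block, given the current level and the mistakes made.
nextLevel :: Rule -> Int -> Int -> Int
nextLevel rule n mistakes
  | mistakes < raiseIfBelow rule = n + 1
  | mistakes > lowerIfAbove rule = max 1 (n - 1)  -- assume level 1 is the floor
  | otherwise                    = n

-- | Levels visited over a session, given the mistakes made in each block.
sessionLevels :: Rule -> Int -> [Int] -> [Int]
sessionLevels rule = scanl (nextLevel rule)

main :: IO ()
main = do
  let rule = Rule { raiseIfBelow = 3, lowerIfAbove = 5 }  -- made-up numbers
  print (map blockLength [1, 2, 3])
  print (sessionLevels rule 1 [0, 1, 6, 4])
```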

Installation

Download hback-latest.tar.gz

 % tar -xzf hback-0.0.tar.gz
 % cd hback-0.0

The game can be played without a system-wide installation:

 % make trial
 % ./hback

If you like what you see, feel free to install it:

 % make build
 % make install    # you may need root privileges for this
 % rehash
 % hback

hback takes two optional arguments:

 % ./hback b n
     - b determines the number of block iterations (eg. try 20 for an effective daily workout)
     - n determines the initial level (defaults to 1)

Errata

  • Sounds are currently played through mplayer. If mplayer is not installed or you prefer a different player (all the sound files are .wav), you currently need to edit hback.hs (function playSound :: Audio -> IO ()) and recompile. This will be fixed as soon as I think of a good cross-platform solution; any suggestions?

24 Comments so far

Great job! I was wondering how long it would be before someone implemented n-back the way it was described in that paper.

One question though: maybe I was concentrating too hard and didn’t notice, but I didn’t see a score anywhere. I see a print statement in the code, but didn’t see anything in the UI or stdout. Is that feature implemented yet?

Right, the score was originally updated every turn (a label in the top right). I found it a bit too distracting, so I temporarily removed it. Currently, the score is tracked to change level difficulties… the backend is there for some more sophisticated score keeping and analysis; just waiting for some feedback and a good idea on what to do with the score tracking. :) I’m out of town now, but will take a look at some ideas in the upcoming week.

Thanks for the feedback :)

This is great. Thanks for releasing it. What I’d like is keybindings for clicking audio or graphic. One of the online versions uses “left arrow” for audio and “right arrow” for graphic, which is much easier than having to move the mouse back and forth between buttons (which requires taking your eyes off the target area).

Okay. Keybindings problem solved. After a little research, I was able to add the bindings by adding the following to the hback.glade file, which I guess configures the UI:

61d60
<
71d69
<

This is really nicely done. THANKS!

Oops, forgot to escape XML.

Instructions to add keybindings for “left arrow” for audio, and “right arrow” for graphic:

To the widget with id=”audioBtn”, add the following child element:

<accelerator key=”Left” modifiers=”" signal=”activate”/>

To the widget with id=”visualBtn”, add the following child element:

<accelerator key=”Right” modifiers=”" signal=”activate”/>

Using plain quotes, of course.

Also, a nice-to-have would be a way of pausing the game.

Thanks again…

For those of us who don’t have access to the paper (and since you didn’t comment your code at all ;-)), could you explain any deviations from the protocol in the paper? Are there any? Is it supposed to be exactly as described in every regard (including exactly when to change level, how to quasi-randomly select the next syllable/location, etc.)?

Thanks!

Could you post a windows binary?

Norbert: I have heard good things about the SDL bindings for cross-platform multimedia stuff (perhaps http://hackage.haskell.org/cgi-bin/hackage-scripts/package/SDL-mixer ).

Good work! So many of the previous implementations didn’t follow the protocol. Which is so weird… Presumably they were rushed. My wife and I did a web version that closely follows the protocol. It’s at: http://www.soakyourhead.com/N-Back.aspx We keep practicing and we’re slowly but surely getting better at it ;)

Hi, I am the creator of cognitivefun.net. This is very interesting, thanks for sharing it. To address Erik’s comment, regarding cogfun anyway, I did not have the intention of strictly following the original protocol and I have made that clear, so it is not weird :)

If anyone wants to challenge me on whether a verbatim implementation is necessary to achieve the effects (it is not), feel free to email me. I further do not think it is necessary to replicate the screen setup “closely” if you are not administering the test in a controlled setting.

That said, I find it interesting that the way the stimuli are represented would change the way they are encoded. This itself is worth an investigation.

Hey Cogfun. To be clear, I wasn’t slamming a particular implementation. Just sounding a note of caution. The body of evidence indicates that the dual-n-back in the study works but other similar tasks don’t. For instance, single n-back (and related variants) don’t increase gF at all! Despite being really hard. So just wanted to let people know that the actual implementation can matter.

And on top of that, some of the Flash versions (I don’t remember this being yours…) did things that pretty clearly violated the protocol (scoring a hit on any matches up to n-back, not auto-adjusting, etc.) in a serious way.

Look, I certainly didn’t mean to make you feel defensive. It’s just that some people (again, I don’t think this is you), put together apps that didn’t follow the protocol at all, and they did it quickly to make a buck or whatever. And it’ll be a shame if people work hard at the task but don’t get anywhere (again, remembering that tasks similar to dual n-back don’t work to increase gF).

I think it’s good that there are some number of free (and correct) implementations out there that might (if the science holds up) really help people.

cheers.

Just to address some issues: I believe one should stick to the protocol as closely as possible if there is no good reason to deviate from it. First, duplicating the results of the research is more likely. Second, any deviations from a known protocol should be justified and tested: if we vary one aspect at a time we can indeed find out if it is relevant or not. Deviations just for the sake of change sound rather silly (especially since, as Erik pointed out, even similar tests did not share similar successes).

I am not trying to slam anybody’s implementation. I started this project, because I saw a growing pool of n-back tests, but could not find an open-source one that tried to be consistent with the research.

For example, consider the shuffle algorithm: the protocol hard-codes how many single and double-matching stimuli are available per block. At first I used a pseudo-random generator, but then I switched over to this approach. The reasoning is two-fold. First, you’re always guaranteed to have an equal number of correct and incorrect choices (ie. there is no bias towards matching or non-matching answers). Second, after careful consideration, it does not really encourage task specialization. My original complaint with this algorithm was that a participant who knew how many of each kind of answer there are could try to “beat” the system. In fact, though, this just raises the average n-level of the participant. As studies have shown, as the participant’s average n-level increases, he is most likely finding new and better ways to remember the stimuli, but this is not a negative task specialization. The user may have changed tactics, but the amount of concentration remains the same. And the current research concludes it’s the amount of concentration (which does not decrease with higher average n-levels) that ensures gF increases.
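To make the fixed-count idea concrete, here is a rough sketch that builds the schedule for the scored trials of one block and then shuffles it. Everything in it is an assumption for illustration: the names are not from hback.hs, the counts in main (4 visual-only, 4 audio-only, 2 double matches, 10 non-matches) are simply example values that give equal numbers of matching and non-matching trials, and the tiny generator stands in for a proper random source so that the sketch needs nothing beyond base.

```haskell
import Data.List (sortOn)

-- Hypothetical sketch; names and counts are illustrative, not from hback.hs.

-- | What kind of answer a scored trial calls for.
data Kind = NoMatch | VisualOnly | AudioOnly | Both
  deriving (Eq, Show)

-- | Tiny linear congruential generator, used only to keep the sketch
-- dependency-free. A real implementation would likely use a proper
-- random number library.
lcg :: Int -> Int
lcg s = (s * 1103515245 + 12345) `mod` 2147483648

-- | Shuffle by sorting on pseudo-random keys derived from the seed.
-- The result is always a permutation of the input.
shuffle :: Int -> [a] -> [a]
shuffle seed xs = map snd (sortOn fst (zip keys xs))
  where
    keys = drop 1 (iterate lcg seed)

-- | Schedule for the scored trials of a block: a fixed number of each
-- kind, presented in shuffled order.
blockPlan :: Int -> Int -> Int -> Int -> Int -> [Kind]
blockPlan seed nVisual nAudio nBoth nNone =
  shuffle seed $
    replicate nVisual VisualOnly
      ++ replicate nAudio AudioOnly
      ++ replicate nBoth Both
      ++ replicate nNone NoMatch

main :: IO ()
main = print (blockPlan 42 4 4 2 10)  -- example counts, assumed
```

The order changes with the seed, but the number of each kind of trial never does, which is the property argued for above. Turning the schedule into concrete squares and letters is a separate step that is not shown here.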

I am not aware of any longitudinal studies that show a decrease in gF improvements with really high n-levels. Worst-case scenario, making protocol-consistent dual n-back tests easily available to the public, will show some kind of upper bound on gF improvement with this approach. Then, as a community, we can move on to other ideas. It’s a Win-Win situation. :-)

There are some good points here; thanks for the followup. I didn’t think I’d have to repost, but I feel obliged to clarify some issues, for general interest, because I do think some points here are misleading.

Regarding the protocol, the n-back test has been in use for well over a decade. I know which test Erik refers to, which calls non-n-back presentations n-back. That is simply not an n-back test, which is an irritating case of misinformation.
Beyond that, though, I don’t know of “so many” web implementations of n-back variants. In fact, I couldn’t find an online n-back test before the “picture n-back” on cogfun.

Regarding departures from the protocol, I do think it’s necessary to rebuke the sentiment that certain variations from a given “protocol” imply that an incomparable neural mechanism is involved (not putting words in anyone’s mouth). A reaction time test is a reaction time test, regardless of the appearance of the stimuli. Of course, there are other funky variables in play, but that isn’t the mechanism you’re worrying about — in fact, in the same vein, there are funky variables regarding the kind of audio stimuli you use for your dual n-back implementation; I won’t delve into that. It will, however, influence the validity of a web-wide measurement for an “upper bound of improvement”; in fact, I would wager that variations caused by different auditory stimuli will be greater than varying visual stimuli. I have mentioned this on the blog.

Also, saying that dual n-back training will lead to an increase in Gf is still a bold assertion. Jaeggi et al.’s paper showed that it may be possible, but there are still a lot of unanswered questions. Robert Sternberg of Tufts has written a follow-up on PNAS and points out some of the important ones; it’s worth a read.
When you’re talking about “increased Gf” it’s reflected by an increase in Gf test scores. This does not say much in and of itself; high IQ score, for example, correlates to higher performance in say, higher mathematics ability, but I wouldn’t call this a conclusive relationship. You will notice that on cogfun.net it does not talk about training on the dual n-back test to “improve fluid intelligence.” It merely says, this is a working memory test based on this paper :)

Finally, I also thought about the fixed-trials problem and decided to use the randomized method like Norbert mentions. The reasoning is just like you said; I think it’s a valid concern and went with it. Mathematically the presentations are comparable, and with large N they should converge. The problem is, I suppose, it would be harder to make broad comparisons, which validates your point. I do have a 20+n trial version but I don’t know if it’s critical to switch over. Anyway.

> Regarding departures from the protocol, I do think it’s necessary to rebuke the sentiment that certain variations from a given “protocol” imply that an incomparable neural mechanism is involved (not putting words in anyone’s mouth).

I find this hard to argue; we do not know enough to judge it irrelevant. Of course I’m not arguing that showing blue boxes instead of red ones will destroy the validity of the test. On the other hand, we don’t exactly know why one variation may work where others have failed (why does a single n-back test fail where a dual n-back test apparently succeeds?) My current belief is we may have accidentally found a system that works. If we decide to deviate from it, let’s change one thing at a time to find any positive or negative correlations.

> A reaction time test is a reaction time test, regardless of the appearance of the stimuli.

Maybe so, but we’re not testing reaction times directly (well, cognitivefun.com version is, but PNAS didn’t). As Sternberg points out, it is a test of working memory; in theory that is dependent on concentration and memory organization: and that most certainly could be affected by the choice of appearance of stimuli. In addition, it is possible that telling the users that they will be measured on %correct and avg. response time will significantly change the way they decide to encode information compared with just being told they are tested on %correct and average n-level. I would predict doubling the time allowed per stimulus would not change the average n-back. In fact, it may decrease it, since the problem is not reacting to a split-second decision, but concentrating hard enough to remember many previous items.

> Of course, there are other funky variables in play, but that isn’t the mechanism you’re worrying about — in fact, in the same vein, there are funky variables regarding the kind of audio stimuli you use for your dual n-back implementation; I won’t delve into that.

Please do. I do not know much about the kind of audio stimuli used in the PNAS paper. Any insight in this respect is most welcome. And the implementation is a Work In Progress that I will gladly fix. I’ve looked at the cognitivefun implementation, but am not yet convinced of its validity. I think it would be very informative if COGFUN could elaborate on the choice of auditory stimuli and how the encoding is suspected to affect the training of working memory. I’m afraid I did not do my research in this respect and hence will avoid saying something silly.

> Also, saying that dual n-back training will lead to an increase in Gf is still a bold assertion. Jaeggi et al.’s paper showed that it may be possible, but there are still a lot of unanswered questions. Robert Sternberg of Tufts has written a follow-up on PNAS and points out some of the important ones; it’s worth a read.

Thanks for the pointer, it is an interesting read. But I disagree with your premise: unless a bold assertion is made about the actual practical increases in Gf, we’ll never be in a position to strongly argue for or against it. Weak theories will only brew a pot of what-if possibilities. Let’s take a proper positive or negative stance and see how it all plays out. :-)

Sternberg says it “would be important to show that the results are really about working memory rather than some peculiarity of the particular training task.” On the one hand, this supports controlled modification of the protocol. On the other hand, until these changes can be supported with research, any application that claims to help improve fluid intelligence needs to follow the protocol. We do not know whether changes to the protocol will change the “improve Gf” result and we are not even sure whether a change of protocol will change the “train working memory” result. There are just too many variables and we owe it to the users to ensure that these possibilities are well-acknowledged.

Until we know more, it may be misleading to the users to publish a non-authoritative protocol, since we are in no position to show why the protocol worked. (I, of course, won’t constantly footnote that the sample size, etc. was too small to really know if the protocol worked in the first place. I am plainly aware of these issues, but they will resolve themselves with more studies.)

> When you’re talking about “increased Gf” it’s reflected by an increase in Gf test scores. This does not say much in and of itself;

I agree, but there needs to be some division of labor. Let’s try to produce more conclusive proof that some protocol does indeed “increase Gf test scores”. Separately we can try to show some correlation of “increased Gf test scores” to some more tangible advantage.

Lastly, the 20+n pseudo-random issue was just a more practical issue I wanted to describe as an example of the kind of decisions we need to consciously make. I don’t feel strongly about it one way or the other, but it is important to explain our reasoning.

I think the more controversial features of the CF version are the automatic feedback and the ending of the game as soon as the user has gotten a sufficient number of wrong answers (ie. a block is no longer always 20+n). I see how both features could be considered better and worse for learning - I will need to play with it a bit more to have a real opinion about it.

I also happen to find the complete blanking of the image self-defeating: instead of helping one locate the next box, the constant on/off blinking serves as a distraction as the eyes try to adjust. With every frame, I find myself scanning the entire image since the eyes have no static point of reference. I may be in the minority with this opinion, and otherwise I find cognitivefun a really nice site, so don’t take the criticism harshly!

All disagreements aside, I think we all share a common interest for improving cognitive reasoning and the theories behind it. Looking forward to further discussions. If you prefer, we can continue these discussions via email: “norbert@this.domain.com”

Appreciate the followup; I think the discussion should be made public, at least for a little longer (because it is getting long), for informative purposes. But not too much longer, because these are comments :)

> My current belief is we may have accidentally found a system that works. If we decide to deviate from it, let’s change one thing at a time to find any positive or negative correlations.
I wouldn’t call this accidental. Jaeggi et al. must have hypothesized the setup based on prior research. The question is less about what setup to use, than how to isolate the mental processes involved, such that generalization is possible. While the result was surprising, the methodology was deliberated (well).

> Maybe so, but we’re not testing reaction times directly (well, cognitivefun.com version is, but PNAS didn’t).
The reaction time mention was an example of “funky variables” that you can manipulate but miss the crux of the test. I am not suggesting this test is a reaction time test (although to my understanding, n-back tests usually do involve reaction time measurement). Sorry for the confusion.

> unless a bold assertion is made about the actual practical increases in Gf, we’ll never be in a position to strongly argue for or against it.
The results of the Jaeggi et al. paper validate bold hypotheses, which is good. What I’m disagreeing with is if anybody takes the setup, writes a program, and says “Hey, want to improve your IQ? Play this!” That is a bold statement I would like to avoid. It is just as irritating to see tests that say “who has the biggest brain?” and give you a counting task that’s full of unnecessary motion (you might know what I’m talking about) :)

> I think the more controversial features of the CF version are the automatic feedback and the ending of the game as soon as the user has gotten a sufficient number of wrong answers
> I also happen to find the complete blanking of the image self-defeating: instead of helping one locate the next box, the constant on/off blinking serves as a distraction as the eyes try to adjust.
I think these are very good arguments. Regarding ending the session right after the quota is met, I have written about this already, but consider the case where you have responded to all 10 targets in a fixed-trial version by the 17th trial. You know you are done and your brain will instantly drift away. It makes sense to simply end there. Less strongly said, it makes sense that cutting short at this point will not harm the learning process. Of course, there is a case where you aren’t keeping a conscious tally of the targets. Valid as well, but I ran with the former interpretation, to whatever effect. I did say that I am confident in this judgment, but am fully aware that I may be wrong.
Your other argument is very strong and I have been thinking about it since I visited your page. There is a good case that the stimuli would be encoded differently: that is, a visual vs. a spatial encoding task. While I think the premise is not lost, I think this is a pretty regrettable oversight on my part. Unfortunately, I cannot change the test that is already public; I can only provide a new test. This is easy, but unless it is sufficiently different, users will probably not care at all, so how to introduce a different test that appears very similar to the other one? I have thought of a setup that forces spatial encoding by randomizing the visual appearance, but that would also make cross-test comparison difficult. Hence I am at some sort of a dilemma :(. Your thoughts on this issue are appreciated.

About the auditory stimuli causing variation, it does not pertain to your test in particular, but to all reproductions of the original test. The argument is that auditory stimuli have been chosen to be most distinct yet sound familiar, in order to be encoded/retrieved at about the same speed as each other. Hence the use of alphabets, because they are guaranteed fast access if you speak the language. On the other hand, it is easier to control the novelty of visual stimuli. So I would guess the response speed to auditory stimuli would vary greatly depending on what you pick. The CF version used the same German alphabets as the original paper. I did so because I’d assume they determined that these are sufficiently different and distinguishable. I don’t think the relative retrieval efficiencies will carry over to other languages. Indeed, the reason why there is a “kpst” version on cogfun is because some users found the German consonants confusing. Anyhow, while I don’t think the variation will be significant, compared to variation from different visual stimuli, that of the auditory stimuli would be, as I would guess, larger. That was my argument, but it was a side point.

I agree with you that this is a good chance to test the efficacy of the original setup on a wider scope, but due to the changes I have made, the tests are harder to compare in this regard. That is something of a failure on my part to notice the importance of this when I first opened the test. As of now I am unsure whether it is worth providing a stricter-to-protocol version.

I’m still going through my motions of finishing up my exams; that’s why I’ve kept quiet for the last couple days. Seems like I won’t be putting any more effort into this until the end of next week.

In the meantime, I just wanted to rekindle the spatial vs visual encoding debate a little. I think this is a very interesting, but difficult, issue to resolve. Originally, I actually tried to approach the n-back from a different perspective: how to enforce visual encoding (since I felt that a grid-like structure would encourage spatial encoding). I had a series of graphics of a similar theme that were visually “loud” comics: high-contrasting colors, lots of details, and exaggerated features. If we plot a spatial (x) vs graphical (y) encoding graph, this would be one extreme: (0,1). The other extreme is having a static point of reference and a single object which appears to just rotate around the pivot point: (1,0). Some middle ground could be, eg: (a) a similar grid with reference, except the square can be one of several colors, (b) seemingly different visual objects that could in fact be encoded as a series of spatial points (eg. imagine a series of car pictures we can uniquely identify just by the camera angle), (c) an NxN grid with a static point of reference where N is large enough that it may be easier to encode visually than spatially.

I haven’t done my research on visual and spatial memory encodings, so take everything with a large grain of salt. This is what I’m scheming currently about, though: a series of tests that encourage spatial or visual encoding more. Will there be a significant difference?

Thanks for the explanation of the sound stimuli. I suppose I just found the “kpst” version weird because I couldn’t as easily encode the sounds as I could an alphabet (I have no problem with English or German).

As a side note, I was wondering about what kind of tasks could be assigned to control participants in future studies. Some kind of memory training that doesn’t stress working memory? Any thoughts?

My net connection will be sporadic for the upcoming week, but I look forward to any ideas on the spatial/visual split.

Unfortunately, I don’t have much to add to the spatial/visual debate :( Outside of a bona fide researcher in the field, I’d be cautious. This stuff can be counter-intuitive and surprising. But others may be bolder than me ;) Until further research is done, I’m hanging pretty tightly onto the protocol.

In other news, you’ve inspired us. We’ve made our implementation open source, too. This stuff is just too important to not do it. If you’re interested in how we build the trial list (which might be similar to the shuffle method), you can see it at: http://www.soakyourhead.com/dual-n-back-open-source.aspx . You’ll also notice that we included unit tests. I wrote tests to generate trial lists and then “play” the lists to make sure that they were right. Building that list was the hardest part of the whole thing, in my opinion.

Hi again,

Regarding spatial vs visual, I don’t claim expertise in this, but I think one can make some good educated guesses. Whether you like it or not, with or without the grid structure, there is a spatial element to the block presentation (I suspect Jaeggi et al. intended it to be this way). The question would be how “spatial” it looks. So, relatively and subjectively speaking, the test on cogfun has less of a “spatial” look (but that’s probably also due to the grey background appearing and disappearing with each presentation).

Using your plot method, reproductions of the Jaeggi et al. test I have seen so far would be close to (1,0) (static focus point plus rotation). I have thought about making a spatial-only version and visual-only version, but questioned whether users would find having an extra test useful. In this case, an s-only and v-only would mean two extra tests that focus on similar working memory mechanisms. Addressing your points (a) and (b), I would think (a) already achieves a “spatial-only” effect if the appearance of the visual stimuli is sufficiently randomized. In (b) it seems like you are asking the user to remember a camera angle; I would suspect this to be a different kind of abstraction. (c) is easy: all you have to do is generate random images sufficiently different from one another.

Regarding a difference in spatial vs visual performance, again I don’t claim expertise but I would guess that “spatial stimuli” are easier to remember than visual ones. This is because the spatial stimulus is still presented visually, which accesses more pathways (visual+others) during the encoding/retrieval step. You could present the spatial stimuli by sounds coming from different directions, and at that point there will be spatial-auditory coactivation. It will be interesting to find out exactly how different this is, but I am guessing it is less relevant to the question of how effective the dual n-back task is as a WM training paradigm.

As for memory training that doesn’t stress working memory, what are you thinking about specifically? To my understanding you are always dealing with LTM or non-LTM (STM/buffer, visual scratchpad, WM, whichever you choose) or the process in between. One training task I am very interested in myself is how to get more pathways involved in encoding stimuli; this would essentially speed up LTM transfer. The obvious thing to do is mnemonics training, but the determining factor here is more of self-motivated hard work than clever strategy :)

@Erik:
Glad to hear it! I think lots of researchers in this field may find your work useful since a modifiable webapp may be more appropriate for their work than a modifiable desktop app that they have to manually install on participant computers.

@Cogfun:
Apropos alternative memory training: let me rephrase. I am interested in any suggested ways to give the control participants something to do that at least vaguely seems related to “training” but won’t actually help them with their final score (assuming the current theories on why this works are in fact true). Jaeggi et al. used a control group that simply did not have to show up to the training sessions. I think a more appropriate test would be to give the control group something to work on that’s not known to increase concentration / working memory; although I’m still looking for suggestions. Ideas would be greatly appreciated.

If you’re looking for control tasks that involve some form of attention, but not so much memory, you can check out the “box crossing” task. You simply have a bunch of boxes and need to cross them out as fast as you can. I believe there are other similar path-finding tasks that follow the same dynamic. Don’t know if that’s what you’re looking for though.

Speaking of path finding though, haha, there are lots of games out there that basically do just that. Those “collect all the crystals” kinds of games :)

I’m sorry to butt in like this, but I wanted to reply to an earlier point made:

> I agree, but there needs to be some division of labor. Let’s try to produce more conclusive proof that some protocol does indeed “increase Gf test scores”. Separately we can try to show some correlation of “increased Gf test scores” to some more tangible advantage.

I contend that the two statements taken separately are weaker than the original statement (that the protocol/training produces “more tangible advantages”).
Suppose you show that improved performance at task T increases Gf scores, and suppose you show that increased Gf scores correlate with other advantages.
Why would this imply that improved performance at task T correlates with other advantages? (I realize that it /sounds/ obvious at some level, but I don’t see any logical reason for it to be true.)
If you wanted to show that the stronger statement holds, you would have to show that the only effect of task T is to increase your Gf score, whatever that is, and that any increase of your Gf score has exactly one ‘kind’ of effect. And that it is exactly this effect that leads to the “more tangible advantages”.
As a more intuitive example, suppose we apply this reasoning to IQ instead of Gf scores (yes, I know that they aren’t the same — I’m only concerned with the logical inference).
Suppose you can show that a certain type of training increases IQ scores without any observable benefit in different tasks (i.e., the training benefits are very specialized). Clearly, this does not mean that /any/ task that increases your IQ will not be beneficial outside the IQ test. Similarly, it shouldn’t be hard to imagine that some tasks may increase both your IQ and more generally useful skills, while other tasks may only increase your IQ by training very specialized (IQ testing-related) skills.
I’m no neuroscientist, and I admit I do not know very much about how well-established Gf is as a proxy for ‘general’ cognitive capacity, but it seems exceedingly unlikely to me that every useful (high-level, not overly specialized) cognitive capacity correlates with Gf, and that (conversely) every change in Gf correlates with a change in cognitive capacity.
All I’m saying is that if the task increases both Gf and, say, working memory, neither of the following two statements follows automatically (i.e., as a tautology):
1) any increase in Gf increases working memory
2) any increase in working memory increases Gf
So if you show that some task X increases Gf, and, independently, that an increased Gf correlates with a general skill/capacity, that still does not show that task X increases that general skill/capacity.
It is possible that the link applies in some cases, but I would be surprised if cognitive training had effects so localized and predictable that we could generalize them to any task that increases Gf :-)
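The point about chained correlations can be illustrated with a small numeric sketch. The scores below are made up for four hypothetical players (they are not data from the paper or from hback), and `pearson`, `taskT`, `gf` and `other` are names introduced only for this example. The example assumes Gf is simply the sum of two independent factors, which is a deliberate oversimplification.

```haskell
-- Arithmetic mean of a non-empty list.
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- Pearson correlation coefficient of two equal-length samples.
pearson :: [Double] -> [Double] -> Double
pearson xs ys = cov / sqrt (varX * varY)
  where
    mx   = mean xs
    my   = mean ys
    dxs  = map (subtract mx) xs
    dys  = map (subtract my) ys
    cov  = sum (zipWith (*) dxs dys)
    varX = sum (map (\d -> d * d) dxs)
    varY = sum (map (\d -> d * d) dys)

-- Made-up scores for four hypothetical players (assumption, not real data).
taskT, other, gf :: [Double]
taskT = [1, -1, 1, -1]          -- performance at task T
other = [1, 1, -1, -1]          -- some other tangible advantage
gf    = zipWith (+) taskT other -- toy Gf score: sum of the two factors
```

With these numbers, `pearson taskT gf` and `pearson gf other` are both about 0.707, while `pearson taskT other` is 0: task T correlates with Gf, Gf correlates with the other advantage, yet task T and the advantage are uncorrelated.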

> I’m sorry to butt in like this, but I wanted to reply to an earlier point made

By all means. While I am as much a skeptic as the next guy, I just wanted to respond to Mork’s comment from a logical-reasoning point of view. I’ve simply made all the premises explicit. This way, if my reasoning is wrong (as it may very well be), it will be very easy for someone to point out where my fallacies lie.

a = “Improved performance at task T”
b = “Increased Gf score”
c = “X tangible advantage”

> Suppose you show that improved performance at task T increases Gf scores, and suppose you show that increased Gf scores correlate with other advantages.

a -> b
b -> c
Therefore, a -> c

> If you wanted to show that the stronger statement holds, you would have to show that
> the only effect of task T is to increase your Gf score, whatever that is,

d = “Unknown/not considered side-effect”

Two scenarios:

a -> (b and d)
b -> c
d -> not c OR c
Therefore, a -> c and we can correctly test both (a -> b) and (b -> c) separately.

a -> (b and d)
b -> not c
d -> not c OR c
Therefore, a may or may not imply c, but we don’t know because we only tested (b -> c).
Overall: Best case scenario, we show the first case to be true. Worst case scenario, we have simply shown that one of our premises is false.
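For anyone who wants to check these inferences mechanically, here is a minimal truth-table sketch. It rests on two assumptions: every `->` is read as material implication (a simplification, since the thread is really about correlations), and `d -> not c OR c` is read as `d -> (not c or c)`, meaning d tells us nothing about c. All names (`World`, `entails`, `chain`, `scenario1`, `scenario2`) are made up for this illustration.

```haskell
-- One truth assignment for the propositions (a, b, c, d).
type World = (Bool, Bool, Bool, Bool)

-- Material implication.
implies :: Bool -> Bool -> Bool
implies p q = not p || q

-- All 16 truth assignments.
worlds :: [World]
worlds = [ (a, b, c, d) | a <- bs, b <- bs, c <- bs, d <- bs ]
  where bs = [False, True]

-- Premises entail a conclusion when the conclusion holds in every
-- world that satisfies all of the premises.
entails :: [World -> Bool] -> (World -> Bool) -> Bool
entails premises conclusion =
  and [ conclusion w | w <- worlds, all ($ w) premises ]

-- Basic chain: a -> b, b -> c.
chain :: [World -> Bool]
chain =
  [ \(a, b, _, _) -> a `implies` b
  , \(_, b, c, _) -> b `implies` c
  ]

-- Scenario 1: a -> (b and d), b -> c, d -> (not c or c).
scenario1 :: [World -> Bool]
scenario1 =
  [ \(a, b, _, d) -> a `implies` (b && d)
  , \(_, b, c, _) -> b `implies` c
  , \(_, _, c, d) -> d `implies` (not c || c)
  ]

-- Scenario 2: same as scenario 1, but with b -> not c.
scenario2 :: [World -> Bool]
scenario2 =
  [ \(a, b, _, d) -> a `implies` (b && d)
  , \(_, b, c, _) -> b `implies` not c
  , \(_, _, c, d) -> d `implies` (not c || c)
  ]

-- Candidate conclusions.
aImpliesC, aImpliesNotC :: World -> Bool
aImpliesC    (a, _, c, _) = a `implies` c
aImpliesNotC (a, _, c, _) = a `implies` not c
```

In GHCi, `entails chain aImpliesC` and `entails scenario1 aImpliesC` both evaluate to `True`, while `entails scenario2 aImpliesC` is `False`. Under this strict reading, the second scenario actually entails the opposite: `entails scenario2 aImpliesNotC` is `True`.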

I think we sometimes forget that, in logic, a false premise does not invalidate a conclusion.

> and that any increase of your Gf score has exactly one ‘kind’ of effect.

We can never be sure it’s the only effect. Just as in the above scenarios, I’m only interested in whether it will be one of the effects (even if our premise is false).

> As a more intuitive example, suppose we apply this reasoning to IQ instead of Gf scores
> Suppose you can show that a certain type of training increases IQ scores without
> any observable benefit in different tasks

t = “Some task T1”
r = “Some other task T2”
p = “Improve IQ score”

t -> p
r -> p
t -> c
r -> not c
Therefore, (t -> p -> c) and (r -> p -> not c) cannot both be true. If one of these tasks implies c, it’s specifically because there is some other additional premise we did not consider (see above discussion with d).

If anything, testing the two original premises separately raises the chance of showing the correlation to be false, precisely because the hypothesis is most likely wrong. But a true correlation would in fact show the premise to be more general and stronger than testing a -> c alone would.

> Clearly, this does not mean that /any/ task that increases your IQ will not be beneficial outside the IQ test.

See above. My reasoning suggests that this statement is invalid and misleading. There must be some additional effects, created by one task and not the other, that correlate with the “additional tangible effects”. If both tasks increase IQ but only one task has “additional tangible effects”, our goal should be to pinpoint those other differences and stop focusing on IQ.

> I’m no neuroscientist, and I admit I do not know very much about how well-established Gf is as a proxy for ‘general’
> cognitive capacity, but it seems exceedingly unlikely to me that every useful (high-level, not overly specialized)
> cognitive capacity correlates with Gf, and that (conversely) every change in Gf correlates with a change in cognitive capacity.

Yes, this is most likely true. But unless we test strong correlations, we only get back weak premises. People would still be trying to square the circle if pi had not been proved to be a transcendental number.


