When I ran the pipeline on a 206.442627 input fastq file, progress got stuck at step /01.raw_align/02.raw_align.sh.work/. The error reported was: "slurmstepd: error: Detected 1 oom-kill event(s) in step 1481924.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler."
My run.cfg file is:

```
[General]
job_type = slurm # here we use SLURM to manage jobs
job_prefix = nextDenovo
task = all # 'all', 'correct', 'assemble'
rewrite = yes # yes/no
deltmp = yes
rerun = 3
parallel_jobs = 22
input_type = raw
input_fofn = /group/pasture/Saila/NextDenovo/smartdenovo.input.fofn # input file
workdir = /group/pasture/Saila/NextDenovo

[correct_option]
read_cutoff = 1k
seed_cutoff = 20000 # the recommended minimum seed length
blocksize = 5g
pa_correction = 5
seed_cutfiles = 5
sort_options = -m 50g -t 30 -k 50
minimap2_options_raw = -x ava-ont -t 8
correction_options = -p 30
cluster_options = --cpus-per-task={cpu} --mem-per-cpu={vf}

[assemble_option]
random_round = 100
minimap2_options_cns = -x ava-ont -t 8 -k17 -w17
nextgraph_options = -a 1
```
I am not sure whether I need to increase the memory; if so, could you please suggest how much?
My cluster uses Slurm.
Thanks
Hi, our tests show that a typical minimap2-nd task usually consumes about 40g of memory, but you can use the `-I` parameter to reduce the maximum memory requirement. In addition, you can adjust `-t` in `minimap2_options_raw` and `minimap2_options_cns` to control the maximum number of sub-jobs running on each node simultaneously.
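For illustration, the adjustments described above might look like this in run.cfg. The `-I 10g` value is a placeholder, not a recommendation from this thread; pick it based on the memory available per sub-job on your nodes:

```
[correct_option]
# -I caps the index batch size minimap2-nd loads into memory (hypothetical value)
minimap2_options_raw = -x ava-ont -t 8 -I 10g

[assemble_option]
minimap2_options_cns = -x ava-ont -t 8 -k17 -w17 -I 10g
```

Lowering `-t` reduces per-task memory and lets more sub-jobs share a node; lowering `-I` trades memory for extra index passes.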
Hello again,
I tried the modified options `minimap2_options_raw = -x ava-ont -t 32 -I 100` and `minimap2_options_cns = -x ava-ont -t 32 -k17 -w17`, but now the pipeline stops at 01.raw_align/02.raw_align.sh.work/raw_align000/ with the error "slurmstepd: error: *** JOB 1734386 ON comp054 CANCELLED AT 2020-02-08T21:42:13 DUE TO TIME LIMIT ***".
Is this something I can fix in the script, or should I just restart the job?
Also, is there an option to keep only reads above a certain read length?
Thanks in advance for your help.
Hello again,
I modified the config with a reduced blocksize of 3g and increased seed_cutfiles to 10, but it still stalls at [ERROR] 2020-02-21 06:48:13,170 /group/pasture/Saila/NextDenovo/01.raw_align/02.raw_align.sh.work/raw_align159/nextDenovo.sh.e
with the error "slurmstepd: error: *** JOB 1809950 ON comp035 CANCELLED AT 2020-02-21T07:00:20 DUE TO TIME LIMIT ***".
Should I reduce the blocksize further?
Can you please advise?
Thanks
Dear Dr. Hu,
Thank you for your replies above. I have the same "out-of-memory" problem. I tried adjusting `-t` in `minimap2_options_raw` and `minimap2_options_cns` to `-t 8`, but it still fails with this error.
See the run.cfg parameters:
```
[correct_option]
read_cutoff = 1k
genome_size = 3.23g # estimated genome size
sort_options = -m 20g -t 8
minimap2_options_raw = -t 8
pa_correction = 5
correction_options = -p 30

[assemble_option]
minimap2_options_cns = -t 8
nextgraph_options = -a 1
```
See the error message:

```
minimap2-nd --step 1 --dual=yes -t 8 -x ava-pb /01.raw_align/input.seed.005.2bit /01.raw_align/input.part.004.2bit -o input.seed.005.2bit.163.ovl
slurmstepd-c03b06n04: error: Detected 1 oom-kill event(s) in step 593045.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
```
The Slurm cluster I use has enough memory (q_fat, 72 CPUs/1.5T memory/2 nodes), but the default normal queue has limited memory. The subtasks submitted by `paralleltask` always use the default normal queue even when I submit the main job to `q_fat`. I therefore tried to use `submit` to specify the queue for `paralleltask`, for example `submit = sbatch -q q_fat`, but it seems that was the wrong way to specify the queue.
I also tried the approach described in the FAQ: https://nextdenovo.readthedocs.io/en/latest/FAQ.html#how-to-optimize-parallel-computing-parameters.
Could you tell me the correct way to specify the queue, or give some suggestions for this out-of-memory issue?
Thank you very much!
Regards,
Two solutions:

1. Use `submit = sbatch -q q_fat --cpus-per-task={cpu} --mem-per-cpu={mem} -o {out} -e {err} {script}`
2. Set `job_type = local` and submit the main job to `q_fat`
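As a sketch, the two options above might look like this in run.cfg (the queue name `q_fat` comes from this thread; everything else should be adapted to your cluster):

```
# Option 1: keep Slurm scheduling, but route every sub-job to q_fat
[General]
job_type = slurm
submit = sbatch -q q_fat --cpus-per-task={cpu} --mem-per-cpu={mem} -o {out} -e {err} {script}

# Option 2: run sub-jobs locally on whichever node the main job lands on
# [General]
# job_type = local
# then submit the whole pipeline to the big-memory queue, e.g.:
#   sbatch -q q_fat --wrap "nextDenovo run.cfg"
```

With option 2, all sub-jobs inherit the main job's allocation, so the 1.5T node's memory is available to them without any extra queue configuration.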