When I ran the pipeline on a 206.442627 input fastq file, progress got stuck at step /01.raw_align/02.raw_align.sh.work/. The error reported was: "slurmstepd: error: Detected 1 oom-kill event(s) in step 1481924.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler."
My run.cfg file is:

```
[General]
job_type = slurm # here we use SLURM to manage jobs
job_prefix = nextDenovo
task = all # 'all', 'correct', 'assemble'
rewrite = yes # yes/no
deltmp = yes
rerun = 3
parallel_jobs = 22
input_type = raw
input_fofn = /group/pasture/Saila/NextDenovo/smartdenovo.input.fofn # input file
workdir = /group/pasture/Saila/NextDenovo

[correct_option]
read_cutoff = 1k
seed_cutoff = 20000 # the recommended minimum seed length
blocksize = 5g
pa_correction = 5
seed_cutfiles = 5
sort_options = -m 50g -t 30 -k 50
minimap2_options_raw = -x ava-ont -t 8
correction_options = -p 30
cluster_options = --cpus-per-task={cpu} --mem-per-cpu={vf}

[assemble_option]
random_round = 100
minimap2_options_cns = -x ava-ont -t 8 -k17 -w17
nextgraph_options = -a 1
```
I am not sure whether I need to increase the memory; if so, could you please suggest how much?
My cluster uses Slurm.
Thanks
Hi, our tests show that a typical minimap2-nd task usually consumes about 40g of memory, but you can use the `-I` parameter to reduce the maximum memory requirement. In addition, you can adjust `-t` in `minimap2_options_raw` and `minimap2_options_cns` to control the maximum number of sub-jobs running on each node simultaneously.
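For illustration, the adjustments described above might look like this in run.cfg. The `-I 10g` value is a placeholder, not a recommendation from this thread; pick it based on the memory available per sub-job on your nodes:

```
[correct_option]
# -I caps the index batch size minimap2-nd loads into memory (hypothetical value)
minimap2_options_raw = -x ava-ont -t 8 -I 10g

[assemble_option]
minimap2_options_cns = -x ava-ont -t 8 -k17 -w17 -I 10g
```

Lowering `-t` reduces per-task memory and lets more sub-jobs share a node; lowering `-I` trades memory for extra index passes.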
Hello again,
I tried the modified options `minimap2_options_raw = -x ava-ont -t 32 -I 100` and `minimap2_options_cns = -x ava-ont -t 32 -k17 -w17`, but now the pipeline stops at 01.raw_align/02.raw_align.sh.work/raw_align000/ with the error "slurmstepd: error: *** JOB 1734386 ON comp054 CANCELLED AT 2020-02-08T21:42:13 DUE TO TIME LIMIT ***".
Is this something I can fix in the script, or should I just restart the job?
Also, is there an option to keep only reads above a certain read length?
Thanks in advance for your help.
Hello again,
I modified the config with a reduced blocksize of 3g and increased seed_cutfiles to 10, but it still stalls at [ERROR] 2020-02-21 06:48:13,170 /group/pasture/Saila/NextDenovo/01.raw_align/02.raw_align.sh.work/raw_align159/nextDenovo.sh.e
with the error "slurmstepd: error: *** JOB 1809950 ON comp035 CANCELLED AT 2020-02-21T07:00:20 DUE TO TIME LIMIT ***".
Should I reduce the blocksize further?
Can you please advise?
Thanks
Dear Dr. Hu,
Thank you for your replies above. I have the same "out-of-memory" problem. I tried adjusting `-t` in `minimap2_options_raw` and `minimap2_options_cns` to `-t 8`, but it still fails with this error.
See the run.cfg parameters:
```
[correct_option]
read_cutoff = 1k
genome_size = 3.23g # estimated genome size
sort_options = -m 20g -t 8
minimap2_options_raw = -t 8
pa_correction = 5
correction_options = -p 30

[assemble_option]
minimap2_options_cns = -t 8
nextgraph_options = -a 1
```
See the error message:

```
minimap2-nd --step 1 --dual=yes -t 8 -x ava-pb /01.raw_align/input.seed.005.2bit /01.raw_align/input.part.004.2bit -o input.seed.005.2bit.163.ovl
slurmstepd-c03b06n04: error: Detected 1 oom-kill event(s) in step 593045.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
```
The Slurm cluster I use has enough memory (q_fat, 72 CPUs/1.5T memory/2 nodes), but the default normal queue has limited memory. The subtasks submitted by `paralleltask` always use the default normal queue even when I submit the main job to `q_fat`. I therefore tried to use `submit` to specify the queue for `paralleltask`, for example `submit = sbatch -q q_fat`, but it seems that was the wrong way to specify the queue.
I also tried the approach described in the FAQ: https://nextdenovo.readthedocs.io/en/latest/FAQ.html#how-to-optimize-parallel-computing-parameters.
Could you tell me the correct way to specify the queue, or give some suggestions for this out-of-memory issue?
Thank you very much!
Regards,
Two solutions:

1. Use `submit = sbatch -q q_fat --cpus-per-task={cpu} --mem-per-cpu={mem} -o {out} -e {err} {script}`
2. Set `job_type = local` and submit the main job to `q_fat`
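As a sketch, the two options above might look like this in run.cfg (the queue name `q_fat` comes from this thread; everything else should be adapted to your cluster):

```
# Option 1: keep Slurm scheduling, but route every sub-job to q_fat
[General]
job_type = slurm
submit = sbatch -q q_fat --cpus-per-task={cpu} --mem-per-cpu={mem} -o {out} -e {err} {script}

# Option 2: run sub-jobs locally on whichever node the main job lands on
# [General]
# job_type = local
# then submit the whole pipeline to the big-memory queue, e.g.:
#   sbatch -q q_fat --wrap "nextDenovo run.cfg"
```

With option 2, all sub-jobs inherit the main job's allocation, so the 1.5T node's memory is available to them without any extra queue configuration.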