blocking try_put flow graph

March 20, 2020, 8:22 am

Latest and popular articles on Intel Technologies

Hello

How can I achieve a blocking try_put on a node with a concurrency limit (or a node that is limited for other reasons)? For instance, I have a function node with a concurrency limit of 3 tasks, and I would like try_put to block (and do work-stealing if needed) until one of the running tasks completes and a slot is available for the next message. At the moment, either messages get queued, as many as I push, regardless of the limit value, or try_put returns 'false' if I use the 'rejecting' policy.

Thanks.
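As far as I know, try_put itself never blocks, so one workaround is to throttle the producer outside the graph. The sketch below is an illustration under stated assumptions, not a TBB facility: `SlotGate` is a hypothetical helper built only on the standard library, and `run_gated` stands in for a node with a concurrency limit by running each "body" on its own std::thread. In a real graph you would call `acquire()` before each `try_put` and `release()` as the last statement of the node body. Note that a thread waiting in `acquire()` sleeps rather than stealing work, so calling it from a TBB worker thread can cost parallelism or even deadlock.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not part of TBB): a counting gate that blocks the
// producer while all `limit` slots are busy.
class SlotGate {
public:
    explicit SlotGate(int limit) : limit_(limit), free_(limit) {}

    // Call before try_put(): blocks the calling thread until a slot is free.
    void acquire() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return free_ > 0; });
        --free_;
    }

    // Call as the last step of the node body: frees one slot, wakes one producer.
    void release() {
        {
            std::lock_guard<std::mutex> lock(m_);
            assert(free_ < limit_ && "release() without matching acquire()");
            ++free_;
        }
        cv_.notify_one();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    const int limit_;
    int free_;
};

// Stand-in for pushing `items` messages into a node limited to `limit`
// concurrent bodies; each std::thread plays one body. Returns the peak
// number of bodies observed running at the same time.
int run_gated(int limit, int items) {
    SlotGate gate(limit);
    std::atomic<int> active{0};
    std::atomic<int> peak{0};
    std::vector<std::thread> bodies;
    for (int i = 0; i < items; ++i) {
        gate.acquire();  // the "blocking try_put"
        bodies.emplace_back([&] {
            int now = ++active;
            int seen = peak.load();
            while (now > seen && !peak.compare_exchange_weak(seen, now)) {
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(1));  // simulated work
            --active;
            gate.release();
        });
    }
    for (auto& t : bodies) t.join();
    return peak.load();
}
```

With `run_gated(3, 20)`, no more than 3 bodies should ever be running at once, because a new one can only start after an earlier one has released its slot.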

TCE Level:

Level 1

TCE Open Date:

Friday, March 20, 2020 - 06:19

Function nodes and move

March 20, 2020, 4:13 am

Hi everybody,

I'm planning to use flow graphs in an intensive data processing tool. Each node of the graph will be doing in-place computations on a large dataset. For example, a function_node could be taking a potentially large signal as an input and computing the FFT of this signal.

Performance is the main issue here. Therefore, I'd like my function_node input parameters to be movable in order to avoid unnecessary copies of my data. That does not seem to be possible, since operator() on a Body takes a const reference as a parameter.

Am I thinking about this the wrong way? Is there a better way to achieve what I want?

Thanks.
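One common workaround, sketched here under the assumption that no two nodes modify the same buffer concurrently, is to make the message type a cheap handle such as `std::shared_ptr<Signal>`: the graph then copies only the pointer, and because the const applies to the pointer rather than to the data, the body can still work on the buffer in place. `Signal` and `scale_in_place` are hypothetical names; in a graph, the function would be the body of a `function_node<SignalPtr, SignalPtr>`.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical payload: a large signal buffer.
using Signal = std::vector<double>;
using SignalPtr = std::shared_ptr<Signal>;

// Shaped like a function_node body: the input arrives by const reference.
// The const applies to the shared_ptr, not to the Signal it points to, so
// the buffer can still be modified in place without copying it.
SignalPtr scale_in_place(const SignalPtr& in) {
    for (double& x : *in) x *= 2.0;  // stand-in for an in-place FFT
    return in;                        // copies the pointer, not the samples
}
```

The trade-off is shared ownership: if one message is broadcast to several successors, they all see the same buffer, so only one of them should modify it (or they must synchronize).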

TCE Level:

Level 1

TCE Open Date:

Friday, March 20, 2020 - 04:07

How can I determine if there are any active tasks?

March 20, 2020, 12:27 pm

I have some code that I need to protect with a mutex. This code is called from both TBB code and non-TBB code, so I don't want to take the lock if I don't have to. However, I can't tell how to determine whether the TBB scheduler currently has any tasks assigned to it. I had hoped that task_scheduler_init::is_active() would do it, but it seems not: it is always true after some TBB code has run. Is this possible?

TCE Level:

Level 1

TCE Open Date:

Friday, March 20, 2020 - 12:25

How can I specify one thread, multiple chunks for debugging?

March 20, 2020, 12:29 pm

I'm seeing some crashes that I think are simply due to refactoring my code into multiple chunks. How can I run a TBB application so that multiple chunks are created but only a single thread is used? That would let me reproduce the problems in a debugger without also having to debug race conditions.
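With TBB 2020, the worker count can be capped at one, for example with `tbb::task_scheduler_init init(1);` at the start of `main`, or with `tbb::global_control` and `max_allowed_parallelism` set to 1. `parallel_for` still splits the range into chunks, and with `tbb::simple_partitioner` and an explicit grainsize the split points depend only on the range. The standard-library sketch below models that halving rule (my understanding of how `blocked_range` divides; treat it as an assumption, not TBB's code) so you can see which chunks a single thread would visit. `split_and_run` and `Chunk` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Chunk = std::pair<std::size_t, std::size_t>;  // half-open [begin, end)

// Models simple_partitioner-style splitting on a single thread: a range is
// halved while it is larger than the grainsize, and each leaf chunk is then
// "executed" in turn (here it is just recorded).
void split_and_run(std::size_t begin, std::size_t end, std::size_t grain,
                   std::vector<Chunk>& visited) {
    assert(grain >= 1);  // a grainsize of 0 would never stop splitting
    if (end - begin > grain) {
        std::size_t mid = begin + (end - begin) / 2;
        split_and_run(begin, mid, grain, visited);
        split_and_run(mid, end, grain, visited);
    } else {
        visited.emplace_back(begin, end);  // the loop body would run here
    }
}
```

For [0, 10) with grainsize 3, the model visits [0,2), [2,5), [5,7) and [7,10) one after another. That kind of deterministic, single-threaded chunking is what makes a debugger session reproducible.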

TCE Level:

Level 1

TCE Open Date:

Friday, March 20, 2020 - 12:27

Repository change on Github

March 24, 2020, 6:33 am

Hi!

I would like to know why the Github repository has been changed from https://github.com/intel/tbb to https://github.com/oneapi-src/oneTBB.

Also, the SHA-256 checksum of the v2020.0 archive does not match between the new URL and the old one. If both are source code snapshots of the same tag, why are they different?

https://github.com/intel/tbb/archive/v2020.0.tar.gz SHA-256 8eed2377ac62e6ac10af5a8303ce861e4525ffe491a061b48e8fe094fc741ce9
https://github.com/oneapi-src/oneTBB/archive/v2020.0.tar.gz SHA-256 57714f2d2cf33935db33cee93af57eb3ecd5a7bef40c1fb7ca4a41d79684b118

TCE Level:

Level 1

TCE Open Date:

Tuesday, March 24, 2020 - 06:25

VS upgrades, OpenCL, System Studio 2020, flow_graph_opencl_node.h

March 27, 2020, 7:07 am

Hi,

I'm working on the simplest possible example of an OpenCL flow graph. Compilation fails when the compiler looks for async_msg, even when all calls to anything OpenCL are commented out.

#define TBB_PREVIEW_FLOW_GRAPH_NODES 1
#define TBB_PREVIEW_FLOW_GRAPH_FEATURES 1

#include "tbb/flow_graph.h"
#include "tbb/flow_graph_opencl_node.h"

void testfunc()
{
    tbb::flow::graph g;
 
    bool has_been_run = false;

    //using buffer_t = tbb::flow::opencl_buffer<cl_char>;
}

Error async_msg is not a template my_prog C:\Program Files (x86)\IntelSWTools\system_studio_2020\compilers_and_libraries_2020.0.166\windows\tbb\include\tbb\flow_graph_opencl_node.h 354

... and many more.

If I comment out the header file //#include "tbb/flow_graph_opencl_node.h" then my program compiles again.

Maybe the installation is incomplete. I was using VS 2019 16.4.5, but the installer said to upgrade because of an OpenCL requirement. That broke the Intel installations. After a few iterations of running the System Studio 2020 installer and selecting anything "OpenCL", it stopped adding anything, meaning it thinks everything is complete. Eventually I ran several Visual Studio Extension Installers (*.vsix) in a System Studio 2020 subfolder, and the Intel Compiler 19.1 and the Intel libraries (TBB, ...) were available again in the project properties.

Any ideas on what needs fixing? Files, paths, env vars, compiler settings, .... ?

PDF Documentation?

April 14, 2020, 12:12 am

Hello,

I am looking for the PDF documentation (Developer Reference/Developer Guide) of TBB, but could only find the HTML/web version, while most of the other components of Parallel Studio (compilers, MKL, DAAL, IPP, ...) have PDF documents.

Some unofficial websites still host copies of old PDF documentation (TBB from C++ Composer XE 2011), but it is almost a decade old. Besides, some threads in this forum mentioned the existence of a PDF version at some point, but those links are broken.

So, will any PDF documentation be available in the near future?

Nondeterministic processing order for function node with queueing policy

April 16, 2020, 1:42 am

When a function node with the queueing policy receives input faster than it can handle it, the next job it processes can come from the input even if its internal buffer isn't empty.

For example, the code below processes a list of consecutive numbers in a simple TBB graph:

tbb graph: source -> limiter -> func1 -> terminal
where terminal is a serial function_node with the queueing policy
result: the terminal node processes input out of order when multiple TBB threads are used

Say:

the terminal node is processing input 1
func1 pushes input 2; now 2 is in the terminal node's internal buffer
func1 pushes input 3, and at the same time the terminal node finishes processing input 1
the next job the terminal node processes can be either 2 or 3

Since the function node uses the queueing policy, messages are always pushed rather than pulled. So when a job is done, how does the function node decide where to get the next job from? It's not obvious to me from the source code https://github.com/oneapi-src/oneTBB/blob/2019_U8/include/tbb/internal/_...

Questions:

Is this a bug or expected behaviour?
If it's expected, does it mean that I have to use a sequencer_node to guarantee the order?

Sample output

➜ bin ✗ ./functionNodeTester
terminal node actual: 1713, expected:1712
terminal node actual: 1714, expected:1713
terminal node actual: 1712, expected:1714

Sample Code

#include <iostream>

#include "tbb/flow_graph.h"
#include "tbb/task_scheduler_init.h"

using namespace tbb::flow;

struct TerminalNode_t {
    continue_msg operator()(int v)
    {
        if (v != counter)
            std::cout << "terminal node actual: "<< v << ", expected:"<< counter << std::endl;
        counter++;
        return continue_msg();
    }
private:
    int counter = 0;
};

static int const THRESHOLD = 3;
static int const CYCLES = 10000;

int main()
{
    int count = 0;
    
    tbb::task_scheduler_init init(3);

    graph g;
    source_node<int> input(g,
        [&count](int& output) -> bool {
            if (count < CYCLES)
            {
                output = count;
                count++;
                return true;
            }
            
            return false;
        });
    
    limiter_node<int> l( g, THRESHOLD);
    function_node<int,int> func1( g, serial, [](const int& val){ return val; } );
    function_node<int, continue_msg> terminal( g, serial, TerminalNode_t() );


    make_edge( l, func1 );
    make_edge( func1, terminal );
    make_edge( terminal, l.decrement );
    make_edge( input, l );

    g.wait_for_all();
    return 0;
}
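TBB's `sequencer_node` is designed for this situation: it is constructed with a functor that returns each message's sequence number (counting from 0) and forwards messages in that order. The standard-library sketch below is a hypothetical `ReorderBuffer`, not TBB code, that models the same idea: a message that arrives early is held until every lower number has been released. It is not thread-safe, so in a graph this kind of logic would sit behind a serial node.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical model of sequencer-style reordering (not TBB code): items may
// arrive in any order, but are released strictly as 0, 1, 2, ...
template <typename T>
class ReorderBuffer {
public:
    // Accepts an item with its sequence number and returns every item that
    // is now ready to be released, in sequence order.
    std::vector<T> push(std::size_t seq, T item) {
        pending_.emplace(seq, std::move(item));
        std::vector<T> ready;
        for (auto it = pending_.find(next_); it != pending_.end();
             it = pending_.find(next_)) {
            ready.push_back(std::move(it->second));
            pending_.erase(it);
            ++next_;
        }
        return ready;
    }

private:
    std::map<std::size_t, T> pending_;  // early arrivals, keyed by sequence number
    std::size_t next_ = 0;              // next sequence number allowed out
};
```

Feeding it 1, 0, 3, 2 releases nothing, then {0, 1}, then nothing, then {2, 3}, which is the ordering guarantee the terminal node in the example is missing.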

Compiling with TBB is causing errors

April 16, 2020, 1:27 pm

Windows 10 Version 1909

tbb-2020.1-win

GCC 9.2.0 (MinGW)

Hello, I am trying to compile with g++ a TBB program that I can already compile in Visual Studio, so I know that TBB is installed correctly. My tbb folder is in C:\

This is what my current Makefile looks like. https://www.codepile.net/pile/lDwJ445v

These are the errors that I am getting. I am not sure if I am linking the library incorrectly, or if it's something else.

C:\Users\Owner\Desktop\New folder\OpenMP vs TBB>make
g++ -IC:\tbb-2020.1-win\tbb\include -LC:\tbb\tbb\lib\intel64\vc14 -O3 -o pps -fopenmp -ltbb avl.o main.o parPlaneSweep.o
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x433f): undefined reference to `tbb::task_group_context::init()'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x435d): undefined reference to `tbb::internal::allocate_root_with_context_proxy::allocate(unsigned int) const'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x439d): undefined reference to `tbb::internal::get_initial_auto_partitioner_divisor()'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x43d0): undefined reference to `tbb::task_group_context::~task_group_context()'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x47d5): undefined reference to `tbb::task_group_context::init()'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x47f3): undefined reference to `tbb::internal::allocate_root_with_context_proxy::allocate(unsigned int) const'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x4835): undefined reference to `tbb::internal::get_initial_auto_partitioner_divisor()'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x4868): undefined reference to `tbb::task_group_context::~task_group_context()'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x7743): undefined reference to `tbb::internal::allocate_continuation_proxy::allocate(unsigned int) const'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x776c): undefined reference to `tbb::internal::allocate_child_proxy::allocate(unsigned int) const'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x7a66): undefined reference to `tbb::task_group_context::is_group_execution_cancelled() const'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x7abc): undefined reference to `tbb::internal::allocate_continuation_proxy::allocate(unsigned int) const'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x7ae5): undefined reference to `tbb::internal::allocate_child_proxy::allocate(unsigned int) const'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x8cab): undefined reference to `tbb::internal::allocate_continuation_proxy::allocate(unsigned int) const'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x8cd4): undefined reference to `tbb::internal::allocate_child_proxy::allocate(unsigned int) const'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x8f69): undefined reference to `tbb::task_group_context::is_group_execution_cancelled() const'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x8fb4): undefined reference to `tbb::internal::allocate_continuation_proxy::allocate(unsigned int) const'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x8fdd): undefined reference to `tbb::internal::allocate_child_proxy::allocate(unsigned int) const'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text$_ZN3tbb4flow11interface105graphD1Ev[__ZN3tbb4flow11interface105graphD1Ev]+0x43): undefined reference to `tbb::interface7::internal::task_arena_base::internal_execute(tbb::interface7::internal::delegate_base&) const'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text$_ZN3tbb4flow11interface105graphD1Ev[__ZN3tbb4flow11interface105graphD1Ev]+0x4e): undefined reference to `tbb::task_group_context::is_group_execution_cancelled() const'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text$_ZN3tbb4flow11interface105graphD1Ev[__ZN3tbb4flow11interface105graphD1Ev]+0x73): undefined reference to `tbb::interface5::internal::task_base::destroy(tbb::task&)'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text$_ZN3tbb4flow11interface105graphD1Ev[__ZN3tbb4flow11interface105graphD1Ev]+0x87): undefined reference to `tbb::task_group_context::~task_group_context()'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text$_ZN3tbb4flow11interface105graphD1Ev[__ZN3tbb4flow11interface105graphD1Ev]+0xe3): undefined reference to `tbb::interface7::internal::task_arena_base::internal_terminate()'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text$_ZN3tbb4flow11interface105graphD1Ev[__ZN3tbb4flow11interface105graphD1Ev]+0xf1): undefined reference to `tbb::task_group_context::reset()'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text$_ZN3tbb4flow11interface105graphD1Ev[__ZN3tbb4flow11interface105graphD1Ev]+0x10b): undefined reference to `tbb::interface7::internal::task_arena_base::internal_initialize()'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text$_ZN3tbb4flow11interface105graphD1Ev[__ZN3tbb4flow11interface105graphD1Ev]+0x12e): undefined reference to `tbb::task_group_context::reset()'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text$_ZN3tbb4flow11interface105graphD0Ev[__ZN3tbb4flow11interface105graphD0Ev]+0x44): undefined reference to `tbb::interface7::internal::task_arena_base::internal_execute(tbb::interface7::internal::delegate_base&) const'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text$_ZN3tbb4flow11interface105graphD0Ev[__ZN3tbb4flow11interface105graphD0Ev]+0x4f): undefined reference to `tbb::task_group_context::is_group_execution_cancelled() const'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text$_ZN3tbb4flow11interface105graphD0Ev[__ZN3tbb4flow11interface105graphD0Ev]+0x74): undefined reference to `tbb::interface5::internal::task_base::destroy(tbb::task&)'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text$_ZN3tbb4flow11interface105graphD0Ev[__ZN3tbb4flow11interface105graphD0Ev]+0x88): undefined reference to `tbb::task_group_context::~task_group_context()'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text$_ZN3tbb4flow11interface105graphD0Ev[__ZN3tbb4flow11interface105graphD0Ev]+0xf3): undefined reference to `tbb::interface7::internal::task_arena_base::internal_terminate()'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text$_ZN3tbb4flow11interface105graphD0Ev[__ZN3tbb4flow11interface105graphD0Ev]+0x101): undefined reference to `tbb::task_group_context::reset()'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text$_ZN3tbb4flow11interface105graphD0Ev[__ZN3tbb4flow11interface105graphD0Ev]+0x11b): undefined reference to `tbb::interface7::internal::task_arena_base::internal_initialize()'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text$_ZN3tbb4flow11interface105graphD0Ev[__ZN3tbb4flow11interface105graphD0Ev]+0x13e): undefined reference to `tbb::task_group_context::reset()'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text.unlikely+0xf3): undefined reference to `tbb::internal::allocate_root_with_context_proxy::free(tbb::task&) const'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text.unlikely+0xff): undefined reference to `tbb::task_group_context::~task_group_context()'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text.unlikely+0x118): undefined reference to `tbb::internal::allocate_root_with_context_proxy::free(tbb::task&) const'
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.rdata$_ZTVN3tbb10interface98internal9flag_taskE[__ZTVN3tbb10interface98internal9flag_taskE]+0x14): undefined reference to `tbb::task::note_affinity(unsigned short)'
collect2.exe: error: ld returned 1 exit status
Makefile:10: recipe for target 'pps' failed
make: *** [pps] Error 1

Thread ID as index from zero to thread number

April 20, 2020, 6:38 pm

Dear TBB experts,

I am designing an efficient parallel algorithm using TBB.

I would like to prepare some pool data (a std::vector container) for each thread during the RAII (setup) phase, and have each thread access its own pool by its own id (by which I mean its thread id).

Currently, I initialize the scheduler with tbb::task_scheduler_init::initialize(thread_num), prepare std::vector<PoolDataStructure> _pool(thread_num), and, to access it, use the return value of tbb::task_arena::current_thread_index() as the thread id.

It seems to work well, but I wonder whether this is correct according to the specification.

(When I run a program based on the above, I get different numerical results for serial and parallel runs, and I am afraid there is a data race caused by this implementation or by something else.)

I would appreciate it if you could look into this.
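One possible explanation for the serial/parallel difference, independent of any data race: a parallel reduction may combine partial results in a different order from the serial loop, and floating-point addition is not associative. The standard C++ snippet below shows this with three arbitrary values. If run-to-run repeatability matters, TBB's `parallel_deterministic_reduce` fixes the splitting for a given grainsize, although its result may still differ from the serial loop's.

```cpp
#include <cassert>

// Same three numbers, two groupings: IEEE-754 double addition is not
// associative, so the two results differ in the last bit.
double left_to_right() { return (0.1 + 0.2) + 0.3; }
double right_to_left() { return 0.1 + (0.2 + 0.3); }
```

A serial loop corresponds to one fixed grouping; a parallel reduction that splits the range differently corresponds to another, so small discrepancies of this size are expected even in correct code.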

Thread local storage data structure constructed preliminarily is assigned per each thread

April 20, 2020, 7:02 pm

Dear TBB experts,

This relates to my previous post, "Thread ID as index from zero to thread number".

I would like to prepare a data pool (cache data) for each thread (or task?) in the RAII (construction) step, before massive iterative computations, so as not to cause data races between the running threads (or tasks?). Currently, I prepare it as a std::vector with one element per thread.

As an alternative, I suppose that thread-local storage (tbb::combinable or tbb::enumerable_thread_specific) would work well, but I do not know whether its elements are assigned and accessed in one-to-one correspondence with what was prepared before the parallel loop.

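### Here is what I mean, in (pseudo)code:

#include <vector>
#include "tbb/enumerable_thread_specific.h"
#include "tbb/parallel_for_each.h"

using Cache = std::vector<double>;               // per-thread data pool / cache
tbb::enumerable_thread_specific<Cache> tls_data;

void compute(std::vector<int>& preparator)
{
    // data pool is prepared in the RAII step
    tbb::parallel_for_each(preparator.begin(), preparator.end(), [](int) {
        Cache& cache = tls_data.local();         // HERE, tls_data is prepared per thread (or task?)
        cache.reserve(1024);
    });

    // some massive computation step
    tbb::parallel_for_each(preparator.begin(), preparator.end(), [](int item) {
        Cache& cache = tls_data.local();         // HERE, is tls_data used without data races? Is that guaranteed?
        cache.push_back(item);
    });
}

I would appreciate it if you could answer my question.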

Kind regards
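As far as I understand `enumerable_thread_specific`, `local()` returns the element belonging to the calling thread and creates it on first use. An element created in the first loop is therefore reused by the same thread in the second loop, while a thread that did not take part in the first loop gets a freshly constructed element. Two further caveats, as I understand it: which thread runs which iteration is not fixed, and if the loop body itself calls nested parallel algorithms, the same thread may pick up another iteration while the first is suspended, so both would share one element. The standard-library sketch below models only the "one lazily created element per thread" semantics; `PerThread` is a hypothetical name, and TBB's real implementation differs.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical model (not TBB's implementation) of per-thread storage with
// lazy creation, in the spirit of enumerable_thread_specific<T>::local().
template <typename T>
class PerThread {
public:
    // Returns the calling thread's element, default-constructing it on first
    // use. std::map never relocates its nodes, so the reference stays valid.
    T& local() {
        std::lock_guard<std::mutex> lock(m_);
        return slots_[std::this_thread::get_id()];
    }

    // Number of threads that have created an element so far.
    std::size_t size() {
        std::lock_guard<std::mutex> lock(m_);
        return slots_.size();
    }

private:
    std::mutex m_;
    std::map<std::thread::id, T> slots_;
};
```

Each element is only touched by the thread that owns it, which is why no data race arises as long as the body does not hand references to other threads.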

set initial capacity of concurrent_hash_map

April 26, 2020, 7:15 pm

Does anybody know of a concrete example of setting the initial capacity of a concurrent_hash_map? I found recommendations to do this to improve performance, but looking at the members of the class, it seems that you need to define a memory allocator too.

Thanks

Pipeline parallelism in TBB flow graph

April 27, 2020, 5:01 am

Hello,

I am working on an image processing application which is built with tbb::flow_graph.
The input comes from a video file or a camera.
Each image processing node is wrapped in a multifunction_node (I need the multifunction_node's ability to selectively stop propagating graph messages).
Currently I am using graph.wait_for_all() after feeding each input frame into the root node, but I would like to be able to take advantage of pipeline parallelism.
I.e. if node A is connected to node B, I would like to let node A start working on its next input after it is done producing its output for B, instead of waiting for B to finish.

Could you please provide pointers on how to do this efficiently and idiomatically?
I am new to TBB and I feel that I might be missing something obvious.

Thanks
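As far as I understand the flow graph, pipelining comes for free once you stop calling `wait_for_all()` after every frame. Put each frame into the root node as it arrives and call `wait_for_all()` once at the end: a serial node can then start on its next input while its successor is still busy. To keep the number of frames in flight bounded, a `limiter_node` in front of the first stage is a common choice. The standard-library sketch below only models the idea with two threads joined by a bounded queue. `BoundedQueue` and `run_pipeline` are hypothetical names, and `-1` is an assumed end-of-stream marker.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical bounded FIFO connecting two pipeline stages.
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t cap) : cap_(cap) {}

    void push(int v) {  // blocks while the queue is full
        std::unique_lock<std::mutex> lock(m_);
        not_full_.wait(lock, [this] { return q_.size() < cap_; });
        q_.push_back(v);
        not_empty_.notify_one();
    }

    int pop() {  // blocks while the queue is empty
        std::unique_lock<std::mutex> lock(m_);
        not_empty_.wait(lock, [this] { return !q_.empty(); });
        int v = q_.front();
        q_.pop_front();
        not_full_.notify_one();
        return v;
    }

private:
    std::mutex m_;
    std::condition_variable not_full_, not_empty_;
    std::deque<int> q_;
    const std::size_t cap_;
};

// Stage A produces frame f and hands it to stage B, then immediately starts
// on frame f + 1 while B is still busy with the previous one.
std::vector<int> run_pipeline(int frames) {
    BoundedQueue a_to_b(2);  // at most 2 frames waiting between the stages
    std::vector<int> results;
    std::thread stage_b([&] {
        for (int v = a_to_b.pop(); v != -1; v = a_to_b.pop())
            results.push_back(v * 10);  // stand-in for node B's work
    });
    for (int f = 0; f < frames; ++f)
        a_to_b.push(f + 1);             // stand-in for node A's work
    a_to_b.push(-1);                    // end-of-stream marker
    stage_b.join();
    return results;
}
```

The queue capacity plays the role the limiter would play in the graph: it stops stage A from running arbitrarily far ahead of stage B.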

From Cilk to Intel TBB

April 29, 2020, 11:33 pm

I am a big fan of Cilk, but it is dead, so now I want to use TBB instead (so I am new to TBB).

I rely on the function

__cilkrts_get_worker_number()

Therefore, I created a global arena as follows

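#include "tbb/task_arena.h"

// integer32 is assumed to be a project typedef for a 32-bit int
tbb::task_arena *global_arena=NULL;

int tbb_setnumworkers(integer32 numworkers)
{
  int  ok=1;

  if ( global_arena==NULL )
    global_arena = new tbb::task_arena (numworkers);

  return ( ok );
}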

and then my parallel loops look like this:

(*global_arena).execute([=](){ tbb::parallel_for(ufirst,ulast,ustep,[=](uinteger32 i){(*f)(i,arg);}); });

When I need a thread id, I do:

integer32 tbb_workerid(void)
{
  if ( global_arena==NULL )
    return ( 0 );
  else
    return ( tbb::this_task_arena::current_thread_index() );
} /* tbb_workerid */

It mostly works. However, it seems that when tbb_workerid is not called from the thread that created global_arena, the thread index is -1. Well, at least I get -1 in some cases.

Any suggestions for fixing this issue with the thread id being -1 in some cases? Or is it impossible?

Thanks.

Erling
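`this_task_arena::current_thread_index()` describes the calling thread's slot in the arena it is currently working in, so, as far as I understand, a thread that is not attached to any arena gets a sentinel instead of a valid slot number. That would be consistent with the -1 you see. If what you really need is a stable per-thread number, one option that does not depend on TBB at all is a thread_local id handed out from an atomic counter, sketched below (`stable_worker_id` is a hypothetical name). Unlike `__cilkrts_get_worker_number()`, these ids are not bounded by the number of workers: every new thread that calls the function gets the next number.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: gives each thread a small, stable id on first use.
// Ids start at 0 and are never reused, so they are not bounded by the
// number of workers if threads come and go.
int stable_worker_id() {
    static std::atomic<int> next_id{0};
    thread_local int id = next_id.fetch_add(1);
    return id;
}
```

This works well with a fixed pool of threads that live for the whole program; if it is used to index a per-worker array, the array must be sized for every thread that might ever call it.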

Waiting for multiple communicating graphs executed in different arenas

May 8, 2020, 9:31 am

Hello Community,

A) Problem Description:

I'm trying to build an application using multiple flow_graphs which communicate with each other via try_put messages. Every graph should be executed in a separate arena, mapped to a specific NUMA node (that is, only cores belonging to the same NUMA node should work on that graph). Also, it is not possible to know at compile time which graph finishes last. Because a graph may be idle for a short period of time until it receives another try_put message from another graph, graph.wait_for_all() may return before execution has actually ended.

B) What I've tried so far:

1. I created the arenas like

arenas[i]->initialize(tbb::task_arena::constraints(numaIDs[j],4));

2. I call graphs[i].reset() inside those arenas to set each graph's arena to that arena.

3. I start graph execution like this:

arenas[i]->execute([&task_groups,i,descr] {
	task_groups[i].run([i,descr]{
		descr.start_nodes[i]->try_put(continue_msg());
	});
});

4. //TODO: Wait until all graphs have finished

I experimented with task_group.wait() and graph.wait_for_all() but didn't succeed.

All of my attempts so far either produced deadlocks or did not block until all graphs had finished (resulting in segfaults).

C) So my questions are:

1. Is there a way to wait until all nodes of all graphs are executed?

2. Is there a way to map graph nodes to specific cores / NUMA nodes other than representing the graph as multiple subgraphs communicating with each other?

3. If I used an atomic counter in the node bodies and checked in each body whether it has reached the total number of nodes (which would basically check whether the current node is the last one of all graphs): is there a way to use this to signal a waiting thread, while still making it possible for the waiting (master) thread to join the arena/graph execution?

I know that this is a very specific issue but any help would be highly appreciated.

D) Here an example and some more details of my application that may or may not help:

I have a graph representing the dependencies between a source vector and a destination vector in sparse-matrix-vector-multiplications.

Let's say I want to execute the upper part of the graph on NUMA node 0 and the lower part on NUMA node 1. There are dependencies between the two parts of the graph.

I divide the graph into two subgraphs (TBB flow_graphs with continue_nodes), upper and lower, with edges for dependencies within each subgraph and explicit try_put(continue_msg()) calls after the execution of a node body for all dependencies between the subgraphs.

I start the graphs with a call to one broadcast_node per subgraph, which forwards a continue_msg to all nodes of the first level of that graph (which is the first multiplication).

E) Best Regards
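For question 3, the counting part can be built from standard primitives: initialise a counter with the total number of node bodies across all graphs, have every body call `count_down()` when it finishes, and let the main thread block in `wait()`. `CompletionLatch` below is a hypothetical helper (C++20's `std::latch` offers similar semantics). It does not address the second half of the question: a thread blocked in `wait()` sleeps and does not join the arena's work.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical countdown: initialised with the total number of node bodies
// across all graphs; each body calls count_down() when it finishes.
class CompletionLatch {
public:
    explicit CompletionLatch(int expected) : remaining_(expected) {}

    void count_down() {
        std::lock_guard<std::mutex> lock(m_);
        assert(remaining_ > 0 && "more count_down() calls than expected");
        if (--remaining_ == 0) done_.notify_all();
    }

    // Blocks until every body has called count_down().
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        done_.wait(lock, [this] { return remaining_ == 0; });
    }

private:
    std::mutex m_;
    std::condition_variable done_;
    int remaining_;
};
```

Because the count covers all graphs, it does not matter which graph finishes last or whether a graph goes idle while waiting for a message from another one.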

Enqueued tasks never get picked up.

May 19, 2020, 2:38 pm

I have the following pattern:

tbb::parallel_for(tbb::blocked_range<size_t>(0, max), [&](const tbb::blocked_range<size_t>& r) {
    for (size_t i = r.begin(); i != r.end(); ++i) {
        ctx[i] = std::make_unique<tbb::task_group_context>();
        waitTask[i] = new (tbb::task::allocate_root(*ctx[i])) tbb::empty_task;
        waitTask[i]->set_ref_count(2);   // one child + one for the wait itself

        auto& workerTask = *new (waitTask[i]->allocate_child()) WorkerTask(name);
        tbb::task::enqueue(workerTask);

        waitTask[i]->wait_for_all();     // wait for children to complete.
    }
});

I get a hang with all threads in receive_or_steal_task, but WorkerTask's execute() is never called.

Things work fine if I wait on all the waitTasks outside the parallel_for, in a separate parallel_for.

A sequential for loop in place of the parallel_for in the above example also works fine.

warning C4296: '>': expression is always false

May 20, 2020, 1:54 am

Hi,

Our application treats warnings as errors.

We are seeing the warning below on Windows with Visual Studio 2017 after upgrading to TBB 2020. It comes from the TBB header "concurrent_unordered_set.h".

The program below reproduces the warning.

Error details:

tbb\concurrent_unordered_set.h(229): error C2220: warning treated as error - no 'object' file generated
tbb\concurrent_unordered_set.h(235): note: see reference to alias template instantiation 'cu_set_t<tbb::interface5::concurrent_unordered_set,std::iterator_traits<_Iter>::value_type,>' being compiled
tbb\concurrent_unordered_set.h(229): warning C4296: '>': expression is always false

#include <tbb/concurrent_unordered_set.h>

int main() 
{
    return 0;
}

Does not build with C++20

May 21, 2020, 2:39 am

Cannot build with clang-10 using -std=c++20.

A fix exists in this PR: https://github.com/oneapi-src/oneTBB/pull/251

TBB compile error: iterators.h(246)

May 27, 2020, 8:31 am

(4):1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.1.216\windows\tbb\include\tbb/iterators.h(246): error : namespace "std" has no member "result_of"
(8):1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.1.216\windows\tbb\include\tbb/iterators.h(246): error : expected an identifier
(12):1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.1.216\windows\tbb\include\tbb/iterators.h(246): error : expected a ";"
(16):1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.1.216\windows\tbb\include\tbb/iterators.h(260): error : identifier "reference" is undefined
(20):1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.1.216\windows\tbb\include\tbb/iterators.h(263): error : identifier "reference" is undefined

Windows 10 with VS2019

Microsoft Visual Studio Professional 2019
Version 16.6.0
VisualStudio.16.Release/16.6.0+30114.105
Microsoft .NET Framework
Version 4.7.03056

Installed Version: Professional

Visual C++ 2019 00435-60000-00000-AA567
Microsoft Visual C++ 2019

Intel® C++ Compiler Package ID: w_comp_lib_2020.1.216
Intel® C++ Compiler – extension version 19.1.0.16, Package ID: w_comp_lib_2020.1.216, Copyright © 2002-2020 Intel Corporation. All rights reserved.
* Other names and brands may be claimed as the property of others.

Intel® Performance Libraries Package ID: w_comp_lib_2020.1.216
Intel® Performance Libraries – extension version 19.1.0.16, Package ID: w_comp_lib_2020.1.216, Copyright © 2002-2020 Intel Corporation. All rights reserved.
* Other names and brands may be claimed as the property of others.

Intel® Visual Fortran Compiler Package ID: w_comp_lib_2020.1.216
Intel® Visual Fortran Compiler - extension version 19.1.0055.16, Package ID: w_comp_lib_2020.1.216, Copyright © 2002-2020 Intel Corporation. All rights reserved.
* Other names and brands may be claimed as the property of others.
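`std::result_of` was deprecated in C++17 and removed in C++20, so these errors would be consistent with the project being compiled in a C++20 / "latest" language mode. That is an assumption about your project settings, not something the log shows; if it holds, selecting C++17 (or earlier) as the project's language standard may avoid the error in the TBB header. In your own code, the C++17 replacement is `std::invoke_result_t`, illustrated below with a hypothetical `Doubler` functor and `apply_once` helper (requires C++17).

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical functor used only for illustration.
struct Doubler {
    int operator()(int x) const { return 2 * x; }
};

// C++17 and later: std::invoke_result_t<F, Args...> replaces the removed
// std::result_of<F(Args...)>::type.
using DoublerResult = std::invoke_result_t<Doubler, int>;
static_assert(std::is_same<DoublerResult, int>::value, "Doubler returns int");

template <typename F, typename Arg>
std::invoke_result_t<F, Arg> apply_once(F f, Arg a) {
    return f(a);
}
```

Note the different spelling: `result_of` takes a function type `F(Args...)`, while `invoke_result` takes the callable and argument types as separate template parameters.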

Unexpected task_group behavior

May 29, 2020, 4:06 pm

I'm using elementary task_group functionality but am seeing unexpected behavior which is turning out to be problematic.

To summarize, here is a code snippet:

#include <iostream>
#include <sys/syscall.h>   // SYS_gettid (Linux-specific)
#include <unistd.h>        // syscall()
#include "tbb/task_group.h"

// This function is executed on the main application thread.
void Foo(tbb::task_group &tg)
{
    std::cout << "Main thread id is " << syscall(SYS_gettid) << std::endl;

    tg.run([&]() {
        std::cout << "Spawned worker, thread id is " << syscall(SYS_gettid) << std::endl;

        // Do a whole bunch of work using TBB thread pool.
    });

    // Do a whole bunch of work which has to be completed on the main thread.

    tg.wait();
}

From the TBB task_group help:

template<typename Func> void run( Func&& f ) Spawns a task to compute f() and returns immediately.

My expectation is that the main thread won't get involved with the work inside the lambda, and so can be used to execute, in parallel, other tasks which can only be processed by the main thread.

This is usually the case but I do observe the following happening on some executions:

Main thread id is 209946
Spawned worker, thread id is 209946

which is problematic since the main thread is now tied up with work which I'd prefer it not to be doing.

As I mentioned, this only happens seldom. Is this expected behavior (it seems to contradict the documentation), and if so would anyone have any suggestions for how to prevent this. Perhaps there are other idioms for robustly accomplishing what I need...

Thanks in advance,
Mark