Channel: Intel® oneAPI Threading Building Blocks & Intel® Threading Building Blocks

blocking try_put flow graph


Hello

How can I achieve a blocking try_put on a node with a concurrency limit (or one that is limited for other reasons)? For instance, if I have a function node with a concurrency limit of 3 tasks, I would like try_put to block (and do work-stealing if needed) until one of the tasks completes and a slot becomes available for the next task. At the moment, tasks get queued without bound regardless of the limit value, or try_put returns 'false' if I use the 'rejecting' policy.
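For reference, here is a minimal sketch of the scenario (the busy-wait loop is just my naive workaround, not a built-in blocking API):

    #include "tbb/flow_graph.h"

    int main()
    {
        tbb::flow::graph g;

        // Serial-limited node with the 'rejecting' policy: try_put returns
        // false while all 3 concurrency slots are occupied.
        tbb::flow::function_node<int, int, tbb::flow::rejecting>
            worker(g, 3, [](int v) { return v * 2; /* stand-in for real work */ });

        for (int i = 0; i < 100; ++i) {
            // Naive "blocking" try_put: spin until a slot frees up. This burns
            // a thread instead of work-stealing, which is what I want to avoid.
            while (!worker.try_put(i)) { }
        }

        g.wait_for_all();
        return 0;
    }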

Thanks.

Posted: Friday, March 20, 2020 - 06:19

Function nodes and move


Hi everybody,

I'm planning to use flow graphs in an intensive data processing tool. Each node of the graph will be doing in-place computations on a large dataset. For example, a function_node could be taking a potentially large signal as an input and computing the FFT of this signal.

Performance is the main issue here. Therefore, I'd like my function_node input parameters to be movable, in order to avoid needless copies of my data. That does not seem to be possible, since operator() on a Body takes a const reference as its parameter.
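For illustration, here is the kind of workaround I'm considering: passing the large buffer through a shared_ptr so that edges copy only the pointer (just a sketch, not necessarily the intended idiom):

    #include <memory>
    #include <vector>
    #include "tbb/flow_graph.h"

    struct Signal { std::vector<float> samples; };

    int main()
    {
        tbb::flow::graph g;

        // Edges copy only the shared_ptr; the body mutates the payload in place.
        tbb::flow::function_node<std::shared_ptr<Signal>, std::shared_ptr<Signal>>
            fft(g, tbb::flow::unlimited, [](std::shared_ptr<Signal> s) {
                // compute the FFT of s->samples in place (placeholder)
                return s;
            });

        fft.try_put(std::make_shared<Signal>());
        g.wait_for_all();
        return 0;
    }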

Am I thinking about this wrong? Is there a better way to achieve what I want to do?

Thanks.

Posted: Friday, March 20, 2020 - 04:07

How can I determine if there are any active tasks?


I have some code into which I need to put a mutex. This code is called from both TBB code and non-TBB code, so I don't want to take the lock if I don't have to. I can't tell how to determine whether the TBB scheduler currently has any tasks assigned to it, though. I had hoped that tbb::task_scheduler_init::is_active() would do it, but it seems not: it's always true after some TBB code has run. Is this possible?
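This is roughly what I tried (a minimal sketch; is_active() reports whether the init object holds a scheduler reference, not whether any tasks are in flight):

    #include <tbb/task_scheduler_init.h>

    bool scheduler_seems_active()
    {
        // is_active() only says whether this object successfully initialized
        // (or joined) the scheduler, which is effectively always true once
        // any TBB code has run, so it doesn't answer my question.
        tbb::task_scheduler_init init;
        return init.is_active();
    }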

Posted: Friday, March 20, 2020 - 12:25

How can I specify one thread, multiple chunks for debugging?


I've got some crashes that I think are simply due to refactoring my code into multiple chunks. How can I run a TBB application where multiple chunks are created but only a single thread is used? This would let me reproduce problems in a debugger without also having to debug race conditions.
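Here is what I imagine the setup would look like, assuming task_scheduler_init(1) combined with a simple_partitioner gives one worker thread but many chunks:

    #include <tbb/task_scheduler_init.h>
    #include <tbb/parallel_for.h>
    #include <tbb/blocked_range.h>

    int main()
    {
        // Restrict the scheduler to a single thread...
        tbb::task_scheduler_init init(1);

        // ...while simple_partitioner with a small grainsize keeps splitting
        // the range into many chunks even though one thread runs them all.
        tbb::parallel_for(
            tbb::blocked_range<size_t>(0, 10000, /*grainsize=*/16),
            [](const tbb::blocked_range<size_t>& r) {
                for (size_t i = r.begin(); i != r.end(); ++i) {
                    // body under test
                }
            },
            tbb::simple_partitioner());

        return 0;
    }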

Posted: Friday, March 20, 2020 - 12:27

Repository change on Github


Hi!

I would like to know why the Github repository has been changed from https://github.com/intel/tbb to https://github.com/oneapi-src/oneTBB.

Also, the checksum for v2020.0 does not match between the new URL and the old one. If it's a snapshot of the same source code following the same tag, why is it different?

https://github.com/intel/tbb/archive/v2020.0.tar.gz SHA-256 8eed2377ac62e6ac10af5a8303ce861e4525ffe491a061b48e8fe094fc741ce9
https://github.com/oneapi-src/oneTBB/archive/v2020.0.tar.gz SHA-256 57714f2d2cf33935db33cee93af57eb3ecd5a7bef40c1fb7ca4a41d79684b118

Posted: Tuesday, March 24, 2020 - 06:25

VS upgrades, OpenCL, System Studio 2020, flow_graph_opencl_node.h


Hi,

I'm working on the simplest possible example of an OpenCL flow graph. Compilation fails when the compiler looks for async_msg, even with every OpenCL call commented out.

#define TBB_PREVIEW_FLOW_GRAPH_NODES 1
#define TBB_PREVIEW_FLOW_GRAPH_FEATURES 1

#include "tbb/flow_graph.h"
#include "tbb/flow_graph_opencl_node.h"

void testfunc()
{
    tbb::flow::graph g;
 
    bool has_been_run = false;

    //using buffer_t = tbb::flow::opencl_buffer<cl_char>;
}

Error        async_msg is not a template   my_prog    C:\Program Files (x86)\IntelSWTools\system_studio_2020\compilers_and_libraries_2020.0.166\windows\tbb\include\tbb\flow_graph_opencl_node.h    354    

... and many more.

If I comment out the header include (//#include "tbb/flow_graph_opencl_node.h"), my program compiles again.

Maybe the installation is incomplete. I was using VS 2019 16.4.5, but it said to upgrade because of an OpenCL requirement. That broke the Intel installations. After a few more runs of the System Studio 2020 installer, selecting everything "OpenCL", it stopped adding anything, meaning it considers the installation complete. Eventually I ran several Visual Studio extension installers (*.vsix) from a System Studio 2020 subfolder, and the Intel Compiler 19.1 and the Intel libraries (TBB, ...) were available again in the project properties.

Any ideas on what needs fixing? Files, paths, environment variables, compiler settings, ...?

PDF Documentation?


Hello,

I am searching for the PDF documentation (developer reference/guide) of TBB, but could only find the HTML/web version, while most of the other components of Parallel Studio (compilers, MKL, DAAL, IPP, ...) have PDF documents.

Some unofficial websites retain a copy of old PDF documentation (the TBB docs from C++ Composer XE 2011), but that is almost a decade old. Besides, some threads in this forum mention that a PDF version existed at one time, but those links are dead.

So, will any PDF documentation be available in the near future?

Nondeterministic processing order for function node with queueing policy


When a function node with the queueing policy receives input faster than it can handle, the next job it processes can come directly from the input even if its internal buffer isn't empty.

For example, the attached code processes a list of consecutive numbers in a simple TBB graph:

    tbb graph: source -> limiter -> func1 -> terminal
    (terminal is a function_node with the queueing policy, and is serial)

Result: the terminal node processes input out of order when multiple TBB threads are assigned.

Say:

1. the terminal node is processing input 1
2. func1 pushes input 2; now 2 is in the terminal node's internal buffer
3. func1 pushes input 3 and, at the same time, the terminal node finishes processing input 1
4. the next job the terminal node processes can be either 2 or 3

Since the function node has the queueing policy, items are always pushed rather than pulled. So when a job is done, how does the function node decide where to get the next job? It's not obvious to me from the source code: https://github.com/oneapi-src/oneTBB/blob/2019_U8/include/tbb/internal/_...

Questions:

• Is this a bug or expected behaviour?
• If it's expected, does it mean that I have to use a sequencer_node to guarantee the order?

Sample code result:

    ➜  bin ✗ ./functionNodeTester
    terminal node actual: 1713, expected:1712
    terminal node actual: 1714, expected:1713
    terminal node actual: 1712, expected:1714

Sample code:

        #include <iostream>
        
        #include "tbb/flow_graph.h"
        #include "tbb/task_scheduler_init.h"
        
        using namespace tbb::flow;
        
        struct TerminalNode_t {
            continue_msg operator()(int v)
            {
                if (v != counter)
                    std::cout << "terminal node actual: "<< v << ", expected:"<< counter << std::endl;
                counter++;
                return continue_msg();
            }
        private:
            int counter = 0;
        };
        
        static int const THRESHOLD = 3;
        static int const CYCLES = 10000;
        
        int main()
        {
            int count = 0;
            
            tbb::task_scheduler_init init(3);
        
            graph g;
            source_node<int> input(g,
                [&count](int& output) -> bool {
                    if (count < CYCLES)
                    {
                        output = count;
                        count++;
                        return true;
                    }
                    
                    return false;
                });
            
            limiter_node<int> l( g, THRESHOLD);
            function_node<int,int> func1( g, serial, [](const int& val){ return val; } );
            function_node<int, continue_msg> terminal( g, serial, TerminalNode_t() );
        
        
            make_edge( l, func1 );
            make_edge( func1, terminal );
            make_edge( terminal, l.decrement );
            make_edge( input, l );
        
            g.wait_for_all();
            return 0;
        }

         


Compiling with TBB is causing errors


Environment: Windows 10 Version 1909; tbb-2020.1-win; GCC 9.2.0 (MinGW)

Hello, I am trying to compile a TBB program with g++ that compiles fine in VS, so I know TBB is installed correctly. My tbb folder is in C:\

This is what my current Makefile looks like: https://www.codepile.net/pile/lDwJ445v

These are the errors I am getting. I am not sure whether I am linking the library incorrectly or it's something else.

        C:\Users\Owner\Desktop\New folder\OpenMP vs TBB>make
        g++  -IC:\tbb-2020.1-win\tbb\include -LC:\tbb\tbb\lib\intel64\vc14 -O3 -o pps -fopenmp -ltbb avl.o main.o parPlaneSweep.o
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x433f): undefined reference to `tbb::task_group_context::init()'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x435d): undefined reference to `tbb::internal::allocate_root_with_context_proxy::allocate(unsigned int) const'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x439d): undefined reference to `tbb::internal::get_initial_auto_partitioner_divisor()'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x43d0): undefined reference to `tbb::task_group_context::~task_group_context()'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x47d5): undefined reference to `tbb::task_group_context::init()'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x47f3): undefined reference to `tbb::internal::allocate_root_with_context_proxy::allocate(unsigned int) const'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x4835): undefined reference to `tbb::internal::get_initial_auto_partitioner_divisor()'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x4868): undefined reference to `tbb::task_group_context::~task_group_context()'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x7743): undefined reference to `tbb::internal::allocate_continuation_proxy::allocate(unsigned int) const'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x776c): undefined reference to `tbb::internal::allocate_child_proxy::allocate(unsigned int) const'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x7a66): undefined reference to `tbb::task_group_context::is_group_execution_cancelled() const'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x7abc): undefined reference to `tbb::internal::allocate_continuation_proxy::allocate(unsigned int) const'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x7ae5): undefined reference to `tbb::internal::allocate_child_proxy::allocate(unsigned int) const'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x8cab): undefined reference to `tbb::internal::allocate_continuation_proxy::allocate(unsigned int) const'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x8cd4): undefined reference to `tbb::internal::allocate_child_proxy::allocate(unsigned int) const'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x8f69): undefined reference to `tbb::task_group_context::is_group_execution_cancelled() const'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x8fb4): undefined reference to `tbb::internal::allocate_continuation_proxy::allocate(unsigned int) const'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text+0x8fdd): undefined reference to `tbb::internal::allocate_child_proxy::allocate(unsigned int) const'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text$_ZN3tbb4flow11interface105graphD1Ev[__ZN3tbb4flow11interface105graphD1Ev]+0x43): undefined reference to `tbb::interface7::internal::task_arena_base::internal_execute(tbb::interface7::internal::delegate_base&) const'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text$_ZN3tbb4flow11interface105graphD1Ev[__ZN3tbb4flow11interface105graphD1Ev]+0x4e): undefined reference to `tbb::task_group_context::is_group_execution_cancelled() const'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text$_ZN3tbb4flow11interface105graphD1Ev[__ZN3tbb4flow11interface105graphD1Ev]+0x73): undefined reference to `tbb::interface5::internal::task_base::destroy(tbb::task&)'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text$_ZN3tbb4flow11interface105graphD1Ev[__ZN3tbb4flow11interface105graphD1Ev]+0x87): undefined reference to `tbb::task_group_context::~task_group_context()'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text$_ZN3tbb4flow11interface105graphD1Ev[__ZN3tbb4flow11interface105graphD1Ev]+0xe3): undefined reference to `tbb::interface7::internal::task_arena_base::internal_terminate()'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text$_ZN3tbb4flow11interface105graphD1Ev[__ZN3tbb4flow11interface105graphD1Ev]+0xf1): undefined reference to `tbb::task_group_context::reset()'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text$_ZN3tbb4flow11interface105graphD1Ev[__ZN3tbb4flow11interface105graphD1Ev]+0x10b): undefined reference to `tbb::interface7::internal::task_arena_base::internal_initialize()'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text$_ZN3tbb4flow11interface105graphD1Ev[__ZN3tbb4flow11interface105graphD1Ev]+0x12e): undefined reference to `tbb::task_group_context::reset()'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text$_ZN3tbb4flow11interface105graphD0Ev[__ZN3tbb4flow11interface105graphD0Ev]+0x44): undefined reference to `tbb::interface7::internal::task_arena_base::internal_execute(tbb::interface7::internal::delegate_base&) const'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text$_ZN3tbb4flow11interface105graphD0Ev[__ZN3tbb4flow11interface105graphD0Ev]+0x4f): undefined reference to `tbb::task_group_context::is_group_execution_cancelled() const'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text$_ZN3tbb4flow11interface105graphD0Ev[__ZN3tbb4flow11interface105graphD0Ev]+0x74): undefined reference to `tbb::interface5::internal::task_base::destroy(tbb::task&)'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text$_ZN3tbb4flow11interface105graphD0Ev[__ZN3tbb4flow11interface105graphD0Ev]+0x88): undefined reference to `tbb::task_group_context::~task_group_context()'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text$_ZN3tbb4flow11interface105graphD0Ev[__ZN3tbb4flow11interface105graphD0Ev]+0xf3): undefined reference to `tbb::interface7::internal::task_arena_base::internal_terminate()'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text$_ZN3tbb4flow11interface105graphD0Ev[__ZN3tbb4flow11interface105graphD0Ev]+0x101): undefined reference to `tbb::task_group_context::reset()'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text$_ZN3tbb4flow11interface105graphD0Ev[__ZN3tbb4flow11interface105graphD0Ev]+0x11b): undefined reference to `tbb::interface7::internal::task_arena_base::internal_initialize()'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text$_ZN3tbb4flow11interface105graphD0Ev[__ZN3tbb4flow11interface105graphD0Ev]+0x13e): undefined reference to `tbb::task_group_context::reset()'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text.unlikely+0xf3): undefined reference to `tbb::internal::allocate_root_with_context_proxy::free(tbb::task&) const'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text.unlikely+0xff): undefined reference to `tbb::task_group_context::~task_group_context()'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.text.unlikely+0x118): undefined reference to `tbb::internal::allocate_root_with_context_proxy::free(tbb::task&) const'
        c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: parPlaneSweep.o:parPlaneSweep.cpp:(.rdata$_ZTVN3tbb10interface98internal9flag_taskE[__ZTVN3tbb10interface98internal9flag_taskE]+0x14): undefined reference to `tbb::task::note_affinity(unsigned short)'
        collect2.exe: error: ld returned 1 exit status
        Makefile:10: recipe for target 'pps' failed
        make: *** [pps] Error 1

Thread ID as index from zero to thread number


Dear TBB experts,

I am designing an efficient parallel algorithm using TBB. I would like to prepare some pool data (a std::vector container) for each thread in an RAII (construction) phase, and have each thread access its own element by its thread ID. Currently I initialize the scheduler as tbb::task_scheduler_init init(thread_num), prepare std::vector<PoolDataStructure> _pool(thread_num), and then, for indexing, use the return value of tbb::task_arena::current_thread_index() as the thread ID.
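Concretely, the scheme looks roughly like this (a simplified sketch; PoolDataStructure and the loop body are placeholders):

    #include <vector>
    #include <tbb/task_scheduler_init.h>
    #include <tbb/task_arena.h>
    #include <tbb/parallel_for.h>

    struct PoolDataStructure { std::vector<double> scratch; };

    int main()
    {
        const int thread_num = 4;
        tbb::task_scheduler_init init(thread_num);
        std::vector<PoolDataStructure> _pool(thread_num);  // one slot per thread

        tbb::parallel_for(0, 1000, [&](int i) {
            // Index the per-thread pool by the calling thread's arena slot.
            int id = tbb::task_arena::current_thread_index();
            _pool[id].scratch.push_back(i);  // assumes no two threads share a slot
        });
        return 0;
    }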

         

Seemingly it works well, but I wonder whether this is actually correct per the specification. (When I run the program built this way, I get different numerical results between serial and parallel runs. I am afraid there is a data race due to the above implementation, or something else.)

I would appreciate it if you could look into this.

         

Is a thread-local storage data structure constructed in advance assigned one-to-one per thread?


Dear TBB experts,

Relating to my previous post, "Thread ID as index from zero to thread number": I would like to prepare a data pool (cache data) for each thread (or task?) in an RAII (construction) step before massive iterative computations, so as to avoid data races between the running threads (or tasks?). Currently I prepare it as a std::vector with one element per thread.

As an alternative, I suppose thread-local storage (tbb::combinable or tbb::enumerable_thread_specific) would work well, but I do not know whether instances prepared before the parallel loop are assigned and referred to in one-to-one correspondence with threads.

Here is what I mean, in pseudocode (PoolData and Element are placeholders):

    tbb::enumerable_thread_specific<PoolData> tls_data;

    // data pool is prepared in an RAII step
    tbb::parallel_for_each(preparator.begin(), preparator.end(),
        [&](Element& e) {
            // HERE, tls_data.local() is created once per thread (or task?)
            tls_data.local().prepare(e);
        });

    // some massive computation step
    tbb::parallel_for_each(preparator.begin(), preparator.end(),
        [&](Element& e) {
            // HERE, is it guaranteed that tls_data.local() returns the same
            // per-thread instance prepared above, without a data race?
            tls_data.local().use(e);
        });

I would appreciate it if you could answer my question.

Kind regards

set initial capacity of concurrent_hash_map


Does anybody know of a concrete example of setting the initial capacity of a concurrent_hash_map? I found recommendations to do this to improve performance, but looking at the members of the class it seems that you also need to supply a memory allocator.
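For what it's worth, this is what I have pieced together so far (a sketch based on my reading of the header; if the allocator parameter is defaulted, it should not need to be spelled out):

    #include <string>
    #include <tbb/concurrent_hash_map.h>

    int main()
    {
        // Capacity hint via the constructor; the allocator argument appears
        // to be defaulted, so it can be omitted.
        tbb::concurrent_hash_map<int, std::string> map(1 << 16);

        // Alternatively, reserve buckets after construction.
        tbb::concurrent_hash_map<int, std::string> map2;
        map2.rehash(1 << 16);

        return 0;
    }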

Thanks

Pipeline parallelism in TBB flow graph


Hello,

I am working on an image processing application built with tbb::flow_graph. The input comes from a video file or a camera. Each image processing node is wrapped in a multifunction_node (I need multifunction_node's ability to selectively stop propagating graph messages). Currently I call graph.wait_for_all() after feeding each input frame into the root node, but I would like to take advantage of pipeline parallelism; i.e., if node A is connected to node B, I would like node A to start working on its next input as soon as it has produced its output for B, instead of waiting for B to finish.
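To make the current pattern concrete (names here are illustrative), this is roughly what I do now; the per-frame wait_for_all() is what serializes the frames:

    #include <tbb/flow_graph.h>
    using namespace tbb::flow;

    void process_video(graph& g, multifunction_node<int, tuple<int>>& root)
    {
        for (int frame = 0; frame < 100; ++frame) {
            root.try_put(frame);
            g.wait_for_all();  // blocks pipelining across frames
        }
    }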

Could you please provide pointers on how to do this efficiently and idiomatically? I am new to TBB and feel I might be missing something obvious.

Thanks

From Cilk to Intel TBB


Hi,

I am a big fan of Cilk, but it is dead, so now I want to use TBB instead (so I am new to TBB). I rely on the function

    __cilkrts_get_worker_number()

Therefore, I created a global arena as follows:

         

        tbb::task_arena *global_arena = NULL;

        int tbb_setnumworkers(integer32 numworkers)
        {
          int ok = 1;

          if ( global_arena==NULL )
            global_arena = new tbb::task_arena(numworkers);

          return ok;  /* report status to the caller */
        }
        

and then my parallel fors look like:

        (*global_arena).execute([=](){ tbb::parallel_for(ufirst,ulast,ustep,[=](uinteger32 i){(*f)(i,arg);}); });

Now, when I need a thread ID, I do:

         

        integer32 tbb_workerid(void)
        {
          if ( global_arena==NULL )
            return ( 0 );
          else
            return ( tbb::this_task_arena::current_thread_index() );
        } /* tbb_workerid */

It mostly works. However, it seems that when tbb_workerid is called from a thread other than the one that created global_arena, the index is -1. At least, I get -1 in some cases.

Any suggestions for fixing this issue of the thread index being -1 in some cases? Or is it impossible?

         

Thanks.

Erling

Waiting for multiple communicating graphs executed in different arenas


Hello Community,

A) Problem description:

I'm trying to build an application with multiple flow_graphs that communicate with each other via try_put messages. Each graph should be executed in a separate arena mapped to a specific NUMA node (that is, only cores belonging to that NUMA node should work on the graph). It is not possible to know at compile time which graph finishes last. Since a graph may be idle for a short period until it receives another try_put message from another graph, graph.wait_for_all() can return before execution has actually ended.

B) What I've tried so far:

1. I created the arenas like this:

    arenas[i]->initialize(tbb::task_arena::constraints(numaIDs[j],4));

2. I call graphs[i].reset() inside the corresponding arena to rebind the graph to that arena.

3. I start graph execution like this:

    arenas[i]->execute([&task_groups,i,descr] {
        task_groups[i].run([i,descr]{
            descr.start_nodes[i]->try_put(continue_msg());
        });
    });

4. //TODO: Wait until all graphs have finished

I messed with task_group.wait() and graph.wait_for_all() but didn't succeed. All of my attempts so far either produced deadlocks or did not block until all graphs had finished (resulting in segfaults).
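For example, one attempt looked roughly like this (simplified; it can still return while another graph is about to send a cross-graph message, so it does not solve the problem):

    for (std::size_t i = 0; i < arenas.size(); ++i) {
        arenas[i]->execute([&, i] {
            task_groups[i].wait();     // wait for the seeding task
            graphs[i].wait_for_all();  // then drain this graph's tasks
        });
    }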

C) So my questions are:

1. Is there a way to wait until all nodes of all graphs have executed?

2. Is there a way to map graph nodes to specific cores / NUMA nodes other than representing the graph as multiple subgraphs communicating with each other?

3. Suppose I used an atomic counter in the node bodies and checked it against the total number of nodes (which would effectively detect whether the current node is the last one across all graphs): is there a way to use this to signal a waiting thread while still allowing that waiting (master) thread to join the arena/graph execution?

         

I know that this is a very specific issue, but any help would be highly appreciated.

         

D) Here is an example and some more details of my application that may or may not help:

I have a graph representing the dependencies between a source vector and a destination vector in sparse matrix-vector multiplications. Say I want to execute the upper part of the graph on NUMA node 0 and the lower part on NUMA node 1, with dependencies between the two parts.

I divide the graph into two subgraphs (TBB flow_graphs with continue_nodes), upper and lower, with edges for dependencies within each subgraph and explicit try_put(continue_msg()) calls after the execution of a node body for all dependencies between the subgraphs.

I start the graphs with a call to one broadcast_node per subgraph, which forwards a continue_msg to all nodes of the first level of that graph (which is the first multiplication).

         

E) Best regards


Enqueued tasks never get picked up


I have the following pattern:

    tbb::parallel_for(tbb::blocked_range<size_t>(0, max),
        [&](const tbb::blocked_range<size_t>& r) {
            for (size_t i = r.begin(); i != r.end(); ++i) {
                ctx[i] = std::make_unique<tbb::task_group_context>();
                waitTask[i] = new (tbb::task::allocate_root(*ctx[i])) tbb::empty_task;
                waitTask[i]->set_ref_count(2);  // one child + the wait itself
                auto& workerTask = *new (waitTask[i]->allocate_child()) WorkerTask(name);
                tbb::task::enqueue(workerTask);
                waitTask[i]->wait_for_all();    // wait for the child to complete
            }
        });

I get a hang with all threads in receive_or_steal_task, but the execute() of WorkerTask is never called.

Things work fine if I wait on all the waitTasks outside of the parallel_for, in a different parallel_for, OR if I use a sequential for loop in place of the parallel_for in the example above.

         

warning C4296: '>': expression is always false


Hi,

Our application treats warnings as errors. We are seeing the warning below on Windows with Visual Studio 2017 after upgrading to TBB 2020. It comes from the TBB "concurrent_unordered_set.h" header. The program below reproduces the warning.

Error details:

    tbb\concurrent_unordered_set.h(229): error C2220: warning treated as error - no 'object' file generated
    tbb\concurrent_unordered_set.h(235): note: see reference to alias template instantiation 'cu_set_t<tbb::interface5::concurrent_unordered_set,std::iterator_traits<_Iter>::value_type,>' being compiled
    tbb\concurrent_unordered_set.h(229): warning C4296: '>': expression is always false

         

        #include <tbb/concurrent_unordered_set.h>
        
        int main() 
        {
            return 0;
        }

         

TBB compile error: iterators.h(246) (does not build with C++20)


    1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.1.216\windows\tbb\include\tbb/iterators.h(246): error : namespace "std" has no member "result_of"
    1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.1.216\windows\tbb\include\tbb/iterators.h(246): error : expected an identifier
    1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.1.216\windows\tbb\include\tbb/iterators.h(246): error : expected a ";"
    1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.1.216\windows\tbb\include\tbb/iterators.h(260): error : identifier "reference" is undefined
    1>C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.1.216\windows\tbb\include\tbb/iterators.h(263): error : identifier "reference" is undefined

Windows 10 with VS2019

        Microsoft Visual Studio Professional 2019
        Version 16.6.0
        VisualStudio.16.Release/16.6.0+30114.105
        Microsoft .NET Framework
        Version 4.7.03056

        Installed Version: Professional

        Visual C++ 2019   00435-60000-00000-AA567
        Microsoft Visual C++ 2019

        Intel® C++ Compiler   Package ID: w_comp_lib_2020.1.216
        Intel® C++ Compiler – extension version 19.1.0.16, Package ID: w_comp_lib_2020.1.216, Copyright © 2002-2020 Intel Corporation. All rights reserved.
        * Other names and brands may be claimed as the property of others.

        Intel® Performance Libraries   Package ID: w_comp_lib_2020.1.216
        Intel® Performance Libraries – extension version 19.1.0.16, Package ID: w_comp_lib_2020.1.216, Copyright © 2002-2020 Intel Corporation. All rights reserved.
        * Other names and brands may be claimed as the property of others.

        Intel® Visual Fortran Compiler   Package ID: w_comp_lib_2020.1.216
        Intel® Visual Fortran Compiler - extension version 19.1.0055.16, Package ID: w_comp_lib_2020.1.216, Copyright © 2002-2020 Intel Corporation. All rights reserved.
        * Other names and brands may be claimed as the property of others.
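My guess at the root cause (an assumption on my part, not taken from the TBB sources): std::result_of was deprecated in C++17 and removed in C++20, and iterators.h(246) still uses it. A minimal illustration of the language change:

    #include <type_traits>

    int f(int);

    // OK up to C++17, fails under -std=c++20 / /std:c++latest:
    // using R_old = std::result_of<decltype(&f)(int)>::type;

    // C++20 replacement:
    using R_new = std::invoke_result<decltype(&f), int>::type;  // R_new is int

    int main() { return 0; }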

         

Unexpected task_group behavior


I'm using elementary task_group functionality but am seeing unexpected behavior, which is turning out to be problematic. To summarize, here is a code snippet:

    #include <iostream>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <tbb/task_group.h>

    // This function is executed on the main application thread.
    void Foo(tbb::task_group &tg)
    {
        std::cout << "Main thread id is " << syscall(SYS_gettid) << std::endl;

        tg.run([&]() {
            std::cout << "Spawned worker, thread id is " << syscall(SYS_gettid) << std::endl;

            // Do a whole bunch of work using the TBB thread pool.
        });

        // Do a whole bunch of work which has to be completed on the main thread.

        tg.wait();
    }

From the TBB task_group documentation:

    template<typename Func> void run( Func&& f )  -- Spawns a task to compute f() and returns immediately.

My expectation is that the main thread won't get involved with the work inside the lambda, and so can be used to execute, in parallel, other tasks which can only be processed by the main thread.

This is usually the case, but on some executions I observe the following:

    Main thread id is 209946
    Spawned worker, thread id is 209946

which is problematic, since the main thread is now tied up with work I'd prefer it not to be doing.

As I mentioned, this happens only seldom. Is this expected behavior (it seems to contradict the documentation)? If so, would anyone have any suggestions for how to prevent it? Perhaps there are other idioms for robustly accomplishing what I need...

Thanks in advance,
Mark
