Clarification on apply operator %


#1

I hit something unexpected today. The REPL code below illustrates it (it must be run while simulating). In a nutshell, I’m doing something like:
branch[VH_MythicoBase.instances%_vehicle_physics]

Once any object of type VH_MythicoBase is destroyed, all _vehicle_physics coroutines are aborted. However, if I launch with:
VH_MythicoBase.instances.do[branch[item._vehicle_physics]]

then things work as expected: if one vehicle goes away, the other vehicles’ physics coroutines continue running. Can someone provide some clarity on exactly why this happens when using the synchronize operator?

!a : true
!b : true
!things : {a b}

// I would expect this branch to last until both things are false
!co : branch
[
  things%_wait_false
]

// Fire off a parallel debug print loop to check status
branch
[
  loop
  [
    println("Original branch still valid: ", co.valid?)
    [exit] when not co.valid?
    _wait(0.5)
  ]
  println("Original branch invalid")
]

// Kill a after 2 seconds, expectation is that b continues on
_wait(2)
println("Killing coroutines on a")
a.abort_coroutines_on_this

// Destroy entire co after 5 seconds to avoid mishaps with endless coroutines in the REPL
_wait(5)
println("Destroying all co's")
co.abort

#2

This is easy enough to explain. :madsci:

abort_coroutines_on_this() takes a success? argument that indicates whether the abort should be considered a successful completion or an error condition.

  • false - abort is due to an error so stop this invoked routine and propagate the error up the call stack. This is the default value for the argument.
  • true - indicates that the invoked routine is stopping though things are otherwise not in a bad state so notify any waiting routines that this routine has completed successfully.

If a.abort_coroutines_on_this(true) is used instead, it gives the behavior that you expected.
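
Applied to the REPL example from the first post, only the abort line changes. This is a sketch that reuses the names from that example and has not been run here:

```
// Kill a after 2 seconds, reporting the abort as a successful completion
// so that the apply expression keeps waiting on b
_wait(2)
println("Killing coroutines on a")
a.abort_coroutines_on_this(true)
```

With this change the debug loop should keep reporting the original branch as valid until co.abort runs.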

It is debatable whether the default should be false or even if it should have a default. Also I can see how you would be confused about this argument since it is not commented. We will describe it clearly for the next update.

Does this make sense?


Extra notes

The % in obj%routine is called the apply operator.

If a call to branch is on a single expression, it does not need to be wrapped in a code block, though it doesn’t hurt.

// This:
!co : branch things%_wait_false

// Is just as good as:
!co : branch
[
  things%_wait_false
]

Did you know that unless is the negated version of when? The Conditionals reference page in the online docs goes into more detail.

// This:
[exit] when not co.valid?

// Is the same as:
[exit] unless co.valid?

#3

This does make sense. I see that USkookumScriptClassDataComponent::delete_sk_instance calls abort_coroutines_on_this with the default argument. So in my real-world example, where I’m launching a bunch of coroutines using the apply operator and then one of my actors gets deleted, the default behavior is that all of the other coroutines also exit when any actor gets destroyed. I can see where this would be useful, but in my case it manifested as a head-scratching bug.

I’m wondering if the case for the default being true is stronger than the false case. Take the classic :sk: pathing demo, where you want all robots to path to the player and boom: if one robot is killed, all of them would stop pathing. Or perhaps this is a special case due to the actor getting destroyed. I suppose I could add a call to abort_coroutines_on_this(true) in my death method to pre-emptively stop coroutines gracefully before delete_sk_instance is called during cleanup.

:cool:

I did not know this!


#4

Right! I was focused on your example code.

This is a tricky problem in that sometimes it seems that the abort indicates a failure and sometimes it is a success. An additional potential state could be suspended (essentially indefinitely).

My thinking is that the most common correct behavior for coroutines running on a destructed object is to consider them either a failure or suspended.

Consider the following example:

guy1._path_to_actor(trigger_spot)
reached_spot_matinee.play

Suppose that guy1 is destructed for some reason midway through the pathing. Here are the possibilities:

  • success - This would consider the pathing completed and run the next line. Since the pathing didn’t complete, this does not seem to be the correct choice.
  • failure - This would consider the pathing impossible to complete and it unwinds the stack telling each dependent routine that it is out of luck and it won’t complete either. All the dependent routines are properly cleaned up. This may be correct in many cases though it might be overkill. This is very similar to throwing an exception. There is currently no catch equivalent, though something like that might be possible so that some fail behavior could be run.
  • suspend - This would consider the pathing never completed though it would either not notify any dependent calls or it would notify them that this routine will never return and act accordingly. Some calls such as race will happily ignore this and just wake when some other routine completes whereas regular serial calls and sync must also wait forever. This could be the correct choice depending on the waiting calls and how any such “wait forever” calls are eventually cleaned up: destructed objects, cleared mind objects, new missions or levels, etc.

This is also the case with multiple sub-coroutines:

!guys: {guy1 guy2 guy3 guy4}
guys%_path_to_actor(trigger_spot)
reached_spot_matinee.play

If one or more of the guys don’t get to the spot, that line never really completes. Obviously you could have something more sophisticated that checks for a certain threshold number of guys before reached_spot_matinee is triggered, but the question here is what the most common correct behavior is.

Your suggested calling of abort_coroutines_on_this(true) by hand in cases where you want it sounds like it might be the way to go. Alternatively there could be a flag that indicates the desired behavior (fail, succeed, suspend) for a particular aborted coroutine or all dependent coroutines on object or mind destruction.

The apply operator % is a single expression that keeps track of all its simultaneously running sub-coroutines, so it is notified if a sub-coroutine fails or succeeds. Conversely, the branch that you used in your example, which iterates over the list using do, severs the link between the calling context and the sub-coroutines. branch says run this and I don’t care how long it takes or what happens to it - though, as you noted, you can keep the invoked coroutine around and test whether it is still valid or not.
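
The two launch styles can be put side by side using the pathing example from above. This is a sketch only: the names guys, trigger_spot and reached_spot_matinee come from the earlier snippets, and the comments restate the behavior described in this thread rather than anything verified here.

```
!guys: {guy1 guy2 guy3 guy4}

// Apply: one expression that tracks every sub-coroutine.
// If any guy's pathing is aborted as a failure, the whole expression
// is aborted and the matinee line never runs.
guys%_path_to_actor(trigger_spot)
reached_spot_matinee.play

// do + branch: each pathing coroutine is detached from the caller.
// Aborting one does not affect the others, but the caller
// also no longer waits for any of them to finish.
guys.do[branch[item._path_to_actor(trigger_spot)]]
```

The trade-off is that the second form gives up the "wait until everyone arrives" behavior in exchange for isolation between the sub-coroutines.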

Still with me? :madsci:

As I say, this is a very tricky area - one that is commonly experienced with pathfinding which inherently takes time and often fails or is aborted before it completes.

If anyone has any further ideas please speak up. I would love more thought in this area. :thinking:


#5

This has been very helpful to me. I can see why it is a complicated decision. This has clarified a feature of the language that I have so far been completely ignorant of.

race/sync
[
    _co1
    _co2
]
println("We finished, but not if _co1 or _co2 were aborted!")

This is going to make me re-think some things, because there have been places where I’ve tried to handle fallout logic but was confused as to why it never ran.

Now that I actually understand this as well as the abort_coroutines_on_this parameter, I feel like the default behavior makes sense.

One last clarification on this. I’ve always thought of the apply operator % as a shortcut for using sync on a list. And the apply race operator %> as a shortcut for using race on a list. From my testing, the operators seem to be functionally equivalent to their named counterparts even in the area of sub-coroutine failure behavior as discussed above. Is this true? Or are there other differences between the operators and the named counterparts (assuming that we are only talking about using them on coroutines rather than methods)?
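
To make the comparison concrete, these are the two forms I mean, reusing the names from the pathing example earlier in the thread. Whether they are fully equivalent is exactly the open question, so treat this as a sketch of the shapes only:

```
!guys: {guy1 guy2}

// Apply operator on the list
guys%_path_to_actor(trigger_spot)

// The same calls spelled out with sync, one per element
sync
[
  guy1._path_to_actor(trigger_spot)
  guy2._path_to_actor(trigger_spot)
]
```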


#6

Something I’m still foggy on is coroutine scope and how this is determined. Can someone tell me what the correct coroutine scope is for the table below?

Code                        Executed From   Scope?
this._co                    A               A?
branch this._co             A               A?
@some_class_a._co           B               A or B?
branch @some_class_a._co    B               A or B?
@list_of_a%_co              C               A or C?