Search the Asterisk Blog

How to Contribute to Asterisk: Part Three

By Matt Jordan

In the previous post on contributing to Asterisk, we set up the Asterisk Test Suite and wrote a test for the CDR  dialplan function that reproduced the bug in ASTERISK-25179 and failed.

In this post, we will:

  • Figure out where the bug is in the code base
  • Write a patch that fixes it, and passes our previously written test
  • Contribute the patch back to the project

Step 7: Figure out what is going on

First, take a deep breath.

Then another one.

It’s okay. We’re going to look at C code. We’re even going to change some of it.

We can do this!

But before we just dive right in and start hacking away, let’s try to figure out what is going on here. To start, fire up your favourite code editor, and open funcs/func_cdr.c . You’ll be greeted by the standard preamble in all Asterisk files, followed by a bunch of stuff – XML snippets, interesting macros, etc. – that are interesting but aren’t germane to our problem. We want to find where duration  and billsec  are set. Let’s look for that.

If I do a search for billsec  in the file, my first couple of hits are in XML documentation comments. Let’s skip those and keep looking. If I keep going, I’ll eventually find this bit of code:

Alright! This looks promising. We’re testing for a flag called OPT_FLOAT  (yup, that’s our ‘f’ parameter), as well as if the name of args.variable  is either billsec  or duration  (yup, that’s our attribute we passed to the dialplan function). If it is, we scan the value out of tempbuf  and store it as a long integer in ms . We then divide that number by 1000, cast it to a double, and place it back into tempbuf . There’s a few interesting points to all of this:

  1. Hello bug. Clearly that divide by 1000 is why our value is so funky. The author *, in a fit of abject carelessness, apparently thought the value in tempbuf  was going to be in milliseconds. In reality, the value is stored in seconds – which explains why a duration / billsec  of 2 ends up as 0.002.
  2. The dialplan function isn’t actually calculating the duration or billing time here. Instead, we have a buffer, tempbuf , that already has our value in it. We should probably try and figure out how that gets populated.
  3. While it is trivial to simply remove the “divide by 1000” line, because tempbuf  already has our value in it, if it isn’t already a floating point value representing the exact time delta between start  and answer  or start  and end , that won’t actually fix our bug. Remember, the f  option should:

    Returns billsec or duration fields as floating point values.

So let’s go find out where tempbuf  is coming from.

* Hint: Remember, it was me.

If we scroll up a bit in func_cdr , we’ll find the following code:

This looks promising, although we have two functions – ast_cdr_format_var  and ast_cdr_getvar  – that are filling our buffer tempbuf . The first is only called if the channel being operated on doesn’t have a name, which the code later tells us is a dummy channel. The second is called if we apparently have a channel name. While we can assume that our test, which calls the CDR  dialplan function on a Local channel, would thus be calling ast_cdr_getvar , let’s look at cdr.h  to find out what these two functions do:

So that’s helpful:

  • ast_cdr_format_var  is called if we already have a CDR that has been dispatched out to the backends, which is why its first parameter is a pointer to a struct ast_cdr .
  • ast_cdr_getvar  retrieves a CDR variable from a channel’s current CDR.

This is an important distinction. The CDR engine ( main/cdr.c ) maintains the active CDRs for a channel. In the CDR engine, the CDRs are represented by a chain of struct cdr_object  instances. When the CDRs are dispatched out to the CDR backends – such as cdr_custom  or cdr_adaptive_odbc  – the struct cdr_object  instances are converted into a chain of struct ast_cdr  instances. This means we’ve got two records that maintain the duration and billing time – and two different types of objects we have to query when we want those values. The first function, ast_cdr_format_var , is used when a CDR backend accesses properties on a struct ast_cdr  using func_cdr  on a dummy channel; the second, ast_cdr_getvar , is used when the dialplan accesses properties on a struct cdr_object  using func_cdr  on an active channel.

Interestingly enough, in the bug report, the issue reporter found this issue while attempting to use high precision duration and billing times from the cdr_custom  backend. In that case, they would be accessing the values through ast_cdr_format_var . Hence, for us to fix the bug, we need to fix it for both the dispatched CDRs ( struct ast_cdr  instances) as well as for the run-time CDRs ( struct cdr_object  instances).

Ideally, the struct ast_cdr  and struct cdr_object  instances would be able to provide us the high resolution time themselves, and the API calls would simply provide a mechanism for func_cdr  to ask for that. Unfortunately, that isn’t the case.

For struct ast_cdr  – which is publicly defined in cdr.h  – we can see that the duration  and billsec  values are stored in seconds:

For struct cdr_object  – which is a private struct  defined in cdr.c  – we can see that duration  and billsec  are not actually stored on the object. Instead, the start , answer , and end  times are stored:

If we search a bit in main/cdr.c , we can see however that these times are calculated in cdr_object_get_duration  and cdr_object_get_billsec . In both cases, we can observe that the precision of the times calculated in these functions is originally milliseconds, but is dropped to seconds:

What does all that mean?

As it exists today, the API provided by the CDR engine cannot provide us the duration or billing time in anything other than seconds. Drat.

Based on that, we can roughly map out two ways to solve this problem:

  1. Solve it completely in func_cdr . That would mean retrieving the start , answer , and/or end  times, and re-calculating the duration  or billsec  when the f  option is used to get a higher precision than what is available either on the struct ast_cdr  instances or what is calculated internally in the CDR engine on struct cdr_object  instances. The advantage of this approach is that we can completely solve this bug only in a single module ( func_cdr ), the downside is that we have to duplicate duration  and billsec  logic within func_cdr , and we have to query and interpret the various time values from strings to time objects.
  2. Update the CDR objects – struct ast_cdr  and struct cdr_object  – to track high precision time. Using the f  option would then simply report the raw values. Anything that wants the time value in seconds would have to take the high resolution time and convert it to a lower resolution time – you can go down, but you can’t go back up. In many ways, this approach is ideal, as it keeps all of the calculation of billing time and duration in the engine, which is where it belongs. Unfortunately, there’s a number of downsides. First, it means we would break API compatibility with the existing backends. We could mitigate that somewhat by caching two different duration/billing time values on the objects – one high precision, one the existing lower precision values – but this also has its own obvious downsides. Keeping multiple values in sync requires careful consideration, and it would still require updating every backend that can report high precision values. Second, there would be quite a few changes that would have to be made in both the CDR APIs, as well as the internal calculation of times, in order to provide higher precision times. This increases the risk that we introduce a regression while fixing this bug.

Ideally, when we made the major CDR refactoring, we would have gone ahead and just tracked the CDR times using a higher precision. At the time, we tried to minimize changes to the CDR backends, which is also why we ended up with two different types that track CDRs – struct ast_cdr  and struct cdr_object . This wasn’t a terrible decision; we made it a lot easier to port over all of the various CDR backends, and minimized the impact of the engine overhaul on third party module writers. At the same time, this bug is the result of that refactoring. Still, since I’d like to minimize the risk of regression, I’m going to go with fixing this bug using solution #1.

Note: This bug highlights something that happens quite frequently in mature projects: the trade-off between fixing something in a comprehensive but intrusive fashion, versus fixing the actual reported bug. While we could certainly modify the CDR engine and the definition of struct ast_cdr  to provide high resolution time, there’s no way that this doesn’t impact existing modules. If you’re a developer, and you’ve written your own CDR backend, our changes would almost certainly impact you. That’s not good, particularly when we’re fixing a bug in an existing Long Term Support release.

We could go ahead and fix the bug using option #1 in 13 and option #2 in master. Sometimes, that’s the approach that maintainers of the project would advocate. In this case, it feels like overkill. While it’s going to be somewhat annoying to re-retrieve and re-construct the start / answer / end  times out of the various API calls, it beats breaking a lot of things for a lot of people, and making it difficult to upgrade.

Being nice to other developers sometimes trumps the “pure” solution.

Step 8: Fix the bug

Okay, let’s fix this thing.

Since we’re going with option #1, to fix this solely in func_cdr , we know we’re going to have to:

  1. Re-extract the start / answer / end  times from either ast_cdr_format_var  or ast_cdr_getvar
  2. Calculate the high precision duration / billsec  values
  3. Place the resulting value back in tempbuf

To start, let’s write a function in func_cdr  that extracts out a time value. This function is going to mimic the existing code that extracts out a parameter from the CDR engine, given a channel and the name of an attribute to extract. We can add something like the following:

Since we’ll be mimicking the existing code that extracts parameters out of a CDR, we’ll need a char  buffer to fill up with the value, a char *  that may point to the value, and a struct timeval  instance to populate. Since our new function, cdr_retrieve_time , knows that it will only be extracting valid time-based parameters, we can use a smaller buffer than normal, and skip some of the error checks that are necessary with user provided input:

Now that we have the requested time value in tempbuf , we can pull it back out into a struct timeval  instance:

Okay! Since we now have a function that can extract times out of the CDR at will, we can update our handling of the f  option. First, let’s get rid of some of the existing code:

In the case of billsec , we want the difference between the end  time and the answer  time; with duration , we want the difference between the end  time and the start  time. We’ll use two struct timeval  instances for this: start  and finish , where start  is either start  or answer , depending on what we’re extracting:

Unfortunately, the only time that is guaranteed to be set is the start  time. In the case of the end  time, if it isn’t set, we should just use the current time, provided by ast_tvnow() , for our calculations. If the answer  time isn’t set (or if somehow we don’t have a start  time, which would clearly be weird), we should just set the time to 0.0  – that is, there is no duration  or billing  time. This means we need to store the time delta in something, and handle each case by looking at the result of cdr_retrieve_time :

Since our resulting high precision time needs to be output as a decimal, we’ve gone ahead and added double delta  for this.

Of course, if we have a start  time, we should probably calculate something! To get the high precision time delta, we can use ast_tvdiff_us , which returns the value in microseconds. Since the output of this should still be in seconds – with the resulting remainder – we’ll divide this by 1000000  and store the result in delta.

Finally, we need to put the new value back in tempbuf :

And that’s our code change!

Let’s make sure things still compile:

And let’s go ahead and install our modified func_cdr  as well:

Let’s see if our test passes now. Go back to our checkout of the Asterisk Test Suite, and re-run the test:

Woohoo!

Let’s go ahead and commit our patch. Going back to the Asterisk checkout, run the following:

This should bring up your favourite code editor. My commit message looks like the following:

Note that since this patch fixes the bug, we use the smart commit tag of #close  with the JIRA issue number. This will cause the issue to be automagically closed when the patch is committed.

Patch written; bug fixed!

Step 9: Submit a patch to Gerrit

It’s code review time!

But first, a word on code reviews.

In conversations with some people, I’ve found that a number of folks don’t always enjoy code reviews. Generally, this seems to boil down to one of two reasons:

  1. The author has, shall we say, a rather high opinion of their capabilities, and does not feel that their code needs to be reviewed.
  2. The author does not have a rather high opinion of their capabilities, and is terrified of what others might say or think.

To the first point, I can only point out that no one is perfect, and as such, everyone’s code needs to be reviewed. The project is more important than any one person, including me.

To the second point, don’t worry! Everyone, at some point in time, wrote their first C code. Everyone, at some point in time, had that code looked at by someone else. Code reviews are part of how we all learn to be better programmers.

So, how do we submit our patch to Gerrit? While we can do it manually using a variant of git push , there’s a super handy command line tool that will do this for us, git review . Install that first:

Once installed, we can go to our particular cloned repositories, and submit the patches for review. First, let’s start with our Asterisk patch. We’ll want to pass a few command line arguments to git review:

  • A ‘topic’, that reflects the JIRA issue number. This will tie our review back to the JIRA issue.
  • The mainline branch that the review is based on. In our case, that’s ’13’.

With that up for review, we should now post our corresponding Test Suite test. Moving back to the testsuite  directory, we can post it for review as well. Note that in this case, since our patch is against the Test Suite master branch, we don’t need to specify the mainline branch that the review is for:

If we now go to https://gerrit.asterisk.org, we should now see our patches up for review:

gerrit-review

And we’ve now successfully submitted a patch back to the project!

Step 10: Participate in the review

Participating in a Peer Review

So, our patch has been posted. What happens now?

First, the Continuous Integration systems will run build and basic tests against our patches. If our patches pass, we’ll get a +1 verification score on our patches; if not, our patches will be rejected with a -1 verification score. In the latter case, we would need to look at the test failures and fix whatever issues they found.

Note: The Asterisk wiki has some good information on the Continuous Integration systems used by the project, as well as some guidelines for what to do if your patch fails verification.

Let’s assume for a second that someone finds an issue with either our patch to Asterisk, or the test we wrote. When that happens, a reviewer will typically mark a review with a -1, indicating that they don’t think the patch should go in as it is currently written (*gasp*). That’s okay! If that happens, consider that the reviewer took time out of their day – and potentially even their normal job activities – to think critically about the contribution you made. That’s actually quite a gift; one of the things that makes the Asterisk project successful is that we have a large community of developers willing to do this both for each other, as well as for newcomers to the project.

Note: any one with access to Gerrit can review patches. One of the best ways to learn Asterisk is to participate in code reviews!

If someone did have an issue with our patch, we would need to do the following:

  1. Fix the issues they found, and save the resulting files.
  2. Commit the changes. Unless the changes are drastic, we are going to simply view the modifications they suggested under the original patch, which means our commit message can be something relatively simple (we’re going to throw it away in the next step):
  3. We’ll now squash our commits together back into one commit:

    This will pull up your editor, where you can pick/squash/fixup the commits you want:
  4. Change the second ‘pick’ to a ‘fixup’ (f) and save the results:
  5. This will squash the two commits together, creating a new hash for our original commit 1218355 . We can then just run git review  again to re-post our patch back to Gerrit:

Unlike Github or other systems with a PR model, Gerrit uses a Change-ID token in the commit message to associate commits to a review. This allows us to change the commit hash (via the rebase) and still associate the patch set to the same review, without requiring a forced push. The upside of this is that we can have a very nice, clean commit history in Asterisk, even with a thorough peer review process.

Cherry Picking

As we discussed back in our original post on contributing to the project, the Asterisk project maintains several version branches simultaneously. In our case, our bug fix has been targetted for the 13 branch, as that branch is the oldest maintained branch that first has the bug. However, we also need to get our bug fix into all later branches – in this case, master. It doesn’t do us any good to fix it in Asterisk 13, then have the bug reappear when Asterisk 14 is made!

The project uses git cherry-pick to apply patches between branches. In our case, we need to cherry pick our patch to master. Since our patch is relatively simple, however, we can use the facilities in Gerrit to do that for us.

  1. Go to our patch against Asterisk 13 in Gerrit, and click the Cherry Pick button.
    gerrit-cherry-pick-one
  2. This will bring up the Cherry Pick dialog. Enter the branch you want to cherry pick to – in our case, that would be ‘master’.
    gerrit-cherry-pick-two
  3. Click Cherry Pick Change, and a new review will be created for our patch against the ‘master’ branch.
    gerrit-cherry-pick-three

Step 11: Profit

Once:

  • The Continuous Integration system blesses the patch with a +1 Verification score
  • Someone has peer reviewed the patch and given the patch a +1 Code Review score

The patch is then ready for inclusion. The patch will then be approved by one of the project’s glorious overseers, and the it will be immediately included into the branch. In our case, we will have three patches included:

  • Our patch against the Asterisk Test Suite
  • Two patches for Asterisk – one for the 13 branch, and one for master

And that’s it!

While this has been a long series of posts, I hope it takes some of the mystery out of contributing to an open source project. While there’s a fair number of steps, those steps are similar to what you would want to do in any project: write a test that reproduces the issue, understand the root cause of the problem, fix the bug, and verify using the previously written test that the issue is resolved.

As always, if there’s any questions about how to contribute a patch back to the project, the project does maintain dedicated forums and resources to help you out – most notably the #asterisk-dev IRC channel and the asterisk-dev mailing list. Don’t hesitate to ask for any help!

Happy coding!

No Comments Yet

Get the conversation started!

Add to the Discussion

Your email address will not be published. Required fields are marked *

About the Author

Matt Jordan

Matt is CTO at Digium and the project lead for Asterisk. Matt joined the team in 2011, and since then has been involved in the development of both Asterisk and the Asterisk Test Suite. His background in software development can best be described as “eclectic”, having worked in a variety of industries. Uniting the various experiences, however, is a firm belief in good software development practices and methodologies and the effect they have on producing quality software (and keeping software developers from going insane).

See All of Matt's Articles

More From
The Digium Blog

  • No items