Atlassian Cloud Marketplace Development

Lessons Learnt Building for the Atlassian Marketplace

[Screenshot: the Atlassian Marketplace]

Why Marketplaces?

Recurring revenue is considered the holy grail for any technology business, with SaaS being the main way to achieve it. Build your application, then bill based on usage month after month. However, in 2021, while more people are using the internet than ever, building a SaaS is harder than ever. The slow SaaS ramp of death is very real, and anything you can do to reduce the risk of getting your SaaS out the door is something to seriously consider.

Marketplaces are considered an effective method to lower the risk of building SaaS. You launch to a large pool of customers already proven to be willing to pay to solve problems. It also lowers the amount of up-front work required, as there is no need to deal with user management, billing, login forms and other things that are table stakes when building it yourself but add little direct value to your product.

Why Atlassian Marketplace?

So why the Atlassian Marketplace? The Atlassian Marketplace these days is fairly mature, with a lot of excellent software launched and making revenue. With such a large install base of customers, and the impending death of server licenses causing a mass migration to cloud instances, it seems like a reasonable bet that the marketplace is about to grow.

Given the above, it seems prudent to investigate how to build an application for Atlassian Cloud. Which is what I did. While doing so, I wanted to build something actually useful and launch it onto the marketplace. This seemed to be a good way to determine how difficult a real integration was, and see if it's worth investing more time into the marketplace itself.

What follows are some of the more valuable lessons I learnt while building this application and launching it into the marketplace.

The marketplace is fractured

Despite the apparent unity of the Atlassian world due to their shared heritage, the marketplace is not limited to Cloud applications. It also has soon-to-be-retired server plugins and data center applications. These are in turn split across the multiple products that Atlassian owns: Jira, Jira Service Management, Confluence, Bitbucket, Fisheye/Crucible, Bamboo and Crowd.

There are also different ways of hosting the applications. Cloud applications can be built using Atlassian Connect where you can bring whatever language you want and communicate over HTTP calls. Forge is similar but limited to certain languages, with the application being hosted by Atlassian themselves.

Then, there are the server and data center products where you must use Java and your application runs inside someone else's application install.

As such, documentation can be confusing when you search for how to build applications. You need to be aware of which product and hosting model you are working with and ensure the documentation relates to it. As mentioned, the server editions are end-of-life. Data center has its own special considerations, as you need to write your application such that it scales to support very large installs. I don't have much experience with either data center or server editions and cannot comment on their development processes.

Payment cycles are slow

There is a 30-60 day evaluation period for applications when they are added by a company. The exact number of days depends on when they add the application relative to their billing cycle. This is an issue I have discussed with some other Atlassian marketplace developers, and it is potentially a killer problem if you have a high cost to run your application.

While I don't object to an evaluation period, the long, variable lengths have the potential to kill some businesses. Unlikely, but possible: you develop an application, get thousands of customers in the first few days, and then have to scale and absorb huge hosting costs with no income. Worse, it's hard to predict when the revenue will come in due to the variable lengths. I don't know if Atlassian has any sort of support for dealing with this situation, although I suspect they might.

Billing options are limited to only per-seat

Billing per seat is an easy way to charge for your service. However, it does not fit all service models. I am of the opinion that you should only charge per seat if the application looks or works differently for each user. If every user has the same experience then you should be looking to bill by some other metric.

Billing by another metric would also make billing more accurate for some customers. Someone with 5 users and 5 million Confluence pages should probably be billed differently to someone with 5,000 users and 50 pages.

While I understand this is hard to implement on the Atlassian side, it's something I hope is on the roadmap for the future.

Bitbucket does not offer Pay via Atlassian (PvA)

You are unable to get Atlassian to do the billing for Bitbucket marketplace applications. This is probably the reason why there are so few of them on the marketplace. While you can implement your own billing, this defeats the main appeal of marketplaces, which is making the payment process as painless as possible and built into the customer's existing bill as a sub-item.

You can follow the discussion about this issue on the community site. While it was suggested this was being considered for release in July 2021, there has been no progress as yet.

It's a pity because I can think of a few plugins I would love to launch there that would provide a lot of benefit to people using Bitbucket. Hoping this gets resolved sooner rather than later.

Connect descriptor content type

To create a Connect application you need to write a connect descriptor. This is what Atlassian fetches in order to know where to inject your application, what permissions it needs and a bunch of other required parameters. You can read about it at Getting started with Connect.
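For illustration, here is a rough sketch of what a minimal descriptor for a Confluence Connect application might look like. The key, URLs and module details are placeholders, not from any real application; check the Connect documentation for the full schema.

json
{
  "key": "my-example-app",
  "name": "My Example App",
  "baseUrl": "https://my-example-app.example.com",
  "authentication": { "type": "jwt" },
  "lifecycle": {
    "installed": "/installed",
    "uninstalled": "/uninstalled"
  },
  "scopes": ["READ"],
  "modules": {
    "generalPages": [
      {
        "key": "main-page",
        "name": { "value": "My Example App" },
        "url": "/render"
      }
    ]
  }
}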

What is not mentioned is that when Atlassian calls you, if you return the wrong content type for your connect descriptor, the call will fail. Worse still, you get no descriptive errors out of Atlassian when this happens. It's just a generic error with nothing actionable.

As such, be sure to always set the correct content type headers which in this case should be application/json; charset=utf-8 or application/json.

While you should be setting these headers correctly by following best practices, it's an easy one to overlook, and Atlassian is obtuse with the error it returns when you get it wrong. It's especially annoying because the connect descriptor response is the first one you need to write when starting an application, and setting the content type correctly is an easy thing to miss. This cost me a few hours the first time I attempted to write an application.
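As a minimal sketch, assuming an Express-style Node.js backend serving the descriptor (the route and file name are illustrative):

javascript
const express = require('express');
const app = express();
const descriptor = require('./atlassian-connect.json'); // your connect descriptor

app.get('/atlassian-connect.json', (req, res) => {
  // res.json() would set the JSON content type for you; if you serve the
  // descriptor as a static file or through a template engine, set it explicitly.
  res.set('Content-Type', 'application/json; charset=utf-8');
  res.send(JSON.stringify(descriptor));
});

app.listen(3000);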

Webhooks are fire and forget (mostly)

If you subscribe to webhook calls from Atlassian you cannot ask them to be delayed or buffered. They are a one-shot deal. I would dearly love to be able to respond with 429 TOO MANY REQUESTS or some other HTTP code and a return value allowing Atlassian to restart the webhook when your service is able to deal with it. It would make dealing with upgrades of services and infrastructure far easier.

There is a workaround that can achieve most of this. You return a 500 error to Atlassian, which will cause them to retry the webhook call up to two additional times. The catch is that the retry fires the moment you return the response.

To work around the instant retry, you can delay the response on your side. If you keep the connection open before returning your 500 (through a thread sleep or delayed callback), you give yourself a window in which to upgrade or perform other actions. I have tested this against Confluence with a delay of 60 seconds without issue, which across the retries gives you a few minutes to do whatever you need to do.
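A minimal sketch of this workaround, assuming an Express-style Node.js service; serviceIsReady and processWebhook are hypothetical stand-ins for your own readiness check and handler:

javascript
// Assumes app.use(express.json()) so req.body is parsed.
app.post('/webhooks/confluence', (req, res) => {
  if (!serviceIsReady()) {
    // Hold the connection open for 60 seconds, then return 500 so Atlassian
    // schedules another delivery attempt after the delay.
    setTimeout(() => res.status(500).end(), 60 * 1000);
    return;
  }
  processWebhook(req.body); // normal processing path
  res.status(204).end();
});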

However, this is a workaround at best. The proper solution is to buffer webhooks you cannot afford to miss into a persistent queue if you really need them. Then ensure you can always rebuild state without that webhook, because ultimately there is no guarantee that the webhook payload will even be sent. It's best effort on the Atlassian side, but not a guarantee.

Use Jira and Confluence as the source of truth

Because of the aforementioned issues with webhooks, you need to be able to rebuild your state entirely from Atlassian as the source of truth. As such, I would suggest you treat Jira and Confluence as a database and pull content back from them whenever you want to reprocess data or restart your application.

Only store things that are expensive to compute, such as results derived from the content, but never store the content itself. If you do need content you have pulled back, keep it only in memory, as this helps you avoid running afoul of any security questions raised by Atlassian during the security self-assessment. Resync with Atlassian on a regular basis to ensure updates and deletes are replicated.

This approach also has the advantage that you don't have to worry about storing large amounts of data, simplifying your backup and restore procedures.
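A rough sketch of what a regular resync might look like; fetchAllPages, reindexPage and removeDeletedPages are hypothetical stand-ins for your own REST pulls and derived-data bookkeeping:

javascript
// Periodically rebuild derived state from Confluence, the source of truth.
setInterval(async () => {
  const pages = await fetchAllPages();  // pull current content via the REST API
  for (const page of pages) {
    await reindexPage(page);            // recompute derived data; don't persist the raw content
  }
  await removeDeletedPages(pages);      // drop derived data for pages that no longer exist
}, 60 * 60 * 1000);                     // e.g. hourly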

JWT signing is painful

If you are calling Atlassian and not using one of their official SDKs written in JS or Java, you will have to do your own JWT signing. This is especially annoying because of some requirements Atlassian added, which admittedly are better for security.

The JWT claims you send to Atlassian need to follow this format:

json
{
  "exp": 123456789,
  "iat": 123456789,
  "iss": "issuer",
  "qsh": "8063ff4ca1e41df7bc90c8ab6d0f6207d491cf6dad7c66ea797b4614b71922e9"
}

While most of the above is rather standard, the qsh (query string hash) claim needs to be generated using a specific algorithm in order to ensure the query parameters have not been modified. The details of how to achieve this can be found on Atlassian's developer pages under understanding JWT.

It consists of 7 steps, and it's pretty easy to make a mistake in any of them that will have you scratching your head. Thankfully, Atlassian returns good error messages when you send an invalid JWT, often pointing out the exact problem you need to resolve.
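A simplified sketch of the qsh computation in JavaScript. It covers the common case of simple, non-repeated query parameters; the full canonicalization rules (encoding edge cases, repeated keys) are in Atlassian's documentation.

javascript
const crypto = require('crypto');

// Simplified query string hash (qsh). Not a complete implementation of
// Atlassian's canonicalization rules, but enough to show the shape of it.
function createQsh(method, path, params = {}) {
  const canonicalQuery = Object.keys(params)
    .filter((key) => key !== 'jwt') // the jwt parameter itself is never included
    .sort()                         // parameters are sorted by key
    .map((key) => encodeURIComponent(key) + '=' + encodeURIComponent(params[key]))
    .join('&');

  // The canonical request is METHOD&path&query, with the path relative to the
  // product base URL.
  const canonicalRequest = method.toUpperCase() + '&' + (path || '/') + '&' + canonicalQuery;

  return crypto.createHash('sha256').update(canonicalRequest).digest('hex');
}

// e.g. createQsh('GET', '/rest/api/content', { limit: 25 })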

Where this gets especially annoying is the way JWTs are handled differently by Jira and Confluence. If you are familiar with Cloud Jira and Cloud Confluence, you should be aware Jira lives at the root of your subdomain while Confluence lives under the folder /wiki/. This applies to REST API calls for each product as well. The documentation does not call this out very well, so if you are calling Confluence you need to prepend /wiki/ to any call you make.

But this does not apply to JWT signing. You need to remove the prepended /wiki/ when creating the JWT. So you have to remember to prepend /wiki/ on any REST call, but remove it for JWT signing when working with Confluence. Painful!
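Putting the two together, a hypothetical sketch of calling the Confluence REST API using the createQsh sketch above; addonKey, sharedSecret and baseUrl come from the installed lifecycle callback, and the jsonwebtoken library (or any HS256-capable equivalent) does the signing.

javascript
const jwt = require('jsonwebtoken');

async function getConfluenceContent(baseUrl, addonKey, sharedSecret) {
  const path = '/rest/api/content';     // path used for the qsh: no /wiki prefix
  const now = Math.floor(Date.now() / 1000);

  const token = jwt.sign(
    { iss: addonKey, iat: now, exp: now + 180, qsh: createQsh('GET', path, { limit: 25 }) },
    sharedSecret
  );

  // The actual request does need the /wiki prefix for Confluence.
  return fetch(`${baseUrl}/wiki${path}?limit=25`, {
    headers: { Authorization: `JWT ${token}` },
  });
}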

Thankfully, once done, the REST APIs themselves are generally very good. They are well documented and work as advertised. They don't break very often either, which makes development a lot easier, and they are pretty liberal when it comes to rate limits.

Displaying content

When you want to display content through the Atlassian sites, there are a few things you need to keep in mind. The first is that you must include the following script on any page you render that is displayed inside Jira or Confluence.

html
<script src="https://connect-cdn.atl-paas.net/all.js"></script>

Failing to do so will cause an infinite load spinner, with Atlassian reporting that the page took too long to load. Annoyingly, the request will still hit your backend, so if you are logging requests it looks like a rendering issue on the Atlassian side. Include the above and all will work as expected.

Another thing to keep in mind is that any AJAX call back to your service needs to be secured by passing back a valid JWT. To get one, you need to do the following on your page:

javascript
AP.context.getToken(function(jwtToken) {
  // YOUR CODE HERE
});

While mentioned on a few pages, it's probably not stressed as much as it should be. It's also worth noting that the wrapper will occasionally take a while to trigger as it requests a new JWT from Atlassian. Atlassian has said that they are looking to improve this, but you need to keep it in mind if you perform display updates.
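For example, a sketch of wiring the token into a call back to your own service; the endpoint name is hypothetical:

javascript
AP.context.getToken(function (jwtToken) {
  // Pass the context JWT back so your service can verify who is calling.
  fetch('/search?query=example', {
    headers: { Authorization: `JWT ${jwtToken}` },
  }).then(function (response) {
    // render the results
  });
});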

Atlassian security scan

There is a handy security scan robot run by Atlassian which scans cloud applications and reports on identified security issues. One of the more common issues I found was it reporting the following header as missing on embedded pages:

Strict-Transport-Security: max-age=31536000; includeSubDomains

While the reporting is fairly useful, I found that the issue ticket it files against you can often have a broken link. In the end, I added the header and confirmed that it worked. I then had to wait a few days for the scan robot to run again and resolve the issue. I wish there was a way to run it manually, or that it ran more frequently.
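Adding the header is a one-liner in most frameworks. A minimal sketch, again assuming an Express-style Node.js backend:

javascript
// Set HSTS on every response so embedded pages pass the security scan check.
app.use((req, res, next) => {
  res.set('Strict-Transport-Security', 'max-age=31536000; includeSubDomains');
  next();
});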

Approval process

The approval process for your application once submitted is rather opaque. You start by submitting your Connect application endpoint for cloud applications. Submissions are processed even with invalid information, as there is little validation on the form itself. Thankfully, a bot picks these up and rejects the submission pretty quickly. The responses are fairly easy to resolve and usually come down to ensuring you have links to your issue trackers and such.

One of the more annoying parts of this process is that you cannot set the pricing or even see the pricing page until your application is submitted.

Once you get past the bot's approval, your application will sit for 1 to 2 weeks waiting for manual approval by what I assume is some poor overworked soul at Atlassian. Once approved, the application can go live instantly.

If you accidentally submit an application that is not ready to go live, as I did, you will need to raise a support ticket to have it removed. This requires fully manual intervention, as Atlassian wants to ensure you don't have any active customers.

The Result

Once you get your head around how the integrations work for Cloud applications in the Atlassian Marketplace, you start to become pretty productive. You can pull most of the painful things I have mentioned above into a common code library, and your development speed greatly increases. It took me about four weeks to develop the first application, mostly as a learning exercise. Once I had that base I was able to build another candidate application in less than two weeks.

In fact, I was looking for something to build when I ran into some issues with Confluence search and suddenly had a candidate application to build and deploy. I knuckled down and had something ready for submission in two weeks.

Better Instant Search

The resulting application, Better Instant Search for Confluence, is now available to install on any Confluence Cloud instance. It provides an instant search experience for Confluence where you can search and filter as fast as you can type, in an Algolia/Google Instant inspired interface. I've also added some powerful search features that allow you to perform regular expression searches as well as match any character in any language. Be sure to check it out or view the video below to see how it can improve your Confluence searching experience.