Pulling route names from Mountain Project


Within the RRLateExit app, I wanted to include a list of routes and areas eligible for late exit permits, but I definitely did not want to hand-code that list. I knew that Mountain Project had the information, so I assumed I could scrape it from there and run simple transforms on the data, with the added bonus of being able to update my list easily as new routes are added to Mountain Project.

Instead of scraping Mountain Project's web pages, I decided to see what could be extracted from the data files used by the Mountain Project mobile app. Conveniently, the app stores its data as big JSON objects, and I was able to pull the area/route hierarchy and the number of pitches for each route straight from them. Inconveniently, the route names were obfuscated in the file.



Why obscure the title like that? Maybe they don't want outsiders to repurpose the data? But that's exactly what I wanted to do(!), and with all the other data in plain text I decided to explore how to transform the title back into plain text.

First step: determine whether this is just some variation on base64 encoding. If you remove the "XOR-" prefix and run the rest through a base64 decoder, you end up with, for example, "et Canyon Bouldering". Interesting. So it's not a straight base64 encoding, but the last part of it is.

I was not familiar with the "XOR-" prefix, so I googled a bit and found a helpful link and library. The last part of the title was just straight base64 encoded, so all I needed to uncover was a key for the first part. Was the key different for each route? Maybe the key was simply some other piece of info for the route? Was the same key used for all titles? Would the key change for future builds of the data?

Since I knew the decoded version of a title simply by looking up that object on Mountain Project's website, I was able to brute force a key that decrypted the first part of the title, and it turned out that the same key worked for all titles in the data file. I'm not sure whether the key differs between data files or will change in the future, but even if it does, the brute force can easily be rerun.
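The key recovery can be sketched in Python. Because XOR is its own inverse, XORing the decoded bytes against a known title yields the key directly, so this sketch uses that known-plaintext shortcut instead of a literal brute-force search. The key bytes, the key length and the sample titles are made up for illustration and are not taken from the real data file. The assumption that only the first len(key) bytes are XORed is a guess based on the tail of each title decoding cleanly.

```python
import base64

PREFIX = "XOR-"

def xor_head(data: bytes, key: bytes) -> bytes:
    """XOR the first len(key) bytes of data with key; pass the rest through."""
    head = bytes(d ^ k for d, k in zip(data, key))
    return head + data[len(key):]

def obfuscate(title: str, key: bytes) -> str:
    """Build a sample obfuscated title (only used to fabricate demo data)."""
    mixed = xor_head(title.encode("utf-8"), key)
    return PREFIX + base64.b64encode(mixed).decode("ascii")

def recover_key(obfuscated: str, known_title: str, key_len: int) -> bytes:
    """Derive the key from one title whose plain text is already known."""
    raw = base64.b64decode(obfuscated[len(PREFIX):])
    known = known_title.encode("utf-8")
    return bytes(r ^ p for r, p in zip(raw[:key_len], known[:key_len]))

def deobfuscate(obfuscated: str, key: bytes) -> str:
    """Strip the prefix, base64-decode, then undo the XOR on the head."""
    raw = base64.b64decode(obfuscated[len(PREFIX):])
    return xor_head(raw, key).decode("utf-8")

# Hypothetical key and titles, not the real ones.
SECRET = bytes([0x13, 0x37, 0x42, 0x5A])
sample = obfuscate("Example Canyon Bouldering", SECRET)
key = recover_key(sample, "Example Canyon Bouldering", key_len=4)
other = obfuscate("Another Wall", SECRET)
print(deobfuscate(other, key))  # prints "Another Wall"
```

If the key length is unknown, one option is to try increasing values of key_len until a second known title also decodes correctly.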

In the end, I was able to write a small transformation script that reads the Mountain Project JSON file, extracts the specific areas I'm interested in and targets all multi-pitch routes in those areas. The resulting JSON output contains only valid area/route combinations for multi-pitch routes. And assuming there are no drastic format changes in the source data, I can easily update my list as new routes are added. Yay!
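A transformation along these lines might look like the following Python sketch. The field names ("title", "areas", "routes", "pitches") and the sample data are hypothetical stand-ins, since the real file layout is not shown here, and titles are assumed to be already decoded.

```python
import json

def multipitch_routes(area, wanted, path=(), inside=False):
    """Yield multi-pitch routes found in, or below, any wanted area."""
    here = path + (area["title"],)
    inside = inside or area["title"] in wanted
    if inside:
        for route in area.get("routes", []):
            if route.get("pitches", 1) > 1:
                yield {
                    "area": " > ".join(here),
                    "route": route["title"],
                    "pitches": route["pitches"],
                }
    for child in area.get("areas", []):
        yield from multipitch_routes(child, wanted, here, inside)

# Made-up hierarchy standing in for the real data file.
SAMPLE = json.loads("""
{"title": "All Areas", "areas": [
  {"title": "Canyon A",
   "routes": [{"title": "Long Route", "pitches": 5},
              {"title": "Short Route", "pitches": 1}],
   "areas": [{"title": "Upper Tier",
              "routes": [{"title": "Tier Route", "pitches": 3}]}]},
  {"title": "Canyon B",
   "routes": [{"title": "Other Route", "pitches": 4}]}
]}
""")

selected = list(multipitch_routes(SAMPLE, wanted={"Canyon A"}))
print(json.dumps(selected, indent=2))
```

Carrying the "inside" flag down the recursion means sub-areas of a wanted area are included without having to list each one.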

NPM sox and text2wave tweaks


I knew Festival shipped with a script named "text2wave", but I never really looked into its flexibility. For the RRLateExit app, I needed to translate text to speech, and I wanted some control over the voice used, the speed of speech, how numbers were read and where pauses fell. It turns out that you can achieve such control with text2wave if you combine it with Festival's Sable markup.
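As a rough illustration, here is a Python sketch that writes a small Sable document and hands it to text2wave. It covers only voice, rate and pauses. The speaker name "male1", the rate value, the file names and the helper functions are assumptions for the example, and whether a given speaker name maps to an installed voice depends on the Festival setup.

```python
import shutil
import subprocess
from xml.sax.saxutils import escape

SABLE_TEMPLATE = """<?xml version="1.0"?>
<!DOCTYPE SABLE PUBLIC "-//SABLE//DTD SABLE speech mark up//EN" "Sable.v0_2.dtd" []>
<SABLE>
<SPEAKER NAME="{speaker}">
<RATE SPEED="{rate}">
{body}
</RATE>
</SPEAKER>
</SABLE>
"""

def build_sable(sentences, speaker="male1", rate="-10%"):
    """Wrap sentences in Sable markup with a pause between each one."""
    body = "\n<BREAK/>\n".join(escape(s) for s in sentences)
    return SABLE_TEMPLATE.format(speaker=speaker, rate=rate, body=body)

def text2wave_command(sable_path, wav_path):
    """Command line for text2wave; the .sable extension selects Sable mode."""
    return ["text2wave", "-o", wav_path, sable_path]

def synthesize(sentences, sable_path="message.sable", wav_path="message.wav"):
    """Write the markup to disk and run text2wave if it is installed."""
    with open(sable_path, "w", encoding="utf-8") as handle:
        handle.write(build_sable(sentences))
    if shutil.which("text2wave") is None:
        raise RuntimeError("text2wave not found on PATH")
    subprocess.run(text2wave_command(sable_path, wav_path), check=True)
    return wav_path

markup = build_sable(["Your party is overdue.", "Please call back."])
print(markup)
```

Calling synthesize([...]) on a machine with Festival installed should leave the spoken message in message.wav.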

I used an existing NPM module for basic text2wave usage. It was a good start, but I wanted a bit more control, so I created an updated version that lets me use Sable (read from a filename that ends in .sable), supply extra parameters and receive the wav output in a buffer. It also copes with some versions of text2wave using an fseek operation on the stdout stream to update the RIFF headers after the wav file's content has been sent.
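One way a consumer could cope with the header problem is to repair the size fields once the whole stream is in a buffer. The Python sketch below assumes the symptom is RIFF and data chunk sizes that were never updated because the pipe could not be rewound. It is not a description of what the updated NPM module does, and the function name is made up.

```python
import io
import struct
import wave

def fix_riff_sizes(wav: bytes) -> bytes:
    """Rewrite the RIFF and data chunk sizes to match the buffer length."""
    if wav[:4] != b"RIFF" or wav[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    buf = bytearray(wav)
    struct.pack_into("<I", buf, 4, len(buf) - 8)   # overall RIFF size
    pos = 12
    while pos + 8 <= len(buf):                     # walk the chunk list
        chunk_id = bytes(buf[pos:pos + 4])
        (size,) = struct.unpack_from("<I", buf, pos + 4)
        if chunk_id == b"data":
            # Assume the audio data runs to the end of the buffer.
            struct.pack_into("<I", buf, pos + 4, len(buf) - (pos + 8))
            break
        pos += 8 + size + (size & 1)               # chunks are word aligned
    return bytes(buf)

# Fabricated stream: 100 frames of 16-bit mono silence with zeroed sizes.
samples = b"\x00\x00" * 100
fmt = struct.pack("<HHIIHH", 1, 1, 8000, 16000, 2, 16)
broken = (b"RIFF" + struct.pack("<I", 0) + b"WAVE"
          + b"fmt " + struct.pack("<I", len(fmt)) + fmt
          + b"data" + struct.pack("<I", 0) + samples)
fixed = fix_riff_sizes(broken)
with wave.open(io.BytesIO(fixed), "rb") as reader:
    print(reader.getnframes())  # prints 100
```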

Next I needed to encode the audio data as 8 kHz, single-channel wav so that Asterisk could make use of it. To accomplish the transcoding, I found node-sox. It allowed me to make the required transformations, but I was curious and decided to add stream support to reduce the number of temporary files I would be creating. I also included updates so that options like "--guard" and "--magic" are not used if the installed version of sox does not support them; that was the case with the sox version installed on my CentOS 6 server :(.
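The same idea can be sketched in Python by piping audio through the sox command line instead of using temporary files. This illustrates the transcoding step only and is not the node-sox API. The helper names are made up, and extra global options such as "--guard" are left to the caller so they can be omitted on older sox builds.

```python
import shutil
import subprocess

def sox_command(extra_global_flags=()):
    """Build a sox command that reads wav on stdin and writes 8 kHz mono wav to stdout."""
    return ["sox", *extra_global_flags,
            "-t", "wav", "-",                           # input: wav from stdin
            "-r", "8000", "-c", "1", "-t", "wav", "-"]  # output: 8 kHz mono wav to stdout

def transcode(wav_bytes, extra_global_flags=()):
    """Run sox over an in-memory wav buffer and return the converted bytes."""
    if shutil.which("sox") is None:
        raise RuntimeError("sox not found on PATH")
    done = subprocess.run(sox_command(extra_global_flags), input=wav_bytes,
                          stdout=subprocess.PIPE, check=True)
    return done.stdout

print(" ".join(sox_command()))
```

On a sox build that supports it, transcode(data, ["--guard"]) passes the extra option through; on an older build the caller simply leaves it out.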

Asterisk: dialplan hacks VS AMI VS AGI VS ARI


The question of how to have Asterisk initiate a phone call, play a dynamically generated message after a certain event or timeout, and then send the entire recorded conversation to a remote destination was an early hurdle for this project. The bulk of my prior experience with Asterisk was in 2005, and it did not involve external programming.

Now, in 2015, a number of programming interfaces are available for Asterisk: AMI, AGI and ARI. Not knowing much about any of these, I had a look at all of them.

AMI (Asterisk Manager Interface)


AGI (Asterisk Gateway Interface)

I think this is the oldest programmable interface for Asterisk, but it does not provide control over a channel once a dial-out operation is executed. So initiating playback at a certain time and then sending the resulting conversation to a remote destination seemed difficult to implement.

ARI (Asterisk REST Interface)


Ultimate Implementation

After experimenting with ARI and AMI for about a week, I decided that a hybrid of custom dialplan contexts and AMI would allow me to accomplish my goals. ARI seemed like the favorite at the start, but after getting inconsistent results when trying to record and retrieve recordings, I decided to leave it for the less "clean" combo of AMI and dialplan contexts.

At the start, I tried to put a lot of the customization into the AMI "app", but I quickly found that keeping the AMI app simple and expanding the dialplan made debugging and changes much easier. As a result, the dialplan now does the majority of channel control, with just a thin layer of AMI that initiates the call and keeps track of progress. CURL is used at two points in the dialplan: once to fetch the recording to play on the channel (from the app server via HTTP) and later to post the recording of the channel (to the app server via HTTP). It seems a little hacky and can require an SELinux policy change, but once all changes were in place it proved stable and verifiable.
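To give a feel for how thin the AMI layer can be, here is a Python sketch that formats AMI actions and sends a Login followed by an asynchronous Originate into a dialplan context. The channel "SIP/100", the context name "late-exit-call" and the helper names are hypothetical. The progress tracking mentioned above would need proper response and event parsing, which this sketch leaves out.

```python
import socket

def ami_action(name, **headers):
    """Format one AMI action as CRLF-separated 'Key: value' lines."""
    lines = [f"Action: {name}"] + [f"{key}: {value}" for key, value in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("utf-8")

def originate(host, username, secret, channel, context, exten="s", port=5038):
    """Log in to AMI and ask Asterisk to start a call into a dialplan context."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.recv(1024)  # greeting banner sent by the manager interface
        conn.sendall(ami_action("Login", Username=username, Secret=secret))
        conn.sendall(ami_action("Originate", Channel=channel, Context=context,
                                Exten=exten, Priority=1, Async="true"))
        conn.sendall(ami_action("Logoff"))
        # Return whatever has arrived so far; not a complete response parser.
        return conn.recv(65536).decode("utf-8", "replace")

packet = ami_action("Originate", Channel="SIP/100", Context="late-exit-call",
                    Exten="s", Priority=1, Async="true")
print(packet.decode("utf-8"))
```

Once the call is answered, the named dialplan context takes over channel control, which keeps the AMI side limited to starting the call and watching events.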