Recently, I was working on a feature at Hone that required—in part—parsing the proper nouns from the text within any given website. Here’s an overview of how we accomplished this.
At first, we thought it best to use something like the natural module (or the part-of-speech utilities within it, as found in the wordpos module). As it turns out, wordpos provides an easy, straightforward API that parses text and returns an object with the sentence’s parts of speech:
Wordpos also includes a handy getNouns method:
While it was clear that wordpos made it easy to extract all nouns, we desired something slightly different: We wanted to capture proper nouns / names. In other words, the names of products, companies, people, devices, and so forth.
So, we needed to:
Capture all and only proper nouns;
Capture groups of nouns which—together—form proper names;
Group these nouns into an array, sorted by word-usage-frequency.
Since wordpos wasn’t able to extract and return exactly what we needed from the text of a website, we wondered if some regular expression could offer us a better and more efficient solution instead. Voila!
Regex to Parse Proper Nouns
As is often the case, regular expressions don’t lend themselves to immediate readability or comprehension. So, let’s examine this regex in more detail and explore exactly how and why it’s able to extract proper nouns. To begin, here is a more illuminating representation of the above regex:
As you can see, there is one capture group, denoted by Group 1. This capture group represents the text that we are actually interested in: a noun (“Microsoft”) or nouns (“iPad Air 2”) which, together, form a proper name/noun. The patterns to the left and right of Group 1 ensure that we capture whole groupings of proper nouns.
Here is what this regular expression does in plain English:
Find one or more whitespace characters (spaces, tabs, and line breaks)
Capture one or more words which:
Optionally begin with the letters “i” or “e”
Optionally have a dash after one of those letters
Must begin with one or more capital letters
Optionally contain additional characters (except line breaks)
Finally, this capture group must be followed by one of the two below:
one or more whitespace characters and either a lowercase letter between a–z or any character that is not a word character
any one of the following characters: ` ' ’ " ^ , ; : — \ * . ( ) [ ]
Armed with this regex, we had everything we needed to parse proper nouns from any given website. Here’s how it all works from soup to nuts: First, we request the URL and extract the text from the response body. Then, we split this large chunk of text on each new sentence and end up with an array of sentences.
We then iterate through this array—running the regex against each individual sentence—and end up creating another new array containing all and only the proper nouns that we’re after. From there, we remove duplicates and re-sort the array in order of noun-frequency, as I mentioned above.
And what do we have left? Why, an array of proper nouns, sorted by frequency. It’s beautiful!
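The steps above can be approximated in a few lines of shell. This is only a rough sketch, not the regex or the code we actually ran: the sample text, the simplified pattern, and the variable names are all assumptions made for illustration.

```shell
# Hypothetical sample text standing in for the text extracted from a web page.
text='We reviewed the iPad Air 2 last week. Then Microsoft called about the iPad Air 2. Our editor prefers the iPhone.'

# Simplified stand-in for the pattern described above: leading whitespace,
# then one or more words that start with a capital letter (optionally
# prefixed by "i" or "e" and a dash).
pattern='[[:space:]]([ie]-?)?[A-Z][[:alnum:]]*([[:space:]]([ie]-?)?[A-Z0-9][[:alnum:]]*)*'

proper_nouns=$(
  printf '%s\n' "$text" |
    awk '{ gsub(/[.!?] +/, "&\n"); print }' |  # one sentence per line
    LC_ALL=C grep -oE "$pattern" |              # keep only the matches
    sed 's/^[[:space:]]*//' |                   # drop the leading whitespace
    sort | uniq -c | sort -rn |                 # count, most frequent first
    sed 's/^ *[0-9]* //'                        # drop the counts
)

printf '%s\n' "$proper_nouns"
```

Because each sentence ends up on its own line, the first word of a sentence has no whitespace in front of it and is skipped, which keeps ordinary capitalized words like “Then” out of the results.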
Are you an experienced software engineer who finds things like this interesting? Check out Hone’s Career page and shoot us an email!
Hone is an incredibly data-driven company. Whenever possible, we use analytics data (combined with customer feedback) to drive decision making about which features to add, modify, and remove — among other things.
We recently wanted to know how some proposed styling changes would affect user interaction rates of a widget in our web client. To accomplish this, we needed to run a few A/B tests.
Using our Nginx load-balancer, we decided to split incoming traffic in half: 50% of our visitors would be served the existing widget and the remaining 50% would be served the new widget (with the new styling/layout changes).
Simple A/B Testing Nginx Config (50%/50%)
The http server config above uses Nginx’s ngx_http_split_clients_module to assign each incoming request to one of n buckets, and then redirects it to the corresponding test page.
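For reference, a 50/50 split along those lines might look something like the sketch below, which writes a minimal config to a scratch file so it can be reviewed first. The /widget route, the page names, and the $variant variable are hypothetical, and hashing on the client address plus user agent is just one reasonable choice of key.

```shell
# Write a hypothetical 50/50 split config to a scratch file for review.
conf_file="$(mktemp)"

cat > "$conf_file" <<'EOF'
events {}

http {
    # Hash each request's client address and user agent into one of two buckets.
    split_clients "${remote_addr}${http_user_agent}" $variant {
        50%     "a";
        *       "b";
    }

    server {
        listen 80;

        # Hypothetical route: send each visitor to the page for their bucket.
        location = /widget {
            return 302 /widget-${variant}.html;
        }
    }
}
EOF

cat "$conf_file"
```

Since the bucket is derived from a hash of the key, the same visitor should keep landing in the same bucket as long as their address and user agent stay the same.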
We run A/B tests until a statistically significant number of visitors have passed through them. For this particular widget, we needed 10k visitors: ~5k going to each of our A- and B-test widgets.
What did we learn? After examining our analytics data, it was immediately clear: A much higher percentage of users interacted in the ways we wanted with the new, redesigned widget compared to the old widget. And — importantly — the higher-than-previous engagement remained steady during the following weeks and months — meaning it wasn’t merely a temporary lift.
More Advanced A/B Testing
If you’re interested in learning more, check out this article by Lawson Kurtz that details some more advanced configs and methods of A/B testing using Nginx.
A few weeks ago here at Hone, we decided to spin up a new server cluster in DigitalOcean’s NYC3 data center. DigitalOcean introduced ‘private’ networking just over a year ago. However, it turns out that DigitalOcean actually refers to this as Shared Private Networking, and many of the comments under their announcement point out that their private networking isn’t really too private.
We decided to use OpenVPN to layer a secure network on top of DigitalOcean’s shared private networking.
What we wanted to accomplish:
Install an OpenVPN server on our load balancer
Install an OpenVPN client on all of our other machines
Drop any traffic other than OpenVPN on eth1 (DO’s shared private network)
Allow all traffic over tun0 (our secured private network)
With traffic passing through the tun0 interface between machines, we gain the ability to more quickly and easily spin up new machines and add them to our infrastructure.
Here’s how to set up a virtual private network on DigitalOcean (or whichever provider you might be using):
Set Up / Configure Your OpenVPN Server
Update your packages and install OpenVPN and Easy RSA:
Copy some Easy RSA files over to a more permanent location so that you can upgrade OpenVPN in the future without losing your configuration settings:
Edit /etc/openvpn/easy-rsa/vars and change the various exports for default certificate values. At the very least, you’ll want to change the following keys in the vars file to suit your needs:
With your vars configured, you can now generate a master Certificate Authority (CA) Certificate and Key for your server:
Generate a certificate and private key for the server:
Copy the keys, certs, and the Diffie–Hellman–Merkle params that you generated from /etc/openvpn/easy-rsa/keys into your OpenVPN directory, /etc/openvpn/:
Generate Certificates For VPN Client(s)
You’ll need to generate a certificate and key for each VPN client:
Securely copy the following files to the client machine (via rsync, scp, etc.):
After you’ve copied these keys and certs to your client machine(s), delete them from the VPN server. They’re no longer needed on that machine and keeping them there poses a security risk if unauthorized access is gained.
Edit Your OpenVPN Server Config
Copy over and unpack the provided example server config—server.conf.gz—to /etc/openvpn/server.conf
Edit /etc/openvpn/server.conf to ensure it contains the settings that make sense for your intended setup.
You’ll want to make sure it points to the correct location of your certs, keys, and dh2048.pem file (Diffie–Hellman–Merkle parameters) that you generated earlier.
Here’s an example of some lines you should configure (or uncomment) in server.conf:
Note: Uncommenting client-to-client will enable your VPN clients to communicate with one another directly. By default, VPN clients will only see the VPN server.
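As a rough guide, the relevant part of server.conf might end up looking like the sketch below, written to a scratch file here so you can compare it against your own config. The local address is a placeholder for your server’s eth1 IP, and the server.crt / server.key names assume you named the server certificate “server” when you generated it.

```shell
# Write a hypothetical server.conf fragment to a scratch file for review.
server_conf="$(mktemp)"

cat > "$server_conf" <<'EOF'
# Listen on the shared private interface only (placeholder eth1 address).
local 10.128.0.10
port 1194
proto udp
dev tun

# Certs, key, and DH params copied into /etc/openvpn/ earlier.
ca ca.crt
cert server.crt
key server.key
dh dh2048.pem

# VPN subnet; the server itself takes 10.8.0.1.
server 10.8.0.0 255.255.255.0

# Let VPN clients reach one another, not just the server.
client-to-client

# Drop privileges after startup.
user nobody
group nogroup
EOF

cat "$server_conf"
```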
Start OpenVPN on your server
Check That It Works
After you start your openvpn service, you should see tun0 interface details when you run:
If you’ve set up your server correctly, you should see some output like this:
If you run the ifconfig tun0 command above and see the error ifconfig: interface tun0 does not exist, then you’ll need to check your OpenVPN server.conf again and correct it before restarting the service.
Set Up / Configure Your OpenVPN Client(s)
Update your packages and install OpenVPN:
Copy the example client.conf file over to /etc/openvpn/client.conf:
If you haven’t already done so: Make sure that you’ve generated and then securely transferred (or manually copied over) ca.crt, <vpn_client_name>.crt, and <vpn_client_name>.key to your new VPN client. (For the purposes of this tutorial, I’ve copied them into /etc/openvpn/.)
Edit /etc/openvpn/client.conf and make sure everything points to correct certs and keys. You should also make sure to specify the IP address corresponding to your OpenVPN server’s eth1 interface (DO’s shared private network):
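A client.conf along these lines is one possibility, again written to a scratch file for review. Treat it as a sketch: the remote address is a placeholder for your server’s eth1 IP, and vpn_client_name stands in for whatever common name you gave this client’s cert and key.

```shell
# Write a hypothetical client.conf fragment to a scratch file for review.
client_conf="$(mktemp)"

cat > "$client_conf" <<'EOF'
client
dev tun
proto udp

# Placeholder: the OpenVPN server's eth1 address and port.
remote 10.128.0.10 1194

resolv-retry infinite
nobind
persist-key
persist-tun

# Files copied into /etc/openvpn/ from the server.
ca ca.crt
cert vpn_client_name.crt
key vpn_client_name.key
EOF

cat "$client_conf"
```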
Having finished editing client.conf, you can restart the OpenVPN service on your VPN client machine:
Check to see that you’ve got a tun0 interface (and that it has the correct IP):
Ping Your VPN Server (From Client Machine)
If you’d like to sanity-check your connection to your VPN server, try pinging the OpenVPN server directly:
In this example, I’m pinging 10.8.0.1, which is the address set by OpenVPN’s default server config. You may have selected something different, in which case you should find the IP corresponding to your OpenVPN server’s tun0 interface. You can grab this IP quickly by running (on the OpenVPN server itself): ifconfig tun0.
If you can’t ping the OpenVPN server, then something is wrong with your config. Consider re-reading all of the above.
Assign Static IPs to VPN Clients
You may desire to assign static IP addresses to some or all of your client machines. (Hat-tip to Michael Albert for a great post which helped here.)
On your OpenVPN server, create a folder in which to save the information of your static clients. In this example, we’ll name our folder static_clients:
Make sure to uncomment and edit this line in server.conf:
Recall the common_name that you chose when you generated a cert and key for your new VPN client. Create a file in /etc/openvpn/static_clients/, naming it whatever you chose for the common_name, with the following content:
(Note that in this example, we’d like this VPN client’s tun0 interface to be assigned the IP 10.8.0.4)
OpenVPN will need to read these files after it drops privileges. You can allow that with the following:
Configure Your iptables
If you want to further secure your VPN, you should edit your VPN client machine’s iptables. With your OpenVPN server and clients set up correctly and pingable (in both directions) via their tun0 interfaces, you can begin restricting traffic over your eth1 (DO’s shared private network interface) to accept only traffic over port 1194 (OpenVPN’s default port) on that interface.
For example, if you’re routing traffic through a load balancer, you may want to lock down VPN client boxes as such:
Restrict access on eth0 interface (public) to port 22 only
Restrict access on eth1 interface (DO’s shared private network) to udp/tcp traffic over port 1194 only
Unrestricted access on tun0 interface (OpenVPN tunnel interface).
Here’s an example iptables config that would restrict traffic in the way I mention above:
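One way to express those three restrictions is as a rules file in iptables-restore format. The sketch below only writes the file to a scratch location. It assumes the interface names used in this post (eth0, eth1, tun0), allows loopback and already-established connections, and leaves outbound traffic open; adjust it to your own setup before loading anything.

```shell
# Write hypothetical firewall rules to a scratch file; nothing is applied here.
rules_file="$(mktemp)"

cat > "$rules_file" <<'EOF'
*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
# Loopback and replies to connections this machine opened.
-A INPUT -i lo -j ACCEPT
-A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# Public interface: SSH only.
-A INPUT -i eth0 -p tcp --dport 22 -j ACCEPT
# Shared private network: OpenVPN only.
-A INPUT -i eth1 -p udp --dport 1194 -j ACCEPT
-A INPUT -i eth1 -p tcp --dport 1194 -j ACCEPT
# VPN tunnel: unrestricted.
-A INPUT -i tun0 -j ACCEPT
COMMIT
EOF

cat "$rules_file"
```

Once you’re happy with the rules, they can be loaded as root with iptables-restore < "$rules_file". Keep a console session open while you test so that a mistake doesn’t lock you out over SSH.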
The no fuss, no muss guide to installing the latest stable version of MySQL DB locally on your Mac running OS X 10.7 or later. (Hat tip to Trey Piepmeier for his excellent tutorial, upon which I improved a few things.)
Assuming you have your brew command ready to rock, make sure to run a quick brew update, telling brew to fetch the latest packages:
OK? Good. Now you need to tell brew to install MySQL by entering this command: brew install mysql
Enter the following two commands, one after the other (the second one starts up your new, local MySQL server and creates an initial database):
Launch MySQL Automatically
The output from that last command should instruct you to enter three additional commands. (The ones below might not be exactly what you see in your terminal. Of course, make sure you follow those instructions and not these below, unless they’re identical.):
The three commands above do the following, respectively: create a LaunchAgents folder for you if you don’t already have one, copy the mysql.plist file into that launch folder, and then load that file so that Mac OS X starts MySQL for you each time you restart your machine. Perfect!
Start Configuring MySQL
One final (optional) step is to run the included MySQL root/user config script. It’ll step you through various default username/password/etc options that you might want to configure now that you’ve got MySQL up and running on your machine. To run this automated post-MySQL-install wizard, enter:
How to Install Homebrew on Mac OS X (10.7 or later)
This one’s super-quick and easy! If you want to easily install other tools and add-ons in the future, you need Homebrew.
Open a new shell and run the following:
It’s that simple. Really.
Homebrew Future Tip
Once Homebrew has finished installing, you’ll want to make sure to always run the following before trying to install anything using the brew command: brew update
Running Brew’s update command instructs it to fetch the latest install recipes from its remote repository. Remember: Before you use Brew to install something, you definitely want to run the brew update command every single time! That way, you’ll always ensure you’re installing only the latest, stable packages.
So you’re sick of the lame, default settings that OS X’s Archive Utility comes with and you want to change ‘em? Yeah—I did too. So, here’s three ways to change that little tool’s settings to whatever your heart desires once and for all! (…or temporarily if you’d like; I don’t much mind either way to be honest with you.)
Unless you’re as fast as Superman and can open Archive Utility’s preferences pane during the roughly 0.0006 seconds that it’s displayed on the screen during its unarchiving process, it seems the only ways to change the default preferences are:
How to Change Archive Utility Preferences
Launch Archive Utility manually and change the preferences as you would in any other app: Open up Terminal and type: open -a "Archive Utility"
Click on OS X’s Finder (usually in the lower-left of your dock), then go to the very top OS X menu bar and select “Go” and then “Go to Folder…” Then, just enter the following—/System/Library/CoreServices—and smack the enter key on your keyboard (or gingerly click the “Go” button with your mouse—again, totally up to you. I prefer smacking the enter-key, myself.)
At this point, you need only find the Archive Utility App within the Finder window and give her the ol’ double-click to launch.
Let’s Get All Fancy-like
Want to add a new icon to OS X’s System Preferences app, enabling all users to set their own Archive Utility preferences? Do the following:
Open up Terminal and enter this beautiful one-liner:
In the Finder window that opens as a result, locate and double-click on the Archives.prefPane file.
If you’re asked to enter your Admin password, DO IT IMMEDIATELY WITHOUT HESITATION. This isn’t a drill and your life could depend on it. Of course, that’s simply false. But the truth is that at this point, OS X wants to add a dedicated preference icon for Archive Utility to its System Preferences App and wants you to confirm this action by entering your password. Honest!
Enjoy the following panel that you now have access to and configure the settings until you’re blue in the face!
Hat Tippity Tip / Source:
This “how to” article was adapted from one I originally read on TAUW right here, which itself was apparently adapted from an even earlier article, currently located on Macworld.com right here. Enjoy!
With brew installed, you’re golden. Time to install PostgreSQL!
Install PostgreSQL and Configure
With Brew, you can install PostgreSQL with the following command in Terminal: brew install postgresql
You can now start your PostgreSQL server and create a database:
Optional: You’ll need to have PostgreSQL running locally in order for your app (running in development mode, of course) to read and write to your Postgres database(s). If you want to have PostgreSQL start automatically each time you start your computer, enter the following three lines into Terminal one after another:
Done and done. PostgreSQL is up and running and now all you need to do is tweak a few settings in your Rails app’s database.yml file (in the config/ folder).
In your database.yml file, you’ll see a few environments and their respective configs beneath. Most likely you’ll see three environments: development:, test:, and production:.
For now, we’ll just change the development: environment. If you haven’t changed anything, you’ll see the following as the default config for development::
In order for your app to use your new PostgreSQL server, you’ll want to change the above to this:
You’ll want to replace name_of_your_app with the name of your app.
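As a rough sketch, a PostgreSQL development block might look like the one below, written to a scratch file here so it can be reviewed before you paste it into config/database.yml. The exact keys your app needs may differ; the pool and timeout values are just common defaults, and the name_of_your_app_development database name is an assumed naming pattern.

```shell
# Write a hypothetical development block to a scratch file for review.
db_config="$(mktemp)"

cat > "$db_config" <<'EOF'
development:
  adapter: postgresql
  encoding: unicode
  database: name_of_your_app_development
  pool: 5
  timeout: 5000
EOF

cat "$db_config"
```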
Editing Your Gemfile
Hold on there, partner, don’t forget to tweak your Gemfile! Make sure you’ve got the pg gem in your Gemfile:
Want To Run PostgreSQL in Production?
If you want to run Postgres in your production environment as well as your development environment, make sure to add the gem 'pg' line somewhere within the :production block—and not only within your group :development, :test do block.
Finally, you’ll want to create a new database with rake db:create. You’ll probably also want to run rake db:reset, which drops your database, recreates it from your schema, and seeds it with any data you may have in your db/seeds.rb file.
Trying to install PostgreSQL on your Linux machine instead?
Dan Manges is crazy-smart, the CTO of Braintree, and happens to be my mentor at The Starter League (formerly Code Academy). He saved me about three hours of chin-scratching by teaching me everything below today (in about 15 minutes). Thanks man! (Probably worth noting that any errors below are courtesy of yours truly, and not Dan :)
Here’s a tip when using Rails’ button_to and link_to URL helpers! Never ever forget these two things:
button_to uses the :POST method by default
link_to uses the :GET method by default
Believe me: Memorizing these two simple Rails defaults will save you routing headaches down the road.
Example — Specifying the :GET method:
Let’s say that you’d like to provide your user with a “Cancel” button on a form that redirects her back to the previous page after it’s clicked. (Because you’re nice, you’ll also throw up a warning modal…):
Guess what? That’s not right! When clicked, that button will either throw a routing error or unintentionally make a POST request to one of your routes. Instead, you need to specify that you’d like the button to use the :GET method rather than its default :POST method. (Refer to rule 1 above.)
Here’s the correct code for such a button:
See how we specify the method in there with method: :get?
Example — Specifying the :POST method:
Want a text link that ends up sending a :POST to one of your routes? Simple. Just remember to pass the correct method:
If you wanted a button to perform the same action, you wouldn’t need to specify the method, as the default method is already :POST:
Example — Specifying the :DELETE method:
Just as you need to pass in the proper HTTP method into your button_to and link_to helpers if you’d like to use them for the opposite of their default method, so too must you specify the :DELETE method when you’d like to use that instead:
Over the past few weeks, I’ve been working on an app with three other people. The four of us divided up the work, chose the parts we each found interesting, and got started.
How do multiple people add to, modify, and delete the project’s codebase (and other files)—all while keeping track of everything along the way?
As soon as we asked this question, the answer was simple: use Git!
What is Git?
Git is a Distributed Revision Control System
Using Git, multiple people can easily work on the same project, at the same time, and keep track of all changes. In addition to providing a detailed history of who did what and when, Git empowers collaborators to revert to a previous version of a project’s codebase (or a previous version of, say, a single file), should broken code or otherwise undesirable changes occur at some point.
Using Git in Your Own Projects
After creating your project, cd into your project’s working directory via terminal and initialize Git with the following command: git init
Git is now good to go and is ready to keep track of the files in that directory. Make changes to some code, return to terminal, and type the following: git add .
Notice the . (period) at the end? That tells Git to add all of the new and modified files to its index. In other words, Git is now tracking every file in that directory and will know when changes are made to existing files or when new files are added.
If you ever delete a file from your project, and you want to track that deletion in your Git repository, you should type the following command in terminal: git add -A .
It’s similar to the previous command, but passing in the -A option will tell Git to “add” the files that you removed from your project. (You can think of this as Git keeping track of the fact that you just deleted a file.)
So far, Git has been initialized and is also tracking all of the files in your project (including any changes you’ve made so far). But you’ve yet to commit these changes. These commits are the snapshots of your project that Git will keep a history of, enabling you to roll back or revert to one of them at some point in the future should you so desire.
To commit your changes (and thus make your first/initial commit for this project), type the following command in terminal: git commit -m 'initial commit'
This commits your changes. Note the -m option that we’ve passed in as well as the 'initial commit' message following it. As you might guess, the -m stands for “message” and the text in single (or double) quotes following it contains a message that will be saved along with the commit. (You should always pass in a descriptive note that will enable you, and other developers reading it later on, to quickly decipher what changes you made in that particular commit.)
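To see the whole cycle in one place, here’s a small sketch that runs through it in a throwaway directory. The file names, commit messages, and the placeholder name and email are made up for illustration; in your own project you’d run the same git commands from your project’s working directory.

```shell
# Work in a throwaway directory so the sketch is safe to run anywhere.
project_dir="$(mktemp -d)"
cd "$project_dir" || exit 1

git init --quiet

# Identity for this scratch repo only (placeholder values).
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Create two files and stage everything that is new or modified.
echo "hello" > app.txt
echo "notes" > notes.txt
git add .
git commit --quiet -m 'initial commit'

# Delete a file, then stage the deletion along with any other changes.
rm notes.txt
git add -A .
git commit --quiet -m 'remove notes.txt'

# Show the history, newest first.
git log --oneline
```

The log should list two commits, and notes.txt should no longer be tracked after the second one.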
Part Two: Coming Soon
This post covers the basics of using Git for your personal projects. In the next post, I’ll detail how using the above-listed commands—as well as a few other commands—will enable you to use Git for a single project with multiple collaborators as I described in the opening paragraph.