Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | Indexer: Stem words prior to adding/searching them | Baitinq | 2022-11-11 | 2 | -4/+13 |
| | |||||
* | Indexer: Decode html entities for website title and description | Baitinq | 2022-11-06 | 2 | -6/+3 |
| | | | | Maybe we should do it for all the website's content too? :)) | ||||
* | Indexer: Add logging with env_logger | Baitinq | 2022-11-06 | 2 | -4/+10 |
| | |||||
* | Indexer: Switch back to not serving frontend with actix | Baitinq | 2022-11-05 | 1 | -11/+6 |
| | | | | | | | Serving the frontend with actix made it unresponsive while the crawler was passing results to the indexer. The frontend is now served independently by trunk again and the API by actix, so they run as separate processes and the frontend stays responsive. | ||||
* | Indexer: Hold indexer lock for less time when in search endpoint | Baitinq | 2022-11-05 | 1 | -4/+2 |
| | |||||
* | Indexer+Frontend: Integrate with actix | Baitinq | 2022-11-05 | 2 | -3/+13 |
| | |||||
* | Indexer: Actix: Use the same service handler with multiple routes | Baitinq | 2022-11-05 | 2 | -11/+20 |
| | |||||
* | Indexer: Add and use language field in IndexedResource | Baitinq | 2022-11-04 | 2 | -10/+28 |
| | |||||
* | Indexer: Make & implement the trait insert() taking a [word] for insert | Baitinq | 2022-11-04 | 2 | -27/+24 |
| | | | | | | This has the advantage of requiring fewer calls to insert() and of letting us move all the logic that runs before insertion into the actual Indexer implementation. | ||||
* | Indexer: Add missing /search/ route | Baitinq | 2022-11-03 | 1 | -1/+4 |
| | | | | This previously caused no output when doing an empty search through the frontend. | ||||
* | Lib+Indexer: Make IndexedResource title and description Optional | Baitinq | 2022-11-02 | 2 | -12/+20 |
| | |||||
* | Indexer: Abstract indexer | Baitinq | 2022-11-02 | 2 | -61/+132 |
| | | | | | | | | We abstract an indexer's functionality into a trait (Indexer) and move the indexer-specific code into the indexer_implementation.rs file. I'm not sure whether this causes a performance decrease; this should be investigated further. | ||||
* | Misc: Cargo fmt | Baitinq | 2022-10-30 | 1 | -2/+2 |
| | |||||
* | Indexer: Use kuchiki to split html content into words | Baitinq | 2022-10-30 | 2 | -6/+19 |
| | | | | This is better than html2text when using non-ascii characters. | ||||
* | Indexer: Transform all queries into lowercase | Baitinq | 2022-10-30 | 1 | -0/+3 |
| | | | | | This is because the reverse index currently contains only lowercase words, as it lowercases them on insertion. | ||||
* | Misc: Remove unneeded dependencies | Baitinq | 2022-10-30 | 1 | -2/+0 |
| | |||||
* | Misc: Add local lib crate to share common structs | Baitinq | 2022-10-30 | 2 | -32/+2 |
| | |||||
* | Crawler+Indexer+Frontend: Rename structs to follow logical relations | Baitinq | 2022-10-29 | 1 | -12/+19 |
| | | | | | | Now Resource is CrawledResource, as it is created by the crawler, and the previous CrawledResource is now IndexedResource, as it's created by the indexer. | ||||
* | Indexer: Implement basic priority calculation of words in a site | Baitinq | 2022-10-29 | 1 | -7/+6 |
| | | | | | We just calculate priority to be the number of occurrences of the word in the site. This is very basic and should be changed :)) | ||||
* | Indexer: Add website title and description to the CrawledResource | Baitinq | 2022-10-28 | 1 | -1/+24 |
| | | | | We now parse the HTML and extract the title and description of the site. | ||||
* | Frontend: Refactor search_word_in_db() to not need explicit lifetimes | Baitinq | 2022-10-28 | 1 | -6/+6 |
| | |||||
* | Misc: Add TODOs | Baitinq | 2022-10-28 | 1 | -0/+1 |
| | |||||
* | Crawler: Abstract database word fetching with search_word_in_db() | Baitinq | 2022-10-27 | 1 | -2/+10 |
| | |||||
* | Indexer: Add /search with no query endpoint | Baitinq | 2022-10-27 | 1 | -0/+6 |
| | | | | Just returns []. | ||||
* | Indexer: Setup permissive CORS | Baitinq | 2022-10-27 | 2 | -1/+5 |
| | |||||
* | Indexer: Return json from the /search endpoint | Baitinq | 2022-10-27 | 2 | -7/+5 |
| | |||||
* | Indexer: Use CrawledResource structure as values in the reverse index db | Baitinq | 2022-10-25 | 2 | -11/+44 |
| | | | | This will allow us to integrate priorities and other improvements. | ||||
* | Indexer: Add "correct" error handling | Baitinq | 2022-10-25 | 1 | -7/+7 |
| | |||||
* | Crawler+Indexer: Rust cleanup | Baitinq | 2022-10-23 | 1 | -11/+4 |
| | | | | | | Getting more familiar with the language, so fixed some non-optimal into_iter() usage, unnecessary .clone()s, and an unnecessary hack where we could just get a &mut for inserting into the indexer url database. | ||||
* | Indexer: Listen on 0.0.0.0 | Baitinq | 2022-10-23 | 1 | -1/+1 |
| | |||||
* | Indexer: Implement basic reverse index searching and adding | Baitinq | 2022-10-22 | 2 | -8/+83 |
| | | | | Very inefficient but kind of functional:::))))))) | ||||
* | Indexer: Add skeleton http rest endpoint functionality | Baitinq | 2022-10-21 | 2 | -1/+33 |
| | | | | /search and /resource endpoints. | ||||
* | Misc: Separate OSSE into components | Baitinq | 2022-10-19 | 2 | -0/+15 |
| | | | | We now have a Cargo workspace with the Crawler, Client and Indexer packages. | ||||